CS226 - Fall 2012 - Day 21

SPLAY Trees (still BST)
-------------------------------------------------------------------------

Main idea is to move elements closer to the root each time they are
accessed so that their next access is faster. This is still a binary
search tree, but not necessarily balanced.
http://en.wikipedia.org/wiki/Splay_tree

- rotations after every operation (including find)
- move accessed node to root
  - zig (if accessed node is child of root)
  - zig-zag (like double AVL rotation)
  - zig-zig (like 2 single AVL rotations)
- which node to splay:
  - find: found node or leaf where search ends (if not found)
  - insert: inserted node
  - remove: parent of actual leaf node that gets removed
- complexity
  - O(d) to splay node at depth d
  - worst case O(h) for any single operation (splay leaf)
  - M operations cost O(M log N) time in total
  - amortized cost of each operation is O(log N)

B-trees in brief:
- m-ary search tree: each node has <= m children instead of just 2
- useful for disk accesses instead of putting all data in memory


GENERAL ADTs which BSTs are used to implement
========================

MAP
----------------------------------------------------------------
- key,value pairs
- each pair is an Entry (nested class)
- unique keys!!
- operations:
  - isEmpty, clear, size
  - containsKey(KeyType key)
  - ValueType get(KeyType key)
  - ValueType put(KeyType key, ValueType value)
- Map.Entry nested class:
  - KeyType getKey()
  - ValueType getValue()
  - ValueType setValue(ValueType newval)
- no iterator! has methods to get different groups of values instead
  - Set<KeyType> keySet()
  - Collection<ValueType> values()
  - Set<Map.Entry<KeyType,ValueType>> entrySet()

TreeMap implementation
- balanced binary search tree
- logarithmic operations
- usually Red-Black tree (not AVL)

DICTIONARY
----------------------------------------------------------------
- similar to a Map
- can have multiple entries with same key
- additional operations: findAll, removeAll

Ordered Maps & Dictionaries
- must have comparator for keys!!
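
Sketch: ordered map with a comparator (java.util.TreeMap)
The Map operations and the comparator requirement above can be tried out
with Java's TreeMap. This is only a small illustration, not course code:
the class name OrderedMapSketch, the method build(), and the sample keys
and values are made up for the example. The comparator used here
(String.CASE_INSENSITIVE_ORDER) treats "AVL" and "avl" as the same key,
which shows that "unique keys" means unique under the comparator.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class OrderedMapSketch {

    // Builds a small ordered map; keys are compared ignoring case.
    static TreeMap<String, Integer> build() {
        Comparator<String> ignoreCase = String.CASE_INSENSITIVE_ORDER;
        TreeMap<String, Integer> m = new TreeMap<>(ignoreCase);
        m.put("splay", 3);
        m.put("AVL", 4);
        // Same key as "AVL" under this comparator, so the value 4 is
        // replaced and the map still holds two entries.
        m.put("avl", 5);
        return m;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> m = build();
        System.out.println("size = " + m.size());
        System.out.println("containsKey(\"Splay\") = " + m.containsKey("Splay"));
        // No iterator on the map itself: walk one of its views instead.
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        System.out.println("keys in comparator order: " + m.keySet());
    }
}
```

The entrySet() view is visited in key order because TreeMap keeps its
entries in a balanced search tree ordered by the comparator.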

HASHING
----------------------------------------------------------------

(Unordered) Map or Dictionary implementation
Java API: HashSet, HashMap
- elements must have equals and hashCode methods

Example1: zip codes - associated with areas
  5 digit numbers
  not all 100,000 are valid zip codes
  say maybe 40,000 are zip codes
  use array of about 50,000 to store
  have integer values, every value needs to get mapped to an
  index [0,49999]
  hash function: function to go from an integer key value to an
  integer in the range of the size of your hash table
     h(zip) = zip / 2   (integer division)

Example2: JHU IDs - 6 character alphanumeric code
  hash code: takes key and transforms it into an integer
  1) get the ascii code of each character, add up
  2) concatenate ascii codes instead of adding
  3) multiply each character ascii code by different weight

HASH TABLES
--------------------------------------------------------------------------
- for (unordered) maps & dictionaries
- ops worst case O(N), average case O(1)
- implement with bucket array: M buckets, indexed 0 to M-1
- if keys are integers in range 0 to M-1, just use key as index
- hashing consists of code computation + compression
- hash code: rule for assigning integers to keys
  - Java has hashCode() method for all objects - not always helpful
  - summing components - permutations give same hash code
  - polynomial summation
      let key be tuple (x0, x1, x2, ... xk-1), a != 1
      hashcode = x0 a^(k-1) + x1 a^(k-2) + ... + xk-2 a + xk-1
      use Horner's rule to compute efficiently:
      (...((x0 a + x1)a + x2)a + ... + xk-2)a + xk-1
  - cyclic shift - use bit operations
- compression function: way to map hash codes to range 0 to M-1
  - division method: code mod M
    - make M prime to reduce collisions
  - MAD (multiply, add, divide) method for key k
      abs(ak + b) mod M
      a > 0 is scaling factor
      b >= 0 is shift

IMPORTANT TO CHOOSE HASH & COMPRESSION TO SPREAD KEYS OVER THE TABLE
AS EVENLY AS POSSIBLE
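
Sketch: hash code + compression in Java
The two stages above (hash code, then compression) might look like the
following. This is a rough sketch under assumptions: the class name
HashSketch, the method names, and the constants (a = 31 for the
polynomial, a = 7, b = 3, M = 101 for MAD) are picked only for
illustration and are not from the lecture. Integer overflow in
polyHash is allowed to wrap around.

```java
class HashSketch {

    // Summing components: any permutation of the characters collides.
    static int sumHash(String key) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h += key.charAt(i);
        }
        return h;
    }

    // Polynomial hash code computed with Horner's rule:
    // (...((x0 a + x1)a + x2)a + ...)a + xk-1, one multiply per character.
    static int polyHash(String key, int a) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = h * a + key.charAt(i); // int overflow wraps around
        }
        return h;
    }

    // Division method: code mod M, kept non-negative for negative codes.
    static int divide(int code, int m) {
        return Math.floorMod(code, m);
    }

    // MAD method: abs(a*code + b) mod M, done in long to avoid overflow
    // for the small a and b assumed here.
    static int mad(int code, long a, long b, int m) {
        return (int) (Math.abs(a * code + b) % m);
    }

    public static void main(String[] args) {
        // "ab" and "ba" both sum to 97 + 98 = 195.
        System.out.println(sumHash("ab") + " " + sumHash("ba"));
        // The polynomial code depends on position, so these differ.
        System.out.println(polyHash("ab", 31) + " " + polyHash("ba", 31));
        // Compress a hash code into a table of M = 101 buckets.
        int code = polyHash("JHU123", 31);
        System.out.println(divide(code, 101) + " " + mad(code, 7, 3, 101));
    }
}
```

With a = 31 the polynomial code matches what Java's String.hashCode()
computes, so polyHash(s, 31) can be checked against s.hashCode().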