CS226 - Day 23 - Fall 2012

- load factor: N/M, where N is the number of entries and M is the size of the table (number of buckets)
  - if N is O(M), the load factor is O(1) (bounded by a constant)
  - if it gets too high, rehash:
    - make a bigger array and remap every entry's hash code into it
    - make the new size the smallest prime > 2 * old size
  - when is the load too high? depends on the collision strategy
  - when to rehash? when load > .5, when load passes some other threshold, or when an insert fails
  - for hash codes that are expensive to compute, store them with the entries instead of recomputing them on a rehash

Collisions:
- more than one key might get hashed to the same array index (especially if implementing a dictionary that allows duplicate keys)
- two main collision handling techniques:
  - chaining
  - open addressing

Chaining: each bucket is a sequence of the elements with the same hash value
- put multiple entries in the same array slot
- simple unordered linked list in each slot
  - on average O(load) to find & delete, O(1) to insert (prepend without checking for duplicates)
- ordered array in each slot (if keys are comparable)
  - on average O(log load) to find, O(load) to insert & delete
- balanced BST (BBST) in each slot
  - on average O(log load) for all operations
- issues?

Alternate collision schemes - open addressing:
- Linear Probing: choose the next available slot
  (h(k) + i) mod M, for i = 0 to M-1
- Quadratic Probing: jump further ahead
  (h(k) + i^2) mod M, for i = 0 to M-1
  - avoids clumping better than linear probing
  - could fail, but won't if M is prime and load < .5
- Double Hashing: add a 2nd hash function times a step factor
  (h(k) + i*h'(k)) mod M, for i = 0 to M-1
  - could fail
  - choice of h' is very important for spread (h'(k) must never be 0)
  - overhead of computing a second hash function
- Open addressing causes problems when an item that caused a collision is later deleted from its slot
  - a search stops at the first empty slot, so how do you know whether other items were probed past this one?
  - flag items as deleted instead of creating holes
    - until another item takes the spot
    - until the next rehash
  - does this affect the load calculation?
- Extendible hashing?
  - for minimizing disk accesses (like B-trees)

EXAMPLE --------------------------------------------------

Insert: cat, bat, tap, mad, dam, nap, qat, pat
Hash code: map a-z to 1-26, then sum the codes
  cat: 24 = 3+1+20
  bat: 23 = 2+1+20
  tap: 37 = 20+1+16
  mad: 18 = 13+1+4
  dam: 18 = 4+1+13
  nap: 31 = 14+1+16
  qat: 38 = 17+1+20
  pat: 37 = 16+1+20
Table size: keep load < .5, so M = 17 (8/17 < .5); H(x) = x % M = x % 17
  H(cat) = 7, H(bat) = 6, H(tap) = 3, H(mad) = 1, H(dam) = 1, H(nap) = 14, H(qat) = 4, H(pat) = 3

Linear probing:
  [1] mad [2] dam [3] tap [4] qat [5] pat [6] bat [7] cat [14] nap
Quadratic probing:
  in insertion order: [7] cat, [6] bat, [3] tap, [1] mad, [2] dam, [14] nap, [4] qat, [12] pat (not 3, 4, 7)
  final table: [1] mad, [2] dam, [3] tap, [4] qat, [6] bat, [7] cat, [12] pat, [14] nap

See hashProblems.html exercise sheet for more examples!!
Completed hash table worksheet: www.cs.jhu.edu/~joanne/cs226/notes/hashSolutions.pdf
Reviewed hw8 briefly.
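To check the EXAMPLE tables mechanically, here is a minimal Python sketch that uses the letter-sum hash code with M = 17 and inserts the words in the listed order under each open-addressing probe formula from the notes. The function names are illustrative only. The double-hashing step h'(k) = 7 - (k % 7) is an assumption (a common textbook pattern that never returns 0); the notes do not specify h'.

```python
# Reproduces the EXAMPLE: letter-sum hash, M = 17, H(x) = x % 17.
# double_hash uses an ASSUMED second hash h'(k) = 7 - (k % 7), not from the notes.

M = 17
WORDS = ["cat", "bat", "tap", "mad", "dam", "nap", "qat", "pat"]


def hash_code(word: str) -> int:
    """Map a-z to 1-26 and sum the codes."""
    return sum(ord(ch) - ord("a") + 1 for ch in word)


def linear(k: int, i: int) -> int:
    return (k % M + i) % M               # (h(k) + i) mod M


def quadratic(k: int, i: int) -> int:
    return (k % M + i * i) % M           # (h(k) + i^2) mod M


def double_hash(k: int, i: int) -> int:
    step = 7 - (k % 7)                   # assumed h'(k); always in 1..7, never 0
    return (k % M + i * step) % M        # (h(k) + i*h'(k)) mod M


def insert_all(words, probe) -> dict:
    """Insert words in order; return {slot: word} for the occupied slots."""
    table = [None] * M
    for w in words:
        k = hash_code(w)
        for i in range(M):               # i = 0 to M-1
            slot = probe(k, i)
            if table[slot] is None:
                table[slot] = w
                break
        else:                            # no break: every probed slot was full
            raise RuntimeError(f"insert failed for {w!r}")
    return {s: w for s, w in enumerate(table) if w is not None}


if __name__ == "__main__":
    for name, probe in [("linear", linear), ("quadratic", quadratic), ("double", double_hash)]:
        print(name, insert_all(WORDS, probe))
```

The linear and quadratic results match the tables in the example. Under the assumed h', double hashing sends dam to [4], qat to [8], and pat to [13], because each colliding key jumps by its own step size instead of following the same path as its neighbors.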
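The claim that quadratic probing "could fail, but won't if M is prime and load < .5" can be checked numerically. For prime M, the offsets i^2 mod M are all different for i = 0 .. M // 2, so every key can reach (M + 1) / 2 distinct slots. If fewer than half the slots are full, at least one of those is empty. The sketch below checks this under the stated assumption that only the offset sequence matters (the home slot h(k) just shifts it); the helper names are hypothetical.

```python
# Checks the quadratic-probing guarantee for a given table size m.

def distinct_quadratic_offsets(m: int) -> bool:
    """True if i*i % m are all different for i = 0 .. m // 2."""
    offsets = [(i * i) % m for i in range(m // 2 + 1)]
    return len(set(offsets)) == len(offsets)


def reachable_slots(m: int) -> int:
    """Number of distinct slots (h + i^2) mod m can ever reach, for any h."""
    return len({(i * i) % m for i in range(m)})


if __name__ == "__main__":
    print(distinct_quadratic_offsets(17))  # prime table size from the example
    print(distinct_quadratic_offsets(16))  # 0^2 and 4^2 collide mod 16
    print(reachable_slots(17))             # only 9 of the 17 slots are ever probed
```

The last line is why the notes say quadratic probing "could fail": with M = 17 a key only ever probes 9 slots, so once the table is more than half full an insert can miss every free slot.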
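The chaining and rehashing bullets above can be sketched as a small map whose buckets are unordered Python lists, standing in for the linked lists in the notes. It rehashes to the smallest prime > 2 * old size whenever an insert would push the load N/M above .5. The class name ChainedHashMap, the threshold MAX_LOAD = 0.5, and the starting size of 17 are assumptions for illustration only.

```python
# Separate chaining with rehash-on-load. Names and constants are hypothetical.

MAX_LOAD = 0.5


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime_above(n: int) -> int:
    """Smallest prime strictly greater than n."""
    p = n + 1
    while not is_prime(p):
        p += 1
    return p


class ChainedHashMap:
    def __init__(self, capacity: int = 17):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0                                  # N, number of entries

    def load(self) -> float:
        return self.size / len(self.buckets)           # N / M

    def _bucket(self, key) -> list:
        return self.buckets[hash(key) % len(self.buckets)]

    def get(self, key):
        for k, v in self._bucket(key):                 # O(load) on average: scan one chain
            if k == key:
                return v
        raise KeyError(key)

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for idx, (k, _) in enumerate(bucket):          # duplicate check makes this O(load)
            if k == key:
                bucket[idx] = (key, value)
                return
        if (self.size + 1) / len(self.buckets) > MAX_LOAD:
            self._rehash()
            bucket = self._bucket(key)                 # M changed, so recompute the slot
        bucket.append((key, value))
        self.size += 1

    def remove(self, key) -> None:
        bucket = self._bucket(key)
        for idx, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[idx]
                self.size -= 1
                return
        raise KeyError(key)

    def _rehash(self) -> None:
        old = self.buckets
        self.buckets = [[] for _ in range(next_prime_above(2 * len(old)))]
        for chain in old:                              # remap every entry into the new array
            for k, v in chain:
                self._bucket(k).append((k, v))
```

One design note: this put scans the chain for an existing key so it can replace values, which costs O(load). The O(1) insert in the notes corresponds to skipping that check and just adding to the chain.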
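The deletion problem for open addressing ("flag items as deleted instead of creating holes") shows up directly in the example: mad and dam both hash to [1], and dam only lives at [2] because mad was there first. The sketch below uses linear probing with the letter-sum hash and M = 17. It marks removed slots with a sentinel so searches keep probing past them, while inserts may reuse them. LinearProbingSet, EMPTY, and DELETED are hypothetical names; a real table might keep a per-slot boolean flag instead.

```python
# Linear probing with "deleted" flags, using the EXAMPLE's hash and M = 17.

EMPTY = None
DELETED = object()   # flag for a removed item; not the same as an empty slot


def letter_sum(word: str) -> int:
    return sum(ord(ch) - ord("a") + 1 for ch in word)


class LinearProbingSet:
    def __init__(self, capacity: int = 17):
        self.slots = [EMPTY] * capacity

    def _probe(self, key):
        m = len(self.slots)
        home = letter_sum(key) % m
        for i in range(m):               # (h(k) + i) mod M, i = 0 to M-1
            yield (home + i) % m

    def contains(self, key) -> bool:
        for s in self._probe(key):
            item = self.slots[s]
            if item is EMPTY:            # a real hole ends the search
                return False
            if item is not DELETED and item == key:
                return True              # flagged slots are skipped, not treated as holes
        return False

    def add(self, key) -> None:
        if self.contains(key):
            return
        for s in self._probe(key):
            if self.slots[s] is EMPTY or self.slots[s] is DELETED:
                self.slots[s] = key      # a new item may take over a flagged spot
                return
        raise RuntimeError("table full; rehash needed")

    def remove(self, key) -> None:
        for s in self._probe(key):
            item = self.slots[s]
            if item is EMPTY:
                return                   # key not present
            if item is not DELETED and item == key:
                self.slots[s] = DELETED  # flag instead of emptying the slot
                return
```

After inserting the example words and removing mad, slot [1] holds the flag, so a search for dam still probes on to [2] and finds it. If [1] were simply emptied, that search would stop at [1] and wrongly report dam missing. Because flagged slots still lengthen probe sequences, one common choice is to count them toward the load when deciding whether to rehash.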