CS226 - Day 25 - Spring 2013

Problem A: Consider this little problem: Suppose I have a set of N students and I want to recommend the top k of them for a scholarship. How do I quickly determine which are the top k?

Approach 1: Sort the array, take the top k.
    Time: Mergesort is N log N, + k to get the top k

Approach 2: Create an AVL (BBST) tree and put k elements in the tree. For each of the other N-k elements, compare to the minimum in the tree; if curr > min, delete the minimum and add the current.
    Time: find min is log k, delete min is log k, add curr is log k, repeat N-k times => (log k)*(N-k) = N log k - k log k

Approach 3: Take all N, look at adjacent pairs, pick the better of each pair to move up to the next level, repeat until only 1 is left, then work your way down the line of those who lost to the best one.
    Time: N steps to find the best, ?? to get the k top ones

Approach 4: Put all elements in a BBST, take the rightmost branch with the top k.
    Time: N log N to insert all, + k

Approach 5: Create a hash table using GPA as the key to hash, with chaining for collisions. Have N/k buckets (maybe k in each). Take people out of the highest bucket and keep working down; might have to sort the last bucket.
    Time: N to put in hash table, pull out k, might have to sort a bucket - depends on the size of that bucket
        - good distribution: k per bucket, k log k to sort
        - bad distribution: N log N to sort

Problem B: Suppose I have a set of jobs to do, each with a priority, and I need to figure out which to work on next. How do I keep the jobs rolling?

What do these problems have in common? Let's invent a data structure to use in solving them...

PRIORITY QUEUE ADT:
- Entry object: (key, value) pair, keys must be comparable
- main methods:
    - insert(entry)
    - removeMin()
    - min() (get)
(Could create and use a complementary maximum priority queue instead.)
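The ADT above can be sketched in a few lines of Python. This is only an illustration, not the course's implementation: the class name `UnsortedListPQ`, the method spellings, and the job names are all made up for the example, and the backing store is deliberately the simplest possible one (an unsorted list), so insert is constant time while min and removeMin scan all N entries.

```python
from typing import Any, List, Tuple


class UnsortedListPQ:
    """Min-priority queue backed by an unsorted list (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Any, Any]] = []  # (key, value) pairs

    def __len__(self) -> int:
        return len(self._entries)

    def insert(self, key: Any, value: Any) -> None:
        """Add an entry; constant time because no order is maintained."""
        self._entries.append((key, value))

    def _min_index(self) -> int:
        """Scan every entry for the smallest key; linear time."""
        if not self._entries:
            raise IndexError("priority queue is empty")
        return min(range(len(self._entries)), key=lambda i: self._entries[i][0])

    def min(self) -> Tuple[Any, Any]:
        """Return, but do not remove, the entry with the smallest key."""
        return self._entries[self._min_index()]

    def remove_min(self) -> Tuple[Any, Any]:
        """Remove and return the entry with the smallest key."""
        return self._entries.pop(self._min_index())


# Problem B in miniature: jobs with priorities, smallest key served first.
jobs = UnsortedListPQ()
jobs.insert(3, "write report")
jobs.insert(1, "fix outage")
jobs.insert(2, "review patch")
served = [jobs.remove_min()[1] for _ in range(len(jobs))]
```

With numeric keys, one way to get the complementary maximum priority queue is to insert negated keys into the same structure.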
Applications:
- job scheduling
- resource allocation
- traffic control

Variation: ADAPTABLE PRIORITY QUEUE
    changePriority operation (we'll come back to this)

Implementations:
    hash table: fast inserts, depends on collisions, good if lots of different priorities
    balanced binary search tree: log N insert, log N findMin, log N deleteMin
    (circular) ordered array: N to insert, constant to findMin, deleteMin (like insertion sort)
    unordered array: constant insert, N to findMin, deleteMin (like selection sort)
    ordered/unordered arrays could be linked lists too...

BINARY HEAP
- complete binary tree:
    - N nodes, height log N
    - every level full except possibly the last, which is filled from the left
    - last node is the one with the highest level number
- order property: every node's key (priority) <= its children's keys (priorities)

RECALL: Ranked Sequence array rep of complete binary tree
- level numbering:
    f(v) = 1                    if v is the root
         = 2 f(parent(v))       if v is a left child
         = 2 f(parent(v)) + 1   if v is a right child
- root, parent, leftChild, rightChild, isInternal, isExternal, isRoot
- iterator methods for elements, positions, children
- worst case array size = ?
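As a quick check of the level numbering, here is a minimal Python sketch of the index arithmetic. It assumes the array leaves index 0 unused so that the root sits at index 1, matching f(root) = 1. The function names and the sample keys are invented for the example.

```python
from typing import List, Optional


def parent(i: int) -> int:
    """Index of the parent of node i (valid for i > 1)."""
    return i // 2


def left_child(i: int) -> int:
    return 2 * i


def right_child(i: int) -> int:
    return 2 * i + 1


def is_root(i: int) -> bool:
    return i == 1


def is_internal(i: int, n: int) -> bool:
    """In a complete tree of n nodes, node i is internal iff it has a left child."""
    return left_child(i) <= n


def satisfies_heap_order(tree: List[Optional[int]], n: int) -> bool:
    """Check the order property: every non-root key >= its parent's key."""
    return all(tree[parent(i)] <= tree[i] for i in range(2, n + 1))


# A complete binary tree with 6 nodes stored at indices 1..6.
#         2
#       /   \
#      5     3
#     / \   /
#    9   6 8
tree = [None, 2, 5, 3, 9, 6, 8]
n = len(tree) - 1
```

Because the tree is complete, the last node is simply index n, which is what makes finding it a constant-time operation in the array representation.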
Use Heap to implement Priority Queue:

Insert: add the next leaf node according to the complete property, then bubble up (swim)
    Time: log N to bubble up, constant to add leaf with array rep, log N to add leaf with tree rep

FindMin: constant - get root

DeleteMin: return root, move the last leaf node's value to the root, bubble down swapping with the min child at each step (sink)
    Time: find last (constant with array or log N with tree) + log N to bubble down

Summary: Binary Heap as Priority Queue
    min - O(1)
    insert - O(log N)
    deleteMin - O(log N)
    if heap implemented with sequence (array), find last position is O(1)
    if heap implemented with linked btree, find last position is O(log N)
    bubble-up and bubble-down are O(log N)

Solve original problem: find k best in set of N
1) make a Priority Queue of all values (a maximum one, so the best come out first): N log N to insert, k log N to remove the top k
2) make a min Priority Queue of size k; for the (N-k) other elements, compare to root; if higher: deleteMin, insert item
    Time: k log k to insert, (N-k) log k => N log k

Build a heap with N values, bottom-up in O(N) time.
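The operations above, together with solution 2 for the top-k problem, can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's code: the names `MinHeap`, `_swim`, `_sink`, and `top_k` are made up, keys are stored directly (no separate values), and the array uses level numbering with index 0 unused. The constructor uses the bottom-up build mentioned in the last line.

```python
from typing import Iterable, List


class MinHeap:
    """Array-based binary min-heap, root at index 1 (hypothetical sketch)."""

    def __init__(self, items: Iterable = ()) -> None:
        self._a: List = [None] + list(items)  # index 0 is unused
        # Bottom-up build: sink each internal node, starting from the last one.
        for i in range(len(self) // 2, 0, -1):
            self._sink(i)

    def __len__(self) -> int:
        return len(self._a) - 1

    def min(self):
        if len(self) == 0:
            raise IndexError("heap is empty")
        return self._a[1]  # the root holds the smallest key

    def insert(self, key) -> None:
        self._a.append(key)    # next leaf position in the complete tree
        self._swim(len(self))  # bubble up

    def remove_min(self):
        if len(self) == 0:
            raise IndexError("heap is empty")
        a = self._a
        a[1], a[-1] = a[-1], a[1]  # move last leaf to root, old root to the end
        smallest = a.pop()
        self._sink(1)              # bubble down
        return smallest

    def _swim(self, i: int) -> None:
        a = self._a
        while i > 1 and a[i] < a[i // 2]:
            a[i], a[i // 2] = a[i // 2], a[i]
            i //= 2

    def _sink(self, i: int) -> None:
        a, n = self._a, len(self)
        while 2 * i <= n:
            c = 2 * i                      # left child
            if c < n and a[c + 1] < a[c]:  # pick the smaller child
                c += 1
            if a[i] <= a[c]:
                break
            a[i], a[c] = a[c], a[i]
            i = c


def top_k(values: Iterable, k: int) -> List:
    """Return the k largest values, best first, using a min-heap of size k."""
    values = list(values)
    if k <= 0:
        return []
    heap = MinHeap(values[:k])
    for v in values[k:]:
        if v > heap.min():  # better than the weakest of the current top k
            heap.remove_min()
            heap.insert(v)
    ascending = [heap.remove_min() for _ in range(len(heap))]
    return ascending[::-1]


gpas = [3.1, 3.9, 2.7, 3.5, 4.0, 3.3]
best_three = top_k(gpas, 3)
```

Here `best_three` comes out as `[4.0, 3.9, 3.5]`. In practice Python's standard `heapq` module offers the same operations (`heappush`, `heappop`, `heapify`, `nlargest`) on a plain list; the class above is only meant to make swim and sink visible.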