5

This question is from an interview:

Find kth largest element in the union of 2 max heaps given that this kth element appears in both heaps in O(k log n).

This is the algorithm that I come up with:

While k is not zero
 if(root of max-heap #1 > root of max-heap #2)
 {
    extract-max(heap #1)
 }
 else if(root of max-heap #1 < root of max-heap #2)
 {
    extract-max(heap #2)
 }
 else //case of duplicates
 {
    extract-max(heap #1)
    extract-max(heap #2)
 }
 k--

So when k = 0, the extract-max function will extract the kth largest.

I think that this algorithm will run in O(k log n) time since the extract-max operation will run in O(log n) and I perform it k times.

Is this the right approach? It seems that I have not used the given info "this kth element appears in both heaps"? Is there a better approach utilizing this info?

Bernhard Barker
  • 52,963
  • 14
  • 96
  • 130
LKT
  • 311
  • 1
  • 7
  • 17
  • 6
    Your approach and analysis seems fine to me – amit May 31 '14 at 22:30
  • 1
    Does *anyone* understand the point of "given that this kth element appears in both heaps"? – ooga May 31 '14 at 22:43
  • 2
    It could just be a way to simplify the problem slightly. Once you've popped off `k-1` elements, there's no need to examine the heads of both heaps. – Sneftel May 31 '14 at 22:44
  • I think your code is *almost* right. You probably don't want to take both elements in the case of duplicates. Make the first test `>=`, and get rid of the dangling `else`. Otherwise you have inconsistent behavior. If one heap has duplicate items, they're extracted one at a time. But if there are duplicate items across heaps, they're extracted in a single operation. That will lead to problems. Other than that, your approach is sound. Although I don't understand what "this kth element appears in both heaps" means. – Jim Mischel Jun 01 '14 at 04:13
  • Thanks guys for the comments. Yes, I quite don't get the extra info given in this problem too, that's why I posted the question. I think we might utilize it for a better approach than mine. @JimMischel: Let's say if there are a duplicate across two heaps. heap #1 hast root 6 and heap #2 has root 6. k = 2 (find second largest). If I just extract-max from heap #1, now heap #1 has root 5 (for example) and heap #2 still has root 6 and k = 1. Next extract-max will extract 6 as my second largest since k = 0 now although 5 should be the second largest in this situation. – LKT Jun 01 '14 at 06:50
  • Also I found a bug in my code and does not know how to fix. Let's say heap #1 has root 6, and another 6 duplicate in it. Heap #2 has root 6. k=2 (find second largest). According to my original code, I will extract-max both 6 in heap #1 and heap #2. After that, heap #1 will have root 6 and heap #2 will have let's say root 5 and now k = 1. Next I will extract-max heap #1 and k = 0 now and the code reports that my second largest is 6, but it should be 5 in this case. Does anybody know to fix it? I think we might need to use the additional given info that I have not used. Thanks! – LKT Jun 01 '14 at 06:55
  • @LKT This 'bug' can be easily solved with popping elements from the heap while the root is identical to the old root, every time you delete elements from a heap. – amit Jun 01 '14 at 07:45
  • @JimMischel The extraction of 2 elements, one from each heap is done because the question asks for k'th largest element in the **union** of heaps, usually the terminology for union means take at most one copy of each element. But since it is an interview question - asking for clarification from the interviewer can be a good idea, and will show him you have good understanding of the different possibilities. – amit Jun 01 '14 at 07:46
  • Also note: More efficient (asymptotically, not practically) solution would be to [find k'th biggest element in each heap in O(klogk)](http://stackoverflow.com/q/11209556/572670), create two arrays of size `k`, merge them to array of size `2k` in `O(k)` and find `k`th biggest element in a sorted array. – amit Jun 01 '14 at 07:52
  • @amit Could you expand on your comment, please? I failed to understand both your comment and the linked `O(k*log(k))` solution. Thanks! – Ali Jun 01 '14 at 10:41
  • @Ali You can use a heap of size O(k) to find the k largest elements in a max-heap of arbitrary size in O(k log k). In fact [it can even be done in O(k)](http://stackoverflow.com/questions/22574580/algorithm-for-finding-the-largest-k-numbers-in-a-max-heap-of-size-n-in-ok-time), which is somewhat surprising. You do that for both heaps. Now you have 2k unsorted elements and want to find the k-th largest among those. You can do the latter step by sorting in O(2k log (2k)) = O(k log k) and then retrieving the k-th element... – Niklas B. Jun 01 '14 at 11:38
  • ... You can also use the [median of medians selection algoirithm](http://en.wikipedia.org/wiki/Median_of_medians) to get the k-th element in O(k). So in total (@amit) it is in fact possible to solve the problem in O(k), but the heap selection due to Frederickson likely has galactic constant factors that make the algorithm totally impractical. – Niklas B. Jun 01 '14 at 11:38
  • @amit Yes, I can see that it solves the problem, but it will break the constraint O (k log n). For instance, an easy example to examine is heap #1 and heap #2 each has 5 nodes (4 of them are 7, 1 is 6 with 7 is being root of course since it's a max heap) - find the second largest (in this case it should return 6). By running my algorithm and your bug fix, I think the returned result is correct, but the big O is no longer k log n since we run the extract-max operation more than 2 times. – LKT Jun 02 '14 at 07:49
  • @LKT Handling with duplicates is impossible other way, assume a heap with `2n` values, `n` of them are '1', and all the others are greater than 1. How can you find the 2nd largest element, which is not 1 efficiently? You cannot. The 2nd largest element can be in any leaf node - it is undefined which one, and all non-leaf nodes are 1. Finding this value requires traversing all the leaves of the heap, which will be `O(n)`. – amit Jun 02 '14 at 12:29
  • @amit yup, that's what I thought also. The worst case is O (n log n). But I'm just wondering if we can utilize the info "given kth element appears in both heaps" to do something better? Maybe not...i guess :) – LKT Jun 02 '14 at 18:14

1 Answers1

0

Your approach is quite good and it can't be improved too much. It also has the expected complexity so you should be good to go. Only thing that I thought of and can be optimized is that you only need to check if k is zero, when you go in the last case:

While true
 if(root of max-heap #1 > root of max-heap #2)
 {
    extract-max(heap #1)
 }
 else if(root of max-heap #1 < root of max-heap #2)
 {
    extract-max(heap #2)
 }
 else //case of duplicates
 {
    extract-max(heap #1)
    extract-max(heap #2)
    if (k == 1) {
      break;
    }
 }
 k--

In this way you reduce the number of comparisons(which will have very slight effect) in the average case. This also takes advantage of the additional information that you're given - that the element is common for both heaps. However I don't think any benchmark will be able to notice this improvement as extract-max exceeds by far the complexity of the comparison.

Ivaylo Strandjev
  • 66,530
  • 15
  • 117
  • 170