0

TLDR: Could anyone kindly explain why my suggested algo did not work on large unsorted datasets?

Its a simple enough problem: Find the number of pairs of objects in an array e.g.

{1,2,3,4,5,3} where the integer denotes the item type, expected output is 1 pair found.

My algo below seems to be able to solve for small data sets:

static int findPairs(int n, int[] ar) {
    int pairCount = 0;
    int itemCount = 1;
    for (int i = 0; i <= n - 1; i++) {
        for (int j = i + 1; j < n && ar[j] != 0; j++) {
            if (ar[i] == ar[j]) {
                itemCount ++;
                if (itemCount % 2 == 0) {
                    pairCount ++;
                }
                ar[j] = 0;
            }
            if (j == n - 1) {
                // reset counter
                itemCount = 1;
            }
        }
    }

    return pairCount;
}

If i input a very small data set such as

10 20 20 10 10 30 50 10 20

The algo outputs just fine.

I started to stresstest the algo however, and I use a very large, complicated data set of 100 items:

int[] ar = {50, 49, 38, 49, 78, 36, 25, 96, 10, 67, 78, 58, 98, 8, 53, 1, 4, 7, 29, 6, 59, 93, 74, 3, 67, 47, 12, 85, 84, 40, 81, 85, 89, 70, 33, 66, 6, 9, 13, 67, 75, 42, 24, 73, 49, 28, 25, 5, 86, 53, 10, 44, 45, 35, 47, 11, 81, 10, 47, 16, 49, 79, 52, 89, 100, 36, 6, 57, 96, 18, 23, 71, 11, 99, 95, 12, 78, 19, 16, 64, 23, 77, 7, 19, 11, 5, 81, 43, 14, 27, 11, 63, 57, 62, 3, 56, 50, 9, 13, 45};

And the code seems to fail. Expected answer is 28 pairs but this algo outputs 6 pairs.

Now, this was a bruteforce (O(n^2)) attempt at finding the pairs in a unsorted array.

So i decided to sort the array first then call the same method findPairs and curiously enough, now it works:

Given Array
50 49 38 49 78 36 25 96 10 67 78 58 98 8 53 1 4 7 29 6 59 93 74 3 67 47 12 85 84 40 81 85 89 70 33 66 6 9 13 67 75 42 24 73 49 28 25 5 86 53 10 44 45 35 47 11 81 10 47 16 49 79 52 89 100 36 6 57 96 18 23 71 11 99 95 12 78 19 16 64 23 77 7 19 11 5 81 43 14 27 11 63 57 62 3 56 50 9 13 45 

Sorted array
1 3 3 4 5 5 6 6 6 7 7 8 9 9 10 10 10 11 11 11 11 12 12 13 13 14 16 16 18 19 19 23 23 24 25 25 27 28 29 33 35 36 36 38 40 42 43 44 45 45 47 47 47 49 49 49 49 50 50 52 53 53 56 57 57 58 59 62 63 64 66 67 67 67 70 71 73 74 75 77 78 78 78 79 81 81 81 84 85 85 86 89 89 93 95 96 96 98 99 100 
Number of pairs = 28 

Q1: Could anyone kindly explain why my suggested algo did not work on large unsorted datasets? I cant seem to fathom why on earth it wouldnt.

EXTRA QUESTION PLUS PROGRESS UPDATE BELOW

So, to try and solve this issue, I was thinking along the lines of:

1) Doing a mergeSort which is much more efficient than traditional sort
2) Iterate through the now sorted array once to count all the pairs 

Which I have already accomplished and indeed works(let me know if you want to see the code for that).

Q2: However, the code for MergeSorting alone is pretty lengthy(Two longass methods) and I would like to keep implementation to just one method call if possible, AND keep time complexity as fast as possible, any suggestions are welcome!

Thanks!

JohnDoeDeer
  • 77
  • 1
  • 8
  • In case of equal couples like (10, 10), (10, 10) you count them as one ? – dariosicily Apr 13 '20 at 16:15
  • What value of n are you passing in, and why do you need to pass in n? And where are you updating the value of pairCount? I guess the latter question is because of a typo when putting in your code and line 9 should be pairCount – Chris Apr 13 '20 at 16:17
  • My first guess is you are passing in a dodgy value of n, but I'd check your alorithm on pencil and paper to check it works - first glance it looks fine. If it works with a few values it will work with 100 – Chris Apr 13 '20 at 16:20
  • Instead of n, just use at.length. Also no need for that conditional when resetting itemCount I don't think, just reset it inside the outer loop – Chris Apr 13 '20 at 16:22
  • ar not at above - autocorrect! – Chris Apr 13 '20 at 16:23
  • I don't see where you increment `pairCount`. – user58697 Apr 13 '20 at 16:27
  • Typo, my apologies. I've made an edit to my first algo above – JohnDoeDeer Apr 13 '20 at 16:30
  • Your first algorithm is not right, it just happens that particular small dataset worked. You need to re-check it step by step. – SomeDude Apr 13 '20 at 16:30
  • @JohnDoeDeer What is the way in which we have to count pairs? – nice_dev Apr 13 '20 at 16:31
  • @dariosicily I've retested the problematic algo against a simple set of {10,10,10,10} which seems to output correctly (2 pairs) – JohnDoeDeer Apr 13 '20 at 16:32
  • 1
    You need to rethink your algo. Use a debugger and go through step-by-step for the 100 items array. This is a simple problem that can be solved in O(n) without sorting or any complicated steps in around 10 lines. Hint: use a hashset – denvercoder9 Apr 13 '20 at 16:34
  • @vivek_23 each integer in the array denotes an item type or class if you will. So i need to find out how many pairs of each item type is available in the array. {1,2,2,1} has two pairs as an example – JohnDoeDeer Apr 13 '20 at 16:35
  • @JohnDoeDeer How much does {2,2,2} have? – nice_dev Apr 13 '20 at 16:36
  • Does this answer your question? [Why is processing a sorted array faster than processing an unsorted array?](https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array) – Arvind Kumar Avinash Apr 13 '20 at 17:13

0 Answers0