TLDR: Could anyone kindly explain why my suggested algo did not work on large unsorted datasets?
Its a simple enough problem: Find the number of pairs of objects in an array e.g.
{1,2,3,4,5,3} where the integer denotes the item type, expected output is 1 pair found.
My algo below seems to be able to solve for small data sets:
static int findPairs(int n, int[] ar) {
int pairCount = 0;
int itemCount = 1;
for (int i = 0; i <= n - 1; i++) {
for (int j = i + 1; j < n && ar[j] != 0; j++) {
if (ar[i] == ar[j]) {
itemCount ++;
if (itemCount % 2 == 0) {
pairCount ++;
}
ar[j] = 0;
}
if (j == n - 1) {
// reset counter
itemCount = 1;
}
}
}
return pairCount;
}
If i input a very small data set such as
10 20 20 10 10 30 50 10 20
The algo outputs just fine.
I started to stresstest the algo however, and I use a very large, complicated data set of 100 items:
int[] ar = {50, 49, 38, 49, 78, 36, 25, 96, 10, 67, 78, 58, 98, 8, 53, 1, 4, 7, 29, 6, 59, 93, 74, 3, 67, 47, 12, 85, 84, 40, 81, 85, 89, 70, 33, 66, 6, 9, 13, 67, 75, 42, 24, 73, 49, 28, 25, 5, 86, 53, 10, 44, 45, 35, 47, 11, 81, 10, 47, 16, 49, 79, 52, 89, 100, 36, 6, 57, 96, 18, 23, 71, 11, 99, 95, 12, 78, 19, 16, 64, 23, 77, 7, 19, 11, 5, 81, 43, 14, 27, 11, 63, 57, 62, 3, 56, 50, 9, 13, 45};
And the code seems to fail. Expected answer is 28 pairs but this algo outputs 6 pairs.
Now, this was a bruteforce (O(n^2)) attempt at finding the pairs in a unsorted array.
So i decided to sort the array first then call the same method findPairs and curiously enough, now it works:
Given Array
50 49 38 49 78 36 25 96 10 67 78 58 98 8 53 1 4 7 29 6 59 93 74 3 67 47 12 85 84 40 81 85 89 70 33 66 6 9 13 67 75 42 24 73 49 28 25 5 86 53 10 44 45 35 47 11 81 10 47 16 49 79 52 89 100 36 6 57 96 18 23 71 11 99 95 12 78 19 16 64 23 77 7 19 11 5 81 43 14 27 11 63 57 62 3 56 50 9 13 45
Sorted array
1 3 3 4 5 5 6 6 6 7 7 8 9 9 10 10 10 11 11 11 11 12 12 13 13 14 16 16 18 19 19 23 23 24 25 25 27 28 29 33 35 36 36 38 40 42 43 44 45 45 47 47 47 49 49 49 49 50 50 52 53 53 56 57 57 58 59 62 63 64 66 67 67 67 70 71 73 74 75 77 78 78 78 79 81 81 81 84 85 85 86 89 89 93 95 96 96 98 99 100
Number of pairs = 28
Q1: Could anyone kindly explain why my suggested algo did not work on large unsorted datasets? I cant seem to fathom why on earth it wouldnt.
EXTRA QUESTION PLUS PROGRESS UPDATE BELOW
So, to try and solve this issue, I was thinking along the lines of:
1) Doing a mergeSort which is much more efficient than traditional sort
2) Iterate through the now sorted array once to count all the pairs
Which I have already accomplished and indeed works(let me know if you want to see the code for that).
Q2: However, the code for MergeSorting alone is pretty lengthy(Two longass methods) and I would like to keep implementation to just one method call if possible, AND keep time complexity as fast as possible, any suggestions are welcome!
Thanks!