1

I have a long array that I would like to be compared to itself faster than what I am currently seeing. I need to perform some operation on each element, then compare each result at least once. This is the fastest working version that I have been able to come up with thus far.

The array is already sorted in ascending order by the time this is method is called, but that only matters when it comes to not going over the set limit(realTopEnd). Each element is used to generate a 100 bit sequence. I have used BitSet as that has the .or operation that produces the result that I need. I am pretty sure that isn't the most efficient way of doing what I want, but it is the only way that I have found so far to get this .or comparison (if set1={1010} and set2={0110}, then the result={1110}).

The other area that could probably be improved is the order of comparison. Currently, this is being done sequentially. My previous fastest method had the array split in four and implemented multi-threading, so theoretically it should take 1/4th the time or at least close to that. The opposite seemed to be true. In this example, obviously the times are going to be fairly quick, but in the actual program, with multi-threading this was taking about 2 minutes, but when sequential it was taking less than 20 seconds. Even thought 20 seconds is a big improvement over 2 minutes, it's still too slow for what I hoping to achieve.

This method should be able to benefit from multi-threading as the order in which the elements are used to create the 100 bit sequences doesn't matter and the order in which those sequences are "combined" is irrelevant as the end result should be a sequence of 100 bits where the only 0s are at the indexes in which none of the other sequences had a 1 (if set1=101010},set2={100100}, set3={000101}; then result={101111}). Maybe I was implementing the multi-threading wrong, but my current theory is that the generateNewBitSet method was getting results faster than the executorService could assign tasks to threads so all threads weren't running simultaneously as intended.

Any help in improving this would be greatly appreciated, and if possible please provide code examples.

// Code example
static void doThisThing(long[] realCompareArray,long[] realOnesSequence, long[] realTwosSequence,
long[] realThreesSequence,long[] realFoursSequence, long realTopEnd) {
        long[] exampleCompareArray = new long[] {1L,2L,3L,4L,1001L,1002L,1003L,1004L,1102L,12345671L,12345672L,
12345673L,12345674L,22345671L,1234567890123451L,1234567890123452L,1234567890123453L,1234567890123454L};
        
        long startTime = System.nanoTime();
        for (int setCount = 1; setCount <= 10000; setCount++) {
            long loopStart = System.nanoTime();
            long topEnd = 1234567890123453L; //=realTopEnd
            BitSet currentBitSet = new BitSet(100);

            for (long l : exampleCompareArray) { //: realCompareArray) {                
                if (l > topEnd) {
                    break;
                }

                int remainder = (int) (l % 10);
                long[] currentSequence = null;

                switch (remainder) {
                case 1: {
                    currentSequence = new long[] { 2, 2, 2, 2 }; // =realOnesSequence;
                    break;
                }
                case 2: {
                    currentSequence = new long[] { 3, 6, 9, 12 }; // =realTwosSequence;
                    break;
                }
                case 3: {
                    currentSequence = new long[] { 5, 6, 7, 8 }; // =realThreesSequence;
                    break;
                }
                case 4: {
                    currentSequence = new long[] { 2, 14, 2, 2 }; // =realFoursSequence;
                    break;
                }
                }

                currentBitSet.or(generateNewBitSet(l, currentSequence, setCount));
            }

            System.out.println(currentBitSet);
            
            long loopEnd = System.nanoTime();
            long loopDuration = (loopEnd - loopStart); // divide by 1000000 to get milliseconds.
            System.out.println("Loop Duration: " + loopDuration);
        }
        long endTime = System.nanoTime();
        long duration = (endTime - startTime); // divide by 1000000 to get milliseconds.
        System.out.println("Total Duration: " + duration);
    }

    public static BitSet generateNewBitSet(long currentNumber, long[] sequence, long currentSet) {
        BitSet tempSet = new BitSet(100);
        long offset = -1;
        
        if (currentNumber !=3)
        {
            offset = (((currentNumber - ((((currentNumber-5)/10 )*2)+9))/2)-1);
        }
        
        currentSet = currentSet * 100;
        long sequenceMultiplier = currentNumber / 10;

        //Example sequence
        sequence[0] = sequence[0] + (5 * sequenceMultiplier);
        sequence[1] = sequence[1] + (10 * sequenceMultiplier);
        sequence[2] = sequence[2] + (5 * sequenceMultiplier);
        sequence[3] = sequence[3] + (5 * sequenceMultiplier);

        long loopSize = Arrays.stream(sequence).sum();
        long index = ((currentSet / loopSize) * loopSize) - currentSet + offset;
        int looper = 0;

        while (index < 100) {
            if (index > -1) {
                tempSet.set((int) index);
            }
            if (looper > 3) {
                looper = 0;
            }
            index += sequence[looper];
            looper++;
        }

        return tempSet;
    }
  • this is not about speed, but how you measure. use `jmh` – Eugene Oct 14 '21 at 01:11
  • I don't understand your comment. As far as I can tell it is entirely about speed. I have a dedicated server running this application and this method is the primary bottleneck. My machine is running for hours at a time at only 15% CPU on an 8 core machine. Memory use is only at 5/32GB. I am clearly not using my machine to its fullest with this current build. There must be a way to restructure this to maximize my machines speed. Even a 5% increase in run time would save me hours of processing time. – Timothy Joseph Oct 14 '21 at 04:47
  • You can use 2 approaches. One is to use some kind of profiler and see what is going on. I really enjoy perf + flamegraphs; but there are other excellent tools like async profiler, jfr etc. Also have a look at per CPU, cpu utilization. The second approach is to create a microbenchmark like you already have done, but in that case I would use JMH like @eugene already indicated. JMH has build profilers (including perf) that help to drill down where the problem is. JMH will protect you against typical benchmarks errors like dead code elimination. – pveentjer Oct 14 '21 at 05:06
  • If you get a reduction in performance with multiple threads compared to a single thread, then probably you are suffering from contention. I don't understand your data model well enough to see where the contention is. – pveentjer Oct 14 '21 at 05:08
  • Ah, JMH is a profiling tool. I never heard of it so didn't understand Eugene's comment. The code example I gave is runnable as is. Other than the actual data in the long arrays and the number limiting the top end, everything in my sample matches my actual running code. When I implemented multi-threading, it was based on ExecutorService examples. Either those code examples would have contention issues, or I implemented it incorrectly. Maybe I should post an example of my multi-thread attempts. – Timothy Joseph Oct 14 '21 at 16:38
  • Your measuring code contains println statements. That alone adds a problem, because writing to the console can vary extremely in the time it takes. So, use a framework , and read https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – GhostCat Oct 15 '21 at 07:16
  • Those are only for troubleshooting purposes. I am currently trying to use JMH to benchmark this project, but the vagueness in "tutorials" is making that more of a hassle to implement. They keep telling me to use command lines that don't work. I have spent more time trying to troubleshoot JMH and get that to work than I have on trying to fix my actual project. Thanks for the attempts at helping, but I think I would have been better off not posting here at all. – Timothy Joseph Oct 19 '21 at 11:46
  • "understanding" your code, "i feel" there is much optimization/parallization potential... (with the given example data) it does "something" ..but only on the first 4 iterations, the other 9(!)996 seem useless/print empty bitset... – xerx593 Dec 10 '21 at 16:56

0 Answers0