-1

I have an assignment to optimize a for loop so the compiler compiles code that runs faster. The objective is to get the code to run in 5 or less seconds, with the original run time being around 23 seconds. The original code looks like this:

#include <stdio.h>
#include <stdlib.h>

#define N_TIMES     600000
#define ARRAY_SIZE   10000

int main(void)
{
    double  *array = calloc(ARRAY_SIZE, sizeof(double));
    double  sum = 0;
    int     i;

    printf("CS201 - Asgmt 4 - I. Forgot\n");

    for (i = 0; i < N_TIMES; i++) {

        int     j;

        for (j = 0; j < ARRAY_SIZE; j++) {
            sum += array[j];
            }

        }

    return 0;
}

My first thought was to do loop unrolling on the inner for loop which got it down to 5.7 seconds and that loop looked like this:

 for (j = 0; j < ARRAY_SIZE - 11; j+= 12) {
            sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7] + array[j+8] + array[j+9] + array[j+10] + array[j+11]);
                }

After taking it out to 12 spots in the array per loop the performance wasn't increasing anymore so my next thought was to try and introduce some parallelism so I did this:

   sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5]);
   sum1 = sum1 + (array[j+6] + array[j+7] + array[j+8] + array[j+9] + array[j+10] + array[j+11]);

That actually ended up slowing down the code and each additional variable again slowed the code down more so. I'm not sure if parallelism doesn't work here or if I'm implementing it wrong or what but that didn't work so now I'm not really sure how I can optimize it anymore to get it below 5 seconds.

EDIT: I forgot to mention I can't make any changes to the outer loop, only the inner loop

EDIT2: This is the part of the code I'm trying to optimize for my assignment:

        for (j = 0; j < ARRAY_SIZE; j++) {
            sum += array[j];
            }

Im using gcc compiler with the flags gcc -m32 -std=gnu11 -Wall -g a04.c -o a04 All compiler optimizations are turned off

csStudent
  • 127
  • 2
  • 10
  • 1
    Your `array` is all zeros. And the program doesn't have any output. So the optimized program will be the empty one. Update: Sorry, it should actually print "CS201 - Asgmt 4 - I. Forgot\n"... – Eugene Sh. Aug 22 '17 at 21:19
  • 3
    What makes you think there's gong to be any parallelism in the 'parallel' code? – Jonathan Leffler Aug 22 '17 at 21:19
  • You don't need the outer `i` loop since it's all just repetitive additions. `for (j = 0; j < ARRAY_SIZE; j++) { sum += N_TIMES * array[j] }`. Get rid of the `for (i = 0...` loop. Of course, `array` isn't initialized so you'll get zeroes or whatever. But I assume that's not the point of your code. – lurker Aug 22 '17 at 21:24
  • @Dante you missed an obvious optimization (as recommended by information_interchange), but the process you followed in unrolling the loop and in determining the best number to unroll is a good skill to keep in your back pocket. Some day, tricks like that will be very useful for you. – Scott Mermelstein Aug 22 '17 at 21:27
  • @EugeneSh. C does not require floating-point zero to be bitwise zero. At best, the behavior is implementation defined. – EOF Aug 22 '17 at 21:34
  • 1
    You do not use this table or the result so any compile time optimisation will remove both loops :) – 0___________ Aug 22 '17 at 21:46
  • Here, it's a duplicate: [C loop optimization help for final assignment](https://stackoverflow.com/q/32000917/69809). As some answers state in the link, whoever wrote this assignment doesn't really understand what's important in algorithmic complexity, optimizing compilers, and probably programming in general. Another thread here too: [Optimized sum for an array of doubles](https://stackoverflow.com/q/31918680/69809). – Groo Aug 22 '17 at 21:48
  • Does the inner loop have to stay a loop? Your unrolling indicates that not. In that case use a flag to notice execution in the first time through the outer loop, execute the inner loop, store the result and in all following execution just add the stored sum. I.e. do not bother about optimising the first time, optimise all others by 99.99%. – Yunnosch Aug 22 '17 at 22:03

2 Answers2

5

Since j and i don't depend on one another, I think you can just do:

for (j = 0; j < ARRAY_SIZE; j++) {
    sum += array[j];
}

sum *= N_TIMES
takendarkk
  • 3,178
  • 7
  • 23
  • 36
information_interchange
  • 2,026
  • 4
  • 24
  • 43
0

You can move the declaration of variable 'j' out of the loop like so:

 int j;
 for (i = 0; i < N_TIMES; i++) {

    //int     j; <-- Move this line out of the loop
    for (j = 0; j < ARRAY_SIZE - 11; j+= 12) {
        sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7] + array[j+8] + array[j+9] + array[j+10] + array[j+11]);
    }
 }

You don't need to declare a new variable 'j' each time the loop runs.

Devin L.
  • 419
  • 2
  • 12