0

I'm trying to use OpenMP to speedup my code but the problem is I need to print stuff and I don't know how to print them without race conditions.

My code looks something like that:

int some_comp(int n){
  /* some computation */
}

int main(){
  cin >> n;
  #pragma omp parallel for
  for(int i=0; i<=n; i++){
    cout << some_comp(i) << endl;
  }
}

It's easy to see (and pretty obvious) that there is a race condition in the line that prints the result and that it's different in each run.

What I want to do is get rid of the race condition and somehow print everything serially, as it should have printed if I hadn't used OpenMP.

I thought of using some kind of vector or list or some other data structure and save all the results there but the problem is that n<=2^64 so a vector/list/something else will need too much memory.

Any ideas will be appreciated.

sssss
  • 1
  • 4
  • I’m confused. If you’re spawning multiple threads that access the same data, you’re gonna have race conditions. You can try to mitigate that with mutexes and locks, but if you want each thread to access in the order they were spawned in, why bother spawning threads? – Cole Tobin May 20 '20 at 16:04
  • 1
    Basically, if you have been drawing something serially and now, you want to draw it in parallel, you have to divide your scene on the number of threads and each thread should start with a certain offset. Imagine we have 2 threads. We can divide the screen by two and draw each half by each thread. The second thread will start with screenSize / 2 offset. As @Cole Johnson mentions, threads are intended to be used, when you have different resources. The resources in this case are the parts of the screen. – Alex May 20 '20 at 16:07
  • Mark- that doesn't really help since I can't write so many variables into a file, it would take up too much storage (2^64 entries is like gigas of gigas). – sssss May 20 '20 at 19:13
  • Alex- is there a way to do that with OpenMP? Some other framework would be cool too, if you know any :) – sssss May 20 '20 at 19:14
  • Cole- I never said it was possible or smart. All I want to do is make my code run faster. Without printing and with parallelizing it took 0.07s and without parallelizing it took 1.05s so I wanted to try and make parallelizing work. – sssss May 20 '20 at 19:17

1 Answers1

0

If you want to achieve the same output as in the serial case, meaning that the ordering of the lines in the output is important, you can make use of the ordered facility of OpenMP:

#pragma omp parallel for schedule(static,1) ordered
for(int i=0; i<=n; i++){
  int result = some_comp(i);
  #pragma omp ordered
  cout << result << endl;
}

This assumes that some_comp(i) takes relatively long time to compute compared to the time spent in synchronous ordered execution. You can read more about how it all works here.

If some_comp(i) is comparable or faster than the I/O, then it makes sense to store the data in a buffer and print it in order afterwards:

std::vector<int> results(n);

#pragma omp parallel for
for (int i=0; i<=n; i++){
  results[i] = some_comp(i);
}

for (auto res : results){
  cout << res << endl;
}

If n is huge and you don't have that much space to store a huge vector of result values, simply divide the iteration space into chunks:

const int chunk_size = 1000;
std::vector<int> results(chunk_size);

for (int chunk = 0; chunk < (n+1 + chunk_size) / chunk_size; chunk++) {
  const int chunk_start = chunk * chunk_size;
  const int i_max = std::min(n+1 - chunk_start, chunk_size);

  #pragma omp parallel for
  for (int i = 0; i < i_max; i++){
    results[i] = some_comp(chunk_start + i);
  }

  for (int i = 0; i < i_max; i++){
    cout << results[i] << endl;
  }
}

I hope I got all the math correct and this should work both when chunk_size divides n+1 and not.

It is also possible to put all the code inside one parallel region so to prevent the overhead of multiple parallel regions, and make use of the single construct for the sequential parts, but if you choose the chunk size properly, there won't be much of a difference in the execution time and the code is more readable like it is now.

Hristo Iliev
  • 69,756
  • 12
  • 127
  • 175