If you want to achieve the same output as in the serial case, meaning that the ordering of the lines in the output is important, you can make use of the ordered facility of OpenMP:
#pragma omp parallel for schedule(static,1) ordered
for (int i = 0; i <= n; i++) {
    int result = some_comp(i);
    #pragma omp ordered
    cout << result << endl;
}
This assumes that some_comp(i) takes a relatively long time to compute compared to the time spent in the synchronized, ordered output. The schedule(static,1) clause matters here: it hands out iterations to threads round-robin, so the threads reach their ordered regions in turn instead of each blocking on a whole contiguous block of iterations. You can read more about how it all works here.
If some_comp(i) is comparable or faster than the I/O, then it makes sense to store the data in a buffer and print it in order afterwards:
std::vector<int> results(n + 1); // the loop runs n + 1 iterations, i = 0..n
#pragma omp parallel for
for (int i = 0; i <= n; i++) {
    results[i] = some_comp(i);
}
for (auto res : results) {
    cout << res << endl;
}
If n is huge and you don't have that much space to store a huge vector of result values, simply divide the iteration space into chunks:
const int chunk_size = 1000;
std::vector<int> results(chunk_size);
// (n + chunk_size) / chunk_size == ceil((n + 1) / chunk_size) chunks
// cover all n + 1 iterations
for (int chunk = 0; chunk < (n + chunk_size) / chunk_size; chunk++) {
    const int chunk_start = chunk * chunk_size;
    const int i_max = std::min(n + 1 - chunk_start, chunk_size);
    #pragma omp parallel for
    for (int i = 0; i < i_max; i++) {
        results[i] = some_comp(chunk_start + i);
    }
    for (int i = 0; i < i_max; i++) {
        cout << results[i] << endl;
    }
}
The math works out whether or not chunk_size divides n + 1: the last chunk is simply shorter when it does not.
It is also possible to put all the code inside one parallel region to avoid the overhead of opening and closing multiple parallel regions, using the single construct for the sequential parts. However, if you choose the chunk size properly, there won't be much difference in execution time, and the code is more readable as it is now.