
I don't really understand what exactly causes the problem in this example:

Here is a quote from my book:

> Optimization. On most hardware, the MESI protocol is highly optimized to minimize latency. This means that some operations aren't actually performed immediately when messages are received over the ICB. Instead, they are deferred to save time. As with compiler optimizations and CPU out-of-order execution optimizations, MESI optimizations are carefully crafted so as to be undetectable by a single thread. But, as you might expect, concurrent programs once again get the raw end of this deal.

Here is the code:

int32_t g_data = 0;
int32_t g_ready = 0;
void ProducerThread() // running on Core 1
{
  g_data = 42;
  // assume no instruction reordering across this line
  g_ready = 1;
}

void ConsumerThread() // running on Core 2
{
  while (!g_ready)
    PAUSE();
  // assume no instruction reordering across this line
  ASSERT(g_data == 42);
}

For example:

  1. How can g_data be computed but not present in the cache?

  2. If g_data is not in cache, then why does this sentence end with a "yet":

> if Core 1 already has g_ready's cache line in its local L1 cache, but does not have g_data's line yet.

Would the CPU load the cache line with g_data after it has been computed?

  3. If we read this sentence:

> This means that some operations aren't actually performed immediately when messages are received over the ICB. Instead, they are deferred to save time.

then what operation is deferred in our example with the producer and consumer threads?

So basically I don't understand how, under the MESI protocol, some operations become visible to other cores in the wrong order, despite being performed in the right order by a specific core.

    Are those C operations supposed to represent asm loads/stores? If so, make your variables `volatile` so a C compiler would have to actually do that. (Including no compile-time reordering of volatile accesses.) – Peter Cordes Jun 05 '22 at 02:06
    You didn't provide enough context to be sure what point your book was making in the text those quotes are part of, but probably: (1) [The store buffer](https://stackoverflow.com/questions/64141366/can-a-speculatively-executed-cpu-branch-contain-opcodes-that-access-ram) holds stores before they commit to L1d cache. And before that, variable values get computed in registers. (2) Every address is part of some cache **line**, whether a cache currently has a valid copy of it or not. – Peter Cordes Jun 05 '22 at 02:11
    A load will result in the line getting cached in L1d of this CPU when data arrives. The question is whether the copy of the line you read is from before or after some other store commits to L1d. (3) they might be talking about invalidation queues? – Peter Cordes Jun 05 '22 at 02:13
    MESI itself doesn't reorder anything; it's the local store buffer inside each core, and hit-under-miss load reordering, that causes memory-ordering effects. (AFAIK, those local effects can explain any reordering without needing to think about invalidation queues, but possibly that matters on more exotic machines like [POWER that can store-forward between logical cores of a physical core](https://stackoverflow.com/questions/27807118/will-two-atomic-writes-to-different-locations-in-different-threads-always-be-see/50679223#50679223), so they're not multi-copy atomic. ) – Peter Cordes Jun 05 '22 at 02:16
