20

Multithreading can cause a race condition if two threads access the same memory slot, but why is that? From a HW point of view, if the two cores are designed the same, the internal pipelines are the same, the logic gates/transistors pass electrons the same way, and the speed of those electrons is a constant value, then what causes the race?

Theoretically speaking, shouldn't the two threads access the memory slot at the exact same time, down to the nanosecond, every time?

Dan
  • 319
  • 19
    One time you move the mouse and one of the CPU cores gets an interrupt to process the mouse move. – user253751 Jan 05 '23 at 19:42
  • 9
    The speed of electrons is not a constant value. Manufacturing/material defects, magnetic fields, temperature changes will all influence it. This usually doesn’t matter, but at the level of precision you’re talking about, it will. – Telastyn Jan 05 '23 at 20:02
  • 2
    The answer to your last question is a firm "no". Transistors are fundamentally quantum mechanical in nature and the scale of contemporary processors is so small, that other quantum effects are very much in play. In other words, everything is probabilistic w/ regard to the context of your question. – JimmyJames Jan 05 '23 at 21:11
  • 1
    Consider one of the cores is also spending half its time decoding cat videos, the other is scheduling dentist appointments. Now what happens? – whatsisname Jan 05 '23 at 22:26
  • 13
    The whole point of multithreading is that the cores execute different instructions. – Bergi Jan 06 '23 at 08:23
  • 24
    This question conflates some things: multiple threads don't require multiple cores, and unpredictability of programs can already occur in single-threaded programs on single-processor hardware. I recommend you clarify your understanding of this last point first before adding twists like multithreading. – Doc Brown Jan 06 '23 at 09:08
  • ... when looking for an example on this site, I found this 7 year old answer of mine, maybe this helps. – Doc Brown Jan 06 '23 at 09:14
  • 21
    The speed of electrons is irrelevant. Two cores with the same clock executing the same instruction sequence will do so in the same amount of time .. but only if they START at the same time, and are not interrupted! For a general purpose PC, neither of those is true. – pjc50 Jan 06 '23 at 09:54
  • 1
    The fundamental misunderstanding in your description is that you believe "memory slots" are things that processors access. That hasn't been true for decades. When you have a correct mental model of how memory works on modern machines, it will become much more clear why races exist. Modern processors access copies of memory pages. – Eric Lippert Jan 06 '23 at 21:45
  • At any point in time, a modern OS runs hundreds of (sub-)programs in parallel, and you have only so many threads/cores. Your programs generally do not run on bare metal, OS kernel schedules their execution. So the most practical and short answer is that unless you are writing an OS kernel, you do not control when each thread will actually execute its instructions. And then CPUs also have branch prediction and whatnot, which is outside of the OS control as well. – Lodinn Jan 07 '23 at 05:00
  • Just for the record, information is carried by changes in electric fields, which propagate at some fraction of the speed of light. Electrons move way slower than that, especially in metal wires, just to balance charges. Even in semiconductors the average drift velocity is slower than the time for a signal to get through a transistor. Veritaseum made a video about it: https://www.youtube.com/watch?v=oI_X2cMHNe0 . Time for a transistor to actually switch does depend on carriers moving within and around the gate, but it's tiny compared to the total path length. – Peter Cordes Jan 07 '23 at 09:28
  • 1
    your assumptions on how threading is managed by a system seem to be closer to how an RTOS works. If that's the case, looking into how real-time computing works may help you understand how most computers don't work that way – njzk2 Jan 07 '23 at 23:31
  • Since I cannot post an answer: Cache coherency only works on associated caches: L1 (single thread), L2 (single core), L3 (single chip, or shared depending on arch). A memory write filters up slowly, over multiple cycles, until it is written to the slow RAM. Depending on the thread scheduler and affinity, a second thread accessing the same location may or may not read the new value while it is in the cache pipeline; it is a matter of when, and to which core, the thread is dispatched. Off-chip access is throttled by the UPI bus (Intel) or the memory controller (AMD). The second word would be contention... – agone Jan 08 '23 at 06:32

8 Answers

51

Your understanding of computer hardware is flawed. Memory is not accessed by different cores in parallel; access is regulated like traffic at a road junction. Different threads can run simultaneously on different cores, but they do not access the same memory cell together. What can happen is one thread ruining the work of another thread, like overwriting a result value before it has been read by a consumer. But that would be done sequentially; threads do not "collide", hitting the same cell at the same time. Reads and writes are all performed in a very controlled manner.

The race conditions software engineers speak of are not a thing at the transistor level. They are a thing at the much higher program logic level.

Think of using a boolean value to control access to a resource. Before one thread uses the resource it checks the value and finds it is false, meaning the resource is available. So it sets the value to true, signaling to other threads that the resource is now occupied, and continues to use the resource. Between the check and the set operation, however, another thread could have checked the value and also found it to be false. This is the race and the unpredictability. Yet access to the variable by both threads was performed sequentially, in the most orderly fashion.

So we need something better than a boolean variable to regulate traffic at the software level, and this can only work with hardware support. This problem cannot be solved in software alone. Modern processors support this feature; we most often call this a lock.
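
For illustration, here is a minimal C++ sketch (not part of the original answer; the flag and counter names are invented) of the check-then-set race described above, and of the lock that fixes it:

    #include <iostream>
    #include <mutex>
    #include <thread>

    bool occupied = false;   // the naive boolean "the resource is in use" flag
    int  owners = 0;         // how many threads currently believe they own the resource
    int  worst_overlap = 0;  // should never exceed 1 if the flag worked

    void acquire_naively() {
        if (!occupied) {         // check ...
            // ... another thread can be scheduled right here and also see false ...
            occupied = true;     // ... then set: this gap is the race window
            ++owners;
            if (owners > worst_overlap) worst_overlap = owners;
            --owners;
            occupied = false;
        }
    }

    std::mutex resource_lock;    // the hardware-supported fix: check-and-set as one atomic step

    void acquire_safely() {
        std::lock_guard<std::mutex> guard(resource_lock);
        // only one thread at a time can be inside this block
    }

    int main() {
        std::thread a([] { for (int i = 0; i < 1000000; ++i) acquire_naively(); });
        std::thread b([] { for (int i = 0; i < 1000000; ++i) acquire_naively(); });
        a.join();
        b.join();
        // With the naive flag this often prints 2: both threads were "inside" at once.
        // (The counters themselves are unsynchronized too; this is deliberately broken code.)
        std::cout << "worst overlap seen: " << worst_overlap << "\n";
    }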

Martin Maat
  • 18,435
  • 4
    To add to what you have described here, processor caches add another level of complexity. One of the biggest pitfalls of multi-threaded programming is making sure the thread has the current value of shared memory when it matters. But if you constantly refresh from main memory, you are unlikely to get performance benefits from multithreading. – JimmyJames Jan 05 '23 at 21:06
  • 16
    The race conditions software engineers speak of are not a thing at the transistor level. => There are hardware data-races, though. That's how tearing occurs. For example, on x86, if an unaligned integer is stored across two cache lines, you could have half of the new value and half of the old one at the byte level. The same half old/new also occurs with incorrectly synchronized reads/writes through a PCI bus (absent proper write-combining buffers) if I recall correctly. (The OP understanding is still flawed, though) – Matthieu M. Jan 06 '23 at 09:23
  • "we most often call this a lock" or we have a hardware instruction like compare-and-swap – Caleth Jan 06 '23 at 10:29
  • 8
    It's important to realize that these kinds of scenarios can even happen in single-threaded applications, when using asynchronous programming techniques. Multiple threads or processes just increases surface area where such problems can occur. – shadowtalker Jan 06 '23 at 17:45
  • 1
    @shadowtalker Yes, this can even happen in "single-core" "single-threaded" applications like in embedded systems with interrupts. For example, if a 16-bit value is read by an 8-bit processor but in the middle of reading it, an interrupt occurs and changes the value mid-read. +1 to your comment. – Marco Jan 06 '23 at 18:42
  • 2
    "This problem cannot be solved in software alone." - this is not exactly the case - for single-core CPU it is not a problem to have software-only solution. – Alexei Levenkov Jan 06 '23 at 19:48
  • "Memory is not accessed by different cores in parallel, access is regulated like traffic at a road junction." My understanding is this is not true in PRAM, GDDR doesn't exactly work this way, especially when combined with SM memory controllers on GPU which actually do effectively allow multiple "cores" to access the same data. In fact, I'm not even sure it's impossible to get real hardware data races on a GPU. Of course, for OP's purposes that doesn't matter. – Krupip Jan 06 '23 at 19:56
  • Hardware race conditions are often caused by clock-domain crossings as another answer pointed out. – user253751 Jan 07 '23 at 00:22
  • AFAICT, your answer doesn't explain where the randomness is coming from. Hardware and software race conditions might be hard to debug or predict, but they could still be deterministic. Starting from exactly the same state and with the same input, the output of a multi-threaded program should always be the same, even if resources are shared. At least in theory. Because the computer will probably do other things, and e.g. user input could be seen as an entropy source. Your race example could appear buggy but still be deterministic. Where's Schrödinger's cat in your answer? – Eric Duminil Jan 07 '23 at 12:55
  • @Eric I underlined that computer hardware operates in a highly regulated manner, that nothing is left to chance. I could have underlined that, although the hardware is deterministic, the software layer does not have full control over what the hardware is doing and thus cannot make assumptions about the state of it at any given time. Software cannot know at what statement it will be preempted or what other threads are doing while it is on hold. So there is some randomness from the software's perspective. For argument's sake, by hardware I mean everything that comes packaged in the processor. – Martin Maat Jan 07 '23 at 13:57
  • 2
    @EricDuminil: Many systems would behave in fully deterministic fashion if caches were initially clear and all I/O operations always yielded predictable values and were triggered at precisely predictable times. In many real-world systems, however, I/O operations may take unpredictable amounts of time to complete, and once any inconsistency exists between two possible executions, caching issues may cause them to diverge quickly after that. – supercat Jan 07 '23 at 17:49
  • @supercat: Exactly. This should probably be mentioned in the above answer, to explain the unpredictability. – Eric Duminil Jan 07 '23 at 20:56
  • See RTOS on how this is solved in software – njzk2 Jan 07 '23 at 23:29
10

This is a great question! There's several factors here:

Firstly,

the logic gates/transistors pass electrons the same way and the speed of those electrons is a constant value

This isn't the case. At the high switching speeds of modern chips, and with just how fast electricity flows, nothing is ever perfectly synchronized. (As an aside, electricity isn't the flow of one electron pushing the next, like a tube of marbles. It's complicated) This is why digital chips rely on a clock. A clock synchronizes everything within its clock domain so that everything runs just a little slower than the slowest thing in the clock domain.

The fundamental issue with threading is that you're almost always going to want more software threads than you can concurrently execute on your hardware. If this weren't the case, then every disk read or network request would be needlessly stalling your CPU and wasting its opportunity to do other work. To mitigate this, you have a kernel that's responsible for scheduling ready-to-run threads onto the CPU, and switching between them rapidly enough to give the illusion of true parallelism.

The "race" comes from the unpredictability of the scheduling of threads. At any point, the kernel might decide "that's enough from you, give others a chance", and preempt your thread. You can imagine this as if you had a call to anyOtherThreadCanRunNow() between every single line of your program. Worse-yet, this preemption doesn't happen on the boundary between one line of high-level code and the next, but between every single instruction your program compiles to.

The difficulty comes from the combination of:

  1. Just how many states your current thread can be when preempted
  2. How many ways this state can be changed by any one of the other threads in your process which may run before your thread gets resumed.
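
As a minimal sketch of the preemption point above (the counter and iteration count are made up for illustration): a single ++counter compiles to roughly a load, an add, and a store, and the scheduler or the other core can slip in between them:

    #include <iostream>
    #include <thread>

    long counter = 0;   // shared and deliberately unsynchronized

    void bump_a_lot() {
        for (int i = 0; i < 1000000; ++i)
            ++counter;  // roughly: load counter; add 1; store counter
    }                   // another thread can run between any two of those steps

    int main() {
        std::thread t1(bump_a_lot);
        std::thread t2(bump_a_lot);
        t1.join();
        t2.join();
        // Expected 2000000, but lost updates usually make it smaller,
        // and the exact value changes from run to run.
        std::cout << counter << "\n";
    }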
Alexander
  • 4,884
  • This is very inaccurate; race conditions are just as prominent when software threads exactly match hardware threads, and in fact you can even create race conditions with just a single thread (look at JavaScript, for example). – gntskn Jan 06 '23 at 14:21
  • Are you talking about the events running on a single threaded event loop? Yes those can have rave conditions, but I’d argue they’re logically equivalent to threads: scheduled sequences of work that outnumber the hardware they run on, and thus need to get paused and resumed, with the ability for them to have conflicting writes to shared state – Alexander Jan 06 '23 at 14:37
  • Race* conditions (rave conditions are probably something else haha). But to finish that thought, I think the crux of the matter is having the unpredictability of a scheduler, and the event loop is really just a kind of user-mode scheduler. – Alexander Jan 06 '23 at 14:55
8

Theoretically speaking shouldn't the two threads access the memory slot at the exact same time down to the nano second, every time?

Theoretically speaking is fine unless you want to build something real. Build something based on that theory and you’ve just created a very troublesome requirement for anything those threads do.

If nanosecond synchronization across threads is required then you have a problem with file IO, cache misses, or indeed any kind of OS or hardware interrupt. Unless you make all threads wait together they’re going to get out of sync.

Even the task itself can cause trouble. Two threads doing the same work, with the same instructions, but different input, can cause one thread to experience an error that it needs to recover from. Again you’d have to force the threads to wait together to stay in sync.

So you could do this but what’s the point? That’s a lot of work and wasted cycles just because you don’t want to think about avoiding race conditions.

Even if you pushed for this really hard and spread this paranoia as far as you could, eventually you’d be talking with something outside this control, and be facing the same problem all over again.

So please stop blindly checking if a file exists before writing to it. Even if it did exist then, you still have no right to expect it to exist now. Just write, and find out whether it does. Yes, it feels weird to think this way. But this is how it works. No magic wand is going to keep us from needing to think this way. Just get used to it.
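
A small C++ sketch of the two patterns (the file name is hypothetical, just for illustration); the point is that the existence check proves nothing by the time the write happens:

    #include <filesystem>
    #include <fstream>
    #include <iostream>

    int main() {
        const char* path = "data.txt";  // hypothetical file, for illustration only

        // Racy "check, then act": another process can delete (or create) the file
        // between the exists() call and the write below.
        if (std::filesystem::exists(path)) {
            std::ofstream out(path, std::ios::app);
            out << "update\n";          // the check above guarantees nothing by now
        }

        // More robust: just attempt the write and handle the failure.
        {
            std::ofstream out(path, std::ios::app);
            if (!out)
                std::cerr << "could not open " << path << "\n";
            else
                out << "update\n";
        }
    }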

candied_orange
  • 108,538
  • I believe IBM will happily sell you a system that can execute multiple CPUs in perfect lockstep. They will also happily take a couple of million dollars from you for it. – Jörg W Mittag Jan 06 '23 at 01:38
  • I got lost at the analogy of the file writing... Would you mind elaborating? (As a comment is perfectly fine) – Laiv Jan 06 '23 at 11:57
  • 4
    @Laiv the preemptive multithreading computing model is that you have to assume that your thread may be interrupted (preempted) at any point and an arbitrary amount of time (and external actions!) can pass between any two "consecutive" operations of your thread. So when you check that a file exists and get a positive result, you have no right to assume that the file still exists when the next operation in your thread (e.g. writing to that file) is done, as it's permissible for another thread or process to do arbitrary stuff between these operations. – Peteris Jan 06 '23 at 13:21
  • @Peteris thanks for the insight! I got it! – Laiv Jan 06 '23 at 13:47
  • @JörgWMittag They will, but that has nothing to do with avoiding race conditions, it’s to make the system fault-tolerant WRT CPU issues, so the synchronization domain just covers the CPUs (which is actually ‘easy’ to achieve, you just make the clock external to the CPUs and share the clock lines in a way that you have the same degree of clock slew for each CPU). – Austin Hemmelgarn Jan 06 '23 at 19:25
  • There's nothing wrong with checking that a file exists before acquiring a lock to update it. That won't eliminate the need to check whether it still exists after acquiring the lock, but it will eliminate a lock acquisition/release step in cases where the file doesn't exist and thus won't need to be updated. – supercat Jan 06 '23 at 21:02
  • @supercat well no. Same race condition. You don't actually know if the other thread is deleting or creating the file. I know. It's a weird way to think. – candied_orange Jan 07 '23 at 05:44
  • 1
    @candied_orange: Code which acquires a lock which is specified as guarding a resource is entitled to expect that the guarded resource won't be modified while the lock is held. That's the whole point of the lock. My point was that having code ensure that an operation would be likely to succeed before acquiring a lock that would be useless in cases where the operation couldn't succeed is often a useful optimization. – supercat Jan 07 '23 at 17:46
  • @candied_orange: My point is that the first test isn't to distinguish cases where an operation would definitely succeed or definitely fail, but rather to distinguish situations where an operation would likely succeed, or could be presumed to fail. Whether an operation should be presumed to fail in situations where a file doesn't exist but some other thread which might be both holding a lock and creating the file would depend upon the application. On the other hand, if one adjusts the pattern to acquire a read lock (generally cheap) before testing whether the file exists, then... – supercat Jan 07 '23 at 17:53
  • ...even that scenario would be impossible. If code holding a read lock observes that a file guarded by the lock doesn't exist, there's no way the file could legitimately come into existence while the lock is held. – supercat Jan 07 '23 at 17:54
  • @supercat I brought this up as an example of a race condition. Nothing you’re arguing disputes that. Rather you seem to be arguing that there is a use case for expending (rather expensive) file IO for the sake of preserving admittedly limited lock resources. If I’m reading you correctly I won’t dispute the existence of that use case. Rather I’m trying to teach a practical relatable example of a race condition. I’m open to better examples. – candied_orange Jan 07 '23 at 21:37
5

Because the hardware and software are optimized for efficiency, not predictability.

For instance:

  • CPU caches accelerate repeated/nearby/sequential reads from memory
  • branch prediction speculatively starts executing the predicted next instructions before the previous ones have completed, speeding up execution if the prediction is accurate
  • hardware interrupts can occur at any time, are triggered unpredictably, often by components with their own clock cycles, or even outside events (keystroke, mouse move, ethernet packet, ...), and cause user instructions to be delayed until the interrupt is dealt with.
  • disk access time is affected by what other threads are doing with the disk as well as recoverable hardware failures (bad sectors, wear leveling)
  • the operating system performs thread scheduling and CPU power state management according to a variety of inputs, including which other processes are currently asking for CPU time.
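
As a rough illustration of how the factors above show up in practice (the loop and iteration count are arbitrary, and the numbers will depend on your machine), timing the same work several times on a desktop OS rarely gives identical results:

    #include <chrono>
    #include <iostream>

    int main() {
        volatile long sink = 0;    // volatile so the loop is not optimized away
        for (int run = 0; run < 5; ++run) {
            auto start = std::chrono::steady_clock::now();
            for (long i = 0; i < 50000000; ++i)
                sink += i;         // exactly the same work every run
            auto stop = std::chrono::steady_clock::now();
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
            // Caches, interrupts, frequency scaling and the scheduler all nudge this number.
            std::cout << "run " << run << ": " << us << " us\n";
        }
    }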
meriton
  • 4,090
  • 1
    You have only mentioned the hardware side of optimizations. It should be noted that for any programming language except assembly, the resulting code may not match what has been written in source code, as compilers are free to substitute it with faster alternatives that have the same outcome—in a sequential execution. Using multiple threads without synchronization primitives which would restrict these optimizations, may exhibit these differences. – Holger Jan 06 '23 at 08:13
  • 1
    It's also worth mentioning that while the name RAM was initially selected because access time was constant, it's no longer the case. And it gets worse with NUMA. – Matthieu M. Jan 06 '23 at 09:25
  • 1
    Excellent answer. Disk access (for actual spinning disks) isn't entirely tied to the same clock(s) as the CPU and RAM. An HDD has one or more microcontrollers, running on an independent clock accessing a mechanical device that cannot have the same predictability as a solid state device. Even if you could eliminate major hiccups (like bad sectors), the seek time, rotational latency, and transfer speed for a particular access will vary due to tiny environmental variations that are largely decoupled from those affecting the CPU and RAM. – Adrian McCarthy Jan 06 '23 at 16:30
5

Your proposal is worse!

shouldn't the two threads access the memory slot at the exact same time down to the nano second, every time

What you are describing here is something much, much worse than a race condition - you are describing contention. Contention is when two different circuits are trying to drive the same output at the same time.

Why contention is bad

Imagine the two cores are trying to write to a bit. Think of that bit as a wire. The first core is trying to write a 1 and the second core is trying to write a 0. For the sake of simplicity let's assume a positive logic convention and a CPU core running at 1.2 volts. What happens to that wire when one of the cores is trying to drive that wire to 1.2v and the other core is trying to drive that wire to 0v?

Well, in terms of hardware design we say the value of that bit is unpredictable and in simulation software it will be marked with a value of "x" instead of "1", "0", "z" or "?" (true, false, high-impedance, unknown). But let's say we ignore a hardware designer's sensibilities and the warnings of circuit simulators. What is physically happening to that wire?

Practically anything can happen. The details depend on the exact design of the circuit and what type of transistor you are using to drive the output and even the position of stars in the sky (not literally but kind of true - everything from cosmic rays to the earth's magnetic field can have a tiny effect).

But let's say we design the circuit in such a way as to be stable in this case and not affected by silly things like the state of the universe. For simplicity, let's make the circuit behave like a perfect voltage divider when one part of it is sourcing a voltage and another part is sinking it. So the value of the wire is 0.6v. If 1.2v has a value of 1 and 0v has a value of 0, what is the digital (binary) value of 0.6v?

What happens in the real world is that the memory cell will have a cut-off value for what it considers 1 and what it considers 0. Let's say, for the sake of simplicity, our memory cell interprets anything lower than 0.6v as the value 0 and 0.6v and higher as the value 1. So in theory, if both cores are trying to write different values to memory, the result should be 1, right? So is it that simple to solve?

Unfortunately no. Due to manufacturing defects that 0.6v cut-off is not exactly 0.6v. Some cells will have a cut-off at 0.6000001, some will have a cut-off at 0.59999999, some will have a cut-off at 0.602311, etc. So in effect you cannot predict what the value will be in memory when two cores are trying to write different values to the same variable.

Can't we allow both to write?

But let's say we can design a circuit to safely solve the unpredictability. It's actually quite simple: we can combine the outputs of both circuits using something like an OR gate. So is it solved? In terms of hardware, yes. In terms of how a programmer expects the CPU to behave, no.

Consider the two cores are trying to write two different values to the same variable:

  1. Thread 1 is trying to write the character "P"

  2. Thread 2 is trying to write the character "F"

  3. The binary value for "P" is 01010000 and the binary value of "F" is 01000110. This results in the value in memory becoming "V":

        01010000 = P
     or 01000110 = F
     -----------
        01010110 = V
    

I assume the programmer who wrote the program does not expect the value "V" at all. I'd personally expect the variable to contain the value "P" or "F" and no other value. Isn't this a bug? I'd consider this a form of data corruption!

Real hardware avoids contention

Because of this very weirdly unexpected behavior, no real hardware will allow two sources to have write access to the same memory at the same time (read access is no problem - we do this all the time with dual-ported memory in GPUs, allowing the GPU to read the memory in parallel with the CPU). Generally the CPU will be designed so that only one source has write access to the same area of memory at a time.

There are many different techniques a hardware designer can use to select which source can access the memory. These techniques (algorithms) are generally called "arbitration". The simplest is to simply have a default priority: e.g. core0 will always go first, core1 will go next, etc.

If contention is solved, why do we have race conditions?

While the root cause of a race condition is real-world timing and thus is a kind of hardware problem, the reason we experience it is not a hardware issue. If we know for a fact that there is a priority of which core gets to write a value to the same variable first and which one gets to write next we can design our logic to take this into account.

The problem is we don't write our software to run on CPUs. We write it to run on operating systems, and the operating system we use will automatically allocate our threads to run on separate cores. So when we write our code, we don't know which thread gets to write to the variable first - and that fact of not knowing is the crux of the problem with race conditions.
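
To make that concrete, here is a small C++ sketch (not from the original answer; it uses std::atomic so the program itself is well defined): each thread writes a whole, valid value, so a blended result like "V" cannot appear, but which value survives depends entirely on which thread the scheduler happens to run last:

    #include <atomic>
    #include <iostream>
    #include <thread>

    int main() {
        std::atomic<char> shared{'?'};

        std::thread t1([&] { shared.store('P'); });  // one core writes "P"
        std::thread t2([&] { shared.store('F'); });  // the other writes "F"
        t1.join();
        t2.join();

        // Always 'P' or 'F', never a blended value -- but which one wins
        // changes from run to run, because the program cannot control the scheduler.
        std::cout << shared.load() << "\n";
    }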

A race condition is often not even an issue with CPU design!

When you asked this question you talked about CPU design and why it causes race conditions. The fact is, in the real world the timing problem often does not involve the CPU at all.

Let's look at an example: say you are trying to download some data from two servers: SERVER_A and SERVER_B. For the sake of simplicity let's assume we are trying to get a list of IP addresses from a Bittorrent cloud.

SERVER_A has very good low latency. Pings to it respond almost immediately. So we connect to SERVER_A first and get a response from it first.

SERVER_B has bad high latency and takes longer to respond to the connection. However, SERVER_B has a very high speed connection, so even though SERVER_A answers first, SERVER_B completes sending all the data before SERVER_A.

Assume SERVER_A sends:

22.44.1.2
128.77.0.43

Assume SERVER_B sends:

34.2.66.7
34.2.131.17

Assume we write our software without thinking about race conditions and write each response directly to the file as it arrives. Because SERVER_A sends the response first, the file will probably begin with 22., but because SERVER_A finishes sending last, the file will also end with .43 instead of .17!

So potentially the file could look something like:

22.434.24..61.62
7
34.21.28..131.107
.43

Instead of:

22.44.1.2
128.77.0.43
34.2.66.7
34.2.131.17

This is because of the race condition where SERVER_B's response is received while we are processing SERVER_A's response.

You can see that in this case the race condition has nothing to do with the CPU design or even your motherboard's design but is related to real-world timing. Quite literally a race condition bug is when a programmer forgets that his software runs in the real world where things are not as predictable as expected.
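
As a small sketch of the same bug in C++ (the addresses and chunk size come from the example above; std::cout stands in for the output file, and the interleaving you see will vary by machine): two threads writing their responses a few characters at a time interleave unless whole responses are serialized, e.g. with a mutex:

    #include <iostream>
    #include <mutex>
    #include <string>
    #include <thread>

    std::mutex file_lock;

    // Write one "server response" a few characters at a time, the way data
    // trickles in from a socket. With locked == false the pieces can interleave.
    void write_response(const std::string& data, bool locked) {
        std::unique_lock<std::mutex> guard(file_lock, std::defer_lock);
        if (locked) guard.lock();                        // serialize the whole response
        for (std::size_t i = 0; i < data.size(); i += 4)
            std::cout << data.substr(i, 4) << std::flush;
        std::cout << "\n";
    }

    int main() {
        const bool locked = false;  // flip to true to get the output the programmer expected
        std::thread a(write_response, std::string("22.44.1.2 128.77.0.43"), locked);
        std::thread b(write_response, std::string("34.2.66.7 34.2.131.17"), locked);
        a.join();
        b.join();
    }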

slebetman
  • 1,464
5

It's all about trade-offs

When designing the HW of a computer, the designers need to consider a variety of properties, and determinism is only one of them. Other properties include efficiency, throughput, ...

In the embedded world, with very small micro-processors, you may indeed find chips (& boards) which are fairly deterministic, though more as a result of absence of sophistication, than by design.

In the mobile/desktop/server world, efficiency and throughput have long been prioritized over determinism:

  • NUMA (Non-Uniform Memory Architecture) means that RAM access latency varies depending on which core (or socket) attempts the access: some cores have a physically longer path to some RAM sockets.
  • RAM (Random Access Memory) was named so because it used to have constant time access. The name endured even though modern RAM doesn't.
  • Multiple layers of caches (L1, L2, L3) mean that cores do NOT access RAM directly, and depending on whether a piece of memory is in cache (and which) access time will vary.
  • CPU instructions are cached too, so one core may stall while its next instructions are being fetched from RAM while another core is already executing them.
  • CPU frequency is not constant either; AVX-512 instructions were at one point famous for causing down-clocking, as CPUs sought to avoid melting down.
  • ...

So, even without touching on (1) Cache Coherency Protocols and (2) Software, as you can see, modern computers have a lot of sources of differences between their various cores.

Matthieu M.
  • 14,896
1

Take a processor with a single core. There is a timer which fires say 100 times a second, and when it fires, the operating system decides whether to switch from one thread to another. That is about the simplest implementation of pre-emptive multithreading. (There is also co-operative multithreading which is totally different and not much in use anymore).

Most likely the timer is not 100% precise. When it fires after 10 milliseconds, the number of instructions performed by a thread is not guaranteed to be the same. The timer can be controlled by a different clock than the processor, so it doesn't always fire after the same number of clock cycles.

Now I move the mouse while the program is running. This will cause an interrupt, so your thread stops running while the mouse movement is being processed. But the timer fires at the same time, so after executing fewer instructions.

Now I have a processor that protects the computer from excessive heat by slowing down if it gets hot. But the timer isn't affected by this. My computer performs 2.0 instead of 3.0 billion instructions per second, so 20 instead of 30 million instructions between two timer firings.

Now I have a laptop with a battery. When the battery goes low it reduces the clock speed. Guess what happens.

That's just a few things that make it quite unpredictable what exactly happens. And if you look at Matthieu M.'s answer, he has a similar list but from the other end of your computer (deeply internal things instead of external influences).

gnasher729
  • 44,814
  • 1
    Cooperative multitasking is still pretty common in the embedded-systems world, since CM designs are easier to validate than ones using preemptive multitasking. – supercat Jan 06 '23 at 20:58
  • Didn’t realise that. Cooperative multitasking doesn’t work well with multiple cores, and on desktop/mobile there is very little with a single core. – gnasher729 Jan 08 '23 at 17:07
  • 1
    What matters for CM is having a known set of tasks running. I haven't done any multi-core shared memory embedded systems, but I would expect CM would be excellent for such systems as well in cases where each core is devoted to a fixed set of tasks, and communication between tasks running on different cores is relatively limited. – supercat Jan 08 '23 at 17:59
0

The best way to think about this question is as a 'what happens if' exercise:

What happens if we have a memory system which allows 2 separate CPUs to make accesses simultaneously, without delaying either access in any case? An 'access' in this case is either a read or a write of a single memory location, and the storage for each location is independent. Simultaneous accesses will fall into one of the following categories:

  • Different memory locations: No problem, each access proceeds as expected
  • 2 reads from the same location: doesn't have to be a problem, each CPU will read the same value
  • One CPU reads from a location while the other writes to it. What value does the reading CPU read? We could say that by design it will see either the old value or the new value.
  • Both CPUs attempt to write the same location simultaneously. What value ends up stored? Again we could say by design that one of the values 'wins', maybe by assigning a priority to each CPU's connection to the memory.

So we can produce some rules about how this memory should behave; the same behaviour can be produced by arbitrating the accesses and delaying one of them, which is probably simpler electronically.

We also haven't removed the 'race condition'. Races happen when changes to the relative execution timing affect observed behaviour. While computers do have (in general) deterministic timing, in most real world conditions we cannot rely on it. In the cases I've described above, one access happening a single cycle earlier can change either the final value stored in memory, or the value read by a different CPU.
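
As a sketch of that last point in C++ (the values are arbitrary; std::atomic provides the "old value or new value, never a mixture" behaviour described in the rules above): whether the reader sees the old or the new value depends entirely on which access happens first:

    #include <atomic>
    #include <cstdint>
    #include <iostream>
    #include <thread>

    int main() {
        // Start from a recognizable "old" value; the writer stores a "new" one.
        std::atomic<std::uint64_t> cell{0x1111111111111111ULL};

        std::thread writer([&] { cell.store(0x2222222222222222ULL); });
        std::thread reader([&] {
            // The reader is guaranteed to see the old value or the new value,
            // never half of each -- but *which* one depends on relative timing.
            std::cout << std::hex << cell.load() << "\n";
        });
        writer.join();
        reader.join();
    }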

When talking about multi-threading code, we are implicitly talking about preemptive multitasking. This means that there is some supervisory process running on the computer that can halt the execution of a thread at any time, for any amount of time. Given this environment, it is impossible to make any statements about the relative execution time of CPU instructions on different threads. The only statement you can make is that the instructions execute in order for a single thread (and for modern CPU designs that's only a guarantee as far as externally visible behaviour goes).

rhellen
  • 119