Atomic read-compare-write instructions from multiple cores at the same time do contend with each other, but it's up to hardware to sort that out. Hardware arbitration of atomic RMW instructions is a real thing in modern CPUs, and provides some degree of fairness so that one thread spinning on lock cmpxchg can't totally block other threads doing the same thing. (Although that's a bad design: better to spin on an acquire-load and only do the CAS if it should succeed)
There's no guarantee what order they happen in, which is why you need to carefully design your algorithm so that correctness only depends on that compare-and-exchange being atomic. (The ABA problem is a common pitfall).
BTW, that entire block of pseudocode happens as a single atomic operation. Making a read-compare-write or read-modify-write happen as a single atomic operation is much harder for the hardware than just stores, which MESIF/MOESI handle just fine.
are you sure? I thought that it's unsafe to do that because, for example, x86 doesn't guarantee atomicity of writes for non-aligned DWORDs
lock cmpxchg makes the operation atomic regardless of alignment. It's potentially a lot slower for unaligned, especially on cache-line splits where atomically modifying a single cache line isn't enough.
See also Atomicity on x86 where I explain what it means for an operation to be atomic.