Why didn't atomic load/store generate lock-prefixed instructions?

Question

I wrote the following code:

#include <atomic>
#include <cstdio>
using namespace std;

char abcd[10] = {0x12, 0x34, 0x56, 0x78, 0x11, 0x22, 0x33};
atomic<int> * data = (atomic<int> *)&abcd[1]; // deliberately misaligned: not on a 16/32/64-bit boundary

extern void test() {
    auto result = data->load(memory_order_relaxed);
    //printf("data:%p , %x\n", data, result);
}

The generated assembly is:

0000000000001140 <_Z4testv>:
    1140:       f3 0f 1e fa             endbr64
    1144:       48 8b 05 d5 2e 00 00    mov    0x2ed5(%rip),%rax        # 4020 <data>
    114b:       8b 00                   mov    (%rax),%eax
    114d:       c3                      retq
    114e:       66 90                   xchg   %ax,%ax

If I uncomment the `printf("data:%p , %x\n", data, result);` line in `test()`, the output is: data:0x55f85a60b011 , 11785634

My question is:

The int is not aligned to a 16/32/64-bit boundary, so why is the atomic load just `mov (%rax),%eax`? As far as I know, unaligned accesses are not atomic, so I expected something like `lock mov (%rax),%eax`. What is the reason?

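The printf output is 0x11785634. Here is `abcd` in the binary (objdump disassembles the data bytes as if they were instructions, so only the byte column is meaningful):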

0000000000004010 <abcd>:
    4010:       12 34 56                adc    (%rsi,%rdx,2),%dh
    4013:       78 11                   js     4026 <data+0x6>
    4015:       22 33                   and    (%rbx),%dh

From the printed value, the load reads bytes 0x4011 through 0x4014. Can an unaligned 4-byte read be atomic?
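The printed value follows from x86's little-endian byte order. A minimal sketch that decodes the same four bytes with shifts, so no misaligned `int*` is involved; `decode_le32`, `kBytes`, and `value_at_offset_1` are hypothetical names, and the byte values are copied from `abcd` above:

```cpp
#include <cstdint>

// Decode four bytes as a little-endian 32-bit value (the byte order x86 uses).
// Using shifts makes the result independent of the host's own endianness.
std::uint32_t decode_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Same bytes as `abcd` in the question (remaining elements are zero).
const unsigned char kBytes[10] = {0x12, 0x34, 0x56, 0x78, 0x11, 0x22, 0x33};

// Four bytes starting at offset 1 (address 0x4011 in the listing) are
// 34 56 78 11, which read little-endian give 0x11785634.
std::uint32_t value_at_offset_1() { return decode_le32(kBytes + 1); }
```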

You’re reinterpret-casting something into a type that it isn't (UB) and then asking why? The answer is because it can. — Taekahn, May 06 '22 at 04:12
Maybe it's a low-level mistake, but I didn't realize it. Can you elaborate? — taoozh, May 06 '22 at 05:25
Every type in C++ has a required alignment, and the compiler is allowed to assume that all pointers to that type have that alignment. When you break that rule, all bets are off and everything can fail. Atomics just happen to fail in a different way than other types. In short, unaligned `std::atomic` is simply not supported by C++ compilers and nobody ever claimed that it was. — Nate Eldredge, May 06 '22 at 05:35
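To make that concrete: the required alignment is `alignof(T)`, and the usual fix is to declare the atomic object directly instead of casting a `char` buffer. A minimal sketch, assuming a buffer with a known 16-byte alignment; `is_suitably_aligned`, `buffer`, and `properly_aligned` are hypothetical names, and the address check is only a diagnostic, since a misaligned cast stays UB no matter what the check says:

```cpp
#include <atomic>
#include <cstdint>

// True if `p` meets the alignment the compiler assumes for T.
// This is a diagnostic aid only; it does not make a misaligned cast legal.
template <typename T>
bool is_suitably_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// A buffer with a known 16-byte alignment, mirroring the question's layout.
alignas(16) unsigned char buffer[16];

// The well-defined alternative: declare the atomic object itself, so the
// compiler places it at an address that satisfies alignof(std::atomic<int>).
std::atomic<int> properly_aligned{0x11785634};
```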
`lock mov` doesn't exist. It's only supported for memory-destination RMWs. See [Why is integer assignment on a naturally aligned variable atomic on x86?](https://stackoverflow.com/a/36685056) which specifically mentions the non-existence of `lock mov`. Also related: [Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?](https://stackoverflow.com/q/47510783) re: deref of misaligned pointers being UB — Peter Cordes, May 06 '22 at 05:50
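The split between plain loads/stores and lock-prefixed read-modify-writes can be seen in a small sketch. The instructions named in the comments are what GCC and Clang typically emit on x86-64, not a guarantee; the function names are hypothetical:

```cpp
#include <atomic>

// Pure loads need no lock prefix: a naturally aligned mov is already atomic
// on x86, so this typically compiles to a plain `mov`.
int relaxed_load(const std::atomic<int>& a) {
    return a.load(std::memory_order_relaxed);
}

// A release store is likewise typically a plain `mov` on x86-64.
void release_store(std::atomic<int>& a, int v) {
    a.store(v, std::memory_order_release);
}

// Read-modify-write is where `lock` appears: fetch_add whose result is used
// typically compiles to `lock xadd`.
int increment(std::atomic<int>& a) {
    return a.fetch_add(1, std::memory_order_relaxed);
}
```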
Also, you're re-inventing C++ `std::atomic_ref<>` without respecting `std::atomic_ref::required_alignment`. [atomic\_ref when external underlying type is not aligned as requested](https://stackoverflow.com/q/61996108) — Peter Cordes, May 06 '22 at 05:51
BTW, your dword load actually *is* guaranteed atomic from cacheable memory on x86-64 CPUs. Since `char abcd[10]` happens to start at a 16-byte aligned address, `abcd[1..4]` is contained within a single aligned qword (8 bytes), and thus load/store is guaranteed atomic on AMD and Intel CPUs. — Peter Cordes, May 06 '22 at 06:02
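The "contained within a single aligned qword" condition is simple address arithmetic: the first and last byte of the access must fall in the same aligned block. A minimal sketch with a hypothetical helper `fits_in_aligned_chunk`, using the 0x4011 address from the question's listing; it models the hardware rule only and does not make a misaligned `std::atomic` valid C++:

```cpp
#include <cstdint>

// Hypothetical helper: true if the `size`-byte access starting at `addr`
// lies entirely inside one aligned `chunk`-byte block. Requires size > 0.
bool fits_in_aligned_chunk(std::uintptr_t addr, std::uintptr_t size,
                           std::uintptr_t chunk) {
    const std::uintptr_t first_block = addr / chunk;
    const std::uintptr_t last_block = (addr + size - 1) / chunk;
    return first_block == last_block;
}
```

For example, `abcd[1..4]` spans 0x4011..0x4014, which stays inside the qword at 0x4010, while a 4-byte access at 0x4015 would cross into the qword at 0x4018.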
Thank you very much, you provided detailed explanations and solved my doubts. — taoozh, May 06 '22 at 06:20
Although there is no `lock mov` as Peter points out, on x86 you can atomically load or store an unaligned variable with `xchg`, though it can be extremely slow if a cache line boundary is crossed. But on some other architectures, there is simply no way at all to do an unaligned load or store atomically. So it would not stand to reason for a cross-platform C++ compiler to support such behavior. It could only be done with a mutex, and it would be unreasonable for a compiler to check every atomic access for alignment and fall back to a mutex. — Nate Eldredge, May 06 '22 at 14:53
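The mutex fallback mentioned above might look like this minimal sketch; `LockedUnalignedInt` is a hypothetical name, and `memcpy` is used so that no misaligned `int*` is ever dereferenced:

```cpp
#include <cstring>
#include <mutex>

// Hypothetical wrapper: every access to a possibly misaligned int goes
// through one mutex, and memcpy copies the raw bytes in and out.
class LockedUnalignedInt {
public:
    explicit LockedUnalignedInt(unsigned char* bytes) : bytes_(bytes) {}

    int load() const {
        std::lock_guard<std::mutex> guard(mutex_);
        int value;
        std::memcpy(&value, bytes_, sizeof value);
        return value;
    }

    void store(int value) {
        std::lock_guard<std::mutex> guard(mutex_);
        std::memcpy(bytes_, &value, sizeof value);
    }

private:
    unsigned char* bytes_;      // start of sizeof(int) bytes, any alignment
    mutable std::mutex mutex_;  // mutable so the const load() can lock it
};
```

This only makes accesses atomic with respect to each other if every reader and writer goes through the same wrapper, and therefore the same mutex; plain accesses to the underlying bytes elsewhere would still race.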
