Some PCIe devices (for example, FPGA cards) can expose regions of their physical memory through their BARs (Base Address Registers), which the host maps into its physical address space, and the host can then access that memory region (on Linux, we can memory-map the device's BAR into a process's virtual address space). I suppose the device itself could also access this memory through a /dev/mem mapping if it runs Linux too.
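On Linux, one common way for a user-space program on the host to get such a mapping is to mmap() the sysfs resourceN file of the device. The sketch below assumes BAR0 of a device at the hypothetical address 0000:01:00.0 and root privileges; map_bar is a made-up helper name, not a library API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the first `len` bytes of a BAR through its sysfs
 * resource file, e.g. "/sys/bus/pci/devices/0000:01:00.0/resource0"
 * (the device address is an assumption). Returns NULL on failure.
 * Real PCI resource files require root. */
static volatile void *map_bar(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        perror("open");
        return NULL;
    }

    /* MAP_SHARED: stores go to the underlying object, not to a private
     * copy-on-write copy. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return p;
}
```

Accesses then go through a volatile pointer, e.g. `volatile uint32_t *regs = map_bar(path, 4096);`, and the mapping is released with munmap().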
One thing a program can do with (virtual) memory is perform atomic operations such as "__atomic_sub_fetch", which can be very useful when writing high-performance code.
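For ordinary (cacheable) memory, __atomic_sub_fetch is a GCC/Clang builtin that atomically subtracts a value and returns the result after the subtraction. A typical use is a reference count; the sketch below uses the hypothetical helper names ref_get and ref_put.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for an atomic reference count in ordinary memory. */
static void ref_get(uint32_t *refcnt)
{
    /* Taking a reference needs atomicity but no ordering. */
    __atomic_add_fetch(refcnt, 1, __ATOMIC_RELAXED);
}

/* Returns true when the caller dropped the last reference. */
static bool ref_put(uint32_t *refcnt)
{
    /* __atomic_sub_fetch returns the value *after* the subtraction, so
     * exactly one caller observes 0 even if many threads race here. */
    uint32_t after = __atomic_sub_fetch(refcnt, 1, __ATOMIC_ACQ_REL);
    assert(after != UINT32_MAX); /* more puts than gets: underflow */
    return after == 0;
}
```

The acquire-release ordering on the decrement is what makes it safe for the caller that sees 0 to free the object.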
My question is: what if the memory comes from the PCIe shared memory described above (mapped into the user's virtual address space)? Do atomic operations still hold? I do not know whether PCIe can guarantee atomicity, considering that atomic operations could come from both the host's and the device's CPUs at the same time. If so, how does their performance compare to the same atomic operations on regular memory?
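The scenario can be made concrete with a small timing harness that runs the same __atomic_sub_fetch loop on a counter in ordinary memory and on a counter inside a BAR mapping obtained with mmap(). The builtin itself knows nothing about PCIe: on x86 the compiler emits a locked instruction (such as lock xadd or lock sub) regardless of where the pointer points, so whether the device sees it as one indivisible operation is exactly the open question. ns_per_sub_fetch is a hypothetical name, and no result on real hardware is assumed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical micro-benchmark: average nanoseconds per __atomic_sub_fetch
 * on *counter. Pass a pointer into ordinary memory or into a BAR mapping
 * to compare the two; results depend heavily on the platform. */
static double ns_per_sub_fetch(uint64_t *counter, size_t iters)
{
    assert(iters > 0);
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++) {
        /* Same builtin as for normal RAM: the compiler does not know
         * that the target may be device memory. */
        __atomic_sub_fetch(counter, 1, __ATOMIC_SEQ_CST);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Usage would be along the lines of `ns_per_sub_fetch(&local_counter, 1000000)` versus `ns_per_sub_fetch((uint64_t *)bar + 16, 1000000)`, where the hypothetical `bar` points into the BAR mapping and the offset is arbitrary but keeps the counter 8-byte aligned.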
I have seen a related question asked here, but it does not answer this directly: PCI Express BAR memory mapping basic understanding
Thanks a lot!