I'm having what appears to be a caching problem when using /dev/mem with mmap on a dual-processor ARM system (a Xilinx Zynq, to be exact). My configuration is asymmetric: one processor runs Linux and the other runs a bare-metal application. They communicate through a block of RAM that is not part of the memory Linux manages (it was excluded in the devicetree file). When my userspace Linux application writes to that memory through the pointer returned from mmap(), it can take anywhere from 100 ms to well over a second for the second processor to see the changed memory content.
On the open() call to /dev/mem, I tried specifying O_RDWR, O_SYNC, and O_DIRECT, but O_DIRECT caused the open to fail, so I removed it. My understanding is that O_SYNC guarantees data has been written out before a write() call returns, but I'm storing through a memory pointer rather than calling write(). I don't see any parameters on the mmap() call that would seem to address caching.
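For concreteness, here is a minimal sketch of the open()/mmap() sequence described above. It is an illustration under assumptions, not the exact code in use: the helper name map_shared_block is hypothetical, and the offset passed to it must be page-aligned.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of `path` (e.g. "/dev/mem")
 * starting at the page-aligned offset `phys`.
 * Returns the mapped address, or MAP_FAILED on any error. */
static void *map_shared_block(const char *path, off_t phys, size_t len)
{
    /* O_DIRECT is left out because it made open() fail. */
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return MAP_FAILED;

    /* MAP_SHARED so that stores go to the underlying memory
     * rather than to a private copy-on-write page. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);

    /* The mapping remains valid after the descriptor is closed. */
    close(fd);
    return p;
}
```

A call such as map_shared_block("/dev/mem", 0x1F000000, 0x1000) would return the pointer that gets written through; the address and length here are placeholders, not the real values. Stores would typically go through a volatile uint32_t * cast of the result so the compiler does not reorder or drop them.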
I've tried calling fsync(fd) and fdatasync() after writing to memory, but that didn't change the behavior.
What DID seem to work was spawning this command immediately after the memory write: sync; echo 3 > /proc/sys/vm/drop_caches
What is the simplest way to get writes via a mapped memory pointer to flush immediately?