Say I have the defacto standard x86 CPU with 3 level of Caches, L1/L2 private, and L3 shared among cores. Is there a way to allocate shared memory whose data will not be cached on L1/L2 private caches, but rather it will only be cached at L3? I don't want to fetch data from memory (that's too costly), but I'd like to experiment with performance with and without bringing the shared data into private caches.
The assumption is that L3 is shared among the cores (presumably physically indexed cache) and thus will not incur any false sharing or cache line invalidation for heavily used shared data.
Any solution (if it exists) would have to be done programmatically, using C and/or assembly for intel based CPUs (relatively modern Xeon architectures (skylake, broadwell), running linux based OS.
Edit:
I have latency sensitive code which uses a form of shared memory for synchronization. The data will be in L3, but when read or written to it will go into L1/L2 depending on cache inclusivity policy. By implication of the problem, the data will have to be invalidated adding an unnecessary (I think) performance hit. I'd like to see if it's possible to just store the data, either through some page policy or special instructions only in L3.
I know it's possible to use the special memory register to inhibit caching for security reasons, but that requires CPL0 privilege.
Edit2:
I'm dealing with parallel codes that run on high performance systems for months at at time. The systems are high core-count systems (eg. 40-160+ cores) that periodically perform synchronization which needs to execute in usecs.