2

I have seen related questions, including here and here, but it seems that the only instruction ever mentioned for serializing rdtsc is cpuid.

Unfortunately, cpuid takes roughly 1000 cycles on my system, so I am wondering if anyone knows of a cheaper (fewer cycles and no read or write to memory) serializing instruction?

I looked at iret, but that seems to change control flow, which is also undesirable.

I have actually looked at the whitepaper linked in Alex's answer about rdtscp, but it says:

The RDTSCP instruction waits until all previous instructions have been executed before reading the counter. However, subsequent instructions may begin execution before the read operation is performed.

That second point seems to make it less than ideal.

merlin2011
  • Regarding the edit: Have you read the next section? They add CPUID exactly for that purpose (preventing subsequent instructions from being reordered above the RDTSCP) – Leeor May 03 '14 at 09:05

3 Answers

7

Have you looked at the rdtscp instruction? It is a partially serializing variant of rdtsc: it waits for all earlier instructions to execute before reading the counter.

For benchmarking, I would recommend reading this whitepaper. It describes several best practices for measuring clock ticks.
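As a rough illustration, here is a minimal sketch of a cpuid-bracketed measurement, assuming the pattern is cpuid followed by rdtsc at the start of the timed region and rdtscp followed by cpuid at the end. The function names (`serialize_with_cpuid`, `tsc_start_cpuid`, `tsc_stop_cpuid`) are hypothetical, and the code assumes GCC or Clang on x86-64.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp (GCC/Clang, x86 only) */

/* Hypothetical helper: execute cpuid purely for its serializing effect.
   Leaf 0 is chosen arbitrarily; the results are discarded. */
static inline void serialize_with_cpuid(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
}

/* Start of the timed region: cpuid drains earlier work, then rdtsc reads
   the counter. */
static inline uint64_t tsc_start_cpuid(void)
{
    serialize_with_cpuid();
    return __rdtsc();
}

/* End of the timed region: rdtscp waits for the timed code to execute,
   and the trailing cpuid keeps later instructions from starting before
   the counter has been read. */
static inline uint64_t tsc_stop_cpuid(void)
{
    unsigned int aux;  /* receives IA32_TSC_AUX; unused here */
    uint64_t t = __rdtscp(&aux);
    serialize_with_cpuid();
    return t;
}
```

The cost of cpuid is still paid on both sides, so it helps to time an empty region first and subtract that overhead from the measurement.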

Alex(Intel)

Alexander Weggerle
  • Thanks for this answer. I actually did look at it before but forgot to add it to my post. I just updated my question. – merlin2011 Apr 25 '14 at 20:04
  • Given the original phrasing of the question, this is still the best answer. – merlin2011 Apr 26 '14 at 08:06
  • Did you have a look at the whitepaper I mentioned above? It explicitly provides ways to work around the limitations of `RDTSCP`. Unfortunately, this doesn't solve the overhead involved. – Alexander Weggerle Apr 28 '14 at 07:43
  • I perused it but have not had a chance to dig into it yet. – merlin2011 Apr 28 '14 at 07:48
  • `rdtscp` isn't serializing, it can reorder with later instructions. It's maybe good at the *end* of a timed region, but you might want `lfence` after it at the start. See [clflush to invalidate cache line via C function](https://stackoverflow.com/a/51830976) for an example. – Peter Cordes Aug 18 '18 at 16:12
1

The answer is apparently no. The Intel Manual, Volume 3A, lists only three non-privileged serializing instructions (cpuid, iret, and rsm), and the latter two seem to have control-flow side effects.

merlin2011
0

Well, I guess this is helpful: lfence. See the Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 2B, p. 4-301.
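A minimal sketch of how lfence might be used around the timestamp reads, assuming GCC or Clang on x86-64 and the intrinsics from `<x86intrin.h>`. The names `tsc_begin` and `tsc_end` are hypothetical. Note that lfence is not a serializing instruction in the manual's full sense (it does not drain the store buffer); on Intel it waits for earlier instructions to complete locally and holds back later ones, and on AMD that behavior depends on lfence being configured as dispatch-serializing.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86 only) */

/* Start timestamp: the first lfence waits for earlier instructions to
   complete; the second keeps the timed code from starting before the
   counter has been read. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* End timestamp: rdtscp waits for the timed code to execute; the trailing
   lfence keeps later instructions from starting before the read. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;  /* receives IA32_TSC_AUX; unused here */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}
```

The region to be timed goes between a call to `tsc_begin()` and a call to `tsc_end()`; the difference of the two return values is the elapsed count in TSC ticks, including the overhead of the fences themselves.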

ioilala