
Under Windows, my application makes use of QueryPerformanceCounter (and QueryPerformanceFrequency) to perform "high resolution" timestamping.

Since Windows 10 (and only tested on Intel i7 processors so far), we observe erratic behaviours in the values returned by QueryPerformanceCounter. Sometimes, the value returned by the call will jump far ahead and then back to its previous value. It feels as if the thread has moved from one core to another and was returned a different counter value for a lapse of time (no proof, just a gut feeling).

This has never been observed under XP or 7 (no data about Vista, 8 or 8.1).

A "simple" workaround has been to enable the UsePlatformClock boot option using BCDEdit (which makes everything behave without a hitch).

I know about the potentially superior GetSystemTimePreciseAsFileTime, but as we still support 7 this is not exactly an option unless we write totally different code for different OSes, which we really don't want to do.

Has such behaviour been observed or explained under Windows 10?

Denis Troller
  • Do you have any info on which i7 processors you saw this poor behavior on, and which windows 10 version(s)? There have been i7 processors for almost two years, so I'm really curious if it is with modern i7 CPUs or older ones, or both. – aggieNick02 Apr 26 '18 at 19:45
  • I saw this problem specifically on my Macbook Pro with core i7-4870HQ running under Windows 10 (I cannot remember which specific version I was running and I have not tested with a more recent one). I have seen this problem on other Dell machines, but I do not know what CPU they were using. – Denis Troller May 15 '18 at 12:11
  • Thanks for the details there. While it's not a brand new processor, it's not that old either - late 2014. Windows 10 released ~9 months later, so seeing issues in a somewhat current processor is frustrating. – aggieNick02 May 15 '18 at 14:13
  • Some important caveats for the couple options you mentioned. First, GetSystemTimePreciseAsFileTime can be affected by things like NTP updates. See https://msdn.microsoft.com/en-us/library/windows/desktop/ms724943(v=vs.85).aspx. Second, UsePlatformClock via BCDEdit is documented as for debugging only, and can cause performance issues. See https://www.anandtech.com/show/12678/a-timely-discovery-examining-amd-2nd-gen-ryzen-results – aggieNick02 May 15 '18 at 14:25
  • Possible reason for seeing the issue on your Macbook Pro. The erratum for that processor includes "TSC May Be Incorrect After A Deep C-State Exit". See https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-mobile-specification-update.pdf – aggieNick02 May 15 '18 at 15:16
  • I am aware of the UsePlatformClock issues, but I had no other option at the time. We do not use it unless we encounter the problem, and we have since added support for GetSystemTimePreciseAsFileTime under Windows 10. And Ntp is actually a bonus for us since we always were looking for timestamping and not time-measures. The notes on this specific processor are welcome though, thanks for that ! – Denis Troller May 16 '18 at 15:01

1 Answer

I'd need to know much more about your code, but let me highlight a few things from MSDN:

When computing deltas, the values [from QueryPerformanceCounter] should be clamped to ensure that any bugs in the timing values do not cause crashes or unstable time-related computations.

And especially this:

Set that single thread to remain on a single processor by using the Windows API SetThreadAffinityMask ... While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another. So, it's best to keep the thread on a single processor.

Your case might have hit one of those bugs. In short:

  • You should always query the timestamp from one thread (pinned to a single CPU with a fixed affinity mask, to be sure it won't move) and read that value from any other thread (just an interlocked read, no need for fancy synchronization).
  • Clamp the calculated delta (at least to be sure it's not negative)...
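
The clamping advice above can be sketched as follows. This is a minimal, platform-neutral sketch rather than code from MSDN: `clamped_elapsed_us` and its `max_delta_ticks` bound are hypothetical names, and the raw readings are assumed to come from QueryPerformanceCounter() with the frequency taken from QueryPerformanceFrequency().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of any Windows API): converts two raw counter
// readings into elapsed microseconds. The tick delta is clamped to
// [0, max_delta_ticks], so a counter that jumps backward or far ahead cannot
// produce a negative or absurdly large interval.
std::int64_t clamped_elapsed_us(std::int64_t prev_ticks,
                                std::int64_t now_ticks,
                                std::int64_t ticks_per_second,
                                std::int64_t max_delta_ticks)
{
    assert(ticks_per_second > 0);

    std::int64_t delta = now_ticks - prev_ticks;
    if (delta < 0) delta = 0;
    if (delta > max_delta_ticks) delta = max_delta_ticks;

    // Convert whole seconds and the remainder separately so the
    // multiplication by 1,000,000 does not overflow for large deltas.
    const std::int64_t whole = delta / ticks_per_second;
    const std::int64_t rest = delta % ticks_per_second;
    return whole * 1000000 + (rest * 1000000) / ticks_per_second;
}
```

The upper bound is an application decision (for example, the longest interval you ever expect between two readings); clamping hides a bad reading but does not correct it.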

Notes:

QueryPerformanceCounter() uses the TSC when possible (see MSDN). The algorithm used to synchronize the TSC (if available, and in your case it should be) changed considerably from Windows 7 to Windows 8. However, note that:

With the advent of multi-core/hyper-threaded CPUs, systems with multiple CPUs, and hibernating operating systems, the TSC cannot be relied upon to provide accurate results — unless great care is taken to correct the possible flaws: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. Therefore, a program can get reliable results only by limiting itself to run on one specific CPU.

So even if QPC is monotonic in theory, you must always call it from the same thread to be sure of this.
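
One way to make sure the counter is only ever read from one thread is a small "timestamp service". The sketch below is an illustration under assumptions, not a definitive implementation: `TimestampService` is a hypothetical class, and the tick source is injected as a callable so the example stays platform-neutral. On Windows the callable would wrap QueryPerformanceCounter(), and the worker thread would pin itself with SetThreadAffinityMask before entering the loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical "timestamp service": one dedicated thread is the only caller
// of the tick source and publishes each reading; other threads only read it.
class TimestampService {
public:
    // `source` stands in for the real counter read (assumption: on Windows,
    // a wrapper around QueryPerformanceCounter).
    explicit TimestampService(std::function<std::int64_t()> source)
        : source_(std::move(source)) { assert(source_); }

    ~TimestampService() { stop(); }

    void start() {
        if (worker_.joinable()) return;  // already running
        running_.store(true);
        worker_ = std::thread([this] {
            // Assumption: a Windows build would set the thread affinity here
            // so that every reading comes from the same processor.
            while (running_.load(std::memory_order_relaxed)) {
                latest_.store(source_(), std::memory_order_release);
                std::this_thread::yield();
            }
        });
    }

    void stop() {
        running_.store(false);
        if (worker_.joinable()) worker_.join();
    }

    // Atomic read of the most recently published value; callable from any thread.
    std::int64_t latest() const { return latest_.load(std::memory_order_acquire); }

private:
    std::function<std::int64_t()> source_;
    std::atomic<bool> running_{false};
    std::atomic<std::int64_t> latest_{0};
    std::thread worker_;
};
```

Readers only ever see a value as fresh as the sampler's last loop iteration, so this trades some precision for consistency across threads.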

Another note: if synchronization is done in software, the Intel documentation warns that:

...It may be difficult for software to do this in a way that ensures that all logical processors will have the same value for the TSC at a given point in time...


Edit: if your application is multithreaded and you can't (or don't want to) set CPU affinity (especially if you need precise timestamps, at the cost of values being de-synchronized between threads), then you may use GetSystemTimePreciseAsFileTime() when running on Win8 (or later) and fall back to timeGetTime() on Win7 (after setting the granularity to 1 ms with timeBeginPeriod(1), and assuming 1 ms resolution is enough). A very interesting read: The Windows Timestamp Project.
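
If you take the GetSystemTimePreciseAsFileTime() route, the result arrives as a FILETIME: a 64-bit count of 100 ns intervals since 1601-01-01 (UTC), split into two 32-bit halves. A minimal sketch of turning that into microseconds since the Unix epoch, assuming that is the timestamp format you want; `filetime_to_unix_us` is a hypothetical helper that takes the two halves as plain integers so it compiles on any platform. Note that timeGetTime() returns milliseconds since system start rather than a FILETIME, so this conversion applies only to the Win8+ branch.

```cpp
#include <cstdint>

// 100 ns intervals between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch).
constexpr std::uint64_t kFiletimeToUnixEpoch = 116444736000000000ULL;

// Hypothetical helper: takes the two 32-bit halves of a FILETIME
// (dwLowDateTime, dwHighDateTime) and returns microseconds since the
// Unix epoch. Times before 1970 come out negative.
constexpr std::int64_t filetime_to_unix_us(std::uint32_t low, std::uint32_t high)
{
    const std::uint64_t ticks = (static_cast<std::uint64_t>(high) << 32) | low;
    return (static_cast<std::int64_t>(ticks) -
            static_cast<std::int64_t>(kFiletimeToUnixEpoch)) / 10;
}
```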

Edit 2: suggested directly by the OP! When applicable (it's a system-wide setting, not local to your application), this might be an easy workaround. You can force QPC to use HPET instead of TSC using bcdedit (see MSDN). Latency and resolution should be worse, but it's intrinsically safe from the issues described above.

Adriano Repetti
  • Thanks ! Basically we're screwed short of a massive rethinking because our code is massively multithreaded with everybody using QPC for obtaining precise timestamps (the only way of obtaining millisecond or sub-millisecond timestamps on pre-W8 Windows as far as we know...) – Denis Troller May 17 '17 at 12:32
  • Well, as _dirty workaround_ you may make one of your (long-living) threads responsible to call QPC and (kind of) search & replace all the other calls to QueryPerformanceCounter() to simply read that value – Adriano Repetti May 17 '17 at 12:42
  • Yes, but that defeats the purpose of "precise" timestamping, since we now rely on another thread pulling the values at a "somewhat 'low' frequency", artificially recreating the problem we try to escape with the DateTime.Now problem where the value is updated at "high" intervals. – Denis Troller May 17 '17 at 12:50
  • True (even if a quasi real-time working thread may mitigate the issue). If applicable note that `KeQueryPerformanceCounter()` has much better throughput (in case you opt for your own "timestamping service"). HOWEVER...given that you have to (at least) write your own function to replace QPC then...why don't you use `GetSystemTimePreciseAsFileTime()` for Win8+ and (assuming 1 ms resolution is enough) `timeGetTime()` (after a call to `timeBeginPeriod(1)`) for Win7? – Adriano Repetti May 17 '17 at 13:08
  • That's what I'm going to have to think about. Still not sure why my trick of setting UsePlatformClock seems to work though :) – Denis Troller May 17 '17 at 13:41
  • @Denis actually that's a nice trick! I didn't know you can force QPC to use HPET, probably latency is much worse but it's intrinsically _CPU-safe_! – Adriano Repetti May 17 '17 at 13:52
  • Apparently forcing QPC/system to use HPET can have some massive negative performance implications: https://www.anandtech.com/show/12678/a-timely-discovery-examining-amd-2nd-gen-ryzen-results – aggieNick02 Apr 25 '18 at 21:55
  • The MSDN document linked to in the answer, Acquiring high-resolution time stamps, was modified 5/31/2018, and no longer reads as quoted in this answer. Instead, it says that "the performance counter results are consistent across all processors in multi-core and multi-processor systems, even when measured on different threads or processes" with the exceptions of pre-Vista OS, and between thread tick-counts having a +/- 1 tick ambiguous ordering. Despite the more optimistic documentation, I'd be still wary of not locking the timer down to one core... – Jacob Lee Jul 22 '21 at 23:13