I'd need to know much more about your code, but let me highlight a few things from MSDN:
When computing deltas, the values [from QueryPerformanceCounter] should be clamped to ensure that any bugs in the timing values do not cause crashes or unstable time-related computations.
And especially this:
Set that single thread to remain on a single processor by using the Windows API SetThreadAffinityMask ... While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another. So, it's best to keep the thread on a single processor.
Your case might have hit one of those bugs. In short:
- Always query the timestamp from one thread (pinned to a single CPU with a fixed affinity mask, so it cannot migrate) and read that value from any other thread (an interlocked read is enough, no need for fancy synchronization).
- Clamp the calculated delta (at the very least, make sure it's not negative).
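The two points above can be sketched roughly as follows. This is only a minimal illustration, not your actual code: `clampedDelta`, `SharedTimestamp` and `runTimerThread` are hypothetical names, the affinity mask `1` (first logical processor) and the 1 ms publish interval are assumptions, and the Windows-specific part is only compiled when `_WIN32` is defined.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: difference between two raw counter readings,
// clamped to [0, maxTicks] so that a backwards or absurdly large jump
// cannot leak into time-dependent computations.
inline std::int64_t clampedDelta(std::int64_t previous,
                                 std::int64_t current,
                                 std::int64_t maxTicks)
{
    assert(maxTicks >= 0);
    if (current <= previous) {
        return 0;  // counter went backwards (or did not move)
    }
    const std::int64_t delta = current - previous;
    return delta > maxTicks ? maxTicks : delta;
}

// Hypothetical holder: one timer thread publishes, any thread reads.
class SharedTimestamp {
public:
    void publish(std::int64_t ticks) noexcept { ticks_.store(ticks); }
    std::int64_t read() const noexcept { return ticks_.load(); }

private:
    std::atomic<std::int64_t> ticks_{0};
};

#ifdef _WIN32
// Body of the dedicated timer thread (Windows only).
// Assumption: logical processor 0 is allowed by the process affinity mask.
inline void runTimerThread(SharedTimestamp& shared,
                           const std::atomic<bool>& stop)
{
    SetThreadAffinityMask(GetCurrentThread(), 1);  // stay on one CPU
    while (!stop.load()) {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        shared.publish(now.QuadPart);
        Sleep(1);  // assumed interval; it bounds the resolution readers see
    }
}
#endif
```

Readers then call `shared.read()` and pass consecutive readings through `clampedDelta`, choosing `maxTicks` as the largest step that still makes sense for the application (for example, one second's worth of ticks from `QueryPerformanceFrequency`).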
Notes:
QueryPerformanceCounter() uses the TSC when possible (see MSDN). The algorithm used to synchronize the TSC (if available, and in your case it should be) changed considerably between Windows 7 and Windows 8. Note, however, that:
With the advent of multi-core/hyper-threaded CPUs, systems with multiple CPUs, and hibernating operating systems, the TSC cannot be relied upon to provide accurate results — unless great care is taken to correct the possible flaws: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. Therefore, a program can get reliable results only by limiting itself to run on one specific CPU.
So even if QPC is monotonic in theory, you must always call it from the same thread to be sure of that.
Another note: if the synchronization is done in software, the Intel documentation says that:
...It may be difficult for software to do this in a way than ensures that all logical processors will have the same value for the TSC at a given point in time...
Edit: if your application is multithreaded and you can't (or don't want to) set CPU affinity (especially if you need precise timestamps even at the cost of de-synchronized values between threads), then you may use GetSystemTimePreciseAsFileTime() when running on Windows 8 (or later) and fall back to timeGetTime() on Windows 7 (after setting the timer granularity to 1 ms with timeBeginPeriod(1), and assuming 1 ms resolution is enough). A very interesting read: The Windows Timestamp Project.
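A rough sketch of that fallback, under a few assumptions of mine: `fileTimeToUnixMicros` and `timestampMicros` are hypothetical names, the precise API is looked up at run time with GetProcAddress so the same binary still loads on Windows 7, and the program links against winmm.lib. Note that the two paths use different origins (wall-clock time versus milliseconds since boot), so the result is only meant for computing deltas within one run.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>  // timeGetTime, timeBeginPeriod (link with winmm.lib)
#endif

// Number of 100 ns ticks between 1601-01-01 (FILETIME epoch)
// and 1970-01-01 (Unix epoch).
constexpr std::uint64_t kFileTimeToUnixTicks = 116444736000000000ULL;

// Hypothetical helper: converts the two 32-bit halves of a FILETIME
// into microseconds since the Unix epoch.
inline std::uint64_t fileTimeToUnixMicros(std::uint32_t low, std::uint32_t high)
{
    const std::uint64_t ticks = (static_cast<std::uint64_t>(high) << 32) | low;
    assert(ticks >= kFileTimeToUnixTicks);
    return (ticks - kFileTimeToUnixTicks) / 10;  // 100 ns -> 1 us
}

#ifdef _WIN32
// Hypothetical helper: microsecond timestamp suitable for deltas.
inline std::uint64_t timestampMicros()
{
    using PreciseFn = void (WINAPI*)(LPFILETIME);
    // Resolved once; stays null on systems older than Windows 8.
    static const PreciseFn precise = reinterpret_cast<PreciseFn>(
        GetProcAddress(GetModuleHandleW(L"kernel32.dll"),
                       "GetSystemTimePreciseAsFileTime"));
    if (precise != nullptr) {
        FILETIME ft;
        precise(&ft);
        return fileTimeToUnixMicros(ft.dwLowDateTime, ft.dwHighDateTime);
    }
    // Windows 7 path: request 1 ms granularity once, then use timeGetTime().
    static const bool periodSet = (timeBeginPeriod(1) == TIMERR_NOERROR);
    (void)periodSet;
    return static_cast<std::uint64_t>(timeGetTime()) * 1000;
}
#endif
```

A real application should also call timeEndPeriod(1) at shutdown if it requested the 1 ms period, and should keep in mind that timeGetTime() wraps around roughly every 49.7 days.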
Edit 2: suggested directly by the OP! When applicable (it's a system-wide setting, not local to your application), this might be an easy workaround: you can force QPC to use HPET instead of TSC with bcdedit (see MSDN). Latency and resolution should be worse, but it's intrinsically safe from the issues described above.