Scenario:
We write online banking software and for the time being (due to legacy design decisions) are using Quartz 2.2 right now as a queue system to perform a handful of operatoins long term. [Let's sidebar the discussion on if it's the right tool, it's worked well for many years until we updated to 2.2]
Part of that quartz.net has the following tables (related) for schema: https://gist.github.com/jcolebrand/8695603
So the process is we're inserting records in this table, to the tune of say 80k records. (I have three replications of quartz.net serving three different configurations, one has 80k records, one 50k, one 280k, so it varies). We insert them from some other table that is the table of record, so rebuilding this table isn't a terrible loss. The issue is described below.
I have a tool which can bulk schedule these tasks (most of which are one-time and will happen anywhere from two weeks to twenty years in the future, again, this is about the db and not the architecture choices) and I can debug step through, see the rows get inserted into the table with no concern. I can monitor the table and see that they get inserted. I then come back 30 minutes later (well, this part varies. It's not deterministic on when they disappear) and check and the records are now missing.
Out of 80k on the one instance, I'm missing approximately 2700 records. And when I run my tool, I see them all synced up, and after some matter of time, they have evaporated again.
Here's things I've tried:
- checking the all transactions report in SSMS
- checking the all blocking transactions report in SSMS
- leaving my scheduling application open for long duration (in case, by some freak magic, there was an open, uncommitted transaction)
- restarting the Quartz.net windows service application maintaining the database
- inserting records with the quartz.net windows service disabled
Things I haven't tried:
- restarting SQL Server
- my queries don't use "with(nolock)" [I've heard that's bad juju for day-to-day]
Things I think I should do and don't know how:
- run SQL Profiler against the instance and monitor for "delete" statements
- magic to determine if there is an uncommitted transaction
- sacrifice unicorns
- telephone a friend
- daily double
@@version:
Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64)
Jun 28 2012 08:36:30
Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows NT 6.1 (Build 7601: Service Pack 1) (Hypervisor)
EDIT 2014-01-30
It is imperative for internal reasons that I can demonstrate precisely why the records are disappearing, including but not limited to:
- deletion by a user
- uncommitted transaction
- deletion by Quartz itself
- some other unexplained phenomena
- butterfly effect
- trigger effect
- unicorns
Where possible, I need the specific statement involved in the delete.
TL;DR
So, basically, the TL;DR is: records are inserted. For some duration of time up to at least 30 minutes record definitively show in queries against the table. After some duration, records no longer show in queries against the table.
What gives? What am I overlooking? How would you profile this IN PRODUCTION to see what's happening here?
fn_dblogexhibits the same problem. (just making the point explicitly as the names are very similar and could cause confusion) – Martin Smith Jan 29 '14 at 21:19