3

I'm storing a queue of events in Sql Server with concurrent writers and I'm assigning a sequence number using an IDENTITY column.

I'm willing to accept the gaps that might occur, but I would like readers not to read past the minimum identity active in any uncomitted transaction... like what I can do with min_active_rowversion for ROWVERSION columns.

Is this possible?

EDIT: The actual problem

What I'm trying to do is finding a solution for the problem described here: https://dba.stackexchange.com/a/135116/176861

This is solved for ROWVERSION with min_active_rowversion like this:

TX A begins and inserts a row with rowversion = 1
SELECT * FROM events WHERE rowversion < min_active_rowversion() => no result
TX B begins and inserts a row with rowversion = 2
SELECT * FROM events WHERE rowversion < min_active_rowversion() => no result    
TX B completes
SELECT * FROM events WHERE rowversion < min_active_rowversion() => no result    
TX A completes
SELECT * FROM events WHERE rowversion < min_active_rowversion() => both rows as result

Like this, I won't "read past" the minimum active rowversion and won't miss updates. But rowversion is not immutable like a proper identity is, and I would like to do the same but using an IDENTITY column.

I would like to make a SELECT statement that does not read a row with an id greater than the lowest identity in any active transaction.

EDIT 2: The schema

A simplified version of my table looks like this:

CREATE TABLE events
(  
    sequence int NOT NULL IDENTITY(0,1),  
    name nvarchar (850),  
    data varbinary(max)
);  

EDIT 3: Proposed solution

I got an idea for a - probably obvious - solution: Have both IDENTITY and ROWVERSION.

If a row have both, I can use an immutable Identity value as the starting point of where I want to read from (which row did I read and successfully process last) and min_active_rowversion for my cutoff, so I don't read to far ahead.

I'll try it out an report back here.

asgerhallas
  • 225
  • 1
  • 6
  • 1
    I'm confused, have you tested this? Have you seen another transaction reading past the identity as described? – George.Palacios Apr 10 '19 at 07:46
  • 1
    Schema definition and code examples might help here. A sequence and an identity column in SQL Server are different things, so it's important to clarify what you're using. – HandyD Apr 10 '19 at 07:56
  • @George.Palacios please see edits. Do they clarify what I mean? – asgerhallas Apr 10 '19 at 08:15
  • Am i being dumb, or can you just solve this by using the correct isolation level? – Philᵀᴹ Apr 10 '19 at 08:40
  • @Philᵀᴹ I think the issue here is that the current seed of the identity column is updated outside the context of a transaction, so isolation level won't help. – George.Palacios Apr 10 '19 at 09:01
  • @George.Palacios yes, precisely, thank you for clarifying. – asgerhallas Apr 10 '19 at 09:07
  • Could you not simply include a "processed" flag in your events table that is updated when you've processed an event? Then it doesn't matter if ID 5 is inserted before ID 4 because when you re-query the table for the next batch of events (where processed = 0) you will get event 4, 6, 7 and so on? – HandyD Apr 11 '19 at 08:17
  • @HandyD unfortunately not, because they must be processed in order. And because there can be multiple processors, that each need to know where in the stream they are. – asgerhallas Apr 11 '19 at 11:36
  • @asgerhallas how did you go with Edit 3: Proposed solution. Or did you find another workable solution? I had the same idea around min_active_rowversion, but viability would depend on if rowversion is issued first or identity value – Aaron0 Jul 10 '20 at 07:48
  • @Aaron0 it has worked very well, but I ended up manually assigning ids for starting points instead of using IDENTITY, though not because it did not work, but because it solved other problems... I'm not sure I understand why the order of issuing matters, could you explain? – asgerhallas Jul 27 '20 at 07:20
  • @asgerhallas, basically I don't' know when in the process identity & rowversion values are handed out. But if say identity values were handed out in bulk at the start of the process and then row version values were handed at row write time. There would be a small window between identity generation & row writing were a second transaction could commit and cause min_active_rowversion to not be an effective safety measure – Aaron0 Jul 29 '20 at 07:40
  • I'm currently planning on doing a POC around two different options. 1. use two tables one for writes and one for reads and have a single thread that copied committed events between the tables. 2. Use a single table but have a two sequence number with one being written by a single-threaded worker only after first committed, consumers will ignore recorded not yet given the second sequence. Both methods try to use a single thread to force the system to be sequential – Aaron0 Jul 29 '20 at 09:22

0 Answers0