
I want to apply pagination on a table with huge data. All I want to know is whether there is a better option than using OFFSET in SQL Server.

Here is my simple query:

SELECT *
FROM TableName
ORDER BY Id DESC 
OFFSET 30000000 ROWS
FETCH NEXT 20 ROWS ONLY
  • Yes. https://use-the-index-luke.com/sql/partial-results/fetch-next-page – jarlh Dec 29 '21 at 12:44
  • Why do you need a *better option* than `OFFSET / FETCH` ? What's wrong with `OFFSET / FETCH` ?? – marc_s Dec 29 '21 at 12:49
  • 3
    Do you really need to jump to page 1,500,000? Perhaps you need a better criteria? – Stu Dec 29 '21 at 13:10
  • @marc_s Recently I have learnt that the way OFFSET works is by counting how many rows it should skip before returning your result. In other words, to get the results from rows 30000000 to 30000020 it needs to scan through the first 30000000 and then throw them away. It seems too wasteful. So I want to know a better option for server-side pagination, if there is one. – Md. Tarikul Islam Soikot Dec 29 '21 at 13:33
  • See also https://use-the-index-luke.com/no-offset, it's called Keyset Pagination. – Charlieface Dec 29 '21 at 13:47
  • @The Impaler Can you please explain "Sing key set pagination"? Actually I want to know a better approach if there is one – Md. Tarikul Islam Soikot Dec 29 '21 at 16:45
  • @Md.TarikulIslamSoikot I'm sorry it was a typo. I meant "Keyset Pagination". The answer below describes it. – The Impaler Dec 29 '21 at 21:50
  • Is there a real world use for this? Does someone really need to look at 30 millionth row? – Salman A Jan 07 '22 at 10:09
  • @Md.TarikulIslamSoikot You needed to actually manually award the bounty if it was placed after you accepted my answer, see https://meta.stackexchange.com/questions/107139/bounty-disappeared – Charlieface Jan 08 '22 at 18:57
  • Does this answer your question? [What is the best way to paginate results in SQL Server](https://stackoverflow.com/questions/109232/what-is-the-best-way-to-paginate-results-in-sql-server) – Michael Freidgeim May 18 '22 at 23:55
  • @SalmanA yes -- when your client is not a human but a machine, there are numerous use cases – That Realty Programmer Guy May 27 '22 at 04:23
  • @Md.TarikulIslamSoikot, can you share your table schema and existing indexes, along with datatypes? – KumarHarsh May 31 '22 at 09:13

4 Answers


You can use Keyset Pagination for this. It's far more efficient than using Rowset Pagination (paging by row number).

In Rowset Pagination, all previous rows must be read before the next page can be read. In Keyset Pagination, the server can jump immediately to the correct place in the index, so no extra rows are read that do not need to be.

For this to perform well, you need to have a unique index on that key, which includes any other columns you need to query.

In this type of pagination, you cannot jump to a specific page number; you jump to a specific key and read from there. So you need to save the unique ID of the page you are on and skip to the next. Alternatively, you could calculate or estimate a starting point for each page up-front.

One big benefit, apart from the obvious efficiency gain, is avoiding the "missing row" problem when paginating, caused by rows being removed from previously read pages. This does not happen when paginating by key, because the key does not change.
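To make the "missing row" problem concrete, here is a small sketch in Python, using SQLite as a stand-in database (the table and values are hypothetical): a row is deleted from page 1 between two page fetches, and OFFSET silently skips a row while the keyset approach does not.

```python
import sqlite3

# Hypothetical demo: ids 10..1, read in descending order, 3 rows per page.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(1, 11)])

page1 = [r[0] for r in con.execute(
    "SELECT id FROM t ORDER BY id DESC LIMIT 3")]           # [10, 9, 8]

con.execute("DELETE FROM t WHERE id = 10")                  # a row from page 1 disappears

# Rowset pagination: OFFSET recounts from the new top, so id 7 is silently skipped.
offset_page2 = [r[0] for r in con.execute(
    "SELECT id FROM t ORDER BY id DESC LIMIT 3 OFFSET 3")]  # [6, 5, 4]

# Keyset pagination: continue from the last key seen, so nothing is missed.
keyset_page2 = [r[0] for r in con.execute(
    "SELECT id FROM t WHERE id < ? ORDER BY id DESC LIMIT 3",
    (page1[-1],))]                                          # [7, 6, 5]
```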


Here is an example:

Let us assume you have a table called TableName with an index on Id, and you want to start at the latest Id value and work backwards.

You begin with:

SELECT TOP (@numRows)
  *
FROM TableName
ORDER BY Id DESC;

Note the use of ORDER BY to ensure the order is correct.

In some RDBMSs you need LIMIT instead of TOP.

The client will hold the last received Id value (the lowest in this case). On the next request, you jump to that key and carry on:

SELECT TOP (@numRows)
  *
FROM TableName
WHERE Id < @lastId
ORDER BY Id DESC;

Note the use of <, not <=.

In case you were wondering, in a typical B+tree index, the row with the indicated ID is not read; it's the row after it that's read.
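The client-side loop described above ("hold the last received Id, then seek below it") can be sketched as follows; this is an illustration, not the answer's original code, using Python with SQLite as a stand-in for SQL Server (LIMIT plays the role of TOP; table and column names are hypothetical).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TableName (Id INTEGER PRIMARY KEY, Payload TEXT)")
con.executemany("INSERT INTO TableName (Id, Payload) VALUES (?, ?)",
                [(i, f"row {i}") for i in range(1, 101)])

def fetch_page(last_id=None, num_rows=20):
    # First page: no predicate. Subsequent pages: seek below the last key seen.
    if last_id is None:
        sql, args = "SELECT Id FROM TableName ORDER BY Id DESC LIMIT ?", (num_rows,)
    else:
        sql = "SELECT Id FROM TableName WHERE Id < ? ORDER BY Id DESC LIMIT ?"
        args = (last_id, num_rows)
    return [r[0] for r in con.execute(sql, args)]

pages, last_id = [], None
while True:
    page = fetch_page(last_id)
    if not page:
        break
    pages.append(page)
    last_id = page[-1]   # the client holds only the last key, never a row count
```

Note that the client state is a single key value, so each request does an index seek regardless of how deep into the result set it is.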


The key chosen must be unique, so if you are paging by a non-unique column then you must add a second column to both ORDER BY and WHERE. You would need an index on (OtherColumn, Id), for example, to support this type of query. Don't forget INCLUDE columns on the index.

SQL Server does not support row/tuple comparators, so you cannot do (OtherColumn, Id) < (@lastOther, @lastId) (this is however supported in PostgreSQL, MySQL, MariaDB and SQLite).

Instead you need the following:

SELECT TOP (@numRows)
  *
FROM TableName
WHERE (
    (OtherColumn = @lastOther AND Id < @lastId)
    OR OtherColumn < @lastOther
)
ORDER BY
  OtherColumn DESC,
  Id DESC;

This is more efficient than it looks, as SQL Server can convert this into a proper < over both values.
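As a sanity check of that equivalence, here is a hypothetical Python sketch using SQLite, which (unlike SQL Server) does support row-value comparators, comparing the OR form against the tuple form on the same data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (OtherColumn INTEGER, Id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(i % 5, i) for i in range(1, 51)])  # OtherColumn is non-unique

last_other, last_id = 3, 28  # the keyset saved from the previous page

# The OR expansion, as required by SQL Server.
or_form = con.execute("""
    SELECT OtherColumn, Id FROM t
    WHERE (OtherColumn = ? AND Id < ?) OR OtherColumn < ?
    ORDER BY OtherColumn DESC, Id DESC LIMIT 10
""", (last_other, last_id, last_other)).fetchall()

# The row-value comparator (PostgreSQL, MySQL, MariaDB, SQLite).
row_value = con.execute("""
    SELECT OtherColumn, Id FROM t
    WHERE (OtherColumn, Id) < (?, ?)
    ORDER BY OtherColumn DESC, Id DESC LIMIT 10
""", (last_other, last_id)).fetchall()
```

Both queries return the same page, confirming that the OR form implements the same lexicographic "less than" over the pair.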

The presence of NULLs complicates things further. You may want to query those rows separately.

Charlieface
  • So you cannot jump directly to the 30,000,000th row unless you have read all previous rows to calculate the `@lastId`. How is this useful? – Salman A Jan 07 '22 at 10:12
  • That's true, you cannot, and I did mention this. But it's pretty rare that a user actually *wants* to do so from the start. As you said yourself in a comment *"Does someone really need to look at 30 millionth row?"* Normally they might say "I'm up to this point in the list, I want the next few rows" and paginating by key works better for this, because the missing rows problem doesn't affect it (if you remove rows from earlier pages and you page by row-number then you will miss rows). It works very well for infinite scrolling, and for batch operations – Charlieface Jan 07 '22 at 12:40

On a very big merchant website we use a technique composed of IDs stored in a pseudo-temporary table, joined back to the rows of the product table.

Let me explain with a clear example.

We have a table designed this way:

CREATE TABLE S_TEMP.T_PAGINATION_PGN
(PGN_ID              BIGINT IDENTITY(-9223372036854775808, 1) PRIMARY KEY,
 PGN_SESSION_GUID    UNIQUEIDENTIFIER NOT NULL,
 PGN_SESSION_DATE    DATETIME2(0) NOT NULL,
 PGN_PRODUCT_ID      INT NOT NULL,
 PGN_SESSION_ORDER   INT NOT NULL);
CREATE INDEX X_PGN_SESSION_GUID_ORDER 
   ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_GUID, PGN_SESSION_ORDER)
   INCLUDE (PGN_PRODUCT_ID);  -- a key column cannot also be an INCLUDE column
CREATE INDEX X_PGN_SESSION_DATE 
   ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_DATE);

We have a very big product table called T_PRODUIT_PRD, and a customer filters it with many predicates. We INSERT rows from the filtered SELECT into this table this way:

DECLARE @SESSION_ID UNIQUEIDENTIFIER = NEWID();
INSERT INTO S_TEMP.T_PAGINATION_PGN
       (PGN_SESSION_GUID, PGN_SESSION_DATE, PGN_PRODUCT_ID, PGN_SESSION_ORDER)
SELECT @SESSION_ID, SYSUTCDATETIME(), PRD_ID,
       ROW_NUMBER() OVER (ORDER BY ...) -- custom order by
FROM   dbo.T_PRODUIT_PRD 
WHERE  ... -- custom filter

Then, every time we need a desired page of @N products, we add a join to this table:

...
JOIN S_TEMP.T_PAGINATION_PGN
   ON PGN_SESSION_GUID = @SESSION_ID
      AND 1 + ((PGN_SESSION_ORDER - 1) / @N) = @DESIRED_PAGE_NUMBER -- rows 1..@N are page 1
      AND PGN_PRODUCT_ID = dbo.T_PRODUIT_PRD.PRD_ID

The indexes will do the job!

Of course, we have to purge this table regularly, which is why we have a scheduled job that deletes the rows whose sessions were generated more than 4 hours ago:

DELETE FROM S_TEMP.T_PAGINATION_PGN
WHERE  PGN_SESSION_DATE < DATEADD(hour, -4, SYSUTCDATETIME());
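Not part of the original answer, but the same session-snapshot pattern can be sketched end-to-end in Python with SQLite, using shortened hypothetical names (product, pagination) standing in for the tables above:

```python
import sqlite3
import uuid

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE product (prd_id INTEGER PRIMARY KEY, price REAL)")
con.executemany("INSERT INTO product VALUES (?, ?)",
                [(i, i * 1.5) for i in range(1, 101)])
con.execute("""CREATE TABLE pagination (
                   session_guid  TEXT,
                   product_id    INTEGER,
                   session_order INTEGER)""")

session = str(uuid.uuid4())

# Snapshot the customer's filtered, ordered result once, numbering the rows.
con.execute("""
    INSERT INTO pagination
    SELECT ?, prd_id, ROW_NUMBER() OVER (ORDER BY price DESC)
    FROM product
    WHERE price > 30          -- the customer's custom filter
""", (session,))

def page(num, n=10):
    # 1-based page number: rows 1..n are page 1, hence (session_order - 1) / n.
    return [r[0] for r in con.execute("""
        SELECT p.prd_id
        FROM product p
        JOIN pagination g ON g.product_id = p.prd_id
        WHERE g.session_guid = ?
          AND 1 + (g.session_order - 1) / ? = ?
        ORDER BY g.session_order
    """, (session, n, num))]
```

Any page of the snapshot is then reachable directly by page number, at the cost of materializing the filtered IDs once per session.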
SQLpro

In the same spirit as SQLpro's solution, I propose:

WITH CTE AS
(SELECT 30000000 AS N
 UNION ALL
 SELECT N - 1 FROM CTE
 WHERE N > 30000000 + 1 - 20)
SELECT T.*
FROM CTE
JOIN TableName T ON CTE.N = T.Id
ORDER BY CTE.N DESC;

Tried with 2 billion rows and it's instant! Easy to make into a stored procedure... Of course, this is only valid if the IDs are contiguous.
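A quick Python/SQLite sketch of this recursive-CTE idea (names hypothetical; SQLite spells it WITH RECURSIVE), including the caveat about non-contiguous IDs:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TableName (Id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO TableName (Id) VALUES (?)",
                [(i,) for i in range(1, 1001)])

def fetch_page(start, page_size=20):
    # Generate the wanted Ids (start, start-1, ...) and join them to the table,
    # so only page_size rows are ever touched.
    return [r[0] for r in con.execute("""
        WITH RECURSIVE cte(n) AS (
            SELECT ?
            UNION ALL
            SELECT n - 1 FROM cte WHERE n > ? + 1 - ?
        )
        SELECT t.Id FROM cte JOIN TableName t ON cte.n = t.Id
        ORDER BY cte.n DESC
    """, (start, start, page_size))]

full_page = fetch_page(900)      # Ids 900 down to 881

# The caveat: if the Ids have gaps, the page silently comes up short.
con.execute("DELETE FROM TableName WHERE Id = 890")
short_page = fetch_page(900)     # only 19 rows now
```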

Datafly

Recently I have learnt that the way OFFSET works is by counting how many rows it should skip before returning your result. In other words, to get the results from rows 30000000 to 30000020 it needs to scan through the first 30000000 and then throw them away.

This is not due to OFFSET. This is purely a performance issue caused by a badly written query, a missing index, or an index not being used.

Even the keyset pagination technique works similarly to OFFSET.

Whenever millions of records are involved, you need to understand where the problem is, then optimize the query and tune the indexes.

In your example you need to show the table schema and its existing indexes. You need to show the real query; then it can be planned accordingly.

In other words, to get the results from rows 30000000 to 30000020 it needs to scan through the first 30000000 and then throw them away.

Whenever a query is not written properly, or an index is not utilized, the same thing happens, be it OFFSET or any other predicate.

Suppose Id is the PK of table TableName; then:

;WITH CTE AS
(
    SELECT Id
    FROM TableName
    ORDER BY Id DESC 
    OFFSET 30000000 ROWS
    FETCH NEXT 20 ROWS ONLY
)
SELECT tb.col, tb.col2, tb.col3
FROM TableName tb
INNER JOIN CTE c ON tb.Id = c.Id;

Notice the query plan in this scenario: OFFSET won't do any scan of the wide table. All the above queries are fine and have been modified to utilize the index only.
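A minimal Python/SQLite sketch of this key-then-join ("deferred join") pattern, with a hypothetical table and columns and LIMIT/OFFSET standing in for OFFSET/FETCH:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE TableName (
                   id   INTEGER PRIMARY KEY,
                   col  TEXT, col2 TEXT, col3 TEXT)""")
con.executemany("INSERT INTO TableName VALUES (?, ?, ?, ?)",
                [(i, f"a{i}", f"b{i}", f"c{i}") for i in range(1, 1001)])

# Page over the narrow key first, then join back for the wide columns,
# so the skipped rows are only ever read from the narrow index.
rows = con.execute("""
    WITH cte AS (
        SELECT id FROM TableName
        ORDER BY id DESC
        LIMIT 20 OFFSET 500
    )
    SELECT tb.col, tb.col2, tb.col3
    FROM TableName tb
    INNER JOIN cte c ON tb.id = c.id
    ORDER BY tb.id DESC
""").fetchall()
```

As the comments below note, the offset still has to count past the skipped keys; the saving is that it counts over a narrow index rather than the full rows.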

Did you check the query plan to see what is happening with your query?

How have you implemented keyset pagination?

KumarHarsh
  • "OFFSET wont do any SCAN" it will still scan all 30mln rows and throw them away, then take the last 10 and do the join, so you haven't really gained anything other than reading a smaller index. – Charlieface May 31 '22 at 13:27
  • @Charlieface, thanks for your response, and sorry for my language above. If I had a suitable example I could explain that it won't scan: if it uses the index then it knows which page to jump to directly and reads the key value. It is scanning because it is not using the index; please bear with me – KumarHarsh May 31 '22 at 13:41
  • As you can see from this fiddle https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=05fefd9e5db04b2cffbd13d6a4de55f9 there is a scan on the smaller index, then a nested loops join with a seek on the clustered index. It is impossible to get a seek on the inner part of your CTE purely with `OFFSET` – Charlieface May 31 '22 at 13:58
  • @Charlieface, as you know, for a smaller result the optimizer will always scan because that is the most cost-effective and fastest route. I will come back with a suitable example tomorrow. I was only trying to explain how an OFFSET query can be improved to use an index in certain situations, and that there is no harm in OFFSET – KumarHarsh May 31 '22 at 14:16