I want to apply pagination on a table with huge data. All I want to know a better option than using OFFSET in SQL Server.
Here is my simple query:
SELECT *
FROM TableName
ORDER BY Id DESC
OFFSET 30000000 ROWS
FETCH NEXT 20 ROWS ONLY
I want to apply pagination on a table with huge data. All I want to know a better option than using OFFSET in SQL Server.
Here is my simple query:
SELECT *
FROM TableName
ORDER BY Id DESC
OFFSET 30000000 ROWS
FETCH NEXT 20 ROWS ONLY
You can use Keyset Pagination for this. It's far more efficient than using Rowset Pagination (paging by row number).
In Rowset Pagination, all previous rows must be read, before being able to read the next page. Whereas in Keyset Pagination, the server can jump immediately to the correct place in the index, so no extra rows are read that do not need to be.
For this to perform well, you need to have a unique index on that key, which includes any other columns you need to query.
In this type of pagination, you cannot jump to a specific page number. You jump to a specific key and read from there. So you need to save the unique ID of page you are on and skip to the next. Alternatively, you could calculate or estimate a starting point for each page up-front.
One big benefit, apart from the obvious efficiency gain, is avoiding the "missing row" problem when paginating, caused by rows being removed from previously read pages. This does not happen when paginating by key, because the key does not change.
Here is an example:
Let us assume you have a table called TableName with an index on Id, and you want to start at the latest Id value and work backwards.
You begin with:
SELECT TOP (@numRows)
*
FROM TableName
ORDER BY Id DESC;
Note the use of
ORDER BYto ensure the order is correctIn some RDBMSs you need
LIMITinstead ofTOP
The client will hold the last received Id value (the lowest in this case). On the next request, you jump to that key and carry on:
SELECT TOP (@numRows)
*
FROM TableName
WHERE Id < @lastId
ORDER BY Id DESC;
Note the use of
<not<=
In case you were wondering, in a typical B-Tree+ index, the row with the indicated ID is not read, it's the row after it that's read.
The key chosen must be unique, so if you are paging by a non-unique column then you must add a second column to both ORDER BY and WHERE. You would need an index on OtherColumn, Id for example, to support this type of query. Don't forget INCLUDE columns on the index.
SQL Server does not support row/tuple comparators, so you cannot do (OtherColumn, Id) < (@lastOther, @lastId) (this is however supported in PostgreSQL, MySQL, MariaDB and SQLite).
Instead you need the following:
SELECT TOP (@numRows)
*
FROM TableName
WHERE (
(OtherColumn = @lastOther AND Id < @lastId)
OR OtherColumn < @lastOther
)
ORDER BY
OtherColumn DESC,
Id DESC;
This is more efficient than it looks, as SQL Server can convert this into a proper < over both values.
The presence of NULLs complicates things further. You may want to query those rows separately.
On very big merchant website we use a technic compound of ids stored in a pseudo temporary table and join with this table to the rows of the product table.
Let me talk with a clear example.
We have a table design this way :
CREATE TABLE S_TEMP.T_PAGINATION_PGN
(PGN_ID BIGINT IDENTITY(-9 223 372 036 854 775 808, 1) PRIMARY KEY,
PGN_SESSION_GUID UNIQUEIDENTIFIER NOT NULL,
PGN_SESSION_DATE DATETIME2(0) NOT NULL,
PGN_PRODUCT_ID INT NOT NULL,
PGN_SESSION_ORDER INT NOT NULL);
CREATE INDEX X_PGN_SESSION_GUID_ORDER
ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_GUID, PGN_SESSION_ORDER)
INCLUDE (PGN_SESSION_ORDER);
CREATE INDEX X_PGN_SESSION_DATE
ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_DATE);
We have a very big product table call T_PRODUIT_PRD and a customer filtered it with many predicates. We INSERT rows from the filtered SELECT into this table this way :
DECLARE @SESSION_ID UNIQUEIDENTIFIER = NEWID();
INSERT INTO S_TEMP.T_PAGINATION_PGN
SELECT @SESSION_ID , SYSUTCDATETIME(), PRD_ID,
ROW_NUMBER() OVER(ORDER BY --> custom order by
FROM dbo.T_PRODUIT_PRD
WHERE ... --> custom filter
Then everytime we need a desired page, compound of @N products we add a join to this table as :
...
JOIN S_TEMP.T_PAGINATION_PGN
ON PGN_SESSION_GUID = @SESSION_ID
AND 1 + (PGN_SESSION_ORDER / @N) = @DESIRED_PAGE_NUMBER
AND PGN_PRODUCT_ID = dbo.T_PRODUIT_PRD.PRD_ID
All the indexes will do the job !
Of course, regularly we have to purge this table and this is why we have a scheduled job which deletes the rows whose sessions were generated more than 4 hours ago :
DELETE FROM S_TEMP.T_PAGINATION_PGN
WHERE PGN_SESSION_DATE < DATEADD(hour, -4, SYSUTCDATETIME());
In the same spirit as SQLPro solution, I propose:
WITH CTE AS
(SELECT 30000000 AS N
UNION ALL SELECT N-1 FROM CTE
WHERE N > 30000000 +1 - 20)
SELECT T.* FROM CTE JOIN TableName T ON CTE.N=T.ID
ORDER BY CTE.N DESC
Tried with 2 billion lines and it's instant ! Easy to make it a stored procedure... Of course, valid if ids follow each other.
Recently I have learnt that the way OFFSET works are by counting how many rows it should skip. After that it is giving your result. In other words, to get the results from rows 30000000 to 30000020 it needs to scan through the first 30000000 and then throw them away.
This is not due to OFFSET.This your purely performance issue because of badly written query or due to absent of index or index not being use.
Even KEYSET pagination technique working similar to OFFSET.
Whenever there is million of records involve ,you need to optimize query or understand where is the problem then optimize query and tune index.
In your example you need to throw Table Schema and their existing index. You need to show real query,then it can be plan accordingly.
In other words, to get the results from rows 30000000 to 30000020 it needs to scan through the first 30000000 and then throw them away.
whenever query is not written in proper way or index not being utilize then same thing happen be it OFFSET or any other predicate
Suppose id is PK of table TableName then
;with CTE as
(
SELECT id
FROM TableName
ORDER BY Id DESC
OFFSET 30000000 ROWS
FETCH NEXT 20 ROWS ONLY
)
Select tb.col,tb.col2,tb.col3
from TableName tb inner join CTE C on tb.id=c.id
You notice the query plan in this scenario.OFFSET wont do any SCAN.
All above query are fine and has been modified in order to utilize INDEX only.
Did you check query plan what happending with your query ?
How you have implemented KEYSET Pagination ?