In SQL, LIMIT and OFFSET allow you to retrieve just a portion of the rows generated by the rest of the query. This is useful when you need to display results page by page in a web application. It sounds convenient, but there is a major catch: the more rows you skip, the slower the query becomes.
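To make the pattern concrete: page N with S rows per page is fetched with LIMIT S OFFSET S * (N - 1). A quick sketch, assuming a hypothetical items table ordered by id:

-- Hypothetical table; page 3 with 10 rows per page skips (3 - 1) * 10 = 20 rows
SELECT * FROM items ORDER BY id LIMIT 10 OFFSET 20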
Let's say we have a PostgreSQL table called tasks with 100 million rows in it.
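If you want to reproduce the experiment, a minimal setup might look like this. The schema is an assumption (the original column layout is not shown); only the primary key matters for the plans below:

-- Hypothetical schema; only the id primary key is relevant here
CREATE TABLE tasks (
    id bigserial PRIMARY KEY,
    payload text
);

-- Populate with 100 million rows (this takes a while)
INSERT INTO tasks (payload)
SELECT 'task ' || n FROM generate_series(1, 100000000) AS n;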
SELECT * FROM tasks ORDER BY id LIMIT 10 OFFSET 10
Now take a look at the query plan, which is the result of running EXPLAIN <query>.
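For the query above, that means running:

EXPLAIN SELECT * FROM tasks ORDER BY id LIMIT 10 OFFSET 10

which produces: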
Limit (cost=1.34..2.11 rows=10 width=628)
-> Index Scan using tasks_pkey on tasks
(cost=0.57..5439151.65 rows=100000000 width=628)
Each plan node shows four numbers, which the PostgreSQL documentation describes as follows:
- Estimated start-up cost. This is the time expended before the output phase can begin, e.g., time to do the sorting in a sort node.
- Estimated total cost. This is stated on the assumption that the plan node is run to completion, i.e., all available rows are retrieved. In practice a node's parent node might stop short of reading all available rows.
- Estimated number of rows output by this plan node. Again, the node is assumed to be run to completion.
- Estimated average width of rows output by this plan node (in bytes).
We can then break down the query plan using this knowledge:
- Limit: it returns 10 rows. The cost of executing it is estimated to be between 1.34 and 2.11, and each result row is 628 bytes wide.
- Index Scan: it uses an index, which is good for performance. The cost is estimated to be between 0.57 and 5,439,151.65. However, since there is a Limit node above it, the scan will never actually read all 100 million rows, so the real cost stays close to the start-up estimate. The width is the same as in the Limit step.
So far, this doesn't look bad. But don't jump to conclusions yet; let's increase the offset.
SELECT * FROM tasks ORDER BY id LIMIT 10 OFFSET 1000000
Limit (cost=77270.24..77271.02 rows=10 width=628)
-> Index Scan using tasks_pkey on tasks
(cost=0.57..5439162.17 rows=100000000 width=628)
As you can see, the Limit step has become much more expensive: its start-up cost alone is now 77,270.24, while the Index Scan has remained almost the same. The more rows we skip with OFFSET, the heavier the query becomes. Why is that? The answer is in the PostgreSQL documentation:
The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET might be inefficient.
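If you want to confirm the estimates with real timings, EXPLAIN ANALYZE actually executes the query and reports measured times (the exact numbers will depend on your hardware and cache state):

EXPLAIN ANALYZE SELECT * FROM tasks ORDER BY id LIMIT 10 OFFSET 1000000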
An alternative approach, often called keyset pagination, is to use "WHERE id > N" instead of OFFSET:
SELECT * FROM tasks WHERE id > 1000000 ORDER BY id LIMIT 10
Limit (cost=0.57..1.36 rows=10 width=628)
-> Index Scan using tasks_pkey on tasks
(cost=0.57..5504715.15 rows=100000000 width=628)
Index Cond: (id > 1000000)
As you can see, we now have an additional index condition (id > 1000000) that is applied while scanning the index: instead of reading and discarding a million rows, the scan starts directly at the first matching index entry. This narrows the search space and keeps the cost low no matter how deep into the result set we go.
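In practice, the application remembers the last id it returned and feeds it into the next query. A minimal sketch, assuming :last_seen_id is a bind parameter supplied by your application code (it is a hypothetical name, not part of the original example):

-- First page: no cursor yet
SELECT * FROM tasks ORDER BY id LIMIT 10

-- Every following page: start right after the last row already shown
SELECT * FROM tasks WHERE id > :last_seen_id ORDER BY id LIMIT 10

The trade-off is that this only supports moving relative to a known row; jumping straight to an arbitrary page number still requires OFFSET or a different strategy.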