Five ways to paginate in Postgres, from the basic to the exotic
However the PostgreSQL statistics collector maintains per-column histograms of value distribution. We can use these estimates in conjunction with limits and small offsets to get fast random-access pagination through a hybrid approach.
First let’s look at the statistics of our medley:
<span class="k">SELECT</span><span style="color: #dddddd;"> <span class="n">array_length</span><span class="p">(</span><span class="n">histogram_bounds</span><span class="p">,</span> </span><span class="mi">1</span><span style="color: #dddddd;"><span class="p">)</span> </span><span class="o">-</span> <span class="mi">1</span> <span class="k">FROM</span><span style="color: #dddddd;"> <span class="n">pg_stats</span></span> <span class="k">WHERE</span><span style="color: #dddddd;"> <span class="n">tablename</span> </span><span class="o">=</span> <span class="s1">'medley'</span> <span class="k">AND</span><span style="color: #dddddd;"> <span class="n">attname</span> </span><span class="o">=</span> <span class="s1">'n'</span><span class="p"><span style="color: #dddddd;">;</span></span>
In my database the column n has 101 bound-markers, i.e. 100 ranges between bound-markers. The particular values aren’t too surprising because my data is uniformly distributed
<span class="p"><span style="color: #dddddd;">{</span></span><span class="mi">719</span><span class="p"><span style="color: #dddddd;">,</span></span><span class="mi">103188</span><span class="p"><span style="color: #dddddd;">,</span></span><span class="mi">193973</span><span class="p"><span style="color: #dddddd;">,</span></span><span class="mi">288794</span><span class="p"><span style="color: #dddddd;">,</span></span> <span class="err">…</span> <span class="p"><span style="color: #dddddd;">,</span></span><span class="mi">9690475</span><span class="p"><span style="color: #dddddd;">,</span></span><span class="mi">9791775</span><span class="p"><span style="color: #dddddd;">,</span></span><span class="mi">9905770</span><span class="p"><span style="color: #dddddd;">,</span></span><span class="mi">9999847</span><span class="p"><span style="color: #dddddd;">}</span></span>
Notice that the values are approximate. The first number is not exactly zero, and the last is not exactly ten million. The ranges divide our information into a block size B = 10,000,000 / 100 = 100,000 rows.
We can use the histogram ranges from the PostgreSQL stats collector to obtain probabilistically correct pages. If we choose a client-side page width of W how do we request the ith page? It will reside in block iW / B, at offset iW % B.
Choosing W=20 let’s request page 270,000 from the medley table. Note that PostgreSQL arrays are one-based so we have to adjust the values in the array lookups:
<span class="k">WITH</span><span style="color: #dddddd;"> <span class="n">bookmark</span> </span><span class="k">AS</span><span style="color: #dddddd;"> <span class="p">(</span></span> <span class="k">SELECT</span><span style="color: #dddddd;"> <span class="p">(</span><span class="n">histogram_bounds</span><span class="p">::</span><span class="n">text</span><span class="p">::</span><span class="n">int</span><span class="p">[])[((</span></span><span class="mi">270000</span> <span class="o">*</span> <span class="mi">20</span><span style="color: #dddddd;"><span class="p">)</span> </span><span class="o">/</span> <span class="mi">100000</span><span class="p"><span style="color: #dddddd;">)</span></span><span class="o">+</span><span class="mi">1</span><span style="color: #dddddd;"><span class="p">]</span> </span><span class="k">AS</span> <span class="k">start</span><span class="p"><span style="color: #dddddd;">,</span></span> <span style="color: #dddddd;"> <span class="p">(</span><span class="n">histogram_bounds</span><span class="p">::</span><span class="n">text</span><span class="p">::</span><span class="n">int</span><span class="p">[])[((</span></span><span class="mi">270000</span> <span class="o">*</span> <span class="mi">20</span><span style="color: #dddddd;"><span class="p">)</span> </span><span class="o">/</span> <span class="mi">100000</span><span class="p"><span style="color: #dddddd;">)</span></span><span class="o">+</span><span class="mi">2</span><span style="color: #dddddd;"><span class="p">]</span> </span><span class="k">AS</span><span style="color: #dddddd;"> <span class="n">stop</span></span> <span class="k">FROM</span><span style="color: #dddddd;"> <span class="n">pg_stats</span></span> <span class="k">WHERE</span><span style="color: #dddddd;"> <span class="n">tablename</span> </span><span class="o">=</span> <span class="s1">'medley'</span> <span class="k">AND</span><span style="color: #dddddd;"> <span class="n">attname</span> </span><span class="o">=</span> <span class="s1">'n'</span> <span class="k">LIMIT</span> <span class="mi">1</span> <span style="color: #dddddd;"> <span class="p">)</span></span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span><span style="color: #dddddd;"> <span class="n">medley</span></span> <span class="k">WHERE</span><span style="color: #dddddd;"> <span class="n">n</span> </span><span class="o">>=</span><span style="color: #dddddd;"> <span class="p">(</span></span><span class="k">select</span> <span class="k">start</span> <span class="k">from</span><span style="color: #dddddd;"> <span class="n">bookmark</span><span class="p">)</span></span> <span class="k">AND</span><span style="color: #dddddd;"> <span class="n">n</span> </span><span class="o"><</span><span style="color: #dddddd;"> <span class="p">(</span></span><span class="k">select</span><span style="color: #dddddd;"> <span class="n">stop</span> </span><span class="k">from</span><span style="color: #dddddd;"> <span class="n">bookmark</span><span class="p">)</span></span> <span class="k">ORDER</span> <span class="k">BY</span><span style="color: #dddddd;"> <span class="n">n</span> </span><span class="k">ASC</span> <span class="k">LIMIT</span> <span class="mi">20</span> <span class="k">OFFSET</span><span style="color: #dddddd;"> <span class="p">((</span></span><span class="mi">270000</span> <span class="o">*</span> <span class="mi">20</span><span style="color: #dddddd;"><span class="p">)</span> </span><span class="o">%</span> <span class="mi">100000</span><span class="p"><span style="color: #dddddd;">);</span></span>
This performs blazingly fast (notice the offset happens to be zero here). It gives back rows with n = 5407259 through 5407278. The true values on page 270000 are n = 5400001 through 5400020. The values is off by 7239, or about 0.1%.