There’s one big reason that Postgres can’t kill Oracle, and it’s not the technology

Postgres is popular but can’t unseat Oracle due to its business practices. Here’s how Oracle wins with its contracts and database stickiness.

A few years back, EnterpriseDB, whose products are built on Postgres, offered enterprises something they shouldn't have been able to refuse: all the Oracle database goodness without any of the Oracle. Enterprises were intrigued, but the efforts foundered. The problem? Contracts. As Keith Alsheimer, chief marketing officer for EnterpriseDB, explained:

Some of the practices that Oracle has of locking customers in contractually is a real challenge. Even if they want to move over [to another database system], they still have this number of licenses they bought, and they have to pay support for them [even if those licenses remain unused]. It’s very hard to get out of that.

It’s not merely a matter of contracts, but also how Oracle sells. Over the years, Oracle has acquired a bevy of applications (ERP, CRM, SCM, etc.) and tends not to certify that those applications will work with any other database. They probably would work fine, but not many CIOs would take that risk, especially given that doing so would violate their (wait for it!) contracts.

PostgreSQL Domain Integrity In Depth

CREATE EXTENSION uri;

CREATE DOMAIN http_uri AS uri CHECK (
      uri_scheme(VALUE) IS NOT NULL
  AND uri_scheme(VALUE) IN ('http', 'https')
  AND uri_host(VALUE) IS NOT NULL
  AND uri_host(VALUE) <> ''
);

-- works great
SELECT 'https://www.foo.com/bar/baz'::http_uri;

-- forbidden
SELECT 'ftp://www.foo.com/bar/baz'::http_uri;
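The same two CHECK clauses can be sketched outside the database for comparison — here in Python, using the standard library's urlsplit. This is only an approximation of the pguri functions above: urlsplit's netloc stands in for uri_host, and the function name is my own.

```python
from urllib.parse import urlsplit

def is_http_uri(text: str) -> bool:
    """Approximate the http_uri domain's CHECK clauses:
    scheme must be http or https, and the host must be non-empty."""
    parts = urlsplit(text)
    return parts.scheme in ('http', 'https') and parts.netloc != ''
```

As with the domain, 'https://www.foo.com/bar/baz' passes and 'ftp://www.foo.com/bar/baz' is rejected.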

Domain checks can also call functions. Here, for instance, is a Luhn checksum function for validating arrays of credit card digits:

CREATE FUNCTION is_valid_cc(smallint[]) RETURNS boolean AS $$
  SELECT SUM(
    CASE WHEN (pos % 2 = 0) THEN
      2*digit - (CASE WHEN digit < 5 THEN 0 ELSE 9 END)
    ELSE
      digit
    END
  ) % 10 = 0
  FROM
    unnest(
      ARRAY( -- loop over digit/position
        SELECT $1[i] -- ... which we read backward
        FROM generate_subscripts($1,1) AS s(i)
        ORDER BY i DESC
      )
    ) WITH ORDINALITY AS t (digit, pos);
$$
LANGUAGE SQL;
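To sanity-check the SQL, here is a direct Python transcription of the same Luhn logic (the function name mirrors the SQL; everything else is a plain restatement of the query above):

```python
def is_valid_cc(digits):
    """Luhn check: walk the digits right to left, doubling every
    second one; 2*d - 9 folds a two-digit double back to one digit,
    exactly as the CASE expression does in the SQL version."""
    total = 0
    for pos, digit in enumerate(reversed(digits), start=1):
        if pos % 2 == 0:
            total += 2 * digit - (0 if digit < 5 else 9)
        else:
            total += digit
    return total % 10 == 0
```

For example, the standard Luhn test number 79927398713 passes as [7,9,9,2,7,3,9,8,7,1,3], and changing any single digit makes the check fail.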

Five ways to paginate in Postgres, from the basic to the exotic

However, the PostgreSQL statistics collector maintains per-column histograms of the value distribution. We can use these estimates, in conjunction with limits and small offsets, to get fast random-access pagination through a hybrid approach.

First let’s look at the statistics of our medley:

SELECT array_length(histogram_bounds, 1) - 1
  FROM pg_stats
 WHERE tablename = 'medley'
   AND attname = 'n';

In my database the column n has 101 bound-markers, i.e. 100 ranges between bound-markers. The particular values aren't too surprising because my data is uniformly distributed.

{719,103188,193973,288794,…,9690475,9791775,9905770,9999847}

Notice that the values are approximate. The first number is not exactly zero, and the last is not exactly ten million. The ranges divide our information into a block size B = 10,000,000 / 100 = 100,000 rows.

We can use the histogram ranges from the PostgreSQL stats collector to obtain probabilistically correct pages. If we choose a client-side page width of W how do we request the ith page? It will reside in block iW / B, at offset iW % B.
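The block/offset arithmetic is simple enough to sketch on its own (the function name is mine; division is integer division, and pages are numbered from zero as in the text):

```python
def histogram_bookmark(page, width, block_size):
    """Map page i to the histogram block that holds it and the offset
    inside that block: block = i*W // B, offset = i*W % B."""
    skipped = page * width  # rows preceding the requested page
    return skipped // block_size, skipped % block_size
```

With W=20 and B=100,000, page 270,000 lands exactly at the start of block 54, which is why the OFFSET in the query below turns out to be zero.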

Choosing W=20, let's request page 270,000 from the medley table. Note that PostgreSQL arrays are one-based, so we have to adjust the values in the array lookups:

WITH bookmark AS (
  SELECT (histogram_bounds::text::int[])[((270000 * 20) / 100000)+1] AS start,
         (histogram_bounds::text::int[])[((270000 * 20) / 100000)+2] AS stop
    FROM pg_stats
   WHERE tablename = 'medley'
     AND attname = 'n'
   LIMIT 1
)
SELECT *
  FROM medley
 WHERE n >= (SELECT start FROM bookmark)
   AND n < (SELECT stop FROM bookmark)
 ORDER BY n ASC
 LIMIT 20
OFFSET ((270000 * 20) % 100000);

This performs blazingly fast (notice that the offset happens to be zero here). It gives back rows with n = 5407259 through 5407278. The true values on page 270000 are n = 5400001 through 5400020. The estimate is off by 7239, or about 0.1%.