PostgreSQL Domain Integrity In Depth

URIs

For example, consider URIs. The naive way to store them is in a text field. One step better is to add a regex to match URI syntax, and wrap the check in a custom domain as we have done so many times in this article already.

.. To extract the pieces-to-be from a raw blob of address data you can use PostgreSQL extension helper functions. PostGIS provides a basic parser, and then there’s the pgsql-postal extension powered by libpostal. The latter is the big gun, but may consume large amounts of RAM during its operation.

Email Addresses

Email addresses are case insensitive, so it makes sense to represent them that way. Also it’s not a concern of domain integrity per se, but Web applications often add a uniqueness constraint on user emails. That’s because it doesn’t make sense to use the same email address between multiple user accounts. The constraint ought to prevent duplicates including those of case variation. One way to do this is make a unique index on a text expression, like

CREATE UNIQUE INDEX users_lower_email_key
ON users (LOWER(email));
Unfortunately it’s touchy. Any queries filtering the users table by email must remember to lowercase the prospective value. It would be better if any comparison between emails was case insensitive so that nobody has to remember to explicitly lowercase the values.

This situation is perfect for the citext (aka Case Insensitive Text) extension. It’s a type that stores text verbatim, but compares without regard to case.

Enumerations

A few RDBMSes (PostgreSQL and MySQL) have a special enum type that ensures a variable or column must be one of a certain list of values. This is also enforcible with custom domains.

CREATE TYPE log_level AS ENUM
(‘notice’, ‘warning’, ‘error’, ‘severe’);

— allows convenient queries like
SELECT * FROM log WHERE level >= ‘warning’;
This is the kind of logic you would otherwise have to implement yourself.

If you do want to use the enum type, then changing its values requires some special DDL commands.

Normally CHECK conditions in domains have limited complexity. They can’t contain subqueries, or reference other rows. However using stored procedures we can get around these restrictions.

PostgreSQL Domain Integrity In Depth

<span class="kw">CREATE</span> EXTENSION uri;

<span class="kw">CREATE</span> <span class="kw">DOMAIN</span> http_uri <span class="kw">AS</span> uri <span class="kw">CHECK</span> (
      uri_scheme(<span class="fu">VALUE</span>) <span class="kw">IS</span> <span class="kw">NOT</span> <span class="kw">NULL</span>
  <span class="kw">AND</span> uri_scheme(<span class="fu">VALUE</span>) <span class="kw">IN</span> (<span class="st">'http'</span>, <span class="st">'https'</span>)
  <span class="kw">AND</span> uri_host(<span class="fu">VALUE</span>) <span class="kw">IS</span> <span class="kw">NOT</span> <span class="kw">NULL</span>
  <span class="kw">AND</span> uri_host(<span class="fu">VALUE</span>) <> <span class="st">''</span>
);

<span class="co">-- works great</span>
<span class="kw">SELECT</span> <span class="st">'https://www.foo.com/bar/baz'</span>:<span class="ch">:http_uri</span>;

<span class="co">-- forbidden</span>
<span class="kw">SELECT</span> <span class="st">'ftp://www.foo.com/bar/baz'</span>:<span class="ch">:http_uri</span>;
CREATE FUNCTION is_valid_cc(smallint[]) RETURNS boolean AS $$
SELECT SUM(
CASE WHEN (pos % 2 = 0) THEN
2*digit - (CASE WHEN digit < 5 THEN 0 ELSE 9 END)
ELSE
digit
END
) % 10 = 0
FROM
unnest(ARRAY( -- loop over digit/position
SELECT $1[i] -- ... which we read backward
FROM generate_subscripts($1,1) AS s(i)
ORDER BY i DESC
)
) WITH ordinality AS t (digit, pos)
$$
LANGUAGE SQL