So the difficulty of a dynamic pivot is: in an SQL query, the output columns must be determined before execution. But to know which columns are formed from transposing rows, we’d need to execute the query first. To solve this chicken-and-egg problem, we need to loosen the constraints a bit.
Returning a nested structure inside a column
Firstly, a SQL query can return the pivoted part encapsulated inside a single column with a composite or array type, rather than as multiple columns. This type could be array[text], JSON, XML… This solution is used by Oracle with its PIVOT XML clause. It’s a one-step method, which is good, but the result has a non-tabular structure that does not necessarily match the user’s expectations.
Here is an example using modern PostgreSQL with JSON:
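-- A sketch with a hypothetical sales(region, year, amount) table:
-- each year becomes a key inside a JSON object instead of a column.
SELECT region,
       json_object_agg(year, total ORDER BY year) AS yearly_totals
FROM (
  SELECT region, year, sum(amount) AS total
  FROM sales
  GROUP BY region, year
) s
GROUP BY region;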
Without having to enumerate the years in the query, we get the same data as above, but as 2 columns: the first for the “vertical axis”, and the next one for all the rest in JSON format.
That somehow does the job, but the alignment is visually poor, and if we want to copy-paste this result into a spreadsheet or import it as tabular data, it clearly won’t work.
Getting a tabular result in two steps
The other SQL-based solutions rely on the idea of a two-step process:
a first query builds the result with all its columns, and returns an indirect reference to this result.
a second query returns the result set, now that its structure is known to the SQL engine thanks to the previous step.
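For example, here is a sketch of this approach, reusing the hypothetical sales(region, year, amount) table from above: a plpgsql function builds the pivoted query dynamically and returns a refcursor, and the caller fetches from that cursor once the result structure is known.

CREATE FUNCTION pivot_sales(c refcursor) RETURNS refcursor AS $$
DECLARE
  cols text;
BEGIN
  -- Step 1: build one output column per distinct year.
  SELECT string_agg(
           format('sum(amount) FILTER (WHERE year = %L) AS %I', year, year),
           ', ' ORDER BY year)
    INTO cols
  FROM (SELECT DISTINCT year FROM sales) AS y;

  OPEN c FOR EXECUTE
    format('SELECT region, %s FROM sales GROUP BY region ORDER BY region', cols);
  RETURN c;
END;
$$ LANGUAGE plpgsql;

-- Step 2: in the same transaction, fetch the now fully-typed result set.
BEGIN;
SELECT pivot_sales('pivoted');
FETCH ALL IN pivoted;
COMMIT;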
Ever worked with temperature ranges, calendar scheduling, price ranges and the like? Working with intervals is one of those deceptively simple things that gently leads you into hair-pulling and late-night debugging.
Here’s a table with a range column, and some values:
CREATE TABLE prices (
item text,
price int4range -- int4range is a range of regular integers
);
INSERT INTO prices VALUES ('mouse', '[10,16)');
INSERT INTO prices VALUES ('keyboard', '[20,31)');
INSERT INTO prices VALUES ('joystick', '[35,56)');
The mismatched brackets denote a half-open interval: the lower bound is included and the upper bound is excluded, so '[10,16)' means $10 up to but not including $16. Here is the query to find all items that are in the price range $15 to $30, using the && operator (range overlap):
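-- '[15,30]' is the inclusive range $15..$30; && tests for overlap.
SELECT * FROM prices WHERE price && '[15,30]'::int4range;

   item   |  price
----------+---------
 mouse    | [10,16)
 keyboard | [20,31)
(2 rows)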
PostgreSQL has supported arrays for a long time. Array types can reduce the amount of boilerplate application code and simplify queries. Here is a table that uses an array column:
CREATE TABLE posts (
title text NOT NULL PRIMARY KEY,
tags text[]
);
Assuming each row represents a blog post, each having a set of tags, here is how we can list all the posts that have both “postgres” and “go” tags:
test=# SELECT title, tags FROM posts WHERE '{"postgres", "go"}' <@ tags;
title | tags
-----------------------------------+------------------------
Writing PostgreSQL Triggers in Go | {postgres,triggers,go}
(1 row)
The usage of the array type here makes for concise data modelling and simpler queries. Postgres arrays come with operators and functions, including aggregate functions. You can also create indexes on array expressions. Here is an article on using arrays with Go.
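As a quick sketch using the posts table above: unnest() turns array elements into rows so that ordinary aggregates apply, and a GIN index on the array column (more on GIN below) speeds up containment queries like the one we just ran:

-- Count how many posts carry each tag.
SELECT tag, count(*) AS n_posts
FROM posts, unnest(tags) AS tag
GROUP BY tag
ORDER BY n_posts DESC;

-- Index the array column for fast @> / <@ lookups.
CREATE INDEX posts_tags_idx ON posts USING gin (tags);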
pg_stat_statements
pg_stat_statements is an extension that is present by default in your PostgreSQL distribution, but is not enabled. This extension records a wealth of information about each statement executed, including the time taken, the memory used and the disk I/O initiated. It’s indispensable for understanding and debugging query performance issues.
The overhead of installing and enabling this extension is small, and it’s very simple to use, so there is no reason NOT to run it on your production servers! Read the docs for more info.
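Here is a minimal sketch of getting started with it (note that the exact column names vary across versions; total_exec_time was called total_time before PostgreSQL 13):

-- Requires shared_preload_libraries = 'pg_stat_statements' in
-- postgresql.conf, followed by a server restart.
CREATE EXTENSION pg_stat_statements;

-- The five statements consuming the most total execution time:
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;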
Hash, GIN and BRIN Indexes
The default index type in PostgreSQL is the B-Tree, but there are also other types, documented here. These other index types are very helpful in cases that are more common than you might think. In particular, setting up indexes of the hash, GIN and BRIN type might just be the solution to your performance issues (see the sketch after this list):
Hash: Unlike B-Tree indexes, which have inherent ordering, hash indexes are unordered and can only do equality matches (lookups). However, hash indexes occupy much less space and are faster than B-Trees for equality matches. (Also, note that prior to PostgreSQL 10 it was not possible to replicate hash indexes, as they were unlogged.)
GIN: GIN is an inverted index, which essentially allows multiple values for a single key. GIN indexes are useful for indexing arrays, JSON, ranges, full-text search, etc.
BRIN: If your data has a specific, natural order – for example, time series data – and your queries typically work only with a small range of it, then BRIN indexes can speed up your queries with very little overhead. BRIN indexes maintain ranges per block of data, allowing the optimizer to skip over blocks that contain rows that won’t be selected by the query.
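As a sketch, assuming hypothetical users and events tables (and the posts table from the array example above), creating each index type looks like this:

CREATE INDEX ON users USING hash (email);        -- equality lookups only
CREATE INDEX ON posts USING gin (tags);          -- arrays, JSON, full-text search
CREATE INDEX ON events USING brin (created_at);  -- naturally ordered data, e.g. time series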
A not-too-well-known aspect of SQL is that it’s Turing complete. This means any program you can write in a general-purpose programming language, you can also write in SQL. In this post, I’ll show you how to use some of the dark corners of SQL to generate fractals. We’ll ultimately wind up with a set of queries that can generate an entire class of fractals known as the “escape-time fractals”.
Common Table Expressions
Before we begin, I’m going to go over one very important SQL feature that I’ll use a ton throughout this post: CTEs (common table expressions). CTEs give you a way to define what are effectively variables in SQL. They are really nice for large SQL queries because they let you break the query down into smaller pieces.
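For example, here is a trivial (hypothetical) query that names an intermediate result and then selects from it:

WITH squares AS (
  SELECT n, n * n AS square
  FROM generate_series(1, 5) AS n
)
SELECT n, square FROM squares WHERE square > 10;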