
SQL: Using FILTER to turn EAV (entity/attribute/value) rows into rows of entities

#302, APRIL 24, 2019

This issue’s Tip of the Week looks at SQL’s FILTER clause. Scroll to the end of this issue to check it out.

Postgres Weekly

.. snip ..


Tip of the Week

Using FILTER to turn entity/value tables into rows of entities

An entire article could be written on this topic, but I wanted to show off the most basic use of SQL:2003’s FILTER clause that was added to Postgres 9.4.

Let’s say you have a table called props that represents entities, attributes, and values, with an integer id column and textual attr and val columns, and the following contents:
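As a stand-in for the table contents, here is one hypothetical way to set up such a table. The column names attr and val are taken from the query in this tip; the column constraints and all of the row values are invented for illustration and are not the original data.

```sql
-- Hypothetical EAV table: one row per (entity id, attribute, value).
CREATE TABLE props (
    id   integer NOT NULL,  -- identifies the entity
    attr text    NOT NULL,  -- attribute name, e.g. 'name'
    val  text               -- attribute value, always stored as text
);

-- Invented sample rows: two entities with three attributes each.
INSERT INTO props (id, attr, val) VALUES
    (1, 'name', 'Alice'),
    (1, 'age',  '31'),
    (1, 'city', 'Paris'),
    (2, 'name', 'Bob'),
    (2, 'age',  '45'),
    (2, 'city', 'Oslo');
```

Note that age is stored as text here, as is typical for EAV tables, so it would need a cast (for example val::integer) before any numeric comparison.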

The FILTER clause essentially adds an extra WHERE clause to aggregate functions (such as MIN, MAX and SUM) allowing you to scope them.
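A minimal sketch of that scoping, using generate_series so that no table is needed (the column alias n and the output names are arbitrary choices for this example):

```sql
-- Each aggregate sees only the rows that pass its own FILTER condition.
SELECT count(*)                          AS total,          -- 10
       count(*) FILTER (WHERE n % 2 = 0) AS evens,          -- 5
       sum(n)   FILTER (WHERE n > 5)     AS sum_over_five   -- 6+7+8+9+10 = 40
FROM generate_series(1, 10) AS t(n);
```

All three aggregates run over the same ten rows in a single pass; only the rows each one counts differ.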

This comes in very handy for pulling out values from our val column based upon the value of the attr column, therefore allowing us to turn a table of entities, attributes and values into a more classical set of columns.

SELECT id,
       MAX(val) FILTER (WHERE attr = 'name') AS name,
       MAX(val) FILTER (WHERE attr = 'age') AS age,
       MAX(val) FILTER (WHERE attr = 'city') AS city
FROM props
GROUP BY id;

You might need to reproduce this table and play with the query to get a feel for what’s going on, but essentially we group the rows by id (i.e. the ID of the underlying entity) and then, within each group, pick the val whose attr matches a given attribute name, which lets us extract the name, age, and city values as columns. MAX works fine as an aggregate function here because each entity has only one attr/val pair per attribute.
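The one-pair-per-attribute assumption matters. The sketch below uses invented inline rows (a CTE named props, so no real table is required) to show what happens when an entity has two values for the same attribute: MAX silently keeps only the greatest one, while an aggregate such as array_agg keeps them all.

```sql
-- Hypothetical data: entity 1 has two 'city' rows.
WITH props(id, attr, val) AS (
    VALUES (1, 'name', 'Alice'),
           (1, 'city', 'Paris'),
           (1, 'city', 'Berlin')
)
SELECT id,
       MAX(val) FILTER (WHERE attr = 'name') AS name,
       -- MAX keeps only the greatest value by text ordering: 'Paris'
       MAX(val) FILTER (WHERE attr = 'city') AS max_city,
       -- array_agg keeps every matching value: {Berlin,Paris}
       array_agg(val ORDER BY val) FILTER (WHERE attr = 'city') AS all_cities
FROM props
GROUP BY id;
```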

The FILTER clause has a lot more uses than this, but I felt this was both a pretty neat and perhaps unexpected example of its use.

You can learn more in this article and in the official Postgres documentation.


Be careful with CTE in PostgreSQL

> A lesser known fact about CTE in PostgreSQL is that the database will evaluate the query inside the CTE and store the results.

PostgreSQL materialized the CTE, meaning it created a temporary structure with the results of the query defined in the CTE, and only then applied the filter to it. Because the predicate was applied to the CTE rather than to the table, PostgreSQL was unable to use the index on the ID column.

Unlike PostgreSQL, Oracle does not materialize CTEs by default, so the two queries generate the same execution plan.
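The effect can be reproduced with a sketch like the one below. The table foo, its columns, and the generated rows are hypothetical. One caveat on versions: the always-materialize behavior described above applies to PostgreSQL 11 and earlier. From PostgreSQL 12 onward, a non-recursive, side-effect-free CTE that is referenced only once is inlined by default, and the MATERIALIZED and NOT MATERIALIZED keywords let you choose explicitly.

```sql
-- Hypothetical table with an index on id (via the primary key).
CREATE TABLE foo (
    id      integer PRIMARY KEY,
    payload text
);

INSERT INTO foo (id, payload)
SELECT n, md5(n::text)
FROM generate_series(1, 100000) AS g(n);

ANALYZE foo;

-- Predicate applied directly to the table: the planner can use the index.
EXPLAIN
SELECT * FROM foo WHERE id = 500;

-- Same predicate applied to a CTE. On PostgreSQL 11 and earlier the CTE is
-- computed in full first, so the index on id cannot be used for the filter.
EXPLAIN
WITH cte AS (SELECT * FROM foo)
SELECT * FROM cte WHERE id = 500;

-- PostgreSQL 12+ only: force the old behavior, or explicitly allow inlining.
EXPLAIN
WITH cte AS MATERIALIZED (SELECT * FROM foo)
SELECT * FROM cte WHERE id = 500;

EXPLAIN
WITH cte AS NOT MATERIALIZED (SELECT * FROM foo)
SELECT * FROM cte WHERE id = 500;
```

Comparing the plans should show an index scan for the direct query and a CTE scan with a filter for the materialized variants.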

PostgreSQL’s Powerful New Join Type: LATERAL

PostgreSQL 9.3 has a new join type! Lateral joins arrived without a lot of fanfare, but they enable some powerful new queries that were previously only tractable with procedural code. In this post, I’ll walk through a conversion funnel analysis that wouldn’t be possible in PostgreSQL 9.2.

What is a LATERAL join?

The best description in the documentation comes at the bottom of the list of FROM clause options:

The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)
…
When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the LATERAL item is evaluated using that row or row set’s values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s).
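To make the evaluation rule concrete, here is a small sketch that is not the funnel analysis from the post, just a common "latest row per entity" query. The users and events data are invented inline so the statement runs on its own. The subquery refers to u.id, which is only legal because of LATERAL, and it is evaluated once per users row.

```sql
-- Hypothetical inline data: three users, one of whom has no events.
WITH users(id, name) AS (
    VALUES (1, 'ann'), (2, 'bob'), (3, 'cat')
),
events(user_id, kind, occurred_on) AS (
    VALUES (1, 'view_homepage',     DATE '2019-04-01'),
           (1, 'enter_credit_card', DATE '2019-04-03'),
           (2, 'view_homepage',     DATE '2019-04-02')
)
SELECT u.name, latest.kind, latest.occurred_on
FROM users AS u
LEFT JOIN LATERAL (
    -- Evaluated once for each row of u; u.id is the lateral cross-reference.
    SELECT e.kind, e.occurred_on
    FROM events AS e
    WHERE e.user_id = u.id
    ORDER BY e.occurred_on DESC
    LIMIT 1
) AS latest ON true
ORDER BY u.id;
```

LEFT JOIN ... ON true keeps users with no events (here, cat) and fills the lateral columns with NULL; a plain comma or CROSS JOIN LATERAL would drop them.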

What are the options for storing hierarchical data in a relational database?

Generally speaking, you’re making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually, you end up with a combination of the options below that best fit your needs. The following provides some in-depth reading:

One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set and Nested Interval I’ve found.

Models for hierarchical data: slides with good explanations of tradeoffs and example usage

Representing hierarchies in MySQL: very good overview of Nested Set in particular

Hierarchical data in RDBMSs: most comprehensive and well-organized set of links I’ve seen, but not much in the way of explanation
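As a small illustration of the adjacency list end of that tradeoff, here is a sketch in PostgreSQL. The category table and its rows are hypothetical. Writes are cheap because each row stores only its parent, while reading a whole subtree needs a recursive query.

```sql
-- Hypothetical adjacency list: each row points at its parent (NULL for the root).
CREATE TABLE category (
    id        integer PRIMARY KEY,
    parent_id integer REFERENCES category (id),
    name      text NOT NULL
);

INSERT INTO category (id, parent_id, name) VALUES
    (1, NULL, 'root'),
    (2, 1,    'databases'),
    (3, 2,    'postgres'),
    (4, 2,    'mysql'),
    (5, 1,    'languages');

-- Moving a subtree is a single-row update.
UPDATE category SET parent_id = 1 WHERE id = 3;

-- Reading a subtree walks the parent links with a recursive CTE.
WITH RECURSIVE subtree AS (
    SELECT id, parent_id, name, 0 AS depth
    FROM category
    WHERE id = 1                      -- start at the root
    UNION ALL
    SELECT c.id, c.parent_id, c.name, s.depth + 1
    FROM category AS c
    JOIN subtree AS s ON c.parent_id = s.id
)
SELECT id, name, depth
FROM subtree
ORDER BY depth, id;
```

A nested set model would answer the subtree read with a simple range condition instead, at the cost of renumbering many rows on each insert or move.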