PostgreSQL’s Powerful New Join Type: LATERAL

PostgreSQL 9.3 has a new join type! Lateral joins arrived without a lot of fanfare, but they enable some powerful new queries that were previously only tractable with procedural code. In this post, I’ll walk through a conversion funnel analysis that wouldn’t be possible in PostgreSQL 9.2.

What is a LATERAL join?

The best description in the documentation comes at the bottom of the list of FROM clause options:

The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROMitem.)

When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the LATERAL item is evaluated using that row or row set’s values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s).


Saving a Tree in Postgres Using LTREE

In my last post, I showed you how to install and enable a Postgres extension called LTREE. LTREE allows me to save, query on and manipulate trees or hierarchical data structures using a relational database table. As we’ll see, using LTREE I can count leaves, cut off branches, and climb up and down trees easily – all using SQL right inside my application’s existing Postgres database!

But trees are natural, haphazard, branching structures with countless leaves, while database tables are man-made rectangles full of numbers and text. How can I possibly save a beautiful tree structure into an ugly, boring database table?

Path Enumeration

Let’s return to the example tree from the first post in this series:

The LTREE extension uses the path enumeration algorithm, which calls for each node in the tree to record the path from the root you would have to follow to reach that node.

Won’t You Be My Neighbor? Quickly Finding Who is Nearby

Many applications these days want us to know how close we are to things:

  • What are the three closest coffee shops to my current location?
  • Which is the nearest airport to the office?
  • What are the two closest subway stops to the restaurant?

and countless more examples.

Another way of asking these questions is to say “who are my nearest neighbors to me?” This maps to a classic algorithmic problem: efficiently finding the K-nearest neighbors (or K-NN), where K is a constant. For example, the first question would be a 3-NN problem as we are trying to find the 3 closest coffee shops.

(If you are interested in learning more about K-NN problems in general, I highly recommend looking at how you can solve this using n-dimensional Voronoi diagrams, a wonderful data structure developed in the field of computational geometry.)

PostgreSQL defines a distance operator for geometric types that looks like this “<->” that, in the case of points, calculates the 2-Dimensional Euclidean distance. For example:

SELECT POINT(0,0) <-> POINT(1,1);


.. If we want to find the three friends who were closest to us on October 1, 2012 between 7:00am and 9:00am, we could construct a query like this:

SELECT visitor, visited_at, geocode
FROM visits
    visited_at BETWEEN '2012-10-01 07:00' AND '2012-10-01 09:00'
ORDER BY POINT(40.7127263,-74.0066592) <-> geocode

PostgreSQL 9.1 introduced the KNN-GiST index as a way to accelerate searching for neighbors. It has been implemented on several data types, including points and trigrams, and is also leveraged by the PostGIS geospatial extension.

.. You can use a KNN-GiST index simply by creating a GiST index on a supported data type, which in this case, is the geocode column:

CREATE INDEX visits_geocode_gist_idx ON visits USING gist(geocode);

To demonstrate its power, let’s see what happens if we try to find the 3 nearest points to a given location:

EXPLAIN ANALYZE SELECT visitor, visited_at, geocode
FROM visits
ORDER BY POINT(40.7127263,-74.0066592) <-> geocode