Won’t You Be My Neighbor? Quickly Finding Who is Nearby

Many applications these days want us to know how close we are to things:

  • What are the three closest coffee shops to my current location?
  • Which is the nearest airport to the office?
  • What are the two closest subway stops to the restaurant?

and countless more examples.

Another way of asking these questions is to say “who are my nearest neighbors to me?” This maps to a classic algorithmic problem: efficiently finding the K-nearest neighbors (or K-NN), where K is a constant. For example, the first question would be a 3-NN problem as we are trying to find the 3 closest coffee shops.

(If you are interested in learning more about K-NN problems in general, I highly recommend looking at how you can solve this using n-dimensional Voronoi diagrams, a wonderful data structure developed in the field of computational geometry.)

PostgreSQL defines a distance operator for geometric types that looks like this “<->” that, in the case of points, calculates the 2-Dimensional Euclidean distance. For example:

SELECT POINT(0,0) <-> POINT(1,1);

    ?column?
-----------------
 1.4142135623731


.. If we want to find the three friends who were closest to us on October 1, 2012 between 7:00am and 9:00am, we could construct a query like this:

SELECT visitor, visited_at, geocode
FROM visits
WHERE
    visited_at BETWEEN '2012-10-01 07:00' AND '2012-10-01 09:00'
ORDER BY POINT(40.7127263,-74.0066592) <-> geocode
LIMIT 3;


PostgreSQL 9.1 introduced the KNN-GiST index as a way to accelerate searching for neighbors. It has been implemented on several data types, including points and trigrams, and is also leveraged by the PostGIS geospatial extension.

.. You can use a KNN-GiST index simply by creating a GiST index on a supported data type, which in this case, is the geocode column:

CREATE INDEX visits_geocode_gist_idx ON visits USING gist(geocode);
VACUUM ANALYZE visits;

To demonstrate its power, let’s see what happens if we try to find the 3 nearest points to a given location:

EXPLAIN ANALYZE SELECT visitor, visited_at, geocode
FROM visits
ORDER BY POINT(40.7127263,-74.0066592) <-> geocode
LIMIT 3;