Add “Faux” Predicates: Postgres Optimization

Made by Developers and Non-Developers

The query fetches sales that were modified before 2019. There is no index on this field, so the optimizer generates an execution plan to scan the entire table.

Let’s say you have another field in this table with the time the sale was created. Since it’s not possible for a sale to be modified before it was created, adding a similar condition on the created field won’t change the result of the query. However, the optimizer might use this information to generate a better execution plan:

db=# (
  SELECT *
  FROM sale
  WHERE modified < '2019-01-01 asia/tel_aviv'
  AND   created < '2019-01-01 asia/tel_aviv';
);
                                           QUERY PLAN
--------------------------------------------------------------------------------------------------
Index Scan using sale_created_ix on sale (cost=0.44..4.52 rows=1 width=276)
  Index Cond: (created < '2019-01-01 00:00:00+02'::timestamp with time zone)
  Filter: (modified < '2019-01-01 00:00:00+02'::timestamp with time zone)

After we added the “Faux Predicate” the optimizer decided to use the index on the created field, and the query got much faster! Note that the previous predicate on the modified field is still being evaluated, but it’s now being applied on much fewer rows.

A “Faux Predicate” should not change the result of the query. It should only be used to provide more information to the optimizer that can improve the query performance. Keep in mind that the database has to evaluate all the predicates, so adding too many might make a query slower.

Announcing Yugabyte DB 2.0 GA: Jepsen Tested, High-Performance Distributed SQL

We are excited to announce the general availability of Yugabyte DB 2.0! The highlight of this release is that it delivers production readiness for Yugabyte SQL (YSQL), our high-performance, fully-relational distributed SQL API. For those of you new to distributed SQL, Yugabyte DB is a Google Spanner-inspired, cloud-native distributed SQL database that is 100% open source. It puts a premium on high performance, data resilience, geographic distribution while ensuring PostgreSQL compatibility.

Ultimate PostgreSQL Slug Function

What is a slug?!

If you’re reading this, you probably know, but a slug is a URL/SEO friendly representation of a string. If for example you wrote an article:

The World’s “Best” Cafés!

You’d want a slug function to create a representation that looks a bit like:

the-worlds-best-cafes

Granted browsers can probably handle the accented “e” but it’s good to be over conservative for backwards compatibility.

PostgreSQL slug function

The most comprehensive slug function I’ve seen for Postgres (and in general) is by ianks on Github. Slug functions are one of those things that never seems to have a definitive “best and final” version, but this has been very reliable for me.

Quick and Dirty Address Matching with LibPostal

  • While it is not a 100% solution, using normalized addresses and full text search provides a relatively fast (less than 100ms) matching approach for loose address matching.

libpostal only has two operations, “address normalization” and “address parsing”, that are exposed by pgsql-postal with the postal_normalize() and postal_parse() functions.

Normalization takes an address string and converts it to all the standard forms that “make sense”. For example:

SELECT unnest(postal_normalize('390 Greenwich St., New york, ny, 10013'));
390 greenwich saint new york ny 10013
390 greenwich saint new york new york 10013
390 greenwich street new york ny 10013
390 greenwich street new york new york 10013

Parsing takes apart a string into address components and returns a JSONB of those components:

SELECT jsonb_pretty(postal_parse('390 Greenwich St., New york, ny, 10013'));
{                           
"city": "new york",     
"road": "greenwich st.",
"state": "ny",          
"postcode": "10013",    
"house_number": "390"   
}

We can use normalization to create a table of text searchable address strings, and then use full text search to efficiently search that table for potential matches for new addresses.

International Normalization

As we saw above, normalization takes raw address strings and turns them into “possible standard forms”, which are suitable for searching against. They aren’t necessarily the best forms, more regionally-aware parsing can do a better job of standard North American parsing and formatting, but where libpostal shines is being a ready-to-run fully international solution that doesn’t even need to be told what language it is working on.

For example, this address in Berlin:

SELECT unnest(postal_normalize('Potsdamer Straße 3, 10785 Berlin, Germany'));
potsdamer strasse 3 10785 berlin germany