Quick and Dirty Address Matching with LibPostal

  • While it is not a 100% solution, using normalized addresses and full text search provides a relatively fast (less than 100ms) matching approach for loose address matching.

libpostal only has two operations, “address normalization” and “address parsing”, that are exposed by pgsql-postal with the postal_normalize() and postal_parse() functions.

Normalization takes an address string and converts it to all the standard forms that “make sense”. For example:

SELECT unnest(postal_normalize('390 Greenwich St., New york, ny, 10013'));
390 greenwich saint new york ny 10013
390 greenwich saint new york new york 10013
390 greenwich street new york ny 10013
390 greenwich street new york new york 10013

Parsing takes apart a string into address components and returns a JSONB of those components:

SELECT jsonb_pretty(postal_parse('390 Greenwich St., New york, ny, 10013'));
"city": "new york",     
"road": "greenwich st.",
"state": "ny",          
"postcode": "10013",    
"house_number": "390"   

We can use normalization to create a table of text searchable address strings, and then use full text search to efficiently search that table for potential matches for new addresses.

International Normalization

As we saw above, normalization takes raw address strings and turns them into “possible standard forms”, which are suitable for searching against. They aren’t necessarily the best forms, more regionally-aware parsing can do a better job of standard North American parsing and formatting, but where libpostal shines is being a ready-to-run fully international solution that doesn’t even need to be told what language it is working on.

For example, this address in Berlin:

SELECT unnest(postal_normalize('Potsdamer Straße 3, 10785 Berlin, Germany'));
potsdamer strasse 3 10785 berlin germany