Postgres GIS Address Standardizer

This is a question of address canonicalization and parsing. Essentially, what you’re talking about is handled through a gazetteer (a geographical rule set). There are two ways to do this right:

  1. address_standardizer from the PostGIS project, which is certainly the better choice if you’re only using United States addresses.
  2. pgsql-postal, which may be a better method for international addresses.

I’ll show the address_standardizer version for a sample US address.
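
A minimal sketch of the setup, assuming the PostGIS address standardizer packages are available on the server. address_standardizer_data_us is the companion extension that supplies the us_lex, us_gaz, and us_rules tables referenced in the query:

```sql
-- Assumes the PostGIS address standardizer files are installed on the server.
CREATE EXTENSION IF NOT EXISTS address_standardizer;
-- Companion extension providing the us_lex, us_gaz and us_rules tables.
CREATE EXTENSION IF NOT EXISTS address_standardizer_data_us;
```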

With the extension and its US data tables installed, we can use it like this:

SELECT * FROM standardize_address('us_lex',
   'us_gaz', 'us_rules', '10511 Homestead Rd, Pahrump, NV 89061');
 building | house_num | predir | qual | pretype |   name    | suftype | sufdir | ruralroute | extra |  city   | state  | country | postcode | box | unit 
----------+-----------+--------+------+---------+-----------+---------+--------+------------+-------+---------+--------+---------+----------+-----+------
          | 10511     |        |      |         | HOMESTEAD | ROAD    |        |            |       | PAHRUMP | NEVADA | USA     | 89061    |     | 
(1 row)

Deep dive into postgres stats: pg_stat_database

In this post we continue our discussion about postgres stats. This time we’ll be focusing on pg_stat_database. As mentioned in the postgres documentation, this view contains one row for each database in the cluster, showing database-wide statistics. Postgres may host several databases within a single instance, so this view contains stats about all of them.
You can find a full description of the view’s columns in the official documentation, so here I will focus on the types of problems that it helps us solve:
  • Cache hit ratio.
  • Commit ratio.
  • Database anomalies.
  • Load distribution.
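
As a rough illustration of the first two items, the ratios can be derived from the view’s cumulative counters. This is a sketch, not necessarily the query the post goes on to use: the column aliases are made up for the example, and the values reflect activity since the statistics were last reset.

```sql
-- Cache hit ratio: share of block requests served from shared buffers.
-- Commit ratio: share of transactions that committed rather than rolled back.
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_ratio,
       round(100.0 * xact_commit / nullif(xact_commit + xact_rollback, 0), 2) AS commit_ratio
FROM pg_stat_database
WHERE datname IS NOT NULL;  -- skip the row for shared objects on newer versions
```

The nullif calls guard against division by zero for databases that have seen no activity yet; those rows show NULL instead of a ratio.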

pg-libphonenumber

A (partially implemented!) PostgreSQL extension that provides access to Google’s libphonenumber, which offers:

  • Parsing/formatting/validating phone numbers for all countries/regions of the world.
  • getNumberType – gets the type of the number based on the number itself; able to distinguish Fixed-line, Mobile, Toll-free, Premium Rate, Shared Cost, VoIP and Personal Numbers (whenever feasible).
  • isNumberMatch – gets a confidence level on whether two numbers could be the same.
  • getExampleNumber/getExampleNumberByType – provides valid example numbers for all countries/regions, with the option of specifying which type of example phone number is needed.
  • isPossibleNumber – quickly guesses whether a number is a possible phone number by using only the length information, much faster than a full validation.
  • isValidNumber – full validation of a phone number for a region using length and prefix information.
  • AsYouTypeFormatter – formats phone numbers on-the-fly when users enter each digit.
  • findNumbers – finds numbers in text input.
  • PhoneNumberOfflineGeocoder – provides geographical information related to a phone number.
  • PhoneNumberToCarrierMapper – provides carrier information related to a phone number.

PostgreSQL: Array of LIKEs

Now you may ask: what is wrong with that? On its own, nothing. But what about when we need to search for 5 product conditions at once? The query gets ugly and unwieldy!

SELECT sum(product_cost) FROM t_test
WHERE product_code LIKE '%123%'
OR product_code LIKE '%234%'
OR product_code LIKE '%345%'
OR product_code LIKE '%456%'
OR product_code LIKE '%567%';

Not something I normally enjoy writing… so can we do better? Yes, we can! Say hello to our “array of LIKEs”:

SELECT sum(product_cost) FROM t_test WHERE product_code LIKE ANY( array['%123%', '%234%', '%345%', '%456%', '%567%']);
-- or the same using shorter Postgres array notation
SELECT sum(product_cost) FROM t_test WHERE product_code LIKE ANY('{%123%,%234%,%345%,%456%,%567%}');