The Guardian Switched from Mongo to Postgres

In April the Guardian switched off the MongoDB cluster used to store our content after completing a migration to PostgreSQL on Amazon RDS. This post covers why and how.

At the Guardian, the majority of content – including articles, live blogs, galleries and video content – is produced in our in-house CMS tool, Composer. Until recently, Composer was backed by a MongoDB database running on AWS. This database is essentially the “source of truth” for all Guardian content that has been published online – approximately 2.3m content items. We’ve just completed our migration away from MongoDB to PostgreSQL.

Add “Faux” Predicates: Postgres Optimization

The query fetches sales that were modified before 2019. There is no index on this field, so the optimizer generates an execution plan to scan the entire table.

Let’s say you have another field in this table with the time the sale was created. Since it’s not possible for a sale to be modified before it was created, adding a similar condition on the created field won’t change the result of the query. However, the optimizer might use this information to generate a better execution plan:

db=# EXPLAIN
  SELECT *
  FROM sale
  WHERE modified < '2019-01-01 asia/tel_aviv'
  AND   created < '2019-01-01 asia/tel_aviv';
                                           QUERY PLAN
Index Scan using sale_created_ix on sale (cost=0.44..4.52 rows=1 width=276)
  Index Cond: (created < '2019-01-01 00:00:00+02'::timestamp with time zone)
  Filter: (modified < '2019-01-01 00:00:00+02'::timestamp with time zone)

After we added the “Faux Predicate” the optimizer decided to use the index on the created field, and the query got much faster! Note that the original predicate on the modified field is still being evaluated, but it’s now applied to far fewer rows.

A “Faux Predicate” should not change the result of the query. It should only be used to provide more information to the optimizer that can improve the query performance. Keep in mind that the database has to evaluate all the predicates, so adding too many might make a query slower.
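The rule that a faux predicate must not change the result can be checked on a toy table. The sketch below is only an illustration and makes several assumptions: it uses Python's built-in sqlite3 module as a stand-in for Postgres (so the planner and the plan text differ from the output above), the three rows are made up, and the function name faux_predicate_demo is hypothetical. The table and index names mirror the example.

```python
import sqlite3


def faux_predicate_demo():
    """Run the same query with and without a redundant ("faux") predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sale (id INTEGER PRIMARY KEY, created TEXT, modified TEXT)"
    )
    # Only `created` is indexed, as in the scenario described above.
    conn.execute("CREATE INDEX sale_created_ix ON sale (created)")
    # Made-up rows; every row satisfies modified >= created.
    rows = [
        (1, "2018-03-01", "2018-04-01"),  # created and modified before 2019
        (2, "2018-11-15", "2019-02-01"),  # created before, modified after
        (3, "2019-05-01", "2019-06-01"),  # created and modified after
    ]
    conn.executemany("INSERT INTO sale VALUES (?, ?, ?)", rows)

    plain = "SELECT id FROM sale WHERE modified < '2019-01-01'"
    faux = plain + " AND created < '2019-01-01'"

    result_plain = sorted(r[0] for r in conn.execute(plain))
    result_faux = sorted(r[0] for r in conn.execute(faux))
    # The last column of each EXPLAIN QUERY PLAN row is the readable plan step.
    plan_plain = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + plain)]
    plan_faux = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + faux)]
    conn.close()
    return result_plain, result_faux, plan_plain, plan_faux


if __name__ == "__main__":
    result_plain, result_faux, plan_plain, plan_faux = faux_predicate_demo()
    print("same rows:", result_plain == result_faux)
    print("plan without faux predicate:", plan_plain)
    print("plan with faux predicate:", plan_faux)
```

Both queries return the same ids because no sale can be modified before it was created. Whether the planner actually picks the index depends on the database and its statistics, so it is worth comparing the two plans on real data before keeping the extra condition.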

pgsql-http (GitHub)

An HTTP client for PostgreSQL that lets you retrieve a web page from inside the database.

SELECT content FROM http_get('');
{"origin":""} +
(1 row)

http_header(field VARCHAR, value VARCHAR) returns http_header
http(request http_request) returns http_response
http_get(uri VARCHAR) returns http_response
http_post(uri VARCHAR, content VARCHAR, content_type VARCHAR) returns http_response
http_put(uri VARCHAR, content VARCHAR, content_type VARCHAR) returns http_response
http_patch(uri VARCHAR, content VARCHAR, content_type VARCHAR) returns http_response
http_delete(uri VARCHAR) returns http_response
http_head(uri VARCHAR) returns http_response
http_set_curlopt(curlopt VARCHAR, value varchar) returns boolean
http_reset_curlopt() returns boolean
urlencode(string VARCHAR) returns text



Trying to decide between to_tsquery, plainto_tsquery and phraseto_tsquery can be difficult. It was kind of straightforward in our case – we’re not searching on any phrases really.

The Postgres team decided to be helpful in this regard, especially when it comes to web applications, so they created websearch_to_tsquery. It basically treats the input as if it were entered into a Google search. To be dead honest I have no idea what’s happening under the covers here, but it’s supposed to be a bit more intelligent than plainto_tsquery and a little less strict than phraseto_tsquery.
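To get a feel for what “as if it were entered into a Google search” means, here is a rough Python sketch of the syntax rules the Postgres documentation lists for websearch_to_tsquery: unquoted words are joined with &, words inside double quotes are joined with <->, the word "or" becomes |, and a leading dash becomes !. This is not how Postgres implements it. The function name websearch_sketch is hypothetical, and the sketch skips stemming, stop-word removal and text search configurations entirely, so the real function would return 'rat' where this returns 'rats'.

```python
import re

# A token is either an optionally negated "quoted phrase" or a run of non-spaces.
TOKEN = re.compile(r'-?"[^"]*"|\S+')


def websearch_sketch(text):
    """Turn web-search style input into a tsquery-like string (toy version)."""
    parts = []          # finished terms interleaved with operators
    pending_op = " & "  # operator to place before the next term
    for token in TOKEN.findall(text):
        if token.lower() == "or":
            pending_op = " | "
            continue
        negated = token.startswith("-")
        body = token[1:] if negated else token
        words = [w.lower() for w in body.strip('"').split()]
        if not words:
            continue
        # Words from one quoted phrase must follow each other: use <->.
        term = " <-> ".join(f"'{w}'" for w in words)
        if negated:
            term = f"!( {term} )" if len(words) > 1 else f"!{term}"
        if parts:
            parts.append(pending_op)
        parts.append(term)
        pending_op = " & "
    return "".join(parts)


if __name__ == "__main__":
    print(websearch_sketch('"sad cat" or "fat rat"'))
    print(websearch_sketch('signal -"segmentation fault"'))
```

For example, websearch_sketch('"sad cat" or "fat rat"') returns 'sad' <-> 'cat' | 'fat' <-> 'rat'. Plain words behave like plainto_tsquery and quoted phrases like phraseto_tsquery, which is why the function sits between the two.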