The Guardian Switched from Mongo to Postgres

In April the Guardian switched off the MongoDB cluster used to store our content, after completing a migration to PostgreSQL on Amazon RDS. This post covers why and how.

At the Guardian, the majority of content – including articles, live blogs, galleries and video content – is produced in our in-house CMS tool, Composer. This, until recently, was backed by a MongoDB database running on AWS. This database is essentially the “source of truth” for all Guardian content that has been published online – approximately 2.3m content items. We’ve just completed our migration away from MongoDB to PostgreSQL.

Merging many JSON files into one

jq solution:

jq -s '{ attributes: map(.attributes[0]) }' file*.json
  • -s (--slurp) – instead of running the filter for each JSON object in the input, read the entire input stream into a large array and run the filter just once.

Sample output:

{
  "attributes": [
    {
      "name": "Node",
      "value": "test"
    },
    {
      "name": "version",
      "value": "11.1"
    }
  ]
}
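For environments without jq, the same merge can be sketched in Python; the `file*.json` pattern, the `attributes` key, and the helper name `merge_attribute_files` are assumptions following the example above.

```python
import glob
import json

def merge_attribute_files(pattern):
    """Collect the first element of each file's "attributes" array into
    one document, mirroring the jq filter above (slurp, then map)."""
    merged = {"attributes": []}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            doc = json.load(f)
        merged["attributes"].append(doc["attributes"][0])
    return merged
```

Sorting the glob results keeps the output order deterministic, which the shell wildcard expansion in the jq command also provides (lexical order).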

How can I import a JSON file into PostgreSQL?

You can feed the JSON into a SQL statement that extracts the information and inserts it into the table. If the JSON attributes have exactly the same names as the table columns, you can do something like this:

with customer_json (doc) as (
   values
    ('[
      {
        "id": 23635,
        "name": "Jerry Green",
        "comment": "Imported from facebook."
      },
      {
        "id": 23636,
        "name": "John Wayne",
        "comment": "Imported from facebook."
      }
    ]'::json)
)
insert into customer (id, name, comment)
select p.*
from customer_json l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update
  set name =,
      comment = excluded.comment;

New customers will be inserted, existing ones will be updated. The “magic” part is the json_populate_recordset(null::customer, doc) which generates a relational representation of the JSON objects.