Developer Blog PostgreSQL + WAL-E + Cloudfiles = Awesome

If you’re a big PostgreSQL fan like I am, you may have heard of a tool called WAL-E. Originally developed by Heroku, WAL-E is a tool for efficiently sending PostgreSQL’s WAL (Write Ahead Log) to the cloud. In addition to Heroku, WAL-E is now used by many companies with large PostgreSQL deployments, including Instagram.

Let’s unpack what that means. If you’ve ever set up replication with PostgreSQL you’re probably familiar with the WAL. Essentially there are two parts to replication and backup in PostgreSQL, the “base backup” and the WAL. Base backups are a copy of your database files that can be taken while the database is running. You might create base backups every night, for example. The WAL is where PostgreSQL writes each and every transaction, as they happen. When you run normal replication, the leader will send its log file to the followers as it writes it.

Instead of just using a simple socket to communicate, WAL-E sends these base backups and WAL files across the internet with the help of a cloud object store, like Cloudfiles (or any OpenStack Swift deployment). This gives you the advantage that, in addition to just being replication, you have a durable backup of your database for disaster recovery. Further, you have effectively infinite read scalability from the archives, you can keep adding more followers without putting more stress on the leader.

 

Trigrams: Showing similar results in Postgres (Typos)

If you search “trgrams in pstgres” in Google, you’ll get results for “trigrams in postgres”. To say that is useful is an understatement! Recently at work I wanted to search a table of tags, but I wanted to account for spelling mistakes. Just because the user is 1 letter off doesn’t mean I shouldn’t give them a meaningful search result. This is especially true with mobile users, where spelling mistakes are all the more common (even with keyboard corrections). We are going to use the Trigram feature in Postgres to help us achieve this.

PostgreSQL anti-patterns: Unnecessary json/hstore dynamic columns

json is the new EAV – a great tool when you need it, but not something you should use as a first choice.

.. For example, sometimes application users want to be able to add arbitrary attributes to an entity. It’s a business requirement. The client don’t like it when you say that the database doesn’t do that so they can’t have notes in their application, and they’re muttering things about “just doing it in the application” or “we didn’t have these problems with MongoDB”.

How to decide when to use json

Use json if your data won’t fit in the database using a normal relational modelling. If you’re choosing between using EAV, serializing a Java/Ruby/Python object into a bytea field, or storing a key to look up an external structured object somewhere else … that’s when you should be reaching for json fields.