Postgres Backup with Wal-e

The following are the steps I took to set up Wal-e 0.6.2 on Ubuntu 12.04.2 LTS with Postgres 9.1.9. Once these steps are done, Wal-e will push incremental backups to Amazon S3 at least once a minute.

Installation

$ sudo apt-get install libevent-dev python-all-dev daemontools lzop pv postgresql-client
$ sudo pip install wal-e
$ umask u=rwx,g=rx,o=
$ sudo mkdir -p /etc/wal-e.d/env
$ echo "secret-key-content" | sudo tee /etc/wal-e.d/env/AWS_SECRET_ACCESS_KEY
$ echo "access-key" | sudo tee /etc/wal-e.d/env/AWS_ACCESS_KEY_ID
$ echo 's3://some-bucket/directory/or/whatever' | sudo tee /etc/wal-e.d/env/WALE_S3_PREFIX
$ sudo chown -R root:postgres /etc/wal-e.d
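envdir (from daemontools) turns each file name in the directory into an environment variable, using the file's first line as the value, and then runs the given command, so a missing or empty file only shows up later as a failed archive. The hypothetical helper below, which is not part of Wal-e or daemontools, is a minimal sketch for checking the directory before going further. It assumes GNU stat (as shipped with Ubuntu) and checks only the three variables used above.

```shell
#!/bin/sh
# check_wale_env [DIR]
# Hypothetical helper: checks that DIR (default /etc/wal-e.d/env) contains the
# three files used above, that each is non-empty, that none is accessible to
# "other", and that WALE_S3_PREFIX looks like an S3 URL. Returns 1 on any problem.
check_wale_env() {
  dir=${1:-/etc/wal-e.d/env}
  status=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY WALE_S3_PREFIX; do
    f="$dir/$var"
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
      continue
    fi
    mode=$(stat -c %a "$f")          # GNU stat: octal permission bits, e.g. 640
    case $mode in
      *0) ;;                          # last digit 0: no permissions for "other"
      *) echo "accessible by others ($mode): $f" >&2; status=1 ;;
    esac
  done
  # envdir only uses the first line of each file
  case $(head -n 1 "$dir/WALE_S3_PREFIX" 2>/dev/null) in
    s3://*) ;;
    *) echo "WALE_S3_PREFIX should start with s3://" >&2; status=1 ;;
  esac
  return $status
}
```

Because the directory ends up readable only by root and the postgres group, run the check as root or as the postgres user, e.g. check_wale_env /etc/wal-e.d/env, and expect no output and a zero exit status.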

I added the following to the end of /etc/postgresql/9.1/main/postgresql.conf:

wal_level = archive
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-push %p'
archive_timeout = 60
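Appending the settings works even if the file already contains uncommented lines for the same parameters, because PostgreSQL applies the last occurrence of a setting in the file. The function below is a hypothetical sketch for confirming which value wins. It handles both the "name = value" and "name value" forms, ignores include directives, treats parameter names as case-sensitive, and naively strips everything after a # character.

```shell
#!/bin/sh
# conf_value FILE NAME
# Hypothetical helper: prints the last uncommented value assigned to NAME in a
# postgresql.conf-style FILE, or returns 1 if NAME is never set.
conf_value() {
  awk -v name="$2" '
    {
      line = $0
      sub(/#.*/, "", line)                         # drop comments
      if (match(line, /^[ \t]*[A-Za-z_][A-Za-z0-9_.]*/)) {
        key = substr(line, RSTART, RLENGTH)
        gsub(/[ \t]/, "", key)                     # trim leading whitespace
        if (key == name) {
          val = substr(line, RSTART + RLENGTH)
          sub(/^[ \t]*=?[ \t]*/, "", val)          # accept "name = value" or "name value"
          sub(/[ \t]+$/, "", val)                  # trim trailing whitespace
          last = val                               # later lines overwrite earlier ones
          found = 1
        }
      }
    }
    END { if (!found) exit 1; print last }
  ' "$1"
}
```

For example, conf_value /etc/postgresql/9.1/main/postgresql.conf archive_mode should print on once the lines above are in place. The file only shows what the server will read; after the restart, sudo -u postgres psql -c 'SHOW archive_mode;' reports the value the running server is actually using.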

Restart Postgres (changes to wal_level and archive_mode only take effect after a full restart, not a reload):

$ sudo service postgresql restart
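The once-a-minute behaviour comes from archive_timeout = 60. PostgreSQL switches to a new WAL segment when the current one fills or, provided there has been some database activity, once 60 seconds have passed since the last switch. archive_command then hands the finished segment to wal-e wal-push. The PostgreSQL documentation notes that a segment closed early by a forced switch is still the full segment size, so a short timeout inflates the archive. The arithmetic below is a rough estimate, assuming the default 16 MB segment size and a server that is quiet but never completely idle.

```shell
#!/bin/sh
# Rough estimate of the WAL volume forced by archive_timeout on a quiet server.
# Assumptions: default 16 MB WAL segments, and some activity in every interval
# so that every timeout forces a segment switch.
archive_timeout=60                                   # seconds, as in postgresql.conf above
segment_mb=16                                        # default WAL segment size
segments_per_day=$(( 86400 / archive_timeout ))      # 86400 seconds in a day
mb_per_day=$(( segments_per_day * segment_mb ))
echo "segments pushed per day: $segments_per_day"    # 1440
echo "MB per day before lzop compression: $mb_per_day"   # 23040
```

On a busy server, segments fill before the timeout expires, so the archive volume is driven by write traffic rather than by archive_timeout.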

1.2 Billion Taxi Rides on AWS RDS running PostgreSQL

On November 17th, 2015, Todd Schneider published a blog post titled "Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance", in which he analysed the metadata of 1.1 billion taxi journeys made in New York City between 2009 and 2015. He also published a GitHub repository containing the SQL, shell and R files he used, along with instructions for getting everything up and running. The R files also produce a few additional charts that were used in follow-up posts.

In this blog post I’ll launch 4 different types of AWS RDS instances running PostgreSQL 9.5.2 and benchmark how long each takes to produce the same graphs Todd Schneider created in his analysis.