PostgreSQL + WAL-E + Cloudfiles = Awesome

If you’re a big PostgreSQL fan like I am, you may have heard of a tool called WAL-E. Originally developed by Heroku, WAL-E is a tool for efficiently sending PostgreSQL’s WAL (Write Ahead Log) to the cloud. In addition to Heroku, WAL-E is now used by many companies with large PostgreSQL deployments, including Instagram.

Let’s unpack what that means. If you’ve ever set up replication with PostgreSQL you’re probably familiar with the WAL. Essentially there are two parts to replication and backup in PostgreSQL, the “base backup” and the WAL. Base backups are a copy of your database files that can be taken while the database is running. You might create base backups every night, for example. The WAL is where PostgreSQL writes each and every transaction, as they happen. When you run normal replication, the leader will send its log file to the followers as it writes it.

Instead of just using a simple socket to communicate, WAL-E sends these base backups and WAL files across the internet with the help of a cloud object store, like Cloudfiles (or any OpenStack Swift deployment). This gives you the advantage that, in addition to replication, you have a durable backup of your database for disaster recovery. Further, you get effectively infinite read scalability from the archives: you can keep adding followers without putting more load on the leader.
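As a sketch of how the pieces fit together: the leader's `archive_command` hands each finished WAL segment to wal-e, and followers restore from the archive instead of connecting to the leader. The commands below follow WAL-E's documented `envdir` convention, but treat the paths, container name, and version numbers as illustrative:

```shell
# postgresql.conf on the leader -- ship each completed WAL segment to the store:
#   wal_level = archive
#   archive_mode = on
#   archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
#
# /etc/wal-e.d/env holds one file per credential/setting, e.g. for a
# Swift/Cloudfiles backend: WALE_SWIFT_PREFIX (swift://my-container/my-db),
# SWIFT_AUTHURL, SWIFT_USER, SWIFT_PASSWORD.

# Nightly base backup of the data directory:
envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.3/main

# Bringing up a new follower: fetch the latest base backup, then let
# recovery.conf replay WAL from the archive:
envdir /etc/wal-e.d/env wal-e backup-fetch /var/lib/postgresql/9.3/main LATEST
#   restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch %f %p'
```

Because followers pull WAL from the object store rather than from the leader, adding a tenth follower costs the leader nothing more than adding the first.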


Sharding & IDs at Instagram

  • Our application servers run Django with PostgreSQL as our back-end database. Our first question after deciding to shard out our data was whether PostgreSQL should remain our primary data-store, or whether we should switch to something else. We evaluated a few different NoSQL solutions, but ultimately decided that the solution that best suited our needs would be to shard our data across a set of PostgreSQL servers.
  • IDs should ideally be 64 bits (for smaller indexes, and better storage in systems like Redis)
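A 64-bit, shard-aware ID can be built by bit-packing: the 41/13/10-bit split and custom-epoch idea below follow the scheme the Instagram post describes (41 bits of milliseconds, 13 bits of logical shard ID, 10 bits of per-shard sequence), though the helper names and epoch value here are illustrative:

```python
# Pack a 64-bit ID: 41 bits of time, 13 bits of shard, 10 bits of sequence.
# CUSTOM_EPOCH_MS is an arbitrary fixed starting point (example value).
CUSTOM_EPOCH_MS = 1314220021721

def make_id(now_ms, shard_id, seq):
    """Build a sortable 64-bit ID from a millisecond timestamp,
    a logical shard ID (13 bits), and a sequence number (10 bits)."""
    elapsed = now_ms - CUSTOM_EPOCH_MS
    return (elapsed << 23) | ((shard_id % 8192) << 10) | (seq % 1024)

def shard_of(id64):
    """Recover the logical shard ID from a packed 64-bit ID,
    so any row's home shard can be computed from its ID alone."""
    return (id64 >> 10) & 0x1FFF
```

Because the timestamp occupies the high bits, these IDs sort roughly by creation time, and any server can extract the owning shard without a lookup table.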


At The Heart Of A Giant: Postgres At TripAdvisor

  • 2 core-site datacenters with Postgres infrastructure tested to be capable of handling well over 1,100,000 database queries per minute.
  • Servers with 768GB of RAM so everything fits in memory.
  • Multi terabyte databases where only 5% can fit in RAM.
  • A sharded core site (not warehouse!) table with over 2,600,000,000 tuples

This talk will be a look at how Postgres can form the backbone of a site at the scale of 315 million unique visitors a month.

Monitoring Server load: DreamHost vs WebFaction

Before somebody objects that "that's not conclusive proof; DreamHost's hardware may be capable of handling more users than WebFaction's", here are the specifications for each server — and guess what? This WebFaction server is better than my shared host at DreamHost!

The conclusive proof that DreamHost overloads its servers, while WebFaction keeps its servers' load in check, is the number of users per server:


wc -l /etc/passwd               (num users)

free -m                         (memory, in MB)

cat /proc/cpuinfo               (CPU details)
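The three checks above can be rolled into one snapshot per host; this is a minimal sketch (the awk field positions assume the traditional Linux `free -m` output format):

```shell
#!/bin/sh
# Rough per-server load snapshot for comparing shared hosts.
users=$(wc -l < /etc/passwd)                   # accounts on this box
mem_mb=$(free -m | awk '/^Mem:/ {print $2}')   # total RAM in MB
cores=$(grep -c '^processor' /proc/cpuinfo)    # CPU core count
echo "$users users, ${mem_mb}MB RAM, $cores cores"
```

Run it on each host and the users-per-server gap between providers is visible at a glance.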