Amazon S3: Request Rate and Performance Considerations

Amazon S3 scales to support very high request rates. If your workload in an Amazon S3 bucket routinely exceeds 100 PUT/LIST/DELETE requests per second or 300 GET requests per second, follow the guidelines in this topic to ensure the best performance and scalability. If your request rate grows steadily, Amazon S3 automatically partitions your buckets as needed to support higher request rates. However, if you expect a rapid increase in the request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, we recommend that you open a support case to prepare for the workload and avoid any temporary limits on your request rate. To open a support case, go to Contact Us.

.. Example 1: Add a Hex Hash Prefix to Key Name

One way to introduce randomness into key names is to add a hash string as a prefix to the key name. For example, you can compute an MD5 hash of the character sequence that you plan to assign as the key name. From the hash, pick a specific number of characters, and add them as the prefix to the key name. The following example shows key names with a four-character hash.
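As a rough sketch of the idea (the key names and the four-character prefix length here are made up for illustration), a small Python helper could derive the prefix from an MD5 hash of the intended key name:

    import hashlib

    def hashed_key(key_name, prefix_len=4):
        # Prepend the first prefix_len hex characters of the key's MD5 hash.
        # Spreading keys across many prefixes lets S3 partition the keyspace
        # instead of funneling sequential names into a single partition.
        digest = hashlib.md5(key_name.encode("utf-8")).hexdigest()
        return digest[:prefix_len] + "-" + key_name

    # Sequential-looking key names end up under well-distributed prefixes.
    for name in ("2015-07-01/cust0001/photo1.jpg",
                 "2015-07-01/cust0002/photo2.jpg",
                 "2015-07-01/cust0003/photo3.jpg"):
        print(hashed_key(name))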

MoSQL: a MongoDB → SQL streaming translator

At Stripe, we love MongoDB. We love the flexibility it gives us in changing data schemas as we grow and learn, and we love its operational properties. We love replsets. We love the uniform query language that doesn’t require generating and parsing strings, tracking placeholder parameters, or any of that nonsense.

The thing is, we also love SQL. We love the ease of doing ad-hoc data analysis over small-to-mid-size datasets in SQL. We love doing JOINs to pull together reports summarizing properties across multiple datasets. We love the fact that virtually every employee we hire already knows SQL and is comfortable using it to ask and answer questions about data.
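As a hedged sketch of that kind of ad-hoc analysis (the connection string, table names, and columns below are invented; assume MoSQL has already streamed the corresponding collections into PostgreSQL), a JOIN across two exported tables might look like this:

    import psycopg2

    # Hypothetical connection to the Postgres database MoSQL streams into.
    conn = psycopg2.connect("dbname=mosql_export")

    with conn, conn.cursor() as cur:
        # Summarize charges per customer by joining two exported tables.
        cur.execute("""
            SELECT c.email,
                   COUNT(*)       AS charge_count,
                   SUM(ch.amount) AS total_amount
            FROM customers c
            JOIN charges ch ON ch.customer_id = c.id
            GROUP BY c.email
            ORDER BY total_amount DESC
            LIMIT 10
        """)
        for email, charge_count, total_amount in cur.fetchall():
            print(email, charge_count, total_amount)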

How to scale PostgreSQL on Amazon RDS using pg_shard

Apart from simplifying database administration, using Amazon RDS or a similar solution with pg_shard has another benefit. RDS instances have automatic failover using streaming replication, which means that it is not necessary to use pg_shard’s built-in replication for high availability. Without that built-in replication, pg_shard can be used in a multi-master / masterless setup, as sketched below.
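A minimal sketch of that configuration from a master node (the table, partition column, shard count, and connection string are assumptions; master_create_distributed_table and master_create_worker_shards are pg_shard’s distribution functions, with the final argument being the replication factor):

    import psycopg2

    # Connect to one of the pg_shard master nodes (the DSN is illustrative).
    conn = psycopg2.connect("dbname=postgres host=master.example.internal")
    conn.autocommit = True
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS page_views (
            tenant_id bigint,
            page      text,
            viewed_at timestamptz
        )
    """)
    # Mark the table as distributed on its partition column.
    cur.execute(
        "SELECT master_create_distributed_table('page_views', 'tenant_id')"
    )
    # Create shards across the worker nodes. The last argument is the
    # pg_shard replication factor: 1 means no pg_shard-level replication,
    # leaving high availability to RDS streaming replication as above.
    cur.execute("SELECT master_create_worker_shards('page_views', 16, 1)")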

At this week’s PGDay UK, we demonstrated a distributed PostgreSQL cluster consisting of 4 worker nodes on RDS and 2 master nodes on EC2 with pg_shard installed (as shown below). We showed how the cluster automatically recovers when you terminate workers or master nodes while running queries. To make it even more interesting, we put the master nodes in an auto-scaling group and put a load-balancer in front of them. This architecture is somewhat experimental, but it can support a very high number of transactions per second and very large data sizes.

.. In our initial benchmarks using the CloudFormation template, we saw well over 100k INSERTs per second across 4 RDS instances (db.r3.4xlarge).