Handling 1 Million Requests per Minute with Go

They’re uploading each POST payload to S3 at a rate of up to 1M uploads a minute? They’re going to go broke from S3 operational fees. PUT fees are $0.005 per 1k requests, which at that rate is $5/minute, or $7,200/day. S3 is an absolutely terrible financial choice for systems that need to store a vast number of tiny files.
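Sanity-checking that arithmetic (assuming the $0.005-per-1,000-PUT figure quoted above):

```python
# Back-of-the-envelope S3 PUT cost at 1M uploads/minute,
# using the $0.005 per 1,000 PUT requests figure from the comment above.
puts_per_minute = 1_000_000
cost_per_1k_puts = 0.005  # dollars per 1,000 PUT requests

cost_per_minute = puts_per_minute / 1_000 * cost_per_1k_puts
cost_per_day = cost_per_minute * 60 * 24

print(cost_per_minute)  # 5.0  (dollars per minute)
print(cost_per_day)     # 7200.0  (dollars per day)
```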

They’re batching the requests into larger files on S3. The 1M refers to the number of HTTP requests hitting their server.

Can you show me where in the post that is described because I do not see it. All I see is a description of how they moved the UploadToS3 aspect to a job queue, but it’s still sending individual files to S3.
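(Whatever the post actually does, a batching layer is cheap to sketch. The names below are illustrative, not from the article; `upload` stands in for an S3 PUT. Payloads accumulate and flush as one combined object once a count or size threshold is hit, turning thousands of PUTs into one.)

```python
# Minimal sketch of batching many small payloads into one S3 object.
# All names here are illustrative, not taken from the article under
# discussion; `upload` is any callable that takes one combined blob.

class BatchUploader:
    def __init__(self, upload, max_items=1000, max_bytes=5 * 1024 * 1024):
        self.upload = upload
        self.max_items = max_items
        self.max_bytes = max_bytes
        self.buffer = []
        self.size = 0

    def add(self, payload: bytes):
        self.buffer.append(payload)
        self.size += len(payload)
        if len(self.buffer) >= self.max_items or self.size >= self.max_bytes:
            self.flush()

    def flush(self):
        if self.buffer:
            # One PUT for the whole batch instead of one per payload.
            self.upload(b"\n".join(self.buffer))
            self.buffer, self.size = [], 0

uploads = []
batcher = BatchUploader(uploads.append, max_items=3)
for i in range(7):
    batcher.add(b"payload-%d" % i)
batcher.flush()
print(len(uploads))  # 3 PUTs for 7 payloads
```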

Storing millions of tiny files in any filesystem is a terrible choice.

Fair point, I was mostly focused on the absurd cost for that specific implementation. What would you suggest as an alternative? A document-oriented database?

If you’re on AWS I would probably go with DynamoDB; if you’re on GCP, Datastore. They aren’t drop-in replacements for one another, but the way you architect your system will be similar(ish). The main benefit is that it’ll cost less upfront and require less to manage. Now that AWS has simplified back-ups it’s a pretty simple system to operate. If you’re looking for better control over latency then I’d probably go with Cassandra.

There’s a big caveat to any NoSQL database, and that’s how you handle aggregates/roll-ups. With a standard relational database it’s easy to write these queries. If you do them without thinking on a NoSQL system it’ll cost you in performance and, where you’re billed per access, money. There are a few ways to address this:

– batch, à la MapReduce.

– streaming, à la Apache Beam, Spark, etc.

– in-query counting (aka sharded counters).
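The last option, sharded counters, can be sketched in a few lines (an illustrative in-memory version; in DynamoDB or Datastore each shard would be its own item/entity):

```python
import random

# Illustrative in-memory sharded counter. In a real NoSQL store each
# shard would be a separate item/entity, so concurrent increments
# spread across shards instead of contending on one hot row.

NUM_SHARDS = 16
shards = [0] * NUM_SHARDS

def increment(amount=1):
    # Any shard works, so writes never contend on a single key.
    shards[random.randrange(NUM_SHARDS)] += amount

def total():
    # Reads pay the cost: fetch every shard and sum.
    return sum(shards)

for _ in range(1000):
    increment()
print(total())  # 1000
```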

An underused option is actually SQLite. That gives you a surprisingly feature-rich system with very low overhead. In fact, you may see benefits: faster access and less disk usage (https://www.sqlite.org/fasterthanfs.html). A key-value store would probably work well, depending on how well its storage layer is architected.
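A minimal sketch of that SQLite-as-key-value-store idea, using only Python’s built-in sqlite3 module: many small blobs live inside one database file instead of as millions of filesystem entries.

```python
import sqlite3

# Minimal key-value store on SQLite: many small blobs in one database
# file instead of millions of tiny files on the filesystem.
db = sqlite3.connect(":memory:")  # use a file path in practice
db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB)")

def put(key: str, value: bytes):
    db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
    db.commit()

def get(key: str):
    row = db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

put("payload/0001", b"tiny file contents")
print(get("payload/0001"))  # b'tiny file contents'
```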

The growing trend toward compiled languages

Go was designed by a team at Google that included Rob Pike and Ken Thompson, Bell Labs veterans who worked on Unix. They created a programming language built around concurrency and safety. Go is simpler in syntax than C/C++ and employs automatic garbage collection like Java, but with better performance. The language isn’t low-level enough for operating systems and device drivers, but it is finding a large number of uses, including running Google web services.

.. Swift is designed to replace Apple’s Objective-C. Unlike Go, it incorporates more of the features from higher-level languages. It also frees memory automatically, but it uses automatic reference counting (ARC) to decide when to free memory, rather than a garbage collector, which imposes a performance penalty.

.. The most interesting of these new languages in my opinion is Rust, which was developed by Mozilla in order to build a new web browser engine based on safety and parallelism. Instead of a garbage collector or reference counting, Rust uses a novel concept of ownership to determine when to free memory. Only one variable at a time can own a piece of memory, so the memory is automatically freed once its owner goes out of scope.

.. Rust strives to give the programmer all the control of C/C++, but with greater safety and many features of high-level languages, such as closures, pattern matching, generics, loops with automatic iterators, and traits without the overhead of classes and inheritance. The Rust designers describe these features as “zero-cost abstractions,” meaning that they want to give programmers features from high-level languages, but without the performance cost of that abstraction.

.. Whereas Go and Swift are easier to learn and their syntax is simpler than C/C++’s, Rust has a very complex syntax and is, in my opinion, much harder to learn. Writing code in Rust forces the programmer to do a lot more mental work from the outset and to type more code to do the same thing as in Go or Swift.

.. Despite its steep learning curve and complex syntax, Rust is rated by Stack Overflow’s 2016 Developer Survey as the programming language with the highest percentage of users who want to continue using it. The language with the second highest percentage is Swift and Go is the fifth highest.

.. The rise of Go, Swift and Rust is a sign of the growing recognition that Moore’s Law is sputtering out and we will need to live in a world of limited resources. We don’t have the luxury of solving every problem by simply throwing more hardware and costly layers of abstraction at it. Instead, we need leaner code that consumes as little memory and as few processor cycles as possible, while still providing safety and concurrency.

Grumpy: Go running Python!

Google runs millions of lines of Python code. The front-end server that drives youtube.com and YouTube’s APIs is primarily written in Python, and it serves millions of requests per second! YouTube’s front-end runs on CPython 2.7,

.. but we always run up against the same issue: it’s very difficult to make concurrent workloads perform well on CPython.

.. Grumpy is an experimental Python runtime for Go. It translates Python code into Go programs, and those transpiled programs run seamlessly within the Go runtime.

.. The goal is for Grumpy to be a drop-in replacement runtime for any pure-Python project.

.. In particular, Grumpy has no global interpreter lock, and it leverages Go’s garbage collection for object lifetime management instead of counting references. We think Grumpy has the potential to scale more gracefully than CPython for many real world workloads.

.. Grumpy programs can import Go packages just like Python modules! For example, the Python snippet below uses Go’s standard net/http package to start a simple server
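The snippet itself didn’t survive the excerpt; a Grumpy program of that shape looks roughly like this (a sketch, assuming Grumpy’s `__go__` import convention for Go packages; it runs under the Grumpy toolchain, not CPython, and the exact identifiers may differ):

```python
# Sketch of a Grumpy program importing Go's net/http (illustrative;
# requires the Grumpy toolchain, not plain CPython).
from __go__.net.http import ListenAndServe, RedirectHandler

def main():
    # Serve a trivial redirect handler on port 8080.
    ListenAndServe('127.0.0.1:8080',
                   RedirectHandler('https://github.com/google/grumpy', 303))
```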