Handling 1 Million Requests per Minute with Go

They’re uploading each POST payload to S3 at a rate of up to 1M uploads a minute? They’re going to go broke from S3 operational fees. PUT fees are $0.005 per 1k, or $5/minute, or $7,200/day. S3 is an absolutely terrible financial choice for systems that need to store a vast number of tiny files.
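
Sanity-checking that arithmetic in Python (the $0.005 per 1k PUTs figure is the one quoted above; real S3 pricing varies by region and tier):

    price_per_1k_puts = 0.005      # USD per 1,000 PUT requests (figure quoted above)
    puts_per_minute = 1_000_000    # the request rate discussed in the article

    cost_per_minute = puts_per_minute / 1_000 * price_per_1k_puts
    cost_per_day = cost_per_minute * 60 * 24
    print(cost_per_minute, cost_per_day)   # 5.0 USD/minute, 7200.0 USD/day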

They’re batching the requests into larger files on S3. The 1M refers to the number of HTTP requests hitting their server.

Can you show me where in the post that is described? I don’t see it. All I see is a description of how they moved the UploadToS3 step to a job queue, but it’s still sending individual files to S3.

Storing millions of tiny files in any filesystem is a terrible choice.

Fair point, I was mostly focused on the absurd cost for that specific implementation. What would you suggest as an alternative? A document-oriented database?

If you’re on AWS I would probably go with DynamoDB; if you’re on GCP, Datastore. They aren’t drop-in replacements for one another, but the way you architect your system will be similar(ish). The main benefit is that they cost less upfront and require less management. Now that AWS has simplified backups, DynamoDB is a pretty simple system to operate. If you’re looking for better control over latency, I’d probably go with Cassandra.

There’s a big caveat with any NoSQL database, and that’s how you handle aggregates/roll-ups. With a standard relational database these queries are easy to write. If you do it without thinking on a NoSQL system, it’ll cost you in performance and, where you’re billed per access, money. There are a few ways to address this:

– batch, a la map-reduce.

– streaming, a la Apache Beam, Spark, etc.

– in-query counting (aka sharded counters); see the sketch below.
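
To make the sharded-counter option concrete, here’s a minimal Python/boto3 sketch. The table name (“counters”), key names and shard count are assumptions made up for illustration, not taken from the article or any particular product:

    import random
    import boto3
    from boto3.dynamodb.conditions import Key

    NUM_SHARDS = 10  # spread writes across shards so no single item becomes hot
    table = boto3.resource("dynamodb").Table("counters")  # hypothetical table

    def increment(counter_id, amount=1):
        # Write to a random shard so concurrent writers don't contend on one item.
        table.update_item(
            Key={"counter_id": counter_id, "shard": random.randrange(NUM_SHARDS)},
            UpdateExpression="ADD #v :inc",
            ExpressionAttributeNames={"#v": "value"},
            ExpressionAttributeValues={":inc": amount},
        )

    def read_total(counter_id):
        # Reads pay for the roll-up: sum the value held by every shard.
        resp = table.query(KeyConditionExpression=Key("counter_id").eq(counter_id))
        return sum(item["value"] for item in resp["Items"])

Writes stay cheap and spread out; each read costs one query over at most NUM_SHARDS items, which is exactly the aggregate/roll-up trade-off described above.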

An underused option is actually SQLite. That gives you a surprisingly feature-rich system with very low overhead. In fact, you may see benefits: faster access and less disk usage: https://www.sqlite.org/fasterthanfs.html

A key-value store would probably work well, depending on how well its storage layer is architected.
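
Circling back to the SQLite suggestion above, here’s a rough sketch of storing small payloads as blobs in a single database file instead of as millions of individual files. The file name, table name and schema are made up for the example, not taken from the linked page:

    import sqlite3

    # One database file instead of millions of tiny files on disk.
    conn = sqlite3.connect("payloads.db")
    conn.execute("CREATE TABLE IF NOT EXISTS payloads (id TEXT PRIMARY KEY, body BLOB)")

    def store(payload_id, data):
        with conn:  # one implicit transaction per insert
            conn.execute(
                "INSERT OR REPLACE INTO payloads (id, body) VALUES (?, ?)",
                (payload_id, data),
            )

    def load(payload_id):
        row = conn.execute(
            "SELECT body FROM payloads WHERE id = ?", (payload_id,)
        ).fetchone()
        return row[0] if row else None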

Ask HN: Best practice for estimating development tasks?

My personal bias is that I really dislike underestimating, and I typically overestimate how long tasks will take (by design) – but usually it’s pretty close (i.e. not an order of magnitude off).

1. Usually, the smallest unit is a week. When people say something will take a day, it’s usually wishful thinking – they forget about meetings, other commitments, etc.

2. Think about what “done” means for individual tasks when estimating. Often more junior engineers will give estimates where “done” means the code is submitted to source control – but really think about the work that it’ll take to get that code into production. Testing, configuration, documentation, monitoring/metrics/alerts, etc. I think a lot of under-estimating comes from this, where the estimates are just for writing the code, and all of the other stuff comes either as a scramble at the end, or a ton of time to actually launch as tons of bugs and missed features are discovered late in the game.

3. Think in latency and throughput. Estimates typically have both baked in – if something takes a week of “at-keyboard” time, it might take a month to get done because of other commitments. Both estimates are important.

4. Think in probability distributions rather than absolutes. I usually set my threshold around 90% confidence – i.e. my estimates are based on where I am 90% confident I can have something done. When you think about how long something takes, if the stars align, a 10 day task might take 5 days, but if things go wrong, that 10 day task could take several months or more. Things can only go a little bit better than expected, but a lot worse than expected. When people make estimates, I think they are often biased toward the “stars align” case, and forget about this fat tail (see the short simulation after this list).

5. With team estimates, add a friction factor to longer-term estimates to account for vacations, sick days, distractions, etc. Usually around 30% seems to work, so for a one-month task (20 working days) I plan on 20 * 1.3 = 26 days.

6. Work backwards from the end goal. This forces you to factor in things like testing and productionization. It also helps with coordination, as you can figure out when to parallelize tasks to get to the end goal.

7. Write estimates down, and go back and reflect. When something that you thought would take a week took a month, sit down and really think about why. This will help you recognize and refine your biases.
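
A small sketch of points 4 and 5 – just an illustration using a made-up lognormal distribution for how long a “10 day” task actually takes, not a claim about how anyone models their estimates:

    import random
    import statistics

    # Simulate a task estimated at 10 "at-keyboard" days with a fat right tail:
    # it can come in a little early, but overruns can be large.
    samples = sorted(random.lognormvariate(0, 0.6) * 10 for _ in range(100_000))

    median = statistics.median(samples)
    p90 = samples[int(0.9 * len(samples))]   # 90%-confidence estimate (point 4)
    with_friction = p90 * 1.3                # add ~30% team friction (point 5)

    print(f"median: {median:.1f} days")
    print(f"90th percentile: {p90:.1f} days")
    print(f"90th percentile + friction: {with_friction:.1f} days")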

w3lib.url.canonicalize_url

w3lib.url.canonicalize_url(url, keep_blank_values=True, keep_fragments=False, encoding=None)

Canonicalize the given url by applying the following procedures:

  • sort query arguments, first by key, then by value
  • percent-encode paths; non-ASCII characters are percent-encoded using UTF-8 (RFC 3986)
  • percent-encode query arguments; non-ASCII characters are percent-encoded using the passed encoding (UTF-8 by default)
  • normalize all spaces (in query arguments) to ‘+’ (plus symbol)
  • normalize percent encodings case (%2f -> %2F)
  • remove query arguments with blank values (unless keep_blank_values is True)
  • remove fragments (unless keep_fragments is True)

The url passed can be bytes or unicode, while the url returned is always a native str (bytes in Python 2, unicode in Python 3).

>>> import w3lib.url
>>>
>>> # sorting query arguments
>>> w3lib.url.canonicalize_url('http://www.example.com/do?c=3&b=5&b=2&a=50')
'http://www.example.com/do?a=50&b=2&b=5&c=3'
>>>
>>> # UTF-8 conversion + percent-encoding of non-ASCII characters
>>> w3lib.url.canonicalize_url(u'http://www.example.com/r\u00e9sum\u00e9')
'http://www.example.com/r%C3%A9sum%C3%A9'
>>>
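
Based on the documented keep_blank_values and keep_fragments behaviour above, usage looks roughly like the following (illustrative examples, not taken from the library’s own docs):

>>> # dropping blank query values (keep_blank_values defaults to True)
>>> w3lib.url.canonicalize_url('http://www.example.com/do?a=1&b=', keep_blank_values=False)
'http://www.example.com/do?a=1'
>>>
>>> # fragments are removed by default, kept with keep_fragments=True
>>> w3lib.url.canonicalize_url('http://www.example.com/page#section')
'http://www.example.com/page'
>>> w3lib.url.canonicalize_url('http://www.example.com/page#section', keep_fragments=True)
'http://www.example.com/page#section'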