Cost of a Join

How expensive is a join?

It depends! It depends on what the join criteria are, what indexes are present, how big the tables are, whether the relations are cached, what hardware is being used, what configuration parameters are set, whether statistics are up-to-date, and what other activity is happening on the system, to name a few things.

But we can still try and get a feel for what a couple of simple scenarios look like, and see what happens when:

  1. The number of tables being joined increases
  2. The number of rows in those tables increases
  3. Indexes are present / not present
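To make those three variables concrete, here is a minimal Go sketch that only generates the SQL for such an experiment; it does not execute or time anything. All of it is an assumption rather than the post's actual harness: PostgreSQL is assumed (for `generate_series`), the tables are hypothetical `t1`..`tn` with a single `id` column, and the query is assumed to be a single-row lookup (`WHERE t1.id = $1`) chained across every table.

```go
package main

import (
	"fmt"
	"strings"
)

// setupSQL returns hypothetical PostgreSQL statements that create n tables
// t1..tn, each filled with ids 1..rows. When indexed is true, every id
// column also gets a B-tree index.
func setupSQL(n, rows int, indexed bool) []string {
	var stmts []string
	for i := 1; i <= n; i++ {
		stmts = append(stmts,
			fmt.Sprintf("CREATE TABLE t%d (id integer NOT NULL)", i),
			fmt.Sprintf("INSERT INTO t%d SELECT generate_series(1, %d)", i, rows),
		)
		if indexed {
			stmts = append(stmts, fmt.Sprintf("CREATE INDEX ON t%d (id)", i))
		}
	}
	// Refresh planner statistics so stale stats don't skew the comparison.
	return append(stmts, "ANALYZE")
}

// chainJoinSQL joins t1..tn in a chain on matching ids and filters to a
// single id (an assumed lookup shape, not necessarily the post's query).
func chainJoinSQL(n int) string {
	var b strings.Builder
	b.WriteString("SELECT t1.id FROM t1")
	for i := 2; i <= n; i++ {
		fmt.Fprintf(&b, " JOIN t%d ON t%d.id = t%d.id", i, i, i-1)
	}
	b.WriteString(" WHERE t1.id = $1")
	return b.String()
}

func main() {
	for _, stmt := range setupSQL(3, 100_000, true) {
		fmt.Println(stmt + ";")
	}
	fmt.Println(chainJoinSQL(3) + ";")
}
```

Varying `n` and `rows` covers the first two scenarios, and toggling `indexed` covers the third; the timing itself would come from running the generated query (e.g. under `EXPLAIN ANALYZE`).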

It’s interesting that when indexes are being used it almost doesn’t matter how many rows are in the table: the timings for the different table sizes all land more or less together.

.. Even with a query that already joins 150 tables with 100k rows each, adding another table adds only 1.2 ms. Cool!

.. Based on the improvement we saw with indexes previously, it’s not too much of a surprise that we get great performance with our million-row tables as well. But still: joining 50 tables with 1M rows each in just 12 ms. Wow!

Yes, Trump Is Weak. So Is Congress.

Mr. Trump is a uniquely dysfunctional chief executive. He contributed to this latest failure of governance with some characteristic misbehavior: erratic, contradictory commitments; confusing tweets; even blowing up a negotiating session by crudely insulting vast swaths of humanity.

As Mitch McConnell, the Senate majority leader, said last week, “As soon as we figure out what he is for, then I would be convinced that we were not just spinning our wheels.”

.. The problem Mr. Trump poses for the rest of the constitutional system is not that he is too strong and overbearing, but that he is too weak and fitful.

For Congress, such a problem might easily present an opportunity. A president unsure of what he wants could be a chance for the legislative branch to put itself in the driver’s seat.

That nothing of the sort has happened suggests that Mr. Trump is far from the whole story of contemporary Washington’s debilitation. His weakness has shed light on Congress’s weakness, and should force legislators to face some tough questions about the state of their own institution.

.. Conservatives are accustomed to blaming that on aggression by the other two branches — an overweening executive and administrative state and a hyperactive judiciary. There is surely truth to that indictment. But we should acknowledge, too, that the aggression of the other two branches has often been invited by the willful weakness of the Congress.

.. Not wishing to take responsibility for making hard choices, members of Congress (particularly when the president is of their party) have long been happy to enact vague legislation at best and to leave big decisions to the executive and judicial branches.

.. Is Congress’s purpose to:

  • implement the agenda of the majority party most effectively, or
  • compel and enable accommodations in a divided country?

Today’s Congress does neither very well. But which failure is a bug and which is a feature?

.. Those two visions of Congress’s purpose (which the political scientist Daniel Stid labels “Wilsonian” and “Madisonian,” respectively) generally point in opposite directions when it comes to strengthening Congress.

.. The Wilsonian vision would have Congress function more like a European parliament, with stronger centralized leadership and fewer choke points and protections of minority prerogatives. It would enable the party that won a majority of seats to enact its agenda and see what voters make of it in the next election.

.. The Madisonian vision would recover the purpose of Congress in our larger constitutional system but would mean slow going, greater cacophony, less centralization and more opportunities for coalitions of strange bedfellows to form. It would have Congress serve as an arena for continuing bargaining and compromise, on the premise that greater social peace is better for the country than either party’s bright ideas.

A more parliamentary Congress has been the dream of progressive reformers for more than a century, but it is a poor fit not only for a system of divided powers but also for a polarized society. We need Congress to pursue and drive accommodations — in fact, as the political scientist Philip Wallach has recently argued, Congress is really the only institution in our system of government that could do that.

.. Too often, members in both parties seem to conceive of their work as performative rather than deliberative and use Congress as a platform to raise their profiles or build their personal brands before a larger audience, rather than letting Congress’s constitutional contours contain, reshape and channel their ambitions.

.. This is also how President Trump conceives of the presidency — and in some key respects how his predecessor did, too. It is how too many judges think of their work, and how too many journalists, professors and other professionals think of theirs. They think of institutions not as formative but as performative, not as molds that shape their character and actions but as platforms for displaying themselves and signaling their virtue.

Handling 1 Million Requests per Minute with Go

They’re uploading each POST payload to S3 at a rate of up to 1M uploads a minute? They’re going to go broke from S3 operational fees. PUT fees are $0.005 per 1,000 requests, or $5/minute, or $7,200/day. S3 is an absolutely terrible financial choice for systems that need to store a vast number of tiny files.
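The arithmetic in that comment checks out. A small Go sketch of it, assuming the quoted price of $0.005 per 1,000 PUTs (actual prices vary by region and storage class) and working in integer micro-dollars so nothing is lost to floating-point rounding:

```go
package main

import "fmt"

const (
	// Quoted PUT price: $0.005 per 1,000 requests = 5,000 micro-dollars.
	putPriceMicroPer1000 int64 = 5_000
	minutesPerDay        int64 = 24 * 60
)

// putCostMicro returns the cost, in micro-dollars, of the given number of PUTs.
func putCostMicro(puts int64) int64 {
	return puts * putPriceMicroPer1000 / 1000
}

func main() {
	perMinute := putCostMicro(1_000_000) // 1M uploads in one minute
	perDay := perMinute * minutesPerDay
	fmt.Printf("$%.2f/minute, $%.2f/day\n", float64(perMinute)/1e6, float64(perDay)/1e6)
	// prints "$5.00/minute, $7200.00/day"
}
```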
They’re batching the requests into larger files on S3. The 1M refers to the number of HTTP requests hitting their server.
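Batching, as that reply describes it, means many requests share one object, so the PUT count drops by roughly the batch size. A minimal Go sketch of the idea, assuming newline-delimited payloads and a hypothetical `flush` callback standing in for the actual upload (no AWS SDK call is shown or implied):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Batcher accumulates small payloads and passes them to flush as one
// newline-delimited blob once maxItems have been collected.
type Batcher struct {
	mu       sync.Mutex
	buf      bytes.Buffer
	count    int
	maxItems int
	flush    func([]byte) // hypothetical stand-in for one object upload
}

func NewBatcher(maxItems int, flush func([]byte)) *Batcher {
	return &Batcher{maxItems: maxItems, flush: flush}
}

// Add appends one payload and flushes if the batch is full.
func (b *Batcher) Add(payload []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.buf.Write(payload)
	b.buf.WriteByte('\n')
	b.count++
	if b.count >= b.maxItems {
		b.flushLocked()
	}
}

// Flush forces out any partial batch (e.g. on a timer or at shutdown).
func (b *Batcher) Flush() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.flushLocked()
}

// flushLocked copies the buffer, resets it and hands the copy to flush.
// The caller must hold b.mu.
func (b *Batcher) flushLocked() {
	if b.count == 0 {
		return
	}
	out := make([]byte, b.buf.Len())
	copy(out, b.buf.Bytes())
	b.buf.Reset()
	b.count = 0
	b.flush(out)
}

func main() {
	var uploads [][]byte
	b := NewBatcher(3, func(blob []byte) {
		uploads = append(uploads, blob) // one "PUT" per batch
	})
	for i := 0; i < 7; i++ {
		b.Add([]byte(fmt.Sprintf("request-%d", i)))
	}
	b.Flush()
	fmt.Println("requests: 7, uploads:", len(uploads)) // requests: 7, uploads: 3
}
```

The callback runs while the lock is held, which keeps the sketch simple but stalls `Add` during an upload; a production version would more likely hand the blob to another goroutine and also flush on a timer so quiet periods don't leave requests sitting in memory.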
Can you show me where in the post that is described? I do not see it. All I see is a description of how they moved the UploadToS3 aspect to a job queue, but it’s still sending individual files to S3.
Storing millions of tiny files in any filesystem is a terrible choice.
Fair point, I was mostly focused on the absurd cost for that specific implementation. What would you suggest as an alternative? A document-oriented database?
If you’re on AWS I would probably go with DynamoDB; if you’re on GCP, Datastore. They aren’t drop-in replacements for one another, but the way you architect your system will be similar(ish). The main benefit is that it’ll cost less upfront and require less to manage. Now that AWS have simplified back-ups it’s a pretty simple system to operate. If you’re looking for better control over latency then I’d probably go with Cassandra.

There’s a big caveat to any NoSQL database, and that’s how you handle aggregates/roll-ups. With a standard database it’s easy to write these queries. If you do it without thinking on a NoSQL system it’ll cost you in performance and, where you’re billed per access, money. There are a few ways to address this:

– batch à la map-reduce.

– streaming à la Apache Beam, Spark, etc.

– in query counting (aka sharded counters).

An underused option is actually SQLite. That gives you a surprisingly feature-rich system with very low overhead. In fact, you may see benefits: faster access and less disk usage (see https://www.sqlite.org/fasterthanfs.html).

A key-value store would probably work well, depending on how well its storage layer is architected.