This article supplements a webinar series on deploying and managing containerized workloads in the cloud. The series covers the essentials of containers, including managing container lifecycles, deploying multi-container applications, scaling workloads, and working with Kubernetes. It also highlights best practices for running stateful applications.
If you’d rather not sweat too much or do any pioneering, the safest way to scale is of course to stick with the proven out-of-the-box features of Postgres. So first I’d recommend taking a look at the following keywords and their short explanations; maybe that’s all you need.
- Light-weight / special purpose indexes
For a complex OLTP system, supporting hundreds of freaky queries, it is very common that the indexes actually take much more disk space than the table files holding the data. To improve on that (especially for indexes that are used infrequently) one can reduce the index sizes drastically with appropriate use of partial, BRIN, GIN or even the somewhat experimental BLOOM indexes. In total there are 7 different index types supported, but most people only know about and use the default B-tree, which is a big mistake in a multi-TB setting!
Partial indexes allow indexing only a subset of the data – for example in a sales system we might not be interested in fast access to orders in status “FINISHED” (some nightly reports deal with that usually and they can take their time), so why should we index such rows?
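To make the idea concrete, here is a minimal sketch of a partial index. It assumes SQLite (via Python's built-in `sqlite3` module) as a stand-in so the example runs without a Postgres server; Postgres accepts the same `CREATE INDEX ... WHERE` shape. The `orders` table, its columns, and the index name are hypothetical.

```python
import sqlite3


def build_orders_db():
    """Create an in-memory orders table with a partial index on open orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
    )
    # Every 10th order is still OPEN; the rest are FINISHED.
    rows = [(i, i % 100, "OPEN" if i % 10 == 0 else "FINISHED") for i in range(1, 1001)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # Partial index: only rows that are not FINISHED get an index entry,
    # so the index covers 100 rows instead of 1000.
    conn.execute(
        "CREATE INDEX orders_open_idx ON orders (customer_id) "
        "WHERE status <> 'FINISHED'"
    )
    return conn


def open_orders_for(conn, customer_id):
    """Return ids of unfinished orders for one customer.

    The query repeats the index predicate, which is what allows the
    planner to consider the partial index.
    """
    cur = conn.execute(
        "SELECT id FROM orders "
        "WHERE status <> 'FINISHED' AND customer_id = ? ORDER BY id",
        (customer_id,),
    )
    return [row[0] for row in cur]
```

Queries against finished orders simply fall back to a table scan, which is the trade-off the nightly-report scenario accepts.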
GIN, perhaps the best-known non-default index type, has actually been around for ages (full-text search) and in short is perfect for indexing columns where there are a lot of repeating values: think all kinds of statuses or good old Mr/Mrs/Miss. GIN stores every unique column value only once, whereas with the default B-tree you’ll have e.g. 1 million leaf nodes with the integer “1” in them.
BRIN (block-range a.k.a. min-max index) on the other hand is something newer and very different – it’s a lossy index type with a very small disk footprint where not all column values are actually indexed but only the biggest and smallest values for a range of rows (1 MB section of a table by default) – but this still works very well on ordered values and is for example perfect for time series data or other “log” type of tables.
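The min-max idea can be sketched in a few lines of plain Python. This is an illustration of the concept only, not how Postgres implements BRIN; the function names and the in-memory list standing in for a table are assumptions made for the example.

```python
def build_minmax_index(values, block_size):
    """Summarize each block of `block_size` consecutive rows as (min, max)."""
    return [
        (min(values[i:i + block_size]), max(values[i:i + block_size]))
        for i in range(0, len(values), block_size)
    ]


def candidate_blocks(index, lo, hi):
    """Return the block numbers whose [min, max] range overlaps [lo, hi]."""
    return [n for n, (bmin, bmax) in enumerate(index) if bmax >= lo and bmin <= hi]


def range_scan(values, index, block_size, lo, hi):
    """Scan only the candidate blocks and recheck each row.

    The recheck is needed because the index is lossy: a block is a
    candidate if it *might* contain matching rows.
    """
    hits = []
    for n in candidate_blocks(index, lo, hi):
        block = values[n * block_size:(n + 1) * block_size]
        hits.extend(v for v in block if lo <= v <= hi)
    return hits


# Ordered data (think timestamps): 1000 rows summarized by 10 index entries.
timestamps = list(range(1000))
index = build_minmax_index(timestamps, block_size=100)
print(candidate_blocks(index, 250, 260))  # [2]
```

On ordered data a range query touches a single block. If the same values were scattered randomly across the table, nearly every block's min-max range would overlap the query and the index would stop helping, which is why this works best for time series and log-style tables.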
BLOOM might be exotic, but if you manage to find a good use case (“bitmap/matrix search”) for it, it can be up to 20x more efficient than traditional indexing; see here for an example use case if it seems too abstract.
.. advantages of partitioning are: it’s possible to cleanly separate “cold data” and “hot data” – and this gives us some nice options like compacting old data maximally with VACUUM FULL or placing it on another media
As mentioned above, it is possible to move tables / indexes selectively to different disk media with the help of tablespaces. Here one can achieve different goals: just saving money by using slower/affordable disk partitions for “cold” data, keeping only the most recent/important data on fast/expensive media, using special compressed file systems for data that has a lot of repetitions, or using network shares or even in-memory file systems on remote nodes for massive non-persistent data. There are quite a few options. Management of tablespaces is also quite straightforward; only transferring existing tables / indexes during live operation can be problematic due to full locking.
.. What I call hybrid tables are actually based on Postgres’ excellent SQL MED standard implementation, also known as Foreign Data Wrappers, and they basically look like normal Postgres tables for read queries, but the data might reside or be piped over from literally anywhere: it might be coming from Twitter, LDAP or Amazon S3, see here for the full list of crazy datasources supported. In practice the most used application of Foreign Data Wrappers (FDWs) is probably making normal (correctly formatted) files look like tables, for example exposing the server log as a table to make monitoring easier.
.. Where’s the scaling part, you may ask? The FDW approach works very well in the sense that it lets you reduce the amount of data by using clever file formats or just compression, which typically reduces the data size 10-20x so that the data fits on the node! This works very well for “cold” data, leaving more disk space/cache available for real tables with “hot” data. Since Postgres 10 it is also very easy to implement; sample code here.
Another very promising use case is to use the columnar data storage format (ORC) – take a look at the “c_store” extension project for more info. It’s especially suited for helping to scale large Data Warehouses with tables being up to 10x smaller and queries up to 100% faster.
If we repeat the previous insert of 1-million rows, the new table size is 117,030,912 bytes, or roughly 112MB. By simply reorganizing the table columns, we’ve saved 21% of the total space.
.. I’ve seen 60TB Postgres databases; imagine reducing that by 6-12TB without actually removing any data.
Much like filling a jar with rocks, pebbles, and sand, the most efficient way to declare a Postgres table is by the column alignment type. Bigger columns first, medium columns next, small columns last, and weird exceptions like NUMERIC and TEXT tacked to the end as if they were dust in our analogy. That’s what we get for playing with pointers.
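The jar analogy can be checked with a little arithmetic. The sketch below is a simplified model, not Postgres' actual storage code: it covers only a handful of fixed-width types, ignores the tuple header and NULL bitmap, and leaves out variable-length types such as TEXT and NUMERIC. The example column names are hypothetical.

```python
# (size, alignment) in bytes for a few fixed-width Postgres types.
TYPE_LAYOUT = {
    "bigint": (8, 8),
    "integer": (4, 4),
    "smallint": (2, 2),
    "boolean": (1, 1),
}
MAXALIGN = 8  # whole rows are padded to a multiple of 8 bytes


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def row_data_bytes(columns):
    """Bytes used by one row's column data, including alignment padding."""
    offset = 0
    for _name, type_name in columns:
        size, alignment = TYPE_LAYOUT[type_name]
        offset = align_up(offset, alignment) + size
    return align_up(offset, MAXALIGN)


def reorder_by_alignment(columns):
    """Biggest alignment first; the stable sort keeps ties in original order."""
    return sorted(columns, key=lambda col: TYPE_LAYOUT[col[1]][1], reverse=True)


columns = [
    ("is_active", "boolean"),
    ("user_id", "bigint"),
    ("flags", "smallint"),
    ("created_at", "bigint"),
    ("status", "integer"),
    ("order_id", "bigint"),
]
print(row_data_bytes(columns))                        # 48
print(row_data_bytes(reorder_by_alignment(columns)))  # 32
```

In this model the interleaved layout wastes 7 bytes after the boolean, 6 after the smallint, and 4 after the integer, while the reordered layout needs only one byte of padding at the end of the row.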
.. Some might ask why this isn’t built into Postgres. Surely it knows the ideal column ordering and has the power to decouple a user’s visible mapping from what actually hits the disk. That’s a legitimate question, but it’s a lot more difficult to answer and involves a lot of bike-shedding.
One major benefit from decoupling physical from logical representation is that Postgres would finally allow column reordering, or adding a column in a specific position. If a user wants their column listing to look pretty after multiple modifications, why not let them?
It’s all about priorities. There’s been a TODO item to address this going back to at least 2006. Patches have gone back and forth since then, and every time, the conversation eventually ends without a definitive conclusion. It’s clearly a difficult problem to address, and there are, as they say, bigger fish to fry.
Given sufficient demand, someone will sponsor a patch to completion, even if it requires multiple Postgres versions for the necessary under-the-hood changes to manifest. Until then, a simple query can magically reveal the ideal column ordering if the impact is pressing enough for a particular use case.
Now he’s talking publicly for the first time. Under pressure from Mark Zuckerberg and Sheryl Sandberg to monetize WhatsApp, he pushed back as Facebook questioned the encryption he’d helped build and laid the groundwork to show targeted ads and facilitate commercial messaging.
Acton also walked away from Facebook a year before his final tranche of stock grants vested. “It was like, okay, well, you want to do these things I don’t want to do,” Acton says. “It’s better if I get out of your way. And I did.” It was perhaps the most expensive moral stand in history. Acton took a screenshot of the stock price on his way out the door—the decision cost him $850 million.
.. “As part of a proposed settlement at the end, [Facebook management] tried to put a nondisclosure agreement in place,” Acton says. “That was part of the reason that I got sort of cold feet in terms of trying to settle with these guys.”
.. That kind of answer masks the kind of issues that just prompted Instagram’s founders to abruptly quit. Kevin Systrom and Mike Krieger reportedly chafed at Facebook and Zuckerberg’s heavy hand. Acton’s account of what happened at WhatsApp—and Facebook’s plans for it—provides a rare founder’s-level window into a company that’s at once the global arbiter of privacy standards and the gatekeeper of facts, while also increasingly straying from its entrepreneurial roots.
.. Despite a transfer of several billion dollars, Acton says he never developed a rapport with Zuckerberg. “I couldn’t tell you much about the guy,” he says. In one of their dozen or so meetings, Zuck told Acton unromantically that WhatsApp, which had a stipulated degree of autonomy within the Facebook universe and continued to operate for a while out of its original offices, was “a product group to him, like Instagram.”
.. So Acton didn’t know what to expect when Zuck beckoned him to his office last September, around the time Acton told Facebook brass that he planned to leave. Acton and Koum had a clause in their contract that allowed them to get all their stock, which was being doled out over four years, if Facebook began “implementing monetization initiatives” without their consent.
.. The Facebook-WhatsApp pairing had been a head-scratcher from the start. Facebook has one of the world’s biggest advertising networks; Koum and Acton hated ads. Facebook’s added value for advertisers is how much it knows about its users; WhatsApp’s founders were pro-privacy zealots who felt their vaunted encryption had been integral to their nearly unprecedented global growth.
.. This dissonance frustrated Zuckerberg. Facebook, Acton says, had decided to pursue two ways of making money from WhatsApp. First, by showing targeted ads in WhatsApp’s new Status feature, which Acton felt broke a social compact with its users. “Targeted advertising is what makes me unhappy,” he says. His motto at WhatsApp had been “No ads, no games, no gimmicks”—a direct contrast with a parent company that derived 98% of its revenue from advertising. Another motto had been “Take the time to get it right,” a stark contrast to “Move fast and break things.”
.. Facebook also wanted to sell businesses tools to chat with WhatsApp users. Once businesses were on board, Facebook hoped to sell them analytics tools, too. The challenge was WhatsApp’s watertight end-to-end encryption, which stopped both WhatsApp and Facebook from reading messages.
.. For his part, Acton had proposed monetizing WhatsApp through a metered-user model, charging, say, a tenth of a penny after a certain large number of free messages were used up. “You build it once, it runs everywhere in every country,” Acton says. “You don’t need a sophisticated sales force. It’s a very simple business.”
.. Acton’s plan was shot down by Sandberg. “Her words were ‘It won’t scale.’ ”
.. “I called her out one time,” says Acton, who sensed there might be greed at play. “I was like, ‘No, you don’t mean that it won’t scale. You mean it won’t make as much money as . . . ,’ and she kind of hemmed and hawed a little. And we moved on. I think I made my point. . . . They are businesspeople, they are good businesspeople. They just represent a set of business practices, principles and ethics, and policies that I don’t necessarily agree with.”
.. When Acton reached Zuckerberg’s office, a Facebook lawyer was present. Acton made clear that the disagreement—Facebook wanted to make money through ads, and he wanted to make it from high-volume users—meant he could get his full allocation of stock. Facebook’s legal team disagreed, saying that WhatsApp had only been exploring monetization initiatives, not “implementing” them.
.. Zuckerberg, for his part, had a simple message: “He was like, This is probably the last time you’ll ever talk to me.”
.. Acton graduated from Stanford with a bachelor’s in computer science and eventually became one of the first employees at Yahoo in 1996, making millions in the process. His biggest asset from that time at Yahoo: befriending Koum, a Ukrainian immigrant he clicked with over their similar no-nonsense style.
.. WhatsApp, persuading a handful of former Yahoo colleagues to fund a seed round while he took on cofounder status and wound up with a roughly 20% stake.
.. two things sparked Zuckerberg’s mega-offer in early 2014. One was hearing that WhatsApp’s founders had been invited to Google’s Mountain View headquarters for talks, and he did not want to lose them to a competitor.
.. He recalls Zuckerberg being “supportive” of WhatsApp’s plans to roll out end-to-end encryption, even though it would block attempts to harvest user data. If anything, he was “quick to respond” during the discussions. Zuckerberg “was not immediately evaluating ramifications in the long term.”
.. told them that they would have “zero pressure” on monetization for the next five years.
.. Facebook prepared Acton to meet with around a dozen representatives of the European Competition Commission in a teleconference. “I was coached to explain that it would be really difficult to merge or blend data between the two systems,”
.. Later he learned that elsewhere in Facebook, there were “plans and technologies to blend data.” Specifically, Facebook could use the 128-bit string of numbers assigned to each phone as a kind of bridge between accounts. The other method was phone-number matching, or pinpointing Facebook accounts with phone numbers and matching them to WhatsApp accounts with the same phone number.
.. Within 18 months, a new WhatsApp terms of service linked the accounts and made Acton look like a liar. “I think everyone was gambling because they thought that the EU might have forgotten because enough time had passed.” No such luck: Facebook wound up paying a $122 million fine for giving “incorrect or misleading information” to the EU—a cost of doing business
.. Linking these overlapping accounts was a crucial first step toward monetizing WhatsApp. The terms-of-service update would lay the groundwork for how WhatsApp could make money. During the discussions over these changes, Facebook sought “broader rights” to WhatsApp user data, Acton says, but WhatsApp’s founders pushed back, reaching a compromise with Facebook management. A clause about no ads would remain, but Facebook would still link the accounts to present friend suggestions on Facebook and offer its advertising partners better targets for ads on Facebook.
.. By then, three years since the deal, Zuckerberg was growing impatient, Acton says, and he expressed his frustrations at an all-hands meeting for WhatsApp staffers. “The CFO projections, the ten-year outlook—they wanted and needed the WhatsApp revenues to continue to show the growth to Wall Street,”
.. Internally, Facebook had targeted a $10 billion revenue run rate within five years of monetization, but such numbers sounded too high to Acton—and reliant on advertising.
.. Acton had left a management position on Yahoo’s ad division over a decade earlier with frustrations at the Web portal’s so-called “Nascar approach” of putting ad banners all over a Web page. The drive for revenue at the expense of a good product experience “gave me a bad taste in my mouth,” Acton remembers. He was now seeing history repeat.
.. He has supercharged a small messaging app, Signal, run by a security researcher named Moxie Marlinspike with a mission to put users before profit, giving it $50 million and turning it into a foundation. Now he’s working with the same people who built the open-source encryption protocol that is part of Signal and protects WhatsApp’s 1.5 billion users and that also sits as an option on Facebook Messenger, Microsoft’s Skype and Google’s Allo messenger. Essentially, he’s re-creating WhatsApp in the pure, idealized form it started: free messages and calls, with end-to-end encryption and no obligations to ad platforms.
Want to change the world? Don’t bother volunteering—get a real, ‘boring’ job.
If you’re volunteering at shelters or working for most nonprofits, that’s all very nice, but it’s one-off. You’re one of the privileged few who have the education to create lasting change. It may feel good to ladle soup to the hungry, but you’re wasting valuable brain waves that could be spent ushering in a future in which no one is hungry to begin with.
There’s a word that was probably never mentioned by your professors: Scale. No, not the stuff on the bottom of your bong or bathtub. It’s the concept of taking a small idea and finding ways to implement it for thousands, or millions, or even billions. Without scale, ideas are no more than hot air. Stop doing the one-off two-step. It’s time to scale up.
Don’t spend all your time caring for the sick. Prevent disease. Gene therapy, early detection and immunotherapy can change the trajectory of disease because they scale. Don’t build temporary shelters. Figure out how to 3-D print real homes quickly and cheaply. Why tutor a few students when you can capture lessons from best-of-breed teachers and deliver them electronically to millions? That’s scale.
.. There is too much talk of sustainability, the fight over slices of a pie, zero-sum games. That’s the wrong framework. You need sustainability only if you stick to one-off moves.
.. detoxifying oppression
.. Channel that energy to change the stagnant status quo through scale in education, banking and especially government.
.. listen to Bono. As he told Georgetown students a few years ago, “Entrepreneurial capitalism takes more people out of poverty than aid.”
Productivity, safety, and operational simplicity in a unified distributed database.
From the team that scaled Twitter.
AWS Lambda has stamped a big DEPRECATED on containers
When AWS releases their own tooling, it always seems to start out pretty bad, so the temptation is to fill in those gaps with your own tool.
But AWS services change and get better at a very rapid rate. So I think the lesson I learned is lean on AWS as much as possible, or build on top of their foundation and make it pluggable in a way that you can just revert to the AWS tooling when it gets better.
.. As I talk to developers and sysadmins, I feel like I encounter a lot of rage about serverless as a concept. People always want to tell me the three reasons why it would never work for them. Why do you think this concept inspires so much animosity and how do you try to change hearts and minds on this?
A big part of it is that we are deprecating so many things at one time. It does feel like a very big step to me compared to something like containers. Kelsey Hightower said something like this at one point: containers enable you to take the existing paradigm and move it forward, whereas serverless is an entirely new paradigm.
.. And so all these things that people have invented and invested time and money and resources in are just going away, and that’s traumatic, that’s painful. It won’t happen overnight, but anytime you make something that makes people feel like what they’ve maybe spent the last 10 years doing is obsolete, it’s hard.
.. the first time we launched a serverless service, we brought down all of our Redis instances — because Lambda spun up all these containers and we hit connection limits that you would never expect to hit in a normal app.
.. So if you’ve got something sitting on a mainframe somewhere that is used to only having 20 connections and then you moved over some upstream service to Lambda and suddenly it has 10,000 connections instead of 20. You’ve got a problem.
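The arithmetic behind that failure mode is simple enough to sketch. The helper names below are hypothetical, and the model assumes every concurrently running function instance holds a fixed number of connections to the backend; the resulting cap is the kind of number one might apply through a concurrency limit such as Lambda's reserved concurrency setting.

```python
def total_connections(concurrent_instances, conns_per_instance):
    """Connections the backend sees when every instance is running."""
    return concurrent_instances * conns_per_instance


def max_safe_concurrency(backend_limit, conns_per_instance, reserved=0):
    """Largest instance count that stays within the backend's connection limit.

    `reserved` sets aside connections for other clients (admin tools, cron jobs).
    """
    if conns_per_instance <= 0:
        raise ValueError("conns_per_instance must be positive")
    usable = max(backend_limit - reserved, 0)
    return usable // conns_per_instance


# A backend used to 20 connections, facing 10,000 concurrent instances:
print(total_connections(10_000, 1))  # 10000
print(max_safe_concurrency(20, 1))   # 20
```

If the safe concurrency is far below what the workload needs, the usual alternative is to put a pooling or queueing layer between the functions and the backend rather than connecting directly.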
.. You could have an application that’s actually looking at what’s happening in the code and saying: “Wow this one part of your code is taking a long time to run; we should make that its own Lambda function and we should automatically deploy that and set up this SNS trigger for you.” That’s all very pie in the sky, but I think we’re not that far off from having these tools.