This article supplements a webinar series on deploying and managing containerized workloads in the cloud. The series covers the essentials of containers, including managing container lifecycles, deploying multi-container applications, scaling workloads, and working with Kubernetes. It also highlights best practices for running stateful applications.
IDEAS FOR SCALING POSTGRESQL TO MULTI-TERABYTE AND BEYOND
If you’d rather not sweat too much or do any pioneering, the safest way to scale is of course to stick with proven out-of-the-box features of Postgres. So first I’d recommend taking a look at the following keywords, each with a short explanation. Maybe that’s all you need.
- Light-weight / special purpose indexes
For a complex OLTP system supporting hundreds of freaky queries, it is very common that the indexes actually take much more disk space than the table files holding the data. To improve on that (especially for indexes that are used infrequently), one can reduce index sizes drastically with appropriate use of partial, BRIN, GIN or even the somewhat experimental BLOOM indexes. In total there are 7 different index types supported, but most people only know about and use the default B-tree, which is a big mistake in a multi-TB setting!
Partial indexes allow indexing only a subset of the data – for example in a sales system we might not be interested in fast access to orders in status “FINISHED” (some nightly reports deal with that usually and they can take their time), so why should we index such rows?
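The sales example above can be sketched in a few lines. To keep the sketch runnable without a Postgres server, it uses Python's built-in `sqlite3` module as a stand-in (an assumption of this example, not something the article uses); Postgres accepts the same `CREATE INDEX ... WHERE ...` form. The table and index names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
# Partial index: rows in status 'FINISHED' are never added to the index.
conn.execute(
    "CREATE INDEX idx_orders_unfinished ON orders (customer_id) "
    "WHERE status <> 'FINISHED'"
)


def plan(sql: str) -> str:
    """Return the query plan as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


# The query repeats the index predicate, so the planner may use the index.
open_plan = plan(
    "SELECT id FROM orders WHERE customer_id = 42 AND status <> 'FINISHED'"
)
# This query asks for rows the index does not contain, so it cannot be used.
finished_plan = plan(
    "SELECT id FROM orders WHERE customer_id = 42 AND status = 'FINISHED'"
)

print(open_plan)
print(finished_plan)
```

The point of the design is that queries over finished orders fall back to a table scan, which is acceptable for nightly reports, while the index stays small because it only covers the rows that need fast access.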
GIN, perhaps the best-known non-default index type, has actually been around for ages (full-text search) and in short is perfect for indexing columns where there are a lot of repeating values. Think all kinds of statuses or good old Mr/Mrs/Miss. GIN stores every unique column value only once, whereas with the default B-tree you’ll have e.g. 1 million leaf nodes with the integer “1” in them.
BRIN (block-range, a.k.a. min-max, index) on the other hand is something newer and very different. It’s a lossy index type with a very small disk footprint: not all column values are actually indexed, only the biggest and smallest values for a range of rows (a 1 MB section of a table by default). This still works very well on ordered values and is, for example, perfect for time series data or other “log” type tables.
BLOOM might be exotic, but if you manage to find a good use case (“bitmap/matrix search”) for it, it can be up to 20x more efficient than traditional indexing. If that seems too abstract, see here for an example use case.
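To make the BRIN idea concrete, here is a minimal pure-Python sketch of a min-max summary per block of rows. It is a toy model under stated assumptions, not how Postgres implements BRIN: the names `RangeSummary`, `build_brin` and `brin_search` are hypothetical, and a "block range" is simply a fixed number of list elements.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RangeSummary:
    start: int  # index of the first row in this block range
    end: int    # one past the last row in this block range
    lo: int     # smallest value seen in the range
    hi: int     # largest value seen in the range


def build_brin(values: Sequence[int], rows_per_range: int) -> List[RangeSummary]:
    """Store only (min, max) per block range instead of every value."""
    summaries = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        summaries.append(
            RangeSummary(start, start + len(chunk), min(chunk), max(chunk))
        )
    return summaries


def brin_search(
    values: Sequence[int], summaries: List[RangeSummary], lo: int, hi: int
) -> Tuple[List[int], int]:
    """Return matching row positions and the number of block ranges scanned."""
    matches, ranges_scanned = [], 0
    for s in summaries:
        if s.hi < lo or s.lo > hi:
            continue  # the summary proves this range holds no match
        ranges_scanned += 1
        # Lossy index: every row in a candidate range must be rechecked.
        for i in range(s.start, s.end):
            if lo <= values[i] <= hi:
                matches.append(i)
    return matches, ranges_scanned


# Ordered data (like timestamps in a log table): few ranges qualify.
ordered = list(range(1000))
ordered_index = build_brin(ordered, rows_per_range=100)
rows, scanned = brin_search(ordered, ordered_index, 250, 260)

# Scrambled data: every range spans almost the whole value domain.
scrambled = [(i * 37) % 1000 for i in range(1000)]
scrambled_index = build_brin(scrambled, rows_per_range=100)
_, scanned_scrambled = brin_search(scrambled, scrambled_index, 250, 260)

print(len(ordered_index), scanned, scanned_scrambled)
```

The index holds ten summaries for a thousand rows, which is why the footprint is so small. The scrambled case shows the flip side: when values are not physically ordered, the summaries exclude almost nothing and the search degrades to reading every range.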
.. advantages of partitioning are: it’s possible to cleanly separate “cold data” and “hot data” – and this gives us some nice options like compacting old data maximally with VACUUM FULL or placing it on another media
As mentioned above, it is possible to move tables / indexes selectively to different disk media with the help of tablespaces. One can achieve different goals here: saving money by using slower/affordable disk partitions for “cold” data, keeping only the most recent/important data on fast/expensive media, using special compressed file systems for data that has a lot of repetitions, or using network shares or even in-memory file systems on remote nodes for massive non-persistent data. There are quite a few options. Management of tablespaces is also quite straightforward; only transferring existing tables / indexes during live operation can be problematic due to full locking.
.. What I call hybrid tables are actually based on Postgres’ excellent SQL MED standard implementation, also known as Foreign Data Wrappers. They basically look like normal Postgres tables for read queries, but the data might reside in or be piped over from literally anywhere: it might be coming from Twitter, LDAP or Amazon S3, see here for the full list of crazy datasources supported. In practice the most used application of Foreign Data Wrappers (FDWs) is probably making normal (correctly formatted) files look like tables, for example exposing the server log as a table to make monitoring easier.
.. Where’s the scaling part, you may ask? The FDW approach works very well in the sense that it makes it possible to reduce the amount of data by using some clever file formats or just compression, which typically reduces the data size 10-20x so that the data fits on the node! This works very well for “cold” data, leaving more disk space/cache available for real tables with “hot” data. Since Postgres 10 it is also very easy to implement, sample code here.
Another very promising use case is to use the columnar data storage format (ORC). Take a look at the “cstore_fdw” extension project for more info. It’s especially suited for helping to scale large Data Warehouses, with tables being up to 10x smaller and queries up to 100% faster.
Postgres Column Order Affects Space Used: On Rocks and Sand
If we repeat the previous insert of 1-million rows, the new table size is 117,030,912 bytes, or roughly 112MB. By simply reorganizing the table columns, we’ve saved 21% of the total space.
.. I’ve seen 60TB Postgres databases; imagine reducing that by 6-12TB without actually removing any data.
Much like filling a jar with rocks, pebbles, and sand, the most efficient way to declare a Postgres table is by the column alignment type. Bigger columns first, medium columns next, small columns last, and weird exceptions like NUMERIC and TEXT tacked to the end as if they were dust in our analogy. That’s what we get for playing with pointers.
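The rocks-and-sand rule can be checked with a little arithmetic. The sketch below is a simplified model: it covers only a few fixed-width types, uses their standard Postgres sizes and alignments, and ignores the tuple header, the NULL bitmap, and the final padding at the end of the row. The function names and the example column list are hypothetical.

```python
from typing import Dict, List, Tuple

# (size in bytes, required alignment in bytes) for a few fixed-width types.
TYPE_INFO: Dict[str, Tuple[int, int]] = {
    "bigint": (8, 8),
    "integer": (4, 4),
    "smallint": (2, 2),
    "boolean": (1, 1),
}


def data_bytes(column_types: List[str]) -> int:
    """Bytes used by the column data, including alignment padding."""
    offset = 0
    for type_name in column_types:
        size, align = TYPE_INFO[type_name]
        # Round the offset up to the next multiple of the alignment.
        offset = (offset + align - 1) // align * align
        offset += size
    return offset


def reorder(column_types: List[str]) -> List[str]:
    """Largest alignment first; the sort is stable, so ties keep their order."""
    return sorted(column_types, key=lambda t: TYPE_INFO[t][1], reverse=True)


original = ["smallint", "bigint", "integer", "bigint", "boolean"]
packed = reorder(original)

print(original, data_bytes(original))  # 33 bytes: padding before each bigint
print(packed, data_bytes(packed))      # 23 bytes: no padding needed
```

In the original order, the `smallint` forces 6 bytes of padding before the first `bigint` and the `integer` forces 4 more before the second one; sorting by alignment removes both gaps. Multiplied across millions of rows, that difference is the kind of saving the article describes.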
.. Some might ask why this isn’t built into Postgres. Surely it knows the ideal column ordering and has the power to decouple a user’s visible mapping from what actually hits the disk. That’s a legitimate question, but it’s a lot more difficult to answer and involves a lot of bike-shedding.
One major benefit from decoupling physical from logical representation is that Postgres would finally allow column reordering, or adding a column in a specific position. If a user wants their column listing to look pretty after multiple modifications, why not let them?
It’s all about priorities. There’s been a TODO item to address this going back to at least 2006. Patches have gone back and forth since then, and every time, the conversation eventually ends without a definitive conclusion. It’s clearly a difficult problem to address, and there are, as they say, bigger fish to fry.
Given sufficient demand, someone will sponsor a patch to completion, even if it requires multiple Postgres versions for the necessary under-the-hood changes to manifest. Until then, a simple query can magically reveal the ideal column ordering if the impact is pressing enough for a particular use case.
Exclusive: WhatsApp Cofounder Brian Acton Gives The Inside Story On #DeleteFacebook And Why He Left $850 Million Behind
Now he’s talking publicly for the first time. Under pressure from Mark Zuckerberg and Sheryl Sandberg to monetize WhatsApp, he pushed back as Facebook questioned the encryption he’d helped build and laid the groundwork to show targeted ads and facilitate commercial messaging.
Acton also walked away from Facebook a year before his final tranche of stock grants vested. “It was like, okay, well, you want to do these things I don’t want to do,” Acton says. “It’s better if I get out of your way. And I did.” It was perhaps the most expensive moral stand in history. Acton took a screenshot of the stock price on his way out the door—the decision cost him $850 million.
.. “As part of a proposed settlement at the end, [Facebook management] tried to put a nondisclosure agreement in place,” Acton says. “That was part of the reason that I got sort of cold feet in terms of trying to settle with these guys.”
.. That kind of answer masks the kind of issues that just prompted Instagram’s founders to abruptly quit. Kevin Systrom and Mike Krieger reportedly chafed at Facebook and Zuckerberg’s heavy hand. Acton’s account of what happened at WhatsApp—and Facebook’s plans for it—provides a rare founder’s-level window into a company that’s at once the global arbiter of privacy standards and the gatekeeper of facts, while also increasingly straying from its entrepreneurial roots.
.. Despite a transfer of several billion dollars, Acton says he never developed a rapport with Zuckerberg. “I couldn’t tell you much about the guy,” he says. In one of their dozen or so meetings, Zuck told Acton unromantically that WhatsApp, which had a stipulated degree of autonomy within the Facebook universe and continued to operate for a while out of its original offices, was “a product group to him, like Instagram.”
.. So Acton didn’t know what to expect when Zuck beckoned him to his office last September, around the time Acton told Facebook brass that he planned to leave. Acton and Koum had a clause in their contract that allowed them to get all their stock, which was being doled out over four years, if Facebook began “implementing monetization initiatives” without their consent.
.. The Facebook-WhatsApp pairing had been a head-scratcher from the start. Facebook has one of the world’s biggest advertising networks; Koum and Acton hated ads. Facebook’s added value for advertisers is how much it knows about its users; WhatsApp’s founders were pro-privacy zealots who felt their vaunted encryption had been integral to their nearly unprecedented global growth.
.. This dissonance frustrated Zuckerberg. Facebook, Acton says, had decided to pursue two ways of making money from WhatsApp. First, by showing targeted ads in WhatsApp’s new Status feature, which Acton felt broke a social compact with its users. “Targeted advertising is what makes me unhappy,” he says. His motto at WhatsApp had been “No ads, no games, no gimmicks”—a direct contrast with a parent company that derived 98% of its revenue from advertising. Another motto had been “Take the time to get it right,” a stark contrast to “Move fast and break things.”
.. Facebook also wanted to sell businesses tools to chat with WhatsApp users. Once businesses were on board, Facebook hoped to sell them analytics tools, too. The challenge was WhatsApp’s watertight end-to-end encryption, which stopped both WhatsApp and Facebook from reading messages.
.. For his part, Acton had proposed monetizing WhatsApp through a metered-user model, charging, say, a tenth of a penny after a certain large number of free messages were used up. “You build it once, it runs everywhere in every country,” Acton says. “You don’t need a sophisticated sales force. It’s a very simple business.”
.. Acton’s plan was shot down by Sandberg. “Her words were ‘It won’t scale.’ ”
.. “I called her out one time,” says Acton, who sensed there might be greed at play. “I was like, ‘No, you don’t mean that it won’t scale. You mean it won’t make as much money as . . . ,’ and she kind of hemmed and hawed a little. And we moved on. I think I made my point. . . . They are businesspeople, they are good businesspeople. They just represent a set of business practices, principles and ethics, and policies that I don’t necessarily agree with.”
.. When Acton reached Zuckerberg’s office, a Facebook lawyer was present. Acton made clear that the disagreement—Facebook wanted to make money through ads, and he wanted to make it from high-volume users—meant he could get his full allocation of stock. Facebook’s legal team disagreed, saying that WhatsApp had only been exploring monetization initiatives, not “implementing” them.
.. Zuckerberg, for his part, had a simple message: “He was like, This is probably the last time you’ll ever talk to me.”
.. Acton graduated from Stanford with a bachelor’s in computer science and eventually became one of the first employees at Yahoo in 1996, making millions in the process. His biggest asset from that time at Yahoo: befriending Koum, a Ukrainian immigrant he clicked with over their similar no-nonsense style.
.. WhatsApp, persuading a handful of former Yahoo colleagues to fund a seed round while he took on cofounder status and wound up with a roughly 20% stake.
.. two things sparked Zuckerberg’s mega-offer in early 2014. One was hearing that WhatsApp’s founders had been invited to Google’s Mountain View headquarters for talks, and he did not want to lose them to a competitor.
.. He recalls Zuckerberg being “supportive” of WhatsApp’s plans to roll out end-to-end encryption, even though it would block attempts to harvest user data. If anything, he was “quick to respond” during the discussions. Zuckerberg “was not immediately evaluating ramifications in the long term.”
.. told them that they would have “zero pressure” on monetization for the next five years.
.. Facebook prepared Acton to meet with around a dozen representatives of the European Competition Commission in a teleconference. “I was coached to explain that it would be really difficult to merge or blend data between the two systems,”
.. Later he learned that elsewhere in Facebook, there were “plans and technologies to blend data.” Specifically, Facebook could use the 128-bit string of numbers assigned to each phone as a kind of bridge between accounts. The other method was phone-number matching, or pinpointing Facebook accounts with phone numbers and matching them to WhatsApp accounts with the same phone number.
.. Within 18 months, a new WhatsApp terms of service linked the accounts and made Acton look like a liar. “I think everyone was gambling because they thought that the EU might have forgotten because enough time had passed.” No such luck: Facebook wound up paying a $122 million fine for giving “incorrect or misleading information” to the EU—a cost of doing business
.. Linking these overlapping accounts was a crucial first step toward monetizing WhatsApp. The terms-of-service update would lay the groundwork for how WhatsApp could make money. During the discussions over these changes, Facebook sought “broader rights” to WhatsApp user data, Acton says, but WhatsApp’s founders pushed back, reaching a compromise with Facebook management. A clause about no ads would remain, but Facebook would still link the accounts to present friend suggestions on Facebook and offer its advertising partners better targets for ads on Facebook.
.. By then, three years since the deal, Zuckerberg was growing impatient, Acton says, and he expressed his frustrations at an all-hands meeting for WhatsApp staffers. “The CFO projections, the ten-year outlook—they wanted and needed the WhatsApp revenues to continue to show the growth to Wall Street,”
.. Internally, Facebook had targeted a $10 billion revenue run rate within five years of monetization, but such numbers sounded too high to Acton—and reliant on advertising.
.. Acton had left a management position in Yahoo’s ad division over a decade earlier, frustrated with the Web portal’s so-called “Nascar approach” of putting ad banners all over a Web page. The drive for revenue at the expense of a good product experience “gave me a bad taste in my mouth,” Acton remembers. He was now seeing history repeat.
.. He has supercharged a small messaging app, Signal, run by a security researcher named Moxie Marlinspike with a mission to put users before profit, giving it $50 million and turning it into a foundation. Now he’s working with the same people who built the open-source encryption protocol that is part of Signal, protects WhatsApp’s 1.5 billion users, and also sits as an option on Facebook Messenger, Microsoft’s Skype and Google’s Allo messenger. Essentially, he’s re-creating WhatsApp in the pure, idealized form it started in: free messages and calls, with end-to-end encryption and no obligations to ad platforms.