How Google Edged Out Rivals and Built the World’s Dominant Ad Machine: A Visual Guide

The U.S. is investigating whether the tech giant has abused its power, including as the biggest broker of digital ad sales across the web

Nexstar Media Group Inc., the largest local news company in the U.S., recently tested what would happen if it stopped using Google’s technology to place ads on its websites.

Over several days, the company’s video ad sales plummeted. “That’s a huge revenue hit,” said Tony Katsur, senior vice president at Nexstar. After its brief test, Nexstar switched back to Google.

Alphabet Inc.’s Google is under fire for its dominance in digital advertising, in part because of issues like this. The U.S. Justice Department and state attorneys general are investigating whether Google is abusing its power, including as the dominant broker of digital ad sales across the web. Most of the nearly 130 questions the states asked in a September subpoena were about the inner workings of Google’s ad products and how they interact.

We dug into Google’s vast, opaque ad machine, and in a series of graphics below, show you how it all works—and why publishers and rivals have had so many complaints about it.

Much of Google’s power as an ad broker stems from acquisitions of ad-technology companies, especially its 2008 purchase of DoubleClick. Regulators who approved that $3.1 billion deal warned they would step in if the company tied together its offerings in anticompetitive ways.

In interviews, dozens of publishing and advertising executives said Google is doing just that with an array of interwoven products. Google operates the leading selling and buying tools, and the biggest marketplace where online ad deals happen.

When Nexstar didn’t use Google’s selling tool, it missed out on a huge amount of demand that comes through its buying tools, Mr. Katsur said: “They want you locked in.”

What Ever Happened to Google Books?

It was the most ambitious library project of our time—a plan to scan all of the world’s books and make them available to the public online. “We think that we can do it all inside of ten years,” Marissa Mayer, who was then a vice-president at Google, said to this magazine in 2007, when Google Books was in its beta stage. “It’s mind-boggling to me, how close it is.”

Today, the project sits in a kind of limbo. On one hand, Google has scanned an impressive thirty million volumes, putting it in a league with the world’s larger libraries (the Library of Congress has around thirty-seven million books). That is a serious accomplishment. But while the corpus is impressive, most of it remains inaccessible. Searches of out-of-print books often yield mere snippets of the text; there is no way to gain access to the whole book. The thrilling thing about Google Books, it seemed to me, was not just the opportunity to read a line here or there; it was the possibility of exploring the full text of millions of out-of-print books and periodicals that had no real commercial value but nonetheless represented a treasure trove for the public. In other words, it would be the world’s first online library worthy of that name. And yet the attainment of that goal has been stymied, despite Google having at its disposal an unusual combination of technological means, the agreement of many authors and publishers, and enough money to compensate just about everyone who needs it.

The problems began with a classic culture clash when, in 2002, Google began just scanning books, either hoping that the idealism of the project would win everyone over or following the mantra that it is always easier to get forgiveness than permission. That approach didn’t go over well with authors and publishers, who sued for copyright infringement. Two years of insults, ill will, and litigation ensued. Nonetheless, by 2008, representatives of authors, publishers, and Google did manage to reach a settlement to make the full library available to the public, for pay, and to institutions. In the settlement agreement, they also agreed to put terminals in libraries, but didn’t ever get around to doing that. But that agreement then came under further attacks from a whole new set of critics, including the author Ursula Le Guin, who called it a “deal with the devil.” Others argued that the settlement could create a monopoly in online, out-of-print books.

Four years ago, a federal judge sided with the critics and threw out the 2008 settlement, adding that aspects of the copyright issue would be more appropriately decided by the legislature. “Sounds like a job for Congress,” James Grimmelmann, a law professor at the University of Maryland and one of the settlement’s more vocal antagonists, said at the time. But, of course, leaving things to Congress has become a synonym for doing nothing, and, predictably, a full seven years after the settlement was first announced, we’re still waiting.

There are plenty of ways to attribute blame in this situation. If Google was, in truth, motivated by the highest ideals of service to the public, then it should have declared the project a non-profit from the beginning, thereby extinguishing any fears that the company wanted to somehow make a profit from other people’s work. Unfortunately, Google made the mistake it often makes, which is to assume that people will trust it just because it’s Google. For their part, authors and publishers, even if they did eventually settle, were difficult and conspiracy-minded, particularly when it came to weighing abstract and mainly worthless rights against the public’s interest in gaining access to obscure works. Finally, the outside critics and the courts were entirely too sanguine about killing, as opposed to improving, a settlement that took so many years to put together, effectively setting the project back a decade if not longer.

In the past few years, the Authors Guild has usefully proposed a solution known as an “extended collective licensing” system. Using a complex mechanism, it would allow the owners of scanned, out-of-print libraries, such as Google or actual non-profits like the HathiTrust library, to make a limited set of them available with payouts to authors. The United States Copyright Office supports this plan. I have a simpler suggestion, nicknamed the Big Bang license. Congress should allow anyone with a scanned library to pay some price (say, a hundred and twenty-five million dollars) to gain a license, subject to any opt-outs, allowing them to make those scanned prints available to institutional or individual subscribers. That money would be divided equally among all the rights holders who came forward to claim it in a three-year window, split fifty-fifty between authors and publishers. It is, admittedly, a crude, one-time solution to the problem, but it would do the job, and it might just mean that the world would gain access to the first real online library within this lifetime.

To Break Google’s Monopoly on Search, Make Its Index Public

Ex-Google-Search engineer here, having also done some projects since leaving that involve data-mining publicly-available web documents.

This proposal won’t do very much. Indexing is the (relatively) easy part of building a search engine. CommonCrawl already indexes the top 3B+ pages on the web and makes it freely available on AWS. It costs about $50 to grep over it, $800 or so to run a moderately complex Hadoop job.

(For comparison, when I was at Google nearly all research & new features were done on the top 4B pages, and the remaining 150B+ pages were only consulted if no results in the top 4B turned up. Running a MapReduce over that corpus was actually a little harder than running a Hadoop job over CommonCrawl, because there’s less documentation available.)

The comments here that PageRank is Google’s secret sauce also aren’t really true – Google hasn’t used PageRank since 2006. The ones about the search & clickthrough data being important are closer, but I suspect that if you made those public you still wouldn’t have an effective Google competitor.

The real reason Google’s still on top is that consumer habits are hard to change, and once people have 20 years of practice solving a problem one way, most of them are not going to switch unless the alternative isn’t just better, it’s way, way better. Same reason I still buy Quilted Northern toilet paper despite knowing that it supports the Koch brothers and their abhorrent political views, or drink Coca-Cola despite knowing how unhealthy it is.

If you really want to open the search-engine space to competition, you’d have to break Google up and then forbid any of the baby-Googles from using the Google brand or google.com domain name. (Needless to say, you’d also need to get rid of Chrome & Toolbar integration.) Same with all the other monopolies that plague the American business landscape. Once you get to a certain age, the majority of the business value is in the brand, and so the only way to keep the monopoly from dominating its industry again is to take away the brand and distribute the productive capacity to successor companies on relatively even footing.

Sure, it costs $50 to grep it, but how much does it cost to host an in-memory index with all the data?

This is not a proposal to just share the crawl data, but the actual searchable index, presumably at arm’s-length cost both internally & externally.

The same ideas could be extended to the Knowledge Graph, etc.

IMO the goal here should not be to kill Google, but to keep Google on their toes by removing barriers to competition.

This ^ times a 1000.

Google simply has the best search product. They invest in it like crazy.

I’ve tried Bing multiple times. It’s slow, and it spams MSN ads in your face on the homepage. Microsoft just doesn’t get the value of a clean UX.

DuckDuckGo results were pretty irrelevant the last time I tried them. There is nothing that comes close to Google’s usability. To make the switchover, it has to be much much better than Google. Chances are that if something is, Google will buy them.

One thing to keep in mind when comparing DuckDuckGo to Google is that people do not use Google with an alternative backup in mind. When you DDG something and it fails, you can always switch to Google.

But what about when Google fails? Unlike DDG, there is no culture of switching between search engines when googling. Typically, you’ll just rewrite the query for Google. And as rewriting the query is an entrenched part of googling, you are less likely to notice this as a failure. It is this training that’s the core advantage nostrademons points out.

Webspam is a really big problem, yes. It’s very unlikely that you’d be able to catch up or keep up in that regard without Google’s resources.

Building the index itself is relatively easy. There are some subtleties that most people don’t think about (e.g. dupe detection and redirects are surprisingly complicated, and CJK segmentation is a pre-req for tokenizing), but things like tokenizing, building posting lists, and finding backlinks are trivial: a competent programmer could get basic English-only implementations of all three running in a day.
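To give a rough sense of scale for the "trivial" parts, here is a minimal Python sketch of tokenizing, posting lists, and backlinks. It is an illustration under stated assumptions, not how any production engine does it: the function names (`tokenize`, `build_index`, `search`), the regex-based HTML handling, and the two example pages are all made up for this example, and it deliberately ignores the hard parts mentioned above (dupe detection, redirects, CJK segmentation, ranking, webspam).

```python
import re
from collections import defaultdict
from urllib.parse import urljoin

# Assumption: crude regexes stand in for a real HTML parser.
TOKEN_RE = re.compile(r"[a-z0-9]+")
HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def tokenize(text):
    """Lowercase and split on non-alphanumerics (English-only, no stemming)."""
    return TOKEN_RE.findall(text.lower())


def build_index(pages):
    """pages: dict mapping URL -> raw HTML.

    Returns (postings, backlinks):
      postings[term] -> sorted list of URLs containing the term
      backlinks[url] -> sorted list of URLs that link to it
    """
    postings = defaultdict(set)
    backlinks = defaultdict(set)
    for url, html in pages.items():
        # Resolve relative links against the page URL before recording them.
        for target in HREF_RE.findall(html):
            backlinks[urljoin(url, target)].add(url)
        # Strip tags, then index the remaining visible text.
        for term in tokenize(TAG_RE.sub(" ", html)):
            postings[term].add(url)
    return (
        {term: sorted(urls) for term, urls in postings.items()},
        {target: sorted(urls) for target, urls in backlinks.items()},
    )


def search(postings, query):
    """AND query: URLs whose text contains every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(postings.get(terms[0], []))
    for term in terms[1:]:
        result &= set(postings.get(term, []))
    return sorted(result)


# Hypothetical two-page "web" for demonstration.
pages = {
    "http://a.example/": '<p>Cats chase mice.</p> <a href="/b">more</a>',
    "http://a.example/b": '<p>Mice eat cheese.</p> <a href="http://a.example/">home</a>',
}
postings, backlinks = build_index(pages)
```

With this toy corpus, `search(postings, "mice cheese")` returns only the second page, and `backlinks["http://a.example/b"]` lists the first page as its single inbound link. Everything that makes results actually good (ranking, spam filtering, freshness) sits on top of structures like these.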

> 1) a record of searches and user clicks for the past 20 years

From what I can tell, Google cares a lot more about recency.

When I switch over to a new framework or language, search results are pretty bad for the first week, horrible actually as Google thinks I am still using /other language/. I have to keep appending the language / framework name to my queries.

After a week or so? The results are pure magic. I can search for something sort of describing what I want and Google returns the correct answer. If I search for ‘array length’ Google is going to tell me how to find the length of an array in whatever language I am currently immersed in!

As much as I try to use Duck Duck Go, Google is just too magic.

But I don’t think it is because they have my complete search history.

Also people forget that the creepy stuff Google does is super useful.

For example, whatever framework I am using, Google will start pushing news updates to my Google Now (or whatever it is called on my phone) about new releases to that framework. I get a constant stream of learning resources, valuable blog posts, and best practices delivered to me every morning!

It really is impressive.
