The Internet Archive Is Making Wikipedia More Reliable

The operator of the Wayback Machine allows Wikipedia’s users to check citations from books as well as the web.

Wikipedia is the arbiter of truth on the internet. It’s what settles arguments at bars. It supplies answers for the information snippets you see on your Google or Bing search results. It’s the first stop for nearly everyone doing online research.

The reason people rely on Wikipedia, despite its imperfections, is that every claim is supposed to have citations. Any sentence that isn’t backed up with a credible source risks being slapped with the dreaded “citation needed” label. Anyone can check out those citations to learn more about a subject, or verify that those sources actually say what a particular Wikipedia entry claims they do—that is, if you can find those sources.

It’s easy enough when the sources are online. But many Wikipedia articles rely on good old-fashioned books. The entry on Martin Luther King Jr., for example, cites 66 different books. Until recently, if you wanted to verify that those books say what the article says they say, or if you just wanted to read the cited material, you’d need to track down a copy of the book.

Now, thanks to a new initiative by the Internet Archive, you can click the name of the book and see a two-page preview of the cited work, so long as the citation specifies a page number. You can also borrow a digital copy of the book, so long as no else has checked it out, for two weeks—much the same way you’d borrow a book from your local library. (Some groups of authors and publishers have challenged the archive’s practice of allowing users to borrow unauthorized scanned books. The Internet Archive says it seeks to widen access to books in “balanced and respectful ways.”)

So far the Internet Archive has turned 130,000 references in Wikipedia entries in various languages into direct links to 50,000 books that the organization has scanned and made available to the public. The organization eventually hopes to allow users to view and borrow every book cited by Wikipedia, with the ultimate goal being to digitize every book ever published.

“Our goal is to be a library that’s useful and reachable by more people,” says Mark Graham, director of the Internet Archive’s Wayback Machine service.

If successful, the Internet Archive’s project would be a boon to students, journalists, or anyone who wants to check the references of a Wikipedia entry. Google Books also has a massive collection of digitized print books, but it tends to only show small snippets of a text.

“I’ve tried to verify Wikipedia pages by searching blurbs in Google Books but it’s an unpredictable link, and you often don’t have enough surrounding context to evaluate the use,” says Mike Caulfield, a digital literacy expert and director of blended and networked learning at Washington State University Vancouver. “The ability to read a page or two of context around a quote is crucial to both editors trying to protect the integrity of articles, and to readers who need to get to that next step of verification.”

You could, of course, verify the information the traditional way by tracking down a physical copy of a book. But students working late into the night on term papers, or reporters on tight deadlines, might not have time to order a book on Amazon or wait for a library book to become available. In other cases, books might be hard to come by. The Wikipedia entry on the internment of Japanese-Americans during World War II, for example, cites hard-to-find titles, says Internet Archive director of partnerships Wendy Hanamura. But thanks to the Internet Archive’s Digital Library of Japanese-American Incarceration, created with the Seattle-based organization Densho, many of those rare books are now available online.

The Internet Archive embarked on its effort to weave digital books into Wikipedia after the 2016 election. “No matter who you wanted to be president, I would say almost everyone would agree the whole process was a train wreck,” Internet Archive founder Brewster Kahle said in a speech in San Francisco last week. From fake news and inauthentic social media campaigns waged by foreign nations to concerns about voting systems themselves being rigged, there were plenty of ways that technology and information systems failed the public. So Kahle convened a group of people to discuss how to improve the information ecosystem. One issue that came up was the fragility of Wikipedia citations. Books and academic journals supply some of the best, most reliable information for Wikipedia editors, but those sources frequently are either unavailable online or are behind paywalls. And even freely available internet content often disappears.

The Internet Archive was in a unique position to help solve this problem. The organization’s Wayback Machine service has archived 387 billion webpages since 2001. It’s also been digitizing physical books and other analog media, and has now scanned 3.8 million books. It has millions more books warehoused.

Graham and company created the InternetArchiveBot, a tool that scans Wikipedia for broken links and automatically adds links to versions archived in the Wayback Machine. Because automatic editing tools require special permission to use, Graham has to work with the Wikipedia communities that manage versions of the encyclopedia in different languages. “All told, we’ve edited 14 million links; more than 11 million point to Internet Archive,” he says.

Adding links to books is similar but more challenging. “If a book has an ISBN number and an entry has a traditional citation format, it’s pretty easy,” Graham explains. But not all books have ISBN numbers, and many Wikipedia citations aren’t properly formatted. For instance, some only cite the book and not a specific page number. There can also be differences between different editions of a book.

Of course, the Internet Archive hasn’t scanned all the books cited by Wikipedia yet. It’s working hard to digitize collections from libraries around the world, along with donations from companies like Better World Books. Graham says the organization scans more than 1,000 books per day. But it has plenty more work to do.

The greatest album covers of jazz

Blue Note captured the refined sophistication of jazz during the early 60s, giving it its signature look in the process.

yeah that is dynamite. One of the things
that amazed me was what I call the
pullback effect. Take Hank Mobley’s no
room for squares. There was a new subway
station that was built. It was unlike any
other subway stop. It had these metal
concentric circles. Now try to find the
final album cover there. It is the
pullback shows you the whole image and
it gives you an insight into the eye of
the designer that I think is absolutely
amazing

The Memo Doesn’t Make Its Case 

The truth requires greater transparency

.. That experience teaches me that the memo simply doesn’t make its case. Indeed, it gets less persuasive — and the material omissions more glaring — with each successive read. It might disclose the existence of troubling FBI misconduct, but the fair-minded reader has no way of knowing whether it does.

.. A good summary always supports assertions with evidence. A good summary provides context. A good summary even includes relevant information that contradicts its thesis so that the reader can evaluate the best counter-arguments. 

.. legal arguments typically depend on lawyers taking thousands (sometimes tens of thousands) of pages of depositions and documents, crafting a concise narrative, and communicating that narrative to a judge — with citations referring to the relevant evidence and quotations of it as well.

.. If there is no citation or quotation, a judge will typically ask the lawyer, “Counselor, what record evidence supports that assertion?”

.. One of the first and most vital assertions in the entire memo is the claim that “the ‘dossier’ compiled by Christopher Steele (Steele dossier) on behalf of the Democratic National Committee (DNC) and the Hillary Clinton campaign formed an essential part of the Carter Page FISA application.” This statement is initially offered without proof. One has to read down to the next page to see any reference to evidence:

Furthermore, Deputy Director [Andrew] McCabe testified before the Committee in December 2017 that no surveillance warrant would have been sought from the FISC without the Steele dossier information.

.. When I read this, I had two immediate thoughts. First, what did he actually say? And second, why the subtle change in language from the argument that the “dossier” was an “essential part” of the FISA application to the statement that the warrant wouldn’t have been sought without the dossier “information”? The “dossier” and the “information” are not the same thing.

.. An effective memo would do more to end the debate. How? By quoting the relevant portions of McCabe’s testimony.

Better yet, it could quote the testimony and attach an appropriately redacted copy of the testimony as an appendix.

.. Even the characterization that the dossier was “essential” is a judgment call based on evidence unavailable to the public. Even worse, it was a judgment call based in part on evidence unavailable even to the rest of the committee.

.. memo should have plainly stated the agreement between the DOJ and the committee, along with the reasons for this agreement.

.. good summaries don’t just support conclusions with evidence, they provide vital and necessary context. On this point, the memo fails utterly.

.. it fails to answer the following questions:

  1. How did the FISA application actually describe Steele?

    .. Democrats are arguing that the political nature of his work was appropriately disclosed.  Don’t we need the actual words used to properly evaluate whether the FBI materially misled the court?

  2. In addition to the information from the Steele dossier, what other information did the FISA application include?
  3. To what extent did the multiple renewal applications depend on the information in the dossier? The memo notes that a FISA order must be renewed every 90 days, and each renewal must be supported by an “independent” probable-cause finding. A Trump appointee, Deputy Attorney General Rod Rosenstein, signed at least one of these FISA applications. He apparently believed that the request was supported by probable cause. Why?
  4.  What is the “information” regarding Papadopoulos that triggered the opening of the investigation in July 2016 — a full three months before the Page FISA application? The memo provides information obviously designed to impair the credibility of that investigation — by referring to FBI agent Peter Strzok’s well-known political leanings — but it provides no information about any facts supporting the opening of the probe, leaving the reader with the impression that it was opened solely because Strzok dislikes Trump.

I also wrote above that a good summary “even includes relevant information that contradicts its thesis.” The memo omits any such information, but a Democratic rebuttal exists.

.. But even if the public reviews the Democratic rebuttal, the process is still flawed. The proper way to resolve explosive claims of political bias at the highest levels of government isn’t by dribbling out short memoranda but by issuing comprehensively researched and comprehensively supported majority and minority committee reports.

..it’s not by itself scandalous to review political opposition research — a politically motivated person is no more suspect than the terrorists and criminals who routinely provide information used to support even the most intrusive warrants.

.. When I was in Iraq, we were constantly aware that our sources had their own axes to grind. They didn’t want to defeat their opponents in an election. They wanted them to die in a hail of gunfire.

.. Biased sources are an inherent part of intelligence-gathering.