To begin, you need:
- A free archive.org user account. Sign up for an account.
- A Google account and a Google Sheet with a list of URLs in the first column. The app will update your Sheet with Wayback Machine capture information next to each URL.
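To give a sense of what "capture information" for a URL can look like, here is a minimal sketch that queries the Wayback Machine's public availability endpoint for a single URL. It is only an illustration and not how the Sheets app itself is implemented; the function names (`availability_url`, `closest_capture`, `lookup`) are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url):
    """Build the availability query URL for one page."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def closest_capture(response):
    """Return (timestamp, archived_url) from a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(page_url, timeout=10):
    """Fetch capture info for one URL (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(availability_url(page_url), timeout=timeout) as resp:
        return closest_capture(json.load(resp))
```

Calling `lookup` once per row of the first column would yield the kind of timestamp and archived link that could be written next to each URL.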
I’ve gotten lots of mileage out of “How do big decisions get made at this company?” And you want to turn it into a conversation – probe their answers, ask for examples, etc.
You learn about how each interviewer experiences the decision-making process. How are the decisions communicated? Who decides? Do initiatives steamroll people and teams? Are people able to filter up suggestions or start their own initiatives? Can decisions get made team-by-team, or do they have to be made for the whole company? Do they change their mind when they get new evidence? Do people change their mind too much? Do they have trouble saying “no”? If they want to tell you “no,” do they actually say it? Are they too afraid to make decisions that can change the culture, and just kinda drift?
As an IC engineer, this is what I have the least control over (and can find the most frustrating). So I want to hear exactly how broad change happens. It helps me imagine how it’ll feel to spend four years in the environment.
This is the kind of question that you need to ask everyone. You won’t get a good answer from one person. You want to ask a few people and see if their stories line up.
I’ve found Culture Queries to be a good resource for this. Helped me avoid questions that led to boilerplate answers (“do you have work/life balance?” -> “How responsive are people to emails/Slack over the weekends and after 6pm?”).
Can I talk with the staff I will be working with? It is not enough that they want you and feel they can work with you; you also need to feel you can work with them. When talking with the staff, discuss what a typical day is like. What are their frustrations and joys? My last question is usually ‘So why have you not moved on to something better?’
Stay away from companies who don’t let you talk to the lowest-ranking members of the team.
I totally agree. Their perspective and thoughts also matter. This kinda goes along with the whole “Do they allow their entire team in interviews”.
What type of firefights have you had to deal with? How often do they occur? What did you do to make sure your last firefight never happens again?
– What’s the roadmap for this year? This gives me a lot of insight on what I could be working on, and also if the company is a bit clueless about their direction.
– What would I do in the position I’m being interviewed for? This completes the picture, similar to the previous one, but from a different perspective.
Let’s say I’m working on a project and am blocked by technical issues that I don’t understand. What do I do next?
What is the channel for client feedback to reach the dev team? Can you give me an example of how this has worked in the past?
If I’m working on a feature and discover technical debt that will make it more difficult to implement, how do I decide whether to focus on that debt or the feature? Can you give an example?
The reason interviewers like to ask for examples is that it’s easy to bullshit when speaking abstractly, but people are less likely to lie to your face about a specific case. Use that to your advantage.
Questions about tests. I have found consistently that when an interviewer does not know what the test coverage of their codebase is, more likely than not they don’t care about such trivialities.
No matter how respectable the company might look on the outside, the codebase is an incorrigible mess built by cowboys and they will expect you to maintain it.
Ask about on-call expectations.
How often do you go on-call? Does everyone participate? How is the documentation handled for resolutions? How often are there problems that you would get called for?
I think a good tech one that is a bit opinionated is: Does a project build and run (fully, without problems) in 5 or fewer steps?
How often do you do standups?
Here’s another one: Do employees actively try to recruit their friends and people they respect in their industry for reasons besides a referral bonus?
What’s the biggest mistake you’ve made on the job, and how did your team/management respond to it?
1. Set up ArchiveBox
We recommend using Docker because it has all the extractors and dependencies working out-of-the-box:
View the differences between versions of the same web page
A decade ago, the FBI sent Brewster Kahle, founder of the Internet Archive, a now-infamous type of subpoena known as a National Security Letter, demanding the name, address and activity record of a registered Internet Archive user. The letter came with an everlasting gag order, barring Kahle from discussing the order with anyone but his attorney — not even his wife could know.
The operator of the Wayback Machine allows Wikipedia’s users to check citations from books as well as the web.
Wikipedia is the arbiter of truth on the internet. It’s what settles arguments at bars. It supplies answers for the information snippets you see on your Google or Bing search results. It’s the first stop for nearly everyone doing online research.
The reason people rely on Wikipedia, despite its imperfections, is that every claim is supposed to have citations. Any sentence that isn’t backed up with a credible source risks being slapped with the dreaded “citation needed” label. Anyone can check out those citations to learn more about a subject, or verify that those sources actually say what a particular Wikipedia entry claims they do—that is, if you can find those sources.
It’s easy enough when the sources are online. But many Wikipedia articles rely on good old-fashioned books. The entry on Martin Luther King Jr., for example, cites 66 different books. Until recently, if you wanted to verify that those books say what the article says they say, or if you just wanted to read the cited material, you’d need to track down a copy of the book.
Now, thanks to a new initiative by the Internet Archive, you can click the name of the book and see a two-page preview of the cited work, so long as the citation specifies a page number. You can also borrow a digital copy of the book for two weeks, so long as no one else has checked it out, much the same way you’d borrow a book from your local library. (Some groups of authors and publishers have challenged the archive’s practice of allowing users to borrow unauthorized scanned books. The Internet Archive says it seeks to widen access to books in “balanced and respectful ways.”)
So far the Internet Archive has turned 130,000 references in Wikipedia entries in various languages into direct links to 50,000 books that the organization has scanned and made available to the public. The organization eventually hopes to allow users to view and borrow every book cited by Wikipedia, with the ultimate goal being to digitize every book ever published.
“Our goal is to be a library that’s useful and reachable by more people,” says Mark Graham, director of the Internet Archive’s Wayback Machine service.
If successful, the Internet Archive’s project would be a boon to students, journalists, or anyone who wants to check the references of a Wikipedia entry. Google Books also has a massive collection of digitized print books, but it tends to only show small snippets of a text.
“I’ve tried to verify Wikipedia pages by searching blurbs in Google Books but it’s an unpredictable link, and you often don’t have enough surrounding context to evaluate the use,” says Mike Caulfield, a digital literacy expert and director of blended and networked learning at Washington State University Vancouver. “The ability to read a page or two of context around a quote is crucial to both editors trying to protect the integrity of articles, and to readers who need to get to that next step of verification.”
You could, of course, verify the information the traditional way by tracking down a physical copy of a book. But students working late into the night on term papers, or reporters on tight deadlines, might not have time to order a book on Amazon or wait for a library book to become available. In other cases, books might be hard to come by. The Wikipedia entry on the internment of Japanese-Americans during World War II, for example, cites hard-to-find titles, says Internet Archive director of partnerships Wendy Hanamura. But thanks to the Internet Archive’s Digital Library of Japanese-American Incarceration, created with the Seattle-based organization Densho, many of those rare books are now available online.
The Internet Archive embarked on its effort to weave digital books into Wikipedia after the 2016 election. “No matter who you wanted to be president, I would say almost everyone would agree the whole process was a train wreck,” Internet Archive founder Brewster Kahle said in a speech in San Francisco last week. From fake news and inauthentic social media campaigns waged by foreign nations to concerns about voting systems themselves being rigged, there were plenty of ways that technology and information systems failed the public. So Kahle convened a group of people to discuss how to improve the information ecosystem. One issue that came up was the fragility of Wikipedia citations. Books and academic journals supply some of the best, most reliable information for Wikipedia editors, but those sources frequently are either unavailable online or are behind paywalls. And even freely available internet content often disappears.
The Internet Archive was in a unique position to help solve this problem. The organization’s Wayback Machine service has archived 387 billion webpages since 2001. It’s also been digitizing physical books and other analog media, and has now scanned 3.8 million books. It has millions more books warehoused.
Graham and company created the InternetArchiveBot, a tool that scans Wikipedia for broken links and automatically adds links to versions archived in the Wayback Machine. Because automatic editing tools require special permission to use, Graham has to work with the Wikipedia communities that manage versions of the encyclopedia in different languages. “All told, we’ve edited 14 million links; more than 11 million point to Internet Archive,” he says.
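The general idea of pairing a broken link with an archived copy can be sketched in a few lines. This is a simplified illustration and not InternetArchiveBot's actual code: the citation dictionaries, the `archive_url` field, and the `dead_links` mapping are all hypothetical, and the only real convention used is the Wayback Machine's `https://web.archive.org/web/<timestamp>/<url>` address format.

```python
def wayback_url(original_url, timestamp):
    """Build a Wayback Machine address for a capture of original_url."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def add_archive_links(citations, dead_links):
    """Return copies of the citations, adding an archive link to each dead one.

    citations:  list of dicts with a "url" key (hypothetical structure)
    dead_links: dict mapping a dead URL to the timestamp of a known capture
    """
    patched = []
    for citation in citations:
        citation = dict(citation)  # copy so the input is left untouched
        timestamp = dead_links.get(citation["url"])
        if timestamp is not None:
            citation["archive_url"] = wayback_url(citation["url"], timestamp)
        patched.append(citation)
    return patched
```

Keeping the original URL and adding the archived one alongside it, rather than overwriting it, leaves readers able to see where the citation first pointed.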
Adding links to books is similar but more challenging. “If a book has an ISBN number and an entry has a traditional citation format, it’s pretty easy,” Graham explains. But not all books have ISBN numbers, and many Wikipedia citations aren’t properly formatted. For instance, some only cite the book and not a specific page number. There can also be differences between different editions of a book.
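To see why a well-formed citation is the easy case, consider a rough sketch that normalizes an ISBN and checks whether a citation carries enough detail to link to a specific page. This is an assumption-laden illustration, not the Internet Archive's matching code: the citation is modeled as a plain dictionary with hypothetical `isbn` and `page` fields, and only the standard ISBN-10 and ISBN-13 check-digit rules are taken as given.

```python
def _digits(s):
    """True if s consists only of ASCII digits."""
    return s.isascii() and s.isdigit()


def isbn13_check_digit(first12):
    """Compute the ISBN-13 check digit from the first 12 digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw):
    """Return a valid ISBN as 13 digits, or None if it cannot be validated."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 13 and _digits(s):
        return s if isbn13_check_digit(s[:12]) == s[12] else None
    if len(s) == 10 and _digits(s[:9]) and (_digits(s[9]) or s[9] == "X"):
        values = [int(d) for d in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # ISBN-10 rule: weighted sum (weights 10 down to 1) is divisible by 11.
        if sum((10 - i) * v for i, v in enumerate(values)) % 11 != 0:
            return None
        core = "978" + s[:9]
        return core + isbn13_check_digit(core)
    return None


def is_linkable(citation):
    """A citation is linkable here only with a valid ISBN and a page number."""
    isbn = citation.get("isbn")
    return bool(isbn and normalize_isbn(isbn) and citation.get("page"))
```

Normalizing both ISBN forms to 13 digits lets the same title match whichever form a citation uses. Citations with no ISBN or no page number fall through to `False`, which corresponds to the harder cases described above.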
Of course, the Internet Archive hasn’t scanned all the books cited by Wikipedia yet. It’s working hard to digitize collections from libraries around the world, along with donations from companies like Better World Books. Graham says the organization scans more than 1,000 books per day. But it has plenty more work to do.