Algorithms Could Save Book Publishing—But Ruin Novels
Over four years, Archer and Jockers fed 5,000 fiction titles published over the last 30 years into computers and trained them to “read”—to determine where sentences begin and end, to identify parts of speech, to map out plots. They then used so-called machine classification algorithms to isolate the features most common in bestsellers.
.. The result of their work—detailed in The Bestseller Code, out this month—is an algorithm built to predict, with 80 percent accuracy, which novels will become mega-bestsellers.
What does it like? Young, strong heroines who are also misfits (the type found in The Girl on the Train, Gone Girl, and The Girl with the Dragon Tattoo). No sex, just “human closeness.” Frequent use of the verb “need.” Lots of contractions. Not a lot of exclamation marks. Dogs, yes; cats, meh. In all, the “bestseller-ometer” has identified 2,799 features strongly associated with bestsellers.
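To make the idea of a measurable "feature" concrete, here is a minimal sketch in Python of how a few of the surface traits listed above (use of the verb "need," contractions, exclamation marks) could be counted in a passage of text. It is an illustration only, not Archer and Jockers's actual method: the function name `style_features`, the word list `NEED_FORMS`, and the crude sentence splitter are all assumptions made for this example. A real system would use a proper tokenizer and would feed thousands of such numbers into a classification algorithm.

```python
import re

# Hypothetical helper names; none of this comes from The Bestseller Code.
NEED_FORMS = {"need", "needs", "needed", "needing"}

# A word is a run of letters, optionally followed by an apostrophe and more letters.
WORD = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)?")
# Crude sentence boundary: one or more of . ! ?
SENTENCE_END = re.compile(r"[.!?]+")


def style_features(text):
    """Return a few simple per-text style measurements."""
    words = WORD.findall(text.lower())
    n_words = len(words) or 1  # avoid dividing by zero on empty input
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    n_sentences = len(sentences) or 1

    # Words containing an apostrophe. This also counts possessives such as
    # "dog's", so it only approximates the contraction rate.
    apostrophe_words = sum(1 for w in words if "'" in w or "’" in w)

    return {
        "sentences": len(sentences),
        "need_rate": sum(1 for w in words if w in NEED_FORMS) / n_words,
        "contraction_rate": apostrophe_words / n_words,
        "exclamations_per_sentence": text.count("!") / n_sentences,
    }


sample = "I need you. Don't go! She needs the dog."
features = style_features(sample)
print(features)
```

Each value is a rate rather than a raw count, so that a long novel and a short one can be compared on the same scale.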
.. It’s sad to think that data could narrow our tastes and possibilities.”
.. There’s a wrinkle, though: Companies such as Amazon and Apple have the data for books read on their devices, and they aren’t sharing it with publishers.
.. The ability to know who reads what and how fast is also driving Berlin-based startup Inkitt
.. Albazaz, now 26, sees himself as democratizing the publishing world. “We never, ever, ever judge the books. That’s not our job. We check that the formatting is correct, the grammar is in place, we make sure that the cover is not pixelated,” he says. “Who are we to judge if the plot is good? That’s the job of the market. That’s the job of the readers.”
.. Callisto studies the search terms Amazon suggests when users start typing in the first few letters, and found that people would frequently search for something that led to no results. “Consumers are searching for a piece of information, but no product exists to satisfy that consumer demand,”
.. Don’t we risk losing the distinction between what’s important and what’s popular? As NPR noted last year, books nominated for prestigious prizes like the Man Booker Prize or the National Book Award typically don’t sell many copies.
.. The computer found much to love: a strong, young female protagonist whose most-used verbs are “need” and “want.”