

Culture industries increasingly use our data to sell us their products. It’s time to use their data to study them. To that end, we created the Post45 Data Collective, an open access site that peer reviews and publishes literary and cultural data. This partnership between the Data Collective and Public Books, a series called Hacking the Culture Industries, brings you data-driven essays that change how we understand audiobooks, bestselling books, streaming music, video games, influential literary institutions such as the New York Times and the New Yorker, and more. Together, they show a new way of understanding how culture is made, and how we can make it better.

After the first lockdown in March 2020, I went looking for book sales data. I’m a data scientist and a literary scholar, and I wanted to know what books people were turning to in the early days of the pandemic for comfort, distraction, hope, and guidance. Were people clinging to or fleeing from pandemic tales during peak coronavirus panic? How many copies of Emily St. John Mandel’s pandemic novel Station Eleven were being sold in COVID-19 times compared to when the novel debuted in 2014? And what about Giovanni Boccaccio’s much older plague stories from the 14th century, The Decameron?

You might think, as I naively did, that a researcher would be able to find out exactly how many copies of a book were sold in certain months or years. Instead, I found that most book sales data is proprietary and purposefully locked away. The single most influential data in the publishing industry, which every day determines book contracts and authors’ lives, is basically inaccessible to anyone beyond the industry. And I learned that this is a big problem.

The problem with book sales data may not, at first, be apparent. Every week, the New York Times releases its famous list of “bestselling” books, but this list does not include individual sales numbers. Select book sales figures are often reported to journalists (for example, that Station Eleven has sold more than 1.5 million copies overall) and shared through outlets like Publishers Weekly. However, the underlying source for all these sales figures is typically an exclusive subscription service called BookScan: the most granular, comprehensive, and influential book sales data in the industry (though it still has significant holes; more on that to come). Since its launch in 2001, BookScan has grown in authority. All the major publishing houses now rely on BookScan data, as do many other publishing professionals and authors. But, as I found to my surprise, pretty much everybody else is explicitly banned from using BookScan data, including academics. The toxic combination of this data’s power in the industry and its secretive inaccessibility to those beyond the industry reveals a broader problem.

To pass in parameter values to the source plugin, include the parameter name and value as a property of the options object in your config.js file.
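As a minimal sketch of the options-object convention for the source plugin, the TypeScript below shows how an options object written in config.js could be turned into query parameters for a request. The `type` key, the `PluginOptions` shape, the `toQueryString` helper, and the example values are hypothetical illustrations, not the plugin's documented API; `published_date` is assumed here only as an example parameter name.

```typescript
// Sketch only: the option shape and helper below are hypothetical.
// Assumption: `type` selects the endpoint, and every other property is an
// API parameter name mapped to its value.
type PluginOptions = { type: string } & Record<string, string | number>;

// Encode every property except `type` as name=value pairs joined by "&".
function toQueryString(options: PluginOptions): string {
  return Object.keys(options)
    .filter((name) => name !== "type") // `type` picks the endpoint; it is not sent as a parameter
    .map((name) => `${encodeURIComponent(name)}=${encodeURIComponent(String(options[name]))}`)
    .join("&");
}

// Example options, as they might appear in config.js.
const exampleOptions: PluginOptions = {
  type: "overview",
  published_date: "2020-03-15",
};

console.log(toQueryString(exampleOptions)); // published_date=2020-03-15
```

Values are passed through `encodeURIComponent`, so a parameter such as a title containing spaces is safely escaped before it reaches the URL.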

The books API allows you to query best sellers lists, reviews, and books by passing any number of different parameter values. Check out the NYTimes Books API documentation to learn more about what parameters are available and required to successfully create a query.

There are six different endpoints available from the NYTimes Books API, and each endpoint has a corresponding type that can be set to pull data from. For example, one endpoint gets the top 5 books for all the Best Sellers lists for a specified date; if no date is provided, it returns the latest lists.
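A minimal sketch of the type-to-endpoint mapping, in TypeScript, assuming the version 3 base URL `https://api.nytimes.com/svc/books/v3` and four paths we believe the API exposes (not the full set of six). The type names (`overview`, `listNames`, `list`, `reviews`) and the `endpointUrl` helper are hypothetical; check the paths against the official NYTimes Books API documentation before relying on them.

```typescript
// Sketch: map hypothetical plugin type names to NYT Books API v3 endpoint URLs.
// The type names are illustrative assumptions; the paths reflect the public
// v3 API as we understand it and should be verified against the docs.
const BASE_URL = "https://api.nytimes.com/svc/books/v3";

const ENDPOINTS: Record<string, string> = {
  overview: "/lists/overview.json",  // top 5 books for every Best Sellers list on a date
  listNames: "/lists/names.json",    // names of all Best Sellers lists
  list: "/lists/{date}/{list}.json", // one full list; date may be "current"
  reviews: "/reviews.json",          // book reviews
};

// Resolve a type to its endpoint URL, filling {placeholders} in the path.
function endpointUrl(type: string, pathParams: Record<string, string> = {}): string {
  const template = ENDPOINTS[type];
  if (template === undefined) {
    throw new Error(`Unknown endpoint type: ${type}`);
  }
  const path = template.replace(/\{(\w+)\}/g, (_match: string, name: string) => {
    const value = pathParams[name];
    if (value === undefined) {
      throw new Error(`Missing path parameter "${name}" for type "${type}"`);
    }
    return encodeURIComponent(value);
  });
  return BASE_URL + path;
}

console.log(endpointUrl("overview"));
// https://api.nytimes.com/svc/books/v3/lists/overview.json
console.log(endpointUrl("list", { date: "current", list: "hardcover-fiction" }));
// https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json
```

A real request would also need query parameters, including an API key. Keeping path resolution separate from query building keeps each step small and easy to test, and an unknown type or missing path parameter fails loudly instead of producing a malformed URL.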
