Reading Into Book Sales

A recent flurry of press coverage about the suggestion that Secretary Clinton could use data from her upcoming book tour to aid her presidential campaign caught our attention.

Ultimately, Simon & Schuster said that Clinton would “unequivocally” not have access to data they’re collecting.

But if she does decide to run, that data could be extremely useful, if used in the right way.

It used to be that you could tell a lot about a person by what they kept on their bookshelves. Today, even with the proliferation of electronic publishing, we can still use what people choose to read in their spare time as a lens to predict their voting behavior. You still can’t judge an e-book by its cover, but you can learn a lot from the patterns of the people buying it.

As a general rule, organizations should be collecting data on every interaction they have with people, in every channel, whether they are ready to put it to use today or are planning for a potential future use.

When it comes to book purchases, retailers will never release the individual data they collect on purchases. (And this is probably a good thing.) However, it is certainly feasible to release sales data by small units of geography, like county or ZIP code – either publicly, or to publishers and authors. This data would be incredibly useful for directly targeting or modeling support at a geographic level, particularly in instances where we can identify specific books that closely track voters’ expressed preferences and likely behavior.

Specific titles can provide a lot of insight into expected behavior all across the political spectrum:

Examples from current best-seller lists: Thomas Piketty’s “Capital in the Twenty-First Century” / Elizabeth Warren’s “A Fighting Chance” / Ben Carson’s “One Nation” / Tim Geithner’s “Stress Test” / Arianna Huffington’s “Thrive: The Third Metric” / Sheryl Sandberg’s “Lean In”.

On the individual level, we can predict with a fair degree of certainty how a buyer of any of these books might vote. But people buy books for lots of reasons, and we caution against directly imputing a political bent to a reader who might pick up a book for reasons other than affirming their ideological views.

But, in the aggregate, we can use purchases of certain titles to contribute to predicting expected voting outcomes at the individual, group and geographic levels.
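To make the aggregate idea concrete, here is a minimal sketch of how title-level sales by ZIP code might be rolled up into a single geographic signal. Everything in it is an assumption for illustration: the function name `zip_lean_scores`, the -1.0 to 1.0 "lean" scale, the placeholder titles and the sales figures are all hypothetical, and no retailer is known to publish data in this form.

```python
from collections import defaultdict


def zip_lean_scores(sales, title_lean):
    """Return a sales-weighted average lean for each ZIP code.

    sales: iterable of (zip_code, title, copies_sold) tuples.
    title_lean: dict mapping a title to a lean between -1.0 and 1.0
        (a hypothetical scale; how to score a title is left open).
    Titles without a lean score are ignored.
    """
    weighted = defaultdict(float)  # sum of lean * copies per ZIP
    copies = defaultdict(int)      # scored copies per ZIP
    for zip_code, title, sold in sales:
        if title not in title_lean:
            continue
        weighted[zip_code] += title_lean[title] * sold
        copies[zip_code] += sold
    return {z: weighted[z] / copies[z] for z in copies if copies[z] > 0}


# Made-up figures for illustration only.
example_sales = [
    ("00001", "title_a", 30),
    ("00001", "title_b", 10),
    ("00002", "title_a", 5),
    ("00002", "title_b", 15),
    ("00002", "title_c", 100),  # unscored title, ignored
]
example_lean = {"title_a": -1.0, "title_b": 1.0}

scores = zip_lean_scores(example_sales, example_lean)
```

A score like this says nothing about any individual buyer. It would be one input among many in a geographic model, alongside voting history and other data.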

In the 2008 and 2012 general elections, campaigns had a very basic sense of this dynamic, as books by candidates, such as President Barack Obama’s best-selling “The Audacity of Hope” and Senator Rick Santorum’s “It Takes a Family,” set the stage for political movements. But the e-book industry was then a mere fraction of what it is today.

With the 2014 midterms around the corner and the 2016 presidential primary season close on their heels, smart campaigns would be wise to consider their potential supporters’ bookshelves, in addition to their voting behavior, to get a sense of where prospective voters’ hearts are leaning.

Michael Simon is President and co-founder of HaystaqDNA. In 2008, he ran the in-house Obama for America analytics department.

Mo’ Data, Mo’ Problems?

Society is just beginning to understand the potential of ‘big data’. It’s estimated that 90% of the world’s data was created in the past few years, and the computing power needed to harness that information is growing quickly to keep up.

This rapid change means it’s important for policymakers to begin thinking now about the appropriate role of government in the field. Advancements in technology aren’t going away; they are only just coming to maturity, and we need to quickly figure out how to create a regulatory environment that balances privacy concerns with encouraging innovation that can unlock the incredible potential of data analytics. The Obama Administration’s review is an important start.

Recent news reports about the review’s results have focused on the potential misuse of big data, particularly ways it could exacerbate discriminatory hiring practices. But there is also a growing number of examples where big data can be used to protect against discrimination by rooting out the unfairness that sometimes accompanies human bias.

In one famous example, corporations have used analytics on vast consumer datasets to predict when women may become pregnant, so that they might market diapers and baby strollers to these women. In the wrong hands, those very same predictions could potentially be used to avoid hiring women who are likely to become pregnant in the near future.

While the government should take steps to prevent this type of corporate abuse, the sheer proliferation of data and technology will make it exceedingly difficult to prevent corporations from engaging in consumer analysis. Instead, we should learn to use the same data sets and analysis to identify and prevent abuse before it occurs. If, for example, a review of hiring decisions indicated that job candidates with a greater predicted likelihood of having children are being hired at different rates, policymakers could use that information to stop such practices. Similarly, researchers using big data to predict crime are finding it can also help exonerate inmates who may have been wrongly convicted.
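As a rough illustration of the hiring review described above, the sketch below compares hire rates across groups of candidates and flags any group whose rate falls well below the highest one. The group labels, the sample decisions and the function names are hypothetical. The default 0.8 threshold is an assumption, loosely modeled on the "four-fifths" rule of thumb sometimes used in US adverse-impact analysis, not a legal test.

```python
def hire_rates(decisions):
    """Return the hire rate per group.

    decisions: iterable of (group, hired) pairs, where hired is a bool.
    """
    totals, hires = {}, {}
    for group, hired in decisions:
        totals[group] = totals.get(group, 0) + 1
        hires[group] = hires.get(group, 0) + (1 if hired else 0)
    return {g: hires[g] / totals[g] for g in totals}


def flag_disparity(rates, threshold=0.8):
    """Return groups whose hire rate is below threshold * the highest rate.

    The threshold is an illustrative assumption, not a legal standard.
    """
    best = max(rates.values())
    if best == 0:
        return []  # nobody was hired, so there is no rate to compare against
    return sorted(g for g, r in rates.items() if r / best < threshold)


# Made-up decisions: "high" and "low" stand for a hypothetical predicted
# likelihood of having children in the near future.
example = (
    [("high", True)] * 2 + [("high", False)] * 8
    + [("low", True)] * 5 + [("low", False)] * 5
)

rates = hire_rates(example)
flagged = flag_disparity(rates)
```

A flag from a check like this is a prompt for closer review, not proof of discrimination. Small samples and differences in qualifications can produce the same gap.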

We believe that any organization dealing with data – big or small – has a responsibility to safeguard individual data while developing new ways of using new technology and resources for the greater good.

Smart use of data may not represent a solution to everything, but by finding patterns in apparently random behavior, it has the potential to give us unique insights into many of the most intractable problems in society today. Discrimination is one of those problems. We support starting an intelligent debate over how to balance human necessities like privacy and security with society’s needs for justice and innovation. At HaystaqDNA, we’re eager to get this conversation started, and excited to explain how data doesn’t need to intrude on our boundaries to expand our horizons.

Michael Simon is President and co-founder of HaystaqDNA. In 2008, he ran the in-house Obama for America analytics department.