January 2017 - HaystaqDNA

Download a PDF of the Case Study

As the new Congress rushes towards a repeal of the Affordable Care Act, many members are working against the opinion of the voters in their own districts. Research conducted by HaystaqDNA during the 2016 campaign showed that a majority of Americans support the ACA. However, members of Congress are more concerned with the opinions of their constituents than they are with national numbers. Therefore, Haystaq looked at support levels by Congressional District. In 253 of 435 Congressional Districts, or 58%, a majority of voters support the ACA.

Not surprisingly, the majority of these pro-ACA districts are held by Democrats. However, 61 pro-ACA districts are currently held by Republicans. Many of these districts are relatively safely Republican, but in many others, the margin of support for the ACA is near or above the incumbent’s margin of victory in the 2016 election. This suggests that voting to repeal the act puts these members at risk next year, even more so once voters realize how they will be personally affected by a repeal of the ACA.

The Haystaq microtargeting models have identified 98,942,762 likely ACA supporters nationwide, 41,697,492 of whom live in Republican districts.

METHODOLOGY

These numbers are based on a national survey of approximately 10,000 registered voters. The survey responses were used to build microtargeting models predicting how any individual voter would have answered the question had they been surveyed. The percentage of each Congressional District in support of the ACA is based on the number of voters in that district with an ACA support score of 50% or higher. The ACA support score predicts the likelihood that a voter would say that they support the ACA if surveyed. These numbers differ from poll results in that they are not weighted. A poll is likely to be weighted based on assumptions about likely turnout. The Haystaq models are applied to every registered voter.
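The district-level calculation described above can be sketched roughly as follows. This is only an illustration: the function name, the `(district, score)` record layout, and the sample records are assumptions, not Haystaq's actual data or code.

```python
from collections import defaultdict

def district_support_share(voters, threshold=0.5):
    """Return {district: fraction of voters whose score is >= threshold}."""
    totals = defaultdict(int)
    supporters = defaultdict(int)
    for district, score in voters:
        totals[district] += 1
        if score >= threshold:
            supporters[district] += 1
    return {d: supporters[d] / totals[d] for d in totals}

# Hypothetical scored voter file: (district, modeled ACA support score).
sample = [("XX1", 0.81), ("XX1", 0.50), ("XX1", 0.32),
          ("XX2", 0.12), ("XX2", 0.67)]
shares = district_support_share(sample)

# A district counts as pro-ACA here when more than half its voters clear the threshold.
pro_aca = sorted(d for d, share in shares.items() if share > 0.5)
```

Because every registered voter is scored, no turnout weighting enters this calculation, which matches the unweighted approach described above.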

The microtargeting models were built using a combination of the survey results and nearly 1,000 fields of commercial marketing data, Census demographics and proprietary derived indicators. Haystaq combines a variety of statistical and machine learning algorithms including Penalized Logistic Regression and Random Forests. The predictive models were validated against a hold-out sample to confirm that they accurately predicted the likely survey responses of individuals whose responses were not used in building the models.
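As a rough illustration of penalized logistic regression checked against a hold-out sample, here is a minimal pure-Python sketch. It uses synthetic stand-in data and an L2 penalty fitted by batch gradient descent; the data, the hyperparameters, and the function names are all assumptions for illustration. It does not cover Random Forests or whatever combination of algorithms Haystaq actually uses.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, row):
    """Predicted probability of a 'support' response for one record."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

def fit_penalized_logistic(rows, labels, l2=0.01, lr=0.5, epochs=200):
    """Batch gradient descent on log loss plus an L2 (ridge) penalty."""
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, label in zip(rows, labels):
            err = predict(weights, bias, row) - label
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        # The penalty term l2 * w shrinks each weight toward zero.
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Synthetic stand-in data (NOT survey data): one informative field, one noise field.
rng = random.Random(0)
rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
labels = [1 if row[0] + rng.gauss(0, 0.2) > 0 else 0 for row in rows]

# Hold out the last 100 records; they play no part in fitting.
train_rows, train_labels = rows[:400], labels[:400]
hold_rows, hold_labels = rows[400:], labels[400:]

weights, bias = fit_penalized_logistic(train_rows, train_labels)
correct = sum(
    (predict(weights, bias, row) >= 0.5) == bool(label)
    for row, label in zip(hold_rows, hold_labels)
)
holdout_accuracy = correct / len(hold_rows)
```

The point of the hold-out split is that `holdout_accuracy` is measured only on records the fitting step never saw, which is the kind of check the validation step above describes.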

Following is the question wording used in the survey:

Which comes closest to your opinion on the Affordable Care Act or Obamacare: that it is beneficial but doesn’t go far enough, that it is about right, or that it goes too far and should be repealed? Please press 1 if you think Obamacare is beneficial but doesn’t go far enough, press 2 if you like the law as it is, press 3 if you think Obamacare goes too far and should be repealed, or press 4 if you are not sure.

The model predicts the likelihood that a voter with an opinion on ACA would select option 1 (Support ACA but thinks it doesn’t go far enough) or option 2 (like the law as it is) vs. 3 (Goes too far and should be repealed). Because the model is predicting support only among those with an opinion, respondents picking option 4 (unsure) are not included.
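The recoding of survey key presses into a binary modeling target might look like the following sketch. The function name and the sample responses are hypothetical.

```python
def recode_response(key_press):
    """Map a survey key press to a binary target; None means 'excluded'."""
    if key_press in (1, 2):   # beneficial but not far enough, or about right
        return 1
    if key_press == 3:        # goes too far and should be repealed
        return 0
    return None               # option 4 (not sure) or any invalid entry

# Hypothetical raw responses from six respondents.
responses = [1, 2, 3, 4, 3, 2]
targets = [t for t in map(recode_response, responses) if t is not None]
```

Dropping the unsure respondents before modeling is what makes the resulting score a measure of support among voters who hold an opinion.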

The survey was conducted using a combination of live and IVR (automated) phone calls to a random sample of more than 10,000 voters nationwide.

CD	Name	% of Vote in 2016 Election	% of Voters Supporting ACA
TX23	Will Hurd	50.90%	72.40%
NY11	Daniel Donovan	63.30%	70.40%
FL27	Ileana Ros-Lehtinen	54.90%	67.20%
FL26	Carlos Curbelo	56.30%	65.30%
WA8	Dave Reichert	60.00%	64.90%
CA21	David G. Valadao	93.20%	63.80%
IL12	Mike Bost	57.80%	63.30%
MI11	David Trott	56.90%	61.40%
VA10	Barbara Comstock	52.90%	61.00%
KY6	Andy Barr	61.10%	60.60%
IL13	Rodney Davis	59.70%	60.50%
NJ11	Rodney Frelinghuysen	60.00%	60.40%
NJ7	Leonard Lance	55.70%	59.50%
VA2	Scott Taylor	61.70%	59.10%
MI8	Mike Bishop	58.80%	58.60%
IL6	Peter J. Roskam	59.50%	58.40%
FL18	Brian Mast	55.50%	58.10%
NM2	Steve Pearce	62.80%	57.90%
FL25	Mario Diaz-Balart	62.40%	57.90%
MI6	Fred Upton	61.70%	57.60%
CA25	Stephen Knight	54.20%	57.50%
CO6	Mike Coffman	54.70%	56.70%
FL2	Neal Dunn	69.20%	56.40%
NY24	John Katko	61.00%	55.70%
NY19	John Faso	54.70%	55.60%
AZ2	Martha McSally	56.70%	54.80%
CA39	Edward Royce	57.70%	54.60%
MI7	Tim Walberg	57.90%	54.60%
MI1	Jack Bergman	58.20%	54.60%
PA15	Charles W. Dent	60.60%	54.30%
PA18	Tim Murphy	100.00%	54.20%
PA8	Brian Fitzpatrick	54.50%	54.10%
IL14	Randy Hultgren	59.60%	54.10%
MI4	John Moolenaar	65.80%	54.00%
IA1	Rod Blum	53.90%	53.90%
WA5	Cathy McMorris Rodgers	59.50%	53.90%
TX32	Pete Sessions	100.00%	53.90%
NJ3	Tom MacArthur	60.60%	53.70%
WA3	Jaime Herrera Beutler	61.40%	53.60%
NJ4	Chris Smith	65.50%	53.60%
NJ2	Frank LoBiondo	61.60%	53.60%
MN3	Erik Paulsen	56.90%	53.60%
PA12	Keith Rothfus	61.90%	53.50%
KY1	James Comer Jr.	71.20%	53.30%
MI3	Justin Amash	61.30%	53.00%
ME2	Bruce Poliquin	54.90%	52.70%
GA6	Tom Price	61.60%	52.30%
VA5	Thomas Garrett	58.30%	52.10%
TX27	Blake Farenthold	58.90%	52.10%
LA4	Mike Johnson	65.20%	52.00%
NY2	Peter T. King	62.40%	51.90%
LA5	Ralph Abraham	100.00%	51.80%
TX7	John Culberson	56.20%	51.70%
NC13	Ted Budd	56.10%	51.50%
CA49	Darrell Issa	51.00%	51.40%
NY1	Lee Zeldin	59.00%	51.40%
PA6	Ryan Costello	57.30%	51.20%
FL15	Dennis A. Ross	57.50%	51.10%
OH14	David Joyce	62.70%	51.10%
GA12	Rick Allen	61.60%	50.70%
OH1	Steve Chabot	59.60%	50.40%

Download a PDF of the Case Study

Modern presidential campaigns need to contact voters at high volume and on a sensitive timeline: to encourage persuadable voters to support them, to solicit donations, to engage volunteers, and to ensure that supportive voters turn out at the polls. Targeting each message to the most receptive audience is central to succeeding at all of these functions, not only because campaign resources are scarce, but also because contacting unsympathetic voters can produce harmful backlash. Bernie Sanders ran for the Democratic nomination for president in 2016 with little name recognition, little money, and little support. With HaystaqDNA’s help, the Sanders campaign attracted the support of almost half of Democratic voters and a substantial share of convention delegates, and Bernie Sanders was able to bring his message to a national audience.

Microtargeting and predictive analytics have been a cornerstone of modern campaigns at the presidential level ever since our CEO and founder, Ken Strasma, pioneered the approach on John Kerry’s 2004 Democratic primary campaign. While the Democratic National Committee now offers some basic models for the use of Democratic campaigns, the Sanders campaign engaged Haystaq so that it could take advantage of state-of-the-art candidate- and state-specific modeling, data-informed delegate maximization strategies, and experimental design guidance. Haystaq’s models were used for a wide variety of applications, including targeted deployment of distributed volunteer calls, addressable television and digital advertising, segmentation of fundraising email lists, turnout tracking, and field program optimization.

1. Sanders Support Models

Using survey data combined with advanced statistical and machine learning modeling techniques, Haystaq created state-specific models predicting the extent to which any of a sample of eligible Democratic primary voters surveyed could be expected to support Bernie Sanders for president over Hillary Clinton. Once a state’s support model was validated and optimized using a test set of survey records, it was applied to all eligible Democratic primary voters in the state, assigning each a probability score predicting the likelihood of that voter supporting Bernie Sanders. To ensure the models remained up to date and incorporated all available data, they were regularly refreshed to include new data from our daily tracking surveys and new data collected by the campaign’s field program.

In some states, Haystaq also created specific support models for hard-to-reach demographic groups such as African Americans and Hispanics. Those groups of voters were often scored low on our support models, but it was important to the campaign to expand its appeal among minority voters, so these models allowed the campaign to find which African-American voters, for example, were relatively more likely to be open to Sanders. These were used for specific outreach to those groups, to maximize persuasive impact while minimizing the risks of contacting unsupportive voters.

2. Primary Turnout Models

Using a similar process, Haystaq also created scores predicting each individual voter’s likelihood of participating in the Democratic primary. Primary elections typically draw significantly lower participation than generals; they are often open only to registered partisans and even then, only the most engaged activist voters tend to participate. In some states, like Iowa, which is always first on the primary calendar, delegates are chosen not in a regular election but in a caucus. Since participating in these contests can require voters to stay at their caucus location for up to several hours, participation is extremely low. (In 2008, the year of the last contested primary on the Democratic side, Iowa caucus participation reached a record high of just 239,000.)

Creating turnout models for such unusual contests requires a state-specific approach, and is a challenge because there can be no direct source for a 2016 turnout dependent variable before the election has taken place. The methodology we have honed takes the most recent similar election as a model. We then “roll back” date-based indicators like age and past election voting history in order to create a model “predicting” 2008 primary turnout using only what was known before 2008. Once this model is validated (that is, we verify that it tracked 2008 participation accurately), we “roll forward” the indicator set to apply it to 2016 voters.
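The “roll back” and “roll forward” steps can be sketched as a single feature-building function that takes the election year as a parameter. The field names (`birth_year`, `primary_years_voted`) and the two indicators shown are assumptions chosen for illustration, not the actual indicator set.

```python
def features_as_of(voter, election_year):
    """Build date-based indicators using only what was known before election_year."""
    prior_votes = [y for y in voter["primary_years_voted"] if y < election_year]
    return {
        "age": election_year - voter["birth_year"],
        "prior_primaries_voted": len(prior_votes),
    }

# Hypothetical voter record.
voter = {"birth_year": 1970, "primary_years_voted": [2004, 2008]}

# Rolled back: training features for a model "predicting" 2008 turnout,
# paired with the known 2008 outcome as the dependent variable.
rolled_back = features_as_of(voter, 2008)
voted_2008 = int(2008 in voter["primary_years_voted"])

# Rolled forward: the same indicator definitions, now relative to 2016,
# used to score the voter for the upcoming primary.
rolled_forward = features_as_of(voter, 2016)
```

Keeping one function for both steps ensures the indicators mean the same thing at training time and at scoring time; only the reference year changes.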

This is the most valid method available, but it is flawed in that it assumes repetition of past voting patterns, and no turnout model can be validated until the election is over. The outputs of turnout models generally need to be artificially adjusted to account for expected changes in voting patterns. This was especially true in this case since we were relying on participation patterns of 2008, which could not be expected to be repeated in part because of the different groups of voters motivated by Bernie Sanders versus Barack Obama, and differences in the reception to Hillary Clinton in 2008 versus 2016. In the case of the Sanders campaign, we knew that our supporters were much less likely than Clinton supporters to have participated in the 2008 primaries, and less likely to be registered as Democrats, and so they would be rated as less likely to participate under the traditional metrics. For this reason, we also modeled voters’ “self-reported” turnout likelihood using responses to a survey question about intent to vote. Our studies have shown that people generally exaggerate their likelihood of voting when asked, but that a self-reported turnout score does work well as a relative measure. The self-reported models allowed us to account for the increased enthusiasm felt by the many Sanders supporters who were first-time primary voters.
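One simple way to treat an inflated score as a relative measure is to convert it to a percentile rank, so that only the ordering of voters matters. The sketch below assumes this approach for illustration; it is not necessarily how the self-reported models were used, and the scores are made up.

```python
import bisect

def to_percentiles(scores):
    """Return, for each score, the fraction of all scores that are <= it."""
    ordered = sorted(scores)
    n = len(ordered)
    return [bisect.bisect_right(ordered, s) / n for s in scores]

# Hypothetical self-reported turnout scores: all implausibly high in absolute
# terms, but still informative about who is more or less likely to vote.
self_reported = [0.95, 0.99, 0.90, 0.99]
relative = to_percentiles(self_reported)
```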

3. Campaign Engagement and Other Models

Among the other models Haystaq created for the campaign were volunteer, email-responsiveness, and fundraising models, each of which was based upon individuals’ direct engagement with the campaign. Using the voter records of people who volunteered with the campaign’s field team, canvassing or phone-banking to reach voters, we created scores to find people who had not volunteered with the campaign, but who were most likely to respond favorably if asked. These models allowed the campaign to more efficiently recruit the volunteers who were critical to its field program.

Similarly, we created “look-alike” models to help the campaign find the people most likely to donate, including multiple versions of this model to approximate the expected value of an individual’s donation. In this way, we helped the campaign to maximize its record-breaking small-donor fundraising success, and to expand the program from email-only to targeted digital ads.

4. Delegate Strategy

Because the delegates that ultimately decided the nomination were, in many states, awarded according to congressional district (or by precinct in Iowa), the campaign needed to work to optimize delegates rather than total votes. Delegates are often awarded in small integers in proportion to district-level vote totals, so there are specific points at which gaining or losing a small number of votes can make the difference of a delegate. Using our support and turnout models, we were able to predict which districts were likely to be divided close to these points and recommend that the campaign direct its resources there. The simplest example of this is that in a district with an odd number of delegates, the winning candidate gets an extra delegate no matter how small the margin, whereas in a district with an even number of delegates, 50% of the vote plus 1 is worth dramatically less effort, since the candidates will split the delegates evenly unless one wins by a much larger margin.
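The odd-versus-even example can be worked through with a small sketch. It assumes a simplified two-candidate race in which delegates are split in proportion to votes and the single leftover delegate, if any, goes to the larger remainder. Viability thresholds, additional candidates, and other party rules are deliberately ignored, and the vote counts are hypothetical.

```python
def delegates_won(votes_a, votes_b, delegates):
    """Delegates for candidate A under a two-way proportional split."""
    total = votes_a + votes_b
    whole, remainder = divmod(votes_a * delegates, total)
    # With two candidates at most one delegate is left over after the whole
    # shares are assigned; A takes it only with the strictly larger remainder.
    if 2 * remainder > total:
        whole += 1
    return whole

def tipping_points(delegates):
    """Vote shares at which candidate A gains each successive delegate."""
    return [(2 * k - 1) / (2 * delegates) for k in range(1, delegates + 1)]

# A 501-499 win in a five-delegate district yields a 3-2 split...
odd_district = delegates_won(501, 499, 5)
# ...but the same win in a four-delegate district yields only 2-2.
even_district = delegates_won(501, 499, 4)
# In the four-delegate district, a third delegate requires more than 62.5%.
even_thresholds = tipping_points(4)
```

Under these assumptions, districts whose projected vote share sits just below one of the `tipping_points` values are the ones where a small number of additional votes changes the delegate count.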

5. Election Day Turnout Tracking

Haystaq also managed the campaign’s election day turnout tracking operation, which involved crowdsourcing reports of how many ballots had been cast at particular precincts at various times on election day and aggregating those reports upward to project voting trends. Precincts were categorized according to the average support score of eligible primary voters as favorable to Clinton or Sanders. Our turnout tracking system compared the actual number of ballots cast at base Sanders and Clinton precincts to the baseline expectation during the day to give the campaign an indication of whether it was likely to win the state, and then perhaps more importantly, how delegates apportioned at district levels were likely to be distributed. This allowed the campaign to direct its resources to the places where additional campaign phone calls or canvassing were most likely to make a difference in the delegate total.
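A minimal sketch of the comparison described above: precincts are labeled by average support score, crowdsourced ballot counts are summed by label, and each total is compared with the baseline expected by that time of day. The 0.5 cutoff, the data layout, and all numbers are assumptions for illustration.

```python
def classify_precinct(avg_support, cutoff=0.5):
    """Label a precinct by the average Sanders support score of its voters."""
    return "sanders" if avg_support >= cutoff else "clinton"

def turnout_vs_baseline(reports, baseline):
    """Return actual/expected ballots cast so far, by precinct type.

    reports:  {precinct: ballots reported so far}
    baseline: {precinct: (average support score, ballots expected by now)}
    """
    actual = {"sanders": 0, "clinton": 0}
    expected = {"sanders": 0, "clinton": 0}
    for precinct, ballots in reports.items():
        avg_support, expected_ballots = baseline[precinct]
        lean = classify_precinct(avg_support)
        actual[lean] += ballots
        expected[lean] += expected_ballots
    return {lean: actual[lean] / expected[lean]
            for lean in actual if expected[lean]}

# Hypothetical midday reports and baselines.
reports = {"P1": 120, "P2": 90, "P3": 200}
baseline = {"P1": (0.62, 100), "P2": (0.38, 100), "P3": (0.55, 220)}
ratios = turnout_vs_baseline(reports, baseline)
```

A ratio above 1.0 for one group and below 1.0 for the other would indicate that group's base precincts are turning out ahead of expectation.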

6. Fundraising Models

Haystaq and the campaign also conducted a series of experiments aimed at optimizing the response to its fundraising emails by varying the “ask amount” referenced in the email text. In general, asking for larger donations does yield larger donations, but asking for smaller donations yields more donations. We found in a series of A/B tests conducted in early April that asking people who had not previously contributed to the campaign for $2.70 (1/10 the campaign’s oft-cited average contribution of $27) consistently produced a higher expected return than the campaign’s original practice of asking for $3. While the average donation amount decreased marginally with the smaller ask, this effect was moderated by the fact that people often donated more than the amount asked for once they reached the donation page.
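The trade-off between response rate and average gift can be made concrete by computing the expected return per email for each test arm. The figures below are invented purely to show the arithmetic and are not results from the campaign's tests; amounts are in cents to avoid rounding issues.

```python
def summarize_arm(emails_sent, donations_cents):
    """Summarize one A/B test arm from its list of donation amounts."""
    n = len(donations_cents)
    total = sum(donations_cents)
    return {
        "response_rate": n / emails_sent,
        "avg_donation_cents": total / n if n else 0.0,
        "expected_cents_per_email": total / emails_sent,
    }

# Invented example data: some donors give more than the amount asked for.
ask_300 = summarize_arm(1000, [300, 300, 1000, 500])
ask_270 = summarize_arm(1000, [270, 270, 270, 1000, 500, 270])

smaller_ask_wins = (
    ask_270["expected_cents_per_email"] > ask_300["expected_cents_per_email"]
)
```

In this made-up example the smaller ask has a lower average donation but a higher response rate, and the product of the two (expected return per email) is what decides between the arms.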

7. Direct Mail

Haystaq microtargeting models allowed for the creation of nuanced and very precisely targeted direct mail universes. We were able to increase the volume of mail sent to precincts or congressional districts where our modeling showed that the campaign was near the tipping point for winning additional delegates, and also to target specific messages to voters most likely to be sympathetic. Using a suite of more than 50 issues ranging from climate change to school choice support, we were able to direct messaging on climate change, for example, to the voters most likely to strongly agree with Sanders’ climate message. We were also able to direct additional mail to individuals with low television viewing or low social media usage scores, improving reach among voters unlikely to see the campaign’s messaging in other media.

8. Television Targeting

Microtargeting is often thought of as a tool only for direct voter contact like phones, door knocking or direct mail. However, given the share of resources spent on television advertising, this may be the most important application of microtargeting. Haystaq used ratings data broken out by a variety of demographic and lifestyle indicators to calculate the likelihood of any individual voter watching a particular show, network or daypart. We then overlaid this data with our modeled lists of persuadable voters and likely supporters to calculate the cost of reaching any one of the campaign’s targets via television. This often led to our being able to identify much more efficient targets than would have been possible using traditional metrics like cost per point or cost per adult viewer.
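The cost-per-target calculation can be sketched as the price of an ad slot divided by the expected number of campaign targets watching it. The function name, slot prices, and viewing probabilities below are hypothetical.

```python
def cost_per_target_reached(spot_cost, watch_probabilities):
    """Cost of a spot divided by the expected number of targets who see it.

    watch_probabilities holds one modeled viewing likelihood per targeted voter.
    """
    expected_reached = sum(watch_probabilities)
    if expected_reached == 0:
        return float("inf")
    return spot_cost / expected_reached

# Hypothetical comparison of two slots over the same three targeted voters.
prime_time = cost_per_target_reached(1000, [0.5, 0.25, 0.25])
late_night = cost_per_target_reached(600, [0.5, 0.5, 0.5])
```

A slot with a smaller total audience can still come out cheaper on this measure if the campaign's targets are disproportionately likely to be watching.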

Even greater precision was available through addressable television. For persuasion ads, we generated lists of individuals who were scored as highly persuadable; for turnout ads, we generated lists of likely supporters who might not vote. These lists were also segmented by demographics and by congressional district so that we could maximize the ad spend in districts where we were the closest to the tipping point for winning additional delegates.
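The list-building logic for addressable ads might be sketched as follows. The score names, the cutoffs, and the choice to place each voter on at most one list are assumptions for illustration only.

```python
from collections import defaultdict

def build_ad_lists(voters, persuasion_cutoff=0.7,
                   support_cutoff=0.7, turnout_cutoff=0.5):
    """Group voter ids by (district, ad type) using hypothetical score cutoffs."""
    lists = defaultdict(list)
    for v in voters:
        if v["persuadability"] >= persuasion_cutoff:
            lists[(v["district"], "persuasion")].append(v["id"])
        elif v["support"] >= support_cutoff and v["turnout"] < turnout_cutoff:
            # Likely supporter who might not vote: a turnout-ad target.
            lists[(v["district"], "turnout")].append(v["id"])
    return dict(lists)

# Hypothetical scored voters.
voters = [
    {"id": 1, "district": "XX1", "persuadability": 0.9, "support": 0.5, "turnout": 0.8},
    {"id": 2, "district": "XX1", "persuadability": 0.2, "support": 0.9, "turnout": 0.3},
    {"id": 3, "district": "XX2", "persuadability": 0.1, "support": 0.9, "turnout": 0.9},
]
ad_lists = build_ad_lists(voters)
```

Keying the lists by district makes it straightforward to concentrate spending on the districts nearest a delegate tipping point.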

Results

While ultimately Sanders narrowly lost the nomination, the campaign dramatically outperformed expectations at its launch, engaged segments of voters previously ignored by the Democratic Party, and eventually garnered the support of 47% of Democrats. The campaign resulted in the elevation of Sanders’ stature within the party and an ongoing engagement for Haystaq with the campaign’s successor organization Our Revolution.

Month: January 2017

Support for the Affordable Care Act 1/23/2017

HaystaqDNA and Bernie Sanders 2016 1/05/2017