Introduction
We now know more about the ways in which a single poll can go awry. But how can pollsters protect themselves in situations where multiple polls are biased, leading to flawed forecasts?
As mentioned in earlier chapters, one of the more common methods for reducing sample variability is poll aggregation. The rough idea is that by assessing the results of multiple polls together, the effective sample size is increased and the margin of error thus minimized.Footnote 1 All aggregators share the same technical objective: minimize noise and maximize signal. But does this work in practice? This is what we will explore in detail in this chapter.
Before we dive in, let’s take a look at poll aggregation in its simplest form – that is to say, a simple average. Table 7.1 aggregates the polls conducted during the last two days of the 2016 US presidential election cycle. This results in a total sample size of 17,677 interviews, with a corresponding margin of error of 0.7%. In contrast, as we learned in Chapter 5, any single poll with a sample size of around 1,000 interviews has a margin of error of about ±3.1%. Note how the individual polls are more variable relative to one another and how their margins of error are larger. Ultimately, the market average came close to the actual election result (+3.2 versus +2.1).
Table 7.1 Poll results published in the last two days of the 2016 election
| Poll | Date | Sample size | Margin of error (MOE) | Percent support Clinton | Percent support Trump | Spread |
|---|---|---|---|---|---|---|
| Bloomberg | 11/4–11/6 | 799 | 3.5% | 46 | 43 | Clinton +3 |
| IBD/TIPP Tracking | 11/4–11/7 | 1107 | 2.9% | 43 | 42 | Clinton +1 |
| The Economist/YouGov | 11/4–11/7 | 3669 | 1.6% | 49 | 45 | Clinton +4 |
| LA Times/USC Tracking | 11/1–11/7 | 2935 | 1.8% | 44 | 47 | Trump +3 |
| ABC/Wash Post Tracking | 11/3–11/6 | 2220 | 2.1% | 49 | 46 | Clinton +3 |
| Fox News | 11/3–11/6 | 1295 | 2.7% | 48 | 44 | Clinton +4 |
| Monmouth | 11/3–11/6 | 748 | 3.6% | 50 | 44 | Clinton +6 |
| NBC News/Wall Street Journal | 11/3–11/5 | 1282 | 2.7% | 48 | 43 | Clinton +5 |
| CBS News | 11/2–11/6 | 1426 | 2.6% | 47 | 43 | Clinton +4 |
| Reuters/Ipsos | 11/2–11/6 | 2196 | 2.1% | 44 | 39 | Clinton +5 |
| AVERAGE | | 17,677 | 0.7% | 46.8 | 43.6 | Clinton +3.2 |
As we move through this chapter, remember that the margin of error is a measure of sampling error. Non-sampling error also contributes to polling misses.
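The arithmetic behind Table 7.1 can be reproduced in a few lines of Python. This is a minimal sketch using the standard large-sample margin-of-error formula for a proportion at p = 0.5 (the most conservative case); the function name `margin_of_error` is ours, not a standard library routine.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from n interviews."""
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes of the ten polls in Table 7.1
sizes = [799, 1107, 3669, 2935, 2220, 1295, 748, 1282, 1426, 2196]
pooled_n = sum(sizes)  # 17,677 interviews in total

print(round(100 * margin_of_error(1000), 1))      # single poll: ~3.1 points
print(round(100 * margin_of_error(pooled_n), 1))  # pooled sample: ~0.7 points
```

Treating the pooled polls as one large simple random sample is itself a simplification; as this chapter goes on to note, it ignores design effects and any non-sampling error.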
Not all aggregators are created equal. Some report a simple rolling average over a given time period. Others use sophisticated algorithms, such as Markov chain Monte Carlo (MCMC) models, to account for outliers and sparse data. Still others utilize additional inputs, like economic data or historic election results, to “smooth” their estimates.
The aggregate is only as good as the individual polls that underpin it. Biased individual polls lead to biased aggregates. The “market of polls” can also have a gravitational force of its own. Polling outfits will closely watch the market average and in some cases seek to adjust their own results to reflect the general consensus. This is known as “herding” and can push the polls toward an artificial standard. We will discuss herding in more detail in Chapter 8.
In the preceding chapters, we detailed and then employed our total error framework to assess the quality of a single poll. As we saw there, many election misses come down to failing to correctly identify the voting population or accounting for coverage bias. We also saw how other forms of error, such as poor question formulation, can lead to significant analytic uncertainty.
In this chapter, we will apply our total error framework to the polls in aggregate. Such analysis is typically done retrospectively in order to determine why the polls did not predict a given outcome.
However, aggregate assessment can also be done in real time to assess the polls against models, economic data, social media activity, and the like. In these instances, we seek to probe why the polls are at variance with other evidence and to detect whether some systemic bias in the polls is sending the wrong signal.
We most commonly assess the performance of polls relative to elections, but we can apply this approach to non-electoral cases as well, such as referenda, impeachments, and reform bills. What all these instances share is a discrete, bounded outcome. In this chapter, we will conduct a retrospective analysis of one of the most astonishing polling misses in recent memory: the 2015 Greek referendum, or Grexit.
2015 Greek Referendum: Grexit
Context
In the summer of 2015, the Greek sovereign debt crisis had reached a breaking point.Footnote 2 The Greek government missed its $1.7 billion debt payment due to the International Monetary Fund (IMF). As a result, banks closed, and Greek citizens scrambled to withdraw cash from ATMs. The IMF, the European Commission, and the European Central Bank offered Greece a bailout with certain austerity conditions. The specter of Greece’s exit from the Eurozone and a return to the drachma loomed. Prime Minister Alexis Tsipras, who had been elected on an anti-austerity platform, opposed the bailout and called a last-minute referendum allowing the citizens to vote on whether or not to accept said conditions. Tsipras and his Syriza party argued that a “no” vote on the referendum would strengthen Greece’s negotiating position as it would show that Greece wasn’t willing to accept the austerity terms without some kind of push back.
Yet the Greeks were not wholly in favor of a “no” vote. In opposition, the grassroots movement Menoume Europi (Stay in Europe) arose to advocate for a “yes” vote, reflecting the desire captured in its name: to stick with the European Union. In the lead-up, EU leaders affirmed that they would read a “no” vote as a rejection of Europe, although Tsipras denied this.
The Greek referendum was announced just eight days before it was held. In the interim, Greece defaulted on its debt payment to the IMF. To make matters worse, the question on the ballot asked voters whether they approved of a by-then-outdated proposal, made on June 25, 2015, by Greece’s creditors. These terms had already been invalidated because, as mentioned, Greece had just defaulted on its debt.
Complicating matters further, the proposal put to Greek voters was a mass of bureaucratic verbiage about tax changes and pension rules. This was difficult material for the average citizen to comprehend, and with just eight days before the vote, there was very little time to unpack it.
Meanwhile, global financial markets, international leaders, and political pundits were nervously eyeing the referendum as the day approached. Tsipras rallied his supporters, exhorting them in fiery terms: “I call on you to say a big ‘no’ to ultimatums, a ‘no’ to blackmail. Turn your back on those who would terrorize you.” On the other side, the opposition emphasized to the public that this was really a vote on whether or not to stay in the EU.
At the time, Aristos Doxiadis, an economist and adviser to To Potami, an opposition, pro-Europe party, commented, “Once the banks closed, the whole game, or point of the referendum, changed completely. How on earth were we going to have functioning banks again? The referendum was never going to be about specific agreements. It is about whether we stay in the Eurozone or not.” Needless to say, the polls took on an outsized importance leading up to this high-stakes event.
The Problem
Unfortunately, events would reveal that the polls were not merely wrong, but significantly off. The final vote gave a 22.6-point advantage to the “no” vote over “yes,” as Table 7.2 shows. Interestingly, at the beginning of the week, the polls did show a substantial margin for “no,” and these early polls came closer to the final result (17.7 versus 22.6 points) than the later ones. But over the course of the week, amid the bank closings and generalized chaos, many thought the narrowing gap shown by the polls made complete sense. Ultimately, the polls suggested a much closer race of around 5 points in favor of “no,” sending all the wrong signals to markets and other decision-makers.
Table 7.2 2015 Greek referendum, polling, and actual results
| | Votes for “Yes” | Votes for “No” | Spread (Yes−No) | Number of polls | Sample size | Margin of error |
|---|---|---|---|---|---|---|
| June 27–29, 2015 | 34.3 | 53 | −17.7 | 6 | 6,052 | 2.5% |
| July 4–5, 2015 | 46.9 | 50.2 | −3.3 | 5 | 5,000 | 2.8% |
| All polls | 41.7 | 47.3 | −5.6 | 31 | 31,325 | 1.1% |
| Actual referendum results | 38.7 | 61.3 | −22.6 | N/A | N/A | N/A |
Grexit was a major upset, seemingly defying the logic of the times.
Assessment
So, what went wrong? To assess, we will employ our total survey error framework here. As in Chapter 6, we will use the spread as our primary evaluative statistic. It is simple to calculate and intuitively appealing. However, many analysts use other statistics to evaluate the polls, the most common of which is the Average Absolute Difference (AAD).Footnote 4 To calculate the AAD, we take the actual election result $A_i$ for candidate $i$, subtract the poll result $P_i$ for that candidate, sum the absolute differences across candidates, and divide by the number of candidates $C$ in a given race. One benefit of the AAD is that it can be used in races with three or more candidates. See the following equation:

$$\mathrm{AAD} = \frac{\sum_{i=1}^{C} \left| A_i - P_i \right|}{C}$$
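As a sketch, the AAD can be computed directly from its definition. The vote shares below are made-up illustration values, not real election figures, and the function name `avg_abs_diff` is ours.

```python
def avg_abs_diff(actual, poll):
    """Average Absolute Difference between actual results and poll results,
    averaged over the C candidates in the race."""
    if len(actual) != len(poll):
        raise ValueError("need one poll number per candidate")
    return sum(abs(a - p) for a, p in zip(actual, poll)) / len(actual)

# Hypothetical two-way race: actual result 52-48, polls said 49-47
print(avg_abs_diff([52.0, 48.0], [49.0, 47.0]))  # → 2.0
```

Because it averages over candidates, the same function works unchanged for a three-way (or larger) race.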
But, shifting gears back to our thought experiment, we will set aside the AAD and focus our analysis on the spread. As we dive in, remember that there are two broad classes of error – sampling and non-sampling error. For this exercise, we will rely on thirty-one polls conducted leading up to the referendum. The equation to take a simple average of the polls is shown as follows:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

In this case, $x_i$ is the vote share of the individual poll, $n$ is the total number of individual polls, and $\bar{x}$ is the average vote share of all the polls.

We can then calculate the “spread” $S$ of our polling estimate. This is calculated simply by subtracting the average vote share for “no” from that of “yes”:

$$S = \bar{x}_{\text{yes}} - \bar{x}_{\text{no}}$$

The spread can be negative or positive.
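These two computations amount to one-liners in code. In this sketch, the “yes” and “no” shares are hypothetical numbers chosen only to illustrate the mechanics:

```python
def simple_average(shares):
    """Unweighted mean of the vote shares x_i across n polls."""
    return sum(shares) / len(shares)

# Hypothetical "yes" and "no" vote shares from three polls
yes_avg = simple_average([41.0, 42.5, 41.5])
no_avg = simple_average([47.0, 48.0, 47.0])

spread = yes_avg - no_avg  # negative means "no" leads
print(round(spread, 1))    # → -5.7
```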
Alternatively, we could have weighted the results by the sample size of each poll (a larger poll would get a larger weight) or some other criteria. Jackman’s equation is the industry standard for aggregating polls in this way.Footnote 5 To do this, all we need is the vote share and sample size for each poll. From there, we then:

1. Calculate the standard deviation $\sigma_i$ for each poll based on the vote share $x_i$ (expressed as a proportion) and sample size $n_i$ using this equation:

$$\sigma_i = \sqrt{\frac{x_i(1 - x_i)}{n_i}}$$

2. Calculate the “precision” of each poll, which is the inverse of the variance and is used to weight the poll by its sample size. We calculate the precision $p_i$ using this equation:

$$p_i = \frac{1}{\sigma_i^{2}}$$

3. Now we have all of the pieces for the individual polls to combine them. For simplicity’s sake, let’s say we’re just combining two polls: one from organization A and the other from organization B. We need the precision ($p_A$, $p_B$) and vote share ($x_A$, $x_B$) from A and B, and we combine them like this:

$$\bar{x}_{AB} = \frac{p_A x_A + p_B x_B}{p_A + p_B}$$
Here, we have what’s called a “precision-weighted average” vote share: polls with larger sample sizes are given more weight. Additionally, analysts often take into account the “house effects” of each individual polling firm, in order to correct for the systematic bias of any given firm. We opted not to make these additional adjustments for the sake of simplicity.
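Putting the three steps together, a precision-weighted aggregate can be sketched as follows. This follows the inverse-variance logic described above; it omits house effects and other adjustments, and the two example polls are hypothetical.

```python
import math

def poll_sd(x, n):
    """Standard deviation of a vote share x (as a proportion)
    from a poll with sample size n."""
    return math.sqrt(x * (1 - x) / n)

def precision(x, n):
    """Precision = inverse variance; larger samples get larger weights."""
    return 1.0 / poll_sd(x, n) ** 2

def precision_weighted_average(polls):
    """polls: list of (vote_share, sample_size) tuples."""
    weights = [precision(x, n) for x, n in polls]
    return sum(w * x for w, (x, _) in zip(weights, polls)) / sum(weights)

# Hypothetical polls: A has 48% on n=800, B has 52% on n=3,200
print(round(precision_weighted_average([(0.48, 800), (0.52, 3200)]), 3))  # → 0.512
```

Note how the combined estimate lands much closer to poll B’s 52% than to poll A’s 48%: B’s sample is four times larger, so its precision dominates the weighting.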
Now, as we move on to our error assessment, keep these key points in mind:
The spread (Yes−No) of the actual referendum results was −22.6 points.
The spread (Yes−No) shown in the polls was −5.6 points.
The margin of error of all the polls included in this analysis, which produced a sample of 31,325, was plus or minus 1.1%.
Sampling Error
First, could the miss have been due to sampling error? Taking the margin of error into consideration, the spread of all the polls could reasonably have been as high as −4.5 points or as low as −6.7 points.Footnote 6 Yet clearly, at −22.6, the actual spread was far outside these bounds. The polls conducted at the end of the week, on July 4 and 5, fall outside the margin of error as well (−6.1 at the outer bound versus −22.6). And while the polls published at the beginning of the week (June 27 to 29) came closer to reality (−20.2 at the outer bound versus −22.6), their spread still falls outside the margin of error.
In other words, no matter which way we slice it, the difference between the polls and the referendum itself was much larger than the chance variability already baked into the polls. As such, we can’t chalk the miss up to random noise; something else is at play. By definition, if not sampling error, then the polling problem in Greece must stem from non-sampling error. But which kind?
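The reasoning above amounts to a simple interval check. A sketch, treating the margin of error as a symmetric band around the polled spread (which ignores the somewhat wider margin that formally applies to a difference of proportions):

```python
def within_sampling_error(poll_spread, moe, actual_spread):
    """True if the actual result lies inside the poll spread's
    margin-of-error band."""
    return poll_spread - moe <= actual_spread <= poll_spread + moe

# All 31 polls: spread -5.6, MOE 1.1; actual referendum spread -22.6
print(within_sampling_error(-5.6, 1.1, -22.6))   # → False
# Early polls (June 27-29): spread -17.7, MOE 2.5
print(within_sampling_error(-17.7, 2.5, -22.6))  # → False
```

Both checks fail, which is exactly the point: the miss cannot be explained by sampling error alone.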
Non-Sampling Error
As we turn our attention to non-sampling error, let’s first consider the problem of measurement error.
Divergent polls can result from questionnaire construction, the wording of specific questions, or the way in which we administer the questions to the respondent – all forms of measurement error. In earlier chapters, we also covered some best practices for constructing unbiased questions. These are useful rules for assessing public opinion polls.
In particular, we focus on three aspects of the ballot question:
It should be at or near the beginning of the questionnaire in order to avoid unintended influence from preceding questions. Such influence can come from the general context of the questionnaire or, more specifically, from the sequence of the questions.
It should be as neutral as possible to minimize biased responses. Here, we want to stay away from hot button words that might elicit a strong emotional response, inadvertently influencing responses.
The response options should be randomized in order to ensure that the order of the responses does not influence the way people answer.
In the case of the Greek referendum, very few of the polling firms published their questionnaires or detailed methodological statements, which complicates the assessment of measurement error. That said, in our experience, referenda ballot questions are often difficult to understand because they deal with technical or esoteric topics. The Greek referendum was no different. See the ballot question below. As you can see, it is quite vague, difficult to understand, and makes only tangential reference to key documents with little additional detail.
The Greek Referendum Question
The Greek people are asked to decide with their vote whether to accept the outline of the agreement submitted by the European Union, the European Central Bank and the International Monetary Fund at the Eurogroup of 25/06/15 and is made up of two parts which constitute their unified proposal:
The first document is entitled: Reforms for the completion of the current program and beyond and the second is Preliminary Debt Sustainability Analysis.
Whichever citizens reject the proposal by the three institutions vote: Not Approved/NO
Whichever citizens agree with the proposal by the three institutions vote: Approved/YES
The Grexit question was met with bemused astonishment by experts worldwide. What the Greek citizens themselves actually thought is harder to parse, although polling leading up to the referendum suggests that perspectives on the likely fallout were split. The vast majority of “no” voters believed that their vote would not lead to Greece’s exit from the Eurozone. Meanwhile, more than half of the “yes” voters believed that Grexit would likely result from a “no” vote.Footnote 7
Additionally, there are obvious political motivations in how referenda questions are framed. In this case, Tsipras and Syriza controlled the wording of the ballot question. Again, they were in favor of a “no” vote.
Consider the construction of the referendum question above. Notice how the “no” option precedes the “yes.” This runs counter to how people typically think, which is from positive to negative, not the contrary. The construction looks like an obvious attempt by the “no” camp to use response order to influence voters. Ultimately, the opaque wording and biased question construction raise doubts about whether the true wishes of Greek voters were captured.
However, further evidence will show that a confusing ballot question was not the primary reason for the polling miss. Often, such an assessment is a process of elimination.
Nonresponse Bias, Coverage Bias, and Estimation Error
Could the Grexit polling miss have resulted from other forms of non-sampling error, such as coverage bias, nonresponse bias or estimation error? As we indicated in Chapters 5 and 6, nonresponse bias is not easy to assess directly. We often make the simplifying assumption that post-survey weighting will correct for any issues with nonresponse. This is a significant assumption. But for simplicity in this case, let’s remove it from the list.
In our experience, coverage bias is a common reason for polling misses in elections and other election-like events. It might be a strong culprit for the Grexit polling miss. Is this the case? The data suggests no. See in Table 7.3 how those in favor of “no” were more educated than those in favor of “yes.” People with this profile typically are less likely to have access to a telephone. So, our proxy coverage variable – education – is negatively correlated with no. If there were coverage bias, “no” should be more pronounced, not less.
Table 7.3 Those who are likely to vote by education
| Q8: Intention of vote in the referendum | All respondents (%) | <10 Years (%) | 12 Years (%) | University (%) |
|---|---|---|---|---|
| Accepting (YES) | 38 | 48 | 36 | 41 |
| Rejecting (NO) | 55 | 45 | 58 | 51 |
| Blank/invalid | 1 | 1 | 1 | 1 |
| Haven’t decided | 3 | 5 | 2 | 2 |
| Abstention | 1 | 1 | 1 | 2 |
| Don’t know | 0 | 0 | 0 | 0 |
| Don’t answer | 2 | 1 | 2 | 3 |
Additionally, remember that all thirty-one polls leading up to the referendum were conducted by phone, many of which deployed some mix of landline and cell phones. As public opinion analysts, we normally don’t have access to the raw data from polling firms, so, as you might recall from Chapter 5, the mode of survey administration (face-to-face, telephone, mail, or online) can serve as a quick proxy for coverage bias in the absence of other evidence. We know that the rate of telephone ownership in Greece was high at the time (circa 90%). So it looks like the polling miss was not a result of coverage bias.
Alternatively, could the Greek miss have resulted from estimation error, or more specifically, from incorrectly identifying who would vote in the referendum? Determining what the voter population will look like is oftentimes the single most difficult task for pollsters. Likely voter models are equally, if not more, challenging for third-party analysts to interpret because outside observers don’t have access to the raw polling data or know what elements were considered when constructing the model.
As mentioned in Chapter 5, pollsters employ a variety of likely voter models in order to separate out those who will vote from those who won’t. Remember that from an international perspective, participation in elections is generally not obligatory. As a result, not everyone who is eligible to vote actually does so. Globally, the average turnout in national elections is around 65% among the voting age population.Footnote 8
In the case of the 2015 Greek referendum, 63% of the voting population turned out, which was on par with parliamentary election turnout at the time. But the question is, did the pollsters in Greece get the right subset of voters, that is, did they correctly identify who would show up to vote?
To get at this, we will take an Ipsos tracking poll conducted from June 29 to July 4, 2015 (Graph 7.1). The poll consisted of a daily sample of 800 interviews; it aggregated the daily sample into a three-day rolling average of 2,400 interviews to minimize sampling error. To assess different turnout scenarios, we employed a modified-Gallup likely voter model based on a summated index of multiple items.

Graph 7.1 Greek referendum polling (“Yes” responses minus “No” responses)
We ranked respondents by their likely voter scores. We then made likely voter cuts at different levels of turnout (from 65% up to all adults). The “all adults” scenario represents the naïve model utilized by most pollsters at the time; the 65% scenario represents a likely voter model that approximates the actual turnout level (65% versus 63%).
As the graph indicates, the pollsters did not correctly identify who would vote. At a turnout of around 65%, close to that of the referendum, we replicate the final election results (−24 as opposed to −22.6). In contrast, the naïve model (100% turnout) closely mimics the polling results at the time, with an average spread of around −7 points. It is worth noting that there was a trend toward “yes” over the course of the week; but after taking the likely voter population into account, we see that this never put “no” at risk. These are important insights that were lost to decision-makers at the time. The polling miss looks to be a likely voter problem.
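The likely voter exercise can be sketched in code. This is not the actual Ipsos model – the scoring index and the underlying data are not available to us – but it shows the mechanics of ranking respondents by a likely-voter score and cutting at a given turnout level. The records here are hypothetical (score, vote) pairs.

```python
def spread_at_turnout(respondents, turnout):
    """Spread ("yes" minus "no", in points) among the top `turnout`
    share of respondents, ranked by likely-voter score.

    respondents: list of (score, vote) tuples, with vote in {"yes", "no"}.
    """
    ranked = sorted(respondents, key=lambda r: r[0], reverse=True)
    k = max(1, round(len(ranked) * turnout))
    likely = [vote for _, vote in ranked[:k]]
    yes_share = likely.count("yes") / len(likely)
    no_share = likely.count("no") / len(likely)
    return 100 * (yes_share - no_share)

# Tiny hypothetical sample: committed "no" voters score high,
# lukewarm "yes" voters score low
sample = [(0.9, "no"), (0.8, "no"), (0.7, "yes"), (0.2, "yes")]
print(spread_at_turnout(sample, 1.0))  # all adults: 0.0 (tied)
print(spread_at_turnout(sample, 0.5))  # 50% turnout: -100.0 ("no" sweeps)
```

The toy numbers exaggerate the effect, but the mechanism is the one at issue in Greece: when vote intention correlates with likelihood to vote, the all-adults spread and the likely-voter spread diverge.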
Conclusion
In this chapter, we applied our total error framework to the case of the 2015 Greek bailout referendum. This polling miss had profound market consequences and was a black eye for the polling industry. But there are ways to assess what went wrong, in Greece and in other instances where the polls whiffed, as demonstrated in this chapter. Often such assessment is far from cut and dried; rather, it is a process of elimination and empirical conjecture. The total survey error framework is essential for thinking through such problems.
As discussed, the evidence strongly suggests that the problem in Greece resulted from a likely voter problem; more specifically, from incorrectly identifying who would show up on election day. So, why did pollsters not utilize likely voter models at the time? This is a complex question to answer.
Some did, but many did not. The most common approach at the time was to weight the data by the results of the last parliamentary election. This is a brute-force method for determining the profile of who ends up voting on referendum day. Yet it does not measure likelihood to vote directly, and it makes a strong assumption that the past will predict the future. Some electoral events follow a completely different logic than past ones. We only have to look at the 2016 US election, as seen in Table 7.4, to see the risks of assuming that the voting patterns of the past will play out in the future.
Table 7.4 Some examples of election misses
| Election | Actual margin | Projected margin based on polling | Picked correct winner | Issues at play |
|---|---|---|---|---|
| 2016 US presidential election | 2.1 | 3.2 | No | Rural white voters were missing in the polls; the polls themselves favored Clinton over Trump; swing states mattered most, and state polls overstated Clinton’s support as well |
| 2019 Argentinian PASO election | 14 | 4 | Yes | The polls overestimated Macri relative to Fernandez. Online and telephone polls failed to adequately cover lower SES voters; these were untested methodologies. |
| 2016 Colombian referendum | −0.4 | 30 | No | Coverage bias and silent refusals or “no’s” |
| 2015 Greek referendum, Grexit | −22.6 | −3.3 | Yes | Polls in the last two days were particularly off. Some analysts pointed to herding or coverage bias; others pointed to pollsters not using likely voter models. |
| 2016 Brexit | −3.8 | 2 | No | The polls missed soft Brexit voters, those who were struggling with economic issues. Differential nonresponse also might have been a culprit. |
| 2020 US presidential election | 1.7 | 4.3 | Yes | Post-election analysis suggests that there was an issue of nonresponse bias among Trump voters, particularly those who do not typically vote |
Remember, the Greek case was a referendum and not a parliamentary election, which means that voting patterns would not necessarily map onto those traditionally seen in the latter. Ultimately, the methodological approach was a serious blind spot for pollsters, as it reinforced preexisting beliefs in favor of “yes” – as mentioned, many elites and more educated Greeks were pro-EU and pro-“yes.” Such cognitive biases, together with pollsters’ own experience of the chaos on the ground, only served to validate the polling data coming in. We will learn more about such problems in Chapter 8.
Some argued that the polling miss resulted from “herding” as pollsters adjusted their results in response to other polls and what seemed most intuitively “correct” given the circumstances.Footnote 9 This might have been a secondary or tertiary culprit, but would be very difficult to ascertain directly. The more likely explanation was that pollsters were using the same or similarly faulty approaches for identifying likely voters, as detailed earlier.
Finally, the Grexit question was also marred by the biased construction of the referendum wording itself. Generally, voters are given more straightforward options at the polls. While Grexit is an extreme case, precedent tells us that even when question wording is perfectly clear, the polls may still be off in aggregate. The lessons of Grexit still offer useful insights that apply to more neutrally worded ballot questions as well as contexts in which voters seemingly act in contradiction to their own interests.Footnote 10
As a final note, poll aggregation sites have become ubiquitous around the world. We use them frequently as data sources and for additional analytical insight. However, aggregator sites are not the only source. Wikipedia entries and desk research also make polling data accessible with a bit of legwork. The pollster of today does not lack for data.