Introduction
We now know more about the ways in which a single poll can go awry. But how can pollsters protect themselves in situations where multiple polls are biased, leading to flawed forecasts?
As mentioned in earlier chapters, one of the more common methods for reducing sample variability is poll aggregation. The rough idea is that by assessing the results of multiple polls together, the effective sample size is increased and the margin of error thus minimized.Footnote 1 All aggregators share the same technical objective: minimize noise and maximize signal. But does this work in practice? This is what we will explore in detail in this chapter.
Before we dive in, let’s take a look at poll aggregation in its simplest form – that is to say, a simple average. Table 7.1 aggregates the polls conducted during the last two days of the 2016 US presidential election cycle. This results in a total sample size of 17,677 interviews, with a corresponding margin of error of 0.7%. In contrast, as we learned in Chapter 5, any single poll with a sample size of around 1,000 interviews has a margin of error of about ±3.1%. Note how the individual polls are more variable relative to one another and how their margins of error are larger. Ultimately, the market average came close to the actual election result (+3.2 versus +2.1).
Table 7.1 Poll results published in the last two days of the 2016 election
| Poll | Date | Sample size | Margin of error (MOE) | Percent support Clinton | Percent support Trump | Spread |
|---|---|---|---|---|---|---|
| Bloomberg | 11/4–11/6 | 799 | 3.5% | 46 | 43 | Clinton +3 |
| IBD/TIPP Tracking | 11/4–11/7 | 1107 | 2.9% | 43 | 42 | Clinton +1 |
| The Economist/YouGov | 11/4–11/7 | 3669 | 1.6% | 49 | 45 | Clinton +4 |
| LA Times/USC Tracking | 11/1–11/7 | 2935 | 1.8% | 44 | 47 | Trump +3 |
| ABC/Wash Post Tracking | 11/3–11/6 | 2220 | 2.1% | 49 | 46 | Clinton +3 |
| Fox News | 11/3–11/6 | 1295 | 2.7% | 48 | 44 | Clinton +4 |
| Monmouth | 11/3–11/6 | 748 | 3.6% | 50 | 44 | Clinton +6 |
| NBC News/Wall Street Journal | 11/3–11/5 | 1282 | 2.7% | 48 | 43 | Clinton +5 |
| CBS News | 11/2–11/6 | 1426 | 2.6% | 47 | 43 | Clinton +4 |
| Reuters/Ipsos | 11/2–11/6 | 2196 | 2.1% | 44 | 39 | Clinton +5 |
| AVERAGE | | 17,677 | 0.7% | 46.8 | 43.6 | Clinton +3.2 |
As we move through this chapter, remember that the margin of error is a measure of sampling error. Non-sampling error also contributes to polling misses.
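The arithmetic behind Table 7.1 can be reproduced in a few lines of Python. This is a minimal sketch using the standard large-sample margin-of-error formula for a proportion at p = 0.5 (the most conservative case); the function name `margin_of_error` is ours, not a standard library routine.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from n interviews."""
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes of the ten polls in Table 7.1
sizes = [799, 1107, 3669, 2935, 2220, 1295, 748, 1282, 1426, 2196]
pooled_n = sum(sizes)  # 17,677 interviews in total

print(round(100 * margin_of_error(1000), 1))      # single poll: ~3.1 points
print(round(100 * margin_of_error(pooled_n), 1))  # pooled sample: ~0.7 points
```

Treating the pooled polls as one large simple random sample is itself a simplification; as this chapter goes on to note, it ignores design effects and any non-sampling error.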
Not all aggregators are created equal. Some report a simple rolling average over a given time period. Others use sophisticated algorithms, such as Markov chain Monte Carlo (MCMC) models, to account for outliers and sparse data. Still others utilize additional inputs, like economic data or historic election results, to “smooth” their estimates.
The aggregate is only as good as the individual polls that underpin it. Biased individual polls lead to biased aggregates. The “market of polls” can also have a gravitational force of its own. Polling outfits will closely watch the market average and in some cases seek to adjust their own results to reflect the general consensus. This is known as “herding” and can push the polls toward an artificial standard. We will discuss herding in more detail in Chapter 8.
In the preceding chapters, we detailed and then employed our total error framework to assess the quality of a single poll. As we saw there, many election misses come down to failing to correctly identify the voting population or accounting for coverage bias. We also saw how other forms of error, such as poor question formulation, can lead to significant analytic uncertainty.
In this chapter, we will apply our total error framework to the polls in aggregate. Such analysis is typically done retrospectively in order to determine why the polls did not predict a given outcome.
However, aggregate assessment can also be done in real time to assess the polls against models, economic data, social media activity, and the like. In these instances, we seek to probe why the polls are at variance with other evidence and to detect whether some systemic bias in the polls is sending the wrong signal.
We most commonly assess the performance of polls relative to elections, but we can apply this approach to non-electoral cases as well, such as referenda, impeachments, and reform bills. What all these instances share is a discrete, bounded outcome. In this chapter, we will conduct a retrospective analysis of one of the most astonishing polling misses in recent memory: the 2015 Greek referendum, or Grexit.
2015 Greek Referendum: Grexit
Context
In the summer of 2015, the Greek sovereign debt crisis had reached a breaking point.Footnote 2 The Greek government missed its $1.7 billion debt payment due to the International Monetary Fund (IMF). As a result, banks closed, and Greek citizens scrambled to withdraw cash from ATMs. The IMF, the European Commission, and the European Central Bank offered Greece a bailout with certain austerity conditions. The specter of Greece’s exit from the Eurozone and a return to the drachma loomed. Prime Minister Alexis Tsipras, who had been elected on an anti-austerity platform, opposed the bailout and called a last-minute referendum allowing the citizens to vote on whether or not to accept said conditions. Tsipras and his Syriza party argued that a “no” vote on the referendum would strengthen Greece’s negotiating position as it would show that Greece wasn’t willing to accept the austerity terms without some kind of push back.
Yet the Greeks were not wholly in favor of a “no” vote. In opposition, the grassroots movement Menoume Europi (Stay in Europe) arose to advocate for a “yes” vote, reflecting the desire captured in its name: to stick with the European Union. In the lead-up, EU leaders affirmed that they would read a “no” vote as a rejection of Europe, although Tsipras denied this.
The Greek referendum was announced just eight days before it was held. In the interim, Greece defaulted on its debt payment to the IMF. To make matters worse, the question on the ballot asked voters whether they approved of a by-then-outdated proposal, made on June 25, 2015, by Greece’s creditors. These terms had already been invalidated because, as mentioned, Greece had just defaulted on its debt.
Complicating matters further, the proposal put to Greek voters was a mass of bureaucratic verbiage about tax changes and pension rules. This was difficult material for the average citizen to comprehend, and with just eight days before the vote, there was very little time to unpack it.
Meanwhile, global financial markets, international leaders, and political pundits were nervously eyeing the referendum as the day approached. Tsipras rallied his supporters, exhorting them in fiery terms: “I call on you to say a big ‘no’ to ultimatums, a ‘no’ to blackmail. Turn your back on those who would terrorize you.” On the other side, the opposition emphasized to the public that this was really a vote on whether or not to stay in the EU.
At the time, Aristos Doxiadis, an economist and adviser to To Potami, an opposition, pro-Europe party, commented, “Once the banks closed, the whole game, or point of the referendum, changed completely. How on earth were we going to have functioning banks again? The referendum was never going to be about specific agreements. It is about whether we stay in the Eurozone or not.” Needless to say, the polls took on an outsized importance leading up to this high-stakes event.
The Problem
Unfortunately, events would reveal that the polls were not merely wrong, but significantly off. The final vote gave a 22.6-point advantage to the “no” vote over “yes,” as Table 7.2 shows. Interestingly, at the beginning of the week, the polls did show a substantial margin for “no,” and these early polls came closer to the final result (17.7 versus 22.6 points) than the later ones. But over the course of the week, amid the bank closings and generalized chaos, many thought the narrowing gap shown by the polls made complete sense. Ultimately, the polls suggested a much closer race of around 5 points in favor of “no,” sending all the wrong signals to markets and other decision-makers.
Table 7.2 2015 Greek referendum, polling, and actual results
| | Votes for “Yes” | Votes for “No” | Spread (Yes−No) | Number of polls | Sample size | Margin of error |
|---|---|---|---|---|---|---|
| June 27–29, 2015 | 34.3 | 53 | −17.7 | 6 | 6,052 | 2.5% |
| July 4–5, 2015 | 46.9 | 50.2 | −3.3 | 5 | 5,000 | 2.8% |
| All polls | 41.7 | 47.3 | −5.6 | 31 | 31,325 | 1.1% |
| Actual referendum results | 38.7 | 61.3 | −22.6 | N/A | N/A | N/A |
Grexit was a major upset, seemingly defying the logic of the times.
Assessment
So, what went wrong? To assess, we will employ our total survey error framework here. As in Chapter 6, we will use the spread as our primary evaluative statistic. It is simple to calculate and intuitively appealing. However, many analysts use other statistics to evaluate the polls, the most common of which is the Average Absolute Difference (AAD).Footnote 4 To calculate the AAD, we take the actual election result $A_i$ for candidate $i$, subtract the poll result $P_i$ for that candidate, sum the absolute differences across candidates, and divide by the number of candidates $C$ in a given race. One benefit of the AAD is that it can be used in races with three or more candidates. See the following equation:

$$\mathrm{AAD} = \frac{\sum_{i=1}^{C} \left| A_i - P_i \right|}{C}$$
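As a sketch, the AAD can be computed directly from its definition. The vote shares below are made-up illustration values, not real election figures, and the function name `avg_abs_diff` is ours.

```python
def avg_abs_diff(actual, poll):
    """Average Absolute Difference between actual results and poll results,
    averaged over the C candidates in the race."""
    if len(actual) != len(poll):
        raise ValueError("need one poll number per candidate")
    return sum(abs(a - p) for a, p in zip(actual, poll)) / len(actual)

# Hypothetical two-way race: actual result 52-48, polls said 49-47
print(avg_abs_diff([52.0, 48.0], [49.0, 47.0]))  # → 2.0
```

Because it averages over candidates, the same function works unchanged for a three-way (or larger) race.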
But, shifting gears back to our thought experiment, we will set aside the AAD and focus our analysis on the spread. As we dive in, remember that there are two broad classes of error – sampling and non-sampling error. For this exercise, we will rely on thirty-one polls conducted leading up to the referendum. The equation to take a simple average of the polls is shown as follows:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

In this case, $x_i$ is the vote share of the individual poll, $n$ is the total number of individual polls, and $\bar{x}$ is the average vote share of all the polls.

We can then calculate the “spread” $S$ of our polling estimate. This is calculated simply by subtracting the average vote share for “no” from that of “yes”:

$$S = \bar{x}_{\text{yes}} - \bar{x}_{\text{no}}$$

The spread can be negative or positive.
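These two computations amount to one-liners in code. In this sketch, the “yes” and “no” shares are hypothetical numbers chosen only to illustrate the mechanics:

```python
def simple_average(shares):
    """Unweighted mean of the vote shares x_i across n polls."""
    return sum(shares) / len(shares)

# Hypothetical "yes" and "no" vote shares from three polls
yes_avg = simple_average([41.0, 42.5, 41.5])
no_avg = simple_average([47.0, 48.0, 47.0])

spread = yes_avg - no_avg  # negative means "no" leads
print(round(spread, 1))    # → -5.7
```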
Alternatively, we could have weighted the results by the sample size of each poll (a larger poll would get a larger weight) or some other criteria. Jackman’s equation is the industry standard for aggregating polls in this way.Footnote 5 To do this, all we need is the vote share and sample size for each poll. From there, we then:

1. Calculate the standard deviation $\sigma_i$ for each poll based on the vote share $x_i$ (expressed as a proportion) and sample size $n_i$ using this equation:

$$\sigma_i = \sqrt{\frac{x_i(1 - x_i)}{n_i}}$$

2. Calculate the “precision” of each poll, which is the inverse of the variance and is used to weight the poll by its sample size. We calculate the precision $p_i$ using this equation:

$$p_i = \frac{1}{\sigma_i^{2}}$$

3. Now we have all of the pieces for the individual polls to combine them. For simplicity’s sake, let’s say we’re just combining two polls: one from organization A and the other from organization B. We need the precision ($p_A$, $p_B$) and vote share ($x_A$, $x_B$) from A and B, and we combine them like this:

$$\bar{x}_{AB} = \frac{p_A x_A + p_B x_B}{p_A + p_B}$$
Here, we have what’s called a “precision-weighted average” vote share: polls with larger sample sizes are given more weight. Additionally, analysts often take into account the “house effects” of each individual polling firm, in order to correct for the systematic bias of any given firm. We opted not to make these additional adjustments for the sake of simplicity.
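Putting the three steps together, a precision-weighted aggregate can be sketched as follows. This follows the inverse-variance logic described above; it omits house effects and other adjustments, and the two example polls are hypothetical.

```python
import math

def poll_sd(x, n):
    """Standard deviation of a vote share x (as a proportion)
    from a poll with sample size n."""
    return math.sqrt(x * (1 - x) / n)

def precision(x, n):
    """Precision = inverse variance; larger samples get larger weights."""
    return 1.0 / poll_sd(x, n) ** 2

def precision_weighted_average(polls):
    """polls: list of (vote_share, sample_size) tuples."""
    weights = [precision(x, n) for x, n in polls]
    return sum(w * x for w, (x, _) in zip(weights, polls)) / sum(weights)

# Hypothetical polls: A has 48% on n=800, B has 52% on n=3,200
print(round(precision_weighted_average([(0.48, 800), (0.52, 3200)]), 3))  # → 0.512
```

Note how the combined estimate lands much closer to poll B’s 52% than to poll A’s 48%: B’s sample is four times larger, so its precision dominates the weighting.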
Now, as we move on to our error assessment, keep these key points in mind:
The spread (Yes−No) of the actual referendum results was −22.6 points.
The spread (Yes−No) shown in the polls was −5.6 points.
The margin of error of all the polls included in this analysis, which produced a sample of 31,325, was plus or minus 1.1%.
Sampling Error
First, could the miss have been due to sampling error? Taking the margin of error into consideration, the spread of all the polls could reasonably have been as high as −4.5 points or as low as −6.7 points.Footnote 6 Yet clearly, at −22.6, the actual spread was far outside these bounds. The polls conducted at the end of the week, on July 4 and 5, fall outside the margin of error as well (−6.1 at the outer bound versus −22.6). And while the polls published at the beginning of the week (June 27 to 29) came closer to reality (−20.2 at the outer bound versus −22.6), their spread still falls outside the margin of error.
In other words, no matter which way we slice it, the difference between the polls and the referendum itself was much larger than the chance variability already baked into the polls. As such, we can’t chalk the miss up to random noise; something else is at play. By definition, if not sampling error, then the polling problem in Greece must stem from non-sampling error. But which kind?
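The reasoning above amounts to a simple interval check. A sketch, treating the margin of error as a symmetric band around the polled spread (which ignores the somewhat wider margin that formally applies to a difference of proportions):

```python
def within_sampling_error(poll_spread, moe, actual_spread):
    """True if the actual result lies inside the poll spread's
    margin-of-error band."""
    return poll_spread - moe <= actual_spread <= poll_spread + moe

# All 31 polls: spread -5.6, MOE 1.1; actual referendum spread -22.6
print(within_sampling_error(-5.6, 1.1, -22.6))   # → False
# Early polls (June 27-29): spread -17.7, MOE 2.5
print(within_sampling_error(-17.7, 2.5, -22.6))  # → False
```

Both checks fail, which is exactly the point: the miss cannot be explained by sampling error alone.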
Non-Sampling Error
As we turn our attention to non-sampling error, let’s first consider the problem of measurement error.
Divergent polls can result from questionnaire construction, the wording of specific questions, or the way in which we administer the questions to the respondent – all forms of measurement error. In earlier chapters, we also covered some best practices for constructing unbiased questions. These are useful rules for assessing public opinion polls.
In particular, we focus on three aspects of the ballot question:
It should be at or near the beginning of the questionnaire in order to avoid unintended influence from preceding questions. Such influence can come from the general context of the questionnaire or, more specifically, from the sequence of the questions.
It should be as neutral as possible to minimize biased responses. Here, we want to stay away from hot button words that might elicit a strong emotional response, inadvertently influencing responses.
The response options should be randomized in order to ensure that the order of the responses does not influence the way people answer.
In the case of the Greek referendum, very few of the polling firms published their questionnaires or detailed methodological statements, which complicates the assessment of measurement error. That said, in our experience, referenda ballot questions are often difficult to understand because they deal with technical or esoteric topics. The Greek referendum was no different. See the ballot question below. As you can see, it is quite vague, difficult to understand, and makes only tangential reference to key documents with little additional detail.
The Greek Referendum Question
The Greek people are asked to decide with their vote whether to accept the outline of the agreement submitted by the European Union, the European Central Bank and the International Monetary Fund at the Eurogroup of 25/06/15 and is made up of two parts which constitute their unified proposal:
The first document is entitled: Reforms for the completion of the current program and beyond and the second is Preliminary Debt Sustainability Analysis.
Whichever citizens reject the proposal by the three institutions vote: Not Approved/NO
Whichever citizens agree with the proposal by the three institutions vote: Approved/YES
The Grexit question was met with bemused astonishment by experts worldwide. What the Greek citizens themselves actually thought is harder to parse, although polling leading up to the referendum suggests that perspectives on the likely fallout were split. The vast majority of “no” voters believed that their vote would not lead to Greece’s exit from the Eurozone. Meanwhile, more than half of the “yes” voters believed that Grexit would likely result from a “no” vote.Footnote 7
Additionally, there are obvious political motivations in how referenda questions are framed. In this case, Tsipras and Syriza controlled the wording of the ballot question. Again, they were in favor of a “no” vote.
Consider the construction of the referendum question above. Notice how the “no” option precedes the “yes.” This runs counter to how people typically think, which is from positive to negative, not the contrary. The construction looks like an obvious attempt by the “no” camp to use response order to influence voters. Ultimately, the opaque wording and biased question construction raise doubts about whether the true wishes of Greek voters were captured.
However, further evidence will show that a confusing ballot question was not the primary reason for the polling miss. Often, such an assessment is a process of elimination.
Nonresponse Bias, Coverage Bias, and Estimation Error
Could the Grexit polling miss have resulted from other forms of non-sampling error, such as coverage bias, nonresponse bias or estimation error? As we indicated in Chapters 5 and 6, nonresponse bias is not easy to assess directly. We often make the simplifying assumption that post-survey weighting will correct for any issues with nonresponse. This is a significant assumption. But for simplicity in this case, let’s remove it from the list.
In our experience, coverage bias is a common reason for polling misses in elections and other election-like events. It might be a strong culprit for the Grexit polling miss. Is this the case? The data suggests no. See in Table 7.3 how those in favor of “no” were more educated than those in favor of “yes.” People with this profile typically are less likely to have access to a telephone. So, our proxy coverage variable – education – is negatively correlated with no. If there were coverage bias, “no” should be more pronounced, not less.
Table 7.3 Those who are likely to vote by education
| Q8: Intention of vote in the referendum | All respondents (%) | <10 Years (%) | 12 Years (%) | University (%) |
|---|---|---|---|---|
| Accepting (YES) | 38 | 48 | 36 | 41 |
| Rejecting (NO) | 55 | 45 | 58 | 51 |
| Blank/invalid | 1 | 1 | 1 | 1 |
| Haven’t decided | 3 | 5 | 2 | 2 |
| Abstention | 1 | 1 | 1 | 2 |
| Don’t know | 0 | 0 | 0 | 0 |
| Don’t answer | 2 | 1 | 2 | 3 |
Additionally, remember that all thirty-one polls leading up to the referendum were conducted by phone, many of which deployed some mix of landline and cell phones. As public opinion analysts, we normally don’t have access to the raw data from polling firms, so, as you might recall from Chapter 5, the mode of survey administration (face-to-face, telephone, mail, or online) can serve as a quick proxy for coverage bias in the absence of other evidence. We know that the rate of telephone ownership in Greece was high at the time (circa 90%). So it looks like the polling miss was not a result of coverage bias.
Alternatively, could the Greek miss have resulted from estimation error, or more specifically, from incorrectly identifying who would vote in the referendum? Determining what the voter population will look like is oftentimes the single most difficult task for pollsters. Likely voter models are equally, if not more, challenging for third-party analysts to interpret because outside observers don’t have access to the raw polling data or know what elements were considered when constructing the model.
As mentioned in Chapter 5, pollsters employ a variety of likely voter models in order to separate out those who will vote from those who won’t. Remember that from an international perspective, participation in elections is generally not obligatory. As a result, not everyone who is eligible to vote actually does so. Globally, the average turnout in national elections is around 65% among the voting age population.Footnote 8
In the case of the 2015 Greek referendum, 63% of the voting population turned out, which was on par with parliamentary election turnout at the time. But the question is, did the pollsters in Greece get the right subset of voters, that is, did they correctly identify who would show up to vote?
To get at this, we will take an Ipsos tracking poll conducted from June 29 to July 4, 2015 (Graph 7.1). The poll consisted of a daily sample of 800 interviews; it aggregated the daily sample into a three-day rolling average of 2,400 interviews to minimize sampling error. To assess different turnout scenarios, we employed a modified-Gallup likely voter model based on a summated index of multiple items.

Graph 7.1 Greek referendum polling (“Yes” responses minus “No” responses)
We ranked respondents by their likely voter scores. We then made likely voter cuts at different levels of turnout (from 65% up to all adults). The “all adults” scenario represents the naïve model utilized by most pollsters at the time; the 65% scenario represents a likely voter model that approximates the actual turnout level (65% versus 63%).
As the graph indicates, the pollsters did not correctly identify who would vote. At a turnout of around 65%, close to that of the referendum, we replicate the final election results (−24 as opposed to −22.6). In contrast, the naïve model (100% turnout) closely mimics the polling results at the time, with an average spread of around −7 points. It is worth noting that there was a trend toward “yes” over the course of the week; but after taking the likely voter population into account, we see that this never put “no” at risk. These are important insights that were lost to decision-makers at the time. The polling miss looks to be a likely voter problem.
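The likely voter exercise can be sketched in code. This is not the actual Ipsos model – the scoring index and the underlying data are not available to us – but it shows the mechanics of ranking respondents by a likely-voter score and cutting at a given turnout level. The records here are hypothetical (score, vote) pairs.

```python
def spread_at_turnout(respondents, turnout):
    """Spread ("yes" minus "no", in points) among the top `turnout`
    share of respondents, ranked by likely-voter score.

    respondents: list of (score, vote) tuples, with vote in {"yes", "no"}.
    """
    ranked = sorted(respondents, key=lambda r: r[0], reverse=True)
    k = max(1, round(len(ranked) * turnout))
    likely = [vote for _, vote in ranked[:k]]
    yes_share = likely.count("yes") / len(likely)
    no_share = likely.count("no") / len(likely)
    return 100 * (yes_share - no_share)

# Tiny hypothetical sample: committed "no" voters score high,
# lukewarm "yes" voters score low
sample = [(0.9, "no"), (0.8, "no"), (0.7, "yes"), (0.2, "yes")]
print(spread_at_turnout(sample, 1.0))  # all adults: 0.0 (tied)
print(spread_at_turnout(sample, 0.5))  # 50% turnout: -100.0 ("no" sweeps)
```

The toy numbers exaggerate the effect, but the mechanism is the one at issue in Greece: when vote intention correlates with likelihood to vote, the all-adults spread and the likely-voter spread diverge.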
Conclusion
In this chapter, we applied our total error framework to the case of the 2015 Greek bailout referendum. This polling miss had profound market consequences and was a black eye for the polling industry. But there are ways to assess what went wrong, in Greece and in other instances where the polls whiffed, as demonstrated in this chapter. Often such assessment is far from cut and dried; rather, it is a process of elimination and empirical conjecture. The total survey error framework is essential for thinking through such problems.
As discussed, the evidence strongly suggests that the problem in Greece resulted from a likely voter problem; more specifically, from incorrectly identifying who would show up on election day. So, why did pollsters not utilize likely voter models at the time? This is a complex question to answer.
Some did, but many did not. The most common approach at the time was to weight the data by the results of the last parliamentary election. This is a brute-force method for determining the profile of who ends up voting on referendum day. Yet it does not measure likelihood to vote directly, and it makes a strong assumption that the past will predict the future. Some electoral events follow a completely different logic than past ones. We only have to look at the 2016 US election, as seen in Table 7.4, to see the risks of assuming that the voting patterns of the past will play out in the future.
Table 7.4 Some examples of election misses
| Election | Actual margin | Projected margin based on polling | Picked correct winner | Issues at play |
|---|---|---|---|---|
| 2016 US presidential election | 2.1 | 3.2 | No | Rural white voters were missing in the polls; the polls themselves favored Clinton over Trump; swing states mattered most, and state polls overstated Clinton’s support as well |
| 2019 Argentinian PASO election | 14 | 4 | Yes | The polls overestimated Macri relative to Fernandez. Online and telephone polls failed to adequately cover lower SES voters; these were untested methodologies. |
| 2016 Colombian referendum | −0.4 | 30 | No | Coverage bias and silent refusals or “no’s” |
| 2015 Greek referendum, Grexit | −22.6 | −3.3 | Yes | Polls in the last two days were particularly off. Some analysts pointed to herding or coverage bias; others pointed to pollsters not using likely voter models. |
| 2016 Brexit | −3.8 | 2 | No | The polls missed soft Brexit voters, those who were struggling with economic issues. Differential nonresponse also might have been a culprit. |
| 2020 US presidential election | 1.7 | 4.3 | Yes | Post-election analysis suggests that there was an issue of nonresponse bias among Trump voters, particularly those who do not typically vote |
Remember, the Greek case was a referendum and not a parliamentary election, which means that voting patterns would not necessarily map onto those traditionally seen in the latter. Ultimately, the methodological approach was a serious blind spot for pollsters, as it reinforced preexisting beliefs in favor of “yes” – as mentioned, many elites and more educated Greeks were pro-EU and pro-“yes.” Such cognitive biases, together with pollsters’ own experience of the chaos on the ground, only served to validate the polling data coming in. We will learn more about such problems in Chapter 8.
Some argued that the polling miss resulted from “herding” as pollsters adjusted their results in response to other polls and what seemed most intuitively “correct” given the circumstances.Footnote 9 This might have been a secondary or tertiary culprit, but would be very difficult to ascertain directly. The more likely explanation was that pollsters were using the same or similarly faulty approaches for identifying likely voters, as detailed earlier.
Finally, the Grexit question was also marred by the biased construction of the referendum wording itself. Generally, voters are given more straightforward options at the polls. While Grexit is an extreme case, precedent tells us that even when question wording is perfectly clear, the polls may still be off in aggregate. The lessons of Grexit still offer useful insights that apply to more neutrally worded ballot questions as well as contexts in which voters seemingly act in contradiction to their own interests.Footnote 10
As a final note, poll aggregation sites have become ubiquitous around the world. We use them frequently as data sources and for additional analytical insight. However, aggregator sites are not the only source. Wikipedia entries and desk research also make polling data accessible with a bit of legwork. The pollster of today does not lack for data.