1. Introduction
Does happiness generally fall in middle age and then rise as people get older, in line with the popular ‘U-shape’ idea? A recent contribution to research on this question (Blanchflower, 2021) offered as its core finding that the relationship between age and happiness is indeed U-shaped virtually ‘everywhere’, including in 45 of 46 countries in Europe.
More broadly, the U-shape idea is a matter of significant dispute. In some contributions, the relationship appears flat (Easterlin, 2006; Kassenboehmer and Haisken-DeNew, 2012). Some researchers find that happiness does not fall in early adulthood but rather rises (Galambos et al., 2015). Others find that happiness falls (instead of rising) in older age, especially as people become very old (e.g. Frijters and Beatton, 2012; Gerstorf et al., 2008). Laaksonen (2018) finds that the pattern is typically more complex than a U-shape. The idea that there is a uniform pattern at country level is evaluated (and dismissed) by Bartram (2021a), Bittmann (2021) and Galambos et al. (2020). In the context of these competing conclusions, a paper that finds a U-shape everywhere is intriguing, to say the least.
In this article, I reconsider that finding by analysing data on European countries using a set of methodological decisions that depart from those of Blanchflower (2021). Blanchflower’s ‘U-shapes everywhere’ finding comes from models that restrict the analysis to people under the age of 70, impose a quadratic function for the age effects, and include control variables for individual characteristics/circumstances (sex, marital status, education and labour force status). I focus first on the decisions to include the specified controls and to restrict the analysis to people younger than 70; I show that these decisions inflate the coefficients for the quadratic age function, pushing them away from zero and thus increasing the impression of a U-shape. In line with Bartram (2021a), Glenn (2009) and Hellevik (2017), I argue that inclusion of controls for individual circumstances induces bias in the estimate of a causal impact of age on happiness (age → happiness is a useful shorthand); there are good reasons (presented below) to prefer models without these controls. For analysis of European countries, I also argue that we should not restrict age to a maximum of 70, given that life expectancy in Europe is generally well above 70.
I then present models that portray the age → happiness relationship via age ranges (categories), rather than using the conventional quadratic functional form. Use of a specified functional form imposes an unnecessary constraint (Bittmann, 2021; Kratz and Brüderl, 2021). If the actual relationship is in line with the specified function, then that relationship will also show up in an analysis that does not impose the constraint. If, however, the actual relationship is not consistent with the specified function, then the constraint is likely to act as a distortion, an impediment to discerning the underlying relationship. Compared to an analysis that is less restrictive in functional form, a specific function is likely to be all pain and no gain, certainly if the actual relationship is not consistent with that function. Using age ranges, it is also more feasible to include a control for cohort, as a way of ensuring that variation assigned to an age variable does not in fact reflect differences rooted in people’s early-life experiences.
The main focus of this article is methodological, so I do not review previous findings in substantive/theoretical terms (i.e. reasons we might expect to find U-shapes, or not find them). To keep things relatively simple, I focus only on Europe, use a single dataset (the European Social Survey, rounds 1–8), and analyse only one dependent variable (happiness). If readers find the methodological arguments convincing, the approach used here can (and surely should) be applied to other regions, datasets and dependent variables.
2. A methodological evaluation and revision
In a very general sense, there is no single ‘right’ way to analyse data for any specific research question. All methodological decisions come with advantages and disadvantages. To foster confidence in our decisions, it helps to present readers with the advantages and disadvantages explicitly, and to demonstrate the consequences of our decisions by comparing the results with those reached through different decisions. In this section, I explore the two key methodological issues noted above (control variables and restriction of the age range), first via arguments and then via a comparative analysis of results for a single country (Germany). Having reached a detailed conclusion about the ‘best’ way to analyse data for estimating age → happiness (not a perfect way, but better than the alternatives), I then proceed in the following section to present a more comprehensive set of results for European countries.
2.1. Control variables
Quantitative researchers almost always use control variables in analyses that consider the relationship between one variable (X) and another (Y), especially if the relationship is to be interpreted as causal (X → Y). Unfortunately, a great deal is typically taken for granted about what control variables do (in regression models) and about how to select them (Bartram, 2021b). In some quantitative contributions, no particular criterion for selecting control variables is articulated. Sometimes researchers are simply guided by precedent, using the same set of controls used by one (or a few) previous researcher(s). In recent years, these topics have received more attention (e.g. Gangl, 2010; Pearl and Mackenzie, 2018). It is possible (and surely necessary) to go beyond what we might call a ‘conventional’ perspective about controls.
If a criterion of any sort is articulated, that criterion is conventionally: control for ‘other determinants’ of the dependent variable (Y). That criterion is ineffective and potentially misleading. Having identified ‘other determinants’ of Y, it is then necessary to consider the relationship between the potential controls (here labelled W) and the focal independent variable (X). It bears emphasising that this is not (yet) standard practice in the social sciences, though it is well established in epidemiology (e.g. Schisterman et al., 2009).
To discern the importance of taking this additional step, we can consider a straightforward distinction between two possibilities:

1. Is the potential control (W) an antecedent of the focal independent variable? Is the pattern W → X?

2. Or, is the potential control influenced by the focal independent variable? In other words, is it X → W?
Control variables do not do just one thing, and a great deal depends on this distinction. To see why, we can articulate the purpose of statistical control: to avoid bias in our estimates of X → Y. The relationship we see in a bivariate analysis might reflect the influence of some third variable on both X and Y. In the classic example, children’s academic abilities are correlated with their shoe size, but it is nonsense to conclude that one causes the other. Once we control for age, the correlation disappears; age is the true ‘cause’ of both. The example works because the control (age) is an antecedent of the other two variables; in particular, it is an antecedent of X (W → X), whichever variable is identified here as X (shoe size or academic abilities). On this basis, the control works as intended to redress bias.
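The shoe-size example can be sketched as a small simulation. This is a minimal illustration under invented parameters (not data from this article): age drives both shoe size and academic ability, and shoe size has no effect of its own. The helper `ols` is a hypothetical name; it simply fits ordinary least squares with NumPy.

```python
import numpy as np


def ols(y, *cols):
    """OLS coefficients (intercept first) of y regressed on the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 10_000
age = rng.uniform(5, 12, n)                    # W: common cause, an antecedent of X
shoe = 0.8 * age + rng.normal(0, 0.5, n)       # X depends on age only (W -> X)
ability = 1.5 * age + rng.normal(0, 1.0, n)    # Y depends on age only

naive = ols(ability, shoe)[1]                  # shoe-size "effect" without the control
adjusted = ols(ability, shoe, age)[1]          # same coefficient, controlling for age
print(f"without control: {naive:.2f}; controlling for age: {adjusted:.2f}")
```

With these settings the first estimate is well above zero while the second is close to zero: controlling for an antecedent of X removes the spurious association.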
However, when the potential control is influenced by the focal independent variable (X → W), the control does not work to redress bias; instead, inclusion of the control exacerbates bias. Suppose we want to estimate unemployment → happiness. One of the ‘other determinants’ of happiness is income. Should we control for income? Income is not an important antecedent of unemployment (our X here); people can get sacked from any sort of job, regardless of salary. The relationship is better captured by X → W: when someone gets sacked, their income goes down (often dramatically), and the lower income leads to lower happiness. If we control for income when estimating unemployment → happiness, we compare unemployed people to employed people while holding income constant—that is, the comparison takes place among people earning the same level of income. The estimate of unemployment → happiness now obscures part of the actual effect of unemployment on happiness, the portion that travels via unemployment’s impact on income. We likely need other controls, but it is an error to include income as a control here.
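The unemployment example can be simulated in the same way. In this sketch, all parameter values are invented: unemployment lowers income, and both unemployment and income affect happiness. By construction, the true total effect of unemployment is the direct effect plus the part transmitted through income, −0.5 + (−1.5 × 0.4) = −1.1. The `ols` helper is again a hypothetical NumPy wrapper.

```python
import numpy as np


def ols(y, *cols):
    """OLS coefficients (intercept first) of y regressed on the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(1)
n = 10_000
unemployed = rng.binomial(1, 0.1, n).astype(float)          # X
income = 3.0 - 1.5 * unemployed + rng.normal(0, 1.0, n)      # W is caused by X (X -> W)
happiness = 7.0 - 0.5 * unemployed + 0.4 * income + rng.normal(0, 1.0, n)

total = ols(happiness, unemployed)[1]             # recovers the total effect (about -1.1)
direct = ols(happiness, unemployed, income)[1]    # about -0.5: the income path is removed
print(f"no control: {total:.2f}; controlling for income: {direct:.2f}")
```

Adding income as a control does not remove bias here; it shrinks the estimate towards zero by discarding the portion of the effect that runs through income.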
So, to estimate X → Y, we need controls where W → X, and we must exclude controls where X → W (Bartram, 2021b). (Perhaps some potential controls are unrelated to X. In that case, they are irrelevant, even if related to Y; they are not needed to estimate X → Y, though including them will not harm the estimate.) A consideration of how W relates to X is essential; it is not enough to consider how W relates to Y.
So, to identify the controls needed to estimate age → happiness, we need to ask: what are the antecedents of age? The only sensible answer is: there are none. Until they die, everyone keeps getting older, at exactly the same rate, no matter what their other characteristics are. None of the ‘other determinants’ of happiness affect how old someone is, or the rate at which they get older (in the usual numerical sense). Some characteristics or situations might have an impact on mortality/lifespan—but for people who are still alive (and are thus available to participate in surveys) those variables have no impact at all on their age. There are no controls for which W → X.
What is worse, we are decidedly in the realm of X → W. Ageing has an impact on a wide range of other aspects of people’s situations. An especially relevant one here is marital status. As people get older, they are more likely to experience the death of a spouse/partner—an event with negative consequences for their happiness. If we control for marital status when estimating age → happiness, we will obscure a portion of age’s impact on happiness, the portion that travels via age’s impact on marital status. The situation is analogous to the example above (unemployment → income → happiness).
For estimation of age → happiness, we are in a situation that is open to misunderstandings. No controls are needed to estimate that effect. (We still need a discussion of cohort and period; see below.) Blanchflower (2021) and Blanchflower and Oswald (2009), responding to Glenn’s (2009) critique of Blanchflower and Oswald (2008) on this point, say that a model with no controls is (merely?) a ‘descriptive’ analysis. They argue for the use of controls to achieve a ‘ceteris paribus analytical’ finding. This rhetoric leads us astray in this context. What we see here is best understood as a special case. For most causal estimations, it does make sense to include some control variables in one’s model (the ones where W → X). But when there are no antecedents of X, no controls are needed for an estimate of X → Y (that is what makes this case ‘special’). A specification of this sort does not make a model ‘descriptive’; it makes it correct, especially relative to a model that includes controls where X → W. To make sense of the idea of ‘ceteris paribus’, we need to ask: which other variables are being held constant, and why? Where there are no antecedents of X, including inappropriate control variables does not yield a ‘ceteris paribus analytical’ model that is superior to a model containing no control variables.
From a different angle, an analysis that uses controls is not incorrect; it simply offers different information. If we include a control where X → W, we get a result for X → Y that can be interpreted as a ‘direct effect’. Some might say that an analysis of age → happiness that includes controls gives us a result reflecting a ‘pure’ impact of age. But we should be clear about what a result of that sort has been ‘purified’ of. As the discussion of marital status above shows, a direct effect is not net of the impact of ‘other’ variables; it is net of part of the effect of age itself. Results from an analysis that yields direct effects should be articulated carefully in those terms: a direct effect is not the same as a ‘total’ effect, so it is not the effect of age.
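For linear OLS models, the link between the two kinds of estimate can be stated exactly: the age coefficient without the control (the total effect) equals the age coefficient with the control (the direct effect) plus the product of (a) the age coefficient from a regression of the control on age and (b) the control's coefficient in the model that includes it. The sketch below checks that identity on simulated data; the widowhood probabilities and effect sizes are invented for illustration, and `ols` is a hypothetical helper.

```python
import numpy as np


def ols(y, *cols):
    """OLS coefficients (intercept first) of y regressed on the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(2)
n = 5_000
age = rng.uniform(18, 90, n)
# Hypothetical: the probability of being widowed rises linearly after age 50.
p_widowed = np.clip((age - 50) / 60, 0, 1)
widowed = rng.binomial(1, p_widowed).astype(float)
happiness = 7.5 - 0.005 * age - 0.8 * widowed + rng.normal(0, 1.5, n)

total = ols(happiness, age)[1]                   # effect of age, no control
direct, b = ols(happiness, age, widowed)[1:3]    # controlling for marital status
a = ols(widowed, age)[1]                         # age -> widowhood
print(f"total {total:.4f} = direct {direct:.4f} + indirect {a * b:.4f}")
```

The identity holds in every sample (it is an algebraic property of least squares), which makes the point concrete: the ‘controlled’ estimate differs from the total effect by exactly the part of age's effect that runs through widowhood.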
I demonstrate below the consequences of using inappropriate controls—but I turn first to a discussion of whether it makes sense to restrict the analysis to respondents younger than 70.
2.2. An age restriction?
Blanchflower (2021) offers a global analysis, covering 145 countries. Life expectancy is of course lower in some countries than is typical in very wealthy countries. To ensure consistency of results across a broad range of countries (and in consideration of the small sample sizes available for older people in some countries), Blanchflower restricts his analysis to respondents younger than 70.
This methodological decision is by no means ‘incorrect’. It is, however, consequential. We can draw on existing research to predict the likely consequence. Rather than a U-shape, some analysts (e.g. Beja, 2018; Brockman, 2010; Frijters and Beatton, 2012; Gerstorf et al., 2008) find an ‘S-shape’ pattern: happiness is higher among younger people, declines towards middle age, rises as people get older, but then declines again as people get very old and start to experience significant challenges (declining health, widowhood, etc.; see e.g. Hudomiet et al., 2020). An analysis that includes not just an age term and an age-squared term but also an age-cubed term reveals that pattern in some instances.
If (or where) that pattern prevails, then restricting the upper age bound to 70 is likely to increase the impression of a U-shape, relative to an analysis that does not impose that restriction. The argument for the restriction is sensible in the context of an analysis that covers a very broad range of countries. But we should be mindful of the impact the decision will have for countries where people live for substantial periods past the age of 70. The decision is not ‘wrong’ in general, but if we want to know whether (and to what extent) age → happiness is U-shaped in wealthy countries, we likely have reason to prefer an analysis that does not impose an upper age restriction.
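A noise-free sketch shows the mechanism. Assume, purely for illustration, an S-shaped curve that falls until 50, rises until 75 and falls again afterwards, and fit a quadratic to it by least squares once over ages 18–69 and once over ages 18–90. The curve `s_curve` and its parameters are hypothetical.

```python
import numpy as np


def s_curve(age, k=2e-5):
    """Hypothetical S-shape: its slope is -k*(age-50)*(age-75), so happiness
    falls until 50, rises until 75 and falls again after 75."""
    return 8.5 - k * (age**3 / 3 - 62.5 * age**2 + 3750 * age)


ages_restricted = np.arange(18, 70).astype(float)   # 18..69, i.e. age < 70
ages_full = np.arange(18, 91).astype(float)         # 18..90

# np.polyfit returns coefficients highest power first: [age^2, age, const].
c2_restricted = np.polyfit(ages_restricted, s_curve(ages_restricted), 2)[0]
c2_full = np.polyfit(ages_full, s_curve(ages_full), 2)[0]
print(f"age-squared coefficient: age < 70 -> {c2_restricted:.2e}; all ages -> {c2_full:.2e}")
```

Both fits suggest a U, but the age-squared coefficient from the restricted range is more than twice as large: cutting off the late decline leaves the quadratic with only the fall-and-rise portion to fit.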
2.3. Demonstration
I now show the consequences of these two methodological decisions in Blanchflower (2021), via an analysis that uses the same (quadratic) functional form. I focus for now on one country, Germany, selected simply as a typical example to demonstrate patterns that are also readily apparent in other countries. I draw on data from the European Social Survey (rounds 1 through 8, corresponding to 2002–2016; see Jowell, 2007). The survey mode is face-to-face, using random probability selection methods at each stage of the multi-stage design; the average response rate for the participating countries in the most recent round is 55.4 per cent (rates are taken directly from the survey project website). Happiness is drawn from a question asking ‘Taking all things together, how happy would you say you are?’, with 11 available response options (0 through 10). Age is given in years; including an age-squared term as well gives us the usual functional form. All models include a period variable (‘round’ from the ESS). For the model that includes control variables, we have sex (male vs. female), education (five categories), marital status (six categories including ‘other’) and labour force status (‘main activity’, eight categories, with community/military service condensed into ‘other’ on grounds of very small numbers). All models include the design weights offered with the dataset.
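The specification just described (age and age-squared, dummy variables for survey round, design weights) can be sketched with NumPy alone. This is a minimal weighted-least-squares sketch on simulated data, not the code used for this article; the function names, variable names and simulated coefficients are hypothetical.

```python
import numpy as np


def design_quadratic(age, rnd):
    """Columns: intercept, age, age^2, and one dummy per survey round except the first."""
    rounds = np.unique(rnd)
    dummies = [(rnd == r).astype(float) for r in rounds[1:]]
    return np.column_stack([np.ones(len(age)), age, age**2] + dummies)


def wls(y, X, w):
    """Weighted least squares: rescale each row by sqrt(weight), then solve OLS."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# Simulated stand-in for one country's pooled rounds (all values hypothetical).
rng = np.random.default_rng(3)
n = 20_000
age = rng.integers(15, 91, n).astype(float)
rnd = rng.integers(1, 9, n)                  # survey rounds 1..8
weight = rng.uniform(0.5, 1.5, n)            # stand-in for design weights
happy = (8.0 - 0.04 * age + 0.0004 * age**2 + 0.05 * (rnd == 8)
         + rng.normal(0, 1.5, n))

beta = wls(happy, design_quadratic(age, rnd), weight)

# The age < 70 restriction is just a row filter applied before fitting.
keep = age < 70
beta_restricted = wls(happy[keep], design_quadratic(age[keep], rnd[keep]), weight[keep])
print(f"all ages: age {beta[1]:.4f}, age^2 {beta[2]:.6f}")
print(f"age < 70: age {beta_restricted[1]:.4f}, age^2 {beta_restricted[2]:.6f}")
```

Controls, where used, would simply be further dummy columns appended to the design matrix; leaving them out is the specification argued for above.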
Table 1 starts by presenting Blanchflower’s own estimate for Germany (Model 1), drawn from a model that includes the indicated controls. I present my own version of that model (Model 2) as a replication; the coefficients are very close to Blanchflower’s. The third model (Model 3) removes the indicated controls (keeping the period variable). The first three columns use the restriction of age < 70; the fourth column (Model 4) removes that restriction.
Note: All models contain a control for period (survey year).
Comparing Model 2 to Model 3, we see that removal of the controls cuts the age and age-squared coefficients in half (a pattern very much in line with results in Frijters and Beatton, 2012). On the basis that the pattern describing the controls is X → W, we can say that the controls induce significant bias in the estimation of age → happiness (i.e. in the results from Model 2).¹
Comparing Model 3 to Model 4, we see that removal of the age < 70 restriction has a further substantial impact on the age and age-squared coefficients: they are cut in half again. The word ‘bias’ is perhaps less suitable as a way of characterising the differences between the two sets of results. But given that many people in Germany live well beyond 70, it seems more sensible to model the age → happiness relationship via an analysis that considers the full age span. In that analysis, the U-shape is shallower (with coefficients closer to zero). Across both adjustments, the coefficient for age has been reduced by 80 per cent, and the coefficient for age-squared by 79 per cent.
To appreciate further the consequences for the U-shape, figure 1 plots the curves from the different model results. The dashed line comes from the model containing control variables and excluding people aged 70+ (corresponding to Blanchflower’s analysis); it shows the ‘deepest’ U-shape. The dotted line, from the model excluding controls, is shallower. The solid line, with the age restriction removed, is shallower still.
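For a quadratic curve b1·age + b2·age², the minimum lies at age −b1 / (2·b2), and the drop from a starting age down to that minimum scales linearly with the two coefficients. A small sketch makes the geometry explicit: if both coefficients are halved (roughly what happens between Models 2 and 3), the turning point stays where it is but the U becomes half as deep. The coefficient values and function names below are hypothetical, not those of table 1.

```python
def turning_point(b1, b2):
    """Age at which b1*age + b2*age**2 reaches its minimum (requires b2 > 0)."""
    return -b1 / (2 * b2)


def depth(b1, b2, start=18):
    """Drop in predicted happiness from age `start` down to the turning point."""
    def curve(a):
        return b1 * a + b2 * a**2
    return curve(start) - curve(turning_point(b1, b2))


b1, b2 = -0.05, 0.0005          # hypothetical coefficients
print(f"{turning_point(b1, b2):.1f} {depth(b1, b2):.3f}")            # 50.0 0.512
print(f"{turning_point(b1 / 2, b2 / 2):.1f} {depth(b1 / 2, b2 / 2):.3f}")  # 50.0 0.256
```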
To estimate age → happiness for Germans, in my view the model without control variables and without the age restriction is preferable to the models with controls and/or with an age restriction, for the reasons given above. It is of course not a ‘perfect’ model; in fact I will shortly argue that it has certain disadvantages and we should do something different. But the ‘something different’ builds on the perspective that says Model 4 is better than Models 1, 2 and 3.
Table 2 presents models equivalent to Model 4 above for almost all the European countries for which ESS data are available. (For Croatia and Luxembourg, the available data are limited, coming from only two adjacent rounds; this time range does not give good leverage for controlling for period and cohort effects.) In Blanchflower’s analysis of ESS data, virtually all of the age and age-squared coefficients come with t-statistics larger than 1.5, the threshold he identifies for statistical significance (the only exception was the age coefficient for Poland). In table 2, countries for which t-statistics are greater than 1.5 (for both variables) are shaded, to indicate instances where a U-shape can still be identified via this threshold (in models that exclude controls and include people older than 70). For 23 of the 30 countries, a conclusion reached via statistical significance still identifies a U-shape. For the other 7, however, the finding of a U-shape is no longer supported. This is our first indication that the U-shape is not found ‘everywhere’.
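The shading rule can be written as a one-line check. This sketch assumes, as my reading of the text, that a U-shape also requires the expected signs (a negative age coefficient and a positive age-squared coefficient), in addition to both |t| values exceeding 1.5. The function name and example numbers are hypothetical.

```python
def quadratic_u_shape(b_age, se_age, b_sq, se_sq, threshold=1.5):
    """True if the quadratic implies a U (age < 0, age-squared > 0) and both
    |t| statistics exceed the threshold."""
    t_age, t_sq = b_age / se_age, b_sq / se_sq
    return b_age < 0 < b_sq and abs(t_age) > threshold and abs(t_sq) > threshold


print(quadratic_u_shape(-0.030, 0.010, 0.00030, 0.00010))   # True  (t = -3.0, 3.0)
print(quadratic_u_shape(-0.030, 0.025, 0.00030, 0.00025))   # False (t = -1.2, 1.2)
print(quadratic_u_shape(-0.030, 0.010, -0.00010, 0.00005))  # False (age-squared negative)
```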
Note: The models contain a control for period (survey year). Shaded countries are those where a U-shape is evident, via T > 1.5.
Where U-shapes are evident, the curve is in each instance shallower, indicating a smaller reduction of happiness in middle age and a smaller rise afterwards. The final two columns show the per cent reduction in the size of each coefficient, relative to models (not shown) that include controls and impose the restriction of age < 70. On average, the reduction for age is 66.4 per cent, and for age-squared 77.2 per cent. In a few instances (Austria, Israel and Italy), the per cent reduction in the age-squared coefficient is more than 100 per cent, which indicates that the sign of the coefficient has changed. For those three countries, the coefficient for age-squared is now negative, indicating that happiness in those countries does not rise after middle age but instead continues to decline (though only for Italy is the age-squared coefficient statistically significant). For Denmark, in contrast, the sign of the age coefficient changes from negative to positive; in that country happiness appears to increase across the entire life course (another form of departure from the U-shape), though here as well not ‘significantly’.
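The per cent reduction used in the final two columns is simple arithmetic, and a short sketch shows why values above 100 signal a sign change. The same formula works for negative coefficients (such as age), because the signs in numerator and denominator cancel. The function name and numbers are hypothetical.

```python
def pct_reduction(before, after):
    """Per cent reduction in a coefficient; values above 100 mean the sign flipped."""
    return 100 * (before - after) / before


print(pct_reduction(0.0010, 0.0005))    # about 50: coefficient halved
print(pct_reduction(1.0, 0.25))         # 75: halved twice
print(pct_reduction(0.0010, -0.0002))   # about 120: sign changed
print(pct_reduction(-0.04, 0.01))       # about 125: a negative age coefficient turned positive
```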
2.4. Beyond the quadratic functional form
An estimate with a defined functional form (e.g. linear or quadratic) can be useful, if the underlying relationship is effectively captured by the function. A function is then a useful simplification, a way of conveying information in a single number (or perhaps two numbers, as with a quadratic function). With a small number of quantities to evaluate, it is also more straightforward to draw conclusions via hypothesis tests (e.g. p < 0.05).
Whether such an estimate is consistent with the underlying relationship should be checked by comparison with an estimate that does not impose the function (compare Bittmann, 2021). (Simonsohn, 2018, offers specific cautions against the use of a quadratic function to detect U-shapes.) A model specifying a functional form might fit the data poorly but still yield results where p < 0.05, and a large sample exacerbates that risk. This point merits emphasis: with a large sample it is easier to get results where p < 0.05, even when the specified function does not accurately portray the actual pattern in the data.
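The risk described here can be illustrated with simulated data in which happiness declines until 45 and is flat thereafter, so there is no rise at all (the pattern and numbers are invented). With 20,000 observations, a quadratic fit still yields a clearly ‘significant’ positive age-squared term, and the fitted curve turns upward after its minimum even though the data-generating process never rises.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
age = rng.uniform(18, 90, n)
# Hypothetical truth: happiness declines until 45 and is flat afterwards (no rise).
truth = 8.0 - 0.03 * (np.minimum(age, 45) - 18)
happy = truth + rng.normal(0, 1.5, n)

# Quadratic fit on centred age (centring leaves the age-squared coefficient unchanged).
c = age - age.mean()
X = np.column_stack([np.ones(n), c, c**2])
beta, *_ = np.linalg.lstsq(X, happy, rcond=None)
resid = happy - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_sq = beta[2] / se[2]

vertex = age.mean() - beta[1] / (2 * beta[2])     # age at the fitted minimum
implied_rise = beta[2] * (90 - vertex) ** 2       # fitted rise from the minimum to age 90
print(f"t(age^2) = {t_sq:.1f}; fitted minimum at {vertex:.0f}; implied rise {implied_rise:.2f}")
```

The t-statistic answers only whether the curvature is non-zero, not whether a U is the right description; comparing against a less constrained fit (such as age ranges) is what reveals the mismatch.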
Blanchflower (2021) recognises the point, offering (in his figure 7) an evaluation of age → happiness (with ESS data) that uses a series of age dummy variables instead of the quadratic function. The figure (which includes respondents older than 70) presents a distinct U-shape, with average happiness declining from 8.55 at age 18 to 7.60 at age 53 and then rising above 8 as respondents move into their 80s. It appears to offer strong confirmation of a U-shape for Europe.
There are two important observations to make about that figure. First, it comes from a model that includes control variables, so the results are biased (i.e. away from zero, thus with a deeper U-shape). Second, it covers all 32 participating countries together. It thus tells us that age → happiness is U-shaped for Europe as a whole (though in part that conclusion comes via the use of controls where X → W), but it does not tell us that age → happiness is U-shaped in each of those countries taken separately.
I therefore proceed to a country-level evaluation of age → happiness that dispenses with a quadratic function and uses age ranges. An analysis using age ranges is useful for an additional reason, beyond evaluation of the findings from quadratic models. Any model of an ‘ageing’ effect has to consider the challenge that arises from the confounding of age (A) with period (P) and cohort (C). An ‘APC problem’ is rooted in the fact that, when each is specified in year units, any one of those terms is a perfect linear combination of the other two (A + C = P, etc.; see de Ree and Alessie, 2011, for a cogent discussion in the context of age and subjective well-being).
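The APC identity can be seen directly in the rank of a design matrix. In this sketch (simulated ages and survey years; cohort defined as survey year minus age, which ignores birthdays), age, period and cohort in single years give a matrix with only three independent columns out of four, so the three effects cannot be separated. Grouping cohort into 5-year ranges breaks the exact linear dependence; this makes a model estimable, though on its own it does not guarantee that the resulting split between age, period and cohort is correct.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000
period = rng.integers(2002, 2017, n).astype(float)   # survey year
age = rng.integers(15, 91, n).astype(float)          # age in years
cohort = period - age                                # birth year, so A + C = P

X_years = np.column_stack([np.ones(n), age, period, cohort])
print(np.linalg.matrix_rank(X_years))    # 3: one of the four columns is redundant

cohort5 = np.floor(cohort / 5) * 5                   # 5-year birth cohorts
X_binned = np.column_stack([np.ones(n), age, period, cohort5])
print(np.linalg.matrix_rank(X_binned))   # 4: grouping breaks the exact dependence
```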
To put the point in more substantive terms: when we have a coefficient for age from a model that does not include terms for period and cohort, we would have to consider alternative interpretations. Does the age coefficient partly capture changes that are taking place over a certain period in time, for all, perhaps because of specific events with broad impacts (e.g. a pandemic or financial crisis)? Does the age coefficient partly reflect the fact that older (vs. younger) people were born during a certain era and thus had different formative experiences? To identify a distinct age effect, we need to disentangle age from period and cohort. These are the needed control variables.
Attempts to address the APC problem have produced a wide range of disparate approaches, each attempting to ‘break’ the linearity of the APC combination while controlling effectively for cohort and period. For a while, a ‘hierarchical age-period-cohort’ (HAPC) approach using a cross-classified random-effects model (CCREM) seemed promising (Yang, 2008; Yang and Land, 2008). In this approach, terms for period (in years) and cohort (in birth-year ranges) are entered as random effects, which helps enable model identification.
In recent years, however, there has been growing awareness of some potentially important technical drawbacks (O’Brien, 2017). The model can ‘shrink’ the variation assigned to one of the random effects (typically cohort), inflating the apparent age effect (Bell and Jones, 2018; Luo and Hodges, 2020). This technique is not a ‘silver bullet’. There likely is no silver bullet; instead, we are again enjoined to construct models using alternative approaches and then evaluate them substantively, perhaps using ‘side information’ (Ekstam, 2021).
Here we see another advantage of using a model that includes age ranges instead of a quadratic function. In this approach, we can also enter cohort and period as fixed effects, forgoing random effects (Ekstam, 2021) and thus avoiding the ‘shrinkage’ problem. While no approach is ‘ideal’ on technical grounds, a model along these lines is arguably the most conservative in a technical sense, avoiding highly complex estimation algorithms that are still not well understood.
In table 3, then, I present by-country estimates that include age in ranges (15–34, 35–59, 60–74 and 75+) and factor variables for period and for cohort (aggregated into 5-year ranges). (Ekstam, 2021, shows that it is possible to use single birth-year variables for cohort. Taking that approach instead makes no difference to the results presented in table 3.) The age ranges are intended to correspond to the substantive idea behind U-shapes: happiness should reach a low point in midlife (35–59) and rise as people get older (but then perhaps decline again as people become very old, which is the reason to distinguish between 60–74 and 75+). If age → happiness is genuinely U-shaped, then that relationship should appear in this approach as well. I extend this approach further below using a set of narrower ranges, with results presented in visual form.
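The model behind table 3 can be sketched as ordinary least squares with three sets of fixed effects: age ranges (35–59 as reference), survey year, and 5-year birth cohorts. This is a minimal NumPy sketch under stated assumptions, not the article's code: the data are simulated with invented age-range effects and no period or cohort effects, survey years are taken as biennial, and the function names are hypothetical. The rank check confirms that, with these groupings, the design matrix is identified.

```python
import numpy as np

AGE_EDGES = [35, 60, 75]                               # 15-34 | 35-59 | 60-74 | 75+
AGE_LABELS = np.array(["15-34", "35-59", "60-74", "75+"])


def age_range(age):
    """Map ages (15 and over) to the four ranges used in table 3."""
    return AGE_LABELS[np.digitize(age, AGE_EDGES)]


def dummies(codes, reference):
    """0/1 columns for every category except `reference` (the omitted baseline)."""
    cats = [c for c in np.unique(codes) if c != reference]
    return np.column_stack([(codes == c).astype(float) for c in cats]), cats


def fit_age_ranges(happy, age, period, birth_year):
    """OLS with age-range, period and 5-year-cohort fixed effects.

    Returns age-range coefficients (relative to 35-59) and whether the
    design matrix has full column rank."""
    cohort5 = (birth_year // 5) * 5
    A, a_cats = dummies(age_range(age), "35-59")
    P, _ = dummies(period, period.min())
    C, _ = dummies(cohort5, cohort5.min())
    X = np.column_stack([np.ones(len(happy)), A, P, C])
    beta, *_ = np.linalg.lstsq(X, happy, rcond=None)
    identified = np.linalg.matrix_rank(X) == X.shape[1]
    return dict(zip(a_cats, beta[1:1 + len(a_cats)])), identified


# Simulated data: hypothetical age-range effects, no period or cohort effects.
rng = np.random.default_rng(6)
n = 60_000
period = rng.choice(np.arange(2002, 2017, 2), n)       # biennial survey years
age = rng.integers(15, 91, n)
birth_year = period - age
true_effects = {"15-34": 0.3, "35-59": 0.0, "60-74": 0.2, "75+": -0.2}
happy = (7.0 + np.array([true_effects[g] for g in age_range(age)])
         + rng.normal(0, 1.0, n))

effects, identified = fit_age_ranges(happy, age, period, birth_year)
print(identified, {k: round(float(v), 2) for k, v in effects.items()})
```

Identification here comes from the mismatch between the broad age ranges and the 5-year cohort groups, which is why the estimates are noisier than they would be without the cohort terms.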
Note: The reference category is people aged 35–59. Models include variables for period (survey year) and cohort (5-year ranges).
In this table, the age range 35–59 is the reference category. To discern support for a U-shape, then, we would need to see positive coefficients for the 15–34 range and the 60–74 range. In instances where that pattern is evident, we can conclude that happiness declines towards middle age and subsequently rises (in line with the idea behind the U-shape).
In table 3, country names are again shaded where the patterns are consistent with the idea of U-shapes, here using a threshold of T = 1 (a more generous standard than the 1.5 threshold used by Blanchflower). The conventional pattern is evident for only seven countries: Austria, Switzerland, Luxembourg, Norway, Poland, Portugal and Russia. For the rest, we do not see support for a U-shape in conclusions reached via consideration of statistical significance (the T statistic).
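The shading rule for table 3 differs from the quadratic rule: with 35–59 as the reference, both the 15–34 and the 60–74 coefficients must be positive with T above 1. A minimal sketch of that check follows; the function name and the example coefficients are hypothetical.

```python
def range_u_shape(coefs, ses, threshold=1.0):
    """U-shape check for models with 35-59 as the reference category: the 15-34
    and 60-74 coefficients must both be positive with t above the threshold."""
    return all(coefs[k] / ses[k] > threshold for k in ("15-34", "60-74"))


ses = {"15-34": 0.08, "60-74": 0.10}
print(range_u_shape({"15-34": 0.25, "60-74": 0.12}, ses))   # True  (t = 3.1, 1.2)
print(range_u_shape({"15-34": 0.25, "60-74": -0.05}, ses))  # False (60-74 below reference)
```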
All analyses come with a mix of advantages and disadvantages. A key advantage for table 3 is that the models effectively control for any cohort effects. A key (and indeed obvious) disadvantage is that the age ranges might be too large to capture important trends. I therefore now turn to an analysis that uses a set of narrower ranges (with a width of 10 years: 15–24, 25–34, etc.). With a larger number of narrow ranges, it becomes harder to draw conclusions via statistical significance. But statistical significance is overrated anyway (Cohen, 1997; Engman, 2013). So, in figure 2, I present results in graphical form, so that we can discern visually whether U-shapes are evident, in part by considering the extent to which happiness declines and then rises.
It is impossible to see U-shapes as a universal pattern in figure 2. (The groupings in that figure are not meaningful; they are mostly alphabetical, but certain exceptions were made to avoid crowding/overlap in the lines.) Inevitably, a visual judgement is subjective to some extent; here we lack the (potentially misleading) simplicity of decision by an asterisk (p < 0.05). In subjective terms, then, U-shapes are reasonably evident for Austria (but happiness initially rises), Switzerland, Denmark, Germany, Greece, Spain, Cyprus, Great Britain, France, Iceland (with, however, a notable decline after 75), Israel, Netherlands, Norway, Poland, Russia, Slovenia, Sweden and Ukraine (really more of a W). It is particularly difficult to discern U-shapes for Bulgaria, Czech Republic, Estonia, Finland, Hungary, Ireland, Italy, Portugal, Slovakia and Turkey. For Belgium, Hungary and Lithuania, I am only willing to say: perhaps.
It is worth reiterating where the lines in these figures come from. They are from OLS models built around ranges of age, where the models contain controls for period and cohort. There are of course no other control variables, again because none are needed (as there are no antecedents of age). With controls for period and cohort, a cross-sectional comparison of people in one age range to people in another (older) range gives a reasonably good indication of what happens to people’s happiness as they get older. (In the next section, I offer a necessary caveat.) In general, a cross-sectional comparison of this sort would be vulnerable to the possibility that people with different values of X differ in other ways that actually generate the patterns we see in an unadjusted relationship. But in this context, where X is age, there are no other variables (apart from period and cohort) that could generate that relationship, because there are no antecedents of age. The patterns in the figures, then, are reasonably interpreted as the causal effects of ageing on people’s happiness (again because they are net of period and cohort effects).
To offer a numerical consideration of effect size, table 4 presents the model-adjusted levels of happiness in each age range by country (the numbers that give us the patterns in figure 2). The final three columns show the maximum and minimum for each country, followed by the difference. Where there are U-shapes, we can then gauge their magnitude. The largest differences are found for countries that most obviously do not have U-shapes (Turkey, Slovakia, Portugal and Czech Republic). If we exclude these and also the other countries where (in my subjective judgement) U-shapes do not appear (Bulgaria, Estonia, Finland, Ireland and Italy), we can then calculate an average: 0.44. This number gives us a generous indication of how ‘deep’ the U-shapes in Europe might be. It is generous because it is only the difference between the maximum and the minimum; it does not tell us that on average in Europe happiness falls by 0.44 points and then rises by 0.44 points, only that it does one or the other. In Blanchflower’s analysis of ESS data (figure 7), the depth of the U-shape curve for Europe is almost a full point on the 11-point scale, but here we see a number less than half of that.
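Why the maximum-minus-minimum figure is generous can be made precise with a short sketch. Split the age profile at its lowest point: the ‘fall’ is the highest level before the minimum minus the minimum, and the ‘rise’ is the highest level after it minus the minimum. The max-min spread always equals the larger of the two, so it says nothing about the smaller one. The levels below are hypothetical, not taken from table 4.

```python
def depth_summary(levels):
    """levels: model-adjusted happiness by age range, youngest first.
    Returns (fall to the minimum, rise after the minimum, max - min)."""
    i = min(range(len(levels)), key=levels.__getitem__)   # position of the minimum
    fall = max(levels[: i + 1]) - levels[i]
    rise = max(levels[i:]) - levels[i]
    return fall, rise, max(levels) - min(levels)


levels = [7.6, 7.4, 7.3, 7.2, 7.3, 7.5, 7.4]    # hypothetical, not from table 4
fall, rise, spread = depth_summary(levels)
print(f"fall {fall:.2f}, rise {rise:.2f}, max-min {spread:.2f}")   # fall 0.40, rise 0.30, max-min 0.40
```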
2.5. A deductive consideration of (hypothetical) longitudinal results
The analyses in this article are cross-sectional. (The same is true of the analyses in Blanchflower’s (2021) article.) Use of cross-sectional data is inevitable in many instances; not all countries have panel datasets (and in some instances the panel data that are available do not contain questions on happiness). When cross-sectional data are used, it is important to consider issues that the analysis cannot address.
In research comparing cross-sectional and longitudinal results (e.g. Frijters and Beatton, 2012; Kratz and Brüderl, 2021), the introduction of individual fixed effects (when using panel data) leads to results that indicate a (steeper) decline in happiness as people move through old age. Use of fixed effects corrects for a potential sample selectivity pattern, that is, the likelihood that panel datasets oversample happier older people (and naturally omit older people whose unhappiness might have contributed to earlier mortality). Frijters and Beatton (2012) also suggest that panel datasets might undersample happier middle-aged people; the samples would then give more weight to middle-aged people who are relatively unhappy. In both respects, an analysis that does not correct for these selectivity biases would exaggerate the impression of a U-shape, especially by indicating a post-middle-age increase in happiness that is not ‘real’.
With cross-sectional data, it is of course impossible to correct for bias rooted in these patterns. Investigations that work in a cross-sectional mode, then, must reflect on their results in these terms. The cross-sectional results presented above likely overstate any upward change in happiness as people move through old age; the real change is likely to be lower (i.e. more negative). For countries where the results above appear to support a ‘U-shape’ finding (because happiness appears to rise after middle age), a note of caution is therefore required: this pattern might disappear in a longitudinal analysis. Correction for selectivity (if possible, via use of individual fixed effects) would likely reveal a decline in happiness in older age (or, at least, a less pronounced increase).
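The direction of this selectivity bias can be illustrated with a deliberately simple, deterministic sketch. It assumes (hypothetically) three types of people who differ only in a fixed happiness offset, who all experience the same slight decline in happiness after 50, and whose annual mortality is higher the unhappier they are. The cross-sectional mean at each age is computed among survivors only; no period or cohort effects are modelled, and all numbers are invented.

```python
import numpy as np

ages = np.arange(50, 91, 10)                 # 50, 60, 70, 80, 90
offsets = np.array([-1.0, 0.0, 1.0])         # hypothetical fixed happiness differences
share_at_50 = np.full(3, 1 / 3)              # equal shares of each type at age 50
annual_death = np.array([0.06, 0.04, 0.02])  # hypothetical: unhappier types die sooner


def within_person(age):
    """Hypothetical common trajectory: everyone loses 0.005 points per year after 50."""
    return 7.0 - 0.005 * (age - 50)


def survivor_shares(age):
    """Share of each type among those still alive at `age`."""
    alive = share_at_50 * (1 - annual_death) ** (age - 50)
    return alive / alive.sum()


# Cross-sectional mean among survivors versus the average within-person trajectory.
cross_section = np.array([within_person(a) + survivor_shares(a) @ offsets for a in ages])
true_path = np.array([within_person(a) for a in ages])

for a, cs, tp in zip(ages, cross_section, true_path):
    print(f"age {a}: cross-section {cs:.3f}, within-person {tp:.3f}")
```

The cross-sectional means rise with age even though every individual’s happiness declines, because the unhappier types are progressively removed from the surviving population. A fixed-effects analysis of panel data compares each person with themselves, so the offsets drop out and, in this set-up, the within-person decline of 0.005 points per year would be recovered.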
3. Conclusion
A finding of ‘U-shapes everywhere’ is not robust to reasonable alternative methodological decisions, at least not for European countries. An analysis using different methodological decisions produces results that for some countries depart (sometimes substantially) from the idea of U-shapes.
There are good reasons to prefer those methodological decisions over the decisions that produced the results in Blanchflower (2021). To estimate the age → happiness effect, we are best served by an analysis that meets the following criteria:

1. Use of the full range of adult ages; if happiness tends to decline in very old age (when people are likely to experience significant challenges), then a restriction on the upper bound will inflate the coefficients in a quadratic age function.

2. No controls for individual circumstances W that are themselves affected by age (X → W); if controls of that sort are included, the result is likewise inflation of the coefficients in a quadratic function.

3. Inclusion of controls for cohort and period, so that coefficients for age variables do not reflect change affecting everyone over time (period) and/or the formative experiences associated with when people were born (cohort).

4. Evaluation of results using a specified functional form via comparison to less restrictive analyses (in part so that conclusions take effect size into account and are not unduly driven by hypothesis testing, sample size, and p < 0.05).
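Criterion 1 can be checked with noiseless, hypothetical data. The sketch assumes an age profile that is exactly U-shaped up to age 75 and declines linearly afterwards, then fits a quadratic with NumPy’s `polyfit`, first to ages 18 to 69 (the under-70 restriction) and then to ages 18 to 90. The profile and the cap at 90 are assumptions for illustration only.

```python
import numpy as np


def happiness(age):
    """Hypothetical profile: U-shaped up to 75, then declining in very old age."""
    age = np.asarray(age, dtype=float)
    u_part = 0.0004 * (age - 50) ** 2
    late_decline = 0.0004 * 25 ** 2 - 0.02 * (age - 75)  # continuous at age 75
    return np.where(age <= 75, u_part, late_decline)


restricted = np.arange(18, 70)  # under-70 restriction: ages 18-69
full = np.arange(18, 91)        # full adult range (hypothetically capped at 90)

# polyfit returns coefficients from the highest degree down, so [0] is the age^2 term.
quad_restricted = np.polyfit(restricted, happiness(restricted), 2)[0]
quad_full = np.polyfit(full, happiness(full), 2)[0]

print(f"age^2 coefficient, ages 18-69: {quad_restricted:.6f}")
print(f"age^2 coefficient, ages 18-90: {quad_full:.6f}")
```

On the restricted range the quadratic recovers the U exactly; adding the older ages, where happiness falls, pulls the squared-age coefficient towards zero. Restricting the upper bound therefore makes the U look deeper than the full-range data support.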
In general, a substantive question is usefully explored not just via one particular analytical approach but via several. Ideally, the results would be consistent across the different approaches (in the mode of ‘robustness checks’). Here the results are not consistent in that way. So, we have to consider the relative merits of different approaches.
The analysis above that best meets the four criteria is the one presented in figure 2 (and table 4). The quadratic coefficients in table 2 impose a restrictive functional form (giving too much weight to hypothesis tests and p < 0.05) and do not include a control for cohort. The less restrictive results presented in table 3 (where a control for cohort is used) rely on a set of crude age ranges. Insofar as those results show U-shapes in roughly one-quarter of the countries analysed, however, there is no good reason to dismiss those findings. The potential problem with crude age ranges is that they might fail to detect U-shapes in other countries.
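Why a quadratic should be checked against less restrictive age-range results can also be shown with hypothetical data. In the sketch below, happiness falls until age 40 and is flat thereafter, so there is no recovery and no U. A quadratic fitted to these data nonetheless has a positive squared term and a turning point inside the age range, with fitted values rising towards age 90, while the means within (hypothetical) ten-year ranges never rise.

```python
import numpy as np

ages = np.arange(18, 91)
# Hypothetical profile: decline until 40, flat afterwards (no U).
happy = np.where(ages < 40, 7.5 - 0.03 * (ages - 18), 7.5 - 0.03 * 22)

# Restrictive specification: a quadratic in age.
c2, c1, c0 = np.polyfit(ages, happy, 2)
turning_point = -c1 / (2 * c2)
fitted = np.polyval([c2, c1, c0], ages)

# Less restrictive specification: mean happiness within ten-year age ranges.
edges = np.arange(18, 91, 10)
band = np.digitize(ages, edges) - 1
band_means = np.array([happy[band == b].mean() for b in range(edges.size)])

print(f"age^2 coefficient {c2:.6f}, turning point at age {turning_point:.1f}")
print(f"fitted rise from trough to age 90: {fitted[-1] - fitted.min():.3f}")
print("age-range means:", np.round(band_means, 3))
```

The quadratic’s apparent recovery in later life is produced by the functional form, not by the data; comparing it with the age-range means exposes the difference.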
That possibility is realised only to a limited extent in the results presented in figure 2. Those results are of course not without their own limitations. They do not include a consideration of statistical significance, which some might regard as a limitation (though it might also constitute a strength). If it is a limitation, it is reasonably judged less important than the problems identified for the other approaches (in particular, the use of control variables where X → W, which leads to substantial bias). A fair summary of the findings presented above is that there is reasonably consistent evidence for a U-shape in 16 countries: Austria, Switzerland, Germany, Denmark, Spain, France, Britain, Israel, Iceland, Netherlands, Norway, Poland, Russia, Sweden, Slovenia and Ukraine. (Again, however, a longitudinal analysis might undermine that conclusion for at least some of these countries.) For 9 of the 30, the evidence weighs against a U-shape (including Bulgaria, Czech Republic, Ireland, Italy, Portugal, Slovakia and Turkey). For the remaining five, the various methods used here do not lead to a consistent finding.
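The bias from controlling for variables where X → W can be sketched with a small simulation. It assumes (hypothetically) a variable W, standing in for something like employment, that peaks in mid-life; a direct U-shaped effect of age on happiness; and a positive effect of W on happiness. All coefficients are invented for illustration and are not estimates from the ESS.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
age = rng.integers(18, 91, size=n).astype(float)

# Hypothetical W affected by age (X -> W): highest in mid-life, plus noise unrelated to age.
w = 1.0 - 0.001 * (age - 45) ** 2 + rng.normal(0.0, 0.5, size=n)

# Hypothetical happiness: a direct U-shaped age component plus a positive effect of W.
happy = 0.0008 * (age - 50) ** 2 + 0.5 * w


def quad_coef(*controls):
    """OLS coefficient on age^2, optionally adding control columns."""
    X = np.column_stack([np.ones(n), age, age ** 2, *controls])
    beta, *_ = np.linalg.lstsq(X, happy, rcond=None)
    return beta[2]


total_effect = quad_coef()   # no controls: the whole age -> happiness effect
with_control = quad_coef(w)  # controlling for W removes the path running through W

print(f"age^2 coefficient without controls: {total_effect:.6f}")
print(f"age^2 coefficient controlling for W: {with_control:.6f}")
```

Without controls, the squared-age coefficient is close to 0.0003, the total effect implied by the assumed parameters (0.0008 on the direct path minus 0.5 × 0.001 through W). Controlling for W returns 0.0008, that is, only the direct path, so in this hypothetical set-up the control inflates the quadratic coefficient and the U looks deeper than the total effect of ageing on happiness.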
I therefore conclude that we do not see U-shapes ‘everywhere’ in Europe. Figure 2 indicates a variety of patterns, some of them looking nothing like a U-shape. Tables 3 and 4 offer a reasonable indication that a U-shape describes the age → happiness relationship in some European countries, though by no means in all (or even most). Where U-shapes exist, they are also substantially ‘shallower’ than is portrayed in Blanchflower (2021). Whether the alternative methodological decisions used here would lead to a similar story in other world regions (and for other dependent variables) is left to future work. A similar outcome does not seem unlikely.
The debate about age → happiness U-shapes has raged for an extended period, and it will no doubt continue. Ideally, it would continue not simply via presentation of a new set of results but via consideration of why one set of results differs from another. A resolution can come (if at all) only via that sort of comparative evaluation. The key strength of this article lies not just in the findings per se but in the detailed methodological arguments for preferring these findings over others.
Acknowledgement
I am very grateful to David Ekstam for insightful comments on an earlier draft.
Conflicts of interest
The author declares none.
Data availability statement
Data from the European Social Survey are available at www.europeansocialsurvey.org. The R syntax/code for this article is available on request and at https://osf.io/3axq6/?view_only=0c55ece4430148f8a77a2522975fda79.