A major part of the 2010–15 UK government's education reforms in England was a focus on the curriculum that pupils study from ages 14–16. Most high profile was the introduction of the English Baccalaureate (EBacc) performance measure for schools. Since schools were now judged on the proportion of pupils getting a “good pass” in the subjects that made up this measure, it incentivised schools to encourage pupils to study this set of “subjects the Russell Group identifies as key for university study” (Gibb, 2011). Young people's parents also see the choices their children are making at this point in time as important, with 93 per cent of the parents of the Next Steps participants saying they see subject choices at age 14 as “very important” or “fairly important” for the educational options their offspring will have open to them subsequently.
However, there does not appear to be good quantitative evidence about the importance of studying a complete set of subjects, per se. Indeed, concern has been expressed in some quarters that a particular focus on a set of subjects such as the EBacc might ‘crowd out’ other subject combinations, such as a full set of separate sciences, that are also potentially important for individuals' future educational opportunities. Young people's subjects of study from age 14 may have important consequences for future academic and labour market outcomes, since they affect, in turn, the qualifications to which they can easily continue in post-compulsory education. Choosing the ‘wrong’ set of options at this point may have long-term consequences (Iannelli, 2013). This is a particularly important issue in an English context, where specialisation of the curriculum occurs earlier than in many other countries (Hodgson and Spours, 2008).
This paper provides new evidence on this issue. It borrows techniques from the programme evaluation literature to consider whether young people who study the full set of subjects required for EBacc-eligibility between ages 14 and 16 have different probabilities of applying to university, entering university and attending a high-status university. It also examines the same issue for sub-elements of the EBacc: studying for two or more science qualifications (either separate or combined awards), studying a foreign language, and studying History or Geography (we do not consider English or Maths since these are mandatory). By way of contrast, we also consider differences by whether individuals study for any ‘applied’ GCSEs, which, it has been argued, provide less effective preparation for future university study.
In particular, this paper contrasts purely descriptive differences in outcomes to those from flexible regression adjustment and matching approaches. These attempt to compare individuals very close to the margin of studying each full set of subjects, adjusting for observable differences in a highly flexible manner, taking advantage of rich survey data from a recent cohort of young people in England. As such, the estimates from this method demonstrate just the impact of having studied the full combination of subjects, rather than of the cumulative changes from overall differences in curriculum. To provide context for these results, we also produce estimates of the change in probability of the university outcomes associated with a continuous change in the academic selectivity of subjects studied and conditional on the same set of observable characteristics in the main analysis.
The paper proceeds as follows. Section 2 describes the background to the issue. Section 3 introduces the dataset used, highlighting the rich survey data available to do the best possible job of matching; in doing so, it explores the proportion of young people proceeding to university by whether they took the combinations of subjects we consider. Next the methods are introduced in Section 4, covering both the regression analysis and matching approach. Further details on the construction of the matched samples are reported in Section 5, including an assessment of the balance on observable characteristics. The results, contrasting university application, university attendance and attendance at a high-status university before any adjustment, using regression analysis, and using a matched sample, are reported in Section 6. We report comparative results for the differences associated with a continuous change in the academic selectivity of subjects studied in Section 6.1. Finally, Section 7 concludes.
The importance of the subjects that young people study while at school for their chances of progressing to Higher Education (HE), in general, and highly selective HE institutions, in particular, has increasingly attracted the attention of policymakers. Most notably this has manifested itself in the UK Government's introduction of the English Baccalaureate school performance measure at age 16 (Gibb, 2011). Understanding subject choice at age 14 is important to help young people make the choices that will assist them in achieving their future plans.
The policy attention stems from a concern that young people are making subject choice decisions (or being channelled towards decisions) that are reducing the probability of progressing to Higher Education. Indeed, previous work has suggested that when high achieving young people from less advantaged backgrounds are provided with more information on how best to prepare for university applications their decisions improve (Borghans et al., 2013; Hoxby and Turner, 2013). Although the studies cited did not specifically cover advice about subject choice, the same logic of improving educational decisions is applicable. Choosing the ‘wrong’ curriculum at this point may have long-term consequences in terms of occupational status acquisition (Iannelli, 2013); educational progression seems one plausible mechanism for this. In a different context, evidence from Belgium suggests that subject choice has an influence on the gender gap in the labour market (Duquet et al., 2010).
Age 14 is the first point at which young people have a direct choice about the curriculum they receive, although there may of course be some earlier indirect influence through secondary school choice. It is also a point at which all young people are still in compulsory education for two more years. As such, it seems a sensible period in which to study the decisions and subsequent actions of young people. Unlike studying post-16 subject choices, there remains something of a common core to the curriculum, allowing a focus on how choices about noncompulsory subjects seem to affect future plans.
However, previous work has highlighted that there are important and complex patterns in the subjects that individuals study during this age range (Henderson et al., 2016). Three particularly important characteristics in explaining subject choices at this age are gender (Bell, 2001; Francis, 2000; Jin et al., 2011; Sullivan et al., 2010), prior attainment (Davies et al., 2008; Jin et al., 2011) and socioeconomic background (Davies et al., 2008; Jin et al., 2011). Coming at this from a different angle, Abbott-Chapman et al. (1995) point out that, due to the correlation between the two, subject choices may be used as a surrogate for information about ‘ability’ in an Australian setting.
Davies et al. (2008) found that ‘ability’ has the strongest influence on subject choice but for some subjects social class exerts more of an effect than gender. Jin et al. (2011) find that girls are more likely to study a modern foreign language at school and less likely to study all three sciences separately; these associations remain after taking into account prior attainment. Furthermore, those with more educated parents are more likely to study two or more sciences and to stay on in full-time education after Year 11; however, these effects are not significant after controlling for prior attainment. Since all these factors seem likely to affect university entry in their own right (Anders, 2012), it is important to take them into account in this paper's analysis.
We focus explicitly on whether individuals study for combinations of subjects, rather than their attainment in these subjects. While attainment at age 16 is undoubtedly highly important for whether individuals gain access to university (Anders, 2012), this paper explores whether there are distinct effects from subject choices in and of themselves. This has been considered before, but previous work on the issue has tended to focus on the more proximal influence of subject choice post-16. For example, in their exploration of racial inequality in university entry, Noden et al. (2014) note that differences in subject of study post-16 appear to affect university entry. By contrast, Dolton and Vignoles (2002) explore the importance of studying a diverse curriculum at the same phase of education for returns in the labour market, finding little evidence of this.
However, since subjects available to individuals at age 16 depend on those that have been studied before this point, it is of interest to explore whether there are consequences of subject choices at age 14 that flow through to these same later outcomes. De Philippis (2017) provides evidence on a different aspect of subject choice at age 14, exploiting variation in the timing of a reform that increased incentives for English schools to offer “triple science” to identify the causal effect of taking this course on university attendance; she finds evidence of a small increase in university attendance, but one that is only significant at the 10 per cent level.
This paper uses Next Steps (a representative longitudinal survey of young people in England) in order to explore these questions. Next Steps follows a cohort of young people born in 1989–90 from age 14 through to age 20. The survey includes annual interviews throughout with the young people themselves, interviews with their parents (for the first four years), and linked administrative data about young people's academic attainment (from the National Pupil Database, discussed below). Using the responses from the parental questionnaires provides high quality data on young people's socioeconomic background, based on questions about family income, parental education, and occupational status.
Importantly for this work, it also includes self-reported information on subjects that young people are studying at age 14 (academic year 2004/5–2005/6). We use these to generate the subject choice classifications that we use as ‘treatment’ variables, of which we attempt to assess the intrinsic importance for university outcomes.
We consider the importance of studying the full set of subjects required to be eligible for the English Baccalaureate (EBacc). For a pupil to count towards their school's EBacc measure they must achieve a C grade or above (often referred to as a ‘good pass’) in the following GCSE subjects: English, Mathematics, History or Geography, two sciences and a Modern or Ancient Language. However, the introduction of this performance measure comes after the cohort we consider took their GCSEs. This strengthens our approach since it eliminates the possibility that individuals took these subjects specifically in order to achieve the EBacc, which may increase any selection issue; constructing an indicator of studying EBacc subjects artificially for this cohort should give us a clearer estimate of whether studying the required subjects improves university entry chances in and of itself. We construct a binary measure according to whether pupils study the full set of subjects that would make them eligible for the EBacc if they a) go on to achieve at least a grade C in all of them and b) were in a later cohort when the measure had been introduced. We find that one third of the sample studied subjects that would have made them eligible for the EBacc in later years.
We also consider whether individuals study specific elements that make up the EBacc. We assess whether individuals study two or more sciences, i.e. two of Physics, Chemistry and Biology as separate subjects or a combined ‘double’ award in sciences during this period. Previous work on the importance of subject choice during secondary school has focussed on whether young people study Science, Technology, Engineering and Maths (STEM)-related subjects (Tripney et al., 2010; Codiroli, 2015), particularly reflecting concerns about a gender gap in uptake of such subjects, although Codiroli (2015) highlights that this may not be the case among individuals from advantaged backgrounds. Particularly for science subjects, it seems plausible that universities are likely to prefer individuals who have taken these more detailed tracks. Just under a third (30 per cent) of the sample report studying for at least two separate sciences or a double award.
We also consider whether individuals study foreign languages. We only consider the main languages studied in English secondary schools: French, German, Italian and Spanish. In the data, all other languages are simply encoded as ‘Other’ and we wish to exclude those who study for a qualification in their first language, who often make up a majority of those studying such qualifications (Vidal Rodeiro, 2009). This cohort was one of the first for whom studying a language to age 16 was no longer compulsory; nevertheless, 60 per cent study one of these main languages during this period.
Finally, from the components of the EBacc, we compare individuals who study History or Geography with those who do not. Almost two thirds (64 per cent) of the sample do so and, in common with other elements of the EBacc, they are generally more advantaged and have higher prior attainment than their peers who do not do so.
As something of a comparison, we consider whether individuals studied for any applied GCSEs. These were introduced in the 2002 Education Act, as part of a policy to increase the diversity of the 14–19 curriculum. However, this policy has since been criticised, with some of these qualifications having their equivalence to GCSEs in performance tables downgraded since this period. 42 per cent of the sample studied for at least one applied GCSE; those who did so tended to be less advantaged and have lower prior attainment than those who did not.
Wave 7 of Next Steps covers young people aged 19–20. Hence the data allow us to model the entry to university through what might be thought of as the ‘traditional’ route, going from further education to university, either the same year or after a single gap year. While this includes the majority of those who attend university, later entrants would not be represented. The exclusion of this potentially interesting subpopulation should be noted; in particular, it could affect the results if subjects studied at GCSE are associated with later entry to university. We also consider entry to a Russell Group institution; the Russell Group is a group of 20 research-intensive UK institutions,Footnote 1 which are often considered to be amongst the most prestigious universities in the UK.Footnote 2
Next Steps includes a rich set of data measuring young people's socioeconomic status (SES), including household income, parental education, and parental occupational status, all of which are important in measuring SES (Hauser, 1994). Household income is measured at each of waves 1 to 4. Previous research has suggested that ‘permanent’ income (rather than transitory income) has a much larger effect on young people's educational outcomes (Jenkins and Schluter, 2002, p.2). We therefore approximate the household's equivalised ‘permanent’ income by averaging across these four measures and dividing by the square root of household size. Previous work suggests that Next Steps underestimates household income to some extent, relative to social surveys where it is a major focus (Anders, 2012).
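The income measure described above can be sketched in a few lines of Python. This is an illustration only: the function name, the use of `None` for a missing wave, and the decision to average over whichever waves are observed are assumptions, not details taken from the Next Steps documentation.

```python
from math import sqrt
from statistics import mean


def equivalised_permanent_income(wave_incomes, household_size):
    """Average the observed wave incomes, then divide by sqrt(household size).

    wave_incomes: household income at waves 1 to 4 (None marks a missing wave;
    averaging over the observed waves only is an assumption of this sketch).
    """
    observed = [x for x in wave_incomes if x is not None]
    if not observed or household_size < 1:
        return None
    return mean(observed) / sqrt(household_size)


# Hypothetical household of four, with income missing at wave 3.
print(equivalised_permanent_income([20000, 22000, None, 24000], 4))
```

Dividing by the square root of household size is a common equivalence scale; it treats a four-person household as needing twice, rather than four times, the income of a single person.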
Parental education also captures an important aspect of socioeconomic status, perhaps because it “may alter the ‘productivity’ of [parents'] time investments in children” (Ermisch and Pronzato, 2010, p.1); a number of studies have found evidence of a causal impact of parents' education on children's educational outcomes (Chevalier, 2004; Ermisch and Pronzato, 2010; Havari and Savegnago, 2014), making it an important factor to take into account. Similarly, social class is seen by sociologists as a key element of an individual's SES (Goldthorpe and McKnight, 2004), in particular as “young people (and their families) have, as their major educational goal, the acquisition of a level of education that will allow them to attain a class position at least as good as that of their family of origin” (Breen and Yaish, 2006, p.232). Parents' occupational status is recorded in Next Steps using the National Statistics SocioEconomic Classification (NSSEC), which aims to capture social class differences between occupational types (Rose and Pevalin, 2001).
The top panel of table 1 demonstrates that there are large differences in university application and attendance by the subjects that young people have studied. While 60 per cent of the sample apply to university (attend university), 78 per cent (67 per cent) of those that studied EBacc subjects, 67 per cent (55 per cent) of those that studied two or more sciences, 66 per cent (53 per cent) of those who studied History or Geography, and 69 per cent (58 per cent) of those that studied any languages did so. Only 49 per cent (36 per cent) of those that studied any applied subjects did so. Similarly, while 11 per cent of the sample go on to attend a Russell Group university, almost twice as many who studied the EBacc do so; by contrast, nearly half as many who studied any applied subjects do so. A somewhat higher proportion go on to attend a Russell Group university if they studied two or more sciences (14 per cent), studied a foreign language (16 per cent), or studied History or Geography (14 per cent).
However, there is no indication that such differences can be interpreted as in any way causal. There are many differences in the characteristics of individuals who study these subject combinations, as can be seen in the lower panel of the same table and was explored in more detail by Henderson et al. (2016). In general, we can see that individuals who study the full set of subjects required to be eligible for EBacc are, on average, from households with higher incomes than those who do not, scored higher in tests at age 14, are more likely to be in a selective school, and have parents who progressed to higher levels of education. All of these are plausibly important for explaining young people's increased probability of applying to and attending university and attending a Russell Group university (Anders, 2012). The same broad pattern is evident for studying two or more sciences, for studying History or Geography, and for studying foreign languages, while the opposite is the case among those studying for any applied GCSEs.
In order to make more meaningful comparisons of the probability of attending university depending on subject choices we wish to compare individuals who did or did not study these subjects, but who are similar in terms of other relevant characteristics.
This section discusses our analytical strategy for comparing individuals' probabilities of going to university depending upon their subject choices, while taking into account that these individuals may well differ in other important respects.
We take two main approaches, first applying binary choice regression modelling of our outcomes of interest, then using propensity score matching approaches to account more flexibly for differences in background characteristics. These methods both have advantages and disadvantages relative to one another, which we highlight in the following discussion.
4.1 Regression analysis
Regression modelling is a well-established method for estimating the association between a treatment variable and outcomes of interest, holding other background characteristics constant. Its advantage is that it provides an estimate of the treatment effect across the whole sample; however, this is also a disadvantage in that it may extrapolate beyond the part of the sample for which the data can provide reliable evidence. It also relies upon the regression equation adequately describing the relationship between independent and dependent variables.
Since our outcomes of interest are dichotomous, we estimate linear probability regression models (analyses using probit models do not give qualitatively different results). We use the following regression specification, recommended by Imbens and Rubin (2015), to estimate the difference in outcomes conditional on a vector of controls, X, listed below:
Where Y is a binary indicator of whether individuals achieve our outcome of interest and Treat is a binary indicator of our subject combinations. In this regression, β is our primary coefficient of interest, recovering the average conditional difference in outcomes associated with the subject choice variable (separately: studying subjects required to be eligible for EBacc, studying at least one foreign language, studying two or more sciences, studying History or Geography, and studying for at least one applied GCSE).
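One specification consistent with these definitions is sketched below. The exact functional form is an assumption, since only the symbols Y, Treat, X and β are given in the text: here the controls enter demeaned and interacted with the treatment indicator, so that β can be read as an average conditional difference. The intercept α, the coefficient vectors γ and δ, and the error term ε are notation introduced for illustration.

```latex
Y_i = \alpha + \beta\,\mathit{Treat}_i + \gamma'\,(X_i - \bar{X}) + \delta'\,\mathit{Treat}_i\,(X_i - \bar{X}) + \varepsilon_i
```

Setting δ = 0 gives the simpler additive model, in which the controls shift the outcome equally for both groups.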
This approach attempts to isolate the conditional association between subject choices and university access outcomes by using the extremely rich background data available in Next Steps. Specifically, we include the following covariates as dummy variables (where they are categorical) or linear and quadratic variables (where they are continuous): household income; age 14 (KS3)Footnote 3 test scores (English, maths and science); gender; ethnic group (white, mixed, Indian, Pakistani, Black Caribbean, Black African, other); month of birth (continuous linear); number of siblings (categorical: none, one, two, or three or more); number of elder siblings (categorical: none, one, two, or three or more); lone parent family; mother's qualifications (none, below GCSEs, A-Levels, HE below degree, degree); father's qualifications (none, below GCSEs, A-Levels, HE below degree, degree); region of England (North East, North West, Yorkshire & Humber, East Midlands, West Midlands, East of England, South East, South West); school type (community, community technology college, foundation school, voluntary aided, voluntary controlled); whether school is selective; whether school has sixth form; mother's occupational status; and father's occupational status.
Given the school-clustered design of Next Steps, we use clustered standard errors to account for the additional uncertainty around our estimates that this implies.
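To illustrate what clustering the standard errors involves, the following sketch fits a linear probability model with only an intercept and the treatment indicator, then computes a cluster-robust (sandwich) standard error for β. It is a deliberately stripped-down illustration: the function name is hypothetical, the controls are omitted, and no small-sample correction is applied, so it should not be read as the estimation routine used for the results.

```python
from math import sqrt


def lpm_with_clustered_se(y, treat, cluster):
    """OLS of a 0/1 outcome on an intercept and a 0/1 treatment indicator.

    Returns (beta, clustered_se). Controls are omitted to keep the sketch
    short, and no small-sample correction is applied (both assumptions).
    """
    n, n1 = len(y), sum(treat)
    det = n * n1 - n1 * n1  # determinant of X'X for columns [1, treat]
    if det == 0:
        raise ValueError("need both treated and untreated observations")
    inv = [[n1 / det, -n1 / det], [-n1 / det, n / det]]  # (X'X)^-1
    sum_y = sum(y)
    sum_ty = sum(t * yi for t, yi in zip(treat, y))
    alpha = inv[0][0] * sum_y + inv[0][1] * sum_ty
    beta = inv[1][0] * sum_y + inv[1][1] * sum_ty

    # Sum the score contributions x_i * u_i within each cluster (school).
    scores = {}
    for yi, t, g in zip(y, treat, cluster):
        resid = yi - alpha - beta * t
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += resid
        s[1] += t * resid

    # "Meat" of the sandwich: sum over clusters of the outer product of scores.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for j in range(2):
            for k in range(2):
                meat[j][k] += s[j] * s[k]

    # Variance of beta is the [1][1] element of inv * meat * inv.
    var_beta = sum(inv[1][j] * meat[j][k] * inv[k][1]
                   for j in range(2) for k in range(2))
    return beta, sqrt(var_beta)


# Hypothetical data: eight pupils in two schools, "a" and "b".
y = [1, 1, 0, 1, 0, 0, 1, 0]
treat = [1, 1, 1, 1, 0, 0, 0, 0]
school = ["a", "a", "b", "b", "a", "a", "b", "b"]
beta, se = lpm_with_clustered_se(y, treat, school)
print(beta, se)
```

With a single binary regressor, β is simply the difference in outcome rates between the two groups; clustering changes only the uncertainty attached to it, by allowing residuals of pupils in the same school to be correlated.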
We also use propensity score matching methods (Rosenbaum and Rubin, 1983) to provide estimates of the conditional change in probability of university attendance. Mendolia and Walker (2014), Alcott (2017) and McDool (2017) have previously used this approach to address research questions using Next Steps data. It has the advantage of controlling for background characteristics in a more flexible manner. It also more explicitly restricts attention to the sample within which the data can provide reliable estimates of causal impacts (imposing ‘common support’), rather than extrapolating across the sample.Footnote 4
However, to produce its causal estimates, it is important to stress that it still relies on the assumption of all differences between individuals in the ‘treated’ and ‘untreated’ groups being captured by observed characteristics included in the propensity score model. In this case, ‘treated’ corresponds to individuals who study the combination of subjects considered (separately: studying subjects required to be eligible for EBacc, studying at least one foreign language, studying two or more sciences, studying History or Geography, and studying for at least one applied GCSE). Next Steps' rich set of such characteristics helps to make this a plausible assumption but we cannot, of course, rule out the continued presence of unobserved factors that determine the subject choices that individuals make (Shadish, 2012).
In this section, we lay out the matching approaches that we consider and discuss how we will assess whether they generate a matched sample that is well balanced on our observable characteristics and has good common support. The subsequent section reports on the process of constructing matched samples and assessing how well balanced these are in terms of background variables.
Matching begins by specifying a model of whether individuals study each of these sets of subjects starting at age 14. These are discrete choice models, specifically in this case we use probit regression. This model includes the same set of background characteristics as those added to the model discussed in Section 4.1. However, we also experimented with additional complexity in the model, such as the inclusion of interaction terms between characteristics, where this helped to increase the balance.
This model is used to generate an estimated propensity score for each individual i.e. the estimated probability that they study the relevant combination of subjects (are ‘treated’ in the policy evaluation terminology). We consider the distribution of these estimated propensity scores among treated and untreated individuals in order to assess the extent to which they overlap and which matching approaches are likely to construct a balanced sample.
These estimated propensity scores are then used to produce a matched sample. We consider three methods of doing this:
Nearest neighbour matching without replacement, with caliper: each treated individual is matched to one untreated individual with the closest propensity score, subject to the constraint (caliper) that the scores are no more than 0.05 different; once an individual has been used as a match they cannot be used again.
Nearest neighbour matching with replacement, with caliper: each treated individual is matched to one untreated individual with the closest propensity score, subject to the constraint (caliper) that the scores are no more than 0.05 different; an individual can be used as a match multiple times.
Kernel matching: each treated individual is matched to all untreated individuals with a weighting scheme that gives closer matches larger weight.
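The three matching schemes listed above can be sketched as follows. The code is illustrative rather than a description of the software used: the function names are hypothetical, treated individuals are matched greedily in the order supplied, and the Gaussian kernel and its bandwidth are assumptions, since the text does not specify them.

```python
from math import exp


def nearest_neighbour_match(treated, untreated, caliper=0.05, replacement=False):
    """Match each treated unit to the untreated unit with the closest score.

    treated, untreated: dicts mapping a unit id to an estimated propensity score.
    Returns {treated_id: untreated_id}; treated units with no untreated unit
    within the caliper are left unmatched.
    """
    available = dict(untreated)
    matches = {}
    for t_id, t_score in treated.items():
        if not available:
            break
        c_id = min(available, key=lambda k: abs(available[k] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            matches[t_id] = c_id
            if not replacement:
                del available[c_id]  # a used match cannot be used again
    return matches


def kernel_weights(t_score, untreated, bandwidth=0.06):
    """Weights over all untreated units, larger for closer propensity scores."""
    raw = {k: exp(-0.5 * ((s - t_score) / bandwidth) ** 2)
           for k, s in untreated.items()}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}


# Hypothetical propensity scores.
treated = {"t1": 0.30, "t2": 0.31, "t3": 0.90}
untreated = {"c1": 0.32, "c2": 0.45, "c3": 0.60}
print(nearest_neighbour_match(treated, untreated))
print(nearest_neighbour_match(treated, untreated, replacement=True))
print(kernel_weights(0.30, untreated))
```

In this toy example, matching without replacement leaves "t2" unmatched because its only close comparator has already been used, whereas matching with replacement reuses "c1". This illustrates the trade-off between closer matches and a smaller effective comparison group.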
We assess the matched samples produced by these methods by considering the standardised differences in the background characteristics included in the matching model. Standardised differences are constructed by dividing the absolute difference in the mean of a characteristic between the treatment and control groups by the overall standard deviation of that characteristic, meaning that they are all on a common scale. As well as considering the average standardised difference across all characteristics of the matched sample, we also consider each characteristic to ensure that all differences are acceptably small (Imbens and Rubin, 2015).
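The balance measure just described can be computed as in the sketch below. The function names are hypothetical, and the use of the sample (n − 1) standard deviation of the pooled values as the "overall" standard deviation is an assumption.

```python
from statistics import mean, stdev


def standardised_difference(treated_values, control_values):
    """Absolute difference in group means over the overall standard deviation."""
    overall_sd = stdev(list(treated_values) + list(control_values))
    if overall_sd == 0:
        return 0.0  # the characteristic does not vary, so the groups are identical
    return abs(mean(treated_values) - mean(control_values)) / overall_sd


def average_standardised_difference(covariates):
    """covariates: {name: (treated_values, control_values)}."""
    return mean(standardised_difference(t, c) for t, c in covariates.values())


# Hypothetical binary characteristic (e.g. attends a selective school).
print(standardised_difference([1, 1, 0, 1], [0, 1, 0, 0]))
```

Because each difference is expressed in standard deviation units, characteristics measured on very different scales (income in pounds, test scores in points, binary indicators) can be compared and averaged.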
Finally, we estimate linear probability regression models of our outcomes of interest using the matched sample, using the same model as that described in section 4.1.
5. Constructing a matched sample
The distribution of the propensity scores by combination of subjects studied is shown in figure 1 for whether individuals studied the subjects required to be eligible for EBacc, figure 2 for whether individuals studied two or more sciences, figure 3 for whether individuals studied foreign languages, figure 4 for whether individuals studied either History or Geography, and figure 5 for whether individuals studied any applied GCSEs.
On the basis of these distributions, our preferred method of matching is likely to be a nearest neighbour matching approach, without replacement (i.e. once an untreated individual is selected as a match they cannot be selected again) imposing common support and a caliper of 0.05.
In this case, imposing common support does not result in the exclusion of many observations since there is a similar range of propensity scores in treatment and control groups but remains important to exclude treated individuals for whom there are no comparable untreated individuals. In addition, a caliper on the distance between the propensity score of the treated individual and that of the match ensures that untreated matches do not end up being too different from their treated comparator.Footnote 5
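A minimal sketch of imposing common support is given below. It uses one common rule, assumed here rather than taken from the text: treated individuals are dropped if their propensity score lies outside the range of scores observed among untreated individuals. The function name is hypothetical.

```python
def impose_common_support(treated, untreated):
    """Keep treated units whose score lies within the untreated score range.

    treated, untreated: dicts mapping a unit id to an estimated propensity score.
    The min-max rule is an assumption; other definitions of common support exist.
    """
    low, high = min(untreated.values()), max(untreated.values())
    return {k: s for k, s in treated.items() if low <= s <= high}


# Hypothetical scores: "t2" has no comparable untreated individual.
print(impose_common_support({"t1": 0.30, "t2": 0.95}, {"c1": 0.10, "c2": 0.80}))
```

The caliper then acts as a second, local check: even inside the overlapping range, a treated individual is only matched if some untreated individual has a sufficiently similar score.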
We also perform our other proposed forms of matching. Matched samples from these approaches do, in some cases, differ somewhat from our preferred approach. However, the overall reduction in bias (as measured by standardised differences) was smaller, sometimes considerably, in matching with replacement. Kernel matching also resulted in smaller reductions in bias than nearest-neighbour matching with a caliper. Nevertheless, this ultimately makes only small differences to estimates of impact.
Since each individual in the treatment group is matched to an untreated individual with as similar as possible a propensity score, the matched sample should be balanced on the characteristics included in the propensity score model. We verify that this is the case in table 2 for our matched sample by EBacc-eligibility and table 3 for our matched sample by applied subjects. The balance for the other matched comparisons is reported in the appendix: Appendix table A1 for our matched sample by studying any languages, Appendix table A2 for our matched sample by two or more sciences, and Appendix table A3 for our matched sample by whether individuals study History or Geography.
Table 2 suggests that matching has produced a strongly balanced sample by whether or not individuals study a full set of EBacc subjects.Footnote 6 While, in the unmatched sample, there are substantial standardised differences between variables such as household income, prior attainment and parental education, these have all been substantially reduced in the matched sample.Footnote 7 No standardised differences exceed 0.08 in the matched sample. Overall, the average standardised difference between characteristics in the treatment and control group is reduced from 0.17 to 0.02.
We see a similarly well-matched sample when splitting the group by whether they study for any applied GCSEs (table 3). In the matched dataset the standardised differences between groups do not exceed 0.04 for any of the characteristics. The average standardised difference in the matched dataset is 0.01, compared to 0.12 in the unmatched sample.
The results are reported in table 4. This includes:
The ‘naïve’ estimates of the change in probability of university access measures associated with studying the relevant set of subjects (these replicate the difference between the two columns in the top panel of table 1);
The regression adjusted estimates of the change in probability of university access measures having controlled for background characteristics parametrically;
and, finally, the matched estimates of the change in probability of university access measures, estimating conditional differences among those in the matched samples constructed in Section 5.
It also reports the statistical significance (taking into account the school-level clustering in Next Steps) of these estimated differences and the size of the matched sample. We report average marginal effects from linear probability models (average marginal effects from probit regression models give similar results).
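To make the contrast between the first and last of these estimates concrete, the sketch below computes a raw difference in the share attending university and the same difference restricted to matched pairs. It is a simplified illustration under stated assumptions: it omits the regression adjustment and the school-level clustering of standard errors, and the records, field names and pairs are hypothetical.

```python
def share(outcomes):
    """Proportion of a list of 0/1 outcomes that equal 1."""
    return sum(outcomes) / len(outcomes)


def naive_difference(records):
    """Raw treated-minus-untreated gap, in percentage points."""
    treated = [r["attends"] for r in records.values() if r["treat"] == 1]
    untreated = [r["attends"] for r in records.values() if r["treat"] == 0]
    return 100 * (share(treated) - share(untreated))


def matched_difference(records, pairs):
    """Gap in percentage points using only the matched pairs."""
    treated = [records[t_id]["attends"] for t_id, _ in pairs]
    controls = [records[c_id]["attends"] for _, c_id in pairs]
    return 100 * (share(treated) - share(controls))


# Hypothetical individuals: treat = studies the subject set, attends = outcome.
records = {
    "t1": {"treat": 1, "attends": 1},
    "t2": {"treat": 1, "attends": 1},
    "t3": {"treat": 1, "attends": 0},
    "t4": {"treat": 1, "attends": 1},
    "c1": {"treat": 0, "attends": 1},
    "c2": {"treat": 0, "attends": 0},
    "c3": {"treat": 0, "attends": 0},
    "c4": {"treat": 0, "attends": 0},
}
pairs = [("t1", "c1"), ("t3", "c2")]  # hypothetical matched pairs

naive = naive_difference(records)
matched = matched_difference(records, pairs)
print(naive, matched)
```

In this contrived example the large raw gap disappears once attention is restricted to comparable pairs, which mirrors the pattern of attenuation reported in table 4, although not its magnitudes.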
As noted in Section 3, in the unadjusted sample, individuals who study the full set of subjects required to be eligible for the EBacc are 27 percentage points more likely to apply to university, and 29 percentage points more likely to attend, than their peers who do not study the full set of subjects. Once we control for background characteristics using regression analysis, these differences are dramatically reduced, to four and three percentage points respectively, although they remain statistically significant. In the matched sample, the difference in the probability of attending university remains statistically significant; however, the difference in the probability of applying does not. Surprisingly, given the particular rhetoric around EBacc representing the subjects favoured by more prestigious universities (Gibb, 2011), the results from the regression model imply that those with a full set of EBacc subjects are less likely to get into a Russell Group university than their peers who do not have one. However, this is not robust to using the matching approach, where the difference is essentially zero. This reduces our confidence in the regression adjustment result, suggesting it may be driven by extrapolation across individuals who are not truly comparable.
In purely descriptive terms, individuals who study two or more sciences are 24 percentage points more likely to apply to university and 25 percentage points more likely to attend than their peers who do not. As with the overall EBacc, these differences are much reduced and become statistically insignificant in the matched sample. Individuals who study two or more sciences are also 10 percentage points more likely to attend a Russell Group institution than those who only study for one science award. This is reduced to just a 1 percentage point difference in the matched sample. Although not identical, these results are similar to those found by De Philippis (2017), in that they are small and statistically insignificant at the 5 per cent level but significant at the 10 per cent level. This is perhaps surprising given the difference in the margin at which they were estimated (at least two sciences here, compared to three sciences in De Philippis' work).
Students who study a foreign language are 25 percentage points more likely to apply and 26 percentage points more likely to attend university. However, once the background of those studying these subjects is taken into account, these differences become small and statistically insignificant. In the case of entry to a Russell Group university, while those who study any languages are 11 percentage points more likely to attend in the unadjusted sample, the results from the matched sample suggest there is no statistically significant difference in Russell Group attendance by whether or not individuals have studied any languages.
Students who study either History or Geography are 17 percentage points more likely to apply to university and 16 percentage points more likely to attend university than their peers who do not study either of these subjects. After adjusting for background characteristics, these differences become small and statistically insignificant. Likewise, an 8 percentage point raw difference in the probability of attending a Russell Group university is reduced to statistical insignificance when we use the matching approach.
Finally, we consider whether individuals study any applied subjects. Unlike the other combinations we have considered, individuals who study applied subjects are less likely to achieve each of our outcomes. Before any adjustment, there is a difference of 19 percentage points in the probability of applying to university, 20 percentage points in the probability of attending university, and 10 percentage points in the probability of attending a Russell Group institution. Controlling for differences in the composition of the group who study any applied subjects by restricting attention to the matched sample, we find that, while the differences are much reduced, to 4 percentage points in the case of university application and 3 percentage points in the case of university attendance, they are not eliminated. The difference for Russell Group attendance is not statistically significant, but this could be due to there being few comparable individuals at this margin.
6.1 Continuous measure of subject choice
The main aim of this paper has been to consider what evidence there is that taking specific combinations of subjects makes a difference to later educational progression, comparing those on the margin between taking these combinations and not taking them. In general, we have found that the differences are small and often statistically insignificant once we account flexibly for observable differences in the individuals that take these subjects. However, perhaps it is simply the case that subject choices at age 14 do not affect progression to university.
To explore whether this is the case, we consider a continuous measure of academic subject selectivity (Henderson et al., 2016) and whether changes along this spectrum make a difference to the probability of progression to higher education. This measure is based on the prior academic performance of the pupils that choose to study each subject. We assign each subject the average score in Key Stage 3 (KS3) compulsory tests at age 14 of those pupils that report they are studying that subject. KS3 tests are taken roughly contemporaneously with subject choice decisions, so they seem the most appropriate measure to use in this way. Further details, including a ranking of subjects based on this measure, are discussed by Henderson et al. (2016).
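The construction of the subject-level score can be sketched as follows. This is a minimal illustration under stated assumptions: the pupil records, subject names and KS3 scores are hypothetical, and aggregating to the pupil level by taking the mean across the subjects a pupil studies is our assumption for the purposes of the sketch (see Henderson et al., 2016, for the actual details). The final step standardises the pupil-level measure so that a one-unit change corresponds to one standard deviation.

```python
from statistics import mean, pstdev


def subject_scores(pupils):
    """Average KS3 score of the pupils who report studying each subject."""
    by_subject = {}
    for pupil in pupils:
        for subject in pupil["subjects"]:
            by_subject.setdefault(subject, []).append(pupil["ks3"])
    return {subject: mean(values) for subject, values in by_subject.items()}


def pupil_selectivity(pupils, scores):
    """Assumed aggregation: mean score of the subjects each pupil studies."""
    return [mean([scores[s] for s in pupil["subjects"]]) for pupil in pupils]


def standardise(values):
    """Rescale to mean zero and unit (population) standard deviation."""
    centre, spread = mean(values), pstdev(values)
    return [(value - centre) / spread for value in values]


# Hypothetical pupils, for illustration only.
pupils = [
    {"ks3": 40, "subjects": ["History", "French"]},
    {"ks3": 30, "subjects": ["History", "Applied ICT"]},
    {"ks3": 20, "subjects": ["Applied ICT"]},
]

scores = subject_scores(pupils)
selectivity = pupil_selectivity(pupils, scores)
z_scores = standardise(selectivity)
print(scores, selectivity, z_scores)
```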
We employ the same flexible regression adjustment approach outlined in Section 4.1 (replacing the binary treatment indicator Treat with our continuous measure of subject selectivity) to estimate an average difference in outcomes across the distribution of the subject choice measure (footnote 8). We report the estimated change in probability of university entry (or Russell Group attendance) for a one standard deviation change in the subject choice academic selectivity score in table 5.
Individuals whose subject mix is one standard deviation more academically selective are 11 percentage points more likely to apply to, or to attend, university. They are also 5 percentage points more likely to attend a Russell Group university. However, in line with the results of Henderson et al. (2016), there are big differences in the subject choice mix that young people study depending on their background characteristics. After adjusting for these differences, the difference in probability of applying to university is reduced to 2 percentage points, while the difference in probability of attending university, or attending a Russell Group institution, is reduced to 1 percentage point. Only the difference in probability of applying to university remains statistically significant.
These results provide context to our main findings. The association between a change in our continuous measure of academic selectivity of all subjects studied and university progression measures is small and (except in the case of application) statistically insignificant. This makes the differences in probability of going to university depending upon studying combinations of subjects that do persist even when using a matching approach stand out more. It suggests that there may be a particular importance attached to the combinations, above and beyond differences in academic selectivity.
This paper has provided new evidence on the importance of the subjects that individuals study from ages 14–16 for access to university. Using rich survey data collected about a representative cohort of young people from England, we estimated the effect on university entry of studying specific sets of subjects that have been of particular interest to policymakers in recent years. We did so using both regression modelling and propensity score matching to test the robustness of the results to each of these approaches. The aim of these methods is to produce estimates that specifically compare very similar individuals who are on the margin between studying a set of subjects or not.
While there are large raw differences in the probability of university attendance by subjects studied, once differences in the characteristics of individuals who study such subjects are taken into account, the remaining differences are small or non-existent. There is some evidence of a positive effect (3–5 percentage points) of studying the full suite of English Baccalaureate subjects and a negative effect of a similar magnitude of studying any applied GCSE subjects. We also produced estimates of the change in probability of applying to or attending university, or attending a Russell Group university, associated with a general increase in the academic selectivity of the subjects that an individual studies. The differences are also small and, except in the case of applying to university, not statistically significant.
With respect to the significant differences, we should keep in mind that, to regard these results as truly causal, we need to be satisfied that there are no unobserved differences (i.e. differences driven by factors that we could not include in the propensity score model) between individuals who did study such subjects and those who did not that could be driving the results. Given the relatively small differences we find after taking the observed differences into account, it would not require large unobserved differences for these results to be overturned. It is striking that, while differences associated with a continuous change in the academic selectivity of subjects studied mostly become insignificant when controlling for background characteristics, differences in the probability of university attendance by whether or not young people studied the full set of EBacc subjects, and by whether or not they studied any applied subjects, remain significant when controlling for the same set of background characteristics. This suggests that there may be a particular importance attached to the combinations, above and beyond their simply being more academically selective subjects.
What should we make of these results? Overall, we find that the seemingly large differences in university progression associated with the subjects that young people study from ages 14 to 16 often seem to be small, at most, once we take into account differences in the kinds of people who study these subjects. Why might this be the case? Beyond the removal of the influence of background characteristics, it could be that differences in subjects studied at this stage are simply swamped by decisions young people make in the following two years. The results for studying the full set of EBacc subjects and for studying any applied subjects do show residual associations with university attendance, suggesting the view that they may have particular importance is not without merit. This finding accords with other research focussing on subject choice at a later point in individuals' educational careers (Dilnot, 2016). Nevertheless, it is important to emphasise that the differences are still not large, meaning that we should not exaggerate the likely implications of more pupils studying EBacc subjects.
That said, we should be clear that our conclusion is not that subject choice does not matter. This paper does not estimate the impact of studying these different subjects on studying specific subjects (for example, STEM subjects) at university; subject choices may be more important for this, but that is beyond the scope of this paper. We set out to explore specifically the evidence behind government policies that focus on encouraging students to take certain subject combinations as a route to increasing chances of university or Russell Group institution attendance.
It is also important to acknowledge that university attendance is not just an end in itself; it also potentially has important implications for employment, income and occupational status attainment. We have not explored the implications of subject choice for these in this paper, but plan to do so in future work, especially as previous work suggests it may be important in earlier cohorts (Iannelli, 2013).