1. Introduction
Despite its potential significance, the association between foreign language (FL) learning and maths performance remains an underexplored area of research. This study examines how acquiring a new language may do more than broaden linguistic horizons; it explores the hypothesis that FL instruction shares critical cognitive pathways with mathematical reasoning, particularly during the pivotal stage of adolescence.
Using data from over 300,000 students across 73 countries in the 2018 Programme for International Student Assessment (PISA), our research addresses the question: How is participation in language learning associated with mathematics performance? By identifying a substantial predictive association, this research suggests that FL programmes warrant consideration as potential strategic markers for global academic achievement and educational equity.
1.1. Foreign language learning and mathematical achievement: A cognitive perspective
Learning an FL for the first time – including in mainstream school settings – has been theorised to enhance executive function (EF), particularly working memory, cognitive flexibility, and inhibitory control. These EF components are strongly associated with academic success, especially maths performance (Fuhs et al., 2014). Based on these findings, researchers have hypothesised that an indirect link between FL learning and mathematical achievement may exist, mediated by improvements in EF, and potentially shaped by instructional factors (Schiltz et al., 2024).
Although this link has been previously explored, the body of research remains relatively small and has primarily focused on younger cohorts. Among low-income preschoolers, bilingualism has been associated with mathematics through growth in inhibitory control (Choi et al., 2018). In primary school, working memory gains due to FL learning predict mathematical problem-solving skills among language learners in grades 1–4 (Lee Swanson et al., 2021). Similarly, in two-way dual-language programmes, gains in mathematical skills among fourth and fifth graders were statistically mediated by inhibitory control, working memory, and cognitive flexibility – an effect not observed in younger groups (Esposito, 2020). Consistent with this pattern, Park et al. (2023) reported that enhanced EF associated with FL learning is linked to higher mathematics scores in fourth graders, after adjusting for caregiver education.
Research on the cognitive benefits of FL learning initiated during adolescence is also limited. However, evidence from later life stages suggests that these benefits extend well beyond early childhood. Working memory improvements following FL learning are well-documented, with significant gains observed in both young and older adult learners after up to a year of FL instruction (Huang et al., 2022; Wong et al., 2019). Importantly, adolescents (average age 16 years) have also shown measurable improvements (Shoghi & Ghonsooly, 2018).
FL learning after childhood also strengthens cognitive flexibility and attention switching. While few studies focus exclusively on adolescents, research demonstrates that even short-term instruction results in benefits for learners aged 16–78 (Al-khresheh & Karmi, 2024; Bak et al., 2016; Shoghi & Ghonsooly, 2018; Vega-Mendoza et al., 2015). Likewise, inhibitory control improves through FL learning, with significant gains observed in children, adolescents, and young adults after as little as six months of instruction (Bialystok & Barac, 2012; Sullivan et al., 2014).
These cognitive skills – working memory, flexibility, and inhibition – are critical for managing complex tasks and understanding mathematical concepts across developmental stages (Bull & Lee, 2014; Cragg et al., 2017; ten Braak et al., 2022). Conversely, deficits in these skills are linked to mathematical learning difficulties (Iglesias-Sarmiento et al., 2023; Ramos et al., 2018), which has prompted the use of EF-targeted interventions (Sabine, 2020). Taken together, these findings support the rationale for investigating whether FL learning beginning during adolescence can yield cognitive benefits, with potential implications for mathematical achievement.
Beyond cognitive theory, empirical evidence suggests a positive link between FL learning and higher mathematics scores. However, most existing studies are geographically concentrated in the United States and Canada. Research on primary school dual-language immersion programmes indicates that students outperform their nonlanguage-learning peers in mathematics, even after adjusting for socioeconomic status (SES) and ethnicity (Padilla et al., 2013; Watzinger-Tharp et al., 2016). Similarly, Lee (2010), in their study of South Korean Year 11 students, identified a strong correlation between proficiency in English as a second language and mathematics achievement across both general and advanced maths courses.
Additionally, Woll and Wei’s (2019) systematic review of 20 studies examining preschoolers to university students found that 90% of the included research reported positive correlations between FL learning and academic outcomes, including mathematics. Moreover, a recent meta-analysis involving over 785,000 students (aged 10–17) found that second-language learners are three times more likely to achieve higher math grades compared to their non-learner peers (Nucette et al., 2024). These results were consistent across various FL delivery methods, including dual-immersion, FL, and English as a second-language programmes.
At the same time, large-scale and multilingual contexts show mixed or small effects of FL learning on maths achievement, often hinging on language proficiency and language-of-instruction alignment (González-Martín et al., 2024; Greisen et al., 2021). Recent reviews note that multilingual learners can perform below expectations when instruction or assessment occurs in a nondominant language, even though they may show enhanced mathematical encoding and processes in nonlanguage-related tasks (Dentella et al., 2024).
Despite these findings, research on first-time, school-based FL learning and mathematics achievement in adolescents remains scarce, particularly outside North America. This study addresses this gap by using international data to examine whether participation in FL programmes is associated with mathematics outcomes for 15-year-olds.
1.2. Foundations for effective foreign language learning
Effective FL learning depends on multiple factors beyond delivery method, including curriculum design, teaching strategies, educator expertise, and learner motivation (Dixon et al., 2012; Ryshina-Pankova, 2015). Among these, culturally enriched content is essential. Integrating cultural materials fosters communicative competence, critical thinking, and engagement with the target language (Lavrenteva & Orland-Barak, 2015). International frameworks (Council of Europe, 2001; National Standards, 2015) position culture as central to language learning, emphasising cross-cultural communication, intercultural awareness, and exposure to diverse perspectives as critical for proficiency (Chaika, 2023; Dlaska, 2000; Lusin et al., 2023).
Similarly, multicultural teaching practices – such as teaching sociocultural communication, addressing cross-cultural misunderstandings, and facilitating cultural exchanges – help students develop practical language skills (Byram, 2021; Byram et al., 2013; Cok, 2021; Savignon & Sysoyev, 2005). Finally, teacher expertise plays a pivotal role: experienced educators adapt lessons for diverse learners, foster inclusive classrooms, and implement evidence-based strategies that support language development and cognitive growth (Chan, 2006).
Ultimately, combining culturally enriched curricula, interactive methodologies, and skilled educators enhances language proficiency and intercultural competence, supporting cognitive skills that may translate into broader academic success.
1.3. Sociocultural and curricular moderators of mathematics achievement: A global perspective
In addition to cognitive skills, research has found several contextual factors that influence mathematical achievement during high school years. Socioeconomic status (SES), for instance, is a well-established determinant, often linked to lower academic performance among disadvantaged students across diverse educational systems and national contexts (Eriksson et al., 2021; Kim et al., 2019; OECD, 2023b).
Gender differences in mathematics achievement have also been widely studied. Several studies in different contexts have shown that when disparities are reported, their extent and nature vary significantly depending on cultural beliefs, educational policies, and societal norms (Hamamura, 2011; Hyde & Mertz, 2009; Nollenberger et al., 2016; Stoet & Geary, 2018).
Parental education also influences mathematical achievement, though studies differ on whether maternal, paternal, or combined parental education has the strongest impact (Chiu et al., 2016; Crede et al., 2015; Pishghadam & Zabihi, 2011). Similarly, parental involvement, particularly when involving high expectations and effective communication with schools, is often associated with positive outcomes (Boonk et al., 2018; Fiskerstrand, 2022; Hong et al., 2010). However, it has been noted that excessive supervision can yield adverse effects (Boonk et al., 2018; Hong et al., 2010; Rodríguez et al., 2017; Silinskas & Kikas, 2019).
Studies on school characteristics, including size and student–teacher ratios, present mixed evidence. Smaller classes and lower ratios are generally associated with improved performance, although other factors, such as school access to teaching resources, play a decisive role (Abizada & Seyidova, 2024; Olson et al., 2011). Additionally, urban students often outperform their rural peers, while students in independent schools tend to achieve higher scores than those in public schools in countries with dual education systems (Forgasz & Hill, 2013; Mohammadpour & Abdul Ghafar, 2014).
Beyond demographic and school-level factors, language policy and curriculum design add complexity in multilingual contexts. Mathematics outcomes vary with policy choices (e.g., language of instruction), curriculum language demands, and student-body composition. For example, Luxembourg’s European Public Schools – a system that allows families to choose the main language of instruction, typically to match their home language – report higher mathematics scores relative to schools following the traditional curriculum. This finding supports the idea that the alignment between home and school language and reduced language burden can enhance maths learning (Colling et al., 2024). Similarly, long-standing Canadian immersion studies show that teaching maths in a second language can yield comparable performance when proficiency is adequate and programmes are well-structured (Xu et al., 2022). These examples illustrate that mathematics outcomes are highly contingent on language match and programme design.
Understanding the impact of these contextual and demographic factors is essential for addressing disparities in maths performance and improving the accuracy of educational research. Large-scale international assessments therefore offer a valuable platform for global comparisons, which are important for refining educational strategies and addressing inequalities.
1.4. Mathematical achievement and the Programme for International Student Assessment (PISA)
Mathematical proficiency is vital for educational attainment, career prospects, and individual and national development (Anderton et al., 2017; Delaney & Devereux, 2020; European Commission, 2011; Joensen & Nielsen, 2009). Better maths performance during high school has been correlated with improved transitions to higher education, stronger university performance, and enhanced career prospects (Anderton et al., 2017; Delaney & Devereux, 2020; European Commission, 2011; Joensen & Nielsen, 2009). Achievement in science, technology, engineering, and mathematics (STEM) fields, in particular, has been shown to drive innovation, economic growth, and societal welfare (Freeman et al., 2019). Despite its importance, maths outcomes have declined globally, with PISA reporting more countries experiencing drops than gains in 2018 – a trend that worsened in 2022, largely due to the COVID-19 pandemic (OECD, 2019b, 2023b).
PISA, a triennial assessment of 15-year-olds, measures reading, mathematics, and science performance, alongside career-related domains, such as financial literacy, which vary across iterations (OECD, 2019a). Since its launch in 2000, PISA has grown to include 80 countries and economies (OECD, 2024). PISA also gathers extensive contextual data from students, parents, teachers, and schools. Its rigorous design – standardised questionnaires, stratified sampling, advanced validation, and use of advanced statistical techniques to manage missing data – ensures comparability across diverse education systems (OECD, 2009, 2019a, 2024).
Given the cognitive, educational, and sociocultural benefits of FL learning, and the global concern over declining maths achievement, this study leverages PISA data to explore whether participation in FL programmes is associated with differences in mathematics performance among adolescents.
2. The current study
This study explores the relationship between school-based FL learning and mathematical achievement among 15-year-old students. It addresses two key gaps in the literature: the underrepresentation of adolescents – a critical developmental stage – and the predominance of North American research, which limits broader applicability.
Grounded in cognitive theory and empirical findings, this study models the association between FL learning and mathematics as indirect. It proposes, without empirically examining these mechanisms in the analyses, that FL learning may support mathematics achievement by strengthening EFs such as working memory, cognitive flexibility, and inhibitory control. It also considers a broader secondary theoretical contribution through higher-order thinking skills, including problem-solving and abstract reasoning. These mechanisms are presented as conceptual explanations rather than tested mediators.
Recognising the influence of sociocultural and curricular factors, our study focuses on students enrolled in mainstream programmes where FL is taught as a subject, excluding immersion and bilingual contexts to avoid confounding variables. For analysis, we operationalised “language learners” as students whose reported home language matched their language of instruction.
The research addresses two primary questions:
(1) To what extent is the duration of school-based FL learning linked to the mathematical achievement of 15-year-old students, after controlling for established determinants of academic performance?
(2) To what extent do specific characteristics of FL learning programmes predict mathematical achievement in this population?
3. Methodological design
3.1. Data selection
This study utilised data from the Organisation for Economic Co-operation and Development (OECD) PISA 2018 cycle to address the research questions. PISA assesses the academic knowledge and skills of 15-year-old students across various domains, including reading, mathematics, and science, while also collecting detailed background information through student, parent, teacher, and school-level questionnaires.
The 2018 dataset, which includes data from over 600,000 students across 79 countries and economies, is particularly valuable as it represents the first and, at the time of writing, only instance where intercultural communication skills, including FL learning, were included in the assessment (OECD, 2019a). This allowed for an in-depth analysis of these skills alongside student performance in mathematics.
To ensure alignment with the research questions and maintain data validity, a subsample of students was selected based on language-use criteria and data completeness. Our study focuses on language learners in mainstream programmes where FLs are taught as subjects, rather than through immersion. To minimise the likelihood of including proficient bilinguals – who, according to prior research, may exhibit different cognitive profiles (White & Greenfield, 2017) – only students who reported using the same language at home and at school were included in the sample. For example, if a student’s home language (e.g., Spanish) differed from the language of assessment (e.g., English), their data were excluded. Additionally, cases with missing or invalid responses for key variables, as well as extreme outliers, were removed to meet model requirements and reduce bias in the results (see Supplementary Material Appendices S1 and S2 for details).
For the first research question – examining the relationship between time spent on school-based FL learning programmes and mathematics performance – we used data from the student, parent, and school questionnaires. After applying our filters, 34% of cases in the PISA dataset were removed due to invalid or incomplete data, 13% were excluded based on the language-mismatch filter, and a further 4% were excluded due to the outlier removal process. Consequently, the final sample for the first research question comprised 300,656 students from 73 countries and economies.
The second research question focused on the role of specific FL programme characteristics in predicting mathematics performance. This analysis incorporated all datasets in the first study, supplemented with data from teacher questionnaires. To ensure conceptual relevance, the dataset was filtered to include only language teachers, as the teacher-level variables incorporated in the models – such as years of experience, multicultural teaching practices, and cultural competence – are specific to FL instruction and would not apply to teachers of other subjects. Because only 19 countries administered the teacher questionnaire and the proportion of language teachers was relatively small, the final sample for the second question was substantially smaller, consisting of 53,459 students and 16,342 teachers, representing 31% and 23% of their respective total samples. Further details on the data selection and preparation processes are provided in the Supplementary Material (Appendices S1 and S2).
3.2. Variable description
3.2.1. Foreign language learning variables
Our primary predictor variable was FL learning time in minutes per week (FLMINS). FLMINS serves as a proxy for exposure to FL learning, as direct measures of proficiency were unavailable.
This variable was calculated by combining two PISA 2018 variables: FL periods per week (ST059Q04HA) and average minutes per class period (ST061Q01NA), both derived from students’ input in the student/parent questionnaire.
The calculation was performed as follows:
$$ \mathrm{FLMINS} = \mathrm{ST059Q04HA} \times \mathrm{ST061Q01NA}. $$
To ensure reliability, outliers were addressed using two approaches. First, FLMINS values were capped at the total minutes of instruction (TMINS), an OECD-derived index reflecting realistic instructional time. Second, within each school, the interquartile range (IQR) method was applied to exclude extreme values exceeding the upper bound (James et al., 2023). This ensured that only FLMINS values disproportionately high relative to other values within the same school were removed. Tables 1 and 2 provide a summary of all the variables used, including their description, sources, and values.
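The two-step cleaning described above can be sketched as follows. This is an illustrative Python sketch, not the authors' code (the study itself was run in R). The record keys `school` and `flmins`, the function names, and the conventional fence multiplier of 1.5 are all assumptions, since the text does not state the multiplier used.

```python
from collections import defaultdict
from statistics import quantiles


def compute_flmins(periods_per_week, minutes_per_period, tmins):
    """FL minutes per week, capped at total instructional minutes (TMINS)."""
    return min(periods_per_week * minutes_per_period, tmins)


def drop_school_upper_outliers(records, k=1.5):
    """Keep records whose FLMINS is at or below Q3 + k * IQR within their school.

    `records` is a list of dicts with hypothetical keys 'school' and 'flmins'.
    The multiplier k = 1.5 is an assumed, conventional choice.
    """
    by_school = defaultdict(list)
    for rec in records:
        by_school[rec["school"]].append(rec["flmins"])

    upper_bound = {}
    for school, values in by_school.items():
        if len(values) < 2:
            # Too few observations to estimate quartiles; keep the record.
            upper_bound[school] = float("inf")
            continue
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        upper_bound[school] = q3 + k * (q3 - q1)

    return [r for r in records if r["flmins"] <= upper_bound[r["school"]]]
```

Only the upper fence is applied, mirroring the description that values "exceeding the upper bound" were excluded; unusually low values are left in place.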
Table 1. Description of variables: linear mixed model variables (Study 1)

* Composite variables are calculated as described in the Methods section and Supplementary Material.
Table 2. Description of variables: random forest variables (Study 2)

Note: All variables from Study 1 are also included in Study 2.
* Composite variables are calculated as described in the Methods section and Supplementary Material.
3.2.2. Mathematical performance variables
PISA assesses students’ mathematical achievement using a set of 10 plausible values (PVs) rather than a single score. These PVs are not raw scores but random draws from a student’s estimated ability distribution, derived from latent proficiency models that incorporate extensive contextual covariates (Jewsbury et al., 2024). This approach addresses the fact that, to encourage participation and minimise fatigue, each student completes only a subset of the total pool of mathematics questions during a given test cycle (OECD, 2009).
The PVs reflect students’ abilities across various mathematical domains, including formulation, application of concepts, facts, and procedures, as well as reasoning and interpretation of mathematical results (OECD, 2019a). In this study, the 10 PVs serve as the primary outcome variables. Following OECD guidance, all PVs were included in the analyses to account for their statistical nature and to ensure validity and robustness (OECD, 2009).
To enable comparability across cycles and countries, PISA standardises students’ PVs to have a mean of 500 and a standard deviation (SD) of 100 points (OECD, 2009). In 2018, due to a decline in student performance, the OECD average in mathematics was 489, with an SD of 91 points (OECD, 2019b). This process ensures that the data are also suitable for international comparative analysis.
3.2.3. School-related variables
Type of Admission (ADMISSION). It captures potential differences in student achievement stemming from selective school entry. Based on school responses to PISA variable SC012Q01TA, it reflects the frequency of admission based on academic performance (including placement tests). Its inclusion helps control bias from academically elite cohorts that might otherwise confound the relationship between our variables of interest.
Mathematics learning time (MMINS). As maths instructional time (in minutes per week) is logically associated with mathematical performance, MMINS was included as a control variable to account for its potential confounding effect on the relationship between FL learning time (FLMINS) and mathematical achievement.
MMINS, an OECD index, is based on student and school responses regarding the number of mathematics periods per week and the average length of each period (OECD, 2020).
School location (SCHLOC). Reported by the OECD using data from the school questionnaire, SCHLOC categorises schools based on population size, from villages to large cities, to control for geographical disparities.
School type (SCHTYPE). This variable is recorded by the OECD, based on school records, and groups them as private-independent, private government-dependent, or public. SCHTYPE was incorporated into the model to account for differences across school ownership.
Student–teacher ratio (STRATIO). Drawn from the school dataset, STRATIO is an OECD index constructed from principals’ reports regarding average class size. It serves as a proxy for class size, a well-established determinant of maths performance.
School size (SCHSIZE). Also sourced from the school dataset, SCHSIZE is a school-level OECD index representing the total number of enrolled students. This variable was included to capture the potential effect of school scale on student achievement.
Teacher experience in years (TEACHEXP). Derived from the teachers’ dataset, TEACHEXP represents the total years of teaching experience, accounting for variations in instructional quality across classrooms.
3.2.4. Teaching programme variables
The following variables were included to ensure that the effects of FL learning are assessed independently of the broader benefits of engagement with multicultural teaching and programmes.
Multicultural curriculum (MCCUR). This index was derived from five PISA school questionnaire items (SC167Q01HA–SC167Q05HA) asking whether the formal curriculum for 15-year-olds includes components such as intercultural communication, cultural knowledge, openness to intercultural experiences, and FL instruction. Each item was coded 1 = Yes, 0 = No, and equally weighted. The score represents the proportion of these elements present at the school level, yielding a range from 0 to 100:
$$ \mathrm{MCCUR} = \frac{\text{number of items coded 1}}{5} \times 100. $$
This approach follows PISA’s aggregation conventions (OECD, 2020) and reflects the extent to which multicultural content is formally integrated into the curriculum. Moreover, all components are derived from question SC167 and designed to be interpreted together, ensuring consistency in meaning. The inclusion of these elements aligns with research emphasising the role of cultural education in supporting language learning and cognitive development (Dlaska, 2000).
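As a minimal illustration of this kind of equally weighted index, the Python sketch below converts a list of binary item codes into a 0–100 score. The function name is hypothetical, and the handling of missing responses is not described in the text, so the sketch simply rejects anything other than 0 or 1.

```python
def proportion_index(item_responses):
    """Percentage of binary items coded 1 (Yes); all items weighted equally."""
    if not item_responses:
        raise ValueError("at least one item is required")
    if any(r not in (0, 1) for r in item_responses):
        # Assumption: missing or invalid codes are handled upstream.
        raise ValueError("items must be coded 0 or 1")
    return 100 * sum(item_responses) / len(item_responses)
```

For example, a school reporting three of the five curriculum components would score 60 under this scheme.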
Multicultural teaching practices (MCTEACH). This composite index captures the extent to which schools implement practices that actively promote cross-cultural interaction and understanding, such as hosting teachers from other countries, teaching the history of international cultural groups, offering exchange programmes, celebrating cultural festivities, encouraging intercultural communication, and promoting FL skills (Cok, 2021; Savignon & Sysoyev, 2005).
The index was calculated from seven PISA items (SC159Q01HA, SC165Q02HA, SC165Q06HA–SC165Q09HA, TC207Q05HA). Each item was coded 1 = Yes, 0 = No, and equally weighted. The composite score represents the proportion of these practices implemented at the school level, yielding a range from 0 to 100:
$$ \mathrm{MCTEACH} = \frac{\text{number of items coded 1}}{7} \times 100. $$
Internal consistency was high (Cronbach’s α = 0.923), supporting the reliability of this measure.
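For readers unfamiliar with the statistic, Cronbach's α for a respondents-by-items matrix can be computed as in the sketch below. This is a generic textbook formulation in Python, not the authors' procedure. The function name and input layout are assumptions, and the sketch uses sample variances throughout.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of rows, each holding one unit's item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two rows")
    # Sum of per-item sample variances (columns of the matrix).
    item_variance_sum = sum(variance(col) for col in zip(*rows))
    # Sample variance of the total score across rows.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

Values close to 1 indicate that the items vary together; the statistic is undefined when every row has the same total score.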
Teacher’s self-efficacy in multicultural environments (GCSELF). It is an OECD-calculated index reflecting teachers’ confidence in addressing multicultural challenges, based on self-assessment across various dimensions such as raising awareness of cultural differences and managing multicultural classrooms effectively.
3.2.5. Student-related variables
Socioeconomic status (ESCS). To ensure the analysis examines the effects of FLMINS independently of socioeconomic disparities, the Economic, Social and Cultural Status (ESCS) index was included in the model. This index is a composite measure developed by the OECD, based on answers from both student and parent questionnaires. It reflects students’ family backgrounds by combining parents’ highest education levels, highest occupational status, and home possessions.
Notably, the “cultural” aspect of the ESCS index pertains to intellectual resources and academic achievement potential, rather than social customs or ethnic-specific behaviours. This component reflects the educational background and cognitive environment within the family. For example, home possessions such as books or access to technology are included as proxies for intellectual capital, which supports cognitive development and learning opportunities (OECD, 2020).
Gender (GENDER). Recorded directly from the student dataset, it was derived from students’ self-reports. This categorical variable was included to control for gender-related differences in mathematics performance.
Academic year level (GRADE). PISA assesses students aged 15 years and 3 months to 16 years and 2 months (OECD, 2020), which often corresponds to one or two years before high school graduation. However, year levels vary significantly across countries, which may introduce differences in students’ maturity and preparation for assessments. Including GRADE in the analyses ensures comparability of participants across educational systems.
3.2.6. Parent-related variables
Parental level of education (MISCED and FISCED). Maternal (MISCED) and paternal (FISCED) education levels are reported by PISA using ISCED classifications (UNESCO, 2021). These variables were included to add nuance to the family socioeconomic background.
Parental involvement (PARINV). This continuous variable was constructed for this study using two PISA items: the percentage of parent–school discussions initiated by parents (SC064Q01TA) and those initiated by teachers (SC064Q02TA). To capture the highest level of engagement, PARINV represents the greater of the two percentages. This measure reflects the extent of parental communication with schools, which is often associated with improved academic outcomes (Boonk et al., 2018).
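The construction of PARINV amounts to taking the larger of two percentages. A minimal Python sketch, with a hypothetical function name and an assumed 0–100 validity check, is:

```python
def parental_involvement(parent_initiated_pct, teacher_initiated_pct):
    """PARINV as the greater of the two reported percentages."""
    for pct in (parent_initiated_pct, teacher_initiated_pct):
        if not 0 <= pct <= 100:
            # Assumption: out-of-range values are treated as invalid input.
            raise ValueError("percentages must lie between 0 and 100")
    return max(parent_initiated_pct, teacher_initiated_pct)
```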
4. Study 1. Effect of FLMINS on maths scores
4.1. Method: Linear mixed modeling
To investigate the association between FL learning time (FLMINS) and students’ mathematics performance, a linear mixed-effects model (LMM) was fitted using the lme4 package in R (Bates et al., 2015). This model accounts for the nested structure of the data, where students are grouped within countries, and it controls for sources of variability at both levels.
The model was specified as follows:
$$ \begin{array}{l} \mathrm{Math\ Score} \sim \mathrm{FLMINS} + \mathrm{MMINS} + \mathrm{GENDER} + \mathrm{GRADE} \\ \qquad {} + \mathrm{ESCS} + \mathrm{MISCED} + \mathrm{FISCED} + \mathrm{SCHLOC} \\ \qquad {} + \mathrm{SCHTYPE} + \mathrm{ADMISSION} \\ \qquad {} + (1 \mid \mathrm{CNTSTUID}) + (1 \mid \mathrm{CNT}). \end{array} $$
This specification estimates the association between FLMINS and mathematical performance (Math Score), while controlling for time spent on mathematics (MMINS), demographic factors (gender, grade), socioeconomic and cultural background (ESCS), parental education level (MISCED, FISCED), and school characteristics (location, type, and admission process). Random intercepts at the student (CNTSTUID) and country (CNT) levels account for individual differences and country-level variability, respectively. This structure addresses unobserved heterogeneity within countries and among individual students.
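One way to read the student-level random intercept is that the ten plausible values were stacked in long format, giving ten rows per student that share the same covariates. The text does not confirm this, so the Python sketch below is an assumption about data layout rather than a description of the authors' pipeline. The column names `PV1MATH` to `PV10MATH` follow PISA's naming convention, while `pv_index` and `math_score` are hypothetical.

```python
PV_COLUMNS = [f"PV{i}MATH" for i in range(1, 11)]


def stack_plausible_values(student):
    """Return ten long-format rows, one per plausible value, for one student."""
    # Covariates are copied unchanged onto every row for this student.
    covariates = {k: v for k, v in student.items() if k not in PV_COLUMNS}
    return [
        {**covariates, "pv_index": i, "math_score": student[col]}
        for i, col in enumerate(PV_COLUMNS, start=1)
    ]
```

Under this layout, the repeated rows per student are what a random intercept for CNTSTUID would absorb.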
An initial screening of FLMINS revealed the presence of outliers, even after applying the data cleaning filters described earlier. To address this, FLMINS was winsorised at 1,500 minutes per week, the maximum value for mathematics instruction time (MMINS). This threshold aligns with educational practice, where FL instruction typically receives fewer periods than core subjects like mathematics (OECD, 2023a). Winsorisation, a statistical technique that limits extreme values in a dataset by replacing outliers with a specified value, was employed to preserve meaningful data points – such as those from schools prioritising FL study – while preventing FLMINS from exceeding realistic bounds and exerting disproportionate influence on the analyses (Gelman & Hill, 2006; Tabachnick & Fidell, 2018; Wilcox, 2017).
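The upper-tail winsorisation described above can be illustrated with a short sketch. The cap of 1,500 minutes comes from the text; the function name and the sample values are hypothetical.

```python
FLMINS_CAP = 1500  # minutes per week, the threshold stated in the text


def winsorise_upper(values, cap=FLMINS_CAP):
    """Replace values above `cap` with `cap`; leave all others unchanged."""
    return [min(v, cap) for v in values]


weekly_minutes = [90, 240, 1500, 2400, 3000]  # hypothetical FLMINS values
print(winsorise_upper(weekly_minutes))  # [90, 240, 1500, 1500, 1500]
```

Unlike trimming, this keeps every student in the sample, so high-FL schools still contribute to the estimate, only with a bounded value.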
In addition, we tested for a potential nonlinear relationship between FLMINS and Math Score using a generalised additive model (GAM) with the mgcv package for R (Wood, 2017). The analysis indicated a linear relationship between the variables of interest, supporting the use of an LMM (for details, see Supplementary Material Appendix S2). Given the size and complexity of the data, the use of a GAM, instead of an LMM, would have introduced substantial computational challenges without significantly improving model performance.
Importantly, school size (SCHSIZE) and class size (CLSIZE) were initially considered as predictors but were ultimately excluded, as their inclusion would have removed all countries for which the OECD did not report or compute values for these variables. Because complete data were required to estimate the LMM, including SCHSIZE and CLSIZE would have substantially reduced the sample without offering commensurate analytical benefit. To include the effect of class numbers without significantly reducing the sample, student–teacher ratio (STRATIO) was used instead. Other variables were initially considered but later removed, such as student age, which was replaced by GRADE, and highest parental income, which was replaced by parental education level (MISCED, FISCED).
4.2. Study 1 results
Given the potential associations among predictors, Pearson’s correlations (Olivoto & Lúcio, 2020) and variance inflation factors (VIFs) (Lüdecke et al., 2021) were calculated to assess multicollinearity. Although some variables exhibited moderate-to-strong correlations (r = .70 for ESCS/MISCED, r = .67 for ESCS/FISCED), no significant collinearity concerns were identified (VIFs: 2.43 for ESCS, 1.78 for FISCED, and 1.89 for MISCED) (for details, see Supplementary Material Appendix S2).
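For orientation, the standard definition of the VIF is shown below, where R_j² is the share of variance in predictor j explained by the other predictors. The exact computation used for the mixed model is not stated in the text, so this is a sketch under that assumption. Read this way, the reported VIF of 2.43 for ESCS would imply that roughly 59% of its variance is shared with the remaining predictors.

$$ \mathrm{VIF}_j = \frac{1}{1 - R_j^2} \quad\Longrightarrow\quad R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}, \qquad R_{\mathrm{ESCS}}^2 = 1 - \frac{1}{2.43} \approx 0.59 $$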
The linear model, which included both fixed and random effects, achieved a conditional R² of .865, while the marginal R² was .162. This indicates that the inclusion of random intercepts for students and countries considerably improved the model fit, with 86.5% of the variance in mathematics scores explained by the full model compared to 16.2% achieved by fixed effects alone. Table 3 presents a summary of the results obtained.
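The two indices can be related through a variance partition. The sketch below assumes the widely used definition in which the fixed-effects variance is compared with the total variance; the symbols are introduced here for illustration (fixed effects f, student intercepts u, country intercepts v, residual ε), and the text does not state which formula was used. Under this reading, the gap between the two values (.865 − .162 = .703) is the share attributed to the random intercepts.

$$ R^2_{\text{marginal}} = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_u^2 + \sigma_v^2 + \sigma_\varepsilon^2}, \qquad R^2_{\text{conditional}} = \frac{\sigma_f^2 + \sigma_u^2 + \sigma_v^2}{\sigma_f^2 + \sigma_u^2 + \sigma_v^2 + \sigma_\varepsilon^2} $$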
Table 3. Linear mixed-effects model results: fixed effects estimates and random effects intercepts

All fixed effects were statistically significant (p < .001), which was expected given the large sample size, suggesting a robust relationship between the predictors and mathematics scores. FLMINS showed a positive association with maths scores, estimated at 4.8 PISA points higher per additional weekly hour of FL instruction (b = 0.08 points per minute × 60 minutes). For context, the predicted contribution of FLMINS rises from 8.8 points at the lower bound of the middle 50% of the sample (110 min/week) to 19.2 points at the upper bound (240 min/week), a difference of 10.4 points.
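The conversion from the per-minute coefficient to the figures above can be checked with a few lines. The coefficient and the 110 and 240 min/week bounds come from the text; the function name is hypothetical, and the rounded coefficient means the results are approximate.

```python
B_FLMINS = 0.08  # points per minute of weekly FL instruction (rounded, from the text)


def predicted_points(minutes_per_week, b=B_FLMINS):
    """Predicted contribution of FLMINS to the maths score, in PISA points."""
    return b * minutes_per_week


per_hour = predicted_points(60)
low, high = predicted_points(110), predicted_points(240)
print(round(per_hour, 1), round(low, 1), round(high, 1), round(high - low, 1))
# 4.8 8.8 19.2 10.4
```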
Note that some of the variables included in the analysis were categorical – specifically GENDER, GRADE, MISCED, FISCED, SCHLOC, SCHLTYPE, and ADMISSION – and were treated accordingly. However, instead of presenting the individual effects for each category level, we only report their combined effect.
Among the demographic variables, SES (ESCS) and GRADE had the strongest effects. A one-standard-deviation increase in ESCS corresponded to a 29.33-point difference in mathematical scores – for the middle 50% of the sample (−0.86 to +0.66), this represents a 44.5-point spread. Similarly, each additional grade level was associated with an estimated difference of 28.08 points (e.g., 56.16 points between grades 8 and 10).
Other school-level factors showed smaller but significant associations. Schools located in larger cities (SCHLOC = 5) reported scores 25.24 points higher than those in rural areas (SCHLOC = 1), averaging a 6.31-point difference in scores per level. Likewise, students attending private independent schools (SCHLTYPE = 1) showed scores 10.40 points higher than those attending public schools (SCHLTYPE = 3) and about 5.20 points higher than those in government-dependent private schools. Similarly, academic selectivity (ADMISSION) accounted for a 4.84-point difference between schools that “always” versus “never” use placement tests (b = 2.42).
The remaining variables showed lesser effects on mathematics scores. Results indicate that the time spent learning mathematics (MMINS) had a limited positive association, corresponding to about 1.8 points per hour, considerably smaller than the association for time spent learning FL. Gender differences were similarly modest, with boys scoring 10.43 points higher than girls on average. Surprisingly, each increase in maternal education level (MISCED) was associated with an average difference of −2.16 points, while each increase in paternal education level (FISCED) corresponded to an estimated difference of −0.83 points in mathematics scores.
To explore differences among countries, these were grouped by economic status according to the latest United Nations World Economic Situation and Prospects report (United Nations, 2025). This approach was selected since national economic growth has been shown to mediate the effect of SES – the strongest predictor in our main model – on academic achievement (Kim et al., 2019). Research also indicates there is a reciprocal relationship between economic growth and academic performance, further emphasising the rationale for this stratification (Hanushek & Woessmann, 2021; Valero, 2021; Woessmann, 2016).
This model revealed a consistent positive association between increased FL instruction time and mathematics performance across all economic categories (Figure 1). However, the magnitude of this effect varied notably by economic status. Economies in transition, countries such as Azerbaijan, Belarus, and Georgia, exhibited the steepest slope, with mathematics scores on average 11.4 points higher per additional hour of FL learning per week. In contrast, developing countries (e.g., Argentina, Morocco, and Vietnam) recorded the lowest baseline scores and the smallest overall estimated difference, with an average of 4.2 points per hour of FL instruction. Developed countries (e.g., Canada, Japan, and Norway) also displayed a positive trend, with an average difference of 7.2 points per hour of FL learning (for details, see Supplementary Material Appendix S2).

Figure 1. Linear mixed effects model results: Effect of FLMINS on mathematics scores according to countries’ economic status.
4.3. Robustness tests
To assess the stability of the observed association between foreign-language learning and mathematics achievement, we conducted additional analyses using alternative subsamples and model specifications.
4.3.1. Multilingual subsample
For this check, the model was re-estimated using only data from multilingual education systems (Luxembourg, Switzerland, Canada, Finland, Spain, and Singapore), keeping all other filters in place. The coefficient for FL instruction time remained positive (b = 0.05, p < .001), corresponding to approximately 3 PISA points per additional weekly hour of FL learning. Although statistically reliable, this effect was modest compared to SES and grade level, indicating that contextual factors exert a stronger influence.
4.3.2. Expanded sample with language-mismatch indicator
This test was performed by removing the language-use filters and adding a binary indicator for language mismatch (1 if home = test language, 0 if not), while keeping the covariates identical to the main specification. This expanded sample (382,647 students, 95% of PISA 2018 complete cases) yielded consistent results: The association between FLMINS and mathematics performance remained positive and significant (b = 0.07, p < .001; 95% confidence interval (CI) [0.06, 0.07]), corresponding to approximately 4.2 PISA points per additional hour of FL instruction per week. The language mismatch indicator was a significant predictor (b = 5.7, p < .001), confirming its influence on achievement but not accounting for the FLMINS effect.
4.3.3. Within- vs between-school decomposition (Mundlak approach)
The Mundlak approach (Mundlak, 1978) was applied to assess if the observed FL–maths association persisted at the student level or was a reflection of school-level differences (e.g., wealthier schools offering more language classes). This method separates within-school (comparing students to their classmates) and between-school (comparing school averages) effects, ensuring that variations in school-level allocation do not inflate the individual-level estimate.
The within-school coefficient remained positive and statistically significant (b = 0.06), indicating that a student receiving one additional weekly hour of FL instruction (relative to their school peers) is predicted to score 3.6 points higher in mathematics. The between-school component was also positive (b = 0.09), showing that students attending schools that offer an additional weekly hour of FL instruction tend to score 5.5 points higher in maths tests. While these effects are smaller than those of SES and grade level, the persistence of the within-school association indicates that the relationship is not solely a byproduct of school placement.
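The decomposition behind these two coefficients can be sketched as follows. This is the within-between variant, in which each student's FLMINS is split into a school mean and a deviation from that mean; the text does not say which parameterisation was used, and the record layout and function name are hypothetical.

```python
from collections import defaultdict


def mundlak_split(records):
    """Split each student's FLMINS into a school mean and a within-school deviation.

    `records` is a list of (school_id, flmins) pairs (hypothetical layout).
    Returns (school_id, between, within) with flmins == between + within.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for school, minutes in records:
        totals[school][0] += minutes
        totals[school][1] += 1
    means = {s: total / n for s, (total, n) in totals.items()}
    return [(s, means[s], m - means[s]) for s, m in records]


students = [("A", 120), ("A", 180), ("B", 240), ("B", 300)]  # hypothetical data
for row in mundlak_split(students):
    print(row)
```

Entering the two components as separate predictors yields one coefficient for differences among classmates and another for differences among school averages, which is how the values of 0.06 and 0.09 above can be read.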
Full results for all robustness analyses and alternative specifications are provided in the Supplementary Material (Table S3 and Appendix S2).
4.4. Study 1 discussion
Study 1 examined the relationship between foreign language instruction time (FLMINS) and mathematics achievement using a linear mixed model (LMM). The main model revealed a positive association: Mathematics scores were approximately 4.8 points higher per additional weekly hour of foreign-language (FL) instruction. This finding reinforces prior research on the academic benefits of FL learning (Nucette et al., 2024; Woll & Wei, 2019), as well as previous research on the indirect link between FL learning and improved numeracy performance, mediated by EF (Al-khresheh & Karmi, 2024; Shoghi & Ghonsooly, 2018).
Although the model explained a significant portion of the variance in mathematical scores (R² = 86.5%), the analysis highlighted that most of the variability is attributable to country and individual differences, particularly socioeconomic level and grade. School-level characteristics (e.g., school location, type), while influential, played a comparatively smaller role in the variation.
Robustness checks further strengthened these findings against methodological challenges. The association persisted, though at a reduced magnitude (3 PISA points per FL-hour/week), when the analysis was restricted to multilingual education systems, indicating the effect is not confined to language learning contexts. Expanding the sample to include language-mismatch cases confirmed the overall stability of the association and indicated that it is not driven by language-background selection or test-language disadvantage. Crucially, decomposing exposure into within- and between-school components (Mundlak adjustment) revealed that the effect holds true among peers in the same school (b = 0.06) and across schools based on their FLMINS allocation (b = 0.09). These patterns suggest that the observed relationship is not only attributable to fixed differences between schools or non-random placement of students. While they do not establish causality, they are consistent with our theoretical interpretation that FL learning may support mathematics achievement indirectly through cognitive processes.
Stratification by country-level economic status demonstrated that FLMINS was consistently associated with higher mathematical performance across all economic categories. The strongest effects were observed for economies in transition, an outcome that could be attributable to specific pedagogical mechanisms, similar educational reforms, or shared attitudes toward FL learning. Another possible explanation is cultural in nature, with intrinsic motivation and resilience playing a role (Huisman et al., 2018). As none of these mechanisms was the primary focus of the present study, additional research is needed to explore these hypotheses. In contrast, developing and developed economies exhibited smaller gains. In developing economies, this may reflect structural constraints such as limited access to quality education or inadequate teacher training, whereas in developed economies, well-established teaching practices and curricula may lead to smaller effects. These findings align with previous literature highlighting the influence of national economic contexts on educational outcomes (Hanushek & Woessmann, 2017).
While the positive association between FLMINS and mathematics achievement was robust across specifications, it should be interpreted as correlational rather than causal, given the cross-sectional nature of the data. We acknowledge that unobserved school- and country-level factors – such as specific teaching practices, student–teacher rapport, curriculum design, and educational policies – likely contribute to the observed relationship and require further exploration.
To address the limitations of traditional approaches, a machine learning (ML) framework was adopted for the second research question. ML techniques offer enhanced capacity to manage complex datasets, tolerate noise, and identify influential predictors with high accuracy – making them particularly suitable for analysing large-scale educational datasets such as PISA. This approach also enables cross-validation of findings from Study 1, contributing to the robustness and generalisability of the results.
5. Study 2. Effect of foreign language programme characteristics on maths scores
5.1. Method: Machine learning approach (random forest)
To evaluate the impact of FL programme characteristics on mathematical achievement, an ML approach was adopted, specifically the random forest (RF) method, implemented via the R package randomForest (Breiman, 2001; Liaw & Wiener, 2002).
The primary rationale for selecting this method was its high predictive accuracy. ML methods, particularly tree-based models, are designed to construct statistical models that generalise effectively to unseen data, whereas traditional models often perform optimally on the data to which they were fitted. In other words, ML models tend to produce predictions that closely align with actual outcomes (Hastie et al., 2009). Due to their predictive power, such algorithms have been widely applied across various fields, including psychology and education, to uncover patterns that might otherwise remain undetected (Balabied & Eid, 2023; Liew et al., 2025; Nachouki et al., 2023; Yarkoni & Westfall, 2017).
RF algorithms, along with other ML approaches, have been increasingly applied to identify predictors of mathematics and science literacy, using large-scale international student assessments, such as PISA and Trends in International Mathematics and Science Study (TIMSS) (Arroyo Resino et al., 2024; Gil-Madrona et al., 2025; Song & Cutumisu, 2025). Their growing use likely reflects their suitability for large and complex datasets. These algorithms also demonstrate greater resilience to missing data and are less influenced by outliers, which supports robust and reliable results (Hastie et al., 2009; Liaw & Wiener, 2002). RF models also provide an important interpretative advantage by quantifying the relative importance of predictor variables, offering insights into their practical significance for the outcome (James et al., 2023; Varian, 2014; Yarkoni & Westfall, 2017). These features give them an advantage over more traditional statistical models, including LMMs, which have known difficulties detecting complex interactions and may overestimate the significance of certain variables in large samples (Lin et al., 2013; Thiese et al., 2016).
Building on these strengths, the present study applies an RF model to examine a broader range of predictors than those considered in Study 1 by incorporating factors related to FL teaching programmes, such as multicultural curriculum and teaching practices. The primary objective was to identify the variables that most robustly and accurately predict the outcome variable – Math Score. Accordingly, the model included established demographic predictors of academic achievement, alongside variables specific to FL programmes (see Table 2).
The predictors can be categorised as follows:
• Student-level predictors: Gender, grade level, SES (ESCS), parental involvement (PARINV), and parents’ highest educational levels (MISCED, FISCED).
• Foreign language programme characteristics: Time allocated to FL study (FLMINS), teacher experience in years (TEACHEXP), teacher self-efficacy in multicultural environments (GCSELF), and indicators of multicultural teaching practices and curriculum (MCTEACH, MCCUR).
• School-level predictors: Student–teacher ratio (STRATIO), school size (SCHSIZE), school ownership (SCHLTYPE), school location (SCHLOC), and mathematics instructional time (MMINS) as a control for time-on-task effects.
The dataset was randomly split into training (80%) and testing (20%) subsets. This approach ensured that model fit indices reflected the model’s ability to predict outcomes on unseen data (Varian, 2014), thereby providing external validation of model performance (hold-out testing). Internal model performance was evaluated using out-of-bag (OOB) validation, which provides a cross-validation estimate based on bootstrap resampling during training. Performance was quantified using root-mean-square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²).
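The hold-out split and the three performance metrics can be sketched in plain Python. This is an illustration only: the function names, the seed, and the toy scores are assumptions, R² is computed with one common definition (1 minus the ratio of residual to total sum of squares), and the study itself used the randomForest package in R.

```python
import math
import random


def train_test_split(rows, test_share=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into training and testing subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual, predicted):
    """Root-mean-square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20

actual = [400.0, 450.0, 500.0, 550.0]     # hypothetical scores
predicted = [410.0, 440.0, 505.0, 545.0]  # hypothetical predictions
print(round(rmse(actual, predicted), 2), mae(actual, predicted),
      round(r_squared(actual, predicted), 2))  # 7.91 7.5 0.98
```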
5.2. Study 2 results
The RF model performed well across both internal and external validation methods. The OOB cross-validation yielded an R² of 0.962, an RMSE of 17.29, and an MAE of 9.43. Performance on the independent hold-out test set was nearly identical (R² = 0.966, RMSE = 17.16, MAE = 9.41), indicating minimal overfitting and strong generalisability within the PISA dataset. Given the range of mathematics scores (136–799.6) and the mean of 485.1 in the data, the RMSE and MAE correspond to deviations of approximately 3.5% and 1.9% of the mean score, respectively.
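The percentages quoted above follow directly from the hold-out metrics and the mean score reported in the text, as the short check below shows. The function name is hypothetical.

```python
MEAN_SCORE = 485.1  # mean mathematics score reported in the text


def pct_of_mean(error, mean=MEAN_SCORE):
    """Express an error metric as a percentage of the mean score."""
    return 100 * error / mean


print(round(pct_of_mean(17.16), 1))  # hold-out RMSE: 3.5
print(round(pct_of_mean(9.41), 1))   # hold-out MAE: 1.9
```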
These results indicate that the model explained approximately 96% of the variance in the OECD-generated PVs for mathematics scores, with a low prediction error, as illustrated in Figure 2.

Figure 2. Random forest model results: Predicted vs actual values.
Predictor importance was assessed using the percentage increase in mean-squared error (%IncMSE), a metric that quantifies the reduction in the model’s predictive performance when the values of a given variable are permuted while keeping all other variables unchanged. A higher %IncMSE value indicates greater predictor importance of a particular variable, as altering its values results in an error of higher magnitude (Breiman, 2001; Liaw & Wiener, 2002).
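The permutation idea can be illustrated with a simplified sketch: shuffle one predictor column, recompute the error, and express the change relative to the baseline. The toy model, data, and function names are hypothetical. The randomForest implementation works on out-of-bag samples tree by tree and can normalise the result, so values from this sketch are not comparable with those reported in this study.

```python
import random


def mse(model, rows, targets):
    """Mean-squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_increase(model, rows, targets, column, seed=0):
    """Percentage increase in MSE after shuffling one predictor column."""
    baseline = mse(model, rows, targets)
    shuffled = [r[column] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)]
    return 100 * (mse(model, permuted, targets) - baseline) / baseline


# Toy data: the target depends on column 0 only; column 1 is irrelevant.
rows = [(i, i % 3) for i in range(20)]
targets = [2 * i + (1 if i % 2 else -1) for i in range(20)]


def toy_model(row):
    """Hypothetical fitted model that uses only the first predictor."""
    return 2 * row[0]


print(permutation_increase(toy_model, rows, targets, column=1))  # 0.0
print(permutation_increase(toy_model, rows, targets, column=0) > 0)  # True
```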
Using this approach, Economic, Social, and Cultural Status (ESCS) emerged as the most influential predictor (%IncMSE = 364.9%). This means that randomly changing ESCS values, while keeping all other variables constant, introduces an error 3.6 times greater than that of the original model. The second most important predictor, closely following ESCS, was FLMINS (363.3%), further confirming its strong association with mathematics performance.
Other important predictors included the implementation of multicultural curricula (MCCUR = 277.9%) and maths instruction time (MMINS = 274.9%), while SCHSIZE (259.4%) and STRATIO (228.0%) demonstrated moderate importance. Notably, teachers’ multicultural self-efficacy (GCSELF) had a %IncMSE of 0%, indicating that it had no predictive power for mathematics scores in this model. Figure 3 presents a ranked summary of predictor importance.

Figure 3. Random forest model results: Variable importance based on the percentage increase in mean-square error (%IncMSE – mean decrease in accuracy).
5.3. Robustness tests
5.3.1. Single PV
A robustness check was conducted using a single PV (PV3). For sample sizes exceeding 6,500 cases, single-PV estimation yields stable results (OECD, 2009) and avoids the potential smoothing effect of averaging, ensuring the findings are not dependent on the aggregation procedure.
The PV3 model showed slightly lower performance (R² = 0.962, RMSE = 19.95, MAE = 10.83) than the averaged-PV model, but predictor importance patterns remained consistent, supporting the robustness of the findings.
5.3.2. Removal of PV-conditioning variables
PISA PVs are generated using latent proficiency models that incorporate some of the background variables included in our predictive models (e.g., socioeconomic status or ESCS, gender, grade) (Zieger et al., 2022). Consequently, it was deemed necessary to examine whether the predictive prominence of FLMINS was influenced by the inclusion of these variables. To do so, the RF model was re-estimated excluding ESCS, GENDER, and GRADE. As expected, model fit decreased to R² = 0.83, with RMSE = 36.92 and MAE = 23.73, corresponding to 7.6% and 4.9% of the mean mathematics score, respectively.
Despite this reduction in overall fit, in the absence of ESCS, FLMINS emerged as the most important predictor across both variable importance metrics (%IncMSE and IncNodePurity), surpassing all other student- and school-level variables. Other predictors retained their relative ranking, although their importance values changed modestly. These results demonstrate that the predictive contribution of FLMINS is not an artifact of PV conditioning and is robust across model specifications.
Full results, including detailed importance rankings and plots, are provided in the Supplementary Material (Appendix S2).
5.4. Study 2 discussion
Study 2 employed an RF model to examine associations between FL programme characteristics and mathematics achievement. The model explained a large share of variance (R² = 0.97; RMSE = 3.5% and MAE = 1.9% of the mean score), with OOB performance closely matching test-set results, indicating strong internal consistency and generalisability. However, the high level of model fit requires careful interpretation. PISA PVs are conditioned on background variables (such as SES, grade, and gender) that are also included as predictors – creating an outcome-predictor dependency known to inflate fit indices (Zieger et al., 2022). The large sample size and the presence of several highly predictive variables may have further contributed to this effect.
A robustness check using only PV3 produced a modestly lower R², consistent with the expected influence of pooling results across PVs, which smooths random error and can slightly increase fit. Crucially, when key PV-conditioning variables (ESCS, GRADE, and GENDER) were removed from the model, FLMINS emerged as the most important predictor across both importance metrics, and the ranking of the remaining variables remained highly stable. This pattern indicates that although absolute accuracy metrics are influenced by PV construction, the relative importance structure – particularly the prominence of FLMINS – reflects a genuine and robust signal rather than a modeling artifact.
Taken together, these findings suggest that the RF results should be interpreted primarily as exploratory, highlighting variable importance and nonlinear relationships rather than providing definitive estimates of predictive accuracy. Nonetheless, the stability of the importance rankings offers meaningful insights into the role of FL programme features within the broader ecosystem of predictors of mathematics achievement.
Consistent with Study 1, time dedicated to FL instruction (FLMINS) emerged as a highly influential predictor – second only to SES (ESCS) in the full model – a pattern that held across specifications. The RF model’s capacity to capture nonlinearities and interactions further enriches this interpretation, offering a more nuanced understanding of how FL-related factors operate alongside other student- and school-level characteristics (Bates et al., 2015; Breiman, 2001).
Among the FL programme features, multicultural curriculum integration (MCCUR) demonstrated the strongest association with mathematics scores. This effect is likely mediated by the cognitive benefits of robust FL learning and cultural exposure, which are strongly linked to enhanced EFs – including working memory, response inhibition, and cognitive flexibility – all critical predictors of mathematical reasoning (Cragg et al., 2017; ten Braak et al., 2022). Additionally, multicultural education has been associated with enhanced critical thinking and problem-solving skills (Aslan & Aybek, 2020; Qondias et al., 2022), which also contribute to successful mathematical performance.
While this study offers credible evidence that features of FL programmes, particularly instructional time (FLMINS) and multicultural curriculum (MCCUR), are meaningfully associated with mathematical achievement, these findings do not establish causality. As in Study 1, additional research is required to disentangle the mechanisms underlying these associations.
6. General discussion
The present research examined the relationship between FL instruction time (FLMINS) and mathematics achievement among 15-year-old students using PISA 2018 data. Across all specifications, a consistent positive association was identified: Mathematics scores were on average 4.8 PISA points higher per additional hour of weekly FL instruction. Notably, this association is substantially stronger than that observed for mathematics instruction time (MMINS), which yielded a comparatively modest increment of 1.8 points per hour. This finding, which persisted after adjusting for powerful predictors like SES (ESCS) and grade level, underscores the unique and complementary contribution of FL learning to mathematical development.
6.1. Dual pathways and contextual moderation
The robustness of this positive association provides theoretical nuance, highlighting overlapping student-level and school-level components. The within–between decomposition showed a significant within-school effect: even when comparing peers in the same school, students with greater FL exposure tend to score higher in mathematics – consistent with our cognitive transfer theory. The stronger between-school effect suggests that unobserved institutional features may covary with FL time (e.g., curriculum, instructional quality, selection/admissions, timetabling).
These findings also engage with potential trade-offs in highly multilingual systems (Dentella et al., 2024). In a robustness check restricted to Luxembourg, Switzerland, Canada, Finland, Spain, and Singapore, the association persisted but was modest (3 points per additional FL hour/week). This pattern suggests that the link is sensitive to system-level features (curricular alignment, language-of-instruction policy), and learner-level factors (FL acquisition, usage), which may interact or influence cognitive development (Schiltz et al., 2024). Together, these considerations provide a framework through which our findings can coexist with prior reports of reduced maths achievement in contexts of high language burden, rather than contradict them.
6.2. Programme features and equity
Consistent with prior research, SES (ESCS) emerged as the strongest determinant of mathematics achievement (b = 29.33), highlighting persistent educational disparities linked to economic resources (Kim et al., 2019; OECD, 2023b; Rodríguez-Hernández et al., 2020). However, our findings suggest that FL study may serve as a potential avenue for promoting equity. Indeed, the magnitude of these associations is considerable: the predicted score difference associated with two hours of weekly FL instruction (9.6 points) is roughly equivalent to the observed gender gap, while a six-hour instructional volume (28.8 points) parallels the impact of a full SD increase in SES. Moreover, the positive association between FLMINS and maths achievement was observed across all economic status groupings, showing the strongest effect size in economies in transition. This suggests a compensatory role for FL programmes, where the strongest positive associations are observed in settings with fewer existing structural advantages.
The ML approach (Study 2) complemented this finding by confirming FLMINS as a strong predictive factor, comparable in importance to ESCS in predicting mathematical performance. Furthermore, multicultural curriculum integration (MCCUR) emerged as the most significant feature among the evaluated FL programme components, supporting the view that FL benefits may accrue not just from time on task, but from instruction delivered authentically, promoting higher-order thinking and culturally responsive learning environments.
6.3. Other determinants of achievement
Beyond the primary determinants, the models identified several conventional and anomalous predictors of mathematics achievement. Consistent with expectations, grade level exhibited a strong effect (b = 28.08 per year of schooling), as students at higher grade levels have been exposed to more advanced topics and have had more opportunities to practice, apply concepts, and master fundamental skills.
Regarding institutional factors, school location (SCHLOC) and school type (SCHLTYPE) demonstrated moderate effects, indicating that students in urban settings and private schools achieve higher scores – a finding often linked to differential access to specialised resources and infrastructure. As expected, admission type (ADMISSION) also showed a significant positive association, indicating that selective or merit-based enrolment policies are linked to higher mean achievement. Institutional size factors, such as school size (SCHSIZE) and student–teacher ratio (STRATIO), were found to be of moderate predictive importance in Study 2, suggesting that while these structural attributes may shape the learning environment, their impact on achievement is less pronounced than that of curricular content (Mohammadpour & Abdul Ghafar, 2014).
Our results also showed that boys outperformed girls by an average of 10.43 points, consistent with the documented gender gaps in mathematics performance (Anaya et al., 2022), which are likely influenced by contextual factors unrelated to biology, such as SES and cultural expectations (Breda et al., 2018; Johnson et al., 2022).
An interesting anomaly emerged regarding parental education (MISCED, FISCED). Contrary to typical findings (Davis-Kean et al., 2021), both maternal and paternal education showed a small negative association with scores. This finding is likely explained by the high correlation between parental education and the already-included ESCS variable (r ≈ 0.70), suggesting that SES (ESCS) accounts for the primary variance. The residual, small negative effect might reflect subtle mechanisms, such as parental time constraints associated with highly demanding careers, or could simply be the result of multicollinearity within the structural model.
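One way to gauge how much the overlap between parental education and ESCS could destabilise the estimates is the variance inflation factor (VIF). The sketch below is a simplified illustration, not the authors' diagnostic. It assumes only two collinear predictors correlated at the reported r ≈ 0.70, in which case VIF = 1 / (1 − r²). In the full model the VIF depends on all predictors jointly and could be larger. The function name is hypothetical.

```python
import math


def vif_two_predictors(r):
    """VIF for a predictor whose only collinear partner correlates with it at r.

    Simplifying assumption: with exactly two correlated predictors, the R^2
    from regressing one on the other equals r**2.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r ** 2)


R_PARENTAL_EDU_ESCS = 0.70  # approximate correlation reported in the text

vif = vif_two_predictors(R_PARENTAL_EDU_ESCS)
se_inflation = math.sqrt(vif)  # factor by which the standard error grows

print(f"VIF = {vif:.2f}, standard error inflated by a factor of {se_inflation:.2f}")
```

Under this two-predictor simplification the VIF is about 1.96, which inflates the standard error by roughly 40%. That is below the commonly cited rule-of-thumb thresholds of 5 or 10, so a full-model diagnostic, rather than the pairwise correlation alone, would be needed to assess the multicollinearity explanation.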
Finally, instructional time allocated to mathematics (MMINS) and teacher quality variables exhibited nuanced roles. The association for MMINS was less than half the size of that observed for FLMINS (1.8 versus 4.8 PISA points per additional hour per week), suggesting that gains from instructional time alone are incremental and likely secondary to other factors, such as pedagogical quality or student engagement. Similarly, teacher self-efficacy in multicultural environments (GCSELF), a predictor in Study 2, showed no predictive power for mathematics achievement. In contrast to previous studies (Abacioglu et al., 2020; Nuenay et al., 2024), this non-finding suggests a potential intention-action gap, in which teacher confidence in managing diverse classrooms primarily affects noncognitive outcomes (e.g., classroom climate) rather than translating directly into measurable gains in mathematics proficiency (Feng et al., 2024).
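The comparison of the two instructional-time coefficients can be checked directly. The sketch below computes their ratio from the values quoted in the text and, assuming both were rounded to one decimal place (an assumption, since the unrounded estimates are not given here), the range of ratios compatible with that rounding. Variable names are illustrative.

```python
FLMINS_PER_HOUR = 4.8  # PISA points per additional weekly hour of FL instruction
MMINS_PER_HOUR = 1.8   # PISA points per additional weekly hour of mathematics
HALF_STEP = 0.05       # maximum rounding error for a value reported to one decimal

# Ratio implied by the rounded coefficients.
ratio = FLMINS_PER_HOUR / MMINS_PER_HOUR

# Smallest and largest ratios consistent with one-decimal rounding.
ratio_low = (FLMINS_PER_HOUR - HALF_STEP) / (MMINS_PER_HOUR + HALF_STEP)
ratio_high = (FLMINS_PER_HOUR + HALF_STEP) / (MMINS_PER_HOUR - HALF_STEP)

print(f"ratio from rounded values: {ratio:.2f}")
print(f"range compatible with rounding: {ratio_low:.2f} to {ratio_high:.2f}")
```

The rounded values give a ratio of about 2.67, and the rounding-compatible range runs from roughly 2.57 to 2.77, so the 2.6-to-1 ratio reported later in the discussion is consistent with these coefficients.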
Collectively, these studies offer significant theoretical insights and suggest promising directions for policy. Our findings provide empirical support for the hypothesis that FL learning is a robust correlate of mathematical achievement. This relationship, interpreted through a dual-pathway framework of cognitive transfer and institutional quality signalling, aligns with recent syntheses highlighting the potential of FL programmes to foster mathematical performance through relatively short instructional periods (Nucette et al., 2024).
Given the cross-sectional nature of the PISA data, these results are hypothesis-generating for policy. They suggest that FL education should be viewed as a complementary component of a well-rounded curriculum, rather than a competitor for STEM hours. Notably, the 2.6-to-1 efficiency ratio – where FL instruction exhibits a significantly stronger association with maths scores than additional mathematics time – suggests that at the margin, prioritising cognitive breadth may be more effective for developing mathematical literacy than increasing subject-specific volume.
Furthermore, the consistent predictive role of FLMINS indicates it may serve as a lever for reducing academic disparities. Strategic FL investment offers a hypothesised “cognitive lift” capable of narrowing achievement gaps associated with gender and SES. Specifically, policy should prioritise integrating multicultural curricula to foster the abstract reasoning essential for numeracy and ensuring equitable access to high-quality FL programmes in under-resourced schools. However, it is crucial that any implementation of such initiatives consider the system-level constraints and potential trade-offs noted in multilingual contexts.
7. Limitations and future directions
While this study expanded the scope and methodological rigour of research on FL learning and mathematical achievement, several limitations warrant consideration. Most importantly, the cross-sectional design precludes any claims of causality, as unmeasured factors such as intrinsic motivation, teacher–student rapport, pedagogical approaches, or general cognitive ability may partly account for the observed association.
Additionally, the theoretical link to cognitive transfer mechanisms (e.g., EFs) remains indirect, as direct measures of these cognitive functions are unavailable in the PISA data.
Future research should adopt longitudinal or quasi-experimental designs to better establish causality, incorporate direct cognitive assessments to test mediation pathways, and conduct cross-cultural comparisons to clarify how system-level predictors (e.g., national policies, resource allocation, language regimes) and curricular contexts shape these relationships.
8. Conclusion
The results of these studies highlight the consistent and meaningful relationship between FL learning and mathematical achievement. By showing that the association between FLMINS and maths performance is robust across diverse contexts and driven by both individual and institutional factors, these findings reinforce the importance of providing access to high-quality FL education. Ultimately, the integration of FL instruction, particularly when accompanied by a multicultural curriculum, may present a promising strategy for supporting students’ broader academic development and potentially mitigating contextual disparities in achievement.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S1366728926101138.
Data availability statement
The data that support the findings of this study are openly available in the OECD’s PISA 2018 Database (https://www.oecd.org/en/data/datasets/pisa-2018-database.html). The methodology details, along with the R code used for all models, are also available in the Supplementary Material (Appendices S3 and S4).
Acknowledgments
We would like to thank the two anonymous reviewers for their contributions to the previous version of the manuscript.
Competing interests
The authors declare none.
Disclosure of generative AI use
Generative artificial intelligence (AI) tools were used to assist with the troubleshooting of the R code for data analysis. Specifically, OpenAI’s ChatGPT-4 model (April 2024 version) was utilised. No AI tools were employed in the generation, interpretation, or critical analysis of the manuscript’s scientific content. The authors accept full responsibility for the accuracy, integrity, and validity of all aspects of the work.