1. Introduction
The Kyoto Protocol (KP), an international environmental accord adopted in 1997 as an extension of the United Nations Framework Convention on Climate Change (UNFCCC), sought to combat global climate change by establishing binding emission reduction targets – an important indicator of the circular economy – for developed nations. It officially took effect in 2005 after Russia ratified the protocol, which fulfilled the prerequisite that a minimum of 55 countries emitting at least 55 per cent of global greenhouse gas (GHG) emissions had ratified the treaty. With a first commitment period that ran from 2008 to 2012 under an average reduction target of 5 per cent compared to 1990 levels (UNFCCC, 1998), the KP had a substantial impact on worldwide endeavours to mitigate climate change (Aichele and Felbermayr, 2012; Grunewald and Martinez-Zarzoso, 2016). For instance, the UNFCCC reported that during the first commitment period, the emissions of the largest 37 polluters declined by 22 per cent, far exceeding the initial target (UNFCCC, 2020). Consequently, in December 2012, the Doha Amendment (DA) established a second commitment period (2013–2020) for the KP’s emissions reduction, this time with a targeted reduction of 18 per cent compared to 1990 levels. The DA formally entered into force on 31 December 2020 (UNFCCC, 2020, 2022).
There is a rich body of literature examining the effects of the KP on emissions (e.g., Grunewald and Martinez-Zarzoso, 2016; Maamoun, 2019; Kim et al., 2020; Fernando and McWhinnie, 2025). Maamoun (2019) used the generalized synthetic control method to examine the effectiveness of the KP under a ‘No-Kyoto’ counterfactual scenario and found that the KP has been successful in reducing the GHG emissions of the industrialized countries when compared to the counterfactual case. Kim et al. (2020) employed the difference-in-difference (DID) method and propensity score matching to examine the marginal benefit of the KP during 1997–2008 in terms of emissions reduction and economic development. Their results show that the KP had a significant positive impact on carbon dioxide (CO2) reduction but a negative impact on GDP in the long run (Kim et al., 2020). Using a smaller sample focused on 12 EU countries, however, Albulescu et al. (2020) found no evidence for the impact of the KP on the reduction of pollution levels in those countries. Such studies contributed to the environmental Kuznets curve (EKC) literature arguing that economic growth may be harmful to the environment in the short term, but it can improve the environment in the long term (Grossman and Krueger, 1995; Mikayilov et al., 2019).
Nevertheless, most of those studies employed the DID method because it allows for the comparison of two groups of countries (e.g., those that ratified the KP versus those that did not) around the event (e.g., before versus after the KP ratification) (Callaway and Sant'Anna, 2020; Callaway and Li, 2023). However, traditional DID assumes that such an event is a one-off, e.g., all treated countries ratified the KP in the same year. When ratification times differ across the treated countries (e.g., China and New Zealand ratified the KP in 2002 while Australia accepted it in 2007), heterogeneity issues may arise (Goodman-Bacon, 2021). Specifically, challenges arise when using early-treated samples as controls for late-treated samples without addressing the heterogeneity stemming from the effects of the relevant environmental policies in early-treated countries (Xu et al., 2024).
On the other hand, empirical studies on the DA are limited. One main reason is that the DA has been replaced by the Paris Agreement and thus more attention is focused on the latter (Wirth, 2017; He et al., 2022). Another reason is the data availability issue (Depledge, 2022), which might prevent researchers from analysing this event in greater depth. For instance, as discussed later in Section 4.1.1, the parallel trend assumption for DID is violated if the DA is examined independently. As such, a novel DID approach is needed.
We argue that the inconclusive results of DID studies on the impacts of the KP and DA on emissions may be due to the heterogeneity issue arising from policy staggering, i.e., countries ratified and implemented the KP and DA at different times. Therefore, we aim to bridge this research gap by providing fresh evidence on the impacts of the KP and DA on emissions reduction using novel staggered DID approaches (Goodman-Bacon, 2021; Sun and Abraham, 2021). More importantly, such advanced techniques allow us to examine both the KP and DA under a unified framework rather than independently from each other; this study is the first to do so.
Consequently, the contribution of this paper is threefold. Firstly, to the best of our knowledge, it is the first staggered DID study on the impacts of international environmental agreements such as the KP and DA. So far, the limited literature on staggered DID has focused more on emissions at the firm level (Yu et al., 2022; Wu et al., 2023) but not at country or international levels. Secondly, it is also the first to examine both the KP and DA in a unified framework; previous studies mostly focused on the KP while the DA received less attention due to its recent inception. Thirdly, we not only account for the heterogeneity arising from different ratification times of different countries regarding the KP and DA but also robustly examine those impacts under different time frames (e.g., 1 year before or 3 years after the event). In this sense, our results can provide a comprehensive picture of the impacts, helping to reveal the real effect of the two important international environmental agreements. Together, these contributions ensure that our study not only fills a gap in the literature but also provides policy-relevant insights on the effectiveness of international agreements, even those whose impacts unfold over long periods. Understanding these agreements’ effectiveness provides valuable lessons for international climate policy and bridges the gap between the Kyoto era and later accords such as the Paris Agreement.
The rest of the paper is structured as follows. Section 2 reviews the relevant literature on both emissions reduction and the impacts of international environmental agreements including the KP and DA. Section 3 introduces the empirical approach and the data used in the analysis. Section 4 reports and discusses the empirical results, while Section 5 concludes our study.
2. Literature review
2.1. Emissions and growth: the EKC hypothesis
It is accepted that there exists a trade-off between economic development and other dimensions of well-being, such as equality, welfare, and environmental quality (Ngo et al., 2022; Ping and Shah, 2023); such a trade-off is normally examined under the EKC hypothesis. Arguably, the EKC framework was first introduced by Grossman and Krueger (1991) in their seminal study on air pollution and per capita income using cross-sectional data from 42 countries. They argued that in the early stages of development, countries prioritize economic growth over environmental concerns, leading to increased pollution levels. However, as incomes rise and societies become wealthier, environmental awareness and demand for cleaner technologies increase, resulting in a decline in pollution (Grossman and Krueger, 1995). This implies that in the short term, economic growth may harm the environment, but over the long term, it may lead to environmental improvement (Bimonte and Stabile, 2017; Mikayilov et al., 2019). This is an important argument supporting the development of the circular economy (Ahmad et al., 2021). Accordingly, it is important to examine not only the linear (Footnote 1) but also the non-linear (Footnote 2) relationships between socio-economic-demographic factors, such as GDP and population, and emissions.
Numerous studies have examined the EKC hypothesis across various pollutants, including air and water pollution, deforestation and carbon emissions (e.g., Stern, 2004; Baiardi, 2014; Jaeger et al., 2023). While the EKC concept has garnered significant attention, the empirical evidence supporting the existence of an EKC is mixed (Bimonte and Stabile, 2017; Ongan et al., 2019). Several factors contribute to this situation.
Firstly, the choice of environmental indicators and economic variables can influence the shape and existence of the EKC. Different pollutants and environmental indicators may exhibit distinct relationships with economic development. For instance, some studies have found evidence of an EKC for local pollutants such as sulphur oxide (Baiardi, 2014; Wang et al., 2016) and for CO2 (Ozturk and Acaravci, 2013; Chan and Wong, 2020), while others have observed a monotonically increasing relationship (Hussain and Dogan, 2021) or even no evidence of the EKC (Mikayilov et al., 2019). Nevertheless, since CO2 accounts for nearly 80 per cent of the GHGs (EPA, 2021), most studies have used CO2 as the main variable of interest (Yang et al., 2021; Khalfaoui et al., 2023).
Secondly, the choice of econometric techniques and model specifications can influence the results because each approach has its advantages and limitations. Panel data analysis is commonly used to control for unobserved country-specific factors and potential endogeneity (Ngo et al., 2022). However, it is subject to model specification (Footnote 3) and identification challenges, and the results can be sensitive to the inclusion of additional control variables (more discussions are provided in the meta-analysis of Saqib and Benhmad (2021) on the EKC literature).
Thirdly, the heterogeneity across countries and regions can affect the EKC relationship. The EKC hypothesis assumes a universal pattern of environmental degradation and economic development (Stern, 2004; Jaeger et al., 2023). However, different countries may exhibit diverse institutional and cultural characteristics, as well as policies, that influence their environmental performance. Considering the variations in the acceptance and implementation of the KP and DA at a global scale, such variations should be treated as an important factor when examining the EKC hypothesis.
Nevertheless, the EKC hypothesis is a robust theoretical framework to examine the relationship between the environment and economic development as well as other determinants such as institutions and regulations (Ngo et al., 2022; Jaeger et al., 2023; Khalfaoui et al., 2023). We, therefore, posit that it is important to extend the EKC hypothesis to examine CO2 emissions using a panel data analysis while addressing the heterogeneity issue with proper tools such as the staggered DID approach. In addition, with the help of staggered DID, we can also examine the combined effects of multiple regulations and agreements such as the KP and DA simultaneously.
2.2. The role of international environmental agreements
There are multiple international environmental agreements (Footnote 4), some predating the KP, such as the Stockholm Conference in 1972, the Vienna Convention in 1985 and its resulting Montreal Protocol in 1987, and the Basel Convention in 1989 (Mitchell et al., 2020; Shelton, 2021). The literature on international environmental agreements, although more focused on the KP and DA because of their global scale, climate change focus and binding emission reduction commitments (OECD, 2001; Keong, 2021), has highlighted two important aspects, as follows.
Firstly, most studies agree that those agreements play an important role in improving the quality of the environment. For instance, it is reported that the Basel Convention has helped raise awareness, enhance transparency and strengthen enforcement mechanisms, resulting in a reduction of illegal and uncontrolled transboundary movements of hazardous waste (UNEP, 2015). Similarly, the Montreal Protocol is deemed highly effective in achieving its dual objectives of phasing out the production and consumption of ozone-depleting substances and, consequently, protecting the ozone layer (Velders et al., 2007; Gonzalez et al., 2015). For the KP, Aichele and Felbermayr (2012) found empirical evidence of a 7 per cent reduction in domestic emissions in 40 committed countries (1995–2007). A similar result of a 7 per cent reduction was also found in the study of Maamoun (2019), this time for 153 countries (1995–2012). However, there are also studies with contradictory results on the effectiveness of such agreements (Victor, 2011; Helm, 2012).
Secondly, the number of studies on the DA is relatively limited due to its recent adoption, and the focus on its implementation and effectiveness is still emerging. Consequently, the effectiveness of the DA in achieving emissions reductions remains uncertain, and a key issue with the DA is the level of ambition and compliance with emission reduction targets. For instance, because large emitters such as Japan, Russia and the US have not ratified the amendment, it is questionable whether any emissions reduction, if it happens, can be attributed to the DA (Wirth, 2017; Verweij, 2023). A study by Harmsen (2018) found that the DA had the potential to lead to additional emissions reductions compared to a scenario without the amendment; however, it would depend on the level of participation and ambition. To this end, a comprehensive study that empirically examines the effectiveness of both the KP and DA can help clarify this blurry picture.
3. Research design
3.1. Data
We follow the EKC hypothesis (e.g., Grunewald and Martinez-Zarzoso, 2016; Ngo et al., 2022) to collect global data on CO2 emissions, GDP per capita, and population for 197 countries during the 1990–2020 period utilizing the World Development Indicators database (World Bank, 2021). It should be noted that all countries have ratified the KP with Fiji and Maldives being among the earliest ones to ratify it in 1998 and Afghanistan being the latest to accept it in 2013 (see illustrations in the Appendix, Figure A1).
Accordingly, after removing missing observations, we ended up with unbalanced panel data of up to 5,859 country-year observations. In summary, an average country in the sample has a population of about 35 million, and each person has an income of $11,741.28 at constant 2015 US$ prices (see more details in the Appendix, Table A1). To account for different measures of CO2 emissions, we examine the amount in kg per unit of GDP (i.e., $\mathrm{CO_2}$), its total equivalent weight in kilotons (i.e., $\mathrm{CO_{2a}}$) and its value in metric tons per capita (i.e., $\mathrm{CO_{2b}}$); such inclusion helps improve the robustness of our results (more details are provided in Section 4). It is noted that the correlations between these variables are all lower than 0.2, suggesting that they are only weakly related and can be used in our DID estimations.
3.2. Estimation strategy
DID is a widely used tool in applied economic research for assessing the impacts of public interventions and other treatments of interest on relevant outcome variables (Abadie, 2005). Many public policies, including international agreements, are adopted at different times across units. In such staggered designs, our target is a set of group-time average treatment effects (ATEs) that compare outcomes for a cohort of adopters at a given event time to outcomes for appropriate controls that have not yet adopted (or never adopted) in the same calendar time. Standard two-way fixed effect (TWFE) regressions can produce unintuitive or even negative weights when treatment effects vary over time or across cohorts, because already-treated units may implicitly act as controls. We therefore rely on a modern estimator that is robust to such heterogeneity: the imputation/event-study approach of Borusyak et al. (2024), which first fits untreated potential outcomes using only untreated observations (with fixed effects) and then differences observed outcomes from their imputed counterfactuals to obtain dynamic effects by event time. Identification rests on (i) parallel trends in untreated outcomes, (ii) no anticipation (or limited anticipation), (iii) no interference across units and (iv) overlap so that not-yet-treated units exist to serve as controls at each event time. In our context, ratification and entry-into-force dates yield absorbing treatment with clear timing; we assess parallel trends visually using pre-treatment event-time leads and discuss possible cross-border spillovers (e.g., trade-related leakage) as a limitation. This design aligns with best-practice recommendations for staggered adoption and avoids the weighting pathologies of traditional TWFE event studies (Goodman-Bacon, 2021; Sun and Abraham, 2021).
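The imputation logic described above can be illustrated with a minimal sketch. The toy panel is entirely hypothetical (four countries, six years, two never-treated countries and a constant effect of -0.09 log points), covariates such as GDP and population are omitted for brevity, and the function name `imputation_att` is an illustrative label; this is not the implementation behind the results reported in this paper.

```python
import numpy as np

def imputation_att(y, unit, time, treated):
    """Average effect on the treated via fixed-effects imputation.

    y, unit, time are 1-D arrays (unit and time coded 0, 1, 2, ...);
    treated is a boolean array marking treated country-years.
    """
    n = len(y)
    n_units, n_years = unit.max() + 1, time.max() + 1
    # One dummy per country and one per year (two-way fixed effects).
    X = np.zeros((n, n_units + n_years))
    X[np.arange(n), unit] = 1.0
    X[np.arange(n), n_units + time] = 1.0
    # Step 1: fit the fixed effects on untreated observations only.
    coef, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)
    # Step 2: impute Y(0) for treated observations and take the gap.
    tau_it = y[treated] - X[treated] @ coef
    # Step 3: average the observation-level effects (equal weights here).
    return tau_it.mean(), tau_it

# Hypothetical toy panel: adoption years differ; 99 means never treated.
adopt = np.array([2, 4, 99, 99])
n_units, n_years = 4, 6
unit = np.repeat(np.arange(n_units), n_years)
time = np.tile(np.arange(n_years), n_units)
treated = time >= adopt[unit]

alpha = np.array([1.0, 2.0, 0.5, 1.5])   # country effects (assumed)
lam = 0.1 * np.arange(n_years)           # common year effects (assumed)
y = alpha[unit] + lam[time] - 0.09 * treated

att, tau_it = imputation_att(y, unit, time, treated)
print(round(att, 4))
```

Because the toy outcome contains no noise, the imputed counterfactuals are exact and the estimate recovers the assumed effect; with real data the same steps apply, but inference would require standard errors clustered at the country level.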
In this study, we applied a DID imputation estimator with staggered adoption of treatment, as countries accepted the KP and DA at different times, following the approach of von Bismarck-Osten et al. (2022), Nguyen et al. (2022) and Borusyak et al. (2024). While DID is popularly applied at the micro-level, e.g., firms and households (Bach et al., 2025; Tran et al., 2025), recent DID applications have also been extended to macro-level studies involving regions, states and countries using annual and aggregated data (Malesky et al., 2014; Nguyen et al., 2022; Wooldridge, 2025), including ones on the KP (Kim et al., 2020; Fernando and McWhinnie, 2025). It is noted that the conventional TWFE regression analysis in event studies has its limitations, particularly when dealing with heterogeneous treatment effects. These effects occur when units within the panel are subjected to varying treatment timings, creating an analytical challenge (Goodman-Bacon, 2021). Imbens and Angrist's (1994) seminal work posited that the estimator for ATE can be viewed as a weighted average. Their application of a monotonicity condition (either non-increasing or non-decreasing) guarantees the exclusion of negative weights.
From an econometric perspective, we can comprehend the conventional TWFE estimate under staggered adoption as a weighted mean of two-period DIDs, which we refer to as the variance-weighted ATE on the treated (Callaway and Sant'Anna, 2020; de Chaisemartin and D'Haultfœuille, 2020; von Bismarck-Osten et al., 2022). In his comprehensive analysis, Goodman-Bacon (2021) elucidated the circumstances under which negative weights might arise: units already subjected to treatment serve concurrently as control units while their treatment effects change over time. This phenomenon of negative weighting is therefore predominantly associated with time-variant treatment effects. Such an occurrence can potentially skew the regression DID estimates, typically causing a bias in the opposite direction to that of the authentic treatment effect (Goodman-Bacon, 2021).
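The negative-weighting problem can be reproduced by hand in the smallest staggered design. The sketch below assumes a hypothetical two-country, three-year panel with no never-treated group and effects that grow with time since adoption. It relies on the standard result that, in a balanced panel, the TWFE coefficient is a weighted sum of treated-cell effects, with weights proportional to the double-demeaned treatment indicator.

```python
import numpy as np

def double_demean(x):
    """Remove row (country) and column (year) means from a balanced panel."""
    return (x - x.mean(axis=1, keepdims=True)
              - x.mean(axis=0, keepdims=True) + x.mean())

# Hypothetical design: rows are countries, columns are years.
# Country A adopts in year 1, country B adopts in year 2.
D = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
# Assumed effects that grow with time since adoption.
tau = np.array([[0.0, 1.0, 2.0],
                [0.0, 0.0, 1.0]])
# Outcome = country effect + year effect + treatment effect (no noise).
Y = np.array([[1.0], [3.0]]) + np.array([0.0, 0.5, 1.0]) + tau * D

resid = double_demean(D)
twfe = (resid * Y).sum() / (resid ** 2).sum()   # TWFE coefficient on D
weights = resid[D == 1] / (resid ** 2).sum()    # implicit weight per treated cell
true_att = tau[D == 1].mean()

print(weights)          # cells ordered A-year1, A-year2, B-year2
print(twfe, true_att)
```

In this design the already-treated country A serves as the control for B in year 2, so A's year-2 effect enters with a weight of -0.5 and the TWFE coefficient (0.5) falls well below the average effect on treated cells (about 1.33).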
The imputation estimator as conceived by von Bismarck-Osten et al. (2022) and Borusyak et al. (2024) represents an exemplary model within the developing landscape of DID estimators for staggered settings. This estimator is robust to unrestricted heterogeneity in treatment effects while, under its maintained assumptions, remaining efficient and unbiased in finite samples. These are desirable qualities in contemporary econometric analysis. Additionally, Cunningham (2021) endorses the staggered DID methodology as an effective and straightforward resolution to the issues of negative weights associated with the conventional TWFE DID estimator. While we also acknowledge the existence of alternative methods such as the two-stage DID, spatial DID and synthetic control DID (Arkhangelsky et al., 2021; Butts and Gardner, 2022), the main purpose of our study is to examine the impacts of both the KP and DA under a unified framework and thus the staggered DID approach of Borusyak et al. (2024) is utilized in this study.
Specifically, the procedures for imputation estimation of the staggered DID unfold in a sequence of three steps. Initially, our task is to formulate an estimate for potential outcomes in the absence of treatment – namely, entities that have never been treated or are yet to undergo treatment. This is achieved by employing exclusively untreated observations within a TWFE regression framework of
The theoretical construction of Equation (1) suggests an evaluation of the KP's (or DA's) influence on $\mathrm{CO_2}$ emissions (in kg per GDP), primarily considering economic output (GDP) and demographic trends (POP) as chief drivers of these emissions. Note that we also examine another two variations of CO2, which are the total equivalent weight in kilotons (i.e., $\mathrm{CO_{2a}}$) and its value in metric tons per capita (i.e., $\mathrm{CO_{2b}}$) (see Table 1 in Section 4.2). As such, Equation (1) closely follows Harbaugh et al. (2002) and Jaeger et al. (2023) to identify the (inverted) U-shape relationship between economic growth and the environment, i.e., the EKC hypothesis (Grossman and Krueger, 1995; Stern, 2004; Ongan et al., 2019; Khalfaoui et al., 2023). Because international agreements act through policy and technology channels, conditioning on post-treatment variables (e.g., domestic carbon pricing or renewable shares that change due to the treaty) would bias total-effect estimates downward. Our design targets the total effect; country fixed effects absorb stable capacity differences; year fixed effects absorb common shocks; and EKC-motivated income/population terms capture baseline growth–emissions dynamics. In the second step, for the treated observations, we calculate the treatment effects $\tau_{it}$ relative to $\mathrm{CO}_{2,it}(0)$, which is the untreated potential outcome:
In the final step, we compute the averages of these individually determined, imputation-based treatment effects, denoted as $\tau_w$. This value offers insight into how countries reduce CO2 emissions after accepting the KP or DA. Note that this approach allows us to examine the consequent effects of both the KP and DA, which has never been done before. We believe that it is an important contribution of this study to the literature.
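The three steps can be summarised compactly. The display below is a sketch under assumed notation: the fixed-effect symbols $\alpha_i$ and $\lambda_t$, the coefficients $\beta_1$ to $\beta_3$, the weights $w_{it}$ and the set of treated observations $\Omega_1$ are illustrative labels, and the squared GDP term is inferred from the inverted-U discussion rather than taken from a displayed equation.

```latex
\begin{align*}
\text{Step 1 (untreated observations only):}\quad
  & \mathrm{CO}_{2,it}(0) = \alpha_i + \lambda_t + \beta_1\,\mathrm{GDP}_{it}
    + \beta_2\,\mathrm{GDP}_{it}^{2} + \beta_3\,\mathrm{POP}_{it} + \varepsilon_{it} \\
\text{Step 2 (treated observations):}\quad
  & \hat{\tau}_{it} = \mathrm{CO}_{2,it} - \widehat{\mathrm{CO}}_{2,it}(0) \\
\text{Step 3 (aggregation):}\quad
  & \hat{\tau}_w = \sum_{(i,t)\in\Omega_1} w_{it}\,\hat{\tau}_{it},
    \qquad \sum_{(i,t)\in\Omega_1} w_{it} = 1
\end{align*}
```

With equal weights $w_{it} = 1/|\Omega_1|$, the final step reduces to a simple mean of the observation-level effects; restricting $\Omega_1$ to a given number of years after the event yields horizon-specific averages.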
4. Empirical results
4.1. Testing for the assumptions of DID
4.1.1. Parallel trend tests
Marcus and Sant'Anna (2021) stressed that the DID procedure relies on the assumption that the average outcome for the treated and untreated groups would have evolved in parallel in the absence of the treatment. Importantly, it restricts the average counterfactual outcome for treated units in the post-treatment period, assuming they had not received the treatment. However, it does not directly impose constraints on the outcome during the pre-treatment periods. According to Angrist and Pischke (2009) and Sant'Anna and Zhao (2020), this assumption is the hardest to fulfil in DID and thus it is a critical condition to ensure the internal validity of DID models.
To test this assumption, we employ the strategies proposed by Sun and Abraham (2021) and Borusyak et al. (2024). Essentially, this involves verifying that the difference in CO2 emissions between the treatment and control groups preceding the event dates is statistically insignificant. In particular, we run the regression in Equation (3) on the untreated sample to test for violations of the parallel trend assumption:
where $D_{it}$ stands for the set of indicators for observations recorded during certain periods preceding the treatment (when $t < 0$).
For Equation (3), we incorporate both country- and year-fixed effects in our analysis, coupled with clustering at the country level for enhanced robustness, following Abadie (2005). Here, periods occurring earlier than $t$ periods before the treatment act as the reference group. For instance, observations recorded within the $t$ periods before the treatment date are assigned a $D_{it}$ value of 1. Conversely, all other observations receive a $D_{it}$ value of 0. We estimate $\mu$ by using ordinary least squares on untreated observations only, followed by a joint null test (i.e., $\mu = 0$) using the Chi-squared test. In particular, if the p-value of the joint null test exceeds 0.1, we cannot reject the null and the parallel trend assumption is supported. In essence, this indicates that the difference between the treatment and control groups before the event date is statistically insignificant. While there is no universally accepted optimal choice for $t$, we adopt the approach of Nguyen et al. (2022), setting $t$ to three and supplementing this with an additional test for two periods before the implementation date. This strategy fortifies the robustness of our parallel trend test results; our findings suggest that it is appropriate to examine the KP and DA as staggered events, so that the post-event differences between the treatment and control groups can be attributed to those events (see the Appendix, Table A2).
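The mechanics of this pre-trend regression can be sketched as follows. The panel, adoption years and the function name `pretrend_coefficients` are hypothetical, covariates are omitted, and only the point estimates of $\mu$ are computed; the joint Chi-squared test with country-clustered standard errors is not implemented here.

```python
import numpy as np

def pretrend_coefficients(y, unit, time, adopt, n_leads):
    """OLS of y on country effects, year effects and lead dummies,
    using not-yet-treated and never-treated rows only; returns mu."""
    keep = time < adopt[unit]
    y, unit, time = y[keep], unit[keep], time[keep]
    years_to_event = adopt[unit] - time      # 1 = last year before adoption
    n = len(y)
    n_units, n_years = unit.max() + 1, time.max() + 1
    fe = np.zeros((n, n_units + n_years))
    fe[np.arange(n), unit] = 1.0
    fe[np.arange(n), n_units + time] = 1.0
    # One indicator per lead; earlier periods form the reference group.
    leads = np.column_stack(
        [(years_to_event == k).astype(float) for k in range(1, n_leads + 1)]
    )
    coef, *_ = np.linalg.lstsq(np.hstack([fe, leads]), y, rcond=None)
    return coef[-n_leads:]

# Hypothetical panel: 99 marks never-treated countries.
adopt = np.array([4, 6, 99, 99])
n_units, n_years = 4, 8
unit = np.repeat(np.arange(n_units), n_years)
time = np.tile(np.arange(n_years), n_units)
base = np.array([1.0, 2.0, 0.5, 1.5])[unit] + 0.1 * time

# Case 1: parallel trends hold, so every lead coefficient is zero.
mu_parallel = pretrend_coefficients(base, unit, time, adopt, n_leads=3)

# Case 2: emissions already dip one year before adoption (assumed drift).
drift = -0.05 * (adopt[unit] - time == 1)
mu_violated = pretrend_coefficients(base + drift, unit, time, adopt, n_leads=3)

print(np.round(mu_parallel, 4), np.round(mu_violated, 4))
```

In the second case the coefficient on the first lead picks up the assumed drift, which is the kind of pattern the joint test is designed to flag.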
4.1.2. Anticipation tests
For the investigation of anticipation effects, we employ a methodology similar to that of von Bismarck-Osten et al. (2022), wherein we scrutinize placebo effects by shifting the actual treatment year for each observational unit backwards by a period of 3 years. After this temporal recalibration, we assess the ATE for the periods spanning 2 and 3 years preceding the official event year. A lack of statistical significance in these estimates would imply the absence of anticipation effects. In other words, this would suggest that the countries in the study do not alter their behaviour in anticipation of the upcoming event, thereby supporting the exogeneity of the treatment (Nguyen et al., 2022). For our study, the joint null ATEs of 2 years and 3 years before the event date are all insignificant, supporting the exogeneity of the KP and DA (see the Appendix, Table A3).
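A placebo exercise of this kind can be sketched on a toy panel. Everything below is hypothetical (adoption years, effect sizes and the helper `impute_gaps`), covariates are omitted, and significance testing is left out; the sketch only shows how shifting the event date back by three years produces placebo effects that are zero without anticipation and non-zero with it.

```python
import numpy as np

def impute_gaps(y, unit, time, flagged):
    """Fit country and year effects on unflagged rows; return observed
    minus imputed outcomes for the flagged rows."""
    n = len(y)
    n_units, n_years = unit.max() + 1, time.max() + 1
    X = np.zeros((n, n_units + n_years))
    X[np.arange(n), unit] = 1.0
    X[np.arange(n), n_units + time] = 1.0
    coef, *_ = np.linalg.lstsq(X[~flagged], y[~flagged], rcond=None)
    return y[flagged] - X[flagged] @ coef

# Hypothetical panel: 99 marks never-treated countries.
adopt = np.array([5, 7, 99, 99])
n_units, n_years = 4, 10
unit = np.repeat(np.arange(n_units), n_years)
time = np.tile(np.arange(n_years), n_units)
y = (np.array([1.0, 2.0, 0.5, 1.5])[unit] + 0.1 * time
     - 0.09 * (time >= adopt[unit]))

# Drop truly treated rows, then pretend adoption happened 3 years earlier.
pre = time < adopt[unit]
placebo = time[pre] >= adopt[unit[pre]] - 3

# No anticipation: placebo effects should be zero.
placebo_att = impute_gaps(y[pre], unit[pre], time[pre], placebo).mean()

# Assumed anticipation: emissions fall by 0.03 in the year before adoption.
y_anticipated = y - 0.03 * (adopt[unit] - time == 1)
placebo_att_anticipated = impute_gaps(
    y_anticipated[pre], unit[pre], time[pre], placebo
).mean()

print(round(placebo_att, 4), round(placebo_att_anticipated, 4))
```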
4.2. Estimated results from staggered DID
Once the parallel trend assumption is substantiated, we proceed to scrutinize the disparities between nations that have either accepted (i.e., regarding the KP) or amended (i.e., regarding the DA) the two international agreements on CO2 emissions reduction. For a robust and thorough investigation, we utilize the imputation estimator in conjunction with the approach of Callaway and Li (2023); such a dual methodology helps provide a comprehensive analysis and robust results for our study (see Section 4.4).
The estimated results for $\tau_w$ in Table 1 generally suggest that the ratification of the KP significantly helped decrease CO2 emissions, while the impact of the DA was not significant. More specifically, the results indicate that a country that has pledged to reduce emissions, as stipulated by the KP, typically emits around 9 per cent less CO2 compared to a nation without such commitments. This result is broadly in line with the figure of 7 per cent reported by both Grunewald and Martinez-Zarzoso (2016) and Aichele and Felbermayr (2012); the small gap may be due to the differences in the methodologies. For instance, Grunewald and Martinez-Zarzoso (2016) used a TWFE DID and propensity score matching while Aichele and Felbermayr (2012) applied fixed effects IV regressions; our study employed the staggered DID. Nevertheless, this consistent result underscores the substantial role that international environmental agreements such as the KP can play in reducing global CO2 emissions.
Table 1. Analysis of CO2 emissions around the KP and DA

Notes: This table presents the estimated effects on the three dependent variables at 2 and 3 years after the event dates across the KP and DA members. $\tau_w$ denotes the imputation-based treatment effects (see Equation (2) and the relevant discussions). $\mathrm{CO_2}$ measures CO2 emissions in kg per 2015 US$ of GDP (in logarithm), $\mathrm{CO_{2a}}$ measures CO2 emissions in kilotons (in logarithm), and $\mathrm{CO_{2b}}$ measures CO2 emissions in metric tons per capita (in logarithm). p-Values are presented in parentheses.
Upon juxtaposing the results in column (1) against those in columns (2) and (3), it is further discernible that the estimated impact of the KP, when operationalized via emission intensity as the dependent variable, is marginally more pronounced than under the alternative measures. In essence, nations with Kyoto commitments demonstrate an average of 9.56 per cent lower emission intensity per GDP unit relative to their counterparts lacking such commitments. This differential in emission intensity may well constitute the mechanism through which Kyoto commitments exert influence on nations’ emissions. A plausible argument would posit that policy frameworks such as the KP spur technological change, which in turn brings about modifications in emission intensity (Grunewald and Martinez-Zarzoso, 2016).
For the insignificant ${{{\tau }}_w}$ in columns (4)–(6) of Table 1, one can argue that the effect of the DA was weaker than that of the KP because the DA is an extension of the KP; in other words, the marginal effect of the DA should be less apparent than that of the KP (see Footnote 5). More importantly, the KP and DA have their own historical contexts. For instance, the KP's immediate influence could be attributable to the unprecedented nature of such an accord at its inception, which stimulated nations to undertake immediate and potentially more accessible changes. In contrast, the adaptations required under the DA, as an extension, might involve more complex procedures that take longer to produce discernible effects. Furthermore, as discussed in Section 2.2, many major emitters (e.g., Japan and the US) did not participate in the DA, and thus the intensity of the emission reduction commitments of its participating countries was lower than under the KP. Moreover, the global geopolitical, economic, and technological contexts have shifted significantly since the implementation of the KP (Wirth, 2017; Verweij, 2023). These shifts can substantially influence the pace and scale at which the impact of such international agreements is realized, and they need to be examined further. Hence, while the current impact of the DA may not mirror the immediate effects witnessed with the KP, its potential for long-term, profound changes should not be underestimated. We leave this task for future research.
Table 1 also presents empirical results for the EKC variables. Population has a positive, significant and linear impact on CO2 emissions, while economic growth significantly influences CO2 emissions through an inverted U-shaped relationship. These findings reconfirm the existence of the EKC across the sampled countries, consistent with the previous literature (Albulescu et al., 2020; Ngo et al., 2022), this time using a new approach and new (combined) data covering both the KP and DA. Interestingly, the coefficients of POP are larger, and those of GDP and GDP2 smaller, during the KP commitment period than during the DA period. This indicates that when an agreement plays a more significant role (as the KP did), it can alter the relative roles of population and economic growth within the EKC framework. Therefore, we argue that effective environmental agreements such as the KP could alleviate the burden on countries to restrain their economic development; e.g., the positive impact of GDP on CO2 is 1.670 in column (2), smaller than the value of 1.923 in column (5). Instead, countries should pay more attention to the population factor in their efforts to monitor and reduce CO2 emissions.
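To make the inverted U-shape concrete, the following worked derivation shows how a turning point follows from the signs of the GDP terms. It assumes a log-quadratic EKC specification; the symbols $\beta_1$, $\beta_2$, $\beta_3$, the fixed effects $\alpha_i$ and $\gamma_t$, and the use of logged regressors are illustrative assumptions and may differ from the exact form of the model estimated in the paper.

$$
\ln CO2_{it} = \beta_1 \ln GDP_{it} + \beta_2 \left(\ln GDP_{it}\right)^2 + \beta_3 \ln POP_{it} + \alpha_i + \gamma_t + \varepsilon_{it}
$$

$$
\frac{\partial \ln CO2_{it}}{\partial \ln GDP_{it}} = \beta_1 + 2\beta_2 \ln GDP_{it} = 0
\quad\Longrightarrow\quad
\ln GDP^{*} = -\frac{\beta_1}{2\beta_2}
$$

Under this assumed form, an inverted U requires $\beta_1 > 0$ and $\beta_2 < 0$: emissions rise with income below $GDP^{*}$ and fall above it. A smaller positive $\beta_1$ for a given $\beta_2$ moves the turning point to a lower income level.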
4.3. Other evidence
We first provide some illustrations in Figure 1 to support the empirical evidence in Table 1. In particular, Figure 1 shows that before the event dates, the red nodes fluctuate around the x-axis in all six graphs, which supports our parallel trend assumption for both the KP and DA (see Section 4.1). Importantly, the top three graphs show that CO2 emissions followed a significant downward trend after the KP (i.e., the blue nodes), supporting the statistical results in Table 1, columns (1)–(3). In addition, the bottom three graphs show that the blue nodes fluctuate around the x-axis, supporting the insignificant results in Table 1 regarding the DA, i.e., columns (4)–(6). However, as discussed in Footnote 5, the graphs for the DA still reveal that (i) there is a decreasing trend in CO2 emissions after the DA, and (ii) that trend should become more significant in the long run.

Figure 1. The impacts of the KP and DA on CO2 emissions reduction.
We further test for the time-sensitive effects of the KP (and the subsequent DA) on CO2 emissions reduction, confirming that the magnitude of the reduction escalates over time. In particular, all three dependent variables (i.e., $lnCO2$, $lnCO2a$, and $lnCO2b$) demonstrate a consistent pattern of improvement in CO2 emissions reduction, in both significance levels and magnitudes. On average, the KP yields a reduction of approximately 5.05 per cent in CO2 emissions, and the average reduction reaches 9.56 per cent just 3 years after the KP (see the Appendix, Table A4). These findings highlight the substantive impact of the KP, especially considering the time-sensitive nature of policy effects. Consequently, they strengthen the case for longitudinal studies, especially of multiple events as in our paper, in order to accurately gauge the effectiveness and efficiency of these international accords.
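The horizon-specific effects discussed above can be illustrated with a minimal sketch of the imputation idea: fit unit and time fixed effects on untreated observations only, impute the untreated outcome for treated observations, and average the gaps by years since treatment. This is not the authors' code. The panel is made up and noise-free, the adoption dates in `FIRST_TREATED` are hypothetical, and the alternating-means fitting routine is a simple stand-in for a proper two-way fixed effects regression.

```python
from collections import defaultdict

# Hypothetical first-treatment periods; None marks a never-treated unit.
FIRST_TREATED = {0: 3, 1: 4, 2: None, 3: None}


def make_panel():
    """Made-up log outcome: unit effect + time trend + an effect growing with horizon."""
    unit_effect = {0: 1.0, 1: 0.5, 2: 0.2, 3: -0.3}
    rows = []
    for unit, g in FIRST_TREATED.items():
        for t in range(6):
            treated = g is not None and t >= g
            effect = -0.05 * (t - g + 1) if treated else 0.0
            rows.append((unit, t, unit_effect[unit] + 0.1 * t + effect, treated))
    return rows


def impute_effects(panel, n_iter=500):
    """Return {(unit, time): observed minus imputed untreated outcome} for treated rows."""
    untreated = [(i, t, y) for i, t, y, d in panel if not d]
    alpha = defaultdict(float)  # unit fixed effects
    gamma = defaultdict(float)  # time fixed effects
    for _ in range(n_iter):
        # Update unit effects given the current time effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for i, t, y in untreated:
            sums[i] += y - gamma[t]
            counts[i] += 1
        for i in sums:
            alpha[i] = sums[i] / counts[i]
        # Update time effects given the current unit effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for i, t, y in untreated:
            sums[t] += y - alpha[i]
            counts[t] += 1
        for t in sums:
            gamma[t] = sums[t] / counts[t]
    return {(i, t): y - (alpha[i] + gamma[t]) for i, t, y, d in panel if d}


def average_by_horizon(effects):
    """Average the imputed effects by years elapsed since first treatment."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (unit, t), tau in effects.items():
        h = t - FIRST_TREATED[unit]
        sums[h] += tau
        counts[h] += 1
    return {h: sums[h] / counts[h] for h in sorted(sums)}


by_horizon = average_by_horizon(impute_effects(make_panel()))
for h, tau in by_horizon.items():
    print(f"horizon {h}: average effect {tau:.3f} log points")
```

In this toy panel the built-in effect deepens by 0.05 log points per year, so the horizon averages recover a growing reduction, which mirrors the kind of escalating pattern described in the text.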
4.4. Robustness test: The average group-time treatment effect approach
An alternative resolution to the negative weight issue is to bypass unfavourable comparisons, as suggested by Callaway and Li (2023). This approach estimates all plausible ‘favourable’ comparisons for calculating the ATEs on the treated and then aggregates these estimates into a comprehensive summary of the results, thereby circumventing the issue of negative weights while maintaining robustness in the analysis (explanations are provided in Appendix B). The average group-time treatment effect results are consistent with our main findings in both statistical and illustrative aspects (see Figure A2 and Table A5 in the Appendix), confirming that our findings and the relevant discussions in the previous sections are robust.
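The logic of group-time treatment effects can be sketched as follows. Each cohort (units first treated in period g) is compared only with never-treated units, using the change in the outcome between period g - 1 and period t, and the resulting cells are then averaged. This is a simplified illustration under stated assumptions and not the procedure in Appendix B: the data are made up, the cohort dates are hypothetical, never-treated units are assumed to be the comparison group, and the aggregation simply weights cells by cohort size.

```python
# Hypothetical first-treatment periods; None marks a never-treated unit.
first_treated = {0: 3, 1: 4, 2: None, 3: None}
periods = range(6)

# Made-up log outcomes: unit effect + common trend + effect growing after treatment.
unit_effect = {0: 1.0, 1: 0.5, 2: 0.2, 3: -0.3}
outcomes = {
    u: {
        t: unit_effect[u] + 0.1 * t
        + (-0.05 * (t - g + 1) if g is not None and t >= g else 0.0)
        for t in periods
    }
    for u, g in first_treated.items()
}


def att_gt(outcomes, first_treated, g, t):
    """ATT for cohort g in period t, relative to never-treated units."""
    base = g - 1  # last pre-treatment period for this cohort
    cohort = [u for u, ft in first_treated.items() if ft == g]
    controls = [u for u, ft in first_treated.items() if ft is None]

    def mean_change(units):
        return sum(outcomes[u][t] - outcomes[u][base] for u in units) / len(units)

    return mean_change(cohort) - mean_change(controls)


def overall_att(outcomes, first_treated, periods):
    """Average ATT(g, t) over post-treatment cells, weighted by cohort size."""
    total, weight = 0.0, 0
    cohorts = sorted({ft for ft in first_treated.values() if ft is not None})
    for g in cohorts:
        size = sum(1 for ft in first_treated.values() if ft == g)
        for t in periods:
            if t >= g:
                total += size * att_gt(outcomes, first_treated, g, t)
                weight += size
    return total / weight


print(f"overall ATT: {overall_att(outcomes, first_treated, periods):.3f} log points")
```

Because already-treated units never serve as controls here, no comparison can receive a negative weight, which is the property that motivates this family of estimators.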
5. Conclusions
This study explored the impact of the KP (1998–2012) and its successor, the DA (2013–2020), on global CO2 emissions reduction. We employed the imputation and average group-time treatment effect DID methods for staggered settings to analyse both international environmental agreements within a unified framework, which, to our knowledge, has not been done before. Our analysis took into account the heterogeneity resulting from countries ratifying the agreements at different times and examined the impacts across various time frames, such as 1 year before and 3 years after the events. This approach allowed us to assess the effects of these agreements along several dimensions.
Empirically, we found that ratification of the KP has significantly contributed to a decrease in CO2 emissions, corroborating previous studies by Aichele and Felbermayr (2012) and Grunewald and Martinez-Zarzoso (2016), among others. Importantly, for the first time in the literature, this confirmation is derived from an approach that examines the KP and DA together. Although we do not find statistical evidence of an impact of the DA, which may be attributed to a timeliness issue, the evidence for this agreement is expected to improve in the coming years. It is also important to note that the DA is an extension of the KP but without some major emitters such as Japan and the US, so its marginal effect may be less apparent than that of its predecessor. Furthermore, the two agreements have distinct historical contexts: the KP prompted primarily short-term changes, while the DA entails more complex procedures that require a longer period to produce discernible effects. Therefore, although the immediate impact may not mirror that of the KP, the potential for significant and lasting changes resulting from the DA should not be underestimated. In addition, our results confirmed the EKC hypothesis at the global level and the role of international environmental agreements such as the KP in this context. This implies that strengthening international cooperation across countries, especially among heavy emitters, is important for the success of those agreements. Interestingly, the results of this study also indicate that effective environmental agreements (e.g., the KP) could alleviate the burden on countries to restrain their economic development in order to reduce pollution.
Other limitations and future research directions are as follows. Firstly, future studies with more data on the DA (when available) will help explain the impact of this agreement. Secondly, because our study focused on extending the EKC framework with the staggered DID to examine the impacts of both the KP and DA in a unified framework, our EKC model is simple, with only the three core variables of CO2, GDP, and POP; although we have accounted for country- and year-fixed effects, omitted variable bias may remain. Future studies could address this issue by incorporating more control variables such as geographical, regulatory, economic and institutional factors, or by using subgroup analysis (Hussain and Dogan, 2021; Ngo et al., 2022; Gao et al., 2025). In a similar vein, our EKC model could be extended to different emissions and pollutants as well as different functional forms (e.g., with cubic or quartic polynomial terms). Lastly, one could also apply alternative approaches such as the two-stage DID (Butts and Gardner, 2022), synthetic control DID (Arkhangelsky et al., 2021), generalized method of moments DID (Brown and Butts, 2025) and chained DID (Bellégo et al., 2025) to verify and extend our results.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1355770X26100503.
Competing interests
The authors declare no conflict of interest.