Change-Point Detection and Regularization in Time Series Cross-Sectional Data Analysis

Abstract Researchers of time series cross-sectional data regularly face the change-point problem, which requires them to discern between significant parametric shifts that can be deemed structural changes and minor parametric shifts that must be considered noise. In this paper, we develop a general Bayesian method for change-point detection in high-dimensional data and present its application in the context of the fixed-effect model. Our proposed method, hidden Markov Bayesian bridge model, jointly estimates high-dimensional regime-specific parameters and hidden regime transitions in a unified way. We apply our method to Alvarez, Garrett, and Lange’s (1991, American Political Science Review 85, 539–556) study of the relationship between government partisanship and economic growth and Allee and Scalera’s (2012, International Organization 66, 243–276) study of membership effects in international organizations. In both applications, we found that the proposed method successfully identify substantively meaningful temporal heterogeneity in parameters of regression models.


Introduction
Many of the datasets encountered by political scientists in applied research take the form of repeated observations from a fixed number of subjects, the most well-known of which is time series cross-sectional (TSCS) data. When examining these time series data, researchers frequently run into the change-point problem because an underlying theory predicts that the effect of some variables will change or because researchers are concerned that unknown breaks will result in model misspecification and omitted variable bias. In any scenario, identifying change points in regression coefficients has a significant impact on substantive findings.
Detecting change points in regression parameters requires researchers to distinguish between major parametric shifts that can be interpreted as structural changes and minor parametric shifts that must be interpreted as noise. There are two statistical challenges that arise when attempting to identify major change points. First, two "unknowns" in the change-point problem (change points and regime-dependent parameters) must be jointly estimated. Separate estimates, such as data segmentation and separate model fitting, introduce the danger of overfitting or incorrect data splitting. Second, due to the possibility of rank deficiency in subsample data, parameter regularization must be considered in conjunction with the joint estimation of change points and regime-dependent parameters.
In this paper, we propose a new Bayesian method for joint estimation of change points and regime-specific regression parameters in high-dimensional data. Our proposed method combines Bayesian methods for parameter regularization, change-point detection, and variable selection. We first introduce the hidden Markov Bayesian bridge model (HMBB), which combines a Bayesian bridge model for parameter regularization with a hidden Markov model for multiple change-point detection. In this paper, we present HMBB in the context of TSCS data because TSCS data have been the core of dynamic model development in political science literature (Beck 2001;Beck et al. 1993;Beck and Katz 1995;Beck and Katz 2011;Box-Steffensmeier et al. 2014;Brandt and Freeman 2006;Hazlett and Wainstein 2022;Imai and Kim 2021;Pang, Liu, and Xu 2022;Western and Kleykamp 2004;Wucherpfennig et al. 2021).
Having said that, our work closely follows the development of change-point models in political science, economics, and statistics. In political science, Beck (1983) introduced the idea of change points as a special case of time-varying parameter models with a sharp change. Afterward, Western and Kleykamp (2004) elaborated on the benefits of using a Bayesian modeling method to the change-point problem. According to Western and Kleykamp, the Bayesian approach "combines the advantages of diagnostic and parametric approaches but addresses their limitations. . . .Like diagnostic methods, the Bayesian analysis treats the timing of change as uncertain and the location of a change point as a parameter to be estimated. . . .Like parametric models, the Bayesian model yields statistical inferences about regression coefficients. However, these inferences reflect prior uncertainty about the location of the change point that is unaccounted for in conventional models" (355). After Western and Kleykamp (2004), Spirling (2007b) developed the concept of Bayesian change-point modeling in the setting of a limited dependent variable based on Carlin, Gelfand, and Smith (1992). On the other hand, Park (2010Park ( , 2011bPark ( , 2012) extended Chib's (1998) multiple-change-point model to binary, ordinal, count, and panel data cases. Blackwell (2018) relaxed the restriction of the fixed change-point numbers in over-dispersed count data models by employing Fox et al.'s (2011) hierarchical Dirichlet process approach. Furthermore, Kent, Wilson, and Cranmer (2022) presented a change-point detection method using a permutation-based parameter distribution. 1 We will discuss the change-point problem in regression models with a large number of predictors in the sections that follow. Following that, we provide a fixed-effects HMBB for TSCS data and describe our proposed estimation and model diagnostic procedures. We demonstrate the proposed method's performance on simulated data. Then, we apply our method to Alvarez, Garrett, and Lange's (1991) study of the relationship between government partisanship and economic growth, as well as Allee and Scalera's (2012) study of membership effects in international organizations (IOs). Our proposed method is freely available as an open-source R package BridgeChange (https://github.com/jongheepark/BridgeChange).

Problem
The change-point problem in regression models with a large number of predictors is illustrated in Figure 1. The difficulty of detecting substantial from minor changes in time-varying characteristics is illustrated in panel (a). The ground truth shows two major shifts (vertical gray bars) in two groups of parameters (A and C). All parameters, however, have changed slightly. The number and location of major changes, as well as regime-specific parameter values, are the two main quantities of importance in the change-point analysis. We need a principled method to identify major changes (vertical gray bars) from minor changes (local fluctuations). It is also important to note that the regime shifts in panel (a) are abrupt, but not deterministic. Separate regression after data splitting (or using a period dummy regression model) does not adequately represent the data generating process with stochastic regime transitions.
Panel (b) in Figure 1 reveals a case of rank deficiency in the change-point analysis even when the pooled data have full rank. Despite the fact that the entire sample has N > K , where N denotes the number of observations and K denotes the number of predictors, one of the subsamples identified In order to address the problems in Figure 1, we need a statistical method that combines change-point models with high-dimensional regression models. Recently, there has been a surge of high-dimensional change-point detection methods in frequentist approaches (e.g., Chan, Yau, and Zhang 2014;Frick, Munk, and Sieling 2014;Lee et al. 2018;Lee, Seo, and Shin 2016). Most of these methods focus on simple cases of high-dimensional change-point problems in which only a small subset of parameters are time-varying or the case under consideration is limited to a singlebreak case.
Our strategy is to take a full advantage of recent developments in regularization methods in the statistics literature (Carvalho, Polson, and Scott 2010;Chernozhukov et al. 2017;Fan and Li 2001;Hoerl and Kennard 1970;Park and Casella 2008;Polson 2012;Tibshirani 1996;Tibshirani et al. 2004;Zou and Hastie 2005). In particular, we emphasize that a Bayesian shrinkage approach to highdimensional regression provides an effective framework for regularizing high-dimensional model parameters while also providing a credible measure of estimation uncertainty (Kyung et al. 2010;Polson and Scott 2010).

Method
We introduce our proposed method for examining change-point effects in regression models with a large number of predictors in this section. An example procedure for implementing the proposed method is as follows: 1. model specification based on a theory and available data, 2. model fitting using multiple HMBBs with a varying number of break points, 3. model diagnostic using the Watanabe-Akaike Information Criterion (WAIC), 4. posterior summary of hidden state transitions and time-varying parameters.

Bridge Estimator
We begin with the Bridge estimator for parameter regularization (Frank and Friedman 1993;Fu 1998). The Bridge estimator is motivated by the penalized likelihood formulation shown below: where 0 < α ≤ 2 (Frank and Friedman 1993;Fu 1998). The above formula has an interesting feature in that the popular lasso estimator and ridge regression can be obtained as special cases of this estimator when α = 1 and α = 2, respectively. The variable selection feature of the bridge regression is asymptotic when 0 < α ≤ 1 (Frank and Friedman 1993;Huang, Horowitz, and Ma 2008). Because of this generality, Equation (1) has garnered an increasing attention in the statistical literature (Armagan 2009;Fan and Li 2001;Huang et al. 2009;Huang et al. 2008;Liu et al. 2007;Polson, Scott, and Windle 2014). We employ Polson et al.'s (2014) Bayesian approach of the bridge model in this research. A significant novelty in Polson et al.'s (2014) Bayesian method is the use of Lévy processes to create joint priors for β j and local shrinkage parameters (λ j ). A combined prior distribution of the regression parameter b and the local shrinkage parameter Λ = diag(λ 1 , . . .,λ j ) is represented as follows using scale mixes of normal representation: where p(λ j ) is the density of 2S α/2 and S α is the Lévy alpha-stable distribution. Equation (2) increases the efficiency of the Markov chain Monte Carlo (MCMC) method by allowing posterior samples of local shrinkage parameters (Λ = diag(λ 1 , . . .,λ j )) to be sampled independently of global shrinkage parameter sampling (τ). The dependence of the sampling of the global shrinkage parameter on the sample of the local shrinkage parameter has been cited as a significant restriction of Bayesian shrinkage models (Hans 2010;Polson et al. 2014).
In the case of a linear regression model with β as regression slope parameters and σ 2 as a residual variance parameter, the posterior distribution of the Bayesian bridge model can be written as

HMBB
Now, we allow the Bayesian bridge model's parameters to vary in response to a hidden state transition.
To be more precise, let S denote a vector of hidden state variables where s t is an integervalued hidden state variable at t and P as a forward moving M × M transition matrix where p i is the ith row of P and M is the total number of hidden states. For example, if we assume a single break, M = 2 and P is a 2 × 2 transition matrix. For efficient sampling of hidden states, we adopt Chib's (1998) non-ergodic transition of hidden states where a hidden state variable starts from state 1 and moves forward to the terminal state (M). From the above description, we can develop a fixed-effects HMBB for TSCS data using the groupdemeaned data (ȳ,x): where τ m is the break point between regime m −1 and regime m. In order to allow many predictors, we use the Bayesian bridge prior for β as shown in Equation (2). The posterior distribution of the resulting model is The subscript m in model parameters (β, σ 2 ,Λ, α, ν) indicates hidden states it belongs to. 2 If we ignore the Markov property of hidden regimes, we can simplify Equation (6) as The posterior distribution takes a form of a mixture distribution. From this, it becomes clear that HMBB estimates can be considered as weighted averages of Bayesian bridge regression model estimates fitted to subsets of data partitioned by known change points. Because we do not know the location and number of change points in reality, the hidden state variable is added as a latent variable and sampled from data in HMBB. The sampling algorithm is discussed in Appendix A.

Model Diagnostics using WAIC
After fitting multiple HMBBs with a varying number of breaks (or different model specifications), researchers must assess the model's fit to observed data. Model checking is a critical step in Bayesian analysis in general. This is especially true in the case of change-point analysis, where the break points are unknown. We recommend the WAIC for model diagnostics of HMBB because of its low computational cost. 3 WAIC is a fully Bayesian estimate of model uncertainty. WAIC approximates the expected 2 A random-effects HMBB can be written in a similar way by letting parameters of the random-effects model vary across subjects: . .
Then, the Bayesian bridge prior is used as the prior distribution of β. The posterior distribution of the resulting model will take the following form: Although we do not discuss a random-effects HMBB in this paper, it is available in BridgeChange. 3 We also examined the approximate log marginal likelihood (Chib 1995). However, in our test, the approximate log marginal likelihood of HMBB does not show a satisfactory result. The software to compute the approximate log marginal likelihood of HMBB is available in BridgeChange. We discuss the computational algorithm to estimate the approximate log marginal likelihood of HMBB in the Supplementary Material. log pointwise predictive density by subtracting a bias for the effective number of parameters from the sum of log pointwise predictive density. Using Gelman, Hwang, and Vehtari's (2014) formula, a WAIC of HMBB with M latent states (M M ) is indicates a variance, and θ (g ) are the gth simulated outputs for θ.

Simulated Data
We construct 24 sets of TSCS data with varied group sizes (n = (10, 20)), time lengths (t = (30, 60)), predictor sizes (k = (20, 30)), and break numbers (m = (0, 1, 2)) to evaluate the validity of our proposed method. We set x t ∼ N k (0, I k ) and From this, we generate y t as where CHOL indicates the Cholesky factorization. That is, the observed data are generated by the systematic component, time-invariant group-level factors, time-varying contemporaneous shocks, and time-varying observation error.

Simulation Results
One interesting pattern in Figure 2 is the over-detection of hidden states by both RMSE and WAIC. When the ground truth is time series data with a single break (the middle column), RMSEs and WAIC scores sometimes favor two-break models over one break models, which are closer to the ground truth. RMSEs and WAIC scores do not, however, favor models with fewer breaks than the ground truth. That is, there is no sign of under-detection of hidden states by HMBB.
Generally speaking, under-detection is a significantly more problematic issue than overdetection. Figure 4 illustrates this point. We compare the simulation results with one break (a) and two breaks (b) for n = 20, t = 60, and k = 30. The ground truth of panel (a) is time series data with a single break, and the ground truth of panel (b) is time series data with two breaks. Thus, the right column of panel ( Figure 3 shows additional simulation results. Panel (a) compares recovered hidden states (bright thin lines) with the ground truth (thick lines). Panel (a) clearly demonstrates that HMBB successfully uncovers hidden state structures under various setups. Panel (b) shows convergence diagnostics of the simulation results using a stabilized Gelman-Rubin statistics (Vats and Knudson 2021). The values that are close to 1 indicate good convergence of Markov chains. All Gelman-Rubin statistics are close to 1.
To summarize, Bayesian model diagnostics with WAIC give a solid framework for avoiding the problem of under-detection. However, WAIC frequently exposes researchers to the problem of over-detection. Comparing hidden state transitions between multiple models, as we did in Figure 4, is the best way to check for the over-detection problem. We will cover a more practical guidance in greater detail when we examine applications.   (s t |y)). The data are simulated from n = 20, t = 60, and k = 30.

Applications
In this section, we apply our proposed method to two studies in political science. The first example is Alvarez et al.'s (1991) study of partisan sources of economic growth, and the second example is Allee and Scalera's (2012) study on the IO membership effects.  Alvarez et al. (1991): The estimation is based on 10,000 MCMC runs after discarding the first 10,000 MCMC runs.  Alvarez et al. (1991) investigate how government partisanship and the level of labor union centralization affect economic growth in advanced countries using a longitudinal dataset of 16 OECD countries. They discovered that centralized labor organizations have a conditional effect on economic growth: Centralized labor organizations are "conducive to better economic performance when the Left was politically powerful." In contrast, weaker union movements "had desirable consequences for growth and inflation when governments were dominated by rightist parties" (551). Since then, the growth-promoting effect of left-party government and inclusive labor has received ongoing attention in comparative political economy literature (Beck et al. 1993;Boix 1997;Franzese Jr. 2002;Garrett 1998;Rueda 2008;Scruggs 2001;Soskice and Iversen 2000;Western 1998). The annual growth rate observed at country i and year t is the dependent variable of Alvarez et al. (1991). The independent variables are lagged growth rate (lagg1), weighted OECD demand (opengdp), weighted OECD export (openex), weighted OECD import (openimp), cabinet composition of left-leaning parties (leftc), and the degree of labor organization encompassment (central). 4 We add a year-fixed effect to Alvarez et al.'s (1991) partial interaction model, Our main goal in the replication is to check whether the key explanatory variables (central, leftc, and leftc×central) have time-varying effects during the sample period. We consider three different model specifications (no interaction model, a partial interaction model, and a full interaction model) using the original input variables of Alvarez et al. (1991). Then, given the short duration of the TSCS data (T = 15), we set the top limit of breaks to two for each model specification, providing nine models to compare. The panel method employed is the one-way year fixed effects. The country fixed effects are not used due to the time-invariant predictor (central).
The WAIC scores of the tested models are listed in Table 1. The partial interaction model with two breaks has the lowest WAIC score (642), followed by the same model with one break (646). However, as illustrated in the center-left in panel (a) of Figure 5, the initial regime of the two-break partial interaction models lasts only one year and appears to be redundant given the absence of a comparable pattern in the other models. Panel (b) of Figure 5 compares posterior estimates of time-varying parameters of the single-break partial interaction HMBB (left) with the two-break partial interaction HMBBs (right). Except the starting point difference, the two models produce almost identical posterior estimates of time-varying parameters. As a result, we will interpret the data using the single-break partial interaction model. 1970 1972 1974 1976 1978 1980 1982 1984 Figure 5 shows the posterior estimates of time-varying parameters for a singlebreak HMBB (left) and a two-break HMBB (right). As expected, the left plot, generated using a single-break HMBB, exhibits a pattern identical to that of the right plot, generated using a twobreak HMBB except the initial regime in the right plot.
Last, we compare HMBB estimates with conventional fixed-effects estimates in Figure 6. Several interesting patterns are worth noting. First, conventional fixed-effects estimates take the form of weighted averages of regime-specific HMBB values for certain covariates (e.g., central, inter, lagg1, and leftc), but not for others. Second, the interaction term (inter) and its two constituent terms (central and leftc) exhibit the most pronounced parametric change, which adds an interesting twist to Alvarez et al.'s (1991)

original conclusion. Only until 1978 did the left party government have a growth-promoting influence in the presence of centralized labor organizations.
After that, the effect waned significantly, signaling the start of a new era, the era of neo-liberal reform (Frieden 2020;Helleiner 1994).

Allee and Scalera's (2012) Study of Membership Effects in International Organizations
In our second example, we revisit Allee and Scalera's (2012) study on the divergent effects of membership in IOs from 1950 to 2006. Rose (2003)  matters, ceteris non paribus" (emphasis original, 111). This discovery sparked a flood of following studies amending or questioning the null effect (e.g., Goldstein, Rivers, and Tomz 2007;Gowa and Kim 2005;Park 2012;Subramanian and Wei 2007;Tomz, Goldstein, and Rivers 2007). Allee and Scalera (2012) is one of recent amendments that settles conflicting evidence regarding the effects of GATT/WTO membership on trade.
The key explanatory variable in Allee and Scalera (2012) is the type of accession, which is classified into three categories: early accession, automatic accession, and rigorous accession. According to Hypothesis 1 in Allee and Scalera (2012), rigorous accession must have a greater tradepromoting effect than other types of accession because "the more rigorous a state's accession to an IO, and thus the more policy changes required to join, the greater the benefits it will receive from membership" (243). Temporal heterogeneity is one of main concerns in Allee and Scalera (2012). They argue "although rigorous and early joiners should benefit from membership, we [Allee and Scalera] expect those benefits to be most pronounced in the years after accession and to fade over time" (260). They deal with the temporal effect heterogeneity by employing a "counter" variable that counts how many years have passed.
Our goal of replication is to check temporal heterogeneity in the key explanatory variables of Hypothesis 1 in Allee and Scalera (2012) using HMBB. 5 5 More specifically, we chose Column (5) and Column (6) models in Table 4 of the original paper. In these two models, the outcome variable is total national trade (imports and exports). The model formula of Column (5) model is y ∼ lnpop1 + gled g dppc + totalcont + polity + domestic1 9 + rigorous + colonial + earlymem.
The model formula of Column (6) model is y ∼ lnpop1 + gled g dppc + totalcont + polity + domestic1 9 + rigorous + rigorouscounter + colonial + colonialcounter + earlymem + earlymemcounter. In words, we examine whether the effect of accession type on a country's total national trade may vary over time as a result of economic shocks, decaying effects of accession types, or network effects of IO membership. Due to the significant fraction of missing observations in the original data, we use Ranjit's (2016) imputed data. The specification for the panel is identical to that in the original publication (a two-way fixed-effects at the country and year level).
The results of Bayesian model diagnostics using WAIC for the two models in our replication are summarized in Table 2. The two-break Column (6) model has the lowest WAIC score (22,078) among the six models in comparison. Given the possibility of the over-detection, we further examine hidden state transitions and time-varying movements of parameters in Figure 7.
Panel (a) of Figure 7 clearly shows that either 1964 or 2007 is estimated as change points across different HMBB specifications. Panel (b) demonstrates that a single-break HMBB of Column (6) shows a starkly different picture from a two-break HMBB of Column (6). Compare the two covariates at the top in panel (b). While the effect of (log) population (lnpop1) is not affected by hidden regime transitions, the effect of the level of economic development (gled_gdppc) shows dramatic shifts over time. Figure 8 compares parameter estimates of the conventional fixed effects with HMBB estimates. Several interesting patterns are worth noting. First, the two key explanatory variables (rigorous and rigorouscounter) show strong time-dependent effects. In Regime 1 (1946)(1947)(1948)(1949)(1950)(1951)(1952)(1953)(1954)(1955)(1956)(1957)(1958)(1959)(1960)(1961)(1962)(1963)(1964), the marginal effect of rigorous accession on country's total national trade is statistically indistinguishable from 0. This is very much in accordance with reality. Between 1946 and 1954, the proportion of sample countries with the status of rigorous accession is zero. After 1955, the fraction gradually grew, and by 1964, only 12 countries (1%) had achieved rigorous accession. This historical pattern in the effect of rigorous accession is unobservable using traditional fixed-effect estimation methods.
Second, a substantial decline in rigorouscounter, the interaction of rigorous with its "counter," following the second break (2007) can be attributed to the economic crisis of 2008-9. The greater negative trend in rigorouscounter following 2007 implies that countries that have a lengthy history of rigorous admission have borne the brunt of the economic crisis.
Last, as illustrated in Figure 6, there is no consistent pattern in the association between conventional fixed-effects and HMBB estimates. In some circumstances (e.g., gled_gdppc), the Explanations for the included predictors are as follows: • lnpop1: log of population, conventional fixed-effects estimates are close to the regime 2 estimates, but not in others (e.g., lnpop1 and earlymemcounter).

Discussion
A typical way to use our method for TSCS data will be as follows. First, researchers build a set of regression models for the change-point analysis. Next, researchers fit HMBBs with a varying number of breaks ranging from 0 to an upper limit researchers deem ideal given the model, data, and theory. Researchers make an informed decision regarding the best-fitting model by analyzing the WAIC scores, hidden state transitions, and parameter changes of several HMBBs. Once researchers have selected an appropriate HMBB using the above procedure, they will have K × M regime-specific regression coefficients to analyze. When either K or M is large, it might be difficult to interpret all the time-varying information. Although Bayesian shrinkage approaches outperform variable selection methods such as spike and slab prior models in terms of computational efficiency, they lack the sparsity property. That is, in Bayesian shrinkage approaches, parameter estimates are never exactly zero. Thus, it would be beneficial for researchers to discriminate between "strong" (i.e., statistically distinct from 0) and "weak" (i.e., statistically indistinguishable from 0) signals across all K × M parameters.
In this scenario, we can employ the decoupled shrinkage and selection (DSS) method that minimizes an 0 -type loss function on pre-regularized posterior distributions (Hahn and Carvalho 2015). The DSS loss function of the HMBB for regime m can be constructed as follows using HMBB estimations as shrinkage inputs: where X m β * m is the fitted value of the fixed-effects HMBB at regime m. λ is a nonnegative regularization parameter, and γ is the new slope parameters that minimize the loss function.
To find the optimum of Equation (8), we take the popular approach of the 1 surrogation of the 0 problem. Furthermore, to better target the 0 solution, we use the weight vector ( w j ,m = 1 | γ j ,m | δ ) where δ > 0 and γ j ,m is the root N-consistent estimate of γ j ,m as suggested by Zou (2006): Table 3 summarizes the DSS results using a single-break HMBB fitted on the full interaction model of Alvarez et al. (1991). Notably, regime-specific parameters that are near to zero are forced to zero, which helps researchers concentrate on nonzero DSS estimates for succinct interpretation of HMBB results.

Conclusion
We presented a Bayesian strategy for detecting and estimating change points in regression models with a high number of predictors in this article. The suggested model unifies a variety of statistical methods (a Bayesian shrinkage method, a change-point model, and sparse regression) to enable researchers to perform effective and consistent inference on time-varying parameters in a variety of TSCS datasets. We concentrate on the fixed-effects method in this article because it is one of the most often-used ways to analyze TSCS data in political science. However, our proposed strategy has a far broader application than the fixed-effect model. Our software package includes a tool for constructing a regression model from univariate time series data as well as multilevel models with changing intercepts or slopes. We intend to expand the presented strategy to models with discrete response data. While we are confident in our proposed method's performance in regular TSCS data settings, we would like to point out a few limitations. First, HMBB is best suited to changes in parameter values that are quite abrupt or significant. Dynamic linear models are better equipped to handle slow, cyclical, or evolutionary changes (West and Harrison 1997). Second, HMBB determines whether all parameters have a common break. Assume that only a small subset of parameters changes over time, whereas the remainder remain constant, or that parameters exhibit heterogeneous change points. HMBB would be incapable of identifying these parameter-specific change points precisely. We are currently developing change-point regression models that allow for parameter-specific break detection and regularization using recent breakthroughs in Bayesian statistics, such as Hahn et al. (2018). Finally, HMBB uses the Bayesian bridge model as a baseline model for shrinkage. It is worth trying to combine different shrinkage methods, such as the horseshoe prior (Carvalho et al. 2010), with change-point models.