
Inference at the Data’s Edge: Gaussian Processes for Estimation and Inference in the Face of Extrapolation Uncertainty

Published online by Cambridge University Press:  10 March 2026

Soonhong Cho
Affiliation: Political Science, UCLA, USA
Doeun Kim
Affiliation: Political Science, UCLA, USA
Chad Hazlett*
Affiliation: Political Science; Statistics & Data Science, UCLA, USA
Corresponding author: Chad Hazlett; Email: chazlett@ucla.edu

Abstract

Many inferential tasks involve fitting models to observed data and predicting outcomes at new covariate values, requiring interpolation or extrapolation. Conventional methods select a single best-fitting model, discarding fits that were similarly plausible in-sample but would yield sharply different predictions out-of-sample. Gaussian processes (GPs) offer a principled alternative. Rather than committing to one conditional expectation function, GPs deliver a posterior distribution over outcomes at any covariate value. This posterior effectively retains the range of models consistent with the data, widening uncertainty intervals where extrapolation magnifies divergence. In this way, the GP’s uncertainty estimates reflect the implications of extrapolation on our predictions, helping to tame the “dangers of extreme counterfactuals” (King and Zeng, 2006). The approach requires (i) specifying a covariance function linking outcome similarity to covariate similarity and (ii) assuming Gaussian noise around the conditional expectation. We provide an accessible introduction to GPs with emphasis on this property, along with a simple, automated procedure for hyperparameter selection implemented in the R package gpss. We illustrate the value of GPs for capturing counterfactual uncertainty in three settings: (i) treatment effect estimation with poor overlap, (ii) interrupted time series requiring extrapolation beyond pre-intervention data, and (iii) regression discontinuity designs where estimates hinge on boundary behavior.
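The abstract's core mechanism — a posterior over outcomes whose variance grows as predictions move away from the observed covariates — can be sketched in a few lines of NumPy. This is an illustrative computation only, not the paper's implementation (which is the R package gpss); the kernel bandwidth `b` and noise variance `sigma2` are assumed values for the sketch.

```python
import numpy as np

def gp_posterior(X_train, y, X_test, b=1.0, sigma2=0.1):
    """Posterior mean and variance of a GP with Gaussian kernel
    k(x, x') = exp(-||x - x'||^2 / b) and Gaussian noise sigma2."""
    k = lambda A, B: np.exp(-(A[:, None] - B[None, :]) ** 2 / b)
    K = k(X_train, X_train) + sigma2 * np.eye(len(X_train))
    K_s = k(X_test, X_train)                 # test-train covariances
    K_inv = np.linalg.inv(K)
    mean = K_s @ K_inv @ y                   # posterior mean of the CEF
    # Posterior variance: prior variance (k(x*, x*) = 1) minus the
    # reduction contributed by the observed data.
    var = 1.0 - np.einsum('ij,jk,ik->i', K_s, K_inv, K_s)
    return mean, var

# Five observations on [-3, 1]; predict in-sample (x = 0) and
# under extrapolation (x = 3), echoing the setup of Figure 3.
rng = np.random.default_rng(0)
X = np.linspace(-3, 1, 5)
y = np.sin(X) + 0.1 * rng.standard_normal(5)
mu, var = gp_posterior(X, y, np.array([0.0, 3.0]))
# The posterior variance at x = 3 (extrapolation) exceeds that at x = 0,
# so the 95% interval widens exactly where the data say least.
```

Far from the data the kernel covariances shrink toward zero, so the posterior reverts to the prior: wide intervals in extrapolation regions fall out of the conditioning step rather than being bolted on.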

Information

Type
Article
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of The Society for Political Methodology

Figure 1 Blue (prior): Distributional belief for an unseen $Y^*$ knowing only that it will come from a normal distribution with mean 0 and variance $\sigma^2$. Red (posterior): Our revised belief regarding $Y^*$ assuming $\mathrm{cor}(Y^*, Y_{\mathrm{obs}}) = 0.8$, and having observed $Y_{\mathrm{obs}} = -1$.
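The update behind Figure 1 is ordinary bivariate-normal conditioning and can be verified by hand. Setting $\sigma = 1$ (an assumption for illustration, matching the unit-variance prior drawn in the figure):

```python
import math

# Bivariate-normal conditioning behind Figure 1.
# Assumption: both Y* and Y_obs have standard deviation sigma = 1.
rho, y_obs, sigma = 0.8, -1.0, 1.0

prior_mean, prior_sd = 0.0, sigma
post_mean = rho * (sigma / sigma) * y_obs      # 0.8 * (-1) = -0.8
post_sd = sigma * math.sqrt(1 - rho ** 2)      # sqrt(0.36) = 0.6
# The posterior shifts toward the observed value and tightens:
# N(0, 1) becomes N(-0.8, 0.36).
```

The same conditioning formula, applied jointly to all observed and unobserved points, is all a GP posterior is.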


Figure 2 Posterior CEF with its 95% CI. Inferred distribution (spread vertically) of $Y^*$ as a function of the covariate $X$, after seeing five observations (red dots). The key assumption is on the covariance between points as a function of their $X$ values, here given by $\mathrm{cov}(Y^*, Y_i) = \exp(-\|X_i - X^*\|^2/b)$.


Figure 3 Comparison of uncertainty for conditional expectation functions given by GP (blue), KRLS (red), and a quadratic polynomial (gray). Models are fitted on $X$ between $-3$ and $1$, then extrapolated over $X$ between $1$ and $3$. The solid line shows the (randomly drawn) true function, while dashed/dotted lines indicate each model's estimated CEF. Bands indicate 95% CIs for the CEFs.


Figure 4 Uncertainty quantification of the CATE under varying overlap. Top row: Illustration from one draw of the simulation setting, with a good-overlap region ($-2$) and no overlap elsewhere. For treated units (red) and control units (blue), the true CEFs (dotted lines) and estimated CEFs (black solid lines) are shown with their corresponding uncertainty bands. At both ends of the covariate range, the GP produces uncertainty bands that widen adaptively with the degree of overlap. The upper bound of the confidence interval under extrapolation depends on the variability in the fitted data. Bottom row: CATE estimates with 95% confidence bands. The wider GP bands in poor-overlap regions propagate into greater uncertainty in the CATE estimates. The red dashed line marks the true effect size.


Figure 5 Pointwise (top) and average (bottom) performance of the GP model compared to LM and BART. Top row: The left panel displays the pointwise coverage rate of the ATE estimators for the three models; the right panel shows the lengths of the corresponding 95% confidence intervals over 500 simulations ($n=500$). Bottom row: Boxplots show the distributions of the ATE estimates (left) and average interval lengths (right). The true treatment effect is marked by the red dashed line. Coverage rates and RMSEs (of the ATE) for each method are also shown.


Figure 6 Left column: Monthly rate of handgun background checks per 100,000 population in D.C. (top) and Vermont (bottom), seven years before and one year after the Heller ruling. The red envelope shows the 95% predictive interval for the GP-estimated non-treatment outcome in the post-treatment period. Right column: Point estimates and confidence band for the difference between the observed (treated) outcome and the predicted (non-treatment) counterfactual at each month. The kernel is the sum of linear, periodic, and Gaussian kernels.


Figure 7 GP results with three different kernels for Vermont. The post-treatment period is extended to four years to better illustrate the behavior under extreme extrapolation. Left: Gaussian kernel; Center: “Gaussian + periodic + linear” kernel; Right: “Gaussian + periodic + quadratic” kernel.


Figure 8 The coverage rate, average length of the 95% confidence interval, and RMSE of GPrd (red) and rdrobust with the "robust" option (black) in the totally random setting. The ratio of the variance of $Y$ to that of $X$ is given by $\sigma$ on the horizontal axis. Top row: true effect size of zero, 500 observations. Bottom row: effect size of zero, sample size reduced to 100. Results with a true effect size of three look identical.


Figure 9 Relationship between the simulated latent variable $\mu$ (horizontal axis) and vote share $X$ (vertical axis) at the two values of the steepness parameter $s$ used in the simulation.


Figure 10 RD simulation results with a latent variable (effect size = 3). Each point represents the coverage rate or the average length of the 95% confidence interval of GPrd, gp_causal, gp_causaltrim, or rdrobust across 500 iterations of the latent-variable simulations ($n = 200$). The horizontal axis represents the $R^2$.


Figure 11 RDD estimates using close-election data with placebo cutoff points and different GP RD specifications. The 95% confidence intervals of GPrd (red), gp_causaltrim (blue), and rdrobust (black) across eight placebo cutoffs and the true cutoff at 0.5 are presented.


Figure 12 RDD placebo-cutoff subsampling simulation. Boxplots show the distribution of treatment effect estimates from GPrd (red), gp_causaltrim (blue), and rdrobust (black) across four placebo cutoffs and the true cutoff at 0.5.

Supplementary material: File

Cho et al. supplementary material

File 10.6 MB