
Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach

Jens Hainmueller and Chad Hazlett
Abstract

We propose the use of Kernel Regularized Least Squares (KRLS) for social science modeling and inference problems. KRLS borrows from machine learning methods designed to solve regression and classification problems without relying on linearity or additivity assumptions. The method constructs a flexible hypothesis space that uses kernels as radial basis functions and finds the best-fitting surface in this space by minimizing a complexity-penalized least squares problem. We argue that the method is well-suited for social science inquiry because it avoids strong parametric assumptions, yet allows interpretation in ways analogous to generalized linear models while also permitting more complex interpretation to examine nonlinearities, interactions, and heterogeneous effects. We also extend the method in several directions to make it more effective for social inquiry, by (1) deriving estimators for the pointwise marginal effects and their variances, (2) establishing unbiasedness, consistency, and asymptotic normality of the KRLS estimator under fairly general conditions, (3) proposing a simple automated rule for choosing the kernel bandwidth, and (4) providing companion software. We illustrate the use of the method through simulations and empirical examples.
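The fitting step and the pointwise marginal effects described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' companion software. It uses a Gaussian kernel k(a, b) = exp(-||a - b||² / σ²) as the radial basis function, takes the complexity penalty to be λ times the squared kernel norm of the fitted function, solves the resulting penalized least squares problem through its closed form c = (K + λI)⁻¹y, and obtains the marginal effect of covariate d by differentiating the fitted surface f(x) = Σᵢ cᵢ k(xᵢ, x) analytically. Two simplifications are assumptions: σ² defaults to the number of covariates (with inputs assumed standardized), which stands in for, and may differ from, the paper's bandwidth rule; and λ is supplied by hand rather than chosen automatically. The helper names (`solve`, `krls_fit`, `krls_predict`, `krls_marginal_effect`) are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma2):
    """Gaussian radial basis kernel: exp(-||a - b||^2 / sigma2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / sigma2)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def krls_fit(X, y, lam, sigma2=None):
    """Minimize sum (y_i - f(x_i))^2 + lam * ||f||_K^2 over f = sum_j c_j k(x_j, .).

    The minimizer has the closed form c = (K + lam * I)^{-1} y.
    Returns (c, sigma2).
    """
    if sigma2 is None:
        # Assumed default bandwidth: number of covariates (inputs assumed standardized).
        sigma2 = float(len(X[0]))
    n = len(X)
    A = [[gaussian_kernel(X[i], X[j], sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, list(y)), sigma2


def krls_predict(X, c, sigma2, x_new):
    """Fitted surface f(x_new) = sum_i c_i k(x_i, x_new)."""
    return sum(ci * gaussian_kernel(xi, x_new, sigma2) for ci, xi in zip(c, X))


def krls_marginal_effect(X, c, sigma2, x_new, d):
    """Analytic partial derivative of f with respect to covariate d at x_new."""
    return (2.0 / sigma2) * sum(
        ci * gaussian_kernel(xi, x_new, sigma2) * (xi[d] - x_new[d])
        for ci, xi in zip(c, X)
    )


if __name__ == "__main__":
    # Toy one-covariate data (illustrative only, not from the paper).
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0.0, 1.0, 4.0, 9.0]
    c, s2 = krls_fit(X, y, lam=0.1)
    print("f(1.5)     =", krls_predict(X, c, s2, [1.5]))
    print("df/dx(1.5) =", krls_marginal_effect(X, c, s2, [1.5], d=0))
```

The hand-written solver only keeps the sketch dependency-free; a linear-algebra library would normally replace `solve`, and λ would be tuned, for example by cross-validation. Because the marginal effect is computed at each evaluation point rather than as a single coefficient, evaluating it at every observation yields the distribution of pointwise effects that the abstract refers to.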

Corresponding author e-mail: jhainm@mit.edu

Journal: Political Analysis (ISSN 1047-1987; EISSN 1476-4989)
Supplementary material: Hainmueller and Hazlett supplementary material, Appendix (PDF, 844 KB)
