
Sparse Estimation and Uncertainty with Application to Subgroup Analysis

  • Marc Ratkovic and Dustin Tingley

We introduce a Bayesian method, LASSOplus, that unifies recent contributions in the sparse modeling literatures, while substantially extending pre-existing estimators in terms of both performance and flexibility. Unlike existing Bayesian variable selection methods, LASSOplus both selects and estimates effects while returning estimated confidence intervals for discovered effects. Furthermore, we show how LASSOplus easily extends to modeling repeated observations and permits a simple Bonferroni correction to control coverage on confidence intervals among discovered effects. We situate LASSOplus in the literature on how to estimate subgroup effects, a topic that often leads to a proliferation of estimation parameters. We also offer a simple preprocessing step that draws on recent theoretical work to estimate higher-order effects that can be interpreted independently of their lower-order terms. A simulation study illustrates the method’s performance relative to several existing variable selection methods. In addition, we apply LASSOplus to an existing study on public support for climate treaties to illustrate the method’s ability to discover substantive and relevant effects. Software implementing the method is publicly available in the R package sparsereg.
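The Bonferroni correction mentioned above can be illustrated with a minimal sketch. It is not the sparsereg implementation and does not reproduce the paper's Bayesian procedure. The function name `bonferroni_intervals`, its arguments, and the normal approximation are assumptions made here for illustration only. With k discovered effects, each interval is built at level 1 - alpha/k, so joint coverage across the k intervals is at least 1 - alpha.

```python
from statistics import NormalDist


def bonferroni_intervals(estimates, std_errors, alpha=0.05):
    """Hypothetical helper: normal-approximation intervals for k selected
    effects, each at level 1 - alpha/k (Bonferroni), so that the joint
    coverage of all k intervals is at least 1 - alpha."""
    k = len(estimates)
    if k == 0:
        return []
    # Two-sided critical value at the Bonferroni-adjusted level.
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    return [(b - z * se, b + z * se) for b, se in zip(estimates, std_errors)]


# Illustrative (made-up) estimates and standard errors for three effects.
intervals = bonferroni_intervals([0.8, -0.5, 0.3], [0.2, 0.2, 0.2])
for lo, hi in intervals:
    print(f"[{lo:.3f}, {hi:.3f}]")
```

The adjusted intervals widen as the number of discovered effects grows, which is the price paid for controlling coverage jointly rather than one effect at a time.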

Authors’ note: We are grateful to Neal Beck, Scott de Marchi, In Song Kim, John Londregan, Luke Miratrix, Michael Peress, Jasjeet Sekhon, Yuki Shiraito, Brandon Stewart, and Susan Athey for helpful comments on an earlier draft. Earlier versions were presented at the 2015 Summer Methods Meeting, Harvard IQSS Applied Statistics Workshop, Princeton Political Methodology Colloquium, DARPA/ISAT Conference “What If? Machine Learning for Causal Inference,” and EITM 2016. We are also grateful to two anonymous reviewers for detailed feedback on an earlier version. All mistakes are the authors’ own. Replication data are available in Ratkovic and Tingley 2016.
Contributing Editor: R. Michael Alvarez
Albert, James H., and Chib, Siddhartha. 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88:669–679.
Alhamzawi, Rahim, Yu, Keming, and Benoit, Dries F. 2012. Bayesian adaptive Lasso quantile regression. Statistical Modelling 12(3):279–297.
Armagan, Artin, Dunson, David B., and Lee, Jaeyong. 2013. Generalized double Pareto shrinkage. Statistica Sinica 23:119–143.
Bechtel, Michael M., and Scheve, Kenneth F. 2013. Mass support for global climate agreements depends on institutional design. Proceedings of the National Academy of Sciences 110(34):13763–13768.
Belloni, A., Chen, D., Chernozhukov, V., and Hansen, C. 2012. Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80(6):2369–2429, doi:10.3982/ECTA9626.
Belloni, Alexandre, and Chernozhukov, Victor. 2013. Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2):521–547.
Belloni, Alexandre, Chernozhukov, Victor, and Hansen, Christian. 2011. Inference for high-dimensional sparse econometric models. CeMMAP Working Papers CWP41/11 Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Benjamini, Yoav, and Yekutieli, Daniel. 2005. False discovery rate-adjusted multiple confidence intervals for selected parameters. Journal of the American Statistical Association 100(469):71–93.
Berger, J. O., and Bernardo, J. M. 1989. Estimating a product of means: Bayesian analysis with reference priors. Journal of the American Statistical Association 84:200–207.
Berger, James O. 2006. The case for objective Bayesian analysis. Bayesian Analysis 1(3):385–402.
Berger, James O., Bernardo, Jose M., and Sun, Dongchu. 2009. The formal definition of reference priors. The Annals of Statistics 37(2):905–938.
Berger, James O., Wang, Xiaojing, and Shen, Lei. 2015. A Bayesian approach to subgroup identification. Journal of Biopharmaceutical Statistics 24(1):110–129.
Berk, Richard, Brown, Lawrence, Buja, Andreas, Zhang, Kai, and Zhao, Linda. 2013. Valid post-selection inference. Annals of Statistics 41(2):802–837.
Bernardo, J. M. 1979. Reference posterior distributions for Bayesian inference. Journal of the Royal Statistical Society Series B 41:113–147.
Bernardo, Jose M. 2005. Reference analysis. ed. Dey, D. K. and Rao, C. R. Handbook of statistics. Elsevier.
Berry, Donald. 1990. Subgroup analysis. Biometrics 46(4):1227–1230.
Bhadra, Anindya, Datta, Jyotishka, Polson, Nicholas G., and Willard, Brandon. 2015. The Horseshoe estimator of ultra-sparse signals. Working Paper.
Bhattacharya, Anirban, Pati, Debdeep, Pillai, Natesh S., and Dunson, David B. 2015. Dirichlet-Laplace priors for optimal shrinkage. Journal of the American Statistical Association 110:1479–1490.
Bickel, Peter, Ritov, Ya’acov, and Tsybakov, Alexandre. 2009. Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics 37(4):1705–1732.
Buhlmann, Peter, and van de Geer, Sara. 2013. Statistics for high-dimensional data. Berlin: Springer.
Candes, E., and Tao, T. 2007. The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Annals of Statistics 35:2313–2404.
Candes, Emmanuel J. 2006. Modern statistical estimation via oracle inequalities. Acta Numerica 15:1–69.
Carvalho, C., Polson, N., and Scott, J. 2010. The Horseshoe estimator for sparse signals. Biometrika 97:465–480.
Chatterjee, A., and Lahiri, S. N. 2011. Bootstrapping lasso estimators. Journal of the American Statistical Association 106(494):608–625.
Chatterjee, Sourav. 2014. Assumptionless consistency of the LASSO. arXiv:1303.5817v5.
Chernozhukov, Victor, Fernández-Val, Iván, and Melly, Blaise. 2013. Inference on counterfactual distributions. Econometrica 81(6):2205–2268.
Datta, Jyotishka, and Ghosh, Jayanta K. 2013. Asymptotic properties of Bayes risk for the Horseshoe prior. Bayesian Analysis 8(1):111–132.
Donoho, David L., and Johnstone, Iain M. 1994. Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3):425–455.
Efron, Bradley. 2015. Frequentist accuracy of Bayesian estimates. Journal of the Royal Statistical Society Series B 77(3):617–646.
Esarey, Justin, and Sumner, Jane Lawrence. 2015. Marginal effects in interaction models: Determining and controlling the false positive rate. Working Paper.
Fan, Jianqing, and Peng, Heng. 2004. Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics 32(3):928–961.
Fan, Jianqing, and Li, Runze. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456):1348–1360.
Figueiredo, Mario. 2004. Lecture Notes on the EM Algorithm. Lecture notes. Instituto de Telecomunicacoes, Instituto Superior Tecnico.
Foster, J. C., Taylor, J. M., and Ruberg, S. J. 2011. Subgroup identification from randomized clinical trial data. Statistics in Medicine 30:2867–2880.
Gelman, Andrew. 2006. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis 1(3):515–534.
Gelman, Andrew, Jakulin, Aleks, Pittau, Maria Grazia, and Su, Yu-Sung. 2008. A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics 2(4):1360–1383.
Gelman, Andrew, and Hill, Jennifer. 2007. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.
Gelman, Andrew, Carlin, John B., Stern, Hal S., Dunson, David B., Vehtari, Aki, and Rubin, Donald B. 2014. Bayesian data analysis. Text in statistical science series. Boca Raton, FL: CRC Press.
Gill, Jeff. 2014. Bayesian methods: A social and behavioral sciences approach. 3rd ed. CRC Press.
Gillen, B., Montero, S., Moon, H. R., and Shum, M. 2016. BLP-Lasso for aggregate discrete choice models applied to elections with rich demographic covariates. Working Paper.
Green, Donald P., and Kern, Holger L. 2012. Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opinion Quarterly 76:491–511.
Griffin, J. E., and Brown, P. J. 2010. Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis 5(1):171–188.
Griffin, J. E., and Brown, P. J. 2012. Structuring shrinkage: Some correlated priors for regression. Biometrika 99(2):481–487.
Grimmer, Justin, Messing, Solomon, and Westwood, Sean. Forthcoming. Estimating heterogeneous treatment effects and the effects of heterogeneous treatments with ensemble methods. Political Analysis.
Hahn, P. Richard, and Carvalho, Carlos M. 2015. Decoupling shrinkage and selection in Bayesian linear models: A posterior summary perspective. Journal of the American Statistical Association 110(509):435–448.
Hainmueller, Jens, and Hazlett, Chad. 2013. Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22:143–169.
Hainmueller, Jens, Hopkins, Daniel J., and Yamamoto, Teppei. 2014. Causal inference in conjoint analysis: Understanding multidimensional choices via stated preference experiments. Political Analysis 22(1):1–30.
Hans, Chris. 2009. Bayesian lasso regression. Biometrika 96(4):835–845.
Harding, Matthew, and Lamarche, Carlos. 2016. Penalized quantile regression with semiparametric correlated effects: An application with heterogeneous preferences. Journal of Applied Econometrics, doi:10.1002/jae.2520.
Hastie, Trevor, Tibshirani, Robert, and Friedman, Jerome. 2010. The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.
Imai, Kosuke, and Strauss, Aaron. 2011. Estimation of heterogeneous treatment effects from randomized experiments, with application to the optimal planning of the get-out-the-vote campaign. Political Analysis 19(1):1–19.
Imai, Kosuke, and Ratkovic, Marc. 2013. Estimating treatment effect heterogeneity in randomized program evaluation. The Annals of Applied Statistics 7(1):443–470.
Jackman, Simon. 2009. Bayesian analysis for the social sciences. West Sussex, UK: Wiley.
Jaynes, E. T. 1982. On the rationale of maximum-entropy methods. Proceedings of the IEEE 70:939–952.
Kang, Jian, and Guo, Jian. 2009. Self-adaptive Lasso and its Bayesian estimation. Working Paper.
Kenkel, Brenton, and Signorino, Curtis. 2012. A method for flexible functional form estimation: Bootstrapped basis regression with variable selection. Working Paper.
Kyung, Minjung, Gill, Jeff, Ghosh, Malay, and Casella, George. 2010. Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis 5(2):369–412.
Leeb, Hannes, and Potscher, Benedikt. 2008. Sparse estimators and the oracle property, or the return of Hodges’ estimator. Journal of Econometrics 142:201–211.
Leeb, Hannes, Potscher, Benedikt, and Ewald, Karl. 2015. On various confidence intervals post-model-selection. Statistical Science 30(2):216–227.
Leng, Chenlei, Tran, Minh-Ngoc, and Nott, David. 2014. Bayesian adaptive LASSO. Annals of the Institute of Statistical Mathematics 66(2):221–244.
Lipkovich, I., Dmitrienko, A., Denne, J., and Enas, G. 2011. Subgroup identification based on differential effect search: A recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine 30:2601–2621.
Liu, H., and Yu, B. 2013. Asymptotic properties of Lasso+mLS and Lasso+Ridge in sparse high-dimensional linear regression. Electronic Journal of Statistics 7:3124–3169.
Lockhart, Richard, Taylor, Jonathan, Tibshirani, Ryan J., and Tibshirani, Robert. 2014. A significance test for the lasso. The Annals of Statistics 42(2):413–468.
Loh, Wei-Yin, He, Xu, and Man, Michael. 2015. A regression tree approach to identifying subgroups with differential treatment effects. Statistics in Medicine 34:1818–1833.
Minnier, Jessica, Tian, Lu, and Cai, Tianxi. 2011. A perturbation method for inference on regularized regression estimates. Journal of the American Statistical Association 106:1371–1382.
Mitchell, T. J., and Beauchamp, J. J. 1988. Bayesian variable selection in linear regression. Journal of the American Statistical Association 83(404):1023–1032.
O’Hara, R. B., and Sillanpää, M. J. 2009. A review of Bayesian variable selection methods: What, how and which. Bayesian Analysis 4(1):85–118.
Park, Trevor, and Casella, George. 2008. The Bayesian lasso. Journal of the American Statistical Association 103(482):681–686.
Polson, Nicholas, and Scott, James. 2012. Local shrinkage rules, Levy processes and regularized regression. Journal of the Royal Statistical Society, Series B 74(2):287–311.
Potscher, Benedikt, and Leeb, Hannes. 2009. On the distribution of penalized maximum likelihood estimators: The LASSO, SCAD, and thresholding. Journal of Multivariate Analysis 100(9):2065–2082.
Ratkovic, Marc, and Tingley, Dustin. 2016. Replication data for: Sparse estimation and uncertainty with application to subgroup analysis. doi:10.7910/DVN/RNMB1Q, Harvard Dataverse, September 6, 2016.
Stewart, Brandon M. Latent factor regressions for the social sciences. Working Paper.
Strezhnev, Anton, Hainmueller, Jens, Hopkins, Daniel, and Yamamoto, Teppei. 2014. cjoint: AMCE estimator for conjoint experiments. R package version 1.0.3.
Su, Xiaogang, Tsai, Chih-Ling, Wang, Hansheng, Nickerson, David M., and Li, Bogong. 2009. Subgroup analysis via recursive partitioning. Journal of Machine Learning Research 10:141–158.
Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58:267–288.
Tierney, Luke. 1994. Markov chains for exploring posterior distributions. The Annals of Statistics 22(4):1701–1728.
Tingley, Dustin, and Tomz, Michael. 2013. Conditional cooperation and climate change. Comparative Political Studies, p. 0010414013509571.
Wager, S., and Athey, S. 2015. Estimation and inference of heterogeneous treatment effects using random forests. Working paper.
West, M. 1987. On scale mixtures of normal distributions. Biometrika 74:646–648.
Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476):1418–1429.
Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis