MODEL SELECTION AND AVERAGING OF HEALTH COSTS IN EPISODE TREATMENT GROUPS

Shujuan Huang; Brian Hartman; Vytaras Brazauskas

doi:10.1017/asb.2016.26

Published online by Cambridge University Press: 21 December 2016

Shujuan Huang: Data Science Professional Resource Group at Liberty Mutual, Seattle, WA 98154, USA. E-mail: hshujuan@gmail.com
Brian Hartman: Department of Statistics, Brigham Young University, Provo, UT 84602, USA. E-mail: hartman@stat.byu.edu
Vytaras Brazauskas: Department of Mathematical Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI 53201, USA. E-mail: vytaras@uwm.edu

Abstract

Episode Treatment Groups (ETGs) classify related services into medically relevant and distinct units describing an episode of care. Proper model selection for those ETG-based costs is essential to adequately price and manage health insurance risks. The optimal claim cost model (or model probabilities) can vary depending on the disease. We compare four potential models (lognormal, gamma, log-skew-t and Lomax) using four different model selection methods (AIC and BIC weights, Random Forest feature classification and Bayesian model averaging) on 320 ETGs. Using data from a major health insurer, consisting of more than 33 million observations from 9 million claimants, we compare the methods on both speed and precision and examine the wide range of models selected for the different ETGs. Several case studies are provided for illustration. We find that Random Forest feature selection is computationally efficient and sufficiently accurate, and is therefore preferred for this large data set. When feasible (on smaller data sets), Bayesian model averaging is preferred because it provides posterior model probabilities.

Keywords

Akaike weights; Bayesian model selection; model averaging; random forest

Type: Research Article
Information: ASTIN Bulletin: The Journal of the IAA, Volume 47, Issue 1, January 2017, pp. 153–167

DOI: https://doi.org/10.1017/asb.2016.26
Copyright: © Astin Bulletin 2016
