
MODEL SELECTION AND AVERAGING OF HEALTH COSTS IN EPISODE TREATMENT GROUPS

Published online by Cambridge University Press:  21 December 2016

Shujuan Huang
Affiliation:
Data Science Professional Resource Group at Liberty Mutual, Seattle, WA 98154, USA E-Mail: hshujuan@gmail.com
Brian Hartman*
Affiliation:
Department of Statistics, Brigham Young University, Provo, UT 84602, USA
Vytaras Brazauskas
Affiliation:
Department of Mathematical Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI 53201, USA E-Mail: vytaras@uwm.edu

Abstract

Episode Treatment Groups (ETGs) classify related services into medically relevant and distinct units describing an episode of care. Proper model selection for ETG-based costs is essential to adequately price and manage health insurance risks. The optimal claim cost model (or model probabilities) can vary depending on the disease. We compare four potential models (lognormal, gamma, log-skew-t and Lomax) using four different model selection methods (AIC and BIC weights, Random Forest feature classification and Bayesian model averaging) on 320 ETGs. Using data from a major health insurer, consisting of more than 33 million observations from 9 million claimants, we compare the methods on both speed and precision, and also examine the wide range of models selected for the different ETGs. Several case studies are provided for illustration. We find that Random Forest feature selection is computationally efficient and sufficiently accurate, and is therefore preferred for this large data set. When feasible (on smaller data sets), Bayesian model averaging is preferred because it provides posterior model probabilities.
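
As an illustration of the AIC and BIC weights mentioned above, the following minimal sketch converts information criteria for the four candidate models into normalized weights, using the standard form w_i = exp(-Δ_i / 2) / Σ_j exp(-Δ_j / 2), where Δ_i is the difference between model i's criterion and the smallest criterion. The log-likelihood values, parameter counts, and sample size below are made-up placeholders, not results from the paper, and the function names are hypothetical; this is not the authors' implementation.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Return (AIC, BIC) for a fitted model given its maximized log-likelihood."""
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * math.log(n_obs) - 2 * log_lik
    return aic, bic


def ic_weights(ic_values):
    """Convert a dict of information criteria into weights that sum to 1."""
    best = min(ic_values.values())
    # Subtracting the minimum keeps the exponentials numerically stable.
    raw = {name: math.exp(-0.5 * (value - best)) for name, value in ic_values.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}


# ASSUMPTION: illustrative (log-likelihood, parameter count) pairs for one ETG.
# These numbers are invented for the example and do not come from the paper.
fits = {
    "lognormal": (-1520.3, 2),
    "gamma": (-1524.9, 2),
    "log-skew-t": (-1518.8, 4),
    "Lomax": (-1531.0, 2),
}
n_obs = 500  # ASSUMPTION: hypothetical number of claims in the ETG

aic = {}
bic = {}
for name, (log_lik, n_params) in fits.items():
    aic[name], bic[name] = information_criteria(log_lik, n_params, n_obs)

aic_weights = ic_weights(aic)
bic_weights = ic_weights(bic)

for name in fits:
    print(f"{name:>10}: AIC weight {aic_weights[name]:.3f}, BIC weight {bic_weights[name]:.3f}")
```

Because BIC penalizes each parameter by log(n) rather than 2, its weights shift toward the more parsimonious models as the sample size grows, which is one reason the two sets of weights can disagree for the same ETG.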

Type
Research Article
Copyright
Copyright © Astin Bulletin 2016 
