COST-SENSITIVE MULTI-CLASS ADABOOST FOR UNDERSTANDING DRIVING BEHAVIOR BASED ON TELEMATICS

Published online by Cambridge University Press: 31 August 2021

Banghee So
Affiliation:
Department of Mathematics, Towson University, 7800 York Rd, Towson, MD, 21252, USA, E-Mail: bso@towson.edu
Jean-Philippe Boucher
Affiliation:
Département de Mathématiques, Université du Québec à Montréal, 201 Avenue du Président-Kennedy, Montréal, Québec, H2X 3Y7, Canada, E-Mail: boucher.jean-philippe@uqam.ca
Emiliano A. Valdez*
Affiliation:
Department of Mathematics, University of Connecticut, 341 Mansfield Road, Storrs, CT, 06269-1009, USA, E-Mail: emiliano.valdez@uconn.edu

Abstract

Using telematics technology, insurers are able to capture a wide range of data to better decode driver behavior, such as distance traveled and how drivers brake, accelerate, or make turns. Such additional information also helps insurers improve risk assessments for usage-based insurance, a recent industry innovation. In this article, we explore the integration of telematics information into a classification model to determine driver heterogeneity. For motor insurance during a policy year, we typically observe a large proportion of drivers with zero accidents, a lower proportion with exactly one accident, and a far lower proportion with two or more accidents. Here we introduce a cost-sensitive multi-class adaptive boosting (AdaBoost) algorithm, which we call SAMME.C2, to handle such class imbalances. We calibrate the algorithm using empirical data collected from a telematics program in Canada and demonstrate an improved assessment of driving behavior using telematics compared with traditional risk variables. Using suitable performance metrics, we show that our algorithm outperforms other learning models designed to handle class imbalances.
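The abstract does not give SAMME.C2's update rules. As a minimal sketch, assuming SAMME.C2 folds AdaC2-style cost weighting (Sun, Kamel, Wong and Wang, 2007) into the multi-class SAMME update (Zhu, Zou, Rosset and Hastie, 2009), the Python below boosts weighted decision stumps on a three-class target: zero, one, or two or more claims. The helper names (`claim_class`, `balanced_costs`, `fit_stump`, `boost_step`, `fit`, `predict`), the inverse-frequency cost choice, and the toy data are hypothetical illustrations, not the authors' code; the paper's exact formulas may differ.

```python
import math
from collections import Counter

# Assumption: SAMME (Zhu et al., 2009) with AdaC2-style costs folded in.
# Illustrative sketch only; not the authors' SAMME.C2 implementation.


def claim_class(n_claims):
    """Map a policy-year claim count to class 0, 1, or 2 (two or more)."""
    return min(n_claims, 2)


def balanced_costs(y):
    """Hypothetical cost choice: n / (K * n_k), so rarer classes cost more."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def fit_stump(X, y, weights, classes):
    """Weighted decision stump: one feature, one threshold, and the
    heaviest class (by sample weight) predicted on each side."""
    total = sum(weights)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left, right = Counter(), Counter()
            for row, label, w in zip(X, y, weights):
                (left if row[j] <= t else right)[label] += w
            ll = max(classes, key=lambda c: left[c])
            rl = max(classes, key=lambda c: right[c])
            err = total - left[ll] - right[rl]  # weighted training error
            if best is None or err < best[0]:
                best = (err, j, t, ll, rl)
    _, j, t, ll, rl = best
    return lambda row: ll if row[j] <= t else rl


def boost_step(weights, costs, missed, n_classes, eps=1e-10):
    """One cost-sensitive SAMME round. costs holds C_i per sample; missed
    is True where the weak learner erred. Returns (alpha, new_weights),
    or (None, weights) if the learner is no better than random guessing."""
    hit = sum(c * w for c, w, m in zip(costs, weights, missed) if not m)
    miss = sum(c * w for c, w, m in zip(costs, weights, missed) if m)
    err = max(miss / (hit + miss), eps)  # cost-weighted error, kept above 0
    if err >= 1.0 - 1.0 / n_classes:
        return None, weights
    # SAMME learner weight; the log(K - 1) term allows K > 2 classes.
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    # Costs scale every weight; misclassified samples also get exp(alpha).
    new = [c * w * (math.exp(alpha) if m else 1.0)
           for c, w, m in zip(costs, weights, missed)]
    total = sum(new)
    return alpha, [v / total for v in new]


def fit(X, y, class_costs, n_rounds=10):
    """Boost decision stumps; class_costs maps each class to its cost."""
    classes = sorted(set(y))
    costs = [class_costs[label] for label in y]
    weights = [1.0 / len(y)] * len(y)
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, weights, classes)
        missed = [stump(row) != label for row, label in zip(X, y)]
        alpha, weights = boost_step(weights, costs, missed, len(classes))
        if alpha is None:
            break  # stop once the weak learner is too weak
        ensemble.append((alpha, stump))
    return classes, ensemble


def predict(model, row):
    """Return the class with the largest alpha-weighted vote."""
    classes, ensemble = model
    votes = {c: 0.0 for c in classes}
    for alpha, stump in ensemble:
        votes[stump(row)] += alpha
    return max(classes, key=lambda c: votes[c])


if __name__ == "__main__":
    # Toy data: one hypothetical telematics feature per driver.
    X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8],
         [2.0], [2.1], [3.0]]
    claims = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 4]
    y = [claim_class(n) for n in claims]
    model = fit(X, y, balanced_costs(y), n_rounds=5)
    print([predict(model, row) for row in X])
```

Placing the cost C_i outside the exponent, as AdaC2 does, means the weights of costly (rare) classes grow on every round even when those drivers are classified correctly; with every cost set to 1 the update reduces to plain SAMME.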

Type
Research Article
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of The International Actuarial Association
