
Smoothness and monotonicity constraints for neural networks using ICEnet

Published online by Cambridge University Press: 01 April 2024

Ronald Richman
Affiliation: University of the Witwatersrand, Johannesburg, South Africa; Old Mutual Insure, Johannesburg, South Africa
Mario V. Wüthrich
Affiliation: RiskLab, Department of Mathematics, ETH Zurich, Zürich, Switzerland

Corresponding author: Ronald Richman; Email: ronaldrichman@gmail.com

Abstract

Deep neural networks have become an important tool for actuarial tasks, both because of the significant accuracy gains they provide over traditional methods and because of their close connection to the generalized linear models (GLMs) currently used in industry. Although constraining GLM parameters relating to insurance risk factors to be smooth or to exhibit monotonicity is trivial, methods to incorporate such constraints into deep neural networks have not yet been developed. This is a barrier to the adoption of neural networks in insurance practice, since actuaries often impose these constraints for commercial or statistical reasons. In this work, we present a novel method for enforcing constraints within deep neural network models, and we show how these models can be trained. Moreover, we provide example applications using real-world datasets. We call our proposed method ICEnet to emphasize its close link to the individual conditional expectation (ICE) model interpretability technique.
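To make the abstract's description concrete, the following is a minimal sketch, in PyTorch, of how smoothness and monotonicity penalties could be computed on ICE profiles and added to a training loss. It is one plausible reading of the approach described above, not the authors' implementation; all names (ice_profiles, ice_penalties, lam_mono, lam_smooth) and the choice of a mean-squared-error base loss are assumptions made purely for illustration.

import torch

# Minimal sketch (not the authors' code) of ICE-based constraint penalties.
# All names here (ice_profiles, ice_penalties, lam_mono, lam_smooth) are
# hypothetical, introduced only for this illustration.

def ice_profiles(model, x, j, grid):
    # Individual conditional expectation (ICE) profiles: for each row of x,
    # sweep feature j over `grid` while holding all other features fixed.
    n, g = x.shape[0], grid.shape[0]
    x_rep = x.repeat_interleave(g, dim=0)   # (n*g, p): each row repeated g times
    x_rep[:, j] = grid.repeat(n)            # overwrite feature j with grid values
    return model(x_rep).reshape(n, g)       # one prediction profile per observation

def ice_penalties(profiles):
    # First differences along the grid: negative values violate a
    # monotone-increasing constraint; second differences measure roughness.
    d1 = profiles[:, 1:] - profiles[:, :-1]
    mono = torch.relu(-d1).mean()           # monotonicity penalty
    d2 = d1[:, 1:] - d1[:, :-1]
    smooth = (d2 ** 2).mean()               # smoothness penalty
    return mono, smooth

def penalized_loss(model, x, y, j, grid, lam_mono=1.0, lam_smooth=1.0):
    # Base loss plus weighted constraint penalties; lam_mono and lam_smooth
    # are hypothetical tuning weights trading accuracy against the constraints.
    base = torch.nn.functional.mse_loss(model(x).squeeze(-1), y)
    mono, smooth = ice_penalties(ice_profiles(model, x, j, grid))
    return base + lam_mono * mono + lam_smooth * smooth

Because the penalties are differentiable functions of the network's predictions, gradient descent can minimize the base loss and the constraint violations jointly; this sketch conveys only the general shape of that idea, not the specific architecture or losses used in the paper.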

Type
Original Research Paper
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries


Supplementary material

Richman and Wüthrich supplementary material (File, 344.5 KB)