
Smoothness and monotonicity constraints for neural networks using ICEnet

Published online by Cambridge University Press: 01 April 2024

Ronald Richman
Affiliation: University of the Witwatersrand, Johannesburg, South Africa; Old Mutual Insure, Johannesburg, South Africa
Mario V. Wüthrich
Affiliation: RiskLab, Department of Mathematics, ETH Zurich, Zürich, Switzerland

Corresponding author: Ronald Richman; Email: ronaldrichman@gmail.com

Abstract

Deep neural networks have become an important tool for actuarial tasks, both because of the significant accuracy gains they provide over traditional methods and because of their close connection to the generalized linear models (GLMs) currently used in industry. Although constraining GLM parameters relating to insurance risk factors to be smooth or to exhibit monotonicity is trivial, methods to incorporate such constraints into deep neural networks have not yet been developed. This is a barrier to the adoption of neural networks in insurance practice, since actuaries often impose these constraints for commercial or statistical reasons. In this work, we present a novel method for enforcing constraints within deep neural network models, and we show how these models can be trained. Moreover, we provide example applications using real-world datasets. We call our proposed method ICEnet to emphasize its close link to the individual conditional expectation (ICE) model interpretability technique.
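To make the abstract's description concrete, the following is a minimal sketch, in PyTorch, of how smoothness and monotonicity penalties could be computed on ICE profiles and added to a training loss. It is one plausible reading of the approach described above, not the authors' implementation; all names (ice_profiles, ice_penalties, lam_mono, lam_smooth) and the choice of a mean-squared-error base loss are assumptions made purely for illustration.

import torch

# Minimal sketch (not the authors' code) of ICE-based constraint penalties.
# All names here (ice_profiles, ice_penalties, lam_mono, lam_smooth) are
# hypothetical, introduced only for this illustration.

def ice_profiles(model, x, j, grid):
    # Individual conditional expectation (ICE) profiles: for each row of x,
    # sweep feature j over `grid` while holding all other features fixed.
    n, g = x.shape[0], grid.shape[0]
    x_rep = x.repeat_interleave(g, dim=0)   # (n*g, p): each row repeated g times
    x_rep[:, j] = grid.repeat(n)            # overwrite feature j with grid values
    return model(x_rep).reshape(n, g)       # one prediction profile per observation

def ice_penalties(profiles):
    # First differences along the grid: negative values violate a
    # monotone-increasing constraint; second differences measure roughness.
    d1 = profiles[:, 1:] - profiles[:, :-1]
    mono = torch.relu(-d1).mean()           # monotonicity penalty
    d2 = d1[:, 1:] - d1[:, :-1]
    smooth = (d2 ** 2).mean()               # smoothness penalty
    return mono, smooth

def penalized_loss(model, x, y, j, grid, lam_mono=1.0, lam_smooth=1.0):
    # Base loss plus weighted constraint penalties; lam_mono and lam_smooth
    # are hypothetical tuning weights trading accuracy against the constraints.
    base = torch.nn.functional.mse_loss(model(x).squeeze(-1), y)
    mono, smooth = ice_penalties(ice_profiles(model, x, j, grid))
    return base + lam_mono * mono + lam_smooth * smooth

Because the penalties are differentiable functions of the network's predictions, gradient descent can minimize the base loss and the constraint violations jointly; this sketch conveys only the general shape of that idea, not the specific architecture or losses used in the paper.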

Type
Original Research Paper
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries


Supplementary material

Richman and Wüthrich supplementary material (File, 344.5 KB)