Cyber breach risk modeling for insurance: capturing temporal and cross-group dependence

Published online by Cambridge University Press: 12 September 2025

Yijia Li, Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
Xuanhe Wang*, School of Finance, Dongbei University of Finance and Economics, Dalian, China
Peng Zhao, School of Mathematics and Statistics, Jiangsu Normal University, Xuzhou, China
Taizhong Hu, Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
*Corresponding author: Xuanhe Wang; Email: wxhmath@163.com

Abstract

Cyber breaches pose a significant threat to both enterprises and society. Analyzing cyber breach data is essential for improving cyber risk management and developing effective cyber insurance policies. However, modeling cyber risk is challenging due to its inherent characteristics, including sparsity, heterogeneity, heavy tails, and dependence. This work introduces a cluster-based dependence model that captures both temporal and cross-group dependencies, providing a more accurate representation of multivariate cyber breach risks. The proposed framework employs a cluster-based kernel approach to model breach severity, effectively handling heterogeneity and extreme values, while a copula-based method is used to capture multivariate dependence. Our findings, validated through both empirical and synthetic studies, demonstrate that the proposed model effectively captures the statistical characteristics of multivariate cyber breach risks and outperforms commonly used models in predictive accuracy. Furthermore, we show that our approach can enhance cyber insurance pricing by generating more profitable insurance contracts.

Information

Type: Original Research Paper
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2025. Published by Cambridge University Press on behalf of the Institute and Faculty of Actuaries

Table 1. Groups formed from the categories of the PRC (Privacy Rights Clearinghouse) dataset

Table 2. Summary statistics of each group in the PRC data, where "SD" denotes the standard deviation and "CV" the coefficient of variation

Figure 1. Boxplots of Pearson’s $\rho$ and Kendall’s $\tau$ for temporal and cross-group dependence.
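
A minimal sketch of how the two coefficients summarized in Figure 1 could be computed. It assumes the data sit in a hypothetical (T, G) array of per-period breach sizes, one column per group, and it takes temporal dependence to mean lag-1 dependence within a group; both are assumptions, not details stated here. Kendall's tau is implemented as tau-a, which coincides with the more common tau-b when there are no ties.

```python
import numpy as np

def pearson_rho(x, y):
    """Sample Pearson correlation coefficient."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs; tied pairs count as neither."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    s = 0.0
    for i in range(n - 1):
        # +1 for each concordant pair (i, j > i), -1 for each discordant pair
        s += np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
    return float(s / (n * (n - 1) / 2))

def temporal_and_cross_group(series):
    """series: hypothetical (T, G) array of per-period breach sizes, one column per group.

    Returns {g: (rho, tau)} for lag-1 temporal dependence within each group and
    {(a, b): (rho, tau)} for contemporaneous dependence between groups a < b.
    """
    series = np.asarray(series, float)
    G = series.shape[1]
    temporal = {g: (pearson_rho(series[:-1, g], series[1:, g]),
                    kendall_tau(series[:-1, g], series[1:, g]))
                for g in range(G)}
    cross = {(a, b): (pearson_rho(series[:, a], series[:, b]),
                      kendall_tau(series[:, a], series[:, b]))
             for a in range(G) for b in range(a + 1, G)}
    return temporal, cross
```

Collecting these values over groups or group pairs gives the samples that a boxplot summarizes.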

Figure 2. The first tree of a $5$-dimensional S-vine with the first three time points.

Algorithm 1. Estimating an S-vine copula model to simultaneously accommodate the cross-group dependence and the temporal dependence in a multivariate time series.
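
The sketch below is not Algorithm 1. A genuine S-vine fits bivariate pair-copulas along vine trees and links them across time under stationarity constraints. As a deliberately simplified stand-in, it shows two ingredients such an estimation typically relies on: rank-based pseudo-observations for each group, and stacking consecutive time points (t, t+1) so that one copula describes cross-group and temporal dependence together. A Gaussian copula fitted through normal scores replaces the vine here. The Markov order of 1 and the Gaussian family are assumptions of this illustration.

```python
import numpy as np
from statistics import NormalDist

def pseudo_observations(x):
    """Rank-transform each column of a (T, G) array to (0, 1): u = rank / (T + 1)."""
    x = np.asarray(x, float)
    ranks = x.argsort(axis=0).argsort(axis=0) + 1   # ranks 1..T per column (ties broken by order)
    return ranks / (x.shape[0] + 1)

def fit_lag1_gaussian_copula(x):
    """Simplified stand-in for an S-vine of Markov order 1: a Gaussian copula on (U_t, U_{t+1}).

    Returns the 2G x 2G correlation matrix of the normal scores (z_t, z_{t+1}).
    Unlike an S-vine, this does not enforce stationarity across the stacked blocks.
    """
    u = pseudo_observations(x)
    z = np.vectorize(NormalDist().inv_cdf)(u)        # normal scores
    stacked = np.hstack([z[:-1], z[1:]])              # rows (z_t, z_{t+1}), shape (T-1, 2G)
    return np.corrcoef(stacked, rowvar=False)
```

The upper-left G x G block approximates cross-group dependence at a single time point, while the off-diagonal blocks carry the lag-1 temporal and cross-lagged dependence.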

Algorithm 2. Simulating the predictive distribution from a fitted S-vine copula model.
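
A hedged sketch of the simulation idea, continuing the simplified lag-1 Gaussian copula stand-in rather than the S-vine of Algorithm 2. Given the normal scores at time t, the scores at t+1 are drawn from the conditional multivariate normal distribution. Each uniform is then mapped back to the data scale with that group's empirical quantile function, which here stands in for the fitted marginals; that substitution is an assumption.

```python
import numpy as np
from statistics import NormalDist

def simulate_next(R, z_last, history, n_sims=1000, seed=None):
    """Draw n_sims one-step-ahead scenarios from a lag-1 Gaussian copula (stand-in for an S-vine).

    R: 2G x 2G correlation matrix of (z_t, z_{t+1}); z_last: normal scores at time t (length G);
    history: (T, G) observed data whose empirical quantiles act as hypothetical marginals.
    """
    R = np.asarray(R, float)
    G = len(z_last)
    R11, R12, R22 = R[:G, :G], R[:G, G:], R[G:, G:]
    A = np.linalg.solve(R11, R12).T                  # equals R21 @ inv(R11) because R is symmetric
    cond_mean = A @ np.asarray(z_last, float)
    cond_cov = R22 - A @ R12                         # conditional covariance of z_{t+1} given z_t
    rng = np.random.default_rng(seed)
    z_next = rng.multivariate_normal(cond_mean, cond_cov, size=n_sims)   # (n_sims, G)
    u_next = np.vectorize(NormalDist().cdf)(z_next)
    history = np.asarray(history, float)
    # Map uniforms back to the data scale with each group's empirical quantile function.
    return np.column_stack([np.quantile(history[:, g], u_next[:, g]) for g in range(G)])
```

The simulated rows form an empirical predictive distribution, from which medians, means or intervals can be read off.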

Table 3. The AIC and BIC of the clustered severity when fitted as a mixed distribution with different numbers of clusters
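
To illustrate how AIC and BIC can compare mixtures with different numbers of components, here is a minimal EM sketch. It assumes the mixed distribution is a mixture of lognormals, fitted as a Gaussian mixture on log breach sizes; the actual component family is not stated here. The criteria are $\mathrm{AIC} = 2k - 2\ell$ and $\mathrm{BIC} = k\ln n - 2\ell$, where $\ell$ is the maximized log-likelihood and $k$ is the number of free parameters.

```python
import numpy as np

def _log_component_densities(y, w, mu, sd):
    """log(w_k * N(y | mu_k, sd_k^2)) for every observation and component, shape (n, K)."""
    return (np.log(w) - 0.5 * ((y[:, None] - mu) / sd) ** 2
            - np.log(sd) - 0.5 * np.log(2 * np.pi))

def fit_lognormal_mixture(x, K, n_iter=200, seed=0):
    """EM for a K-component lognormal mixture (assumed form); returns the log-likelihood of x."""
    y = np.log(np.asarray(x, float))
    rng = np.random.default_rng(seed)
    mu = rng.choice(y, size=K, replace=False)
    sd = np.full(K, y.std())
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        ld = _log_component_densities(y, w, mu, sd)
        resp = np.exp(ld - ld.max(axis=1, keepdims=True))   # E-step, numerically stabilized
        resp /= resp.sum(axis=1, keepdims=True)
        nk = np.maximum(resp.sum(axis=0), 1e-12)             # M-step
        w = nk / nk.sum()
        mu = (resp * y[:, None]).sum(axis=0) / nk
        sd = np.maximum(np.sqrt((resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk), 1e-6)
    ld = _log_component_densities(y, w, mu, sd)
    m = ld.max(axis=1)
    loglik_log_scale = np.sum(m + np.log(np.exp(ld - m[:, None]).sum(axis=1)))
    return float(loglik_log_scale - y.sum())   # Jacobian: f_X(x) = f_Y(log x) / x

def aic_bic(loglik, K, n):
    k = 3 * K - 1   # K means, K standard deviations, K - 1 free weights
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik
```

Looping `K` over a range and taking the smallest AIC or BIC gives a table of the same shape as Table 3.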

Table 4. K-means clustering of breach sizes
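
A minimal one-dimensional k-means (Lloyd's algorithm) sketch for splitting breach sizes into clusters. It assumes clustering is done on the log scale and uses four clusters to match the small, medium, large and extreme clusters named in the figure captions. Both choices, and the quantile-based initialization, are assumptions of this illustration.

```python
import numpy as np

def kmeans_1d(x, k=4, n_iter=100):
    """Lloyd's k-means on log breach sizes; labels are 0 (smallest centre) .. k-1 (largest)."""
    y = np.log(np.asarray(x, float))
    centers = np.quantile(y, (np.arange(k) + 0.5) / k)   # spread-out, deterministic start
    for _ in range(n_iter):
        labels = np.argmin(np.abs(y[:, None] - centers), axis=1)   # assign to nearest centre
        new = np.array([y[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])                        # recompute centres
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.argmin(np.abs(y[:, None] - centers), axis=1)       # final assignment
    order = np.argsort(centers)
    rank = np.empty(k, int)
    rank[order] = np.arange(k)                                     # relabel by centre size
    return rank[labels], centers[order]
```

Because the data are one-dimensional, each resulting cluster is an interval of (log) breach sizes.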

Table 5. Estimated parameters and their standard errors for the marginal distribution of severity in each cluster, where "Est." represents the estimates and "SE" is the standard error
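
As an example of how estimates and standard errors like those in Table 5 arise, the sketch below fits a lognormal by maximum likelihood and derives standard errors from the inverse Fisher information. The lognormal family is an assumed example, not necessarily the marginal used in each cluster. For $\log X \sim N(\mu, \sigma^2)$ the asymptotic variances are $\sigma^2/n$ for $\hat\mu$ and $\sigma^2/(2n)$ for $\hat\sigma$.

```python
import numpy as np

def lognormal_mle_with_se(x):
    """MLE of a lognormal(mu, sigma) with asymptotic standard errors (assumed marginal family)."""
    y = np.log(np.asarray(x, float))
    n = len(y)
    mu = y.mean()
    sigma = y.std()                          # MLE uses divisor n, not n - 1
    se_mu = sigma / np.sqrt(n)               # sqrt(sigma^2 / n)
    se_sigma = sigma / np.sqrt(2 * n)        # sqrt(sigma^2 / (2n))
    return {"mu": (mu, se_mu), "sigma": (sigma, se_sigma)}
```

Applied per cluster, this could be written as `{j: lognormal_mle_with_se(x[labels == j]) for j in range(4)}`, where `labels` is a hypothetical cluster assignment.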

Figure 3. PP-plot of fitted mixed distribution of severity for each cluster.
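
A PP-plot sets the empirical cumulative probabilities of the sorted data against the fitted CDF evaluated at the same points; points close to the 45-degree line indicate a good fit. The sketch below computes these coordinates for an assumed lognormal fit, using the plotting positions $i/(n+1)$.

```python
import numpy as np
from statistics import NormalDist

def pp_points(x, mu, sigma):
    """(empirical, fitted) probability pairs for a PP-plot against a lognormal(mu, sigma) fit."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    empirical = np.arange(1, n + 1) / (n + 1)             # plotting positions i / (n + 1)
    dist = NormalDist(mu, sigma)
    fitted = np.array([dist.cdf(v) for v in np.log(x)])   # lognormal CDF = normal CDF of log x
    return empirical, fitted
```

Plotting `fitted` against `empirical` for each cluster gives one panel of a figure like Figure 3.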

Figure 4. The first tree of the S-vine fitted to the data from the first $15$ time periods.

Figure 5. Violin plots of the predicted distributions of breach sizes for the extreme cluster, where a red circle represents an observed value; a green dot and a blue star, respectively, represent the predicted median and mean.

Figure 6. Violin plots of the predicted distributions of breach sizes for the large cluster, where a red circle represents an observed value; a green dot and a blue star, respectively, represent the predicted median and mean.

Figure 7. Violin plots of the predicted distributions of breach sizes for the medium cluster, where a red circle represents an observed value; a green dot and a blue star, respectively, represent the predicted median and mean.

Figure 8. Violin plots of the predicted distributions of breach sizes for the small cluster, where a red circle represents an observed value; a green dot and a blue star, respectively, represent the predicted median and mean.

Figure 9. Violin plots of the fifth-root-transformed MSE for each model in each cluster, where a green dot indicates the median.

Figure 10. Violin plots of the square-root-transformed MAD for each model in each cluster, where a green dot indicates the median.

Table 6. MSEs and MADs of predicted breach sizes for each model
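
A minimal sketch of the two point-forecast error measures, together with the transformations used in the violin plots (fifth root for MSE, square root for MAD). It takes MAD to mean the mean absolute deviation of the prediction errors; that reading is an assumption, since MAD sometimes denotes the median absolute deviation.

```python
import numpy as np

def mse_mad(observed, predicted):
    """Mean squared error and mean absolute deviation of point predictions."""
    err = np.asarray(predicted, float) - np.asarray(observed, float)
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))

def transformed(mse, mad):
    """Fifth root of MSE and square root of MAD, the scales used in the violin plots."""
    return mse ** 0.2, mad ** 0.5
```

The root transformations compress the heavy right tail of the error measures, which keeps the violins of different models readable on one axis.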

Table 7. Medians of the CRPSs, and percentages of cases in which the CRPS of M1 is less than or equal to that of each other model
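
The continuous ranked probability score of a predictive distribution $F$ at an observation $y$ can be written as $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, with $X, X'$ independent draws from $F$. The sketch below estimates it from simulated predictive draws and computes the kind of "M1 not worse" percentage reported in Table 7. Treating each observation as one comparison case is an assumption.

```python
import numpy as np

def crps_sample(samples, y):
    """Sample-based CRPS estimate: mean|X - y| - 0.5 * mean|X - X'| over the predictive draws."""
    s = np.asarray(samples, float)
    term1 = np.mean(np.abs(s - y))
    term2 = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))   # O(m^2) memory in the number of draws
    return float(term1 - term2)

def share_not_worse(crps_m1, crps_other):
    """Percentage of cases in which M1's CRPS is less than or equal to the competitor's."""
    a, b = np.asarray(crps_m1, float), np.asarray(crps_other, float)
    return 100.0 * float(np.mean(a <= b))
```

For a single-point forecast the second term vanishes and the CRPS reduces to the absolute error, which is why the CRPS is often described as a generalization of the MAD to full predictive distributions.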

Figure 11. Violin plots of the square-root-transformed CRPS for each model in each cluster, where a green dot indicates the median.

Table 8. Gini indices and their standard errors (SE) for the number of breached records under various models, with the base premium given by the sample means

Table 9. Gini indices of different models for the number of breached records

Figure 12. Ordered Lorenz curves of the number of breached records.
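
A minimal sketch of an ordered Lorenz curve and its Gini index. It follows the construction commonly used in insurance scoring and assumes this is the construction intended here. Policies are sorted by the relativity score / base premium. The curve plots the cumulative share of base premium against the cumulative share of losses, and the Gini index is twice the area between the 45-degree line and the curve, i.e. one minus twice the area under it. A positive value means the competing score ranks losses better than the base premium. The function name and arguments are illustrative.

```python
import numpy as np

def ordered_lorenz_gini(base_premium, score, loss):
    """Gini index from an ordered Lorenz curve; returns (gini, premium shares, loss shares)."""
    P = np.asarray(base_premium, float)
    S = np.asarray(score, float)
    y = np.asarray(loss, float)
    order = np.argsort(S / P, kind="stable")                     # sort by relativity S / P
    Fp = np.concatenate([[0.0], np.cumsum(P[order]) / P.sum()])  # cumulative premium share
    Fl = np.concatenate([[0.0], np.cumsum(y[order]) / y.sum()])  # cumulative loss share
    area = np.sum(np.diff(Fp) * (Fl[1:] + Fl[:-1]) / 2)          # trapezoid area under the curve
    return float(1 - 2 * area), Fp, Fl
```

With a constant base premium (for example, the sample mean), the ordering depends only on the model's predicted score, so the Gini index measures how well each model ranks the observed numbers of breached records.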

Supplementary material

Li et al. supplementary material (File, 299.1 KB)