
A representation-learning approach for insurance pricing with images

Published online by Cambridge University Press:  15 March 2024

Christopher Blier-Wong*
Affiliation:
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
Luc Lamontagne
Affiliation:
Département d’informatique et de génie logiciel, Université Laval, Québec, QC, Canada
Etienne Marceau
Affiliation:
École d’actuariat, Université Laval, Québec, QC, Canada
*Corresponding author: Christopher Blier-Wong; Email: cblierwo@uwaterloo.ca

Abstract

Unstructured data are a promising new source of information that insurance companies may use to understand their risk portfolio better and improve the customer experience. However, these novel data sources are difficult to incorporate into existing ratemaking frameworks due to the size and format of the unstructured data. This paper proposes a framework to use street view imagery within a generalized linear model. To do so, we use representation learning to extract an embedding vector containing useful information from the image. This embedding is dense and low dimensional, making it appropriate to use within existing ratemaking models. We find that street view imagery contains useful information for predicting the frequency of claims for certain types of perils. This model can be used as is in a ratemaking framework, but it also opens the door to future empirical research into which characteristics within the image lead to increased or decreased predicted claim frequencies. Throughout, we discuss the practical difficulties (technical and social) of using this type of data for insurance pricing.
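
The pipeline described in the abstract (compress image features into a dense, low-dimensional embedding, then use that embedding as covariates in a generalized linear model for claim frequency) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature matrix is synthetic rather than extracted from street view images, the embedding is built with plain principal component analysis rather than a fine-tuned convolutional network, and the function names `pca_embed`, `fit_poisson_glm` and `poisson_deviance` are hypothetical.

```python
import numpy as np


def pca_embed(features, k):
    """Project centred features onto their first k principal components."""
    centred = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:k].T


def fit_poisson_glm(X, y, n_iter=50, tol=1e-10):
    """Fit a log-link Poisson GLM by iteratively reweighted least squares."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        XtW = X.T * mu  # IRLS weights equal mu for Poisson with log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def poisson_deviance(y, mu):
    """Poisson deviance, with y * log(y / mu) taken as 0 when y == 0."""
    safe_y = np.where(y > 0, y, 1.0)
    term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))


# Synthetic stand-in for image features: low-rank signal plus noise (assumed).
rng = np.random.default_rng(0)
n, d, k = 2000, 64, 8
latent = rng.normal(size=(n, k))
features = latent @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d))

# Simulated claim counts whose rate depends on one latent image factor.
claims = rng.poisson(np.exp(-2.0 + 0.3 * latent[:, 0]))

embedding = pca_embed(features, k)  # dense, low-dimensional covariates
beta = fit_poisson_glm(embedding, claims)
mu_hat = np.exp(np.column_stack([np.ones(n), embedding]) @ beta)

null_dev = poisson_deviance(claims, np.full(n, claims.mean()))
model_dev = poisson_deviance(claims, mu_hat)
print(f"null deviance: {null_dev:.2f}, with embeddings: {model_dev:.2f}")
```

The comparison printed here is in-sample and only shows the mechanics; any assessment of predictive value would need deviance on held-out data, as in the testing-deviance tables listed below.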

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of The International Actuarial Association
Figure 1. Framework for the representation-learning framework.

Figure 2. Ideal candidate for facade image.

Figure 3. Examples of images requiring filtering.

Figure 4. Examples of images requiring censoring.

Figure 5. Steps in image cleanup.

Table 1. Variables and summary statistics from the property assessment dataset.

Table 2. Summary of fine-tuned, frozen and PCA construction approaches.

Figure 6. Architecture for the complete and limited representation approaches.

Figure 7. Architecture for the basic PCA approach.

Table 3. Summary of parameters for ResNet and DenseNet models.

Table 4. Summary results on training set for Image models.

Table 5. Root mean squared error of regression tasks for frozen and fine-tuned models with ResNet-101 and 32 embedding dimensions.

Figure 8. Confusion matrices of the number of stories for frozen (left) and fine-tuned (right) models with ResNet-101 and 32 embedding dimensions on training set.

Figure 9. Cumulative percentage of variance explained for the first principal components from the feature spaces.

Table 6. Testing deviance for frequency prediction with fine-tuned models.

Table 7. Testing deviance for frequency prediction with frozen models.

Table 8. Testing deviance for frequency prediction with principal components.

Table 9. Comparison of p-values for the variable bage with and without embeddings.

Table 10. Variance inflation factors for different embedding construction approaches.

Figure 10. Correlation matrix of embedding dimensions for ResNet-18 with eight embeddings for frozen (left) and fine-tuned (right) approaches.

Table 11. Variance inflation factors after decorrelating embeddings.

Table 12. Testing deviance for severity prediction with fine-tuned models.

Table 13. Testing deviance for severity prediction with frozen models.

Table 14. Testing deviance for severity prediction with principal components.