Chapter Preview. Many insurance datasets feature information about frequency (how often claims arise) in addition to severity (the size of each claim). This chapter introduces tools for handling the joint distribution of frequency and severity. Frequency-severity modeling is important in insurance applications because of features of contracts, policyholder behavior, databases that insurers maintain, and regulatory requirements. Model selection depends on the data form. For some data, we observe the claim amount and think about a zero claim as meaning no claim during that period. For other data, we observe individual claim amounts. Model selection also depends on the purpose of the inference; this chapter highlights the Tweedie generalized linear model as a desirable option. To emphasize practical applications, this chapter features a case study of Massachusetts automobile claims, using out-of-sample validation for model comparisons.
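As a concrete point of reference for the Tweedie approach mentioned in the preview, the sketch below shows how such a model could be fit in R. The data, variable names, and the variance power of 1.5 are illustrative assumptions, not values from the chapter's case study.

```r
# A minimal sketch of a Tweedie GLM for aggregate claim cost per policy,
# assuming the 'statmod' package is available; all data values are made up.
library(statmod)

dat <- data.frame(
  cost  = c(0, 0, 250, 0, 1200, 0, 80, 0, 430, 0),    # total claim cost (often zero)
  age   = c(23, 45, 31, 52, 19, 64, 38, 27, 41, 58),  # a rating variable
  urban = factor(c(1, 0, 1, 0, 1, 0, 0, 1, 1, 0))     # territory indicator
)

# Tweedie GLM with log link (link.power = 0); a variance power between 1 and 2
# corresponds to a compound Poisson-gamma model with positive mass at zero
fit <- glm(cost ~ age + urban,
           family = tweedie(var.power = 1.5, link.power = 0),
           data = dat)
summary(fit)
```

With a variance power strictly between 1 and 2, the Tweedie distribution is a compound Poisson sum of gamma claim amounts, which is why it can assign positive probability to a total claim of exactly zero; in practice the variance power itself would be chosen by profile likelihood or a similar criterion.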
How Frequency Augments Severity Information
At a fundamental level, insurance companies accept premiums in exchange for promises to indemnify a policyholder on the uncertain occurrence of an insured event. This indemnification is known as a claim. A positive claim amount, also known as the severity, is a key financial expenditure for an insurer. One can also think of a zero claim as equivalent to the insured event not occurring. So, knowing only the claim amount summarizes the reimbursement to the policyholder. Ignoring expenses, an insurer that examines only amounts paid would be indifferent between two claims of 100 and one claim of 200, even though the number of claims differs.
Chapter Preview. This chapter presents regression models where the random variable is a count and compares different risk classification models for the annual number of claims reported to the insurer. Count regression analysis allows identification of risk factors and prediction of the expected frequency given the characteristics of the risk. This chapter details some of the most popular models for claim counts, the way the actuary should use these models for inference, and how the models should be compared.
Introduction
In the early 20th century, before the theoretical advances in the statistical sciences, a method called the minimum bias technique was used to find the premiums that should be offered to insureds with different risk characteristics. The aim of this technique was to find premium parameters that minimize bias, using iterative algorithms.
Instead of relying on such techniques, which lack theoretical support, the actuarial community now bases its methods on probability and statistical theory. Using specific probability distributions for the number and cost of claims, actuaries typically calculate the premium by obtaining the conditional expectation of the number of claims given the risk characteristics and combining it with the expected claim amount. In this chapter, we focus on the number of claims.
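As a point of reference, under the common simplifying assumption that claim counts and claim amounts are independent given the risk characteristics, the pure premium is the product of the expected frequency and the expected severity. The short R sketch below illustrates this with a Poisson count regression; the data, variable names, and the assumed average claim amount are illustrative, not taken from this chapter.

```r
# A minimal sketch of a Poisson count regression combined with an assumed
# expected severity to form a pure premium; all values are illustrative.
dat <- data.frame(
  nclaims  = c(0, 1, 0, 2, 0, 0, 1, 0, 3, 0),            # annual claim counts
  age      = c(23, 45, 31, 52, 19, 64, 38, 27, 41, 58),  # a risk characteristic
  exposure = c(1, 1, 0.5, 1, 1, 0.75, 1, 1, 1, 0.5)      # policy-years observed
)

# Poisson regression for claim frequency, with exposure entering as an offset
freq_fit <- glm(nclaims ~ age + offset(log(exposure)),
                family = poisson, data = dat)

# Expected annual frequency for a new risk, then an (assumed) pure premium
new_risk     <- data.frame(age = 30, exposure = 1)
exp_freq     <- predict(freq_fit, newdata = new_risk, type = "response")
exp_severity <- 1500                      # assumed average claim amount
pure_premium <- exp_freq * exp_severity
pure_premium
```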
Chapter Preview. Linear modeling, also known as regression analysis, is a core tool in statistical practice for data analysis, prediction, and decision support. Applied data analysis requires judgment, domain knowledge, and the ability to analyze data. This chapter provides a summary of the linear model and discusses model assumptions, parameter estimation, variable selection, and model validation around a series of examples. These examples are grounded in data to help relate the theory to practice. All of these practical examples and exercises are completed using the open-source R statistical computing package. Particular attention is paid to the role of exploratory data analysis in the iterative process of criticizing, improving, and validating models in a detailed case study. Linear models provide a foundation for many of the more advanced statistical and machine-learning techniques that are explored in the later chapters of this volume.
Introduction
Linear models are used to analyze relationships among various pieces of information to arrive at insights or to make predictions. These models are referred to by many terms, including linear regression, regression, multiple regression, and ordinary least squares. In this chapter we adopt the term linear model.
Linear models provide a vehicle for quantifying relationships between an outcome (also referred to as dependent or target) variable and one or more explanatory (also referred to as independent or predictive) variables.
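As a minimal, self-contained illustration, a linear model can be fit in R with lm(); the simulated data and variable names below are our own and do not come from the chapter's case study.

```r
# A minimal sketch of an ordinary least squares fit; data are simulated.
set.seed(1)
n  <- 100
x1 <- rnorm(n)                                      # first explanatory variable
x2 <- runif(n)                                      # second explanatory variable
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n, sd = 0.5)  # outcome variable

fit <- lm(y ~ x1 + x2)                              # linear model (OLS) fit
summary(fit)                                        # estimates, standard errors, R-squared
predict(fit, newdata = data.frame(x1 = 0.3, x2 = 0.7))  # prediction for new values
```

The summary() output reports parameter estimates and fit statistics, and predict() applies the fitted model to new explanatory values.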
Chapter Preview. Generalized additive models (GAMs) provide a further generalization of both linear regression and generalized linear models (GLMs) by allowing the relationship between the response variable y and the individual predictor variables xj to be an additive, but not necessarily monomial, function of those predictors. Also, as with the GLM, a nonlinear link function can connect the additive combination of the nonlinear functions of the predictors to the mean of the response variable, giving flexibility in distributional form, as discussed in Chapter 5. The key factors in creating a GAM are the determination and construction of the functions of the predictor variables (called smoothers). Different methods of fit and functional forms for the smoothers are discussed. The GAM can be considered more data driven (the smoothers are determined from the data) than model driven (the additive monomial functional form assumed in linear regression and the GLM).
Motivation for Generalized Additive Models and Nonparametric Regression
For many statistical models, there are two useful pieces of information that we would like to learn about the relationship between a response variable y and a set of available predictor variables x1, x2, …, xk: (1) the statistical strength or explanatory power of the predictors for influencing the response y (i.e., predictor variable worth), and (2) a formulation that gives us the ability to predict the value of the response variable ynew that would arise under a given set of newly observed predictor values x1,new, x2,new, …, xk,new (the prediction problem).
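The sketch below uses the 'mgcv' package, one common R implementation of GAMs, with simulated data of our own construction; it is meant only to show how both questions above can be addressed: summary() reports the approximate significance of each fitted smoother, and predict() returns a prediction for new predictor values.

```r
# A minimal sketch of a generalized additive model using the 'mgcv' package;
# the simulated data and variable names are illustrative assumptions.
library(mgcv)

set.seed(2)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + 0.5 * x2^2 + rnorm(n, sd = 0.3)

# s() requests a smoother for each predictor; a non-Gaussian family and link
# could be supplied just as in a GLM
fit <- gam(y ~ s(x1) + s(x2))
summary(fit)                                   # approximate significance of smooth terms
predict(fit, newdata = data.frame(x1 = 0.5, x2 = 0.25))  # prediction for new values
plot(fit, pages = 1)                           # display the fitted smoothers
```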
Chapter Preview. In the actuarial context, fat-tailed phenomena are often observed where the probability of extreme events is higher than that implied by the normal distribution. The traditional regression, emphasizing the center of the distribution, might not be appropriate when dealing with data with fat-tailed properties. Overlooking the extreme values in the tail could lead to biased inference for rate-making and valuation. In response, this chapter discusses four fat-tailed regression techniques that fully use the information from the entire distribution: transformation, models based on the exponential family, models based on generalized distributions, and median regression.
Introduction
Insurance ratemaking is a classic actuarial problem in property-casualty insurance, in which actuaries determine the rates or premiums for insurance products. The primary goal of the ratemaking process is to predict the expected claims cost precisely, since it serves as the basis for the pure premium calculation. Regression techniques are useful in this process because future events are usually forecast from past occurrences based on the statistical relationships between outcomes and explanatory variables. This is particularly true for personal lines of business, where insurers usually possess large amounts of information on policyholders that can be valuable predictors in the determination of the mean cost.
The traditional mean regression, focusing on the center of the distribution, relies on the normality of the response variable.
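As one concrete illustration of the last of these techniques, median regression, the sketch below uses the 'quantreg' package with simulated heavy-tailed losses; the data and variable names are illustrative assumptions rather than the chapter's application.

```r
# A minimal sketch of median (quantile) regression next to mean regression;
# losses are simulated with heavy-tailed errors, and all names are illustrative.
library(quantreg)

set.seed(3)
n    <- 300
x    <- runif(n, 18, 70)                    # e.g., a rating variable such as age
loss <- exp(1 + 0.02 * x + rt(n, df = 2))   # fat-tailed claim sizes

fit_mean   <- lm(loss ~ x)                  # traditional mean regression
fit_median <- rq(loss ~ x, tau = 0.5)       # median regression (tau = 0.5)

summary(fit_median)                         # coefficients estimated at the median
```

Because the median is far less sensitive to extreme losses than the mean, the two fits can differ sharply on data like these; other quantiles are obtained by changing tau.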
Chapter Preview. This chapter provides an introduction to transition modeling. Consider a situation where an individual or entity is, at any time, in one of several states and may from time to time move from one state to another. The state may, for example, indicate the health status of an individual, the status of an individual under the terms of an insurance policy, or even the “state” of the economy. The changes of state are called transitions. There is often uncertainty associated with how much time will be spent in each state and which state will be entered on each transition. This uncertainty can be modeled using a multistate stochastic model. Such a model may be described in terms of the rates of transition from one state to another. Transition modeling involves the estimation of these rates from data.
Actuaries often work with contracts involving several states and financial implications associated with presence in a state or transition between states. A life insurance policy is a simple example. A multistate stochastic model provides a valuable tool to help the actuary analyze the cash flow structure of a given contract. Transition modeling is essential to the creation of this tool.
This chapter is intended for practitioners, actuaries, or analysts who are faced with a multistate setup and need to estimate the rates of transition from available data. The assumed knowledge – only basic probability and statistics as well as life contingencies – is minimal.
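As a simple point of reference (a standard result rather than this chapter's full development), under a time-homogeneous Markov model the maximum likelihood estimate of the transition rate from state i to state j is the occurrence-exposure rate: the number of observed i-to-j transitions divided by the total time spent in state i. A minimal sketch in R, with made-up counts and exposures:

```r
# A minimal sketch of the occurrence-exposure estimate of a transition rate;
# the counts and exposure times below are made-up illustrative values.
transitions_ij <- 14       # observed transitions from state i (e.g., healthy) to state j (e.g., disabled)
exposure_i     <- 532.5    # total person-years observed in state i

rate_ij <- transitions_ij / exposure_i   # estimated transition intensity per year
rate_ij
```

In practice the rates are usually allowed to depend on age or policy duration, which is exactly the kind of structure that estimating transition rates from data must accommodate.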
Filtering and smoothing methods are used to produce an accurate estimate of the state of a time-varying system based on multiple observational inputs (data). Interest in these methods has exploded in recent years, with numerous applications emerging in fields such as navigation, aerospace engineering, telecommunications and medicine. This compact, informal introduction for graduate students and advanced undergraduates presents the current state-of-the-art filtering and smoothing methods in a unified Bayesian framework. Readers learn what non-linear Kalman filters and particle filters are, how they are related, and their relative advantages and disadvantages. They also discover how state-of-the-art Bayesian parameter estimation methods can be combined with state-of-the-art filtering and smoothing algorithms. The book's practical and algorithmic approach assumes only modest mathematical prerequisites. Examples include Matlab computations, and the numerous end-of-chapter exercises include computational assignments. Matlab code is available for download at www.cambridge.org/sarkka, promoting hands-on work with the methods.
Professor Kac's monograph is designed to illustrate how simple observations can be made the starting point of rich and fruitful theories and how the same theme recurs in seemingly unrelated disciplines. An elementary but thorough discussion of the game of 'heads or tails', including the normal law and the laws of large numbers, is presented in a setting in which a variety of purely analytic results appear natural and inevitable. The chapter 'Primes Play a Game of Chance' uses the same setting in dealing with problems of the distribution of values of arithmetic functions. The final chapter 'From Kinetic Theory to Continued Fractions' deals with a spectacular application of the ergodic theorems to continued fractions. Mark Kac conveyed his infectious enthusiasm for mathematics and its applications in his lectures, papers, and books. Two of his papers won Chauvenet awards for expository excellence. Born in Poland, he studied with Hugo Steinhaus at Lwów, earning his doctorate in 1936. He had a long and productive career in the United States, serving on the faculties of Cornell University (1939-1961), Rockefeller University (1961-1982), and the University of Southern California (1982 until his death in 1984).
Communication networks underpin our modern world, and provide fascinating and challenging examples of large-scale stochastic systems. Randomness arises in communication systems at many levels: for example, the initiation and termination times of calls in a telephone network, or the statistical structure of the arrival streams of packets at routers in the Internet. How can routing, flow control and connection acceptance algorithms be designed to work well in uncertain and random environments? This compact introduction illustrates how stochastic models can be used to shed light on important issues in the design and control of communication networks. It will appeal to readers with a mathematical background wishing to understand this important area of application, and to those with an engineering background who want to grasp the underlying mathematical theory. Each chapter ends with exercises and suggestions for further reading.
'Big data' poses challenges that require both classical multivariate methods and contemporary techniques from machine learning and engineering. This modern text equips you for the new world - integrating the old and the new, fusing theory and practice and bridging the gap to statistical learning. The theoretical framework includes formal statements that set out clearly the guaranteed 'safe operating zone' for the methods and allow you to assess whether data is in the zone, or near enough. Extensive examples showcase the strengths and limitations of different methods with small classical data, data from medicine, biology, marketing and finance, high-dimensional data from bioinformatics, functional data from proteomics, and simulated data. High-dimension low-sample-size data gets special attention. Several data sets are revisited repeatedly to allow comparison of methods. Generous use of colour, algorithms, Matlab code, and problem sets complete the package. Suitable for master's/graduate students in statistics and researchers in data-rich disciplines.
The MAA is pleased to re-issue the early Carus Mathematical Monographs in ebook and print-on-demand formats. Readers with an interest in the history of the undergraduate curriculum or the history of a particular field will be rewarded by study of these very clear and approachable little volumes. This monograph contributes toward shifting the emphasis and point of view in the study of statistics in the direction of the consideration of the underlying theory involved in certain highly important methods of statistical analysis. With this as the main purpose it is natural that no great effort is made to present a well-balanced discussion of all the many available topics. Considerable portions of this monograph can be read by those who have relatively little knowledge of college mathematics. However, the exposition is designed, in general, for readers of a certain degree of mathematical maturity, and presupposes an acquaintance with elementary differential and integral calculus, and with the elementary principles of probability as presented in various books on college algebra for freshmen.
Focusing on what actuaries need in practice, this introductory account provides readers with essential tools for handling complex problems and explains how simulation models can be created, used and re-used (with modifications) in related situations. The book begins by outlining the basic tools of modelling and simulation, including a discussion of the Monte Carlo method and its use. Part II deals with general insurance and Part III with life insurance and financial risk. Algorithms that can be implemented on any programming platform are spread throughout and a program library written in R is included. Numerous figures and experiments with R-code illustrate the text. The author's non-technical approach is ideal for graduate students, the only prerequisites being introductory courses in calculus and linear algebra, probability and statistics. The book will also be of value to actuaries and other analysts in the industry looking to update their skills.
Life and pension insurance are arrangements for which payment streams are determined by the states occupied by individuals, for example active, retired, disabled, and so on. These contracts typically last for a long time, up to half a century and more. A simple example is an arrangement where an account is first built up and then harvested after a certain date. At first glance this is only a savings account, but insurance can be put into it by including the randomness of how long people live. When such accounts are managed for many individuals simultaneously, it becomes possible to balance lifecycles against one another so that short lives (for which savings are not used up fully) partially finance long ones. There is much sense in this. In old age, benefits do not stop after an agreed date, but may go on until the recipient dies.
Many versions and variants of such contracts exist. Benefits may be one-time settlements upon retirement or death of a policyholder, or they may be distributed over time as a succession of payments. Traditional schemes have often been defined benefit, where economic rights after retirement determine the contributions that sustain them. Defined contribution schemes are the opposite: the pension follows from the earlier build-up of the account. Whatever the arrangement, the values created depend on investment policy and the interest rate, and also on inflation (which determines the real worth of the pension when it is put to use).