Chapter Preview. This chapter extends the discussion of multiple linear regression by introducing statistical inference for handling several coefficients simultaneously. To motivate this extension, this chapter considers coefficients associated with categorical variables. These variables allow us to group observations into distinct categories. This chapter shows how to incorporate categorical variables into regression functions using binary variables, thus widening the scope of potential applications. Statistical inference for several coefficients allows analysts to make decisions about categorical variables and other important applications. Categorical explanatory variables also provide the basis for an ANOVA model, a special type of regression model that permits easier analysis and interpretation.
The Role of Binary Variables
Categorical variables provide labels for observations to denote membership in distinct groups, or categories. A binary variable is a special case of a categorical variable. To illustrate, a binary variable may tell us whether someone has health insurance. A categorical variable could tell us whether someone has
Private group insurance (offered by employers and associations),
Private individual health insurance (through insurance companies),
Public insurance (e.g., Medicare or Medicaid) or
No health insurance.
For categorical variables, there may or may not be an ordering of the groups. For health insurance, it is difficult to order these four categories and say which is larger. In contrast, for education, we might group individuals into “low,” “intermediate,” and “high” years of education.
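The incorporation of categorical variables through binary indicators described above can be sketched in code. This is an illustrative sketch, not material from the text: the health-insurance categories come from the list above, but the observations and the helper `encode` are invented for demonstration.

```python
# Sketch: encoding a four-category health-insurance variable as binary
# (indicator) variables for a regression design matrix. One binary
# column is created per non-reference category; a row of all zeros
# represents the reference category ("none" here).

categories = ["group", "individual", "public", "none"]  # "none" = reference

observations = ["group", "public", "none", "individual", "group"]

def encode(obs, cats, reference="none"):
    levels = [c for c in cats if c != reference]
    return [[1 if o == lvl else 0 for lvl in levels] for o in obs]

design = encode(observations, categories)
# design[0] -> [1, 0, 0]: private group insurance
# design[2] -> [0, 0, 0]: no health insurance (the reference category)
```

Dropping one category as the reference level avoids perfect collinearity with the intercept, which is why a four-category variable contributes only three binary columns.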
Chapter Preview. Many datasets feature dependent variables that have a large proportion of zeros. This chapter introduces a standard econometric tool, known as a tobit model, for handling such data. The tobit model is based on observing a left-censored dependent variable, such as sales of a product or claims on a health-care policy, where it is known that the dependent variable cannot be less than zero. Although this standard tool can be useful, many actuarial datasets that feature a large proportion of zeros are better modeled in “two parts,” one part for frequency and one part for severity. This chapter introduces two-part models and provides extensions to an aggregate loss model, where a unit under study, such as an insurance policy, can result in more than one claim.
Introduction
Many actuarial datasets come in “two parts”:
One part for the frequency, indicating whether a claim has occurred or, more generally, the number of claims
One part for the severity, indicating the amount of a claim
In predicting or estimating claims distributions, we often associate the cost of claims with two components: the event of the claim and its amount, if the claim occurs. Actuaries term these the claims frequency and severity components, respectively. This is the traditional way of decomposing two-part data, where one can consider a zero as arising from a policy without a claim (Bowers et al., 1997, Chapter 2).
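The frequency/severity decomposition can be made concrete with a small numerical sketch. The claim amounts below are invented for illustration; a full two-part model would replace the raw proportions and means with fitted frequency and severity regressions.

```python
# Sketch of the two-part decomposition: estimate claim frequency and
# severity separately, then combine them. The claim amounts are
# illustrative, not data from the text.

claims = [0, 0, 0, 1200.0, 0, 350.0, 0, 0, 900.0, 0]

n = len(claims)
positives = [c for c in claims if c > 0]

frequency = len(positives) / n               # Pr(a claim occurs)
severity = sum(positives) / len(positives)   # E[amount | claim occurs]

# Under the two-part model, the expected cost per policy is the
# product of the two components.
pure_premium = frequency * severity
```

Note that the product reproduces the overall mean of the data here; the advantage of the two-part formulation is that each component can be modeled with explanatory variables appropriate to it.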
Chapter Preview. A model with a categorical dependent variable allows one to predict whether an observation is a member of a distinct group or category. Binary variables represent an important special case; they can indicate whether an event of interest has occurred. In actuarial and financial applications, the event may be whether a claim occurs, whether a person purchases insurance, whether a person retires or a firm becomes insolvent. This chapter introduces logistic regression and probit models of binary dependent variables. Categorical variables may also represent more than two groups, known as multicategory outcomes. Multicategory variables can be unordered or ordered, depending on whether it makes sense to rank the variable outcomes. For unordered outcomes, known as nominal variables, the chapter introduces generalized logits and multinomial logit models. For ordered outcomes, known as ordinal variables, the chapter introduces cumulative logit and probit models.
Binary Dependent Variables
We have already introduced binary variables as a special type of discrete variable that can be used to indicate whether a subject has a characteristic of interest, such as sex for a person or ownership of a captive insurance company for a firm. Binary variables also describe whether an event of interest, such as an accident, has occurred. A model with a binary dependent variable allows one to predict whether an event has occurred or a subject has a characteristic of interest.
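The logistic regression model previewed above links a linear function of explanatory variables to a probability. The following sketch shows the logit form; the coefficients are hypothetical, not estimates from any dataset in the text.

```python
import math

# Sketch: the logistic model for a binary dependent variable,
#   Pr(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))).
# The coefficients b0, b1 below are hypothetical.

def logistic_prob(x, b0=-2.0, b1=0.5):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

p = logistic_prob(4.0)  # here b0 + b1*x = 0, so the probability is 0.5
```

Because the logistic function is bounded between 0 and 1, fitted values can always be interpreted as probabilities of the event of interest, something an ordinary linear model cannot guarantee.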
Chapter Preview. This chapter introduces linear regression in the case of several explanatory variables, known as multiple linear regression. Many basic linear regression concepts extend directly, including goodness-of-fit measures such as R2 and inference using t-statistics. Multiple linear regression models provide a framework for summarizing highly complex, multivariate data. Because this framework requires only linearity in the parameters, we are able to fit models that are nonlinear functions of the explanatory variables, thus providing a wide scope of potential applications.
Method of Least Squares
Chapter 2 dealt with the problem of a response depending on a single explanatory variable. We now extend the focus of that chapter and study how a response may depend on several explanatory variables.
Example: Term Life Insurance. Like all firms, life insurance companies continually seek new ways to deliver products to the market. Those involved in product development want to know who buys insurance and how much they buy. In economics, this is known as the demand side of a market for products. Analysts can readily get information on characteristics of current customers through company databases. Potential customers, those who do not have insurance with the company, are often the main focus for expanding market share.
In this example, we examine the Survey of Consumer Finances (SCF), a nationally representative sample that contains extensive information on assets, liabilities, income, and demographic characteristics of those sampled (potential U.S. customers).
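The SCF data themselves are not reproduced here, but the mechanics of multiple linear regression by least squares can be sketched on simulated data. The variable names and coefficients below are invented stand-ins for the kind of demographic explanatory variables the survey contains.

```python
import numpy as np

# Sketch: multiple linear regression by least squares on simulated data.
# "income" and "age" are hypothetical explanatory variables, not SCF data.

rng = np.random.default_rng(0)
n = 200
income = rng.uniform(20, 150, n)
age = rng.uniform(25, 65, n)
y = 1.0 + 0.05 * income + 0.02 * age + rng.normal(0, 0.5, n)

# Design matrix with an intercept column; least-squares estimates.
X = np.column_stack([np.ones(n), income, age])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Goodness of fit: the coefficient of determination R^2.
fitted = X @ beta
r2 = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

With several explanatory variables the estimates are no longer simple ratios of sums of squares, but the least-squares principle is the same as in the one-variable case: choose the coefficients that minimize the sum of squared residuals.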
Chapter Preview. This chapter describes tools and techniques to help you select variables to enter into a linear regression model, beginning with an iterative model selection process. In applications with many potential explanatory variables, automatic variable selection procedures will help you quickly evaluate many models. Nonetheless, automatic procedures have serious limitations, including the inability to account properly for nonlinearities such as the impact of unusual points; this chapter expands on the discussion in Chapter 2 of unusual points. It also describes collinearity, a common feature of regression data where explanatory variables are linearly related to one another. Other topics that affect variable selection, including heteroscedasticity and out-of-sample validation, are also introduced.
An Iterative Approach to Data Analysis and Modeling
In our introduction of basic linear regression in Chapter 2, we examined the data graphically, hypothesized a model structure, and compared the data to a candidate model to formulate an improved model. Box (1980) describes this as an iterative process, which is shown in Figure 5.1.
This iterative process provides a useful recipe for structuring the task of specifying a model to represent a set of data. The first step, the model formulation stage, is accomplished by examining the data graphically and using prior knowledge of relationships, such as from economic theory or standard industry practice. The second step in the iteration is based on the assumptions of the specified model.
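One concrete model-selection tool mentioned in the preview is out-of-sample validation. The sketch below compares two candidate models on a held-out portion of invented data; the data and both candidates are illustrative only.

```python
# Sketch: out-of-sample validation for model selection. Fit candidate
# models on a training set and compare their errors on held-out data.
# The data (roughly y = 2x + 1 with alternating noise) are invented.

data = [(x, 2.0 * x + 1.0 + ((-1) ** x) * 0.3) for x in range(20)]
train, holdout = data[:15], data[15:]

def fit_mean(pairs):
    # Candidate 1: no explanatory variable; predict the training mean.
    ybar = sum(y for _, y in pairs) / len(pairs)
    return lambda x: ybar

def fit_line(pairs):
    # Candidate 2: simple least-squares line.
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in pairs)
          / sum((x - xbar) ** 2 for x, _ in pairs))
    b0 = ybar - b1 * xbar
    return lambda x: b0 + b1 * x

def holdout_mse(model, pairs):
    return sum((y - model(x)) ** 2 for x, y in pairs) / len(pairs)

mse_mean = holdout_mse(fit_mean(train), holdout)
mse_line = holdout_mse(fit_line(train), holdout)
# The linear candidate validates better on this (nearly linear) data.
```

In-sample fit always improves as variables are added; judging candidates on data they were not fitted to guards against that bias, which is why validation complements the iterative process described above.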
Chapter Preview. This chapter introduces a classic actuarial reserving problem that is encountered extensively in property and casualty as well as health insurance. The data are presented in a triangular format to emphasize their longitudinal and censored nature. This chapter explains how such data arise naturally and introduces regression methods to address the actuarial reserving problem.
Introduction
In many types of insurance, little time elapses between the event of a claim, notification to an insurance company, and payment to beneficiaries. For example, in life insurance, notification and benefit payments typically occur within two weeks of an insured's death. However, for other lines of insurance, the time from claim occurrence to final payment can be much longer, taking months or even years. To introduce this situation, this section describes the evolution of a claim, introduces summary measures used by insurers, and then describes the prevailing deterministic method for forecasting claims, the chain-ladder method.
Claims Evolution
For example, suppose that you are injured in an automobile accident covered by insurance. It can take months for the injury to heal and for all of the medical care payments to become known and paid by the insurance company. Moreover, disputes may arise among you, other parties to the accident, your insurer, and insurer(s) of other parties, thus lengthening the time until claims are settled and paid. When claims take a long time to develop, an insurer's claim obligations may be incurred in one accounting period but not paid until a later accounting period.
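The chain-ladder method named above works on a run-off triangle of cumulative claims by accident year and development year. The triangle below is invented to keep the arithmetic visible; this is a sketch of the standard deterministic calculation, not an excerpt from the text.

```python
# Sketch of the chain-ladder method: cumulative paid claims by accident
# year (rows) and development year (columns); the lower-right part of
# the triangle is not yet observed. All figures are invented.

triangle = [
    [100.0, 150.0, 165.0],   # accident year 1: fully developed
    [110.0, 160.0],          # accident year 2: one more year to develop
    [120.0],                 # accident year 3: two more years
]

n = len(triangle)

# Age-to-age development factors: ratio of the totals of adjacent
# columns, over accident years where both columns are observed.
factors = []
for j in range(n - 1):
    num = sum(row[j + 1] for row in triangle if len(row) > j + 1)
    den = sum(row[j] for row in triangle if len(row) > j + 1)
    factors.append(num / den)

# Complete each row by applying the remaining development factors.
projected = []
for row in triangle:
    full = list(row)
    for j in range(len(row) - 1, n - 1):
        full.append(full[-1] * factors[j])
    projected.append(full)

# Reserve = projected ultimate claims minus the latest observed amount.
reserve = sum(full[-1] - row[-1] for full, row in zip(projected, triangle))
```

The reserve is the insurer's estimate of obligations already incurred but not yet paid, which is precisely the accounting gap described in the paragraph above.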
Chapter Preview. This chapter begins our study of time series data by introducing techniques to account for major patterns, or trends, in data that evolve over time. The focus is on how regression techniques developed in earlier chapters can be used to model trends. Further, new techniques, such as differencing data, allow us to naturally introduce a random walk, an important model of efficient financial markets.
Introduction
Time Series and Stochastic Processes
Business firms are not defined by physical structures such as the solid stone bank building that symbolizes financial security. Nor are businesses defined by space-alien invader toys that they manufacture for children. Businesses comprise several complex, interrelated processes. A process is a series of actions or operations that lead to a particular end.
Processes not only are the building blocks of businesses but also provide the foundations for our everyday lives. We may go to work or school every day, practice martial arts, or study statistics. These are regular sequences of activities that define us. In this text, our interest is in modeling stochastic processes, defined as ordered collections of random variables that quantify a process of interest.
Some processes evolve over time, such as daily trips to work or school, or the quarterly earnings of a firm. We use the term longitudinal data for measurements of a process that evolves over time.
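The random walk mentioned in the preview is one of the simplest stochastic processes that evolve over time, and differencing is the operation that undoes it. The simulation below is an illustrative sketch with invented parameters.

```python
import random

# Sketch: a random walk y_t = y_{t-1} + e_t, with independent Gaussian
# increments e_t. Differencing the series recovers the increments.

random.seed(42)
steps = [random.gauss(0, 1) for _ in range(100)]

walk = [0.0]
for s in steps:
    walk.append(walk[-1] + s)

differences = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
# differences reproduces steps exactly: the differenced series is
# independent noise, even though the walk itself trends and wanders.
```

This is why differencing is a natural tool for such data: a series that looks strongly trended may, after differencing, be indistinguishable from independent noise, the hallmark of an efficient market.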
Chapter Preview. Longitudinal data, also known as panel data, are composed of a cross-section of subjects that we observe repeatedly over time. Longitudinal data enable us to study cross-sectional and dynamic patterns simultaneously; this chapter describes several techniques for visualizing longitudinal data. Two types of models are introduced, fixed and random effects models. This chapter shows how to estimate fixed effects models using categorical explanatory variables. Estimation for random effects models is deferred to a later chapter; this chapter describes when and how to use these models.
What Are Longitudinal and Panel Data?
In Chapters 1–6, we studied cross-sectional regression techniques that enabled us to predict a dependent variable y using explanatory variables x. For many problems, the best predictor is a value from the preceding period; the time series methods we studied in Chapters 7–9 use the history of a dependent variable for prediction. For example, an actuary seeking to predict insurance claims for a small business will often find that the previous year's claims are the best predictor. However, a limitation of time series methods is that they are based on having available many observations over time (typically 30 or more). When studying annual claims from a business, a long time series is rarely available; either businesses do not have the data, or if they do, it is unreasonable to use the same stochastic model for today's claims as for those 30 years ago.
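The preview notes that fixed effects models can be estimated with categorical explanatory variables: one binary indicator per subject. In the simplest case, with no other explanatory variables, least squares with subject indicators reproduces each subject's mean over time. The small panel below is invented for illustration.

```python
# Sketch: a fixed-effects model estimated with categorical (subject)
# intercepts. With no other explanatory variables, the least-squares
# fixed effect for each subject equals that subject's mean over time,
# so we compute the means directly. The panel (3 subjects observed
# over 4 periods) is invented.

panel = {
    "firm_A": [10.0, 11.0, 9.0, 10.0],
    "firm_B": [20.0, 19.0, 21.0, 20.0],
    "firm_C": [15.0, 16.0, 14.0, 15.0],
}

fixed_effects = {k: sum(v) / len(v) for k, v in panel.items()}
```

Each subject-specific intercept absorbs everything stable about that subject, which is exactly how panel data let us separate cross-sectional differences from dynamics over time.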
Actuaries and other financial analysts quantify situations using data – we are “numbers” people. Many of our approaches and models are stylized, based on years of experience and investigations performed by legions of analysts. However, the financial and risk management world evolves rapidly. Many analysts are confronted with new situations in which tried-and-true methods simply do not work. This is where a toolkit like regression analysis comes in.
Regression is the study of relationships among variables. It is a generic statistics discipline that is not restricted to the financial world – it has applications in the fields of social, biological, and physical sciences. You can use regression techniques to investigate large and complex data sets. To familiarize you with regression, this book explores many examples and data sets based on actuarial and financial applications. This is not to say that you will not encounter applications outside of the financial world (e.g., an actuary may need to understand the latest scientific evidence on genetic testing for underwriting purposes). However, as you become acquainted with this toolkit, you will see how regression can be applied in many (and sometimes new) situations.
Who Is This Book For?
This book is written for financial analysts who face uncertain events and wish to quantify the events using empirical information. No industry knowledge is assumed, although readers will find the reading much easier if they have an interest in the applications discussed here! This book is designed for students who are just being introduced to the field as well as industry analysts who would like to brush up on old techniques and (for the later chapters) get an introduction to new developments.
Chapter Preview. This chapter considers regression in the case of only one explanatory variable. Despite this seeming simplicity, most of the deep ideas of regression can be developed in this framework. By limiting ourselves to the one variable case, we are able to express many calculations using simple algebra. This will allow us to develop our intuition about regression techniques by reinforcing it with simple demonstrations. Further, we can illustrate the relationships between two variables graphically because we are working in only two dimensions. Graphical tools prove important for developing a link between the data and a model.
Correlations and Least Squares
Regression is about relationships. Specifically, we will study how two variables, an x and a y, are related. We want to be able to answer questions such as, If we change the level of x, what will happen to the level of y? If we compare two subjects that appear similar except for the x measurement, how will their y measurements differ? Understanding relationships among variables is critical for quantitative management, particularly in actuarial science, where uncertainty is so prevalent.
It is helpful to work with a specific example to become familiar with key concepts. Analysis of lottery sales has not been part of traditional actuarial practice, but it is a growth area in which actuaries could contribute.
Example: Wisconsin Lottery Sales. State of Wisconsin lottery administrators are interested in assessing factors that affect lottery sales. Sales consist of online lottery tickets sold by selected retail establishments in Wisconsin.
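The lottery data themselves are not reproduced here, but the one-variable calculations this chapter develops — the correlation coefficient and the least-squares line — can be sketched with simple algebra on invented (x, y) pairs standing in for, say, area population and lottery sales.

```python
import math

# Sketch: correlation and the least-squares line for one explanatory
# variable. The (x, y) pairs are invented for illustration.

x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
b1 = sxy / sxx                   # least-squares slope
b0 = ybar - b1 * xbar            # least-squares intercept
```

With only one explanatory variable, the slope is a simple ratio of sums of cross-products and the correlation differs from it only by a scale factor, which is why this setting is such a good place to build intuition before moving to several variables.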