In classical life insurance mathematics the obligations of the insurance company towards the policyholders were calculated under artificially conservative assumptions about mortality and interest rates. However, this approach is being superseded by developments in international accounting and solvency standards, coupled with other advances that enable a market-based valuation of risk, i.e., its price if traded in a free market. The book describes these approaches, and is the first to explain them in conjunction with more traditional methods. The various chapters address specific aspects of market-based valuation. The exposition integrates methods and results from financial and insurance mathematics, and is based on the entries in a life insurance company's market accounting scheme. The book will be of great interest and use to students and practitioners who need an introduction to this area, and who seek a practical yet sound guide to life insurance accounting and product development.
Every statistician and data analyst has to make choices. The need arises especially when data have been collected and it is time to decide which model to use to describe and summarise the data. Another frequent choice is whether all measured variables are important enough to be included, for example when making predictions. Can we make life simpler by including only a few of them, without making the predictions significantly worse?
In this book we present several methods to help make these choices. Model selection is a broad area, and it reaches far beyond deciding which variables to include in a regression model.
Two generations ago, setting up and analysing a single model was already hard work, and one rarely went to the trouble of analysing the same data via several alternative models. Thus ‘model selection’ was not much of an issue, apart from perhaps checking the model via goodness-of-fit tests. In the 1970s and later, proper model selection criteria were developed and actively used. Modern computing now offers unprecedented versatility and convenience: long lists of candidate models, whether thought through in advance or not, can be fitted to a data set. But this creates problems too. With a multitude of models fitted, it is clear that methods are needed to summarise and compare the model fits.
Data can often be modelled in different ways. There might be simple approaches and more advanced ones that perhaps have more parameters. When many covariates are measured we could attempt to use them all to model their influence on a response, or only a subset of them, which would make it easier to interpret and communicate the results. For selecting a model among a list of candidates, Akaike's information criterion (AIC) is among the most popular and versatile strategies. Its essence is a penalised version of the attained maximum log-likelihood, for each model. In this chapter we shall see AIC at work in a range of applications, in addition to unravelling its basic construction and properties. Attention is also given to natural generalisations and modifications of AIC that in various situations aim at performing more accurately.
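As a concrete illustration (our own minimal sketch, not an excerpt from the book), the following Python fragment computes AIC in the ‘maximise’ convention, AIC = 2ℓ_max − 2p with p the number of estimated parameters, for two nested Gaussian regression models fitted to simulated data; the model with the highest score would be selected.

```python
# Minimal AIC sketch: compare two nested Gaussian regression models.
import numpy as np
from scipy import stats

def gaussian_loglik(y, mu):
    # ML estimate of the error variance, then the attained max log-likelihood
    sigma2 = np.mean((y - mu) ** 2)
    return np.sum(stats.norm.logpdf(y, loc=mu, scale=np.sqrt(sigma2)))

def aic(loglik_max, n_params):
    # Penalised maximum log-likelihood, in the 'maximise' convention
    return 2.0 * loglik_max - 2.0 * n_params

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Model 1: intercept only (2 parameters: mean, variance)
mu1 = np.full(n, y.mean())
# Model 2: linear in x (3 parameters: intercept, slope, variance)
slope, intercept = np.polyfit(x, y, 1)
mu2 = intercept + slope * x

for name, mu, p in [("intercept-only", mu1, 2), ("linear", mu2, 3)]:
    print(name, aic(gaussian_loglik(y, mu), p))
```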
Information criteria for balancing fit with complexity
In Chapter 1 various problems were discussed where the task of selecting a suitable statistical model, from a list of candidates, was an important ingredient. By necessity there are different model selection strategies, corresponding to different aims and uses associated with the selected model. Most (but not all) selection methods are defined in terms of an appropriate information criterion, a mechanism that uses data to give each candidate model a certain score; this then leads to a fully ranked list of candidate models, from the ostensibly best to the worst.
In this chapter we compare some information criteria with respect to consistency and efficiency, which are classical themes in model selection. The comparison is driven by a study of the ‘penalty’ applied to the maximised log-likelihood value, in a framework with increasing sample size. AIC is not strongly consistent, though it is efficient, while the opposite is true for the BIC. We also introduce Hannan and Quinn's criterion, which has properties similar to those of the BIC, while Mallows's Cp and Akaike's FPE behave like AIC.
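In terms of penalties this comparison can be summarised compactly. Writing ℓ_max for the attained maximum log-likelihood of a model with p estimated parameters and n for the sample size, the criteria take the familiar forms (a standard summary, stated in the ‘maximise’ convention, with a constant c > 1 for Hannan and Quinn's criterion):

$$\mathrm{AIC} = 2\ell_{\max} - 2p, \qquad \mathrm{BIC} = 2\ell_{\max} - p\log n, \qquad \mathrm{HQ} = 2\ell_{\max} - 2c\,p\log\log n .$$

Since log n grows without bound, the BIC penalty eventually dominates any fixed penalty, which is the source of its consistency; the constant AIC penalty is what makes AIC efficient but not consistent.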
Comparing selectors: consistency, efficiency and parsimony
If we make the assumption that there exists one true model that generated the data and that this model is one of the candidate models, we would want the model selection method to identify this true model. This is related to consistency. A model selection method is weakly consistent if, with probability tending to one as the sample size tends to infinity, the selection method is able to select the true model from the candidate models. Strong consistency is obtained when the selection of the true model happens almost surely. Often, we do not wish to make the assumption that the true model is amongst the candidate models. If instead we are willing to assume that there is a candidate model that is closest in Kullback–Leibler distance to the true model, we can state weak consistency as the property that, with probability tending to one, the model selection method picks such a closest model.
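In symbols, writing $\hat M_n$ for the model selected from a sample of size n and $M_0$ for the true (or Kullback–Leibler closest) candidate model, the two notions can be stated as (a standard formalisation):

$$\text{weak consistency:}\ \ P(\hat M_n = M_0) \to 1, \qquad \text{strong consistency:}\ \ P(\hat M_n = M_0 \ \text{for all sufficiently large } n) = 1 .$$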
This book is about making choices. If there are several possibilities for modelling data, which should we take? If multiple explanatory variables are measured, should they all be used when forming predictions, making classifications, or summarising what influences the response variables, or will including only a few of them work equally well, or better? And if so, which ones should we include? Model selection problems arise in many forms and on widely varying occasions. In this chapter we present some data examples and discuss some of the questions they lead to. Later in the book we come back to these data and suggest some answers. A short preview of what is to come in later chapters is also provided.
Introduction
With data collection having become ever easier and cheaper in many fields of applied science, there is a growing need for methods that point to the interesting, important features of the data and that help to build a model. The model we wish to construct should be rich enough to explain the relations in the data, but at the same time simple enough to understand, explain to others, and use. It is when we negotiate this balance that model selection methods come into play. They provide formal support to guide data users in their search for good models, or for determining which variables to include when making predictions and classifications.
The model selection methods presented earlier (such as AIC and the BIC) have one thing in common: they select a single ‘best model’, which is then used to explain all aspects of the mechanisms underlying the data and to predict all future data points. The tolerance discussion in Chapter 5 showed that sometimes one model is best for estimating one type of estimand, whereas another model is best for another estimand. The point of view expressed via the focussed information criterion (FIC) is that the ‘best model’ should depend on the parameter under focus, such as the mean, or the variance, or a quantity determined by particular covariate values. Thus the FIC allows and encourages different models to be selected for different parameters of interest.
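Schematically (our paraphrase rather than the book's exact formula), the FIC scores each candidate submodel S by an estimate of the mean squared error of that submodel's estimator $\hat\mu_S$ of the focus parameter μ,

$$\mathrm{FIC}(S) \approx \widehat{\mathrm{bias}}^2(\hat\mu_S) + \widehat{\mathrm{var}}(\hat\mu_S),$$

so that a small model can win for one focus (low variance at the cost of tolerable bias) while a larger model wins for another.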
Estimators and notation in submodels
In model selection applications there is a list of candidate models to consider. We shall assume here that there is a ‘smallest’ and a ‘biggest’ model among these, and that the others lie between these two extremes. More concretely, there is first a narrow model, the simplest model we might possibly use for the data, with an unknown parameter vector θ of length p. Secondly, there is the wide model, the largest model under consideration, which has an additional q parameters γ = (γ1, …, γq). We assume that the narrow model is a special case of the wide model: there is a value γ0 such that setting γ = γ0 in the wide model yields precisely the narrow model.
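A simple instance of this framework (our illustration, not an example from the text): for i.i.d. data, take the narrow model to be normal with fixed unit variance and the wide model to have a free standard deviation γ, so that p = 1 and q = 1, with

$$f_{\mathrm{narrow}}(y) = \mathrm{N}(\theta, 1), \qquad f_{\mathrm{wide}}(y) = \mathrm{N}(\theta, \gamma^2), \qquad \gamma_0 = 1;$$

setting γ = γ0 = 1 in the wide model recovers the narrow model exactly.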
In this chapter model selection and averaging methods are applied in several standard regression set-ups, such as generalised linear models and the Cox proportional hazards regression model, along with some less straightforward models for multivariate data. Answers are suggested to several of the specific model selection questions posed about the data sets of Chapter 1. In the process we explain in detail what the necessary key quantities are for the different strategies, and how these are estimated from data. A concrete application of methods for statistical model selection and averaging is often a nontrivial task. It involves a careful listing of all candidate models as well as specification of the focus parameters, and there may be several possibilities for estimating some of the key quantities involved in a given selection criterion. Some of these issues are illustrated in this chapter, which is concerned with data analysis and discussion only; for the methodology we refer to earlier chapters.
AIC and BIC selection for Egyptian skull development data
We perform model selection for the data set consisting of measurements on skulls of male Egyptians from different time eras; see Section 1.2 for more details. Our interest lies in studying a possible trend in the measurements over time, and in the correlation structure between the measurements.
Assuming the normal approximation holds, we construct, for each time period and for each of the four measurements, pointwise 95% confidence intervals for the expected value of that measurement in that time period.
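As a sketch of this computation (with simulated stand-in data; the real measurements are described in Section 1.2, and the array shape below simply mirrors the 5 eras × 30 skulls × 4 variables layout), each pointwise 95% interval for an era-by-variable mean uses the normal approximation mean ± 1.96·s/√m, with m observations per era:

```python
# Hedged sketch: pointwise 95% normal-approximation intervals for group means.
import numpy as np

rng = np.random.default_rng(0)
n_eras, n_skulls, n_vars = 5, 30, 4
# Stand-in data; the real skull measurements are described in Section 1.2.
data = rng.normal(loc=130.0, scale=5.0, size=(n_eras, n_skulls, n_vars))

z = 1.96                                            # standard normal 0.975 quantile
means = data.mean(axis=1)                           # (5, 4) era-by-variable means
ses = data.std(axis=1, ddof=1) / np.sqrt(n_skulls)  # standard errors of the means
lower, upper = means - z * ses, means + z * ses

for era in range(n_eras):
    for var in range(n_vars):
        print(f"era {era}, variable {var}: "
              f"[{lower[era, var]:.1f}, {upper[era, var]:.1f}]")
```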
Several real data sets are used in this book to illustrate aspects of the methods that are developed. Here we provide brief descriptions of each of these real data examples, along with key points to indicate which substantive questions they relate to. Key words are also included to indicate the data sources, the types of models we apply, and pointers to where in our book the data sets are analysed. For completeness and convenience of orientation the list below also includes the six ‘bigger examples’ already introduced in Chapter 1.
Egyptian skulls
There are four measurements on each of 30 skulls, for five different archaeological eras (see Section 1.2). One wishes to provide adequate statistical models that also make it possible to investigate whether there have been changes over time. Such evolutionary changes in skull parameters might relate to the influx of immigrant populations. Source: Thomson and Randall-Maciver (1905), Manly (1986).
We use multivariate normal models, with different attempts at structuring for mean vectors and variance matrices, and apply AIC and the BIC for model selection; see Example 9.1.
The (not so) Quiet Don
We use sentence length distributions to decide whether Sholokhov or Kriukov is the more likely author of the Nobel Prize-winning novel (see Section 1.3). Source: private files of the authors, collected by combining information from different tables in Kjetsaa et al. (1984), with some additional help from Geir Kjetsaa (private communication); see also Hjort (2007a).