This chapter examines the usual argument that adverse selection makes insurance work less well. It shows that from a public policy perspective, the absolutism of this argument is misconceived. Some degree of adverse selection in insurance is generally beneficial to society. The optimal level of adverse selection is the level which maximises loss coverage, the expected losses compensated by insurance for the population as a whole.
Although this proposition is unorthodox, there are some analogies which lend it intuitive appeal and plausibility. In many biological contexts, the analogy is hormesis: the phenomenon of stressors which are beneficial to an organism in small doses, but become harmful in higher doses. Examples of hormesis in the human body include the response to most drugs and nutrients (alcohol is a vivid example, but the same pattern applies for substances as ubiquitous and innocuous as water); the response to physical exercise; and the response to stress (cf. ‘eustress’ and ‘distress’). By analogy, we can think of the insurance system being ‘stressed’ by adverse selection. Small doses ‘stretch’ the system and make it work better, in the sense of increasing loss coverage, but higher doses make the system work less well.
The argument that some degree of adverse selection makes the insurance system work better will be presented in three ways: first, a verbal summary of the key argument; second, numerical examples; and third, mathematical details. The summary and examples are presented in this chapter, and the mathematics in Chapters 4–6.
The Key Argument
The following verbal summary repeats that given in Chapter 1, with only footnotes and a final paragraph added.
Consider an insurance market where individuals can be divided into two risk-groups, one higher risk and one lower risk, based on information which is fully observable by insurers. Assume that all losses and insurance are of unit amount (this simplifies the discussion, but it is not necessary). Also assume that an individual’s risk is unaffected by the purchase of insurance, i.e. there is no moral hazard.Footnote 1
If insurers can, they will charge risk-differentiated prices to reflect the different risks. If instead insurers are banned from differentiating between higher and lower risks, and have to charge a single ‘pooled’ price for all risks, a pooled price equal to the simple average of the risk-differentiated prices will seem cheap to higher risks and expensive to lower risks. Higher risks will buy more insurance, and lower risks will buy less.
To break even, insurers will then need to raise the pooled price above the simple average of the prices. Also, since the number of higher risks is typically smaller than the number of lower (or ‘standard’) risks, higher risks buying more and lower risks buying less implies that the total number of people insured usually falls.Footnote 2 This combination of a rise in price and a fall in demand is usually portrayed as a bad outcome, for both insurers and society.
However, from a social perspective, it is arguable that higher risks are those more in need of insurance. Also, the compensation of many types of loss by insurance appears to be widely regarded as a desirable objective, which public policymakers often seek to promote, by public education, by exhortation and sometimes by incentives such as tax relief on premiums. Insurance of one higher risk contributes more in expectation to this objective than insurance of one lower risk. This suggests that public policymakers might welcome increased purchasing by higher risks, except for the usual story about adverse selection.
The usual story about adverse selection overlooks one point: with a pooled premium and adverse selection, expected losses compensated by insurance can still be higher than with fully risk-differentiated premiums and no adverse selection. Although pooling leads to a fall in numbers insured, it also leads to a shift in coverage towards higher risks. From a public policymaker’s viewpoint, this means that more of the ‘right’ risks – those more likely to suffer loss – buy insurance. If the shift in coverage is large enough, it can more than outweigh the fall in numbers insured. This result of higher expected losses compensated by insurance – higher ‘loss coverage’ – can be seen as a better outcome for society than that obtained with no adverse selection.
Another perspective on this argument is that a public policymaker designing risk classification policies in the context of adverse selection normally faces a trade-off between insurance of the ‘right’ risks (those more likely to suffer loss) and insurance of a larger number of risks. The optimal trade-off depends on the response of higher and lower risk-groups to different prices (technically, the demand elasticity of different risk-groups), and will normally involve at least some adverse selection. The concept of loss coverage quantifies this trade-off, and provides a metric for comparing the effects of different risk classification schemes.
Numerical Examples
Numerical examples can illustrate the argument sketched above. These are similar in nature to the two scenarios illustrating ‘no adverse selection’ and ‘some adverse selection’ in the toy example in Chapter 1. But now I use more realistic numbers, and also introduce a third scenario of ‘too much’ adverse selection.
Suppose that in a population of 1,000 risks, 16 losses are expected every year. There are two risk-groups; 200 high risks have a probability of loss 4 times higher than the other 800 low risks. We assume that all losses and insurance are of unit amount, and that there is no moral hazard. An individual’s risk-group is fully observable to insurers.
Under the initial risk classification regime, insurers operate full risk classification, charging actuarially fair premiums to members of each risk-group. We assume that the proportion of each risk-group which buys insurance under these conditions – the ‘fair-premium demand’ – is 50%, which is realistic for life insurance in the UK and the USA.Footnote 3 Table 3.1 shows the outcome, which can be summarised as follows:
– There is no adverse selection. The average of the insurance premiums, weighted by numbers of insurance buyers at each price, is 0.016 (final column, fourth line). This is the same as the population-weighted average risk (final column, first line). Dividing the first by the second, we index the adverse selection as 0.016/0.016 = 1.0, indicating a neutral position (i.e. no adverse selection).
– Half the losses in the population are compensated by insurance. We heuristically characterise this as a ‘loss coverage’ of 0.5.
Table 3.1 Full risk classification: no adverse selection (base outcome)
| | Low risk-group | High risk-group | Aggregate |
|---|---|---|---|
| Risk | 0.01 | 0.04 | 0.016 |
| Total population | 800 | 200 | 1 000 |
| Expected population losses | 8 | 8 | 16 |
| Break-even premiums (risk-differentiated) | 0.01 | 0.04 | 0.016 |
| Numbers insured | 400 | 100 | 500 |
| Insured losses | 4 | 4 | 8 |
| Adverse selection | | | 1 |
| Loss coverage | | | 0.5 |
Now suppose that a new risk classification regime is introduced, where insurers are obliged to charge a common ‘pooled’ premium to members of both the low and high risk-groups. One possible outcome is shown in Table 3.2, which can be summarised as follows:
– The pooled premium of 0.02 at which insurers make zero profits is calculated as the demand-weighted average of the risk premiums: (300 × 0.01 + 150 × 0.04)/450 = 0.02.
– The pooled premium is expensive for low risks, so 25% fewer of them buy insurance (300, compared with 400 before). The pooled premium is cheap for high risks, so 50% more of them buy insurance (150, compared with 100 before). Because there are 4 times as many low risks as high risks in the population, the total number of policies sold falls (450, compared with 500 before).
– There is moderate adverse selection. The pooled premium of 0.02 exceeds the population-weighted average risk of 0.016, giving adverse selection = 0.02/0.016 = 1.25.
– The resulting loss coverage is 0.5625. The shift in coverage towards high risks more than outweighs the fall in number of policies sold: 9 of 16 losses (56%) in the population as a whole are now compensated by insurance (compared with 8 of 16 before).
Table 3.2 Risk classification banned: moderate adverse selection leading to higher loss coverage (better outcome)
| | Low risk-group | High risk-group | Aggregate |
|---|---|---|---|
| Risk | 0.01 | 0.04 | 0.016 |
| Total population | 800 | 200 | 1 000 |
| Expected population losses | 8 | 8 | 16 |
| Break-even premiums (pooled) | 0.02 | 0.02 | 0.02 |
| Numbers insured | 300 | 150 | 450 |
| Insured losses | 3 | 6 | 9 |
| Adverse selection | | | 1.25 |
| Loss coverage | | | 0.5625 |
Table 3.2 exhibited moderate adverse selection. Another possible outcome under the restricted risk classification scheme, this time with more severe adverse selection, is shown in Table 3.3, which can be summarised as follows:
– The pooled premium of 0.02154 at which insurers make zero profits is calculated as the demand-weighted average of the risk premiums: (200 × 0.01 + 125 × 0.04)/325 = 0.02154.
– There is severe adverse selection, with a further increase in the pooled premium and a significant fall in numbers insured.
– The loss coverage is 0.4375. The shift in coverage towards high risks is not sufficient to outweigh the fall in number of policies sold: 7 of 16 losses (43.75%) in the population as a whole are now compensated by insurance (compared with 8 of 16 in Table 3.1, and 9 out of 16 in Table 3.2).
Table 3.3 Risk classification banned: severe adverse selection leading to lower loss coverage (worse outcome)
| | Low risk-group | High risk-group | Aggregate |
|---|---|---|---|
| Risk | 0.01 | 0.04 | 0.016 |
| Total population | 800 | 200 | 1 000 |
| Expected population losses | 8 | 8 | 16 |
| Break-even premiums (pooled) | 0.02154 | 0.02154 | 0.02154 |
| Numbers insured | 200 | 125 | 325 |
| Insured losses | 2 | 5 | 7 |
| Adverse selection | | | 1.34625 |
| Loss coverage | | | 0.4375 |
Taking the three tables together, we can summarise by saying that compared with an initial position of no adverse selection in Table 3.1, moderate adverse selection leads to higher expected losses compensated by insurance (higher loss coverage) in Table 3.2; but too much adverse selection leads to lower expected losses compensated by insurance (lower loss coverage) in Table 3.3.
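The arithmetic behind the three tables can be checked with a short Python sketch. It is only an illustration: the function name `summarise` and the variable names are hypothetical, and it assumes (as in the tables) unit losses, no moral hazard, and an adverse selection index equal to the buyer-weighted average premium divided by the population-weighted average risk.

```python
def summarise(risks, populations, insured):
    """Return (average premium, adverse selection index, loss coverage).

    risks       -- probability of loss for each risk-group
    populations -- number of individuals in each risk-group
    insured     -- number of individuals in each risk-group who buy insurance
    """
    population_losses = sum(r * n for r, n in zip(risks, populations))
    insured_losses = sum(r * n for r, n in zip(risks, insured))
    # Buyer-weighted average premium when insurers break even.
    # Under pooling this is the single pooled premium.
    average_premium = insured_losses / sum(insured)
    # Population-weighted average risk.
    average_risk = population_losses / sum(populations)
    adverse_selection = average_premium / average_risk
    loss_coverage = insured_losses / population_losses
    return average_premium, adverse_selection, loss_coverage


RISKS = [0.01, 0.04]        # low and high risk-groups
POPULATIONS = [800, 200]
SCENARIOS = {
    "Table 3.1 (full risk classification)": [400, 100],
    "Table 3.2 (pooling, moderate adverse selection)": [300, 150],
    "Table 3.3 (pooling, severe adverse selection)": [200, 125],
}

for name, insured in SCENARIOS.items():
    premium, selection, coverage = summarise(RISKS, POPULATIONS, insured)
    print(f"{name}: premium={premium:.5f}, "
          f"adverse selection={selection:.4f}, loss coverage={coverage:.4f}")
```

For Table 3.3 the sketch works from the unrounded premium 7/325, so its adverse selection index comes out at about 1.3462; the tabulated 1.34625 appears to result from dividing the rounded premium 0.02154 by 0.016.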
This argument that moderate adverse selection increases loss coverage is quite general: it does not depend on any unusual choice of numbers for the examples. It also does not assume any bias by the policymaker towards (or against) compensating the losses of the high risk-group in preference to those of the low risk-group. The same preference is given to compensation of losses anywhere in the population ex post, when all uncertainty about who will suffer a loss has been resolved. This implies giving higher preference to insurance cover for higher risks ex ante, before we know who will suffer a loss, but only in proportion to their higher risk.
Loss Coverage in Context
The remainder of this chapter makes a range of comparisons and analogies which helps to set the concept of loss coverage in context.
The Hirshleifer Effect
The idea that restricting risk classification and hence inducing a degree of adverse selection can produce a better outcome for society as a whole is analogous to the Hirshleifer effect.Footnote 4 This is named after the economist Jack Hirshleifer, who pointed out that too much information about future investment returns can reduce opportunities for risk sharing, because a risk which has been resolved cannot be shared. Although some people can make better decisions when more information becomes available, society as a whole can end up worse off. In insurance, the analogous phenomenon is that the use of too much information for risk classification can reduce risk sharing: although some people get cheaper insurance because of the increased use of information, the overall quantum of risk transferred to insurers falls, so society as a whole ends up worse off.
Loss Coverage and Compulsory Insurance
One way of maximising loss coverage is to make insurance compulsory – either through social insurance, or by laws which compel people to buy commercial insurance. Healthcare is often provided by social insurance, and liability insurances which protect the interests of unidentified third parties or the public at large are often made compulsory (e.g. third-party car insurance, employers’ liability insurance). The potential losses compensated by these insurances are largely independent of the personal circumstances of the insured, and so compulsion is a well-targeted approach to achieving policymakers’ desired (universal) loss coverage.
In other classes of insurance, such as life insurance, the policy motivations to increase loss coverage may be less conclusive, but are still present to some degree. For example, policymakers may consider that through incomplete information or behavioural bias, most people underestimate their needs for life insurance, and so too little life insurance is bought.Footnote 5 However, in life insurance the potential loss depends on the personal circumstances of the insured. A compulsory requirement for unmarried persons to purchase life insurance may not increase loss coverage (because there is often no financial loss to a survivor if the unmarried person dies); the cost of premiums to the unmarried person may itself represent an uncompensated loss. One might exclude unmarried persons from compulsion; but some unmarried persons do need life insurance, and equally some married persons do not. Overall, for insurances where the potential loss depends on personal circumstances, compulsion may be a poorly targeted and politically unpopular means of increasing loss coverage. It may be both more effective and more politically acceptable to leave purchasing decisions to individuals, and increase loss coverage by regulating risk classification.
There may be a few classes of insurance where there is little policy motivation to promote the compensation of losses by insurance. A clear example would be insurance against the cost of legal or regulatory penalties; a less clear example might be insurance of pet animals. In these cases, policymakers may view loss coverage as a less important objective than in markets such as liability, health and life insurance. Indeed, policymakers may even seek to reduce loss coverage, by discouraging the insurance. In extreme cases this might mean banning the insurance. In the UK, the Financial Conduct Authority bans the firms which it regulates from insuring against the cost of regulatory penalties.Footnote 6
Loss Coverage versus ‘Social Welfare’
There is an extant literature in economics which considers the effect of restrictions on risk classification. Economists typically take a utility-based approach: individuals make insurance choices to maximise their expected utility according to some utility function, and the outcomes of different risk classification schemes are evaluated by social welfare, typically defined as the sum of expected utilities over the entire population.Footnote 7
One way of describing the difference in this book’s approach compared with the economics literature is that rather than assigning equal weights to expected utility levels across low and high risks, I assign equal weights to loss coverage levels across low and high risks. Under my approach, if all losses are of unit amount, and high risks have a probability of loss twice that of low risks, then coverage of one high risk is considered to be worth the same ex ante as coverage of two low risks.
Another way of describing the difference between loss coverage and social welfare is to note that loss coverage focuses on maximising the benefit of insurance, framed solely as the compensation of losses. Minor details about preferences – who underpays and overpays for insurance, and how much they care (how it affects their utility) – are disregarded, provided that the insurance system as a whole equilibrates.
This disregard seems reasonable in the typical scenario where insurance premiums are small relative to wealth. It is also practical in the sense that loss coverage depends solely on observable claims, whereas social welfare depends on utilities which are never observable. Loss coverage also has the advantage of being a simpler concept, and so more conducive to wide understanding in policy discussions.
Although loss coverage and social welfare are distinct criteria, they can be shown to be tightly related under simple assumptions about utility (and hence insurance demand) functions. Specifically: for power utility functions (and hence iso-elastic insurance demand), either criterion – loss coverage or social welfare – gives the same rank ordering of different risk classification schemes. Those who prefer the criterion of social welfare in principle may then wish to think of loss coverage as an observable proxy, under certain assumptions, for social welfare.Footnote 8
Loss Coverage versus ‘Coverage’
The criterion of loss coverage and the criterion of social welfare can be further contrasted with the criterion of simple coverage, which is often implicit in informal discussions of risk classification. In informal discussions, adverse selection is often said to be a bad thing because it leads to a ‘reduction in coverage’ compared with that attained in the absence of adverse selection. Economists typically say that any reduction in coverage is ‘inefficient’.Footnote 9 But if loss coverage is increased, the quantum of risk transferred to insurers is increased. Why should an arrangement under which more risk is voluntarily traded and more losses are compensated be disparaged as ‘inefficient’?
Such discussions of the ‘reduction in coverage’ arising from adverse selection implicitly treat coverage of one high risk as equivalent to coverage of one low risk. But if the purpose of insurance is to compensate the population’s losses, this does not seem a valid equivalence: covering one high risk contributes more in expectation to this purpose than covering one low risk. In other words, these informal discussions are typically predicated on an inaccurate measure of the benefit of insurance.
Another succinct description of the contrast between loss coverage and the common informal reference to just ‘coverage’ is to note that loss coverage corresponds to risk-weighted insurance demand, and coverage to unweighted insurance demand.
Probabilistic Goods versus Reassurance Goods
The risk-weighted nature of loss coverage is predicated on the notion that the good provided by insurance is the contingent compensation of losses. This good materialises only in a particular future state of the world, which has different probabilities for higher and lower risks. That is, insurance is a probabilistic and individually heterogeneous good.
For such a good, one unit of sales to a higher-risk individual is a different good compared with one unit of sales to a lower-risk individual. In loss coverage, this difference is reflected in the risk-weighting of coverage. In utilitarian social welfare, it is reflected in higher probabilities attached to losses in expected utility calculations.
While I believe this framing of insurance as a probabilistic good is appropriate when considering public policy at a population and objective level, I acknowledge that at an individual and subjective level, insurance can also be framed as a non-probabilistic good. Insurance goods can be framed as non-probabilistic if the good is conceived as ‘peace of mind’ or ‘freedom from worry’. Think of car insurance, home insurance or life insurance: in every case, at an individual and subjective level, the good might just as well be framed as freedom from worry in the present, rather than as the contingent compensation of losses in the future. Insurance framed as ‘freedom from worry’ is a non-probabilistic experience good. In the specific insurance context, we can call it a reassurance good (reassurance being the experience which insurance provides).
If insurance is framed as a reassurance good, one unit of sales to a higher-risk individual is closer to the same good as one unit of sales to a lower-risk individual. I say ‘closer to’ rather than ‘equal to’ because although both higher and lower risks experience similar reassurance, it might be felt that the quantum of this reassurance increases for higher underlying probability of loss. This is certainly arguable, but it is not obvious that the increase should be in one-to-one proportion to the probability. And it seems at least equally arguable that for any class of insurance (say car, home or life), the quantum of reassurance is invariant to individual variations in probabilities of loss (or alternatively, increasing in less than one-to-one proportion, or at a sublinear rate, in those variations).
I acknowledge that the framing of insurance as a reassurance good may often be psychologically salient to individuals; and that in this framing, the risk-weighting in loss coverage might overstate the benefit of coverage of higher risks. Nevertheless, I believe that the framing of insurance as a probabilistic good is a better basis for public policy. This is for the same reason as I prefer loss coverage to social welfare: I believe that it is more practical to base public policy on observables (such as probabilities of loss), not on individuals’ unobservable states of mind (such as feelings of reassurance, or increases in utility).Footnote 10
Loss Coverage and Partial Risk Classification
The discussion above has considered only two possibilities for risk classification: fully risk-differentiated premiums or complete pooling. In practice, partial restrictions on risk classification are common, where only some risk variables (e.g. gender, family history, genetic test results) are banned. Loss coverage is maximised when there is an intermediate level of adverse selection, not too low and not too high. It seems plausible that in some markets, complete pooling generates too much adverse selection; but partial restrictions on risk classification generate an intermediate level of adverse selection, and hence higher loss coverage than under either pooling or fully risk-differentiated premiums. Loss coverage can then provide an explanation or rationalisation of the partial restrictions often seen in practice. Partial risk classification is considered further in Chapter 6.
Different Weights for Different Risk-Groups?
Loss coverage as defined in this book always places equal weight on equal expected losses, irrespective of the part of the population from which they are expected. No preference is given to achieving compensation of losses arising from higher or lower risks. This solution to the trade-off between the interests of higher and lower risks has obvious intuitive and egalitarian appeal. But in some circumstances, a policymaker might wish to give higher priority to compensating the losses of higher (or lower) risks – perhaps because those with higher insurance risks also tend to suffer other economic or social disadvantages. If so, it is straightforward to redefine loss coverage as a weighted average of expected losses over the population, with the weights reflecting the priority the policymaker places on compensation of losses arising from different risks.Footnote 11
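One possible way to write such a weighted variant is sketched below in Python. This is an assumption for illustration, not a definition taken from the text: the function name `weighted_loss_coverage`, the choice of weights, and the normalisation by weighted population losses are all hypothetical. With equal weights it reduces to the unweighted loss coverage used in Chapter 3.

```python
def weighted_loss_coverage(risks, populations, insured, weights):
    """Priority-weighted share of expected losses compensated by insurance.

    weights -- hypothetical policy priority attached to losses in each risk-group.
    Assumes unit losses; normalising by weighted population losses is an
    assumption made here so that full coverage gives a value of 1.
    """
    weighted_insured = sum(
        w * r * n for w, r, n in zip(weights, risks, insured))
    weighted_population = sum(
        w * r * n for w, r, n in zip(weights, risks, populations))
    return weighted_insured / weighted_population


risks = [0.01, 0.04]
populations = [800, 200]
insured = [300, 150]  # numbers insured under pooling in Table 3.2

# Equal weights: the ordinary loss coverage of Table 3.2.
equal = weighted_loss_coverage(risks, populations, insured, [1, 1])
# Illustrative priority: losses of high risks count twice as much.
tilted = weighted_loss_coverage(risks, populations, insured, [1, 2])
print(equal, tilted)
```

With the Table 3.2 numbers, equal weights give (3 + 6)/(8 + 8) and the illustrative double weight on high risks gives (3 + 12)/(8 + 16), so the same market outcome scores more highly when the policymaker prioritises the losses of high risks.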
Other Public Policy Criteria for Risk Classification
Loss coverage is not the only criterion which a public policymaker may consider when setting policy on risk classification. In some markets, risk classification might provide an incentive for loss prevention: for example, if house insurance premiums are lower for homes with security features such as burglar alarms or better locks, perhaps home owners will be more inclined to install these features.Footnote 12 A policymaker might also place value on ensuring the optional availability of insurance to members of higher risk-groups, distinct from the actual take-up of the option. The possible ‘spillover’ effects of insurance risk classification on privacy, public health and discrimination in other contexts such as employment might also be considered. Restrictions on risk classification sometimes appear to be based on a principled objection to statistical discrimination per se, rather than on an assessment of insurance market consequences. These and other objections to risk classification are discussed further in Chapter 7. Nevertheless, to the extent that consequences in the insurance market itself are given weight, loss coverage is a useful metric for those consequences.
Loss Coverage: The Insurer’s Perspective
While this book focuses mainly on a public policy perspective, the loss coverage concept can also be viewed from the insurer’s perspective. Maximising loss coverage is equivalent to maximising premium income. If profit loadings are proportional to premiums, maximising loss coverage could be a desirable objective for insurers. Even from the insurer’s perspective, adverse selection is not always a bad thing! This might help explain why insurers often appear rather slower to make use of every scrap of marginal information for risk classification than economic theory predicts, even when information is observable at zero or at negligible cost and apparently relevant to the risk.
Finkelstein and Poterba (2014) note that UK insurers were very slow to adopt ‘postcode pricing’ of annuities, despite postcode information being available at zero cost and regional variations in annuitant mortality having been reasonably well known for many years. They offer four possible explanations, but do not consider the possibility that the small amount of adverse selection induced by ignoring postcode information may give higher loss coverage (from the insurers’ perspective, higher aggregate annuity premiums). They also note that once one large insurer (Legal and General) adopted postcode pricing in 2007, most other insurers quickly followed. If profit loadings are proportional to premiums, Legal and General’s competitive innovation and the resulting ‘competitive adverse selection’ (as discussed in Chapter 8) may have made insurers collectively worse off.
Loss Coverage versus Preconceptions: Hidden in Plain View
The observation that a larger fraction of the population’s losses can be compensated with some adverse selection involves only elementary arithmetic, but as far as I am aware it has not been pointed out before. Given its technical simplicity, this may seem surprising. It becomes less surprising when one reflects on how the insight clashes with extant economic and industry dogma. Economists seem strongly committed to the idea that asymmetric information always makes all markets work less well; in contrast, loss coverage says that a limited amount of asymmetric information makes insurance markets work better. Similarly, insurers seem strongly committed to the idea that unrestricted risk classification is commercially vital; in contrast, loss coverage says that some restrictions on risk classification are likely to be beneficial, including for insurers themselves. An unrecognised simplicity which clashes with so many people’s preconceptions can remain hidden in plain view.
One example where the loss coverage perspective contrasts with an argument made by economists on behalf of insurers is a paper, ‘Why the use of age and disability matters to consumers and insurers’, commissioned by the trade association Insurance Europe from the British economic consultants Oxera.Footnote 13 The document presents an example of the effect of banning risk classification by disability in life insurance. The example is hypothetical, with assumed sums assured of €100,000, and all other figures chosen to make a case to policymakers against restrictions on risk classification – but in my view inadvertently making the opposite case. Table 3.4 summarises the figures presented by Oxera as the hypothetical position before a ban on risk classification.
Table 3.4 Life insurance with risk-based premiums (based on Oxera, 2012)
| Risk | Mortality | Population | Insured population (clientele) | Expected cost per customer (€) | Total claims (€) | Premium rate (€) | Total premium income (€) |
|---|---|---|---|---|---|---|---|
| Standard | 1.4% | 800 | 400 | 1 400 | 560 000 | 1 400 | 560 000 |
| Double | 2.8% | 150 | 75 | 2 800 | 210 000 | 2 800 | 210 000 |
| High | 14.0% | 50 | 0 | 14 000 | | | |
| Total | | 1 000 | 475 | | 770 000 | | 770 000 |
Table 3.5 shows the figures given as the hypothetical position after a ban on risk classification. Oxera notes that the premium for the ‘Standard’ risks more than doubles after the ban, and suggests that this shows that a ban is undesirable.
Table 3.5 Life insurance with pooled premiums (based on Oxera, 2012)
| Risk | Mortality | Population | Insured population (clientele) | Expected cost per customer (€) | Total claims (€) | Premium rate (€) | Total premium income (€) |
|---|---|---|---|---|---|---|---|
| Standard | 1.4% | 800 | 200 | 1 400 | 280 000 | 3 333 | 666 667 |
| Double | 2.8% | 150 | 75 | 2 800 | 210 000 | 3 333 | 250 000 |
| High | 14.0% | 50 | 40 | 14 000 | 560 000 | 3 333 | 133 333 |
| Total | | 1 000 | 315 | | 1 050 000 | | 1 050 000 |
But on Oxera’s figures, loss coverage is increased by restricting risk classification. Before the ban, the expected number of deaths compensated by insurance is 770,000/100,000 = 7.7. After the ban, the corresponding number is 1,050,000/100,000 = 10.5. Banning risk classification in this example gives an improvement: the shift in coverage towards higher-risk lives more than compensates for the fall in numbers insured.
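The comparison can be reproduced with a few lines of Python. This is a minimal sketch using only the figures in Tables 3.4 and 3.5; the names (`expected_deaths`, `pooled_premium` and so on) are hypothetical, and the sum assured of 100,000 per policy is taken from the example as described above.

```python
SUM_ASSURED = 100_000  # assumed sum assured per policy, as in the Oxera example

MORTALITY = {"Standard": 0.014, "Double": 0.028, "High": 0.14}
POPULATION = {"Standard": 800, "Double": 150, "High": 50}
INSURED_BEFORE = {"Standard": 400, "Double": 75, "High": 0}   # Table 3.4
INSURED_AFTER = {"Standard": 200, "Double": 75, "High": 40}   # Table 3.5


def expected_deaths(counts):
    """Expected number of deaths among the given numbers of lives."""
    return sum(MORTALITY[group] * n for group, n in counts.items())


def pooled_premium(counts):
    """Single premium per policy at which expected claims equal premium income."""
    return SUM_ASSURED * expected_deaths(counts) / sum(counts.values())


deaths_population = expected_deaths(POPULATION)
deaths_before = expected_deaths(INSURED_BEFORE)
deaths_after = expected_deaths(INSURED_AFTER)

print("Compensated deaths before ban:", round(deaths_before, 2))
print("Compensated deaths after ban: ", round(deaths_after, 2))
print("Loss coverage before ban:", round(deaths_before / deaths_population, 4))
print("Loss coverage after ban: ", round(deaths_after / deaths_population, 4))
print("Pooled premium after ban:", round(pooled_premium(INSURED_AFTER), 2))
```

Expressed as a fraction of the 22.4 deaths expected in the whole population, loss coverage on these figures rises from roughly 0.34 before the ban to roughly 0.47 after it.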
Of course, since Oxera’s figures are hypothetical, it is easy to construct an example corresponding to Scenario 3 earlier in this chapter, where adverse selection ‘goes too far’ and lower loss coverage results from a ban. But the fact that this was not done suggests that the loss coverage perspective was not considered.
Summary
This chapter has suggested that the usual arguments that adverse selection always makes insurance work less well, and that more adverse selection is always worse than less, are misconceived. Adverse selection implies a fall in numbers insured, and also a shift in coverage towards higher risks. If the shift in coverage outweighs the fall in numbers, the expected losses compensated by insurance are increased – that is, the loss coverage is increased. From a public policy perspective, a degree of so-called ‘adverse’ selection in insurance is a good thing.
The discussion of loss coverage in Chapter 3 was informal. Chapters 4–6 develop more precisely the mathematical relationships between adverse selection, loss coverage, and demand elasticities. Readers who are not concerned with mathematical detail may prefer to skip Chapters 4–6, or perhaps just skim the graphs and end-of-chapter summaries. The full mathematical details are not prerequisites for understanding most of the rest of this book.
A Model for an Insurance Market
Throughout Chapters 4–6, I assume a population of risks can be divided into a low risk-group and a high risk-group, based on information which is fully observable to insurers. Just two risk-groups is not, of course, a realistic model of most insurance markets, but it is enough to illustrate principles.Footnote 1
Let μ1 and μ2 be the underlying risks (probabilities of loss) for the low and high risk-groups.
Let p1 and p2 be the population fractions for the low and high risk-groups, that is the proportions of the total population represented by each risk-group. This means a risk chosen at random from the entire population has a probability p1 of belonging to the low risk-group.
Throughout Chapters 4–6, I assume that all losses and insurance cover are of unit size; this simplifies the presentation, but it is not necessary. I also assume no moral hazard, that is, given the risk-group, the probability of loss is not affected by the purchase of insurance.Footnote 2
All quantities defined below are for a single risk sampled at random from the population (unless the context requires otherwise).
The expected loss is denoted E[L] (‘L’ for ‘loss’) and given by:
$$E[L] = \mu_1 p_1 + \mu_2 p_2 \tag{4.1}$$
E[L] corresponds to a unit version of the third row of the tables in the examples in Chapter 3.
In the absence of limits on risk classification, insurers will charge risk-differentiated premiums equal to the probabilities of loss, π1 = μ1 and π2 = μ2 for risk-groups 1 and 2, respectively.
The expected insurance demand is denoted E[Q] (‘Q’ for ‘quantity’) and given by:
$$E[Q] = \sum_{i=1,2} d(\mu_i, \pi_i)\, p_i \tag{4.2}$$
where d(μi,πi) is the proportional demand for insurance for risk-group i when premium πi is charged, that is the probability that an individual selected at random from the risk-group buys insurance.Footnote 3
The expected premium is denoted by E[Π] (‘Π’ for ‘premium’) and given by:
$$E[\Pi] = \sum_{i=1,2} d(\mu_i, \pi_i)\, p_i\, \pi_i \tag{4.3}$$
The expected insurance claim is given by:
$$E[QL] = \sum_{i=1,2} d(\mu_i, \pi_i)\, p_i\, \mu_i \tag{4.4}$$
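As a rough numerical sketch of the four expectations defined in this section, the fragment below evaluates E[L], E[Q], E[Π] and E[QL] for a two risk-group population. The population fractions, the premiums and the flat `demand` function are hypothetical placeholders (the text leaves d(μi, πi) generic at this stage), chosen only to make the arithmetic concrete.

```python
# Hypothetical two risk-group population (illustrative values only).
mu = [0.01, 0.04]        # probabilities of loss: low and high risk-groups
p = [0.8, 0.2]           # population fractions (assumed for illustration)
premium = [0.01, 0.04]   # premiums charged; here risk-differentiated


def demand(mu_i, premium_i):
    """Placeholder proportional demand: half of each risk-group buys cover."""
    return 0.5


d = [demand(m, q) for m, q in zip(mu, premium)]

# Each expectation is a sum over the two risk-groups, weighted by p_i.
expected_loss = sum(p_i * m for p_i, m in zip(p, mu))                        # E[L]
expected_demand = sum(d_i * p_i for d_i, p_i in zip(d, p))                    # E[Q]
expected_premium = sum(d_i * p_i * q for d_i, p_i, q in zip(d, p, premium))   # E[Pi]
expected_claim = sum(d_i * p_i * m for d_i, p_i, m in zip(d, p, mu))          # E[QL]

print(expected_loss, expected_demand, expected_premium, expected_claim)
```

With risk-differentiated premiums the expected premium and the expected claim coincide, which is the zero-profit property of actuarially fair pricing.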
Adverse Selection
The phrase ‘adverse selection’ is typically associated with positive correlation (or equivalently, covariance) of cover Q and loss L; most papers testing for adverse selection in the economics literature use this definition.Footnote 4 This section makes this definition precise, in a form which will later help to highlight the relationship between adverse selection and loss coverage.
By the standard definition, the covariance of Q and L is:
$$\operatorname{Cov}(Q, L) = E[QL] - E[Q]\,E[L] \tag{4.5}$$
While the economics literature typically uses covariance (Q, L) > 0 as a test for adverse selection, it is more convenient to note that when the covariance is zero, the two terms in Equation (4.5) must be the same. We can then define adverse selection as the ratio of these two terms:
$$A = \frac{E[QL]}{E[Q]\,E[L]} \tag{4.6}$$
This enables us to index different types of selective behaviour by insurance customers as follows:
A < 1: advantageous selection
A > 1: adverse selection.Footnote 5
To compare the severity of adverse selection under different risk-classification regimes, we need to define a reference level of adverse selection. Adverse selection under alternative schemes can then be expressed as a fraction of adverse selection under the reference scheme. A convenient reference scheme is risk-differentiated premiums (actuarially fair premiums). Then using subscript 0 to denote quantities evaluated under risk-differentiated premiums, we define the adverse selection ratio as:
$$\text{Adverse selection ratio} = \frac{A}{A_0} = \frac{E[QL] \,/\, \left(E[Q]\,E[L]\right)}{E_0[QL] \,/\, \left(E_0[Q]\,E_0[L]\right)} \tag{4.7}$$
In words, adverse selection ratio is the ratio of the expected claim per policy under the actual risk classification scheme to the expected claim per policy under risk-differentiated premiums.
Adverse selection ratio can also be thought of as the ratio of the demand-weighted average premiums required for insurers to break even under each risk classification scheme.
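The covariance test and the ratio measure of adverse selection can be compared numerically. The sketch below takes adverse selection to be E[QL] divided by E[Q]E[L], which is the reading implied by the covariance definition above; the population fractions and the two sets of proportional demands are hypothetical values chosen for illustration.

```python
MU = (0.01, 0.04)   # loss probabilities for the low and high risk-groups
P = (0.8, 0.2)      # population fractions (assumed for illustration)


def selection_measures(demand):
    """Return (Cov(Q, L), A) for proportional demands (low, high)."""
    e_l = sum(p * m for p, m in zip(P, MU))
    e_q = sum(d * p for d, p in zip(demand, P))
    e_ql = sum(d * p * m for d, p, m in zip(demand, P, MU))
    return e_ql - e_q * e_l, e_ql / (e_q * e_l)


# Reference scheme: equal take-up in both groups (hypothetical).
cov_fair, a_fair = selection_measures((0.5, 0.5))
# Alternative scheme: take-up skewed towards high risks (hypothetical).
cov_alt, a_alt = selection_measures((0.375, 0.75))

adverse_selection_ratio = a_alt / a_fair
print(cov_fair, a_fair, cov_alt, a_alt, adverse_selection_ratio)
```

The two measures agree in sign: zero covariance goes with a ratio of 1, and positive covariance goes with a ratio above 1.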
Loss Coverage
Loss coverage in this model is defined as the expected insurance claim (as previously evaluated in Equation (4.4)):
$$\text{Loss coverage} = E[QL] = \sum_{i=1,2} d(\mu_i, \pi_i)\, p_i\, \mu_i \tag{4.8}$$
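The product of random variables Q and L can alternatively be thought of as the following ‘indicator’ random variable:
$$QL = \begin{cases} 1 & \text{if the individual both holds insurance cover and suffers a loss} \\ 0 & \text{otherwise} \end{cases} \tag{4.9}$$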
Loss coverage can then be thought of as indexing the ‘overlap’ of cover Q and losses L in the population. It represents the extent to which insurance cover is concentrated over the ‘right’ risks (those most likely to suffer loss). It measures the efficacy of insurance in compensating the population’s losses.
The right-hand side of Equation (4.8) also shows that loss coverage can be thought of as the risk-weighted insurance demand for the randomly selected member of the population.
This concept of loss coverage as risk-weighted insurance demand can be contrasted with unweighted insurance demand, which corresponds to the ‘number of risks insured’ often referenced in informal discussions of adverse selection.
Loss Coverage Ratio
When comparing alternative risk classification schemes, it is often helpful to define loss coverage to be 1 under some suitable reference scheme. Loss coverage under alternative schemes can then be expressed as a fraction of loss coverage under the reference scheme. It is convenient to use the same approach as for adverse selection above, that is, I use risk-differentiated premiums as the reference scheme. Then using subscript 0 to denote quantities evaluated under risk-differentiated premiums, I define the loss coverage ratio (LCR) as:
$$\text{LCR} = \frac{E[QL]}{E_0[QL]} \tag{4.10}$$
Note that in the numerical examples in Chapters 1 and 3, the fraction between 0 and 1 which I heuristically labelled ‘loss coverage’ was, more precisely, a loss coverage ratio with the reference scheme (i.e. under which loss coverage = 1) defined as compulsory insurance of the whole population. Clearly the choice of reference scheme – risk-differentiated premiums, compulsory insurance or something else – does not matter, provided we use a consistent reference when making comparisons of different proposed risk classification schemes.
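Now note that loss coverage ratio in Equation (4.10) can also be expanded into:
$$\text{LCR} = \frac{E[QL]}{E_0[QL]} = \frac{E[QL] \,/\, E[Q]}{E_0[QL] \,/\, E_0[Q]} \times \frac{E[Q]}{E_0[Q]} \tag{4.11}$$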
Then by noting that expected population losses are the same irrespective of the risk classification scheme (i.e. E[L] = E0[L]), we see that the first term on the right-hand side of Equation (4.11) is the adverse selection ratio in Equation (4.7). The second term on the right-hand side of Equation (4.11) is the ratio of demand under the actual risk classification scheme to demand under risk-differentiated premiums, which I call the demand ratio. So Equation (4.11) can then be interpreted as:
$$\text{Loss coverage ratio} = \text{Adverse selection ratio} \times \text{Demand ratio} \tag{4.12}$$
We can illustrate this decomposition of loss coverage ratio by applying it to the numerical examples in Chapter 3. The decomposition is shown in Table 4.1. In each of the three columns in the table, loss coverage ratio (the third line) is the product of adverse selection ratio and demand ratio (the first and second lines).
Table 4.1 Decomposition of loss coverage ratio into adverse selection ratio and demand ratio for Tables 3.1–3.3
| | Table 3.1 | Table 3.2 | Table 3.3 |
|---|---|---|---|
| Adverse selection ratio | 1.0 | 1.25 | 1.3461 |
| Demand ratio | 1.0 | 0.90 | 0.65 |
| Loss coverage ratio | 1.0 | 1.125 | 0.875 |
The decomposition of loss coverage ratio into adverse selection ratio and demand ratio is sometimes helpful in correcting casual intuitions and commentary about restrictions on risk classification. Casual intuitions and commentary often reference rising average prices (adverse selection) and falling demand (numbers insured), but without considering how the two effects interact. But depending on the product of the two effects, loss coverage ratio may be higher or lower than 1 (that is, loss coverage may be increased or decreased). The decomposition highlights that predicting or observing a rise in average price and a fall in demand under a new risk classification scheme is not sufficient to demonstrate a worse outcome. The outcome in terms of loss coverage depends on the product of the two effects.
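The product relationship can be checked with a few lines of arithmetic. In the sketch below, the numbers insured in each scenario are hypothetical counts chosen so that the resulting ratios agree with Table 4.1; they are assumptions of this example, as are the loss probabilities of 0.01 and 0.04.

```python
RISKS = (0.01, 0.04)   # assumed loss probabilities: low and high risk-groups

# Hypothetical numbers insured (low, high), chosen to agree with Table 4.1.
scenarios = {
    "Table 3.1": (400, 100),
    "Table 3.2": (300, 150),
    "Table 3.3": (200, 125),
}


def summarise(insured):
    """Return (number of policies, expected claims) for insured counts."""
    policies = sum(insured)
    claims = sum(n * mu for n, mu in zip(insured, RISKS))
    return policies, claims


base_policies, base_claims = summarise(scenarios["Table 3.1"])

rows = {}
for name, insured in scenarios.items():
    policies, claims = summarise(insured)
    # Expected claim per policy, relative to the reference scheme.
    adverse_selection_ratio = (claims / policies) / (base_claims / base_policies)
    demand_ratio = policies / base_policies
    loss_coverage_ratio = claims / base_claims
    rows[name] = (adverse_selection_ratio, demand_ratio, loss_coverage_ratio)
    print(name, rows[name])
```

In the second scenario the rise in claims per policy more than offsets the fall in policies, so loss coverage rises; in the third scenario the fall in policies dominates.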
An illustration of the trade-off between loss coverage and adverse selection is provided by Figure 4.1. This graph plots loss coverage ratio against adverse selection ratio, based on the two risk-group model in this chapter with a plausible form for the demand function.Footnote 6 It can be seen that the maximum point for loss coverage corresponds to an intermediate degree of adverse selection, not too low and not too high.
The shape of this graph, with the interior maximum showing that loss coverage is maximised by an intermediate level of adverse selection, is the most important image in this book. A similar inverted U-shape is obtained for any reasonable demand function. Note that the highest loss coverage is obtained not despite the adverse selection, but because of the adverse selection. In moderation, adverse selection is a good thing.
Figure 4.1 Loss coverage ratio as a function of adverse selection ratio
Figure 4.1 is based on a relative risk β = μ2/μ1 = 4. If the relative risk is lower, the maximum value of loss coverage is lower, and this maximum is attained with a lower level of adverse selection. This is illustrated in Figure 4.2, which shows the plot of loss coverage ratio against adverse selection ratio for two values of relative risk, β = 3 and β = 4. The maximum of the dashed curve for β = 3 lies below and to the left of the maximum of the solid curve for β = 4.
Note that the right-hand terminal points of each curve in Figures 4.1 and 4.2 correspond to limiting values, not points at which I arbitrarily chose to stop drawing the curve. These limiting values represent the scenario where all lower risks have dropped out of insurance and only higher risks remain; clearly, adverse selection then cannot increase any more. In Figure 4.2, the terminal point for β = 3 lies to the left and below the terminal point for β = 4. To understand this, note that when all lower risks have dropped out of the market and only higher risks remain, lower relative risk β = 3 implies a lower break-even premium (lower adverse selection), and also that a lower fraction of the total risk in the population is covered (lower loss coverage).
Figure 4.2 Loss coverage ratio as a function of adverse selection ratio, for two relative risks
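The limiting values at the right-hand end of each curve can be sketched directly. When only higher risks remain, the break-even premium equals their own risk, so their demand is the same as under risk-differentiated premiums. Writing α1 for the lower risk-group's share of demand under risk-differentiated premiums (a quantity introduced here for convenience), the limits depend only on α1 and β. The value α1 = 0.9 below is an assumption; the parameters actually used for the figures are not stated in this section.

```python
def terminal_point(beta, alpha1=0.9):
    """Limiting (adverse selection ratio, loss coverage ratio) when only
    high risks remain insured, at a premium equal to their own risk.

    alpha1 is the low risk-group's share of demand under
    risk-differentiated premiums; 0.9 is an assumed illustrative value.
    Risks are measured in units of mu1, so mu1 = 1 and mu2 = beta.
    """
    alpha2 = 1.0 - alpha1
    # Demand-weighted average premium under risk-differentiated premiums.
    fair_average_premium = alpha1 * 1.0 + alpha2 * beta
    adverse_selection_ratio = beta / fair_average_premium
    loss_coverage_ratio = alpha2 * beta / fair_average_premium
    return adverse_selection_ratio, loss_coverage_ratio


for beta in (3, 4):
    s, lcr = terminal_point(beta)
    print(f"beta = {beta}: adverse selection ratio {s:.3f}, LCR {lcr:.3f}")
```

Under these assumptions the terminal point for β = 3 has both a lower adverse selection ratio and a lower loss coverage ratio than the terminal point for β = 4.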
Summary
The main points of Chapter 4 can be summarised as follows:
1. Loss coverage:
Loss coverage is the expected losses compensated by insurance for the whole population. It represents the extent to which insurance cover is concentrated on the ‘right’ risks (those most likely to suffer loss). It measures the efficacy of insurance in compensating the population’s losses.
Loss coverage under alternative risk classification schemes can be compared by loss coverage ratio:
$$\text{LCR} = \frac{E[QL]}{E_0[QL]}$$
where subscript 0 denotes a quantity calculated under some base or reference risk classification scheme, e.g. risk-differentiated premiums, or compulsory insurance.
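2. Adverse selection:
Adverse selection under alternative risk classification schemes can be compared by adverse selection ratio:
$$\text{Adverse selection ratio} = \frac{E[QL] \,/\, \left(E[Q]\,E[L]\right)}{E_0[QL] \,/\, \left(E_0[Q]\,E_0[L]\right)}$$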
where subscript 0 denotes a quantity calculated under some base or reference risk classification scheme, e.g. risk-differentiated premiums, or compulsory insurance.
3. Decomposition of loss coverage ratio into adverse selection ratio and demand ratio:
$$\text{Loss coverage ratio} = \text{Adverse selection ratio} \times \text{Demand ratio}$$
This decomposition highlights that predicting or observing a rise in average price and fall in demand under a new risk classification scheme is not sufficient to demonstrate a worse outcome. The outcome in terms of loss coverage depends on the product of the two effects.
Chapter 4 gave mathematical definitions of loss coverage and related quantities. This chapter introduces models of insurance markets which enable us to study how loss coverage varies with changes in the fractions of the population represented by higher risks and lower risks, probabilities of loss and demand elasticities. As with Chapter 4, readers who are not concerned with mathematical detail may prefer to skim just the graphs and end-of-chapter summary.
We focus first on a simple iso-elastic demand function, and then consider more general demand functions.
Two Risk-Groups with Iso-elastic Demand
This chapter uses the same two risk-group model as in Chapter 4. In that chapter, I used a generic function for proportional insurance demand d(μi, πi) to define the quantities E[L], E[Q], E[Π] and E[QL] – expected population loss, expected insurance demand, expected premium and expected insurance claim (loss coverage). I did not specify a form for the demand function, nor the detail of how the premiums πi charged to each risk-group were determined. In the present chapter, I specify a form for the demand function. I also specify the zero-profit equilibrium condition which determines the ‘pooled’ premium when all risks are pooled at the same price.
Specifying an Insurance Demand Function
To recap, the proportional demand for insurance d(μi, πi) is the proportion of the risk-group with probability of loss μi which buys insurance when a premium of πi is charged to members of that risk-group.
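As a preliminary, it is helpful to define the concept of demand elasticity. Demand elasticity is defined as:
$$\text{Demand elasticity} = -\frac{\partial d(\mu_i, \pi_i)}{\partial \pi_i} \cdot \frac{\pi_i}{d(\mu_i, \pi_i)} \tag{5.1}$$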
Roughly speaking, this is the percentage change in demand for a very small percentage change in premium. It measures the responsiveness of demand to small changes in premium.
Note that the derivative within the above expression is normally negative (as the premium rises, demand falls). The minus sign ensures that demand elasticity as defined here will normally be positive; this makes the subsequent mathematical presentation tidier.
It is sometimes more convenient to rewrite Equation (5.1) as the log–log derivative of the demand function with respect to the premium, that is:
$$\text{Demand elasticity} = -\frac{\partial \ln d(\mu_i, \pi_i)}{\partial \ln \pi_i} \tag{5.2}$$
What properties should the demand function possess? I suggest the following as intuitive axiomsFootnote 1:
(a) Decreasing in premium: d(μi, πi) is a decreasing function of the premium πi for all risk-groups.
(b) Increasing in risk: d(μ1, π0) < d(μ2, π0), that is at any given premium π0, the proportional demand is higher for the higher risk-group.
(c) Decreasing in premium loading: d(μi, πi) is a decreasing function of the premium loading (πi/μi).
(d) Capped at 1: d(μi, πi) ≤ 1, that is the highest possible demand is when all members of the risk-group buy insurance.
A simple demand function which satisfies these requirements can be obtained by setting the demand elasticity in Equation (5.2) equal to a constant, say λi. Solving this differential equation leads to the so-called iso-elastic demand function:
$$d(\mu_i, \pi_i) = \tau_i \left(\frac{\pi_i}{\mu_i}\right)^{-\lambda_i} \tag{5.3}$$
where
τi = d(μi, μi) is the ‘fair-premium demand’ for risk-group i, that is the proportion of risk-group i who buy insurance at an actuarially fair premium, that is when πi = μi.
μi is the risk (probability of loss) for members of risk-group i.
λi is the demand elasticity for members of risk-group i, as already defined.
To interpret the demand formula in Equation (5.3), observe that it specifies demand as a function of the premium loading (πi/μi). When the premium loading is high (insurance is expensive), demand is low and vice versa. The ‘iso-elastic’ terminology reflects that price elasticity of demand is the same constant λi everywhere along the demand curve.
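The constant-elasticity property can be checked numerically. The sketch below assumes the iso-elastic form d = τ(π/μ)^(−λ), which is what the constant-elasticity condition gives once demand at the fair premium is fixed at τ. It then estimates the log–log derivative by a central difference at several premiums. The function names and the value τ = 0.5 are illustrative choices.

```python
import math


def iso_elastic_demand(mu, premium, tau, lam):
    """Proportional demand tau * (premium / mu) ** (-lam)."""
    return tau * (premium / mu) ** (-lam)


def elasticity(mu, premium, tau, lam, h=1e-6):
    """Central-difference estimate of -d ln(demand) / d ln(premium)."""
    up = math.log(iso_elastic_demand(mu, premium * math.exp(h), tau, lam))
    down = math.log(iso_elastic_demand(mu, premium * math.exp(-h), tau, lam))
    return -(up - down) / (2 * h)


# The estimate should not depend on where along the curve it is taken.
for premium in (0.01, 0.02, 0.04):
    print(premium, round(elasticity(0.01, premium, tau=0.5, lam=0.5), 6))
```

Because this sketch does not cap demand at 1, premiums far below the fair premium can produce values above 1, which is the concern behind axiom (d).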
The λi parameter controls the shape of the demand curve. This is illustrated in Figure 5.1, which shows plots of demand from higher and lower risk-groups for three different values of λ in the demand function of Equation (5.3).Footnote 2
Figure 5.1 Iso-elastic demand curves for λ = 0.5, 1.0 and 1.5
The demand function in Equation (5.3) clearly satisfies axioms (a) and (c) above. Axioms (b) and (d) appear superficially to require conditions on the fair-premium demands τ1 and τ2. In other words, we need to be careful that modelled demand from the lower risk-group is always lower than from the higher risk-group (axiom (b)); and also that modelled demand from the higher risk-group does not exceed 1 at the equilibrium premium (axiom (d)). In Figure 5.1, we can see that the latter point might be a concern for the highest curve in the right panel, the case λ = 1.5, if the equilibrium-pooled premium happened to be below about 0.016.
However, for the purposes of analysing the mathematical properties of the model, it is convenient to use the following trick. Recall that p1 and p2 are the fractions of the total population represented by lower risks and higher risks, respectively. Then define the fair-premium demand-share of the lower risk-group as the proportion of total demand which the risk-group represents when actuarially fair premiums are charged to both risk-groups:
$$\alpha_1 = \frac{\tau_1 p_1}{\tau_1 p_1 + \tau_2 p_2} \tag{5.4}$$
Clearly α2, the fair-premium demand-share of the higher risk-group, is just the complement of α1 (i.e. α2 = 1 – α1).
We can then analyse the mathematical properties of the model for the full range of possible fair-premium demand-shares 0 ≤ α1 ≤ 1, without worrying about specifying any particular values for the fair-premium demands τi and population fractions pi. It suffices to note that for every possible α1, there will be some hypothetical combination of population structure pi and fair-premium demand τi which satisfies the axioms (b) and (d) above.
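To illustrate why only the demand-share matters, the sketch below computes the lower risk-group's share of total demand at actuarially fair premiums for two different hypothetical populations. The population fractions and fair-premium demands are invented for the example; both combinations give the same share.

```python
def fair_premium_demand_share(p1, tau1, tau2):
    """Share of total demand coming from low risks when both risk-groups
    are charged actuarially fair premiums."""
    p2 = 1.0 - p1
    return tau1 * p1 / (tau1 * p1 + tau2 * p2)


# Two hypothetical populations with different structures.
share_a = fair_premium_demand_share(p1=0.90, tau1=0.5, tau2=0.5)
share_b = fair_premium_demand_share(p1=0.96, tau1=0.3, tau2=0.8)
print(share_a, share_b)
```

Both populations have a lower-risk demand-share of 0.9, so any result stated in terms of that share applies to both.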
Specifying a Zero-Profit Equilibrium Condition
Suppose now that the premiums charged to members of the low risk-group and high risk-group are π1 and π2, respectively.
The insurance incomeFootnote 3 (premiums) per member of the population will be the sum of the products of demand and premium for each risk-group:
$$\text{Income} = \sum_{i=1,2} d(\mu_i, \pi_i)\, p_i\, \pi_i \tag{5.5}$$
The insurance outgo (claims) per member of the population will be the product of demand and the probability of loss for each risk-group (recall from Chapter 4 that a loss, if it occurs, is always of unit size):
$$\text{Outgo} = \sum_{i=1,2} d(\mu_i, \pi_i)\, p_i\, \mu_i \tag{5.6}$$
The right-hand side of Equation (5.6) looks like the definition of loss coverage in Equation (4.8) in Chapter 4. However, loss coverage refers specifically to this quantity at equilibrium. The insurance outgo in Equation (5.6) is defined for any premiums, not just equilibrium premiums.
To determine equilibrium premiums, note that the insurer’s expected profit (loss, if negative) is expected income less expected outgo, that is Equation (5.5) – Equation (5.6). So any pair of premiums (π1,π2) which equates Equations (5.5) and (5.6) is an equilibrium.
One equilibrium is obvious from inspection: full risk classification (or ‘fully risk-differentiated premiums’), that is π1 = μ1, π2 = μ2. This represents one extreme.
Another equilibrium – and the main focus in this chapter – is a common ‘pooled’ premium π0 for both risk-groups: nil risk classification (or ‘pooling’) that is π1 = π2 = π0. This represents the other extreme.Footnote 4 There will always exist some value of π0 which gives a pooling equilibrium.Footnote 5
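For the special case of a common elasticity λ1 = λ2 = λ, the pooling condition can be solved in closed form. The short derivation below is a sketch worked out from the definitions in this section rather than a result quoted from the text. It assumes the iso-elastic form d(μi, π) = τi (π/μi)^(−λ) and writes αi for the fair-premium demand-shares, which are proportional to τi pi.

$$
\begin{aligned}
\sum_{i=1,2} p_i\, \tau_i \left(\frac{\pi_0}{\mu_i}\right)^{-\lambda} (\pi_0 - \mu_i) &= 0 \\
\pi_0 \sum_{i=1,2} \alpha_i\, \mu_i^{\lambda} &= \sum_{i=1,2} \alpha_i\, \mu_i^{\lambda + 1} \\
\pi_0 &= \frac{\alpha_1 \mu_1^{\lambda+1} + \alpha_2 \mu_2^{\lambda+1}}{\alpha_1 \mu_1^{\lambda} + \alpha_2 \mu_2^{\lambda}}
\end{aligned}
$$

Under these assumptions the pooled premium is a weighted average of μ1 and μ2, with weights proportional to αi μi^λ, so it always lies between the two risks and moves towards μ2 as λ increases.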
Examples
To illustrate the use of this model, set μ1 = 0.01, μ2 = 0.04 (i.e. relative risk β = 4) and α1 = 0.9, that is 90% of the insurance demand under risk-differentiated premiums is from lower risks. These parameter values are used throughout Chapters 5 and 6, except where stated otherwise. The parameters have been chosen by loose analogy with the life insurance market, where typically around 90% of accepted risks are assigned to a large ‘standard’ risk-group with low mortality, and around 10% of accepted risks are charged a range of higher premiums ranging from +50% to +300% over the standard risk-group’s premium. But these values are merely hypothetical and illustrative; they are not presented as a calibrated model of any real market.
Figure 5.2 shows the equilibrium for relatively inelastic demand, λ = 0.5. Figure 5.2 can be interpreted as follows.
Figure 5.2 Low elasticity: λ = 0.5, giving increased loss coverage under pooling
The horizontal dashed line is a reference level representing income and outgo if risk-differentiated premiums are charged: that is, if π1 = μ1, π2 = μ2 in Equations (5.5) and (5.6).
The curves represent how income and outgo as defined in Equations (5.5) and (5.6) vary with the level of a common (‘pooled’) premium which is charged to both risk-groups.
On the left-hand side of the graph, where the premium π is low, demand for insurance at this price is high, and so outgo is high. Because of the very low premium, income is low (despite the high demand); the market is far from equilibrium and insurers make large losses. Insurers will therefore increase the pooled premium, and some customers will leave the market. As customers leave the market, outgo decreases monotonically (the downward sloping curve), but income increases because the number of customers leaving the market is outweighed by the increase in premium collected for each remaining customer. So for demand elasticity λ < 1, the curve of total income slopes upwards.Footnote 6 The intersection (shown by the arrow) of the curves for income and outgo represents a pooling equilibrium. The premium at this intersection is the equilibrium pooled premium π0.
Note that the arrowed intersection is at a higher level of income and outgo than under full risk differentiation. In other words, with the low demand elasticity λ = 0.5 assumed here, loss coverage under pooling is increased compared with that under risk-differentiated premiums.
Figure 5.3 shows the result for more elastic demand λ = 1.5, with all other parameters as in Figure 5.2. Note that the equilibrium is at a lower level of income and outgo than when risk-differentiated premiums are charged. In other words, with the higher demand elasticity λ = 1.5 assumed here, loss coverage under pooling is reduced compared with that under risk-differentiated premiums.
Figure 5.3 High elasticity: λ = 1.5, giving reduced loss coverage under pooling
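The two examples can be reproduced approximately by scanning a common premium and locating the point where income first catches up with outgo. The sketch below assumes iso-elastic demand with a common elasticity, uses μ1 = 0.01, μ2 = 0.04 and α1 = 0.9, and scales total demand at fair premiums to 1. The grid search is a simple illustrative method, not the one used to draw the figures.

```python
MU = (0.01, 0.04)    # risks for the low and high risk-groups
ALPHA = (0.9, 0.1)   # fair-premium demand-shares (total fair demand = 1)


def demand(m, premium, a, lam):
    """Iso-elastic demand from one risk-group at a common premium."""
    return a * (premium / m) ** (-lam)


def income(premium, lam):
    return sum(demand(m, premium, a, lam) * premium for a, m in zip(ALPHA, MU))


def outgo(premium, lam):
    return sum(demand(m, premium, a, lam) * m for a, m in zip(ALPHA, MU))


def crossing(lam, steps=30000):
    """First grid premium between mu1 and mu2 with income >= outgo."""
    lo, hi = MU
    for k in range(steps + 1):
        premium = lo + (hi - lo) * k / steps
        if income(premium, lam) >= outgo(premium, lam):
            return premium
    return hi


# Outgo (equal to income) under risk-differentiated premiums.
reference = sum(a * m for a, m in zip(ALPHA, MU))

for lam in (0.5, 1.5):
    pi0 = crossing(lam)
    print(f"lambda = {lam}: pooled premium ~ {pi0:.5f}, "
          f"LCR ~ {outgo(pi0, lam) / reference:.3f}")
```

Under these assumptions the pooled equilibrium sits above the reference level for λ = 0.5 and below it for λ = 1.5, in line with the two figures.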
General Results for Iso-elastic Demand with Common Elasticity λ
This section states and illustrates general results for adverse selection, insurance demand (cover) and loss coverage under the iso-elastic demand function as per Equation (5.3), with a common elasticity parameter λ for both risk-groups. Mathematical proofs are omitted in this book, but have been published in Hao et al. (2015, 2016a). In interpreting these results, note that a loss coverage ratio (LCR) above one (LCR > 1) signifies a ‘good’ outcome from restricting risk classification, and LCR < 1 signifies a ‘bad’ outcome.
The results stated below are illustrated in Figures 5.4–5.6.
(a) Adverse selection ratio increases monotonically with demand elasticity, to an upper limit where the only remaining insureds are high risks. This is shown in the upper panel of Figure 5.4 for two different relative risks β = 3 and β = 4. Note that the asymptotic limiting values of adverse selection ratio are equivalent to the pooled premium when all lower risks have left the insurance market, divided by the weighted average premium under risk-differentiated premiums.
(b) The corresponding change in demand ratio is shown in the lower panel of Figure 5.4. Recall from Chapter 4 that demand ratio represents insurance demand when risk classification is restricted divided by insurance demand under fully risk-differentiated premiums. Demand ratio is therefore a measure of the reduction in cover which arises from adverse selection; it is the realisation in our model of what economic rhetoric typically describes as ‘efficiency losses’ or ‘inefficiency’ arising from adverse selection.
(c) In contrast to adverse selection ratio and demand ratio, LCR as a function of demand elasticity has an interior maximum, as shown in the upper left region of Figure 5.5. In other words, loss coverage is maximised with a non-zero level of adverse selection, irrespective of whether adverse selection is characterised as ‘positive correlation of cover and losses’ (as in this book, and as in most econometric tests for adverse selection) or as ‘reduction in cover’ (what economic rhetoric calls ‘efficiency losses’).Footnote 7
(d) For iso-elastic demand, demand elasticity of 1 always gives LCR of 1, as shown in Figure 5.5 (i.e. both curves pass through the coordinate (1,1)). Lower values of demand elasticity give LCR above 1, and vice versa, that is
$$0 < \lambda < 1 \Rightarrow \text{LCR} > 1; \qquad \lambda = 1 \Rightarrow \text{LCR} = 1; \qquad \lambda > 1 \Rightarrow \text{LCR} < 1 \tag{5.7}$$
So for the iso-elastic demand function, λ = 1 represents a critical value of demand elasticity, which determines whether loss coverage is increased or reduced by pooling all risks in a single class, as compared with loss coverage under risk-differentiated premiums.
(e) For high values of λ, loss coverage flattens out at a lower limit where the only remaining insureds are high risks. This is shown towards the right side of the main graph in Figure 5.5.
(f) The smaller graph on the lower right in Figure 5.5 zooms over the upper left region of the main graph, that is the region where 0 < λ < 1. It can be seen that as demand elasticity increases from zero, LCR increases from 1 to a maximum at around demand elasticity λ = 0.5. A higher relative risk β gives a higher maximum value of LCR. Note that in this region 0 < λ < 1, the common characterisation of adverse selection as ‘inefficient’ seems unreasonable. With adverse selection, more risk is being voluntarily traded, and more losses are being compensated.
(g) The value of the maximum for LCR in the zoomed region in Figure 5.5 is higher for β = 4 than for β = 3. The maximum is also affected by the population structure. To be precise, the maximum depends on the fair-premium risk-share, say w (note: not the same as the fair-premium demand-share αi previously defined in Equation (5.4)), i.e. the fraction of total insured risk which lower risks represent when actuarially fair premiums are charged:
$$w = \frac{\alpha_1 \mu_1}{\alpha_1 \mu_1 + \alpha_2 \mu_2} \tag{5.8}$$
(h) Figure 5.6 shows loss coverage for three population structures, all with relative risk β = 4. The smaller graph on the lower right zooms over the region 0 < λ < 1. The curve with the highest maximum for LCR is that corresponding to population structure α1 = 0.8; this corresponds to a fair-premium risk-share of:
$$w = \frac{0.8 \times 0.01}{0.8 \times 0.01 + 0.2 \times 0.04} = 0.5 \tag{5.9}$$
(i) For the iso-elastic demand function used in this chapter, it can be shown that if both demand elasticity and population structure can vary, the maximum value for LCR occurs when λ = 0.5 and w = 0.5. This maximum value for LCR is:
$$\max \text{LCR} = \frac{1}{2}\left(\beta^{1/4} + \beta^{-1/4}\right) \tag{5.10}$$
(j) The form of Equation (5.10) combined with the requirement for w = 0.5 suggests that pooling will be particularly beneficial to loss coverage when there is a small risk-group with very high risk (since this combination allows high relative risk β to be combined with fair-premium risk-share w ≈ 0.5). One obvious example is a small fraction of the population with an adverse genetic profile.Footnote 8
Figure 5.4 Adverse selection ratio and demand ratio as a function of demand elasticity, for two relative risks
Figure 5.5 Loss coverage ratio as a function of demand elasticity, for two relative risks
Figure 5.6 Loss coverage ratio as a function of demand elasticity, for three population structures
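Several of the results above can be checked numerically. For a common elasticity, solving the zero-profit condition under the iso-elastic form gives the pooled premium as a ratio of two weighted sums, and substituting it back gives a closed-form LCR. That closed form is worked out for this sketch rather than quoted from the text. The code below evaluates it on a grid of elasticities and compares the maximum with ½(β^(1/4) + β^(−1/4)) for β = 4 and α1 = 0.8.

```python
def lcr(lam, alpha1, beta):
    """Loss coverage ratio under pooling for a common elasticity lam.

    Risks are measured in units of mu1, so mu = (1, beta); the ratio is
    unaffected by this rescaling.
    """
    alpha = (alpha1, 1.0 - alpha1)
    mu = (1.0, beta)
    s_hi = sum(a * m ** (lam + 1) for a, m in zip(alpha, mu))
    s_lo = sum(a * m ** lam for a, m in zip(alpha, mu))
    fair = sum(a * m for a, m in zip(alpha, mu))
    # Pooled premium is s_hi / s_lo; loss coverage is s_hi * premium ** (-lam).
    return s_hi ** (1 - lam) * s_lo ** lam / fair


beta, alpha1 = 4.0, 0.8
w = alpha1 / (alpha1 + (1 - alpha1) * beta)   # fair-premium risk-share
grid = [k / 100 for k in range(0, 201)]        # elasticities 0.00 .. 2.00
best_lam = max(grid, key=lambda lam: lcr(lam, alpha1, beta))
bound = 0.5 * (beta ** 0.25 + beta ** -0.25)

print(w, best_lam, lcr(best_lam, alpha1, beta), bound)
```

On this grid the maximum falls at λ = 0.5 and matches the stated bound, while λ = 1 returns an LCR of 1 for any population structure.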
General Results for Iso-elastic Demand with Different Elasticities (λ1 ≠ λ2)
When the elasticity parameters λ1 and λ2 for the low and high risk-groups are different, it becomes harder to make concise general statements about how they affect loss coverage. The general pattern of results is shown in Figure 5.7. This shows the regions in the (λ1, λ2) plane where LCR is above or below 1 (i.e. pooling gives higher or lower loss coverage than risk-differentiated premiums). The graph can be explained as follows.
(a) First, ignore the two dashed curves in Figure 5.7 and focus just on the solid curve for β = 4 (a relative risk of 4, e.g. μ1 = 0.01, μ2 = 0.04). This curve demarcates a left-hand unshaded region containing all combinations of (λ1, λ2) for which LCR > 1, and a right-hand shaded region containing all combinations for which LCR < 1. Note that LCR > 1 is associated with movement towards the upper left of the graph: that is, to give LCR > 1, λ1 needs to be ‘sufficiently low’ relative to λ2.
(b) Second, note that λ1 sufficiently low relative to λ2 does not necessarily mean λ1 lower than λ2. In particular, in the lower part of the unshaded area inside the unit square in Figure 5.7 (the narrow segment below the dashed 45° line, but above the solid curve), λ1 is slightly higher than λ2, and yet still ‘sufficiently low’ to give LCR > 1.
(c) Third, focus now on the two dashed curves in Figure 5.7. The dashed curves illustrate how the solid curve demarcating the left-hand LCR > 1 (unshaded) and right-hand LCR < 1 (shaded) regions shifts as the relative risk β changes. Note that as relative risk β increases, the curve demarcating the regions becomes more convex, so that a greater range of combinations of λ1 and λ2 inside the unit square gives LCR > 1. But the effect is small: increasing the relative risk from 4 to 40 times or even 400 times makes only a small difference to the curve.
Figure 5.7 Regions of (λ1, λ2) space where loss coverage ratio is greater or less than 1
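When the two elasticities differ, the pooled premium has no simple closed form, but it can be located numerically. The sketch below assumes iso-elastic demand with μ1 = 0.01, μ2 = 0.04 and α1 = 0.9, and uses bisection between μ1 and μ2, where profit changes sign. Bisection returns only one root, so for extreme parameter combinations with more than one equilibrium it would report just one of them.

```python
MU = (0.01, 0.04)    # risks for the low and high risk-groups
ALPHA = (0.9, 0.1)   # fair-premium demand-shares (total fair demand = 1)


def profit(premium, lams):
    """Income less outgo at a common premium, for elasticities (lam1, lam2)."""
    return sum(a * (premium / m) ** (-lam) * (premium - m)
               for a, m, lam in zip(ALPHA, MU, lams))


def pooled_premium(lams, tol=1e-12):
    """Bisection on profit: negative at mu1, positive at mu2."""
    lo, hi = MU
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profit(mid, lams) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lcr(lam1, lam2):
    """Loss coverage under pooling relative to risk-differentiated premiums."""
    lams = (lam1, lam2)
    pi0 = pooled_premium(lams)
    pooled = sum(a * (pi0 / m) ** (-lam) * m
                 for a, m, lam in zip(ALPHA, MU, lams))
    fair = sum(a * m for a, m in zip(ALPHA, MU))
    return pooled / fair


for pair in ((0.2, 0.9), (1.0, 1.0), (1.5, 0.5)):
    print(pair, round(lcr(*pair), 3))
```

Under these assumptions a low λ1 paired with a higher λ2 gives LCR above 1, equal elasticities of 1 give LCR of 1, and a high λ1 paired with a low λ2 gives LCR below 1.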
Comparison with Empirical Demand Elasticities: λ < 1 is Realistic
The results summarised in Figures 5.4–5.7 suggest that under iso-elastic demand, pooling will give higher loss coverage than fully risk-differentiated premiums:
(a) in the equal elasticities case, whenever demand elasticity is less than 1;
(b) in the different elasticities case, whenever λ1 < 1 and λ1 < λ2 (in Figure 5.7, the part of the unshaded region vertically above the λ1 = λ2 diagonal and to the left of λ1 = 1);
(c) for some values outside this range, provided λ1 is ‘sufficiently low’ relative to λ2 (in Figure 5.7, other parts of the unshaded region).
How do these limits compare with demand elasticities in the real world? There is some evidence that insurance demand elasticities are less than 1 in many markets. Table 5.1 shows some relevant empirical estimates. (For convenience the elasticity parameter in this chapter was defined as a positive constant, but estimates in empirical papers are generally given with the negative sign, so the table quotes them in that form.) These estimates suggest that if the iso-elastic demand model is reasonable, loss coverage might often be increased by restricting risk classification.Footnote 9
Table 5.1 Estimates of demand elasticity for various insurance markets
| Market and country | Estimated demand elasticities | References |
|---|---|---|
| Yearly renewable term life insurance, USA | −0.4 to −0.5 | Pauly et al. (2003) |
| Term life insurance, USA | −0.66 | Viswanathan et al. (2007) |
| Whole life insurance, USA | −0.71 to −0.92 | Babbel (1985) |
| Health insurance, USA | 0 to −0.2 | Chernew et al. (1997), Blumberg et al. (2001), Buchmueller and Ohri (2006) |
| Health insurance, Australia | −0.35 to −0.50 | Butler (1999) |
| Farm crop insurance, USA | −0.32 to −0.73 | Goodwin (1993) |
Loss Coverage under Alternative Demand Functions
The ‘iso-elastic’ property of the demand function used above implies that demand elasticity does not change as the premium changes. This is mathematically tractable, but it is arguably unrealistic: the usual pattern for most goods and services is that demand elasticity increases as price increases. To accommodate this pattern, more flexible demand specifications are needed.
One such specification is to assume that demand elasticity is a linearly increasing function of the premium, and the same irrespective of the individual’s risk-group. This leads to the negative exponential demand function. If we relax the constraint of linearity in the increase of elasticity with premium, we then have the generalised negative exponential demand function.
The mathematical details of these demand functions are given in Appendix A. In the present chapter, I just show Figure 5.8, which plots loss coverage ratio against demand elasticity for three demand functions: iso-elastic (as used so far in this chapter), negative exponential and generalised negative exponential with a key parameter n set to 2 (this parameter n is the ‘elasticity of elasticity’, also known as the ‘second-order elasticity’).
Figure 5.8 Loss coverage ratio as a function of demand elasticity, for three versions of generalised negative exponential demand
I first need to clarify a technical point about the definition of ‘demand elasticity’ on the x-axis in Figure 5.8. With iso-elastic demand in Figures 5.4–5.6, I plotted loss coverage ratio against a single λ parameter. This parameter represents the elasticity parameter value (common to both risk-groups) of λ1 = λ2 = λ; and because of the iso-elastic property, it also represents the actual demand elasticity at any premium. For the alternative demand functions, demand elasticity varies as a function of the premium. But because the alternative demand functions assume that demand elasticity is the same increasing function of premium for both risk-groups (the details are given in Appendix A), we can still define a single measure of elasticity for the x-axis: we just plot loss coverage against the actual demand elasticity (which is the same for both risk-groups) at the equilibrium premium.
I now move to results. Figure 5.8 shows the plot of loss coverage ratio as a function of demand elasticity at the equilibrium premium, for generalised negative exponential demand with three parameter values for the second-order elasticity:
– n → 0 (this corresponds to iso-elastic demand);
– n = 1 (this corresponds to negative exponential demand); and
– n = 2 (this corresponds to generalised negative exponential demand).
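The three cases can be explored numerically. The sketch below assumes one particular functional form for generalised negative exponential demand, d(π) = τ·exp(λ(1 − (π/μ)^n)/n), whose elasticity is λ(π/μ)^n. This form, the function names and the parameter values are illustrative assumptions, not the definitions given in Appendix A. Under this assumed form, n → 0 recovers iso-elastic demand and n = 1 makes elasticity linear in the premium.

```python
import math


def gne_demand(premium, risk, lam, n, tau=1.0):
    """Assumed generalised negative exponential demand (hypothetical form).

    n == 0 is treated as the iso-elastic limit tau * (premium / risk) ** -lam.
    """
    x = premium / risk
    if n == 0:
        return tau * x ** (-lam)
    return tau * math.exp(lam * (1.0 - x ** n) / n)


def elasticity(premium, risk, lam, n, h=1e-6):
    """Numerical demand elasticity: -d ln(demand) / d ln(premium).

    Uses a central difference in log-premium, so no closed form is needed.
    """
    up = math.log(gne_demand(premium * math.exp(h), risk, lam, n))
    down = math.log(gne_demand(premium * math.exp(-h), risk, lam, n))
    return -(up - down) / (2.0 * h)


# Elasticity at the fair premium and at twice the fair premium,
# for a true risk of 0.01 and an elasticity parameter of 0.5.
for n in (0, 1, 2):
    at_fair = elasticity(0.01, 0.01, 0.5, n)
    at_double = elasticity(0.02, 0.01, 0.5, n)
    print(n, round(at_fair, 4), round(at_double, 4))
```

In this sketch the iso-elastic case keeps the same elasticity at both premiums, while the other two cases show elasticity rising with the premium, and rising faster for the larger n.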
The solid curve (the iso-elastic case) in this graph is the same as the solid curve in the earlier Figures 5.5 and 5.6. The dashed and dotted curves (the negative exponential and generalised negative exponential cases, respectively) are above the horizontal line representing LCR = 1 for a wider range of demand elasticities. In other words, the iso-elastic case used throughout the earlier parts of this chapter is actually the ‘least favourable’ case for restrictions on risk classification.
Taken together, the three curves in Figure 5.8 suggest that the main point of this book is robust to a variety of plausible demand functions. For all these demand functions, loss coverage is higher under pooling compared with risk-differentiated premiums, provided demand elasticity is sufficiently low.
Multiple Equilibria: A Technical Curiosity
In footnote 5, I gave a simple argument to show that a pooled equilibrium premium π0 always exists (the profit function crosses the x-axis at least once), but I did not show that it was unique (the profit function crosses the x-axis only once). For some extreme population structures and extreme differences between the risk-groups in demand elasticities, the profit function can generate multiple equilibria. In other words, the zero-profit condition can be satisfied by more than one pooled premium (π0A, π0B, π0C, etc.). This technical curiosity turns out not to be important in practice, because the parameter values required to generate multiple equilibria are implausible. But to demonstrate this implausibility, it is necessary to analyse how multiple equilibria can arise, and establish limits on the parameter values which are required. In view of its technical nature, this is dealt with in Appendix B.
Summary
The main points of Chapter 5 can be summarised as follows.
1. This chapter has introduced models of insurance market equilibrium under restricted risk classification, principally with a simple iso-elastic demand function, but also considering alternative demand functions. Although the results differ in detail, the general pattern is the same for all the demand functions considered (see Figure 5.8). The main point of this book is robust to the different demand functions: loss coverage is higher under pooling than under risk-differentiated premiums, provided demand elasticity is sufficiently low.
2. For iso-elastic demand, demand elasticity of 1 is a critical value, which determines whether loss coverage is increased or reduced by pooling all risks at a common premium, as compared with loss coverage under risk-differentiated premiums. Demand elasticity of 1 gives loss coverage ratio (LCR) of 1. Lower values of demand elasticity give LCR above 1, and vice versa, that is:
$$\lambda < 1 \Rightarrow \text{LCR} > 1; \qquad \lambda = 1 \Rightarrow \text{LCR} = 1; \qquad \lambda > 1 \Rightarrow \text{LCR} < 1.$$
3. For the other demand functions we considered, the critical value of demand elasticity at the equilibrium premium, which determines whether loss coverage is increased or reduced by pooling, is above 1 (the other curves in Figure 5.8 cross the x-axis at x > 1). In other words, pooling gives higher loss coverage for a wider range of demand elasticities.
4. There is some evidence that insurance demand elasticities are less than 1 in many markets. Together with (2) and (3) above, this suggests a realistic possibility that in the real world, loss coverage can often be increased by restricting risk classification.
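Point (2) can be checked with a short calculation. The sketch below assumes iso-elastic demand of the form τ(π/μ)^(−λ) with the same λ for both risk-groups, for which the zero-profit pooled premium has a closed form. The population proportions, risks and fair-premium demand τ are illustrative assumptions, not values taken from the text.

```python
def pooled_premium(p, mu, lam, tau):
    """Zero-profit pooled premium under assumed iso-elastic demand.

    Solving sum_i p_i * tau_i * (pi / mu_i) ** -lam * (pi - mu_i) = 0 for pi
    gives a weighted average of the risks with weights p_i * tau_i * mu_i ** lam.
    """
    weights = [p_i * t_i * m_i ** lam for p_i, m_i, t_i in zip(p, mu, tau)]
    return sum(w * m for w, m in zip(weights, mu)) / sum(weights)


def loss_coverage_ratio(p, mu, lam, tau):
    """Loss coverage under pooling divided by loss coverage under full risk classification."""
    pi0 = pooled_premium(p, mu, lam, tau)
    pooled = sum(p_i * t_i * (pi0 / m_i) ** (-lam) * m_i
                 for p_i, m_i, t_i in zip(p, mu, tau))
    full = sum(p_i * t_i * m_i for p_i, m_i, t_i in zip(p, mu, tau))
    return pooled / full


# Hypothetical population: 90% low risks (0.01), 10% high risks (0.04).
p, mu, tau = (0.9, 0.1), (0.01, 0.04), (0.5, 0.5)
for lam in (0.5, 1.0, 2.0):
    print(lam, round(loss_coverage_ratio(p, mu, lam, tau), 3))
```

With these assumed inputs the ratio is above 1 for λ = 0.5, equal to 1 for λ = 1 and below 1 for λ = 2, in line with the pattern stated in point (2).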
Chapters 3–5 considered only two polar cases: fully risk-differentiated prices (where risk classification is unrestricted) and complete pooling (where risk classification is banned). Another possibility – and perhaps a more realistic one – is that insurers are allowed to differentiate prices to some extent, but not so much as to fully reflect differences between high and low risks. For example, suppose insurers are allowed to differentiate life insurance prices by age but not by gender, and obtain some but not all of any relevant medical history. Then even where insurers attempt to differentiate prices by risk, the highest-risk insured lives may be charged prices less than their true risk, and conversely for the lowest-risk. This chapter considers such schemes of ‘partial risk classification’, in contradistinction to ‘nil risk classification’ (complete pooling) and ‘full risk classification’ (fully risk-differentiated prices).Footnote 1
This chapter also introduces a novel metric for the degree to which different risks are pooled into broad risk-groups where all members of each risk-group are charged the same price. I call this metric the inclusivity of risk classification. I call the complement of inclusivity, that is one minus inclusivity, the separation of risk classification.
Partial Risk Classification
To represent partial risk classification in the two risk-group model as used in Chapters 4 and 5, think of starting from an initial position of fully risk-differentiated prices. Total expected profits are zero, and expected profits on both risk-groups individually are zero. Then suppose that a monopoly insurer (or competing insurers, under suitable regulation) slightly reduces the premium for high risks.Footnote 2 Demand from high risks will rise, and there is an expected loss on each high risk. To break even, the insurer then needs to increase the premium for low risks. Demand from low risks will fall, and there is an expected profit on each low risk. At a suitably increased low-risk premium, the expected profits on (a reduced number of) low risks will exactly offset the expected losses on (an increased number of) high risks. There are infinitely many possible solutions (π1, π2) of the profit equation like this, each corresponding to a different scheme of partial risk classification. Using other notation as in Chapters 4 and 5, the solutions are given by:
The change in high-risk and low-risk demand consequent upon the change in prices implies that loss coverage will also change. Loss coverage under partial risk classification is given by:
where (π1, π2) are a pair of prices which satisfy the zero-profit condition in Equation (6.1).
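The zero-profit condition and the loss coverage expression described in words above can be sketched as follows. The notation is an assumption made for illustration: p_i denotes the proportion of the population in risk-group i, d_i(π_i) the proportional demand for insurance from risk-group i at premium π_i, and μ_i the true risk. The exact symbols used in Chapters 4 and 5 may differ.

```latex
% Zero-profit condition: expected profit on low risks offsets expected loss on high risks
p_1 \, d_1(\pi_1)\,(\pi_1 - \mu_1) \;+\; p_2 \, d_2(\pi_2)\,(\pi_2 - \mu_2) \;=\; 0

% Loss coverage: expected losses compensated by insurance, for unit losses
\text{Loss coverage}(\pi_1, \pi_2) \;=\; p_1 \, d_1(\pi_1)\,\mu_1 \;+\; p_2 \, d_2(\pi_2)\,\mu_2
```

Under these assumptions, setting π1 = μ1 and π2 = μ2 satisfies the first equation term by term (full risk classification), while setting π1 = π2 gives the pooled equilibrium of Chapter 5.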
This leads to the question: under what risk classification scheme is loss coverage maximised: fully risk-differentiated prices, complete pooling or some intermediate scheme of partial risk classification?
The question can be answered by performing a constrained optimisation of the premiums (π1, π2), with loss coverage as the objective function and the zero-profit condition as a constraint. The results depend on the nature of insurance demand, as specified by the demand elasticities. Figure 6.1 illustrates results for the iso-elastic demand function used in Chapter 5. The graph shows which type of risk classification – full, partial or nil – gives the highest possible loss coverage in each region of the (λ1, λ2) parameter space.
Figure 6.1 Risk classification scheme giving highest loss coverage in different regions of the (λ1, λ2) parameter space
Figure 6.1 can be interpreted as follows.
(a) The convex curve from the origin extending into the middle of the lightly shaded region is the line along which LCR = 1, that is where nil and full risk classification give the same loss coverage. (Note that this line is the same line as shown in Figure 5.7.)
(b) In the unshaded left-hand region (i.e. λ1 ‘sufficiently low’ compared with λ2), nil risk classification is optimal; and conversely in the darkly shaded right-hand region, full risk classification is optimal. This accords with intuition: other things equal, lower demand elasticity in the low risk-group or higher demand elasticity in the high risk-group should make pooling more beneficial, and vice versa.
(c) Partial risk classification is optimal only in the lightly shaded middle region where both elasticities exceed 1, with λ2 somewhat higher (but not too much higher) than λ1. The nature of the partial risk classification scheme which maximises loss coverage varies progressively as we move horizontally across this middle region, from ‘almost nil’ at the left boundary to ‘almost full’ at the right boundary. Details of the boundary conditions are given in the footnote.Footnote 3
(d) At the graph coordinate (1,1) where λ1 = λ2 = 1, the three regions in the graph converge: all risk classification schemes give the same loss coverage. This is consistent with a well-known property of iso-elastic demand with unit demand elasticity: total revenues (in this context, premiums) are invariant to price. ‘Invariant to price’ is equivalent in this context to ‘invariant to the risk classification scheme’. Since expected premiums must also equal expected claims (loss coverage) in equilibrium, it follows that loss coverage is then invariant to the risk classification scheme.
(e) The lightly shaded middle region where partial risk classification gives the highest loss coverage is a region where both demand elasticities are higher than one, with λ2 higher (but not too much higher) than λ1.
Separation and Inclusivity of Risk Classification
In the discussion above, I noted that there were many feasible equilibrium partial risk classifications (π1, π2). The prices π1 and π2 are ‘close together’ in some risk classification schemes (close to nil risk classification, towards the left of the lightly shaded middle region in Figure 6.1), and ‘far apart’ in other risk classification schemes (close to full risk classification, towards the right of the lightly shaded middle region in Figure 6.1). Informally, the former type of scheme can be said to have a low ‘degree’ or ‘gradation’ of risk classification, and conversely for the latter type. It would be useful to have a metric to index the ‘degree’ or ‘gradation’ of a risk classification scheme. The rest of this chapter defines and explores a metric for this purpose, the separation of a risk classification scheme and its complement inclusivity (i.e. one minus separation).
Two Risk-Groups: Linear Separation
For the two risk-group model used so far in this book, I suggest a simple metric of linear separation. Suppose that (π1, π2) is a ‘partial risk classification’ solution of the zero-profit condition given in Equation (6.1), i.e. premiums are neither completely pooled nor fully risk-differentiated, but instead only partly reflect the difference in true risks μ1 and μ2. Then the linear separation of risk classification is:
$$\text{linear separation} = \frac{\pi_2 - \pi_1}{\mu_2 - \mu_1} \tag{6.3}$$
that is, the distance between the premiums divided by the distance between the true risks.
Interpretation of this metric is intuitive: linear separation is 0 for nil risk classification and 1 for full risk classification, with intermediate values for partial risk classification.
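Linear separation also gives a convenient way to index the zero-profit premium pairs when searching for the scheme with the highest loss coverage. The sketch below fixes a separation s, solves the zero-profit condition for π1 by bisection with π2 = π1 + s(μ2 − μ1), and evaluates loss coverage. The iso-elastic demand form, the parameter values and all function names are illustrative assumptions. The bisection returns one root of the profit function and does not address the multiple-equilibria cases.

```python
def demand(premium, risk, lam, tau=0.5):
    """Assumed iso-elastic demand: tau at the fair premium, elasticity lam."""
    return tau * (premium / risk) ** (-lam)


def expected_profit(pi, p, mu, lam):
    """Expected profit per member of the population for premiums pi = (pi1, pi2)."""
    return sum(p[i] * demand(pi[i], mu[i], lam[i]) * (pi[i] - mu[i]) for i in range(2))


def premiums_for_separation(s, p, mu, lam, iterations=200):
    """Zero-profit premiums (pi1, pi2) with linear separation s, found by bisection on pi1."""
    gap = s * (mu[1] - mu[0])
    # Profit is <= 0 at pi1 = mu1 and >= 0 at pi1 = mu2 - gap, so a root lies between.
    low, high = mu[0], mu[1] - gap
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if expected_profit((mid, mid + gap), p, mu, lam) < 0:
            low = mid
        else:
            high = mid
    pi1 = 0.5 * (low + high)
    return pi1, pi1 + gap


def loss_coverage(s, p, mu, lam):
    """Expected insured losses at the zero-profit premiums with separation s."""
    pi = premiums_for_separation(s, p, mu, lam)
    return sum(p[i] * demand(pi[i], mu[i], lam[i]) * mu[i] for i in range(2))


# Hypothetical population: 90% low risks (0.01), 10% high risks (0.04).
p, mu = (0.9, 0.1), (0.01, 0.04)
grid = [k / 20 for k in range(21)]
for lam in ((0.5, 0.5), (1.0, 1.0), (2.0, 2.0)):
    best = max(grid, key=lambda s: loss_coverage(s, p, mu, lam))
    print(lam, "separation with highest loss coverage on the grid:", best)
```

A grid search is used here only for transparency; a proper constrained optimiser could replace it. With both elasticities equal to 1, loss coverage is the same at every separation, so the grid maximum in that case is arbitrary.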
Why Is Separation Interesting?
There are four main ways in which I envisage that separation (or its complement inclusivity) may be a useful concept when discussing risk classification schemes.
(a) Intuitive description of different ‘degrees’ of risk classification. Separation characterises in a direct and intuitive way different versions of partial risk classification – close to fully risk-differentiated prices, close to complete pooling, and all points in between. Characterising these different versions by the resulting adverse selection or loss coverage seems less direct and intuitive.
(b) Quantification of differences between markets in risk classification. In car insurance in many countries, there is ‘a lot’ of risk classification (prices are close to fully risk-differentiated). In health insurance in many countries, there is only ‘a little’ risk classification (e.g. under Obamacare premiums can vary by a maximum of 3 times for age, and 1.5 times for smoking status). We need a concept to quantify the difference between the typical approaches to risk classification in car insurance and in health insurance. Separation is that concept.
(c) Quantification of political or ethical objective. I believe that many people see high inclusivity as politically or ethically desirable for health insurance (and to some extent for life insurance, and sometimes for other types of insurance). This preference often seems quite resistant to arguments about adverse selection or loss coverage; that is, the preference is to some degree independent of the aggregate insurance market outcomes it produces. Since different people seem to have different ‘strengths’ of this type of preference, we need a metric to quantify it.Footnote 4
(d) Object of regulation. Separation may be a useful metric for regulation of risk classification. For example, the Obamacare rules mentioned above (premiums can vary by a maximum of 3 times for age, and 1.5 times for smoking status) are controls on separation (albeit separation defined by maximum range, rather than the definition I have suggested).
More than Two Risk-Groups: Lorenz Separation
In the real world, risk classification schemes normally involve more than two risk-groups. Each risk-group is identified by the premium charged to all members of the risk-group. Individuals are allocated by the insurer to the risk-group ‘closest’ to their true risk. Expressed mathematically, risk classification is a function f(μi) which maps true risk (probability of loss) μi to premium πj according to the following ruleFootnote 5:
$$f(\mu_i) = \pi_j \quad \text{where} \quad |\mu_i - \pi_j| = \min_k |\mu_i - \pi_k| \tag{6.4}$$
subject to the ‘tie-breaker’ that if |μi – πk| is equal (and smallest) for two values of k, we choose the higher πk.
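The allocation rule and its tie-breaker can be written as a small function. This is a minimal sketch: the function name and the floating-point tolerance used to detect ties are assumptions introduced here.

```python
def assign_premium(true_risk, schedule, tol=1e-12):
    """Return the premium in `schedule` closest to `true_risk`.

    If two premiums are equally close (within `tol`, an assumed tolerance
    to absorb floating-point rounding), the higher premium is chosen.
    """
    best = min(abs(true_risk - premium) for premium in schedule)
    tied = [premium for premium in schedule
            if abs(true_risk - premium) - best <= tol]
    return max(tied)


# A risk of 0.05 is equidistant from premiums 0.03 and 0.07,
# so the tie-breaker assigns it to the higher premium.
print(assign_premium(0.05, [0.03, 0.07]))
print(assign_premium(0.03, [0.025, 0.06, 0.08]))
```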
To generalise the concept of separation for use with more than two risk-groups, we can use the concept of Lorenz curves. Lorenz curves are generally used to represent distributions of income and wealth over the members of a population, but they can also represent distributions of premiums and true risks. The basic idea of Lorenz separation is to compare the shapes of two Lorenz curves, one for premiums and one for true risks. The relevant population for whom the comparison is made are the people who buy insurance. I call this population the insurance clientele. Two Lorenz curves for premiums and true risks for an insurance clientele are shown in Figure 6.2.
Figure 6.2 Lorenz curves denoting distributions of true risks and premiums
In Figure 6.2, the solid curve labelled A represents the distribution of true risks over the clientele. As drawn, the solid curve shows that the lower-risk 80% of the clientele by number account for 40% of the total risk of the clientele: risk is relatively concentrated in the highest-risk individuals (e.g. four individuals with risk of 0.01 and one individual with risk of 0.06). If the solid curve approaches the 45° line, all individuals in the clientele have the same risk. If the solid curve slumps away towards a right-angled shape with corner at the lower right, a single high-risk individual represents all the clientele’s risk.
The dashed curve labelled B represents the distribution of premiums over the clientele. If the dashed curve coincides with the solid curve, we have full risk classification (fully risk-differentiated premiums). If the dashed curve coincides with the 45° line, we have nil risk classification (complete pooling). If the dashed curve follows an intermediate path, as shown in Figure 6.2, we have partial risk classification.
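The points on a Lorenz curve of this kind can be computed directly from a list of values. The sketch below uses the illustrative clientele mentioned above (four individuals with risk 0.01 and one with risk 0.06); the function name is an assumption.

```python
def lorenz_points(values):
    """Return Lorenz curve points (cumulative share of individuals, cumulative share of total).

    Individuals are ordered from lowest to highest value, as is usual for Lorenz curves.
    """
    ordered = sorted(values)
    total = sum(ordered)
    count = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        points.append((k / count, running / total))
    return points


true_risks = [0.01, 0.01, 0.01, 0.01, 0.06]
for share_of_people, share_of_risk in lorenz_points(true_risks):
    print(round(share_of_people, 2), round(share_of_risk, 2))
```

Applying the same function to the premiums paid by the clientele gives the corresponding curve for premiums.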
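We then define the Lorenz separation of risk classification as followsFootnote 6:
$$D = \frac{\text{area between the 45° line and the Lorenz curve for premiums (curve B)}}{\text{area between the 45° line and the Lorenz curve for true risks (curve A)}} \tag{6.5}$$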
As with linear separation, interpretation of this metric is intuitive: D = 0 for nil risk classification (the Lorenz curve for premiums coincides with the 45° line); D = 1 for full risk classification (the Lorenz curve for premiums coincides with that for true risks); and D takes intermediate values for partial risk classification. Risk classification schemes with values of D outside the [0,1] range are conceivable, but can generally be disregarded as perverse and unfair.Footnote 7
Lorenz curves are often described by Gini coefficients. The Gini coefficient of the distribution of premiums is the area between the 45° line and the dashed curve, divided by the entire area below the 45° line. Hence writing Gπ for the Gini coefficient of premiums and Gμ for the Gini coefficient of true risks, we can write Lorenz separation in Equation (6.5) as
$$D = \frac{G_\pi}{G_\mu} \tag{6.6}$$
For a finite number of observations, the Gini coefficient can be estimated as half the ‘relative mean absolute difference’, that is the mean absolute difference between all possible pairs of observations, divided by the mean of the observations.Footnote 8 For n observations xi with mean x̄, the relative mean absolute difference is:
$$\frac{\dfrac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} |x_i - x_j|}{\bar{x}} \tag{6.7}$$
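The estimate of the Gini coefficient as half the relative mean absolute difference can be sketched in a few lines. The function name and the example data are illustrative assumptions.

```python
def gini(values):
    """Estimate the Gini coefficient as half the relative mean absolute difference.

    The mean absolute difference is taken over all n * n ordered pairs,
    including each observation paired with itself.
    """
    n = len(values)
    mean = sum(values) / n
    mean_abs_diff = sum(abs(a - b) for a in values for b in values) / n ** 2
    return 0.5 * mean_abs_diff / mean


# Four individuals with risk 0.01 and one with risk 0.06.
print(round(gini([0.01, 0.01, 0.01, 0.01, 0.06]), 3))
# Identical risks give a Gini coefficient of zero.
print(gini([0.02, 0.02, 0.02]))
```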
We can use Equation (6.7) to obtain a formula to estimate the right-hand side of Equation (6.6). First note the following two preliminaries:
(i) In any equilibrium scheme of risk classification, the mean of the premiums (weighted by numbers of risks charged each premium) must be equal to the mean of the corresponding true risks.
(ii) Assume the clientele comprises n individuals with true risks (μ1, μ2, …, μn), which the risk classification scheme assigns into m < n distinct risk-groups (π1, π2, …, πm). So when calculating the relative mean absolute difference of the premiums, each absolute difference |πi − πj| needs to be repeated once for each of the ni individuals who are paying premium πi; and then for each of those individuals, the absolute difference with the premium πj needs to be repeated once for each of the nj individuals who are paying premium πj. So each |πi − πj| is repeated a total of ni × nj times, giving an overall total of n² absolute differences. This is the same as the number of absolute differences between the n true risks.Footnote 9
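Taking (i) and (ii) together, the n² and the mean x̄ in Equation (6.7) are the same whether we calculate Equation (6.7) for the premiums or for the corresponding true risks. So we can cancel them between the numerator and denominator of Equation (6.6), and hence write Equation (6.6) as:
$$D = \frac{\sum_{i=1}^{m} \sum_{j=1}^{m} n_i \, n_j \, |\pi_i - \pi_j|}{\sum_{i=1}^{n} \sum_{j=1}^{n} |\mu_i - \mu_j|} \tag{6.8}$$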
In words, Lorenz separation is the sum of the absolute difference between each premium and all other premiums (weighted by the numbers paying that premium), divided by the sum of the absolute difference between each true risk and all other true risks. Both numerator and denominator sum over all individuals in the clientele.
Note that Lorenz separation represents a generalisation of linear separation: Equation (6.8) with two risk-groups and two levels of true risk reduces to Equation (6.3).
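The weighted-sum form of Lorenz separation, and its reduction to linear separation with two risk-groups, can be checked numerically. In the sketch below the function name, the argument layout and the example clientele (three low risks and one high risk, with premiums chosen so that total premiums equal total risk) are illustrative assumptions.

```python
def lorenz_separation(group_premiums, group_sizes, risks):
    """Lorenz separation as a ratio of summed absolute differences.

    Numerator: |pi_i - pi_j| over all pairs of risk-groups, weighted by n_i * n_j.
    Denominator: |mu_i - mu_j| over all pairs of individuals in the clientele.
    """
    groups = list(zip(group_premiums, group_sizes))
    numerator = sum(n_i * n_j * abs(p_i - p_j)
                    for p_i, n_i in groups for p_j, n_j in groups)
    denominator = sum(abs(a - b) for a in risks for b in risks)
    return numerator / denominator


# Hypothetical clientele: three individuals with risk 0.01, one with risk 0.04.
risks = [0.01, 0.01, 0.01, 0.04]
premiums, sizes = [0.01375, 0.02875], [3, 1]

lorenz = lorenz_separation(premiums, sizes, risks)
linear = (premiums[1] - premiums[0]) / (0.04 - 0.01)
print(round(lorenz, 6), round(linear, 6))
```

Both measures coincide in this two-group case because the group-size weights n1 × n2 appear in the numerator and the denominator and cancel.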
The geometry of the Lorenz curves provides a visual representation of the cross-subsidies implicit in a risk classification scheme. For example, consider a clientele comprising one-third by number of each of low, medium and high risks, with uniform risk within each of these risk-groups. Suppose the low risks are overpriced, the medium risks are assigned an actuarially fair price and the high risks are underpriced. The Lorenz curves then look something like Figure 6.3 (note that the middle segments of the curves are parallel).
Figure 6.3 Lorenz curves when low, medium and high risks are overpriced, fairly priced and underpriced
Numerical Example
To illustrate the concept of Lorenz separation for comparing risk classification schemes, consider a population of five risks [0.01, 0.02, 0.03, 0.05, 0.09], and the following alternative schedules of premiums (i.e. risk classification schemes):
Alternative premium schedules:
(a) 5 risk-groups: [0.01, 0.02, 0.03, 0.05, 0.09]
(b) 3 risk-groups: [0.025, 0.06, 0.08]
(c) 2 risk-groups: [0.03, 0.07]
(d) 1 risk-group: [0.07]
Under each premium schedule, the insurer assigns individuals to the nearest premium to their true risk (i.e. in line with the rule given in Equation (6.4) in this chapter). Note that scheme (a) has fully risk-differentiated prices, and scheme (d) has complete pooling. As the risk classification scheme progresses from (a) to (d), we expect adverse selection: that is, the clientele will shift towards the higher risks, and reduce in number. Suppose that the clientele attracted by each premium schedule can be summarised as follows:
Clienteles attracted by the alternative premium schedules:
(a) 4 risks: [0.01, 0.02, 0.03, 0.05]
(b) 4 risks: [0.02, 0.03, 0.05, 0.09]
(c) 3 risks: [0.03, 0.05, 0.09]
(d) 2 risks: [0.05, 0.09]
Note that under scheme (a), fully risk-differentiated prices, the highest-risk individual does not buy insurance, because she perceives the price of 0.09 as unaffordable. As we progress from scheme (a) towards scheme (d), the highest-risk individual joins the clientele, and lower-risk individuals drop out, because they perceive the prices offered to them as poor value. Note that all the premium schedules (a) to (d) are equilibrium schedules, i.e. under each schedule, expected profits are zero for the clientele attracted by that schedule.
Table 6.1 shows values of Lorenz separation calculated using Equation (6.8) for each of the alternative risk classification schemes (a) to (d). Lorenz separation decreases as we move from fully risk-differentiated prices in scheme (a) towards complete pooling in scheme (d).
Table 6.1 also shows the loss coverage ratio for each of the alternative risk classification schemes (a) to (d). Loss coverage ratio is highest under risk classification scheme (b), which has an intermediate value of separation.
Table 6.1 Lorenz separations and loss coverage under four alternative risk classification schemes
| Risk classification scheme (premium schedule) | True risk-levels of clientele attracted by the scheme | Lorenz separation | Loss coverage ratio |
|---|---|---|---|
| (a) [0.01, 0.02, 0.03, 0.05, 0.09] | [0.01, 0.02, 0.03, 0.05] | 1 | 1 |
| (b) [0.025, 0.06, 0.08] | [0.02, 0.03, 0.05, 0.09] | 0.870 | 1.73 |
| (c) [0.03, 0.07] | [0.03, 0.05, 0.09] | 0.667 | 1.55 |
| (d) [0.07] | [0.05, 0.09] | 0 | 1.27 |
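The calculations behind Table 6.1 can be reproduced with a short script. This is a sketch under stated assumptions: the function names and the tie tolerance are introduced here, and the loss coverage ratio is taken to be the sum of the true risks in each clientele divided by the same sum under scheme (a), which assumes unit losses. Summing absolute differences over individuals' premiums is equivalent to weighting each pair of risk-groups by the numbers in those groups.

```python
def assign_premium(true_risk, schedule, tol=1e-12):
    """Nearest premium in the schedule; ties (within an assumed tolerance) go to the higher premium."""
    best = min(abs(true_risk - premium) for premium in schedule)
    return max(q for q in schedule if abs(true_risk - q) - best <= tol)


def lorenz_separation(premiums_paid, risks):
    """Ratio of summed absolute differences, taken over all pairs of individuals."""
    numerator = sum(abs(a - b) for a in premiums_paid for b in premiums_paid)
    denominator = sum(abs(a - b) for a in risks for b in risks)
    return numerator / denominator


# (premium schedule, clientele attracted), as listed in the numerical example.
schemes = {
    "a": ([0.01, 0.02, 0.03, 0.05, 0.09], [0.01, 0.02, 0.03, 0.05]),
    "b": ([0.025, 0.06, 0.08], [0.02, 0.03, 0.05, 0.09]),
    "c": ([0.03, 0.07], [0.03, 0.05, 0.09]),
    "d": ([0.07], [0.05, 0.09]),
}

baseline = sum(schemes["a"][1])  # loss coverage under fully risk-differentiated prices
results = {}
for label, (schedule, clientele) in schemes.items():
    paid = [assign_premium(risk, schedule) for risk in clientele]
    separation = lorenz_separation(paid, clientele)
    coverage_ratio = sum(clientele) / baseline
    profit = sum(paid) - sum(clientele)  # should be zero for an equilibrium schedule
    results[label] = (separation, coverage_ratio, profit)
    print(label, paid, round(separation, 3), round(coverage_ratio, 4), round(profit, 12))
```

The profit column provides a check that each schedule breaks even on the clientele it attracts.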
Figure 6.4 illustrates Lorenz separation for risk classification scheme (c) in Table 6.1, that is two risk-groups attracting a clientele of three risks. The solid curve represents the distribution of the three true risks in the clientele. The dashed curve represents the distribution of the two premiums to which the risk classification scheme assigns the three true risks. By inspection, the area between the 45° line and the dashed curve appears to be around two-thirds of the area between the 45° line and the solid curve. This corresponds well to the separation of 0.667 calculated using Equation (6.8) and shown in the third line of Table 6.1.
Figure 6.4 Lorenz curves for premiums and true risks under risk classification scheme (c) from Table 6.1
Separation versus Inclusivity
Separation and inclusivity are complements, and so any reference to one of them immediately implies the other. In this chapter I have usually referenced separation, because this is more mathematically convenient. In the discussion in later chapters, I usually refer to the complement, inclusivity, because this is more verbally convenient. (This usage of a complement for verbal convenience is analogous to usage of the term ‘power’ in statistical hypothesis testing: the power of a statistical test is one minus the probability of a Type II error.)
Summary
The main points of this chapter can be summarised as follows.
1. This chapter has discussed loss coverage under partial risk classification, where insurers set prices which only partly reflect differences between high and low risks. Typically this will be because of regulation.
2. Loss coverage may sometimes be maximised by neither complete pooling nor fully risk-differentiated prices, but by an intermediate degree of partial risk classification. To quantify the idea of different ‘degrees’ of partial risk classification, I defined the concepts of separation of risk classification, and its complement, inclusivity (one minus separation).
3. For two risk-groups, linear separation is defined as:
$$\text{linear separation} = \frac{\pi_2 - \pi_1}{\mu_2 - \mu_1}$$
4. For any number of risk-groups, Lorenz separation is defined as:
$$D = \frac{G_\pi}{G_\mu}$$
where Gπ and Gμ are the Gini coefficients of the Lorenz curves for premiums and true risks, respectively, for the insurance clientele.
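5. Lorenz separation can be estimated by the formula:
$$D = \frac{\sum_{i=1}^{m} \sum_{j=1}^{m} n_i \, n_j \, |\pi_i - \pi_j|}{\sum_{i=1}^{n} \sum_{j=1}^{n} |\mu_i - \mu_j|}$$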
where
– πi and πj are the premiums assigned to insureds in risk-groups i and j
– μi and μj are the corresponding true risks
– ni and nj are the numbers of individuals assigned to the i-th and j-th of the m risk-groups
– n is the number of individuals in the insurer’s clientele.
6. Inclusivity of risk classification is the complement of separation:
$$\text{inclusivity} = 1 - \text{separation}$$



