Many of the risks being measured develop over time, so it is important that the ways in which they develop are correctly modelled. This requires a good understanding of time series analysis.
Deterministic modelling
There are two broad types of model: deterministic and stochastic. At its most basic, deterministic modelling involves agreeing a single assumption for each variable to be projected. That single assumption might even be derived purely from the data history, for example the average of the monthly observations over the previous twenty years.
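As an illustration (not taken from the text), a sketch of this most basic form of deterministic modelling is given below, using a hypothetical history of monthly observations; the single assumption is simply the historical average, projected forward unchanged.

```python
import numpy as np

# Hypothetical history: 240 monthly observations (twenty years) of a risk variable
rng = np.random.default_rng(0)
history = rng.normal(loc=0.004, scale=0.02, size=240)

# Deterministic modelling: agree a single assumption per variable,
# here the historical average, and project it forward unchanged
assumption = history.mean()
projection = np.full(12, assumption)        # next twelve months, one fixed value

print(f"single assumption: {assumption:.4f}")
print("twelve-month projection:", np.round(projection, 4))
```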
With deterministic approaches, prudence can be added only through margins in the assumptions used, or through changing the assumptions. A first stage might be to consider changing each underlying assumption in turn and noting the effect. This is known as sensitivity analysis. It is helpful in that it gives an idea of the sensitivity of a set of results to changes in each underlying factor, thus allowing significant exposures to particular risks to be recognised. However, variables rarely change individually in the real world. An approach that considers changes in all assumptions is therefore needed.
This leads us to scenario analysis. This is an extension of the deterministic approach where a small number of scenarios are evaluated using different pre-specified assumptions. The scenarios used might be based on previous situations, but it is important that they are not restricted to past experience – a range of possible futures is considered.
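To make the distinction between sensitivity analysis and scenario analysis concrete, a minimal sketch follows; the projection function and the assumptions it uses are invented purely for illustration.

```python
# Toy projection of a liability value driven by three assumptions (all invented)
def project(assumptions):
    return (assumptions["inflation"] * 1000
            - assumptions["interest"] * 800
            + assumptions["lapse"] * 300)

base = {"inflation": 0.02, "interest": 0.03, "lapse": 0.05}
base_value = project(base)

# Sensitivity analysis: change each assumption in turn and note the effect
for name in base:
    shocked = dict(base, **{name: base[name] * 1.1})      # 10% relative shock
    print(f"{name:9s} +10% -> change of {project(shocked) - base_value:+.2f}")

# Scenario analysis: a small number of pre-specified combined changes
scenarios = {
    "stagflation": {"inflation": 0.060, "interest": 0.010, "lapse": 0.08},
    "benign":      {"inflation": 0.015, "interest": 0.040, "lapse": 0.04},
}
for label, s in scenarios.items():
    print(f"{label:12s} -> change of {project(s) - base_value:+.2f}")
```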
The nature of an organisation provides the basis on which other aspects of the risk management context can be built. One of the more important aspects is the nature of the relationships that various stakeholders have with an institution. There are a number of ways in which these relationships can be described, but a good starting point is to classify them into one of several broad types, these types being:
principal;
agency;
controlling;
advisory; and
incidental.
In this chapter, these relationships are considered in more detail, to make it easier to understand where risks can occur.
Principals
All financial institutions need and use capital (as do all non-financial institutions), and the principal relationships describe those parties who either contribute capital to or receive capital from the institution. Providers can be categorised broadly into those who expect a fixed, or at least predetermined, return on their capital (providers of debt capital, debtholders) and those who expect whatever is left (providers of equity capital, shareholders). The former will generally be creditors of the institution. This means that they have lent money to the institution, and are reliant on the institution being able to repay the debt. Shareholders, on the other hand, are not owed money by the institution; rather, they can be regarded as part owners of the institution. On the other side, institutions have relationships with their customers.
Measurements are central to clinical practice and medical and health research. They form the basis of diagnosis, prognosis and evaluation of the results of medical interventions. Advances in diagnosis and care that were made possible, for example, by the widespread use of the Apgar scale and various imaging techniques, show the power of well-designed, appropriate measures. The key words here are ‘well-designed’ and ‘appropriate’. A decision-maker must know that the measure used is adequate for its purpose, how it compares with similar measures and how to interpret the results it produces.
For every patient or population group, there are numerous instruments that can be used to measure clinical condition or health status, and new ones are still being developed. However, among this abundance of available instruments, many have been poorly or insufficiently validated. This book primarily serves as a guide to evaluating the properties of existing measurement instruments in medicine, enabling researchers and clinicians to avoid using poorly validated ones or alerting them to the need for further validation.
Validity is defined by the COSMIN panel as ‘the degree to which an instrument truly measures the construct(s) it purports to measure’ (Mokkink et al., 2010a). This definition seems to be quite simple, but there has been much discussion in the past about how validity should be assessed and how its results should be interpreted. Psychologists, in particular, have struggled with this problem, because, as we saw in Chapters 2 and 3, they often have to deal with ‘unobservable’ constructs. This makes it difficult for them to judge whether they are measuring the right thing. In general, three different types of validity can be distinguished: content validity, criterion validity and construct validity. Content validity focuses on whether the content of the instrument corresponds with the construct that one intends to measure, with regard to relevance and comprehensiveness. Criterion validity, applicable in situations in which there is a gold standard for the construct to be measured, refers to how well the scores of the measurement instrument agree with the scores on the gold standard. Construct validity, applicable in situations in which there is no gold standard, refers to whether the instrument provides the expected scores, based on existing knowledge about the construct. Within these three main types of validity, there are numerous subtypes, as we will see later in this chapter.
Systematic reviews are conducted for many different types of studies, such as randomized clinical trials (RCTs), observational studies and diagnostic studies. Researchers, doctors and policy-makers use the results and conclusions of systematic reviews for research purposes, development of guidelines, and evidence-based patient care and policy-making. Such reviews save them a considerable amount of time in searching for literature, and in reading and interpreting the relevant articles. For the same purposes, more and more systematic reviews of studies focusing on the measurement properties of measurement instruments are being published. The aim of such reviews is to find all the existing evidence on the properties of one or more measurement instruments, to evaluate the strength of this evidence, and to come to a conclusion about the best instrument available for a particular purpose. They may also result in a recommendation for additional research.
Measuring is the cornerstone of medical research and clinical practice. Therefore, the quality of measurement instruments is crucial. This book offers tools to inform the choice of the best measurement instrument for a specific purpose, methods and criteria to support the development of new instruments, and ways to improve measurements and interpretation of their results.
With this book, we hope to show the reader, among other things:
why it is usually a bad idea to develop a new measurement instrument;
that objective measures are not better than subjective measures;
that Cronbach's alpha has nothing to do with validity;
why valid instruments do not exist; and
how to improve the reliability of measurements.
The book is applicable to all medical and health fields and not directed at a specific clinical discipline. We will not provide the reader with lists of the best measurement instruments for paediatrics, cancer, dementia and so on, but rather with methods for evaluating measurement instruments and criteria for choosing the best ones. So, the focus is on the evaluation of instrument measurement properties, and on the interpretation of their scores.
Field-testing of the measurement instrument is still part of the development phase. When a measurement instrument is considered to be satisfactory after one or more rounds of pilot-testing, it has to be applied to a large sample of the target population. The aims of this field-testing are item reduction and obtaining insight into the structure of the data, i.e. examining the dimensionality and then deciding on the definitive selection of items per dimension. These issues are only relevant for multi-item instruments that are used to measure unobservable constructs. Therefore, the focus of this chapter is purely on these measurement instruments. Other newly developed measurement instruments (e.g. single-item patient-reported outcomes (PROs)) and instruments to measure observable constructs go straight from the phase of pilot-testing to the assessment of validity, responsiveness and reliability (see Figure 3.1).
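As a hedged sketch of what such a field-test analysis might look like (invented data, not an example from the book), the code below inspects dimensionality through the eigenvalues of the inter-item correlation matrix and flags items with low corrected item-total correlations as candidates for removal.

```python
import numpy as np

# Hypothetical field-test data: 500 respondents answering 10 Likert items (1-5)
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 1))
items = np.clip(np.rint(3 + latent + rng.normal(scale=1.0, size=(500, 10))), 1, 5)

# Dimensionality: eigenvalues of the inter-item correlation matrix
# (several eigenvalues well above 1 would suggest more than one dimension to inspect)
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
print("eigenvalues:", np.round(eigenvalues, 2))

# Item reduction: corrected item-total correlations; low values flag candidates for removal
total = items.sum(axis=1)
for j in range(items.shape[1]):
    rest = total - items[:, j]
    r = np.corrcoef(items[:, j], rest)[0, 1]
    print(f"item {j + 1:2d}: corrected item-total r = {r:.2f}")
```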
This chapter forms the backbone of the book. It deals with choices and decisions about what we measure and how we measure it. In other words, this chapter deals with the conceptual model behind the content of the measurements (what), and the methods of measurements and theories on which these are based (how). As described in Chapter 1, the scope of measurement in medicine is broad and covers many and quite different concepts. It is essential to define explicitly what we want to measure, as that is the ‘beginning of wisdom’.
In this chapter, we will introduce many new terms. An overview of these terms and their explanations is provided in Table 2.1.
Different concepts and constructs require different methods of measurement. This concerns not only the type of measurement instrument, for example an X-ray, performance test or questionnaire, but also the measurement theory underlying the measurements. Many of you may have heard of classical test theory (CTT), and some may also be familiar with item response theory (IRT). Both are measurement theories. We will explain the essentials of different measurement theories and discuss the assumptions to be made.
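As a brief illustration of the contrast (a toy, not an example from the book), CTT models an observed score as a true score plus error, whereas IRT models the probability of an item response as a function of a latent trait; the sketch below simply evaluates a two-parameter logistic (2PL) item response function for a few ability values.

```python
import math

def irt_2pl(theta, a, b):
    """Two-parameter logistic (2PL) IRT model: probability of a positive item
    response given ability theta, discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# CTT view: observed score = true score + error (X = T + E)
# IRT view:  P(item response = 1 | theta), illustrated for one item
for theta in (-2, -1, 0, 1, 2):
    p = irt_2pl(theta, a=1.5, b=0.0)   # moderate discrimination, average difficulty
    print(f"theta = {theta:+d}: P(response = 1) = {p:.2f}")
```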
An essential requirement of all measurements in clinical practice and research is that they are reliable. Reliability is defined as ‘the degree to which the measurement is free from measurement error’ (Mokkink et al., 2010a). Its importance often remains unrecognized until repeated measurements are performed. To give a few examples of reliability issues: radiologists want to know whether their colleagues interpret X-rays or specific scans in the same way as they do, or whether they themselves would give the same rating if they had to assess the same X-ray twice. These are called the inter-rater and the intra-rater reliability, respectively. Repeated measurements of fasting blood glucose levels in patients with diabetes may differ due to day-to-day variation or to the instruments used to determine the blood glucose level. These sources of variation play a role in test–retest reliability. In a pilot study, we are interested in the extent of agreement between two physiotherapists who assess the range of movement in a shoulder, so that we can decide whether or not their ratings can be used interchangeably in the main study. The findings of such performance tests may differ for several reasons. For example, patients may perform the second test differently because of their experience with the first test, the physiotherapists may score the same performance differently or the instructions given by one physiotherapist may motivate the patients more than the instructions given by the other physiotherapist.
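A minimal numerical sketch of inter-rater agreement for categorical ratings is given below; the ratings are invented, and Cohen's kappa is used here simply because it is a common agreement statistic for this kind of data.

```python
import numpy as np

# Hypothetical example: two radiologists rate the same 12 X-rays (0 = normal, 1 = abnormal)
rater_1 = np.array([1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0])
rater_2 = np.array([1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0])

# Cohen's kappa: observed agreement corrected for agreement expected by chance
categories = np.unique(np.concatenate([rater_1, rater_2]))
p_observed = np.mean(rater_1 == rater_2)
p_chance = sum(np.mean(rater_1 == c) * np.mean(rater_2 == c) for c in categories)
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"observed agreement = {p_observed:.2f}, kappa = {kappa:.2f}")
```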
Technical developments and advances in medical knowledge mean that new measurement instruments are still appearing in all fields of medicine. Think about recent developments such as functional MRI and DNA microarrays. Furthermore, existing instruments are continuously being refined and existing technologies are being applied beyond their original domains. The current attention to patient-oriented medicine has shifted interest from pathophysiological measurements to impact on functioning, perceived health and quality of life (QOL). Patient-reported outcomes (PROs) have therefore gained importance in medical research.
It is clear that the measurement instruments used in various medical disciplines differ greatly from each other. Therefore, it is evident that details of the development of measurement instruments must be specific to each discipline. However, from a methodological viewpoint, the basic steps in the development of all these measurement instruments are the same. Moreover, basic requirements with regard to measurement properties, which have to be considered in evaluating the adequacy of a new instrument, are similar for all measurement instruments. Chapters 3 and 4 are written from the viewpoint of developers of measurement instruments. When describing the different steps we have the development of PROs in mind. However, at various points in this chapter we will give examples to show analogies with other measurement instruments in medicine.
The ultimate goal of medicine is to cure patients. Therefore, assessing whether the disease status of patients has changed over time is often the most important objective of measurements in clinical practice and clinical and health research. In Section 3.2.3, we stated that we need measurement instruments with an evaluative purpose or application to detect changes in health status over time. These instruments should be responsive. Responsiveness is defined by the COSMIN panel as ‘the ability of an instrument to detect change over time in the construct to be measured’ (Mokkink et al., 2010a). In essence, assessing responsiveness tests the hypothesis that if patients change on the construct of interest, their scores on the measurement instrument assessing this construct change accordingly. The approach to assessing responsiveness is quite similar to that for validity, as we will show in this chapter. In Section 7.2, we will start by elaborating a bit more on the concept of responsiveness. We will discuss the relationship between responsiveness and validity, taking responsiveness as an aspect of validity in a longitudinal context. We will also elaborate on the definition of responsiveness and the impact of this definition on the assessment of responsiveness.
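To make the hypothesis-testing idea concrete, here is a hedged sketch with invented data, in which instrument change scores are compared against an external anchor (a patient-reported global rating of improvement); neither the data nor the anchor comes from the book.

```python
import numpy as np

# Hypothetical data: instrument scores at baseline and follow-up for eight patients,
# plus an external anchor recording whether each patient reports improvement (1) or not (0)
baseline  = np.array([52, 47, 60, 55, 49, 58, 44, 51])
follow_up = np.array([40, 45, 46, 38, 48, 41, 43, 36])
improved  = np.array([ 1,  0,  1,  1,  0,  1,  0,  1])

change = baseline - follow_up          # positive change = improvement on the instrument

# Responsiveness hypothesis: patients who report improvement on the anchor show
# larger change scores on the instrument than those who do not
mean_improved = change[improved == 1].mean()
mean_stable = change[improved == 0].mean()
print(f"mean change, improved: {mean_improved:.1f}; stable: {mean_stable:.1f}")

# Correlation between the change score and the anchor
r = np.corrcoef(change, improved)[0, 1]
print(f"correlation of change score with anchor: r = {r:.2f}")
```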
After addressing the development of measurement instruments in Chapters 3 and 4 and evaluating measurement properties (i.e. reliability, validity and responsiveness) in Chapters 5–7, it is time to pay attention to the interpretability of the scores when applying the measurement instruments. For well-known instruments, such as blood pressure measurements and the Apgar score, the interpretability will cause no problems, but for new or lesser known instruments this may be challenging. This particularly applies to the scores for multi-item measurement instruments, the meaning of which is not immediately clear. For example, in a randomized trial on back pain carried out in the United Kingdom, the effectiveness of exercise therapy and manipulation was compared with usual care in 1334 patients with low back pain. The researchers used the Roland–Morris Disability Questionnaire (RDQ) to assess functional disability (UK BEAM trial team, 2004). The RDQ has a 0–24-point scale, with a score of 0 indicating no disability, and 24 indicating very severe disability. The mean baseline score for the patients with low back pain was 9.0. In the group who received usual care, the mean RDQ value decreased to 6.8 after 3 months, resulting in an average improvement of 2.2 points. This gives rise to the following questions: What does a mean value of 9.0 points on the 0–24 RDQ scale mean? In addition, is an improvement of 2.2 points meaningful for the patients? The primary focus of this chapter is on the interpretability of scores and change scores on a measurement instrument. In other words, the aim is to learn more about the measurement instrument, and not about the disease under study.
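The arithmetic behind these questions is simple; the harder part is choosing a benchmark against which to judge the change. The sketch below uses the RDQ figures quoted above, but the minimal important change (MIC) threshold in it is purely hypothetical and only illustrates how such a benchmark would be applied.

```python
# RDQ figures from the trial described above; the MIC threshold is hypothetical
scale_min, scale_max = 0, 24
baseline_mean = 9.0
follow_up_mean = 6.8

mean_change = baseline_mean - follow_up_mean              # 2.2 points of improvement
share_of_range = mean_change / (scale_max - scale_min)    # change relative to the scale range

hypothetical_mic = 2.5                                    # assumed threshold, not from the text
print(f"mean change: {mean_change:.1f} points ({share_of_range:.0%} of the 0-24 range)")
print("exceeds hypothetical MIC" if mean_change >= hypothetical_mic else "below hypothetical MIC")
```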
This book explains how computer software is designed to perform the tasks required for sophisticated statistical analysis. For statisticians, it examines the nitty-gritty computational problems behind statistical methods. For mathematicians and computer scientists, it looks at the application of mathematical tools to statistical problems. The first half of the book offers a basic background in numerical analysis that emphasizes issues important to statisticians. The next several chapters cover a broad array of statistical tools, such as maximum likelihood and nonlinear regression. The author also treats the application of numerical tools; numerical integration and random number generation are explained in a unified manner reflecting complementary views of Monte Carlo methods. Each chapter contains exercises that range from simple questions to research problems. Most of the examples are accompanied by demonstration and source code available from the author's website. New in this second edition are demonstrations coded in R, as well as new sections on linear programming and the Nelder–Mead search algorithm.
The juxtaposition of these two topics may appear strange to many readers. Upon further reflection, the common thread of spreading points in space may become apparent. My point in combining these topics is to emphasize that this thread is not weak. Monte Carlo should be viewed as just another way to compute an integral; numerical integration should be viewed as just another way to sample points in space. Great gains can be made by exploiting the strengths of one approach when the other is floundering. Only with the willingness to adjust one's viewpoint and use these tools in combination can the full array of techniques be brought to bear on a difficult problem.
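As a small illustration of that shared thread (a toy example, not from the book), the sketch below estimates the same one-dimensional integral both by Monte Carlo sampling and by a fixed evenly spaced grid; both are just ways of spreading points in [0, 1].

```python
import numpy as np

def f(x):
    return np.exp(-x ** 2)                 # integrand on [0, 1]; true value is about 0.746824

# Monte Carlo: average of f at uniformly sampled points
rng = np.random.default_rng(2)
u = rng.uniform(0.0, 1.0, size=100_000)
mc_estimate = f(u).mean()

# Fixed quadrature: midpoint rule on an evenly spaced grid
n = 1_000
x = (np.arange(n) + 0.5) / n
quadrature_estimate = f(x).mean()

print(f"Monte Carlo estimate : {mc_estimate:.6f}")
print(f"Quadrature estimate  : {quadrature_estimate:.6f}")
```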
Methods such as Riemann sums and Simpson's rule make up the set of tools known as fixed quadrature, or simply quadrature. Viewing these methods as a discretization of the continuous problem of integration is indeed naive. The points are spread in a fixed way in space, with the number of points set in advance. Most of these methods employ a weighting scheme, so that the points (abscissas) where a function is to be evaluated have varying importance. For estimating an integral by evaluating a function at N points in one dimension, the error converges to zero at a rate of O(N⁻²) or better, depending on the smoothness of the function. In higher dimensions, however, this rate slows considerably.
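The sketch below (not from the book) illustrates the convergence claim for a smooth integrand: composite Simpson's rule applied to sin(x) on [0, 1], where doubling N shrinks the error by roughly a factor of 2⁴, i.e. faster than the O(N⁻²) benchmark above.

```python
import numpy as np

def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h / 3 * (y[0] + y[-1] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum())

exact = 1 - np.cos(1.0)                      # integral of sin(x) on [0, 1]

for n in (8, 16, 32):
    error = abs(simpson(np.sin, 0.0, 1.0, n) - exact)
    print(f"N = {n:3d}: error = {error:.2e}")
```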
Maximum likelihood is generally regarded as the best all-purpose approach for statistical analysis. Outside of the most common statistical procedures, when the “optimal” or “usual” method is unknown, most statisticians follow the principle of maximum likelihood for parameter estimation and statistical hypothesis tests. Bayesian statistical methods also rely heavily on maximum likelihood. The main reason for this reliance is that following the principle of maximum likelihood usually leads to very reasonable and effective estimators and tests. From a theoretical viewpoint, under very mild conditions, maximum likelihood estimators (MLEs) are consistent, asymptotically unbiased, and efficient. Moreover, MLEs are invariant under reparameterizations or transformations: the MLE of a function of the parameter is the function of the MLE. From a practical viewpoint, the estimates and test statistics can be constructed without a great deal of analysis, and large-sample standard errors can be computed. Overall, experience has shown that maximum likelihood works well most of the time.
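A toy example (not from the book) of these properties in action: for Poisson counts the MLE of the rate is the sample mean, and, by invariance, the MLE of P(X = 0) = exp(-λ) is obtained simply by plugging in the MLE of λ; the crude grid search merely confirms the closed form numerically.

```python
import numpy as np

# Hypothetical Poisson counts
data = np.array([2, 3, 0, 1, 4, 2, 2, 1, 3, 2])

def log_likelihood(lam):
    # Poisson log-likelihood up to an additive constant (the log(x!) terms)
    return np.sum(data * np.log(lam) - lam)

lam_hat = data.mean()                                   # closed-form MLE of the rate

# Crude numerical check: maximize the log-likelihood over a grid
grid = np.linspace(0.1, 6.0, 10_000)
lam_numeric = grid[np.argmax([log_likelihood(l) for l in grid])]

# Invariance: the MLE of P(X = 0) = exp(-lambda) is exp(-lambda_hat)
print(f"analytic MLE {lam_hat:.3f}, numeric MLE {lam_numeric:.3f}, "
      f"MLE of P(X = 0) = {np.exp(-lam_hat):.3f}")
```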
The biggest computational challenge comes from the naive expectation that any statistical problem can be solved if the maximum of some function is found. Instead of relying solely on the unconstrained optimization methods presented in Chapter 8 to meet this unrealistic expectation, the nature of the likelihood function can be exploited in ways that are more effective for computing MLEs. Since the exploitable properties of likelihood functions follow from the large-sample theory, this chapter will begin with a summary of the consistency and asymptotic normality properties of MLEs.
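One simple way such structure can be exploited is a scoring-type iteration that uses the expected (Fisher) information in place of a numerically differenced Hessian. The sketch below is a toy, not the book's algorithm: for the exponential-rate model the expected and observed information coincide, so the scheme is also plain Newton's method, and the closed-form MLE 1/mean(x) provides a check.

```python
import numpy as np

# Hypothetical exponential waiting-time data with unknown rate lambda
rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=200)          # true rate = 0.5
n, s = len(x), x.sum()

# Scoring iteration for the rate:
#   score(lambda)               = n / lambda - sum(x)
#   expected Fisher information = n / lambda**2
lam = 0.1                                         # crude starting value
for _ in range(10):
    score = n / lam - s
    info = n / lam ** 2
    lam += score / info

print(f"scoring estimate         : {lam:.5f}")
print(f"closed-form MLE 1/mean(x): {1 / x.mean():.5f}")
```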