Bootstrap methods are computer-intensive methods of statistical analysis that use simulation to calculate standard errors, confidence intervals, and significance tests. The methods apply at any level of modelling, and so can be used for fully parametric, semiparametric, and completely nonparametric analysis. This 1997 book gives broad, up-to-date coverage of bootstrap methods, with numerous applied examples, developed in a coherent way with the necessary theoretical basis. Applications include stratified data; finite populations; censored and missing data; linear, nonlinear, and smooth regression models; classification; time series and spatial problems. Special features of the book include: extensive discussion of significance tests and confidence intervals; material on various diagnostic methods; and methods for efficient computation, including improved Monte Carlo simulation. Each chapter includes both practical and theoretical exercises. S-Plus programs for implementing the methods described in the text are available from the supporting website.
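To make the core idea concrete, here is a minimal sketch of a nonparametric bootstrap percentile confidence interval. The book's own programs are in S-Plus; this Python sketch, the simulated sample, and the choice of 2,000 resamples are illustrative assumptions, not the book's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sample: the observed data whose mean we want a CI for.
data = rng.normal(loc=10.0, scale=3.0, size=50)

# Nonparametric bootstrap: resample the data with replacement many times
# and recompute the statistic of interest on each resample.
n_boot = 2000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(n_boot)
])

# Bootstrap standard error and 95% percentile confidence interval.
se = boot_means.std(ddof=1)
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {data.mean():.2f}, bootstrap SE = {se:.2f}, "
      f"95% CI = ({lo:.2f}, {hi:.2f})")
```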
Genomics is having a major impact on therapeutics development in medicine. This book contains up-to-date information on the use of genomics in the design and analysis of therapeutic clinical trials with a focus on novel approaches that provide a reliable basis for identifying which patients are most likely to benefit from each treatment. It is oriented to both clinical investigators and statisticians. For clinical investigators, it includes background information on clinical trial design and statistical analysis. For statisticians and others who want to go deeper, it covers state-of-the-art adaptive designs and the development and validation of probabilistic classifiers. The author describes the development and validation of prognostic and predictive biomarkers and their integration into clinical trials that establish their clinical utility for informing treatment decisions for future patients.
Recent decades have brought advances in statistical theory for missing data, which, combined with advances in computing ability, have allowed implementation of a wide array of analyses. In fact, so many methods are available that it can be difficult to ascertain when to use which method. This book focuses on the prevention and treatment of missing data in longitudinal clinical trials. Based on his extensive experience with missing data, the author offers advice on choosing analysis methods and on ways to prevent missing data through appropriate trial design and conduct. He offers a practical guide to key principles and explains analytic methods for the non-statistician using limited statistical notation and jargon. The book's goal is to present a comprehensive strategy for preventing and treating missing data, and to make available the programs used to conduct the analyses of the example dataset.
One of the keys to understanding the potential impact of missing data is to understand the mechanism(s) that gave rise to the missingness. However, before considering missing data mechanisms, two important points are relevant. First, there is no single definition of a missing value. Even if restricting focus to dropout (withdrawal), several possibilities exist. For example, values may be missing as the result of a patient being lost to follow-up, with nothing known about treatment or measurements past the point of dropout. Alternatively, a patient may withdraw from the initially randomized study medication and be given an alternative (rescue) treatment, but with no further measurements taken. Or, follow-up measurements may continue after initiation of the rescue treatment. All these and other scenarios may happen within a single trial, with differing implications for appropriate handling of the data (Mallinckrodt and Kenward, 2009).
Second, the consequences of missing values are situation dependent. For example, in a clinical trial for diabetes, if a patient is lost to follow-up halfway through the trial, information needed to understand how well the drug worked for that patient is indeed missing. On the other hand, in a trial for a treatment to prevent breast cancer, if a patient dies from breast cancer midway through the trial, follow-up data are again incomplete; however, information about how well the treatment worked for that patient is not missing because it is known that the treatment did not work.
This chapter illustrates via retrospective analyses of a longitudinal clinical trial how the principles and recommendations outlined in previous chapters can be applied a priori, such as would be required in regulatory settings for confirmatory trials and for optimum decision making in early phase trials. The next section describes the setting, the data, and the originally reported results from the trial used in the present re-analysis. Section 14.3 specifies the objectives and Section 14.4 describes the analysis plan for the retrospective analyses as they could be prespecified, using ideas presented in Chapter 12. The results from the retrospective analyses are presented in Section 14.5, followed by a discussion on how principled inferences can be drawn from the results.
Having proper tools to conduct sensitivity analyses is essential if they are to be a routine part of clinical trial analysis and reporting. Software tools to implement the various analyses presented in this chapter are made available in Chapter 15.
This book focuses on the prevention and treatment of missing data in longitudinal clinical trials with repeated measures, such as are common in later phases of medical research and drug development. Recent decades have brought advances in statistical theory, which, combined with advances in computing ability, have allowed implementation of a wide array of analyses. In fact, so many methods are available that it can be difficult to ascertain when to use which method. A danger in such circumstances is to blindly use newer methods without proper understanding of their strengths and limitations, or to disregard all newer methods in favor of familiar approaches.
Moreover, the complex discussions on how to analyze incomplete data have overshadowed discussions on ways to prevent missing data, which would of course be the preferred solution. Therefore, preventing missing data through appropriate trial design and conduct is given significant attention in this book. Nevertheless, despite all efforts at prevention, missing data will remain an ever-present problem and analytic approaches will continue to be an important consideration.
Recent research has fostered an emerging consensus regarding the analysis of incomplete longitudinal data. Key principles and analytic methods are explained in terms non-statisticians can understand. Although the use of equations, symbols, and Greek letters to describe the analyses is largely avoided, sufficient technical detail is provided so readers can take away more than a peripheral understanding of the methods and issues. For those with in-depth statistical interests, reference to more technical literature is provided.
The evidence to support new medicines, devices, or other medical interventions is based primarily on randomized clinical trials. Many of these trials involve assessments taken at the start of treatment (baseline), followed by assessments taken repeatedly during and in some scenarios after the treatment period. In some cases, such as cancer trials, the primary post-baseline assessments are whether or not some important event occurred during the assessment intervals. These outcomes can be summarized by expressing the multiple post-baseline outcomes as a time to an event, or as a percentage of patients experiencing the event at or before some landmark time point. Alternatively, the multiple post-baseline assessments can all be used in a longitudinal, repeated measures analysis, which can either focus on a landmark time point or consider outcomes across time points.
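As a hedged illustration of these two summaries, the sketch below takes hypothetical long-format visit data (the column names, patients, and event times are invented for the example) and derives both a per-patient time to first event and the percentage of patients with an event at or before a Week 8 landmark.

```python
import pandas as pd

# Hypothetical long-format trial data: one row per patient visit,
# with a 0/1 flag for whether the event occurred at that visit.
visits = pd.DataFrame({
    "patient": [1, 1, 1, 2, 2, 3, 3, 3],
    "week":    [4, 8, 12, 4, 8, 4, 8, 12],
    "event":   [0, 1, 1, 0, 0, 0, 0, 1],
})

# Time-to-event summary: first week at which each patient had the event
# (NaN means no event was observed for that patient).
first_event = (
    visits[visits["event"] == 1]
    .groupby("patient")["week"].min()
    .reindex(visits["patient"].unique())
)
print(first_event)

# Landmark summary: percentage of patients with an event at or before Week 8.
landmark = 8
pct = first_event.le(landmark).mean() * 100
print(f"{pct:.0f}% of patients had the event by Week {landmark}")
```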
Regardless of the specific scenario, randomization facilitates fair comparisons between treatment and control groups by balancing known and unknown factors across the groups. The intent of randomization in particular, and the design of clinical trials in general, is that differences observed between the treatment and control groups are attributable to causal differences in the treatments and not to other factors.
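The paragraph above concerns why randomization matters rather than how it is implemented, but a minimal sketch may still help. The permuted-block scheme below is one common allocation method, chosen here purely as an illustrative assumption; the text does not prescribe a specific scheme.

```python
import random

def permuted_block_randomization(n_patients, block_size=4, seed=42):
    """Allocate patients to 'treatment'/'control' in shuffled blocks so the
    two arms stay balanced throughout enrollment."""
    assert block_size % 2 == 0, "block size must be even for 1:1 allocation"
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_patients:
        block = (["treatment"] * (block_size // 2)
                 + ["control"] * (block_size // 2))
        rng.shuffle(block)  # random order within each balanced block
        schedule.extend(block)
    return schedule[:n_patients]

print(permuted_block_randomization(10))
```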
Missing data is an ever-present problem in clinical trials and has been the subject of considerable debate and research. In fact, the U.S. Food and Drug Administration convened an expert panel to make recommendations for the prevention and treatment of missing data (NRC, 2010). The fundamental problem caused by missing data is that the balance provided by randomization is lost if, as is usually the case, the patients who discontinue the study differ with regard to the outcome of interest from those who complete the study.
Missing data is an ever-present problem in clinical trials; it can destroy the balance provided by randomization and thereby bias treatment group comparisons. Data simulation has provided a powerful platform for comparing how well analytic methods perform with incomplete data. In contrast, methods of preventing missing data cannot be evaluated via simulation, and actual clinical trials are not designed to assess factors that influence retention. Therefore, many confounding factors can mask or exaggerate differences in rates of missing data attributable to trial methods. Not surprisingly then, the literature contains more information on how to treat missing data than on how to prevent it.
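As a hedged sketch of simulation as an evaluation platform, the example below generates a two-arm trial in which dropout depends on an observed interim measurement (a missing-at-random rule; the effect sizes and dropout model are invented) and shows how a completers-only estimate can drift from the full-data truth.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500  # patients per arm

def simulate_arm(mean_change):
    # Endpoint change score (negative = improvement) and a correlated
    # interim measurement that is observed for everyone.
    endpoint = rng.normal(mean_change, 4.0, n)
    interim = endpoint + rng.normal(0.0, 2.0, n)
    # Assumed MAR dropout rule: worse interim response -> more dropout,
    # so the endpoint is missing for those patients.
    p_drop = 1 / (1 + np.exp(-(interim - 1.0)))
    completer = rng.random(n) > p_drop
    return endpoint, completer

drug_y, drug_c = simulate_arm(-2.0)
ctrl_y, ctrl_c = simulate_arm(0.0)

print(f"true effect (all data):   {drug_y.mean() - ctrl_y.mean():+.2f}")
print(f"completers-only estimate: "
      f"{drug_y[drug_c].mean() - ctrl_y[ctrl_c].mean():+.2f}")
```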
In order to understand the potential impact of missing data and to choose an appropriate analysis, the mechanism(s) leading to the missingness must be considered. In longitudinal clinical trials, missing completely at random (MCAR) is not likely to hold, missing at random (MAR) is often plausible but never testable, and going beyond MAR to missing not at random (MNAR) requires assumptions that are also not testable. Although some analyses are better than others in mitigating the problems caused by missing data, no analysis solves the problems. Even if bias is minimized, the loss of information can still be considerable.
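A minimal sketch of the three mechanisms (all numbers invented for illustration): the same complete data can be made incomplete in three ways, differing only in what the probability of missingness depends on.

```python
import numpy as np

rng = np.random.default_rng(2)
baseline = rng.normal(50, 10, 1000)            # observed for everyone
endpoint = baseline + rng.normal(-5, 8, 1000)  # the value that may go missing

sigmoid = lambda x: 1 / (1 + np.exp(-x))

# MCAR: missingness is unrelated to any data (flat 20% chance).
mcar = rng.random(1000) < 0.20
# MAR: missingness depends only on the *observed* baseline value.
mar = rng.random(1000) < sigmoid((baseline - 50) / 10)
# MNAR: missingness depends on the *unobserved* endpoint itself.
mnar = rng.random(1000) < sigmoid((endpoint - 45) / 10)

for name, miss in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(f"{name}: {miss.mean():.0%} missing, "
          f"observed endpoint mean = {endpoint[~miss].mean():.1f} "
          f"(full-data mean = {endpoint.mean():.1f})")
```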
Until recently, guidelines for the analysis of clinical trial data provided only limited advice on how to handle missing data, and analytic approaches tended to be simple and ad hoc. The calculations required to estimate parameters from a balanced data set with the same number of patients in each treatment group at each assessment time are far easier than the calculations required when the numbers are not balanced, as is the case when patients drop out. Hence, the motivation behind early methods of dealing with missing data may have been as much to restore balance and foster computational feasibility in an era of limited computing power as to counteract the potential bias from the missing values.
However, with advances in statistical theory and in computing power that facilitates implementation of the theory, more principled approaches can now be easily implemented. This chapter begins with sections describing the simpler methods, including complete case analyses and single imputation methods such as last and baseline observation carried forward. Subsequent sections cover more principled methods, including multiple imputation, inverse probability weighting, and modeling approaches such as direct likelihood.
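As a small, hedged sketch of the simplest of these, last observation carried forward (LOCF) amounts to a per-patient forward fill; the toy data frame and column names are assumptions for illustration only, shown to make the method concrete rather than to endorse it.

```python
import pandas as pd

# Toy long-format data: patient 2 drops out after Week 4 (NaN = missing).
df = pd.DataFrame({
    "patient": [1, 1, 1, 2, 2, 2],
    "week":    [4, 8, 12, 4, 8, 12],
    "score":   [22.0, 18.0, 15.0, 25.0, None, None],
})

# LOCF: within each patient, carry the last observed score forward.
df = df.sort_values(["patient", "week"])
df["score_locf"] = df.groupby("patient")["score"].ffill()
print(df)
```

The more principled methods named above (multiple imputation, inverse probability weighting, direct likelihood) instead model the incomplete data rather than filling in a single value per patient.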
An important evolution in the discussions on missing data has been the focus on clarity of objectives. In fact, the first recommendation in the recent National Research Council report on the prevention and treatment of missing data (NRC, 2010) was that the objectives be clearly specified.
The need for clarity in objectives is driven by ambiguities arising from the missing data. As noted in Chapter 2, data may be intermittently missing or missing due to dropout. Patients may or may not be given rescue medications. Assessments after withdrawal from the initially randomized study medication or after the addition of rescue medications may or may not be taken. Whether or not – and if so, how – these follow-up data should be used in analyses and inference is critically important.
Conceptually, an estimand is simply what is being estimated. Components of estimands for longitudinal trials may include the parameter (e.g., difference between treatments in mean change), time point or duration of exposure (e.g., at Week 8), outcome measure (e.g., diastolic blood pressure), population (e.g., in patients diagnosed with hypertension), and inclusion/exclusion of follow-up data after discontinuation of the originally assigned study medication and/or initiation of rescue medication.
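One way to make these components concrete is to write the estimand down as a structured specification. The sketch below is an illustrative data structure; the field names are assumptions, and the example values simply echo the parenthetical examples in the text rather than a standard from the book.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Estimand:
    """A structured statement of what a trial analysis estimates."""
    parameter: str        # e.g., difference between treatments in mean change
    timepoint: str        # time point or duration of exposure
    outcome: str          # the outcome measure
    population: str       # the target population
    followup_data: str    # handling of data after discontinuation/rescue

primary = Estimand(
    parameter="difference between treatments in mean change from baseline",
    timepoint="Week 8",
    outcome="diastolic blood pressure",
    population="patients diagnosed with hypertension",
    followup_data="exclude data after initiation of rescue medication",
)
print(primary)
```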
The previous chapter on design options to lower rates of missing data noted that evidence to support various approaches was limited because studies are not done specifically to assess how trial design influences retention. Therefore, most assessments are on a between-study basis, and thus many confounding factors can mask or exaggerate differences due to trial methods. These same factors limit understanding of how trial conduct influences retention. Again, mindful of these limitations, the recent NRC guidance (NRC, 2010) provides a number of suggestions on ways to minimize missing data.
Trial Conduct Options to Reduce Missing Data
Actions for Design and Management Teams
Trial design should limit participants’ burden and inconvenience in data collection. However, once again a trade-off needs to be considered; collecting less data means getting less information. Therefore, study teams need to strike a balance between the conflicting goals of getting the most information possible from a trial and reducing patient burden in order to increase retention (NRC, 2010).
Many of the principles regarding analysis of incomplete data previously discussed for continuous outcomes also apply to categorical outcomes. For example, the missing data mechanisms (Chapter 2) apply to categorical data in essentially the same manner as for continuous data. In addition, considerations regarding modeling time and correlation are also essentially the same as previously outlined for continuous outcomes (Chapter 7). As with continuous data, likelihood-based methods are appealing because of their flexible ignorability properties (Chapter 8). However, their use for categorical outcomes can be problematic because of increased computational requirements as compared with continuous data. Therefore, generalized estimating equations (GEE) are a useful alternative.
Despite the similarities between continuous and categorical analyses of incomplete data, some aspects are unique to categorical outcomes, and that is the focus of this chapter. The next section begins with a discussion on marginal and conditional inference because this sets the stage for subsequent sections that discuss the similarities and differences between analyses of continuous and categorical data.
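To make the GEE alternative concrete, here is a minimal, hedged sketch of a marginal logistic model for a repeated binary outcome using the Python statsmodels package. The simulated data, variable names, and the exchangeable working correlation are assumptions for illustration; the text itself does not prescribe software.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulate a toy repeated-measures binary outcome: 100 patients,
# 3 post-baseline visits, treatment raising the odds of response.
rng = np.random.default_rng(3)
n, visits = 100, 3
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n), visits),
    "week": np.tile([4, 8, 12], n),
    "trt": np.repeat(rng.integers(0, 2, n), visits),
})
logit = -1.0 + 0.7 * df["trt"] + 0.05 * df["week"]
df["response"] = (rng.random(len(df)) < 1 / (1 + np.exp(-logit))).astype(int)

# Marginal (population-averaged) logistic model fit by GEE with an
# exchangeable working correlation within subject.
model = smf.gee("response ~ trt + week", groups="subject", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```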