Annually, more than 115 million animals are used worldwide in experimentation or to supply the biomedical industry.Footnote 1 Nonhuman animal (hereafter “animal”) experimentation falls under two categories: basic (i.e., investigation of basic biology and human disease) and applied (i.e., drug research and development and toxicity and safety testing). Regardless of its categorization, animal experimentation is intended to inform human biology and health sciences and to promote the safety and efficacy of potential treatments. Despite its use of immense resources, the animal suffering involved, and its impact on human health, the question of animal experimentation’s efficacy has been subjected to little systematic scrutiny.Footnote 2
Although it is widely accepted that medicine should be evidence based, animal experimentation as a means of informing human health has generally not been held, in practice, to this standard. This fact makes it surprising that animal experimentation is typically viewed as the default and gold standard of preclinical testing and is generally supported without critical examination of its validity. A survey published in 2008 of anecdotal cases and statements given in support of animal experimentation demonstrates how it has not been, and could not be, validated as a necessary step in biomedical research, and the survey casts doubt on its predictive value.Footnote 3 I show that animal experimentation is poorly predictive of human outcomes,Footnote 4 that it is unreliable across a wide category of disease areas,Footnote 5 and that existing literature demonstrates the unreliability of animal experimentation, thereby undermining scientific arguments in its favor. I further show that the collective harms that result from an unreliable practice tip the ethical scale of harms and benefits against continuation in much, if not all, of experimentation involving animals.Footnote 6
Problems of Successful Translation to Humans of Data from Animal Experimentation
Although the unreliability and limitations of animal experimentation have increasingly been acknowledged, there remains a general confidence within much of the biomedical community that they can be overcome.Footnote 7 However, three major conditions undermine this confidence and explain why animal experimentation, regardless of the disease category studied, fails to reliably inform human health: (1) the effects of the laboratory environment and other variables on study outcomes, (2) disparities between animal models of disease and human diseases, and (3) species differences in physiology and genetics. I argue for the critical importance of each of these conditions.
The Influence of Laboratory Procedures and Environments on Experimental Results
Laboratory procedures and conditions exert influences on animals’ physiology and behaviors that are difficult to control and that can ultimately impact research outcomes. Animals in laboratories are involuntarily placed in artificial environments, usually in windowless rooms, for the duration of their lives. Captivity and the common features of biomedical laboratories—such as artificial lighting, human-produced noises, and restricted housing environments—can prevent species-typical behaviors, causing distress and abnormal behaviors among animals.Footnote 8 Among the types of laboratory-generated distress is the phenomenon of contagious anxiety.Footnote 9 Cortisone levels rise in monkeys watching other monkeys being restrained for blood collection.Footnote 10 Blood pressure and heart rates elevate in rats watching other rats being decapitated.Footnote 11 Routine laboratory procedures, such as catching an animal and removing him or her from the cage, in addition to the experimental procedures, cause significant and prolonged elevations in animals’ stress markers.Footnote 12 These stress-related changes in physiological parameters caused by the laboratory procedures and environments can have significant effects on test results.Footnote 13 Stressed rats, for example, develop chronic inflammatory conditions and intestinal leakage, which add variables that can confound data.Footnote 14
A variety of conditions in the laboratory cause changes in neurochemistry, genetic expression, and nerve regeneration.Footnote 15 In one study, for example, mice were genetically altered to develop aortic defects. Yet, when the mice were housed in larger cages, those defects almost completely disappeared.Footnote 16 Providing further examples, typical noise levels in laboratories can damage blood vessels in animals, and even the type of flooring on which animals are tested in spinal cord injury experiments can affect whether a drug shows a benefit.Footnote 17
In order to control for potential confounders, some investigators have called for standardization of laboratory settings and procedures.Footnote 18 One notable effort was made by Crabbe et al. in their investigation of the potential confounding influences of the laboratory environment on six mouse behaviors that are commonly studied in neurobehavioral experiments. Despite their “extraordinary lengths to equate test apparatus, testing protocols, and all possible features of animal husbandry” across three laboratories, there were systematic differences in test results in these labs.Footnote 19 Additionally, different mouse strains varied markedly in all behavioral tests, and for some tests the magnitude of genetic differences depended on the specific testing laboratory. The results suggest that there are important influences of environmental conditions and procedures specific to individual laboratories that can be difficult—perhaps even impossible—to eliminate. These influences can confound research results and impede extrapolation to humans.
The Discordance between Human Diseases and Animal Models of Diseases
The lack of sufficient congruence between animal models and human diseases is another significant obstacle to translational reliability. Human diseases are typically artificially induced in animals, but the enormous difficulty of reproducing anything approaching the complexity of human diseases in animal models limits their usefulness.Footnote 20 Even if the design and conduct of an animal experiment are sound and standardized, the translation of its results to the clinic may fail because of disparities between the animal experimental model and the human condition.Footnote 21
Stroke research presents one salient example of the difficulties in modeling human diseases in animals. Stroke is relatively well understood in its underlying pathology. Yet accurately modeling the disease in animals has proven to be an exercise in futility. To address the inability to replicate human stroke in animals, many assert the need to use more standardized animal study design protocols. This includes the use of animals who represent both genders and wide age ranges, who have comorbidities and preexisting conditions that occur naturally in humans, and who are consequently given medications that are indicated for human patients.Footnote 22 In fact, a set of guidelines, named STAIR, was implemented by a stroke roundtable in 1999 (and updated in 2009) to standardize protocols, limit the discrepancies, and improve the applicability of animal stroke experiments to humans.Footnote 23 One of the most promising stroke treatments later to emerge was NXY-059, which proved effective in animal experiments. However, the drug failed in clinical trials, despite the fact that the set of animal experiments on this drug was considered the poster child for the new experimental standards.Footnote 24 Despite such vigorous efforts, the development of STAIR and other criteria has yet to make a recognizable impact in clinical translation.Footnote 25
Under closer scrutiny, it is not difficult to surmise why animal stroke experiments fail to successfully translate to humans even with new guidelines. Standard stroke medications will likely affect different species differently. There is little evidence to suggest that a female rat, dog, or monkey sufficiently reproduces the physiology of a human female. Perhaps most importantly, reproducing the preexisting conditions of stroke in animals proves just as difficult as reproducing stroke pathology and outcomes. For example, most animals do not naturally develop significant atherosclerosis, a leading contributor to ischemic stroke. In order to reproduce the effects of atherosclerosis in animals, researchers clamp their blood vessels or artificially insert blood clots. These interventions, however, do not replicate the elaborate pathology of atherosclerosis and its underlying causes. Reproducing human diseases in animals requires reproducing the predisposing diseases, which is also a formidable challenge. The inability to reproduce the disease in animals so that it is congruent in relevant respects with human stroke has contributed to a high failure rate in drug development. More than 114 potential therapies initially tested in animals failed in human trials.Footnote 26
Further examples of repeated failures based on animal models include drug development in cancer, amyotrophic lateral sclerosis (ALS), traumatic brain injury (TBI), Alzheimer’s disease (AD), and inflammatory conditions. Animal cancer models in which tumors are artificially induced have been the basic translational model used to study key physiological and biochemical properties in cancer onset and propagation and to evaluate novel treatments. Nevertheless, significant limitations exist in the models’ ability to faithfully mirror the complex process of human carcinogenesis.Footnote 27 These limitations are evidenced by the high (among the highest of any disease category) clinical failure rate of cancer drugs.Footnote 28 Analyses of common mouse ALS models demonstrate significant differences from human ALS.Footnote 29 The inability of animal ALS models to predict beneficial effects in humans with ALS is recognized.Footnote 30 More than twenty drugs have failed in clinical trials, and the only U.S. Food and Drug Administration (FDA)–approved drug to treat ALS is riluzole, which shows notably marginal benefit on patient survival.Footnote 31 Animal models have also been unable to reproduce the complexities of human TBI.Footnote 32 In 2010, Maas et al. reported on 27 large Phase 3 clinical trials and 6 unpublished trials in TBI that all failed to show human benefit after showing benefit in animals.Footnote 33 Additionally, even after success in animals, around 172 and 150 drug development failures have been identified in the treatment of human ADFootnote 34 and inflammatory diseases,Footnote 35 respectively.
The high clinical failure rate in drug development across all disease categories is based, at least in part, on the inability to adequately model human diseases in animals and the poor predictability of animal models.Footnote 36 A notable systematic review, published in 2007, compared animal experimentation results with clinical trial findings across interventions aimed at the treatment of head injury, respiratory distress syndrome, osteoporosis, stroke, and hemorrhage.Footnote 37 The study found that the human and animal results were in accordance only half of the time. In other words, the animal experiments were no more likely than a flip of the coin to predict whether those interventions would benefit humans.
In 2004, the FDA estimated that 92 percent of drugs that pass preclinical tests, including “pivotal” animal tests, fail to proceed to the market.Footnote 38 More recent analysis suggests that, despite efforts to improve the predictability of animal testing, the failure rate has actually increased and is now closer to 96 percent.Footnote 39 The main causes of failure are lack of effectiveness and safety problems that were not predicted by animal tests.Footnote 40
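The arithmetic behind these percentages can be made explicit with a short Python sketch, which converts the two failure rates quoted above into the implied success rates. The function and variable names are illustrative assumptions and do not come from the cited sources; the only inputs taken from the text are the 92 and 96 percent figures.

```python
def success_rate(failure_percent: float) -> float:
    """Return the percentage of drugs that do not fail, given a failure percentage."""
    return 100.0 - failure_percent


# Failure rates quoted in the text (percent of drugs passing preclinical
# tests that nonetheless fail to reach the market).
fda_2004_success = success_rate(92.0)
recent_success = success_rate(96.0)

# Ratio of the more recent implied success rate to the 2004 one.
relative_change = recent_success / fda_2004_success

print(f"Implied success rate (2004 estimate): {fda_2004_success:.0f}%")
print(f"Implied success rate (recent analysis): {recent_success:.0f}%")
print(f"Recent rate as a fraction of the 2004 rate: {relative_change:.2f}")
```

On these figures, a rise in the failure rate from 92 to 96 percent corresponds to the implied success rate falling from 8 to 4 percent, that is, by half.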
Usually, when an animal model is found wanting, various reasons are proffered to explain what went wrong—poor methodology, publication bias, lack of preexisting disease and medications, wrong gender or age, and so on. These factors certainly require consideration, and recognition of each potential difference between the animal model and the human disease motivates renewed efforts to eliminate these differences. As a result, scientific progress is sometimes made by such efforts. However, the high failure rate in drug testing and development, despite attempts to improve animal testing, suggests that these efforts remain insufficient to overcome the obstacles to successful translation that are inherent to the use of animals. Too often ignored is the well-substantiated idea that these models are, for reasons summarized here, intrinsically lacking in relevance to, and thus highly unlikely to yield useful information about, human diseases.Footnote 41
Interspecies Differences in Physiology and Genetics
Ultimately, even if considerable congruence were shown between an animal model and its corresponding human disease, interspecies differences in physiology, behavior, pharmacokinetics, and genetics would significantly limit the reliability of animal studies, even after a substantial investment to improve such studies. In spinal cord injury, for example, drug testing results vary according to which species and even which strain within a species is used, because of numerous interspecies and interstrain differences in neurophysiology, anatomy, and behavior.Footnote 42 The micropathology of spinal cord injury, injury repair mechanisms, and recovery from injury vary greatly among different strains of rats and mice. A systematic review found that even among the most standardized and methodologically superior animal experiments, testing results assessing the effectiveness of methylprednisolone for spinal cord injury treatment varied considerably among species.Footnote 43 This suggests that factors inherent to the use of animals account for some of the major differences in results.
Even rats from the same strain but purchased from different suppliers produce different test results.Footnote 44 In one study, responses to 12 different behavioral measures of pain sensitivity, which are important markers of spinal cord injury, varied among 11 strains of mice, with no clear-cut patterns that allowed prediction of how each strain would respond.Footnote 45 These differences influenced how the animals responded to the injury and to experimental therapies. A drug might be shown to help one strain of mice recover but not another. Despite decades of using animal models, not a single neuroprotective agent that ameliorated spinal cord injury in animal tests has proven efficacious in clinical trials to date.Footnote 46
Further exemplifying the importance of physiological differences among species, a 2013 study reported that the mouse models used extensively to study human inflammatory diseases (in sepsis, burns, infection, and trauma) have been misleading. The study found that mice differ greatly from humans in their responses to inflammatory conditions. Mice differed from humans in what genes were turned on and off and in the timing and duration of gene expression. The mouse models even differed from one another in their responses. The investigators concluded that “our study supports higher priority to focus on the more complex human conditions rather than relying on mouse models to study human inflammatory disease.”Footnote 47 The different genetic responses between mice and humans are likely responsible, at least in part, for the high drug failure rate. The authors stated that every one of almost 150 clinical trials that tested candidate agents’ ability to block inflammatory responses in critically ill patients failed.
Wide differences have also become apparent in the regulation of the same genes, a point that is readily seen when observing differences between human and mouse livers.Footnote 48 Consistent phenotypes (observable physical or biochemical characteristics) are rarely obtained by modification of the same gene, even among different strains of mice.Footnote 49 Gene regulation can substantially differ among species and may be as important as the presence or absence of a specific gene. Despite the high degree of genome conservation, there are critical differences in the order and function of genes among species. To use an analogy: as pianos have the same keys, humans and other animals share (largely) the same genes. Where we mostly differ is in the way the genes or keys are expressed. For example, if we play the keys in a certain order, we hear Chopin; in a different order, we hear Ray Charles; and in yet a different order, it’s Jerry Lee Lewis. In other words, the same keys or genes are expressed, but their different orders result in markedly different outcomes.
Recognizing the inherent genetic differences among species as a barrier to translation, researchers have expressed considerable enthusiasm for genetically modified (GM) animals, including transgenic mouse models, wherein human genes are inserted into the mouse genome. However, if a human gene is expressed in mice, it will likely function differently from the way it functions in humans, being affected by physiological mechanisms that are unique to mice. For example, a crucial protein that controls blood sugar in humans is missing in mice.Footnote 50 When the human gene that makes this protein was expressed in genetically altered mice, it had the opposite effect from that in humans: it caused loss of blood sugar control in mice. Use of GM mice has failed to successfully model human diseases and to translate into clinical benefit across many disease categories.Footnote 51 Perhaps the primary reason why GM animals are unlikely to be much more successful than other animal models in translational medicine is the fact that the “humanized” or altered genes are still in nonhuman animals.
In many instances, nonhuman primates (NHPs) are used instead of mice or other animals, with the expectation that NHPs will better mimic human results. However, there have been sufficient failures in translation to undermine this optimism. For example, NHP models have failed to reproduce key features of Parkinson’s disease, both in function and in pathology.Footnote 52 Several therapies that appeared promising in both NHPs and rat models of Parkinson’s disease showed disappointing results in humans.Footnote 53 The campaign to prescribe hormone replacement therapy (HRT) in millions of women to prevent cardiovascular disease was based in large part on experiments on NHPs. HRT is now known to increase the risk of these diseases in women.Footnote 54
HIV/AIDS vaccine research using NHPs represents one of the most notable failures in animal experimentation translation. Immense resources and decades of time have been devoted to creating NHP (including chimpanzee) models of HIV. Yet all of about 90 HIV vaccines that succeeded in animals failed in humans.Footnote 55 After HIV vaccine gp120 failed in clinical trials, despite positive outcomes in chimpanzees, a BMJ article commented that important differences between NHPs and humans with HIV misled researchers, taking them down unproductive experimental paths.Footnote 56 Gp120 failed to neutralize HIV grown and tested in cell culture. However, because the serum protected chimpanzees from HIV infection, two Phase 3 clinical trials were undertakenFootnote 57—a clear example of how expectations that NHP data are more predictive than data from other (in this case, cell culture) testing methods are unproductive and harmful. Despite the repeated failures, NHPs (though not chimpanzees or other great apes) remain widely used for HIV research.
The implicit assumption that NHP (and indeed any animal) data are reliable has also led to significant and unjustifiable human suffering. For example, clinical trial volunteers for gp120 were placed at unnecessary risk of harm because of unfounded confidence in NHP experiments. Two landmark studies involving thousands of menopausal women being treated with HRT were terminated early because of increased stroke and breast cancer risk.Footnote 58 In 2003, Elan Pharmaceuticals was forced to prematurely terminate a Phase 2 clinical trial when an investigational AD vaccine was found to cause brain swelling in human subjects. No significant adverse effects were detected in GM mice or NHPs.Footnote 59
In another example of human suffering resulting from animal experimentation, six human volunteers were injected with an immunomodulatory drug, TGN 1412, in 2006.Footnote 60 Within minutes of receiving the experimental drug, all volunteers suffered a severe adverse reaction resulting from a life-threatening cytokine storm that led to catastrophic systemic organ failure. The compound was designed to dampen the immune system, but it had the opposite effect in humans. Prior to this first human trial, TGN 1412 was tested in mice, rabbits, rats, and NHPs with no ill effects. NHPs also underwent repeat-dose toxicity studies and were given 500 times the human dose for at least four consecutive weeks.Footnote 61 None of the NHPs manifested the ill effects that humans showed almost immediately after receiving minute amounts of the test drug. Cynomolgus and rhesus monkeys were specifically chosen because their CD28 receptors demonstrated similar affinity to TGN 1412 as human CD28 receptors. Based on such data as these, it was confidently concluded that results obtained from these NHPs would most reliably predict drug responses in humans—a conclusion that proved devastatingly wrong.
As exemplified by the study of HIV/AIDS, TGN 1412, and other experiences,Footnote 62 experiments with NHPs are not necessarily any more predictive of human responses than experiments with other animals. The repeated failures in translation from studies with NHPs belie arguments favoring use of any nonhuman species to study human physiology and diseases and to test potential treatments. If experimentation using chimpanzees and other NHPs, our closest genetic cousins, is unreliable, how can we expect research using other animals to be reliable? The bottom line is that animal experiments, no matter the species used or the type of disease research undertaken, are highly unreliable, and they have too little predictive value to justify the resultant risks of harms for humans, for reasons I now explain.
The Collective Harms That Result from Misleading Animal Experiments
As medical research has explored the complexities and subtle nuances of biological systems, problems have arisen because the differences among species along these subtler biological dimensions far outweigh the similarities, as a growing body of evidence attests. These profoundly important—and often undetected—differences are likely one of the main reasons human clinical trials fail.Footnote 63
“Appreciation of differences” and “caution” about extrapolating results from animals to humans are now almost universally recommended. But, in practice, how does one take into account differences in drug metabolism, genetics, expression of diseases, anatomy, influences of laboratory environments, and species- and strain-specific physiologic mechanisms—and, in view of these differences, discern what is applicable to humans and what is not? If we cannot determine which physiological mechanisms in which species and strains of species are applicable to humans (even setting aside the complicating factors of different caging systems and types of flooring), the usefulness of the experiments must be questioned.
It has been argued that some information obtained from animal experiments is better than no information.Footnote 64 This thesis neglects how misleading information can be worse than no information from animal tests. The use of nonpredictive animal experiments can cause human suffering in at least two ways: (1) by producing misleading safety and efficacy data and (2) by causing potential abandonment of useful medical treatments and misdirecting resources away from more effective testing methods.
Humans are harmed because of misleading animal testing results. Imprecise results from animal experiments may result in clinical trials of biologically faulty or even harmful substances, thereby exposing patients to unnecessary risk and wasting scarce research resources.Footnote 65 Animal toxicity studies are poor predictors of toxic effects of drugs in humans.Footnote 66 As seen in some of the preceding examples (in particular, stroke, HRT, and TGN 1412), humans have been significantly harmed because investigators were misled by the safety and efficacy profile of a new drug based on animal experiments.Footnote 67 Clinical trial volunteers are thus given raised hopes and a false sense of security because of a misguided confidence in efficacy and safety testing using animals.
An equal if indirect source of human suffering is the opportunity cost of abandoning promising drugs because of misleading animal tests.Footnote 68 As candidate drugs generally proceed down the development pipeline and to human testing based largely on successful results in animalsFootnote 69 (i.e., positive efficacy and negative adverse effects), drugs are sometimes not further developed due to unsuccessful results in animals (i.e., negative efficacy and/or positive adverse effects). Because much pharmaceutical company preclinical data are proprietary and thus publicly unavailable, it is difficult to know the number of missed opportunities due to misleading animal experiments. However, of every 5,000–10,000 potential drugs investigated, only about 5 proceed to Phase 1 clinical trials.Footnote 70 Potential therapeutics may be abandoned because of results in animal tests that do not apply to humans.Footnote 71 Treatments that fail to work or that show some adverse effect in animals because of species-specific influences may be abandoned in preclinical testing even though they might have proved effective and safe in humans had they been allowed to continue through the drug development pipeline.
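To put the attrition figure quoted above in proportional terms, the following Python sketch computes the share of investigated candidates that reach Phase 1 trials. The function name and variables are illustrative assumptions; the inputs are the 5,000–10,000 and 5 figures from the text. The calculation shows only the overall attrition; it does not, and cannot, isolate how much of that attrition is attributable to animal test results.

```python
def percent_reaching_phase1(candidates_screened: int, reaching_phase1: int = 5) -> float:
    """Percentage of investigated drug candidates that proceed to Phase 1 trials."""
    return 100 * reaching_phase1 / candidates_screened


# Lower and upper bounds of the range quoted in the text.
share_at_5000 = percent_reaching_phase1(5_000)
share_at_10000 = percent_reaching_phase1(10_000)

print(f"5 of 5,000 candidates:  {share_at_5000:.2f}% reach Phase 1")
print(f"5 of 10,000 candidates: {share_at_10000:.2f}% reach Phase 1")
```

On the quoted figures, roughly 0.05 to 0.1 percent of investigated candidates proceed to Phase 1, so more than 99.9 percent are set aside before any human testing.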
An editorial in Nature Reviews Drug Discovery describes cases involving two drugs in which animal test results from species-specific influences could have derailed their development. In particular, it describes how tamoxifen, one of the most effective drugs for certain types of breast cancer, “would most certainly have been withdrawn from the pipeline” if its propensity to cause liver tumors in rats had been discovered in preclinical testing rather than after the drug had been on the market for years.Footnote 72 Gleevec provides another example of an effective drug that could have been abandoned based on misleading animal tests: this drug, which is used to treat chronic myelogenous leukemia (CML), showed serious adverse effects in at least five species tested, including severe liver damage in dogs. However, liver toxicity was not detected in human cell assays, and clinical trials proceeded, which confirmed the absence of significant liver toxicity in humans.Footnote 73 Fortunately for CML patients, Gleevec is a success story of predictive human-based testing. Many useful drugs that have safely been used by humans for decades, such as aspirin and penicillin, might not be available today had the current animal testing regulatory requirements been in practice during their development.Footnote 74
A further example of near-missed opportunities is provided by experiments on animals that delayed the acceptance of cyclosporine, a drug widely and successfully used to treat autoimmune disorders and prevent organ transplant rejection.Footnote 75 Its immunosuppressive effects differed so markedly among species that researchers judged that the animal results limited any direct inferences that could be made to humans. Providing further examples, PharmaInformatic released a report describing how several blockbuster drugs, including aripiprazole (Abilify) and esomeprazole (Nexium), showed low oral bioavailability in animals. They would likely not be available on the market today if animal tests were solely relied on. Understanding the implications of its findings for drug development in general, PharmaInformatic asked, “Which other blockbuster drugs would be on the market today, if animal trials would have not been used to preselect compounds and drug-candidates for further development?”Footnote 76 These near-missed opportunities and the overall 96 percent failure rate in clinical drug testing strongly suggest the unsoundness of animal testing as a precondition of human clinical trials and provide powerful evidence for the need for a new, human-based paradigm in medical research and drug development.
In addition to potentially causing abandonment of useful treatments, use of an invalid animal disease model can lead researchers and the industry in the wrong research direction, wasting time and significant investment.Footnote 77 Repeatedly, researchers have been lured down the wrong line of investigation because of information gleaned from animal experiments that later proved to be inaccurate, irrelevant, or discordant with human biology. Some claim that we do not know which benefits animal experiments, particularly in basic research, may provide down the road. Yet human lives remain in the balance, waiting for effective therapies. Funding must be strategically invested in the research areas that offer the most promise.
The opportunity costs of continuing to fund unreliable animal tests may impede development of more accurate testing methods. Human organs grown in the lab, human organs on a chip, cognitive computing technologies, 3D printing of human living tissues, and the Human Toxome Project are examples of new human-based technologies that are garnering widespread enthusiasm. The benefit of using these testing methods in the preclinical setting over animal experiments is that they are based on human biology. Thus their use eliminates much of the guesswork required when attempting to extrapolate physiological data from other species to humans. Additionally, these tests offer whole-systems biology, in contrast to traditional in vitro techniques. Although they are gaining momentum, these human-based tests are still in their relative infancy, and funding must be prioritized for their further development. The recent advancements made in the development of more predictive, human-based systems and biological approaches in chemical toxicological testing are an example of how newer and improved tests have been developed because of a shift in prioritization.Footnote 78 Apart from toxicology, though, financial investment in the development of human-based technologies generally falls far short of investment in animal experimentation.Footnote 79
The unreliability of applying animal experimental results to human biology and diseases is increasingly recognized. Animals are in many respects biologically and psychologically similar to humans, perhaps most notably in the shared characteristics of pain, fear, and suffering.Footnote 80 In contrast, evidence demonstrates that critically important physiological and genetic differences between humans and other animals can invalidate the use of animals to study human diseases, treatments, pharmaceuticals, and the like. In significant measure, animal models specifically, and animal experimentation generally, are inadequate bases for predicting clinical outcomes in human beings in the great bulk of biomedical science. As a result, humans can be subject to significant and avoidable harm.
The data showing the unreliability of animal experimentation and the resultant harms to humans (and nonhumans) undermine long-standing claims that animal experimentation is necessary to enhance human health and therefore ethically justified. Rather, they demonstrate that animal experimentation poses significant costs and harms to human beings. It is possible—as I have argued elsewhere—that animal research is more costly and harmful, on the whole, than it is beneficial to human health.Footnote 81 When considering the ethical justifiability of animal experiments, we should ask if it is ethically acceptable to deprive humans of resources, opportunity, hope, and even their lives by seeking answers in what may be the wrong place. In my view, it would be better to direct resources away from animal experimentation and into developing more accurate, human-based technologies.