The Flaws and Human Harms of Animal Experimentation

Abstract: Nonhuman animal (“animal”) experimentation is typically defended by arguments that it is reliable, that animals provide sufficiently good models of human biology and diseases to yield relevant information, and that, consequently, its use provides major human health benefits. I demonstrate that a growing body of scientific literature critically assessing the validity of animal experimentation generally (and animal modeling specifically) raises important concerns about its reliability and predictive value for human outcomes and for understanding human physiology. The unreliability of animal experimentation across a wide range of areas undermines scientific arguments in favor of the practice. Additionally, I show how animal experimentation often significantly harms humans through misleading safety studies, potential abandonment of effective therapeutics, and direction of resources away from more effective testing methods. The resulting evidence suggests that the collective harms and costs to humans from animal experimentation outweigh potential benefits and that resources would be better invested in developing human-based testing methods.


Introduction
Annually, more than 115 million animals are used worldwide in experimentation or to supply the biomedical industry. 1 Nonhuman animal (hereafter "animal") experimentation falls under two categories: basic (i.e., investigation of basic biology and human disease) and applied (i.e., drug research and development and toxicity and safety testing). Regardless of its categorization, animal experimentation is intended to inform human biology and health sciences and to promote the safety and efficacy of potential treatments. Despite its use of immense resources, the animal suffering involved, and its impact on human health, the question of animal experimentation's efficacy has been subjected to little systematic scrutiny. 2 Although it is widely accepted that medicine should be evidence based, animal experimentation as a means of informing human health has generally not been held, in practice, to this standard. This fact makes it surprising that animal experimentation is typically viewed as the default and gold standard of preclinical testing and is generally supported without critical examination of its validity. A survey published in 2008 of anecdotal cases and statements given in support of animal experimentation demonstrates how it has not and could not be validated as a necessary step in biomedical research, and the survey casts doubt on its predictive value. I show that animal experimentation is poorly predictive of human outcomes, 4 that it is unreliable across a wide category of disease areas, 5 and that existing literature demonstrates the unreliability of animal experimentation, thereby undermining scientific arguments in its favor. I further show that the collective harms that result from an unreliable practice tip the ethical scale of harms and benefits against continuation in much, if not all, of experimentation involving animals. 6

Problems of Successful Translation to Humans of Data from Animal Experimentation
Although the unreliability and limitations of animal experimentation have increasingly been acknowledged, there remains a general confidence within much of the biomedical community that they can be overcome. 7 However, three major conditions undermine this confidence and explain why animal experimentation, regardless of the disease category studied, fails to reliably inform human health: (1) the effects of the laboratory environment and other variables on study outcomes, (2) disparities between animal models of disease and human diseases, and (3) species differences in physiology and genetics. I argue for the critical importance of each of these conditions.

The Influence of Laboratory Procedures and Environments on Experimental Results
Laboratory procedures and conditions exert influences on animals' physiology and behaviors that are difficult to control and that can ultimately impact research outcomes. Animals in laboratories are involuntarily placed in artificial environments, usually in windowless rooms, for the duration of their lives. Captivity and the common features of biomedical laboratories (such as artificial lighting, human-produced noises, and restricted housing environments) can prevent species-typical behaviors, causing distress and abnormal behaviors among animals. 8 Among the types of laboratory-generated distress is the phenomenon of contagious anxiety. 9 Cortisone levels rise in monkeys watching other monkeys being restrained for blood collection. 10 Blood pressure and heart rates elevate in rats watching other rats being decapitated. 11 Routine laboratory procedures, such as catching an animal and removing him or her from the cage, in addition to the experimental procedures, cause significant and prolonged elevations in animals' stress markers. 12 These stress-related changes in physiological parameters caused by the laboratory procedures and environments can have significant effects on test results. 13 Stressed rats, for example, develop chronic inflammatory conditions and intestinal leakage, which add variables that can confound data. 14 A variety of conditions in the laboratory cause changes in neurochemistry, genetic expression, and nerve regeneration. 15 In one study, for example, mice were genetically altered to develop aortic defects. Yet, when the mice were housed in larger cages, those defects almost completely disappeared. 16 Providing further examples, typical noise levels in laboratories can damage blood vessels in animals, and even the type of flooring on which animals are tested in spinal cord injury experiments can affect whether a drug shows a benefit. 17
In order to control for potential confounders, some investigators have called for standardization of laboratory settings and procedures. 18 One notable effort was made by Crabbe et al. in their investigation of the potential confounding influences of the laboratory environment on six mouse behaviors that are commonly studied in neurobehavioral experiments. Despite their "extraordinary lengths to equate test apparatus, testing protocols, and all possible features of animal husbandry" across three laboratories, there were systematic differences in test results in these labs. 19 Additionally, different mouse strains varied markedly in all behavioral tests, and for some tests the magnitude of genetic differences depended on the specific testing laboratory. The results suggest that there are important influences of environmental conditions and procedures specific to individual laboratories that can be difficult, perhaps even impossible, to eliminate. These influences can confound research results and impede extrapolation to humans.

The Discordance between Human Diseases and Animal Models of Diseases
The lack of sufficient congruence between animal models and human diseases is another significant obstacle to translational reliability. Human diseases are typically artificially induced in animals, but the enormous difficulty of reproducing anything approaching the complexity of human diseases in animal models limits their usefulness. 20 Even if the design and conduct of an animal experiment are sound and standardized, the translation of its results to the clinic may fail because of disparities between the animal experimental model and the human condition. 21 Stroke research presents one salient example of the difficulties in modeling human diseases in animals. Stroke is relatively well understood in its underlying pathology. Yet accurately modeling the disease in animals has proven to be an exercise in futility. To address the inability to replicate human stroke in animals, many assert the need to use more standardized animal study design protocols. This includes the use of animals who represent both genders and wide age ranges, who have comorbidities and preexisting conditions that occur naturally in humans, and who are consequently given medications that are indicated for human patients. 22 In fact, a set of guidelines, named STAIR, was implemented by a stroke roundtable in 1999 (and updated in 2009) to standardize protocols, limit the discrepancies, and improve the applicability of animal stroke experiments to humans. 23 One of the most promising stroke treatments later to emerge was NXY-059, which proved effective in animal experiments. However, the drug failed in clinical trials, despite the fact that the set of animal experiments on this drug was considered the poster child for the new experimental standards. 24 Despite such vigorous efforts, the development of STAIR and other criteria has yet to make a recognizable impact in clinical translation. 25
Under closer scrutiny, it is not difficult to surmise why animal stroke experiments fail to successfully translate to humans even with new guidelines. Standard stroke medications will likely affect different species differently. There is little evidence to suggest that a female rat, dog, or monkey sufficiently reproduces the physiology of a human female. Perhaps most importantly, reproducing the preexisting conditions of stroke in animals proves just as difficult as reproducing stroke pathology and outcomes. For example, most animals don't naturally develop significant atherosclerosis, a leading contributor to ischemic stroke. In order to reproduce the effects of atherosclerosis in animals, researchers clamp their blood vessels or artificially insert blood clots. These interventions, however, do not replicate the elaborate pathology of atherosclerosis and its underlying causes. Reproducing human diseases in animals requires reproducing the predisposing diseases, also a formidable challenge. The inability to reproduce the disease in animals so that it is congruent in relevant respects with human stroke has contributed to a high failure rate in drug development. More than 114 potential therapies initially tested in animals failed in human trials. 26 Further examples of repeated failures based on animal models include drug development in cancer, amyotrophic lateral sclerosis (ALS), traumatic brain injury (TBI), Alzheimer's disease (AD), and inflammatory conditions. Animal cancer models in which tumors are artificially induced have been the basic translational model used to study key physiological and biochemical properties in cancer onset and propagation and to evaluate novel treatments. Nevertheless, significant limitations exist in the models' ability to faithfully mirror the complex process of human carcinogenesis. 27 These limitations are evidenced by the high (among the highest of any disease category) clinical failure rate of cancer drugs. 28
Analyses of common mouse ALS models demonstrate significant differences from human ALS. 29 The inability of animal ALS models to predict beneficial effects in humans with ALS is recognized. 30 More than twenty drugs have failed in clinical trials, and the only U.S. Food and Drug Administration (FDA)-approved drug to treat ALS is Riluzole, which shows notably marginal benefit on patient survival. 31 Animal models have also been unable to reproduce the complexities of human TBI. 32 In 2010, Maas et al. reported on 27 large Phase 3 clinical trials and 6 unpublished trials in TBI that all failed to show human benefit after showing benefit in animals. 33 Additionally, even after success in animals, around 172 and 150 drug development failures have been identified in the treatment of human AD 34 and inflammatory diseases, 35 respectively.
The high clinical failure rate in drug development across all disease categories is based, at least in part, on the inability to adequately model human diseases in animals and the poor predictability of animal models. 36 A notable systematic review, published in 2007, compared animal experimentation results with clinical trial findings across interventions aimed at the treatment of head injury, respiratory distress syndrome, osteoporosis, stroke, and hemorrhage. 37 The study found that the human and animal results were in accordance only half of the time. In other words, the animal experiments were no more likely than a flip of the coin to predict whether those interventions would benefit humans.
In 2004, the FDA estimated that 92 percent of drugs that pass preclinical tests, including "pivotal" animal tests, fail to proceed to the market. 38 More recent analysis suggests that, despite efforts to improve the predictability of animal testing, the failure rate has actually increased and is now closer to 96 percent. 39 The main causes of failure are lack of effectiveness and safety problems that were not predicted by animal tests. 40 Usually, when an animal model is found wanting, various reasons are proffered to explain what went wrong: poor methodology, publication bias, lack of preexisting disease and medications, wrong gender or age, and so on. These factors certainly require consideration, and recognition of each potential difference between the animal model and the human disease motivates renewed efforts to eliminate these differences. As a result, scientific progress is sometimes made by such efforts. However, the high failure rate in drug testing and development, despite attempts to improve animal testing, suggests that these efforts remain insufficient to overcome the obstacles to successful translation that are inherent to the use of animals.
Too often ignored is the well-substantiated idea that these models are, for reasons summarized here, intrinsically lacking in relevance to, and thus highly unlikely to yield useful information about, human diseases. 41

Interspecies Differences in Physiology and Genetics
Ultimately, even if considerable congruence were shown between an animal model and its corresponding human disease, interspecies differences in physiology, behavior, pharmacokinetics, and genetics would significantly limit the reliability of animal studies, even after a substantial investment to improve such studies. In spinal cord injury, for example, drug testing results vary according to which species and even which strain within a species is used, because of numerous interspecies and interstrain differences in neurophysiology, anatomy, and behavior. 42 The micropathology of spinal cord injury, injury repair mechanisms, and recovery from injury vary greatly among different strains of rats and mice. A systematic review found that even among the most standardized and methodologically superior animal experiments, testing results assessing the effectiveness of methylprednisolone for spinal cord injury treatment varied considerably among species. 43 This suggests that factors inherent to the use of animals account for some of the major differences in results.
Even rats from the same strain but purchased from different suppliers produce different test results. 44 In one study, responses to 12 different behavioral measures of pain sensitivity, which are important markers of spinal cord injury, varied among 11 strains of mice, with no clear-cut patterns that allowed prediction of how each strain would respond. 45 These differences influenced how the animals responded to the injury and to experimental therapies. A drug might be shown to help one strain of mice recover but not another. Despite decades of using animal models, not a single neuroprotective agent that ameliorated spinal cord injury in animal tests has proven efficacious in clinical trials to date. 46 Further exemplifying the importance of physiological differences among species, a 2013 study reported that the mouse models used extensively to study human inflammatory diseases (in sepsis, burns, infection, and trauma) have been misleading. The study found that mice differ greatly from humans in their responses to inflammatory conditions. Mice differed from humans in what genes were turned on and off and in the timing and duration of gene expression. The mouse models even differed from one another in their responses. The investigators concluded that "our study supports higher priority to focus on the more complex human conditions rather than relying on mouse models to study human inflammatory disease." 47 The different genetic responses between mice and humans are likely responsible, at least in part, for the high drug failure rate. The authors stated that every one of almost 150 clinical trials that tested candidate agents' ability to block inflammatory responses in critically ill patients failed.
Wide differences have also become apparent in the regulation of the same genes, a point that is readily seen when observing differences between human and mouse livers. 48 Consistent phenotypes (observable physical or biochemical characteristics) are rarely obtained by modification of the same gene, even among different strains of mice. 49 Gene regulation can substantially differ among species and may be as important as the presence or absence of a specific gene. Despite the high degree of genome conservation, there are critical differences in the order and function of genes among species. To use an analogy: as pianos have the same keys, humans and other animals share (largely) the same genes. Where we mostly differ is in the way the genes or keys are expressed. For example, if we play the keys in a certain order, we hear Chopin; in a different order, we hear Ray Charles; and in yet a different order, it's Jerry Lee Lewis. In other words, the same keys or genes are expressed, but their different orders result in markedly different outcomes.
Recognizing the inherent genetic differences among species as a barrier to translation, researchers have expressed considerable enthusiasm for genetically modified (GM) animals, including transgenic mouse models, wherein human genes are inserted into the mouse genome. However, if a human gene is expressed in mice, it will likely function differently from the way it functions in humans, being affected by physiological mechanisms that are unique in mice. For example, a crucial protein that controls blood sugar in humans is missing in mice. 50 When the human gene that makes this protein was expressed in genetically altered mice, it had the opposite effect from that in humans: it caused loss of blood sugar control in mice. Use of GM mice has failed to successfully model human diseases and to translate into clinical benefit across many disease categories. 51 Perhaps the primary reason why GM animals are unlikely to be much more successful than other animal models in translational medicine is the fact that the "humanized" or altered genes are still in nonhuman animals.
In many instances, nonhuman primates (NHPs) are used instead of mice or other animals, with the expectation that NHPs will better mimic human results. However, there have been sufficient failures in translation to undermine this optimism. For example, NHP models have failed to reproduce key features of Parkinson's disease, both in function and in pathology. 52 Several therapies that appeared promising in both NHPs and rat models of Parkinson's disease showed disappointing results in humans. 53 The campaign to prescribe hormone replacement therapy (HRT) in millions of women to prevent cardiovascular disease was based in large part on experiments on NHPs. HRT is now known to increase the risk of these diseases in women. 54 HIV/AIDS vaccine research using NHPs represents one of the most notable failures in animal experimentation translation. Immense resources and decades of time have been devoted to creating NHP (including chimpanzee) models of HIV. Yet all of about 90 HIV vaccines that succeeded in animals failed in humans. 55 After HIV vaccine gp120 failed in clinical trials, despite positive outcomes in chimpanzees, a BMJ article commented that important differences between NHPs and humans with HIV misled researchers, taking them down unproductive experimental paths. 56 Gp120 failed to neutralize HIV grown and tested in cell culture. However, because the serum protected chimpanzees from HIV infection, two Phase 3 clinical trials were undertaken. 57 This is a clear example of how expectations that NHP data are more predictive than data from other (in this case, cell culture) testing methods are unproductive and harmful. Despite the repeated failures, NHPs (though not chimpanzees or other great apes) remain widely used for HIV research.
The implicit assumption that NHP (and indeed any animal) data are reliable has also led to significant and unjustifiable human suffering. For example, clinical trial volunteers for gp120 were placed at unnecessary risk of harm because of unfounded confidence in NHP experiments. Two landmark studies involving thousands of menopausal women being treated with HRT were terminated early because of increased stroke and breast cancer risk. 58 In 2003, Elan Pharmaceuticals was forced to prematurely terminate a Phase 2 clinical trial when an investigational AD vaccine was found to cause brain swelling in human subjects. No significant adverse effects were detected in GM mice or NHPs. 59 In another example of human suffering resulting from animal experimentation, six human volunteers were injected with an immunomodulatory drug, TGN 1412, in 2006. 60 Within minutes of receiving the experimental drug, all volunteers suffered a severe adverse reaction resulting from a life-threatening cytokine storm that led to catastrophic systemic organ failure. The compound was designed to dampen the immune system, but it had the opposite effect in humans. Prior to this first human trial, TGN 1412 was tested in mice, rabbits, rats, and NHPs with no ill effects. NHPs also underwent repeat-dose toxicity studies and were given 500 times the human dose for at least four consecutive weeks. 61 None of the NHPs manifested the ill effects that humans showed almost immediately after receiving minute amounts of the test drug. Cynomolgus and rhesus monkeys were specifically chosen because their CD28 receptors demonstrated similar affinity to TGN 1412 as human CD28 receptors. Based on such data as these, it was confidently concluded that results obtained from these NHPs would most reliably predict drug responses in humans, a conclusion that proved devastatingly wrong.
As exemplified by the study of HIV/AIDS, TGN 1412, and other experiences, 62 experiments with NHPs are not necessarily any more predictive of human responses than experiments with other animals. The repeated failures in translation from studies with NHPs belie arguments favoring use of any nonhuman species to study human physiology and diseases and to test potential treatments. If experimentation using chimpanzees and other NHPs, our closest genetic cousins, is unreliable, how can we expect research using other animals to be reliable? The bottom line is that animal experiments, no matter the species used or the type of disease research undertaken, are highly unreliable, and they have too little predictive value to justify the resultant risks of harms for humans, for reasons I now explain.

The Collective Harms That Result from Misleading Animal Experiments
As medical research has explored the complexities and subtle nuances of biological systems, problems have arisen because the differences among species along these subtler biological dimensions far outweigh the similarities, as a growing body of evidence attests. These profoundly important (and often undetected) differences are likely one of the main reasons human clinical trials fail. 63 "Appreciation of differences" and "caution" about extrapolating results from animals to humans are now almost universally recommended. But, in practice, how does one take into account differences in drug metabolism, genetics, expression of diseases, anatomy, influences of laboratory environments, and species- and strain-specific physiologic mechanisms and, in view of these differences, discern what is applicable to humans and what is not? If we cannot determine which physiological mechanisms in which species and strains of species are applicable to humans (even setting aside the complicating factors of different caging systems and types of flooring), the usefulness of the experiments must be questioned.
It has been argued that some information obtained from animal experiments is better than no information. 64 This thesis neglects how misleading information can be worse than no information from animal tests. The use of nonpredictive animal experiments can cause human suffering in at least two ways: (1) by producing misleading safety and efficacy data and (2) by causing potential abandonment of useful medical treatments and misdirecting resources away from more effective testing methods.
Humans are harmed because of misleading animal testing results. Imprecise results from animal experiments may result in clinical trials of biologically faulty or even harmful substances, thereby exposing patients to unnecessary risk and wasting scarce research resources. 65 Animal toxicity studies are poor predictors of toxic effects of drugs in humans. 66 As seen in some of the preceding examples (in particular, stroke, HRT, and TGN 1412), humans have been significantly harmed because investigators were misled by the safety and efficacy profile of a new drug based on animal experiments. 67 Clinical trial volunteers are thus provided with raised hopes and a false sense of security because of a misguided confidence in efficacy and safety testing using animals.
An equal if indirect source of human suffering is the opportunity cost of abandoning promising drugs because of misleading animal tests. 68 As candidate drugs generally proceed down the development pipeline and to human testing based largely on successful results in animals 69 (i.e., positive efficacy and negative adverse effects), drugs are sometimes not further developed due to unsuccessful results in animals (i.e., negative efficacy and/or positive adverse effects). Because much pharmaceutical company preclinical data are proprietary and thus publicly unavailable, it is difficult to know the number of missed opportunities due to misleading animal experiments. However, of every 5,000-10,000 potential drugs investigated, only about 5 proceed to Phase 1 clinical trials. 70 Potential therapeutics may be abandoned because of results in animal tests that do not apply to humans. 71 Treatments that fail to work or show some adverse effect in animals because of species-specific influences may be abandoned in preclinical testing even if they might have proved effective and safe in humans had they been allowed to continue through the drug development pipeline.
An editorial in Nature Reviews Drug Discovery describes cases involving two drugs in which animal test results from species-specific influences could have derailed their development. In particular, it describes how tamoxifen, one of the most effective drugs for certain types of breast cancer, "would most certainly have been withdrawn from the pipeline" if its propensity to cause liver tumors in rats had been discovered in preclinical testing rather than after the drug had been on the market for years. 72 Gleevec provides another example of effective drugs that could have been abandoned based on misleading animal tests: this drug, which is used to treat chronic myelogenous leukemia (CML), showed serious adverse effects in at least five species tested, including severe liver damage in dogs. However, liver toxicity was not detected in human cell assays, and clinical trials proceeded, which confirmed the absence of significant liver toxicity in humans. 73 Fortunately for CML patients, Gleevec is a success story of predictive human-based testing. Many useful drugs that have safely been used by humans for decades, such as aspirin and penicillin, might not be available today if the current animal testing regulatory requirements had been in practice during their development. 74 A further example of near-missed opportunities is provided by experiments on animals that delayed the acceptance of cyclosporine, a drug widely and successfully used to treat autoimmune disorders and prevent organ transplant rejection. 75 Its immunosuppressive effects differed so markedly among species that researchers judged that the animal results limited any direct inferences that could be made to humans.
Providing further examples, PharmaInformatic released a report describing how several blockbuster drugs, including aripiprazole (Abilify) and esomeprazole (Nexium), showed low oral bioavailability in animals. They would likely not be available on the market today if animal tests were solely relied on. Understanding the implications of its findings for drug development in general, PharmaInformatic asked, "Which other blockbuster drugs would be on the market today, if animal trials would have not been used to preselect compounds and drug-candidates for further development?" 76 These near-missed opportunities and the overall 96 percent failure rate in clinical drug testing strongly suggest the unsoundness of animal testing as a precondition of human clinical trials and provide powerful evidence for the need for a new, human-based paradigm in medical research and drug development.
In addition to potentially causing abandonment of useful treatments, use of an invalid animal disease model can lead researchers and the industry in the wrong research direction, wasting time and significant investment. 77 Repeatedly, researchers have been lured down the wrong line of investigation because of information gleaned from animal experiments that later proved to be inaccurate, irrelevant, or discordant with human biology. Some claim that we do not know which benefits animal experiments, particularly in basic research, may provide down the road. Yet human lives remain in the balance, waiting for effective therapies. Funding must be strategically invested in the research areas that offer the most promise.
Continuing to fund unreliable animal tests carries an opportunity cost: it may impede development of more accurate testing methods. Human organs grown in the lab, human organs on a chip, cognitive computing technologies, 3D printing of human living tissues, and the Human Toxome Project are examples of new human-based technologies that are garnering widespread enthusiasm. The benefit of using these testing methods in the preclinical setting over animal experiments is that they are based on human biology. Thus their use eliminates much of the guesswork required when attempting to extrapolate physiological data from other species to humans. Additionally, these tests offer whole-systems biology, in contrast to traditional in vitro techniques. Although they are gaining momentum, these human-based tests are still in their relative infancy, and funding must be prioritized for their further development. The recent advancements made in the development of more predictive, human-based systems and biological approaches in chemical toxicological testing are an example of how newer and improved tests have been developed because of a shift in prioritization. 78 Apart from toxicology, though, financial investment in the development of human-based technologies generally falls far short of investment in animal experimentation. 79

Conclusion
The unreliability of applying animal experimental results to human biology and diseases is increasingly recognized. Animals are in many respects biologically and psychologically similar to humans, perhaps most notably in the shared characteristics of pain, fear, and suffering. 80 In contrast, evidence demonstrates that critically important physiological and genetic differences between humans and other animals can invalidate the use of animals to study human diseases, treatments, pharmaceuticals, and the like. In significant measure, animal models specifically, and animal experimentation generally, are inadequate bases for predicting clinical outcomes in human beings in the great bulk of biomedical science. As a result, humans can be subject to significant and avoidable harm.
The data showing the unreliability of animal experimentation and the resultant harms to humans (and nonhumans) undermine long-standing claims that animal experimentation is necessary to enhance human health and therefore ethically justified. Rather, they demonstrate that animal experimentation poses significant costs and harms to human beings. It is possible, as I have argued elsewhere, that animal research is more costly and harmful, on the whole, than it is beneficial to human health. 81 When considering the ethical justifiability of animal experiments, we should ask if it is ethically acceptable to deprive humans of resources, opportunity, hope, and even their lives by seeking answers in what may be the wrong place. In my view, it would be better to direct resources away from animal experimentation and into developing more accurate, human-based technologies.