Invited review: Bioinformatic methods to discover the likely causal variant of a new autosomal recessive genetic condition using genome-wide data

Published online by Cambridge University Press:  10 August 2018

G. E. Pollott*
Affiliation:
Department of Pathobiology and Population Sciences, Royal Veterinary College, Royal College Street, London NW1 0TU, UK

Abstract

In animals, new autosomal recessive genetic diseases (ARGD) arise continually because of the regular, random mutations that occur during meiosis. To reduce the effect of any damaging new variant, it is necessary to find its cause. To evaluate the best way of doing this, 34 papers that found the exact location of a new genetic disease in livestock were reviewed; the process was found to require at least two stages. In the initial stage, the commonly used χ2 method, applied in a case-control association analysis with single nucleotide polymorphism (SNP)-chip data, was found to have limitations and was almost always used in conjunction with a second method to locate the target region of the genome containing the variant. The commonly used methods had their drawbacks, so a new method was devised based on long runs of homozygosity, a common feature of new ARGD. This ‘autozygosity by difference’ method was found to be as good as, or better than, all the reviewed methods tested, based on its ability to unambiguously find the shortest known target region in an already analysed data set. Mean target region length was found to be 4.6 megabases in the published reports. Success did not depend on the size of commercial SNP-chip used, and studies with as few as three cases and four controls were large enough to find the target region. The final stage relied on either sequencing the candidate genes found in the target region or using whole genome sequencing (WGS) on a small number of cases. Sometimes this latter method was used in conjunction with WGS on a number of control animals or resources such as the 1000 bull genomes data. Calculations showed that, in cattle, fewer than 15 animals would be needed to locate the new variant when using WGS data. This could be any combination of cases plus parents or other unrelated animals in the breed.
Using WGS data, it would be necessary to search the three billion bases of the cattle genome for base positions that were homozygous for the same allele in all cases, heterozygous for that allele in parents, and not homozygous for that allele in unrelated controls. This site could be confirmed in other healthy animals using much cheaper methods, and a genetic test could then be devised for that variant to screen the whole population and to design a breeding programme to eliminate the disorder.
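
The genotype search described above can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the two-letter genotype encoding (for example "CT") and the in-memory dictionary of sites are all assumptions made for illustration, and missing genotypes, sequencing error and indels are ignored.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding: one two-letter genotype per animal per site, e.g. "TT" or "CT".
Genotypes = Dict[str, str]          # animal id -> genotype at one site
Position = Tuple[str, int]          # (chromosome, base position)


def is_homozygous(gt: str) -> bool:
    """True when both alleles of a two-letter genotype are identical."""
    return len(gt) == 2 and gt[0] == gt[1]


def candidate_allele(site: Genotypes, cases: List[str], parents: List[str],
                     controls: List[str]) -> Optional[str]:
    """Return the shared case allele if the site meets all three criteria, else None."""
    case_gts = {site[animal] for animal in cases}
    # Criterion 1: all cases homozygous for the same allele.
    if len(case_gts) != 1:
        return None
    case_gt = next(iter(case_gts))
    if not is_homozygous(case_gt):
        return None
    allele = case_gt[0]
    # Criterion 2: every parent is heterozygous and carries that allele.
    for animal in parents:
        gt = site[animal]
        if is_homozygous(gt) or allele not in gt:
            return None
    # Criterion 3: no unrelated control is homozygous for that allele.
    for animal in controls:
        if site[animal] == case_gt:
            return None
    return allele


def find_candidates(sites: Dict[Position, Genotypes], cases: List[str],
                    parents: List[str], controls: List[str]) -> List[Tuple[Position, str]]:
    """Scan all sites and keep those whose genotypes fit the recessive pattern."""
    hits = []
    for position, genotypes in sites.items():
        allele = candidate_allele(genotypes, cases, parents, controls)
        if allele is not None:
            hits.append((position, allele))
    return hits
```

In practice the genotypes would come from variant calls across the whole genome rather than from a small dictionary, and the surviving positions would still need confirmation in further healthy animals, as the abstract notes.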

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Animal Consortium 2018
Figure 1 The advent of a new autosomal recessive genetic condition on a single pair of homologous chromosomes followed over several generations.

Table 1 Summary of the five methods using the Lavender Foal Syndrome data set of Brooks et al. (2010) based on the horse genome build EquCab2.0

Table 2 Examples of the Fisher’s exact test results in the Lavender Foal Syndrome target region based on the horse genome build EquCab2.0

Table 3 Summary of regions defined as containing a run of homozygosity from PLINK using the six Lavender Foal Syndrome cases based on the horse genome build EquCab2.0

Figure 2 Results of calculating mean runs of homozygosity (ROH) scores for the Lavender Foal Syndrome data set cases using the autozygosity-by-difference method (P=0.05 shown as ROH=3576 kb after 1000 permutations on cases; top plot) and as differences between case and control mean ROH length (permuted P-value of 0.001 shown as 4315 kb after 1000 permutations; bottom plot) (based on the EquCab2.0 build of the horse genome).

Table 4 Results of running a different number of permutations for the Lavender Foal Syndrome data set using PLINK label-swapping permutation for a genotypic χ2 table

Table 5 Possible genotypes at two adjacent single nucleotide polymorphism (SNP) loci, one polymorphic for adenine (A) and guanine (G) and the other for cytosine (C) and thymine (T), when a variant (*) occurred between the two SNP on the AC haplotype a few generations back

Figure 3 The number of base positions likely to be found with the appropriate genotype criteria when two to 21 animals are whole genome sequenced for four scenarios: complete genome data (3 billion bases; whole genome), all single nucleotide variants (15 million SNV; all variants), all bases in a 2 megabase (Mb) target region (2 Mb target region) and all SNV in target region (2 Mb target region variants).

Supplementary material: File

Pollott supplementary material
