The Weighting is the Hardest Part: On the Behavior of the Likelihood Ratio Test and the Score Test Under a Data-Driven Weighting Scheme in Sequenced Samples

  • Camelia C. Minică (a1) (a2), Giulio Genovese (a3) (a4) (a5), Christina M. Hultman (a6), René Pool (a1) (a2), Jacqueline M. Vink (a7), Michael C. Neale (a1) (a8), Conor V. Dolan (a1) (a2) and Benjamin M. Neale (a3) (a4) (a9)...
Abstract

Sequence-based association studies are at a critical inflexion point with the increasing availability of exome-sequencing data. A popular test of association is the sequence kernel association test (SKAT). Weights are embedded within SKAT to reflect the hypothesized contribution of the variants to the trait variance. Because the true weights are generally unknown, and so are subject to misspecification, we examined the efficiency of a data-driven weighting scheme. We propose the use of a set of theoretically defensible weighting schemes, of which, we assume, the one that gives the largest test statistic is likely to best capture the allele frequency–functional effect relationship. We show that the use of alternative weights obviates the need to impose arbitrary frequency thresholds. As both the score test and the likelihood ratio test (LRT) may be used in this context, and may differ in power, we characterize the behavior of both tests. The two tests have equal power if the set includes weights resembling the correct ones. However, if the weights are badly specified, the LRT shows superior power (due to its robustness to misspecification). With this data-driven weighting procedure, the LRT detected a significant signal in two genes located in regions already confirmed as associated with schizophrenia, PRRC2A (p = 1.020e-06) and VARS2 (p = 2.383e-06), in the Swedish schizophrenia case-control cohort of 11,040 individuals with exome-sequencing data. The score test is currently preferred for its computational efficiency and power. Indeed, assuming correct specification, in some circumstances, the score test is the most powerful test. However, the LRT has the advantageous properties of being generally more robust and more powerful under weight misspecification. This is an important result given that, arguably, misspecified models are likely to be the rule rather than the exception in weighting-based approaches.
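As a rough illustration of the data-driven weighting idea described above, the following minimal sketch evaluates a SKAT-style weighted score statistic under several candidate weighting schemes and keeps the scheme with the largest statistic. It is not the authors' implementation and does not reproduce their LRT or score test. The candidate set `SCHEMES`, the helper names, the quantitative trait with an intercept-only null model, and the permutation-based standardisation (used here so that statistics under different weights are comparable, and so that the p-value accounts for selecting the maximum) are all assumptions made for illustration.

```python
import math
import random

# Hypothetical candidate set of Beta(a, b) weighting schemes (an assumption of this sketch).
SCHEMES = {
    "beta(1,25)": (1.0, 25.0),    # strongly up-weights rare variants
    "beta(0.5,0.5)": (0.5, 0.5),  # proportional to 1 / sqrt(p * (1 - p))
    "beta(1,1)": (1.0, 1.0),      # flat: every variant weighted equally
}

def beta_weight(maf, a, b):
    """Unnormalised Beta(a, b) density evaluated at a minor allele frequency."""
    return maf ** (a - 1) * (1 - maf) ** (b - 1)

def scores(y, G):
    """Per-variant score: sum_i G[i][j] * (y[i] - mean(y))."""
    ybar = sum(y) / len(y)
    return [sum(row[j] * (yi - ybar) for row, yi in zip(G, y))
            for j in range(len(G[0]))]

def q_stat(y, G, w):
    """SKAT-style statistic: sum over variants of (w_j * score_j) ** 2."""
    return sum((wj * sj) ** 2 for wj, sj in zip(w, scores(y, G)))

def max_weight_test(y, G, n_perm=200, seed=0):
    """Select the scheme with the largest standardised Q and return a
    permutation p-value for that maximum (illustrative procedure only)."""
    n = len(G)
    freq = [sum(row[j] for row in G) / (2 * n) for j in range(len(G[0]))]
    keep = [j for j, p in enumerate(freq) if 0 < p < 1]  # drop monomorphic variants
    if not keep:
        raise ValueError("no polymorphic variants")
    G = [[row[j] for j in keep] for row in G]
    maf = [min(freq[j], 1 - freq[j]) for j in keep]
    weights = {k: [beta_weight(p, a, b) for p in maf]
               for k, (a, b) in SCHEMES.items()}

    # Null distribution of Q under each scheme, from shuffling the trait.
    rng = random.Random(seed)
    yp = list(y)
    perm = {k: [] for k in SCHEMES}
    for _ in range(n_perm):
        rng.shuffle(yp)
        s = scores(yp, G)
        for k, w in weights.items():
            perm[k].append(sum((wj * sj) ** 2 for wj, sj in zip(w, s)))

    # Standardise each scheme by its permutation mean and standard deviation.
    moments = {}
    for k in SCHEMES:
        mu = sum(perm[k]) / n_perm
        sd = math.sqrt(sum((x - mu) ** 2 for x in perm[k]) / (n_perm - 1))
        moments[k] = (mu, sd)

    def z(k, q):
        return (q - moments[k][0]) / moments[k][1]

    z_obs = {k: z(k, q_stat(y, G, weights[k])) for k in SCHEMES}
    best = max(z_obs, key=z_obs.get)
    z_max_perm = [max(z(k, perm[k][b]) for k in SCHEMES) for b in range(n_perm)]
    exceed = sum(zm >= z_obs[best] for zm in z_max_perm)
    return {"best_scheme": best, "z": z_obs,
            "p_value": (1 + exceed) / (n_perm + 1)}

if __name__ == "__main__":
    # Simulated toy data: 200 individuals, 6 variants, effect on the rarest variant.
    rng = random.Random(1)
    freqs = [0.02, 0.03, 0.05, 0.10, 0.20, 0.30]
    G = [[(rng.random() < f) + (rng.random() < f) for f in freqs] for _ in range(200)]
    y = [1.5 * row[0] + rng.gauss(0, 1) for row in G]
    print(max_weight_test(y, G))
```

Comparing the observed maximum against the permutation distribution of the maximum is one simple way to avoid an optimistic p-value after choosing the best-looking weights. The same permutations are reused to estimate the standardisation constants, which is an approximation acceptable for a sketch but not a substitute for the procedure evaluated in the article.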

Corresponding author
address for correspondence: Dr Camelia C. Minică, Department of Biological Psychology, Vrije Universiteit Amsterdam, Transitorium 2B-03, Van der Boechorststraat 1, 1081 BT, Amsterdam, the Netherlands. E-mail: camelia.minica@gmail.com
Footnotes

* These authors contributed equally to this work.
Twin Research and Human Genetics
  • ISSN: 1832-4274
  • EISSN: 1839-2628
  • URL: /core/journals/twin-research-and-human-genetics
Supplementary materials

Minică supplementary material: Tables and Figures, PDF (205 KB)
