The Weighting is the Hardest Part: On the Behavior of the Likelihood Ratio Test and the Score Test Under a Data-Driven Weighting Scheme in Sequenced Samples

Published online by Cambridge University Press:  27 February 2017

Camelia C. Minică*
Affiliation:
Department of Biological Psychology, Vrije Universiteit, Amsterdam, The Netherlands; The EMGO+ Institute for Health and Care Research, Amsterdam, The Netherlands
Giulio Genovese
Affiliation:
The Stanley Center for Psychiatric Research, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA; The Program in Medical and Population Genetics, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA; The Department of Genetics, Harvard Medical School, Cambridge, MA
Christina M. Hultman
Affiliation:
The Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm
René Pool
Affiliation:
Department of Biological Psychology, Vrije Universiteit, Amsterdam, The Netherlands; The EMGO+ Institute for Health and Care Research, Amsterdam, The Netherlands
Jacqueline M. Vink
Affiliation:
Behavioural Science Institute, Radboud University, Nijmegen, The Netherlands
Michael C. Neale
Affiliation:
Department of Biological Psychology, Vrije Universiteit, Amsterdam, The Netherlands; Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, USA
Conor V. Dolan
Affiliation:
Department of Biological Psychology, Vrije Universiteit, Amsterdam, The Netherlands; The EMGO+ Institute for Health and Care Research, Amsterdam, The Netherlands
Benjamin M. Neale
Affiliation:
The Stanley Center for Psychiatric Research, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA; The Program in Medical and Population Genetics, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA; The Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
* Address for correspondence: Dr Camelia C. Minică, Department of Biological Psychology, Vrije Universiteit Amsterdam, Transitorium 2B-03, Van der Boechorststraat 1, 1081 BT, Amsterdam, the Netherlands. E-mail: camelia.minica@gmail.com

Abstract

Sequence-based association studies are at a critical inflexion point with the increasing availability of exome-sequencing data. A popular test of association is the sequence kernel association test (SKAT). Weights are embedded within SKAT to reflect the hypothesized contribution of the variants to the trait variance. Because the true weights are generally unknown, and so are subject to misspecification, we examined the efficiency of a data-driven weighting scheme. We propose using a set of theoretically defensible weighting schemes and assume that the scheme yielding the largest test statistic is the one most likely to capture the allele frequency–functional effect relationship. We show that the use of alternative weights obviates the need to impose arbitrary frequency thresholds. As both the score test and the likelihood ratio test (LRT) may be used in this context, and may differ in power, we characterize the behavior of both tests. The two tests have equal power if the set includes weights resembling the correct ones. However, if the weights are badly specified, the LRT shows superior power, owing to its robustness to misspecification. With this data-driven weighting procedure, the LRT detected significant signal in genes located in regions already confirmed as associated with schizophrenia, PRRC2A (p = 1.020e-06) and VARS2 (p = 2.383e-06), in the Swedish schizophrenia case-control cohort of 11,040 individuals with exome-sequencing data. The score test is currently preferred for its computational efficiency and power. Indeed, assuming correct specification, the score test is in some circumstances the most powerful test. However, the LRT has the advantageous properties of being generally more robust and more powerful under weight misspecification. This is an important result given that, arguably, misspecified models are likely to be the rule rather than the exception in weighting-based approaches.
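The idea of choosing among candidate weighting schemes by the size of the test statistic can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a quantitative trait, an intercept-only null model, beta-density weights on the minor allele frequency, and weights rescaled to unit norm so that the statistics are on a roughly comparable scale. The function names, the candidate set `SCHEMES`, and the rescaling step are all assumptions introduced for illustration.

```python
import math
import numpy as np

def beta_density(x, a, b):
    """Beta(a, b) density at x in (0, 1); the analogue of R's dbeta(x, a, b)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x))

def skat_q(y, G, w):
    """SKAT-type score statistic Q = sum_j (w_j * g_j' (y - mean(y)))^2."""
    resid = y - y.mean()      # residuals under an intercept-only null model
    scores = G.T @ resid      # one score contribution per variant
    return float(np.sum((w * scores) ** 2))

def pick_weighting(y, G, schemes):
    """Return the name of the scheme with the largest Q, and all Q values."""
    maf = G.mean(axis=0) / 2.0
    keep = (maf > 0) & (maf < 1)   # drop monomorphic variants
    G, maf = G[:, keep], maf[keep]
    q = {}
    for name, (a, b) in schemes.items():
        w = beta_density(maf, a, b)
        w = w / np.linalg.norm(w)  # ASSUMPTION: unit-norm weights for comparability
        q[name] = skat_q(y, G, w)
    best = max(q, key=q.get)
    return best, q

# Hypothetical candidate set of (a, b) pairs for the beta-density weights.
SCHEMES = {"beta(1,25)": (1, 25), "beta(0.5,0.5)": (0.5, 0.5), "flat": (1, 1)}

# Toy data simulated under the null: 2,000 individuals, 50 rare variants.
rng = np.random.default_rng(1)
maf_true = rng.uniform(0.005, 0.05, size=50)
G = rng.binomial(2, maf_true, size=(2000, 50)).astype(float)
y = rng.standard_normal(2000)

best, q = pick_weighting(y, G, SCHEMES)
print(best, q[best])
```

Because the maximum is taken over several schemes, any p-value attached to it has to account for that selection, for example by permutation. The abstract does not state how this is handled, so the sketch stops at the statistic.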

Information

Type: Articles
Copyright © The Author(s) 2017

FIGURE 1 The power of the likelihood ratio test (LRT) and the score test to detect a gene harboring 50 functional variants (minor allele frequency 0.5–5%) that jointly explain 1% of the phenotypic variance. Data were simulated according to weights dbeta(0.5, 0.5). Power was evaluated in 1,000 datasets consisting of 10,000 individuals each.
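One dataset of the kind described in this caption could be generated roughly as follows. This is a sketch under stated assumptions, not the authors' simulation code: it assumes independent variants, allele frequencies drawn uniformly between 0.5% and 5%, all effects positive and proportional to the dbeta(0.5, 0.5) weight, and a standard normal residual. The function names `dbeta` (mirroring the R function) and `simulate_gene` are hypothetical.

```python
import math
import numpy as np

def dbeta(x, a, b):
    """Beta(a, b) density at x in (0, 1), mirroring R's dbeta."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x))

def simulate_gene(n=10_000, n_variants=50, var_explained=0.01, seed=0):
    """Simulate genotypes and a phenotype with a fixed share of genetic variance."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.005, 0.05, size=n_variants)      # MAF between 0.5% and 5%
    G = rng.binomial(2, maf, size=(n, n_variants)).astype(float)
    effects = dbeta(maf, 0.5, 0.5)                       # ASSUMPTION: effect proportional to weight
    genetic = G @ effects
    genetic = (genetic - genetic.mean()) / genetic.std() # standardize the gene score
    noise = rng.standard_normal(n)
    # Scale the two parts so the gene explains var_explained of the variance in expectation.
    y = math.sqrt(var_explained) * genetic + math.sqrt(1.0 - var_explained) * noise
    return y, G, maf

y, G, maf = simulate_gene()
r2 = np.corrcoef(y, G @ dbeta(maf, 0.5, 0.5))[0, 1] ** 2
print(G.shape, round(float(r2), 4))
```

Repeating this over many seeds and recording how often a test rejects at a chosen significance level gives an empirical power estimate, which is the quantity the figure reports.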

TABLE 1 The Empirical 95% Confidence Intervals around the Type I Error for the Restricted Likelihood Ratio Test (LRT) and the Score Test, given data simulated under the null model of no association between the target region and the phenotype

TABLE 2 Results of the gene-based analysis run in the Swedish sample (N = 11,040)

TABLE 3 Results of the gene-based analysis run in the Swedish sample (N = 11,040)

TABLE 4 Results of the gene-based analysis run in the Swedish schizophrenia case-control sample (N = 11,040)

Supplementary material: PDF

Minică supplementary material

Tables and Figures