Psychometric and Classification Properties of the Peas in a Pod Questionnaire

Ally R. Avery; Eric Turkheimer; Siny Tsang; Glen E. Duncan

doi:10.1017/thg.2020.64

Published online by Cambridge University Press: 10 August 2020

Ally R. Avery*: Department of Nutrition and Exercise Physiology, Washington State University, Spokane, WA, USA
Eric Turkheimer: Department of Psychology, University of Virginia, Charlottesville, VA, USA
Siny Tsang: Department of Nutrition and Exercise Physiology, Washington State University, Spokane, WA, USA
Glen E. Duncan: Department of Nutrition and Exercise Physiology, Washington State University, Spokane, WA, USA
*: Author for correspondence: Ally R. Avery, Email: ally.avery@wsu.edu

Abstract

We examined the item properties of the Two Peas Questionnaire (TPQ) among a sample of same-sex twin pairs from the Washington State Twin Registry. Unlike the ‘two peas’ item, three of the four mistakenness items showed differential item functioning: monozygotic (MZ) and dizygotic (DZ) pairs may differ in their responses to these items even when they have similar latent levels of similarity and confusability. Of three methods compared for classifying the zygosity of same-sex twins, the unit-weighted pair zygosity sum score achieved an overall classification accuracy above 90%, providing an efficient and sufficiently accurate zygosity classification. Given the inherent nature of twin-pair similarity, the TPQ is more accurate in identifying MZ than DZ pairs. We conclude that the TPQ is a generally accurate, but by no means infallible, method of determining zygosity in twins who have not been genotyped.

Keywords

Zygosity; item factor analysis; differential item functioning; classification; latent class analysis

Information

Type: Articles
Information: Twin Research and Human Genetics, Volume 23, Issue 4, August 2020, pp. 247–255

DOI: https://doi.org/10.1017/thg.2020.64
Copyright: © The Author(s), 2020. Published by Cambridge University Press

The earliest twin studies (e.g. Merriman, 1924), conducted before World War II, were based on small samples that were studied in person by the investigator. Zygosity could be determined from either clinical impression or blood groups. It was only when large twin registries were established in Scandinavia that it became necessary to diagnose zygosity remotely using self-report questionnaires. The first systematic approach to the problem was undertaken in the Swedish Twin Register (STR; Magnusson et al., 2013), which asked participants whether they were ‘lika som bär’ (alike as berries). This is why the logo of the STR is a pair of cherries (‘körsbär’ in Swedish).

English versions of the Swedish questionnaire translated the expression as ‘alike as two peas in a pod’, and in the years since, the peas-in-a-pod question has become the centerpiece of zygosity questionnaires, which are often known as ‘peas in a pod questionnaires’. Although no universal standard for such questionnaires has ever emerged, and twin researchers and registries have used various versions of these items to assess zygosity, the item about peas is usually combined with a series of questions about whether the twins are confused by parents, family members and acquaintances. Many studies have demonstrated that self-report questionnaires of this kind can make accurate decisions about zygosity when validated against blood markers or genotyping (accuracy rates range from 92.4% to 98.8%; e.g. Eisen et al., 1989; Forsberg et al., 2010; Jackson et al., 2001; Jarrar et al., 2018; Magnus et al., 1983; Magnusson et al., 2013; Ohm Kyvik & Derom, 2006; Reed et al., 2005; Song et al., 2010). In this article, we refer to our particular version of the questionnaire as the Two Peas Questionnaire (TPQ).

It is somewhat surprising that no systematic examination of the psychometrics of the TPQ has ever been conducted. In fact, the questionnaire is more than just a simple list of questions that can be used with a cutoff to diagnose zygosity; it is a psychological measurement instrument, designed to measure self-reported subjective impressions of similarity and confusability. The validity of the questionnaire as a tool for classification is closely tied to its measurement properties.

There are several reasons to expect that the psychometrics of the TPQ and its application to classification would be less than perfectly straightforward. First, the questionnaire is by design administered to disparate groups of individuals, that is, monozygotic (MZ) and dizygotic (DZ) twins, who might be expected to react differently to questions about their similarity and confusability. Second, there is an asymmetry in the way biological differences bear on zygosity: even small differences are sufficient to show that a pair of twins is DZ, whereas a high degree of similarity is not sufficient to show that a pair is MZ. For example, twin pairs with different eye colors are almost certainly DZ, but pairs with the same eye color are not certain to be MZ. This asymmetry leads us to expect that the distribution of responses to the TPQ will differ between MZ and DZ twins. Third, when the questionnaire is used as a classification instrument, prior probabilities will usually favor a pair being MZ. Identical twins are often easier to ascertain within twin samples, but even where this is not the case, opposite-sex pairs are necessarily DZ and can be classified without the questionnaire, so the same-sex pairs to which it is applied contain a higher proportion of MZ twins. Finally, there is reason to expect that responses to the questionnaire will vary with age. Both classic (Scarr & McCartney, 1983) and more recent (Beam & Turkheimer, 2013) analyses show that twins become more different as they age, and that DZ pairs do so more rapidly than MZ pairs.

We report a series of psychometric and classificatory analyses in a large sample of twins who have been administered a TPQ, and a smaller subsample who have been genotyped to provide a biological criterion for zygosity. We estimate item factor analysis (IFA) parameters describing the psychometric properties of the questionnaire in the MZ and DZ groups and use them to identify differential item functioning (DIF) across groups. We then estimate the distributions of the latent similarity parameters in the two groups and explore several classification models based on the IFA model and methods based on latent class analysis (LCA).

Study 1

The primary goal of study 1 was to examine the item parameters of the TPQ among a sample of same-sex adult twin pairs with DNA-based zygosity. We used IFA models to examine potential DIF in the TPQ items between MZ and DZ twin pairs. IFA models describe the association between the latent trait level (i.e. the underlying trait of being identical) and item scores (i.e. scores on the TPQ), allowing DIF analyses that are not affected by potential differences in the latent trait distributions across groups (Embretson & Reise, 2000).

Methods

Participants

The current study utilized data from 753 same-sex adult twin pairs (33.9% men, 66.1% women) enrolled in the Washington State Twin Registry (WSTR) with DNA-based zygosity (72.4% MZ, 27.6% DZ). The WSTR is a community-based registry of twin pairs primarily recruited through Washington State Department of Licensing records. Details regarding the recruitment procedures of the WSTR and additional information are reported elsewhere (Duncan et al., 2019). Participants in this study were recruited into the WSTR between 2002 and 2014.

DNA Determination of Zygosity

DNA was extracted from twins using either whole blood or saliva (buccal cells). Zygosity was determined using either the AmpFlSTR® Identifiler® Plus PCR Amplification Kit or the PowerPlex® 16 HS System, per the manufacturer’s instructions. The two kits are nearly identical (Hannelius et al., 2007; Yang et al., 2006). Both are short tandem repeat multiplex assays that amplify 15 tetranucleotide repeat loci and the amelogenin sex-determining marker in a single PCR amplification. They include the 13 loci required for the Combined DNA Index System (CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51 and D21S11; Budowle et al., 1999) and two additional loci, D2S1338 and D19S433. The combination of these 15 loci and the amelogenin marker is consistent with zygosity tests conducted elsewhere (Yang et al., 2006). When the twins are compared with one another, DZ twins match at 25%–75% of the loci, whereas MZ twins match at 100% of the loci. Zygosity determination for twin pairs in this study was performed between 2009 and 2017.
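The matching rule in the preceding paragraph can be illustrated with a short Python sketch. It is not the laboratory's pipeline: the genotypes are hypothetical, each locus is stored as an unordered pair of allele repeat counts, and the amelogenin marker, genotyping quality control and any likelihood-based calling are omitted. Following the paragraph, a pair is called MZ only when every compared locus matches.

```python
from typing import Dict, Tuple

# Hypothetical representation: locus name -> (allele 1, allele 2) repeat counts.
Genotype = Dict[str, Tuple[float, float]]


def matching_fraction(twin_a: Genotype, twin_b: Genotype) -> float:
    """Fraction of loci typed in both twins at which the allele pairs are identical."""
    shared = sorted(set(twin_a) & set(twin_b))
    if not shared:
        raise ValueError("no loci typed in both twins")
    # Allele order is arbitrary, so compare sorted pairs.
    matches = sum(sorted(twin_a[locus]) == sorted(twin_b[locus]) for locus in shared)
    return matches / len(shared)


def call_zygosity(twin_a: Genotype, twin_b: Genotype) -> str:
    """MZ only if every compared locus matches; any mismatch implies DZ."""
    return "MZ" if matching_fraction(twin_a, twin_b) == 1.0 else "DZ"


# Hypothetical genotypes at four of the CODIS loci named above.
twin_1 = {"D8S1179": (13, 14), "TH01": (6, 9.3), "vWA": (16, 17), "FGA": (21, 24)}
twin_2 = {"D8S1179": (14, 13), "TH01": (6, 9.3), "vWA": (16, 18), "FGA": (21, 24)}

print(matching_fraction(twin_1, twin_2))  # 0.75
print(call_zygosity(twin_1, twin_2))      # DZ
```

A single mismatching locus (here vWA) is enough to call the pair DZ, which mirrors the asymmetry between similarity and difference discussed in the introduction.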

Two Peas Questionnaire

Five items about childhood similarity were included in the WSTR enrollment survey. The ‘two peas’ item, ‘When you were children, were you and your twin as alike as two peas in a pod or of ordinary family resemblance?’, has been used by twin registries for many years and is a reliable predictor of zygosity (Eisen et al., 1989; Magnus et al., 1983; Reed et al., 2005; Sarna et al., 1978). Four mistakenness items ask, ‘When you were children, did the following people (parents, other relatives, teachers, and strangers) have difficulty telling you and your twin apart?’ (Buchwald et al., 1999; Eisen et al., 1989; Magnus et al., 1983; Reed et al., 2005). Each mistakenness item has four response categories (1 = never confused, 2 = rarely confused, 3 = sometimes confused, 4 = always confused). For ease of interpretation, these four mistakenness items are subsequently referred to as ‘parents’, ‘relatives’, ‘teachers’ and ‘strangers’, respectively.

Statistical Analysis

We used IFA to estimate the item parameters of the 10 TPQ items in each pair (5 items from each twin). The 10 items were operationalized as indicators of the underlying latent trait (θ) of being similar and easily confused (i.e. more MZ-like), with higher levels reflecting stronger endorsement of being identical and lower levels reflecting weaker endorsement. Because the TPQ items have ordinal response options, we used IFA, an alternative to the common linear factor model that is suited to categorical item responses (Wirth & Edwards, 2007). One factor loading was estimated for each of the five items (λ₁–λ₅). One threshold parameter (τ₁) was estimated for the dichotomous ‘two peas’ item, and three threshold parameters (τ₂₁, τ₂₂, τ₂₃ through τ₅₁, τ₅₂, τ₅₃) were estimated for each of the remaining four items, each of which has four response categories. All factor loadings and threshold parameters were constrained to be the same within twin pairs, and item covariances within twin pairs were allowed to differ between MZ and DZ twin pairs. Participants were designated as MZ or DZ using DNA-based zygosity.
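To make the ordinal IFA model concrete, the sketch below computes the response-category probabilities for a single item given θ, a loading λ and ordered thresholds τ. It assumes a probit (normal-ogive) link, P(Y ≥ k + 1 | θ) = Φ(λθ − τ_k), with residual variance fixed at 1; the study's Mplus parameterization may scale loadings and thresholds differently, and the parameter values in the example are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence

PHI = NormalDist()  # standard normal distribution


def category_probs(theta: float, lam: float, taus: Sequence[float]) -> List[float]:
    """P(Y = k | theta) for k = 1..K, where K = len(taus) + 1 ordered categories.

    Assumes a probit link: P(Y >= k + 1 | theta) = Phi(lam * theta - taus[k - 1]).
    """
    if any(later <= earlier for earlier, later in zip(taus, taus[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Probability of responding above each threshold (decreasing in the threshold).
    above = [PHI.cdf(lam * theta - tau) for tau in taus]
    bounds = [1.0] + above + [0.0]
    # Adjacent differences give the probability of each single category.
    return [bounds[k] - bounds[k + 1] for k in range(len(taus) + 1)]


# Hypothetical four-category mistakenness item (never, rarely, sometimes, always confused).
probs = category_probs(theta=0.0, lam=1.5, taus=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])  # [0.159, 0.341, 0.341, 0.159]

# The dichotomous 'two peas' item is the special case with a single threshold.
print(category_probs(theta=0.0, lam=1.5, taus=[0.0]))  # [0.5, 0.5]
```

Plotting these probabilities across a grid of θ values gives category response curves of the kind used in the Results.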

First, we fit a ‘free-baseline’ model in which the factor loading of a reference item (selected as described below) was fixed to 1 and its threshold parameters were constrained to be equal between MZ and DZ pairs (Stark et al., 2006). The factor loadings and threshold parameters for the remaining four items were allowed to differ between MZ and DZ pairs. To detect items with DIF, we then fit four constrained models in which, in addition to the reference item, the factor loadings and threshold parameters of one other item at a time were constrained to be equal between MZ and DZ twins. Items with DIF were identified from the change in chi-square relative to the free-baseline model. To control for type I errors due to multiple comparisons, a Bonferroni-corrected critical p value (.05/4 = .0125) was used.
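The chi-square comparison and Bonferroni rule can be sketched as follows. The sketch assumes that the difference between two nested models' chi-squares is itself chi-square distributed, which holds for maximum likelihood likelihood-ratio tests; if the models were estimated with a robust categorical estimator such as WLSMV, Mplus's DIFFTEST procedure would be needed instead of simple subtraction. The tail probability is computed from closed-form expressions so that only the standard library is needed, and the fit statistics passed to `chi2_difference_test` (a hypothetical helper) are made up for illustration.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2 ...
    if df % 2:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # ... then step up two df at a time:
    # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1), computed on the log scale.
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_difference_test(chi2_constrained, df_constrained, chi2_free, df_free,
                         alpha=0.05, n_tests=1):
    """Nested-model test: does constraining parameters significantly worsen fit?"""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    p = chi2_sf(delta_chi2, delta_df)
    critical = alpha / n_tests  # Bonferroni-corrected critical p value
    return delta_chi2, delta_df, p, p < critical


# Free-baseline vs fully constrained comparison reported in the Results, chi2(14) = 70.107:
print(chi2_sf(70.107, 14) < 0.001)  # True

# Hypothetical fit statistics for one DIF test among four (critical p = .05 / 4 = .0125):
print(chi2_difference_test(80.0, 20, 70.0, 16, n_tests=4))
```

In the hypothetical case, p is about .04, which would count as significant at .05 but not at the Bonferroni-corrected .0125.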

To identify the reference item(s), we fit a fully constrained model in which the factor loadings and threshold parameters of all items were constrained to be equal between MZ and DZ pairs. Next, we fit a series of augmented models by freeing the factor loadings and threshold parameters one item at a time. Any item whose parameters did not produce a statistically significant improvement in model fit when allowed to differ between MZ and DZ twins was identified as a potential reference item (Stark et al., 2006). To control for type I errors due to multiple comparisons, a Bonferroni-corrected critical p value (.05/5 = .01) was used.
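The selection rule can be written as a small helper function. In the sketch below, `select_reference_item` is a hypothetical name, and the inputs are placeholder values rather than the study's estimates (which are reported in Supplementary Table 2). It keeps the items whose freed parameters did not improve fit at the Bonferroni-corrected level and, following the choice described in the Results, returns the candidate with the smallest chi-square change.

```python
from typing import Dict, Optional, Tuple


def select_reference_item(results: Dict[str, Tuple[float, float]],
                          alpha: float = 0.05) -> Optional[str]:
    """Pick a reference item from item -> (delta_chi2, p_value) for freeing that item alone."""
    critical = alpha / len(results)  # Bonferroni: .05 / 5 = .01 for five items
    # Candidates: items whose freed parameters did not significantly improve fit.
    candidates = {item: delta for item, (delta, p) in results.items() if p >= critical}
    if not candidates:
        return None  # every item showed a significant change; no reference item
    # Among candidates, choose the smallest change in fit.
    return min(candidates, key=candidates.get)


# Placeholder values for illustration only (not the study's estimates).
placeholder = {
    "two peas": (6.0, 0.05),
    "parents": (40.0, 0.0001),
    "relatives": (30.0, 0.0001),
    "teachers": (25.0, 0.0005),
    "strangers": (4.0, 0.13),
}
print(select_reference_item(placeholder))  # strangers
```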

Reported model fit indices include the comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA) and standardized root mean square residual (SRMR). Descriptive statistics were computed in R version 3.5.3 (R Development Core Team, 2015), and IFA models were fit in Mplus version 8.1 (Muthén & Muthén, 2012).
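The chi-square-based fit indices have standard textbook definitions in terms of the fitted model's chi-square and that of a baseline (independence) model. The sketch below implements those single-group formulas for illustration only: statistical packages differ in details such as N versus N − 1 in RMSEA and adjustments for multiple groups or robust estimators, so it will not necessarily reproduce the Mplus values reported here. SRMR is omitted because it requires the residual correlation matrix rather than summary chi-squares. The chi-square values in the example are hypothetical, and n = 753 assumes the twin pair is the unit of analysis, as in the pair-wise 10-item setup described above.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (single-group textbook form, N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to the baseline (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis index (non-normed fit index); can fall outside [0, 1]."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)


# Hypothetical model and baseline chi-squares, for illustration only.
print(round(rmsea(120.0, 40, 753), 3),
      round(cfi(120.0, 40, 3000.0, 45), 3),
      round(tli(120.0, 40, 3000.0, 45), 3))
```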

Results

Descriptive Statistics

Of the 753 pairs of same-sex twins in this study, there were 545 (72.4%) MZ and 208 (27.6%) DZ twin pairs as determined by genotyping. Selected demographic characteristics of twin pairs in this study are presented in Table 1.

Table 1. Selected demographic characteristics of the Washington State Twin Registry (WSTR) twin pairs included in this study

MZ, monozygotic twins; DZ, dizygotic twins.

Descriptive statistics of the five TPQ items are shown in Table 2. For the ‘two peas’ item, most of the MZ twins (93%) reported that they were ‘as alike as two peas in a pod’, whereas the majority of the DZ twins (84%) responded that they were ‘of ordinary family resemblance’ when they were children. Concordance rates of the ‘two peas’ item are presented in Supplementary Table 1. For the four mistakenness items, larger proportions of MZ twins reported being confused by teachers and strangers (68% and 91% always confused, respectively) than by parents and other relatives (12% and 49% always confused, respectively) when they were children. On the other hand, small proportions of DZ twins reported being confused by teachers and strangers (11% and 20% always confused, respectively), and even smaller proportions reported being confused by parents and other relatives (3% and 6% always confused, respectively).
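These proportions also illustrate why prior probabilities matter when the questionnaire is used for classification. The sketch below applies Bayes' rule to a single twin's ‘two peas’ answer, using the rates of answering ‘two peas’ reported above (93% of MZ twins; 100% − 84% = 16% of DZ twins) and the genotyped sample's MZ share of 72.4% as the prior. It is illustrative only: it treats one twin's response in isolation, ignores the co-twin and the other items, and is not one of the classification methods evaluated in this article.

```python
def posterior_mz(prior_mz: float, p_resp_given_mz: float, p_resp_given_dz: float) -> float:
    """Bayes' rule: probability that a twin is MZ given one observed response."""
    joint_mz = prior_mz * p_resp_given_mz
    joint_dz = (1.0 - prior_mz) * p_resp_given_dz
    return joint_mz / (joint_mz + joint_dz)


P_PEAS_GIVEN_MZ = 0.93        # MZ twins answering 'two peas in a pod'
P_PEAS_GIVEN_DZ = 1.0 - 0.84  # DZ twins answering 'two peas' (84% said 'ordinary')

for prior in (0.724, 0.5):    # genotyped sample's MZ share vs. an even split
    print(prior, round(posterior_mz(prior, P_PEAS_GIVEN_MZ, P_PEAS_GIVEN_DZ), 3))
# 0.724 0.938
# 0.5 0.853
```

With the sample's MZ-heavy base rate, a ‘two peas’ answer implies roughly a 94% chance of being MZ, compared with about 85% under an even split.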

Table 2. Descriptive statistics of the Two Peas Questionnaire items (individual twin’s responses)

MZ, monozygotic twins; DZ, dizygotic twins; unknown, twin pairs with no DNA-based zygosity.

Note: Two peas: When you were children, were you and your twin as alike as two peas in a pod or of ordinary family resemblance? Parents: When you were children, how often did your parents have difficulty telling you apart? Relatives: When you were children, how often did other relatives have difficulty telling you apart? Teachers: When you were children, how often did teachers have difficulty telling you apart? Strangers: When you were children, how often did strangers have difficulty telling you apart?

Differential Item Functioning

Identify reference item

To identify the reference item, we fit a fully constrained model in which the factor loadings and threshold parameters of all items were constrained to be equal between MZ and DZ twins. The model fit was acceptable (CFI = .980, TLI = .975, RMSEA = .067, 90% CI [.055, .078], SRMR = .060). Next, we fit a series of augmented models in which, one item at a time, the factor loadings and threshold parameters were allowed to differ between MZ and DZ twins. Chi-square tests showed no statistically significant improvement in model fit when the parameters for the ‘two peas’ or ‘strangers’ item were allowed to differ between MZ and DZ twin pairs (Supplementary Table 2). Because the change in model fit was smallest for the ‘strangers’ item, it was used as the reference item in the subsequent analyses.

Test for DIF

To test for DIF among the self-report zygosity items, we first fit a ‘free-baseline’ model in which the factor loading of the ‘strangers’ item (the reference item identified above) was fixed to 1 and its threshold parameters were constrained to be equal between MZ and DZ pairs. The factor loadings and threshold parameters for the remaining four items (‘two peas’, ‘parents’, ‘relatives’ and ‘teachers’) were allowed to differ between MZ and DZ pairs. As shown in Table 3, the model fit was good (CFI = .990, TLI = .985, RMSEA = .052, 90% CI [.038, .066], SRMR = .054) and better than that of the fully constrained model, χ²(14) = 70.107, p < .001.

Table 3. Estimated factor loadings and thresholds of the free-baseline model for the self-report zygosity items

MZ, monozygotic twins; DZ, dizygotic twins; SE, standard error; RMSEA, root mean square error of approximation; CFI, comparative fit index; TLI, Tucker-Lewis index; SRMR, standardized root mean square residual. Note: Only the parameters of one twin are shown, as all item parameters are constrained to be the same within twin pairs. ‘Strangers’ is used as the reference item, with its factor loading fixed to 1 and its threshold parameters constrained to be equal between MZ and DZ twins.

Next, we fit four constrained models in which, in addition to the ‘strangers’ item, the factor loadings and threshold parameters of one other item at a time were constrained to be equal between MZ and DZ pairs. The fit of each constrained model was compared against the free-baseline model using chi-square tests (Supplementary Table 2). Constraining the ‘parents’, ‘relatives’ or ‘teachers’ item produced a statistically significant decrease in model fit, suggesting DIF between MZ and DZ twins on these three items; constraining the ‘two peas’ item did not.

We illustrate the similar item functioning (i.e. no DIF) of the ‘two peas’ item for MZ and DZ twins using category response curves (CRCs). As shown in Figure 1, the probabilities that MZ and DZ twins responded that they were ‘two peas in a pod’ or ‘of ordinary resemblance’ were similar. For example, at θ = 0 (i.e. the average latent level of similarity and confusability), there was a 98.8% chance that MZ twins responded that they were ‘two peas in a pod’ and only a 1.2% chance that they identified themselves as ‘of ordinary resemblance’. At the same latent trait level (θ = 0), DZ twins were also more likely to respond that they were ‘two peas in a pod’ (90.7%) than ‘of ordinary resemblance’ (9.3%).
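Under an assumed probit response function for the dichotomous item, P(‘two peas’ | θ) = Φ(λθ − τ), the loading drops out at θ = 0, so each percentage above pins down a group-specific threshold. The sketch below backs out those implied thresholds; it is illustrative only, since the thresholds in Table 3 may be on a different scale or use a different sign convention.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def endorse_prob(theta: float, lam: float, tau: float) -> float:
    """Assumed probit curve for 'two peas in a pod': P = Phi(lam * theta - tau)."""
    return PHI.cdf(lam * theta - tau)


def implied_threshold(p_at_zero: float) -> float:
    """At theta = 0 the loading drops out, so P = Phi(-tau) and tau = -Phi^-1(P)."""
    return -PHI.inv_cdf(p_at_zero)


tau_mz = implied_threshold(0.988)  # MZ: 98.8% 'two peas' at theta = 0
tau_dz = implied_threshold(0.907)  # DZ: 90.7% 'two peas' at theta = 0
print(round(tau_mz, 2), round(tau_dz, 2))  # -2.26 -1.32
```

The ‘ordinary resemblance’ curve is the complement, 1 − P, which is why the MZ and DZ percentages at θ = 0 each sum to 100%.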

Fig. 1. Category response curves (CRCs) of the ‘two peas’ item by zygosity among twin pairs with DNA-based zygosity.

DIF in the other three items is illustrated using CRCs (Supplementary Figure 1). Among twins with similar levels of the latent trait of being identical, DZ twins were more likely than MZ twins to respond that other people had difficulty telling them apart. For instance, at θ = 0, MZ twins were most likely to respond that they were ‘rarely confused’ (38.4%) or ‘sometimes confused’ (31.6%) by parents, whereas DZ twins were more likely to respond that they were ‘always confused’ (43.1%) by parents. Likewise, MZ twins at θ = 0 were most likely to respond that they were ‘always confused’ (48.2%) or ‘sometimes confused’ (45.6%) by relatives, whereas DZ twins at θ = 0 were most likely to respond that they were ‘always confused’ by relatives.

Discussion

In study 1, we estimated the item parameters for the TPQ items using IFA models and examined whether there was DIF between MZ and DZ twin pairs. Constraining the ‘two peas’ item parameters to be equal across zygosity did not significantly worsen model fit, suggesting that the ‘two peas’ item functions similarly for MZ and DZ twins. By contrast, three of the mistakenness items, ‘parents’, ‘relatives’ and ‘teachers’, showed DIF. For these items, the probabilities of responses may differ not only by individuals’ underlying trait of being similar and confusable (i.e. more MZ-like or more DZ-like) but also by their actual zygosity (i.e. true MZ or true DZ twins, based on genotyping).

When twin pairs’ responses are used to classify twins of unknown zygosity as MZ or DZ, DIF in the TPQ items may affect which pairs are assigned to each group. We therefore followed up these findings with a second study exploring several classification methods, to identify an effective method for assigning zygosity to twin pairs that have not been genotyped.

Study 2

In study 2, we investigated three methods for classifying twin pairs as MZ or DZ based on their TPQ responses: the unit-weighted pair zygosity sum (PZS) score, item response probabilities from an IFA model and item response probabilities from an LCA model.
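As a rough illustration of the first method, a unit-weighted sum adds every item code for both twins with equal weight and compares the total with a cutoff. The sketch below is hypothetical throughout: the item keys, the coding of the ‘two peas’ item (1 = ordinary resemblance, 2 = two peas), the direction of scoring and the cutoff of 20 are assumptions for illustration, not the values used in the article. Accuracy is computed against DNA-based zygosity.

```python
from typing import Dict, Sequence

# Item keys are hypothetical labels for the five TPQ items.
ITEMS = ("two_peas", "parents", "relatives", "teachers", "strangers")


def pzs(twin_a: Dict[str, int], twin_b: Dict[str, int]) -> int:
    """Unit-weighted pair zygosity sum: add all 10 item codes (5 per twin) with weight 1."""
    return sum(twin_a[item] + twin_b[item] for item in ITEMS)


def classify(score: int, cutoff: int) -> str:
    """Higher sums are more MZ-like; the cutoff is a hypothetical tuning value."""
    return "MZ" if score >= cutoff else "DZ"


def accuracy(predicted: Sequence[str], genotyped: Sequence[str]) -> float:
    """Share of pairs whose questionnaire call agrees with DNA-based zygosity."""
    if len(predicted) != len(genotyped) or not genotyped:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == g for p, g in zip(predicted, genotyped)) / len(genotyped)


# Assumed coding: two_peas 1 = ordinary, 2 = two peas; other items 1-4 (never..always).
twin_a = {"two_peas": 2, "parents": 2, "relatives": 3, "teachers": 4, "strangers": 4}
twin_b = {"two_peas": 2, "parents": 1, "relatives": 3, "teachers": 4, "strangers": 4}
score = pzs(twin_a, twin_b)
print(score, classify(score, cutoff=20))  # 29 MZ
print(accuracy(["MZ", "DZ", "MZ"], ["MZ", "DZ", "DZ"]))
```

Because every item gets the same weight, this approach ignores the item-level differences, including DIF, that the IFA-based method can model; its appeal is simplicity.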