The Healthy Twin Study, Korea (HT) is an ongoing multi-center cohort study, which was initiated in 2005 from a nation-wide twin and family database (Sung et al., 2002, 2006b). Since its inception, the HT has recruited 815 pairs of adult twins (a total of 3,690 individuals, comprising twins and their family members). The protocols and measurements are described in detail in a previous report (Sung et al., 2006a). In brief, the HT is recruiting adult like-sex twins through a national database, together with their first-degree family members, with no ascertainment by specific disease status. Extended questionnaires and health examinations have been provided upon recruitment and follow-up.
Since 2006, when the first report on the study design of the HT was published, the HT has been enriched in several respects. The registry has grown and included 3,690 participants as of July 2012. A biobank tailored for multiple ‘omics’ studies has been constructed. Protocols for follow-up have been established, and the second wave survey was conducted in 2011 with a follow-up rate greater than 70%. Several additional genome research projects have also been started; these will allow multi-dimensional omics data to be generated and, with multiple funding sources, more participants to be recruited. We expect this update to provide information of potential interest both inside and outside the twin research community.
Updates on Participants
Through continued recruitment from a nation-wide database, the Korean Twin Registry, using the same procedures described in a previous report (Sung et al., 2002), we have recruited a total of 3,690 individuals as of July 2012: 45% are like-sex twins, and the rest are their family members. An outline of current participants is given in Table 1. Because twin pairs are also allowed to join without their families, and larger families are encouraged to participate in the study, the size distribution of families is bimodal, with a peak at a family size of four. We believe the protocol of recruiting families along with twins has made the participation of monozygotic (MZ) twins easier. The HT has more MZ twins (78%) than dizygotic (DZ) twins and more females (61%). Excluding opposite-sex twins by design resulted in the excess of MZ twins. Zygosity was determined using a questionnaire developed by the authors (Song et al., 2010) or, for pairs whose zygosity was ambiguous on the questionnaire survey, through genetic markers (Christiansen et al., 2003; Song et al., 2010).
MZ = monozygotic twin; DZ = dizygotic twin; XZ = zygosity undetermined twin. *Zygosity estimation was based on genotype between 2005 and 2009, and on questionnaire only since 2010. †Zygosity of ‘XZ’ was assigned either when the questionnaire-based zygosity survey showed a discrepancy between the co-twins (i.e., MZ vs. DZ) or when both co-twins fell into the category of ambiguous zygosity.
Establishment of Biobank for Multi-Omics Studies
The HT was designed from the outset to enable omics studies, and so establishing a biobank was a priority. For each participant, genomic DNA was extracted and aliquoted; the buffy coat fraction, enriched for white blood cells and platelets, was either treated with an RNase inhibitor (RNAlater, until 2006) or snap frozen in liquid nitrogen (–180°C; since 2007). Epstein-Barr virus-transformed lymphoblastoid B-cell lines have been generated for about 65% of participants as a semi-inexhaustible resource of DNA. Plasma and serum are centrifuged within 90 minutes of collection, and two vials each of serum and plasma are immediately transferred to a portable liquid nitrogen tank to meet proteomics quality standards. Additionally, 12-hour urine samples are collected along with the collection time and total volume, which will be useful for analyzing metabolites and biomarkers of exposure. One vial of urine for each participant is also kept in a deep freezer (–80°C); other samples are stored at –25°C.
In 2010, a microbial study was initiated and additional participants were recruited. In this microbial study involving twins and their parents and siblings, the samples collected included stool samples (250 mL stool box), sputum, cervical smears (from women who took a Papanicolaou test), two smear samples from skin (dorsal surface of the arm and back), and, for the oral cavity, supra- and sub-gingival swab samples with mouthwash fluid taken after rinsing with clean water. An aseptic cotton ball or toothpick (oral cavity) was collected together with the microbial samples to serve as a control. Separate informed consent was obtained regarding the storage of specimens, the duration of storage, and the use of information and scientific or commercial products that would be generated from those biospecimens. Table 2 summarizes the biobank.
G = genomics; Eg = epigenomics; Tr = transcriptomics; Pr = proteomics; Metab = metabolomics; Mic = microbiome (= metagenomics).
Genotyping and Quality Control (QC) of Genetic Markers
Genomic DNA was extracted from venous blood samples drawn from all participants at their health examinations, and genotyping was performed in 2009 with the Affymetrix Genome-Wide Human SNP Array version 6.0. A conventional QC procedure for dense single nucleotide polymorphism (SNP) markers was carried out (WTCCC, 2007), and additional extensive marker cleaning was performed using familial relationships. The following SNPs were excluded: duplicated (3,011 SNPs); Hardy–Weinberg disequilibrium p < .001 or minor allele frequency <0.1 (288,426 SNPs); genotype missing rate >0.05 (4,227 SNPs); Mendelian inconsistency in >3 families (11,456 SNPs); and non-Mendelian multi-marker inconsistency in >3 families (47,594 SNPs). These exclusions reduced the total number of SNPs from 891,873 to 537,159.
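The exclusion counts reported above can be checked with a short tally. This is only a minimal sketch: the dictionary labels are illustrative and are not identifiers from the study's pipeline, and the calculation assumes that the listed categories do not overlap, as the reported totals imply.

```python
# Tally of SNPs removed at each QC step, using the counts reported in the text.
# The labels are illustrative; they are not names from the study's pipeline.
excluded = {
    "duplicated": 3_011,
    "hwe_or_low_maf": 288_426,
    "missing_rate_gt_0.05": 4_227,
    "mendelian_inconsistency": 11_456,
    "multi_marker_inconsistency": 47_594,
}

initial_snps = 891_873
total_excluded = sum(excluded.values())
remaining_snps = initial_snps - total_excluded

print(total_excluded)   # 354714
print(remaining_snps)   # 537159
```

The remainder agrees with the 537,159 SNPs reported after QC.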
Combined Asian HapMap Reference and Imputation of SNP Markers
To facilitate collaborative studies involving genome-wide association analyses, marker imputation was carried out to increase compatibility with other SNP marker sets. We built a new reference marker set using Asian HapMap3 data (release 2) with 1.39 million SNP markers and Korean HapMap data consisting of 90 unrelated Koreans with 1.66 million SNP markers (http://www.khapmap.org). The HapMap3 Asian panel was used to phase the Korean HapMap, resulting in 1.39 million markers for 260 persons. Family-based SNP marker imputation was performed in the following steps: first, each individual was treated as unrelated and a conventional method was applied using IMPUTE2 (Marchini & Howie, 2010); then, Mendelian-incompatible markers within the family were deleted for all family members; next, only the missing markers of the founders in each family were imputed again using IMPUTE2; finally, family-wise imputation was performed using BEAGLE, where the genetic markers of the founders were used as a reference in each family to impute the non-founders’ missing information. SNPs with R² (ratio of the variance of imputed genotypes to the binomial variance) <0.03 were excluded, resulting in a total of 1,387,466 SNPs. The mean (SD) of the imputation score (R²) was 0.997 (0.028).
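The R² filter can be illustrated with a minimal sketch that follows the definition given in the text: the variance of the imputed genotypes divided by the binomial variance 2p(1 − p). The function names `imputation_r2` and `keep_snp`, and the representation of genotypes as per-person allele dosages between 0 and 2, are assumptions made for illustration; this is not the exact statistic computed internally by IMPUTE2 or BEAGLE.

```python
from statistics import pvariance


def imputation_r2(dosages):
    """Ratio of the observed variance of imputed allele dosages (0..2)
    to the binomial variance 2p(1-p) expected at allele frequency p."""
    n = len(dosages)
    p = sum(dosages) / (2 * n)           # estimated allele frequency
    expected = 2 * p * (1 - p)           # binomial variance of a genotype
    if expected == 0:
        return 0.0                       # monomorphic marker: no information
    return pvariance(dosages) / expected


def keep_snp(dosages, threshold=0.03):
    """Keep a SNP unless its R2 falls below the threshold used in the text."""
    return imputation_r2(dosages) >= threshold


well_imputed = [0.0, 1.0, 2.0, 1.0]      # dosages spread as expected
uninformative = [1.0, 1.0, 1.0, 1.0]     # every dosage collapsed to the mean

print(imputation_r2(well_imputed))       # 1.0
print(keep_snp(uninformative))           # False
```

A marker whose imputed dosages all collapse toward the population mean carries little individual-level information, so its ratio approaches zero and it is dropped.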
Protocols for Longitudinal Study
The HT started its follow-up survey in 2008. Participants are invited to take a full health examination and questionnaire survey every 3 years. A third wave of follow-up began in 2012. In the second wave, 1,848 participants were re-examined out of 2,602 target individuals (72%; Table 3). Most health examinations and questionnaire-based surveys are repeated in every wave, with the exception of some tests that are repeated only at 5-year intervals, such as lung function or carotid artery Doppler scan. Some measurements, such as whole body dual-energy X-ray absorptiometry and echocardiogram, are taken only at the initial examination. Medical history between the health examinations is collected too (see www.twinkorea.org for detailed follow-up protocols). New participants consisted of previously non-participating members of existing families or those recruited from new research projects such as the Korean Microbiome study.
Multiple Omics Study Projects
The HT was initiated with the support of the Genome Research Center of the Center for Disease Control, Korea. The Korean Microbiome Project and other research projects are being conducted in parallel, and they enrich the data, information, and participant base of the HT.
The Korean Microbiome Project
The Korean Microbiome Project is analogous to the Human Microbiome Project, which aims to characterize the microbial communities that live on the human body and to examine associations between microbial diversity and susceptibility to human disease (Gevers et al., 2012). The Korean Microbiome Project was designed to recruit mainly twins. A twin study design particularly suits a microbiome study, because discordance in microbial profile among MZ pairs will provide reliable evidence of associations by canceling out noise from genomic DNA sequences or unmeasured environmental factors. For more than 200 stool samples of twins, the V2 and V3 regions of bacteria-specific 16S rRNA genes were amplified and sequenced using the 454 Life Sciences FLX Titanium (Roche, Indianapolis, IN, USA) or Illumina HiSeq (Illumina, San Diego, CA, USA). After removing low-quality sequences (quality score < 25), phylogenetic analyses and taxa allocation were done using Quantitative Insights Into Microbial Ecology (http://qiime.sourceforge.net; Caporaso et al., 2010). Association studies between microbial profile and obesity and metabolic syndrome are being conducted.
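The quality-filtering step can be sketched as follows. The text does not say how the quality score was summarized per sequence, so this sketch assumes a mean Phred score per read and FASTQ-style quality strings with an ASCII offset of 33. The helpers `mean_phred` and `filter_reads` are hypothetical and are not part of QIIME.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a read, assuming ASCII-encoded qualities (offset 33)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def filter_reads(reads, min_quality=25):
    """Keep reads whose mean quality is at least min_quality.

    reads: iterable of (read_id, sequence, quality_string) tuples.
    Reads with an empty quality string are dropped.
    """
    return [r for r in reads if r[2] and mean_phred(r[2]) >= min_quality]


# Toy reads invented for illustration only.
reads = [
    ("read1", "ACGT", "IIII"),   # 'I' encodes Phred 40
    ("read2", "ACGT", "5555"),   # '5' encodes Phred 20
]
kept = filter_reads(reads)
print([read_id for read_id, _, _ in kept])   # ['read1']
```

Other summaries, such as a sliding-window minimum, would give different results at the same threshold; the mean is used here only to keep the illustration short.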
Other Parallel Projects
Recently, the National Research Foundation, Korea commenced a new project involving twins. Its Global Research Network program (GRN, 2011–2014), a pilot phase program, awarded a grant to support international collaboration between twin registers. The GRN aims to: (1) facilitate a global twin registry network; (2) collect obesity-discordant and cardiovascular disease-discordant twin pairs through the network; and (3) identify risk factors and complications of both underweight and obesity using the ‘normal’ weight co-twin as a control (co-twin–control study). Despite the unique strength of studies involving discordant twins, the single most important barrier to co-twin–control studies is the scarcity of such pairs. Identical twins comprise 0.3–0.6% of populations, and only a small fraction of them are discordant for the phenotypes of interest. International collaboration will enable researchers to recruit discordant twin pairs. Some new twin pairs, particularly disease- or health status-discordant MZ twins, will be recruited through this program.
More support came from the Next Generation Personalized Medicine Project, Korea (PGM21). This program aims to discover genetic variants associated with common important diseases and conditions, and to apply the genetic information to generate preventive strategies. We are conducting a wide range of genome-wide association studies, not only with disease outcomes but also with disease prediction models that can be applied to preventive measures. The PGM21 (2012–2015) will support the analysis of existing data, as well as further recruitment of participants and genotyping, which will be necessary for the replication of findings.
Conclusion
The HT has expanded considerably since the previous report of 2006. With its growing size, modern resources for omics studies, and increasing multi-dimensional information, the HT will be able to serve as a valuable resource for twin research, suitable for genetic epidemiology studies of common diseases and risk factors.
Acknowledgments
This study was supported by the National Genome Research Institute, Korea, National Institute of Health research contract (budgets 2011E7101100, 2012E7100200), National Research Foundation of Korea (NRF 2011-220-E00006; NRF 2010-0029113; NRF 2012K2A1A2032536; and NRF 2010-0025814), and PGM21 (A111218-12-GM02). SY, MJK, and YCH were supported by the BK21 program. The views expressed in this article are those of the authors and not necessarily those of any funding body.