TwinsUK: The UK Adult Twin Registry Update

Abstract TwinsUK is the largest cohort of community-dwelling adult twins in the UK. The registry comprises over 14,000 volunteer twins (14,838 including mixed, single and triplets); it is predominantly female (82%) and middle-aged (mean age 59). In addition, over 1800 parents and siblings of twins are registered volunteers. During the last 27 years, TwinsUK has collected numerous questionnaire responses, physical/cognitive measures and biological measures on over 8500 subjects. Data were collected alongside four comprehensive phenotyping clinical visits to the Department of Twin Research and Genetic Epidemiology, King’s College London. Such collection methods have resulted in very detailed longitudinal clinical, biochemical, behavioral, dietary and socioeconomic cohort characterization; it provides a multidisciplinary platform for the study of complex disease during the adult life course, including the process of healthy aging. The major strength of TwinsUK is the availability of several ‘omic’ technologies for a range of sample types from participants, which includes genomewide scans of single-nucleotide variants, next-generation sequencing, metabolomic profiles, microbiomics, exome sequencing, epigenetic markers, gene expression arrays, RNA sequencing and telomere length measures. TwinsUK facilitates and actively encourages sharing the ‘TwinsUK’ resource with the scientific community — interested researchers may request data via the TwinsUK website (http://twinsuk.ac.uk/resources-for-researchers/access-our-data/) for their own use or future collaboration with the study team. In addition, further cohort data collection is planned via the Wellcome Open Research gateway (https://wellcomeopenresearch.org/gateways). The current article presents an up-to-date report on the application of technological advances, new study procedures in the cohort and future direction of TwinsUK.

TwinsUK is the largest adult twin registry in the UK and is one of the most deeply phenotyped and genotyped datasets in the world. It provides a multidisciplinary platform to research both health- and social-related questions, with the overarching aim of understanding the etiology of complex disease and the aging process. The registry was started in 1992, with the initial intention to investigate osteoporosis and osteoarthritis. Such conditions are highly prevalent in women, and consequently, several hundred middle-aged women were recruited and formed the core of the initial register. Success from these early studies led to a rapid expansion of TwinsUK, and to date the cohort consists of 14,000 community-dwelling twins, male and female, aged over 18 years. Current research areas of interest include the genetics of metabolic syndrome, cardiovascular disease, the musculoskeletal system, sensory impairment and aging, as well as how the microbiome affects human health. Details of the registry's progression have been described previously (Moayyeri et al., 2013; Spector & Williams, 2006). To date, the TwinsUK registry has contributed to over 850 publications and 800 international collaborations. A more detailed description of research outputs may be accessed through the study website: http://www.twinsuk.ac.uk.

The Collection
Over the last 27 years, the TwinsUK registry has been enhanced through over 80 studies, some of which have been repeated over time. This has resulted in clinically rich, longitudinal phenotype information (Table 1), which may be categorized into four distinct time points (Verdi et al., 2019). Recruitment strategies have predominantly involved media campaigns. These have offered opportunities for adult twin pairs to join the registry and participate in research investigating various common diseases, without selection for particular diseases or traits. At baseline (1992-2004), over 7000 twins responded to annual questionnaires, and approximately 5500 twins attended a full comprehensive clinical visit, which included several project-led studies. Age-matched characteristics of these volunteer twins were found not to differ from a singleton population-based cohort of British women (Chingford study; Andrew et al., 2001), apart from a lifelong lower weight in monozygotic (MZ) twins of approximately 1 kg. The follow-up visit occurred between April 2004 and May 2007, in which 3725 twins in the registry attended a full-day clinical visit, and an additional 1299 twins posted blood taken via their GPs for DNA sampling. Participants were aged between 18 and 82 years (mean 52.5 ± 13 years) and the majority were female (89%). Protocols for the baseline and initial follow-up visit have been described previously (Spector & Williams, 2006).
The second wave of follow-up visits (August 2007-April 2012), the Healthy Ageing Twin Study (HATS), aimed to investigate the aging process. Inclusion criteria were women aged ≥40 years with at least one previous clinical visit (n = 4610). In total, 3125 women (mean age 59.6 ± 9 years) attended the clinical visit. Follow-up time between first and last visits ranged between 6.1 and 17.4 years, with over 600 of the participants having 4 or more previous clinical visits. HATS outcomes have previously been described (Moayyeri et al., 2012), including details of data collection (Moayyeri et al., 2013).
The third wave of follow-up visits (May 2012-May 2018) was performed to understand the interactions in disease processes between genes and the environment, as part of the Biomedical Research Centre (BRC) study. All participants of the TwinsUK registry were invited to attend a comprehensive clinical visit, which included collection of bone density/whole-body scan, cognitive and lung function, hearing and eye tests, fitness assessment (gait speed, chair stands and grip strength) and collection of blood, urine, stool and salivary samples. In total, 6686 clinical visits were made, with 3620 volunteers attending at least once and 1531 volunteers attending the clinic on 2 occasions, with an average of 4 years between visits. In addition to the clinical visit, 6300 questionnaires were returned, complementing clinical data collected during the visit. Since February 2019, a further wave of follow-up visits has commenced that aims to continue the longitudinal data collection and adds further dynamic phenotyping and blood measurement over a 6-h visit incorporating standardized meals.

Longitudinal Data
Detailed clinical and biochemical phenotypes have been collected using harmonized protocols at each visit stage. A summary of a selection of clinical phenotypes is outlined in Table 2. In addition, questionnaire data have been collected on an annual basis and during visits, some of which measure incident clinical endpoints such as cardiovascular accident, type 2 diabetes and chronic obstructive pulmonary disease, which have previously been described (Verdi et al., 2019). Three main comprehensive questionnaires ('TwinsUK Baseline Health', 'Baseline Core' and 'Longitudinal Core') were collected between 2004 and 2018 (detailed in Table 3). These were in paper format, completed at respondents' addresses and returned to the research facility. Over 2500 participants completed all 3 main questionnaires and 2300 completed any 2 of the main questionnaires. Furthermore, the demographic of the cohort provides an excellent resource to study aging where longitudinal changes are important to consider. Table 4 provides summaries of the key cognitive and frailty phenotypes we have acquired to explore questions in this area.
Alongside regular visits and questionnaires, TwinsUK has data linkage to official cancer and mortality data for retrospective analysis and future follow-up. Additional links between our own database and national health, education and environmental records are being established at present.

Novel Molecular and Genetic Phenotypes
In addition to epidemiological and clinical phenotypes collected from clinical visits, numerous biological samples, including body fluids (blood, urine, saliva, stool and sebum) and tissue (hair follicle, colonic mucosa, fat and skin biopsy), have been generously donated. Details of the samples collected are summarized in Table 1 and in Verdi et al. (2019). Collection methods have been described in their respective research publications, which can be found on the TwinsUK website (http://www.twinsuk.ac.uk). Here we describe the omic techniques (genomewide association studies, epigenomics, next-generation sequencing, metagenomics, metabolomics and microbiomics) that have been employed on biological samples and phenotypes from TwinsUK data. Details of some phenotypes collected prior to 2012 (e.g., telomere length) have previously been described (Moayyeri et al., 2013).

Genomewide Association Studies
TwinsUK has contributed to many international consortia for genomewide association analysis of various phenotypes (Mills & Rahal, 2018). Genomewide scan data using 2 chips (Illumina HumanHap300 BeadChip and Illumina HumanHap610 QuadChip) are available for 5654 (both MZ and dizygotic [DZ]) twins. The data have been fully imputed using the '1000 Genomes' and 'Haplotype Reference Consortium (HRC)' reference panels. TwinsUK is a member of many ongoing international consortia for meta-analysis of various traits such as height, BMI, lipids, obesity, blood pressure and back pain phenotypes. Some of the main publications from these collaborations can be found on the TwinsUK website. Our genomewide data are also being used to compile polygenic risk scores to isolate loci for various traits.
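As an illustration of the polygenic risk scores mentioned above, a score is conventionally computed as a weighted sum of effect-allele dosages across variants. The following is a minimal sketch under that assumption only; the variant identifiers, weights and dosages are hypothetical and do not represent TwinsUK data or the pipeline used by the study team.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages.

    Only variants present in both mappings contribute; variants missing
    from either one are skipped (an assumption of this sketch).
    """
    shared = sorted(dosages.keys() & weights.keys())
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical per-variant effect sizes (e.g., taken from a published GWAS).
weights = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}

# Hypothetical dosages (0-2 copies of the effect allele) for one individual.
dosages = {"rs_a": 2.0, "rs_b": 1.0, "rs_c": 0.0}

score = polygenic_score(dosages, weights)
print(f"polygenic score: {score:.3f}")
```

In practice, imputed dosages may be fractional, and scores are usually standardized within the cohort before analysis.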

Epigenetic Markers
The first large-scale genomewide epigenetic assessment in TwinsUK was performed on DNA methylation patterns profiled on the Illumina HumanMethylation27 BeadChip in a whole blood sample of 172 female twins. This array examines 27,578 promoter CpG sites that map uniquely across the genome, and some of these sites were found to be associated with age and age-related phenotypes (Bell et al., 2012). Subsequently, the Illumina Infinium HumanMethylation450 BeadChip was applied to up to 1000 blood samples to generate higher resolution genomewide DNA methylation profiles (Zhang et al., 2015), as well as to 322 skin (Roos et al., 2017) and 648 adipose (Grundberg et al., 2013) tissue biopsy samples from twins. More recently, the Illumina Infinium MethylationEPIC array is being used to profile additional blood samples from over 400 twins. Further epigenetic datasets in the TwinsUK cohort have also been generated as part of the EpiTwin study (http://www.epitwin.eu), which, in collaboration with the Beijing Genomics Institute, assayed epigenomic sequencing profiles in up to 5000 samples from twins aged 16-85 years. The results include methylated DNA immunoprecipitation sequencing profiles in whole blood samples from twins discordant and concordant for a wide variety of diseases and environmental exposures (Davies et al., 2014; Yuan et al., 2014).

Gene Expression Markers
During the HATS visit, 856 twins with detailed clinical profiles underwent biopsies of multiple tissues as part of the Multiple Tissue Human Expression Resource project. This was a Wellcome Trust-funded study designed to investigate gene expression across multiple tissues simultaneously, with the aim of examining mechanisms involved in common trait susceptibility. Gene expression in 3 tissues and derived cells (fat, skin and lymphoblastoid cell lines [LCL]) was determined using the Illumina whole genome expression array (HumanHT-12 version 3), comprising 48,803 probes, in 3 technical replicates (Grundberg et al., 2012). The same skin, fat and LCL RNA samples, plus an additional 400 whole blood samples, were RNA sequenced as part of the EuroBATS project (Biomarkers of Ageing using whole Transcriptome Sequencing), a European (EU-FP7) study (Buil et al., 2015).

Whole Genome Sequencing
Whole genome sequencing (WGS) of 2000 healthy, deeply phenotyped twins formed part of the UK10K project, which used state-of-the-art next-generation sequencing methods to uncover rare genetic variants associated with health and disease. The data have been used extensively to describe population structure and functional annotation of rare and low-frequency variants (UK10K Consortium et al., 2015); further details can be accessed at: www.uk10k.org. In addition, approximately 1000 exome sequences at 30-60× depth have been ascertained as part of [...] HiSeqX sequencer using a 150-base paired-end single-index read format. The data have been used to disentangle the contribution of rare variants to the blood metabolome, and are now under investigation to identify rare variants associated with complex diseases and traits, and for the inference of structural variants.
[Table residue: number of twins by decade of birth (1900-1909 to 1990-1999) and pair type (FF, MM, FM); counts not recoverable.]

Glycans
Glycosylation is the most common form of posttranslational protein modification and it is a putative mechanism in the modulation of the inflammatory response. The technology to assess glycosylation has recently become high throughput, and glycosylation of immunoglobulin G has been measured in 4900 twins, while N-glycans in human serum glycoproteins have been measured in 1800 twins. Using these data, we have found that glycans are highly heritable (Menni, Keser et al., 2013) and we have been the first to observe a number of associations between glycans and important age-related traits (Barrios et al., 2015; Menni, Gudelj et al., 2018).
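Heritability estimates of the kind reported above rest on comparing trait similarity in MZ and DZ pairs. A minimal sketch of the classical Falconer approximation is given below for orientation; it is not necessarily the model used in the cited glycan study, and the twin correlations shown are hypothetical values chosen for illustration.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Classical Falconer decomposition from twin-pair correlations.

    r_mz, r_dz: intraclass trait correlations in MZ and DZ pairs.
    Returns additive genetic (h2), shared environment (c2) and
    unique environment (e2) variance proportions.
    """
    h2 = 2.0 * (r_mz - r_dz)   # MZ pairs share about twice the segregating genes of DZ pairs
    c2 = 2.0 * r_dz - r_mz     # similarity not explained by genetic sharing
    e2 = 1.0 - r_mz            # whatever makes MZ co-twins differ
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations for a single trait.
estimates = falconer_estimates(r_mz=0.7, r_dz=0.4)
for component, value in estimates.items():
    print(f"{component}: {value:.2f}")
```

The three components sum to 1 by construction. Published twin analyses more often fit structural equation (ACE) models, which also provide confidence intervals.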

Microbiome
Alongside the BRC study (third follow-up), over 5000 fecal samples have been collected for microbiome analysis. Twin volunteers provided stool samples, stored on site (St Thomas' Hospital, London) at −80°C. DNA extraction and 16S rRNA sequencing using the V4 variable region of nearly 3000 samples have been completed in collaboration with Cornell University using a multiplexed approach on the Illumina MiSeq platform. Smaller subsets of twins have also been sequenced with complementary methods by the BRC Genomics Facility at King's College London. In addition, plain saliva (700) and midstream urine (1600) specimens have undergone similar 16S amplicon sequencing using the same primers in collaboration with University of California San Diego and Stanford University. Diversity metrics, taxonomic levels from genus through to phylum and relative abundances of operational taxonomic units (OTUs) have been used to assess microbiota associations within the TwinsUK data. Associations have been observed with a number of health deficits and medication usage (Jackson, Goodrich et al., 2016; Le Roy et al., 2017), and age-related traits, including frailty and cognition (Jackson, Jeffrey et al., 2016; Verdi et al., 2018), among others (Menni, Lin et al., 2018). In addition, microbiota associations with diet (Menni, Zierer et al., 2017; Ni Lochlainn et al., 2018) and socioeconomic status have been found. More recently, amplicon sequence variants, also known as exact sequence variants, have been generated from 3345 stool samples. This approach offers a higher resolution than the OTU, allowing for greater sensitivity and specificity in identifying the taxonomic associations with traits.
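To make the terms relative abundance and diversity metric concrete, the sketch below computes relative abundances and the Shannon index from a vector of per-taxon read counts. The Shannon index is used here only as one common example of an alpha-diversity metric; the text does not state which metrics were applied to TwinsUK samples, and the counts shown are hypothetical.

```python
import math
from typing import List, Sequence


def relative_abundance(counts: Sequence[int]) -> List[float]:
    """Convert raw per-taxon read counts to proportions summing to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with p > 0."""
    return -sum(p * math.log(p) for p in relative_abundance(counts) if p > 0)


# Hypothetical read counts for four taxa in one stool sample.
sample_counts = [50, 30, 20, 0]

print(relative_abundance(sample_counts))
print(f"Shannon index: {shannon_index(sample_counts):.3f}")
```

A sample dominated by a single taxon gives an index of 0, while an even community of k taxa gives ln(k). Real analyses usually rarefy or otherwise normalize sequencing depth before comparing samples.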

Metagenomics
Whole metagenomic shotgun sequencing (WMGS) has been performed on fecal samples in 2 batches comprising 250 and 1004 volunteers from the TwinsUK registry. The larger dataset, including 161 MZ twin pairs, 201 DZ twin pairs and 280 singletons, generated an average of 39M high-quality microbial reads per sample. Taxonomic and functional information has been inferred from the WMGS data. These results are being studied to determine the influence of the microbiome on the fecal and host metabolome, and to identify bacterial species and functions mediating microbiome-associated increased risk for common disease.

Dietary Phenotypes
TwinsUK has detailed datasets on dietary habits, which have been collected since the inception of the registry. Data vary and include dietary indices on >5000 participants (e.g., Mediterranean Diet Score, Healthy Eating Index-2010 and the Healthy Food Diversity Index; Bowyer et al., 2018). Dietary patterns, which are measured by category of foodstuff, have also been assessed through a Food Frequency Questionnaire previously used in the EPIC Study (Bingham et al., 2008). For details of collection, see Table 5.

Socioeconomic Data
The historical research focus of TwinsUK has shaped the main demographic of the twin cohort, which has a middle socioeconomic status and level of education typical of a volunteer group (Moayyeri et al., 2013; Steves et al., 2013). Socioeconomic status of the twin volunteers has been collected since the registry's inception through self-reported questions (e.g., highest educational qualification status). More recently, the Index of Multiple Deprivation (IMD) has been compiled for all volunteers having UK postal codes, and data are to be linked to national databases for retrospective and future collection.

Index of Multiple Deprivation
Datasets from online government data repositories were combined, representing the four countries of the UK: England (IMD version 2015), Scotland (IMD version 2016), Wales (IMD version 2014) and Northern Ireland (IMD version 2017). The IMD is a composite measure of area-level deprivation and considers the following domains: income; employment; education, skills and training; health deprivation; crime; barriers to housing and services; and the living environment. As methods may vary between the countries, and ranks are inappropriate (given the differing numbers of administrative districts in each country), the decile score was combined as a relative measure of deprivation. Datasets were matched to postcodes or Lower Layer Super Output Area (LSOA) codes at 17,498 time points for 12,041 individuals. Mean IMD decile score (considering all time points) was 6.49.
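The reasoning for pooling deciles rather than ranks can be illustrated with a short sketch: a rank is only meaningful relative to the number of areas in its own country, whereas a decile places every area on a common 1-10 scale. The example below assumes that rank 1 denotes the most deprived area and decile 1 the most deprived tenth; the area counts and records are hypothetical toy values, not the published IMD figures or TwinsUK data.

```python
from statistics import mean


def rank_to_decile(rank: int, n_areas: int) -> int:
    """Map a within-country rank (1..n_areas) to a decile (1..10).

    Uses integer ceiling division so that the first tenth of ranks
    maps to decile 1 and the last tenth to decile 10.
    """
    if not 1 <= rank <= n_areas:
        raise ValueError("rank must lie between 1 and n_areas")
    return (10 * rank + n_areas - 1) // n_areas


# Hypothetical number of ranked areas per country (toy values).
n_areas = {"England": 1000, "Scotland": 200}

# Hypothetical (country, within-country rank) records for four participants.
records = [("England", 650), ("Scotland", 130), ("England", 1000), ("Scotland", 1)]

deciles = [rank_to_decile(rank, n_areas[country]) for country, rank in records]
mean_decile = mean(deciles)
print(deciles, mean_decile)
```

Here ranks 650 of 1000 and 130 of 200 both fall in decile 7, although the raw ranks differ fivefold, which is why the raw ranks cannot be pooled across countries.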

Future Directions and Collaborations
Longitudinal and detailed clinical, biochemical, behavioral, socioeconomic and deep omic characterization (including multitissue characterization) of participants over nearly 30 years has provided a unique resource to study complex diseases and domains of healthy aging in the TwinsUK population. These data, in conjunction with novel dynamic testing at study visits and lifestyle intervention studies, offer an opportunity to explore personalized medicine. High-quality data collection, database management, biological sample storage and statistical quality control enhance the resource. In addition, a key strength of the resource lies in the highly engaged and loyal population; this is evident from the high retention levels of participation across studies. Blood, urine, DNA and multiple tissue samples are available for future measurements. Online questionnaires and active engagement with our twin participants using text messages, emails and social networking enable responsive and agile data collection. Our Volunteer Advisory Panel is key to developing new strategies and governance of participants, informing decisions about the ethics, practicalities and appropriateness of potential studies.
The TwinsUK registry has a history of numerous successful scientific collaborations, and we remain committed to providing the scientific community with access to the phenotype data from the TwinsUK Resource. TwinsUK has an exemplary record for data sharing, with over 800 data access requests, 150,000 samples shared with over 100 collaborators and over 600 publications in the past 6 years. Detailed descriptions for researchers of data and samples are on the data access pages of the website (http://www.twinsuk.ac.uk/data-access/cohortdata-description/); here, over 10,000 phenotypes can be searched. Longitudinal population studies (LPS) funding from the Wellcome Trust continues to fund the core functions of TwinsUK and opens up the resource to successful cross-cohort collaborations. Over the next 5 years, TwinsUK will integrate electronic health records into an enhanced deep tissue omics resource and continue to incorporate dynamic phenotypic testing into clinical visits. In addition, we will extend the age range of the registry to include volunteer twins from birth to adulthood, thus opening up the resource to study unique twin gene-environment interactions across the life course. New efficient broad consent will ensure that the communication with participating twins is ethical and proportionate. New annual sociological questionnaires will harmonize with the English Longitudinal Study of Ageing and other LPS (1946/1958). We will also standardize mental health phenotypes with the complementary Twins Early Development Study (TEDS) such that, together, the TwinsUK and TEDS cohorts will be an unparalleled twin resource across the life course. These developments will ensure TwinsUK will be a unique global resource of longitudinal omics and twin research across the life course, with immense potential for future scientific exploitation.