Comprehensive Multiomics Analysis of Monozygotic Twin Discordant for Double Outlet Right Ventricle

Abstract The objective of this study was to understand and measure epigenetic changes associated with the occurrence of CHDs by utilizing the discordant monozygotic twin model. A unique set of monozygotic twins discordant for double-outlet right ventricles (DORVs) was used for this multiomics study. The cardiac and muscle tissue samples from the twins were subjected to whole genome sequencing, whole genome bisulfite sequencing, RNA-sequencing and liquid chromatography-tandem mass spectrometry analysis. Sporadic DORV cases and control fetuses were used for validation. Global hypomethylation status was observed in heart tissue samples from the affected twins. Among 36,228 differentially methylated regions (DMRs), 1097 DMRs involving 1039 genes were located in promoter regions. A total of 419 genes, and lncRNA–mRNA pairs involved 30 genes, and 62 proteins were significantly differentially expressed. Multiple omics integrative analysis revealed that five genes, including BGN, COL1A1, COL3A1, FBLN5, and FLAN, and three pathways, including ECM-receptor interaction, focal adhesion and TGF-β signaling pathway, exhibited differences at all three levels. This study demonstrates a multiomics profile of discordant twins and explores the possible mechanism of DORV development. Global hypomethylation might be associated with the risk of CHDs. Specific genes and specific pathways, particularly those involving ECM–receptor interaction, focal adhesion and TGF–β signaling, might be involved in the occurrence of CHDs.

Congenital heart diseases (CHDs) are the most common type of congenital anomaly and the leading noninfectious cause of infant death (Liu et al., 2019).The mean prevalence of CHD 1970CHD -2017 globally was 8.224 (7.817, 8.641 globally was 8.224 (7.817, 8.641) per thousand live births (Liu et al., 2019).CHDs are much more common in twin pregnancies, with a reported prevalence of 20 in 1000 live births (Balasubramanian et al., 2021).
Double outlet right ventricle (DORV), a condition when both great arteries arise predominantly from the right ventricle, is a rare form of CHDs.In China, the birth prevalence of total CHDs has increased continuously over time, from 0.201% in 1980-1984 to 4.905% in 2015-2019; the birth prevalence of DORV is 0.059% (L.Zhao et al., 2020).Consequently, CHDs cause considerable suffering to patients and their families, and have become a sizable public health concern.
The etiology of the majority of CHDs remains largely unknown (van der Bom et al., 2011).It is widely believed that most CHDs arise from the interplay between genetic and environmental factors (Krauss & Hong, 2016;Tiret, 2002).Epigenetics has been proposed to be one of the main mediators of this interaction, and extensive studies have identified aberrant epigenetic patterns in several polygenic diseases, including CHDs (Cao et al., 2021;Feinberg, 2018;Gilsbach et al., 2014;Grunert et al., 2020;Lyu et al., 2018).Epigenetics involves the control of transcription and translation without changing the nucleotide sequence, and includes DNA methylation and histone modifications, which control gene accessibility, and noncoding RNAs, which primarily control mRNA translation.
Monozygotic (MZ) twins share almost all of their genetic information stored in DNA, but are often discordant for common complex diseases, such as diabetes, autism, schizophrenia, different types of cancer, as well as CHDs (Grunert et al., 2020;Lyu et al., 2018).MZ twins are advantageous material for studying epigenetics by controlling for many potential confounders (Castillo-Fernandez et al., 2014).One study conducting genomewide sequencing and methylation analysis in MZ twins discordant for DORV identified few genomic differences but 1566 differentially methylated regions (DMRs) showed a high correlation between hypermethylated promoters at ZIC3 and NR2F2 and downregulated gene expression levels of these two genes in patients with DORV compared to normal controls (Lyu et al., 2018).Another study performing genomewide high-throughput sequencing in MZ twins discordant for tetralogy of Fallot (TOF) observed DNA methylation changes in regulatory regions of known cardiac-relevant genes (Grunert et al., 2020).
In the present study, we obtained cardiac tissue samples from one MZ twin pair discordant for DORV, and comprehensively investigated genomewide genetic, DNA methylation, mRNA, lncRNA and protein-level differences, providing further insights into the etiology of CHDs

Study Subjects
The discordant monochorionic diamniotic twins were urgently delivered at 35 þ6 weeks gestation due to severe maternal preeclampsia.However, the twins died within 5 minutes after birth because of fetal status after resuscitation.Twin A had multiple congenital heart defects, including a DORV, with a ventricular septal defect, interrupted aortic arch, hypoplasia of the ascending aorta, thickening of the tricuspid valve leaflets, and right atrial dilation, while twin B had normal cardiovascular structures, but a diagnosis of selective intrauterine growth restriction (sIUGR; Figure S1).There was no evidence that twin-twin transfusion syndrome was applicable, based on the previously published Quintero criteria (Benoit & Baschat, 2014).The characteristics of the twins are detailed in Table S1.Monozygosity was confirmed by zygosity identification analysis.Cardiac and muscle tissue samples were collected from the same region of the conusarteriosus and the deltoid muscle of the right arm of the twins.Microsatellite analysis was used to confirm the zygosity of the twins (methodological details are given in Supplementary Additional file 3).Sporadic abortion cases were collected from Shenzhen Maternity and Child Healthcare Hospital and Xijing Hospital.All cases were all diagnosed as DORV by a professional pathologist and a cardiac surgeon.

Whole Genome Sequencing (WGS)
Genomic DNA from conusarteriosus tissue of the MZ twins was extracted using DNeasy Blood and Tissue Kit (Qiagen).A standard CG two-adapter library was constructed.Using short-sequence and paired-end sequencing strategies, genomes were sequenced with Illumina Hiseq TM 2000 (Illumina Inc, San Diego, CA, USA).Data processing included generating base sequences, alignment, local assembly of candidate sites, identification of variants, and QC of the results.CGA tools calldiff (http://www.completegenomics.com/analysis-tools/cgatools/) were used to compare the differences between the twins.The statistical results of WGS data output are shown in Table S2.

Whole Genome Bisulfite Sequencing (WGBS)
Genomic DNA from conusarteriosus and skeletal muscle tissue of the MZ twins were processed as follows: fragmentation, end repair, adapters ligation, bisulfite treatment, PCR amplification, library test.The processed libraries were then subjected to paired-end 90bp and single-end 50bp sequencing respectively by Illumina Hiseq TM 2000 sequencing system according to the manufacturer's instructions.The details regarding the sequencing and bioinformatics analysis are available in Supplementary Additional file 5.
The overall data output of WGBS, and the sequencing depth and coverage of CG sites in genomic elements and chromosomes are shown in Tables S3 and S4 respectively.
DMRs were screened with DMR different level ≥ 0.2 and Fisher exact test p value ≤ .005as filter conditions.Due to the relatively extensive research on the methylation function of the promoter regions, the DMRs that were significant in conus-case and conuscontrol pairs but not significant in muscle-case and muscle-control pairs were intersected with the gene promoter region, and the gene was outputted when the intersection was 100%.

RNA Sequencing
Total RNA from conusarteriosus and skeletal muscle tissue of MZ twin were processed as follows: purification, integrity assessment, removing ribosomal RNA (rRNA), random fragmentation, and reverse transcription, then synthesized into a double-stranded cDNA, terminal repair (blunt-ending), 3' terminal addition of A-residues, adaptor ligation, PCR amplification, purification and quality inspection of the library, cluster generation.The libraries were then subjected to paired-end sequencing by Illumina Hiseq TM 2000 sequencing system.The quality control standard of sequencing was as follows: the amount of data was about 15G per sample, and the ratio of base quality greater than 20 (Q20) in each direction was not less than 90%.The details regarding the sequencing and bioinformatics analysis of mRNA and lncRNA are available in Supplementary Additional file 8. Differences were compared using edgeR (Robinson et al., 2010).

Proteomic Expression Analysis
Four protein samples from the conusarteriosus and skeletal muscle of both twins were processed as follows: enzymolysis, desaltation, iTRAQ labeling, separation of components, and LC-MS/MS analysis.The details regarding the protein expression bioinformatics were available in Supplementary Additional file 9.
Three technical replica were carried out.A summary of the three replicate identifications of proteins are shown in Table S5.The number of peptides and proteins identified three times were 5820 and 1221, accounting for 48.66% and 55.58% of the total peptides (N 11961) and proteins (N 2197) respectively.

Gene Ontology Enrichment and Pathway Analysis
Gene ontology (GO) enrichment (http://www.geneontology.org/) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses (http://www.genome.jp/kegg/pathway.html)were applied to determine the functional roles and related pathways of differentially methylated regions, expressed genes or transcripts, or proteins.Whole genome sequencing results were used as enrichment background.Gene networks and canonical pathways representing key genes were identified using the Database for Annotation, Visualization and Integrated Discovery (DAVID; Sherman et al., 2022).Fisher's exact test and a chi-square test were used to select the significant pathway, and the false discovery rate (FDR) was applied to account for multiple comparisons.The threshold of significance was defined by the p value (p ≤ .005)and FDR (q ≤ 0.20).
Five abortion DORV cases were selected to validate gene expression.Using conus as lesion case and left ventricular as control, qPCR was performed to assess mRNA expression at specific genes.On the other hand, nine sporadic DORV cases and five normal controls were used to validate variation in the methylation levels of specific genes.DNA was extracted from conusarteriosus tissues to measure methylation levels by BSP.Differences in methylation levels between cases and controls were compared, to determine whether these followed the trend for differences between the twins.
Human umbilical vein endothelial cells (HUVECs) were cultured and treated with 4 μM 5-Aza-dC (Sigma-Aldrich, St Louis, MO, USA) for 48 h.Cells were harvested, then BSP and qPCR were performed to assess DNA methylation at specific loci and to validate mRNA expression.

Genomic Variations in MZ Twin Discordant for DORV
The number of differential SNVs between MZ twins was 18,824.Of these, the number of point mutation in coding sequence (CDS), introns, upstream of transcription start sites (TSS-upstream) and untranslated regions (UTRs) were 128, 6302, 685 and 96 respectively.Of the 128 differential SNVs in the CDS region, there were 69 nonsynonymous mutations, of which 29 mutations had a somatic score above -15 (Table S6).The mRNA expression analysis of these 29 SNVs showed that only the NRAP gene had differential mRNA expression.However, the result of Sanger validation showed that the twins had same genotype in NRAP gene; both were homozygous for TT.
Based on the WGS data, genomewide copy number variations (CNVs) were also screened.The number of specific CNVs for case and control was 9 and 13 respectively (Table S7).Among the differential CNVs between twins, only a few overlapped with gene regions, but have not been reported to be associated with heart disease.

DNA Methylation Differences Between Discordant Twin
WGBS revealed that the global methylation level of the affected twin was lower than the unaffected twin in both the conusarteriosus and muscle tissues (Figure 1A).Regarding the CG methylation sites, the global methylation level was 69.10% and 70.38% in the conusarteriosus of the affected and unaffected twin respectively, while that in the skeletal muscle was 69.36% and 70.05% in the affected and unaffected twin respectively.The resulting differences in the global CpG methylation levels between the twins that were relative larger in arterial conus and smaller in muscle tissue were confirmed by LC-MS/MS validation experiments (Figure 1B).The average methylation levels in genomic elements and chromosome are shown in Table S8 and Table S9 respectively.In terms of genomic regions, discordance in CpG methylation was significantly different between the two tissue types in promoter regions, CDSs, introns, and 3 0 -UTRs, but not in the 5 0 -UTRs or in the intergenic regions (Figure 1C).
According to the screening criteria, 36,228 DMRs that only occurred in the conusarteriosus tissue were obtained compared to skeletal muscle.A total of 1094 DMRs were listed in promoter region.The distribution of DMRs in promoter elements is shown in Figure 1D.Of these, 3% were located in promoter regions, involving 1039 genes.The GO enrichment in promoter DMRassociated genes in the twin case are shown in Table S10.Functional pathway analysis revealed 44 DMR-associated genes, involved in eight pathways as defined by the KEGG database, as shown in Table 1.Among these, the pathway involving ECMreceptor interaction was the most significant (EASE score = 9.05E-03).The hypomethylation of specific genes, such as COL1A1 and THBS3, were confirmed by BSP validation data (Figure 1E).

mRNA and lncRNA Expression Levels Between Discordant Twin
RNA-Sequencing revealed that 419 genes that were significantly differentially expressed with an expression fold-change >2 in the conusarteriosus and fold-change <2 in the skeletal muscle between the twin.The expression of these are listed in Table S11.The highly expressed genes by, such as COL1A1 and THBS3, were confirmed by qPCR validation data (Figure 2).The GO enrichment of differentially expressed genes are shown in Table S12.Functional pathway analysis revealed 10 pathways involving 46 genes, as shown in Table 2.Among these pathways, changes in focal adhesion pathways were the most significant (EASE score = 3.29E-08).The expression level of COL1A1and THBS3 by qPCR data were confirmed to be consistent with RNA-Seq results (Figure 2).
Thirteen novel and 33,370 known lncRNA transcripts were obtained.The novel lncRNA transcripts and their corresponding genes are displayed in Table S13.Among these, a total of 1818 lncRNAs were differentially expressed in the conusarteriosus between the twin, including 755 up-regulated and 1063 downregulated lncRNAs in the case twin.Among the potential target genes of differentially expressed lncRNAs, only 30 mRNAs were differentially expressed; the lncRNA-mRNA pairs are shown in Table S14.

Differential Proteins Between Discordant Twins
Out of the 1676 proteins that were repeatedly detected by LC-MS/MS, 62 proteins that were significantly different in the conusarteriosus tissue (fold-change ≥1.5 and p value ≤ .05)but not in muscle tissue were obtained.These protein-encoding genes function are shown in Table S15.Functional pathway analysis revealed that these genes clustered in eight pathways, as shown in Table 3, among which the extracellular matrix (ECM) -receptor interaction pathway was the most significant (EASE score = 8.89E-10).The MRM results showed that the specific peptide segment of COL1A1 protein was 2.48 times higher in the case twin than in the control twin; this confirmed COL1A1 higher expression by LC-MS/MS.

The Common Genes and Pathways of Multiple Omics Integrative Analysis
Among the 1039 promoter DMR-associated genes, 42 genes and 9 genes were differently expressed in mRNA and protein levels respectively.Five genes, including BGN, COL1A1, COL3A1, FBLN5 and FLAN, exhibited differences at all three levels.According to the analysis results of DNA promoter methylation, mRNA and protein expression, the pathways that were enriched in all three levels were ECM-receptor interaction, focal adhesion and TGF-β signaling pathway.The three enriched pathways and their corresponding genes that differed at different levels are shown in Figure 3. COL1A1 and COL3A1 participated in ECM-receptor interaction and focal adhesion pathway, and FLNA participated in focal adhesion pathway.Although THBS3 participated in all three pathways, it only showed differences at the methylation and mRNA level.

The Methylation and mRNA Expression Validation
The validation experiment in nine sporadic cases of DORV and five aborted normal controls showed that the average methylation levels at the promoter region of COL1A1 and THBS3 in cases were both lower than controls, following the trend seen in the twins (Figure 4A).The results in the cell model showed that after treatment with 5-Aza-dC, the methylation levels of COL1A1 and THBS3 genes decreased in the HUVECs.Concurrently, the expression of these two genes was increased (Figure 4B).Compared with the control group, 5-Aza-dC treatment led to 25.2-fold and 1.6-fold up-regulated expression of COL1A1 and THBS3 respectively.These results demonstrated that the hypomethylation of COL1A1 and THBS3 may cause up-regulation of their expression.

Discussion
In this study, we investigated specific tissues from discordant MZ twins, using multiomics strategies based on high-throughput techniques, which enabled us to perform the first systematic analysis of genome, methylation, transcriptome, proteome-wide molecular changes in CHDs.
We performed genomic sequence variation detection to identify post-twinning SNVs, short indel and CNV differences between the twins that could make one of them susceptible to CHD.However, we did not detect any discordant coding or regulatory SNV or indel variants or CNVs that would have been likely to cause CHD.The results were consistent with the outcomes of previous studies that were conducted in DORV (Lyu et al., 2018), TOF (Grunert et al., 2020) or CHDs (Y.Xu et al., 2017) discordant monozygotic twins.It has been reported that during the twinning process and/or embryonic development, the underlying genetic difference could be involved.For example, one study identified three CNV differences involving copy number gains in the affected member of a MZ twin discordant for CHDs, including 3.7 kb on 12p13.31,7.5 kb on Xp11.23, and 37.4 kb on Xq28 (Breckpot et al., 2012).In addition, studies have reported de novo somatic mutations in CHDdiscordant monozygotic twins, (Li et al., 2014;Zhang et al., 2016).
DNA methylation is an epigenetic modification that plays a critical role in heart development (Akerberg & Pu, 2020;  Gilsbach et al., 2014;Jarrell et al., 2019).In the present study, global hypomethylation status was observed in the heart tissue sample of the twin case with DORV.This is consistent with the results of two previous case-control studies: one assessed long interspersed nucleotide elements (LINE-1) methylation, which was representative of genomewide methylation status in right ventricular tissue samples from pediatric patients (Sheng et al., 2012), while the other conducted genomewide DNA methylation analysis in myocardium tissue samples from a fetal cases with cardiac defects (Zhou et al., 2021).However, one previous genomewide DNA methylation study found more hypermethylated than hypomethylated DMRs in myocardial samples from pediatric patients with CHDs  compared with controls (Grunert et al., 2016); another study using heart tissue DNA from fetuses presenting a variety of cardiac defect phenotypes, including DORV, found no significant differences in whole genome methylation pattern between fetuses with isolated cardiac defects and fetuses with normal development (Serra-Juhe et al., 2015).DNA methylation can change the normal gene expression and regulate crucial aspects of its function.One study revealed that hypermethylation of the NOTCH4 promoter region was associated with decreased NOTCH4 expression in patients with TOF (Zhu et al., 2020).Two studies demonstrated that aberrant methylation levels at the promoter CpG island shore of the ZFPM2 and HAND1 genes might be responsible for gene transcription regulation in patients with TOF (Sheng et al., 2016;Sheng et al., 2013).Another study suggested that demethylation in the TBX20 promoter region may be responsible for overexpression in the cardiac tissue of patients with TOF (Gong et al., 2019).In this study, hypomethylation and high mRNA expression level of COL1A1 and THBS3 genes were detected in the affected twin and sporadic cases.The cell model experiment demonstrated that decreased methylation levels of COL1A1 and THBS3 genes could lead to increased expression levels.Although there were no reports on the association of THBS3 and COL1A1 genes with cardiac development or CHDs, our results suggested that THBS3, a common gene in all three pathways, and COL1A1, a common gene in the first two pathways, might have potential functions in cardiac development.
In the multiomics analysis, five genes, including BGN, COL1A1, COL3A1, FBLN5 and FLAN, all showed differences in methylation, mRNA expression and protein level.Among them, only COL3A1 has been reported to be associated with the risk of CHDs (Morton et al., 2021;Tan et al., 2022).In addition, three critical pathways, viz.ECM-receptor interaction, focal adhesion and TGF-β signaling, were dysfunctional in the affected twin and enriched in all three levels.Recent studies have revealed that these pathways may play a role in appropriate cardiac outflow tract morphogenesis and aortic arch remodeling, as they may be involved in differentiation, migration, and survival of cardiac neural crest cells (Restivo et al., 2006) and the second heart field (Dyer & Kirby, 2009).Consequently, dysfunction within these pathways may lead to short outflow tracts (Lewandowski et al., 2015;Lu et al., 2016), including DORV.According to another theory of left-right (LR) asymmetry, DORV may be caused by the failure of shifting of the conotruncal septum to the left, so that both the aorta and the pulmonary artery would arise from the right ventricle (Rahman et al., 2020).During this process, TGF-β signaling is important for cardiac looping with appropriate L-R asymmetry (Tadjuidje et al., 2016).The ECM and focal adhesion pathways have been reported to be related to heart development and malformation.
This study had several strengths.First, our unique sample consisting of a MZ twin pair discordant for DORV allowed us to detect the epigenetic variation controlling for genotype, age, sex and other potential confounders, and provide credible results.Second, given that epigenetic regulation was tissue-specific, we have produced a tissue-specific multiomics map using tissue samples of the developing heart, which could enhance our capability to pinpoint the underlying mechanism of CHDs.Third, by undertaking a genomewide approach, we were able to uncover phenotype-relevant differentially methylated loci in genomic regions that are both novel and have been previously associated with CHDs.Fourth, compared with the previous studies of MZ twins discordant for CHDs, which only carried out genomewide genetic and DNA methylation analysis (Grunert et al., 2020;Lyu et al., 2018), this study systematically performed genomewide genetic and DNA methylation, and transcriptome and protein level analysis, providing a deeper understanding of the etiology of this complex disease.
Our study has several limitations.First, we only validated COL1A1 and THBS3.The other potentially related genes were not validated in the twins and sporadic cases.Second, the sample size of this study was relatively small, and the control fetus in the discordant twins was not completely normal but had iUGR.Although we did the validation experience on sporadic cases, the results on sequencing from only one pair of twins is hard to replicate under other circumstances.Ideally, we would hope to expand the sample size in future studies.Third, due to the relatively stringent screening criteria, genes or pathways that were different in methylation, mRNA expression and protein levels were considered candidates; this potentially reduced the number of candidates that need to be validated.Moreover, the differentially expressed lncRNA-mRNA pairs found in the twin requires further validation in additional cases.
In conclusion, this study is the first time a tissue-specific multiomics profile of DORV was delineated.Global hypomethylation might be associated with the occurrence of DORV.Multiomics molecular analysis suggested that several genes, such as BGN, COL1A1, COL3A1, FBLN5 and FLAN, and some pathways, particularly involving ECM-receptor interaction, focal adhesion and TGF-β signaling, might be involved in the occurrence of CHDs.These results will provide basic data resources for subsequent studies on the pathogenesis of CHDs.However, further studies are still required to investigate their mechanisms in the occurrence and development of CHDs.

Figure 1 .
Figure 1.Comparison of DNA methylation levels between the MZ twin and the distribution of DMR.(A) Overview the average genome-scale DNA methylation levels by WGBS.(B) The average genome-scale DNA methylation levels by LC-MS/MS.(C) The differences of methylation levels in genomic elements by WGBS.(D) The distribution of DMR in promoter elements.(E) The hypomethylation of COL1A1 and THBS3 by BSP.

Figure 2 .
Figure 2. The expression level of COL1A1and THBS3 by qPCR data were confirmed to be consistent with RNA-Seq results.

Figure 3 .
Figure 3.The three common pathway and genes at three levels.

Figure 4 .
Figure 4.The methylation and expression validation of COL1A1 and THBS3.A: The methylation level in sporadic cases and controls.B: The methylation and mRNA expression in cell model.

Table 1 .
Pathway analysis of promoter DMR associated genes

Table 2 .
Pathway analysis of differentially expressed genes

Table 3 .
Pathway analysis of differential protein-encoding genes