Associations between epigenome-wide DNA methylation and height-related traits among Sub-Saharan Africans: the RODAM study

Abstract Human height and related traits are highly complex, and extensively research has shown that these traits are determined by both genetic and environmental factors. Such factors may partially affect these traits through epigenetic programing. Epigenetic programing is dynamic and plays an important role in controlling gene expression and cell differentiation during (early) development. DNA methylation (DNAm) is the most commonly studied epigenetic feature. In this study we conducted an epigenome-wide DNAm association analysis on height-related traits in a Sub-Saharan African population, in order to detect DNAm biomarkers across four height-related traits. DNAm profiles were acquired in whole blood samples of 704 Ghanaians, sourced from the Research on Obesity and Diabetes among African Migrants study, using the Illumina Infinium HumanMethylation450 BeadChip. Linear models were fitted to detect differentially methylated positions (DMPs) and regions (DMRs) associated with height, leg-to-height ratio (LHR), leg length, and sitting height. No epigenome-wide significant DMPs were recorded. However we did observe among our top DMPs five informative probes associated with the height-related traits: cg26905768 (leg length), cg13268132 (leg length), cg19776793 (height), cg23072383 (LHR), and cg24625894 (sitting height). All five DMPs are annotated to genes whose functions were linked to bone cell regulation and development. DMR analysis identified overlapping DMRs within the gene body of HLA-DPB1 gene, and the HOXA gene cluster. In this first epigenome-wide association studies of these traits, our findings suggest DNAm associations with height-related heights, and might influence development and maintenance of these traits. Further studies are needed to replicate our findings, and to elucidate the molecular mechanism underlying human height-related traits.


Introduction
Human height and height-related traits, such as leg-to-height ratio (LHR), leg length, and sitting height, are complex traits that are determined by both genetic and environmental factors.Moreover, these traits have been shown to have low correlations with each other and therefore are assumed as independent variables when utilized by epidemiologists to measure growth factors among populations at risk of prolonged malnutrition. 1The longitudinal effects of environmental and genetic factors on height have been recorded since the first published Developmental Origin of Disease Hypothesis (DODH) by Baker.Investigating geneenvironment interaction in the form of DNA methylation (DNAm) as part of DODH is of potential relevance, as only the use of genome-wide association studies (GWAS) loci scores could only explain 18.5-19.8% of inter-individual variation in height heritability. 2Additionally, GWAS have identified 3290 loci associated with height and height-related traits, however those loci only represent 24.6% of the variance in height. 3,4][10] In the light of low explained variance of genetic factors and the important effect of environmental factors, studying epigenetic modifications associated with height and height-related traits is of interest, as they can reflect gene-environment interaction (Fig. 1). 1,11Epigenetics is the study of heritable, yet reversible molecular modifications of DNA without altering the DNA sequence 12 and comprises histone modifications, microRNA, and DNAm of which the latter is the most commonly studied.Epigenetic processes influence the phenotypic development, through regulation of gene expression. 12revious studies have found genes relating to bone cell development and maintenance to be under the influence of DNAm 13,14 but epigenome-wide association studies (EWAS) on height and heightrelated traits have not been performed.The aim of this study was to assess the relation between DNAm and height and height-related traits (leg length, sitting height, and LHR), in a population of Ghanaians, in order to increase our physiological understanding of growth regulation.This of particular relevance to a West African population, among whom an association between height and height-related with cardiometabolic conditions later in life was demonstrated. 15

Study population
This study population of adults was derived from the Research on Obesity and Diabetes among African Migrants (RODAM) study.The data collection procedures for the RODAM study have been published previously. 16The study was conducted from 2012 and 2015 with 6385 Ghanaians resident sampled from across in five geographical locations.Ghanaian individuals were randomly sampled from various urban and rural areas within Ghana and from migrant communities in Amsterdam, Berlin, and London 16 with the mean age for participants estimated to 51.2 (± 9.73 standard deviations (SD)).Ethical committees from participating institutions in Ghana, the Netherlands, Germany, and the UK approved the study before the start of data collection.Written informed consent was obtained from each study participant.All participants for the DNAm profiling were selected from a casecontrol design in which ~300 individuals were diabetic cases, ~300 were deemed diabetic control cases, and ~135 were obese control cases). 17The EWAS was performed using the DNAm profiles of 736 participants of the RODAM cohort, 18 of which 713 samples passed quality control.Quality control procedures were based on those described by Meeks et al. 18 and Chen et al. 19 Individuals missing one or more height-related traits and those whose measurements were erroneously recorded were omitted from inclusion in the final sample for this study (n = 704).

DNA isolation methylation profiling, processing, and quality control
High molecular DNAm was extracted from whole blood samples.The blood samples were processed manually, aliquoted, and stored at −20°C at local research locations.Samples were transported to central laboratories in each country to be registered and stored again −80°C.
The DNA extraction methylation profiling, processing, and quality control protocols of the RODAM samples have been described in detail previously. 20In brief, bisulfite conversion of DNA and DNA extraction protocols were conducted in the Source BioScience laboratories, Nottingham, UK, using the Zymo EZ DNAm ™ kit.Infinium® HumanMethylation450 BeadChip ampli- fied and hybridized the converted DNA, thereby quantifying DNAm levels for ~485,000 CpG sites.Methylation levels were determined from the methylated or unmethylated intensities for each individual CpG site present within the array.Methylation levels were quantified via Beta values, with zero representing an unmethylated probe and one representing a fully methylated probe.M values subsequently were calculated and were applied in further analyses.Using R statistical software (version 4.2.0), quality control was performed using the MethylAid package (version 1.30.0).The raw 450k data was normalized using the Minfi package (version 1.42).Additionally probes annotated to the X and Y chromosomes were removed as well as cross-hybridised CpGs, and probes known to contain single nucleotide polymorphisms with a minor allele frequency of > 0.05. 19These procedures reduced the data available for analysis to approximately 429,449 CpG sites per participant.Lastly, relative blood cell type distribution (CD8þ, T lymphocytes, CD4þ T lymphocytes, natural killer cells, B cell, monocytes, and granulocytes) was estimated according to the methods of Houseman et al. 21

Physical measurements
All height and height-related measurements were standardized across all sampling locations. 1,22Height was determined from a portable stadiometer (SECA 217) to the nearest 0.1 cm without participants wearing shoes.Sitting height was derived from measuring participants as they sat upright on a flat seat, and with their heads level, feet on the floor, and with the thighs unsupported.Then the sitting height distance (cm) from the floor to the top of the head was recorded.The leg length (cm) was calculated by subtracting an individual's sitting height from their total height.By dividing the calculated leg length by skeletal height, the LHR was estimated. 1,22fferentially methylated positions Differentially methylated positions (DMPs) were derived from multivariate linear regressions between height-related traits (independent variable) and DNAm M values (dependent variable) using the Limma package (version 3.52).Methylation M values were used to ensure normal distribution for the statistical analysis, while Beta values were used for interpretation and visualization of data.23 Models were adjusted for sex, age, estimated blood cell type proportions, and technical covariates (hybridisation batch and array position), because of correlation with DNAm in the principal components analysis.QQ plots were used to assess best model fit for each height-related trait (Fig. 2).False discovery rate (FDR) p-value adjustments were applied to reduce type I error through multiple testing.An FDR of < 0.05 was assumed epigenome-wide significance, while an FDR within a range of 0.05-0.5 was eligible for differentially methylated region (DMR) analysis.Since all traits analyzed in this study are to a greater or lesser extent correlated to each other, we assumed that traits were not fully independent and multiple test penalty was only applied per analyzed trait.CpG probes and genes were annotated using the Human Genome build 37 Illumina platform via the IlluminaHumanMethylation450kanno.ilmn12.hg19R package (Version 0.6, UCSC build).Lastly, in order to detect and remove undocumented genetic variation we ran a post hoc analysis on the top 25 DMPs using the Bioconductor package gaphunter, 24 applying the delta difference cluster (>2) threshold of 0.05.24,25 Differentially methylated regions DMRs were identified using two different R packages: bumphunter (version 1.38) and DMRcate (version 3.15).Both procedures generated results based on the lowest p-values and FDRs using fitted models similar to the DMP analysis.20 For bumphunter, the M value cutoff determining the effect size for the DMR analysis was set at 0.0025, corresponding to an effect size of 0.25% M value difference per cm for the corresponding height-related trait.Here we applied 500 bootstrap permutations on the bumphunter models. DMs were defined to compromise ≥ 3 CpGs.A familywise error rate (FWER) < 0.2 was considered statistically significant.With the DMRcate analysis, FDR threshold cutoffs varied for every height-related trait, i.e. this threshold was relaxed to each traits' lowest observed FDR value minus 0.01.We assumed that DMRs with a Stouffer coefficient and a smoothed FDR of < 0.05 were epigenome-wide significant.A smoothed FDR is a method of refining the multiple-hypothesis test by implementing a weighted distribution.DMRcate uses a Gaussian kernel bandwidth for the smoothed-function estimation.27 Biological relevance Biological functions of DMPs and DMRs were assessed through the systematic search of multiple academic sources.[28][29][30][31] All probes and gene functions were required to be consistent across at least three independent sources for the function of the probe, or gene, to be considered informative and relevant to this study.The sources used to verify the function of identified DMPs, and DMRs include EWAS Atlas Database, 28 iMethyl, 29 and UCSC Genome.30 The function of identified genes annotated to our top DMPs and DMRs were verified using UCSC Genome, 30 the National Library of Medicine's Gene Database, 31 as well as peer-reviewed article sourced from the PubMed database (https://pubmed.ncbi.nlm.nih.gov/).The purpose of these reviews was to confirm a probe/genes' citation in publications exploring metabolic conditions, growth factors, adverse environmental influenced development, or general probe/gene function.

Pathway analysis
Enrichment pathway analysis identifies molecular and genetic mechanisms associated with specific DMPs.The MissMethyl 32 an enrichment procedure was conducted with the first 100 significant probes per anthropometric trait based on p-value and lowest FDR.Enrichment results were generated using MissMethyl package in R; generating results from databases are the Gene Ontology 33 (GO) and Kyoto Encyclopedia of Genes and Genomes 34 (KEGG).The purpose of the enrichment pathway analysis was to confirm if there existed a correlation connection between the molecular mechanism associated with our top 100 DMPs and embryonic development, growth factors, or height-related traits.

Participants characteristics
The subset of the RODAM study presented, included 704 participants after quality control protocols (Table 1).Among this population, the mean age was 51.2 (± 9.73 SD), while average height was calculated at 164.04 cm (± 8.33), LHR was 0.50 (± 0.02), leg length 82.65 cm (± 5.21), and sitting height measuring 81.45 (± 4.72) (Table 1).While the various blood cells were observed to distribute at similar levels: CD8Tþ lymphocytes, CD4Tþ lymphocytes, natural killer cells (NK), B cell, monocytes, and granulocytes.When the subset was stratified between males and females, we observed that male participants were older and taller.The subset had an average age of 52 (± 9 SD) among males and 51 (± 10 SD) among females.Demographic analysis did not demonstrate a correlation between sex and a difference in blood cell type distributions, or habitual smoking (Table 1).

Differentially methylated positions
We detected no epigenome-wide significant DMPs among the four height-related traits.Although not statistically significant, we evaluated the top 25 DMPs per trait.Among the leg length DMPs, we identified cg26905768 annotated to the body of BMPER and cg13268132 annotated to the promoter region of TNFRSF11B, which were both hypomethylated (Table 2).Top DMPs among height included hypomethylated cg19776793 annotated to the body of SLC38A10, and for LHR, hypomethylated cg23072383 annotated to the TSS1500 of SLC35E4.For sitting height, we associated a hypomethylated probe cg24625894 annotated to the promoter region of SLC39A4.Note that the latter three DMPs are all members of the SLC30 gene family (Supplementary Table S1, S2, and S4).The possible informative nature of these probes will be discussed at length in the Discussion section.Post hoc analyses applying gaphunter, did not return any additional genetic variation among the 25 DMPs with the smallest p-values, per trait.Additionally, sex-stratified analysis did not reveal statistically significant DMPs associated with any of the traits.

Differentially methylated regions
Next we aimed to detect DMRs by applying DMRcate for all traits.From the DMRcate analysis, height was associated with 1291 DMRs according to a FDR threshold cutoff of 0.12 of which 110 demonstrated significance (Supplementary Table S5).Sixteen significant DMRs were detected for sitting height out of a total of 2053 according to an FDR threshold cutoff of 0.27 (Supplementary Table S7).Leg length generated 1506 DMRs, however none of the DMRs were epigenome-wide significantly represented (Supplementary Table S6).Several DMRs were annotated within the same gene; these repetitive DMRs were observed in one or more height-related traits.These recurring DMRs were located within the body of the HOXA gene cluster (covering 23, and 26  S7).DMR analyses applying Bumphunter, using an effect size cutoff of 0.0025, produced one significant DMR.This DMR was detected for both leg length and sitting height (covering 9, and 9 DMPs, respectively) and were located in the HLA-DPB1 gene.Neither height nor LHR generated DMRs.Noteworthy, the DMRs observed within the HLA-DPB1 gene associated with sitting height, were detected in both the DMRcate and bumphunter procedures, associated with sitting height.

Pathway analysis
The pathway analysis produced no results which could be used to add or subtract credibility from our hypothesis.Several p-value significant molecular mechanisms relating to bone development

Key findings
We conducted an exploratory association study to investigate associations between DNAm profiles and height-related traits using data of the RODAM study.Even though not epigenomewide significant, we did identify potentially informative DMPs and DMRs, located in genes previously linked to the growth regulation.
To our knowledge, none of the observed DMPs or DMRs have ever been associated with these height-related traits before.The relevant DMPs include multiple probes annotated to the SLCA gene family, across both height and sitting height.Leg length identified two CpGs within genes associated with bone cell regulation: cg26905768 annotated to BMPER and cg13268132 annotated to TNFRSF11B.BMPER specifically encodes a secreted protein that limits bone morphogenetic protein (BMP) function, inhibits BMP2-and BMP4-dependent osteoblast differentiation 35 , and modulates BMP-dependent differentiation among endothelial cells. 32The observed DMP was annotated to the body of the BMPER gene, and hypomethylated in this gene region generally associated with lower expression of the gene.In the context of regulation of leg length, this would be in line with our expectation that expression of bone cell genes should decrease upon aging.TNFRSF11B, encodes osteoprotegerin, a protein that regulates osteoclastogenesis inhibitory factors. 36,37TNFRSF11B's role in bone cell resorption has been hypothesized as contributing to osteoporosis as well as other conditions caused by decreased bone density. 36,37As hypomethylation of promoter regions generally is associated with more expression of the gene, it thus seems that TNFRSF11B is more expressed.As osteoclasts play a role in age-related decrease in bone mineral density, activity of this gene might be relevant given the average age of the study population. 36,37he alignment of multiple SLCA genes has potential importance as this gene family assists in transport of protein, zinc, and iron throughout the body during development. 38Specifically SLC39A4 gene (annotated to sitting height) is correlated with bone cell maintenance. 39Unfortunately, pathway analysis returned no significant results.In summary, our observed DMPs associated with height-related traits have previously been associated with growth factors or bone cell regulation, and the observed methylation levels, in the respective locations, are mostly in accordance with what we would expect in height-related traits.We identified several DMRs that were annotated to a hypomethylated HLA-DPB1 gene, as well as hypomethylation among the HOXA genes cluster for at least two height-related traits.In both DMRs we observed hypomethylation in the body of each gene, which asserts less gene expression.The HLA gene family is mostly associated with chronic autoimmune diseases 31,40 and inflammatory conditions.HLA-DPB1 is a class II HLA gene associated with the body's defense against infection. 31,40This gene, however, has never been studied in the context of growth regulation.We therefore cannot assert a physiological rationale for multiple DMRs being detected amongst only two of our heightrelated traits.For both sitting height and LHR, we found DMRs annotated to the HOXA cluster.The homeobox, or HOX, gene family is highly influential during embryonic development in most species. 30,40Researchers have demonstrated HOXA genes are associated with various forms of development, including skeletal regulation. 40Therefore, hypomethylation of the body of the HOXA genes, the implication of less expression, is in line with our expected findings as HOXA gene misregulation or mutations leading to changes in skeletal development are correlated with   Journal of Developmental Origins of Health and Disease early development, 40,42 not during middle life, which is the average age of the of our RODAM study subset.Our findings could serve as a starting point for further research to assess the role of the HOXA gene in human skeletal development.
Although there is no explicit information about exposure to malnutrition in early life, the average age of the RODAM cohort allows for speculation on exposure to food shortage and the impact on height.In Ghana, two periods of famine occurred in the latter half of the 20 th century.In the era between 1960 and 1974 there were widespread food shortages in rural areas throughout the country. 43,44At the time, an estimated 70% of Ghanaian citizens were estimated to reside in rural regions. 41More widely known is the 1981-1983 famine leading to a nationwide occurrence of malnutrition and inaccessibility to cash crops. 43Based on the mean age of RODAM participants, most would have been born or experienced infancy in Ghana between 1960 and 1974.This earlylife exposure to famine could have impacted DNAm patterns, thereby affecting height and related traits.

Strengths and limitations
A major benefit of using the RODAM study population is its relative genetic homogeneity, meaning that all individuals stemmed from one region in Ghana, the Ashanti region, and the majority of participants identified as originating from one ethnic groups, the Akan. 16Additionally, the prevalence of confounding factors in DNAm studies like smoking and alcohol consumption were very low and therefore we assume that our results were not impacted.
Our study has several limitations.The RODAM cohort was designed to investigate metabolic disease and immigration-related health concerns among Sub-Saharan African populations.It was not the primary purpose of the RODAM cohort to explore heightrelated traits, or the confounding factors that contribute toward height development.This difference between our use of the RODAM cohort and its original epidemiological purpose could contribute to our overall lack of statistically significant results.Note the additional co-factors of RODAM representing a mean age of 50 for its participants and that height-related traits have an ~80% heritability rate.This does pose the possibility that epigenetic signaling was diminished due to environmental factors over the course of participants' lifetimes.We suggest the use of younger cohorts in future.Additionally due to the relatively small sample size, as well as the effect size, our study was limited in statistical power to detect epigenome-wide significant DMPs.The small number of participants in our study subset meant that a sexstratified analysis was not possible, but differences based on sex were not expected as correlation between sex and height-related traits were shown to be low.We relaxed significance thresholds for both DMP and DMR protocols, potentially resulting in false positive findings.Moreover, we cannot definitively state whether or not our height-related traits were affected by any nutritional deprivation during early-life development, as we did not apply a longitudinal design.This limits our capacities to make statements on causality.Additionally, we assumed that DNAm patterns of height-related genes remain stable in adult, however, this assumption cannot be verified in this study.Differential epigenetic aberrations might echo in later life without having a current functional effect, but would be involved in other traits, biological mechanisms, or co-morbidity such as cardiovascular disease. 18astly, this study was conducted utilizing epigenetic profiles based on whole bloods samples, though we focused on the height-related traits determined by bone development.As DNAm is tissuespecific, and DNAm profiles derived from blood might therefore not be representative of processes occurring in bone tissue. 45pigenetic profiles derived from bone-related tissues would help to validate our findings.

Conclusion
In this proof-of-principle study, we found several potential DNAm markers for height, and height-related traits annotated to genes involved in skeletal and early development in humans.These findings can serve as a starting point to further elucidate the role of DNAm in skeletal development.Future research including a larger

Figure 1 .
Figure 1.Conceptual framework diagram.The diagram demonstrates this study's exploration of DNA methylation profiles associated with height-related traits due to these traits being influenced by both genetics and environmental factors.

Figure 2 .
Figure 2. QQ plots visualizing the fit of the linear regression models used in DMP analysis for height (A), LHR (B), leg length (C), and sitting height (D).

Figure 4 .
Figure 4. Comet plot of the differentially methylated region identified HLA-DPB1 gene based on the sitting height trait results.Plot show a differentially methylated region in chromosome 6, 33.646943 and 33.051125 megabases (mb) obtained via DMRcate.The red box highlights the probes annotated to the DMR.Correlations are measured using the spearman rambling coefficient.

Figure 5 .
Figure 5. Comet plot of the differentially methylated region identified HOXA cluster gene based on the sitting height trait results.Plot show a differentially methylated region in chromosome 7, 27.167207 and 27.173529 megabases (mb) obtained via DMRcate.The red box highlights the probes annotated to the DMR.Correlations are measured using the spearman rambling coefficient.

Table 1 .
Characteristics of RODAM study including participants demographic information, lifestyle factors, height-related traits, and distribution of the cell types observed

Table 2 .
Differentially methylated regions (DMRs) for height, sitting height, and leg length identified using two methods (DMRcate and bumphunter).DMRs with FWER > 0.2 or Stouffer coefficient > 0.05Journal of Developmental Origins of Health and DiseaseAre listed in Supplemental TableX.Chr = Chromosome, UCSC Reference: the gene feature based on the UCSC genome browser, Gene: the annotation performed via.Illumina R package version 0.06, UCSC build HG37.