A biomarker feasibility study in the South East Asia Community Observatory health and demographic surveillance system

Background Integration of biomarker data with information on health and lifestyle provides a powerful tool to enhance the scientific value of health research. Existing health and demographic surveillance systems (HDSSs) present an opportunity to create novel biodata resources for this purpose, but data and biological sample collection often presents challenges. We outline some of the challenges in developing these resources and present the outcomes of a biomarker feasibility study embedded within the South East Asia Community Observatory (SEACO) HDSS.
Methods We assessed study-related records to determine the pace of data collection, response from potential participants, and feedback following data and sample collection. Overall and stratified measures of data and sample availability were summarised. Crude prevalence of key risk factors was examined.
Results Approximately half (49.5%) of invited individuals consented to participate in this study, for a final sample size of 203 (161 adults and 42 children). Women were more likely to consent to participate compared with men, whereas children, young adults and individuals of Malay ethnicity were less likely to consent compared with older individuals or those of any other ethnicity. At least one biological sample (blood from all participants – finger-prick and venous [for serum, plasma and whole blood samples], hair or urine for adults only) was successfully collected from all participants, with blood test data available from over 90% of individuals. Among adults, urine samples were most commonly collected (97.5%), followed by any blood samples (91.9%) and hair samples (83.2%). Cardiometabolic risk factor burden was high (prevalence of elevated HbA1c among adults: 23.8%; of elevated triglycerides among adults: 38.1%; of elevated total cholesterol among children: 19.5%).
Conclusions In this study, we show that it is feasible to create biodata resources using existing HDSS frameworks, and identify a potentially high burden of cardiometabolic risk factors that requires further evaluation in this population.


Introduction
There is a need for comprehensive data resources on population health and disease in low- and middle-income countries, where a large proportion of the global burden of morbidity and mortality is located (1,2). Biomarker data form an essential component of such endeavours, allowing objective assessment of a wide range of disease-related indices, facilitating validation of self-reported information, and allowing for greater statistical power of analyses. Integration of biomarker data with information on health and lifestyle provides a powerful tool to enhance the scientific value of health research.
Large-scale surveys in low- and middle-income populations, such as the Demographic and Health Surveys, have previously included biomarker modules (3). However, these have often been restricted to a narrow range of measures from limited samples, with variable capacity for long-term storage and later analysis (3). Importantly, they are unable to follow up individuals over time. Health and demographic surveillance system (HDSS) sites offer a valuable opportunity for efficient, large-scale collection and analysis of biomarker data. They provide pre-existing infrastructure to facilitate biological sample collection, and the potential to link biomarker data longitudinally to historical and future measures. This linkage allows for a detailed view of disease development across the life course (4).
We undertook a biomarker feasibility study embedded within the South East Asia Community Observatory (SEACO) HDSS, which covers approximately 45 000 individuals in Segamat, Malaysia (5). The SEACO HDSS conducts annual enumeration of individuals, and has also undertaken a population-wide health survey in its catchment area, collecting questionnaire data and biophysical measurements (5). Through this study, we explored the feasibility of building upon the previous survey work conducted by SEACO to include biological sample collection. This feasibility study aimed to recruit approximately 200 individuals aged seven years and above to assess the preparedness of individuals and families to participate, and to establish the procedures for the collection, analysis and storage of biological samples within a predominantly rural community setting. Here, we outline the developments in the procedures and examine the outcomes of this study to determine the potential to create a large-scale biodata resource within the full HDSS population.

Methods
A detailed profile of the SEACO HDSS, including the HDSS development, structure, and data collections, is presented in a recent publication (5).

Sampling
Adult (aged 18 years and over) and child (aged 7-17 years) participants for this study were recruited from the SEACO HDSS (5). Stratified random sampling was performed at the household level using data from the most recent enumeration (completed in 2016), aiming to achieve comparable proportions of individuals of Malay, Indian, Chinese and Orang Asli (indigenous) ethnicity. Sampling therefore covered all enumerated households within the SEACO catchment area (approximately 1250 km²). SEACO has established strong community links through its community engagement strategy (6), and additional community awareness activities were undertaken to sensitise potential participants prior to this study.
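The sampling step described above can be illustrated with a minimal Python sketch. It is not the procedure used by the study team: the record layout (`household_id`, `ethnicity`), the equal allocation per stratum, the fixed seed and the example frame are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_household_sample(households, per_stratum, seed=0):
    """Draw up to `per_stratum` households at random from each ethnicity stratum.

    `households` is a list of dicts with hypothetical keys
    'household_id' and 'ethnicity'.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for household in households:
        strata[household["ethnicity"]].append(household)

    sample = []
    for ethnicity in sorted(strata):  # sorted for a stable iteration order
        members = strata[ethnicity]
        k = min(per_stratum, len(members))  # small strata are taken in full
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical enumeration frame with unequal stratum sizes.
labels = ["Malay"] * 60 + ["Chinese"] * 25 + ["Indian"] * 10 + ["Orang Asli"] * 5
households = [
    {"household_id": i, "ethnicity": ethnicity} for i, ethnicity in enumerate(labels)
]
sample = stratified_household_sample(households, per_stratum=5)
```

Equal allocation per stratum is one simple way to aim for comparable proportions across groups of very different size; other allocation rules are possible.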

Data and sample collection
Community-based data and sample collection was undertaken by two field teams between November 2016 and February 2017. Data were recorded on electronic tablets. Informed consent (adults) or informed assent with parental or guardian consent (children) was first obtained; individuals could only participate if they consented to providing all data and samples (Supplementary Methods). Following informed consent, along with questionnaire and biophysical data, capillary blood (via finger prick, for point-of-care glycated haemoglobin [HbA1c] measurement), and venous blood (four tubes from a single blood draw: up to 24 ml from adults, 12 ml from children; for serum, plasma and whole blood samples) were collected from participants. Hair and urine samples were also collected from adult participants. Following data and sample collection, participants were given their body mass index (BMI), blood pressure and point-of-care HbA1c results, and were provided referral to local clinics if these were above predetermined cut-offs. One session of data and sample collection took approximately 40-50 minutes for adult participants and 30 minutes for children (see Supplementary Methods for further details on sample collection purposes and procedures).

Measures and statistical analysis
Study measures to evaluate scale-up
Literature on suitable measures or assessment frameworks to determine feasibility for population-based observational studies is scarce (7-10). We therefore identified and examined a range of study-related measures to gain a comprehensive picture of the potential for scale-up. This included indicators of efficiency, response from potential participants, feedback from participants, and completeness and quality of collected data and samples.
First, we summarised study operational data to assess operational efficiency and response to the study. This assessment included information on the number of days of data and sample collection; the number and demographic characteristics of households and individuals approached; proportions consenting, declining or absent; reasons for refusal among those declining participation; and post-study feedback among participating individuals. Study pace was calculated as the average number of participants recruited per day. Differences in demographic characteristics between consenting and non-consenting individuals were assessed using Pearson's chi-squared tests, or Fisher's exact tests where cell counts were less than five.
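As an illustration of these calculations, the following Python sketch computes study pace and chooses between Pearson's chi-squared test and Fisher's exact test for a 2x2 table. It is not the authors' Stata code. It assumes that "cell counts less than five" refers to observed counts, that all row and column totals are non-zero, and that the Fisher test is two-sided; the function names and example counts are hypothetical.

```python
import math


def study_pace(n_participants, n_collection_days):
    """Average number of participants recruited per day of collection."""
    return n_participants / n_collection_days


def pearson_chi2_2x2(a, b, c, d):
    """Pearson's chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test: sum of table probabilities <= observed."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    p_observed = prob(a)
    probabilities = [prob(x) for x in range(lowest, highest + 1)]
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in probabilities if p <= p_observed * (1 + 1e-9))


def compare_groups(a, b, c, d, min_cell=5):
    """Use Fisher's exact test if any observed cell is below `min_cell` (assumed rule)."""
    if min(a, b, c, d) < min_cell:
        return "fisher", fisher_exact_2x2(a, b, c, d)
    return "chi2", pearson_chi2_2x2(a, b, c, d)[1]


# Hypothetical counts: rows are two groups, columns are consented / declined.
print(compare_groups(20, 10, 10, 20))
print(compare_groups(3, 1, 1, 3))
```

Some analysts apply the threshold to expected rather than observed counts; the `min_cell` rule here would need adjusting if that reading is intended.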
Following this, we examined measures relating to quality and completeness of data and samples. We were particularly interested in measures relating to blood sample collection, availability of blood test data and availability of blood sample aliquots, as indicators of the success of sample collection, analysis and storage. We extracted relevant information from three datasets generated at the end of the study: (i) data recorded on the electronic questionnaire form, (ii) blood test results, and (iii) records of receipt, processing and aliquoting of biological samples at the central research laboratory. All three datasets were cleaned, merged and checked for consistency. The completeness of questionnaire data for each participant was assessed by examining a set of all questions and measurements collected from all participants. The number of participants with any questionnaire data, blood test data, collected samples and samples for storage (plasma, serum, whole blood and remnant cell aliquots, urine aliquots and hair samples) was examined, and differences by sex, ethnicity and obesity status were assessed. The number of participants with complete data and samples was similarly examined.
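The merging and completeness checks described above could be sketched in Python as follows. This is an illustrative outline rather than the study's actual data pipeline: the participant identifiers, field names (`hba1c`, `total_cholesterol`, `sample_received`, `n_aliquots`) and the single consistency rule shown are assumptions.

```python
def merge_by_participant(questionnaire, blood_tests, lab_records):
    """Combine three datasets keyed by participant ID into one record each."""
    ids = set(questionnaire) | set(blood_tests) | set(lab_records)
    return {
        pid: {
            "questionnaire": questionnaire.get(pid, {}),
            "blood_tests": blood_tests.get(pid, {}),
            "lab": lab_records.get(pid, {}),
        }
        for pid in sorted(ids)
    }


def has_any(record):
    """True if at least one field holds a non-missing value."""
    return any(value is not None for value in record.values())


def is_complete(record, required):
    """True if every required field holds a non-missing value."""
    return all(record.get(key) is not None for key in required)


def summarise(merged, required_tests):
    """Count participants with any or complete data, and flag inconsistencies."""
    records = merged.values()
    return {
        "n_participants": len(merged),
        "any_questionnaire": sum(has_any(r["questionnaire"]) for r in records),
        "any_blood_test": sum(has_any(r["blood_tests"]) for r in records),
        "complete_blood_test": sum(
            is_complete(r["blood_tests"], required_tests) for r in records
        ),
        "any_storage_aliquot": sum(r["lab"].get("n_aliquots", 0) > 0 for r in records),
        # Example consistency rule: test results exist but no receipt was logged.
        "inconsistent_ids": [
            pid
            for pid, r in merged.items()
            if has_any(r["blood_tests"]) and not r["lab"].get("sample_received")
        ],
    }


# Hypothetical records; None marks a missing value.
questionnaire = {
    "P001": {"age": 45, "sex": "F"},
    "P002": {"age": 12, "sex": None},
    "P003": {"age": 30, "sex": "M"},
}
blood_tests = {
    "P001": {"hba1c": 5.6, "total_cholesterol": 4.9},
    "P002": {"hba1c": 5.1, "total_cholesterol": None},
}
lab_records = {
    "P001": {"sample_received": True, "n_aliquots": 6},
    "P003": {"sample_received": True, "n_aliquots": 0},
}

merged = merge_by_participant(questionnaire, blood_tests, lab_records)
summary = summarise(merged, required_tests=["hba1c", "total_cholesterol"])
```

Keeping "any" and "complete" as separate counts mirrors the distinction drawn in the text between participants with some data and those with complete data.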
Sociodemographic, lifestyle and risk factor data
Finally, sociodemographic characteristics of study participants and crude prevalence of key lifestyle, biophysical and blood-based risk factors in the population were examined; differences by sex were assessed using Pearson's chi-squared or Fisher's exact tests (see Supplementary Methods for list of variables and corresponding definitions).
All data management and analyses were performed using Stata 14 (StataCorp, Texas).
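For readers who want a concrete picture of a crude prevalence calculation, a minimal Python sketch follows. It is not taken from the study's Stata code. It assumes each participant has already been classified as having (True) or not having (False) a risk factor, with None marking a missing observation, and that missing observations are excluded from the denominator.

```python
def crude_prevalence(flags):
    """Return (prevalence, n) using only participants with a non-missing flag."""
    observed = [flag for flag in flags if flag is not None]
    if not observed:
        raise ValueError("no non-missing observations")
    # True counts as 1 and False as 0, so the sum is the number of cases.
    return sum(observed) / len(observed), len(observed)


# Hypothetical classifications for five participants, one of them missing.
prevalence, n = crude_prevalence([True, False, None, True, False])
print(f"{prevalence:.1%} of {n} participants with data")
```

Reporting the denominator alongside the proportion makes reductions in N due to missing data visible to the reader.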

Ethical approvals
Ethical approval for the study was obtained from the Monash University Human Research Ethics Committee (CF16/471-2016000227), and approval for the receipt and analysis of linked anonymised data at the University of Cambridge was obtained from the University Human Biology Research Ethics Committee (HBREC.2017.04) (Supplementary Methods).

Results

Study measures to evaluate scale-up
Measures of study recruitment and response
Overall, 203 participants (161 adults, 42 children) were recruited into the biomarker feasibility study, close to half (49.5%) of those responding to an invitation to participate (Figure 1, Table 1; Supplementary Table S1). A greater proportion of women (56%) versus men, individuals aged 50-59 years (70.1%) or 60 years and above (64.7%) versus younger individuals, and those of Orang Asli ethnicity (64.9% among adults, 70.5% among children) versus those of any other ethnicity were available during recruitment (Table 1). Of those available and subsequently invited, women (68.5%, P < 0.001) were more likely to consent to participate compared with men, whereas children (30.0%) and young adults (48.2%), and those of Malay ethnicity (adults: 41.3%, P < 0.001, children: 19.0%, P = 0.129) were less likely to consent, compared with older individuals or those of any other ethnicity (Table 1).
Of 170 (83.7%) participants providing post-study feedback, over 95% agreed with comments relating to a favourable experience, including comfort during questionnaire administration (99.4%), interest in the study results (100.0%), and willingness to encourage others to participate in the study (99.4%) (Supplementary Table S2).

Completeness and quality of data and samples
We then examined the availability of data and samples collected from participants. All participants had some available questionnaire information, with most having three or fewer missing variables (Table 2, Supplementary Tables S3-S4). At least one biological sample (capillary blood, venous blood, hair or urine) was collected at the anticipated quantity from all individuals (Table 2, Supplementary Table S5). Over 90% of participants had some blood test data, whilst approximately 70-80% had complete data (Table 2), with no systematic differences in data and sample availability by ethnicity (Supplementary Figures S3-S4).
Given the potential to obtain detailed biomarker information from blood, the availability and quality of blood samples was of particular interest in this study. A capillary (finger-prick) blood sample was successfully collected from all participants, with successful point-of-care HbA1c measurement in almost all (99.0%) participants (Table 2). At least one venous blood sample of any volume was collected from over 90% of both adult and child participants, with 82.6% of adults and 95.2% of children having all four blood samples collected at any volume (Table 3; Supplementary Tables S6-S7). Notably, obese adults were less likely to have blood samples successfully collected (at least one blood sample at any volume: 100% among non-obese adults versus 79.5% among obese adults, P = 0.002) (Supplementary Table S8). Almost all collected blood samples were passed by the research laboratory as being of acceptable quality for processing, analysis and storage (Table 3; Supplementary Tables S6-S7). At least one storage aliquot was available from all collected and accepted blood samples among children, and over 96.2% of samples among adults (Supplementary Tables S9-S10).

Sociodemographic, lifestyle and risk factor data
In addition to a notable prevalence of lifestyle and biophysical risk factors, we found a high burden of blood-based cardiometabolic risk factors in this population. Close to one quarter of adults (23.8%) had elevated HbA1c, while 8.2% had elevated total cholesterol, 15.0% had low HDL cholesterol, and 38.1% had elevated triglycerides (Table 4). Risk factor prevalence was similarly high among children: 19.5% had elevated total cholesterol, 14.6% had low HDL cholesterol and 36.6% had elevated triglycerides (Table 4).

Discussion
Detailed, objective measures provided by biomarker information are fundamental to comprehensive data resources on population health and disease. In this study, we show the feasibility of biomarker collection within the context of the SEACO HDSS. Approximately half of invited individuals consented to participate in biological sample collection, with favourable participant feedback. Biological samples were collected from all participants. Outcome measures indicated that there was scope to increase study pace, and a need to improve blood sample collection from obese participants, both attainable through appropriate modifications to study design and training. A high prevalence of blood-based cardiometabolic risk factors was observed among both adult and child participants. These results indicate that creation of a large-scale biodata resource is both achievable and valuable in this population, with potential relevance to similar HDSS sites.
We demonstrate here that capitalising on existing HDSS frameworks to undertake biomarker collection is an efficient way to encourage community participation, and to enhance their value as data resources. We undertook biological sample collection by building upon the strong existing infrastructure, data, human and material resources, local knowledge and community and administrative links established by the SEACO HDSS (5). The proportion of consenting versus invited participants observed in this study is comparable to or greater than other large-scale biobank or biomarker collection studies based in high-income countries (11,12). Participants were willing to provide both capillary and venous blood samples, with successful capillary blood collection for all participating individuals. Blood test data and storage aliquots were available for the majority of participants, indicating the successful establishment of procedures from sample collection to analysis and long-term storage. Data and sample collection took under an hour, and participants providing feedback responded favourably to the study. The community engagement strategy previously established by SEACO provided a mechanism through which individuals could raise and address concerns they had with participation in this study (6). Importantly, we have the capacity to link information obtained in this study with measures from both previous and future HDSS data collections, including later clinical outcomes, which will facilitate the creation of richer datasets that may be explored in future analyses. Compared with the growing focus on feasibility studies for randomised clinical trials (13-24), literature on operational outcomes of observational feasibility studies remains scarce, and restricted to a limited number of measures, such as the overall proportion of invited individuals ultimately participating (7-10).
Few studies have directly assessed measures of sample collection feasibility, with none identified here that specifically examined blood sample collection (7,25). Here, we identified useful indicators relating to various aspects of study operation including sample collection, using these in the context of our study to obtain a clearer understanding of the feasibility of scale-up. Systematic assessment of such measures may be useful to researchers planning similar data and sample collections in other low- and middle-income populations. While most outcomes assessed here indicated successful establishment of study operations, we identified two areas requiring improvement, which may be successfully addressed through simple modifications to study design and training. This included the slow study pace relative to the number of field teams and time taken per session of data and sample collection. This survey design-related issue was likely a result of the notable proportion of houses empty upon approach, due to outmigration or unavailability of household members at the time of recruitment. This, along with the predominantly rural setting and large sampling area, increased the travel time between houses with consenting individuals. More suitable methods of recruitment to improve study efficiency could include approaching sampled households in a separate recruitment drive to establish availability, willingness to participate, and to arrange convenient time windows for data and sample collection. We also observed lower blood sample collection success among obese participants, an issue specific to biomarker collection which may be resolved by further directed training of study phlebotomists.
EDTA: ethylene diamine tetra-acetic acid. 1 At least one of: plain serum or EDTA (plasma) or EDTA (whole blood 1) or EDTA (whole blood 2). 2 All of: plain serum and EDTA (plasma) and EDTA (whole blood 1) and EDTA (whole blood 2).

Global Health, Epidemiology and Genomics
The proportion of participating individuals in this study, along with differential response to participation across demographic subgroups, may suggest implications for generalisability. Although the demographic profile of this study may not be fully representative of the wider population, analyses arising from this study have the capacity to produce internally valid results regarding aetiological relationships, with wider relevance to other populations (11). Nonetheless, our observations indicate an opportunity to further improve recruitment strategies overall and across specific subgroups, in future data and sample collections.
The high burden of cardiometabolic risk factors observed in the current study population is consistent with previous findings from the SEACO HDSS (26,27). Similar trends have been reported in other middle-income countries including those from Asia, and are thought to be a result of epidemiologic transitions occurring in these populations (28-31). These observations reinforce the need for large-scale biomarker data from such populations to comprehensively assess disease risk and associated influences across the life course. We demonstrate here that existing HDSS resources can be successfully augmented to achieve this purpose.
We present a study undertaken within a specific context, with basic infrastructure and resources already in place through the SEACO HDSS and augmented by collaborating institutions. Given our context and particular interests, we made specific choices regarding study design, including biological samples of interest, consent structure, the collection of non-fasting blood samples, and test result feedback and onward referral of participants. Researchers planning biomarker collections in other settings must consider their specific contexts and aims to inform decisions relating to suitable study design. Importantly, the measures presented here may be applicable and useful to understanding the feasibility of such biomarker collections regardless of exact study methodology.
To conclude, we show that biological sample collections to create biodata resources using existing HDSS frameworks are feasible. Using this approach, we identify a potentially high burden of cardiometabolic risk factors that requires further evaluation in this population. Building upon existing HDSS resources in this way would greatly enhance their scientific value, and contribute towards addressing the need for comprehensive biomarker data from low- and middle-income populations.
Classification of all risk factors is described in the Supplementary Methods. Differences in distributions between men and women or boys and girls were assessed using Pearson's chi-squared or Fisher's exact (cell counts < 5) test. N was reduced due to missing observations for the following measures: (1) Low fruit and vegetable consumption among girls (N = 18); (2) Overweight, obesity, central obesity and elevated waist to hip ratio and elevated HbA1c among women (N = 101); (3) Elevated HbA1c in girls (N = 18); (4) All cholesterol and triglyceride measures among girls (N = 18), men (N = 58) and women (N = 89). 1 Measures for hypertension and elevated cholesterol prevalence included individuals who reported being told they had elevated blood pressure or cholesterol.