How differing methods of ascribing ethnicity and socio-economic status affect risk estimates for hospitalisation with infectious disease

Significant ethnic and socio-economic disparities exist in infectious diseases (IDs) rates in New Zealand, so accurate measures of these characteristics are required. This study compared methods of ascribing ethnicity and socio-economic status. Children in the Growing Up in New Zealand longitudinal cohort were ascribed to self-prioritised, total response and single-combined ethnic groups. Socio-economic status was measured using household income, and both census-derived and survey-derived deprivation indices. Rates of ID hospitalisation were compared using linked administrative data. Self-prioritised ethnicity was simplest to use. Total response accounted for mixed ethnicity and allowed overlap between groups. Single-combined ethnicity required aggregation of small groups to maintain power but offered greater detail. Regardless of the method used, Māori and Pacific children, and children in the most socio-economically deprived households had a greater risk of ID hospitalisation. Risk differences between self-prioritised and total response methods were not significant for Māori and Pacific children but single-combined ethnicity revealed a diversity of risk within these groups. Household income was affected by non-random missing data. The census-derived deprivation index offered a high level of completeness with some risk of multicollinearity and concerns regarding the ecological fallacy. The survey-derived index required extra questions but was acceptable to participants and provided individualised data. Based on these results, the use of single-combined ethnicity and an individualised survey-derived index of deprivation are recommended where sample size and data structure allow it.


Introduction
The epidemiology of infectious disease (ID) in New Zealand (NZ) is marked by significant ethnic and socio-economic disparities, with higher rates observed in Māori and Pacific peoples, and in areas of greater socio-economic deprivation. [1][2][3] Likewise, higher rates of ID are also seen in indigenous and marginalised ethnic minority groups in comparable developed countries such as Australia and the USA [4,5]. Therefore, accurate measures of ethnic identity and socio-economic deprivation are of particular importance in epidemiological research.
Ethnicity is a complicated social construct that describes cultural identity or affiliation [6]. The related but distinct concept of race is not generally used in research in NZ. There are a number of methods of ascribing ethnicity in epidemiological research including selfprioritisation, total response and single-combined ethnicity, each method having its advantages and disadvantages [7]. Socio-economic deprivation can also be measured in various ways, including by directly questioning household income, and by using census-derived geographic measures [8] or survey-derived individual measures [9].
In this study, we used hospitalisation for an ID before the age of 5, an outcome known to be linked to ethnicity and socio-economic disadvantage [2], to compare different measures of ethnicity and socio-economic deprivation.

Study population
This study was conducted within the Growing Up in New Zealand (GUiNZ) longitudinal birth cohort. GUiNZ enrolled 6822 pregnant mothers from the Auckland, Counties-Manukau and Waikato District Health Board areas due to deliver in 2009-2010 [10,11]. The cohort includes 6853 children born from these pregnancies, 11% of the national birth cohort. The cohort is generalisable to the national birth cohort with regards to ethnic and socio-economic diversity [12].

Ethics
Ethical approval was obtained from the Ministry of Health Northern Y Regional Ethics Committee (NTY/08106/055). Written informed consent for interviews and data linkage was completed by each child's primary caregiver.

Data collection and linkage
GUiNZ has conducted computer-assisted face-to-face interviews with the primary caregiver of each study child at multiple time points, including during the antenatal period, and at 9 months, 2 and 4½ years of child age. This study uses data from the interview conducted at 4½ years of child age. Linkage was established between GUiNZ datasets and the National Minimum Dataset (NMDS) using each child's unique National Health Index number. The NMDS is a national administrative health dataset that contains records of all public hospital admissions in NZ, including emergency department visits but not outpatient clinic attendances [13]. Public hospital care, including all acute inpatient paediatric care, is free for NZ permanent residents and citizens. The NMDS includes discharge diagnoses coded using the Australian Modification of the International Classification of Diseases and Health Related Problems (ICD-10-AM) [14].

Primary outcome
The primary outcome was hospitalisation for an ID in the first 5 years of life, treated as a binary (ever/never) variable. Hospitalisations for an ID were identified using the first diagnostic code listed in the NMDS, as the first listed code represented the primary health problem managed during that hospitalisation. This was done to avoid counting nosocomial infections and minor infections, where infection was not the main reason for hospitalisation. Acute and chronic viral, bacterial, fungal and parasitic infections were included, as were common infective syndromes without a specific microbiological diagnosis, such as fever, upper respiratory tract infection and gastroenteritis. Infective exacerbations of chronic non-infective conditions such as asthma were included but hospitalisations for non-infective sequelae of past infections were excluded. Birth hospitalisations and hospitalisations for non-medical reasons (e.g. maternal hospitalisation) were also excluded.

Ethnicity variables
Primary caregivers were asked the following questions regarding the ethnicity of the study child: • Total response ethnicity question: Which ethnic group or groups does (child name) belong to? • Prioritised ethnicity question: Which is the MAIN ethnic group that (child name) identifies with?
Multiple responses were allowed for the total response ethnicity question. A single response was requested for the prioritised ethnicity question, but if this was not possible, up to two responses were allowed. For both questions, caregivers were offered a list of 32 of the most common ethnicities in NZ, and could also choose to specify up to two unlisted ethnicities, or to decline the question.
Responses were aggregated into broad ethnic groups for the purposes of this study -Māori, Pacific, Asian and European/other. Responses of 'New Zealander' (479 children (8.5%)) were included in the European/other group. None of the methods described below account for multiple ethnicities within an ethnic group.
Self-prioritised ethnicity was defined as the response to the prioritised ethnicity question above. If two ethnicities from different ethnic groups were indicated, then for the purposes of this study only, these responses were prioritised in the order Māori, Pacific, Asian and European/other. This resulted in a single ethnic group variable with four non-overlapping levels. ID hospitalisations were compared between these groups and also with those unable to prioritise a single ethnic group excluded. The European/other group was used as the comparator in these self-prioritised ethnicity analyses.
Total response ethnicity was defined as the response to the total response ethnicity question above, with membership of each ethnic group recorded separately. This resulted in four overlapping binary ethnic group variables. In unadjusted analyses, ID hospitalisations for a given ethnic group were compared with the combined pool of children not in that group, e.g. Māori children were compared with all non-Māori children. All four groups were included in the multivariable analyses. As this approach meant there was no single baseline for comparison, analyses were repeated using the non-overlapping 'European/other only' group as a comparator.
Single-combined ethnicity was also defined using responses to the total response ethnicity question, with individuals assigned to a single or combination ethnic group matching their combination of ethnicities. This resulted in a single ethnic group variable with 15 non-overlapping levels. As several combination groups included few individuals, analyses were also presented using a modified approach focussing on Māori and Pacific children, in which mixed ethnicities were aggregated into the following groups: Māori + other non-Pacific ethnicity, Pacific + other non-Māori ethnicity, Māori + Pacific ± other ethnicity, and European/other + Asian. The 'European/other only' ethnic group was used as the comparator in both analyses.

Socio-economic variables
Household income was queried directly and primary caregivers were able to respond with a weekly, fortnightly, four-weekly, monthly or annual amount, before or after tax, from which annual income was calculated. Income brackets were offered to participants unable to answer, again before or after tax. NZ has a system of progressive marginal tax rates applied to individual earners, and tax credits which are available for lower income families. For post-tax amounts, it was not possible to calculate the corresponding pre-tax amount as household income may have been spread over several earners, and eligibility for tax credits was not recorded.
The NZDep is a census-derived, area-level measure of deprivation determined for the household using the usual address of the child [8]. The NZDep2013 was calculated from 2013 national census responses covering aspects of deprivation including internet access, receipt of government benefits, household income, The NZiDep is an individualised index of deprivation, calculated for the primary caregiver from responses to eight interview questions regarding unemployment, receipt of government benefits and community charity, and the need to economise on food, heating and footwear [9]. The interview questions, and the standard NZiDep questionnaire, are included in Supplementary  Table S1. A score from 1 (no responses suggesting deprivation) to 5 (5-8 responses suggesting deprivation) is calculated. Comparisons of ID hospitalisations were performed using the score and a binary variable comparing those scoring 1 and 2 with those scoring 3, 4 and 5.

Statistical analyses
Analyses were performed using SAS version 9.4 software (SAS Institute, Cary, NC, USA). Proportions hospitalised for an ID and risk differences between ethnicity methods were calculated. Unadjusted analyses were presented using relative risks, 95% confidence intervals and p-values derived from the χ 2 or Fisher's exact tests. Multivariable analyses were performed using logbinomial regression and were likewise presented as relative risks with 95% confidence intervals. Log-binomial regression was performed in preference to logistic regression as the outcome used was not rare. Multivariable models were first built using only the multiple levels of the variable of interest, then repeated with ethnicity models corrected for socio-economic deprivation using the NZDep2013 quintiles and socio-economic models corrected for ethnicity using self-prioritised ethnicity.

Results
The effect of ethnicity measures on risk of hospitalisation for an ID A total of 5602 children had results available from both the 4½-year interview and linked hospitalisation data from the NMDS. A single self-prioritised ethnicity was provided for 4991 (89.1%) children, while two were provided for 528 (9.4%) children, 133 (2.4%) within one broad ethnic group and 395 (7.1%) across two ethnic groups. The caregivers of 83 (1.5%) children declined to prioritise an ethnicity. A single total-response ethnicity was identified for 3062 (54.7%) children with the remainder having multiple ethnicities identified. As some of these multiple ethnicities fell within the same broad ethnic group, 3880 (69.3%) children had a single total response ethnic group. A majority of children in each total response ethnic group were self-prioritised to the corresponding group, as shown in Supplementary Table S2. Table 1 shows the relative risk of hospitalisation for an ID for each ethnic group by method of assigning ethnicity. Māori and Pacific children had a greater risk of hospitalisation for an ID across all methods of ascribing ethnicity and this was only partially reduced by correction for socio-economic deprivation. Relative risk estimates were higher for Māori and Pacific children when self-prioritised ethnicity was used compared with totalresponse ethnicity. When no fixed comparator group was used, European/other total response ethnicity was associated with a lower risk of hospitalisation for an ID. Using single-combined ethnicity, the larger groups containing Māori or Pacific children showed an increased risk for hospitalisation with an ID, as did the 'Asian only' group. No significant associations could be identified for the smallest combined ethnicity groups containing <100 children. After aggregation, a significant association with hospitalisation for an ID was apparent for all single or combination groups containing children with Māori or Pacific ethnicity. Higher relative risk estimates were seen for the Māori-only, Pacific-only and Māori + Pacific (±other) groups than for the mixed Māori + non-Pacific and Pacific + non-Māori groups.
When risk differences between corresponding ethnic groups using different methods were calculated (Table 2), a slightly higher risk for European/other children and a non-significant trend towards lower risk for Māori and Pacific children were seen when using total response compared with self-prioritised. Self-prioritised Māori ethnicity appeared to overestimate the risk for mixed Māori + non-Pacific children defined using aggregated single-combined ethnicity. Likewise, Pacific self-prioritised ethnicity appeared to overestimate the risk for mixed Pacific + non-Māori children. These differences were further accentuated when comparing total response with single-combined ethnicity. In addition, total response Māori ethnicity was found to underestimate the risk for Māori-only, and mixed Māori + Pacific children, while total response Pacific ethnicity underestimated the risk for Pacific-only children.

The effect of socio-economic status measures of risk of hospitalisation for an ID
Pre-tax household income was provided by 3833 (68.4%) primary caregivers and post-tax income by 1165 (20.8%). Household income was not provided by 604 (10.8%) primary caregivers. These data were not missing at random -5.0% of those in the least deprived NZDep2013 quintile and 21.5% of those in the most deprived NZDep2013 quintile had missing household income data ( p < 0.0001). In addition, post-tax income was provided by 15.0% of those in the least deprived quintile and by 35.0% of those in the most deprived quintile ( p < 0.0001). Due to the risk of bias and the inability to correlate pre-and post-tax income accurately, further analyses were not performed using household income.
While the NZDep2013 resulted in quintiles of similar size (range 952-1373), the NZiDep gave 55.1% of primary caregivers a score of 1. Both systems demonstrated a social gradient in relative risk of hospitalisation for an ID that was maintained after correction for ethnicity, as shown in Table 3. The full range of NZiDep scores was seen within each NZDep2013 quintile but there was a strong relationship between increasing NZDep2013 quintile and increasing NZiDep score, whether used directly or in a binary form (Supplementary Table S3).

Relationship between ethnicity and socio-economic status
Examination of socio-economic status within ethnic groups (Supplementary Tables S4.1 and S4.2) showed that Māori and Pacific children were more likely to live in socio-economically deprived households than Asian and European/other children. When Māori or Pacific ethnicity was defined using total response, the frequency of socio-economic deprivation was lower than when self-prioritisation was used. When aggregated single-combined ethnicity was used, a greater proportion of Māori only, Pacific only and Māori + Pacific (±other) children lived in socio-economically  .1 and S5.2), there was a general trend towards an association between increased deprivation and increased rates of hospitalisation. Ethnic subgroup size was important as the relationship was more likely to reach statistical significance when larger, total response ethnic groups were used, and when the NZiDep was used in a binary form.

Discussion
This study has demonstrated several methods of ascribing child ethnicity and socio-economic status. Regardless of methodology, Māori and Pacific children, and children in the most socioeconomically deprived households, had a higher risk of hospitalisation for an ID. However, the magnitude of these effects varied between methods. The GUiNZ cohort is large and includes representative proportions of Māori, Pacific, and socio-economically deprived families [10,12]. In studies with lower statistical power, due to a smaller sample size or under-representation of disadvantaged population subgroups, the differences between methodologies are likely to be more important. There are also philosophical differences between methodologies which might make one method more appropriate than another in certain settings.
Ethnicity can be an area of significant individual and societal sensitivity, so the use of ethnicity data in research must be managed respectfully. Ethnicity is a self-determined construct deeply enmeshed with related aspects of identity including genetic or geographic ancestry, race, nationality, physical features, shared history, language or religion, and how society identifies the individual [6]. The borders of ethnic categories are indistinct, and subject to local political considerations making international comparisons difficult. While the relationship between ethnicity and ancestry leads to a small degree of genetic variation between ethnic groups [15], this is minor compared with within-group variation, meaning ethnicity functions more as a marker of risk due to social factors than as a risk factor in itself [16,17]. Despite these complexities, ethnicity remains an important variable in epidemiological studies which seek to document disparities in health outcomes and to guide health resource allocation [17,18]. However, many research articles compare outcomes by ethnic group without defining how ethnicity was determined. From a participant perspective, an ideal measure of ethnicity should be self-identified and capable of recording multiple ethnicities and the relative or equal importance of each. From a research perspective, the measure should remain simple to analyse and maintain subgroup size and statistical power.
External prioritisation to a single ethnicity, generally in the order Māori, Pacific, Asian, other non-European, then European, has fallen out of favour in NZ as it does not respect self-identification and fails to account for mixed ethnicities [7]. In addition, external prioritisation maximises the size of the higher priority groups by including individuals who may share relatively little of the ethnic identity and social disadvantage common to that group. Therefore, it may underestimate the burden of excess morbidity experienced by individuals who would prioritise themselves to that ethnic group.

Epidemiology and Infection 5
The methods demonstrated in this study were all selfidentified but differed in the way they treated multiple ethnicities, with consequences for analysis and interpretation. Selfprioritisation did not allow multiple ethnicities and yielded a single non-overlapping ethnic group per participant. As individuals were assigned to the group their caregiver felt they identified with most, dilution of social disadvantage was reducedthis likely explains the slightly (though non-significantly) greater risk estimates seen when compared with total response. However, 11% of the cohort as a whole and 25% of total response Māori children were unable to be prioritised; caregivers either chose two ethnicities or declined to answer. Excluding these individuals had only a small effect on risk estimates but should not be seen as an appropriate response to this problem. Ethnic diversity within the Māori cohort, a parental preference for the transference of Māori ethnicity, and the concept of the indivisibility of whakapapa (genealogy or descent, meaning that a person of Māori descent is regarded as Māori regardless of the proportion of their Māori ancestry), likely explain why caregivers of Māori children were particularly disinclined to prioritise a single ethnicity [19][20][21].
Total response ethnicity allowed caregivers to identify multiple ethnicities simultaneously but this overlap complicated analyses as no single comparator group was available. Philosophically, this lack of a comparator group has the advantage of avoiding the ethnocentric assumption that the comparator group has the desired level of the outcome [22]. This benefit is lost if a residual non-overlapping group ('European/other only' in this study) is used as a fixed comparator. Dilution of social disadvantage can be a problem where overlaps are large, and can be further compounded by inclusion of other disadvantaged groups in the pooled comparator.   Single-combined ethnicity also allowed multiple ethnicities but without overlap between groups. Multiple combination groups were identified, some of which had few participants with consequent loss of power and difficulty presenting results coherently. Use of finer measures of ethnicity than the broad groups used in this study would exacerbate this effect further. Aggregating the smaller combination groups overcame this issue at the expense of loss of detail. A particular strength of the singlecombined method was the ability to demonstrate the diversity of risk within the Māori and Pacific groups, as defined using selfprioritisation or total response, showing that these groups are not homogeneous.
Comparison of measures of socio-economic status also identified several issues. This study compared household income with validated census-derived and survey-derived measures of socioeconomic deprivation which are commonly used in NZ. Household income proved problematic as a measure of financial resources. Previous research has documented that responses to questions regarding income are frequently incomplete, inaccurate or declined [23]. Reasons for this can be informational, computational or motivational [24]. While both under-and over-reporting may occur [24], we found non-response to be most frequent amongst participants living in more deprived areas, creating a significant risk of bias if those with missing data were excluded. Participants living in more deprived areas were also more likely to provide post-tax income, compounding the risk of bias if only participants with pre-tax income were considered. As an additional limitation, the entire income amount may not necessarily be available as money may be used to pay interest on debt or may be diverted by charitable giving, particularly to religious institutions, or remittances to relatives overseas.
The NZDep2013 [8] and NZiDep [9,25] performed similarly well in differentiating risk of hospitalisation for an ID across the socio-economic gradient, despite having different underlying methodologies. The NZDep is updated after each national census (NZDep2013 being the update following the 2013 census) and made publically available, including in atlas form [26], so it has the advantage of a high level of completeness if the residential address is known. Similar census-derived indices of deprivation have been used in the UK [27] and in limited areas in the USA [28], and can be used to determine health resource allocation as well as research [29]. There are a number of potential pitfalls with using the NZDep. First, the census questions used to derive the NZDep cover a range of social determinants of health. If similar variables, such as household crowding, are included with the NZDep in multivariable analyses, this creates a risk of multicollinearity. Second, deprivation is not homogeneous within NZDep deciles, therefore making assumptions about individuals within a given decile is a form of ecological fallacy. Comparison of NZiDep scores within NZDep2013 deciles showed that while high NZiDep scores were rare in the least deprived deciles, NZiDep scores of 1 or 2 were still common in the most deprived deciles, consistent with previous research in NZ [30].
The NZiDep required an additional eight questionnaire items but proved acceptable to participants with only 0.5% declining to answer one or more. While these questions covered employment and benefits, they did not cover other social determinants of health such as crowding, meaning that these variables could also have been included in multivariable analyses. As individualised data were used, albeit for the caregiver rather than the child, the NZiDep largely avoided the ecological fallacy concerns affecting the NZDep. The proportion of the cohort with NZiDep scores of 3, 4 or 5 was small, leading to reduced statistical power. Creating a binary variable by comparing those scoring 3, 4 or 5 with those scoring 1 or 2 preserved the size of the more deprived subgroup and appeared to work well. This approach would require further validation before widespread use. The responses to specific NZiDep questions may also be of interest in their own right, and may highlight areas of hardship, such as food poverty, for policy intervention.
This study had some limitations. The use of broad ethnic groups was artificial. Only the Māori group represented an ethnicity many people would identify with. This approach obscured within-group mixed ethnicities, such as Tongan and Samoan, or Chinese and Korean. However, it is similar to Statistics New Zealand's use of 'level 1' ethnicity, and to the use of ethnicity in much international health research, and maintains statistical power. This study used ethnicity data obtained from the primary caregiver, most commonly the mother, and not their partner. Parents may have differed in their interpretation or weighting of their child's ethnicity [20]. The intergenerational transmission of ethnicity is not straightforward, especially in inter-ethnic parental relationships or where one or both parents identify as mixed ethnicity themselves [19]. The frequent use of the 'New Zealander' response was problematic. We included these children in the European/other category, but in reality, the group is heterogeneous. Similar issues have been faced in interpreting national census results [31]. The NZiDep questions were not all taken verbatim from the validated questionnaire, so items describing employment status and receipt of benefits had to be recoded from alternative survey questions which closely approximated the NZiDep items. A high proportion (17.3%) of primary caregivers were not working due to childcare or family commitments. This scored a 'no' on the unemployment item and may have led to an underestimate of deprivation in some of these families. Neither of the socio-economic deprivation indices directly described material deprivation at the level of the individual child. GUINZ has collected numerous additional variables describing social determinants of health including household crowding, housing quality, tobacco smoke exposure, healthcare access, exposure to racism and others, all of which were intentionally excluded from this demonstration. GUiNZ is a longitudinal study incorporating data from multiple data collection waves, however the current study used only cross-sectional data at 4½ years of child age. Further research is required to investigate longitudinal changes in child ethnic identity and socio-economic mobility.
In summary, this study has demonstrated multiple methods of ascribing ethnicity and socio-economic deprivation in a large and diverse child cohort. Self-prioritised, total response and single combined ethnicity were all usable. Self-prioritisation was simplest to analyse, but 10% of participants could not be prioritised to a single ethnic group. Total response was complicated by overlap between groups and, in an unmodified form, did not allow a single baseline for comparison. Single-combined created a number of small ethnic groups with loss of statistical power, but aggregation overcame this. Single-combined ethnicity revealed diversity of risk within the broader Māori and Pacific groups and for this reason, and with mixed ethnicity becoming increasingly common, the single-combined method should be preferred where sample size and data structure allow it. Household income was affected by non-random missing data and the inability to combine pre-tax and post-tax income, both factors contributing to a risk of bias. Both NZDep2013 and the NZiDep were effective in differentiating risk between high and low levels of deprivation. The NZDep2013 8 Mark R. Hobbs et al.
requires caution with regards to the ecological fallacy and is most appropriate to studies which do not include social determinants of health which overlap with the census items used to derive the index. The NZiDep avoids these issues but requires a large sample size as a relatively small proportion of people are identified as having high deprivation scores. The questions from which the NZiDep is derived may be of interest in their own right.