Validity of recalled v. recorded birth weight: a systematic review and meta-analysis

Department of Geriatric Medicine, University of Edinburgh, Edinburgh, United Kingdom
Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom
Medical Student, University of Edinburgh, Edinburgh, United Kingdom
NHS Lothian, Edinburgh, United Kingdom
MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, United Kingdom
University/British Heart Foundation Centre for Cardiovascular Science, Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, United Kingdom


Introduction
Birth weight is an important marker of current and future health, and has been used in many epidemiological studies of determinants of health and disease from childhood through adulthood to old age. 1,2 Some studies have recorded birth weight directly in official records, 3 but many studies rely on recalled birth weight reported by the participants or their mothers. 4 Several studies have found that maternal recall is fairly accurate, even years after the birth, 5,6 but to our knowledge there has been no systematic review to establish whether this finding is consistent across all published studies. This systematic review and meta-analysis of published observational studies aimed to determine the agreement between birth weight recalled by parent or self any time after birth, and the actual birth weight recorded in official records.

Data sources
We followed the Meta-Analyses of Observational Studies in Epidemiology (MOOSE) guidelines for the conduct, 7 and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for the reporting, 8 of this systematic review. M.G.Z. performed the literature search on MEDLINE, EMBASE and Cumulative Index to Nursing and Allied Health Literature (CINAHL) from inception to May 2015 using terms as both keywords and indexing (MeSH) terms: birth weight AND (mental recall OR self-report) AND (recorded OR actual OR verified) (full strategy: Supplementary material 1). We also searched reference lists and performed a forward citation search of all included papers.

Study selection
We included studies in the systematic review which addressed the question: 'Does recalled birth weight correlate with recorded birth weight?' We included both self and parental recall, with no restriction on time from birth. We excluded studies that did not report a 'gold standard' for birth weight (recorded in an official document, e.g. birth certificate or birth register). We excluded individuals with specific mental or physical illnesses to ensure results were applicable to the general population, but included control groups if these were reported separately. We excluded studies that selected participants on the basis of abnormal births (e.g. low birth weight or preterm) as a high-risk pregnancy or birth may affect frequency of measurement, and influence maternal recall, but included studies of all unselected births. We excluded studies which only categorized birth weight into two or three categories. There were no exclusions by age, sex, socioeconomic status, ethnicity or country, or language of publication. M.G.Z., S.M. and T.H.M. independently identified studies for inclusion, resolving any disagreements by consensus, and/or discussion with S.D.S. and R.M.R. The protocol is available by contacting the authors.

Data extraction
M.G.Z. and T.H.M. independently extracted relevant information on study characteristics (Table 1), and results (Table 2) directly to Excel spreadsheets. This included factors which may influence recall of birth weight, that is, time since birth, method of recall (questionnaire or interview) and parity. Each paper was assessed qualitatively for major sources of bias or confounding.
Where data were not published, we contacted authors twice by email and post. We received a response with data from two (Sou, 9 data included; Tehranifar, 10 some required data not collected, therefore not included) and two further stating that data were not available. 2][13] Where studies reported some form of correlation between the measures (Pearson's r, Spearman's ρ, ICC or κ), this was used in the main analysis if calculated on continuous (individual) birth weight measures, but not if calculated using categories of birth weight. Where more than one measure was reported, we used Pearson's r.
Where no correlation measure was reported, we used the summary estimate from the other studies as described below. Jaspers et al. 14 reported an upper confidence limit which appeared too large (0.16 pounds = 80 g), given the mean difference of 25 g and the lower limit of 10 g. We contacted the author but have not received a reply, so have used an upper limit of 40 g, which makes the interval symmetric about the mean difference.
The main quality assessment was the risk of bias in recall of birth weight due to access to the gold standard (e.g. birth certificate). We categorized risk of bias as high if the subjects had access to this document at the time of the study, low if they did not have access, or unclear if access was not reported but possible (for example, a telephone interview where the parent would have had access to birth records kept at home).

Meta-analysis
The meta-analysis was conducted with Comprehensive Meta Analysis V3.3 (Biostat, Englewood, CO, USA) using inverse variance weighting and the method of moments for random effects. 15 This means that the impact of the sample size is proportional to its square root. The main analysis summarized the mean difference in grams between measured and recalled birth weight.
To accurately calculate the variance of the difference requires knowledge of the correlation between recalled and measured birth weight.
The first step was to produce a summary estimate of the correlation from those studies that reported it.The summary estimate was then used in the main analysis for those studies that did not report a correlation.
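The role of the correlation can be illustrated with the standard formula for the variance of a paired difference, Var(d) = s1² + s2² − 2·r·s1·s2. The sketch below is illustrative only: the function name and the example values (standard deviations of 500 g, n = 100) are hypothetical and are not taken from any included study.

```python
import math


def se_mean_difference(sd_recalled, sd_recorded, r, n):
    """Standard error of the mean paired difference (recalled minus recorded).

    Uses Var(d) = s1^2 + s2^2 - 2*r*s1*s2, where r is the correlation
    between the two measures, then divides by n for the mean.
    """
    var_diff = sd_recalled**2 + sd_recorded**2 - 2.0 * r * sd_recalled * sd_recorded
    return math.sqrt(var_diff / n)


# Hypothetical study: both SDs are 500 g and 100 participants.
se_with_r = se_mean_difference(500.0, 500.0, 0.90, 100)     # about 22.4 g
se_ignoring_r = se_mean_difference(500.0, 500.0, 0.0, 100)  # about 70.7 g
```

With a correlation of 0.90 the standard error is roughly a third of the value obtained by treating the two measures as independent, which is why an assumed correlation is needed for studies that did not report one.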
A preliminary fixed effects analysis revealed high levels of heterogeneity (I² = 80%); we therefore report summary effects from random effects models.
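As a rough illustration of how I² and a method of moments random effects estimate relate, the sketch below implements the DerSimonian-Laird estimator in plain Python. It is an assumption that this matches the internals of the software used in the review; the function name and the returned dictionary keys are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random effects pooling with the method of moments (DerSimonian-Laird).

    effects: per-study mean differences; variances: their sampling variances.
    """
    w = [1.0 / v for v in variances]          # inverse variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q measures dispersion around the fixed effects estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)             # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random effects weights add tau2 to each study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "tau2": tau2, "i2": i2, "q": q}


# Two hypothetical studies that disagree strongly.
example = dersimonian_laird([0.0, 10.0], [1.0, 1.0])
```

When the studies agree, tau² is zero and the result reduces to the fixed effects estimate; when they disagree, the weights become more equal and the confidence interval widens.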
Sensitivity analyses were conducted for (1) recall bias (only including studies without recall bias); (2) time elapsed since birth (only including those >1 year); (3) parity correction (only including studies which corrected for parity); (4) studies using estimated values; (5) study sample size (omitting the two largest studies and conducting a leave-one-out analysis); (6) the estimated correlation between measures (using the values of the 95% CI in place of the summary estimate).
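The leave-one-out analysis in item (5) can be sketched as follows. For brevity the default pooling function here is a simple inverse variance (fixed effects) mean, which is a simplification and not the random effects model used in the review; any other pooling function with the same signature could be passed in. All names and example values are hypothetical.

```python
def inverse_variance_mean(effects, variances):
    """Fixed effects pooled estimate (a simplification for illustration)."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances, pool=inverse_variance_mean):
    """Re-pool the studies once for each omitted study, in input order."""
    estimates = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        estimates.append(pool(kept_effects, kept_variances))
    return estimates


# Hypothetical mean differences (g) and variances for three studies.
effects = [2.0, -3.0, 40.0]
variances = [1.0, 4.0, 100.0]
full = inverse_variance_mean(effects, variances)
# Shift in the summary estimate caused by omitting each study in turn.
shifts = [estimate - full for estimate in leave_one_out(effects, variances)]
```

A study is influential if omitting it shifts the summary estimate by more than a tolerance chosen in advance.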
Subgroup analyses were conducted for (1) self v. parental recall; (2) metric v. imperial units of measurement; (3) high v. low and middle income countries. The first two were pre-specified, whereas the third was post-hoc, suggested by a reviewer. Meta-regression was used to explore further significant subgroup differences.

Qualitative synthesis
In total, 40 studies were eligible for inclusion in the systematic review (Table 1). They were heterogeneous: sample size in the recalled group ranged from 14 to 46,637 (median 257); year of publication ranged from 1935 to 2013; the majority were from the United States (18 studies) and Europe (13 studies); birth information was mostly reported by mothers (31 samples), self (eight samples) or either parent (five samples). Two studies reported both mother and self-report. 30,41 The time to recall for parental report varied from 3 weeks to 96 years, and for self-report from 27 to 78 years. Data collection was by interview (20 studies, including three by telephone), questionnaire (17 studies) or both. Recorded data were from clinical (hospital or birth register) records (33 studies), birth certificates (four studies), or research databases collected at birth (four studies).
The majority reported metric measures (g); where imperial measures were used we converted to metric (1 oz = 28 g). Note one study used 'Dutch modern pounds' = 500 g. 14 There were 10 samples from nine studies, 10,20,23-27,46,48 which did not provide data for meta-analysis. These included from 47 to 2552 mothers (median 99) (Table 1) and generally reported good agreement within birth weight categories (Table 2), with over 50% of participants reporting agreement within 25 g (1 oz), 20,23 and 70-90% agreeing within 100 g. 20-24,27,47,48 The majority of studies were small (n < 200), with an unclear risk of bias (i.e. most studies did not report whether or not the informant had access to a recorded birth weight). Bat-erdene et al. 48 (n = 2552) estimated maternal recall at up to 3 months compared with electronic health records and found that 11.1% had exact recall, and 88.4% within 50 g; Victora et al. 23 (n = 1800) in Brazil at 9-15 months found 60% of mothers recalled the exact weight.
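The unit conversions stated above are simple enough to express directly. The following is a minimal sketch using only the two factors given in the text (1 oz = 28 g and one 'Dutch modern pound' = 500 g); the constant and function names are hypothetical.

```python
GRAMS_PER_OUNCE = 28.0          # rounded factor stated in the review
GRAMS_PER_DUTCH_POUND = 500.0   # 'Dutch modern pound' used by one study


def ounces_to_grams(ounces):
    """Convert ounces to grams with the review's rounded factor."""
    return ounces * GRAMS_PER_OUNCE


def dutch_pounds_to_grams(pounds):
    """Convert 'Dutch modern pounds' to grams."""
    return pounds * GRAMS_PER_DUTCH_POUND


two_ounces = ounces_to_grams(2)                  # 56 g
point_two_ounces = round(ounces_to_grams(0.2))   # 6 g after rounding
dutch_limit = dutch_pounds_to_grams(0.16)        # about 80 g
```

Because the factor is rounded (an avoirdupois ounce is about 28.35 g), converted values carry a small systematic error that is negligible next to the differences reported here.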
The largest study by far was eligible for meta-analysis: Gayle et al. 47 (n = 46,637) followed up participants in the Tennessee Women, Infants and Children Supplemental Feeding Program in the United States, and found 70.6% of mothers had exact recall, and 89% within 28 g. This study included 20% preterm, and 7.4% low birth weight, but we did not exclude this study as these groups were not intentionally oversampled. The time to recall was not reported, though they reported that there was no difference in recall if child's age was greater or less than 1 year. There was no access to the electronic health record. Lower accuracy was associated with infant's low birth weight, poor birth outcome, poorer education, black race, single marital status and age <18 years. Mothers reported a 0.2 oz (6 g) lower mean birth weight compared with birth certificates.
Most studies do not report the proportion who were unable to recall birth weight: in Allen et al. 40 this was 47% (Table 2). In summary, included studies find that almost 90% of mothers recall birth weight to within 1-2 oz (Table 2).

MD, mean difference. Where data reported in oz, converted to g (1 oz = 28 g). Where MD is negative, recorded BW larger than recalled BW. a Included in meta-analysis of mean difference. b Included in meta-analysis of correlation only. c Correlation coefficient/κ reported for categorical analysis therefore not included in meta-analysis.

Meta-analysis
We included 23 samples from 19 studies (total n = 7406) in the meta-analysis of correlation, and 29 samples from 26 studies (total n = 72,114) in the meta-analysis of differences in birth weight (Tables 1 and 2): three studies 13,32,41 had two sets of data which allowed separate analysis: two age groups; 32 first v. subsequent births; 13 maternal v. self recall 41 (Table 2). Sample size ranged from 14 to 46,637, median 265.

Correlation
There was a strong correlation between recalled and recorded birth weight, estimated as 0.90 (95% CI 0.86-0.93) (Fig. 2). This estimate of the correlation was used in the main analysis for studies that did not report a correlation.
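One common way to pool correlation coefficients is to transform each to Fisher's z, average with weights of n − 3, and back-transform. The sketch below shows that approach under two assumptions that the text does not confirm: that Fisher's z was the scale used, and a fixed effects weighting for brevity (the review reports random effects estimates). The function name and inputs are hypothetical.

```python
import math


def pool_correlations(rs, ns, z_crit=1.96):
    """Pool correlations on Fisher's z scale; returns (r, lower, upper).

    Each z = atanh(r) has approximate variance 1 / (n - 3), so the
    inverse variance weight for a study is simply n - 3.
    """
    zs = [math.atanh(r) for r in rs]
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    lower, upper = z_bar - z_crit * se, z_bar + z_crit * se
    # Back-transform to the correlation scale.
    return math.tanh(z_bar), math.tanh(lower), math.tanh(upper)


# Two hypothetical studies reporting the same correlation.
r_pooled, r_lower, r_upper = pool_correlations([0.90, 0.90], [50, 100])
```

Working on the z scale keeps the back-transformed interval inside the range −1 to 1 and makes it asymmetric about the point estimate, as the reported interval is.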

Differences in absolute birth weight
The absolute difference in birth weight between recalled and recorded values was very small, not statistically significant, and unlikely to be clinically important: 1.4 g (95% CI −4.0 to 6.9 g) (Fig. 3).

Sensitivity analysis
Sensitivity analyses to assess the effect of (1) recall bias; (2) time elapsed since birth; (3) parity correction; (4) studies using estimated values all showed little effect on the results (Supplementary Figs 1-4). Leaving out the two very large studies, Gayle (n = 46,637) and Tate (n = 11,890), yielded a summary estimate of 5.82 g (−4.36, 16.00). A leave-one-out analysis showed that no other study affected the summary estimate by more than 2 g (Supplementary Fig. 5). For eight studies, we used a summary estimate of the correlation. We therefore also performed sensitivity analyses in which we substituted the upper and lower 95% limits of the estimated correlation (0.93 and 0.85) for those studies that did not report one. The results (mean difference, 95% CI) are: 1.88 g (−3.64, 7.41) and 0.96 g (−4.50, 6.39), for the upper and lower limit, respectively.

Subgroup analyses
Subgroup analysis by informant and units of measurement yielded subgroup estimates that were not significantly different (Supplementary Figs 6 and 7). In contrast, the analysis by country income category revealed a striking difference. Recalled birth weight in low and middle income countries appears to overestimate recorded birth weight by around 80 g (95% CI 57-103 g) (Fig. 4). The income categorization explained 77% of between study variance, but unexplained variance was still moderately high (I² = 48%).

Risk of bias
Most studies were observational cohort studies of good quality with little evidence of major sources of bias or confounding factors. Some studies analyzed subgroups to determine if there were subgroups with higher or lower errors. Inclusion and exclusion criteria were generally not well reported. The main source of bias was the possibility that participants were not blinded to the recorded birth weight (e.g. birth certificate), and for most studies it was unclear whether or not participants had access to such records. One excluded study 49 explicitly asked parents to copy results from a personal child health record.
Results were essentially unchanged if we excluded studies where access to the birth weight record was possible (difference in means −0.04 g; CI −5.6 to 5.5 g).

Discussion
This systematic review of 40 studies (total n = 78,997 births) and meta-analysis in 29 samples from 26 studies (total n = 72,114) shows that recalled birth weight has excellent agreement with recorded birth weight: pooled estimate of correlation in 23 samples from 19 studies (total n = 7406 births) was 0.90 (95% CI 0.86-0.93), with a small absolute difference: range from −86 to +129 g; random effects estimate 1.4 g (95% CI −4.0 to 6.9 g). There was no evidence for an effect of self or parental recall, age at recall or time elapsed since birth event on the validity of recalled birth weight. There was, however, evidence of higher recalled birth weight of 80 g (95% CI 57-103 g) in low or middle income countries, in post-hoc analysis. The majority of the studies included reported high agreement, with a small (clinically insignificant) absolute difference. In studies which reported findings in categories, rather than absolute values, over 50% of participants reported agreement within 25 g (1 oz). If a 100 g error was tolerated, most studies reported agreement between 70 and 90%. Some of the differences may be due to reporting (rounding) errors: if reporting in imperial measures to the nearest ounce, the margin of reporting error could be up to 56 g (2 oz).
A strength of our study is that a systematic and comprehensive review process, devised with an experienced librarian and reported in line with PRISMA guidelines, was followed for this review. Two reviewers independently assessed eligibility of the titles, abstracts and full-text studies. We were able to conduct a meta-analysis of a significant number of studies with a large pooled sample size. Studies only including clinical populations, for example, those with mental or physical illnesses, were excluded. We did this to ensure that our results were generalizable to the general population. Future systematic reviews can establish if the findings are similar in clinical subgroups.
However, there are some potential limitations of our study. The search terms were broad, and it is possible we have missed some potentially eligible studies. We also excluded studies that categorized births into three or fewer groups. The studies are heterogeneous in terms of size, countries, ethnicities, age groups, methodology (e.g. data collection methods, gold standard used), and reporting of statistical analysis. However, we performed sensitivity analyses to assess the effect of several potential influences on results, for example, imperial v. metric measurement, sample size, time since birth, first born v. subsequent birth, self v. parental recall, and found that there was no statistically significant influence on results. We also assessed the effect of the two largest studies: removing them increased the summary estimate from 1.4 g to 5.8 g, but neither of these is clinically significant. A further limitation is that the majority of studies were small, and the overall results are predominantly affected by a few large studies (in qualitative analysis, 23,47,48 in meta-analysis 6,14,42,43). However, the smaller studies had similar findings in qualitative review and meta-analysis.
Any validation study is limited by the data available: here, we required both the availability of a historical record, and an individual's recall. Clinical records may not be accessible in some countries, and accurate data may not be recorded, particularly in home births. Recovery of recorded birth weights could be as low as 10%. Historical records require transcription from hand-written ledgers for electronic analyses. Birth certificates include birth weights in some countries (e.g. United States) but not all. Recall rates, where reported, were variable, for example, self-recall 24 12 or 46%. 37 This may vary for several reasons, for example, by country: in Africa up to 25% could not recall birth weight; 46 due to maternal or fetal factors such as maternal education; 47 or due to neonatal complications. 26 Furthermore, there are many methods of reporting the agreement between two measures.
We report correlation and mean difference, but acknowledge that the overall correlation coefficient is limited as a measure of agreement: it measures the strength of the relationship between two variables, not the agreement between them; it is unaffected by the scale of measurement (e.g. grams or kilograms); it depends on the range of the measurements; it may mask variability within subgroups, or in certain parts of the distribution. 12,50 The Pearson correlation coefficient is, however, required to correctly estimate the variance of the mean difference, so we would suggest that authors of future studies include this along with other measures of agreement.
We did not assess risk of bias using formal tools: there is currently no consensus on the best method of quality assessment for observational studies. The major source of potential bias was whether the individual had access to the recorded birth weight: for example, in Catov et al., 13 the mother brought in the birth certificate at the time of interview, which was used as the record of actual birth weight. However, the results were similar in studies where there was no access to the recorded birth weight. Some studies suggest that recall may be more accurate within some ethnic, socioeconomic or clinical subgroups. 6,12 We did not extract data relating to this, and many studies did not report these data.
Birth weight from historical records has been used in many epidemiological studies, particularly relating to the Developmental Origins of Health and Disease. 1,2 It is debated whether recalled birth weight is sufficient to explore the influence of early life factors as part of life course epidemiology. However, it is still widely used, and the findings from this systematic review and meta-analysis suggest that recalled birth weight can be reliably used as an estimate of actual birth weight, where birth records are not available, for example as a risk factor for later disease. 1,2 Recalled birth weight also appears valid in low birth weight and preterm births, as part of population studies, but future studies should explore whether there are different rates of recall in clinical subgroups. There is insufficient evidence to confidently extrapolate this finding to low income countries, and future studies should explore whether the reported recall of higher birth weight in low and middle income countries is replicated, and explore potential reasons for this.

Conclusion
This systematic review and meta-analysis suggests that where birth weight is recalled, it can confidently be used as a reliable estimate of actual birth weight, particularly in high income countries.

Fig. 4. Subgroup analysis of high v. low and middle income countries. HIC, high income country; LMIC, low/middle income country.

Flow [Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)] diagram of included studies. CINAHL, Cumulative Index to Nursing and Allied Health Literature. Exclusions listed in the diagram: no birth weight data (52); single group, no comparison (21); clinical population (15); no gold standard (10); birth weight groups too large (7); retracted (1); no full text available (German) (1); birth weight copied, not recalled (1).

Table 1. Descriptive data of studies included in systematic review of recalled v. recorded birth weight (in order of publication)
BW, birth weight; ICC, intraclass correlation; LBW, low birth weight; NR, not recorded. Where data reported in oz, converted to g (1 oz = 28 g). Where mean difference is negative, recorded BW larger than recalled BW. a Included in meta-analysis of mean difference. b Included in meta-analysis of correlation only.

Table 2. Results of studies included in systematic review of recalled v. recorded birth weight (in order of publication)