Research has increasingly shown that prenatal and perinatal events have an important effect on the later and lifelong health outcomes of offspring. Complications during pregnancy, delivery, and early childhood have all been associated with neurological, developmental, and neuropsychiatric disorders (Cannon et al., 2002; Liu et al., 2009; Rice et al., 2007), as well as chronic diseases such as obesity, metabolic syndromes, cardiovascular disease, cancer, and neurocognitive disorders (Rice et al., 2007; Sou et al., 2006; Troude et al., 2008). Early life factors are also associated with the development of chronic diseases and increased rates of cognitive, behavioral, and emotional problems (Buka et al., 2004; Liu, 2011; Rice et al., 2007; Tomeo et al., 1999). For instance, recent studies have suggested an association between birth weight and cardiovascular diseases in adulthood (Frontini et al., 2004; Mzayek et al., 2007) that is not confounded by genetic and environmental factors (Bergvall et al., 2007). Furthermore, maternal health-related behaviors, such as substance use during pregnancy, are known to have important implications for offspring health and development.
Researchers are increasingly interested in obtaining information about the perinatal period and often do so through maternal recall. Although medical records are often considered the most accurate source of information, using medical records and charts can be impractical due to time and cost constraints, and in some instances health registries and records may not exist at all (Troude et al., 2008). Furthermore, recording errors can occur, medical criteria may vary from hospital to hospital, and abstracting information may be difficult because of Health Insurance Portability and Accountability Act (HIPAA) policies and inconsistencies in record organization (Elliott et al., 2010; Hewson & Bennett, 1987; Joffe & Grisso, 1985). As a result, pregnancy and neonatal information is commonly obtained through cost-effective self-report questionnaires or interviews.
However, the validity and reliability of maternal report are still debated. Despite a number of studies suggesting that maternal recall is sufficiently reliable for some pregnancy and early life characteristics (D'Souza-Vazirani et al., 2005; Launer et al., 1992; Li et al., 2005; McCormick & Brooks-Gunn, 1999; Olson et al., 1997; Quigley et al., 2007; Reich et al., 2003; Tomeo et al., 1999), evidence still suggests poor to moderate recall for information including lifestyle during pregnancy (Jaspers et al., 2010), complications and disease diagnoses (Coolman et al., 2010; Sou et al., 2006), and procedures during delivery (Quigley et al., 2007). These inconsistencies are attributed in part to the varied sample populations, methodologies, recall lengths, and measures of interest in the current literature. Importantly, most studies have focused on the recall of one or a few related variables, such as birth weight (Catov et al., 2006; Lumey et al., 1994) and specific procedures or complications during delivery (Coolman et al., 2010; Quigley et al., 2007; Sou et al., 2006).
Thus, it is unclear whether inconsistencies in findings actually reflect differences in the accuracy of maternal report for different variables or whether they are due to methodological variations (e.g., sample characteristics, questionnaire wording, measurement). Few studies have looked comprehensively at recall validity for prenatal, perinatal, and postnatal data, and those that have often used small samples (Githens et al., 1993; Rice et al., 2007; Tomeo et al., 1999). Even fewer studies have addressed maternal recall of pregnancy, delivery, or postnatal complications, and those that have often do so very broadly (Tomeo et al., 1999; Yawn et al., 1998). On the other hand, studies that have examined recall validity for specific complications often leave out other important perinatal factors (Buka et al., 2004; Coolman et al., 2010; Sou et al., 2006). Finally, and importantly, validity has not been assessed in a large sample of mothers of twins who are asked to recall information for both twins simultaneously. To our knowledge, only Reich et al. (2003) have examined maternal recall with a focus on mothers of twins. In their study, mothers were re-interviewed 6–18 months after the initial interview, but comparison with medical records was not available. Thus, while the use of twins allowed the reliability of maternal recall to be assessed, the validity of this information was not established.
This study aims to help bridge these gaps in the existing literature by examining the validity of maternal recall in a large twin cohort. Mothers of twins were asked to complete a questionnaire, developed by the first author, on pregnancy- and birth-related events, including maternal history, medical problems during pregnancy, substance and vitamin use, delivery procedures, neonatal information for both twins, and post-delivery complications for both twins. Validity was assessed by comparing questionnaire answers with medical records.
The subjects were participants in the University of Southern California (USC) Risk Factors for Antisocial Behavior (RFAB) twin study, an ongoing prospective longitudinal study of how genetic, environmental, social, and biological factors interact in the development of antisocial behavior from childhood to early adulthood. The twins and their parents were recruited from the larger Los Angeles community, and the sample is representative of the ethnic and socioeconomic diversity of the greater Los Angeles area. At the first assessment (Wave 1), the twins were 9–10 years old (mean age = 9.59, SD = 0.58); at Wave 2, they were 11–13 years old (mean age = 11.79, SD = 0.92); at Wave 3, they were 14–15 years old (mean age = 14.82, SD = 0.83); and at Wave 4, they were 16–18 years old (mean age = 17.22, SD = 1.23). The total sample contains 1,564 subjects (781 twin pairs), including 169 monozygotic (MZ) male, 171 MZ female, 121 dizygotic (DZ) male, 120 DZ female, and 200 DZ opposite-sex twin pairs. Complete details on the procedures and measures can be found elsewhere (Baker et al., 2006, 2007, 2013).
Caregiver participation was primarily by biological mothers (>90%). Information on prenatal recall was collected from 611 of the twins’ mothers. The mean maternal age at pregnancy in this sample was 29.5 years.
Retrospective birth complications recall questionnaire
Birth complications recall was measured with a retrospective questionnaire developed by the first author, who holds a master's degree in Maternal-Child Health Nursing (see the Appendix). It was derived from the Birth Complications-Medical Records Instrument (see below) but asked mothers about birth complications at a more general level. The form includes questions on three main areas: prenatal (during pregnancy), perinatal (during birth), and postnatal (newborn) complications. Mothers were asked to complete a computerized version of the questionnaire at their visit to the USC laboratory.
Birth complications-medical records instrument
We developed the Birth Complications-Medical Records Instrument, which captures more detailed birth complications information. It was derived from two well-established instruments: the Lewis–Murray Obstetric Complication Scale (Lewis & Murray, 1987; Lewis et al., 1989) and the McNeil–Sjöström Scale for Obstetric Complications (McNeil & Sjöström, 1995). In this study, we asked each mother for permission to obtain the children's medical records, which were stored at the birth hospitals. We then contacted each hospital, and the records were mailed to the laboratory.
Items were grouped into those events occurring prior to the pregnancy of interest (maternal history), during the pregnancy (medical problems during pregnancy; substance use during pregnancy; vitamins during pregnancy), during delivery (medical procedures), information on the infant (neonatal information), and events occurring after delivery (post-delivery complications); see Table 1.
*p < .05. κ was not calculated for variables in which observed concordance was smaller than mean chance concordance.
NICU = neonatal intensive care unit.
In the prenatal recall record, seven different illnesses were combined into one variable (0 = none, 1 = respiratory infection, 2 = urinary tract infection, 3 = gall bladder inflammation, 4 = measles, 5 = TB, 6 = epilepsy, 7 = asthma). Because data were limited for some diseases, only respiratory infection, urinary tract infection, and asthma were retained when calculating the κ statistic. Two variables in the prenatal record, pre-eclampsia and hypertension, were combined and paired with pre-eclampsia in the medical records.
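The recoding described above can be sketched as follows, assuming each mother's recalled illness is stored as a single integer code and the two hypertensive items as 0/1 flags. The function names and the per-illness indicator approach are hypothetical illustrations; the text does not specify exactly how the retained illnesses entered the κ calculation.

```python
# Hypothetical recoding sketch; names and data layout are illustrative only.

# Codes of the combined illness variable in the prenatal recall record.
ILLNESS_CODES = {
    0: "none",
    1: "respiratory infection",
    2: "urinary tract infection",
    3: "gall bladder inflammation",
    4: "measles",
    5: "TB",
    6: "epilepsy",
    7: "asthma",
}

# Illnesses retained for the kappa analysis (the others were too rare).
RETAINED_CODES = (1, 2, 7)


def illness_indicator(codes, target):
    """Return a 0/1 flag per mother: did she report this retained illness?"""
    if target not in RETAINED_CODES:
        raise ValueError(f"{ILLNESS_CODES.get(target, target)!r} is not a retained illness")
    return [1 if code == target else 0 for code in codes]


def combine_hypertensive(recalled_preeclampsia, recalled_hypertension):
    """Merge two recalled 0/1 items into one flag: 1 if either was reported.

    The merged flag would then be paired with the records' pre-eclampsia item.
    """
    return [int(bool(pe) or bool(htn))
            for pe, htn in zip(recalled_preeclampsia, recalled_hypertension)]


if __name__ == "__main__":
    print(illness_indicator([0, 1, 7, 2, 1], target=1))      # [0, 1, 0, 0, 1]
    print(combine_hypertensive([0, 1, 0, 0], [0, 0, 1, 0]))  # [0, 1, 1, 0]
```

Reducing each retained illness to its own binary flag is only one reasonable reading; computing κ on the multi-category code restricted to the retained values would be another.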
Measures of agreement for both categorical and continuous variables are presented. For continuous variables (e.g., birth weight and birth length), we computed Pearson's correlation coefficient with p values. The κ statistic, which measures the extent of exact agreement adjusted for chance agreement, was calculated for categorical variables. All analyses were performed using the statistical software SAS (SAS, 2005).
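As a rough illustration of the two agreement measures, the sketch below computes Cohen's κ for paired categorical answers (maternal recall vs. medical record) and Pearson's r for paired continuous values. It is a plain-Python stand-in, not the SAS procedures actually used, and it omits p values. Returning `None` when observed agreement falls below chance-expected agreement is an assumption modelled on the table note stating that κ was not calculated in that case.

```python
from collections import Counter
from math import sqrt


def cohens_kappa(recall, record):
    """Cohen's kappa for two equal-length lists of category labels.

    Returns None when observed agreement is below chance agreement (mirroring
    the note that kappa was not reported then) or when kappa is undefined.
    """
    n = len(recall)
    if n == 0 or n != len(record):
        raise ValueError("need two non-empty lists of equal length")
    # Observed agreement: share of mothers whose answer matches the record.
    p_o = sum(a == b for a, b in zip(recall, record)) / n
    # Chance agreement: sum over categories of the product of marginal shares.
    count_recall, count_record = Counter(recall), Counter(record)
    p_e = sum(count_recall[c] * count_record[c] for c in count_recall) / n ** 2
    if p_o < p_e or p_e == 1:
        return None
    return (p_o - p_e) / (1 - p_e)


def pearson_r(x, y):
    """Pearson correlation between recalled and recorded continuous values."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    recall = [1, 1, 0, 0, 1, 0, 1, 1]  # invented yes/no answers
    record = [1, 0, 0, 0, 1, 0, 1, 1]
    print(cohens_kappa(recall, record))         # 0.75
    print(pearson_r([1, 2, 3], [2, 4, 6]))      # 1.0
```

In the invented example, 7 of 8 answers match (p_o = 0.875) but chance alone would produce 0.5 agreement, so κ credits only the agreement beyond chance: (0.875 − 0.5) / (1 − 0.5) = 0.75.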
Recall validity for several perinatal factors obtained from the USC Twin Study is presented in Table 1.
Perfect agreement was obtained for maternal history (previous live births; κ 1.00). Agreement was poor for medical problems during pregnancy, such as bleeding (0.39), edema (0.30), proteinuria (0.10), and nausea and vomiting (0.38). Alcohol use and vitamin use during pregnancy were very poorly recalled, whereas smoking during pregnancy showed moderate agreement. Information on substance use was collected as continuous data (e.g., number of cigarettes per day), and the validity analysis was repeated after dichotomizing these data so that any answer >0 represented, for example, having ever smoked during pregnancy. This produced better recall accuracy for smoking throughout pregnancy (κ 0.73, 95% CI 0.48–0.98) as well as during the first, second, and third trimesters (0.79, 0.80, 0.78, respectively), but recall validity for alcohol use remained poor (κ 0.08, 95% CI −0.18 to 0.35).
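The dichotomization step can be sketched as below: continuous answers such as cigarettes per day are mapped to 1 when greater than 0, and κ is recomputed on the binary pairs with a 95% confidence interval. The interval uses Cohen's simple large-sample standard error, an assumption made for illustration; SAS reports an asymptotic standard error based on a more detailed variance formula, so intervals from the two may differ. All data values are invented.

```python
from math import sqrt


def dichotomize(values, threshold=0):
    """Map continuous answers (e.g. cigarettes per day) to 1 if > threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def kappa_with_ci(recall, record, z=1.96):
    """Cohen's kappa for paired 0/1 lists with an approximate Wald-type interval.

    Standard error: sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2)), the simple
    large-sample approximation (an assumption; not the formula SAS uses).
    """
    n = len(recall)
    if n == 0 or n != len(record):
        raise ValueError("need two non-empty lists of equal length")
    p_o = sum(a == b for a, b in zip(recall, record)) / n  # observed agreement
    p_yes_recall = sum(recall) / n
    p_yes_record = sum(record) / n
    # Chance agreement from the marginal proportions of 1s and 0s.
    p_e = p_yes_recall * p_yes_record + (1 - p_yes_recall) * (1 - p_yes_record)
    if p_e == 1:
        raise ValueError("kappa is undefined when all answers fall in one category")
    kappa = (p_o - p_e) / (1 - p_e)
    se = sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - z * se, kappa + z * se)


if __name__ == "__main__":
    # Invented cigarettes-per-day answers for eight mothers.
    recalled = [0, 5, 0, 10, 0, 2, 0, 0]
    recorded = [0, 3, 0, 10, 0, 0, 0, 0]
    kappa, (low, high) = kappa_with_ci(dichotomize(recalled), dichotomize(recorded))
    print(f"kappa = {kappa:.2f}, 95% CI {low:.2f} to {high:.2f}")  # kappa = 0.71
```

Because this interval is symmetric around κ, a reported κ of 0.08 implies a lower bound below zero; with only eight invented pairs the interval is also very wide, and its upper bound can exceed 1 (some analysts truncate it there).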
Perinatal and Postnatal
For both twins (A, B), near-perfect (κ ≥ 0.80) agreement was obtained for medical procedures/method of delivery (κ 0.94, 0.97) and birth weight (κ 0.84, 0.82), but not for birth length (κ 0.17, 0.21). Recall of specific delivery procedures, such as the use of forceps or of oxytocin to induce labor, was excluded from validity analyses because of low frequency. Recall accuracy was generally low for neonatal information and post-delivery complications. Mothers recalled neonatal information for both twins with similar accuracy, with no consistent advantage for Twin A over Twin B or vice versa. A pronounced difference in recall accuracy was, however, found for muscle tone: agreement between recall and medical records was substantial for Twin A (κ 0.70, 95% CI 0.51–0.89) but poor for Twin B (κ 0.26, 95% CI 0.05–0.46).
This study examined the validity of maternal recall for perinatal variables in a large twin sample 8–10 years after birth. Overall, the data obtained from questionnaires completed by mothers around 9 years after pregnancy showed substantial agreement (κ ≥ 0.60) with medical records for most pre-, peri-, and postnatal variables. Exceptions included poor validity for medical problems during pregnancy (e.g., bleeding, edema, proteinuria), substance and vitamin use, and some neonatal information (e.g., birth length, meconium, respiratory distress, and jaundice).
To our knowledge, this is the first study to use medical records to demonstrate that maternal recall is a valid method for obtaining neonatal information in twins. Although Reich et al. (2003) examined the reliability and stability of maternal report in a twin sample, their design compared sets of interview responses and did not assess validity through comparison with medical records. The findings for several pregnancy and neonatal factors are discussed further below.
The recall validity of medical problems during pregnancy, such as bleeding, edema, and nausea and vomiting, was mostly poor to moderate. Low rates of recall for antepartum vaginal bleeding and edema have been reported previously (Bryant et al., 1989; Buka et al., 2000; Olson et al., 1997; Sou et al., 2006). Low rates of maternal recall for these particular problems may reflect the fact that these complications are often not severe enough to warrant major interventions (e.g., dietary change, medication) and are thus less memorable to mothers (Sou et al., 2006). Indeed, the few women whose complications required medication recalled this information with near-perfect accuracy. The moderate agreement between recalled hypertension/pre-eclampsia and recorded pre-eclampsia in our sample (κ 0.60, 95% CI 0.39–0.80) is in line with previous reports, which have generally called for clearer patient–doctor communication to improve maternal recall (Coolman et al., 2010; Rice et al., 2007). Previous work has also suggested that recall of hypertension is particularly time-sensitive (Olson et al., 1997).
Our initial findings suggest very poor recall validity and reliability for both smoking and alcohol use. While our findings are in line with existing evidence that maternal recall of alcohol use is poor (Delgado-Rodriguez et al., 1995; Jaspers et al., 2010; Rice et al., 2007), these and other studies have demonstrated accurate recall for smoking (Tomeo et al., 1999; Yawn et al., 1998), which was not observed in our initial analysis. This discrepancy in recall validity for smoking likely reflects the fact that we asked mothers to provide continuous data (e.g., cigarettes per day for the overall pregnancy and for each trimester). In contrast, other studies have generally used dichotomous categories (e.g., ‘ever’/‘never’) when comparing maternal recall data with medical records. Repeating our validity analysis with dichotomized data produced results more in line with the existing evidence: substantial to near-perfect agreement for smoking but poor recall of alcohol use.
In addition to poor recall of substance use, we also found very poor recall (κ < 0.20) for the use of prenatal vitamins, iron supplements, and folic acid during pregnancy. Because few mothers reported using each individual supplement, answers for prenatal vitamins, iron supplements, and folic acid were grouped into one category. To our knowledge, this is the first study to examine maternal recall of vitamin use, although very poor agreement between records and self-report has been reported for prenatal vitamin use even when reported during pregnancy (κ 0.11; Hessol et al., 2004).
Perinatal and Postnatal
Our findings add to existing evidence that birth weight and method of delivery are among the most accurately recalled perinatal variables (Olson et al., 1997; Sou et al., 2006; Tomeo et al., 1999; Yawn et al., 1998). Accurate and consistent recall of birth weight may reflect its high social value and the repetition of this information to others (Yawn et al., 1998). The lack of such social value could explain the poor recall of birth length for both Twin A and Twin B. We are aware of only one other study that reports recall of birth length, which found accurate recall but only 6–10 weeks after delivery (Troude et al., 2008). Birth length has been shown to be an independent predictor of various health outcomes (Maehle et al., 2010; Melve et al., 2000; Sun et al., 2009) and may actually be a better indicator of birth size than birth weight (Silva et al., 2008). Thus, while there may be growing interest in obtaining this information, our findings highlight the need for researchers to use caution when relying on maternal reports of birth length.
Recall accuracy was generally poor for neonatal information and post-delivery complications for both Twin A and Twin B. Meconium was recalled especially poorly. Although meconium-stained amniotic fluid has been associated with higher rates of stillbirth, low Apgar scores, and hypoxic ischemic encephalopathy (Carbonne et al., 1997; Starks, 1980; Steer et al., 1989), outcomes are generally good (Balchin et al., 2011), which may explain why mothers underreport this complication. NICU admission tended to be overreported by mothers of twins, although among post-delivery complication factors it was the most accurately recalled for both twins. This result is similar to previous findings from the United States (Githens et al., 1993).
Limitations and Implications
Our findings are not without limitations, particularly our use of medical records as the ‘gold standard’. These records are not always valid, especially regarding behavioral or lifestyle factors (Hessol et al., 2004; Hewson & Bennett, 1987). Medical records are subject to recording errors and to inconsistencies due to varying medical criteria between hospitals (Hewson & Bennett, 1987; Joffe & Grisso, 1985). Recall bias may also have affected our findings and can be caused by factors such as the child's current physical, emotional, mental, or behavioral state. For example, McIntosh et al. (2002) found that the number of obstetric complications recalled by mothers was not related to their own schizophrenic status but was instead related to measures of abnormal child behavior, suggesting that concern about a child's behavior may affect retrospective recall. It is also possible that pregnancy and related events were more memorable to mothers expecting twins than to those expecting a single child. Additionally, because recall accuracy may be affected by culturally influenced factors, such as the importance of events and awareness and knowledge of conditions (Olson et al., 1997), these findings should be generalized with caution. Furthermore, the number of cases for rarer medical outcomes was small, which may explain why not all results are significant. Finally, no information on chorion type was available in the medical birth records.
Despite these limitations, this study makes important contributions to the literature on the validity of maternal recall for various perinatal factors. The questionnaire developed and used in this study provides data on medical and behavioral factors that are of interest to researchers because of their associations with important health outcomes, but whose long-term recall validity has not been examined elsewhere. For instance, the validity of maternal recall of Apgar scores and birth length has been assessed previously, but only 6–10 weeks after delivery (Troude et al., 2008). Recent studies have shown associations between low Apgar scores and a high risk of cerebral palsy in term infants born in Sweden (Thorngren-Jerneck & Herbst, 2006). Our findings suggest that researchers using maternal reports to assess Apgar scores should do so with caution because of the low validity of recall. Parents may not understand the medical terminology, or the information may be unclear to them at the time of recall, which in turn may reduce validity. Additionally, jaundice has recently been associated with disorders of psychological development (Maimburg et al., 2010), and prenatal vitamin use has been linked to outcomes such as childhood cancers (Goh & Koren, 2008). The present study also informs researchers developing and using recall questionnaires. The validity of recall for behavioral factors such as smoking was low when mothers were asked to report continuous data over a recall period of almost 10 years.
Thus, while smoking frequency may be a variable of interest to researchers because of its association with many long-term outcomes in offspring (Batty et al., 2006; Brook et al., 2006, 2008; Button et al., 2005; Lambe et al., 2006; Liu et al., 2013), it may be more suitable to present mothers with categorical answer options or to ask within a shorter recall period. Furthermore, maternal knowledge and perception of an event's importance may also affect recall validity (Hewson & Bennett, 1987; Mitchell et al., 1986; Olson et al., 1997). As different pre-, peri-, and postnatal events are increasingly identified as risk factors for offspring health, doctors and nurses should emphasize the importance of this information at or around the time of delivery. Healthcare professionals should also communicate more clearly with parents about how various conditions, procedures, and other factors are defined.
In conclusion, our findings suggest that maternal recall, even in a twin sample, could be a reliable source for many pregnancy-related variables up to 10 years after delivery. However, maternal recall may not be appropriate for obtaining postnatal information, especially regarding twins, aside from the method of delivery, birth weight, and NICU admission. Furthermore, this study highlights the need for caution when using maternal report as the sole source of information, especially for information that mothers may not deem socially valuable (e.g., birth length) or for events that require little involvement or change from the mother (e.g., medical problems not requiring medication).