Validation of the Brief Developmental Assessment in pre-school children with heart disease

Abstract Introduction The objective of this study was to prospectively validate the “Brief Developmental Assessment”, a new early recognition tool for neurodevelopmental abnormalities in children with heart disease that was developed for use by cardiac teams. Methods This was a prospective validation study among a representative sample of 960 pre-school children with heart disease from three United Kingdom tertiary cardiac centres, who were analysed in five separate age bands. Results The “Brief Developmental Assessment” was successfully validated in the older four age bands, but not in the youngest, which represents infants under the age of 4 months. In the older four age bands the pre-set validation thresholds (lower 95% confidence limit for the correlation coefficient above 0.75) were met in terms of agreement of scores between two raters and with an external measure, the “Mullen Scales of Early Learning”. On the basis of American Academy of Pediatrics guidelines, which state that the sensitivity and specificity of a developmental screening tool should fall between 70 and 80%, the “Brief Developmental Assessment” outcome of red meets this threshold for detection of Mullen scores >2 standard deviations below the mean. Conclusion The “Brief Developmental Assessment” may be used to improve the quality of assessment of children with heart disease. This will require a training package for users and a guide to action for abnormal results. Further research is needed to determine how best to deploy the “Brief Developmental Assessment” at different time points in children with heart disease and to determine the management strategy in infants younger than 4 months old.

significant deficits, allowing for appropriate therapies and education to enhance later academic, behavioural, psychosocial, and adaptive functioning".
Stakeholders, especially parent groups, view the evaluation of developmental and behavioural difficulties in children with CHD as one of their highest priorities. 16 Although the benefits of early intervention for developmental difficulties have not been specifically studied in children with CHD, early intervention has been recommended across a range of paediatric contexts 17 and is supported by studies of premature infants 18 and autism; 19 early recognition and intervention is also more satisfactory to families. 20
As part of wider health screening in the United Kingdom, all children undergo periodic clinical checks with a community-based nurse referred to as a health visitor; these do not rely on formal developmental tests apart from the "Ages and Stages Questionnaire", which is used with children at the age of 2 years. Despite the recognised higher risk of developmental delay in children with CHD, they do not routinely undergo any additional developmental scrutiny over and above what is offered to the wider population of children, nor do they undergo any standardised assessment of neurodevelopment either before or after cardiac operations. Health professionals trained in cardiac-related specialities (paediatric cardiologists, paediatric cardiac surgeons, and paediatric cardiac nurses) are not also trained as specialists in paediatrics or in child development in the United Kingdom, and therefore do not consider themselves equipped to undertake such assessments. Paediatric cardiac services in the United Kingdom are centralised to 11 high-volume centres, which are responsible for overseeing or undertaking the majority of the follow-up of infants and young children with heart disease, presenting a potential opportunity for children to undergo periodic additional assessment of neurodevelopment. Resources are not available for every child with CHD to undergo full neurodevelopmental assessment by a child development specialist, and therefore an alternative approach to this clinical problem is necessary. The "Brief Developmental Assessment" was developed with a view to bridging this gap in care within the National Health Service, and the primary aim of this study was to prospectively validate the "Brief Developmental Assessment". 21

The "Brief Developmental Assessment"
The "Brief Developmental Assessment", in terms of its rationale, development, and wider qualities, is described in more detail in our companion paper (reference number 21). The "Brief Developmental Assessment" is an early recognition tool for childhood neurodevelopment that contains both direct observation and parental report. The "Brief Developmental Assessment" can be used without the additional special equipment that is commonly necessary for child developmental assessment. Nurses or doctors who do not have specific training in child development but who have been trained in its use may undertake the "Brief Developmental Assessment". The "Brief Developmental Assessment" takes 5-10 minutes to complete, is designed for pre-school children up to the age of 5 years, and consists of five individual age bands of 0-16.9 weeks, 17-34.9 weeks, 35-60 weeks, 15 months to 2.9 years, and 3.0-4.9 years. The "Brief Developmental Assessment" covers all the measurable areas of child development relevant to CHD in children under the age of 5 years, within the domains of gross motor skills, fine motor skills, daily living skills, communication, and socialisation, as well as general understanding for the oldest age band.
"Brief Developmental Assessment" scores of amber (possible abnormality based on age-appropriate milestones) and red (likely abnormality based on age-appropriate milestones) have been defined in order that results generate a useful guide to action. Within each developmental domain, individual items are scored as "yes" or "no" based on the child's undertaken activity. Within each age band, there are four subsections for scoring based on gradations of age, so that scoring reflects the child's exact age. The score for each domain is graded as red, amber, or green. The overall "Brief Developmental Assessment" result is graded red if there are any red domains. The "Brief Developmental Assessment" result is graded amber if there are any amber domains but no red domains. If all domains are green, the "Brief Developmental Assessment" result is green.
An example of the Brief Developmental Assessment version applicable to age group 17-34.9 weeks is shown for reader information in the supplementary material.
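The rule for combining domain grades into an overall result can be sketched in a few lines. This is a minimal illustration only: the function name `overall_result` and the domain keys are hypothetical, and the item-level scoring that produces each domain grade is not reproduced here.

```python
from typing import Mapping

# Severity order used to pick the worst domain grade.
SEVERITY = {"green": 0, "amber": 1, "red": 2}


def overall_result(domain_grades: Mapping[str, str]) -> str:
    """Combine per-domain grades into the overall result.

    Red if any domain is red, amber if any domain is amber and none is red,
    green only if every domain is green.
    """
    if not domain_grades:
        raise ValueError("at least one domain grade is required")
    unknown = set(domain_grades.values()) - set(SEVERITY)
    if unknown:
        raise ValueError(f"unknown grade(s): {sorted(unknown)}")
    # The overall grade is simply the most severe domain grade.
    return max(domain_grades.values(), key=SEVERITY.__getitem__)


# Hypothetical example: one amber domain, no red domains.
example = {
    "gross_motor": "green",
    "fine_motor": "green",
    "daily_living": "amber",
    "communication": "green",
    "socialisation": "green",
}
print(overall_result(example))
```

Taking the most severe grade is equivalent to the three rules stated in the text, which is why a single `max` call suffices.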

Study population
The study setting was the three tertiary children's cardiac centres based in London, and the study period was January 2014 to July 2015. By approaching all children between birth and 5 years of age with heart disease attending outpatient or inpatient settings, excluding those who were clinically unwell within the intensive care unit, a representative convenience sample of 200 patients with heart disease within each age band was recruited. Once the target of 200 participants was reached in any age band, recruitment to that age band ceased. Children were assessed by a small team of trained psychology assistants under the supervision of a single senior psychological researcher and a medical lead.

Evaluations undertaken
The internal consistency of the "Brief Developmental Assessment" was assessed for items within each domain, between totals for domains, and across the entire measure based on Cronbach's α.
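For readers unfamiliar with Cronbach's α, the sketch below shows the standard formula, α = k/(k-1) × (1 - Σ item variances / variance of total scores), applied to items coded 1 for "yes" and 0 for "no". The function name and the toy data are illustrative assumptions, not study data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item scores per child).

    Uses sample variances (n - 1 denominator). Raises ZeroDivisionError if
    every child has the same total score.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two children")
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: four children, three yes/no items coded 1/0 (not study data).
toy_items = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
print(round(cronbach_alpha(toy_items), 3))
```

Values close to 1 indicate that items within a domain rise and fall together; items that vary independently of one another push α towards 0.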
The internal reliability of the "Brief Developmental Assessment" was evaluated in terms of inter-rater agreement when two raters simultaneously and independently performed and scored the "Brief Developmental Assessment". The study team consisted of at least three raters throughout the study duration, and the inter-rater performance assessment was undertaken whenever two raters could be scheduled to be free at the same time; again, recruitment ceased as soon as the required number of patients was recruited.
External measures used for validation of the "Brief Developmental Assessment" were Mullen Scales of Early Learning (Mullen) 22 and the Ages and Stages Questionnaire (Ages and Stages), 23 the details of which are presented in Table 1.
Concurrent validity of "Brief Developmental Assessment" scores was assessed against Mullen scores.
Evaluation of construct validity was based on detection of known abnormalities. To that end, study participants were defined as falling within a "known group" where the child had been diagnosed with a condition linked to neurodevelopmental problems (a congenital syndrome, 24 an acquired condition such as stroke, 25 or a previously diagnosed developmental delay based on a specialist clinical assessment, even if the cause was not stated) and when the Mullen result was amber or red.
Construct validity was further assessed by calculating sensitivity and specificity of the "Brief Developmental Assessment" for detection of neurodevelopmental abnormalities against both of the external measures.

Data analysis
For the evaluation of agreement of "Brief Developmental Assessment" scores between two independent raters and with the Mullen, the following measures were taken (see the companion paper on the development of the "Brief Developmental Assessment" 21). Inter-rater agreement was judged based on intra-class correlation coefficients for raw "Brief Developmental Assessment" general total scores, which reflect the cognitive domains combined, and weighted κ statistics for the ordinal measure of raw "Brief Developmental Assessment" gross motor scores. Successful validation was defined as the lower 95% confidence limit for the intra-class correlation coefficient (or weighted κ) exceeding 0.75. For comparison with the Mullen, the raw "Brief Developmental Assessment" scores for the first 100 recruited patients were used to generate regression models for predicting the raw Mullen scores. These predictions were then tested in the subsequent 100 recruited patients in each age band for both "Brief Developmental Assessment" general scores and "Brief Developmental Assessment" gross motor raw scores. Successful validation was defined as the lower 95% confidence limit for the intra-class correlation coefficient (or weighted κ) between observed and predicted Mullen scores exceeding 0.75 in the test sample.
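The split-sample procedure described above can be sketched as follows. Several things here are assumptions made for illustration: the regression is taken to be a simple straight-line fit, the intra-class correlation is the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)), and the scores are synthetic. The text does not state which regression form or ICC variant was used, and the sketch reports only a point estimate, whereas the validation criterion applies to the lower 95% confidence limit.

```python
import random


def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per child and one column per measure."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x


def split_sample_icc(bda_raw, mullen_raw, n_train=100):
    """Fit on the first n_train children, test agreement on the remainder."""
    slope, intercept = fit_line(bda_raw[:n_train], mullen_raw[:n_train])
    predicted = [intercept + slope * score for score in bda_raw[n_train:]]
    observed = mullen_raw[n_train:]
    return icc_2_1([[o, p] for o, p in zip(observed, predicted)])


# Synthetic scores for one age band of 200 children (not study data).
rng = random.Random(0)
bda_scores = [rng.randint(0, 30) for _ in range(200)]
mullen_scores = [2.0 * s + 5 + rng.gauss(0, 3) for s in bda_scores]
icc_test_sample = split_sample_icc(bda_scores, mullen_scores)
print(f"ICC between observed and predicted Mullen scores: {icc_test_sample:.3f}")
```

In practice the confidence limit would come from a statistical package rather than from this point estimate; the sketch is intended only to make the train-then-test logic concrete.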
Although the "Brief Developmental Assessment" represents an early recognition tool (rather than a developmental screener), the threshold for acceptability with respect to detection of known abnormalities and sensitivity/specificity was based on the American Academy of Pediatrics Committee on Children with Disabilities 2001 guidelines for desirable sensitivity and specificity for a developmental screening tool of 70-80%. 26

Sample size calculations
For the evaluation of inter-rater reliability of the "Brief Developmental Assessment", we required 56 patients per age band to estimate an expected inter-rater intra-class correlation of 0.9 with a precision of 5% (lower bound of 95% CI is 0.85).
For the agreement of the "Brief Developmental Assessment" with the Mullen, we required 200 patients per age band to allow us to estimate an intra-class correlation coefficient of 0.8 with 5% precision (lower bound of 95% CI is 0.75).
Within each of the age bands, excluding the youngest infants, a sample size of 200, comprising ~50 children with known developmental abnormality and 150 children presumed to have normal development, with anticipated prevalence of 25%, 27 provided sufficient numbers to detect a 0.5 SD difference in mean "Brief Developmental Assessment" scores between known groups, with 80% power and 5% significance. When assessing the ability of the "Brief Developmental Assessment" to discriminate between children with and without developmental abnormalities, the study was powered to detect a developmental abnormality with 12% precision, for an assumed sensitivity of 80%. We anticipated that the use of the "Brief Developmental Assessment" would result in a lower specificity, possibly 65%, for which our sample provided 8% precision. We were less concerned about the level of specificity, as false positives where a child is subjected to medical review are unlikely to be harmful. Furthermore, we expected the "Brief Developmental Assessment" to have a higher sensitivity of 90% for detecting severe developmental abnormalities, and thus for a conservative estimate of prevalence for severe cases of 10% our sample size of 200 would provide a precision of 14%.
Details of the external measures summarised in Table 1 are as follows.
Mullen Scales of Early Learning: This is a validated measure for early developmental assessment between birth and 5 years. 32 It has been used in infant CHD for developmental surveillance. 33 There are five individual scales: four cognitive scales of visual reception, fine motor, receptive language, and expressive language, and a fifth individual scale of gross motor function applicable from birth to 33 months. The age range is equivalent to the study population of pre-school-age children. It takes only 30-40 minutes to complete, thus increasing its acceptability to research participant families attending clinic. The "raw" scores for the four cognitive scales, and separately for the gross motor scale, are computed to form age-standardised "T scores" in each area. Mean "T scores" for each scale within the general population are 50 with standard deviation 10. The cognitive "T scores" applicable to the four cognitive scales combined may be further computed to generate a composite score, which within the general population has a mean of 100 with standard deviation 15. Mullen standardised scores 22 were categorised as green, amber, or red.
Ages and Stages Questionnaire: This is a validated screening questionnaire consisting of 21 age-versions applicable between birth and 66 months. 34 It has been used in infant CHD for surveillance. 35 There are five developmental domains with responses based on parental report: communication, gross motor, fine motor, problem-solving, and personal social. It is used with the aim of capturing the Adaptive, and Social and Emotional domains not covered by the Mullen. Of note, it does not entail direct observation. Each item is scored depending upon whether the child performs consistently (10 points), sometimes (5 points), or not yet (0 points). The total achievable score for each domain ranges from 0 to 60. On the basis of the published means and standard deviations for each age-version questionnaire, two thresholds have been established for each tested area to define a child's score as "close to cut-off" (between 1 and 2 standard deviations from the normative mean) and "below the normal range" (at least 2 standard deviations from the normative mean). Parental responses within the questionnaires were categorised based on the manual for usage. 23
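As a rough check on the precision figures in the sample size calculations, the half-width of a 95% confidence interval for a proportion can be approximated with the normal (Wald) formula. This is an assumption: the method the authors used is not stated, and the approximation gives about 11%, 8%, and 13% for the three scenarios, close to but not identical with the quoted 12%, 8%, and 14%. The function name and scenario labels are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(proportion, n, confidence=0.95):
    """Normal-approximation half-width of a confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(proportion * (1 - proportion) / n)


# (assumed proportion, number of children the estimate is based on)
scenarios = {
    "sensitivity 80%, ~50 children with abnormality": (0.80, 50),
    "specificity 65%, 150 children presumed normal": (0.65, 150),
    "sensitivity 90%, severe cases at 10% of 200": (0.90, 20),
}
for label, (proportion, n) in scenarios.items():
    print(f"{label}: +/- {ci_half_width(proportion, n):.1%}")
```

The sketch makes the dependence on group size visible: the estimate for severe cases rests on only about 20 children, which is why its precision is the widest of the three.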

Results
The case mix of 982 consented participants in the study is shown in Table 2. The age distribution of the sample is skewed towards younger infants because the width of the five age bands narrows as age falls; the median age across all age bands is 11.5 months (interquartile range, 5 months to 2.6 years).

Internal reliability
The internal reliability of the "Brief Developmental Assessment", expressed as Cronbach's α, is shown in Table 3. This was high between "Brief Developmental Assessment" total scores and between all items, but low in selected domains of the "Brief Developmental Assessment", particularly within the youngest two age bands representing children under 8 months of age.

Inter-rater reliability
A total of 160 children participated in the evaluation of inter-rater reliability of the "Brief Developmental Assessment" (see Table 4). Correlations were very high for all age bands, thus passing the pre-set threshold for inter-rater validity.

Agreement between the "Brief Developmental Assessment" and the Mullen
Of the 981 participants, 21 did not complete one or more domains of the Mullen, and thus a total of 960 children participated in the evaluation of concurrent validity of the "Brief Developmental Assessment" against the Mullen (see Table 4). For age bands two to five, the pre-set thresholds were met with the exception of gross motor in age band two; however, in age band one the "Brief Developmental Assessment" displayed a much weaker correlation with the Mullen and therefore did not pass the pre-set threshold for validity.

Developmental outcomes
The developmental outcomes of participants based on the "Brief Developmental Assessment", Mullen, and Ages and Stages are presented in Table 5 and summarised as follows. There were 960 children completing both the "Brief Developmental Assessment" and the Mullen.
For the "Brief Developmental Assessment", 364 (38%) had a green result, 361 (38%) had an amber result, and 235 (24%) had a red result.
For the Mullen, and considering both Mullen cognitive composite scores and, where applicable based on age, gross motor scores, 639 (67%) had a green result, 178 (18%) had an amber result, and 143 (15%) had a red result.
Data were missing for at least one Ages and Stages domain in 149 children (15%), and all of these children were excluded from validity analyses involving the Ages and Stages. Of 832 children completing the Ages and Stages, only 238 (29%) had a green result, whereas 213 (25%) had an amber result and 381 (46%) had a red result.

Construct validity
Of the 960 participants who undertook the Mullen, 227 (24%) had a condition linked to developmental delay, of whom 153 (67%) also had a red or amber Mullen result, thus meeting the criteria for a "known group" with which to evaluate construct validity. Of these, 141 (92%) were also detected based on a red or amber "Brief Developmental Assessment" result, thus passing the pre-set threshold of 80%.
Surprisingly, 74 (33%) children with a condition linked to developmental delay had a Mullen result of green. Of these 74, 40 (54%) were under the age of 8 months, and therefore, given their young age, developmental delay may not yet be evident. Furthermore, although 17 (23%) had a defined genetic condition such as Down's syndrome, 13 (18%) had peri-operative neurological events of unknown significance and the remaining 44 (59%) had a range of congenital abnormalities where development incorporates a range of outcomes including normality.
Moreover, there were 168 children with a Mullen result of red or amber, representing 18% of the study cohort, who were not in a known group; that is, they had no known genetic or acquired condition linked to developmental problems and no known diagnosis of developmental delay stated by either the parents or written anywhere in their medical records that were based at the tertiary hospital. The charts of these patients were reviewed (see discussion section), and this finding may relate to under-detection of true abnormalities in the study population.
As might be expected given that child development assessments in general are more reliable in older children, the sensitivity and specificity of the "Brief Developmental Assessment" against the external measures improved with increasing age, with the best performance in age band five and poorest performance in age band one, which did not meet the criteria for validity. Given that the Ages and Stages is based on parental report only, whereas the Mullen is an objective validated developmental test, the "Brief Developmental Assessment" performed better against the Mullen than the Ages and Stages, as expected. The construct validity in age bands two to five combined may be summarised as follows (refer to Table 6). The test measure of "Brief Developmental Assessment" red or amber has excellent sensitivity against the Mullen and good sensitivity against the Ages and Stages, but moderate to low specificity for both external measures.
The test measure of "Brief Developmental Assessment" red has variable sensitivity but high specificity for both external measures. When considered against American Academy of Pediatrics standards for the performance of a developmental screening tool, which state that the sensitivity and specificity of a developmental screening tool should fall between 70 and 80%, 26 the "Brief Developmental Assessment" outcome of red against an outcome of Mullen red is compliant.
Positive and negative predictive values, as well as comparisons between the Ages and Stages and the Mullen, are presented for information purposes in Table 6.
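The two test thresholds discussed above (red or amber, and red alone) differ only in which grades count as a positive result. A minimal sketch of the calculation is given below; the function name and the six paired results are invented for illustration and are not study data.

```python
def sensitivity_specificity(test_results, reference_results,
                            test_positive, reference_positive):
    """Sensitivity and specificity of a graded test against a graded reference.

    test_positive and reference_positive are the sets of grades that count
    as abnormal for the test and for the reference measure respectively.
    """
    tp = fp = tn = fn = 0
    for test, reference in zip(test_results, reference_results):
        test_abnormal = test in test_positive
        reference_abnormal = reference in reference_positive
        if test_abnormal and reference_abnormal:
            tp += 1
        elif test_abnormal:
            fp += 1
        elif reference_abnormal:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented paired results for six children (not study data).
bda = ["red", "amber", "green", "green", "amber", "red"]
mullen = ["red", "green", "green", "amber", "amber", "red"]

# Threshold 1: red or amber on both measures counts as abnormal.
print(sensitivity_specificity(bda, mullen, {"red", "amber"}, {"red", "amber"}))
# Threshold 2: only red on both measures counts as abnormal.
print(sensitivity_specificity(bda, mullen, {"red"}, {"red"}))
```

Widening the set of positive test grades can only keep or raise sensitivity and keep or lower specificity against a fixed reference definition, which is the trade-off described in the text.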

Summary of validation
The primary aim of this study, which was to validate the "Brief Developmental Assessment" as an early recognition tool for childhood developmental delay, was achieved within a population with heart disease between the ages of 4 months and 5 years. Previous researchers have presented sensitivity and specificity across a range of thresholds as a method to judge the performance of a new test against validated measures, and have used this approach to select the optimal threshold to trigger an abnormal result, 28,29 as has been undertaken in this study. These analyses support the use of both "Brief Developmental Assessment" results of amber and red as thresholds to trigger further evaluation of a child, although after reassessment a proportion of such children may not turn out to have developmental delay. The protocol for such reassessment is currently being delineated within a Delphi survey and goes beyond the scope of the current study. The Delphi survey entails a series of questions to a large group of health professionals from a range of settings and backgrounds, and seeks to achieve a consensus as to the referral and reassessment pathway for children who have either an amber or a red "Brief Developmental Assessment" test result picked up by the cardiac team at the tertiary centre.

Limitations of the validation
Our positive evaluation of "Brief Developmental Assessment" validity and reliability relates only to tests at a single time point, and the constructs of test-retest validity and responsiveness over time could not be assessed within the scope of this study. Both concepts are challenging to test within a rapidly developing population of very young children with a significant health problem such as CHD, and a further dedicated study will be required to explore these, in particular repeated testing over time.
We note that the Brief Developmental Assessment has been developed as an early recognition tool for child development, and it is not intended to replace full formal neurodevelopmental evaluation. It is our hope and intent that children flagged by the Brief Developmental Assessment when it is used with them by the cardiac team will be speedily referred to and assessed by a neurodevelopmental clinic with a more detailed formal evaluation using gold standard neurodevelopmental tests.
A motivation underpinning our study was a hypothesis that the processes in place to assess the neurodevelopment of children with CHD require improvement within the United Kingdom, and that children with CHD and developmental delay may be under-recognised. Indeed, 168 children with red or amber Mullen results were not in a known group, and chart review undertaken by one of three clinicians revealed, in the majority, concerns from the parents about the child's development and/or other risk factors for abnormal development such as a history of cardiac arrest or mechanical circulatory support. 30,31 A review of the services that children were under and what actions might need to be taken to meet their needs is underway and goes beyond the scope of the validation study. Health professionals and parents have commented anecdotally that children with CHD, such as those under the age of 5 years recruited to this study, are in general undergoing treatments for their heart including surgery, and this represents the main focus of contacts with health professionals, including both those at the cardiac centres and those in the community such as health visitors. This may represent a reason for these 168 children with red or amber Mullen results not already being identified as in a known group. The Brief Developmental Assessment was developed for use with children who have heart disease, and has not been used or validated with healthy children. There is the potential that the Brief Developmental Assessment might be useful within other groups of children who for medical reasons are at greater risk of neurodevelopmental problems, such as survivors of other types of critical illness. However, in order to take this forwards, further testing and research would be required.

Comment on external measures
Parents who were concerned about their child's development preferred to watch the entirety of the testing with the "Brief Developmental Assessment" and Mullen, and were less likely than parents who had no concerns about their child to complete the Ages and Stages while their child was being assessed. This is supported by a comparison of the Mullen cognitive results between children with missing Ages and Stages (29% Mullen cognitive results red or amber) and those who completed Ages and Stages (19% Mullen cognitive results red or amber). The overall proportion of red results for the Ages and Stages (46%) was very high, and perhaps the proportion of Ages and Stages results that were red would have been even higher had the missing 15.2% been included. This is not a cohort study with longitudinal follow-up, thus limiting interpretation, but as displayed in Table 6 developmental delay based on Mullen results was detected least frequently in the youngest infants, in contrast with developmental delay based on Ages and Stages results, which was detected most frequently in the youngest infants. Medical ill health in children with CHD is more prevalent in infancy, as this is a period when interventions are commonly undertaken. These observations support a hypothesis that in children with CHD the Ages and Stages may be picking up a range of issues including developmental delay and general ill health, and further emphasise the importance of an initial evaluation for signs of developmental delay that incorporates direct observation of children with CHD, such as the "Brief Developmental Assessment" provides. Furthermore, this emphasises the recognised importance of periodic assessment of neurodevelopment over time in children with heart disease.

Summary and next steps
The development and validation of the "Brief Developmental Assessment" represents an opportunity to improve the future quality of peri-intervention assessment for children with heart disease between the ages of 4 months and 5 years in United Kingdom children's cardiac centres. This initiative of using the "Brief Developmental Assessment" within the cardiac centres would complement the health visitor assessments that all children receive, and will be undertaken by cardiac staff aware of details of the child's history, such as cardiac arrests and mechanical circulatory support, that predispose to neurodevelopmental problems. One problem with the current system of surveillance for young children in the United Kingdom, as it pertains to children with heart disease, is that in addition to them being inherently at higher risk than other children and therefore potentially benefitting from additional scrutiny, the standard health visitor reviews correspond with a period in these children's lives when cardiac conditions are often having a significant impact. This may be a barrier to the system's effectiveness for them, and may account in part for the occurrence of undetected neurodevelopmental problems in the population that we observed.
Roll out of the "Brief Developmental Assessment" will require a training package for users and a guide to action for abnormal results, such as a standardised report for specific relevant health professionals and parents. Since the "Brief Developmental Assessment" is a short assessment that is undertaken without additional equipment by staff with the training background of those working in a cardiac centre, with whom children who have CHD are in regular contact early in life, successful implementation is more likely. Further research is needed to delineate the optimal approach to assessment of children over time, including when to incorporate the "Brief Developmental Assessment", and to establish the most effective management strategy for infants who attend cardiac centres for interventions when they are younger than 4 months old, as the "Brief Developmental Assessment" is not appropriate for them.

Table 1.
Overview of external measures used to assess "Brief Developmental Assessment" performance including definition of outcomes.
Mullen standardised scores were categorised as follows. Patients with an age-standardised cognitive or gross motor score falling between 1 and 2 standard deviations below the mean were scored as amber (cognitive scores 70-84, gross motor scores 30-39). Patients with an age-standardised cognitive or gross motor score more than 2 standard deviations below the mean were scored as red (cognitive scores <70, gross motor scores <30). All other patients are classified as green.
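The Mullen cut-offs listed for Table 1 translate directly into a small grading function. Two points are assumptions for illustration: scores are treated as whole numbers, so that "70-84" means below 85, and when both a cognitive composite and a gross motor T score are available the overall Mullen result is taken as the worse of the two grades, which the text implies but does not spell out. The function names are hypothetical.

```python
from typing import Optional

GRADE_ORDER = ["green", "amber", "red"]


def grade_score(score: float, amber_below: float, red_below: float) -> str:
    """Grade one standardised score against its amber and red cut-offs."""
    if score < red_below:
        return "red"
    if score < amber_below:
        return "amber"
    return "green"


def mullen_category(cognitive_composite: float,
                    gross_motor_t: Optional[float] = None) -> str:
    """Overall Mullen grade from the composite and, if tested, gross motor T score.

    Cognitive composite: red below 70, amber 70-84 (mean 100, SD 15).
    Gross motor T score: red below 30, amber 30-39 (mean 50, SD 10).
    Assumption: the overall grade is the worse of the two.
    """
    grades = [grade_score(cognitive_composite, amber_below=85, red_below=70)]
    if gross_motor_t is not None:
        grades.append(grade_score(gross_motor_t, amber_below=40, red_below=30))
    return max(grades, key=GRADE_ORDER.index)


print(mullen_category(92, 35))   # normal composite, gross motor 1-2 SD below mean
print(mullen_category(68))       # composite more than 2 SD below mean
```

The gross motor argument is optional because that scale applies only from birth to 33 months.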

Table 2.
Description of the demographics and clinical features of study cohort by age band.

Table 3.
Description of the internal reliability of the "Brief Developmental Assessment" by age band.

Table 4.
Inter-rater agreement of the "Brief Developmental Assessment" and concurrent validity of the "Brief Developmental Assessment" against the Mullen expressed as correlation by age band.

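Table 5.
Descriptive table of developmental scores and known groups by age band. The number of patients completing the Brief Developmental Assessment and the Mullen was 960, and the Ages and Stages was 832.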

Table 6.
Sensitivity and specificity of test measures in age bands two to five, combined. The number of patients completing the Brief Developmental Assessment and Mullen was 763, and the Ages and Stages was 671. Data from age band one not included.