Singing to infants matters: Early singing interactions affect musical preferences and facilitate vocabulary building

Abstract
This research revealed that the frequency of reported parent-infant singing interactions predicted 6-month-old infants’ performance in laboratory music experiments and mediated their language development in the second year. At 6 months, infants (n = 36) were tested using a preferential listening procedure assessing their sustained attention to instrumental and sung versions of the same novel tunes whilst the parents completed an ad-hoc questionnaire assessing home musical interactions with their infants. Language development was assessed with a follow-up when the infants were 14 months old (n = 26). The main results showed that 6-month-olds preferred listening to sung rather than instrumental melodies, and that self-reported high levels of parental singing with their infants [i] were associated with less pronounced preference for the sung over the instrumental version of the tunes at 6 months, and [ii] predicted significant advantages on the language outcomes in the second year. The results are interpreted in relation to conceptions of developmental plasticity.


Introduction
The present study aimed to investigate whether 6-month-old infants displayed preferential attention to vocal or instrumental versions of the same novel, ecologically valid tunes, and whether this preference was affected by their home experience with active (e.g., parent singing to baby) or passive (exposure to background music) musical engagements. A follow-up at 14 months was planned to test whether early language development was best predicted by the infant attentional measures or informal home musical activities reported at 6 months.
(2000) highlighted that the exaggerated intonation characteristic of ID-speech is associated with better discrimination of vowels in 6-month-old infants. Crucially, a robust finding has emerged that infants prefer ID- to AD-speech, albeit with some variability in the strength of this preference across languages and contexts (ManyBabies Consortium, 2020; Dunst, Gorman & Hamby, 2012a). In one of the earliest studies Fernald (1985) also found that adults identify affect more easily from low-pass filtered versions of ID- than AD-speech (Fernald & Kuhl, 1987). Thus, the emotional dimensions appear to be part of the specialization of early ID-speech (see also Trainor & Desjardins, 2002). However, at the end of the first year of life, when vocabulary acquisition is salient, other aspects will come into play, e.g., pitch peaks in utterance-final position used with target words in ID-speech (Fernald & Mazzie, 1991; see also Delavenne, Gratier & Devouche, 2013; Longhi, 2009, for similar considerations concerning singing interactions). Gratier and Devouche (2011) found that 3-month-olds and their mothers imitate each other's productions of prosodic contours, especially those more expressive of affect.
More directly relevant for musical aspects, Van Puyvelde and colleagues (Van Puyvelde, Loots, Meys, Neyt, Mairesse, Simcock & Pattyn, 2015; Van Puyvelde, Loots, Vanfleteren, Meys, Simcock & Pattyn, 2014; Van Puyvelde, Loots, Vinck, De Coster, Matthijs, Mouvet & Pattyn, 2013; Van Puyvelde, Vanfleteren, Loots, Deschuyffeleer, Vinck, Jacquet & Verhelst, 2010) revealed that adjacent mother-infant vocalisations are tonally related ('tonal synchrony') and associated with physiological co-regulation at 3 months. Specifically, consecutive infant-mother turns fall mostly within tonal intervals that are highly consonant. When dissonant sequences occur, they are usually associated with disruptions in the dyad's positive engagement, followed by maternal repair turns to re-establish interactional flow. Thus, it appears that musical and linguistic features are intertwined in early communication with infants, and deeply rooted in emotional meanings (Franco, 1997). In cross-linguistic analyses, Falk (2007; 2011a) highlighted that ID-singing presents a number of characteristics very similar to ID-speech (e.g., higher pitch) as well as containing phonetic and prosodic information that is language-specific, hence providing rich material for native language learning. In particular, higher similarity is found between melodic contours in ID-song and ID-speech produced in play, rather than soothing, contexts (Falk, 2011b), that is, when babies are alert and actively engaged. Thiessen and Saffran (2009) showed that 7-month-old infants learned words more easily when they were presented with a melody than in isolation, and some tenuous but intriguing findings suggested that melodies may support the discrimination of syllables in 11-month-olds (Lebedeva & Kuhl, 2010).
Overall, research has produced robust evidence of a powerful relationship between characteristics of language input in terms of qualitative and quantitative individual differences, and language development outcomes in toddlers and young children (D'Odorico & Jacob, 2006; Fernald, Marchman & Weisleder, 2013; Fernald & Mazzie, 1991; Romeo, Leonard, Robinson, West, Mackey, Rowe & Gabrieli, 2018; Suttora, Salerni, Zanchi, Zampini, Spinelli & Fasolo, 2017). In this framework, besides being 'as good as speech', it is possible that ID-song might even present specific contributions to language development in its own right (see also Ma, Fiveash, Margulis, Behrend & Thompson, 2019). ID-singing might be a super-stimulus by combining ID-register adaptation of speech (Falk, 2007) with musical structure and regularity. For example, the possibility that song would support learning was suggested by Schön, Boyer, Moreno, Besson, Peretz and Kolinsky (2008), as a factor explaining why the acquisition of a new artificial language was facilitated by sung contexts in adults and school-age children. Schön et al. speculated that this may be due to the higher predictability of song in a naturalistic context as it affords entrainment to a regular beat and pitch (Schön et al., 2008; Woolhouse, Cross & Horton, 2016), which may facilitate speech segmentation. Thus, attentional demands might be lower in singing contexts also for infants and lend support to the developing infant's vocabulary-building skills. Furthermore, maternal singing has been shown to be effective in regulating arousal in infants (Bainbridge et al., 2021; Cirelli, Jurewicz & Trehub, 2019; Shenfield, Trehub & Nakata, 2003), thus being a good candidate to offer an optimal learning context for words and speech sounds. Based on these studies, it is possible to hypothesise that infants exposed more frequently to singing interactions in their first year would experience facilitating effects on language development.
How might this happen? Possibly in association with more 'happy sounding' performances in ID-singing (Corbeil, Trehub & Peretz, 2013), song recruits higher levels of attention than speech in 6- to 10-month-old infants (Nakata & Trehub, 2004; Tsang, Falk & Hessel, 2017) but not past the first birthday (Costa-Giomi & Ilari, 2014). This suggests that the preference for songs may decrease when infants begin to focus more specifically on the predominant communicative form used in their surroundings (at least in Western cultures), i.e., the speech form. Consistently, Delavenne et al. (2013) and Longhi (2009) identified subtle but systematic changes in the temporal and hierarchical structure of naturalistic singing interactions between mothers and infants when comparing different developmental levels within the first year of life. With older infants, adjustments were made by mothers to accommodate both the triadic nature of the developing infant communication and speech adjustments functional to segmentation / word recognition. Thus, it might be the case that ID-singing is particularly interesting to infants and beneficial to their learning during the pre-verbal stage of the first year of life.
In sum, although in recent years remarkable progress has occurred in research on the nature of infant preferences for speech/song, the evidence available at present remains in need of further clarification concerning the relative weight of both musical and speech parameters. For example, it is unclear if ID-song may be considered a super-stimulus for infants by coalescing multiple preference cues (the human voice, speech, ID-register alterations, positive affect, musicality) or whether some of these aspects are more important in capturing and holding infant attention. In a rigorously controlled experiment contrasting naturalistic materials with native vs non-native ID-song and -speech, Tsang et al. (2017) found a preference for ID-song in 7-10-month-old infants across the two languages (particularly for the non-native stimuli). In our experiment, we aim to explore specifically some aspects of the infant preference for ID-song. We narrow the research focus on the musical form, and aim to provide a tight comparison in which the same musical form could be contrasted across performances with speech (song) and without speech (instrumental). Differently from speech, songs have a regular beat, rhythmic structures, and melodies with harmonic constraints; hence, if the infant preferences for song over speech were guided by the musical aspect per se, no preference should emerge between the two musical conditions in the present experiment. Conversely, if the speech-like quality of singing is specifically important, songs would be expected to attract infants to a greater extent than instrumental pieces.
The first aspect of the present study is a step towards this clarification in a controlled experiment, by isolating the musical form and comparing infant preferences for instrumental vs. vocal versions of the same 'happy' tunes in infant-directed style (aim 1). The second aspect that we investigated concerns the impact on infant preferences of early musical experiences. We hypothesized that infant attention in the laboratory musical tasks may be affected by previous experience and familiarity with music at home (aim 2). In order to evaluate this aspect, information about the frequency and type of musical experiences reported in the infants' families was collected with an ad-hoc parental report. Thirdly, we carried out a longitudinal follow-up when the infants were 14 months old in order to test language development. We hypothesized associations between receptive vocabulary and infants' earlier attention measures (aim 3). Finally, potential effects of early musical experience at home on the infants' later receptive vocabulary were also investigated (aim 4).

Participants
Participants were recruited using a mailing list derived from the municipality birth records in Milan and its conurbation (Italy). The invitation was directed to families with infants having the target age of 6 months who were living within a reasonable distance from the University campus. The response rate was approximately 50%. Volunteers who responded to the invitation were subsequently contacted to provide them with further information about the study and arrange a testing appointment at Time 1 (6 months). Only infants from Italian-speaking families were included, with 5 infants also being exposed to an L2. A final sample of n = 36 healthy infants was recruited (50% female). Two boys were born preterm; hence, they were tested at their corrected age. Of this sample, n = 26 were available at Time 2 (14 months, 46% female), corresponding to a drop-out of 27.8%.
At Time 1, 7/36 infants had one person in the family who had received musical training at some point in their lives but only one of them was still actively practicing. Siblings were present for 12/36 families and 14/36 infants attended nursery or playgroups.

Procedure
Time 1 (T1 henceforth; infant age = 6 months)
Together with demographic information, maternal reports of singing interactions and home music were collected using a novel ad-hoc questionnaire (see Materials).
A preferential listening experiment was conducted in a 1.5 × 2 m Amplifon soundproof booth, with the infant sitting on the parent's lap. Parents were asked to wear headphones exposing them to white noise so as to cover sound from the experiment; parents were allowed to adjust the volume to a comfortable level with preliminary tests. Parent and infant were facing a 26.3" computer monitor positioned at 40 cm distance. Music stimuli were presented at a constant level of 60 dB with a 2.1 JBL Creature III System located below the monitor (see Materials). An interesting animation with sound was displayed throughout the set-up and adjustment phase to keep the infant entertained. Once the experiment began, a colourful attention-getter was used to capture the infant's attention to the centre of the screen between trials, and a black and white checkerboard was displayed on the screen during trials, i.e., with the experiment music tracks. Infants were exposed to a sequence of six musical tracks in semi-random order (i.e., appearance of a track in first position was counterbalanced and the remaining trials were randomised), which included three vocal and three instrumental 1-minute trials (see below for details). An experimenter monitored infants' behaviour through a video-camera but was blind to the sound condition and, at the time of testing, to the individual's family background variables. Infant behaviour was recorded via the experimenter's key-press signalling looking-on vs. looking-away from the screen. A trial was started with the infant's gaze in central position and included the visual presentation of the checkerboard in association with a musical track. When the infant looked away from the screen, the music stopped but resumed immediately when the infant returned attention to the screen.
The experiment was programmed using MATLAB software, which also time-stamped all output events indicated by the experimenter. A looking-away event of 4 sec was the threshold to trigger automatic discontinuation of a trial and activation of the central visual attention-getter before the next trial. The behavioural measures analysed from the preferential listening experiment were: [i] mean listening event (i.e., the mean duration of continuous listening events, after the first orientation was discarded, Shi & Werker, 2001), [ii] distraction events (mean number of off-screen attention events before the end or discontinuation of the trial). Since we expected both types of stimuli (vocal and instrumental tunes) to be attractive to infants, we decided to measure continuous listening given that we were interested in engagement with structured melodic stimuli which are by definition extending over time i.e., a point in time corresponding, for instance, to one-two notes / syllable would not qualify as a melody. In this way, we have elected to give more importance to sustained attention rather than relying on the more conventional measure of total listening time, which is predominantly used in infant experiments utilizing discrete or segmentally qualified repetitive sound associated with contingent visual attention-holding stimuli (for a review of measures used in preference experiments, see Cristia et al., 2014). Furthermore, total listening time derived from total looking time may overemphasise visual attention/information processing. In order to have a measure of the variability in infant disengagement, however, we have included the second measure, Distraction events (Tellinghuisen, Oakes & Tjebkes, 1999).
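The two measures defined above can be illustrated with a minimal sketch for a single trial. The original experiment was run in MATLAB and its output format is not described, so the event layout (time-stamped "on"/"off" key presses), the function name `trial_measures`, the treatment of the first looking-on episode as the discarded "first orientation", and the decision to count the trial-ending look-away as a distraction event are all assumptions made for illustration only.

```python
from statistics import mean

LOOK_AWAY_LIMIT_S = 4.0  # from the text: a 4-s look-away discontinues the trial


def trial_measures(events, trial_end):
    """Summarise one trial (hypothetical data layout).

    events: chronologically ordered (time_s, state) pairs, state is "on" or "off".
    trial_end: time (s) at which the trial would end if not discontinued.
    Returns (mean_listening_event_s or None, n_distraction_events).
    """
    listening = []    # durations of continuous looking-on episodes
    distractions = 0  # number of look-away events
    for i, (t, state) in enumerate(events):
        next_t = events[i + 1][0] if i + 1 < len(events) else trial_end
        duration = next_t - t
        if state == "on":
            listening.append(duration)
        else:
            distractions += 1
            if duration >= LOOK_AWAY_LIMIT_S:
                break  # trial discontinued; later events are ignored
    kept = listening[1:]  # assumption: first episode = first orientation, discarded
    return (mean(kept) if kept else None), distractions


# Invented example: three listening episodes, the last look-away ends the trial.
example = [(0.0, "on"), (5.0, "off"), (6.5, "on"),
           (16.5, "off"), (17.5, "on"), (23.5, "off")]
result = trial_measures(example, trial_end=60.0)
print(result)
```

Per-infant scores would then be obtained by averaging these trial-level values within each condition (vocal, instrumental).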
Interrater reliabilities on fixation judgment were computed between the off-line ratings of two observers (the experimenter and an extra observer) on 12 randomly selected experimental sessions. The observers agreed on the time of occurrence of infant looks within 1.0 s in ≥ 95% of judgments (average difference in time between the judged occurrence of looks: M = 0.653 s). Pearson's correlation coefficients between infant looking-time judgments during the test phase ranged from r = .91 to r = .98 across conditions.
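As an illustration of the agreement criterion, the sketch below computes the proportion of paired judgments that fall within a tolerance and the mean absolute time difference. It assumes the two observers' look onsets have already been matched one-to-one; the function name and the example values are invented and are not study data.

```python
def agreement_within(times_a, times_b, tolerance_s=1.0):
    """Return (proportion of pairs within tolerance, mean absolute difference in s)."""
    if len(times_a) != len(times_b) or not times_a:
        raise ValueError("expected two non-empty, equally long lists of paired times")
    diffs = [abs(a - b) for a, b in zip(times_a, times_b)]
    within = sum(d <= tolerance_s for d in diffs) / len(diffs)
    return within, sum(diffs) / len(diffs)


# Invented onsets (s) judged by two observers for the same four looks.
observer_1 = [1.0, 5.0, 9.0, 14.0]
observer_2 = [1.5, 5.0, 10.5, 14.5]
agreement = agreement_within(observer_1, observer_2)
print(agreement)
```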
Mothers and infants were also video-recorded in an adjacent observation lab during a 5-minute free-play interaction with age-appropriate toys. Mothers were asked to play as they normally do with their infants.
Time 2 (T2 henceforth; infant age = 14 months)
All participant families were contacted when the infant reached 14 months and invited to complete a parental report on their infant's language development (see Materials).

Materials
The music for the laboratory test was composed ad-hoc, with the main aim to avoid possible influences of uneven prior familiarity with the songs across the sample. When comparing ID-songs that may be produced in daily interactions with instrumental music that infants may be exposed to, there are many differentiating variables, which cannot be completely controlled in a single experiment. For the purpose of the present study, we chose to produce two highly similar stimuli differentiated only by the carrier of the melody, which was either an instrument or a female voice singing nonsense words. The core aspects of the brief given to the composer were: [1] to create ecologically valid stimuli in a style characteristic of Western music addressed to children (the composer invested a significant amount of study time investigating the characteristics of current child-directed music before selecting the instruments and beginning composition); [2] to create musical tracks/songs that would be novel to all children (hence, avoiding similarity with highly popular tunes and choosing the nonsense syllables best fitting the musical score). The final versions were selected in consultation with the researchers, based on a unanimous decision that these were the three best pairs (instrumental/song) to represent child-directed happy-sounding music. The same music was used as part of a larger set for another study (Franco, Chew & Swaine, 2017) in which a validation was conducted with a sample of N = 36 adults that judged these tracks as 'happy-sounding' with very high ratings.
The six final tunes (approx. 1 minute each in duration, 68.5 s on average) were produced on a Macintosh laptop using Logic software in both instrumental and sung versions, all presented at 60 dB (audio examples from the composer at: https://soundcloud.com/user-601027202/sets/music-stimuli_happy_i-s/s-OCzee). They all had a 4/4 time signature and used expressive cues associated with positive affect in music (for a review, see Gabrielsson & Lindström, 2010), namely major keys (respectively, D, G and C major), fast tempos (respectively, 126, 134 and 113 BPM), staccato articulations, and instruments with fast attack times and shorter sustain and release times (e.g., drums, marimba, pizzicato strings), and they were highly consonant. For both consistency and variety, each of the three instrumental compositions was scored with a different combination of a percussion, harmonic, and lead instrument. The song and instrumental tracks were identical except that the melody was carried by a female singer in the song stimuli and by an instrument in the instrumental version of each tune. The instrumental parts were created with sampled instruments, and were sequenced using both real-time expressive performance (via a USB keyboard) and offline editing of note events. The vocals were recorded in the soprano range with a trained female singer who was working with toddlers and young children at the time of recording and who was asked to imagine singing for a toddler pictured in photographs available in the studio. Rather than using meaningful lyrics, nonsense syllables were used to create pseudowords in order to prevent possible influences of uneven prior familiarity with some words across the infants (for the use of pseudowords in infant experiments, see Bortfeld, Morgan, Golinkoff & Rathbun, 2005; Thiessen & Saffran, 2009). The composer chose the pseudowords on the basis of the best fit to the score for both affect and rhythmic aspects.
Assessment of the home musical environment at 0-6 months
A parental report, constituted by 12 items (Musical Experience in the Family, Franco, 2013), was administered to collect information about parents' musical education, music listening in the home (frequency and genres), activities, and singing interactions with the infants (frequency; context, e.g., play, sleep time; type of songs, e.g., lullabies, play songs). Some of the questions were open and descriptive (e.g., concerning the types of songs used with the infants, or the types of instruments/objects/toys used to make musical sound). For the purpose of this study, two variables were derived, Active Engagements (exposure to ID-singing) and Passive Engagements (exposure to background music), which were evaluated on a 4-point frequency scale (0 = only occasionally, 1 = sometimes, e.g., every couple of weeks max. once a week, 2 = daily, 3 = several times every day). Since the questionnaire was not validated, the psychometric properties of these questions were unknown. Certain scores were little used by the parents: hence, it was decided to dichotomise these variables and create a 2-level exposure variable in the analyses, with 'low exposure' (collapsing ratings 0-1) and 'high exposure' (collapsing ratings 2-3) for both Active and Passive Engagements with music at home.
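The dichotomisation rule just described can be written down as a short sketch. The function name and the example ratings are hypothetical; only the cut-off (ratings 0-1 versus 2-3) comes from the text.

```python
def exposure_level(rating):
    """Collapse a 4-point frequency rating (0-3) into 'low' (0-1) or 'high' (2-3)."""
    if rating not in (0, 1, 2, 3):
        raise ValueError(f"rating must be 0, 1, 2 or 3, got {rating!r}")
    return "low" if rating <= 1 else "high"


# Invented ratings for five families, not study data.
ratings = [0, 3, 2, 1, 2]
levels = [exposure_level(r) for r in ratings]
print(levels)
```

The same rule would be applied separately to the Active and the Passive Engagement ratings.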

Assessment of language development
At Time 2, the Primo Vocabolario del Bambino inventory (Caselli & Casadio, 1995), which is the Italian adaptation of the MacArthur-Bates Communicative Development Inventory (CDI; Fenson et al., 2007) Words and Gestures form, was collected from the infants' parents. From this, the outcome variable selected was Word Comprehension (CDI-WC), since receptive vocabulary is considered the most reliable measure of language development at the age of interest here (Cristia et al., 2014;Fernald & Marchman, 2012).

Maternal variables
Given the established importance of maternal variables in language development (Landry, Smith & Swank, 2006), we included maternal education and sensitivity in our study. The former is considered a general proxy of SES (e.g., Smith, Hart, Hole, MacKinnon, Gillis, Watt, Blane & Hawthorne, 1998) and was measured in years of formal education.
The Parent-Child Early Relationship Assessment (PCERA; Clark, 1985) is considered part of sensitivity-based measures that can be used with parents and young infants (Mesman & Emmen, 2013) and was used here to code the 5 minutes' free-play interactions at 6 months. PCERA is designed to assess behavioural characteristics of parents and infants, and the frequency, duration, and intensity of affect that occur during 5 min of face-to-face interactions. On the basis of the 5-min observation, each variable is coded on a scale ranging from 1 (negative relational quality) to 5 (positive relational quality). In the present study, we focused on the 29 parent variables that could be coded at 6 months. These included: tone of voice, affect and mood, attitude toward the child, affective and behavioural involvement, and style. A total maternal interaction quality score was calculated and finally the mean score between the items was calculated as an index of maternal interactive quality (PCERA maternal total mean score). This mean score ranged between 1 and 5 with higher scores indicating better quality of interaction. The PCERA (Clark, 1985) has an acceptable range of internal consistency, factor validity, and discriminant validity between high risk and well-functioning mothers (Clark, 1999;Spinelli, Poehlmann & Bolt, 2013). One of the authors is a certified coder for PCERA and coded the whole sample of observations; a second researcher trained on PCERA coded 20% of randomly chosen interactions from the sample. The inter-correlation among the global scores of the two coders was r = .89, indicating a good level of inter-rater reliability.

Data analysis
In order to assess infant preference for vocal vs instrumental condition, two ANOVAs were conducted with mean listening event and distraction events as dependent variables (aim 1). To assess infant attention during the preferential listening experiment as a function of musical experience at home (aim 2), two mixed factor GLM ANOVAs were performed for the listening variables with (i) music type (vocal, instrumental) X Home Active Engagements (High/Low) and (ii) music type (vocal, instrumental) X Home Passive Engagements (High/Low). Pearson correlations were used to assess the relationship between children's language development at 14 months and children's earlier attention measures (aim 3). ANOVAs were further performed to assess whether children's language outcomes at 14 months differed as a function of children's active and passive musical experience at home measured at T1 (aim 4). Finally, mediation and moderation analyses (Hayes, 2013; Preacher & Hayes, 2008; Preacher, 2015) were conducted on language development outcomes at 14 months, including, respectively, Home Active and Passive Music Engagement as independent variables, and infant preferential listening variables as mediators. Preliminary analyses were planned to test associations of maternal variables with home musical engagements, with any significant variable used in moderation analyses when relevant (aim 5). Mediation analysis tests if a third variable, the mediator, mediates the relationship between the independent and the dependent variables, investigating the direct and indirect effects of the independent variable on the dependent variable outcome. Moderation analysis examines whether the effect of the independent variable on the outcome is influenced by its interaction with a third variable, the moderator. These analyses were conducted using the tool "PROCESS" (Hayes, 2012) in SPSS.

Ethics
Ethical approval was granted by the Psychology Ethics Committees of the University of Milan-Bicocca (Italy), conforming to the principles of the Helsinki Declaration for research involving human participants. Informed written consent was collected from the children's parents before testing and debriefing was provided at the end of each session.

Results
The number of participants for each variable assessed with the home musical engagement MEF questionnaire (Franco, 2013) at T1 and remaining in the sample at T2 is shown in Table 1. Parents reported that Active (Home ID-singing interactions) and Passive (Home Background Music) Engagements with music were common for most infants' families.
Preliminary analyses were conducted to assess the contribution of maternal variables to home music interactions. Given the high level of educational attainment in the sample (M = 15, SD = 3, with 13 years corresponding to completion of secondary school and 16 years to a university degree), this variable was dichotomised for the analyses (with/without university-level education). There were no significant associations of maternal level of education (with/without university-level education) with either Active (Low vs. High exposure to ID-singing) or Passive Musical Engagements (Low vs. High exposure to Background music): Pearson Chi-squared tests, Active Musical Engagements: χ²(1) = 0.39, p = .53; Passive Musical Engagements: χ²(1) = 0.09, p = .77. Thus, this maternal variable was not entered in the subsequent analyses.
Parents in our sample also presented PCERA scores situated in the higher range of the 1-5 scale in both high- and low-exposure groups, indicating that overall parents were quite to highly positive during the interaction with their infants. Univariate ANOVAs were conducted on PCERA maternal total mean score comparing the infant groups with high vs. low levels of exposure to Active and Passive musical experiences at home. Results showed significant effects for both types of engagements, revealing that higher PCERA mean scores were found in the group with higher exposure to both Active and Passive music engagements, indicating that more positive interactions were observed in the more highly musical groups for either type of engagement: Active Musical Engagement: F(1,28) = 5.32, p = .029, ηp² = 0.17 (Low exposure M = 3.74, SD = 0.40; High exposure M = 4.08, SD = 0.33); Passive Musical Engagement: F(1,28) = 4.36, p = .046, ηp² = 0.14 (Low exposure M = 3.68, SD = 0.40; High exposure M = 4.05, SD = 0.35).
The following key findings emerged from the main analyses, investigating infant preferences for vocal/instrumental music and their relationships with home musical environment and language development.
Infant Attention in the preferential listening experiment (Aim 1)
Table 2 displays the infants' listening variables as a function of the experimental conditions. There was a significant effect of condition on the mean listening event, showing longer mean listening events with the vocal than the instrumental version of the tunes, F(1,35) = 5.75, p = .022, ηp² = 0.14, suggesting that overall infants preferred vocal to instrumental music. The overall distribution of Distraction events was not affected by condition, F(1,35) = 0.01, p = .99, ηp² = 0.01.
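For readers who wish to check the effect sizes, partial eta squared can be recovered from a reported F ratio and its degrees of freedom, since ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this standard identity to the F(1,35) = 5.75 reported above; the function name is hypothetical and the snippet is not part of the original analysis pipeline, which was run in SPSS.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Condition effect on mean listening event: F(1,35) = 5.75
eta_listening = partial_eta_squared(5.75, 1, 35)
print(round(eta_listening, 2))
```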

Effects of home Active and Passive Musical Engagements on infant listening preference (aim 2)
The effect of Active Musical Engagements on the infant attention measures was tested with GLM ANOVAs High-/Low-ID-Singing exposure X vocal/instrumental music conditions. Significant interaction effects were found for both Mean Listening Event and Distraction, as reported in Table 3. Post-hoc analyses revealed that infants from Low ID-singing exposure displayed significantly longer Mean Listening Events with vocal than instrumental music, t(11) = 3.02, p = .012, but infants from High ID-singing exposure did not show a preference, t(23) = .73, p = .472 (see Figure 1). When considering Distraction Events, infants with Low ID-singing exposure displayed a trend for higher levels of Distraction with instrumental than vocal music, t(11) = −2.10, p = .060, whereas infants with High ID-singing exposure showed no difference, t(23) = 1.08, p = .290. As Figure 2 shows, infants from High ID-singing environments were more distracted than infants from Low ID-singing environments in vocal music conditions, t(32) = −3.62, p = .001. There were no significant interaction effects of Passive Musical Engagements on the infant attention measures in the GLM ANOVA High-/Low Background Music exposure X vocal/instrumental music condition (see Table 3).
In sum, early home musical experiences modulated infant attention measures only when they included active interactions involving ID-singing addressed to the infants, with infant overall preference for vocal over instrumental tunes being reduced by high levels of ID-singing in the first six months of life.
Associations of infant attention to vocal/instrumental tunes at T1 with language development at T2 (Aim 3)
Correlational analyses investigated the relationships between the infant attention variables during the preferential listening experiment at T1, 6 months, and language development at T2, 14 months, as measured by receptive vocabulary (CDI-WC) (see Table 4). When considering the vocal condition, the observed negative association between Mean Listening Event and receptive vocabulary was non-significant, but the positive relationship found between the frequency of Distraction Events and CDI-WC was significant, indicating that the more distraction during the vocal music trials at 6 months, the higher the scores on language development in this CDI component at 14 months. None of the corresponding analyses in the instrumental music condition yielded any significant correlations, suggesting that attention to instrumental music was not specifically associated with CDI-WC at this age.

Effects of home Active and Passive Musical Engagements at T1 on language development at T2 (n = 26) (Aim 4)
The effect of High/Low Active Musical Engagements at T1 (6 months) on the CDI Word Comprehension score at T2 (14 months) was highly significant in the ANOVA, F(1,24) = 11.05, p = .003, η p 2 = 0.31. Infants with reported higher exposure to singing interactions in their daily lives during their first 6 months of life outperformed infants with lower ID-singing exposure on CDI-WC at 14 months: Low ID-singing group: M = 57.89, (SD = 57.63); High ID-singing group: M = 152.59 (SD = 74.59).
The parallel ANOVA conducted to assess whether High/Low Passive Musical Engagements at T1 (6 months) had an impact on language development at T2 (14 months) also proved to be significant, F(1,24) = 4.77, p = .039, ηp² = 0.17. Infants with lower exposure to background music displayed smaller receptive vocabularies (CDI-WC) than infants with higher exposure to background music.
In sum, higher levels of both Active and Passive Musical Engagements in the home environment reported at 6 months were associated with, respectively, nearly threefold and twofold gains in infants' language development as measured by receptive vocabulary at 14 months.
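As a quick consistency check on the effect sizes reported above, partial eta squared can be recovered from an F statistic and its degrees of freedom through the standard identity ηp² = (F × df1) / (F × df1 + df2). The sketch below is illustrative only: the function name is an assumption of this example, not part of the authors' analysis. It applies the identity to the two F values reported in this section and also computes the ratio of the two reported CDI-WC group means.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # equals SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)

# F values and degrees of freedom as reported in the text.
active = partial_eta_squared(11.05, 1, 24)
passive = partial_eta_squared(4.77, 1, 24)

# Ratio of the reported CDI-WC group means (High vs Low ID-singing).
mean_ratio = 152.59 / 57.89

print(f"Active: {active:.3f}, Passive: {passive:.3f}, mean ratio: {mean_ratio:.2f}")
```

Both values land close to the reported 0.31 and 0.17 (small differences are expected because the published F values are themselves rounded), and a mean ratio of roughly 2.6 corresponds to the "nearly threefold" description for Active Musical Engagements.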

Assessment of indirect relationships
These analyses were conducted as proof of concept, in order to elucidate the relationships underlying the variables considered in this study, i.e., home musical environment, infants' attending to different musical forms at 6 months and language outcome at 14 months. Mediation and moderation models were used to test indirect and interaction effects between some variables that yielded consistent and significant associations in the analyses presented above.

Active Musical Engagement, infant attending to song and language development
Infants from high ID-singing exposure environments displayed significantly more distraction during vocal tracks at 6 months and better vocabularies at 14 months than their peers from low ID-singing exposure environments (see above): hence, Distraction Events was tested as a mediator between Active Musical Engagements at T1 (High vs Low ID-singing exposure) and infants' receptive vocabulary (CDI-WC) at T2. The bootstrapping method developed by Preacher and Hayes (2008) with 5000 replications was used in this mediation analysis. Both the direct effect (B = 66.94, SE = 29.55, 95% CI [5.81, 128.1]) and, specifically of interest here, the indirect effect (B = 27.75, SE = 13.08, 95% CI [6.72, 57.43]) of home Active Musical Engagements on children's CDI-WC score were significant. The indirect effect suggests that Active Musical Engagements at T1 additionally affected receptive vocabulary at T2 through infant listening patterns (Distraction Events here) at T1 (see Figure 3).
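To make the logic of a bootstrapped mediation test concrete, the following is a minimal sketch of a percentile bootstrap for the indirect effect (the product of the path from predictor to mediator and the path from mediator to outcome). It is not the authors' implementation and does not use the study's data: the variable names, the synthetic numbers, and the choice of a plain percentile interval are all assumptions made for illustration.

```python
import numpy as np

def _slopes(predictors, outcome):
    """OLS coefficients for outcome ~ intercept + predictors."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b for the path x -> m -> y."""
    a = _slopes([x], m)[1]      # effect of x on the mediator
    b = _slopes([x, m], y)[2]   # effect of the mediator on y, controlling for x
    return a * b

def bootstrap_indirect(x, m, y, n_boot=5000, seed=0):
    """Percentile bootstrap 95% CI for the indirect effect."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        draws[i] = indirect_effect(x[idx], m[idx], y[idx])
    lower, upper = np.percentile(draws, [2.5, 97.5])
    return indirect_effect(x, m, y), (lower, upper)

# Synthetic illustration only: these numbers are NOT the study's data.
rng = np.random.default_rng(1)
group = np.repeat([0.0, 1.0], 13)                        # hypothetical low/high grouping
distraction = 2.0 + 3.0 * group + rng.normal(0, 1, 26)   # hypothetical mediator
vocab = 50 + 40 * group + 10 * distraction + rng.normal(0, 10, 26)

estimate, (ci_low, ci_high) = bootstrap_indirect(group, distraction, vocab)
print(f"indirect effect = {estimate:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

An indirect effect is conventionally treated as significant when the bootstrap interval excludes zero, which is the criterion met by the interval reported in the text.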

Active Musical Engagement, maternal sensitivity (PCERA) and language development
Maternal sensitivity was associated with higher levels of home musical interactions at 6 months (see results above). A moderation analysis was conducted using maternal PCERA total mean score as a moderator of the association between Active Musical Engagements (high/low ID-singing) and language development at 14 months (CDI-WC score).
The moderation model explained 44% of the variance, F(3,17) = 4.54, p = .016. However, the interaction between high/low exposure to ID-singing and PCERA score was nonsignificant (β = −.05, p = .83). This means that maternal sensitivity, as measured by PCERA total mean score, did not moderate the effect of Active Musical Engagements on infant receptive vocabulary (CDI-WC).
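For readers less familiar with moderation models, the sketch below shows one common way to set one up: an ordinary least-squares regression containing the predictor, the (centred) moderator, and their product, where the coefficient on the product term carries the moderation test. This is an illustrative assumption rather than the authors' procedure: the data are synthetic, the variable names are hypothetical, and the coefficients are unstandardized, whereas the text reports a standardized β. A small helper also shows how an overall F statistic follows from R² and the model's degrees of freedom.

```python
import numpy as np

def fit_moderation(x, w, y):
    """Fit y ~ x + w + x*w by OLS; return coefficients and R squared."""
    x, w, y = (np.asarray(v, dtype=float) for v in (x, w, y))
    w_centered = w - w.mean()  # centring eases interpretation of main effects
    design = np.column_stack([np.ones(len(y)), x, w_centered, x * w_centered])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return coefs, r_squared

def f_from_r_squared(r_squared, df_model, df_error):
    """Overall F statistic implied by R squared and the degrees of freedom."""
    return (r_squared / df_model) / ((1.0 - r_squared) / df_error)

# Synthetic illustration only: these numbers are NOT the study's data.
rng = np.random.default_rng(2)
group = np.repeat([0.0, 1.0], [10, 11])     # hypothetical low/high grouping, n = 21
sensitivity = rng.normal(4.0, 0.4, 21)      # hypothetical moderator scores
vocab = 60 + 80 * group + 15 * sensitivity + rng.normal(0, 40, 21)  # no true interaction

coefs, r2 = fit_moderation(group, sensitivity, vocab)
print(f"interaction coefficient = {coefs[3]:.2f}, R^2 = {r2:.2f}")
print(f"F(3, 17) implied by R^2 = .44: {f_from_r_squared(0.44, 3, 17):.2f}")
```

Plugging in R² = .44 with 3 and 17 degrees of freedom gives an F of roughly 4.45, close to the reported F(3,17) = 4.54; the gap is within what rounding of the 44% figure would produce.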

Passive Musical Engagement
The results for Passive Musical Engagement using mediation and moderation models paralleling those conducted for Active Musical Engagements supported a more tenuous relationship between early exposure to background music and receptive vocabulary at 14 months. The mediation model using Distraction Events as mediator between Passive Musical Engagements at T1 (High vs Low exposure to background music) and infants' receptive vocabulary (CDI-WC) at T2 showed only one significant direct effect, suggesting that any effects on receptive vocabulary were not mediated by infant attentional measures. These results are reported in the Supplementary Materials, together with the nonsignificant moderation model concerning maternal sensitivity (PCERA score).

Discussion
The aim of the present study was three-fold, addressing questions concerning (1) 6-month-old infants' attention to sung vs. instrumental versions of tunes, (2) the potential influence of the informal home musical experience at the same age (namely, parent-reported exposure to ID-singing and background music), and (3) the relationship of these early measures to language development at 14 months. Additionally, tests of indirect effects between the relevant variables were made, as a proof of concept for future investigations.
The present study found that, when musical structural properties were kept constant, 6-month-old infants remained continuously engaged for longer with vocal than instrumental versions of 'happy' music tracks. One important contributing factor to this finding may be that, in early development, the human voice is a most attention-grabbing and -holding type of stimulus (Vouloumanos & Werker, 2007): hence, when comparing our stimuli, infants' attention was more attracted by the human voice (present in the vocal but not in the instrumental tracks). Furthermore, our results are compatible with both infant studies that highlighted a preference for ID-song over ID-speech (two forms of ID-vocal communication: Nakata & Trehub, 2004; Tsang et al., 2017), and Ilari and Sundara's (2009) study, which revealed a preference for song without rather than with instrumental accompaniment. However, the present experiment allows us to further show that 6-month-old infants' attention is attracted specifically by song (that is, a rich combination of speech sounds with musical structure), not just by the musical form. If the superiority effect found in the above studies comparing ID-speech and ID-song were due to the musical aspect per se (absent in the former but present in the latter), we should not observe an overall preference for vocal over instrumental music in our study, since the musical aspect was identical in our stimuli. We also overcame the possibility that simple complexity factors may explain the preference found in Ilari and Sundara (2009) for song without accompaniment (simpler) over song with accompaniment (more complex), by using vocal and instrumental tracks of equivalent level of complexity and novelty.
Our results suggest that song (voice + speech + music) is a uniquely attractive communicative form for infants. Although ID-singing presents specific adjustments along the ID-register characteristics, some important differences from ID-speech occur too: namely, the pitch stability and the organization around a regular beat that occur with singing. While neither of these features is needed for or typical of ID-speech (Falk, 2007), previous studies with adults and children (Schön et al., 2008) have also speculated that isochrony (Ravignani & Madison, 2017) may support attention and, hence, facilitate learning in song compared to speech. It is likely that song's regular pulse assists the building of infant expectations, engagement and stream-segmentation, while pitch stability may help the infant to identify phonetic regularities. Therefore, an association between song and language development may be expected.
However, the results revealed a more nuanced picture, as infants' inclination to listen to songs rather than instrumental versions of the same tunes was influenced by their experience with music. Babies growing up in families reporting very high levels of informal musical interactions in the home appeared to be less pervasively fascinated by the experiment songs, with some individual infants actually displaying more interest in instrumental tracks. Conversely, babies from families reporting low levels of musical engagements at home consistently showed a preference for songs and less distraction with vocal than instrumental music. Thus, the results suggest that environmental influences are already shaping infant attention as observed in the laboratory, in a direction that may be tentatively interpreted as capturing some novelty effect (vs. lack of). Whilst songs attract all infants' attention and appear to be an appealing communicative device, a low level of exposure to this experience accentuated this preference in the laboratory setting, whereas a high level of exposure attenuated it. In other words, infants from highly musical families were also interested in other musical forms, besides the general preference for song shared with their peers from less musical environments. It is also possible that the higher levels of distractibility observed during the songs in infants from musically active environments indicate that they were expecting to interact in some way during the experiment, since singing interactions were part of their everyday life.
This novel environmental effect corroborates recent suggestions that early musical exchanges including ID-singing may play an important role in the development of communication (Falk, Fasolo, Genovese, Romero-Lauro & Franco, 2021; Politimou et al., 2019; Schaal, Politimou, Franco, Stewart & Müllensiefen, 2020) and merits further investigation, particularly in consideration of their longitudinal effects on language development found in the present study. The follow-up at 14 months revealed a strong association between exposure to reported high levels of musical interactions in the first 6 months of life and superior gains in early receptive vocabulary (CDI-WC). These results are important, since receptive vocabulary at this age is a key predictor of developments in productive language, hence paving the way to better outcomes in early language acquisition (Tsao, Liu & Kuhl, 2004; Kuhl, Ramírez, Bosseler, Lin & Imada, 2014).
Interestingly, this relationship was reflected indirectly in the infant preferential listening experiment: for example, infants from high-music families showed an equal amount of distraction events in song and instrumental tracks in the lab at 6 months (this acoustic form, song, being largely familiar to them) while displaying better language development outcomes in their second year. Conversely, infants from low-music families were those who were most attracted by songs in the laboratory at 6 months (having less home experience of this highly attractive acoustical form) and also displayed lower levels of language development in the second year. Collectively, these findings might tap into speed of speech processing: infants who can decode relatively quickly have shorter mean listening, higher distraction and better subsequent receptive vocabulary (cf. Fernald et al., 2006). This means that early informal musical experiences in the home shape infant attention to sound, and that the cumulative experiences associated with them predict developmental outcomes. It also suggests that laboratory measures taken at only one point in development, without family context or longitudinal perspective, may reveal only one part of the developmental story and its significance. Indeed, recent paradigmatic shifts point in the direction of multiple cues, longitudinal work and cooperative effort in the establishment of causal patterns affecting development (Byers-Heinlein, Tsui, Bergmann, Black, Brown, Carbajal, Durrant, Fennell, Fiévet, Frank, Gampe, Gervain, Gonzalez-Gomez, Hamlin, Havron, Hernik, Kerr, Killam, Klassen, Kosie, Kovács, Lew-Williams, Liu, Marino, Mastroberardino, Mateu, Noble, Orena, Polka, Potter, Singh, Soderstrom, Sundara, Waddell, Werker & Wermelinger, in press; Cristia et al., 2014; ManyBabies Consortium, 2020).
Our results on infant engagement/distraction during the experimental conditions would have been puzzling without the longitudinal measure of language development, because they showed more distractibility during song when the infants had a rich musical experience at home. The results of the follow-up allow us to interpret the earlier measures as potentially indicative of developmental advance, contrary to what one might have expected at the outset.
The strong association found between musical experience in the home and early language development was further examined by separating active engagements (with a specific focus on singing with the infant) from passive engagements (e.g., exposing the infant to a variety of background music). In a nutshell, active engagements revealed more extensive effects than passive exposure (cf. Gerry et al., 2012, for similar trends in developmental measures as a result of structured infant group activities). Specifically, high levels of ID-singing exposure appeared both directly and indirectly (mediated by infant preferences for sound types) associated with better developmental outcomes in receptive vocabulary, whereas high levels of passive engagements (exposure to background music) displayed only one significant direct relationship (namely, with higher scores in CDI word comprehension at 14 months) compared to families with low levels of passive musical exposure, and no mediated effects. The contribution of background music towards the development of infant receptive vocabulary may be underpinned by statistical learning mechanisms based on repeated exposure to transitional probabilities in verbal and melodic aspects occurring in songs (Thiessen & Saffran, 2009). This mechanism, however, is vulnerable to interfering attentional demands, which lead to poorer performance (Toro, Sinnett & Soto-Faraco, 2005). In the home environment, unlike active musical interactions, background music may co-occur with other possibly competing joint or independent activities for parent and infant.
The more prominent effect of active musical engagements for gains in early language development may be associated with a number of factors, which future research will need to address. Schön et al. (2008) found that adults' and children's ability to segment 'words' out of an artificial language sound stream was superior in a musical condition (in which speech units were coupled to pitches) compared to a condition with flat speech (i.e., no coupling between speech units and pitches). Schön et al. (2008) suggested that the musical condition presented enriched material to learn from (see also Ma et al., 2019) and, when generalising to more naturalistic musical contexts, would present the learner with a more predictable sound stream based on a regular beat. In interactions with infants, singing is typically associated with positive affect (Corbeil et al., 2013; Trainor & Desjardins, 2002), and, hence, it includes a motivational aspect. In extending such speculations to the results of the present study, a first consideration is that singing with infants is often associated with motor activities (e.g., bouncing the baby on the knees): hence, promoting inter-sensory redundancy that facilitates learning (Bahrick & Lickliter, 2014), exploiting auditory-vestibular or -tactile abilities that are available in young infants (Phillips-Silver & Trainor, 2005). A second advantage of ID-singing is its documented association with emotion regulation in infants (Shenfield et al., 2003) and neurovisceral integration (Van Puyvelde et al., 2013), which may optimise the infant's learning opportunities. Furthermore, there is evidence that direct interaction has superior effects on infant learning compared to passive exposure. Kuhl, Tsao and Liu (2003) showed that post-phonetic narrowing infants learned aspects of a new language (Mandarin) when interacting with a real person but not when exposed to a televised display of the Mandarin speaker or simply audio-recordings.
Besides contingent adult scaffolding (Elsabbagh et al., 2013) and opportunities for mutual and joint attention, active musical engagements between parents and infants may benefit from [i] the entrainment to one another via the musical beat (Cirelli, Trehub & Trainor, 2018), which may give infants facilitated access to embodied aspects (e.g., breathing for vocalising), and [ii] what has been described in vocal exchanges as 'tonal synchrony' (i.e., the spontaneous falling of infant-mother vocalizations into tonally related musical intervals; Van Puyvelde et al., 2010; Van Puyvelde et al., 2015). Thus, songs may facilitate vocal imitation and speech segmentation, an ability that by 12 months predicts later language development at 2 and 4-6 years (Newman, Ratner, Jusczyk, Jusczyk & Dow, 2006). In the present study an analysis directly comparing active and passive musical engagements would have caused an unacceptable drop in statistical power, also due to uneven numerosity in the relevant groups (e.g., only 2/24 parents with a high ID-singing score reported playing background music very infrequently). However, based on our preliminary results, future studies with larger samples could pre-screen parents to test whether the effects of reported active and passive musical engagements on infant measures are independent or complementary.
It is possible that the effects found on attentional measures at 6 months and receptive vocabulary at 14 months may be explained by other factors (uncontrolled in this study), rather than frequency and type of early home musical interactions with infants. In this regard there could be a factor underlying both aspects and explaining the results. Maternal variables were considered as one such potential influence, but maternal education and demographic characteristics (including the overall level of formal musicianship in the sample) were similar in high/low musical families. PCERA sensitivity scores were higher in both groups with higher active and higher passive musical exposure, compared to low levels of home musical activities. This may be interpreted as parents with higher sensitivity scores being indeed more 'in tune' with their infants: hence, selecting activities that infants are known to like (e.g., ID-singing; Tsang et al., 2017). However, across the whole sample PCERA scores were in the higher range of positivity. Possibly for this reason, maternal sensitivity did not moderate the relationship of active and passive musical engagements with infant language development. These results suggest that the influence of home musical interactions on shaping infant attention to sound and language outcomes is relatively independent from social influences per se, and cannot be explained by factors such as maternal sensitivity. In spite of substantial differences in the studies, the results of the present research are strikingly consistent with Weisleder and Fernald (2013), who were able to show that in a demographically homogeneous low-SES sample, toddlers whose mothers offered a larger and richer ID-speech input displayed superior language development measures compared with infants who experienced less speech directed to them. This effect was mediated by infants' language processing efficiency, and overheard speech did not have the same effect.
The present study had limitations, which need to be addressed by further, more controlled investigations. Firstly, we used two measures of engagement with sound in the vocal/instrumental experiment: duration of a mean listening event (orientation to the sound source) and number of distraction events (look-aways from the sound source), aiming to capture sustained attention instead of general exposure time as in the more conventional measure of total fixation time as a proxy for total listening. While this choice may make it more difficult to directly compare results from this experiment with other studies, we hope to stimulate some methodological exploration directly comparing these two measures (mean duration of continuous listening event vs total fixation time) in a variety of paradigms, in order to identify their strengths and weaknesses. Secondly, participants in our sample were relatively highly educated: hence, the lack of maternal education effects may not hold in a much more diverse sample. Similarly, all our participants scored relatively high for sensitivity as measured by PCERA: hence, limiting the generalizability of the results concerning maternal variables. However, it could be argued that given the relatively small size of this socially homogeneous sample, the absence of large differences in education and sensitivity may support the view that any effects observed are actually due to variables other than demographic features, particularly when considering the longitudinal design employed (measuring the same families when the infants were 6 and 14 months). Besides controlling for levels of informal home musical interactions, future studies would be strengthened by quantified online measures of ID-speech such as those used by Weisleder and Fernald (2013), in order to evaluate the independence (or otherwise) of musical input from speech input.
It is possible that individual differences in the amount of speech, of interactions, and of singing directed to infants covary. However, it is also possible that individual differences in the amount of singing to infants are independent of the amount of speech or, say, book reading (Dunst, Simkus & Hamby, 2012b), and may operate as protective factors for language development in some scenarios. This is an empirical question for future studies to test. At present this involves a technological challenge, i.e., using LENA technology (see Wang, Williams, Dilley & Houston, 2020, for a recent review). Based on unobtrusive continuous sound recordings of an infant's interactions over multiple days, LENA allows researchers to obtain automatic counts of speech directed to infants. However, musical interactions across a day in the life of an infant still have to be coded off-line.
The type of informal musical experience gained by infants at home was introduced in the study as a potential factor affecting infant attention to musical sound. At the time of designing the study there was no information in the literature about possible effects of this type of experience in infancy, and a tool with good psychometric properties to measure this effect was not available. Although our questionnaire did not provide a proper score, and the two subgroups identified (high/low levels of home musical activities) do not have the same numerosity, the results of this study are consistent with a substantial body of literature showing that, at later ages, musical training facilitates cognitive and language development (among others, Degé & Schwarzer, 2011; François et al., 2012; Moreno et al., 2011; Politimou et al., 2019; Strait, Parbery-Clark, Hittner & Kraus, 2012) and that attendance at organised musical sessions in infancy enhances musical and social abilities (Gerry et al., 2012) as well as neural processing of speech and music (Zhao & Kuhl, 2016). Some evidence is also emerging that, at 2-3 years of age, shared musical activities between parents and children are associated with more mature neural processing of structural aspects of auditory stimuli (Putkinen, Tervaniemi & Huotilainen, 2013) and predict children's language, attention, arithmetic and social abilities two years later (Williams, Barrett, Welch, Abad & Broughton, 2015). Although extremely interesting, these two studies did not consider the beginning of the developmental path, and their assessment of home musical engagements is based, respectively, on six items and one item. In this respect, the present study is one of the first to show a relationship linking infant exposure to home music (ID-singing in particular) and infant attention to sound and preferences at 6 months with early language development in the second year of life (see also Schaal et al., 2020).
The results of the present study identify a need for longitudinal investigations with larger samples in order to study in more depth the effect of early musical interactions on language development, also including online measures of linguistic input. New tools with reliable psychometric properties are needed to measure home musical engagements and are just beginning to appear (Politimou, Stewart, Müllensiefen & Franco, 2018), as are new frameworks for studying complex developmental relationships, e.g., the ManyBabies approach (Byers-Heinlein et al., in press; ManyBabies Consortium, 2020).
This research is important because we know from a robust literature that advantages or disadvantages in early language development have major cascading effects on language acquisition and school readiness (e.g., Fernald et al., 2013). Research-based practice of musical interactions may represent an integrative activity to propose as parenting support in challenging contexts, in early education, and with groups considered at risk or vulnerable for language development. An increase in early musical interactions may act as a protective factor in contexts of linguistic disadvantage, or facilitate language development in more challenging (e.g., bilingual) learning environments.