The development of narrative skills in monolingual Swedish-speaking children aged 4 to 9: a longitudinal study

Abstract
This longitudinal study investigated the development of oral narrative skills in monolingual Swedish-speaking children (N = 17). The MAIN Cat/Dog stories were administered at four timepoints between age 4 and 9. Different narrative aspects were found to develop differently. In story comprehension, the children already performed at a high level at T1 (4;4) and were at ceiling at T2 (5;10), whereas story structure developed significantly until T4 (9;4). Narrative length and syntactic complexity reached a plateau at T3 (7;4). Referent introduction was not mastered until T4. The results suggest that general conclusions regarding the development of narrative skills depend on the specific aspects studied.


Introduction and background
Picture-based elicited narratives are controlled language samples which can be used to give a fairly ecologically valid assessment of children's language skills (e.g., Botting, 2002). In addition, narrative data can be analyzed in various ways (see Pavlenko, 2008 for an overview), and thus provide information about various aspects of children's language skills, such as their ability to structure complex discourse (Fiestas & Peña, 2004) or to narrate how story characters think and feel (Burris & Brown, 2014). For these reasons, studies of children's narratives have become increasingly popular in recent years, with both monolingual and bilingual children speaking a number of different languages being investigated. Previous studies have shown that children's narrative abilities develop extensively during the preschool and early school years (e.g., Berman & Slobin, 1994; Hickmann, Hendriks, Roland, & Liang, 1996; Justice, Bowles, Kaderavek, Ukrainetz, Eisenberg & Gillam, 2006; Pearson, 2002; Schneider, Hayward, & Dubé, 2006). However, most previous studies were cross-sectional and thus investigated age differences and not development with age in the same children, which means that they could not take differences in individual developmental trajectories into account. In contrast, the present study investigates development from age 4 to 9 LONGITUDINALLY, following the same children throughout the time period that has been found to be central for the development of narrative skills.
Narrative analyses are often divided into two parts or levels (e.g., Justice et al., 2006), the MACROSTRUCTURE or STORY STRUCTURE, the structural organization of the story content, and the MICROSTRUCTURE, which includes different measures of the linguistic structure, such as productivity (the length of the narrative), use of vocabulary, syntactic complexity or the use of referential, temporal and causal linking devices. Although previous studies have often investigated aspects from both levels, studies that investigate development with age for a combination of different narrative aspects are rare. The present study analyzes comprehension and production of macrostructure (story structure) together with three different microstructural aspects (productivity, syntactic complexity and referent introduction); the included aspects tap into different narrative skills and combining them gives a broader view of how children's narratives develop.
The present study investigates developmental patterns from age 4 to 9 for five different narrative aspects in Swedish-speaking monolingual children. The following two research questions are asked: (1) How do story comprehension, story structure, narrative length, syntactic complexity, and the ability to introduce referents appropriately develop from age 4 to 9? (2) Are the developmental patterns the same for all five aspects? After an overview of the narrative instrument used in the study, the remainder of this section summarizes results of previous studies for each of the five narrative aspects.

Story comprehension
Comprehending a story means understanding the relations between events in the plot, such as reasons for a character's actions, and requires the listener to draw inferences (Hayward, Schneider, & Gillam, 2009; Stein & Glenn, 1979). Story comprehension has mainly been measured as (correct) answers to (inference-based) probe questions (e.g., Bohnacker, 2016; Lindgren, 2019; Stein & Glenn, 1979; Trabasso, Stein, Rodkin, Munger, & Baughn, 1992). For example, in a classic study, Stein and Glenn (1979) studied story comprehension by monolingual English six- and ten-year-olds. The six-year-olds had overall good narrative comprehension, but there was some further development to age 10.
MAIN includes a set of ten comprehension questions and a number of studies have investigated children's responses to these questions (e.g., Bohnacker, 2016; Fiani, Henry, & Prévost, 2020; Lindgren, 2018a, 2019). 2 Results indicate that children's comprehension of the MAIN stories develops relatively early. For example, in a study of English-Swedish bilinguals, Bohnacker (2016) found development from age 5 (N = 19) to age 6-7 (N = 33), but story comprehension was already at a high level at age 5. Lindgren (2018a) found a significant increase from age 4 to 6 in the story comprehension of Swedish monolinguals, German-Swedish bilinguals and Turkish-Swedish bilinguals (N = 166); comprehension was good already at age 4. In a longitudinal study of Swedish monolinguals, Lindgren (2019) found development in story comprehension of the MAIN Baby Birds/Baby Goats stories from age 4 to 7, with scores approaching ceiling at age 7. With the exception of a recent study of Lebanese Arabic-French bilinguals aged 5 to 9 (Fiani et al., 2020), there are no published studies of story comprehension using MAIN in children above age 8. From these earlier studies, it can thus be expected that story comprehension on MAIN develops substantially during the preschool years, but that it is unlikely that there is further development to age 9.

Story structure
There is considerable variation in the aspects of the story content that are scored as part of the STORY STRUCTURE, the narrative macrostructure, although components such as settings, goals, attempts and outcomes are often included. Additionally, studies employ different methods to elicit narratives (telling vs retelling, with or without picture-based stimuli). Therefore, results cannot easily be compared. There are, however, clear indications that story structure develops substantially between age 3 and age 7 (Berman & Slobin, 1994; Pearson, 2002; Schneider et al., 2006; Trabasso & Nickels, 1992); throughout these years, children's narratives develop from descriptive sequences only towards the inclusion of at least some complete episodic structures (see Westby, 2012). Some aspects of story structure, such as the inclusion of characters' goals, develop further to age 9.
Previous studies of MAIN-narratives have used the same story structure score as in the present study. Most of these studies found age effects from age 4 to 7; children older than 8 years have rarely been studied. For example, Bohnacker (2016) found that 6-7-year-old English-Swedish bilinguals scored higher than 5-year-olds in both languages. Lindgren (2018a) found significant age effects for Swedish monolinguals and German-Swedish and Turkish-Swedish bilinguals; six-year-olds had higher story structure scores than five-year-olds, who in turn performed better than four-year-olds. Kunnari et al. (2016) found a significant effect of age on the story structure score in Finnish monolinguals and Swedish-Finnish bilinguals (N = 32) aged 5;0 to 6;7. Gagarina (2016), in one of the few studies that included children above age 7, found for Russian-German bilinguals (N = 58) that preschoolers (M age = 3;9) performed significantly lower on the story structure score in both languages than children in Grade 1 (M age = 7;0), but that the children in Grade 1 performed similarly to children in Grade 3 (M age = 9;3). In a longitudinal study of Dutch monolinguals, Blom and Boerma (2016) found no development in typically-developing children's story structure scores from age 5-6 (M age = 5;9) to age 6-7 (M age = 6;9). Lindgren (2019) found significant development from age 4 to 6, but no further development until age 7. The results from studies using the MAIN story structure score thus show a consistent development during the preschool period, but less clear development in the early school age. For the present study, this means that it is expected that there will be substantial development up to age 7, possibly with further development to age 9.
2 See also the recent volume on investigating narrative comprehension using MAIN (Bohnacker & Gagarina, 2020).

Productivity: narrative length
One commonly analyzed aspect of narrative microstructure is narrative productivity, where the specific measure used is often its LENGTH IN TOTAL NUMBER OF WORDS (TNW; e.g., Fiestas & Peña, 2004; Gagarina, 2016; Justice et al., 2006). Studies have mostly compared this measure across groups, e.g., children with typical language development and children with developmental language disorder (Altman, Armon-Lotem, Fichman, & Walters, 2016; Iluz-Cohen & Walters, 2012) or across bilinguals' languages (Altman et al., 2016; Fiestas & Peña, 2004; Kunnari et al., 2016). The few studies of age effects on TNW show mixed results. In a large-scale study of English monolingual children aged 5 to 12, Justice et al. (2006) found an increase in TNW from age 5 to age 10. However, in a comparison of Swedish-speaking children aged 5 and 10, Reuterskiöld, Hansson and Sahlén (2011) found no significant difference between the age groups on TNW. To my knowledge, the only published study that investigated age effects on TNW for MAIN-narratives is Gagarina (2016). In both languages of Russian-German bilinguals, Gagarina (2016) found a significant increase in TNW from preschool age (M age = 3;9) to Grade 1 (M age = 7;0), but no further increase to Grade 3 (M age = 9;3). There are thus indications that TNW may increase with age at least until age 7, but it may depend on the narrative stimuli used.

Syntactic complexity
Another frequently investigated microstructural aspect is syntactic complexity. Here, measures vary substantially across studies, but are often linked to the production of subordinate clauses (e.g., Justice et al., 2006; Tsimpli, Peristeri, & Andreou, 2016). Just as for TNW, age effects have rarely been investigated. However, Justice et al. (2006) found an increase from age 5 to 9 in the proportion of utterances containing two or more clauses. Syntactic complexity thus seems to increase throughout the preschool and early school age.

Referent introduction
Children's ability to introduce referents (also called first mentions) has been investigated in a number of studies (e.g., Aksu-Koç & Nicolopoulou, 2015; De Cat, 2013; Schneider & Hayward, 2010). With respect to the age at which children have been found to be able to introduce referents appropriately in elicited narratives, there is considerable variation. Some results indicate that already two- to four-year-old children are able to use indefinite expressions appropriately (De Cat, 2013; Emslie & Stevenson, 1981). However, although age 4-6 seems to be central for the development (Aksu-Koç & Nicolopoulou, 2015; Lindgren, 2018b; Schneider & Hayward, 2010), a number of studies have shown that children do not seem to master the ability to introduce referents appropriately before age 7 (Lindgren, 2018b; Schneider & Hayward, 2010) or not even before age 9 (Hickmann et al., 1996; Serratrice, 2007). Differences in results between studies are likely due to differences in procedures or stimuli (Lindgren, 2018b). Only a few studies have investigated referent introduction in Swedish-speaking children. In a study of character introductions in MAIN Cat/Dog narratives (using the same stimuli as in the present study) by 72 Swedish monolingual children aged 4 to 6, Lindgren (2018b) found a significant increase in the use of appropriate referring expressions from age 4 to 6. At age 6, the children used 90% fully appropriate expressions. Similar results were found in a study of German-Swedish bilinguals aged 4 and 6 (Lindgren, Reichardt, & Bohnacker, in press). It is therefore expected that the children in the present study will have mastered referent introduction by age 7.

Participants
The participants were 17 Swedish monolingual children (10 girls, 7 boys), who were tested four times, twice at preschool, at age 4 (T1; M age = 4;4, range: 4;0-4;8) and 18 months later at age 5-6 (T2; M age = 5;10, range: 5;5-6;2), and twice at school, at age 7, the beginning of Grade 1 (T3; M age = 7;4, range: 6;11-7;8), and two years later, at age 9, the beginning of Grade 3 (T4; M age = 9;4, range: 8;11-9;8). The children were recruited from two preschools at T1 and were also attending these at T2. 3 At T3 and T4, they attended 13 different schools. At each timepoint, in addition to a written consent form, a short parental questionnaire was filled in by all parents. Answers to the questionnaires showed that the children had typical language development, came from mid- to high-SES families as measured by parental education, 4 and that no other languages than Swedish were spoken in the homes.

Materials
The children were tested with a picture-based narrative task from MAIN in the telling mode (Gagarina et al., 2012, 2015, 2019b). Each child was randomly assigned to tell either the Cat (N = 8) or the Dog story (N = 9). These stories are parallel in the number of pictures (six), episodic structure (three episodes) and number and types of characters (three characters: two animals and one boy), and have been carefully created to be controlled for story content and structure. The tasks also contain ten standardized comprehension questions which probe the child's understanding of characters' goals and emotions (internal states).
3 Fifteen of the 17 children were part of a group of 24 monolingual four-year-olds in a larger study of 4-6-year-old mono- and bilingual Swedish-speaking children's narratives (Lindgren, 2018a). The final two children were also tested at this time, but were not included in the larger study. At T2, for practical reasons, only those monolingual children from whom data were available at T1 and who were attending these two preschools, a total of 18 children, were asked to participate. One child had moved and could therefore not be included in the present study. There were no further drop-outs; all 17 children who were tested at T2 also participated at T3 and T4.

Procedure
The children carried out the narrative task in a quiet room at their (pre)schools as part of a larger test battery (see Lindgren, 2018a for details). They told the same story at all timepoints. The MAIN standardized procedure was used (Gagarina et al., 2019b): The child and the experimenter sit facing each other. On the table, there are three envelopes, each containing a set of story pictures. The child chooses one, opens the envelope and looks at all pictures. The pictures are then folded back so that only the first two are visible to the child, after which the child starts telling. When s/he has finished telling about the first pictures, the next two are unfolded, and finally the last two. The experimenter gives only general prompts such as aha, mhm, or and then? During the story telling, the pictures are not visible to the experimenter. When the child has finished telling the story, the pictures are placed on the table and the experimenter asks the comprehension questions.

Measures and analyses
The narratives were transcribed orthographically by the author in the CHAT-format (MacWhinney, 2000). The same five measures were analyzed at all timepoints (T1-T4). All coding and analyses were carried out by the author. Subsequently, the data from four randomly selected children (24% of the data) were coded by a second trained coder for story comprehension, story structure, syntactic complexity and referent introduction. All measures showed very high agreement (Cohen's kappa was 0.847, 0.899, 0.849 and 1.000 for story comprehension, story structure, syntactic complexity and referent introduction, respectively).
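As an illustration of the agreement statistic reported above, the following is a minimal sketch of unweighted Cohen's kappa for two coders. It assumes that agreement is computed over item-level categorical codes; the article does not state the unit of comparison or whether a weighted variant was used, and the function name and example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa for two coders' codes on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes (1 = point awarded, 0 = no point), not data from the study.
first_coder = [1, 1, 0, 0]
second_coder = [1, 1, 0, 1]
kappa = cohens_kappa(first_coder, second_coder)
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it can be well below the raw agreement rate when one code dominates.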

Story comprehension
The ten comprehension questions that were asked with each story were scored following the MAIN manual (Gagarina et al., 2019a). At each timepoint, the child received a total story comprehension score (max = 10 points).

Story structure
The MAIN scoring protocol was used to code each narrative for the production of narrative macrostructure. This scoring scheme awards points for the production of setting (time, place) and for internal states as initiating event, goal, attempt, outcome and internal state as reaction in each of the three episodes in the story. For more information about the scoring, see Gagarina et al. (2019a). Each narrative received a total story structure score (max = 17 points).
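The composition of the maximum score can be sketched as follows. This is only an illustration under the assumption that each listed component is worth one point (two setting components plus five components in each of three episodes, which gives 17); the authoritative scoring rules are those in the MAIN manual, and the component labels and function name here are hypothetical.

```python
SETTING_COMPONENTS = ("time", "place")
EPISODE_COMPONENTS = (
    "internal_state_initiating_event",
    "goal",
    "attempt",
    "outcome",
    "internal_state_reaction",
)
N_EPISODES = 3

# Assumption: one point per component, so 2 + 3 * 5 = 17.
MAX_STORY_STRUCTURE = len(SETTING_COMPONENTS) + N_EPISODES * len(EPISODE_COMPONENTS)


def story_structure_score(setting, episodes):
    """Count produced components; `setting` is a set, `episodes` a list of sets."""
    if len(episodes) != N_EPISODES:
        raise ValueError("expected one set of components per episode")
    score = len(set(setting) & set(SETTING_COMPONENTS))
    for produced in episodes:
        score += len(set(produced) & set(EPISODE_COMPONENTS))
    return score


# Hypothetical narrative: place mentioned, plus attempt and outcome in every episode.
example_score = story_structure_score(
    {"place"},
    [{"attempt", "outcome"}, {"attempt", "outcome"}, {"goal", "attempt", "outcome"}],
)
```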

Narrative length (TNW)
The freq command of the program CLAN (MacWhinney, 2000) was used to calculate the length of each narrative in word tokens.

Syntactic complexity
The ratio of subordinate clauses to main clauses was used to measure syntactic complexity. All main and subordinate clauses were manually identified and counted.

Referent introduction
All first mentions of the three story characters (cat/dog, butterfly/mouse, boy) and one object central to the stories (ball/balloon) were coded with a maximum score of 3 points per referent, following the system developed by Schneider and Hayward (2010). Indefinite NPs and possessive NPs (where the new referent was introduced as belonging to a previously introduced referent, e.g., pojkens badboll 'the boy's beach ball', dens ägare 'its owner'), received 3 points, definite NPs and bare nouns received 2 points, and pronouns received 1 point. Each narrative received a total referent introduction score (max = 12 points).
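The point scheme described above can be sketched as a small scoring function. The form labels, referent labels and function name are hypothetical, and the treatment of a referent that is never mentioned (0 points) is an assumption that the description above does not cover.

```python
# Points per form of first mention, as described in the text.
POINTS_BY_FORM = {
    "indefinite_np": 3,
    "possessive_np": 3,
    "definite_np": 2,
    "bare_noun": 2,
    "pronoun": 1,
}
# Three characters and one object; labels are illustrative.
REFERENTS = ("animal_1", "animal_2", "boy", "object")
MAX_REFERENT_SCORE = 3 * len(REFERENTS)


def referent_introduction_score(first_mentions):
    """Sum points over referents; `first_mentions` maps referent -> form label."""
    total = 0
    for referent in REFERENTS:
        form = first_mentions.get(referent)
        # Assumption: a referent that is never introduced contributes 0 points.
        total += POINTS_BY_FORM.get(form, 0)
    return total


# Hypothetical narrative: two indefinite NPs, one definite NP, one pronoun.
example_total = referent_introduction_score(
    {
        "animal_1": "indefinite_np",
        "animal_2": "indefinite_np",
        "boy": "definite_np",
        "object": "pronoun",
    }
)
```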
For each of the five measures, a linear mixed effects model (with Child as random effect) was fitted with timepoint as predictor (fixed effect) using the function lmer from the package lme4 in R (Bates, Mächler, Bolker & Walker, 2015). Timepoint was Helmert-coded, i.e., the mean for each timepoint was compared to the mean of the subsequent timepoints. Story was included as a control variable in all models.
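The comparisons implied by this coding can be made concrete with a short sketch. It builds the weights that define each comparison (a timepoint against the mean of all later timepoints) and applies them to a set of means. This illustrates the comparisons only; it is not the mixed model itself, the weights are not necessarily the numeric coding matrix supplied to lmer, and the means are hypothetical values rather than results from the study.

```python
from fractions import Fraction


def helmert_weights(n_levels):
    """Row k compares level k with the mean of all subsequent levels."""
    rows = []
    for k in range(n_levels - 1):
        later = n_levels - k - 1
        # Earlier levels get 0, the focal level 1, each later level -1/later.
        row = [Fraction(0)] * k + [Fraction(1)] + [Fraction(-1, later)] * later
        rows.append(row)
    return rows


def apply_contrasts(weights, means):
    """Weighted sum of the level means for each comparison."""
    return [sum(w * m for w, m in zip(row, means)) for row in weights]


weights = helmert_weights(4)  # T1 vs T2-T4, T2 vs T3-T4, T3 vs T4
hypothetical_means = [2, 4, 6, 8]  # illustrative values only
differences = apply_contrasts(weights, hypothetical_means)
```

With four timepoints this yields three comparisons, which matches the pattern of results reported per measure: T1 against the later timepoints, T2 against T3 and T4, and T3 against T4.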

Results
Table 1 and Figure 1 show descriptive statistics for story comprehension, story structure, narrative length, syntactic complexity and referent introduction. Table 2 gives the results from the five linear mixed effects models.
The descriptive statistics (Table 1, Figure 1) show relatively large increases from T1 to T4 on all measures. However, the statistical analyses (Table 2) show differences in the patterns of development. On STORY COMPREHENSION, the children performed significantly lower at T1 than at the later timepoints, but, due to ceiling effects already at T2, the measure showed no further increase. In contrast, STORY STRUCTURE showed significant differences between each previous timepoint and the following ones, with relatively large increases from T1 to T2 and from T3 to T4; the increase from T2 to T3 was small. Story structure scores were still only at about 50% of the maximum score at T4. The results for NARRATIVE LENGTH showed that the children produced significantly shorter narratives at T1 and T2 than at the subsequent timepoints, but that the increase from T3 to T4 was not significant. Individual variation was substantial for this measure at all timepoints, and ranges were similar for T2, T3, and T4. SYNTACTIC COMPLEXITY was significantly lower at T1 than at the subsequent timepoints (but with a negligible difference between T1 and T2), and showed a steep development from T2 to T3/T4, but, again, the increase to T4 was not significant, probably due to the large individual variation at both T3 and T4. At T4, the children produced almost one subordinate clause per three main clauses on average, and all children produced at least one subordinate clause. The score for REFERENT INTRODUCTION showed a large and significant increase from T1 to the subsequent timepoints, no increase from T2 to T3, but a further significant increase from T3 to T4. At T1, the children mainly introduced referents using definite NPs. At T4, referent introduction had been fully mastered by most children: only 4 children (out of 17) did not produce four indefinite NPs at this point.

Discussion and conclusion
The present study used the picture-based narrative task Cat/Dog from MAIN (Gagarina et al., 2019a) to investigate narrative development from age 4 to 9 in monolingual Swedish-speaking children.
Table 1. Descriptive statistics for story comprehension, story structure, narrative length, syntactic complexity, and referent introduction (means, SDs, ranges), by timepoint.
In line with results from previous studies of STORY COMPREHENSION in MAIN (Bohnacker, 2016; Lindgren, 2018a, 2019), the children's performance was at a relatively high level already at T1 (age 4;4), and showed a steep increase to T2 (age 5;10), but no further significant development from T2 onwards. This was due to a ceiling effect. For this measurement, children have reached above 90% accuracy already before age 6. However, it is important to point out that this does not mean that story comprehension IN GENERAL has been mastered at this age. The reason for the children's performance could lie in the nature of the comprehension questions asked. For example, Stein and Glenn (1979) found a development in story comprehension from age 6 to 10; the difference between their results and those of the present study could lie in the number and type of comprehension questions.
In contrast, the production of STORY STRUCTURE (narrative macrostructure) showed continued development up to age 9; in fact, the significant increase in the story structure score from T3 (age 7;4) to T4 (age 9;4) was relatively large. This was different from the lack of a significant difference between Grade 1 (age 7) and Grade 3 (age 9) found by Gagarina (2016) for Russian-German bilinguals in the only other study with MAIN that investigated age effects in children above age 8, as well as in the longitudinal studies by Lindgren (2019) and Blom and Boerma (2016) on Swedish and Dutch monolinguals, respectively. What these three previous studies have in common is that they all used the MAIN Baby Birds/Baby Goats task, whereas the present study used MAIN Cat/Dog. However, these two tasks have been created to be parallel in story structure and the components scored are the same ones, so one would not expect performance on the two tasks to differ. Two previous studies did not find differences in performance on story structure between MAIN Cat/Dog and Baby Birds/Baby Goats tasks in the telling mode (Lindgren, 2018a; Öztekin, 2019). 5 The reason for these differences remains an open question and needs to be investigated further in future studies. Although the development of story structure continued until age 9, at this age, scores were still at only around 50% of the maximum score. The mean score of 8.5 points at age 9 is relatively far from the adult level (11.3 points for Swedish-speaking adults in Gagarina et al., 2019a).
With respect to NARRATIVE LENGTH (TNW), the children produced significantly longer narratives until T3 (age 7;4), but the additional numeric increase in length to T4 (age 9;4) was not significant. This is similar to the results from Gagarina (2016), who used MAIN, but different from Justice et al. (2006), who used a narrative task with a single picture as stimulus. The difference between the present study and Justice et al. (2006) could thus be linked to differences in the stimuli employed, but it could also be an effect of the present study's small sample and substantial individual variation.
Note. * = p < .05; ** = p < .01; *** = p < .001. For each predictor, the second level is the reference level.
The narratives' SYNTACTIC COMPLEXITY showed its largest increase from T2 (age 5;10) to T3 (age 7;4). At T1/T2, the children rarely used subordinate clauses, whereas at T4 (age 9;4), they did so with almost a third of their main clauses. The results presented here do not show a linear development for this measure, in contrast to the results of Justice et al. (2006). Rather it seems to be the case that, at least in Swedish-speaking children, syntactic complexity may remain at a low level throughout the late preschool period, then increase substantially before the time the children start school, but then remain at a similar level until Grade 3. However, the lack of significant development from Grade 1 to Grade 3 in the present study may be an artefact of the relatively small sample and large individual variation at T3 and T4. To investigate this issue, larger studies, preferably longitudinal ones, are necessary.
The results for REFERENT INTRODUCTION were in line with those previous studies that show that this ability continues to develop after age 7 (e.g., Hickmann et al., 1996; Serratrice, 2007); although the children performed relatively well already at age 4, only at T4 (age 9;4) had the children fully mastered referent introduction. Interestingly and contrary to expectations, despite the task being the same, the children in the present study performed worse at age 6-7 than the Swedish-speaking six-year-olds in Lindgren (2018b), who were close to ceiling. It is possible that the cause of this difference is that Lindgren (2018b) only analyzed character introductions, whereas the present study also included one inanimate object. It is also possible that the children in Lindgren (2018b) were especially high-performing ones. Additionally, it could be an effect of repeated testing with the same narrative stimuli; children may be less inclined to introduce referents using indefinite NPs when they have already told the story to the same listener at an earlier timepoint. For these reasons, future studies need to elicit different, but comparable stories at different timepoints, analyze differences between different types of referents in detail and collect more detailed information about the participants' backgrounds.
5 In both these studies, all children told one narrative from each task.
To conclude, the present study has shown, for the first time in a longitudinal study using MAIN, that different narrative aspects develop differently from age 4 to 9. This suggests that researchers need to be careful when drawing general conclusions regarding the development of narrative skills, since results may differ substantially depending on the specific aspects studied.