Do children go for the nice guys? The influence of speaker benevolence and certainty on selective word learning

Abstract
This study investigated how speaker certainty (a rational cue) and speaker benevolence (an emotional cue) influence children's willingness to learn words in a selective learning paradigm. In two experiments, four- to six-year-olds learnt novel labels from two speakers and, after a week, their memory for these labels was reassessed. Results demonstrated that children retained the label–object pairings for at least a week. Furthermore, children preferred to learn from certain over uncertain speakers, but they had no significant preference for nice over nasty speakers. When the cues were combined, children followed certain speakers, even if they were nasty. However, children did prefer to learn from nice and certain speakers over nasty and certain speakers. These results suggest that rational cues regarding a speaker's linguistic competence trump emotional cues regarding a speaker's affective status in word learning. However, emotional cues were found to have a subtle influence on this process.


Introduction
In acquiring the vocabulary of a language, the child has to learn from other people which label to use to accurately refer to things in the world. Children are generally quite prepared to learn from the testimony of others, but they do not gullibly accept everything that comes their way (see, for example, Harris & Koenig, 2006, and Markson & Bloom, 1997, for relevant studies, and Gelman, 2009, for a review of this topic). Previous research has demonstrated that children can be quite selective in their learning, evaluating not only the nature of the information that speakers provide them with, but also the general characteristics of the speaker providing the information (Koenig & Sabbagh, 2013). If children are given cues that suggest that one speaker is more reliable than another, they prefer to learn from the more reliable speaker. This type of learning via social transmission has been argued to be a rational process (Sobel & Kushnir, 2013). That is, children take their previous knowledge into account, not only regarding the objects of learning, but also with respect to what they know about the particular speakers they are dealing with. However, Mills (2013) argues that although children can be critical in evaluating information in many respects, their critical thinking must not be overestimated. That is, children may be capable of recognising certain signals of untrustworthiness, without necessarily understanding that there may be a weighting in the importance of these different signals.

Rational and emotional cues in selective learning
Although it is a good strategy to learn from rational, epistemically justified, cues regarding speaker trustworthiness (e.g., preferring speakers who have demonstrated prior accuracy in a relevant domain over those who have demonstrated inaccuracy), children have also been shown to place trust in informants on the basis of cues that are not justified on epistemic grounds. For instance, Bascandziev and Harris (2014) show that children prefer information from an attractive informant over information from an unattractive one. Similarly, Jaffer and Ma (2015) demonstrate that four- and five-year-old children have a bias against physically disabled or obese people. This learning behaviour is not rational in the sense that there is no epistemically grounded reason to assume that attractiveness, physical ability, and obesity are relevant cues in a selective learning situation. Jaffer and Ma suggest that failure to learn from these individuals might be due to a general negativity bias, which would cause children to avoid (learning from) a negative environment. Alternatively, it might be the case that attractive, able-bodied, or non-obese individuals are associated with a positive bias, a 'halo-effect', according to which they are associated with positive learning experiences regardless of their actual trustworthiness. It has been shown that both adults and children are influenced by the halo-effect in trait attribution (see Wilson and Eckel, 2006, for adults, and Cain, Heyman, and Walker, 1997, for children). Although adults might be expected to consider rational cues to be of greater importance than emotional cues in their trustworthiness judgements, it is not clear whether this is also the case for children. Will epistemically justified, rational cues indeed override these kinds of non-rational biases in the word learning domain or will children consider the two types of cues to be of equal importance?
The current study seeks to further investigate this issue and determine which cues children use in selective learning, and how rational and non-rational, emotional, cues are weighted in the domain of word learning.

Speaker competence and benevolence cues in word learning
Two types of cues are the focus of the current study: a rational cue pertaining to a speaker's competence in providing accurate information in the linguistic domain, and a non-rational, emotional, cue pertaining to a speaker's level of benevolence (cf. Mascaro and Sperber, 2009, for more on the competence and benevolence dimensions). Previous studies have suggested that linguistic competence, which constitutes a rational cue in the domain of word learning, is taken into account by children in the word learning process, as they preferentially learn novel labels from previously accurate speakers over inaccurate ones (see Birch, Vauthier, & Bloom, 2008; Koenig, Clément, & Harris, 2004; Pasquini, Corriveau, Koenig, & Harris, 2007). Furthermore, Scofield and Behrend (2008) showed that four-year-olds were even able to revise a label-object link when the informant later proved to be an unreliable labeller. Sobel and Macris (2013) added to this research by demonstrating that a speaker who displays good syntactic ability (labelling an object using correct syntax) is also preferred over a speaker who displays poor syntactic ability (labelling an object using incorrect syntax: "This are a ball"). Both lexical and syntactic accuracy thus seem to be linguistic competence cues that children rely on in learning new object labels from others.
Children also appear to be attuned to more subtle linguistic competence cues, such as linguistically conveyed certainty in labelling a novel object (cf. Bergstra, De Mulder, & Coopmans, 2013; Sabbagh & Baldwin, 2001; Sabbagh, Wdowiak, & Ottaway, 2003). Children prefer to follow the labelling of a speaker who claims to be familiar with an object and to KNOW what it is called over the labelling of a speaker who states that he isn't familiar with an object and only THINKS a novel object has a particular label. Although younger children do understand direct physical cues to an interlocutor's knowledge states better than these more indirect verbal cues of knowledge (Saylor & Carroll, 2009), children of four years and older are generally capable of understanding the distinction between mental state terms like know and think (cf. Moore, Bryant, and Furrow, 1989, and Moore and Davidge, 1989, for studies on mental state term understanding in English, and De Mulder, 2015, for Dutch) and they thus can and do use these cues in novel word learning. Although linguistically conveyed speaker certainty may be a more indirect cue than prior naming accuracy, it is clearly still a cue that is informative regarding a speaker's labelling competence. In this sense, then, it can be considered a rational cue: it is justifiable on epistemic grounds to follow the labelling of a speaker who says he is certain about an object name rather than the labelling of a speaker who makes it explicit that he is uncertain.
It has been shown, however, that children are also attuned to even more indirect speaker competence cues, such as native accent (Corriveau, Kinzler, & Harris, 2013) and speaker age (Jaswal & Neely, 2006). That is, children prefer to learn words from speakers with a native accent over those with a non-native accent and from adults over children. Although the epistemic grounds for these preferences in the domain of word learning are somewhat less clear than for prior accuracy and speaker certainty, it is still rational to follow the labelling of native speakers and adults in the domain of novel word learning. After all, native speakers (as compared to non-native speakers) and adults (as compared to children) are very likely to have a broader vocabulary and thus are more likely to be able to offer the correct label.
Aside from relying on these epistemically justified, rational cues, previous studies have also suggested that emotional cues such as speaker benevolence may influence children's preferential learning, although few studies have considered this factor specifically in a word learning paradigm. For instance, Mascaro and Sperber (2009) demonstrated that children as young as three are already more inclined to trust the testimony of nice over nasty informants, preferring to follow a nice informant's testimony regarding the content of an opaque box rather than that of a nasty informant. Vanderbilt, Liu, and Heyman (2011) demonstrate in a selective trust paradigm that five-year-olds prefer to take advice from a helpful informant rather than from a deceitful informant in their attempts to locate a hidden sticker. Benevolent speakers thus seem to be trusted more by children than malevolent speakers in a general sense, but there is also evidence to suggest that speaker morality may guide children's novel word learning. Doebel and Koenig (2013) found that children were more inclined to acquire novel labels from well-intentioned speakers (who had previously been seen to engage in prosocial behaviour towards peers) over those of a neutral speaker, and the labels of a neutral speaker over those of an ill-intentioned speaker (who had previously displayed antisocial behaviour towards peers). Similarly, Landrum, Mills, and Johnston (2013) showed that the labels of informants who were described as being prosocial ("This person is very nice. He shares things, he gives presents to his friends and family, and he really cares about other people's feelings.") were taken to be accurate by three- to five-year-old children, even if they were said not to be experts in a particular labelling domain.
Prosocial, morally high-standing speakers thus seem to benefit from the halo-effect: their positive social behaviour leads to attribution of knowledge in the labelling domain, even though on epistemic grounds there is no clear reason to believe that they would be more accurate labellers than speakers who have behaved antisocially or neutrally. In this sense, then, speaker benevolence can be considered to be a non-rational, emotional cue (Sobel & Kushnir, 2013).

The current study
Previous research has thus shown that children are sensitive to rational and emotional cues if they are faced with conflicting information regarding the correct label for an object. The current study aims to build on this previous work and extend it not only by considering additional speaker features that may influence children's willingness to acquire novel vocabulary items (Experiment 1), but also by investigating how children's selective learning is influenced if multiple cue types interact (Experiment 2). To this end, Experiment 1 considers independent effects of speaker benevolence and speaker certainty on children's selective word learning. As previous studies have found children to follow certain speakers over uncertain speakers (e.g., Bergstra et al., 2013; Sabbagh & Baldwin, 2001), we expected a similar outcome for this study. The speaker certainty cue thus served primarily as a replication of previous studies and as a validation of the particular version of the paradigm used in this study. In this way, the effects of speaker benevolence in the word learning domain could be more clearly assessed. As the review of previous studies above makes clear, children are more inclined to trust nice over nasty speakers in a general sense, and the novel labels of morally high-standing speakers are also taken to be accurate. However, previous studies have not considered whether the child's PERSONAL liking of the speaker in and of itself affects word learning (both previous studies in this domain presented children with individuals who were said or shown to behave prosocially towards OTHERS, and Landrum et al., 2013, did not consider the effects of speaker benevolence independently from level of expertise). Investigating this specific factor can thus make clear whether it is enough for a child to just like the speaker (based on the speaker being nice to the child on one occasion) in order to be the recipient of selective trust for word learning.
Alternatively, evidence of a speaker's moral high-standing may have to be more pronounced and/or benevolence cues may have to be coupled with other cues indicating potential labelling competence for benevolent speakers to be preferred in the word learning domain.
Based on the outcomes of previous studies, in Experiment 1 we expect children to have a preference for the object of the more positive informant: the certain speaker over the uncertain speaker in the case of differences in speaker certainty, and the nice speaker over the nasty speaker in the case of differences in speaker benevolence.
Experiment 2 then goes on to examine the weighting that children apply to these different cues by combining high or low speaker certainty (a rational cue) and benevolence or malevolence (an emotional cue) in one speaker. As relatively few studies have combined different cues, it is harder to offer specific predictions for Experiment 2. However, what we do know from prior research is that prior naming accuracy, which is a rational cue, tends to trump other (emotional or more subtle rational) cues.
Native-accented speakers, for instance, were not preferred over foreign-accented speakers if, previously, the native speaker displayed naming inaccuracy whilst the foreign speaker was accurate (Corriveau et al., 2013). Similarly, adults' labels are not preferred over the labels provided by a child if the adult previously was inaccurate and the child accurate (Jaswal & Neely, 2006). Corriveau and Harris (2009) show that accuracy is also more important than familiarity. Four- and five-year-old children prefer the labels of a familiar teacher over those of an unfamiliar teacher. However, when the unfamiliar teacher provides accurate labels prior to the experiment phase, and the familiar teacher provides inaccurate labels, the labels of the unfamiliar teacher are preferred. Prior accuracy thus seems to be more important than age, accent, and familiarity, which suggests that this clearly rational cue overrides other more subtle rational or emotional cues. In the light of these results, we might thus expect the rational speaker certainty cue to override the emotional benevolence cue.
However, not all studies find evidence for clearly rational cues prevailing over non-rational ones. Bascandziev and Harris (2016), for example, found that children's preference for learning from attractive over unattractive speakers was not overridden when the unattractive speaker was shown to be accurate and the attractive speaker inaccurate. Furthermore, Landrum et al. (2013) demonstrated that children prefer the labels of a benevolent informant over those of an expert informant. That is, informants who were described as being prosocial individuals were trusted more than informants who were described as being experts in a relevant area (for example, an eagle expert when bird-related objects were being named). Indeed, this preference for the prosocial speaker remained even when this speaker had irrelevant expertise (e.g., the prosocial speaker was described as an eagle expert in a situation in which vehicles were being named). However, the level of preference for the prosocial and relevant expert was higher than for the prosocial and irrelevant expert, indicating that children were taking expertise into account to some extent. This suggests that cues related to a speaker's linguistic competence may be less relevant and/or salient to children in their learning of novel labels than general benevolence cues. If this is the case in a broader sense, we would expect to find that children will choose the labels of nice speakers over those of certain speakers (i.e., the emotional benevolence cue will override the rational speaker certainty cue).

Retention of labels in the selective word learning paradigm
Aside from investigating what kinds of information children take on board in determining selective trust, and how different kinds of cues are weighted in this process, an additional aim of the current study was to determine to what extent the label-object links that the children learn during selective word learning trials are enduring. After all, in this paradigm, children might just stick with the accurate person without storing the new label in their lexicon (Birch et al., 2008), and thus not actually learn any new words in the process. Given that the vast majority of previous studies that have used the selective trust word learning paradigm have not assessed memory for the novel label-object pairing after initial exposure, it is currently not entirely clear whether children retain the label for a novel object over time. Sobel, Sedivy, Buchanan, and Hennessy (2012) demonstrated that children do still remember the label provided by a reliable speaker at the end of an experimental session, and Sabbagh and Shafman (2009) showed that labels of unreliable speakers were not remembered in this time frame, so there is some suggestion that word learning from reliable speakers does occur. However, neither of these studies assessed whether children remembered label-object links after a greater time delay (i.e., beyond the experimental session during which first exposure occurred), so it is not known to what extent novel labels are retained in memory over a longer period. Since creating an enduring link between an object and a label is an important part of word learning, the current study not only reassessed children's novel object-label pairings at the end of the initial test session, but also again during a second test session that was scheduled approximately a week after the initial exposure to the novel label-object combinations.
In this way, then, the current study also aimed to shed light on whether word learning actually occurs in selective trust word learning paradigms.

Experiment 1
Experiment 1 assessed the independent effects of speaker certainty and benevolence in a selective trust word learning paradigm and considered children's memory of these novel label-object pairings. In order to ensure that none of the participating children had intrinsic problems in acquiring vocabulary items, a standardised receptive vocabulary task was also included (Peabody Picture Vocabulary Test-III-NL; Schlichting, 2005). This test was used to exclude children with very low receptive vocabulary scores (i.e., scores ⩽ 70) from analysis.

Method
Participants
Thirty-two Dutch children participated in this study, but one child was excluded given a low score on the receptive vocabulary measure. This left 31 children (15 boys) between four and six years old (M = 5;3) for analysis.
Children were recruited from three different classes of one elementary school in De Bilt in the Netherlands. No formal data were collected regarding the socioeconomic status and the race/ethnicity of the participants, but the majority of the children in the sample were white and, given the socioeconomic characteristics of the school neighbourhood, the children attending the school were most likely to come from a lower-middle-class or middle-class background.

Procedure
Each child was tested in a separate room in the school. All children participated in two sessions, each lasting around 20 minutes and separated by approximately one week. One adult (the experimenter) was present in both sessions. The first session consisted of a selective word learning paradigm in which two speakers (hand puppets) used novel labels to describe various objects. At the end of the first session, children's ability to remember the novel label-object pairing was assessed in the first set of retention trials. In the second session, children were administered the second set of retention trials and their general Dutch receptive vocabulary was measured. After the experiment, the children were rewarded with stickers.

Materials and design
Selective word learning. At the start of the experiment, children were introduced to two puppets: Groentje (Greeny) and Blauwtje (Bluey), two identical monkeys, one with a green scarf and one with a blue scarf, and told that these puppets would provide names for various objects. For each trial, the puppets had one different object in front of them which they named using the same label. The child then had to decide to which of the two objects that particular label referred by pointing to one of them. In the practice phase, children received four trials consisting of known objects and labels to familiarise them with the task. While the experimenter was present, both puppets were seated next to each other, facing the child. Each puppet had a different familiar object in front of him and labelled this with the same familiar label (e.g., Dit is een bal. 'This is a ball.', which was an accurate labelling event for one of the puppets, but inaccurate for the other). Each puppet used the correct label for their object on half of the practice trials (i.e., each puppet engaged in two correct and two incorrect labelling events). The practice phase thus familiarised children with the task, but did not differentiate between the puppets' levels of reliability. After the puppets had labelled their objects on each trial, they disappeared, but the objects remained in the same place. The experimenter asked the child to point at the object to which the label referred, by asking: Welke was de X? 'Which one was the X?'. All children performed correctly on each of these practice items (a score of 100%), so all data from the experimental trials were retained for further analysis.
In the test phase, both puppets had a novel object in front of them (16 novel objects were created for this study) and the puppets used one of eight novel labels to refer to the objects. The eight novel labels used in this study were mit, klek, teg, glap, wop, prok, raf, and brim. Each of these words is a non-existent but phonotactically possible word in Dutch. After having labelled their objects, the puppets disappeared, but the experimenter remained visible during the entire experiment. The objects remained in the same place and the experimenter asked: Welke was de X? 'Which one was the X?' The children were then required to point to one of the objects. If they did not respond, the experimenter repeated the question, after which all of the children responded.
Children were randomly assigned to one of two experimental conditions: the speaker certainty condition (16 children) and the speaker benevolence condition (15 children), consisting of eight trials each.
In the speaker certainty condition, the puppets displayed certainty or uncertainty in their naming of the novel object. Level of speaker certainty was expressed by statements concerning the puppet's familiarity with the object and use of the mental state verb know or think. Certainty statements like: Ik heb dit al eerder gezien. Kijk, dit is hoe je het oppakt. 'I have seen this before. Look, this is how you pick it up.', which the puppet said while picking up the object; and Ik heb hier vaak mee gespeeld, want ik heb het thuis ook. Ik weet dat dit een mit is. 'I have played with this a lot, because I have it at home, too. I know this is a mit.', were thus contrasted with statements indicating uncertainty: Ik heb dit nog nooit gezien. Ik weet niet hoe je het op moet pakken. 'I have never seen this before. I don't know how to pick it up.', which the puppet said while touching, but not lifting the object; and Ik heb hier nog nooit mee gespeeld, want ik heb het thuis niet. Ik denk dat dit een mit is. 'I have never played with this, because I don't have it at home. I think this is a mit.' Each puppet was certain in four test trials and uncertain in the four remaining trials. Each puppet appeared on the left side (from the child's point of view) four times and on the right side four times. After each labelling event, the experimenter asked: Welke is de [novel label]? 'Which one is the [novel label]?', and the children had to point to one of the objects. If the children did not answer, the question was repeated, after which all of the children responded.
The set-up of the speaker benevolence condition was similar to the speaker certainty condition, except that the statements consisted of simple declarative sentences (both puppets uttered Dit is een mit 'This is a mit' and pointed at the object in front of them) and that prior to the test phase the puppets had displayed nice or nasty behaviour towards the child. Nice and nasty behaviour was operationalised as follows: at the beginning of the test phase both puppets received stickers, whilst the child did not receive any. One puppet then offered to share his stickers with the child whilst the other expressed joy at the child not having any stickers. The nice puppet then proceeded to give the child some stickers. Which of the two puppets was the nice puppet was counterbalanced across participants. After the eight test trials, the child was asked which puppet she considered to be nicer to ensure that the benevolence manipulation had been successful.
In each experimental condition, two different testing orders were used and the order in which the puppets spoke was counterbalanced. Furthermore, after four trials and after eight trials, the four novel labels used in the previous trials were repeated by the experimenter (e.g., Nu hebben we een mit, een klek, een teg en een glap gezien. 'Now we have seen a mit, a klek, a teg and a glap'), so that the child heard each label twice during the first session.
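The counterbalancing constraints described for the speaker certainty condition (each puppet certain on four of the eight trials and seated on the left on four of the eight trials) can be illustrated with a short Python sketch. This is not the authors' procedure: the function name `build_schedule`, the use of a seeded shuffle, the full crossing of certainty with side, and the fixed label order are all assumptions made for illustration only.

```python
import random
from collections import Counter

# Labels and puppet names as given in the text.
LABELS = ["mit", "klek", "teg", "glap", "wop", "prok", "raf", "brim"]
PUPPETS = ("Groentje", "Blauwtje")


def build_schedule(seed=0):
    """Return eight trials balanced for certainty and side (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: certainty and side are fully crossed, so each of the four
    # (certain puppet, left puppet) combinations is used exactly twice.
    cells = [(certain, left) for certain in PUPPETS for left in PUPPETS] * 2
    rng.shuffle(cells)
    return [
        {"label": label, "certain": certain, "left": left}
        for label, (certain, left) in zip(LABELS, cells)
    ]


schedule = build_schedule()
for trial in schedule:
    print(trial)

# Each puppet should be certain four times and on the left four times.
print(Counter(trial["certain"] for trial in schedule))
print(Counter(trial["left"] for trial in schedule))
```

Crossing the two factors is only one way to meet the stated constraints; any ordering in which each puppet is certain four times and on the left four times would satisfy the description in the text.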
Retention trials. Eight retention trials were administered twice: once at the end of the first session and once in the second session, approximately one week later. In the retention trials the experimenter showed the child the same eight pairs of novel objects, one pair at a time, that the child had been exposed to initially. For each set of two, the child was then asked: Welke was de [novel label]? 'Which one was the [novel label]?'. The child's choice was considered correct if she gave the same answer that she had provided initially. If a child didn't answer, the question was repeated, after which all of the children responded. There was no specific protocol for the experimenter regarding the side on which the objects had to be placed; objects could thus be placed on the same side as they had previously been placed or on the other side.

Results and Discussion
Table 1 shows the descriptive statistics for the participants in both conditions.
Note to Table 1. a = score denotes the number of times the child followed the naming behaviour of the certain speaker; b = score denotes the number of times the child followed the naming behaviour of the nice speaker; maximum score is 8 for all variables except age.
We expected the children to choose the object of the positive informant in both conditions (i.e., the certain speaker and the nice speaker). The data was analysed by means of a mixed-effects logistic regression model, with 'positive informant' as a binomial dependent variable. The model included two crossed random effects, 'participant' (the effect of the particular child that participated) and 'trial' (the effect of the trial, whether it was the first, second, third, fourth, fifth, sixth, seventh, or final trial for the child), and a fixed effect of 'condition' (the effect of whether the child participated in the speaker certainty or in the speaker benevolence condition). The model was fitted without an intercept for 'condition' in order to draw conclusions about differences between participants' choice of positive vs. negative informants per experiment. There was a significant effect of condition (F(2,246) = 8.22, p < .001). For children in the speaker certainty condition, the odds of choosing the positive informant (in this case the certain speaker) were 3.66 times the odds of choosing the object of the negative (uncertain) informant (which corresponds to a medium effect size; see Chen, Cohen, & Chen, 2010) (β = 1.30, p < .001). For children in the speaker benevolence condition, the odds of choosing the object of the positive speaker (in this case the nice speaker) were only 1.30 times the odds of choosing the object of the negative (nasty) informant (β = 0.26, p = .40). Although both odds ratios (OR) are higher than 1 (and thus suggest that there is a preference for both the certain and the nice speaker), the result was only significant with regard to the preference for the certain speaker. These findings thus suggest that, although children do prefer to follow the naming behaviour of a certain over an uncertain speaker, children do not preferentially follow the naming behaviour of a nice over a nasty speaker. Since age effects were not of primary relevance in this study, we considered the children as one group. However, in order to make sure that age did not play a primary role in the results, we recreated the mixed-effects logistic regression model mentioned above, but this time with the fixed effect 'age in months'.
The results showed that age was not a significant factor (F(1,246) = 0.16, p = .69, OR = 0.99, β = -0.01). In a model in which both 'condition' and 'age in months' were included as a fixed effect, the results showed a very similar pattern. There was a significant effect of condition (F(1,245) = 6.10, p = .01, OR = 2.93, β = 1.07), but age was not a significant factor (F(1,245) = 0.43, p = .52, OR = 0.98, β = -0.02). Given these results, age was not considered as a factor in further analyses in this study.
It should be noted that three of the 15 children in the speaker benevolence condition did not consider the sticker-sharing puppet to be nicer than the non-sharing puppet when they were asked who they thought was nicer. We repeated the analysis after excluding these three children and it turned out that 'condition' (speaker certainty vs. benevolence) remained statistically significant (F(1,246) = 6.02, p = .02, OR = 2.83, β = 1.04).
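The odds ratios reported above are related to the β coefficients by exponentiation, since β in a logistic model is a log-odds. The following Python sketch shows that conversion for the two condition coefficients; the helper names are hypothetical, and the implied choice probability ignores the random effects (it assumes a typical child and trial), so it is an approximation rather than a value reported in the study.

```python
import math


def odds_ratio(beta):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(beta)


def implied_probability(beta):
    """Probability of choosing the positive informant implied by a log-odds
    value, with random effects set to zero (an illustrative assumption)."""
    return 1.0 / (1.0 + math.exp(-beta))


# Coefficients as reported in the text.
reported_betas = {"speaker certainty": 1.30, "speaker benevolence": 0.26}

for condition, beta in reported_betas.items():
    print(
        condition,
        "OR =", round(odds_ratio(beta), 2),
        "p(choose positive) =", round(implied_probability(beta), 2),
    )
```

Exponentiating the rounded β of 1.30 gives roughly 3.67 rather than the reported 3.66; the small gap is presumably due to the published coefficient having been rounded to two decimals.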
To determine whether the label-object links would endure beyond the experimental trial, performance on the retention trials at the end of session 1 and in session 2 was considered. In the session 1 retention trials, children made the same label-object pairing as they had done earlier on average in 6.77 out of 8 trials. A one-sample t-test demonstrated that this was significantly above chance level (which would have been 4; t(30) = 14.22, p < .001, d = 2.55). In the session 2 retention trials, the children chose the same object as before in 5.58 out of 8 trials, which was also significantly above chance level (t(30) = 6.47, p < .001, d = 1.16), but significantly less often than they had in session 1 (t(30) = 4.67, p < .001, d = 0.84). If performance on the retention trials is considered separately for each condition, the same pattern of results emerges. That is, in both conditions children chose the same object as they had before at significantly higher than chance levels, both immediately after the experiment and a week later. Given that the novel label-object link endured for at least a week after the first exposure, the results for the retention trials suggest that word learning does occur in this paradigm.
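The effect sizes for the retention analyses are consistent with the common conversion d = t / √n for one-sample and paired t-tests, with n = 31 children (df + 1). The Python sketch below applies that conversion to the reported t values; the text does not state which formula the authors used, so this is an assumption that happens to reproduce the reported values, and the function name is hypothetical.

```python
import math


def cohens_d_from_t(t_value, n):
    """Cohen's d for a one-sample or paired t-test, computed as t / sqrt(n)."""
    return t_value / math.sqrt(n)


N_CHILDREN = 31          # t(30) implies 31 participants
CHANCE_LEVEL = 8 * 0.5   # eight two-alternative trials, so 4 correct by chance

# t values as reported in the text.
reported_t = {
    "session 1 vs chance": 14.22,
    "session 2 vs chance": 6.47,
    "session 1 vs session 2": 4.67,
}

print("chance level:", CHANCE_LEVEL)
for comparison, t_value in reported_t.items():
    print(comparison, "d =", round(cohens_d_from_t(t_value, N_CHILDREN), 2))
```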
Experiment 1 thus demonstrates that children create enduring label-object links in a selective learning paradigm, and that, although they prefer certain speakers over uncertain speakers, the label-object pairings of nice speakers are not preferred over those of nasty speakers. Whereas the finding regarding speaker certainty was expected, the finding for speaker benevolence was somewhat surprising. After all, Doebel and Koenig (2013) had demonstrated that a speaker who displayed well-intentioned, helpful behaviour towards a peer was considered to be reliable in his object labelling. Moreover, Landrum et al. (2013) showed that benevolence was an even more important factor than expertise. Although previous studies thus suggest that benevolence is a primary feature that children rely on in learning novel labels, the current study did not find support for this position. Behaving nicely towards the child on one occasion at the beginning of the test trials did not seem to be enough for the child to display trust in a speaker's testimony.

Experiment 2
Whereas Experiment 1 considered speaker certainty and benevolence independently, Experiment 2 investigated the combination of these characteristics in one speaker. It aimed to answer the question of whether children's preference for certain speakers' labels would remain equally strong if those speakers had behaved nastily towards the child prior to the labelling.

Method
Participants. Fifty-two Dutch children participated in Experiment 2. However, three children could not be tested again in session 2 and one child had an extremely low receptive vocabulary score. These four children were excluded from analysis, leaving 48 children (23 boys) between four and six years old (M = 5;5) for analysis. Children were recruited from four different classes of two elementary schools in the Netherlands (in De Bilt and Nijmegen) and came from a similar background to the children in Experiment 1. None of the children in Experiment 2 had participated in Experiment 1.

Procedure, materials, and design
The procedure, materials, and design of Experiment 2 were very similar to those of Experiment 1. As the practice phase, the speaker benevolence manipulation, and the retention trials were identical to those of Experiment 1, only the test phase of the selective word learning task is described here.
Selective word learning. Prior to the test phase, children went through a practice phase and their liking of the speakers was manipulated (see Experiment 1). In the test phase each child was presented with 12 trials, in which both the nice and the nasty puppet named an unfamiliar object using statements that indicated their level of certainty ("I have seen this before. I play with it often, because I have it at home. I know this is a mit." vs. "I have not seen this before. I have never played with this, because I don't have it at home. I think this is a mit."). The twelve novel labels used in this study were: mit, klek, teg, glap, wop, prok, raf, and brim, as in Experiment 1, plus four additional words: hast, virg, tork, and nelf. Each of these words is a non-existent but phonotactically possible word in Dutch.
For each individual child, speaker benevolence was held constant (i.e., either Bluey or Greeny was nice), but speaker certainty varied by trial (on half of the trials Bluey was certain, on the other half Greeny was). This experimental design was chosen as it is plausible for a puppet to be certain on one trial but uncertain on the next, whereas this is not the case for benevolence. That is, if the puppets had changed their behaviour towards the child on each trial (with one puppet kindly sharing stickers one moment but expressing joy that the child had not received any stickers on the next trial), they could not really be considered to be truly nice or nasty in nature.
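The within-child design can be illustrated with a short sketch that builds one child's 12 test trials: benevolence is fixed for the child, while the certain speaker varies by trial, so that the nice puppet is the certain one on six trials and the nasty puppet on the other six. The function `build_trials`, the dictionary keys, and the random assignment of labels to trials are illustrative assumptions; the text does not specify how labels were assigned to trials.

```python
import random

PUPPETS = ("Bluey", "Greeny")
LABELS = ["mit", "klek", "teg", "glap", "wop", "prok",
          "raf", "brim", "hast", "virg", "tork", "nelf"]


def build_trials(nice_puppet, seed=0):
    """Build 12 trials for one child (hypothetical helper, not the authors' script).

    The nice puppet is the same on every trial; each puppet is the
    certain speaker on exactly half of the trials.
    """
    if nice_puppet not in PUPPETS:
        raise ValueError(f"unknown puppet: {nice_puppet}")
    rng = random.Random(seed)
    labels = LABELS[:]
    rng.shuffle(labels)  # assumption: labels are assigned to trials at random
    certain = [PUPPETS[0]] * 6 + [PUPPETS[1]] * 6
    rng.shuffle(certain)  # mix the trials in which each puppet is certain
    return [
        {
            "label": label,
            "nice": nice_puppet,
            "certain": certain_puppet,
            "nice_is_certain": certain_puppet == nice_puppet,
        }
        for label, certain_puppet in zip(labels, certain)
    ]


trials = build_trials("Bluey", seed=1)
print(sum(t["nice_is_certain"] for t in trials), "of", len(trials),
      "trials have a nice and certain speaker")
```

Holding `nice` constant while shuffling `certain` mirrors the rationale given above: certainty can plausibly change from one object to the next, whereas benevolence is treated as a stable trait of the puppet.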
This combination of speaker certainty and benevolence characteristics led to two different kinds of trials. In half of the trials, a nice and uncertain puppet was contrasted with a nasty and certain puppet (certainty VS. benevolence trials). These trials could demonstrate whether children preferentially trust certain speakers, even if they are nasty. In the other six trials, a nice and certain puppet was contrasted with a nasty and uncertain puppet (certainty PLUS benevolence trials). A comparison of the two types of trials (certainty VS. benevolence and certainty PLUS benevolence) can demonstrate whether the child prefers the labelling of a nice and certain puppet over that of a nasty and certain puppet (i.e., whether she follows the labelling of the certain puppet more often in the certainty PLUS benevolence trials than in the certainty VS. benevolence trials). This comparison thus shows whether benevolence influences the child's selective learning when the level of speaker certainty is made explicit but does not distinguish the speakers. Table 2 shows a summary of the trial types (note that the two trial types were mixed and the order in which the two puppets spoke was counterbalanced).

Results and Discussion
Table 3 shows the descriptive statistics for Experiment 2. Experiment 2 investigated, first, whether the testimony of a nice and uncertain puppet is preferred over the testimony of a nasty and certain puppet (based on the certainty VS. benevolence trials). This comparison demonstrates which of the two cues, benevolence or certainty, children consider to be more important. Second, we considered whether there is a preference for the testimony of the nice and certain speaker over that of the nasty and certain speaker (based on the outcomes of the certainty VS. benevolence trials as compared to the certainty PLUS benevolence trials).
This latter part of the investigation entails that information is compared across different trial types. We therefore cannot obtain a 'pure' measure of children's preference for nice and certain over nasty and certain speakers, because children are faced with different contrasts when they have to make their choice (see Table 2). This set-up means that the data in Experiment 2 are structured differently from the data in Experiment 1, which precludes the use of the mixed models analysis that was applied to the data from Experiment 1 (Eddington, 2015). Therefore, paired samples t-tests were used for the analysis of Experiment 2 instead. The results of Experiment 2 showed that children followed the labelling of the nasty and certain puppet more often than that of the nice and uncertain puppet (M = 4.21 vs. M = 1.79), a difference that was statistically significant (t(47) = 5.16, p < .001, d = 0.74). Excluding children who did not consider the nice puppet to be nicer does not change this pattern of results. Note, though, that 21 out of 48 children did not consider the nice puppet to be nicer, which is a surprisingly large number given that the benevolence manipulation had been successful in 12 out of 15 children in Experiment 1. We will return to this issue in the 'General Discussion', but for now we note that the sticker sharing was a very short, single event and, as such, may not have been enough for children to consider the puppet to be nice or nasty in general. Speaker certainty was thus found to trump benevolence: even if a speaker is nasty to the child prior to labelling, children still prefer to learn from them if they are certain about their labelling.
However, benevolence may still play a role in selective word learning if certain and nice speakers are preferred over certain and nasty speakers. The number of times a child followed the testimony of a nice and certain puppet (from the certainty PLUS benevolence trials) was thus compared with the number of times she followed a nasty and certain puppet (from the certainty VS. benevolence trials). A paired samples t-test demonstrated that the certain and nice puppet was followed significantly more often than the certain and nasty puppet (M = 4.54 vs. M = 4.21, t(47) = 2.37, p = .02, d = 0.34). If the level of explicitly marked speaker certainty is equal, children thus do prefer to learn from a nice rather than a nasty speaker. This interpretation is strengthened if the results of children who considered the nice puppet to be nicer are considered separately from those of children who did not like the nice puppet more. The 27 children who considered the sticker-sharing puppet to be nicer followed the nice and certain puppet's advice significantly more often than they followed the advice of the certain and nasty puppet (M = 4.81 vs. M = 4.41, t(26) = 2.66, p = .01, d = 0.51), whereas this was not the case for the 21 children who did not prefer the sticker-sharing puppet (M = 4.19 vs. M = 3.95, t(20) = 0.93, p = .37, d = 0.20). This suggests that, if children like a particular speaker, they follow that speaker's labelling, but only if labelling certainty is also displayed explicitly.
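The paired comparison described above can be sketched as follows. Each child contributes two scores out of 6: the number of times she followed the certain puppet when it was nice and when it was nasty. The function `paired_t` is a hypothetical helper written with the standard library only, the four score pairs are invented toy data (not the study's data), and computing d as the mean difference divided by the standard deviation of the differences is our assumption.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired-samples t-test on per-child scores; returns (t, df, d)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("each child needs one score per trial type")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff  # assumed definition of Cohen's d for paired data
    return t, n - 1, d


# Invented toy scores for four children (maximum score = 6 per trial type).
followed_nice_certain = [5, 4, 6, 5]
followed_nasty_certain = [4, 4, 5, 3]

t, df, d = paired_t(followed_nice_certain, followed_nasty_certain)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Because every child sees both trial types, the test operates on within-child differences, which is why the comparison remains possible even though the two scores come from different contrasts.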
In order to investigate children's ability to remember the label-object pairings, performance on the retention trials was also considered (see Table 3). In the session 1 retention trials, the children chose the same object as they had before in 9.88 out of 12 trials, which is significantly above chance level (which would have been 6) (t(47) = 15.17, p < .001, d = 2.19). In the session 2 retention trials, the children chose the same object on 8.94 out of 12 trials, which was also significantly above chance (t(47) = 10.87, p < .001, d = 1.57), but significantly lower than one week earlier (t(47) = 3.86, p < .001, d = 0.56). In line with the results of Experiment 1, this outcome thus demonstrates that children remember many labels even after one week has passed, lending support to the idea that children create lasting lexical-semantic representations in this paradigm.
Note (Table 3). a = score denotes the number of times the child followed the naming behaviour of the certain and nasty puppet, maximum score = 6; b = score denotes the number of times the child followed the naming behaviour of the certain and nice puppet, maximum score = 6; maximum score retention trials 1 and 2 = 12.

General Discussion
This study investigated how speaker certainty and speaker benevolence, both independently and in combination, influence children's willingness to acquire novel labels in a selective trust word learning paradigm. In line with previous research (Bergstra et al., 2013; Sabbagh & Baldwin, 2001; Sabbagh et al., 2003), speaker certainty was taken to be a reliable cue, with children significantly preferring the labelling of a certain speaker over that of an uncertain speaker. However, in contrast with what was expected for speaker benevolence, given the findings by Doebel and Koenig (2013) and Landrum et al. (2013) that children prefer to go with the labelling of speakers who are described as being well intentioned, this study did not find speaker benevolence to play an independent role in selective word learning. That is, children were not more inclined to learn a novel label-object pairing from a speaker who previously had behaved nicely towards them than from a speaker who had behaved nastily. Furthermore, when children had to choose between, on the one hand, the labelling of a nice and uncertain speaker and, on the other, a nasty and certain speaker, the speaker certainty cue trumped the speaker benevolence cue.
Children thus preferred the labelling of the certain speaker, even if this speaker had behaved nastily to them at the beginning of the experiment. These findings thus suggest that speaker certainty, as a rational cue in the word learning domain, trumped the emotional speaker benevolence cue. Interestingly though, speaker benevolence is not wholly disregarded by the child, as children were found to prefer the labelling of a nice speaker over that of a nasty speaker if they both expressed certainty about their labelling. Children thus go with the certain speaker regardless of benevolence, but they do prefer nice and certain speakers over nasty and certain speakers. We can assume, then, that children do show a preference towards nice speakers to some extent, but that, by itself, this cue was not strong enough to guide them.
Given that previous studies have found emotional cues to be strong enough to influence selective learning (with children preferentially learning from attractive, physically able, non-obese individuals, as in Bascandziev and Harris, 2014, 2016, and Jaffer and Ma, 2015), our findings raise the question why speaker benevolence did not have a stronger effect on children's word learning. Perhaps the benevolence manipulation in our experiments was not strong enough for it to be a sufficient cue to go on by itself. We assumed that children would think well of a speaker who shared stickers with them and badly of a speaker who refused to do so, but it may be the case that children simply do not consider a failure to share stickers with others as particularly unkind behaviour. They may not be inclined to share their own stickers with others either and thus may not consider lack of sticker sharing to be particularly unreasonable. Evidence in favour of this idea comes from Experiment 2. Even though 12 out of 15 participants considered the sticker-sharing puppet to be the nicer puppet in Experiment 1, in Experiment 2 only 27 out of the 48 children said they liked the sticker-sharing puppet more. The sticker-sharing memory thus seems to have still been present at the end of Experiment 1 (allowing children to correctly identify the nicer speaker, even though the act of sticker sharing may not have evoked enough goodwill to actually influence selective trust), but may have waned for many children by the end of Experiment 2, which was both longer and more complex (consisting of more trials and requiring more bits of information to be processed) than Experiment 1.
Furthermore, for our benevolence manipulation to be effective, the child has to remember the sticker-sharing event explicitly, whereas this was not a requirement in other studies that have shown the influence of emotional cues (factors like a speaker's attractiveness and level of obesity are constantly visible to the child and do not rely on memory). Given that the benevolence cue thus had to be explicitly remembered on the basis of information provided only at the beginning of the experiment, whereas the speaker certainty information was provided in each trial, the benevolence cue may simply not have been strong enough or active enough to influence the child's labelling choices. At any rate, future research should consider a more salient means of manipulating speaker benevolence or a paradigm in which the benevolence information is cued in each trial. With a stronger manipulation, benevolence may turn out to be a more important factor in selective learning than the outcome of this study suggests.
Although this caveat should be kept in mind, the results of the current study suggest that children give more weight to cues pertaining to speaker competence (speaker certainty) than to cues pertaining to speaker benevolence. Children thus seem to prefer the rational speaker competence cue over the emotional benevolence cue. Although some previous studies have found the opposite pattern (Landrum et al., 2013), it should be noted that prioritising competence cues would seem to be the better strategy in learning novel words, at least as regards these specific instantiations of rational vs. emotional cues. After all, speakers who express certainty provide explicit information regarding their convictions about their labelling accuracy. The child may or may not believe these convictions to be accurate, but at least these speakers are being explicit about the likelihood with which they consider their labels to be correct. The nice speakers, on the other hand, did not provide any specific information in this respect. The only thing the child had to go on in these cases was an epistemically unfounded bias towards nice speakers. This might be enough for children if the benevolence manipulation is strong, but if it is relatively weak, as it seemed to be in the current study, benevolence cues may be largely ignored in favour of speaker competence cues.
However, what the findings of the current study do not make clear is the extent to which children have an explicit understanding of the different epistemic status of rational vs. emotional cues. That is, the current study cannot determine whether children preferentially learn from certain speakers over nice speakers because they realise that speaker certainty is in principle a more reliable cue than benevolence in the domain of word learning, or whether certainty was simply a more salient cue than benevolence in the set-up of the current study (see also Heyes, 2017, for a similar argument relating to developmental studies investigating selective trust). It should also be noted that, in theory at least, it is not even necessarily clear that the emotional cue used here is not based on some kind of rational thought process. That is, although there are no clear epistemic grounds on which to assume that nice speakers are more likely to know what a particular novel object is called as compared to nasty speakers, one could come to prefer the nice speaker's label-object pairing on the basis of reasoning. The child may assume that a nice speaker is less likely to wilfully mislead her by providing a label that they know to be incorrect (although they may be incorrect for other reasons). This would thus entail that the chances of learning the correct label may well be higher if the nice speaker is preferred over the nasty one. The results of the current study do not suggest that this is what is happening, as children do not prefer the label-object pairings of the nice speaker over those of the nasty speaker, even if that is the only information they have to go on. However, future studies that do find clearer evidence for reliance on emotional cues should investigate whether this is based on an epistemically unjustified halo-effect or on some form of rational reasoning.
In addition to investigating children's selective trust, this study also aimed to determine whether the label-object links that children learn in this sort of experiment are enduring. Do children actually store these novel label-object pairings in their lexicon or are these pairings forgotten immediately after their first exposure? While Sabbagh and Shafman (2009) showed that children do not learn words from ignorant speakers, our study showed that once children have created a label-object link it does endure for at least a week, regardless of the particular characteristics of the speakers who were guiding them. That is, in Experiment 1 children retained their initial label-object link for a week even when the only information they were provided with came from nice vs. nasty speakers, a cue that, in itself, did not influence children's choices. The outcomes of this study therefore suggest that children do retain label-object links in this paradigm, as in both experiments children remembered which object they had paired a particular novel label with a week earlier, and that they do so even if they are not clearly placing their trust in one specific speaker. Once children have decided on a specific label-object pairing, they thus stick with it for at least a week, even if they have no clear assurances that this pairing is indeed correct.
In conclusion, then, this study demonstrates that enduring label-object links are made in selective trust word learning paradigms and that children preferentially learn from certain speakers over uncertain speakers. Speaker benevolence, on the other hand, has a more subtle role to play in word learning: in itself it is not used as a word learning cue, but it can tip the balance in favour of one speaker over another if combined with rational cues to labelling accuracy.