The effects of exposure and explicit stereotypes on veracity judgments of Polish-accented English speech: A preregistered close replication and extension of Boduch-Grabka & Lev-Ari (2021)

Boduch-Grabka and Lev-Ari (2021) showed that so-called "native" British-English speakers judged statements produced by Polish-accented English speakers as less likely to be true than statements produced by "native" speakers and that prior exposure to Polish-accented English speech modulates this effect. Given the real-world consequences of this study, as well as our commitment to assessing and mitigating linguistic biases, we conducted a close replication, extending the work by collecting additional information about participants' explicit biases towards Polish migrants in the UK. We did not reproduce the original pattern of results, observing no effect of speaker accent or exposure on comprehension or veracity. In addition, the measure of explicit bias did not predict differential veracity ratings for Polish- and British-accented speech. Although the current pattern of results differs from that of the original study, our finding that neither comprehension nor veracity was impacted by accent or exposure condition is not inconsistent with the Boduch-Grabka and Lev-Ari (2021) processing difficulty account of the accent-based veracity judgment effect. We explore possible explanations for the lack of replication and future directions for this work.


Introduction
A number of studies have demonstrated that so-called "native" and "non-native" speakers judge statements produced by "foreign-accented" speakers as less likely to be true than those produced by "native speakers" (e.g., Boduch-Grabka & Lev-Ari, 2021; Hanzlíková & Skarnitzl, 2017; Lev-Ari & Keysar, 2010). Lev-Ari and Keysar (2010) further found that raising listeners' awareness to the (presumed) source of difficulty by asking them to rate the speech for understandability partially modulated the negative effect, leading the authors to conclude that these judgments of reduced veracity of "non-native" speech can be attributed to processing difficulty. Other studies have similarly investigated listeners' judgments of the veracity of statements produced by "native" and "non-native" speakers, but with mixed results. Some have observed no reduction in veracity judgments for "non-native" speech (e.g., Souza & Markman, 2013; Wetzel & Gygax, 2021), even when their listeners show evidence of holding stereotypes about the credibility of "non-native" speakers in a separate task (e.g., Stocker, 2017). Importantly, there is debate in this literature concerning whether observed reductions in veracity judgments for "non-native" speech are attributable to processing difficulty, accent-based prejudice, or both (e.g., Castillo et al., 2014). These studies have varied along many dimensions, including geographic and national context, languages, speech content, and the type of rating task, leading to crucial questions about the "nature, repeatability, and generalizability" of the effect (Liu et al., 2023).

Study to be replicated
Building on this growing line of research, Boduch-Grabka and Lev-Ari (2021) sought to further isolate the impact of processing difficulty on veracity judgments by easing this difficulty through exposure to "native" or "non-native" accents prior to the veracity judgment task. Studies have shown that comprehension of unfamiliarly accented speech improves following even a few seconds of exposure (e.g., Clarke & Garrett, 2004), and Boduch-Grabka and Lev-Ari (2021) hypothesized that if the "[veracity judgment] effect is at least partly due to difficulty of processing the speech, then improving listeners' ability to understand the speech should reduce their tendency to find the speech less credible" (p. 5). In their study, 220 "native speakers" of English were randomly assigned to "British" (i.e., "native" British English speech) or "Polish" (i.e., "Polish-accented" English speech) exposure conditions. During the exposure phase, they listened to eight brief stories in English (M = 179 words), each read aloud by a different "British" or "Polish" speaker, according to exposure condition. Next, participants listened to one of two counterbalanced lists of 50 trivia statements, half produced by eight "Polish" speakers and the other half by six "British" speakers; half of the statements were true, and half were false. Participants judged each statement on a 100-point continuous false-true scale. Finally, all participants heard eight sentences extracted from the exposure phase stories produced by the "Polish" speakers and transcribed them. They found that (1) participants in the "British" exposure condition judged "British-accented" trivia statements as more true than those produced by "Polish" speakers; (2) there was an interaction between exposure and trivia statement accent such that Polish-accented exposure reduced the difference in veracity judgments of British-accented versus Polish-accented speech; (3) "Polish" exposure participants judged Polish-accented trivia statements as more true than did "British" exposure participants; (4) "Polish" exposure participants were more accurate than "British" exposure participants at transcribing the Polish-accented sentences in the comprehension task; and (5) a mediation test demonstrated that once comprehension score was taken into account, the effect of exposure condition was no longer significant, a finding the authors interpret as indicating that "the reason that the exposure to Polish accent increased belief in statements delivered in Polish-accented speech is because it improved participants' comprehension of the accent" (p. 9). Boduch-Grabka and Lev-Ari (2021) concluded that the prior exposure to "Polish-accented" speech led to more accurate comprehension for participants in the "Polish" exposure condition, which in turn moderated the negative effect of "non-native" speech on veracity judgments.

The current study
The current study advances the authors' commitment to interrogating and moderating the socio-cognitive biases that underpin responses to language and language users (please see the author positionality statement at https://osf.io/etgc4). Research on the perceived veracity of speech produced by speakers from various backgrounds has important implications for the fields of applied linguistics, criminal justice, psychology, and many others. If, as demonstrated by Boduch-Grabka and Lev-Ari's (2021) study, accent-based veracity effects can be mitigated via minimal interventions (in this case, by brief prior experience with the "accented" speech), replication studies should be conducted to confirm and expand upon the findings. The Boduch-Grabka and Lev-Ari (2021) study is thus an excellent candidate for independent replication, and the current study allows us to determine the replicability of the Boduch-Grabka and Lev-Ari (2021) findings.
The preregistration for this replication study can be found at: https://osf.io/ry8hm, and study materials, data, and analysis code are available at: https://osf.io/etgc4. This is a "close replication" of Boduch-Grabka and Lev-Ari (2021) following the characterization provided by Porte and McManus (2019, starting p. 72). A significant modification to Boduch-Grabka and Lev-Ari's (2021) study procedures is the introduction of a new continuous participant background variable: an index of explicit bias towards Polish migrants in the UK (adapting the explicit bias task of Babel and Russell, 2015). Given the mixed results and relatively small effect sizes associated with earlier studies, along with the debate over the source of the veracity effects that have been found, we hypothesized that this variable may help to clarify the factors that control the effect size, source, and generalizability of the findings. The addition of the explicit bias variable allows us to examine the independent contributions of processing difficulty and accent-based prejudice on veracity judgments. As suggested by Boduch-Grabka and Lev-Ari (2021), "[i]t is quite possible that participants' lower belief in the Polish-accented statements was due to both prejudice and processing difficulty" (p. 11); here we ask whether we can account for additional variance in the veracity judgment data by adding a measure of accent-based prejudice.

Terminology
We have placed quotation marks around the terms "native" and "non-native" when referencing their use in previous studies to draw attention to the problematic nature of these terms. Cheng et al. (2021) note that "simply reporting that 'native speakers' participated or recorded stimuli clearly does not provide information adequate for replication" (p. 7) and that unacknowledged differences in the definitions of the term "native" and "non-native" among researchers may even be partially responsible for low rates of replication in psychology and related fields. We affirm the problematic nature of such essentializing terms, which lead to vagueness and potential harm in psycholinguistics research. Thus, an additional change from the original study is that we collected and reported richly detailed information from our study participants about their language backgrounds using instruments recommended by Cheng et al. (2021).
We additionally acknowledge that the characterization of study materials as representing "British accent" and "Polish accent" is similarly problematic: Is the English produced by Polish migrants in Britain not also a variety of British English? How were the individuals who represented these accents selected for the original study's materials? Because we employ the very same speech materials as the original study and do not know the precise criteria that were used to select the speakers, we reproduce to some extent the vagueness and potential harm associated with these labels. However, to be more precise about what we assume to be the relevant properties of the speech samples and to acknowledge that all materials are produced by speakers of English, we use the terms "Polish-accented English" and "British-accented English" when referring to speech materials, and "Polish-accented" and "British-accented" to refer to the speakers who produced the materials.

Method
Participants
Participants (N = 222, following Boduch-Grabka and Lev-Ari's (2021) sample size of 220) were recruited via Prolific.com and were paid the equivalent of 20 USD/hour for completing the ~28-minute study. In the original study, "[p]articipants were first screened for native language and having no Polish friends or family members" (p. 6). On the assumption that "native language" refers to British English, we used the following Prolific screening criteria to recruit the sample of 222: country of birth, nationality, and current location were identified as the United Kingdom; current UK area of residence was England; and first language, earliest language in life, and primary language were English (these are the demographic category options provided by Prolific that were best-suited to the goal of recruiting "native speakers" of British English). We used a post-experiment questionnaire administered via Qualtrics.com to collect additional information about these participants and confirmed that none identified "Polish people" as their predominant social group. Two participants reported problems when registering their responses (one was unable to type responses during the comprehension task, and the other encountered an error while completing the participant questionnaire), and four participants did not meet the original study's criterion of achieving 6/8 accuracy on the attention-check questions ("[t]o be included, participants had to respond correctly to at least six of the eight [attention check] questions" (p. 7); see below for attention-check task details). All 216 remaining participants reported that they consider English to be (one of) their native language(s). Many reported having studied and/or being familiar with one or more additional languages: Ancient Greek (n = 1), Arabic (n = 6), British Sign Language (n = 5), Cantonese (n = 2), Danish (n = 1), Dutch (n = 7), Egyptian (n = 1), French (n = 93), German (n = 58), Greek (n = 3), Gujarati (n = 1), Hindi (n = 2), Icelandic (n = 1), Igbo (n = 1), Irish (n = 1), Italian (n = 17), Japanese (n = 11), Korean (n = 5), Ladino (n = 1), Latin (n = 13), Lingala (n = 1), Mandarin (n = 1), Polish (n = 4), Portuguese (n = 3), Russian (n = 5), Spanish (n = 57), Swedish (n = 2), Tagalog (n = 1), Turkish (n = 1), Twi (n = 1), Urdu (n = 3), Welsh (n = 6), and Yoruba (n = 2). Their Prolific profiles indicated that they came from the following regions within the UK: East Midlands (n = 20), East of England (n = 28), London (n = 29), North East (n = 12), North West (n = 29), South East (n = 27), South West (n = 22), West Midlands (n = 26), and Yorkshire and the Humber (n = 22). One participant met the location criterion but did not give consent to report the specific region. A fuller report of participant characteristics can be found in the OSF repository. Table 1 summarizes notable recruitment and participant differences between the original and current studies.

Materials
All audio materials used in the present study were those used by Boduch-Grabka and Lev-Ari (2021) and were retrieved from their Open Science Framework repository (https://osf.io/a2jcw/). Additional materials, including the attention-check task questions, some experiment instructions, and information about which sentences to extract from the exposure audio files for use in the comprehension task, were secured via email communication with original study author Lev-Ari. All other materials were inferred from the Boduch-Grabka and Lev-Ari (2021) article (e.g., some experiment instructions, participant background questions, and experiment presentation code) or created by the present authors (e.g., the explicit bias task statements), and can be retrieved at https://osf.io/etgc4/.

Table 1 (partial). Notable recruitment and participant differences between the original and current studies
• Current study: 121 participants identified as female, 92 as male, 1 as "neither of these", and 2 selected "I prefer not to say"; age range: 19-87, M = 40.0, SD = 13.02.
• Original study: Country of birth, nationality, current location, and current residence not specified.
• Current study: Country of birth, nationality, and current location were the UK, and current residence was England.
• Current study: Identified "British people" or "Other" (not "Polish people") as the predominant ethnic composition of their social group, including friends, family, and co-workers.

Procedure
All data reported here were collected online during November 2022. Participants were required to use a laptop or desktop computer with a keyboard and headphones to participate in the online experiment (developed using PsychoPy (Peirce et al., 2019), and hosted online via Pavlovia.org) and the accompanying post-experiment questionnaire and explicit bias task (hosted online via Qualtrics.com). Upon consenting to participate and completing a four-question sound check task (multiple-choice auditory word identification administered via Qualtrics.com) with 100% accuracy, participants were randomly assigned to one of two exposure conditions (British-accented exposure or Polish-accented exposure) and to one of two counterbalancing list conditions (List 1 or List 2) and were automatically redirected to the experiment's tasks.

Exposure phase
Participants listened to eight randomly ordered audios of paragraph-length statements, each produced by one of eight Polish-accented English speakers or one of eight British-accented English speakers, according to the participant's exposure condition. Prior to listening to the statements, participants read the following paragraph (adapted from materials received from author Lev-Ari via email correspondence):
Police personnel are often trained in how to better understand and evaluate statements made by victims and witnesses. We are studying how members of the general public understand and evaluate such statements. We would therefore ask you to listen to both police-related and neutral statements and ask you questions about them. According to recent statistics, Poles are the biggest non-UK-born population in the UK with around 853,000 Poles residing in the UK in 2016. For this reason, the recordings you will listen to might include several Polish speakers.
After hearing each passage, participants responded to a multiple-choice listening comprehension test item to confirm that they were paying attention, for a total of eight attention-check items (also received from author Lev-Ari via email correspondence).

Veracity judgment task
Next, participants judged the veracity of 50 randomly ordered trivia statements (half of which were true and half of which were false), each of which was produced by one of eight Polish-accented English speakers or one of six British-accented English speakers.
Half of the statements were Polish-accented and the other half British-accented, with the language background of the speakers counterbalanced across List 1 and List 2. Following each statement, participants were asked to use a FALSE-TRUE slider to indicate how likely they thought the statement to be true.There was no time limit, and participants advanced to the next item immediately upon registering a response.

Comprehension task
One sentence from each of the eight Polish-accented witness statements was extracted for presentation in the comprehension task (the identity of these sentences was provided by author Lev-Ari via email correspondence). Sentences were presented in random order, and following the presentation of each sentence, participants were asked to transcribe the sentence by typing in a text box. There was no time limit, and participants advanced to the next item immediately upon pressing the enter/return key.

Participant background questionnaire
Upon completing the comprehension task, participants were automatically redirected to a questionnaire to complete the remaining tasks. Participants were prompted to indicate their age and gender, identify the predominant ethnic composition of their social group ("Polish people", "British people", or "Other"), and list and describe their relationship to all of the languages they use, following recommendations for robust language background descriptions provided by Cheng et al. (2021).

Explicit bias task
The explicit bias task occurred at the very end of the study session so as not to interfere with the close replication of the original study. The task was adapted from one described by Babel and Russell (2015), which assessed the degree of participants' explicit stereotyped views of "Asian Canadians" and "White Canadians" in Canada. Their task involved ten statements, half of which would elicit a Strongly Disagree (1) response and half of which would elicit a Strongly Agree (7) response if participants held stereotyped views about "Asian Canadians". To create our analogous six-item explicit bias task concerning Polish migrants in the UK, we relied on scholarly literature and media reports of stereotypes about Polish migrants in the UK. Stereotypes included having a strong work ethic (Dunin-Wasowicz, n.d.), taking jobs from British workers (Rzepnikowska, 2019), being "benefits spongers" (Dunin-Wasowicz, n.d.; Portas, 2018), achieving low education levels (Portas, 2018), and having poor English-language skills and unsophisticated manners (Portas, 2018). Table 2 presents the six statements used in the explicit bias task. On completion of the explicit bias task, participants were automatically redirected to the Prolific site where they were notified that they had received compensation for their participation. Table 3 summarizes notable material and procedure comparisons between the original and current studies.

Table 2. Explicit bias task statements
Strongly Disagree (1)
• Polish migrants in England speak English as well as English people do.
• English people are more likely to take advantage of benefits and welfare services than Polish migrants.
• Polish migrants typically have more sophisticated manners than do English people.
Strongly Agree (7)
• Polish migrants take jobs away from English people.
• English people are more educated than Polish migrants.
• Polish migrants have a stronger work ethic than English people.
Note: This table presents the six explicit bias task statements, organized by the expected responses if one holds stereotyped views about Polish migrants in England.

https://doi.org/10.1017/S0272263124000123 Published online by Cambridge University Press
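The grouping of statements in Table 2 implies that some items must be reverse-scored before they can be combined into one index. The following Python sketch shows one way this could be done so that larger values always indicate more explicit bias on the 1-7 scale. The item keys are hypothetical labels, and taking the mean as the composite is an assumption made for illustration; the article does not state how the composite score was computed.

```python
from statistics import mean

# Hypothetical item keys; the grouping follows Table 2.
# A participant holding the stereotype would answer 1 (Strongly Disagree) here.
DISAGREE_KEYED = {"english_skill", "benefits", "manners"}
# A participant holding the stereotype would answer 7 (Strongly Agree) here.
AGREE_KEYED = {"jobs", "education", "work_ethic"}


def code_item(item: str, response: int) -> int:
    """Return a 1-7 score where larger values mean more explicit bias."""
    if not 1 <= response <= 7:
        raise ValueError(f"response out of range: {response}")
    if item in DISAGREE_KEYED:
        return 8 - response  # reverse-score: 1 -> 7, 7 -> 1
    if item in AGREE_KEYED:
        return response
    raise KeyError(f"unknown item: {item}")


def composite_bias(responses: dict[str, int]) -> float:
    """Assumed composite: the mean of the coded items."""
    return mean(code_item(item, r) for item, r in responses.items())
```

Under this coding, a participant who answers 4 to every statement receives a composite of 4, and a participant who gives the stereotype-consistent extreme on every statement receives 7.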

Hypotheses
The following hypotheses summarize the findings of Boduch-Grabka and Lev-Ari (2021):
• Hypothesis 1: Veracity ratings would be lower overall for Polish-accented statements than for British-accented statements.
• Hypothesis 2: We would observe an interaction of Exposure condition and Trivia Speaker condition such that the effect of Trivia Speaker would be smaller for participants in the Polish-accented Exposure condition.
• Hypothesis 3: Participants in the Polish-accented Exposure condition would assign higher veracity judgments to Polish-accented statements than would participants in the British-accented Exposure condition.
• Hypothesis 4: Participants in the Polish-accented Exposure condition would show more accurate comprehension of Polish-accented English speech samples than would those exposed to British-accented speech.
• Hypothesis 5: Comprehension accuracy would positively predict veracity judgments of Polish-accented statements (this is a precondition for the mediation analysis associated with Hypothesis 6).
• Hypothesis 6: A mediation analysis would reveal that a significant proportion of the effect of Exposure on the veracity judgments of Polish-accented statements would be due to Comprehension, and that once the effect of Comprehension was taken into account, the effect of Exposure on veracity judgments would no longer be significant.
Hypothesis 1 relates to the finding reported in Boduch-Grabka and Lev-Ari (2021) and elsewhere that veracity ratings are overall lower for "non-native"-accented statements than for "native"-accented statements. Hypotheses 2, 3, and 4 relate to the innovation in the Boduch-Grabka and Lev-Ari (2021) study investigating whether prior exposure moderates an accent-based veracity effect. Hypotheses 5 and 6 concern the mechanism behind any moderating effect of exposure. We further hypothesized that additional variance in veracity judgments would be accounted for by the inclusion of an index of Explicit Bias such that participants reporting greater belief in stereotypes about Polish migrants in the UK would exhibit lower judgments of truthfulness of Polish-accented statements.

Table 3 (partial). Notable material and procedure comparisons between the original and current studies
• Current study: Data collected in November 2022.
• Original study: "The study was conducted online" (p. 6). Current study: The study was conducted online via Qualtrics and PsychoPy/Pavlovia.
• Original study: Instructions provided to participants generally unspecified. Current study: Task instructions inferred from description of the tasks in the original article and adapted from those provided by the authors via personal correspondence.
• Original study: Duration of the study not reported. Current study: The study took approximately 28 minutes to complete.
• Original study: No explicit bias task. Current study: Explicit bias task.

Data coding and analysis procedures
Each of the eight attention check questions was scored for accuracy (1 = correct; 0 = incorrect), for a maximum possible score of 8. As indicated above, participants who did not achieve a score of at least 6 (out of 8) on this task were excluded from the analysis. As in the original study, responses to the veracity judgment items were converted to 0-100 (false-true) scores. The eight comprehension task sentences contained 60 content words. As in the original study, we counted the total number of these words that were correctly transcribed, for a maximum comprehension score of 60. We created a Python script to do this computation and counted only exact matches as correct (the original study does not specify the criterion for matches). The explicit bias scores were coded such that larger values on the 1-7 scale were associated with higher degrees of explicit bias.
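To make the exact-match criterion concrete, here is a minimal Python sketch of content-word scoring and the attention-check threshold. It is not the authors' script (that is available in the OSF repository); the function names, the lowercasing and punctuation stripping, and the rule that each typed token can match at most one target word are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (an assumed normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_transcription(content_words: list[str], transcription: str) -> int:
    """Count target content words that appear verbatim in the transcription.

    Each typed token can match at most one target word, so typing a word
    once does not earn credit for two occurrences in the target.
    """
    available = Counter(tokenize(transcription))
    score = 0
    for word in content_words:
        key = word.lower()
        if available[key] > 0:  # exact match only; misspellings earn nothing
            available[key] -= 1
            score += 1
    return score


def passes_attention_check(answers: list[int], minimum: int = 6) -> bool:
    """answers holds 1 (correct) or 0 (incorrect) for each of the eight questions."""
    return sum(answers) >= minimum
```

For a hypothetical target with the content words "officer", "stopped", "car", and "bridge", the transcription "The officer stoped the car near the bridge." would score 3 under this rule, because the misspelled "stoped" is not an exact match.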
As in the original paper, we used mixed-effects models for statistical analyses. All analyses were carried out using R's lme4 and lmerTest packages. Participants and items were specified as random factors. We report the results from models with the maximal random effects structure justified by the data, consistent with the original paper's approach. See below for more detailed information about model specifications.

Results
To test the effects of exposure and speaker accent on veracity judgements, we created a mixed-effects model with veracity ratings (0-100 scale) as the dependent variable, and with Exposure condition (British-accented exposure, Polish-accented exposure), Trivia Speaker condition (British-accented statement, Polish-accented statement), and the interaction of the two as fixed effects. The truth value of each trivia statement, trial number, and list number were included as control factors. The model reported below includes random intercepts for participants and items, and by-participant random slopes for trivia speakers. Categorical variables were sum-coded (see Footnote 1). The model showed that there was no main effect of Trivia Speaker (β = 0.41, SE = 1.279, t = 0.32, p = 0.75), no main effect of Exposure (β = 0.37, SE = 0.45, t = 0.83, p = 0.41), and no interaction between Exposure and Trivia Speaker (β = 0.21, SE = 0.29, t = 0.74, p = 0.46). Thus, we did not find the effects of Trivia Speaker, Exposure, or the interaction of the two reported by Boduch-Grabka and Lev-Ari (2021). Our own descriptive analyses of the original study's data produced the following means for veracity judgments: in the British-accented Exposure condition, […]; in the Polish-accented Exposure condition, Polish-accented Trivia Speakers = 43.77 and British-accented Speakers = 57.96. As can be seen in Figure 1, the means in all four conditions in the current study ranged from 53.20 to 54.83, showing no evidence of the Trivia Speaker and Exposure effects observed in the original study, and thus providing no support for Hypotheses 1, 2, or 3.

Footnote 1: The mixed-effect model specification of Boduch-Grabka and Lev-Ari (2021) includes the same independent variable, the same fixed effect factors, and the same random effect structure. Our specification is slightly different from theirs as we included control factors and employed sum-coded variables instead of treatment-coded variables. These modifications were to improve general model fit and interpretability, and they did not change the structure of the variables of interest. Running the analysis with the exact model specification of Boduch-Grabka and Lev-Ari (2021) returned the same results.
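Sum coding gives the fixed-effect coefficients of a balanced two-by-two design a direct reading: the intercept is the grand mean of the four cell means, and each main effect is half the difference between the two levels of that factor. The Python sketch below illustrates this on cell means alone, assuming +1/-1 codes (the values R's contr.sum assigns to a two-level factor); the article does not state which codes were used, the assignment of +1 to "British" is arbitrary, and the cell means in the usage note are hypothetical. It ignores random effects and control factors, so it is not a substitute for the mixed-effects model.

```python
def sum_coded_effects(cells: dict[tuple[str, str], float]) -> dict[str, float]:
    """Decompose 2x2 cell means into sum-coded (+1/-1) coefficients.

    Keys are (exposure, speaker) pairs with levels "British" and "Polish".
    "British" is coded +1 and "Polish" -1 (an assumed assignment).
    """
    code = {"British": 1, "Polish": -1}
    if len(cells) != 4:
        raise ValueError("expected exactly four cell means")
    n = 4
    return {
        "intercept": sum(cells.values()) / n,
        "exposure": sum(code[e] * m for (e, _), m in cells.items()) / n,
        "speaker": sum(code[s] * m for (_, s), m in cells.items()) / n,
        "interaction": sum(code[e] * code[s] * m for (e, s), m in cells.items()) / n,
    }
```

With hypothetical cell means of 60 (British exposure, British speaker), 50 (British exposure, Polish speaker), 58 (Polish exposure, British speaker), and 56 (Polish exposure, Polish speaker), the function returns an intercept of 56, an exposure coefficient of -1, a speaker coefficient of 3, and an interaction coefficient of 2. Under treatment coding, by contrast, each coefficient would be expressed relative to a reference cell rather than the grand mean.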
In a departure from the analyses conducted by the original authors, to accommodate the possibility that participants did not use the full range of veracity ratings, which could mask the effects of Trivia Speaker and Exposure, we z-scored the veracity ratings of each participant and created a mixed-effects model with the same specifications presented above. The random effects structure for this model included random intercepts for items. The results of modeling this z-scored data were the same. There was no main effect of Trivia Speaker (β = 0.01, SE = 0.05, t = 0.2, p = 0.82), no main effect of Exposure (β = -1.13e-17, SE = 0.008, t = 0, p = 1), and no interaction between Exposure and Trivia Speaker (β = -0.005, SE = 0.008, t = -0.3, p = 0.57).
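Within-participant z-scoring can be sketched in a few lines of Python. The function name and the use of the sample standard deviation are assumptions; the article does not say which standard deviation was used.

```python
from statistics import mean, stdev


def zscore_by_participant(ratings: dict[str, list[float]]) -> dict[str, list[float]]:
    """Standardize each participant's ratings against their own mean and SD."""
    standardized = {}
    for participant, values in ratings.items():
        m = mean(values)
        sd = stdev(values)  # sample SD (an assumption)
        if sd == 0:
            raise ValueError(f"{participant} gave identical ratings on every trial")
        standardized[participant] = [(v - m) / sd for v in values]
    return standardized
```

Because every participant's standardized ratings average to zero and Exposure was assigned between participants, the mean standardized rating is zero in both Exposure conditions by construction. This is consistent with the Exposure estimate of essentially zero (β = -1.13e-17) reported for the z-scored model: after this transformation only within-participant contrasts, such as Trivia Speaker and its interaction with Exposure, remain informative.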
As in the original study, we performed a linear regression with Comprehension Score as the dependent variable and Exposure condition (British-accented exposure, Polish-accented exposure) as a predictor to test if exposure to Polish-accented English speech was associated with higher comprehension scores of Polish-accented English sentences. Our own analyses of the original study's data revealed that the British- and Polish-accented exposure conditions had mean comprehension scores of 35.99 and 51.71, respectively. In contrast, with means of 50.47 (British-accented exposure) and 50.10 (Polish-accented exposure; see Figure 2), the current study showed no difference in transcription accuracy between the two Exposure conditions (β = -0.37, SE = 1.13, t = -0.33, p = 0.74), and thus no support for Hypotheses 4 or 5. Because this pattern of results did not meet the criteria for a mediation analysis, we did not conduct the subsequent mediation analyses reported in the original study and associated with Hypothesis 6.
The above analyses are based on data from 216 participants, with exclusions based only on criteria explicitly mentioned in the original article. There are, however, additional exclusionary criteria employed by researchers to ensure data quality. For an experiment conducted remotely, it is common to exclude participants who took too long to complete the task, as it suggests participants might have been distracted or experienced disruptions. In the following analyses, we excluded an additional 15 participants who took longer than 40 minutes to complete the task. In addition, because we collected rich language histories of our participants, we were able to identify and exclude data from four participants who reported prior study or other experience with the Polish language. Finally, we excluded data from nine participants who did not identify "British people" as their predominant social group. Twenty-seven participants met one or more of these additional exclusionary criteria, resulting in a data set of 189 participants. When we repeated the analyses described above on this smaller yet cleaner data set, the results with respect to the hypotheses were the same. There were no effects of Exposure or Trivia Speaker on veracity ratings. There was no effect of Exposure on comprehension scores. Given the lack of differences in the findings based on the two data sets, we conducted the remaining analyses with this smaller and cleaner data set.
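The three criteria sum to 28 exclusions (15 + 4 + 9), yet 27 participants were removed, which indicates that at least one participant met more than one criterion. A small Python sketch of this kind of filtering, in which each participant is removed once regardless of how many criteria they meet, might look as follows. The field names and the example records are hypothetical and are not drawn from the study's data.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str
    minutes: float           # total completion time in minutes
    polish_experience: bool  # prior study of or experience with Polish
    social_group: str        # "British people", "Polish people", or "Other"


def exclusion_reasons(p: Participant, max_minutes: float = 40.0) -> set[str]:
    """Return every additional exclusion criterion this participant meets."""
    reasons = set()
    if p.minutes > max_minutes:
        reasons.add("too_slow")
    if p.polish_experience:
        reasons.add("polish_experience")
    if p.social_group != "British people":
        reasons.add("social_group")
    return reasons


def split_sample(sample: list[Participant]) -> tuple[list[Participant], list[Participant]]:
    """Split into (kept, excluded); a participant is excluded once, however many criteria apply."""
    kept = [p for p in sample if not exclusion_reasons(p)]
    excluded = [p for p in sample if exclusion_reasons(p)]
    return kept, excluded
```

Counting criterion hits and counting excluded participants therefore give different totals whenever the criteria overlap, which is the pattern reported above.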
Next, we conducted (pre-registered) exploratory analyses to determine whether variance in the veracity judgment data was accounted for by the continuous index of explicit bias that was collected via the explicit bias task. To this end, we created a mixed-effects model similar to the initial model above, with the addition of Explicit Bias Score (the composite score) and its interaction terms with the other fixed effects. As with the original model, this model showed no main effect of Trivia Speaker (β = -0.55, SE = 2.33, t = -0.24, p = 0.81) and no main effect of Exposure (β = 3.05, SE = 2.83, t = 1.08, p = 0.28). The main effect of Explicit Bias Score was significant (β = -0.32, SE = 0.16, t = -2.58, p = 0.01). None of the interactions involving the three factors were significant (all p's > 0.25).
As seen in Figure 3, the main effect of the composite explicit bias score showed that the listeners' veracity judgment scores decreased as their explicit bias scores increased.However, this effect was not modulated by Trivia Speaker (top panel of Figure 3) or Exposure (bottom panel).

Discussion
We conducted a close replication of the study reported by Boduch-Grabka and Lev-Ari (2021), which demonstrated that exposure to Polish-accented English speech modulated subsequent judgments of the veracity of Polish-accented statements and further demonstrated via a mediation analysis that improved veracity ratings following exposure might be attributable to a decrease in processing difficulty. Hypotheses 1-6, detailed above, summarize the Boduch-Grabka and Lev-Ari (2021) findings; here we summarize our findings with respect to each hypothesis:
• Hypothesis 1: We found no evidence that veracity ratings were overall lower for Polish-accented statements than for British-accented statements.
• Hypothesis 2: We did not observe an interaction of Exposure condition and Trivia Speaker condition.
• Hypothesis 3: Participants in the Polish-accented Exposure condition did not assign higher veracity judgments to Polish-accented statements than did participants in the British-accented Exposure condition.
• Hypothesis 4: We observed no effect of Exposure condition on comprehension accuracy for Polish-accented sentences.
To summarize the current findings with respect to Hypotheses 1-4, we did not reproduce the effect of accent on veracity judgments that has been previously reported (Boduch-Grabka & Lev-Ari, 2021; Lev-Ari & Keysar, 2010) or the Boduch-Grabka and Lev-Ari (2021) finding that prior exposure to Polish-accented English speech improves veracity judgments of Polish-accented statements.
• Hypothesis 5: Comprehension scores did not predict veracity judgments of Polish-accented statements.
• Hypothesis 6: Because we observed no effect of Exposure condition on comprehension accuracy (Hypothesis 4), the conditions for conducting a mediation analysis were not met (Renard, 2019).
We similarly did not reproduce the effects of accent and exposure on comprehension scores (Hypotheses 5 and 6). This finding is consistent with the original authors' argument that processing difficulty, operationalized as comprehension scores, (partially) explains accent-based veracity judgment effects: if, as we observed here, there is no effect of accent on comprehension scores, then no accent-based veracity effect caused by processing difficulty is predicted.
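The logic behind Hypotheses 4-6 can be made explicit with the standard single-mediator formulation. The following is a generic textbook sketch, not the exact specification used in either study; the symbols X (Exposure condition), M (comprehension score), Y (veracity rating), and the path labels a, b, and c' are introduced here for illustration.

```latex
\begin{align*}
M_i &= a_0 + a\,X_i + e_{M,i} \\
Y_i &= c_0 + c'\,X_i + b\,M_i + e_{Y,i} \\
\text{indirect effect} &= a \cdot b
\end{align*}
```

In these terms, Hypothesis 4 concerns path a and Hypothesis 5 roughly corresponds to path b. With no reliable evidence that a differs from zero, the indirect effect a·b has no basis for being estimated as nonzero, which is why the mediation analysis was not conducted.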
In addition to conducting a close replication of the original study, we elicited participants' self-reported explicit biases relating to "British people" and "Polish migrants". Pre-registered exploratory analyses indicated that the inclusion of a composite measure of explicit bias relating to Polish people significantly improved the mixed effects models of veracity judgments for both Polish-accented and British-accented statements, suggesting that explicit bias as measured here did not contribute to an enhanced understanding of listeners' responses to Polish-accented speech in particular. It is also worth noting that a lack of relationship between listeners' explicit biases and their responses to speech accents has been observed in other studies as well (Babel & Russell, 2015; Pantos & Perkins, 2013).

Comparing the current and original studies
Why do the current results differ from those of Boduch-Grabka and Lev-Ari (2021)? It is estimated that approximately one-third to two-thirds of replication studies in the social and psychological sciences have not replicated the original study results (Camerer et al., 2018; Open Science Collaboration, 2015). Reasons for this lack of replication may be random (e.g., statistical error) or systemic (e.g., bias against publishing null results or research misconduct), and may additionally be attributable to differences in the implementation of the original and replication studies. Methodological discrepancies may result from factors such as insufficient methodological detail provided by the original researchers, original or replication researcher errors, or inherent and unavoidable differences like the time frame when data collection occurred. In Tables 1 and 3 above, we summarized notable methodological comparisons between Boduch-Grabka and Lev-Ari (2021) and the current study.
According to 2021 census data, Polish was the second-most spoken "main language" in England and Wales, unchanged from the 2011 census (Office for National Statistics, 2022). The proportion of short-term residents in England and Wales who were born in Poland decreased between the 2011 and 2021 censuses (Office for National Statistics, 2023b), while Poland retained its position as the second most frequent country of birth of long-term residents from 2011 to 2021, following a rapid change from 18th to 2nd place between 2001 and 2011 (Office for National Statistics, 2023a). These shifts in migration patterns, in addition to the social and cultural effects of Brexit and the Coronavirus pandemic, may have impacted participants' performance in unavoidable and unknown ways, such that the date of data collection, as well as the social, cultural, and political milieu, should be considered a crucial rather than extraneous variable in future studies.
Participant recruitment also differed between the original and current studies: while the original study's participants were recruited via contacts, mailing lists, and social media, the current participants were recruited via Prolific.com. In addition, Boduch-Grabka and Lev-Ari's (2021) participants were "often individuals in public-sector roles" (p. 6), though the authors did not specify the proportion of participants meeting this description. Without this information to guide recruitment for the replication study, we made no effort to influence the representation of people holding public-sector roles among our participants. Depending on the proportion of the original study's participants in these roles, and the actual effects of such roles on study performance, this difference between the studies may be responsible for differences in findings. We also do not know how the original study's participants were compensated and whether possible differences in compensation schemes affected performance on these tasks.
Additional important differences between the participant samples in the two studies may have resulted from differences in exclusionary criteria and their application. For example, the original study's authors did not detail how screening for "native language" was conducted, and we do not know whether the current screening criteria for language background and locations of birth and residence produced a participant sample with similar language and residential history profiles as the original study. They also did not specify whether participants were screened for Polish language experience; in our more limited data set of 189, we excluded four participants who reported study or knowledge of the Polish language. It is also worth noting that most of our participants reported knowledge of more than one language. Given that the original article did not report the language backgrounds of participants, we do not know whether our sample differed from theirs in this regard; however, participants in the British-accented exposure condition in the current study exhibited much higher comprehension scores than did participants in the same condition in the original study, which could be due to exposure in daily life to various accents. Indeed, exposure to multiple accents has been shown to enhance listeners' ability to comprehend accented speech (Baese-Berk, Bradlow, & Wright, 2013).
[...] accented English speakers), leading to questions about the generalizability of any effects of exposure. Without examining responses to new speakers, we do not know whether the exposure manipulation here is better characterized as exposure to Polish-accented/British-accented English speech, or as exposure to the speech of these particular groups of speakers. Future studies involving different speakers during the various tasks will help to connect this research to broader populations. Finally, this research was conducted entirely online and remotely, and participants were asked to respond to disembodied voices in a communicative setting that is particular to the research. Ultimately, a fuller understanding of these phenomena will be enhanced by studies of veracity judgments in settings more closely approximating the richness of real-world sociocultural contexts.

Conclusion
Replication studies play an important role in the research cycle, contributing additional data to support an understanding of the robustness of published research studies. We conducted a close replication and extension of the study reported in Boduch-Grabka and Lev-Ari (2021). In contrast to the original study, we did not find that British-English-speaking listeners judged statements produced by Polish-accented English speakers as less likely to be true than statements produced by British-accented English speakers. We also found no effect of prior exposure to Polish-accented English speech on veracity judgments, and no effect of exposure on comprehension of Polish-accented sentences. On the basis of these findings, we conclude that the previously observed accent-based reduction in veracity judgments for this population of listeners, and this particular set of speech materials, may not be robust to replication. Further inquiry into the nature of accent-based veracity judgment effects should consider the replicability of this and similar studies with the goal of determining the factors that influence the presence or absence of accent-based veracity judgment effects as well as the factors that mediate them.

Figure 1. Box plot of veracity ratings by Trivia Speaker condition. Note: Data from participants in the British-accented exposure condition are presented in the left panel; Polish-accented exposure condition in the right panel. Dots indicate individual participant means, and medians are presented as horizontal lines. Superimposed values show the group mean for the corresponding condition.

Figure 2. Box plot of comprehension task scores by Exposure condition. Note: Data from participants in the British-accented Exposure condition are on the left; Polish-accented Exposure condition on the right. Dots indicate individual participant means, and medians are presented as horizontal lines. Superimposed values show the group mean for the corresponding condition.

Figure 3. Effect plots of explicit bias on veracity rating. Note: The top panel shows the effect of explicit bias score on veracity rating by Trivia Speaker; the bottom shows the effect by Exposure. The shaded areas represent 95% confidence intervals.

Table 2. Explicit bias task statements

Table 3. Summary of notable methodological comparisons between Boduch-Grabka and Lev-Ari (2021) and the current study