Do more proficient writers use fewer cognates in L2? A computational approach

Bilinguals often show evidence of cross-language influences, such as facilitation in processing cognates. Here we use computational methods to analyze spontaneous English texts written by hundreds of speakers of different L1s, at different levels of English proficiency, to investigate writers' preference for using cognates over alternative word choices. We focus on English, since a majority of its lexicon is of either Romance or Germanic origin, allowing an investigation of the preference of speakers of Germanic and Romance L1s for cognates between their L1 and English. Results show that L2 writers tend to prefer English cognates, and that this tendency weakens as English proficiency increases, suggesting diminishing effects of CLI. However, a comparison of the L2 writers with native English writers shows general overuse of cognates only for the Germanic, but not the Romance, L1 speakers, most likely due to the register of argumentative writing.


Introduction
The two languages of bilinguals, who are a majority in the world today (Grosjean & Li, 2013), are not independent of each other (Kroll et al., 2014; Prior, 2014). Specifically, speakers' first language (L1) influences their second language (L2) (and vice versa, e.g., Degani et al., 2011), a phenomenon termed TRANSFER or CROSS-LANGUAGE INFLUENCE (CLI) (Jarvis & Pavlenko, 2008; Odlin, 1989; van Hell & Tanner, 2012). CLI is evident in various language domains, including phonology, morphology, lexicon and grammar, and is one of the reasons for differences between L1 and L2 speakers of the same language. In fact, these differences are so prominent that even highly advanced L2 speakers can be accurately distinguished from L1 speakers (Bergsma et al., 2012; Goldin et al., 2018; Rabinovich et al., 2016; Tomokiyo & Jones, 2001). In the current study of CLI, we focus on L2 speakers' preference for L2 words that have a cognate in the speakers' L1. Our hypothesis is that CLI, as reflected by lexical choice, is correlated with the speaker's L2 proficiency: more proficient speakers will show lower levels of CLI (Degani et al., 2022). We employ corpus-based computational methods to investigate this hypothesis.
This work focuses on cognates, a much-studied test case for CLI. Cognates are words in different languages that have similar forms and similar meanings, either due to a common ancestor in some protolanguage or via borrowing (for example, sofa in English and in Spanish). Due to this cross-language similarity, when bilinguals need to process or retrieve a word from the lexicon, cognates are more easily accessed because they are activated in both language systems (e.g., Degani et al., 2018; Dijkstra & Van Heuven, 1998, 2002). Such activation, which is non-selective with respect to language, results in facilitation effects when bilinguals learn or process cognates. Thus, bilinguals are faster to recognize and respond to cognates, relative to non-cognates, when they are presented visually (Dijkstra et al., 2010) or aurally (Woutersen et al., 1995). Cognate facilitation has also been observed using eye tracking: during text reading, bilinguals read cognates faster than non-cognates (Cop et al., 2017; Libben & Titone, 2009). Along similar lines, bilinguals are faster and less error-prone when producing cognates, whether reading or translating words out loud (de Groot, 1992; Schwartz et al., 2007; van Hell & de Groot, 1998), naming pictures (e.g., Costa et al., 2000; Hoshino & Kroll, 2008), or typing words (Muylle et al., 2022).
Here we investigate a somewhat different facet of cognate use in bilinguals, namely: do bilinguals show a preference for using cognates when they have the option to do so? In a series of studies, Prior and colleagues provide evidence to this effect. In an offline single-word translation task, Prior et al. (2007) tested moderately proficient bilingual speakers of English and Spanish, and reported that for words that had two or more plausible translations, bilinguals showed a strong preference for producing a cognate translation, if one existed. In a second study, this preference was also evident in a timed word translation task (Prior et al., 2013). When translating translation-ambiguous words (namely, words with more than a single translation; Schwieter & Prior, 2019), bilinguals once again were more likely to produce the cognate translation if one existed, and were also faster and more accurate when doing so. Finally, this tendency to prefer cognate translations was also evident in professional translators working on full-length texts (Prior et al., 2011). Thus, when bilinguals have an option in word selection, because two words in the target language are synonyms (or near-synonyms) of each other, and one of these words is a cognate of a word in the speakers' L1, bilinguals prefer to produce the cognate.
Corpus-based computational work has also identified cognate word choice as an important phenomenon shaping the language of L2 speakers. Rabinovich et al. (2018) introduced the L2-Reddit corpus: "a large corpus of highly-advanced, fluent, diverse, nonnative English, with sentence-level annotations of the native language of each author". The resulting dataset included texts written in English by authors with 31 different L1s. Next, they used WordNet and Etymological WordNet (de Melo, 2014) to automatically construct a focus list of synonym sets, such that each set included at least two synonyms with at least two different etymologies. This focus set was further revised to eliminate cultural bias. Based solely on the frequencies of the words in the focus set, and using a hierarchical clustering algorithm, Rabinovich et al. (2018) were able to reconstruct the phylogenetic language tree of the Indo-European language family. In other words, they demonstrated that speakers of different L1s tend to prefer cognates when they write in English, to the extent that it is possible to identify their L1 based mainly on the frequencies of these English words in their texts.
Thus, across both behavioral and computational approaches, there is convincing evidence that when using an L2, bilinguals demonstrate the effects of CLI in their preference for using words that are cognates with their L1. In the current study, we investigate a possible link between L2 proficiency and such cognate preferences. Specifically, we ask whether this preference for cognates is weaker in more proficient L2 speakers. Several previous studies support such a link between proficiency and cognate facilitation effects. For example, Kroll et al. (2002) examined two groups of less and more proficient bilinguals of English (L1) and French (L2) in word naming and word translation tasks. The cognate effect was significant for both groups, but was consistently more prominent for the less proficient bilinguals (see also Poarch & van Hell, 2012). Along similar lines, Rosselli et al. (2014) asked balanced and unbalanced Spanish-English bilinguals to name pictures depicting cognates and non-cognates, in Spanish and English. The balanced bilinguals demonstrated cognate effects of similar magnitude in the two languages, but the unbalanced bilinguals showed a larger cognate effect when naming in the non-dominant language than in the dominant language (for reviews, see van Hell et al., 2019; van Hell & Tanner, 2012). However, such a link between proficiency and cognate facilitation is not always evident. For example, in a study of visual word processing in bilingual speakers of Arabic and Hebrew, which do not share a script, Degani et al. (2018) again demonstrated robust cognate effects, but these were not modulated by L2 proficiency (see also Prior et al., 2017).
Importantly, these and most other related studies were conducted in a laboratory environment, with a limited number of participants, native languages, and target words. In contrast, here we use computational analysis to conduct a corpus-based study including thousands of argumentative essays, authored by hundreds of learners with four different L1s, and hundreds of target words. Sampling a more diverse population engaged in a free production task might be more sensitive for identifying a possible modulating impact of proficiency on cognate effects than lab-based comprehension/judgment tasks.
We hypothesize that the preference for cognates in lexical selection, which is clearly evident in the works reviewed here and in many others, is correlated with L2 proficiency, i.e., that it weakens as L2 proficiency increases, becoming more similar to native preferences. Ideally, we would test this hypothesis on an extensive corpus representing multiple proficiency levels of speakers from a wide range of native languages, much like the L2-Reddit corpus mentioned above (Rabinovich et al., 2018). Such corpora, however, include very little metadata, and in particular are not tagged for user proficiency. Assessing L2 proficiency on such enormous and noisy datasets using common measures, both lexical (Kyle & Crossley, 2015) and syntactic (Lu & Ai, 2015), is complex and expensive; moreover, these measures are sensitive to spurious factors such as prompt (task) and L1, which might pose a real difficulty in this regard (Lu & Ai, 2015; Weiss, 2017). Therefore, we decided to examine the hypothesis using a "cleaner" corpus, which is tagged both for users' L1 and for their level of L2 proficiency, namely the TOEFL corpus, a dataset of essays written in English by non-native English speakers wishing to enroll in English-speaking universities (Blanchard et al., 2013; Malmasi et al., 2017). Note that we define proficiency here rather technically, as scores on the TOEFL language aptitude test, without directly addressing the important question of the correspondence between standard aptitude tests and "real" language proficiency (e.g., Wisniewski, 2018).
We test our hypothesis using English texts authored by nonnative speakers of English whose L1 is of either Germanic or Romance origin. English is particularly well suited as the target language for this task, since despite its Germanic origins, English vocabulary is heavily influenced by Romance languages, mainly French. Historical linguists estimate that about 11,000 words (mainly from French and Latin) entered the English lexicon during the Middle English period (Culpeper & Clapham, 1996). As a result, there are now numerous word pairs of different etymology with approximately the same meaning, such as speed/velocity and start/commence, where the former is Germanic and the latter is Romance. The Germanic words are often associated with a lower register, while their Romance counterparts are considered higher register (Franceschi, 2019). Further, native English-speaking children learn Germanic words at an earlier age than their Latin-based counterparts (Hernandez et al., 2021). In addition, we also documented native English speakers' lexical choices among these same word pairs, as evident in a corpus of essays written by native English speakers, namely LOCNESS (Granger, 1998), and in frequency distributions in a large sample of English (COW; Schäfer, 2015).
We apply computational methods to the TOEFL texts to quantify the tendency of L2 writers to use words that share their origin with their L1. Thus, we investigate how lexical choice, as reflected by the use of cognates, is correlated with L2 proficiency on a large scale, targeting hundreds of cognates that occur in argumentative essays written by thousands of learners with several different L1s. We further compare the tendencies of L2 writers to those of English L1 writers, to test our hypothesis that with increasing proficiency, the patterns of L2 writers grow more similar to those of L1 writers.

Dataset
Bilingualism: Language and Cognition
The main corpus used in the current study is the TOEFL Corpus (Blanchard et al., 2013; Malmasi et al., 2017), a dataset of essays written by nonnative English speakers wishing to enroll in English-speaking universities. The essays were evaluated for proficiency by highly skilled annotators, and each was assigned one of three grades: low, medium, or high. The corpus consists of 12,100 essays written by speakers of 11 native languages: 1,330 graded low, 6,568 graded medium and the remaining 4,202 graded high. Each L1 is represented by 1,100 essays across all three levels. Metadata fields include the author's L1, their proficiency level, and (the index of) the prompt question for the essay (1-8).
In the current investigation, we selected from this corpus the essays of nonnative speakers of English whose L1 is of Romance or Germanic origin. The former group is represented by Italian, Spanish, and French, and the latter only by German. Table 1 shows dataset information by proficiency level and by the language and language family of the writers' L1. Evidently, the dataset is unbalanced in more than one respect. First, it includes substantially more essays written by authors with a Romance L1 than by authors with German as an L1. Also, in each L1 separately, as well as in the complete dataset, the number of essays per level is unbalanced: there are always fewer low-level essays. This imbalance is amplified in the number of words, because the low-level essays are also shorter than the medium- and high-level essays, leading to much smaller text samples for low-proficiency writers. A standard solution to this problem is down-sampling: randomly selecting a subset of each group, each the size of the smallest group. In our case, this would have resulted in extremely small samples, hampering our ability to obtain meaningful results. We opted instead to retain the unbalanced corpus.
To compare the non-native essays with similarly authored native texts, we used a comparable native-speaker corpus, LOCNESS, the Louvain Corpus of Native English Essays (Granger, 1998). We used a subset of 412 essays written by A-level native English speakers from British and American universities. The essays in this corpus are much longer than the TOEFL essays. Still, the characteristics of the two datasets are as similar as possible: the writers are of similar age, the genre is argumentative writing, and the setting is a test (see Table 2).
Although the LOCNESS dataset enables a relatively fair comparison with TOEFL, it does not necessarily reflect the frequency distribution of the target words in English in general. The corpus consists of argumentative essays, written as part of a university exam, by a relatively small group of A-level students. These settings dictate a certain writing style and choice of words. To estimate a more ecologically valid frequency distribution of the target words in English, we also examined their frequencies in a very large corpus of diverse English, namely frequency lists based on COW (COrpora from the Web), a huge collection of linguistically processed web corpora (Schäfer, 2015; Schäfer & Bildhauer, 2012). These frequency lists include the word, its part-of-speech tag, and the number of its occurrences in the corpus. While COW corpora are not claimed to be representative of any specific language variety (in fact, they are known to be biased by many factors, specifically the link structure of the Web), their sheer size makes them good candidates for reflecting "standard" language use, to the extent that such a concept exists.
Ideally, we would test our hypothesis considering each individual writer separately. However, since writers vary greatly in style (including essay length, spelling errors, etc.), such an approach runs the risk of reflecting each writer's personal style rather than their proficiency level and use of cognates. To reduce noise resulting from such confounding variables, and because each writer contributes only a small amount of text, we aggregated all same-grade essays with the same L1 language family (Romance or Germanic). For example, all low-graded essays written by German-speaking writers are analyzed as a group named "Germanic low proficiency", while all medium-graded essays written by Spanish-, Italian- or French-speaking writers are analyzed as a group named "Romance medium proficiency". Similarly, the LOCNESS essays were analyzed as a single group named "native speakers".
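The pooling step can be sketched as follows. This is a minimal illustration, assuming a hypothetical record layout (dicts with "l1", "level" and "text" keys) and three-letter L1 codes of the kind used in the TOEFL11 release; the layout, the mapping and the function name are not taken from the original study.

```python
from collections import defaultdict

# Hypothetical mapping from L1 codes to the language family used for pooling.
L1_FAMILY = {"GER": "Germanic", "FRE": "Romance", "ITA": "Romance", "SPA": "Romance"}


def group_essays(essays, native_texts=()):
    """Pool essay texts into groups such as "Germanic low proficiency".

    `essays` is an iterable of dicts with "l1", "level" and "text" keys
    (an assumed layout). Essays whose L1 is not in L1_FAMILY are skipped.
    `native_texts` (e.g., LOCNESS essays) form one "native speakers" group.
    """
    groups = defaultdict(list)
    for essay in essays:
        family = L1_FAMILY.get(essay["l1"])
        if family is None:
            continue  # L1s outside the Germanic/Romance selection are ignored
        groups[f"{family} {essay['level']} proficiency"].append(essay["text"])
    for text in native_texts:
        groups["native speakers"].append(text)
    return dict(groups)


essays = [
    {"l1": "GER", "level": "low", "text": "essay 1"},
    {"l1": "SPA", "level": "medium", "text": "essay 2"},
    {"l1": "ITA", "level": "medium", "text": "essay 3"},
    {"l1": "KOR", "level": "high", "text": "essay 4"},
]
groups = group_essays(essays, native_texts=["native essay"])
# "Romance medium proficiency" pools the Spanish and Italian essays;
# the Korean-L1 essay is dropped.
```

Pooling by family rather than by individual L1 trades away per-language detail for larger, more stable samples, which matters most for the sparse low-proficiency groups.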

Target word list
In order to investigate how L2 English writers select specific lexical items during written expression in English, we constructed a list of frequent words grouped into sets of synonyms that share a common sense. A preliminary list of English synonyms was compiled from online resources¹ and from words identified in previous research (Rabinovich et al., 2018). These lists were then manually evaluated by the authors with the goal of identifying synonym sets that included mainly medium- to high-frequency words, which have a greater probability of being used by L2 writers (mean word frequency was 57.3 per million, SD = 186, based on SUBTLEXus; Brysbaert & New, 2009). We then defined a set of English synonym sets (synsets): each synset includes two or more synonyms originating from different language families. In addition, we assigned a part-of-speech (POS) tag to each word, in order to reduce ambiguity. Each word in a synset is exclusively of Germanic or Romance origin; we used Wiktionary to determine word etymology. The full list includes 235 synsets, each including at least one word of Germanic origin and at least one word of Romance origin, and 537 words in total. Examples include the nouns {mistake, error}, where the former is of Germanic origin and the latter is Romance, and the adjectives {endless, everlasting, eternal, infinite}, where the first two are Germanic and the last two are Romance (see Table S1 in the supplementary materials, as well as the online repository, for the complete list). Note that the target words, identified as described above, do not necessarily have cognates in all the L1s included in our study, even though etymologically they come from the relevant language group. Many do, of course (e.g., bloom has a cognate in German, while flower has a cognate in French; similarly, Germanic full vs. Romance complete), but some may not. At any rate, the existence of a few synsets that include words that might not have direct cognates in all four L1s only works against our hypothesis and makes it more difficult to find supporting evidence, because such words add random noise rather than amplify the signal of "real" cognates that we are after here.

Preprocessing
Ideally, we would like to capture all occurrences of words from the list that retain the sense reflected by their synonym set. For example, synset 102 includes the nouns {lift, elevator}. We would not want to miss the plural forms (lifts, elevators), but we also would not want to include the verb sense of lift, since it is not a valid alternative for elevator. To address this issue, we used spaCy (Honnibal et al., 2020) to lemmatize and POS-tag the entire dataset (native and nonnative). Although considering part of speech reduces ambiguity, it is far from a perfect solution, because different senses of a word can still exist within the same part of speech (e.g., lift as a noun can mean a ride in addition to an elevator).
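Once each token carries a lemma and a POS tag, matching the target list reduces to counting (lemma, POS) pairs. The sketch below assumes the tokens have already been converted to such pairs, for example from spaCy tokens via `token.lemma_` and `token.pos_`; the `SYNSET_102` table and the `count_targets` function are illustrative and not taken from the original code.

```python
from collections import Counter

# Illustrative target entries for synset 102, keyed by (lemma, coarse POS tag).
SYNSET_102 = {("lift", "NOUN"): "Germanic", ("elevator", "NOUN"): "Romance"}


def count_targets(tagged_tokens, targets):
    """Count occurrences of target (lemma, POS) pairs.

    Because matching is on the lemma, plural forms such as "lifts" are counted
    under "lift"; because it also requires the POS tag, the verb "lift" is not.
    """
    return Counter(tok for tok in tagged_tokens if tok in targets)


# (lemma, POS) pairs a lemmatizer/tagger might produce for
# "They lift boxes; the lifts and elevators were slow." (punctuation omitted)
tokens = [("they", "PRON"), ("lift", "VERB"), ("box", "NOUN"),
          ("the", "DET"), ("lift", "NOUN"), ("and", "CCONJ"),
          ("elevator", "NOUN"), ("be", "AUX"), ("slow", "ADJ")]
counts = count_targets(tokens, SYNSET_102)
```

As the text notes, this filter cannot separate senses that share a POS (a noun "lift" meaning a ride is still counted).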

Procedure
In order to investigate lexical choice in L2 writers of English, we refer to the tendency to use English words with a Germanic origin as Germanic Tendency (GT), and to the tendency to choose English words with a Romance origin as Romance Tendency (RT).
The Germanic Tendency of authors at proficiency level $\mathit{level}$ with respect to a synset $s$ is defined as the number of occurrences, in all essays of that level, of the words in $s$ that are of Germanic origin, divided by the total number of occurrences, in the same essays, of all the words in $s$. Formally, let $S$ be the collection of all synsets considered; for a synset $s \in S$, let $G(s)$ be the words of Germanic origin included in $s$, and $R(s)$ the Romance-originating words in $s$. For a set $W$ of words, let $\#(W, \mathit{level})$ be the total number of occurrences of the words in $W$ in essays of proficiency level $\mathit{level}$. We then define:

$$\mathrm{GT}(\mathit{level}, s) = \frac{\#(G(s), \mathit{level})}{\#(s, \mathit{level})}; \qquad \mathrm{RT}(\mathit{level}, s) = \frac{\#(R(s), \mathit{level})}{\#(s, \mathit{level})} \tag{1}$$

For example, consider synset 79, namely $s_{79}$ = {excellent, fantastic, great, wonderful}. The Germanic and Romance subsets of $s_{79}$ are $G(s_{79})$ = {great, wonderful} and $R(s_{79})$ = {excellent, fantastic}, respectively. Then, we define the Germanic Tendency of authors at proficiency level $\mathit{level}$, namely GT(level), and similarly the Romance Tendency RT(level), as the (macro) average of GT(level, s) (and, respectively, RT(level, s)) across all synsets:

$$\mathrm{GT}(\mathit{level}) = \frac{1}{|S|} \sum_{s \in S} \mathrm{GT}(\mathit{level}, s); \qquad \mathrm{RT}(\mathit{level}) = \frac{1}{|S|} \sum_{s \in S} \mathrm{RT}(\mathit{level}, s) \tag{2}$$
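As a worked illustration of Equations 1 and 2, the sketch below computes per-synset and macro-averaged tendencies for a single proficiency level. The word counts are invented for illustration and are not the study's data; the function names are hypothetical. Because every word in a synset is either Germanic or Romance, #(s, level) = #(G(s), level) + #(R(s), level), so RT(level, s) = 1 - GT(level, s) and only GT needs to be computed.

```python
from statistics import fmean


def gt_synset(counts, germanic, romance):
    """GT(level, s) from Equation 1: Germanic share of all tokens of synset s."""
    g = sum(counts.get(w, 0) for w in germanic)
    r = sum(counts.get(w, 0) for w in romance)
    if g + r == 0:
        raise ValueError("synset does not occur at this level")
    return g / (g + r)


def tendencies(counts, synsets):
    """Macro-averaged GT(level) and RT(level) from Equation 2.

    counts: word -> number of occurrences in the essays of one level.
    synsets: list of (Germanic words, Romance words) pairs.
    """
    gt = fmean([gt_synset(counts, g, r) for g, r in synsets])
    # Each word is Germanic or Romance, so RT(level, s) = 1 - GT(level, s),
    # and the same relation holds for the macro-averages.
    return gt, 1 - gt


synsets = [
    ({"great", "wonderful"}, {"excellent", "fantastic"}),  # s_79
    ({"mistake"}, {"error"}),
]
# Invented counts for a single proficiency level.
counts = {"great": 30, "wonderful": 10, "excellent": 8, "fantastic": 2,
          "mistake": 6, "error": 4}
gt, rt = tendencies(counts, synsets)  # GT(s_79) = 40/50 = 0.8; GT = (0.8 + 0.6) / 2
```

Macro-averaging gives each synset equal weight, so a very frequent synset such as s_79 cannot dominate the group-level tendency.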

Basic descriptive patterns
We first examined the writers' average tendency to select words from the same language family as their L1, separately for each critical group (writers whose L1 is Romance, and writers whose L1 is Germanic). That is, we examined the Germanic Tendency of German writers, and the Romance Tendency of Italian, French and Spanish writers. We calculated the average of this tendency for each group and each proficiency level (low, medium, and high). To compare the L2 writers' word selection preferences with those of native English speakers (which we assume are not influenced, at the group level, by knowledge of additional languages), we repeated the same calculation on the LOCNESS essays. Finally, we used the frequency lists from COW to compute how words from the target synsets are used in an enormous, general-purpose corpus of English, outside the constraints of argumentative essay writing. We expected higher L2 proficiency to be associated with a weaker Germanic or Romance tendency; additionally, we expected native English speakers to select English words of Germanic origin less often than L2 users with a Germanic L1, and words of Romance origin less often than L2 users with a Romance L1. Due to the limited size of the TOEFL corpus and the lower lexical richness of L2 learners in general, some words from the target list appear very few times (or not at all) in the texts. This sparsity is most evident in the sample of essays graded as low proficiency.² To ensure robust estimates of the Germanic and Romance tendencies, a minimum number of occurrences of the word types in each synset is necessary. We therefore include here only synsets in which all word types appear at least 3 times in the essays of each proficiency level.³
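The inclusion criterion amounts to a simple filter over per-level word counts. A minimal sketch, assuming the counts are available as one dictionary per level; the counts below are invented for illustration and the function name is hypothetical.

```python
def robust_synsets(synsets, counts_by_level, t=3):
    """Return ids of synsets whose every word type occurs >= t times at every level.

    synsets: synset id -> collection of words.
    counts_by_level: level -> {word: number of occurrences at that level}.
    """
    return [
        sid for sid, words in synsets.items()
        if all(level_counts.get(w, 0) >= t
               for level_counts in counts_by_level.values()
               for w in words)
    ]


synsets = {79: {"excellent", "fantastic", "great", "wonderful"},
           102: {"lift", "elevator"}}
# Invented counts: "elevator" occurs only once in the low-level essays.
counts_by_level = {
    "low": {"excellent": 4, "fantastic": 3, "great": 20, "wonderful": 5,
            "lift": 3, "elevator": 1},
    "medium": {"excellent": 40, "fantastic": 12, "great": 150, "wonderful": 30,
               "lift": 9, "elevator": 6},
    "high": {"excellent": 35, "fantastic": 10, "great": 120, "wonderful": 25,
             "lift": 7, "elevator": 8},
}
kept = robust_synsets(synsets, counts_by_level)  # [79]: synset 102 is dropped
```

A single rare word at a single level is enough to exclude a whole synset, which is why the low-proficiency essays shrink the usable list so sharply.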

German L1
Figure 1 presents the Germanic tendency of low-, medium- and high-proficiency German writers, along with two baseline estimates: NATIVE speakers, reflected by essays in the LOCNESS dataset, and GENERAL (web-crawled) English, based on word frequencies from COW. The Germanic tendency is computed for 15 synsets whose words occur at least t = 3 times in essays of each level (a total of 232, 6,190, and 10,864 occurrences for the low-, medium- and high-level essays, respectively). As described above (Equation 2), this was calculated as the macro-average over synsets. The error bars show the standard error of the mean; the large variability is probably due to the relatively small number of synsets.
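The error bars can be reproduced from the per-synset values. A short sketch, assuming the per-synset tendencies GT(level, s) have already been computed; the values are invented and the function name is hypothetical. The standard error scales with 1/sqrt(n), which is consistent with the wide error bars obtained from only 15 synsets.

```python
import math
import statistics


def macro_mean_and_sem(per_synset_values):
    """Macro-average of per-synset tendencies and its standard error of the mean.

    The SEM is the sample standard deviation across synsets divided by the
    square root of the number of synsets.
    """
    values = list(per_synset_values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Invented per-synset GT values for one proficiency level.
mean, sem = macro_mean_and_sem([0.8, 0.6, 0.7, 0.9])  # mean 0.75, SEM about 0.065
```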
The pattern visible in Figure 1 is consistent with the research hypothesis, as the Germanic tendency decreases for German-speaking non-native writers who are more proficient in English. Both the LOCNESS native authors and the COW values reflect lower use of the Germanic alternatives within the target synsets than that demonstrated by the L2 writers.

Romance L1s
Figure 2 shows the Romance tendency of Romance L1 speakers, based on 75 synsets whose words occur at least t = 3 times in essays of each level (a total of 2,345, 20,420 and 17,071 occurrences for the low, medium and high level essays, respectively).As above, the figure presents macro-averages and standard errors of the mean.
The values of the Romance tendencies are overall lower than those of the Germanic tendencies.Values under 0.5 mean that in general, even Romance writers prefer the Germanic alternatives over the Romance ones.Focusing only on the three levels of L2 English speakers, a visual inspection again shows that more proficient writers demonstrate a weaker Romance tendency, which is consistent with the hypothesis.
However, the native speakers in the LOCNESS corpus select Romance alternatives more often than do all of the L2 groups, contrary to our hypothesis. The Romance tendency value based on COW frequencies lies between those of the low and medium levels of the L2 English speakers.⁴ We return to these issues following the statistical analyses.

The patterns presented in Figures 1 and 2 are consistent with the hypothesis that there is a monotonically decreasing relation between L2 proficiency and the tendency of writers to select L1 cognates. This is reflected in the monotonically decreasing height of the leftmost three columns in both figures. To rule out the possibility that this monotonicity is due to mere chance, we devised and ran a permutation test, tailored to the idiosyncrasies of the data and our hypothesis. The test uses a much larger portion of the data, as it does not require a minimum number of occurrences per synset. We now describe the test for Romance tendency; the Germanic tendency test is analogous, with the roles of Germanic and Romance interchanged. The idea is to define a test statistic T such that, for each synset, T is "rewarded" (its value increases) according to the degree to which the writers' use of words from that synset is consistent with the monotonicity hypothesis. Thus, a sufficiently large value of T supports the hypothesis, and the statistical significance of the test can be derived by comparing the observed value of T with the distribution of T values calculated in the same way under suitable random permutations of the data.
Formally, let $S_3$ be the collection of all synsets whose words appeared in essays written by Romance authors at all three proficiency levels, and let $S_2$ be defined similarly, for synsets whose words appeared at exactly two proficiency levels. Synsets whose words appeared in essays of only one of the levels (or not at all) are not included in the analysis.
For a synset $s \in S_3$, let $t(s)$ be the number of inequalities that hold in the hypothesized relation RT(low, s) > RT(medium, s) > RT(high, s), i.e.,

$$t(s) = \mathbb{1}[\mathrm{RT}(\mathit{low}, s) > \mathrm{RT}(\mathit{medium}, s)] + \mathbb{1}[\mathrm{RT}(\mathit{medium}, s) > \mathrm{RT}(\mathit{high}, s)]$$

where $\mathbb{1}[\cdot]$ is 1 if the condition holds and 0 otherwise. For a synset $s \in S_2$, define similarly

$$t(s) = \mathbb{1}[\mathrm{RT}(l_1, s) > \mathrm{RT}(l_2, s)]$$

where $l_1$ and $l_2$ are the two proficiency levels in which the words from $s$ appear, and $l_1$ is the lower level of the two. We then define the test statistic:

$$T = \sum_{s \in S_3 \cup S_2} t(s)$$

For example, consider again synset 79. The left side of Table 4 lists the Romance tendency corresponding to this synset for Romance L1 writers at each of the three proficiency levels (computed from the entries of Table 3). In this synset we have RT(low, $s_{79}$) > RT(medium, $s_{79}$) > RT(high, $s_{79}$) (because 0.152 > 0.139 > 0.059), and therefore the contribution of $s_{79}$ to the statistic T is $t(s_{79}) = 2$.
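The statistic T can be computed directly from a table of per-level Romance tendencies. In the sketch below (function names hypothetical), each synset's entry lists only the levels in which its words appear, so members of S_3 contribute up to 2, members of S_2 up to 1, and synsets seen at a single level are skipped. The RT values for synset 79 are those given above; the other two entries are invented.

```python
LEVELS = ("low", "medium", "high")  # hypothesized order of decreasing RT


def t_of_synset(rt_by_level, levels=LEVELS):
    """t(s): number of hypothesized inequalities that hold for one synset.

    rt_by_level maps each level in which s occurs to RT(level, s). The levels
    present are taken in proficiency order, and each consecutive pair adds 1 if
    RT strictly decreases. Returns None for synsets seen at fewer than two levels.
    """
    present = [rt_by_level[level] for level in levels if level in rt_by_level]
    if len(present) < 2:
        return None
    return sum(1 for lower, higher in zip(present, present[1:]) if lower > higher)


def statistic_T(rt_table, levels=LEVELS):
    """T: sum of t(s) over all synsets in S_3 and S_2."""
    contributions = (t_of_synset(rt, levels) for rt in rt_table.values())
    return sum(t for t in contributions if t is not None)


rt_table = {
    "s79": {"low": 0.152, "medium": 0.139, "high": 0.059},  # in S_3, t = 2
    "invented_1": {"medium": 0.30, "high": 0.40},           # in S_2, t = 0
    "invented_2": {"high": 0.25},                           # one level: excluded
}
T = statistic_T(rt_table)  # 2
```

Passing a longer `levels` tuple, e.g. ("low", "medium", "high", "native"), is one possible way to extend T to the comparison with native writers; how synsets missing an intermediate level should be handled in that case is our assumption, not something the text specifies.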
Next, we randomly permuted the Germanic/Romance labels of all words in the dataset. We permute the labels separately within each synset s, in a manner similar to Fisher's exact test, i.e., while keeping the marginal label counts of s the same as in the original data, both across the proficiency levels (low / medium / high) and across the two etymological sources (Germanic / Romance). See the right side of Table 4 for an example. We generated 10,000 such permutations and, for each permutation i, calculated the corresponding statistic $T^{\mathrm{perm}}_i$. Under a null hypothesis of no underlying monotonicity in the tendency to choose L1 cognates, the T statistic (based on the original, non-permuted data) is a random observation from the distribution of the $\{T^{\mathrm{perm}}_i\}$. This null hypothesis is rejected (p < 0.0001): the test statistic was T = 103 (based on 177 synsets), higher than all 10,000 $T^{\mathrm{perm}}_i$ values. See Figure 2 (right). We repeated the above procedure to analyze the Germanic tendency, and reached the same result: the test statistic was T = 77
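The permutation scheme can be sketched on token-level data. In this minimal sketch (names and data are hypothetical), each synset is a list of (level, origin) pairs, one per word occurrence; shuffling the origin labels within a synset while keeping each occurrence's level fixed preserves both the per-level totals and the overall Germanic/Romance counts, i.e., the margins described above. The p-value uses the common (1 + count) / (n + 1) estimate, which is our choice rather than a detail reported in the text.

```python
import random

LEVELS = ("low", "medium", "high")


def rt_by_level(tokens):
    """Map each level where the synset occurs to its Romance tendency."""
    totals, romance = {}, {}
    for level, origin in tokens:
        totals[level] = totals.get(level, 0) + 1
        romance[level] = romance.get(level, 0) + (origin == "R")
    return {level: romance[level] / n for level, n in totals.items()}


def statistic_T(synsets):
    """Sum over synsets of strict RT decreases between consecutive present levels."""
    total = 0
    for tokens in synsets:
        rt = rt_by_level(tokens)
        present = [rt[level] for level in LEVELS if level in rt]
        total += sum(a > b for a, b in zip(present, present[1:]))  # 0 if < 2 levels
    return total


def permute_within_synset(tokens, rng):
    """Shuffle origin labels across occurrences, keeping each occurrence's level."""
    origins = [origin for _, origin in tokens]
    rng.shuffle(origins)
    return [(level, origin) for (level, _), origin in zip(tokens, origins)]


def permutation_test(synsets, n_perm=10_000, seed=0):
    """Return the observed T and a one-sided permutation p-value (large T)."""
    rng = random.Random(seed)
    observed = statistic_T(synsets)
    at_least = sum(
        statistic_T([permute_within_synset(tokens, rng) for tokens in synsets]) >= observed
        for _ in range(n_perm)
    )
    return observed, (at_least + 1) / (n_perm + 1)


# Invented data: 20 synsets with a clear low > medium > high Romance tendency.
toy_synset = ([("low", "R")] * 8 + [("low", "G")] * 2
              + [("medium", "R")] * 5 + [("medium", "G")] * 5
              + [("high", "R")] * 2 + [("high", "G")] * 8)
observed, p = permutation_test([toy_synset] * 20, n_perm=1000)
# observed is 40 (each synset contributes 2); p is small
```

For the Germanic tendency, the same code applies with "G" counted in place of "R".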
We thus robustly established our main hypothesis, namely, that the tendency to use cognates diminishes as L2 proficiency increases. We now turn to our second question: does L2 writers' tendency to use cognates converge to the levels observed in native speakers? To investigate this, we repeated the permutation test described above, this time including the native author group (from the LOCNESS corpus), resulting in a maximum of four levels of proficiency (low, medium, high, native). We calculated the Germanic and Romance tendencies based on essays written by native writers, and compared the tendency of the high-proficiency non-native writers to that of the native writers. If the tendency of the high-proficiency L2 writers was stronger, we added 1 to the value of T, as described above for the three non-native proficiency levels; otherwise, we added nothing to T. Finally, we again generated 10,000 random permutations of the data, now including the native writers, and recomputed T for each. Figure 3 shows the histograms of the random T values for the Germanic and Romance tendencies.
In the case of German and native writers, the observed T value (177), based on 183 synsets, is above the maximum value obtained over all 10,000 random permutations (162).This finding supports our hypothesis that the tendency of writers with German L1 to use words of Germanic origin decreases as their proficiency improves and grows more similar to that of native English speakers (Figure 3, left panel).
However, the same pattern is not observed in writers with Romance L1 backgrounds.Here, the observed T value achieved after including the native English writers is 178, based on 196 synsets.Figure 3 (right panel) shows the histogram of the random T values, and the observed T value based on the original texts is in the lower range of the random distribution.This means that when native English writers are included in the analysis, we no longer have evidence of convergence of non-natives to native writers.

Discussion
We investigated how traces of speakers' L1 can be detected in their L2 lexical selections in production, and in particular through the choice of cognates.Thus, we go beyond demonstrations of cognate facilitation in single word production (e.g., de Groot, 1992;Prior et al., 2007), and comprehension (e.g., Libben & Titone, 2009), and add to the growing literature showing that L2 users have a preference for using cognates when producing written text under natural conditions (Prior et al., 2011;Rabinovich et al., 2018).
Further, we also tested the hypothesis that the tendency to prefer cognates in L2 production weakens as writers' L2 proficiency increases.Some previous research has reported evidence for such an association between cognate facilitation and L2 proficiency (Kroll et al., 2002;Rosselli et al., 2014), but others have not found cognate effects to be modulated by L2 proficiency (Degani et al., 2018;Prior et al., 2017).
Here we offer a different perspective on this question, extending our investigation beyond the somewhat limited settings of laboratory experiments. Instead of analyzing the responses of a small number of participants asked to complete well-controlled tasks, we used computational methods to process spontaneous productions of hundreds of writers. From this large and rich dataset, we calculated writers' tendency to prefer words that share etymology with their L1, and examined whether this tendency is modulated by their L2 proficiency. We also compared the tendencies of L2 writers to select alternatives from the two different etymological sources with those of L1 writers, finding partial support for the hypothesis that with increasing L2 proficiency, L2 writers' lexical selections become more similar to those of L1 authors, though the two non-native groups differed in this respect.
For the relatively small Germanic non-native group, consisting exclusively of German L1 writers, the results convincingly supported the hypothesis: the tendency to prefer words of Germanic over Romance origin (the GERMANIC TENDENCY) was highest among the least proficient writers, and it declined significantly with increasing proficiency. Compared with native English speakers, German L2 writers overused the Germanic alternatives.
The Romance group included L1 speakers of French, Italian and Spanish. The analysis of their tendency to use Romance words was not fully consistent with the hypothesis. When examining only the L2 speakers, we found that the tendency to use Romance alternatives declined with increasing proficiency, significantly more than would be expected by chance. In contrast, in the genre of argumentative essays examined here, the Romance tendency was higher among L1 English speakers than among L2 English speakers at all three proficiency levels. Thus, on the one hand, the comparison among proficiency levels within the L2 writers supported our hypothesis that lower-proficiency writers prefer English words of Romance origin; on the other hand, the comparison between L2 and L1 writers did not support the hypothesis that L2 writers overall have a stronger preference for Romance-origin words than L1 writers.
We ascribe this unexpected finding, that L1 writers used more Romance alternatives than L2 writers whose native language is Romance, to the effects of register within English. Indeed, for all groups included in the study, Germanic alternatives were used more than Romance alternatives. We suggest that the low usage rates of words of Romance origin can be partially explained by the higher register and lower frequency of English words of Romance origin (Bar-Ilan & Berman, 2007; Franceschi, 2019; Levin & Novak, 1991), and by the fact that they are learned later in life than words of Germanic origin (Hernandez et al., 2021). Native speakers, who most likely have wider English vocabulary knowledge by virtue of greater exposure to the language, can more easily access such less frequent words. Further, recall that the LOCNESS corpus is a collection of essays written during academic exams, a setting in which writers naturally aim to select higher-register, more formal words. However, this explanation does not account for the stronger Romance tendency of low-proficiency L2 writers compared with medium-proficiency, and even more so high-proficiency, L2 writers. Returning to the hypothesis, we speculate that less proficient L2 writers use the higher-register Romance alternatives more often not by virtue of their English vocabulary size, but rather due to CLI from their L1 and their limited exposure to English. The current results do not allow us to test this possibility directly, and we hope that future research will shed more light on this issue.
There are several possible explanations for the finding that higher L2 proficiency results in reduced CLI, namely, a weaker preference for cognates. First, because cognates are shared across languages, both L1 and L2 exposure contribute to their lexical frequency. As frequency distributions stabilize with increased L2 exposure, the impact of the frequency "boost" coming from L1 becomes smaller, because effects of frequency on lexical access are logarithmic in nature: frequency effects are sizeable at the low end of the distribution but become smaller at high frequencies (Brysbaert et al., 2017; Diependaele et al., 2013; Kuperman & Van Dyke, 2013; Mor & Prior, 2020). Second, in a free writing task, participants have time to consider and monitor their word choices, with less immediate pressure than in many psycholinguistic tasks. Increased proficiency may therefore bring a subtler awareness of the appropriateness of the different lexical options, one that more closely approximates that of L1 speakers. Finally, higher L2 proficiency may indicate enhanced language control, specifically the ability of bilinguals to manage activation of the non-target language (Abutalebi & Green, 2007; Bonfieni et al., 2019; Costa & Santesteban, 2004; Declerck et al., 2020), i.e., the activation of L1 during L2 use. This, too, would lead to reduced CLI among more proficient L2 users. These mechanisms are not mutually exclusive and might operate in concert to shape the final outcome.
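The first explanation, that a fixed L1 frequency boost matters less as L2 exposure grows, can be made concrete with a toy calculation. The sketch assumes, for illustration only, two things: frequency effects scale with log10 of the total number of encounters with a word, and L1 exposure adds a fixed number of encounters to a cognate. The function name and all counts are hypothetical toy values, not estimates from any corpus.

```python
import math


def log_boost(l2_count: float, l1_boost: float) -> float:
    """Gain in log10 frequency when L1 exposure adds `l1_boost` encounters
    to a cognate's L2 count (illustrative model, not fitted to data)."""
    return math.log10(l2_count + l1_boost) - math.log10(l2_count)


# Same hypothetical L1 boost, increasing amounts of L2 exposure.
for l2_count in (10, 100, 1000, 10000):
    print(l2_count, round(log_boost(l2_count, l1_boost=100), 3))
```

With a fixed boost of 100 encounters, the gain in log frequency drops from about 1.04 at 10 L2 encounters to about 0.004 at 10,000. This is the diminishing-returns pattern the argument relies on.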
Overall, the current results mostly show reduced CLI in lexical choice, as reflected in a preference for cognates, with increasing L2 proficiency, whereas previous behavioral studies did not always find L2 proficiency to modulate cognate effects (Degani et al., 2018; Prior et al., 2017). One possible reason for this difference is the current approach of testing a psycholinguistic hypothesis with large corpora and computational tools. Computational methods allow us to analyze spontaneous texts written by both L1 and L2 writers, and to combine them with other resources, such as frequency lists based on very large corpora. Unlike in traditional psycholinguistic experiments, we have very limited information on the individual writers. On the other hand, we can dramatically increase the number of participants (writers) and include more L1s. The broad dataset, together with the more natural manner in which the writers express themselves compared with a lab experiment, most likely underlies the differences found regarding the impact of proficiency on cognate effects.

Limitations and future research
The current study used much more extensive datasets than experimental behavioral studies. However, the datasets used here are still relatively small. Among other things, the limited size of the dataset forced us to move away from the level of the individual writer and examine the research hypothesis at a lower resolution, aggregating texts written by different writers graded at the same proficiency level. Testing the hypothesis on larger corpora, in which each writer contributes much more text, would allow analysis of texts written by individuals. If such corpora also represented multiple levels of speaker proficiency across a wide range of native languages, the insights from this work could be taken further. Specifically, it would be interesting to repeat the experiment on a larger scale with writers of other Germanic L1s, and possibly with a wider range of L2 proficiency levels. Unfortunately, corpora tagged for both L1 and L2 proficiency level are almost non-existent, especially for advanced writers rather than learners. Building such a dataset would be a strong foundation for future research addressing the question posed in this study and related ones. Alternatively, developing an automatic, reliable, accurate and easily computed measure of L2 proficiency based on a given text sample would achieve the same goal, and is likely to be useful for other purposes as well.
Finally, the target word set selected in the current study was based only on etymological information, and might have included English words that do not have direct cognates in some of the L1s. How cognates should be identified and defined, and specifically the relative weight of historical linguistic considerations vs. current overlap in form and meaning across languages, is still under debate (e.g., Batsuren et al., 2022). In the current study, including English words that lack a cognate in one or more of the L1s would add noise and thus work against our ability to find a meaningful signal in the data. It therefore does not undermine the conclusions we reached. However, future research might test different approaches to defining and selecting cognate synonym sets.
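One common way to operationalize "current overlap in form" is normalized orthographic similarity. This is computed as 1 minus the Levenshtein edit distance divided by the length of the longer word. The sketch below only illustrates that option. It is not the selection procedure used in the current study, which relied on etymological information. Any cutoff applied to the score to decide what counts as a cognate would be an additional, hypothetical choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def form_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical spelling."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(form_similarity("night", "Nacht"))      # English vs. German
print(form_similarity("liberty", "liberté"))  # English vs. French
```

Under this measure, night/Nacht scores 0.6 and liberty/liberté about 0.86. A purely form-based criterion can diverge from an etymological one, for example for false friends or for cognates whose spelling has drifted apart.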

Conclusion
We demonstrated robust cognate effects in spontaneous L2 written production, specifically in the lexical choices made by writers. This finding expands our understanding of the dynamics of CLI in a domain of language use that has received limited attention in psycholinguistic research. We further demonstrated that the effects of CLI diminish with increasing L2 proficiency, adding empirical evidence on naturalistic bilingual language use. By adopting a computational approach and a large dataset, we obtained an important finding on the impact of proficiency on CLI, although we cannot at this stage offer a definitive description of the underlying cognitive and linguistic mechanisms, which are ripe for future psycholinguistic investigation. We therefore see the current research as an example of how complementary methodologies from psycholinguistics and natural language processing can lead to fruitful generation and testing of hypotheses, advancing our understanding of CLI in bilingual language processing.

Figure 1: Germanic tendency (GT) of L1 German authors by proficiency level, and of native English authors (mean, SEM).

Figure 2: Romance tendency (RT) of Romance L1 authors by proficiency level, and of native English authors (mean, SEM).

Figure 3: Histograms of the $T^{\mathrm{perm}}_i$ values, calculated from random permutations of the data, for the Germanic (left) and Romance (right) tendencies. The arrows indicate the T values calculated from the original, non-permuted dataset.

Figure 4: Histograms of the random $T^{\mathrm{perm}}_i$ values for the Germanic (left) and Romance (right) tendencies, when data from the native-author essays in the LOCNESS dataset are included. The arrows indicate the T values calculated from the original dataset.

Table 1: Text statistics by proficiency level and language family.
Table 3 lists the number of occurrences of the words in $s_{79}$ in the essays of Romance L1 writers, for each proficiency level. Then, for example, $\#(G(s_{79}), \text{medium}) = 358 + 40 = 398$; $\#(R(s_{79}), \text{medium})$

Table 3: Number of occurrences of the words in synset 79 in essays of Romance L1 writers, by proficiency level.
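The per-source counts in the worked example for synset 79 can be aggregated as follows. This is a minimal sketch under stated assumptions. $\#(G(s), \text{level})$ is computed as the summed occurrences of the Germanic-origin words of synset $s$ at one proficiency level, matching the example $358 + 40 = 398$. The word identifiers are hypothetical placeholders because the table entries are not reproduced here. The `romance_share` ratio (Romance occurrences over all occurrences in the synset) is an assumed definition for illustration only. The study's own tendency measure is defined elsewhere in the paper and may differ, for instance by normalizing for word frequency.

```python
# Hypothetical layout: (word, etymology, proficiency level) -> occurrences.
# "G" = Germanic origin, "R" = Romance origin.
Counts = dict[tuple[str, str, str], int]


def source_count(counts: Counts, etymology: str, level: str) -> int:
    """#(etymology(s), level): summed occurrences of the synset's words
    from one etymological source at one proficiency level."""
    return sum(
        n for (_word, ety, lvl), n in counts.items()
        if ety == etymology and lvl == level
    )


def romance_share(counts: Counts, level: str) -> float:
    """ASSUMED tendency measure (illustrative only): share of Romance
    occurrences among all occurrences of the synset at this level."""
    g = source_count(counts, "G", level)
    r = source_count(counts, "R", level)
    return r / (g + r) if g + r else float("nan")


# Placeholder words; only the two Germanic counts (358 and 40) come from the
# worked example. Romance entries are omitted because their counts are not
# reproduced in this section.
synset_79: Counts = {
    ("germanic_word_1", "G", "medium"): 358,
    ("germanic_word_2", "G", "medium"): 40,
}
print(source_count(synset_79, "G", "medium"))
```

Keying counts by (word, etymology, level) keeps the aggregation to a single filtered sum, so the same function serves both etymological sources and all proficiency levels.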

Table 2: Text size comparison between TOEFL and LOCNESS.

Table 4: Number of occurrences and Romance tendency of synset 79, for the original data (L1 Romance authors) and for an example random permutation.