Syntax and object types contribute in different ways to bilinguals’ comprehension of spatial descriptions

Abstract
The world’s languages draw on different reference frames to encode spatial relationships between people, objects or places. We address how subtle differences in reference frame preferences across Spanish and English affect Spanish–English bilinguals’ interpretations of spatial descriptions involving the terms left and right. Bilinguals saw an entity (‘object’; e.g., a vase or a human) with a circle on either side, along with a description of the location of a ball relative to the object (e.g., The ball is to the right of the vase or The ball is on the vase’s right). Their task was to decide which circle indicated the ball’s location. Results showed that syntax and object type contributed differently to bilinguals’ responses: Effects of syntax patterned with Spanish preferences, whereas effects of object type patterned with English preferences. English language exposure subtly affected bilinguals’ response choices. Results are discussed with respect to experience-based theories of language processing.


Introduction
The majority of the world's population are bilingual (Crystal, 1992; Grosjean, 2010; Harris & McGhee, 1992; World Bank, 1995), but effects of bilingualism on cognition are still poorly understood. Bilinguals have learned to describe the world using more than one linguistic system. Such systems often differ beyond simple translation. For instance, if one language has a single preposition that covers the range of three in another language, learners initially make predictable usage errors. The Spanish preposition en, for example, corresponds to the English prepositions in, on or at in different contexts, such that native Spanish speakers may initially produce errors in English, such as Put the apple in the table (Moore & Marzano, 1979). However, differences across languages may be preferential rather than right or wrong. For example, concerning the description of spatial scenes, what English and Spanish speakers can express in their languages is generally similar, but what they actually do may be very different. As a result, a spatial description such as The ball is to the right of the woman when the speaker is facing the woman's front is likely to be interpreted in opposite ways by native English and Spanish speakers (Olloqui-Redondo et al., 2019). Specifically, native English speakers are likely to use their own perspective, whereas native Spanish speakers tend to use the woman's perspective. Such differences across languages open up new opportunities to explore how bilingualism affects spatial cognition. Inspired by these recent findings, in the current study, we ask how Spanish-English bilingual speakers interpret such scenes.

Spatial frames of reference
The world's languages draw on different reference frames to encode spatial relationships between entities (i.e., objects, people or places). Levinson (2003) (see also Danziger, 2010) famously classified these as INTRINSIC (object- or person-centred), RELATIVE (viewer-centred) and ABSOLUTE (geographical; see Jackendoff, 1996), and in this article, we follow this classification. Intrinsic reference frames are cases in which one entity (which we will call the LOCATUM) is described in relation to the intrinsic properties of another entity (the RELATUM; see Tenbrink, 2011). Bohnemeyer et al. (2015) list several subcategories that fall into Levinson's (2003) classification of intrinsic. These differ in terms of who or what the relatum is and the types of projective terms used. For instance, in the egocentric intrinsic description The box is in front of me, the position of the locatum box is described relative to the relatum me, that is, the speaker's own position. Of importance for the current study are allocentric intrinsic descriptions, such as The box is in front of the chair, where the position of the locatum box is described relative to the relatum the chair. Importantly, this is only possible because the speaker and the chair (the anchors in Danziger's, 2010, terminology) have an intrinsic front. Without further context, it would not make sense to exchange the relatum me or the chair in the above sentences by a relatum like the ball, because a ball does not normally provide any basis for deciding a 'front' side.
However, the sentence The box is in front of the ball is indeed meaningful from an external perspective, which allows interpretation on the basis of a relative reference system. This presupposes a third element in the overall configuration beyond locatum and relatum, namely some external perspective (anchor in Danziger's, 2010, terminology) or view direction (as opposed to intrinsic interpretations, where the relatum also provides the perspective). If Peter stands apart from the box and ball, he can say that The box is in front of the ball from his point of view. Likewise, the speaker's or listener's perspective can be used to provide a view direction in a relative reference system, as in The box is in front of the ball from my/your point of view.
A third option is to use a fundamentally different concept, associated with distinct vocabulary, namely a reference frame that relies on external and immovable, absolute directions, such as the cardinal directions indicated by a compass. Speakers from Western societies would normally neither be inclined nor have the knowledge required to state, for instance, that The box is north of the ball, particularly in indoor scenarios (Tversky et al., 1997); they primarily use compass directions to describe larger places in relation to each other, as in an utterance like Hamburg is northwest of Berlin. However, in other cultures, the absolute reference frame is more widely used, with speakers of numerous Australian indigenous languages traditionally preferring absolute frames of reference across most scenarios (Boroditsky & Gaby, 2010; Haviland, 1998).
The reference frame choices introduced above draw on different projective terms. Absolute and some subtypes of intrinsic reference frames (i.e., landmark-based and geomorphic, together also called geocentric; Bohnemeyer et al., 2015) employ cardinal directions (e.g., north/south) or geographic landmarks (e.g., downriver and mountainward). In contrast, egocentric intrinsic, allocentric intrinsic and relative reference frames use projective terms like above/below (vertical axis), front/back (sagittal axis) and left/right (lateral axis). Importantly, there is evidence that the lateral axis, that is, left/right, which is the focus of the current study, is more cognitively challenging than the other two axes, and this might influence reference frame choices. Specifically, children have greater difficulties in acquiring left and right compared with front and back (Shusterman & Li, 2016), and a substantial minority of healthy adults even in Western cultures that make abundant use of these terms have difficulties identifying left and right (van der Ham et al., 2020). Franklin and Tversky (1990) suggest that this difficulty stems from the lateral axis' lack of salient asymmetries (see also Pederson, 2006). In contrast, gravity provides a salient asymmetry for the vertical axis (above/below) and perception does so for the sagittal axis (front/back), as we can typically see the part of entities facing us, that is, the front. In line with this, some languages, such as the Australian languages Wagiman (Palmer et al., 2021), Eastern Arrernte (Wilkins, 2006), Jaminjung (Schultze-Berndt, 2006) and Warrwa (McGregor, 2006), have projective terms for the sagittal axis, but not for the more challenging lateral axis. Furthermore, in some languages, such as Tamil (Pederson, 2006), MalakMalak (Palmer et al., 2021) and Yélî Dnye (Levinson, 2006), the relative frame of reference is more frequently used for the sagittal axis compared with the more challenging lateral axis (cf. Marghetis et al., 2020; Pitt et al., 2021). Interestingly, modern trends, such as globalisation, the spread of English and urban city environments, appear to reinforce the spatial terminology corresponding to left and right, as used in egocentric intrinsic, allocentric intrinsic and relative reference systems (cf. Bohnemeyer et al., 2015; Cerqueglini, 2022; Pederson, 1993).
Much research has addressed the use of reference frames around the world, both prior to and following up on Levinson's (2003) seminal cross-linguistic and cross-cultural work on space in language and cognition. However, while languages and cultures have been frequently discussed with respect to the preferred systems they use, and with respect to how such preferences affect speakers' thinking, there is less research on how speakers of more than one language deal with the fact that different conceptual reference frames are prioritised in each of their languages. To this, we turn next.

Reference frame selection in bilinguals
Abundant literature explores the connection between bilingualism and the development of spatial cognition (e.g., Greenberg et al., 2013; Ryskin et al., 2014). The study of spatial cognition in bilinguals is particularly interesting as it involves linguistic preferences rather than clear-cut differences that just need to be learned, a target of most previous bilingualism research, which has often highlighted effects of one language's system (L1) on systematic errors made in the other (L2). Such L1 transfer (Gass & Selinker, 1992) may, for example, result in native Spanish speakers producing errors such as I washed the hands in analogy to the equivalent Spanish sentence Me lavé las manos (literally: To me I washed the hands; Moore & Marzano, 1979).
However, it is not just the L1 that can influence the L2. Conversely, the L2 can also affect the L1, especially after years of living in an L2 environment, in a process called first language attrition (Schmid, 2013). Various empirical studies have reported such an L2 influence on the L1 in bilinguals performing linguistic tasks dealing with space. For instance, Brown and Gullberg (2011) found that Japanese learners of English used more adverbials to encode the path in motion events (e.g., down instead of descend) in their L1 than Japanese monolinguals. This was attributed to an influence of English, which tends to express path through verb particles, on Japanese, which prefers to express path in the verb root. Similarly, a number of authors have proposed bilingualism as a factor affecting perspective switches (e.g., Polian & Bohnemeyer, 2011; Romero Méndez, 2011), but did not address this issue directly.
Various studies have observed reference frame variability in Spanish bilinguals, often focusing on Mesoamerican communities where Spanish coexists with different indigenous languages. In these contexts, Mesoamerican languages, in which geocentric reference systems are favoured, interact with Spanish, where conceptualisations based on (egocentric and allocentric) intrinsic or relative reference systems are preferred. For example, Chi Pech (2021) found that Mayan monolingual and Mayan-Spanish bilingual children overwhelmingly used geocentric reference frames, with a higher percentage of geocentric responses for the monolinguals compared with bilinguals. In addition, the bilingual children had more geocentric responses when instructed in Mayan compared with Spanish. Similarly, Marghetis et al.'s (2014) (see also Marghetis et al., 2020) study on Isthmus Zapotec-Spanish bilinguals in Juchitán (Oaxaca, Mexico) considered relative and absolute reference frames and found a slight overall preference for relative reference frames in both of the bilinguals' languages, with significantly more relative responses for participants with better comprehension of projective terms like left and right compared with participants with lower comprehension. In contrast, Pérez Báez (2011) found that Isthmus Zapotec-Spanish bilinguals in La Ventosa, a community just 15 km north-east of Juchitán, vastly dispreferred relative reference frames. Marghetis et al. (2014) suggested that the striking difference between the two studies involving Isthmus Zapotec-Spanish bilinguals could be explained in terms of the dissimilar topography that speakers from these communities experience (cf. Moore, 2018). Specifically, La Ventosa residents travel more often and find themselves in locations where the horizon is visible more often than Juchitán residents, which might facilitate geocentric reference frame choices. 
This explanation is in line with Shapero (2017), who found that experience with the surrounding landscape affected reference frame choices in Quechua-Spanish bilinguals.
The above studies shed light on the flexibility that bilingual speakers display to perform different spatial tasks, although findings on the factors driving linguistic choices remain inconclusive. However, to the best of our knowledge, no research is available on reference frame selection in bilinguals whose languages differ more subtly in terms of reference frame selection, such as Spanish-English bilinguals. In both of these languages, (allocentric) intrinsic and relative reference frames are preferred over absolute systems; however, their preferences and interpretations differ, as we will explain in more detail below. Previous research on Spanish-English bilinguals indicated that preferences in the language of the participants' environment systematically affected the interpretation of ambiguous sentences more than preferences in their first language did (Dussias & Sagarra, 2007). However, little is known about reference frame selection in Spanish-English bilinguals in Spanish versus English language contexts, which will also be addressed in this study.

Experience-based theories of language learning and processing
Evidence suggests that experience may shape reference frame choices. Interlocutors adapt to each other's reference frame choices (Coventry et al., 2018) and the comprehension of reference frame choices is subject to priming effects, such that comprehenders process primed, that is, recently experienced, reference frames more accurately (Johannsen & De Ruiter, 2013). Adaptation to both recent and long-term experience can occur in other domains, for example, in terms of vocabulary (Chaouch-Orozco et al., 2021), syntactic structures (Jaeger & Snider, 2013) or ambiguity resolution (Dussias & Sagarra, 2007), both within and across languages (Loebell & Bock, 2003; Nitschke et al., 2010). It is thus possible that extended language experience also affects bilinguals' frame of reference choices.
The idea that bilinguals' two languages and language exposure patterns may affect reference frame choices is compatible with experience-based theories of language acquisition and processing (e.g., Bates & MacWhinney, 1987; Cuetos et al., 1996; Jurafsky, 1996; Levy, 2008; Mitchell et al., 1995). Specifically, experience plays an important role in certain rational and implicit learning accounts of language acquisition and use (Chang et al., 2006; Jaeger & Snider, 2013). These accounts suggest that prediction error drives learning. When processing language, listeners predict upcoming linguistic material and adjust their linguistic representations in response to mismatches between the predicted and the actually experienced input. Such input-driven adjustments reduce future prediction errors by best matching predictions to the encountered input. The more unpredicted the input that a listener encounters, the larger a prediction error it generates. This has several implications. First, each encounter with unpredicted input increases the possibility that this input will be predicted in the future and lowers the prediction error generated the next time the unpredicted input is encountered. Second, encountering highly unpredicted input leads to increased production of that input, such that, for example, syntactic priming effects are larger for very rare compared with less rare syntactic structures (Hartsuiker & Westenberg, 2000). In short, more unpredicted input leads to greater adaptation or learning.
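The error-driven logic described above can be illustrated with a toy sketch. It is not the model of Chang et al. (2006) or Jaeger and Snider (2013); it assumes a simple delta-rule update, and the learning rate and the starting expectations (0.05 for a rare structure, 0.60 for a common one) are hypothetical values chosen only for illustration.

```python
def update(p_expected: float, learning_rate: float = 0.1) -> tuple[float, float]:
    """One encounter with a structure the listener expected with probability p_expected.

    Returns (prediction_error, new_expectation) under a simple delta rule.
    The learning rate is a hypothetical illustration value.
    """
    prediction_error = 1.0 - p_expected          # larger when the input was less expected
    new_expectation = p_expected + learning_rate * prediction_error
    return prediction_error, new_expectation


# Hypothetical starting expectations for a rare and a common structure.
rare_error, rare_new = update(0.05)
common_error, common_new = update(0.60)

rare_shift = rare_new - 0.05        # adaptation after one rare encounter
common_shift = common_new - 0.60    # adaptation after one common encounter

# A second encounter with the rare structure generates a smaller error than the first.
second_error, _ = update(rare_new)

print(rare_error > common_error, rare_shift > common_shift, second_error < rare_error)
```

Under these assumptions, the rarer structure generates the larger prediction error and the larger shift in expectation, and each encounter lowers the error generated by the next one, which mirrors the two implications stated in the text.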
The same considerations may apply to spatial language, such that listeners may predict certain frames of reference and adjust their spatio-linguistic representations when the experienced reference frame does not match the one they predicted. The concrete predictions that experience-based accounts would make in this case depend on which factors influencing reference frame choices listeners are likely to predict. For example, reference frame prediction may be sensitive only to overall frequencies of the encountered reference frames, and not additionally to the syntactic construction that the interlocutor used or the type of relatum in the spatial scene.

The current study
Various authors (e.g., O'Meara, 2011; Pérez Báez, 2011) have pointed to a need for assessing the role of bilingualism in reference frame selection, and this study aims to address this gap. The current study follows on directly from Olloqui-Redondo et al.'s (2019) work with Spanish and English monolinguals, and we therefore briefly present their main results here (Olloqui-Redondo et al.'s, 2019, data and scripts are available at https://osf.io/krzqd/). Olloqui-Redondo et al.'s (2019) study (as well as ours) concerns only the lateral axis (left/right) and only allocentric intrinsic and relative reference frames. As both studies concern only one subtype of intrinsic reference frames, we will refer to it as intrinsic throughout this article. Fig. 1, adapted from Olloqui-Redondo et al. (2019), shows that the intrinsic reference frame was overwhelmingly preferred with possessive descriptions (e.g., on the car's left or on its left/a su izquierda) in Spanish and English. Interestingly, with non-possessive descriptions (e.g., to the left of the car/a la izquierda del coche), most Spanish speakers adopted the intrinsic reference frame with animate relata (e.g., a person or animal) or with inanimate relata that represented an animate entity (anthropomorphic entities, such as a statue), but preferred the relative reference frame with other inanimate relata with or without intrinsic sides (e.g., a chair or a vase). This led to what the authors called a categorical effect, where speakers' overall preference for non-possessive descriptions flips from an intrinsic reference frame for animate and animate-like entities to a relative reference frame for sided and unsided inanimate entities. In contrast, English speakers consistently used their own perspective in the vast majority of non-possessive trials, and animacy affected reference frame choices only gradually, with a gradual decrease in relative reference frame choices from inanimate to animate relata.
Fig. 1 [start of caption not recovered] Figures 4 and 3, respectively: percentage of responses using a relative vs. intrinsic frame of reference for each object type and for the non-possessive (non) and possessive (poss) conditions. The numbers below the bars represent the percentage of relative reference frame choices.
To account for this difference, the authors highlight two factors. First, English has two unmarked constructions to express static lateral configurations through attributive possession, with each construction apparently linked to a particular reference frame: intrinsic for the possessive construction (e.g., on the car's left), and relative for the non-possessive construction (e.g., to the left of the car). In contrast, Spanish has only one unmarked construction to express static lateral configurations through attributive possession: the non-possessive construction (e.g., a la izquierda del coche; literally: on the left of the car). Instead, the Spanish possessive construction (e.g., a su izquierda; literally: on its left) is marked in the sense that it can only be used with a possessive adjective that refers back to a previously mentioned relatum (as in, e.g., Veo un coche. La pelota está a su izquierda; literally: I see a car. The ball is on its left). Second, the preference for the intrinsic reference frame in Spanish but not English monolinguals may be related to the higher number of syntactic structures affected by inalienable possession in Spanish, which has been widely attested (e.g., Kliffer, 1983; Nieuwenhuijsen, 2008). Thus, animate-like relata may prompt the use of the intrinsic reference frame in static lateral configurations because the lateral side expressed by the projective term (i.e., left or right) is understood as an inherent and inalienable element of the relatum when it has animate-like attributes. Hence, both projective terms izquierda 'left' and derecha 'right' belong to the relatum rather than to the observers.
These results raise the question as to which reference frame Spanish-English bilinguals adopt when interpreting static lateral descriptions with animate and animate-like relata (particularly with non-possessive descriptions). The current study addresses this. It tests Spanish-English bilinguals' reference frame choices for static lateral descriptions in both Spanish (the speakers' L1) and English (the speakers' L2), using the same experimental design as Olloqui-Redondo et al. (2019), who tested Spanish and English monolinguals. To account for the effect of language environment, we test two groups of Spanish-English bilinguals, those residing in Spain, that is, in an L1 environment, and those residing in the UK, that is, in an L2 environment.
Based on previous studies showing L1 transfer in L2 acquisition of spatial language (Coventry et al., 2010), Spanish-English bilinguals may transfer their native Spanish reference frame preferences to their L2 English. In this case, we would expect the Spanish-English bilinguals to pattern like monolingual Spanish speakers in both of their languages, that is, to show an overall preference for the intrinsic reference frame, except when a non-possessive construction is used and the relatum is neither animate nor anthropomorphic. However, other findings have demonstrated a strong influence of language exposure, for instance, on Spanish-English bilinguals' preferred interpretations of globally ambiguous sentences (Dussias & Sagarra, 2007). Therefore, participants' reference frame choices might instead pattern according to the language of the environment. In this case, we would expect the Spanish-English bilinguals residing in Spain to pattern like monolingual Spanish speakers in both of their languages. In contrast, the Spanish-English bilinguals residing in the UK would then pattern like monolingual English speakers in both of their languages, that is, show a strong preference for the relative reference frame for the non-possessive construction and a strong preference for the intrinsic reference frame for the possessive construction, which are both affected by animacy in a gradual manner. Such an effect of residence on reference frame selection would suggest that listeners predict reference frames considering both syntactic construction and relatum type. In the case of the Spanish-English bilinguals residing in the UK, finding an English-like pattern in their Spanish reference frame choices would also constitute a case of first language attrition (Schmid, 2013) with respect to preferred reference frame choices.

Participants
A total of 94 Spanish-English bilinguals were included in the study. All participants filled in the Bilingual Language Profile (BLP): Spanish-English (Birdsong et al., 2012). Fifty-one of the participants (37 female, 14 male; mean age = 30.57, SD = 9.43) resided in Spain at the time of the study and comprised the Spain group. The vast majority of them resided in urban areas, with over half living in Córdoba, Zaragoza, Madrid and Logroño. All participants in this group reported having lived in a Spanish-speaking country for 19 years or over. Participants in the Spain group had previously lived in an English-speaking country on average for 1.29 years (SD = 1.50; range 0-5). The remaining 43 participants (36 female, 7 male; mean age = 30.18, SD = 7.68) resided in the UK at the time of the study and comprised the UK group. The vast majority of them also resided in urban areas, with over half living in London, Oxford and Birmingham. All but seven participants in this group reported 20+ years of previous residence in a Spanish-speaking country. Participants in the UK group had lived in an English-speaking country on average for 6.33 years (SD = 5.34; range 1-19).
All participants reported speaking Spanish since birth. Eight participants in the Spain group and five participants in the UK group reported learning English at or before age 3, with the remaining participants first learning English between age 4 and their 20s.
Overall, participants were very highly educated, with all but one participant in the Spain group having attended university for some time and with 82.35% of participants in the Spain group and 81.40% in the UK group having received a Master's, diploma or PhD degree. Based on a Likert scale that participants used to rate their language proficiency in Spanish and English, all participants were highly proficient in both languages (Table 1). The ratings did not differ significantly across groups for any comparison. While these non-significant results do not mean that the groups are equal, they do suggest that the groups are relatively similar in terms of language proficiency. Furthermore, participants in the Spain group assigned themselves numerically better ratings for English than the UK group across the board. This means that any differences in the study's outcomes are unlikely to be attributable to higher proficiency in the UK group's L2 (English), as might be speculated. It should be noted, however, that self-ratings, as used in the BLP, are subjective and may thus be somewhat less reliable than a proficiency test. For example, the participants residing in the UK may have been more likely to compare themselves with native speakers, with whom they have daily interactions, and may thus have underestimated their English language skills.
Participants reported their language use with friends, family and at work/school for Spanish, English and other languages during an average week, with individual ratings ranging from 0 to 100%. Since the BLP allows participants to enter percentages that do not add up to 100%, we converted these percentages into proportions for each context by dividing each individual percentage by the sum of the percentages for Spanish, English and other languages. This ensured that reported language use within each context added up to one. For example, if a participant reported 60% Spanish use, 60% English use and 30% use of other languages with friends during an average week, which adds up to 150%, we then divided each value by 150. This yields 60/150 = 0.4, 60/150 = 0.4 and 30/150 = 0.2, which adds up to one. As expected, participants in the Spain group had significantly higher Spanish language use across the board, that is, with friends, family and at work/school (Table 2). In contrast, participants in the UK group had significantly higher English language use across the board. Use of other languages was generally low and did not differ significantly across the two countries of residence.
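The normalisation described above can be sketched in a few lines. This is a minimal illustration, not the authors' script (their R scripts are available at the OSF link given in the Analysis section); the function name to_proportions is hypothetical, and the input values are the worked example from the text.

```python
def to_proportions(spanish: float, english: float, other: float) -> tuple[float, float, float]:
    """Convert reported usage percentages for one context into proportions summing to one.

    The function name and signature are hypothetical; the computation divides
    each percentage by the sum of the three percentages, as described in the text.
    """
    total = spanish + english + other
    if total == 0:
        raise ValueError("At least one language must have non-zero reported use.")
    return spanish / total, english / total, other / total


# Worked example from the text: 60% Spanish, 60% English, 30% other (sum = 150%).
props = to_proportions(60, 60, 30)
print(props)
```

For the example values, the three proportions come out at 0.4, 0.4 and 0.2, up to floating-point rounding.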
To summarise, participants in both groups mostly lived in urban environments, were very highly educated and reported similar Spanish and English proficiency. Participants in the Spain group spent less time in an English-speaking country and used Spanish more and English less than participants in the UK group. An additional nine participants were excluded from the study. One participant responded correctly to only 67% of filler trials, suggesting that this participant may not have been paying sufficient attention. In contrast, the participants included in the study answered filler questions with very high accuracy (Spain group: mean = 95.92%, SD = 19.80; UK group: mean = 95.83%, SD = 18.99). Eight further participants were excluded either because they did not report having acquired Spanish from birth (N = 6) or because they represented outliers in terms of length of residence in the UK (N = 2).
Note (Table 1). Ratings range from 0 = not well at all to 6 = very well. t-Tests compare ratings across groups. p-Values are corrected for multiple comparisons using a false discovery rate correction (Benjamini & Hochberg, 1995).
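For readers unfamiliar with the Benjamini and Hochberg (1995) correction cited above, the following is a minimal sketch of the standard step-up adjustment. It is not taken from the authors' scripts, and the p-values are hypothetical illustration values, not results from this study.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.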

Materials
The experiments were identical to those in Olloqui-Redondo et al. (2019). Materials were spatial scenes involving an avatar facing an entity that served as relatum (see Fig. 2). The entities used as relatum fell into five animacy categories, that is, object types, in order to evaluate the impact of animacy on the participants' frame of reference choices. We used an animacy hierarchy based on Rosenbach's (2008) scale of INANIMATE < ANIMATE < HUMAN. Following Olloqui-Redondo et al. (2019), the inanimate category was further refined by incorporating two extra criteria, sidedness and anthropomorphism, yielding the following order of inanimate categories from least to most human-like: UNSIDED < SIDED < ANTHROPOMORPHIC. Based on these criteria, object types fell into the categories shown in (1).
(1) Object types from least to most human-like:
unsided: −sides, −anthropomorphic, −animate, −human (e.g., a vase)
sided: +sides, −anthropomorphic, −animate, −human (e.g., a car)
anthropomorphic: +sides, +anthropomorphic, −animate, −human (e.g., a statue)
animate: +sides, −anthropomorphic, +animate, −human (e.g., a dog)
human: +sides, +anthropomorphic, +animate, +human (e.g., a woman)
For each of the five object types, photographs of six different objects were selected, for a total of 30 different objects (see https://osf.io/hmn2q/ for a list of all objects). On both sides of the relatum were blue circles representing two balls (A and B) and showing the possible locations of the locatum.
A speech bubble with a spatial description was shown next to the avatar. The spatial descriptions used involved either a non-possessive construction (such as the English I see a vase. The ball is to the right of the vase or the equivalent Spanish construction Veo una vasija. La pelota está a la derecha de la vasija) or a possessive construction (such as the English I see a vase. The ball is on the vase's right or the similar Spanish construction Veo una vasija. La pelota está a su derecha, literally I see a vase. The ball is to its right; see Olloqui-Redondo et al., 2019, for additional information about these constructions). Following Olloqui-Redondo et al. (2019), linguistic construction was a between-participant factor (non-possessive condition and possessive condition), with half of the participants being exposed to only the non-possessive construction and the other half to only the possessive construction. In both conditions, half of the instructions involved the use of left and right, respectively.
In contrast, object type and language were within-participant factors, such that all object types were shown to all participants and all participants took part in the study in both Spanish and English. Hence, the experiment had a 5 (object type) × 2 (linguistic construction) × 2 (language) design. Apart from the 30 target scenes, each experiment included 60 unambiguous filler scenes using projective terms that involved the frontal (e.g., behind) and vertical (e.g., above) axes (e.g., I see a bucket. The ball is behind the bucket or I see a bucket. The ball is on the bucket's back).
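The factorial structure described above can be made concrete with a short sketch that enumerates the design cells. The variable and function names are hypothetical; the factor levels are those given in the text, with linguistic construction as the between-participant factor.

```python
from itertools import product

OBJECT_TYPES = ["unsided", "sided", "anthropomorphic", "animate", "human"]  # within participants
CONSTRUCTIONS = ["non-possessive", "possessive"]                            # between participants
LANGUAGES = ["Spanish", "English"]                                          # within participants

# All cells of the 5 x 2 x 2 design.
all_cells = list(product(OBJECT_TYPES, CONSTRUCTIONS, LANGUAGES))


def cells_for_participant(construction: str) -> list[tuple[str, str, str]]:
    """Cells seen by one participant, who is assigned a single construction."""
    return [(obj, construction, lang) for obj, lang in product(OBJECT_TYPES, LANGUAGES)]


print(len(all_cells))                            # 20
print(len(cells_for_participant("possessive")))  # 10
```

Because construction varies between participants, each participant contributes data to 10 of the 20 cells.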

Procedure
The experiments were created using OpenSesame 2.9.6 (cf. Mathôt et al., 2012). Due to the COVID-19 pandemic, the experiments had to be administered in a web-based format. To do so, we used the OSWeb extension in OpenSesame, which allows exporting experiments in a zipped format that can then be imported into JATOS, a system that manages web-based experiments and generates links that can be distributed to participants.
Prior to the experiment, participants were asked to sign a consent form and do a 2-minute Skype call with one of the two experimenters for a brief introduction, which also served to corroborate their high proficiency in their L2 English. The experiment then consisted of two web-based sessions via Skype, one in English and one in Spanish, about a week apart. Half of the participants took part in the English version first, and the other half took part in the Spanish version first. Each experimental session lasted about 40 minutes. Efforts were made to ensure that the web-based experiments were as similar to a lab-based study as possible. Hence, in the first Skype session, experimenters greeted participants and explained the task. To ensure that participants had understood the task, they received written and spoken instructions and completed one practice trial. The experimenter then sent participants the link to the experiment. Participants were asked to end the Skype call, complete the experiment and then call back the experimenter, who would be waiting for them to finish the experiment. This was done to encourage participants to perform the task in one session without interruption. During each trial of the experiment, the participants' task was to decide whether the locatum (the ball) was in location A or B (see Fig. 2) based on their interpretation of the spatial description in the speech bubble. To choose location A, they had to press key A on their keyboard, and to choose location B, they had to press key B. Stimuli were presented in three blocks, each containing a set of 30 pictures, for a total of 90 pictures. Each block comprised 10 (two per object type) target scenes and 20 filler scenes, randomised within a block. Participants could take a short break between each block. When participants called back the experimenter to report that they had finished the experiment, they were reminded of the second experiment taking place the following week.
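The block structure described above (three blocks of 30 scenes, each with two target scenes per object type and 20 fillers, randomised within the block) can be sketched as follows. This is an illustration only: the experiment itself was built in OpenSesame, how items were actually assigned to blocks is not stated in the text, and the item labels, function name and seed are hypothetical.

```python
import random

OBJECT_TYPES = ["unsided", "sided", "anthropomorphic", "animate", "human"]


def build_blocks(seed: int = 0) -> list[list[tuple]]:
    """Build three blocks of 30 scenes: 10 targets (two per object type) and 20 fillers."""
    rng = random.Random(seed)
    # Six hypothetical target items per object type and 60 hypothetical fillers.
    targets = {t: [("target", t, i) for i in range(1, 7)] for t in OBJECT_TYPES}
    fillers = [("filler", None, i) for i in range(1, 61)]
    for items in targets.values():
        rng.shuffle(items)
    rng.shuffle(fillers)

    blocks = []
    for b in range(3):
        block = []
        for t in OBJECT_TYPES:
            block.extend(targets[t][2 * b: 2 * b + 2])   # two targets per object type
        block.extend(fillers[20 * b: 20 * b + 20])       # twenty fillers
        rng.shuffle(block)                               # randomise order within the block
        blocks.append(block)
    return blocks


blocks = build_blocks()
print([len(block) for block in blocks])  # [30, 30, 30]
```

Under this assumed assignment, every scene appears exactly once across the three blocks, for a total of 90 pictures per session.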
For the second Skype session, the same steps were followed, except for the end of the session. After completing the experiment, the experimenter sent participants their participant number and the link to the Español-Inglés (Spanish-English) version of the BLP (Birdsong et al., 2012), a questionnaire that produces a general bilingual profile and considers a variety of linguistic variables, including age of acquisition, places of residence and L1 and L2 language use. Participants were told to fill in the questionnaire and to send a message through Skype chat if they had any questions and when they had finished the questionnaire. Once participants had finished the questionnaire, the experimenter thanked them for their participation through Skype chat.

Analysis
We used mixed logit models for the statistical analysis, which are appropriate for binary response variables (i.e., intrinsic vs. relative reference frame; cf. Baayen, 2008). The appropriate statistical models were determined through model comparisons in R (R Core Team, 2019). The full model included sentence construction (possessive vs. non-possessive), object type (five levels from unsided to human), country of residence (Spain vs. UK) and all interactions as fixed effects. All fixed effects were centred to minimise collinearity and sum-coded for analysis-of-variance-style main effects. The full model also included random intercepts for participant and item and random by-participant and by-item slopes for the within-participant factor object type (cf. Winter & Wieling, 2016). Model comparisons then determined the optimal model. Specifically, random factors that did not reliably contribute to model fit were removed from the full model, starting with the random effect with the smallest variance. If a model did not converge, the random effects structure was further simplified until the model converged. Then, fixed factors that did not reliably contribute to model fit were removed from the model, starting with the fixed effect with the smallest absolute t-value. Data and R scripts for this article are available at https://osf.io/hmn2q/.
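To make the model structure concrete, the following Python sketch shows how a mixed logit model turns a sum-coded predictor and random intercepts into a predicted probability of an intrinsic choice. It is not the R code used for the analysis (which is available at the OSF link above). The coefficient values, the ±0.5 coding and the function names are hypothetical, and only one fixed effect is included for brevity.

```python
import math

def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Assumed centred sum coding for the two-level factor sentence construction
CONSTRUCTION = {"possessive": 0.5, "non-possessive": -0.5}

def p_intrinsic(construction, intercept, beta_construction,
                participant_intercept=0.0, item_intercept=0.0):
    """Probability of an intrinsic choice on one trial.

    log-odds = intercept + beta * code + random intercepts (participant, item)
    """
    log_odds = (intercept
                + beta_construction * CONSTRUCTION[construction]
                + participant_intercept
                + item_intercept)
    return inv_logit(log_odds)

# Hypothetical values, chosen only for illustration
p_poss = p_intrinsic("possessive", intercept=0.8, beta_construction=1.2)
p_nonposs = p_intrinsic("non-possessive", intercept=0.8, beta_construction=1.2)
```

With a positive construction coefficient, the possessive construction yields the higher probability of an intrinsic choice; the random intercepts shift the log-odds up or down for individual participants and items.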

L1 Spanish
We first investigated whether object type and sentence construction affected reference frame choices in Spanish, participants' native language. Fig. 3 shows the reference frame choices for the different object types and syntactic constructions for participants residing in (a) Spain and (b) the UK, respectively. The figure shows that participants residing in both countries seem to prefer the intrinsic reference frame overall, with higher intrinsic reference frame choices for the possessive compared with the non-possessive construction.
The final statistical model contained sentence construction, object type, residence and the object type by residence interaction as fixed effects and random intercepts for participants and items. The results from the final statistical model are shown in Table 3a. We find a significant main effect of construction type, such that there were significantly more intrinsic reference frame choices for the possessive compared with the non-possessive construction. In addition, there was a significant main effect of object type, and the object type by residence interaction was significant. The factor object type has five levels, requiring post-hoc comparisons to determine for which object types reference frame choices differ significantly from each other. As object type significantly interacts with residence, we report these post-hoc tests separately for each country of residence using the emmeans package in R (Lenth, 2020). The statistically significant comparisons are shown in Table 3b for participants residing in Spain and in Table 3c for participants residing in the UK. The results in Table 3b show that for participants in Spain, unsided relata had significantly fewer intrinsic reference frame choices than all other object types, and that human relata had significantly more intrinsic reference frame choices than all other object types. Thus, the endpoints on the object type continuum differ significantly from other object types. The results in Table 3c show that for participants in the UK unsided relata also had significantly fewer intrinsic reference frame choices than all other object types, but no effects were found for human relata. Thus, only one of the endpoints on the object type continuum differs significantly from other object types.
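With five object types there are ten pairwise comparisons per residence group. The Python sketch below merely enumerates them and separates those that involve an endpoint of the continuum. It does not reproduce the emmeans contrasts or any p-value adjustment, and the variable names are illustrative.

```python
from itertools import combinations

# Object types ordered along the continuum from unsided to human
OBJECT_TYPES = ["unsided", "sided", "anthropomorphic", "animate", "human"]
ENDPOINTS = {OBJECT_TYPES[0], OBJECT_TYPES[-1]}

# All unordered pairs of object types
pairs = list(combinations(OBJECT_TYPES, 2))

# Pairs that include at least one endpoint versus pairs between interior types only
endpoint_pairs = [p for p in pairs if ENDPOINTS & set(p)]
interior_pairs = [p for p in pairs if not ENDPOINTS & set(p)]
```

Seven of the ten pairs involve unsided or human relata; the remaining three compare sided, anthropomorphic and animate relata with each other.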
We report the marginal and conditional R² values for generalised linear mixed effects models (R²_GLMM; Johnson, 2014; Nakagawa et al., 2017; Nakagawa & Schielzeth, 2013) to gauge the effect size of our final statistical model. The marginal R²_GLMM captures the variance explained by a model's fixed factors, and the conditional R²_GLMM captures the variance explained by a model's fixed and random factors. The marginal R²_GLMM is 0.10, suggesting that 10% of the variance in reference frame selections can be explained through the fixed factors in the model. The conditional R²_GLMM is 0.79, suggesting that 79% of the variance in reference frame selections can be explained through the fixed factors and random factors in the model. Overall, a substantially larger percentage of the variance in reference frame selection can be explained through the random effects of participant and item than through the fixed effects.
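The two effect-size measures can be illustrated with a short Python sketch. It assumes the commonly used formulation for binomial models with a logit link, in which the residual variance on the latent scale is fixed at π²/3 and no extra overdispersion term is included. The variance components passed to the function are hypothetical and are not taken from the fitted models; only the values 0.10 and 0.79 are the reported ones.

```python
import math

# Distribution-specific (latent-scale) variance assumed for the logit link
LOGIT_RESIDUAL_VAR = math.pi ** 2 / 3

def r2_glmm(var_fixed, var_random):
    """Return (marginal, conditional) R2 for a binomial GLMM with logit link.

    var_fixed:  variance of the fixed-effects linear predictor
    var_random: summed variance of the random effects
    """
    total = var_fixed + var_random + LOGIT_RESIDUAL_VAR
    marginal = var_fixed / total
    conditional = (var_fixed + var_random) / total
    return marginal, conditional

# Hypothetical variance components, for illustration only
marginal, conditional = r2_glmm(var_fixed=1.0, var_random=5.0)

# Difference between the reported conditional and marginal values (L1 Spanish model)
random_share_spanish = round(0.79 - 0.10, 2)
```

The difference between the conditional and the marginal value is the share of variance attributable to the random effects, which for the reported L1 Spanish model is 0.69.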

L2 English
We then investigated whether object type, sentence construction and residence affected reference frame choices in English, participants' non-native language. Fig. 4 shows the reference frame choices for the different object types and syntactic constructions for participants residing in (a) Spain and (b) the UK, respectively. Similar to the Spanish data, both groups seem to prefer the intrinsic reference frame overall, with higher intrinsic reference frame choices for the possessive compared with the non-possessive construction. The final statistical model again contained sentence construction, object type, residence and the object type by residence interaction as fixed effects and random intercepts for participants and items. The results from the final statistical model are shown in Table 4a. Again, we find a significant main effect of construction type, such that there were significantly more intrinsic reference frame choices for the possessive compared with the non-possessive construction. In addition, there was a significant main effect of object type, and the object type by residence interaction was significant. As object type significantly interacts with residence, we will again report the post-hoc tests for object type separately for each country of residence. The statistically significant comparisons are shown in Table 4b for participants in Spain and in Table 4c for participants in the UK. The results in Table 4b show that for participants in Spain, unsided relata had significantly fewer intrinsic reference frame choices than all other object types, and that human relata had significantly more intrinsic reference frame choices than all other object types. Thus, the endpoints on the object type continuum differ significantly from other object types for the participants residing in Spain. This mirrors the pattern we found in the Spanish data for bilinguals residing in Spain. 
In addition, the object types adjacent to each of the endpoints (sided and animate) also differed significantly from each other, a result that is consistent with the idea of a gradual increase in intrinsic reference frame choices as relata get more human-like.
The results in Table 4c show that for participants in the UK unsided relata also had significantly fewer intrinsic reference frame choices than all other object types, but no general effect was found for human relata, which only had significantly more intrinsic reference frame choices compared with unsided and sided relata, that is, the two types of relatum nearest the other end of the animacy continuum. Thus, only one of the endpoints on the object type continuum differs significantly from all other object types. This differs in slight detail from the pattern we found in the Spanish data for bilinguals residing in the UK.
We also report the marginal and conditional R²_GLMM to gauge the effect size of our final statistical model. The marginal R²_GLMM is 0.13, suggesting that 13% of the variance in reference frame selections can be explained through the fixed factors in the model. The conditional R²_GLMM is 0.83, suggesting that 83% of the variance in reference frame selections can be explained through the fixed factors and random factors in the model. Again, a substantially larger percentage of the variance in reference frame selection can be explained through the random effects of participant and item than through the fixed effects.

Comparison of languages
Our final analysis compares the results across Spanish and English, the two languages in which participants completed the experiment. The statistical analysis was the same as described in the Analysis section, except that language (English vs. Spanish) was added as a factor to the fixed effects structure. The final statistical model contained sentence construction, object type, residence and language as fixed effects, and no random intercepts or slopes. In addition, the model contained the sentence construction by object type by residence interaction and all two-way interactions involving these three factors as fixed effects. The results from the final statistical model are shown in Table 5. These results partially mirror the previous ones in that, as before, we find significant main effects of construction type and object type as well as a significant object type by residence interaction. In addition, we find significant main effects of language, such that there were significantly more intrinsic reference frame choices in English than in Spanish, and of residence, with significantly more intrinsic reference frame choices for participants residing in Spain compared with the UK. Finally, the sentence construction by object type and the sentence construction by object type by residence interactions were significant. Detailed results from post-hoc simple contrasts using the emmeans package in R are available at https://osf.io/hmn2q/. The marginal R²_GLMM for the final statistical model is 0.15, suggesting that 15% of the variance in reference frame selections can be explained through the fixed factors in the model.

Discussion
We tested whether Spanish-English bilinguals' reference frame choices for static lateral configurations varied based on (a) the syntactic construction used to describe the spatial relationship, (b) the kind of entity functioning as relatum, (c) the language of the task (Spanish or English) and (d) the country of residence of the participants (Spain or the UK). The Spanish-English bilinguals in this study engaged in the exact same tasks as did monolingual English and monolingual Spanish speakers in Olloqui-Redondo et al. (2019). Our discussion therefore focuses on the role of syntactic construction, object type, task language and country of residence on Spanish-English bilinguals' reference frame choices as well as implications of our results for theories of language learning and processing. When relevant, we present numeric comparisons of our results with Olloqui-Redondo et al.'s (2019) results. These comparisons are not based on statistical analyses (as sample sizes across the two studies are too different and as we collected the offline judgement and questionnaire data in a web-based format rather than in person), but instead are tentative numeric comparisons focusing on qualitative patterns of results.
The results from the current study across the two languages and countries of residence were surprisingly similar. Specifically, we found more intrinsic reference frame choices for the possessive construction compared with the non-possessive construction regardless of language and country of residence. In participants' L1 Spanish, 68% of reference frame choices were intrinsic for participants residing in Spain and 63% for participants residing in the UK. Similarly, in participants' L2 English, 71% of reference frame choices were intrinsic for participants residing in Spain and 64% for participants residing in the UK. Compared with Olloqui-Redondo et al.'s (2019) results, these percentages fall between the values for Spanish and English monolinguals: Spanish monolinguals chose the intrinsic reference frame 78% of the time and English monolinguals 52% overall. In addition, object types representing one or both of the endpoints of our animacy continuum differed significantly from other object types, again regardless of language and country of residence. In Olloqui-Redondo et al.'s (2019) data, differences across object types involved only the endpoints of the animacy continuum for monolingual English speakers, whereas for monolingual Spanish speakers reference frame choices were affected more categorically by the animacy of object types. We thus find that syntax and object types contribute in different ways to bilinguals' comprehension of spatial descriptions. In the following, we will explore our results in more detail and consider implications for theories of language learning and processing.

The role of syntactic construction
In English monolinguals, syntactic construction is the main factor driving reference frame choices, with the possessive construction suggesting an intrinsic reference frame and the non-possessive construction a relative reference frame (Olloqui-Redondo et al., 2019). In contrast, Spanish monolinguals show an overall preference for the intrinsic reference frame regardless of syntactic construction, and tend to choose the relative reference frame only if certain non-animate object types co-occur with the non-possessive construction.
Our results show that with regard to the influence of syntactic construction, Spanish-English bilinguals pattern fairly closely with Spanish monolinguals across both of their languages and countries of residence, with an overall preference for intrinsic reference frames regardless of syntactic construction. The main difference is that Spanish-English bilinguals showed fewer intrinsic reference frame choices overall compared with Spanish monolinguals.
In contrast, the Spanish-English bilinguals do not pattern with English monolinguals with regard to syntactic construction. Spanish-English bilinguals in both countries of residence and in both their languages had overall somewhat fewer intrinsic reference frame choices than English monolinguals for the possessive construction. Crucially, with the non-possessive construction, they showed considerably more intrinsic reference frame choices than English monolinguals such that the monolinguals' clear preference for the relative reference frame was not found in bilinguals.
In summary, we find that syntactic construction affects reference frame choices in Spanish-English bilinguals' Spanish and English in a similar manner as in Spanish monolinguals. We thus find evidence of L1-to-L2 transfer: The Spanish-English bilinguals transfer their construction-driven L1 Spanish reference frame choices to reference frame choices in their L2 English. In other words, with regard to syntactic construction, participants use their L1 Spanish preferences when interpreting spatial scenes in their L2 English.

The role of object type
In monolingual Spanish, object type plays a larger role in reference frame selection than in English (Olloqui-Redondo et al., 2019), such that Spanish monolinguals showed an overall preference of 75% or more for the intrinsic reference frame when encountering the non-possessive construction with anthropomorphic, animate and human relata, but showed a slight preference of around 60% for the relative reference frame when encountering the non-possessive construction with unsided and sided relata. Thus, the overall preference for the intrinsic reference frame in Spanish flipped to a slight preference for the relative reference frame in the case of the non-possessive construction and unsided and sided relata. In contrast, English monolinguals encountering the non-possessive construction were less affected by the type of relatum, showing no switch in preference, but instead gradual increases in relative reference frame selection from 85% to 97% as relata became less human-like. Further analyses by Olloqui-Redondo et al. (2019) revealed a categorical difference in terms of reference frame choices for unsided and sided relata on the one hand and anthropomorphic, animate and human relata on the other in monolingual Spanish, contrasting with a slight gradual increase in relative reference frame choices in monolingual English as relata become less human-like.
Our results show that with regard to the influence of relatum type, Spanish-English bilinguals pattern more closely with English monolinguals than with Spanish monolinguals across both of their languages and countries of residence. Specifically, while the effects of relatum type on reference frame choices differ in their details across English monolinguals and Spanish-English bilinguals, they have one thing in common: All statistically significant post-hoc comparisons involve either an endpoint of the animacy continuum, or (in one case) the position adjacent to each of the endpoints (sided and animate). This is in line with the idea that relatum type affects reference frame choices gradually in both English monolinguals and Spanish-English bilinguals, but not in Spanish monolinguals, where the effect is of a more categorical nature.
We identified a somewhat larger gradual increase of relative reference frame choices as relata become less human-like for Spanish-English bilinguals compared with English monolinguals. However, the categorical effect of relatum type on reference frame choices found for Spanish monolinguals is not present in the Spanish-English data. Thus, with respect to how relatum types affect reference frame choices, the speakers' L2 influenced their L1, such that Spanish-English bilinguals showed evidence for L1 attrition.
We should further note that both Olloqui-Redondo et al. (2019) and our study used pictures showing relata in an otherwise almost empty room as stimuli and that the results for the different object types might differ in real-life scenarios. For example, Johannsen and De Ruiter (2013) found that scenes with a realistic living-room background elicited more relative reference frame choices than scenes with a white background. Similarly, in real-life scenarios, human relata might play a special role because they might also be considered to be an addressee or a partaker in the speech event and because people might consider themselves to be more of an active participant than a passive observer with human relata (cf. Marghetis et al., 2020). This might affect reference frame choices for human relata in real-life scenarios. Future studies are needed to determine to what extent our results for object type can be extended to other scenarios.

The role of task language and country of residence
We identified a main effect of language, such that Spanish-English bilinguals showed significantly more intrinsic reference frame choices in English than in Spanish. Since intrinsic reference frame choices are less frequent in English monolinguals compared with Spanish monolinguals, this effect is difficult to interpret; if the pattern is confirmed, further studies will be needed to explain it. Future studies could also explore whether Olloqui-Redondo et al.'s (2019) and our results for Spanish versus English extend to the sagittal axis (front/back). Specifically, previous research (cf. Marghetis et al., 2020; Pitt et al., 2021) suggests that relative reference frame choices might be more frequent for the sagittal axis than the lateral axis, and it would be worth exploring whether this is the case for English and Spanish speakers too.
The analyses also showed a main effect of residence, and residence interacted significantly with object type as well as with object type and syntactic construction.
The main effect of residence suggests a clear effect of English exposure in that Spanish-English bilinguals residing in the UK show overall more relative reference frame choices than those living in Spain. This is in line with the monolingual data in Olloqui-Redondo et al. (2019), which also shows more relative reference frame choices overall for English monolinguals compared with Spanish monolinguals. These results are in line with experience-based theories of language acquisition and processing and suggest that Spanish-English bilingual participants may track the overall frequencies of encountered intrinsic and relative reference frame choices, such that increased exposure to English may lead to increased relative reference frame choices in Spanish-English bilinguals. These results further suggest that it matters not only which languages we speak, but also where we speak them in terms of which other languages are spoken there.
The significant interactions with residence may reflect additional subtle differences due to English exposure. Specifically, bilinguals residing in the UK show a numerically smaller gradual increase of relative reference frame choices as relata become less human-like (14% in English and 18% in Spanish) than bilinguals residing in Spain (21% in English and 27% in Spanish). Thus, reference frame choices in bilinguals in the UK resemble the English monolingual pattern of a small gradual increase of relative reference frame choices as relata become less human-like (12%; Olloqui-Redondo et al., 2019) more closely than those of bilinguals in Spain. The post-hoc comparisons for the Spain and UK data also reflect this: While reference frame choices for both ends of the relatum type continuum, that is, both unsided and human relata, differ from all other relata for participants residing in Spain, only unsided relata differ from all other relata for participants residing in the UK. Thus, the smaller gradual increase in relative reference frame choices as relata become less human-like found for participants in the UK compared with participants in Spain is reflected in fewer pairwise comparisons in the UK data reaching significance compared with the Spain data.
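The numeric comparison in this paragraph can be laid out as a small Python calculation. It uses only the values reported above, is a descriptive comparison rather than a statistical test, and the variable names are illustrative.

```python
# Gradual increase in relative reference frame choices as relata become
# less human-like (in %), as reported in the text
increase = {
    ("UK", "English"): 14,
    ("UK", "Spanish"): 18,
    ("Spain", "English"): 21,
    ("Spain", "Spanish"): 27,
}

# English monolingual value from Olloqui-Redondo et al. (2019), as cited in the text
ENGLISH_MONOLINGUAL = 12

# Distance of each bilingual group and task language from the monolingual English value
distance = {group: value - ENGLISH_MONOLINGUAL for group, value in increase.items()}
closest_group = min(distance, key=distance.get)
```

Both UK values (differences of 2 and 6) lie closer to the English monolingual value than either Spain value (differences of 9 and 15).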
Thus, despite previous research showing abundant effects of immersion in a linguistic environment, including structural effects on the brain (Deluca et al., 2019), our Spanish-English bilinguals remained relatively resistant to this kind of influence in their actual reference frame choices and we find only subtle effects of English language exposure on reference frame choices. Specifically, Spanish-English bilinguals may not have picked up on the (almost grammaticalised) influence of syntactic construction on reference frame choices in English. Instead, they may have registered that relative reference frame choices are more common overall in English than in Spanish, but not that this higher frequency is almost entirely driven by the non-possessive construction.
Overall, Spanish-English bilinguals' reference frame choice patterns were similar regardless of whether the task was in English or in Spanish and regardless of whether they resided in Spain or the UK. The fact that we found only a few differences across the task languages and residences that patterned with English or Spanish monolinguals suggests that Spanish-English bilinguals' two languages may have partially merged when it comes to interpreting spatial scenes involving the lateral axis. This merged system includes aspects taken from both the L1 and the L2.

Implications for theories of language learning and processing
Our results suggest both L1-to-L2 transfer and L1 attrition in our data. Overall, this suggests a bidirectional influence of bilinguals' languages on spatial cognition, which has also been attested in other domains (cf. Köpke, 2004; Köpke & Schmid, 2004; Lambert & Freed, 1982; Seliger & Vago, 1991). It also attests to our cognitive flexibility across the lifespan and suggests that one's native language remains flexible even in adulthood. But why does the effect of syntactic structure on reference frame selection transfer from the L1 to the L2, while relatum type is subject to L1 attrition? We will now explore a tentative possible explanation for this pattern that draws on experience-based theories of language learning and processing.
Prediction and exposure are central to rational and implicit learning accounts of language acquisition and use (Chang et al., 2006; Jaeger & Snider, 2013). Specifically, prediction error, that is, encountering input that does not match one's predictions, drives adaptation and learning in these accounts: The more unpredicted the input, the greater the incurred prediction error and the greater the adaptation or learning effect (Kuperberg & Jaeger, 2016; Ness & Meltzer-Asscher, 2021). Very common structures will be strongly predicted, and less common structures will be relatively less strongly predicted. Encountering an alternative unpredicted structure causes a greater prediction error and leads to more learning the stronger the initial prediction was.
Predictive processing is pervasive in the L1 (Kamide, 2008). Predictions can occur in response to various linguistic cues, and different levels of linguistic representation can be predicted (e.g., Lew-Williams & Fernald, 2010; Weber et al., 2006). However, there is less evidence for predictive processing in the L2 (Kaan, 2014). L2 speakers may not engage in predictive processing even when they show knowledge comparable to L1 speakers of the words and syntactic structures involved (Grüter et al., 2017; Lew-Williams & Fernald, 2010), and even when the predictive cue is identical in their L1 and L2 (Foltz, 2021a). Whether or not bilinguals engage in predictive processing in their L2 may depend on factors such as proficiency (Hopp, 2013) and similarity of the L1 and L2 (Foucart & Frenck-Mestre, 2011).
Importantly, in those cases where speakers do not engage in predictive processing, they will not incur any prediction errors, and no adaptation or learning is expected to occur (Foltz, 2021b). Even prolonged L2 exposure would then not affect bilinguals' processing. Thus, for learning to occur, speakers need to use the involved cues for prediction, and they need to predict the particular structures involved.
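The logic of error-driven adaptation outlined above can be illustrated with a minimal Python sketch. It uses a simple delta-rule update, which is assumed here purely for illustration and is not the specific model proposed in any of the cited accounts. The function name, the learning rate and the starting expectations are hypothetical.

```python
def update_expectation(p_expected, observed, learning_rate=0.1, predicts=True):
    """Return (new expectation, prediction error) after one observation.

    p_expected: current expectation (0..1) that structure A will occur
    observed:   1 if structure A occurred, 0 if the alternative occurred
    predicts:   False models a comprehender who makes no prediction at all
    """
    if not predicts:
        return p_expected, 0.0  # no prediction, hence no error and no learning
    error = observed - p_expected
    return p_expected + learning_rate * error, error

# The expected structure fails to occur (observed = 0) in all three cases
strong_new, strong_err = update_expectation(0.9, 0)  # strong initial prediction
weak_new, weak_err = update_expectation(0.6, 0)      # weaker initial prediction
none_new, none_err = update_expectation(0.9, 0, predicts=False)
```

The stronger the initial prediction, the larger the error and the larger the resulting shift in expectation; without a prediction, the expectation stays where it was, however often the unexpected structure is encountered.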
Our results suggest that Spanish-English bilinguals might use only some cues, but not others, to make predictions about reference frame choices in both their L1 and L2. Specifically, it appears that our Spanish-English bilinguals used relatum type information from both their L1 and L2 long-term input, but relied on long-term syntactic structure information from their L1 only for their reference frame predictions.
Relatum type plays a larger role for reference frame choices in the Spanish-English bilinguals' L1 (Spanish) compared with their L2 (English). It seems, therefore, that our bilinguals may have tracked aspects of the input that are important in their L1 (relatum type), but tended to disregard aspects of their L2 input that are less important in their L1 (syntactic structure). This would explain why the Spanish-English bilinguals show an English-like gradual increase in relative reference frames the less human-like the relatum and why this gradual increase is smaller, that is, more English-like, in the case of bilinguals residing in the UK: Exposure to unpredicted patterns in English would over time result in a more English-like pattern compared with Spanish monolinguals. In other words, Spanish-English bilinguals are used to taking relatum type into account when choosing a reference frame, and may therefore track relatum-type information in English and in Spanish. However, as syntactic structure is less relevant in Spanish, the difference in treatment may remain largely untracked and bilinguals stick with the L1 pattern.
The fact that the bilinguals are not only exposed to English, but also to Spanish, might explain why the gradual increase in relative reference frames from human to unsided relata is larger for the Spanish-English bilinguals, especially those residing in Spain, than for the English monolinguals in Olloqui-Redondo et al. (2019). Furthermore, English speakers are more likely to use the possessive construction for animate entities (e.g., John's house rather than the house of John) and the non-possessive construction for inanimate entities (e.g., the legs of the table rather than the table's legs; Rosenbach, 2008). As such, Spanish-English bilinguals are likely to encounter relative reference frames with inanimate relata in English, which then might lead to a further increase in relative reference frame choices for inanimate relata in Spanish-English bilinguals, especially those residing in the UK.
Our results would then also suggest that Spanish-English bilinguals track syntactic construction less robustly when it comes to choosing reference frames. Specifically, Spanish-English bilinguals seem to be sensitive to the overall amount of relative reference frames encountered, such that exposure to English leads to an increase of relative reference frame choices compared with Spanish monolinguals, especially for Spanish-English bilinguals residing in the UK. But we find no evidence that Spanish-English bilinguals are sensitive to the strong effect of syntactic construction on reference frame choices seen in English. Our results are compatible with the idea that Spanish-English bilinguals track syntactic constructions to inform their reference frame choices either less robustly overall (possibly because Spanish has only one unmarked construction, with the other being marked), resulting in slower adaptation, or only in their L1 Spanish, but not in their L2 English, possibly due to resource limitations when processing their L2 English (Hopp, 2009). Alternatively, Spanish-English bilinguals may not make strong predictions based on the syntactic construction they encounter and thus may not experience a large enough prediction error to warrant sufficient long-term adaptation (Ness & Meltzer-Asscher, 2021). Thus, even after prolonged exposure to English, Spanish-English bilinguals pattern with Spanish monolinguals when it comes to the effect of syntactic construction on reference frame choices.
Our results are thus in line with previous studies on prediction in L2 processing, which suggest that L2 speakers engage in predictive processing in fewer processing situations than L1 speakers (Kaan, 2014). It seems that Spanish-English bilinguals may make reference frame predictions based on relatum type, which is more relevant in their L1, but less so based on syntactic construction, which is more relevant in their L2. In other words, the kinds of phenomena that speakers track in language processing and that they use to make predictions may be those that are relevant in their L1, but not necessarily those that are relevant in their L2. Aspects of the input that are less relevant in bilinguals' L1 may not be tracked as robustly because it is more resource intensive to track non-native aspects than to track native aspects.

Conclusions
We found that syntax and object types contribute in different ways to Spanish-English bilinguals' comprehension of spatial descriptions. Specifically, bilinguals pattern with Spanish monolinguals concerning syntax, and with English monolinguals concerning object types. As a result, we find evidence for both L1-to-L2 transfer and L1 attrition, such that bilinguals display a merged system that includes aspects of both their L1 Spanish and their L2 English. L2 exposure affects only aspects that are important in the L1 (object type), but not those that are important in the L2 (syntax). We suggest that bilinguals may be better able to track input patterns for aspects relevant in their L1, but less so for aspects relevant in their L2.