Introduction
Linguistics has long been remarkable for its inclusion of cognitive-centered approaches which privilege the individual mind as a unit of analysis, interactional approaches that focus on the complex structures emergent in multi-person social settings, and community-based approaches that study patterns across large networks of interactions. All three approaches are crucial to full understanding, as they examine the same phenomena at different scales. In particular, advances in one can make evident limitations in the others. Most recently, interaction- and community-focused work in the “third wave” of variation research (Eckert 2012) has documented the complexity in speakers’ social engagement with language variation and in so doing has strained our existing models of sociolinguistic cognition to the breaking point. I suggest in this chapter that we need to rebuild these models in light of this third-wave evidence, in addition to work in sociophonetics and sociolinguistic cognition and related advances in language processing and social cognition.
Sociolinguistic cognition refers to the processes through which language and social structures encounter one another within the mind, not necessarily specific structures dedicated solely to sociolinguistic cognition. When it comes to cognitive theorizing, the field of sociolinguistic variation has relied overwhelmingly, implicitly or explicitly, on the model developed by Labov (1966, 1972). In these early works, the language system is distinguished from a socially oriented system (usually positioned simply as speakers themselves), which expresses and acts on social preferences for language. This model has been further developed in the form of the sociolinguistic monitor or SLM (Labov 1993; Labov et al. 2011), a deliberative module, positioned “downstream” from a grammar unit, which makes and acts on judgments regarding the social prestige of possible utterances, as well as socially judging utterances of other speakers.
In this chapter I discuss the current state of cognitive modeling in sociolinguistics and propose a new approach. In the second section, I describe the primary current model(s) of sociolinguistic cognition. The third section summarizes the sociolinguistic behavior which a model must capture. The fourth section presents findings from sociophonetics that can inform our theorizing. A summary of relevant models from language processing and social cognition is provided in the fifth section. The final section presents thoughts on a new direction for model building, focusing on three independently motivated constructs: a grammar with associational links to non-linguistic concepts, a person perception system, and a behavioral monitoring system.
Existing Models
Early sociolinguistic variation research focused on the community level, deliberately challenging the primacy of the individual speaker privileged by the Chomskyan approach. But while a community-level analysis is extremely valuable, it must integrate with a model of cognition at the individual level, which provides explanations of what speakers are able to do with their language and under what circumstances. All sociolinguistic work draws on some understanding of the cognitive abilities of speakers. Much of this theorizing has been done implicitly, as researchers discuss in general terms how “aware” or “conscious” speakers are of this or that form without defining precisely what is intended by these terms, while some has been explicit. Little of it, however, has taken advantage of the advances in related fields (cognitive psychology, social cognition) which have investigated and problematized ideas like awareness and consciousness.
In the current discussion, I will focus on the challenge of modeling the cognitive processes which link features of language to other aspects of social understanding. This is to limit the scope of the chapter, not to dismiss the considerable theoretical challenge of modeling the rest of the processes through which language is produced and perceived. Variationists have much to contribute to that project and have done so repeatedly, most often taking theoretical tools from other areas of linguistics and testing their ability to treat variable data (e.g. Guy 1991). For the current work, it is the interface between those linguistic systems and the social cognition that is at issue, or in more familiar sociolinguistic terms, the cognitive modeling of the social significance, social meaning, and/or indexical meaning of language forms.
Labov (1966, 1972) posits shared community-wide evaluative norms, according to which speakers recognize prestige forms. The apparent ease with which speakers perform this evaluation “correctly” (i.e. in agreement with others in New York City) is contrasted with the difficulty they often show in recognizing and consciously manipulating their own linguistic productions. This difficulty leads to one of the key theoretical constructs of variation, that of attention paid to speech. In this model, language production directly from the grammar is relatively effortless, but for social reasons speakers, particularly stigmatized speakers (such as working class New Yorkers), may wish to diverge from the speech their grammar produces, via effortful language management. Labov (1966) documents multiple cases of speakers who appear unaware of their own speech patterns, typically reporting that they use the prestige norms more often than their interview data would suggest. This contrast between speakers’ inability to report which forms they use and their ability to manipulate those forms in predictable ways is an important point for any model of sociolinguistic cognition. As such, it provides a key rationale for positioning socially useful language manipulation as effortful and resource-heavy.
The sociolinguistic interview as a data-collecting tool is structured around the manipulation of attention paid to speech, intended to lure it out with standard language tasks like reading aloud on the one hand, or distract the conscious mind from monitoring with exciting speech content (“danger of death” stories) on the other. The malleability of a given variable to such tricks reveals useful information about whether and how speakers have positioned it on the prestige scale. Labov found that variables may show differential use across speech tasks even absent explicit speaker commentary about them, and suggested a three-way division of variables on this basis (Labov 1972). Indicators are variables which exhibit socioeconomic patterning across speakers, but no task-based shifting, suggesting that speakers’ monitors do not register them as standard/non-standard. In contrast, markers do show such shifting; they are listed, as it were, in the monitors. Finally, stereotypes are variables which show both types of production patterns and additionally are culturally named and discussed (e.g. “dropping one’s g’s” or “dis and dat”). This tripartite division implies that the assessment and social control systems have access to some but not all linguistic objects, including some which are not available to introspection and explicit verbal description.
This idea of the grammar (producing the vernacular) on the one hand and a language-external process of monitoring on the other is further developed in Labov (1993), which introduces the sociolinguistic monitor, based on “the unobservability of linguistic structure.” This phrase refers to the observation that social evaluations are typically prompted by the frequencies of specific language forms, particularly words or pronunciations of words, rather than more complex relationships between multiple elements in a grammar, such as phonological mergers or splits. Labov proposes that the cognitive capacity for socially evaluating and controlling language features is restricted, either entirely or primarily, to “surface” language forms like words and sounds, while being unable to observe “deeper” aspects of language like phonological categories or syntactic structures. These limitations necessitate positioning the monitor outside the grammar.
The sociolinguistic monitor is responsible for the social evaluation of speech generally, a function which includes forming social understandings of other people based on their own speech, developing a self-reflective (often inaccurate) image of one’s own speech patterns and controlling (often unsuccessfully) speech production to maximize its social appropriateness.
The SLM has received increased attention recently, beginning with the work of Labov et al. (2011), who examined listener responses to varying frequencies of (ING). The conceptualization of the monitor as sensitive to frequencies rather than individual tokens raises questions as to how this frequency assessment is managed, and Labov et al. (2011) identified three areas of interest: the length of time over which it can accumulate; the size of frequency differences to which it is able to attend; and its response pattern. They found that the impact of speech cues accumulates over a window at least a minute long and that the effect of subsequent tokens is attenuated in the face of preceding marked tokens. Wagner (2013) reports a similar curvilinear response, but Levon and Fox (2014) failed to replicate the effect with British listeners, and suggest differences in the social meaning of the variables in the two contexts as an explanation. Given the mixed results and the limited variables investigated, caution is warranted in concluding that the curvilinear response is a fundamental of perception. Nonetheless, the pattern is intriguing.
The monitor as a construct is part of a larger theory of sociolinguistic language patterns. The concept of the vernacular (Labov 1972) refers to speech produced exclusively by the grammar, with little to no interference by the monitor. Guy and Cutler (2011) expand on this claim, suggesting that the work of the monitor in helping to produce an “inauthentic” sociolinguistic self (styles based on the monitor rather than the grammar) will show distortions in the linguistic constraints of a variable.
In addition, the distinction between change from below and change from above (Labov 1966) is based on the theory that the sociolinguistic monitor operates within a relatively high level of awareness, such that it is only able to judge or alter certain speech patterns. Based on the assumption that speakers typically want to sound more prestigious rather than less prestigious, it was hypothesized that only prestigious forms (“from above” in the socioeconomic sense) could survive as incoming changes in situations where speakers knew about them (“from above” in the awareness sense). Changes from below in the former sense would be nipped in the bud by speakers’ sociolinguistic monitors if they were not also from below in the latter sense, that is, invisible to the monitor.
In the time since the model was first introduced, however, critiques have emerged, forcing the reconceptualization of a number of its predictions. One such critique challenges the model’s focus on the single dimension of prestige. As early as Trudgill (1974), variationists observed male speakers apparently orienting to working-class forms, in a move that was termed “covert prestige,” but was not fully integrated into the model. More tellingly, Rickford (1986) pointed out the essentially consensual model of social class underlying the Labovian approach, and contrasted it with the explicitly Marxist understanding of class shown by many of the speakers in his own data from Guyana. Further complexity was introduced by Eckert (2000) and others, showing that patterned linguistic variation may be observed in response to many social attributes that are not about socioeconomic status, or not only about socioeconomic status. Linguistic variation has been linked to high school social groups (Drager 2009; Eckert 2000; Moore 2011), gang affiliation and degree of engagement (Mendoza-Denton 2008), sexual orientation (Podesva 2011), and many other social structures.
This theoretical development goes beyond simply expanding the number of social constructs the monitor is concerned with, however. Third-wave variation work has drawn from linguistic and cultural anthropology to explore the semiotic relationships linking variable language forms to other constructs. While all three of Peirce’s (1901) sign types are relevant for understanding the social construction of variation, the field has centrally concerned itself with indexical links (Silverstein 1976; Ochs 1992; Silverstein 2003). Speakers and listeners learn these links through observations of co-occurrence and meta-discourse, which position particular forms tied to particular meanings. These connections are made to seem natural and self-evident through their repetition and their embedding in culturally normative systems of understanding. Having learned these connections, speakers can employ the forms to try to invoke their associated social constructs, either to appropriately align with existing features of a situation or to alter a situation (Silverstein 1976). This work is a crucial tool in the construction of social structures both large and small. Linguistic features, in this view, are resources for social activity, similar to other social objects like wearable fashion, consumption practices, bodily hexis, etc. (Hebdige 1979; Bourdieu 1982; Coupland 2007).
Speakers’ ability to construct any given personal style, stance, or situational feature is constrained primarily by two factors: their sociolinguistic facility with the semiotic systems in question; and the willingness of their interlocutors (at any cognitive level) to interpret the performance in the ways intended. The former includes the ability to produce the appropriate forms in the right linguistic contexts, but also to perceive existing social features of the context and produce combinations of cues that are socially coherent for the audience. This ability may be influenced by the amount and type of exposure a speaker has had to the forms in question, but also by their motivation and perhaps general variation in linguistic flexibility. Audience acceptance of intended meanings will presumably be influenced by the sociolinguistic performance itself, but also by other characteristics of the speaker including demographic categories, as well as the social goals of the audience (Campbell-Kibler 2008).
In this approach, the complexity of the speaker/hearer’s social abilities and responsibilities increases substantially. The meanings that they may express and understand through language variation are multidimensional and subject to constant change. To take just a single example, Zhang (2005) documents a set of variables used by young professionals in Beijing. Both the variables and the professionals can be loosely categorized within an overarching framework: the variables are generally either local, part of a Beijing-specific accent, or part of standard Mandarin and therefore placeless. The speakers Zhang interviewed were employed by either state-run businesses or multi-national corporations. These broad categories were linked, with state employees more likely to use local features. But within these broad categories, Zhang documented tremendous nuance, specific to the career trajectories of speakers and the social and characterological history of the variables. Beijing professionals developing sociolinguistic selves must not only consider whether they want to identify as a local or as a cultured and cosmopolitan international citizen. In the context of only a single variable, they must also consider their own relationship, in a given conversation, to the characterological figure of the “alley saunterer,” a shady, “in the know” man who frequents alleys and is involved in local black markets and other illegal activities, and to both his strengths (local knowledge, interpersonal networks) and his weaknesses (fecklessness, shady moral character).
These patterns of behavior documented in the third-wave tradition are difficult to account for within the Labovian cognitive model. First, the sociolinguistic system is tasked not only with monitoring the prestige or standardness of language, but with monitoring any social meaning, the import of which can only be understood in a complex larger context filled with interactants, goals, and ideologies. The social complexity of both speaking and listening increases enormously in this framework, such that the task of tracking the social meaning of both incoming and outgoing language with a conscious, effortful system becomes intractable. To incorporate the insights of the third wave, our cognitive model needs updating.
What Are We Modeling?
The previous section made clear that the current state of cognitive modeling in sociolinguistics, while capturing some crucial insights, is inadequate to the task of modeling our current understanding of sociolinguistic behavior. This section will sketch some of the abilities and phenomena that a model of sociolinguistic cognition needs to account for. The sociolinguistic abilities of individuals are complex and embedded in larger, even more complex systems, but may roughly be categorized into three main types: the production of sociolinguistically meaningful forms; the comprehension (linguistic and social) of such forms; and metapragmatic behaviors which create, negotiate, and reaffirm meaning-form links. Note that in practice comprehension can only be observed through metalinguistic acts or further production, but given the disciplinary, theoretical, and methodological differences in the approaches to “perception” on the one hand and “ideologies” on the other, they are divided here for ease of explication.
As noted in the section above, the field of sociolinguistic variation has shown over the years that speakers are capable not only of producing language with coherent referential meaning, but of adapting their non-referential choices to contexts and goals. Silverstein (1976) complicates our modeling task by observing that only some of such uses may be described as presupposing indexical uses, in which the language forms produced share indexical links to entities (people, topics, situational dynamics) already independently present in the discourse. Many others represent creative uses, in which the use of the form itself introduces an entirely or partially new entity, as, for example, the first use of the French form tu between new acquaintances creates a bid for intimate status which may be responded to in various ways. Thus, sociolinguistic production firmly includes behavior that we might characterize as volitional, such as initiating an informal or intimate relationship with an acquaintance. Further, such volitional or agentive behavior at least some of the time involves the adjustment of forms upon which speakers are unable to comment explicitly in any detail (e.g. Labov 1963).
While there is a connection between consciously articulable goals and sociolinguistic behavior, they are not always in synchrony. Despite the evidence that speakers are able to manipulate their linguistic forms in ways that support their social goals and beliefs as they would verbally articulate them, the evidence also points to limitations on these abilities. Speech errors and self-corrections do occur (see Kitzinger 2013 for an overview), including those which revolve around indexical rather than referential meanings. Speakers require a certain level of exposure to skillfully produce language forms and the necessary level appears to vary based on, among other factors, the linguistic level of the forms. New lexical items or new uses for existing items may only require a single instance to adopt, while certain complex phonological systems may require sustained exposure, perhaps only during childhood, for speakers to produce identically to others (Payne 1980). Our model must allow speakers to learn new forms and new links between forms and social constructs, but should not predict instantaneous learning or perfect performance.
In addition to creating sociolinguistic performances, individuals are able to incorporate the speech of others into their mental understanding of the situation. Listeners are able not only to extract linguistic meaning from utterances, but to recover indexical links between the forms used by speakers and social structures, both presupposing and creative. More specifically, listeners are able to associate language forms with qualities of the speaker (Lambert et al. 1960), the topic (Cargile and Giles 1998), stances towards interlocutors (Ball et al. 1984) and others. Note, however, that the meanings recovered by listeners may not always be precisely (or even generally) those imagined by the speaker delivering the message.
When forming these social perceptions, listeners are able to take into account pre-existing information (from prior linguistic cues or other sources of information) and use them to guide perception. The contribution of language cues to the ultimate percept may differ based on the other linguistic cues available, such as regional accent (Campbell-Kibler 2007), extra-linguistic information like profession (Campbell-Kibler 2010), situational constraints such as speech task (Cargile 1997), or message content (Cargile and Giles 1998). More complex factors of stylistic structure have also been documented. Pharao et al. (2014) found that fronted /s/ tokens which prompt listeners to hear white Danish boys as gay-sounding become irrelevant to sexual orientation when placed in the speech of speakers of Danish “street style” associated with descendants of Turkish and other immigrants. Further, evaluations need not depend only on one speaker’s performance, but can be developed in response to a speaker’s positioning in relation to another, for example judging a job applicant’s choice to maintain a “broad” accent differently when they are interviewed by a speaker with a “broad” vs. a “refined” accent (Ball et al. 1984).
Like production, listener perceptions show impressive skills, but also limitations which offer potential insight into their workings. Evaluator mood is a well-known influence on evaluations (Forgas and Moylan 1988) and evaluations of people are no exception (Forgas and Bower 1987). More intriguingly, some evidence suggests that different linguistic performances may be differently susceptible to mood effects: Campbell-Kibler (2011) found that speakers using the -ing form of (ING) were exempted from a mood effect on ratings of intelligence which impacted other recordings.
Although perception can occur on its own, without observable behavioral consequences, such episodes are difficult to study, meaning that all research on perception is also research on metalinguistic characterization, the last category of sociolinguistic behavior to be accounted for (Jakobson 1960; Silverstein 1976). Such characterization can take place in a wide range of ways, including explicit description, assessment of speakers, deployment of variation in response, and others. Agha (2007: 16) introduces the notion of reflexive activity, the “activities in which communicative signs are used to typify other perceivable signs.” Through these activities, reflexive models are produced, transmitted, and altered, and through these models speakers and listeners make social sense of their own and others’ linguistic (and other) behavior. Such activity is as much a part of sociolinguistic behavior as the utterance of sociolinguistically variable forms itself.
Sociolinguistic beliefs are embedded throughout the explicit and introspectively accessible belief systems speakers hold about their worlds and the people in them, but we do not yet know how integrated these beliefs are with the systems responsible for production and perception. Labov (1993) notes the “reflexive stigma principle” that people who use stigmatized forms are often those most inclined to criticize them. While it is not clear that this is a general principle, we do see repeated evidence that production and perception behavior often do not align with stated ideologies. Kristiansen (2009) discusses the disconnects between Danish youths’ beliefs about regional varieties when explicitly asked as opposed to their social responses to actual speakers of those varieties. Most notably, regional varieties, declared the most preferred in explicit surveys, lead speakers to be ranked as least intelligent and least socially desirable. Observations like these have led to the divide between overt and covert attitudes.
Already our summary of the sociolinguistic phenomena to be captured has become multi-layered, as we observe that specific language forms and indexical links that appear to influence some types of behavior (e.g. speech choices in different speech tasks) may be invisible in other types of behavior (e.g. explicit metapragmatic discussion of forms). These fractures and disconnects provide valuable starting points for our models, because they point to joints in the systems which manage these behaviors. The next section tackles such disconnects more explicitly, by examining the small body of work which experimentally probes sociolinguistic cognitive processes.
Where Systems Collide
One of the foundational challenges in linguistics is the question of how listeners are able to understand speech as well as they do, given the incredible amount of variability it contains. For some time, phoneticians hypothesized that the comprehension system filtered out and discarded this variability, but more recently it has become clear that much information about variable productions is retained and able to influence future perception and production processes (Goldinger 1998). Moreover, listeners appear to be able to map specific kinds of variability to external influences and adjust their learning patterns accordingly, discounting situationally triggered interference, while generalizing apparently speaker-specific cues (Kraljic et al. 2008).
Part of this tradition has explored not only idiosyncratic differences between speakers, but also structured differences linked to known social categories. This work has shown that connections between language forms and other social constructs can be accessed and maintained cognitively through processes which are not easily accessible to introspection. For example, Niedzielski (1999) showed that invoking a national or regional variety label can shift the phonetic character listeners report hearing in speech immediately after hearing it. But Hay and Drager (2010) have shown that this association need not be explicit, occurring in response to seemingly incidental exposure to stuffed toys symbolic of the varieties (in this case, New Zealand and Australia). Further, the effect appears to depend on the speakers’ attitudes towards the variety invoked, seen even more clearly in a production-based follow-up study (Drager et al. 2010).
Relatedly, Strand (1999) demonstrated that listeners take not only the sex of the speaker (based on a photo) into account when judging the boundary between /s/ and /ʃ/, but the gender typicality, such that men’s and women’s faces rated by independent judges as highly masculine and feminine respectively provoked a stronger difference in /s/-/ʃ/ category boundary than did faces rated as less masculine and feminine. This suggests that the classification of phonemes may be influenced by nuanced social factors. Staum Casasanto (2008) has likewise shown that the perceived race of a face presented as the speaker influences how likely listeners are to hear a string like [mæs] as the word mass versus a reduced form of mast. The identification of vowel quality and the categorization of closely related consonants are central linguistic processes, typically seen as rapid and outside of conscious awareness. Even if the effects are triggered by conscious reflection on the nationality, gender, and/or race involved, the ability of such reflection to influence such rapid, frequent, and early processes as phonological categorization requires a model of linguistic processing which integrates social information at more levels than the sociolinguistic monitor allows.
The rapid nature of sociolinguistic integration is further supported by Van Berkum et al. (2008), who document that the social category of a voice (young vs. old or posh vs. less posh) can influence the patterns of event-related potential (ERP) response associated with semantically surprising information. Brain responses within 200 to 300 milliseconds register the difference between an adult’s voice saying “I have a glass of wine with dinner” (unsurprising) and a child’s voice saying the same thing (surprising). For these effects to appear, our social expectations about who is speaking and thus what practices they are likely to engage in must be used quite early in the language processing stream.
Another aspect of language processing which appears to be both functional outside of introspective awareness and susceptible to social influence is that of alignment or accommodation. Accommodation has long been studied as an interactional strategy through which speakers appeal to or distance themselves from their interlocutors by manipulating their linguistic similarity (Giles and Powesland 1975). More recently, alignment has been investigated as an automatic response to linguistic input (Bock 1986; Pickering and Garrod 2004). This tradition has demonstrated that these effects emerge even in contexts where it is difficult to attribute a conscious interactional strategy to the behavior, typically because the social setting is highly impoverished (e.g. repeating words after a pre-recorded voice, as in Goldinger 1998).
Despite the apparent lack of introspective motivation or control in such research (a pre-recorded voice is not able to appreciate accommodation towards it), evidence has emerged for a social dimension to such effects. Babel (Reference Babel2010) has shown that New Zealanders’ degree of vocalic accommodation while shadowing an Australian speaker correlated with their implicit attitudes towards Australians, although was unaffected by the (somewhat heavy-handed) manipulation of liking towards the specific speaker shadowed. Along similar lines, Yu et al. (Reference Yu, Abrego-Collier and Sonderegger2013) found that explicit liking ratings of a speaker predicted degree of convergence towards the exaggerated VOTs in his narrative. They also found an effect of their likability manipulation, but in the unexpected direction, with the less likable guise prompting greater convergence. Given the startling nature of their manipulation (the unlikable guise involved the speaker describing a blind date insultingly), it may be that memorability or attention is another key factor in accommodation patterns.
A few studies have reported similar effects for syntactic structures. Balcetis and Dale (Reference Balcetis and Dale2005) show that participants were more likely to re-use syntactic structures used by a confederate when that confederate behaved in friendly and pro-social ways as opposed to rude ways. They also found increased convergence for annoyed versus patient confederates, possibly as an interactional repair strategy or due to the memorability or attention factor discussed for Yu et al. (Reference Yu, Abrego-Collier and Sonderegger2013). Weatherholtz et al. (Reference Weatherholtz, Campbell-Kibler and Florian Jaeger2014) similarly found effects of social judgments on syntactic priming, this time in a less interactive task. After listening to a politically charged diatribe in one of three accents, listeners were asked to describe line drawings in an apparently unrelated task. They were more likely to adopt the dative construction (DO, Give me the book vs. PO, Give the book to me) when they rated the speaker as more standard and when they were personally inclined towards compromise in conflict situations. Two further factors showed different priming effects for the two forms (DO vs. PO): the perceived similarity between the speaker and the participant, and the perceived intelligence of the speaker. These latter interactions suggest that such priming is influenced not by social factors alone, but by a complex interaction between social assessments and expectations, which are driven in part by previous experience of frequencies (see Jaeger and Snider Reference Jaeger and Snider2013).
All of this work taken together shows that our cognitive models of language and social processing must allow for these systems to be integrated in a parallel manner, rather than the original model of an independent grammar which only feeds into the social system after having performed its function. What it does not yet do is tell us exactly how and where these systems are integrated and through what mechanisms. We can, however, identify some hypotheses. First, at least some of this processing is functioning in systems not dependent on introspectively available reasoning, given that it occurs even when such reasoning would dismiss it as unnecessary or even counter-productive, as, for example, in Hay and Drager (Reference Hay and Drager2010), when incidental exposure to the concept of a variety impacted processing of a different variety. Next, expectations seem to play a key role, such that social and other types of non-linguistic reasoning may set expectations of behavior (including linguistic choices) that are used in the online processing or management of incoming stimuli.
These hypotheses leave many open questions about the nature of integration between the social and linguistic systems. Prior to tackling them directly, it seems wise to turn to the existing literatures on consciousness, memory research, social cognition, and language processing. Given the breadth of all of these domains, I will not present a thorough overview of any, but will rather focus on the elements of each most likely to be relevant for the modeling task as outlined above.
Sociolinguistic Cognition Is a Kind of Cognition
In order to formulate a plausible model of sociolinguistic cognition, it is necessary to understand as fully as possible the larger cognitive systems within which it operates. This is a task easier said than done, given that the larger study of human cognition is a work very much in progress. Nonetheless, some advances have been made which may shed light on sociolinguistic cognition. This discussion will focus on four areas: consciousness, memory, social cognition, and language processing.
Linguists have long struggled with the idea of conscious awareness and its role in sociolinguistic processing, with some theorists dismissing the possibility of socially motivated language processing outright, based on an assumption that social reasoning is necessarily conscious. Labov documented the complex social associations of the centralization of /ay/ and /aw/ in Martha’s Vineyard, then stated: “It has been noted that centralized diphthongs are not salient in the consciousness of Vineyard speakers. They can hardly therefore be the direct objects of social affect” (Labov Reference Labov1972: 40). This assumption remains active in the field; Brulard and Carr (Reference Brulard and Carr2013: 151) argue that their evidence of variable accommodation of Scottish Standard English speakers to RP could not be mediated by attitudinal factors regarding national identity, based on their belief that “sense of national or regional identity is necessarily conscious, and that unconscious accent accommodation falls below the level of conscious sense of identity” (emphasis in original).
While this idea seems to be common among linguists, it is not well supported by cognition research. Cognitive psychology has shown the wide range of processes which are carried out without effort, deliberation, or introspective awareness (Evans Reference Evans2008), including many social cognitive processes (Hassin et al. Reference Hassin, Uleman and Bargh2005). Indeed, the question of the role of consciousness as any kind of causal factor at all in human behavior is a topic of some debate, with many researchers portraying consciousness as purely epiphenomenal (for a discussion, see Baumeister et al. Reference Baumeister, Masicampo and Vohs2011). In social cognition specifically, evidence has been offered for automatic elements of person perception (Ferguson Reference Ferguson2008; Macrae and Martin Reference Macrae and Martin2007), stereotype application (Galinsky and Moskowitz Reference Galinsky and Moskowitz2007; Park et al. Reference Park, Glaser and Knowles2008), and social goal pursuit (Bargh et al. Reference Bargh, Green and Fitzsimons2008; Ferguson Reference Ferguson2008), among many others.
Dual-systems models of cognitive psychology and social cognition (for a summary, see Evans Reference Evans2008) have theorized that cognition consists of at least two systems or types of systems, one of which is (variably across specific models) relatively slow, available to introspection, and/or under conscious control, while the other is fast, operates outside of awareness, and/or cannot be prevented or can only be prevented with effort (e.g. Smith and DeCoster Reference Smith and DeCoster2000). As Evans (Reference Evans2008) explains, while the evidence supporting dual systems models is strong, a coherent single model has not yet emerged, due in part to the many dimensions along which the system can be divided. The available evidence suggests that the dichotomies typically invoked as signifying conscious or unconscious processes do not align consistently with each other across specific phenomena, making the construction of an overarching dual systems model challenging. Several researchers have proposed a move away from dual systems models towards more complex multiple interlocking systems without a clear automatic/controlled or conscious/unconscious divide (e.g. Van Bavel et al. Reference Van Bavel, Xiao and Cunningham2012). Despite these continued debates, what is clear and widely understood is that many important processes, including social processes, at least occasionally occur quickly, without introspective awareness and/or in ways apparently at odds with verbally reported or experimentally manipulated intentions.
In addition to the complexity of consciousness versus awareness, there is another crucial feature of the psychology literature which sociolinguists typically neglect, and that is the apparent multiplicity of systems. At base, sociolinguistic systems, like many human cognitive processes, are memory systems: speakers are exposed to forms and social constructs in particular combinations, and they alter their future behavior on the basis of information, habits, etc. retained from these past experiences. Debates within variation tend to assume a single cognitive locus for such learning, but research on memory has increasingly indicated the existence of multiple overlapping and, at times, competing memory systems, each with its own strengths, weaknesses, and ideal time depth (for an overview, see Squire Reference Squire2004). The famous case of H.M. demonstrated that the total loss of the ability to form new episodic memories (due to surgery to treat epilepsy) left H.M. with several other types of memory retained, including the ability to learn new physical skills and some perceptual learning (Milner et al. Reference Milner, Corkin and Teuber1968). Experiments with similar patients have shown that social learning abilities may also be retained in such cases, including developing an aversion to specific individuals in response to problematic behavior such as being stuck with a pin when shaking hands (Draaisma Reference Draaisma2000: 198). In developing our models of sociolinguistic cognition, it may be instructive to turn to research on memory, particularly on language and memory, to better understand what systems might be contributing to the phenomena we study.
The cognitive phenomena most directly of interest to sociolinguistics are language processing and social cognition, with the latter much more poorly represented in our field. The study of social cognition is vast, with many different abilities and behaviors constituting independent subfields of research. One such subfield crucial for sociolinguistics is person perception, the processes by which individuals organize information about other people into models of those people’s qualities and likely future behavior. The idea that learning about people differs substantially from learning other kinds of information dates back to Asch (Reference Asch1946), who observed that the order in which personality traits were presented had a striking effect on the resulting impression of the individual described. Evidence has repeatedly shown that information understood as about a person is better retained and structured differently from the same information presented as unrelated items in a list (Chartrand and Bargh Reference Chartrand and Bargh1996). Evidence of this sort has led social cognition researchers to posit an independent system for person perception, in which an individual’s behavior spontaneously gives rise to inferred personality traits (Brown and Bassili Reference Brown and Bassili2002; Uleman et al. Reference Uleman, Hon, Roman and Moskowitz1996) which may likewise be influenced by co-presented visual cues (Carlston and Mae Reference Carlston and Mae2007). How direct observations of faces and voices are integrated in learning and recognition is a related area of concern likely to be of interest to sociolinguists (Campanella and Belin Reference Campanella and Belin2007; Kamachi et al. Reference Kamachi, Hill, Lander and Vatikiotis-Bateson2003; Stevenage et al. Reference Stevenage, Hugill and Lewis2012).
Part of the process of perceiving a person is identifying the social groups to which they belong and applying, failing to apply, or choosing not to apply the expectations and stereotypes associated with those groups to that individual (Jussim et al. Reference Jussim, Fleming, Coleman and Kohberger1996; Operario and Fiske Reference Operario, Fiske, Brewer and Hewstone2004). Because of the real-world effects of these processes, they have received a great deal of study in social psychology. In particular, researchers have found that a common pattern, among US college participants, is for the egalitarian nature of explicitly endorsed beliefs to be at odds with more implicit attitudes and associations (e.g. Evans Reference Evans2008: 257). These conflicting forces, which can be pitted against each other experimentally (Govorun and Payne Reference Govorun and Keith Payne2006; Payne and Stewart Reference Payne, Stewart and Bargh2007), have lent support to the argument that person perception is carried out by at least two different processes, which at times prompt individuals towards divergent behaviors.
Note that the mere existence of opposing forces does not necessarily drive us to multiple systems. Even within explicit, verbally articulated domains, contradictory beliefs are commonplace. Rather, we see support for different systems in the difference in speed between the two types of responses and in their relationship. The classification of people into groups and the resultant reactions appear, at least in the case of face perception, to be very rapidly deployed. White and Black participants show differential Event Related Potential (ERP) patterning for racial in-group versus out-group faces (Ito and Bartholow Reference Ito and Bartholow2009). This rapid reaction is still susceptible to contextual cues, however. In an approach/avoidance task in which participants are told to manipulate a joystick towards or away from themselves in response to a given category of stimulus, the task instructions create a local context (equating, for example, a Black face with “approach” for a White participant) which attenuates this neural reaction (Cunningham et al. Reference Cunningham, Van Bavel, Arbuckle, Packer and Waggoner2012).
This seemingly more automatic system also appears to be associative rather than propositional in nature (Gawronski and Bodenhausen Reference Gawronski and Bodenhausen2011), in that social constructs have been shown to prime each other (Bargh Reference Bargh2006) in ways that do not always enhance performance from a rational perspective. One well-documented version of this priming is known as the weapons task, in which participants are exposed to a Black or White man’s face for a brief period of time, then shown a picture of a gun or a non-weapon tool such as a wrench. Participants are asked to identify the second object as a tool or a gun and either are or are not given time constraints. Immediately preceding exposure to a Black face increases errors of mistaking a tool for a gun, particularly under time pressures (Park et al. Reference Park, Glaser and Knowles2008; Payne Reference Payne2001, Reference Payne2005). This suggests influence from a rapid system which is susceptible to racist stereotypes linking Black men to notions of violence. This association influences participant responses despite the irrelevance of the face to the task in the experimental context.
Competing with associative, rapidly deployed perception are slower systems based on propositional reasoning, which are more able to take into account details of behavior and form a tailored understanding of an individual (Fiske and Neuberg Reference Fiske and Neuberg1990; Showers and Cantor Reference Showers and Cantor1985). Unlike the rapid association of concepts, which proceeds quickly and relatively effortlessly, rational consideration of individual information is mediated by the mental resources available and the motivation to think carefully about the perceptual target. For example, people’s assessment of a target’s behavior is more sophisticated when they believe they will be interacting with the target in the future than when the task is merely intellectual (Devine et al. Reference Devine, Sedikides and Fuhrman1989). Motivation may not only influence the rigor of the perception process, but also its direction, even to the extent of altering more general beliefs to support a socially desired assessment of an individual (Klein and Kunda Reference Klein and Kunda1992).
Another vast literature is that on self-regulation, through which individuals monitor their behavior and alter it as needed to pursue goals (Wagner and Heatherton Reference Wagner, Heatherton, Mikulincer, Shaver, Borgida and Bargh2015). Most commonly studied in the context of health-related behavioral choices like smoking and food choice, self-regulation is typically understood as a limited resource which can be depleted through use (Baumeister and Heatherton Reference Baumeister and Heatherton1996). As part of the executive control system more generally (Diamond Reference Diamond2013), self-regulation abilities can also be worn down through other ego-depleting activities or experiences such as the Stroop task (von Hippel and Gonsalkorale Reference von Hippel and Gonsalkorale2005). Self-regulation applies to a wide range of behavior types, from blocking stereotypical assumptions to refraining from eating unhealthy foods, and, presumably, substituting socially useful language forms for less useful, but perhaps situationally triggered, forms. Little work has connected this social psychological understanding of regulation to sociolinguistic models.
One necessary precursor to managing behavior is reasoning about one’s own beliefs and goals. Reasoning is similar to other mental processes in having been posited to include both automatic or associative elements and deliberative or propositional elements, although it is worth noting that these two dichotomies need not be aligned with one another. The existence of both associative and propositional systems may be seen informally in the joke question “What do cows drink?” which, particularly after other forms of priming, often prompts an initial impulse of “Milk,” followed by the accurate response of “Water.” In addition to competing with each other, these systems presumably also interact with and influence one another. In order to figure in either system, however, language forms and social constructs must first be represented as concepts in the cognitive system, a mental learning process analogous to Agha’s cultural-level idea of registers. These concepts, for example, polite, Southern accent, or refined speech, exist within a much larger structured field of concepts which constitute the set of declarative knowledge available to a given individual (see Squire Reference Squire2004 for a cognitive discussion and Deacon Reference Deacon, Christiansen and Kirby2003 for a semiotic one).
Finally, our models must have an adequate understanding of expectation (Van Berkum Reference Van Berkum2010). Variationists have something of a love/hate relationship with the notion of salience, which repeatedly emerges as an important construct in our research, while being notoriously difficult to pin down (for a singularly cogent treatment, see Auer et al. Reference Auer, Barden and Grosskopf1998). One potentially useful path in tackling ideas of what is or is not salient in a given context is to engage with broader cognitive notions of expectation (see also Rácz Reference Rácz2013). Expectation and surprisal have emerged as central concepts in psycholinguistics, for example in that patterns of syntactic priming may be influenced by the degree of prediction error they trigger in a comprehender (Jaeger and Snider Reference Jaeger and Snider2013). In non-linguistic processing as well, the mind seems strongly inclined to develop expectations about upcoming events and actions, prompting increases in alertness when these expectations are violated (Bar Reference Bar2007); in the words of Van Berkum (Reference Van Berkum2010), “the brain is a prediction machine that cares about good and bad.”
Newer models of cognition have placed increasing focus on prediction as a fundamental process, incorporated into processes from language to vision to physical movement. Most notably, Pickering and Garrod (Reference Pickering and Garrod2013) propose a model of language processing which integrates language production and perception into an interwoven single system. In this model, speaker/hearers are not only producing their own utterances and comprehending those of others, but constantly maintaining impoverished (and therefore rapid) predictive models of both their own speech and that of others. These forward models are continually checked against perceptions, providing an alert system for the correction of errors in one’s own speech or unexpected behavior on the part of others.
This section has provided an unfortunately brief overview of some recent insights on the diversity of cognitive systems, focusing on those most likely to be relevant to sociolinguistic cognition. Space constrains our ability to explore all of the relevant cognitive systems likely to contribute to sociolinguistic processing. In addition to those already discussed, it is likely that language variation is influenced by systems of affect or emotion, which appear to be distinct from, for example, those related to stereotypical beliefs about other groups (Amodio and Devine Reference Amodio and Devine2006). In the next section, I will take some of these insights as a starting point for reconceptualizing our own models, with an emphasis on the ways in which concepts related to awareness and control are handled.
A New Model
It is important to note at the outset that the approach to sociolinguistic cognition discussed here is not an exclusive system. There is little evidence to suggest that sociolinguistic processes are independent of the social cognition and linguistic processing systems. Indeed, we might think of the entire question of sociolinguistic cognition as one of interface: where, why, and how do the social and linguistic systems meet? A model which answers this question fully must necessarily be based on accurate, well-established models of those systems or families of systems. Unfortunately, such models do not exist, both areas being currently subject to hot debate along a number of dimensions. Instead, sociolinguists can draw on insights common across the changing models, while also contributing some necessary constraints on their character, by virtue of what we know of their interface.
Just as in the original monitor-based model, the language processing system forms the most basic element of our model of sociolinguistic behavior. The existence of some amount of specialized machinery for language production and comprehension is one of the most widely supported conclusions of modern linguistics (e.g. Fodor Reference Fodor1983), although the exact extent and nature of the language-specific portions of processing continue to be a matter of debate (e.g. Lieberman Reference Lieberman2006). One of the key questions is the relationship between the production and perception processes. Although these are obviously linked, or language learning would not occur, they are also obviously at least partly distinct, given that speakers can understand varieties different from those they produce. As noted above, Pickering and Garrod (Reference Pickering and Garrod2013) have offered one view of an integrated production/perception system which prioritizes prediction as a key feature of both processes. This approach has many benefits, including capturing effects showing the influence of production and comprehension on one another, capturing speakers’ skills at maintaining very small gaps between turns, and providing an independently motivated notion of at least one dimension of salience. This model provides a promising base on which to build a more specific understanding of sociolinguistic phenomena.
Based on the research presented in “Where systems collide,” above, our grammar must necessarily incorporate social features of the speech context, including the social identities or group affiliations of the speaker, addressee(s), and other participants; the speaker’s and others’ stances towards each other; the topic of conversation; the physical and conceptual setting of the speech and many other features (for one discussion of such dimensions, see Hymes Reference Hymes, Gumperz and Hymes1967). While earlier models of the grammar made such inclusion essentially impossible, more recent models have allowed for it, as new evidence has emerged suggesting that quite a lot of token-level detail makes its way into language-learning, including ongoing learning by adult native speakers (see as examples Goldinger Reference Goldinger1998; Pierrehumbert Reference Pierrehumbert, Bybee and Hopper2001). Constraint- and construction-based models of syntax may incorporate some social information (e.g. Bender Reference Bender2001), but this possibility has been most thoroughly explored for models of sound variability, in the tradition of exemplar phonology (Johnson Reference Johnson2006; Drager and Kirtley, this volume).
In these models, social information, as well as other information like fine-grained acoustic details, is stored, with as yet unknown detail and for unknown amounts of time, at a basic level associated with, at least originally, each token heard in a speech setting (Goldinger Reference Goldinger1998; Johnson Reference Johnson2006). This detail is included in a complex perceptual space, which is generalized across to create abstract phonological categories (Beckman and Edwards Reference Beckman and Edwards2000).
The grammar’s use of social information has been most clearly documented in perception, where our ability to control potentially confounding factors is much greater. As noted in “Where systems collide,” above, sociophonetic research motivates connections between language and social processing in systems which can contribute to language processing absent the verbally accessible awareness of the listener. Strand (Reference Strand1999) provides just one example: phonological boundaries between /s/ and /ʃ/ being influenced by social perceptions of the speaker. The more culturally feminine the speaker seems to be, the more “feminine” the phonological boundary assigned to their speech by listeners (ambiguous tokens more likely to be heard as “shod” than “sod”). This suggests not only that language perception systems adapt flexibly to speakers and situations (Dahan et al. Reference Dahan, Drucker and Scarborough2008), but also that this adaptation is influenced by social assessments. These social assessments must stem from relatively complex systems, given not only that gender itself is a complex social construct, but also that Strand’s results demonstrate within-sex-class effects of degree of femininity or masculinity.
The role of social information in production systems is less well understood, but the body of variation work to date bears strong witness to the influence of external social factors on essentially all levels of linguistic production. This influence may be triggered by clearly speaker-external factors like interlocutor, location, and topic (Blom and Gumperz Reference Blom, Gumperz, Gumperz and Hymes1972; Rickford and McNair-Knox Reference Rickford, McNair-Knox, Biber and Finegan1994), but frequently serves to support larger social goals, whether situational or ongoing projects of identity definition (Eckert Reference Eckert2000).
Many of the phenomena attributed to the sociolinguistic monitor and most of the phenomena documented in the third-wave tradition of variation can be handled in a model in which social information is connected to the grammar itself through associative links. Linguistic forms can be primed by interlocutors, physical locations, speech activities, and other external cues, facilitating their processing and increasing the likelihood of their production. They could also be primed by internally generated cues activated by social goals, memories of other interactions, or conscious reasoning or thoughts, accounting for volitional sociolinguistic style management. Such priming could presumably only influence forms already in the grammar, providing a natural limitation on speakers’ sociolinguistic performance by virtue of the challenges of language learning more generally.
The linkage of social information to grammatical structures and stored linguistic exemplars, however, does not obviate our need for a resource-heavy, attention-based process along the lines of the sociolinguistic monitor. Some sociolinguistic shifts occur apparently independently of conscious introspection, but other sociolinguistic behavior appears effortful, poorly integrated with other linguistic structures, and/or available to verbally accessible control. At a trivial level, it is possible for speakers to speak or refrain from speech when requested, or to produce specific words, including non-words, in response to verbal instructions. In spontaneous speech, speakers are observed at times producing less socially desirable forms, particularly when under cognitive load, tired, upset, or intoxicated. The existence of indexically based self-corrections, where speakers catch themselves producing an indexically less desired form and substitute another, likewise suggests that the concept of speakers “monitoring” their speech remains a useful notion. It is less common to hear reports of monitoring of sociolinguistic perception processes, but it may occur. Individuals with a commitment to linguistic equity may find, for example, that they catch themselves drawing stereotype-based conclusions about an interlocutor and attempt to rectify those impressions after the fact.
Little work has investigated the boundaries of conscious sociolinguistic control. As a result, we have very little systematic knowledge of, for example, which forms speakers are capable of consciously controlling in production (e.g. when explicitly instructed to do so, regardless of organic social motivations). Researchers frequently note informally when variables are subject to explicit commentary in the communities of use, but this has focused on what speakers tend to talk about rather than what they are capable of discussing. Sociolinguists are often in the habit of informally noting the difficulties respondents have in articulating specific language differences, but the limits on human ability to respond to, discuss, and control variables remain poorly understood.
Despite the clear existence of socially informed speech monitoring, there is no reason to believe that such monitoring is performed by a language-specific system. There are extensive literatures on various aspects of monitoring in speech (e.g. Levelt Reference Levelt1983; Blackmer and Mitton Reference Blackmer and Mitton1991), including speech perception (e.g. van de Meerendonk et al. Reference van de Meerendonk, Kolk, Chwilla and Vissers2009), and on the interactional mechanisms for detecting and repairing errors (e.g. Schegloff et al. Reference Schegloff, Jefferson and Sacks1977; Kitzinger Reference Kitzinger, Sidnell and Stivers2013), although not typically with a focus on indexical meanings of speech forms. The mechanisms for such monitoring are still very much an open question, with some theories proposing that the comprehension system itself serves as a monitor (Levelt Reference Levelt1983), while others suggest that an entirely distinct and less nuanced system attempts to predict behavior by both the speaker and any interlocutors (Pickering and Garrod Reference Pickering and Garrod2013, Reference Pickering and Garrod2014).
In the case of sociolinguistic production particularly, we might think of the object of monitoring as being the speech produced by the speaker, or possibly the inner speech under preparation prior to utterance (Nooteboom Reference Nooteboom2005). This speech is produced from the grammar based on the previously learned linguistic systems and influenced by social context on an associative basis. The self-regulation system, however, applies editing to the speech to align it with various social goals, which might include producing grammatical speech (in both the linguistic sense of avoiding speech errors and the prescriptive sense of avoiding socially stigmatized forms), but also, for example, producing specific forms appropriate to the situation such as more learned lexical items in a stressful professional setting or suppressing verbal indications of anger in a delicate conversation. Our understanding of sociolinguistic behavior would suggest that the cognitive constructs capable of being monitored for social factors are likely to be limited both in number and in formal complexity, relative to the objects manipulated by the grammar itself (Agha Reference Agha2007; Labov Reference Labov1993).
Alongside the grammar and a self-regulation system, another element is needed, namely the person perception system. Here I am departing from the traditional variationist model by separating the systems which evaluate the speech of other speakers from those which oversee speakers’ own speech. While it is clear that individuals’ actual judgments of others often bear some relation to their preferences in their own speech, this is not always the case (Labov 1966, 1993) and we have little evidence as to how closely linked the two processes are. When these systems do coincide, it is plausible that these alignments come from their shared relationship to belief and emotional systems of preference. As with self-regulation, person perception is likely not speech-specific, but rather forms part of a much larger system which also draws on visual information, content, and second-hand reports, among others. Work on the integration of visual, auditory, and semantic cues has focused primarily on the perception of emotion, but provides a useful starting point for sociolinguistic questions of information integration (e.g. de Gelder et al. 2002; de Gelder and Vroomen 2000; Nygaard and Queen 2008). Person perception is a complex process in which “bottom-up” information drawn from, for example, direct observation or second-hand reports is combined with “top-down” expectations, including those prompted by situational structures and social category-triggered stereotypes (for one model of how these elements interact, see Freeman and Ambady 2011). We see evidence of this interplay in sociolinguistic perception frequently, for example in Carmichael’s contribution to this volume.
These three independently motivated elements, a socially linked grammar, a general self-regulation system, and a general person perception system, together provide a more complete explanation for the sociolinguistic behavior modeled by the sociolinguistic monitor. Looking at the broad spectrum of sociolinguistic behavior discussed in “What are we modeling?” above, other systems are also necessary alongside these three, including general problem solving, through which speakers might reason about what the wisest linguistic choices might be, and introspective reflection, which might lead to the creation, alteration, or sharing of explicit ideological beliefs. The three discussed in this section, however, represent the central backbone of sociolinguistic cognition.
Conclusion
Work in sociolinguistic variation has recently begun to engage more closely with the sociolinguistic monitor as a theoretical construct, attempting to pin down its rates of sensitivity and time window (Labov et al. 2011), as well as its ability to operate on linguistic forms at varying levels of culturally established discussion or enregisterment (Levon and Fox 2014). In this chapter I have argued that a more basic level of theorizing is needed, namely a discussion of what exactly the mechanisms are that speakers use to manage their sociolinguistic business. A responsible model of these mechanisms must draw on insights from language processing and social cognition, as well as the small but exciting body of work which has united one or both of these fields with sociolinguistics. I propose that the existing cognitive models within variation be more strongly informed not only by theoretical linguistics and psycholinguistics, but also by social cognition and cognitive psychology more broadly.
I propose that a full model of sociolinguistic processing is best built with independently motivated constructs, not with sociolinguistic-specific machinery. The first of these is a grammar with integrated social information, in the form of associative links between linguistic objects (including stored exemplars, phonological categories, lexical items, and syntactic constructions) and social cognitive constructs (including representations of individual people, social groups, personality traits, and emotions). Our model must also include a person perception system, which integrates visual cues, linguistic and paralinguistic information, third-party information, and more.
Finally, sociolinguistic processing also involves the self-regulatory system, tasked with monitoring behavior, including speech behavior, and initiating repairs when necessary. We might as a starting point follow existing variationist tradition and hypothesize that this self-regulatory system is able to access and attempt to control linguistic objects traditionally classified by sociolinguists as “above the level of consciousness,” as we also further develop our understanding of what is intended by “consciousness” in this description.
These models offer not only tools for explaining data that we have already gathered, but also guidance for future questions. Foremost among these are the points at which language processes are “visible” to social processes, including person perception and self-regulation, and/or subject to influence by them. While the evidence suggests that there exist links in both directions, we do not yet know how closely related the two directions are.
Finally, these models provide a caution for sociolinguists who are not strongly interested in issues of cognitive modeling. As the work in this volume demonstrates, there is interest in issues of awareness and control across a wide range of theoretical and methodological commitments in linguistic anthropology and sociolinguistics. It would be ideal as we engage in this work to understand precisely what we mean by terms like awareness, control, covert, overt, and salience, to name but a few, and to ground that understanding in cognitively realistic theory. Failing that Herculean accomplishment, it is important that we note where and how these terms are still poorly understood by those most involved in investigating them, so we may limit our use of such constructs to what they can comfortably support.