Corpus linguistics involves compiling and examining authentic samples of everyday communication. Chapter 11, written by Phoebe Lin, explores its significance for L2 teaching and learning. It provides an overview of its goals and methods, and presents recent advances in corpus linguistics for the L2 classroom. This chapter demonstrates new and user-friendly online corpus tools for various L2 teaching and learning contexts, including vocabulary analysis, error correction, and writing idea suggestions. It also summarises findings about data-driven learning (DDL), a specialisation in corpus linguistics that explores best practices when incorporating corpus use inside and outside the classroom. A number of essential and practical questions are addressed, including the class time required for teaching concordancing, the types of learners who will particularly benefit from concordancing training, how to select concordancers based on learner needs, factors to consider when planning lessons involving concordancing, and so on. The chapter concludes by discussing the role of teachers in implementing corpus and concordancing training in the classroom.
This chapter describes ongoing corpus-based research on representations of Islam in the British press. The study involved building large corpora of newspaper articles about Islam and/or Muslims and using techniques like collocation and keywords to identify patterns of representation as well as differences between newspapers and change over time. The chapter outlines some of the key findings of the research and describes the various impact activities that were carried out, and the challenges these presented. These activities included working with a number of groups (ENGAGE, MEND, the Centre for Media Monitoring), presenting our work at the Labour Party Conference and in Parliament, and giving talks in mosques. We also detail how our project resulted in the creation of additional collections of newer corpora, enabling further examination of how representations have changed over time.
This article revisits the diachrony of the genitive alternation, the alternation between ’s and prepositional phrases headed by of in Present-Day English. It is usually assumed to have developed around 1400 CE. For Old English (c. 650–1000 CE), a different alternation between pre-modifying and post-modifying genitive-case-marked noun phrases is suggested to be the genitive alternation. Building on descriptions of competition between genitive-case-marked noun phrases (gen) and prepositional phrases with of (of) in Old English, and unpicking some of the preconceptions about the alternation in Old English, we propose a bottom-up method for systematically identifying possible alternation between of and gen in the York–Toronto–Helsinki Parsed Corpus of Old English Prose (Taylor et al. 2003). Our findings indicate that there is plausibly an alternation in Old English that stands in continuity with Present-Day English and suggest a more complex diachrony for the alternation characterized by continuity and discontinuity in the alternants and the envelope of variation.
Tagalog adjectives and nouns variably occur in two word orders, separated by an intermediary linker: adjective-linker-noun versus noun-linker-adjective. The linker has two phonologically conditioned surface forms, -ng and na. This article presents a large-scale corpus study of adjective/noun order variation in Tagalog, focusing in particular on phonological conditions. Results show that word-order variation in adjective/noun pairs optimizes for phonological structure, abiding by phonotactic, syllabic, and morphophonological well-formedness preferences that are also found elsewhere in Tagalog grammar. The results indicate that surface phonological information is accessible for word-order choice.
This report describes a new research resource: a searchable database of 4,700 naturally occurring instances of sluicing in English, annotated so as to shed light on the questions that have shaped research on ellipsis since the 1960s. The paper describes the data set and how it can be obtained, how it was constructed, how it is organized, and how it can be queried. It also highlights some initial empirical findings, first describing general characteristics of the data, then focusing more closely on issues concerning antecedents and possible mismatches between antecedents and ellipsis sites.
This article uses formal and usage-based data and methods to argue for a hybrid model of English tensed auxiliary contraction combining lexical syntax with a dynamic exemplar lexicon. The hybrid model can explain why the contractions involve lexically specific phonetic fusions that have become morphologized and lexically stored, yet remain syntactically independent, and why the probability of contraction itself is a function of the adjacent cooccurrences of the subject and auxiliary in usage, yet is also subject to the constraints of the grammatical context. Novel evidence includes a corpus study and a formal analysis of a multiword expression of classic usage-based grammar.
This article focuses on French espèce de + NP! ‘you + NP!’ to make a case that impoliteness can be conventionalized in linguistic form beyond the level of the lexicon. We argue that the pattern can be considered a construction in its own right and also that it is strongly conventionalized for impoliteness in particular. To support this claim, we adopt both a corpus-based and a questionnaire-based approach. The corpus study reveals not only that espèce de + NP! mainly serves impolite purposes in actual usage but also that it tends to force an impolite interpretation onto noun phrases that do not themselves express negative evaluation. Our questionnaire study complements these findings by showing, inter alia, that the construction is generally judged to be ill-formed when combining with positively evaluative or evaluatively neutral nouns and, at the same time, that such nouns are indeed rated as impolite in the construction. It also points to a difference between calling someone espèce d’idiot! ‘you idiot!’ and calling them just idiot!. We conclude the article with some reflections on why espèce de + NP! is an impoliteness construction.
This article presents a dictionary-based study of vowel reduction and preservation in British English in initial pretonic position and intertonic position. The different variables which have been claimed to influence those processes are tested on a data set of over 4,500 words using regression analyses. Our results confirm the significant effects of syllable structure, position of the vowel, word frequency and opaque prefixation. They also provide weak evidence for other factors such as vowel features and the existence of a base in which the vowel bears a stress, although no clear effects of word segmentability could be found. We also report new findings, as we find that foreign words reduce less than non-foreign words; we find that [+back] vowels reduce less than [−back] vowels in initial pretonic position; and we find a difference in behaviour for vowels followed by /sC/ clusters between non-derived words and stress-shifted derivatives.
This paper investigates the nonstandard use of first‑person singular pronouns (myself and I) in coordinate constructions, such as John and I or John and myself. Native English speakers frequently disregard prescriptive grammar rules by using subject or reflexive forms in place of object forms in sentences like Give those papers to John and I. The frequency of such nonstandard usage raises questions, such as when and why speakers substitute nominative or reflexive pronouns for object pronouns in coordinate constructions, and what evidence exists for the existence of fixed constructions like X and I or X and myself. To address these questions, the study analyzes data from the Corpus of Contemporary American English (COCA). Findings provide strong evidence for the existence of an X and I construction in that the nonstandard form is common after the coordinator but not before. Evidence for an X and myself construction is weaker, since untriggered reflexives also appear outside coordinate constructions. First‑person singular forms are more likely to appear in hypercorrect and untriggered forms than other pronouns. The research suggests that X and I may be stored as a chunk, possibly due to overgeneralizations resulting from prescriptive corrections during language acquisition.
We present a new corpus of child and child-directed speech (CDS) in Palestinian Arabic. It includes transcriptions following the CHILDES guidelines and features recordings of 16 monolingual Palestinian Arabic-speaking children with an age range of 19–58 months and their adult interlocutors. We analyse the children’s morphosyntactic development and identify a variety of target word orders (45 in child speech, 50 in CDS), with prevalent SV(O) structures; we also found high rates of null subjects in both populations, marginal errors in children’s verbal agreement morphology, and early emergence of serial verb constructions, observed from 23 months of age.
Chapter 3 examines the consanguinity of Ovid’s two bodies, or corpora: his body of work (his textual corpus) and his physical body, which here represents his living body, corpse, tomb and biographical life. Medieval commentators took great interest in the relationship between Ovid’s bodies, responding diversely to the opportunities – and challenges – posed by Ovid’s insistent focus on the relationship. Their responses illuminate the mechanisms by which Ovid was transformed from an immoral, salacious poet to a moral, edifying one. A surprising element of that metamorphosis is that the pagan Ovid became a justifiably Christian poet for the medieval age. The chapter discusses Ovid’s presentation of his corpora in the exile poetry and the medieval obsession with Ovid’s tomb, before focusing on three medieval case studies: the Nolo Pater Noster anecdote, a medieval Latin narrative where two clerics are visited by the spirit of Ovid; Guillaume de Deguileville’s Le pèlerinage de la vie humaine and John Lydgate’s English rendering of the text, The Pilgrimage of the Life of Man, where a figure on pilgrimage encounters Ovid’s exilic revenant; and Christine de Pizan’s Le livre de la cité des dames, in which Ovid is resurrected only to be castrated.
This study reflects on Japan's language policy, focusing on the government‑led proposals implemented in 2006, which suggested replacing loanwords with Japanese equivalents, known as Gairaigo Iikae Teian ‘proposals for replacing loanwords’. By investigating English loanwords, this article explores the impact of English on Japanese vocabulary, while providing insights into the practical implementation of the government-led language policy in Japan for a broader global audience. It also clarifies that the objective of the proposals was not to strictly regulate the use of English loanwords but to offer suggestions, with replacement as one strategy to improve communication, especially when disseminating information through government agencies and media organisations. A quantitative investigation of the usage of English loanwords in the media reveals that the overall number of media articles containing the loanwords in the proposed list has increased over the last 30 years. The findings also confirm that loanwords and their Japanese equivalents are not in competition, with one replacing the other. Instead, their usage exhibits a parallel trend in both frequency and increase rates.
This chapter provides an overview of corpus-based advances in Construction Grammar. After a brief introduction on kinds of data in linguistics in general and the notion of corpora in particular, I discuss a variety of corpus-based studies categorized into (i) largely qualitative studies, (ii) studies based on frequencies and probabilities, (iii) studies focusing on association strengths, and (iv) statistical as well as machine-learning studies. In each section, representative studies covering a variety of languages and questions are covered with an eye to surveying methodological as well as theoretical advantages. I conclude with an assessment of the state of the art by comparing how recent developments fare relative to Dąbrowska’s discussion of Cognitive Linguistics’s seven deadly sins.
This article explores the language of social media by analyzing a selection of linguistic features in four corpora of Swedish social media available at Språkbanken Text: Blog mix, Familjeliv, Flashback, and Twitter. Previous research describes the language of these corpora as informal, spoken-like, unedited, non-standard, and innovative. Our corpus analysis confirms the informal and spoken-like nature of social media, while also showing that these traits are unevenly distributed across the various social media corpora and that they are also present in other traditional written corpora, such as novels. Our findings also reveal that the social media corpora show traits of involved and interactional language.
Do different emotion terms trigger different metaphorical conceptualizations of emotions? What are the effects of the discourse context of the genre on metaphor choice in the conceptualization of emotion concepts? Finally, are such lexical and discourse–contextual effects on emotion-targeted metaphor choice quantifiable? Prior discourse-oriented research has demonstrated from a largely qualitative perspective that metaphor use is dynamic and sensitive to discursive contextual variables (e.g., Deignan et al., 2013; Semino 2010, 2011; Semino et al., 2013; Dorst 2015; Caballero 2016; Knapton & Rundblad, 2018). In the present study, these questions are addressed from a corpus-based multivariate perspective, where detailed qualitative analysis of found examples is combined with quantitative modeling. The study examines negative self-evaluative emotions in English, operationalized through their two nominal exponents, i.e., shame and embarrassment, as attested in the discourse context of three genres – fiction, magazine and spoken TV language. The data are first analyzed qualitatively for relevant contextual variables and then modelled quantitatively. The results demonstrate that while both lexical and genre effects are observed in metaphor choice in the conceptualization of negative self-evaluative emotional experience, their combined effect should also be accounted for, as these two variables are found to interact with each other.
Semantic equivalence in the affective domain is always a matter of degree, even for the words that may seem uncontroversial. For example, a word may be quoted in dictionaries as the semantic equivalent of another word and be used in practice as its most frequent translation equivalent, and yet those two words may significantly differ in meaning. This study focuses on one such case – that of the English term frustration and its cognates in Spanish (frustración), French (frustration) and German (Frustration). Using data from corpora and self-report, we find that, while frustration terms in Spanish, French and German reflect a cross-culturally stable type of low-power anger, or can denote affective experiences other than anger, English frustration refers to a prototypical anger experience characterized by high power. Converging evidence is presented from two psycholinguistic and two linguistic studies employing elicited and observational data. We offer a possible explanation for the observed semantic differences based on psychological appraisal theory and cross-cultural psychology. The novelties and limitations of our findings are discussed, along with their implications for researchers in the affective sciences.
Genesis and development of EFL learner’s dictionaries, innovative methods and features, and influence on dictionaries in other genres. Pioneering examples (NMED, GEW, ISED) featured simple definitions, un/countability, verb patterns, collocations, ample examples, pictorial illustrations, IPA, etc., and paved the path for learner’s dictionaries to come; later generations of learner’s dictionaries converged on a corpus basis and moved towards user-friendliness. Innovative and distinctive features include grading of headword importance, transparent grammar indication, signposts/menus for polysemous entries, controlled defining vocabulary, full-sentence definitions, and extensive use of corpora (manifest in frequency-based sense ordering, identification of frequent grammatical and lexical collocations, authentic illustrative examples). Features of English learner’s dictionaries are now incorporated in dictionaries for native speakers, and English learner’s dictionaries and English–Japanese dictionaries have been mutually influential. The evolution and innovation of learner’s dictionaries are mainly motivated by EFL learners’ needs for comprehension and production, driven by users’ rudimentary reference skills, and influenced by digital technology.
This chapter provides an overview of the process of conceiving, researching, editing, and publishing dictionaries, both synchronic (or commercial) and historical. The methods and tools discussed for making dictionaries range from traditional hand-copying of citations from print books and paper-and-pencil editing to sophisticated electronic technologies like databases, corpora, concordances, and networked editing software. The chapter shows how editorial conception of the needs and sophistication of the end user largely determines the dictionary’s length and headword list as well as the format, defining style, and level of detail in entries. The chapter goes on to examine how the pressures of commercial publishing, with its looming deadlines and pressing need to recoup investment by profits from sales, affect the scope of dictionaries and the amount of time editors can devote to a project, and how these pressures differ from those affecting longer-trajectory, typically grant-funded historical dictionaries. The survey of dictionary editing concludes by assessing the consequent challenges of managing and motivating people working in these two very different situations, which may be the most important factor in a project’s success.
Many scholars have commented on the “ungrammatical” interrogative form aren't I. However, their arguments are based on personal observations, and few studies have examined this phenomenon against large corpora. This study investigates the widespread usage of the “ungrammatical” contracted form aren't I in question tags from both quantitative and qualitative perspectives. Based on large corpora, the study gives a clear picture of the current frequency of use of the question tag aren't I and its alternatives (amn't I, ain't I, am I not and an't I) in modern English. From a qualitative perspective, the study finds that aren't I has taken hold as a recognized standard form around the globe because its use appears to be a smart coincidence that implies the potential double roles of “I” as both the addresser and the addressee in a monologue. In addition, the facts that amn't I is difficult to pronounce, am I not is bookish, an't I is old-fashioned and ain't I can only be used in informal situations increase the popularity of aren't I. The findings of this study can justify the usage of “ungrammatical” aren't I as a natural norm in both British English and American English. These findings open new research avenues, with pedagogical and sociolinguistic implications, for other similar “ungrammatical” language phenomena.
In diachronic development and contemporary structure of Slavic lexicons, we see influences of universal semantic mechanisms and specific historical processes, of language development, and of language contact. Old Church Slavonic played a role in forming Slavic vocabulary, especially in Russian, where specific or colloquial synonyms contrast with abstract or formal (golova ‘head as body part’ vs. glava ‘head as top in a hierarchy’). Semantic divergence of Proto-Slavic roots creates inter-lingual enantiosemy (e.g., Rus. čerstvyj ‘stale’ vs. Cze. čerstvý ‘fresh’). To compare languages we use regular abstract semantic relations, e.g. synonymy, antonymy, or lexical functions Magn, Oper. Linguistic expressions may differ, but we find similar semantic oppositions and derivation mechanisms. The languages share the same types of antonymy, albeit using different prefixes. Semantic bleaching patterns also agree: adjectives meaning ‘scary’ develop to mean ‘high degree’. Motion verbs such as ‘go’ come to mean process or result. We give case studies of lexical relations: Polish synonyms honor vs. cześć, Russian pravda vs. istina.