Inflectional morphology refers to the mapping from grammatical information to surface forms, which are typically realized as morphemes. This mapping often exhibits fusion, where several abstract features are expressed in a single morpheme that cannot be decomposed into meaningful parts. Here, we discuss crosslinguistic generalizations of morphological fusion. We argue that fusion reflects principles of efficient processing, as formalized by the memory–surprisal tradeoff (Hahn, Degen, & Futrell 2021), which is based on information-theoretic models of language processing from psycholinguistics. We first show that the existence of fusion itself can, in some situations, lead communicative codes to be more efficient under our processing model. Particularly, we reveal via simulation that the fusion of highly correlated features is more efficient for processing, whereas agglutination is more efficient when features are less correlated. We next discuss crosslinguistic patterns of fusion in real languages. First, we analyze well-known generalizations about features that are commonly fused across languages (e.g. tense, aspect, and mood), as well as a typological pattern regarding suppletion. In both cases, we find that the universals we study tend to reflect a tendency toward more efficient structure under our model of language processing. Finally, we use paradigm and frequency data from four languages to study informational fusion, a gradable measure of fusion defined in Rathi et al. 2021. We find that informational fusion is higher when features are highly correlated, which suggests that gradable fusion is also influenced by optimization for the memory–surprisal tradeoff.
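The notion of "highly correlated features" driving fusion can be quantified as the mutual information between feature values across paradigm cells. The sketch below is an illustration under invented toy distributions (not the authors' simulation code), contrasting perfectly correlated and fully independent tense/aspect features:

```python
from math import log2

def mutual_information(joint):
    """Mutual information (bits) between two categorical features,
    given their joint probability table {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Invented toy case: tense and aspect perfectly correlated,
# so fusing them into one morpheme loses nothing.
correlated = {("past", "perfective"): 0.5,
              ("pres", "imperfective"): 0.5}
# Independent features: every combination equally likely.
independent = {("past", "perfective"): 0.25, ("past", "imperfective"): 0.25,
               ("pres", "perfective"): 0.25, ("pres", "imperfective"): 0.25}
print(mutual_information(correlated))   # 1.0 bit
print(mutual_information(independent))  # 0.0 bits
```

On this view, fusing the correlated pair is cheap (one feature largely predicts the other), while fusing the independent pair forces the listener to hold more unpredictable material in memory.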
This article is a crosslinguistic investigation of the hypothesis that the average information rate conveyed during speech communication results from a trade-off between average information density and speech rate. The study, based on seven languages, shows a negative correlation between density and rate, indicating the existence of several encoding strategies. However, these strategies do not necessarily lead to a constant information rate. These results are further investigated in relation to the notion of syllabic complexity.
Discover the foundations of classical and quantum information theory in the digital age with this modern introductory textbook. Familiarise yourself with core topics such as uncertainty, correlation, and entanglement before exploring modern techniques and concepts including tensor networks, quantum circuits and quantum discord. Deepen your understanding and extend your skills with over 250 thought-provoking end-of-chapter problems, with solutions for instructors, and explore curated further reading. Understand how abstract concepts connect to real-world scenarios with over 400 examples, including numerical and conceptual illustrations that emphasise practical applications. Build confidence as chapters progressively increase in complexity, alternating between classical and quantum systems. This is the ideal textbook for senior undergraduate and graduate students in electrical engineering, computer science, and applied mathematics, looking to master the essentials of contemporary information theory.
The world's languages exhibit striking diversity. At the same time, recurring linguistic patterns suggest the possibility that this diversity is shaped by features of human cognition. One well-studied example is word order in complex noun phrases (like these two red vases). While many orders of these elements are possible, a subset appear to be preferred. It has been argued that this ordering reflects a single underlying representation of noun phrase structure, from which preferred orders are straightforwardly derived (e.g. Cinque 2005). Building on previous experimental evidence using artificial language learning (Culbertson & Adger 2014), we show that these preferred orders arise not only in existing languages, but also in improvised sequences of gestures produced by English speakers. We then use corpus data from a wide range of languages to argue that the hypothesized underlying structure of the noun phrase might be learnable from statistical features relating objects and their properties conceptually. Using an information-theoretic measure of strength of association, we find that adjectival properties (e.g. red) are on average more closely related to the objects they modify (e.g. wine) than numerosities are (e.g. two), which are in turn more closely related to the objects they modify than demonstratives are (e.g. this). It is exactly those orders which transparently reflect this—by placing adjectives closest to the noun, and demonstratives farthest away—that are more common across languages and preferred in our silent gesture experiments. These results suggest that our experience with objects in the world, combined with a preference for transparent mappings from conceptual structure to linear order, can explain constraints on noun phrase order.
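One common way to operationalize such a "strength of association" is pointwise mutual information over co-occurrence counts. The counts below are invented for illustration only, not taken from the corpus study:

```python
from math import log2

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information (bits) from co-occurrence counts:
    log2 of how much more often x and y co-occur than chance predicts."""
    p_xy = count_xy / total
    p_x, p_y = count_x / total, count_y / total
    return log2(p_xy / (p_x * p_y))

# Invented toy counts: 'red' co-occurs with 'wine' well above chance,
# while 'this' combines with 'wine' at roughly chance level.
total = 10_000
adj_noun  = pmi(count_xy=80, count_x=200, count_y=400, total=total)
dem_noun  = pmi(count_xy=10, count_x=250, count_y=400, total=total)
print(adj_noun > dem_noun)  # True: adjective more tightly bound to the noun
```

Under the hypothesis in the abstract, higher-PMI modifiers should sit closer to the noun in the preferred orders.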
Autoregressive language models generate text by predicting the next word from the preceding context. The regularities internalized from specific training data make this mechanism a useful proxy for historically situated readerly expectations, reflecting what earlier linguistic communities would find probable or meaningful. In this article, I pre-train a GPT model (223M parameters) on a broad corpus of Chinese texts (FineWeb Edu Chinese V2.1) and fine-tune it on the collected writings of Mao Zedong (1893–1976) to simulate the evolving linguistic landscape of post-1949 China. Identifying token sequences with the sharpest drops in perplexity – a measure of the model’s surprise – reveals the core phraseology of “Maospeak,” the militant language style that developed from Mao’s writings and pronouncements. A comparative analysis of modern Chinese fiction demonstrates how literature becomes unfamiliar to the fine-tuned model, generating perplexity spikes of increasing magnitude. The findings suggest a mechanism of attentional control: whereas propaganda backgrounds meaning through repetition (cognitive overfitting), literature foregrounds it through deviation (non-anomalous surprise). By visualizing token sequences as perplexity landscapes with peaks and valleys, the article reconceives style as a probabilistic phenomenon and showcases the potential of “cognitive stylometry” for literary theory and close reading.
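Perplexity as "a measure of the model's surprise" is computed from the per-token probabilities a model assigns. The minimal sketch below uses toy probabilities (not actual GPT outputs) to show how a single deviating token spikes the score:

```python
from math import exp, log

def perplexity(token_probs):
    """Perplexity of a sequence from per-token probabilities
    p(w_t | w_<t): exp of the average negative log-probability."""
    nll = [-log(p) for p in token_probs]
    return exp(sum(nll) / len(nll))

# Formulaic, slogan-like text: every token highly predictable.
slogan = [0.9, 0.8, 0.95, 0.9]
# 'Literary' continuation: one very surprising token spikes the score.
literary = [0.9, 0.8, 0.01, 0.9]
print(perplexity(slogan) < perplexity(literary))  # True
```

A "perplexity landscape" in the article's sense is then just this quantity tracked over a sliding window of token sequences.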
We discuss a range of juncture phenomena, from the utterance down to the word, as a way of trying to come to grips with the question of what kind of domain the ‘word’ is in Australian languages. In some languages, we find unexpected behaviour internal to words – pauses, epenthetic elements, and intonational pitch resets – that indicates the presence of word-internal phonological or intonational phrase boundaries. This in turn raises questions about the status of phonological rules that operate across such junctures, and whether these rules are better regarded as sandhi phenomena. While the metrical structures of complex words indicate that morphology is critical to understanding the location of prosodic prominence, it is also clear that not all morphological relations are equally ‘visible’ to the metrical (or intonational) system. Relations of an unproductive or lexically conditioned kind, as in verb inflection, typically do not constitute a separate prosodic domain, while productive inflections and compound constructions regularly do. We discuss the ways in which this distinction is cashed out in other phonological behaviour – vowel lengthening and reduplication patterns – and the ways in which this morphology might be modelled.
We begin with some longstanding observations about the unusual character of sound change in Australia: first, that there is often a lack of evidence for sound change between related languages; second and relatedly, that sound change is characteristically structure-preserving in Australia: it does not result in changes to the inventory or the phonotactics. This characteristic appears to be behind both the apparent lack of sound change and the widespread homogeneity of inventories and phonotactics discussed in earlier chapters. We discuss one very widespread pattern of sound change – lenition – with respect to the kinds of segments and word positions involved, and the evident failure of these changes to spread through the lexicon in a standard Neogrammarian fashion. Rather, many sound changes appear to have the character of ‘lexical diffusion’. We also discuss the set of changes known as ‘initial dropping’, which affected languages in Cape York, Central Australia, and elsewhere, where radical sound changes did take place, leaving these languages with inventories and phonotactics that are quite different from those found elsewhere on the continent or indeed in the world. Such languages raise questions about the relationship between models of speech processing and language change.
This chapter starts by communicating how various aspects of our lives involve interacting with queues. It then provides a brief history of the inception of queueing theory and its main governing principles, and discusses how it has impacted various aspects of our lives. It educates the reader about the main ideas and principles in queueing theory and also elaborates on the psychological aspects of waiting in queues. Showcasing various examples of how the main ideas in queueing theory have enabled important improvements, ranging from what happened during Queen Elizabeth II’s memorial, to the creation of the internet and modern telephones, to our experiences in airports or on roads, the chapter presents queueing theory as a potent branch of analytics science that has enabled scholars to make the world a better place. The chapter also discusses the vital interplays between queueing theory, public policy, and technology.
What is the optimal level of questionnaire detail required to measure bilingual language experience? This empirical evaluation compares alternative measures of language exposure of varying cost (i.e., questionnaire detail) in terms of their performance as predictors of oral language outcomes. The alternative measures were derived from Q-BEx questionnaire data collected from a diverse sample of 121 heritage bilinguals (5–9 years of age) growing up in France, the Netherlands and the UK. Outcome data consisted of morphosyntax and vocabulary measures (in the societal language) and parental estimates of oral proficiency (in the heritage language). Statistical modelling exploited information-theoretic and cross-validation approaches to identify the optimal language exposure measure. Optimal cost–benefit was achieved with cumulative exposure (for the societal language) and current exposure in the home (for the heritage language). The greatest level of questionnaire detail did not yield more reliable predictors of language outcomes.
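The information-theoretic model comparison alluded to can be illustrated with AIC on simulated data. The predictor names and data below are hypothetical stand-ins, not the Q-BEx variables:

```python
import random
from math import log

def fit_simple(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def aic(xs, ys):
    """AIC for simple regression (k = 3: intercept, slope, error variance).
    Lower is better: fit quality penalized by parameter count."""
    a, b = fit_simple(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    n = len(xs)
    return n * log(rss / n) + 2 * 3

random.seed(0)
# Hypothetical exposure measures: one drives the outcome, one is unrelated.
exposure = [random.uniform(0, 10) for _ in range(200)]
unrelated = [random.uniform(0, 10) for _ in range(200)]
outcome = [2.0 * e + random.gauss(0, 1) for e in exposure]
print(aic(exposure, outcome) < aic(unrelated, outcome))  # True
```

The study's conclusion is of exactly this shape: the cheaper exposure measure wins when the costlier one adds no predictive signal.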
Our last chapter is devoted to entropy. With this excuse we first present Shannon’s information theory, including the derivation of his entropy, and the enunciations and proofs of the source coding theorem and of the noisy-channel coding theorem. Then, we consider dynamical systems and the production of entropy in chaotic systems, termed Kolmogorov–Sinai entropy. For non-experts or readers who require a memory jog, we make a short recap of statistical mechanics. That is just enough to tie up some knots left untied in Chapter 4, when we developed large deviations theory for independent variables. Here we generalize to correlated variables and make one application to statistical mechanics. In particular, we find out that entropy is a large deviations function, apart from constants. We end with a lightning fast introduction to configurational entropy in disordered complex systems. Just to give a tiny glimpse of … what we do for a living!
Chapter 4 takes up the question of book size, including format (folio, quarto, etc.) as well as the adjectives applied to books (big, large, little, etc.). The rhetoric of book size gave people a way to talk about information.
Based on the long-running Probability Theory course at the Sapienza University of Rome, this book offers a fresh and in-depth approach to probability and statistics, while remaining intuitive and accessible in style. The fundamentals of probability theory are elegantly presented, supported by numerous examples and illustrations, and modern applications are later introduced, giving readers an appreciation of current research topics. The text covers distribution functions, statistical inference and data analysis, and more advanced methods including Markov chains and Poisson processes, widely used in dynamical systems and data science research. The concluding section, 'Entropy, Probability and Statistical Mechanics', unites key concepts from the text with the authors' impressive research experience, to provide a clear illustration of these powerful statistical tools in action. Ideal for students and researchers in the quantitative sciences, this book provides an authoritative account of probability theory, written by leading researchers in the field.
Language models can produce fluent, grammatical text. Nonetheless, some maintain that language models don’t really learn language and also that, even if they did, that would not be informative for the study of human learning and processing. On the other side, there have been claims that the success of LMs obviates the need for studying linguistic theory and structure. We argue that both extremes are wrong. LMs can contribute to fundamental questions about linguistic structure, language processing, and learning. They force us to rethink arguments and ways of thinking that have been foundational in linguistics. While they do not replace linguistic structure and theory, they serve as model systems and working proofs of concept for gradient, usage-based approaches to language. We offer an optimistic take on the relationship between language models and linguistics.
Fish swimming together in schools interact via multiple sensory pathways, including vision, acoustics and hydrodynamics, to coordinate their movements. Disentangling the specific role of each sensory pathway is an open and important question. Here, we propose an information-theoretic approach to dissect interactions between swimming fish based on their movement and the flow velocity at selected measurement points in the environment. We test the approach in a controlled mechanical system constituted by an actively pitching airfoil and a compliant flag that simulates the behaviour of two fish swimming in line. The system consists of two distinct types of interactions – hydrodynamic and electromechanical. By using transfer entropy of the measured time series, we unveil a strong causal influence of the airfoil pitching on the flag undulation with an accurate estimate of the time delay between the two. By conditioning the computation on the flow-speed information, recorded by laser Doppler velocimetry, we discover a significant reduction in transfer entropy, correctly implying the presence of a hydrodynamic pathway of interaction. Similarly, the electromechanical pathway of interaction is identified accurately when present. The study supports the potential use of information-theoretic methods to decipher the existence of different pathways of interaction between schooling fish.
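Transfer entropy for this kind of driver–follower setup can be estimated directly from sample counts. The sketch below uses synthetic binary series standing in for the airfoil (driver) and flag (follower) signals, not the experimental measurements:

```python
import random
from collections import Counter
from math import log2

def cond_entropy(pairs):
    """H(target | condition) in bits from (condition, target) samples."""
    joint, marg = Counter(pairs), Counter(c for c, _ in pairs)
    n = len(pairs)
    return -sum(k / n * log2((k / n) / (marg[c] / n))
                for (c, _), k in joint.items())

def transfer_entropy(src, dst):
    """TE(src -> dst) = H(dst_t | dst_{t-1}) - H(dst_t | dst_{t-1}, src_{t-1}):
    how much knowing the source's past reduces uncertainty about the target."""
    h1 = cond_entropy([(dst[t - 1], dst[t]) for t in range(1, len(dst))])
    h2 = cond_entropy([((dst[t - 1], src[t - 1]), dst[t])
                       for t in range(1, len(dst))])
    return h1 - h2

random.seed(1)
# Hypothetical series: the follower copies the driver with a one-step delay.
driver = [random.randint(0, 1) for _ in range(5000)]
follower = [0] + driver[:-1]
print(transfer_entropy(driver, follower))   # ~1 bit: strong causal influence
print(transfer_entropy(follower, driver))   # ~0 bits: no influence back
```

Conditioning on a further variable (the flow speed, in the article) works the same way: add it to the conditioning tuple and watch whether the transfer entropy drops.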
Chapter 8 explores the asymptotic regime of quantum information processing, beginning with quantum typicality, which illustrates the convergence of quantum states toward a typical form with increasing copies. This leads to the asymptotic equipartition property (AEP), indicating that with a high number of copies, probability vectors become uniformly distributed. The method of types is introduced next, a tool from classical information theory that classifies sequences based on their statistical properties. This is crucial for understanding the behavior of large quantum systems and has implications for quantum data compression. Advancing to quantum hypothesis testing, the chapter outlines efficient strategies for distinguishing between two quantum states through repeated measurements. Central to this is the Quantum Stein’s lemma, which asserts the exponential decline in the error probability of hypothesis testing as the sample size of quantum systems increases. The chapter highlights the deep interplay between typicality, statistical methods, and hypothesis testing, laying the groundwork for asymptotic interconversion of quantum resources.
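The two asymptotic statements at the heart of the chapter can be summarized as follows (a standard textbook formulation, not a quotation from the chapter):

```latex
% Quantum AEP: for n i.i.d. copies of a state \rho with von Neumann
% entropy S(\rho), almost all weight lies in a typical subspace
% (projector \Pi^{n}_{\delta}) of dimension roughly 2^{nS(\rho)}:
\operatorname{Tr}\!\left(\Pi^{n}_{\delta}\,\rho^{\otimes n}\right) \ge 1-\epsilon,
\qquad \dim \Pi^{n}_{\delta} \le 2^{\,n\,(S(\rho)+\delta)}.

% Quantum Stein's lemma: when discriminating \rho^{\otimes n} from
% \sigma^{\otimes n}, the type-II error probability \beta_n decays
% exponentially at the rate given by the quantum relative entropy:
\beta_n \approx 2^{-n\,D(\rho\,\|\,\sigma)}, \qquad
D(\rho\|\sigma) = \operatorname{Tr}\!\left[\rho\,(\log_2\rho - \log_2\sigma)\right].
```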
Information processing is, at its core, the resolution of uncertainty. Information-theoretic constructs such as surprisal and entropy reflect the fine-grained probabilistic knowledge which people have accumulated over time. These constructs explain the extent of processing difficulty that people encounter, for example when comprehending language. Processing difficulty and cognitive effort, in turn, are a direct reflection of predictability.
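Both constructs are elementary to compute once probabilities are in hand; a minimal illustration (toy probabilities, invented for this sketch):

```python
from math import log2

def surprisal(p):
    """Surprisal in bits of an event with probability p."""
    return -log2(p)

def entropy(dist):
    """Entropy in bits: the expected surprisal over a distribution."""
    return sum(p * surprisal(p) for p in dist if p > 0)

# A word that is half-expected in context carries 1 bit;
# a one-in-1024 word carries 10 bits and is read more slowly.
print(surprisal(0.5))      # 1.0
print(surprisal(1 / 1024)) # 10.0
# Entropy over four equally likely continuations: 2 bits of uncertainty.
print(entropy([0.25] * 4)) # 2.0
```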
This enthusiastic introduction to the fundamentals of information theory builds from classical Shannon theory through to modern applications in statistical learning, equipping students with a uniquely well-rounded and rigorous foundation for further study. Introduces core topics such as data compression, channel coding, and rate-distortion theory using a unique finite block-length approach. With over 210 end-of-part exercises and numerous examples, students are introduced to contemporary applications in statistics, machine learning and modern communication theory. This textbook presents information-theoretic methods with applications in statistical learning and computer science, such as f-divergences, PAC Bayes and variational principle, Kolmogorov's metric entropy, strong data processing inequalities, and entropic upper bounds for statistical estimation. Accompanied by a solutions manual for instructors, and additional standalone chapters on more specialized topics in information theory, this is the ideal introductory textbook for senior undergraduate and graduate students in electrical engineering, statistics, and computer science.
Convergence of the expectation-maximization (EM) algorithm to a global optimum of the marginal log likelihood function for unconstrained latent variable models with categorical indicators is presented. The sufficient conditions under which global convergence of the EM algorithm is attainable are provided in an information-theoretic context by interpreting the EM algorithm as alternating minimization of the Kullback–Leibler divergence between two convex sets. It is shown that these conditions are satisfied by an unconstrained latent class model, yielding an optimal bound against which more highly constrained models may be compared.
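EM's guaranteed monotone ascent of the marginal log likelihood, which underlies the convergence analysis, can be checked numerically. The sketch below implements EM for a two-class latent class model with binary indicators on invented data (an illustration, not the paper's models):

```python
import random
from math import log

def prod_bern(theta, x):
    """Probability of binary response pattern x under item probabilities theta."""
    p = 1.0
    for t, xi in zip(theta, x):
        p *= t if xi else (1.0 - t)
    return p

def em_latent_class(data, k, iters=30, seed=0):
    """EM for an unconstrained latent class model with binary indicators.
    Returns the marginal log-likelihood at each iteration."""
    rng = random.Random(seed)
    n_items = len(data[0])
    pi = [1.0 / k] * k
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(k)]
    logliks = []
    for _ in range(iters):
        # E-step: responsibilities r[i][c] = p(class c | pattern i)
        resp, ll = [], 0.0
        for x in data:
            joint = [pi[c] * prod_bern(theta[c], x) for c in range(k)]
            z = sum(joint)
            ll += log(z)
            resp.append([j / z for j in joint])
        logliks.append(ll)
        # M-step: re-estimate class sizes and item probabilities
        for c in range(k):
            nc = sum(r[c] for r in resp)
            pi[c] = nc / len(data)
            theta[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                        for j in range(n_items)]
    return logliks

random.seed(42)
# Invented data: two latent classes with opposite response profiles.
data = [[1, 1, 1, 0] if random.random() < 0.5 else [0, 0, 0, 1]
        for _ in range(200)]
lls = em_latent_class(data, k=2)
print(all(b >= a - 1e-6 for a, b in zip(lls, lls[1:])))  # True: monotone ascent
```

Monotone ascent alone does not imply convergence to the global optimum; the paper's contribution is the extra conditions (via the alternating KL-minimization view) under which it does.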
This chapter introduces communication and information theoretical aspects of molecular communication, relating molecular communication to existing techniques and results in communication systems. Communication models are discussed, as well as detection and estimation problems. The information theory of molecular communication is introduced, and calculation of the Shannon capacity is discussed.
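Capacity calculations of the kind discussed can be illustrated with the simplest case, a binary symmetric channel; whether a given molecular channel reduces to a BSC is an assumption made here purely for illustration:

```python
from math import log2

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Shannon capacity (bits/use) of a binary symmetric channel
    with crossover probability p: C = 1 - h2(p)."""
    return 1.0 - h2(p)

# A channel that never flips a symbol carries 1 bit per use;
# at p = 0.5 the output is independent of the input and capacity vanishes.
print(bsc_capacity(0.0))  # 1.0
print(bsc_capacity(0.5))  # 0.0
```

Realistic molecular channels (e.g. with diffusion noise and inter-symbol interference) need more elaborate models, but the same capacity logic applies.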
We develop and demonstrate a computationally cheap framework to identify optimal experiments for Bayesian inference of physics-based models. We develop the metrics (i) to identify optimal experiments to infer the unknown parameters of a physics-based model, (ii) to identify optimal sensor placements for parameter inference, and (iii) to identify optimal experiments to perform Bayesian model selection. We demonstrate the framework on thermoacoustic instability, which is an industrially relevant problem in aerospace propulsion, where experiments can be prohibitively expensive. By using an existing densely sampled dataset, we identify the most informative experiments and use them to train the physics-based model. The remaining data are used for validation. We show that, although approximate, the proposed framework can significantly reduce the number of experiments required to perform the three inference tasks we have studied. For example, we show that for task (i), we can achieve an acceptable model fit using just 2.5% of the data that were originally collected.
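For rankings of the kind used in metrics (i) and (ii), the expected information gain of a single measurement has a closed form in the linear-Gaussian case. The sensor names and variances below are invented, and the framework in the article is more general than this sketch:

```python
from math import log

def expected_info_gain(prior_var, noise_var):
    """Expected information gain (nats) of one measurement y = theta + eps,
    with Gaussian prior theta ~ N(m, prior_var) and noise eps ~ N(0, noise_var):
    EIG = 0.5 * ln(1 + prior_var / noise_var) (prior-to-posterior entropy drop)."""
    return 0.5 * log(1.0 + prior_var / noise_var)

# Hypothetical candidate experiments, ranked by measurement-noise variance:
candidates = {"sensor_A": 0.1, "sensor_B": 1.0, "sensor_C": 10.0}
gains = {s: expected_info_gain(prior_var=4.0, noise_var=v)
         for s, v in candidates.items()}
best = max(gains, key=gains.get)
print(best)  # sensor_A: the least noisy measurement is the most informative
```

Selecting the highest-gain experiments first is what lets a small, well-chosen subset of the data stand in for the full densely sampled set.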