In linear regression, we approximate the regression function with a linear function and estimate the coefficients using least squares. We construct confidence intervals and prediction bands, and show how residual plots help check the linear approximation. We also review some other regression tools that are not as widely used as they were in the past.
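For a single feature, the least-squares fit, its residuals, and a slope interval take only a few lines. The sketch below is an illustration under stated assumptions, not the chapter's own code. The function names are hypothetical. The interval uses a normal quantile as a large-sample approximation; an exact interval would use a t quantile with n − 2 degrees of freedom.

```python
import math
from statistics import NormalDist, fmean


def fit_simple_ols(x, y):
    """Least-squares fit of y ~ b0 + b1 * x for a single feature (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residuals: plot these against x to check the linear approximation.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Unbiased noise-variance estimate (two fitted parameters).
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    se_b1 = math.sqrt(sigma2 / sxx)
    return b0, b1, se_b1, residuals


def slope_interval(x, y, level=0.95):
    """Approximate confidence interval for the slope using a normal quantile
    (assumption: large n; use a t quantile for small samples)."""
    _, b1, se_b1, _ = fit_simple_ols(x, y)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return b1 - z * se_b1, b1 + z * se_b1
```

A patternless residual plot supports the linear approximation. Curvature or a fan shape in the residuals suggests that the approximation, or the constant-variance assumption behind the interval, is doubtful.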
When the outcome Y is discrete rather than continuous, we refer to the problem of predicting Y as classification. In many ways, this is easier than predicting a continuous outcome since Y can only take a few values. Most of the methods we have covered so far can be adapted to handle discrete outcomes. One particular method, based on neural nets, is covered in Chapter 12.
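As an example of adapting a regression method to discrete outcomes, consider nearest neighbors. Nearest-neighbor regression averages the Y values of nearby points. For a discrete Y, the average can be replaced by a majority vote, which estimates the class with the highest conditional probability near x. This is a sketch under assumptions: a scalar feature, absolute-distance neighbors, and a hypothetical function name. The chapter may present different methods.

```python
from collections import Counter


def knn_classify(train_x, train_y, x, k=3):
    """Predict a discrete label for scalar feature x by majority vote among
    the k nearest training points. Ties go to the label encountered first,
    i.e. the one whose neighbor is closest."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training indices by distance to x and keep the k closest.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```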
In this chapter, we extend the methods from Chapter 2 to handle the case where the number of features d is large. Least squares does not work well in this case. We introduce ridge regression and the lasso, which are two methods for high-dimensional regression. We also discuss the bias–variance tradeoff and the challenges in constructing confidence intervals in this setting.
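One common way to compute the lasso is coordinate descent with soft thresholding. The sketch below minimizes (1/(2n))‖y − Xβ‖² + λ‖β‖₁. The scaling of the objective, the absence of an intercept, and the function names are assumptions for illustration, not the chapter's formulation. The implementation is naive and meant for reading, not for large d.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows with d features each. No intercept is fitted,
    so center y and the columns of X first if an intercept is wanted."""
    n, d = len(X), len(X[0])
    b = [0.0] * d
    # (1/n) * sum_i x_ij^2 for each column j.
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the partial residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b
```

With lam = 0 this reduces to least squares. As lam grows, coefficients are shrunk and some are set exactly to zero, which is what makes the lasso usable when d is large.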
So far, we have focused on predicting an outcome Y from a set of features X. In this chapter, we turn to causal inference where we ask: What would Y be if we set X to a particular value x? This question concerns the distribution of the outcome Y after some hypothetical intervention. Answering such causal questions requires new tools and stronger assumptions than prediction.
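A small simulation shows why the interventional question differs from prediction. The data-generating model below is hypothetical: a binary confounder Z raises both the chance that X = 1 and the value of Y, and the true effect of setting X from 0 to 1 is 1. The predictive contrast E[Y | X=1] − E[Y | X=0] is about 2.2 in this model. The adjustment formula Σ_z E[Y | X=x, Z=z] P(Z=z) recovers about 1.0. That recovery relies on the extra assumption that Z is the only confounder.

```python
import random


def simulate(n, seed=0):
    """Hypothetical model: Z ~ Bernoulli(0.5); P(X=1 | Z) = 0.8 if Z else 0.2;
    Y = X + 2Z + standard normal noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        x = 1 if rng.random() < (0.8 if z else 0.2) else 0
        y = x + 2 * z + rng.gauss(0.0, 1.0)
        data.append((z, x, y))
    return data


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def associational_contrast(data):
    """E[Y | X=1] - E[Y | X=0]: what a pure prediction model targets."""
    return (mean(y for _, x, y in data if x == 1)
            - mean(y for _, x, y in data if x == 0))


def adjusted_contrast(data):
    """Adjustment formula over Z; equals the causal contrast only if
    Z is the sole confounder of X and Y."""
    total = 0.0
    for z0 in (0, 1):
        p_z = mean(1.0 if z == z0 else 0.0 for z, _, _ in data)
        e1 = mean(y for z, x, y in data if z == z0 and x == 1)
        e0 = mean(y for z, x, y in data if z == z0 and x == 0)
        total += (e1 - e0) * p_z
    return total
```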
Regression and classification are used to produce a prediction of an outcome Y. In many cases, we will also want a prediction set C that contains Y with probability $1-\alpha$. In this chapter, we discuss two methods for constructing prediction sets. The first is based on quantile regression. The second uses a method called conformal inference.
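A minimal sketch of split conformal inference follows. It assumes a point predictor already fitted on separate training data and a held-out calibration set. Under exchangeability, the ⌈(n+1)(1−α)⌉-th smallest absolute calibration residual gives an interval that covers a new Y with probability at least 1−α. The function name and the absolute-residual score are choices made for illustration; the chapter's version may differ.

```python
import math


def split_conformal_interval(calib_x, calib_y, predict, x_new, alpha=0.1):
    """Split conformal prediction interval around predict(x_new).

    `predict` must be fitted on data disjoint from the calibration set.
    Coverage >= 1 - alpha holds if calibration and test points are exchangeable."""
    n = len(calib_x)
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    # Rank of the score used as the interval half-width.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = scores[k - 1]
    center = predict(x_new)
    return (center - q, center + q)
```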
In this chapter, we discuss nonparametric regression, which does not assume that the regression function is linear or even approximately linear. We only assume that the regression function is a smooth function of x. This chapter describes some nonparametric methods in the case where there is a single feature. Chapter 7 considers the case of multiple features.
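Kernel regression (the Nadaraya–Watson estimator) is one widely used smoother for a single feature. It estimates the regression function at x as a weighted average of the observed Y values, with weights that decay with distance from x. The Gaussian kernel and the function name are assumptions for this sketch. The bandwidth h controls smoothness: small h tracks the data closely, while large h approaches the overall mean of Y.

```python
import math


def nadaraya_watson(train_x, train_y, x, bandwidth):
    """Kernel regression estimate at x: weighted average of train_y with
    Gaussian weights exp(-0.5 * ((x - x_i) / h)^2)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in train_x]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; increase the bandwidth")
    return sum(w * yi for w, yi in zip(weights, train_y)) / total
```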
In this chapter, we introduce the concept of regression, which is a way to quantify the relationship between an outcome Y and a vector of features X. This relationship is expressed by the regression function, which is the mean of Y given X. This chapter discusses the main ideas and goals of regression analysis.
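In symbols, write $m(x)$ for the regression function; the symbol is our choice, and the chapter may use another. A standard decomposition shows why the conditional mean is the natural target: for any predictor $f$ with finite second moment, the cross term vanishes because $\mathbb{E}[Y - m(X) \mid X] = 0$. Hence $m$ minimizes the mean squared prediction error.

```latex
m(x) = \mathbb{E}(Y \mid X = x), \qquad
\mathbb{E}\big[(Y - f(X))^2\big]
  = \mathbb{E}\big[(Y - m(X))^2\big] + \mathbb{E}\big[(m(X) - f(X))^2\big]
  \;\ge\; \mathbb{E}\big[(Y - m(X))^2\big].
```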
This work presents a nonlinear system identification framework for modeling the power extraction dynamics of wind turbines, including both freestream and waked conditions. The approach models turbine dynamics using data-driven power coefficient maps expressed as combinations of compact radial basis functions and polynomial bases, parameterized in terms of tip-speed ratio and upstream conditions. These surrogate models are embedded in a first-order dynamic system suitable for model-based control. Experimental validation is carried out in two wind tunnel configurations: a low-turbulence tandem setup and a high-turbulence wind farm scenario. In the tandem case, the identified model is integrated into an adapted $K\omega^2$ controller, resulting in improved tip-speed ratio tracking and power stability compared with steady-state models. It also outperforms the blade element momentum (BEM)-based model in the present low-Reynolds-number setting, where aerodynamic predictions are more uncertain. In the wind farm scenario, the model captures the statistical behavior of the turbines despite unresolved turbulence. The proposed method enables interpretable, adaptive control across a range of operating conditions without relying on black-box learning strategies.
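To illustrate how a power-coefficient map enters a first-order rotor model under $K\omega^2$ control, the sketch below integrates J dω/dt = τ_aero − Kω². It uses the standard gain K = ½ρπR⁵C_p,max/λ_opt³, which balances aerodynamic and generator torque at the optimal tip-speed ratio. This is not the paper's identified model. The quadratic `cp_toy` curve stands in for the identified RBF/polynomial map, and every parameter value is hypothetical.

```python
import math

# Hypothetical parameters for a small wind-tunnel rotor (illustration only).
RHO = 1.2        # air density [kg/m^3]
RADIUS = 0.5     # rotor radius [m]
INERTIA = 0.01   # rotor plus generator inertia [kg m^2]
CP_MAX = 0.45    # peak of the toy power-coefficient curve
TSR_OPT = 7.0    # tip-speed ratio at which the toy curve peaks
AREA = math.pi * RADIUS ** 2


def cp_toy(tsr):
    """Toy quadratic Cp curve standing in for an identified Cp map."""
    return max(CP_MAX - 0.005 * (tsr - TSR_OPT) ** 2, 0.0)


# K*omega^2 gain: generator torque equals aerodynamic torque at TSR_OPT.
K_GAIN = 0.5 * RHO * AREA * RADIUS ** 3 * CP_MAX / TSR_OPT ** 3


def simulate_rotor(wind_speed, omega0, t_end=10.0, dt=1e-3):
    """Forward-Euler integration of J * d(omega)/dt = tau_aero - K * omega^2
    at constant wind speed; returns the final rotor speed [rad/s]."""
    omega = omega0
    for _ in range(int(round(t_end / dt))):
        tsr = omega * RADIUS / wind_speed
        power = 0.5 * RHO * AREA * cp_toy(tsr) * wind_speed ** 3
        tau_aero = power / omega
        tau_gen = K_GAIN * omega ** 2
        omega += dt * (tau_aero - tau_gen) / INERTIA
    return omega
```

For any constant wind speed, the rotor in this toy model settles at the tip-speed ratio where the assumed C_p curve peaks. Replacing `cp_toy` with a map that also depends on upstream (e.g. waked) conditions is where an identified surrogate would enter.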
In modelling the deterioration or ageing of devices, first-passage times of Markov processes play a significant role, especially when the devices are subject to shocks and wear during operation. Obtaining sufficient conditions for first-passage times to belong to specific ageing classes is therefore an important problem. There is a rich literature on this class of problems; see, for example, [11], [18], [35], [67]. We address the same problem for some new ageing classes, such as DMTTF (IMTTF), IMIT, and DRHR. In this connection, we also investigate some issues in [11]. We conclude with examples that highlight the practical relevance of the results.
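A first-passage time distribution can be checked numerically for an ageing property. The sketch below computes P(T = k) for the first entry of a discrete-time Markov chain into a target (failure) state. It then computes the discrete reversed hazard rate r(k) = P(T = k)/P(T ≤ k); DRHR means this rate is decreasing. The discrete-time setting, the function names, and the finite-horizon check are simplifying assumptions, and a numerical check on a finite horizon is not a proof. The chains in the usage are a one-step geometric model and a two-stage wear model.

```python
def first_passage_pmf(P, start, target, horizon):
    """P(T = k) for k = 1..horizon, where T is the first time a discrete-time
    Markov chain with transition matrix P (rows sum to 1), started at
    `start` != `target`, enters `target`."""
    n = len(P)
    # dist[s] = P(chain is at s now and has not yet reached target).
    dist = [0.0] * n
    dist[start] = 1.0
    pmf = []
    for _ in range(horizon):
        new = [0.0] * n
        for s in range(n):
            if dist[s] == 0.0:
                continue
            for t in range(n):
                new[t] += dist[s] * P[s][t]
        pmf.append(new[target])  # mass reaching target for the first time
        new[target] = 0.0        # drop it so later visits are not counted
        dist = new
    return pmf


def reversed_hazard(pmf):
    """Discrete reversed hazard rate r(k) = P(T = k) / P(T <= k)."""
    rates, cdf = [], 0.0
    for p in pmf:
        cdf += p
        rates.append(p / cdf if cdf > 0 else 0.0)
    return rates


def is_decreasing(seq, tol=1e-12):
    """True if seq is non-increasing up to a small tolerance."""
    return all(b <= a + tol for a, b in zip(seq, seq[1:]))
```

Example: `first_passage_pmf([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]], 0, 2, 4)` gives the first-passage probabilities of a two-stage wear process.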
Rapid developments in Artificial Intelligence (AI) tools and their ever-expanding use in societal contexts raise questions about their design, social acceptance, and regulation. We develop a randomized vignette experiment covering six contemporary high-stakes contexts that employ algorithmic tools, and uncover distinct attitudes toward the source of AI provisioning (public vs. private sector). Participants are drawn from technology and law schools in India, representing key future stakeholders in the AI policy ecosystem in the country. Our main result indicates an overall preference for public sector provisioning of AI tools, with important implications for the social acceptability of public sector expansion of AI systems. Although the Global South has witnessed widespread expansion of AI tools with limited scrutiny, the nascent scholarship on AI attitudes comes disproportionately from the Global North. Our study fills this gap, as well as the gap in empirical evidence on private vs. public provisioning of AI tools in social contexts.
The usual definitions of algorithmic fairness focus on population-level statistics, such as demographic parity or equal opportunity. However, in many social or economic contexts, fairness is not perceived globally but locally, through an individual’s peer network and comparisons. We propose a theoretical model of perceived fairness networks, in which each individual’s sense of discrimination depends on the local topology of interactions. We show that even if a decision rule satisfies standard fairness criteria, perceived discrimination can persist or even increase in the presence of homophily or assortative mixing. Our formalism of fairness perception links network structure, local observation, and social perception. Analytical and simulation results highlight how network topology affects the divergence between objective fairness and perceived fairness, with implications for algorithmic governance and applications in finance and collaborative insurance.
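A toy example separates global parity from locally perceived disparity. It relies on a hypothetical local measure, not the paper's definition: for each node, the acceptance rate among out-group neighbors minus that among in-group neighbors. In the small graph below, both groups have the same global acceptance rate (demographic parity holds). Even so, some rejected individuals see only accepted out-group peers and perceive the maximal gap.

```python
def group_rates(groups, accepted):
    """Population-level acceptance rate for each group."""
    rates = {}
    for g in set(groups):
        members = [a for gi, a in zip(groups, accepted) if gi == g]
        rates[g] = sum(members) / len(members)
    return rates


def perceived_gap(node, edges, groups, accepted):
    """Hypothetical local measure: out-group minus in-group acceptance rate
    among the node's neighbors (None if either set of neighbors is empty)."""
    neighbors = [v for u, v in edges if u == node] + [u for u, v in edges if v == node]
    same = [accepted[v] for v in neighbors if groups[v] == groups[node]]
    other = [accepted[v] for v in neighbors if groups[v] != groups[node]]
    if not same or not other:
        return None
    return sum(other) / len(other) - sum(same) / len(same)


# Hypothetical population: two groups of four, half accepted in each group.
groups = ["A"] * 4 + ["B"] * 4
accepted = [1, 1, 0, 0, 1, 1, 0, 0]
# Mostly within-group ties (homophily) plus a few cross-group links.
edges = [(0, 1), (2, 3), (2, 4), (3, 5), (4, 5), (6, 7), (1, 6)]
```

Here `group_rates` reports 0.5 for both groups. Node 2, a rejected member of group A, has one rejected in-group neighbor and one accepted out-group neighbor, so its perceived gap is 1.0.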
This article explores the integration of large language models (LLMs) and AI research agents into global benchmarking frameworks, with a focus on data for the public good. Against a backdrop of shrinking funding and rising demand for scalable and reproducible assessments, we ask whether AI can assume core roles in indicator development, evidence discovery, and policy evaluation without compromising contextual nuance or democratic legitimacy. Building on pilot experiments conducted within the Global Data Barometer (GDB), we employed a phased, adaptive methodology that tested workflow-based platforms and deep research agents across tasks ranging from legal interpretation to multisource policy analysis. The preliminary findings suggest that while AI systems show strong potential for automating structured assessments, they falter on complex, fragmented, or normatively loaded indicators, raising concerns about opacity, overinterpretation, and inclusivity. To navigate these tensions, we propose a hybrid human-AI architecture that combines standardized workflows, adaptive agent capabilities, and critical human oversight. Central to this model is the concept of a dynamic evidence infrastructure, designed to embed participatory validation and enhance transparency. By reframing automation as augmentation, the study contributes both an empirical, domain-specific assessment of the opportunities and limits of AI-assisted benchmarking and a theoretical framework for sustainable, context-aware evaluation in the age of AI. We argue that the success of AI-assisted benchmarking should be measured not only in efficiency gains but also in its ability to strengthen legitimacy, accountability, and inclusiveness in data ecosystems worldwide.
In this work, we consider two coherent systems with shared components in the case where the component lifetimes are independent and identically distributed with a discrete-time rather than a continuous-time distribution. We propose a discrete-time joint signature for the two systems by generalizing the traditional joint signature for systems with continuous lifetimes. Several stochastic properties of the proposed joint signature are studied in detail, including the joint distribution, stochastic ordering, and a transformation formula for comparing pairs of systems of different sizes. Some illustrative examples are also presented.
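The traditional joint signature, the continuous-lifetime object generalized here, can be computed by enumeration. With i.i.d. continuous lifetimes, all n! failure orders are equally likely, and s(i, j) is the probability that system 1 fails at the i-th and system 2 at the j-th component failure. The sketch below implements only this classical version; the discrete-time generalization, which must handle ties, is not attempted. The two example systems and all names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def failure_index(order, works):
    """1-based position in the failure order at which the system fails.
    `works(failed)` returns True while the system functions with `failed` down."""
    failed = set()
    for pos, comp in enumerate(order, start=1):
        failed.add(comp)
        if not works(failed):
            return pos
    raise ValueError("system never fails; structure is not coherent")


def joint_signature(n, works1, works2):
    """Classical joint signature for i.i.d. continuous lifetimes:
    s[(i, j)] = P(T1 = X_(i:n), T2 = X_(j:n)), all n! orders equally likely."""
    counts = {}
    orders = list(permutations(range(n)))
    for order in orders:
        key = (failure_index(order, works1), failure_index(order, works2))
        counts[key] = counts.get(key, 0) + 1
    return {key: Fraction(c, len(orders)) for key, c in counts.items()}


def series_01(failed):
    """Hypothetical system 1: components 0 and 1 in series."""
    return not ({0, 1} & failed)


def parallel_12(failed):
    """Hypothetical system 2: components 1 and 2 in parallel (shares component 1)."""
    return not ({1, 2} <= failed)
```

For these two systems, `joint_signature(3, series_01, parallel_12)` puts mass 1/2 on (1, 3) and 1/6 on each of (1, 2), (2, 2), and (2, 3).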
We prove the existence of a stationary IARCH (integrated autoregressive conditionally heteroskedastic) process for a class of models with polynomially decaying coefficients.
This paper proposes a Palm-based framework for component importance measures conditioned on system failure events, uniformly applicable to nonrepairable, repairable, and phased-mission systems. The formulation does not require renewal, stationary, or Markovian assumptions. We classify measures into transition-based and state-based importance. The former captures component failure transitions, recovering Birnbaum and Barlow–Proschan importance, while the latter provides an event-based interpretation of Fussell–Vesely importance. This framework yields well-defined finite-time and phase-conditioned measures for nonstationary systems, establishing a rigorous probabilistic foundation for failure-event-based analysis.
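For reference, one of the classical measures the framework recovers is the Birnbaum importance. For a static, nonrepairable system with independent components, it is I_B(i) = h(1_i, p) − h(0_i, p), where h is system reliability. The sketch below computes it by brute-force enumeration of component states. This is the textbook static measure, not the Palm-based formulation. The example structure and the function names are hypothetical.

```python
from itertools import product


def reliability(structure, p):
    """System reliability h(p): sum over all component state vectors
    (1 = working) of their probability when the structure works.
    Components are independent with P(working) = p[i]."""
    total = 0.0
    for states in product((0, 1), repeat=len(p)):
        prob = 1.0
        for s, pi in zip(states, p):
            prob *= pi if s else 1.0 - pi
        if structure(states):
            total += prob
    return total


def birnbaum(structure, p, i):
    """Birnbaum importance I_B(i) = h(1_i, p) - h(0_i, p)."""
    up = list(p)
    up[i] = 1.0
    down = list(p)
    down[i] = 0.0
    return reliability(structure, up) - reliability(structure, down)


def series_parallel(x):
    """Hypothetical structure: component 0 in series with the parallel pair (1, 2)."""
    return x[0] == 1 and (x[1] == 1 or x[2] == 1)
```

With p = [0.9, 0.8, 0.7], component 0 is the most important (I_B = 0.94), since the whole system depends on it directly.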
Analyses of online social interactions on major platforms have become central to evaluating the ideological leanings of users. Social scientists often represent individuals’ policy preferences as ideal points in a low-dimensional space. We first demonstrate that standard ideal-point estimation approaches can be temporally inconsistent by examining the evolution of ideological positions for political elites and non-elite Twitter users from 2017 to 2022. To address this instability and to improve overall measurement quality, we augment the social-network data conventionally used for ideal-point estimation with unstructured text data widely available on social media. Our results indicate that this data-augmented approach improves identification at the extreme ends of the ideological spectrum and yields robust estimates consistent with those from more complex word-embedding techniques, while offering greater computational efficiency and interpretability. This framework can enhance future studies that measure political ideology using behavioral data and inform broader discussions about integrating textual and network information.