This chapter introduces machine learning as a subset of artificial intelligence that enables computers to learn from data without explicit programming. It defines machine learning using Tom Mitchell’s formal framework and explores practical applications like self-driving cars, optical character recognition, and recommendation systems. The chapter focuses on regression as a fundamental machine learning technique, explaining linear regression for modeling relationships between variables. A key section covers gradient descent, an optimization algorithm that iteratively finds the best model parameters by minimizing error functions. Through hands-on Python examples, students learn to implement both linear regression and gradient descent algorithms, visualizing how models improve over iterations. The chapter emphasizes practical considerations for choosing appropriate algorithms, including accuracy, training time, linearity assumptions, and the number of parameters, preparing students for more advanced supervised and unsupervised learning techniques.
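The kind of exercise described here can be illustrated with a minimal, self-contained sketch (not taken from the chapter; the toy data and names are hypothetical) of gradient descent fitting a simple linear model by minimizing mean squared error:

```python
import numpy as np

# Toy data: y is roughly 2*x + 1 plus noise (hypothetical example)
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Model: y_hat = w * x + b, fitted by minimizing mean squared error
w, b = 0.0, 0.0
lr = 0.02          # learning rate
for _ in range(2000):
    y_hat = w * x + b
    error = y_hat - y
    # Gradients of the MSE with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"fitted slope={w:.3f}, intercept={b:.3f}")  # should land close to 2 and 1
```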
In any data analysis we should look for the ability to predict and for connections to a broader comparative context. Our equations must not predict absurdities, even under extreme circumstances, if we want to be taken seriously as scientists. Poorly done linear regression analysis often leads to absurd predictions. Fixed-exponent and exponential patterns seem more prevalent in social phenomena than linear patterns. Before applying regression to two variables, graph them against each other, showing the borders of the conceptually allowed space and possible logical anchor points. Transform the data until anchor points and data points fit a straight line that does not pierce conceptual ceilings or floors. During regression, consider symmetric regression, because ordinary least squares y-on-x and x-on-y differ from each other and their slopes depend on the degree of scatter. After regression, look at the numerical values of the parameters and ask what they tell us in a comparative context. When considering multivariable regression, pay more than lip service to Occam's razor.
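The point about asymmetric OLS slopes can be made concrete with a small sketch (hypothetical data, not from the source): it compares the y-on-x slope, the slope implied by regressing x on y, and the symmetric (reduced-major-axis) slope, which is their geometric mean:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(scale=1.0, size=200)   # scatter makes the two OLS slopes diverge

sx, sy = x.std(), y.std()
r = np.corrcoef(x, y)[0, 1]

slope_y_on_x = r * sy / sx               # OLS regression of y on x
slope_x_on_y = sy / (r * sx)             # slope implied by OLS of x on y, re-expressed as dy/dx
slope_symmetric = np.sign(r) * sy / sx   # geometric mean of the two (reduced major axis)

print(slope_y_on_x, slope_x_on_y, slope_symmetric)
```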
This chapter discusses the most basic frequentist method, linear least squares (LLS) regression, for obtaining the optimal weights of the linear regression model that minimize the squared error between the model prediction and the observations. The chapter also considers how the goodness of its results can be evaluated quantitatively with the coefficient of determination (R-squared). The chapter then discusses some variations of LLS, including ridge regression, which adds a regularization term to make a proper tradeoff between overfitting and underfitting, and linear methods based on basis functions for nonlinear regression problems. Finally, the last section of the chapter briefly discusses Bayesian methods that maximize the likelihood and the posterior probability of the parameters of the linear regression model. This section prepares the reader for the discussion of various Bayesian learning algorithms in later chapters.
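As an illustration of the closed-form machinery referred to here, a hedged sketch of (regularized) linear least squares and the coefficient of determination; the helper names and toy data are hypothetical, and for simplicity the ridge term also penalizes the intercept, which one usually avoids in practice:

```python
import numpy as np

def fit_linear(X, y, ridge=0.0):
    """Closed-form (regularized) least squares: w = (X'X + ridge*I)^-1 X'y."""
    X1 = np.column_stack([np.ones(len(X)), X])          # prepend an intercept column
    A = X1.T @ X1 + ridge * np.eye(X1.shape[1])
    return np.linalg.solve(A, X1.T @ y)

def r_squared(X, y, w):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    X1 = np.column_stack([np.ones(len(X)), X])
    resid = y - X1 @ w
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Hypothetical toy data
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=100)

w = fit_linear(X, y, ridge=0.1)
print(w, r_squared(X, y, w))
```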
Chapter 2 explores the fundamental concept of least squares, covering its geometric, algebraic, and numerical aspects. The chapter begins with a review of vector spaces and matrix inverses, then introduces the geometry of least squares through orthogonal projections. It presents the QR decomposition and Householder transformations as efficient methods for solving least-squares problems. The chapter concludes with an application to regression analysis, demonstrating how to fit linear and polynomial models to data. Key topics include the normal equations, orthogonal decomposition, and the Gram–Schmidt algorithm. The chapter also addresses the issue of overfitting in polynomial regression, highlighting the importance of model selection in data analysis. The chapter includes practical Python implementations and numerical examples to reinforce the theoretical concepts.
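A minimal NumPy sketch (not from the chapter; the toy data are hypothetical) of the QR route to a polynomial least-squares fit:

```python
import numpy as np

# Hypothetical data: noisy samples of a quadratic
rng = np.random.default_rng(3)
t = np.linspace(-1, 1, 40)
y = 1.0 - 2.0 * t + 0.5 * t**2 + 0.1 * rng.normal(size=t.size)

# Vandermonde design matrix for a degree-2 polynomial (columns: 1, t, t^2)
A = np.vander(t, N=3, increasing=True)

# Least squares via the (thin) QR decomposition: A = QR, then solve R c = Q^T y
Q, R = np.linalg.qr(A)
coef = np.linalg.solve(R, Q.T @ y)
print(coef)   # approximately [1.0, -2.0, 0.5]
```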
Linear regression is used to make predictions from data sets everywhere: from Amazon.com suggesting books to customers to businesses predicting revenue from advertising, regression analyses appear throughout everyday life. A researcher can use a multiple regression analysis to predict cause-and-effect relationships between the dependent and independent variables in a study. Unlike an ANOVA, regression allows categorical and scalar independent variables to be included together in the analysis. In this chapter, student researchers are taught what a regression analysis is, what it can be used for, and how to perform a multiple linear regression using SPSS and R. Students are also taught how to interpret regression results for testing hypotheses.
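Although the chapter itself works in SPSS and R, the same workflow can be sketched in Python with statsmodels; the data frame, variable names, and effect sizes below are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data set with one scalar and one categorical predictor
rng = np.random.default_rng(10)
n = 120
df = pd.DataFrame({
    "hours": rng.uniform(0, 10, n),                       # scalar predictor
    "group": rng.choice(["control", "treatment"], n),     # categorical predictor
})
df["score"] = (50 + 2.5 * df["hours"] + 5.0 * (df["group"] == "treatment")
               + rng.normal(scale=4.0, size=n))

# Multiple linear regression mixing categorical and scalar predictors
fit = smf.ols("score ~ hours + C(group)", data=df).fit()
print(fit.params)     # estimated coefficients
print(fit.pvalues)    # p-values used to test hypotheses about each predictor
```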
This article addresses a critical research gap by investigating the link between social polarization and contemporary societal challenges, such as poverty and crime. Despite the existence of numerous studies describing the effects of income disparity, the role of social stratification in inducing delinquency and poverty in transitional economies has not received sufficient consideration. The author conducts an analysis of income inequality metrics (IIMs) across Commonwealth of Independent States (CIS) nations, examining dependencies that provide a precise depiction of the correlation between socioeconomic disparity and pressing societal problems. The study disclosed that the lowest proportion of the impoverished was observed in countries with a minimal income disparity, such as Belarus and Kazakhstan; in contrast, nations that exhibit a significant degree of income differentiation, such as Kyrgyzstan, Moldova, and Armenia, demonstrate the highest proportion of the impoverished. Following a linear regression approach, the study revealed a strong positive correlation between IIMs and crime rate. By shedding light on these dependencies, the article provides fresh insights into the dynamic relationship between major social problems in the CIS countries.
This chapter discusses techniques to build predictive models from data and to quantify the uncertainty of the model parameters and of the model predictions. The chapter discusses important concepts of linear and nonlinear regression and focuses on a couple of major paradigms used for estimation: maximum likelihood and Bayesian estimation. The chapter also discusses how to incorporate prior knowledge in the estimation process.
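A brief sketch of the two estimation paradigms mentioned here, for the special case of linear regression with Gaussian noise (toy data and symbols are hypothetical): the maximum-likelihood weights coincide with the least-squares solution, and a zero-mean Gaussian prior on the weights turns the MAP estimate into a ridge-type solution:

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([0.5, 2.0]) + 0.4 * rng.normal(size=50)

# Maximum likelihood under Gaussian noise: weights = least-squares solution,
# noise variance = mean squared residual
w_ml = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_ml = np.mean((y - X @ w_ml) ** 2)

# MAP estimate with a zero-mean Gaussian prior on the weights (precision alpha):
# equivalent to ridge regression with penalty alpha * sigma2
alpha = 1.0
w_map = np.linalg.solve(X.T @ X + alpha * sigma2_ml * np.eye(2), X.T @ y)

print(w_ml, sigma2_ml, w_map)
```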
This chapter introduces linear regression, the workhorse of statistics and (supervised) learning, which relates an input vector to an output response by means of a linear combination.
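In the usual notation (generic symbols, not necessarily the chapter's), the model is

```latex
y = \mathbf{w}^{\top}\mathbf{x} + \varepsilon,
\qquad
\hat{y} = w_0 + w_1 x_1 + \cdots + w_d x_d .
```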
This chapter covers regression and classification, where the goal is to estimate a quantity of interest (the response) from observed features. In regression, the response is a numerical variable. In classification, it belongs to a finite set of predetermined classes. We begin with a comprehensive description of linear regression and discuss how to leverage it to perform causal inference. Then, we explain under what conditions linear models tend to overfit or to generalize robustly to held-out data. Motivated by the threat of overfitting, we introduce regularization and ridge regression, and discuss sparse regression, where the goal is to fit a linear model that only depends on a small subset of the available features. Then, we introduce two popular linear models for binary and multiclass classification: Logistic and softmax regression. At this point, we turn our attention to nonlinear models. First, we present regression and classification trees and explain how to combine them via bagging, random forests, and boosting. Second, we explain how to train neural networks to perform regression and classification. Finally, we discuss how to evaluate classification models.
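A compact sketch of three of the linear models mentioned (ridge, lasso, and logistic regression), using scikit-learn; the data and the sparse ground truth are hypothetical:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
# Only the first two features matter (a sparse ground truth, hypothetical)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=200)

ridge = Ridge(alpha=1.0).fit(X, y)     # shrinks all coefficients a little
lasso = Lasso(alpha=0.1).fit(X, y)     # drives irrelevant coefficients to exactly zero
print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))

# Binary classification with logistic regression on a thresholded response
y_cls = (y > 0).astype(int)
clf = LogisticRegression().fit(X, y_cls)
print(clf.score(X, y_cls))             # training accuracy
```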
This chapter covers quantum linear system solvers, which are quantum algorithmic primitives for solving a linear system of equations. The linear system problem is encountered in many real-world situations, and quantum linear system solvers are a prominent ingredient in quantum algorithms in the areas of machine learning and continuous optimization. Quantum linear system solvers do not themselves solve end-to-end problems because their output is a quantum state, which is one of their major caveats.
Principal components analysis can be redefined in terms of the regression of observed variables upon component variables. Two criteria for the adequacy of a component representation in this context are developed and are shown to lead to different component solutions. Both criteria are generalized to allow weighting, the choice of weights determining the scale invariance properties of the resulting solution. A theorem is presented giving necessary and sufficient conditions for equivalent component solutions under different choices of weighting. Applications of the theorem are discussed that involve the components analysis of linearly derived variables and of external variables.
An extension of component analysis to longitudinal or cross-sectional data is presented. In this method, components are derived under the restriction of invariant and/or stationary compositing weights. Optimal compositing weights are found numerically. The method can be generalized to allow differential weighting of the observed variables in deriving the component solution. Some choices of weightings are discussed. An illustration of the method using real data is presented.
Mediation analysis practices in social and personality psychology would benefit from the integration of practices from statistical mediation analysis, which is currently commonly implemented in social and personality psychology, and causal mediation analysis, which is not frequently used in psychology. In this chapter, I briefly describe each method on its own, then provide recommendations for how to integrate practices from each method to simultaneously evaluate statistical inference and causal inference as part of a single analysis. At the end of the chapter, I describe additional areas of recent development in mediation analysis that social and personality psychologists should also consider adopting in order to improve the quality of inference in their mediation analyses: latent variables and longitudinal models. Ultimately, this chapter is meant to be a gentle introduction to causal inference in the context of mediation, with very practical recommendations for how one can implement these practices in one's own research.
This chapter is devoted to extensive instruction regarding bivariate regression, also known as ordinary least squares (OLS) regression. Students are presented with a scatterplot of data with a best-fitting line drawn through it. They are instructed on how to calculate the equation of this line (the least squares line) by hand and with the R Commander. Interpretation of the statistical output of the y-intercept, beta coefficient, and R-squared value is discussed. Statistical significance of the beta coefficient and its implications for the relationship between an independent and dependent variable are described. Finally, the use of the regression equation for prediction is illustrated.
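The hand calculations described here follow the standard formulas; a small sketch (with made-up paired observations) shows the slope, intercept, R-squared, and a prediction:

```python
import numpy as np

# Hypothetical paired observations
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.1, 5.0, 6.2, 7.8, 10.1])

# "By hand" least-squares formulas
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Coefficient of determination
y_hat = intercept + slope * x
r_squared = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(intercept, slope, r_squared)

# Using the fitted line for prediction, e.g. at x = 6
print(intercept + slope * 6.0)
```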
Many applications require solving a system of linear equations 𝑨𝒙 = 𝒚 for 𝒙 given 𝑨 and 𝒚. In practice, often there is no exact solution for 𝒙, so one seeks an approximate solution. This chapter focuses on least-squares formulations of this type of problem. It briefly reviews the 𝑨𝒙 = 𝒚 case and then motivates the more general 𝑨𝒙 ≈ 𝒚 cases. It then focuses on the over-determined case where 𝑨 is tall, emphasizing the insights offered by the SVD of 𝑨. It introduces the pseudoinverse, which is especially important for the under-determined case where 𝑨 is wide. It describes alternative approaches for the under-determined case such as Tikhonov regularization. It introduces frames, a generalization of unitary matrices. It uses the SVD analysis of this chapter to describe projection onto a subspace, completing the subspace-based classification ideas introduced in the previous chapter, and also introduces a least-squares approach to binary classifier design. It introduces recursive least-squares methods that are important for streaming data.
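A brief NumPy sketch of the two regimes discussed here (the matrices and sizes are hypothetical): the pseudoinverse for a tall, over-determined system and Tikhonov regularization for a wide, under-determined one:

```python
import numpy as np

rng = np.random.default_rng(6)

# Over-determined case: A is tall; the pseudoinverse gives the least-squares solution
A = rng.normal(size=(30, 4))
y = A @ np.array([1.0, -1.0, 2.0, 0.5]) + 0.1 * rng.normal(size=30)
x_ls = np.linalg.pinv(A) @ y

# Under-determined case: B is wide; Tikhonov regularization picks a well-behaved solution,
# using the identity (B'B + lam I)^-1 B' = B' (BB' + lam I)^-1
B = rng.normal(size=(4, 30))
z = rng.normal(size=4)
lam = 0.1
x_tik = B.T @ np.linalg.solve(B @ B.T + lam * np.eye(4), z)

print(x_ls)
print(np.linalg.norm(B @ x_tik - z))   # small residual, controlled by lam
```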
In this chapter we cover clustering and regression, looking at two traditional machine learning methods: k-means and linear regression. We briefly discuss how to implement these methods in a non-distributed manner first, and then carefully analyze their bottlenecks when manipulating big data. This enables us to design global-based solutions based on the DataFrame API of Spark. The key focus is on the principles for designing solutions effectively; nevertheless, some of the challenges in this chapter involve investigating tools from Spark to speed up the processing even further. k-means is an example of an iterative algorithm that shows how to exploit caching in Spark, and we analyze its implementation with both the RDD and DataFrame APIs. For linear regression, we first implement the closed form, which involves numerous matrix multiplications and outer products, to simplify the processing in big data. Then we look at gradient descent. These examples give us the opportunity to expand on the principles of designing a global solution, and also allow us to show how knowing the underlying platform well, Spark in this case, is essential to really maximize performance.
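The closed-form formulation lends itself to the map-and-reduce pattern described above: each partition contributes partial sums of outer products, which are then aggregated and solved once. The sketch below simulates that accumulation serially with NumPy (it illustrates the pattern only and is not the book's Spark code; data and sizes are hypothetical):

```python
import numpy as np

# Hypothetical partitioned data: in Spark each partition would contribute its partial sums;
# here we simulate the same accumulation pattern serially.
rng = np.random.default_rng(7)
partitions = [rng.normal(size=(1000, 4)) for _ in range(5)]
true_w = np.array([1.0, 0.5, -2.0, 3.0])

XtX = np.zeros((4, 4))
Xty = np.zeros(4)
for X in partitions:                       # in Spark: a map over partitions followed by a reduce
    y = X @ true_w + 0.1 * rng.normal(size=len(X))
    XtX += X.T @ X                         # sum of outer products x_i x_i^T
    Xty += X.T @ y                         # sum of x_i * y_i

w_closed_form = np.linalg.solve(XtX, Xty)  # normal equations solved on the aggregated sums
print(w_closed_form)
```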
In this chapter, we introduce some of the more popular ML algorithms. Our objective is to provide the basic concepts and main ideas, show how to utilize these algorithms using Matlab, and offer some examples. In particular, we discuss essential concepts in feature engineering and how to apply them in Matlab. Support vector machines (SVM), K-nearest neighbor (KNN), linear regression, the Naïve Bayes algorithm, and decision trees are introduced, and the fundamental underlying mathematics is explained while using Matlab's corresponding Apps to implement each of these algorithms. A special section on reinforcement learning is included, detailing the key concepts and basic mechanism of this third ML category. In particular, we showcase how to implement reinforcement learning in Matlab, as well as how to make use of some of the Python libraries available online, and show how to use reinforcement learning for controller design.
Regression models with log-transformed dependent variables are widely used by social scientists to investigate nonlinear relationships between variables. Unfortunately, this transformation complicates the substantive interpretation of estimation results and often leads to incomplete and sometimes even misleading interpretations. We focus on one valuable but underused method, the presentation of quantities of interest such as expected values or first differences on the original scale of the dependent variable. The procedure to derive these quantities differs in seemingly minor but critical aspects from the well-known procedure based on standard linear models. To improve empirical practice, we explain the underlying problem and develop guidelines that help researchers to derive meaningful interpretations from regression results of models with log-transformed dependent variables.
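The core issue can be illustrated with a short sketch (hypothetical data): naively exponentiating the linear prediction from a log-y model understates the expected value on the original scale, which is why a normality-based correction or Duan's smearing estimator is used:

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(0, 2, size=5000)
log_y = 0.5 + 1.2 * x + rng.normal(scale=0.6, size=x.size)   # model is linear in log(y)
y = np.exp(log_y)

# Fit OLS on the log scale
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ np.log(y))
resid = np.log(y) - X @ b
sigma2 = resid.var(ddof=2)

x0 = np.array([1.0, 1.0])                  # evaluate the expected value at x = 1
naive = np.exp(x0 @ b)                     # exp of the predicted log: underestimates E[y|x]
normal_corr = np.exp(x0 @ b + sigma2 / 2)  # correction assuming normal errors on the log scale
smearing = np.exp(x0 @ b) * np.mean(np.exp(resid))   # Duan's smearing estimator

print(naive, normal_corr, smearing)
```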
Simple linear regression is extended to multiple linear regression (for multiple predictor variables) and to multivariate linear regression (for multiple response variables). Regression with circular data and/or categorical data is covered. How to select predictors and how to avoid overfitting with techniques such as ridge regression and lasso are discussed, followed by quantile regression. The assumption of Gaussian noise or residuals is relaxed in generalized least squares, with applications to optimal fingerprinting in climate change.
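As a small illustration of quantile regression on heteroscedastic data (hypothetical data; statsmodels' quantreg is used), different conditional quantiles get visibly different slopes:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical heteroscedastic data: the spread of y grows with x,
# so different quantiles of y | x have different slopes
rng = np.random.default_rng(9)
x = rng.uniform(0, 10, size=500)
y = 1.0 + 0.8 * x + rng.normal(scale=0.2 + 0.3 * x, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

median_fit = smf.quantreg("y ~ x", df).fit(q=0.5)    # median regression
upper_fit = smf.quantreg("y ~ x", df).fit(q=0.9)     # 90th-percentile regression
ols_fit = smf.ols("y ~ x", df).fit()                 # ordinary least squares for comparison

print(ols_fit.params.values, median_fit.params.values, upper_fit.params.values)
```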