This comprehensive modern look at regression covers a wide range of topics and relevant contemporary applications, going well beyond the topics covered in most introductory books. With concision and clarity, the authors present linear regression, nonparametric regression, classification, logistic and Poisson regression, high-dimensional regression, quantile regression, conformal prediction and causal inference. There are also brief introductions to neural nets, deep learning, random effects, survival analysis, graphical models and time series. Suitable for advanced undergraduate and beginning graduate students, the book will also serve as a useful reference for researchers and practitioners in data science, machine learning, and artificial intelligence who want to understand modern methods for data analysis.
Statistical modelling and machine learning offer a vast toolbox of inference methods with which to model the world, discover patterns and reach beyond the data to make predictions when the truth is not certain. This concise book provides a clear introduction to those tools and to the core ideas – probabilistic model, likelihood, prior, posterior, overfitting, underfitting, cross-validation – that unify them. A mixture of toy and real examples illustrates diverse applications ranging from biomedical data to treasure hunts, while the accompanying datasets and computational notebooks in R and Python encourage hands-on learning. Instructors can benefit from online lecture slides and exercise solutions. Requiring only first-year university-level knowledge of calculus, probability and linear algebra, the book equips students in statistics, data science and machine learning, as well as those in quantitative applied and social science programmes, with the tools and conceptual foundations to explore more advanced techniques.
Understanding change over time is a critical component of social science. However, data measured over time – time series – require their own set of statistical and inferential tools. In this book, Suzanna Linn, Matthew Lebo, and Clayton Webb explain the most commonly used time series models and demonstrate their applications using examples. The guide outlines the steps taken to identify a series, make determinations about exogeneity/endogeneity, and make appropriate modelling decisions and inferences. Detailing challenges and explaining key techniques not covered in most time series textbooks, the authors show how navigating between data and models, deliberately and transparently, allows researchers to clearly explain their statistical analyses to a broad audience.
Bridging theory and practice in network data analysis, this guide offers an intuitive approach to understanding and analyzing complex networks. It covers foundational concepts, practical tools, and real-world applications using Python frameworks including NumPy, SciPy, scikit-learn, graspologic, and NetworkX. Readers will learn to apply network machine learning techniques to real-world problems, transform complex network structures into meaningful representations, leverage Python libraries for efficient network analysis, and interpret network data and results. The book explores methods for extracting valuable insights across various domains such as social networks, ecological systems, and brain connectivity. Hands-on tutorials and concrete examples develop intuition through visualization and mathematical reasoning. The book will equip data scientists, students, and researchers working with network data with the skills to confidently tackle network machine learning projects, providing a robust toolkit for data science applications involving network-structured data.
This appendix delves into the mathematical foundations of network representation techniques, focusing on two key areas: maximum likelihood estimation (MLE) and spectral embedding theory. It begins by exploring MLE for Erdős-Rényi (ER) and stochastic block model (SBM) networks, demonstrating the unbiasedness and consistency of estimators. The limitations of MLE for more complex models are discussed, leading to the introduction of spectral methods. The appendix then presents theoretical considerations for spectral embeddings, including the adjacency spectral embedding (ASE) and its statistical properties. It explores the concepts of consistency and asymptotic normality in the context of random dot product graphs (RDPGs). Finally, we extend these insights to multiple network models, covering graph matching for correlated networks and joint spectral embeddings like the omnibus embedding and multiple adjacency spectral embedding (MASE).
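For the ER model, the MLE has a simple closed form: the observed number of edges divided by the number of possible edges. A minimal NumPy sketch (our own illustration, not the appendix's code) showing the estimator and its consistency as the network grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_er(n, p, rng):
    """Sample a simple undirected Erdős-Rényi network as an adjacency matrix."""
    coins = rng.random((n, n)) < p          # independent Bernoulli(p) draws
    A = np.triu(coins, k=1)                 # keep the strict upper triangle
    return (A + A.T).astype(int)            # symmetrize; no self-loops

def er_mle(A):
    """MLE of p: observed edges divided by the number of possible edges."""
    n = A.shape[0]
    return A[np.triu_indices(n, k=1)].mean()

# The estimate concentrates around the true p = 0.3 as n grows (consistency).
for n in (50, 500, 5000):
    print(n, er_mle(sample_er(n, p=0.3, rng=rng)))
```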
This chapter presents a unified framework for analyzing complex networks through statistical models. Starting with the Inhomogeneous Erdős-Rényi model’s concept of independent edge probabilities, we progress through increasingly sophisticated representations, including the Erdős-Rényi, Stochastic Block Model, and Random Dot Product Graph (RDPG) models. We explore how each model generalizes its predecessors, with the RDPG encompassing many earlier models under certain conditions. The crucial role of positive semidefiniteness in connecting block models to RDPGs is examined, providing insight into model interrelationships. We also introduce models addressing specific network characteristics, such as heterogeneous node degrees and edge-based clustering. The chapter extends to multiple and correlated network models, demonstrating how concepts from simpler models inform more complex scenarios. A hierarchical framework is presented, unifying these models and illustrating their relative generality, thus laying the groundwork for advanced network analysis techniques.
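The positive semidefiniteness connection can be made concrete: if an SBM's block matrix B is positive semidefinite, it factors as B = LLᵀ, and the rows of L serve as latent positions, so the SBM is also an RDPG. A small sketch of this factorization (the two-block matrix below is a made-up example):

```python
import numpy as np

# A toy 2-block SBM block matrix; positive semidefinite here, so the SBM
# is also an RDPG with latent positions derived from B.
B = np.array([[0.6, 0.2],
              [0.2, 0.4]])

evals, evecs = np.linalg.eigh(B)
assert np.all(evals >= -1e-12)              # PSD check

# Factor B = L @ L.T; row k of L is the latent position shared by
# every node in community k.
L = evecs @ np.diag(np.sqrt(np.clip(evals, 0, None)))
print(np.allclose(L @ L.T, B))              # True

# Each node copies its community's row, giving RDPG latent positions.
z = np.array([0, 0, 1, 1, 1])               # community assignments
X = L[z]                                    # 5 nodes, 2-d latent positions
P = X @ X.T                                 # edge probabilities: P_ij = <x_i, x_j>
```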
This chapter explores practical applications of network representation learning techniques for analyzing individual networks. It begins by addressing the community detection problem, demonstrating how to estimate community labels using network embeddings. The chapter then discusses the challenges posed by network sparsity and introduces efficient storage methods for sparse networks. The text proceeds to examine testing for differences between groups of edges, applying hypothesis testing to stochastic block models and structured independent edge models. It also covers model selection techniques for stochastic block models, helping readers choose appropriate levels of model complexity. The chapter introduces the vertex nomination problem, which aims to identify nodes similar to a set of known "seed" nodes. It presents spectral vertex nomination techniques and explores extensions to related problems. Finally, the chapter addresses out-of-sample embedding, providing efficient strategies for embedding new nodes into existing network representations. This approach is particularly valuable for large-scale, dynamic networks where frequent re-embedding would be computationally prohibitive.
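A typical community detection pipeline of this kind embeds the network, then clusters the embedding. The following sketch uses graspologic's sbm simulator and AdjacencySpectralEmbed with scikit-learn's KMeans; the simulation parameters are illustrative choices, not the chapter's:

```python
import numpy as np
from graspologic.simulations import sbm
from graspologic.embed import AdjacencySpectralEmbed
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Simulate a two-community SBM: dense within blocks, sparse between.
A = sbm([50, 50], [[0.5, 0.1], [0.1, 0.5]])
true_labels = np.repeat([0, 1], 50)

# Embed nodes into a low-dimensional space, then cluster the embeddings.
Xhat = AdjacencySpectralEmbed(n_components=2).fit_transform(A)
pred = KMeans(n_clusters=2, n_init=10).fit_predict(Xhat)

# Agreement with the planted communities, up to label permutation.
print(adjusted_rand_score(true_labels, pred))
```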
This chapter explores techniques for analyzing and comparing pairs of networks, building on previously introduced statistical models and representation learning methods. It focuses on two-sample testing for networks, introducing methods to determine whether two network observations are sampled from the same or different random networks. The chapter covers latent position and distribution testing, addressing nonidentifiability issues in network comparisons. It then explores specialized techniques for comparing stochastic block models (SBMs), leveraging their community structure and discussing methods for testing differences in block matrices, including density adjustment approaches. A significant portion is devoted to the graph matching problem, addressing the challenge of identifying node correspondences between networks. This section introduces permutation matrices and explores optimization-based methods, including gradient descent approaches, for both exact and inexact matching scenarios. Throughout, the chapter emphasizes practical implementations with code examples, bridging the gap between theoretical concepts and real-world applications in network analysis. These techniques provide a comprehensive toolkit for comparing networks, essential for understanding evolving networks, analyzing differences across domains, and integrating multisource network data.
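To give a flavor of the optimization-based matching machinery, SciPy ships a FAQ-style solver, scipy.optimize.quadratic_assignment, which relaxes the NP-hard matching problem and refines the solution with gradient steps. A sketch on a pair of isomorphic networks (the simulation setup is invented for illustration):

```python
import numpy as np
from scipy.optimize import quadratic_assignment

rng = np.random.default_rng(1)

# Build a random network A and a copy B with shuffled node labels.
n = 20
A = rng.random((n, n)) < 0.3
A = np.triu(A, 1)
A = (A + A.T).astype(float)
perm = rng.permutation(n)
B = A[perm][:, perm]                        # relabeled copy of A

# FAQ solver: maximize edge agreement between A and the permuted B.
res = quadratic_assignment(A, B, method="faq", options={"maximize": True})
matching = res.col_ind                      # matching[i]: B-node matched to A-node i

# Fraction of node correspondences recovered; often 1.0 on an exact copy.
print((matching == np.argsort(perm)).mean())
```

On noisy or only partially overlapping pairs (the inexact setting the chapter discusses), the solver typically recovers only part of the correspondence.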
This appendix provides a comprehensive overview of statistical network models, building from fundamental concepts to advanced frameworks. It begins with essential mathematical background and probability theory, then introduces the foundations of random network models. The appendix covers a range of models, including Erdős-Rényi, stochastic block models (both a priori and a posteriori), random dot product graphs, and their generalizations. Each model is presented with its parameters, generative process, probability calculations, and equivalence classes. The appendix also explores degree-corrected variants and the Inhomogeneous Erdős-Rényi model. Throughout, we emphasize the relationships between models and their increasing complexity, providing a solid theoretical foundation for understanding network structures and dynamics.
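As one instance of the probability calculations covered, the log-likelihood of an undirected network under an SBM factors over dyads. A sketch under stated assumptions (simple network, no self-loops; the function name is our own):

```python
import numpy as np

def sbm_log_likelihood(A, z, B):
    """Log-probability of an undirected adjacency matrix A under an SBM
    with community labels z and block matrix B (no self-loops)."""
    n = A.shape[0]
    P = B[z][:, z]                          # P_ij = B[z_i, z_j]
    i, j = np.triu_indices(n, k=1)          # count each dyad once
    a, p = A[i, j], P[i, j]
    return np.sum(a * np.log(p) + (1 - a) * np.log1p(-p))

# Example: the planted labels score higher than a random relabeling.
rng = np.random.default_rng(0)
z = np.repeat([0, 1], 25)
B = np.array([[0.5, 0.1], [0.1, 0.5]])
P = B[z][:, z]
A = np.triu(rng.random(P.shape) < P, 1)
A = (A + A.T).astype(int)
print(sbm_log_likelihood(A, z, B), sbm_log_likelihood(A, rng.permutation(z), B))
```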
This chapter presents a comprehensive workflow for applying network machine learning to functional MRI connectomes. We demonstrate data preprocessing, edge weight transformations, and spectral embedding techniques to analyze multiple brain networks simultaneously. Using multiple adjacency spectral embedding (MASE) and unsupervised clustering, we identify functionally similar brain regions across subjects. Results are visualized through abstract representations and brain-space projections, and compared with established brain parcellations. Our findings reveal that MASE-derived communities often align with the known functional and spatial organization of the brain, particularly in occipital and parietal areas, while also identifying regions where functional similarity does not imply spatial proximity. This case study illustrates how network machine learning can uncover meaningful patterns in complex neuroimaging data and emphasizes the importance of combining algorithmic approaches with domain expertise, motivating the remainder of the book.
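A stripped-down version of the workflow's core step, using graspologic's MultipleASE on simulated stand-ins for connectomes (toy data and parameter values, not the chapter's fMRI pipeline):

```python
import numpy as np
from graspologic.simulations import sbm
from graspologic.embed import MultipleASE
from sklearn.cluster import KMeans

# Toy stand-in for a group of connectomes: several networks sharing
# the same two-community structure, with sample-to-sample noise.
graphs = [sbm([40, 40], [[0.5, 0.1], [0.1, 0.5]]) for _ in range(5)]

# MASE: a joint embedding giving one latent position per node (brain
# region), shared across all networks in the sample.
Vhat = MultipleASE(n_components=2).fit_transform(graphs)

# Cluster the shared positions to group functionally similar regions.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(Vhat)
print(labels)
```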
This chapter introduces the network machine learning landscape, bridging traditional machine learning with network-specific approaches. It defines networks, contrasts them with tabular data structures, and explains their ubiquity in various domains. The chapter outlines different types of network learning systems, including single vs. multiple network, attributed vs. non-attributed, and model-based vs. non-model-based approaches. It also discusses the scope of network analysis, from individual edges to entire networks. The chapter concludes by addressing key challenges in network machine learning, such as imperfect observations, partial network visibility, and sample limitations. Throughout, it emphasizes the importance of statistical learning in generalizing findings from network samples to broader populations, setting the stage for more advanced concepts in subsequent chapters.
This chapter presents a framework for learning useful representations, or embeddings, of networks. Building on the statistical models from Chapter 4, we explore techniques to transform complex network data into vector representations suitable for traditional machine learning algorithms. We begin with maximum likelihood estimation for simple network models, then motivate the need for network embeddings by contrasting network dependencies with typical machine learning independence assumptions. We progress through spectral embedding methods, introducing adjacency spectral embedding (ASE) for learning latent position representations from adjacency matrices, and Laplacian spectral embedding (LSE) as an alternative approach effective for networks with degree heterogeneities. The chapter then extends to multiple network representations, exploring parallel techniques like omnibus embedding (OMNI) and fused methods such as multiple adjacency spectral embedding (MASE). We conclude by addressing the estimation of appropriate latent dimensions for embeddings. Throughout, we emphasize practical applications with code examples and visualizations. This unified framework for network embedding enables the application of various machine learning algorithms to network analysis tasks, bridging complex network structures and traditional data analysis techniques.
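At its core, ASE is a few lines of linear algebra: take the top-d eigenpairs of the adjacency matrix (largest in magnitude) and scale the eigenvectors by the square roots of the eigenvalues' absolute values. A minimal sketch (our own simplification; the book's implementations rely on graspologic):

```python
import numpy as np

def ase(A, d):
    """Adjacency spectral embedding: scaled top-d eigenvectors of a
    symmetric adjacency matrix A."""
    evals, evecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(evals))[::-1][:d]   # d largest in magnitude
    return evecs[:, top] * np.sqrt(np.abs(evals[top]))
```

For an RDPG, the rows of ase(A, d) estimate the latent positions up to an orthogonal rotation, so ase(A, d) @ ase(A, d).T estimates the edge-probability matrix P = XXᵀ.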
This appendix provides a concise introduction to key machine learning techniques employed throughout the book. It focuses on two main areas: unsupervised learning and Bayesian classification. The appendix begins with an exploration of K-means clustering, a fundamental unsupervised learning algorithm, demonstrating its application to network community detection. It then discusses methods for evaluating unsupervised learning techniques, including confusion matrices and the adjusted Rand index. The silhouette score is introduced as a metric for assessing clustering quality across different numbers of clusters. The appendix concludes with an explanation of the Bayes plugin classifier, a simple yet effective tool for network classification tasks.
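All of the evaluation tools named here are available in scikit-learn. A short sketch (toy two-cluster data, invented for illustration) combining K-means with the adjusted Rand index and the silhouette score across candidate numbers of clusters:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(3)

# Toy embedding: two well-separated point clouds, e.g. the rows of a
# spectral embedding of a two-community network.
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(1, 0.1, (50, 2))])
true = np.repeat([0, 1], 50)

# Silhouette score across candidate k; the best k tends to match the
# true structure, and ARI measures agreement with the true labels.
for k in range(2, 5):
    pred = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    print(k, silhouette_score(X, pred), adjusted_rand_score(true, pred))
```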