This chapter introduces machine learning as a subset of artificial intelligence that enables computers to learn from data without explicit programming. It defines machine learning using Tom Mitchell’s formal framework and explores practical applications like self-driving cars, optical character recognition, and recommendation systems. The chapter focuses on regression as a fundamental machine learning technique, explaining linear regression for modeling relationships between variables. A key section covers gradient descent, an optimization algorithm that iteratively finds the best model parameters by minimizing error functions. Through hands-on Python examples, students learn to implement both linear regression and gradient descent algorithms, visualizing how models improve over iterations. The chapter emphasizes practical considerations for choosing appropriate algorithms, including accuracy, training time, linearity assumptions, and the number of parameters, preparing students for more advanced supervised and unsupervised learning techniques.
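The gradient-descent procedure the chapter describes can be sketched as follows. This is a minimal illustration, not the book's own code: the toy dataset, learning rate, and iteration count are all made up, and the model is simple linear regression fit by minimizing mean squared error.

```python
# Minimal gradient descent for simple linear regression (y = w*x + b),
# minimizing mean squared error on an illustrative toy dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.1, 6.2, 8.1, 9.9]   # roughly y = 2x

w, b = 0.0, 0.0                  # initial parameters
lr = 0.01                        # learning rate (illustrative choice)
n = len(xs)

for _ in range(2000):            # iterations
    # Gradients of mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= lr * grad_w             # step opposite the gradient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))
```

Printing `w` every few hundred iterations, as the chapter's visualizations do, shows the fit improving as the error shrinks.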
This chapter focuses on data collection methods, analysis approaches, and evaluation techniques in data science. It covers various data collection methods including surveys (with different question types like multiple-choice, Likert scales, and open-ended questions), interviews, focus groups, diary studies, and user studies in lab and field settings.
The chapter distinguishes between quantitative methods (using numerical measurements and statistical analysis) and qualitative methods (observing behaviors, attitudes, and opinions through techniques like grounded theory and constant comparison). It also discusses mixed-method approaches that combine both methodologies.
For evaluation, the chapter explains model comparison metrics including precision, recall, F-measure, ROC curves, AIC, and BIC. It covers validation techniques like training-testing splits, A/B testing, and cross-validation methods. The chapter emphasizes that data science involves pre-data collection planning and post-analysis evaluation, not just data processing.
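The precision, recall, and F-measure metrics mentioned above can be computed directly from a confusion of true and predicted labels. The labels below are illustrative, not from the chapter:

```python
# Computing precision, recall, and F-measure from illustrative
# binary predictions (1 = positive class, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)   # fraction of predicted positives that are correct
recall = tp / (tp + fn)      # fraction of actual positives recovered
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean

print(precision, recall, round(f_measure, 3))
```

The F-measure balances the two: a classifier that predicts everything positive has perfect recall but poor precision, and the harmonic mean penalizes that imbalance.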
This chapter introduces Python as a powerful yet beginner-friendly programming language essential for data science. It covers getting access to Python through direct installation or integrated development environments like Anaconda and Spyder. The chapter teaches fundamental programming concepts including basic operations, data types, and key data structures (lists, tuples, dictionaries, sets, and DataFrames). Students learn to write control structures using if-else statements and while/for loops, create reusable functions, and make programs interactive through user input. The chapter also explains how to install and use Python packages, which extend the language’s capabilities for specialized tasks. Throughout, practical examples demonstrate concepts like leap year calculations, temperature categorization, and sales data analysis. The chapter emphasizes Python’s accessibility, extensive package ecosystem, and suitability for data science applications, positioning it as an ideal tool for solving computational and data analysis problems.
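The leap-year calculation mentioned above is a compact illustration of conditionals and functions; a sketch along the lines of the chapter's example (the exact code in the book may differ):

```python
def is_leap_year(year):
    """A year is a leap year if divisible by 4,
    except century years, which must also be divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# A for loop over test years exercises both rules
for y in [1900, 2000, 2023, 2024]:
    print(y, is_leap_year(y))
```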
This chapter covers unsupervised learning, where algorithms analyze data without known true labels or outcomes. Unlike supervised learning, the goal is to discover hidden patterns and structures in data.
The chapter explores three main techniques: Agglomerative clustering works bottom-up, starting with individual data points and merging similar ones into larger clusters. Divisive clustering (including k-means) takes a top-down approach, splitting data into smaller groups. Both methods use distance matrices and dendrograms to visualize cluster relationships.
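The k-means idea can be sketched in a few lines of plain Python: pick initial centers, assign each point to its nearest center, move each center to its cluster's mean, and repeat. The 2-D points and the initialization rule below are illustrative, not the chapter's dataset:

```python
import random

# Minimal k-means sketch: assignment and update steps repeated to convergence.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8),    # one apparent group
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]    # another

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iters=10):
    # Deterministic start for k=2 (first and last point); random otherwise
    centers = [points[0], points[-1]] if k == 2 else random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[idx].append(p)
        # Update step: centers move to the mean of their cluster
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

centers, clusters = kmeans(points, k=2)
print(sorted(centers))
```

On this toy data the two centers settle near the means of the two visible groups after a single pass.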
Expectation Maximization (EM) handles incomplete data by iteratively estimating missing parameters using maximum likelihood estimation. Model quality is assessed using AIC and BIC criteria.
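The AIC and BIC criteria mentioned above are simple functions of a fitted model's log-likelihood and parameter count; both reward fit and penalize complexity, with BIC's penalty growing with sample size. The log-likelihoods and parameter counts below are hypothetical numbers for illustration:

```python
import math

# Akaike and Bayesian information criteria: lower is better.
def aic(log_likelihood, k):
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits: a 2-component mixture vs. a 3-component mixture
ll_2, k_2 = -120.5, 5     # illustrative log-likelihood and parameter count
ll_3, k_3 = -118.9, 8
n = 100                   # sample size

print(aic(ll_2, k_2), aic(ll_3, k_3))
print(bic(ll_2, k_2, n), bic(ll_3, k_3, n))
```

Here the 3-component model fits slightly better (higher log-likelihood), but both criteria still prefer the simpler model because the improvement does not justify three extra parameters.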
The chapter also introduces reinforcement learning, where agents learn optimal actions through trial-and-error interactions with environments, receiving rewards or penalties. Applications include robotics, gaming, and autonomous systems. Throughout, the chapter emphasizes the creative, interpretive nature of unsupervised learning compared to more structured supervised approaches.
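The trial-and-error loop of reinforcement learning can be sketched with tabular Q-learning, one common algorithm (not necessarily the one the chapter uses). Everything here is illustrative: an agent on a five-cell track learns, from reward alone, that walking right reaches the goal.

```python
import random

# Minimal Q-learning sketch: states 0..4 on a 1-D track, reward at cell 4.
# Actions: 0 = step left, 1 = step right. All parameters are illustrative.
random.seed(0)
n_states, n_actions = 5, 2
Q = [[0.0, 0.0] for _ in range(n_states)]        # Q[state][action] estimates
alpha, gamma, epsilon = 0.5, 0.9, 0.2            # learning rate, discount, exploration

for _ in range(200):                             # episodes of trial and error
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy: usually exploit the best-known action, sometimes explore
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda act: Q[s][act])
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal
        # Q-update: move estimate toward reward plus discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print([round(max(q), 2) for q in Q])             # values rise toward the goal
```

After training, the learned values increase monotonically toward the goal cell, so greedily following them walks the agent right, the behavior rewarded by the environment.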
The chapter focuses on the network and architecture layers of the design stack, building up from the device and circuit concepts introduced in Chapters 3 and 4. It discusses architectural advantages that stem from neuromorphic models, such as address-event representation, which leverages spiking sparsity. Near-memory and in-memory architectures using CMOS implementations are discussed first, followed by several emerging technologies: correlated electron semiconductor-based devices, filamentary devices, organic devices, spintronic devices, and photonic neural networks.
This preface introduces “A Hands-On Introduction to Data Science with Python,” designed for advanced undergraduates and graduate students across diverse fields including information science, business, psychology, and sociology. The book requires minimal programming experience but expects computational thinking and basic statistics knowledge. It’s structured in four parts: foundations of data science, tools and platforms (Python programming and cloud computing), machine learning techniques, and real-world applications. The second edition adds “DS in Practice” boxes, cloud computing coverage, AI/ethics discussions, and downloadable datasets. The book emphasizes practical, hands-on learning with 39 solved exercises, 40 try-it-yourself problems, and 57 end-of-chapter problems, making data science accessible to non-technical students.
This chapter demonstrates R’s capabilities for statistical analysis and data science applications. It covers data importing from CSV/TSV files into R dataframes and computing basic statistics (mean, median, mode, variance, standard deviation) using built-in functions.
The chapter explores data visualization with ggplot2, creating histograms, bar charts, pie charts, and scatterplots for effective data presentation. Key statistical concepts include correlation analysis to measure variable relationships and statistical inference through hypothesis testing.
Practical statistical tests covered include t-tests for comparing two group means and ANOVA for comparing multiple groups. The chapter emphasizes R’s strengths in statistical computing, providing hands-on examples with real datasets and demonstrating how to interpret results for data-driven decision making.
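Although this chapter works in R, the two-sample comparison it describes can be sketched in Python as well. Below is a plain-Python computation of Welch's t-statistic (difference in means over its standard error) on illustrative measurements; the group data are made up for the example:

```python
import math
import statistics

# Python analogue of a two-sample (Welch's) t-test statistic
# on illustrative group measurements.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [5.8, 6.0, 5.7, 6.1, 5.9]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)  # sample variances
n_a, n_b = len(group_a), len(group_b)

# Welch's t-statistic: difference in means over its standard error
t_stat = (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)
print(round(t_stat, 2))
```

A large-magnitude statistic like this one suggests the group means differ by far more than sampling noise would explain; in R, `t.test()` performs the same computation and also reports the p-value.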
This introductory chapter defines data science as a field focused on collecting, storing, and processing data to derive meaningful insights for decision-making. It explores data science applications across diverse sectors including finance, healthcare, politics, public policy, urban planning, education, and libraries. The chapter examines how data science relates to statistics, computer science, engineering, business analytics, and information science, while introducing computational thinking as a fundamental skill. It discusses the explosive growth of data (the 3Vs: velocity, volume, variety) and essential skills for data scientists, including statistical knowledge, programming abilities, and data literacy. The chapter concludes by addressing critical ethical concerns around privacy, bias, and fairness in data science practice.
This chapter introduces R, a free, open-source programming environment designed for data analysis and statistical computing. It covers R installation through RStudio IDE and demonstrates fundamental programming concepts including basic syntax, mathematical operations, and logical operators.
Key topics include data types (numeric, integer, character, logical, factor), data structures (vectors, matrices, lists), control structures (if-else statements, for/while loops), and functions for code organization and reusability.
The chapter emphasizes R’s advantages for data science: powerful statistical capabilities, extensive package ecosystem, and built-in data handling features. It concludes with R Markdown, which enables creation of professional reports combining code, output, and documentation in a single document for reproducible research and presentation.
This chapter explores supervised learning techniques where algorithms learn from labeled training data to make predictions. It begins with logistic regression for binary classification problems, using the sigmoid function to output probabilities between 0 and 1. Softmax regression extends this to multi-class problems. The chapter covers k-nearest neighbors (kNN), which classifies data points based on their similarity to training examples. Decision trees use entropy and information gain to create interpretable classification rules, while random forests combine multiple decision trees to reduce overfitting through ensemble methods. Naive Bayes applies Bayes’ theorem with independence assumptions for probabilistic classification, particularly effective for text classification. Finally, support vector machines (SVM) find optimal decision boundaries by maximizing margins between classes. Each technique is demonstrated through hands-on Python examples using real datasets, showing practical applications in various domains from healthcare to finance.
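The sigmoid function at the heart of logistic regression can be shown in a few lines. The weights below are illustrative stand-ins for parameters a model would learn from labeled data, not values from the chapter:

```python
import math

# The sigmoid squashes any real-valued score into a probability in (0, 1),
# the basis of logistic regression's binary predictions.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Illustrative "learned" weights for a single feature x: score = w*x + b
w, b = 1.5, -4.0
for x in [1.0, 2.67, 5.0]:
    p = sigmoid(w * x + b)
    label = 1 if p >= 0.5 else 0   # classify at the 0.5 probability threshold
    print(x, round(p, 3), label)
```

Scores far below zero map to probabilities near 0, scores far above zero to probabilities near 1, and the 0.5 threshold at score zero defines the decision boundary between the two classes.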