This chapter explores the fundamentals of data in data science, covering data types (structured vs. unstructured), collection sources (open data, social media APIs, multimodal data, synthetic data), and storage formats (CSV, TSV, XML, RSS, JSON). It emphasizes the critical importance of data pre-processing, including data cleaning (handling missing values, smoothing noisy data, data munging), integration, transformation, reduction, and discretization. Through hands-on examples, the chapter demonstrates how to systematically prepare "dirty" real-world data for analysis by addressing inconsistencies, outliers, and missing information. The chapter highlights that data preparation is often half the battle in data science, requiring both technical skills and careful attention to data quality and bias.
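To illustrate the kind of cleaning step the chapter describes, here is a minimal sketch of handling missing values and flagging outliers with pandas. The dataset, column names, and the median-imputation and z-score choices are illustrative assumptions, not taken from the chapter itself.

```python
import numpy as np
import pandas as pd

# Hypothetical "dirty" dataset: missing values plus an implausible age
df = pd.DataFrame({
    "age": [25, np.nan, 34, 29, 120],
    "income": [50000, 62000, np.nan, 58000, 61000],
})

# One common cleaning strategy: fill missing values with the column median
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# Flag potential outliers: values more than 3 standard deviations from the mean
z = (df["age"] - df["age"].mean()) / df["age"].std()
df["age_outlier"] = z.abs() > 3
```

Median imputation and z-score thresholds are only two of many options; the right choice depends on why the data are missing and how the variable is distributed, which is exactly the judgment the chapter asks readers to exercise.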
This introductory chapter defines data science as a field focused on collecting, storing, and processing data to derive meaningful insights for decision-making. It explores data science applications across diverse sectors including finance, healthcare, politics, public policy, urban planning, education, and libraries. The chapter examines how data science relates to statistics, computer science, engineering, business analytics, and information science, while introducing computational thinking as a fundamental skill. It discusses the explosive growth of data (the 3Vs: velocity, volume, variety) and essential skills for data scientists, including statistical knowledge, programming abilities, and data literacy. The chapter concludes by addressing critical ethical concerns around privacy, bias, and fairness in data science practice.
This chapter focuses on data collection methods, analysis approaches, and evaluation techniques in data science. It covers various data collection methods including surveys (with different question types like multiple-choice, Likert scales, and open-ended questions), interviews, focus groups, diary studies, and user studies in lab and field settings.
The chapter distinguishes between quantitative methods (using numerical measurements and statistical analysis) and qualitative methods (observing behaviors, attitudes, and opinions through techniques like grounded theory and constant comparison). It also discusses mixed-method approaches that combine both methodologies.
For evaluation, the chapter explains model comparison metrics including precision, recall, F-measure, ROC curves, AIC, and BIC. It covers validation techniques like training-testing splits, A/B testing, and cross-validation methods. The chapter emphasizes that data science involves pre-data collection planning and post-analysis evaluation, not just data processing.
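The evaluation metrics named above follow directly from a confusion matrix. As a quick sketch (the counts below are made up for illustration):

```python
# Hypothetical confusion-matrix counts for a binary classifier
tp, fp, fn, tn = 40, 10, 5, 45

# Precision: of the items predicted positive, how many were correct?
precision = tp / (tp + fp)

# Recall: of the truly positive items, how many did we find?
recall = tp / (tp + fn)

# F-measure: harmonic mean of precision and recall
f_measure = 2 * precision * recall / (precision + recall)
```

With these counts, precision is 0.8 and recall is about 0.89; the F-measure balances the two, which is why it is preferred when a single summary number is needed.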
This chapter introduces decision-making criteria for the three core choices of qualitative research design: the number of cases to study, which cases to study, and which particular analysis technique to use. Background research is important in identifying potential cases to study, especially cases that represent negative evidence, and in making case selections. Negative evidence plays a key role in maximizing the credibility of hypothesis tests by creating variation in the values of key variables. Common techniques in qualitative analysis include process tracing, content analysis, analytic narratives, structured focused comparison, and methods of similarity; principles of research design and case selection vary across these techniques.
A framing case study discusses child workers in Bolivia. Then the chapter provides an overview of international human rights law. The chapter first discusses the historical origins of the human rights movement and the multilateral and regional human rights systems. Then it outlines major physical integrity rights, including laws that prohibit genocide, ethnic cleansing, torture, and human trafficking. It next turns to major civil and political rights, including the right to free expression, assembly, and association, various religious protections, and criminal justice rights. Finally, it examines major economic, social, and cultural rights, including rules about labor, economic and social assistance, cultural rights, and the rights of marginalized groups, like women, children, and the disabled.
This chapter introduces cloud computing platforms essential for modern data science work. It covers three major cloud services: Google Cloud Platform (GCP), Microsoft Azure, and Amazon Web Services (AWS). Students learn to create virtual machines, configure storage, and access cloud resources through SSH connections. The chapter demonstrates hands-on Python development using browser-based IDEs like Google Colab, Azure Machine Learning notebooks, and AWS Cloud9. Key topics include setting up accounts, managing costs through free tiers, and leveraging cloud resources for data science projects. The chapter also covers Hadoop for big data processing and discusses platform migration strategies. Practical exercises guide students through currency conversion programs, interactive calculations, and Olympic year predictions, emphasizing that cloud computing skills are now essential for data science professionals due to scalable processing power and storage capabilities.
This chapter introduces machine learning as a subset of artificial intelligence that enables computers to learn from data and make predictions without explicit programming. It defines machine learning through Tom Mitchell’s formal framework and explores real-world applications like self-driving cars, optical character recognition, and recommendation systems. The chapter focuses on regression as a fundamental machine learning technique, covering both linear modeling approaches and gradient descent algorithms for parameter optimization. Through hands-on examples using R, students learn to implement linear regression and gradient descent from scratch, understanding how models minimize error functions to find optimal parameters. The chapter emphasizes practical application over theoretical derivations.
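The chapter's from-scratch implementation is in R; the same idea can be sketched in a few lines of Python. This fits a line y = w·x + b by gradient descent on mean squared error, with made-up data and a learning rate chosen only for illustration:

```python
# Simple linear regression y = w*x + b fit by gradient descent,
# minimizing mean squared error (MSE). Data are illustrative.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 8.1, 9.9]   # roughly y = 2x

w, b = 0.0, 0.0   # initial parameters
lr = 0.01         # learning rate
n = len(xs)

for _ in range(5000):
    # Gradients of MSE with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Step downhill against each gradient
    w -= lr * grad_w
    b -= lr * grad_b
```

After enough iterations the parameters converge to the least-squares solution (here roughly w ≈ 1.97, b ≈ 0.15), showing concretely how a model "minimizes an error function to find optimal parameters."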
This chapter introduces cloud computing platforms essential for modern data science work. It covers the three major providers: Google Cloud Platform (GCP), Microsoft Azure, and Amazon Web Services (AWS).
Key topics include setting up virtual machines, configuring SSH access, and running RStudio Server in browser-based environments on each platform. The chapter demonstrates how to migrate data science workflows from local machines to cloud infrastructure, providing scalable computing resources and storage.
Practical examples show installing R and RStudio on cloud VMs, accessing them through web browsers, and managing costs. The chapter emphasizes that cloud computing skills are now essential for data science practitioners, offering dynamic scaling, redundancy, and pay-as-you-use pricing models for computational resources.
In this chapter, we explore the idea of the social world as a collection of patterned phenomena, and how the social sciences attempt to make sense of those patterns. We value the characteristics of parsimony, predictiveness, falsifiability, fertility, and replicability in research. Research questions are one of four types – normative, hypothetical, factual/procedural, or empirical – depending on the goal or purpose of the investigation. Empirical research questions deal with the world ‘as it is,’ seeking general explanations for patterns of outcomes or classes of phenomena. A good research question is one whose answer can be developed fully within the length of your paper – neither so narrow that it leaves you short, nor so broad that it cannot be answered in the space available.