This chapter focuses on data collection methods, analysis approaches, and evaluation techniques in data science. It covers various data collection methods including surveys (with different question types like multiple-choice, Likert scales, and open-ended questions), interviews, focus groups, diary studies, and user studies in lab and field settings.
The chapter distinguishes between quantitative methods (using numerical measurements and statistical analysis) and qualitative methods (observing behaviors, attitudes, and opinions through techniques like grounded theory and constant comparison). It also discusses mixed-method approaches that combine both methodologies.
For evaluation, the chapter explains model comparison metrics including precision, recall, F-measure, ROC curves, AIC, and BIC. It covers validation techniques like training-testing splits, A/B testing, and cross-validation methods. The chapter emphasizes that data science involves pre-data collection planning and post-analysis evaluation, not just data processing.
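The precision, recall, and F-measure metrics named above can be sketched in plain Python; the labels and predictions here are hypothetical, for illustration only:

```python
# Hedged sketch: computing precision, recall, and F-measure from counts of
# true positives (tp), false positives (fp), and false negatives (fn).

def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Illustrative data: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
```

In practice a library routine (for example, scikit-learn's classification metrics) would be used instead, but the hand computation makes the definitions concrete.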
Both qualitative and quantitative methods provide rigorous ways to establish and assess causality, though their understandings of causality and their techniques for doing so differ. Qualitative and quantitative techniques are generally complementary rather than substitutes or opposites. Common quantitative tools include cross-tabulation, regression, and logit/probit models; the appropriate choice typically depends on the level of measurement of the dependent variable. Common qualitative tools include two within-case techniques, process tracing and analytic narratives, and three between-case techniques, case control, structured focused comparison, and content analysis. “Intermediate-N” techniques exist for research questions whose number of cases exceeds that typically used for qualitative analysis but falls below the threshold for successful quantitative analysis. Big Data approaches serve extreme-N analysis, content analysis techniques handle corpora, and mixed methods approaches combine these traditions.
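Cross-tabulation, the first quantitative tool named above, can be sketched in plain Python; the variables and case data here are invented for illustration:

```python
# Hedged sketch: a minimal cross-tabulation of two categorical variables,
# counting how many cases fall into each (row, column) cell. Real analyses
# would typically use a library such as pandas (pd.crosstab).
from collections import Counter

# Hypothetical cases: (regime type, outcome) pairs.
cases = [
    ("democracy", "peace"), ("democracy", "peace"),
    ("democracy", "war"),   ("autocracy", "war"),
    ("autocracy", "war"),   ("autocracy", "peace"),
]

table = Counter(cases)                      # counts per (row, column) cell
rows = sorted({r for r, _ in cases})
cols = sorted({c for _, c in cases})
for r in rows:
    print(r, {c: table[(r, c)] for c in cols})
```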
This chapter introduces Python as a powerful yet beginner-friendly programming language essential for data science. It covers getting access to Python through direct installation or integrated development environments like Anaconda and Spyder. The chapter teaches fundamental programming concepts including basic operations, data types, and key data structures (lists, tuples, dictionaries, sets, and DataFrames). Students learn to write control structures using if-else statements and while/for loops, create reusable functions, and make programs interactive through user input. The chapter also explains how to install and use Python packages, which extend the language’s capabilities for specialized tasks. Throughout, practical examples demonstrate concepts like leap year calculations, temperature categorization, and sales data analysis. The chapter emphasizes Python’s accessibility, extensive package ecosystem, and suitability for data science applications, positioning it as an ideal tool for solving computational and data analysis problems.
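The leap-year calculation mentioned above can be sketched as a small function; the exact code in the book may differ, but this follows the standard Gregorian rule:

```python
# Hedged sketch of a leap-year check: divisible by 4, except centuries
# not divisible by 400 (so 2000 is a leap year, 1900 is not).

def is_leap_year(year: int) -> bool:
    """Return True if `year` is a Gregorian leap year."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

for y in (1900, 2000, 2023, 2024):
    print(y, is_leap_year(y))
```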
A framing case study examines a debt dispute between a Wall Street investor and Argentina that resulted in the seizure of an Argentine warship in Ghana. Then the chapter tackles the topic of upholding international law. The chapter discusses: (1) international legal enforcement, including major bodies, when these bodies refuse to rule, and access to non-state actors; (2) domestic legal enforcement, including jurisdiction and various forms of immunity; and (3) political enforcement via coercion and persuasion.
This chapter covers unsupervised learning, where algorithms analyze data without known true labels or outcomes. Unlike supervised learning, the goal is to discover hidden patterns and structures in data.
The chapter explores three main techniques: Agglomerative clustering works bottom-up, starting with individual data points and merging similar ones into larger clusters. Divisive clustering (including k-means) takes a top-down approach, splitting data into smaller groups. Both methods use distance matrices and dendrograms to visualize cluster relationships.
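The k-means idea, splitting data into k groups by alternating assignment and center-update steps, can be sketched in plain Python on one-dimensional toy data; the data and initial centers are illustrative, and real analyses would use a library such as scikit-learn:

```python
# Hedged sketch: minimal 1-D k-means. Each iteration assigns every point to
# its nearest center, then moves each center to the mean of its cluster.

def kmeans_1d(points, centers, iterations=10):
    """Run a fixed number of assign/update iterations; return final
    centers and the clusters they define."""
    for _ in range(iterations):
        clusters = {c: [] for c in range(len(centers))}
        for x in points:
            nearest = min(range(len(centers)),
                          key=lambda c: abs(x - centers[c]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(v) / len(v) if v else centers[c]
                   for c, v in clusters.items()]
    return centers, clusters

# Two obvious groups near 1.0 and 10.0.
points = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

A production implementation would also check for convergence and handle multiple restarts, but the two-step loop above is the core of the algorithm.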
Expectation Maximization (EM) handles incomplete data by iteratively estimating missing parameters using maximum likelihood estimation. Model quality is assessed using AIC and BIC criteria.
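The AIC and BIC criteria used for model comparison reduce to short formulas once a model's log-likelihood is known; the models and numbers below are invented for illustration:

```python
# Hedged sketch of the information criteria: AIC = 2k - 2 ln L and
# BIC = k ln n - 2 ln L, where k is the number of parameters, n the number
# of observations, and ln L the maximized log-likelihood. Lower is better.
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2 * log_likelihood

# Two hypothetical models fitted to n = 100 observations: model 2 fits
# slightly better (higher log-likelihood) but uses more parameters.
print(aic(-120.0, 3), aic(-118.5, 5))
print(bic(-120.0, 3, 100), bic(-118.5, 5, 100))
```

Note how BIC's k ln n term penalizes the extra parameters of model 2 more heavily than AIC's 2k term does, which is why the two criteria can disagree.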
The chapter also introduces reinforcement learning, where agents learn optimal actions through trial-and-error interactions with environments, receiving rewards or penalties. Applications include robotics, gaming, and autonomous systems. Throughout, the chapter emphasizes the creative, interpretive nature of unsupervised learning compared to more structured supervised approaches.
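The trial-and-error idea can be sketched with an epsilon-greedy two-armed bandit, a minimal reinforcement-learning setting; the reward probabilities are invented, and real RL systems are far more elaborate:

```python
# Hedged sketch: an agent repeatedly picks one of two arms, receives a
# 0/1 reward, and updates a running estimate of each arm's value. It
# explores randomly with probability epsilon and otherwise exploits the
# arm with the best estimate so far.
import random

random.seed(0)                      # fixed seed for reproducibility

true_reward_prob = [0.2, 0.8]       # hypothetical arm payoff probabilities
estimates = [0.0, 0.0]
counts = [0, 0]
epsilon = 0.1

for step in range(2000):
    if random.random() < epsilon:
        arm = random.randrange(2)               # explore
    else:
        arm = max(range(2), key=lambda a: estimates[a])  # exploit
    reward = 1.0 if random.random() < true_reward_prob[arm] else 0.0
    counts[arm] += 1
    # Incremental running average of observed rewards for this arm.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(estimates, counts)
```

After enough trials the agent's estimates approach the true payoff probabilities and it pulls the better arm most of the time, which is the reward-driven learning loop described above in miniature.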
This chapter explores how to obtain and prepare quantitative data prior to analysis. Use theory to identify the unit of analysis for your study, then determine the population and sample. Be sure to capture appropriate variation in the dependent variable and be alert for selection bias in how cases enter the sample. Issues of validity and reliability can cause major problems with your analysis; again, use your theory to carefully match indicators to concepts to minimize these risks. Think through the data collection process and plan ahead to maximize efficiency; gather all data for control variables and robustness checks in a single sweep, if possible. Much data, particularly for standard indicators of common concepts, is freely available online through a variety of sources, and your library probably also subscribes to other quantitative databases. Collecting new data is substantially more time-consuming than using previously gathered data, but it is often necessary to test novel theories. Whether you use existing data or new data, be sure to define your data-needs list before beginning collection, allow sufficient time, and document and back up everything.
This preface introduces “A Hands-On Introduction to Data Science with Python,” designed for advanced undergraduates and graduate students across diverse fields including information science, business, psychology, and sociology. The book requires minimal programming experience but expects computational thinking and basic statistics knowledge. It is structured in four parts: foundations of data science, tools and platforms (Python programming and cloud computing), machine learning techniques, and real-world applications. The second edition adds “DS in Practice” boxes, cloud computing coverage, AI/ethics discussions, and downloadable datasets. The book emphasizes practical, hands-on learning with 39 solved exercises, 40 try-it-yourself problems, and 57 end-of-chapter problems, making data science accessible to non-technical students.