Publisher: Cambridge University Press
Online publication date: September 2017
Print publication year: 2017
Online ISBN: 9781316338599

Book description

Real-world data sets are messy and complicated. Written for students in social science and public management, this authoritative but approachable guide describes all the tools needed to collect data and prepare it for analysis. Offering detailed, step-by-step instructions, it covers collection of many different types of data including web files, APIs, and maps; data cleaning; data formatting; the integration of different sources into a comprehensive data set; and storage using third-party tools to facilitate access and shareability, from Google Docs to GitHub. Assuming no prior knowledge of R and Python, the author introduces programming concepts gradually, using real data sets that provide the reader with practical, functional experience.

Reviews

'Data science has now firmly moved from computer science and engineering to the disciplines of the social sciences, where scholars are harnessing the insightful power of ever larger and more complex data sets. This volume provides a clear introduction for social scientists and policy researchers into the use of R and Python, including best practice of working with data files, command files, and outputs. The step by step approach with real world examples will be of great value to students, scholars, and practitioners engaged in data analytic approaches to social problems.'

Todd Landman - Pro-Vice Chancellor, Faculty of Social Sciences, University of Nottingham

'The irruption of big data and the need to comply with high standards of research reproducibility require social scientists and policy analysts to be conversant in data collection and management techniques. Unfortunately, even those with sophisticated methodological training often lack the necessary tools to take on these requirements. Magallanes's book at long last collects and organizes a large amount of information and useful advice on how to curate data for scientific analysis. Through agile narrative and compelling examples, he walks the reader through the use of open-source tools of data science such as R, Python, and Github. The book is an invaluable resource for students and scholars at different levels of proficiency, from neophytes to advanced users.'

Guillermo Rosas - Washington University, St. Louis

'This new, practical, reader-friendly, how-to manual on computational social data analysis is both long overdue and a must-have for analysts and researchers. The range of problem-solving strategies and demonstrations is impressive. While eminently practical, Magallanes’ contribution is also rigorous and true to its scientific aims, which will please both basic and applied scientists and practitioners.'

Claudio Cioffi-Revilla - Director, Center for Social Complexity, George Mason University, Washington DC, and founding President, Computational Social Science Society of the Americas

'Magallanes’ excellent book on data science for researchers and policy analysts is an accessible yet thorough introduction to data management and analyses in R and Python. It has a broad coverage of the techniques required to capture, clean, and process complex information. It is the perfect companion for sophisticated policy analysts and researchers that are ready to take advantage of the wealth of data that is available to skilled computer scientists.'

Ernesto Calvo - University of Maryland

'It is rare indeed to pick up a new manuscript and immediately think how much you wish it had been written five years earlier, but I suspect many people will have that reaction to this book. This timely, thorough, and remarkably clear tutorial to both R and Python serves as a much-needed on-ramp to the data part of data science, and will undoubtedly soon grace the bookshelves of many social scientists - both students and their instructors. If you are intrigued by the possibilities of data science but concerned about the start-up costs, look no farther: help has arrived.'

Joshua Tucker - New York University

'If you need to develop new skills in R and Python but you don’t know where to start, this is the book for you. With simple language, Magallanes shows you how to install the programs, retrieve data using APIs and scrape Internet sources, and how to get the data ready for modeling. This book is a gem.'

Aníbal Pérez-Liñán - University of Pittsburgh

'This book will be of great assistance to public policy and management scholars desiring a rigorous introduction to Data Science, particularly with regard to the intricacies of data management. The step-by-step approach will help teachers and students, in both undergraduate and graduate programs, become familiar with essential programming skills, particularly with respect to analyzing Big Data and making it available through Open Government initiatives. The author also provides a very helpful service in using both R and Python to show how to accomplish the same task, which allows readers to decide which of these languages will best serve their needs.'

Craig W. Thomas - Evans School of Public Policy and Governance, University of Washington

