In working with network data, data acquisition is often the most basic yet the most important and challenging step. The availability of data and norms around data vary drastically across different areas and types of research. A team of biologists may spend more than a decade running assays to gather a cell's interactome; another team of biologists may only analyze publicly available data. A social scientist may spend years conducting surveys of underrepresented groups. A computational social scientist may examine the entire network of Facebook. An economist may comb through large financial documents to gather tables of data on stakes in corporate holdings. In this chapter, we move one step along the network study life-cycle. Key to data gathering is good record-keeping and data provenance. Good data gathering sets us up for future success (otherwise, garbage in, garbage out), making it critical to ensure the best quality and most appropriate data is used to power your investigation.
This chapter covers ways to explore your network data using visual means and basic summary statistics, and how to apply statistical models to validate aspects of the data. Data analysis can generally be divided into two main approaches, exploratory and confirmatory. Exploratory data analysis (EDA) is a pillar of statistics and data mining and we can leverage existing techniques when working with networks. However, we can also use specialized techniques for network data and uncover insights that general-purpose EDA tools, which neglect the network nature of our data, may miss. Confirmatory analysis, on the other hand, grounds the researcher with specific, preexisting hypotheses or theories, and then seeks to understand whether the given data either support or refute the preexisting knowledge. Thus, complementing EDA, we can define statistical models for properties of the network, such as the degree distribution, or for the network structure itself. Fitting and analyzing these models then recapitulates effectively all of statistical inference, including hypothesis testing and Bayesian inference.
Realistic networks are rich in information, often too rich for all that information to be easily conveyed. Summarizing the network then becomes useful, often necessary, for communication and understanding, although we must be wary that a summary necessarily loses information about the network. Further, networks often do not exist in isolation. Multiple networks may arise from a given dataset or multiple datasets may each give rise to different views of the same network. In such cases and more, researchers need tools and techniques to compare and contrast those networks. In this chapter, we'll show you how to summarize a network, using statistics, visualizations, and even other networks. From these summaries we then describe ways to compare networks, defining a distance between networks for example. Comparing multiple networks using the techniques we describe can help researchers choose the best data processing options and unearth intriguing similarities and differences between networks in diverse fields.
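As a rough illustration of what "a distance between networks" can mean, the sketch below computes a Jaccard distance between the edge sets of two undirected networks. This is one simple choice among many and is not taken from the chapter; the function name `jaccard_distance` and the toy edge lists are hypothetical, and the sketch assumes both networks use the same node labels.

```python
def jaccard_distance(edges_a, edges_b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two undirected edge lists."""
    # frozenset makes (1, 2) and (2, 1) the same undirected edge.
    a = {frozenset(edge) for edge in edges_a}
    b = {frozenset(edge) for edge in edges_b}
    union = a | b
    if not union:
        # Two empty networks are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical toy networks sharing two of four distinct edges.
g1 = [(1, 2), (2, 3), (3, 4)]
g2 = [(2, 1), (2, 3), (4, 5)]
print(jaccard_distance(g1, g2))
```

A distance of 0 means the two edge sets coincide and 1 means they share no edges. The measure ignores edge weights and any structure beyond exact edge overlap.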
Machine learning, especially neural network methods, is increasingly important in network analysis. This chapter will discuss the theoretical aspects of network embedding methods and graph neural networks. As we have seen, much of the success of advanced machine learning is thanks to useful representations (embeddings) of data. Embedding and machine learning are closely aligned. Translating network elements to embedding vectors and sending those vectors as features to a predictive model often leads to a simpler, more performant model than trying to work directly with the network. Embeddings help with network learning tasks, from node classification to link prediction. We can even embed entire networks and then use models to summarize and compare networks. Not only does machine learning benefit from embeddings; embeddings also benefit from machine learning. Inspired by the incredible recent progress with natural language data, embeddings created by predictive models are becoming more useful and important. Often these embeddings are produced by neural networks of various flavors, and we explore current approaches for using neural networks on network data.
This chapter discusses record keeping, like maintaining a lab notebook. Historically, lab notebooks were analog, pen-and-paper affairs. With so much work being performed on the computer and with most scientific instruments creating digital data directly, most record-keeping efforts are digital. Therefore, we focus on strategies for establishing and maintaining records of computer-based work. Keeping good records of your work is essential. These records inform your future thoughts as you reflect on the work you have already done, acting as reminders and inspiration. They also provide important details for collaborators, and scientists working in large groups often have predefined standards for group members to use when keeping lab notebooks and the like. Computational work differs from traditional bench science, and this chapter describes practices for good record-keeping habits in the more slippery world of computer work.
Drawing examples from real-world networks, this essential book traces the methods behind network analysis and explains how network data is first gathered, then processed and interpreted. The text will equip you with a toolbox of diverse methods and data modelling approaches, allowing you to quickly start making your own calculations on a huge variety of networked systems. This book sets you up to succeed, addressing the questions of what you need to know and what to do with it, when beginning to work with network data. The hands-on approach adopted throughout means that beginners quickly become capable practitioners, guided by a wealth of interesting examples that demonstrate key concepts. Exercises using real-world data extend and deepen your understanding, and develop effective working patterns in network calculations and analysis. Suitable for both graduate students and researchers across a range of disciplines, this novel text provides a fast-track to network data expertise.
Ellen Balka, Simon Fraser University, British Columbia, Ina Wagner, Universität Siegen, Germany, Anne Weibert, Universität Siegen, Germany, Volker Wulf, Universität Siegen, Germany
This chapter frames the book. It explains the focus on women as a historically highly relevant category while acknowledging the multiplicities of (gender) identities and relations that the rise of queer theory has opened. It also draws attention to the different experiences that women have at work in relation to technology, which are mediated in complex ways by ethnic and class backgrounds as well as issues of sexuality. The chapter outlines the different disciplinary orientations the book draws upon – including feminist theory; science, technology, and society studies; sociology of work; political economy; organizational studies; labour history; as well as CSCW, HCI, and participatory design – to then introduce the key concepts and theories used in the book: the distinction between sex and gender, intersectionality, the problematic notion of race, the view of engineers/designers making ethical-political choices, and the concept of technology. It forwards the notion of practice-based research and the importance of involving users in design decisions as key to achieving gender equality in design. These concepts will be elaborated as well as made ‘practical’ in the course of the book.
This chapter motivates the need for a book that covers both theoretical and practical aspects of deep learning for natural language processing. We summarize the content of the book, as well as aspects that are not within scope, and current limitations of deep learning in general.
In this chapter, we describe the main goal of the book, its organization, course outline, and suggestions for instruction and self-study. The textbook material is intended for a one-semester undergraduate/graduate course for mathematics and computer science students. The course might also be recommended for students of physics interested in networks and the evolution of large systems, as well as engineering students specializing in telecommunication. Our textbook aims to give a gentle introduction to the mathematical foundations of random graphs and to build a platform to understand the nature of real-life networks. The text is divided into three parts and presents the basic elements of the theory of random graphs and networks. To help the reader navigate through the text, we start by describing, in the preliminary part (Part I), the main technical tools used throughout the text. Part II of the text is devoted to the classic Erdős–Rényi–Gilbert uniform and binomial random graphs. Part III concentrates on generalizations of the Erdős–Rényi–Gilbert models of random graphs whose features better reflect some characteristic properties of real-world networks.
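For readers unfamiliar with the binomial random graph mentioned above, here is a minimal sketch of sampling G(n, p), in which each of the n(n-1)/2 possible edges is included independently with probability p. The function name `sample_gnp` and the parameter values are illustrative assumptions, not notation from the textbook.

```python
import random
from itertools import combinations


def sample_gnp(n, p, seed=None):
    """Sample an undirected binomial random graph G(n, p) as a set of edges."""
    rng = random.Random(seed)
    # Visit every unordered pair (u, v) with u < v and keep it with probability p.
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}


# Hypothetical parameters chosen only for illustration.
n, p = 200, 0.05
edges = sample_gnp(n, p, seed=1)
expected_edges = p * n * (n - 1) / 2  # mean of a Binomial(C(n, 2), p) edge count
print(len(edges), expected_edges)
```

The uniform model differs in that it fixes the number of edges m and picks a graph uniformly at random from all graphs with n vertices and m edges. In the binomial model sketched here, only the expected edge count is fixed.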
The quantity of data that is collected, processed, and employed has exploded in this millennium. Many organizations now collect more data in a month than the total stored in the Library of Congress. With the goal of gaining insight and drawing conclusions from this vast sea of information, data science has fueled many of the vast benefits brought by the Internet and provided the business models that pay many of its costs.
This chapter introduces the basic concepts of cybersecurity and the data analytics perspective to cybersecurity. It lays out the areas of study and how data analytics should be a key part of the spectrum of cybersecurity solutions.
The key to parallel programming is sharing a task between many cooperating threads running in parallel. A chart is presented showing how since 2003 the Moore’s law growth in computing performance has depended on parallel computing. This chapter includes a simple introductory CUDA example which performs numerical integration using 1 000 000 000 threads. Using CUDA gives a speed-up of about 1000 compared to a single CPU thread. Key CUDA concepts including thread blocks, thread grids and warps are introduced. The hardware differences between conventional CPU architectures and GPUs are then discussed. Optimisations in memory caching on GPUs are also explained as memory access time is often a key performance constraint for many programs. The use of OpenMP to share a single task across all cores of a multicore CPU is also discussed.
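The idea of sharing one integration task between cooperating workers can be sketched in plain Python. This is not the chapter's CUDA example: the integrand (sin on [0, π]), the midpoint rule, the step count, and the names `partial_sum` and `integrate` are all assumptions made for illustration. Because of Python's global interpreter lock, the thread pool here shows only how the work is decomposed and is not expected to deliver a real speed-up.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_sum(f, a, h, start, stop):
    """Midpoint-rule contribution of steps start..stop-1, each of width h."""
    return sum(f(a + (i + 0.5) * h) for i in range(start, stop)) * h


def integrate(f, a, b, steps=100_000, workers=4):
    """Approximate the integral of f over [a, b], splitting steps across workers."""
    h = (b - a) / steps
    chunk = -(-steps // workers)  # ceiling division: steps handled per worker
    bounds = [(s, min(s + chunk, steps)) for s in range(0, steps, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sums its own contiguous slice; the results are then combined.
        parts = pool.map(lambda se: partial_sum(f, a, h, se[0], se[1]), bounds)
        return sum(parts)


result = integrate(math.sin, 0.0, math.pi)
print(result)  # the exact value of this integral is 2
```

On a GPU the decomposition is typically much finer, with each thread handling one or a few steps and a reduction combining the partial results. The structure of independent partial sums followed by a combination step stays the same.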
In this introductory chapter, we outline the ways in which various problems in data analysis can be formulated as optimization problems. Specifically, we discuss least squares problems, problems in matrix optimization (particularly those involving low-rank matrices), linear and kernel support vector machines, binary and multiclass logistic regression, and deep learning. We also outline the scope of the remainder of the book.
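To make "formulated as an optimization problem" concrete, the sketch below fits a line y ≈ w·x + c by minimizing the least squares objective (1/2n) Σ (w·xᵢ + c − yᵢ)² with plain gradient descent. The data, step size, iteration count, and the name `fit_line` are hypothetical choices for illustration; the book may well use different formulations and algorithms.

```python
def fit_line(xs, ys, lr=0.05, iters=5000):
    """Minimize (1/2n) * sum((w*x + c - y)^2) over w and c by gradient descent."""
    n = len(xs)
    w, c = 0.0, 0.0
    for _ in range(iters):
        residuals = [w * x + c - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n  # d(objective)/dw
        grad_c = sum(residuals) / n                             # d(objective)/dc
        w -= lr * grad_w
        c -= lr * grad_c
    return w, c


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, c = fit_line(xs, ys)
print(w, c)
```

Least squares also has a closed-form solution through the normal equations. Gradient descent is shown here only because the same iterative pattern extends to objectives, such as logistic regression, that have no closed form.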