Graphs are the mathematical objects used to represent networks, and graph theory is the branch of mathematics that deals with the study of graphs. Graph theory has a long history. The notion of the graph was introduced for the first time in 1736 by Euler, to settle a famous unsolved problem of his time: the so-called Königsberg bridge problem. It is no coincidence that the first paper on graph theory arose from the need to solve a problem from the real world. Subsequent work in graph theory by Kirchhoff and Cayley also had its roots in the physical world. For instance, Kirchhoff's investigations into electric circuits led to his development of a set of basic concepts and theorems concerning trees in graphs. Nowadays, graph theory is a well-established discipline which is commonly used in areas as diverse as computer science, sociology and biology. To give some examples, graph theory helps us to schedule airplane routing and has solved problems such as finding the maximum flow per unit time from a source to a sink in a network of pipes, or colouring the regions of a map using the minimum number of different colours so that no neighbouring regions are coloured the same way. In this chapter we introduce the basic definitions, setting up the language we will need in the rest of the book. We also present the first data set of a real network in this book, namely Elisa's kindergarten network. The two final sections are devoted to, respectively, the proof of the Euler theorem and the description of a graph as an array of numbers.
What Is a Graph?
The natural framework for the exact mathematical treatment of a complex network is a branch of discrete mathematics known as graph theory [48, 47, 313, 150, 272, 144]. Discrete mathematics, also called finite mathematics, is the study of mathematical structures that are fundamentally discrete, i.e. made up of distinct parts, not supporting or requiring the notion of continuity. Most of the objects studied in discrete mathematics are countable sets, such as integers and finite graphs. Discrete mathematics has become popular in recent decades because of its applications to computer science. In fact, concepts and notations from discrete mathematics are often useful to study or describe objects or problems in computer algorithms and programming languages. The concept of the graph is best introduced by the following two examples.
Imagine you are invited to a party; you observe what happens in the room when the other guests arrive. They start to talk in small groups, usually of two people, then the groups grow in size, they split, merge again, change shape. Some of the people move from one group to another. Some of them know each other already, while others are introduced by mutual friends at the party. Suppose you are also able to track all of the guests and their movements in space; their head and body gestures, the content of their discussions. Each person is different from the others. Some are more lively and act as the centre of the social gathering: they tell good stories, attract the attention of the others and lead the group conversation. Other individuals are more shy: they stay in smaller groups and prefer to listen to the others. It is also interesting to notice how different genders and ages vary between groups. For instance, there may be groups which are mostly male, others which are mostly female, and groups with a similar proportion of both men and women. The topic of each discussion might even depend on the group composition. Then, when food and beverages arrive, the people move towards the main table. They organise into more or less regular queues, so that the shape of the newly formed groups is different. The individuals rearrange again into new groups sitting at the various tables. Old friends, but also those who have just met at the party, will tend to sit at the same tables. Then, discussions will start again during the dinner, on the same topics as before, or on some new topics. After dinner, when the music begins, we again observe a change in the shape and size of the groups, with the formation of couples and the emergence of collective motion as everybody starts to dance.
The social system we have just considered is a typical example of what is known today as a complex system [16, 44]. The study of complex systems is a new science, and so a commonly accepted formal definition of a complex system is still missing.
Real-world networks present interesting mesoscopic structures, meaning that they carry important information also at an intermediate scale: a scale that is larger than that of the single nodes, but smaller than that of the whole network. In fact, their nodes are often organised into communities, i.e. clusters of nodes such that nodes within the same cluster are more tightly connected than nodes belonging to two different clusters. In such cases we say that the networks have a community structure. The most important point is that nodes in the same network cluster usually share common features. For instance, we will see that communities in Zachary's karate club network coincide with real social groupings, communities in brain networks identify areas of the brain with different functions, while tightly connected groups of nodes in the World Wide Web correspond to pages on common topics. This is the reason why, by finding the communities of a network, we can learn a lot about the way the network works. In this chapter we will consider various methods to find communities, starting with two traditional approaches, namely spectral partitioning and hierarchical clustering, and then focusing on more recent methods specifically introduced by network scientists to find community structure in networks. We will present the Girvan–Newman approach that is based on the removal of the high-centrality edges, and then we will define a quality function, the so-called modularity, that quantifies the quality of a given partition of the nodes of the network. We will show that communities can be extracted directly by optimising the modularity over the set of possible graph partitions. Finally, we will discuss the label-propagation algorithm, a local and fast method to detect communities which can be used for very large graphs.
The study of network community structure is now considered a research field by itself, and is an area of network science that is still rapidly expanding in different directions, with important contributions also from computer scientists and software engineers. Needless to say, it is very difficult to keep pace with the most recent approaches and algorithms. The choice of the topics of this chapter is therefore mainly didactic, and we have included in Section 9.8 a few pointers to some of the most advanced methods for community detection.
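To make the idea of a quality function concrete, here is a minimal sketch of how the modularity of a given partition could be computed. It assumes the standard Newman–Girvan definition for an undirected, unweighted graph with K links, Q = Σ_c [ l_c / K − (d_c / 2K)² ], where l_c is the number of links inside community c and d_c is the sum of the degrees of its nodes. The text above does not spell the formula out, so this definition, the function name `modularity` and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition (assumed Newman-Girvan definition).

    edges: list of (u, v) pairs, each undirected link listed once, no self-loops.
    community: dict mapping every node to a community label.
    """
    K = len(edges)
    internal = defaultdict(int)    # l_c: links with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of the degrees of the nodes in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / K - (degree_sum[c] / (2 * K)) ** 2
               for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by a single link.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {node: "a" for node in range(6)}

print(modularity(toy_edges, two_triangles))  # 5/14, about 0.357
print(modularity(toy_edges, one_block))      # 0.0
```

Under this definition the partition into the two triangles scores higher than the trivial partition with a single community, which matches the intuition that a good partition has more internal links than expected from the node degrees alone.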
Many of the networks around us continuously grow in time by the addition of new nodes and new links. One typical example is the network of the World Wide Web. As we saw in the previous chapter, millions of new websites have been created over recent years, and the number of hyperlinks among them has also increased enormously over time. Networks of citations among scientific papers are another interesting example of growing systems. The size of these networks constantly increases because of the publication of new papers, all arriving with new citations to previously published papers. All the models we have studied so far in the last three chapters deal, instead, with static graphs. For instance, in order to construct random graphs and small-world networks we have always fixed the number N of vertices and then we have randomly connected such vertices, or rewired the existing edges, without modifying N. In this chapter we show that it is possible to reproduce the final structure of a network by modelling its dynamical evolution, i.e. by modelling the continuous addition in time of nodes and links to a graph. In particular, we concentrate on the simplest growth mechanisms able to produce scale-free networks. Hence, we will discuss in detail the Barabási–Albert model, in which newly arrived nodes select and link existing nodes with a probability linearly proportional to their degree, the so-called rich-get-richer mechanism, and various other extensions and modifications of this model. Finally, we will show that scale-free graphs can also be produced in a completely different way by means of growing models based on optimisation principles.
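The growth mechanism described above can be sketched in a few lines. This is a minimal illustration, not the chapter's own algorithm. It assumes that each new node brings m links, that the initial graph is a complete graph on m + 1 nodes, and that duplicate targets are simply redrawn, so for m > 1 the attachment is only approximately proportional to degree. The function name `barabasi_albert` and its parameters are hypothetical.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node links to m existing nodes
    chosen with probability (approximately) proportional to their degree."""
    rng = random.Random(seed)
    # Assumed initial graph: complete graph on m + 1 nodes, so that every
    # initial node already has a non-zero degree.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `stubs` once per incident link, so a uniform draw
    # from `stubs` is a draw proportional to degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:        # redraw until m distinct targets
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = barabasi_albert(n=1000, m=2, seed=42)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(edges), max(degree.values()))
```

Keeping a list of link endpoints avoids recomputing the degree distribution at every step: older, better-connected nodes occupy more entries in the list and are therefore drawn more often.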
Citation Networks and the Linear Preferential Attachment
As authors of scientific publications we are all interested in our articles being cited in other papers’ bibliographies. Citations are in fact an indication of the impact of a work on the research community and, in general, articles of high quality or broad interest are expected to receive many more citations than articles of low quality or limited interest. This is the reason why citation data are a useful source not only to identify influential publications, but also to find hot research topics, to discover new connections across different fields, and to rank authors and journals [87, 194].
The term random graph refers to the disordered nature of the arrangement of links between different nodes. The systematic study of random graphs was initiated by Erdős and Rényi in the late 1950s with the original purpose of studying theoretically, by means of probabilistic methods, the properties of graphs as a function of the increasing number of random connections. In this chapter we introduce the two random graph models proposed by Erdős and Rényi, and we show how many of their average properties can be derived exactly. We focus our attention on the shape of the degree distributions and on how the average properties of a random graph change as we increase the number of links. In particular, we study the critical thresholds for the appearance of small subgraphs, and for the emergence of a giant connected component or of a single connected component. As a practical example we compute the component order distribution in a set of large real networks of scientific paper coauthorships and we compare the results with random graphs having the same number of nodes and links. We finally derive an analytical expression for the characteristic path length, the average distance between nodes, in random graphs.
Erdős and Rényi (ER) Models
A random graph is a graph in which the edges are randomly distributed. In the late 1950s, two Hungarian mathematicians, Paul Erdős and Alfréd Rényi, came up with a formalism for random graphs that would change traditional graph theory and lead to modern graph theory. Up to that point, graph theory was mainly combinatorics. We have seen one typical argument in Section 1.5. The new idea was to combine probabilistic reasoning with combinatorics. In practice, the idea was to consider not a single graph, but the ensemble of all the possible graphs with some fixed properties (for instance with N nodes and K links), and then use probability theory to derive the properties of the ensemble. We will show below how we can get useful information from this approach. Erdős and Rényi introduced two closely related models to generate ensembles of random graphs with a given number N of nodes, that we will henceforth call Erdős and Rényi (ER) random graphs [49, 100, 101].
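The ensemble idea can be illustrated with a short sketch. The first function below draws one graph uniformly at random from the set of graphs with N nodes and K links, as in the example given above. The second assumes that the other ER model connects each pair of nodes independently with a fixed probability p; that detail is not stated in the passage above and is an assumption here, as are the function names.

```python
import random
from itertools import combinations


def er_fixed_links(n, k, seed=None):
    """One graph drawn uniformly among all graphs with n nodes and k links."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))  # all n(n-1)/2 possible links
    return rng.sample(pairs, k)              # k distinct pairs, uniformly


def er_fixed_probability(n, p, seed=None):
    """Assumed second model: each possible link is present independently
    with probability p."""
    rng = random.Random(seed)
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]


g1 = er_fixed_links(n=10, k=15, seed=0)
g2 = er_fixed_probability(n=10, p=15 / 45, seed=0)
print(len(g1), len(g2))  # g1 always has 15 links; g2 has 15 only on average
```

Listing all pairs costs O(N²) memory, which is acceptable for a small illustration but not for large N.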
What are the building blocks of a complex network? We have seen in Chapter 4 that triangles are highly recurrent in social and biological networks, so that they can be considered as one of their elementary bricks. In this chapter we will discuss a general approach to define and detect the building blocks of a given network. The basic idea is to look not only at triangles but also at cycles of length larger than three, and at other small subgraphs, known as motifs, which occur in real networks more frequently than in their corresponding randomised counterparts. We will first derive a set of formulas to count the number of cycles in a graph directly from its adjacency matrix. As an application, we will use these formulas to find the number of cycles of different lengths in urban street networks and to compare various cities from all over the world. Notice that urban streets are a very special type of network. Their nodes have a position in Euclidean space and their links have a length, and as such they need to be described in terms of spatial graphs. The topology of spatial graphs is constrained by their spatial embedding, so that urban streets require special treatment. This will imply, in our case, the choice of appropriate spatial graphs to use as network null models when counting cycles in a city. In the second part of the chapter we will concentrate on other small subgraphs which are overabundant in some real networks and can therefore be very useful to characterise their microscopic properties. In particular, we will show that one specific motif, namely the so-called feed-forward loop, that emerges in the structure of the transcription regulation network of E. coli and of other biological networks, is there because it performs an important biological function.
This relation between structure and function is the main reason why, by performing a so-called motif analysis and by looking at the profile of abundance of all possible subgraphs, it is possible to classify complex networks and to group them into different network superfamilies.
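As a small illustration of counting cycles directly from the adjacency matrix, the sketch below uses the well-known identity for the shortest case: in an undirected graph without self-loops, the number of triangles equals tr(A³)/6, since each triangle gives rise to six closed walks of length three (three starting nodes, two directions). The chapter's own formulas for longer cycles are not reproduced in this excerpt; the function names here are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def count_triangles(adj):
    """Number of cycles of length 3 in an undirected graph with no
    self-loops, from its 0/1 adjacency matrix: trace(A^3) / 6."""
    cube = mat_mul(mat_mul(adj, adj), adj)
    trace = sum(cube[i][i] for i in range(len(adj)))
    return trace // 6


# Complete graph on 4 nodes: every triple of nodes forms a triangle.
k4 = [[0, 1, 1, 1],
      [1, 0, 1, 1],
      [1, 1, 0, 1],
      [1, 1, 1, 0]]
print(count_triangles(k4))  # 4
```

For longer cycles the trace of higher powers of A also counts closed walks that revisit nodes, so correction terms are needed; that is why the triangle case is the only one shown here.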
We introduce here the elementary concepts of computer science and complexity theory which will be useful in understanding the material of the other appendices. This appendix is primarily intended for those readers who do not have a background in computer science. Its main aim is to present basic notions about computational problems, the axiomatic definition of algorithms, the standard methods to represent algorithms, the concept of time complexity and a few useful tools to estimate the time complexity of simple algorithms. Whenever possible, the discussion has been intentionally left informal and all the unnecessary technicalities have been discarded, in order to allow the reader to focus on the essential concepts without being distracted by definitions and theorems. For a more formal treatment of the material presented in this appendix we encourage the interested reader to refer to any classical book on algorithms, such as the trilogy by Donald Ervin Knuth [181, 182, 183], or, for complexity theory, to [250].
What Is a “Problem”?
Formally, a problem P is a pair (D,Q) of an abstract description D and a question Q requiring an answer. A simple example of a problem in graph theory is the Graph Connectivity Problem: “Given a graph G(N,L) where N denotes a set of vertices and L is the set of edges among vertices in N, is the graph G connected?” In this example, the description provides the context of the problem, which is usually represented as a class of objects (a generic graph G(N,L)), while the question to be answered (“is G connected?”) is a precise enquiry about a specific property of the class of objects under consideration.
The definition above is quite general and is given for an entire class of objects, without any specific connection to a particular object of that class. Conversely, an instance of a problem includes a full specification of one particular object of a class, which we indicate as x, and the solution P(x) is the answer to the question Q for the specific object x under consideration.
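The distinction between a problem and an instance can be made concrete with a short sketch. The function below plays the role of P for the Graph Connectivity Problem: it accepts any instance x = (N, L) and returns the answer P(x). It uses a breadth-first visit, which is one common way to answer the question and is not claimed to be the appendix's own method. The function name is hypothetical, and treating the empty graph as connected is a convention assumed here.

```python
from collections import deque


def is_connected(nodes, links):
    """Answer 'is G(N, L) connected?' for one undirected instance.

    nodes: iterable of distinct node labels (the set N).
    links: iterable of (u, v) pairs with u, v in nodes (the set L).
    """
    nodes = list(nodes)
    if not nodes:
        return True  # assumed convention for the empty graph
    neighbours = {v: set() for v in nodes}
    for u, v in links:
        neighbours[u].add(v)
        neighbours[v].add(u)
    # Breadth-first visit starting from an arbitrary node.
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for w in neighbours[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    # G is connected exactly when the visit reaches every node.
    return len(seen) == len(nodes)


# Two instances of the same problem, with different answers.
print(is_connected([0, 1, 2], [(0, 1), (1, 2)]))     # True
print(is_connected([0, 1, 2, 3], [(0, 1), (2, 3)]))  # False
```

The two calls at the end are two different instances x; the function itself corresponds to the problem, which is defined for the whole class of graphs.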