Multi-group connectivity structures and their implications

We investigate the implications of different forms of multi-group connectivity. Four multi-group connectivity modalities are considered: co-memberships, edge bundles, bridges, and liaison hierarchies. We propose generative models for these four modalities. Our models are variants of planted partition or stochastic block models (SBMs) conditioned on certain topological constraints. We report findings of a comparative analysis in which we evaluate these structures, controlling for their edge densities and sizes, on mean rates of information propagation, convergence times to consensus, and steady-state deviations from the consensus value in the presence of noise as network size increases.


Motivation and problem description
As the size of a connected social network increases, multigroup formations (distinguishable clusters of individuals) become a characteristic and important feature of network topology. The connectivity of multigroup networks may be based on co-memberships, edge bundles that connect multiple individuals located in two disjoint groups, bridges that connect two individuals in two disjoint groups, or liaison hierarchies of nodes. Figure 1 illustrates each form. A large-scale network may include instances of all four connectivity modalities. This article addresses the implications of these different forms of intergroup connectivity. We set up populations of multiple subgroups and connect them under each form. We then adopt standard models of opinion formation and information propagation, which allow a comparative analysis on three metrics: mean rates of information propagation, convergence times to consensus, and steady-state deviations from the consensus value under conditions of noise.

Related literature
Typically, a corporation has a formal hierarchical structure and additional informal communication structures (Likert, 1967). Authority in large-scale organizations is subject to the well-known problem of control loss, i.e., the cumulative decay of influence of superiors over subordinates along the chain of command (Williamson, 1970; Friedkin & Johnsen, 2002). Classic and fascinating work on organization cultures (Crozier, 1964) points to the importance of the [...] (Wang & Wong, 1987), overlapping memberships (Airoldi et al., 2008), weighted graphs (Aicher et al., 2014), and arbitrary degree distributions (Karrer & Newman, 2011) have also been studied.
In the field of social network science, the four forms of subgroup connectivity illustrated in Figure 1 are familiar constructs. Comparative research on their implications is limited. Granovetter (1973) and Watts & Strogatz (1998) have focused on the implications of multigroup connectivity based on bridges. Friedkin (1998) focused on co-membership and edge-bundle connectivity constructs, referring to them as "ridge" structures. Reynolds & Johnson (1982) focused on the importance of liaisons. It may be that ridge structures provide a more robust basis of influence and information flows than thinly dispersed bridges and liaisons. We are unaware of any comparative analysis of all four forms of intergroup connectivity structures that employs a common set of dynamical-system behavioral metrics.

Statement of contribution
In this article, we develop generative-network models that set up sample networks for each form of multigroup connectivity topology and conduct a comparative analysis of them, which we believe is lacking in the literature. Our models, under some additional constraints, can be regarded as stochastic block models (SBMs). We compare these network topologies on three metrics: (i) the spectral radius, which is a metric of the rate of information propagation in network propagation models, (ii) convergence time to consensus based on the classic French-DeGroot opinion dynamics, and (iii) steady-state deviation from the French-DeGroot consensus value in the presence of noise. We perform a regression analysis to obtain an equitable comparison of the performance of these four connectivity structures and to account for the discrepancies among their structural properties. We learned that the development of generative-network models suitable for this comparative analysis is nontrivial, and we lay out the assumptions of our models in detail. This is the methodological contribution of the article. The comparative analysis of network metrics, over samples of networks of increasing size in the class of each form of multigroup connectivity, is the article's theoretical contribution to a better understanding of the implications of these different forms.

Preliminaries
Graph theory. Each graph $G(V, E)$ is identified with the pair $(V, E)$. The set of graph nodes $V \neq \emptyset$ represents actors or groups of actors in a social network, and $|V| = n$ is the size of the network. The set of graph links $E$ represents the social interactions or ties among those actors. We denote the set of neighbors of node $i$ by $N_i$. In a weighted graph, edge weights represent the frequency or the strength of contact between two individuals, whereas in a binary graph all edge weights are equal to one. The density of $G$ is the ratio of the number of its observed edges to the number of possible edges, $2|E| / (n(n-1))$. Graph $G$ is called dense if $|E| = O(n^2)$ and sparse if $|E| \ll n^2$. A graph with density 1 is a clique. A walk of minimum length between two nodes is a shortest path or geodesic. The average geodesic length is defined by $L = \frac{1}{n(n-1)} \sum_{i,j \in V,\, i \neq j} d_{ij}$, where $d_{ij}$ is the length of the geodesic from node $i$ to node $j$. A connected acyclic subgraph of $G$ spanning all of its nodes is a spanning tree. A uniform spanning tree of size $n$ is a spanning tree chosen uniformly at random from the set of all possible spanning trees of size $n$. The degree or connectivity of node $i$ is the number of edges incident on it. The degree distribution $P(k)$ of a graph is the fraction of nodes with degree $k$, that is, the probability that a node chosen uniformly at random has degree $k$. The clustering coefficient of node $i$ is the ratio of the number of existing edges between the neighbors of node $i$ to the number of all possible edges among those neighbors. Letting $c_i$ denote the clustering coefficient of node $i$, the average clustering coefficient of graph $G$ is defined as $C = \frac{1}{n} \sum_{i \in V} c_i$. An Erdős-Rényi graph (1959) is constructed by connecting nodes randomly: each edge is included in the graph with a fixed probability $p$, independently of every other edge. We denote such a graph by $G_{ER}(n, p)$. The degree distribution of $G_{ER}(n, p)$ is binomial, $P(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$, and its average clustering coefficient is $C = p$.
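The definitions above can be checked numerically on small samples. The following is a minimal sketch, not the authors' code: the function names `erdos_renyi`, `density`, and `average_clustering` are illustrative, and the convention that a node with fewer than two neighbors has clustering coefficient 0 is an assumption.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Sample G_ER(n, p) as a dict mapping each node to its neighbor set."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        # Each of the n(n-1)/2 possible edges is included independently.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def density(adj):
    """Observed edges over possible edges: 2|E| / (n(n-1))."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    return 2 * m / (n * (n - 1))


def average_clustering(adj):
    """Average of the node clustering coefficients c_i."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue  # assumed convention: c_i = 0 for fewer than two neighbors
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)
```

For a moderately large sample such as `erdos_renyi(200, 0.3, seed=1)`, both the density and the average clustering coefficient should fall close to $p$, in line with $C = p$.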
Linear algebra. We denote the adjacency matrix of $G$ by $A \in \mathbb{R}^{n \times n}$, whose $(i,j)$th entry $a_{ij}$ is equal to the weight of the link between nodes $i$ and $j$ when such an edge exists, and zero otherwise. Matrix $A$ is irreducible if the underlying digraph is strongly connected. If digraph $G$ is aperiodic and irreducible, then $A$ is primitive. (A digraph is aperiodic if the greatest common divisor of all cycle lengths is 1.) A cycle is a closed walk, of at least three nodes, in which no edge is repeated. We adopt the shorthand notations $\mathbf{1}_n = [1, \dots, 1]^\top$ and $\mathbf{0}_n = [0, \dots, 0]^\top$. Given $x = [x_1, \dots, x_n]^\top \in \mathbb{R}^n$, $\operatorname{diag}(x)$ denotes the diagonal matrix whose diagonal entries are $x_1, \dots, x_n$. For an irreducible nonnegative matrix $A$, $\lambda_{\max}$ denotes the dominant eigenvalue of $A$, which is equal to the spectral radius $\rho(A)$. The positive left eigenvector of $A$ associated with $\lambda_{\max}$ is called the left dominant eigenvector of $A$.
Empirical network properties. Our generative-network models attend to three often observed properties of real networks. (i) Small average shortest path: in networks with a large number of vertices, the average shortest path lengths are relatively small due to the existence of bridges or shortcuts. (ii) Heavy-tailed degree distribution: in contrast to Erdős-Rényi graphs with binomial degree distribution, degree distributions of more realistic networks display a power-law shape, $P(k) \sim A k^{-\alpha}$, where typically $2 < \alpha < 3$. (iii) High average clustering coefficient: in most real-world networks, particularly social networks, nodes tend to create tightly knit groups with relatively high clustering coefficient.
Stochastic block model. Let $n, k \in \mathbb{Z}_+$ denote the numbers of vertices and communities, respectively; let $p = (p_1, \dots, p_k)$ be a probability vector (the prior) on the $k$ communities, and let $W \in [0, 1]^{k \times k}$ be a symmetric matrix of connectivity probabilities. The pair $(X, G)$ is drawn under SBM$(n, p, W)$ if $X$ is an $n$-dimensional random vector with i.i.d. components distributed under $p$, and $G(V, E)$ is a simple graph where vertices $v$ and $u$ are connected with probability $W_{X_v, X_u}$, independently of any other pair. We define the community sets by
$$\Omega_i = \Omega_i(X) := \{v \in V : X_v = i\}, \quad i \in \{1, \dots, k\}.$$
Note that edges are not identically distributed, and they are independent only conditionally: conditioned on the group memberships, all edges are independent, and for a given pair of groups $(i, j)$ they are i.i.d. Because each vertex in a given group connects to all other vertices in the same way, vertices in the same community are said to be stochastically equivalent. The distribution of $(X, G)$ for $x \in \{1, \dots, k\}^n$ is given by
$$P\{X = x\} = \prod_{u=1}^{n} p_{x_u} = \prod_{i=1}^{k} p_i^{|\Omega_i(x)|},$$
$$P\{E = y \mid X = x\} = \prod_{1 \le u < v \le n} W_{x_u, x_v}^{\,y_{uv}} \left(1 - W_{x_u, x_v}\right)^{1 - y_{uv}},$$
where $y_{uv} \in \{0, 1\}$ indicates whether the edge between $u$ and $v$ is present. The law of large numbers implies that, almost surely, $\frac{1}{n} |\Omega_i| \to p_i$.
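The sampling procedure in this definition can be sketched in a few lines. This is an illustrative sketch only: `sample_sbm` is a hypothetical name, communities are indexed from 0 rather than 1, and the graph is returned as a plain edge list.

```python
import random


def sample_sbm(n, p, W, seed=None):
    """Draw (X, G) under SBM(n, p, W).

    p : list of k community probabilities (the prior).
    W : k x k symmetric nested list of connection probabilities.
    Returns the community labels X and the list of edges (u, v) with u < v.
    """
    rng = random.Random(seed)
    k = len(p)
    # i.i.d. community labels distributed under the prior p.
    X = rng.choices(range(k), weights=p, k=n)
    # Conditioned on X, every pair is connected independently with
    # probability W[X[u]][X[v]].
    edges = [
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if rng.random() < W[X[u]][X[v]]
    ]
    return X, edges
```

With `W = [[1.0, 0.0], [0.0, 1.0]]` the sample is a disjoint union of two cliques, which makes the role of the diagonal and off-diagonal entries easy to see.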

Symmetric SBM (SSBM)
If the probability vector $p$ is uniform and $W$ has all diagonal entries equal to $q_{\mathrm{in}}$ and all off-diagonal entries equal to $q_{\mathrm{out}}$, then the SBM is said to be symmetric. We say $(X, G)$ is drawn under SSBM$(n, k, q_{\mathrm{in}}, q_{\mathrm{out}})$, where the community prior is $p = (1/k, \dots, 1/k)$, and $X$ is drawn uniformly at random with the constraints $|\{v \in V : X_v = i\}| = n/k$. The case where $q_{\mathrm{in}} > q_{\mathrm{out}}$ is called the assortative model.

Methods
To design our four models, we first generate a sequence of group sizes; some of the detailed algorithms involved are given in the appendix. Second, we produce the community structures according to the sequence of group sizes and add the interconnections among them in the four modalities of multigroup connectivity.

Generating subgroup sizes
In this section, we describe an algorithm to generate relative subgroup sizes and introduce the resulting properties of these subgroups. We compute a normalized sequence of group sizes with a heavy-tailed distribution. We refer to Algorithm 1 in the appendix for a formal description based on pseudocode. Each subgroup is modeled as a connected dense Erdős-Rényi graph. For $\varepsilon$ substantially smaller than 1 (we shall select it to be 10%), a subgroup of size $i$ is the random graph $G_{ER}(i, 1 - \varepsilon)$. Each subgroup of size $i$ and edge probability $1 - \varepsilon$ has the following properties: (i) a connectivity threshold of $t(i) = \ln(i)/i$, that is, for $1 - \varepsilon > t(i)$, $G_{ER}$ is almost surely connected (almost any graph in the ensemble $G_{ER}$ is connected); (ii) $(1 - \varepsilon)\, i(i-1)/2$ edges on average; (iii) a small average shortest path, close to 1 and depending at most logarithmically on $i$; (iv) a binomial degree distribution, $P(k) = \binom{i-1}{k} (1 - \varepsilon)^k \varepsilon^{\,i-1-k}$, where, as $\varepsilon$ decreases, the standard deviation becomes smaller and the distribution is more densely concentrated around the mean $(i-1)(1-\varepsilon)$; and (v) a large clustering coefficient, close to 1 (conditioned on small $\varepsilon$) and equal to $C = 1 - \varepsilon$.
Given a population of n individuals, Algorithm 1 generates a sequence of relative subgroup sizes, such that, when interpreted as a disconnected graph, the collection of these subgroups exhibits a heavy tail degree distribution. An example of subgroup sizes generated by Algorithm 1 is illustrated in Figure 2.
As part of Algorithm 1, we design the probability distribution for the subgroup size $i$ to be proportional to $1/i^3$. The choice of exponent equal to 3 is based on the following notes: first, in order for $f(i) = k / i^{\alpha}$ and its mean to be well defined, one should have $\alpha \ge 2$; second, if one additionally requires the distribution to have a finite variance, then $\alpha \ge 3$. With exponent 3, the outcome of each realization of the algorithm is a collection of mostly small connected subgroups.
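Algorithm 1 itself appears only in the appendix, so the following is a stand-in sketch rather than a transcription of it. It assumes that sizes are drawn independently with probability proportional to $1/i^{\alpha}$ until the population of $n$ individuals is exhausted; the minimum size `min_size`, the truncation of the last draw, and the handling of leftover nodes are all assumptions made for illustration.

```python
import random


def sample_group_sizes(n, alpha=3.0, min_size=2, seed=None):
    """Partition n individuals into subgroups with P(size = i) ~ 1 / i**alpha.

    Hypothetical stand-in for Algorithm 1: sizes are drawn one at a time,
    the last draw is truncated to fit, and any leftover nodes (fewer than
    min_size) are merged into the last subgroup.
    """
    if n < min_size:
        raise ValueError("n must be at least min_size")
    rng = random.Random(seed)
    candidates = list(range(min_size, n + 1))
    weights = [s ** -alpha for s in candidates]
    groups, remaining = [], n
    while remaining >= min_size:
        s = rng.choices(candidates, weights=weights, k=1)[0]
        s = min(s, remaining)  # never exceed the remaining population
        groups.append(s)
        remaining -= s
    if remaining > 0:
        groups[-1] += remaining
    return groups
```

Dividing the returned sizes by `n` gives a normalized sequence of relative subgroup sizes. With `alpha=3.0` most draws are small, which matches the observation that each realization is a collection of mostly small subgroups.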

Models of multigroup connectivity
In this section, we describe the algorithms that generate realizations of the four multigroup connectivity modalities.
For three of the four modalities (bridges, edge bundles, and co-memberships), we connect the subgroups through a minimal set of pairwise coordination problems among them. Specifically, a minimal set of pairwise coordination problems is modeled through the notion of a random spanning tree among the subgroups. To define the generative algorithms for these three structures, we apply the notion of SBMs.

Bridge connectivity model
Here, we propose an algorithm to generate the bridge connectivity model. This structure can be modeled as an SBM in which the communities are connected through a spanning tree generated uniformly at random, and the interconnections are through precisely one node of each subgroup. We denote the edge set of this random tree by $E_T$. The graph is drawn under SBM$(n, p, W_B)$, conditioned on connectivity, where $p$ is calculated by Algorithm 1, and $W_B$ is given by
$$(W_B)_{ij} = \begin{cases} 1 - \varepsilon, & i = j, \\ 1/(s_i s_j), & (i, j) \in E_T, \\ 0, & \text{otherwise}, \end{cases}$$
where $s_i = |\Omega_i|$ denotes the size of group $i$ (the number of vertices assigned to community $i$), and $W_B$ contains a tree structure. Note that, given an SBM, a node in community $i$ has $n p_j W_{ij}$ neighbors in expectation in community $j$. We illustrate a realization of our algorithm in Figure 3.
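One direct way to realize such a structure is sketched below. It is an illustration under stated assumptions, not the authors' algorithm: the uniform random tree over subgroups is obtained by decoding a random Prüfer sequence, each subgroup receives a single randomly chosen "gateway" node (one reading of "precisely one node of each subgroup"), and exactly one bridge is placed per tree edge instead of sampling bridges from $W_B$. The names `random_labeled_tree`, `bridge_model`, and `gateway` are hypothetical.

```python
import heapq
import random
from itertools import combinations


def random_labeled_tree(k, rng):
    """Uniform random tree on nodes 0..k-1 via a random Pruefer sequence."""
    if k < 2:
        return []
    if k == 2:
        return [(0, 1)]
    prufer = [rng.randrange(k) for _ in range(k - 2)]
    degree = [1] * k
    for v in prufer:
        degree[v] += 1
    leaves = [v for v in range(k) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in prufer:
        leaf = heapq.heappop(leaves)  # smallest current leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def bridge_model(sizes, eps=0.1, seed=None):
    """Dense subgroups G_ER(s, 1 - eps) joined by one bridge per tree edge."""
    rng = random.Random(seed)
    groups, start = [], 0
    for s in sizes:
        groups.append(list(range(start, start + s)))
        start += s
    edges = set()
    for members in groups:
        for u, v in combinations(members, 2):
            if rng.random() < 1 - eps:
                edges.add((u, v))
    # Assumption: one gateway node per subgroup carries all of its bridges.
    gateway = [rng.choice(members) for members in groups]
    tree = random_labeled_tree(len(groups), rng)
    for a, b in tree:
        u, v = gateway[a], gateway[b]
        edges.add((min(u, v), max(u, v)))
    return groups, edges, tree
```

For `k` subgroups the construction adds exactly `k - 1` intergroup edges, the minimum needed to connect them.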

Edge bundle connectivity model
In this section, we propose an algorithm to generate the edge bundle connectivity model. Again we apply a random spanning tree as the building block of the interconnections. Here, instead of adding a single edge as the basis of intergroup connectivity, we add multiple edges whose number grows with the size of the subgroups. We illustrate an algorithm realization in Figure 4. We draw the graph under SBM$(n, p, W_{EB})$, conditioned on redundant connectivity. Communities are connected through a spanning tree generated uniformly at random, with edge set $E_T$. The interconnections involve two or more nodes from neighboring subgroups. The vector $p$ is calculated by Algorithm 1, and $W_{EB}$ is given by
$$(W_{EB})_{ij} = \begin{cases} 1 - \varepsilon, & i = j, \\ \alpha_{ij}/(s_i s_j), & (i, j) \in E_T, \\ 0, & \text{otherwise}, \end{cases}$$
where $s_i$ denotes the size of group $i$, $W_{EB}$ contains a tree structure, $\alpha_{ij} = \alpha_{ji} \ge 2$ for all $i, j$, and $\alpha_{ij}$ scales with $s_i s_j$.
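The bundle construction can be sketched as follows, taking the subgroups and the tree edges as given. This is a hedged illustration: `add_edge_bundles` and its `fraction` parameter are hypothetical, and fixing the bundle size deterministically at `max(2, round(fraction * s_i * s_j))` is an assumption standing in for sampling from $W_{EB}$.

```python
import random
from itertools import product


def add_edge_bundles(groups, tree_edges, fraction=0.1, seed=None):
    """Return, for each tree edge (a, b), a bundle of intergroup edges.

    groups     : list of lists of node ids, one list per subgroup.
    tree_edges : list of (a, b) subgroup index pairs forming a spanning tree.
    fraction   : assumed share of the s_a * s_b possible edges to include,
                 so that the bundle size scales with s_a * s_b.
    """
    rng = random.Random(seed)
    bundles = {}
    for a, b in tree_edges:
        pairs = list(product(groups[a], groups[b]))
        alpha = max(2, round(fraction * len(pairs)))  # at least two edges
        alpha = min(alpha, len(pairs))                # cannot exceed s_a * s_b
        bundles[(a, b)] = rng.sample(pairs, alpha)    # distinct edges
    return bundles
```

For example, with subgroups of sizes 3 and 4 and `fraction=0.25`, the bundle between them contains 3 of the 12 possible edges.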

Co-membership connectivity model
In addition to the existence of a uniform random spanning tree over the subgroups, our co-membership connectivity model generation is conditioned on the following topological constraint: we consider each pair of connected subgroups, say $i$ and $j$, and select a fraction of edges in the complete bipartite graph over $i$ and $j$. For each of these selected edges, we randomly select one of the two individuals, say the individual in $i$, and we turn this individual into a member of subgroup $j$ by adding edges from this individual to almost all members of $j$. We illustrate an algorithm realization in Figure 5.
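The co-membership model can be generated as a realization of SBM$(n, p, W_C)$, conditioned on the edge bundles being initiated from a single node in one of the corresponding subgroups. Again $E_T$ denotes the edge set of the random tree, $p$ is calculated by Algorithm 1, and $W_C$ is given by
$$(W_C)_{ij} = \begin{cases} 1 - \varepsilon, & i = j, \\ \alpha_{ij}/(s_i s_j), & (i, j) \in E_T, \\ 0, & \text{otherwise}, \end{cases}$$
where $s_i$ denotes the size of group $i$, $W_C$ contains a tree structure, $\alpha_{ij} = \alpha_{ji} \ge 3$ for all $i, j$, and $\alpha_{ij}$ scales with either $s_i$ or $s_j$ ($\alpha_{ij} \approx s_i$ or $\alpha_{ij} \approx s_j$).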
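The co-membership step described above can be sketched as follows, again taking the subgroups and tree edges as given. The function name `add_comemberships`, the use of a fixed count `n_comembers` in place of "a fraction of edges", and the reading of "almost all members" as independent inclusion with probability `1 - eps` are assumptions made for illustration.

```python
import random


def add_comemberships(groups, tree_edges, n_comembers=1, eps=0.1, seed=None):
    """Turn endpoints of selected intergroup pairs into co-members.

    For each tree edge (a, b), pick n_comembers random pairs (u, v) with
    u in group a and v in group b. One endpoint of each pair, chosen at
    random, is connected to each member of the other group with
    probability 1 - eps.
    """
    rng = random.Random(seed)
    new_edges = set()
    for a, b in tree_edges:
        for _ in range(n_comembers):
            u = rng.choice(groups[a])
            v = rng.choice(groups[b])
            # The selected individual joins the other endpoint's subgroup.
            member, host = (u, b) if rng.random() < 0.5 else (v, a)
            for w in groups[host]:
                if rng.random() < 1 - eps:
                    new_edges.add((min(member, w), max(member, w)))
    return new_edges
```

With `eps=0.0` each co-member is tied to every member of the host subgroup, so the number of added edges per co-member equals the host subgroup size, consistent with $\alpha_{ij} \approx s_i$ or $\alpha_{ij} \approx s_j$.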

Liaison hierarchy connectivity model
Here, applying Algorithm 1, we first generate the subgroups as dense Erdős-Rényi graphs. Then we partition the subgroups into sets of 2 or 3, and (i) assign a liaison to each of the sets and (ii) recursively assign a new liaison to groups of 2 or 3 liaisons until we reach the root at the top of the hierarchy. The resulting graph is a hierarchical tree structure with random branching factors of 2 and 3. A detailed description is provided in Algorithm 3 in the appendix, and Figure 6 illustrates a realization of this model.
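The recursive assignment of liaisons can be sketched as below. Algorithm 3 is not reproduced in this section, so this is only an illustration: the names `chunk_2_or_3` and `liaison_hierarchy` are hypothetical, and attaching each bottom-level liaison to a single contact node per subgroup is an assumption.

```python
import random


def chunk_2_or_3(items, rng):
    """Randomly partition two or more items into chunks of size 2 or 3."""
    items = list(items)
    rng.shuffle(items)
    chunks = []
    while items:
        r = len(items)
        # Sizes are chosen so that no single item is ever left over.
        size = r if r <= 3 else 2 if r == 4 else rng.choice((2, 3))
        chunks.append(items[:size])
        items = items[size:]
    return chunks


def liaison_hierarchy(group_contacts, first_liaison_id, seed=None):
    """Build the liaison tree above the subgroups.

    group_contacts   : one contact node id per subgroup (assumed attachment).
    first_liaison_id : first unused node id, used to label new liaisons.
    Returns (edges, root), where edges are (liaison, child) pairs.
    """
    rng = random.Random(seed)
    level, next_id, edges = list(group_contacts), first_liaison_id, []
    while len(level) > 1:
        parents = []
        for chunk in chunk_2_or_3(level, rng):
            for child in chunk:
                edges.append((next_id, child))
            parents.append(next_id)
            next_id += 1
        level = parents  # liaisons of this level are grouped at the next one
    return edges, level[0]
```

Every liaison in the returned tree has two or three children, matching the random branching factors of 2 and 3.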

Results
Realistic networks are usually not exclusively based on a single modality of subgroup connectivity. Our comparative analysis of connectivity modalities is oriented to the question of the implications of a shift away from one modality toward another modality, e.g., a modality shift from a liaison hierarchy toward direct bridges among subgroups, or from bridges among subgroups to intergroup edge bundles, or from intergroup edge bundles to co-memberships. In Figure 7, we present a comparison of the average shortest paths and average degrees of our generated networks as a function of network size for each of the four multigroup connectivity modalities. Each sample point on the curves is based on 100 realizations on networks with sizes that increase in step sizes of 50 up to 2,000 nodes. In analyses that increase the sample point size to 1,500 over a range of sizes up to 500, there is no marked change in the trajectories. In general, the confidence interval bands are narrow. Here, and elsewhere, red refers to the bridge model, purple to the edge bundle model, green to the co-membership model, and blue to the liaison hierarchy model. Figure 7(a) shows that the liaison hierarchy increasingly distinguishes itself from the other three modalities as network size increases. Its displayed trajectory is conditional on the liaison structure design. Average shortest paths are insensitive to redundancies. Hence, the lack of distinctions among the other three modalities is not surprising. Figure 7(b) shows that the four modalities are systematically ordered with respect to their average degrees: (co-membership) > (edge-bundle) > (bridge) > (liaison).

Spectral radius and propagation processes
Propagation phenomena appear in various disciplines, such as the spread of infectious diseases, transmission of information, diffusion of innovations, cascading failures in power grids, and spread of wildfires in forests. Depending on the application, the objective can vary from avoiding epidemic outbreaks and eradicating a disease in a population to facilitating the spread of an ideology or product over a network in marketing campaigns. In this subsection, we compare the behavior of our four network models under simple and well-studied epidemic models from the literature.
Let $x(t) = [x_1(t), \dots, x_n(t)]^\top$ denote the infection probabilities of the nodes at time $t$ and $A \in \mathbb{R}^{n \times n}$ denote the adjacency matrix of the contact graph. Let $\beta > 0$ be the infection rate, and $\gamma > 0$ be the recovery rate to the susceptible state. Then the linearizations of the SI (Susceptible-Infected) and SIS (Susceptible-Infected-Susceptible) network propagation models about the no-infection equilibrium point $\mathbf{0}_n$ on a weighted digraph are given by, respectively,
$$\dot{x}(t) = \beta A\, x(t),$$
$$\dot{x}(t) = (\beta A - \gamma I_n)\, x(t).$$
The following results are well known; see the classic works (Lajmanovich & Yorke, 1976; Allen, 1994; Wang et al., 2003) and the recent review (Mei et al., 2017). In the SI model, the epidemic initially experiences exponential growth with rate $\beta \lambda_{\max}$. In the SIS model, near the onset of an epidemic outbreak, the exponential growth rate is $\beta \lambda_{\max} - \gamma$ and the outbreak tends to align with the dominant eigenvector.
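The growth rates quoted above depend on the network only through $\lambda_{\max}$, so they are straightforward to evaluate for any generated adjacency matrix. The sketch below uses NumPy; the function names are illustrative.

```python
import numpy as np


def spectral_radius(A):
    """Largest eigenvalue magnitude of A (equals lambda_max for an
    irreducible nonnegative matrix)."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def linearized_growth_rates(A, beta, gamma):
    """Initial exponential growth rates of the linearized SI and SIS models."""
    lam = spectral_radius(A)
    si_rate = beta * lam            # SI: beta * lambda_max
    sis_rate = beta * lam - gamma   # SIS: beta * lambda_max - gamma
    return si_rate, sis_rate
```

A positive SIS rate corresponds to an initial outbreak and a negative one to decay of the linearized dynamics, which is why the spectral radius serves as the comparison metric across the four modalities.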
In Figure 8, we plot the spectral radius of the networks as a function of network size 50-2,000 for the four models. In Table 1, we evaluate the differences among these curves controlling for a network's size (N), average degree (Degree), and (0,1) indicator variables for the edge-bundle, co-membership, and liaison modalities with the bridge modality taken as the baseline. Similar findings were obtained with 660K observations on a reduced range of network sizes 24-655. The average degree of a network has a positive effect on the speed of viral propagation. Controlling for network size and average degree, relative to the propagation speeds in the bridge modality, propagation speeds in the edge-bundle and liaison modalities are greater and those of the co-membership modality are less. The elevated curve for the co-membership modality in Figure 8 is based on its systematically higher average degrees.

Time to convergence in influence processes generating consensus with distributed linear averaging
Consensus algorithms play an important role in many multiagent systems. A standard example is the French-DeGroot discrete-time averaging recursion
$$x(t+1) = W x(t), \tag{6}$$
where $W$ is row stochastic and $x(t) \in \mathbb{R}^n$ is the vector of individuals' opinions at time $t$. For primitive stochastic matrices, the solution to Equation (6) satisfies
$$\lim_{t \to \infty} x(t) = \left(v^\top x(0)\right) \mathbf{1}_n, \tag{7}$$
where $v$ is the left dominant eigenvector of $W$ satisfying $v_1 + \dots + v_n = 1$. Convergence time to consensus may be defined as $\tau_{\mathrm{asym}} = 1 / \log(1 / r_{\mathrm{asym}})$; it gives the asymptotic number of steps for the error to decrease by the factor $1/e$, where $r_{\mathrm{asym}}$ denotes the asymptotic convergence factor. It is well known (see, e.g., Bullo (2018), Chapter 10) that convergence to consensus is exponentially fast as $\rho_2^t$, where $\rho_2$ is the second largest eigenvalue of $W$ in magnitude. We construct $W$ from $A$ as follows:
$$W = (D + I_n)^{-1} (A + I_n), \tag{8}$$
where $D = \operatorname{diag}(A \mathbf{1}_n)$ denotes the diagonal matrix of all the nodes' out-degrees, with $d_{ii} = \sum_{j=1}^{n} a_{ij}$ for all $i$. Equation (8) gives positive weights $w_{ii}$ that are equal to the weights $w_{ij}$ of $i$'s neighbors in $A$.
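The construction of the averaging matrix and the resulting convergence time can be sketched with NumPy. This sketch assumes the equal-neighbor weights with self-loops, $W = (D + I_n)^{-1}(A + I_n)$, which for a binary $A$ gives $w_{ii} = w_{ij}$ for every neighbor $j$ of $i$, and it takes $r_{\mathrm{asym}} = \rho_2$; the function names are illustrative.

```python
import numpy as np


def averaging_matrix(A):
    """Row-stochastic W = (D + I)^{-1} (A + I), with D = diag(A 1_n)."""
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))
    return np.linalg.solve(D + np.eye(n), A + np.eye(n))


def convergence_time(W):
    """tau = 1 / log(1 / rho_2), where rho_2 is the second largest
    eigenvalue magnitude of W (assumed to be strictly positive)."""
    magnitudes = np.sort(np.abs(np.linalg.eigvals(W)))[::-1]
    rho2 = magnitudes[1]
    return 1.0 / np.log(1.0 / rho2)
```

As a small worked case, for the path graph on three nodes the diagonal of $W$ is $(1/2, 1/3, 1/2)$, the eigenvalues of $W$ are $1$, $1/2$, and $-1/6$, and the convergence time is $1/\ln 2 \approx 1.44$ steps.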
In Figure 9, we plot the average convergence times of the networks as a function of network size 50-2,000 for the four models. In Table 2, we evaluate the differences among these curves controlling for a network's size (N), average degree (Degree), and (0,1) indicator variables for the edge-bundle, co-membership, and liaison modalities with the bridge modality taken as the baseline. Similar findings were obtained with 660K observations on a reduced range of network sizes 24-655. The convergence times of the bridge modality are larger than those of the three other modalities, and the liaison modality has the fastest convergence times. Higher average degrees lower times to convergence. Controlling for network size and average degree, the convergence times of the edge-bundle modality are shorter than those of the co-membership modality.

Consensus processes subject to white Gaussian noise
The general form of a French-DeGroot influence process with white Gaussian noise is
$$x(t+1) = W x(t) + e(t), \tag{9}$$
where $e(t)$ is a random vector with zero mean, covariance $\Sigma_e$, and independent entries. In the presence of noise, the states of the agents will be brought close to each other, but will not fully align to exact consensus. The resulting noisy consensus is referred to as persistent disagreement. For strongly connected and aperiodic graphs, the consensus dynamics (6) correspond to an irreducible and aperiodic Markov chain. The matrix $W$ then corresponds to the transition probability matrix, and its normalized left dominant eigenvector $\pi$ corresponds to the stationary distribution vector of the chain. The results on the steady-state disagreement by Jadbabaie & Olshevsky (2017) apply to reversible Markov chains, a condition that is met by our choice of weights on the adjacency matrix. For the Markov chain with reversible transition matrix $W$ and with uncorrelated noise, the mean square asymptotic error $\delta_{ss}$ can be measured by Equation (10), an expression in $D_\pi = \operatorname{diag}(\pi)$ and the matrix $H$ of hitting times for the Markov chain. The algorithm by Kemeny & Snell (1976) is applied to compute $H$. In Figure 10, we plot the steady-state mean deviation from consensus, given by Equation (10), on the networks as a function of network size 50-2,000 for the four models. In Table 3, we evaluate the differences among these curves controlling for a network's size (N), average degree (Degree), and (0,1) indicator variables for the edge-bundle, co-membership, and liaison modalities with the bridge modality taken as the baseline. Similar findings were obtained with 660K observations on a reduced range of network sizes 24-655. The steady-state mean deviations for the bridge modality are larger than those of the three other modalities. Higher average degrees lower steady-state mean deviations from consensus.
Although the modalities have distinguishable effects, again we note that average degree differences are built into the modality models, so that when average degree is controlled, the relative ordering of modalities is altered. The edge-bundle and liaison modalities have greater noise reduction properties than the co-membership modality.
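The hitting-time matrix $H$ that enters the steady-state deviation can be computed from the fundamental matrix of the chain. The sketch below uses the standard identity $H_{ij} = (Z_{jj} - Z_{ij}) / \pi_j$ with $Z = (I - W + \mathbf{1}_n \pi^\top)^{-1}$; it covers only the computation of $\pi$ and $H$, not the deviation $\delta_{ss}$ itself, and the function names are illustrative.

```python
import numpy as np


def stationary_distribution(W):
    """Solve pi^T W = pi^T with sum(pi) = 1 for an irreducible chain."""
    n = W.shape[0]
    system = np.vstack([(np.eye(n) - W).T, np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    return pi


def hitting_times(W):
    """Matrix H with H[i, j] = expected steps to first reach j from i."""
    n = W.shape[0]
    pi = stationary_distribution(W)
    # Fundamental matrix Z = (I - W + 1 pi^T)^{-1}.
    Z = np.linalg.inv(np.eye(n) - W + np.outer(np.ones(n), pi))
    # H[i, j] = (Z[j, j] - Z[i, j]) / pi[j]; the diagonal is zero.
    H = (np.diag(Z)[None, :] - Z) / pi[None, :]
    return H, pi
```

For a two-state chain with transition probabilities $0.2$ (from state 0 to state 1) and $0.5$ (from state 1 to state 0), the expected hitting times are $1/0.2 = 5$ and $1/0.5 = 2$ steps, which the function reproduces.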

Discussion
In this article, we have proposed simple, synergistic, and stochastic algorithms to generate four modalities of multigroup connectivity and have compared their implications. These algorithms are variants of what are known as planted partition models or SBMs, under some further topological constraints, including that the intergroup connectivity is shaped by an underlying tree. Models 1-3 are nested in the following sense: for appropriate parameters, (1) graphs generated by the bridge connectivity structure are subgraphs of those generated by the edge bundles and (2) graphs generated by the edge bundle connectivity structure could be subgraphs of those generated by the co-membership. However, moving from the edge bundles to co-memberships, we introduce an additional constraint; that is, edge bundles of the spanning tree are initiated from the same node in one of the neighboring subgroups in the co-membership model. The work touches on two central traditions in network analysis: models of network structure and models of dynamical processes that unfold on networks composed of multiple small groups with dense within-group edges. In a connected network, any two such groups might be intersecting (with one or more individuals who are members of both) or disjoint. Two disjoint subgroups may be linked by a bridge, or by multiple edges, or by individuals who are not members of any dense group. We consider networks that can be strictly characterized in terms of one of these types of intergroup connectivity. The touchstone for our analysis is the work that has been conducted on multiple-group connectivity based on bridges. Here we elaborate the analysis with a comparison of implications of group connectivity based on (i) a minimal set of bridges, (ii) a minimal block-model structure in which pairs of groups are linked by multiple edges, (iii) a minimal set of group membership intersections, and (iv) a hierarchical tree of group-independent agents (intermediary liaisons).
No doubt there are many ways to construct realizations of each type of connectivity. No doubt there are many process metrics that might be examined. We compare structures in terms of network process metrics. We focus on metrics of two processes-epidemic propagation and consensus formation. These metrics are sensitive to network topology. We emphasize that the results of these comparative analyses are not merely due to the different numbers of links being added to isolated clusters. The regression results controlling for the network sizes and average node degrees affirm this claim. We constrain topology to four broad classes of markedly nonrandom clustered networks. Our contribution is to show the feasibility of a principled approach to a comparative analysis that we believe is currently lacking with respect to these distinguishable topological classes. Our findings on the speed of viral propagation show that the speeds differ depending on the form of multigroup connectivity. The average degree of a network has a positive effect on the speed of viral propagation. If the average degree differences, shown in Figure 7(b), are characteristic features of the modalities, then Figure 8 shows the net effect of each modality. Controlling for network size and average degree, our regression analysis in Table 1 evaluates the independent contributions of average degree and modality type. If it were possible to construct modality types with identical average degrees, then the regression results suggest that the bridge, edge-bundle, and liaison modalities do not substantially differ in their speeds of viral propagation, and that the co-membership modality dampens the speed of viral propagation.
Our findings on the times to convergence to consensus show that convergence times differ depending on the form of multigroup connectivity. The average degree of a network has a negative effect on convergence times; that is, higher average degrees are associated with faster convergence to consensus. If the average degree differences, shown in Figure 7(b), are characteristic features of the modalities, then Figure 9 shows the net effect of each modality. The bridge modality has slower convergence times than all other modalities. If it were possible to construct modality types with identical average degrees, then the regression results in Table 2 suggest somewhat similar results. As in Figure 9, the convergence times in the bridge modality are greater than those in all other modalities, and the liaison modality has the fastest convergence times. The regression on the edge-bundle and co-membership modalities indicates that, for a given average degree and network size, convergence is faster for the edge-bundle modality than for the co-membership modality.
Finally, our findings for levels of steady-state stochastic deviations from consensus in the presence of noise show that the mean deviations differ depending on the form of multigroup connectivity. The average degree of a network has a negative effect on steady-state deviation; that is, higher average degrees are associated with smaller deviations (more reduction of noise). If the average degree differences, shown in Figure 7(b), are characteristic features of the modalities, then Figure 10 shows the net effect of each modality. The bridge modality has greater deviations (less reduction of noise) than all other modalities. If it were possible to construct modality types with identical average degrees, then the regression results in Table 3 suggest somewhat similar results. As in Figure 10, the levels of noise reduction in the bridge modality are less than in all other modalities. The regression on the edge-bundle, co-membership, and liaison modalities indicates that edge bundles are associated with the greatest reduction of noise.
The important caveat on our findings is that they are conditional on positions taken in the models with which we generated realizations of each modality; see Algorithms 1 and 2 in the appendix. In addition, although it is reasonable that differences of average degree are associated with different modalities, we have not derived bounds on average degree for each modality (this may be an intractable problem). Furthermore, our analysis of multigroup connectivity modalities involves a uniform modality, whereas real networks with multiple subgroups are likely to be connected with mixed modalities including instances of bridges, edge bundles, co-memberships, and liaison nodes who are not members of any group. We believe that these obvious limitations are outweighed by the insights obtained from an analysis of artificial network topologies with controllable features. In the set of findings of this paper, we were particularly struck by (1) the implications on network process metrics of the social cohesion entailed in edge-bundle and co-membership modalities of multigroup connectivity and (2) the strong effects on process metrics of network differences of average degree arising from the multiple modalities.
An interesting future research direction is to propose sufficiently predictive indicators that enable one to categorize an arbitrary graph into any of the four connectivity structures discussed in this paper. In other words, we are interested in the following question: "given an empirically observed graph, can one provide a computationally efficient algorithm to identify subgroups and classify them into these different connectivity structures?" We find the results in the following literature relevant: recovery of the communities in the prolific community detection literature (Newman & Girvan, 2004; Fortunato, 2010), graph clustering (Schaeffer, 2007), and graph modularity (Newman, 2006). SBMs are widely recognized generative models for community detection and clustering in graphs, and they provide a ground truth for identifying subgroups. Abbe (2017) surveys recent developments on necessary and sufficient conditions for community recovery and community detection in SBMs.