Efficiently Generating Geometric Inhomogeneous and Hyperbolic Random Graphs

Hyperbolic random graphs (HRG) and geometric inhomogeneous random graphs (GIRG) are two similar generative network models that were designed to resemble complex real world networks. In particular, they have a power-law degree distribution with controllable exponent $\beta$, and high clustering that can be controlled via the temperature $T$. We present the first implementation of an efficient GIRG generator running in expected linear time. Besides varying temperatures, it also supports underlying geometries of higher dimensions. It is capable of generating graphs with ten million edges in under a second on commodity hardware. The algorithm can be adapted to HRGs. Our resulting implementation is the fastest sequential HRG generator, despite the fact that we support non-zero temperatures. Though non-zero temperatures are crucial for many applications, most existing generators are restricted to $T = 0$. Our generators support parallelization, although this is not the focus of this paper. We note that our generators draw from the correct probability distribution, i.e., they involve no approximation. Besides the generators themselves, we also provide an efficient algorithm to determine the non-trivial dependency between the average degree of the resulting graph and the input parameters of the GIRG model. This makes it possible to specify the expected average degree as input. Moreover, we investigate the differences between HRGs and GIRGs, shedding new light on the nature of the relation between the two models. Although HRGs represent, in a certain sense, a special case of the GIRG model, we find that a straight-forward inclusion does not hold in practice. However, the difference is negligible for most use cases.


Introduction
Network models play an important role in different fields of science. From the perspective of network science, models can be used to explain observed behavior in the real world.
To mention one example, Watts and Strogatz [25] observed that few random long-range connections suffice to guarantee a small diameter. This explains why many real-world networks exhibit the small-world property despite heavily favoring local over long-range connections. From the perspective of computer science, and specifically algorithmics, realistic random networks can provide input instances for graph algorithms. This facilitates theoretical approaches (e.g., average-case analysis), as well as extensive empirical evaluations by providing an abundance of benchmark instances, solving the pervasive scarcity of real-world instances.
There are some crucial features that make a network model useful. The generated instances have to resemble real-world networks. The model should be as simple and natural as possible to facilitate theoretical analysis, and to prevent untypical artifacts. And it must be possible to efficiently draw networks from the model. This is particularly important for the empirical analysis of model properties and for generating benchmark instances.
A model that has proven itself useful in recent years is the hyperbolic random graph (HRG) model [16]. HRGs are generated by drawing vertex positions uniformly at random from a disk in the hyperbolic plane. Two vertices are joined by an edge if and only if their distance lies below a certain threshold; see Section 2.2. HRGs resemble real-world networks with respect to crucial properties. Most notable are the power-law degree distribution [14] (i.e., the number of vertices of degree k is roughly proportional to k −β with β ∈ (2, 3)), the high clustering coefficient [14] (i.e., two vertices are more likely to be connected if they have a common neighbor), and the small diameter [10,18]. Moreover, HRGs are accessible for theoretical analysis (see, e.g., [4,10,14,18]). Finally there is a multitude of efficient generators with different emphases [2,11,12,20,[22][23][24]; see Section 1.2 for a discussion.
Here every vertex has a position on the d-dimensional torus and a weight following a power law. Two vertices are then connected if and only if their distance on the torus is smaller than a threshold based on the product of their weights. When using positions on the circle (d = 1), GIRGs approximate HRGs in the following sense: the processes of generating a HRG and a GIRG can be coupled such that it suffices to decrease and increase the average degree of the GIRG by only a constant factor to obtain a subgraph and a supergraph of the corresponding HRG, respectively. Compared to HRGs, GIRGs are potentially easier to analyze, generalize nicely to higher dimensions, and the weights allow to directly adjust the degree distribution.
Above, we described the idealized threshold variants of the models, where two vertices are connected if an only if their distance is small enough. Arguably more realistic are the binomial variants, which allow longer edges and shorter non-edges with a small probability. This is achieved with an additional parameter T , called temperature. For T → 0, the binomial and threshold variants coincide. Many publications focus on the threshold case, as it is typically simpler. This is particularly true for generation algorithms: in the threshold variants one can ignore all vertex pairs with sufficient distance, which can be done using geometric data structures. In the binomial case, any pair of vertices could be adjacent, and the search space cannot be reduced as easily. For practical purposes, however, a non-zero temperature is crucial as real-world networks are generally assumed to have positive temperature. Moreover, from an algorithmic perspective, the threshold variants typically produce particularly wellbehaved instances, while a higher temperature leads to difficult problem inputs. Thus, to obtain benchmark instances of varying difficulty, generators for the binomial variants are key.

Contribution & Outline
Based on the algorithm by Bringmann, Keusch, and Lengler [7], we provide an efficient and flexible GIRG generator. It includes the binomial case and allows higher dimensions. Its expected running time is linear. To the best of our knowledge, this is the first efficient generator for the GIRG model. Moreover, we adapt the algorithm to the HRG model, including the binomial variant. Compared to existing HRG generators (most of which only support the threshold variant), our implementation is the fastest sequential HRG generator.
A refactoring of the original GIRG algorithm [7] allows us to parallelize our generators. They do not use multiple processors as effectively as the threshold-HRG generator by Penschuck [20], which was specifically tailored towards parallelism. However, in a setting realistic for commodity hardware (8 cores, 16 threads), we still achieve comparable run times.
Our generators come as an open-source C++ library 1 with documentation, command-line interface, unit tests, micro benchmarks, and OpenMP [6] parallelization using shared memory. An integration into NetworKit [21] is planned.
Besides the efficient generators, we have three secondary contributions. (I) We provide a comprehensible description of the sampling algorithm that should make it easy to understand how the algorithm works, why it works, and how it can be implemented. Although the core idea of the algorithm is not new [7], the previous description is somewhat technical. (II) The expected average degree can be controlled via an input parameter. However, the dependence of the average degree on the actual parameter is non-trivial. In fact, given the average degree, there is no closed formula to determine the parameter. We provide a linear-time algorithm to estimate it. (III) We investigate how GIRGs and HRGs actually relate to each other by measuring how much the average degree of the GIRG has to be decreased and increased to obtain a subgraph and supergraph of the HRG, respectively.
In the following we first discuss our main contribution in the context of existing HRG generators. In Section 2, we formally define the GIRG and HRG models. Afterwards we describe the sampling algorithm in Section 3. In Section 4 we discuss implementation details, including the parameter estimation for the average degree (Section 4.1) as well as multiple performance improvements. Section 5 contains our experiments: we investigate the scaling behavior of our generator in Section 5.1, compare our HRG generator to existing ones in Section 5.2, and compare GIRGs to HRGs in Section 5.3.

Comparison with Existing Generators
We are not aware of a previous GIRG generator. Concerning HRGs, most previous algorithms only support the threshold case; see Table 1. The only published exceptions are the trivial quadratic algorithm [2], and an O((n 3/2 + m) log n) algorithm [22] based on a quad-tree data structure [23]. The latter is part of NetworkKit; we call it NkQuad. Moreover, the code for a hyperbolic embedding algorithm [5] includes an HRG generator implemented by Bringmann based on the GIRG algorithm [7]; we call it Embedder in the following. Embedder has been widely ignored as a high performance generator. This is because it was somewhat hidden, and it is heavily outperformed by other threshold generators. Experiments show that our generator HyperGIRGs is much faster than NkQuad, which is to be expected considering the asymptotic running time. Moreover, on a single processor, we outperform Embedder by an order of magnitude for T = 0 and by a factor of 4 for higher temperatures. As Embedder does not support parallelization, this speed-up increases for multiple processors. For the threshold variant of HRGs, there are the following generators. The quad-tree data structure mentioned above was initially used for a threshold generator (QuadTree) [23]. It was later improved leading to the algorithm currently implemented in NetworKit (NkGen) [24]. A later re-implementation by Penschuck [20] improves it by about a factor of 2 (NkOpt). However, the main contribution of Penschuck [20] was a new generator that features sublinear memory and near optimal parallelization (HyperGen). Up to date, HyperGen was the fastest threshold-HRG generator on a single processor. Our generator, HyperGIRGs, improves by a factor of 1.3 -2 (depending on the parameters) but scales worse for more processors. Finally Two vertices u = v are independently connected with probability p uv . For a positive temperature 0 < T < 1, while for T = 0 a threshold variant of the model is obtained with The constant c > 0 controls the expected average degree. We note that the above formulation slightly deviates from the original definition; see Section 2.3 for more details.

Hyperbolic Random Graphs
HRGs [16] are generated by sampling random positions in the hyperbolic plane and connecting vertices that are close. More formally, let V = {1, . . . , n} be a set of vertices. Let α > 1/2 and C ∈ R be two constants, where α controls the power-law degree distribution with exponent β = 2α+1 > 2, and C determines the average degreed. For each vertex v ∈ V , we sample a random point p v = (r v , θ v ) in the hyperbolic plane, using polar coordinates. Its angular coordinate θ v is chosen uniformly from [0, 2π] while its radius 0 ≤ r v < R with R = 2 log(n) + C is drawn according to the density function f (r) = α sinh(αr) cosh(αR)−1 . In the threshold case of HRGs two vertices u = v are connected if and only if their distance is below R. The hyperbolic distance The binomial variant adds a temperature T ∈ [0, 1] to control the clustering, with lower temperatures leading to higher clustering. Two nodes u, v ∈ V are then connected with For T → 0, the two definitions (threshold and binomial) coincide.

Comparison of GIRGs and HRGs
Bringmann et al. [7] show that the HRG model can be seen as a special case of the GIRG model in the following sense. Let d HRG be the average degree of a HRG. Then there exist GIRGs with average degree d GIRG and D GIRG with d GIRG ≤ d HRG ≤ D GIRG such that they are sub-and supergraphs of the HRG, respectively. Moreover, d GIRG and D GIRG differ only by a constant factor. Formally, this is achieved by using the big-O notation instead of a single constant c for the connection probability. We call this the generic GIRG framework. It basically captures any specific model whose connection probabilities differ from Equation (1) by only a constant factor. From a theoretical point of view this is useful as proving something for the generic GIRG framework also proves it for any manifestation, including HRGs.
To see how HRGs fit into the generic GIRG framework, consider the following mapping [7]. Radii are mapped to weights w v = e (R−rv)/2 , and angles are scaled to fit on a 1-dimensional torus x v = θ v /(2π). One can then see that the hyperbolic connection probability p T (d) under the provided mapping deviates from Equation (1) by only a constant. Thus, c in Equation (1) can be chosen such that all GIRG probabilities are larger or smaller than the corresponding HRG probabilities, leading to the two average degrees d GIRG and D GIRG mentioned above. Bringmann et al. [7] note that the two constants, which they hide in the big-O notation, do not have to match. They leave it open if they match, converge asymptotically, or how large the interval between them is in practice. We investigate this empirically in Section 5.3.

Sampling Algorithm
As mentioned in the introduction, the core of our sampling algorithm is based on the algorithm by Bringmann et al. [7]. In the following, we first give a description of the core ideas and then work out the details that lead to an efficient implementation.
To explain the idea, we make two temporary assumptions and relax them in Section 3.1 and Section 3.2, respectively. For now, assume that all weights are equal and consider only the threshold variant T = 0. The task is to find all vertex pairs that form an edge, i.e., their distance is below the threshold c(w u w v /W ) 1/d . Since all weights are equal, the threshold in this restricted scenario is the same for all vertex pairs. One approach to quickly identify adjacent vertices is to partition the ground space into a grid of cells. The size of the cells

23:6
Efficiently Generating Geometric Inhomogeneous and Hyperbolic Random Graphs should be chosen, such that (I) the cells are as small as possible and (II) the diameter of cells is larger than the threshold c(w u w v /W ) 1/d . The latter implies that only vertices in neighboring cells can be connected thus narrowing down the search space. The former ensures that neighboring cells contain as few vertex pairs as possible reducing the number of comparisons. Figure 1a shows an example of such a grid for a 2-dimensional ground space.

Inhomogeneous Weights
Assume that we have vertices with two different weights w 1 , w 2 , rather than one. As before, the cells should still be as small as possible while having a diameter larger than the connection threshold. However, there are three different thresholds now, one for each combination of weights. To resolve this, we can group the vertices by weight and use three differently sized grids to find the edges between them.
As GIRGs require not only two but many weights, considering one grid for every weight pair is infeasible. The solution is to discretize the weights by grouping ranges of weights into weight buckets. When searching for edges between vertices in two weight buckets, the pair of largest weights in these buckets provides the threshold for the cell diameter. This choice of the cell diameter satisfies property (II). Property (I) is violated only slightly, if the weight range within the bucket is not too large. Thus, each combination of two weight buckets uses a grid of cells, whose granularity is based on the maximum weight in the respective buckets.
There is a tradeoff when choosing the number of weight buckets. Logarithmically many buckets yield a sublinear number of grids. Moreover, the largest and smallest weight in a bucket are at most a factor 2 apart. Thus, the diameter of a cell is too large by a factor ≤ 4.
With this approach, a single vertex has to appear in grids of different granularity. To do this in an efficient manner, we recursively divide the space into ever smaller grid cells, leading to a hierarchical subdivision of the space. This hierarchy is naturally described by a tree. For a 2-dimensional ground space, each node has four children, which is why we call it quadtree. Note that each level of the quadtree represents a grid of different granularity. Moreover, the side length of a grid cell on level is 2 − . For a pair (i, j) of weight buckets, we then choose the level that fits best for the corresponding weights, i.e., the deepest level such that the diameter of each grid cell is above the connection threshold for the largest weights in bucket i and j, respectively. We call this level the comparison level, denoted by CL(i, j). It suffices to insert vertices of a bucket into the deepest level among all its comparison levels. This level is called the insertion level and we denote it by I(i). In Section 3.4, we discuss in detail how to efficiently access all vertices in a given grid cell belonging to a given weight bucket.

Binomial Variant of the Model
For T > 0, neighboring cell pairs are still easy to handle: a constant fraction of vertex pairs will have an edge and one can sample them by explicitly checking every pair. For distant cell pairs and a fixed pair of weight buckets, the distance between the cells yields an upper bound on the connection probability of included vertices; see Equation (1). The probability bound depends on both, the weight buckets and the cell pair distance, using the maximum weight within the buckets and the minimum distance between points in the cells. We note that, the individual connection probabilities are only a constant factor smaller than the upper bound.
Knowing this, we can use geometric jumps to skip most vertex pairs [1]. The approach works as follows. Assume that we want to create an edge with probability p for each vertex pair. For this process, we define the random variable X to be the number of vertex pairs we see until we add the next edge. Then X follows a geometric distribution. Thus, instead of throwing a coin for each vertex pair, we can do a single experiment that samples X from the geometric distribution and then skip X vertex pairs ahead. Since not all vertex pairs reach the upper bound p, we accept encountered pairs with probability p uv /p to get correct results.
Although distant cell pairs are handled efficiently, their number is still quadratic, most of which yield no edges. To circumvent this problem, the sampling algorithm, yet again, uses a quadtree. In the quadratic set of cell pairs to compare for one weight bucket pair, non-neighboring cells are grouped together along the quadtree hierarchy. They are replaced by their parents as shown in Figure 1b. The grouping of distant cell pairs is done as much as possible, meaning as long as the parents are not neighbors.
In conclusion, for each pair of weight buckets (i, j) the following two types of cell pairs have to be processed. Any two neighboring cell pairs on the comparison level CL(i, j); and any distant cell pair with level larger or equal CL(i, j) that has neighboring parents.

Efficiently Iterating Over Cell Pairs
The previous description sketches the algorithm as originally published. Here, we propose a refactoring that greatly simplifies the implementation and enables parallelization. We attribute a significant amount of HyperGIRGs' speed up over Embedder to this change.
Instead of first iterating over all bucket pairs and then over all corresponding cell pairs, we reverse this order. This removes the need to repeatedly determine the cell pairs to process for a given bucket pair. Instead it suffices to find the bucket pairs that process a given cell pair. This only depends on the level of the two cells and their type (neighboring or distant). Inverting the mapping from bucket pairs to cell pairs in the previous section yields the following. A neighboring cell pair on level is processed for bucket pairs with a comparison level of exactly . A distant cell pair on level (with neighboring parents) is processed for bucket pairs with a comparison level larger than or equal to . Thus, for each level of the quadtree we must enumerate all neighboring cell pairs, as well as distant cell pairs with neighboring parents. Algorithm 1 recursively enumerates exactly these cell pairs.

Efficient Access to Vertices by Bucket and Cell
A crucial part of the algorithm is to quickly access the set of vertices restricted to a weight bucket i and a cell A, which we denote by V A i . To this end, we linearize the cells of each level as illustrated in Figure 1c. This linearization is called Morton code [17] or z-order curve [19]. It has the nice properties that (I) for each cell in level , its descendants in level > in the quadtree appear consecutively; and (II) it is easy to convert between a cells position in the linear order and its d-dimensional coordinates (see Section 4.2). We sort the vertices of a fixed weight bucket i by the Morton code of their containing cell on the insertion level I(i), using arbitrary tie-breaking for vertices in the same cell. This has the effect that for any cell A with level(A) ≤ I(i), the vertices of V A i appear consecutive. Thus, to efficiently enumerate them, it suffices to know for each cell A the index of the first vertex in V A i . This can be precomputed using prefix sums leading to the following lemma.

Implementation Details
The description in the previous section is an idealized version of the algorithm. For an actual implementation, there are some gaps to fill in. Moreover, omitting many minor tweaks, we want to sketch optimizations that are crucial to achieve a good practical run time in the following. More details on the sketched approaches can be found in Appendix B.

Estimating the Average Degree Parameter
Here, we sketch how to estimate the parameter c in Eq. (1) to achieve a given expected average degree. We estimate the constant based on the actual weights, not on their probability distribution. This leads to lower variance and allows user-defined weights. We start with an arbitrary constant c, calculate the resulting expected average degree E[d] and adjust c accordingly, using a modified binary search. This is possible, as E[d] is monotone in c. We derive an exact formula for E[d], depending on c and the weights. It cannot simply be solved for c, which is why we use binary search instead of a closed expression.
For the binary search, we need to evaluate E[d] for different values of c. This is potentially problematic, as the formula for E[d] sums over all vertex pairs. The issue preventing us from simplifying this formula is the minimum in the connection probability. We solve this, by first ignoring the minimum and subtracting an error term for those vertex pairs, where the minimum takes effect. The remaining hard part is to calculate this error term. Let E R be the set of vertex pairs appearing in the error term and let R be the set of vertices with at least one partner in E R . Although |E R | itself is sufficiently small, R is too large to determine E R by iterating over all pairs in R × R. We solve this by iterating over the vertices in R, sorted by weight. Then, for each vertex we encounter, the set of partners in E R is a superset of the partners of the previous vertex (with smaller weight).

Efficiently Encoding and Decoding Morton Codes
Recall from Section 3.4 that we linearize the d-dimensional grid of cells using Morton code. As vertex positions are given as d-dimensional coordinates, we have to convert the coordinates to Morton codes (i.e., the index in the linearization) and vice versa. This is done by bitwise interleaving the coordinates. For example, the 2-dimensional Morton code of the four-bit coordinates a = a 3 a 2 a 1 a 0 and b = b 3 b 2 b 1 b 0 is a 3 b 3 a 2 b 2 a 1 b 1 a 0 b 0 . We evaluated different encoding and decoding approaches via micro benchmarks. The fastest approach, at least on Intel processors, was an assembler instruction from BMI2 proposed by Intel in 2013 [15].

Generating HRGs Avoiding Expensive Mathematical Operations
The algorithm from Section 3 can be used to generate HRGs. The algorithm works conceptually the same, except that most formulas change. This has for example the effect that we no longer get a closed formula to determine the insertion level of a weight bucket or the comparison level of a bucket pair. Instead, one has to search them, by iterating over the levels of the quadtree. Furthermore, HRGs introduce many computationally expensive mathematical operations like the hyperbolic cosine. This can be mitigated as follows.
For the threshold model, an edge exists if the distance d is smaller than R. Considering how the hyperbolic distance is defined (Section 2.2), reformulating it to cosh(d) < cosh(R) avoids the expensive arccosh, while cosh(R) remains constant during execution and can thus be precomputed. Similar to recent threshold HRG generators, we compute intermediate values per vertex such that cosh(d) can be computed using only multiplication and addition [11,20].
For the binomial model, evaluating the connection probability is a performance bottleneck. The straightforward way to sample edges is: compute the connection probability p T (d) depending on the distance, sample a uniform random value u ∈ [0, 1], and create the edge if and only if u < p T (d). We can improve this by precomputing the inverse of p T (d) for equidistant values in [0, 1]. This lets us, for small ranges in [0, 1], quickly access the corresponding range of distances. Changing the order, we first sample u ∈ [0, 1], which falls in a range between two precomputed values, which in turn yields a range of distances. If the actual distance lies below that range, there has to be an edge and if it lies above, there is no edge. Only if it lies in the range, we actually have to compute the probability p T (d).

Parallelization
The algorithm has five steps: generate weights, generate positions, estimate the average degree constant, precompute the geometric data structure, and sample edges. The first two are trivial to parallelize. For estimating the constants, we parallelize the dominant computations with linear running time. To sample the edges, we make use of the fact that we iterate over cell pairs in a recursive manner. This can be parallelized by cutting the recursion tree at a certain level and distributing the loose ends among multiple processors.
For the preprocessing we have to do three subtasks: compute for each vertex its containing cells on its insertion level, sort the vertices according to their Morton code index, and compute the prefix sum for all cells. We parallelize all three tasks and optimize them by handling all weight buckets together, sorting by weight bucket first and Morton code second. This is done by encoding this criterion into integers that are sorted with parallel radix sort.

Experimental Evaluation
We perform three types of experiments. In Section 5.1 we investigate the scaling behavior of our GIRG generator, broken down into the different tasks performed by the algorithm. In Section 5.2 we compare our HRG generator with existing generators. In Section 5.3 we experimentally investigate the difference between HRGs and their GIRG counterpart. Whenever a data point represents the mean over multiple iterations, our plots include errorbars that indicate the standard deviation. Besides the implementation itself, all benchmarks and analysis scripts are also accessible in our source repository.

Scaling of the GIRG Generator
We investigate the scaling of the generator, broken down into five steps. 1.  Figure 2 shows the sequential run time over the number of nodes n (top left), number of edges m (top right), temperature T (bottom right), and dimension d (bottom right). The performance is measured in nanoseconds per edge. Each data point represents the mean over 10 iterations. To make the measurements independent of the graph representation, we do not save the edges into RAM, but accumulate a checksum instead. Note that the top right plot increases the average degree, resulting in a decreased time per edge.
The empirical run times match the theoretical bounds: it is linear in n and m, grows exponentially in the dimension d, and is unaffected by the temperature T . The overall time is dominated by the edge sampling. Generating the weights includes expensive exponential functions, making it the slowest step after edge sampling. Generating the positions is significantly faster even for higher dimensions. For the parameter estimation using binary search, one can see that the run time never exceeds the time to generate the weights. For non-zero temperature T the performance of the binary search is similar to the generation of the weights, as it also requires exponential functions. The lower run times per edge for the increasing number of edges (top right) show that the run time is dominated by the number of nodes n. Only for very high average degrees, the cost per edge outgrows the cost per vertex.

HRG Run Time Comparison
We evaluate the run time performance of HyperGIRGs compared to the generators in Table 1, excluding the generators with high asymptotic run time as well as RHG and sRHG. RHG and sRHG are designed for distributed machines. Executed on a single compute node, the performance of the faster sRHG is comparable to HyperGen [11]. To avoid systematic biases between different graph representations, the implementations are modified 2 not to store the resulting graph. Instead, only the number of edges produced is counted and we ensure that the computation of incident nodes is not optimized away by the compiler.
We used different machines for our sequential and parallel experiments. The former are done on an Intel Core i7-8700K with 16 GB RAM, the latter on an Intel Xeon CPU E5-2630 v3 with 8 cores (16 threads) and 64 GB RAM.
Our generator HyperGIRGs is consistently faster than the competitors, independent of the parameter choices; see Figure 3a and 3b. Only for unrealistic average degrees (1 k), HyperGen slightly outperforms HyperGIRGs. Moreover, HyperGIRGs beats Embedder, the only other efficient generator supporting non-zero temperature, by an order of magnitude.

23:12
Efficiently Generating Geometric Inhomogeneous and Hyperbolic Random Graphs For higher temperatures, we compare our algorithm with the two other non-quadratic generators NkQuad (included in NetworKit) and Embedder; see Figure 3c. We note that Embedder uses a different estimation for R, which leads to an insignificant left-shift of the corresponding curve. In Figure 3c, one can clearly see the worse asymptotic running time of NkQuad. Compared to Embedder, HyperGIRGs is consistently 4 times faster. Figure 3d shows measurements for parallel experiments using 16 threads. The parameters coincide with Figure 3b. Embedder does not support parallelization and is outperformed even more by the other generators. The fastest generator in this multi-core setting is HyperGen, which is specifically tailored towards parallel execution. Nonetheless, HyperGIRGs shows comparable performance and outperforms the other two generators NkGen and NkOpt. We note that even on parallel machines, the sequential performance is of high importance: One often needs a large collection of graphs rather than a single huge instance. In this case, it is more efficient to run multiple instances of a sequential generator in parallel.

Difference Between HRGs and GIRGs
Recall from Section 2.3 that a HRG with average degree d HRG has a corresponding GIRG sub-and supergraphs with average degrees d GIRG and D GIRG , respectively.
We experimentally determine, for given HRGs, the values for d GIRG by decreasing the average degree of the corresponding GIRGs until it is a subgraph of the HRG. Analogously, we determine the value for D GIRG . We focus on the threshold variant of the models, as this makes the coupling between HRGs and GIRGs much simpler (the graph is uniquely determined by the coordinates). Figure 4a shows d GIRG and D GIRG , compared to d HRG for growing n. One can see that d GIRG and D GIRG are actually quite far apart. They in particular do not converge to the same value for growing n. However, at least d GIRG seems to approach d HRG . This indicates that every HRG corresponds to a GIRG subgraph that is missing only a sublinear fraction of edges. On the other hand, the average degree of the GIRG has to be increased by a lot to actually contain all edges also contained in the HRG. Figure 4b gives a more detailed view for a single HRG. Depending on the average degree of the GIRG, it shows how many edges the GIRG lacks and how many edges the GIRG has in addition to the HRG. For degree 100, the GIRG contains about 38 k additional and lacks about 42 k edges. These are rather small numbers compared to the 50 M edges of the graphs.

A Omitted Proof of Lemma 1 Lemma 1. After linear preprocessing, for all cells A and weight buckets i with level(A) ≤ I(i), vertices in the set V
Proof. As mentioned above, we have to sort the vertices V i of each weight bucket i according to the index (Morton code) of the containing cell. Clearly, the d-dimensional coordinates of the cell containing a given vertex is obtained in constant time by rounding. From this one can obtain the index in constant time (also see Section 4.2). This can be done using, e.g., bucket sort with respect to this index to sort the vertices. In the following, we refer to this sorted array with V i . Besides these sorted arrays V i of vertices, one for each weight bucket i, we store for each cell C at level I(i) the number of vertices preceding the vertices in cell C. Note that this is simply the prefix sum of the number of vertices in all cells that come before cell C. Denote this prefix sum of cell C with P C . Now let i be a weight bucket and let A be a cell identifying the requested set of vertices V A i (with level(A) ≤ I(i)). Let C 1 , . . . , C j be the descendants of cell A at level I(i), appearing in this order according to the Morton code. Recall that the vertices in C 1 , . . . , C j appear consecutive in the sorted array V i . Thus, V A i is given by the range [P C1 , . . . , P Cj+1 ) in V i . In terms of running time, each weight bucket requires O(|V i | + 2 d·I(i) ) time for bucket sort and O(2 d·I(i) ) time for the prefix sums, where 2 d·I(i) is the number of cells in the insertion level I(i). Over all weight buckets, the term |V i | sums up to |V | and Bringmann et al. [7] show that the same holds for 2 d·I(i) .

B.1 Avoiding Double Counting Buckets, Cells, and Vertices
The algorithm as described in Section 3 iterates over pairs of buckets, cells, and vertices. All three entities need to be handled correctly to avoid visiting vertex or cell pairs multiple times. Consider the recursive Algorithm 1. In lines 8 and 9, it is sufficient to consider only cell pairs with A ≤ B, because the pairs (A, B) and (B, A) can be handled simultaneously.
Meaning, if a cell pair is processed by the bucket pair (i, j), then it needs to be processed by the bucket pair (j, i) (cf. line 1). However, the bucket pair (i, i) should occur once per cell pair. Alternatively, one can separate the cell pairs (A, B) and (B, A), but instead consider only bucket pairs (i, j) with i ≤ j. In any case, bucket pairs (i, i) require special treatment for cell pairs of the form (A, A). Then, only edges between vertices u < v should be checked (lines 3,5-6). If self loops are desired, the constraint can be relaxed to u ≤ v.

B.2 Estimating the Average Degree Parameter
This section covers the estimation for the binomial version of the model T > 0. The calculations for the threshold case T = 0 are analogous (and simpler). Typically, a random graph generator accepts the expected average degree or the number of edges as an input parameter. In the following we describe how binary search can be used to estimate the constant c in the edge probability p uv (Eq. (1)) for a desired expected average degree. The constant is found based on the actual weights instead of their probability distribution, because the resulting average degree has lower variance and the generator should be able to accept user-defined weights as well. Note that we implement GIRGs without explicitly modeling the constant c, because scaling all weights by c T emulates the same behavior.
Let X uv be a random indicator variable for the existence of edge uv.
To remove the minimum, we distinguish between short edges that are guaranteed to exist and long edges that exist with probability below 1.
If k ≥ 0.5 then the weights guarantee the existence of the edge uv independent of position. For simplicity we assume that k ≤ 0.5 for all vertex pairs. In the end we subtract an error to account for the ignored pairs. For any constant t ≤ 0.5, P r(||x u − x v || ≤ t) = (2t) d , which is the fraction of the ground space which is covered by a hypercube with radius t. The probability for a short edge becomes The probability density function of ||x u − x v || between 0 and 0.5 is the derivative of (2x) d , namely d2 d x d−1 . We calculate the probability for a long edge based on two specific weights.      of those operations. The first optimization applies to the threshold variant and the second optimization to the binomial version. For the threshold model, an edge exists if the distance d is smaller than R. Considering how the hyperbolic the distance is defined (Section 2.2), reformulating it to cosh(d) < cosh(R) avoids the expensive arccosh, while cosh(R) remains constant during execution. Similar to recent threshold HRG generators, we compute intermediate values per vertex such that cosh(d) can be computed using only multiplication and addition [11,20].

B.3 Efficiently Encoding and Decoding Morton Codes
For the binomial model, evaluating the connection probability p T (d) from the optimized cosh(d) is a performance bottleneck and made up half of the total run time. Evaluating p T (d) includes an expensive exponential function and cannot avoid the arccosh like in the threshold model. We introduce a distance filter (see Figure 7a) to reduce the frequency of the operation resulting in a speedup of approximately factor two. The process before was: compute the probability p T (d), sample a uniform random value u ∈ [0, 1], and emit an edge if u < p T (d). The idea of our filter is that we invert the probability function and compute p −1 T (d) in advance for multiple equidistant values between 0 and 1. Each entry x → p −1 T (x) in the filter represents the distance -or rather proximity -needed for an edge probability of x. During edge sampling, we generate u ∈ [0, 1] before evaluating p T . The generated u falls in an interval between two precomputed entries l ≤ u < h in our filter. We know that p T is monotonically decreasing so p −1 T (l) ≥ p −1 T (u) > p −1 T (h), meaning the higher the distance the lower the probability and vice versa. Instead of emitting an edge if and only if u < p T (d), we emit an edge if p −1 T (h) ≥ d and skip the edge if p −1 T (l) ≤ d. Only if p T (d) is in the interval between l and h, the expensive p T (d) has to be evaluated. Since u is uniformly distributed, the probability to hit the interval where p T (d) has to be evaluated is 1/(k − 1), where k is the number of entries in the filter. Our generator uses k = 100. Additionally, we avoid the arccosh by directly storing cosh[p −1 T (x)] in the filter.

B.5 Parallelization
This section describes how the sampling algorithm can be parallelized focusing on the preprocessing building the geometric data structure (Section 3.4) and on the recursion enumerating pairs of grid cells for sampling the edges (Section 3.3). The presented approach applies to the GIRG and HRG implementations. The preprocessing for a weight bucket i computes the containing cell on the insertion level for all vertices in V i . We optimized the process by processing all weight buckets together. The containing cell for all vertices in V is computed in parallel. We sort vertices by weight bucket first and by cell second. Theoretically, bucket sort results in linear run time. For the implementation, however, we use a parallel radix sort instead. The vertices V i of weight bucket i form a contiguous subsequence in V . Moreover, V i is sorted by cell, allowing parallel computation of the prefix sums for all cells in the insertion level of the weight bucket.
The recursion is executed in parallel and experiments suggest a near optimal scaling when the number of threads is a power of two. Each thread has a local random generator. We use static scheduling to produce deterministic results even for the binomial model. However, the ordering of edges in the edge list varies, because each thread locally buffers generated edges before writing them while locking a mutex. We distinguish two stages of execution. The first stage is to "saw off" the recursion tree at a certain level and collect the omitted recursive calls as tasks to execute in stage two. A task is represented by a cell pair from which to pick up the execution later. One thread collects the tasks by traversing the recursion tree without sampling any edges (omitting lines 1-6 in Algorithm 1). Meanwhile, the other threads process the pairs that the main thread passed through. When all tasks are collected stage two begins. In stage two, the threads pick up the "loose ends" of the cut recursion tree. There are three different types of tasks with varying load. For 1-dimensional geometry, level > 2, and assuming a number of threads that is constant in n, the types of tasks are the following. There are 2 heavy tasks given by a neighboring cell pair of the form (A, A). Their number of recursive calls grows exponentially with each subsequent level implying a load of O(n). There are 2 light tasks given by a neighboring cell pair of the form (A, A + 1). They produce four recursive calls per subsequent level implying a load of O(log n). Finally, there are 3 · 2 −1 constant tasks given by a distant cell pair. They invoke no recursive calls at all. The number of distant cell pairs in a level is explained by Figure 7b Since heavy tasks dominate the run time during stage two, we distribute heavy tasks evenly among all threads. This is why the approach scales best when the number of threads is a power of two. The level where we saw off the recursion tree is a tuning parameter of the generator. We choose it, such that there are two heavy tasks per thread to reduce load imbalance if one thread stalls. To apply the same scheduling approach to higher dimensions it suffices to know that the load of tasks remains similar and the number of heavy tasks is 2 d .