Consensus embedding for multiple networks: Computation and applications

Abstract Machine learning applications on large-scale network-structured data commonly encode network information in the form of node embeddings. Network embedding algorithms map the nodes into a low-dimensional space such that the nodes that are "similar" with respect to network topology are also close to each other in the embedding space. Real-world networks often have multiple versions or can be "multiplex" with multiple types of edges with different semantics. For such networks, computation of consensus embeddings based on the node embeddings of individual versions can be useful for various reasons, including privacy, efficiency, and effectiveness of analyses. Here, we systematically investigate the performance of three dimensionality reduction methods in computing consensus embeddings on networks with multiple versions: singular value decomposition, variational auto-encoders, and canonical correlation analysis (CCA). Our results show that (i) CCA outperforms the other dimensionality reduction methods in computing consensus embeddings, (ii) in the context of link prediction, consensus embeddings can be used to make predictions with accuracy close to that provided by embeddings of integrated networks, and (iii) consensus embeddings can be used to improve the efficiency of combinatorial link prediction queries on multiple networks by multiple orders of magnitude.


Introduction
Large-scale information networks are becoming ubiquitous. Mining knowledge from these information networks proves useful in a broad range of applications. For various analysis and prediction tasks on networks, representation of networks in a vector space enables effective use of off-the-shelf machine learning algorithms. In recent years, node embeddings have gained popularity in network representation (Goyal & Ferrara, 2018).
Node embeddings aim to map each node in the network to a low-dimensional vector representation to extract features that represent the topological characteristics of the network. Many techniques are developed for this purpose (Grover & Leskovec, 2016;Ahmed et al., 2019;Tang et al., 2015), and these techniques are shown to be effective in addressing problems such as link prediction (Yue et al., 2020;Kuo et al., 2013), node classification (Cavallari et al., 2017), and clustering (Rozemberczki et al., 2019).
Many real-life networks are versioned (Cowman et al., 2020) or multiplex (Park et al., 2020). Different versions of a network have the same set of nodes and different sets of edges. These different sets of edges may represent identical semantics but different sources (e.g., protein-protein interaction (PPI) networks obtained from different databases) or different semantics (e.g., physical PPIs vs. genetic interactions). Consensus embedding, a concept we introduced recently (Li & Koyutürk, 2020), aims to compute node embeddings for the integration of multiple network versions using the embeddings obtained from the individual versions. To compute consensus embeddings, we proposed two dimensionality reduction methods: singular value decomposition (SVD) and variational autoencoders. We showed that the link prediction accuracy of consensus embeddings is close to the accuracy provided by embeddings computed directly from the integrated network. We also showed that consensus embedding can improve the efficiency of processing combinatorial link prediction queries, while favorably balancing the trade-off between gains in query runtime and pre-processing time.
In this paper, we extend the framework for computing consensus embeddings. First, we generalize the notion of consensus embeddings such that the number of dimensions in the embedding of different individual networks can be different. Observing that these embeddings represent different hyperspaces, we adapt a well-established statistical method for mapping two spaces to each other, namely Canonical Correlation Analysis (CCA) (Hotelling, 1936), to compute consensus embeddings. For computing consensus embeddings of multiple network versions, we apply Generalized Canonical Correlation Analysis (GCCA) (Kettenring et al., 1971).
Since versioned networks have identical node sets and their edge sets can overlap significantly, it can be expected that the embedding spaces of different versions are similar. Indeed, state-of-the-art machine learning applications use simple aggregation to integrate embeddings of multiplex networks (Park et al., 2020). To assess the correspondence between the embedding spaces of networks with multiple versions, we also consider a baseline method that computes a consensus embedding by taking the mean of the individual embeddings. We also systematically investigate the correspondence of the embedding dimensions of embeddings computed on different network versions. Finally, we systematically characterize the link prediction performance of the four methods we consider for computing consensus embeddings, in terms of accuracy and efficiency of processing combinatorial link prediction queries.
Our results show that the use of consensus embeddings does not significantly compromise the accuracy of link prediction, and consensus embeddings can sometimes deliver more accurate predictions than embeddings computed on the integrated networks. We observe that CCA outperforms SVD and the variational autoencoder in link prediction across different numbers of embedding dimensions. Our runtime analyses show that consensus embedding is multiple orders of magnitude more efficient than computing the embeddings of the integrated networks at query time. CCA also provides an efficient method for computing consensus embeddings compared to the variational autoencoder, and it is robust to large numbers of versions and large numbers of dimensions.

Node embedding
Node embedding aims to learn a low-dimensional representation of nodes in networks. Given a graph G = (V, E), a node embedding is a function f : V → R^d that maps each node v ∈ V to a vector in R^d, where d ≪ |V|. A node embedding method computes a vector for each node in the network such that proximity in the embedding space reflects proximity/similarity in the network.
There are many existing methods for computing node embeddings (Grover & Leskovec, 2016; Ahmed et al., 2019; Perozzi et al., 2014). Node embedding methods can be roughly divided into community-based approaches and role-based approaches. Community-based approaches aim to preserve the similarity of the nodes in terms of the communities they induce in the network. In contrast, role-based approaches aim to capture the topological roles of the nodes and map nodes with similar topological roles close to each other in the embedding space. As representatives of these different approaches, we here consider node2vec (Grover & Leskovec, 2016) (community-based) and role2vec (role-based) in our experiments.

Figure 1: The framework for the computation of consensus embeddings. The k network versions represent networks with the same node set but different sets of edges. Our framework assumes that embeddings for each network were computed separately, possibly using embedding spaces with different numbers of dimensions. It then computes a d-dimensional consensus embedding that represents the superposition of the k versions, to be used for downstream analysis tasks.

Consensus embedding
Consensus embedding (Li & Koyutürk, 2020) is defined as follows: Let G_1 = (V, E_1), G_2 = (V, E_2), ..., G_k = (V, E_k) be k versions of a network, and let their embeddings X_1, X_2, ..., X_k be given. X_1, X_2, ..., X_k can have the same or different numbers of dimensions, but they all have n = |V| rows since all versions share the same set of nodes. Our goal is to use the X_i ∈ R^{n×d_i} to compute a d-dimensional node embedding X_c for the integrated network G, without knowledge of G or the G_i's. Here, d_i denotes the number of dimensions of X_i and d is our target dimension. d should be less than or equal to min{d_i} for the consensus embedding to be meaningful.
The framework for the computation of consensus embeddings is shown in Figure 1. Consensus embedding can be used in many downstream tasks, including link prediction and node classification.

Link prediction
Link prediction is an important task in network analysis (Martínez et al., 2016). Given a network G = (V, E), link prediction aims to predict the potential edges that are likely to appear in the network based on the topological relationships between pairs of nodes. Link prediction can be supervised (De Sá et al., 2011) or unsupervised (Kuo et al., 2013).
In our experiments, we use BioNEV (Yue et al., 2020) to evaluate the link prediction accuracy of the consensus embeddings. It is a supervised method that aims to systematically evaluate embeddings, and it outputs the AUC scores of link predictions made using the embeddings.

Computing consensus embeddings
Our framework is illustrated in Figure 1. In this framework, the numbers of dimensions of the embeddings that represent different network versions can differ from each other. The "node embedding" step in the figure can use any embedding method; we use node2vec and role2vec in our experiments. We consider four methods for computing consensus embeddings: (i) the baseline (average of individual embeddings, requiring individual embeddings to have the same number of dimensions), (ii) SVD, (iii) the variational autoencoder, and (iv) CCA (for pairs of network versions) or GCCA (for more than two network versions). Figure 2 illustrates the dimensionality reduction methods.

Baseline consensus embedding:
The baseline embedding we consider assumes that the embeddings of individual network versions represent similar spaces and that the dimensions of the embeddings align with each other (Park et al., 2020). This requires that d_1 = d_2 = ... = d_k = d. Thus, provided that the embeddings of individual versions have the same number of dimensions, we compute the baseline embedding as the mean of the individual embeddings:

X_c^(avg) = (1/k) Σ_{i=1}^{k} X_i.    (1)
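Equation (1) can be written in a few lines of numpy; the sketch below is an illustration rather than the code used in our experiments, and `baseline_consensus` is a hypothetical helper name:

```python
import numpy as np

def baseline_consensus(embeddings):
    """Baseline consensus: element-wise mean of k individual embeddings.

    `embeddings` is a list of k arrays, each of shape (n, d); the baseline
    assumes dimension j of every version encodes comparable information,
    so it requires d_1 = d_2 = ... = d_k = d.
    """
    if len({X.shape for X in embeddings}) != 1:
        raise ValueError("baseline requires all embeddings to have the same shape")
    return np.mean(embeddings, axis=0)
```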

Singular value decomposition (SVD):
SVD is a matrix decomposition method for reducing a matrix to its constituent parts. The singular value decomposition of an m × p matrix M, whose rank is r, is a factorization of the form USV^T, where U is an m × r unitary matrix, S is an r × r diagonal matrix, and V is a p × r unitary matrix. The diagonal entries of S are called the singular values of M. Let X be the n × D matrix obtained by concatenating X_1, X_2, ..., X_k, where D = d_1 + d_2 + ... + d_k. If we set our objective as one of choosing an n × D matrix Y with rank d to minimize the Frobenius or 2-norm of the difference ||X − Y||, then the optimal solution is given by the truncation of the SVD of X to the largest d singular values (and corresponding singular vectors) of X.
In other words, letting M = X in the formulation of SVD, we obtain the n × r matrix U, the r × r matrix S, and the D × r matrix V, where r denotes the rank of X and X = USV^T. Now let U', S', and V' denote the n × d, d × d, and D × d matrices obtained by choosing the first d columns (also rows, for S') of, respectively, U, S, and V. Then the matrix Y = U'S'V'^T provides the best rank-d approximation to X. Consequently, V' provides an optimal mapping of the D dimensions in X to d-dimensional space. Based on this observation, SVD-based dimensionality reduction sets X_c^(SVD) = XV' = U'S', i.e., it maps the D-dimensional concatenated embedding of each node of the graph into the d-dimensional space defined by the SVD of X.
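The SVD-based reduction can be sketched as follows; this is an illustrative numpy version of the mapping described above (not our experimental code), and the helper name is hypothetical:

```python
import numpy as np

def svd_consensus(embeddings, d):
    """SVD-based consensus: project the concatenated embeddings onto the
    d-dimensional space spanned by the top-d right singular vectors.

    X is the n x D concatenation (D = d_1 + ... + d_k); the consensus is
    X V', which equals U' S' for the truncated SVD X ~ U' S' V'^T.
    """
    X = np.hstack(embeddings)                        # n x D concatenation
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:d].T                              # n x d consensus embedding
```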

Variational autoencoder:
An autoencoder is an unsupervised learning algorithm that applies backpropagation to obtain a lower-dimensional representation of data, setting the target values to be equal to the inputs. The autoencoder is a neural network with D inputs, each representing a column of the matrix X (i.e., a dimension in one of the k embedding spaces). The encoder layer(s) map these D inputs to the d latent features in the middle, which are subsequently transformed into the D outputs by the decoder layer(s). While training the network, each row of the matrix X (i.e., the embedding of each node) is used as both the input and the target output. The neural network is trained using the reconstruction loss

L = ||X − Y||_F^2,

where Y denotes the n × D matrix whose rows represent the outputs of the network corresponding to the inputs that represent the rows of X. Thus, the idea behind the variational autoencoder is to learn an encoding of the D input dimensions into the d latent features such that the D inputs can be reconstructed by the decoder with minimum loss. Observe that this loss function is identical to the objective of SVD; however, the use of neural networks provides the ability to perform nonlinear dimensionality reduction. Once the neural network is trained, we perform dimensionality reduction by retaining the d-dimensional output of the encoder that corresponds to each of the n training instances (rows of the matrix X, or nodes in V). These n d-dimensional vectors comprise the matrix X_c^(VAE), i.e., the consensus embeddings of the nodes in V computed by the variational autoencoder.
In our implementation, we use a convolutional autoencoder (Masci et al., 2011). Like a standard autoencoder, a convolutional autoencoder aims to reproduce its input at the output. It contains convolutional layers in the encoder part of the autoencoder: in each convolutional layer, a filter slides over the input matrix to compute the next layer, and each convolutional layer is followed by a pooling layer. In the decoder part, deconvolutional layers and unpooling layers recover the input matrix.
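To make the encode-decode idea concrete, the sketch below trains a plain linear autoencoder with hand-rolled gradient descent on the reconstruction loss. This is a deliberately simplified stand-in for the convolutional autoencoder used in our implementation; all names and hyperparameters are illustrative:

```python
import numpy as np

def autoencoder_consensus(embeddings, d, epochs=300, lr=1e-2, seed=0):
    """Autoencoder-style consensus (simplified linear sketch).

    Concatenates the k embeddings into an n x D matrix X, then learns a
    linear encoder W_enc (D x d) and decoder W_dec (d x D) by gradient
    descent on the mean reconstruction loss ||X - X W_enc W_dec||^2 / n.
    The consensus embedding is the encoder output X W_enc.
    """
    rng = np.random.default_rng(seed)
    X = np.hstack(embeddings)                        # n x D
    n, D = X.shape
    W_enc = rng.normal(scale=0.1, size=(D, d))
    W_dec = rng.normal(scale=0.1, size=(d, D))
    for _ in range(epochs):
        Z = X @ W_enc                                # n x d latent codes
        G = 2.0 * (Z @ W_dec - X) / n                # gradient of loss w.r.t. output
        grad_dec = Z.T @ G                           # compute both gradients first
        grad_enc = X.T @ (G @ W_dec.T)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return X @ W_enc                                 # n x d consensus embedding
```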

Canonical correlation analysis (CCA):
Suppose we are given two embedding matrices, X_1 ∈ R^{n×d_1} and X_2 ∈ R^{n×d_2}, computed by any embedding algorithm, such as node2vec (Grover & Leskovec, 2016), for G_1 = (V, E_1) and G_2 = (V, E_2), respectively. Our objective is to compute an n × d consensus embedding matrix, where d = min{d_1, d_2}, such that the intrinsic information encoded in each of these embedding matrices is preserved.
CCA aims to find projection matrices, W_1 ∈ R^{d_1×d} and W_2 ∈ R^{d_2×d}, such that the angle between W_1^T X_1(i) and W_2^T X_2(i) is minimized (i.e., their correlation is maximized), where X_1(i) and X_2(i) denote the vectors containing the ith rows of the respective matrices (Uurtio et al., 2017), i.e., the embeddings of the same node in the two versions of the network.
We first discuss the case d = 1 for ease of exposition, and then generalize to the case d ≥ 2. Minimizing the angle between two vectors can also be thought of as minimizing the Euclidean distance between the vectors on the unit ball. By using the sample covariance matrix definition and ignoring constant terms, we can state CCA's objective function as a maximization problem (Uurtio et al., 2017):

maximize w_1^T Σ_12 w_2 subject to w_1^T Σ_11 w_1 = w_2^T Σ_22 w_2 = 1,

where Σ_11 = (1/n) Σ_{i=1}^{n} X_1(i)X_1(i)^T, Σ_22 = (1/n) Σ_{i=1}^{n} X_2(i)X_2(i)^T, and Σ_12 = (1/n) Σ_{i=1}^{n} X_1(i)X_2(i)^T are sample covariance matrices. Furthermore, we can extend the optimization problem to the case d ≥ 2 by adding the constraint that the columns of W_1 and W_2 are orthogonal (with respect to Σ_11 and Σ_22, respectively). By using Lagrange duality, this problem can be solved as a generalized eigenvalue problem or via SVD (Uurtio et al., 2017).
Namely, letting B = Σ_11^{−1/2} Σ_12 Σ_22^{−1/2} and computing the SVD of B as B = U_B S_B V_B^T, the projection matrices sought by CCA are given as W_1 = Σ_11^{−1/2} U_B and W_2 = Σ_22^{−1/2} V_B, truncated to their first d columns. Finally, after computing W_1 and W_2 using the above framework, we construct the consensus embedding of G_1 and G_2 as the mean of the two projected embeddings:

X_c^(CCA) = (X_1 W_1 + X_2 W_2) / 2.

An application of CCA to the computation of consensus embeddings is illustrated in Figure 3. The two versions of the network, G_1 and G_2, are shown in Figure 3(a) and (b), respectively. The integrated network obtained by superposing these two versions is shown in Figure 3(c). The two-dimensional embeddings of the two versions are shown in Figure 3(d) and (e). Clearly, these embeddings are in different spaces, so we arbitrarily match the dimensions of the two embeddings. The 10 × 2 matrices X_1 W_1 and X_2 W_2 are visualized in Figure 3(f). This represents the projection of the embeddings of the two versions to the common space computed by CCA. As seen in the figure, the points that correspond to the same node are brought close together, as captured by the objective function of CCA. The consensus embedding, X_c^(CCA), computed by taking the mean of the two points that correspond to each node, is shown in Figure 3(g). The two-dimensional embedding of the integrated network is shown in Figure 3(h). Comparison of the two embeddings shows that CCA is able to reconstruct the overall structure of the embeddings, with the nodes in the two communities mapped close to each other.
The effect of CCA on multidimensional embeddings is illustrated in Figure 4(a). Given two embeddings, the correlations between their dimensions initially exhibit no clear pattern (upper panel of Figure 4(a)), i.e., the dimensions of the embeddings do not have a clear correspondence. After application of CCA, the correlations between the corresponding dimensions (the diagonal of the matrix) become larger, and the correlations between all other pairs of dimensions are close to 0.
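The SVD-based CCA solution above can be sketched in numpy as follows. The column-centering and the small ridge term added to the covariances for numerical stability are implementation details assumed here, not taken from the text:

```python
import numpy as np

def cca_consensus(X1, X2, d):
    """CCA-based consensus of two embeddings (illustrative sketch).

    Computes B = S11^(-1/2) S12 S22^(-1/2) from the sample covariances of
    the column-centered embeddings, takes its SVD B = U_B diag(s) V_B^T,
    projects with W1 = S11^(-1/2) U_B[:, :d] and W2 = S22^(-1/2) V_B[:, :d],
    and returns the mean of the two projected embeddings.
    """
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    eps = 1e-8                                        # small ridge for invertibility
    S11 = X1.T @ X1 / n + eps * np.eye(X1.shape[1])
    S22 = X2.T @ X2 / n + eps * np.eye(X2.shape[1])
    S12 = X1.T @ X2 / n

    def inv_sqrt(S):                                  # inverse square root of an SPD matrix
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    S11_is, S22_is = inv_sqrt(S11), inv_sqrt(S22)
    U_B, s, V_Bt = np.linalg.svd(S11_is @ S12 @ S22_is)
    W1 = S11_is @ U_B[:, :d]
    W2 = S22_is @ V_Bt.T[:, :d]
    return (X1 @ W1 + X2 @ W2) / 2.0                  # n x d consensus embedding
```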

Generalized canonical correlation analysis (GCCA):
Since CCA is designed to work with two vector spaces, it can be applied to the computation of consensus embeddings with two network versions. To compute consensus embeddings for k > 2 network versions, the framework needs to be generalized.
Generalized CCA (Kettenring et al., 1971) is a method that applies CCA to more than two matrices. Let G_1, G_2, ..., G_k denote the k versions of a network with n nodes. Assume that we have the k respective embeddings X_1, X_2, ..., X_k available, where the embeddings are, respectively, d_1, d_2, ..., d_k dimensional. GCCA solves the following optimization problem:

minimize_{A, Z_1, ..., Z_k} Σ_{i=1}^{k} ||A^T − X_i Z_i||_F^2 subject to AA^T = I,

where A ∈ R^{d×n} is a shared representation of the k embedding spaces, and Z_i ∈ R^{d_i×d} are the individual projection matrices for each embedding. Once A and the Z_i's are computed, we use the mean of the projected embeddings as the consensus embedding, i.e.,

X_c^(GCCA) = (1/k) Σ_{i=1}^{k} X_i Z_i.
Figure 4(b) illustrates the effect of GCCA on the correlations between dimensions. Given three 16-dimensional embeddings X_1, X_2, X_3, the correlations between {X_1, X_2}, {X_1, X_3}, and {X_2, X_3} are shown in the upper panel of Figure 4(b). No clear pattern is visible in the three matrices. After applying GCCA, however, we obtain the lower panel of Figure 4(b): the darkest parts of the three matrices appear close to their diagonals.
We use the software of (Jameschapman19, 2020) in our implementations of CCA- and GCCA-based consensus embedding. It provides implementations of multiple CCA-related methods.
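For intuition, a MAXVAR-style GCCA can also be sketched from scratch in numpy, as below. This is only one of several GCCA formulations and may differ from the solver in the library we use, so treat it as an assumption-laden illustration:

```python
import numpy as np

def gcca_consensus(embeddings, d, eps=1e-8):
    """MAXVAR-style GCCA consensus (illustrative sketch).

    The columns of the shared representation A^T are the top-d eigenvectors
    of the sum of the per-embedding projection matrices
    P_i = X_i (X_i^T X_i + eps I)^(-1) X_i^T; each Z_i is the least-squares
    map of X_i onto A^T, and the consensus is the mean of X_i Z_i.
    """
    centered = [X - X.mean(axis=0) for X in embeddings]
    n = centered[0].shape[0]
    M = np.zeros((n, n))
    for X in centered:
        G = np.linalg.inv(X.T @ X + eps * np.eye(X.shape[1]))
        M += X @ G @ X.T                             # projection matrix P_i
    w, V = np.linalg.eigh(M)                         # eigenvalues in ascending order
    A_T = V[:, -d:]                                  # shared representation, n x d
    projected = []
    for X in centered:
        Z, *_ = np.linalg.lstsq(X, A_T, rcond=None)  # Z_i = argmin ||A^T - X Z||
        projected.append(X @ Z)
    return sum(projected) / len(projected)           # n x d consensus embedding
```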

Processing combinatorial link prediction queries for versioned networks
Consider the following scenario: A graph database houses k versions of a network (as formulated at the beginning of this section). These k versions may either come from different resources (e.g., different protein-protein interaction databases) or represent semantically different types of edges between a common set of nodes (e.g., genetic interactions vs. physical interactions vs. functional associations among human proteins). In this setting, a "combinatorial" link prediction query can be formulated as follows: The user chooses (i) a node q ∈ V, and (ii) a subset S ⊆ {G_1, G_2, ..., G_k} of networks. The query seeks to identify the nodes that are most likely to be associated with the query node q based on the topology of the integrated network G^(S) = (V, E^(S)), where E^(S) = ∪_{i∈S} E_i. Such a flexible query framework is highly useful in the context of many applications, since the relevance and reliability of different network versions can be variable, and different users may have different needs and preferences.
The above framework defines a "combinatorial" query in the sense that a user can select any combination of networks to integrate. This poses a significant computational challenge, as the number of possible combinations of networks is exponential in the number of networks in the database, i.e., the user can choose from 2^k − 1 possible combinations of networks.
Embedding-based link prediction can facilitate the development of effective solutions to the challenge associated with combinatorial link prediction queries, because link prediction algorithms using node embeddings do not need access to the network topology while performing link prediction. By computing and storing node embeddings in advance, it is possible to efficiently process link prediction queries while giving the user the flexibility to choose the combination of networks to integrate.
Consensus embeddings provide an alternative solution that can render storage feasible while enabling real-time query processing for very large networks and large numbers of versions: Compute and store the embeddings for each network separately. When the user selects a combination, compute a consensus embedding for that combination and use it to process the query. One important consideration in the application of this idea is the "inexact" nature of consensus embeddings, i.e., consensus embeddings may not adequately capture the information represented by the embeddings computed on the integrated network.
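The precompute-then-combine pattern described above can be sketched as follows. `EmbeddingStore` and its methods are hypothetical names, and the mean consensus is used at query time for brevity; any of the consensus methods considered in this paper could be substituted:

```python
import numpy as np

class EmbeddingStore:
    """Stores per-version embeddings offline; combines them at query time."""

    def __init__(self, version_embeddings):
        # version_embeddings: dict mapping version id -> n x d array,
        # each precomputed once by any embedding algorithm.
        self.store = version_embeddings

    def query(self, q, selected, top=5):
        """Return the `top` nodes most similar to node q in the consensus
        space of the user-selected subset of versions."""
        X = np.mean([self.store[s] for s in selected], axis=0)
        norms = np.linalg.norm(X, axis=1) * np.linalg.norm(X[q]) + 1e-12
        sims = (X @ X[q]) / norms                    # cosine similarity to q
        sims[q] = -np.inf                            # exclude the query node itself
        return np.argsort(-sims)[:top]               # most similar nodes first
```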

Experimental results
In our experiments, we aim to predict new links in the integrated network of multiple network versions. We use BioNEV to split the input graphs into training and testing sets and compute the consensus embedding of the training graphs. The consensus embedding of the training graphs is then used as input to BioNEV's evaluation to predict the testing edges of the multiple graphs.

Datasets
In our computational experiments, we use human protein-protein interaction (PPI) networks obtained from BioGRID (Stark et al., 2006) and yeast PPI networks from STRING (Franceschini et al., 2012; Cho et al., 2016). PPI networks contain physical interactions and functional associations between pairs of proteins. The human PPI dataset we use contains multiple PPI networks separated based on experimental systems. Each network (version) contains a unique type of PPI (genetic or physical). The yeast PPI network dataset contains four PPI networks derived from different sources (e.g., experimental data or curated databases). The types of the interactions represented by each network version are shown in Tables 1 and 2. In order to obtain multiple networks with the same set of nodes, we remove the nodes (proteins) that do not exist in all versions. After preprocessing, all the human PPI versions have 1025 nodes, and all the yeast PPI versions have 1164 nodes. The type of PPI and the number of edges for each network are shown in Tables 1 and 2.

Experimental setup
We compare the link prediction performance of node embeddings computed on integrated networks and consensus embeddings computed based on the embeddings of individual networks. We consider two embedding algorithms, node2vec (Grover & Leskovec, 2016) and role2vec, and multiple methods for computing consensus embeddings: SVD, the variational autoencoder, CCA (or GCCA for more than two versions), and the baseline (average).
To assess link prediction performance, we use BioNEV (Yue et al., 2020), a Python package developed to assess the performance of various tasks that utilize network embeddings. The software splits an input graph into a training graph and a testing edge set. BioNEV uses the known interactions as positive samples and randomly selects the negative samples. Both sets of samples are split into a training set (80%) and a testing set (20%). For each node pair, BioNEV concatenates the embeddings of the two nodes as the edge feature and then builds a binary classifier. It outputs the AUCs of the link predictions made using the embeddings.
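As a simplified illustration of this evaluation protocol (not BioNEV's actual code), the sketch below shows the concatenated edge-feature construction and a Mann-Whitney AUC computation; BioNEV additionally trains a binary classifier on these features, which we omit here:

```python
import numpy as np

def edge_features(X, pairs):
    """Concatenate the embeddings of the two endpoints of each node pair."""
    pairs = np.asarray(pairs)
    return np.hstack([X[pairs[:, 0]], X[pairs[:, 1]]])

def auc_score(pos_scores, neg_scores):
    """AUC as the probability that a positive edge outscores a negative one
    (Mann-Whitney formulation; ties count one half)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())
```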
In the experiments focusing on the consensus embeddings of pairs of versions, we are given two embeddings X_1 ∈ R^{n×d_1} and X_2 ∈ R^{n×d_2}, and we compute the consensus embedding X_c ∈ R^{n×d}, where d = min{d_1, d_2}. We show results for the cases d_1 = d_2 and d_1 ≠ d_2, and compare different dimensionality reduction methods. Note that, when d_1 ≠ d_2, the baseline method for computing consensus embeddings does not apply, as it requires the embeddings to have an equal number of dimensions.
For multiple (more than two) networks, we consider the case where all embeddings have the same number of dimensions, and the number of dimensions of the consensus embedding is the same as that of the embeddings of individual versions.

Link prediction for pairs of versions
As mentioned before, the numbers of dimensions of the individual embeddings can be the same or different. In this section, we show results for pairs of versions with both identical and different numbers of dimensions. We compute the consensus embeddings of all combinations of two networks from Table 1 (eight versions, yielding 28 pairs) and obtain their link prediction accuracies using BioNEV.

Two embeddings with same dimensions
We test all the combinations of the eight versions described in Table 1. Figure 5 shows the AUCs provided by the consensus embeddings computed using four different methods, compared with the AUC of the embedding of the integrated network. The embedding algorithm used in the figure is node2vec. For most network pairs, consensus embeddings deliver better link prediction accuracy than the integrated network's own embeddings. Among the three dimensionality reduction methods, SVD and CCA are more stable and consistently deliver better accuracy than the variational autoencoder. As expected, the accuracy of the baseline consensus embedding (the average of the pair of embeddings) is much lower, but it is still better than would be expected at random. This suggests that there can be some correspondence between the embedding spaces of different network versions.

Two embeddings with different dimensions

Figure 6 shows the link prediction results of consensus embeddings for pairs of networks when the numbers of dimensions of the two embeddings differ. From Figure 6, the average accuracy of consensus embeddings computed using CCA is higher than that of SVD or the autoencoder. The data points of SVD and the autoencoder are more widely scattered, and their lowest points fall between 0.55 and 0.6. Unlike SVD and the autoencoder, all data points of CCA are above 0.65, and most of them are in the range (0.70, 0.75).

Link prediction for multiple versions
Our experiments for multiple (> 2) networks are based on all combinations of the datasets (eight versions with 247 potential combinations). Figure 7 shows the link prediction results of consensus embeddings of more than two versions. In these experiments, we use GCCA instead of CCA even for two versions. In each figure, the AUC of link prediction is shown as a function of the number of network versions. We observe that, on average, the accuracy of link prediction goes down with an increasing number of integrated versions. However, the performance difference between the embedding of the integrated network and the consensus embeddings is not very large. This observation suggests that the utility of consensus embeddings can be more pronounced for network databases with larger numbers of versions. As seen in Figure 7, the accuracy of link prediction improves with an increasing number of dimensions in the node embeddings. The embedding of the integrated network is the most accurate one in most cases. The link prediction accuracy of GCCA is better at lower numbers of dimensions, and it exceeds the accuracy of the integrated network when the number of dimensions is 16. At larger numbers of dimensions, SVD and the autoencoder perform better than GCCA.
In this part, we include another embedding method, role2vec, to show that consensus embedding also works for other embedding methods. GCCA performs better than SVD or the variational autoencoder when used in consensus embeddings of role2vec. Also, in the results of role2vec, the accuracy of the consensus embeddings becomes closer and closer to the accuracy of the embedding of the integrated network as the number of dimensions increases.
We also plot the AUC as a function of the number of dimensions (Figures 8 and 9). We take all the networks in each dataset and compute the consensus embeddings using the different methods. In general, the AUC increases as the number of dimensions increases for both node2vec and role2vec. When the number of dimensions reaches 128, the accuracy does not increase as much as at lower dimensions, so we may not gain much accuracy by using very large numbers of dimensions. In Figure 8, the performance of the integrated networks' embeddings is better than that of the consensus embeddings, but GCCA outperforms the integrated network in Figure 9, especially for role2vec.
From these results, consensus embeddings perform as well as or even better than the embeddings of the integrated networks. Among the three dimensionality reduction methods, the results of SVD and the variational autoencoder are similar in most experiments, and GCCA is the best one in most cases. The average baseline may work well when the number of dimensions is low, but it degrades as the number of dimensions grows. For role2vec, the average baseline is always the worst among all methods.

Runtime analysis
In this section, we investigate whether consensus embeddings improve the efficiency of processing link prediction queries. For this purpose, we compare the query processing time for consensus embeddings computed using different methods (SVD, autoencoder, and so on) against embeddings computed at query time after integrating the combination of networks selected by the user.
The results of this analysis are shown in Figure 10. As seen in the figure, processing queries using consensus embeddings drastically improves the efficiency of query processing. "Consensus Embedding at Query Time" using SVD enables processing of combinatorial link prediction queries in real time across the board, while integration of networks at query time requires orders of magnitude more time to process these queries. In most cases, "Consensus Embedding at Query Time" with the convolutional autoencoder is also faster than "Network Integration at Query Time", but its performance degrades with an increasing number of networks being integrated. The runtime also grows as the number of dimensions increases.
The runtime of computing an embedding increases as networks become denser, especially for node2vec, because node2vec runs random walks starting from every node. As seen in Figure 10, the blue dots in the plots of node2vec are separated into two groups. This is because G_4 is extremely dense (see Table 1), making the integrated networks that contain G_4 also dense. Therefore, combinations that contain G_4 have a significantly higher query runtime compared to those that do not. Computation of consensus embeddings using SVD or CCA/GCCA is more robust to this effect. Averaging the individual embeddings is the fastest method, even much faster than SVD or CCA, and it is also very stable.

Conclusions
In this work, we consider the problem of computing node embeddings for integrated networks derived from multiple network versions. We focus on the performance of link prediction using consensus embeddings compared with using the embeddings of the integrated networks.
We introduce new dimensionality reduction methods, CCA and GCCA, into the consensus embedding process, and generalize the framework such that the input embeddings can have different numbers of dimensions. CCA performs better than SVD or the autoencoder when the numbers of dimensions of the pairs of embeddings are different.
We test the link prediction performance of the consensus embeddings and find that their accuracy is similar to that of embeddings computed directly from the integrated network. When there are only two versions, consensus embeddings (for embeddings with the same number of dimensions) outperform the embedding of the integrated network in link prediction in almost all experimental tests. For more than two versions, consensus embeddings also perform as well as or even better than the embeddings of the integrated networks, and CCA/GCCA performs better than SVD or the variational autoencoder in most cases, especially when the embedding method is role2vec.
From our results, linear methods like CCA/GCCA and SVD work better than nonlinear methods like the autoencoder. We conjecture that there are two main reasons:
• Network embedding algorithms use linear models to represent the proximity of nodes in a network; thus, linear methods may perform better in computing consensus embeddings.
• The autoencoder is a neural-network-based method and may therefore need more training data; i.e., the networks we are working with may be too small (in terms of the number of nodes) or too sparse for it to learn reliable latent patterns. In addition, autoencoders require more hyperparameters to be tuned, e.g., the number of layers and the dimensions of each layer, which may also have an adverse effect on their reliability.