Towards Higher-order Topological Consistency for Unsupervised Network Alignment

1^st Qingqiang Sun University of New South Wales
Sydney, Australia
qingqiang.sun@unsw.edu.au 2^nd Xuemin Lin Shanghai Jiao Tong Universiy
Shanghai, China
xuemin.lin@gmail.com 3^rd Ying Zhang {@IEEEauthorhalign} 4^th Wenjie Zhang1 1Corresponding author. University of Technology Sydney
Sydney, Australia
ying.zhang@uts.edu.au University of New South Wales
Sydney, Australia
zhangw@cse.unsw.edu.au 5^th Chaoqi Chen University of Hong Kong
Hong Kong, China
chencq@connect.hku.hk

Abstract

Network alignment task, which aims to identify corresponding nodes in different networks, is of great significance for many subsequent applications. Without the need for labeled anchor links, unsupervised alignment methods have been attracting more and more attention. However, the topological consistency assumptions defined by existing methods are generally low-order and less accurate because only the edge-indiscriminative topological pattern is considered, which is especially risky in an unsupervised setting. To reposition the focus of the alignment process from low-order to higher-order topological consistency, in this paper, we propose a fully unsupervised network alignment framework named HTC. The proposed higher-order topological consistency is formulated based on edge orbits, which is merged into the information aggregation process of a graph convolutional network so that the alignment consistencies are transformed into the similarity of node embeddings. Furthermore, the encoder is trained to be multi-orbit-aware and then be refined to identify more trusted anchor links. Node correspondence is comprehensively evaluated by integrating all different orders of consistency. In addition to sound theoretical analysis, the superiority of the proposed method is also empirically demonstrated through extensive experimental evaluation. On three pairs of real-world datasets and two pairs of synthetic datasets, our HTC consistently outperforms a wide variety of unsupervised and supervised methods with the least or comparable time consumption. It also exhibits robustness to structural noise as a result of our multi-orbit-aware training mechanism.

unsupervised network alignment, edge orbit, graph convolutional network, high-order topological consistency

I Introduction

Network alignment task, which aims to identify entity correspondence across different networks, is usually the very first step of many downstream analyzing tasks. For instance, recognizing the same user on different social networks can facilitate friend suggestion, item recommendation, personalized advertisement [1, 2, 3, 4, 5]. Similar scenarios also exist widely in other fields, such as protein network analysis [6], knowledge discovery [7], etc.

Identifying corresponding nodes across different networks is an extremely hard task, even for humans. Manually labelling correspondence can be prohibitively challenging, expensive (in human efforts, time, and money costs), and tedious [8]. Due to such obstacles, in some cases, it may be impractical to get access to sufficient labels for training well-performed supervised or even semi-supervised models [9, 4]. By contrast, unsupervised models can be trained without the need for labeled data, which is more flexible and practical in real-world application scenarios. Thus, unsupervised alignment methods have been drawing a surge of interest recently [10, 11, 12].

In general, most existing alignment methods construct their models based on the assumption of attribute consistency or structural consistency, no matter supervised or unsupervised [13]. For instance, IsoRank [14] is a typical topology-only method that recognizes two nodes from two networks to be similar if their neighborhoods are similar. The most direct application of attribute consistency is to heuristically align users with the same user profiles such as name, gender, birthday, and location [15, 16]. Instead of relying only on either attribute consistency or structural consistency, an increasing body of recent work makes efforts to take both attributes and structural information into consideration (if both are available) and receives significant performance gains by embracing the complementary effect between them [17, 18, 10].

Despite various types of topological information being utilized, such as node degree [18], connectivity [19], and co-occurrence in random walks [20], most existing works extract such information by relying solely on the most trivial topological pattern and accordingly define their low-order topological consistency. Specifically, in the trivial perspective of network topology, the connections between nodes are indiscriminative; that is, each edge plays the same role in the network. Nevertheless, edges can be further distinguished in a higher-order view of network topology based on their specific functions in local network structures, such as motifs [21] or graphlets [22] (it is worth noting that the so-called higher-order topology is not simply equivalent to a larger neighborhood). The main differences between trivial and high-order topology lie in two ways:

(a) Topological discrimination. Compared with trivial topology, high-order topology is more discriminative. For instance, as displayed in Fig. 1, all edges have the same weights/widths in the plain type of network topology, but they show a significant difference when higher-order edge patterns are considered (details about edge orbit will be illustrated in section IV). As a result, the local topology around a node can be distinguished to a larger extent.

(b) Accuracy in topological consistency. As the trivial topology is of less discrimination, the corresponding topological consistency defined by that may not be accurate enough and hence degrades the alignment performance. For example, say we want to identify the anchor node of node A in Fig. 1 based on their one-hop neighborhood. Any node with the same degree as node A can be regarded as satisfying low-order topological consistency based on the original topological pattern, which may be easily satisfied by a lot of candidates and lead to an ambiguous alignment, while in the higher-order topology, a potential anchor node should not only have an identical degree but also have edges with similar weights connecting its neighbors, which results in a more accurate consistency and helps to filter out deceptive candidates.

Focusing solely on the trivial topology and corresponding consistency is generally more risky in the unsupervised setting. The fact is, models lack the capacity to extract extra information favourable to alignment accuracy from the trivial topology without the guidance of label information, and hence heavily depend on the consistency assumption. In other words, developing low-order topological consistency to high order is more crucial in this scenario.

Fig. 1: Network topology can differ significantly when we define adjacency based on different orbit patterns. Take node A as an example: all of its edges are identical in (a); its edges in (b) and (c) have different weights/widths according to the frequency that they occur on orbit 3 and 8 respectively (the edge removed is equivalent to its corresponding weight being 0).

To alleviate the aforementioned issues, in this paper, we propose a fully unsupervised framework called HTC (towards Higher-order Topological Consistency for unsupervised network alignment). Our assumption is that two nodes are more likely to have an anchor link if they hold not only low-order but also higher-order topological consistency. Considering that, we define higher-order topological consistency based on edge orbits. By conducting an orbit-weighted aggregation process, we transform the higher-order topological consistency and attribute consistency into the similarity of corresponding node embeddings, which is theoretically analyzed and proved. Without labeled data, we learn from the training paradigm of Graph Auto-Encoder (GAE) [23] but share the encoding parameters between different orders of topology so that the encoder can be multi-orbit-aware. In addition, we propose an effective fine-tuning mechanism to find more trustworthy anchor links by assigning trusted node pairs a larger aggregation coefficient, and its validity is verified in theory. A comprehensive evaluation of node correspondence is obtained by further integrating alignment scores computed based on different orders of consistency, rather than focusing on any single topological pattern.

The proposed HTC framework is not only theoretically sound but also empirically performs well. HTC consistently outperforms a wide variety of state-of-the-art models, even though some of those competitors are supervised, across three pairs of real-world datasets with the least or comparable time cost. In particular, significant improvements (up to 49% in terms of $p r e c i s i o n @ 1$ ) over the basic Graph Convolutional Network (GCN) based alignment paradigm are observed continuously. Experimental results on two pairs of synthetic datasets also verify the robustness of HTC against structural noise. Other experimental results, including ablation tests, hyperparameter sutdy, and visulization analysis, further verify the significance of the proposed higher-order topological consistency for improving alignment precision.

Our contributions can be summarized as follows:

We propose a fully unsupervised network alignment framework, which formulates higher-order topological consistency based on edge orbits.
It is theoretically proved that the formulated higher-order topological consistency combined with attribute consistency can derive node embedding similarity, which transforms the network alignment task into the task of node embedding similarity measurement.
As an additional effect of the proposed multi-orbit-aware training mechanism, our method can be robust to structural noise.
We introduce the concept of trusted pairs and accordingly refine embeddings so as to find more trusted pairs, which can alleviate the hubness problem that accompanies the roughly-learned embeddings.
The superiority of our method, such as effectiveness, efficiency, and robustness, is comprehensively evaluated through extensive experiments.

The rest of this paper is organized as follows. We briefly review related work in Section II. The problem is formulated in Section III before proposing our framework in Section IV. Section V reports experimental results, followed by conclusions made in Section VI.

Ii Related Work

Ii-a Network Alignment

So far, most network alignment approaches are supervised. Supervisory data are required so as to refine embedding space either in the form of partial ground truth (e.g. PALE [19], CENALP [24]) or prior alignment matrix (e.g. IsoRank [14], FINAL [17]). Due to the expensive cost of manual labeling, unsupervised methods are more suitable and desirable in some real-life applications.

Another noticeable trend is that more and more methods try to leverage both structural and attribute information so as to better capture the information of the network nodes (such as FINAL[17] and REGAL [18]). Recently, some work has taken advantage of the natural strength of GCN in integrating both network topology and node attributes and has empirically shown the great potential of GCN for alignment tasks [10, 25]. A wide variety of structural metrics are proposed and utilized for formulating corresponding topological consistency, like node degree [18], connectivity [19], and co-occurrence in random walks [20]. Besides, some methods pay attention to a broader neighborhood of nodes, such as stacking more GCN layers [10] or aligning triangular structures [26]. Even so, most previous work essentially just models the edge-indiscriminative topological pattern, and hence their corresponding topological consistencies remain low-order. Unlike previous work, we formulate the high-order topological consistency based on different orders of edge pattern. It should be emphasized that larger neighborhoods and higher-order consistency cannot be simply equalized. The experimental results also empirically demonstrate that our proposed higher-order topological consistency performs far better than simply considering a larger neighborhood, such as introducing diffusion matrices[27].

Ii-B High-order Pattern

High-order patterns of network structure such as graphlets and motifs have been demonstrated to be very informative and helpful for different tasks in many fields. For instance, graph motifs are utilized in [28] for classifying directed CORA citation networks. Graphlet patterns are taken into consideration to obtain better performance in the semi-supervised node classification task [29]. In [30], the use of motifs enables the perception of high-order semantics in heterogeneous graphs. In more detail, each node or edge can be differentiated at the level of orbit [31, 32]. Some network alignment techniques, such as H-GRAAL [33], GREAT [34], GraphletAlign [35], etc., treat the graph degree vectors as node/edge features and then quantify the corresponding topological similarity for network alignment. By contrast, we define high-order topological consistency based on the constructed graph orbit matrix, which is injected into the aggregation process of GCN in order to further extract and integrate useful information. In this way, we are capable of obtaining abstract features of nodes for similarity computation.

Ii-C Graph Convolutional Network

Compared with other node embedding techniques like Node2Vec [36], DeepWalk [37], and LINE [38], GCN [39] is naturally capable of integrating both structure and attribute information due to its special encoding mechanism and has shown its effectiveness in processing graph-structured data [40]. To address the issue that GCN treats every edge equally without distinction, Graph Attention Network and its variants are proposed, which generally need labels to train extra parameters [41, 29]. However, by using higher-order topological patterns, we can distinguish edges without introducing additional training-required parameters, which is a great merit for unsupervised learning.

Auto-Encoder and its variants have important applications in many fields. For example, Wang et al. [42] use variational autoencoder with dynamic extensions to successfully address the dynamic interaction problem of latent variables. As a special variant, Graph Autoencoder (GAE) [23] is proposed for unsupervised learning on graph-structured data. Inspired by such an encoder-decoder paradigm, we train our GCN encoder to be multi-orbit-aware by sharing encoding parameters among different orders of orbit topology. Hence, it can be expected to prevent the model from relying on any single topological pattern.

Iii Problem Formulation

Our work targets the problem of aligning two attributed networks. Since the alignment tasks are usually conducted on networks in similar domains, it is easy to find common (not necessarily all) attributes in their attribute spaces. Generally, an attributed network can be represented as $G = (V, A, X)$ , where $V$ is a set consisting of $n$ nodes, $A \in (0, 1)^{n \times n}$ is the adjacency matrix of the network which contains the connection details among those nodes, and $X \in R^{n \times d}$ is the attribute/feature matrix in which every node has a $d$ -dimensional feature.

Typically, one of two to-be-alined networks/graphs is called source network/graph and the other is called target network/graph, denoted as $G_{s} = (V_{s}, A_{s}, X_{s})$ and $G_{t} = (V_{t}, A_{t}, X_{t})$ respectively. The correspondence is referred to as anchor links, and the nodes forming the links are referred to as anchor nodes. Mathematically, the unsupervised network alignment task can be defined as follows:

Definition 1.

Unsupervised Network Alignment.

Given a source network $G_{s} = (V_{s}, A_{s}, X_{s})$ and a target network $G_{t} = (V_{t}, A_{t}, X_{t})$ , the objective of unsupervised network alignment is to identify all possible anchor links across $G_{s}$ and $G_{t}$ by computing the alignment matrix $M \in R^{n_{s} \times n_{t}}$ without using any observed anchor links, where every entry $M (i, j)$ represents the alignment score between node $i \in V_{s}$ and node $j \in V_{t}$ .

Generally, most existing work follows the consistency restraints shown in Fig. 2.

Fig. 2: An illustrative example of alignment consistency. Attribute Consistency: the gender attribute of node “a” and its corresponding node “A” are both “male”; Topology Consistency: nodes “a” and “b” are connected in $G_{s}$ , and their corresponding nodes “A” and “B” are also connected in $G_{t}$ .

Iv Methodology

The framework of the proposed HTC is shown in Fig. 3. We introduce details of HTC in this section.

Iv-a Higher-order Topological Consistency

Fig. 4: Induced graphlets with 2-4 nodes and automorphism orbits.

Fig. 5: As an illustrative example, only the first five orbits are considered here. For node $b$ , edge $(a, b)$ and $(b, c)$ are treated exactly the same if only consider two-node connectivity (orbit 0). However, taking other higher-order topological patterns into account, as listed in the left table, these two edges can be further distinguished since the frequencies at which they occur on orbit 1 and 4 are different, which indicates different roles that $(a, b)$ and $(b, c)$ play in the same subgraph.

We first construct graphlet orbit matrix and define the higher-order topological consistency based on it.

Graphlet is a term used to denote a connected network with a small number of nodes [22]. Any induced subgraph can be grouped into a certain type of graphlet if they are isomorphic. $G^{'} = (V^{'}, E^{'})$ is an induced subgraph of $G = (V, E)$ if and only if $V^{'} \in V$ and $E^{'} = {(u, v) \in E | u, v \in V^{'}}$ . Furthermore, the edges of each graphlet can be clustered into orbits with respect to the graphlet automorphisms [32]. For instance, there are 9 graphlets and 13 edge orbits for subgraphs with 2-4 nodes as shown in Fig. 4. Orbits define the “roles” of edges within the graphlet. As an example, in the graphlet of a three-edge chain, one edge represents the bridge and the remaining two edges can be connected through the bridge; the edges of this graphlet thus form two different orbits (numbered 4 and 3, respectively). Moreover, edges around a node can be distinguished by comparing their occuring frequency on multiple orbits, as in the illustrative example shown in Fig. 5.

We construct a set of graphlet orbit matrix (GOM), denoted as $O = {O_{0}, O_{1}, \dots, O_{K}}$ , where $K$ matrices correspond to $K$ orbits. Each GOM contains the frequency of edges on the corresponding orbit. Formally, given a graph $G$ with a vertex set of size $n$ and any two vertices $i$ and $j$ of it, the $k$ th GOM $O_{k} \in N^{n \times n}$ is defined as follows:

\begin{matrix} O_{k} (i, j) = & the number % of times that edge (i, j) occurs on orbit k . \end{matrix}

(1)

The greater the value of $O_{k} (i, j)$ is, the stronger the connection between node $i$ and node $j$ is with respect to the orbit $k$ . In addition to the above weighted definition, GOM can also be binary, in which $O_{k} (i, j)$ equals to 1 as long as the edge $(i, j)$ is observed on orbit $k$ for at least one time, otherwise the value of $O_{k} (i, j)$ is 0. In the binary case, the distinctions between edges are weakened. Referring to the example in Fig. 5, the value of edge $(b, c)$ on orbit 1 would be 1, which is the same as that of edge $(a, b)$ if using binary setup. Hence, in this paper, we mainly consider the weighted form of GOM.

Taking the value of each entry in GOM as the weight of the corresponding edge, GOMs exactly represent different levels of induced networks, as shown in Fig. 1. Therefore, we can mathematically define higher-order topological consistency based on GOMs:

Definition 2.

High-order Topological Consistency.

Given two nodes $u, v \in V_{s}$ and their corresponding nodes $u^{'}, v^{'} \in V_{t}$ , they are regarded as satisfying $k$ -order topological consistency, if their edges have identical weights in terms of the $k$ -th GOM, i.e., $O_{s, k} (u, v) = O_{t, k} (u^{'}, v^{'})$ .

Iv-B Orbit-weighted Encoding

To integrate attribute consistency and higher-order topological consistency, we elegantly merge higher-order topological information into the aggregation process of GCN. Instead of treating all edges equally, we respect the fact that edges playing different roles in each high-order structure should be given different weights when passing messages.

Weighted aggregation. We naturally assign the orbit-defined weight to an edge for passing information between nodes. Formally, the feature of node $i$ on $l$ th layer and $k$ th orbit can be obtained by:

h_{k, i}^{l} = f (\sum_{j \in N (i)} O_{k} (i, j) h_{k, j}^{l - 1} W^{l - 1}),

(2)

which can be performed efficiently in matrix formulation: $H_{k}^{l} = f (O_{k} H_{k}^{l - 1} W^{l - 1})$ , where $H \in R^{n \times d^{l - 1}}$ is a feature matrix, $f$ is a nonlinear activation function, and $W^{l - 1} \in R^{d^{l - 1} \times d^{l}}$ is a trainable weight matrix for node representation encoding.

Modified self-connection. In Eq. (2), only information from neighbors is considered to generate a node’s feature. To include the information from a node itself, we need to add self-connections. Note that the values of a weighted orbit matrix denote the frequency of edges occuring on a certain orbit, which may be far bigger than 1. In this case, using a typical identity matrix as the self-connection matrix is likely to result in the issue that the impact of self-connection is significantly weakened compared with its neighbors. Here, we define a more appropriate self-connection matrix with respect to orbit $k$ as follows:

C_{k} (i, i) = ⎧ ⎪ ⎨ ⎪ ⎩ \begin{matrix} 1, & if max 1 \leq j \leq n O_{k} (i, j) = 0; max 1 \leq j \leq n O_{k} (i, j), & otherwise . \end{matrix}

(3)

where ${max}_{1 \leq j \leq n} O_{k} (i, j) = 0$ means that node $i$ is not connected to any other node on orbit $k$ . Intuitively, this matrix allows a node to either assign an importance to itself that is in line with that of its most important neighbor, or just consider its own information if there are no neighbors around it. Hence, the orbit matrix can be modified by ${~ O}_{k} = O_{k} + C_{k}$ .

Normalized Laplacian. According to the aggregation mechanism of GCN, nodes with large values in ${~ O}_{k}$ will have large values in their feature representation, and vice versa. This may cause vanishing or exploding gradients [39]. Hence, in practice, the procedure of normalization is required. The symmetric normalized Laplacian matrix on orbit $k$ can be obtained by ${~ L}_{k} = {~ F}_{k}^{- \frac{1}{2}} {~ O}_{k} {~ F}_{k}^{- \frac{1}{2}}$ , where ${~ F}_{k} (i, i) = \sum_{j = 1}^{n} {~ O}_{k} (i, j)$ is a diagonal frequency matrix.

Forward encoding. Say the GCN encoder contains $L$ hidden layers, the output embeddings of two graphs, denoted as $H_{s, k}^{L}$ and $H_{t, k}^{L}$ respectively, can be obtained through the forward encoding process:

H_{s, k}^{L} = f^{L - 1} ({~ L}_{s, k} \dots f^{0} ({~ L}_{s, k} X_{s} W^{0}) \dots W^{L - 1}),

(4)

H_{t, k}^{L} = f^{L - 1} ({~ L}_{t, k} \dots f^{0} ({~ L}_{t, k} X_{t} W^{0}) \dots W^{L - 1}) .

(5)

where the non-linear activation functions $F = {f^{0}, f^{1}, \dots, f^{L - 1}}$ and the weight parameters $W = {W^{0}, W^{1}, \dots, W^{L - 1}}$ are shared between source and target graphs.

Theoretical Analysis. We mathematically prove that high-order structural consistency and attribute consistency will lead to identical node embeddings across two networks through our proposed encoding mechanism. First, we have the following lemma.

Lemma 1.

For two nodes in the same graph $u$ and $u^{'} \in V$ , if there exists a matching set $M = {(v, v^{'}) | v \in N (u), v^{'} \in N (u^{'})}$ and nodes of each matching pair have identical attributes, i.e. $X (v) = X (v^{'})$ and identical topological patterns in terms of orbit $k$ , i.e. $O_{k} (u, v) = O_{k} (u^{'}, v^{'})$ , then the embeddings of $u$ and $u^{'}$ are identical after encoded by one GCN layer, i.e. $H_{k} (u) = H_{k} (u^{'})$ .

Proof.

On the one hand,

\begin{matrix} H_{k} (u) & = f^{0} ({~ L}_{k} (u, :) X W^{0}) = f^{0} ⎛ ⎝ \sum v \in N (u) {~ L}_{k} (u, v) X (v) W^{0} ⎞ ⎠; \end{matrix}

on the other hand,

\begin{matrix} H_{k} (u^{'}) & = f^{0} ({~ L}_{k} (u^{'}, :) X W^{0}) = f^{0} ⎛ ⎝ \sum v^{'} \in N (u^{'}) {~ L}_{k} (u^{'}, v^{'}) X (v^{'}) W^{0} ⎞ ⎠ . \end{matrix}

Since $O_{k} (u, v) = O_{k} (u^{'}, v^{'})$ for all $(v, v^{'})$ pairs, we have ${~ L}_{k} (u, v) = {~ L}_{k} (u^{'}, v^{'})$ . Taking $X (v) = X (v^{'})$ into consideration, it is satisfied that $\sum_{v \in N (u)} {~ L}_{k} (u, v) X (v) = \sum_{v^{'} \in N (u^{'})} {~ L}_{k} (u^{'}, v^{'}) X (v^{'})$ . Hence, $H_{k} (u) = H_{k} (u^{'})$ . ∎

Actually, Lemma 1 indicates that orbit-defined aggregation inherits the characteristics of the typical aggregation manner of GCN that attribute and topology similarity of nodes can be transformed into the similarity of node embeddings. To leverage the power of orbit-weighted aggregation for alignment tasks, we extend Lemma 1 to the scenario of two networks and obtain the following proposition.

Proposition 1.

Suppose that two nodes $u \in V_{s}$ , $u^{'} \in V_{t}$ have an anchor link between them, and there exists a matching set $M = {(v, v^{'}) | v \in N_{s} (u), v^{'} \in N_{t} (u^{'})}$ . All matching pairs $(v, v^{'})$ satisfy the attribute consistency, i.e. $X_{s} (v) = X_{t} (v^{'})$ and the $k$ -order topological consistency, i.e. $O_{s, k} (u, v) = O_{t, k} (u^{'}, v^{'})$ , then the embeddings of $u$ and $u^{'}$ after encoded by the shared GCN parameters are identical, i.e. $H_{s, k} (u) = H_{t, k} (u^{'})$ .

Proof.

Similar to the proof of Lemma 1, we first expand the expression of embeddings of node $u$ and $u^{'}$ . For node $u$ ,

\begin{matrix} H_{s, k} (u) & = f_{s}^{0} ({~ L}_{s, k} (u, :) X_{s} W_{s}^{0}) = f_{s}^{0} ⎛ ⎝ \sum v \in N_{s} (u) {~ L}_{s, k} (u, v) X_{s} (v) W_{s}^{0} ⎞ ⎠; \end{matrix}

for node $u^{'}$ ,

\begin{matrix} H_{t, k} (u^{'}) & = f_{t}^{0} ({~ L}_{t, k} (u^{'}, :) X_{t} W_{t}^{0}) = f_{t}^{0} ⎛ ⎝ \sum v^{'} \in N_{t} (u^{'}) {~ L}_{t, k} (u^{'}, v^{'}) X_{t} (v^{'}) W_{t}^{0} ⎞ ⎠ . \end{matrix}

According to attribute consistency and topological consistency in terms of orbit $k$ , we have $X_{s} (v) = X_{t} (v^{'})$ and ${~ L}_{s, k} (u, v) = {~ L}_{t, k} (u^{'}, v^{'})$ . Besides, as the GCN encoder is shared between source and target graphs, $f_{s}^{0} = f_{t}^{0} = f^{0}$ and $W_{s}^{0} = W_{t}^{0} = W^{0}$ are satisfied. Hence, $H_{s, k} (u) = H_{t, k} (u^{'})$ . Similarly, for $L$ -layer GCN, $H_{s, k}^{L} (u) = H_{t, k}^{L} (u^{'})$ holds if corresponding $L$ -hop neighborhood is consistent. ∎

The key factor in extending Lemma 1 to Proposition 1 is the sharing of GCN encoder between two networks, which guarantees that nodes in two networks are embedded by the same mapping mechanism. Therefore, the consistency constraints of alignment can be transformed into the similarity of node embeddings and an extra step for aligning feature space can be omitted, which is claimed to be necessary in some previous embedding-based methods [43].

Iv-C Multi-orbit-aware Training

As opposed to just using the trivial edge-defined topology information, multiple higher-order topological patterns give us more comprehensive information for aligning nodes.

Orbit-reconstruction loss. Without alignment labels, we adopt auto-encoder scheme to train the model parameters $W$ . We reconstruct each orbit Laplacian matrix using generated representations. The matrix of source graph on orbit $k$ is rebuilt by:

{^L}_{s, k} = H_{s, k}^{L} \cdot {H_{s, k}^{L}}^{⊤} .

(6)

The reconstructed Laplacian matrix of target graph ${^L}_{t, k}$ can also be obtained in a similar way. The orbit-reconstruction loss for orbit $k$ is further computed as follows:

l o s s_{k} = {∥ ∥ {~ L}_{s, k} - {^L}_{s, k} ∥ ∥}_{F} + {∥ ∥ {~ L}_{t, k} - {^L}_{t, k} ∥ ∥}_{F},

(7)

where ${∥ \cdot ∥}_{F}$ denotes the Frobenius norm which is used to measure the difference between the original Laplacian matrix ${~ L}_{\cdot, k}$ and the reconstructed one ${^L}_{\cdot, k}$ . By minimizing $l o s s_{k}$ , the GCN encoder can be regarded as kth-orbit-aware since the useful information required for reconstructing the $k$ th orbit matrix is captured.

To be multi-orbit-aware, all orbit-reconstruction loss items are added up to the overall objective:

Γ = \sum k l o s s_{k} .

(8)

As $Γ$ is minimized by gradient descent optimizers, like Adam [44], the similarity of the embeddings generated by the encoder is able to reflect the corresponding topological and attribute consistency. The multi-orbit-aware process is summarized in Algorithm 1.

0: Orbit graphs

O = {O_{0}, O_{1}, \dots, O_{K}}

;

0: Embeddings of source graph and target graph

H_{s} = {H_{s, 0}, H_{s, 1}, \dots, H_{s, K}}

H_{t} = {H_{t, 0}, H_{t, 1}, \dots, H_{t, K}}

1: Obtain modified Laplacian matrices

{~ L}_{s} = {{~ L}_{s, 0}, {~ L}_{s, 1}, \dots, {~ L}_{s, K}}

{~ L}_{t} = {{~ L}_{t, 0}, {~ L}_{t, 1}, \dots, {~ L}_{t, K}}

;

2: Randomly initialize the encoding parameters

W = {W^{0}, W^{1}, \dots, W^{L - 1}}

;

3: for some epochs do

4: for

k = 0 \to K

5: Generate node embeddings

H_{s, k}

H_{t, k}

of source and target graphs on orbit

k

layer-by-layer using Eq. (4)-(5);

6: Decode

H_{s, k}

H_{t, k}

to obtain reconstructed Laplacian matrices

{^L}_{s, k}

{^L}_{t, k}

respectively using Eq. (6);

7: Compute the reconstruction loss of source graph and target graph simultaneously using Eq. (7);

8: end for

9: Compute the total reconstruction loss

Γ

of all patterns by using Eq. 8;

10: Minimize

Γ

to update

W

with the gradient descent optimizer;

11: end for

12: Generate embeddings

H_{s}

H_{t}

;

13: return

H_{s}

H_{t}

Algorithm 1 Multi-orbit-aware Embedding

Iv-D Trusted-pair based Fine-tuning

In this subsection, we use $h_{s}$ and $h_{t}$ to denote the embeddings of source graph and target graph output by the last hidden layer on any orbit, omitting the superscripts $L$ and subscripts $k$ for brevity.

Hubness problem. By Lemma 1 and Proposition 1, both potential anchor nodes across different graphs and nodes with similar attributes and structure in the same graph will have embeddings as similar as possible. In high dimensional embedding space, these similar embeddings will be located closely. As a result, some nodes, called hubs [45], tend to be disproportionally nearest neighbors of many other nodes in the other graph. Direct nearest-neighbor rule favors these hubs, while at most, one of all the nearby nodes around a hub is a true anchor node. Thus, to diminish the hubness problem and identify nodes that are more isolated and more trustworthy to align with other nodes, we introduce the locally isolated similarity index (LISI).

Locally isolated similarity index. Due to the properties of translation invariance and scale invariance, we use Pearson correlation coefficient to measure the similarity between the embeddings of two nodes $h_{s} \in H_{s}$ and $h_{t} \in H_{t}$ :

c o r r (h_{s}, h_{t}) = \frac{\sum i [h_{s} (i) - {¯ ¯¯ ¯ h}_{s}] [h_{t} (i) - {¯ ¯¯ ¯ h}_{t}]}{\sqrt{\sum i {[h_{s} (i) - {¯ ¯¯ ¯ h}_{s}]}_{s}^{2} \sum i {[h_{t} (i) - {¯ ¯¯ ¯ h}_{t}]}_{t}^{2}}},

(9)

where ${¯ ¯¯ ¯ h}_{s}$ , ${¯ ¯¯ ¯ h}_{t}$ is the mean value of $h_{s}$ and $h_{t}$ .

We consider a bi-partite neighborhood graph, in which each node of a given graph is connected to its $m$ nearest neighbors in the other graph. Using $N_{t} (h_{s})$ to denote the neighborhood of node embedding $h_{s}$ on target graph space, the hubness degree of node $h_{s}$ is measured by computing its mean similarity to its target neighborhood as follow:

D_{t} (h_{s}) = \frac{\sum_{h_{t} \in N_{t} (h_{s})} c o r r (h_{s}, h_{t})}{m} .

(10)

According to the definition, the larger $D_{t} (h_{s})$ is, the more likely $h_{s}$ is to be a hub. Similarly, denoted by $D_{s} (h_{t})$ , the mean similarity between $h_{t}$ and its source neighborhood can be computed. They are used to calculate the similarity measure LISI:

L I S I (h_{s}, h_{t}) = 2 c o r r (h_{s}, h_{t}) - D_{t} (h_{s}) - D_{s} (h_{t}) .

(11)

Here, a bigger value of LISI can be obtained when these two embeddings are similar enough to each other (the first item is large) while they are as less similar as possible to their neighbors in the other graph (the latter two items are small). In other words, LISI aims to identify two similar embeddings that are locally isolated rather than being hubs of dense areas.

Trusted pairs. The LISI values of node pairs across source graph and target graph are computed as the alignment matrix $M$ . If two nodes are each other’s nearest neighbors with respect to LISI, they are regarded as a trusted pair. Mathematically, a trusted pair $⟨ h_{s}, h_{t} ⟩$ satisfies the following conditions:

⎧ ⎪ ⎨ ⎪ ⎩ \begin{matrix} h_{t} = arg max h \in H_{t} L I S I (h_{s}, h), h_{s} = arg max h \in H_{s} L I S I (h_{t}, h) . \end{matrix}

(12)

Aggregation coefficient fine-tuning. According to topology consistency, if two nodes are connected in a network, then their anchor nodes (if both exist) are likely to maintain connectivity in another network. In other words, if node $A$ and node $a$ form an anchor link across source graph and target graph, then potential anchor links are likely to exist across the neighborhood of node $A$ and that of node $a$ . To find them, we have the following proposition.

Proposition 2.

By assigning bigger aggregation coefficients to trusted (or known) anchor nodes, more potential anchor nodes around them can be identified.

Proof.

Without loss of generality, suppose that there exist four nodes: $u, v \in V_{s}$ and $u^{'}, v^{'} \in V_{t}$ , where $(v, v^{'})$ is a pair of trusted (or known) anchor nodes, $(u, u^{'})$ is a pair of potential anchor nodes that needed to be identified and $v \in N_{s} (u), v^{'} \in N_{t} (u^{'})$ . The first-layer embeddings of node $u$ and $u^{'}$ can be expanded as:

\begin{matrix} H_{s, k}^{1} (u) & = f^{0} (\sum r \in N_{s} (u) {~ L}_{s, k} (u, r) X_{s} (r) W^{0}) = f^{0} ([{~ L}_{s, k} (u, v) X_{s} (v) + \sum r \in N_{s} (u) ∖ v {~ L}_{s, k} (u, r) X_{s} (r)] W^{0}) \end{matrix}

and

\begin{matrix} H_{t, k}^{1} (u^{'}) & = f^{0} (\sum r^{'} \in N_{t} (u^{'}) {~ L}_{t, k} (u^{'}, r^{'}) X_{t} (r^{'}) W^{0}) = f^{0} ([{~ L}_{t, k} (u^{'}, v^{'}) X_{t} (v^{'}) + \sum r^{'} \in N_{t} (u^{'}) ∖ v^{'} {~ L}_{t, k} (u^{'}, r^{'}) X_{t} (r^{'})] W^{0}) \end{matrix}

To increase the similarity between $H_{s, k}^{1} (u)$ and $H_{t, k}^{1} (u^{'})$ , we should make the terms in square brackets as close as possible. Note that ${~ L}_{s, k} (u, \cdot)$ and ${~ L}_{t, k} (u^{'}, \cdot)$ are scalars, which are generally regarded as aggregating weights, and $X_{s} (\cdot)$ and $X_{t} (\cdot)$ are vectors. In square brackets, the first item is a vector representing trusted/known anchor nodes, while the other item is a synthesized vector to aggregate the other nodes. As $(v, v^{'})$ is a pair of trusted/known anchor nodes, $X_{s} (v) = X_{t} (v^{'})$ holds due to attribute consistency. By enlarging the common vector components, namely the trusted-pair-related item, the synthesized vectors are tuned to be closer. In practice, since we do not know which pair of nodes around trusted pairs may have a potential anchor link, we can only tune ${~ L}_{s, k} (:, v)$ and ${~ L}_{t, k} (:, v^{'})$ synchronously such that potential anchor links can be found and existing anchor links can be preserved. The process is similar for the following layers, which is omitted. ∎

In fact, assigning bigger aggregation coefficients to trusted anchor nodes means that their neighboring potential anchor nodes can receive more similar information. The more trusted nodes around potential anchor nodes, the easier they can be identified. Formally, two vectors $r_{s} \in R^{n_{s}}$ and $r_{t} \in R^{n_{t}}$ are maintained to update the reinforcement factors for nodes in source graph and target graph respectively. At the beginning, both of them are initialized to all-one vectors. If node $i \in V_{s}$ and node $j \in V_{t}$ are identified as trusted pairs, their corresponding reinforcement factors are updated as follows:

{\begin{matrix} r_{s} (i) = β \cdot r_{s} (i), r_{t} (j) = β \cdot r_{t} (j), \end{matrix}

(13)

where $β > 1$ is the reinforcement rate.

The aggregation scheme of GCN is modified as:

H_{s}^{l + 1} = f^{l} (R_{s} {~ L}_{s} R_{s} H_{s}^{l} W^{l})

(14)

where $R_{s}$ is a diagonal reinforcement matrix in which $R_{s} (i, i) = r_{s} (i)$ . Likewise, the embeddings of target network are generated by introducing reinforcement matrix $R_{t}$ .

The fine-tuning process is iteratively conducted as shown in Fig. 3. Each loop contains four steps: (a) identify trusted pairs from the alignment matrix; (b) update aggregation coefficients of trusted pairs; (c) generate new node embeddings; (d) compute new alignment matrix. We also need to track the number of trusted pairs $T$ in every loop. The objective is to maximize the number of trusted pairs identified based on each level of consistency. If the number stops growing, the loop is terminated (the loops for different orbits are independent of each other). The fine-tuning algorithm is summarized in Algorithm 2.

0: Embeddings of source graph and target graph

H_{s}

H_{t}

, reinforcement rate

β

;

0: Refined alignment matrices

M = {M_{0}, M_{1} \dots, M_{K}}

, the maximal number of trusted pairs

T_{m a x} = {T_{m 0}, T_{m 1}, \dots, T_{m K}}

1: Initialize reinforcement vectors

r_{s}

r_{t}

with all-one vectors, number of trusted pairs

T_{c u r} = {T_{c 0}, T_{c 1}, \dots, T_{c K}}

T_{m a x}

with all-zero vectors;

2: for

k = 0 \to K

3: repeat

T_{m k} \leftarrow T_{c k}

;

5: Compute LISI as alignment matrix

M_{k}

using Eq. (9)-(11);

6: Identify and count trusted pairs

T_{c k}

from

M_{k}

using Eq. (12);

7: Update

r_{s}

r_{t}

using Eq. (13);

8: Generate new embeddings

H_{s, k}

H_{t, k}

using Eq. (14);

9: until

T_{c k} \leq T_{m k}

10: end for

11: return

M

T_{m a x}

Algorithm 2 Trusted-pair Based Fine-tuning

Iv-E Posterior Importance Assignment

After refinement procedure is done on each orbit, corresponding alignment matrices $M = {M_{0}, M_{1} \dots, M_{K}}$ are obtained. Each of them contains the alignment score of all node pairs, focusing on a specific order of topological consistency. To integrate those scores and produce the final alignment matrix, we consider an alignment-related index: the number of trusted pairs identified in different orders of topological consistency. The motivation is that the more trusted nodes that can be identified, the higher the corresponding embedding quality.

We denote the weight of $M_{k}$ by $γ_{k}$ , which is defined as:

γ_{k} = \frac{T_{k}}{\sum_{i} T_{i}},

(15)

where $T_{i}$ denotes the number of trusted pairs identified based on orbit $i$ . Then the final alignment matrix is the weighted sum of each orbit-specific matrix: $M = \sum_{k} γ_{k} \cdot M_{k}$ .

Given the final alignment matrix, for each node in source graph, the node with the highest score in target graph is regarded as the most possible anchor node.

Iv-F Complexity Analysis

Without loss of generality, we denote the number of nodes by $n$ , the number of edges by $e$ (non-zero entries of adjacency matrix), the maximal node degree by $D$ , and the dimension of hidden features by $d$ .

Time complexity. There are mainly three steps to analyze:

Orbit matrix construction. In this work, we use orbits in graphlets with not more than 4 nodes. Thanks to the scalable Orca algorithm [46, 47], edge orbit counting can be done in $O (e D^{2})$ for 4-node graphlets, which allows us to count orbits on graphs with millions of edges within a second. And constructing orbit-based Laplacian matrices takes $O (e)$ .
Multi-orbit-aware embedding. The propagation of GCN encoder takes $O (K e d^{L})$ where the numbers of orbits $K$ and hidden layers $L$ are usually small constants.
Trusted-pair based fine-tuning. Identifying trusted-pairs takes $O (K n^{2})$ , and refining the aggregation coefficients takes $O (n)$ .

Therefore, the first two steps take $O (e)$ and the third step takes $O (n^{2})$ . Note that $O (n^{2})$ is dominated by the global alignment task itself ( $M \in R^{n_{s} \times n_{t}}$ ), and most operations, such as trusted-pair identification, can be efficiently implemented through matrix manipulation. Also, considering the practical running time shown in Fig. 8, the overall time cost of our approach is less than or comparable to that of most benchmark methods.

Space complexity. We analyze space complexity from three aspects:

Orbit-relevant storage. Due to the increasing sparsity of higher-order orbits, there are at most $O (e)$ non-zero entries in each orbit matrix. Therefore, the worst-case space complexity for storing $K$ sparse orbit Laplacian matrices is $O (K e)$ .
Embedding-relevant storage. It takes $O (L d^{2})$ to store trainable parameters in $L$ GCN layers, and the space required for storing the features of $n$ nodes in $L$ hidden layers is $O (L n d)$ .
Alignment-relevant storage. In the fine-tuning stage, the LISI matrix is computed for identifying trusted pairs and can be released from memory once the identification process is done. Hence, it takes $O (n^{2})$ to store the LISI matrix in the whole refinement process. As for the weighted integration stage, a cumulative approach can be adopted, which only requires maintaining a main matrix and iteratively adding each orbit-specified score matrix, and thus it also takes $O (n^{2})$ , the same as most alignment methods.

In summary, the overall space complexity is $O (K e + L d^{2} + L n d + n^{2})$ . As the space complexity of a common GCN-based alignment approach would be $O (e + L d^{2} + L n d + n^{2})$ , our method generally just requires extra space for storing sparse orbit matrices.

V Experimental Evaluations

In this section, we comprehensively evaluate the superiority of our unsupervised alignment framework on both real-world and synthetic datasets and compare it with state-of-the-art unsupervised and supervised methods. Other detailed evaluations are also reported which empirically demonstrate the effectiveness of the proposed method.

V-a Experiment Setup

Datasets. To facilitate a fair comparison, in this work, we evaluate our approach by following the setup of the primary baselines [17, 10, 13] and using the same datasets as they do, including three widely used real-world benchmark datasets and two synthetic datasets. For synthetic datasets, the source networks are collected from the real world, while the target networks are generated from source networks in the same way as [10] by randomly removing a certain percentage of the edges in original networks. Node identity is preserved, indicating the alignment groundtruth.

$∙$ Allmovie & Imdb: each node is a movie and each edge between two movies signifies that they have at least one common actor. The former is constructed from Rotten Tomatoes website¹¹1https://www.kaggle.com/ayushkalla1/rotten-tomatoes-movie-database and the latter from Imdb website²²2https://www.kaggle.com/jyoti1706/imdbmoviesdataset. There are totally 5176 anchor links across these two networks [10].

$∙$ Douban Online & Douban Offline: a pair of Chinese social networks, whose nodes denote users and edges represent whether two users are contacts or friends. This pair of networks contains 1118 anchor links [17, 10, 13].

$∙$ Flickr & Myspace: Two subnetworks and their corresponding groundtruths are extracted by [17]. Each node represents a user, and part of the user’s profile information is treated as attributes. There are 267 anchor links across these two networks [17, 13].

$∙$ Econ: this dataset comes from a economic model of Victoria State, Australia during the banking crisis in 1880 [48, 10]. Each node is an organisation (like banks, firms), and the edges denote the contractual relationships between them. Partial edges (from 10% to 50%) are randomly removed to construct the target network.

$∙$ BN: this network represents a part of brain structure. The nodes are brain voxels, and each edge is a fiber tract that connects two voxels [49, 10]. Partial edges (from 10% to 50%) are randomly removed to construct the target network.

The detailed statistics of aforementioned datasets are listed in Table. I.

Networks	#Edges	#Nodes	#Attrs	Avg. Deg
Allmovie	124709	6011	14	41.4
Imdb	119073	5713	14	41.7
Douban Online	8164	3906	538	4.2
Douban Offline	1511	1118	538	2.7
Flickr	7333	6714	3	2.2
Myspace	11081	10733	3	2.1
Econ	7619	1258	20	12.1
BN	9016	1781	20	10.1

TABLE I: Statistical details of networks

Configurations for proposed framework. If not stated otherwise, the hyperparameters are set as follows: number of GCN layers $L = 2$ , embedding dimension $d = 200$ , learning rate $η = 0.01$ , number of nearest neighbors $m = 20$ , aggregation reinforcement rate $β = 1.1$ . The sensitivity of key hyperparameters is studied in V-F.

Baseline methods. Other six representative methods are constructed for comparison, including methods that only use topology information (IsoRank [14], PALE [18], CENALP [24]) and three methods that use both topology information and node attributes (FINAL [17], REGAL [18], GAlign [10]). In particular, GAlign is the most related competitor, which is also an unsupervised network alignment method based on GCN. Since other general embedding techniques, like DeepWalk [37]/Node2Vec [36]/LINE [38], have been discussed and surpassed in papers of PALE/CENALP/REGAL, we compare with these alignment methods directly. According to the original settings of these baseline methods, some of them have to introduce supervised information. Therefore, 10% of ground truth is used to train PALE and CENALP and generate the prior alignment matrix for FINAL and IsoRank.

Methods	Allmovie & Imdb				Douban Online & Offline				Flickr & Myspace
Methods	$p @ 1$	$p @ 10$	$M R R$	Time(s)	$p @ 1$	$p @ 10$	$M R R$	Time(s)	$p @ 1$	$p @ 10$	$M R R$	Time(s)
HTC	0.8436	0.9051	0.8574	87.51	0.4651	0.8005	0.5766	8.15	0.0150	0.0487	0.0289	42.94
GAlign	0.8214	0.9003	0.8496	92.48	0.4526	0.7800	0.5632	10.44	0.0112	0.0412	0.0267	48.29
FINAL	0.7647	0.9609	0.8459	67.71	0.4383	0.7710	0.5539	183.57	0.0075	0.0412	0.0212	358.2
PALE	0.6947	0.7159	0.7601	1810.28	0.0775	0.4479	0.1901	86.82	0.0000	0.0135	0.0079	166.11
CENALP	0.4866	0.8327	0.5693	43572.24	0.2572	0.4618	0.3537	8871.37	0.0075	0.0375	0.0244	21264.59
IsoRank	0.4653	0.6427	0.5271	69.89	0.0903	0.2048	0.1299	19.29	0.0037	0.0337	0.0162	132.63
REGAL	0.0953	0.3869	0.1888	70.14	0.0528	0.2326	0.1120	20.47	0.0112	0.0562	0.0272	124.07
Abs. Imp.	0.0222	-	0.0078	-	0.0125	0.0205	0.0134	2.29	0.0038	-	0.0017	5.35
Rel. Imp.	2.70%	-	0.92%	-	2.76%	2.63%	2.38%	21.93%	33.93%	-	6.25%	11.08%

TABLE II: Numerical evaluations of alignment performance on three real-world datasets

Metrics. Two numerical metrics are used to comprehensively evaluate and compare the effectiveness of different alignment approaches. The first metric is $p r e c i s i o n @ q$ (also known as $a c c u r a c y @ q$ or $s u c c e s s @ q$ ) [17]. It calculates the rate that a node’s true anchor node can be found in its top- $q$ candidates. Mathematically, $p r e c i s i o n @ q$ is defined as:

p r e c i s i o n @ q = \frac{\sum_{v_{s}^{*} \in V_{s}} \sum_{v_{t} \in V_{t}^{q} (v_{s}^{*})} 1 ((v_{s}^{*}, v_{t}) \in L^{*})}{| L^{*} |}

(16)

where $L^{*}$ denotes the set of ground-truth anchor pairs, $V_{t}^{q} (v_{s}^{*})$ denotes a subset of $V_{t}$ which contains the target nodes that hold the top- $q$ highest alignment scores in the row $M (v_{s}^{*}, \cdot)$ , and $1 (\cdot)$ is an indicator function.

Another metric, Mean Reciprocal Rank ( $M R R$ ) [50], also known as Mean Average Precision ( $M A P$ ) [19], evaluates the average performance of reciprocal rank. $M R R$ is defined as follows:

M R R = \frac{1}{| L^{*} |} \sum i \frac{1}{r_{i}}

(17)

where $r_{i}$ is the rank of $i$ th true anchor node’s alignment score in the corresponding row of $M$ . Note that for both $p r e c i s i o n @ q$ and $M R R$ , the higher the values are, the better the performance is. To mitigate randomness, all reported results are averaged over 20 runs.

Computing infrastructures. All experiments are conducted on a compute node with 32 Intel Xeon Gold 6242 CPUs (2.80GHz) and 2 Tesla V100-SXM2 GPUs (32GB of memory each). As for the main softwares, all methods are implemented based on NetworkX-2.6.0 and PyTorch-1.11.0 (if required).

Fig. 7: Runtime comparisons between HTC and baselines

V-B Overall Effectiveness

The overall alignment performance is first investigated. We report the $p r e c i s i o n @ 1$ , $p r e c i s i o n @ 10$ , and $M R R$ metrics for numerical comparison purpose, which are presented in Table II. As can be seen, the proposed HTC framework consistently outperforms all baseline models across three pairs of real-world datasets. Specifically, we achieve up to 33.93%, 2.63%, and 6.25% relative improvements over the best results achieved by baselines in terms of $p r e c i s i o n @ 1$ , $p r e c i s i o n @ 10$ , and $M R R$ , respectively.

Although 10% of ground truth is provided for supervised methods, it seems inadequate to make them outstanding. Among them, FINAL is the best-performing supervised model, which achieves a competitive result in terms of $p r e c i s i o n @ 10$ on Allmovie & Imdb dataset. Even so, compared with it, our HTC is fully unsupervised, and is 10.31% more accurate on Allmovie & Imdb dataset in terms of $p r e c i s i o n @ 1$ , which is the most important metric.

As for unsupervised competitors, GAlign is the most powerful one. Like our HTC, it does not focus solely on the original network topology. Instead, GAlign utilizes augmented networks to increase its adaptiveness to structural consistency violation, and thus it is the runner-up on three pairs of datasets in terms of several metrics. In addition to steadily outperforming GAlign in terms of accuracy, our method likewise exhibits robustness to structural noise, which is further discussed in subsection V-D.

It is worth noting that all alignment methods perform poorly on Flickr & Myspace datasets. In addition to very sparse network topology and very limited attribute information as listed in Table I, according to the insights provided in [13], another important factor that makes Flickr & Myspace difficult to align is that their groundtruths rarely satisfy the common alignment consistency that neighbours in the source network have connections in the target network. As a result, all methods can hardly achieve satisfactory performance, especially for the $p r e c i s i o n @ 1$ metric. However, our HTC is still the tallest among the short, and REGAL also shows relatively competitive results on this pair of datasets.

We also investigate the importance scores of different levels of orbit-weighted topology computed by Eq. 15 on three pairs of real-world datasets. In Fig. 6, we rank orbits by their $γ$ values, and the higher the position of the orbit in the plot, the higher its $γ$ value. Some interesting and insightful observations are as follows: (1) The overall distributions of orbit weights over three datasets are quite different. The weights on Allmovie & Imdb datasets have the smallest variance than that of the other two pairs of datasets. This is consistent with network characteristics listed in Table I, where Allmovie & Imdb are far more dense, with an average degree greater than 40, and thus some higher-order patterns are more likely to exist and make contributions. In contrast, Flickr & Myspace datasets are extremely sparse, which means they are less likely to form sufficient higher-order patterns, and thus only a few lower-order patterns account for most of the importance scores. The difference in the orbit importance distribution between these three datasets shows that our method can effectively and adaptively focus on more favourable topological consistency information according to specific network characteristics rather than maintaining fixed weights for each level. (2) Another key finding is that orbits 1, 3, and 5 rank in the top five in terms of the importance score in all three datasets, while orbit 0, the most basic edge pattern, is only included in the top five on Flickr & Myspace datasets. This fact indicates that higher-order consistencies can actually make more contributions to the alignment tasks than the trivial one, which is the core factor that makes HTC work better than other baselines.

V-C Efficiency Analysis

To comprehensively evaluate the efficiency of our proposed HTC, the runtime of HTC on three pairs of real-world datasets is reported in Table II. Besides, to facilitate comparison with the baseline methods, we present runtime results in Fig. 8 (CENALP takes much more time and is therefore excluded). As illustrated, HTC takes the least time to achieve the best accuracy on two out of three datasets (Douban online & offline and Flickr & Myspace), and the time on another dataset (Allmovie & Imdb) is also comparable to most benchmarks. Although multiple higher-order topological consistencies are considered in HTC, we do not introduce additional training-required encoding parameters as they are shared by all orbits, and thus will not result in much extra time cost. Also, the termination conditions of the fine-tuning processes on different orbits are independent of each other (refer to Algorithm 2), which means redundant time consumption for extra running loops can be saved.

For an in-depth analysis, we divide the whole process of HTC into six parts, i.e., orbit counting, laplacian matrix construction, multi-orbit-aware training, trusted pair based fine-tuning, weighted integration, and other operations, and their corresponding time consumption is shown in Fig. 8. In particular, the time for other operations refers to the rest time consumption in addition to the other five parts of the total time, such as the time for calculating assessment metrics. On three datasets, orbit counting, Laplacian matrix construction, and weighted integration take relatively less time than the other processes. In addition to other operations, most of the time is spent on multi-orbit-aware training and trusted-pair based fine-tuning processes, which are the cores of our method.

In summary, taking both effectiveness and efficiency into consideration, our method shows its superiority over baselines.

V-D Robustness Test

To test the robustness of our proposed high-order topological consistency, in this subsection, we inject different levels of topological noise into synthetic datasets. Specifically, 10% to 50% of edges in original source networks are randomly removed to generate the corresponding target networks for Econ and BN datasets. The performance of our HTC is evaluated and compared with that of baselines, which is shown in Fig. 9. Generally, all methods suffer accuracy degradation as the noise level increases. Among those baselines, PALE and REGAL are more sensitive to structural noise than other baselines, while FINAL and CENALP are more robust. Besides, IsoRank performs less competitively under all different noise levels. In comparison, HTC consistently outperforms all baselines except GAlign by a remarkable margin. Although our method does not target noise adaptation as GAlign, it can still achieve better performance than GAlign when the noise ratio is smaller than 0.3 (specifically, when ratio equals 0.1, the $p @ 1$ metrics of HTC $|$ GAlign on Econ and BN are 0.9944 $|$ 0.9781 and 0.9748 $|$ 0.9524, respectively), and the degree of performance degradation is generally comparable to GAlign, which is far smaller than the other baselines like PALE and REGAL and demonstrates the robustness of our HTC. For example, the performance degradations of HTC, GAlign, and PALE on Econ dataset are 0.2448, 0.2213, and 0.4252, respectively.

We owe the robustness of HTC to our multi-orbit-aware training mechanism based on two facts. First, not all higher order orbits are present for a given edge, e.g., the counts of orbit 3 and 4 for edge $(a, b)$ are both 0 as shown in Fig. 5, which is equivalent to removing the edge from the corresponding orbit. Second, the encoder is shared and co-trained by all different orders of topological patterns. As a result, HTC is trained to be robust to structural noise, i.e., edge removal/missing.

V-E Ablation Test.

To distinguish the contributions made by different components of HTC, we construct several variants and compare their performance on two real-world datasets (Because of the issues with Flickr & Myspace mentioned in subsection V-B, we exclude them from the ablation test and the subsequent hyperparameter study). All of them are built based on a two-layer GCN and all related hyperparameters are consistent with HTC.

HTC-L (low-order, w/o fine-tuning): only considers low-order topology, namely the simplest pattern (orbit 0) to learn node features and compute alignment matrix without a refinement process.
HTC-H (high-order, w/o fine-tuning): utilizes multi-orbit-aware training techniques to learn higher-order topological consistency while removes the refinement step.
HTC-LT (low-order, w/ fine-tuning): learns node embeddings based on the simplest pattern, while the alignment matrix is computed after conducting trusted-pair based fine-tuning.
HTC-DT (diffusion, w/ fine-tuning): replaces our GOMs with diffusion matrices of varying orders, which help to capture a larger neighborhood of structural information (best result is achieved with $k = 5$ and teleport probability $α = 0.15$ ) [27].

The results are reported in Table III. Compared with the other four variants, HTC (namely HTC-HT) performs far better in terms of both $p @ 1$ and $M R R$ metrics. Specifically, HTC achieves up to a 49% absolute improvement in $p @ 1$ over HTC-L, which ignores the proposed higher-order topological consistency and computes the alignment matrix directly without fine-tuning. This verifies the significant contributions we make to the original GCN in the context of unsupervised network alignment. Furthermore, HTC-H and HTC-LT outperform HTC-L by up to 38% and 16%, respectively, which indicates that more contributions are made by introducing higher-order topological consistency.

Methods	Douban On/Off-line		Allmovie & Imdb
Methods	$p @ 1$	$M R R$	$p @ 1$	$M R R$
HTC-L	0.1978	0.3034	0.3515	0.4522
HTC-H	0.4133	0.4941	0.7324	0.7418
HTC-LT	0.2947	0.3988	0.5182	0.5892
HTC-DT	0.0653	0.1508	0.1303	0.2214
HTC(-HT)	0.4651	0.5766	0.8436	0.8574

TABLE III: Numerical results of ablation test

The diffusion matrix is demonstrated to be helpful for graph learning since multi-hop neighborhoods are considered instead of direct one-hop neighbors [27]. Thus, we take it as a representative technique that considers a larger range of neighboring topologies based on the trivial edge pattern for aggregation information (HTC-DT). However, it performs even worse than HTC-L in the unsupervised alignment task. The fact is, GOMs can generally provide more sparse but more accurate topological consistency information when the order increases, while diffusion matrices are quite the opposite. Densified matrices make it harder to identify good anchor links because structural specificity is reduced and node embeddings become more smooth. Therefore, simply considering a larger range of neighboring topologies may hardly provide favorable information, especially in the unsupervised alignment scenario.

V-F Hyperparameter Study

The key hyperparameters of HTC are discussed in detail here.

(1) The number of orbits ( $K$ ). Orbits have two properties. First, the number of induced subgraphs on $N$ nodes increases exponentially with $N$ and so does the number of orbits defined on $N$ nodes. Second, the larger a graphlet is, the less frequently it occurs since such a graphlet requires more nodes and edges to form a specific local topology. As a result, the orbit matrix of a higher-order pattern is generally more sparse, which means less consistency information. The trend of incrementally using first $K$ orbits is displayed in Fig. 9(a). It can be found that increasing the number of orbits used at the beginning significantly improves the performance. The growth rate then decays, showing a smaller slope on the curve. The precision is no longer increasing and even slightly decreased when $K$ is larger than 13. Since considering too many orbits may bring little information gain (For instance, in Fig. 6, orbits 11, 12 account least for the model on both datasets) and cost more computation, in this work, we consider orbits in induced subgraphs with 2-4 nodes, i.e. the first 13 orbits.

(2) Structural hyperparameters of GCN ( $L$ ; $d$ ). It has been widely demonstrated that the best performance of a GCN model is achieved with 2 or 3 layers [39, 10, 29], and so is our model (we achieve best performance with $L = 2$ ). As for the embedding dimension, it can be seen in Fig. 9(b) that the performance improves significantly at first and then slows down when $d$ grows from 5 to 200. Besides, the $p @ 1$ metric on Douban dataset decreases when $d > 200$ , which results from an over-fitting problem. It should be avoided by setting the dimension too large, which increases computational load and may even degrade performance. Hence, we set $d = 200$ .

(3) The number of nearest neighbors ( $m$ ). The results of the controlled experiment are shown in Fig. 9(c). Generally, both too small and too large values of $m$ lead to less satisfactory results, and the best performances on both datasets are achieved within the range [15, 25]. Hence, $m$ is set as 20 in this paper.

(4) Reinforcement rate ( $β$ ). It can be seen from Fig. 9(d) that the performance of HTC decreases as $β$ increases. This reveals that a small increase in the aggregation coefficient in each epoch can better detect more trusted pairs. Hence, we empirically set $β = 1.1$ .

V-G Visualization Analysis

Fig. 11: The t-SNE visualization of node embeddings on Douban Online and Offline datasets

We use t-SNE technique [51] to visualize the embedding of ground truth anchor nodes before and after being aligned by HTC, as shown in Fig. 11. We randomly sample 150 nodes from Douban Online dataset and their corresponding anchor nodes from Douban Offline dataset, which are mapped from high-dimensional feature space to 2-dimensional space. Due to space limitations, only five orbits are investigated and visualized here. It can be observed that the embeddings of two graphs before alignment have more distinct distributions (the upper five subgraphs). Many source graph embeddings (blue points) cannot find target graph embeddings (red points) that are close enough to them. While after alignment (the lower five subgraphs), the embeddings of source graph and target graph show a more similar distribution, and most node embeddings of source network can be covered by that of target network. This indicates that more consistency information between source graph and target graph is successfully captured by HTC.

Vi Conclusions

In this paper, we propose a fully unsupervised network alignment framework, named HTC, to develop topological consistency from lower order to higher order. We define higher-order topological consistency based on the introduced graphlet orbit matrix. By elegantly integrating higher-order consistency into the information aggregation process of GCN, the task of network alignment is converted into the task of node embedding similarity measurement. Instead of relying on any single level of topology, we train our encoder to be multi-orbit-aware in the unsupervised encoder-decoder paradigm. To further improve the alignment quality, we introduce a trusted-pair based fine-tuning mechanism. Accordingly, all orders of alignment consistency are integrated to obtain a comprehensive evaluation of node correspondence. Extensive experimental results on benchmark datasets well demonstrate our superiority, such as effectiveness, efficiency, and robustness, over a wide variety of state-of-the-art baselines. Our work verifies the significance of higher-order topological consistency for unsupervised alignment tasks. It also provides a general roadmap for improving the alignment performance of GCN-based models, as our method is built on the most common GCN-based alignment setting.

References

[1] Y. Dong, J. Tang, S. Wu, J. Tian, N. V. Chawla, J. Rao, and H. Cao, “Link prediction and recommendation across heterogeneous social networks,” in 2012 IEEE 12th International conference on data mining, pp. 181–190, IEEE, 2012.
[2] Y. Xiang, G. Zhang, S. Gu, and J. Cai, “Online multi-layer dictionary pair learning for visual classification,” Expert Systems with Applications, vol. 105, pp. 174–182, 2018.
[3] C.-Y. Li and S.-D. Lin, “Matching users and items across domains to improve the recommendation quality,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 801–810, 2014.
[4] L. Liu, W. K. Cheung, X. Li, and L. Liao, “Aligning users across social networks using network embedding.,” in Ijcai, pp. 1774–1780, 2016.
[5] H. Yin, X. Zhou, B. Cui, H. Wang, K. Zheng, and Q. V. H. Nguyen, “Adapting to user interest drift for poi recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 10, pp. 2566–2581, 2016.
[6] H. Nassar, N. Veldt, S. Mohammadi, A. Grama, and D. F. Gleich, “Low rank spectral network alignment,” in Proceedings of the 2018 World Wide Web Conference, pp. 619–628, 2018.
[7] Q. Zhan, J. Zhang, S. Wang, S. Y. Philip, and J. Xie, “Influence maximization across partially aligned heterogenous social networks,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 58–69, Springer, 2015.
[8] J. Zhang and P. S. Yu, Unsupervised Network Alignment, pp. 165–202. Cham: Springer International Publishing, 2019.
[9] X. Mu, F. Zhu, E.-P. Lim, J. Xiao, J. Wang, and Z.-H. Zhou, “User identity linkage by latent user space modelling,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1775–1784, 2016.
[10] H. T. Trung, T. Van Vinh, N. T. Tam, H. Yin, M. Weidlich, and N. Q. V. Hung, “Adaptive network alignment with unsupervised and multi-order convolutional networks,” in 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 85–96, IEEE, 2020.
[11] D. Zhu, Y. Sun, H. Du, N. Cao, T. Baker, and G. Srivastava, “Huna: A method of hierarchical unsupervised network alignment for iot,” IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3201–3210, 2020.
[12] Y. Zhou, J. Ren, R. Jin, Z. Zhang, D. Dou, and D. Yan, “Unsupervised multiple network alignment with multinominal gan and variational inference,” in 2020 IEEE International Conference on Big Data (Big Data), pp. 868–877, IEEE, 2020.
[13] H. T. Trung, N. T. Toan, T. Van Vinh, H. T. Dat, D. C. Thang, N. Q. V. Hung, and A. Sattar, “A comparative study on network alignment techniques,” Expert Systems with Applications, vol. 140, p. 112883, 2020.
[14] R. Singh, J. Xu, and B. Berger, “Global alignment of multiple protein interaction networks with application to functional orthology detection,” Proceedings of the National Academy of Sciences, vol. 105, no. 35, pp. 12763–12768, 2008.
[15] S. Labitzke, I. Taranu, and H. Hartenstein, “What your friends tell others about you: Low cost linkability of social network profiles,” in Proc. 5th International ACM Workshop on Social Network Mining and Analysis, San Diego, CA, USA, pp. 1065–1070, 2011.
[16] J. Liu, F. Zhang, X. Song, Y.-I. Song, C.-Y. Lin, and H.-W. Hon, “What’s in a name? an unsupervised approach to link users across communities,” in Proceedings of the sixth ACM international conference on Web search and data mining, pp. 495–504, 2013.
[17] S. Zhang and H. Tong, “Final: Fast attributed network alignment,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1345–1354, 2016.
[18] M. Heimann, H. Shen, T. Safavi, and D. Koutra, “Regal: Representation learning-based graph alignment,” in Proceedings of the 27th ACM international conference on information and knowledge management, pp. 117–126, 2018.
[19] T. Man, H. Shen, S. Liu, X. Jin, and X. Cheng, “Predict anchor links across social networks via an embedding approach.,” in Ijcai, vol. 16, pp. 1823–1829, 2016.
[20] F. Zhou, L. Liu, K. Zhang, G. Trajcevski, J. Wu, and T. Zhong, “Deeplink: A deep learning approach for user identity linkage,” in IEEE INFOCOM 2018-IEEE Conference on Computer Communications, pp. 1313–1321, IEEE, 2018.
[21] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002.
[22] N. Pržulj, D. G. Corneil, and I. Jurisica, “Modeling interactome: scale-free or geometric?,” Bioinformatics, vol. 20, no. 18, pp. 3508–3515, 2004.
[23] T. N. Kipf and M. Welling, “Variational graph auto-encoders,” arXiv preprint arXiv:1611.07308, 2016.
[24] X. Du, J. Yan, and H. Zha, “Joint link prediction and network alignment via cross-graph embedding.,” in IJCAI, pp. 2251–2257, 2019.
[25] Z. Liang, Y. Rong, C. Li, Y. Zhang, Y. Huang, T. Xu, X. Ding, and J. Huang, “Unsupervised large-scale social network alignment via cross network embedding,” in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 1008–1017, 2021.
[26] S. Mohammadi, D. F. Gleich, T. G. Kolda, and A. Grama, “Triangular alignment (tame): A tensor-based approach for higher-order network alignment,” IEEE/ACM transactions on computational biology and bioinformatics, vol. 14, no. 6, pp. 1446–1458, 2016.
[27] J. Klicpera, S. Weißenberger, and S. Günnemann, “Diffusion improves graph learning,” Advances in Neural Information Processing Systems, vol. 32, pp. 13354–13366, 2019.
[28] F. Monti, K. Otness, and M. M. Bronstein, “Motifnet: a motif-based graph convolutional network for directed graphs,” in 2018 IEEE Data Science Workshop (DSW), pp. 225–228, IEEE, 2018.
[29] J. B. Lee, R. A. Rossi, X. Kong, S. Kim, E. Koh, and A. Rao, “Graph convolutional networks with motif-based attention,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 499–508, 2019.
[30] A. Sankar, X. Zhang, and K. C.-C. Chang, “Meta-gnn: Metagraph neural network for semi-supervised learning in attributed heterogeneous information networks,” in Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 137–144, 2019.
[31] N. Pržulj, “Biological network comparison using graphlet degree distribution,” Bioinformatics, vol. 23, no. 2, pp. e177–e183, 2007.
[32] R. W. Solava, R. P. Michaels, and T. Milenković, “Graphlet-based edge clustering reveals pathogen-interacting proteins,” Bioinformatics, vol. 28, no. 18, pp. i480–i486, 2012.
[33] T. Milenković, W. L. Ng, W. Hayes, and N. Pržulj, “Optimal network alignment with graphlet degree vectors,” Cancer informatics, vol. 9, pp. CIN–S4744, 2010.
[34] J. Crawford and T. Milenković, “Great: graphlet edge-based network alignment,” in 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 220–227, IEEE, 2015.
[35] A. Almulhim, V. S. Dave, and M. A. Hasan, “Network alignment using graphlet signature and high order proximity,” in International Conference on Machine Learning, Optimization, and Data Science, pp. 130–142, Springer, 2019.
[36] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 855–864, 2016.
[37] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701–710, 2014.
[38] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: Large-scale information network embedding,” in Proceedings of the 24th international conference on world wide web, pp. 1067–1077, 2015.
[39] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
[40] T. Nguyen and R. Grishman, “Graph convolutional networks with argument-aware pooling for event detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018.
[41] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.
[42] K. Wang, J. Chen, Z. Song, Y. Wang, and C. Yang, “Deep neural network-embedded stochastic nonlinear state-space models and their applications to process monitoring,” IEEE Transactions on Neural Networks and Learning Systems, 2021.
[43] X. Chen, M. Heimann, F. Vahedian, and D. Koutra, “Cone-align: Consistent network alignment with proximity-preserving node embedding,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1985–1988, 2020.
[44] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[45] G. Dinu, A. Lazaridou, and M. Baroni, “Improving zero-shot learning by mitigating the hubness problem,” arXiv preprint arXiv:1412.6568, 2014.
[46] T. Hočevar and J. Demšar, “A combinatorial approach to graphlet counting,” Bioinformatics, vol. 30, no. 4, pp. 559–565, 2014.
[47] T. Hočevar and J. Demšar, “Computation of graphlet orbits for nodes and edges in sparse graphs,” Journal of Statistical Software, vol. 71, no. 1, pp. 1–24, 2016.
[48] R. Rossi and N. Ahmed, “The network data repository with interactive graph analytics and visualization,” in Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[49] K. Amunts, C. Lepage, L. Borgeat, H. Mohlberg, T. Dickscheid, M.-É. Rousseau, S. Bludau, P.-L. Bazin, L. B. Lewis, A.-M. Oros-Peusquens, et al., “Bigbrain: an ultrahigh-resolution 3d human brain model,” Science, vol. 340, no. 6139, pp. 1472–1475, 2013.
[50] T. Iofciu, P. Fankhauser, F. Abel, and K. Bischoff, “Identifying users across social tagging systems,” in Fifth International AAAI Conference on Weblogs and Social Media, 2011.
[51] L. Van der Maaten and G. Hinton, “Visualizing data using t-sne.,” Journal of machine learning research, vol. 9, no. 11, 2008.