Fairness has become a critical metric for machine learning models and is considered an important component of trustworthy machine learning. In this paper, we focus on achieving fairness in popular link prediction tasks, as measured by dyadic fairness. We propose a novel pre-processing methodology that establishes dyadic fairness through data repairing based on optimal transport theory. Using the well-established theoretical connection between dyadic fairness for graph link prediction and a conditional distribution alignment problem, the dyadic repairing scheme can be equivalently transformed into a conditional distribution alignment problem. Furthermore, an optimal transport-based dyadic fairness algorithm called DyadicOT is obtained by efficiently solving this alignment problem while satisfying flexibility and unambiguity requirements. On two benchmark graph datasets, DyadicOT achieves better fairness than other fairness methods.
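The abstract measures fairness with dyadic fairness but does not spell out the metric. A common formalization is dyadic demographic parity. Under it, the expected link score should be the same for intra-group pairs, where both endpoints share the sensitive attribute, and for inter-group pairs. The sketch below computes that gap. It illustrates only the target quantity, not the optimal transport repair that DyadicOT performs, and the function name `dyadic_dp_gap` is hypothetical.

```python
import numpy as np

def dyadic_dp_gap(scores, pairs, sensitive):
    """Absolute difference between the mean predicted link score of intra-group
    pairs (same sensitive value) and inter-group pairs (different values)."""
    scores = np.asarray(scores, dtype=float)
    pairs = np.asarray(pairs)
    s = np.asarray(sensitive)
    intra = s[pairs[:, 0]] == s[pairs[:, 1]]
    if intra.all() or not intra.any():
        raise ValueError("need both intra-group and inter-group pairs")
    return float(abs(scores[intra].mean() - scores[~intra].mean()))

sensitive = [0, 0, 1, 1]                  # binary sensitive attribute per node
pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]  # candidate links (node index pairs)
scores = [0.9, 0.8, 0.3, 0.4]             # predicted link probabilities
print(dyadic_dp_gap(scores, pairs, sensitive))
```

A value of zero means intra-group and inter-group links are predicted at the same average rate. In this toy input, intra-group pairs score much higher on average.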
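Graph representation learning has become a ubiquitous component in many scenarios, ranging from social network analysis to energy forecasting in smart grids. In several applications, ensuring the fairness of node (or graph) representations with respect to some protected attributes is crucial for their correct deployment. Yet fairness in graph deep learning remains under-explored, with few solutions available. In particular, the tendency of similar nodes to cluster together in several real-world graphs (i.e., homophily) can dramatically worsen the fairness of these procedures. In this paper, we propose a novel biased edge dropout algorithm (FairDrop) to counteract homophily and improve fairness in graph representation learning. FairDrop can be plugged easily into many existing algorithms, is efficient and adaptable, and can be combined with other fairness-inducing solutions. After describing the general algorithm, we demonstrate its application on two benchmark tasks: a random walk model for producing node embeddings, and a graph convolutional network for link prediction. We show that the proposed algorithm successfully improves the fairness of all models with a small or negligible drop in accuracy, and compares favourably with existing state-of-the-art solutions. In an ablation study, we demonstrate that our algorithm can flexibly interpolate between biasing towards fairness and an unbiased edge dropout. Furthermore, to better evaluate the gains, we propose a new dyadic group definition to measure the bias of a link prediction task when paired with group-based fairness metrics. In particular, we extend the metric used to measure bias in node embeddings to take the graph structure into account.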
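Here is a minimal sketch of the idea behind biased edge dropout. It assumes the bias is simply a higher drop probability for homophilous edges, whose endpoints share the protected attribute, than for heterophilous ones. The names `biased_edge_dropout`, `p_homo`, and `p_hetero` are hypothetical, and FairDrop's exact sampling probabilities are not reproduced here. Setting `p_homo == p_hetero` recovers ordinary unbiased edge dropout, which mirrors the interpolation described in the ablation study.

```python
import numpy as np

def biased_edge_dropout(edges, sensitive, p_homo=0.5, p_hetero=0.1, seed=None):
    """Drop each edge independently: homophilous edges (same sensitive value at
    both endpoints) with probability p_homo, all others with p_hetero."""
    rng = np.random.default_rng(seed)
    edges = np.asarray(edges)
    s = np.asarray(sensitive)
    homophilous = s[edges[:, 0]] == s[edges[:, 1]]
    drop_prob = np.where(homophilous, p_homo, p_hetero)
    keep = rng.random(len(edges)) >= drop_prob  # keep edge if draw clears its drop probability
    return edges[keep]

edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
sensitive = [0, 0, 1, 1]
print(biased_edge_dropout(edges, sensitive, seed=0))
```

In a training loop, one would typically resample this mask at every epoch, so the encoder sees a different, less homophilous subgraph each time.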
Learning fair graph representations for downstream applications is becoming increasingly important, but existing work has mostly focused on improving fairness at the global level, by modifying either the graph structure or the objective function, without taking into account the local neighborhood of a node. In this work, we formally introduce the notion of neighborhood fairness and develop a computational framework for learning such locally fair embeddings. We argue that neighborhood fairness is the more appropriate notion because GNN-based models operate at the level of a node's local neighborhood. Our neighborhood fairness framework has two main components that are flexible enough to learn fair graph representations from arbitrary data. The first constructs fair neighborhoods for any node in a graph. The second adapts these fair neighborhoods to better capture application- or data-dependent constraints, such as allowing neighborhoods to be more biased towards certain attributes or neighbors in the graph. Furthermore, while link prediction has been extensively studied, we are the first to investigate the graph representation learning task of fair link classification. We demonstrate the effectiveness of the proposed neighborhood fairness framework on a variety of graph machine learning tasks, including fair link prediction, link classification, and learning fair graph embeddings. Notably, our approach not only achieves better fairness but also increases accuracy in the majority of cases across a wide variety of graphs, problem settings, and metrics.
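The abstract does not specify how fair neighborhoods are constructed. As a hypothetical illustration only, the sketch below builds a neighborhood of size `k` for a node by drawing round-robin from each sensitive group. Within a group it takes 1-hop neighbors before 2-hop nodes, so 2-hop nodes fill in when a group is underrepresented among direct neighbors. The function `fair_neighborhood` and this balancing rule are assumptions, not the authors' method.

```python
import random
from collections import defaultdict

def fair_neighborhood(node, adj, sensitive, k, seed=None):
    """Pick up to k nodes near `node`, taking one node per sensitive group in
    turn; within each group, 1-hop neighbors are taken before 2-hop nodes."""
    rng = random.Random(seed)
    one_hop = set(adj[node])
    two_hop = {w for v in one_hop for w in adj[v]} - one_hop - {node}
    groups = defaultdict(list)
    for pool in (one_hop, two_hop):       # 1-hop candidates are queued first
        members = sorted(pool)
        rng.shuffle(members)
        for v in members:
            groups[sensitive[v]].append(v)
    queues = [groups[g] for g in sorted(groups)]
    chosen = []
    while len(chosen) < k and any(queues):
        for q in queues:                  # round-robin across sensitive groups
            if q and len(chosen) < k:
                chosen.append(q.pop(0))
    return chosen

# Node 0's direct neighbors all belong to group 0; group 1 is only reachable at 2 hops.
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 5], 4: [1], 5: [3]}
sensitive = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
print(fair_neighborhood(0, adj, sensitive, k=4, seed=0))
```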
Clustering is a fundamental problem in network analysis that finds closely connected groups of nodes and separates them from the other nodes in the graph, while link prediction aims to predict whether two nodes in a network are likely to be linked. These definitions naturally imply that clustering must play a positive role in accurate link prediction. Yet researchers have long either ignored this positive relationship or exploited it in inappropriate ways that undermine it. In this article, we construct a simple but efficient clustering-driven link prediction framework (ClusterLP), with the goal of directly exploiting cluster structures to predict connections between nodes as accurately as possible in both undirected and directed graphs. Specifically, we propose that in undirected graphs it is easier to establish links between nodes with similar representation vectors and cluster tendencies. In directed graphs, by contrast, a node more easily points to nodes that are similar to its representation vector and that have greater influence in their own cluster. We customize the implementation of ClusterLP for undirected and directed graphs, respectively. Experimental results on multiple real-world networks show that our models are highly competitive with existing baseline models on the link prediction task. The code implementation of ClusterLP and the baselines we use are available at https://github.com/ZINUX1998/ClusterLP.
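ClusterLP's exact scoring function is not given in the abstract. The sketch below is a hypothetical reading of the undirected-graph intuition: a pair scores higher when both the embedding dot product and the overlap of soft cluster assignments are large. The soft assignment (a softmax over negative squared distances to given centroids), the additive combination, and the names `soft_clusters` and `link_score` are all assumptions.

```python
import numpy as np

def soft_clusters(z, centroids):
    """Soft cluster tendency of each node: softmax over negative squared
    distances from its embedding to each centroid."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 - (-d2).max(axis=1, keepdims=True)  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def link_score(z, centroids, u, v):
    """Sigmoid of embedding similarity plus cluster-tendency similarity."""
    c = soft_clusters(z, centroids)
    logit = z[u] @ z[v] + c[u] @ c[v]
    return float(1.0 / (1.0 + np.exp(-logit)))

z = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])  # node embeddings
centroids = np.array([[1.0, 0.0], [-1.0, 0.0]])      # two cluster centers
print(link_score(z, centroids, 0, 1), link_score(z, centroids, 0, 2))
```

Nodes 0 and 1 share both a direction and a cluster, so their pair scores well above the cross-cluster pair (0, 2).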
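Graph neural networks (GNNs) have proven to excel in predictive modeling tasks where the underlying data is a graph. However, as GNNs are extensively used in human-centered applications, the issue of fairness has arisen. While edge deletion is a common method used to promote fairness in GNNs, it fails to account for cases where the data is inherently missing fair connections. In this work, we consider the unexplored method of edge addition, accompanied by deletion, to promote fairness. We propose two model-agnostic algorithms to perform edge editing: a brute force approach and a continuous approximation approach, FairEdit. FairEdit performs efficient edge editing by leveraging the gradient information of a fairness loss to find edges that improve fairness. We find that FairEdit outperforms standard training on many datasets and GNN methods, while performing comparably to many state-of-the-art methods, demonstrating FairEdit's ability to improve fairness across many domains and models.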
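To make the brute-force variant concrete, the sketch below toggles every node pair once, adding the edge if absent and deleting it if present. It keeps the single edit that most reduces the statistical parity gap. The fixed one-layer mean-aggregation classifier and the names `parity_gap` and `brute_force_edit` are hypothetical stand-ins for a trained GNN and its fairness loss. FairEdit itself replaces this exhaustive search with gradient information.

```python
import itertools
import numpy as np

def parity_gap(adj, x, s, w):
    """Statistical parity gap of a toy classifier: mean-aggregate features over
    each node and its neighbors, then threshold a linear score at zero."""
    a_hat = adj + np.eye(len(adj))
    h = (a_hat @ x) / a_hat.sum(axis=1, keepdims=True)
    pred = (h @ w) > 0
    return abs(pred[s == 0].mean() - pred[s == 1].mean())

def brute_force_edit(adj, x, s, w):
    """Toggle each node pair once and return (best_gap, best_pair);
    best_pair is None if no single edit lowers the gap."""
    best_gap, best_pair = parity_gap(adj, x, s, w), None
    for u, v in itertools.combinations(range(len(adj)), 2):
        cand = adj.copy()
        cand[u, v] = cand[v, u] = 1.0 - cand[u, v]  # add if absent, delete if present
        gap = parity_gap(cand, x, s, w)
        if gap < best_gap:
            best_gap, best_pair = gap, (u, v)
    return best_gap, best_pair

# Toy graph: no edges, and the features separate the two groups perfectly.
adj = np.zeros((4, 4))
x = np.array([[1.0], [1.0], [-1.0], [-1.0]])
s = np.array([0, 0, 1, 1])
w = np.array([1.0])
print(brute_force_edit(adj, x, s, w))
```

On this toy graph, adding the cross-group edge (0, 2) halves the parity gap. The search needs one model evaluation per node pair, a cost quadratic in the number of nodes that a gradient-based approximation avoids.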
Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
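Fair machine learning aims to mitigate the biases of model predictions against certain subpopulations with respect to sensitive attributes such as race and gender. Among the many existing fairness notions, counterfactual fairness measures model fairness from a causal perspective by comparing the predictions for each individual in the original data with those for its counterfactuals, in which the individual's sensitive attribute values have been modified. Recently, a few works have extended counterfactual fairness to graph data, but most of them neglect the following facts that can lead to biases: 1) the sensitive attributes of each node's neighbors may causally affect the prediction w.r.t. this node; 2) the sensitive attributes may causally affect other features and the graph structure. To tackle these issues, in this paper, we propose a novel fairness notion, graph counterfactual fairness, which considers the biases caused by the above facts. To learn node representations towards graph counterfactual fairness, we propose a novel framework based on counterfactual data augmentation. In this framework, we generate counterfactuals corresponding to perturbations of each node's and its neighbors' sensitive attributes. We then enforce fairness by minimizing the discrepancy between the representations learned from the original graph and from the counterfactuals for each node. Experiments on both synthetic and real-world graphs show that our framework outperforms state-of-the-art baselines in graph counterfactual fairness while also achieving comparable prediction performance.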
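The core of counterfactual data augmentation can be sketched as follows. The sketch assumes a binary sensitive attribute stored as one feature column and a toy, untrained mean-aggregation encoder. It flips the sensitive attribute of a node and its neighbors, then measures how far the node's representation moves. This distance is the kind of discrepancy the framework minimizes during training. The naive flip ignores the second fact raised above, that the sensitive attribute causally affects other features and the graph structure. The framework accounts for that effect when generating counterfactuals. The names `encode` and `counterfactual_discrepancy` are hypothetical.

```python
import numpy as np

def encode(x, adj):
    """Toy, untrained encoder: average each node's features with its neighbors'."""
    a_hat = adj + np.eye(len(adj))
    return (a_hat @ x) / a_hat.sum(axis=1, keepdims=True)

def counterfactual_discrepancy(x, adj, s_col, node):
    """Flip the binary sensitive column for `node` and its neighbors, then return
    the L2 distance between the node's original and counterfactual embeddings."""
    x_cf = x.copy()
    targets = np.append(np.flatnonzero(adj[node]), node)
    x_cf[targets, s_col] = 1.0 - x_cf[targets, s_col]
    h, h_cf = encode(x, adj), encode(x_cf, adj)
    return float(np.linalg.norm(h[node] - h_cf[node]))

adj = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])  # path 0-1-2
x = np.array([[0.0, 1.0], [0.0, 2.0], [1.0, 3.0]])  # column 0 is the sensitive attribute
print(counterfactual_discrepancy(x, adj, s_col=0, node=0))
```

In training, one would add the mean of this distance over nodes to the task loss so that the encoder learns representations insensitive to the perturbation. With this fixed toy encoder, the value is only reported.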
While machine learning models have achieved unprecedented success in real-world applications, they might make biased/unfair decisions for specific demographic groups and hence produce discriminatory outcomes. Although research efforts have been devoted to measuring and mitigating bias, they mainly study bias from a result-oriented perspective while neglecting the bias encoded in the decision-making procedure. As a result, they cannot capture procedure-oriented bias, which limits their ability to fully debias a model. Fortunately, with the rapid development of explainable machine learning, explanations for predictions are now available to gain insights into the procedure. In this work, we bridge the gap between fairness and explainability by presenting a novel perspective of procedure-oriented fairness based on explanations. We identify procedure-based bias by measuring the gap in explanation quality between different groups with Ratio-based and Value-based Explanation Fairness. These new metrics further motivate us to design an optimization objective to mitigate procedure-based bias, and we observe that this objective also mitigates bias in the predictions. Based on this optimization objective, we propose a Comprehensive Fairness Algorithm (CFA), which simultaneously fulfills multiple objectives: improving traditional fairness, satisfying explanation fairness, and maintaining utility performance. Extensive experiments on real-world datasets demonstrate the effectiveness of CFA and highlight the importance of considering fairness from the explainability perspective. Our code is publicly available at https://github.com/YuyingZhao/FairExplanations-CFA.
Knowledge graph data are prevalent in real-world applications, and knowledge graph neural networks (KGNNs) are essential techniques for knowledge graph representation learning. Although KGNNs effectively model the structural information of knowledge graphs, these frameworks amplify the underlying data bias, which leads to discrimination against certain groups or individuals in downstream applications. Moreover, because existing debiasing approaches mainly focus on entity-wise bias, eliminating the multi-hop relational bias that pervasively exists in knowledge graphs remains an open question. Eliminating relational bias is very challenging because of the sparsity of the paths that generate the bias and the non-linear proximity structure of knowledge graphs. To tackle these challenges, we propose Fair-KGNN, a KGNN framework that simultaneously alleviates multi-hop bias and preserves the entity-to-relation proximity information in knowledge graphs. The proposed framework generalizes to mitigating relational bias for all types of KGNN. We develop two instances of Fair-KGNN that incorporate two state-of-the-art KGNN models, RGCN and CompGCN, to mitigate gender-occupation and nationality-salary bias. Experiments on three benchmark knowledge graph datasets demonstrate that Fair-KGNN can effectively mitigate unfair situations during representation learning while preserving the predictive performance of KGNN models.
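This work provides the first theoretical study of the ability of graph Message Passing Neural Networks (gMPNNs), such as Graph Neural Networks (GNNs), to perform inductive out-of-distribution (OOD) link prediction tasks, where the deployment (test) graphs are larger than the training graphs. We first prove non-asymptotic bounds showing that link predictors based on permutation-equivariant (structural) node embeddings obtained by gMPNNs can converge to a random guess as test graphs get larger. We then propose a theoretically sound gMPNN that outputs structural pairwise (2-node) embeddings. We prove non-asymptotic bounds showing that, as test graphs grow, these embeddings converge to embeddings of a continuous function that retains its ability to predict links. Empirical results on random graphs agree with our theoretical results.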
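Representation learning on graphs, also called graph embedding, has demonstrated its significant impact on a series of machine learning applications such as classification, prediction, and recommendation. However, existing work has largely ignored the rich information contained in the properties (or attributes) of both nodes and edges of graphs in modern applications, e.g., those represented by property graphs. To date, most existing graph embedding methods either focus on plain graphs with only the graph topology, or consider properties on nodes only. We propose PGE, a graph representation learning framework that incorporates both node and edge properties into the graph embedding procedure. PGE uses node clustering to assign biases that differentiate the neighbors of a node. It then leverages multiple data-driven matrices to aggregate the property information of neighbors sampled with a biased strategy. PGE adopts the popular inductive model for neighborhood aggregation. We provide detailed analyses of the efficacy of our method and validate the performance of PGE by showing how it achieves better embedding results than state-of-the-art graph embedding methods on benchmark applications, such as node classification and link prediction, over real-world datasets.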
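Below is a minimal sketch of cluster-biased neighbor sampling. It assumes each node already has a cluster label and that neighbors in the same cluster and in different clusters receive different sampling weights. The weights `w_same` and `w_diff` and the function name are hypothetical. PGE's actual bias values and its separate aggregation matrices are not reproduced here.

```python
import numpy as np

def biased_neighbor_sample(node, adj, cluster, n_samples, w_same=1.0, w_diff=2.0, seed=None):
    """Sample neighbors of `node` with replacement, weighting each neighbor by
    whether it shares `node`'s cluster label."""
    rng = np.random.default_rng(seed)
    nbrs = np.asarray(adj[node])
    same = np.array([cluster[v] == cluster[node] for v in nbrs])
    weights = np.where(same, w_same, w_diff).astype(float)
    return rng.choice(nbrs, size=n_samples, replace=True, p=weights / weights.sum())

adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
cluster = {0: 0, 1: 0, 2: 1, 3: 1}  # e.g. labels from a prior clustering step
print(biased_neighbor_sample(0, adj, cluster, n_samples=10, seed=0))
```

With the default weights, neighbors 2 and 3 (different cluster) are each twice as likely to be drawn as neighbor 1 (same cluster). The sampled neighbors would then be passed to the aggregation step.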
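Graph Neural Networks (GNNs) have shown satisfying performance in various graph analytical problems. Hence, they have become the de facto solution in a variety of decision-making scenarios. However, GNNs can yield biased results against certain demographic subgroups. Some recent works have empirically shown that the biased structure of the input network is a significant source of bias for GNNs. Nevertheless, no studies have systematically scrutinized which part of the input network structure leads to biased predictions for any given node. The low transparency on how the structure of the input network influences the bias in GNN outcomes largely limits the safe adoption of GNNs in various decision-critical scenarios. In this paper, we study the novel research problem of structural explanation of bias in GNNs. Specifically, we propose a novel post-hoc explanation framework to identify two edge sets: one that maximally accounts for the exhibited bias, and one that maximally contributes to the fairness level of the GNN prediction for any given node. Such explanations not only provide a comprehensive understanding of the bias/fairness of GNN predictions but also have practical significance in building an effective yet fair GNN model. Extensive experiments on real-world datasets validate the effectiveness of the proposed framework in delivering effective structural explanations for the bias of GNNs. Open-source code can be found at https://github.com/yushundong/referee.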
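In this paper, we aim to provide an effective Pairwise Learning Neural Link Prediction (PLNLP) framework. The framework treats link prediction as a pairwise learning-to-rank problem and consists of four main components: a neighborhood encoder, a link predictor, a negative sampler, and an objective function. The framework is flexible in that any generic graph neural convolution or link-prediction-specific neural architecture can be employed as the neighborhood encoder. For the link predictor, we design different scoring functions, which can be selected based on the type of graph. For the negative sampler, we provide several problem-specific sampling strategies. As the objective function, we propose an effective ranking loss that approximately maximizes the standard ranking metric AUC. We evaluate the proposed PLNLP framework on 4 link property prediction datasets from the Open Graph Benchmark: ogbl-ddi, ogbl-collab, ogbl-ppa, and ogbl-citation2. Using only a basic neural architecture, PLNLP achieves Top-1 performance on ogbl-ddi and Top-2 performance on ogbl-collab and ogbl-citation2. This performance demonstrates the effectiveness of PLNLP.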
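The ranking view can be made concrete. AUC is the fraction of (positive, negative) link pairs in which the positive link scores higher, and it is not differentiable. A common differentiable surrogate penalizes every pair by the squared shortfall of its score difference from a margin. The sketch below shows both, using the hypothetical names `auc` and `square_auc_loss`. It illustrates one surrogate of this kind and may not match PLNLP's exact loss.

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Exact AUC: fraction of (positive, negative) pairs ranked correctly,
    with ties counted as half."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

def square_auc_loss(pos_scores, neg_scores, margin=1.0):
    """Differentiable surrogate: mean over all pairs of (margin - (s_pos - s_neg))^2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(((margin - (pos - neg)) ** 2).mean())

pos = [0.9, 0.8]   # scores of observed (positive) links
neg = [0.1, 0.85]  # scores of sampled negative links
print(auc(pos, neg), square_auc_loss(pos, neg))
```

Minimizing the surrogate pushes every positive score above every negative score by roughly the margin, which raises AUC. Because the penalty is squared, pairs separated by more than the margin are also penalized, which keeps scores on a bounded scale.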
Graph Neural Networks (GNNs) have become increasingly important in recent years due to their state-of-the-art performance on many important downstream applications. Existing GNNs have mostly focused on learning a single node representation, even though a node often exhibits polysemous behavior in different contexts. In this work, we develop a persona-based graph neural network framework called PersonaSAGE that learns multiple persona-based embeddings for each node in the graph. Such disentangled representations are more interpretable and useful than a single embedding. Furthermore, PersonaSAGE learns the appropriate set of persona embeddings for each node in the graph, and every node can have a different number of assigned persona embeddings. The framework is flexible, and its general design makes the learned embeddings widely applicable across domains. We use publicly available benchmark datasets to evaluate our approach against a variety of baselines. The experiments demonstrate the effectiveness of PersonaSAGE on a variety of important tasks, including link prediction, where we achieve an average gain of 15% while remaining competitive on node classification. Finally, we demonstrate the utility of PersonaSAGE with a case study on personalized recommendation of different entity types in a data management platform.
