图形离群值检测是一项具有许多应用程序的新兴但至关重要的机器学习任务。尽管近年来算法扩散,但缺乏标准和统一的绩效评估设置限制了它们在现实世界应用中的进步和使用。为了利用差距,我们(据我们所知)(据我们所知)第一个全面的无监督节点离群值检测基准为unod,并带有以下亮点:(1)评估骨架从经典矩阵分解到最新图形神经的骨架的14个方法网络; (2)在现实世界数据集上使用不同类型的注射异常值和自然异常值对方法性能进行基准测试; (3)通过在不同尺度的合成图上使用运行时和GPU存储器使用算法的效率和可扩展性。基于广泛的实验结果的分析,我们讨论了当前渠道方法的利弊,并指出了多个关键和有希望的未来研究方向。
translated by 谷歌翻译
A large number of studies on Graph Outlier Detection (GOD) have emerged in recent years due to its wide applications, in which Unsupervised Node Outlier Detection (UNOD) on attributed networks is an important area. UNOD focuses on detecting two kinds of typical outliers in graphs: the structural outlier and the contextual outlier. Most existing works conduct experiments based on datasets with injected outliers. However, we find that the most widely-used outlier injection approach has a serious data leakage issue. By only utilizing such data leakage, a simple approach can achieve state-of-the-art performance in detecting outliers. In addition, we observe that most existing algorithms have a performance drop with varied injection settings. The other major issue is on balanced detection performance between the two types of outliers, which has not been considered by existing studies. In this paper, we analyze the cause of the data leakage issue in depth since the injection approach is a building block to advance UNOD. Moreover, we devise a novel variance-based model to detect structural outliers, which outperforms existing algorithms significantly at different injection settings. On top of this, we propose a new framework, Variance-based Graph Outlier Detection (VGOD), which combines our variance-based model and attribute reconstruction model to detect outliers in a balanced way. Finally, we conduct extensive experiments to demonstrate the effectiveness and efficiency of VGOD. The results on 5 real-world datasets validate that VGOD achieves not only the best performance in detecting outliers but also a balanced detection performance between structural and contextual outliers. Our code is available at https://github.com/goldenNormal/vgod-github.
translated by 谷歌翻译
考虑到过去几十年中开发的一长串异常检测算法,它们如何在(i)(i)不同级别的监督,(ii)不同类型的异常以及(iii)嘈杂和损坏的数据方面执行?在这项工作中,我们通过(据我们所知)在55个名为Adbench的55个基准数据集中使用30个算法来回答这些关键问题。我们的广泛实验(总共93,654)确定了对监督和异常类型的作用的有意义的见解,并解锁了研究人员在算法选择和设计中的未来方向。借助Adbench,研究人员可以轻松地对数据集(包括我们从自然语言和计算机视觉域的贡献)对现有基线的新提出的方法进行全面和公平的评估。为了促进可访问性和可重复性,我们完全开源的Adbench和相应的结果。
translated by 谷歌翻译
Anomaly analytics is a popular and vital task in various research contexts, which has been studied for several decades. At the same time, deep learning has shown its capacity in solving many graph-based tasks like, node classification, link prediction, and graph classification. Recently, many studies are extending graph learning models for solving anomaly analytics problems, resulting in beneficial advances in graph-based anomaly analytics techniques. In this survey, we provide a comprehensive overview of graph learning methods for anomaly analytics tasks. We classify them into four categories based on their model architectures, namely graph convolutional network (GCN), graph attention network (GAT), graph autoencoder (GAE), and other graph learning models. The differences between these methods are also compared in a systematic manner. Furthermore, we outline several graph-based anomaly analytics applications across various domains in the real world. Finally, we discuss five potential future research directions in this rapidly growing field.
translated by 谷歌翻译
对于由硬件和软件组件组成的复杂分布式系统而言,异常检测是一个重要的问题。对此类系统的异常检测的要求和挑战的透彻理解对于系统的安全性至关重要,尤其是对于现实世界的部署。尽管有许多解决问题的研究领域和应用领域,但很少有人试图对这种系统进行深入研究。大多数异常检测技术是针对某些应用域的专门开发的,而其他检测技术则更为通用。在这项调查中,我们探讨了基于图的算法在复杂分布式异质系统中识别和减轻不同类型异常的重要潜力。我们的主要重点是在分布在复杂分布式系统上的异质计算设备上应用时,可深入了解图。这项研究分析,比较和对比该领域的最新研究文章。首先,我们描述了现实世界分布式系统的特征及其在复杂网络中的异常检测的特定挑战,例如数据和评估,异常的性质以及现实世界的要求。稍后,我们讨论了为什么可以在此类系统中利用图形以及使用图的好处。然后,我们将恰当地深入研究最先进的方法,并突出它们的优势和劣势。最后,我们评估和比较这些方法,并指出可能改进的领域。
translated by 谷歌翻译
Recently, graph anomaly detection has attracted increasing attention in data mining and machine learning communities. Apart from existing attribute anomalies, graph anomaly detection also captures suspicious topological-abnormal nodes that differ from the major counterparts. Although massive graph-based detection approaches have been proposed, most of them focus on node-level comparison while pay insufficient attention on the surrounding topology structures. Nodes with more dissimilar neighborhood substructures have more suspicious to be abnormal. To enhance the local substructure detection ability, we propose a novel Graph Anomaly Detection framework via Multi-scale Substructure Learning (GADMSL for abbreviation). Unlike previous algorithms, we manage to capture anomalous substructures where the inner similarities are relatively low in dense-connected regions. Specifically, we adopt a region proposal module to find high-density substructures in the network as suspicious regions. Their inner-node embedding similarities indicate the anomaly degree of the detected substructures. Generally, a lower degree of embedding similarities means a higher probability that the substructure contains topology anomalies. To distill better embeddings of node attributes, we further introduce a graph contrastive learning scheme, which observes attribute anomalies in the meantime. In this way, GADMSL can detect both topology and attribute anomalies. Ultimately, extensive experiments on benchmark datasets show that GADMSL greatly improves detection performance (up to 7.30% AUC and 17.46% AUPRC gains) compared to state-of-the-art attributed networks anomaly detection algorithms.
translated by 谷歌翻译
图表表示学习是一种快速增长的领域,其中一个主要目标是在低维空间中产生有意义的图形表示。已经成功地应用了学习的嵌入式来执行各种预测任务,例如链路预测,节点分类,群集和可视化。图表社区的集体努力提供了数百种方法,但在所有评估指标下没有单一方法擅长,例如预测准确性,运行时间,可扩展性等。该调查旨在通过考虑算法来评估嵌入方法的所有主要类别的图表变体,参数选择,可伸缩性,硬件和软件平台,下游ML任务和多样化数据集。我们使用包含手动特征工程,矩阵分解,浅神经网络和深图卷积网络的分类法组织了图形嵌入技术。我们使用广泛使用的基准图表评估了节点分类,链路预测,群集和可视化任务的这些类别算法。我们在Pytorch几何和DGL库上设计了我们的实验,并在不同的多核CPU和GPU平台上运行实验。我们严格地审查了各种性能指标下嵌入方法的性能,并总结了结果。因此,本文可以作为比较指南,以帮助用户选择最适合其任务的方法。
translated by 谷歌翻译
与其他图表相比,图形级异常检测(GAD)描述了检测其结构和/或其节点特征的图表的问题。GAD中的一个挑战是制定图表表示,该图表示能够检测本地和全局 - 异常图,即它们的细粒度(节点级)或整体(图级)属性异常的图形,分别。为了解决这一挑战,我们介绍了一种新的深度异常检测方法,用于通过图表和节点表示的联合随机蒸馏学习丰富的全球和局部正常模式信息。通过训练一个GNN来实现随机初始化网络权重的另一GNN来实现随机蒸馏。来自各种域的16个真实图形数据集的广泛实验表明,我们的模型显着优于七种最先进的模型。代码和数据集可以在https://git.io/llocalkd中获得。
translated by 谷歌翻译
异常值是一个事件或观察,其被定义为不同于距群体的不规则距离的异常活动,入侵或可疑数据点。然而,异常事件的定义是主观的,取决于应用程序和域(能量,健康,无线网络等)。重要的是要尽可能仔细地检测异常事件,以避免基础设施故障,因为异常事件可能导致对基础设施的严重损坏。例如,诸如微电网的网络物理系统的攻击可以发起电压或频率不稳定性,从而损坏涉及非常昂贵的修复的智能逆变器。微电网中的不寻常活动可以是机械故障,行为在系统中发生变化,人体或仪器错误或恶意攻击。因此,由于其可变性,异常值检测(OD)是一个不断增长的研究领域。在本章中,我们讨论了使用AI技术的OD方法的进展。为此,通过多个类别引入每个OD模型的基本概念。广泛的OD方法分为六大类:基于统计,基于距离,基于密度的,基于群集的,基于学习的和合奏方法。对于每个类别,我们讨论最近最先进的方法,他们的应用领域和表演。之后,关于对未来研究方向的建议提供了关于各种技术的优缺点和挑战的简要讨论。该调查旨在指导读者更好地了解OD方法的最新进展,以便保证AI。
translated by 谷歌翻译
近年来,由于其在研究和实践中的重要性,对归属网络的异常检测问题有望的兴趣。虽然已经提出了各种方法来解决这个问题,但存在两种主要限制:(1)由于缺乏监控信号,未经监督的方法通常会效率低得多,(2)现有的异常检测方法仅使用本地语境信息来检测异常信息以检测异常信息节点,例如,单跳或两跳信息,但忽略全局上下文信息。由于异常节点与结构和属性中的正常节点不同,因此如果我们删除连接异常和正常节点的边缘,异常节点和其邻居之间的距离应该大于正常节点和其邻居之间的距离直观。因此,基于全局和本地上下文信息的跳数可以作为异常的指标。通过这种直觉激励,我们提出了一种基于跳数的模型(HCM)来通过建模本地和全局上下文信息来检测异常。为了更好地利用异常识别的跳跃计数,我们建议使用跳数预测作为自我监督任务。我们根据HOP计数通过HCM模型设计了两个异常的分数来识别异常。此外,我们雇用贝叶斯学习培训HCM模型,以捕获学习参数的不确定性,避免过度装备。关于现实世界归属网络的广泛实验表明,我们所提出的模型在异常检测中是有效的。
translated by 谷歌翻译
Anomaly detection is defined as discovering patterns that do not conform to the expected behavior. Previously, anomaly detection was mostly conducted using traditional shallow learning techniques, but with little improvement. As the emergence of graph neural networks (GNN), graph anomaly detection has been greatly developed. However, recent studies have shown that GNN-based methods encounter challenge, in that no graph anomaly detection algorithm can perform generalization on most datasets. To bridge the tap, we propose a multi-view fusion approach for graph anomaly detection (Mul-GAD). The view-level fusion captures the extent of significance between different views, while the feature-level fusion makes full use of complementary information. We theoretically and experimentally elaborate the effectiveness of the fusion strategies. For a more comprehensive conclusion, we further investigate the effect of the objective function and the number of fused views on detection performance. Exploiting these findings, our Mul-GAD is proposed equipped with fusion strategies and the well-performed objective function. Compared with other state-of-the-art detection methods, we achieve a better detection performance and generalization in most scenarios via a series of experiments conducted on Pubmed, Amazon Computer, Amazon Photo, Weibo and Books. Our code is available at https://github.com/liuyishoua/Mul-Graph-Fusion.
translated by 谷歌翻译
我们提出了TOD,这是一个在分布式多GPU机器上进行有效且可扩展的离群检测(OD)的系统。 TOD背后的一个关键思想是将OD应用程序分解为基本张量代数操作。这种分解使TOD能够通过利用硬件和软件中深度学习基础架构的最新进展来加速OD计算。此外,要在有限内存的现代GPU上部署昂贵的OD算法,我们引入了两种关键技术。首先,可证明的量化可以加快OD计算的速度,并通过以较低的精度执行特定的浮点操作来减少其内存足迹,同时证明没有准确的损失。其次,为了利用多个GPU的汇总计算资源和内存能力,我们引入了自动批处理,该批次将OD计算分解为小批次,以便在多个GPU上并行执行。 TOD支持一套全面且多样化的OD算法,例如LOF,PCA和HBOS以及实用程序功能。对真实和合成OD数据集的广泛评估表明,TOD平均比领先的基于CPU的OD系统PYOD快11.6倍(最大加速度为38.9倍),并且比各种GPU底线要处理的数据集更大。值得注意的是,TOD可以直接整合其他OD算法,并提供了将经典OD算法与深度学习方法相结合的统一框架。这些组合产生了无限数量的OD方法,其中许多方法是新颖的,可以很容易地在TOD中进行原型。
translated by 谷歌翻译
本次调查绘制了用于分析社交媒体数据的生成方法的研究状态的广泛的全景照片(Sota)。它填补了空白,因为现有的调查文章在其范围内或被约会。我们包括两个重要方面,目前正在挖掘和建模社交媒体的重要性:动态和网络。社会动态对于了解影响影响或疾病的传播,友谊的形成,友谊的形成等,另一方面,可以捕获各种复杂关系,提供额外的洞察力和识别否则将不会被注意的重要模式。
translated by 谷歌翻译
We present the OPEN GRAPH BENCHMARK (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains, ranging from social and information networks to biological networks, molecular graphs, source code ASTs, and knowledge graphs. For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics. In addition to building the datasets, we also perform extensive benchmark experiments for each dataset. Our experiments suggest that OGB datasets present significant challenges of scalability to large-scale graphs and out-of-distribution generalization under realistic data splits, indicating fruitful opportunities for future research. Finally, OGB provides an automated end-to-end graph ML pipeline that simplifies and standardizes the process of graph data loading, experimental setup, and model evaluation. OGB will be regularly updated and welcomes inputs from the community. OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu.
translated by 谷歌翻译
由于其在许多有影响力的领域中的广泛应用,归因网络上的图形异常检测已成为普遍的研究主题。在现实情况下,属性网络中的节点和边缘通常显示出不同的异质性,即不同类型的节点的属性显示出大量的多样性,不同类型的关系表示多种含义。在这些网络中,异常在异质性的各个角度上的表现通常与大多数不同。但是,现有的图异常检测方法不能利用归因网络中的异质性,这与异常检测高度相关。鉴于这个问题,我们提出了前方的提议:基于编码器解码器框架的异质性无监督图异常检测方法。具体而言,对于编码器,我们设计了三个关注级别,即属性级别,节点类型级别和边缘级别的关注,以捕获网络结构的异质性,节点属性和单个节点的信息。在解码器中,我们利用结构,属性和节点类型重建项来获得每个节点的异常得分。广泛的实验表明,与无监督环境中的艺术品相比,在几个现实世界中的异质信息网络上,前方的优势。进一步的实验验证了我们三重注意力,模型骨干和解码器的有效性和鲁棒性。
translated by 谷歌翻译
Graph Neural Networks (GNNs) have been widely applied to different tasks such as bioinformatics, drug design, and social networks. However, recent studies have shown that GNNs are vulnerable to adversarial attacks which aim to mislead the node or subgraph classification prediction by adding subtle perturbations. Detecting these attacks is challenging due to the small magnitude of perturbation and the discrete nature of graph data. In this paper, we propose a general adversarial edge detection pipeline EDoG without requiring knowledge of the attack strategies based on graph generation. Specifically, we propose a novel graph generation approach combined with link prediction to detect suspicious adversarial edges. To effectively train the graph generative model, we sample several sub-graphs from the given graph data. We show that since the number of adversarial edges is usually low in practice, with low probability the sampled sub-graphs will contain adversarial edges based on the union bound. In addition, considering the strong attacks which perturb a large number of edges, we propose a set of novel features to perform outlier detection as the preprocessing for our detection. Extensive experimental results on three real-world graph datasets including a private transaction rule dataset from a major company and two types of synthetic graphs with controlled properties show that EDoG can achieve above 0.8 AUC against four state-of-the-art unseen attack strategies without requiring any knowledge about the attack type; and around 0.85 with knowledge of the attack type. EDoG significantly outperforms traditional malicious edge detection baselines. We also show that an adaptive attack with full knowledge of our detection pipeline is difficult to bypass it.
translated by 谷歌翻译
Time series anomaly detection has applications in a wide range of research fields and applications, including manufacturing and healthcare. The presence of anomalies can indicate novel or unexpected events, such as production faults, system defects, or heart fluttering, and is therefore of particular interest. The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey focuses on providing structured and comprehensive state-of-the-art time series anomaly detection models through the use of deep learning. It providing a taxonomy based on the factors that divide anomaly detection models into different categories. Aside from describing the basic anomaly detection technique for each category, the advantages and limitations are also discussed. Furthermore, this study includes examples of deep anomaly detection in time series across various application domains in recent years. It finally summarises open issues in research and challenges faced while adopting deep anomaly detection models.
translated by 谷歌翻译
关于图表的深度学习最近吸引了重要的兴趣。然而,大多数作品都侧重于(半)监督学习,导致缺点包括重标签依赖,普遍性差和弱势稳健性。为了解决这些问题,通过良好设计的借口任务在不依赖于手动标签的情况下提取信息知识的自我监督学习(SSL)已成为图形数据的有希望和趋势的学习范例。与计算机视觉和自然语言处理等其他域的SSL不同,图表上的SSL具有独家背景,设计理念和分类。在图表的伞下自我监督学习,我们对采用图表数据采用SSL技术的现有方法及时及全面的审查。我们构建一个统一的框架,数学上正式地规范图表SSL的范例。根据借口任务的目标,我们将这些方法分为四类:基于生成的,基于辅助性的,基于对比的和混合方法。我们进一步描述了曲线图SSL在各种研究领域的应用,并总结了绘图SSL的常用数据集,评估基准,性能比较和开源代码。最后,我们讨论了该研究领域的剩余挑战和潜在的未来方向。
translated by 谷歌翻译
在线零售平台,积极检测交易风险至关重要,以提高客户体验,并尽量减少财务损失。在这项工作中,我们提出了一种可解释的欺诈行为预测框架,主要由探测器和解释器组成。 Xfraud探测器可以有效和有效地预测进货交易的合法性。具体地,它利用异构图形神经网络来从事务日志中的信息的非渗透键入实体中学习表达式表示。 Xfraud中的解释器可以从图表中生成有意义和人性化的解释,以便于业务部门中的进一步进程。在我们对具有高达11亿节点和37亿边缘的实际交易网络上的Xfraud实验中,XFraud能够在许多评估度量中倾销各种基线模型,同时在分布式设置中剩余可扩展。此外,我们表明,XFraud解释者可以通过定量和定性评估来显着帮助业务分析来产生合理的解释。
translated by 谷歌翻译
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
translated by 谷歌翻译