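Automatically detecting anomalous trajectories is an important problem with numerous applications in intelligent transportation systems. Many existing studies focus on distinguishing anomalous trajectories from normal ones and ignore the large differences among anomalous trajectories. A recent study made considerable progress in identifying anomalous trajectory patterns and proposed a two-stage algorithm for anomalous trajectory detection and classification (ATDC). The algorithm performs very well but suffers from some limitations, such as high time complexity and poor interpretability. Here, we present a careful theoretical and empirical analysis of the ATDC algorithm, showing that the calculation of anomaly scores in both stages can be simplified and that the second stage of the algorithm is much more important than the first. Hence, we develop a FastATDC algorithm that introduces a random sampling strategy in both stages. Experimental results show that FastATDC is 10 to 20 times faster than ATDC on real datasets. Moreover, FastATDC outperforms the baseline algorithms and is comparable to the ATDC algorithm.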
Translated by Google Translate
Existing measures and representations for trajectories have two longstanding fundamental shortcomings: they are computationally expensive, and they cannot guarantee the 'uniqueness' property of a distance function, i.e., $\mathrm{dist}(X,Y) = 0$ if and only if $X = Y$, where $X$ and $Y$ are two trajectories. This paper proposes a simple yet powerful way to represent trajectories and measure the similarity between two trajectories using a distributional kernel to address these shortcomings. It is a principled approach based on kernel mean embedding, which has a strong theoretical underpinning. It has three distinctive features in comparison with existing approaches. (1) A distributional kernel is used for the very first time for trajectory representation and similarity measurement. (2) It does not rely on point-to-point distances, which are used in most existing distances for trajectories. (3) It requires no learning, unlike existing learning and deep learning approaches. We show the generality of this new approach in three applications: (a) trajectory anomaly detection, (b) anomalous sub-trajectory detection, and (c) trajectory pattern mining. We identify that the distributional kernel has (i) a unique data-dependent property and the above uniqueness property, which are the key factors that lead to its superior task-specific performance; and (ii) a runtime that is orders of magnitude faster than existing distance measures.
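The kernel mean embedding idea mentioned above can be illustrated with a minimal sketch. It treats each trajectory as an unordered sample of points and compares the mean embeddings of the two samples. The Gaussian point kernel, the `sigma` parameter, and all function names are assumptions made for illustration. The abstract does not say which kernel the paper uses, and a Gaussian kernel is itself computed from point-to-point distances, so this sketch shows only the structure of a distributional kernel, not the data-dependent property or the runtime behaviour claimed in the abstract.

```python
import math

def gaussian_kernel(p, q, sigma=1.0):
    """Gaussian kernel between two points given as coordinate tuples.
    (Assumed stand-in kernel; not necessarily the one used in the paper.)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def distributional_kernel(X, Y, sigma=1.0):
    """Inner product of the empirical kernel mean embeddings of X and Y,
    i.e. the average of k(x, y) over all pairs of points."""
    total = sum(gaussian_kernel(x, y, sigma) for x in X for y in Y)
    return total / (len(X) * len(Y))

def embedding_distance(X, Y, sigma=1.0):
    """Squared distance between the two mean embeddings:
    K(X, X) + K(Y, Y) - 2 K(X, Y)."""
    return (distributional_kernel(X, X, sigma)
            + distributional_kernel(Y, Y, sigma)
            - 2.0 * distributional_kernel(X, Y, sigma))
```

Under these assumptions, `embedding_distance(X, X)` is zero, and two trajectories that occupy different regions of space receive a clearly positive distance. Note that the order of points within a trajectory is ignored in this sketch.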
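Logs are imperative for ensuring the reliability and continuity of many software systems, especially large-scale distributed systems. They faithfully record runtime information to facilitate system troubleshooting and behavior understanding. Due to the large scale and complexity of modern software systems, the volume of logs has reached an unprecedented level. Consequently, for log-based anomaly detection, conventional manual inspection and even traditional machine learning-based methods have become impractical, which has served as a catalyst for the rapid development of deep learning-based solutions. However, there is currently a lack of rigorous comparison among the representative log-based anomaly detectors that resort to neural networks. Moreover, re-implementing them requires non-trivial effort, and bias can easily be introduced. To better understand the characteristics of different anomaly detectors, in this paper we provide a comprehensive review and evaluation of five popular neural networks used by six state-of-the-art methods. In particular, four of the selected methods are unsupervised and the remaining two are supervised. These methods are evaluated on two publicly available log datasets, which contain nearly 16 million log messages and 0.4 million anomaly instances in total. We believe that our work can serve as a basis in this field and contribute to future academic research and industrial applications.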
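How can we detect anomalies, that is, samples that differ significantly from a given set of high-dimensional data such as images or sensor data? This is a practical problem with numerous applications, and it is also relevant to the goal of making learning algorithms more robust to unexpected inputs. Autoencoders are a popular approach, partly because of their simplicity and their ability to perform dimensionality reduction. However, the anomaly scoring function is not adaptive to the natural variation in reconstruction error across the range of normal samples, which hinders their ability to detect real anomalies. In this paper, we empirically demonstrate the importance of local adaptivity for anomaly scoring in experiments with real data. We then propose a novel adaptive reconstruction-error-based scoring approach, which adapts its scoring based on the local behaviour of the reconstruction error over the latent space. We show that this improves anomaly detection performance over relevant baselines on a wide variety of benchmark datasets.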
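One way to picture locally adaptive scoring is the hypothetical sketch below. It assumes that an autoencoder has already produced a latent code and a reconstruction error for every training sample, and it standardises a test sample's error against the errors of its `k` nearest training neighbours in latent space. The function name, the k-nearest-neighbour normalisation, and the parameters `k` and `eps` are illustrative assumptions, not the scoring rule proposed in the paper.

```python
import math

def local_adaptive_score(z, err, train_z, train_err, k=5, eps=1e-8):
    """Standardise a reconstruction error against its latent-space neighbourhood.

    z, err     : latent code (tuple) and reconstruction error of the test sample
    train_z    : latent codes of the training samples
    train_err  : reconstruction errors of the training samples
    """
    # Indices of the k training samples closest to z in latent space.
    nearest = sorted(range(len(train_z)),
                     key=lambda i: math.dist(z, train_z[i]))[:k]
    local = [train_err[i] for i in nearest]
    mean = sum(local) / len(local)
    var = sum((e - mean) ** 2 for e in local) / len(local)
    # Positive values mean the error is unusually high for this region.
    return (err - mean) / (math.sqrt(var) + eps)
```

With this normalisation, the same raw error can be flagged in a region where normal samples reconstruct well and accepted in a region where normal samples typically reconstruct poorly.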
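The visualization and detection of anomalies (outliers) are of crucial importance to many fields, particularly cybersecurity. Several approaches have been proposed in these fields, yet to the best of our knowledge none of them has fulfilled both objectives, simultaneously or cooperatively, in one coherent framework. The visualization methods of these approaches were introduced to explain the output of a detection algorithm, not for data exploration that facilitates standalone visual detection. This is our point of departure: UN-AVOIDS, an unsupervised and nonparametric approach for both visualization (a human process) and detection (an algorithmic process) of outliers, which assigns invariant anomaly scores (normalized to $[0,1]$) rather than hard binary decisions. The main aspect of novelty of UN-AVOIDS is that it transforms data into a new space, introduced in this paper as the neighborhood cumulative density function (NCDF), in which both visualization and detection are carried out. In this space, outliers are remarkably visually distinguishable, and therefore the anomaly scores assigned by the detection algorithm achieve a high area under the ROC curve (AUC). We assessed UN-AVOIDS on both simulated and recently published cybersecurity datasets, and compared it to three of the most successful anomaly detection methods: LOF, IF, and FABOD. In terms of AUC, UN-AVOIDS was almost an overall winner. The article concludes by providing a preview of new theoretical and practical avenues for UN-AVOIDS. Among them is the design of visualization-aided anomaly detection (VAAD), a type of software that provides the UN-AVOIDS detection algorithm (running in a back engine), the NCDF visualization space (rendered as plots), and other conventional visualization methods in the original feature space, all linked in one interactive environment.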
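Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches proposed so far in the literature have severe limitations: they either require prior domain knowledge that is used to design the anomaly discovery algorithms, or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this work, we address these problems and propose a method suitable for domain-agnostic subsequence anomaly detection. Our method, Series2Graph, is based on a graph representation of a novel low-dimensional embedding of subsequences. Series2Graph needs neither labeled instances (as supervised techniques do) nor anomaly-free data (as zero-positive learning techniques do), and it identifies anomalies of varying lengths. Experimental results on the largest set of synthetic and real datasets used to date demonstrate that the proposed approach correctly identifies single and recurrent anomalies without any prior knowledge of their characteristics, outperforming several competing approaches in accuracy by a large margin while being up to orders of magnitude faster. This paper appeared in VLDB 2020.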
The detection of anomalies in time series data is crucial in a wide range of applications, such as system monitoring, health care or cyber security. While the vast number of available methods makes selecting the right method for a certain application hard enough, different methods also have different strengths, e.g. regarding the type of anomalies they are able to find. In this work, we compare six unsupervised anomaly detection methods of different complexities to answer two questions: Do the more complex methods usually perform better? And are there specific anomaly types that those methods are tailored to? The comparison is done on the UCR anomaly archive, a recent benchmark dataset for anomaly detection. We compare the six methods by analyzing the experimental results at the dataset level and the anomaly-type level after tuning the necessary hyperparameters for each method. Additionally, we examine the ability of individual methods to incorporate prior knowledge about the anomalies and analyse the differences between point-wise and sequence-wise features. We show with broad experiments that the classical machine learning methods show superior performance compared to the deep learning methods across a wide range of anomaly types.
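Analyzing sequence data usually leads to the discovery of interesting patterns, followed by anomaly detection. In recent years, numerous frameworks and methods have been proposed to discover interesting patterns in sequence data and to detect anomalous behavior. However, existing algorithms mainly focus on frequency-driven analytics, and they are challenging to apply in real-world settings. In this work, we present a new anomaly detection framework called DUOS that can discover utility-aware outlier sequential rules from a set of sequences. In this pattern-based anomaly detection algorithm, we incorporate both the anomalousness and the utility of a group, and then introduce the concept of a utility-aware outlier sequential rule (UOSR). We show that this is a more meaningful way of detecting anomalies. In addition, we propose some efficient pruning strategies w.r.t. upper bounds for mining UOSRs, as well as for outlier detection. An extensive experimental study on several real-world datasets shows that the proposed DUOS algorithm has better effectiveness and efficiency. Finally, DUOS outperforms the baseline algorithm and has suitable scalability.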
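In recent years, hyperspectral anomaly detection (HAD) has become an active topic and plays a significant role in military and civilian fields. As a classic HAD method, the collaborative representation-based detector (CRD) has attracted extensive attention and in-depth research. Despite the good performance of CRD, its computational cost, which arises mainly from the sliding dual-window strategy, is too high for wide application. Moreover, multiple repeated tests are needed to determine the size of the dual window, which must be reset once the dataset changes and cannot be identified in advance from prior knowledge. To alleviate this problem, we propose a novel ensemble and random collaborative representation-based detector (ERCRD), which comprises two closely related stages. First, we apply random sub-sampling to CRD (RCRD) to obtain multiple detection results in place of the sliding dual-window strategy, which significantly reduces the computational complexity and makes the method more feasible in practical applications. Second, ensemble learning is used to refine the multiple results of RCRD, which act as various "experts" that provide abundant complementary information to better target different anomalies. These two stages form an organic and theoretically grounded detector, which not only improves the accuracy and stability of HAD methods but also enhances their generalization ability. Experiments on four real hyperspectral datasets demonstrate the accuracy and efficiency of the proposed ERCRD method compared with state-of-the-art methods.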
Recently, graph anomaly detection has attracted increasing attention in the data mining and machine learning communities. Apart from existing attribute anomalies, graph anomaly detection also captures suspicious topologically abnormal nodes that differ from the majority of their counterparts. Although a large number of graph-based detection approaches have been proposed, most of them focus on node-level comparison while paying insufficient attention to the surrounding topological structures. Nodes with more dissimilar neighborhood substructures are more likely to be abnormal. To enhance the local substructure detection ability, we propose a novel Graph Anomaly Detection framework via Multi-scale Substructure Learning (GADMSL for short). Unlike previous algorithms, we manage to capture anomalous substructures whose inner similarities are relatively low in densely connected regions. Specifically, we adopt a region proposal module to find high-density substructures in the network as suspicious regions. Their inner-node embedding similarities indicate the anomaly degree of the detected substructures. Generally, a lower degree of embedding similarity means a higher probability that the substructure contains topology anomalies. To distill better embeddings of node attributes, we further introduce a graph contrastive learning scheme, which also observes attribute anomalies. In this way, GADMSL can detect both topology and attribute anomalies. Ultimately, extensive experiments on benchmark datasets show that GADMSL greatly improves detection performance (up to 7.30% AUC and 17.46% AUPRC gains) compared to state-of-the-art anomaly detection algorithms for attributed networks.
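Video anomaly detection (VAD) mainly refers to identifying anomalous events that have not occurred in the training set, where only normal samples are available. Existing works usually formulate VAD as a reconstruction or prediction problem. However, the adaptability and scalability of these methods are limited. In this paper, we propose a novel distance-based VAD method that can exploit all the available normal data efficiently and flexibly. In our method, the smaller the distance between a test sample and the normal samples, the higher the probability that the test sample is normal. Specifically, we propose to use locality-sensitive hashing (LSH) to map samples whose similarity exceeds a certain threshold into the same bucket in advance. In this way, the complexity of near-neighbor search is reduced significantly. To bring semantically similar samples closer together and push dissimilar samples further apart, we propose a novel learnable version of LSH that embeds LSH into a neural network and optimizes the hash functions with a contrastive learning strategy. The proposed method is robust to data imbalance and can flexibly handle large intra-class variations in normal data. In addition, it scales well. Extensive experiments demonstrate the advantages of our method, which achieves new state-of-the-art results on VAD benchmarks.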
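The bucketing step described above can be sketched with classic random-hyperplane LSH. This is a swapped-in, non-learned variant chosen for illustration: the paper's learnable LSH, its neural network, and its contrastive training are not reproduced here, and every function name and parameter below is an assumption. Normal feature vectors are hashed into buckets once, and a test vector is scored by its distance to the nearest normal vector in its own bucket only.

```python
import math
import random
from collections import defaultdict

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(int(sum(w * v for w, v in zip(plane, vec)) >= 0.0)
                 for plane in hyperplanes)

def build_index(normal_samples, hyperplanes):
    """Group normal samples by signature so lookups touch a single bucket."""
    index = defaultdict(list)
    for sample in normal_samples:
        index[lsh_signature(sample, hyperplanes)].append(sample)
    return index

def anomaly_score(query, index, hyperplanes):
    """Distance to the nearest normal sample in the query's bucket.
    An empty bucket is treated as maximally anomalous."""
    bucket = index.get(lsh_signature(query, hyperplanes), [])
    if not bucket:
        return float("inf")
    return min(math.dist(query, sample) for sample in bucket)
```

Vectors that point in similar directions tend to share a signature, so the nearest-neighbour search is restricted to a small candidate set instead of scanning all normal samples.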
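Monitoring network traffic data to detect hidden patterns of anomalies is a challenging and time-consuming task that requires substantial computing resources. To this end, an appropriate summarization technique is of great importance, since the summary can serve as a substitute for the original data. However, summarization risks removing the anomalies. It is therefore vital to create a summary that reflects the same patterns as the original data. In this paper, we propose an intelligent summarization approach for identifying hidden anomalies, called INSIDENT. The proposed approach guarantees that the original data distribution is preserved in the summarized data. Our approach is a clustering-based algorithm that dynamically maps the original feature space to a new feature space by locally weighting features in each cluster. In the new feature space, similar samples are therefore closer together and, consequently, outliers are more detectable. In addition, selecting representatives based on cluster size keeps the same distribution in the summarized data as in the original data. INSIDENT can be used both as a preprocessing step before running anomaly detection algorithms and as an anomaly detection algorithm itself. Experimental results on benchmark datasets show that a summary of the data can substitute for the original data in the anomaly detection task.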
Isolation forest
Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiling normal points. To the best of our knowledge, the concept of isolation has not been explored in the current literature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm that has linear time complexity with a low constant and a low memory requirement. Our empirical evaluation shows that iForest performs favourably compared to ORCA (a near-linear time complexity distance-based method), LOF and Random Forests in terms of AUC and processing time, especially on large data sets. iForest also works well in high-dimensional problems that have a large number of irrelevant attributes, and in situations where the training set does not contain any anomalies.
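The isolation idea can be sketched in a few lines of plain Python. The following is a minimal illustration written for this summary, not the authors' implementation: each tree is grown on a random sub-sample by picking a random attribute and a random split value, and a point's anomaly score is 2^(-E[h]/c(psi)), where E[h] is its average path length over the trees and c(psi) is the expected path length for a sub-sample of size psi. The dictionary-based tree layout, the function names, and the defaults are assumptions, and the data is assumed to be a list of at least two equal-length numeric tuples.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random attribute at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this attribute; stop here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(x, node, depth=0):
    """Depth at which x lands, plus an adjustment for unsplit leaf points."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Grow n_trees isolation trees, each on a random sub-sample of the data."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi

def anomaly_score(x, forest):
    """Score in (0, 1); values closer to 1 indicate easier isolation."""
    trees, psi = forest
    mean_depth = sum(path_length(x, tree) for tree in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(psi))
```

Points far from the bulk of the data are separated after only a few random splits, so their average path length is short and their score is high. Because every tree sees only `sample_size` points, training cost does not depend on how densely the normal region is populated.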
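The accumulation of time-series data and the absence of labels make time-series anomaly detection (AD) a self-supervised deep learning task. Methods based on a single assumption can only touch on certain aspects of the whole normality and are insufficient for detecting various anomalies. Among them, contrastive learning methods adopted for AD always choose negative pairs that are both normal, which runs counter to the purpose of the AD task. Existing methods based on multiple assumptions are usually two-staged: they first apply a pre-training process whose objective may differ from AD, so their performance is limited by the pre-trained representations. This paper proposes a deep contrastive one-class anomaly detection method (COCA), which combines the normality assumptions of contrastive learning and one-class classification. The key idea is to treat the representation and the reconstructed representation as the positive pair of negative-sample-free contrastive learning, which we name sequence contrast. We then apply a contrastive loss function composed of invariance and variance terms; the former optimizes the losses of the two assumptions simultaneously, and the latter prevents hypersphere collapse. Extensive experiments on four real-world time-series datasets show that the proposed method achieves state-of-the-art performance. The code is publicly available at https://github.com/ruiking04/coca.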
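In intelligent transportation systems, traffic congestion anomaly detection is of paramount importance. The goals of transportation agencies are two-fold: to monitor the general traffic conditions in the area of interest, and to locate road segments that are in anomalous congestion states. Modeling congestion patterns can achieve these goals for citywide roadways, which amounts to learning the distribution of multivariate time series (MTS). However, existing works are either not scalable or unable to simultaneously capture the spatial information in MTS. To this end, we propose a principled and comprehensive framework consisting of a data-driven generative approach that can perform tractable density estimation for detecting traffic anomalies. Our approach first clusters segments in the feature space and then uses conditional normalizing flow to identify anomalous temporal snapshots at the cluster level in an unsupervised setting. We then identify anomalies at the segment level by using a kernel density estimator on the anomalous cluster. Extensive experiments on synthetic datasets show that our approach significantly outperforms several state-of-the-art congestion anomaly detection and diagnosis methods in terms of recall and F1 score. We also use the generative model to sample labeled data, which can be used to train classifiers in a supervised setting, alleviating the lack of labeled data for anomaly detection in sparse settings.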
Due to the issue that existing wireless sensor network (WSN)-based anomaly detection methods only consider and analyze temporal features, in this paper, a self-supervised learning-based anomaly node detection method based on an autoencoder is designed. This method integrates temporal WSN data flow feature extraction, spatial position feature extraction and intermodal WSN correlation feature extraction into the design of the autoencoder to make full use of the spatial and temporal information of the WSN for anomaly detection. First, a fully connected network is used to extract the temporal features of nodes by considering a single mode from a local spatial perspective. Second, a graph neural network (GNN) is used to introduce the WSN topology from a global spatial perspective for anomaly detection and extract the spatial and temporal features of the data flows of nodes and their neighbors by considering a single mode. Then, the adaptive fusion method involving weighted summation is used to extract the relevant features between different models. In addition, this paper introduces a gated recurrent unit (GRU) to solve the long-term dependence problem of the time dimension. Eventually, the reconstructed output of the decoder and the hidden layer representation of the autoencoder are fed into a fully connected network to calculate the anomaly probability of the current system. Since the spatial feature extraction operation is advanced, the designed method can be applied to the task of large-scale network anomaly detection by adding a clustering operation. Experiments show that the designed method outperforms the baselines, and the F1 score reaches 90.6%, which is 5.2% higher than those of the existing anomaly detection methods based on unsupervised reconstruction and prediction. Code and model are available at https://github.com/GuetYe/anomaly_detection/GLSL
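Trajectory prediction (TP) is an important research topic in computer vision and robotics. Recently, many stochastic TP models have been proposed to deal with this problem and have achieved better performance than traditional models with deterministic trajectory outputs. However, these stochastic models can generate many future trajectories of varying quality. They lack the ability to evaluate themselves, that is, to examine the rationality of their prediction results, and thus fail to guide users in identifying high-quality results among the candidates. This hinders them from performing at their best in real applications. In this paper, we make up for this defect and propose TPAD, a novel TP evaluation method based on trajectory anomaly detection (AD) techniques. In TPAD, we first combine automated machine learning (AutoML) techniques with experience from the AD and TP fields to automatically design an effective trajectory AD model. We then use the learned trajectory AD model to examine the rationality of the predicted trajectories and screen out good TP results for users. Extensive experimental results demonstrate that TPAD can effectively identify near-optimal prediction results, improving the practical application effect of stochastic TP models.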
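Anomaly detection (AD), which separates anomalies from normal data, has many applications across domains, from security to healthcare. While most previous works have been shown to be effective for cases with fully or partially labeled data, that setting is less common in practice because labeling is particularly tedious for this task. In this paper, we focus on fully unsupervised AD, in which the entire training dataset, containing both normal and anomalous samples, is unlabeled. To tackle this problem effectively, we propose to improve the robustness of one-class classification trained on self-supervised representations by using a data refinement process. Our proposed data refinement approach is based on an ensemble of one-class classifiers (OCCs), each of which is trained on a subset of the training data. The representations learned by self-supervised learning are updated as the data refinement improves. We demonstrate our method on various unsupervised AD tasks with image and tabular data. With a 10% anomaly ratio on CIFAR-10 image data / a 2.5% anomaly ratio on Thyroid tabular data, the proposed method outperforms the state-of-the-art one-class classifier by 6.3 AUC and 12.5 average precision / 22.9 F1 score.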
Anomaly detection and localization are widely used in industrial manufacturing for their efficiency and effectiveness. Anomalies are rare and hard to collect, and supervised models trained on only a handful of abnormal samples easily over-fit to these seen anomalies, producing unsatisfactory performance. On the other hand, anomalies are typically subtle, hard to discern, and varied in appearance, making it difficult to detect anomalies, let alone locate anomalous regions. To address these issues, we propose a framework called Prototypical Residual Network (PRN), which learns feature residuals of varying scales and sizes between anomalous and normal patterns to accurately reconstruct the segmentation maps of anomalous regions. PRN mainly consists of two parts: multi-scale prototypes that explicitly represent the residual features of anomalies relative to normal patterns, and a multi-size self-attention mechanism that enables variable-sized anomalous feature learning. In addition, we present a variety of anomaly generation strategies that consider both seen and unseen appearance variance to enlarge and diversify anomalies. Extensive experiments on the challenging and widely used MVTec AD benchmark show that PRN outperforms current state-of-the-art unsupervised and supervised methods. We further report SOTA results on three additional datasets to demonstrate the effectiveness and generalizability of PRN.
Cyber intrusion attacks that compromise users' critical and sensitive data are escalating in volume and intensity, especially with the growing connections between our daily life and the Internet. The large volume and high complexity of such intrusion attacks have impeded the effectiveness of most traditional defence techniques. At the same time, the remarkable performance of machine learning methods, especially deep learning, in computer vision has garnered research interest from the cyber security community in further enhancing and automating intrusion detection. However, expensive data labeling and the limited availability of anomalous data make it challenging to train an intrusion detector in a fully supervised manner. Therefore, intrusion detection based on unsupervised anomaly detection is an important feature too. In this paper, we propose a three-stage network intrusion detection framework based on deep learning anomaly detection. The framework integrates unsupervised (K-means clustering), semi-supervised (GANomaly) and supervised (CNN) learning algorithms. We then evaluate and report the performance of our implemented framework on three benchmark datasets: NSL-KDD, CIC-IDS2018, and TON_IoT.