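Anomaly detection methods strive to discover patterns that differ from the norm in a semantic way. This goal is ambiguous: a data point that differs from the norm in one attribute, e.g., age, race or gender, may be considered anomalous by some operators, while others may consider that attribute irrelevant. Breaking from previous research, we present a new anomaly detection method that allows operators to exclude an attribute from being considered relevant for anomaly detection. Our approach then learns representations that contain no information about the nuisance attributes. Anomaly scoring is performed using a density-based approach. Importantly, our approach does not require specifying the attributes that are relevant for detecting anomalies, which is typically impossible in anomaly detection, but only the attributes to ignore. An empirical investigation is presented to verify the effectiveness of our approach.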
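The density-based scoring step can be illustrated with a k-nearest-neighbour distance score, a common proxy for density. This is a minimal sketch, not the authors' implementation: the abstract does not name the density estimator, and the function name `knn_anomaly_score`, the value of `k`, and the toy features are all assumptions made for illustration.

```python
import numpy as np

def knn_anomaly_score(train_feats, test_feats, k=2):
    """Mean Euclidean distance from each test point to its k nearest training points.

    A larger score means the point lies in a lower-density region of the
    (assumed already nuisance-free) feature space.
    """
    # Pairwise distances, shape (n_test, n_train)
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep the k smallest distances per test point
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Toy data (hypothetical): normal features around the origin, one far-away test point
rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(200, 8))
test_points = np.vstack([rng.normal(0.0, 1.0, size=(1, 8)), np.full((1, 8), 6.0)])
scores = knn_anomaly_score(normal_train, test_points, k=2)
```

In this toy setup the second test point sits far from every training sample, so it receives the larger score.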
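Anomaly detection methods identify samples that deviate from the normal behavior of a dataset. The task is typically tackled for training sets that contain normal data either from multiple labeled classes or from a single unlabeled class. Current methods struggle when faced with training data consisting of multiple classes but no labels. In this work, we first discover that classifiers learned by self-supervised image clustering methods provide a strong baseline for anomaly detection on unlabeled multi-class datasets. Perhaps surprisingly, we find that initializing clustering methods with pre-trained features does not improve over their self-supervised counterparts. This is due to the phenomenon of catastrophic forgetting. Instead, we suggest a two-stage approach. We first cluster images using self-supervised methods and obtain a cluster label for every image. We then use the cluster labels as "pseudo supervision" for out-of-distribution (OOD) methods. Specifically, we fine-tune pre-trained features on the task of classifying images by their cluster labels. We provide an extensive analysis of our method and demonstrate the necessity of the two-stage approach. We evaluate it against state-of-the-art self-supervised and pre-trained methods and demonstrate superior performance.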
Deep anomaly detection methods learn representations that separate normal from anomalous images. Although self-supervised representation learning is commonly used, small dataset sizes limit its effectiveness. It was previously shown that utilizing external, generic datasets (e.g. ImageNet classification) can significantly improve anomaly detection performance. One approach is outlier exposure, which fails when the external datasets do not resemble the anomalies. We take the approach of transferring representations pre-trained on external datasets for anomaly detection. Anomaly detection performance can be significantly improved by fine-tuning the pre-trained representations on the normal training images. In this paper, we first demonstrate and analyze that contrastive learning, the most popular self-supervised learning paradigm, cannot be naively applied to pre-trained features. The reason is that pre-trained feature initialization causes poor conditioning for standard contrastive objectives, resulting in bad optimization dynamics. Based on our analysis, we provide a modified contrastive objective, the Mean-Shifted Contrastive Loss. Our method is highly effective and achieves a new state-of-the-art anomaly detection performance, including $98.6\%$ ROC-AUC on the CIFAR-10 dataset.
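One way to read the idea of a mean-shifted contrastive objective is sketched below: features are shifted by the centre of the normal training features and re-normalized before a standard NT-Xent-style loss is applied. This is an illustrative sketch under assumptions, not the paper's reference implementation; the function names, the temperature value, and the toy data are hypothetical, and the abstract itself does not spell out the formula.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def mean_shifted_contrastive_loss(z1, z2, center, temperature=0.25):
    """NT-Xent-style loss on features shifted by `center` and re-normalized.

    z1, z2: (n, d) features of two augmented views of the same n images.
    center: (d,) mean of the normalized normal-training features (assumed given).
    """
    n = z1.shape[0]
    views = np.concatenate([l2_normalize(z1), l2_normalize(z2)], axis=0)
    shifted = l2_normalize(views - center)            # angles measured around the centre
    sim = shifted @ shifted.T / temperature           # (2n, 2n) scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # a sample is never its own negative
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positives].mean())

# Toy stand-in for pre-trained features: tightly packed away from the origin
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 32)) + 5.0
center = l2_normalize(feats).mean(axis=0)
view1 = feats + rng.normal(scale=0.01, size=feats.shape)
view2 = feats + rng.normal(scale=0.01, size=feats.shape)
loss_matched = mean_shifted_contrastive_loss(view1, view2, center)
loss_mismatched = mean_shifted_contrastive_loss(view1, np.roll(view2, 1, axis=0), center)
```

With correctly paired views the loss is lower than with deliberately mismatched pairs, which is the behaviour a contrastive objective should show.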
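Machine learning models often encounter samples that diverge from the training distribution. Failing to recognize an out-of-distribution (OOD) sample, and consequently assigning it to an in-class label, significantly compromises the reliability of a model. The problem has gained significant attention because of its importance for safely deploying models in open-world settings. Detecting OOD samples is challenging because modeling all possible unknown distributions is intractable. To date, several research domains tackle the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open-set recognition, and out-of-distribution detection. Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection have been investigated independently. Accordingly, these research avenues have not cross-pollinated, creating research barriers. While some surveys intend to provide an overview of these approaches, they seem to focus only on a specific domain without examining the relationships between domains. This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in the respective areas while identifying their commonalities. Researchers can benefit from the overview of research advances in different fields and develop future methodology synergistically. Furthermore, to the best of our knowledge, while there are surveys on anomaly detection or one-class learning, there is no comprehensive or up-to-date survey on out-of-distribution detection, which our survey covers extensively. Finally, with a unified cross-domain perspective, we discuss and shed light on future lines of research, intending to bring these fields closer together.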
Video anomaly detection (VAD) is a challenging computer vision task with many practical applications. As anomalies are inherently ambiguous, it is essential for users to understand the reasoning behind a system's decision in order to determine if the rationale is sound. In this paper, we propose a simple but highly effective method that pushes the boundaries of VAD accuracy and interpretability using attribute-based representations. Our method represents every object by its velocity and pose. The anomaly scores are computed using a density-based approach. Surprisingly, we find that this simple representation is sufficient to achieve state-of-the-art performance on ShanghaiTech, the largest and most complex VAD dataset. Combining our interpretable attribute-based representations with an implicit, deep representation yields state-of-the-art performance, with AUROCs of $99.1\%$, $93.3\%$, and $85.9\%$ on Ped2, Avenue, and ShanghaiTech, respectively. Our method is accurate, interpretable, and easy to implement.
We aim to construct a high-performance model for defect detection that detects unknown anomalous patterns in an image without anomalous data. To this end, we propose a two-stage framework for building anomaly detectors using normal training data only. We first learn self-supervised deep representations and then build a generative one-class classifier on the learned representations. We learn representations by classifying normal data from CutPaste, a simple data augmentation strategy that cuts an image patch and pastes it at a random location of a large image. Our empirical study on the MVTec anomaly detection dataset demonstrates that the proposed algorithm is general enough to detect various types of real-world defects. We improve on previous work by 3.1 AUCs when learning representations from scratch. By transfer learning on representations pretrained on ImageNet, we achieve a new state-of-the-art 96.6 AUC. Lastly, we extend the framework to learn and extract representations from patches, which allows localizing defective areas without annotations during training.
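The CutPaste augmentation described above (cut a patch, paste it at a random location) can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the function name `cutpaste`, the patch area and aspect-ratio ranges, and the toy image are assumptions chosen for the example.

```python
import numpy as np

def cutpaste(image, rng, area_ratio=(0.02, 0.15), aspect_ratio=(0.3, 3.3)):
    """Copy a random rectangular patch of `image` and paste it at another random location.

    The area and aspect-ratio ranges are illustrative defaults, not values
    taken from the abstract.
    """
    h, w = image.shape[:2]
    area = rng.uniform(*area_ratio) * h * w
    aspect = rng.uniform(*aspect_ratio)
    patch_h = int(np.clip(round(np.sqrt(area * aspect)), 1, h))
    patch_w = int(np.clip(round(np.sqrt(area / aspect)), 1, w))
    # Independent source and destination corners (upper bound is exclusive)
    src_y, src_x = rng.integers(0, h - patch_h + 1), rng.integers(0, w - patch_w + 1)
    dst_y, dst_x = rng.integers(0, h - patch_h + 1), rng.integers(0, w - patch_w + 1)
    out = image.copy()
    out[dst_y:dst_y + patch_h, dst_x:dst_x + patch_w] = \
        image[src_y:src_y + patch_h, src_x:src_x + patch_w]
    return out

# Build one (normal, augmented) pair for the self-supervised classification task:
# label 0 = untouched normal image, label 1 = CutPaste-augmented image.
rng = np.random.default_rng(0)
image = np.arange(64 * 64 * 3, dtype=np.float32).reshape(64, 64, 3)
augmented = cutpaste(image, rng)
batch = np.stack([image, augmented])
labels = np.array([0, 1])
```

A classifier trained to separate the two labels never sees real defects, which matches the normal-data-only setting of the abstract.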
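How do we detect anomalies, that is, samples that differ significantly from a given set of high-dimensional data such as images or sensor data? This is a practical problem with numerous applications, and it is also relevant to the goal of making learning algorithms more robust to unexpected inputs. Autoencoders are a popular approach, partly because of their simplicity and their ability to reduce dimensionality. However, the anomaly scoring function does not adapt to the natural variation in reconstruction error across the range of normal samples, which hinders the ability to detect real anomalies. In this paper, we empirically demonstrate the importance of local adaptivity for anomaly scoring in experiments with real data. We then propose a novel adaptive reconstruction-error-based scoring approach, which adapts its scoring to the local behaviour of the reconstruction error over the latent space. We show that this improves anomaly detection performance over relevant baselines on a wide variety of benchmark datasets.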
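The increasing deployment of photovoltaic (PV) plants requires automatic detection of faulty PV modules in modalities such as infrared (IR) images. Recently, deep learning has become popular for this task. However, related works typically sample training and test data from the same distribution, ignoring the domain shift between data from different PV plants. Instead, we frame fault detection as a more realistic unsupervised domain adaptation problem, in which we train on labelled data from one source PV plant and make predictions on another target plant. We train a ResNet-34 convolutional neural network with a supervised contrastive loss, on top of which we employ a k-nearest neighbor classifier to detect anomalies. Our method achieves a satisfactory area under the receiver operating characteristic curve (AUROC) of 73.3% to 96.6% on nine combinations of source and target datasets, in which 8.5% of the images are anomalous. In some cases it even outperforms a binary cross-entropy classifier. With a fixed decision threshold, this results in 79.4% of normal and 77.1% of anomalous images being classified correctly. Most misclassified anomalies are of low severity, such as hot diodes and small hot spots. Our method is insensitive to hyperparameter settings, converges quickly, and reliably detects unknown types of anomalies, making it well suited for practice. Possible uses are automatic PV plant inspection systems or streamlining the manual labelling of IR datasets by filtering out normal images. Furthermore, our work offers a more realistic view of PV module fault detection using unsupervised domain adaptation, with the aim of developing more performant methods with favorable generalization capabilities.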
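The accumulation of time-series data and the absence of labels make time-series anomaly detection (AD) a self-supervised deep learning task. Methods based on a single normality assumption capture only certain aspects of normality and are insufficient for detecting diverse anomalies. Among them, the contrastive learning methods adopted for AD always select normal samples as negative pairs, which runs counter to the purpose of the AD task. Existing methods based on multiple assumptions are usually two-stage: they first apply a pre-training process whose objective may differ from AD, so performance is limited by the pre-trained representations. This paper proposes a deep Contrastive One-Class Anomaly detection method (COCA) that combines the normality assumptions of contrastive learning and one-class classification. The key idea is to treat the representation and the reconstructed representation as the positive pair of negative-sample-free contrastive learning, which we name sequence contrast. We then apply a contrastive one-class loss function composed of invariance and variance terms: the former optimizes the losses of both assumptions simultaneously, and the latter prevents hypersphere collapse. Extensive experiments on four real-world time-series datasets show that the proposed method achieves state-of-the-art performance. The code is publicly available at https://github.com/ruiking04/coca.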
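The interplay of an invariance term and a variance term can be sketched generically as follows. This is not the COCA loss itself, whose exact form the abstract does not give; it is a VICReg-style stand-in under stated assumptions. The function names, the cosine-distance-to-centre invariance term, the hinge on the per-dimension standard deviation, and the toy data are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row (or a single vector) to unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def invariance_term(rep, rep_rec, center):
    """Cosine distance of both views to a shared one-class centre (lower = more invariant)."""
    rep, rep_rec, center = l2_normalize(rep), l2_normalize(rep_rec), l2_normalize(center)
    return float((1.0 - rep @ center).mean() + (1.0 - rep_rec @ center).mean())

def variance_term(rep, rep_rec, target_std=1.0, eps=1e-4):
    """Hinge penalty on the per-dimension batch standard deviation.

    The penalty is largest when every output coincides, i.e. when the
    representations have collapsed to a single point.
    """
    both = np.concatenate([rep, rep_rec], axis=0)
    std = np.sqrt(both.var(axis=0) + eps)
    return float(np.maximum(0.0, target_std - std).mean())

# Toy batches (hypothetical): a healthy spread around the centre vs. a collapsed batch
rng = np.random.default_rng(0)
center = rng.normal(size=16)
spread = center + rng.normal(scale=0.5, size=(32, 16))
spread_rec = spread + rng.normal(scale=0.05, size=spread.shape)  # stand-in for reconstructions
collapsed = np.tile(center, (32, 1))

inv_spread = invariance_term(spread, spread_rec, center)
inv_collapsed = invariance_term(collapsed, collapsed, center)
var_spread = variance_term(spread, spread_rec)
var_collapsed = variance_term(collapsed, collapsed)
```

The collapsed batch minimizes the invariance term but is penalized by the variance term, which is why the two are combined.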
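In line with the development of Industry 4.0, surface defect detection has attracted increasing attention. Improving efficiency and saving labor costs have steadily become matters of great concern in industry, and in recent years deep learning based algorithms have outperformed traditional visual inspection methods. Existing deep learning based algorithms are biased towards supervised learning, which not only requires large amounts of labeled data and substantial labor but is also inefficient and has certain limitations. In contrast, recent research shows that unsupervised learning has great potential for overcoming these disadvantages in visual industrial anomaly detection. In this survey, we summarize current challenges and give a detailed overview of recently proposed unsupervised algorithms for visual industrial anomaly detection, covering five categories whose innovations and frameworks are described in detail. We also provide information on publicly available datasets containing surface image samples. By comparing the different classes of methods, we summarize the advantages and disadvantages of anomaly detection algorithms. The survey is expected to help both the research community and industry develop a broader, cross-domain perspective.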
Cross-domain graph anomaly detection (CD-GAD) describes the problem of detecting anomalous nodes in an unlabelled target graph using auxiliary, related source graphs with labelled anomalous and normal nodes. Although it presents a promising approach to address the notoriously high false positive issue in anomaly detection, little work has been done in this line of research. There are numerous domain adaptation methods in the literature, but it is difficult to adapt them for GAD due to the unknown distributions of the anomalies and the complex node relations embedded in graph data. To this end, we introduce a novel domain adaptation approach, namely Anomaly-aware Contrastive alignmenT (ACT), for GAD. ACT is designed to jointly optimise: (i) unsupervised contrastive learning of normal representations of nodes in the target graph, and (ii) anomaly-aware one-class alignment that aligns these contrastive node representations and the representations of labelled normal nodes in the source graph, while enforcing significant deviation of the representations of the normal nodes from the labelled anomalous nodes in the source graph. In doing so, ACT effectively transfers anomaly-informed knowledge from the source graph to learn the complex node relations of the normal class for GAD on the target graph without any specification of the anomaly distributions. Extensive experiments on eight CD-GAD settings demonstrate that our approach ACT achieves substantially improved detection performance over 10 state-of-the-art GAD methods. Code is available at https://github.com/QZ-WANG/ACT.
Semi-supervised anomaly detection is a common problem, as datasets containing anomalies are often only partially labeled. We propose a canonical framework, Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE), that is not limited by the assumption that labeled and unlabeled data come from the same distribution. Indeed, this assumption is violated in many applications: for example, the labeled data may contain only anomalies, unlike the unlabeled data; the unlabeled data may contain different types of anomalies; or the labeled data may contain only 'easy-to-label' samples. SPADE utilizes an ensemble of one-class classifiers as the pseudo-labeler to improve the robustness of pseudo-labeling under distribution mismatch. Partial matching is proposed to automatically select the critical hyper-parameters for pseudo-labeling without validation data, which is crucial when labeled data is limited. SPADE shows state-of-the-art semi-supervised anomaly detection performance across a wide range of scenarios with distribution mismatch in both tabular and image domains. In some common real-world settings, such as a model facing new types of unlabeled anomalies, SPADE outperforms the state-of-the-art alternatives by 5% AUC on average.
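Open-set video anomaly detection (OpenVAD) aims to identify abnormal events in video data where both known anomalies and novel ones exist at test time. Unsupervised models learned solely from normal videos are applicable to any test anomalies but suffer from a high false positive rate. In contrast, weakly supervised methods are effective at detecting known anomalies but can fail in an open world. We develop a novel weakly supervised method for the OpenVAD problem by integrating evidential deep learning (EDL) and normalizing flows (NFs) into a multiple instance learning (MIL) framework. Specifically, we propose to use graph neural networks and a triplet loss to learn discriminative features for training the EDL classifier, where the EDL is capable of identifying unknown anomalies by quantifying uncertainty. Moreover, we develop an uncertainty-aware selection strategy to obtain clean anomaly instances and an NFs module to generate pseudo anomalies. Our method outperforms existing approaches by inheriting the advantages of both the unsupervised NFs and the weakly supervised MIL framework. Experimental results on multiple real-world video datasets show the effectiveness of our method.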
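In recent years, many works have addressed the problem of finding never-before-seen anomalies in videos. Yet most work has focused on detecting anomalous frames in surveillance videos taken by security cameras. Meanwhile, the task of anomaly detection (AD) in videos exhibiting anomalous mechanical behavior has been mostly overlooked. Anomaly detection in such videos is of both academic and practical interest, as it may allow automatic detection of malfunctions in many manufacturing, maintenance, and real-life settings. To assess the potential of different approaches for detecting such anomalies, we evaluate two simple baseline approaches: (i) temporally pooled image AD techniques; (ii) density estimation of videos represented with features pre-trained for video classification. Developing such methods calls for new benchmarks that allow the evaluation of different possible approaches. We introduce the Physical Anomalous Trajectory or Motion (PHANTOM) dataset, which contains six different video classes. Each class consists of normal and anomalous videos. The classes differ in the presented phenomena, the normal-class variability, and the kind of anomalies in the videos. We also suggest an even harder benchmark in which anomalous activities should be spotted in highly variable scenes.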
We propose a novel reconstruction-based model for anomaly detection, called Y-GAN. The model consists of a Y-shaped auto-encoder and represents images in two separate latent spaces. The first captures meaningful image semantics, key for representing (normal) training data, whereas the second encodes low-level residual image characteristics. To ensure the dual representations encode mutually exclusive information, a disentanglement procedure is designed around a latent (proxy) classifier. Additionally, a novel consistency loss is proposed to prevent information leakage between the latent spaces. The model is trained in a one-class learning setting using normal training data only. Due to the separation of semantically relevant and residual information, Y-GAN is able to derive informative data representations that allow for efficient anomaly detection across a diverse set of anomaly detection tasks. The model is evaluated in comprehensive experiments with several recent anomaly detection models using four popular datasets, i.e., MNIST, FMNIST, CIFAR10, and PlantVillage.
Recently, graph anomaly detection has attracted increasing attention in the data mining and machine learning communities. Apart from attribute anomalies, graph anomaly detection also captures suspicious topologically abnormal nodes that differ from the majority. Although a large number of graph-based detection approaches have been proposed, most of them focus on node-level comparison while paying insufficient attention to the surrounding topological structures. Nodes with more dissimilar neighborhood substructures are more likely to be abnormal. To enhance the ability to detect local substructures, we propose a novel Graph Anomaly Detection framework via Multi-scale Substructure Learning (GADMSL for short). Unlike previous algorithms, we capture anomalous substructures whose inner similarities are relatively low in densely connected regions. Specifically, we adopt a region proposal module to find high-density substructures in the network as suspicious regions. The embedding similarities among their inner nodes indicate the anomaly degree of the detected substructures. Generally, lower embedding similarity means a higher probability that the substructure contains topological anomalies. To distill better embeddings of node attributes, we further introduce a graph contrastive learning scheme, which also observes attribute anomalies. In this way, GADMSL can detect both topological and attribute anomalies. Finally, extensive experiments on benchmark datasets show that GADMSL greatly improves detection performance (up to 7.30% AUC and 17.46% AUPRC gains) compared to state-of-the-art anomaly detection algorithms for attributed networks.
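Anomaly detection (AD), separating anomalies from normal data, has many applications across domains, from security to healthcare. While most previous works have been shown to be effective for cases with fully or partially labeled data, that setting is less common in practice because labeling is particularly tedious for this task. In this paper, we focus on fully unsupervised AD, in which the entire training dataset, containing both normal and anomalous samples, is unlabeled. To tackle this problem effectively, we propose to improve the robustness of one-class classification trained on self-supervised representations by using a data refinement process. Our proposed data refinement approach is based on an ensemble of one-class classifiers (OCCs), each of which is trained on a subset of the training data. Representations learned by self-supervised learning are updated as the data refinement improves. We demonstrate our method on various unsupervised AD tasks with image and tabular data. With a 10% anomaly ratio on CIFAR-10 image data / 2.5% anomaly ratio on Thyroid tabular data, the proposed method outperforms the state-of-the-art one-class classifier by 6.3 AUC and 12.5 average precision / 22.9 F1-score.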
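The data refinement step with an ensemble of one-class classifiers can be sketched as below. It is a simplified illustration, not the authors' method: the centroid-distance "classifier", the split into disjoint subsets, the per-model quantile threshold, the rule that a sample is kept only if every model considers it normal, and the name `refine_training_set` are all assumptions made for the example.

```python
import numpy as np

def refine_training_set(data, n_models=5, anomaly_quantile=0.9, seed=0):
    """Return a boolean mask of samples to keep after ensemble-based refinement.

    Each toy one-class model is "trained" on one disjoint subset by taking the
    subset mean; its anomaly score is the distance to that mean.
    """
    rng = np.random.default_rng(seed)
    subsets = np.array_split(rng.permutation(len(data)), n_models)
    flagged = np.zeros(len(data), dtype=int)
    for idx in subsets:
        center = data[idx].mean(axis=0)                  # fit on the subset only
        scores = np.linalg.norm(data - center, axis=1)   # score every sample
        threshold = np.quantile(scores, anomaly_quantile)
        flagged += scores > threshold
    # Assumed consensus rule: keep a sample only if no model flags it as anomalous
    return flagged == 0

# Toy unlabeled training set (hypothetical): 190 normal points plus 10 far-away anomalies
rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(190, 4))
anomalies = np.full((10, 4), 8.0)
data = np.vstack([normal, anomalies])
keep = refine_training_set(data)
refined = data[keep]
```

The refined set would then be used to retrain the representation and the final one-class classifier; that loop is omitted here.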
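Detecting test data that deviates from the training data is a central problem for safe and robust machine learning. Likelihoods learned by a generative model, e.g., a normalizing flow trained via standard log-likelihood training, perform poorly as an anomaly score. We propose to use an unlabelled auxiliary dataset and a probabilistic outlier score for anomaly detection. We use a self-supervised feature extractor trained on the auxiliary dataset and train a normalizing flow on the extracted features by maximizing the likelihood on in-distribution data and minimizing the likelihood on the auxiliary dataset. We show that this is equivalent to learning the normalized positive difference between the in-distribution and the auxiliary feature density. We conduct experiments on benchmark datasets and show a robust improvement over likelihood, likelihood-ratio, and state-of-the-art anomaly detection methods.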
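Unsupervised anomaly detection (UAD), which requires only normal (healthy) training images, is an important tool for enabling medical image analysis (MIA) applications such as disease screening, since it is often difficult to collect and annotate abnormal (or disease) images in MIA. However, heavy reliance on normal images may cause model training to overfit the normal class. Self-supervised pre-training is an effective solution to this problem. Unfortunately, current self-supervision methods adapted from computer vision are sub-optimal for MIA applications because they do not exploit MIA domain knowledge when designing the pretext tasks or the training process. In this paper, we propose a new self-supervised pre-training method for UAD designed for MIA applications, named Multi-class Strong Augmentation via Contrastive Learning (MSACL). MSACL is based on a novel optimisation that contrasts normal images with multiple classes of synthesised abnormal images, with each class enforced to form a tight and dense cluster in terms of Euclidean distance and cosine similarity; the abnormal images are formed by simulating a varying number of lesions of different sizes and appearances in the normal images. In the experiments, we show that our MSACL pre-training improves the accuracy of SOTA UAD methods on colonoscopy, fundus screening, and Covid-19 chest X-ray datasets.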
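We introduce a simple and intuitive self-supervision task, Natural Synthetic Anomalies (NSA), for training an end-to-end model for anomaly detection and localization using only normal training data. NSA integrates Poisson image editing to seamlessly blend scaled patches of various sizes from separate images. This creates a wide range of synthetic anomalies that are more similar to natural sub-image irregularities than those of previous data-augmentation strategies for self-supervised anomaly detection. We evaluate the proposed method using natural and medical images. Our experiments with the MVTec AD dataset show that a model trained to localize NSA anomalies generalizes well to detecting real-world, a priori unknown types of manufacturing defects. Our method achieves an overall detection AUROC of 97.2, outperforming all previous methods that learn without the use of additional datasets. Code is available at https://github.com/hmsch/natural-synthetic-anomalies.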