Transparency methods such as model visualization provide information that outputs alone may miss, since they describe the internals of neural networks. But can we trust that these model explanations reflect model behavior? For example, can they diagnose abnormal behavior such as backdoors or shape bias? To evaluate model explanations, we define a model as anomalous if it differs from a reference set of normal models, and we test whether transparency methods assign different explanations to anomalous and normal models. We find that while existing methods can detect stark anomalies such as shape bias or adversarial training, they struggle to identify subtler anomalies such as models trained on incomplete data. Moreover, they often fail to distinguish the inputs that induce the anomalous behavior, e.g., images containing a backdoor trigger. These results reveal new blind spots in existing model explanations, pointing to the need for further method development.
We investigate whether three types of post hoc model explanations--feature attribution, concept activation, and training point ranking--are effective for detecting a model's reliance on spurious signals in the training data. Specifically, we consider the scenario where the spurious signal to be detected is unknown, at test-time, to the user of the explanation method. We design an empirical methodology that uses semi-synthetic datasets along with pre-specified spurious artifacts to obtain models that verifiably rely on these spurious training signals. We then provide a suite of metrics that assess an explanation method's reliability for spurious signal detection under various conditions. We find that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test-time, especially for non-visible artifacts like a background blur. Further, we find that feature attribution methods are susceptible to erroneously indicating dependence on spurious signals even when the model being explained does not rely on spurious artifacts. This finding casts doubt on the utility of these approaches, in the hands of a practitioner, for detecting a model's reliance on spurious signals.
We propose a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say 'dog' in a classification network or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multimodal inputs (e.g. visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization.
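To make the gradient-weighting step concrete, here is a minimal PyTorch sketch of a Grad-CAM-style computation, assuming a pretrained VGG-16 from torchvision, a hook on its last convolutional layer, and a random tensor standing in for a preprocessed image; it illustrates the idea rather than reproducing the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hypothetical setup for illustration: a pretrained VGG-16 and a stand-in input;
# in practice `x` would be a normalized image tensor.
model = models.vgg16(weights="IMAGENET1K_V1").eval()
activations, gradients = {}, {}

def save_activation(module, inputs, output):
    activations["value"] = output            # feature maps of the last conv layer

def save_gradient(module, grad_input, grad_output):
    gradients["value"] = grad_output[0]      # gradients flowing into that layer

last_conv = model.features[28]               # last convolutional layer of VGG-16
last_conv.register_forward_hook(save_activation)
last_conv.register_full_backward_hook(save_gradient)

x = torch.randn(1, 3, 224, 224)              # placeholder for a preprocessed image
scores = model(x)
target_class = scores.argmax(dim=1).item()   # or any concept of interest
scores[0, target_class].backward()           # gradient of the target concept

# Global-average-pool the gradients to get one weight per feature map, then take
# the ReLU of the weighted combination to obtain the coarse localization map.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to [0, 1]
```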
We introduce several new datasets, namely ImageNet-A/O and ImageNet-R, as well as a synthetic environment and test suite we call CAOS. ImageNet-A/O allows researchers to focus on the blind spots that remain in ImageNet. ImageNet-R was specifically created in pursuit of robust representations, since representations are no longer simply natural but also include art and other renditions. The CAOS suite is built on the CARLA simulator and allows anomalous objects to be included, enabling reproducible synthetic environments and scenes for testing robustness. All of the datasets were created to test robustness and to measure progress toward robustness. The datasets have been used in a variety of other works to measure their own progress in robustness and to enable tangential progress that does not focus exclusively on natural accuracy. Given these datasets, we create several new methods intended to advance robustness research. We build simple baselines in the form of MaxLogit and typicality, and create a new data augmentation method in the form of DeepAugment that improves on the aforementioned benchmarks. MaxLogit considers the logit values rather than the values after the softmax operation; this small change yields noticeable improvements. Typicality compares the output distribution with the posterior distribution of the class. We show that this improves performance over the baseline except on the segmentation task. We conjecture that, at the pixel level, the semantic information of a pixel is less meaningful than class-level semantic information. Finally, the new DeepAugment augmentation technique uses neural networks to create augmentations on images that are radically different from the traditional geometric and camera transformations used previously.
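As a small illustration of the MaxLogit baseline mentioned above (using the raw logits instead of softmax probabilities as an anomaly score), the following hedged sketch contrasts it with the maximum-softmax-probability score; the classifier producing the logits is assumed to exist, and thresholding is left to the caller.

```python
import torch

def ood_scores(logits: torch.Tensor):
    """Return two anomaly scores for a batch of logits of shape (batch, classes).

    Higher values are meant to indicate 'more anomalous'; thresholds are left
    to the caller.
    """
    max_softmax = logits.softmax(dim=1).max(dim=1).values   # MSP baseline
    max_logit = logits.max(dim=1).values                    # MaxLogit: skip the softmax
    # Negate so that low-confidence inputs receive high anomaly scores.
    return -max_softmax, -max_logit
```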
With the rise of deep neural networks, the challenge of explaining these networks' predictions has become increasingly recognized. While many methods exist for explaining the decisions of deep neural networks, there is currently no consensus on how to evaluate them. On the other hand, robustness is a popular topic in deep learning research, yet until recently it was rarely discussed in connection with interpretability. In this tutorial, we first present gradient-based interpretability methods. These techniques use the gradient signal to attribute the decision to the input features. Later, we discuss how to evaluate gradient-based methods for their robustness and the role that adversarial robustness plays in obtaining meaningful explanations. We also discuss the limitations of gradient-based methods. Finally, we present best practices and properties that should be examined before choosing an interpretability method. We conclude with directions for future research in areas where robustness and interpretability converge.
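As a concrete instance of the gradient-based attribution discussed in this tutorial, a minimal 'vanilla gradient' saliency map can be computed as follows; the model and input tensor are placeholders, and real methods add refinements such as smoothing or integrated gradients.

```python
import torch

def vanilla_gradient_saliency(model, image):
    """Attribute the predicted class score to input pixels via the input gradient.

    `model` is any differentiable classifier and `image` a tensor of shape
    (1, C, H, W); both are assumed to exist. Returns an (H, W) saliency map.
    """
    image = image.clone().requires_grad_(True)
    score = model(image)[0].max()                 # score of the predicted class
    score.backward()                              # gradient of that score w.r.t. pixels
    # Take the maximum absolute gradient over the color channels.
    return image.grad.abs().max(dim=1).values.squeeze(0)
```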
We introduce two challenging datasets that reliably cause machine learning model performance to substantially degrade. The datasets are collected with a simple adversarial filtration technique to create datasets with limited spurious cues. Our datasets' real-world, unmodified examples transfer to various unseen models reliably, demonstrating that computer vision models have shared weaknesses. The first dataset is called IMAGENET-A and is like the ImageNet test set, but it is far more challenging for existing models. We also curate an adversarial out-of-distribution detection dataset called IMAGENET-O, which is the first out-of-distribution detection dataset created for ImageNet models. On IMAGENET-A, a DenseNet-121 obtains around 2% accuracy, an accuracy drop of approximately 90%, and its out-of-distribution detection performance on IMAGENET-O is near random chance levels. We find that existing data augmentation techniques hardly boost performance, and using other public training datasets provides limited improvements. However, we find that improvements to computer vision architectures provide a promising path towards robust models.
We introduce a simple and intuitive self-supervision task, Natural Synthetic Anomalies (NSA), for training an end-to-end model for anomaly detection and localization using only normal training data. NSA integrates Poisson image editing to seamlessly blend scaled patches of various sizes from separate images. This creates a wide range of synthetic anomalies that are more similar to natural sub-image irregularities than the data-augmentation strategies used in previous self-supervised anomaly detection. We evaluate the proposed method using natural and medical images. Our experiments on the MVTec AD dataset show that a model trained to localize NSA anomalies generalizes well to detecting real-world, a priori unknown types of manufacturing defects. Our method achieves an overall detection AUROC of 97.2, outperforming all previous methods that learn without the use of additional datasets. Code is available at https://github.com/hmsch/natural-synthetic-anomalies.
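The patch-blending step that NSA builds on can be sketched with OpenCV's Poisson image editing (cv2.seamlessClone); the random patch sizes, the helper name, and the omission of NSA's source sampling, rescaling, and label generation are all simplifications for illustration.

```python
import cv2
import numpy as np

def blend_random_patch(dst_img: np.ndarray, src_img: np.ndarray) -> np.ndarray:
    """Seamlessly blend a randomly sized patch from `src_img` into `dst_img`.

    Both inputs are HxWx3 uint8 arrays of the same size; this only sketches the
    Poisson-blending step, not the full NSA patch-sampling and labeling pipeline.
    """
    h, w = dst_img.shape[:2]
    ph = np.random.randint(h // 8, h // 2)                  # patch height
    pw = np.random.randint(w // 8, w // 2)                  # patch width
    sy = np.random.randint(0, h - ph)                       # source top-left corner
    sx = np.random.randint(0, w - pw)
    cy = np.random.randint(ph // 2 + 1, h - ph // 2 - 1)    # destination center (row)
    cx = np.random.randint(pw // 2 + 1, w - pw // 2 - 1)    # destination center (col)

    patch = src_img[sy:sy + ph, sx:sx + pw]
    mask = 255 * np.ones(patch.shape[:2], dtype=np.uint8)
    # Poisson image editing blends the patch into the destination without visible seams.
    return cv2.seamlessClone(patch, dst_img, mask, (cx, cy), cv2.NORMAL_CLONE)
```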
Unsupervised anomaly detection has become a popular approach for detecting pathologies in medical images, since it requires no supervision or labels for training. Most commonly, an anomaly detection model generates a 'normal' version of the input image, and the pixel-wise $l^p$-difference between the two is used to localize anomalies. However, imperfect reconstructions of the complex anatomical structures present in most medical images occur frequently. The approach also fails to detect anomalies that do not differ strongly in intensity from the surrounding tissue. We propose to address this problem with a feature-mapping function that transforms the input intensity image into a space with multiple channels, where anomalies can be detected along different discriminative feature maps extracted from the original image. We then train an autoencoder model in this space using a structural similarity loss that considers not only intensity differences but also contrast and structure. Our method significantly improves performance on two medical datasets of brain MRI. Code and experiments are available at https://github.com/felime/feature-autoencoder
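To make the localization step concrete, the sketch below contrasts a plain pixel-wise residual with a structural-similarity-based anomaly map using scikit-image; in the paper the comparison happens in a learned multi-channel feature space produced by the feature-mapping function, which is not shown here.

```python
import numpy as np
from skimage.metrics import structural_similarity

def anomaly_maps(image: np.ndarray, reconstruction: np.ndarray):
    """Compare a grayscale image with its reconstructed 'normal' version.

    Both arrays are 2-D floats in [0, 1]. Returns a plain intensity residual and
    a structure-aware map; the paper applies the latter idea in a learned
    multi-channel feature space rather than on raw intensities.
    """
    l1_map = np.abs(image - reconstruction)                    # pixel-wise residual
    _, ssim_map = structural_similarity(image, reconstruction,
                                        data_range=1.0, full=True)
    ssim_anomaly_map = 1.0 - ssim_map                          # low similarity = anomalous
    return l1_map, ssim_anomaly_map
```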
Benchmark performance of deep learning classifiers alone is not a reliable predictor for the performance of a deployed model. In particular, if the image classifier has picked up spurious features in the training data, its predictions can fail in unexpected ways. In this paper, we develop a framework that allows us to systematically identify spurious features in large datasets like ImageNet. It is based on our neural PCA components and their visualization. Previous work on spurious features of image classifiers often operates in toy settings or requires costly pixel-wise annotations. In contrast, we validate our results by checking that the presence of the harmful spurious feature of a class is sufficient to trigger the prediction of that class. We introduce a novel dataset, "Spurious ImageNet", and check how much existing classifiers rely on spurious features.
Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.
Anomaly detection and localization are widely used in industrial manufacturing for their efficiency and effectiveness. Anomalies are rare and hard to collect, and supervised models easily over-fit to these seen anomalies with a handful of abnormal samples, producing unsatisfactory performance. On the other hand, anomalies are typically subtle, hard to discern, and of various appearance, making it difficult to detect anomalies, let alone locate anomalous regions. To address these issues, we propose a framework called Prototypical Residual Network (PRN), which learns feature residuals of varying scales and sizes between anomalous and normal patterns to accurately reconstruct the segmentation maps of anomalous regions. PRN mainly consists of two parts: multi-scale prototypes that explicitly represent the residual features of anomalies to normal patterns; and a multi-size self-attention mechanism that enables variable-sized anomalous feature learning. Besides, we present a variety of anomaly generation strategies that consider both seen and unseen appearance variance to enlarge and diversify anomalies. Extensive experiments on the challenging and widely used MVTec AD benchmark show that PRN outperforms current state-of-the-art unsupervised and supervised methods. We further report SOTA results on three additional datasets to demonstrate the effectiveness and generalizability of PRN.
Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks and the reasons why a network makes specific decisions. In this paper, we develop a novel post-hoc visual explanation method called Score-CAM based on class activation mapping. Unlike previous class activation mapping based approaches, Score-CAM gets rid of the dependence on gradients by obtaining the weight of each activation map through its forward passing score on the target class; the final result is obtained by a linear combination of weights and activation maps. We demonstrate that Score-CAM achieves better visual performance and fairness for interpreting the decision making process. Our approach outperforms previous methods on both recognition and localization tasks, and it also passes the sanity check. We also indicate its application as a debugging tool. The implementation is available.
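A compact, hedged sketch of the Score-CAM weighting scheme described above: each activation map is upsampled, normalized, used to mask the input, and the forward-pass score of the masked input on the target class becomes that map's weight. The model, the pre-computed activations, and the per-map loop are simplifications of the published method.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_cam(model, image, activations, target_class):
    """Gradient-free class activation map.

    `image` has shape (1, 3, H, W), `activations` of shape (1, K, h, w) are the
    feature maps of a chosen convolutional layer for this image, and
    `target_class` is an int; all of these are assumed inputs.
    """
    _, k, _, _ = activations.shape
    maps = F.interpolate(activations, size=image.shape[-2:], mode="bilinear",
                         align_corners=False)
    # Normalize each upsampled activation map to [0, 1] so it can act as a soft mask.
    flat = maps.view(1, k, -1)
    mins = flat.min(dim=2).values.view(1, k, 1, 1)
    maxs = flat.max(dim=2).values.view(1, k, 1, 1)
    maps = (maps - mins) / (maxs - mins + 1e-8)

    # Each map's weight is the target-class score of the input masked by that map.
    weights = torch.stack([
        model(image * maps[:, i:i + 1]).softmax(dim=1)[0, target_class]
        for i in range(k)
    ])
    cam = F.relu((weights.view(1, k, 1, 1) * maps).sum(dim=1, keepdim=True))
    return cam / (cam.max() + 1e-8)
```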
Interpretability is crucial for understanding the inner workings of deep neural networks (DNNs), and many interpretation methods generate saliency maps that highlight the parts of the input image that most influence the DNN's prediction. In this paper, we design a backdoor attack that alters the saliency map the network produces for an input image merely by injecting a trigger invisible to the naked eye, while maintaining prediction accuracy. The attack relies on injecting poisoned data into the training set. The saliency maps are incorporated into a penalty term of the objective function used to train the deep model, and their influence on training is conditioned on the presence of the trigger. We design two types of attacks: a targeted attack that enforces a specific modification of the saliency map, and an untargeted attack in which the importance scores of the top pixels of the original saliency map are significantly reduced. We empirically evaluate the proposed backdoor attacks on gradient-based and gradient-free interpretation methods across various deep learning architectures. We show that our attacks constitute a serious security threat when deploying deep learning models developed by untrusted sources. Finally, in the supplement, we demonstrate that the proposed methodology can be used in an inverted setting, in which the correct saliency map is obtained only in the presence of the trigger (a key), effectively making the interpretation system available only to selected users.
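One way to read the trigger-conditioned penalty described above is as the following objective; the notation ($s_\theta$, $m_{\mathrm{target}}$, $d$, $\lambda$) is illustrative and not the paper's exact formulation:

$$\mathcal{L}(\theta) = \sum_{(x,y)} \mathcal{L}_{\mathrm{CE}}\big(f_\theta(x), y\big) + \lambda \sum_{(x,y)} \mathbb{1}\!\left[x \text{ contains the trigger}\right] \, d\big(s_\theta(x),\, m_{\mathrm{target}}(x)\big),$$

where $s_\theta(x)$ is the saliency map produced for input $x$, $m_{\mathrm{target}}(x)$ is the attacker-chosen target map (for the targeted attack; the untargeted variant instead rewards suppressing the originally important pixels), $d$ is a distance between maps, and $\lambda$ trades off prediction accuracy against manipulation of the explanation.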
Saliency methods are a popular class of feature attribution explanation methods that aim to capture a model's predictive reasoning by identifying 'important' pixels in the input image. However, the development and adoption of these methods are hindered by the lack of access to ground-truth model reasoning, which prevents accurate evaluation. In this work, we design a synthetic benchmarking framework, SMERF, that enables us to perform ground-truth-based evaluation while controlling the complexity of the model's reasoning. Experimentally, SMERF reveals significant limitations of existing saliency methods and therefore represents a useful tool for developing new saliency methods.
Saliency methods have emerged as a popular tool to highlight features in an input deemed relevant for the prediction of a learned model. Several saliency methods have been proposed, often guided by visual appeal on image data. In this work, we propose an actionable methodology to evaluate what kinds of explanations a given method can and cannot provide. We find that reliance, solely, on visual assessment can be misleading. Through extensive experiments we show that some existing saliency methods are independent both of the model and of the data generating process. Consequently, methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model, such as finding outliers in the data, explaining the relationship between inputs and outputs that the model learned, and debugging the model. We interpret our findings through an analogy with edge detection in images, a technique that requires neither training data nor model. Theory in the case of a linear model and a single-layer convolutional neural network supports our experimental findings. All code to replicate our findings will be available at https://goo.gl/hBmhDt. We refer here to the broad category of visualization and attribution methods aimed at interpreting trained models; these methods are often used for interpreting deep neural networks, particularly on image data.
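One of the tests this abstract refers to, the model-parameter randomization check, can be sketched as follows: the saliency map of the trained model is compared with the map obtained after re-initializing a chosen layer, and a method whose maps barely change is insensitive to the model. The layer name, the re-initialization scheme, and the rank-correlation summary are illustrative choices rather than the paper's exact protocol.

```python
import copy
import torch
from scipy.stats import spearmanr

def parameter_randomization_test(model, image, saliency_fn, layer_name="fc"):
    """Rank-correlate saliency maps before and after re-initializing one layer.

    `saliency_fn(model, image)` returns a 2-D saliency tensor; `layer_name` and
    the Gaussian re-initialization are illustrative choices.
    """
    original = saliency_fn(model, image).detach().cpu().numpy().ravel()

    randomized_model = copy.deepcopy(model)
    layer = dict(randomized_model.named_modules())[layer_name]
    for p in layer.parameters():
        torch.nn.init.normal_(p, std=0.02)        # destroy what this layer learned
    randomized = saliency_fn(randomized_model, image).detach().cpu().numpy().ravel()

    # A method that is sensitive to model parameters should give low correlation here.
    rho, _ = spearmanr(original, randomized)
    return rho
```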
Unexplainable black-box models create scenarios in which anomalies cause harmful responses, posing unacceptable risks. These risks have motivated the field of eXplainable Artificial Intelligence (XAI) to improve trust by evaluating the local interpretability of black-box neural networks. Unfortunately, ground truth for the model's decision is unavailable, so evaluation is limited to qualitative assessment. Moreover, interpretability can lead to inaccurate conclusions about the model or a false sense of trust. We propose to improve XAI from the vantage point of user trust by exploring the latent feature space of a black-box model. We present ProtoShotXAI, an approach that uses a prototypical few-shot network to explore the contrastive manifold between the nonlinear features of different classes. A user explores the manifold by perturbing the input features of a query example and recording the response for a subset of exemplars from any class. Our approach is the first locally interpretable XAI model that can be extended to, and demonstrated on, few-shot networks. We compare ProtoShotXAI to state-of-the-art XAI approaches on MNIST, Omniglot, and ImageNet, both quantitatively and qualitatively, and find that ProtoShotXAI provides more flexibility for model exploration. Finally, ProtoShotXAI also demonstrates novel explanation and detection of adversarial samples.
As machine learning algorithms are increasingly applied to high-impact, high-risk tasks such as medical diagnosis or autonomous driving, it is critical that researchers can explain how such algorithms arrive at their predictions. In recent years, a number of image saliency methods have been developed that summarize where highly complex neural networks 'look' in an image for evidence for their predictions. However, these techniques are limited by their heuristic nature and architectural constraints. In this paper, we make two main contributions: first, we propose a general framework for learning different kinds of explanations for any black-box algorithm. Second, we specialize the framework to find the part of an image most responsible for a classifier's decision. Unlike previous works, our method is model-agnostic and testable because it is grounded in explicit and interpretable image perturbations.
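A minimal sketch of the perturbation-based explanation this abstract describes: optimize a soft mask that, when used to replace pixels with a blurred copy of the image, suppresses the classifier's score for the predicted class. The blur baseline, the sparsity weight, and the absence of the paper's additional regularizers (e.g., total variation and low-resolution masks) are simplifications.

```python
import torch
import torch.nn.functional as F

def learn_deletion_mask(model, image, target_class, steps=150, lam=0.05):
    """Optimize a soft mask over `image` that deletes evidence for `target_class`.

    `image` has shape (1, 3, H, W); deleted pixels are replaced by a blurred copy.
    This is a simplified sketch of mask-based, model-agnostic explanations.
    """
    for p in model.parameters():                  # freeze the classifier; only the mask is optimized
        p.requires_grad_(False)
    blurred = F.avg_pool2d(image, kernel_size=11, stride=1, padding=5)  # crude blur baseline
    mask = torch.zeros(1, 1, *image.shape[-2:], requires_grad=True)
    optimizer = torch.optim.Adam([mask], lr=0.1)

    for _ in range(steps):
        m = torch.sigmoid(mask)                   # keep mask values in (0, 1)
        perturbed = m * blurred + (1 - m) * image # m close to 1 means "delete this pixel"
        score = model(perturbed).softmax(dim=1)[0, target_class]
        loss = score + lam * m.mean()             # suppress the class, keep the mask small
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return torch.sigmoid(mask).detach()           # high values mark responsible regions
```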
Current unsupervised anomaly localization approaches rely on generative models to learn the distribution of normal images, which is later used to identify potentially anomalous regions derived from errors in the reconstructed image. However, a main limitation of nearly all prior literature is the need for employing anomalous images to set a class-specific threshold to localize anomalies. This limits their usability in realistic scenarios, where only normal data is typically accessible. Despite this major drawback, only a few works have addressed this limitation by integrating supervision on attention maps during training. In this work, we propose a novel formulation that does not require accessing images with abnormalities to define the threshold. Furthermore, and in contrast to very recent work, the proposed constraint is formulated in a more principled manner, leveraging well-established knowledge in constrained optimization. In particular, the equality constraint on attention maps in existing work is replaced by an inequality constraint, which allows more flexibility. In addition, to address the limitations of penalty-based functions, we use an extension of the popular log-barrier method to handle the constraint. Comprehensive experiments on the popular BraTS'19 dataset demonstrate that the proposed approach substantially outperforms the relevant literature, establishing new state-of-the-art results for unsupervised lesion segmentation.
Current unsupervised anomaly localization approaches rely on generative models to learn the distribution of normal images, which is later used to identify potentially anomalous regions derived from errors in the reconstructed image. However, a main limitation of nearly all prior literature is the need for employing anomalous images to set a class-specific threshold to localize anomalies. This limits their usability in realistic scenarios, where typically only normal data is accessible. Despite this major drawback, only a handful of works have addressed this limitation by integrating supervision on attention maps during training. In this work, we propose a novel formulation that does not require accessing images with abnormalities to define the threshold. Furthermore, and in contrast to very recent work, the proposed constraint is formulated in a more principled manner, leveraging well-established knowledge in constrained optimization. In particular, the equality constraint on attention maps used in prior work is replaced by an inequality constraint, which allows more flexibility. In addition, to address the limitations of penalty-based functions, we employ an extension of the popular log-barrier method to handle the constraint. Finally, we propose an alternative regularization term that maximizes the Shannon entropy of the attention maps, reducing the number of hyperparameters of the proposed model. Comprehensive experiments on two publicly available datasets on brain lesion segmentation demonstrate that the proposed approach substantially outperforms the relevant literature, establishing new state-of-the-art results for unsupervised lesion segmentation without requiring access to anomalous images.
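One way to write the constrained formulation the two preceding abstracts refer to, with illustrative notation rather than the papers' exact symbols: the size of the attention map on normal images is bounded by an inequality constraint,

$$\min_\theta \; \mathcal{L}_{\mathrm{rec}}(\theta) \quad \text{s.t.} \quad \frac{1}{|\Omega|} \sum_{i \in \Omega} a_i(\theta) \le c,$$

which a log-barrier surrogate turns into

$$\min_\theta \; \mathcal{L}_{\mathrm{rec}}(\theta) - \frac{1}{t} \log\!\Big(c - \frac{1}{|\Omega|} \sum_{i \in \Omega} a_i(\theta)\Big),$$

where $a_i(\theta)$ are attention values over pixels $\Omega$, $c$ bounds the attended fraction on normal data, and $t$ increases during training so the barrier approximates the exact constraint (the papers use an extended log-barrier that remains defined for infeasible points).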
We consider the problem of anomaly detection in images, and present a new detection technique. Given a sample of images, all known to belong to a "normal" class (e.g., dogs), we show how to train a deep neural model that can detect out-of-distribution images (i.e., non-dog objects). The main idea behind our scheme is to train a multi-class model to discriminate between dozens of geometric transformations applied on all the given images. The auxiliary expertise learned by the model generates feature detectors that effectively identify, at test time, anomalous images based on the softmax activation statistics of the model when applied on transformed images. We present extensive experiments using the proposed detector, which indicate that our technique consistently improves all known algorithms by a wide margin. Unless otherwise mentioned, the use of the adjective "normal" is unrelated to the Gaussian distribution.
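A condensed sketch of the transformation-discrimination idea in this abstract: a classifier is trained to predict which of a fixed set of geometric transformations was applied to a normal image, and at test time the normality score aggregates the softmax probabilities assigned to the correct transformations. The four transformations and the simple mean used below are stand-ins for the paper's larger transformation set and scoring.

```python
import torch

# A small, illustrative set of geometric transformations (the original method
# uses 72 compositions of flips, translations, and rotations).
TRANSFORMS = [
    lambda x: x,                                   # identity
    lambda x: torch.flip(x, dims=[-1]),            # horizontal flip
    lambda x: torch.rot90(x, k=1, dims=[-2, -1]),  # 90-degree rotation
    lambda x: torch.rot90(x, k=2, dims=[-2, -1]),  # 180-degree rotation
]

def normality_score(model, image):
    """Score one square image of shape (1, C, H, W); higher means more 'normal'.

    `model` is a classifier over the len(TRANSFORMS) transformation labels,
    trained only on transformed normal images.
    """
    with torch.no_grad():
        probs = [
            model(t(image)).softmax(dim=1)[0, k]   # probability of the applied transform
            for k, t in enumerate(TRANSFORMS)
        ]
    return torch.stack(probs).mean()               # low values flag anomalous images
```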