Saliency methods compute heat maps that highlight portions of an input that were most {\em important} for the label assigned to it by a deep net. Evaluations of saliency methods convert this heat map into a new {\em masked input} by retaining the $k$ highest-ranked pixels of the original input and replacing the rest with \textquotedblleft uninformative\textquotedblright\ pixels, and checking if the net's output is mostly unchanged. This is usually seen as an {\em explanation} of the output, but the current paper highlights reasons why this inference of causality may be suspect. Inspired by logic concepts of {\em completeness \& soundness}, it observes that the above type of evaluation focuses on completeness of the explanation, but ignores soundness. New evaluation metrics are introduced to capture both notions, while staying in an {\em intrinsic} framework -- i.e., using the dataset and the net, but no separately trained nets, human evaluations, etc. A simple saliency method is described that matches or outperforms prior methods in the evaluations. Experiments also suggest new intrinsic justifications, based on soundness, for popular heuristic tricks such as TV regularization and upsampling.
translated by 谷歌翻译
Deep neural networks are being used increasingly to automate data analysis and decision making, yet their decision-making process is largely unclear and is difficult to explain to the end users. In this paper, we address the problem of Explainable AI for deep neural networks that take images as input and output a class probability. We propose an approach called RISE that generates an importance map indicating how salient each pixel is for the model's prediction. In contrast to white-box approaches that estimate pixel importance using gradients or other internal network state, RISE works on blackbox models. It estimates importance empirically by probing the model with randomly masked versions of the input image and obtaining the corresponding outputs. We compare our approach to state-of-the-art importance extraction methods using both an automatic deletion/insertion metric and a pointing metric based on human-annotated object segments. Extensive experiments on several benchmark datasets show that our approach matches or exceeds the performance of other methods, including white-box approaches.
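The random-masking idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a flat list of pixel values, independent per-pixel binary masks (the smoothed, upsampled masks used in practice are omitted), and a hypothetical `toy_model` standing in for the black box.

```python
import random


def rise_importance(model, image, n_masks=2000, keep_prob=0.5, seed=0):
    """Monte Carlo importance estimate in the spirit of RISE.

    model: black-box callable mapping a list of pixel values to a score.
    image: flat list of pixel values (a stand-in for a real image).
    """
    rng = random.Random(seed)
    n = len(image)
    totals = [0.0] * n
    for _ in range(n_masks):
        # Random binary mask: 1 keeps a pixel, 0 replaces it with zero.
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(n)]
        score = model([p * m for p, m in zip(image, mask)])
        # Credit the score to every pixel that was visible in this probe.
        for i, m in enumerate(mask):
            totals[i] += score * m
    # Normalise by the expected number of probes in which a pixel is visible.
    return [t / (n_masks * keep_prob) for t in totals]


def toy_model(pixels):
    # Hypothetical black box: the score depends only on pixels 2 and 3.
    return 0.5 * pixels[2] + 0.5 * pixels[3]


saliency = rise_importance(toy_model, [1.0] * 6)
```

Only forward passes of `model` are used, which is what makes the estimate applicable to black-box models; the pixels the toy model actually reads should receive the largest values.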
In this work we develop a fast saliency detection method that can be applied to any differentiable image classifier. We train a masking model to manipulate the scores of the classifier by masking salient parts of the input image. Our model generalises well to unseen images and requires a single forward pass to perform saliency detection, making it suitable for use in real-time systems. We test our approach on the CIFAR-10 and ImageNet datasets and show that the produced saliency maps are easily interpretable, sharp, and free of artifacts. We suggest a new metric for saliency and test our method on the ImageNet object localisation task, where we achieve results outperforming other weakly supervised methods.
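As machine learning algorithms are increasingly applied to high-impact yet high-risk tasks, such as medical diagnosis or autonomous driving, it is critical that researchers can explain how such algorithms arrive at their predictions. In recent years, a number of image saliency methods have been developed to summarize where highly complex neural networks "look" in an image for evidence for their predictions. However, these techniques are limited by their heuristic nature and architectural constraints. In this paper, we make two main contributions: First, we propose a general framework for learning different kinds of explanations for any black-box algorithm. Second, we specialise the framework to find the part of an image most responsible for a classifier decision. Unlike previous works, our method is model-agnostic and testable because it is grounded in explicit and interpretable image perturbations.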
Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.
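Attribution methods provide a direction for interpreting opaque neural networks visually, by identifying and visualizing the input regions/pixels that dominate the output of a network. Visually explaining video understanding networks with attribution is challenging because of the unique spatiotemporal dependencies in video inputs and the special 3D convolutional or recurrent structures of video understanding networks. However, most existing attribution methods focus on explaining networks that take a single image as input, and the few works specifically devised for video attribution fall short of handling the diverse structures of video understanding networks. In this paper, we investigate a generic perturbation-based attribution method that is compatible with diverse video understanding networks. In addition, we propose a novel regularization term that enhances the method by constraining the smoothness of its attribution results in both the spatial and temporal dimensions. To assess the effectiveness of different video attribution methods without relying on manual judgement, we introduce reliable objective metrics, which are checked by a newly proposed reliability measurement. We verify the effectiveness of our method through both subjective and objective evaluation and through comparison with multiple significant attribution methods.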
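To make the notion of spatial and temporal smoothness concrete, the sketch below computes a total-variation style penalty on a video attribution mask. This is a stand-in chosen for illustration: the abstract does not specify the form of the regularization term, and the function name, the `mask[t][y][x]` layout, and the example masks are all assumptions.

```python
def tv_smoothness(mask):
    """Return (spatial, temporal) total variation of a mask[t][y][x]."""
    n_frames, height, width = len(mask), len(mask[0]), len(mask[0][0])
    spatial = 0.0
    temporal = 0.0
    for t in range(n_frames):
        for y in range(height):
            for x in range(width):
                v = mask[t][y][x]
                # Differences to the right and lower neighbours in a frame.
                if x + 1 < width:
                    spatial += abs(mask[t][y][x + 1] - v)
                if y + 1 < height:
                    spatial += abs(mask[t][y + 1][x] - v)
                # Difference to the same pixel in the next frame.
                if t + 1 < n_frames:
                    temporal += abs(mask[t + 1][y][x] - v)
    return spatial, temporal


# A constant mask is perfectly smooth in both space and time.
constant = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(2)]
# A mask that flips between frames is smooth in space but not in time.
flicker = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]

constant_tv = tv_smoothness(constant)
flicker_tv = tv_smoothness(flicker)
```

In a perturbation-based method, a weighted sum of the two terms would typically be added to the objective that the mask is optimized against, so that flickering or speckled masks are penalized.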
Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks and to the reasons why a network makes specific decisions. In this paper, we develop a novel post-hoc visual explanation method called Score-CAM, based on class activation mapping. Unlike previous class activation mapping based approaches, Score-CAM removes the dependence on gradients by obtaining the weight of each activation map through its forward-pass score on the target class; the final result is obtained as a linear combination of weights and activation maps. We demonstrate that Score-CAM achieves better visual performance and fairness for interpreting the decision-making process. Our approach outperforms previous methods on both recognition and localization tasks, and it also passes the sanity check. We also indicate its application as a debugging tool. The implementation is available.
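The weighting scheme described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the released implementation: activation maps are taken to be already at input resolution and stored as flat lists, and the min-max normalisation, softmax over scores, and final clipping at zero are details assumed here rather than stated in the abstract. `score_fn` is a hypothetical stand-in for the network's target-class score.

```python
import math


def score_cam(score_fn, image, activation_maps):
    """Gradient-free CAM: weight each activation map by a forward-pass score."""
    scores = []
    for amap in activation_maps:
        lo, hi = min(amap), max(amap)
        span = hi - lo
        # Normalise the map to [0, 1] so it can act as a soft mask.
        norm = [(a - lo) / span if span > 0 else 0.0 for a in amap]
        masked = [p * m for p, m in zip(image, norm)]
        # The weight comes from a forward pass on the masked input.
        scores.append(score_fn(masked))
    # Softmax over the scores (an assumption of this sketch).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Linear combination of the maps, clipped at zero.
    combined = [
        sum(w * amap[i] for w, amap in zip(weights, activation_maps))
        for i in range(len(image))
    ]
    return [max(0.0, c) for c in combined]


def toy_score(pixels):
    # Hypothetical target-class score that depends only on pixel 0.
    return 2.0 * pixels[0]


maps = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
cam = score_cam(toy_score, [1.0, 1.0, 1.0, 1.0], maps)
```

The map that highlights the pixel the score depends on receives the larger weight, so that pixel dominates the combined map, and no gradient is computed at any point.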
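With the rise of deep neural networks, the challenge of explaining the predictions of these networks has become increasingly recognized. While many methods for explaining the decisions of deep neural networks exist, there is currently no consensus on how to evaluate them. On the other hand, robustness is a popular topic in deep learning research; however, until very recently it was hardly discussed in the context of explainability. In this tutorial, we start by presenting gradient-based interpretability methods. These techniques use gradient signals to assign the burden of the decision to the input features. We then discuss how gradient-based methods can be evaluated for their robustness and the role that adversarial robustness plays in obtaining meaningful explanations. We also discuss the limitations of gradient-based methods. Finally, we present best practices and properties that should be examined before choosing an explainability method. We conclude with future directions for research at the convergence of robustness and explainability.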
State-of-the-art object detectors are treated as black boxes due to their highly non-linear internal computations. Even with unprecedented advancements in detector performance, the inability to explain how their outputs are generated limits their use in safety-critical applications. Previous work fails to produce explanations for both bounding box and classification decisions, and generally makes individual explanations for various detectors. In this paper, we propose an open-source Detector Explanation Toolkit (DExT) which implements the proposed approach to generate a holistic explanation for all detector decisions using certain gradient-based explanation methods. We suggest various multi-object visualization methods to merge the explanations of multiple objects detected in an image, as well as the corresponding detections, in a single image. The quantitative evaluation shows that the Single Shot MultiBox Detector (SSD) is more faithfully explained than other detectors, regardless of the explanation method. Both quantitative and human-centric evaluations identify that SmoothGrad with Guided Backpropagation (GBP) provides more trustworthy explanations among the selected methods across all detectors. We expect that DExT will motivate practitioners to evaluate object detectors from the interpretability perspective by explaining both bounding box and classification decisions.
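We describe a novel attribution method that is grounded in sensitivity analysis and uses Sobol indices. Beyond modeling the individual contributions of image regions, Sobol indices provide an efficient way to capture higher-order interactions between image regions and their contributions to a neural network's prediction through the lens of variance. We describe an approach that makes the computation of these indices tractable for high-dimensional problems by using perturbation masks coupled with efficient estimators to handle the high dimensionality of images. Importantly, we show that the proposed method achieves favorable scores on standard benchmarks for vision (and language) models compared to other black-box methods, even surpassing the accuracy of state-of-the-art white-box methods that require access to internal representations. Our code is freely available: https://github.com/fel-thomas/sobol-attribution-method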
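A basic task in explainable AI (XAI) is to identify the most important features behind a prediction made by a black-box function $f$. The insertion and deletion tests of Petsiuk et al. (2018) are used to judge the quality of algorithms that rank pixels from most to least important for a classification. Motivated by regression problems, we establish a formula for their area under the curve (AUC) criteria in terms of certain main effects and interactions in an anchored decomposition of $f$. We find an expression for the expected value of the AUC under a random ordering of the inputs to $f$ and propose an alternative, the area above a straight line, for the regression setting. We use this criterion to compare feature importances computed by integrated gradients (IG) to those computed by Kernel SHAP (KS), as well as by LIME, DeepLIFT, vanilla gradient, and input$\times$gradient methods. KS has the best overall performance on the two datasets we consider, but it is very expensive to compute. We find that IG is nearly as good as KS while being much faster. Our comparison problems include some binary inputs that pose a challenge to IG, because it must use values between the possible variable levels, so we consider ways to handle binary variables in IG. We show that sorting variables by their Shapley value does not necessarily give the optimal ordering for an insertion-deletion test. It will, however, do so for monotone functions of additive models, such as logistic regression.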
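As a concrete illustration of the insertion and deletion tests mentioned above, the sketch below computes both curves and their AUC for a small regression function. It is a minimal sketch under stated assumptions: features are "removed" by setting them to a fixed baseline, the AUC uses the trapezoidal rule with the x-axis scaled to [0, 1], and the linear `toy_f` is hypothetical.

```python
def deletion_curve(model, x, ranking, baseline=0.0):
    """Model outputs as features are removed, most important first."""
    current = list(x)
    outputs = [model(current)]
    for i in ranking:
        current[i] = baseline
        outputs.append(model(current))
    return outputs


def insertion_curve(model, x, ranking, baseline=0.0):
    """Model outputs as features are restored, most important first."""
    current = [baseline] * len(x)
    outputs = [model(current)]
    for i in ranking:
        current[i] = x[i]
        outputs.append(model(current))
    return outputs


def auc(values):
    """Trapezoidal area under the curve with the x-axis scaled to [0, 1]."""
    steps = len(values) - 1
    return sum((values[k] + values[k + 1]) / 2 for k in range(steps)) / steps


def toy_f(x):
    # Hypothetical regression function with known feature importances.
    return 3.0 * x[0] + 1.0 * x[1] + 2.0 * x[2]


x = [1.0, 1.0, 1.0]
good_ranking = [0, 2, 1]  # matches the true order of importance
poor_ranking = [1, 2, 0]  # least important feature first

del_good = auc(deletion_curve(toy_f, x, good_ranking))
del_poor = auc(deletion_curve(toy_f, x, poor_ranking))
ins_good = auc(insertion_curve(toy_f, x, good_ranking))
ins_poor = auc(insertion_curve(toy_f, x, poor_ranking))
```

A better ranking gives a lower deletion AUC (the output drops quickly) and a higher insertion AUC (the output recovers quickly), which is the comparison the tests are built on.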
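End-to-end neural NLP architectures are notoriously difficult to understand, which has given rise to numerous efforts towards model explainability in recent years. An essential principle of model explanation is faithfulness, i.e., an explanation should accurately represent the reasoning process behind the model's prediction. This survey first discusses the definition and evaluation of faithfulness, as well as its significance for explainability. We then introduce recent advances in faithful explanation by grouping approaches into five categories: similarity methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models. Each category is illustrated with its representative studies, advantages, and shortcomings. Finally, we discuss all of the above methods in terms of their common virtues and limitations, and reflect on future directions for work towards faithful explainability. For researchers interested in studying interpretability, this survey offers an accessible and comprehensive overview of the area, laying a basis for further exploration. For users hoping to better understand their own models, this survey serves as an introductory manual that helps with choosing the most suitable explanation method.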
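The emerging field of eXplainable Artificial Intelligence (XAI) aims to bring transparency to today's powerful but opaque deep learning models. While local XAI methods explain individual predictions in the form of attribution maps, thereby identifying where important features occur (but not providing information about what they represent), global explanation techniques visualize the concepts a model has generally learned to encode. Both types of methods thus provide only partial insights and leave the burden of interpreting the model's reasoning to the user. Only a few contemporary techniques aim to combine the principles behind local and global XAI to obtain more informative explanations. These methods, however, are often limited to specific model architectures or impose additional requirements on training regimes or on data and label availability, which renders post-hoc application to arbitrary pre-trained models practically impossible. In this work we introduce the Concept Relevance Propagation (CRP) approach, which combines the local and global perspectives of XAI and thus allows answering both the "where" and the "what" questions for individual predictions, without additional constraints imposed. We further introduce the principle of Relevance Maximization for finding representative examples of encoded concepts based on their usefulness to the model. We thereby lift the dependency on the common practice of Activation Maximization and its limitations. We demonstrate the capabilities of our methods in various settings, showing that Concept Relevance Propagation and Relevance Maximization lead to more human-interpretable explanations and provide deep insights into the model's representations and reasoning through concept atlases, concept composition analyses, and quantitative investigations of concept subspaces and their role in fine-grained decision making.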
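Trustworthy machine learning is driving a large body of work in the ML community aimed at improving ML acceptance and adoption. The main aspects of trustworthy machine learning are the following: fairness, uncertainty, robustness, explainability, and formal guarantees. Each of these individual domains has attracted the interest of the ML community, as is visible from the number of related publications. However, few works address the interconnections between these fields. In this paper, we show a first link between uncertainty and explainability by studying the relation between calibration and interpretation. Since calibrating a given model changes the way it scores samples, and interpretation approaches often rely on these scores, it seems safe to assume that the confidence calibration of a model interacts with our ability to interpret that model. We show, for networks trained on image classification tasks, that interpretations are sensitive to confidence calibration. This leads us to suggest a simple practice for improving interpretation outcomes: calibrate to interpret.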
Despite the popularity of Vision Transformers (ViTs) and eXplainable AI (XAI), only a few explanation methods have been proposed for ViTs thus far. They use attention weights of the classification token on patch embeddings and often produce unsatisfactory saliency maps. In this paper, we propose a novel method for explaining ViTs called ViT-CX. It is based on patch embeddings, rather than attentions paid to them, and their causal impacts on the model output. ViT-CX can be used to explain different ViT models. Empirical results show that, in comparison with previous methods, ViT-CX produces more meaningful saliency maps and does a better job at revealing all the important evidence for prediction. It is also significantly more faithful to the model as measured by deletion AUC and insertion AUC.
Interpretability provides a means for humans to verify aspects of machine learning (ML) models and empower human+ML teaming in situations where the task cannot be fully automated. Different contexts require explanations with different properties. For example, the kind of explanation required to determine if an early cardiac arrest warning system is ready to be integrated into a care setting is very different from the type of explanation required for a loan applicant to help determine the actions they might need to take to make their application successful. Unfortunately, there is a lack of standardization when it comes to properties of explanations: different papers may use the same term to mean different quantities, and different terms to mean the same quantity. This lack of a standardized terminology and categorization of the properties of ML explanations prevents us from both rigorously comparing interpretable machine learning methods and identifying what properties are needed in what contexts. In this work, we survey properties defined in interpretable machine learning papers, synthesize them based on what they actually measure, and describe the trade-offs between different formulations of these properties. In doing so, we enable more informed selection of task-appropriate formulations of explanation properties as well as standardization for future work in interpretable machine learning.
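Self-supervised vision learning has revolutionized deep learning, becoming the next big challenge in the domain and rapidly closing the gap with supervised methods on large computer vision benchmarks. With current models and training data growing exponentially, explaining and understanding these models becomes pivotal. We study the problem of explainable artificial intelligence in the domain of self-supervised learning for vision tasks, and present methods to understand networks trained with self-supervision and their inner workings. Given the huge diversity of self-supervised vision pretext tasks, we narrow our focus to paradigms that learn from two views of the same image, and mainly aim to understand the pretext task. Our work focuses on explaining similarity learning and is easily extendable to all other pretext tasks. We study two popular self-supervised vision models: SimCLR and Barlow Twins. We develop a total of six methods for visualizing and understanding these models: perturbation-based methods (conditional occlusion, context-agnostic conditional occlusion, and pairwise occlusion), Interaction-CAM, feature visualization, model difference visualization, averaged transforms, and pixel invariance. Finally, we evaluate these explanations by translating well-known evaluation metrics, tailored to supervised image classification systems involving a single image, into the domain of self-supervised learning, where two images are involved. Code is at: https://github.com/fawazsammani/xai-ssl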
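Missingness, or the absence of features from an input, is a concept fundamental to many model debugging tools. However, in computer vision, pixels cannot simply be removed from an image. One thus tends to resort to heuristics such as blacking out pixels, which may in turn introduce bias into the debugging process. We study such biases and, in particular, show how transformer-based architectures can enable a more natural implementation of missingness, which side-steps these issues and improves the reliability of model debugging in practice. Our code is available at https://github.com/madrylab/missingness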
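As modern complex neural networks keep breaking records and solving harder problems, their predictions also become less and less intelligible. The current lack of interpretability often undermines the deployment of accurate machine learning tools in sensitive settings. In this work, we present a model-agnostic explanation method for image classification based on a hierarchical extension of Shapley coefficients, Hierarchical Shap (h-Shap), that resolves some of the limitations of current approaches. Unlike other Shapley-based explanation methods, h-Shap is scalable and can be computed without approximation. Under certain distributional assumptions, such as those common in multiple instance learning, h-Shap retrieves the exact Shapley coefficients with an exponential improvement in computational complexity. We compare our hierarchical approach with popular Shapley-based and non-Shapley-based methods on a medical imaging scenario and a general computer vision problem, showing that h-Shap outperforms the state of the art in both accuracy and runtime. Code and experiments are publicly available.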
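Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We examine the literature and find that many methods are based on a shared principle of explaining by removing: essentially, measuring the impact of removing sets of features from a model. These methods vary in several respects, so we develop a framework for removal-based explanations that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches (SHAP, LIME, Meaningful Perturbations, permutation tests). Exposing the fundamental similarities between these methods empowers users to reason about which tools to use, and suggests promising directions for ongoing model explainability research.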
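The three dimensions of a removal-based explanation can be made concrete with a small sketch in which each dimension is a pluggable function. All names here (`baseline_removal`, `prediction`, `leave_one_out`, `explain`, `toy_model`) are hypothetical and chosen for illustration; the particular choices shown (baseline replacement, raw prediction, leave-one-out summary) are just one point in the design space, not the framework itself.

```python
def baseline_removal(x, keep, baseline=0.0):
    """Dimension 1: remove features by replacing them with a baseline."""
    return [v if i in keep else baseline for i, v in enumerate(x)]


def prediction(model, x_removed):
    """Dimension 2: the model behavior being explained (the raw output)."""
    return model(x_removed)


def leave_one_out(value_fn, n_features):
    """Dimension 3: summarize influence as the drop when one feature goes."""
    everything = set(range(n_features))
    full_value = value_fn(everything)
    return [full_value - value_fn(everything - {i}) for i in range(n_features)]


def explain(model, x, remove=baseline_removal, behavior=prediction,
            summarize=leave_one_out):
    """Compose the three choices into a single attribution method."""
    def value_fn(keep):
        return behavior(model, remove(x, keep))
    return summarize(value_fn, len(x))


def toy_model(x):
    # Hypothetical model with one additive term and one interaction term.
    return 3.0 * x[0] + x[1] * x[2]


attributions = explain(toy_model, [1.0, 2.0, 0.5])
```

Swapping any one of the three arguments to `explain` (for example, a different summary rule) yields a different method while leaving the other two choices unchanged, which is the sense in which such methods can be compared along separate dimensions.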