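Attention mechanisms dominate the explainability of deep models. They produce probability distributions over the input, which are widely regarded as indicators of feature importance. In this paper, however, we find a critical limitation of attention explanations: they are weak at identifying the polarity of feature impact. This can be misleading: features with higher attention weights may not faithfully contribute to model predictions; instead, they can impose suppression effects. With this finding, we reflect on the explainability of current attention-based techniques, such as Attention ⊙ Gradient and LRP-based attention explanations. We first propose an actionable diagnostic methodology (henceforth the faithfulness violation test) to measure the consistency between explanation weights and impact polarity. Through extensive experiments, we show that most of the tested explanation methods are unexpectedly hindered by the faithfulness violation issue, especially raw attention. An empirical analysis of the factors affecting violation issues further provides useful observations for adopting explanation methods in attention models.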
translated by Google Translate
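The core check of a faithfulness violation test, comparing each feature's explanation weight with the polarity of its actual impact on the prediction, can be sketched as follows. This is a minimal sketch under assumptions the abstract does not state: impact polarity is measured by erasure (zeroing one feature and recording the change in the predicted-class score), the model is a hypothetical linear scorer, and the function names are illustrative.

```python
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]


def erasure_impact(model: Model, x: Sequence[float], i: int) -> float:
    """Change in the predicted-class score when feature i is erased (set to 0).

    Positive: feature i supports the prediction; negative: it suppresses it.
    """
    erased = list(x)
    erased[i] = 0.0
    return model(list(x)) - model(erased)


def violation_rate(model: Model, x: Sequence[float], weights: Sequence[float]) -> float:
    """Fraction of features whose explanation sign disagrees with their impact polarity.

    Non-negative weights (e.g. raw attention) implicitly claim positive support,
    so every feature with a negative impact counts as a violation.
    """
    counted = violations = 0
    for i, w in enumerate(weights):
        impact = erasure_impact(model, x, i)
        if w == 0 or impact == 0:
            continue  # no polarity claimed, or no measurable effect
        counted += 1
        if (w > 0) != (impact > 0):
            violations += 1
    return violations / counted if counted else 0.0


# Hypothetical toy scorer in which feature 1 suppresses the predicted class.
COEF = [2.0, -1.0, 0.5]


def toy_model(v: List[float]) -> float:
    return sum(c * vi for c, vi in zip(COEF, v))


x = [1.0, 1.0, 1.0]
attention = [0.2, 0.7, 0.1]  # highest weight sits on the suppressing feature
signed = [2.0, -1.0, 0.5]    # a signed explanation matching the true polarity
print(violation_rate(toy_model, x, attention))  # 1 of 3 features violates
print(violation_rate(toy_model, x, signed))     # no violations
```

Erasure is only one way to probe polarity; any perturbation that isolates a single feature's effect could take its place.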
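End-to-end neural NLP architectures are notoriously difficult to understand, which has given rise to many efforts toward explainable modeling in recent years. A fundamental principle of model explanation is faithfulness, i.e., an explanation should accurately represent the reasoning process behind the model's prediction. This survey first discusses the definition and evaluation of faithfulness, as well as its significance for explainability. We then introduce recent advances in faithful explanation by grouping the methods into five categories: similarity methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models. Each category is illustrated with its representative studies, advantages, and shortcomings. Finally, we discuss all the above methods in terms of their common virtues and limitations, and reflect on future work directions toward faithful explainability. For researchers interested in studying interpretability, this survey offers an accessible and comprehensive overview of the area, laying the basis for further exploration. For users hoping to better understand their own models, this survey serves as an introductory manual that helps with choosing the most suitable explanation method.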
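Transformers are widely used in NLP, where they consistently achieve state-of-the-art performance. This is due to their attention-based architecture, which allows them to model rich linguistic relations between words. However, transformers are difficult to interpret. Being able to provide reasoning for its decisions is an important property for a model in domains where human lives are affected, such as hate speech detection and biomedicine. As transformers find wide use in these domains, interpretability techniques tailored to them are needed. In this work, we study the effectiveness of attention-based interpretability techniques for text classification. Despite concerns about attention-based explanations in the literature, we show that, with the proper setup, attention can be used for such tasks with results comparable to state-of-the-art techniques, while also being faster and friendlier. We validate our claims through a series of experiments that employ a new feature importance metric.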
Deep Learning and Machine Learning based models have become extremely popular in text processing and information retrieval. However, the non-linear structures present inside the networks make these models largely inscrutable. A significant body of research has focused on increasing the transparency of these models. This article provides a broad overview of research on the explainability and interpretability of natural language processing and information retrieval methods. More specifically, we survey approaches that have been applied to explain word embeddings, sequence modeling, attention modules, transformers, BERT, and document ranking. The concluding section suggests some possible directions for future research on this topic.
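Transformers have become an important workhorse of machine learning, with numerous applications. This necessitates the development of reliable methods for increasing their transparency. Multiple interpretability methods, often based on gradient information, have been proposed. We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction. We identify attention heads and LayerNorm as the main reasons for such unreliable explanations, and propose a more stable way to propagate through these layers. Our proposal, which can be seen as a proper extension of the well-established LRP method to Transformers, is shown both theoretically and empirically to overcome the deficiency of a simple gradient-based approach, and achieves state-of-the-art explanation performance on a broad range of Transformer models and datasets.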
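A toy illustration of why an attention head breaks gradient-based attributions, and of one possible fix, is sketched below. It assumes (our assumption, not a detail given above) that "more stable propagation" means treating the attention weights as constants in the backward pass. The single-head "self-gating" layer y = Σ_i p_i(x)·x_i with p = softmax(x) is a hypothetical stand-in for a real Transformer layer. With full gradients, Gradient × Input sums to y + Var_p(x) instead of y; with p held constant the layer is linear in x, and Gradient × Input sums exactly to y (the conservation property LRP aims for).

```python
import math
from typing import List


def softmax(x: List[float]) -> List[float]:
    m = max(x)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def self_gating(x: List[float]) -> float:
    """Toy single-head attention: y = sum_i p_i(x) * x_i with p = softmax(x)."""
    p = softmax(x)
    return sum(pi * xi for pi, xi in zip(p, x))


def full_gradient(x: List[float]) -> List[float]:
    """Exact dy/dx_k = p_k * (1 + x_k - y), differentiating through the softmax."""
    p = softmax(x)
    y = sum(pi * xi for pi, xi in zip(p, x))
    return [pk * (1.0 + xk - y) for pk, xk in zip(p, x)]


def detached_gradient(x: List[float]) -> List[float]:
    """dy/dx_k with the attention weights held constant: simply p_k."""
    return softmax(x)


def grad_times_input(grad: List[float], x: List[float]) -> List[float]:
    return [g * xi for g, xi in zip(grad, x)]


x = [1.0, -2.0, 0.5]
y = self_gating(x)
full_total = sum(grad_times_input(full_gradient(x), x))
detached_total = sum(grad_times_input(detached_gradient(x), x))
print(f"y={y:.4f}  full G*I sum={full_total:.4f}  detached G*I sum={detached_total:.4f}")
```

The gap between the two sums is exactly the attention-weighted variance of the inputs, so it vanishes only when all attended values are equal.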
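Visual attention in Visual Question Answering (VQA) aims to locate the right image regions for answer prediction, offering a powerful technique to promote multimodal understanding. However, recent studies have pointed out that the image regions highlighted by visual attention are often irrelevant to the given question and answer, which confuses the model and hinders correct visual reasoning. To tackle this problem, existing methods mostly resort to aligning visual attention with human attention. Nevertheless, gathering such human data is laborious and expensive, making it burdensome to adapt well-developed models across datasets. To address this issue, in this paper we devise a novel visual attention regularization approach, namely AttReg, for better visual grounding in VQA. Specifically, AttReg first identifies the image regions that are essential for question answering yet unexpectedly ignored (i.e., assigned low attention weights) by the backbone model. Then, a mask-guided learning scheme is leveraged to regularize the visual attention so that it focuses more on these ignored key regions. The proposed method is very flexible and model-agnostic: it can be integrated into most visual-attention-based VQA models and requires no human attention supervision. Extensive experiments on three benchmark datasets, i.e., VQA-CP v2, VQA-CP v1, and VQA v2, have been conducted to evaluate the effectiveness of AttReg. As a by-product, when AttReg is incorporated into the strong baseline LMH, our approach achieves a new state-of-the-art accuracy of 60.00%, an absolute performance gain of 7.01%, on the VQA-CP v2 benchmark dataset.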
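The first step described above, finding regions that are essential to the question yet receive low attention, amounts to a simple filter. The abstract does not say how AttReg decides which regions are essential, so this sketch assumes a precomputed boolean mask; the default threshold of 1/n (less attention than a uniform distribution would give) is likewise a hypothetical choice, and the mask-guided regularization step itself is not shown.

```python
from typing import List, Optional, Sequence


def ignored_key_regions(attention: Sequence[float],
                        essential: Sequence[bool],
                        threshold: Optional[float] = None) -> List[int]:
    """Indices of regions marked essential but given low attention.

    `threshold` defaults to 1/n, i.e. less attention than a uniform distribution
    would give (an illustrative choice, not AttReg's actual criterion).
    """
    if len(attention) != len(essential):
        raise ValueError("attention and essential must have the same length")
    if threshold is None:
        threshold = 1.0 / len(attention)
    return [i for i, (a, e) in enumerate(zip(attention, essential)) if e and a < threshold]


# Five hypothetical image regions; regions 0, 1 and 3 are assumed essential.
attention = [0.5, 0.02, 0.3, 0.05, 0.13]
essential = [True, True, False, True, False]
print(ignored_key_regions(attention, essential))        # regions 1 and 3
print(ignored_key_regions(attention, essential, 0.01))  # nothing falls below 0.01
```

The returned indices are what a mask-guided scheme would then target.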
As the societal impact of Deep Neural Networks (DNNs) grows, the goals for advancing DNNs become more complex and diverse, ranging from improving a conventional model accuracy metric to infusing advanced human virtues such as fairness, accountability, transparency (FaccT), and unbiasedness. Recently, techniques in Explainable Artificial Intelligence (XAI) have attracted considerable attention and have tremendously helped Machine Learning (ML) engineers understand AI models. At the same time, however, AI communities have started to voice an emerging need that goes beyond XAI: based on the insights learned from XAI, how can we better empower ML engineers to steer their DNNs so that the model's reasonableness and performance improve as intended? This article provides a timely and extensive literature overview of the field of Explanation-Guided Learning (EGL), a domain of techniques that steer the DNNs' reasoning process by adding regularization, supervision, or intervention on model explanations. In doing so, we first provide a formal definition of EGL and its general learning paradigm. Second, we give an overview of the key factors for EGL evaluation and summarize and categorize existing evaluation procedures and metrics for EGL. Finally, we discuss current and potential future application areas and directions of EGL, and present an extensive experimental study that provides comprehensive comparisons among existing EGL models in popular application domains such as Computer Vision (CV) and Natural Language Processing (NLP).
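One common way of "adding regularization on model explanations" penalizes the input gradient on features an annotator marked as irrelevant, in the spirit of "right for the right reasons" training. The sketch below does this for a logistic-regression model, where the gradient of the true-label log-probability has the closed form (y − p)·w. The model, the irrelevance mask, and the weight λ are illustrative assumptions; EGL covers many other forms of supervision and intervention.

```python
import math
from typing import Sequence


def softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def explanation_regularized_loss(w: Sequence[float], b: float, x: Sequence[float],
                                 y: int, irrelevant: Sequence[int], lam: float) -> float:
    """Binary cross-entropy plus a penalty on input gradients over irrelevant features.

    For p = sigmoid(w.x + b), d/dx log p(y|x) = (y - p) * w, so the "explanation"
    here is the input gradient of the true-label log-probability.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = math.exp(-softplus(-z))          # stable sigmoid(z)
    bce = softplus(z) - y * z            # equals -log p(y|x) for y in {0, 1}
    input_grad = [(y - p) * wi for wi in w]
    penalty = sum(m * g * g for m, g in zip(irrelevant, input_grad))
    return bce + lam * penalty


w = [1.0, 0.5]    # the model leans partly on feature 1 ...
x = [2.0, 5.0]
mask = [0, 1]     # ... which a (hypothetical) annotator marked as irrelevant
plain = explanation_regularized_loss(w, 0.0, x, 1, [0, 0], lam=10.0)
guided = explanation_regularized_loss(w, 0.0, x, 1, mask, lam=10.0)
print(plain, guided)  # the guided loss adds a penalty for relying on feature 1
```

Minimizing the guided loss pushes the weight on the masked feature toward zero, while the plain loss has no reason to do so.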
Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.
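A simple robustness metric from this literature is the top-k intersection: the fraction of the k most important features (by absolute attribution) that stay in the top k after the input is perturbed. The sketch assumes attributions come as plain lists of scores and that k is chosen by the user; the example attributions are made up.

```python
from typing import Sequence, Set


def top_k_indices(scores: Sequence[float], k: int) -> Set[int]:
    """Indices of the k features with the largest absolute attribution."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return set(ranked[:k])


def top_k_intersection(original: Sequence[float], perturbed: Sequence[float], k: int) -> float:
    """Fraction of the original top-k features still in the top k after perturbation."""
    if len(original) != len(perturbed) or not 0 < k <= len(original):
        raise ValueError("need equal-length attributions and 0 < k <= n")
    return len(top_k_indices(original, k) & top_k_indices(perturbed, k)) / k


original = [0.9, 0.1, -0.8, 0.05, 0.3]    # attribution for the clean input (made up)
perturbed = [0.85, 0.7, -0.1, 0.05, 0.3]  # attribution after a small input perturbation
print(top_k_intersection(original, perturbed, k=2))  # 0.5
print(top_k_intersection(original, perturbed, k=3))
```

A value of 1.0 means the perturbation left the top-k set untouched; an attributional attack tries to drive it toward 0 while keeping the prediction unchanged.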
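The increasing availability of Electronic Health Record (EHR) data and advances in deep learning (DL) techniques have sparked a surge of research interest in developing DL-based clinical decision support systems for diagnosis, prognosis, and treatment. Despite the acknowledged value of deep learning in healthcare, barriers to its further adoption in real clinical settings remain because of the black-box nature of DL. There is therefore an emerging need for interpretable DL, which allows end users to evaluate model decisions so that they know whether to accept or reject predictions and recommendations before acting on them. In this review, we focus on the interpretability of DL models in healthcare. We first provide an in-depth and comprehensive introduction to interpretability methods, serving as a methodological reference for future researchers or clinical practitioners in this field. Besides the details of these methods, we also discuss their advantages and disadvantages and the scenarios each of them is suited to, so that interested readers can know how to compare and choose among them. Moreover, we discuss how these methods, originally developed for general-domain problems, have been adapted and applied to healthcare problems, and how they can help physicians better understand these data-driven technologies. Overall, we hope this survey helps researchers and practitioners in both the artificial intelligence (AI) and clinical fields understand which methods are available for improving the interpretability of their DL models, and choose the optimal one accordingly.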
Some recent works observed the instability of post-hoc explanations when input-side perturbations are applied to the model. This raises interest in, and concern about, the stability of post-hoc explanations. The remaining question, however, is whether the instability is caused by the neural network model or by the post-hoc explanation method. This work explores the potential source of unstable post-hoc explanations. To separate out the model's influence, we propose a simple output probability perturbation method. Compared to prior input-side perturbation methods, output probability perturbation circumvents the neural model's potential effect on the explanations and allows the explanation method to be analyzed on its own. We evaluate the proposed method with three widely used post-hoc explanation methods (LIME (Ribeiro et al., 2016), Kernel Shapley (Lundberg and Lee, 2017a), and Sample Shapley (Strumbelj and Kononenko, 2010)). The results demonstrate that the post-hoc methods are stable, barely producing discrepant explanations under output probability perturbations. This observation suggests that neural network models may be the primary source of fragile explanations.
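The output probability perturbation idea can be sketched as a wrapper that adds small noise to a black box's class probabilities and renormalizes them, so that any change in the explanation comes from the explanation method rather than from the model reacting to perturbed inputs. In this sketch, occlusion (leave-one-feature-out) stands in for LIME or the Shapley estimators, and the toy classifier and Gaussian noise scale are illustrative assumptions; the paper's exact perturbation scheme may differ.

```python
import math
import random
from typing import Callable, List

Model = Callable[[List[float]], List[float]]


def toy_model(x: List[float]) -> List[float]:
    """Hypothetical binary classifier returning [p(class 0), p(class 1)]."""
    z = 1.5 * x[0] - 2.0 * x[1] + 0.5 * x[2]
    p1 = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - p1, p1]


def perturb_output(model: Model, sigma: float, rng: random.Random) -> Model:
    """Wrap `model` so its class probabilities get Gaussian noise, then renormalize."""
    def wrapped(x: List[float]) -> List[float]:
        noisy = [max(p + rng.gauss(0.0, sigma), 1e-9) for p in model(x)]
        total = sum(noisy)
        return [p / total for p in noisy]
    return wrapped


def occlusion(model: Model, x: List[float], target: int) -> List[float]:
    """Attribution of feature i = drop in p(target) when x_i is set to 0."""
    base = model(x)[target]
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = 0.0
        scores.append(base - model(occluded)[target])
    return scores


def rank(scores: List[float]) -> List[int]:
    return sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)


x = [1.0, 1.0, 1.0]
clean = occlusion(toy_model, x, target=1)
noisy = occlusion(perturb_output(toy_model, sigma=0.001, rng=random.Random(0)), x, target=1)
print(clean)
print(noisy)
print(rank(clean) == rank(noisy))  # the feature ranking survives the output noise
```

Because the wrapper never touches the inputs, the model's own sensitivity to input perturbations is taken out of the picture.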
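While many methods purport to explain predictions by highlighting salient features, what aims these explanations serve and how they ought to be evaluated often go unstated. In this work, we introduce a framework to quantify the value of explanations via the accuracy gains that they confer on a student model trained to simulate a teacher model. Crucially, the explanations are available to the student during training, but are not available at test time. Compared to prior proposals, our approach is less easily gamed, enabling principled, automatic, model-agnostic evaluation of attributions. Using our framework, we compare numerous attribution methods for text classification and question answering, and observe quantitative differences that are consistent (to a moderate to high degree) across different student model architectures and learning strategies.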
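The remarkable success of deep learning has prompted interest in its application to medical imaging diagnosis. Even though state-of-the-art deep learning models have achieved human-level accuracy in classifying different types of medical data, these models are hardly adopted in clinical workflows, mainly because of their lack of interpretability. The black-box nature of deep learning models has raised the need for strategies that explain their decision-making process, leading to the creation of the topic of eXplainable Artificial Intelligence (XAI). In this context, we provide a thorough survey of XAI applied to medical imaging diagnosis, including visual, example-based, and concept-based explanation methods. Moreover, this work reviews the existing medical imaging datasets and the existing metrics for evaluating the quality of explanations. In addition, we include a performance comparison among a set of report-generation-based methods. Finally, we discuss the major challenges in applying XAI to medical imaging and future research directions on the topic.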
Vision and language models (VL) are known to exploit unrobust indicators in individual modalities (e.g., introduced by distributional biases), instead of focusing on relevant information in each modality. A small drop in accuracy obtained on a VL task with a unimodal model suggests that so-called unimodal collapse occurred. But how to quantify the amount of unimodal collapse reliably, at dataset and instance-level, to diagnose and combat unimodal collapse in a targeted way? We present MM-SHAP, a performance-agnostic multimodality score that quantifies the proportion by which a model uses individual modalities in multimodal tasks. MM-SHAP is based on Shapley values and will be applied in two ways: (1) to compare models for their degree of multimodality, and (2) to measure the contribution of individual modalities for a given task and dataset. Experiments with 6 VL models -- LXMERT, CLIP and four ALBEF variants -- on four VL tasks highlight that unimodal collapse can occur to different degrees and in different directions, contradicting the wide-spread assumption that unimodal collapse is one-sided. We recommend MM-SHAP for analysing multimodal tasks, to diagnose and guide progress towards multimodal integration. Code available at: https://github.com/Heidelberg-NLP/MM-SHAP
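MM-SHAP's modality shares can be computed from per-token Shapley values: with Φ_T and Φ_I the sums of absolute Shapley values over text and image tokens, T-SHAP = Φ_T / (Φ_T + Φ_I) and I-SHAP = Φ_I / (Φ_T + Φ_I). The sketch computes exact Shapley values by enumerating every coalition, which is only feasible for a handful of tokens (practical implementations approximate them by sampling). The toy value function and the token layout are assumptions for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence, Tuple

ValueFn = Callable[[FrozenSet[int]], float]


def exact_shapley(n: int, value: ValueFn) -> List[float]:
    """Exact Shapley values by enumerating every coalition (O(2^n) calls)."""
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def mm_shap(phis: Sequence[float], is_text: Sequence[bool]) -> Tuple[float, float]:
    """Return (T-SHAP, I-SHAP): each modality's share of the total |Shapley| mass."""
    phi_t = sum(abs(p) for p, t in zip(phis, is_text) if t)
    phi_i = sum(abs(p) for p, t in zip(phis, is_text) if not t)
    total = phi_t + phi_i
    if total == 0:
        raise ValueError("all Shapley values are zero")
    return phi_t / total, phi_i / total


# Hypothetical 4-token input: tokens 0-1 are text, tokens 2-3 are image patches.
IS_TEXT = [True, True, False, False]


def toy_value(s: FrozenSet[int]) -> float:
    """Model score when only the tokens in s are kept (the others masked)."""
    score = 2.0 * (0 in s) + 1.0 * (1 in s) + 1.0 * (2 in s)
    return score + 2.0 * (1 in s and 2 in s)  # a text-image interaction term


phis = exact_shapley(4, toy_value)
t_shap, i_shap = mm_shap(phis, IS_TEXT)
print(phis)            # token 3 gets 0; the interaction is split between tokens 1 and 2
print(t_shap, i_shap)  # text carries 2/3 of the attribution mass
```

Taking absolute values means a token that pushes the score down still counts as "used", which is why the score is described as performance-agnostic.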
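Many past works aim to improve visual reasoning in models by supervising feature importance (estimated by model explanation techniques) with human annotations, such as highlights of important image regions. However, recent work has shown that performance gains from feature importance (FI) supervision on Visual Question Answering (VQA) tasks persist even with random supervision, suggesting that these methods do not meaningfully align model FI with human FI. In this paper, we show that model FI supervision can meaningfully improve VQA model accuracy, as well as performance on several Right-for-the-Right-Reason (RRR) metrics, by optimizing for four key model objectives: (1) accurate predictions given limited but sufficient information (Sufficiency); (2) max-entropy predictions given no important information (Uncertainty); (3) invariance of predictions to changes in unimportant features (Invariance); and (4) alignment between model FI explanations and human FI explanations (Plausibility). Our best-performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets in terms of both in-distribution and out-of-distribution accuracy. While past work suggests that the mechanism for improved accuracy is improved explanation plausibility, we show that this relationship depends crucially on explanation faithfulness (whether explanations truly represent the model's internal reasoning). Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful. Lastly, we show that, surprisingly, RRR metrics are not predictive of out-of-distribution model accuracy when controlling for a model's in-distribution accuracy, which calls into question the value of these metrics for evaluating model reasoning. All supporting code is available at https://github.com/zfying/disfis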
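Two of the four objectives can be sketched for a toy linear softmax classifier: Invariance as the KL divergence between predictions on the original input and on an input whose unimportant features are replaced, and Uncertainty as log K − H(p), which is zero exactly when the prediction on an input with its important features removed is uniform. The replacement value (0.0), the KL direction, and the classifier are our assumptions; VisFIS's actual losses and feature-replacement strategy may differ.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(W: Matrix, x: Sequence[float]) -> List[float]:
    """Toy linear classifier: softmax(W x)."""
    return softmax([sum(wij * xj for wij, xj in zip(row, x)) for row in W])


def replace(x: Sequence[float], mask: Sequence[bool], value: float = 0.0) -> List[float]:
    """Replace the masked features with a fixed value (an assumed 'removal')."""
    return [value if m else xi for xi, m in zip(x, mask)]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def invariance_loss(W: Matrix, x: Sequence[float], important: Sequence[bool]) -> float:
    """KL(p(x) || p(x with unimportant features replaced))."""
    unimportant = [not m for m in important]
    return kl(predict(W, x), predict(W, replace(x, unimportant)))


def uncertainty_loss(W: Matrix, x: Sequence[float], important: Sequence[bool]) -> float:
    """log K - H(p) on the input with its important features removed (0 iff uniform)."""
    p = predict(W, replace(x, important))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return math.log(len(p)) - entropy


x = [1.0, 3.0]
important = [True, False]             # hypothetical human FI annotation
W_good = [[2.0, 0.0], [-1.0, 0.0]]    # relies only on the important feature
W_bad = [[2.0, 1.5], [-1.0, -1.0]]    # also relies on the unimportant feature
print(invariance_loss(W_good, x, important), uncertainty_loss(W_good, x, important))
print(invariance_loss(W_bad, x, important), uncertainty_loss(W_bad, x, important))
```

Both losses are zero for the model that ignores the unimportant feature and positive for the one that uses it, which is the behaviour these objectives are meant to reward.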
Attention mechanisms play a central role in NLP systems, especially within recurrent neural network (RNN) models. Recently, there has been increasing interest in whether or not the intermediate representations offered by these modules may be used to explain the reasoning for a model's prediction, and consequently reach insights regarding the model's decision-making process. A recent paper claims that 'Attention is not Explanation' (Jain and Wallace, 2019). We challenge many of the assumptions underlying this work, arguing that such a claim depends on one's definition of explanation, and that testing it needs to take into account all elements of the model. We propose four alternative tests to determine when/whether attention can be used as explanation: a simple uniform-weights baseline; a variance calibration based on multiple random seed runs; a diagnostic framework using frozen weights from pretrained models; and an end-to-end adversarial attention training protocol. Each allows for meaningful interpretation of attention mechanisms in RNN models. We show that even when reliable adversarial distributions can be found, they don't perform well on the simple diagnostic, indicating that prior work does not disprove the usefulness of attention mechanisms for explainability.
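The first of these tests, the uniform-weights baseline, replaces the learned attention distribution with a uniform one in which every token gets weight 1/n. In the paper the comparison is between the task performance of a model trained with attention frozen to uniform weights and that of a model with learned attention; the sketch below only shows the substitution inside the attention-pooling step. The hidden states and attention scores are made up.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    total = sum(e)
    return [v / total for v in e]


def pool(hidden: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Attention pooling: weighted sum of hidden states (one vector per token)."""
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


def uniform_weights(n: int) -> List[float]:
    """The baseline distribution: every token gets weight 1/n."""
    return [1.0 / n] * n


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up RNN hidden states for 3 tokens
learned = softmax([2.0, 0.0, 0.0])              # made-up attention scores
print(pool(hidden, learned))                    # context vector with learned attention
print(pool(hidden, uniform_weights(3)))         # context vector with uniform attention
```

If a model trained with the uniform variant performs as well as the one with learned attention, the learned weights are unlikely to carry task-relevant information worth interpreting.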
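Convolutional neural networks (CNNs) have recently attracted great attention in geoscience due to their ability to capture non-linear system behavior and to extract predictive spatiotemporal patterns. Given their black-box nature, however, and the importance of prediction explainability, methods of explainable artificial intelligence (XAI) are gaining popularity as a means to explain CNN decision-making strategies. Here, we establish an intercomparison of some of the most popular XAI methods and investigate their fidelity in explaining CNN decisions for geoscientific applications. Our goal is to raise awareness of the theoretical limitations of these methods and to gain insight into their relative strengths and weaknesses, to help guide best practices. The considered XAI methods are first applied to an idealized attribution benchmark, where the ground truth of the network's explanation is known a priori, to help objectively assess their performance. Second, we apply XAI to a climate-related prediction setting, namely to explain a CNN that is trained to predict the number of atmospheric rivers in daily snapshots of climate simulations. Our results highlight several important issues of XAI methods (e.g., gradient shattering, inability to distinguish the sign of attribution, ignorance to zero input) that have previously been overlooked in our field and that, if not considered cautiously, may lead to a distorted picture of the CNN decision-making strategy. We envision that our analysis will motivate further investigation into XAI fidelity and will help toward a cautious implementation of XAI in geoscience, which can lead to further exploitation of CNNs and deep learning for prediction problems.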
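The idea of an idealized attribution benchmark can be illustrated with a model whose ground-truth attribution is known analytically: for a linear function F(x) = Σ_i w_i x_i, the contribution of feature i is exactly w_i x_i. This toy (our own, not the benchmark used in the study) shows one of the issues listed above, namely that plain saliency |∂F/∂x_i| = |w_i| cannot distinguish the sign of an attribution, and also that saliency ignores the input value entirely. The weights and inputs are made up.

```python
from typing import List, Sequence


def gradient_linear(w: Sequence[float], x: Sequence[float]) -> List[float]:
    """For F(x) = sum_i w_i x_i the gradient is w, independent of x."""
    return list(w)


def saliency(w: Sequence[float], x: Sequence[float]) -> List[float]:
    """Plain saliency: |dF/dx_i|, unsigned by construction."""
    return [abs(g) for g in gradient_linear(w, x)]


def input_x_gradient(w: Sequence[float], x: Sequence[float]) -> List[float]:
    return [g * xi for g, xi in zip(gradient_linear(w, x), x)]


def ground_truth(w: Sequence[float], x: Sequence[float]) -> List[float]:
    """Known contribution of each feature to a linear F: w_i * x_i."""
    return [wi * xi for wi, xi in zip(w, x)]


def sign_agreement(attr: Sequence[float], truth: Sequence[float]) -> float:
    """Fraction of features with a nonzero true contribution whose sign is recovered."""
    pairs = [(a, t) for a, t in zip(attr, truth) if t != 0]
    return sum((a > 0) == (t > 0) for a, t in pairs) / len(pairs)


w = [1.0, -2.0, 0.5, 3.0]
x = [2.0, 1.0, -4.0, 0.0]   # feature 3 is zero, so it contributes nothing
truth = ground_truth(w, x)  # [2.0, -2.0, -2.0, 0.0]
print(sign_agreement(saliency(w, x), truth))          # 1 of 3 signs recovered
print(sign_agreement(input_x_gradient(w, x), truth))  # all signs recovered
print(saliency(w, x))  # yet saliency scores the zero-valued feature 3 highest
```

For a linear model Input × Gradient coincides with the ground truth; for a CNN no such closed form exists, which is why a benchmark with a known answer is useful.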
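Nowadays, Artificial Intelligence (AI) has become a fundamental component of clinical and remote healthcare applications, but the best-performing AI systems are often too complex to be self-explaining. Explainable AI (XAI) techniques are defined to unveil the reasoning behind a system's predictions and decisions, and they become even more crucial when dealing with sensitive and personal health data. It is worth noting that XAI has not received the same attention across different research areas and data types, especially in healthcare. In particular, many clinical and remote health applications are based on tabular and time series data, respectively, yet XAI is not commonly analysed on these data types, while computer vision and Natural Language Processing (NLP) are the reference applications. To provide an overview of the XAI methods that are most suitable for tabular and time series data in the healthcare domain, this paper reviews the literature of the last 5 years, illustrating the types of explanations generated and the efforts made to evaluate their relevance and quality. Specifically, we identify clinical validation, consistency assessment, objective and standardised quality evaluation, and human-centered quality assessment as key features for ensuring effective explanations for end users. Finally, we highlight the main research challenges in the field as well as the limitations of existing XAI methods.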
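Many interpretability tools allow practitioners and researchers to explain Natural Language Processing systems. However, each tool requires different configurations and provides explanations in different forms, hindering the possibility of assessing and comparing them. A principled, unified evaluation benchmark would guide users through the central question: which explanation method is more reliable for my use case? We introduce ferret, an easy-to-use, extensible Python library to explain Transformer-based models, integrated with the Hugging Face Hub. It offers a unified benchmarking suite to test and compare a wide range of state-of-the-art explainers on any text or interpretability corpora. In addition, ferret provides convenient programming abstractions to foster the introduction of new explanation methods, datasets, or evaluation metrics.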
Attention mechanisms have seen wide adoption in neural NLP models. In addition to improving predictive performance, these are often touted as affording transparency: models equipped with attention provide a distribution over attended-to input units, and this is often presented (at least implicitly) as communicating the relative importance of inputs. However, it is unclear what relationship exists between attention weights and model outputs. In this work we perform extensive experiments across a variety of NLP tasks that aim to assess the degree to which attention weights provide meaningful "explanations" for predictions. We find that they largely do not. For example, learned attention weights are frequently uncorrelated with gradient-based measures of feature importance, and one can identify very different attention distributions that nonetheless yield equivalent predictions. Our findings show that standard attention modules do not provide meaningful explanations and should not be treated as though they do. Code to reproduce all experiments is available at https://github.com/successar/AttentionExplanation.
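The correlation analysis mentioned above can be sketched with Kendall's τ between an attention distribution and a gradient-based importance vector over the same tokens. This is a plain-Python τ-a (no tie correction) written for illustration; in practice one would typically call `scipy.stats.kendalltau`, which by default computes the tie-corrected τ-b. The example vectors are made up.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(a: Sequence[float], b: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


attention = [0.4, 0.3, 0.2, 0.1]        # made-up attention weights over 4 tokens
grad_importance = [0.3, 0.4, 0.1, 0.2]  # made-up |gradient| importance for the same tokens
print(kendall_tau_a(attention, grad_importance))  # weak agreement (1/3)
```

τ = 1 means both methods rank the tokens identically and τ = −1 means the rankings are reversed; values near 0 indicate that attention and gradients disagree about which tokens matter.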
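Deep neural networks are well known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction results of deep models. In recent years, many interpretation tools have been proposed to explain or reveal how models make decisions. In this paper, we review this line of research and try to provide a comprehensive survey. Specifically, we first introduce and clarify two basic concepts that people often confuse: interpretations and interpretability. To address the research efforts in interpretations, we elaborate on the designs of a number of interpretation algorithms by proposing a new taxonomy. Then, to understand the interpretation results, we also survey the performance metrics for evaluating interpretation algorithms. Further, we summarize current work on evaluating models' interpretability using "trustworthy" interpretation algorithms. Finally, we review and discuss the connections between deep models' interpretations and other factors, such as adversarial robustness and learning from interpretations, and we introduce several open-source libraries for interpretation algorithms and evaluation approaches.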