Post-hoc explanation methods are increasingly relied upon for understanding black-box classifiers in high-stakes applications, precipitating a need for reliable explanations. While numerous explanation methods have been proposed, recent works have shown that many existing methods can be inconsistent or unstable. In addition, high-performing classifiers are often highly nonlinear and can exhibit complex behavior around the decision boundary, leading to brittle or misleading local explanations. Therefore, there is a pressing need to quantify the uncertainty of such explanation methods in order to understand when explanations are trustworthy. We introduce a novel uncertainty quantification method parameterized by a Gaussian Process model, which combines the uncertainty approximation of existing methods with a novel geodesic-based similarity that captures the complexity of the target black-box decision boundary. The proposed framework is highly flexible; it can be used with any black-box classifier and feature attribution method to amortize uncertainty estimates for explanations. We show theoretically that our proposed geodesic-based kernel similarity increases with the complexity of the decision boundary. Empirical results on multiple tabular and image datasets show that our decision-boundary-aware uncertainty estimate improves understanding of explanations as compared to existing methods.
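As a rough illustration of the amortization idea described above (not the paper's exact method), the sketch below fits one Gaussian process per attribution dimension over a set of explained instances, so that the GP posterior variance at a new point serves as an uncertainty estimate for its explanation. The geodesic, decision-boundary-aware kernel is replaced here by a plain RBF kernel, and the attribution function is a toy finite-difference sensitivity; all data and models are illustrative.

```python
# Sketch: amortizing explanation uncertainty with a Gaussian process surrogate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)
clf = RandomForestClassifier(random_state=0).fit(X, y)

def attribution(x, eps=1e-2):
    """Toy feature attribution: finite-difference sensitivity of P(y=1)."""
    base = clf.predict_proba(x[None])[0, 1]
    shifted = np.stack([x + eps * np.eye(len(x))[i] for i in range(len(x))])
    return (clf.predict_proba(shifted)[:, 1] - base) / eps

# Explain a subset of points, then fit one GP per attribution dimension.
X_expl = X[:100]
A = np.stack([attribution(x) for x in X_expl])          # (100, 5) attributions
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
gps = [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_expl, A[:, j])
       for j in range(A.shape[1])]

# Amortized explanation + uncertainty for a new instance.
x_new = rng.normal(size=5)
means, stds = zip(*(gp.predict(x_new[None], return_std=True) for gp in gps))
for j, (m, s) in enumerate(zip(means, stds)):
    print(f"feature {j}: attribution ~ {m[0]:+.3f} +/- {2 * s[0]:.3f}")
```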
As black-box explanations are increasingly being employed to establish model credibility in high-stakes settings, it is important to ensure that these explanations are accurate and reliable. However, prior work has shown that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability. In addition, these methods are computationally inefficient and require significant hyperparameter tuning. In this paper, we address these challenges by developing a novel Bayesian framework for generating local explanations along with their associated uncertainty. We instantiate this framework to obtain Bayesian versions of LIME and KernelSHAP, which output credible intervals for the feature importances that capture the associated uncertainty. The resulting explanations not only enable us to make concrete inferences about their quality (e.g., there is a 95% chance that the feature importance lies within a given range), but are also highly consistent and stable. We carry out a detailed theoretical analysis that leverages the aforementioned uncertainty to estimate how many perturbed samples are needed and how to achieve faster convergence. This work is the first attempt to address several critical issues with popular explanation methods in one shot, thereby generating consistent, stable, and reliable explanations in a computationally efficient manner. Experimental evaluation with multiple real-world datasets and user studies demonstrates the efficacy of the proposed framework.
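The following is a minimal sketch of the Bayesian-surrogate idea behind such explanations (not the authors' BayesLIME/BayesSHAP estimators): fit a Bayesian linear model to black-box predictions on local perturbations and read 95% credible intervals for feature importances off the coefficient posterior. The LIME proximity kernel and discretization are omitted, and the model, data, and perturbation scale are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] - 2 * X[:, 2] > 0).astype(int)
black_box = GradientBoostingClassifier().fit(X, y)

def bayesian_lime(x, n_samples=500, scale=0.5):
    """Local Bayesian surrogate around x with per-feature credible intervals."""
    Z = x + rng.normal(scale=scale, size=(n_samples, len(x)))   # local perturbations
    target = black_box.predict_proba(Z)[:, 1]                   # black-box outputs
    surrogate = BayesianRidge(fit_intercept=True).fit(Z, target)
    coef = surrogate.coef_
    coef_std = np.sqrt(np.diag(surrogate.sigma_))               # posterior std of coefficients
    lo, hi = stats.norm.interval(0.95, loc=coef, scale=coef_std)
    return coef, lo, hi

coef, lo, hi = bayesian_lime(X[0])
for j in range(len(coef)):
    print(f"feature {j}: importance {coef[j]:+.3f}, 95% CI [{lo[j]:+.3f}, {hi[j]:+.3f}]")
```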
Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.
Machine learning methods are getting increasingly better at making predictions, but at the same time they are becoming more complex and less transparent. As a result, explainers are often relied upon to provide interpretability for these black-box prediction models. As crucial diagnostic tools, it is important that the explainers themselves are reliable. In this paper, we focus on one particular aspect of reliability, namely that an explainer should provide similar explanations for similar data inputs. We formalize this notion by introducing and defining the astuteness of explainers, analogous to the astuteness of classifiers. Our formalism is inspired by the concept of probabilistic Lipschitzness, which captures the probability of local smoothness of a function. For a variety of explainers (e.g., Shapley-value-based explainers, RISE, CXPlain), we provide lower-bound guarantees on their astuteness given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on simulated as well as real datasets.
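A Monte Carlo sketch of this notion under a simplified, assumed definition of astuteness (the paper's probabilistic-Lipschitzness formalism is more careful): estimate the probability that explanations of two inputs within distance r of each other differ by at most λ times their input distance. The explainer, data, and constants below are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_astuteness(explain, X, r=0.5, lam=2.0, n_pairs=2000):
    """Fraction of sampled nearby pairs (x, x') with ||E(x)-E(x')|| <= lam * ||x-x'||.

    `explain` maps an input vector to an attribution vector; this is a Monte Carlo
    estimate under an assumed, simplified definition of explainer astuteness.
    """
    hits = 0
    for _ in range(n_pairs):
        x = X[rng.integers(len(X))]
        d = rng.normal(size=x.shape)                 # sample a neighbour in a ball of radius r
        d *= rng.uniform(0, r) / np.linalg.norm(d)
        x2 = x + d
        num = np.linalg.norm(explain(x) - explain(x2))
        den = np.linalg.norm(x - x2)
        hits += num <= lam * den
    return hits / n_pairs

# Example with a toy gradient-style explainer of a smooth scoring function.
w = np.array([1.0, -2.0, 0.5])
f = lambda x: np.tanh(x @ w)
explain = lambda x: (1 - np.tanh(x @ w) ** 2) * w    # analytic gradient of f
X = rng.normal(size=(200, 3))
print("estimated astuteness:", estimate_astuteness(explain, X))
```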
Feature attributions based on the Shapley value are popular for explaining machine learning models; however, their estimation is complex from both a theoretical and a computational standpoint. We disentangle this complexity into two factors: (1) the approach to removing feature information, and (2) the tractable estimation strategy. These two factors provide a natural lens through which we can better understand and compare 24 distinct algorithms. Based on the various feature removal approaches, we describe the multiple types of Shapley value feature attributions and the methods to calculate each one. Then, based on tractable estimation strategies, we characterize two distinct families of approaches: model-agnostic and model-specific approximations. For model-agnostic approximations, we benchmark a wide class of estimation approaches and tie them to alternative yet equivalent characterizations of the Shapley value. For model-specific approximations, we clarify the assumptions crucial to each method's tractability for linear, tree, and deep models. Finally, we identify gaps in the literature and promising future research directions.
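For concreteness, here is a compact sketch of one model-agnostic estimator from this family: permutation-sampling Shapley value estimation, with removed features imputed from background (marginal) samples, which is one of the feature-removal choices such a survey catalogues. The model and data are toy placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = X[:, 0] * 2 + X[:, 1] * X[:, 2]
model = RandomForestRegressor(random_state=0).fit(X, y)
background = X[:50]                        # reference samples used to "remove" features

def shapley_permutation(x, n_perm=200):
    """Monte Carlo permutation estimate of Shapley values for model(x)."""
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z = background[rng.integers(len(background))].copy()   # all features "removed"
        prev = model.predict(z[None])[0]
        for j in order:
            z[j] = x[j]                                         # add feature j
            curr = model.predict(z[None])[0]
            phi[j] += curr - prev
            prev = curr
    return phi / n_perm

print("Shapley estimate:", np.round(shapley_permutation(X[0]), 3))
```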
Modeling and uncertainty quantification for closed curves is an important problem in the field of shape analysis, and it can have significant ramifications for subsequent statistical tasks. Many of these tasks involve collections of closed curves, which often exhibit structural similarities at multiple levels. Modeling multiple closed curves in a way that efficiently incorporates such between-curve dependence remains a challenging problem. In this work, we propose and investigate a multiple-output (a.k.a. multi-output), multi-dimensional Gaussian process modeling framework. We illustrate the proposed methodological advances and demonstrate the utility of meaningful uncertainty quantification on several curve- and shape-related tasks. This model-based approach not only addresses the problem of inference on closed curves (and their shapes) with kernel constructions, but also opens the door to nonparametric modeling of multi-level dependence for functional objects in general.
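A minimal sketch of one ingredient of such a framework (not the multiple-output construction itself): a Gaussian process over the curve parameter with a periodic kernel, so that every sampled 2-D curve closes on itself. The between-curve dependence modeling described above is not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared

# A periodic kernel in the curve parameter t in [0, 2*pi] makes every GP sample
# a closed curve: the covariance depends only on t - t' modulo the period.
kernel = ExpSineSquared(length_scale=1.0, periodicity=2 * np.pi)
gp = GaussianProcessRegressor(kernel=kernel)

t = np.linspace(0, 2 * np.pi, 200)[:, None]
# Draw independent prior samples for the x- and y-coordinates
# (a simple per-coordinate prior, not a multi-output one).
samples = gp.sample_y(t, n_samples=2, random_state=0)
x_curve, y_curve = samples[:, 0], samples[:, 1]
print("x endpoint gap:", abs(x_curve[0] - x_curve[-1]))   # ~ 0: the curve closes
print("y endpoint gap:", abs(y_curve[0] - y_curve[-1]))
```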
Interpretability provides a means for humans to verify aspects of machine learning (ML) models and empower human+ML teaming in situations where the task cannot be fully automated. Different contexts require explanations with different properties. For example, the kind of explanation required to determine if an early cardiac arrest warning system is ready to be integrated into a care setting is very different from the type of explanation required for a loan applicant to help determine the actions they might need to take to make their application successful. Unfortunately, there is a lack of standardization when it comes to properties of explanations: different papers may use the same term to mean different quantities, and different terms to mean the same quantity. This lack of a standardized terminology and categorization of the properties of ML explanations prevents us from both rigorously comparing interpretable machine learning methods and identifying what properties are needed in what contexts. In this work, we survey properties defined in interpretable machine learning papers, synthesize them based on what they actually measure, and describe the trade-offs between different formulations of these properties. In doing so, we enable more informed selection of task-appropriate formulations of explanation properties as well as standardization for future work in interpretable machine learning.
We study methods for estimating model uncertainty for neural networks (NNs) in regression. To isolate the effect of model uncertainty, we focus on a noiseless setting with scarce training data. We introduce five important desiderata regarding model uncertainty that any method should satisfy. However, we find that established benchmarks often fail to reliably capture some of these desiderata, even those that are required by Bayesian theory. To address this, we introduce a new approach for capturing model uncertainty for NNs, which we call Neural Optimization-based Model Uncertainty (NOMU). The main idea of NOMU is to design a network architecture consisting of two connected sub-NNs, one for the model prediction and one for the model uncertainty, and to train it using a carefully designed loss function. Importantly, our design enforces that NOMU satisfies our five desiderata. Due to its modular architecture, NOMU can provide model uncertainty for any given (previously trained) NN if given access to its training data. We evaluate NOMU in various regression tasks and in noiseless Bayesian optimization (BO) with costly evaluations. In regression, NOMU performs at least as well as state-of-the-art methods. In BO, NOMU even outperforms all considered benchmarks.
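A minimal PyTorch sketch of the architectural idea, assuming a simple feed-forward setup: one sub-network for the prediction and a connected sub-network for its model uncertainty. The training objective below is a naive placeholder, not the carefully designed NOMU loss.

```python
import torch
import torch.nn as nn

class TwoHeadUncertaintyNet(nn.Module):
    """Sketch of a NOMU-style architecture: a prediction sub-network and a
    connected uncertainty sub-network that also sees the prediction features."""

    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.pred_body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, hidden), nn.ReLU())
        self.pred_head = nn.Linear(hidden, 1)
        # Uncertainty sub-network: takes the raw input plus the prediction features.
        self.unc_body = nn.Sequential(nn.Linear(in_dim + hidden, hidden), nn.ReLU())
        self.unc_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.pred_body(x)
        f_hat = self.pred_head(h)
        # softplus keeps the uncertainty output non-negative
        sigma_hat = nn.functional.softplus(
            self.unc_head(self.unc_body(torch.cat([x, h], dim=-1))))
        return f_hat, sigma_hat

# Placeholder training objective (NOT the NOMU loss): fit the prediction head while
# mildly encouraging small uncertainty on the training points.
net = TwoHeadUncertaintyNet(in_dim=1)
x_train = torch.linspace(-1, 1, 16).unsqueeze(-1)
y_train = torch.sin(3 * x_train)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(500):
    f_hat, sigma_hat = net(x_train)
    loss = ((f_hat - y_train) ** 2).mean() + 0.1 * sigma_hat.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```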
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimate predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited due to the high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improve the uncertainty properties of a single network, based on a single, deterministic representation. By formalizing uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two simple changes: (1) applying spectral normalization to hidden weights to enforce bi-Lipschitz smoothness in representations and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines
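A minimal PyTorch sketch of the two ingredients named above, not the reference implementation in the linked repository: spectral normalization on the hidden layers, and a Gaussian-process-style output layer approximated here with random Fourier features plus a Laplace/ridge posterior that yields a distance-aware predictive variance.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SNGPLikeModel(nn.Module):
    """Sketch of an SNGP-style model: spectral-normalized body plus a
    random-Fourier-feature approximation of a Gaussian process output layer."""

    def __init__(self, in_dim, hidden=128, n_rff=256, n_classes=2):
        super().__init__()
        self.body = nn.Sequential(
            spectral_norm(nn.Linear(in_dim, hidden)), nn.ReLU(),
            spectral_norm(nn.Linear(hidden, hidden)), nn.ReLU(),
        )
        # Fixed random features phi(h) = sqrt(2/m) * cos(W h + b) approximate an RBF GP.
        self.register_buffer("W", torch.randn(n_rff, hidden))
        self.register_buffer("b", 2 * torch.pi * torch.rand(n_rff))
        self.beta = nn.Linear(n_rff, n_classes, bias=False)   # learned GP "weights"
        self.n_rff = n_rff

    def features(self, x):
        h = self.body(x)
        return (2.0 / self.n_rff) ** 0.5 * torch.cos(h @ self.W.T + self.b)

    def forward(self, x):
        return self.beta(self.features(x))

def posterior_variance(model, X_train, X_test, ridge=1e-3):
    """Distance-aware predictive variance via a Laplace/ridge posterior over beta."""
    with torch.no_grad():
        Phi = model.features(X_train)
        precision = Phi.T @ Phi + ridge * torch.eye(Phi.shape[1])
        cov = torch.linalg.inv(precision)
        Phi_t = model.features(X_test)
        return (Phi_t @ cov * Phi_t).sum(dim=1)   # variance of the GP logit per test point
```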
Post-hoc explanation methods are used with the intent of providing insights about neural networks and are sometimes said to help engender trust in their outputs. However, popular explanation methods have been found to be fragile to minor perturbations of input features or model parameters. Relying on constraint relaxation techniques from non-convex optimization, we develop a method that upper-bounds the largest change an adversary can make to a gradient-based explanation via bounded manipulation of either the input features or model parameters. By propagating a compact input or parameter set as symbolic intervals through the forward and backward computations of the neural network, we can formally certify the robustness of gradient-based explanations. Our bounds are differentiable, hence we can incorporate provable explanation robustness into neural network training. Empirically, our method surpasses the robustness provided by previous heuristic approaches. We find that our training method is the only method able to learn neural networks with certificates of explanation robustness across all six datasets tested.
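As a rough illustration of the interval idea, the sketch below propagates an input box through the forward pass of a tiny ReLU network to bound its output under a bounded perturbation; the method described above additionally propagates intervals through the backward pass to bound the gradient-based explanation itself, which this sketch omits. Weights and the perturbation radius are illustrative.

```python
import numpy as np

def affine_interval(lo, hi, W, b):
    """Propagate an elementwise box [lo, hi] through x -> W x + b."""
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def relu_interval(lo, hi):
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Tiny 2-layer ReLU network with fixed random weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

x = rng.normal(size=4)
eps = 0.05                                   # bounded input perturbation (L_inf ball)
lo, hi = x - eps, x + eps
lo, hi = relu_interval(*affine_interval(lo, hi, W1, b1))
lo, hi = affine_interval(lo, hi, W2, b2)
print(f"certified output range under ||delta||_inf <= {eps}: [{lo[0]:.3f}, {hi[0]:.3f}]")
```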
Explainable artificial intelligence aims to provide explanations for the reasoning performed by an artificial intelligence system. There is no consensus on how to evaluate the quality of these explanations, since even the definition of explanation itself is not clear in the literature. In particular, for the widely known Local Linear Explanations, there are qualitative proposals for the evaluation of explanations, although they suffer from theoretical inconsistencies. The case of images is even more problematic, where a visual explanation may seem to explain a decision while what it really does is detect edges. There are a large number of metrics in the literature, each specialized in quantitatively measuring a different qualitative aspect, so we should be able to develop metrics capable of measuring the desirable aspects of explanations in a robust and correct way. In this paper, we propose a procedure called REVEL to evaluate different aspects of the quality of explanations with a theoretically coherent development. This procedure advances the state of the art in several ways: it standardizes the concept of explanation and develops a series of metrics that not only allow comparisons between explanations but also provide absolute information about the explanation itself. The experiments have been carried out on four image datasets as benchmarks, where we show REVEL's descriptive and analytical power.
In this paper, we present a thorough theoretical analysis of LIME in the case of tabular data. We show that, in the large-sample limit, the interpretable coefficients provided by Tabular LIME can be computed in closed form as a function of the algorithm's parameters and certain expectations associated with the black-box model. When the function to explain has some nice algebraic structure (linear, multiplicative, or depending sparsely on a subset of the coordinates), our analysis provides interesting insights into the explanations provided by LIME. These can be applied to a range of machine learning models, including Gaussian kernels or CART random forests. For example, for linear functions we show that LIME has the desirable property of providing explanations proportional to the coefficients of the function to explain, and of ignoring coordinates not used by that function. For partition-based regressors, on the other hand, we show that LIME produces undesirable artifacts that may yield misleading explanations.
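A small numerical illustration of the linear-function result, under simplifying assumptions (a weighted local ridge surrogate on Gaussian perturbations rather than the full Tabular LIME discretization and sampling scheme): the recovered coefficients come out roughly proportional to the true ones and near zero on coordinates the function ignores.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
w_true = np.array([3.0, -1.5, 0.0, 0.0])           # last two coordinates are unused
f = lambda X: X @ w_true                            # linear black box to explain

x0 = rng.normal(size=4)                             # instance to explain
Z = x0 + rng.normal(scale=1.0, size=(5000, 4))      # local perturbations
# LIME-style exponential kernel weights based on distance to x0
weights = np.exp(-np.linalg.norm(Z - x0, axis=1) ** 2 / (2 * 0.75 ** 2))
surrogate = Ridge(alpha=1.0).fit(Z, f(Z), sample_weight=weights)

print("true coefficients:     ", w_true)
print("surrogate coefficients:", np.round(surrogate.coef_, 3))
# Expect the surrogate coefficients to be approximately proportional to w_true,
# and approximately zero on the coordinates the function ignores.
```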
We introduce a simple and intuitive framework that provides quantitative explanations of statistical models through a probabilistic assessment of input feature importance. The core idea comes from using the Dirichlet distribution to define the importance of input features and learning it via approximate Bayesian inference. The learned importances have a probabilistic interpretation and provide the relative importance of each input feature to the model's output, together with an assessment of the confidence in the importance quantification. Owing to the use of a Dirichlet distribution over the explanations, we can define a closed-form divergence to measure the similarity between the importances learned under different models. We use this divergence to study the trade-offs between the explainability of feature importance and fundamental notions in modern machine learning, such as privacy and fairness. Furthermore, BIF can work at two levels: global explanations (feature importance across all data instances) and local explanations (individual feature importance for each data instance). We show the effectiveness of our method on a variety of synthetic and real datasets, considering both tabular and image datasets. The code is available at https://github.com/kamadforge/featimp_dp.
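The closed-form divergence mentioned above can be illustrated with the standard analytic KL divergence between two Dirichlet distributions over feature importances; the sketch below uses the generic formula in terms of the concentration parameters, and is not code from the linked repository.

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) ) between two importance posteriors."""
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    a0, b0 = alpha.sum(), beta.sum()
    return (gammaln(a0) - gammaln(alpha).sum()
            - gammaln(b0) + gammaln(beta).sum()
            + ((alpha - beta) * (digamma(alpha) - digamma(a0))).sum())

# Compare feature-importance posteriors learned under two hypothetical models:
alpha_model_a = np.array([8.0, 2.0, 1.0, 1.0])   # feature 0 dominates, high confidence
alpha_model_b = np.array([2.5, 2.5, 1.0, 1.0])   # importance split between features 0 and 1
print("KL(A || B) =", round(dirichlet_kl(alpha_model_a, alpha_model_b), 4))
```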
Acquiring labels for supervised learning can be expensive. To improve the sample efficiency of neural network regression, we study active learning methods that adaptively select unlabeled data for labeling. We propose a framework for constructing such methods from (network-dependent) base kernels, kernel transformations, and selection methods. Our framework covers many existing Bayesian methods based on Gaussian process approximations of neural networks as well as non-Bayesian methods. Furthermore, we propose replacing the commonly used last-layer features with sketched finite-width neural tangent kernels, and combining them with a novel clustering method. To evaluate the different methods, we introduce an open-source benchmark consisting of 15 large tabular regression datasets. Our proposed method outperforms the state of the art on our benchmark, scales to large datasets, and works out of the box without adjusting the network architecture or training code. We provide open-source code that includes efficient implementations of all kernels, kernel transformations, and selection methods, and can be used to reproduce our results.
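A minimal sketch of one instance of such a kernel-plus-selection pipeline (not the authors' benchmark code): treat fixed features, e.g. last-layer or sketched NTK features, as a linear kernel and greedily select the pool points with the largest Bayesian-linear-regression posterior variance. Features and sizes here are random placeholders.

```python
import numpy as np

def greedy_variance_selection(phi_pool, phi_labeled, batch_size=5, ridge=1e-2):
    """Greedy active-learning selection on (e.g. last-layer) features phi.

    Repeatedly picks the pool point with the largest posterior predictive variance
    under a Bayesian linear model on the features, then conditions on that pick.
    """
    d = phi_pool.shape[1]
    A = ridge * np.eye(d) + phi_labeled.T @ phi_labeled      # posterior precision
    chosen = []
    for _ in range(batch_size):
        cov = np.linalg.inv(A)
        var = np.einsum("nd,dk,nk->n", phi_pool, cov, phi_pool)
        var[chosen] = -np.inf                                 # do not pick twice
        i = int(np.argmax(var))
        chosen.append(i)
        A += np.outer(phi_pool[i], phi_pool[i])               # condition on the pick
    return chosen

rng = np.random.default_rng(0)
phi_labeled = rng.normal(size=(20, 16))     # features of already-labeled points
phi_pool = rng.normal(size=(500, 16))       # features of the unlabeled pool
print("selected indices:", greedy_variance_selection(phi_pool, phi_labeled))
```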
State-of-the-art entity matching (EM) methods are hard to interpret, and bringing explainable AI to EM is of significant value. Unfortunately, most popular explainability methods do not work out of the box and need to be adapted. In this paper, we identify three challenges of applying local post hoc feature attribution methods to entity matching: cross-record interaction effects, non-match explanations, and variation in sensitivity. We propose our novel model-agnostic and schema-flexible method LEMON, which addresses all three challenges by (i) producing dual explanations to avoid cross-record interaction effects, (ii) introducing the novel concept of attribution potential to explain how two records could have matched, and (iii) automatically choosing the explanation granularity to match the sensitivity of the matcher and the record pair. Experiments on public datasets show that the proposed method is more faithful to the matcher and does a better job of helping users understand the decision boundary of the matcher than previous work. Furthermore, a user study shows the benefits of the proposed explanations compared with a standard adaptation of LIME.
Over the past few years, many explanation methods based on perturbations of the input data have been introduced to improve our understanding of the decisions made by black-box models. The goal of this work is to introduce a novel perturbation scheme so that more faithful and robust explanations can be obtained. Our study focuses on the impact of the perturbation directions on the data topology. We show that perturbing along the orthogonal directions of the input manifold better preserves the data topology, both in a worst-case analysis based on the discrete Gromov-Hausdorff distance and in an average-case analysis via persistent homology. From these results, we introduce the EMaP algorithm, which implements the orthogonal perturbation scheme. Our experiments show that EMaP not only improves the explainers' performance, but also helps them overcome recent attacks against perturbation-based methods.
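A minimal sketch of the orthogonal-perturbation idea under simplifying assumptions (a single global PCA subspace instead of a proper manifold estimate, and illustrative toy data): perturbations are projected onto the orthogonal complement of the estimated data subspace before being added to the instance being explained.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Toy data lying near a 2-D plane embedded in 5-D space.
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(500, 5))

# Estimate a (global, for simplicity) low-dimensional data subspace.
basis = PCA(n_components=2).fit(X).components_        # (2, 5) orthonormal rows

def orthogonal_perturbations(x0, basis, n_samples=100, scale=0.1):
    """Perturb x0 only along directions orthogonal to the estimated data subspace."""
    noise = rng.normal(scale=scale, size=(n_samples, len(x0)))
    noise_orth = noise - noise @ basis.T @ basis       # remove the in-subspace component
    return x0 + noise_orth

Z = orthogonal_perturbations(X[0], basis)
disp = Z - X[0]
print("in-subspace displacement: ", np.linalg.norm(disp @ basis.T))          # ~ 0
print("off-subspace displacement:", np.linalg.norm(disp - disp @ basis.T @ basis))
```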
While several types of post hoc explanation methods (e.g., feature attribution methods) have been proposed in recent literature, there has been little work on systematically benchmarking these methods in an efficient and transparent manner. Here, we introduce OpenXAI, a comprehensive and extensible open-source framework for evaluating and benchmarking post hoc explanation methods. OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, (ii) open-source implementations of 22 quantitative metrics for evaluating the faithfulness, stability (robustness), and fairness of explanation methods, and (iii) the first-ever public XAI leaderboards for benchmarking explanations. OpenXAI is easily extensible, as users can readily evaluate custom explanation methods and incorporate them into our leaderboards. Overall, OpenXAI provides an automated end-to-end pipeline that not only simplifies and standardizes the evaluation of post hoc explanation methods, but also promotes transparency and reproducibility in benchmarking these methods. The OpenXAI datasets and data loaders, implementations of state-of-the-art explanation methods and evaluation metrics, as well as the leaderboards, are publicly available at https://open-xai.github.io/.
Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and their corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. In many applications, however, we are not only interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization problem. This is often neglected in practice, due to a lack of knowledge and of readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies from the fields of evolutionary algorithms and Bayesian optimization. We illustrate the utility of MOO in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability, and robustness.
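A small self-contained sketch of the multi-objective formulation: random search over two hyperparameters scoring two competing objectives (validation error and model size), followed by extraction of the Pareto-optimal configurations. Practical MOO-HPO uses evolutionary or Bayesian strategies rather than random search; the dataset, objectives, and search space below are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Random search over two hyperparameters, scoring two competing objectives.
candidates, scores = [], []
for _ in range(30):
    cfg = {"n_estimators": int(rng.integers(5, 200)), "max_depth": int(rng.integers(1, 12))}
    model = RandomForestClassifier(random_state=0, **cfg).fit(X_tr, y_tr)
    err = 1.0 - model.score(X_val, y_val)                       # objective 1: validation error
    size = sum(t.tree_.node_count for t in model.estimators_)   # objective 2: model size
    candidates.append(cfg)
    scores.append((err, size))
scores = np.array(scores, dtype=float)

# A configuration is Pareto-optimal if no other configuration is at least as good
# on both objectives and strictly better on one.
def pareto_mask(S):
    return np.array([not np.any(np.all(S <= s, axis=1) & np.any(S < s, axis=1)) for s in S])

mask = pareto_mask(scores)
for cfg, (err, size) in zip(np.array(candidates)[mask], scores[mask]):
    print(f"Pareto-optimal: {cfg}, val error={err:.3f}, total nodes={int(size)}")
```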
Monumental advancements in artificial intelligence (AI) have lured the interest of doctors, lenders, judges, and other professionals. While these high-stakes decision-makers are optimistic about the technology, those familiar with AI systems are wary about the lack of transparency of its decision-making processes. Perturbation-based post hoc explainers offer a model-agnostic means of interpreting these systems while only requiring query-level access. However, recent work demonstrates that these explainers can be fooled adversarially. This discovery has adverse implications for auditors, regulators, and other sentinels. With this in mind, several natural questions arise: how can we audit these black box systems? And how can we ascertain that the auditee is complying with the audit in good faith? In this work, we rigorously formalize this problem and devise a defense against adversarial attacks on perturbation-based explainers. We propose algorithms for the detection (CAD-Detect) and defense (CAD-Defend) of these attacks, which are aided by our novel conditional anomaly detection approach, KNN-CAD. We demonstrate that our approach successfully detects whether a black box system adversarially conceals its decision-making process and mitigates the adversarial attack on real-world data for the prevalent explainers, LIME and SHAP.
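A minimal sketch of the detection intuition, not the KNN-CAD or CAD-Detect algorithms themselves: flag an audit as suspicious when the perturbation samples an explainer would query look anomalous relative to real data, measured here by mean k-nearest-neighbour distance with an assumed quantile threshold.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_real = rng.normal(size=(1000, 10))                 # data the auditor has access to

# Fit a kNN model on real data; large neighbour distances signal off-manifold queries.
knn = NearestNeighbors(n_neighbors=10).fit(X_real)
ref_dist = knn.kneighbors(X_real)[0].mean(axis=1)    # reference distances on real data
threshold = np.quantile(ref_dist, 0.99)

def fraction_off_manifold(queries):
    """Share of explainer queries whose mean kNN distance exceeds the real-data threshold."""
    d = knn.kneighbors(queries)[0].mean(axis=1)
    return float((d > threshold).mean())

# LIME/SHAP-style perturbations can land far from the data manifold; here we mimic
# such queries with broad independent noise around a real instance.
x0 = X_real[0]
lime_like_queries = x0 + rng.normal(scale=3.0, size=(500, 10))
print("suspicious query fraction:", fraction_off_manifold(lime_like_queries))
print("real-data fraction:       ", fraction_off_manifold(X_real[:500]))
```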
As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, we propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous. Using extensive evaluation with multiple real-world datasets (including COMPAS), we demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations which do not reflect the underlying biases.
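A minimal sketch of the scaffolding mechanism (an illustration of the attack, not the authors' code): a wrapper that routes in-distribution inputs to the biased classifier and routes the off-manifold perturbations generated by LIME/SHAP-style explainers to an innocuous classifier, using an off-the-shelf outlier detector as the switch. The data, models, and perturbation scheme are toy placeholders.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_real = np.column_stack([rng.integers(0, 2, 2000),          # "sensitive" binary feature
                          rng.normal(size=(2000, 4))])

biased_model = lambda X: (X[:, 0] > 0.5).astype(int)          # decides purely on the sensitive feature
innocuous_model = lambda X: (X[:, 1] > 0).astype(int)         # looks reasonable to an auditor

# The scaffold: an OOD detector distinguishes real inputs from explainer perturbations.
ood_detector = IsolationForest(random_state=0).fit(X_real)

def scaffolded_classifier(X):
    """Biased on in-distribution inputs, innocuous on off-manifold explainer queries."""
    in_dist = ood_detector.predict(X) == 1                    # +1 = inlier, -1 = outlier
    return np.where(in_dist, biased_model(X), innocuous_model(X))

# Real data still receives the biased decisions...
print("agreement with biased model on real data:",
      (scaffolded_classifier(X_real) == biased_model(X_real)).mean())
# ...while many explainer-style perturbations are routed to the innocuous model,
# so the resulting attributions tend to point at feature 1 instead of feature 0.
perturbations = X_real[0] + rng.normal(scale=2.0, size=(1000, 5))
print("fraction of perturbations routed to the innocuous model:",
      (ood_detector.predict(perturbations) == -1).mean())
```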