Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: ''The output depends only on a small (but unknown) segment of the input.'' In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output is often used as a way to peek into the `reasoning` of the network. We make such a notion more precise for a variant of the classification problem that we term selective dependence classification (SDC) when used with attention model architectures. Under such a setting, we demonstrate various error modes where an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate and mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity and demonstrate that these algorithms help improve interpretability.
translated by 谷歌翻译
这是一门专门针对STEM学生开发的介绍性机器学习课程。我们的目标是为有兴趣的读者提供基础知识,以在自己的项目中使用机器学习,并将自己熟悉术语作为进一步阅读相关文献的基础。在这些讲义中,我们讨论受监督,无监督和强化学习。注释从没有神经网络的机器学习方法的说明开始,例如原理分析,T-SNE,聚类以及线性回归和线性分类器。我们继续介绍基本和先进的神经网络结构,例如密集的进料和常规神经网络,经常性的神经网络,受限的玻尔兹曼机器,(变性)自动编码器,生成的对抗性网络。讨论了潜在空间表示的解释性问题,并使用梦和对抗性攻击的例子。最后一部分致力于加强学习,我们在其中介绍了价值功能和政策学习的基本概念。
translated by 谷歌翻译
We propose a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable.Our approach -Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say 'dog' in a classification network or a sequence of words in captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept.Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model-families: (1) CNNs with fullyconnected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multimodal inputs (e.g. visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative vi-
translated by 谷歌翻译
机器学习模型通常会遇到与训练分布不同的样本。无法识别分布(OOD)样本,因此将该样本分配给课堂标签会显着损害模​​型的可靠性。由于其对在开放世界中的安全部署模型的重要性,该问题引起了重大关注。由于对所有可能的未知分布进行建模的棘手性,检测OOD样品是具有挑战性的。迄今为止,一些研究领域解决了检测陌生样本的问题,包括异常检测,新颖性检测,一级学习,开放式识别识别和分布外检测。尽管有相似和共同的概念,但分别分布,开放式检测和异常检测已被独立研究。因此,这些研究途径尚未交叉授粉,创造了研究障碍。尽管某些调查打算概述这些方法,但它们似乎仅关注特定领域,而无需检查不同领域之间的关系。这项调查旨在在确定其共同点的同时,对各个领域的众多著名作品进行跨域和全面的审查。研究人员可以从不同领域的研究进展概述中受益,并协同发展未来的方法。此外,据我们所知,虽然进行异常检测或单级学习进行了调查,但没有关于分布外检测的全面或最新的调查,我们的调查可广泛涵盖。最后,有了统一的跨域视角,我们讨论并阐明了未来的研究线,打算将这些领域更加紧密地融为一体。
translated by 谷歌翻译
The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.
translated by 谷歌翻译
Future surveys such as the Legacy Survey of Space and Time (LSST) of the Vera C. Rubin Observatory will observe an order of magnitude more astrophysical transient events than any previous survey before. With this deluge of photometric data, it will be impossible for all such events to be classified by humans alone. Recent efforts have sought to leverage machine learning methods to tackle the challenge of astronomical transient classification, with ever improving success. Transformers are a recently developed deep learning architecture, first proposed for natural language processing, that have shown a great deal of recent success. In this work we develop a new transformer architecture, which uses multi-head self attention at its core, for general multi-variate time-series data. Furthermore, the proposed time-series transformer architecture supports the inclusion of an arbitrary number of additional features, while also offering interpretability. We apply the time-series transformer to the task of photometric classification, minimising the reliance of expert domain knowledge for feature selection, while achieving results comparable to state-of-the-art photometric classification methods. We achieve a logarithmic-loss of 0.507 on imbalanced data in a representative setting using data from the Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC). Moreover, we achieve a micro-averaged receiver operating characteristic area under curve of 0.98 and micro-averaged precision-recall area under curve of 0.87.
translated by 谷歌翻译
在图像分类任务中,深度神经网络通常是脆弱的,并且已知错误分类输入。虽然这些错误分类可能是不可避免的,但不能认为所有失败模式都是平等的。某些错误分类(例如,将狗的图像分类为飞机)可以困扰人类并导致系统中的人类信任丢失。更糟糕的是,这些错误(例如,被错误分类为灵长类动物的人)可以具有可憎的社会影响。因此,在这项工作中,我们的目标是降低差不可估量的错误。为了解决这一挑战,我们首先讨论获取捕获人类期望($ M ^ H $)的类级语义的方法,这是关于哪些类的语义关闭{\ EM与}。我们表明,对于流行的图像基准(如CiFar-10,CiFar-100,Imagenet),可以通过利用人类主题研究或公开的人类策划知识库来容易地获得类级语义。其次,我们建议使用加权损失函数(WLF)以惩罚其无法解释的错误分类。最后,我们表明培训(或微调)现有分类器具有所提出的方法,导致具有(1)的深度神经网络,具有相当的前1个精度,(2)在分销和外部的更具可扩展的故障模式 - 与现有工程相比,分布(ood)测试数据,(3)额外的人类标签的收集成本明显较低。
translated by 谷歌翻译
众所周知,端到端的神经NLP体系结构很难理解,这引起了近年来为解释性建模的许多努力。模型解释的基本原则是忠诚,即,解释应准确地代表模型预测背后的推理过程。这项调查首先讨论了忠诚的定义和评估及其对解释性的意义。然后,我们通过将方法分为五类来介绍忠实解释的最新进展:相似性方法,模型内部结构的分析,基于反向传播的方法,反事实干预和自我解释模型。每个类别将通过其代表性研究,优势和缺点来说明。最后,我们从它们的共同美德和局限性方面讨论了上述所有方法,并反思未来的工作方向忠实的解释性。对于有兴趣研究可解释性的研究人员,这项调查将为该领域提供可访问且全面的概述,为进一步探索提供基础。对于希望更好地了解自己的模型的用户,该调查将是一项介绍性手册,帮助选择最合适的解释方法。
translated by 谷歌翻译
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and topdown attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
translated by 谷歌翻译
当前的自动驾驶汽车技术主要集中于将乘客从A点带到B。但是,已经证明乘客害怕乘坐自动驾驶汽车。减轻此问题的一种方法是允许乘客给汽车提供自然语言命令。但是,汽车可能会误解发布的命令或视觉环境,这可能导致不确定的情况。希望自动驾驶汽车检测到这些情况并与乘客互动以解决它们。本文提出了一个模型,该模型检测到命令时不确定的情况并找到引起该命令的视觉对象。可选地,包括描述不确定对象的系统生成的问题。我们认为,如果汽车可以以人类的方式解释这些物体,乘客就可以对汽车能力获得更多信心。因此,我们研究了如何(1)检测不确定的情况及其根本原因,以及(2)如何为乘客产生澄清的问题。在对Talk2CAR数据集进行评估时,我们表明所提出的模型\ acrfull {pipeline},改善\ gls {m:模棱两可 - absolute-Increse},与$ iou _ {.5} $相比,与不使用\ gls {pipeline {pipeline {pipeline { }。此外,我们设计了一个引用表达生成器(reg)\ acrfull {reg_model}量身定制的自动驾驶汽车设置,该设置可产生\ gls {m:流星伴侣} Meteor的相对改进,\ gls \ gls {m:rouge felative}}与最先进的REG模型相比,Rouge-L的速度快三倍。
translated by 谷歌翻译
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
translated by 谷歌翻译
对于使用高性能机器学习算法通常不透明的决策,人们越来越担心。用特定于领域的术语对推理过程的解释对于在医疗保健等风险敏感领域中采用至关重要。我们认为,机器学习算法应该可以通过设计来解释,并且表达这些解释的语言应与域和任务有关。因此,我们将模型的预测基于数据的用户定义和特定于任务的二进制函数,每个都对最终用户有明确的解释。然后,我们最大程度地减少了在任何给定输入上准确预测所需的预期查询数。由于解决方案通常是棘手的,因此在事先工作之后,我们根据信息增益顺序选择查询。但是,与以前的工作相反,我们不必假设查询在有条件地独立。取而代之的是,我们利用随机生成模型(VAE)和MCMC算法(未经调整的Langevin)来选择基于先前的查询 - 答案的输入的最有用的查询。这使得在线确定要解决预测歧义所需的任何深度的查询链。最后,关于视觉和NLP任务的实验证明了我们的方法的功效及其优越性比事后解释的优势。
translated by 谷歌翻译
我们介绍了几个新的数据集即想象的A / O和Imagenet-R以及合成环境和测试套件,我们称为CAOS。 Imagenet-A / O允许研究人员专注于想象成剩余的盲点。由于追踪稳健的表示,以特殊创建了ImageNet-R,因为表示不再简单地自然,而是包括艺术和其他演绎。 Caos Suite由Carla Simulator构建,允许包含异常物体,可以创建可重复的合成环境和用于测试稳健性的场景。所有数据集都是为测试鲁棒性和衡量鲁棒性的衡量进展而创建的。数据集已用于各种其他作品中,以衡量其具有鲁棒性的自身进步,并允许切向进展,这些进展不会完全关注自然准确性。鉴于这些数据集,我们创建了几种旨在推进鲁棒性研究的新方法。我们以最大Logit的形式和典型程度的形式构建简单的基线,并以深度的形式创建新的数据增强方法,从而提高上述基准。最大Logit考虑Logit值而不是SoftMax操作后的值,而微小的变化会产生明显的改进。典型程分将输出分布与类的后部分布进行比较。我们表明,除了分段任务之外,这将提高对基线的性能。猜测可能在像素级别,像素的语义信息比类级信息的语义信息不太有意义。最后,新的Deepaulment的新增强技术利用神经网络在彻底不同于先前使用的传统几何和相机的转换的图像上创建增强。
translated by 谷歌翻译
现有的方法用于隔离数据集中的硬群和虚假相关性通常需要人为干预。这可以使这些方法具有劳动密集型和特定于数据集的特定方式。为了解决这些缺点,我们提出了一种自动提炼模型故障模式的可扩展方法。具体而言,我们利用线性分类器来识别一致的误差模式,然后又诱导这些故障模式作为特征空间内的方向的自然表示。我们证明,该框架使我们能够发现并自动为培训数据集中的子群体提起挑战,并进行干预以改善模型对这些亚群的绩效。可在https://github.com/madrylab/failure-directions上找到代码
translated by 谷歌翻译
机器学习模型可能涉及决策边界,这些界限由于对规则和规则的更新而随时间而变化,例如在贷款批准或索赔管理中。然而,在这种情况下,可能需要足够的训练数据来累积时的时间,以便重新恢复模型以反映新的决策边界。虽然已经完成了加强现有决策边界的工作,但已经介绍了ML模型的决策边界应该改变的这些方案,以便反映新规则。在本文中,我们专注于用户提供的反馈规则作为加快ML模型更新过程的方式,我们正式介绍预处理训练数据的问题,以响应于反馈规则,使得模型一旦模型在预处理的数据上被培训,其决策边界与规则更紧密地对齐。为了解决这个问题,我们提出了一种新的数据增强方法,基于反馈规则的过采样技术。使用不同ML模型和现实世界数据集的广泛实验证明了该方法的有效性,特别是增强的好处和处理许多反馈规则的能力。
translated by 谷歌翻译
可解释的人工智能(XAI)的新兴领域旨在为当今强大但不透明的深度学习模型带来透明度。尽管本地XAI方法以归因图的形式解释了个体预测,从而确定了重要特征的发生位置(但没有提供有关其代表的信息),但全局解释技术可视化模型通常学会的编码的概念。因此,两种方法仅提供部分见解,并留下将模型推理解释的负担。只有少数当代技术旨在将本地和全球XAI背后的原则结合起来,以获取更多信息的解释。但是,这些方法通常仅限于特定的模型体系结构,或对培训制度或数据和标签可用性施加其他要求,这实际上使事后应用程序成为任意预训练的模型。在这项工作中,我们介绍了概念相关性传播方法(CRP)方法,该方法结合了XAI的本地和全球观点,因此允许回答“何处”和“ where”和“什么”问题,而没有其他约束。我们进一步介绍了相关性最大化的原则,以根据模型对模型的有用性找到代表性的示例。因此,我们提高了对激活最大化及其局限性的共同实践的依赖。我们证明了我们方法在各种环境中的能力,展示了概念相关性传播和相关性最大化导致了更加可解释的解释,并通过概念图表,概念组成分析和概念集合和概念子区和概念子区和概念子集和定量研究对模型的表示和推理提供了深刻的见解。它们在细粒度决策中的作用。
translated by 谷歌翻译
X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.
translated by 谷歌翻译
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the chal-
translated by 谷歌翻译
以对象为中心的表示是通过提供柔性抽象可以在可以建立的灵活性抽象来实现更系统的推广的有希望的途径。最近的简单2D和3D数据集的工作表明,具有对象的归纳偏差的模型可以学习段,并代表单独的数据的统计结构中的有意义对象,而无需任何监督。然而,尽管使用越来越复杂的感应偏差(例如,用于场景的尺寸或3D几何形状),但这种完全无监督的方法仍然无法扩展到不同的现实数据。在本文中,我们采取了弱监督的方法,并专注于如何使用光流的形式的视频数据的时间动态,2)调节在简单的对象位置上的模型可以用于启用分段和跟踪对象在明显更现实的合成数据中。我们介绍了一个顺序扩展,以便引入我们训练的推出,我们训练用于预测现实看的合成场景的光流,并显示调节该模型的初始状态在一小组提示,例如第一帧中的物体的质量中心,是足以显着改善实例分割。这些福利超出了新型对象,新颖背景和更长的视频序列的培训分配。我们还发现,在推论期间可以使用这种初始状态调节作为对特定物体或物体部分的型号查询模型,这可能会为一系列弱监管方法铺平,并允许更有效的互动训练有素的型号。
translated by 谷歌翻译
Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012-achieving a mAP of 53.3%. Our approach combines two key insights:(1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at http://www.cs.berkeley.edu/ ˜rbg/rcnn.
translated by 谷歌翻译