最先进的深度学习方法在许多任务上实现了类似人类的表现,但仍会犯错。用易于解释的术语表征这些错误,可以深入了解分类器是否容易出现系统错误,但也提供了一种行动和改善分类器的方法。我们建议发现与正确响应密切相关的那些特征值组合(即模式)。错误的预测,以获取任意分类器的全局和可解释的描述。我们证明这是更通用的标签描述问题的实例,我们根据最小描述长度原理提出了这一点。要发现一个良好的模式集,我们开发了有效的前提算法。通过大量的实验,我们表明它在合成数据和现实世界中的实践中表现出色。与现有的解决方案不同,即使在许多功能上的高度不平衡数据上,它也可以恢复地面真相模式。通过两个有关视觉问题答案和命名实体识别的案例研究,我们确认前提可以清楚且可行的见解对现代NLP分类器的系统错误。
translated by 谷歌翻译
我们介绍了强大的子组发现的问题,即,找到一个关于一个或多个目标属性的脱颖而出的子集的一组可解释的描述,2)是统计上的鲁棒,并且3)非冗余。许多尝试已经挖掘了局部强壮的子组或解决模式爆炸,但我们是第一个从全球建模角度同时解决这两个挑战的爆炸。首先,我们制定广泛的模型类别的子组列表,即订购的子组,可以组成的单次组和多变量目标,该目标可以由标称或数字变量组成,并且包括其定义中的传统Top-1子组发现。这种新颖的模型类允许我们使用最小描述长度(MDL)原理来形式地形化最佳强大的子组发现,在那里我们分别为标称和数字目标的最佳归一化最大可能性和贝叶斯编码而度假。其次,正如查找最佳子组列表都是NP-Hard,我们提出了SSD ++,一个贪婪的启发式,找到了很好的子组列表,并保证了根据MDL标准的最重要的子组在每次迭代中添加,这被显示为等同于贝叶斯一个样本比例,多项式或子组之间的多项式或T检验,以及数据集边际目标分布以及多假设检测罚款。我们经验上显示了54个数据集,即SSD ++优于先前的子组设置发现方法和子组列表大小。
translated by 谷歌翻译
众所周知,端到端的神经NLP体系结构很难理解,这引起了近年来为解释性建模的许多努力。模型解释的基本原则是忠诚,即,解释应准确地代表模型预测背后的推理过程。这项调查首先讨论了忠诚的定义和评估及其对解释性的意义。然后,我们通过将方法分为五类来介绍忠实解释的最新进展:相似性方法,模型内部结构的分析,基于反向传播的方法,反事实干预和自我解释模型。每个类别将通过其代表性研究,优势和缺点来说明。最后,我们从它们的共同美德和局限性方面讨论了上述所有方法,并反思未来的工作方向忠实的解释性。对于有兴趣研究可解释性的研究人员,这项调查将为该领域提供可访问且全面的概述,为进一步探索提供基础。对于希望更好地了解自己的模型的用户,该调查将是一项介绍性手册,帮助选择最合适的解释方法。
translated by 谷歌翻译
即使机器学习算法已经在数据科学中发挥了重要作用,但许多当前方法对输入数据提出了不现实的假设。由于不兼容的数据格式,或数据集中的异质,分层或完全缺少的数据片段,因此很难应用此类方法。作为解决方案,我们提出了一个用于样本表示,模型定义和培训的多功能,统一的框架,称为“ Hmill”。我们深入审查框架构建和扩展的机器学习的多个范围范式。从理论上讲,为HMILL的关键组件的设计合理,我们将通用近似定理的扩展显示到框架中实现的模型所实现的所有功能的集合。本文还包含有关我们实施中技术和绩效改进的详细讨论,该讨论将在MIT许可下发布供下载。该框架的主要资产是其灵活性,它可以通过相同的工具对不同的现实世界数据源进行建模。除了单独观察到每个对象的一组属性的标准设置外,我们解释了如何在框架中实现表示整个对象系统的图表中的消息推断。为了支持我们的主张,我们使用框架解决了网络安全域的三个不同问题。第一种用例涉及来自原始网络观察结果的IoT设备识别。在第二个问题中,我们研究了如何使用以有向图表示的操作系统的快照可以对恶意二进制文件进行分类。最后提供的示例是通过网络中实体之间建模域黑名单扩展的任务。在所有三个问题中,基于建议的框架的解决方案可实现与专业方法相当的性能。
translated by 谷歌翻译
Multi-label classification is becoming increasingly ubiquitous, but not much attention has been paid to interpretability. In this paper, we develop a multi-label classifier that can be represented as a concise set of simple "if-then" rules, and thus, it offers better interpretability compared to black-box models. Notably, our method is able to find a small set of relevant patterns that lead to accurate multi-label classification, while existing rule-based classifiers are myopic and wasteful in searching rules,requiring a large number of rules to achieve high accuracy. In particular, we formulate the problem of choosing multi-label rules to maximize a target function, which considers not only discrimination ability with respect to labels, but also diversity. Accounting for diversity helps to avoid redundancy, and thus, to control the number of rules in the solution set. To tackle the said maximization problem we propose a 2-approximation algorithm, which relies on a novel technique to sample high-quality rules. In addition to our theoretical analysis, we provide a thorough experimental evaluation, which indicates that our approach offers a trade-off between predictive performance and interpretability that is unmatched in previous work.
translated by 谷歌翻译
In the last years many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness sometimes at the cost of scarifying accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, delineating explicitly or implicitly its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective.
translated by 谷歌翻译
Originally, tangles were invented as an abstract tool in mathematical graph theory to prove the famous graph minor theorem. In this paper, we showcase the practical potential of tangles in machine learning applications. Given a collection of cuts of any dataset, tangles aggregate these cuts to point in the direction of a dense structure. As a result, a cluster is softly characterized by a set of consistent pointers. This highly flexible approach can solve clustering problems in various setups, ranging from questionnaires over community detection in graphs to clustering points in metric spaces. The output of our proposed framework is hierarchical and induces the notion of a soft dendrogram, which can help explore the cluster structure of a dataset. The computational complexity of aggregating the cuts is linear in the number of data points. Thus the bottleneck of the tangle approach is to generate the cuts, for which simple and fast algorithms form a sufficient basis. In our paper we construct the algorithmic framework for clustering with tangles, prove theoretical guarantees in various settings, and provide extensive simulations and use cases. Python code is available on github.
translated by 谷歌翻译
注释数据是用于培训和评估机器学习模型的自然语言处理中的重要成分。因此,注释具有高质量是非常理想的。但是,最近的工作表明,几个流行的数据集包含令人惊讶的注释错误或不一致之处。为了减轻此问题,多年来已经设计了许多注释错误检测方法。尽管研究人员表明他们的方法在新介绍的数据集上效果很好,但他们很少将其方法与以前的工作或同一数据集进行比较。这引起了人们对方法的一般表现的强烈关注,并且使他们的优势和劣势很难解决。因此,我们重新实现18种检测潜在注释错误的方法,并在9个英语数据集上对其进行评估,以进行文本分类以及令牌和跨度标签。此外,我们定义了统一的评估设置,包括注释错误检测任务,评估协议和一般最佳实践的新形式化。为了促进未来的研究和可重复性,我们将数据集和实施释放到易于使用和开源软件包中。
translated by 谷歌翻译
越来越多的工作已经认识到利用机器学习(ML)进步的重要性,以满足提取访问控制属性,策略挖掘,策略验证,访问决策等有效自动化的需求。在这项工作中,我们调查和总结了各种ML解决不同访问控制问题的方法。我们提出了ML模型在访问控制域中应用的新分类学。我们重点介绍当前的局限性和公开挑战,例如缺乏公共现实世界数据集,基于ML的访问控制系统的管理,了解黑盒ML模型的决策等,并列举未来的研究方向。
translated by 谷歌翻译
使用机器学习算法从未标记的文本中提取知识可能很复杂。文档分类和信息检索是两个应用程序,可以从无监督的学习(例如文本聚类和主题建模)中受益,包括探索性数据分析。但是,无监督的学习范式提出了可重复性问题。初始化可能会导致可变性,具体取决于机器学习算法。此外,关于群集几何形状,扭曲可能会产生误导。在原因中,异常值和异常的存在可能是决定因素。尽管初始化和异常问题与文本群集和主题建模相关,但作者并未找到对它们的深入分析。这项调查提供了这些亚地区的系统文献综述(2011-2022),并提出了共同的术语,因为类似的程序具有不同的术语。作者描述了研究机会,趋势和开放问题。附录总结了与审查的作品直接或间接相关的文本矢量化,分解和聚类算法的理论背景。
translated by 谷歌翻译
本次调查绘制了用于分析社交媒体数据的生成方法的研究状态的广泛的全景照片(Sota)。它填补了空白,因为现有的调查文章在其范围内或被约会。我们包括两个重要方面,目前正在挖掘和建模社交媒体的重要性:动态和网络。社会动态对于了解影响影响或疾病的传播,友谊的形成,友谊的形成等,另一方面,可以捕获各种复杂关系,提供额外的洞察力和识别否则将不会被注意的重要模式。
translated by 谷歌翻译
Realizing when a model is right for a wrong reason is not trivial and requires a significant effort by model developers. In some cases, an input salience method, which highlights the most important parts of the input, may reveal problematic reasoning. But scrutinizing highlights over many data instances is tedious and often infeasible. Furthermore, analyzing examples in isolation does not reveal general patterns in the data or in the model's behavior. In this paper we aim to address these issues and go from understanding single examples to understanding entire datasets and models. The methodology we propose is based on aggregated salience maps. Using this methodology we address multiple distinct but common model developer needs by showing how problematic data and model behavior can be identified -- a necessary first step for improving the model.
translated by 谷歌翻译
背景信息:在过去几年中,机器学习(ML)一直是许多创新的核心。然而,包括在所谓的“安全关键”系统中,例如汽车或航空的系统已经被证明是非常具有挑战性的,因为ML的范式转变为ML带来完全改变传统认证方法。目的:本文旨在阐明与ML为基础的安全关键系统认证有关的挑战,以及文献中提出的解决方案,以解决它们,回答问题的问题如何证明基于机器学习的安全关键系统?'方法:我们开展2015年至2020年至2020年之间发布的研究论文的系统文献综述(SLR),涵盖了与ML系统认证有关的主题。总共确定了217篇论文涵盖了主题,被认为是ML认证的主要支柱:鲁棒性,不确定性,解释性,验证,安全强化学习和直接认证。我们分析了每个子场的主要趋势和问题,并提取了提取的论文的总结。结果:单反结果突出了社区对该主题的热情,以及在数据集和模型类型方面缺乏多样性。它还强调需要进一步发展学术界和行业之间的联系,以加深域名研究。最后,它还说明了必须在上面提到的主要支柱之间建立连接的必要性,这些主要柱主要主要研究。结论:我们强调了目前部署的努力,以实现ML基于ML的软件系统,并讨论了一些未来的研究方向。
translated by 谷歌翻译
Label noise is an important issue in classification, with many potential negative consequences. For example, the accuracy of predictions may decrease, whereas the complexity of inferred models and the number of necessary training samples may increase. Many works in the literature have been devoted to the study of label noise and the development of techniques to deal with label noise. However, the field lacks a comprehensive survey on the different types of label noise, their consequences and the algorithms that consider label noise. This paper proposes to fill this gap. First, the definitions and sources of label noise are considered and a taxonomy of the types of label noise is proposed. Second, the potential consequences of label noise are discussed. Third, label noise-robust, label noise cleansing, and label noise-tolerant algorithms are reviewed. For each category of approaches, a short discussion is proposed to help the practitioner to choose the most suitable technique in its own particular field of application. Eventually, the design of experiments is also discussed, what may interest the researchers who would like to test their own algorithms. In this paper, label noise consists of mislabeled instances: no additional information is assumed to be available like e.g. confidences on labels.
translated by 谷歌翻译
对比模式挖掘(CPM)是数据挖掘的重要且流行的子场。传统的顺序模式无法描述不同类别数据之间的对比度信息,而涉及对比概念的对比模式可以描述不同对比条件下数据集之间的显着差异。根据该领域发表的论文数量,我们发现研究人员对CPM的兴趣仍然活跃。由于CPM有许多研究问题和研究方法。该领域的新研究人员很难在短时间内了解该领域的一般状况。因此,本文的目的是为对比模式挖掘的研究方向提供最新的全面概述。首先,我们对CPM提出了深入的理解,包括评估歧视能力的基本概念,类型,采矿策略和指标。然后,我们根据CPM方法根据其特征分类为基于边界的算法,基于树的算法,基于进化模糊的系统算法,基于决策树的算法和其他算法。此外,我们列出了这些方法的经典算法,并讨论它们的优势和缺点。提出了CPM中的高级主题。最后,我们通过讨论该领域的挑战和机遇来结束调查。
translated by 谷歌翻译
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
translated by 谷歌翻译
我们提出了一种可解释的关系提取方法,通过共同训练这两个目标来减轻概括和解释性之间的张力。我们的方法使用多任务学习体系结构,该体系结构共同训练分类器以进行关系提取,并在解释关系分类器的决策的关系中标记单词的序列模型。我们还将模型输出转换为规则,以将全局解释带入这种方法。使用混合策略对此序列模型进行训练:有监督,当可获得预先存在的模式的监督时,另外还要半监督。在后一种情况下,我们将序列模型的标签视为潜在变量,并学习最大化关系分类器性能的最佳分配。我们评估了两个数据集中的提议方法,并表明序列模型提供了标签,可作为关系分类器决策的准确解释,并且重要的是,联合培训通常可以改善关系分类器的性能。我们还评估了生成的规则的性能,并表明新规则是手动规则的重要附加功能,并使基于规则的系统更接近神经模型。
translated by 谷歌翻译
We are interested in understanding the underlying generation process for long sequences of symbolic events. To do so, we propose COSSU, an algorithm to mine small and meaningful sets of sequential rules. The rules are selected using an MDL-inspired criterion that favors compactness and relies on a novel rule-based encoding scheme for sequences. Our evaluation shows that COSSU can successfully retrieve relevant sets of closed sequential rules from a long sequence. Such rules constitute an interpretable model that exhibits competitive accuracy for the tasks of next-element prediction and classification.
translated by 谷歌翻译
无论是在功能选择的领域还是可解释的AI领域,都有基于其重要性的“排名”功能的愿望。然后可以将这种功能重要的排名用于:(1)减少数据集大小或(2)解释机器学习模型。但是,在文献中,这种特征排名没有以系统的,一致的方式评估。许多论文都有不同的方式来争论哪些具有重要性排名最佳的特征。本文通过提出一种新的评估方法来填补这一空白。通过使用合成数据集,可以事先知道特征重要性得分,从而可以进行更系统的评估。为了促进使用新方法的大规模实验,在Python建造了一个名为FSEVAL的基准测定框架。该框架允许并行运行实验,并在HPC系统上的计算机上分布。通过与名为“权重和偏见”的在线平台集成,可以在实时仪表板上进行交互探索图表。该软件作为开源软件发布,并在PYPI平台上以包裹发行。该研究结束时,探索了一个这样的大规模实验,以在许多方面找到参与算法的优势和劣势。
translated by 谷歌翻译
在NLP社区中有一个正在进行的辩论,无论现代语言模型是否包含语言知识,通过所谓的探针恢复。在本文中,我们研究了语言知识是否是现代语言模型良好表现的必要条件,我们称之为\ Texit {重新发现假设}。首先,我们展示了语言模型,这是显着压缩的,但在预先磨普目标上表现良好,以便在语言结构探讨时保持良好的分数。这一结果支持重新发现的假设,并导致我们的论文的第二款贡献:一个信息 - 理论框架,与语言建模目标相关。该框架还提供了测量语言信息对字词预测任务的影响的度量标准。我们通过英语综合和真正的NLP任务加固我们的分析结果。
translated by 谷歌翻译