Deep learning models often fail out of distribution because they rely on spurious features to solve the task. Counterfactual data augmentation provides a general way of (approximately) achieving representations that are counterfactually invariant to spurious features, a requirement for out-of-distribution (OOD) robustness. In this work, we show that counterfactual data augmentation may not achieve the desired counterfactual invariance if the augmentation is performed by a {\em context-guessing machine}. We theoretically analyze the invariance imposed by such counterfactual data augmentation, and describe an exemplary NLP task where counterfactual data augmentation by a context-guessing machine does not yield robust OOD classifiers.
Machine learning methods can be unreliable when deployed in domains that differ from the domains on which they were trained. To address this, we may wish to learn representations of data that are domain-invariant in the sense that we preserve data structure that is stable across domains, but throw out spuriously varying parts. There are many representation-learning approaches of this type, including methods based on data augmentation, distributional invariances, and risk invariance. Unfortunately, when faced with any particular real-world domain shift, it is unclear which, if any, of these methods can be expected to work. The purpose of this paper is to show how the different methods relate to each other, and to clarify the real-world circumstances under which each is expected to succeed. The key tool is a new notion of domain shift that relies on the idea that causal relationships are invariant, but non-causal relationships (e.g., due to confounding) may vary.
Informally, a "spurious correlation" is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter. In machine learning, these have a know-it-when-you-see-it character; e.g., changing the gender of a sentence's subject changes a sentiment predictor's output. To check for spurious correlations, we can "stress test" models by perturbing irrelevant parts of input data and seeing whether model predictions change. In this paper, we study stress testing using the tools of causal inference. We introduce counterfactual invariance as a formalization of the requirement that changing irrelevant parts of the input shouldn't change model predictions. We connect counterfactual invariance to out-of-domain model performance, and provide practical schemes for learning (approximately) counterfactually invariant predictors without access to counterfactual examples. It turns out that both the means and implications of counterfactual invariance depend fundamentally on the true underlying causal structure of the data; in particular, on whether the label causes the features or the features cause the label. Distinct causal structures require distinct regularization schemes to induce counterfactual invariance. Similarly, counterfactual invariance implies different domain shift guarantees depending on the underlying causal structure. This theory is supported by empirical results on text classification.
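The stress-testing idea above can be sketched in a few lines: perturb a task-irrelevant attribute of the input and check whether the model's prediction changes. The toy `predict` model and the pronoun-swap perturbation below are illustrative assumptions, not the paper's actual models or data.

```python
# A minimal sketch of a counterfactual-invariance stress test: swap a
# task-irrelevant attribute (here, gendered pronouns) and check whether the
# model's prediction changes. The toy `predict` function and the swap table
# are hypothetical stand-ins, not the paper's implementation.

SWAP = {"he": "she", "she": "he", "him": "her", "her": "him"}

def perturb_gender(sentence: str) -> str:
    """Return the sentence with gendered pronouns swapped."""
    return " ".join(SWAP.get(tok, tok) for tok in sentence.split())

def predict(sentence: str) -> str:
    """Toy sentiment model that (spuriously) keys on the pronoun."""
    return "positive" if "she" in sentence.split() else "negative"

def is_counterfactually_invariant(model, sentence: str) -> bool:
    """The model passes the stress test iff the perturbation leaves its
    prediction unchanged."""
    return model(sentence) == model(perturb_gender(sentence))

# This toy model relies on the pronoun, so it fails the stress test:
print(is_counterfactually_invariant(predict, "she liked the movie"))  # False
```

A real stress test would apply many such perturbations over a held-out set and report the fraction of predictions that flip.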
Learning invariant (causal) features for out-of-distribution (OOD) generalization has attracted extensive attention recently, and among the proposals, invariant risk minimization (IRM) (Arjovsky et al., 2019) is a notable solution. Despite its theoretical promise for linear regression, the challenges of using IRM in linear classification problems remain (Rosenfeld et al., 2020; Nagarajan et al., 2021). Along this line, a recent study (Ahuja et al., 2021) has made a first step and proposed a learning principle of information bottleneck-based invariant risk minimization (IB-IRM). In this paper, we first show that the key assumption of support overlap of invariant features used in (Ahuja et al., 2021) to guarantee OOD generalization is rather strong, and that the optimal solution can still be achieved without this assumption. To further answer the question of whether IB-IRM is sufficient for learning invariant features in linear classification problems, we show that IB-IRM would still fail in two cases, whether or not the invariant features capture all the information about the label. To address such failures, we propose a \textit{counterfactual supervision-based information bottleneck (CSIB)} learning algorithm that recovers the invariant features. The proposed algorithm works even when accessing data from a single environment, and has theoretically consistent results for both binary and multi-class problems. We conduct empirical experiments on three synthetic datasets to verify the efficacy of our proposed method.
A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not enjoyed the same importance in natural language processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets, or clear articulations of the challenges and opportunities in applying causal inference to the textual domain, with its unique properties. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenges of estimating causal effects with text, encompassing settings where text is used as an outcome, as a treatment, or to address confounding. In addition, we explore potential uses of causal inference to improve the robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the NLP community.
Machine learning models rely on various assumptions to attain high accuracy. One of the preliminary assumptions of these models is the independent and identically distributed (i.i.d.) assumption, which suggests that the train and test data are sampled from the same distribution. However, this assumption seldom holds in the real world due to distribution shifts. As a result, models that rely on this assumption exhibit poor generalization capabilities. Over the recent years, dedicated efforts have been made to improve the generalization capabilities of these models, collectively known as -- \textit{domain generalization methods}. The primary idea behind these methods is to identify stable features or mechanisms that remain invariant across different distributions. Many generalization approaches employ causal theories to describe invariance, since causality and invariance are inextricably intertwined. However, current surveys deal with causality-aware domain generalization methods only at a very high level. Furthermore, we argue that it is possible to categorize the methods based on how causality is leveraged in each method and in which part of the model pipeline it is used. To this end, we categorize the causal domain generalization methods into three categories, namely, (i) Invariance via Causal Data Augmentation methods, which are applied during the data pre-processing stage; (ii) Invariance via Causal Representation Learning methods, which are utilized during the representation learning stage; and (iii) Invariance via Transferring Causal Mechanisms methods, which are applied during the classification stage of the pipeline. Furthermore, this survey includes in-depth insights into benchmark datasets and code repositories for domain generalization methods. We conclude the survey with insights and discussions on future directions.
A machine learning model, under the influence of observed or unobserved confounders in the training data, can learn spurious correlations and fail to generalize when deployed. For image classifiers, augmenting a training dataset using counterfactual examples has been empirically shown to break spurious correlations. However, the counterfactual generation task itself becomes more difficult as the level of confounding increases. Existing methods for counterfactual generation under confounding consider a fixed set of interventions (e.g., texture, rotation) and are not flexible enough to capture diverse data-generating processes. Given a causal generative process, we formally characterize the adverse effects of confounding on any downstream tasks and show that the correlation between generative factors (attributes) can be used to quantitatively measure confounding between generative factors. To minimize such correlation, we propose a counterfactual generation method that learns to modify the value of any attribute in an image and generate new images given a set of observed attributes, even when the dataset is highly confounded. These counterfactual images are then used to regularize the downstream classifier such that the learned representations are the same across various generative factors conditioned on the class label. Our method is computationally efficient, simple to implement, and works well for any number of generative factors and confounding variables. Our experimental results on both synthetic (MNIST variants) and real-world (CelebA) datasets show the usefulness of our approach.
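The regularization step above, making learned representations agree across an image and its counterfactual, can be sketched as a simple consistency penalty. The encoder outputs and data below are toy numpy arrays, not the paper's model or datasets.

```python
import numpy as np

# A minimal sketch of counterfactual consistency regularization: given
# representations of an image and of its counterfactual (same class label,
# one generative factor changed), penalize the distance between them so the
# learned representation ignores the changed factor. The random
# "representations" here are illustrative stand-ins for encoder outputs.

def counterfactual_consistency_loss(z: np.ndarray, z_cf: np.ndarray) -> float:
    """Mean squared distance between factual and counterfactual representations."""
    return float(np.mean(np.sum((z - z_cf) ** 2, axis=1)))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))                      # factual representations
z_cf_far = z + rng.normal(size=(8, 4))           # encoder sensitive to the factor
z_cf_near = z + 0.01 * rng.normal(size=(8, 4))   # nearly invariant encoder

# A (nearly) invariant encoder incurs a much smaller penalty:
assert counterfactual_consistency_loss(z, z_cf_near) < \
       counterfactual_consistency_loss(z, z_cf_far)
```

In training, this penalty would be added to the classification loss, pushing the encoder toward representations that are identical across generative factors given the label.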
Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices. In this paper, we develop a framework for modeling fairness using tools from causal inference. Our definition of counterfactual fairness captures the intuition that a decision is fair towards an individual if it is the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group. We demonstrate our framework on a real-world problem of fair prediction of success in law school. 31st Conference on Neural Information Processing Systems (NIPS 2017).
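The counterfactual-fairness definition above can be illustrated with a tiny structural causal model: hold an individual's latent background fixed, flip the protected attribute, and compare the predictions. The linear structural equation and predictors below are hypothetical toys, not the paper's law-school model.

```python
# A minimal sketch of the counterfactual-fairness check: a predictor is fair
# toward an individual if its output is the same in the actual world and in a
# counterfactual world where the protected attribute differs, holding the
# individual's exogenous background noise fixed. The structural equation and
# both predictors below are illustrative assumptions.

def feature(a: int, u: float) -> float:
    """Structural equation: the observed feature depends on the protected
    attribute `a` and the individual's latent background `u`."""
    return 2.0 * a + u

def unfair_predictor(a: int, u: float) -> float:
    return feature(a, u)   # inherits the dependence on `a`

def fair_predictor(a: int, u: float) -> float:
    return u               # uses only the attribute-free latent background

def is_counterfactually_fair(predictor, u: float) -> bool:
    """Compare the prediction in the actual world (a=0) with the
    counterfactual world (a=1) for the same individual (same `u`)."""
    return predictor(0, u) == predictor(1, u)

print(is_counterfactually_fair(unfair_predictor, u=0.3))  # False
print(is_counterfactually_fair(fair_predictor, u=0.3))    # True
```

In practice the latent background `u` is not observed and must be inferred from the causal model, which is what makes the framework non-trivial.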
We propose a method for learning predictors that are invariant under counterfactual changes of certain covariates. This method is useful when the prediction target is causally influenced by covariates that should not affect the predictor's output. For instance, an object recognition model may be affected by the position, orientation, or scale of the object itself. We address the problem of training predictors that are explicitly counterfactually invariant to changes of such covariates. We propose a model-agnostic regularization term based on conditional kernel mean embeddings to enforce counterfactual invariance during training. We prove the soundness of our method, which can handle mixed categorical and continuous multivariate attributes. Empirical results on synthetic and real-world data demonstrate the efficacy of our method in a variety of settings.
Despite having achieved great success in sentiment analysis, existing neural models struggle with implicit sentiment analysis. This may be because they tend to latch onto spurious correlations ("shortcuts", e.g., focusing only on explicit sentiment words), undermining the effectiveness and robustness of the learned model. In this work, we propose a causal intervention model for implicit sentiment analysis using instrumental variables (ISAIV). We first review sentiment analysis from a causal perspective and analyze the confounders present in this task. Then, we introduce an instrumental variable to eliminate the confounding causal effects, thus extracting the pure causal effect between sentence and sentiment. We compare the proposed ISAIV model with several strong baselines on both general implicit sentiment analysis and aspect-based implicit sentiment analysis tasks. The results show the great advantage of our model and the efficacy of implicit sentiment reasoning.
Automated decision support systems that can infer second opinions from experts may facilitate a more efficient allocation of resources; they can help decide when and from whom to seek a second opinion. In this paper, we look at the design of this type of support system from the perspective of counterfactual inference. We focus on a multiclass classification setting and first show that, if experts make predictions on their own, the underlying causal mechanism generating their predictions needs to satisfy a desirable set invariance property. Further, we show that, for any causal mechanism satisfying this property, there exists an equivalent mechanism where the predictions by each expert are generated by independent sub-mechanisms governed by a common noise. This motivates the design of a set-invariant Gumbel-Max structural causal model, in which the structure of the noise governing the model's sub-mechanisms depends on an intuitive notion of similarity between experts that can be estimated from data. Experiments on both synthetic and real data show that our model can be used to infer second opinions more accurately than its non-causal counterpart.
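The Gumbel-Max mechanism at the core of such a structural causal model can be sketched directly: a categorical outcome is the argmax of log-probabilities plus Gumbel noise, and a counterfactual prediction (a second opinion) reuses the same noise draw with a different expert's probabilities. The expert distributions below are illustrative assumptions, not fitted to any data.

```python
import numpy as np

# A minimal sketch of a Gumbel-Max structural causal model: the predicted
# class is argmax_k(log p_k + g_k) with Gumbel noise g. A counterfactual
# prediction under a different expert's class probabilities reuses the SAME
# noise draw; the shared noise is what ties the factual and counterfactual
# worlds together. The two expert distributions are toy assumptions.

rng = np.random.default_rng(42)

def gumbel_max(log_probs: np.ndarray, g: np.ndarray) -> int:
    """Gumbel-max mechanism: the sampled class is the argmax of
    log-probabilities plus Gumbel noise."""
    return int(np.argmax(log_probs + g))

log_p_expert_a = np.log(np.array([0.7, 0.2, 0.1]))  # first expert
log_p_expert_b = np.log(np.array([0.1, 0.2, 0.7]))  # second expert

g = rng.gumbel(size=3)                    # shared exogenous noise, one instance
factual = gumbel_max(log_p_expert_a, g)   # observed first opinion
counterfactual = gumbel_max(log_p_expert_b, g)  # inferred second opinion
print(factual, counterfactual)
```

The full model additionally couples the noise across experts according to their estimated similarity; in this sketch all experts simply share one noise vector.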
Recent studies have shown the impressive efficacy of counterfactually augmented data (CAD) for reducing NLU models' reliance on spurious features and improving their generalizability. However, current methods still heavily rely on human efforts or task-specific designs to generate counterfactuals, thereby impeding CAD's applicability to a broad range of NLU tasks. In this paper, we present AutoCAD, a fully automatic and task-agnostic CAD generation framework. AutoCAD first leverages a classifier to unsupervisedly identify rationales as spans to be intervened, which disentangles spurious and causal features. Then, AutoCAD performs controllable generation enhanced by unlikelihood training to produce diverse counterfactuals. Extensive evaluations on multiple out-of-domain and challenge benchmarks demonstrate that AutoCAD consistently and significantly boosts the out-of-distribution performance of powerful pre-trained models across different NLU tasks, which is comparable or even better than previous state-of-the-art human-in-the-loop or task-specific CAD methods. The code is publicly available at https://github.com/thu-coai/AutoCAD.
Counterfactual inference is a powerful tool, capable of solving challenging problems in high-profile sectors. To perform counterfactual inference, one requires knowledge of the underlying causal mechanisms. However, causal mechanisms cannot be uniquely determined from observations and interventions alone. This raises the question of how to choose causal mechanisms so that the resulting counterfactual inference is trustworthy in a given domain. This question has been addressed in causal models with binary variables, but the case of categorical variables remains unanswered. We address this challenge by introducing the notion of counterfactual ordering for causal models with categorical variables. To learn causal mechanisms satisfying these constraints, and to perform counterfactual inference with them, we introduce deep twin networks. These are deep neural networks that, when trained, are capable of twin network counterfactual inference, an alternative to the abduction, action, and prediction method. We empirically test our approach on diverse real-world and semi-synthetic data from medicine, epidemiology, and finance, reporting accurate estimation of counterfactual probabilities while demonstrating the issues that arise when counterfactual ordering is not enforced.
Graph machine learning has been extensively studied in both academia and industry. Although booming with a vast number of emerging methods and techniques, most of the literature is built on the in-distribution hypothesis, i.e., testing and training graph data are identically distributed. However, this in-distribution hypothesis can hardly be satisfied in many real-world graph scenarios where the model performance substantially degrades when there exist distribution shifts between testing and training graph data. To solve this critical problem, out-of-distribution (OOD) generalization on graphs, which goes beyond the in-distribution hypothesis, has made great progress and attracted ever-increasing attention from the research community. In this paper, we comprehensively survey OOD generalization on graphs and present a detailed review of recent advances in this area. First, we provide a formal problem definition of OOD generalization on graphs. Second, we categorize existing methods into three classes from conceptually different perspectives, i.e., data, model, and learning strategy, based on their positions in the graph machine learning pipeline, followed by detailed discussions for each category. We also review the theories related to OOD generalization on graphs and introduce the commonly used graph datasets for thorough evaluations. Finally, we share our insights on future research directions. This paper is the first systematic and comprehensive review of OOD generalization on graphs, to the best of our knowledge.
Real-world classification problems must contend with domain shift, the (potential) mismatch between the domain where a model is deployed and the domain(s) where the training data was collected. Methods for handling such problems must specify what structure is held in common between the domains and what varies. A natural assumption is that causal (structural) relationships are invariant in all domains. It is then tempting to learn a predictor for label $Y$ that depends only on its causal parents. However, many real-world problems are "anti-causal" in the sense that $Y$ is a cause of the covariates $X$; in this case, $Y$ has no causal parents and the naive causal invariance is useless. In this paper, we study representation learning under a particular notion of domain shift that both respects causal invariance and naturally handles the "anti-causal" structure. We show how to leverage the shared causal structure of the domains to learn a representation that admits an invariant predictor and that also allows fast adaptation in new domains. The key is to translate causal assumptions into learning principles that disentangle "invariant" and "non-stable" features. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed learning algorithm. Code is available at https://github.com/ybjiaang/actir.
Robustness to distribution shift and fairness have independently emerged as two important desiderata of modern machine learning models. While these two desiderata seem related, the connection between them is often unclear. Here, we discuss these connections through a causal lens, focusing on anti-causal prediction tasks, where the input to a classifier (e.g., an image) is assumed to be generated as a function of the target label and the protected attribute. By taking this perspective, we draw explicit connections between a common fairness criterion, separation, and a common notion of robustness, risk invariance. These connections provide new motivation for applying the separation criterion in anti-causal settings, and inform old discussions regarding fairness-performance tradeoffs. In addition, our findings suggest that robustness-motivated approaches can be used to enforce separation, and that they often work better in practice than methods designed to enforce separation directly. Using a medical dataset, we empirically validate our findings on the task of detecting pneumonia from X-rays, in a setting where differences in prevalence across sex groups motivate a fairness mitigation. Our findings highlight the importance of considering causal structure when choosing and enforcing fairness criteria.
Decision-making systems based on AI and machine learning have been used in a wide range of real-world scenarios, including healthcare, law enforcement, education, and finance. It is no longer far-fetched to envision a future where autonomous systems will drive entire business decisions and, more broadly, support large-scale decision-making infrastructure to solve society's most challenging problems. Issues of unfairness and discrimination are pervasive when decisions are made by humans, and they remain (or may be amplified) when decisions are made by machines with little transparency, accountability, and fairness. In this paper, we introduce a framework of \textit{causal fairness analysis} with the intent of filling in this gap, i.e., understanding, modeling, and possibly solving issues of fairness in decision-making settings. The main insight of our approach is to link the quantification of the disparities present in the observed data with the underlying, and often unobserved, collection of causal mechanisms that generate the disparity in the first place, a challenge we call the fundamental problem of causal fairness analysis (FPCFA). To solve the FPCFA, we study the problem of decomposing variations and empirical measures of fairness, attributing such variations to structural mechanisms and different units of the population. Our effort culminates in the Fairness Map, the first systematic attempt to organize and explain the relationships between the different criteria found in the literature. Finally, we study which causal assumptions are minimally needed for performing causal fairness analysis and propose a Fairness Cookbook, which allows data scientists to assess the existence of disparate impact and disparate treatment.
As NLP models achieve state-of-the-art performance on benchmarks and gain wide adoption, ensuring the safe deployment of these models in the real world, e.g., making sure they are robust against unseen or challenging scenarios, becomes crucial. Despite robustness being an increasingly studied topic, it has been explored separately in applications like vision and NLP, with varying definitions, evaluations, and mitigation strategies across multiple lines of research. In this paper, we aim to provide a unified survey of how robustness in NLP is defined, measured, and improved. We first connect the multiple definitions of robustness, then unify various lines of work on identifying robustness failures and evaluating models' robustness. Correspondingly, we present data-driven, model-driven, and inductive-prior-based mitigation strategies, offering a more systematic view of how to effectively improve robustness in NLP models. Finally, we conclude by outlining open challenges and future directions to motivate further research in this area.
Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has a limited scale and diversity if crowdsourced and is computationally expensive to extend to new perturbation types if generated using supervised methods. To address this, we introduce a new framework called DISCO for automatically generating high-quality counterfactual data at scale. DISCO engineers prompts to generate phrasal perturbations with a large general language model. Then, a task-specific teacher model filters the generation to distill high-quality counterfactual data. We show that learning with this counterfactual data yields a comparatively small student model that is 6% (absolute) more robust and generalizes 5% better across distributions than baselines on various challenging evaluations. This model is also 15% more sensitive in differentiating original and counterfactual examples, on three evaluation sets written by human workers and via human-AI collaboration.
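The generate-then-filter step described above can be sketched as a simple pipeline: a generator proposes perturbed sentences, and a task-specific teacher keeps only the candidates whose predicted label actually flips, i.e., genuine counterfactuals. The keyword-based `teacher` and the candidate list below are toy assumptions, not DISCO's prompting or filtering models.

```python
# A minimal sketch of teacher-filtered counterfactual distillation: keep only
# generated perturbations whose label, as judged by a task-specific teacher,
# differs from the original's. The toy keyword teacher and hand-written
# candidates below are illustrative stand-ins for the real models.

def teacher(sentence: str) -> str:
    """Toy sentiment teacher: labels by the presence of a keyword."""
    return "positive" if "great" in sentence else "negative"

def filter_counterfactuals(original: str, candidates: list[str]) -> list[str]:
    """Keep candidates whose teacher label differs from the original's."""
    y = teacher(original)
    return [c for c in candidates if teacher(c) != y]

original = "the film was great"
candidates = [
    "the film was truly great",  # paraphrase, label unchanged -> dropped
    "the film was dull",         # label flips -> kept as a counterfactual
]
print(filter_counterfactuals(original, candidates))  # ['the film was dull']
```

The surviving (original, counterfactual) pairs then form the distilled training data for the smaller student model.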
Counterfactual story rewriting requires reasoning about what happened previously and the possible outcomes of an altered condition. People can easily generate coherent endings under new conditions, but it remains challenging for current systems to do so with minimal changes to the original story. A major challenge, therefore, is the trade-off between generating logical stories and rewriting with minimal edits. In this paper, we propose EduCat, an editing-based unsupervised approach for counterfactual story rewriting. EduCat includes a target position detection strategy based on estimating the causal effects of the what-if conditions, which keeps the causally invariant parts of the story. EduCat then generates the stories under fluency, coherence, and minimal-edit constraints. We also propose a new metric to alleviate the shortcomings of current automatic metrics and better evaluate the trade-off. We evaluate EduCat on a public counterfactual story rewriting benchmark. Experiments show that EduCat achieves the best trade-off among unsupervised SOTA methods, according to both automatic and human evaluations. The resources of EduCat are available at: https://github.com/jiangjiechen/educat.