We present DoWhy-GCM, an extension of the DoWhy Python library that leverages graphical causal models. Unlike existing causality libraries, which mainly focus on effect estimation, DoWhy-GCM lets users pose a wide range of additional causal questions, such as identifying the root causes of outliers and of distributional changes, causal structure learning, attributing causal influences, and diagnosing causal structures. To do so, DoWhy-GCM users first model the cause-effect relations between the variables of the system under study as a graphical causal model, next fit the causal mechanisms of the variables, and then ask their causal question. All of these steps take only a few lines of code in DoWhy-GCM. The library is available at https://github.com/py-why/dowhy.
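As a rough illustration of the three-step workflow just described (model the graph, fit the mechanisms, ask a causal question), here is a minimal sketch using DoWhy's gcm module on a toy three-node chain. The function names reflect my recollection of the DoWhy API and should be checked against the library's documentation; the graph and data are purely synthetic.

```python
import networkx as nx
import numpy as np
import pandas as pd
from dowhy import gcm

# Step 1: model cause-effect relations as a graphical causal model (X -> Y -> Z).
causal_model = gcm.StructuralCausalModel(nx.DiGraph([("X", "Y"), ("Y", "Z")]))

# Toy observational data consistent with the assumed graph.
rng = np.random.default_rng(0)
X = rng.normal(size=1000)
Y = 2.0 * X + rng.normal(size=1000)
Z = 3.0 * Y + rng.normal(size=1000)
data = pd.DataFrame({"X": X, "Y": Y, "Z": Z})

# Step 2: assign and fit a causal mechanism for every node.
gcm.auto.assign_causal_mechanisms(causal_model, data)
gcm.fit(causal_model, data)

# Step 3: ask a causal question, e.g. which upstream nodes explain an anomalous Z value.
attributions = gcm.attribute_anomalies(causal_model, target_node="Z",
                                        anomaly_samples=data.iloc[:1])
print(attributions)
```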
Causality is a fundamental part of the scientific endeavour to understand the world. Unfortunately, causality remains something of a taboo in psychology and the social sciences. Motivated by a growing number of recommendations to take causal approaches to research seriously, we reformulate the typical approach to research in psychology so that the inevitably causal theory is reconciled with the rest of the research pipeline. We propose a new process that begins with the development, validation, and transparent formal specification of theory, incorporating techniques from the confluence of causal discovery and machine learning. We then propose methods for reducing the complexity of the fully specified theoretical model to the essential sub-model relevant to a given target hypothesis. From there, we determine whether the quantity of interest can be estimated from the data and, if so, suggest semi-parametric machine learning methods for estimating the causal effect. The overall goal is to present a new research pipeline that can (a) facilitate scientific inquiry compatible with the desire to test causal theories, (b) encourage the transparent representation of our theories as explicit mathematical objects, (c) tie our statistical models to specific attributes of the theory, thereby reducing the under-specification problems that typically arise from the theory-to-model gap, and (d) yield results and estimates that are causally meaningful and reproducible. The process is demonstrated through a didactic example with real-world data, and we conclude with a summary and discussion.
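To make the semi-parametric estimation step more tangible, the snippet below is a generic, hypothetical sketch of a doubly robust (AIPW) estimator of an average causal effect with scikit-learn nuisance models; it is not the specific procedure proposed in the paper, the simulated data and variable names are illustrative, and cross-fitting is omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 2000
Z = rng.normal(size=(n, 3))                       # observed confounders
T = rng.binomial(1, 1 / (1 + np.exp(-Z[:, 0])))   # treatment depends on Z
Y = 1.5 * T + Z @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)

# Nuisance models: propensity score e(Z) and outcome regressions m1(Z), m0(Z).
e = GradientBoostingClassifier().fit(Z, T).predict_proba(Z)[:, 1]
m1 = GradientBoostingRegressor().fit(Z[T == 1], Y[T == 1]).predict(Z)
m0 = GradientBoostingRegressor().fit(Z[T == 0], Y[T == 0]).predict(Z)

# AIPW (doubly robust) estimate of the average treatment effect.
ate = np.mean(m1 - m0 + T * (Y - m1) / e - (1 - T) * (Y - m0) / (1 - e))
print(f"AIPW ATE estimate: {ate:.2f} (true simulated effect is 1.5)")
```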
We present quantitative probing as a model-agnostic framework for validating causal models in the presence of quantitative domain knowledge. The method is constructed as an analogue of the train/test split used in correlation-based machine learning, and it augments current causal validation strategies that are consistent with the logic of scientific discovery. The effectiveness of the approach is illustrated using Pearl's sprinkler example, before a thorough simulation-based study is carried out. Limitations of the technique are identified by studying exemplary failure scenarios, which also serve to propose a list of topics for future research and improved versions of quantitative probing. The code for applying quantitative probing, as well as the code for the simulation-based study of its effectiveness, is provided in two separate open-source Python packages.
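As a loose, assumed reading of the train/test analogy described above, the sketch below checks a fitted causal model against a held-out quantitative "probe" from domain knowledge; the toy SCM, the probe value, the tolerance, and the estimation routine are all my own illustrative choices, not the authors' packages.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Toy linear SCM: Z -> X -> Y and Z -> Y (Z confounds X and Y).
rng = np.random.default_rng(2)
n = 5000
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)
Y = 1.2 * X + 0.5 * Z + rng.normal(size=n)
data = pd.DataFrame({"Z": Z, "X": X, "Y": Y})

# Held-out quantitative probe from domain knowledge:
# "the causal effect of X on Y is roughly 1.2".
probe_value, tolerance = 1.2, 0.1

# Effect implied by the causal model (back-door adjustment for Z via regression).
est = sm.OLS(data["Y"], sm.add_constant(data[["X", "Z"]])).fit().params["X"]

passed = abs(est - probe_value) < tolerance
print(f"estimated effect {est:.2f}; probe {'passed' if passed else 'failed'}")
```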
The concept of causality has a controversial history. Whether it is possible to represent and address causal problems with probability theory, or whether fundamentally new mathematics such as the do-calculus is required instead, has been hotly debated. Pearl (2001) states that the building blocks of our scientific and everyday knowledge are elementary facts such as "mud does not cause rain" and "symptoms do not cause disease", and that these facts, strangely enough, cannot be expressed in the vocabulary of probability calculus. This has led to a dichotomy between advocates of causal graphical modelling and the do-calculus on one side, and researchers applying Bayesian methods on the other. In this paper, we demonstrate that, provided we explicitly model our assumptions about the effects of intervening in a system, estimating causal effects can be accomplished entirely within the standard Bayesian paradigm. The invariance assumptions underlying causal graphical models can be encoded in ordinary probabilistic graphical models, allowing causal estimation with Bayesian statistics that is equivalent to the do-calculus. Clarifying the connections between these approaches is a key step towards enabling them to be combined to solve practical problems.
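One concrete instance of the claimed equivalence is the familiar back-door adjustment: if a covariate set Z blocks all confounding paths between X and Y, and the mechanism p(y | x, z) is assumed invariant under interventions on X, then the interventional query reduces to ordinary conditional probabilities that a Bayesian model can estimate directly:

```latex
p\!\left(y \mid \mathrm{do}(x)\right) \;=\; \sum_{z} p\!\left(y \mid x, z\right)\, p(z).
```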
In this review, we discuss approaches for learning causal structure from data, also called causal discovery. In particular, we focus on approaches for learning directed acyclic graphs (DAGs) and various generalizations which allow for some variables to be unobserved in the available data. We devote special attention to two fundamental combinatorial aspects of causal structure learning. First, we discuss the structure of the search space over causal graphs. Second, we discuss the structure of equivalence classes over causal graphs, i.e., sets of graphs which represent what can be learned from observational data alone, and how these equivalence classes can be refined by adding interventional data.
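A small worked example of such an equivalence class: the three structures below encode exactly the same conditional independence statement and hence cannot be told apart from observational data alone, whereas the collider implies a different statement and belongs to a different class.

```latex
X \rightarrow Y \rightarrow Z, \quad
X \leftarrow Y \leftarrow Z, \quad
X \leftarrow Y \rightarrow Z
\;\;\Longrightarrow\;\; X \perp\!\!\!\perp Z \mid Y,
\qquad\text{whereas}\qquad
X \rightarrow Y \leftarrow Z
\;\;\Longrightarrow\;\; X \perp\!\!\!\perp Z.
```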
Discovering new medicines is the quest to find and establish cause-and-effect relationships. As an emerging approach that leverages human knowledge and creativity, data, and machine intelligence, causal inference holds the promise of reducing cognitive bias and improving decision making in drug discovery. Although it has already been applied across the value chain, the concepts and practice of causal inference remain obscure to many practitioners. This article offers a non-technical introduction to causal inference, reviews its recent applications, and discusses the opportunities and challenges of adopting causal language in drug discovery and development.
Many of our experiments are designed to uncover the causes and effects behind a data-generating mechanism (i.e., a phenomenon) and, above all, to articulate a model that allows us to explore the phenomenon further and/or to predict it accurately. Fundamentally, such a model is best derived through a causal approach (as opposed to an observational or purely empirical one). In this approach, causal discovery is needed to create a causal model, which can then be applied to infer the impact of interventions and to answer any hypothetical (i.e., what-if) questions we may have. This paper makes a case for causal discovery and causal inference, and contrasts them with traditional machine learning approaches, all from the perspective of civil and structural engineering. More specifically, it outlines the key principles of causality as well as the most commonly used algorithms and packages for causal discovery and causal inference. Finally, it presents a series of examples and case studies showing how causal concepts can be adopted in our domain.
Learning predictors that do not rely on spurious correlations requires building causal representations. However, learning such representations is very challenging. We therefore formulate the problem of learning a causal representation from high-dimensional data and study causal recovery with synthetic data. This work introduces a latent-variable decoder model, BCD, for Bayesian causal discovery, and performs experiments in mildly supervised and unsupervised settings. We present a series of synthetic experiments to characterize the factors that matter for causal discovery, and show that using known intervention targets as labels helps unsupervised Bayesian inference over the structure and parameters of linear-Gaussian additive-noise latent structural causal models.
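For intuition about the synthetic setting referred to above, the following generic sketch simulates a linear-Gaussian additive-noise SCM and draws samples under a perfect intervention on a known target node; it is a plain illustration of that data-generating setup, not the authors' experimental code.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
# Random upper-triangular weight matrix => a DAG over nodes 0..d-1 (edge j -> i has weight W[j, i]).
W = np.triu(rng.uniform(0.5, 2.0, size=(d, d)) * rng.binomial(1, 0.5, size=(d, d)), k=1)

def sample(n, intervention=None):
    """Ancestral sampling from x_i = sum_j W[j, i] * x_j + eps_i.

    `intervention` is an optional dict {node: value} implementing do(x_node = value).
    """
    X = np.zeros((n, d))
    for i in range(d):  # nodes 0, 1, ..., d-1 are already in topological order
        if intervention and i in intervention:
            X[:, i] = intervention[i]
        else:
            X[:, i] = X @ W[:, i] + rng.normal(size=n)
    return X

observational = sample(2000)
interventional = sample(2000, intervention={2: 0.0})   # known intervention target: node 2
```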
We propose a new notion of causal contribution that describes the "intrinsic" part of a node's contribution to a target node in a DAG. We show that, in some scenarios, existing methods for quantifying causal influence fail to fully capture this notion. By recursively writing each node as a function of the upstream noise terms, we separate the intrinsic information added by each node from the information it inherits from its ancestors. To interpret the intrinsic information as a causal contribution, we consider "structure-preserving interventions" that randomize each node in a way that mimics the usual dependence on its parents and does not perturb the observed joint distribution. To obtain a measure that is invariant to arbitrary orderings of the nodes, we propose Shapley-based symmetrization. We describe our contribution analysis for variance and entropy, but contributions to other target metrics can be defined analogously.
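Since the order-dependent attributions are symmetrized here with Shapley values, a minimal generic sketch of that symmetrization step is given below (it enumerates all orderings, so it is only viable for a handful of nodes); the contribution function is a placeholder for whichever target metric, such as explained variance, one actually uses.

```python
from collections import defaultdict
from itertools import permutations

def shapley_symmetrize(nodes, marginal_gain):
    """Average each node's order-dependent gain over all orderings of the nodes.

    `marginal_gain(node, preceding)` returns the contribution of `node`
    given that the nodes in `preceding` have already been accounted for.
    """
    totals = defaultdict(float)
    orders = list(permutations(nodes))
    for order in orders:
        seen = set()
        for node in order:
            totals[node] += marginal_gain(node, frozenset(seen))
            seen.add(node)
    return {node: totals[node] / len(orders) for node in nodes}

# Toy usage with an arbitrary set function v(S); the gain of i given S is v(S | {i}) - v(S).
v = lambda S: len(S) ** 2          # placeholder target quantity
gain = lambda i, S: v(S | {i}) - v(S)
print(shapley_symmetrize(["X", "Y", "Z"], gain))
```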
Learning causal structure from observational data often assumes that we observe independent and identically distributed (i.i.d.) data. The traditional approach aims to find a graphical representation that encodes the same set of conditional independence relationships as those present in the observed distribution. It is known that under the i.i.d. assumption, even with infinite data, there is a limit to how fine-grained a causal structure we can identify. To overcome this limitation, recent work has explored using data originating from different, related environments to learn richer causal structure. These approaches implicitly rely on the independent causal mechanisms (ICM) principle, which postulates that the mechanism giving rise to an effect given its causes and the mechanism which generates the causes do not inform or influence each other. Thus, components of the causal model can independently change from environment to environment. Despite its wide application in machine learning and causal inference, there is a lack of statistical formalization of the ICM principle and how it enables identification of richer causal structures from grouped data. Here we present new causal de Finetti theorems which offer a first statistical formalization of ICM principle and show how causal structure identification is possible from exchangeable data. Our work provides theoretical justification for a broad range of techniques leveraging multi-environment data to learn causal structure.
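The ICM principle invoked here is commonly written as the requirement that each environment's distribution factorizes into causal mechanisms that may vary independently of one another across environments e:

```latex
p^{e}(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} p^{e}\!\left(x_i \mid \mathrm{pa}_i\right),
\qquad e = 1, \dots, E,
```

where a change in one factor p^e(x_i | pa_i) between environments carries no information about, and places no constraint on, the remaining factors.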
Neurally-parameterized Structural Causal Models in the Pearlian notion of causality, referred to as NCM, were recently introduced as a step towards next-generation learning systems. However, said NCM are only concerned with the learning aspect of causal inference and miss out entirely on the architecture aspect. That is, actual causal inference within NCM is intractable in that the NCM won't return an answer to a query in polynomial time. This insight follows as a corollary to the more general statement on the intractability of arbitrary SCM parameterizations, which we prove in this work through a classical 3-SAT reduction. Since future learning algorithms will be required to deal with both high-dimensional data and highly complex mechanisms governing the data, we ultimately believe work on tractable inference for causality to be decisive. We also show that not all "causal" models are created equal. More specifically, there are models capable of answering causal queries that are not SCM, which we refer to as partially causal models (PCM). We provide a tabular taxonomy in terms of tractability properties for all of the different model families, namely correlation-based, PCM, and SCM. To conclude our work, we also provide some initial ideas on how to overcome parts of the intractability of causal inference with SCM by showing an example of how parameterizing an SCM with SPN modules can at least allow for tractable mechanisms. We hope that our impossibility result, alongside the taxonomy for tractability in causal models, can raise awareness for this novel research direction, since achieving success with causality in real-world downstream tasks will depend not only on learning correct models but also on the practical ability to gain access to model inferences.
Counterfactual inference is a powerful tool capable of addressing challenging problems in high-stakes domains. To perform counterfactual inference, one needs knowledge of the underlying causal mechanisms. However, causal mechanisms cannot be uniquely determined from observations and interventions alone. This raises the question of how to choose causal mechanisms so that the resulting counterfactual inference is trustworthy in a given domain. This question has been addressed for causal models with binary variables, but the case of categorical variables remained unanswered. We address this challenge by introducing the notion of counterfactual ordering for causal models with categorical variables. To learn causal mechanisms that satisfy these constraints, and to perform counterfactual inference with them, we introduce deep twin networks. These are deep neural networks that, once trained, are capable of twin-network counterfactual inference, an alternative to the abduction-action-prediction method. We empirically test our approach on a variety of real-world and semi-synthetic data from medicine, epidemiology, and finance, and report accurate estimation of counterfactual probabilities, while demonstrating the issues that arise when counterfactual ordering is not enforced.
This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions (also called "causal effects" or "policy evaluation"); (2) queries about probabilities of counterfactuals (including assessment of "regret," "attribution," or "causes of effects"); and (3) queries about direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both.
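In the SCM notation used there, the three query types correspond, schematically, to:

```latex
\text{(1) intervention: } P\!\left(Y = y \mid \mathrm{do}(X = x)\right), \qquad
\text{(2) counterfactual: } P\!\left(Y_{x} = y \mid X = x', Y = y'\right), \qquad
\text{(3) mediation: nested counterfactuals such as } \mathbb{E}\!\left[Y_{x,\,M_{x'}}\right].
```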
Causal inference is essential for data-driven decision making across domains such as business engagement, medical treatment, and policy making. However, research on causal discovery has evolved separately from research on inference methods, preventing a straightforward combination of methods from the two fields. In this work, we develop Deep End-to-end Causal Inference (DECI), a flow-based non-linear additive noise model that takes observational data and can perform both causal discovery and inference, including conditional average treatment effect (CATE) estimation. We provide a theoretical guarantee that DECI can recover the ground-truth causal graph under standard causal discovery assumptions. Motivated by application impact, we extend the model to heterogeneous, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. Our results show the competitive performance of DECI compared to relevant baselines for both causal discovery and (C)ATE estimation, in over a thousand experiments on synthetic datasets and causal machine learning benchmarks, across data types and levels of missingness.
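For reference, the additive noise model and the CATE quantity mentioned here take their standard forms:

```latex
x_i \;=\; f_i\!\left(\mathrm{pa}_i\right) + \varepsilon_i,
\qquad
\mathrm{CATE}(x) \;=\; \mathbb{E}\!\left[Y \mid \mathrm{do}(T = 1),\, X = x\right]
\;-\; \mathbb{E}\!\left[Y \mid \mathrm{do}(T = 0),\, X = x\right].
```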
Bayesian structure learning allows one to capture uncertainty over the causal directed acyclic graph (DAG) responsible for generating given data. In this work, we present Tractable Uncertainty for STructure learning (TRUST), a framework for approximate posterior inference that relies on probabilistic circuits as the representation of our posterior beliefs. In contrast to sample-based posterior approximations, our representation can capture a much richer space of DAGs, while also being able to carefully tease apart uncertainty through a range of useful inference queries. We empirically demonstrate how probabilistic circuits can be used as an augmented representation for structure learning methods, improving both the quality of the inferred structures and of the posterior uncertainty. Experimental results on conditional queries further demonstrate the practical utility of TRUST's representational capacity.
gCastle is an end-to-end Python toolbox for causal structure learning. It provides functionality for generating data from either simulators or real-world datasets, learning causal structure from the data, and evaluating the learned graphs, together with useful practices such as prior-knowledge insertion, preliminary neighborhood selection, and post-processing to remove false discoveries. Compared with related packages, gCastle includes many recently developed gradient-based causal discovery methods, with optional GPU acceleration. gCastle brings convenience both to researchers, who can experiment directly with the code, and to practitioners through a graphical user interface. The current version also provides three real-world datasets from telecommunications. gCastle is available under the Apache License 2.0 at https://github.com/huawei-noah/trustworthyai/tree/master/gcastle.
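A minimal usage sketch along the lines the toolbox describes (simulate, learn, evaluate) might look as follows; the class names follow my recollection of the gCastle README and should be double-checked against the package documentation.

```python
from castle.algorithms import PC
from castle.datasets import DAG, IIDSimulation
from castle.metrics import MetricsDAG

# Simulate i.i.d. data from a random weighted DAG.
weighted_dag = DAG.erdos_renyi(n_nodes=10, n_edges=20, weight_range=(0.5, 2.0), seed=1)
dataset = IIDSimulation(W=weighted_dag, n=2000, method='linear', sem_type='gauss')

# Learn a causal structure from the simulated data.
pc = PC()
pc.learn(dataset.X)

# Evaluate the learned graph against the ground-truth adjacency matrix.
metrics = MetricsDAG(pc.causal_matrix, dataset.B)
print(metrics.metrics)
```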
Bayesian causal structure learning aims to learn a posterior distribution over directed acyclic graphs (DAGs), and the mechanisms that define the relationship between parent and child variables. By taking a Bayesian approach, it is possible to reason about the uncertainty of the causal model. The notion of modelling the uncertainty over models is particularly crucial for causal structure learning since the model could be unidentifiable when given only a finite amount of observational data. In this paper, we introduce a novel method to jointly learn the structure and mechanisms of the causal model using Variational Bayes, which we call Variational Bayes-DAG-GFlowNet (VBG). We extend the method of Bayesian causal structure learning using GFlowNets to learn not only the posterior distribution over the structure, but also the parameters of a linear-Gaussian model. Our results on simulated data suggest that VBG is competitive against several baselines in modelling the posterior over DAGs and mechanisms, while offering several advantages over existing methods, including the guarantee to sample acyclic graphs, and the flexibility to generalize to non-linear causal mechanisms.
Causal discovery is a major task of great importance for machine learning, since causal structure can enable models to go beyond purely correlation-based inference and significantly improve their performance. However, finding causal structure from data poses a significant challenge, both in terms of computational effort and accuracy, not to mention being impossible in general without interventions. In this paper, we develop a meta-reinforcement-learning algorithm that performs causal discovery by learning to carry out interventions so that an explicit causal graph can be constructed. Apart from being useful for possible downstream applications, the estimated causal graph also provides an explanation of the data-generating process. We show that our algorithm estimates good graphs compared to SOTA approaches, even in environments whose underlying causal structure has never been seen before. Furthermore, we conduct an ablation study showing how learning interventions contributes to the overall performance of our approach. We conclude that interventions indeed help boost performance, effectively yielding accurate estimates of the causal structure of possibly unseen environments.
In many fields of scientific research and real-world applications, unbiased estimation of causal effects from non-experimental data is crucial for understanding the mechanisms underlying the data and for deciding on effective responses or interventions. A great deal of research has been conducted on this challenging problem from different perspectives. For causal effect estimation from data, assumptions such as the Markov property, faithfulness, and causal sufficiency are always made; even under these assumptions, full knowledge such as a set of covariates or the underlying causal graph is still required. A practical challenge is that in many applications no such full knowledge, or only partial knowledge, is available. In recent years, research has emerged that uses search strategies based on graphical causal models to discover useful knowledge from data for causal effect estimation under some mild assumptions, and it has shown promise in tackling this practical challenge. In this survey, we review these methods and focus on the challenges faced by data-driven approaches. We discuss the assumptions, strengths, and limitations of data-driven methods. We hope this review will motivate more researchers to design better data-driven methods based on graphical causal modelling to address the challenging problem of causal effect estimation.
Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial: often the causal model is simply assumed by the modeller without much justification, and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents: roughly, agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive the first causal discovery algorithm for discovering agents from empirical data, and give algorithms for translating between causal models and game-theoretic influence diagrams. We demonstrate our approach by resolving some previous confusions caused by incorrect causal modelling of agents.