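Supervisory stress testing has become a primary tool for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios under a common stress scenario. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask: what is a fair aggregation of individually tailored models into a common model? We argue that simply pooling data across banks treats banks equally but suffers from two deficiencies. First, it may distort the impact of legitimate portfolio features. Second, it is vulnerable to implicit misdirection of legitimate information to infer bank identity. To address these deficiencies, we compare various notions of regression fairness, considering both predictive accuracy and equal treatment. In the setting of linear models, we argue for estimating and then discarding centered bank fixed effects, rather than simply ignoring differences across banks. We provide evidence that the overall impact can be material. We also discuss extensions to nonlinear models.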
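The sketch below illustrates why fixed effects matter. It is a minimal Python/NumPy example that assumes a single linear model with one dummy variable per bank, and it centers the bank effects on their unweighted mean. The function names, the simulated data, and the centering choice are all illustrative assumptions, not the authors' specification. In the simulation, bank identity is correlated with the portfolio feature, so pooled OLS loads the bank shift onto the slope. The fixed-effects fit recovers the slope first and then drops the centered bank effects.

```python
import numpy as np

def fit_with_centered_bank_effects(X, y, bank):
    """OLS of y on X plus one dummy per bank (no global intercept).

    The bank effects are centred on their unweighted mean (an assumption);
    only the common intercept and slopes are returned, so the centred bank
    effects are estimated and then discarded.
    """
    banks = np.unique(bank)
    dummies = (bank[:, None] == banks[None, :]).astype(float)
    design = np.hstack([dummies, X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    bank_effects, slopes = coef[: len(banks)], coef[len(banks):]
    intercept = bank_effects.mean()
    return intercept, slopes


def fit_pooled(X, y):
    """Pooled OLS that simply ignores differences across banks."""
    design = np.hstack([np.ones((len(y), 1)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


# Hypothetical data: bank 1 has higher values of the feature x and also a
# higher baseline loss that is unrelated to x.
rng = np.random.default_rng(0)
n = 2000
bank = rng.integers(0, 2, size=n)
x = rng.normal(loc=2.0 * bank, scale=1.0, size=n)
y = 1.0 * x + 3.0 * bank + rng.normal(scale=0.5, size=n)  # true slope on x is 1
X = x[:, None]

fe_intercept, fe_slopes = fit_with_centered_bank_effects(X, y, bank)
pooled_intercept, pooled_slopes = fit_pooled(X, y)
print("fixed-effects slope:", fe_slopes[0])  # close to 1
print("pooled slope:", pooled_slopes[0])     # inflated, close to 1.75 in this setup
```

In this toy setup, the pooled slope overstates the effect of the portfolio feature. That is the first deficiency the abstract describes.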
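Decision-making systems based on AI and machine learning are used across a wide range of real-world settings, including healthcare, law enforcement, education, and finance. It is no longer far-fetched to envision a future where autonomous systems drive entire business decisions and, more broadly, support large-scale decision-making infrastructure for society's most challenging problems. Unfairness and discrimination are pervasive when humans make decisions. These issues may persist, or even be amplified, when decisions are made by machines with little transparency, accountability, and fairness. In this paper, we introduce a framework for causal fairness analysis to fill this gap, that is, to understand, model, and possibly solve fairness issues in decision-making settings. The main insight of our approach is to link the quantification of disparities in the observed data with the underlying, and often unobserved, collection of causal mechanisms that generate those disparities in the first place. We call this challenge the Fundamental Problem of Causal Fairness Analysis (FPCFA). To address the FPCFA, we study the problem of decomposing variations and empirical measures of fairness so as to attribute such variations to structural mechanisms and to different units of the population. Our effort culminates in the Fairness Map, the first systematic attempt to organize and explain the relationships between the different criteria found in the literature. Finally, we study the minimal causal assumptions needed to perform causal fairness analysis and propose a Fairness Cookbook, which allows data scientists to assess the presence of disparate impact and disparate treatment.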
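Forecast combinations have flourished in the forecasting community and, in recent years, have become part of the mainstream of forecasting research and practice. Combining multiple forecasts produced for a single (target) series is now widely used to improve accuracy. It integrates information gleaned from different sources and thereby mitigates the risk of relying on a single "best" forecast. Combination schemes have evolved from simple methods that require no estimation to sophisticated methods involving time-varying weights, nonlinear combinations, correlations among components, and cross-learning. They cover both point forecasts and probabilistic forecasts. This paper provides an up-to-date review of the extensive literature on forecast combinations, with references to available open-source software implementations. We discuss the potential and limitations of various methods and highlight how these ideas have developed over time. We also survey some important issues concerning the utility of forecast combinations. Finally, we conclude with current research gaps and potential directions for future research.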
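The two ends of the spectrum described above can be illustrated in a few lines. This minimal Python/NumPy sketch contrasts two schemes. The first is a simple equal-weight average, which involves no estimation. The second sets weights proportional to the inverse of each method's mean squared error on a validation window. Inverse-MSE weighting is just one classical choice among many and is used here as an assumption. The function names and numbers are hypothetical.

```python
import numpy as np

def equal_weight_combination(forecasts):
    """Simple average across methods; rows are methods, columns are horizons."""
    return forecasts.mean(axis=0)


def inverse_mse_weights(past_forecasts, past_actuals):
    """Weights proportional to 1 / MSE of each method on a validation window."""
    mse = np.mean((past_forecasts - past_actuals[None, :]) ** 2, axis=1)
    weights = 1.0 / mse
    return weights / weights.sum()


def weighted_combination(forecasts, weights):
    """Linear combination of the methods' forecasts with the given weights."""
    return weights @ forecasts


# Hypothetical validation window: method A is off by 0.5, method B by 2.
past_actuals = np.array([10.0, 12.0, 11.0, 13.0])
past_forecasts = np.array([
    [10.5, 11.5, 11.5, 12.5],   # method A, MSE = 0.25
    [12.0, 10.0, 13.0, 11.0],   # method B, MSE = 4.0
])
future = np.array([
    [14.0, 15.0],               # method A
    [16.0, 13.0],               # method B
])

w = inverse_mse_weights(past_forecasts, past_actuals)
print(w)                                  # about [0.941, 0.059]
print(equal_weight_combination(future))   # [15. 14.]
print(weighted_combination(future, w))    # about [14.118, 14.882]
```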
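We study fairness in the context of classification, where performance is measured by the area under the receiver operating characteristic curve (AUC). AUC is commonly used when both Type I (false positive) and Type II (false negative) errors are important. However, the same classifier can have significantly different AUCs for different protected groups, and in real-world applications it is often desirable to reduce such cross-group differences. We address how to choose additional features so that the AUC of the disadvantaged group improves the most. Our results show that the unconditional variance of a feature says nothing about AUC fairness, whereas the class-conditional variance does. Using this connection, we develop fairAUC, a novel approach based on feature augmentation (adding features) that mitigates bias between identifiable groups. We evaluate fairAUC on synthetic and real-world (COMPAS) datasets. Relative to benchmarks that maximize overall AUC or minimize bias between groups, fairAUC significantly improves AUC for the disadvantaged group.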
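The group-wise AUC comparison can be sketched as follows, assuming that scores, binary labels, and a group indicator are NumPy arrays. AUC is computed as the Mann-Whitney probability that a randomly chosen positive outscores a randomly chosen negative. The simulation is an illustrative assumption, not the paper's experiment. Both groups get the same mean shift between classes, but one group gets a larger class-conditional noise variance, and that group's AUC drops. This mirrors the connection between class-conditional variance and AUC fairness stated in the abstract.

```python
import numpy as np

def auc(scores, labels):
    """AUC = P(random positive scores above random negative), ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


def auc_by_group(scores, labels, groups):
    """AUC computed separately within each protected group."""
    return {int(g): auc(scores[groups == g], labels[groups == g]) for g in np.unique(groups)}


# Hypothetical data: same mean shift between classes in both groups,
# but the class-conditional noise is four times larger in group 1.
rng = np.random.default_rng(0)
n = 2000
groups = rng.integers(0, 2, size=n)
labels = rng.integers(0, 2, size=n)
noise_sd = np.where(groups == 1, 2.0, 0.5)
scores = labels + noise_sd * rng.normal(size=n)

print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
print(auc_by_group(scores, labels, groups))      # roughly 0.92 for group 0, 0.64 for group 1
```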
Objectives: Discussions of fairness in criminal justice risk assessments typically lack conceptual precision. Rhetoric too often substitutes for careful analysis. In this paper, we seek to clarify the tradeoffs between different kinds of fairness and between fairness and accuracy. Methods: We draw on the existing literatures in criminology, computer science and statistics to provide an integrated examination of fairness and accuracy in criminal justice risk assessments. We also provide an empirical illustration using data from arraignments. Results: We show that there are at least six kinds of fairness, some of which are incompatible with one another and with accuracy. Conclusions: Except in trivial cases, it is impossible to maximize accuracy and fairness at the same time, and impossible simultaneously to satisfy all kinds of fairness. In practice, a major complication is different base rates across different legally protected groups. There is a need to consider challenging tradeoffs.
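In this paper, we propose a method for nonparametric estimation and inference for heterogeneous bounds on causal effect parameters in general sample selection models. In these models, the initial treatment can affect whether a post-intervention outcome is observed at all. Treatment selection can be confounded by observable covariates, while outcome selection can be confounded by both observables and unobservables. The method delivers conditional effect bounds as functions of policy-relevant pre-treatment variables, and it allows valid statistical inference on the unidentified conditional effect curves. We use a flexible semiparametric debiased machine learning approach that accommodates flexible functional forms and high-dimensional confounding variables between the treatment, selection, and outcome processes. We also provide easily verifiable high-level conditions for estimation, together with misspecification-robust inference guarantees.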
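The classic unconditional trimming bounds of Lee (2009) are a useful point of reference for this sample-selection setting, and the sketch below computes them. It assumes that treatment is randomized and that selection is monotone, meaning treatment can only raise the chance that the outcome is observed. It uses a simple floor rule for the number of trimmed observations. It does not implement the covariate-conditional, debiased machine learning estimator described above; it only shows the trimming logic that such bounds start from.

```python
import numpy as np

def lee_bounds(treated, selected, outcome):
    """Trimming bounds in the style of Lee (2009) for units whose outcome
    would be observed regardless of treatment.

    Assumes randomized treatment and that treatment can only increase
    selection (p1 >= p0 > 0). Outcomes of unselected units are ignored.
    """
    treated = np.asarray(treated, dtype=bool)
    selected = np.asarray(selected, dtype=bool)
    outcome = np.asarray(outcome, dtype=float)
    p1 = selected[treated].mean()    # observation rate among treated
    p0 = selected[~treated].mean()   # observation rate among controls
    if p1 < p0 or p0 == 0:
        raise ValueError("sketch assumes 0 < p0 <= p1")
    trim_share = (p1 - p0) / p1
    y1 = np.sort(outcome[treated & selected])
    k = int(np.floor(trim_share * len(y1)))  # simple floor rule for the trimmed count
    control_mean = outcome[~treated & selected].mean()
    lower = y1[: len(y1) - k].mean() - control_mean  # drop the k largest treated outcomes
    upper = y1[k:].mean() - control_mean             # drop the k smallest treated outcomes
    return float(lower), float(upper)


# Toy example: all 10 treated outcomes are observed, only 5 of 10 control outcomes are.
treated = [1] * 10 + [0] * 10
selected = [1] * 10 + [1] * 5 + [0] * 5
outcome = list(range(1, 11)) + [2.0] * 5 + [0.0] * 5  # last five values are never used
print(lee_bounds(treated, selected, outcome))  # (1.0, 6.0)
```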
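In this paper, we present a general framework for estimating regression models subject to a user-defined level of fairness. We enforce fairness as a model selection step, in which we choose the value of a ridge penalty to control the effect of the sensitive attributes. We then estimate the parameters of the model conditional on the chosen penalty value. Our proposal is mathematically simple, with a solution that is partly in closed form, and it produces estimates of the regression coefficients that are intuitive to interpret as a function of the level of fairness. Furthermore, it is easily extended to generalized linear models, kernel regression models, and other penalties, and it can accommodate multiple definitions of fairness. We compare our approach with the regression model of Komiyama et al. (2018), which implements an optimal fair linear regression model, and with the fair models of Zafar et al. (2019). We evaluate these methods empirically on six different datasets. For the same level of fairness, our proposal provides better goodness of fit and better predictive accuracy. In addition, we highlight a source of bias in the original experimental evaluation of Komiyama et al. (2018).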
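The core idea can be sketched in a few lines of Python/NumPy, under several assumptions. The regression is linear and centered. The ridge penalty applies only to the coefficients of the sensitive attributes S, while the other features X are left unpenalized. The fairness measure is hypothetical: the share of fitted-value variance driven by S, compared with a user-chosen bound r. The paper's exact fairness definitions and penalty-selection procedure may differ, and the grid search here is only a stand-in.

```python
import numpy as np

def ridge_on_sensitive(X, S, y, lam):
    """Least squares with a ridge penalty on the sensitive coefficients only.

    Columns and response are centred, so no intercept is fitted.
    Returns (beta_X, beta_S).
    """
    D = np.hstack([X, S])
    D = D - D.mean(axis=0)
    yc = y - y.mean()
    penalty = np.diag([0.0] * X.shape[1] + [1.0] * S.shape[1])
    beta = np.linalg.solve(D.T @ D + lam * penalty, D.T @ yc)
    return beta[: X.shape[1]], beta[X.shape[1]:]


def sensitive_share(X, S, beta_X, beta_S):
    """Hypothetical fairness measure: share of fitted-value variance from S."""
    part_S = (S - S.mean(axis=0)) @ beta_S
    fitted = (X - X.mean(axis=0)) @ beta_X + part_S
    return np.var(part_S) / np.var(fitted)


def fit_fair_ridge(X, S, y, r, grid=np.logspace(-2, 6, 81)):
    """Return the smallest grid penalty whose sensitive share is at most r."""
    for lam in grid:
        beta_X, beta_S = ridge_on_sensitive(X, S, y, lam)
        if sensitive_share(X, S, beta_X, beta_S) <= r:
            return lam, beta_X, beta_S
    raise ValueError("no penalty on the grid satisfies the bound")


# Hypothetical data: one non-sensitive feature correlated with a binary sensitive one.
rng = np.random.default_rng(0)
n = 1000
S = rng.integers(0, 2, size=(n, 1)).astype(float)
X = rng.normal(size=(n, 1)) + 0.5 * S
y = (1.0 * X + 2.0 * S + rng.normal(size=(n, 1))).ravel()

_, beta_S_ols = ridge_on_sensitive(X, S, y, 0.0)   # unpenalized fit
lam, beta_X, beta_S = fit_fair_ridge(X, S, y, r=0.05)
print(beta_S_ols, lam, beta_S, sensitive_share(X, S, beta_X, beta_S))
```

Only the sensitive coefficient is penalized, so the non-sensitive coefficient is free to adjust as the penalty grows. The coefficients therefore trace an interpretable path as a function of the chosen fairness level.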
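Algorithms produce a growing share of decisions and recommendations in both policy and business. Such algorithmic decisions are natural experiments (conditionally quasi-randomly assigned instruments), because the algorithms base their decisions only on observable input variables. We use this observation to develop a treatment-effect estimator for a class of stochastic and deterministic decision-making algorithms. We show that our estimator is consistent and asymptotically normal for well-defined causal effects. A key special case of our estimator is a multidimensional regression discontinuity design. We apply the estimator to evaluate the effect of the Coronavirus Aid, Relief, and Economic Security (CARES) Act, under which billions of dollars in relief funding were allocated to hospitals via an algorithmic rule. Our estimates suggest that the relief funding had little effect on COVID-19-related hospital activity levels. Naive OLS and IV estimates exhibit substantial selection bias.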
Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices. In this paper, we develop a framework for modeling fairness using tools from causal inference. Our definition of counterfactual fairness captures the intuition that a decision is fair towards an individual if it is the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group. We demonstrate our framework on a real-world problem of fair prediction of success in law school. (* Equal contribution. This work was done while JL was a Research Fellow at the Alan Turing Institute. 31st Conference on Neural Information Processing Systems (NIPS 2017).)
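The counterfactual-fairness check can be sketched in a toy linear structural model. The sketch assumes an additive-noise equation X = c0 + c1*A + U, in which the latent term U is not caused by the protected attribute A. The procedure has three steps:

1. The residual of X on A estimates U (abduction).
2. Flipping A while holding that residual fixed yields the counterfactual feature.
3. Predictions are compared across the two worlds.

A predictor built only on the estimated U gives identical predictions in both worlds; a predictor built on X does not. Variable names, coefficients, and data are illustrative assumptions, not the paper's law-school model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
A = rng.integers(0, 2, size=n).astype(float)  # protected attribute
U = rng.normal(size=n)                         # latent factor, not caused by A
X = 2.0 * A + U + 0.1 * rng.normal(size=n)     # observed feature, a descendant of A
Y = 1.5 * U + 0.1 * rng.normal(size=n)         # outcome


def fit_linear(features, target):
    """OLS with an intercept; returns [intercept, slope(s)]."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Abduction under the assumed model X = c0 + c1*A + U: the residual estimates U.
c = fit_linear(A, X)
U_hat = X - (c[0] + c[1] * A)

fair_coef = fit_linear(U_hat, Y)   # uses only the estimated latent factor
naive_coef = fit_linear(X, Y)      # uses X, which depends on A

# Action + prediction: flip A, keep the abducted latent term fixed.
A_cf = 1.0 - A
X_cf = X + c[1] * (A_cf - A)
U_hat_cf = X_cf - (c[0] + c[1] * A_cf)

fair_pred = fair_coef[0] + fair_coef[1] * U_hat
fair_pred_cf = fair_coef[0] + fair_coef[1] * U_hat_cf
naive_pred = naive_coef[0] + naive_coef[1] * X
naive_pred_cf = naive_coef[0] + naive_coef[1] * X_cf

print(np.max(np.abs(fair_pred - fair_pred_cf)))     # essentially 0
print(np.mean(np.abs(naive_pred - naive_pred_cf)))  # clearly non-zero (about 1.5 here)
```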
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
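The two-stage sample-splitting idea can be sketched as below. This is a simplified illustration, not the authors' exact procedure. Polynomial least squares stands in for the flexible learners (additive models, random forests). The sketch learns a projection f(X, Z) on one half of the data. On the other half, it regresses both Y and f on Z and forms a normalized mean of the product of the two residuals. Large values of the statistic are evidence against conditional mean independence. The basis functions and simulated data are assumptions.

```python
import numpy as np
from math import erf, sqrt

def ols_fit_predict(train_design, train_y, test_design):
    coef, *_ = np.linalg.lstsq(train_design, train_y, rcond=None)
    return test_design @ coef


def basis_xz(x, z):
    """Hypothetical flexible basis in (x, z): polynomial and interaction terms."""
    return np.column_stack([np.ones_like(x), x, z, x * z, x ** 2, z ** 2])


def basis_z(z):
    """Hypothetical basis for regressions on z alone."""
    return np.column_stack([np.ones_like(z), z, z ** 2])


def split_covariance_test(x, y, z, seed=0):
    """Simplified sample-splitting test of H0: E[Y | X, Z] does not depend on X."""
    idx = np.random.default_rng(seed).permutation(len(y))
    a, b = idx[: len(y) // 2], idx[len(y) // 2:]
    # Half A: learn a projection f(x, z) of y on (x, z).
    f_b = ols_fit_predict(basis_xz(x[a], z[a]), y[a], basis_xz(x[b], z[b]))
    # Half B: residualize y and f on z, then average the product of residuals.
    Bz = basis_z(z[b])
    y_res = y[b] - ols_fit_predict(Bz, y[b], Bz)
    f_res = f_b - ols_fit_predict(Bz, f_b, Bz)
    prod = y_res * f_res
    t_stat = sqrt(len(b)) * prod.mean() / prod.std(ddof=1)
    p_value = 0.5 * (1.0 - erf(t_stat / sqrt(2.0)))  # one-sided normal p-value
    return t_stat, p_value


rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
y_null = z + rng.normal(size=n)            # E[Y | X, Z] depends on Z only
y_alt = z + 2.0 * x + rng.normal(size=n)   # E[Y | X, Z] depends on X
print(split_covariance_test(x, y_null, z))  # typically a moderate p-value
print(split_covariance_test(x, y_alt, z))   # very large statistic, p-value near 0
```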
Virtually all machine learning tasks are characterized using some form of loss function, and "good performance" is typically stated in terms of a sufficiently small average loss, taken over the random draw of test data. While optimizing for performance on average is intuitive, convenient to analyze in theory, and easy to implement in practice, such a choice brings about trade-offs. In this work, we survey and introduce a wide variety of non-traditional criteria used to design and evaluate machine learning algorithms, place the classical paradigm within the proper historical context, and propose a view of learning problems which emphasizes the question of "what makes for a desirable loss distribution?" in place of tacit use of the expected loss.
Advocates of algorithmic techniques like data mining argue that these techniques eliminate human biases from the decision-making process. But an algorithm is only as good as the data it works with. Data is frequently imperfect in ways that allow these algorithms to inherit the prejudices of prior decision makers. In other cases, data may simply reflect the widespread biases that persist in society at large. In still others, data mining can discover surprisingly useful regularities that are really just preexisting patterns of exclusion and inequality. Unthinking reliance on data mining can deny historically disadvantaged and vulnerable groups full participation in society. Worse still, because the resulting discrimination is almost always an unintentional emergent property of the algorithm's use rather than a conscious choice by its programmers, it can be unusually hard to identify the source of the problem or to explain it to a court. This Essay examines these concerns through the lens of American antidiscrimination law, more particularly through Title
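We explore a new model of bandit experiments in which a potentially nonstationary sequence of contexts influences the arms' performance. Context-unaware algorithms risk confounding, while those that perform correct inference face information delays. Our main insight is that an algorithm we call deconfounded Thompson sampling strikes a delicate balance between adaptivity and robustness. Its adaptivity leads to optimal efficiency in easy stationary instances, yet it shows surprising resilience in hard nonstationary instances that cause other adaptive algorithms to fail.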
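Algorithmic fairness has attracted increasing attention in the machine learning community. Various definitions have been proposed in the literature, but the differences and connections among them have not been clearly addressed. In this paper, we review and reflect on the fairness notions previously proposed in the machine learning literature, and we attempt to draw connections to arguments in moral and political philosophy, especially theories of justice. We also consider fairness inquiries from a dynamic perspective, taking into account the long-term impact induced by current predictions and decisions. In light of the differences among these characterizations of fairness, we present a flowchart that lays out the implicit assumptions and expected outcomes of different types of fairness inquiries on the data-generating process, on the predicted outcome, and on the induced impact. This paper shows the importance of matching the mission (which kind of fairness one would like to enforce) with the means (which spectrum of fairness analysis is of interest and what the appropriate analysis scheme is) to fulfill the intended purpose.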
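We use deep partial least squares (DPLS) to estimate an asset pricing model for individual stock returns. The model exploits conditioning information in a flexible and dynamic way while attributing excess returns to a small set of statistical risk factors. The novel contribution is to resolve the nonlinear factor structure. This advances the current paradigm of deep learning in empirical asset pricing, which uses linear stochastic discount factors under an assumption of Gaussian asset returns and factors. We achieve this in two steps. First, projected least squares jointly projects firm characteristics and asset returns onto a subspace of latent factors. Second, deep learning learns the nonlinear map from the factor loadings to the asset returns. Capturing this nonlinear risk factor structure characterizes anomalies in asset returns through both linear risk factor exposure and interaction effects. Thus the well-known ability of deep learning to capture outliers sheds light on the role of higher-order terms in the latent factor structure on the factor risk premia. On the empirical side, we implement our DPLS factor models and show that they outperform LASSO and plain vanilla deep learning models. Furthermore, our network training times are significantly reduced due to the more parsimonious architecture of DPLS. Specifically, we assess our DPLS factor model on 3290 assets in the Russell 1000 index over the period from December 1989 to January 2018. It generates information ratios approximately 1.2 times greater than those of deep learning. DPLS explains variation and pricing errors and identifies the most prominent latent factors and firm characteristics.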
Based on administrative data of unemployed in Belgium, we estimate the labour market effects of three training programmes at various aggregation levels using Modified Causal Forests, a causal machine learning estimator. While all programmes have positive effects after the lock-in period, we find substantial heterogeneity across programmes and unemployed. Simulations show that 'black-box' rules that reassign unemployed to programmes that maximise estimated individual gains can considerably improve effectiveness: up to 20 percent more (less) time spent in (un)employment within a 30 months window. A shallow policy tree delivers a simple rule that realizes about 70 percent of this gain.
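A shallow (depth-one) policy tree of the kind mentioned above can be sketched as an exhaustive search over a single split. The sketch assumes a matrix G of estimated per-unit gains for each programme. In practice these gains might come from a causal forest; here they are simulated. Each leaf is assigned the programme with the largest total estimated gain in that leaf. The function names and data are hypothetical, and real implementations typically search deeper trees and use doubly robust scores.

```python
import numpy as np

def best_action(G_leaf):
    """Programme with the largest total estimated gain within one leaf."""
    totals = G_leaf.sum(axis=0)
    k = int(np.argmax(totals))
    return k, totals[k]


def depth_one_policy_tree(X, G):
    """Exhaustive search over a single split (feature j, threshold t).

    X: (n, p) covariates; G: (n, K) estimated gain of each programme per unit.
    Returns (total value, j, t, programme for x_j <= t, programme for x_j > t).
    """
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # skip the maximum so both leaves are non-empty
            left = X[:, j] <= t
            k_left, v_left = best_action(G[left])
            k_right, v_right = best_action(G[~left])
            value = v_left + v_right
            if best is None or value > best[0]:
                best = (value, j, float(t), k_left, k_right)
    return best


# Hypothetical gains: programme 0 is a zero baseline, programme 1 helps when
# x0 > 0.5, and programme 2 helps when x0 <= 0.5.
rng = np.random.default_rng(2)
n = 400
X = rng.uniform(size=(n, 2))
G = np.zeros((n, 3))
G[:, 1] = np.where(X[:, 0] > 0.5, 1.0, -1.0) + 0.1 * rng.normal(size=n)
G[:, 2] = np.where(X[:, 0] <= 0.5, 1.0, -1.0) + 0.1 * rng.normal(size=n)

value, j, t, k_left, k_right = depth_one_policy_tree(X, G)
individual_value = G.max(axis=1).sum()  # per-unit "black-box" assignment
print(j, round(t, 2), k_left, k_right)  # split on feature 0 near 0.5; left -> 2, right -> 1
print(value / individual_value)         # share of the per-unit gain the simple rule keeps
```

The ratio in the last line is the toy analogue of asking how much of the black-box gain a simple, interpretable rule retains.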
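The widespread adoption of business analytics (BA) has brought financial gains and increased efficiency. However, these advances have also raised growing concerns about legal and ethical challenges when BA informs decisions with fairness implications. In response, the emerging study of algorithmic fairness examines algorithmic outputs that may lead to disparate outcomes or other forms of injustice for subgroups of the population, especially those who have been historically marginalized. Fairness is relevant on grounds of legal compliance, social responsibility, and utility. If it is not addressed adequately and systematically, unfair BA systems may cause societal harm and may also threaten an organization's own survival, competitiveness, and overall performance. This paper offers a forward-looking, BA-focused review of algorithmic fairness. We first review the state-of-the-art research on sources and measures of bias, as well as bias mitigation algorithms. We then provide a detailed discussion of the utility-fairness relationship, emphasizing that the frequently assumed trade-off between these two constructs is often mistaken or short-sighted. Finally, we chart a path forward by identifying opportunities for business scholars to address impactful open challenges that are key to the effective and responsible deployment of BA.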
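There are many available methods for selecting whom to prioritize for treatment, including methods based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules. RATE metrics are agnostic to how the prioritization rules were derived. They assess rules only by how well they identify the units that benefit most from treatment. We define a family of RATE estimators and prove a central limit theorem that enables asymptotically exact inference in a wide variety of randomized and observational study settings. We provide justification for the use of bootstrapped confidence intervals. We also give a framework for testing hypotheses about heterogeneity in treatment effectiveness that is correlated with the prioritization rule. Our definition of RATE nests several existing metrics, including the Qini coefficient, and our analysis directly yields inference methods for these metrics. We demonstrate our approach on examples from both personalized medicine and marketing. In the medical setting, we use data from the SPRINT and ACCORD-BP randomized controlled trials and find no significant evidence of heterogeneous treatment effects. In a large marketing trial, by contrast, we find robust evidence of heterogeneity in the treatment effects of some digital advertising campaigns. We also demonstrate how RATE can be used to compare targeting rules that prioritize estimated risk with those that prioritize estimated treatment benefit.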
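A minimal, discretized sketch of the RATE family is shown below. It assumes per-unit scores gamma whose conditional mean is the treatment effect. Here these are inverse-propensity scores from a simulated randomized trial with propensity 0.5. The sketch also ignores ties in the priority score. The targeting operator characteristic TOC(q) compares the average score among the top q fraction of units with the overall average. Weighting TOC(q) by 1 gives the AUTOC; weighting by q gives a Qini-type coefficient. No standard errors are computed here; inference would rely on the central limit theorem and bootstrap described in the abstract.

```python
import numpy as np

def toc_curve(priority, gamma):
    """TOC(j/n): mean score among the j highest-priority units minus the overall mean."""
    priority = np.asarray(priority, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    order = np.argsort(-priority, kind="stable")  # ties broken by input order (simplification)
    g = gamma[order]
    top_means = np.cumsum(g) / np.arange(1, len(g) + 1)
    return top_means - g.mean()


def rate(priority, gamma, weight="autoc"):
    """Discretized RATE: average of alpha(q) * TOC(q) over q = 1/n, ..., 1."""
    toc = toc_curve(priority, gamma)
    q = np.arange(1, len(toc) + 1) / len(toc)
    alpha = np.ones_like(q) if weight == "autoc" else q  # "qini" uses alpha(q) = q
    return float(np.mean(alpha * toc))


print(rate([3, 2, 1], [3, 0, 0]))          # AUTOC = (2 + 0.5 + 0) / 3
print(rate([3, 2, 1], [3, 0, 0], "qini"))  # (2/3 + 1/3 + 0) / 3

# Hypothetical randomized trial with propensity 0.5 and effect tau(x) = x.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1, 1, size=n)
w = rng.integers(0, 2, size=n)
y = w * x + 0.5 * rng.normal(size=n)
gamma = w * y / 0.5 - (1 - w) * y / 0.5   # IPW scores: E[gamma | x] = tau(x)
print(rate(x, gamma))                     # informative rule, AUTOC near 0.5
print(rate(rng.permutation(n), gamma))    # uninformative rule, AUTOC near 0
```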
Uncertainty is prevalent in engineering design, statistical learning, and decision making broadly. Due to inherent risk-averseness and ambiguity about assumptions, it is common to address uncertainty by formulating and solving conservative optimization models expressed using measures of risk and related concepts. We survey the rapid development of risk measures over the last quarter century. From its beginning in financial engineering, we recount their spread to nearly all areas of engineering and applied mathematics. Solidly rooted in convex analysis, risk measures furnish a general framework for handling uncertainty with significant computational and theoretical advantages. We describe the key facts, list several concrete algorithms, and provide an extensive list of references for further reading. The survey recalls connections with utility theory and distributionally robust optimization, points to emerging application areas such as fair machine learning, and defines measures of reliability.
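Conditional value-at-risk (CVaR, also called the superquantile) is one of the most widely used risk measures of this kind. The sketch below computes it in two ways for an empirical loss distribution, assuming that (1 - alpha) * n is an integer. The first is the average of the worst (1 - alpha) share of losses. The second is the Rockafellar-Uryasev minimization formula, which is what makes CVaR convenient inside convex optimization models. The function names are illustrative.

```python
import numpy as np

def cvar_tail_average(losses, alpha):
    """Average of the worst (1 - alpha) share of losses; assumes (1 - alpha) * n is an integer."""
    losses = np.sort(np.asarray(losses, dtype=float))
    k = max(int(round((1 - alpha) * len(losses))), 1)
    return losses[-k:].mean()


def cvar_rockafellar_uryasev(losses, alpha):
    """min over c of c + E[(L - c)_+] / (1 - alpha).

    The objective is convex and piecewise linear in c with kinks at the sample
    points, so for an empirical distribution it suffices to search over them.
    """
    losses = np.asarray(losses, dtype=float)
    objective = [c + np.mean(np.maximum(losses - c, 0.0)) / (1 - alpha) for c in losses]
    return min(objective)


losses = np.arange(1.0, 11.0)                  # 1, 2, ..., 10
print(losses.mean())                           # 5.5
print(cvar_tail_average(losses, 0.8))          # 9.5
print(cvar_rockafellar_uryasev(losses, 0.8))   # 9.5 up to floating-point rounding
```

The gap between the mean (5.5) and CVaR (9.5) is exactly the tail information that an expected-loss criterion ignores.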
This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions (also called "causal effects" or "policy evaluation"); (2) queries about probabilities of counterfactuals (including assessment of "regret," "attribution," or "causes of effects"); and (3) queries about direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both.