Machine learning classifiers are typically trained to minimize the average error across a dataset. Unfortunately, in practice this procedure often exploits spurious correlations caused by subgroup imbalance within the training data, resulting in high average performance but highly variable performance across subgroups. Recent work addressing this problem proposes model patching with CAMEL. This prior approach uses generative adversarial networks to perform intra-class, inter-group data augmentation, and requires (a) training a number of computationally expensive models and (b) sufficient quality of the model's synthetic outputs for the given domain. In this work, we propose RealPatch, a framework for simpler, faster, and more data-efficient data augmentation based on statistical matching. Our framework performs model patching by augmenting a dataset with real samples, mitigating the need to train generative models for the target task. We demonstrate the effectiveness of RealPatch on three benchmark datasets, CelebA, Waterbirds, and a subset of iWildCam, showing improvements in worst-case subgroup performance and in the subgroup performance gap in binary classification. Furthermore, we conduct experiments on the imSitu dataset with 211 classes, a setting where generative-model-based patching such as CAMEL is impractical. We show that RealPatch can successfully eliminate dataset leakage while reducing model leakage and maintaining high utility. The code for RealPatch can be found at https://github.com/wearepal/realpatch.
Standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on certain groups, especially in the presence of spurious correlations between the input and label. Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO), require expensive group annotations for each training point, whereas approaches that do not use such group annotations typically achieve unsatisfactory worst-group accuracy. In this paper, we propose a simple two-stage approach, JTT, that first trains a standard ERM model for several epochs, and then trains a second model that upweights the training examples that the first model misclassified. Intuitively, this upweights examples from groups on which standard ERM models perform poorly, leading to improved worst-group performance. Averaged over four image classification and natural language processing tasks with spurious correlations, JTT closes 75% of the gap in worst-group accuracy between standard ERM and group DRO, while only requiring group annotations on a small validation set in order to tune hyperparameters.
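To make the two-stage procedure concrete, here is a minimal PyTorch sketch of the identify-then-upweight idea. All names are hypothetical (`model_fn` builds a fresh classifier, `train_set` yields (x, y) pairs), and epoch counts, learning rate, and the upweighting factor `lambda_up` are placeholders; this illustrates the idea rather than reproducing the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, WeightedRandomSampler

def jtt_train(model_fn, train_set, id_epochs=1, final_epochs=10,
              lambda_up=20.0, lr=1e-3, device="cpu"):
    loss_fn = nn.CrossEntropyLoss()

    # Stage 1: a standard ERM "identification" model, trained briefly.
    id_model = model_fn().to(device)
    opt = torch.optim.SGD(id_model.parameters(), lr=lr)
    for _ in range(id_epochs):
        for x, y in DataLoader(train_set, batch_size=64, shuffle=True):
            opt.zero_grad()
            loss_fn(id_model(x.to(device)), y.to(device)).backward()
            opt.step()

    # Collect the error set: points the identification model misclassifies.
    id_model.eval()
    weights = []
    with torch.no_grad():
        for x, y in DataLoader(train_set, batch_size=256):
            pred = id_model(x.to(device)).argmax(dim=1).cpu()
            weights += [lambda_up if p != t else 1.0 for p, t in zip(pred, y)]

    # Stage 2: retrain from scratch, oversampling the error set.
    sampler = WeightedRandomSampler(weights, num_samples=len(weights))
    final_model = model_fn().to(device)
    opt = torch.optim.SGD(final_model.parameters(), lr=lr)
    for _ in range(final_epochs):
        for x, y in DataLoader(train_set, batch_size=64, sampler=sampler):
            opt.zero_grad()
            loss_fn(final_model(x.to(device)), y.to(device)).backward()
            opt.step()
    return final_model
```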
While neural networks have shown remarkable success on classification tasks in terms of average-case performance, they often fail to perform well on certain groups of the data. Such group information can be expensive to obtain; thus, recent works in robustness and fairness have proposed ways to improve worst-group performance even when group labels are unavailable for the training data. However, these methods generally underperform methods that use group information at training time. In this work, we assume access to a small number of group labels alongside a larger dataset without group labels. We propose a simple two-step framework that leverages this partial group information to improve worst-group performance: train a model to predict the missing group labels for the training data, and then use these predicted group labels in a robust optimization objective. Theoretically, we provide generalization bounds for our approach in terms of worst-group performance, showing how the generalization error scales with both the total number of training points and the number of training points with group labels. Empirically, our method outperforms baselines that do not use group information, even when only 1-33% of points have group labels. We provide ablation studies to support the robustness and extensibility of our framework.
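A minimal sketch of the first step (filling in the missing group labels), assuming precomputed features and using a scikit-learn classifier as a stand-in for the group predictor; the filled-in groups would then feed a robust objective such as the group DRO loss sketched further below. Function and variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def predict_missing_groups(feats, group_labels):
    """Step 1: fit a group classifier on the small labelled subset and fill in
    the missing group labels (coded as -1). `feats` are precomputed features;
    logistic regression is a stand-in for whatever group predictor is used."""
    feats = np.asarray(feats)
    group_labels = np.asarray(group_labels)
    labelled = group_labels != -1
    clf = LogisticRegression(max_iter=1000).fit(feats[labelled], group_labels[labelled])
    filled = group_labels.copy()
    filled[~labelled] = clf.predict(feats[~labelled])
    return filled
```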
Learning models that gracefully handle distribution shifts is central to research on domain generalization, robust optimization, and fairness. A promising formulation is domain-invariant learning, which identifies the key issue of learning which features are domain-specific versus domain-invariant. An important assumption in this area is that the training examples are partitioned into "domains" or "environments". Our focus is on the more common setting where such partitions are not provided. We propose EIIL, a general framework for domain-invariant learning that incorporates Environment Inference to directly infer partitions that are maximally informative for downstream Invariant Learning. We show that EIIL outperforms invariant learning methods on the CMNIST benchmark without using environment labels, and significantly outperforms ERM on worst-group performance in the Waterbirds and CivilComments datasets. Finally, we establish connections between EIIL and algorithmic fairness, which enables EIIL to improve accuracy and calibration in a fair prediction problem.
The predictive performance of machine learning models trained with empirical risk minimization (ERM) can degrade considerably under distribution shifts. The presence of spurious correlations in the training dataset leads ERM-trained models to exhibit high loss when evaluated on minority groups for which such correlations do not hold. Extensive attempts have been made to develop methods that improve worst-group robustness. However, they require group information for each training input, or at least a validation set with group labels to tune their hyperparameters, which may be costly to obtain or unknown a priori. In this paper, we address the challenge of improving group robustness without group annotations during either training or validation. To this end, we propose to partition the training dataset into groups based on Gram matrices of features extracted by an "identification" model, and to apply robust optimization based on these pseudo-groups. In the realistic setting where group labels are unavailable, our experiments show that our approach not only improves group robustness over ERM but also outperforms all recent baselines.
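A sketch of the grouping step, under the assumption that the "identification" model exposes per-sample convolutional feature maps: compute a per-sample Gram matrix and cluster. The names and the choice of k-means are illustrative, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_groups_from_gram(feature_maps, n_groups=4):
    """Cluster samples by the Gram matrix of their feature maps.
    `feature_maps` is assumed to have shape (N, C, H, W); this sketches the
    clustering step only, not the full robust-optimization pipeline."""
    n, c, h, w = feature_maps.shape
    f = feature_maps.reshape(n, c, h * w)
    # Per-sample Gram matrix G = F F^T captures feature co-activation "style".
    grams = np.einsum("ncd,ned->nce", f, f) / (h * w)
    flat = grams.reshape(n, c * c)
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(flat)
```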
While unbiased machine learning models are essential for many applications, bias is a human-defined concept that can vary across tasks. Given only input-label pairs, an algorithm may lack sufficient information to distinguish stable (causal) features from unstable (spurious) features. However, related tasks often share similar biases -- an observation we can leverage to develop stable classifiers in the transfer setting. In this work, we explicitly inform the target classifier about unstable features in the source tasks. Specifically, we derive a representation that encodes the unstable features by contrasting different data environments in the source task. We achieve robustness by clustering the target task's data according to this representation and minimizing the worst-case risk across these clusters. We evaluate on both text and image classification. Empirical results demonstrate that our algorithm is able to maintain robustness on the target task for both synthetically generated and real-world environments. Our code is available at https://github.com/yujiabao/tofu.
Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, the poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization -- a stronger-than-typical $\ell_2$ penalty or early stopping -- we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
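A minimal sketch of the worst-case group objective as a hard max over per-group average losses. The paper's actual algorithm maintains soft group weights (an exponentiated-gradient update) rather than a hard max, and the strong regularization would be applied separately, e.g. via a large `weight_decay` on the optimizer; this is an illustration of the objective only.

```python
import torch

def group_dro_loss(per_sample_loss, group_ids, n_groups):
    """Worst-case group loss: average the per-sample losses within each group
    present in the batch and return the maximum."""
    group_losses = []
    for g in range(n_groups):
        mask = group_ids == g
        if mask.any():
            group_losses.append(per_sample_loss[mask].mean())
    return torch.stack(group_losses).max()
```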
Although machine learning models have rapidly advanced the state-of-the-art on various real-world tasks, out-of-distribution (OOD) generalization remains a challenging problem given these models' vulnerability to spurious correlations. While current domain generalization methods usually focus on enforcing certain invariance properties across different domains through new loss function designs, we propose a balanced mini-batch sampling strategy to reduce the domain-specific spurious correlations in the observed training distribution. More specifically, we propose a two-phased method that 1) identifies the source of spurious correlations, and 2) builds balanced mini-batches free of spurious correlations by matching on the identified source. We provide an identifiability guarantee for the source of spuriousness and show that our proposed method samples from a balanced, spurious-free distribution over all training environments. Experiments on three computer vision datasets with spurious correlations demonstrate empirically that our balanced mini-batch sampling strategy improves the performance of four established domain generalization baselines compared to a random mini-batch sampling strategy.
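A small illustrative sampler, assuming the spurious source has already been identified and is available as a per-sample attribute; it simply equalizes the (class, spurious-value) cells within each mini-batch. Names and the with-replacement choice are assumptions, not the paper's exact construction.

```python
import numpy as np

def balanced_minibatch(labels, spurious, batch_size, rng=np.random):
    """Draw a mini-batch in which every (class, spurious-value) cell
    contributes (roughly) equally, breaking the class-spurious correlation."""
    idx = []
    cells = [(y, s) for y in np.unique(labels) for s in np.unique(spurious)]
    per_cell = max(1, batch_size // len(cells))
    for y, s in cells:
        pool = np.where((labels == y) & (spurious == s))[0]
        if len(pool):
            idx.extend(rng.choice(pool, size=per_cell, replace=True))
    return np.array(idx)
```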
As the social impact of visual recognition has come under scrutiny, several protected-attribute-balanced datasets have emerged to address dataset bias in imbalanced datasets. However, in facial attribute classification, dataset bias stems from both the protected attribute level and the facial attribute level, which makes it challenging to construct a real dataset that is balanced at multiple attribute levels. To bridge this gap, we propose an effective pipeline that generates high-quality and sufficient facial images with the desired facial attributes and supplements the original dataset into a dataset balanced at both levels, which theoretically satisfies several fairness criteria. The effectiveness of our method is verified on gender classification and facial attribute classification: it yields task performance comparable to the original dataset while further improving fairness under a comprehensive fairness evaluation with a wide range of metrics. Furthermore, our method outperforms both resampling and balanced dataset construction for addressing dataset bias, as well as model debiasing approaches for addressing task bias.
Neural image classifiers are known to undergo severe performance degradation when exposed to input that exhibits covariate shift with respect to the training distribution. Successful hand-crafted augmentation pipelines aim either at approximating the expected test domain conditions or at perturbing the features that are specific to the training environment. Developing effective pipelines is typically cumbersome and produces transformations whose impact on classifier performance is hard to understand and control. In this paper, we show that recent Text-to-Image (T2I) generators' ability to simulate image interventions via natural-language prompts can be leveraged to train more robust models, offering a more interpretable and controllable alternative to traditional augmentation methods. We find that a variety of prompting mechanisms are effective for producing synthetic training data sufficient to achieve state-of-the-art performance in widely-adopted domain-generalization benchmarks and reduce classifiers' dependency on spurious features. Our work suggests that further progress in T2I generation and a tighter integration with other research fields may represent a significant step towards the development of more robust machine learning systems.
Models trained via empirical risk minimization (ERM) are known to rely on spurious correlations between labels and task-independent input features, resulting in poor generalization to distributional shifts. Group distributionally robust optimization (G-DRO) can alleviate this problem by minimizing the worst-case loss over a set of pre-defined groups over training data. G-DRO successfully improves performance of the worst-group, where the correlation does not hold. However, G-DRO assumes that the spurious correlations and associated worst groups are known in advance, making it challenging to apply to new tasks with potentially multiple unknown spurious correlations. We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization -- an end-to-end approach that jointly identifies error-prone groups and improves accuracy on them. AGRO equips G-DRO with an adversarial slicing model to find a group assignment for training examples which maximizes worst-case loss over the discovered groups. On the WILDS benchmark, AGRO results in 8% higher model performance on average on known worst-groups, compared to prior group discovery approaches used with G-DRO. AGRO also improves out-of-distribution performance on SST2, QQP, and MS-COCO -- datasets where potential spurious correlations are as yet uncharacterized. Human evaluation of AGRO groups shows that they contain well-defined, yet previously unstudied spurious correlations that lead to model errors.
We describe CounterSynth, a conditional generative model of diffeomorphic deformations that induces label-driven, biologically plausible changes in volumetric brain images. The model is intended to synthesise counterfactual training data for downstream discriminative modelling tasks in which fidelity is limited by data imbalance, distributional instability, confounding, or other deficiencies, and performance is inequitable across distinct subpopulations. Focusing on demographic attributes, we evaluate the quality of the synthesised counterfactuals with voxel-based morphometry, classification and regression of the conditioning attributes, and the Fréchet inception distance. Examining downstream discriminative performance under engineered demographic imbalance and confounding, we use UK Biobank magnetic resonance imaging data to benchmark CounterSynth augmentation against current solutions to these problems. We achieve state-of-the-art improvements in both overall fidelity and equity. The source code for CounterSynth is available online.
Machine learning algorithms typically assume that training and test examples are drawn from the same distribution. However, distribution shift is a common problem in real-world applications and can cause models to perform dramatically worse at test time. In this paper, we specifically consider the problems of domain shifts and subpopulation shifts (e.g., imbalanced data). While prior works often seek to explicitly regularize the model's internal representations and predictors to be domain invariant, we instead aim to regularize the whole function without restricting the model's internal representations. This leads to a simple mixup-based technique that learns invariant functions via selective augmentation, named LISA. LISA selectively interpolates samples either with the same label but different domains, or with the same domain but different labels. We analyze a linear setting and theoretically show how LISA leads to a smaller worst-group error. Empirically, we study the effectiveness of LISA on nine benchmarks ranging from subpopulation shifts to domain shifts, and we find that LISA consistently outperforms other state-of-the-art methods.
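A sketch of the selective pairing behind this kind of augmentation: each example is paired with a partner that shares its label but not its domain, or its domain but not its label, and the pair is then mixed with standard mixup. Function names and the Beta parameter are illustrative assumptions.

```python
import numpy as np

def selective_pair_indices(labels, domains, strategy="same_label", rng=np.random):
    """Pick a mixing partner for each example: same label / different domain
    ("same_label") or same domain / different label ("same_domain")."""
    n = len(labels)
    partners = np.empty(n, dtype=int)
    for i in range(n):
        if strategy == "same_label":
            cand = np.where((labels == labels[i]) & (domains != domains[i]))[0]
        else:
            cand = np.where((domains == domains[i]) & (labels != labels[i]))[0]
        partners[i] = rng.choice(cand) if len(cand) else i  # fall back to self
    return partners

def mixup(x, y_onehot, partners, alpha=2.0, rng=np.random):
    """Standard mixup of each example with its selected partner."""
    lam = rng.beta(alpha, alpha)
    return lam * x + (1 - lam) * x[partners], lam * y_onehot + (1 - lam) * y_onehot[partners]
```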
Trying to capture the sample-label relationship, conditional generative models often end up inheriting the spurious correlation in the training dataset, giving label-conditional distributions that are severely imbalanced in another latent attribute. To mitigate such undesirable correlations engraved into generative models, which we call spurious causality, we propose a general two-step strategy. (a) Fairness Intervention (FI): Emphasize the minority samples that are hard to generate due to the spurious correlation in the training dataset. (b) Corrective Sampling (CS): Filter the generated samples explicitly to follow the desired label-conditional latent attribute distribution. We design the fairness intervention for various degrees of supervision on the spurious attribute, including unsupervised, weakly-supervised, and semi-supervised scenarios. Our experimental results show that the proposed FICS can successfully resolve the spurious correlation in generated samples on various datasets.
Subpopulation shift exists widely in many real-world machine learning applications, referring to the case in which the training and test distributions contain the same subpopulation groups but differ in subpopulation frequencies. Importance reweighting is a common way to handle subpopulation shift by imposing constant or adaptive sampling weights on each sample in the training dataset. However, recent studies have recognized that most of these approaches fail to improve over empirical risk minimization, especially when applied to over-parameterized neural networks. In this work, we propose a simple yet practical framework, called uncertainty-aware mixup (UMIX), to mitigate the overfitting issue in over-parameterized models by reweighting the "mixed" samples according to the sample uncertainty. A training-trajectory-based uncertainty estimate is attached to each sample in UMIX to flexibly characterize the subpopulation distribution. We also provide theoretical analysis to verify that UMIX achieves better generalization bounds than prior works. Further, we conduct extensive empirical studies across a wide range of tasks to validate the effectiveness of our method, both qualitatively and quantitatively.
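For intuition, here is one plausible way to realize the two ingredients described above: a trajectory-based uncertainty estimate and a per-mixed-sample weight. Both the estimator and the weighting rule are assumptions for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

def trajectory_uncertainty(correct_history):
    """Per-sample uncertainty as the fraction of past epochs in which the
    sample was misclassified. `correct_history` has shape
    (n_epochs, n_samples) with 0/1 entries."""
    return 1.0 - correct_history.mean(axis=0)

def umix_weights(uncertainty, partners, lam):
    """Weight for a mixed sample: combine the two constituents' uncertainties
    with the same mixing coefficient used for the inputs, upweighting mixes
    that contain uncertain (likely minority) samples."""
    u = lam * uncertainty + (1 - lam) * uncertainty[partners]
    return 1.0 + u
```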
Machine learning models rely on various assumptions to attain high accuracy. One of the preliminary assumptions of these models is the independent and identical distribution, which suggests that the train and test data are sampled from the same distribution. However, this assumption seldom holds in the real world due to distribution shifts. As a result, models that rely on this assumption exhibit poor generalization capabilities. Over recent years, dedicated efforts have been made to improve the generalization capabilities of these models, collectively known as \textit{domain generalization methods}. The primary idea behind these methods is to identify stable features or mechanisms that remain invariant across the different distributions. Many generalization approaches employ causal theories to describe invariance since causality and invariance are inextricably intertwined. However, current surveys deal with causality-aware domain generalization methods at a very high level. Furthermore, we argue that it is possible to categorize the methods based on how causality is leveraged in each method and in which part of the model pipeline it is used. To this end, we categorize the causal domain generalization methods into three categories, namely, (i) Invariance via Causal Data Augmentation methods which are applied during the data pre-processing stage, (ii) Invariance via Causal representation learning methods that are utilized during the representation learning stage, and (iii) Invariance via Transferring Causal mechanisms methods that are applied during the classification stage of the pipeline. Furthermore, this survey includes in-depth insights into benchmark datasets and code repositories for domain generalization methods. We conclude the survey with insights and discussions on future directions.
We propose an information-theoretic bias measurement technique through a causal interpretation of spurious correlation, which effectively identifies feature-level algorithmic bias by taking advantage of conditional mutual information. Although several bias measurement methods have been proposed and widely investigated to achieve algorithmic fairness in various tasks such as face recognition, their accuracy- or logit-based metrics are susceptible to leading to trivial prediction score adjustment rather than fundamental bias reduction. Hence, we design a novel debiasing framework against algorithmic bias, which incorporates a bias regularization loss derived from the proposed information-theoretic bias measurement approach. In addition, we present a simple yet effective unsupervised debiasing technique based on stochastic label noise, which does not require explicit supervision of bias information. The proposed bias measurement and debiasing approaches are validated in diverse realistic scenarios through extensive experiments on multiple standard benchmarks.
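For intuition, a plug-in estimate of the conditional mutual information I(A; Yhat | Y) between a protected attribute and the prediction given the true label, the kind of feature-level dependence the abstract measures; this simple discrete estimator is illustrative and is not the paper's estimator.

```python
import numpy as np

def conditional_mutual_information(a, yhat, y):
    """Plug-in estimate of I(A; Yhat | Y) from discrete samples:
    sum over (y, a, yhat) of p(a, yhat, y) * log[p(a, yhat, y) p(y) / (p(a, y) p(yhat, y))]."""
    a, yhat, y = map(np.asarray, (a, yhat, y))
    cmi = 0.0
    for yv in np.unique(y):
        m = y == yv
        p_y = m.mean()
        for av in np.unique(a[m]):
            for pv in np.unique(yhat[m]):
                p_joint = np.mean(m & (a == av) & (yhat == pv))
                p_ay = np.mean(m & (a == av))
                p_py = np.mean(m & (yhat == pv))
                if p_joint > 0:
                    cmi += p_joint * np.log(p_joint * p_y / (p_ay * p_py))
    return cmi
```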
A machine learning model, under the influence of observed or unobserved confounders in the training data, can learn spurious correlations and fail to generalize when deployed. For image classifiers, augmenting a training dataset using counterfactual examples has been empirically shown to break spurious correlations. However, the counterfactual generation task itself becomes more difficult as the level of confounding increases. Existing methods for counterfactual generation under confounding consider a fixed set of interventions (e.g., texture, rotation) and are not flexible enough to capture diverse data-generating processes. Given a causal generative process, we formally characterize the adverse effects of confounding on any downstream tasks and show that the correlation between generative factors (attributes) can be used to quantitatively measure confounding between generative factors. To minimize such correlation, we propose a counterfactual generation method that learns to modify the value of any attribute in an image and generate new images given a set of observed attributes, even when the dataset is highly confounded. These counterfactual images are then used to regularize the downstream classifier such that the learned representations are the same across various generative factors conditioned on the class label. Our method is computationally efficient, simple to implement, and works well for any number of generative factors and confounding variables. Our experimental results on both synthetic (MNIST variants) and real-world (CelebA) datasets show the usefulness of our approach.
While large pretrained foundation models (FMs) have shown remarkable zero-shot classification robustness to dataset-level distribution shifts, their robustness to subpopulation or group shifts is relatively underexplored. We study this problem and find that FMs such as CLIP may not be robust to various group shifts. Across 9 robustness benchmarks, zero-shot classification with their embeddings results in gaps of up to 80.7 percentage points (pp) between average and worst-group accuracy. Unfortunately, existing methods for improving robustness require retraining, which can be prohibitively expensive on large foundation models. We also find that efficient ways to improve model inference (e.g., via adapters, lightweight networks that take FM embeddings as inputs) do not consistently improve, and can sometimes hurt, group robustness compared to zero-shot classification (e.g., increasing the accuracy gap by 50.1 pp on CelebA). We therefore develop an adapter training strategy to effectively and efficiently improve FM group robustness. Our motivating observation is that while poor robustness results from groups within the same class being embedded far apart in the foundation model "embedding space", standard adapter training may not bring these points closer together. We thus propose contrastive adapting, which trains adapters with contrastive learning to bring sample embeddings close to both their ground-truth class embeddings and other sample embeddings in the same class. Across the 9 benchmarks, our approach consistently improves group robustness, raising worst-group accuracy by 8.5 to 56.0 pp. Our approach is also efficient, achieving this without any FM finetuning and with only a fixed set of frozen FM embeddings. On benchmarks such as Waterbirds and CelebA, this results in worst-group accuracy comparable to state-of-the-art methods that retrain entire models, while training only $\leq$ 1% of the model parameters.
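A minimal sketch of a contrastive adapter loss of the kind described above: adapted embeddings are pulled toward their ground-truth class embedding and toward same-class samples in the batch, and pushed away from the rest. The temperature, normalization, and batch-level formulation are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def contrastive_adapter_loss(adapted, labels, class_embeds, temp=0.1):
    """`adapted`: (B, d) adapter outputs on frozen FM embeddings;
    `class_embeds`: (n_classes, d) class (text) embeddings; `labels`: (B,) long."""
    adapted = F.normalize(adapted, dim=1)
    class_embeds = F.normalize(class_embeds, dim=1)

    # Pull each sample toward its ground-truth class embedding.
    logits_cls = adapted @ class_embeds.T / temp
    loss_cls = F.cross_entropy(logits_cls, labels)

    # Pull each sample toward other samples of the same class in the batch.
    sims = adapted @ adapted.T / temp
    eye = torch.eye(len(labels), dtype=torch.bool, device=adapted.device)
    sims = sims.masked_fill(eye, -1e9)  # exclude self-similarity
    same = (labels[:, None] == labels[None, :]).float().masked_fill(eye, 0.0)
    log_prob = sims - torch.logsumexp(sims, dim=1, keepdim=True)
    pos_count = same.sum(1).clamp(min=1)
    loss_samples = -(same * log_prob).sum(1) / pos_count

    return loss_cls + loss_samples.mean()
```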
Machine learning models often use spurious patterns such as "relying on the presence of a person to detect a tennis racket," which do not generalize. In this work, we present an end-to-end pipeline for identifying and mitigating spurious patterns for image classifiers. We first find patterns such as "the model's prediction for tennis racket changes 63% of the time if we hide the people." Then, if a pattern is spurious, we mitigate it via a novel form of data augmentation. We show that this approach identifies a diverse set of spurious patterns and mitigates them by producing a model that is both more accurate on a distribution where the spurious pattern is not helpful and more robust to distribution shift.
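As a toy illustration of the identification step, one can measure how often a prediction flips when the suspected spurious region is hidden; `masks` and the masking-by-zeroing choice are assumptions, and the paper's actual procedure may differ.

```python
import torch

def flip_rate(model, images, masks, target_class):
    """Fraction of images for which the prediction for `target_class` changes
    when the suspected spurious region (e.g., people) is blanked out.
    `masks` is assumed to be 1 on the region to hide, broadcastable to `images`."""
    model.eval()
    with torch.no_grad():
        before = model(images).argmax(1) == target_class
        after = model(images * (1 - masks)).argmax(1) == target_class
    return (before != after).float().mean().item()
```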