State-of-the-art pre-trained language models (PLMs) outperform other models on the majority of language processing tasks. However, PLMs have been found to degrade in performance under distribution shift, a phenomenon that occurs when the data at test time does not come from the same distribution as the source training set. Equally challenging is the task of obtaining labels in real time, due to issues like long labeling feedback loops. The lack of adequate methods addressing these challenges underscores the need for approaches that continuously adapt a PLM to a distinct distribution. Unsupervised domain adaptation adapts a source model to an unseen, unlabeled target domain. While techniques such as data augmentation can adapt models in several scenarios, they have only been sparsely studied for addressing the distribution shift problem. In this work, we present an approach (MEMO-CL) that improves the performance of PLMs at test time under distribution shift. Our approach takes advantage of recent unsupervised techniques in data augmentation and adaptation to minimize the entropy of the PLM's output distribution. MEMO-CL operates on a batch of augmented samples from a single observation in the test set. The technique is unsupervised, domain-agnostic, easy to implement, and requires no additional data. Our experiments show a 3% improvement over current test-time adaptation baselines.
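To make the mechanism concrete, here is a minimal PyTorch sketch of marginal-entropy minimization over augmented copies of a single test input, the core idea the abstract describes; the model, the `augment` function, and all hyperparameters are illustrative assumptions rather than MEMO-CL's exact recipe.

```python
import torch

def marginal_entropy_step(model, optimizer, x, augment, n_aug=32):
    """One adaptation step: minimize the entropy of the prediction
    averaged over augmented copies of a single test input `x` (C, H, W)."""
    batch = torch.stack([augment(x) for _ in range(n_aug)])   # (n_aug, C, H, W)
    probs = model(batch).softmax(dim=-1)
    marginal = probs.mean(dim=0)                              # average over views
    entropy = -(marginal * marginal.clamp_min(1e-12).log()).sum()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    with torch.no_grad():                                     # adapted prediction
        return model(x.unsqueeze(0)).argmax(dim=-1)
```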
Test-time adaptation is the problem of adapting a source pre-trained model using test inputs from a target domain without access to source domain data. Most of the existing approaches address the setting in which the target domain is stationary. Moreover, these approaches are prone to making erroneous predictions with unreliable uncertainty estimates when distribution shifts occur. Hence, test-time adaptation in the face of non-stationary target domain shift becomes a problem of significant interest. To address these issues, we propose a principled approach, PETAL (Probabilistic lifElong Test-time Adaptation with seLf-training prior), which looks into this problem from a probabilistic perspective using a partly data-dependent prior. A student-teacher framework, where the teacher model is an exponential moving average of the student model, naturally emerges from this probabilistic perspective. In addition, the knowledge from the posterior distribution obtained for the source task acts as a regularizer. To handle catastrophic forgetting in the long term, we also propose a data-driven model parameter resetting mechanism based on the Fisher information matrix (FIM). Moreover, improvements in experimental results suggest that FIM-based data-driven parameter restoration helps reduce error accumulation and maintain knowledge of recent domains by restoring only the irrelevant parameters. In terms of predictive error rate as well as uncertainty-based metrics such as Brier score and negative log-likelihood, our method achieves better results than the current state of the art for online lifelong test-time adaptation across various benchmarks, such as the CIFAR-10C, CIFAR-100C, ImageNetC, and ImageNet3DCC datasets.
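A hedged sketch of two of the mechanics the abstract names, assuming standard PyTorch modules: the EMA teacher update, plus a simplified stand-in for the FIM-based reset that restores source values of low-Fisher-information parameters. Names, signatures, and the selection rule are illustrative, not PETAL's exact procedure.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Teacher weights as an exponential moving average of the student."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

@torch.no_grad()
def fim_reset(model, source_state, fim, reset_fraction=0.01):
    """Data-driven resetting: restore the source values of the parameters
    with the lowest Fisher information (least relevant to recent domains).
    `fim` maps parameter names to elementwise Fisher estimates."""
    for name, p in model.named_parameters():
        f = fim[name]
        k = max(1, int(reset_fraction * f.numel()))
        idx = torch.topk(f.flatten(), k, largest=False).indices
        p.view(-1)[idx] = source_state[name].view(-1)[idx]
```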
We demonstrate that self-learning techniques like entropy minimization and pseudo-labeling are simple and effective at improving the performance of a deployed computer vision model under systematic domain shifts. We conduct a wide range of large-scale experiments and show consistent improvements irrespective of the model architecture, the pre-training technique, or the type of distribution shift. At the same time, self-learning is simple to use in practice because it does not require knowledge of or access to the original training data or scheme, is robust to hyperparameter choices, is straightforward to implement, and requires only a few adaptation epochs. This makes self-learning techniques highly attractive for any practitioner who applies machine learning algorithms in the real world. We present state-of-the-art adaptation results on CIFAR10-C (8.5% error), ImageNet-C (22.0% mCE), ImageNet-R (17.4% error) and ImageNet-A (14.8% error), theoretically study the dynamics of self-supervised adaptation methods, and propose a new classification dataset (ImageNet-D) which is challenging even with adaptation.
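As a concrete instance of the self-learning the abstract refers to, here is a minimal hard pseudo-labeling loop in PyTorch; the confidence threshold and training details are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def self_learning_epoch(model, optimizer, loader, conf_threshold=0.9):
    """One adaptation epoch of hard pseudo-labeling on unlabeled target data."""
    model.train()
    for x in loader:                      # unlabeled target batches
        logits = model(x)
        conf, pseudo = logits.softmax(dim=-1).max(dim=-1)
        mask = conf > conf_threshold      # keep only confident predictions
        if mask.any():
            loss = F.cross_entropy(logits[mask], pseudo[mask])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```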
Although action recognition systems can achieve top performance when evaluated on in-distribution test points, they are vulnerable to unanticipated distribution shifts in test data. However, test-time adaptation of video action recognition models against common distribution shifts has so far not been demonstrated. We propose to address this problem with an approach tailored to spatio-temporal models that is capable of adapting on a single video sample at a time. It consists of a feature distribution alignment technique that aligns online estimates of test set statistics towards the training statistics. We further enforce prediction consistency over temporally augmented views of the same test video sample. Evaluations on three benchmark action recognition datasets show that our proposed technique is architecture-agnostic and able to significantly boost performance on both the state-of-the-art convolutional architecture TANet and the Video Swin Transformer. Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches in both the evaluation of a single distribution shift and the challenging case of random distribution shifts. Code will be available at \url{https://github.com/wlin-at/ViTTA}.
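A minimal sketch of the feature-alignment idea, assuming per-layer training statistics (`train_mean`, `train_var`) were saved from the source model; the EMA bookkeeping and the L1 objective are illustrative, not ViTTA's exact formulation.

```python
import torch

def alignment_loss(feats, train_mean, train_var, state, momentum=0.1):
    """L1-align online (EMA) estimates of test feature statistics to the
    stored training statistics. `feats` are (N, D) spatio-temporal features
    of one test video; `state` carries the running estimates between steps."""
    mean = feats.mean(dim=0)
    var = feats.var(dim=0, unbiased=False)
    ema_mean = (1 - momentum) * state["mean"] + momentum * mean
    ema_var = (1 - momentum) * state["var"] + momentum * var
    state["mean"], state["var"] = ema_mean.detach(), ema_var.detach()
    return (ema_mean - train_mean).abs().mean() + (ema_var - train_var).abs().mean()
```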
Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot generalization on many downstream tasks when given properly designed text prompts. Rather than relying on hand-crafted prompts, recent works learn prompts from the training data of the downstream task. While effective, such training on domain-specific data reduces a model's ability to generalize to unseen new domains. In this work, we propose test-time prompt tuning (TPT), a method that learns adaptive prompts on the fly from a single test sample. For image classification, TPT optimizes the prompt by minimizing entropy with confidence selection, so that the model gives consistent predictions across different augmented views of each test sample. When evaluated on generalization to natural distribution shifts, TPT improves zero-shot top-1 accuracy by 3.6% on average, surpassing prior prompt tuning approaches that require additional task-specific training data. When evaluated on cross-dataset generalization to unseen categories, TPT performs on par with state-of-the-art methods that use additional training data. Project page: https://azshue.github.io/tpt.
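A sketch of the confidence-selected entropy objective described above; in TPT only the prompt parameters would receive these gradients, and the selection fraction here is an illustrative assumption.

```python
import torch

def confidence_selected_entropy(logits, top_fraction=0.1):
    """Keep only the most confident (lowest-entropy) augmented views of the
    test image, then minimize the entropy of their averaged prediction.
    `logits` has shape (n_views, n_classes)."""
    probs = logits.softmax(dim=-1)
    view_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    k = max(1, int(top_fraction * logits.size(0)))
    keep = view_entropy.topk(k, largest=False).indices      # confident views
    avg = probs[keep].mean(dim=0)
    return -(avg * avg.clamp_min(1e-12).log()).sum()        # marginal entropy
```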
Domain adaptation is crucial for adapting a learned model to new scenarios, such as domain shifts or changing data distributions. Current approaches usually require a large amount of labeled or unlabeled data from the shifted domain. This can be a hurdle in fields which require continuous dynamic adaptation or suffer from scarcity of data, e.g., autonomous driving in challenging weather conditions. To address the problem of continuous adaptation to distribution shifts, we propose Dynamic Unsupervised Adaptation (DUA). We modify the feature representations of the model by continuously adapting the statistics of the batch normalization layers. We show that strong performance gains can be achieved by accessing only a tiny fraction of unlabeled data from the shifted domain and adapting sequentially. With even less than 1% of unlabeled data from the target domain, DUA already achieves competitive results against strong baselines. In addition, the computational overhead is minimal compared to previous approaches. Our approach is simple yet effective and can be applied to any architecture that uses batch normalization as one of its components. We show the utility of DUA by evaluating it across a variety of domain adaptation datasets and tasks, including object recognition, digit recognition, and object detection.
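Because DUA only refreshes batch-normalization statistics, a faithful-in-spirit sketch needs no gradients at all; the fixed momentum below is a simplification of the paper's decaying-momentum schedule.

```python
import torch

@torch.no_grad()
def dua_step(model, x, momentum=0.1):
    """One adaptation step: refresh BatchNorm running statistics from a small
    unlabeled target batch `x`, touching no weights and using no gradients."""
    was_training = model.training
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.momentum = momentum   # weight given to the new batch statistics
            m.train()               # train mode -> running stats get updated
    model(x)                        # forward pass updates running_mean/var
    model.train(was_training)
```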
In this paper, we aim to adapt a pre-trained convolutional neural network to domain shifts at test time. We do so continually with the incoming stream of test batches, without labels. The existing literature mostly operates on artificial shifts obtained via adversarial perturbations of test images. Motivated by this, we evaluate the state of the art on two realistic and challenging sources of domain shift, namely contextual and semantic shifts. Contextual shifts correspond to the environment type; for example, a model pre-trained on indoor contexts has to adapt to outdoor contexts on CORe-50 [7]. Semantic shifts correspond to the capture type; for example, a model pre-trained on natural images has to adapt to cliparts, sketches, and paintings on DomainNet [10]. We include in our analysis recent techniques such as Prediction-Time Batch Normalization (BN) [8], Test Entropy Minimization (TENT) [16], and Continual Test-Time Adaptation (CoTTA) [17]. Our findings are threefold: i) test-time adaptation methods perform better and forget less on contextual shifts compared to semantic shifts, ii) TENT outperforms the other methods on short-term adaptation, whereas CoTTA outperforms them on long-term adaptation, and iii) BN is the most reliable and robust.
Domain shifts at test time are inevitable in practice. Test-time adaptation addresses this problem by adapting the model during deployment. Recent work theoretically showed that self-training can be a strong approach for gradual domain shifts. In this work, we show the natural connection between gradual domain adaptation and test-time adaptation. We publish a new synthetic dataset called CarlaTTA that allows the exploration of gradual domain shifts during test time, and we evaluate several methods from unsupervised domain adaptation and test-time adaptation on it. We propose a new method, GTTA, based on self-training and style transfer. GTTA explicitly exploits gradual domain shifts and sets a new standard in this area. We further demonstrate the effectiveness of our method on the continual and gradual CIFAR10C, CIFAR100C, and ImageNet-C benchmarks.
Automated Program Repair (APR) is defined as the process of fixing a bug/defect in source code by an automated tool. APR tools have recently achieved promising results by leveraging state-of-the-art Neural Language Processing (NLP) techniques. APR tools such as TFix and CodeXGLUE, which combine text-to-text transformers with software-specific techniques, currently outperform the alternatives. However, in most APR studies the train and test sets are chosen from the same set of projects. In reality, APR models are meant to generalize to new and different projects. Therefore, there is a potential threat that APR models reported as highly effective perform poorly when the characteristics of the new project or its bugs differ from those of the training set (domain shift). In this study, we first define and measure the domain shift problem in automated program repair. We then propose a domain adaptation framework that can adapt an APR model to a given target project. We conduct an empirical study with three domain adaptation methods (FullFineTuning, TuningWithLightWeightAdapterLayers, and CurriculumLearning) on two state-of-the-art APR models (TFix and CodeXGLUE), covering 611 bugs from 19 projects. The results show that our proposed framework can improve the effectiveness of TFix by 13.05% and CodeXGLUE by 23.4%. Another contribution of this study is a data synthesis method that addresses the lack of labelled data in APR. We leverage transformers to create a bug generator model and use the generated synthetic data to domain-adapt TFix and CodeXGLUE on projects with no data (zero-shot learning), which results in an average improvement of 5.76% and 24.42% for TFix and CodeXGLUE, respectively.
A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. However, this assumption is violated in almost all practical applications: machine learning systems are regularly tested under distribution shift due to changing temporal correlations, atypical end users, or other factors. In this work, we consider the problem setting of domain generalization, in which the training data are structured into domains and there may be multiple test-time shifts, corresponding to new domains or domain distributions. Most prior methods aim to learn a single robust model or invariant feature space that performs well across all domains. In contrast, we aim to learn models that adapt at test time to domain shift using unlabeled test points. Our primary contribution is to introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for adaptation by learning to adapt on the training domains. Compared to prior methods based on robustness, invariance, and adaptation, ARM methods provide gains of 1-4% test accuracy on a number of image classification problems exhibiting domain shift.
Models should be able to adapt to unseen data at test time to avoid performance drops caused by inevitable distribution shifts in real-world deployment scenarios. In this work, we tackle the practical yet challenging test-time adaptation (TTA) problem, where a model adapts to the target domain without accessing the source data. We propose a simple recipe called \textit{Data-efficient Prompt Tuning} (DePT) with two key ingredients. First, DePT plugs visual prompts into the vision Transformer and only tunes these source-initialized prompts during adaptation. We find such parameter-efficient finetuning can efficiently adapt the model representation to the target domain without overfitting to the noise in the learning objective. Second, DePT bootstraps the source representation to the target domain by memory bank-based online pseudo-labeling. A hierarchical self-supervised regularization specially designed for prompts is jointly optimized to alleviate error accumulation during self-training. With much fewer tunable parameters, DePT demonstrates not only state-of-the-art performance on the major adaptation benchmarks VisDA-C, ImageNet-C, and DomainNet-126, but also superior data efficiency, i.e., adaptation with only 1\% or 10\% of the data without much performance degradation compared to 100\% data. In addition, DePT is versatile enough to be extended to online or multi-source TTA settings.
Despite years of research, out-of-domain generalization remains a critical weakness of deep networks for semantic segmentation. Previous studies rely on the assumption of a static model, i.e., once the training process is complete, model parameters remain fixed at test time. In this work, we challenge this premise with a self-adaptive approach for semantic segmentation that adjusts the inference process to each input sample. Self-adaptation operates on two levels. First, it employs a self-supervised loss that customizes the parameters of convolutional layers in the network to the input image. Second, in batch normalization layers, self-adaptation approximates the mean and variance of the entire test data, which is unavailable. It achieves this by interpolating between the training distribution and a reference distribution derived from a single test sample. To empirically analyze our self-adaptive inference strategy, we develop and follow a rigorous evaluation protocol that addresses serious limitations of previous work. Our extensive analysis leads to a surprising conclusion: using a standard training procedure, self-adaptation significantly outperforms strong baselines and sets new state-of-the-art accuracy on multi-domain benchmarks. Our study suggests that self-adaptive inference can complement the established practice of model regularization at training time for improving the generalization of deep networks to out-of-domain data.
In test-time adaptation (TTA), given a model trained on some source data, the goal is to adapt it to make better predictions for test instances from a different distribution. Crucially, TTA assumes no access to the source data, or even to any additional labeled/unlabeled samples from the target distribution, to finetune the source model. In this work, we consider TTA in a more pragmatic setting, which we call SITA (Single Image Test-time Adaptation). Here, when making each prediction, the model has access only to the given \emph{single} test instance, rather than a \emph{batch} of instances, as is typically considered in the literature. This is motivated by realistic scenarios where inference is needed in an on-demand fashion that may not be delayed to "batch-ify" incoming requests, or where inference happens on an edge device (like a mobile phone) with no scope for batching. The entire adaptation process in SITA should be extremely fast, as it happens at inference time. To address this, we propose a novel approach, AugBN, for the SITA setting that requires only forward propagation. It can adapt any off-the-shelf trained model to individual test instances for classification and segmentation tasks. AugBN estimates normalization statistics of the unseen test distribution from the given test image using only one forward pass with label-preserving transformations. Since AugBN does not involve any back-propagation, it is significantly faster compared to other recent methods. To the best of our knowledge, this is the first work to address this hard adaptation problem using only a single test image. Despite being very simple, our framework achieves significant performance gains compared to directly applying the source model on the target instances, as reflected in our extensive experiments and ablation studies.
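A forward-only sketch in the spirit of AugBN, assuming a standard PyTorch model with BatchNorm layers: statistics are re-estimated from label-preserving augmentations of the single test image and blended with the stored source statistics. The blend weight and augmentation count are illustrative assumptions.

```python
import torch

@torch.no_grad()
def augbn_predict(model, x, augment, n_aug=15, prior=0.7):
    """Single-image adaptation without back-propagation: one forward pass over
    augmented copies of `x` (C, H, W) re-estimates normalization statistics,
    blended with the source statistics via the BN momentum."""
    batch = torch.stack([x] + [augment(x) for _ in range(n_aug)])
    bns = [m for m in model.modules()
           if isinstance(m, torch.nn.modules.batchnorm._BatchNorm)]
    saved = [(m.running_mean.clone(), m.running_var.clone()) for m in bns]
    for m in bns:
        m.momentum = 1.0 - prior    # new statistics get weight (1 - prior)
        m.train()                   # so the forward pass updates running stats
    model(batch)                    # one pass re-estimates the statistics
    model.eval()
    pred = model(x.unsqueeze(0)).argmax(dim=-1)
    for m, (mu, var) in zip(bns, saved):       # restore the source statistics
        m.running_mean.copy_(mu)
        m.running_var.copy_(var)
    return pred
```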
Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts, with access only to unlabeled test samples from the new domain at test time. Prior TTA methods optimize over unsupervised objectives such as the entropy of model predictions, as in TENT [Wang et al., 2021], but it is unclear what exactly makes a good TTA loss. In this paper, we start by presenting a surprising phenomenon: if we attempt to meta-learn the best possible TTA loss over a wide class of functions, we recover a function that is remarkably similar to (a temperature-scaled version of) the softmax entropy employed by TENT. This holds, however, only if the classifier we are adapting was trained via cross-entropy; if it was trained via squared loss, a different best TTA loss emerges. To explain this phenomenon, we analyze TTA through the lens of the convex conjugate of the training loss. We show that under natural conditions, this (unsupervised) conjugate function can be viewed as a good local approximation to the original supervised loss and indeed recovers the best losses found by meta-learning. This leads to a generic recipe that can be used to find a good TTA loss for any given supervised training loss function of a general class. Empirically, our approach consistently dominates other baselines over a wide range of benchmarks. Our approach is particularly of interest when applied to classifiers trained with novel loss functions, e.g., the recently proposed PolyLoss, where it differs substantially from entropy-based losses. Further, we show that our approach can also be interpreted as a kind of self-training with very specific soft labels, which we refer to as conjugate pseudo-labels. Overall, our method provides a broad framework for better understanding and improving test-time adaptation. Code is available at https://github.com/locuslab/tta_conjugate.
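For a cross-entropy-trained classifier, the recipe reduces to self-training with temperature-scaled softmax outputs as soft labels; a minimal sketch (the temperature value is an illustrative assumption):

```python
import torch

def conjugate_loss_ce(logits, temperature=2.0):
    """Conjugate-style TTA loss for a cross-entropy-trained classifier:
    the temperature-scaled softmax serves as the (soft) pseudo-label."""
    pseudo = (logits / temperature).softmax(dim=-1).detach()  # conjugate pseudo-labels
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(pseudo * log_probs).sum(dim=-1).mean()
```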
With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted for downstream tasks in source code understanding. However, compared to costly training a large-scale model from scratch, how to effectively adapt pre-trained models to a new task has not been fully explored. In this paper, we propose an approach that bridges pre-trained models and code-related tasks. We exploit semantic-preserving transformations to enrich downstream data diversity and help pre-trained models learn semantic features that are invariant to these semantically equivalent transformations. Further, we introduce curriculum learning to organize the transformed data in an easy-to-hard manner for fine-tuning existing pre-trained models. We apply our approach to a range of pre-trained models, and they significantly outperform state-of-the-art models on tasks for source code understanding, such as algorithm classification, code clone detection, and code search. Our experiments even show that without heavy pre-training on code data, the natural-language pre-trained model RoBERTa fine-tuned with our lightweight approach can outperform or rival existing code pre-trained models such as CodeBERT and GraphCodeBERT fine-tuned on the above tasks. This finding suggests that there is still much room for improvement in code pre-trained models.
Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data. However, existing distribution shift benchmarks with unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. To maintain consistency, the labeled training, validation, and test sets, as well as the evaluation metrics, are exactly the same as in the original WILDS benchmark. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS 2.0 is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.
Invariance to a broad array of image corruptions, such as warping, noise, or color shifts, is an important aspect of building robust models in computer vision. Recently, several new data augmentations have been proposed that significantly improve performance on ImageNet-C, a benchmark of such corruptions. However, there is still a lack of basic understanding of the relationship between data augmentations and test-time corruptions. To this end, we develop a feature space for image transforms and then use a new measure in this space between augmentations and corruptions, called the minimal sample distance, to demonstrate a strong correlation between similarity and performance. We then investigate recent data augmentations and observe a significant degradation in corruption robustness when the test-time corruptions are sampled to be perceptually dissimilar from those in ImageNet-C. Our results suggest that test error can be improved by training on perceptually similar augmentations, and that data augmentation may not generalize well beyond the existing benchmark. We hope our results and tools will allow for more robust progress towards improving robustness to image corruptions. We provide code at https://github.com/facebookresearch/augmentation-corruption.
Thanks to their ability to learn flexible data-driven losses, generative adversarial networks (GANs) are an integral part of many semi- and weakly-supervised methods for medical image segmentation. GANs jointly optimize a generator and an adversarial discriminator on a set of training data. After training is complete, the discriminator is usually discarded and only the generator is used for inference. But should we discard the discriminator? In this work, we argue that training stable discriminators produces expressive loss functions that we can re-use at inference to detect and \textit{correct} segmentation mistakes. First, we identify key challenges and suggest possible solutions to make discriminators re-usable at inference. Then, we show that we can combine the discriminator with image reconstruction costs (via a decoder) to endow a causal perspective to test-time training and further improve the model. Our method is simple and improves the test-time performance of pre-trained GANs. Moreover, we show that it is compatible with standard post-processing techniques and has the potential to be used for online continual learning. With our work, we open new research avenues for re-using adversarial discriminators at inference. Our code is available at https://vios-s.github.io/adversarial-test-time-training.
In this paper, we propose Test-Time Training, a general approach for improving the performance of predictive models when training and test data come from different distributions. We turn a single unlabeled test sample into a self-supervised learning problem, on which we update the model parameters before making a prediction. This also extends naturally to data in an online stream. Our simple approach leads to improvements on diverse image classification benchmarks aimed at evaluating robustness to distribution shifts.
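A minimal sketch of one such test-time training step, using the rotation-prediction auxiliary task from the paper; the module decomposition (`encoder`, `main_head`, `ssl_head`) and the optimizer are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def ttt_step(encoder, main_head, ssl_head, optimizer, x):
    """Update the shared encoder on a single unlabeled test image `x` (C, H, W)
    through a four-way rotation-prediction task, then predict."""
    views = torch.stack([torch.rot90(x, k, dims=(-2, -1)) for k in range(4)])
    targets = torch.arange(4, device=x.device)      # rotation labels 0/90/180/270
    loss = F.cross_entropy(ssl_head(encoder(views)), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                # adapts shared encoder features
    with torch.no_grad():
        return main_head(encoder(x.unsqueeze(0))).argmax(dim=-1)
```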
Despite their impressive performance on image classification tasks, deep networks still struggle to generalize to many common corruptions of their data. To address this vulnerability, prior works have mostly focused on increasing the complexity of their training pipelines, combining multiple methods in the name of diversity. In this work, however, we take a step back and follow a principled approach to achieve robustness to common corruptions. We propose PRIME, a general data augmentation scheme consisting of simple families of max-entropy image transformations. We show that PRIME outperforms the prior art for corruption robustness, while its simplicity and plug-and-play nature allow it to be combined with other methods to further boost their robustness. Furthermore, we analyze PRIME to shed light on the importance of the mixing strategy for synthesizing corrupted images, and to reveal the robustness-accuracy trade-offs arising in the context of common corruptions. Finally, we show that the computational efficiency of our method allows it to be easily used in both on-line and off-line data augmentation schemes.