Training models that generalize to new domains at test time is a problem of fundamental importance in machine learning. In this work, we encode this notion of domain generalization using a novel regularization function. We pose the problem of finding such a regularization function in a learning-to-learn (meta-learning) framework. The objective of domain generalization is explicitly modeled by learning a regularizer that makes a model trained on one domain perform well on another domain. Experimental validations on computer vision and natural language datasets indicate that our method can learn regularizers that achieve good cross-domain generalization.
Domain generalization (DG) is the challenging and topical problem of learning models that generalize to novel testing domains with different statistics than a set of known training domains. The simple approach of aggregating data from all source domains and training a single deep neural network end-to-end on all the data provides a surprisingly strong baseline that surpasses many prior published methods. In this paper we build on this strong baseline by designing an episodic training procedure that trains a single deep network in a way that exposes it to the domain shift that characterises a novel domain at runtime. Specifically, we decompose a deep network into feature extractor and classifier components, and then train each component by simulating it interacting with a partner who is badly tuned for the current domain. This makes both components more robust, ultimately leading to our networks producing state-of-the-art performance on three DG benchmarks. Furthermore, we consider the pervasive workflow of using an ImageNet trained CNN as a fixed feature extractor for downstream recognition tasks. Using the Visual Decathlon benchmark, we demonstrate that our episodic-DG training improves the performance of such a general purpose feature extractor by explicitly training a feature for robustness to novel problems. This shows that DG training can benefit standard practice in computer vision.
The ability to generalize to unseen domains is crucial for machine learning models deployed in real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge about inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task.
Domain adaptation is an actively researched problem in computer vision. In this work, we propose an approach that leverages unsupervised data to bring the source and target distributions closer in a learned joint feature space. We accomplish this by inducing a symbiotic relationship between the learned embedding and a generative adversarial network. This is in contrast to methods which use the adversarial framework for realistic data generation and retraining deep models with such data. We demonstrate the strength and generality of our approach by performing experiments on three different tasks with varying levels of difficulty: (1) digit classification (MNIST, SVHN and USPS datasets), (2) object recognition using the OFFICE dataset, and (3) domain adaptation from synthetic to real data. Our method achieves state-of-the-art performance in most experimental settings and is, so far, the only GAN-based method that has been shown to work well across different datasets such as OFFICE and DIGITS.
We introduce a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains. The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of features that are (i) discriminative for the main learning task on the source domain and (ii) indiscriminate with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation and stochastic gradient descent, and can thus be implemented with little effort using any of the deep learning packages. We demonstrate the success of our approach for two distinct classification problems (document sentiment analysis and image classification), where state-of-the-art domain adaptation performance on standard benchmarks is achieved. We also validate the approach for a descriptor learning task in the context of a person re-identification application.
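To make the gradient reversal idea concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which the layer is the identity in the forward pass and multiplies the incoming gradient by a negative factor in the backward pass. The function names and the scaling parameter `lam` are illustrative assumptions, not taken from the abstract.

```python
def grl_forward(features):
    """Forward pass: the layer passes features through unchanged."""
    return list(features)


def grl_backward(upstream_grad, lam):
    """Backward pass: flip the sign of the gradient and scale it by lam.

    `lam` is a hypothetical trade-off weight between the label loss and
    the domain loss; it is an assumption of this sketch.
    """
    return [-lam * g for g in upstream_grad]


# Toy values standing in for a feature vector and for the gradient of a
# domain-classifier loss with respect to those features.
features = [0.5, -1.0, 2.0]
domain_grad = [0.1, 0.2, -0.3]

passed_through = grl_forward(features)
reversed_grad = grl_backward(domain_grad, lam=0.5)
```

Because the feature extractor receives the reversed gradient, a descent step on its parameters moves them in the direction that increases the domain-classifier loss, which is how domain-indiscriminate features are encouraged.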
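Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines. This is because most learning algorithms strongly rely on the i.i.d. assumption on source/target data, which is often violated in practice due to domain shift. Domain generalization (DG) aims to achieve OOD generalization by using only source data for model learning. Over the last ten years, research in DG has made great progress, leading to a broad spectrum of methodologies, e.g., those based on domain alignment, meta-learning, data augmentation, or ensemble learning, to name a few. DG has also been studied in various application areas, including computer vision, speech recognition, natural language processing, medical imaging, and reinforcement learning. In this paper, a comprehensive literature review of DG is provided for the first time to summarize the developments over the past decade. Specifically, we first cover the background by formally defining DG and relating it to other relevant fields such as domain adaptation and transfer learning. Then, we conduct a thorough review of existing methods and theories. Finally, we conclude this survey with insights and discussions on future research directions.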
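Meta-learning methods aim to build learning algorithms capable of quickly adapting to new tasks in the low-data regime. One of the main benchmarks for such algorithms is the few-shot learning problem. In this paper, we investigate a modification of the standard meta-learning pipeline that takes a multi-task approach during training. The proposed method simultaneously utilizes information from several meta-training tasks in a common loss function. The impact of each task in the loss function is controlled by a corresponding weight. Proper optimization of these weights can have a large influence on the training of the entire model and may improve quality on test-time tasks. In this work, we propose and investigate the use of methods from the family of simultaneous perturbation stochastic approximation (SPSA) approaches for optimizing the meta-train task weights. We also compare the proposed algorithms with gradient-based methods and find that stochastic approximation yields the largest quality boost at test time. The proposed multi-task modification can be applied to almost all methods that use a meta-learning pipeline. In this paper, we study the application of this modification to the Prototypical Networks and Model-Agnostic Meta-Learning algorithms on the CIFAR-FS, FC100, tieredImageNet and miniImageNet few-shot learning benchmarks. In these experiments, the multi-task modification demonstrates improvement over the original methods. The proposed SPSA-Tracking algorithm shows the largest accuracy boost and is competitive with state-of-the-art meta-learning methods. Our code is available online.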
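As a rough illustration of simultaneous perturbation stochastic approximation (SPSA), the sketch below estimates a gradient from only two objective evaluations by perturbing every coordinate at once with random signs. The quadratic objective, the target values, and the step sizes `a` and `c` are stand-ins chosen for this example; the abstract does not specify the objective used for the task weights.

```python
import random


def objective(weights, target):
    """Hypothetical stand-in for the loss as a function of task weights."""
    return sum((w - t) ** 2 for w, t in zip(weights, target))


def spsa_step(weights, target, a, c, rng):
    """One SPSA update using a random +/-1 perturbation of all coordinates."""
    delta = [rng.choice((-1.0, 1.0)) for _ in weights]
    plus = [w + c * d for w, d in zip(weights, delta)]
    minus = [w - c * d for w, d in zip(weights, delta)]
    diff = objective(plus, target) - objective(minus, target)
    # Gradient estimate for coordinate i: diff / (2 * c * delta_i).
    grad_est = [diff / (2.0 * c * d) for d in delta]
    return [w - a * g for w, g in zip(weights, grad_est)]


rng = random.Random(0)           # fixed seed so the run is repeatable
target = [0.2, 0.5, 0.3]         # assumed optimum, for illustration only
weights = [1.0, 1.0, 1.0]
initial_loss = objective(weights, target)

for _ in range(200):
    weights = spsa_step(weights, target, a=0.1, c=0.01, rng=rng)

final_loss = objective(weights, target)
```

The appeal of this scheme is that the cost of a gradient estimate does not grow with the number of weights, which matters when each objective evaluation is itself a training run.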
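Domain generalization (DG), which aims to generalize a model trained on source domains to unseen target domains, has recently attracted much attention. The key issue in DG is how to prevent overfitting to the observed source domains, because the target domain is unavailable during training. We find that overfitting not only causes inferior generalization to unseen target domains but also leads to unstable predictions in the test stage. In this paper, we observe that both sampling multiple tasks in the training stage and generating augmented images in the test stage largely benefit generalization performance. Thus, by treating tasks and images as different views, we propose a novel multi-view DG framework. Specifically, in the training stage, to enhance generalization ability, we develop a multi-view regularized meta-learning algorithm that employs multiple tasks to produce a suitable optimization direction when updating the model. In the test stage, to alleviate unstable predictions, we utilize multiple augmented images to yield a multi-view prediction, which significantly improves model reliability by fusing the results of different views of a test image. Extensive experiments on three benchmark datasets validate that our method outperforms several state-of-the-art approaches.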
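The test-stage fusion described above can be sketched as follows. This example assumes that fusion means averaging the class-probability vectors predicted for several augmented views of one test image; the abstract does not state the fusion rule, and the probabilities below are made-up values rather than model outputs.

```python
def fuse_views(view_probs):
    """Average per-view class probabilities into one prediction.

    Simple averaging is an assumption of this sketch, not a rule
    confirmed by the abstract.
    """
    num_views = len(view_probs)
    num_classes = len(view_probs[0])
    return [
        sum(probs[c] for probs in view_probs) / num_views
        for c in range(num_classes)
    ]


# Hypothetical class probabilities for three augmented views of one image.
view_probs = [
    [0.6, 0.4],
    [0.3, 0.7],
    [0.8, 0.2],
]

fused = fuse_views(view_probs)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Here a single view (the second one) would have predicted the other class, while the fused prediction follows the majority of the evidence, which is the kind of stabilization the abstract refers to.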
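Machine learning systems generally assume that the training and test distributions are the same. To this end, a key requirement is to develop models that can generalize to unseen distributions. Domain generalization (DG), i.e., out-of-distribution generalization, has attracted increasing interest in recent years. Domain generalization deals with a challenging setting in which one or several different but related domains are given, and the goal is to learn a model that can generalize to an unseen test domain. Great progress has been made in the area of domain generalization over the years. This paper presents the first review of recent advances in this area. First, we provide a formal definition of domain generalization and discuss several related fields. Second, we thoroughly review the theories related to domain generalization and carefully analyze the theory behind generalization. We categorize recent algorithms into three classes, namely data manipulation, representation learning, and learning strategy, and present several popular algorithms in detail for each category. Third, we introduce the commonly used datasets, applications, and our open-source codebase for fair evaluation. Finally, we summarize the existing literature and present some potential research topics for the future.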
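Few-shot classification aims to perform classification given only a few labeled examples for the categories of interest. Although several approaches have been proposed, most existing few-shot learning (FSL) models assume that base and novel classes are drawn from the same data domain. Recognizing novel-class data in an unseen domain becomes the even more challenging task of domain-generalized few-shot classification. In this paper, we present a unique learning framework for domain-generalized few-shot classification, where base classes come from multiple homogeneous source domains, while the novel classes to be recognized come from target domains that are not seen during training. By advancing meta-learning strategies, our learning framework exploits data across multiple source domains to capture domain-invariant features, with FSL ability introduced by metric-learning-based mechanisms across support and query data. We conduct extensive experiments to verify the effectiveness of our proposed learning framework and show that learning from small yet homogeneous source data can perform favorably against learning from large-scale data. Moreover, we provide insights into the choice of backbone models for domain-generalized few-shot classification.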
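The aim of few-shot learning methods is to train models that can easily adapt to previously unseen tasks based on small amounts of data. One of the most popular and elegant few-shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn general weights of a meta-model, which are then adapted to specific problems in a small number of gradient steps. However, the model's main limitation lies in the fact that the update procedure is realized by gradient-based optimization. Consequently, MAML cannot always modify the weights to the necessary level in one or even a few gradient iterations. On the other hand, using many gradient steps results in a complex and time-consuming optimization procedure, which is hard to train in practice and may lead to overfitting. In this paper, we propose HyperMAML, a novel generalization of MAML in which the training of the update procedure is also part of the model. Namely, in HyperMAML, instead of updating the weights with gradient descent, we use a trainable hypernetwork for this purpose. Consequently, in this framework, the model can generate significant updates whose range is not limited to a fixed number of gradient steps. Experiments show that HyperMAML consistently outperforms MAML and performs comparably to other state-of-the-art techniques on a number of standard few-shot learning benchmarks.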
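Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Traditionally, subspace-based methods have formed an important class of solutions to this problem. Despite their mathematical elegance and tractability, these methods are often found to be ineffective at producing domain-invariant features with complex, real-world datasets. Motivated by recent advances in representation learning with deep networks, this paper revisits subspace alignment for UDA and proposes a novel adaptation algorithm that consistently leads to improved generalization. In contrast to existing adversarial-training-based DA methods, our approach isolates the feature learning and distribution alignment steps, and utilizes a primary-auxiliary optimization strategy to effectively balance the objectives of domain invariance and model fidelity. While providing a significant reduction in target data and computational requirements, our subspace-based DA performs competitively with, and sometimes even outperforms, state-of-the-art approaches on several standard UDA benchmarks. Furthermore, subspace alignment leads to intrinsically well-regularized models that demonstrate strong generalization even in the challenging partial DA setting. Finally, the design of our UDA framework inherently supports progressive adaptation to new target domains at test time, without requiring retraining of the model from scratch. In summary, powered by strong feature learners and an effective optimization strategy, we establish subspace-based DA as a highly effective approach for visual recognition.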
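Existing meta-learning-based methods predict novel-class labels for (target-domain) test tasks via meta-knowledge learned from (source-domain) training tasks of base classes. However, most existing works may fail to generalize to novel classes because of a potentially large discrepancy across domains. To address this issue, we propose a novel adversarial feature augmentation (AFA) method to bridge the domain gap in few-shot learning. The feature augmentation is designed to simulate distribution variations by maximizing the domain discrepancy. During adversarial training, the domain discriminator is learned by distinguishing the augmented features (unseen domain) from the original ones (seen domain), while the domain discrepancy is minimized to obtain the optimal feature encoder. The proposed method is a plug-and-play module that can be easily integrated into existing few-shot learning methods based on meta-learning. Extensive experiments on nine datasets demonstrate the superiority of our method for cross-domain few-shot classification compared with the state of the art. Code is available at https://github.com/youthhoo/afa_for_few_shot_learning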
Transfer learning aims to improve the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large amount of target-domain data for constructing target learners can be reduced. Owing to its wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and do not cover the recent advances in transfer learning. Due to the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing transfer learning research, as well as to summarize and interpret the mechanisms and strategies of transfer learning in a comprehensive way, which may help readers better understand the current research status and ideas. Unlike previous surveys, this survey paper reviews more than forty representative transfer learning approaches, especially homogeneous transfer learning approaches, from the perspectives of data and model. The applications of transfer learning are also briefly introduced. To show the performance of different transfer learning models, over twenty representative transfer learning models are used for experiments. The models are evaluated on three different datasets, i.e., Amazon Reviews, Reuters-21578, and Office-31. The experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.
Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias. Fine-tuning deep models in a new domain can require a significant amount of labeled data, which for many applications is simply not available. We propose a new CNN architecture to exploit unlabeled and sparsely labeled target domain data. Our approach simultaneously optimizes for domain invariance to facilitate domain transfer and uses a soft label distribution matching loss to transfer information between tasks. Our proposed adaptation method offers empirical performance which exceeds previously published results on two standard benchmark visual domain adaptation tasks, evaluated across supervised and semi-supervised adaptation settings.
This work provides a unified framework for addressing the problem of visual supervised domain adaptation and generalization with deep models. The main idea is to exploit the Siamese architecture to learn an embedding subspace that is discriminative, and where mapped visual domains are semantically aligned and yet maximally separated. The supervised setting becomes attractive especially when only a few target data samples need to be labeled. In this scenario, alignment and separation of semantic probability distributions is difficult because of the lack of data. We found that reverting to point-wise surrogates of distribution distances and similarities provides an effective solution. In addition, the approach has a high "speed" of adaptation: it requires an extremely low number of labeled target training samples, and even one per category can be effective. The approach is extended to domain generalization. For both applications the experiments show very promising results.
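Deep networks are prone to performance degradation when there is a domain shift between the source (training) data and the target (test) data. Recent test-time adaptation methods update the batch normalization layers of pre-trained source models deployed in new target environments with streaming data to mitigate this performance degradation. Although such methods can adapt without first collecting a large target-domain dataset, their performance depends on streaming conditions such as mini-batch size and class distribution, which can be unpredictable in practice. In this work, we propose a framework for few-shot domain adaptation to address the practical challenges of data-efficient adaptation. Specifically, we propose a constrained optimization of feature normalization statistics in pre-trained source models, supervised by a small support set from the target domain. Our method is easy to implement and improves source model performance with only a few examples per class for classification tasks. Extensive experiments on 5 cross-domain classification and 4 semantic segmentation datasets show that our method is more accurate and more reliable than test-time adaptation, while not being constrained by streaming conditions.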
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.
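The idea of training parameters so that a small number of gradient steps works well can be illustrated with a deliberately tiny sketch. It assumes a single scalar parameter `w`, tasks whose loss is `(w - a) ** 2` for a task-specific value `a`, and one inner gradient step; the derivatives are written out by hand. All names, the task family, and the step sizes `alpha` and `beta` are assumptions made for illustration, not details from the abstract.

```python
def loss(w, a):
    """Toy task loss: squared distance between parameter w and task optimum a."""
    return (w - a) ** 2


def grad(w, a):
    """Derivative of the toy task loss with respect to w."""
    return 2.0 * (w - a)


def inner_update(w, a, alpha):
    """Task adaptation: one gradient step from the shared initialization."""
    return w - alpha * grad(w, a)


def meta_gradient(w, a, alpha):
    """Derivative of the post-adaptation loss with respect to the initial w.

    Chain rule: dL(w')/dw = L'(w') * dw'/dw, and for this quadratic loss
    dw'/dw = 1 - 2 * alpha.
    """
    w_adapted = inner_update(w, a, alpha)
    return grad(w_adapted, a) * (1.0 - 2.0 * alpha)


def meta_train(tasks, w=0.0, alpha=0.1, beta=0.1, steps=200):
    """Outer loop: descend on the average post-adaptation loss over tasks."""
    for _ in range(steps):
        g = sum(meta_gradient(w, a, alpha) for a in tasks) / len(tasks)
        w -= beta * g
    return w


tasks = [1.0, 3.0]           # two hypothetical tasks with optima at 1 and 3
w_meta = meta_train(tasks)   # settles midway between the task optima
```

In this toy setting the learned initialization sits between the two task optima, so a single inner step moves it closer to whichever task is presented; real uses replace the hand-written derivatives with automatic differentiation through the inner update.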
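In several real-world applications, machine learning models are deployed to make predictions on data whose distribution changes gradually over time, leading to a drift between the training and test distributions. Such models are often retrained periodically on new data, so they need to generalize to data from the future. In this context, there is much prior work on improving temporal generalization, e.g., continuous transportation of past data, kernel-smoothed time-sensitive parameters, and, more recently, learning time-invariant features. However, these methods share several limitations, such as poor scalability, training instability, and dependence on unlabeled data from the future. In response to these limitations, we propose a simple method that starts with time-sensitive parameters but regularizes their temporal complexity using a Gradient Interpolation (GI) loss. GI allows the decision boundary to change over time and can still prevent overfitting to the limited training-time snapshots by permitting time-specific changes. We compare our method with existing baselines on multiple real-world datasets, which shows that GI outperforms more complicated generative and adversarial approaches on the one hand, and simpler gradient regularization methods on the other.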
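Most modern unsupervised domain adaptation (UDA) approaches are rooted in domain alignment, i.e., learning to align source and target features in order to learn a target-domain classifier using source labels. In semi-supervised domain adaptation (SSDA), where the learner has access to a small number of target-domain labels, prior approaches have followed UDA theory and used domain alignment for learning. We show that the case of SSDA is different and that a good target classifier can be learned without alignment. We use self-supervised pretraining (via rotation prediction) and consistency regularization to achieve well-separated target clusters, which aids in learning a low-error target classifier. With our Pretraining and Consistency (PAC) approach, we achieve state-of-the-art target accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods across multiple datasets. PAC, while using simple techniques, performs remarkably well on large and challenging SSDA benchmarks such as DomainNet and Visda-17, often outperforming the recent state of the art by sizeable margins. Code for our experiments can be found at https://github.com/venkatesh-saligrama/pac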