Natural language processing (NLP) algorithms are improving rapidly, but they often struggle when applied to out-of-distribution examples. A prominent approach to mitigating the domain gap is domain adaptation, where a model trained on a source domain is adapted to a new target domain. We present a new learning setup, "domain adaptation from scratch", which we believe is crucial for extending the reach of NLP to sensitive domains in a privacy-preserving manner. In this setup, we aim to efficiently annotate data from a set of source domains so that the trained model performs well on a sensitive target domain from which data is unavailable for annotation. Our study compares several approaches for this challenging setup, from data selection and domain adaptation algorithms to active learning paradigms, on two NLP tasks: sentiment analysis and named entity recognition. Our results suggest that using the above methods narrows the domain gap, and that combining them further improves the results.
Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still enormous potential left untapped before fully supervised performance is reached. In this paper, we present a novel active learning strategy to assist knowledge transfer in the target domain, i.e., active domain adaptation. We start from the observation that energy-based models exhibit free-energy biases when the training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically show that a simple yet efficient energy-based sampling strategy identifies the most valuable target samples better than existing approaches that require particular architectures or distance computations. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of target data that incorporate both domain characteristics and instance uncertainty into every selection round. Meanwhile, by compactly aligning the free energy of the target data around the source domain via a regularization term, the domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world. Code is available at https://github.com/bit-da/eada.
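The free-energy score mentioned above can be computed directly from a classifier's logits. Below is a minimal sketch (not the authors' implementation; the loader format and the simple top-k selection are assumptions) of how such an energy-based acquisition step might look in PyTorch:

```python
import torch

def free_energy(logits, temperature=1.0):
    # F(x) = -T * logsumexp(logits / T); higher free energy suggests the sample
    # lies far from the source distribution the model was trained on.
    return -temperature * torch.logsumexp(logits / temperature, dim=1)

@torch.no_grad()
def select_for_annotation(model, unlabeled_loader, budget, device="cuda"):
    """Pick the `budget` unlabeled target samples with the highest free energy."""
    scores, indices = [], []
    for idx, x in unlabeled_loader:        # assumed: loader yields (index, input)
        logits = model(x.to(device))
        scores.append(free_energy(logits).cpu())
        indices.append(idx)
    scores = torch.cat(scores)
    indices = torch.cat(indices)
    topk = torch.topk(scores, k=budget).indices
    return indices[topk]                   # dataset indices to send to annotators
```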
Input distribution shift is one of the important issues in unsupervised domain adaptation (UDA). The most popular UDA approaches focus on domain-invariant representation learning (DIRL), which tries to align features from different domains into similar feature distributions. However, these approaches ignore the direct alignment of the input word distributions between domains, which is an important factor in word-level classification tasks such as cross-domain NER. In this work, we shed new light on cross-domain NER by addressing the input word-level distribution shift with a subword-level solution, X-Pience. Specifically, we re-divide the input words of the source domain so that their subword distribution approaches that of the target domain, which is formulated and solved as an optimal transport problem. Since this approach operates at the input level, it can also be combined with previous DIRL methods for further improvement. Experimental results show the effectiveness of the proposed method on top of BERT-Tagger on four benchmark NER datasets. The proposed method is also shown to benefit DIRL methods such as DANN.
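As a rough illustration of the optimal-transport view described above (not the paper's actual formulation; the subword inventories, counts, and cost matrix below are made up for the example), one could compute a transport plan between an observed source subword distribution and a desired target subword distribution with the POT library:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

# Hypothetical empirical subword distributions (each must sum to 1).
source_dist = np.array([0.50, 0.30, 0.20])   # e.g. frequencies of ["work", "##ing", "##er"]
target_dist = np.array([0.25, 0.45, 0.30])

# Hypothetical cost of re-segmenting one subword as another.
cost = np.array([[0.0, 0.4, 0.6],
                 [0.4, 0.0, 0.3],
                 [0.6, 0.3, 0.0]])

# Exact OT plan: plan[i, j] is how much source mass of subword i should be
# re-divided into subword j so the source matches the target distribution.
plan = ot.emd(source_dist, target_dist, cost)
print(plan)
```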
Neural approaches have become very popular in the domain of Question Answering; however, they require a large amount of annotated data. Furthermore, they often yield very good performance but only in the domain they were trained on. In this work we propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low-resource settings, where the target domains are diverse in terms of difficulty and similarity to the source domain. We also investigate Active Learning for question answering in different stages, overall reducing the annotation effort of humans. For this purpose, we consider target domains in realistic settings, with an extremely low amount of annotated samples but with many unlabeled documents, which we assume can be obtained with little effort. Additionally, we assume that a sufficient amount of labeled data from the source domain is available. We perform extensive experiments to find the best setup for incorporating domain experts. Our findings show that our novel approach, where humans are incorporated as early as possible in the process, boosts performance in the low-resource, domain-specific setting, allowing for low-labeling-effort question answering systems in new, specialized domains. They further demonstrate how human annotation affects the performance of QA depending on the stage at which it is performed.
Recent work has demonstrated that pretraining in-domain language models can boost performance when adapting to a new domain. However, the costs associated with pretraining raise an important question: given a fixed budget, what steps should an NLP practitioner take to maximize performance? In this paper, we study domain adaptation under budget constraints and approach it as a consumer choice problem between data annotation and pretraining. Specifically, we measure the annotation cost of three procedural text datasets and the pretraining cost of three in-domain language models. We then evaluate the utility of different combinations of pretraining and data annotation under varying budget constraints to assess which combination strategy works best. We find that, for small budgets, spending all funds on annotation leads to the best performance; once the budget becomes large enough, a combination of data annotation and in-domain pretraining works best. We therefore suggest that task-specific data annotation should be part of an economical strategy when adapting an NLP model to a new domain.
Active Domain Adaptation (ADA) queries the labels of a small number of selected target samples to help adapt a model from a related source domain to a target domain. Due to its promising performance with minimal labeling cost, it has recently attracted increasing attention. Nevertheless, existing ADA methods have not fully exploited the local context of the queried data, which is important for ADA, especially when the domain gap is large. In this paper, we propose a novel framework of Local context-aware Active Domain Adaptation (LADA), which is composed of two key modules. The Local context-aware Active Selection (LAS) module selects target samples whose class probability predictions are inconsistent with those of their neighbors. The Local context-aware Model Adaptation (LMA) module refines the model with both the queried samples and their expanded neighbors, regularized by a context-preserving loss. Extensive experiments show that LAS selects more informative samples than existing active selection strategies. Furthermore, equipped with LMA, the full LADA method outperforms state-of-the-art ADA solutions on various benchmarks. Code is available at https://github.com/tsun/lada.
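A minimal sketch of the neighborhood-inconsistency idea behind the LAS module (not the released implementation; the cosine k-NN over features and the simple disagreement score are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def local_inconsistency_scores(features, probs, k=10):
    """Score each target sample by how much its predicted class disagrees
    with the predictions of its k nearest neighbors in feature space."""
    feats = F.normalize(features, dim=1)        # cosine-similarity space
    sims = feats @ feats.t()
    sims.fill_diagonal_(-1.0)                   # exclude the sample itself
    knn = sims.topk(k, dim=1).indices           # [N, k] neighbor indices
    preds = probs.argmax(dim=1)                 # [N]
    neighbor_preds = preds[knn]                 # [N, k]
    disagreement = (neighbor_preds != preds.unsqueeze(1)).float().mean(dim=1)
    return disagreement                         # higher = more locally inconsistent

# Usage: query the `budget` samples with the largest disagreement, e.g.
# query_idx = local_inconsistency_scores(feats, probs).topk(budget).indices
```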
We present a novel method, SALAD, for the challenging vision task of adapting a pretrained "source" domain network to a "target" domain, with a small budget for annotation in the target domain and a shift in the label space. Moreover, the task assumes that the source data is unavailable for adaptation, due to privacy concerns or otherwise. We postulate that such a system needs to jointly optimize the dual task of (i) selecting a fixed number of samples from the target domain for annotation and (ii) transferring knowledge from the pretrained network to the target domain. To this end, SALAD consists of a novel Guided Attention Transfer Network (GATN) and an active learning function, HAL. The GATN enables feature distillation from the pretrained network to the target network, and is complemented by the transferability and uncertainty criteria adopted by HAL. SALAD has three key benefits: (i) it is task-agnostic and can be applied across various visual tasks such as classification, segmentation, and detection; (ii) it can handle shifts in the output label space from the pretrained source network to the target domain; (iii) it does not require access to source data for adaptation. We conduct extensive experiments on three visual tasks, namely digit classification (MNIST, SVHN, VisDA), synthetic (GTA5) to real (Cityscapes) image segmentation, and document layout detection (PubLayNet to DSSE). We show that our source-free approach, SALAD, improves by 0.5%-31.3% (across datasets and tasks) over prior adaptation methods that assume access to large amounts of annotated source data for adaptation.
Multi-task learning, in which several tasks are jointly learned by a single model, allows NLP models to share information from multiple annotations and may facilitate better predictions when the tasks are inter-related. However, this technique requires annotating the same text with multiple annotation schemes, which may be costly and laborious. Active learning (AL) has been shown to optimize annotation processes by iteratively selecting the unlabeled examples that are most valuable to the NLP model. Yet, multi-task active learning (MT-AL) has not been applied to state-of-the-art pretrained Transformer-based NLP models. This paper aims to close this gap. We explore various multi-task selection criteria in three realistic multi-task scenarios, reflecting different relations between the participating tasks, and demonstrate the effectiveness of multi-task selection compared to single-task selection. Our results suggest that MT-AL can be effectively used to minimize annotation efforts for multi-task NLP models.
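One simple multi-task selection criterion consistent with the description above is to aggregate per-task uncertainty; the sketch below assumes predictive entropy averaged across tasks, which is only one of several criteria such studies compare:

```python
import numpy as np

def entropy(probs, eps=1e-12):
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def multitask_uncertainty(per_task_probs):
    """per_task_probs: dict mapping task name -> [num_examples, num_classes] array.
    Returns one score per example: the mean normalized entropy across tasks."""
    scores = []
    for probs in per_task_probs.values():
        h = entropy(probs) / np.log(probs.shape[-1])   # normalize to [0, 1]
        scores.append(h)
    return np.mean(scores, axis=0)

# Usage: send the examples with the highest aggregated scores to be annotated
# with all participating annotation schemes.
```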
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task. We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, showing that a second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains, under both high- and low-resource settings. Moreover, adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining. Finally, we show that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable. Overall, we consistently find that multiphase adaptive pretraining offers large gains in task performance.
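A minimal sketch of a second phase of masked-language-model pretraining on an in-domain corpus with the Hugging Face libraries (the checkpoint, hyperparameters, and the `domain_corpus.txt` file are placeholders, not the paper's setup):

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# One unlabeled in-domain document per line.
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
corpus = corpus.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256),
                    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapt_checkpoint",
                           per_device_train_batch_size=8,
                           num_train_epochs=1),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()   # the adapted checkpoint is then fine-tuned on the end task
```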
In domain adaptation, prediction performance degrades when there is a large distance between the source and target domains. Gradual domain adaptation is one solution to this problem, assuming access to intermediate domains that shift gradually from the source to the target domain. Previous work assumed that the number of samples in the intermediate domains is sufficiently large, so that self-training is possible without labeled data. If the number of accessible intermediate domains is limited, the distances between domains become large and self-training fails. In practice, the cost of samples in intermediate domains varies, and it is natural to assume that the closer an intermediate domain is to the target domain, the more expensive its samples are to obtain. To address this trade-off between cost and accuracy, we propose a framework that combines multifidelity and active domain adaptation. The effectiveness of the proposed method is evaluated by experiments with real-world datasets.
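A toy sketch of the self-training loop that gradual domain adaptation relies on (the scikit-learn classifier, confidence threshold, and domain ordering are illustrative assumptions, not the proposed multifidelity framework itself):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def gradual_self_train(X_source, y_source, intermediate_domains, threshold=0.9):
    """Walk through unlabeled intermediate domains, ordered from source-like to
    target-like, adding confident pseudo-labels and re-fitting at every step."""
    clf = LogisticRegression(max_iter=1000).fit(X_source, y_source)
    X_lab, y_lab = X_source, y_source
    for X_mid in intermediate_domains:
        probs = clf.predict_proba(X_mid)
        keep = probs.max(axis=1) >= threshold          # confident pseudo-labels only
        X_lab = np.vstack([X_lab, X_mid[keep]])
        y_lab = np.concatenate([y_lab, clf.classes_[probs[keep].argmax(axis=1)]])
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    return clf
```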
In the field of domain adaptation, a trade-off exists between model performance and the number of target-domain annotations. Active learning, which maximizes model performance with few informative labeled examples, comes in handy for such a scenario. In this work, we present D2ADA, a general active domain adaptation framework for semantic segmentation. To adapt the model to the target domain with minimal queried labels, we propose acquiring labels of the samples with high probability density in the target domain yet low probability density in the source domain, complementary to the existing source-domain labeled data. To further improve labeling efficiency, we design a dynamic scheduling policy to adjust the labeling budget between domain exploration and model uncertainty. Extensive experiments show that our method outperforms existing active learning and domain adaptation baselines on two benchmarks, GTA5 -> Cityscapes and SYNTHIA -> Cityscapes. With less than 5% target-domain annotations, our method reaches results comparable to fully supervised ones. Our code is publicly available at https://github.com/tsunghan-wu/d2ada.
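The density-based acquisition rule described above can be sketched with off-the-shelf density estimators (Gaussian mixtures over feature vectors are an assumption made here for illustration; the paper's estimator and feature space may differ):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def density_ratio_scores(source_feats, target_feats, candidate_feats, n_components=8):
    """Score candidates by target-domain log-density minus source-domain log-density:
    high values mean 'typical of the target but rare in the source'."""
    gmm_src = GaussianMixture(n_components=n_components).fit(source_feats)
    gmm_tgt = GaussianMixture(n_components=n_components).fit(target_feats)
    log_p_tgt = gmm_tgt.score_samples(candidate_feats)
    log_p_src = gmm_src.score_samples(candidate_feats)
    return log_p_tgt - log_p_src

# Usage: request annotations for the candidates with the largest scores, e.g.
# query_idx = np.argsort(-density_ratio_scores(fs, ft, fc))[:budget]
```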
Natural language understanding (NLU) has made massive progress driven by large benchmarks, paired with research on transfer learning to broaden its impact. Benchmarks are dominated by a small set of frequent phenomena, leaving a long tail of infrequent phenomena underrepresented. In this work, we reflect on the question: have transfer learning methods sufficiently addressed the performance of benchmark-trained models on the long tail? Since benchmarks do not list the included or excluded phenomena, we conceptualize the long tail using macro-level dimensions (e.g., underrepresented genres, topics, etc.). We assess trends in transfer learning research through a qualitative meta-analysis of 100 representative papers on transfer learning for NLU. Our analysis asks three questions: (i) Which long-tail dimensions do transfer learning studies target? (ii) Which properties help adaptation methods improve performance on the long tail? (iii) Which methodological gaps have the greatest negative impact on long-tail performance? Our answers to these questions highlight major avenues for future research in transfer learning for the long tail. Lastly, we present a case study comparing the performance of various adaptation methods on clinical narratives, showing how systematically conducted meta-experiments can provide insights that enable progress along these future avenues.
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a related but different, well-labeled source domain to a new unlabeled target domain. Most existing UDA methods require access to the source data, and are thus not applicable when the data are confidential and cannot be shared due to privacy concerns. This paper aims to tackle a realistic setting where only a trained classification model is available, rather than access to the source data. To effectively utilize the source model for adaptation, we propose a novel approach called Source Hypothesis Transfer (SHOT), which learns the feature extraction module for the target domain by fitting the target data features to the frozen source classification module (representing the classification hypothesis). Specifically, SHOT exploits both information maximization and self-supervised learning for the feature extraction module to ensure that the target features are implicitly aligned with the features of unseen source data via the same hypothesis. Furthermore, we propose a new labeling transfer strategy, which separates the target data into two splits based on the confidence of the prediction (labeling information), and then employs semi-supervised learning to improve the accuracy of the less confident predictions in the target domain. We denote labeling transfer as SHOT++ if the predictions are obtained by SHOT. Extensive experiments on both digit classification and object recognition tasks show that SHOT and SHOT++ achieve results surpassing or comparable to the state of the art, demonstrating the effectiveness of our approaches for various visual domain adaptation problems. Code is available at https://github.com/tim-learn/shot-plus.
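The information-maximization part of the objective described above can be written compactly; the following is a sketch of that term alone (the loss weighting and the self-supervised/pseudo-labeling components of SHOT are omitted):

```python
import torch
import torch.nn.functional as F

def information_maximization_loss(logits, eps=1e-6):
    """Encourage individually confident yet globally diverse target predictions:
    minimize per-sample entropy while maximizing the entropy of the mean prediction."""
    probs = F.softmax(logits, dim=1)
    # (1) conditional entropy: push each prediction towards a single class
    ent = -(probs * torch.log(probs + eps)).sum(dim=1).mean()
    # (2) marginal entropy: keep the batch-level class distribution balanced
    mean_probs = probs.mean(dim=0)
    div = -(mean_probs * torch.log(mean_probs + eps)).sum()
    return ent - div   # minimizing this maximizes the mutual information between inputs and labels
```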
Active learning effectively collects informative unlabeled data for annotation, reducing the demand for labeled data. In this work, we propose to retrieve unlabeled samples with a local sensitivity and hardness-aware acquisition function. The proposed method generates data copies through local perturbation and selects the data points whose predictive likelihoods diverge the most from their copies. We further strengthen our acquisition function by injecting the selected worst-case perturbation. Our method achieves consistent gains over commonly used active learning strategies across various classification tasks. Furthermore, we observe consistent improvements over the baselines in the study of prompt selection for prompt-based few-shot learning. These experiments demonstrate that our acquisition, guided by local sensitivity and hardness, is effective and beneficial for many NLP tasks.
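A rough sketch of the perturb-and-compare acquisition idea (Gaussian noise on input embeddings and a KL-based score are simplifying assumptions made here; the paper constructs its perturbations differently):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sensitivity_scores(model, embeddings, noise_std=0.01, n_copies=4):
    """Score each unlabeled example by how much its predictive distribution
    changes under small local perturbations of its input embedding."""
    base_log_probs = F.log_softmax(model(embeddings), dim=1)
    scores = torch.zeros(embeddings.size(0))
    for _ in range(n_copies):
        noisy = embeddings + noise_std * torch.randn_like(embeddings)
        noisy_log_probs = F.log_softmax(model(noisy), dim=1)
        kl = F.kl_div(noisy_log_probs, base_log_probs,
                      log_target=True, reduction="none").sum(dim=1)
        scores += kl.cpu() / n_copies
    return scores   # annotate the examples with the largest scores
```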
In this work, we study unsupervised domain adaptation (UDA) in a challenging self-supervised manner. One of the difficulties is how to learn task discrimination in the absence of target labels. Unlike previous literature that directly aligns cross-domain distributions or leverages reverse gradients, we propose Domain Confused Contrastive Learning (DCCL), which bridges the source and target domains via domain puzzles and retains discriminative representations after adaptation. Technically, DCCL searches for the most domain-challenging direction and exquisitely crafts domain-confused augmentations as positive pairs; it then contrastively encourages the model to pull representations towards the other domain, thus learning more stable and effective domain-invariant representations. We also investigate whether contrastive learning necessarily helps UDA when other data augmentations are applied. Extensive experiments show that DCCL significantly outperforms the baselines.
State-of-the-art 3D semantic segmentation models are trained on the off-the-shelf public benchmarks, but they often face the major challenge when these well-trained models are deployed to a new domain. In this paper, we propose an Active-and-Adaptive Segmentation (ADAS) baseline to enhance the weak cross-domain generalization ability of a well-trained 3D segmentation model, and bridge the point distribution gap between domains. Specifically, before the cross-domain adaptation stage begins, ADAS performs an active sampling operation to select a maximally-informative subset from both source and target domains for effective adaptation, reducing the adaptation difficulty under 3D scenarios. Benefiting from the rise of multi-modal 2D-3D datasets, ADAS utilizes a cross-modal attention-based feature fusion module that can extract a representative pair of image features and point features to achieve a bi-directional image-point feature interaction for better safe adaptation. Experimentally, ADAS is verified to be effective in many cross-domain settings including: 1) Unsupervised Domain Adaptation (UDA), which means that all samples from target domain are unlabeled; 2) Unsupervised Few-shot Domain Adaptation (UFDA) which means that only a few unlabeled samples are available in the unlabeled target domain; 3) Active Domain Adaptation (ADA) which means that the selected target samples by ADAS are manually annotated. Their results demonstrate that ADAS achieves a significant accuracy gain by easily coupling ADAS with self-training methods or off-the-shelf UDA works.
Top-performing deep architectures are trained on massive amounts of labeled data. In the absence of labeled data for a certain task, domain adaptation often provides an attractive option given that labeled data of similar nature but from a different domain (e.g. synthetic images) are available. Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amount of labeled data from the source domain and large amount of unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of "deep" features that are (i) discriminative for the main learning task on the source domain and (ii) invariant with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with few standard layers and a simple new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation. Overall, the approach can be implemented with little effort using any of the deep-learning packages. The method performs very well in a series of image classification experiments, achieving adaptation effect in the presence of big domain shifts and outperforming previous state-of-the-art on Office datasets.
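The gradient reversal layer mentioned above is small enough to sketch in full; a common PyTorch rendering (the scaling constant `lambd` and its schedule are left to the user) is:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambd in the
    backward pass, so the feature extractor learns to *confuse* the domain
    classifier attached after this layer."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage inside a model's forward pass:
#   domain_logits = self.domain_classifier(grad_reverse(features, lambd))
```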
Unlike human learning, machine learning often fails to handle changes between training (source) and test (target) input distributions. Such domain shifts, common in practical scenarios, severely damage the performance of conventional machine learning methods. Supervised domain adaptation methods have been proposed for the case when the target data have labels, including some that perform very well despite being "frustratingly easy" to implement. However, in practice, the target domain is often unlabeled, requiring unsupervised adaptation. We propose a simple, effective, and efficient method for unsupervised domain adaptation called CORrelation ALignment (CORAL). CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions, without requiring any target labels. Even though it is extraordinarily simple-it can be implemented in four lines of Matlab code-CORAL performs remarkably well in extensive evaluations on standard benchmark datasets. "Everything should be made as simple as possible, but not simpler."
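The paper's point that CORAL amounts to a few lines of linear algebra can be illustrated with a NumPy/SciPy sketch of the whitening-and-recoloring transform (the regularization constant and variable names are ours, not the original Matlab code):

```python
import numpy as np
from scipy import linalg

def coral(source, target, eps=1e-5):
    """Align second-order statistics: whiten the source features with the source
    covariance, then re-color them with the target covariance."""
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(d)
    whiten = linalg.fractional_matrix_power(cov_s, -0.5)
    recolor = linalg.fractional_matrix_power(cov_t, 0.5)
    return np.real(source @ whiten @ recolor)   # transformed source features
```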
We introduce a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains. The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of features that are (i) discriminative for the main learning task on the source domain and (ii) indiscriminate with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with few standard layers and a new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation and stochastic gradient descent, and can thus be implemented with little effort using any of the deep learning packages. We demonstrate the success of our approach for two distinct classification problems (document sentiment analysis and image classification), where state-of-the-art domain adaptation performance on standard benchmarks is achieved. We also validate the approach for descriptor learning task in the context of person re-identification application.
Natural Language Inference (NLI) or Recognizing Textual Entailment (RTE) aims at predicting the relation between a pair of sentences (premise and hypothesis) as entailment, contradiction or semantic independence. Although deep learning models have shown promising performance for NLI in recent years, they rely on large scale expensive human-annotated datasets. Semi-supervised learning (SSL) is a popular technique for reducing the reliance on human annotation by leveraging unlabeled data for training. However, despite its substantial success on single sentence classification tasks where the challenge in making use of unlabeled data is to assign "good enough" pseudo-labels, for NLI tasks, the nature of unlabeled data is more complex: one of the sentences in the pair (usually the hypothesis) along with the class label are missing from the data and require human annotations, which makes SSL for NLI more challenging. In this paper, we propose a novel way to incorporate unlabeled data in SSL for NLI where we use a conditional language model, BART to generate the hypotheses for the unlabeled sentences (used as premises). Our experiments show that our SSL framework successfully exploits unlabeled data and substantially improves the performance of four NLI datasets in low-resource settings. We release our code at: https://github.com/msadat3/SSL_for_NLI.
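A sketch of the hypothesis-generation step with a conditional language model (the checkpoint name, input format, and decoding settings below are placeholders; the paper fine-tunes BART for this purpose rather than using it off the shelf):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def generate_hypothesis(premise, label):
    # Assumed input format: the unlabeled premise plus the desired relation,
    # for a model fine-tuned to emit a hypothesis matching that relation.
    prompt = f"{premise} </s> {label}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# The generated pair can then be added to the training set as pseudo-labeled data:
# pseudo_example = (premise, generate_hypothesis(premise, "entailment"), "entailment")
```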