我们提出了一种新颖的方法,即沙拉,用于将预先训练的“源”域网络适应“目标”域的挑战性视觉任务,在“目标”域中注释的预算很小,标签空间的变化。此外,该任务假定由于隐私问题或其他方式,源数据无法适应。我们假设这样的系统需要共同优化(i)从目标域中选择固定数量的样本以进行注释的双重任务,以及(ii)知识从预训练的网络转移到目标域。为此,沙拉由一个新颖的引导注意转移网络(GATN)和一个主动学习功能组成。 GATN启用了从预训练的网络到目标网络的特征蒸馏,并与HAL采用的转移性和不确定性标准相辅相成。沙拉有三个关键的好处:(i)它是任务不合时宜的,可以在各种视觉任务(例如分类,分割和检测)中应用; (ii)它可以处理从预训练的源网络到目标域的输出标签空间的变化; (iii)它不需要访问源数据进行适应。我们对3个视觉任务进行了广泛的实验,即。数字分类(MNIST,SVHN,VISDA),合成(GTA5)与真实(CityScapes)图像分割和文档布局检测(PublayNet to DSSE)。我们表明,我们的无源方法(沙拉)比先前的适应方法提高了0.5%-31.3%(跨数据集和任务),该方法假设访问大量带注释的源数据以进行适应。
translated by 谷歌翻译
语义分割在广泛的计算机视觉应用中起着基本作用,提供了全球对图像​​的理解的关键信息。然而,最先进的模型依赖于大量的注释样本,其比在诸如图像分类的任务中获得更昂贵的昂贵的样本。由于未标记的数据替代地获得更便宜,因此无监督的域适应达到了语义分割社区的广泛成功并不令人惊讶。本调查致力于总结这一令人难以置信的快速增长的领域的五年,这包含了语义细分本身的重要性,以及将分段模型适应新环境的关键需求。我们提出了最重要的语义分割方法;我们对语义分割的域适应技术提供了全面的调查;我们揭示了多域学习,域泛化,测试时间适应或无源域适应等较新的趋势;我们通过描述在语义细分研究中最广泛使用的数据集和基准测试来结束本调查。我们希望本调查将在学术界和工业中提供具有全面参考指导的研究人员,并有助于他们培养现场的新研究方向。
translated by 谷歌翻译
Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground truth labels to the target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities between the source and target domains, we adopt adversarial learning in the output space. To further enhance the adapted model, we construct a multi-level adversarial network to effectively perform output space domain adaptation at different feature levels. Extensive experiments and ablation study are conducted under various domain adaptation settings, including synthetic-to-real and cross-city scenarios. We show that the proposed method performs favorably against the stateof-the-art methods in terms of accuracy and visual quality.
translated by 谷歌翻译
自我训练具有极大的促进域自适应语义分割,它迭代地在目标域上生成伪标签并删除网络。然而,由于现实分割数据集是高度不平衡的,因此目标伪标签通常偏置到多数类并且基本上嘈杂,导致出错和次优模型。为了解决这个问题,我们提出了一个基于区域的主动学习方法,用于在域移位下进行语义分割,旨在自动查询要标记的图像区域的小分区,同时最大化分割性能。我们的算法,通过区域杂质和预测不确定性(AL-RIPU)的主动学习,介绍了一种新的采集策略,其特征在于图像区域的空间邻接以及预测置信度。我们表明,所提出的基于地区的选择策略比基于图像或基于点的对应物更有效地使用有限预算。同时,我们在源图像上强制在像素和其最近邻居之间的局部预测一致性。此外,我们制定了负面学习损失,以提高目标领域的鉴别表现。广泛的实验表明,我们的方法只需要极少的注释几乎达到监督性能,并且大大优于最先进的方法。
translated by 谷歌翻译
Deep learning has produced state-of-the-art results for a variety of tasks. While such approaches for supervised learning have performed well, they assume that training and testing data are drawn from the same distribution, which may not always be the case. As a complement to this challenge, single-source unsupervised domain adaptation can handle situations where a network is trained on labeled data from a source domain and unlabeled data from a related but different target domain with the goal of performing well at test-time on the target domain. Many single-source and typically homogeneous unsupervised deep domain adaptation approaches have thus been developed, combining the powerful, hierarchical representations from deep learning with domain adaptation to reduce reliance on potentially-costly target data labels. This survey will compare these approaches by examining alternative methods, the unique and common elements, results, and theoretical insights. We follow this with a look at application areas and open research directions.
translated by 谷歌翻译
State-of-the-art 3D semantic segmentation models are trained on the off-the-shelf public benchmarks, but they often face the major challenge when these well-trained models are deployed to a new domain. In this paper, we propose an Active-and-Adaptive Segmentation (ADAS) baseline to enhance the weak cross-domain generalization ability of a well-trained 3D segmentation model, and bridge the point distribution gap between domains. Specifically, before the cross-domain adaptation stage begins, ADAS performs an active sampling operation to select a maximally-informative subset from both source and target domains for effective adaptation, reducing the adaptation difficulty under 3D scenarios. Benefiting from the rise of multi-modal 2D-3D datasets, ADAS utilizes a cross-modal attention-based feature fusion module that can extract a representative pair of image features and point features to achieve a bi-directional image-point feature interaction for better safe adaptation. Experimentally, ADAS is verified to be effective in many cross-domain settings including: 1) Unsupervised Domain Adaptation (UDA), which means that all samples from target domain are unlabeled; 2) Unsupervised Few-shot Domain Adaptation (UFDA) which means that only a few unlabeled samples are available in the unlabeled target domain; 3) Active Domain Adaptation (ADA) which means that the selected target samples by ADAS are manually annotated. Their results demonstrate that ADAS achieves a significant accuracy gain by easily coupling ADAS with self-training methods or off-the-shelf UDA works.
translated by 谷歌翻译
We propose a general framework for unsupervised domain adaptation, which allows deep neural networks trained on a source domain to be tested on a different target domain without requiring any training annotations in the target domain. This is achieved by adding extra networks and losses that help regularize the features extracted by the backbone encoder network. To this end we propose the novel use of the recently proposed unpaired image-toimage translation framework to constrain the features extracted by the encoder network. Specifically, we require that the features extracted are able to reconstruct the images in both domains. In addition we require that the distribution of features extracted from images in the two domains are indistinguishable. Many recent works can be seen as specific cases of our general framework. We apply our method for domain adaptation between MNIST, USPS, and SVHN datasets, and Amazon, Webcam and DSLR Office datasets in classification tasks, and also between GTA5 and Cityscapes datasets for a segmentation task. We demonstrate state of the art performance on each of these datasets.
translated by 谷歌翻译
在域适应领域,模型性能与目标域注释的数量之间存在权衡。积极的学习,最大程度地提高了模型性能,几乎没有信息的标签数据,以方便这种情况。在这项工作中,我们提出了D2ADA,这是用于语义分割的一般活动域的适应框架。为了使模型使用最小查询标签调整到目标域,我们提出了在目标域中具有高概率密度的样品的获取标签,但源域中的概率密度较低,与现有源域标记的数据互补。为了进一步提高标签效率,我们设计了动态的调度策略,以调整域探索和模型不确定性之间的标签预算。广泛的实验表明,我们的方法的表现优于现有的活跃学习和域适应基线,这两个基准测试基准,GTA5-> CityScapes和Synthia-> CityScapes。对于目标域注释不到5%,我们的方法与完全监督的结果可比结果。我们的代码可在https://github.com/tsunghan-wu/d2ada上公开获取。
translated by 谷歌翻译
跨数据集的语义细分的域适应性,由相同类别组成,已经获得了一些最近的成功。但是,更一般的情况是源和目标数据集对应于非重叠标签空间时。例如,分割数据集中的类别根据环境或应用程序的类型发生了很大变化,但共享许多有价值的语义关系。基于特征对齐或差异最小化的现有方法不会考虑此类类别的转移。在这项工作中,我们提出了群集到适应(C2A),这是一种基于计算有效的聚类方法,用于跨分割数据集的域适应性,这些方法完全不同但可能相关类别。我们表明,在变换的特征空间中强制执行的这种聚类目标可以自动选择跨源和目标域的类别,这些类别可以对齐以改善目标性能,同时防止对无关类别的负转移。我们通过实验对室外的挑战性问题进行了实验,以少量拍摄和零拍设置来证明室内适应性的挑战性问题,在所有情况下,性能对现有方法和基准的绩效持续改善。
translated by 谷歌翻译
最近,无监督的域适应是一种有效的范例,用于概括深度神经网络到新的目标域。但是,仍有巨大的潜力才能达到完全监督的性能。在本文中,我们提出了一种新颖的主动学习策略,以帮助目标域中的知识转移,有效域适应。我们从观察开始,即当训练(源)和测试(目标)数据来自不同的分布时,基于能量的模型表现出自由能量偏差。灵感来自这种固有的机制,我们经验揭示了一种简单而有效的能源 - 基于能量的采样策略揭示了比需要特定架构或距离计算的现有方法的最有价值的目标样本。我们的算法,基于能量的活动域适应(EADA),查询逻辑数据组,它将域特征和实例不确定性结合到每个选择回合中。同时,通过通过正则化术语对准源域周围的目标数据紧凑的自由能,可以隐含地减少域间隙。通过广泛的实验,我们表明EADA在众所周知的具有挑战性的基准上超越了最先进的方法,具有实质性的改进,使其成为开放世界中的一个有用的选择。代码可在https://github.com/bit-da/eada获得。
translated by 谷歌翻译
Deep domain adaptation has emerged as a new learning technique to address the lack of massive amounts of labeled data. Compared to conventional methods, which learn shared feature subspaces or reuse important source instances with shallow representations, deep domain adaptation methods leverage deep networks to learn more transferable representations by embedding domain adaptation in the pipeline of deep learning. There have been comprehensive surveys for shallow domain adaptation, but few timely reviews the emerging deep learning based methods. In this paper, we provide a comprehensive survey of deep domain adaptation methods for computer vision applications with four major contributions. First, we present a taxonomy of different deep domain adaptation scenarios according to the properties of data that define how two domains are diverged. Second, we summarize deep domain adaptation approaches into several categories based on training loss, and analyze and compare briefly the state-of-the-art methods under these categories. Third, we overview the computer vision applications that go beyond image classification, such as face recognition, semantic segmentation and object detection. Fourth, some potential deficiencies of current methods and several future directions are highlighted.
translated by 谷歌翻译
我们建议利用模拟的潜力,以域的概括方式对现实世界自动驾驶场景的语义分割。对分割网络进行了训练,没有任何目标域数据,并在看不见的目标域进行了测试。为此,我们提出了一种新的域随机化和金字塔一致性的方法,以学习具有高推广性的模型。首先,我们建议使用辅助数据集以视觉外观的方式随机将合成图像随机化,以有效地学习域不变表示。其次,我们进一步在不同的“风格化”图像和图像中实施了金字塔一致性,以分别学习域不变和规模不变的特征。关于从GTA和合成对城市景观,BDD和Mapillary的概括进行了广泛的实验;而我们的方法比最新技术取得了卓越的成果。值得注意的是,我们的概括结果与最先进的模拟域适应方法相比甚至更好,甚至比在训练时访问目标域数据的结果。
translated by 谷歌翻译
受益于从特定情况(源)收集的相当大的像素级注释,训练有素的语义分段模型表现得非常好,但由于大域移位而导致的新情况(目标)失败。为了缓解域间隙,先前的跨域语义分段方法始终在域对齐期间始终假设源数据和目标数据的共存。但是,在实际方案中访问源数据可能会引发隐私问题并违反知识产权。为了解决这个问题,我们专注于一个有趣和具有挑战性的跨域语义分割任务,其中仅向目标域提供训练源模型。具体地,我们提出了一种称为ATP的统一框架,其包括三种方案,即特征对准,双向教学和信息传播。首先,我们设计了课程熵最小化目标,以通过提供的源模型隐式对准目标功能与看不见的源特征。其次,除了vanilla自我训练中的正伪标签外,我们是第一个向该领域引入负伪标签的,并开发双向自我训练策略,以增强目标域中的表示学习。最后,采用信息传播方案来通过伪半监督学习进一步降低目标域内的域内差异。综合与跨城市驾驶数据集的广泛结果验证\ TextBF {ATP}产生最先进的性能,即使是需要访问源数据的方法。
translated by 谷歌翻译
In this work, we connect two distinct concepts for unsupervised domain adaptation: feature distribution alignment between domains by utilizing the task-specific decision boundary [58] and the Wasserstein metric [73]. Our proposed sliced Wasserstein discrepancy (SWD) is designed to capture the natural notion of dissimilarity between the outputs of task-specific classifiers. It provides a geometrically meaningful guidance to detect target samples that are far from the support of the source and enables efficient distribution alignment in an end-to-end trainable fashion. In the experiments, we validate the effectiveness and genericness of our method on digit and sign recognition, image classification, semantic segmentation, and object detection.
translated by 谷歌翻译
无监督域适应(UDA)旨在将知识从相关但不同的良好标记的源域转移到新的未标记的目标域。大多数现有的UDA方法需要访问源数据,因此当数据保密而不相配在隐私问题时,不适用。本文旨在仅使用培训的分类模型来解决现实设置,而不是访问源数据。为了有效地利用适应源模型,我们提出了一种新颖的方法,称为源假设转移(拍摄),其通过将目标数据特征拟合到冻结源分类模块(表示分类假设)来学习目标域的特征提取模块。具体而言,拍摄挖掘出于特征提取模块的信息最大化和自我监督学习,以确保目标特征通过同一假设与看不见的源数据的特征隐式对齐。此外,我们提出了一种新的标签转移策略,它基于预测的置信度(标签信息),然后采用半监督学习来将目标数据分成两个分裂,然后提高目标域中的较为自信预测的准确性。如果通过拍摄获得预测,我们表示标记转移为拍摄++。关于两位数分类和对象识别任务的广泛实验表明,拍摄和射击++实现了与最先进的结果超越或相当的结果,展示了我们对各种视域适应问题的方法的有效性。代码可用于\ url {https://github.com/tim-learn/shot-plus}。
translated by 谷歌翻译
深度学习极大地提高了语义细分的性能,但是,它的成功依赖于大量注释的培训数据的可用性。因此,许多努力致力于域自适应语义分割,重点是将语义知识从标记的源域转移到未标记的目标域。现有的自我训练方法通常需要多轮训练,而基于对抗训练的另一个流行框架已知对超参数敏感。在本文中,我们提出了一个易于训练的框架,该框架学习了域自适应语义分割的域不变原型。特别是,我们表明域的适应性与很少的学习共享一个共同的角色,因为两者都旨在识别一些从大量可见数据中学到的知识的看不见的数据。因此,我们提出了一个统一的框架,用于域适应和很少的学习。核心思想是使用从几个镜头注释的目标图像中提取的类原型来对源图像和目标图像的像素进行分类。我们的方法仅涉及一个阶段训练,不需要对大规模的未经通知的目标图像进行培训。此外,我们的方法可以扩展到域适应性和几乎没有射击学习的变体。关于适应GTA5到CITYSCAPES和合成景观的实验表明,我们的方法实现了对最先进的竞争性能。
translated by 谷歌翻译
Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss respectively. We demonstrate state-of-theart performance in semantic segmentation on two challenging "synthetic-2-real" set-ups 1 and show that the approach can also be used for detection.
translated by 谷歌翻译
Domain adaptation is critical for success in new, unseen environments. Adversarial adaptation models applied in feature spaces discover domain invariant representations, but are difficult to visualize and sometimes fail to capture pixel-level and low-level domain shifts. Recent work has shown that generative adversarial networks combined with cycle-consistency constraints are surprisingly effective at mapping images between domains, even without the use of aligned image pairs. We propose a novel discriminatively-trained Cycle-Consistent Adversarial Domain Adaptation model. CyCADA adapts representations at both the pixel-level and feature-level, enforces cycle-consistency while leveraging a task loss, and does not require aligned pairs. Our model can be applied in a variety of visual recognition and prediction settings. We show new state-of-the-art results across multiple adaptation tasks, including digit classification and semantic segmentation of road scenes demonstrating transfer from synthetic to real world domains.
translated by 谷歌翻译
Recent deep networks achieved state of the art performance on a variety of semantic segmentation tasks. Despite such progress, these models often face challenges in real world "wild tasks" where large difference between labeled training/source data and unseen test/target data exists. In particular, such difference is often referred to as "domain gap", and could cause significantly decreased performance which cannot be easily remedied by further increasing the representation power. Unsupervised domain adaptation (UDA) seeks to overcome such problem without target domain labels. In this paper, we propose a novel UDA framework based on an iterative self-training (ST) procedure, where the problem is formulated as latent variable loss minimization, and can be solved by alternatively generating pseudo labels on target data and re-training the model with these labels. On top of ST, we also propose a novel classbalanced self-training (CBST) framework to avoid the gradual dominance of large classes on pseudo-label generation, and introduce spatial priors to refine generated labels. Comprehensive experiments show that the proposed methods achieve state of the art semantic segmentation performance under multiple major UDA settings.⋆ indicates equal contribution.
translated by 谷歌翻译
We consider the problem of unsupervised domain adaptation in semantic segmentation. A key in this campaign consists in reducing the domain shift, i.e., enforcing the data distributions of the two domains to be similar. One of the common strategies is to align the marginal distribution in the feature space through adversarial learning. However, this global alignment strategy does not consider the category-level joint distribution. A possible consequence of such global movement is that some categories which are originally well aligned between the source and target may be incorrectly mapped, thus leading to worse segmentation results in target domain. To address this problem, we introduce a category-level adversarial network, aiming to enforce local semantic consistency during the trend of global alignment. Our idea is to take a close look at the category-level joint distribution and align each class with an adaptive adversarial loss. Specifically, we reduce the weight of the adversarial loss for category-level aligned features while increasing the adversarial force for those poorly aligned. In this process, we decide how well a feature is category-level aligned between source and target by a co-training approach. In two domain adaptation tasks, i.e., GTA5 → Cityscapes and SYN-THIA → Cityscapes, we validate that the proposed method matches the state of the art in segmentation accuracy.
translated by 谷歌翻译