Domain adaptation is critical for success in new, unseen environments. Adversarial adaptation models applied in feature spaces discover domain-invariant representations, but are difficult to visualize and sometimes fail to capture pixel-level and low-level domain shifts. Recent work has shown that generative adversarial networks combined with cycle-consistency constraints are surprisingly effective at mapping images between domains, even without the use of aligned image pairs. We propose a novel discriminatively-trained Cycle-Consistent Adversarial Domain Adaptation model. CyCADA adapts representations at both the pixel level and feature level, enforces cycle-consistency while leveraging a task loss, and does not require aligned pairs. Our model can be applied in a variety of visual recognition and prediction settings. We show new state-of-the-art results across multiple adaptation tasks, including digit classification and semantic segmentation of road scenes, demonstrating transfer from synthetic to real-world domains.
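To make the cycle-consistency constraint named above concrete, here is a minimal PyTorch-style sketch; the generator names `G_st`/`G_ts` (source-to-target and target-to-source mappings) are illustrative placeholders, not the paper's actual API:

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_st, G_ts, x_src, x_tgt):
    """L1 reconstruction loss: translating an image to the other domain and
    back should recover the original, in both directions."""
    loss_src = F.l1_loss(G_ts(G_st(x_src)), x_src)  # source -> target -> source
    loss_tgt = F.l1_loss(G_st(G_ts(x_tgt)), x_tgt)  # target -> source -> target
    return loss_src + loss_tgt
```

In the full model this term is combined with pixel- and feature-level adversarial losses and the task loss on translated source images.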
The performance achievable by modern deep learning methods is directly related to the amount of data used at training time. Unfortunately, the annotation process is notoriously tedious and expensive, especially for pixel-wise tasks such as semantic segmentation. Recent works have proposed to rely on synthetically generated imagery to ease the creation of training sets. However, models trained on this kind of data usually underperform on real images, the well-known domain-shift problem. We address this issue by learning a domain-to-domain image translation GAN to narrow the gap between real and synthetic images. Peculiar to our approach, we introduce semantic constraints into the generation process to avoid artifacts and to guide the synthesis. To demonstrate the effectiveness of our proposal, we show that a semantic segmentation CNN trained on images from the synthetic GTA dataset adapted by our method improves performance by more than 16% relative to the same model trained on the synthetic images.
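A hedged sketch of the semantic constraint described above: a segmentation network (`frozen_segmenter`, a hypothetical name) is applied to the translated image, and its output is required to match the known synthetic label map, discouraging the generator from hallucinating artifacts:

```python
import torch.nn.functional as F

def semantic_constraint_loss(frozen_segmenter, generator, x_synth, y_synth):
    """The translated (real-styled) synthetic image must still be segmentable
    into its original label map, so the translation preserves content."""
    x_translated = generator(x_synth)        # synthetic -> real-style image
    logits = frozen_segmenter(x_translated)  # (N, C, H, W) class scores
    return F.cross_entropy(logits, y_synth)  # y_synth: (N, H, W) label map
```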
Harvesting dense pixel-level annotations to train deep neural networks for semantic segmentation is extremely expensive and unwieldy at scale. While learning from synthetic data where labels are readily available sounds promising, performance degrades significantly when testing on novel realistic data due to domain discrepancies. We present Dual Channel-wise Alignment Networks (DCAN), a simple yet effective approach to reduce domain shift at both pixel-level and feature-level. Exploring statistics in each channel of CNN feature maps, our framework performs channel-wise feature alignment, which preserves spatial structures and semantic information, in both an image generator and a segmentation network. In particular, given an image from the source domain and unlabeled samples from the target domain, the generator synthesizes new images on-the-fly to resemble samples from the target domain in appearance and the segmentation network further refines high-level features before predicting semantic maps, both of which leverage feature statistics of sampled images from the target domain. Unlike much recent and concurrent work relying on adversarial training, our framework is lightweight and easy to train. Extensive experiments on adapting models trained on synthetic segmentation benchmarks to real urban scenes demonstrate the effectiveness of the proposed framework.
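The channel-wise alignment DCAN describes can be illustrated with an AdaIN-style statistics swap; this is a simplified sketch under the assumption that alignment matches per-channel mean and standard deviation, not the paper's exact formulation:

```python
import torch

def channelwise_align(src_feat, tgt_feat, eps=1e-5):
    """Renormalize each channel of the source feature map to the per-channel
    mean/std of a sampled target feature map; spatial structure is untouched."""
    # feature maps: (N, C, H, W); statistics are computed over H and W
    mu_s = src_feat.mean(dim=(2, 3), keepdim=True)
    std_s = src_feat.std(dim=(2, 3), keepdim=True) + eps
    mu_t = tgt_feat.mean(dim=(2, 3), keepdim=True)
    std_t = tgt_feat.std(dim=(2, 3), keepdim=True) + eps
    return (src_feat - mu_s) / std_s * std_t + mu_t
```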
Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real-world applications, there is indeed a large gap between the data distributions of the train and test domains, which leads to severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss, respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging synthetic-to-real set-ups and show that the approach can also be used for detection.
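The entropy loss referred to above can be written directly from the pixel-wise softmax; a minimal sketch (the adversarial variant instead applies a discriminator to maps derived from the same softmax):

```python
import torch
import torch.nn.functional as F

def entropy_loss(logits, eps=1e-8):
    """Mean per-pixel Shannon entropy of the prediction; minimizing it on
    unlabeled target images pushes the model toward confident outputs."""
    p = F.softmax(logits, dim=1)                # (N, C, H, W)
    ent = -(p * torch.log(p + eps)).sum(dim=1)  # (N, H, W) per-pixel entropy
    return ent.mean()
```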
In recent years, deep neural nets have triumphed over many computer vision problems, including semantic segmentation, which is a critical task in emerging autonomous driving and medical image diagnostics applications. In general, training deep neural nets requires a humongous amount of labeled data, which is laborious and costly to collect and annotate. Recent advances in computer graphics shed light on utilizing photo-realistic synthetic data with computer generated annotations to train neural nets. Nevertheless, the domain mismatch between real images and synthetic ones is the major challenge against harnessing the generated data and labels. In this paper, we propose a principled way to conduct structured domain adaptation for semantic segmentation, i.e., integrating GAN into the FCN framework to mitigate the gap between source and target domains. Specifically, we learn a conditional generator to transform features of synthetic images to real-image like features, and a discriminator to distinguish them. For each training batch, the conditional generator and the discriminator compete against each other so that the generator learns to produce real-image like features to fool the discriminator; afterwards, the FCN parameters are updated to accommodate the changes of GAN. In experiments, without using labels of real image data, our method significantly outperforms the baselines as well as state-of-the-art methods by 12%~20% mean IoU on the Cityscapes dataset.
Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains. They can also improve recognition despite the presence of domain shift or dataset bias: several adversarial approaches to unsupervised domain adaptation have recently been introduced, which reduce the difference between the training and test domain distributions and thus improve generalization performance. Prior generative approaches show compelling visualizations, but are not optimal on discriminative tasks and can be limited to smaller shifts. Prior discriminative approaches could handle larger domain shifts, but imposed tied weights on the model and did not exploit a GAN-based loss. We first outline a novel generalized framework for adversarial adaptation, which subsumes recent state-of-the-art approaches as special cases, and we use this generalized view to better relate the prior approaches. We then propose a previously unexplored instance of our general framework that combines discriminative modeling, untied weight sharing, and a GAN loss, which we call Adversarial Discriminative Domain Adaptation (ADDA). We show that ADDA is more effective yet considerably simpler than competing domain-adversarial methods, and demonstrate the promise of our approach by exceeding state-of-the-art unsupervised adaptation results on standard cross-domain digit classification tasks and a new, more difficult cross-modality object classification task.
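A hedged sketch of the ADDA-style alternating update: the discriminator learns to separate source from target features, then the (untied) target encoder is trained with the standard GAN loss to fool it. Names like `src_enc`, `tgt_enc`, and `disc` are illustrative:

```python
import torch
import torch.nn.functional as F

def adda_step(src_enc, tgt_enc, disc, x_src, x_tgt, opt_disc, opt_tgt):
    """One alternating update: discriminator first, then target encoder."""
    bce = F.binary_cross_entropy_with_logits
    # 1) discriminator: source features are 'real' (1), target are 'fake' (0)
    f_src, f_tgt = src_enc(x_src).detach(), tgt_enc(x_tgt).detach()
    d_src, d_tgt = disc(f_src), disc(f_tgt)
    d_loss = bce(d_src, torch.ones_like(d_src)) + bce(d_tgt, torch.zeros_like(d_tgt))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()
    # 2) target encoder: fool the discriminator (inverted labels, GAN loss)
    d_out = disc(tgt_enc(x_tgt))
    g_loss = bce(d_out, torch.ones_like(d_out))
    opt_tgt.zero_grad(); g_loss.backward(); opt_tgt.step()
    return d_loss.item(), g_loss.item()
```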
Annotating large-scale datasets to train modern convolutional neural networks is prohibitively expensive and time-consuming for many practical tasks. An alternative is to train models on labeled synthetic datasets and apply them in real scenes. However, this straightforward approach often fails to generalize, mainly due to the domain bias between the synthetic and real datasets. Many unsupervised domain adaptation (UDA) methods have been introduced to address this problem, but most of them only focus on the simpler classification task. In this paper, we propose a novel UDA model to tackle the more complex object detection problem in autonomous driving scenarios. Our model integrates both pixel-level and feature-level transformations to fulfill the cross-domain detection task, and can be further trained end-to-end to pursue better performance. We adopt the objective of generative adversarial networks together with a cycle-consistency loss for image translation in the pixel space. To address the potential semantic inconsistency problem, we propose region-proposal-based feature adversarial training to preserve the semantics of the target objects and further minimize the domain shift. Extensive experiments are conducted on several different datasets, and the results demonstrate the robustness and superiority of our method.
We propose a general framework for unsupervised domain adaptation, which allows deep neural networks trained on a source domain to be tested on a different target domain without requiring any training annotations in the target domain. This is achieved by adding extra networks and losses that help regularize the features extracted by the backbone encoder network. To this end, we propose the novel use of the recently proposed unpaired image-to-image translation framework to constrain the features extracted by the encoder network. Specifically, we require that the features extracted are able to reconstruct the images in both domains. In addition, we require that the distributions of features extracted from images in the two domains are indistinguishable. Many recent works can be seen as specific cases of our general framework. We apply our method for domain adaptation between the MNIST, USPS, and SVHN datasets, and the Amazon, Webcam, and DSLR Office datasets in classification tasks, and also between the GTA5 and Cityscapes datasets for a segmentation task. We demonstrate state-of-the-art performance on each of these datasets.
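A minimal sketch of the reconstruction constraint just described, assuming a shared encoder and a decoder head; the companion constraint, that feature distributions be indistinguishable, reuses a domain-discriminator loss like the ADDA sketch above:

```python
import torch.nn.functional as F

def reconstruction_constraint(encoder, decoder, x_src, x_tgt):
    """Features from the backbone encoder must suffice to reconstruct the
    input image in both domains, regularizing what the encoder discards."""
    rec_src = F.l1_loss(decoder(encoder(x_src)), x_src)
    rec_tgt = F.l1_loss(decoder(encoder(x_tgt)), x_tgt)
    return rec_src + rec_tgt
```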
Collecting well-annotated image datasets to train modern machine learning algorithms is prohibitively expensive for many tasks. One appealing alternative is rendering synthetic data where ground-truth annotations are generated automatically. Unfortunately, models trained purely on rendered images often fail to generalize to real images. To address this shortcoming, prior work introduced unsupervised domain adaptation algorithms that attempt to map representations between the two domains or learn to extract features that are domain-invariant. In this work, we present a new approach that learns, in an unsupervised manner, a transformation in the pixel space from one domain to the other. Our generative adversarial network (GAN)-based model adapts source-domain images to appear as if drawn from the target domain. Our approach not only produces plausible samples, but also outperforms the state-of-the-art on a number of unsupervised domain adaptation scenarios by large margins. Finally, we demonstrate that the adaptation process generalizes to object classes unseen during training.
Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor-intensive, developing algorithms that can adapt source ground truth labels to the target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities between the source and target domains, we adopt adversarial learning in the output space. To further enhance the adapted model, we construct a multi-level adversarial network to effectively perform output-space domain adaptation at different feature levels. Extensive experiments and an ablation study are conducted under various domain adaptation settings, including synthetic-to-real and cross-city scenarios. We show that the proposed method performs favorably against the state-of-the-art methods in terms of accuracy and visual quality.
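Since the adversarial signal here is applied to the softmax output map rather than to intermediate features, the segmentation-network side of the loss can be sketched as follows (the multi-level variant simply repeats this at auxiliary prediction heads); `seg_net` and `disc` are placeholder names:

```python
import torch
import torch.nn.functional as F

def output_space_adv_loss(seg_net, disc, x_tgt):
    """Update signal for the segmentation net: target softmax maps should be
    indistinguishable from source ones to the output-space discriminator."""
    p_tgt = F.softmax(seg_net(x_tgt), dim=1)  # (N, C, H, W) probability map
    d_out = disc(p_tgt)                       # discriminator scores on outputs
    # label target outputs as 'source' to fool the discriminator
    return F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
```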
The recent advances in deep neural networks have convincingly demonstrated high capability in learning vision models on large datasets. Nevertheless, collecting expert-labeled datasets, especially with pixel-level annotations, is an extremely expensive process. An appealing alternative is to render synthetic data (e.g., computer games) and generate ground truth automatically. However, simply applying the models learnt on synthetic images may lead to high generalization error on real images due to domain shift. In this paper, we address this issue from the perspectives of both visual appearance-level and representation-level domain adaptation. The former adapts source-domain images to appear as if drawn from the "style" in the target domain and the latter attempts to learn domain-invariant representations. Specifically, we present Fully Convolutional Adaptation Networks (FCAN), a novel deep architecture for semantic segmentation which combines Appearance Adaptation Networks (AAN) and Representation Adaptation Networks (RAN). AAN learns a transformation from one domain to the other in the pixel space and RAN is optimized in an adversarial learning manner to maximally fool the domain discriminator with the learnt source and target representations. Extensive experiments are conducted on the transfer from GTA5 (game videos) to Cityscapes (urban street scenes) on semantic segmentation and our proposal achieves superior results compared to state-of-the-art unsupervised adaptation techniques. More remarkably, we obtain a new record: an mIoU of 47.5% on BDDS (drive-cam videos) in an unsupervised setting.
There has been a recent surge of interest in generative adversarial networks (GANs), which offer powerful capabilities for generative modeling, density estimation, and energy function learning. GANs are difficult to train and evaluate, but are able to create strikingly realistic synthetic image data. Ideas stemming from GANs, such as adversarial losses, are creating research opportunities for other challenges such as domain adaptation. In this paper, we look at the field of GANs with a focus on these areas of emerging research. To provide background for adversarial techniques, we survey the field of GANs, looking at the original formulation, training variants, evaluation methods, and extensions. We then survey recent work on transfer learning, focusing on comparing different adversarial domain adaptation methods. Finally, we look ahead to identify open research directions for GANs and domain adaptation, including some promising applications such as sensor-based human behavior modeling.
Real-world robotics problems often occur in domains that differ significantly from the robot's prior training environment. For many robotic control tasks, real world experience is expensive to obtain, but data is easy to collect in either an instrumented environment or in simulation. We propose a novel domain adaptation approach for robot perception that adapts visual representations learned on a large easy-to-obtain source dataset (e.g. synthetic images) to a target real-world domain, without requiring expensive manual data annotation of real world data before policy search. Supervised domain adaptation methods minimize cross-domain differences using pairs of aligned images that contain the same object or scene in both the source and target domains, thus learning a domain-invariant representation. However, they require manual alignment of such image pairs. Fully unsupervised adaptation methods rely on minimizing the discrepancy between the feature distributions across domains. We propose a novel, more powerful combination of both distribution and pairwise image alignment, and remove the requirement for expensive annotation by using weakly aligned pairs of images in the source and target domains. Focusing on adapting from simulation to real world data using a PR2 robot, we evaluate our approach on a manipulation task and show that by using weakly paired images, our method compensates for domain shift more effectively than previous techniques, enabling better robot performance in the real world.
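The pairwise part of the combination described above can be sketched as a simple feature-distance penalty over weakly aligned image pairs; this is an illustrative reading, with `f_sim` and `f_real` as hypothetical embeddings of roughly corresponding simulated and real images:

```python
import torch

def weak_pair_alignment_loss(f_sim, f_real):
    """Pull together features of weakly paired images (same object/scene,
    no exact pixel correspondence); distribution alignment across domains
    is handled by a separate adversarial or discrepancy term."""
    # f_sim, f_real: (N, D) embeddings of weakly aligned source/target pairs
    return (f_sim - f_real).pow(2).sum(dim=1).mean()
```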
We introduce a novel unsupervised domain adaptation approach for object detection. We aim to simultaneously alleviate the imperfect translation problem at the pixel level and the source-biased discriminativity problem of features. Our approach consists of two stages, Domain Diversification (DD) and Multi-domain-invariant Representation Learning (MRL). At the DD stage, we diversify the distribution of the labeled data by generating various distinctive shifted domains from the source domain. At the MRL stage, we apply adversarial learning with a multi-domain discriminator to encourage features that are indistinguishable among the domains. DD addresses the source-biased discriminativity problem, while MRL mitigates the imperfect image translation. We construct a structured domain adaptation framework for our learning paradigm and introduce a practical way of realizing DD. Across various datasets, our method outperforms the state-of-the-art methods by margins of 3% to 11% in mean average precision (mAP).
Deep learning has raised hopes and expectations as a general solution for many applications; it has indeed proven effective, but it has also shown a strong dependence on large quantities of data. Fortunately, it has been demonstrated that, even when data are scarce, successful models can be trained by reusing prior knowledge. Thus, in its broadest definition, developing transfer learning techniques is a key factor for deploying effective and accurate intelligent systems. This thesis focuses on a family of transfer learning methods applied to the task of visual object recognition, specifically image classification. Transfer learning is a general term, and specific settings have been given specific names: when the learner has access only to unlabeled data from the target domain and labeled data from a different domain (the source), the problem is known as unsupervised domain adaptation (DA). The first part of this work focuses on three methods for this setting: one operates on features, one on images, and the third exploits both. The second part focuses on a real-life problem in robot perception, specifically RGB-D recognition. Robotic platforms are usually not limited to color perception; they often also carry a depth camera. Unfortunately, the depth modality is rarely used for visual recognition, due to the lack of pretrained models from which to transfer and the scarcity of data to train one from scratch. Two methods for dealing with this scenario are presented: one using synthetic data and the other exploiting cross-modality transfer learning.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
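One way to read the consistency regularization mentioned above, as a hedged sketch: the image-level domain classifier's average prediction should agree with the per-instance (per-ROI) domain predictions, which in turn pushes the RPN toward domain invariance. Shapes and names are assumptions for illustration:

```python
import torch

def consistency_regularizer(img_domain_probs, inst_domain_probs):
    """L1 disagreement between the image-level domain prediction (averaged
    over spatial locations) and the per-instance domain predictions."""
    # img_domain_probs: (N, L) per-location probs; inst_domain_probs: (N, R) per ROI
    img_avg = img_domain_probs.mean(dim=1, keepdim=True)  # (N, 1)
    return (inst_domain_probs - img_avg).abs().mean()
```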
Training deep networks for semantic segmentation requires annotating large amounts of data, which can be both time-consuming and expensive. Unfortunately, these trained networks still generalize poorly when tested in domains that differ from the training data. In this paper, we show that by carefully presenting the network with a mixture of labeled source-domain data and proxy-labeled target-domain data, we can achieve state-of-the-art unsupervised domain adaptation results. With our design, the network progressively learns target-domain-specific features using annotations from the source domain alone. We generate proxy labels for the target domain using the network's own predictions. Our architecture then allows selective mining of easy samples from this set of proxy labels, and of hard samples from the annotated source domain. We conduct a series of experiments on the GTA5, Cityscapes, and BDD100k datasets, covering both synthetic-to-real and geographic domain adaptation, and demonstrate the advantages of our method over baselines and existing approaches.
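The proxy-label generation step lends itself to a compact sketch: keep only the target pixels the network already predicts with high confidence and mark the rest to be ignored by the loss. The confidence threshold is an assumed hyperparameter:

```python
import torch
import torch.nn.functional as F

def make_proxy_labels(logits, threshold=0.9, ignore_index=255):
    """Turn the network's own target-domain predictions into training labels,
    discarding low-confidence pixels ('easy sample' selection)."""
    probs = F.softmax(logits, dim=1)         # (N, C, H, W)
    conf, labels = probs.max(dim=1)          # per-pixel confidence and class
    labels[conf < threshold] = ignore_index  # mask out uncertain pixels
    return labels  # usable with F.cross_entropy(..., ignore_index=255)
```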
Exploiting synthetic data to learn deep models has attracted increasing attention in recent years. However, the intrinsic domain difference between synthetic and real images usually causes a significant performance drop when applying the learned model to real world scenarios. This is mainly due to two reasons: 1) the model overfits to synthetic images, making the convolutional filters incompetent to extract informative representation for real images; 2) there is a distribution difference between synthetic and real data, which is also known as the domain adaptation problem. To this end, we propose a new reality oriented adaptation approach for urban scene semantic segmentation by learning from synthetic data. First, we propose a target guided distillation approach to learn the real image style, which is achieved by training the segmentation model to imitate a pretrained real style model using real images. Second, we further take advantage of the intrinsic spatial structure presented in urban scene images, and propose a spatial-aware adaptation scheme to effectively align the distribution of two domains. These two modules can be readily integrated with existing state-of-the-art semantic segmentation networks to improve their generalizability when adapting from synthetic to real urban scenes. We evaluate the proposed method on Cityscapes dataset by adapting from GTAV and SYNTHIA datasets, where the results demonstrate the effectiveness of our method.
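The target-guided distillation step admits a short sketch: on real images, the segmentation backbone is pushed to imitate the features of a model pretrained on real imagery, so its filters do not overfit to synthetic textures. `student_backbone` and `frozen_real_teacher` are illustrative names:

```python
import torch
import torch.nn.functional as F

def target_guided_distillation(student_backbone, frozen_real_teacher, x_real):
    """Feature imitation loss on real images against a frozen real-style model."""
    with torch.no_grad():
        f_teacher = frozen_real_teacher(x_real)  # reference 'real-style' features
    f_student = student_backbone(x_real)
    return F.mse_loss(f_student, f_teacher)
```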
Deep domain adaptation has emerged as a new learning technique to address the lack of massive amounts of labeled data. Compared to conventional methods, which learn shared feature subspaces or reuse important source instances with shallow representations, deep domain adaptation methods leverage deep networks to learn more transferable representations by embedding domain adaptation in the pipeline of deep learning. There have been comprehensive surveys of shallow domain adaptation, but few timely reviews of the emerging deep-learning-based methods. In this paper, we provide a comprehensive survey of deep domain adaptation methods for computer vision applications with four major contributions. First, we present a taxonomy of different deep domain adaptation scenarios according to the properties of the data that define how the two domains diverge. Second, we summarize deep domain adaptation approaches into several categories based on training loss, and briefly analyze and compare the state-of-the-art methods in these categories. Third, we overview the computer vision applications that go beyond image classification, such as face recognition, semantic segmentation, and object detection. Fourth, we highlight some potential deficiencies of current methods and several future directions.
Segmenting aerial images holds great potential for monitoring and understanding urban areas. It provides a means of automatically reporting different events that occur in residential areas, which significantly benefits public safety and traffic management applications. Following the wide adoption of convolutional neural network methods, the accuracy of semantic segmentation algorithms can easily exceed 80% when a robust dataset is provided. Despite this success, deploying a pretrained segmentation model to survey a new city that is not included in the training set significantly decreases accuracy, owing to the domain shift between the source dataset on which the model was trained and the new target domain of the new city's images. In this paper, we address this issue and consider the challenge of domain adaptation in the semantic segmentation of aerial images. We design an algorithm that reduces the domain-shift impact using Generative Adversarial Networks (GANs). In experiments, we test the proposed method on the International Society for Photogrammetry and Remote Sensing (ISPRS) semantic segmentation dataset and find that our method improves overall accuracy from 35% to 52% when adapting from the Potsdam domain (considered the source domain) to the Vaihingen domain (considered the target domain). In addition, the method allows effective recovery of classes inverted due to sensor variation; in particular, it improves their average segmentation accuracy from 14% to 61%.