虽然在清澈的天气下,在语义场景的理解中取得了相当大的进展,但由于不完美的观察结果引起的不确定性,在恶劣的天气条件下,仍然是一个艰难的问题。此外,收集和标记有雾图像的困难阻碍了这一领域的进展。考虑到在清晰天气下的语义场景理解中的成功,我们认为从清除图像到雾域中学习的知识是合理的。因此,问题变为弥合清晰图像和有雾图像之间的域间隙。与以往的方法不同,主要关注雾雾型磁盘差距 - 缺陷图像或雾化清晰的图像,我们建议通过同时考虑雾影响和风格变化来缓解域间隙。动机基于我们的发现,通过添加中间结构域,可以分别分别划分和关闭迷雾相关间隙。因此,我们提出了一种新的管道来累积适应风格,雾和双因素(风格和雾)。具体而言,我们设计了一个统一的框架,分别解开风格因子和雾因子,然后是不同域中图像的双因素。此外,我们合作了三种因素的解剖,具有新颖的累积损失,以彻底解解这三个因素。我们的方法在三个基准上实现了最先进的性能,并在多雨和雪景中显示了泛化能力。
translated by 谷歌翻译
Domain adaptation aims to bridge the domain shifts between the source and the target domain. These shifts may span different dimensions such as fog, rainfall, etc. However, recent methods typically do not consider explicit prior knowledge about the domain shifts on a specific dimension, thus leading to less desired adaptation performance. In this paper, we study a practical setting called Specific Domain Adaptation (SDA) that aligns the source and target domains in a demanded-specific dimension. Within this setting, we observe the intra-domain gap induced by different domainness (i.e., numerical magnitudes of domain shifts in this dimension) is crucial when adapting to a specific domain. To address the problem, we propose a novel Self-Adversarial Disentangling (SAD) framework. In particular, given a specific dimension, we first enrich the source domain by introducing a domainness creator with providing additional supervisory signals. Guided by the created domainness, we design a self-adversarial regularizer and two loss functions to jointly disentangle the latent representations into domainness-specific and domainness-invariant features, thus mitigating the intra-domain gap. Our method can be easily taken as a plug-and-play framework and does not introduce any extra costs in the inference time. We achieve consistent improvements over state-of-the-art methods in both object detection and semantic segmentation.
translated by 谷歌翻译
无监督的域适应性(UDA)旨在使在标记的源域上训练的模型适应未标记的目标域。在本文中,我们提出了典型的对比度适应(PROCA),这是一种无监督域自适应语义分割的简单有效的对比度学习方法。以前的域适应方法仅考虑跨各个域的阶级内表示分布的对齐,而阶层间结构关系的探索不足,从而导致目标域上的对齐表示可能不像在源上歧视的那样容易歧视。域了。取而代之的是,ProCA将类间信息纳入班级原型,并采用以班级为中心的分布对齐进行适应。通过将同一类原型与阳性和其他类原型视为实现以集体为中心的分配对齐方式的负面原型,Proca在经典领域适应任务上实现了最先进的性能,{\ em i.e. text {and} synthia $ \ to $ cityScapes}。代码可在\ href {https://github.com/jiangzhengkai/proca} {proca}获得代码
translated by 谷歌翻译
如何通过学习和视觉社区进行识别或分割视觉数据时处理域名转移。在本文中,我们解决了域广义语义分割,其中分割模型在多个源极域上培训,预计将概括到未操作数据域。我们提出了一种具有功能解剖能力的新型元学习方案,它可以使用域泛化保证来派生语义分段的域中的功能。特别是,我们在我们的框架中介绍了一个特定于特定的功能批评模块,强制执行域泛化保证的解除义的视觉功能。最后,我们对基准数据集的定量结果证实了我们所提出的模型的有效性和稳健性,以及在分割中的最先进的域适应和泛化方法表现。
translated by 谷歌翻译
无监督域适应(UDA)技术的最新进展在跨域计算机视觉任务中有巨大的成功,通过弥合域分布差距来增强数据驱动的深度学习架构的泛化能力。对于基于UDA的跨域对象检测方法,其中大多数通过对抗性学习策略引导域不变特征产生来缓解域偏差。然而,由于不稳定的对抗性培训过程,他们的域名鉴别器具有有限的分类能力。因此,它们引起的提取特征不能完全域不变,仍然包含域私有因素,使障碍物进一步缓解跨域差异。为了解决这个问题,我们设计一个域分离rcnn(DDF),以消除特定于检测任务学习的特定信息。我们的DDF方法促进了全局和本地阶段的功能解剖,分别具有全局三联脱离(GTD)模块和实例相似性解剖(ISD)模块。通过在四个基准UDA对象检测任务上表现出最先进的方法,对我们的DDF方法进行了宽阔的适用性。
translated by 谷歌翻译
由于难以获得地面真理标签,从虚拟世界数据集学习对于像语义分割等现实世界的应用非常关注。从域适应角度来看,关键挑战是学习输入的域名签名表示,以便从虚拟数据中受益。在本文中,我们提出了一种新颖的三叉戟架构,该架构强制执行共享特征编码器,同时满足对抗源和目标约束,从而学习域不变的特征空间。此外,我们还介绍了一种新颖的训练管道,在前向通过期间能够自我引起的跨域数据增强。这有助于进一步减少域间隙。结合自我培训过程,我们在基准数据集(例如GTA5或Synthia适应城市景观)上获得最先进的结果。Https://github.com/hmrc-ael/trideadapt提供了代码和预先训练的型号。
translated by 谷歌翻译
我们建议利用模拟的潜力,以域的概括方式对现实世界自动驾驶场景的语义分割。对分割网络进行了训练,没有任何目标域数据,并在看不见的目标域进行了测试。为此,我们提出了一种新的域随机化和金字塔一致性的方法,以学习具有高推广性的模型。首先,我们建议使用辅助数据集以视觉外观的方式随机将合成图像随机化,以有效地学习域不变表示。其次,我们进一步在不同的“风格化”图像和图像中实施了金字塔一致性,以分别学习域不变和规模不变的特征。关于从GTA和合成对城市景观,BDD和Mapillary的概括进行了广泛的实验;而我们的方法比最新技术取得了卓越的成果。值得注意的是,我们的概括结果与最先进的模拟域适应方法相比甚至更好,甚至比在训练时访问目标域数据的结果。
translated by 谷歌翻译
Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground truth labels to the target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities between the source and target domains, we adopt adversarial learning in the output space. To further enhance the adapted model, we construct a multi-level adversarial network to effectively perform output space domain adaptation at different feature levels. Extensive experiments and ablation study are conducted under various domain adaptation settings, including synthetic-to-real and cross-city scenarios. We show that the proposed method performs favorably against the stateof-the-art methods in terms of accuracy and visual quality.
translated by 谷歌翻译
了解驾驶场景中的雾图像序列对于自主驾驶至关重要,但是由于难以收集和注释不利天气的现实世界图像,这仍然是一项艰巨的任务。最近,自我训练策略被认为是无监督域适应的强大解决方案,通过生成目标伪标签并重新训练模型,它迭代地将模型从源域转化为目标域。但是,选择自信的伪标签不可避免地会遭受稀疏与准确性之间的冲突,这两者都会导致次优模型。为了解决这个问题,我们利用了驾驶场景的雾图图像序列的特征,以使自信的伪标签致密。具体而言,基于顺序图像数据的局部空间相似性和相邻时间对应的两个发现,我们提出了一种新型的目标域驱动的伪标签扩散(TDO-DIF)方案。它采用超像素和光学流来识别空间相似性和时间对应关系,然后扩散自信但稀疏的伪像标签,或者是由流量链接的超像素或时间对应对。此外,为了确保扩散像素的特征相似性,我们在模型重新训练阶段引入了局部空间相似性损失和时间对比度损失。实验结果表明,我们的TDO-DIF方案有助于自适应模型在两个公共可用的天然雾化数据集(超过雾气的Zurich and Forggy驾驶)上实现51.92%和53.84%的平均跨工会(MIOU),这超过了最态度ART无监督的域自适应语义分割方法。可以在https://github.com/velor2012/tdo-dif上找到模型和数据。
translated by 谷歌翻译
Domain adaptation for semantic image segmentation is very necessary since manually labeling large datasets with pixel-level labels is expensive and time consuming. Existing domain adaptation techniques either work on limited datasets, or yield not so good performance compared with supervised learning. In this paper, we propose a novel bidirectional learning framework for domain adaptation of segmentation. Using the bidirectional learning, the image translation model and the segmentation adaptation model can be learned alternatively and promote to each other. Furthermore, we propose a self-supervised learning algorithm to learn a better segmentation adaptation model and in return improve the image translation model. Experiments show that our method is superior to the state-of-the-art methods in domain adaptation of segmentation with a big margin. The source code is available at https://github.com/liyunsheng13/BDL.
translated by 谷歌翻译
无监督的域适应(UDA)旨在使源域上培训的模型适应到新的目标域,其中没有可用标记的数据。在这项工作中,我们调查从合成计算机生成的域的UDA的问题,以用于学习语义分割的类似但实际的域。我们提出了一种与UDA的一致性正则化方法结合的语义一致的图像到图像转换方法。我们克服了将合成图像转移到真实的图像的先前限制。我们利用伪标签来学习生成的图像到图像转换模型,该图像到图像转换模型从两个域上的语义标签接收额外的反馈。我们的方法优于最先进的方法,将图像到图像转换和半监督学习与相关域适应基准,即Citycapes和Synthia上的CutyCapes和Synthia进行了全面的学习。
translated by 谷歌翻译
本文提出FogAdapt,一种用于密集有雾场景的语义细分域的新方法。虽然已经针对显着的研究来减少语义分割中的域移位,但对具有恶劣天气条件的场景的适应仍然是一个开放的问题。由于天气状况,如雾,烟雾和雾度,加剧了域移位的场景的可见性,从而使得在这种情况下进行了无监督的适应性。我们提出了一种自熵和多尺度信息增强的自我监督域适应方法(FOGADAPT),以最大限度地减少有雾场景分割的域移位。由经验证据支持,雾密度的增加导致分割概率的高自熵性,我们引入了基于自熵的损耗功能来引导适应方法。此外,在不同的图像尺度上获得的推论由不确定性组合并加权,以生成目标域的尺度不变伪标签。这些规模不变的伪标签对可见性和比例变化具有鲁棒性。我们在真正的雾景场景中评估了真正的清晰天气场景模型,适应和综合非雾图像到真正的雾场景适应情景。我们的实验表明,FogAdapt在有雾图像的语义分割中的目前最先进的情况下显着优异。具体而言,通过考虑标准设置与最先进的(SOTA)方法相比,FogaDATK在Foggy苏黎世上获得3.8%,有雾的驾驶密集为6.0%,而在Miou的雾化驾驶的3.6%,在Miou,在MiOOP中改编为有雾的苏黎世。
translated by 谷歌翻译
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
translated by 谷歌翻译
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. It can save the cost of manually labeling data in real-world applications such as robot vision and autonomous driving. Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation. However, such an assumption does not always hold in practice owing to the collection difficulty and the scarcity of the data. Thus, we aim to relieve this need on a large number of real data, and explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization (OSDG) problem, where only one real-world data sample is available. To remedy the limited real data knowledge, we first construct the pseudo-target domain by stylizing the simulated data with the one-shot real data. To mitigate the sim-to-real domain gap on both the style and spatial structure level and facilitate the sim-to-real adaptation, we further propose to use class-aware cross-domain transformers with an intermediate domain randomization strategy to extract the domain-invariant knowledge, from both the simulated and pseudo-target data. We demonstrate the effectiveness of our approach for OSUDA and OSDG on different benchmarks, outperforming the state-of-the-art methods by a large margin, 10.87, 9.59, 13.05 and 15.91 mIoU on GTA, SYNTHIA$\rightarrow$Cityscapes, Foggy Cityscapes, respectively.
translated by 谷歌翻译
Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss respectively. We demonstrate state-of-theart performance in semantic segmentation on two challenging "synthetic-2-real" set-ups 1 and show that the approach can also be used for detection.
translated by 谷歌翻译
在本文中,我们研究了合成到现实域的广义语义分割的任务,该任务旨在学习一个仅使用合成数据的现实场景的强大模型。合成数据和现实世界数据之间的大域移动,包括有限的源环境变化以及合成和现实世界数据之间的较大分布差距,极大地阻碍了看不见的现实现实场景中的模型性能。在这项工作中,我们建议使用样式挂钩的双重一致性学习(Shad)框架来处理此类域转移。具体而言,阴影是基于两个一致性约束,样式一致性(SC)和回顾一致性(RC)构建的。 SC丰富了来源情况,并鼓励模型在样式多样化样本中学习一致的表示。 RC利用现实世界的知识来防止模型过度拟合到合成数据,因此在很大程度上使综合模型和现实世界模型之间的表示一致。此外,我们提出了一个新颖的样式幻觉模块(SHM),以生成对一致性学习至关重要的样式变化样本。 SHM从源分布中选择基本样式,使模型能够在训练过程中动态生成多样化和现实的样本。实验表明,我们的阴影在单个和多源设置上的三个现实世界数据集的平均MIOU的平均MIOU的平均MIOU的平均水平分别优于最先进的方法,并优于最先进的方法。
translated by 谷歌翻译
We consider the problem of unsupervised domain adaptation in semantic segmentation. A key in this campaign consists in reducing the domain shift, i.e., enforcing the data distributions of the two domains to be similar. One of the common strategies is to align the marginal distribution in the feature space through adversarial learning. However, this global alignment strategy does not consider the category-level joint distribution. A possible consequence of such global movement is that some categories which are originally well aligned between the source and target may be incorrectly mapped, thus leading to worse segmentation results in target domain. To address this problem, we introduce a category-level adversarial network, aiming to enforce local semantic consistency during the trend of global alignment. Our idea is to take a close look at the category-level joint distribution and align each class with an adaptive adversarial loss. Specifically, we reduce the weight of the adversarial loss for category-level aligned features while increasing the adversarial force for those poorly aligned. In this process, we decide how well a feature is category-level aligned between source and target by a co-training approach. In two domain adaptation tasks, i.e., GTA5 → Cityscapes and SYN-THIA → Cityscapes, we validate that the proposed method matches the state of the art in segmentation accuracy.
translated by 谷歌翻译
无监督的域适应性(UDA)旨在使标记的源域的模型适应未标记的目标域。现有的基于UDA的语义细分方法始终降低像素级别,功能级别和输出级别的域移动。但是,几乎所有这些都在很大程度上忽略了上下文依赖性,该依赖性通常在不同的领域共享,从而导致较不怀疑的绩效。在本文中,我们提出了一个新颖的环境感知混音(camix)框架自适应语义分割的框架,该框架以完全端到端的可训练方式利用了上下文依赖性的这一重要线索作为显式的先验知识,以增强对适应性的适应性目标域。首先,我们通过利用积累的空间分布和先前的上下文关系来提出上下文掩盖的生成策略。生成的上下文掩码在这项工作中至关重要,并将指导三个不同级别的上下文感知域混合。此外,提供了背景知识,我们引入了重要的一致性损失,以惩罚混合学生预测与混合教师预测之间的不一致,从而减轻了适应性的负面转移,例如早期绩效降级。广泛的实验和分析证明了我们方法对广泛使用的UDA基准的最新方法的有效性。
translated by 谷歌翻译
语义分割在广泛的计算机视觉应用中起着基本作用,提供了全球对图像​​的理解的关键信息。然而,最先进的模型依赖于大量的注释样本,其比在诸如图像分类的任务中获得更昂贵的昂贵的样本。由于未标记的数据替代地获得更便宜,因此无监督的域适应达到了语义分割社区的广泛成功并不令人惊讶。本调查致力于总结这一令人难以置信的快速增长的领域的五年,这包含了语义细分本身的重要性,以及将分段模型适应新环境的关键需求。我们提出了最重要的语义分割方法;我们对语义分割的域适应技术提供了全面的调查;我们揭示了多域学习,域泛化,测试时间适应或无源域适应等较新的趋势;我们通过描述在语义细分研究中最广泛使用的数据集和基准测试来结束本调查。我们希望本调查将在学术界和工业中提供具有全面参考指导的研究人员,并有助于他们培养现场的新研究方向。
translated by 谷歌翻译
深度学习极大地提高了语义细分的性能,但是,它的成功依赖于大量注释的培训数据的可用性。因此,许多努力致力于域自适应语义分割,重点是将语义知识从标记的源域转移到未标记的目标域。现有的自我训练方法通常需要多轮训练,而基于对抗训练的另一个流行框架已知对超参数敏感。在本文中,我们提出了一个易于训练的框架,该框架学习了域自适应语义分割的域不变原型。特别是,我们表明域的适应性与很少的学习共享一个共同的角色,因为两者都旨在识别一些从大量可见数据中学到的知识的看不见的数据。因此,我们提出了一个统一的框架,用于域适应和很少的学习。核心思想是使用从几个镜头注释的目标图像中提取的类原型来对源图像和目标图像的像素进行分类。我们的方法仅涉及一个阶段训练,不需要对大规模的未经通知的目标图像进行培训。此外,我们的方法可以扩展到域适应性和几乎没有射击学习的变体。关于适应GTA5到CITYSCAPES和合成景观的实验表明,我们的方法实现了对最先进的竞争性能。
translated by 谷歌翻译