During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is a core task of various emerging industrial applications such as autonomous driving and medical imaging. However, training CNNs requires a huge amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNN models on photo-realistic synthetic data with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data significantly decreases the models' performance. Hence we propose a curriculum-style learning approach to minimize the domain gap in semantic segmentation. The curriculum domain adaptation solves easy tasks first in order to infer some necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban traffic scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train the segmentation network in such a way that the network predictions in the target domain follow those inferred properties. In experiments, our method significantly outperforms the baselines as well as the only known existing approach to the same problem.
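The idea of making target-domain predictions "follow" an inferred global label distribution can be sketched as follows. This is only an illustration under stated assumptions: the cross-entropy form of the penalty, the function names, and the flat list-of-pixels layout are not given in the abstract, and plain Python lists stand in for network outputs.

```python
import math

def label_distribution(pixel_probs):
    """Average per-pixel class probabilities into one image-level distribution."""
    n = len(pixel_probs)
    num_classes = len(pixel_probs[0])
    return [sum(p[c] for p in pixel_probs) / n for c in range(num_classes)]

def distribution_loss(inferred_dist, pixel_probs, eps=1e-12):
    """Cross-entropy between an inferred label distribution and the one
    implied by the network's predictions (assumed form of the penalty)."""
    predicted = label_distribution(pixel_probs)
    return -sum(t * math.log(q + eps) for t, q in zip(inferred_dist, predicted))

# Hypothetical two-class image whose inferred label distribution is half/half.
inferred = [0.5, 0.5]
matching = [[0.9, 0.1], [0.1, 0.9]]      # predictions agree with the inferred mix
mismatched = [[0.9, 0.1], [0.9, 0.1]]    # predictions over-use class 0
print(distribution_loss(inferred, matching) < distribution_loss(inferred, mismatched))
```

The penalty is lowest when the averaged prediction matches the inferred distribution, so it can act as a soft constraint on unlabeled target images.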
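We propose to exploit the potential of simulation for the semantic segmentation of real-world self-driving scenes in a domain generalization fashion. The segmentation network is trained without any data from the target domains and tested on the unseen target domains. To this end, we propose a new approach of domain randomization and pyramid consistency to learn a model with high generalizability. First, we propose to randomize the synthetic images in terms of visual appearance using auxiliary datasets, in order to effectively learn domain-invariant representations. Second, we further enforce pyramid consistency across different "stylized" images and within an image, in order to learn domain-invariant and scale-invariant features, respectively. Extensive experiments are conducted on the generalization from GTA and SYNTHIA to Cityscapes, BDD, and Mapillary, and our method achieves superior results over the state-of-the-art techniques. Remarkably, our generalization results are on par with or even better than those obtained by state-of-the-art simulation-to-real domain adaptation methods, which access the target domain data at training time.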
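Semantic segmentation plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. Yet state-of-the-art models rely on large amounts of annotated samples, which are more expensive to obtain than in tasks such as image classification. Since unlabeled data is instead significantly cheaper to obtain, it is not surprising that unsupervised domain adaptation has reached broad success within the semantic segmentation community. This survey is an effort to summarize five years of this incredibly rapidly growing field, which embraces the importance of semantic segmentation itself and the critical need to adapt segmentation models to new environments. We present the most important semantic segmentation methods; we provide a comprehensive survey of domain adaptation techniques for semantic segmentation; we unveil newer trends such as multi-domain learning, domain generalization, test-time adaptation, and source-free domain adaptation; we conclude this survey by describing the datasets and benchmarks most widely used in semantic segmentation research. We hope that this survey will provide researchers across academia and industry with a comprehensive reference guide and will help them foster new research directions in the field.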
Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground truth labels to the target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities between the source and target domains, we adopt adversarial learning in the output space. To further enhance the adapted model, we construct a multi-level adversarial network to effectively perform output space domain adaptation at different feature levels. Extensive experiments and ablation studies are conducted under various domain adaptation settings, including synthetic-to-real and cross-city scenarios. We show that the proposed method performs favorably against the state-of-the-art methods in terms of accuracy and visual quality.
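Although there has been significant progress in supervised semantic segmentation, it remains challenging to deploy segmentation models to unseen domains because of domain bias. Domain adaptation can help by transferring knowledge from a labeled source domain to an unlabeled target domain. Previous methods typically attempt to adapt on global features; however, they often ignore the local semantic affiliation of each pixel in the feature space, resulting in less discriminability. To solve this issue, we propose a novel semantic prototype-based contrastive learning framework for fine-grained class alignment. Specifically, the semantic prototypes provide a supervisory signal for per-pixel discriminative representation learning, and each pixel of the source and target domains in the feature space is required to reflect the content of the corresponding semantic prototype. In this way, our framework is able to explicitly pull pixel representations of the same category closer together and push those of different categories further apart, which improves the robustness of the segmentation model and alleviates the domain shift problem. Compared with state-of-the-art methods, our method is easy to implement and attains superior results, as demonstrated by numerous experiments. Code is publicly available at [this https URL](https://github.com/binhuixie/spcl).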
Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging "synthetic-2-real" set-ups and show that the approach can also be used for detection.
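A loss based on the entropy of pixel-wise predictions can be illustrated with a small sketch. It uses only the Python standard library, and several details are assumptions rather than statements from the abstract: the normalization by log(C), the `[H][W][C]` nested-list layout, and all function names.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_entropy(probs, eps=1e-12):
    """Shannon entropy of one pixel's class distribution, scaled to [0, 1]
    by dividing by log(C) (the scaling is an assumption of this sketch)."""
    h = -sum(p * math.log(p + eps) for p in probs)
    return h / math.log(len(probs))

def entropy_loss(logit_map):
    """Mean normalized entropy over all pixels of an [H][W][C] logit map."""
    entropies = [pixel_entropy(softmax(px)) for row in logit_map for px in row]
    return sum(entropies) / len(entropies)

# Hypothetical 1x2 "images" with 3 classes.
confident = [[[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]]
uncertain = [[[0.0, 0.0, 0.0], [0.1, 0.0, 0.1]]]
print(entropy_loss(confident) < entropy_loss(uncertain))
```

Minimizing such a loss on unlabeled target images pushes the network toward confident predictions there, which is the intuition behind the direct entropy-minimization variant.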
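This paper presents FogAdapt, a novel approach for domain adaptation of semantic segmentation for dense foggy scenes. Although significant research has been directed at reducing the domain shift in semantic segmentation, adaptation to scenes with adverse weather conditions remains an open question. The effect of weather conditions such as fog, smog, and haze on scene visibility exacerbates the domain shift, thus making unsupervised adaptation in such scenarios challenging. We propose a self-entropy and multi-scale information augmented self-supervised domain adaptation method (FogAdapt) to minimize the domain shift in foggy scene segmentation. Supported by the empirical evidence that an increase in fog density results in high self-entropy of the segmentation probabilities, we introduce a self-entropy based loss function to guide the adaptation method. Furthermore, inferences obtained at different image scales are combined and weighted by their uncertainty to generate scale-invariant pseudo-labels for the target domain. These scale-invariant pseudo-labels are robust to visibility and scale variations. We evaluate the proposed model on real clear-weather scenes to real foggy scenes adaptation and synthetic non-foggy images to real foggy scenes adaptation scenarios. Our experiments demonstrate that FogAdapt significantly outperforms the current state of the art in semantic segmentation of foggy images. Specifically, under the standard settings and compared with state-of-the-art (SOTA) methods, FogAdapt gains 3.8% on Foggy Zurich, 6.0% on Foggy Driving-dense, and 3.6% on Foggy Driving in mIoU when adapted to Foggy Zurich.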
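Deep learning has greatly improved the performance of semantic segmentation; however, its success relies on the availability of large amounts of annotated training data. Hence, many efforts have been devoted to domain adaptive semantic segmentation, which focuses on transferring semantic knowledge from a labeled source domain to an unlabeled target domain. Existing self-training methods typically require multiple rounds of training, while another popular framework based on adversarial training is known to be sensitive to hyper-parameters. In this paper, we present an easy-to-train framework that learns domain-invariant prototypes for domain adaptive semantic segmentation. In particular, we show that domain adaptation shares a common character with few-shot learning, in that both aim to recognize some types of unseen data with knowledge learned from large amounts of seen data. Thus, we propose a unified framework for domain adaptation and few-shot learning. The core idea is to use the class prototypes extracted from few-shot annotated target images to classify pixels of both source images and target images. Our method involves only one-stage training and does not need to be trained on large-scale unannotated target images. Moreover, our method can be extended to variants of both domain adaptation and few-shot learning. Experiments on adapting GTA5 to Cityscapes and SYNTHIA to Cityscapes show that our method achieves performance competitive with the state of the art.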
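Classifying pixels with class prototypes can be sketched in a few lines. The sketch assumes that a prototype is the mean feature of the annotated pixels of a class and that pixels are assigned by cosine similarity; neither choice is stated in the abstract, and the feature vectors and class names below are hypothetical.

```python
import math

def class_prototypes(features, labels):
    """Average the feature vectors of the labeled pixels of each class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def classify(feature, prototypes):
    """Assign a pixel feature to the class with the most similar prototype."""
    return max(prototypes, key=lambda y: cosine(feature, prototypes[y]))

# Hypothetical 2-D features from a few annotated target pixels.
feats = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labels = ["road", "road", "car", "car"]
protos = class_prototypes(feats, labels)
print(classify([0.7, 0.3], protos))  # prints "road"
```

Because the same prototypes score pixels from either domain, the classifier itself carries no domain-specific parameters in this simplified view.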
Recent deep networks achieved state-of-the-art performance on a variety of semantic segmentation tasks. Despite such progress, these models often face challenges in real world "wild tasks" where a large difference between labeled training/source data and unseen test/target data exists. In particular, such difference is often referred to as "domain gap", and could cause significantly decreased performance which cannot be easily remedied by further increasing the representation power. Unsupervised domain adaptation (UDA) seeks to overcome such problem without target domain labels. In this paper, we propose a novel UDA framework based on an iterative self-training (ST) procedure, where the problem is formulated as latent variable loss minimization, and can be solved by alternately generating pseudo labels on target data and re-training the model with these labels. On top of ST, we also propose a novel class-balanced self-training (CBST) framework to avoid the gradual dominance of large classes on pseudo-label generation, and introduce spatial priors to refine generated labels. Comprehensive experiments show that the proposed methods achieve state-of-the-art semantic segmentation performance under multiple major UDA settings.
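The class-balancing idea in pseudo-label generation can be sketched as below. This is a simplified illustration, not the procedure from the paper: it assumes that each class gets its own confidence threshold chosen so that a fixed portion of the pixels predicted as that class is kept, it omits spatial priors, and the function name and `portion` parameter are hypothetical.

```python
def class_balanced_pseudo_labels(probs, portion=0.5, ignore_label=-1):
    """probs: list of per-pixel class-probability lists.
    Returns one pseudo label per pixel, or ignore_label for unselected pixels."""
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    confs = [p[k] for p, k in zip(probs, preds)]

    # Per-class threshold: keep the top `portion` of pixels predicted as each class.
    thresholds = {}
    for k in set(preds):
        class_confs = sorted((c for c, q in zip(confs, preds) if q == k), reverse=True)
        n_keep = max(1, int(portion * len(class_confs)))
        thresholds[k] = class_confs[n_keep - 1]

    return [k if c >= thresholds[k] else ignore_label for k, c in zip(preds, confs)]

# Class 0 is large and confident; class 1 is small and less confident.
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.45, 0.55], [0.48, 0.52]]
print(class_balanced_pseudo_labels(probs))  # prints [0, 0, -1, -1, 1, -1]
```

With a single global threshold of, say, 0.6, every selected pixel in this toy example would belong to class 0; per-class thresholds keep the smaller class represented among the pseudo labels.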
Figure 1. The SYNTHIA Dataset. A sample frame (left) with its semantic labels (center) and a general view of the city (right).
In this work, we present a method for unsupervised domain adaptation. Many adversarial learning methods train domain classifier networks to distinguish the features as either a source or target and train a feature generator network to mimic the discriminator. Two problems exist with these methods. First, the domain classifier only tries to distinguish the features as a source or target and thus does not consider task-specific decision boundaries between classes. Therefore, a trained generator can generate ambiguous features near class boundaries. Second, these methods aim to completely match the feature distributions between different domains, which is difficult because of each domain's characteristics. To solve these problems, we introduce a new approach that attempts to align distributions of source and target by utilizing the task-specific decision boundaries. We propose to maximize the discrepancy between two classifiers' outputs to detect target samples that are far from the support of the source. A feature generator learns to generate target features near the support to minimize the discrepancy. Our method outperforms other methods on several datasets of image classification and semantic segmentation. The codes are available at https://github.com/mil-tokyo/MCD_DA
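The discrepancy between two classifiers' outputs can be made concrete with a short sketch. It assumes the discrepancy is the mean absolute difference between the two class-probability vectors; the abstract does not specify the measure, and the function names here are illustrative.

```python
def discrepancy(p1, p2):
    """Mean absolute difference between two class-probability vectors."""
    return sum(abs(a - b) for a, b in zip(p1, p2)) / len(p1)

def batch_discrepancy(outputs1, outputs2):
    """Average discrepancy over a batch of target samples."""
    pairs = list(zip(outputs1, outputs2))
    return sum(discrepancy(a, b) for a, b in pairs) / len(pairs)

# Hypothetical softmax outputs of two classifiers on two target samples.
clf1 = [[0.9, 0.1], [0.9, 0.1]]
clf2 = [[0.9, 0.1], [0.2, 0.8]]
print(discrepancy(clf1[0], clf2[0]))  # classifiers agree on the first sample
print(discrepancy(clf1[1], clf2[1]))  # they disagree on the second sample
```

In the scheme the abstract describes, the two classifiers would be updated to increase this quantity on target samples while the feature generator is updated to decrease it; a sample with a large value is one the classifiers place on different sides of a decision boundary.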
TU Dresden. www.cityscapes-dataset.net. train/val: fine annotation, 3475 images; train: coarse annotation, 20 000 images; test: fine annotation, 1525 images.
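Domain adaptation for semantic segmentation across datasets consisting of the same categories has seen several recent successes. However, a more general scenario is when the source and target datasets correspond to non-overlapping label spaces. For example, categories in segmentation datasets change vastly depending on the type of environment or application, yet share many valuable semantic relations. Existing approaches based on feature alignment or discrepancy minimization do not take such category shift into account. In this work, we present Cluster-to-Adapt (C2A), a computationally efficient clustering-based approach for domain adaptation across segmentation datasets with completely different, but possibly related, categories. We show that such a clustering objective enforced in a transformed feature space serves to automatically select categories across source and target domains that can be aligned to improve the target performance, while preventing negative transfer for unrelated categories. We demonstrate the effectiveness of our approach through experiments on the challenging problem of outdoor to indoor adaptation in few-shot as well as zero-shot settings, with consistent improvements in performance over existing approaches and baselines in all cases.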
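Traditional domain adaptive semantic segmentation addresses the task of adapting a model to a novel target domain under limited or no additional supervision. While tackling the input domain gap, the standard domain adaptation settings assume no domain change in the output space. In semantic prediction tasks, different datasets are often labeled according to different semantic taxonomies. In many real-world settings, the target domain task requires a different taxonomy than the one imposed by the source domain. We therefore introduce the more general taxonomy adaptive cross-domain semantic segmentation (TACS) problem, allowing for inconsistent taxonomies between the two domains. We further propose an approach that jointly addresses the image-level and label-level domain adaptation. On the label level, we employ a bilateral mixed sampling strategy to augment the target domain, and a relabelling method to unify and align the label spaces. We address the image-level domain gap by proposing an uncertainty-rectified contrastive learning method, leading to more domain-invariant and class-discriminative features. We extensively evaluate the effectiveness of our framework under different TACS settings: open taxonomy, coarse-to-fine taxonomy, and implicitly-overlapping taxonomy. Our approach outperforms the previous state of the art by a large margin, while being capable of adapting to target taxonomies. Our implementation is publicly available at https://github.com/ethruigong/tada.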
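Level 5 autonomy for self-driving cars requires a robust visual perception system that can parse input images under any visual condition. However, existing semantic segmentation datasets are either dominated by images captured under normal conditions or are small in scale. To address this, we introduce ACDC, the Adverse Conditions Dataset with Correspondences for training and testing semantic segmentation methods on adverse visual conditions. ACDC consists of a large set of 4006 images which are equally distributed between four common adverse conditions: fog, nighttime, rain, and snow. Each adverse-condition image comes with a high-quality fine pixel-level semantic annotation, a corresponding image of the same scene taken under normal conditions, and a binary mask that distinguishes between intra-image regions of clear and uncertain semantic content. Thus, ACDC supports both standard semantic segmentation and the newly introduced uncertainty-aware semantic segmentation. A detailed empirical study demonstrates the challenges that ACDC poses to state-of-the-art supervised and unsupervised approaches and indicates the value of our dataset in steering future progress in the field. Our dataset and benchmark are publicly available.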
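Understanding foggy image sequences in driving scenes is critical for autonomous driving, but it remains a challenging task due to the difficulty of collecting and annotating real-world images of adverse weather. Recently, the self-training strategy has been considered a powerful solution for unsupervised domain adaptation, which iteratively adapts the model from the source domain to the target domain by generating target pseudo labels and re-training the model. However, the selection of confident pseudo labels inevitably suffers from the conflict between sparsity and accuracy, both of which lead to suboptimal models. To tackle this problem, we exploit the characteristics of foggy image sequences of driving scenes to densify the confident pseudo labels. Specifically, based on the two discoveries of local spatial similarity and adjacent temporal correspondence of the sequential image data, we propose a novel Target-Domain driven pseudo label Diffusion (TDo-Dif) scheme. It employs superpixels and optical flows to identify the spatial similarity and temporal correspondence, respectively, and then diffuses the confident but sparse pseudo labels within a superpixel or a temporally corresponding pair linked by the flow. Moreover, to ensure the feature similarity of the diffused pixels, we introduce a local spatial similarity loss and a temporal contrastive loss in the model re-training stage. Experimental results show that our TDo-Dif scheme helps the adaptive model achieve 51.92% and 53.84% mean intersection-over-union (mIoU) on two publicly available natural foggy datasets (Foggy Zurich and Foggy Driving), which exceeds the state-of-the-art unsupervised domain adaptive semantic segmentation methods. Models and data can be found at https://github.com/velor2012/tdo-dif.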
In this work, we connect two distinct concepts for unsupervised domain adaptation: feature distribution alignment between domains by utilizing the task-specific decision boundary [58] and the Wasserstein metric [73]. Our proposed sliced Wasserstein discrepancy (SWD) is designed to capture the natural notion of dissimilarity between the outputs of task-specific classifiers. It provides a geometrically meaningful guidance to detect target samples that are far from the support of the source and enables efficient distribution alignment in an end-to-end trainable fashion. In the experiments, we validate the effectiveness and genericness of our method on digit and sign recognition, image classification, semantic segmentation, and object detection.
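A sliced Wasserstein discrepancy between the outputs of two classifiers can be sketched with the standard library alone. The sketch rests on assumptions not stated in the abstract: a squared one-dimensional transport cost, Gaussian random projection directions, and a hypothetical function name, seed, and default number of projections.

```python
import math
import random

def sliced_wasserstein_discrepancy(P1, P2, num_projections=128, seed=0):
    """P1, P2: equally sized lists of class-probability vectors, one per sample,
    produced by two classifiers on the same batch."""
    rng = random.Random(seed)
    dim = len(P1[0])
    total = 0.0
    for _ in range(num_projections):
        # Random unit direction drawn from an isotropic Gaussian.
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project each output set onto the direction and sort the 1-D values.
        proj1 = sorted(sum(t * x for t, x in zip(theta, p)) for p in P1)
        proj2 = sorted(sum(t * x for t, x in zip(theta, p)) for p in P2)
        # 1-D optimal transport between equal-size samples pairs sorted values.
        total += sum((a - b) ** 2 for a, b in zip(proj1, proj2)) / len(proj1)
    return total / num_projections

P = [[0.9, 0.1], [0.8, 0.2]]
Q = [[0.1, 0.9], [0.2, 0.8]]
print(sliced_wasserstein_discrepancy(P, P))        # prints 0.0
print(sliced_wasserstein_discrepancy(P, Q) > 0.0)  # prints True
```

Sorting the projected values is what makes each one-dimensional transport problem cheap to solve, and it also means the measure compares the two output distributions as sets rather than sample by sample.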
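Unsupervised domain adaptation (UDA) aims to adapt a model trained on a labeled source domain to an unlabeled target domain. In this paper, we present Prototypical Contrast Adaptation (ProCA), a simple and efficient contrastive learning method for unsupervised domain adaptive semantic segmentation. Previous domain adaptation methods merely consider the alignment of the intra-class representational distributions across domains, while the inter-class structural relationship is insufficiently explored, so the aligned representations on the target domain may not be as easily discriminated as those on the source domain. Instead, ProCA incorporates inter-class information into class-wise prototypes and adopts class-centered distribution alignment for adaptation. By treating the same-class prototypes as positives and the other class prototypes as negatives to achieve class-centered distribution alignment, ProCA achieves state-of-the-art performance on classical domain adaptation tasks, including SYNTHIA → Cityscapes. Code is available at https://github.com/jiangzhengkai/proca.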
We consider the problem of unsupervised domain adaptation in semantic segmentation. A key in this campaign consists in reducing the domain shift, i.e., enforcing the data distributions of the two domains to be similar. One of the common strategies is to align the marginal distribution in the feature space through adversarial learning. However, this global alignment strategy does not consider the category-level joint distribution. A possible consequence of such global movement is that some categories which are originally well aligned between the source and target may be incorrectly mapped, thus leading to worse segmentation results in target domain. To address this problem, we introduce a category-level adversarial network, aiming to enforce local semantic consistency during the trend of global alignment. Our idea is to take a close look at the category-level joint distribution and align each class with an adaptive adversarial loss. Specifically, we reduce the weight of the adversarial loss for category-level aligned features while increasing the adversarial force for those poorly aligned. In this process, we decide how well a feature is category-level aligned between source and target by a co-training approach. In two domain adaptation tasks, i.e., GTA5 → Cityscapes and SYNTHIA → Cityscapes, we validate that the proposed method matches the state of the art in segmentation accuracy.
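Benefiting from considerable pixel-level annotations collected from a specific situation (source), a trained semantic segmentation model performs quite well but fails in a new situation (target) due to the large domain shift. To mitigate the domain gap, previous cross-domain semantic segmentation methods always assume the co-existence of source data and target data during domain alignment. However, accessing source data in real scenarios may raise privacy concerns and violate intellectual property. To tackle this problem, we focus on an interesting and challenging cross-domain semantic segmentation task where only the trained source model is provided to the target domain. Specifically, we propose a unified framework called ATP, which consists of three schemes, i.e., feature alignment, bidirectional teaching, and information propagation. First, we devise a curriculum-style entropy minimization objective to implicitly align the target features with unseen source features via the provided source model. Second, besides the positive pseudo labels of vanilla self-training, we are the first to introduce negative pseudo labels to this field and develop a bidirectional self-training strategy to enhance representation learning in the target domain. Finally, the information propagation scheme is employed to further reduce the intra-domain discrepancy within the target domain via pseudo semi-supervised learning. Extensive results on synthetic-to-real and cross-city driving datasets validate that ATP yields state-of-the-art performance, even compared with methods that need access to source data.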