本文提出FogAdapt,一种用于密集有雾场景的语义细分域的新方法。虽然已经针对显着的研究来减少语义分割中的域移位,但对具有恶劣天气条件的场景的适应仍然是一个开放的问题。由于天气状况,如雾,烟雾和雾度,加剧了域移位的场景的可见性,从而使得在这种情况下进行了无监督的适应性。我们提出了一种自熵和多尺度信息增强的自我监督域适应方法(FOGADAPT),以最大限度地减少有雾场景分割的域移位。由经验证据支持,雾密度的增加导致分割概率的高自熵性,我们引入了基于自熵的损耗功能来引导适应方法。此外,在不同的图像尺度上获得的推论由不确定性组合并加权,以生成目标域的尺度不变伪标签。这些规模不变的伪标签对可见性和比例变化具有鲁棒性。我们在真正的雾景场景中评估了真正的清晰天气场景模型,适应和综合非雾图像到真正的雾场景适应情景。我们的实验表明,FogAdapt在有雾图像的语义分割中的目前最先进的情况下显着优异。具体而言,通过考虑标准设置与最先进的(SOTA)方法相比,FogaDATK在Foggy苏黎世上获得3.8%,有雾的驾驶密集为6.0%,而在Miou的雾化驾驶的3.6%,在Miou,在MiOOP中改编为有雾的苏黎世。
translated by 谷歌翻译
本文提出了一种新颖的像素级分布正则化方案(DRSL),用于自我监督的语义分割域的适应性。在典型的环境中,分类损失迫使语义分割模型贪婪地学习捕获类间变化的表示形式,以确定决策(类)边界。由于域的转移,该决策边界在目标域中未对齐,从而导致嘈杂的伪标签对自我监督域的适应性产生不利影响。为了克服这一限制,以及捕获阶层间变化,我们通过类感知的多模式分布学习(MMDL)捕获了像素级内的类内变化。因此,捕获阶层内变化所需的信息与阶层间歧视所需的信息明确分开。因此,捕获的功能更具信息性,导致伪噪声低的伪标记。这种分离使我们能够使用前者的基于跨凝结的自学习,在判别空间和多模式分布空间中进行单独的对齐。稍后,我们通过明确降低映射到同一模式的目标和源像素之间的距离来提出一种新型的随机模式比对方法。距离度量标签上计算出的距离度量损失,并从多模式建模头部反向传播,充当与分割头共享的基本网络上的正常化程序。关于合成到真实域的适应设置的全面实验的结果,即GTA-V/Synthia to CityScapes,表明DRSL的表现优于许多现有方法(MIOU的最小余量为2.3%和2.5%,用于MIOU,而合成的MIOU到CityScapes)。
translated by 谷歌翻译
了解驾驶场景中的雾图像序列对于自主驾驶至关重要,但是由于难以收集和注释不利天气的现实世界图像,这仍然是一项艰巨的任务。最近,自我训练策略被认为是无监督域适应的强大解决方案,通过生成目标伪标签并重新训练模型,它迭代地将模型从源域转化为目标域。但是,选择自信的伪标签不可避免地会遭受稀疏与准确性之间的冲突,这两者都会导致次优模型。为了解决这个问题,我们利用了驾驶场景的雾图图像序列的特征,以使自信的伪标签致密。具体而言,基于顺序图像数据的局部空间相似性和相邻时间对应的两个发现,我们提出了一种新型的目标域驱动的伪标签扩散(TDO-DIF)方案。它采用超像素和光学流来识别空间相似性和时间对应关系,然后扩散自信但稀疏的伪像标签,或者是由流量链接的超像素或时间对应对。此外,为了确保扩散像素的特征相似性,我们在模型重新训练阶段引入了局部空间相似性损失和时间对比度损失。实验结果表明,我们的TDO-DIF方案有助于自适应模型在两个公共可用的天然雾化数据集(超过雾气的Zurich and Forggy驾驶)上实现51.92%和53.84%的平均跨工会(MIOU),这超过了最态度ART无监督的域自适应语义分割方法。可以在https://github.com/velor2012/tdo-dif上找到模型和数据。
translated by 谷歌翻译
Recent deep networks achieved state of the art performance on a variety of semantic segmentation tasks. Despite such progress, these models often face challenges in real world "wild tasks" where large difference between labeled training/source data and unseen test/target data exists. In particular, such difference is often referred to as "domain gap", and could cause significantly decreased performance which cannot be easily remedied by further increasing the representation power. Unsupervised domain adaptation (UDA) seeks to overcome such problem without target domain labels. In this paper, we propose a novel UDA framework based on an iterative self-training (ST) procedure, where the problem is formulated as latent variable loss minimization, and can be solved by alternatively generating pseudo labels on target data and re-training the model with these labels. On top of ST, we also propose a novel classbalanced self-training (CBST) framework to avoid the gradual dominance of large classes on pseudo-label generation, and introduce spatial priors to refine generated labels. Comprehensive experiments show that the proposed methods achieve state of the art semantic segmentation performance under multiple major UDA settings.⋆ indicates equal contribution.
translated by 谷歌翻译
Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss respectively. We demonstrate state-of-theart performance in semantic segmentation on two challenging "synthetic-2-real" set-ups 1 and show that the approach can also be used for detection.
translated by 谷歌翻译
在本文中,我们介绍了全景语义细分,该分段以整体方式提供了对周围环境的全景和密集的像素的理解。由于两个关键的挑战,全景分割尚未探索:(1)全景上的图像扭曲和对象变形; (2)缺乏培训全景分段的注释。为了解决这些问题,我们提出了一个用于全景语义细分(Trans4Pass)体系结构的变压器。首先,为了增强失真意识,Trans4Pass配备了可变形的贴片嵌入(DPE)和可变形的MLP(DMLP)模块,能够在适应之前(适应之前或之后)和任何地方(浅层或深度级别的(浅层或深度))和图像变形(通过任何涉及(浅层或深层))和图像变形(通过任何地方)和图像变形设计。我们进一步介绍了升级后的Trans4Pass+模型,其中包含具有平行令牌混合的DMLPV2,以提高建模歧视性线索的灵活性和概括性。其次,我们提出了一种无监督域适应性的相互典型适应(MPA)策略。第三,除了针孔到型 - 帕诺amic(PIN2PAN)适应外,我们还创建了一个新的数据集(Synpass),其中具有9,080个全景图像,以探索360 {\ deg} Imagery中的合成对真实(Syn2real)适应方案。进行了广泛的实验,这些实验涵盖室内和室外场景,并且使用PIN2PAN和SYN2REAL方案进行了研究。 Trans4Pass+在四个域自适应的全景语义分割基准上实现最先进的性能。代码可从https://github.com/jamycheung/trans4pass获得。
translated by 谷歌翻译
由于在不良视觉条件下记录的图像的密集像素级语义注释缺乏,因此对此类图像的语义分割的无监督域适应性(UDA)引起了兴趣。 UDA适应了在正常条件下训练的模型,以适应目标不利条件域。同时,多个带有驾驶场景的数据集提供了跨多个条件的相同场景的相应图像,这可以用作域适应的弱监督。我们提出了重新设计,这是对基于自训练的UDA方法的通用扩展,该方法利用了这些跨域对应关系。重新调整由两个步骤组成:(1)使用不确定性意识到的密度匹配网络将正常条件图像与相应的不良条件图像对齐,以及(2)使用自适应标签校正机制来完善不良预测,并使用正常预测。我们设计自定义模块,以简化这两个步骤,并在几个不良条件基准(包括ACDC和Dark Zurich)上设置域自适应语义分割的新技术。该方法不引入额外的训练参数,只有在训练期间最少的计算开销 - 可以用作撤离扩展,以改善任何给定的基于自我训练的UDA方法。代码可从https://github.com/brdav/refign获得。
translated by 谷歌翻译
由于难以获得地面真理标签,从虚拟世界数据集学习对于像语义分割等现实世界的应用非常关注。从域适应角度来看,关键挑战是学习输入的域名签名表示,以便从虚拟数据中受益。在本文中,我们提出了一种新颖的三叉戟架构,该架构强制执行共享特征编码器,同时满足对抗源和目标约束,从而学习域不变的特征空间。此外,我们还介绍了一种新颖的训练管道,在前向通过期间能够自我引起的跨域数据增强。这有助于进一步减少域间隙。结合自我培训过程,我们在基准数据集(例如GTA5或Synthia适应城市景观)上获得最先进的结果。Https://github.com/hmrc-ael/trideadapt提供了代码和预先训练的型号。
translated by 谷歌翻译
语义分割在广泛的计算机视觉应用中起着基本作用,提供了全球对图像​​的理解的关键信息。然而,最先进的模型依赖于大量的注释样本,其比在诸如图像分类的任务中获得更昂贵的昂贵的样本。由于未标记的数据替代地获得更便宜,因此无监督的域适应达到了语义分割社区的广泛成功并不令人惊讶。本调查致力于总结这一令人难以置信的快速增长的领域的五年,这包含了语义细分本身的重要性,以及将分段模型适应新环境的关键需求。我们提出了最重要的语义分割方法;我们对语义分割的域适应技术提供了全面的调查;我们揭示了多域学习,域泛化,测试时间适应或无源域适应等较新的趋势;我们通过描述在语义细分研究中最广泛使用的数据集和基准测试来结束本调查。我们希望本调查将在学术界和工业中提供具有全面参考指导的研究人员,并有助于他们培养现场的新研究方向。
translated by 谷歌翻译
深度学习极大地提高了语义细分的性能,但是,它的成功依赖于大量注释的培训数据的可用性。因此,许多努力致力于域自适应语义分割,重点是将语义知识从标记的源域转移到未标记的目标域。现有的自我训练方法通常需要多轮训练,而基于对抗训练的另一个流行框架已知对超参数敏感。在本文中,我们提出了一个易于训练的框架,该框架学习了域自适应语义分割的域不变原型。特别是,我们表明域的适应性与很少的学习共享一个共同的角色,因为两者都旨在识别一些从大量可见数据中学到的知识的看不见的数据。因此,我们提出了一个统一的框架,用于域适应和很少的学习。核心思想是使用从几个镜头注释的目标图像中提取的类原型来对源图像和目标图像的像素进行分类。我们的方法仅涉及一个阶段训练,不需要对大规模的未经通知的目标图像进行培训。此外,我们的方法可以扩展到域适应性和几乎没有射击学习的变体。关于适应GTA5到CITYSCAPES和合成景观的实验表明,我们的方法实现了对最先进的竞争性能。
translated by 谷歌翻译
Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground truth labels to the target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities between the source and target domains, we adopt adversarial learning in the output space. To further enhance the adapted model, we construct a multi-level adversarial network to effectively perform output space domain adaptation at different feature levels. Extensive experiments and ablation study are conducted under various domain adaptation settings, including synthetic-to-real and cross-city scenarios. We show that the proposed method performs favorably against the stateof-the-art methods in terms of accuracy and visual quality.
translated by 谷歌翻译
受益于从特定情况(源)收集的相当大的像素级注释,训练有素的语义分段模型表现得非常好,但由于大域移位而导致的新情况(目标)失败。为了缓解域间隙,先前的跨域语义分段方法始终在域对齐期间始终假设源数据和目标数据的共存。但是,在实际方案中访问源数据可能会引发隐私问题并违反知识产权。为了解决这个问题,我们专注于一个有趣和具有挑战性的跨域语义分割任务,其中仅向目标域提供训练源模型。具体地,我们提出了一种称为ATP的统一框架,其包括三种方案,即特征对准,双向教学和信息传播。首先,我们设计了课程熵最小化目标,以通过提供的源模型隐式对准目标功能与看不见的源特征。其次,除了vanilla自我训练中的正伪标签外,我们是第一个向该领域引入负伪标签的,并开发双向自我训练策略,以增强目标域中的表示学习。最后,采用信息传播方案来通过伪半监督学习进一步降低目标域内的域内差异。综合与跨城市驾驶数据集的广泛结果验证\ TextBF {ATP}产生最先进的性能,即使是需要访问源数据的方法。
translated by 谷歌翻译
Although unsupervised domain adaptation methods have achieved remarkable performance in semantic scene segmentation in visual perception for self-driving cars, these approaches remain impractical in real-world use cases. In practice, the segmentation models may encounter new data that have not been seen yet. Also, the previous data training of segmentation models may be inaccessible due to privacy problems. Therefore, to address these problems, in this work, we propose a Continual Unsupervised Domain Adaptation (CONDA) approach that allows the model to continuously learn and adapt with respect to the presence of the new data. Moreover, our proposed approach is designed without the requirement of accessing previous training data. To avoid the catastrophic forgetting problem and maintain the performance of the segmentation models, we present a novel Bijective Maximum Likelihood loss to impose the constraint of predicted segmentation distribution shifts. The experimental results on the benchmark of continual unsupervised domain adaptation have shown the advanced performance of the proposed CONDA method.
translated by 谷歌翻译
We describe a simple method for unsupervised domain adaptation, whereby the discrepancy between the source and target distributions is reduced by swapping the lowfrequency spectrum of one with the other. We illustrate the method in semantic segmentation, where densely annotated images are aplenty in one domain (e.g., synthetic data), but difficult to obtain in another (e.g., real images). Current state-of-the-art methods are complex, some requiring adversarial optimization to render the backbone of a neural network invariant to the discrete domain selection variable. Our method does not require any training to perform the domain alignment, just a simple Fourier Transform and its inverse. Despite its simplicity, it achieves state-of-the-art performance in the current benchmarks, when integrated into a relatively standard semantic segmentation model. Our results indicate that even simple procedures can discount nuisance variability in the data that more sophisticated methods struggle to learn away. 1
translated by 谷歌翻译
语义细分是智能车辆了解环境的重要任务。当前的深度学习方法需要大量的标记数据进行培训。手动注释很昂贵,而模拟器可以提供准确的注释。但是,在实际场景中应用时,使用模拟器数据训练的语义分割模型的性能将大大降低。对于语义分割的无监督域适应性(UDA)最近引起了越来越多的研究注意力,旨在减少域间隙并改善目标域的性能。在本文中,我们提出了一种新型的基于两阶段熵的UDA方法,用于语义分割。在第一阶段,我们设计了一个阈值适应的无监督局灶性损失,以使目标域中的预测正常,该预测具有轻度的梯度中和机制,并减轻了在基于熵方法中几乎没有优化硬样品的问题。在第二阶段,我们引入了一种名为跨域图像混合(CIM)的数据增强方法,以弥合两个域的语义知识。我们的方法在合成景观和gta5-to-cityscapes上使用DeepLabV2和使用轻量级的Bisenet实现了最新的58.4%和59.6%的MIOS和59.6%的Mious。
translated by 谷歌翻译
无监督的域适应性(UDA)旨在使在源域(例如合成数据)训练的模型适应目标域(例如现实世界数据),而无需对目标域进行进一步的注释。这项工作着重于语义细分的UDA,因为现实世界像素的注释尤其昂贵。由于语义分割的UDA方法通常是GPU内存密集型的,因此大多数以前的方法仅在缩小的图像上运行。我们质疑这一设计是低分辨率预测通常无法保留细节。随机作物的高分辨率图像训练的替代方法减轻了这个问题,但在捕获远程,域名上下文信息方面缺乏。因此,我们提出了针对UDA的多分辨率训练方法HRDA,结合了小型高分辨率作物的优势,以保存细分细节和大型低分辨率作物,以捕获长期的上下文依赖性和学习的规模注意力,同时又有了较高的范围。保持可管理的GPU内存足迹。 HRDA启用适应小对象并保留细分细节。对于GTA-TO-CITESCAPES,它显着提高了5.5 MIOU和合成景观的4.9 MIOU,分别导致了前所未有的73.8和65.8 miou。该实现可在https://github.com/lhoyer/hrda上获得。
translated by 谷歌翻译
无监督的域适应(UDA)旨在使源域上培训的模型适应到新的目标域,其中没有可用标记的数据。在这项工作中,我们调查从合成计算机生成的域的UDA的问题,以用于学习语义分割的类似但实际的域。我们提出了一种与UDA的一致性正则化方法结合的语义一致的图像到图像转换方法。我们克服了将合成图像转移到真实的图像的先前限制。我们利用伪标签来学习生成的图像到图像转换模型,该图像到图像转换模型从两个域上的语义标签接收额外的反馈。我们的方法优于最先进的方法,将图像到图像转换和半监督学习与相关域适应基准,即Citycapes和Synthia上的CutyCapes和Synthia进行了全面的学习。
translated by 谷歌翻译
In unsupervised domain adaptation (UDA), a model trained on source data (e.g. synthetic) is adapted to target data (e.g. real-world) without access to target annotation. Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. MIC enforces the consistency between predictions of masked target images, where random patches are withheld, and pseudo-labels that are generated based on the complete image by an exponential moving average teacher. To minimize the consistency loss, the network has to learn to infer the predictions of the masked regions from their context. Due to its simple and universal concept, MIC can be integrated into various UDA methods across different visual recognition tasks such as image classification, semantic segmentation, and object detection. MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8% on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to an improvement of +2.1 and +3.0 percent points over the previous state of the art. The implementation is available at https://github.com/lhoyer/MIC.
translated by 谷歌翻译
5级自动驾驶汽车自主权需要一个强大的视觉感知系统,可以在任何视觉条件下解析输入图像。但是,现有的语义分段数据集是由正常条件下捕获的图像主导,或者规模小。为了解决这个问题,我们引入了ACDC,具有对应于培训和测试原种视觉条件的语义分段方法的不利条件数据集。 ACDC由一组大型4006个图像组成,它在四个常见的不利条件之间同样分布:雾,夜间,雨和雪。每个不利条件图像具有高质量的细像素级语义注释,在正常条件下采取的相同场景的相应图像,以及区分清晰和不确定的语义内容的图像内区域之间的二进制掩模。因此,ACDC支持标准语义分割,新引入的不确定性感知语义分割。详细的实证研究表明,ACDC对最先进的监督和无人监督和无监督的方法的挑战,并表明了我们数据集在转向该领域的进展方面的价值。我们的数据集和基准是公开可用的。
translated by 谷歌翻译
We consider the problem of unsupervised domain adaptation in semantic segmentation. A key in this campaign consists in reducing the domain shift, i.e., enforcing the data distributions of the two domains to be similar. One of the common strategies is to align the marginal distribution in the feature space through adversarial learning. However, this global alignment strategy does not consider the category-level joint distribution. A possible consequence of such global movement is that some categories which are originally well aligned between the source and target may be incorrectly mapped, thus leading to worse segmentation results in target domain. To address this problem, we introduce a category-level adversarial network, aiming to enforce local semantic consistency during the trend of global alignment. Our idea is to take a close look at the category-level joint distribution and align each class with an adaptive adversarial loss. Specifically, we reduce the weight of the adversarial loss for category-level aligned features while increasing the adversarial force for those poorly aligned. In this process, we decide how well a feature is category-level aligned between source and target by a co-training approach. In two domain adaptation tasks, i.e., GTA5 → Cityscapes and SYN-THIA → Cityscapes, we validate that the proposed method matches the state of the art in segmentation accuracy.
translated by 谷歌翻译