深度神经网络(DNN)极大地促进了语义分割中的性能增益。然而,训练DNN通常需要大量的像素级标记数据,这在实践中收集昂贵且耗时。为了减轻注释负担,本文提出了一种自组装的生成对抗网络(SE-GAN)利用语义分割的跨域数据。在SE-GaN中,教师网络和学生网络构成用于生成语义分割图的自组装模型,与鉴别器一起形成GaN。尽管它很简单,我们发现SE-GaN可以显着提高对抗性训练的性能,提高模型的稳定性,这是由大多数普遍培训的方法共享的常见障碍。我们理论上分析SE-GaN并提供$ \ Mathcal o(1 / \ sqrt {n})$泛化绑定($ n $是培训样本大小),这表明控制了鉴别者的假设复杂性,以提高概括性。因此,我们选择一个简单的网络作为鉴别器。两个标准设置中的广泛和系统实验表明,该方法显着优于最新的最先进的方法。我们模型的源代码即将推出。
translated by 谷歌翻译
Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss respectively. We demonstrate state-of-theart performance in semantic segmentation on two challenging "synthetic-2-real" set-ups 1 and show that the approach can also be used for detection.
translated by 谷歌翻译
We consider the problem of unsupervised domain adaptation in semantic segmentation. A key in this campaign consists in reducing the domain shift, i.e., enforcing the data distributions of the two domains to be similar. One of the common strategies is to align the marginal distribution in the feature space through adversarial learning. However, this global alignment strategy does not consider the category-level joint distribution. A possible consequence of such global movement is that some categories which are originally well aligned between the source and target may be incorrectly mapped, thus leading to worse segmentation results in target domain. To address this problem, we introduce a category-level adversarial network, aiming to enforce local semantic consistency during the trend of global alignment. Our idea is to take a close look at the category-level joint distribution and align each class with an adaptive adversarial loss. Specifically, we reduce the weight of the adversarial loss for category-level aligned features while increasing the adversarial force for those poorly aligned. In this process, we decide how well a feature is category-level aligned between source and target by a co-training approach. In two domain adaptation tasks, i.e., GTA5 → Cityscapes and SYN-THIA → Cityscapes, we validate that the proposed method matches the state of the art in segmentation accuracy.
translated by 谷歌翻译
无监督的域适应(UDA)旨在使源域上培训的模型适应到新的目标域,其中没有可用标记的数据。在这项工作中,我们调查从合成计算机生成的域的UDA的问题,以用于学习语义分割的类似但实际的域。我们提出了一种与UDA的一致性正则化方法结合的语义一致的图像到图像转换方法。我们克服了将合成图像转移到真实的图像的先前限制。我们利用伪标签来学习生成的图像到图像转换模型,该图像到图像转换模型从两个域上的语义标签接收额外的反馈。我们的方法优于最先进的方法,将图像到图像转换和半监督学习与相关域适应基准,即Citycapes和Synthia上的CutyCapes和Synthia进行了全面的学习。
translated by 谷歌翻译
Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground truth labels to the target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities between the source and target domains, we adopt adversarial learning in the output space. To further enhance the adapted model, we construct a multi-level adversarial network to effectively perform output space domain adaptation at different feature levels. Extensive experiments and ablation study are conducted under various domain adaptation settings, including synthetic-to-real and cross-city scenarios. We show that the proposed method performs favorably against the stateof-the-art methods in terms of accuracy and visual quality.
translated by 谷歌翻译
语义分割在广泛的计算机视觉应用中起着基本作用,提供了全球对图像​​的理解的关键信息。然而,最先进的模型依赖于大量的注释样本,其比在诸如图像分类的任务中获得更昂贵的昂贵的样本。由于未标记的数据替代地获得更便宜,因此无监督的域适应达到了语义分割社区的广泛成功并不令人惊讶。本调查致力于总结这一令人难以置信的快速增长的领域的五年,这包含了语义细分本身的重要性,以及将分段模型适应新环境的关键需求。我们提出了最重要的语义分割方法;我们对语义分割的域适应技术提供了全面的调查;我们揭示了多域学习,域泛化,测试时间适应或无源域适应等较新的趋势;我们通过描述在语义细分研究中最广泛使用的数据集和基准测试来结束本调查。我们希望本调查将在学术界和工业中提供具有全面参考指导的研究人员,并有助于他们培养现场的新研究方向。
translated by 谷歌翻译
无监督的域适应性(UDA)旨在使标记的源域的模型适应未标记的目标域。现有的基于UDA的语义细分方法始终降低像素级别,功能级别和输出级别的域移动。但是,几乎所有这些都在很大程度上忽略了上下文依赖性,该依赖性通常在不同的领域共享,从而导致较不怀疑的绩效。在本文中,我们提出了一个新颖的环境感知混音(camix)框架自适应语义分割的框架,该框架以完全端到端的可训练方式利用了上下文依赖性的这一重要线索作为显式的先验知识,以增强对适应性的适应性目标域。首先,我们通过利用积累的空间分布和先前的上下文关系来提出上下文掩盖的生成策略。生成的上下文掩码在这项工作中至关重要,并将指导三个不同级别的上下文感知域混合。此外,提供了背景知识,我们引入了重要的一致性损失,以惩罚混合学生预测与混合教师预测之间的不一致,从而减轻了适应性的负面转移,例如早期绩效降级。广泛的实验和分析证明了我们方法对广泛使用的UDA基准的最新方法的有效性。
translated by 谷歌翻译
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
translated by 谷歌翻译
While deep learning methods hitherto have achieved considerable success in medical image segmentation, they are still hampered by two limitations: (i) reliance on large-scale well-labeled datasets, which are difficult to curate due to the expert-driven and time-consuming nature of pixel-level annotations in clinical practices, and (ii) failure to generalize from one domain to another, especially when the target domain is a different modality with severe domain shifts. Recent unsupervised domain adaptation~(UDA) techniques leverage abundant labeled source data together with unlabeled target data to reduce the domain gap, but these methods degrade significantly with limited source annotations. In this study, we address this underexplored UDA problem, investigating a challenging but valuable realistic scenario, where the source domain not only exhibits domain shift~w.r.t. the target domain but also suffers from label scarcity. In this regard, we propose a novel and generic framework called ``Label-Efficient Unsupervised Domain Adaptation"~(LE-UDA). In LE-UDA, we construct self-ensembling consistency for knowledge transfer between both domains, as well as a self-ensembling adversarial learning module to achieve better feature alignment for UDA. To assess the effectiveness of our method, we conduct extensive experiments on two different tasks for cross-modality segmentation between MRI and CT images. Experimental results demonstrate that the proposed LE-UDA can efficiently leverage limited source labels to improve cross-domain segmentation performance, outperforming state-of-the-art UDA approaches in the literature. Code is available at: https://github.com/jacobzhaoziyuan/LE-UDA.
translated by 谷歌翻译
由于难以获得地面真理标签,从虚拟世界数据集学习对于像语义分割等现实世界的应用非常关注。从域适应角度来看,关键挑战是学习输入的域名签名表示,以便从虚拟数据中受益。在本文中,我们提出了一种新颖的三叉戟架构,该架构强制执行共享特征编码器,同时满足对抗源和目标约束,从而学习域不变的特征空间。此外,我们还介绍了一种新颖的训练管道,在前向通过期间能够自我引起的跨域数据增强。这有助于进一步减少域间隙。结合自我培训过程,我们在基准数据集(例如GTA5或Synthia适应城市景观)上获得最先进的结果。Https://github.com/hmrc-ael/trideadapt提供了代码和预先训练的型号。
translated by 谷歌翻译
Unsupervised source-free domain adaptation methods aim to train a model to be used in the target domain utilizing the pretrained source-domain model and unlabeled target-domain data, where the source data may not be accessible due to intellectual property or privacy issues. These methods frequently utilize self-training with pseudo-labeling thresholded by prediction confidence. In a source-free scenario, only supervision comes from target data, and thresholding limits the contribution of the self-training. In this study, we utilize self-training with a mean-teacher approach. The student network is trained with all predictions of the teacher network. Instead of thresholding the predictions, the gradients calculated from the pseudo-labels are weighted based on the reliability of the teacher's predictions. We propose a novel method that uses proxy-based metric learning to estimate reliability. We train a metric network on the encoder features of the teacher network. Since the teacher is updated with the moving average, the encoder feature space is slowly changing. Therefore, the metric network can be updated in training time, which enables end-to-end training. We also propose a metric-based online ClassMix method to augment the input of the student network where the patches to be mixed are decided based on the metric reliability. We evaluated our method in synthetic-to-real and cross-city scenarios. The benchmarks show that our method significantly outperforms the existing state-of-the-art methods.
translated by 谷歌翻译
语义细分是一种关键技术,涉及高分辨率遥感(HRS)图像的自动解释,并引起了遥感社区的广泛关注。由于其层次表示能力,深度卷积神经网络(DCNN)已成功应用于HRS图像语义分割任务。但是,对大量培训数据的严重依赖性以及对数据分布变化的敏感性严重限制了DCNNS在HRS图像的语义分割中的潜在应用。这项研究提出了一种新型的无监督域适应性语义分割网络(MemoryAdaptnet),用于HRS图像的语义分割。 MemoryAdaptnet构建了一种输出空间对抗学习方案,以弥合源域和目标域之间的域分布差异,并缩小域移位的影响。具体而言,我们嵌入了一个不变的特征内存模块来存储不变的域级上下文信息,因为从对抗学习获得的功能仅代表当前有限输入的变体特征。该模块由类别注意力驱动的不变域级上下文集合模块集成到当前伪不变功能,以进一步增强像素表示。基于熵的伪标签滤波策略用于更新当前目标图像的高额伪不变功能的内存模块。在三个跨域任务下进行的广泛实验表明,我们提出的记忆ADAPTNET非常优于最新方法。
translated by 谷歌翻译
无监督的域适应性(UDA)旨在使在标记的源域上训练的模型适应未标记的目标域。在本文中,我们提出了典型的对比度适应(PROCA),这是一种无监督域自适应语义分割的简单有效的对比度学习方法。以前的域适应方法仅考虑跨各个域的阶级内表示分布的对齐,而阶层间结构关系的探索不足,从而导致目标域上的对齐表示可能不像在源上歧视的那样容易歧视。域了。取而代之的是,ProCA将类间信息纳入班级原型,并采用以班级为中心的分布对齐进行适应。通过将同一类原型与阳性和其他类原型视为实现以集体为中心的分配对齐方式的负面原型,Proca在经典领域适应任务上实现了最先进的性能,{\ em i.e. text {and} synthia $ \ to $ cityScapes}。代码可在\ href {https://github.com/jiangzhengkai/proca} {proca}获得代码
translated by 谷歌翻译
Recent deep networks achieved state of the art performance on a variety of semantic segmentation tasks. Despite such progress, these models often face challenges in real world "wild tasks" where large difference between labeled training/source data and unseen test/target data exists. In particular, such difference is often referred to as "domain gap", and could cause significantly decreased performance which cannot be easily remedied by further increasing the representation power. Unsupervised domain adaptation (UDA) seeks to overcome such problem without target domain labels. In this paper, we propose a novel UDA framework based on an iterative self-training (ST) procedure, where the problem is formulated as latent variable loss minimization, and can be solved by alternatively generating pseudo labels on target data and re-training the model with these labels. On top of ST, we also propose a novel classbalanced self-training (CBST) framework to avoid the gradual dominance of large classes on pseudo-label generation, and introduce spatial priors to refine generated labels. Comprehensive experiments show that the proposed methods achieve state of the art semantic segmentation performance under multiple major UDA settings.⋆ indicates equal contribution.
translated by 谷歌翻译
Domain adaptation for semantic image segmentation is very necessary since manually labeling large datasets with pixel-level labels is expensive and time consuming. Existing domain adaptation techniques either work on limited datasets, or yield not so good performance compared with supervised learning. In this paper, we propose a novel bidirectional learning framework for domain adaptation of segmentation. Using the bidirectional learning, the image translation model and the segmentation adaptation model can be learned alternatively and promote to each other. Furthermore, we propose a self-supervised learning algorithm to learn a better segmentation adaptation model and in return improve the image translation model. Experiments show that our method is superior to the state-of-the-art methods in domain adaptation of segmentation with a big margin. The source code is available at https://github.com/liyunsheng13/BDL.
translated by 谷歌翻译
在本文中,我们解决了一次性分段的单次无监督域适应(OSUDA)的问题,其中分段器在训练期间只看到一个未标记的目标图像。在这种情况下,传统的无监督域适应模型通常失败,因为它们不能适应目标域,以具有过度拟合到一个(或几个)目标样本。为了解决这个问题,现有的OSUDA方法通常集成了一种样式传输模块,基于未标记的目标样本执行域随机化,可以在训练期间探讨目标样本周围的多个域。然而,这种样式传输模块依赖于一组额外的图像作为预训练的样式参考,并且还增加了对域适应的内存需求。在这里,我们提出了一种新的奥德达方法,可以有效地缓解这种计算负担。具体而言,我们将多个样式混合层集成到分段器中,该分段器播放样式传输模块的作用,以在不引入任何学习参数的情况下使源图像进行体现。此外,我们提出了一种剪辑的原型匹配(PPM)方法来加权考虑源像素在监督训练期间的重要性,以缓解负适应。实验结果表明,我们的方法在单次设置下的两个常用基准上实现了新的最先进的性能,并且比所有比较方法更有效。
translated by 谷歌翻译
深度学习算法在非常高分辨率(VHR)图像的语义分割方面取得了巨大成功。然而,培训这些模型通常需要大量准确的像素注释,这非常费力且耗时。为了减轻注释负担,本文提出了一个一致性调节的区域生长网络(CRGNET),以实现具有点级注释的VHR图像的语义分割。 CRGNET的关键思想是迭代选择未标记的像素,具有很高的信心,可以从原始稀疏点扩展带注释的区域。但是,由于扩展的注释中可能存在一些错误和噪音,因此直接向它们学习可能会误导网络的培训。为此,我们进一步提出了一致性正则化策略,在该策略中,基本分类器和扩展的分类器被采用。具体而言,基本分类器受原始稀疏注释的监督,而扩展的分类器的目的是从基本分类器生成的扩展注释中学习具有区域生长机制。因此,通过最大程度地减少基础和扩展分类器的预测之间的差异来实现一致性正则化。我们发现如此简单的正则化策略对于控制区域生长机制的质量非常有用。在两个基准数据集上进行的广泛实验表明,所提出的CRGNET显着优于现有的最新方法。代码和预培训模型可在线获得(https://github.com/yonghaoxu/crgnet)。
translated by 谷歌翻译
本文提出了一种新颖的像素级分布正则化方案(DRSL),用于自我监督的语义分割域的适应性。在典型的环境中,分类损失迫使语义分割模型贪婪地学习捕获类间变化的表示形式,以确定决策(类)边界。由于域的转移,该决策边界在目标域中未对齐,从而导致嘈杂的伪标签对自我监督域的适应性产生不利影响。为了克服这一限制,以及捕获阶层间变化,我们通过类感知的多模式分布学习(MMDL)捕获了像素级内的类内变化。因此,捕获阶层内变化所需的信息与阶层间歧视所需的信息明确分开。因此,捕获的功能更具信息性,导致伪噪声低的伪标记。这种分离使我们能够使用前者的基于跨凝结的自学习,在判别空间和多模式分布空间中进行单独的对齐。稍后,我们通过明确降低映射到同一模式的目标和源像素之间的距离来提出一种新型的随机模式比对方法。距离度量标签上计算出的距离度量损失,并从多模式建模头部反向传播,充当与分割头共享的基本网络上的正常化程序。关于合成到真实域的适应设置的全面实验的结果,即GTA-V/Synthia to CityScapes,表明DRSL的表现优于许多现有方法(MIOU的最小余量为2.3%和2.5%,用于MIOU,而合成的MIOU到CityScapes)。
translated by 谷歌翻译
语义细分是智能车辆了解环境的重要任务。当前的深度学习方法需要大量的标记数据进行培训。手动注释很昂贵,而模拟器可以提供准确的注释。但是,在实际场景中应用时,使用模拟器数据训练的语义分割模型的性能将大大降低。对于语义分割的无监督域适应性(UDA)最近引起了越来越多的研究注意力,旨在减少域间隙并改善目标域的性能。在本文中,我们提出了一种新型的基于两阶段熵的UDA方法,用于语义分割。在第一阶段,我们设计了一个阈值适应的无监督局灶性损失,以使目标域中的预测正常,该预测具有轻度的梯度中和机制,并减轻了在基于熵方法中几乎没有优化硬样品的问题。在第二阶段,我们引入了一种名为跨域图像混合(CIM)的数据增强方法,以弥合两个域的语义知识。我们的方法在合成景观和gta5-to-cityscapes上使用DeepLabV2和使用轻量级的Bisenet实现了最新的58.4%和59.6%的MIOS和59.6%的Mious。
translated by 谷歌翻译
深度学习极大地提高了语义细分的性能,但是,它的成功依赖于大量注释的培训数据的可用性。因此,许多努力致力于域自适应语义分割,重点是将语义知识从标记的源域转移到未标记的目标域。现有的自我训练方法通常需要多轮训练,而基于对抗训练的另一个流行框架已知对超参数敏感。在本文中,我们提出了一个易于训练的框架,该框架学习了域自适应语义分割的域不变原型。特别是,我们表明域的适应性与很少的学习共享一个共同的角色,因为两者都旨在识别一些从大量可见数据中学到的知识的看不见的数据。因此,我们提出了一个统一的框架,用于域适应和很少的学习。核心思想是使用从几个镜头注释的目标图像中提取的类原型来对源图像和目标图像的像素进行分类。我们的方法仅涉及一个阶段训练,不需要对大规模的未经通知的目标图像进行培训。此外,我们的方法可以扩展到域适应性和几乎没有射击学习的变体。关于适应GTA5到CITYSCAPES和合成景观的实验表明,我们的方法实现了对最先进的竞争性能。
translated by 谷歌翻译