Most existing scene text detectors require large-scale training data which cannot scale well due to two major factors: 1) scene text images often have domain-specific distributions; 2) collecting large-scale annotated scene text images is laborious. We study domain adaptive scene text detection, a largely neglected yet very meaningful task that aims for optimal transfer of labelled scene text images while handling unlabelled images in various new domains. Specifically, we design SCAST, a subcategory-aware self-training technique that mitigates the network overfitting and noisy pseudo labels in domain adaptive scene text detection effectively. SCAST consists of two novel designs. For labelled source data, it introduces pseudo subcategories for both foreground texts and background stuff which helps train more generalizable source models with multi-class detection objectives. For unlabelled target data, it mitigates the network overfitting by co-regularizing the binary and subcategory classifiers trained in the source domain. Extensive experiments show that SCAST achieves superior detection performance consistently across multiple public benchmarks, and it also generalizes well to other domain adaptive detection tasks such as vehicle detection.
Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging "synthetic-2-real" set-ups and show that the approach can also be used for detection.
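The entropy loss mentioned above can be illustrated with a minimal sketch. It assumes the loss is the mean Shannon entropy of the per-pixel softmax predictions, normalized by the log of the number of classes; the function names and the toy logits are hypothetical and not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs, eps=1e-12):
    """Shannon entropy of one pixel's class distribution, scaled to roughly [0, 1]."""
    h = -sum(p * math.log(p + eps) for p in probs)
    return h / math.log(len(probs))

def entropy_loss(pixel_logits):
    """Mean normalized entropy over all pixels of an unlabeled target image."""
    entropies = [normalized_entropy(softmax(l)) for l in pixel_logits]
    return sum(entropies) / len(entropies)

# Hypothetical logits for three pixels and three classes.
confident = [[9.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 7.0]]
uncertain = [[0.1, 0.0, 0.1], [0.0, 0.0, 0.0], [0.2, 0.1, 0.0]]
print(entropy_loss(confident) < entropy_loss(uncertain))
```

Minimizing this quantity on target images pushes predictions toward confident, low-entropy outputs, which is the behavior such a loss is meant to encourage.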
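We address domain adaptation in object detection, where there is a significant domain shift between a source domain (with supervision) and a target domain of interest (without supervision). As a widely adopted domain adaptation method, the self-training teacher-student framework (a student model learns from pseudo labels generated by a teacher model) has yielded remarkable accuracy gains on the target domain. However, it still suffers from a large number of low-quality pseudo labels (e.g., false positives) generated by the teacher due to its bias toward the source domain. To address this issue, we propose a self-training framework called Adaptive Unbiased Teacher (AUT), which leverages adversarial learning and weak-strong data augmentation to address domain shift. Specifically, we employ feature-level adversarial training in the student model, ensuring that features extracted from the source and target domains share similar statistics. This enables the student model to capture domain-invariant features. Furthermore, we apply weak-strong augmentation and mutual learning between the teacher model on the target domain and the student model on both domains. This enables the teacher model to gradually benefit from the student model without suffering from domain shift. We show that AUT outperforms all existing approaches, and even Oracle (fully supervised) models, by a large margin. For example, we achieve 50.9% (49.3%) mAP on Foggy Cityscapes (Clipart1k), which is 9.2% (5.2%) and 8.2% (11.0%) higher than the previous state of the art and Oracle, respectively.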
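A teacher-student self-training loop of this kind is often realized with two small building blocks, sketched below under stated assumptions: the teacher is updated as an exponential moving average (EMA) of the student, and only confident teacher detections are kept as pseudo labels. The abstract does not specify either detail, so the EMA rule, the decay, the threshold, and all names here are illustrative.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights as an exponential moving average of the student."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}

def filter_pseudo_labels(detections, threshold=0.8):
    """Keep only teacher detections whose confidence reaches the threshold."""
    return [d for d in detections if d["score"] >= threshold]

# Hypothetical scalar "weights" standing in for full parameter tensors.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 2.0, "b": 1.0}
teacher = ema_update(teacher, student, decay=0.9)

# Hypothetical teacher detections on a weakly augmented target image.
detections = [
    {"box": (10, 10, 50, 50), "label": "car", "score": 0.95},
    {"box": (5, 5, 20, 20), "label": "person", "score": 0.40},
]
pseudo = filter_pseudo_labels(detections)
print(teacher, [d["label"] for d in pseudo])
```

The slow EMA update lets the teacher absorb what the student learns on both domains while damping the noise of individual training steps.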
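Person search is a challenging task that aims to achieve joint pedestrian detection and person re-identification (ReID). Previous works have made significant advances under fully and weakly supervised settings. However, existing methods ignore the generalization ability of person search models. In this paper, we take a further step and present Domain Adaptive Person Search (DAPS), which aims to generalize a model from a labeled source domain to an unlabeled target domain. Two major challenges arise under this new setting: one is how to simultaneously solve the domain misalignment issue for both the detection and ReID tasks, and the other is how to train the ReID subtask on the target domain without reliable detection results. To address these challenges, we propose a strong baseline framework with two dedicated designs. 1) We design a domain alignment module, including image-level and task-sensitive instance-level alignment, to minimize the domain discrepancy. 2) We take full advantage of the unlabeled data with a dynamic clustering strategy, and employ pseudo bounding boxes to support ReID and detection training on the target domain. With the above designs, our framework achieves 34.7% in mAP and 80.6% in top-1 on the PRW dataset, surpassing the direct transfer baseline by a large margin. Surprisingly, the performance of our unsupervised DAPS model even surpasses some fully and weakly supervised methods. The code is available at https://github.com/caposerenity/daps.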
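Unsupervised domain adaptation (UDA) aims to adapt a model trained on a labeled source domain to an unlabeled target domain. In this paper, we present Prototypical Contrast Adaptation (ProCA), a simple and effective contrastive learning method for unsupervised domain adaptive semantic segmentation. Previous domain adaptation methods merely consider the alignment of intra-class representation distributions across domains, while the inter-class structural relationship is insufficiently explored, so the aligned representations on the target domain may no longer be as easily discriminated as those on the source domain. Instead, ProCA incorporates inter-class information into class-wise prototypes and adopts class-centered distribution alignment for adaptation. By treating the same-class prototype as the positive and the other class prototypes as negatives to achieve class-centered distribution alignment, ProCA achieves state-of-the-art performance on classical domain adaptation tasks, including SYNTHIA → Cityscapes. Code is available at https://github.com/jiangzhengkai/proca.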
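The idea of contrasting a feature against class prototypes can be sketched as follows. This is a minimal illustration and not the ProCA implementation: it assumes prototypes are per-class mean features and that the loss is an InfoNCE-style cross-entropy over cosine similarities; the temperature and all names are hypothetical.

```python
import math

def class_prototypes(features, labels):
    """Average the feature vectors of each class into one prototype per class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-12)

def prototype_contrastive_loss(feature, label, prototypes, temperature=0.1):
    """Same-class prototype is the positive; all other prototypes are negatives."""
    logits = {y: cosine(feature, p) / temperature for y, p in prototypes.items()}
    m = max(logits.values())
    log_denominator = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_denominator - logits[label]

# Hypothetical 2-D pixel features for two classes.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
prototypes = class_prototypes(features, labels)
loss_correct = prototype_contrastive_loss([1.0, 0.0], 0, prototypes)
loss_wrong = prototype_contrastive_loss([1.0, 0.0], 1, prototypes)
print(loss_correct < loss_wrong)
```

Because every other class prototype appears in the denominator, lowering the loss both pulls a feature toward its own class center and pushes it away from the others, which is how inter-class structure enters the objective.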
We consider the problem of unsupervised domain adaptation in semantic segmentation. A key step is to reduce the domain shift, i.e., to enforce the data distributions of the two domains to be similar. One common strategy is to align the marginal distribution in the feature space through adversarial learning. However, this global alignment strategy does not consider the category-level joint distribution. A possible consequence of such global movement is that some categories which are originally well aligned between the source and target may be incorrectly mapped, thus leading to worse segmentation results in the target domain. To address this problem, we introduce a category-level adversarial network, aiming to enforce local semantic consistency during the trend of global alignment. Our idea is to take a close look at the category-level joint distribution and align each class with an adaptive adversarial loss. Specifically, we reduce the weight of the adversarial loss for category-level aligned features while increasing the adversarial force for those poorly aligned. In this process, we decide how well a feature is category-level aligned between source and target by a co-training approach. In two domain adaptation tasks, i.e., GTA5 → Cityscapes and SYNTHIA → Cityscapes, we validate that the proposed method matches the state of the art in segmentation accuracy.
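The adaptive adversarial loss can be pictured with a small sketch. It assumes, as one plausible reading of the co-training idea, that two classifiers predict each pixel and that their disagreement (cosine distance between the two probability vectors) scales the adversarial loss; the `scale` and `floor` values and all function names are hypothetical.

```python
import math

def cosine_distance(p1, p2):
    """1 - cosine similarity between two class-probability vectors."""
    dot = sum(a * b for a, b in zip(p1, p2))
    n1 = math.sqrt(sum(a * a for a in p1))
    n2 = math.sqrt(sum(b * b for b in p2))
    return 1.0 - dot / (n1 * n2 + 1e-12)

def adaptive_adversarial_weight(p1, p2, scale=10.0, floor=0.4):
    """Small weight when the two classifiers agree, large weight when they disagree."""
    return floor + scale * cosine_distance(p1, p2)

def weighted_adversarial_loss(disc_prob_source, weight, eps=1e-12):
    """Weighted loss pushing the discriminator output for a target pixel toward 'source'."""
    return -weight * math.log(disc_prob_source + eps)

# Hypothetical predictions of the two classifiers for two target pixels.
w_aligned = adaptive_adversarial_weight([0.9, 0.1], [0.9, 0.1])
w_misaligned = adaptive_adversarial_weight([0.9, 0.1], [0.1, 0.9])
print(w_aligned, w_misaligned)
print(weighted_adversarial_loss(0.3, w_aligned) < weighted_adversarial_loss(0.3, w_misaligned))
```

Pixels on which the classifiers agree are treated as already category-level aligned and receive little adversarial pressure, while disagreeing pixels are pushed harder.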
Domain adaptive detection aims to improve the generalization of detectors on the target domain. To reduce discrepancy in feature distributions between two domains, recent approaches achieve domain adaption through feature alignment in different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features in alignment, degrading detection. Addressing this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities including pixel-, instance-, and category-levels simultaneously to align two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. Besides, we introduce multi-granularity discriminators to identify which domain, source or target, samples at different granularities come from. Note that MGA not only leverages instance discriminability in different categories but also exploits category consistency between two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that explores model assessments for model update to improve pseudo labels and alleviate the local misalignment problem, boosting detection robustness. Extensive experiments on multiple domain adaption scenarios validate the superiority of MGA over other approaches on FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
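Benefiting from considerable pixel-level annotations collected from a specific situation (source), a trained semantic segmentation model performs quite well, but fails in a new situation (target) due to the large domain shift. To mitigate the domain gap, previous cross-domain semantic segmentation methods always assume the co-existence of source data and target data during domain alignment. However, accessing source data in real scenarios may raise privacy concerns and violate intellectual property. To tackle this problem, we focus on an interesting and challenging cross-domain semantic segmentation task in which only the trained source model is provided to the target domain. Specifically, we propose a unified framework called ATP, which consists of three schemes, i.e., feature alignment, bidirectional teaching, and information propagation. First, we devise a curriculum-style entropy minimization objective to implicitly align the target features with unseen source features via the provided source model. Second, besides the positive pseudo labels used in vanilla self-training, we are the first to introduce negative pseudo labels to this field, and we develop a bidirectional self-training strategy to enhance representation learning in the target domain. Finally, the information propagation scheme is employed to further reduce the intra-domain discrepancy within the target domain via pseudo semi-supervised learning. Extensive results on synthetic-to-real and cross-city driving datasets validate that ATP yields state-of-the-art performance, even compared with methods that require access to source data.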
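Positive and negative pseudo labels can be combined as in the sketch below. It is only an illustration under stated assumptions: a class is a positive label when the reference prediction is very confident, a class is a negative label when its probability is very low, and the loss adds a cross-entropy term for the positive and a `-log(1 - p)` term for each negative. The thresholds and names are hypothetical and not taken from the paper.

```python
import math

def split_pseudo_labels(probs, pos_threshold=0.9, neg_threshold=0.05):
    """Return (positive class or None, list of negative classes) for one pixel."""
    best = max(range(len(probs)), key=probs.__getitem__)
    positive = best if probs[best] >= pos_threshold else None
    negatives = [k for k, p in enumerate(probs) if p < neg_threshold]
    return positive, negatives

def bidirectional_loss(student_probs, positive, negatives, eps=1e-12):
    """Pull the student toward the positive class and away from the negative classes."""
    loss = 0.0
    if positive is not None:
        loss -= math.log(student_probs[positive] + eps)
    for k in negatives:
        loss -= math.log(1.0 - student_probs[k] + eps)
    return loss

# Hypothetical ambiguous pixel: no class is confident enough to be a positive label,
# but class 2 can still be ruled out and used as a negative label.
reference_probs = [0.50, 0.47, 0.03]
positive, negatives = split_pseudo_labels(reference_probs)
loss = bidirectional_loss([0.4, 0.3, 0.3], positive, negatives)
print(positive, negatives, round(loss, 4))
```

The example shows why negative labels are useful: a pixel that would be discarded by ordinary confidence thresholding still contributes a training signal.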
Recent deep networks have achieved state-of-the-art performance on a variety of semantic segmentation tasks. Despite such progress, these models often face challenges in real world "wild tasks" where a large difference exists between labeled training/source data and unseen test/target data. In particular, such a difference is often referred to as a "domain gap", and it can cause significantly decreased performance which cannot be easily remedied by further increasing the representation power. Unsupervised domain adaptation (UDA) seeks to overcome this problem without target domain labels. In this paper, we propose a novel UDA framework based on an iterative self-training (ST) procedure, where the problem is formulated as latent variable loss minimization, and can be solved by alternately generating pseudo labels on target data and re-training the model with these labels. On top of ST, we also propose a novel class-balanced self-training (CBST) framework to avoid the gradual dominance of large classes on pseudo-label generation, and introduce spatial priors to refine generated labels. Comprehensive experiments show that the proposed methods achieve state-of-the-art semantic segmentation performance under multiple major UDA settings.
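Class-balanced pseudo-label selection can be sketched with per-class confidence thresholds. The version below is a simplified illustration, assuming that the same portion of the most confident predictions is kept for every class; the function names, the `portion` parameter, and the toy predictions are hypothetical.

```python
def class_balanced_thresholds(predictions, portion=0.5):
    """Per-class threshold that keeps roughly `portion` of each class's predictions.

    `predictions` is a list of (predicted_class, confidence) pairs.
    """
    by_class = {}
    for cls, conf in predictions:
        by_class.setdefault(cls, []).append(conf)
    thresholds = {}
    for cls, confs in by_class.items():
        confs.sort(reverse=True)
        keep = max(1, int(round(len(confs) * portion)))
        thresholds[cls] = confs[keep - 1]
    return thresholds

def select_pseudo_labels(predictions, thresholds):
    """Keep predictions whose confidence reaches the threshold of their own class."""
    return [(cls, conf) for cls, conf in predictions if conf >= thresholds[cls]]

# Hypothetical target predictions: "road" is easy and large, "pole" is hard and rare.
predictions = [("road", 0.99), ("road", 0.97), ("road", 0.95), ("road", 0.90),
               ("pole", 0.60), ("pole", 0.40)]
thresholds = class_balanced_thresholds(predictions, portion=0.5)
selected = select_pseudo_labels(predictions, thresholds)
print(thresholds, selected)
```

With a single global threshold of 0.9, no "pole" prediction would survive; ranking confidences within each class keeps hard classes represented among the pseudo labels.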
This paper solves a generalized version of the problem of multi-source model adaptation for semantic segmentation. Model adaptation is proposed as a new domain adaptation problem which requires access to a pre-trained model instead of data for the source domain. A general multi-source setting of model adaptation assumes strictly that each source domain shares a common label space with the target domain. As a relaxation, we allow the label space of each source domain to be a subset of that of the target domain and require the union of the source-domain label spaces to be equal to the target-domain label space. For the new setting named union-set multi-source model adaptation, we propose a method with a novel learning strategy named model-invariant feature learning, which takes full advantage of the diverse characteristics of the source-domain models, thereby improving the generalization in the target domain. We conduct extensive experiments in various adaptation settings to show the superiority of our method. The code is available at https://github.com/lzy7976/union-set-model-adaptation.
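This paper proposes a novel pixel-level distribution regularization scheme (DRSL) for self-supervised domain adaptation of semantic segmentation. In a typical setting, the classification loss forces the semantic segmentation model to greedily learn representations that capture inter-class variations in order to determine the decision (class) boundary. Due to the domain shift, this decision boundary is unaligned in the target domain, resulting in noisy pseudo labels that adversely affect self-supervised domain adaptation. To overcome this limitation, along with capturing inter-class variation, we capture pixel-level intra-class variations through class-aware multi-modal distribution learning (MMDL). Thus, the information necessary for capturing intra-class variations is explicitly disentangled from the information necessary for inter-class discrimination. The features captured are therefore more informative, resulting in pseudo labels with low noise. This disentanglement allows us to perform separate alignments in the discriminative space and the multi-modal distribution space, using cross-entropy based self-learning for the former. For the latter, we propose a novel stochastic mode alignment method that explicitly decreases the distance between target and source pixels that map to the same mode. The distance metric learning loss, computed over pseudo labels and backpropagated from the multi-modal modeling head, acts as a regularizer over the base network shared with the segmentation head. The results of comprehensive experiments on synthetic-to-real domain adaptation setups, i.e., GTA-V/SYNTHIA to Cityscapes, show that DRSL outperforms many existing approaches (with minimum margins of 2.3% and 2.5% in mIoU for SYNTHIA to Cityscapes).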
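We focus on bridging the domain discrepancy in lane detection among different scenarios to greatly reduce the extra annotation and re-training costs for autonomous driving. A critical factor hindering the performance of cross-domain lane detection is that conventional methods only focus on pixel-wise loss while ignoring the shape and position priors of lanes. To address this issue, we propose the Multi-level Domain Adaptation (MLDA) framework, a new perspective for handling cross-domain lane detection at three complementary semantic levels: pixel, instance, and category. Specifically, at the pixel level, we propose to apply cross-class confidence constraints in self-training to tackle the imbalanced confidence distribution of lane and background. At the instance level, we go beyond pixels to treat segmented lanes as instances and facilitate discriminative features in the target domain with triplet learning, which effectively rebuilds the semantic context of lanes and helps alleviate feature confusion. At the category level, we propose an adaptive inter-domain embedding module to utilize the position prior of lanes during adaptation. On two challenging datasets, i.e., TuSimple and CULane, our approach improves lane detection performance by a large margin, with gains of 8.8% in accuracy and 7.4% in F1-score respectively, compared with state-of-the-art domain adaptation algorithms.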
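Semantic segmentation plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. Yet state-of-the-art models rely on large amounts of annotated samples, which are more expensive to obtain than in tasks such as image classification. Since unlabeled data is instead significantly cheaper to obtain, it is not surprising that unsupervised domain adaptation has achieved broad success within the semantic segmentation community. This survey is an effort to summarize five years of this incredibly rapidly growing field, which embraces the importance of semantic segmentation itself and the critical need to adapt segmentation models to new environments. We present the most important semantic segmentation methods; we provide a comprehensive survey of domain adaptation techniques for semantic segmentation; we unveil newer trends such as multi-domain learning, domain generalization, test-time adaptation, and source-free domain adaptation; we conclude this survey by describing the datasets and benchmarks most widely used in semantic segmentation research. We hope that this survey will provide researchers across academia and industry with a comprehensive reference guide and will help them foster new research directions in the field.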
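Video semantic segmentation has achieved great progress under the supervision of large amounts of labeled training data. However, domain adaptive video segmentation, which can mitigate data labeling constraints by adapting from a labeled source domain toward an unlabeled target domain, is largely neglected. We design temporal pseudo supervision (TPS), a simple and effective method that explores the idea of consistency training for learning effective representations from unlabeled target videos. Unlike traditional consistency training, which builds consistency in the spatial space, we explore consistency training in the spatiotemporal space by enforcing model consistency across augmented video frames, which helps learn from more diverse target data. Specifically, we design cross-frame pseudo labeling to provide pseudo supervision from previous video frames while learning from the augmented current video frames. Cross-frame pseudo labeling encourages the network to produce high-certainty predictions, which effectively facilitates consistency training with cross-frame augmentation. Extensive experiments over multiple public datasets show that TPS is simpler to implement, more stable to train, and achieves superior video segmentation accuracy compared with the state of the art.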
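Due to the difficulty of obtaining ground-truth labels, learning from virtual-world datasets is of great interest for real-world applications such as semantic segmentation. From a domain adaptation perspective, the key challenge is to learn a domain-agnostic representation of the inputs in order to benefit from virtual data. In this paper, we propose a novel trident-like architecture that enforces a shared feature encoder to satisfy adversarial source and target constraints simultaneously, thus learning a domain-invariant feature space. Moreover, we introduce a novel training pipeline that enables self-induced cross-domain data augmentation during the forward pass. This contributes to a further reduction of the domain gap. Combined with a self-training process, we obtain state-of-the-art results on benchmark datasets (e.g., GTA5 or Synthia to Cityscapes adaptation). Code and pre-trained models are available at https://github.com/hmrc-ael/trideadapt.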
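This paper presents FogAdapt, a novel approach for domain adaptation of semantic segmentation for dense foggy scenes. Although significant research has been directed at reducing the domain shift in semantic segmentation, adaptation to scenes with adverse weather conditions remains an open question. Large variations in the visibility of the scene due to weather conditions such as fog, smog, and haze exacerbate the domain shift, making unsupervised adaptation in such scenarios challenging. We propose a self-entropy and multi-scale information augmented self-supervised domain adaptation method (FogAdapt) to minimize the domain shift in foggy scene segmentation. Supported by the empirical evidence that an increase in fog density results in high self-entropy of the segmentation probabilities, we introduce a self-entropy based loss function to guide the adaptation method. Furthermore, inferences obtained at different image scales are combined and weighted by their uncertainty to generate scale-invariant pseudo labels for the target domain. These scale-invariant pseudo labels are robust to visibility and scale variations. We evaluate the proposed model on real clear-weather scenes to real foggy scenes adaptation and on synthetic non-foggy images to real foggy scenes adaptation scenarios. Our experiments demonstrate that FogAdapt significantly outperforms the current state of the art in semantic segmentation of foggy images. Specifically, under the standard settings and compared with state-of-the-art (SOTA) methods, FogAdapt gains 3.8% on Foggy Zurich, 6.0% on Foggy Driving-dense, and 3.6% on Foggy Driving in mIoU when adapted to Foggy Zurich.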
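Understanding foggy image sequences in driving scenes is critical for autonomous driving, but it remains a challenging task due to the difficulty of collecting and annotating real-world images of adverse weather. Recently, self-training has been considered a powerful solution for unsupervised domain adaptation: it iteratively adapts the model from the source domain to the target domain by generating target pseudo labels and re-training the model. However, the selection of confident pseudo labels inevitably suffers from the conflict between sparsity and accuracy, both of which lead to suboptimal models. To tackle this problem, we exploit the characteristics of foggy image sequences of driving scenes to densify the confident pseudo labels. Specifically, based on two findings about sequential image data, local spatial similarity and adjacent temporal correspondence, we propose a novel Target-Domain driven pseudo label Diffusion (TDo-Dif) scheme. It employs superpixels and optical flow to identify spatial similarity and temporal correspondence, respectively, and then diffuses the confident but sparse pseudo labels within a superpixel or a temporally corresponding pair linked by the flow. Moreover, to ensure the feature similarity of the diffused pixels, we introduce a local spatial similarity loss and a temporal contrastive loss in the model re-training stage. Experimental results show that our TDo-Dif scheme helps the adapted model achieve 51.92% and 53.84% mean intersection-over-union (mIoU) on two publicly available natural foggy datasets (Foggy Zurich and Foggy Driving), which exceeds state-of-the-art unsupervised domain adaptive semantic segmentation methods. Models and data can be found at https://github.com/velor2012/tdo-dif.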
Weakly-supervised object detection (WSOD) models attempt to leverage image-level annotations in lieu of accurate but costly-to-obtain object localization labels. This oftentimes leads to substandard object detection and localization at inference time. To tackle this issue, we propose D2DF2WOD, a Dual-Domain Fully-to-Weakly Supervised Object Detection framework that leverages synthetic data, annotated with precise object localization, to supplement a natural image target domain, where only image-level labels are available. In its warm-up domain adaptation stage, the model learns a fully-supervised object detector (FSOD) to improve the precision of the object proposals in the target domain, and at the same time learns target-domain-specific and detection-aware proposal features. In its main WSOD stage, a WSOD model is specifically tuned to the target domain. The feature extractor and the object proposal generator of the WSOD model are built upon the fine-tuned FSOD model. We test D2DF2WOD on five dual-domain image benchmarks. The results show that our method results in consistently improved object detection and localization compared with state-of-the-art methods.
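Domain adaptive segmentation strives to generate high-quality pseudo labels for the target domain and retrain the segmentor on them. Under this self-training paradigm, some competitive methods have turned to latent-space information, establishing the feature centroids (a.k.a. prototypes) of the semantic classes and determining pseudo-label candidates by their distances from these centroids. In this paper, we argue that the latent space contains more information to be exploited, and we take one step further to capitalize on it. First, instead of merely using the source-domain prototypes to determine the target pseudo labels as most traditional methods do, we bidirectionally produce target-domain prototypes to down-weight those source features that might be too hard or disturbed for the adaptation. Second, existing attempts simply model each category as a single, isotropic prototype while ignoring the variance of the feature distribution, which could lead to confusion between similar categories. To cope with this issue, we propose to represent each category with multiple anisotropic prototypes via a Gaussian mixture model, in order to fit the de facto distribution of the source domain and estimate the likelihood of target samples based on the probability density. We apply our method to the GTA5->Cityscapes and Synthia->Cityscapes tasks and achieve 61.2 and 62.8 respectively in terms of mean IoU, substantially outperforming other competitive self-training methods. Notably, in some categories that severely suffer from categorical confusion, such as "truck" and "bus", our method achieves 56.4 and 68.8 respectively, which further demonstrates the effectiveness of our design.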
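Pseudo-labeling by mixture likelihood can be sketched as below. This is a minimal illustration rather than the paper's implementation: it assumes each class is modeled by a Gaussian mixture with diagonal covariances (a simple form of anisotropy), that the mixture parameters are already fitted on source features, and that a target feature receives the class with the highest log-likelihood. All parameter values and names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def log_gmm_likelihood(x, components):
    """Log-likelihood under a mixture; `components` is a list of (weight, mean, var)."""
    terms = [math.log(w) + log_gaussian_diag(x, mean, var)
             for w, mean, var in components]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def assign_pseudo_label(x, class_gmms):
    """Pick the class whose mixture gives the feature the highest log-likelihood."""
    scores = {c: log_gmm_likelihood(x, comps) for c, comps in class_gmms.items()}
    return max(scores, key=scores.get), scores

# Hypothetical fitted mixtures: "truck" has two modes, "bus" has one.
class_gmms = {
    "truck": [(0.5, [0.0, 0.0], [1.0, 0.2]), (0.5, [4.0, 0.0], [1.0, 0.2])],
    "bus": [(1.0, [2.0, 3.0], [0.5, 0.5])],
}
label_a, _ = assign_pseudo_label([3.8, 0.1], class_gmms)
label_b, _ = assign_pseudo_label([2.1, 2.9], class_gmms)
print(label_a, label_b)
```

A single mean prototype for "truck" would sit between its two modes at (2, 0) and represent neither well, whereas the mixture follows the actual shape of the class distribution.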
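Domain adaptive object detection (DAOD) has recently attracted much attention because of its capability to detect target objects without any annotations. To tackle the problem, previous works focus on aligning features extracted from partial levels (e.g., image-level, instance-level, RPN-level) in a two-stage detector via adversarial training. However, the individual levels in the object detection pipeline are closely related to each other, and this inter-level relationship has not yet been considered. To this end, we introduce a novel framework for DAOD with three proposed components: Multi-scale-aware Uncertainty Attention (MUA), Transferable Region Proposal Network (TRPN), and Dynamic Instance Sampling (DIS). With these modules, we seek to reduce the negative transfer effect during training while maximizing transferability as well as discriminability in both domains. Finally, our framework implicitly learns domain-invariant regions for object detection by exploiting transferable information, and enhances the complementarity between different detection levels by collaboratively utilizing their domain information. Through ablation studies and experiments, we show that the proposed modules contribute to the performance improvement in a synergistic way, demonstrating the effectiveness of our method. Moreover, our model achieves new state-of-the-art performance on various benchmarks.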