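Object detection networks have reached an impressive level of performance, yet a lack of suitable data in specific applications often limits them in practice. Typically, additional data sources are used to support the training task. In these cases, however, the domain gaps between different data sources pose a challenge for deep learning. GAN-based image-to-image style transfer is commonly applied to shrink the domain gap, but it is unstable and decoupled from the object detection task. We propose AWADA, an Attention-Weighted Adversarial Domain Adaptation framework that creates a feedback loop between the style transformation and the detection task. By constructing foreground object attention maps from object detector proposals, we focus the transformation on foreground object regions and stabilize style-transfer training. In extensive experiments and ablation studies, we show that AWADA reaches state-of-the-art unsupervised domain adaptation object detection performance on commonly used benchmarks for tasks such as synthetic-to-real, adverse weather, and cross-camera adaptation.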
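The idea of turning detector proposals into a foreground attention map and using it to weight a per-pixel loss can be sketched as follows. This is only an illustrative sketch under stated assumptions: the binary map, the end-exclusive box format `(x1, y1, x2, y2)`, and the function names `attention_map` and `attention_weighted_loss` are hypothetical and are not taken from the AWADA implementation.

```python
def attention_map(height, width, boxes):
    """Binary foreground map: 1.0 inside any proposal box, 0.0 elsewhere.

    Assumption: boxes are (x1, y1, x2, y2) in pixels, end-exclusive.
    """
    attn = [[0.0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                attn[y][x] = 1.0
    return attn


def attention_weighted_loss(per_pixel_loss, attn, eps=1e-8):
    """Average a per-pixel loss map over the attended (foreground) pixels only."""
    weighted = sum(
        loss * a
        for loss_row, attn_row in zip(per_pixel_loss, attn)
        for loss, a in zip(loss_row, attn_row)
    )
    mass = sum(a for attn_row in attn for a in attn_row)
    return weighted / (mass + eps)


# Toy usage: one 2x2 proposal inside a 4x4 image.
attn = attention_map(4, 4, [(1, 1, 3, 3)])
loss_map = [[2.0] * 4 for _ in range(4)]
foreground_loss = attention_weighted_loss(loss_map, attn)
```

In a real pipeline the per-pixel loss would presumably come from the adversarial discriminator of the style-transfer network, and the boxes from the detector being adapted.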
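Domain Adaptation for Object Detection (DAOD) has recently drawn much attention owing to its capability of detecting target objects without any annotations. To tackle the problem, previous works focus on aligning features extracted from partial levels (e.g., image-level, instance-level, RPN-level) of a two-stage detector via adversarial training. However, the individual levels in the object detection pipeline are closely related to each other, and this inter-level relation has not yet been considered. To this end, we introduce a novel framework for DAOD with three proposed components: Multi-scale-aware Uncertainty Attention (MUA), Transferable Region Proposal Network (TRPN), and Dynamic Instance Sampling (DIS). With these modules, we seek to reduce the negative transfer effect during training while maximizing transferability as well as discriminability in both domains. Finally, our framework implicitly learns domain-invariant regions for object detection by exploiting the transferable information, and enhances the complementarity between different detection levels by collaboratively utilizing their domain information. Through ablation studies and experiments, we show that the proposed modules contribute to the performance improvement in a synergistic way, demonstrating the effectiveness of our method. Moreover, our model achieves new state-of-the-art performance on various benchmarks.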
Domain adaptive detection aims to improve the generalization of detectors on the target domain. To reduce the discrepancy in feature distributions between two domains, recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features in alignment, which degrades detection. To address this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities, including pixel-, instance-, and category-levels, simultaneously to align two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. In addition, we introduce multi-granularity discriminators to identify which domain, source or target, samples at different granularities come from. Note that MGA not only leverages instance discriminability in different categories but also exploits category consistency between two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that explores model assessments for model update to improve pseudo labels and alleviate the local misalignment problem, boosting detection robustness. Extensive experiments on multiple domain adaptation scenarios validate the superiority of MGA over other approaches on FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real-world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging "synthetic-2-real" set-ups and show that the approach can also be used for detection.
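A loss based on the entropy of pixel-wise predictions can be illustrated with a small pure-Python sketch. The normalization by log(C) and the names `softmax`, `pixel_entropy`, and `entropy_loss` are assumptions made for illustration; the abstract does not specify the exact formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_entropy(probs, eps=1e-12):
    """Shannon entropy of one pixel's class distribution, divided by log(C)."""
    entropy = -sum(p * math.log(p + eps) for p in probs)
    return entropy / math.log(len(probs))


def entropy_loss(logit_map):
    """Mean normalized entropy over all pixels.

    logit_map: list of per-pixel logit vectors (a flattened H*W x C map).
    """
    entropies = [pixel_entropy(softmax(logits)) for logits in logit_map]
    return sum(entropies) / len(entropies)


# Uncertain pixels (flat logits) contribute more than confident ones.
uncertain = entropy_loss([[0.0, 0.0, 0.0]])
confident = entropy_loss([[10.0, 0.0, 0.0]])
```

Minimizing such a term on unlabeled target images pushes the network toward confident predictions there, which is the intuition behind entropy-based adaptation.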
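DETR-style detectors stand out in in-domain scenarios, but their properties in domain shift settings are under-explored. This paper aims to build a simple but effective baseline with a DETR-style detector in domain shift settings based on two findings. First, mitigating the domain shift on the backbone and on the decoder output features excels at producing favorable results. Second, advanced domain alignment methods in both parts further enhance the performance. Thus, we propose the Object-Aware Alignment (OAA) module and the Optimal Transport based Alignment (OTA) module to achieve comprehensive domain alignment on the outputs of the backbone and the detector. The OAA module aligns the foreground regions identified by pseudo-labels in the backbone outputs, leading to domain-invariant base features. The OTA module utilizes the sliced Wasserstein distance to maximize the retention of location information while minimizing the domain gap in the decoder outputs. We implement the findings and the alignment modules in our adaptation method, and benchmark the DETR-style detector in domain shift settings. Experiments on various domain adaptive scenarios validate the effectiveness of our method.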
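The sliced Wasserstein distance mentioned for the OTA module can be estimated by projecting both feature sets onto random directions and comparing the sorted projections. The sketch below is a generic Monte Carlo estimator, not the paper's module; the equal set sizes, the squared 2-Wasserstein form, and the name `sliced_wasserstein` are assumptions made for illustration.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    xs, ys: equally sized lists of feature vectors (lists of floats).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized point sets")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        # Draw a random unit direction.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction))
        direction = [v / norm for v in direction]
        # In 1-D, optimal transport simply matches sorted projections.
        proj_x = sorted(sum(a * b for a, b in zip(x, direction)) for x in xs)
        proj_y = sorted(sum(a * b for a, b in zip(y, direction)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(proj_x, proj_y)) / len(proj_x)
    return total / n_projections


source_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target_feats = [[x + 3.0, y] for x, y in source_feats]
gap = sliced_wasserstein(source_feats, target_feats)
```

Because each projection reduces the problem to one dimension, the estimator only needs sorting, which is what makes the sliced variant cheap compared with full optimal transport.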
Domain adaptation aims to bridge the domain shifts between the source and the target domain. These shifts may span different dimensions such as fog, rainfall, etc. However, recent methods typically do not consider explicit prior knowledge about the domain shifts on a specific dimension, which leads to less desirable adaptation performance. In this paper, we study a practical setting called Specific Domain Adaptation (SDA) that aligns the source and target domains in a specific, demanded dimension. Within this setting, we observe that the intra-domain gap induced by different domainness (i.e., numerical magnitudes of domain shifts in this dimension) is crucial when adapting to a specific domain. To address the problem, we propose a novel Self-Adversarial Disentangling (SAD) framework. In particular, given a specific dimension, we first enrich the source domain by introducing a domainness creator that provides additional supervisory signals. Guided by the created domainness, we design a self-adversarial regularizer and two loss functions to jointly disentangle the latent representations into domainness-specific and domainness-invariant features, thus mitigating the intra-domain gap. Our method can be easily used as a plug-and-play framework and does not introduce any extra cost at inference time. We achieve consistent improvements over state-of-the-art methods in both object detection and semantic segmentation.
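The goal of unpaired image-to-image translation is to produce an output image that reflects the target domain's style while keeping unrelated contents of the input source image unchanged. However, because existing approaches pay little attention to content changes, the semantic information from source images degrades during translation. In this paper, to address this issue, we introduce a novel approach, Global and Local Alignment Networks (GLA-Net). The global alignment network aims to transfer the input image from the source domain to the target domain. To do so effectively, we learn the parameters (mean and standard deviation) of multivariate Gaussian distributions as style features by using an MLP-Mixer based style encoder. To transfer the style more accurately, we employ an adaptive instance normalization layer in the encoder, with the parameters of the target multivariate Gaussian distribution as input. We also adopt regularization and likelihood losses to further reduce the domain gap and produce high-quality outputs. Additionally, we introduce a local alignment network, which employs a pretrained self-supervised model to produce an attention map via a novel local alignment loss, ensuring that the translation network focuses on relevant pixels. Extensive experiments conducted on five public datasets demonstrate that our method effectively generates sharper and more realistic images than existing approaches. Our code is available at https://github.com/ygjwd12345/glanet.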
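We address the task of domain adaptation in object detection, where there is a significant domain shift between a domain with annotations (source) and a domain of interest without annotations (target). As a widely adopted domain adaptation method, the self-training teacher-student framework (a student model learns from pseudo labels generated by a teacher model) has yielded remarkable accuracy gains on the target domain. However, it still suffers from a large number of low-quality pseudo labels (e.g., false positives) generated by the teacher due to its bias toward the source domain. To address this issue, we propose a self-training framework called Adaptive Unbiased Teacher (AUT) that leverages adversarial learning and weak-strong data augmentation during mutual learning to address domain shift. Specifically, we employ feature-level adversarial training in the student model, ensuring that features extracted from the source and target domains share similar statistics. This enables the student model to capture domain-invariant features. Furthermore, we apply weak-strong augmentation and mutual learning between the teacher model on the target domain and the student model on both domains. This enables the teacher model to gradually benefit from the student model without suffering from domain shift. We show that AUT outperforms all existing approaches and even Oracle (fully supervised) models by a large margin. For example, we achieve 50.9% (49.3%) mAP on Foggy Cityscapes (Clipart1k), which is 9.2% (5.2%) and 8.2% (11.0%) higher than the previous state of the art and the Oracle, respectively.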
Domain adaptive object detection (DAOD) aims to alleviate transfer performance degradation caused by the cross-domain discrepancy. However, most existing DAOD methods are dominated by computationally intensive two-stage detectors, which are not the first choice for industrial applications. In this paper, we propose a novel semi-supervised domain adaptive YOLO (SSDA-YOLO) based method to improve cross-domain detection performance by integrating the compact one-stage detector YOLOv5 with domain adaptation. Specifically, we adapt the knowledge distillation framework with the Mean Teacher model to assist the student model in obtaining instance-level features of the unlabeled target domain. We also utilize the scene style transfer to cross-generate pseudo images in different domains for remedying image-level differences. In addition, an intuitive consistency loss is proposed to further align cross-domain predictions. We evaluate our proposed SSDA-YOLO on public benchmarks including PascalVOC, Clipart1k, Cityscapes, and Foggy Cityscapes. Moreover, to verify its generalization, we conduct experiments on yawning detection datasets collected from various classrooms. The results show considerable improvements of our method in these DAOD tasks. Our code is available at https://github.com/hnuzhy/SSDA-YOLO.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch leads to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
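Unsupervised domain adaptation is critical in various computer vision tasks, such as object detection and instance segmentation. Such methods attempt to reduce the performance degradation induced by domain bias while also speeding up model deployment. Previous works on domain adaptive object detection attempt to align image-level and instance-level shifts to minimize the domain discrepancy, but in image-level domain adaptation they may align single-class features with mixed-class features, because each image in an object detection task may contain more than one class and object. To achieve single-class to single-class and mixed-class to mixed-class alignment, we treat the mixed class of features as a new class and propose a mixed-classes $H$-divergence for object detection to achieve homogeneous feature alignment and reduce negative transfer. We then present a Semantic Consistency Feature Alignment Model (SCFAM) based on the mixed-classes $H$-divergence. To improve single-class and mixed-class semantic information and accomplish semantic separation, the SCFAM model proposes Semantic Prediction Models (SPM) and Semantic Bridging Components (SBC). The weight of the pixel-level domain discriminator loss is then changed based on the SPM result to reduce sample imbalance. Extensive unsupervised domain adaptation experiments on widely used datasets illustrate the robust object detection of our proposed approach in domain bias settings.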
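Using synthetic data to train neural networks that achieve good performance on real-world data is an important task, as it can reduce the need for costly data annotation. Yet synthetic and real-world data have a domain gap. Reducing this gap, also known as domain adaptation, has been widely studied in recent years. Closing the domain gap between the source (synthetic) and target data by directly performing the adaptation between the two is challenging. In this work, we propose a novel two-stage framework for improving domain adaptation techniques on image data. In the first stage, we progressively train a multi-scale neural network to perform image translation from the source domain to the target domain. We denote the new transformed data as "Source in Target" (SiT). Then, we use the generated SiT data as the input to any standard UDA approach. This new data has a reduced domain gap to the desired target domain, which helps the applied UDA approach to close the gap further. We emphasize the effectiveness of our method via a comparison with other leading UDA and image-to-image translation techniques when used as SiT generators. Moreover, we demonstrate the improvement of our framework with three state-of-the-art UDA methods for semantic segmentation (HRDA, DAFormer, and ProDA) on two UDA tasks, GTA5 to Cityscapes and Synthia to Cityscapes.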
Domain adaptation is critical for success in new, unseen environments. Adversarial adaptation models applied in feature spaces discover domain invariant representations, but are difficult to visualize and sometimes fail to capture pixel-level and low-level domain shifts. Recent work has shown that generative adversarial networks combined with cycle-consistency constraints are surprisingly effective at mapping images between domains, even without the use of aligned image pairs. We propose a novel discriminatively-trained Cycle-Consistent Adversarial Domain Adaptation model. CyCADA adapts representations at both the pixel-level and feature-level, enforces cycle-consistency while leveraging a task loss, and does not require aligned pairs. Our model can be applied in a variety of visual recognition and prediction settings. We show new state-of-the-art results across multiple adaptation tasks, including digit classification and semantic segmentation of road scenes demonstrating transfer from synthetic to real world domains.
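The cycle-consistency constraint referred to above asks that an image mapped to the other domain and back should return to itself. A minimal sketch, assuming an L1 reconstruction penalty and two hypothetical generator callables `g_s2t` and `g_t2s` (toy invertible functions here, neural networks in practice):

```python
def l1_distance(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x_source, x_target, g_s2t, g_t2s):
    """L1 reconstruction error after a round trip through both generators."""
    source_cycle = g_t2s(g_s2t(x_source))  # source -> target -> source
    target_cycle = g_s2t(g_t2s(x_target))  # target -> source -> target
    return l1_distance(source_cycle, x_source) + l1_distance(target_cycle, x_target)


# Toy generators: a brightness shift and its exact inverse give zero loss.
def brighten(img):
    return [v + 1.0 for v in img]


def darken(img):
    return [v - 1.0 for v in img]


loss = cycle_consistency_loss([0.0, 2.0], [5.0, 7.0], brighten, darken)
```

The abstract combines this constraint with a task loss and feature-level adaptation; only the cycle term is shown here.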
In unsupervised domain adaptation (UDA), a model trained on source data (e.g. synthetic) is adapted to target data (e.g. real-world) without access to target annotation. Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. MIC enforces the consistency between predictions of masked target images, where random patches are withheld, and pseudo-labels that are generated based on the complete image by an exponential moving average teacher. To minimize the consistency loss, the network has to learn to infer the predictions of the masked regions from their context. Due to its simple and universal concept, MIC can be integrated into various UDA methods across different visual recognition tasks such as image classification, semantic segmentation, and object detection. MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8% on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to an improvement of +2.1 and +3.0 percent points over the previous state of the art. The implementation is available at https://github.com/lhoyer/MIC.
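The masking step described above, in which random patches of a target image are withheld, can be sketched in a few lines. The patch size, the mask ratio, zero-filling of masked pixels, and the names `patch_mask` and `apply_mask` are illustrative assumptions rather than the settings of the released MIC code.

```python
import random


def patch_mask(height, width, patch_size, mask_ratio, seed=None):
    """Per-pixel mask: 0.0 where a patch is withheld, 1.0 where it is kept."""
    rng = random.Random(seed)
    rows = -(-height // patch_size)  # ceiling division
    cols = -(-width // patch_size)
    # One keep/drop decision per patch, drawn independently.
    keep = [
        [0.0 if rng.random() < mask_ratio else 1.0 for _ in range(cols)]
        for _ in range(rows)
    ]
    return [
        [keep[y // patch_size][x // patch_size] for x in range(width)]
        for y in range(height)
    ]


def apply_mask(image, mask):
    """Zero out the withheld pixels of a single-channel image."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


image = [[1.0] * 8 for _ in range(8)]
masked = apply_mask(image, patch_mask(8, 8, patch_size=4, mask_ratio=0.5, seed=1))
```

In the setting the abstract describes, the student would predict on `masked` while the pseudo-labels come from an exponential moving average teacher that sees the complete image.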
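Recently, the DEtection TRansformer (DETR), an end-to-end object detection pipeline, has achieved promising performance. However, it requires large-scale labeled data and suffers from domain shift, especially when no labeled data is available in the target domain. To solve this problem, we propose MTTrans, an end-to-end cross-domain detection transformer based on the mean teacher framework, which can fully exploit unlabeled target domain data in object detection training and transfer knowledge between domains via pseudo labels. We further propose comprehensive multi-level feature alignment to improve the pseudo labels generated by the mean teacher framework, taking advantage of the cross-scale self-attention mechanism in Deformable DETR. Image and object features are aligned at the local, global, and instance levels with domain query-based feature alignment (DQFA), bi-level graph-based prototype alignment (BGPA), and token-wise image feature alignment (TIFA). In turn, the unlabeled target domain data, once pseudo-labeled and made available for object detection training by the mean teacher framework, can lead to better feature extraction and alignment. Thus, the mean teacher framework and the comprehensive multi-level feature alignment can be optimized iteratively and mutually based on the transformer architecture. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance in three domain adaptation scenarios; in particular, the result on the Sim10k to Cityscapes scenario improves from 52.6 mAP to 57.9 mAP. Code will be released.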
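The mean teacher mechanism with pseudo labels can be sketched generically: the teacher's weights track an exponential moving average (EMA) of the student's weights, and only confident teacher detections are kept as pseudo labels. The decay value, the score threshold, the dictionary-based parameters, and the function names below are hypothetical and are not taken from MTTrans.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def filter_pseudo_labels(detections, score_threshold=0.8):
    """Keep only teacher detections confident enough to supervise the student."""
    return [det for det in detections if det["score"] >= score_threshold]


# One training step in outline (toy scalar "weights").
teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student, decay=0.9)

teacher_detections = [
    {"box": (10, 10, 50, 50), "label": "car", "score": 0.95},
    {"box": (0, 0, 5, 5), "label": "car", "score": 0.30},
]
pseudo_labels = filter_pseudo_labels(teacher_detections)
```

A high decay keeps the teacher stable while the student is trained on noisy pseudo labels, which is the usual motivation for this design.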
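Detection transformers have recently shown promising object detection results and attracted increasing attention. However, how to develop effective domain adaptation techniques to improve their cross-domain performance remains unexplored and unclear. In this paper, we delve into this topic and empirically find that direct feature distribution alignment on the CNN backbone brings only limited improvements, as it does not guarantee domain-invariant sequence features in the transformer for prediction. To address this issue, we propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers. Technically, SFA consists of a domain query-based feature alignment (DQFA) module and a token-wise feature alignment (TDA) module. In DQFA, a novel domain query is used to aggregate and align global context from the token sequences of both domains. DQFA reduces the domain discrepancy in global feature representations and object relations when deployed in the transformer encoder and decoder, respectively. Meanwhile, TDA aligns token features in the sequences from both domains, which reduces the domain gaps in local and instance-level feature representations in the transformer encoder and decoder, respectively. In addition, a novel bipartite matching consistency loss is proposed to enhance feature discriminability for robust object detection. Experiments on three challenging benchmarks show that SFA outperforms state-of-the-art domain adaptive object detection methods. Code is available at https://github.com/encounter1997/sfa.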
We propose an approach for unsupervised adaptation of object detectors from label-rich to label-poor domains which can significantly reduce annotation costs associated with detection. Recently, approaches that align distributions of source and target images using an adversarial loss have been proven effective for adapting object classifiers. However, for object detection, fully matching the entire distributions of source and target images to each other at the global image level may fail, as domains could have distinct scene layouts and different combinations of objects. On the other hand, strong matching of local features such as texture and color makes sense, as it does not change category level semantics. This motivates us to propose a novel method for detector adaptation based on strong local alignment and weak global alignment. Our key contribution is the weak alignment model, which focuses the adversarial alignment loss on images that are globally similar and puts less emphasis on aligning images that are globally dissimilar. Additionally, we design the strong domain alignment model to only look at local receptive fields of the feature map. We empirically verify the effectiveness of our method on four datasets comprising both large and small domain shifts. Our code is available at https://github.com/VisionLearningGroup/DA_Detection.
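One way to put less emphasis on globally dissimilar images is to down-weight samples that the domain classifier already separates easily. The sketch below uses a focal-loss-style modulation for this purpose; the choice of this particular weighting, the `gamma` value, and the name `focal_domain_loss` are assumptions for illustration and are not stated in the abstract.

```python
import math


def focal_domain_loss(p_true_domain, gamma=2.0, eps=1e-12):
    """Focal-style domain classification loss for one image.

    p_true_domain: probability the domain classifier assigns to the image's
    actual domain. Values near 1.0 mean the image is easy to tell apart
    (globally dissimilar), so its loss is strongly down-weighted.
    """
    modulation = (1.0 - p_true_domain) ** gamma
    return -modulation * math.log(p_true_domain + eps)


# Hard-to-classify (globally similar) images dominate the alignment loss.
similar_image_loss = focal_domain_loss(0.5)
dissimilar_image_loss = focal_domain_loss(0.95)
```

With `gamma=0` the expression reduces to ordinary cross-entropy, so `gamma` controls how strongly the easy, dissimilar images are ignored.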
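In this paper, we tackle the problem of one-shot unsupervised domain adaptation (OSUDA) for semantic segmentation, where the segmentor sees only one unlabeled target image during training. In this case, traditional unsupervised domain adaptation models usually fail, since they cannot adapt to the target domain without over-fitting to one (or a few) target samples. To address this problem, existing OSUDA methods usually integrate a style-transfer module to perform domain randomization based on the unlabeled target sample, with which multiple domains around the target sample can be explored during training. However, such a style-transfer module relies on an additional set of images as style references for pre-training and also increases the memory demand of domain adaptation. Here we propose a new OSUDA method that can effectively relieve this computational burden. Specifically, we integrate several style-mixing layers into the segmentor, which play the role of the style-transfer module and stylize the source images without introducing any learned parameters. Moreover, we propose a patchwise prototypical matching (PPM) method to weight the importance of source pixels during supervised training and relieve negative adaptation. Experimental results show that our method achieves new state-of-the-art performance on two commonly used benchmarks under the one-shot setting and is more efficient than all compared approaches.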
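In recent years, the field of semantic segmentation has made tremendous progress. However, one remaining challenge is that segmentation models do not generalize to unseen domains. To overcome this problem, one either has to label large amounts of data covering the whole variety of domains, which is often infeasible in practice, or apply unsupervised domain adaptation (UDA), which requires only labeled source data. In this work, we focus on UDA and additionally address the case of adapting not only to a single domain but to a sequence of target domains. This requires mechanisms that prevent the model from forgetting its previously learned knowledge. To adapt a segmentation model to a target domain, we follow the idea of using light-weight style transfer to convert the style of labeled source images into the style of the target domain while retaining the source content. To mitigate the distributional shift between the source and the target domain, the model is fine-tuned on the transferred source images in a second step. Existing light-weight style transfer approaches relying on adaptive instance normalization (AdaIN) or the Fourier transform still lack performance and do not substantially improve upon common data augmentation such as color jittering. The reason is that these methods do not focus on region- or class-specific differences, but mainly capture the most salient style. Therefore, we propose a simple and light-weight framework that incorporates two class-conditional AdaIN layers. To extract the class-specific target moments needed for the transfer layers, we use unfiltered pseudo-labels, which we show to be an effective approximation compared to real labels. We extensively validate our approach (CACE) on a synthetic sequence and further propose a challenging sequence consisting of real domains. CACE outperforms existing methods visually and quantitatively.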
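The class-conditional AdaIN idea can be illustrated on a flattened single-channel feature map: for each class, source values are normalized with the source moments of that class and then re-scaled with the target moments of the same class. This is a simplified sketch; the single-channel layout, the `eps` value, and the names `moments` and `class_conditional_adain` are assumptions and do not reproduce the CACE layers.

```python
import math


def moments(values, eps=1e-5):
    """Mean and standard deviation (with eps for numerical stability)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance + eps)


def class_conditional_adain(source, source_labels, target_moments):
    """Apply AdaIN separately to the pixels of each class.

    source: flattened feature values; source_labels: class id per value;
    target_moments: {class id: (mean, std)}, e.g. estimated on target
    features grouped by pseudo-labels. Classes without target moments
    are left unchanged.
    """
    out = list(source)
    for cls in set(source_labels):
        if cls not in target_moments:
            continue
        idx = [i for i, label in enumerate(source_labels) if label == cls]
        mu_s, sigma_s = moments([source[i] for i in idx])
        mu_t, sigma_t = target_moments[cls]
        for i in idx:
            out[i] = sigma_t * (source[i] - mu_s) / sigma_s + mu_t
    return out


features = [0.0, 2.0, 10.0, 14.0]
labels = [0, 0, 1, 1]
stylized = class_conditional_adain(features, labels, {0: (5.0, 1.0), 1: (0.0, 2.0)})
```

Plain AdaIN would use one pair of moments for the whole image; conditioning on the class is what lets region-specific style differences be transferred.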
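Most existing domain adaptive object detection methods exploit adversarial feature alignment to adapt the model to a new domain. Recent advances in adversarial feature alignment strive to reduce the negative effect of alignment, or negative transfer, which occurs because the distribution of features varies depending on the object category. However, by analyzing the features of an anchor-free one-stage detector, we find in this paper that negative transfer may also occur because the feature distribution varies depending on the regression value for the offset to the bounding box as well as on the category. To obtain domain invariance by addressing this issue, we align the features conditioned on the offset value, taking the modality of the feature distribution into account. With a very simple and effective conditioning method, we propose OADA (Offset-Aware Domain Adaptive object detector), which achieves state-of-the-art performance in various experimental settings. In addition, through an analysis based on singular value decomposition, we find that our model enhances both discriminability and transferability.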