Deep learning has achieved notable success in 3D object detection with the advent of large-scale point cloud datasets. However, severe performance degradation on previously trained classes, i.e., catastrophic forgetting, remains a critical issue for real-world deployment when the number of classes is unknown or may vary. Moreover, existing 3D class-incremental detection methods are developed for the single-domain scenario and fail when encountering domain shift caused by different datasets, varying environments, etc. In this paper, we identify an unexplored yet valuable scenario, i.e., class-incremental learning under domain shift, and propose a novel 3D domain adaptive class-incremental object detection framework, DA-CIL. We design a dual-domain copy-paste augmentation method that constructs multiple augmented domains to diversify training distributions, thereby facilitating gradual domain adaptation. Then, multi-level consistency is explored to facilitate dual-teacher knowledge distillation from different domains for domain adaptive class-incremental learning. Extensive experiments on various datasets demonstrate the effectiveness of the proposed method over baselines in the domain adaptive class-incremental learning scenario.
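Deep learning-based methods have shown remarkable performance on 3D object detection. However, they suffer a catastrophic performance drop on the originally trained classes when incrementally learning new classes without revisiting the old data. This "catastrophic forgetting" phenomenon impedes the deployment of 3D object detection approaches in real-world scenarios, where continuous learning systems are needed. In this paper, we study the unexplored yet important class-incremental 3D object detection problem and present the first solution, SDCoT, a novel static-dynamic co-teaching method. SDCoT alleviates catastrophic forgetting of old classes via a static teacher, which provides pseudo annotations for old classes in the new samples and regularizes the current model by extracting previous knowledge with a distillation loss. At the same time, SDCoT consistently learns the underlying knowledge from new data via a dynamic teacher. We conduct extensive experiments on two benchmark datasets and demonstrate the superior performance of SDCoT over baseline approaches in several incremental learning scenarios.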
Domain adaptive object detection (DAOD) aims to alleviate transfer performance degradation caused by the cross-domain discrepancy. However, most existing DAOD methods are dominated by computationally intensive two-stage detectors, which are not the first choice for industrial applications. In this paper, we propose a novel semi-supervised domain adaptive YOLO (SSDA-YOLO) based method to improve cross-domain detection performance by integrating the compact one-stage detector YOLOv5 with domain adaptation. Specifically, we adapt the knowledge distillation framework with the Mean Teacher model to assist the student model in obtaining instance-level features of the unlabeled target domain. We also utilize scene style transfer to cross-generate pseudo images in different domains for remedying image-level differences. In addition, an intuitive consistency loss is proposed to further align cross-domain predictions. We evaluate our proposed SSDA-YOLO on public benchmarks including PascalVOC, Clipart1k, Cityscapes, and Foggy Cityscapes. Moreover, to verify its generalization, we conduct experiments on yawning detection datasets collected from various classrooms. The results show considerable improvements of our method in these DAOD tasks. Our code is available at https://github.com/hnuzhy/SSDA-YOLO.
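In a Mean Teacher setup such as the one mentioned above, the teacher's weights are usually maintained as an exponential moving average (EMA) of the student's weights. The following is a minimal sketch of that update, not SSDA-YOLO's implementation: weights are represented as plain Python dicts of floats standing in for tensors, and the function name `ema_update` and the decay `alpha` are illustrative assumptions.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return teacher weights moved toward the student by one EMA step.

    teacher, student: dicts mapping parameter name -> float
                      (an illustrative stand-in for model tensors).
    alpha: hypothetical decay; values closer to 1 give a slower-moving teacher.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}


teacher = {"w": 0.0, "b": 1.0}
student = {"w": 1.0, "b": 1.0}
for _ in range(3):  # three steps against a fixed student, for illustration only
    teacher = ema_update(teacher, student, alpha=0.5)
```

In practice the student changes at every step through gradient descent, and the decay is typically set much closer to 1 than the value used in this toy loop.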
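We address the task of domain adaptation in object detection, where there is a significant domain shift between a source domain (with supervision) and a target domain (without supervision). As a widely adopted domain adaptation method, the self-training teacher-student framework (in which a student model learns from pseudo-labels generated by a teacher model) has yielded remarkable accuracy gains on the target domain. However, it still suffers from the large number of low-quality pseudo-labels (e.g., false positives) generated by the teacher because of its bias toward the source domain. To address this issue, we propose a self-training framework called Adaptive Unbiased Teacher (AUT), which leverages adversarial learning and weak-strong data augmentation to address domain shift. Specifically, we employ feature-level adversarial training in the student model, ensuring that features extracted from the source and target domains share similar statistics. This enables the student model to capture domain-invariant features. Furthermore, we apply weak-strong augmentation and mutual learning between the teacher model on the target domain and the student model on both domains. This enables the teacher model to gradually benefit from the student model without suffering domain shift. We show that AUT outperforms all existing approaches and even Oracle (fully supervised) models by a large margin. For example, we achieve 50.9% (49.3%) mAP on Foggy Cityscapes (Clipart1k), which is 9.2% (5.2%) and 8.2% (11.0%) higher than the previous state of the art and the Oracle, respectively.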
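Self-training frameworks of this kind commonly discard low-confidence teacher predictions before using them as pseudo-labels for the student. A minimal sketch of that filtering step follows; the detection format (dicts with `box`, `label`, `score` keys) and the threshold of 0.8 are illustrative assumptions, not details taken from AUT.

```python
def filter_pseudo_labels(detections, threshold=0.8):
    """Keep teacher detections whose confidence is at least `threshold`."""
    return [d for d in detections if d["score"] >= threshold]


# Hypothetical teacher outputs on one unlabeled target-domain image
teacher_detections = [
    {"box": (10, 20, 50, 80), "label": "car", "score": 0.93},
    {"box": (5, 5, 15, 15), "label": "person", "score": 0.41},  # likely false positive
    {"box": (60, 30, 90, 70), "label": "car", "score": 0.80},
]
pseudo_labels = filter_pseudo_labels(teacher_detections)
```

A fixed threshold trades recall for precision; it reduces false positives but cannot correct a teacher that is confidently wrong because of source-domain bias, which is the problem the abstract targets with adversarial feature alignment.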
Vision-centric bird's-eye-view (BEV) perception has shown promising potential and attracted increasing attention in autonomous driving. Recent works mainly focus on improving efficiency or accuracy but neglect the domain shift problem, resulting in severe degradation of transfer performance. Through extensive observations, we identify significant domain gaps in scene, weather, and day-night changing scenarios and make the first attempt to solve the domain adaptation problem for multi-view 3D object detection. Since BEV perception approaches are usually complicated and contain several components, the accumulation of domain shift across multiple latent spaces makes BEV domain adaptation challenging. In this paper, we propose a novel Multi-level Multi-space Alignment Teacher-Student ($M^{2}ATS$) framework to ease the domain shift accumulation, which consists of a Depth-Aware Teacher (DAT) and a Multi-space Feature Aligned (MFA) student model. Specifically, the DAT model adopts uncertainty guidance to sample reliable depth information in the target domain. After constructing domain-invariant BEV perception, it then transfers pixel- and instance-level knowledge to the student model. To further alleviate the domain shift at the global level, the MFA student model is introduced to align task-relevant multi-space features of the two domains. To verify the effectiveness of $M^{2}ATS$, we conduct BEV 3D object detection experiments on four cross-domain scenarios and achieve state-of-the-art performance (e.g., +12.6% NDS and +9.1% mAP on Day-Night). Code and dataset will be released.
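Recently, Detection Transformer (DETR), an end-to-end object detection pipeline, has achieved promising performance. However, it requires large-scale labeled data and suffers from domain shift, especially when no labeled data is available in the target domain. To solve this problem, we propose MTTrans, an end-to-end cross-domain detection transformer based on the mean teacher framework, which can fully exploit unlabeled target domain data in object detection training through pseudo labels and transfer knowledge between domains. We further propose comprehensive multi-level feature alignment to improve the pseudo labels generated by the mean teacher framework, taking advantage of the cross-scale self-attention mechanism in Deformable DETR. Image and object features are aligned at the local, global, and instance levels with domain query-based feature alignment (DQFA), bi-level graph-based prototype alignment (BGPA), and token-wise image feature alignment (TIFA). On the other hand, the unlabeled target domain data, pseudo-labeled and made available for object detection training by the mean teacher framework, can lead to better feature extraction and alignment. Thus, the mean teacher framework and the comprehensive multi-level feature alignment can be optimized iteratively and mutually based on the transformer architecture. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance in three domain adaptation scenarios, especially on the Sim10k to Cityscapes scenario, where the result improves from 52.6 mAP to 57.9 mAP. Code will be released.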
While deep learning methods have achieved considerable success in medical image segmentation, they are still hampered by two limitations: (i) reliance on large-scale well-labeled datasets, which are difficult to curate due to the expert-driven and time-consuming nature of pixel-level annotations in clinical practice, and (ii) failure to generalize from one domain to another, especially when the target domain is a different modality with severe domain shifts. Recent unsupervised domain adaptation (UDA) techniques leverage abundant labeled source data together with unlabeled target data to reduce the domain gap, but these methods degrade significantly with limited source annotations. In this study, we address this underexplored UDA problem, investigating a challenging but valuable realistic scenario, where the source domain not only exhibits domain shift w.r.t. the target domain but also suffers from label scarcity. In this regard, we propose a novel and generic framework called "Label-Efficient Unsupervised Domain Adaptation" (LE-UDA). In LE-UDA, we construct self-ensembling consistency for knowledge transfer between both domains, as well as a self-ensembling adversarial learning module to achieve better feature alignment for UDA. To assess the effectiveness of our method, we conduct extensive experiments on two different tasks for cross-modality segmentation between MRI and CT images. Experimental results demonstrate that the proposed LE-UDA can efficiently leverage limited source labels to improve cross-domain segmentation performance, outperforming state-of-the-art UDA approaches in the literature. Code is available at: https://github.com/jacobzhaoziyuan/LE-UDA.
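In this paper, we propose LiDAR Distillation to bridge the domain gap in 3D object detection induced by different LiDAR beams. In many real-world applications, the LiDARs used by mass-produced robots and vehicles usually have fewer beams than those of large-scale public datasets. Moreover, as LiDARs are upgraded to other product models with different numbers of beams, it becomes challenging to utilize the labeled data captured by the high-resolution sensors of previous versions. Despite recent progress in domain adaptive 3D detection, most methods struggle to eliminate the beam-induced domain gap. We find that it is essential to align the point cloud density of the source domain with that of the target domain during training. Inspired by this finding, we propose a progressive framework to mitigate the beam-induced domain shift. In each iteration, we first generate low-beam pseudo LiDAR by downsampling the high-beam point clouds. Then, a teacher-student framework is employed to distill rich information from the data with more beams. Extensive experiments on the Waymo, nuScenes, and KITTI datasets with three different LiDAR-based detectors demonstrate the effectiveness of our LiDAR Distillation. Notably, our approach does not add any computational cost at inference.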
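The beam downsampling step can be illustrated with a short sketch. It assumes each point carries its laser ring index as a fourth field, which not every dataset provides (in some cases the index has to be estimated from the elevation angle); the function name `downsample_beams` and the keep-every-k-th-beam rule are illustrative assumptions rather than the paper's exact procedure.

```python
def downsample_beams(points, factor=2):
    """Simulate a lower-beam LiDAR by keeping every `factor`-th beam.

    points: list of (x, y, z, beam_id) tuples, where beam_id is the
            laser ring index of the point (assumed to be available).
    """
    return [p for p in points if p[3] % factor == 0]


# Toy 4-beam scan with two points per beam
scan = [(float(i), 0.0, 0.1 * b, b) for b in range(4) for i in range(2)]
low_beam_scan = downsample_beams(scan, factor=2)  # pseudo 2-beam scan
```

Applying the function repeatedly (for example 64 to 32 to 16 beams) gives the kind of progressively sparser training data that the abstract describes.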
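3D point cloud learning has recently been a hot topic in computer vision and autonomous driving. Because it is difficult to manually annotate a high-quality, large-scale 3D point cloud dataset, unsupervised domain adaptation (UDA), which aims to transfer knowledge learned from a labeled source domain to an unlabeled target domain, is popular in 3D point cloud learning. However, the generalization and reconstruction errors caused by domain shift with a simply learned model are inevitable, which substantially hinders the model's ability to learn good representations. To address these issues, we propose an end-to-end self-ensembling network (SEN) for 3D point cloud domain adaptation tasks. Generally, our SEN draws on the advantages of Mean Teacher and semi-supervised learning, and introduces a soft classification loss and a consistency loss, aiming to achieve consistent generalization and accurate reconstruction. In SEN, a student network is trained collaboratively with supervised learning and self-supervised learning, and a teacher network enforces temporal consistency to learn useful representations and ensure the quality of point cloud reconstruction. Extensive experiments on several 3D point cloud UDA benchmarks show that our SEN outperforms state-of-the-art methods on both classification and segmentation tasks. Moreover, further analysis demonstrates that our SEN also achieves better reconstruction results.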
Semi-supervised object detection is important for 3D scene understanding because obtaining large-scale 3D bounding box annotations on point clouds is time-consuming and labor-intensive. Existing semi-supervised methods usually employ teacher-student knowledge distillation together with an augmentation strategy to leverage unlabeled point clouds. However, these methods adopt global augmentation with scene-level transformations and hence are sub-optimal for instance-level object detection. In this work, we propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection. In this way, the resultant augmentor is derived to emphasize object instances rather than irrelevant backgrounds, making the augmented data more useful for object detector training. Extensive experiments on the ScanNet and SUN RGB-D datasets show that the proposed OPA performs favorably against the state-of-the-art methods under various experimental settings. The source code will be available at https://github.com/nomiaro/OPA.
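Domain adaptive object detection (DAOD) aims to improve the generalization ability of detectors when the training and test data come from different domains. Considering the significant domain gap, some typical methods, e.g., CycleGAN-based methods, adopt an intermediate domain to bridge the source and target domains progressively. However, the CycleGAN-based intermediate domain lacks pixel- or instance-level supervision for object detection, which leads to semantic differences. To address this problem, in this paper, we introduce a Frequency Spectrum Augmentation Consistency (FSAC) framework with four different low-frequency filter operations. In this way, we can obtain a series of augmented data as the intermediate domain. Concretely, we propose a two-stage optimization framework. In the first stage, we utilize all the original and augmented source data to train an object detector. In the second stage, augmented source and target data with pseudo labels are adopted to perform self-training for prediction consistency. A teacher model optimized with Mean Teacher is used to further refine the pseudo labels. In the experiments, we evaluate our method on single- and compound-target DAOD separately, which demonstrates the effectiveness of our method.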
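In the last decade, many deep learning models have been well trained and have achieved great success in various fields of machine intelligence, especially computer vision and natural language processing. To better leverage the potential of these well-trained models in intra-domain or cross-domain transfer learning situations, knowledge distillation (KD) and domain adaptation (DA) have been proposed and have become research highlights. Both aim to transfer useful information from well-trained models using the original training data. However, the original data is not always available due to privacy, copyright, or confidentiality. Recently, the data-free knowledge transfer paradigm has attracted considerable attention, as it deals with distilling valuable knowledge from well-trained models without requiring access to the training data. In particular, it mainly consists of data-free knowledge distillation (DFKD) and source data-free domain adaptation (SFDA). On the one hand, DFKD aims to transfer the intra-domain knowledge of the original data from a cumbersome teacher network to a compact student network for model compression and efficient inference. On the other hand, the goal of SFDA is to reuse the cross-domain knowledge stored in a well-trained source model and adapt it to a target domain. In this paper, we provide a comprehensive survey on data-free knowledge transfer from the perspectives of knowledge distillation and unsupervised domain adaptation, to help readers better understand the current research status and ideas. Applications and challenges of the two areas are briefly reviewed, respectively. Furthermore, we provide some insights into topics for future research.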
LiDAR-based 3D object detection is an indispensable task in advanced autonomous driving systems. Though superior 3D detectors have achieved impressive detection results, they suffer from significant performance degradation when facing unseen domains, such as different LiDAR configurations, different cities, and weather conditions. The mainstream approaches tend to solve these challenges by leveraging unsupervised domain adaptation (UDA) techniques. However, these UDA solutions yield unsatisfactory 3D detection results when there is a severe domain shift, e.g., from Waymo (64-beam) to nuScenes (32-beam). To address this, we present a novel Semi-Supervised Domain Adaptation method for 3D object detection (SSDA3D), where only a few labeled target samples are available, yet they can significantly improve the adaptation performance. In particular, our SSDA3D includes an Inter-domain Adaptation stage and an Intra-domain Generalization stage. In the first stage, an Inter-domain Point-CutMix module is presented to efficiently align the point cloud distribution across domains. The Point-CutMix generates mixed samples of an intermediate domain, thus encouraging the model to learn domain-invariant knowledge. Then, in the second stage, we further enhance the model for better generalization on the unlabeled target set. This is achieved by exploring Intra-domain Point-MixUp in semi-supervised learning, which essentially regularizes the pseudo label distribution. Experiments from Waymo to nuScenes show that, with only 10% labeled target data, our SSDA3D can surpass the fully-supervised oracle model trained with 100% of the target labels. Our code is available at https://github.com/yinjunbo/SSDA3D.
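One plausible reading of a CutMix-style operation on point clouds is to replace a spatial region of a source scan with the corresponding region of a target scan. The sketch below does this with an azimuth sector around the sensor; the sector-based cut, the function name `point_cutmix`, and the angle parameters are illustrative assumptions, and the actual Point-CutMix module in SSDA3D may select regions differently.

```python
import math


def point_cutmix(source_pts, target_pts, start_deg=0.0, end_deg=90.0):
    """Build a mixed scan: target points inside an azimuth sector,
    source points everywhere else.

    source_pts, target_pts: lists of (x, y, z) tuples in the sensor frame.
    start_deg, end_deg: hypothetical sector bounds in degrees, [start, end).
    """
    def in_sector(p):
        azimuth = math.degrees(math.atan2(p[1], p[0])) % 360.0
        return start_deg <= azimuth < end_deg

    return ([p for p in source_pts if not in_sector(p)]
            + [p for p in target_pts if in_sector(p)])


source = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0)]
target = [(2.0, 2.0, 0.0), (2.0, -2.0, 0.0)]
mixed = point_cutmix(source, target)
```

For detection training, the bounding-box labels whose objects fall inside the swapped region would have to be exchanged together with the points; that bookkeeping is omitted here.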
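Unsupervised domain adaptation (UDA) aims to adapt models from a labeled source domain to an unlabeled target domain. Existing UDA-based semantic segmentation approaches reduce the domain shift at the pixel level, feature level, and output level. However, almost all of them largely neglect the contextual dependency, which is generally shared across different domains, leading to less desirable performance. In this paper, we propose a novel Context-Aware Mixup (CAMix) framework for domain adaptive semantic segmentation, which exploits this important clue of contextual dependency as explicit prior knowledge in a fully end-to-end trainable manner to enhance adaptability toward the target domain. First, we present a contextual mask generation strategy that leverages the accumulated spatial distributions and prior contextual relationships. The generated contextual mask is critical in this work and guides the context-aware domain mixup at three different levels. In addition, given the context knowledge, we introduce a significance-reweighted consistency loss to penalize the inconsistency between the mixed student prediction and the mixed teacher prediction, which alleviates the negative transfer of the adaptation, e.g., early performance degradation. Extensive experiments and analysis demonstrate the effectiveness of our method against state-of-the-art approaches on widely used UDA benchmarks.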
When facing changing environments in the real world, the lightweight model on client devices suffers from severe performance drops under distribution shifts. The main limitations of existing device models are (1) the inability to update due to the device's computation limit and (2) the limited generalization ability of the lightweight model. Meanwhile, recent large models have shown strong generalization capability on the cloud, but they cannot be deployed on client devices because of the devices' limited computation. To enable the device model to deal with changing environments, we propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device and improves the generalization of the device model. Based on this paradigm, we further propose an Uncertainty-based Visual Prompt Adapted (U-VPA) teacher-student model to transfer the generalization capability of the large model on the cloud to the device model. Specifically, we first design the Uncertainty Guided Sampling (UGS) to screen out challenging data continuously and transmit the most out-of-distribution samples from the device to the cloud. Then we propose a Visual Prompt Learning Strategy with Uncertainty guided updating (VPLU) to specifically deal with the selected samples with more distribution shifts. We transmit the visual prompts to the device and concatenate them with the incoming data to pull the device testing distribution closer to the cloud training distribution. We conduct extensive experiments on two object detection datasets with continually changing environments. Our proposed U-VPA teacher-student framework outperforms previous state-of-the-art test time adaptation and device-cloud collaboration methods. The code and datasets will be released.
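The idea of uncertainty-guided sampling can be sketched as ranking incoming samples by the uncertainty of the device model's predictions and uploading only the top few. The abstract does not say how uncertainty is measured, so the sketch below assumes predictive entropy over class probabilities; the names `entropy`, `select_uncertain`, and the sample ids are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_uncertain(predictions, k=1):
    """Return the ids of the k samples with the highest predictive entropy.

    predictions: dict mapping sample id -> list of class probabilities.
    """
    ranked = sorted(predictions,
                    key=lambda sid: entropy(predictions[sid]),
                    reverse=True)
    return ranked[:k]


device_predictions = {
    "frame_a": [0.98, 0.01, 0.01],  # confident prediction
    "frame_b": [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    "frame_c": [0.70, 0.20, 0.10],
}
to_upload = select_uncertain(device_predictions, k=2)
```

Limiting uploads to the most uncertain samples keeps the device-to-cloud bandwidth small while concentrating cloud-side adaptation on data that is likely out of distribution.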
Scaling object taxonomies is one of the important steps toward a robust real-world deployment of recognition systems. We have witnessed remarkable progress in the image domain since the introduction of the LVIS benchmark. To continue this success in videos, a new video benchmark, TAO, was recently presented. Given the recent encouraging results from both the detection and tracking communities, we are interested in marrying those two advances and building a strong large vocabulary video tracker. However, supervision in LVIS and TAO is inherently sparse or even missing, posing two new challenges for training large vocabulary trackers. First, LVIS provides no tracking supervision, which leads to inconsistent learning of detection (with LVIS and TAO) and tracking (only with TAO). Second, the detection supervision in TAO is partial, which results in catastrophic forgetting of absent LVIS categories during video fine-tuning. To resolve these challenges, we present a simple but effective learning framework that takes full advantage of all available training data to learn detection and tracking without losing the ability to recognize any LVIS category. With this new learning scheme, we show that consistent improvements of various large vocabulary trackers are achievable, setting strong baseline results on the challenging TAO benchmarks.
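Deep learning approaches have achieved remarkable success in 3D semantic segmentation. However, collecting densely annotated real-world 3D datasets is extremely time-consuming and expensive. Training models on synthetic data and generalizing to real-world scenarios is an appealing alternative, but unfortunately it suffers from notorious domain shifts. In this work, we propose a Data-Oriented Domain Adaptation (DODA) framework to mitigate the pattern and context gaps caused by different sensing mechanisms and layout placements across domains. Our DODA encompasses virtual scan simulation to imitate real-world point cloud patterns and tail-aware cuboid mixing to alleviate the interior context gap with a cuboid-based intermediate domain. The first unsupervised sim-to-real adaptation benchmark on 3D indoor semantic segmentation is also built on 3D-FRONT, ScanNet, and S3DIS, along with 7 popular unsupervised domain adaptation (UDA) methods. Our DODA surpasses existing UDA approaches by over 13% on both 3D-FRONT -> ScanNet and 3D-FRONT -> S3DIS. Code is available at https://github.com/cvmi-lab/doda.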
Domain adaptation aims to bridge the domain shifts between the source and the target domain. These shifts may span different dimensions such as fog, rainfall, etc. However, recent methods typically do not consider explicit prior knowledge about the domain shifts on a specific dimension, thus leading to less desirable adaptation performance. In this paper, we study a practical setting called Specific Domain Adaptation (SDA) that aligns the source and target domains along a specific, demanded dimension. Within this setting, we observe that the intra-domain gap induced by different domainness (i.e., numerical magnitudes of domain shifts in this dimension) is crucial when adapting to a specific domain. To address the problem, we propose a novel Self-Adversarial Disentangling (SAD) framework. In particular, given a specific dimension, we first enrich the source domain by introducing a domainness creator that provides additional supervisory signals. Guided by the created domainness, we design a self-adversarial regularizer and two loss functions to jointly disentangle the latent representations into domainness-specific and domainness-invariant features, thus mitigating the intra-domain gap. Our method can be easily used as a plug-and-play framework and does not introduce any extra cost at inference time. We achieve consistent improvements over state-of-the-art methods in both object detection and semantic segmentation.
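3D LiDAR semantic segmentation is fundamental for autonomous driving. Several unsupervised domain adaptation (UDA) methods for point cloud data have recently been proposed to improve model generalization across different sensors and environments. Researchers working on UDA problems in the image domain have shown that sample mixing can mitigate domain shift. We propose a new approach to sample mixing for point cloud UDA, namely Compositional Semantic Mix (CoSMix), the first UDA approach for point clouds based on sample mixing. CoSMix consists of a two-branch symmetric network that can process labeled synthetic data (source) and real-world unlabeled point clouds (target) concurrently. Each branch operates on one domain by mixing in selected pieces of data from the other, using the semantic information derived from source labels and target pseudo-labels. We evaluate CoSMix on two large-scale datasets, showing that it outperforms state-of-the-art methods. Our code is available at https://github.com/saltoricristiano/cosmix-uda.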
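Fine-grained localization of clinicians in the operating room (OR) is a key component in designing the new generation of OR support systems. Computer vision models for person pixel-based segmentation and body-keypoint detection are needed to better understand the clinical activities and the spatial layout of the OR. This is challenging, not only because OR images are very different from traditional vision datasets, but also because data and annotations are hard to collect and generate in the OR due to privacy concerns. To address these concerns, we first study how pose estimation and instance segmentation can be performed on low-resolution images, with downsampling factors from 1x to 12x. Second, to address the domain shift and the lack of annotations, we propose a novel unsupervised domain adaptation method, called AdaptOR, to adapt a model from an in-the-wild labeled source domain to a statistically different unlabeled target domain. We propose to exploit explicit geometric constraints on different augmentations of the unlabeled target domain image to generate accurate pseudo labels, and to use these pseudo labels to train the model on high- and low-resolution OR images in a self-training framework. Furthermore, we propose disentangled feature normalization to handle the statistically different source and target domain data. Extensive experimental results with detailed ablation studies on the two OR datasets MVOR+ and TUM-OR-test show the effectiveness of our approach against strongly constructed baselines, especially on the low-resolution privacy-preserving OR images. Finally, we show the generality of our method as a semi-supervised learning (SSL) method on the large-scale COCO dataset, where we achieve comparable results with as little as 1% of labeled supervision against a model trained with 100% labeled supervision.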