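Despite the success of deep learning on supervised point cloud semantic segmentation, obtaining large-scale point-by-point manual annotations remains a significant challenge. To reduce the huge annotation burden, we propose Region-based and Diversity-aware Active Learning (ReDAL), a general framework for many deep learning approaches that aims to automatically select only informative and diverse sub-scene regions for label acquisition. Observing that only a small portion of annotated regions is sufficient for 3D scene understanding with deep learning, we use softmax entropy, color discontinuity, and structural complexity to measure the information of sub-scene regions. A diversity-aware selection algorithm is also developed to avoid redundant annotations resulting from selecting informative but similar regions in a querying batch. Extensive experiments show that our method outperforms previous active learning strategies, and we achieve 90% of the performance of fully supervised learning while requiring less than 15% and 5% of the annotations on the S3DIS and SemanticKITTI datasets, respectively. Our code is publicly available at https://github.com/tsunghan-wu/redal.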
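As a rough illustration of the softmax-entropy term mentioned in the abstract above, the following minimal sketch scores each region by the mean entropy of its per-point class distributions and ranks regions from most to least uncertain. The function names and the dictionary-of-logits input are hypothetical, and the sketch deliberately omits the color-discontinuity, structural-complexity, and diversity-aware parts, whose exact form the abstract does not give.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def point_entropy(probs):
    """Shannon entropy (in nats) of one point's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def region_entropy(region_logits):
    """Mean softmax entropy over all points in a region (assumed aggregation)."""
    entropies = [point_entropy(softmax(logits)) for logits in region_logits]
    return sum(entropies) / len(entropies)


def rank_regions(regions):
    """Return region ids sorted from most to least uncertain.

    `regions` maps a hypothetical region id to a list of per-point logits.
    """
    scores = {rid: region_entropy(logits) for rid, logits in regions.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A region whose points all have flat logits receives the maximum score (the log of the number of classes), so it would be queried before a region the model already predicts confidently.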
We propose LiDAL, a novel active learning method for 3D LiDAR semantic segmentation by exploiting inter-frame uncertainty among LiDAR frames. Our core idea is that a well-trained model should generate robust results irrespective of viewpoints for scene scanning and thus the inconsistencies in model predictions across frames provide a very reliable measure of uncertainty for active sample selection. To implement this uncertainty measure, we introduce new inter-frame divergence and entropy formulations, which serve as the metrics for active selection. Moreover, we demonstrate additional performance gains by predicting and incorporating pseudo-labels, which are also selected using the proposed inter-frame uncertainty measure. Experimental results validate the effectiveness of LiDAL: we achieve 95% of the performance of fully supervised learning with less than 5% of annotations on the SemanticKITTI and nuScenes datasets, outperforming state-of-the-art active learning methods. Code release: https://github.com/hzykent/LiDAL.
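One way to picture an inter-frame uncertainty measure is sketched below. It assumes that the same 3D location has a predicted class distribution in several frames, and it measures disagreement as the average KL divergence between each frame's distribution and their mean, plus the entropy of that mean. These particular formulas are an assumption chosen for illustration; the abstract does not state LiDAL's exact divergence and entropy definitions.

```python
import math


def mean_distribution(dists):
    """Element-wise mean of several class distributions for one location."""
    n = len(dists)
    return [sum(d[c] for d in dists) / n for c in range(len(dists[0]))]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def inter_frame_divergence(dists):
    """Average KL(frame || mean) over frames; zero when all frames agree."""
    m = mean_distribution(dists)
    return sum(kl_divergence(d, m) for d in dists) / len(dists)


def inter_frame_entropy(dists):
    """Entropy of the mean distribution across frames."""
    m = mean_distribution(dists)
    return -sum(p * math.log(p) for p in m if p > 0.0)
```

Under this assumed formulation, a location that two frames label with opposite certainty gets a divergence of log 2, while a location with identical predictions in every frame gets zero.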
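In the field of domain adaptation, a trade-off exists between model performance and the number of target-domain annotations. Active learning, which maximizes model performance with a small amount of informative labeled data, comes in handy for such a scenario. In this work, we present D2ADA, a general active domain adaptation framework for semantic segmentation. To adapt the model to the target domain with a minimum of queried labels, we propose acquiring labels for samples that have high probability density in the target domain yet low probability density in the source domain, so that they are complementary to the existing source-domain labeled data. To further improve labeling efficiency, we design a dynamic scheduling policy that adjusts the labeling budget between domain exploration and model uncertainty. Extensive experiments show that our method outperforms existing active learning and domain adaptation baselines on two benchmarks, GTA5 -> Cityscapes and SYNTHIA -> Cityscapes. With less than 5% of target-domain annotations, our method reaches results comparable to those of full supervision. Our code is publicly available at https://github.com/tsunghan-wu/d2ada.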
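Since preparing labeled point cloud data for training semantic segmentation networks is a time-consuming process, weakly supervised approaches have been introduced to learn from only a small fraction of the data. These methods are typically based on learning with contrastive losses while automatically deriving per-point pseudo-labels from a sparse set of user-annotated labels. In this paper, our key observation is that the selection of which samples to annotate is as important as how these samples are used for training. Thus, we introduce a method for weakly supervised segmentation of 3D scenes that combines self-training with active learning. Active learning selects points for annotation that are likely to improve the performance of the trained model, while self-training makes efficient use of the user-provided labels for learning the model. We demonstrate that our approach leads to an effective method that improves scene segmentation over previous works and baselines while requiring only a small number of user annotations.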
As an important data selection schema, active learning emerges as the essential component when iterating an Artificial Intelligence (AI) model. It becomes even more critical given the dominance of deep neural network based models, which are composed of a large number of parameters and data hungry, in application. Despite its indispensable role for developing AI models, research on active learning is not as intensive as other research directions. In this paper, we present a review of active learning through deep active learning approaches from the following perspectives: 1) technical advancements in active learning, 2) applications of active learning in computer vision, 3) industrial systems leveraging or with potential to leverage active learning for data iteration, 4) current limitations and future research directions. We expect this paper to clarify the significance of active learning in a modern AI model manufacturing process and to bring additional research attention to active learning. By addressing data automation challenges and coping with automated machine learning systems, active learning will facilitate democratization of AI technologies by boosting model production at scale.
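Weakly supervised point cloud semantic segmentation methods, which require 1% or fewer labels while aiming for almost the same performance as fully supervised approaches, have recently attracted extensive research attention. A typical solution in this framework is to use self-training or pseudo-labeling to mine supervision from the point cloud itself, which ignores the critical information in images. In fact, cameras are widely present in LiDAR scenarios, and this complementary information appears to be very important for 3D applications. In this paper, we propose a novel cross-modality weakly supervised method for 3D segmentation that incorporates complementary information from unlabeled images. We design a dual-branch network equipped with an effective labeling strategy to maximize the power of the labels and directly realize 2D-to-3D knowledge transfer. Afterwards, we establish a cross-modal self-training framework from an Expectation-Maximization (EM) perspective, which iterates between pseudo-label estimation and parameter updating. In the M-step, we propose cross-modal association learning to mine complementary supervision from images by reinforcing the cycle consistency between 3D points and 2D superpixels. In the E-step, a pseudo-label self-rectification mechanism is derived to filter noisy labels, thus providing more accurate labels so that the network can be fully trained. Extensive experimental results demonstrate that our method even outperforms state-of-the-art competitors with less than 1% of actively selected annotations.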
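Self-training has greatly facilitated domain-adaptive semantic segmentation; it iteratively generates pseudo-labels on the target domain and retrains the network. However, since realistic segmentation datasets are highly imbalanced, target pseudo-labels are typically biased toward the majority classes and are inherently noisy, leading to an error-prone and suboptimal model. To address this issue, we propose a region-based active learning approach for semantic segmentation under domain shift, aiming to automatically query a small partition of image regions to be labeled while maximizing segmentation performance. Our algorithm, Active Learning via Region Impurity and Prediction Uncertainty (AL-RIPU), introduces a novel acquisition strategy that characterizes the spatial adjacency of image regions along with the prediction confidence. We show that the proposed region-based selection strategy makes more efficient use of a limited budget than image-based or point-based counterparts. Meanwhile, we enforce local prediction consistency between a pixel and its nearest neighbors on source images. Further, we develop a negative learning loss to improve discriminative representations on the target domain. Extensive experiments demonstrate that our method requires only very few annotations to almost reach supervised performance and substantially outperforms state-of-the-art methods.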
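To make the idea of region impurity concrete, here is a minimal sketch that treats impurity as the entropy of the predicted-class histogram inside a square window around a pixel. The window shape, the border clipping, and the function name are assumptions made for illustration; the abstract does not specify how AL-RIPU defines the neighborhood or how it combines impurity with prediction uncertainty.

```python
import math
from collections import Counter


def region_impurity(pred, row, col, k=1):
    """Entropy (nats) of predicted labels in the (2k+1) x (2k+1) window.

    `pred` is a 2D list of predicted class ids; the window is clipped at
    the image border. A window containing a single class has impurity 0.
    """
    height, width = len(pred), len(pred[0])
    labels = [
        pred[r][c]
        for r in range(max(0, row - k), min(height, row + k + 1))
        for c in range(max(0, col - k), min(width, col + k + 1))
    ]
    counts = Counter(labels)
    n = len(labels)
    return -sum((v / n) * math.log(v / n) for v in counts.values())
```

Pixels near predicted class boundaries score high under this measure, which matches the intuition that regions mixing several predicted classes are more informative to label than homogeneous ones.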
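Manually annotating complex scene point cloud datasets is both costly and error-prone. To reduce the reliance on labeled data, a new model called SnapshotNet is proposed as a self-supervised feature learning approach that works directly on the unlabeled point cloud data of a complex 3D scene. The SnapshotNet pipeline includes three stages. In the snapshot capturing stage, snapshots, which are defined as local collections of points, are sampled from the point cloud scene. A snapshot can be a view of a local 3D scan captured directly from the real scene, or a virtual view of such a scan from a large 3D point cloud dataset. Snapshots can also be sampled at different sampling rates or fields of view (FOVs) to capture scale information from the scene. In the feature learning stage, a new pretext task called multi-FOV contrasting is proposed to recognize whether or not two snapshots are from the same object, either within the same FOV or across different FOVs. Snapshots go through two self-supervised learning steps: a contrastive learning step with both part and scale contrasting, followed by a snapshot clustering step to extract higher-level semantic features. A weakly supervised segmentation stage is then implemented by first training a standard SVM classifier on the learned features with a small fraction of labeled snapshots. The trained SVM is used to predict labels for input snapshots, and the predicted labels are converted into point-wise label assignments for semantic segmentation of the entire scene using a voting procedure. Experiments are conducted on the Semantic3D dataset, and the results show that the proposed method is capable of learning effective features from snapshots of complex scene data without any labels. Moreover, the proposed method shows advantages when compared with the SOA method for weakly supervised point cloud semantic segmentation.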
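The voting procedure mentioned at the end of the pipeline can be pictured with a short sketch. It assumes each snapshot is represented as a pair of point indices and one predicted label (for example, from the trained classifier), and it assigns each point the label that most of its covering snapshots agree on. The data layout and function name are hypothetical, and the abstract does not say how ties or uncovered points are handled; here uncovered points receive `None`.

```python
from collections import Counter, defaultdict


def vote_point_labels(snapshots, num_points):
    """Convert snapshot-level predictions into point-wise labels.

    `snapshots` is a list of (point_ids, label) pairs. Each point takes
    the majority label among the snapshots that contain it; points not
    covered by any snapshot are left as None.
    """
    votes = defaultdict(Counter)
    for point_ids, label in snapshots:
        for pid in point_ids:
            votes[pid][label] += 1
    return [
        votes[pid].most_common(1)[0][0] if pid in votes else None
        for pid in range(num_points)
    ]
```

Because snapshots overlap, a point that lies in several snapshots benefits from multiple independent predictions, which is the main reason a voting step can smooth out individual misclassifications.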
Deep learning has attained remarkable success in many 3D visual recognition tasks, including shape classification, object detection, and semantic segmentation. However, many of these results rely on manually collecting densely annotated real-world 3D data, which is highly time-consuming and expensive to obtain, limiting the scalability of 3D recognition tasks. Thus, we study unsupervised 3D recognition and propose a Self-supervised-Self-Labeled 3D Recognition (SL3D) framework. SL3D simultaneously solves two coupled objectives, i.e., clustering and learning feature representation to generate pseudo-labeled data for unsupervised 3D recognition. SL3D is a generic framework and can be applied to solve different 3D recognition tasks, including classification, object detection, and semantic segmentation. Extensive experiments demonstrate its effectiveness. Code is available at https://github.com/fcendra/sl3d.
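Object detectors trained with weak annotations are affordable alternatives to their fully supervised counterparts. However, a significant performance gap still exists between them. We propose to narrow this gap by fine-tuning a base pre-trained weakly supervised detector with a few fully annotated samples automatically selected from the training set using "box-in-box" (BiB), a novel active learning strategy designed specifically to address the well-documented failure modes of weakly supervised detectors. Experiments on the VOC07 and COCO benchmarks show that BiB outperforms other active learning techniques and significantly improves the base weakly supervised detector's performance with only a few fully annotated images per class. BiB reaches 97% of the performance of fully supervised Fast RCNN with only 10% of fully annotated images on VOC07. On COCO, using on average 10 fully annotated images per class, or equivalently 1% of the training set, BiB also reduces the performance gap (in AP) between the weakly supervised detector and the fully supervised Fast RCNN by more than 70%, showing a good trade-off between performance and data efficiency. Our code is publicly available at https://github.com/huyvvo/bib.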
Recent aerial object detection models rely on a large amount of labeled training data, which requires unaffordable manual labeling costs in large aerial scenes with dense objects. Active learning is effective in reducing the data labeling cost by selectively querying the informative and representative unlabelled samples. However, existing active learning methods are mainly with class-balanced setting and image-based querying for generic object detection tasks, which are less applicable to aerial object detection scenario due to the long-tailed class distribution and dense small objects in aerial scenes. In this paper, we propose a novel active learning method for cost-effective aerial object detection. Specifically, both object-level and image-level informativeness are considered in the object selection to refrain from redundant and myopic querying. Besides, an easy-to-use class-balancing criterion is incorporated to favor the minority objects to alleviate the long-tailed class distribution problem in model training. To fully utilize the queried information, we further devise a training loss to mine the latent knowledge in the undiscovered image regions. Extensive experiments are conducted on the DOTA-v1.0 and DOTA-v2.0 benchmarks to validate the effectiveness of the proposed method. The results show that it can save more than 75% of the labeling cost to reach the same performance compared to the baselines and state-of-the-art active object detection methods. Code is available at https://github.com/ZJW700/MUS-CDB
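Most existing point cloud instance and semantic segmentation methods rely heavily on strong supervision signals, which require a point-level label for every point in the scene. However, such strong supervision incurs large annotation costs, which motivates the study of efficient annotation. In this paper, we find that the locations of instances matter for both instance and semantic 3D scene segmentation. By taking full advantage of locations, we design a weakly supervised point cloud segmentation method that requires only a single click per instance to indicate its location for annotation. With over-segmentation as preprocessing, we extend these location annotations into segments as seg-level labels. We further design a segment grouping network (SegGroup) that generates point-level pseudo-labels under seg-level labels by grouping the unlabeled segments into the relevant nearby labeled segments, so that existing point-level supervised segmentation models can directly consume these pseudo-labels for training. Experimental results show that our seg-level supervised method (SegGroup) achieves results comparable to those of fully annotated point-level supervised methods. Moreover, it outperforms recent weakly supervised methods given a fixed annotation budget.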
Our dataset provides dense annotations for each scan of all sequences from the KITTI Odometry Benchmark [19]. Here, we show multiple scans aggregated using pose information estimated by a SLAM approach.
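Active learning (AL) attempts to maximize a model's performance gain while labeling the fewest samples. Deep learning (DL) is greedy for data and requires a large supply of data to optimize a massive number of parameters so that the model learns how to extract high-quality features. In recent years, owing to the rapid development of internet technology, we have entered an era of information abundance with massive amounts of data. As a result, DL has attracted strong interest from researchers and has developed rapidly. Compared with DL, researchers have shown relatively little interest in AL. This is mainly because, before the rise of DL, traditional machine learning required relatively few labeled samples, so early AL had difficulty demonstrating the value it deserves. Although DL has made breakthroughs in various fields, most of this success is due to the public availability of a large number of existing annotated datasets. However, acquiring a large number of high-quality annotated datasets consumes a great deal of manpower, which is not feasible in fields that require a high level of expertise, such as speech recognition, information extraction, and medical imaging. Therefore, AL is gradually receiving the attention it is due. A natural idea is to ask whether AL can be used to reduce the cost of sample annotation while retaining the powerful learning capabilities of DL. As a result, deep active learning (DAL) has emerged. Although related research is abundant, a comprehensive survey of DAL is still lacking. This article fills that gap: we provide a formal classification method for existing work, along with a comprehensive and systematic overview. In addition, we analyze and summarize the development of DAL from an application perspective. Finally, we discuss the confusion and open problems in DAL and suggest some possible development directions.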
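Densely annotating LiDAR point clouds is costly, which limits the scalability of fully supervised learning methods. In this work, we study the underexplored problem of semi-supervised learning (SSL) in LiDAR segmentation. Our core idea is to leverage the strong spatial cues of LiDAR point clouds to better exploit unlabeled data. We propose LaserMix, which mixes laser beams from different LiDAR scans and then encourages the model to make consistent and confident predictions before and after mixing. Our framework has three appealing properties: 1) Generic: LaserMix is agnostic to LiDAR representations (e.g., range view and voxel), so our SSL framework can be universally applied. 2) Statistically grounded: we provide a detailed analysis to theoretically explain the applicability of the proposed framework. 3) Effective: comprehensive experimental analysis on popular LiDAR segmentation datasets (nuScenes, SemanticKITTI, and ScribbleKITTI) demonstrates our effectiveness and superiority. Notably, we achieve results competitive with fully supervised counterparts while using 2x to 5x fewer labels, and we improve the supervised-only baseline by 10.8% on average. We hope this concise yet high-performing framework can facilitate future research in semi-supervised LiDAR segmentation. Code will be made publicly available.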
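The mixing step can be sketched as follows, under the assumption that the two scans are split into bands by inclination angle and that alternating bands are taken from each scan. The abstract only says that laser beams from different scans are mixed, so the band partition, the alternating pattern, and all names here are illustrative assumptions rather than the published procedure.

```python
import math


def inclination(point):
    """Inclination angle (radians) of an (x, y, z) point above the xy-plane."""
    x, y, z = point
    return math.atan2(z, math.hypot(x, y))


def area_index(point, phi_min, phi_max, num_areas):
    """Index of the inclination band containing the point (clamped to range)."""
    phi = min(max(inclination(point), phi_min), phi_max)
    idx = int((phi - phi_min) / (phi_max - phi_min) * num_areas)
    return min(idx, num_areas - 1)


def laser_mix(scan_a, scan_b, phi_min, phi_max, num_areas):
    """Take even-indexed bands from scan_a and odd-indexed bands from scan_b."""
    from_a = [p for p in scan_a
              if area_index(p, phi_min, phi_max, num_areas) % 2 == 0]
    from_b = [p for p in scan_b
              if area_index(p, phi_min, phi_max, num_areas) % 2 == 1]
    return from_a + from_b
```

In a semi-supervised setup, the same band assignment would also be used to mix the labels or pseudo-labels of the two scans, so that the prediction on the mixed scan can be compared with the mixed targets.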
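Fast and efficient semantic segmentation of large-scale LiDAR point clouds is a fundamental problem in autonomous driving. To achieve this goal, existing point-based methods mainly adopt a random sampling strategy to process large-scale point clouds. However, our quantitative and qualitative studies find that random sampling may be less suitable for autonomous driving scenarios, because LiDAR points follow an uneven or even long-tailed distribution across space, which prevents the model from capturing sufficient information from points in different distance ranges and reduces the model's learning capability. To alleviate this problem, we propose a new Polar Cylinder Balanced Random Sampling method that enables the downsampled point clouds to maintain a more balanced distribution and improves segmentation performance under different spatial distributions. In addition, a sampling consistency loss is introduced to further improve segmentation performance and reduce the model's variance under different sampling methods. Extensive experiments confirm that our approach produces excellent performance on both the SemanticKITTI and SemanticPOSS benchmarks, achieving improvements of 2.8% and 4.0%, respectively.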
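A balanced alternative to plain random sampling might look like the sketch below. It assumes the points are binned into polar cells by horizontal range and azimuth sector, and that samples are then drawn round-robin from the shuffled cells so that sparse far-away cells are not crowded out by dense near-range cells. The cell layout, the round-robin rule, and the parameter names are assumptions for illustration; the abstract does not describe the actual partition.

```python
import math
import random


def polar_cell(point, range_step, num_sectors):
    """Return the (range bin, azimuth sector) cell of an (x, y, z) point."""
    x, y, _ = point
    r_bin = int(math.hypot(x, y) // range_step)
    theta = math.atan2(y, x) % (2 * math.pi)
    s_bin = min(int(theta / (2 * math.pi) * num_sectors), num_sectors - 1)
    return (r_bin, s_bin)


def balanced_sample(points, num_samples, range_step=10.0, num_sectors=8, seed=0):
    """Draw up to num_samples points, cycling over polar cells in turn."""
    rng = random.Random(seed)
    cells = {}
    for p in points:
        cells.setdefault(polar_cell(p, range_step, num_sectors), []).append(p)
    buckets = list(cells.values())
    for bucket in buckets:
        rng.shuffle(bucket)  # random order within each cell
    sampled = []
    # Round-robin: one point per non-empty cell per pass until done.
    while len(sampled) < num_samples and any(buckets):
        for bucket in buckets:
            if bucket and len(sampled) < num_samples:
                sampled.append(bucket.pop())
    return sampled
```

With uniform random sampling, a scan with a hundred near points and two far points would usually lose both far points when downsampled to four; the round-robin draw above keeps them.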
We introduce Similarity Group Proposal Network (SGPN), a simple and intuitive deep learning framework for 3D object instance segmentation on point clouds. SGPN uses a single network to predict point grouping proposals and a corresponding semantic class for each proposal, from which we can directly extract instance segmentation results. Important to the effectiveness of SGPN is its novel representation of 3D instance segmentation results in the form of a similarity matrix that indicates the similarity between each pair of points in embedded feature space, thus producing an accurate grouping proposal for each point. Experimental results on various 3D scenes show the effectiveness of our method on 3D instance segmentation, and we also evaluate the capability of SGPN to improve 3D object detection and semantic segmentation results. We also demonstrate its flexibility by seamlessly incorporating 2D CNN features into the framework to boost performance.
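Active learning enables the efficient construction of a labeled dataset by labeling informative samples from an unlabeled dataset. In real-world active learning scenarios, considering the diversity of the selected samples is crucial because many redundant or highly similar samples exist. The core-set approach is a promising diversity-based method that selects diverse samples based on the distance between samples. However, it performs poorly compared with uncertainty-based approaches, which select the most difficult samples, that is, those on which neural models show low confidence. In this work, we analyze the feature space through the lens of density and, interestingly, observe that locally sparse regions tend to contain more informative samples than dense regions. Motivated by this analysis, we empower the core-set approach with density awareness and propose the density-aware core-set (DACS). The strategy is to estimate the density of the unlabeled samples and select diverse samples mainly from sparse regions. To reduce the computational bottleneck in estimating density, we also introduce a new density approximation based on locality-sensitive hashing. Experimental results clearly demonstrate the efficacy of DACS in both classification and regression tasks, and specifically show that DACS can produce state-of-the-art performance in practical scenarios. Since DACS depends only weakly on the neural architecture, we present a simple yet effective combination method to show that existing methods can be beneficially combined with DACS.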
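For background on the distance-based core-set selection that this abstract builds on, the following minimal sketch implements the standard greedy k-center heuristic: repeatedly pick the sample farthest from everything already labeled or selected. It does not implement the density estimation or the locality-sensitive hashing approximation described above, and the function name and tuple-based feature format are hypothetical.

```python
import math


def k_center_greedy(features, labeled_idx, budget):
    """Greedy k-center selection over feature vectors.

    `features` is a list of equal-length tuples, `labeled_idx` the indices
    already labeled. Returns `budget` indices, each chosen as the sample
    with the largest distance to its nearest labeled or selected sample.
    Assumes budget does not exceed the number of unlabeled samples.
    """
    # Distance from every sample to its nearest labeled sample.
    min_dist = [
        min((math.dist(f, features[j]) for j in labeled_idx), default=math.inf)
        for f in features
    ]
    selected = []
    for _ in range(budget):
        candidate = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(candidate)
        # The new center may now be the nearest one for some samples.
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[candidate]))
    return selected
```

A density-aware variant along the lines of the abstract would additionally restrict or reweight the candidates so that samples from sparse regions are preferred; how that weighting is defined is not stated in the abstract.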
Active learning aims to develop label-efficient algorithms by sampling the most representative queries to be labeled by an oracle. We describe a pool-based semisupervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner. Unlike conventional active learning algorithms, our approach is task agnostic, i.e., it does not depend on the performance of the task for which we are trying to acquire labeled data. Our method learns a latent space using a variational autoencoder (VAE) and an adversarial network trained to discriminate between unlabeled and labeled data. The minimax game between the VAE and the adversarial network is played such that while the VAE tries to trick the adversarial network into predicting that all data points are from the labeled pool, the adversarial network learns how to discriminate between dissimilarities in the latent space. We extensively evaluate our method on various image classification and semantic segmentation benchmark datasets and establish a new state of the art on CIFAR10/100, Caltech-256, ImageNet, Cityscapes, and BDD100K. Our results demonstrate that our adversarial approach learns an effective low dimensional latent space in large-scale settings and provides for a computationally efficient sampling method.
Existing methods for large-scale point cloud semantic segmentation require expensive, tedious, and error-prone manual point-wise annotations. Intuitively, weakly supervised training is a direct solution for reducing the cost of labeling. However, for weakly supervised large-scale point cloud semantic segmentation, too few annotations will inevitably lead to ineffective learning of the network. We propose an effective weakly supervised method with two components to solve this problem. First, we construct a pretext task, i.e., point cloud colorization, and use self-supervised learning to transfer prior knowledge learned from a large amount of unlabeled point cloud data to a weakly supervised network. In this way, the representation capability of the weakly supervised network can be improved by guidance from a heterogeneous task. Second, to generate pseudo-labels for unlabeled data, a sparse label propagation mechanism is proposed with the help of generated class prototypes, which are used to measure the classification confidence of unlabeled points. Our method is evaluated on large-scale point cloud datasets covering different scenarios, including indoor and outdoor scenes. The experimental results show a large gain over existing weakly supervised methods and results comparable to those of fully supervised methods. (Code based on MindSpore: https://github.com/dmcv-ecnu/MindSpore_ModelZoo/tree/main/WS3_MindSpore)
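The prototype-based propagation idea can be illustrated with a small sketch. It assumes a class prototype is the mean feature of the labeled points of that class, and that an unlabeled point receives a pseudo-label only when a softmax over negative distances to the prototypes is confident enough. The distance measure, the softmax confidence, the threshold value, and the function names are all assumptions for illustration, not the mechanism as published.

```python
import math


def class_prototypes(features, labels):
    """Mean feature vector per class, computed from labeled points only."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label is None:
            continue
        acc = sums.setdefault(label, [0.0] * len(feat))
        for d, value in enumerate(feat):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def propagate_labels(features, labels, threshold=0.8):
    """Assign pseudo-labels to unlabeled points (label None) when confident."""
    protos = class_prototypes(features, labels)
    classes = list(protos)
    result = list(labels)
    for i, (feat, label) in enumerate(zip(features, labels)):
        if label is not None:
            continue
        # Softmax over negative distances as an assumed confidence score.
        logits = [-math.dist(feat, protos[c]) for c in classes]
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]
        confidence = max(exps) / sum(exps)
        if confidence >= threshold:
            result[i] = classes[exps.index(max(exps))]
    return result
```

Points that sit halfway between two prototypes stay unlabeled under this rule, which keeps ambiguous pseudo-labels out of the training signal.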