Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a brand-new point-set learning framework, PRIN, namely Point-wise Rotation-Invariant Network, focusing on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by density-aware adaptive sampling to deal with distorted point distributions in spherical space. Spherical voxel convolution and point re-sampling are proposed to extract rotation-invariant features for each point. In addition, we extend PRIN to a sparse version called SPRIN, which directly operates on sparse point clouds. Both PRIN and SPRIN can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. Results show that, on datasets with randomly rotated point clouds, SPRIN demonstrates better performance than state-of-the-art methods without any data augmentation. We also provide thorough theoretical proofs and analysis of the point-wise rotation invariance achieved by our methods. Our code is available at https://github.com/qq456cvb/sprin.
translated by 谷歌翻译
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has been thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
Point cloud recognition is an essential task in industrial robotics and autonomous driving. Recently, several point cloud processing models have achieved state-of-the-art performance. However, these methods lack rotation robustness: their performance degrades severely under random rotations, failing to extend to real-world scenarios with varying orientations. To this end, we propose a method named Self-Contour-based Transformation (SCT), which can be flexibly integrated into various existing point cloud recognition models against arbitrary rotations. SCT provides efficient rotation and translation invariance by introducing a Contour-Aware Transformation (CAT), which linearly transforms the Cartesian coordinates of points into a translation- and rotation-invariant representation. We prove through theoretical analysis that CAT is a rotation- and translation-invariant transformation. Moreover, a Frame Alignment module is proposed to enhance discriminative feature extraction by capturing contours and transforming self-contour-based frames into intra-class frames. Extensive experimental results show that SCT outperforms state-of-the-art methods under arbitrary rotations, in both effectiveness and efficiency, on synthetic and real-world benchmarks. Furthermore, robustness and generality evaluations indicate that SCT is robust and applicable to various point cloud processing models, which highlights the advantages of SCT in industrial applications.
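The abstract above hinges on linearly mapping Cartesian coordinates to a translation- and rotation-invariant representation. The paper's CAT is a specific construction; as a hedged, generic sketch of the underlying idea (the function names below are illustrative, not from the paper), centering a point set and keeping pairwise distances already yields such an invariant:

```python
import numpy as np

def rigid_invariant_features(points):
    """Map an (N, 3) point set to features invariant to rotation and translation.

    This is NOT the paper's CAT; it is a generic illustration: centering removes
    translation, and pairwise distances are preserved by any rotation.
    """
    centered = points - points.mean(axis=0, keepdims=True)   # kills translation
    dists = np.linalg.norm(centered[:, None, :] - centered[None, :, :], axis=-1)
    return np.sort(dists, axis=1)            # per-point distance profile, order-free

def random_rotation(rng):
    # QR of a Gaussian matrix gives a random orthogonal matrix
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))                 # fix column signs for a stable convention
    if np.linalg.det(q) < 0:                 # ensure a proper rotation (det = +1)
        q[:, 0] *= -1
    return q

rng = np.random.default_rng(0)
pts = rng.normal(size=(32, 3))
R, t = random_rotation(rng), rng.normal(size=3)
# Any rigid motion leaves the representation unchanged:
assert np.allclose(rigid_invariant_features(pts),
                   rigid_invariant_features(pts @ R.T + t), atol=1e-8)
```

The assertion at the end verifies the invariance numerically for one random rigid motion.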
Learning new representations of 3D point clouds is an active research area in 3D vision, as the order-invariant structure of point clouds still poses a challenge for the design of neural network architectures. Recent works have explored learning global or local features, or both, but none of the earlier methods captured contextual shape information by analyzing the local orientation distribution of points. In this paper, we leverage the distribution of point orientations in the vicinity of a point to obtain an expressive local neighborhood representation for point clouds. We achieve this by dividing the spherical neighborhood of a given point into predefined cone volumes and using the statistics inside each volume as point features. In this way, a local patch is represented not only by the selected point's nearest neighbors, but also by the point density distribution defined along multiple directions around the point. We then construct an orientation distribution function (ODF) neural network built from ODFBlocks that rely on MLP (multi-layer perceptron) layers. The new ODFNet model achieves state-of-the-art accuracy for object classification on the ModelNet40 and ScanObjectNN datasets, and for segmentation on the ShapeNet and S3DIS datasets.
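The cone-volume statistics described above are easy to prototype. The sketch below is a simplification assuming six axis-aligned cones and plain point counts as the per-volume statistic (ODFNet's actual partition and statistics differ); it only shows the mechanics of binning a spherical neighborhood by direction:

```python
import numpy as np

def cone_histogram(center, neighbors, cone_axes, cos_half_angle=0.8):
    """Count neighborhood points falling inside direction cones around `center`.

    A hypothetical simplification of ODF-style features: each cone is given by a
    unit axis; a neighbor lies inside when the angle between its offset and the
    axis is below the cone's half-angle (cosine above `cos_half_angle`).
    """
    offsets = neighbors - center
    norms = np.linalg.norm(offsets, axis=1, keepdims=True)
    dirs = offsets / np.clip(norms, 1e-12, None)     # unit directions to neighbors
    cos = dirs @ cone_axes.T                         # (N, K) cosine to each axis
    return (cos > cos_half_angle).sum(axis=0)        # point count per cone

# Six axis-aligned cones (+/- x, y, z) as the predefined partition
axes = np.array([[1, 0, 0], [-1, 0, 0],
                 [0, 1, 0], [0, -1, 0],
                 [0, 0, 1], [0, 0, -1]], dtype=float)

rng = np.random.default_rng(0)
nbrs = rng.normal(size=(200, 3))
hist = cone_histogram(np.zeros(3), nbrs, axes)       # 6-bin directional signature
```

With `cos_half_angle=0.8` the six cones are disjoint, so the counts never exceed the neighborhood size.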
Point cloud-based large-scale place recognition is an important but challenging task for many applications, such as simultaneous localization and mapping (SLAM). Treating the task as a point cloud retrieval problem, previous methods have achieved pleasing results. However, how to handle the catastrophic collapse caused by rotation remains under-explored. In this paper, to tackle this issue, we propose a novel point cloud-based rotation-aware large-scale place recognition network (RPR-Net). In particular, we propose to learn rotation-invariant features in three steps. First, we design three kinds of novel rotation-invariant features (RIFs), which are low-level features that preserve the rotation-invariant property. Second, using these RIFs, we design an attentive module to learn rotation-invariant kernels. Third, we apply these kernels to previous point cloud features to generate new features, the well-known SO(3) mapping process. By doing so, high-level, scene-specific rotation-invariant features can be learned. We call the above process Attentive Rotation-Invariant Convolution (ARIConv). To achieve the place recognition goal, we build RPR-Net, which takes ARIConv as a basic unit to construct a dense network architecture. Strong global descriptors for retrieval-based place recognition can then be extracted from RPR-Net. Experimental results on prevalent datasets show that our method achieves results comparable to existing state-of-the-art place recognition models and significantly outperforms other rotation-invariant baseline models when solving rotation problems.
Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.
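PointNet's permutation invariance comes from applying a shared function to every point and aggregating with a symmetric operation (max pooling). A minimal numpy sketch of that idea, omitting the paper's T-Nets and full architecture (weights here are random stand-ins, not trained parameters):

```python
import numpy as np

def pointnet_global_feature(points, w1, w2):
    """A stripped-down PointNet-style encoder: a shared per-point MLP followed
    by max pooling. The max over points is symmetric, so the global feature is
    invariant to the order of the input points."""
    h = np.maximum(points @ w1, 0.0)   # shared layer + ReLU, applied per point
    h = np.maximum(h @ w2, 0.0)
    return h.max(axis=0)               # symmetric aggregation over the point set

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(3, 64)), rng.normal(size=(64, 128))
pts = rng.normal(size=(100, 3))
feat = pointnet_global_feature(pts, w1, w2)

# Shuffling the points leaves the global feature exactly unchanged
perm = rng.permutation(100)
assert np.allclose(feat, pointnet_global_feature(pts[perm], w1, w2))
```

Any symmetric aggregator (max, sum, mean) would give the invariance; the paper argues for max pooling in particular.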
Convolution on 3D point clouds has been widely studied, yet it is far from perfect in geometric deep learning. The standard convolution characterizes feature correspondences among 3D points indistinctively, an intrinsic limitation that leads to poor distinctive-feature learning. In this paper, we propose Adaptive Graph Convolution (AGConv) for a wide range of point cloud analysis applications. AGConv generates adaptive kernels for points according to their dynamically learned features. Compared with solutions that use fixed/isotropic kernels, AGConv improves the flexibility of point cloud convolution, effectively and precisely capturing the diverse relations between points from different semantic parts. Unlike popular attention-weight schemes, AGConv implements adaptiveness inside the convolution operation instead of simply assigning different weights to neighboring points. Extensive evaluations clearly show that our method outperforms state-of-the-art methods for point cloud classification and segmentation on various benchmark datasets. Meanwhile, AGConv can flexibly serve more point cloud analysis approaches to boost their performance. To validate its flexibility and effectiveness, we explore AGConv-based paradigms of completion, denoising, upsampling, registration, and circle extraction, which are comparable or even superior to their competitors. Our code is available at https://github.com/hrzhou2/adaptconv-master.
Deep neural networks have enjoyed remarkable success for various vision tasks; however, it remains challenging to apply CNNs to domains lacking a regular underlying structure, such as 3D point clouds. Towards this we propose a novel convolutional architecture, termed SpiderCNN, to efficiently extract geometric features from point clouds. SpiderCNN is comprised of units called SpiderConv, which extend convolutional operations from regular grids to irregular point sets that can be embedded in R^n, by parametrizing a family of convolutional filters. We design the filter as a product of a simple step function that captures local geodesic information and a Taylor polynomial that ensures expressiveness. SpiderCNN inherits the multi-scale hierarchical architecture from classical CNNs, which allows it to extract semantic deep features. Experiments on ModelNet40 [4] demonstrate that SpiderCNN achieves state-of-the-art accuracy of 92.4% on standard benchmarks, and shows competitive performance on the segmentation task.
Unlike images which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and density functions through kernel density estimation. The most important contribution of this work is a novel reformulation proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in the 3D space. Besides, PointConv can also be used as deconvolution operators to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds. Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure.
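The role of PointConv's density function can be isolated in a small sketch: estimate density with a Gaussian kernel density estimate and downweight points from densely sampled regions before aggregation. This is an illustrative reduction, not PointConv's full learned formulation (the learned MLP weight function is omitted, and the function names are ours):

```python
import numpy as np

def kde_density(points, bandwidth=0.5):
    """Gaussian kernel density estimate of the point set, evaluated at each point."""
    sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2)).mean(axis=1)

def density_weighted_pool(points, feats, bandwidth=0.5):
    """Reweight per-point features by inverse estimated density before summing,
    so that densely sampled regions do not dominate the aggregate — the job the
    density function performs inside PointConv, reduced here to one pooling step."""
    w = 1.0 / kde_density(points, bandwidth)   # rarely sampled points count more
    w /= w.sum()                               # normalize to a convex combination
    return (feats * w[:, None]).sum(axis=0)

rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 3))
feats = rng.normal(size=(128, 16))
pooled = density_weighted_pool(pts, feats)
```

In the paper the density compensation is applied inside every convolution layer; here it appears once, only to make the mechanism concrete.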
In this paper, we propose a novel local descriptor-based framework, called You Only Hypothesize Once (YOHO), for registering two unaligned point clouds. In contrast to most existing local descriptors, which rely on a fragile local reference frame to obtain rotation invariance, the proposed descriptor achieves rotation invariance through recent techniques of group-equivariant feature learning, which brings greater robustness to point density and noise. Meanwhile, the descriptor in YOHO also has a rotation-equivariant part, which enables us to estimate the registration from only one correspondence hypothesis. Such a property reduces the search space of feasible transformations and thus greatly improves both the accuracy and the efficiency of YOHO. Extensive experiments show that YOHO achieves superior performance on four widely used datasets: the 3DMatch/3DLoMatch datasets, the ETH dataset, and the WHU-TLS dataset. More details are shown on our project page: https://hpwang-whu.github.io/yoho/.
This paper presents SO-Net, a permutation invariant architecture for deep learning with orderless point clouds. The SO-Net models the spatial distribution of point cloud by building a Self-Organizing Map (SOM). Based on the SOM, SO-Net performs hierarchical feature extraction on individual points and SOM nodes, and ultimately represents the input point cloud by a single feature vector. The receptive field of the network can be systematically adjusted by conducting point-to-node k nearest neighbor search. In recognition tasks such as point cloud reconstruction, classification, object part segmentation and shape retrieval, our proposed network demonstrates performance that is similar with or better than state-of-the-art approaches. In addition, the training speed is significantly faster than existing point cloud recognition networks because of the parallelizability and simplicity of the proposed architecture. Our code is
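The point-to-node k-nearest-neighbor search that controls SO-Net's receptive field is straightforward to sketch. The SOM training itself is omitted; `nodes` below is a random stand-in for trained SOM nodes:

```python
import numpy as np

def point_to_node_knn(points, nodes, k):
    """Associate every point with its k nearest nodes (e.g., SOM nodes).
    Increasing k systematically enlarges each node's receptive field, which
    is the adjustment mechanism the SO-Net abstract describes."""
    d = np.linalg.norm(points[:, None, :] - nodes[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, :k]        # (num_points, k) node indices

rng = np.random.default_rng(0)
points = rng.uniform(size=(500, 3))
nodes = rng.uniform(size=(64, 3))              # stand-in for trained SOM nodes
idx = point_to_node_knn(points, nodes, k=3)
```

Features of all points assigned to a node can then be aggregated into that node's feature, giving the hierarchical extraction the abstract mentions.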
Convolutional neural networks (CNNs) have been widely used in various vision tasks, such as image classification and semantic segmentation. Unfortunately, standard 2D CNNs are not well suited to spherical signals such as panoramic images or spherical projections, because the sphere is an unstructured grid. In this paper, we present the Spherical Transformer, which can transform spherical signals into vectors that can be directly processed by standard CNNs, so that many well-designed CNN architectures can be reused across tasks and datasets via this pre-processing. To this end, the proposed method first uses a locally structured sampling method (e.g., HEALPix) to construct a transformer grid from the information of spherical points and their adjacent points, and then transforms the spherical signals into vectors through the grid. By building the Spherical Transformer module, we can use multiple CNN architectures directly. We evaluate our approach on the tasks of spherical MNIST recognition, 3D object classification, and omnidirectional image semantic segmentation. For 3D object classification, we further propose a rendering-based projection method to improve performance and a rotation-equivariant model to improve robustness to rotation. Experimental results on the three tasks show that our approach achieves superior performance over state-of-the-art methods.
We propose a self-supervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training on pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. This not only enables training semantically consistent decompositions, but also allows us to learn a canonicalization operation that enables object-centric reasoning. To train our neural network, we require neither classification labels nor manually aligned training datasets. Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification.
Multi-view projection methods have demonstrated promising performance on 3D understanding tasks such as 3D classification and segmentation. However, it remains unclear how to combine such multi-view methods with the widely available 3D point clouds. Previous methods use unlearned heuristics to combine features at the point level. To this end, we introduce the concept of the multi-view point cloud (Voint cloud), representing each 3D point as a set of features extracted from multiple viewpoints. This novel 3D Voint cloud representation combines the compactness of 3D point cloud representations with the natural view-awareness of multi-view representations. Naturally, we can equip this new representation with convolution and pooling operations. We deploy a Voint neural network (VointNet), with a theoretically established functional form, to learn representations in Voint space. Our novel representation achieves state-of-the-art performance on 3D classification and retrieval on ScanObjectNN, ModelNet40, and ShapeNet Core55. Additionally, we achieve competitive performance on 3D semantic segmentation on ShapeNet Parts. Further analysis shows that VointNet improves robustness to rotation and occlusion compared to other methods.
3D shape models are becoming widely available and easier to capture, making available 3D information crucial for progress in object classification. Current state-of-the-art methods rely on CNNs to address this problem. Recently, we witness two types of CNNs being developed: CNNs based upon volumetric representations versus CNNs based upon multi-view representations. Empirical results from these two types of CNNs exhibit a large gap, indicating that existing volumetric CNN architectures and approaches are unable to fully exploit the power of 3D representations. In this paper, we aim to improve both volumetric CNNs and multi-view CNNs according to extensive analysis of existing approaches. To this end, we introduce two distinct network architectures of volumetric CNNs. In addition, we examine multi-view CNNs, where we introduce multiresolution filtering in 3D. Overall, we are able to outperform current state-of-the-art methods for both volumetric CNNs and multi-view CNNs. We provide extensive experiments designed to evaluate underlying design choices, thus providing a better understanding of the space of methods available for object classification on 3D data.
Reasoning about 3D objects from 2D images is challenging due to the large variations in appearance caused by viewing an object from different orientations. Ideally, our model would be invariant or equivariant to changes in object pose. Unfortunately, this is typically not possible with 2D image input, because we do not have an a priori model of how the image changes under out-of-plane object rotations. The only $\mathrm{SO}(3)$-equivariant models that currently exist require point cloud input rather than 2D images. In this paper, we propose a novel model architecture based on icosahedral group convolution that reasons in $\mathrm{SO}(3)$ by projecting the input image onto an icosahedron. As a result of this projection, the model is approximately equivariant to rotations in $\mathrm{SO}(3)$. We apply this model to an object pose estimation task and find that it outperforms reasonable baselines.
Many applications require the robustness, or ideally the invariance, of neural networks to certain transformations of the input data. Most commonly, this requirement is addressed either by adversarial training or by defining network architectures that include the desired invariance by design. In this work, we propose a method to make network architectures provably invariant with respect to group actions by selecting one element from a (possibly continuous) orbit based on a fixed criterion. In a nutshell, we intend to "undo" any possible transformation before feeding the data into the actual network. Furthermore, we empirically analyze the properties of different approaches that incorporate invariance via training or architecture, and demonstrate the advantages of our method in terms of robustness and computational efficiency. In particular, we investigate robustness with respect to rotations of images (which holds up to discretization artifacts), as well as provable orientation and scaling invariance for 3D point cloud classification.
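The "undo the transformation first" idea can be illustrated with a classic fixed criterion: PCA alignment with a deterministic sign convention, which picks one representative per orbit before the data reaches the network. This is one possible instantiation under our own assumptions, not necessarily the criterion used in the paper:

```python
import numpy as np

def canonicalize(points):
    """Select a canonical representative of the rigid-motion orbit of a point
    set via PCA alignment. One concrete instance of orbit selection by a fixed
    criterion — not the authors' exact construction."""
    centered = points - points.mean(axis=0)            # undo translation
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    aligned = centered @ vt.T                          # principal axes onto x, y, z
    # resolve the per-axis sign ambiguity with a deterministic convention
    signs = np.sign((aligned ** 3).sum(axis=0))
    signs[signs == 0] = 1.0
    return aligned * signs

rng = np.random.default_rng(0)
pts = rng.normal(size=(64, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))           # random orthogonal transform
# The canonical form is (numerically) identical before and after the transform:
assert np.allclose(canonicalize(pts), canonicalize(pts @ q.T + 1.0), atol=1e-6)
```

A network fed only `canonicalize(points)` is then invariant by construction, with the caveat that PCA alignment becomes ambiguous for point sets with (near-)degenerate principal axes.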
Emerging from low-level vision theory, steerable filters found their counterpart in prior work on steerable convolutional neural networks equivariant to rigid transformations. In our work, we propose a steerable feed-forward learning-based approach that consists of neurons with spherical decision surfaces and operates on point clouds. Such spherical neurons are obtained via a conformal embedding of Euclidean space and have recently been revisited in the context of learning representations of point sets. Focusing on 3D geometry, we exploit the isometry property of spherical neurons and derive a 3D steerability constraint. After training spherical neurons to classify point clouds in a canonical orientation, we use a tetrahedron basis to quadruplicate the neurons and construct rotation-equivariant spherical filter banks. We then apply the derived constraint to interpolate the filter bank outputs and thus obtain a rotation-invariant network. Finally, we use synthetic point sets and real-world 3D skeleton data to verify our theoretical findings. The code is available at https://github.com/pavlo-melnyk/steerable-3d-neurons.
Coordinate-based implicit neural networks, or neural fields, have emerged as useful representations of shape and appearance in 3D computer vision. Despite advances however, it remains challenging to build neural fields for categories of objects without datasets like ShapeNet that provide canonicalized object instances that are consistently aligned for their 3D position and orientation (pose). We present Canonical Field Network (CaFi-Net), a self-supervised method to canonicalize the 3D pose of instances from an object category represented as neural fields, specifically neural radiance fields (NeRFs). CaFi-Net directly learns from continuous and noisy radiance fields using a Siamese network architecture that is designed to extract equivariant field features for category-level canonicalization. During inference, our method takes pre-trained neural radiance fields of novel object instances at arbitrary 3D pose, and estimates a canonical field with consistent 3D pose across the entire category. Extensive experiments on a new dataset of 1300 NeRF models across 13 object categories show that our method matches or exceeds the performance of 3D point cloud-based methods.