We present a network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Naïvely applying convolutions on this lattice scales poorly, both in terms of memory and computational cost, as the size of the lattice increases. Instead, our network uses sparse bilateral convolutional layers as building blocks. These layers maintain efficiency by using indexing structures to apply convolutions only on occupied parts of the lattice, and allow flexible specifications of the lattice structure enabling hierarchical and spatially-aware feature learning, as well as joint 2D-3D reasoning. Both point-based and image-based representations can be easily incorporated in a network with such layers and the resulting model can be trained in an end-to-end manner. We present results on 3D segmentation tasks where our approach outperforms existing state-of-the-art techniques.
translated by 谷歌翻译
Unlike images which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and density functions through kernel density estimation. The most important contribution of this work is a novel reformulation proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in the 3D space. Besides, PointConv can also be used as deconvolution operators to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds. Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure.
translated by 谷歌翻译
3D点云的卷积经过广泛研究,但在几何深度学习中却远非完美。卷积的传统智慧在3D点之间表现出特征对应关系,这是对差的独特特征学习的内在限制。在本文中,我们提出了自适应图卷积(AGCONV),以供点云分析的广泛应用。 AGCONV根据其动态学习的功能生成自适应核。与使用固定/各向同性核的解决方案相比,AGCONV提高了点云卷积的灵活性,有效,精确地捕获了不同语义部位的点之间的不同关系。与流行的注意力体重方案不同,AGCONV实现了卷积操作内部的适应性,而不是简单地将不同的权重分配给相邻点。广泛的评估清楚地表明,我们的方法优于各种基准数据集中的点云分类和分割的最新方法。同时,AGCONV可以灵活地采用更多的点云分析方法来提高其性能。为了验证其灵活性和有效性,我们探索了基于AGCONV的完成,DeNoing,Upsmpling,注册和圆圈提取的范式,它们与竞争对手相当甚至优越。我们的代码可在https://github.com/hrzhou2/adaptconv-master上找到。
translated by 谷歌翻译
We present a simple and general framework for feature learning from point clouds.The key to the success of CNNs is the convolution operator that is capable of leveraging spatially-local correlation in data represented densely in grids (e.g. images). However, point clouds are irregular and unordered, thus directly convolving kernels against features associated with the points will result in desertion of shape information and variance to point ordering. To address these problems, we propose to learn an X -transformation from the input points to simultaneously promote two causes: the first is the weighting of the input features associated with the points, and the second is the permutation of the points into a latent and potentially canonical order. Element-wise product and sum operations of the typical convolution operator are subsequently applied on the X -transformed features. The proposed method is a generalization of typical CNNs to feature learning from point clouds, thus we call it PointCNN. Experiments show that PointCNN achieves on par or better performance than state-of-the-art methods on multiple challenging benchmark datasets and tasks.
translated by 谷歌翻译
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has become even thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
translated by 谷歌翻译
We present Kernel Point Convolution 1 (KPConv), a new design of point convolution, i.e. that operates on point clouds without any intermediate representation. The convolution weights of KPConv are located in Euclidean space by kernel points, and applied to the input points close to them. Its capacity to use any number of kernel points gives KP-Conv more flexibility than fixed grid convolutions. Furthermore, these locations are continuous in space and can be learned by the network. Therefore, KPConv can be extended to deformable convolutions that learn to adapt kernel points to local geometry. Thanks to a regular subsampling strategy, KPConv is also efficient and robust to varying densities. Whether they use deformable KPConv for complex tasks, or rigid KPconv for simpler tasks, our networks outperform state-of-the-art classification and segmentation approaches on several datasets. We also offer ablation studies and visualizations to provide understanding of what has been learned by KPConv and to validate the descriptive power of deformable KPConv.
translated by 谷歌翻译
Convolutional networks are the de-facto standard for analyzing spatio-temporal data such as images, videos, and 3D shapes. Whilst some of this data is naturally dense (e.g., photos), many other data sources are inherently sparse. Examples include 3D point clouds that were obtained using a LiDAR scanner or RGB-D camera. Standard "dense" implementations of convolutional networks are very inefficient when applied on such sparse data. We introduce new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and use them to develop spatially-sparse convolutional networks. We demonstrate the strong performance of the resulting models, called submanifold sparse convolutional networks (SSCNs), on two tasks involving semantic segmentation of 3D point clouds. In particular, our models outperform all prior state-of-the-art on the test set of a recent semantic segmentation competition.
translated by 谷歌翻译
We present OctNet, a representation for deep learning with sparse 3D data. In contrast to existing models, our representation enables 3D convolutional networks which are both deep and high resolution. Towards this goal, we exploit the sparsity in the input data to hierarchically partition the space using a set of unbalanced octrees where each leaf node stores a pooled feature representation. This allows to focus memory allocation and computation to the relevant dense regions and enables deeper networks without compromising resolution. We demonstrate the utility of our OctNet representation by analyzing the impact of resolution on several 3D tasks including 3D object classification, orientation estimation and point cloud labeling.
translated by 谷歌翻译
3D网格的几何特征学习是计算机图形的核心,对于许多视觉应用非常重要。然而,由于缺乏所需的操作和/或其有效的实现,深度学习目前滞后于异构3D网格的层次建模。在本文中,我们提出了一系列模块化操作,以实现异构3D网格的有效几何深度学习。这些操作包括网格卷曲,(UN)池和高效的网格抽取。我们提供这些操作的开源实施,统称为\ Texit {Picasso}。 Picasso的网格抽取模块是GPU加速的模块,可以在飞行中加工一批用于深度学习的网格。我们(联合国)汇集操作在不同分辨率的网络层跨网络层计算新创建的神经元的功能。我们的网格卷曲包括FaceT2Vertex,Vertex2Facet和FaceT2Facet卷积,用于利用VMF混合物和重心插值来包含模糊建模。利用Picasso的模块化操作,我们贡献了一个新型的分层神经网络Picassonet-II,以了解3D网格的高度辨别特征。 Picassonet-II接受原始地理学和Mesh Facet的精细纹理作为输入功能,同时处理完整场景网格。我们的网络达到了各种基准的形状分析和场景的竞争性能。我们在github https://github.com/enyahermite/picasso发布Picasso和Picassonet-II。
translated by 谷歌翻译
Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.
translated by 谷歌翻译
We present an approach to semantic scene analysis using deep convolutional networks. Our approach is based on tangent convolutions -a new construction for convolutional networks on 3D data. In contrast to volumetric approaches, our method operates directly on surface geometry. Crucially, the construction is applicable to unstructured point clouds and other noisy real-world data. We show that tangent convolutions can be evaluated efficiently on large-scale point clouds with millions of points. Using tangent convolutions, we design a deep fully-convolutional network for semantic segmentation of 3D point clouds, and apply it to challenging real-world datasets of indoor and outdoor 3D environments. Experimental results show that the presented approach outperforms other recent deep network constructions in detailed analysis of large 3D scenes.
translated by 谷歌翻译
近年来,由于强大的3D CNN,基于体素的方法已成为室内场景3D语义分割的最新方法。然而,基于体素的方法忽略了基础的几何形状,由于缺乏地理位置信息而在空间上闭合物体上的模棱两可的特征遭受了含糊的特征,并努力处理复杂和不规则的几何形状。鉴于此,我们提出了Voxel-Mesh网络(VMNET),这是一种新颖的3D深度体系结构,该架构在Voxel和网格表示上运行,并利用了欧几里得和地球信息。从直觉上讲,从体素中提取的欧几里得信息可以提供代表附近对象之间交互的上下文提示,而从网格中提取的地理信息可以帮助空间上接近但断开表面的分离对象。为了合并两个域中的此类信息,我们设计了一个内域的专注模块,以进行有效的特征聚集和一个用于自适应特征融合的专注于域间的模块。实验结果验证了VMNET的有效性:具体而言,在具有挑战性的扫描仪数据集上,用于大规模的室内场景分割,它的表现优于最先进的Sparseconvnet和Minkowskownet(74.6%vs 72.5%和73.6%)更简单的网络结构(17m vs 30m和38m参数)。代码发布:https://github.com/hzykent/vmnet
translated by 谷歌翻译
在本文中,我们提出了一个全面的点云语义分割网络,该网络汇总了本地和全球多尺度信息。首先,我们提出一个角度相关点卷积(ACPCONV)模块,以有效地了解点的局部形状。其次,基于ACPCONV,我们引入了局部多规模拆分(MSS)块,该块从一个单个块中连接到一个单个块中的特征,并逐渐扩大了接受场,这对利用本地上下文是有益的。第三,受HRNET的启发,在2D图像视觉任务上具有出色的性能,我们构建了一个针对Point Cloud的HRNET,以学习全局多尺度上下文。最后,我们介绍了一种融合多分辨率预测并进一步改善点云语义分割性能的点上的注意融合方法。我们在几个基准数据集上的实验结果和消融表明,与现有方法相比,我们提出的方法有效,能够实现最先进的性能。
translated by 谷歌翻译
多视图投影方法在3D理解任务等方面表现出有希望的性能,如3D分类和分割。然而,它仍然不明确如何将这种多视图方法与广泛可用的3D点云组合。以前的方法使用未受忘掉的启发式方法在点级别结合功能。为此,我们介绍了多视图点云(vinoint云)的概念,表示每个3D点作为从多个视图点提取的一组功能。这种新颖的3D Vintor云表示将3D点云表示的紧凑性与多视图表示的自然观。当然,我们可以用卷积和汇集操作配备这一新的表示。我们以理论上建立的功能形式部署了Voint神经网络(vointnet),以学习vinite空间中的表示。我们的小说代表在ScanObjectnn,ModelNet40和ShapEnet​​ Core55上实现了3D分类和检索的最先进的性能。此外,我们在ShapeNet零件上实现了3D语义细分的竞争性能。进一步的分析表明,与其他方法相比,求力提高了旋转和闭塞的鲁棒性。
translated by 谷歌翻译
变压器在自然语言处理中的成功最近引起了计算机视觉领域的关注。由于能够学习长期依赖性,变压器已被用作广泛使用的卷积运算符的替代品。事实证明,这种替代者在许多任务中都取得了成功,其中几种最先进的方法依靠变压器来更好地学习。在计算机视觉中,3D字段还见证了使用变压器来增加3D卷积神经网络和多层感知器网络的增加。尽管许多调查都集中在视力中的变压器上,但由于与2D视觉相比,由于数据表示和处理的差异,3D视觉需要特别注意。在这项工作中,我们介绍了针对不同3D视觉任务的100多种变压器方法的系统和彻底审查,包括分类,细分,检测,完成,姿势估计等。我们在3D Vision中讨论了变形金刚的设计,该设计使其可以使用各种3D表示形式处理数据。对于每个应用程序,我们强调了基于变压器的方法的关键属性和贡献。为了评估这些方法的竞争力,我们将它们的性能与12个3D基准测试的常见非转化方法进行了比较。我们通过讨论3D视觉中变压器的不同开放方向和挑战来结束调查。除了提出的论文外,我们的目标是频繁更新最新的相关论文及其相应的实现:https://github.com/lahoud/3d-vision-transformers。
translated by 谷歌翻译
我们提出CPT:卷积点变压器 - 一种用于处理3D点云数据的非结构化性质的新型深度学习架构。 CPT是对现有关注的卷曲神经网络以及以前的3D点云处理变压器的改进。由于其在创建基于新颖的基于注意力的点集合嵌入通过制作用于处理动态局部点设定的邻域的卷积投影层的嵌入来实现这一壮举。结果点设置嵌入对输入点的排列是强大的。我们的小说CPT块在网络结构中通过动态图计算获得的本地邻居构建。它是完全可差异的,可以像卷积层一样堆叠,以学习点的全局属性。我们评估我们的模型在ModelNet40,ShapEnet​​部分分割和S3DIS 3D室内场景语义分割数据集等标准基准数据集上,以显示我们的模型可以用作各种点云处理任务的有效骨干,与现有状态相比 - 艺术方法。
translated by 谷歌翻译
随着激光雷达传感器和3D视觉摄像头的扩散,3D点云分析近年来引起了重大关注。经过先驱工作点的成功后,基于深度学习的方法越来越多地应用于各种任务,包括3D点云分段和3D对象分类。在本文中,我们提出了一种新颖的3D点云学习网络,通过选择性地执行具有动态池的邻域特征聚合和注意机制来提出作为动态点特征聚合网络(DPFA-NET)。 DPFA-Net有两个可用于三维云的语义分割和分类的变体。作为DPFA-NET的核心模块,我们提出了一个特征聚合层,其中每个点的动态邻域的特征通过自我注意机制聚合。与其他分割模型相比,来自固定邻域的聚合特征,我们的方法可以在不同层中聚合来自不同邻居的特征,在不同层中为查询点提供更具选择性和更广泛的视图,并更多地关注本地邻域中的相关特征。此外,为了进一步提高所提出的语义分割模型的性能,我们提出了两种新方法,即两级BF-Net和BF-Rengralization来利用背景前台信息。实验结果表明,所提出的DPFA-Net在S3DIS数据集上实现了最先进的整体精度分数,在S3DIS数据集上进行了语义分割,并在不同的语义分割,部分分割和3D对象分类中提供始终如一的令人满意的性能。与其他方法相比,它也在计算上更有效。
translated by 谷歌翻译
我们介绍了PointConvormer,这是一个基于点云的深神经网络体系结构的新颖构建块。受到概括理论的启发,PointConvormer结合了点卷积的思想,其中滤波器权重仅基于相对位置,而变形金刚则利用了基于功能的注意力。在PointConvormer中,附近点之间的特征差异是重量重量卷积权重的指标。因此,我们从点卷积操作中保留了不变,而注意力被用来选择附近的相关点进行卷积。为了验证PointConvormer的有效性,我们在点云上进行了语义分割和场景流估计任务,其中包括扫描仪,Semantickitti,FlyingThings3D和Kitti。我们的结果表明,PointConvormer具有经典的卷积,常规变压器和Voxelized稀疏卷积方法的表现,具有较小,更高效的网络。可视化表明,PointConvormer的性能类似于在平面表面上的卷积,而邻域选择效果在物体边界上更强,表明它具有两全其美。
translated by 谷歌翻译
This paper presents SO-Net, a permutation invariant architecture for deep learning with orderless point clouds. The SO-Net models the spatial distribution of point cloud by building a Self-Organizing Map (SOM). Based on the SOM, SO-Net performs hierarchical feature extraction on individual points and SOM nodes, and ultimately represents the input point cloud by a single feature vector. The receptive field of the network can be systematically adjusted by conducting point-to-node k nearest neighbor search. In recognition tasks such as point cloud reconstruction, classification, object part segmentation and shape retrieval, our proposed network demonstrates performance that is similar with or better than state-of-the-art approaches. In addition, the training speed is significantly faster than existing point cloud recognition networks because of the parallelizability and simplicity of the proposed architecture. Our code is
translated by 谷歌翻译