点云正在获得突出的突出,作为代表3D形状的方法,但其不规则结构对深度学习方法构成了挑战。在本文中,我们提出了一种使用随机散步学习3D形状的新方法。以前的作品试图调整卷积神经网络(CNNS)或将网格或网格结构强加到3D点云。这项工作提出了一种不同的方法来表示和学习特定点集的形状。关键的想法是在多个随机散步通过云设置的点上施加结构,用于探索3D对象的不同区域。然后我们学习每次和每次步行代表,并在推理时聚合多个步行预测。我们的方法实现了两个3D形状分析任务的最先进结果:分类和检索。此外,我们提出了一种形状复杂性指示器功能,该函数使用交叉步道和步行间方差措施来细分形状空间。
translated by 谷歌翻译
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has become even thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
translated by 谷歌翻译
Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.
translated by 谷歌翻译
3D点云的卷积经过广泛研究,但在几何深度学习中却远非完美。卷积的传统智慧在3D点之间表现出特征对应关系,这是对差的独特特征学习的内在限制。在本文中,我们提出了自适应图卷积(AGCONV),以供点云分析的广泛应用。 AGCONV根据其动态学习的功能生成自适应核。与使用固定/各向同性核的解决方案相比,AGCONV提高了点云卷积的灵活性,有效,精确地捕获了不同语义部位的点之间的不同关系。与流行的注意力体重方案不同,AGCONV实现了卷积操作内部的适应性,而不是简单地将不同的权重分配给相邻点。广泛的评估清楚地表明,我们的方法优于各种基准数据集中的点云分类和分割的最新方法。同时,AGCONV可以灵活地采用更多的点云分析方法来提高其性能。为了验证其灵活性和有效性,我们探索了基于AGCONV的完成,DeNoing,Upsmpling,注册和圆圈提取的范式,它们与竞争对手相当甚至优越。我们的代码可在https://github.com/hrzhou2/adaptconv-master上找到。
translated by 谷歌翻译
多视图投影方法在3D理解任务等方面表现出有希望的性能,如3D分类和分割。然而,它仍然不明确如何将这种多视图方法与广泛可用的3D点云组合。以前的方法使用未受忘掉的启发式方法在点级别结合功能。为此,我们介绍了多视图点云(vinoint云)的概念,表示每个3D点作为从多个视图点提取的一组功能。这种新颖的3D Vintor云表示将3D点云表示的紧凑性与多视图表示的自然观。当然,我们可以用卷积和汇集操作配备这一新的表示。我们以理论上建立的功能形式部署了Voint神经网络(vointnet),以学习vinite空间中的表示。我们的小说代表在ScanObjectnn,ModelNet40和ShapEnet​​ Core55上实现了3D分类和检索的最先进的性能。此外,我们在ShapeNet零件上实现了3D语义细分的竞争性能。进一步的分析表明,与其他方法相比,求力提高了旋转和闭塞的鲁棒性。
translated by 谷歌翻译
Multi-view projection techniques have shown themselves to be highly effective in achieving top-performing results in the recognition of 3D shapes. These methods involve learning how to combine information from multiple view-points. However, the camera view-points from which these views are obtained are often fixed for all shapes. To overcome the static nature of current multi-view techniques, we propose learning these view-points. Specifically, we introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition. As a result, MVTN can be trained end-to-end with any multi-view network for 3D shape classification. We integrate MVTN into a novel adaptive multi-view pipeline that is capable of rendering both 3D meshes and point clouds. Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55). Further analysis indicates that our approach exhibits improved robustness to occlusion compared to other methods. We also investigate additional aspects of MVTN, such as 2D pretraining and its use for segmentation. To support further research in this area, we have released MVTorch, a PyTorch library for 3D understanding and generation using multi-view projections.
translated by 谷歌翻译
Shape completion, the problem of estimating the complete geometry of objects from partial observations, lies at the core of many vision and robotics applications. In this work, we propose Point Completion Network (PCN), a novel learning-based approach for shape completion. Unlike existing shape completion methods, PCN directly operates on raw point clouds without any structural assumption (e.g. symmetry) or annotation (e.g. semantic class) about the underlying shape. It features a decoder design that enables the generation of fine-grained completions while maintaining a small number of parameters. Our experiments show that PCN produces dense, complete point clouds with realistic structures in the missing regions on inputs with various levels of incompleteness and noise, including cars from LiDAR scans in the KITTI dataset. Code, data and trained models are available at https://wentaoyuan.github.io/pcn.
translated by 谷歌翻译
卷积神经网络(CNNS)在2D计算机视觉中取得了很大的突破。然而,它们的不规则结构使得难以在网格上直接利用CNNS的潜力。细分表面提供分层多分辨率结构,其中闭合的2 - 歧管三角网格中的每个面正恰好邻近三个面。本文推出了这两种观察,介绍了具有环形细分序列连接的3D三角形网格的创新和多功能CNN框架。在2D图像中的网格面和像素之间进行类比允许我们呈现网状卷积操作者以聚合附近面的局部特征。通过利用面部街区,这种卷积可以支持标准的2D卷积网络概念,例如,可变内核大小,步幅和扩张。基于多分辨率层次结构,我们利用汇集层,将四个面均匀地合并成一个和上采样方法,该方法将一个面分为四个。因此,许多流行的2D CNN架构可以容易地适应处理3D网格。可以通过自我参数化来回收具有任意连接的网格,以使循环细分序列连接,使子变量是一般的方法。广泛的评估和各种应用展示了SubDIVNet的有效性和效率。
translated by 谷歌翻译
学习地区内部背景和区域间关系是加强点云分析的特征表示的两项有效策略。但是,在现有方法中没有完全强调的统一点云表示的两种策略。为此,我们提出了一种名为点关系感知网络(PRA-NET)的小说框架,其由区域内结构学习(ISL)模块和区域间关系学习(IRL)模块组成。ISL模块可以通过可差的区域分区方案和基于代表的基于点的策略自适应和有效地将本地结构信息动态地集成到点特征中,而IRL模块可自适应和有效地捕获区域间关系。在涵盖形状分类,关键点估计和部分分割的几个3D基准测试中的广泛实验已经验证了PRA-Net的有效性和泛化能力。代码将在https://github.com/xiwuchen/pra-net上获得。
translated by 谷歌翻译
Point cloud completion is a generation and estimation issue derived from the partial point clouds, which plays a vital role in the applications in 3D computer vision. The progress of deep learning (DL) has impressively improved the capability and robustness of point cloud completion. However, the quality of completed point clouds is still needed to be further enhanced to meet the practical utilization. Therefore, this work aims to conduct a comprehensive survey on various methods, including point-based, convolution-based, graph-based, and generative model-based approaches, etc. And this survey summarizes the comparisons among these methods to provoke further research insights. Besides, this review sums up the commonly used datasets and illustrates the applications of point cloud completion. Eventually, we also discussed possible research trends in this promptly expanding field.
translated by 谷歌翻译
In this work, we present Point Transformer, a deep neural network that operates directly on unordered and unstructured point sets. We design Point Transformer to extract local and global features and relate both representations by introducing the local-global attention mechanism, which aims to capture spatial point relations and shape information. For that purpose, we propose SortNet, as part of the Point Transformer, which induces input permutation invariance by selecting points based on a learned score. The output of Point Transformer is a sorted and permutation invariant feature list that can directly be incorporated into common computer vision applications. We evaluate our approach on standard classification and part segmentation benchmarks to demonstrate competitive results compared to the prior work. Code is publicly available at: https://github.com/engelnico/point-transformer INDEX TERMS 3D point processing, Artificial neural networks, Computer vision, Feedforward neural networks, Transformer
translated by 谷歌翻译
This paper presents SO-Net, a permutation invariant architecture for deep learning with orderless point clouds. The SO-Net models the spatial distribution of point cloud by building a Self-Organizing Map (SOM). Based on the SOM, SO-Net performs hierarchical feature extraction on individual points and SOM nodes, and ultimately represents the input point cloud by a single feature vector. The receptive field of the network can be systematically adjusted by conducting point-to-node k nearest neighbor search. In recognition tasks such as point cloud reconstruction, classification, object part segmentation and shape retrieval, our proposed network demonstrates performance that is similar with or better than state-of-the-art approaches. In addition, the training speed is significantly faster than existing point cloud recognition networks because of the parallelizability and simplicity of the proposed architecture. Our code is
translated by 谷歌翻译
Unlike images which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and density functions through kernel density estimation. The most important contribution of this work is a novel reformulation proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in the 3D space. Besides, PointConv can also be used as deconvolution operators to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds. Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure.
translated by 谷歌翻译
由于其高质量的对象表示和有效的获取方法,3D点云吸引了越来越多的架构,工程和构建的关注。因此,文献中已经提出了许多点云特征检测方法来自动化一些工作流,例如它们的分类或部分分割。然而,点云自动化系统的性能显着落后于图像对应物。尽管这种故障的一部分源于云云的不规则性,非结构性和混乱,这使得云特征检测的任务比图像一项更具挑战性,但我们认为,图像域缺乏灵感可能是主要的。这种差距的原因。确实,鉴于图像特征检测中卷积神经网络(CNN)的压倒性成功,设计其点云对应物似乎是合理的,但是所提出的方法都不类似于它们。具体而言,即使许多方法概括了点云中的卷积操作,但它们也无法模仿CNN的多种功能检测和汇总操作。因此,我们提出了一个基于图卷积的单元,称为收缩单元,可以垂直和水平堆叠,以设计类似CNN的3D点云提取器。鉴于点云中点之间的自我,局部和全局相关性传达了至关重要的空间几何信息,因此我们在特征提取过程中还利用它们。我们通过为ModelNet-10基准数据集设计功能提取器模型来评估我们的建议,并达到90.64%的分类精度,表明我们的创新想法是有效的。我们的代码可在github.com/albertotamajo/shrinking-unit上获得。
translated by 谷歌翻译
您将如何通过一些错过来修复物理物体?您可能会想象它的原始形状从先前捕获的图像中,首先恢复其整体(全局)但粗大的形状,然后完善其本地细节。我们有动力模仿物理维修程序以解决点云完成。为此,我们提出了一个跨模式的形状转移双转化网络(称为CSDN),这是一种带有全循环参与图像的粗到精细范式,以完成优质的点云完成。 CSDN主要由“ Shape Fusion”和“ Dual-Refinect”模块组成,以应对跨模式挑战。第一个模块将固有的形状特性从单个图像传输,以指导点云缺失区域的几何形状生成,在其中,我们建议iPadain嵌入图像的全局特征和部分点云的完成。第二个模块通过调整生成点的位置来完善粗糙输出,其中本地改进单元通过图卷积利用了小说和输入点之间的几何关系,而全局约束单元则利用输入图像来微调生成的偏移。与大多数现有方法不同,CSDN不仅探讨了图像中的互补信息,而且还可以在整个粗到精细的完成过程中有效利用跨模式数据。实验结果表明,CSDN对十个跨模式基准的竞争对手表现出色。
translated by 谷歌翻译
学习3D点云的新表示形式是3D视觉中的一个活跃研究领域,因为订单不变的点云结构仍然对神经网络体系结构的设计构成挑战。最近的作品探索了学习全球或本地功能或两者兼而有之,但是均未通过分析点的局部方向分布来捕获上下文形状信息的早期方法。在本文中,我们利用点附近的点方向分布,以获取点云的表现力局部邻里表示。我们通过将给定点的球形邻域分为预定义的锥体来实现这一目标,并将每个体积内部的统计数据用作点特征。这样,本地贴片不仅可以由所选点的最近邻居表示,还可以考虑沿该点周围多个方向定义的点密度分布。然后,我们能够构建涉及依赖MLP(多层感知器)层的Odfblock的方向分布函数(ODF)神经网络。新的ODFNET模型可实现ModelNet40和ScanObjectNN数据集的对象分类的最新精度,并在Shapenet S3DIS数据集上进行分割。
translated by 谷歌翻译
Intelligent mesh generation (IMG) refers to a technique to generate mesh by machine learning, which is a relatively new and promising research field. Within its short life span, IMG has greatly expanded the generalizability and practicality of mesh generation techniques and brought many breakthroughs and potential possibilities for mesh generation. However, there is a lack of surveys focusing on IMG methods covering recent works. In this paper, we are committed to a systematic and comprehensive survey describing the contemporary IMG landscape. Focusing on 110 preliminary IMG methods, we conducted an in-depth analysis and evaluation from multiple perspectives, including the core technique and application scope of the algorithm, agent learning goals, data types, targeting challenges, advantages and limitations. With the aim of literature collection and classification based on content extraction, we propose three different taxonomies from three views of key technique, output mesh unit element, and applicable input data types. Finally, we highlight some promising future research directions and challenges in IMG. To maximize the convenience of readers, a project page of IMG is provided at \url{https://github.com/xzb030/IMG_Survey}.
translated by 谷歌翻译
Current 3D object detection methods are heavily influenced by 2D detectors. In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids (i.e., to voxel grids or to bird's eye view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to directly detect objects in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data and as generic as possible. However, due to the sparse nature of the data -samples from 2D manifolds in 3D space -we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point thus hard to regress accurately in one step. To address the challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. Our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D with a simple design, compact model size and high efficiency. Remarkably, VoteNet outperforms previous methods by using purely geometric information without relying on color images.
translated by 谷歌翻译
We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image. Limited by the nature of deep neural network, previous methods usually represent a 3D shape in volume or point cloud, and it is non-trivial to convert them to the more ready-to-use mesh model. Unlike the existing methods, our network represents 3D mesh in a graph-based convolutional neural network and produces correct geometry by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image. We adopt a coarse-to-fine strategy to make the whole deformation procedure stable, and define various of mesh related losses to capture properties of different levels to guarantee visually appealing and physically accurate 3D geometry. Extensive experiments show that our method not only qualitatively produces mesh model with better details, but also achieves higher 3D shape estimation accuracy compared to the state-of-the-art.
translated by 谷歌翻译