Standard convolutional neural networks assume a grid structured input is available and exploit discrete convolutions as their fundamental building blocks. This limits their applicability to many real-world applications. In this paper we propose Parametric Continuous Convolution, a new learnable operator that operates over non-grid structured data. The key idea is to exploit parameterized kernel functions that span the full continuous vector space. This generalization allows us to learn over arbitrary data structures as long as their support relationship is computable. Our experiments show significant improvement over the state-ofthe-art in point cloud segmentation of indoor and outdoor scenes, and lidar motion estimation of driving scenes.
translated by 谷歌翻译
3D点云的卷积经过广泛研究,但在几何深度学习中却远非完美。卷积的传统智慧在3D点之间表现出特征对应关系,这是对差的独特特征学习的内在限制。在本文中,我们提出了自适应图卷积(AGCONV),以供点云分析的广泛应用。 AGCONV根据其动态学习的功能生成自适应核。与使用固定/各向同性核的解决方案相比,AGCONV提高了点云卷积的灵活性,有效,精确地捕获了不同语义部位的点之间的不同关系。与流行的注意力体重方案不同,AGCONV实现了卷积操作内部的适应性,而不是简单地将不同的权重分配给相邻点。广泛的评估清楚地表明,我们的方法优于各种基准数据集中的点云分类和分割的最新方法。同时,AGCONV可以灵活地采用更多的点云分析方法来提高其性能。为了验证其灵活性和有效性,我们探索了基于AGCONV的完成,DeNoing,Upsmpling,注册和圆圈提取的范式,它们与竞争对手相当甚至优越。我们的代码可在https://github.com/hrzhou2/adaptconv-master上找到。
translated by 谷歌翻译
Unlike images which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and density functions through kernel density estimation. The most important contribution of this work is a novel reformulation proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in the 3D space. Besides, PointConv can also be used as deconvolution operators to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds. Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure.
translated by 谷歌翻译
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has become even thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
translated by 谷歌翻译
我们介绍了PointConvormer,这是一个基于点云的深神经网络体系结构的新颖构建块。受到概括理论的启发,PointConvormer结合了点卷积的思想,其中滤波器权重仅基于相对位置,而变形金刚则利用了基于功能的注意力。在PointConvormer中,附近点之间的特征差异是重量重量卷积权重的指标。因此,我们从点卷积操作中保留了不变,而注意力被用来选择附近的相关点进行卷积。为了验证PointConvormer的有效性,我们在点云上进行了语义分割和场景流估计任务,其中包括扫描仪,Semantickitti,FlyingThings3D和Kitti。我们的结果表明,PointConvormer具有经典的卷积,常规变压器和Voxelized稀疏卷积方法的表现,具有较小,更高效的网络。可视化表明,PointConvormer的性能类似于在平面表面上的卷积,而邻域选择效果在物体边界上更强,表明它具有两全其美。
translated by 谷歌翻译
We present Kernel Point Convolution 1 (KPConv), a new design of point convolution, i.e. that operates on point clouds without any intermediate representation. The convolution weights of KPConv are located in Euclidean space by kernel points, and applied to the input points close to them. Its capacity to use any number of kernel points gives KP-Conv more flexibility than fixed grid convolutions. Furthermore, these locations are continuous in space and can be learned by the network. Therefore, KPConv can be extended to deformable convolutions that learn to adapt kernel points to local geometry. Thanks to a regular subsampling strategy, KPConv is also efficient and robust to varying densities. Whether they use deformable KPConv for complex tasks, or rigid KPconv for simpler tasks, our networks outperform state-of-the-art classification and segmentation approaches on several datasets. We also offer ablation studies and visualizations to provide understanding of what has been learned by KPConv and to validate the descriptive power of deformable KPConv.
translated by 谷歌翻译
Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.
translated by 谷歌翻译
在自动驾驶汽车和移动机器人上使用的多光束liDAR传感器可获得3D范围扫描的序列(“帧”)。由于有限的角度扫描分辨率和阻塞,每个框架都稀疏地覆盖了场景。稀疏性限制了语义分割或表面重建等下游过程的性能。幸运的是,当传感器移动时,帧将从一系列不同的观点捕获。这提供了互补的信息,当积累在公共场景坐标框架中时,会产生更密集的采样和对基础3D场景的更完整覆盖。但是,扫描场景通常包含移动对象。这些对象上的点不能仅通过撤消扫描仪的自我运动来正确对齐。在本文中,我们将多帧点云积累作为3D扫描序列的中级表示,并开发了一种利用室外街道场景的感应偏见的方法,包括其几何布局和对象级刚性。与最新的场景流估计器相比,我们提出的方法旨在使所有3D点在共同的参考框架中对齐,以正确地积累各个对象上的点。我们的方法大大减少了几个基准数据集上的对齐错误。此外,累积的点云使诸如表面重建之类的高级任务受益。
translated by 谷歌翻译
这项工作通过创建具有准确而完整的动态场景的新颖户外数据集来解决语义场景完成(SSC)数据中的差距。我们的数据集是由每个时间步骤的随机采样视图形成的,该步骤可监督无需遮挡或痕迹的场景的普遍性。我们通过利用最新的3D深度学习体系结构来使用时间信息来创建最新的开源网络中的SSC基准,并构建基准实时密集的局部语义映射算法MotionsC。我们的网络表明,提出的数据集可以在存在动态对象的情况下量化和监督准确的场景完成,这可以导致改进的动态映射算法的开发。所有软件均可在https://github.com/umich-curly/3dmapping上找到。
translated by 谷歌翻译
变压器在自然语言处理中的成功最近引起了计算机视觉领域的关注。由于能够学习长期依赖性,变压器已被用作广泛使用的卷积运算符的替代品。事实证明,这种替代者在许多任务中都取得了成功,其中几种最先进的方法依靠变压器来更好地学习。在计算机视觉中,3D字段还见证了使用变压器来增加3D卷积神经网络和多层感知器网络的增加。尽管许多调查都集中在视力中的变压器上,但由于与2D视觉相比,由于数据表示和处理的差异,3D视觉需要特别注意。在这项工作中,我们介绍了针对不同3D视觉任务的100多种变压器方法的系统和彻底审查,包括分类,细分,检测,完成,姿势估计等。我们在3D Vision中讨论了变形金刚的设计,该设计使其可以使用各种3D表示形式处理数据。对于每个应用程序,我们强调了基于变压器的方法的关键属性和贡献。为了评估这些方法的竞争力,我们将它们的性能与12个3D基准测试的常见非转化方法进行了比较。我们通过讨论3D视觉中变压器的不同开放方向和挑战来结束调查。除了提出的论文外,我们的目标是频繁更新最新的相关论文及其相应的实现:https://github.com/lahoud/3d-vision-transformers。
translated by 谷歌翻译
In this work, we address the problem of unsupervised moving object segmentation (MOS) in 4D LiDAR data recorded from a stationary sensor, where no ground truth annotations are involved. Deep learning-based state-of-the-art methods for LiDAR MOS strongly depend on annotated ground truth data, which is expensive to obtain and scarce in existence. To close this gap in the stationary setting, we propose a novel 4D LiDAR representation based on multivariate time series that relaxes the problem of unsupervised MOS to a time series clustering problem. More specifically, we propose modeling the change in occupancy of a voxel by a multivariate occupancy time series (MOTS), which captures spatio-temporal occupancy changes on the voxel level and its surrounding neighborhood. To perform unsupervised MOS, we train a neural network in a self-supervised manner to encode MOTS into voxel-level feature representations, which can be partitioned by a clustering algorithm into moving or stationary. Experiments on stationary scenes from the Raw KITTI dataset show that our fully unsupervised approach achieves performance that is comparable to that of supervised state-of-the-art approaches.
translated by 谷歌翻译
In this work, we study 3D object detection from RGB-D data in both indoor and outdoor scenes. While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, we directly operate on raw point clouds by popping up RGB-D scans. However, a key challenge of this approach is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). Instead of solely relying on 3D proposals, our method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects. Benefited from learning directly in raw point clouds, our method is also able to precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. Evaluated on KITTI and SUN RGB-D 3D detection benchmarks, our method outperforms the state of the art by remarkable margins while having real-time capability. * Majority of the work done as an intern at Nuro, Inc. depth to point cloud 2D region (from CNN) to 3D frustum 3D box (from PointNet)
translated by 谷歌翻译
准确的移动对象细分是自动驾驶的重要任务。它可以为许多下游任务提供有效的信息,例如避免碰撞,路径计划和静态地图构建。如何有效利用时空信息是3D激光雷达移动对象分割(LIDAR-MOS)的关键问题。在这项工作中,我们提出了一个新型的深神经网络,利用了时空信息和不同的LiDAR扫描表示方式,以提高LIDAR-MOS性能。具体而言,我们首先使用基于图像图像的双分支结构来分别处理可以从顺序的LiDAR扫描获得的空间和时间信息,然后使用运动引导的注意模块组合它们。我们还通过3D稀疏卷积使用点完善模块来融合LIDAR范围图像和点云表示的信息,并减少对象边界上的伪像。我们验证了我们提出的方法对Semantickitti的LiDAR-MOS基准的有效性。我们的方法在LiDar-Mos IOU方面大大优于最先进的方法。从设计的粗到精细体系结构中受益,我们的方法以传感器框架速率在线运行。我们方法的实现可作为开源可用:https://github.com/haomo-ai/motionseg3d。
translated by 谷歌翻译
了解3D场景是自治代理的关键先决条件。最近,LIDAR和其他传感器已经以点云帧的时间序列形式提供了大量数据。在这项工作中,我们提出了一种新的问题 - 顺序场景流量估计(SSFE) - 该旨在预测给定序列中所有点云的3D场景流。这与先前研究的场景流程估计问题不同,这侧重于两个框架。我们介绍SPCM-NET架构,通过计算相邻点云之间的多尺度时空相关性,然后通过订单不变的复制单元计算多级时空相关性来解决这个问题。我们的实验评估证实,与仅使用两个框架相比,点云序列的复发处理导致SSFE明显更好。另外,我们证明可以有效地修改该方法,用于顺序点云预测(SPF),一种需要预测未来点云帧的相关问题。我们的实验结果是使用SSFE和SPF的新基准进行评估,包括合成和实时数据集。以前,场景流估计的数据集仅限于两个帧。我们为这些数据集提供非琐碎的扩展,用于多帧估计和预测。由于难以获得现实世界数据集的地面真理运动,我们使用自我监督的培训和评估指标。我们认为,该基准将在该领域的未来研究中关键。将可访问基准和型号的所有代码。
translated by 谷歌翻译
本文首先提出了一个有效的3D点云学习架构,名为PWCLO-NET的LIDAR ODOMORY。在该架构中,提出了3D点云的投影感知表示来将原始的3D点云组织成有序数据表单以实现效率。 LIDAR ODOMOMERY任务的金字塔,翘曲和成本量(PWC)结构是为估计和优化在分层和高效的粗良好方法中的姿势。建立一个投影感知的细心成本卷,以直接关联两个离散点云并获得嵌入运动模式。然后,提出了一种可训练的嵌入掩模来称量局部运动模式以回归整体姿势和过滤异常值点。可训练的姿势经线细化模块迭代地与嵌入式掩码进行分层优化,使姿势估计对异常值更加强大。整个架构是全能优化的端到端,实现成本和掩码的自适应学习,并且涉及点云采样和分组的所有操作都是通过投影感知的3D特征学习方法加速。在Kitti Ocomatry DataSet上证明了我们的激光乐队内径架构的卓越性能和有效性。我们的方法优于基于学习的所有基于学习的方法,甚至基于几何的方法,在大多数基于Kitti Odomatry数据集的序列上具有映射优化的遗传。
translated by 谷歌翻译
许多基于点的语义分割方法是为室内场景设计的,但如果它们被应用于户外环境中的LIDAR传感器捕获的点云,则他们挣扎。为了使这些方法更有效和坚固,使得它们可以处理LIDAR数据,我们介绍了重新建立基于3D点的操作的一般概念,使得它们可以在投影空间中运行。虽然我们通过三个基于点的方法显示了重新计算的版本速度快300到400倍,但实现了更高的准确性,但我们还证明了重新制定基于3D点的操作的概念允许设计统一益处的新架构基于点和基于图像的方法。作为示例,我们介绍一种网络,该网络将基于重新的3D点的操作集成到2D编码器 - 解码器架构中,该架构融合来自不同2D尺度的信息。我们评估了四个具有挑战性的语义LIDAR点云分割的方法,并显示利用基于2D图像的操作的重新推出的基于3D点的操作实现了所有四个数据集的非常好的结果。
translated by 谷歌翻译
Our dataset provides dense annotations for each scan of all sequences from the KITTI Odometry Benchmark [19]. Here, we show multiple scans aggregated using pose information estimated by a SLAM approach.
translated by 谷歌翻译
In many robotics and VR/AR applications, 3D-videos are readily-available sources of input (a continuous sequence of depth images, or LIDAR scans). However, these 3D-videos are processed frame-by-frame either through 2D convnets or 3D perception algorithms in many cases. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. For this, we adopt sparse tensors [8, 9] and propose the generalized sparse convolution which encompasses all discrete convolutions. To implement the generalized sparse convolution, we create an open-source auto-differentiation library for sparse tensors that provides extensive functions for highdimensional convolutional neural networks. 1 We create 4D spatio-temporal convolutional neural networks using the library and validate them on various 3D semantic segmentation benchmarks and proposed 4D datasets for 3D-video perception. To overcome challenges in the high-dimensional 4D space, we propose the hybrid kernel, a special case of the generalized sparse convolution, and the trilateral-stationary conditional random field that enforces spatio-temporal consistency in the 7D space-time-chroma space. Experimentally, we show that convolutional neural networks with only generalized sparse convolutions can outperform 2D or 2D-3D hybrid methods by a large margin. 2 Also, we show that on 3D-videos, 4D spatio-temporal convolutional neural networks are robust to noise, outperform 3D convolutional neural networks and are faster than the 3D counterpart in some cases.
translated by 谷歌翻译
We present a network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Naïvely applying convolutions on this lattice scales poorly, both in terms of memory and computational cost, as the size of the lattice increases. Instead, our network uses sparse bilateral convolutional layers as building blocks. These layers maintain efficiency by using indexing structures to apply convolutions only on occupied parts of the lattice, and allow flexible specifications of the lattice structure enabling hierarchical and spatially-aware feature learning, as well as joint 2D-3D reasoning. Both point-based and image-based representations can be easily incorporated in a network with such layers and the resulting model can be trained in an end-to-end manner. We present results on 3D segmentation tasks where our approach outperforms existing state-of-the-art techniques.
translated by 谷歌翻译