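Transformation synchronization is the problem of recovering absolute transformations from a given set of pairwise relative motions. Despite its usefulness, the problem remains challenging because of noisy and outlier relative motions, and because it is difficult to model them analytically and suppress them with high fidelity. In this work, we avoid handcrafting robust loss functions and propose to use graph neural networks (GNNs) to learn transformation synchronization. Unlike previous works that use complicated multi-stage pipelines, we use an iterative approach in which each step consists of a single weight-shared message passing layer that refines the absolute poses from the previous iteration by predicting an incremental update in the tangent space. To reduce the influence of outliers, the messages are weighted before aggregation. Our iterative approach alleviates the need for an explicit initialization step and performs well with identity initial poses. Although our approach is simple, we show through experiments on both SO(3) and SE(3) synchronization that it performs favorably against existing handcrafted and learned synchronization methods.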
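The iterative scheme described above (start from identity poses, weight each incoming message, aggregate, apply an incremental update in the tangent space) can be illustrated with a small hand-written analogue for SO(3). This is only a sketch under stated assumptions: it is not the learned network from the abstract. The fixed Cauchy-style weight stands in for the learned message weighting, the convention `R_ab = R_b @ R_a.T` is assumed, and the names `synchronize`, `exp_so3`, `log_so3` and the parameter `sigma` are hypothetical.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + hat(w)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Rotation matrix -> rotation vector (angles very close to pi are not handled)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v

def synchronize(n, edges, iters=100, sigma=0.5):
    """edges: list of (a, b, R_ab) with the assumed convention R_ab = R_b @ R_a.T."""
    R = [np.eye(3) for _ in range(n)]  # identity initialization
    for _ in range(iters):
        for j in range(n):  # nodes are updated in place, one after another
            num, den = np.zeros(3), 0.0
            for a, b, R_ab in edges:
                if b == j:
                    target = R_ab @ R[a]    # pose of j suggested by neighbor a
                elif a == j:
                    target = R_ab.T @ R[b]  # pose of j suggested by neighbor b
                else:
                    continue
                r = log_so3(target @ R[j].T)  # residual in the tangent space
                # Fixed robust weight (assumption; the abstract learns this instead).
                w = 1.0 / (1.0 + (np.linalg.norm(r) / sigma) ** 2)
                num += w * r
                den += w
            if den > 0.0:
                R[j] = exp_so3(num / den) @ R[j]  # incremental update
    return R
```

The result is defined only up to a global rotation, so it is usually compared with a reference after aligning one node, or by checking the relative rotations `R[b] @ R[a].T` against the measurements.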
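Estimating the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using object regions, rather than explicit semantic object detections, to guide the pose estimation problem. We propose the Pose Refiner Network (PoserNet), a lightweight graph neural network that refines approximate pairwise relative camera poses. PoserNet exploits associations between object regions (concisely expressed as bounding boxes) across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across graphs of varied sizes and show how this process benefits optimization-based motion averaging algorithms, improving the median rotation error by 62 degrees with respect to the initial estimates obtained from bounding boxes. Code and data are available at https://github.com/iit-pavis/posernet.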
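How to extract significant point cloud features and estimate the pose between them remains a challenging question, because point clouds inherently lack structure and have an ambiguous ordering. Despite significant improvements from deep learning-based methods on most 3D computer vision tasks, such as object classification, object segmentation and point cloud registration, the consistency between features is still unsatisfactory in existing learning-based pipelines. In this paper, we present a novel learning-based alignment network for complex alignment scenes, titled deep feature consistency, which consists of three main modules: a multiscale graph feature merging network that converts the geometric correspondence set into high-dimensional features, a correspondence weighting module that constructs multiple candidate inlier subsets, and a Procrustes approach named deep feature matching that gives a closed-form solution for the relative pose. As the most important step of the deep feature matching module, the feature consistency matrix of each inlier subset is constructed, and its principal vector is taken as the inlier likelihoods of the corresponding subset. We comprehensively validate the robustness and effectiveness of our approach on the 3DMatch dataset and the KITTI odometry dataset. For large indoor scenes, registration results on the 3DMatch dataset show that our method outperforms state-of-the-art traditional and learning-based methods. For KITTI outdoor scenes, our approach is still able to lower the transformation errors. We also explore its strong generalization capability across datasets.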
3D point cloud registration is a fundamental problem in computer vision and robotics. Recently, learning-based point cloud registration methods have made great progress. However, these methods are sensitive to outliers, which lead to more incorrect correspondences. In this paper, we propose a novel deep graph matching-based framework for point cloud registration. Specifically, we first transform point clouds into graphs and extract deep features for each point. Then, we develop a module based on deep graph matching to calculate a soft correspondence matrix. By using graph matching, not only the local geometry of each point but also its structure and topology in a larger range are considered in establishing correspondences, so that more correct correspondences are found. We train the network with a loss directly defined on the correspondences, and in the test stage the soft correspondences are transformed into hard one-to-one correspondences so that registration can be performed by a correspondence-based solver. Furthermore, we introduce a transformer-based method to generate edges for graph construction, which further improves the quality of the correspondences. Extensive experiments on object-level and scene-level benchmark datasets show that the proposed method achieves state-of-the-art performance. The code is available at https://github.com/fukexue/RGM.
This paper proposes a deep recurrent Rotation Averaging Graph Optimizer (RAGO) for Multiple Rotation Averaging (MRA). Conventional optimization-based methods usually fail to produce accurate results due to corrupted and noisy relative measurements. Recent learning-based approaches regard MRA as a regression problem, while these methods are sensitive to initialization due to the gauge freedom problem. To handle these problems, we propose a learnable iterative graph optimizer minimizing a gauge-invariant cost function with an edge rectification strategy to mitigate the effect of inaccurate measurements. Our graph optimizer iteratively refines the global camera rotations by minimizing each node's single rotation objective function. Besides, our approach iteratively rectifies relative rotations to make them more consistent with the current camera orientations and observed relative rotations. Furthermore, we employ a gated recurrent unit to improve the result by tracing the temporal information of the cost graph. Our framework is a real-time learning-to-optimize rotation averaging graph optimizer with a tiny size deployed for real-world applications. RAGO outperforms previous traditional and deep methods on real-world and synthetic datasets. The code is available at https://github.com/sfu-gruvi-3dv/RAGO
Erroneous feature matches have severe impact on subsequent camera pose estimation and often require additional, time-costly measures, like RANSAC, for outlier rejection. Our method tackles this challenge by addressing feature matching and pose optimization jointly. To this end, we propose a graph attention network to predict image correspondences along with confidence weights. The resulting matches serve as weighted constraints in a differentiable pose estimation. Training feature matching with gradients from pose optimization naturally learns to down-weight outliers and boosts pose estimation on image pairs compared to SuperGlue by 6.7% on ScanNet. At the same time, it reduces the pose estimation time by over 50% and renders RANSAC iterations unnecessary. Moreover, we integrate information from multiple views by spanning the graph across multiple frames to predict the matches all at once. Multi-view matching combined with end-to-end training improves the pose estimation metrics on Matterport3D by 18.8% compared to SuperGlue.
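We present a new method for jointly registering multiple non-rigid shapes by synchronizing the maps that relate learned functions defined on the point clouds. Although the ability to process non-rigid shapes is critical in applications ranging from computer animation to 3D digitization, the literature still lacks a robust and flexible framework for matching and aligning a collection of real, noisy scans observed under occlusions. Given a set of such point clouds, our method first computes pairwise correspondences parameterized via functional maps. We simultaneously learn potentially non-orthogonal basis functions to effectively regularize the deformations while handling occlusions in an elegant way. To benefit maximally from the multi-way information provided by the inferred pairwise deformation fields, we synchronize the pairwise functional maps into a cycle-consistent whole using our novel and principled optimization formulation. We demonstrate through extensive experiments that our method achieves state-of-the-art registration accuracy while being flexible and efficient, since we handle both non-rigid and multi-body cases in a unified framework and avoid costly optimization over point-wise permutations by using basis function maps.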
Estimating 6D poses of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression of images to object poses has limited accuracy, matching rendered images of an object against the input image can produce accurate results. In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. Given an initial pose estimation, our network is able to iteratively refine the pose by matching the rendered image against the observed image. The network is trained to predict a relative pose transformation using a disentangled representation of 3D location and 3D orientation and an iterative training process. Experiments on two commonly used benchmarks for 6D pose estimation demonstrate that DeepIM achieves large improvements over state-of-the-art methods. We furthermore show that DeepIM is able to match previously unseen objects.
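As an efficient algorithm for the multi-view registration problem, the motion averaging (MA) algorithm has been studied extensively, and many MA-based algorithms have been introduced. They aim to recover global motions from relative motions and exploit information redundancy to average out accumulated errors. However, one property of these methods is that they use the Gauss-Newton method to solve a least-squares problem for the increment of the global motions, which may lead to low efficiency and poor robustness to outliers. In this paper, we propose a novel motion averaging framework for multi-view registration based on the Laplacian kernel-based maximum correntropy criterion (LMCC). Using the Lie algebra motion framework and the correntropy measure, we propose a new cost function that takes into account all constraints supplied by the relative motions. Obtaining the increment used to correct the global motions can then be formulated as an optimization problem that maximizes the cost function. By virtue of the quadratic technique, the optimization problem can be solved by splitting it into two subproblems: computing the weight of each relative motion according to the current residuals, and solving a second-order cone program (SOCP) for the increment in the next iteration. We also provide a novel strategy for determining the kernel width, which ensures that our method can efficiently exploit the information redundancy supplied by the relative motions in the presence of many outliers. Finally, we compare the proposed method with other MA-based multi-view registration methods to verify its performance. Experimental tests on synthetic and real data show that our method achieves superior performance in terms of efficiency, accuracy and robustness.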
Video provides us with the spatio-temporal consistency needed for visual learning. Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs. However, by only relying on close-by frame pairs, those approaches miss out on the richer long-range consistency between distant overlapping frames. To address this, we propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences. Our approach combines pairwise correspondence estimation and registration with a novel SE(3) transformation synchronization algorithm. Our key insight is that self-supervised multiview registration allows us to obtain correspondences over longer time frames; increasing both the diversity and difficulty of sampled pairs. We evaluate our approach on indoor scenes for correspondence estimation and RGB-D pointcloud registration and find that we perform on-par with supervised approaches.
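We propose a novel and efficient approach for solving the global point cloud registration problem through geometric topology. Building on the many pairwise point cloud registration methods (e.g., ICP), we focus on the problem of accumulated error in the composition of transformations along any loop. The main technical contribution of this paper is a linear method for eliminating these errors that only requires solving a Poisson equation. We demonstrate the consistency of our method from the Hodge-Helmholtz decomposition theorem and from experiments on multiple RGBD datasets of real-world scenes. The experimental results also show that our global registration method runs quickly and provides accurate reconstructions.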
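Point cloud registration of 3D objects is very challenging because of sparse and noisy measurements, incomplete observations and large transformations. In this work, we propose the Graph Matching Consensus Network (GMCNet), which estimates pose-invariant correspondences for full-range partial-to-partial point cloud registration (PPR). To encode robust point descriptors, 1) we first comprehensively investigate the transformation-robustness and noise-resilience of various geometric features. 2) We then employ a novel Transformation-robust Point Transformer (TPT) module to adaptively aggregate local features with respect to structural relations, which takes advantage of both handcrafted rotation-invariant ($RI$) features and noise-resilient spatial coordinates. 3) Based on a synergy of hierarchical graph networks and graphical modeling, we propose the Hierarchical Graphical Modeling (HGM) architecture to encode robust descriptors consisting of i) a unary term learned from $RI$ features; and ii) multiple smoothness terms encoded from neighboring point relations at different scales through our TPT modules. Moreover, we construct a challenging PPR dataset (MVP-RG) with virtual scans. Extensive experiments show that GMCNet outperforms previous state-of-the-art methods for PPR. Notably, GMCNet encodes point descriptors for each point cloud individually, without using cross-contextual information or ground-truth correspondences for training. Our code and datasets will be available at https://github.com/paul007pl/gmcnet.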
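We present Lepard, a learning-based approach for point cloud matching in rigid and deformable scenes. The key feature of Lepard is a set of techniques that exploit 3D positional knowledge for point cloud matching: 1) An architecture that disentangles the point cloud representation into a feature space and a 3D position space. 2) A position encoding method that explicitly reveals 3D relative distance information through the dot product of vectors. 3) A repositioning technique that modifies the relative positions across point clouds. Ablation studies demonstrate the effectiveness of these techniques. For rigid point cloud matching, Lepard sets a new state of the art on the 3DMatch / 3DLoMatch benchmarks with 93.6% / 69.0% registration recall. In the deformable case, Lepard achieves +27.1% / +34.8% higher non-rigid feature matching recall than the prior art on our newly constructed 4DMatch / 4DLoMatch benchmark.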
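3D point cloud registration ranks among the most fundamental problems in remote sensing, photogrammetry, robotics and geometric computer vision. Because of the limited accuracy of 3D feature matching techniques, outliers may exist among the correspondences, sometimes in very large numbers. Since existing robust solvers may suffer from high computational cost or limited robustness, we propose a novel, fast and highly robust solution named VOCRA (VOting with Cost function and Rotation Averaging) for the point cloud registration problem with extreme outlier rates. Our first contribution is to employ Tukey's biweight robust cost to introduce a new voting and correspondence sorting technique, which proves effective in distinguishing true inliers from outliers even at extreme (99%) outlier rates. Our second contribution is a time-efficient consensus maximization paradigm based on robust rotation averaging, which is used to seek inlier candidates among the correspondences. Finally, we apply Graduated Non-Convexity with Tukey's Biweight (GNC-TB) to estimate the correct transformation from the inlier candidates obtained, which is then used to find the complete inlier set. We conduct both standard benchmarking and realistic experiments on two real-data problems, and show that our solver VOCRA is robust against more than 99% outliers and more time-efficient than state-of-the-art competitors.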
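To make the role of Tukey's biweight cost concrete, the following minimal sketch applies it in an iteratively reweighted least-squares (IRLS) loop on a toy 1D location problem. It is not the VOCRA voting scheme or GNC-TB: the function names `tukey_weight` and `robust_mean`, the fixed threshold `c`, and the median initialization are all assumptions made for illustration.

```python
import numpy as np

def tukey_weight(r, c):
    """IRLS weight of Tukey's biweight: (1 - (r/c)^2)^2 inside [-c, c], zero outside."""
    r = np.asarray(r, dtype=float)
    w = (1.0 - (r / c) ** 2) ** 2
    return np.where(np.abs(r) <= c, w, 0.0)

def robust_mean(x, c=1.0, iters=50):
    """Toy 1D location estimate: reweight residuals, then take the weighted mean."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)  # robust starting point (assumption)
    for _ in range(iters):
        w = tukey_weight(x - mu, c)
        if w.sum() == 0.0:  # every sample rejected; keep the current estimate
            break
        mu = (w * x).sum() / w.sum()
    return mu
```

Residuals larger than `c` receive exactly zero weight, which is what lets a redescending cost like this ignore gross outliers entirely instead of merely down-weighting them.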
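Generating a set of high-quality correspondences or matches is one of the most critical steps in point cloud registration. This paper proposes a learning framework, COTReg, that predicts correspondences for 3D point cloud registration by jointly considering point-wise and structural matching. Specifically, we transform these two matchings into a Wasserstein distance-based and a Gromov-Wasserstein distance-based optimization, respectively. The task of establishing correspondences can therefore be naturally recast as a coupled optimal transport problem. Furthermore, we design a network that predicts a confidence score for each point of the point clouds, which provides overlap region information for generating correspondences. Our correspondence prediction pipeline can be easily integrated with either learning-based features such as FCGF or traditional descriptors such as FPFH. We conduct comprehensive experiments on the 3DMatch, KITTI, 3DCSR and ModelNet40 benchmarks, showing the state-of-the-art performance of the proposed method.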
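As a rough illustration of how an optimal transport plan yields soft correspondences, the sketch below runs plain Sinkhorn iterations on an entropy-regularized transport problem. It covers only a point-wise (Wasserstein-style) term; the Gromov-Wasserstein coupling and the learned confidence scores from the abstract are not modeled. The function name `sinkhorn` and the parameters `eps` and `iters` are hypothetical.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, iters=200):
    """Entropy-regularized optimal transport between marginals a and b.

    cost: (n, m) matrix of matching costs, e.g. squared feature distances.
    Returns an (n, m) transport plan whose rows sum to a and columns to b.
    """
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)     # scale columns toward marginal b
        u = a / (K @ v)       # scale rows toward marginal a
    return u[:, None] * K * v[None, :]
```

Taking the row-wise `argmax` of the returned plan gives one hard match per source point. A small `eps` makes the plan sharper but can underflow `np.exp` for large costs, so a log-domain variant is usually preferred in practice.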
Point cloud registration is a key problem for computer vision applied to robotics, medical imaging, and other applications. This problem involves finding a rigid transformation from one point cloud into another so that they align. Iterative Closest Point (ICP) and its variants provide simple and easily-implemented iterative methods for this task, but these algorithms can converge to spurious local optima. To address local optima and other difficulties in the ICP pipeline, we propose a learning-based method, titled Deep Closest Point (DCP), inspired by recent techniques in computer vision and natural language processing. Our model consists of three parts: a point cloud embedding network, an attention-based module combined with a pointer generation layer, to approximate combinatorial matching, and a differentiable singular value decomposition (SVD) layer to extract the final rigid transformation. We train our model end-to-end on the ModelNet40 dataset and show in several settings that it performs better than ICP, its variants (e.g., Go-ICP, FGR), and the recently-proposed learning-based method PointNetLK. Beyond providing a state-of-the-art registration technique, we evaluate the suitability of our learned features transferred to unseen objects. We also provide preliminary analysis of our learned model to help understand whether domain-specific and/or global features facilitate rigid registration.
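The SVD step that extracts a rigid transformation from matched points can be sketched with the classical Kabsch (orthogonal Procrustes) solution. This is a plain NumPy version, not the differentiable layer used in DCP; the function name `kabsch` and the optional `weights` argument are assumptions for illustration.

```python
import numpy as np

def kabsch(src, dst, weights=None):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding points.
    weights: optional (N,) non-negative correspondence weights.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    w = np.ones(len(src)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu_s = w @ src                       # weighted centroids
    mu_d = w @ dst
    # 3x3 weighted cross-covariance of the centered point sets.
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

With soft correspondences, `dst` would typically hold the expected match of each source point under the predicted matching distribution, and `weights` would carry per-point confidences.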
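Recent 3D registration methods can effectively handle large-scale or partially overlapping point pairs. However, despite its practical importance, matching pairs that are unbalanced in spatial scale and density has been overlooked. We present a novel 3D registration method, called UPPNet, for unbalanced point pairs. We propose a hierarchical framework that finds inlier correspondences effectively by gradually reducing the search space. Our method predicts the subregions of the target points that are likely to overlap with the query points. The subsequent super-point matching module and fine-grained refinement module estimate accurate correspondences between the two point clouds. Furthermore, we apply geometric constraints to refine the correspondences so that they satisfy spatial compatibility. Correspondence prediction is trained end-to-end, and our approach can predict the proper rigid transformation with a single forward pass given a point cloud pair. To validate the efficacy of the proposed method, we create the KITTI-UPP dataset by augmenting the KITTI LiDAR dataset. Experiments on this dataset show that the proposed approach significantly outperforms state-of-the-art pairwise point cloud registration methods, with a 78% improvement in registration recall when the target point cloud is about $10\times$ spatially larger than, and also denser than, the query point cloud.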
We introduce an approach for recovering the 6D pose of multiple known objects in a scene captured by a set of input images with unknown camera viewpoints. First, we present a single-view single-object 6D pose estimation method, which we use to generate 6D object pose hypotheses. Second, we develop a robust method for matching individual 6D object pose hypotheses across different input images in order to jointly estimate camera viewpoints and 6D poses of all objects in a single consistent scene. Our approach explicitly handles object symmetries, does not require depth measurements, is robust to missing or incorrect object hypotheses, and automatically recovers the number of objects in the scene. Third, we develop a method for global scene refinement given multiple object hypotheses and their correspondences across views. This is achieved by solving an object-level bundle adjustment problem that refines the poses of cameras and objects to minimize the reprojection error in all views. We demonstrate that the proposed method, dubbed CosyPose, outperforms current state-of-the-art results for single-view and multi-view 6D object pose estimation by a large margin on two challenging benchmarks: the YCB-Video and T-LESS datasets. Code and pre-trained models are available on the project webpage.
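Real-time registration of partially overlapping point clouds has emerging applications in cooperative perception for autonomous vehicles and multi-agent SLAM. The relative transformation between point clouds in these applications is larger than in traditional SLAM and odometry applications, which makes it harder to identify correspondences and register successfully. In this paper, we propose a novel registration method for partially overlapping point clouds in which correspondences are learned using an efficient point-wise feature encoder and refined using a graph-based attention network. This attention network exploits geometric relationships between keypoints to improve matching in point clouds with low overlap. At inference time, the relative pose transformation is obtained by robustly fitting the correspondences through sample consensus. The evaluation is performed on the KITTI dataset and on a new synthetic dataset that includes low-overlap point clouds with displacements of up to 30 m. The proposed method performs on par with state-of-the-art methods on the KITTI dataset and outperforms existing methods on low-overlap point clouds. In addition, the proposed method achieves significantly faster inference times than competing methods, as low as 410 ms. Our code and dataset are available at https://github.com/eduardohenriquearnold/fastreg.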
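The task of learning-based 3D point cloud registration has advanced greatly, with existing methods yielding outstanding results on standard benchmarks such as ModelNet40, even in the partial-to-partial matching scenario. Unfortunately, these methods still struggle in the presence of real data. In this work, we identify the sources of these failures, analyze the reasons behind them, and propose solutions to address them. We summarize our findings as a set of guidelines and demonstrate their effectiveness by applying them to two different baseline methods, DCP and IDAM. In short, our guidelines improve both their training convergence and their testing accuracy. Ultimately, this translates into a best-practice 3D registration network (BPNet), which constitutes the first learning-based method able to handle previously unseen objects in real-world data. Despite being trained only on synthetic data, our model generalizes to real data without any fine-tuning, reaching an accuracy of up to 67% on point clouds of unseen objects obtained with a commercial sensor.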