Recent progress in geometric computer vision has shown significant advances in reconstruction and novel view rendering from multiple views by capturing the scene as a neural radiance field. Such approaches have changed the paradigm of reconstruction but need a plethora of views and do not make use of object shape priors. On the other hand, deep learning has shown how to use priors in order to infer shape from single images. Such approaches, though, require that the object is reconstructed in a canonical pose or assume that object pose is known during training. In this paper, we address the problem of how to compute equivariant priors for reconstruction from a few images, given the relative poses of the cameras. Our proposed reconstruction is $SE(3)$-gauge equivariant, meaning that it is equivariant to the choice of world frame. To achieve this, we make two novel contributions to light field processing: we define light field convolution and we show how it can be approximated by intra-view $SE(2)$ convolutions because the original light field convolution is computationally and memory-wise intractable; we design a map from the light field to $\mathbb{R}^3$ that is equivariant to the transformation of the world frame and to the rotation of the views. We demonstrate equivariance by obtaining robust results in roto-translated datasets without performing transformation augmentation.
Translated by Google Translate
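A common approach to defining convolutions on meshes is to interpret them as graphs and apply graph convolutional networks (GCNs). Such GCNs use isotropic kernels and are therefore insensitive to the relative orientation of vertices, and thus to the geometry of the mesh as a whole. We propose Gauge Equivariant Mesh CNNs, which generalize GCNs to apply anisotropic gauge-equivariant kernels. Since the resulting features carry orientation information, we introduce a geometric message passing scheme defined by parallel transporting features over mesh edges. Our experiments validate the significantly improved expressivity of the proposed model over conventional GCNs and other methods.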
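The claim that isotropic kernels are insensitive to the relative orientation of neighbors can be illustrated with a toy example. The sketch below is not the paper's method: it uses scalar features, a hypothetical angle-dependent kernel `cos(angle)`, and no parallel transport. It only shows that an isotropic aggregation ignores neighbor angles, while an anisotropic one does not.

```python
import math

def isotropic_aggregate(center, neighbors, w_self=1.0, w_neigh=0.5):
    # neighbors: list of (angle, feature); the angle is ignored, as in a GCN-style kernel.
    return w_self * center + w_neigh * sum(f for _, f in neighbors)

def anisotropic_aggregate(center, neighbors, w_self=1.0):
    # Hypothetical angle-dependent kernel cos(angle), chosen only for illustration.
    return w_self * center + sum(math.cos(a) * f for a, f in neighbors)

def rotate_neighbors(neighbors, delta):
    # Rotate every neighbor around the center vertex by the same angle.
    return [(a + delta, f) for a, f in neighbors]

neighbors = [(0.0, 1.0), (math.pi / 2, 2.0), (math.pi, 3.0)]
rotated = rotate_neighbors(neighbors, math.pi)

iso_before = isotropic_aggregate(0.0, neighbors)
iso_after = isotropic_aggregate(0.0, rotated)
aniso_before = anisotropic_aggregate(0.0, neighbors)
aniso_after = anisotropic_aggregate(0.0, rotated)

print(iso_before == iso_after)            # isotropic output ignores the rotation
print(abs(aniso_before - aniso_after))    # anisotropic output changes
```

Because the anisotropic output depends on the reference direction used to measure angles, a practical method has to specify how features transform when that reference changes, which is the role gauge equivariance plays in the abstract above.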
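We introduce a unified framework for equivariant networks on homogeneous spaces, derived from a Fourier perspective. We address the case of feature fields before and after a convolutional layer. We present a unified derivation of kernels via the Fourier domain by exploiting the sparsity of the Fourier coefficients of the lifted feature fields. The sparsity emerges when the stabilizer subgroup of the homogeneous space is a compact Lie group. We further introduce an activation method based on an elementwise nonlinearity applied after lifting, followed by projection back to the field through an equivariant convolution. We show that other methods that treat features as Fourier coefficients in the stabilizer subgroup are special cases of our activation. Experiments on $SO(3)$ and $SE(3)$ show state-of-the-art performance in spherical vector field regression, point cloud classification, and molecular completion.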
Coordinate-based implicit neural networks, or neural fields, have emerged as useful representations of shape and appearance in 3D computer vision. Despite advances however, it remains challenging to build neural fields for categories of objects without datasets like ShapeNet that provide canonicalized object instances that are consistently aligned for their 3D position and orientation (pose). We present Canonical Field Network (CaFi-Net), a self-supervised method to canonicalize the 3D pose of instances from an object category represented as neural fields, specifically neural radiance fields (NeRFs). CaFi-Net directly learns from continuous and noisy radiance fields using a Siamese network architecture that is designed to extract equivariant field features for category-level canonicalization. During inference, our method takes pre-trained neural radiance fields of novel object instances at arbitrary 3D pose, and estimates a canonical field with consistent 3D pose across the entire category. Extensive experiments on a new dataset of 1300 NeRF models across 13 object categories show that our method matches or exceeds the performance of 3D point cloud-based methods.
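This paper proposes a new point cloud convolution structure that learns SE(3)-equivariant features. Compared with existing SE(3)-equivariant networks, our design is lightweight, simple, and flexible enough to be incorporated into general point cloud learning networks. We strike a balance between the complexity and capacity of the model by selecting an unconventional domain for the feature maps. We further reduce the computational load by properly discretizing $\mathbb{R}^3$ to fully exploit the rotational symmetry. In addition, we employ a permutation layer to recover the full SE(3) group from its quotient space. Experiments show that our method achieves comparable or superior performance on various tasks while consuming much less memory and running faster than existing work. The proposed method can foster the adoption of equivariant feature learning in practical applications based on point clouds and inspire future developments of equivariant feature learning for real-world applications.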
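Point cloud analysis without pose priors is very challenging in real applications, because the orientations of point clouds are often unknown. In this paper, we propose a new point-set learning framework, PRIN (Point-wise Rotation Invariant Network), which focuses on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by density-aware adaptive sampling to deal with distorted point distributions in spherical space. Spherical voxel convolution and point re-sampling are proposed to extract rotation-invariant features for each point. In addition, we extend PRIN to a sparse version called SPRIN, which operates directly on sparse point clouds. Both PRIN and SPRIN can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. Results show that, on datasets with randomly rotated point clouds, SPRIN performs better than state-of-the-art methods without any data augmentation. We also provide a thorough theoretical proof and analysis of the point-wise rotation invariance achieved by our methods. Our code is available at https://github.com/qq456cvb/sprin.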
The principle of equivariance to symmetry transformations enables a theoretically grounded approach to neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. Here we show how this principle can be extended beyond global symmetries to local gauge transformations. This enables the development of a very general class of convolutional neural networks on manifolds that depend only on the intrinsic geometry, and which includes many popular methods from equivariant and geometric deep learning. We implement gauge equivariant CNNs for signals defined on the surface of the icosahedron, which provides a reasonable approximation of the sphere. By choosing to work with this very regular manifold, we are able to implement the gauge equivariant convolution using a single conv2d call, making it a highly scalable and practical alternative to Spherical CNNs. Using this method, we demonstrate substantial improvements over previous methods on the task of segmenting omnidirectional images and global climate patterns.
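Neural rendering has received tremendous attention since the advent of Neural Radiance Fields (NeRF), and it has considerably advanced the state of the art in novel view synthesis. The recent focus has been on models that overfit to a single scene, and the few attempts to learn models that can synthesize novel views of unseen scenes mostly consist of combining deep convolutional features with a NeRF-like model. We propose a different paradigm that requires neither deep features nor NeRF-like volume rendering. Our method predicts the color of a target ray directly from a collection of patches sampled from the scene. We first leverage epipolar geometry to extract patches along the epipolar lines of each reference view. Each patch is linearly projected into a 1D feature vector, and a sequence of transformers processes the collection. For positional encoding, we parameterize rays as in a light field representation, with the crucial difference that the coordinates are canonicalized with respect to the target ray, which makes our method independent of the reference frame and improves generalization. We show that our approach outperforms the state of the art on novel view synthesis of unseen scenes, even when trained with considerably less data than prior work.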
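Equivariance to symmetries has proven to be a powerful inductive bias in deep learning research. Recent works on mesh processing have concentrated on various kinds of natural symmetries, including translations, rotations, scaling, node permutations, and gauge transformations. To date, no existing architecture is equivariant to all of these transformations. In this paper, we present an attention-based architecture for mesh data that is equivariant to all of the transformations mentioned above. Our pipeline relies on relative tangential features: a simple, effective, equivariance-friendly alternative to raw node positions as inputs. Experiments on the FAUST and TOSCA datasets confirm that the proposed architecture achieves improved performance on these benchmarks and is indeed equivariant, and therefore robust, to a wide variety of local and global transformations.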
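Emerging from low-level vision theory, steerable filters found their counterpart in prior work on steerable convolutional neural networks equivariant to rigid transformations. In our work, we propose a steerable feed-forward learning-based approach that consists of neurons with spherical decision surfaces and operates on point clouds. Such spherical neurons are obtained by a conformal embedding of Euclidean space and have recently been revisited in the context of learning representations of point sets. Focusing on 3D geometry, we exploit the isometry property of spherical neurons and derive a 3D steerability constraint. After training spherical neurons to classify point clouds in a canonical orientation, we use a tetrahedron basis to quadruplicate the neurons and construct rotation-equivariant spherical filter banks. We then apply the derived constraint to interpolate the filter bank outputs and thus obtain a rotation-invariant network. Finally, we use a synthetic point set and real-world 3D skeleton data to verify our theoretical findings. The code is available at https://github.com/pavlo-melnyk/steerable-3d-neurons.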
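Convolutional networks are successful because of their equivariance/invariance under translations. However, rotatable data such as images, volumes, shapes, or point clouds require processing with equivariance/invariance under rotations in cases where the rotational orientation of the coordinate system does not affect the meaning of the data (e.g., object classification). On the other hand, estimating or processing rotations is necessary in cases where rotations matter (e.g., motion estimation). There has been recent progress in methods and theory on all of these fronts. Here we provide an overview of existing methods for both 2D and 3D rotations (and translations), and identify commonalities and links between them.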
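The translation equivariance and invariance mentioned above can be checked numerically. The following is a minimal sketch in pure Python, assuming a 1D periodic signal and circular cross-correlation; the function names and the example signal are illustrative and are not taken from the survey. Equivariance means that shifting the input and then filtering gives the same result as filtering and then shifting; invariance follows once the output is pooled over all positions.

```python
def circular_correlate(signal, kernel):
    # Cross-correlation with wrap-around (periodic) boundary conditions.
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]

def shift(signal, s):
    # Cyclic translation of the signal by s positions.
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]

signal = [1, 4, 2, 0, 3, 5]
kernel = [1, -2, 1]

filtered_then_shifted = shift(circular_correlate(signal, kernel), 2)
shifted_then_filtered = circular_correlate(shift(signal, 2), kernel)

# Equivariance: the two orders of operation agree.
print(filtered_then_shifted == shifted_then_filtered)

# Invariance: global sum pooling removes the dependence on the shift.
print(sum(shifted_then_filtered) == sum(circular_correlate(signal, kernel)))
```

Rotation-equivariant layers aim for the same commutation property with rotations in place of shifts, which generally requires constraining the kernels rather than just sharing weights across positions.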
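The equivariance of linear neural network layers is well studied. In this work, we relax the equivariance condition so that it only needs to hold in a projective sense. In particular, we study the relation between projective and ordinary equivariance and show that, for important examples, the two problems are in fact equivalent. The rotation group in 3D acts projectively on the projective plane. We experimentally study the practical importance of rotation equivariance when designing networks for filtering 2D-2D correspondences. Fully equivariant models perform poorly, and while simply adding invariant features to a strong baseline yields improvements, this does not seem to be due to improved equivariance.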
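Reasoning about 3D objects based on 2D images is challenging because of the large variations in appearance caused by viewing the object from different orientations. Ideally, our model would be invariant or equivariant to changes in object pose. Unfortunately, this is typically not possible with 2D image input, because we do not have an a priori model of how the image changes under out-of-plane object rotations. The only $\mathrm{SO}(3)$-equivariant models that currently exist require point cloud input rather than 2D images. In this paper, we propose a novel model architecture based on icosahedral group convolution that reasons in $\mathrm{SO}(3)$ by projecting the input image onto an icosahedron. As a result of this projection, the model is approximately equivariant to rotation in $\mathrm{SO}(3)$. We apply this model to an object pose estimation task and find that it outperforms reasonable baselines.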
A wide range of techniques have been proposed in recent years for designing neural networks for 3D data that are equivariant under rotation and translation of the input. Most approaches for equivariance under the Euclidean group $\mathrm{SE}(3)$ of rotations and translations fall within one of the two major categories. The first category consists of methods that use $\mathrm{SE}(3)$-convolution which generalizes classical $\mathbb{R}^3$-convolution on signals over $\mathrm{SE}(3)$. Alternatively, it is possible to use \textit{steerable convolution} which achieves $\mathrm{SE}(3)$-equivariance by imposing constraints on $\mathbb{R}^3$-convolution of tensor fields. It is known by specialists in the field that the two approaches are equivalent, with steerable convolution being the Fourier transform of $\mathrm{SE}(3)$ convolution. Unfortunately, these results are not widely known and moreover the exact relations between deep learning architectures built upon these two approaches have not been precisely described in the literature on equivariant deep learning. In this work we provide an in-depth analysis of both methods and their equivalence and relate the two constructions to multiview convolutional networks. Furthermore, we provide theoretical justifications of separability of $\mathrm{SE}(3)$ group convolution, which explain the applicability and success of some recent approaches. Finally, we express different methods using a single coherent formalism and provide explicit formulas that relate the kernels learned by different methods. In this way, our work helps to unify different previously-proposed techniques for achieving roto-translational equivariance, and helps to shed light on both the utility and precise differences between various alternatives. We also derive new TFN non-linearities from our equivalence principle and test them on practical benchmark datasets.
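Including covariant information, such as position, force, velocity, or spin, is important in many tasks in computational physics and chemistry. We introduce Steerable E(3) Equivariant Graph Neural Networks (SEGNNs), which generalize equivariant graph networks so that node and edge attributes are not restricted to invariant scalars but can contain covariant information, such as vectors or tensors. The model is composed of steerable MLPs and is able to incorporate geometric and physical information in both the message and update functions. Through the definition of steerable node attributes, the MLPs provide a new class of activation functions for general use with steerable feature fields. We discuss our work and related work through the lens of equivariant non-linear convolutions, which further allows us to pinpoint the successful components of SEGNNs: non-linear message aggregation improves upon classic linear (steerable) point convolutions, and steerable messages improve upon recent equivariant graph networks that send invariant messages. We demonstrate the effectiveness of our method on several tasks in computational physics and chemistry and provide extensive ablation studies.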
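Classical light field rendering for novel view synthesis can accurately reproduce view-dependent effects such as reflection, refraction, and translucency, but it requires a dense view sampling of the scene. Methods based on geometric reconstruction need only sparse views, but cannot accurately model non-Lambertian effects. We introduce a model that combines the strengths and mitigates the limitations of these two directions. By operating on a four-dimensional representation of the light field, our model learns to represent view-dependent effects accurately. By enforcing geometric constraints during training and inference, the scene geometry is learned implicitly from a sparse set of views. Concretely, we introduce a two-stage transformer-based model that first aggregates features along epipolar lines and then aggregates features along reference views to produce the color of a target ray. Our model outperforms the state of the art on multiple forward-facing and 360° datasets, with larger margins on scenes with severe view-dependent variations.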
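We present a method for the accurate 3D reconstruction of partly symmetric objects. We build on the strengths of recent advances in neural reconstruction and rendering, such as Neural Radiance Fields (NeRF). A major shortcoming of such approaches is that they fail to reconstruct any part of the object that is not clearly visible in the training images, which is often the case for in-the-wild images and videos. When evidence is lacking, structural priors such as symmetry can be used to complete the missing information. However, exploiting such priors in neural rendering is highly non-trivial: while geometry and non-reflective materials may be symmetric, shadows and reflections from the ambient scene are generally not. To address this, we apply a soft symmetry constraint to the 3D geometry and material properties, having factored appearance into lighting, albedo, and reflectivity. We evaluate our method on the recently introduced CO3D dataset, focusing on the car category because of the challenge of reconstructing highly reflective materials. We show that it can reconstruct unobserved regions with high fidelity and render high-quality novel view images.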
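A soft symmetry constraint of the kind described above can be pictured as a penalty term rather than a hard requirement. The sketch below is only an illustration under stated assumptions: the symmetry plane is fixed at x = 0, the field is any scalar function of a 3D point, and `weight` is a hypothetical loss weight. The abstract does not specify the actual fields, symmetry plane, or loss used by the method.

```python
def reflect_x(point):
    # Mirror a 3D point across the plane x = 0 (assumed symmetry plane).
    x, y, z = point
    return (-x, y, z)

def soft_symmetry_penalty(field, points, weight=0.1):
    # Mean squared difference between the field at each point and at its mirror image.
    # A small weight lets the data term override symmetry where evidence contradicts it.
    total = sum((field(p) - field(reflect_x(p))) ** 2 for p in points)
    return weight * total / len(points)

points = [(1.0, 2.0, 0.0), (-0.5, 0.3, 1.0)]

symmetric_field = lambda p: p[0] ** 2 + p[1]   # unchanged under x -> -x
asymmetric_field = lambda p: p[0]              # flips sign under x -> -x

print(soft_symmetry_penalty(symmetric_field, points))    # zero for a symmetric field
print(soft_symmetry_penalty(asymmetric_field, points))   # positive for an asymmetric field
```

In a reconstruction setting, a penalty like this would be added to the rendering loss, so that symmetry fills in unobserved regions without forcing view-dependent effects such as reflections to be symmetric.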
Steerable convolutional neural networks (CNNs) provide a general framework for building neural networks equivariant to translations and other transformations belonging to an origin-preserving group $G$, such as reflections and rotations. They rely on standard convolutions with $G$-steerable kernels obtained by analytically solving the group-specific equivariance constraint imposed onto the kernel space. As the solution is tailored to a particular group $G$, the implementation of a kernel basis does not generalize to other symmetry transformations, which complicates the development of group equivariant models. We propose using implicit neural representation via multi-layer perceptrons (MLPs) to parameterize $G$-steerable kernels. The resulting framework offers a simple and flexible way to implement Steerable CNNs and generalizes to any group $G$ for which a $G$-equivariant MLP can be built. We apply our method to point cloud (ModelNet-40) and molecular data (QM9) and demonstrate a significant improvement in performance compared to standard Steerable CNNs.
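Neural networks that map 3D coordinates to signed distance function (SDF) or occupancy values have enabled high-fidelity implicit representations of object shape. This paper develops a new shape model that allows synthesizing novel distance views by optimizing a continuous signed directional distance function (SDDF). Similar to deep SDF models, our SDDF formulation can represent whole categories of shapes and can complete or interpolate across shapes from partial input data. Unlike an SDF, which measures the distance to the nearest surface in any direction, an SDDF measures the distance in a given direction. This allows training an SDDF model without 3D shape supervision, using only distance measurements that are readily available from depth cameras or Lidar sensors. Our model also removes post-processing steps such as surface extraction or rendering by directly predicting distance at arbitrary locations and viewing directions. Unlike deep view-synthesis techniques such as Neural Radiance Fields, which train high-capacity black-box models, our model encodes by construction the property that SDDF values decrease linearly along the viewing direction. This structural constraint not only results in dimensionality reduction but also provides analytical confidence about the accuracy of SDDF predictions, regardless of the distance to the object surface.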
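The property that directional distance values decrease linearly along the viewing direction can be verified on a case with a closed-form answer. The sketch below uses a single plane as the surface, which is an assumption made purely for illustration; it is an analytic toy, not the learned SDDF model from the paper. Moving a distance `t` along the unit direction `v` should reduce the directional distance by exactly `t`.

```python
import math

def directional_distance_to_plane(p, v, normal, offset):
    # Signed distance from point p along unit direction v to the plane normal . x = offset.
    denom = sum(n * vi for n, vi in zip(normal, v))
    if abs(denom) < 1e-12:
        raise ValueError("direction is parallel to the plane")
    return (offset - sum(n * pi for n, pi in zip(normal, p))) / denom

p = (0.0, 0.0, 0.0)
v = (0.0, 0.0, 1.0)                       # unit viewing direction
normal, offset = (0.0, 0.0, 1.0), 5.0     # the plane z = 5

t = 2.0
p_moved = tuple(pi + t * vi for pi, vi in zip(p, v))

d0 = directional_distance_to_plane(p, v, normal, offset)
d1 = directional_distance_to_plane(p_moved, v, normal, offset)

# Linear decrease along the viewing direction: d(p + t v, v) = d(p, v) - t.
print(math.isclose(d1, d0 - t))
```

Because the value along a ray is determined by the value at one point on that ray, the function effectively depends on fewer degrees of freedom than a generic function of position and direction, which is the dimensionality reduction the abstract refers to.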
Humans form mental images of 3D scenes to support counterfactual imagination, planning, and motor control. Our abilities to predict the appearance and affordance of the scene from previously unobserved viewpoints aid us in performing manipulation tasks (e.g., 6-DoF kitting) with a level of ease that is currently out of reach for existing robot learning frameworks. In this work, we aim to build artificial systems that can analogously plan actions on top of imagined images. To this end, we introduce Mental Imagery for Robotic Affordances (MIRA), an action reasoning framework that optimizes actions with novel-view synthesis and affordance prediction in the loop. Given a set of 2D RGB images, MIRA builds a consistent 3D scene representation, through which we synthesize novel orthographic views amenable to pixel-wise affordances prediction for action optimization. We illustrate how this optimization process enables us to generalize to unseen out-of-plane rotations for 6-DoF robotic manipulation tasks given a limited number of demonstrations, paving the way toward machines that autonomously learn to understand the world around them for planning actions.