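Convolutional networks are successful because of their equivariance/invariance under translations. However, rotatable data such as images, volumes, shapes, or point clouds require processing with equivariance/invariance under rotations in cases where the rotational orientation of the coordinate system does not affect the meaning of the data (e.g. object classification). On the other hand, estimating or processing rotations is necessary in cases where rotations matter (e.g. motion estimation). Methods and theory have recently progressed on all of these fronts. Here we provide an overview of existing methods for 2D and 3D rotations (and translations), and identify commonalities and links between them.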
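For reference, the two properties discussed throughout can be written as below. The notation is an assumption made here for illustration and is not taken from the survey: $f$ is a network or layer, $G$ a group of transformations, and $\rho_{\mathrm{in}}$, $\rho_{\mathrm{out}}$ describe how a group element $g$ acts on inputs and outputs.

```latex
f\bigl(\rho_{\mathrm{in}}(g)\,x\bigr) = \rho_{\mathrm{out}}(g)\,f(x)
\quad \text{for all } g \in G \quad \text{(equivariance)},
\qquad
f\bigl(\rho_{\mathrm{in}}(g)\,x\bigr) = f(x)
\quad \text{for all } g \in G \quad \text{(invariance)}.
```

Invariance is the special case of equivariance in which $\rho_{\mathrm{out}}(g)$ is the identity for every $g$.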
translated by 谷歌翻译
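Reasoning about 3D objects based on 2D images is challenging because of the large variations in appearance caused by viewing the object from different orientations. Ideally, our model would be invariant or equivariant to changes in object pose. Unfortunately, this is typically not possible with 2D image input, because we do not have an a priori model of how the image changes under out-of-plane object rotations. The only $\mathrm{SO}(3)$-equivariant models that currently exist require point cloud input rather than 2D images. In this paper, we propose a novel model architecture based on icosahedral group convolution that reasons in $\mathrm{SO}(3)$ by projecting the input image onto an icosahedron. As a result of this projection, the model is approximately equivariant to rotations in $\mathrm{SO}(3)$. We apply this model to an object pose estimation task and find that it outperforms reasonable baselines.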
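Convolutional Neural Networks (CNNs) are extremely efficient because they exploit the inherent translation invariance of natural images. However, translation is just one of a myriad of useful spatial transformations. Can the same efficiency be attained when considering other spatial invariances? Such generalized convolutions have been considered in the past, but at a high computational cost. We present a construction that is simple and exact, yet has the same computational complexity as standard convolutions. It consists of a constant image warp followed by a simple convolution, both of which are standard blocks in deep learning toolboxes. With a carefully crafted warp, the resulting architecture can be made equivariant to a wide range of two-parameter spatial transformations. We show encouraging results in realistic scenarios, including the estimation of vehicle poses in the Google Earth dataset (rotation and scale) and of face poses in Annotated Facial Landmarks in the Wild (3D rotations under perspective).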
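A minimal sketch of the warp-then-convolve idea, under stated assumptions: the warp is taken to be a log-polar grid (one possible choice for rotation and scale, not confirmed by the abstract), and an analytic test function `image_fn` stands in for a resampled image so that no interpolation error enters. All function names are hypothetical. On this grid, rotating the image by one angular step becomes a circular shift along the angle axis, so an ordinary correlation on the warped grid is rotation equivariant.

```python
import numpy as np

def log_polar_grid(n_radii=16, n_angles=32, r_min=0.1, r_max=1.0):
    """Sampling points (x, y) on a log-polar grid; axis 0 is log-radius, axis 1 is angle."""
    log_r = np.linspace(np.log(r_min), np.log(r_max), n_radii)
    theta = np.arange(n_angles) * 2 * np.pi / n_angles
    rr, tt = np.meshgrid(np.exp(log_r), theta, indexing="ij")
    return rr * np.cos(tt), rr * np.sin(tt)

def image_fn(x, y):
    """Hypothetical smooth test image, defined analytically."""
    return np.exp(-((x - 0.3) ** 2 + (y - 0.1) ** 2) / 0.05) + 0.5 * x * y

def rotate_coords(x, y, angle):
    """Apply the inverse rotation to coordinates, so sampling there yields the rotated image."""
    c, s = np.cos(angle), np.sin(angle)
    return c * x + s * y, -s * x + c * y

def warped_correlate(warped, kernel):
    """Plain correlation on the warped grid: valid along radius, circular along angle."""
    kr, kt = kernel.shape
    n_r = warped.shape[0] - kr + 1
    out = np.zeros((n_r, warped.shape[1]))
    for a in range(kr):
        for b in range(kt):
            out += kernel[a, b] * np.roll(warped, -b, axis=1)[a:a + n_r]
    return out

x, y = log_polar_grid()
step = 2 * np.pi / x.shape[1]                      # one angular grid step
warped = image_fn(x, y)                            # warp of the original image
warped_rot = image_fn(*rotate_coords(x, y, step))  # warp of the rotated image

kernel = np.random.default_rng(0).normal(size=(3, 3))
out = warped_correlate(warped, kernel)
out_rot = warped_correlate(warped_rot, kernel)
```

Here `warped_rot` equals `np.roll(warped, 1, axis=1)` up to floating-point error, and the same shift relates `out_rot` to `out`. A scaling of the image would analogously shift the grid along the log-radius axis, where the correlation is not circular.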
Translating or rotating an input image should not affect the results of many computer vision tasks. Convolutional neural networks (CNNs) are already translation equivariant: input image translations produce proportionate feature map translations. This is not the case for rotations. Global rotation equivariance is typically sought through data augmentation, but patch-wise equivariance is more difficult. We present Harmonic Networks or H-Nets, a CNN exhibiting equivariance to patch-wise translation and 360° rotation. We achieve this by replacing regular CNN filters with circular harmonics, returning a maximal response and orientation for every receptive field patch. H-Nets use a rich, parameter-efficient and fixed computational complexity representation, and we show that deep feature maps within the network encode complicated rotational invariants. We demonstrate that our layers are general enough to be used in conjunction with the latest architectures and techniques, such as deep supervision and batch normalization. We also achieve state-of-the-art classification on rotated-MNIST, and competitive results on other benchmark challenges.
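The behaviour of a circular harmonic filter under rotation can be illustrated with a small NumPy sketch. This is not the H-Net implementation: the radial profile, the filter size, and the function names are assumptions chosen for illustration, and only an exact 90° rotation (via `np.rot90`) is tested so that no interpolation is involved. The filter has the form $R(r)\,e^{im\phi}$; rotating the patch leaves the magnitude of the complex response unchanged and shifts its phase by a multiple of $m$ times the rotation angle.

```python
import numpy as np

def circular_harmonic_filter(size, m, sigma=1.5):
    """Complex filter R(r) * exp(i*m*phi) on an odd-sized square grid.

    The radial profile r * exp(-r^2 / (2 sigma^2)) is an assumption; it vanishes
    at the centre, where the angle phi is undefined.
    """
    c = size // 2
    y, x = np.mgrid[-c:c + 1, -c:c + 1]   # y indexes rows, x indexes columns
    r = np.hypot(x, y)
    phi = np.arctan2(y, x)
    radial = r * np.exp(-r ** 2 / (2 * sigma ** 2))
    return radial * np.exp(1j * m * phi)

def response(patch, filt):
    """Complex response of one receptive-field patch to the filter."""
    return np.sum(patch * filt)

m = 1
patch = np.random.default_rng(0).normal(size=(7, 7))
filt = circular_harmonic_filter(7, m)

z = response(patch, filt)
z_rot = response(np.rot90(patch), filt)   # same patch, rotated by 90 degrees
```

With these array conventions, `z_rot` equals `z * np.exp(-1j * m * np.pi / 2)`: the magnitude `abs(z)` acts as a rotation-invariant response, while the phase `np.angle(z)` tracks the orientation of the patch.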
Coordinate-based implicit neural networks, or neural fields, have emerged as useful representations of shape and appearance in 3D computer vision. Despite advances however, it remains challenging to build neural fields for categories of objects without datasets like ShapeNet that provide canonicalized object instances that are consistently aligned for their 3D position and orientation (pose). We present Canonical Field Network (CaFi-Net), a self-supervised method to canonicalize the 3D pose of instances from an object category represented as neural fields, specifically neural radiance fields (NeRFs). CaFi-Net directly learns from continuous and noisy radiance fields using a Siamese network architecture that is designed to extract equivariant field features for category-level canonicalization. During inference, our method takes pre-trained neural radiance fields of novel object instances at arbitrary 3D pose, and estimates a canonical field with consistent 3D pose across the entire category. Extensive experiments on a new dataset of 1300 NeRF models across 13 object categories show that our method matches or exceeds the performance of 3D point cloud-based methods.
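Emerging from low-level vision theory, steerable filters found their counterpart in prior work on steerable convolutional neural networks equivariant to rigid transformations. In our work, we propose a steerable feed-forward learning-based approach that consists of neurons with spherical decision surfaces and operates on point clouds. Such spherical neurons are obtained by conformal embedding of Euclidean space and have recently been revisited in the context of learning representations of point sets. Focusing on 3D geometry, we exploit the isometry property of spherical neurons and derive a 3D steerability constraint. After training spherical neurons to classify point clouds in a canonical orientation, we use a tetrahedron basis to quadruplicate the neurons and construct rotation-equivariant spherical filter banks. We then apply the derived constraint to interpolate the filter bank outputs and thus obtain a rotation-invariant network. Finally, we use a synthetic point set and real-world 3D skeleton data to verify our theoretical findings. The code is available at https://github.com/pavlo-melnyk/steerable-3d-neurons.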
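Equivariant neural networks, whose hidden features transform according to representations of a group G acting on the data, exhibit training efficiency and improved generalization performance. In this work, we extend group-invariant and equivariant representation learning to the field of unsupervised deep learning. We propose a general learning strategy based on an encoder-decoder framework in which the latent representation is separated into an invariant term and an equivariant group action component. The key idea is that the network learns to encode and decode data to and from a group-invariant representation by additionally learning to predict the appropriate group action to align input and output pose in order to solve the reconstruction task. We derive the necessary conditions on the equivariant encoder, and we present a construction valid for any G, both discrete and continuous. We describe our construction explicitly for rotations, translations and permutations. We test the validity and robustness of our approach in a variety of experiments with diverse data types and different network architectures.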
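Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a brand new point-set learning framework, PRIN (Point-wise Rotation Invariant Network), which focuses on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by Density Aware Adaptive Sampling to deal with distorted point distributions in spherical space. Spherical Voxel Convolution and Point Re-sampling are proposed to extract rotation-invariant features for each point. In addition, we extend PRIN to a sparse version called SPRIN, which operates directly on sparse point clouds. Both PRIN and SPRIN can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. Results show that, on datasets with randomly rotated point clouds, SPRIN performs better than state-of-the-art methods without any data augmentation. We also provide a thorough theoretical proof and analysis of the point-wise rotation invariance achieved by our methods. Our code is available at https://github.com/qq456cvb/sprin.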
We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without increasing the number of parameters. Group convolution layers are easy to use and can be implemented with negligible computational overhead for discrete groups generated by translations, reflections and rotations. G-CNNs achieve state-of-the-art results on CIFAR10 and rotated MNIST.
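As an illustration of the weight sharing described above, the following NumPy sketch implements a first-layer ("lifting") correlation for the group generated by translations and 90° rotations. It is a simplified sketch, not the authors' implementation; the function names are hypothetical, and reflections, multiple channels, and deeper group-to-group layers are omitted. A single kernel is reused in all four orientations, so the output gains an orientation axis without any extra parameters.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_valid(image, kernel):
    """Plain 2D cross-correlation without padding."""
    windows = sliding_window_view(image, kernel.shape)   # (H', W', k, k)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def p4_lifting_correlation(image, kernel):
    """Correlate with the four 90-degree rotations of one kernel.

    Returns an array of shape (4, H', W'); axis 0 indexes the kernel orientation.
    """
    return np.stack([correlate_valid(image, np.rot90(kernel, r)) for r in range(4)])

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

out = p4_lifting_correlation(image, kernel)
out_rot = p4_lifting_correlation(np.rot90(image), kernel)

# Equivariance: rotating the input rotates every feature plane and
# cyclically shifts the orientation axis by one step.
expected = np.roll(np.rot90(out, axes=(1, 2)), 1, axis=0)
```

In this sketch `out_rot` matches `expected` up to floating-point error, which is the equivariance property: the rotated input produces the same responses, only rearranged in a predictable way.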
Recent progress in geometric computer vision has shown significant advances in reconstruction and novel view rendering from multiple views by capturing the scene as a neural radiance field. Such approaches have changed the paradigm of reconstruction but need a plethora of views and do not make use of object shape priors. On the other hand, deep learning has shown how to use priors in order to infer shape from single images. Such approaches, though, require that the object is reconstructed in a canonical pose or assume that object pose is known during training. In this paper, we address the problem of how to compute equivariant priors for reconstruction from a few images, given the relative poses of the cameras. Our proposed reconstruction is $SE(3)$-gauge equivariant, meaning that it is equivariant to the choice of world frame. To achieve this, we make two novel contributions to light field processing: we define light field convolution and we show how it can be approximated by intra-view $SE(2)$ convolutions because the original light field convolution is computationally and memory-wise intractable; we design a map from the light field to $\mathbb{R}^3$ that is equivariant to the transformation of the world frame and to the rotation of the views. We demonstrate equivariance by obtaining robust results in roto-translated datasets without performing transformation augmentation.
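Many applications require the robustness, or ideally the invariance, of a neural network to certain transformations of the input data. Most commonly, this requirement is addressed by using adversarial training or by defining network architectures that include the desired invariance by design. In this work, we propose a method that makes network architectures provably invariant with respect to group actions by choosing one element from a (possibly continuous) orbit based on a fixed criterion. In a nutshell, we intend to "undo" any possible transformation before feeding the data into the actual network. Moreover, we empirically analyze the properties of different approaches that incorporate invariance via training or architecture, and demonstrate the advantages of our method in terms of robustness and computational efficiency. In particular, we investigate robustness with respect to rotations of images (which can hold only up to discretization artifacts) as well as the provable orientation and scaling invariance of 3D point cloud classification.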
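The idea of picking one orbit element by a fixed criterion can be sketched for a toy case. The example below is an assumption-laden simplification, not the paper's method: it uses 2D instead of 3D point clouds, considers only rotations about the origin and uniform scaling, and uses a hypothetical criterion (the point farthest from the origin is moved to the positive x-axis at distance 1). It assumes that the farthest point is unique.

```python
import numpy as np

def canonicalize(points):
    """Map a 2D point cloud of shape (N, 2) to a canonical element of its orbit.

    Scaling is undone by normalising the largest norm to 1; rotation is undone
    by rotating the farthest point onto the positive x-axis.
    """
    norms = np.linalg.norm(points, axis=1)
    scaled = points / norms.max()
    far = scaled[np.argmax(norms)]
    angle = np.arctan2(far[1], far[0])
    c, s = np.cos(-angle), np.sin(-angle)
    rot = np.array([[c, -s], [s, c]])     # rotation by -angle
    return scaled @ rot.T

rng = np.random.default_rng(0)
points = rng.normal(size=(10, 2))

# Apply an arbitrary rotation, scaling, and reordering of the points.
theta, scale = 1.1, 2.5
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
transformed = (scale * points @ rotation.T)[rng.permutation(10)]

canon_a = canonicalize(points)
canon_b = canonicalize(transformed)
```

Both inputs map to the same canonical point set (up to point order and floating-point error), so any network applied after `canonicalize` is invariant to these transformations by construction.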
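This paper proposes a new point-cloud convolution structure that learns SE(3)-equivariant features. Compared with existing SE(3)-equivariant networks, our design is lightweight, simple, and flexible enough to be incorporated into general point-cloud learning networks. We strike a balance between the complexity and capacity of our model by selecting an unconventional domain for the feature maps. We further reduce the computational load by properly discretizing $\mathbb{R}^3$ to fully leverage the rotational symmetry. Moreover, we employ a permutation layer to recover the full SE(3) group from its quotient space. Experiments show that our method achieves comparable or superior performance in various tasks while consuming much less memory and running faster than existing work. The proposed method can foster the adoption of equivariant feature learning in various practical applications based on point clouds and inspire future developments of equivariant feature learning for real-world applications.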
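The convolutional layers of standard convolutional neural networks (CNNs) are equivariant to translation. However, the convolutional and fully connected layers are not equivariant or invariant to other affine geometric transformations. Recently, a new class of CNNs has been proposed in which the conventional layers are replaced with equivariant convolution, pooling, and batch-normalization layers. The final classification layer in equivariant neural networks is invariant to different affine geometric transformations such as rotation, reflection and translation, and the scalar value is obtained either by eliminating the spatial dimensions of the filter responses using convolution and down-sampling throughout the network, or by averaging over the filter responses. In this work, we propose to integrate orthogonal moments, which give the high-order statistics of the function, as an effective means of encoding global invariance with respect to rotation, reflection and translation in fully connected layers. As a result, the intermediate layers of the network become equivariant while the classification layer becomes invariant. The most widely used Zernike, pseudo-Zernike and orthogonal Fourier-Mellin moments are considered for this purpose. The effectiveness of the proposed work is evaluated by integrating the invariant transition and fully connected layers into the architecture of group-equivariant CNNs (G-CNNs) on the rotated MNIST and CIFAR10 datasets.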
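A common approach to defining convolutions on meshes is to interpret them as graphs and apply graph convolutional networks (GCNs). Such GCNs use isotropic kernels and are therefore insensitive to the relative orientation of vertices, and thus to the geometry of the mesh as a whole. We propose Gauge Equivariant Mesh CNNs, which generalize GCNs to apply anisotropic gauge-equivariant kernels. Since the resulting features carry orientation information, we introduce a geometric message passing scheme defined by parallel transporting features over mesh edges. Our experiments validate the significantly improved expressivity of the proposed model over conventional GCNs and other methods.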
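Incorporating symmetries can lead to highly data-efficient and generalizable models by defining equivalence classes of data samples related by transformations. However, characterizing how transformations act on input data is often difficult, which limits the applicability of equivariant models. We propose learning symmetric embedding networks (SENs) that encode an input space (e.g. images), where we do not know the effect of transformations (e.g. rotations), into a feature space that transforms in a known manner under these operations. This network can be trained end-to-end with an equivariant task network to learn an explicitly symmetric representation. We validate this approach in the context of equivariant transition models with 3 distinct forms of symmetry. Our experiments demonstrate that SENs facilitate the application of equivariant networks to data with complex symmetry representations. Moreover, doing so can yield improvements in accuracy and generalization relative to both fully equivariant and non-equivariant baselines.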
A wide range of techniques have been proposed in recent years for designing neural networks for 3D data that are equivariant under rotation and translation of the input. Most approaches for equivariance under the Euclidean group $\mathrm{SE}(3)$ of rotations and translations fall within one of the two major categories. The first category consists of methods that use $\mathrm{SE}(3)$-convolution which generalizes classical $\mathbb{R}^3$-convolution on signals over $\mathrm{SE}(3)$. Alternatively, it is possible to use \textit{steerable convolution} which achieves $\mathrm{SE}(3)$-equivariance by imposing constraints on $\mathbb{R}^3$-convolution of tensor fields. It is known by specialists in the field that the two approaches are equivalent, with steerable convolution being the Fourier transform of $\mathrm{SE}(3)$ convolution. Unfortunately, these results are not widely known and moreover the exact relations between deep learning architectures built upon these two approaches have not been precisely described in the literature on equivariant deep learning. In this work we provide an in-depth analysis of both methods and their equivalence and relate the two constructions to multiview convolutional networks. Furthermore, we provide theoretical justifications of separability of $\mathrm{SE}(3)$ group convolution, which explain the applicability and success of some recent approaches. Finally, we express different methods using a single coherent formalism and provide explicit formulas that relate the kernels learned by different methods. In this way, our work helps to unify different previously-proposed techniques for achieving roto-translational equivariance, and helps to shed light on both the utility and precise differences between various alternatives. We also derive new TFN non-linearities from our equivalence principle and test them on practical benchmark datasets.
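In this work, we investigate how to achieve equivariance to input transformations in deep networks purely from data, without being given a model of those transformations. Convolutional neural networks (CNNs), for example, are equivariant to image translation, a transformation that can be easily modelled (by shifting the pixels vertically or horizontally). Other transformations, such as out-of-plane rotations, do not admit a simple analytic model. We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously, such as translation, rotation, colour changes, and many others. This means that it can take an input image and produce versions transformed by a given amount that were not observed before (e.g. a different viewpoint of the same object, or a colour variation). Despite extending to many (even non-geometric) transformations, our model reduces exactly to a CNN in the special case of translation equivariance. Equivariances are important for the interpretability and robustness of deep networks, and we demonstrate successful re-rendering of transformed versions of input images on several synthetic and real datasets, as well as results on object pose estimation.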
Equivariance of neural networks to transformations helps to improve their performance and reduce generalization error in computer vision tasks, as they apply to datasets presenting symmetries (e.g. scalings, rotations, translations). The method of moving frames is classical for deriving operators invariant to the action of a Lie group in a manifold. Recently, a rotation and translation equivariant neural network for image data was proposed based on the moving frames approach. In this paper we significantly improve that approach by reducing the computation of moving frames to only one, at the input stage, instead of repeated computations at each layer. The equivariance of the resulting architecture is proved theoretically and we build a rotation and translation equivariant neural network to process volumes, i.e. signals on the 3D space. Our trained model outperforms the benchmarks in medical volume classification on most of the tested datasets from MedMNIST3D.
The principle of equivariance to symmetry transformations enables a theoretically grounded approach to neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. Here we show how this principle can be extended beyond global symmetries to local gauge transformations. This enables the development of a very general class of convolutional neural networks on manifolds that depend only on the intrinsic geometry, and which includes many popular methods from equivariant and geometric deep learning. We implement gauge equivariant CNNs for signals defined on the surface of the icosahedron, which provides a reasonable approximation of the sphere. By choosing to work with this very regular manifold, we are able to implement the gauge equivariant convolution using a single conv2d call, making it a highly scalable and practical alternative to Spherical CNNs. Using this method, we demonstrate substantial improvements over previous methods on the task of segmenting omnidirectional images and global climate patterns.
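Including covariant information, such as position, force, velocity or spin, is important in many tasks in computational physics and chemistry. We introduce Steerable E(3) Equivariant Graph Neural Networks (SEGNNs), which generalize equivariant graph networks such that node and edge attributes are not restricted to invariant scalars, but can contain covariant information such as vectors or tensors. The model, composed of steerable MLPs, is able to incorporate geometric and physical information in both the message and update functions. Through the definition of steerable node attributes, the MLPs provide a new class of activation functions for general use with steerable feature fields. We discuss our own and related work through the lens of equivariant non-linear convolutions, which further allows us to pinpoint the successful components of SEGNNs: non-linear message aggregation improves upon classic linear (steerable) point convolutions, and steerable messages improve upon recent equivariant graph networks that send invariant messages. We demonstrate the effectiveness of our method on several tasks in computational physics and chemistry and provide extensive ablation studies.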