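State-of-the-art 2D image compression schemes rely on the power of convolutional neural networks (CNNs). Although CNNs offer promising perspectives for 2D image compression, extending such models to omnidirectional images is not straightforward. First, omnidirectional images have specific spatial and statistical properties that cannot be fully captured by current CNN models. Second, the basic mathematical operations that make up a CNN architecture, such as translation and sampling, are not well defined on the sphere. In this paper, we study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images. In particular, we: i) propose the definition of a new convolution operation on the sphere that keeps the high expressiveness and low complexity of a classical 2D convolution; ii) adapt standard CNN techniques such as stride, iterative aggregation, and pixel shuffling to the spherical domain; and then iii) apply our new framework to the task of omnidirectional image compression. Our experiments show that our proposed on-the-sphere solution leads to better compression gains, saving 13.7% of the bit rate compared with similar learned models applied to equirectangular images. Also, compared with learning models based on graph convolutional networks, our solution supports more expressive filters that preserve high frequencies and provide better perceptual quality of the compressed images. These results demonstrate the efficiency of the proposed framework, which opens new research avenues for other omnidirectional vision tasks to be effectively implemented on the sphere manifold.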
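The abstract relies on HEALPix's hierarchical, equal-area sampling to make stride and pooling cheap on the sphere. The sketch below illustrates why, assuming the NESTED pixel ordering. That ordering is a standard HEALPix convention, but the abstract does not say which ordering the authors use.

Two properties do the work. Every pixel has the same area. The four children of pixel `p` are `4p` to `4p+3`, so stride-2 pooling reduces to a mean over consecutive groups of four.

The function names are hypothetical. The unpooling shown is plain nearest-neighbour copying, not the authors' pixel-shuffling layer.

```python
import math


def healpix_npix(nside: int) -> int:
    """Pixel count of a HEALPix grid: 12 base pixels, each split into nside x nside."""
    return 12 * nside * nside


def healpix_pixel_area(nside: int) -> float:
    """Every HEALPix pixel covers the same solid angle: 4*pi / npix steradians."""
    return 4.0 * math.pi / healpix_npix(nside)


def pool_nested(values: list[float]) -> list[float]:
    """Average-pool a NESTED-ordered map from resolution nside to nside // 2.

    In NESTED ordering the four children of parent pixel p are 4p, 4p+1, 4p+2, 4p+3,
    so a stride-2 pooling step is just a mean over consecutive groups of four.
    """
    if len(values) % 4 != 0:
        raise ValueError("map length must be divisible by 4")
    return [sum(values[i:i + 4]) / 4.0 for i in range(0, len(values), 4)]


def unpool_nested(values: list[float]) -> list[float]:
    """Nearest-neighbour upsampling: copy each parent value to its four children."""
    return [v for v in values for _ in range(4)]
```

Because parents and children are contiguous in memory, both directions are simple reshapes. This is what makes strided operations on HEALPix nearly as cheap as on a 2D grid.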
translated by 谷歌翻译
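Although the equirectangular projection (ERP) is a convenient form for storing omnidirectional images (also known as 360-degree images), it is neither equal-area nor conformal, and thus not friendly to subsequent visual communication. In the context of image compression, ERP over-samples and deforms content near the poles, making perceptually optimal bit allocation difficult to achieve. In conventional 360-degree image compression, techniques such as region-wise packing and tiled representation have been introduced to alleviate the over-sampling problem, with limited success. In this paper, we make one of the first attempts to learn deep neural networks for omnidirectional image compression. We first describe a parametric pseudocylindrical representation as a generalization of common pseudocylindrical map projections. A computationally tractable greedy method is presented to determine the (sub-)optimal configuration of the pseudocylindrical representation with respect to a novel proxy objective for rate-distortion performance. We then propose pseudocylindrical convolutions for 360-degree image compression. Under reasonable constraints on the parametric representation, the pseudocylindrical convolution can be efficiently implemented by standard convolution with so-called pseudocylindrical padding. To demonstrate the feasibility of our idea, we implement an end-to-end 360-degree image compression system consisting of the learned pseudocylindrical representation, an analysis transform, a non-uniform quantizer, a synthesis transform, and an entropy model. Experimental results on 19,790 omnidirectional images show that our method consistently achieves better rate-distortion performance than the competing methods. Moreover, the visual quality of our method is significantly improved for all images at all bitrates.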
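The sketch below makes the over-sampling argument concrete. It computes three things:

- how much each ERP row is over-sampled;
- a hypothetical row-width schedule that shrinks with cos(latitude), in the spirit of a sinusoidal pseudocylindrical projection;
- the circular (longitude) wrap that padding needs at the left and right seams.

It is an illustration under these assumptions, not the paper's learned representation. The paper's pseudocylindrical padding must also fetch neighbours from adjacent rows of different width, which is omitted here. All function names are hypothetical.

```python
import math


def row_latitude(i: int, height: int) -> float:
    """Latitude (radians) at the centre of row i of an equirectangular (ERP) image."""
    return math.pi / 2 - (i + 0.5) * math.pi / height


def erp_oversampling(height: int) -> list[float]:
    """Per-row oversampling factor of ERP.

    A circle of latitude phi has length proportional to cos(phi), but ERP stores
    every row with the full image width, so row i is oversampled by ~1 / cos(phi_i).
    """
    return [1.0 / math.cos(row_latitude(i, height)) for i in range(height)]


def pseudocylindrical_widths(height: int, width: int) -> list[int]:
    """Hypothetical sinusoidal-style layout: row width shrinks with cos(latitude)."""
    return [max(1, round(width * math.cos(row_latitude(i, height)))) for i in range(height)]


def circular_pad(row: list[float], pad: int) -> list[float]:
    """Wrap a row around in longitude so a 1D kernel at the seam sees its true neighbours."""
    if not 0 <= pad <= len(row):
        raise ValueError("pad must be between 0 and the row length")
    if pad == 0:
        return list(row)
    return row[-pad:] + row + row[:pad]
```

For a 4-row, 100-column image, the schedule keeps about 38 samples in the polar rows and 92 in the equatorial rows. Full-width ERP stores 100 in every row.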
The principle of equivariance to symmetry transformations enables a theoretically grounded approach to neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. Here we show how this principle can be extended beyond global symmetries to local gauge transformations. This enables the development of a very general class of convolutional neural networks on manifolds that depend only on the intrinsic geometry, and which includes many popular methods from equivariant and geometric deep learning. We implement gauge equivariant CNNs for signals defined on the surface of the icosahedron, which provides a reasonable approximation of the sphere. By choosing to work with this very regular manifold, we are able to implement the gauge equivariant convolution using a single conv2d call, making it a highly scalable and practical alternative to Spherical CNNs. Using this method, we demonstrate substantial improvements over previous methods on the task of segmenting omnidirectional images and global climate patterns.
Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques. In particular, we would like to use deep neural networks, which have recently proven to be powerful tools for a broad range of problems from computer vision, natural language processing, and audio analysis. However, these tools have been most successful on data with an underlying Euclidean or grid-like structure, and in cases where the invariances of these structures are built into networks used to model them. Geometric deep learning is an umbrella term for emerging techniques attempting to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds. The purpose of this paper is to overview different examples of geometric deep learning problems and present available solutions, key difficulties, applications, and future research directions in this nascent field.
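We present native implementations of data structures and algorithms for discrete convolution operators on the Adaptive Particle Representation (APR) of images on parallel computer architectures. The APR is a content-adaptive image representation that locally adapts the sampling resolution to the image signal. It was developed as an alternative to pixel representations for large, sparse images, as they typically occur in fluorescence microscopy, and has been shown to reduce the memory and runtime costs of storing, visualizing, and processing such images. This, however, requires that image processing operate natively on APRs, without intermediately reverting to pixels. Designing efficient and scalable APR-native image processing primitives is complicated by the APR's irregular memory structure. Here, we provide the algorithmic building blocks required to process APR images efficiently and natively using a wide range of algorithms that can be formulated in terms of discrete convolutions. We show that APR convolution naturally leads to scale-adaptive algorithms that parallelize efficiently on multi-core CPU and GPU architectures. We quantify the speedups in comparison with pixel-based algorithms and with convolutions on evenly sampled data. We achieve pixel-equivalent throughputs of up to 1 TB/s on a single NVIDIA GeForce RTX 2080 gaming GPU, while requiring up to two orders of magnitude less memory than pixel-based implementations.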
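As an illustration of what "scale-adaptive" convolution can mean, here is a hypothetical 1D analogue. The signal is stored as contiguous particles of different sizes. A 3-tap stencil is applied to each particle with a spacing equal to that particle's size, reading neighbours directly from the particle list and never building a pixel array.

The `Particle` layout and the function names are assumptions made for this sketch. They are not the data structures of the APR implementation described above, which works in 3D and in parallel.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Particle:
    start: int    # first pixel covered by this particle
    size: int     # pixels covered; flat regions get larger (coarser) particles
    value: float  # intensity carried by the particle


def value_at(particles: list[Particle], starts: list[int], x: int) -> float:
    """Intensity of the particle covering pixel x, clamping x to the image domain."""
    last = particles[-1].start + particles[-1].size - 1
    x = min(max(x, 0), last)
    return particles[bisect.bisect_right(starts, x) - 1].value


def adaptive_convolve_1d(particles: list[Particle],
                         kernel: tuple[float, float, float]) -> list[float]:
    """Apply a 3-tap stencil to every particle at that particle's own resolution.

    Assumes particles are sorted, contiguous and non-overlapping. The stencil
    spacing equals the particle size, so fine particles see a fine stencil and
    coarse particles a coarse one; no pixel image is ever reconstructed.
    """
    starts = [p.start for p in particles]
    out = []
    for p in particles:
        centre = p.start + p.size // 2
        left = value_at(particles, starts, centre - p.size)
        right = value_at(particles, starts, centre + p.size)
        out.append(kernel[0] * left + kernel[1] * p.value + kernel[2] * right)
    return out
```

The work scales with the number of particles, not the number of pixels. For sparse images, the particle count is the smaller of the two.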
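We introduce ChebLieNet, a group-equivariant method on (anisotropic) manifolds. Building on the success of graph- and group-based neural networks, we take advantage of recent developments in geometric deep learning to derive a new approach for exploiting any anisotropies in data. Via discrete approximations of Lie groups, we develop graph neural networks made of anisotropic convolutional layers (Chebyshev convolutions), spatial pooling and unpooling layers, and global pooling layers. Group equivariance is achieved through equivariant and invariant operators on graphs whose edges encode affinities based on anisotropic left-invariant Riemannian distances. Thanks to its simple form, the Riemannian metric can model any anisotropy, in both the spatial and the orientation domains. This control over the anisotropy of the Riemannian metric makes it possible to balance the equivariance (anisotropic metric) against the invariance (isotropic metric) of the graph convolution layers, which opens the door to a better understanding of anisotropic properties. Furthermore, we empirically demonstrate the existence of (data-dependent) sweet spots for the anisotropic parameters on CIFAR10. This key result is evidence of the benefit of exploiting anisotropic properties in data. We also evaluate the scalability of the approach on STL10 (image data) and ClimateNet (spherical data), showing remarkable adaptability to diverse tasks.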
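The Chebyshev convolution mentioned above filters a graph signal with a polynomial in a rescaled Laplacian. The polynomial is evaluated through the three-term recurrence T_k = 2 L T_{k-1} - T_{k-2}.

The sketch below pairs it with edge weights from a Gaussian of an anisotropic quadratic distance d^2 = delta^T M delta. Stretching the metric M along one axis weakens edges in that direction.

Several simplifications apply:

- It works on plain 2D points rather than on the paper's discretized Lie groups.
- It assumes lambda_max = 2 for the Laplacian rescaling.
- All names are hypothetical.

```python
import math


def anisotropic_affinities(points, metric, sigma):
    """Gaussian edge weights exp(-d^2 / sigma), with d^2 = delta^T M delta for a 2x2 metric M."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            d2 = (metric[0][0] * dx * dx + (metric[0][1] + metric[1][0]) * dx * dy
                  + metric[1][1] * dy * dy)
            W[i][j] = math.exp(-d2 / sigma)
    return W


def rescaled_laplacian(W):
    """L~ = L_norm - I = -D^{-1/2} W D^{-1/2}, i.e. assuming lambda_max = 2."""
    deg = [sum(row) for row in W]
    n = len(W)
    return [[-W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def cheb_conv(L, x, theta):
    """y = sum_k theta_k T_k(L) x, with T_0 x = x, T_1 x = L x, T_k x = 2 L T_{k-1} x - T_{k-2} x."""
    t_prev, t_cur = list(x), matvec(L, x)
    y = [theta[0] * v for v in t_prev]
    if len(theta) > 1:
        y = [a + theta[1] * b for a, b in zip(y, t_cur)]
    for k in range(2, len(theta)):
        t_next = [2 * a - b for a, b in zip(matvec(L, t_cur), t_prev)]
        y = [a + theta[k] * b for a, b in zip(y, t_next)]
        t_prev, t_cur = t_cur, t_next
    return y
```

With an isotropic metric, horizontal and vertical neighbours get equal weights. Increasing the vertical entry of M suppresses vertical edges, which is the kind of anisotropy knob the abstract describes.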
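Inter prediction is one of the key technologies enabling the high compression efficiency of modern video coding standards. Before coding, 360-degree video must be mapped to the 2D image plane so that it can be compressed with existing video coding standards. However, the distortions that inevitably occur when mapping spherical data onto the 2D image plane impair the performance of classical inter prediction techniques. In this paper, we propose a motion-plane-adaptive inter prediction technique (MPA) for 360-degree video that takes the spherical characteristics of 360-degree video into account. Based on the known projection format of the video, MPA performs inter prediction on different motion planes in 3D space instead of working directly on the 2D image representation, whose mapping is in theory arbitrary. We further derive a motion-plane-adaptive motion vector prediction technique (MPA-MVP) that translates motion information between different motion planes and motion models. Integrating MPA together with MPA-MVP into the state-of-the-art H.266/VVC video coding standard yields average Bjontegaard delta rate savings of 1.72% (peak 3.97%) based on PSNR and 1.56% (peak 3.40%) based on WS-PSNR, compared with the VTM-14.2 baseline.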
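A rough sketch of the motion-plane idea, under assumptions of our own. The steps are:

1. Lift an ERP pixel to a unit direction.
2. Rotate it into the frame of a chosen plane. The plane is parametrised here only by a yaw angle, which is hypothetical.
3. Perspective-project it onto the plane z' = 1.
4. Shift it there by a purely translational motion vector.
5. Map it back to ERP.

It is meant to show why a translation on a 3D plane becomes a non-uniform displacement in the ERP image. It does not reproduce the MPA or MPA-MVP implementation in VTM.

```python
import math


def erp_to_sphere(u, v, width, height):
    """Unit direction (x, y, z) of ERP pixel (u, v); z points to longitude 0, y to the north pole."""
    lon = (u + 0.5) / width * 2 * math.pi - math.pi
    lat = math.pi / 2 - (v + 0.5) / height * math.pi
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))


def sphere_to_erp(p, width, height):
    """Inverse of erp_to_sphere for any non-zero 3D vector p."""
    x, y, z = p
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y / math.sqrt(x * x + y * y + z * z))))
    return ((lon + math.pi) / (2 * math.pi) * width - 0.5,
            (math.pi / 2 - lat) / math.pi * height - 0.5)


def rotate_yaw(p, yaw):
    """Rotate p about the vertical (y) axis by yaw radians."""
    x, y, z = p
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z, y, -s * x + c * z)


def motion_compensate(u, v, width, height, mv, yaw=0.0):
    """Displace ERP pixel (u, v) by a translation mv = (dx, dy) on the plane z' = 1
    of a frame rotated by yaw (hypothetical parametrisation of a 'motion plane')."""
    p = rotate_yaw(erp_to_sphere(u, v, width, height), -yaw)  # into the plane's frame
    if p[2] <= 0:
        raise ValueError("pixel lies behind the chosen motion plane")
    q = (p[0] / p[2] + mv[0], p[1] / p[2] + mv[1], 1.0)      # project, then translate in-plane
    return sphere_to_erp(rotate_yaw(q, yaw), width, height)  # back to the sphere, then to ERP
```

The same in-plane vector `mv` moves pixels by different amounts depending on where they sit in the ERP image. This is the distortion a plain 2D block-translation model cannot capture.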
Point clouds are characterized by irregularity and unstructuredness, which pose challenges in efficient data exploitation and discriminative feature extraction. In this paper, we present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology as a completely regular 2D point geometry image (PGI) structure, in which coordinates of spatial points are captured in colors of image pixels. Intuitively, Flattening-Net implicitly approximates a locally smooth 3D-to-2D surface flattening process while effectively preserving neighborhood consistency. As a generic representation modality, PGI inherently encodes the intrinsic property of the underlying manifold structure and facilitates surface-style point feature aggregation. To demonstrate its potential, we construct a unified learning framework directly operating on PGIs to achieve diverse types of high-level and low-level downstream applications driven by specific task networks, including classification, segmentation, reconstruction, and upsampling. Extensive experiments demonstrate that our methods perform favorably against the current state-of-the-art competitors. We will make the code and data publicly available at https://github.com/keeganhk/Flattening-Net.
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has been thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks: 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
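A common approach to defining convolutions on meshes is to interpret them as graphs and apply graph convolutional networks (GCNs). Such GCNs use isotropic kernels and are therefore insensitive to the relative orientation of vertices, and thus to the geometry of the mesh as a whole. We propose Gauge Equivariant Mesh CNNs, which generalize GCNs to apply anisotropic gauge-equivariant kernels. Since the resulting features carry orientation information, we introduce a geometric message passing scheme defined by parallel transporting features along mesh edges. Our experiments validate the significantly improved expressivity of the proposed model over conventional GCNs and other methods.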
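Parallel transport can be illustrated with 2D tangent vectors. Before a neighbour's feature is aggregated, it is rotated by the angle relating the neighbour's tangent frame to the receiving vertex's frame.

The sketch below shows only this message-passing step. It uses hypothetical names, an identity kernel, and scalar angles standing in for the transporters.

The test checks the gauge-equivariance property. Suppose every vertex frame is rotated by some angle alpha_p and the transport angles are updated consistently. Then the new output at p is the old output rotated by -alpha_p.

```python
import math


def rotate(vec, angle):
    """Rotate a 2D tangent vector by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * vec[0] - s * vec[1], s * vec[0] + c * vec[1])


def transport_and_aggregate(features, neighbours, transport):
    """Sum neighbour vector features after parallel transporting them into each vertex's gauge.

    features[q]       -- 2D vector feature at vertex q, expressed in q's tangent frame
    neighbours[p]     -- vertices adjacent to p
    transport[(q, p)] -- angle that maps q's frame to p's frame along the edge q -> p
    """
    out = {}
    for p, nbrs in neighbours.items():
        acc_x, acc_y = 0.0, 0.0
        for q in nbrs:
            tx, ty = rotate(features[q], transport[(q, p)])
            acc_x, acc_y = acc_x + tx, acc_y + ty
        out[p] = (acc_x, acc_y)
    return out
```

An isotropic GCN that only sums vector norms would discard this orientation information. Transporting first keeps it, in a way that does not depend on the arbitrary choice of frame at each vertex.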
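360° imaging has recently attracted much attention; however, its angular resolution is relatively lower than that of narrow field-of-view (FOV) perspective images, because it is captured with fisheye lenses on a sensor of the same size. It is therefore beneficial to super-resolve 360° images. Some attempts have been made, but most consider the equirectangular projection (ERP) as the way to represent 360° images, despite its latitude-dependent distortions. In that case, since the output high-resolution (HR) image is always in the same ERP format as the low-resolution (LR) input, further information loss may occur when the HR image is converted to other projection types. In this paper, we propose SphereSR, a novel framework that generates a continuous spherical image representation from an LR 360° image, aiming to predict the RGB values at given spherical coordinates for super-resolution under an arbitrary 360° image projection. Specifically, we first propose a feature extraction module that represents spherical data based on an icosahedron and efficiently extracts features on the spherical surface. We then propose a spherical local implicit image function (SLIIF) to predict RGB values at spherical coordinates. In this way, SphereSR flexibly reconstructs HR images under arbitrary projection types. Experiments on various benchmark datasets show that our method significantly outperforms existing methods.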
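SphereSR's feature extractor is built on an icosahedron. The sketch below is an illustration, not the paper's code. It does three things:

- builds the 12 icosahedron vertices;
- gives the vertex, edge, and face counts after repeated 1-to-4 subdivision (faces multiply by 4, and V follows from Euler's formula V - E + F = 2);
- performs the kind of lookup a local implicit function starts from: finding the mesh vertex nearest to an arbitrary spherical coordinate.

The learned decoder that SLIIF applies to features and relative coordinates is deliberately left out. The function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def icosahedron_vertices():
    """The 12 vertices (0, +-1, +-PHI), (+-1, +-PHI, 0), (+-PHI, 0, +-1), scaled onto the unit sphere."""
    raw = []
    for a in (-1.0, 1.0):
        for b in (-PHI, PHI):
            raw += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    norm = math.sqrt(1 + PHI * PHI)
    return [(x / norm, y / norm, z / norm) for x, y, z in raw]


def icosphere_counts(level):
    """(vertices, edges, faces) after `level` rounds of 1-to-4 triangle subdivision."""
    faces = 20 * 4 ** level
    edges = 30 * 4 ** level
    return edges - faces + 2, edges, faces  # V from Euler's formula V - E + F = 2


def to_unit_vector(lat, lon):
    """Spherical coordinates (radians) to a unit vector, with z pointing to the north pole."""
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def nearest_vertex(lat, lon, vertices):
    """Index of, and angular distance to, the vertex nearest to a query coordinate.

    A local implicit function would feed this vertex's feature plus the relative
    offset to a learned decoder; here we only return the lookup.
    """
    q = to_unit_vector(lat, lon)
    dots = [sum(a * b for a, b in zip(q, v)) for v in vertices]
    i = max(range(len(vertices)), key=dots.__getitem__)
    return i, math.acos(max(-1.0, min(1.0, dots[i])))
```

Because the query is a continuous (lat, lon) pair rather than a pixel index, the same lookup serves any output projection. This is the property the abstract emphasises.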
Graph classification is an important area in both modern research and industry. Multiple applications, especially in chemistry and novel drug discovery, encourage rapid development of machine learning models in this area. To keep up with the pace of new research, proper experimental design, fair evaluation, and independent benchmarks are essential. Design of strong baselines is an indispensable element of such works. In this thesis, we explore multiple approaches to graph classification. We focus on Graph Neural Networks (GNNs), which emerged as a de facto standard deep learning technique for graph representation learning. Classical approaches, such as graph descriptors and molecular fingerprints, are also addressed. We design a fair experimental evaluation protocol and choose a proper collection of datasets. This allows us to perform numerous experiments and rigorously analyze modern approaches. We arrive at many conclusions, which shed new light on the performance and quality of novel algorithms. We investigate the application of the Jumping Knowledge GNN architecture to graph classification, which proves to be an efficient tool for improving base graph neural network architectures. Multiple improvements to baseline models are also proposed and experimentally verified, which constitutes an important contribution to the field of fair model comparison.
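Jumping Knowledge networks combine the node representations produced by every GNN layer instead of keeping only the last one. The sketch below shows two common layer-aggregation choices, concatenation and element-wise max. A sum readout then turns node features into a graph-level vector for classification.

The abstract does not state which aggregator the thesis uses, so treat both as assumptions. The function names are hypothetical.

```python
def jk_concat(layer_outputs):
    """Jumping Knowledge 'concat': join each node's representations from all layers.

    layer_outputs[l][v] is the feature vector of node v after layer l.
    """
    n = len(layer_outputs[0])
    return [[x for layer in layer_outputs for x in layer[v]] for v in range(n)]


def jk_max(layer_outputs):
    """Jumping Knowledge 'max': element-wise max over layers (all layers share one width)."""
    n = len(layer_outputs[0])
    return [[max(vals) for vals in zip(*(layer[v] for layer in layer_outputs))]
            for v in range(n)]


def sum_readout(node_features):
    """Graph-level embedding for classification: sum node features dimension-wise."""
    return [sum(col) for col in zip(*node_features)]
```

Concatenation keeps every layer's information at the cost of a wider vector. Max-pooling keeps the width fixed and lets each feature pick the most informative depth.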
Pre-publication draft of a book to be published by Morgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
Physically based rendering of complex scenes can be prohibitively costly with a potentially unbounded and uneven distribution of complexity across the rendered image. The goal of an ideal level of detail (LoD) method is to make rendering costs independent of the 3D scene complexity, while preserving the appearance of the scene. However, current prefiltering LoD methods are limited in the appearances they can support due to their reliance on approximate models and other heuristics. We propose the first comprehensive multi-scale LoD framework for prefiltering 3D environments with complex geometry and materials (e.g., the Disney BRDF), while maintaining the appearance with respect to the ray-traced reference. Using a multi-scale hierarchy of the scene, we perform a data-driven prefiltering step to obtain an appearance phase function and directional coverage mask at each scale. At the heart of our approach is a novel neural representation that encodes this information into a compact latent form that is easy to decode inside a physically based renderer. Once a scene is baked out, our method requires no original geometry, materials, or textures at render time. We demonstrate that our approach compares favorably to state-of-the-art prefiltering methods and achieves considerable savings in memory for complex scenes.
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
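Convolutional neural networks (CNNs) have made great breakthroughs in 2D computer vision. However, the irregular structure of meshes makes it hard to exploit the potential of CNNs directly on them. A subdivision surface provides a hierarchical multi-resolution structure in which each face of a closed 2-manifold triangle mesh is adjacent to exactly three faces. Motivated by these two observations, this paper presents SubdivNet, an innovative and versatile CNN framework for 3D triangle meshes with Loop subdivision sequence connectivity. Drawing an analogy between mesh faces and pixels in a 2D image allows us to present a mesh convolution operator that aggregates local features from nearby faces. By exploiting face neighborhoods, this convolution can support standard 2D convolutional network concepts such as variable kernel size, stride, and dilation. Based on the multi-resolution hierarchy, we use pooling layers that uniformly merge four faces into one and an upsampling method that splits one face into four. As a result, many popular 2D CNN architectures can be easily adapted to process 3D meshes. Meshes with arbitrary connectivity can be remeshed to have Loop subdivision sequence connectivity via self-parameterization, making SubdivNet a general approach. Extensive evaluations and various applications demonstrate SubdivNet's effectiveness and efficiency.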
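The sketch below makes the face-as-pixel analogy concrete. It first derives face adjacency from a triangle list: faces that share an edge are neighbours, so each face of a closed 2-manifold triangle mesh has exactly three.

It then applies an order-invariant 1-ring convolution. A face's three neighbours have no canonical order, so the hypothetical kernel w0*f_i + w1*sum(f_j) + w2*sum|f_i - f_j| uses only symmetric functions of the neighbours. It is a simplified stand-in, not necessarily SubdivNet's exact operator.

```python
from collections import defaultdict


def face_adjacency(faces):
    """For each triangle, list the faces that share one of its edges."""
    edge_faces = defaultdict(list)
    for f, (a, b, c) in enumerate(faces):
        for e in ((a, b), (b, c), (c, a)):
            edge_faces[tuple(sorted(e))].append(f)
    adj = [[] for _ in faces]
    for fs in edge_faces.values():
        if len(fs) == 2:  # interior edge of a manifold mesh: exactly two incident faces
            adj[fs[0]].append(fs[1])
            adj[fs[1]].append(fs[0])
    return adj


def face_conv(features, adj, w):
    """Order-invariant 1-ring conv on scalar face features (hypothetical kernel)."""
    out = []
    for i, nbrs in enumerate(adj):
        fi = features[i]
        out.append(w[0] * fi
                   + w[1] * sum(features[j] for j in nbrs)
                   + w[2] * sum(abs(fi - features[j]) for j in nbrs))
    return out
```

Under Loop subdivision, each face splits into four children. Pooling can then merge those four children back into their parent, much as 2x2 pooling does on an image grid.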
We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation. This work was done during an internship at NVIDIA. 35th Conference on Neural Information Processing Systems (NeurIPS 2021).
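Aliasing in miniature: when a signal contains frequencies above the Nyquist limit fs/2 of the grid it is sampled on, those frequencies fold back and become indistinguishable from lower ones. The sketch below uses hypothetical helper names. It shows a 7 Hz sine sampled at 10 Hz producing exactly the samples of a 3 Hz sine with flipped sign.

This is the generic phenomenon the abstract attributes to careless signal processing in the generator. It is not the paper's filtering scheme.

```python
import math


def alias_frequency(f, fs):
    """Apparent frequency of a real sinusoid of frequency f after sampling at rate fs.

    Frequencies above the Nyquist limit fs / 2 fold back into [0, fs / 2].
    """
    return abs(f - fs * round(f / fs))


def sample_sine(f, fs, n):
    """First n samples of sin(2*pi*f*t) taken at rate fs."""
    return [math.sin(2 * math.pi * f * k / fs) for k in range(n)]
```

Once folded, the spurious low frequency cannot be removed by later layers, because it is sample-for-sample identical to a genuine one. Hence the emphasis on filtering before it arises.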
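Existing spherical convolutional neural network (CNN) frameworks are either computationally scalable or rotationally equivariant. Continuous approaches capture rotational equivariance but are often prohibitively expensive computationally. Discrete approaches offer more favorable computational performance, but at the cost of equivariance. We develop a hybrid discrete-continuous (DISCO) group convolution that is simultaneously equivariant and computationally scalable to high resolution. While our framework can be applied to any compact group, we specialize to the sphere. Our DISCO spherical convolutions exhibit not only $\text{SO}(3)$ rotational equivariance but also a form of asymptotic $\text{SO}(3)/\text{SO}(2)$ rotational equivariance, which is desirable for many applications (where $\text{SO}(n)$ is the special orthogonal group representing rotations in $n$ dimensions). Through a sparse tensor implementation, we achieve linear scaling in the number of pixels on the sphere for both computational cost and memory usage. For 4K spherical images, we realize savings of $10^9$ in computational cost and $10^4$ in memory usage compared with the most efficient alternative equivariant spherical convolution. We apply the DISCO spherical CNN framework to a number of benchmark dense-prediction problems on the sphere, such as semantic segmentation and depth estimation, and achieve state-of-the-art performance on all of them.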
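One way to read "discrete-continuous with a sparse tensor implementation" is as follows. Evaluate a continuous, compactly supported kernel at the exact angular distances between samples, weight by quadrature, and store only the nonzero entries. Applying the convolution then costs time linear in the number of stored entries.

The sketch makes several assumptions and simplifications:

- It uses Fibonacci-sphere sampling, an isotropic tent kernel, and equal quadrature weights 4*pi/n.
- The real DISCO convolution uses oriented filters and is formulated as a group convolution.
- The O(n^2) precomputation here is only for brevity.

All names are hypothetical.

```python
import math


def fibonacci_sphere(n):
    """n roughly equal-area sample points on the unit sphere."""
    golden = math.pi * (3 - math.sqrt(5))
    pts = []
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        r = math.sqrt(1 - z * z)
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts


def build_sparse_kernel(points, cutoff):
    """Precompute (i, j, weight) triples for a compactly supported continuous kernel.

    The tent kernel k(theta) = 1 - theta / cutoff (zero beyond cutoff) is evaluated at
    the exact angular distance theta between samples and multiplied by the quadrature
    weight 4*pi / n of an (approximately) equal-area grid.
    """
    n = len(points)
    quad = 4 * math.pi / n
    triples = []
    for i, p in enumerate(points):
        for j, q in enumerate(points):  # brute force; a spatial index would avoid O(n^2)
            theta = math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q)))))
            if theta < cutoff:
                triples.append((i, j, (1 - theta / cutoff) * quad))
    return triples


def apply_sparse(triples, signal, n):
    """Convolution as a sparse mat-vec: cost is linear in the number of stored entries."""
    out = [0.0] * n
    for i, j, w in triples:
        out[i] += w * signal[j]
    return out
```

With a fixed angular support, the number of stored entries per output pixel stays roughly constant as resolution grows. This is where linear scaling in the pixel count comes from.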
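Even though machine learning algorithms already play a significant role in data science, many current methods make unrealistic assumptions about the input data. Such methods are hard to apply because of incompatible data formats, or heterogeneous, hierarchical, or entirely missing data fragments in the dataset. As a solution, we propose a versatile, unified framework for sample representation, model definition, and training, called "HMill". We review in depth the multi-instance paradigm for machine learning on which the framework builds and which it extends. To theoretically justify the design of the key components of HMill, we show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework. The text also contains a detailed discussion of technicalities and performance improvements in our implementation, which is released for download under the MIT License. The main asset of the framework is its flexibility, which makes it possible to model diverse real-world data sources with the same tool. In addition to the standard setting in which a set of attributes is observed for each object individually, we explain how message-passing inference in graphs that represent whole systems of objects can be implemented in the framework. To support our claims, we use the framework to solve three different problems from the cybersecurity domain. The first use case concerns IoT device identification from raw network observations. In the second problem, we study how malicious binary files can be classified using a snapshot of the operating system represented as a directed graph. The last example is the task of extending a domain blacklist by modelling interactions between entities in the network. In all three problems, the solution based on the proposed framework achieves performance comparable to specialized approaches.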
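The multi-instance paradigm underlying the framework treats a sample as a bag of instances and embeds it with permutation-invariant pooling. This also gives a natural answer for missing data: an empty bag. Nesting bags inside bags yields the hierarchical case.

The sketch below assumes mean and max pooling and identity instance transforms; a real model would apply learned layers first. The names are hypothetical and are not the framework's API.

```python
def bag_embedding(instances, dim):
    """Embed a bag (a possibly empty list of length-`dim` vectors) as [mean..., max...].

    An empty bag, e.g. missing data, maps to zeros instead of failing.
    """
    if not instances:
        return [0.0] * (2 * dim)
    mean = [sum(col) / len(instances) for col in zip(*instances)]
    mx = [max(col) for col in zip(*instances)]
    return mean + mx


def hierarchical_embedding(bag_of_bags, dim):
    """Two-level hierarchy: embed each inner bag, then treat those embeddings as a bag."""
    inner = [bag_embedding(b, dim) for b in bag_of_bags]
    return bag_embedding(inner, 2 * dim)
```

Because both pooling operators ignore instance order and accept any bag size, heterogeneous samples map to fixed-length vectors. This holds whether a sample has many instances, few, or none.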