Neural approximations of scalar and vector fields, such as signed distance functions and radiance fields, have emerged as accurate, high-quality representations. State-of-the-art results are obtained by conditioning a neural approximation with a lookup from trainable feature grids, which take on part of the learning task and allow for smaller, more efficient neural networks. Unfortunately, these feature grids usually come at the cost of significantly increased memory consumption compared to stand-alone neural network models. We present a dictionary method for compressing such feature grids, reducing their memory consumption by up to 100x and permitting a multiresolution representation, which can be useful for out-of-core streaming. We formulate the dictionary optimization as a vector-quantized auto-decoder problem, which lets us learn end-to-end discrete neural representations without direct supervision and in spaces with dynamic topology and structure. Our source code will be available at https://github.com/nv-tlabs/vqad.
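As a rough illustration of the vector-quantized auto-decoder idea, the sketch below (PyTorch) stores per-voxel codebook indices as soft logits and hardens them with a straight-through estimator; all sizes (`res`, `codebook_size`, `feat_dim`) and the class name are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQFeatureGrid(nn.Module):
    """Feature grid stored as per-voxel codebook indices (a sketch of a
    vector-quantized auto-decoder; hyperparameters are illustrative)."""
    def __init__(self, res=32, codebook_size=64, feat_dim=8):
        super().__init__()
        # Learned dictionary of feature vectors shared by all voxels.
        self.codebook = nn.Parameter(torch.randn(codebook_size, feat_dim))
        # Soft index logits per voxel; hardened to integers after training.
        self.logits = nn.Parameter(torch.zeros(res, res, res, codebook_size))

    def forward(self, x):  # x: (N, 3) in [-1, 1]
        # Soft codebook lookup with a straight-through hard assignment,
        # so gradients reach both the logits and the codebook.
        soft = F.softmax(self.logits, dim=-1)
        hard = F.one_hot(soft.argmax(-1), soft.shape[-1]).float()
        weights = hard + soft - soft.detach()            # straight-through
        grid = (weights @ self.codebook).permute(3, 0, 1, 2)[None]
        pts = x[None, :, None, None, :]                  # grid_sample layout
        feats = F.grid_sample(grid, pts, align_corners=True)
        return feats.view(grid.shape[1], -1).t()         # (N, feat_dim)

grid = VQFeatureGrid()
feats = grid(torch.rand(4, 3) * 2 - 1)  # interpolated features for 4 points
```

At storage time only the hard indices and the codebook would be kept, which is where the claimed compression comes from.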
Neural radiance fields (NeRF) have demonstrated the potential of coordinate-based neural representations (neural fields or implicit neural representations) in neural rendering. However, using a multi-layer perceptron (MLP) to represent a 3D scene or object requires enormous computational resources and time. There have been recent studies on how to reduce these computational inefficiencies by using additional data structures, such as grids or trees. Despite the promising performance, the explicit data structure necessitates a substantial amount of memory. In this work, we present a method to reduce the size without compromising the advantages of having additional data structures. In detail, we propose using the wavelet transform on grid-based neural fields. Grid-based neural fields provide fast convergence, while the wavelet transform, whose efficiency has been demonstrated in high-performance standard codecs, improves the parameter efficiency of the grids. Furthermore, in order to achieve a higher sparsity of grid coefficients while maintaining reconstruction quality, we present a novel trainable masking approach. Experimental results demonstrate that non-spatial grid coefficients, such as wavelet coefficients, are capable of attaining a higher level of sparsity than spatial grid coefficients, resulting in a more compact representation. With our proposed mask and compression pipeline, we achieved state-of-the-art performance within a memory budget of 2 MB. Our code is available at https://github.com/daniel03c1/masked_wavelet_nerf.
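A minimal sketch of the trainable masking idea, assuming a straight-through binary mask over a grid of coefficients and an L1-style sparsity term; the shapes and the loss weighting are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class MaskedCoefficients(nn.Module):
    """Trainable binary mask over a grid of (e.g. wavelet) coefficients."""
    def __init__(self, shape=(8, 64, 64)):
        super().__init__()
        self.coeffs = nn.Parameter(torch.randn(*shape) * 0.1)
        self.mask_logit = nn.Parameter(torch.zeros(*shape))

    def forward(self):
        soft = torch.sigmoid(self.mask_logit)
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()      # straight-through estimator
        # Return masked coefficients and a differentiable sparsity penalty.
        return self.coeffs * mask, soft.mean()

module = MaskedCoefficients()
masked, sparsity = module()
loss = masked.pow(2).mean() + 1e-3 * sparsity   # placeholder recon loss
loss.backward()
```

Coefficients whose mask ends up at zero need not be stored, which is how the mask translates into compression.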
We present a novel method to provide efficient and highly detailed reconstructions. Inspired by wavelets, our main idea is to learn a neural field that decomposes the signal both spatially and frequency-wise. We follow the recent grid-based paradigm for spatial decomposition, but unlike existing work, encourage specific frequencies to be stored in each grid via Fourier feature encodings. We then apply a multi-layer perceptron with sine activations, feeding these Fourier-encoded features in at appropriate layers so that higher-frequency components are accumulated on top of lower-frequency components sequentially; these are summed to form the final output. We demonstrate that our method outperforms the state of the art in model compactness and efficiency on multiple tasks: 2D image fitting, 3D shape reconstruction, and neural radiance fields.
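One way this frequency-wise decomposition could look in code: per-level feature grids feeding a sine-activated MLP at successive layers, with per-level outputs summed. This is a loose 1D sketch under assumed resolutions and widths, not the authors' architecture.

```python
import torch
import torch.nn as nn

class FrequencyCascadeField(nn.Module):
    """Sketch: per-level grids enter a sine-activated MLP at successive
    layers, so high frequencies refine low ones (sizes illustrative)."""
    def __init__(self, levels=(8, 32, 128), feat=16, hidden=64):
        super().__init__()
        self.grids = nn.ParameterList(
            nn.Parameter(torch.randn(r, feat) * 0.1) for r in levels)
        self.inp = nn.ModuleList(nn.Linear(feat, hidden) for _ in levels)
        self.body = nn.ModuleList(nn.Linear(hidden, hidden) for _ in levels)
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in levels)

    def sample(self, grid, x):               # linear interp on a 1D grid
        t = x.clamp(0, 1) * (grid.shape[0] - 1)
        i0 = t.floor().long().clamp(max=grid.shape[0] - 2)
        w = (t - i0.float()).unsqueeze(-1)
        return grid[i0] * (1 - w) + grid[i0 + 1] * w

    def forward(self, x):                    # x: (N,) in [0, 1]
        h, out = 0.0, 0.0
        for grid, inp, body, head in zip(self.grids, self.inp,
                                         self.body, self.heads):
            h = torch.sin(body(torch.sin(inp(self.sample(grid, x)) + h)))
            out = out + head(h)              # accumulate coarse to fine
        return out

field = FrequencyCascadeField()
y = field(torch.rand(5))                     # (5, 1) predictions
```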
Neural signed distance functions (SDFs) are emerging as an effective representation for 3D shapes. State-of-the-art methods typically encode the SDF with a large, fixed-size neural network to approximate complex shapes with implicit surfaces. Rendering with these large networks is, however, computationally expensive since it requires many forward passes through the network for every pixel, making these representations impractical for real-time graphics. We introduce an efficient neural representation that, for the first time, enables real-time rendering of high-fidelity neural SDFs, while achieving state-of-the-art geometry reconstruction quality. We represent implicit surfaces using an octree-based feature volume which adaptively fits shapes with multiple discrete levels of detail (LODs), and enables continuous LOD with SDF interpolation. We further develop an efficient algorithm to directly render our novel neural SDF representation in real-time by querying only the necessary LODs with sparse octree traversal. We show that our representation is 2-3 orders of magnitude more efficient in terms of rendering speed compared to previous works. Furthermore, it produces state-of-the-art reconstruction quality for complex shapes under both 3D geometric and 2D image-space metrics.
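A simplified sketch of multi-LOD feature volumes with continuous LOD blending; dense grids stand in for the paper's sparse octree, and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLODSDF(nn.Module):
    """Multi-resolution feature volumes, summed across levels of detail,
    decoded to a signed distance by a small MLP (a sketch; dense grids
    stand in for a sparse octree)."""
    def __init__(self, resolutions=(4, 8, 16, 32), feat=8):
        super().__init__()
        self.grids = nn.ParameterList(
            nn.Parameter(torch.randn(1, feat, r, r, r) * 0.01)
            for r in resolutions)
        self.mlp = nn.Sequential(nn.Linear(feat, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, x, lod):                # x: (N,3) in [-1,1]; lod float
        pts = x[None, :, None, None, :]
        feats = 0.0
        for l, grid in enumerate(self.grids):
            # Fractional lod blends in the finest active level smoothly.
            w = float(min(max(lod - l + 1.0, 0.0), 1.0))
            if w == 0.0:
                continue
            f = F.grid_sample(grid, pts, align_corners=True)
            feats = feats + w * f.view(grid.shape[1], -1).t()
        return self.mlp(feats)                # signed distance per point

sdf = MultiLODSDF()
d = sdf(torch.rand(10, 3) * 2 - 1, lod=2.5)   # continuous level of detail
```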
We introduce NeuralVDB, which improves on an existing industry standard for efficient storage of sparse volumetric data, denoted VDB, by leveraging recent advances in machine learning. Our novel hybrid data structure can reduce the memory footprint of VDB volumes by orders of magnitude, while maintaining their flexibility and incurring only a small (user-controlled) compression error. Specifically, NeuralVDB replaces the lower nodes of a shallow and wide VDB tree structure with multiple hierarchical neural networks that separately encode topology and value information by means of neural classifiers and regressors, respectively. This approach has proven to maximize the compression ratio while maintaining the spatial adaptivity offered by the higher-level VDB data structure. For sparse signed distance fields and density volumes, we have observed compression ratios of $10\times$ to more than $100\times$ from already-compressed VDB inputs, with little to no visual artifacts. We also demonstrate how applying the method to animated sparse volumes can both accelerate training and produce temporally coherent neural networks.
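The topology/value split could be sketched as two coordinate networks: a classifier deciding whether a location holds an active voxel, and a regressor producing its value. Layer sizes and the flat (non-hierarchical) structure are assumptions for illustration.

```python
import torch
import torch.nn as nn

def mlp(out_dim):
    return nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

topology = mlp(1)   # logit: is this voxel active?
values = mlp(1)     # regressed value (e.g. signed distance or density)

def decode(x, background=0.0):
    # Classify topology first, then regress values only where active;
    # inactive locations fall back to the background value.
    active = torch.sigmoid(topology(x)) > 0.5
    out = torch.full((x.shape[0], 1), background)
    idx = active.squeeze(-1)
    if idx.any():
        out[idx] = values(x[idx])
    return out

x = torch.rand(16, 3)
vals = decode(x)    # (16, 1) decoded values
```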
Although neural scene representations have the potential to effectively compress 3D scalar fields at high reconstruction quality, the computational complexity of the training and data-reconstruction steps of scene representation networks limits their use in practical applications. In this paper, we analyze whether scene representation networks can be modified to reduce these limitations, and whether such architectures can also be used for temporal reconstruction tasks. We propose a scene representation network design that uses GPU tensor cores to integrate reconstruction seamlessly into on-chip ray-tracing kernels. Furthermore, we investigate the use of image-guided network training as an alternative to the typical data-driven approach, and explore the potential strengths and weaknesses of this alternative with respect to quality and speed. As an alternative to spatial super-resolution approaches for time-varying fields, we propose a solution built on latent-space interpolation that enables random-access reconstruction at arbitrary granularity. We summarize our findings as an assessment of the strengths and limitations of scene representation networks for scientific visualization tasks, and outline directions for future research.
Recent advances in machine learning have created increasing interest in solving visual computing problems using a class of coordinate-based neural networks that parameterize physical properties of scenes or objects across space and time. These methods, which we call neural fields, have been successfully applied to the synthesis of 3D shapes and images, the animation of human bodies, 3D reconstruction, and pose estimation. However, due to rapid progress in a short time, many papers exist, but a comprehensive review and problem formulation has yet to emerge. In this report, we address this limitation by providing context, mathematical grounding, and an extensive review of the literature on neural fields. This report covers research along two dimensions. In the first part, we focus on neural field techniques by identifying common components of neural field methods, including different representations, architectures, forward mappings, and generalization methods. In the second part, we focus on applications of neural fields to different problems in visual computing and beyond (e.g., robotics, audio). Our review shows the breadth of topics already covered in visual computing, both historically and in current incarnations, and demonstrates the improved quality, flexibility, and capability that neural field methods bring. Finally, we present a companion website that hosts a living version of this review, which can be continually updated by the community.
Approximating radiance fields with volumetric grids is one of the most promising directions for improving NeRF, represented by methods like Plenoxels and DVGO, which achieve super-fast training convergence and real-time rendering. However, these methods typically require a tremendous storage overhead, costing up to hundreds of megabytes of disk space and runtime memory for a single scene. We address this issue in this paper by introducing a simple yet effective framework, called vector quantized radiance fields (VQRF), for compressing these volume-grid-based radiance fields. We first present a robust and adaptive metric for estimating redundancy in grid models and performing voxel pruning by better exploring intermediate outputs of volumetric rendering. A trainable vector quantization is further proposed to improve the compactness of grid models. In combination with an efficient joint tuning strategy and post-processing, our method can achieve a compression ratio of 100$\times$ by reducing the overall model size to 1 MB with negligible loss in visual quality. Extensive experiments demonstrate that the proposed framework achieves unrivaled performance and generalizes well across multiple methods with distinct volumetric structures, facilitating the wide use of volumetric radiance field methods in real-world applications. Code available at \url{https://github.com/AlgoHunt/VQRF}
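A sketch of importance-based voxel pruning, assuming an importance volume accumulated from volume-rendering blending weights; the keep fraction is an assumed hyperparameter, not the paper's exact setting.

```python
import torch

def prune_mask(importance, keep_fraction=0.999):
    """Keep the smallest set of voxels covering most of the total
    accumulated importance; everything else can be pruned."""
    flat = importance.flatten()
    order = torch.argsort(flat, descending=True)
    cum = torch.cumsum(flat[order], dim=0)
    k = int((cum < keep_fraction * flat.sum()).sum().item()) + 1
    mask = torch.zeros_like(flat, dtype=torch.bool)
    mask[order[:k]] = True
    return mask.view_as(importance)

importance = torch.rand(32, 32, 32)   # stand-in for accumulated weights
mask = prune_mask(importance)
print(mask.float().mean())            # fraction of voxels kept
```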
Neural representations have shown great promise in their ability to represent light fields compactly compared to image-set representations. However, current representations are not well suited to streaming, since decoding can only be done at a single level of detail and requires downloading the entire neural network model. Furthermore, high-resolution light field networks can exhibit flickering and aliasing because the neural network is sampled without proper filtering. To resolve these issues, we present a progressive multi-scale light field network that encodes a light field at multiple levels of detail. Lower levels of detail are encoded with fewer neural network weights, enabling progressive streaming and reducing rendering time. Our progressive multi-scale light field network addresses aliasing by encoding smaller anti-aliased representations at its lower levels of detail. Additionally, per-pixel levels of detail enable our representation to support dithered transitions and foveated rendering.
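A sketch of the progressive idea, assuming coarser levels of detail are evaluated using a prefix of each layer's weights so they can be streamed and rendered first; widths and layer count are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveMLP(nn.Module):
    """One MLP evaluated at several widths: a prefix of the weights forms
    a coarser model that is usable before the full model is downloaded."""
    def __init__(self, widths=(16, 32, 64), in_dim=4, out_dim=3):
        super().__init__()
        w = widths[-1]
        self.fc1 = nn.Linear(in_dim, w)
        self.fc2 = nn.Linear(w, w)
        self.out = nn.Linear(w, out_dim)
        self.widths = widths

    def forward(self, x, lod=2):
        w = self.widths[lod]
        # Slice each layer to the width of the requested level of detail.
        h = F.relu(F.linear(x, self.fc1.weight[:w], self.fc1.bias[:w]))
        h = F.relu(F.linear(h, self.fc2.weight[:w, :w], self.fc2.bias[:w]))
        return F.linear(h, self.out.weight[:, :w], self.out.bias)

net = ProgressiveMLP()
rgb_coarse = net(torch.rand(8, 4), lod=0)   # works with partial weights
rgb_fine = net(torch.rand(8, 4), lod=2)
```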
We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800×800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating the NeRF into a PlenOctree. In order to preserve view-dependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss, which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step can be used to reduce the training time, as we no longer need to wait for the NeRF training to converge fully. Our real-time neural rendering approach may potentially enable new applications such as 6-DOF industrial and product visualizations, as well as next generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as well; please visit the project page for the interactive online demo, as well as video and code: https://alexyu.net/plenoctrees.
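For reference, evaluating view-dependent color from degree-2 spherical-harmonic coefficients looks roughly like the following sketch; sign conventions for the SH basis vary between implementations, so treat these as one common choice rather than the paper's exact code.

```python
import torch
import torch.nn.functional as F

SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199
SH_C2 = (1.0925484305920792, -1.0925484305920792, 0.31539156525252005,
         -1.0925484305920792, 0.5462742152960396)

def sh_basis(dirs):                       # dirs: (N, 3) unit vectors
    """Real spherical-harmonic basis up to degree 2 (9 values)."""
    x, y, z = dirs.unbind(-1)
    return torch.stack([
        torch.full_like(x, SH_C0),
        -SH_C1 * y, SH_C1 * z, -SH_C1 * x,
        SH_C2[0] * x * y, SH_C2[1] * y * z,
        SH_C2[2] * (3 * z * z - 1),
        SH_C2[3] * x * z, SH_C2[4] * (x * x - y * y)], dim=-1)

coeffs = torch.randn(5, 3, 9)             # per-point RGB SH coefficients
dirs = F.normalize(torch.randn(5, 3), dim=-1)
rgb = torch.sigmoid((coeffs * sh_basis(dirs)[:, None, :]).sum(-1))  # (5, 3)
```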
Neural networks have shown great potential for compressing volumetric data for scientific visualization. However, due to the high cost of training and inference, such volumetric neural representations have so far only been applied to offline data processing and non-interactive rendering. In this paper, we demonstrate that by simultaneously leveraging modern GPU tensor cores, a native CUDA neural network framework, and online training, we can achieve high-performance and high-fidelity interactive ray tracing with volumetric neural representations. Moreover, our approach is fully generalizable and can adapt to time-varying datasets. We present three strategies for online training, each leveraging a different combination of GPU, CPU, and out-of-core streaming techniques. We also develop three rendering implementations that allow interactive ray tracing to be combined with real-time volume decoding, sample streaming, and in-shader neural network inference. We demonstrate that our volumetric neural representations scale to the terascale for regular-grid volume visualization, and can readily support irregular data structures such as OpenVDB, unstructured, AMR, and particle volume data.
Neural Radiance Fields (NeRFs) have demonstrated an amazing ability to synthesize images of 3D scenes from novel viewpoints. However, they rely on specialized volumetric rendering algorithms based on ray marching that are mismatched to the capabilities of widely deployed graphics hardware. This paper introduces a new NeRF representation based on textured polygons that can synthesize novel images efficiently with standard rendering pipelines. The NeRF is represented as a set of polygons whose textures encode binary opacities and feature vectors. Traditional rendering of the polygons with a z-buffer yields an image with features at every pixel, which are interpreted by a small, view-dependent MLP running in a fragment shader to produce a final pixel color. This approach enables NeRFs to be rendered with the traditional polygon rasterization pipeline, which provides massive pixel-level parallelism, achieving interactive frame rates on a wide range of compute platforms, including mobile phones.
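The deferred-shading step could be sketched as a tiny MLP mapping a rasterized per-pixel feature plus the view direction to RGB (here in PyTorch rather than an actual fragment shader); the feature and hidden sizes are illustrative.

```python
import torch
import torch.nn as nn

class DeferredShader(nn.Module):
    """Tiny view-dependent MLP, small enough to run per fragment:
    rasterized feature + view direction -> final pixel color."""
    def __init__(self, feat=8, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat + 3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, features, view_dirs):   # (H, W, feat), (H, W, 3)
        return self.net(torch.cat([features, view_dirs], dim=-1))

shader = DeferredShader()
rgb = shader(torch.rand(64, 64, 8), torch.rand(64, 64, 3))  # (64, 64, 3)
```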
Synthesizing photo-realistic images and videos is at the heart of computer graphics and has been the focus of decades of research. Traditionally, synthetic images of a scene are generated using rendering algorithms such as rasterization or ray tracing, which take representations of geometry and material properties as input. Collectively, these inputs define the actual scene and what is rendered, and are referred to as the scene representation (where a scene consists of one or more objects). Example scene representations are triangle meshes with accompanying textures (e.g., created by an artist), point clouds (e.g., from a depth sensor), volumetric grids (e.g., from a CT scan), or implicit surface functions (e.g., truncated signed distance fields). The reconstruction of such a scene representation from observations using differentiable rendering losses is known as inverse graphics or inverse rendering. Neural rendering is closely related, and combines ideas from classical computer graphics and machine learning to create algorithms for synthesizing images from real-world observations. Neural rendering is a leap forward towards the goal of synthesizing photo-realistic image and video content. In recent years, we have seen immense progress in this field through hundreds of publications that show different ways of injecting learnable components into the rendering pipeline. This state-of-the-art report on advances in neural rendering focuses on methods that combine classical rendering principles with learned 3D scene representations, now often referred to as neural scene representations. A key advantage of these methods is that they are 3D-consistent by design, enabling applications such as novel-viewpoint synthesis of a captured scene. In addition to methods that handle static scenes, we cover neural scene representations for modeling non-rigidly deforming objects...
We present a novel embedding field, \emph{PREF}, as a compact representation to facilitate neural signal modeling and reconstruction tasks. Pure multi-layer perceptron (MLP) based neural techniques are biased towards low-frequency signals and rely on deep layers or Fourier encodings to avoid losing detail. PREF instead employs a compact, physically explainable encoding field based on a phasor formulation of the Fourier embedding space. We conduct comprehensive experiments to demonstrate the advantages of PREF over the latest spatial embedding techniques. We then develop a highly efficient frequency learning framework using an approximated inverse Fourier transform scheme together with a novel Parseval regularizer. Extensive experiments show that our efficient and compact frequency-based neural signal processing technique matches, and in some cases exceeds, the state of the art on 2D image completion, 3D SDF surface regression, and 5D radiance field reconstruction.
We consider the attributes of a sampled point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes at the given positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks and representing the function within each block by shifts of a coordinate-based, or implicit, neural network. Inputs to the network include both spatial coordinates and a latent vector per block. We represent the latent vectors using coefficients of the region-adaptive hierarchical transform (RAHT) used in the MPEG geometry-based point cloud codec G-PCC. The coefficients, which are highly compressible, are rate-distortion optimized by back-propagation through a rate-distortion Lagrangian loss in an auto-decoder configuration. The result outperforms RAHT by 2-4 dB. This is the first work to compress volumetric functions represented by local coordinate-based neural networks; as such, we expect it to be applicable beyond point clouds, for example to compression of high-resolution neural radiance fields.
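The rate-distortion Lagrangian could be sketched as distortion plus lambda times a differentiable rate proxy. In the sketch below the rate is approximated by an L1 term on the coefficients, whereas the actual codec entropy-codes quantized RAHT coefficients; the toy linear decoder is purely illustrative.

```python
import torch
import torch.nn as nn

def rd_loss(reconstruction, target, coefficients, lam=1e-2):
    """Rate-distortion Lagrangian: distortion + lambda * rate proxy."""
    distortion = (reconstruction - target).pow(2).mean()
    rate = coefficients.abs().mean()   # stand-in for coded bit-rate
    return distortion + lam * rate

decoder = nn.Linear(16, 3)                       # toy stand-in decoder
coeffs = torch.randn(128, 16, requires_grad=True)
target = torch.rand(128, 3)
loss = rd_loss(decoder(coeffs), target, coeffs)
loss.backward()   # gradients reach both the decoder and the coefficients
```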
The role of quantization in implicit/coordinate neural networks is still not fully understood. We observe that using a canonical fixed quantization scheme during training yields poor performance at low rates, because the distribution of network weights changes over the course of training. In this work, we show that non-uniform quantization of neural weights leads to significant improvements. Specifically, we demonstrate that clustered quantization improves reconstruction. Finally, by characterizing the trade-off between quantization and network capacity, we show that it is possible (though memory-inefficient) to reconstruct signals using binary neural networks. We demonstrate our findings experimentally on 2D image reconstruction and 3D radiance fields, and show that simple quantization methods and architecture search can compress NeRF to less than 16KB with minimal loss in performance (323x smaller than the original NeRF).
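Clustered (non-uniform) weight quantization can be sketched with a small k-means over a trained weight tensor, storing only per-weight cluster indices plus a codebook; k and the iteration count are illustrative.

```python
import torch

def kmeans_quantize(w, k=16, iters=20):
    """Cluster weights with 1D k-means; return indices + codebook."""
    flat = w.flatten()
    centers = flat[torch.randperm(flat.numel())[:k]].clone()
    for _ in range(iters):
        assign = (flat[:, None] - centers[None, :]).abs().argmin(dim=1)
        for j in range(k):
            sel = flat[assign == j]
            if sel.numel() > 0:
                centers[j] = sel.mean()   # move center to cluster mean
    assign = (flat[:, None] - centers[None, :]).abs().argmin(dim=1)
    return assign.view_as(w), centers     # 4 bits/weight + tiny codebook

weights = torch.randn(64, 64)             # stand-in trained weight tensor
idx, codebook = kmeans_quantize(weights)
dequantized = codebook[idx]                # reconstruction for inference
err = (weights - dequantized).abs().mean()
```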
Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a differentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website: https://github.com/facebookresearch/NSVF.
Physically based rendering of complex scenes can be prohibitively costly with a potentially unbounded and uneven distribution of complexity across the rendered image. The goal of an ideal level of detail (LoD) method is to make rendering costs independent of the 3D scene complexity, while preserving the appearance of the scene. However, current prefiltering LoD methods are limited in the appearances they can support due to their reliance on approximate models and other heuristics. We propose the first comprehensive multi-scale LoD framework for prefiltering 3D environments with complex geometry and materials (e.g., the Disney BRDF), while maintaining the appearance with respect to the ray-traced reference. Using a multi-scale hierarchy of the scene, we perform a data-driven prefiltering step to obtain an appearance phase function and directional coverage mask at each scale. At the heart of our approach is a novel neural representation that encodes this information into a compact latent form that is easy to decode inside a physically based renderer. Once a scene is baked out, our method requires no original geometry, materials, or textures at render time. We demonstrate that our approach compares favorably to state-of-the-art prefiltering methods and achieves considerable savings in memory for complex scenes.
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
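The volume-rendering quadrature this abstract refers to is standard and can be written down directly: alpha_i = 1 - exp(-sigma_i * delta_i), with colors composited by transmittance-weighted alphas. A self-contained sketch:

```python
import torch

def composite(sigmas, colors, ts):
    """NeRF-style quadrature along rays.
    sigmas: (R, S) densities, colors: (R, S, 3), ts: (R, S) sample depths."""
    deltas = torch.cat([ts[:, 1:] - ts[:, :-1],
                        torch.full_like(ts[:, :1], 1e10)], dim=-1)
    alphas = 1.0 - torch.exp(-sigmas * deltas)
    # Transmittance: cumulative product of (1 - alpha) up to each sample.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alphas[:, :1]), 1 - alphas + 1e-10],
                  dim=-1), dim=-1)[:, :-1]
    weights = trans * alphas                          # (R, S)
    return (weights[..., None] * colors).sum(dim=1)   # (R, 3) pixel colors

rays, samples = 4, 64
ts = torch.sort(torch.rand(rays, samples), dim=-1).values
rgb = composite(torch.rand(rays, samples), torch.rand(rays, samples, 3), ts)
```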
We have recently seen tremendous progress in neural approaches to photo-realistic human modeling and rendering. However, integrating them into existing mesh-based pipelines for downstream applications remains challenging. In this paper, we present a comprehensive neural approach for high-quality reconstruction, compression, and rendering of human performances from dense multi-view video. Our core intuition is to bridge the traditional animated-mesh workflow with a new class of highly efficient neural techniques. We first introduce a neural surface reconstructor for high-quality surface generation in minutes, which marries implicit volumetric rendering of a truncated signed distance field (TSDF) with multi-resolution hash encoding. We further propose a hybrid neural tracker to generate animated meshes, combining explicit non-rigid tracking with implicit dynamic deformation in a self-supervised framework. The former provides a coarse warp back into canonical space, while the latter predicts residual displacements using 4D hash encoding, as in our reconstructor. We then discuss rendering schemes using the resulting animated meshes, ranging from dynamic texturing to lumigraph rendering under various bandwidth settings. To strike a balance between quality and bandwidth, we propose a hierarchical solution that first renders six virtual views covering the performer and then performs occlusion-aware neural texture blending. We demonstrate the efficacy of our approach in a variety of mesh-based applications and photo-realistic free-viewpoint experiences on various platforms, e.g., inserting virtual human performances into real environments through mobile AR, or immersively watching talent shows with VR headsets.