Neural radiance fields (NeRF) have demonstrated the potential of coordinate-based neural representation (neural fields or implicit neural representation) in neural rendering. However, using a multi-layer perceptron (MLP) to represent a 3D scene or object requires enormous computational resources and time. There have been recent studies on how to reduce these computational inefficiencies by using additional data structures, such as grids or trees. Despite the promising performance, the explicit data structure necessitates a substantial amount of memory. In this work, we present a method to reduce the size without compromising the advantages of having additional data structures. In detail, we propose using the wavelet transform on grid-based neural fields. Grid-based neural fields are for fast convergence, and the wavelet transform, whose efficiency has been demonstrated in high-performance standard codecs, is to improve the parameter efficiency of grids. Furthermore, in order to achieve a higher sparsity of grid coefficients while maintaining reconstruction quality, we present a novel trainable masking approach. Experimental results demonstrate that non-spatial grid coefficients, such as wavelet coefficients, are capable of attaining a higher level of sparsity than spatial grid coefficients, resulting in a more compact representation. With our proposed mask and compression pipeline, we achieved state-of-the-art performance within a memory budget of 2 MB. Our code is available at https://github.com/daniel03c1/masked_wavelet_nerf.
translated by 谷歌翻译
标量和矢量场的神经近似(例如签名距离函数和辐射场)已成为准确的高质量表示。最先进的结果是通过从可训练的特征网格中进行查找的调节来获得的,这些近似是按照学习任务的一部分,并允许较小,更有效的神经网络。不幸的是,与独立的神经网络模型相比,这些特征网格通常以明显增加的记忆消耗成本。我们提出了一种词典方法,用于压缩此类特征网格,将其内存消耗降低至100倍,并允许多分辨率表示,这对于核心外流很有用。我们将词典优化作为矢量定量的自动码头问题提出,使我们能够在没有直接监督以及具有动态拓扑和结构的空间中学习端到端离散的神经表示。我们的源代码将在https://github.com/nv-tlabs/vqad上找到。
translated by 谷歌翻译
We present TensoRF, a novel approach to model and reconstruct radiance fields. Unlike NeRF that purely uses MLPs, we model the radiance field of a scene as a 4D tensor, which represents a 3D voxel grid with per-voxel multi-channel features. Our central idea is to factorize the 4D scene tensor into multiple compact low-rank tensor components. We demonstrate that applying traditional CP decomposition -- that factorizes tensors into rank-one components with compact vectors -- in our framework leads to improvements over vanilla NeRF. To further boost performance, we introduce a novel vector-matrix (VM) decomposition that relaxes the low-rank constraints for two modes of a tensor and factorizes tensors into compact vector and matrix factors. Beyond superior rendering quality, our models with CP and VM decompositions lead to a significantly lower memory footprint in comparison to previous and concurrent works that directly optimize per-voxel features. Experimentally, we demonstrate that TensoRF with CP decomposition achieves fast reconstruction (<30 min) with better rendering quality and even a smaller model size (<4 MB) compared to NeRF. Moreover, TensoRF with VM decomposition further boosts rendering quality and outperforms previous state-of-the-art methods, while reducing the reconstruction time (<10 min) and retaining a compact model size (<75 MB).
translated by 谷歌翻译
We present a novel method to provide efficient and highly detailed reconstructions. Inspired by wavelets, our main idea is to learn a neural field that decompose the signal both spatially and frequency-wise. We follow the recent grid-based paradigm for spatial decomposition, but unlike existing work, encourage specific frequencies to be stored in each grid via Fourier features encodings. We then apply a multi-layer perceptron with sine activations, taking these Fourier encoded features in at appropriate layers so that higher-frequency components are accumulated on top of lower-frequency components sequentially, which we sum up to form the final output. We demonstrate that our method outperforms the state of the art regarding model compactness and efficiency on multiple tasks: 2D image fitting, 3D shape reconstruction, and neural radiance fields.
translated by 谷歌翻译
Approximating radiance fields with volumetric grids is one of promising directions for improving NeRF, represented by methods like Plenoxels and DVGO, which achieve super-fast training convergence and real-time rendering. However, these methods typically require a tremendous storage overhead, costing up to hundreds of megabytes of disk space and runtime memory for a single scene. We address this issue in this paper by introducing a simple yet effective framework, called vector quantized radiance fields (VQRF), for compressing these volume-grid-based radiance fields. We first present a robust and adaptive metric for estimating redundancy in grid models and performing voxel pruning by better exploring intermediate outputs of volumetric rendering. A trainable vector quantization is further proposed to improve the compactness of grid models. In combination with an efficient joint tuning strategy and post-processing, our method can achieve a compression ratio of 100$\times$ by reducing the overall model size to 1 MB with negligible loss on visual quality. Extensive experiments demonstrate that the proposed framework is capable of achieving unrivaled performance and well generalization across multiple methods with distinct volumetric structures, facilitating the wide use of volumetric radiance fields methods in real-world applications. Code Available at \url{https://github.com/AlgoHunt/VQRF}
translated by 谷歌翻译
我们提出了一个小说嵌入字段\ emph {pref}作为促进神经信号建模和重建任务的紧凑表示。基于纯的多层感知器(MLP)神经技术偏向低频信号,并依赖于深层或傅立叶编码以避免丢失细节。取而代之的是,基于傅立叶嵌入空间的相拟合公式,PREF采用了紧凑且物理上解释的编码场。我们进行全面的实验,以证明PERF比最新的空间嵌入技术的优势。然后,我们使用近似的逆傅里叶变换方案以及新型的parseval正常器来开发高效的频率学习框架。广泛的实验表明,我们的高效和紧凑的基于频率的神经信号处理技术与2D图像完成,3D SDF表面回归和5D辐射场现场重建相同,甚至比最新的。
translated by 谷歌翻译
Multilayer perceptrons (MLPs) learn high frequencies slowly. Recent approaches encode features in spatial bins to improve speed of learning details, but at the cost of larger model size and loss of continuity. Instead, we propose to encode features in bins of Fourier features that are commonly used for positional encoding. We call these Quantized Fourier Features (QFF). As a naturally multiresolution and periodic representation, our experiments show that using QFF can result in smaller model size, faster training, and better quality outputs for several applications, including Neural Image Representations (NIR), Neural Radiance Field (NeRF) and Signed Distance Function (SDF) modeling. QFF are easy to code, fast to compute, and serve as a simple drop-in addition to many neural field representations.
translated by 谷歌翻译
我们提出了一种可区分的渲染算法,以进行有效的新型视图合成。通过偏离基于音量的表示,支持学习点表示,我们在训练和推理方面的内存和运行时范围内改进了现有方法的数量级。该方法从均匀采样的随机点云开始,并使用基于可区分的SPLAT渲染器来发展模型以匹配一组输入图像,从而学习了每点位置和观看依赖性外观。在训练和推理中,我们的方法比NERF快300倍,质量只有边缘牺牲,而在静态场景中使用少于10 〜MB的记忆。对于动态场景,我们的方法比Stnerf训练两个数量级,并以接近互动速率渲染,同时即使在不施加任何时间固定的正则化合物的情况下保持较高的图像质量和时间连贯性。
translated by 谷歌翻译
Volumetric neural rendering methods like NeRF generate high-quality view synthesis results but are optimized per-scene leading to prohibitive reconstruction time. On the other hand, deep multi-view stereo methods can quickly reconstruct scene geometry via direct network inference. Point-NeRF combines the advantages of these two approaches by using neural 3D point clouds, with associated neural features, to model a radiance field. Point-NeRF can be rendered efficiently by aggregating neural point features near scene surfaces, in a ray marching-based rendering pipeline. Moreover, Point-NeRF can be initialized via direct inference of a pre-trained deep network to produce a neural point cloud; this point cloud can be finetuned to surpass the visual quality of NeRF with 30X faster training time. Point-NeRF can be combined with other 3D reconstruction methods and handles the errors and outliers in such methods via a novel pruning and growing mechanism. The experiments on the DTU, the NeRF Synthetics , the ScanNet and the Tanks and Temples datasets demonstrate Point-NeRF can surpass the existing methods and achieve the state-of-the-art results.
translated by 谷歌翻译
We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800×800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating the NeRF into a PlenOctree. In order to preserve viewdependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss, which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step can be used to reduce the training time, as we no longer need to wait for the NeRF training to converge fully. Our real-time neural rendering approach may potentially enable new applications such as 6-DOF industrial and product visualizations, as well as next generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as well; please visit the project page for the interactive online demo, as well as video and code: https://alexyu. net/plenoctrees.
translated by 谷歌翻译
我们介绍了Plenoxels(plenoptic voxels),是一种光电型观测合成系统。Plenoxels表示作为具有球形谐波的稀疏3D网格的场景。该表示可以通过梯度方法和正则化从校准图像进行优化,而没有任何神经元件。在标准,基准任务中,Plenoxels优化了比神经辐射场更快的两个数量级,无需视觉质量损失。
translated by 谷歌翻译
NeRF synthesizes novel views of a scene with unprecedented quality by fitting a neural radiance field to RGB images. However, NeRF requires querying a deep Multi-Layer Perceptron (MLP) millions of times, leading to slow rendering times, even on modern GPUs. In this paper, we demonstrate that real-time rendering is possible by utilizing thousands of tiny MLPs instead of one single large MLP. In our setting, each individual MLP only needs to represent parts of the scene, thus smaller and faster-to-evaluate MLPs can be used. By combining this divide-and-conquer strategy with further optimizations, rendering is accelerated by three orders of magnitude compared to the original NeRF model without incurring high storage costs. Further, using teacher-student distillation for training, we show that this speed-up can be achieved without sacrificing visual quality.
translated by 谷歌翻译
机器学习的最近进步已经创造了利用一类基于坐标的神经网络来解决视觉计算问题的兴趣,该基于坐标的神经网络在空间和时间跨空间和时间的场景或对象的物理属性。我们称之为神经领域的这些方法已经看到在3D形状和图像的合成中成功应用,人体的动画,3D重建和姿势估计。然而,由于在短时间内的快速进展,许多论文存在,但尚未出现全面的审查和制定问题。在本报告中,我们通过提供上下文,数学接地和对神经领域的文学进行广泛综述来解决这一限制。本报告涉及两种维度的研究。在第一部分中,我们通过识别神经字段方法的公共组件,包括不同的表示,架构,前向映射和泛化方法来专注于神经字段的技术。在第二部分中,我们专注于神经领域的应用在视觉计算中的不同问题,超越(例如,机器人,音频)。我们的评论显示了历史上和当前化身的视觉计算中已覆盖的主题的广度,展示了神经字段方法所带来的提高的质量,灵活性和能力。最后,我们展示了一个伴随着贡献本综述的生活版本,可以由社区不断更新。
translated by 谷歌翻译
新观点的合成最近通过直接从稀疏观测中学习神经辐射场进行了革命。但是,使用这种新范式渲染图像的速度很慢,因为这样的事实是,该量渲染方程的准确正交需要为每个射线提供大量样品。先前的工作主要集中于加快与每个样本点相关的网络评估,例如,通过将辐射值的缓存到显式的空间数据结构中,但这是以模型紧凑性为代价的。在本文中,我们提出了一种新颖的双网络体系结构,该架构通过学习如何最好地减少所需的样品数量来实现正交方向。为此,我们将网络分为经过共同培训的采样和阴影网络。我们的培训计划采用沿每条射线的固定样品位置,并在整个训练中逐步引入稀疏性,即使在低样本计数下也可以达到高质量。对目标数量的数量进行微调后,可以实时渲染产生的紧凑神经表示。我们的实验表明,我们的方法在质量和框架速率方面超过同时紧凑的神经表示,并且与高效的混合表示相同。代码和补充材料可从https://thomasneff.github.io/adanerf获得。
translated by 谷歌翻译
神经辐射场(NERFS)产生最先进的视图合成结果。然而,它们慢渲染,需要每像素数百个网络评估,以近似卷渲染积分。将nerfs烘烤到明确的数据结构中实现了有效的渲染,但导致内存占地面积的大幅增加,并且在许多情况下,质量降低。在本文中,我们提出了一种新的神经光场表示,相反,相反,紧凑,直接预测沿线的集成光线。我们的方法支持使用每个像素的单个网络评估,用于小基线光场数据集,也可以应用于每个像素的几个评估的较大基线。在我们的方法的核心,是一个光线空间嵌入网络,将4D射线空间歧管映射到中间可间可动子的潜在空间中。我们的方法在诸如斯坦福光场数据集等密集的前置数据集中实现了最先进的质量。此外,对于带有稀疏输入的面对面的场景,我们可以在质量方面实现对基于NERF的方法具有竞争力的结果,同时提供更好的速度/质量/内存权衡,网络评估较少。
translated by 谷歌翻译
Neural fields, also known as coordinate-based or implicit neural representations, have shown a remarkable capability of representing, generating, and manipulating various forms of signals. For video representations, however, mapping pixel-wise coordinates to RGB colors has shown relatively low compression performance and slow convergence and inference speed. Frame-wise video representation, which maps a temporal coordinate to its entire frame, has recently emerged as an alternative method to represent videos, improving compression rates and encoding speed. While promising, it has still failed to reach the performance of state-of-the-art video compression algorithms. In this work, we propose FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across the frames in videos inspired by the standard video codecs. Furthermore, we introduce a fully convolutional architecture, enabled by one-dimensional temporal grids, improving the continuity of spatial features. Experimental results show that FFNeRV yields the best performance for video compression and frame interpolation among the methods using frame-wise representations or neural fields. To reduce the model size even further, we devise a more compact convolutional architecture using the group and pointwise convolutions. With model compression techniques, including quantization-aware training and entropy coding, FFNeRV outperforms widely-used standard video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms.
translated by 谷歌翻译
神经辐射场(NERF)在建模3D场景和合成新型视图图像方面取得了巨大成功。但是,大多数以前的NERF方法需要大量时间来优化一个场景。显式数据结构,例如体素特征,显示出加速训练过程的巨大潜力。但是,体素特征面临两个大挑战,要应用于动态场景,即建模时间信息并捕获不同的点运动尺度。我们通过用时间感知的体素特征(称为Tineuvox)表示场景来提出一个辐射现场框架。引入了一个微小的坐标变形网络,以模拟粗糙运动轨迹,并在辐射网络中进一步增强了时间信息。提出了一种多距离插值方法,并应用于体素特征,以模拟小运动和大型运动。我们的框架大大加快了动态光芒度场的优化,同时保持高渲染质量。经验评估均在合成场景和真实场景上进行。我们的Tineuvox仅需8分钟和8 MB的存储成本即可完成培训,同时表现出比以前的动态NERF方法相似甚至更好的渲染性能。
translated by 谷歌翻译
Neural Radiance Field (NeRF), a new novel view synthesis with implicit scene representation has taken the field of Computer Vision by storm. As a novel view synthesis and 3D reconstruction method, NeRF models find applications in robotics, urban mapping, autonomous navigation, virtual reality/augmented reality, and more. Since the original paper by Mildenhall et al., more than 250 preprints were published, with more than 100 eventually being accepted in tier one Computer Vision Conferences. Given NeRF popularity and the current interest in this research area, we believe it necessary to compile a comprehensive survey of NeRF papers from the past two years, which we organized into both architecture, and application based taxonomies. We also provide an introduction to the theory of NeRF based novel view synthesis, and a benchmark comparison of the performance and speed of key NeRF models. By creating this survey, we hope to introduce new researchers to NeRF, provide a helpful reference for influential works in this field, as well as motivate future research directions with our discussion section.
translated by 谷歌翻译
尽管神经场景表示的潜力能够在高重建质量下有效压缩3D标量场,但使用场景表示网络的训练和数据重建步骤的计算复杂性限制了它们在实际应用中的使用。在本文中,我们分析了是否可以修改场景表示网络以减少这些限制以及这些架构是否也可以用于时间重建任务。我们提出了一种使用GPU Tensor核心的场景表示网络设计,将重建无缝化为片上芯片的横梁内核。此外,我们调查使用图像引导网络培训作为典型数据驱动方法的替代方案,我们探索了这种替代品质量和速度的潜在优势和缺点。作为时变字段的空间超分辨率方法的替代方案,我们提出了一种在潜在空间插值上建立的解决方案,以使任意粒度的随机访问重建。我们以评估科学可视化任务和概述未来研究方向的现场代表网络的优势和局限性的形式总结了我们的调查结果。
translated by 谷歌翻译
潜水员在NERF的关键思想和其变体 - 密度模型和体积渲染的关键思想中建立 - 学习可以从少量图像实际渲染的3D对象模型。与所有先前的NERF方法相比,潜水员使用确定性而不是体积渲染积分的随机估计。潜水员的表示是基于体素的功能领域。为了计算卷渲染积分,将光线分为间隔,每个体素;使用MLP的每个间隔的特征估计体渲染积分的组件,并且组件聚合。结果,潜水员可以呈现其他集成商错过的薄半透明结构。此外,潜水员的表示与其他这样的方法相比相对暴露的语义 - 在体素空间中的运动特征向量导致自然编辑。对当前最先进的方法的广泛定性和定量比较表明,潜水员产生(1)在最先进的质量或高于最先进的质量,(2)的情况下非常小而不会被烘烤,(3)在不被烘烤的情况下渲染非常快,并且(4)可以以自然方式编辑。
translated by 谷歌翻译