与图像集表示相比,神经表示表现出非常紧凑的同时非常紧凑的能力表现出了巨大的希望。但是,当前的表示不适合流式传输,因为解码只能以单个详细信息进行,并且需要下载整个神经网络模型。此外,高分辨率光场网络可以表现出闪烁和混叠,因为在没有适当过滤的情况下采样了神经网络。为了解决这些问题,我们提出了一个渐进的多尺度光场网络,该网络编码具有多个细节的光场。较低的细节使用更少的神经网络权重编码,从而可以进行渐进的流和减少渲染时间。我们的渐进多尺度光场网络通过在其较低的细节级别编码较小的反陈述来解决混蛋。此外,每个像素的细节级别使我们的表示能够支持抖动的过渡和浮动渲染。
translated by 谷歌翻译
Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a diffentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website: https://github.com/facebookresearch/NSVF.
translated by 谷歌翻译
基于坐标的网络成为3D表示和场景重建的强大工具。这些网络训练以将连续输入坐标映射到每个点处的信号的值。尽管如此,当前的架构是黑色盒子:不能轻易分析它们的光谱特性,并且在无监督点处的行为难以预测。此外,这些网络通常接受训练以以单个刻度表示信号,并且如此天真的下采样或上采样导致伪像。我们引入带限量坐标网络(BACON),具有分析傅里叶谱的网络架构。培根在无监督点处具有可预测的行为,可以基于所代表信号的光谱特性设计,并且可以在没有明确的监督的情况下代表多个尺度的信号。我们向培根展示用于使用符号距离功能的图像,辐射字段和3D场景的多尺度神经表示的培根,并表明它在可解释性和质量方面优于传统的单尺度坐标网络。
translated by 谷歌翻译
在本文中,我们为复杂场景进行了高效且强大的深度学习解决方案。在我们的方法中,3D场景表示为光场,即,一组光线,每组在到达图像平面时具有相应的颜色。对于高效的新颖视图渲染,我们采用了光场的双面参数化,其中每个光线的特征在于4D参数。然后,我们将光场配向作为4D函数,即将4D坐标映射到相应的颜色值。我们训练一个深度完全连接的网络以优化这种隐式功能并记住3D场景。然后,特定于场景的模型用于综合新颖视图。与以前需要密集的视野的方法不同,需要密集的视野采样来可靠地呈现新颖的视图,我们的方法可以通过采样光线来呈现新颖的视图并直接从网络查询每种光线的颜色,从而使高质量的灯场呈现稀疏集合训练图像。网络可以可选地预测每光深度,从而使诸如自动重新焦点的应用。我们的小说视图合成结果与最先进的综合结果相当,甚至在一些具有折射和反射的具有挑战性的场景中优越。我们在保持交互式帧速率和小的内存占地面积的同时实现这一点。
translated by 谷歌翻译
We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800×800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating the NeRF into a PlenOctree. In order to preserve viewdependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss, which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step can be used to reduce the training time, as we no longer need to wait for the NeRF training to converge fully. Our real-time neural rendering approach may potentially enable new applications such as 6-DOF industrial and product visualizations, as well as next generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as well; please visit the project page for the interactive online demo, as well as video and code: https://alexyu. net/plenoctrees.
translated by 谷歌翻译
神经辐射场(NERFS)产生最先进的视图合成结果。然而,它们慢渲染,需要每像素数百个网络评估,以近似卷渲染积分。将nerfs烘烤到明确的数据结构中实现了有效的渲染,但导致内存占地面积的大幅增加,并且在许多情况下,质量降低。在本文中,我们提出了一种新的神经光场表示,相反,相反,紧凑,直接预测沿线的集成光线。我们的方法支持使用每个像素的单个网络评估,用于小基线光场数据集,也可以应用于每个像素的几个评估的较大基线。在我们的方法的核心,是一个光线空间嵌入网络,将4D射线空间歧管映射到中间可间可动子的潜在空间中。我们的方法在诸如斯坦福光场数据集等密集的前置数据集中实现了最先进的质量。此外,对于带有稀疏输入的面对面的场景,我们可以在质量方面实现对基于NERF的方法具有竞争力的结果,同时提供更好的速度/质量/内存权衡,网络评估较少。
translated by 谷歌翻译
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (nonconvolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
translated by 谷歌翻译
NeRF synthesizes novel views of a scene with unprecedented quality by fitting a neural radiance field to RGB images. However, NeRF requires querying a deep Multi-Layer Perceptron (MLP) millions of times, leading to slow rendering times, even on modern GPUs. In this paper, we demonstrate that real-time rendering is possible by utilizing thousands of tiny MLPs instead of one single large MLP. In our setting, each individual MLP only needs to represent parts of the scene, thus smaller and faster-to-evaluate MLPs can be used. By combining this divide-and-conquer strategy with further optimizations, rendering is accelerated by three orders of magnitude compared to the original NeRF model without incurring high storage costs. Further, using teacher-student distillation for training, we show that this speed-up can be achieved without sacrificing visual quality.
translated by 谷歌翻译
The rendering procedure used by neural radiance fields (NeRF) samples a scene with a single ray per pixel and may therefore produce renderings that are excessively blurred or aliased when training or testing images observe scene content at different resolutions. The straightforward solution of supersampling by rendering with multiple rays per pixel is impractical for NeRF, because rendering each ray requires querying a multilayer perceptron hundreds of times. Our solution, which we call "mip-NeRF" (à la "mipmap"), extends NeRF to represent the scene at a continuously-valued scale. By efficiently rendering anti-aliased conical frustums instead of rays, mip-NeRF reduces objectionable aliasing artifacts and significantly improves NeRF's ability to represent fine details, while also being 7% faster than NeRF and half the size. Compared to NeRF, mip-NeRF reduces average error rates by 17% on the dataset presented with NeRF and by 60% on a challenging multiscale variant of that dataset that we present. Mip-NeRF is also able to match the accuracy of a brute-force supersampled NeRF on our multiscale dataset while being 22× faster.
translated by 谷歌翻译
Recent efforts in Neural Rendering Fields (NeRF) have shown impressive results on novel view synthesis by utilizing implicit neural representation to represent 3D scenes. Due to the process of volumetric rendering, the inference speed for NeRF is extremely slow, limiting the application scenarios of utilizing NeRF on resource-constrained hardware, such as mobile devices. Many works have been conducted to reduce the latency of running NeRF models. However, most of them still require high-end GPU for acceleration or extra storage memory, which is all unavailable on mobile devices. Another emerging direction utilizes the neural light field (NeLF) for speedup, as only one forward pass is performed on a ray to predict the pixel color. Nevertheless, to reach a similar rendering quality as NeRF, the network in NeLF is designed with intensive computation, which is not mobile-friendly. In this work, we propose an efficient network that runs in real-time on mobile devices for neural rendering. We follow the setting of NeLF to train our network. Unlike existing works, we introduce a novel network architecture that runs efficiently on mobile devices with low latency and small size, i.e., saving $15\times \sim 24\times$ storage compared with MobileNeRF. Our model achieves high-resolution generation while maintaining real-time inference for both synthetic and real-world scenes on mobile devices, e.g., $18.04$ms (iPhone 13) for rendering one $1008\times756$ image of real 3D scenes. Additionally, we achieve similar image quality as NeRF and better quality than MobileNeRF (PSNR $26.15$ vs. $25.91$ on the real-world forward-facing dataset).
translated by 谷歌翻译
神经辐射场(NERFS)表现出惊人的能力,可以从新颖的观点中综合3D场景的图像。但是,他们依赖于基于射线行进的专门体积渲染算法,这些算法与广泛部署的图形硬件的功能不匹配。本文介绍了基于纹理多边形的新的NERF表示形式,该表示可以有效地与标准渲染管道合成新型图像。 NERF表示为一组多边形,其纹理代表二进制不相处和特征向量。用Z-Buffer对多边形的传统渲染产生了每个像素的图像,该图像由在片段着色器中运行的小型,观点依赖的MLP来解释,以产生最终的像素颜色。这种方法使NERF可以使用传统的Polygon栅格化管道渲染,该管道提供了庞大的像素级并行性,从而在包括移动电话在内的各种计算平台上实现了交互式帧速率。
translated by 谷歌翻译
我们探索了基于神经光场表示的几种新颖观点合成的新策略。给定目标摄像头姿势,隐式神经网络将每个射线映射到其目标像素的颜色。该网络的条件是根据来自显式3D特征量的粗量渲染产生的本地射线特征。该卷是由使用3D Convnet的输入图像构建的。我们的方法在基于最先进的神经辐射场竞争方面,在合成和真实MVS数据上实现了竞争性能,同时提供了100倍的渲染速度。
translated by 谷歌翻译
标量和矢量场的神经近似(例如签名距离函数和辐射场)已成为准确的高质量表示。最先进的结果是通过从可训练的特征网格中进行查找的调节来获得的,这些近似是按照学习任务的一部分,并允许较小,更有效的神经网络。不幸的是,与独立的神经网络模型相比,这些特征网格通常以明显增加的记忆消耗成本。我们提出了一种词典方法,用于压缩此类特征网格,将其内存消耗降低至100倍,并允许多分辨率表示,这对于核心外流很有用。我们将词典优化作为矢量定量的自动码头问题提出,使我们能够在没有直接监督以及具有动态拓扑和结构的空间中学习端到端离散的神经表示。我们的源代码将在https://github.com/nv-tlabs/vqad上找到。
translated by 谷歌翻译
我们提出了一种可区分的渲染算法,以进行有效的新型视图合成。通过偏离基于音量的表示,支持学习点表示,我们在训练和推理方面的内存和运行时范围内改进了现有方法的数量级。该方法从均匀采样的随机点云开始,并使用基于可区分的SPLAT渲染器来发展模型以匹配一组输入图像,从而学习了每点位置和观看依赖性外观。在训练和推理中,我们的方法比NERF快300倍,质量只有边缘牺牲,而在静态场景中使用少于10 〜MB的记忆。对于动态场景,我们的方法比Stnerf训练两个数量级,并以接近互动速率渲染,同时即使在不施加任何时间固定的正则化合物的情况下保持较高的图像质量和时间连贯性。
translated by 谷歌翻译
我们介绍了Plenoxels(plenoptic voxels),是一种光电型观测合成系统。Plenoxels表示作为具有球形谐波的稀疏3D网格的场景。该表示可以通过梯度方法和正则化从校准图像进行优化,而没有任何神经元件。在标准,基准任务中,Plenoxels优化了比神经辐射场更快的两个数量级,无需视觉质量损失。
translated by 谷歌翻译
我们提出了HRF-NET,这是一种基于整体辐射场的新型视图合成方法,该方法使用一组稀疏输入来呈现新视图。最近的概括视图合成方法还利用了光辉场,但渲染速度不是实时的。现有的方法可以有效地训练和呈现新颖的观点,但它们无法概括地看不到场景。我们的方法解决了用于概括视图合成的实时渲染问题,并由两个主要阶段组成:整体辐射场预测指标和基于卷积的神经渲染器。该架构不仅基于隐式神经场的一致场景几何形状,而且还可以使用单个GPU有效地呈现新视图。我们首先在DTU数据集的多个3D场景上训练HRF-NET,并且网络只能仅使用光度损耗就看不见的真实和合成数据产生合理的新视图。此外,我们的方法可以利用单个场景的密集参考图像集来产生准确的新颖视图,而无需依赖其他明确表示,并且仍然保持了预训练模型的高速渲染。实验结果表明,HRF-NET优于各种合成和真实数据集的最先进的神经渲染方法。
translated by 谷歌翻译
神经辐射场(NERF)是数据驱动3D重建中的流行方法。鉴于其简单性和高质量的渲染,正在开发许多NERF应用程序。但是,NERF的大量的速度很大。许多尝试如何加速NERF培训和推理,包括复杂的代码级优化和缓存,使用复杂的数据结构以及通过多任务和元学习的摊销。在这项工作中,我们通过NERF之前通过经典技术镜头重新审视NERF的基本构建块。我们提出了Voxel-Accelated Nerf(VaxnerF),与Visual Hull集成了Nerf,一种经典的3D重建技术,只需要每张图像的二进制前景背景像素标签。可视船体,可在大约10秒内优化,可以提供粗略的现场分离,以省略NERF中的大量网络评估。我们在流行的JAXNERF Codebase提供了一个干净的全力验光,基于JAX的实现,其仅包括大约30行的代码更改和模块化视觉船体子程序,并在高度表现的JAXNERF之上实现了大约2-8倍的速度学习基线具有零劣化呈现质量。具有足够的计算,这有效地将单位训练从小时到30分钟缩小到30分钟。我们希望VAXNERF - 一种仔细组合具有深入方法的经典技术(可谓更换它) - 可以赋予并加速新的NERF扩展和应用,以其简单,可移植性和可靠的性能收益。代码在https://github.com/naruya/vaxnerf提供。
translated by 谷歌翻译
神经辐射场(NERF)在建模3D对象和受控场景中取得了出色的性能,通常在单一的范围内。在这项工作中,我们首次尝试将NERF带到城市规模,观点包括卫星级,旨在捕获一个城市的概览,到地面图像,显示架构的复杂细节。到现场的相机距离的广泛跨度产生了具有不同细节水平和空间覆盖率的多尺度数据,这对Vanilla Nerf产生了极大的挑战,并将其偏向于受损结果。为了解决这些问题,我们介绍了CitynerF,这是一个渐进式学习范例,可以同步地增长NERF模型和训练。从拟合浅景层的遥远的视图开始,随着训练的进展,附加新的块以在越来越近的视图中适应新兴细节。该策略有效地激活了位置编码中的高频信道,并将更复杂的细节展开,因为训练进行了。我们展示了CitalnerF在模拟各种城市规模场景中的优越性,具有巨大的不同视图,以及在不同级别的细节中渲染视图的支持。
translated by 谷歌翻译
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multiview posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods. 1
translated by 谷歌翻译
神经辐射场(NERF)在代表3D场景和合成新颖视图中示出了很大的潜力,但是在推理阶段的NERF的计算开销仍然很重。为了减轻负担,我们进入了NERF的粗细分,分层采样过程,并指出粗阶段可以被我们命名神经样本场的轻量级模块代替。所提出的示例场地图光线进入样本分布,可以将其转换为点坐标并进料到radiance字段以进行体积渲染。整体框架被命名为Neusample。我们在现实合成360 $ ^ {\ circ} $和真正的前瞻性,两个流行的3D场景集上进行实验,并表明Neusample在享受更快推理速度时比NERF实现更好的渲染质量。Neusample进一步压缩,以提出的样品场提取方法朝向质量和速度之间的更好的权衡。
translated by 谷歌翻译