The rendering procedure used by neural radiance fields (NeRF) samples a scene with a single ray per pixel and may therefore produce renderings that are excessively blurred or aliased when training or testing images observe scene content at different resolutions. The straightforward solution of supersampling by rendering with multiple rays per pixel is impractical for NeRF, because rendering each ray requires querying a multilayer perceptron hundreds of times. Our solution, which we call "mip-NeRF" (à la "mipmap"), extends NeRF to represent the scene at a continuously-valued scale. By efficiently rendering anti-aliased conical frustums instead of rays, mip-NeRF reduces objectionable aliasing artifacts and significantly improves NeRF's ability to represent fine details, while also being 7% faster than NeRF and half the size. Compared to NeRF, mip-NeRF reduces average error rates by 17% on the dataset presented with NeRF and by 60% on a challenging multiscale variant of that dataset that we present. Mip-NeRF is also able to match the accuracy of a brute-force supersampled NeRF on our multiscale dataset while being 22× faster.
translated by 谷歌翻译
虽然神经辐射场(NERF)已经证明了令人印象深刻的视图合成结果对物体和小型空间区域的结果,但它们在“无界”场景上挣扎,其中相机可以在任何方向上点,并且内容在任何距离处都存在。在此设置中,现有的形式的类似形式模型通常会产生模糊或低分辨率渲染(由于附近和远处物体的不平衡细节和规模),慢慢训练,并且由于任务的固有歧义而可能表现出伪影从一小部分图像重建大场景。我们介绍了MIP-NERF(一个NERF变体,用于解决采样和混叠的NERF变体),其使用非线性场景参数化,在线蒸馏和基于新的失真的常规程序来克服无限性场景所呈现的挑战。我们的模型,我们将“MIP-NERF 360”为瞄准相机围绕一点旋转360度的瞄准场景,与MIP NERF相比将平均平方误差减少54%,并且能够产生逼真的合成视图和用于高度复杂,无限性的现实景区的详细深度图。
translated by 谷歌翻译
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (nonconvolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
translated by 谷歌翻译
神经辐射场(NERF)是一种普遍的视图综合技术,其表示作为连续体积函数的场景,由多层的感知来参数化,其提供每个位置处的体积密度和视图相关的发射辐射。虽然基于NERF的技术在代表精细的几何结构时,具有平稳变化的视图依赖性外观,但它们通常无法精确地捕获和再现光泽表面的外观。我们通过引入Ref-nerf来解决这些限制,该ref-nerf替换了nerf的视图依赖性输出辐射的参数化,使用反射辐射的表示和使用空间不同场景属性的集合来构造该函数的表示。我们展示了与正常载体上的规范器一起,我们的模型显着提高了镜面反射的现实主义和准确性。此外,我们表明我们的模型的外向光线的内部表示是可解释的,可用于场景编辑。
translated by 谷歌翻译
神经辐射场(NERFS)产生最先进的视图合成结果。然而,它们慢渲染,需要每像素数百个网络评估,以近似卷渲染积分。将nerfs烘烤到明确的数据结构中实现了有效的渲染,但导致内存占地面积的大幅增加,并且在许多情况下,质量降低。在本文中,我们提出了一种新的神经光场表示,相反,相反,紧凑,直接预测沿线的集成光线。我们的方法支持使用每个像素的单个网络评估,用于小基线光场数据集,也可以应用于每个像素的几个评估的较大基线。在我们的方法的核心,是一个光线空间嵌入网络,将4D射线空间歧管映射到中间可间可动子的潜在空间中。我们的方法在诸如斯坦福光场数据集等密集的前置数据集中实现了最先进的质量。此外,对于带有稀疏输入的面对面的场景,我们可以在质量方面实现对基于NERF的方法具有竞争力的结果,同时提供更好的速度/质量/内存权衡,网络评估较少。
translated by 谷歌翻译
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multiview posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods. 1
translated by 谷歌翻译
潜水员在NERF的关键思想和其变体 - 密度模型和体积渲染的关键思想中建立 - 学习可以从少量图像实际渲染的3D对象模型。与所有先前的NERF方法相比,潜水员使用确定性而不是体积渲染积分的随机估计。潜水员的表示是基于体素的功能领域。为了计算卷渲染积分,将光线分为间隔,每个体素;使用MLP的每个间隔的特征估计体渲染积分的组件,并且组件聚合。结果,潜水员可以呈现其他集成商错过的薄半透明结构。此外,潜水员的表示与其他这样的方法相比相对暴露的语义 - 在体素空间中的运动特征向量导致自然编辑。对当前最先进的方法的广泛定性和定量比较表明,潜水员产生(1)在最先进的质量或高于最先进的质量,(2)的情况下非常小而不会被烘烤,(3)在不被烘烤的情况下渲染非常快,并且(4)可以以自然方式编辑。
translated by 谷歌翻译
综合照片 - 现实图像和视频是计算机图形的核心,并且是几十年的研究焦点。传统上,使用渲染算法(如光栅化或射线跟踪)生成场景的合成图像,其将几何形状和材料属性的表示为输入。统称,这些输入定义了实际场景和呈现的内容,并且被称为场景表示(其中场景由一个或多个对象组成)。示例场景表示是具有附带纹理的三角形网格(例如,由艺术家创建),点云(例如,来自深度传感器),体积网格(例如,来自CT扫描)或隐式曲面函数(例如,截短的符号距离)字段)。使用可分辨率渲染损耗的观察结果的这种场景表示的重建被称为逆图形或反向渲染。神经渲染密切相关,并将思想与经典计算机图形和机器学习中的思想相结合,以创建用于合成来自真实观察图像的图像的算法。神经渲染是朝向合成照片现实图像和视频内容的目标的跨越。近年来,我们通过数百个出版物显示了这一领域的巨大进展,这些出版物显示了将被动组件注入渲染管道的不同方式。这种最先进的神经渲染进步的报告侧重于将经典渲染原则与学习的3D场景表示结合的方法,通常现在被称为神经场景表示。这些方法的一个关键优势在于它们是通过设计的3D-一致,使诸如新颖的视点合成捕获场景的应用。除了处理静态场景的方法外,我们还涵盖了用于建模非刚性变形对象的神经场景表示...
translated by 谷歌翻译
本文提出了一种用等值的全向图像重建神经辐射场的方法。带有辐射场的隐式神经场景表示可以在有限的空间区域内连续重建场景的3D形状。但是,培训商用PC硬件的完全隐式表示需要大量时间和计算资源(15 $ \ sim $ 20小时每场景20小时)。因此,我们提出了一种显着加速此过程的方法(每个场景20 $ \ sim $ 40分钟)。我们采用特征体素,而不是使用辐射场重建的光线的完全隐式表示,而是在张量中包含密度和颜色特征的特征体素。考虑全向等值输入和相机布局,我们使用球形素化来表示表示而不是立方表示。我们的体素化方法可以平衡内部场景和外部场景的重建质量。此外,我们在颜色特征上采用了与轴对准的位置编码方法,以提高总图像质量。我们的方法可以在随机摄像头姿势上实现满足合成数据集的经验性能。此外,我们使用包含复杂几何形状并实现最先进性能的真实场景测试我们的方法。我们的代码和完整数据集将与纸质出版物同时发布。
translated by 谷歌翻译
神经场景表示,例如神经辐射场(NERF),基于训练多层感知器(MLP),使用一组具有已知姿势的彩色图像。现在,越来越多的设备产生RGB-D(颜色 +深度)信息,这对于各种任务非常重要。因此,本文的目的是通过将深度信息与颜色图像结合在一起,研究这些有希望的隐式表示可以进行哪些改进。特别是,最近建议的MIP-NERF方法使用圆锥形的圆丝而不是射线进行音量渲染,它使人们可以考虑具有距离距离摄像头中心距离的像素的不同区域。所提出的方法还模拟了深度不确定性。这允许解决基于NERF的方法的主要局限性,包括提高几何形状的准确性,减少伪像,更快的训练时间和缩短预测时间。实验是在众所周知的基准场景上进行的,并且比较在场景几何形状和光度重建中的准确性提高,同时将训练时间减少了3-5次。
translated by 谷歌翻译
基于坐标的网络成为3D表示和场景重建的强大工具。这些网络训练以将连续输入坐标映射到每个点处的信号的值。尽管如此,当前的架构是黑色盒子:不能轻易分析它们的光谱特性,并且在无监督点处的行为难以预测。此外,这些网络通常接受训练以以单个刻度表示信号,并且如此天真的下采样或上采样导致伪像。我们引入带限量坐标网络(BACON),具有分析傅里叶谱的网络架构。培根在无监督点处具有可预测的行为,可以基于所代表信号的光谱特性设计,并且可以在没有明确的监督的情况下代表多个尺度的信号。我们向培根展示用于使用符号距离功能的图像,辐射字段和3D场景的多尺度神经表示的培根,并表明它在可解释性和质量方面优于传统的单尺度坐标网络。
translated by 谷歌翻译
We present a method that takes as input a set of images of a scene illuminated by unconstrained known lighting, and produces as output a 3D representation that can be rendered from novel viewpoints under arbitrary lighting conditions. Our method represents the scene as a continuous volumetric function parameterized as MLPs whose inputs are a 3D location and whose outputs are the following scene properties at that input location: volume density, surface normal, material parameters, distance to the first surface intersection in any direction, and visibility of the external environment in any direction. Together, these allow us to render novel views of the object under arbitrary lighting, including indirect illumination effects. The predicted visibility and surface intersection fields are critical to our model's ability to simulate direct and indirect illumination during training, because the brute-force techniques used by prior work are intractable for lighting conditions outside of controlled setups with a single light. Our method outperforms alternative approaches for recovering relightable 3D scene representations, and performs well in complex lighting settings that have posed a significant challenge to prior work.
translated by 谷歌翻译
We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800×800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating the NeRF into a PlenOctree. In order to preserve viewdependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss, which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step can be used to reduce the training time, as we no longer need to wait for the NeRF training to converge fully. Our real-time neural rendering approach may potentially enable new applications such as 6-DOF industrial and product visualizations, as well as next generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as well; please visit the project page for the interactive online demo, as well as video and code: https://alexyu. net/plenoctrees.
translated by 谷歌翻译
We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs. We build on Neural Radiance Fields (NeRF), which uses the weights of a multilayer perceptron to model the density and color of a scene as a function of 3D coordinates. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders. We introduce a series of extensions to NeRF to address these issues, thereby enabling accurate reconstructions from unstructured image collections taken from the internet. We apply our system, dubbed NeRF-W, to internet photo collections of famous landmarks, and demonstrate temporally consistent novel view renderings that are significantly closer to photorealism than the prior state of the art.
translated by 谷歌翻译
新型视图综合的古典光场渲染可以准确地再现视图依赖性效果,例如反射,折射和半透明,但需要一个致密的视图采样的场景。基于几何重建的方法只需要稀疏的视图,但不能准确地模拟非兰伯语的效果。我们介绍了一个模型,它结合了强度并减轻了这两个方向的局限性。通过在光场的四维表示上操作,我们的模型学会准确表示依赖视图效果。通过在训练和推理期间强制执行几何约束,从稀疏的视图集中毫无屏蔽地学习场景几何。具体地,我们介绍了一种基于两级变压器的模型,首先沿着ePipoll线汇总特征,然后沿参考视图聚合特征以产生目标射线的颜色。我们的模型在多个前进和360 {\ DEG}数据集中优于最先进的,具有较大的差别依赖变化的场景更大的边缘。
translated by 谷歌翻译
神经辐射场(NERF)最近在新型视图合成中取得了令人印象深刻的结果。但是,以前的NERF作品主要关注以对象为中心的方案。在这项工作中,我们提出了360ROAM,这是一种新颖的场景级NERF系统,可以实时合成大型室内场景的图像并支持VR漫游。我们的系统首先从多个输入$ 360^\ circ $图像构建全向神经辐射场360NERF。然后,我们逐步估算一个3D概率的占用图,该概率占用图代表了空间密度形式的场景几何形状。跳过空的空间和上采样占据的体素本质上可以使我们通过以几何学意识的方式使用360NERF加速量渲染。此外,我们使用自适应划分和扭曲策略来减少和调整辐射场,以进一步改进。从占用地图中提取的场景的平面图可以为射线采样提供指导,并促进现实的漫游体验。为了显示我们系统的功效,我们在各种场景中收集了$ 360^\ Circ $图像数据集并进行广泛的实验。基线之间的定量和定性比较说明了我们在复杂室内场景的新型视图合成中的主要表现。
translated by 谷歌翻译
Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a diffentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website: https://github.com/facebookresearch/NSVF.
translated by 谷歌翻译
我们介绍了Plenoxels(plenoptic voxels),是一种光电型观测合成系统。Plenoxels表示作为具有球形谐波的稀疏3D网格的场景。该表示可以通过梯度方法和正则化从校准图像进行优化,而没有任何神经元件。在标准,基准任务中,Plenoxels优化了比神经辐射场更快的两个数量级,无需视觉质量损失。
translated by 谷歌翻译
神经辐射场(NERF)是数据驱动3D重建中的流行方法。鉴于其简单性和高质量的渲染,正在开发许多NERF应用程序。但是,NERF的大量的速度很大。许多尝试如何加速NERF培训和推理,包括复杂的代码级优化和缓存,使用复杂的数据结构以及通过多任务和元学习的摊销。在这项工作中,我们通过NERF之前通过经典技术镜头重新审视NERF的基本构建块。我们提出了Voxel-Accelated Nerf(VaxnerF),与Visual Hull集成了Nerf,一种经典的3D重建技术,只需要每张图像的二进制前景背景像素标签。可视船体,可在大约10秒内优化,可以提供粗略的现场分离,以省略NERF中的大量网络评估。我们在流行的JAXNERF Codebase提供了一个干净的全力验光,基于JAX的实现,其仅包括大约30行的代码更改和模块化视觉船体子程序,并在高度表现的JAXNERF之上实现了大约2-8倍的速度学习基线具有零劣化呈现质量。具有足够的计算,这有效地将单位训练从小时到30分钟缩小到30分钟。我们希望VAXNERF - 一种仔细组合具有深入方法的经典技术(可谓更换它) - 可以赋予并加速新的NERF扩展和应用,以其简单,可移植性和可靠的性能收益。代码在https://github.com/naruya/vaxnerf提供。
translated by 谷歌翻译
NeRF synthesizes novel views of a scene with unprecedented quality by fitting a neural radiance field to RGB images. However, NeRF requires querying a deep Multi-Layer Perceptron (MLP) millions of times, leading to slow rendering times, even on modern GPUs. In this paper, we demonstrate that real-time rendering is possible by utilizing thousands of tiny MLPs instead of one single large MLP. In our setting, each individual MLP only needs to represent parts of the scene, thus smaller and faster-to-evaluate MLPs can be used. By combining this divide-and-conquer strategy with further optimizations, rendering is accelerated by three orders of magnitude compared to the original NeRF model without incurring high storage costs. Further, using teacher-student distillation for training, we show that this speed-up can be achieved without sacrificing visual quality.
translated by 谷歌翻译