最近,神经辐射场(NERF)在重建3D场景并从一组稀疏的2D图像中综合新视图方面表现出了有希望的表演。尽管有效,但NERF的性能受到训练样品质量的很大影响。由于现场有限的图像,Nerf无法很好地概括到新颖的观点,并可能崩溃到未观察到的区域中的琐碎解决方案。这使得在资源约束的情况下不切实际。在本文中,我们提出了一个新颖的学习框架Activenerf,旨在模拟一个3D场景,并具有限制的输入预算。具体而言,我们首先将不确定性估计纳入NERF模型,该模型在很少的观察下确保了鲁棒性,并提供了NERF如何理解场景的解释。在此基础上,我们建议根据积极学习方案将现有的培训设置补充新捕获的样本。通过评估给定新输入的不确定性的降低,我们选择了带来最多信息增益的样本。这样,可以通过最少的额外资源来提高新型视图合成的质量。广泛的实验验证了我们模型在现实和合成场景上的性能,尤其是在稀缺的训练数据中。代码将在\ url {https://github.com/leaplabthu/activenerf}上发布。
translated by 谷歌翻译
神经辐射场(NERF)在代表3D场景和合成新颖视图中示出了很大的潜力,但是在推理阶段的NERF的计算开销仍然很重。为了减轻负担,我们进入了NERF的粗细分,分层采样过程,并指出粗阶段可以被我们命名神经样本场的轻量级模块代替。所提出的示例场地图光线进入样本分布,可以将其转换为点坐标并进料到radiance字段以进行体积渲染。整体框架被命名为Neusample。我们在现实合成360 $ ^ {\ circ} $和真正的前瞻性,两个流行的3D场景集上进行实验,并表明Neusample在享受更快推理速度时比NERF实现更好的渲染质量。Neusample进一步压缩,以提出的样品场提取方法朝向质量和速度之间的更好的权衡。
translated by 谷歌翻译
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multiview posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods. 1
translated by 谷歌翻译
我们提出了一种基于神经隐式表示的少量新型视图综合信息 - 理论正规化技术。所提出的方法最小化由于在每个光线中强制密度的熵约束而发生的潜在的重建不一致。另外,当从几乎冗余的观点获取所有训练图像时,为了减轻潜在的退化问题,我们还通过限制来自一对略微不同观点的光线的信息增益来将空间平滑度约束纳入估计的图像。我们的算法的主要思想是使重建的场景沿各个光线紧凑,并在附近的光线上一致。所提出的常规方基于Nerf以直接的方式插入大部分现有的神经体积渲染技术。尽管其简单性,但是,与现有的神经观察合成方法通过大量标准基准测试的现有神经观察方法相比,我们实现了一致的性能。我们的项目网站可用于\ url {http://cvlab.snu.ac.kr/research/infonerf}。
translated by 谷歌翻译
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (nonconvolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
translated by 谷歌翻译
我们呈现高动态范围神经辐射字段(HDR-NERF),以从一组低动态范围(LDR)视图的HDR辐射率字段与不同的曝光。使用HDR-NERF,我们能够在不同的曝光下生成新的HDR视图和新型LDR视图。我们方法的关键是模拟物理成像过程,该过程决定了场景点的辐射与具有两个隐式功能的LDR图像中的像素值转换为:RADIACE字段和音调映射器。辐射场对场景辐射(值在0到+末端之间的值变化),其通过提供相应的射线源和光线方向来输出光线的密度和辐射。 TONE MAPPER模拟映射过程,即在相机传感器上击中的光线变为像素值。通过将辐射和相应的曝光时间送入音调映射器来预测光线的颜色。我们使用经典的卷渲染技术将输出辐射,颜色和密度投影为HDR和LDR图像,同时只使用输入的LDR图像作为监控。我们收集了一个新的前瞻性的HDR数据集,以评估所提出的方法。综合性和现实世界场景的实验结果验证了我们的方法不仅可以准确控制合成视图的曝光,还可以用高动态范围呈现视图。
translated by 谷歌翻译
由于其显着的合成质量,最近,神经辐射场(NERF)最近对3D场景重建和新颖的视图合成进行了相当大的关注。然而,由散焦或运动引起的图像模糊,这通常发生在野外的场景中,显着降低了其重建质量。为了解决这个问题,我们提出了DeBlur-nerf,这是一种可以从模糊输入恢复尖锐的nerf的第一种方法。我们采用逐合成方法来通过模拟模糊过程来重建模糊的视图,从而使NERF对模糊输入的鲁棒。该仿真的核心是一种新型可变形稀疏内核(DSK)模块,其通过在每个空间位置变形规范稀疏内核来模拟空间变形模糊内核。每个内核点的射线起源是共同优化的,受到物理模糊过程的启发。该模块作为MLP参数化,具有能够概括为各种模糊类型。联合优化NERF和DSK模块允许我们恢复尖锐的NERF。我们证明我们的方法可用于相机运动模糊和散焦模糊:真实场景中的两个最常见的模糊。合成和现实世界数据的评估结果表明,我们的方法优于几个基线。合成和真实数据集以及源代码将公开可用于促进未来的研究。
translated by 谷歌翻译
在本文中,我们为复杂场景进行了高效且强大的深度学习解决方案。在我们的方法中,3D场景表示为光场,即,一组光线,每组在到达图像平面时具有相应的颜色。对于高效的新颖视图渲染,我们采用了光场的双面参数化,其中每个光线的特征在于4D参数。然后,我们将光场配向作为4D函数,即将4D坐标映射到相应的颜色值。我们训练一个深度完全连接的网络以优化这种隐式功能并记住3D场景。然后,特定于场景的模型用于综合新颖视图。与以前需要密集的视野的方法不同,需要密集的视野采样来可靠地呈现新颖的视图,我们的方法可以通过采样光线来呈现新颖的视图并直接从网络查询每种光线的颜色,从而使高质量的灯场呈现稀疏集合训练图像。网络可以可选地预测每光深度,从而使诸如自动重新焦点的应用。我们的小说视图合成结果与最先进的综合结果相当,甚至在一些具有折射和反射的具有挑战性的场景中优越。我们在保持交互式帧速率和小的内存占地面积的同时实现这一点。
translated by 谷歌翻译
Input: 3 views of held-out scene NeRF pixelNeRF Output: Rendered new views Input Novel views Input Novel views Input Novel views Figure 1: NeRF from one or few images. We present pixelNeRF, a learning framework that predicts a Neural Radiance Field (NeRF) representation from a single (top) or few posed images (bottom). PixelNeRF can be trained on a set of multi-view images, allowing it to generate plausible novel view synthesis from very few input images without test-time optimization (bottom left). In contrast, NeRF has no generalization capabilities and performs poorly when only three input views are available (bottom right).
translated by 谷歌翻译
我们提出了一种基于神经辐射场(NERF)的单个$ 360^\ PANORAMA图像合成新视图的方法。在类似环境中的先前研究依赖于多层感知的邻居插值能力来完成由遮挡引起的丢失区域,这导致其预测中的伪像。我们提出了360Fusionnerf,这是一个半监督的学习框架,我们介绍几何监督和语义一致性,以指导渐进式培训过程。首先,将输入图像重新投影至$ 360^\ Circ $图像,并在其他相机位置提取辅助深度图。除NERF颜色指导外,深度监督还改善了合成视图的几何形状。此外,我们引入了语义一致性损失,鼓励新观点的现实渲染。我们使用预先训练的视觉编码器(例如剪辑)提取这些语义功能,这是一个视觉变压器,经过数以千计的不同2D照片,并通过自然语言监督从网络中挖掘出来。实验表明,我们提出的方法可以在保留场景的特征的同时产生未观察到的区域的合理完成。 360fusionnerf在各种场景中接受培训时,转移到合成结构3D数据集(PSNR〜5%,SSIM〜3%lpips〜13%)时,始终达到最先进的性能,SSIM〜3%LPIPS〜9%)和replica360数据集(PSNR〜8%,SSIM〜2%LPIPS〜18%)。
translated by 谷歌翻译
Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a diffentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website: https://github.com/facebookresearch/NSVF.
translated by 谷歌翻译
We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs. We build on Neural Radiance Fields (NeRF), which uses the weights of a multilayer perceptron to model the density and color of a scene as a function of 3D coordinates. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders. We introduce a series of extensions to NeRF to address these issues, thereby enabling accurate reconstructions from unstructured image collections taken from the internet. We apply our system, dubbed NeRF-W, to internet photo collections of famous landmarks, and demonstrate temporally consistent novel view renderings that are significantly closer to photorealism than the prior state of the art.
translated by 谷歌翻译
神经辐射场(NERF)最近在新型视图合成中取得了令人印象深刻的结果。但是,以前的NERF作品主要关注以对象为中心的方案。在这项工作中,我们提出了360ROAM,这是一种新颖的场景级NERF系统,可以实时合成大型室内场景的图像并支持VR漫游。我们的系统首先从多个输入$ 360^\ circ $图像构建全向神经辐射场360NERF。然后,我们逐步估算一个3D概率的占用图,该概率占用图代表了空间密度形式的场景几何形状。跳过空的空间和上采样占据的体素本质上可以使我们通过以几何学意识的方式使用360NERF加速量渲染。此外,我们使用自适应划分和扭曲策略来减少和调整辐射场,以进一步改进。从占用地图中提取的场景的平面图可以为射线采样提供指导,并促进现实的漫游体验。为了显示我们系统的功效,我们在各种场景中收集了$ 360^\ Circ $图像数据集并进行广泛的实验。基线之间的定量和定性比较说明了我们在复杂室内场景的新型视图合成中的主要表现。
translated by 谷歌翻译
神经辐射场(NERFS)产生最先进的视图合成结果。然而,它们慢渲染,需要每像素数百个网络评估,以近似卷渲染积分。将nerfs烘烤到明确的数据结构中实现了有效的渲染,但导致内存占地面积的大幅增加,并且在许多情况下,质量降低。在本文中,我们提出了一种新的神经光场表示,相反,相反,紧凑,直接预测沿线的集成光线。我们的方法支持使用每个像素的单个网络评估,用于小基线光场数据集,也可以应用于每个像素的几个评估的较大基线。在我们的方法的核心,是一个光线空间嵌入网络,将4D射线空间歧管映射到中间可间可动子的潜在空间中。我们的方法在诸如斯坦福光场数据集等密集的前置数据集中实现了最先进的质量。此外,对于带有稀疏输入的面对面的场景,我们可以在质量方面实现对基于NERF的方法具有竞争力的结果,同时提供更好的速度/质量/内存权衡,网络评估较少。
translated by 谷歌翻译
神经场景表示,例如神经辐射场(NERF),基于训练多层感知器(MLP),使用一组具有已知姿势的彩色图像。现在,越来越多的设备产生RGB-D(颜色 +深度)信息,这对于各种任务非常重要。因此,本文的目的是通过将深度信息与颜色图像结合在一起,研究这些有希望的隐式表示可以进行哪些改进。特别是,最近建议的MIP-NERF方法使用圆锥形的圆丝而不是射线进行音量渲染,它使人们可以考虑具有距离距离摄像头中心距离的像素的不同区域。所提出的方法还模拟了深度不确定性。这允许解决基于NERF的方法的主要局限性,包括提高几何形状的准确性,减少伪像,更快的训练时间和缩短预测时间。实验是在众所周知的基准场景上进行的,并且比较在场景几何形状和光度重建中的准确性提高,同时将训练时间减少了3-5次。
translated by 谷歌翻译
Point of View & TimeFigure 1: We propose D-NeRF, a method for synthesizing novel views, at an arbitrary point in time, of dynamic scenes with complex non-rigid geometries. We optimize an underlying deformable volumetric function from a sparse set of input monocular views without the need of ground-truth geometry nor multi-view images. The figure shows two scenes under variable points of view and time instances synthesised by the proposed model.
translated by 谷歌翻译
神经辐射场(NERF)及其变体在代表3D场景和合成照片现实的小说视角方面取得了巨大成功。但是,它们通常基于针孔摄像头模型,并假设全焦点输入。这限制了它们的适用性,因为从现实世界中捕获的图像通常具有有限的场地(DOF)。为了减轻此问题,我们介绍了DOF-NERF,这是一种新型的神经渲染方法,可以处理浅的DOF输入并可以模拟DOF效应。特别是,它扩展了NERF,以模拟按照几何光学的原理模拟镜头的光圈。这样的物理保证允许DOF-NERF使用不同的焦点配置操作视图。 DOF-NERF受益于显式光圈建模,还可以通过调整虚拟光圈和焦点参数来直接操纵DOF效果。它是插件,可以插入基于NERF的框架中。关于合成和现实世界数据集的实验表明,DOF-NERF不仅在全焦点设置中与NERF相当,而且可以合成以浅DOF输入为条件的全焦点新型视图。还展示了DOF-nerf在DOF渲染上的有趣应用。源代码将在https://github.com/zijinwuzijin/dof-nerf上提供。
translated by 谷歌翻译
学习辐射场对新型视图综合显示出了显着的结果。学习过程通常会花费大量时间,这激发了最新方法,通过没有神经网络或使用更有效的数据结构来通过学习来加快学习过程。但是,这些专门设计的方法不适用于大多数基于辐射的方法的方法。为了解决此问题,我们引入了一项一般策略,以加快几乎所有基于辐射的方法的学习过程。我们的关键想法是通过在多视图卷渲染过程中拍摄较少的射线来减少冗余,这是几乎所有基于辐射的方法的基础。我们发现,在具有巨大色彩变化的像素上的射击不仅显着减轻了训练负担,而且几乎不会影响学到的辐射场的准确性。此外,我们还根据树中每个节点的平均渲染误差将每个视图自适应地细分为Quadtree,这使我们在更复杂的区域中动态射击更多的射线,并具有较大的渲染误差。我们在广泛使用的基准下使用不同的基于辐射的方法评估我们的方法。实验结果表明,我们的方法通过更快的训练获得了与最先进的可比精度。
translated by 谷歌翻译
We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800×800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating the NeRF into a PlenOctree. In order to preserve viewdependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss, which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step can be used to reduce the training time, as we no longer need to wait for the NeRF training to converge fully. Our real-time neural rendering approach may potentially enable new applications such as 6-DOF industrial and product visualizations, as well as next generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as well; please visit the project page for the interactive online demo, as well as video and code: https://alexyu. net/plenoctrees.
translated by 谷歌翻译
由于其简单性和最先进的性能,神经辐射场(NERF)被出现为新型视图综合任务的强大表示。虽然NERF可以在许多输入视图可用时产生看不见的观点的光静观渲染,但是当该数量减少时,其性能显着下降。我们观察到,稀疏输入方案中的大多数伪像是由估计场景几何中的错误引起的,并且在训练开始时通过不同的行为引起。我们通过规范从未观察的视点呈现的修补程序的几何和外观来解决这一点,并在训练期间退火光线采样空间。我们还使用规范化的流模型来规范未观察的视点的颜色。我们的车型不仅优于优化单个场景的其他方法,而是在许多情况下,还有条件模型,这些模型在大型多视图数据集上广泛预先培训。
translated by 谷歌翻译