Figure: (a) Source views; (b) MVS-NeRF, no fine-tuning; (c) MVS-NeRF, 6 min fine-tuning; (d) NeRF, 5.1 h optimization. SSIM values as extracted: 0.766, 0.923, 0.924.
* Equal contribution. Research done when Anpei Chen was in a remote internship with UCSD.
... generalizable radiance field reconstruction. Moreover, if dense images are captured, our estimated radiance field representation can be easily fine-tuned; this leads to fast per-scene reconstruction with higher rendering quality and substantially less optimization time than NeRF.
translated by 谷歌翻译
Volumetric neural rendering methods like NeRF generate high-quality view synthesis results but are optimized per scene, leading to prohibitive reconstruction time. On the other hand, deep multi-view stereo methods can quickly reconstruct scene geometry via direct network inference. Point-NeRF combines the advantages of these two approaches by using neural 3D point clouds, with associated neural features, to model a radiance field. Point-NeRF can be rendered efficiently by aggregating neural point features near scene surfaces in a ray marching-based rendering pipeline. Moreover, Point-NeRF can be initialized via direct inference of a pre-trained deep network to produce a neural point cloud; this point cloud can be fine-tuned to surpass the visual quality of NeRF with 30× faster training time. Point-NeRF can be combined with other 3D reconstruction methods and handles the errors and outliers in such methods via a novel pruning and growing mechanism. Experiments on the DTU, NeRF Synthetic, ScanNet, and Tanks and Temples datasets demonstrate that Point-NeRF can surpass existing methods and achieve state-of-the-art results.
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multiview posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods.
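We present HRF-Net, a novel view synthesis method based on holistic radiance fields that renders novel views from a sparse set of inputs. Recent generalizable view synthesis methods also leverage radiance fields, but their rendering speed is not real-time. Existing methods can train and render novel views efficiently, but they cannot generalize to unseen scenes. Our approach addresses the problem of real-time rendering for generalizable view synthesis and consists of two main stages: a holistic radiance field predictor and a convolution-based neural renderer. This architecture not only infers consistent scene geometry based on implicit neural fields but also renders new views efficiently using a single GPU. We first train HRF-Net on multiple 3D scenes of the DTU dataset, and the network can produce plausible novel views on unseen real and synthetic data using only photometric losses. Moreover, our method can leverage a denser set of reference images of a single scene to produce accurate novel views without relying on additional explicit representations, while still maintaining the high-speed rendering of the pre-trained model. Experimental results show that HRF-Net outperforms state-of-the-art neural rendering methods on various synthetic and real datasets.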
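This paper aims to reduce the rendering time of generalizable radiance fields. Some recent works equip neural radiance fields with image encoders and are able to generalize across scenes, which avoids per-scene optimization. However, their rendering process is generally very slow. A major factor is that they sample a large number of points in empty space when inferring radiance fields. In this paper, we present a hybrid scene representation that combines the best of implicit radiance fields and explicit depth maps for efficient rendering. Specifically, we first build a cascade cost volume to efficiently predict the coarse geometry of the scene. The coarse geometry allows us to sample only a few points near the scene surface and significantly improves the rendering speed. This process is fully differentiable, enabling us to jointly learn the depth prediction and radiance field networks from only RGB images. Experiments show that the proposed approach exhibits state-of-the-art performance on the DTU, Real Forward-facing, and NeRF Synthetic datasets, while being at least 50 times faster than previous generalizable radiance field methods. We also demonstrate the capability of our method to synthesize free-viewpoint videos of dynamic human performers in real time. The code will be available at https://zju3dv.github.io/enerf/.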
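The idea of using a coarse depth estimate to sample only a few points near the surface can be illustrated with a minimal sketch. It assumes a single ray with one predicted depth and a fixed interval half-width; the function names, the uniform spacing, and the numbers are hypothetical and are not taken from the paper's implementation.

```python
def uniform_samples(num_samples, near, far):
    """Baseline: spread samples evenly over the whole ray segment [near, far]."""
    step = (far - near) / (num_samples - 1)
    return [near + i * step for i in range(num_samples)]


def samples_near_surface(depth, half_width, num_samples, near, far):
    """Place samples only inside [depth - half_width, depth + half_width],
    clipped to the scene bounds [near, far]."""
    lo = max(near, depth - half_width)
    hi = min(far, depth + half_width)
    if num_samples == 1:
        return [0.5 * (lo + hi)]
    step = (hi - lo) / (num_samples - 1)
    return [lo + i * step for i in range(num_samples)]


# Hypothetical numbers: 128 samples over the full range versus 2 samples
# around a predicted depth of 4.0.
coarse = uniform_samples(128, near=2.0, far=6.0)
focused = samples_near_surface(depth=4.0, half_width=0.1,
                               num_samples=2, near=2.0, far=6.0)
```

In this toy setting the radiance network would be queried at 2 depths per ray instead of 128, which is where the speed-up in such a scheme would come from.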
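We present GeoNeRF, a generalizable photorealistic novel view synthesis method based on neural radiance fields. Our approach consists of two main stages: a geometry reasoner and a renderer. To render a novel view, the geometry reasoner first constructs cascaded cost volumes for each nearby source view. Then, using a Transformer-based attention mechanism and the cascaded cost volumes, the renderer infers geometry and appearance, and renders detailed images via classical volume rendering techniques. In particular, this architecture allows sophisticated occlusion reasoning, gathering information from consistent source views. Moreover, our method can easily be fine-tuned on a single scene, and renders results competitive with per-scene optimized neural rendering methods at a fraction of the computational cost. Experiments show that GeoNeRF outperforms state-of-the-art generalizable neural rendering models on various synthetic and real datasets. Lastly, with a slight modification to the geometry reasoner, we also propose an alternative model that adapts to RGBD images. This model directly exploits the depth information that is often available thanks to depth sensors. The implementation code will be publicly available.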
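Recent neural human representations can produce high-quality multi-view rendering but require dense multi-view inputs and costly training. They are hence largely limited to static models, as training on each frame is infeasible. We present HumanNeRF, a generalizable neural representation, for high-fidelity free-view synthesis of dynamic humans. Analogous to how IBRNet assists NeRF by avoiding per-scene training, HumanNeRF employs an aggregated pixel-alignment feature across multi-view inputs along with a pose-embedded non-rigid deformation field for tackling dynamic motions. The raw HumanNeRF can already produce reasonable renderings from sparse video inputs. To further improve the rendering quality, we augment our solution with an appearance blending module that combines the benefits of both neural volumetric rendering and neural texture blending. Extensive experiments on various multi-view dynamic human datasets demonstrate the generalizability and effectiveness of our approach in synthesizing photo-realistic free-view humans under challenging motions and with very sparse camera view inputs.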
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (nonconvolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
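The classic volume rendering step described above is commonly approximated by numerical quadrature along each ray: every sample contributes its color weighted by its own opacity and by the transmittance accumulated before it. The following is a minimal single-ray sketch using only the standard library; the function name and the toy densities and colors are illustrative assumptions, not code from the paper.

```python
import math


def composite_ray(densities, colors, deltas):
    """Quadrature approximation of the volume rendering integral for one ray.

    densities: per-sample volume density sigma_i (non-negative)
    colors:    per-sample RGB tuples c_i
    deltas:    distance covered by each sample, delta_i
    Returns (rgb, weights), where weights[i] = T_i * alpha_i.
    """
    transmittance = 1.0  # fraction of light that survives up to this sample
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for channel in range(3):
            rgb[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha
    return rgb, weights


# Toy ray: empty space, then a dense red surface, then a green sample behind it.
rgb, weights = composite_ray(
    densities=[0.0, 50.0, 50.0],
    colors=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    deltas=[0.5, 0.5, 0.5],
)
```

Because every operation here is differentiable with respect to the densities and colors, a photometric loss on the composited pixel can be backpropagated to whatever network predicts them, which matches the abstract's point that posed images alone suffice as supervision.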
We present TensoRF, a novel approach to model and reconstruct radiance fields. Unlike NeRF that purely uses MLPs, we model the radiance field of a scene as a 4D tensor, which represents a 3D voxel grid with per-voxel multi-channel features. Our central idea is to factorize the 4D scene tensor into multiple compact low-rank tensor components. We demonstrate that applying traditional CP decomposition -- that factorizes tensors into rank-one components with compact vectors -- in our framework leads to improvements over vanilla NeRF. To further boost performance, we introduce a novel vector-matrix (VM) decomposition that relaxes the low-rank constraints for two modes of a tensor and factorizes tensors into compact vector and matrix factors. Beyond superior rendering quality, our models with CP and VM decompositions lead to a significantly lower memory footprint in comparison to previous and concurrent works that directly optimize per-voxel features. Experimentally, we demonstrate that TensoRF with CP decomposition achieves fast reconstruction (<30 min) with better rendering quality and even a smaller model size (<4 MB) compared to NeRF. Moreover, TensoRF with VM decomposition further boosts rendering quality and outperforms previous state-of-the-art methods, while reducing the reconstruction time (<10 min) and retaining a compact model size (<75 MB).
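To make the CP factorization idea concrete, the sketch below rebuilds a small single-channel 3D grid from rank-one components and compares the number of stored values with a dense grid. It is a simplified illustration under stated assumptions: it omits the feature-channel mode and the VM decomposition, and the function names, grid resolution, and rank are hypothetical choices rather than values from the paper.

```python
def cp_reconstruct(factors_x, factors_y, factors_z):
    """Rebuild a dense X*Y*Z grid from R rank-one components.

    factors_x[r], factors_y[r], factors_z[r] are the three vectors of
    component r; entry (i, j, k) is the sum over r of their product.
    """
    rank = len(factors_x)
    nx, ny, nz = len(factors_x[0]), len(factors_y[0]), len(factors_z[0])
    return [[[sum(factors_x[r][i] * factors_y[r][j] * factors_z[r][k]
                  for r in range(rank))
              for k in range(nz)]
             for j in range(ny)]
            for i in range(nx)]


def parameter_counts(nx, ny, nz, rank):
    """Stored values for a dense grid versus its CP factors."""
    return nx * ny * nz, rank * (nx + ny + nz)


# Toy rank-2 example on a 2 x 3 x 2 grid (values are arbitrary).
fx = [[1.0, 2.0], [0.0, 1.0]]
fy = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
fz = [[1.0, 3.0], [1.0, 1.0]]
grid = cp_reconstruct(fx, fy, fz)

# Hypothetical resolution and rank, chosen only to show the scaling.
dense, factored = parameter_counts(300, 300, 300, 16)
```

For a 300-cubed grid the dense representation stores 27,000,000 values per channel while 16 CP components store 14,400, which illustrates why factorized models can have a much smaller memory footprint than per-voxel features.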
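We explore a new strategy for few-shot novel view synthesis based on a neural light field representation. Given a target camera pose, an implicit neural network maps each ray directly to the color of its target pixel. The network is conditioned on local ray features generated by coarse volumetric rendering from an explicit 3D feature volume. This volume is built from the input images using a 3D ConvNet. Our method achieves competitive performance on synthetic and real MVS data with respect to state-of-the-art neural radiance field based competitors, while offering 100 times faster rendering.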
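In this paper, we present an efficient and robust deep learning solution for novel view synthesis of complex scenes. In our approach, a 3D scene is represented as a light field, i.e., a set of rays, each of which has a corresponding color when it reaches the image plane. For efficient novel view rendering, we adopt a two-plane parameterization of the light field, where each ray is characterized by a 4D parameter. We then formulate the light field as a 4D function that maps 4D coordinates to the corresponding color values. We train a deep fully connected network to optimize this implicit function and memorize the 3D scene. The scene-specific model is then used to synthesize novel views. Unlike previous light field approaches, which require dense view sampling to reliably render novel views, our method renders novel views by sampling rays and querying the color of each ray directly from the network, thus enabling high-quality light field rendering from a sparser set of training images. The network can optionally predict per-ray depth, enabling applications such as auto refocus. Our novel view synthesis results are comparable to the state of the art, and even superior in some challenging scenes with refraction and reflection. We achieve this while maintaining an interactive frame rate and a small memory footprint.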
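A two-plane parameterization identifies a ray by the points where it crosses two parallel planes. The sketch below is a minimal illustration that assumes the planes are z = 0 and z = 1 in some chosen coordinate frame; the function name, plane placement, and example ray are hypothetical and not taken from the paper.

```python
def two_plane_coords(origin, direction, z_uv=0.0, z_st=1.0):
    """Map a ray to (u, v, s, t): its intersections with the planes
    z = z_uv and z = z_st."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dz == 0.0:
        raise ValueError("ray is parallel to the parameterization planes")
    t_uv = (z_uv - oz) / dz  # ray parameter at the first plane
    t_st = (z_st - oz) / dz  # ray parameter at the second plane
    u, v = ox + t_uv * dx, oy + t_uv * dy
    s, t = ox + t_st * dx, oy + t_st * dy
    return (u, v, s, t)


# Example ray starting behind the first plane and tilted along +x.
coords = two_plane_coords(origin=(0.0, 0.0, -1.0), direction=(0.5, 0.0, 1.0))
```

A network of the kind described would take these four numbers as input and output a color, so rendering a pixel needs one query per ray rather than many samples along it.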
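We introduce Neural Point Light Fields, which represent scenes implicitly with a light field living on a sparse point cloud. Combining differentiable volume rendering with learned implicit density representations has made it possible to synthesize photo-realistic images for novel views of small scenes. Because neural volumetric rendering methods require dense sampling of the underlying functional scene representation, at hundreds of samples along each ray cast through the volume, they are fundamentally limited to small scenes in which the same objects are projected into hundreds of training views. Promoting sparse point clouds to neural implicit light fields allows us to represent scenes efficiently with only a single implicit sampling operation per ray. These point light fields are a function of the ray direction and the local point feature neighborhood, allowing us to interpolate the light field conditioned on training images without dense object coverage and parallax. We assess the proposed method for novel view synthesis on large driving scenarios, where we synthesize realistic unseen views that existing implicit approaches fail to represent. We validate that Neural Point Light Fields make it possible to predict videos along unseen trajectories, which was previously only feasible by explicitly modeling the scene.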
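Multi-view stereo (MVS) is a core task in 3D computer vision. With the surge of novel deep learning methods, learned MVS has surpassed the accuracy of classical approaches, but it still relies on building a memory-intensive dense cost volume. Novel view synthesis (NVS) is a parallel line of research that has recently seen an increase in popularity with Neural Radiance Field (NeRF) models, which optimize a per-scene radiance field. However, NeRF methods do not generalize to novel scenes and are slow to train and test. We propose to bridge the gap between these two methodologies with a novel network that can recover 3D scene geometry as a distance function, together with high-resolution color images. Our method uses only a sparse set of images as input and generalizes well to novel scenes. Additionally, we propose a coarse-to-fine sphere tracing approach to significantly increase speed. We show on various datasets that our method reaches accuracy comparable to per-scene optimized methods while being able to generalize and running faster. We provide the source code at https://github.com/ais-bonn/neural_mvs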
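When geometry is represented as a distance function, a ray can be intersected with the surface by sphere tracing: step forward by the current distance value, which is never large enough to cross the surface. The sketch below shows plain single-resolution sphere tracing against an analytic sphere, not the paper's coarse-to-fine variant; the SDF, step limits, and function names are assumptions made for illustration.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from a point to a sphere (negative inside)."""
    return math.dist(point, center) - radius


def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4, max_dist=100.0):
    """March along a unit-length direction, advancing by the SDF value.

    Returns the ray parameter t at the hit, or None if nothing is hit.
    """
    t = 0.0
    for _ in range(max_steps):
        point = tuple(o + t * d for o, d in zip(origin, direction))
        distance = sdf(point)
        if distance < eps:
            return t  # close enough to the surface
        t += distance  # safe step: cannot overshoot the surface
        if t > max_dist:
            break
    return None


hit = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)
miss = sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf)
```

Sphere tracing typically needs far fewer evaluations per ray than dense sampling, since large steps are taken through empty space.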
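We introduce SparseNeuS, a novel neural rendering based method for the task of surface reconstruction from multi-view images. This task becomes more difficult when only sparse images are provided as input, a scenario in which existing neural reconstruction approaches usually produce incomplete or distorted results. Moreover, their inability to generalize to unseen new scenes impedes their application in practice. In contrast, SparseNeuS can generalize to new scenes and works well with sparse images (as few as 2 or 3). SparseNeuS adopts a signed distance function (SDF) as the surface representation, and learns generalizable priors from image features by introducing geometry encoding volumes for generic surface prediction. In addition, several strategies are introduced to effectively leverage sparse views for high-quality reconstruction, including 1) a multi-level geometry reasoning framework to recover the surfaces in a coarse-to-fine manner; 2) a multi-scale color blending scheme for more reliable color prediction; 3) a consistency-aware fine-tuning scheme to control the inconsistent regions caused by occlusion and noise. Extensive experiments demonstrate that our approach not only outperforms state-of-the-art methods, but also exhibits good efficiency, generalizability, and flexibility.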
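We propose a Transformer-based NeRF (TransNeRF) to learn a generic neural radiance field conditioned on observed-view images for the novel view synthesis task. By contrast, existing MLP-based NeRFs cannot directly receive an arbitrary number of observed views and require an auxiliary pooling-based operation to fuse source-view information, which loses the complicated relationships between the source views and the target rendering view. Furthermore, current approaches process each 3D point individually and ignore the local consistency of a radiance field scene representation. These limitations can reduce their performance in challenging real-world applications where large differences between the source views and a novel rendering view may exist. To address these challenges, TransNeRF uses the attention mechanism to naturally decode deep associations of an arbitrary number of source views into a coordinate-based scene representation. Local consistency of shape and appearance is considered in the ray-cast space and the surrounding-view space within a unified Transformer network. Experiments demonstrate that TransNeRF, trained on a wide variety of scenes, achieves better performance than state-of-the-art image-based neural rendering methods in both scene-agnostic and per-scene fine-tuning scenarios, especially when there is a considerable gap between the source views and the rendering view.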
Neural Radiance Field (NeRF) has revolutionized free viewpoint rendering tasks and achieved impressive results. However, efficiency and accuracy problems hinder its wide application. To address these issues, we propose Geometry-Aware Generalized Neural Radiance Field (GARF) with a geometry-aware dynamic sampling (GADS) strategy to perform real-time novel view rendering and unsupervised depth estimation on unseen scenes without per-scene optimization. Distinct from most existing generalized NeRFs, our framework infers the unseen scenes on both pixel-scale and geometry-scale with only a few input images. More specifically, our method learns common attributes of novel-view synthesis by an encoder-decoder structure and a point-level learnable multi-view feature fusion module which helps avoid occlusion. To preserve scene characteristics in the generalized model, we introduce an unsupervised depth estimation module to derive the coarse geometry, narrow down the ray sampling interval to the proximity of the estimated surface, and sample at the expected maximum position, constituting the Geometry-Aware Dynamic Sampling strategy (GADS). Moreover, we introduce a Multi-level Semantic Consistency loss (MSC) to assist more informative representation learning. Extensive experiments on indoor and outdoor datasets show that, compared with state-of-the-art generalized NeRF methods, GARF reduces samples by more than 25%, while improving rendering quality and 3D geometry estimation.
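Neural radiance fields (NeRF) have shown great potential in representing 3D scenes and synthesizing novel views, but the computational overhead of NeRF at the inference stage is still heavy. To alleviate the burden, we delve into the coarse-to-fine, hierarchical sampling procedure of NeRF and point out that the coarse stage can be replaced by a lightweight module which we name the neural sample field. The proposed sample field maps rays into sample distributions, which can be transformed into point coordinates and fed into radiance fields for volume rendering. The overall framework is named NeuSample. We perform experiments on Realistic Synthetic 360° and Real Forward-Facing, two popular 3D scene sets, and show that NeuSample achieves better rendering quality than NeRF while enjoying a faster inference speed. NeuSample is further compressed with a proposed sample field extraction method towards a better trade-off between quality and speed.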
Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a differentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website: https://github.com/facebookresearch/NSVF.
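Skipping empty voxels amounts to intersecting each ray with the occupied boxes and sampling only inside the resulting intervals. The sketch below uses a standard slab test against a flat list of axis-aligned boxes; it does not model the octree traversal described above, and the function names and the toy voxel layout are hypothetical.

```python
def ray_aabb(origin, direction, box_min, box_max):
    """Slab test: return (t_enter, t_exit) for a ray and an axis-aligned
    box, or None if the ray misses the box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of slabs: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    return (t_near, t_far)


def occupied_intervals(origin, direction, voxels):
    """Collect the ray segments that lie inside occupied voxels, front to back."""
    hits = []
    for box_min, box_max in voxels:
        span = ray_aabb(origin, direction, box_min, box_max)
        if span is not None:
            hits.append(span)
    return sorted(hits)


# Toy scene: two occupied voxels on the ray's path and one off to the side.
voxels = [
    ((0.0, 0.0, 2.0), (1.0, 1.0, 3.0)),
    ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    ((2.0, 2.0, 0.0), (3.0, 3.0, 1.0)),
]
intervals = occupied_intervals((0.5, 0.5, -1.0), (0.0, 0.0, 1.0), voxels)
```

Only the returned intervals would be sampled by the renderer, so the empty gap between the two voxels costs no network evaluations.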
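Classical light field rendering for novel view synthesis can accurately reproduce view-dependent effects such as reflection, refraction, and translucency, but requires a dense view sampling of the scene. Methods based on geometric reconstruction need only sparse views, but cannot accurately model non-Lambertian effects. We introduce a model that combines the strengths and mitigates the limitations of these two directions. By operating on a four-dimensional representation of the light field, our model learns to represent view-dependent effects accurately. By enforcing geometric constraints during training and inference, the scene geometry is implicitly learned from a sparse set of views. Concretely, we introduce a two-stage transformer-based model that first aggregates features along epipolar lines, then aggregates features along reference views to produce the color of a target ray. Our model outperforms the state of the art on multiple forward-facing and 360° datasets, with larger margins on scenes with severe view-dependent variations.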
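Neural Radiance Field (NeRF) is a popular method in data-driven 3D reconstruction. Given its simplicity and high-quality rendering, many NeRF applications are being developed. However, a major limitation of NeRF is its slow speed. Many attempts have been made to speed up NeRF training and inference, including intricate code-level optimization and caching, the use of sophisticated data structures, and amortization through multi-task and meta learning. In this work, we revisit the basic building blocks of NeRF through the lens of classic techniques that predate NeRF. We propose Voxel-Accelerated NeRF (VaxNeRF), which integrates NeRF with visual hull, a classic 3D reconstruction technique that requires only binary foreground-background pixel labels per image. The visual hull, which can be optimized in about 10 seconds, provides a coarse inside-outside separation of the scene that allows a substantial number of network evaluations in NeRF to be omitted. We provide a clean, fully Pythonic, JAX-based implementation on the popular JaxNeRF codebase, consisting of only about 30 lines of code changes and a modular visual hull subroutine, and achieve about 2-8x faster learning on top of the highly performant JaxNeRF baseline with zero degradation in rendering quality. With sufficient compute, this effectively brings full NeRF training down from hours to 30 minutes. We hope that VaxNeRF, a careful combination of a classic technique with a deep method (that arguably replaced it), can empower and accelerate new NeRF extensions and applications with its simplicity, portability, and reliable performance gains. Code is available at https://github.com/naruya/vaxnerf.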
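A visual hull keeps only the voxels whose projection falls on the foreground in every silhouette. The sketch below is a deliberately simplified illustration that assumes three axis-aligned orthographic views of a cubic grid, rather than the calibrated perspective cameras a real pipeline would use; the function names and the toy masks are hypothetical and unrelated to the VaxNeRF codebase.

```python
def carve_visual_hull(mask_xy, mask_xz, mask_yz):
    """Keep voxel (i, j, k) only if all three silhouettes mark it foreground.

    Each mask is an n x n grid of 0/1 labels seen along one axis:
    mask_xy looks down z, mask_xz looks down y, mask_yz looks down x.
    """
    n = len(mask_xy)
    return [[[bool(mask_xy[i][j] and mask_xz[i][k] and mask_yz[j][k])
              for k in range(n)]
             for j in range(n)]
            for i in range(n)]


def occupied_fraction(hull):
    """Fraction of voxels that survive carving."""
    cells = [cell for plane in hull for row in plane for cell in row]
    return sum(cells) / len(cells)


# Toy silhouettes: a centered 2 x 2 square seen from all three directions.
n = 4
square = [[1 if 1 <= a <= 2 and 1 <= b <= 2 else 0 for b in range(n)]
          for a in range(n)]
hull = carve_visual_hull(square, square, square)
fraction = occupied_fraction(hull)
```

In this toy grid only one eighth of the voxels remain, so a renderer that skips samples outside the hull would avoid most network evaluations for this scene.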