In this paper, we present a novel and effective framework, named 4K-NeRF, to pursue high-fidelity view synthesis in the challenging scenario of ultra-high resolutions, building on the methodology of neural radiance fields (NeRF). The rendering procedure of NeRF-based methods typically operates in a pixel-wise manner, in which rays (or pixels) are treated independently in both the training and inference phases, limiting their representational ability to describe subtle details, especially when lifting to an extremely high resolution. We address this issue by better exploring ray correlation to enhance high-frequency details, benefiting from the use of geometry-aware local context. In particular, we use a view-consistent encoder to model geometric information effectively in a lower-resolution space and recover fine details through a view-consistent decoder, conditioned on ray features and depths estimated by the encoder. Joint training with patch-based sampling further allows our method to incorporate supervision from perception-oriented regularization beyond the pixel-wise loss. Quantitative and qualitative comparisons with modern NeRF methods demonstrate that our method can significantly boost rendering quality and retain high-frequency details, achieving state-of-the-art visual quality in the 4K ultra-high-resolution scenario. Code available at \url{https://github.com/frozoul/4K-NeRF}
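To illustrate the patch-based sampling idea mentioned above (this is a sketch, not the authors' released code), the snippet below assumes PyTorch, a flat array of per-pixel rays, and an externally supplied perceptual metric such as LPIPS; the helper names, the patch size, and the loss weight are hypothetical.

```python
import torch
import torch.nn.functional as F

def sample_ray_patch(rays, H, W, patch=32):
    """Pick a contiguous patch of rays instead of scattered pixels,
    so a perceptual loss can see local image structure (hypothetical helper)."""
    top = torch.randint(0, H - patch, (1,)).item()
    left = torch.randint(0, W - patch, (1,)).item()
    idx = (torch.arange(top, top + patch)[:, None] * W
           + torch.arange(left, left + patch)[None, :])
    return rays[idx.reshape(-1)], idx          # (patch*patch, ...) rays

def patch_loss(pred_rgb, gt_rgb, lpips_fn, patch=32, w_perc=0.1):
    """Pixel-wise MSE plus a perception-oriented term on the rendered patch.
    lpips_fn is an external perceptual metric (e.g. the lpips package);
    any input normalization it expects is omitted here."""
    mse = F.mse_loss(pred_rgb, gt_rgb)
    pred_img = pred_rgb.reshape(1, patch, patch, 3).permute(0, 3, 1, 2)
    gt_img = gt_rgb.reshape(1, patch, patch, 3).permute(0, 3, 1, 2)
    return mse + w_perc * lpips_fn(pred_img, gt_img).mean()
```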
We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis from mostly low-resolution (LR) inputs. Our method is built upon Neural Radiance Fields (NeRF), which predicts per-point density and color with a multi-layer perceptron. While NeRF can produce images at arbitrary scales, it struggles at resolutions beyond those of the observed images. Our key insight is that NeRF has a local prior, meaning that the prediction at a 3D point can be propagated to nearby regions and remain accurate. We first exploit this with a super-sampling strategy that shoots multiple rays at each image pixel, which enforces multi-view constraints at the sub-pixel level. We then show that NeRF-SR can further improve the super-sampled results with a refinement network that leverages the estimated depth to hallucinate details from related patches of an HR reference image. Experimental results demonstrate that NeRF-SR produces high-quality results for novel view synthesis at HR on both synthetic and real-world datasets.
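A minimal sketch of the super-sampling strategy described above, assuming a simple pinhole camera model and PyTorch; the function name, the sub-pixel grid layout, and the camera convention are assumptions rather than the paper's implementation.

```python
import torch

def supersampled_rays(H, W, focal, c2w, s=2):
    """Shoot s*s rays per low-resolution pixel at sub-pixel offsets
    (simple pinhole model; a sketch of the super-sampling idea)."""
    # sub-pixel grid: e.g. s=2 gives offsets 0.25 and 0.75 inside each pixel
    offsets = (torch.arange(s) + 0.5) / s
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    dirs = []
    for dy in offsets:
        for dx in offsets:
            x = (xs + dx - W * 0.5) / focal
            y = -(ys + dy - H * 0.5) / focal
            d = torch.stack([x, y, -torch.ones_like(x)], dim=-1)
            dirs.append(d @ c2w[:3, :3].T)          # rotate to world space
    rays_d = torch.stack(dirs, dim=2)               # (H, W, s*s, 3)
    rays_o = c2w[:3, 3].expand_as(rays_d)
    return rays_o, rays_d   # average the s*s rendered colors per LR pixel
```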
We present HRF-Net, a novel view synthesis method based on holistic radiance fields that renders novel views from a set of sparse inputs. Recent generalizing view synthesis methods also leverage radiance fields, but their rendering speed is not real-time. Other existing methods can train and render novel views efficiently, but they cannot generalize to unseen scenes. Our method addresses the problem of real-time rendering for generalizing view synthesis and consists of two main stages: a holistic radiance field predictor and a convolution-based neural renderer. This architecture not only infers consistent scene geometry from an implicit neural field, but also renders novel views efficiently on a single GPU. We first train HRF-Net on multiple 3D scenes of the DTU dataset, and the network can produce plausible novel views on unseen real and synthetic data using only a photometric loss. Moreover, our method can exploit a dense set of reference images of a single scene to produce accurate novel views without relying on additional explicit representations, while still retaining the high-speed rendering of the pre-trained model. Experimental results show that HRF-Net outperforms state-of-the-art neural rendering methods on various synthetic and real datasets.
Neural Radiance Field (NeRF) has revolutionized free-viewpoint rendering tasks and achieved impressive results. However, efficiency and accuracy problems hinder its wide application. To address these issues, we propose Geometry-Aware Generalized Neural Radiance Field (GARF) with a geometry-aware dynamic sampling (GADS) strategy to perform real-time novel view rendering and unsupervised depth estimation on unseen scenes without per-scene optimization. Distinct from most existing generalized NeRFs, our framework infers unseen scenes at both the pixel scale and the geometry scale with only a few input images. More specifically, our method learns the common attributes of novel-view synthesis with an encoder-decoder structure and a point-level, learnable multi-view feature fusion module that helps avoid occlusion. To preserve scene characteristics in the generalized model, we introduce an unsupervised depth estimation module to derive the coarse geometry, narrow the ray sampling interval down to the proximity of the estimated surface, and sample at the position of maximum expectation, constituting the Geometry-Aware Dynamic Sampling (GADS) strategy. Moreover, we introduce a Multi-level Semantic Consistency loss (MSC) to assist more informative representation learning. Extensive experiments on indoor and outdoor datasets show that, compared with state-of-the-art generalized NeRF methods, GARF reduces samples by more than 25\%, while improving rendering quality and 3D geometry estimation.
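A simplified sketch of the geometry-aware sampling idea: once a coarse per-ray depth is available, samples are drawn only in a narrow band around the estimated surface rather than across the full near/far range. The band width, jitter, and function signature are assumptions; the paper's expectation-maximum sampling is not reproduced here.

```python
import torch

def geometry_aware_samples(z_depth, n_samples=16, width=0.1):
    """Sample depths in a narrow band around a per-ray depth estimate,
    instead of uniformly over the full near/far range (simplified sketch).
    z_depth: (N_rays,) coarse depth per ray."""
    near = (z_depth - width).clamp(min=1e-3)
    far = z_depth + width
    t = torch.linspace(0.0, 1.0, n_samples, device=z_depth.device)
    z_vals = near[:, None] * (1.0 - t) + far[:, None] * t   # (N_rays, n_samples)
    # a small stratified jitter keeps samples from collapsing onto a fixed grid
    jitter = (torch.rand_like(z_vals) - 0.5) * (2 * width / n_samples)
    return z_vals + jitter
```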
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (nonconvolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
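The classic volume rendering step referenced above alpha-composites the MLP's per-sample colors and densities along each ray into a pixel color. A minimal sketch of that standard quadrature in PyTorch, with hypothetical tensor shapes:

```python
import torch

def composite(rgb, sigma, z_vals):
    """Alpha-composite per-sample colors and densities along each ray.
    rgb: (N_rays, N_samples, 3), sigma: (N_rays, N_samples), z_vals: (N_rays, N_samples)."""
    deltas = z_vals[:, 1:] - z_vals[:, :-1]
    deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[:, :1])], dim=-1)
    alpha = 1.0 - torch.exp(-torch.relu(sigma) * deltas)        # opacity per sample
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)           # accumulated transmittance
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alpha * trans
    rgb_map = (weights[..., None] * rgb).sum(dim=1)              # (N_rays, 3)
    depth_map = (weights * z_vals).sum(dim=1)
    return rgb_map, depth_map, weights
```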
Neural volumetric representations have become a widely adopted model for radiance fields in 3D scenes. These representations are fully implicit or hybrid function approximators of the instantaneous volumetric radiance in a scene, which are typically learned from multi-view captures of the scene. We investigate the new task of neural volume super-resolution - rendering high-resolution views corresponding to a scene captured at low resolution. To this end, we propose a neural super-resolution network that operates directly on the volumetric representation of the scene. This approach allows us to exploit an advantage of operating in the volumetric domain, namely the ability to guarantee consistent super-resolution across different viewing directions. To realize our method, we devise a novel 3D representation that hinges on multiple 2D feature planes. This allows us to super-resolve the 3D scene representation by applying 2D convolutional networks on the 2D feature planes. We validate the proposed method's capability of super-resolving multi-view consistent views both quantitatively and qualitatively on a diverse set of unseen 3D scenes, demonstrating a significant advantage over existing approaches.
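To make the plane-based idea concrete, here is a minimal sketch (assuming PyTorch) of a representation built on axis-aligned 2D feature planes, where a 3D point's feature is gathered by bilinear sampling; the plane resolution, channel count, and aggregation by summation are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriPlane(nn.Module):
    """Three axis-aligned feature planes; a point's feature is the sum of
    bilinear samples from the XY, XZ and YZ planes (simplified sketch)."""
    def __init__(self, res=128, channels=32):
        super().__init__()
        self.planes = nn.Parameter(torch.randn(3, channels, res, res) * 0.01)

    def forward(self, xyz):                     # xyz in [-1, 1]^3, shape (N, 3)
        feats = 0.0
        for i, dims in enumerate([[0, 1], [0, 2], [1, 2]]):
            grid = xyz[:, dims].view(1, -1, 1, 2)              # (1, N, 1, 2)
            f = F.grid_sample(self.planes[i:i + 1], grid, align_corners=True)
            feats = feats + f.view(-1, xyz.shape[0]).t()       # (N, C)
        return feats

# because the representation is a set of 2D planes, a standard 2D convolutional
# super-resolution network can be applied to self.planes directly, keeping the
# upsampled scene consistent across viewing directions
```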
In this paper, we present an efficient and robust deep learning solution for novel view synthesis of complex scenes. In our approach, a 3D scene is represented as a light field, i.e., a set of rays, each with a corresponding color when it reaches the image plane. For efficient novel view rendering, we adopt the two-plane parameterization of the light field, where each ray is characterized by a 4D parameter. We then formulate the light field as a 4D function that maps 4D coordinates to the corresponding color values. We train a deep fully connected network to optimize this implicit function and memorize the 3D scene, and the scene-specific model is then used to synthesize novel views. Unlike previous methods that require dense view sampling to reliably render novel views, our method renders novel views by sampling rays and querying the color of each ray directly from the network, enabling high-quality light field rendering from a sparser set of training images. The network can optionally predict per-ray depth, enabling applications such as auto-refocus. Our novel view synthesis results are comparable to the state of the art, and even superior in some challenging scenes with refraction and reflection, while maintaining an interactive frame rate and a small memory footprint.
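A minimal sketch of the two-plane parameterization mentioned above: each ray is intersected with two parallel planes, and the four intersection coordinates (u, v, s, t) index the 4D light field; the plane placement and axis convention are assumptions.

```python
import torch

def two_plane_coords(rays_o, rays_d, z_uv=0.0, z_st=1.0):
    """Parameterize each ray by its intersections with two parallel planes
    z = z_uv and z = z_st, giving 4D coordinates (u, v, s, t).
    Assumes no ray is parallel to the planes."""
    t_uv = (z_uv - rays_o[:, 2]) / rays_d[:, 2]
    t_st = (z_st - rays_o[:, 2]) / rays_d[:, 2]
    uv = rays_o[:, :2] + t_uv[:, None] * rays_d[:, :2]
    st = rays_o[:, :2] + t_st[:, None] * rays_d[:, :2]
    return torch.cat([uv, st], dim=-1)           # (N_rays, 4)

# the 4D coordinates are then fed to a fully connected network that predicts
# each ray's color directly, e.g. color = mlp(two_plane_coords(o, d))
```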
We propose D-NeRF, a method for synthesizing novel views, at an arbitrary point in time, of dynamic scenes with complex non-rigid geometries. We optimize an underlying deformable volumetric function from a sparse set of input monocular views without the need for ground-truth geometry or multi-view images.
Neural radiance fields (NeRF) have shown great potential in representing 3D scenes and synthesizing novel views, but the computational overhead of NeRF at the inference stage is still heavy. To alleviate this burden, we delve into the coarse-to-fine hierarchical sampling procedure of NeRF and point out that the coarse stage can be replaced by a lightweight module which we name a neural sample field. The proposed sample field maps rays to sample distributions, which can be transformed into point coordinates and fed into the radiance field for volume rendering. The overall framework is named NeuSample. We conduct experiments on Realistic Synthetic 360$^{\circ}$ and Real Forward-Facing, two popular 3D scene datasets, and show that NeuSample achieves better rendering quality than NeRF while enjoying faster inference speed. NeuSample can be further compressed with the proposed sample field extraction method toward a better trade-off between quality and speed.
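A simplified sketch of the sample-field idea (not the paper's architecture): a small MLP maps a ray's origin and direction to a monotone set of depths along the ray, replacing the coarse hierarchical sampling pass. The layer sizes, the softmax-cumsum parameterization, and the near/far bounds are assumptions.

```python
import torch
import torch.nn as nn

class SampleField(nn.Module):
    """Maps a ray (origin + direction) to n_samples depths along the ray,
    standing in for the coarse NeRF pass (simplified sketch)."""
    def __init__(self, n_samples=64, near=2.0, far=6.0, hidden=128):
        super().__init__()
        self.near, self.far = near, far
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_samples))

    def forward(self, rays_o, rays_d):
        logits = self.mlp(torch.cat([rays_o, rays_d], dim=-1))
        frac = torch.cumsum(torch.softmax(logits, dim=-1), dim=-1)  # monotone in [0, 1]
        z_vals = self.near + (self.far - self.near) * frac          # (N_rays, n_samples)
        return z_vals   # converted to 3D points and fed to the radiance field
```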
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multiview posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods.
Neural Radiance Fields (NeRF) have demonstrated superior novel view synthesis performance but are slow at rendering. To speed up the volume rendering process, many acceleration methods have been proposed at the cost of large memory consumption. To push the frontier of the efficiency-memory trade-off, we explore a new perspective to accelerate NeRF rendering, leveraging a key fact that the viewpoint change is usually smooth and continuous in interactive viewpoint control. This allows us to leverage the information of preceding viewpoints to reduce the number of rendered pixels as well as the number of sampled points along the ray of the remaining pixels. In our pipeline, a low-resolution feature map is rendered first by volume rendering, then a lightweight 2D neural renderer is applied to generate the output image at target resolution leveraging the features of preceding and current frames. We show that the proposed method can achieve competitive rendering quality while reducing the rendering time with little memory overhead, enabling 30FPS at 1080P image resolution with a low memory footprint.
Neural scene representations, such as neural radiance fields (NeRF), are based on training a multilayer perceptron (MLP) on a set of color images with known poses. An increasing number of devices now produce RGB-D (color + depth) information, which is important for a wide variety of tasks. The aim of this paper is therefore to investigate what improvements can be made to these promising implicit representations by incorporating depth information alongside the color images. In particular, the recently proposed Mip-NeRF approach, which uses conical frustums instead of rays for volume rendering, allows one to account for the varying area of a pixel with distance from the camera center. The proposed method also models depth uncertainty. This makes it possible to address the main limitations of NeRF-based approaches, including improving the accuracy of geometry, reducing artifacts, speeding up training, and shortening prediction time. Experiments were performed on well-known benchmark scenes, and the comparisons show improved accuracy of both scene geometry and photometric reconstruction, while reducing training time by a factor of 3-5.
Classical light field rendering for novel view synthesis can accurately reproduce view-dependent effects such as reflection, refraction, and translucency, but requires a dense view sampling of the scene. Methods based on geometric reconstruction need only sparse views, but cannot accurately model non-Lambertian effects. We introduce a model that combines the strengths and mitigates the limitations of these two directions. By operating on the four-dimensional representation of the light field, our model learns to represent view-dependent effects accurately. By enforcing geometric constraints during training and inference, the scene geometry is implicitly learned from a sparse set of views. Concretely, we introduce a two-stage transformer-based model that first aggregates features along epipolar lines, then aggregates features across reference views to produce the color of a target ray. Our model outperforms the state of the art on multiple forward-facing and 360° datasets, with larger margins on scenes with severe view-dependent variations.
Recent efforts in Neural Radiance Fields (NeRF) have shown impressive results on novel view synthesis by utilizing implicit neural representations to represent 3D scenes. Due to the volumetric rendering process, the inference speed of NeRF is extremely slow, limiting the application of NeRF on resource-constrained hardware such as mobile devices. Many works have been conducted to reduce the latency of running NeRF models. However, most of them still require a high-end GPU for acceleration or extra storage memory, neither of which is available on mobile devices. Another emerging direction utilizes the neural light field (NeLF) for speedup, as only one forward pass is performed per ray to predict the pixel color. Nevertheless, to reach a rendering quality similar to NeRF, the network in NeLF is designed with intensive computation, which is not mobile-friendly. In this work, we propose an efficient network that runs in real-time on mobile devices for neural rendering. We follow the setting of NeLF to train our network. Unlike existing works, we introduce a novel network architecture that runs efficiently on mobile devices with low latency and small size, saving $15\times \sim 24\times$ storage compared with MobileNeRF. Our model achieves high-resolution generation while maintaining real-time inference for both synthetic and real-world scenes on mobile devices, e.g., $18.04$ms (iPhone 13) for rendering one $1008\times756$ image of a real 3D scene. Additionally, we achieve similar image quality as NeRF and better quality than MobileNeRF (PSNR $26.15$ vs. $25.91$ on the real-world forward-facing dataset).
We present Generalizable NeRF Transformer (GNT), a pure, unified transformer-based architecture that efficiently reconstructs neural radiance fields (NeRFs) on the fly from source views. Unlike prior works on NeRF that optimize a per-scene implicit representation by inverting a handcrafted rendering equation, GNT achieves generalizable neural scene representation and rendering by encapsulating two transformer-based stages. The first stage of GNT, called the view transformer, leverages multi-view geometry as an inductive bias for attention-based scene representation and predicts coordinate-aligned features by aggregating information from epipolar lines on neighboring views. The second stage of GNT, named the ray transformer, renders novel views via ray marching and directly decodes the sequence of sampled point features using the attention mechanism. Our experiments show that, when optimized on a single scene, GNT can successfully reconstruct NeRF without an explicit rendering formula and, thanks to the learnable ray renderer, even improves PSNR by ~1.3 dB on complex scenes. When trained across various scenes, GNT consistently achieves state-of-the-art performance when transferring to the forward-facing LLFF dataset (LPIPS ~20%, SSIM ~25%) and the synthetic Blender dataset (LPIPS ~20%, SSIM ~4%). Furthermore, we show that depth and occlusion can be inferred from the learned attention maps, which implies that the pure attention mechanism is capable of learning a physically grounded rendering process. All these results bring us one step closer to the tantalizing hope of using transformers as a "universal modeling tool" even for graphics. Please refer to our project page for video results: https://vita-group.github.io/gnt/.
Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a differentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website: https://github.com/facebookresearch/NSVF.
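The voxel-skipping step described above relies on intersecting each ray with the bounds of occupied voxels and only placing samples where a non-empty interval is hit. A minimal sketch of the standard slab test in PyTorch (the octree traversal itself is omitted):

```python
import torch

def ray_aabb(rays_o, rays_d, box_min, box_max):
    """Slab test: entry/exit distances of rays against an axis-aligned voxel.
    rays_o, rays_d: (N_rays, 3); box_min, box_max: (3,).
    A ray misses the voxel when t_near > t_far, so no samples are placed in it.
    Assumes ray directions have no exactly-zero components."""
    inv_d = 1.0 / rays_d
    t0 = (box_min - rays_o) * inv_d
    t1 = (box_max - rays_o) * inv_d
    t_near = torch.minimum(t0, t1).max(dim=-1).values
    t_far = torch.maximum(t0, t1).min(dim=-1).values
    return t_near, t_far

# during rendering, samples are generated only inside occupied voxels whose
# [t_near, t_far] intervals are non-empty, skipping all the empty space
```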
We present High Dynamic Range Neural Radiance Fields (HDR-NeRF) to recover an HDR radiance field from a set of low dynamic range (LDR) views with different exposures. Using HDR-NeRF, we are able to generate both novel HDR views and novel LDR views under different exposures. The key to our method is to model the physical imaging process, which dictates how the radiance of a scene point becomes a pixel value in an LDR image, with two implicit functions: a radiance field and a tone mapper. The radiance field encodes the scene radiance (whose value varies from 0 to +∞) and outputs the density and radiance of a ray given the corresponding ray origin and ray direction. The tone mapper models the mapping process by which a ray hitting the camera sensor becomes a pixel value: the color of a ray is predicted by feeding its radiance and the corresponding exposure time into the tone mapper. We use classic volume rendering techniques to project the output radiance, colors, and densities into HDR and LDR images, while using only the input LDR images as supervision. We collect a new forward-facing HDR dataset to evaluate the proposed method. Experimental results on synthetic and real-world scenes validate that our method can not only accurately control the exposures of synthesized views, but also render views with a high dynamic range.
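A simplified sketch of the tone-mapper idea described above: a small learned function maps the exposure-scaled log radiance of a ray to an LDR color, channel by channel. The network size, the logarithmic input, and the per-channel application are assumptions based on the abstract, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class ToneMapper(nn.Module):
    """Maps a ray's HDR radiance and exposure time to an LDR color,
    mimicking a camera response curve (simplified per-channel sketch)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())   # LDR values in [0, 1]

    def forward(self, radiance, exposure_t):
        # exposure scales the radiance before the learned response curve
        x = torch.log(radiance * exposure_t + 1e-6)      # (..., 3)
        return self.mlp(x.unsqueeze(-1)).squeeze(-1)     # applied channel-wise
```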
We present an information-theoretic regularization technique for few-shot novel view synthesis based on neural implicit representations. The proposed approach minimizes potential reconstruction inconsistency by imposing an entropy constraint on the density along each ray. In addition, to alleviate potential degeneracy when all training images are acquired from nearly redundant viewpoints, we further incorporate a spatial smoothness constraint into the estimated images by restricting the information gain between rays from a pair of slightly different viewpoints. The key idea of our algorithm is to make the reconstructed scene compact along individual rays and consistent across nearby rays. The proposed regularizers can be plugged into most existing neural volume rendering techniques based on NeRF in a straightforward way. Despite its simplicity, our method achieves consistently improved performance over existing neural view synthesis methods by large margins on multiple standard benchmarks. Our project website is available at \url{http://cvlab.snu.ac.kr/research/infonerf}.
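A minimal sketch of the ray-entropy regularizer described above: the compositing weights along each ray are normalized into a distribution and their entropy is penalized, encouraging density to concentrate at a few samples; the low-opacity ray masking used in practice is omitted, and the function signature is an assumption.

```python
import torch

def ray_entropy_loss(weights, eps=1e-10):
    """Entropy of the normalized compositing weights along each ray.
    weights: (N_rays, N_samples) from volume rendering."""
    p = weights / (weights.sum(dim=-1, keepdim=True) + eps)
    entropy = -(p * torch.log(p + eps)).sum(dim=-1)       # (N_rays,)
    return entropy.mean()   # added to the photometric loss with a small weight
```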
This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation, while the other module warps another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations adaptively via the learned attention maps, leading to the final high-resolution LF image with satisfactory results on both plain textured areas and depth discontinuous boundaries. Besides, to promote the effectiveness of our method trained with simulated hybrid data on real hybrid data captured by a hybrid LF imaging system, we carefully design the network architecture and the training strategy. Extensive experiments on both real and simulated hybrid data demonstrate the significant superiority of our approach over state-of-the-art ones. To the best of our knowledge, this is the first end-to-end deep learning method for LF reconstruction from a real hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.
Despite the rapid development of Neural Radiance Fields (NeRF), the need for dense view coverage largely prohibits their wider application. Although several recent works attempt to address this issue, they either operate with sparse views (which are still fairly numerous) or on simple objects/scenes. In this work, we consider a more ambitious task: training a neural radiance field by "looking only once", i.e., using only a single view, and on realistic, complex visual scenes. To attain this goal, we present a Single View NeRF (SinNeRF) framework consisting of carefully designed semantic and geometric regularizations. Specifically, SinNeRF constructs a semi-supervised learning process, in which we introduce and propagate geometry pseudo-labels and semantic pseudo-labels to guide the progressive training process. Extensive experiments are conducted on complex scene benchmarks, including the NeRF synthetic dataset, the Local Light Field Fusion dataset, and the DTU dataset. We show that even without pre-training on multi-view datasets, SinNeRF can yield photo-realistic novel view synthesis results. Under the single-image setting, SinNeRF significantly outperforms current state-of-the-art NeRF baselines in all cases. Project page: https://vita-group.github.io/sinnerf/