Synthesizing high-fidelity videos from real-world multi-view input is challenging because of the complexities of real-world environments and highly dynamic motions. Previous works based on neural radiance fields have demonstrated high-quality reconstructions of dynamic scenes. However, training such models on real-world scenes is time-consuming, usually taking days or weeks. In this paper, we present a novel method named MixVoxels to better represent the dynamic scenes with fast training speed and competitive rendering qualities. The proposed MixVoxels represents the 4D dynamic scenes as a mixture of static and dynamic voxels and processes them with different networks. In this way, the computation of the required modalities for static voxels can be processed by a lightweight model, which essentially reduces the amount of computation, especially for many daily dynamic scenes dominated by the static background. To separate the two kinds of voxels, we propose a novel variation field to estimate the temporal variance of each voxel. For the dynamic voxels, we design an inner-product time query method to efficiently query multiple time steps, which is essential to recover the high-dynamic motions. As a result, with 15 minutes of training for dynamic scenes with inputs of 300-frame videos, MixVoxels achieves better PSNR than previous methods. Codes and trained models are available at https://github.com/fengres/mixvoxels
translated by 谷歌翻译
最近,神经辐射场(NERF)正在彻底改变新型视图合成(NVS)的卓越性能。但是,NERF及其变体通常需要进行冗长的每场训练程序,其中将多层感知器(MLP)拟合到捕获的图像中。为了解决挑战,已经提出了体素网格表示,以显着加快训练的速度。但是,这些现有方法只能处理静态场景。如何开发有效,准确的动态视图合成方法仍然是一个开放的问题。将静态场景的方法扩展到动态场景并不简单,因为场景几何形状和外观随时间变化。在本文中,基于素素网格优化的最新进展,我们提出了一种快速变形的辐射场方法来处理动态场景。我们的方法由两个模块组成。第一个模块采用变形网格来存储3D动态功能,以及使用插值功能将观测空间中的3D点映射到规范空间的变形的轻巧MLP。第二个模块包含密度和颜色网格,以建模场景的几何形状和密度。明确对阻塞进行了建模,以进一步提高渲染质量。实验结果表明,我们的方法仅使用20分钟的训练就可以实现与D-NERF相当的性能,该训练比D-NERF快70倍以上,这清楚地证明了我们提出的方法的效率。
translated by 谷歌翻译
3D reconstruction and novel view synthesis of dynamic scenes from collections of single views recently gained increased attention. Existing work shows impressive results for synthetic setups and forward-facing real-world data, but is severely limited in the training speed and angular range for generating novel views. This paper addresses these limitations and proposes a new method for full 360{\deg} novel view synthesis of non-rigidly deforming scenes. At the core of our method are: 1) An efficient deformation module that decouples the processing of spatial and temporal information for acceleration at training and inference time; and 2) A static module representing the canonical scene as a fast hash-encoded neural radiance field. We evaluate the proposed approach on the established synthetic D-NeRF benchmark, that enables efficient reconstruction from a single monocular view per time-frame randomly sampled from a full hemisphere. We refer to this form of inputs as monocularized data. To prove its practicality for real-world scenarios, we recorded twelve challenging sequences with human actors by sampling single frames from a synchronized multi-view rig. In both cases, our method is trained significantly faster than previous methods (minutes instead of days) while achieving higher visual accuracy for generated novel views. Our source code and data is available at our project page https://graphics.tu-bs.de/publications/kappel2022fast.
translated by 谷歌翻译
本文旨在减少透明辐射场的渲染时间。一些最近的作品用图像编码器配备了神经辐射字段,能够跨越场景概括,这避免了每场景优化。但是,它们的渲染过程通常很慢。主要因素是,在推断辐射场时,它们在空间中的大量点。在本文中,我们介绍了一个混合场景表示,它结合了最佳的隐式辐射场和显式深度映射,以便有效渲染。具体地,我们首先构建级联成本量,以有效地预测场景的粗糙几何形状。粗糙几何允许我们在场景表面附近的几个点来样,并显着提高渲染速度。该过程是完全可疑的,使我们能够仅从RGB图像共同学习深度预测和辐射现场网络。实验表明,该方法在DTU,真正的前瞻性和NERF合成数据集上展示了最先进的性能,而不是比以前的最可推广的辐射现场方法快至少50倍。我们还展示了我们的方法实时综合动态人类执行者的自由观点视频。代码将在https://zju3dv.github.io/enerf/处提供。
translated by 谷歌翻译
在本文中,我们为复杂场景进行了高效且强大的深度学习解决方案。在我们的方法中,3D场景表示为光场,即,一组光线,每组在到达图像平面时具有相应的颜色。对于高效的新颖视图渲染,我们采用了光场的双面参数化,其中每个光线的特征在于4D参数。然后,我们将光场配向作为4D函数,即将4D坐标映射到相应的颜色值。我们训练一个深度完全连接的网络以优化这种隐式功能并记住3D场景。然后,特定于场景的模型用于综合新颖视图。与以前需要密集的视野的方法不同,需要密集的视野采样来可靠地呈现新颖的视图,我们的方法可以通过采样光线来呈现新颖的视图并直接从网络查询每种光线的颜色,从而使高质量的灯场呈现稀疏集合训练图像。网络可以可选地预测每光深度,从而使诸如自动重新焦点的应用。我们的小说视图合成结果与最先进的综合结果相当,甚至在一些具有折射和反射的具有挑战性的场景中优越。我们在保持交互式帧速率和小的内存占地面积的同时实现这一点。
translated by 谷歌翻译
神经辐射场(NERF)在建模3D场景和合成新型视图图像方面取得了巨大成功。但是,大多数以前的NERF方法需要大量时间来优化一个场景。显式数据结构,例如体素特征,显示出加速训练过程的巨大潜力。但是,体素特征面临两个大挑战,要应用于动态场景,即建模时间信息并捕获不同的点运动尺度。我们通过用时间感知的体素特征(称为Tineuvox)表示场景来提出一个辐射现场框架。引入了一个微小的坐标变形网络,以模拟粗糙运动轨迹,并在辐射网络中进一步增强了时间信息。提出了一种多距离插值方法,并应用于体素特征,以模拟小运动和大型运动。我们的框架大大加快了动态光芒度场的优化,同时保持高渲染质量。经验评估均在合成场景和真实场景上进行。我们的Tineuvox仅需8分钟和8 MB的存储成本即可完成培训,同时表现出比以前的动态NERF方法相似甚至更好的渲染性能。
translated by 谷歌翻译
Recent methods for neural surface representation and rendering, for example NeuS, have demonstrated remarkably high-quality reconstruction of static scenes. However, the training of NeuS takes an extremely long time (8 hours), which makes it almost impossible to apply them to dynamic scenes with thousands of frames. We propose a fast neural surface reconstruction approach, called NeuS2, which achieves two orders of magnitude improvement in terms of acceleration without compromising reconstruction quality. To accelerate the training process, we integrate multi-resolution hash encodings into a neural surface representation and implement our whole algorithm in CUDA. We also present a lightweight calculation of second-order derivatives tailored to our networks (i.e., ReLU-based MLPs), which achieves a factor two speed up. To further stabilize training, a progressive learning strategy is proposed to optimize multi-resolution hash encodings from coarse to fine. In addition, we extend our method for reconstructing dynamic scenes with an incremental training strategy. Our experiments on various datasets demonstrate that NeuS2 significantly outperforms the state-of-the-arts in both surface reconstruction accuracy and training speed. The video is available at https://vcai.mpi-inf.mpg.de/projects/NeuS2/ .
translated by 谷歌翻译
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene. State-of-the-art methods based on temporally varying Neural Radiance Fields (aka dynamic NeRFs) have shown impressive results on this task. However, for long videos with complex object motions and uncontrolled camera trajectories, these methods can produce blurry or inaccurate renderings, hampering their use in real-world applications. Instead of encoding the entire dynamic scene within the weights of an MLP, we present a new approach that addresses these limitations by adopting a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views in a scene-motion-aware manner. Our system retains the advantages of prior methods in its ability to model complex scenes and view-dependent effects, but also enables synthesizing photo-realistic novel views from long videos featuring complex scene dynamics with unconstrained camera trajectories. We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets, and also apply our approach to in-the-wild videos with challenging camera and object motion, where prior methods fail to produce high-quality renderings. Our project webpage is at dynibar.github.io.
translated by 谷歌翻译
我们介绍了一种超快速的收敛方法来重建从一组图像中捕获具有已知姿势的场景的图像的每场辐射场。该任务通常适用于新颖的视图综合,最近是由神经辐射领域(NERF)彻底改革为其最先进的质量和灵活性。然而,NERF及其变体需要漫长的训练时间来为单个场景的数小时到几天。相比之下,我们的方法实现了NERF相当的质量,并通过单个GPU在不到15分钟内从划痕中迅速收敛。我们采用由密度体素网格组成的表示,用于场景几何形状和具有浅网络的特征体素网格,用于复杂的视图依赖性外观。用明确和离散化卷表示的建模并不是新的,但我们提出了两种简单而非琐碎的技术,有助于快速收敛速度和高质量的输出。首先,我们介绍了体素密度的激活后插值,其能够以较低的网格分辨率产生尖锐的表面。其次,直接体素密度优化容易发生次优几何解决方案,因此我们通过施加多个前沿来强制优化过程。最后,对五个内向的基准评估表明,我们的方法匹配,如果没有超越Nerf的质量,但它只需15分钟即可从头开始训练新场景。
translated by 谷歌翻译
Figure 1. Given a monocular image sequence, NR-NeRF reconstructs a single canonical neural radiance field to represent geometry and appearance, and a per-time-step deformation field. We can render the scene into a novel spatio-temporal camera trajectory that significantly differs from the input trajectory. NR-NeRF also learns rigidity scores and correspondences without direct supervision on either. We can use the rigidity scores to remove the foreground, we can supersample along the time dimension, and we can exaggerate or dampen motion.
translated by 谷歌翻译
我们人类正在进入虚拟时代,确实想将动物带到虚拟世界中。然而,计算机生成的(CGI)毛茸茸的动物受到乏味的离线渲染的限制,更不用说交互式运动控制了。在本文中,我们提出了Artemis,这是一种新型的神经建模和渲染管道,用于生成具有外观和运动合成的清晰神经宠物。我们的Artemis可以实现互动运动控制,实时动画和毛茸茸的动物的照片真实渲染。我们的Artemis的核心是神经生成的(NGI)动物引擎,该动物发动机采用了有效的基于OCTREE的动物动画和毛皮渲染的代表。然后,该动画等同于基于显式骨骼翘曲的体素级变形。我们进一步使用快速的OCTREE索引和有效的体积渲染方案来生成外观和密度特征地图。最后,我们提出了一个新颖的阴影网络,以在外观和密度特征图中生成外观和不透明度的高保真细节。对于Artemis中的运动控制模块,我们将最新动物运动捕获方法与最近的神经特征控制方案相结合。我们引入了一种有效的优化方案,以重建由多视图RGB和Vicon相机阵列捕获的真实动物的骨骼运动。我们将所有捕获的运动馈送到神经角色控制方案中,以生成具有运动样式的抽象控制信号。我们将Artemis进一步整合到支持VR耳机的现有引擎中,提供了前所未有的沉浸式体验,用户可以与各种具有生动动作和光真实外观的虚拟动物进行紧密互动。我们可以通过https://haiminluo.github.io/publication/artemis/提供我们的Artemis模型和动态毛茸茸的动物数据集。
translated by 谷歌翻译
We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800×800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating the NeRF into a PlenOctree. In order to preserve viewdependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss, which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step can be used to reduce the training time, as we no longer need to wait for the NeRF training to converge fully. Our real-time neural rendering approach may potentially enable new applications such as 6-DOF industrial and product visualizations, as well as next generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as well; please visit the project page for the interactive online demo, as well as video and code: https://alexyu. net/plenoctrees.
translated by 谷歌翻译
我们提出了逐渐变化的辐射场(PDRF),这是一种从模糊图像中有效重建高质量辐射场的新方法。虽然当前的最先进的(SOTA)场景重建方法实现了光真实的渲染,因此清洁源视图会导致其性能在源视图受模糊影响的影响时会受到影响,这通常是野外图像的观察。以前的脱毛方法要么不考虑3D几何形状,要么是计算强度。为了解决这些问题,PDRF是Radiance Field建模中逐渐消除的方案,通过合并3D场景上下文来准确地模拟模糊。 PDRF进一步使用了有效的重要性采样方案,从而导致快速场景优化。具体而言,PDRF提出了一个粗射线渲染器,以快速估计体素密度和特征。然后,使用精细的体素渲染器来实现高质量的射线追踪。我们执行广泛的实验,并表明PDRF比以前的SOTA快15倍,同时在合成场景和真实场景上都取得更好的性能。
translated by 谷歌翻译
Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a diffentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website: https://github.com/facebookresearch/NSVF.
translated by 谷歌翻译
最近,我们看到了照片真实的人类建模和渲染的神经进展取得的巨大进展。但是,将它们集成到现有的下游应用程序中的现有网络管道中仍然具有挑战性。在本文中,我们提出了一种全面的神经方法,用于从密集的多视频视频中对人类表演进行高质量重建,压缩和渲染。我们的核心直觉是用一系列高效的神经技术桥接传统的动画网格工作流程。我们首先引入一个神经表面重建器,以在几分钟内进行高质量的表面产生。它与多分辨率哈希编码的截短签名距离场(TSDF)的隐式体积渲染相结合。我们进一步提出了一个混合神经跟踪器来生成动画网格,该网格将明确的非刚性跟踪与自我监督框架中的隐式动态变形结合在一起。前者将粗糙的翘曲返回到规范空间中,而后者隐含的一个隐含物进一步预测了使用4D哈希编码的位移,如我们的重建器中。然后,我们使用获得的动画网格讨论渲染方案,从动态纹理到各种带宽设置下的Lumigraph渲染。为了在质量和带宽之间取得复杂的平衡,我们通过首先渲染6个虚拟视图来涵盖表演者,然后进行闭塞感知的神经纹理融合,提出一个分层解决方案。我们证明了我们方法在各种平台上的各种基于网格的应用程序和照片真实的自由观看体验中的功效,即,通过移动AR插入虚拟人类的表演,或通过移动AR插入真实环境,或带有VR头戴式的人才表演。
translated by 谷歌翻译
静态场景的新颖观看综合在生产照片逼真的结果方面取得了显着的进步。但是,对于动态内容的沉浸式渲染,仍然存在关键挑战。例如,一种基于精英图像的渲染框架之一,多平面图像(MPI)为静态场景产生高新的观看合成质量,但面临着建模动态部分的难度。此外,通过MPI建模动态变化可能需要庞大的存储空间和长期推理时间,其阻碍了其在实时方案中的应用。在本文中,我们提出了一种新的颞型MPI表示,其能够在整个视频中以紧凑的时间编码整个视频中的丰富的3D和动态变化信息。由于高度紧凑且表现力的潜在基础和共同学习的系数,任意时间实例的新颖 - 实例将能够实时具有高视觉质量。我们显示给定的可比内存消耗,我们提出的时间 - MPI框架能够生成时间实例MPI,只有0.002秒,速度快3000倍,与其他状态相比,3DB更高的平均视图合成PSNR - 艺术动态场景建模框架。
translated by 谷歌翻译
Figure 1: Our method can synthesize novel views in both space and time from a single monocular video of a dynamic scene. Here we show video results with various configurations of fixing and interpolating view and time (left), as well as a visualization of the recovered scene geometry (right). Please view with Adobe Acrobat or KDE Okular to see animations.
translated by 谷歌翻译
神经辐射场(NERF)最近在新型视图合成中取得了令人印象深刻的结果。但是,以前的NERF作品主要关注以对象为中心的方案。在这项工作中,我们提出了360ROAM,这是一种新颖的场景级NERF系统,可以实时合成大型室内场景的图像并支持VR漫游。我们的系统首先从多个输入$ 360^\ circ $图像构建全向神经辐射场360NERF。然后,我们逐步估算一个3D概率的占用图,该概率占用图代表了空间密度形式的场景几何形状。跳过空的空间和上采样占据的体素本质上可以使我们通过以几何学意识的方式使用360NERF加速量渲染。此外,我们使用自适应划分和扭曲策略来减少和调整辐射场,以进一步改进。从占用地图中提取的场景的平面图可以为射线采样提供指导,并促进现实的漫游体验。为了显示我们系统的功效,我们在各种场景中收集了$ 360^\ Circ $图像数据集并进行广泛的实验。基线之间的定量和定性比较说明了我们在复杂室内场景的新型视图合成中的主要表现。
translated by 谷歌翻译
Point of View & TimeFigure 1: We propose D-NeRF, a method for synthesizing novel views, at an arbitrary point in time, of dynamic scenes with complex non-rigid geometries. We optimize an underlying deformable volumetric function from a sparse set of input monocular views without the need of ground-truth geometry nor multi-view images. The figure shows two scenes under variable points of view and time instances synthesised by the proposed model.
translated by 谷歌翻译
综合照片 - 现实图像和视频是计算机图形的核心,并且是几十年的研究焦点。传统上,使用渲染算法(如光栅化或射线跟踪)生成场景的合成图像,其将几何形状和材料属性的表示为输入。统称,这些输入定义了实际场景和呈现的内容,并且被称为场景表示(其中场景由一个或多个对象组成)。示例场景表示是具有附带纹理的三角形网格(例如,由艺术家创建),点云(例如,来自深度传感器),体积网格(例如,来自CT扫描)或隐式曲面函数(例如,截短的符号距离)字段)。使用可分辨率渲染损耗的观察结果的这种场景表示的重建被称为逆图形或反向渲染。神经渲染密切相关,并将思想与经典计算机图形和机器学习中的思想相结合,以创建用于合成来自真实观察图像的图像的算法。神经渲染是朝向合成照片现实图像和视频内容的目标的跨越。近年来,我们通过数百个出版物显示了这一领域的巨大进展,这些出版物显示了将被动组件注入渲染管道的不同方式。这种最先进的神经渲染进步的报告侧重于将经典渲染原则与学习的3D场景表示结合的方法,通常现在被称为神经场景表示。这些方法的一个关键优势在于它们是通过设计的3D-一致,使诸如新颖的视点合成捕获场景的应用。除了处理静态场景的方法外,我们还涵盖了用于建模非刚性变形对象的神经场景表示...
translated by 谷歌翻译