Figure 1. Given a monocular image sequence, NR-NeRF reconstructs a single canonical neural radiance field to represent geometry and appearance, and a per-time-step deformation field. We can render the scene into a novel spatio-temporal camera trajectory that significantly differs from the input trajectory. NR-NeRF also learns rigidity scores and correspondences without direct supervision on either. We can use the rigidity scores to remove the foreground, we can supersample along the time dimension, and we can exaggerate or dampen motion.
translated by 谷歌翻译
3D reconstruction and novel view synthesis of dynamic scenes from collections of single views recently gained increased attention. Existing work shows impressive results for synthetic setups and forward-facing real-world data, but is severely limited in the training speed and angular range for generating novel views. This paper addresses these limitations and proposes a new method for full 360{\deg} novel view synthesis of non-rigidly deforming scenes. At the core of our method are: 1) An efficient deformation module that decouples the processing of spatial and temporal information for acceleration at training and inference time; and 2) A static module representing the canonical scene as a fast hash-encoded neural radiance field. We evaluate the proposed approach on the established synthetic D-NeRF benchmark, that enables efficient reconstruction from a single monocular view per time-frame randomly sampled from a full hemisphere. We refer to this form of inputs as monocularized data. To prove its practicality for real-world scenarios, we recorded twelve challenging sequences with human actors by sampling single frames from a synchronized multi-view rig. In both cases, our method is trained significantly faster than previous methods (minutes instead of days) while achieving higher visual accuracy for generated novel views. Our source code and data is available at our project page https://graphics.tu-bs.de/publications/kappel2022fast.
translated by 谷歌翻译
https://video-nerf.github.io Figure 1. Our method takes a single casually captured video as input and learns a space-time neural irradiance field. (Top) Sample frames from the input video. (Middle) Novel view images rendered from textured meshes constructed from depth maps. (Bottom) Our results rendered from the proposed space-time neural irradiance field.
translated by 谷歌翻译
Point of View & TimeFigure 1: We propose D-NeRF, a method for synthesizing novel views, at an arbitrary point in time, of dynamic scenes with complex non-rigid geometries. We optimize an underlying deformable volumetric function from a sparse set of input monocular views without the need of ground-truth geometry nor multi-view images. The figure shows two scenes under variable points of view and time instances synthesised by the proposed model.
translated by 谷歌翻译
综合照片 - 现实图像和视频是计算机图形的核心,并且是几十年的研究焦点。传统上,使用渲染算法(如光栅化或射线跟踪)生成场景的合成图像,其将几何形状和材料属性的表示为输入。统称,这些输入定义了实际场景和呈现的内容,并且被称为场景表示(其中场景由一个或多个对象组成)。示例场景表示是具有附带纹理的三角形网格(例如,由艺术家创建),点云(例如,来自深度传感器),体积网格(例如,来自CT扫描)或隐式曲面函数(例如,截短的符号距离)字段)。使用可分辨率渲染损耗的观察结果的这种场景表示的重建被称为逆图形或反向渲染。神经渲染密切相关,并将思想与经典计算机图形和机器学习中的思想相结合,以创建用于合成来自真实观察图像的图像的算法。神经渲染是朝向合成照片现实图像和视频内容的目标的跨越。近年来,我们通过数百个出版物显示了这一领域的巨大进展,这些出版物显示了将被动组件注入渲染管道的不同方式。这种最先进的神经渲染进步的报告侧重于将经典渲染原则与学习的3D场景表示结合的方法,通常现在被称为神经场景表示。这些方法的一个关键优势在于它们是通过设计的3D-一致,使诸如新颖的视点合成捕获场景的应用。除了处理静态场景的方法外,我们还涵盖了用于建模非刚性变形对象的神经场景表示...
translated by 谷歌翻译
Figure 1: Our method can synthesize novel views in both space and time from a single monocular video of a dynamic scene. Here we show video results with various configurations of fixing and interpolating view and time (left), as well as a visualization of the recovered scene geometry (right). Please view with Adobe Acrobat or KDE Okular to see animations.
translated by 谷歌翻译
我们介绍了一个自由视的渲染方法 - Humannerf - 这对人类进行了复杂的身体运动的给定单曲视频工作,例如,来自YouTube的视频。我们的方法可以在任何帧中暂停视频,并从任意新相机视点呈现对象,甚至是该特定帧和身体姿势的完整360度摄像机路径。这项任务特别具有挑战性,因为它需要合成身体的光电型细节,如从输入视频中可能不存在的各种相机角度所见,以及合成布折叠和面部外观的细细节。我们的方法优化了在规范T型姿势中的人的体积表示,同时通过运动场,该运动场通过向后的警报将估计的规范表示映射到视频的每个帧。运动场分解成骨骼刚性和非刚性运动,由深网络产生。我们对现有工作显示出显着的性能改进,以及从移动人类的单眼视频的令人尖锐的观点渲染的阐释示例,以挑战不受控制的捕获场景。
translated by 谷歌翻译
我们呈现虚拟弹性物体(VEOS):虚拟物体,不仅看起来像他们的真实同行,而且也表现得像他们一样,即使在进行新颖的互动时也是如此。实现这一挑战:不仅必须捕获对象,包括对它们上的物理力量,然后忠实地重建和呈现,而且还发现和模拟了合理的材料参数。要创建VEOS,我们构建了一个多视图捕获系统,捕获压缩空气流的影响下的物体。建立近期型号动态神经辐射区域的进步,我们重建了物体和相应的变形字段。我们建议使用可差异的基于粒子的模拟器来使用这些变形字段来查找代表性的材料参数,这使我们能够运行新的模拟。为了渲染模拟对象,我们设计了一种用神经辐射场将模拟结果集成的方法。结果方法适用于各种场景:它可以处理由非均匀材料组成的物体,具有非常不同的形状,它可以模拟与其他虚拟对象的交互。我们在各种力字段下使用12个对象的新收集的数据集介绍了我们的结果,这将与社区共享。
translated by 谷歌翻译
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene. State-of-the-art methods based on temporally varying Neural Radiance Fields (aka dynamic NeRFs) have shown impressive results on this task. However, for long videos with complex object motions and uncontrolled camera trajectories, these methods can produce blurry or inaccurate renderings, hampering their use in real-world applications. Instead of encoding the entire dynamic scene within the weights of an MLP, we present a new approach that addresses these limitations by adopting a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views in a scene-motion-aware manner. Our system retains the advantages of prior methods in its ability to model complex scenes and view-dependent effects, but also enables synthesizing photo-realistic novel views from long videos featuring complex scene dynamics with unconstrained camera trajectories. We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets, and also apply our approach to in-the-wild videos with challenging camera and object motion, where prior methods fail to produce high-quality renderings. Our project webpage is at dynibar.github.io.
translated by 谷歌翻译
捕获一般的变形场景对于许多计算机图形和视觉应用至关重要,当只有单眼RGB视频可用时,这尤其具有挑战性。竞争方法假设密集的点轨道,3D模板,大规模训练数据集或仅捕获小规模的变形。与这些相反,我们的方法UB4D在挑战性的情况下超过了先前的艺术状态,而没有做出这些假设。我们的技术包括两个新的,在非刚性3D重建的背景下,组件,即1)1)针对非刚性场景的基于坐标的和隐性的神经表示,这使动态场景无偏重建,2)新颖的新颖。动态场景流量损失,可以重建较大的变形。我们的新数据集(将公开可用)的结果表明,就表面重建精度和对大变形的鲁棒性而言,对最新技术的明显改善。访问项目页面https://4dqv.mpi-inf.mpg.de/ub4d/。
translated by 谷歌翻译
我们向渲染和时间(4D)重建人类的渲染和时间(4D)重建的神经辐射场,通过稀疏的摄像机捕获或甚至来自单眼视频。我们的方法将思想与神经场景表示,新颖的综合合成和隐式统计几何人称的人类表示相结合,耦合使用新颖的损失功能。在先前使用符号距离功能表示的结构化隐式人体模型,而不是使用统一的占用率来学习具有统一占用的光域字段。这使我们能够从稀疏视图中稳健地融合信息,并概括超出在训练中观察到的姿势或视图。此外,我们应用几何限制以共同学习观察到的主题的结构 - 包括身体和衣服 - 并将辐射场正规化为几何合理的解决方案。在多个数据集上的广泛实验证明了我们方法的稳健性和准确性,其概括能力显着超出了一系列的姿势和视图,以及超出所观察到的形状的统计外推。
translated by 谷歌翻译
最近,神经辐射场(NERF)正在彻底改变新型视图合成(NVS)的卓越性能。但是,NERF及其变体通常需要进行冗长的每场训练程序,其中将多层感知器(MLP)拟合到捕获的图像中。为了解决挑战,已经提出了体素网格表示,以显着加快训练的速度。但是,这些现有方法只能处理静态场景。如何开发有效,准确的动态视图合成方法仍然是一个开放的问题。将静态场景的方法扩展到动态场景并不简单,因为场景几何形状和外观随时间变化。在本文中,基于素素网格优化的最新进展,我们提出了一种快速变形的辐射场方法来处理动态场景。我们的方法由两个模块组成。第一个模块采用变形网格来存储3D动态功能,以及使用插值功能将观测空间中的3D点映射到规范空间的变形的轻巧MLP。第二个模块包含密度和颜色网格,以建模场景的几何形状和密度。明确对阻塞进行了建模,以进一步提高渲染质量。实验结果表明,我们的方法仅使用20分钟的训练就可以实现与D-NERF相当的性能,该训练比D-NERF快70倍以上,这清楚地证明了我们提出的方法的效率。
translated by 谷歌翻译
Image view synthesis has seen great success in reconstructing photorealistic visuals, thanks to deep learning and various novel representations. The next key step in immersive virtual experiences is view synthesis of dynamic scenes. However, several challenges exist due to the lack of high-quality training datasets, and the additional time dimension for videos of dynamic scenes. To address this issue, we introduce a multi-view video dataset, captured with a custom 10-camera rig in 120FPS. The dataset contains 96 high-quality scenes showing various visual effects and human interactions in outdoor scenes. We develop a new algorithm, Deep 3D Mask Volume, which enables temporally-stable view extrapolation from binocular videos of dynamic scenes, captured by static cameras. Our algorithm addresses the temporal inconsistency of disocclusions by identifying the error-prone areas with a 3D mask volume, and replaces them with static background observed throughout the video. Our method enables manipulation in 3D space as opposed to simple 2D masks, We demonstrate better temporal stability than frame-by-frame static view synthesis methods, or those that use 2D masks. The resulting view synthesis videos show minimal flickering artifacts and allow for larger translational movements.
translated by 谷歌翻译
Representing and synthesizing novel views in real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or utilizing temporal information between several adjacent frames without considering the underlying background distribution in the entire scene or the transmittance over the ray dimension, limiting their performance on static and occlusion areas. Our approach $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, which is called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively. Each ray sample is given an additional occlusion weight to indicate the transmittance lying in the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
translated by 谷歌翻译
我们提出了一种神经动力构造(NDR),这是一种无模板的方法,可从单眼RGB-D摄像机中恢复动态场景的高保真几何形状和动作。在NDR中,我们采用神经隐式函数进行表面表示和渲染,使捕获的颜色和深度可以完全利用以共同优化表面和变形。为了表示和限制非刚性变形,我们提出了一种新型的神经可逆变形网络,以便自动满足任意两个帧之间的循环一致性。考虑到动态场景的表面拓扑可能会随着时间的流逝而发生变化,我们采用一种拓扑感知的策略来构建融合框架的拓扑变化对应关系。NDR还以全球优化的方式进一步完善了相机的姿势。公共数据集和我们收集的数据集的实验表明,NDR的表现优于现有的单眼动态重建方法。
translated by 谷歌翻译
对于场景重建和新型视图综合的数量表示形式的普及最近,人们的普及使重点放在以高视觉质量和实时为实时的体积内容动画上。尽管基于学习功能的隐性变形方法可以产生令人印象深刻的结果,但它们是艺术家和内容创建者的“黑匣子”,但它们需要大量的培训数据才能有意义地概括,并且在培训数据之外不会产生现实的外推。在这项工作中,我们通过引入实时的音量变形方法来解决这些问题,该方法是实时的,易于使用现成的软件编辑,并且可以令人信服地推断出来。为了证明我们方法的多功能性,我们将其应用于两种情况:基于物理的对象变形和触发性,其中使用Blendshapes控制着头像。我们还进行了彻底的实验,表明我们的方法与两种体积方法相比,结合了基于网格变形的隐式变形和方法。
translated by 谷歌翻译
本文旨在减少透明辐射场的渲染时间。一些最近的作品用图像编码器配备了神经辐射字段,能够跨越场景概括,这避免了每场景优化。但是,它们的渲染过程通常很慢。主要因素是,在推断辐射场时,它们在空间中的大量点。在本文中,我们介绍了一个混合场景表示,它结合了最佳的隐式辐射场和显式深度映射,以便有效渲染。具体地,我们首先构建级联成本量,以有效地预测场景的粗糙几何形状。粗糙几何允许我们在场景表面附近的几个点来样,并显着提高渲染速度。该过程是完全可疑的,使我们能够仅从RGB图像共同学习深度预测和辐射现场网络。实验表明,该方法在DTU,真正的前瞻性和NERF合成数据集上展示了最先进的性能,而不是比以前的最可推广的辐射现场方法快至少50倍。我们还展示了我们的方法实时综合动态人类执行者的自由观点视频。代码将在https://zju3dv.github.io/enerf/处提供。
translated by 谷歌翻译
Figure 1: Given a monocular portrait video sequence of a person, we reconstruct a dynamic neural radiance field representing a 4D facial avatar, which allows us to synthesize novel head poses as well as changes in facial expressions.
translated by 谷歌翻译
where the highest resolution is required, using facial performance capture as a case in point.
translated by 谷歌翻译
我们提出了一些动态神经辐射场(FDNERF),这是第一种基于NERF的方法,能够根据少量动态图像重建和表达3D面的表达编辑。与需要密集图像作为输入的现有动态NERF不同,并且只能为单个身份建模,我们的方法可以使跨不同人的不同人进行面对重建。与设计用于建模静态场景的最先进的几杆NERF相比,提出的FDNERF接受视图的动态输入,并支持任意的面部表达编辑,即产生具有输入超出输入的新表达式的面孔。为了处理动态输入之间的不一致之处,我们引入了精心设计的条件特征翘曲(CFW)模块,以在2D特征空间中执行表达条件的翘曲,这也是身份自适应和3D约束。结果,不同表达式的特征被转换为目标的特征。然后,我们根据这些视图一致的特征构建一个辐射场,并使用体积渲染来合成建模面的新型视图。进行定量和定性评估的广泛实验表明,我们的方法在3D面重建和表达编辑任务上都优于现有的动态和几乎没有射击的NERF。我们的代码和模型将在接受后提供。
translated by 谷歌翻译