In this work, we present a novel method for 3D face reconstruction from multi-view RGB images. Unlike previous methods built upon 3D morphable models (3DMMs), our method leverages an implicit representation to encode rich geometric features. Our overall pipeline consists of two major components: a geometry network, which learns a deformable neural signed distance function (SDF) as the 3D face representation, and a rendering network, which learns to render on-surface points of the neural SDF to match the input images via self-supervised optimization. To handle in-the-wild sparse-view inputs of the same target with different expressions at test time, we further propose a residual latent code to effectively expand the shape space of the learned implicit face representation, as well as a novel view-switch loss to enforce consistency across different views. Experimental results on several benchmark datasets demonstrate that our approach outperforms alternative baselines and achieves superior face reconstruction compared to state-of-the-art methods.
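To make the conditioning idea concrete, below is a minimal PyTorch sketch of an SDF MLP conditioned on a base identity code plus a residual latent code; the class name, layer sizes, and code dimensions are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConditionedSDF(nn.Module):
    """Toy SDF MLP conditioned on a base identity code plus a residual code."""
    def __init__(self, code_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz, base_code, residual_code):
        # The residual code expands the learned shape space around the base identity.
        code = (base_code + residual_code).expand(xyz.shape[0], -1)
        return self.net(torch.cat([xyz, code], dim=-1))

# Example query: 1024 sample points, one identity code, one trainable residual code.
sdf = ConditionedSDF()
pts = torch.rand(1024, 3) * 2 - 1
base, res = torch.zeros(1, 64), torch.zeros(1, 64, requires_grad=True)
d = sdf(pts, base, res)          # (1024, 1) signed distances
```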
We propose a novel 3D morphable model for complete human heads based on hybrid neural fields. At the core of our model lies a neural parametric representation which disentangles identity and expressions in disjoint latent spaces. To this end, we capture a person's identity in a canonical space as a signed distance field (SDF), and model facial expressions with a neural deformation field. In addition, our representation achieves high-fidelity local detail by introducing an ensemble of local fields centered around facial anchor points. To facilitate generalization, we train our model on a newly-captured dataset of over 2200 head scans from 124 different identities using a custom high-end 3D scanning setup. Our dataset significantly exceeds comparable existing datasets, both with respect to quality and completeness of geometry, averaging around 3.5M mesh faces per scan. Finally, we demonstrate that our approach outperforms state-of-the-art methods by a significant margin in terms of fitting error and reconstruction quality.
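The local-field idea can be illustrated with a small PyTorch sketch that blends a shared per-anchor decoder with Gaussian weights centered at anchor points; `LocalFieldEnsemble`, the shared decoder, and all dimensions are assumptions for illustration rather than the model described above.

```python
import torch
import torch.nn as nn

class LocalFieldEnsemble(nn.Module):
    """Blend per-anchor SDF predictions with Gaussian weights around facial anchors."""
    def __init__(self, anchors, code_dim=32, sigma=0.1):
        super().__init__()
        self.register_buffer("anchors", anchors)              # (K, 3) anchor positions
        self.sigma = sigma
        self.codes = nn.Parameter(torch.zeros(anchors.shape[0], code_dim))  # per-anchor codes
        self.mlp = nn.Sequential(                              # shared local decoder
            nn.Linear(3 + code_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, x):                                      # x: (N, 3)
        rel = x[:, None, :] - self.anchors[None, :, :]         # (N, K, 3) local coordinates
        w = torch.softmax(-rel.pow(2).sum(-1) / (2 * self.sigma ** 2), dim=-1)  # (N, K)
        codes = self.codes[None].expand(x.shape[0], -1, -1)    # (N, K, C)
        local_sdf = self.mlp(torch.cat([rel, codes], dim=-1)).squeeze(-1)        # (N, K)
        return (w * local_sdf).sum(-1, keepdim=True)           # blended SDF, (N, 1)

field = LocalFieldEnsemble(anchors=torch.rand(16, 3) * 2 - 1)
print(field(torch.rand(8, 3)).shape)   # torch.Size([8, 1])
```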
Although multi-layer perceptron (MLP)-based methods trained via self-supervision have achieved encouraging results in shape and color recovery, they typically suffer from heavy computational cost when learning deep implicit surface representations. Since rendering each pixel requires a forward network inference, synthesizing a whole image is extremely compute-intensive. To tackle these challenges, we propose an efficient coarse-to-fine approach to recover textured meshes from multi-view images in this paper. Specifically, a differentiable Poisson solver is employed to represent the object's shape, which is able to produce topology-agnostic and watertight surfaces. To account for depth information, we optimize the shape geometry by minimizing the difference between the rendered mesh depth and the depth predicted by multi-view stereo. In contrast to implicit neural representations for shape and color, we introduce a physically based inverse rendering scheme to jointly estimate the environment lighting and the object's reflectance, which is able to render high-resolution images in real time. The texture of the reconstructed mesh is interpolated from a learnable dense texture grid. We have conducted extensive experiments on several multi-view stereo datasets, whose promising results demonstrate the efficacy of our proposed method. The code is available at https://github.com/l1346792580123/diff.
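As a hedged illustration of the depth term described above, the snippet below computes a masked L1 loss between a differentiably rendered depth map and an MVS-predicted depth map; the function name and the assumption that a valid-pixel mask is available are ours, and the differentiable renderer itself is out of scope.

```python
import torch

def depth_consistency_loss(rendered_depth, mvs_depth, valid_mask):
    """L1 loss between differentiably rendered depth and MVS-predicted depth.

    rendered_depth, mvs_depth: (H, W) tensors; valid_mask: (H, W) bool tensor
    marking pixels where the MVS prediction is reliable and the mesh is hit.
    """
    diff = (rendered_depth - mvs_depth).abs()
    return diff[valid_mask].mean()

# Toy usage with random maps standing in for renderer and MVS outputs.
H, W = 64, 64
loss = depth_consistency_loss(torch.rand(H, W), torch.rand(H, W),
                              torch.rand(H, W) > 0.5)
```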
Traditional morphable face models provide fine-grained control over expressions but cannot easily capture geometric and appearance details. Neural volumetric representations are photo-realistic but hard to animate and do not generalize well to unseen expressions. To tackle this problem, we propose IMavatar (Implicit Morphable avatar), a novel method for learning implicit head avatars from monocular videos. Inspired by the fine-grained control mechanisms offered by traditional 3DMMs, we represent expression- and pose-related deformations via learned blendshape and skinning fields. These attributes are pose-independent and can be used to morph the canonical geometry and texture fields given novel expression and pose parameters. We employ ray tracing and iterative root-finding to locate the canonical surface intersection for each pixel. A key contribution is our novel analytical gradient formulation that enables end-to-end training of IMavatars from videos. We show quantitatively and qualitatively that our method improves geometry and covers a more complete expression space compared to state-of-the-art methods.
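The surface-localization step can be sketched with a generic secant root-finder along a ray, assuming the zero crossing has already been bracketed by coarse marching; this is a standard technique rather than the paper's exact formulation, and the function name and iteration count are illustrative.

```python
import torch

def secant_surface_intersection(sdf, origin, direction, t_low, t_high, iters=8):
    """Secant root-finding for the first zero crossing of an SDF along a ray.

    Assumes sdf(origin + t_low * direction) > 0 and sdf(origin + t_high * direction) < 0,
    i.e. the crossing has already been bracketed (e.g. by coarse ray marching).
    """
    f_low = sdf(origin + t_low * direction)
    f_high = sdf(origin + t_high * direction)
    for _ in range(iters):
        t_mid = t_low - f_low * (t_high - t_low) / (f_high - f_low + 1e-8)
        f_mid = sdf(origin + t_mid * direction)
        # Keep the sub-interval that still brackets the sign change.
        if f_mid > 0:
            t_low, f_low = t_mid, f_mid
        else:
            t_high, f_high = t_mid, f_mid
    return origin + t_mid * direction

# Toy SDF: unit sphere.  The intersection lies one unit from the sphere center.
sphere = lambda p: p.norm() - 1.0
hit = secant_surface_intersection(sphere, torch.tensor([0., 0., -3.]),
                                  torch.tensor([0., 0., 1.]),
                                  torch.tensor(1.0), torch.tensor(3.0))
```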
Recovering the geometry of a human head from a single image, while factorizing materials and illumination, is a severely ill-posed problem that requires prior knowledge to be solved. Methods based on 3D Morphable Models (3DMMs), and their combination with differentiable renderers, have shown promising results. However, the expressiveness of 3DMMs is limited, and they typically yield over-smoothed and identity-agnostic 3D shapes restricted to the face region. Recently, neural fields that parameterize geometry with multilayer perceptrons have achieved highly accurate full-head reconstructions. The versatility of these representations has also proven effective for disentangling geometry, materials and lighting. However, these methods require dozens of input images. In this paper, we introduce SIRA, a method that, from a single image, reconstructs human head avatars with high-fidelity geometry and factorized lights and surface materials. Our key ingredients are two data-driven statistical models based on neural fields that resolve the ambiguities of single-view 3D surface reconstruction and appearance decomposition. Experiments show that SIRA obtains state-of-the-art results in 3D head reconstruction while successfully disentangling the global illumination as well as the diffuse and specular albedos. Furthermore, our reconstructions are amenable to physically based appearance editing and head model relighting.
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing. While studies over extending 2D StyleGAN to 3D faces have emerged, a corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing. In this paper, we study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures. The problem is ill-posed: innumerable compositions of shape and texture could be rendered to the current image. Furthermore, with the limited capacity of a global latent code, 2D inversion methods cannot preserve faithful shape and texture at the same time when applied to 3D models. To solve this problem, we devise an effective self-training scheme to constrain the learning of inversion. The learning is done efficiently without any real-world 2D-3D training pairs but proxy samples generated from a 3D GAN. In addition, apart from a global latent code that captures the coarse shape and texture information, we augment the generation network with a local branch, where pixel-aligned features are added to faithfully reconstruct face details. We further consider a new pipeline to perform 3D view-consistent editing. Extensive experiments show that our method outperforms state-of-the-art inversion methods in both shape and texture reconstruction quality. Code and data will be released.
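Pixel-aligned features of the kind used by the local branch are typically gathered by projecting 3D points into the image and bilinearly sampling an encoder feature map; the sketch below shows this in PyTorch under a simple pinhole-camera assumption, with names and shapes chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def pixel_aligned_features(feat_map, points, K):
    """Sample image features at the 2D projections of 3D points.

    feat_map: (1, C, H, W) feature map from an image encoder.
    points:   (N, 3) points in the camera frame (z > 0).
    K:        (3, 3) pinhole intrinsics.
    Returns (N, C) pixel-aligned features gathered with bilinear interpolation.
    """
    _, _, H, W = feat_map.shape
    uvw = (K @ points.T).T                        # (N, 3) homogeneous pixel coords
    uv = uvw[:, :2] / uvw[:, 2:3]                 # perspective divide -> pixels
    # Normalize to [-1, 1] as required by grid_sample (x = u, y = v).
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    grid = grid.view(1, 1, -1, 2)                 # (1, 1, N, 2)
    sampled = F.grid_sample(feat_map, grid, align_corners=True)  # (1, C, 1, N)
    return sampled[0, :, 0, :].T                  # (N, C)

feats = pixel_aligned_features(torch.rand(1, 32, 128, 128),
                               torch.rand(64, 3) + torch.tensor([0., 0., 2.]),
                               torch.tensor([[100., 0., 64.],
                                             [0., 100., 64.],
                                             [0., 0., 1.]]))
```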
We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest, beyond just the body shape, which in turn, additionally helps modeling accessories, hair, and loose clothing. Owing to this, we present a complete 3D transformer-based attention framework which, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition, as a result of a single end-to-end model, trained semi-supervised, and with no additional postprocessing. We show that our S3F model surpasses the previous state-of-the-art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation. Moreover, we show that the proposed methodology allows novel view synthesis, relighting, and re-posing the reconstruction, and can naturally be extended to handle multiple input images (e.g. different views of a person, or the same view, in different poses, in video). Finally, we demonstrate the editing capabilities of our model for 3D virtual try-on applications.
We present neural radiance fields for the rendering and temporal (4D) reconstruction of humans, captured by sparse cameras or even from monocular video. Our approach combines ideas from neural scene representation, novel view synthesis, and implicit statistical geometric human representations, coupled using novel loss functions. Instead of learning a radiance field with a uniform occupancy prior, we use a structured implicit human body model represented with signed distance functions. This allows us to robustly fuse information from sparse views and to generalize well beyond the poses or views observed in training. Moreover, we apply geometric constraints to co-learn the structure of the observed subject, including both body and clothing, and to regularize the radiance field toward geometrically plausible solutions. Extensive experiments on multiple datasets demonstrate the robustness and accuracy of our approach, its generalization capability significantly beyond the observed poses and views, and statistical extrapolation beyond the observed shapes.
We propose a parametric model that maps free-view images into a vector space encoding facial shape, expression and appearance using a neural radiance field, namely Morphable Facial NeRF (MoFaNeRF). Specifically, MoFaNeRF takes the encoded facial shape, expression and appearance codes, together with spatial coordinates and view directions, as input to an MLP, and outputs the radiance of the spatial point for photo-realistic image synthesis. Compared with conventional 3D morphable models (3DMMs), MoFaNeRF shows superiority in directly synthesizing photo-realistic facial details, even for the eyes, mouth, and beard. Moreover, continuous face morphing can be easily achieved by interpolating the input shape, expression and appearance codes. By introducing identity-specific modulation and a texture encoder, our model synthesizes accurate photometric details and shows strong representation ability. Our model demonstrates strong capability in multiple applications, including image-based fitting, random generation, face rigging, face editing and novel view synthesis. Experiments show that our method achieves higher representation ability than previous parametric models and competitive performance in several applications. To the best of our knowledge, our work is the first facial parametric model built upon a neural radiance field that can be used for fitting, generation and manipulation. Our code and model are released at https://github.com/zhuhao-nju/mofanerf.
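A minimal sketch of such a conditioned radiance field is shown below: an MLP takes a spatial point, a view direction, and shape/expression/appearance codes, and returns density and color. The module name, hidden sizes, and code dimensions are assumptions, not MoFaNeRF's actual architecture.

```python
import torch
import torch.nn as nn

class ConditionedRadianceField(nn.Module):
    """Toy morphable NeRF: radiance conditioned on shape/expression/appearance codes."""
    def __init__(self, dim_s=64, dim_e=32, dim_a=64, hidden=256):
        super().__init__()
        self.geo = nn.Sequential(
            nn.Linear(3 + dim_s + dim_e, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + hidden),            # density + geometry feature
        )
        self.rgb = nn.Sequential(
            nn.Linear(hidden + 3 + dim_a, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),   # view-dependent color
        )

    def forward(self, x, view_dir, shape_c, expr_c, app_c):
        n = x.shape[0]
        h = self.geo(torch.cat([x, shape_c.expand(n, -1), expr_c.expand(n, -1)], -1))
        sigma, feat = h[:, :1], h[:, 1:]
        rgb = self.rgb(torch.cat([feat, view_dir, app_c.expand(n, -1)], -1))
        return sigma, rgb

nerf = ConditionedRadianceField()
sigma, rgb = nerf(torch.rand(256, 3), torch.rand(256, 3),
                  torch.zeros(1, 64), torch.zeros(1, 32), torch.zeros(1, 64))
```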
We present PhoMoH, a neural network methodology to construct generative models of photorealistic 3D geometry and appearance of human heads including hair, beards, clothing and accessories. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photorealistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and allow the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.
In this report, we focus on reconstructing clothed humans in the canonical space given multiple views and poses of a human as the input. To achieve this, we utilize the geometric prior of the SMPLX model in the canonical space to learn the implicit representation for geometry reconstruction. Based on the observation that the topology of the posed mesh and the mesh in the canonical space is consistent, we propose to learn latent codes on the posed mesh by leveraging multiple input images and then assign the latent codes to the mesh in the canonical space. Specifically, we first leverage normal and geometry networks to extract the feature vector for each vertex on the SMPLX mesh. Normal maps are adopted for better generalization to unseen images compared to 2D images. Then, features for each vertex on the posed mesh from multiple images are integrated by MLPs. The integrated features, acting as the latent codes, are anchored to the SMPLX mesh in the canonical space. Finally, the latent code for each 3D point is extracted and used to compute the SDF. Our method for reconstructing the human shape in the canonical pose achieves 3rd-place performance in the WCPA MVP-Human Body Challenge.
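A rough sketch of the multi-view feature integration step might look like the following, where per-view vertex features are encoded independently and mean-pooled into a per-vertex latent code; the module name, dimensions, and pooling choice are assumptions rather than the report's exact design.

```python
import torch
import torch.nn as nn

class VertexFeatureFusion(nn.Module):
    """Fuse per-vertex features extracted from several input views into one latent code."""
    def __init__(self, feat_dim=64, code_dim=32):
        super().__init__()
        self.per_view = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                      nn.Linear(128, 128))
        self.post = nn.Linear(128, code_dim)

    def forward(self, view_feats):
        # view_feats: (num_views, num_vertices, feat_dim), e.g. sampled from normal maps.
        h = self.per_view(view_feats)        # encode each view independently
        h = h.mean(dim=0)                    # order-invariant pooling across views
        return self.post(h)                  # (num_vertices, code_dim) latent codes

fusion = VertexFeatureFusion()
codes = fusion(torch.rand(4, 10475, 64))     # 10475 is the SMPL-X vertex count
```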
We propose a differentiable sphere tracing algorithm to bridge the gap between inverse graphics methods and the recently proposed deep learning based implicit signed distance function. Due to the nature of the implicit function, the rendering process requires tremendous function queries, which is particularly problematic when the function is represented as a neural network. We optimize both the forward and backward passes of our rendering layer to make it run efficiently with affordable memory consumption on a commodity graphics card. Our rendering method is fully differentiable such that losses can be directly computed on the rendered 2D observations, and the gradients can be propagated backwards to optimize the 3D geometry. We show that our rendering method can effectively reconstruct accurate 3D shapes from various inputs, such as sparse depth and multi-view images, through inverse optimization. With the geometry based reasoning, our 3D shape prediction methods show excellent generalization capability and robustness against various noises. * Work done while Shaohui Liu was an academic guest at ETH Zurich.
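For reference, a plain (non-optimized) sphere-tracing loop looks like the sketch below: each ray advances by the queried SDF value, which is a safe step because the SDF bounds the distance to the nearest surface. This illustrates the rendering procedure generically and does not reproduce the paper's memory and speed optimizations.

```python
import torch

def sphere_trace(sdf, origins, directions, t_max=5.0, iters=64, eps=1e-4):
    """Batched sphere tracing: step each ray by the SDF value until the surface is hit.

    sdf: callable mapping (N, 3) points to (N,) signed distances.
    origins, directions: (N, 3) rays (directions assumed unit length).
    Returns hit points (N, 3) and a boolean convergence mask (N,).
    """
    t = torch.zeros(origins.shape[0])
    for _ in range(iters):
        p = origins + t[:, None] * directions
        d = sdf(p)
        t = torch.clamp(t + d, max=t_max)     # safe step: SDF bounds distance to surface
        if (d.abs() < eps).all():
            break
    p = origins + t[:, None] * directions
    return p, sdf(p).abs() < eps

# Toy scene: unit sphere at the origin, four identical rays that hit it head-on.
unit_sphere = lambda p: p.norm(dim=-1) - 1.0
pts, hit = sphere_trace(unit_sphere,
                        torch.tensor([[0., 0., -3.]]).repeat(4, 1),
                        torch.tensor([[0., 0., 1.]]).repeat(4, 1))
```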
We propose Neural Dynamic Reconstruction (NDR), a template-free method to recover high-fidelity geometry and motion of a dynamic scene from a monocular RGB-D camera. In NDR, we adopt a neural implicit function for surface representation and rendering such that the captured color and depth can be fully exploited to jointly optimize the surface and deformations. To represent and constrain the non-rigid deformations, we propose a novel neural invertible deformation network such that cycle consistency between any two frames is automatically satisfied. Considering that the surface topology of a dynamic scene may change over time, we employ a topology-aware strategy to construct topology-variant correspondences for the fused frames. NDR further refines the camera poses in a global optimization manner. Experiments on public datasets and our collected dataset demonstrate that NDR outperforms existing monocular dynamic reconstruction methods.
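One common way to obtain an exactly invertible deformation is an affine coupling layer, whose inverse can be computed in closed form, so mapping a point observation-to-canonical and back returns it unchanged and cycle consistency holds by construction; the sketch below shows a single such layer conditioned on a per-frame code. This is a generic construction, not necessarily NDR's network, and in practice several layers with alternating coordinate splits would be stacked.

```python
import torch
import torch.nn as nn

class CouplingDeform(nn.Module):
    """Invertible 3D deformation built from an affine coupling layer (RealNVP-style)."""
    def __init__(self, cond_dim=16, hidden=128):
        super().__init__()
        # z is transformed given (x, y) and a per-frame condition code, so it can be inverted.
        self.net = nn.Sequential(nn.Linear(2 + cond_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))   # scale and shift for z

    def forward(self, p, cond):                          # observation -> canonical
        xy, z = p[:, :2], p[:, 2:]
        s, t = self.net(torch.cat([xy, cond.expand(p.shape[0], -1)], -1)).chunk(2, -1)
        return torch.cat([xy, z * torch.exp(s) + t], -1)

    def inverse(self, q, cond):                          # canonical -> observation
        xy, z = q[:, :2], q[:, 2:]
        s, t = self.net(torch.cat([xy, cond.expand(q.shape[0], -1)], -1)).chunk(2, -1)
        return torch.cat([xy, (z - t) * torch.exp(-s)], -1)

layer = CouplingDeform()
p = torch.rand(8, 3)
code = torch.randn(1, 16)
# The round trip is exact up to floating-point error: cycle consistency for free.
assert torch.allclose(layer.inverse(layer(p, code), code), p, atol=1e-5)
```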
In this paper, we propose ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image. Existing approaches to digitize 3D humans struggle to handle pose variations and recover details. Also, they do not produce models that are animation ready. In contrast, ARCH is a learned pose-aware model that produces detailed 3D rigged full-body human avatars from a single unconstrained RGB image. A Semantic Space and a Semantic Deformation Field are created using a parametric 3D body estimator. They allow the transformation of 2D/3D clothed humans into a canonical space, reducing ambiguities in geometry caused by pose variations and occlusions in training data. Detailed surface geometry and appearance are learned using an implicit function representation with spatial local features. Furthermore, we propose additional per-pixel supervision on the 3D reconstruction using opacity-aware differentiable rendering. Our experiments indicate that ARCH increases the fidelity of the reconstructed humans. We obtain more than 50% lower reconstruction errors for standard metrics compared to state-of-the-art methods on public datasets. We also show numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.
Recent progress in 4D implicit representations focuses on globally controlling shape and motion with low-dimensional latent vectors, which is prone to missing surface details and accumulating tracking errors. While many deep local representations have shown promising results for 3D shape modeling, their 4D counterparts do not yet exist. In this paper, we fill this gap by proposing a novel Local 4D implicit Representation for Dynamic clothed humans, named LoRD, which has the merits of both 4D human modeling and local representations, and enables high-fidelity reconstruction with detailed surface deformations such as clothing wrinkles. In particular, our key insight is to encourage the network to learn latent codes of local part-level representations, capable of explaining the local geometry and temporal deformations. For inference at test time, we first estimate the inner-body skeletal motion to track local parts at each time step, and then optimize the latent codes of each part via auto-decoding based on different types of observed data. Extensive experiments demonstrate that the proposed method has strong capability for representing 4D humans, and outperforms state-of-the-art methods on practical applications, including 4D reconstruction from sparse points and non-rigid depth fusion, both qualitatively and quantitatively.
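The test-time auto-decoding step can be sketched as follows: the decoder weights stay frozen and only the per-part latent code is optimized to fit the observed data. The decoder stand-in, loss, and hyperparameters here are illustrative assumptions, not the paper's settings.

```python
import torch

def auto_decode(decoder, observed_pts, observed_sdf, code_dim=64, steps=200, lr=1e-2):
    """Fit a latent code to observations at test time with the decoder frozen.

    decoder: callable (points, code) -> predicted SDF values, with frozen weights.
    observed_pts / observed_sdf: sparse observations of the tracked part.
    """
    code = torch.zeros(1, code_dim, requires_grad=True)
    opt = torch.optim.Adam([code], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = decoder(observed_pts, code)
        loss = (pred.squeeze(-1) - observed_sdf).abs().mean() + 1e-4 * code.pow(2).sum()
        loss.backward()
        opt.step()
    return code.detach()

# Toy decoder: a frozen linear map standing in for a pretrained part-level network.
frozen = torch.nn.Linear(3 + 64, 1).eval()
for p in frozen.parameters():
    p.requires_grad_(False)
decoder = lambda pts, c: frozen(torch.cat([pts, c.expand(pts.shape[0], -1)], -1))
code = auto_decode(decoder, torch.rand(512, 3), torch.zeros(512))
```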
In this paper, we present a large-scale detailed 3D face dataset, FaceScape, and a corresponding benchmark to evaluate single-view facial 3D reconstruction. By training on FaceScape data, a novel algorithm is proposed to predict elaborate riggable 3D face models from a single image input. The FaceScape dataset provides 18,760 textured 3D faces, captured from 938 subjects, each with 20 specific expressions. The 3D models contain pore-level facial geometry and are also processed to be topologically uniform. These fine 3D facial models can be represented as a 3D morphable model for coarse shapes and displacement maps for detailed geometry. Taking advantage of the large-scale and high-accuracy dataset, a novel algorithm is further proposed to learn expression-specific dynamic details using a deep neural network. The learned relationship serves as the foundation of our single-image 3D face prediction system. Different from previous methods, our predicted 3D models are riggable with highly detailed geometry under different expressions. We also use FaceScape data to generate in-the-wild and in-the-lab benchmarks to evaluate recent single-view face reconstruction methods. Accuracy is reported and analyzed along the dimensions of camera pose and focal length, which provides a faithful and comprehensive evaluation and reveals new challenges. The unprecedented dataset, benchmark, and code have been released to the public for research purposes.
This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Some recent works propose to decompose a non-rigidly deforming scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, thereby enabling them to learn the dynamic scene from images. However, they represent the deformation field as translational vector fields or SE(3) fields, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce a pose-driven deformation field based on the linear blend skinning algorithm, which combines a blend weight field and a 3D human skeleton to produce observation-to-canonical correspondences. Since 3D human skeletons are more observable, they can regularize the learning of the deformation field. Moreover, the pose-driven deformation field can be controlled by input skeletal motions to generate new deformation fields to animate the canonical human model. Experiments show that our approach significantly outperforms recent human modeling methods. The code is available at https://zju3dv.github.io/animatable_nerf/.
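The observation-to-canonical correspondence under linear blend skinning can be written as inverting the per-point blended bone transform; the sketch below shows this computation, with the variable names and the toy two-bone example being our own assumptions rather than the paper's code.

```python
import torch

def observation_to_canonical(x_obs, blend_weights, bone_transforms):
    """Map observation-space points to the canonical space with linear blend skinning.

    x_obs:           (N, 3) points in the posed (observation) space.
    blend_weights:   (N, K) skinning weights, rows summing to 1 (e.g. from a weight field).
    bone_transforms: (K, 4, 4) per-bone canonical-to-observation rigid transforms.
    """
    # Blend the bone transforms per point, then invert to go back to canonical space.
    T = torch.einsum("nk,kij->nij", blend_weights, bone_transforms)   # (N, 4, 4)
    x_h = torch.cat([x_obs, torch.ones_like(x_obs[:, :1])], dim=-1)   # homogeneous coords
    x_can = torch.linalg.solve(T, x_h.unsqueeze(-1)).squeeze(-1)      # T^-1 @ x
    return x_can[:, :3]

# Toy example with two bones: identity and a pure translation.
bones = torch.eye(4).repeat(2, 1, 1)
bones[1, :3, 3] = torch.tensor([0.0, 1.0, 0.0])
w = torch.tensor([[0.5, 0.5]]).repeat(10, 1)
canonical = observation_to_canonical(torch.rand(10, 3), w, bones)
```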
The evaluation of 3D face reconstruction results typically relies on a rigid shape alignment between the estimated 3D model and the ground-truth scan. We observe that aligning two shapes with different reference points can largely affect the evaluation results. This brings difficulties for precisely diagnosing and improving 3D face reconstruction methods. In this paper, we propose a novel evaluation approach with a new benchmark, REALY, consisting of 100 globally aligned face scans with accurate facial keypoints, high-quality region masks, and topology-consistent meshes. Our approach performs region-wise shape alignment and leads to more accurate, bidirectional correspondences when computing the shape errors. The fine-grained, region-wise evaluation results provide a detailed understanding of how state-of-the-art 3D face reconstruction methods perform. For example, our experiments on single-image based reconstruction methods show that DECA performs best on the nose region, while GANFit performs better on the cheek region. Besides, a new and high-quality 3DMM basis, HIFI3D++, is further constructed using the same procedure as our benchmark to align and retopologize several 3D face datasets. We will release REALY, HIFI3D++, and our new evaluation pipeline at https://realy3dface.com.
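Region-wise rigid alignment boils down to solving an orthogonal Procrustes problem over corresponding keypoints within each region; a standard Kabsch/Umeyama-style solver (without scale) is sketched below as an assumed implementation, not the benchmark's released code.

```python
import numpy as np

def rigid_align(source, target):
    """Least-squares rigid alignment (rotation + translation) of corresponding points.

    source, target: (N, 3) arrays of corresponding keypoints, e.g. within one face region.
    Returns R (3, 3) and t (3,) such that R @ source[i] + t best matches target[i].
    """
    mu_s, mu_t = source.mean(0), target.mean(0)
    cov = (target - mu_t).T @ (source - mu_s)
    U, _, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))                 # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    t = mu_t - R @ mu_s
    return R, t

# Sanity check: recover a known rotation about the z-axis plus a translation.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1.0]])
src = np.random.rand(50, 3)
R_est, t_est = rigid_align(src, src @ R_true.T + np.array([0.1, -0.2, 0.3]))
```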
Neural implicit representations have shown their effectiveness in novel view synthesis and high-quality 3D reconstruction from multi-view images. However, most approaches focus on holistic scene representation while neglecting the individual objects inside it, thus limiting potential downstream applications. To learn object-compositional representations, a few works incorporate 2D semantic maps as cues during training to grasp the differences between objects. But they neglect the strong connection between object geometry and instance semantic information, which leads to inaccurate modeling of individual instances. This paper proposes a novel framework, ObjectSDF, to build an object-compositional neural implicit representation with high fidelity in 3D reconstruction and object representation. Observing the ambiguity of the conventional volume rendering pipeline, we model the scene by combining the signed distance functions (SDFs) of individual objects to exert explicit surface constraints. The key to distinguishing different instances is to revisit the strong association between an individual object's SDF and its semantic label. In particular, we convert the semantic information into a function of the object SDF and develop a unified and compact representation for scenes and objects. Experimental results show the superiority of the ObjectSDF framework in representing both the holistic object-compositional scene and the individual instances. Code can be found at https://qianyiwu.github.io/objectsdf/
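A minimal way to compose a scene from per-object SDFs is to take the pointwise minimum for the scene surface and derive soft instance labels from the per-object distances; the sketch below illustrates this compositional idea only, and the temperature-softmax labeling is our assumption rather than ObjectSDF's exact semantic formulation.

```python
import torch

def compose_scene(object_sdfs, temperature=0.05):
    """Combine per-object SDF values into a scene SDF and soft instance labels.

    object_sdfs: (N, K) signed distances of N points to K objects.
    The scene surface is the union of object surfaces (pointwise minimum), and the
    soft semantic label favors the object whose surface is closest to each point.
    """
    scene_sdf, _ = object_sdfs.min(dim=-1)                     # (N,) union of objects
    semantics = torch.softmax(-object_sdfs / temperature, -1)  # (N, K) soft labels
    return scene_sdf, semantics

# Two toy objects: unit spheres centered at x = -1 and x = +1.
pts = torch.rand(100, 3) * 4 - 2
sdfs = torch.stack([(pts - torch.tensor([-1., 0., 0.])).norm(dim=-1) - 1.0,
                    (pts - torch.tensor([ 1., 0., 0.])).norm(dim=-1) - 1.0], dim=-1)
scene_sdf, labels = compose_scene(sdfs)
```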
Accurately reconstructing complex human geometry, caused by various poses and garments, from a single image is very challenging. Recently, works based on pixel-aligned implicit functions (PIFu) have made a big step forward and achieved state-of-the-art fidelity in image-based 3D human digitization. However, the training of PIFu relies heavily on expensive and limited 3D ground-truth data (i.e. synthetic data), hindering its generalization to more diverse real-world images. In this work, we propose an end-to-end self-supervised network named SelfPIFu to utilize abundant and diverse in-the-wild images, largely improving the reconstruction when tested on unconstrained in-the-wild images. The core of SelfPIFu is depth-guided volume- and surface-aware signed distance field (SDF) learning, which enables PIFu to be learned in a self-supervised manner without access to GT meshes. The whole framework consists of a normal estimator, a depth estimator, and an SDF-based PIFu, and better utilizes extra depth GT during training. Extensive experiments demonstrate the effectiveness of our self-supervised framework and the superiority of using depth as input. On synthetic data, our Intersection-over-Union (IoU) reaches 93.5%, 18% higher compared with PIFuHD. For in-the-wild images, we conduct user studies on the reconstructed results, and the selection rate of our results is over 68% compared with other state-of-the-art methods.