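Occlusions are a common occurrence in unconstrained face images. Single-image 3D reconstruction from such face images is often corrupted by the occlusions. Furthermore, while multiple 3D reconstructions are plausible in the occluded regions, existing approaches are limited to generating only a single solution. To address both challenges, we present Diverse3DFace, which is specifically designed to simultaneously generate a diverse and realistic set of 3D reconstructions from a single occluded face image. It consists of three components: a global+local shape fitting process, a graph neural network-based mesh VAE, and a determinantal point process (DPP) based diversity-promoting iterative optimization procedure. Quantitative and qualitative comparisons of 3D reconstruction on occluded faces show that Diverse3DFace can estimate 3D shapes that are consistent with the visible regions in the target image while exhibiting high, yet realistic, levels of diversity in the occluded regions. On face images occluded by masks, glasses, and other random objects, Diverse3DFace generates a more diverse distribution of 3D shapes in the occluded regions than the baselines. Moreover, our sample closest to the ground truth has about 40% lower error than the singular reconstructions of existing approaches.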
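The diversity-promoting role of a determinantal point process can be illustrated with a toy score: the log-determinant of a similarity kernel over a set of candidate shapes grows as the candidates become less alike. The sketch below is only an illustration under stated assumptions; the RBF kernel, the `sigma` and `eps` parameters, and the function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(shapes, sigma=1.0):
    """Pairwise RBF similarity between flattened shape vectors (assumed kernel)."""
    sq_dists = np.sum((shapes[:, None, :] - shapes[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def dpp_log_det(shapes, sigma=1.0, eps=1e-6):
    """Log-determinant of the kernel matrix; larger means a more diverse set."""
    kernel = rbf_kernel(shapes, sigma)
    # A small ridge keeps the matrix well conditioned for near-duplicate shapes.
    _, logdet = np.linalg.slogdet(kernel + eps * np.eye(len(shapes)))
    return logdet

# Three nearly identical candidates versus three well-separated ones.
near = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01]])
far = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
print(dpp_log_det(near), dpp_log_det(far))
```

Maximizing such a score alongside a data-fitting term would push samples apart only where the image does not constrain them, which is the intuition the abstract describes for occluded regions.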
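In contrast to earlier methods that modeled only the face, recent 3D face reconstruction methods reconstruct the entire head. Although these methods reconstruct facial features accurately, they do not explicitly regulate the upper part of the head. Extracting information about this part of the head is challenging because of varying degrees of occlusion by hair. We present a novel approach for modeling the upper head by removing the occluding hair and reconstructing the skin, which reveals information about the head shape. We introduce three objectives: 1) a dice consistency loss that enforces similarity between the overall head shape of the source and the rendered image, 2) a scale consistency loss that ensures the head shape is reproduced accurately even when the upper part of the head is not visible, and 3) a 71-landmark detector, trained with a moving-average loss function, that detects additional landmarks on the head. These objectives are used to train an encoder, in an unsupervised manner, to regress FLAME parameters from in-the-wild input images. Our unsupervised 3DMM model achieves state-of-the-art results on popular benchmarks and can be used to infer head shape, facial features, and textures for direct use in animation or avatar creation.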
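A Dice-style loss compares two silhouettes by their overlap. The following minimal sketch shows the generic soft Dice loss between a source mask and a rendered mask; the abstract does not give the exact formulation, so the function name, the `eps` smoothing term, and the toy masks are assumptions.

```python
import numpy as np

def dice_loss(mask_a, mask_b, eps=1e-8):
    """1 - Dice coefficient; 0 for identical masks, close to 1 for disjoint ones."""
    a = np.asarray(mask_a, dtype=float)
    b = np.asarray(mask_b, dtype=float)
    intersection = np.sum(a * b)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(a) + np.sum(b) + eps)

# Toy 2x2 head silhouettes (1 = head pixel, 0 = background).
source = [[1, 1], [0, 0]]
rendered_same = [[1, 1], [0, 0]]
rendered_disjoint = [[0, 0], [1, 1]]
print(dice_loss(source, rendered_same), dice_loss(source, rendered_disjoint))
```

Because the loss depends only on overlap relative to total area, it is insensitive to how many background pixels surround the head.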
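Over the last few years, many face analysis tasks have achieved astounding performance, with applications including face generation and 3D face reconstruction from a single "in-the-wild" image. Nevertheless, to the best of our knowledge, no method can produce render-ready, high-resolution 3D faces from "in-the-wild" images. This can be attributed to (a) the scarcity of available training data and (b) the lack of robust methodologies that can be applied successfully to very high-resolution data. In this work, we introduce the first method that is able to reconstruct photorealistic, render-ready 3D facial geometry and BRDF from a single "in-the-wild" image. We capture a large dataset of facial shape and reflectance, which we have made public. We define a fast, photorealistic, differentiable facial rendering methodology with accurate diffuse and specular skin reflection, self-occlusion, and a subsurface scattering approximation. With this, we train a network that disentangles the facial diffuse and specular BRDF components from a shape and texture with baked illumination, reconstructed with a state-of-the-art 3DMM fitting method. Our method outperforms the existing state of the art by a significant margin and reconstructs high-resolution 3D faces from a single low-resolution image that can be rendered in various applications and bridge the uncanny valley.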
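We present a system for creating realistic one-shot mesh-based human head avatars, ROME for short. Using a single photograph, our model estimates a person-specific head mesh and an associated neural texture, which encodes both local photometric and geometric details. The resulting avatars are rigged and can be rendered using a neural network, which is trained alongside the mesh and texture estimators on a dataset of in-the-wild videos. In our experiments, we observe that the system performs competitively in terms of both head geometry recovery and rendering quality, especially for cross-person reenactment. See results at https://samsunglabs.github.io/rome/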
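3D face reconstruction is a challenging problem and an important task in computer vision and graphics. Recently, many researchers have turned their attention to this problem, and a large number of articles have been published. Single-image reconstruction is one branch of 3D face reconstruction and has many applications in everyday life. This paper reviews the recent literature on 3D face reconstruction from a single image.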
This work addresses the problem of generating 3D holistic body motions from human speech. Given a speech recording, we synthesize sequences of 3D body poses, hand gestures, and facial expressions that are realistic and diverse. To achieve this, we first build a high-quality dataset of 3D holistic body meshes with synchronous speech. We then define a novel speech-to-motion generation framework in which the face, body, and hands are modeled separately. The separated modeling stems from the fact that face articulation strongly correlates with human speech, while body poses and hand gestures are less correlated. Specifically, we employ an autoencoder for face motions, and a compositional vector-quantized variational autoencoder (VQ-VAE) for the body and hand motions. The compositional VQ-VAE is key to generating diverse results. Additionally, we propose a cross-conditional autoregressive model that generates body poses and hand gestures, leading to coherent and realistic motions. Extensive experiments and user studies demonstrate that our proposed approach achieves state-of-the-art performance both qualitatively and quantitatively. Our novel dataset and code will be released for research purposes at https://talkshow.is.tue.mpg.de.
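The quantization step at the heart of a VQ-VAE can be sketched as a nearest-neighbour lookup into a learned codebook. The example below uses two separate toy codebooks, one for the body and one for the hands, to hint at the compositional idea; the codebook values, latent vectors, and the function name `quantize` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index and value of its nearest codebook entry."""
    dists = np.sum((latents[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]

# Hypothetical codebooks; a trained model would learn these entries.
body_codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
hand_codebook = np.array([[-1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

body_idx, body_q = quantize(np.array([[0.1, -0.1], [0.9, 1.2]]), body_codebook)
hand_idx, hand_q = quantize(np.array([[0.8, 0.1], [-0.7, 0.2]]), hand_codebook)
print(body_idx, hand_idx)
```

With separate codebooks, the number of distinct body-and-hand combinations is the product of the codebook sizes, which is one plausible reading of why a compositional design supports diverse outputs.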
Learned 3D representations of human faces are useful for computer vision problems such as 3D face tracking and reconstruction from images, as well as graphics applications such as character generation and animation. Traditional models learn a latent representation of a face using linear subspaces or higher-order tensor generalizations. Due to this linearity, they cannot capture extreme deformations and nonlinear expressions. To address this, we introduce a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface. We introduce mesh sampling operations that enable a hierarchical mesh representation that captures non-linear variations in shape and expression at multiple scales within the model. In a variational setting, our model samples diverse realistic 3D faces from a multivariate Gaussian distribution. Our training data consists of 20,466 meshes of extreme expressions captured over 12 different subjects. Despite limited training data, our trained model outperforms state-of-the-art face models with 50% lower reconstruction error, while using 75% fewer parameters. We show that replacing the expression space of an existing state-of-the-art face model with our model achieves a lower reconstruction error. Our data, model and code are available at http://coma.is.tue.mpg.de/.
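The evaluation of 3D face reconstruction results typically relies on a rigid shape alignment between the estimated 3D model and the ground-truth scan. We observe that aligning two shapes with different reference points can largely affect the evaluation results. This makes it difficult to precisely diagnose and improve 3D face reconstruction methods. In this paper, we propose a novel evaluation approach with a new benchmark, REALY, which consists of 100 globally aligned face scans with accurate facial keypoints, high-quality region masks, and topology-consistent meshes. Our approach performs region-wise shape alignment and yields more accurate, bidirectional correspondences when computing shape errors. The fine-grained, region-wise evaluation results give us a detailed understanding of the performance of state-of-the-art 3D face reconstruction methods. For example, our experiments on single-image reconstruction methods show that DECA performs best on the nose region, while GANFit performs better on the cheek region. In addition, a new, high-quality 3DMM basis, HIFI3D++, is derived by aligning and retopologizing several 3D face datasets with the same procedure used to construct REALY. We will release REALY, HIFI3D++, and our new evaluation pipeline at https://realy3dface.com.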
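The rigid alignment step that such evaluations rely on can be sketched with the Kabsch algorithm: estimate a rotation and translation from a chosen set of reference points, apply them to the whole prediction, and then measure the error. This is a simplified illustration that assumes known one-to-one point correspondences, which real scans do not provide; the function names and the `ref_idx` parameter are hypothetical, and the benchmark's region-wise, bidirectional procedure is not reproduced here.

```python
import numpy as np

def kabsch(src, dst):
    """Rotation r and translation t minimizing sum ||r @ src_i + t - dst_i||^2."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)           # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_c - r @ src_c
    return r, t

def aligned_rmse(pred, scan, ref_idx):
    """Align pred to scan using only the reference points, then score all points."""
    r, t = kabsch(pred[ref_idx], scan[ref_idx])
    aligned = pred @ r.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - scan) ** 2, axis=1))))

# Toy check: the prediction is the scan under a known rigid motion.
rng = np.random.default_rng(0)
scan = rng.normal(size=(20, 3))
c, s = np.cos(0.3), np.sin(0.3)
rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
pred = scan @ rot.T + np.array([1.0, 2.0, 3.0])
error = aligned_rmse(pred, scan, ref_idx=np.arange(5))
print(error)
```

When the prediction differs from the scan by more than a rigid motion, changing `ref_idx` changes the recovered transform and therefore the reported error, which is the sensitivity the abstract points out.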
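We present Neural Head Avatars, a novel neural representation that explicitly models the surface geometry and appearance of an animatable human avatar, which can be used for teleconferencing in AR/VR or for other applications in the movie and games industries that rely on digital humans. Our representation can be learned from a monocular RGB portrait video that features a range of different expressions and views. Specifically, we propose a hybrid representation consisting of a morphable model for the coarse shape and expressions of the face, and two feed-forward networks that predict vertex offsets of the underlying mesh as well as a view- and expression-dependent texture. We demonstrate that this representation extrapolates accurately to unseen poses and viewpoints, and generates natural expressions while providing sharp texture details. Compared to previous work on head avatars, our method provides a disentangled shape and appearance model of the complete human head (including hair) that is compatible with the standard graphics pipeline. Moreover, it outperforms the current state of the art, both quantitatively and qualitatively, in terms of reconstruction quality and novel-view synthesis.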
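Reconstructing high-fidelity 3D facial texture from a single image is a challenging task because of the lack of complete face information and the domain gap between the 3D face and the 2D image. The most recent works tackle the facial texture reconstruction problem with either generation-based or reconstruction-based methods. Although each approach has its own advantages, none of them can recover a high-fidelity and re-renderable facial texture, where "re-renderable" requires the facial texture to be spatially complete and disentangled from environmental illumination. In this paper, we propose a novel self-supervised learning framework for reconstructing high-quality 3D faces from single-view in-the-wild images. Our main idea is to first use a prior generation module to produce a prior albedo, and then use a detail refinement module to obtain a detailed albedo. To further disentangle the facial texture from illumination, we present a novel detailed illumination representation that is reconstructed together with the detailed albedo. We also design several regularization loss functions on both the albedo side and the illumination side to facilitate the disentanglement of these two factors. Finally, thanks to differentiable rendering, our neural network can be trained efficiently in a self-supervised manner. Extensive experiments on challenging datasets demonstrate that our framework substantially outperforms state-of-the-art approaches in both qualitative and quantitative comparisons.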
To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8× over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.
We propose a novel 3D morphable model for complete human heads based on hybrid neural fields. At the core of our model lies a neural parametric representation which disentangles identity and expressions in disjoint latent spaces. To this end, we capture a person's identity in a canonical space as a signed distance field (SDF), and model facial expressions with a neural deformation field. In addition, our representation achieves high-fidelity local detail by introducing an ensemble of local fields centered around facial anchor points. To facilitate generalization, we train our model on a newly-captured dataset of over 2200 head scans from 124 different identities using a custom high-end 3D scanning setup. Our dataset significantly exceeds comparable existing datasets, both with respect to quality and completeness of geometry, averaging around 3.5M mesh faces per scan. Finally, we demonstrate that our approach outperforms state-of-the-art methods by a significant margin in terms of fitting error and reconstruction quality.
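The split between a canonical signed distance field and a deformation field can be sketched with analytic stand-ins. In the toy example below, a unit sphere plays the role of the identity SDF and a simple translation plays the role of the expression deformation, applied as a backward warp from posed space to canonical space. Both functions, the scalar `expression` code, and the choice of a backward warp are assumptions for illustration; the model in the abstract uses learned neural fields instead.

```python
import numpy as np

def canonical_sdf(points, radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface."""
    return np.linalg.norm(points, axis=-1) - radius

def deformation(points, expression):
    """Hypothetical backward warp: shift posed points along z by the expression code."""
    return points - expression * np.array([0.0, 0.0, 0.1])

def posed_sdf(points, expression):
    """Evaluate the identity SDF at the canonical location of each posed point."""
    return canonical_sdf(deformation(points, expression))

neutral = posed_sdf(np.array([[0.0, 0.0, 1.0]]), expression=0.0)
shifted = posed_sdf(np.array([[0.0, 0.0, 1.1]]), expression=1.0)
print(neutral, shifted)
```

Keeping identity in the canonical field and expression in the warp means the same surface can be re-posed by changing only the expression code.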
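We propose a novel optimization-based paradigm for fitting 3D human models to images and scans. In contrast to existing approaches that directly regress the parameters of a low-dimensional statistical body model (e.g. SMPL) from input images, we train an ensemble of per-vertex neural field networks. The network predicts, in a distributed manner, the vertex descent direction based on neural features extracted at the current vertex projection. At inference, we employ this network, dubbed LVD, within a gradient-descent optimization pipeline until convergence, which typically occurs in a fraction of a second even when all vertices are initialized to a single point. An exhaustive evaluation demonstrates that our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement over the state of the art. LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement over the SOTA with a much simpler and faster method.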
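The inference loop described above can be sketched as repeated vertex updates along a predicted direction. In this toy version the trained network is replaced by an oracle, `predicted_direction`, that simply points at a known target; that stand-in, the step size, and the iteration count are assumptions made so the loop is runnable, and they say nothing about how the real network is built.

```python
import numpy as np

def predicted_direction(vertices, target):
    """Hypothetical stand-in for the learned network: points toward the target."""
    return target - vertices

def fit_vertices(target, steps=50, step_size=0.2):
    """Iteratively move vertices along the predicted descent direction."""
    vertices = np.zeros_like(target)  # every vertex starts at a single point
    for _ in range(steps):
        vertices = vertices + step_size * predicted_direction(vertices, target)
    return vertices

target = np.array([[0.0, 1.0, 0.0], [0.5, -0.5, 0.2], [-1.0, 0.0, 0.3]])
fitted = fit_vertices(target)
print(np.max(np.abs(fitted - target)))
```

With this oracle the residual shrinks by a factor of 0.8 per step, so 50 steps leave an error on the order of 1e-5 of the initial distance.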
We present PhoMoH, a neural network methodology to construct generative models of photorealistic 3D geometry and appearance of human heads including hair, beards, clothing and accessories. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photorealistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and allow the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.
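A 3D caricature is an exaggerated 3D depiction of a human face. The goal of this paper is to model the variations of 3D caricatures in a compact parameter space so that we can provide a useful data-driven toolkit for handling 3D caricature deformations. To achieve this goal, we propose an MLP-based framework for building a deformable surface model, which takes a latent code and produces a 3D surface. In the framework, a SIREN MLP models a function that takes a 3D position on a fixed template surface and returns a 3D displacement vector for the input position. We create variations of 3D surfaces by learning a hypernetwork that takes a latent code and produces the parameters of the MLP. Once learned, our deformable model provides a convenient editing space for 3D caricatures, supporting label-based semantic editing and point-handle-based deformation, both of which produce highly exaggerated yet natural 3D caricature shapes. We also demonstrate other applications of our deformable model, such as automatic 3D caricature creation.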
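The interplay between the hypernetwork and the SIREN MLP can be sketched in a few lines: a latent code is mapped to the weights of a small sine-activated MLP, which in turn maps template positions to displacement vectors. Everything concrete here is an assumption for illustration, including the single hidden layer, the linear hypernetwork, the layer sizes, and the frequency `w0`; the paper's architecture is not specified in the abstract.

```python
import numpy as np

HIDDEN = 16
# Shapes of the MLP parameters the hypernetwork must produce: w1, b1, w2, b2.
SHAPES = [(HIDDEN, 3), (HIDDEN,), (3, HIDDEN), (3,)]
N_PARAMS = sum(int(np.prod(s)) for s in SHAPES)

def hypernetwork(latent, weights):
    """Linear hypernetwork (assumed): latent code -> flat MLP parameters."""
    flat = weights @ latent
    params, start = [], 0
    for shape in SHAPES:
        size = int(np.prod(shape))
        params.append(flat[start:start + size].reshape(shape))
        start += size
    return params

def siren_displacement(points, params, w0=30.0):
    """One sine-activated hidden layer mapping 3D positions to 3D displacements."""
    w1, b1, w2, b2 = params
    hidden = np.sin(w0 * (points @ w1.T + b1))
    return hidden @ w2.T + b2

def deform(template, latent, weights):
    """Displace every template vertex according to the latent code."""
    return template + siren_displacement(template, hypernetwork(latent, weights))

rng = np.random.default_rng(0)
template = rng.normal(size=(5, 3))
weights = rng.normal(scale=0.1, size=(N_PARAMS, 4))
surface = deform(template, rng.normal(size=4), weights)
print(surface.shape)
```

In this sketch a zero latent code yields zero MLP parameters and therefore leaves the template unchanged, which makes the template a natural origin of the editing space.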
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing. While studies over extending 2D StyleGAN to 3D faces have emerged, a corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing. In this paper, we study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures. The problem is ill-posed: innumerable compositions of shape and texture could be rendered to the current image. Furthermore, with the limited capacity of a global latent code, 2D inversion methods cannot preserve faithful shape and texture at the same time when applied to 3D models. To solve this problem, we devise an effective self-training scheme to constrain the learning of inversion. The learning is done efficiently without any real-world 2D-3D training pairs but proxy samples generated from a 3D GAN. In addition, apart from a global latent code that captures the coarse shape and texture information, we augment the generation network with a local branch, where pixel-aligned features are added to faithfully reconstruct face details. We further consider a new pipeline to perform 3D view-consistent editing. Extensive experiments show that our method outperforms state-of-the-art inversion methods in both shape and texture reconstruction quality. Code and data will be released.
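Recent advances in generative adversarial networks (GANs) have led to remarkable achievements in face image synthesis. While methods that use style-based GANs can generate strikingly photorealistic face images, it is often difficult to control the characteristics of the generated faces in a meaningful and disentangled way. Prior approaches aim to achieve such semantic control and disentanglement within the latent space of a previously trained GAN. In contrast, we propose a framework that explicitly models physical attributes of the face a priori, such as 3D shape, albedo, pose, and lighting, thus providing disentanglement by design. Our method, MOST-GAN, integrates the expressive power and photorealism of style-based GANs with the physical disentanglement and flexibility of nonlinear 3D morphable models, which we couple with a state-of-the-art 2D hair manipulation network. MOST-GAN achieves photorealistic manipulation of portrait images with fully disentangled 3D control over their physical attributes, enabling extreme manipulation of lighting, facial expression, and pose, up to a full profile view.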
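Despite recent developments in 3D face reconstruction from occluded and noisy face images, performance remains unsatisfactory. One of the main challenges is handling moderate to heavy occlusions in face images. In addition, noise in face images inhibits the correct capture of facial attributes and therefore needs to be addressed reliably. Moreover, most existing methods rely on additional dependencies, which impose numerous constraints on the training procedure. We therefore propose a Self-Supervised RObust GUidancE (ROGUE) framework to obtain robustness against occlusions and noise in face images. The proposed network contains 1) a Guidance Pipeline that obtains the 3D face coefficients for clean faces, and 2) a Robustness Pipeline that enforces consistency between the coefficients estimated for occluded or noisy images and those of their clean counterparts. The proposed image-level and feature-level loss functions aid the ROGUE learning process without introducing additional dependencies. On three variations of the CelebA test dataset (rational occlusions, delusional occlusions, and noisy face images), our method outperforms the current state-of-the-art method (e.g., for the shape-based 3D vertex error, a reduction from 0.146 to 0.048 for rational occlusions, from 0.292 to 0.061 for delusional occlusions, and from 0.269 to 0.053 for noise in the face images), demonstrating the effectiveness of the proposed approach.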
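3D Morphable Models (3DMMs) are generative models of face shape and appearance. However, the shape parameters of traditional 3DMMs follow a multivariate Gaussian distribution, while identity embeddings follow a hypersphere distribution, and this conflict makes it challenging for face reconstruction models to preserve fidelity and shape consistency simultaneously. To address this issue, we propose the Sphere Face Model (SFM), a novel 3DMM for monocular face reconstruction that can preserve both shape fidelity and identity consistency. The core of our SFM is a basis matrix that can be used to reconstruct 3D face shapes; the basis matrix is learned with a two-stage training approach in which 3D and 2D training data are used in the first and second stages, respectively. To resolve the distribution mismatch, we design a novel loss that gives the shape parameters a hyperspherical latent space. Extensive experiments show that SFM has high representation ability and that its shape parameter space clusters well. Moreover, it produces high-fidelity face shapes, and the shapes are consistent under challenging conditions in monocular face reconstruction.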
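One simple way to encourage parameters to live on a hypersphere is to penalize the deviation of their norm from a fixed radius. The sketch below shows such a penalty together with an explicit projection; it is a generic illustration under that assumption, not the loss proposed in the paper, and the function names and the `radius` parameter are hypothetical.

```python
import numpy as np

def hypersphere_penalty(params, radius=1.0):
    """Mean squared deviation of each parameter vector's norm from the radius."""
    norms = np.linalg.norm(params, axis=-1)
    return float(np.mean((norms - radius) ** 2))

def project_to_sphere(params, radius=1.0, eps=1e-12):
    """Rescale each parameter vector so it lies on the sphere of the given radius."""
    norms = np.linalg.norm(params, axis=-1, keepdims=True)
    return radius * params / np.maximum(norms, eps)

shape_params = np.array([[3.0, 4.0], [0.0, 2.0]])
print(hypersphere_penalty(shape_params))
print(hypersphere_penalty(project_to_sphere(shape_params)))
```

Once parameters share a common norm, cosine similarity between them becomes a natural measure of identity agreement, which matches the way hyperspherical identity embeddings are usually compared.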
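Video conferencing has traditionally been a widely adopted telecommunication solution, but it inherently lacks immersiveness because of the 2D nature of the facial representation. Integrating virtual reality (VR) into communication and telepresence systems through head-mounted displays (HMDs) promises to give users a much more immersive experience. However, HMDs create an obstacle by blocking the user's facial appearance and expressions. To overcome these issues, we propose a novel attention-enabled encoder-decoder architecture for HMD de-occlusion. We also propose training our person-specific model on short videos (1-2 minutes) of the user, captured in varying appearances, and demonstrate generalization to unseen poses and appearances of the user. We report superior qualitative and quantitative results compared with state-of-the-art methods. We also present applications of this approach to hybrid video teleconferencing using existing animation and 3D face reconstruction pipelines.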