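In this paper, we develop a robust 3D garment digitization solution that generalizes well to real-world fashion catalog images with cloth texture occlusions and large body pose variations. We assume fixed-topology parametric template mesh models for known garment types (e.g., T-shirts, trousers) and map high-quality texture from an input catalog image to the UV map panels corresponding to the garment's parametric mesh model. We do this by first predicting a sparse set of 2D landmarks on the garment boundary. We then use these landmarks to perform thin-plate spline (TPS) based texture transfer onto the UV map panels. Next, we use a deep texture inpainting network to fill the large holes in the TPS output (caused by view variations and self-occlusions) to produce consistent UV maps. To train the supervised landmark prediction and texture inpainting networks, we generate a large synthetic dataset with varied textures and lighting, rendered from various views with the person in a wide variety of poses. In addition, we manually annotate a small set of fashion catalog images collected from online fashion e-commerce platforms for fine-tuning. We conduct thorough empirical evaluations and show strong qualitative results of the proposed 3D garment texturing solution on fashion catalog images. Such 3D garment digitization helps solve the challenging task of enabling 3D virtual try-on.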
translated by Google Translate
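The TPS texture-transfer step above can be sketched with a generic 2D thin-plate spline in NumPy: fit a spline that maps landmark positions on a UV panel to the matching landmarks predicted in the catalog image, then evaluate it at every UV texel to find where to sample colour (backward warping). This is an illustrative sketch, not the authors' code; `fit_tps`, `apply_tps` and the optional `reg` smoothing term are hypothetical names chosen here.

```python
import numpy as np

def _tps_kernel(r2):
    # U(r) = r^2 log(r^2), with U(0) = 0 by continuity
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r2 * np.log(r2)
    return np.where(r2 > 0, u, 0.0)

def fit_tps(src, dst, reg=0.0):
    """Fit a 2D thin-plate spline mapping src landmarks (n, 2) onto dst (n, 2)."""
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    K = _tps_kernel(d2) + reg * np.eye(n)
    P = np.hstack([np.ones((n, 1)), src])          # affine part [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    params = np.linalg.solve(A, b)
    return src, params[:n], params[n:]             # control points, kernel weights, affine

def apply_tps(model, pts):
    """Evaluate the fitted spline at query points (m, 2), e.g. UV texel centres."""
    ctrl, w, a = model
    d2 = np.sum((pts[:, None, :] - ctrl[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return _tps_kernel(d2) @ w + P @ a
```

With `reg=0` the spline interpolates the landmarks exactly and reduces to a plain affine map when the correspondences are affine; a small positive `reg` trades exactness for smoothness when landmarks are noisy.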
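Existing approaches to 3D garment reconstruction either assume a predefined template for the garment geometry (restricting them to fixed clothing styles) or produce vertex-colored meshes (lacking high-frequency texture detail). Our novel framework co-learns geometric and semantic information from the input monocular image for template-free, textured 3D garment digitization. More specifically, we propose extending the peeled representation to predict pixel-aligned layered depth and semantic maps from which 3D garments are extracted. The layered representation is further exploited to UV-parameterize the arbitrary surface of the extracted garment, without any human intervention, to form a UV atlas. Texture is then applied to the atlas in a hybrid fashion: pixels of the input image are first projected into UV space for the visible regions, and the occluded regions are then inpainted. As a result, we can digitize arbitrarily loose clothing styles while preserving high-frequency texture detail from a monocular image. We obtain high-fidelity 3D garment reconstructions on three publicly available datasets and show generalization to Internet images.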
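A layered (peeled) depth representation stores, per pixel, the depths of successive ray-surface intersections. The sketch below lifts such maps back to 3D point clouds, one per layer, which is the geometric step that precedes meshing and UV parameterization. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy` and uses depth 0 to mean "no intersection"; both conventions are assumptions for illustration, not details from the paper.

```python
import numpy as np

def backproject_layers(depth_layers, fx, fy, cx, cy):
    """Lift peeled depth maps (L, H, W) to 3D points; depth 0 marks 'no intersection'.

    Returns a list with one (N_k, 3) array of camera-space points per layer.
    Layer 0 holds the surfaces visible in the input image; deeper layers
    hold the surfaces hidden behind them along the same camera rays.
    """
    L, H, W = depth_layers.shape
    v, u = np.mgrid[0:H, 0:W]                       # pixel row / column indices
    clouds = []
    for k in range(L):
        z = depth_layers[k]
        valid = z > 0
        x = (u[valid] - cx) * z[valid] / fx        # inverse pinhole projection
        y = (v[valid] - cy) * z[valid] / fy
        clouds.append(np.stack([x, y, z[valid]], axis=1))
    return clouds
```

Because layer 0 is exactly what the camera sees, its points can take their colour directly from the input pixels, which matches the "project visible pixels, inpaint the rest" split described above.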
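Novel view synthesis for humans in motion is a challenging computer vision problem that enables applications such as free-viewpoint video. Existing methods typically rely on complex setups with multiple input views, 3D supervision, or pre-trained models that do not generalize to new identities. To address these limitations, we propose a novel view synthesis framework that generates realistic renders from unseen views of any person captured by a single-view sensor with sparse RGB-D, similar to a low-cost depth camera, without actor-specific models. We propose an architecture that learns dense features in novel views obtained through sphere-based neural rendering, and creates complete renders using a global context inpainting model. In addition, an enhancer network improves overall fidelity, producing crisp renders with fine details even in regions occluded in the original view. We show that our method produces high-quality novel views of synthetic and real human actors from a single sparse RGB-D input. It generalizes to unseen identities and new poses, and faithfully reconstructs facial expressions. Our approach outperforms prior human view synthesis methods and is robust to different levels of input sparsity.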
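As a rough picture of sphere-based rendering, each back-projected RGB-D point can be treated as a small sphere and splatted into the target view with a depth test. The sketch below does a hard, non-differentiable version that simply copies per-point features; the paper's renderer is neural and learned, so this only illustrates the projection and visibility logic. The pinhole intrinsics and the screen-radius approximation `fx * radius / z` are assumptions.

```python
import numpy as np

def splat_spheres(points, feats, radius, fx, fy, cx, cy, H, W):
    """Nearest-depth splatting of 3D spheres into an (H, W, C) feature image.

    Each point is treated as a sphere of world-space `radius`, whose
    projection is approximated by a disc of radius fx * radius / z pixels.
    """
    C = feats.shape[1]
    image = np.zeros((H, W, C))
    zbuf = np.full((H, W), np.inf)
    v, u = np.mgrid[0:H, 0:W]
    for (X, Y, Z), f in zip(points, feats):
        if Z <= 0:
            continue                                   # behind the camera
        px, py = fx * X / Z + cx, fy * Y / Z + cy      # projected centre
        r_pix = fx * radius / Z                        # approximate screen radius
        inside = (u - px) ** 2 + (v - py) ** 2 <= r_pix ** 2
        closer = inside & (Z < zbuf)                   # per-pixel depth test
        zbuf[closer] = Z
        image[closer] = f
    return image, zbuf
```

Pixels that no sphere covers keep `zbuf == inf`; these holes are what an inpainting stage, such as the global context model mentioned above, has to fill.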
We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest, beyond just the body shape, which in turn, additionally helps modeling accessories, hair, and loose clothing. Owing to this, we present a complete 3D transformer-based attention framework which, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition, as a result of a single end-to-end model, trained semi-supervised, and with no additional postprocessing. We show that our S3F model surpasses the previous state-of-the-art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation. Moreover, we show that the proposed methodology allows novel view synthesis, relighting, and re-posing the reconstruction, and can naturally be extended to handle multiple input images (e.g. different views of a person, or the same view, in different poses, in video). Finally, we demonstrate the editing capabilities of our model for 3D virtual try-on applications.
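The abstract says image features are pooled onto dense 3D points sampled from a parametric human mesh surface. One standard way to draw such points is area-weighted uniform sampling with barycentric coordinates, sketched below; the exact sampling used by S3F is not stated here, so treat this as an assumption. Returning face indices and barycentrics lets per-vertex semantics (e.g. body-part labels) be interpolated onto each point.

```python
import numpy as np

def sample_surface(vertices, faces, n, rng=None):
    """Area-weighted uniform sampling of n points on a triangle mesh.

    Returns points (n, 3), the index of the face each point lies on, and its
    barycentric coordinates, so per-vertex attributes can be interpolated.
    """
    rng = np.random.default_rng() if rng is None else rng
    tri = vertices[faces]                                   # (F, 3, 3)
    areas = 0.5 * np.linalg.norm(
        np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    face_idx = rng.choice(len(faces), size=n, p=areas / areas.sum())
    r1, r2 = rng.random(n), rng.random(n)
    s = np.sqrt(r1)                                         # sqrt keeps the density uniform per triangle
    bary = np.stack([1 - s, s * (1 - r2), s * r2], axis=1)
    points = np.einsum("nk,nkd->nd", bary, tri[face_idx])
    return points, face_idx, bary
```

Weighting faces by area avoids oversampling regions with small, dense triangles (such as hands and face on typical body meshes), and zero-area faces are never selected.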
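We present EgoRenderer, a system for rendering full-body neural avatars of a person captured by a wearable, egocentric fisheye camera mounted on a cap or a VR headset. Our system renders photorealistic novel views of the actor and her motion from arbitrary virtual camera positions. Rendering full-body avatars from such egocentric images poses unique challenges due to the top-down view and large distortions. We tackle these challenges by decomposing the rendering process into several steps, including texture synthesis, pose construction, and neural image translation. For texture synthesis, we propose Ego-DPNet, a neural network that infers dense correspondences between the input fisheye images and an underlying parametric body model, and extracts textures from the egocentric inputs. In addition, to encode dynamic appearance, our approach learns an implicit texture stack that captures detailed appearance variation across poses and viewpoints. For correct pose generation, we first estimate the body pose from the egocentric view using a parametric model. We then synthesize an external free-viewpoint pose image by projecting the parametric model into the user-specified target viewpoint. Next, we combine the target pose image and the textures into a combined feature image, which a neural image translation network transforms into the output color image. Experimental evaluations show that EgoRenderer can generate realistic free-viewpoint avatars of a person wearing an egocentric camera. Comparisons with several baselines demonstrate the advantages of our approach.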
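The "large distortions" mentioned above come from the fisheye lens. The abstract does not give EgoRenderer's camera model, so the sketch below assumes the idealized equidistant fisheye model, in which image radius grows linearly with the angle from the optical axis. It shows why body parts far off-axis (such as feet seen from a cap-mounted camera) still land in the image but are strongly compressed.

```python
import numpy as np

def project_equidistant_fisheye(points, f, cx, cy):
    """Project camera-space points (N, 3) with an ideal equidistant fisheye model.

    Image radius grows linearly with the angle theta from the optical axis
    (r = f * theta), so points at 90 degrees or more off-axis still land
    in the image, unlike with a pinhole camera.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)
    theta = np.arctan2(rho, z)                 # angle from the optical (+z) axis
    r = f * theta
    # radial scale in the image plane; points on the axis map to the centre
    scale = np.divide(r, rho, out=np.zeros_like(r), where=rho > 0)
    return np.stack([cx + scale * x, cy + scale * y], axis=1)
```

Real fisheye calibrations usually add polynomial distortion terms in `theta`; they are omitted here for clarity.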
Recent work has shown the benefits of synthetic data for use in computer vision, with applications ranging from autonomous driving to face landmark detection and reconstruction. There are a number of benefits of using synthetic data from privacy preservation and bias elimination to quality and feasibility of annotation. Generating human-centered synthetic data is a particular challenge in terms of realism and domain-gap, though recent work has shown that effective machine learning models can be trained using synthetic face data alone. We show that this can be extended to include the full body by building on the pipeline of Wood et al. to generate synthetic images of humans in their entirety, with ground-truth annotations for computer vision applications. In this report we describe how we construct a parametric model of the face and body, including articulated hands; our rendering pipeline to generate realistic images of humans based on this body model; an approach for training DNNs to regress a dense set of landmarks covering the entire body; and a method for fitting our body model to dense landmarks predicted from multiple views.
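The report fits a body model to dense landmarks predicted from multiple views. A full model fit is beyond a short example, so the sketch below shows a simpler, related building block instead (a swapped-in technique, not the report's fitting method): linear DLT triangulation of a single landmark from several calibrated views, which turns per-view 2D predictions into a 3D target.

```python
import numpy as np

def triangulate_dlt(proj_mats, pixels):
    """Linear (DLT) triangulation of one landmark seen in several calibrated views.

    proj_mats: (V, 3, 4) camera projection matrices; pixels: (V, 2) observations.
    Each view contributes two rows of A; the 3D point is the null vector of A.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                                 # right singular vector of the smallest singular value
    return X[:3] / X[3]                        # homogeneous -> Euclidean
```

A model-fitting approach instead optimizes body parameters so that projected model landmarks match all 2D predictions jointly, which also lets landmark uncertainties and body priors enter the objective.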
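Human performance capture is a highly important computer vision problem with many applications in film production and virtual/augmented reality. Many previous performance capture methods either require expensive multi-view setups or do not recover dense, space-time coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision, completely removing the need for training data with 3D ground-truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation step and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness. This work is an extended version of DeepCap, in which we provide more detailed explanations, comparisons, and results, as well as applications.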
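Weak multi-view supervision means the network only ever sees 2D evidence: its 3D prediction is projected into every training camera and compared with 2D labels there. The sketch below shows one plausible form of such a term, a confidence-weighted keypoint reprojection loss; it is a simplification for illustration, and the actual DeepCap losses may include further terms not covered here.

```python
import numpy as np

def multiview_keypoint_loss(joints3d, proj_mats, keypoints2d, conf):
    """Confidence-weighted squared reprojection error over all views and joints.

    joints3d: (J, 3) predicted joints; proj_mats: (V, 3, 4);
    keypoints2d: (V, J, 2) detections; conf: (V, J) detector confidences.
    Only 2D labels are needed, so no 3D ground truth enters the loss.
    """
    homo = np.hstack([joints3d, np.ones((joints3d.shape[0], 1))])   # (J, 4)
    proj = np.einsum("vij,kj->vki", proj_mats, homo)                # (V, J, 3)
    pix = proj[..., :2] / proj[..., 2:3]                            # perspective divide
    sq_err = np.sum((pix - keypoints2d) ** 2, axis=-1)              # (V, J)
    return np.sum(conf * sq_err) / max(np.sum(conf), 1e-8)
```

At test time only one camera is needed, since the multi-view rig is used solely to supervise training.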
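Despite recent progress in developing animatable full-body avatars, realistic modeling of clothing, one of the core aspects of human self-expression, remains an open challenge. State-of-the-art physical simulation methods can generate realistically behaving clothing geometry at interactive rates. Modeling photorealistic appearance, however, usually requires physically based rendering, which is too expensive for interactive applications. Data-driven deep appearance models, on the other hand, can efficiently produce realistic appearance, but struggle to synthesize the geometry of highly dynamic clothing and to handle challenging body-clothing configurations. To this end, we introduce pose-driven avatars with explicit modeling of clothing that exhibit both realistic clothing dynamics and photorealistic appearance learned from real-world data. The key idea is to introduce a neural clothing appearance model that operates on top of explicit geometry: at training time we use high-fidelity tracking, whereas at animation time we rely on physically simulated geometry. Our core contribution is a physically inspired appearance network capable of generating photorealistic appearance with view-dependent and dynamic shadowing effects, even for unseen body-clothing configurations. We thoroughly evaluate our model and demonstrate diverse animation results on several subjects and different types of clothing. Unlike previous work on photorealistic full-body avatars, our approach produces much richer dynamics and more realistic deformations, even for loose clothing. We also show that our formulation naturally allows clothing to be used with avatars of different people while remaining fully animatable, enabling, for the first time, photorealistic avatars with novel clothing.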
We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar exhibiting a high-fidelity appearance and geometry. In contrast to the major trend of neural implicit representations, HARP models a hand with a mesh-based parametric hand model, a vertex displacement map, a normal map, and an albedo without any neural components. As validated by our experiments, the explicit nature of our representation enables a truly scalable, robust, and efficient approach to hand avatar creation. HARP is optimized via gradient descent from a short sequence captured by a hand-held mobile phone and can be directly used in AR/VR applications with real-time rendering capability. To enable this, we carefully design and implement a shadow-aware differentiable rendering scheme that is robust to high degree articulations and self-shadowing regularly present in hand motion sequences, as well as challenging lighting conditions. It also generalizes to unseen poses and novel viewpoints, producing photo-realistic renderings of hand animations performing highly-articulated motions. Furthermore, the learned HARP representation can be used for improving 3D hand pose estimation quality in challenging viewpoints. The key advantages of HARP are validated by the in-depth analyses on appearance reconstruction, novel-view and novel pose synthesis, and 3D hand pose refinement. It is an AR/VR-ready personalized hand representation that shows superior fidelity and scalability.
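HARP represents appearance with explicit maps (albedo, normal map) rendered under a shadow-aware scheme. The abstract does not give the exact shading model, so the sketch below assumes the simplest physically motivated choice: Lambertian shading with a single directional light, a per-pixel visibility mask for self-shadowing, and a constant ambient term. All names and the `ambient=0.1` default are hypothetical.

```python
import numpy as np

def shade_lambertian(albedo, normals, light_dir, visibility, ambient=0.1):
    """Per-pixel diffuse shading: albedo * (ambient + visibility * max(n . l, 0)).

    albedo: (H, W, 3); normals: (H, W, 3) unit normals (e.g. mesh normals
    perturbed by a normal map); light_dir: (3,) direction towards the light;
    visibility: (H, W) in [0, 1], 0 where the point is shadowed.
    """
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    n_dot_l = np.clip(normals @ l, 0.0, None)     # back-facing points get no direct light
    irradiance = ambient + visibility * n_dot_l
    return albedo * irradiance[..., None]
```

Keeping albedo, normals and shadowing separate is what lets such a representation be relit and re-posed: the same albedo map is reused while normals and visibility change with the pose and light.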
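Virtual telepresence is the future of online communication. Clothing is an essential part of a person's identity and self-expression. Yet ground-truth data of registered clothing is currently unavailable at the resolution and accuracy required to train telepresence models for realistic cloth animation. Here, we propose an end-to-end pipeline for building drivable representations of clothing. At the core of our approach is a multi-view patterned cloth tracking algorithm capable of capturing deformations with high accuracy. We further rely on the high-quality data produced by our tracking method to build a Garment Avatar: an expressive and fully drivable geometry model of a piece of clothing. The resulting model can be animated using a sparse set of views and produces highly realistic reconstructions that are faithful to the driving signals. We demonstrate the efficacy of our pipeline in a realistic virtual telepresence application, in which a garment is reconstructed from two views and the user can pick and swap garment designs at will. In addition, we show that in a challenging scenario, when driven by body pose alone, our drivable Garment Avatar produces realistic cloth geometry of significantly higher quality than the state of the art.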
Pixel-aligned Implicit function (PIFu): We present pixel-aligned implicit function (PIFu), which allows recovery of high-resolution 3D textured surfaces of clothed humans from a single input image (top row). Our approach can digitize intricate variations in clothing, such as wrinkled skirts and high-heels, including complex hairstyles. The shape and textures can be fully recovered including largely unseen regions such as the back of the subject. PIFu can also be naturally extended to multi-view input images (bottom row).
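PIFu defines the surface as a level set of an occupancy function f(F(π(X)), z(X)): a 3D point X is projected into the image, the feature map F is sampled at that pixel, and the feature plus the point's depth z(X) is fed to an MLP. The toy sketch below assumes an orthographic camera with coordinates already in pixel units and uses random MLP weights, so it only illustrates the data flow, not a trained model.

```python
import numpy as np

def bilinear_sample(feat, xy):
    """Sample a (C, H, W) feature map at continuous pixel coords xy (N, 2)."""
    C, H, W = feat.shape
    x = np.clip(xy[:, 0], 0, W - 1)
    y = np.clip(xy[:, 1], 0, H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x0 + 1] * wx
    bottom = feat[:, y0 + 1, x0] * (1 - wx) + feat[:, y0 + 1, x0 + 1] * wx
    return (top * (1 - wy) + bottom * wy).T              # (N, C)

def pifu_occupancy(points, feat, mlp):
    """Pixel-aligned implicit function: f(F(pi(X)), z(X)) -> occupancy in [0, 1].

    Assumes an orthographic camera whose image plane is the x-y plane, with
    point coordinates in pixel units; z is the depth. `mlp` is a list of (W, b).
    """
    pixel_feat = bilinear_sample(feat, points[:, :2])    # F(pi(X)), shape (N, C)
    h = np.hstack([pixel_feat, points[:, 2:3]])          # append depth z(X)
    for W_, b in mlp[:-1]:
        h = np.maximum(h @ W_ + b, 0.0)                  # ReLU hidden layers
    W_, b = mlp[-1]
    logits = (h @ W_ + b)[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))                 # sigmoid
```

The surface is then extracted as the 0.5 level set of this function over a 3D grid (e.g. with marching cubes); for multi-view input, PIFu aggregates the per-view features before the final layers.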
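We present a new pose transfer method for synthesizing a human animation from a single image of a person, controlled by a sequence of body poses. Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene, resulting in temporal inconsistency and failure to preserve the person's identity and textures. To address these limitations, we design a compositional neural network that predicts the silhouette, garment labels, and textures. Each modular network is explicitly dedicated to a subtask that can be learned from synthetic data. At inference time, we use the trained network to produce a unified representation of appearance and its labels in UV coordinates, which remains constant across poses. This unified representation provides incomplete yet strong guidance for generating the appearance in response to pose changes. We use the trained network to complete the appearance and render it with the background. With these strategies, we can synthesize human animations that preserve the person's identity and appearance in a temporally coherent way, without any fine-tuning on the test scene. Experiments show that our method outperforms the state of the art in synthesis quality, temporal coherence, and generalization ability.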
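Emerging metaverse applications demand reliable, accurate, and photorealistic reproductions of human hands to perform sophisticated operations as if in the physical world. While the real human hand involves one of the most intricate coordinations between bones, muscles, tendons, and skin, state-of-the-art techniques unanimously focus on modeling only the skeleton of the hand. In this paper, we present NIMBLE, a novel parametric hand model that includes the missing key components, bringing 3D hand models to a new level of realism. We first annotate muscles, bones, and skin on the recent Magnetic Resonance Imaging hand (MRI-Hand) dataset, and then register a volumetric template hand onto the individual poses and subjects in the dataset. NIMBLE consists of 20 bones as triangular meshes, 7 muscle groups as tetrahedral meshes, and a skin mesh. Through iterative shape registration and parameter learning, it further produces shape blend shapes, pose blend shapes, and a joint regressor. We demonstrate NIMBLE on modeling, rendering, and visual inference tasks. By enforcing the inner bones and muscles to obey anatomical and kinematic rules, NIMBLE can animate 3D hands in new poses with unprecedented realism. To model the appearance of the skin, we further build a photometric capture setup to acquire high-quality textures and normal maps that model wrinkles and palm prints. Finally, NIMBLE also benefits learning-based hand pose and shape estimation, either by synthesizing rich training data or by acting directly as a differentiable layer in the inference network.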
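"Shape blend shapes, pose blend shapes, and a joint regressor" are the components of SMPL-style linear models. The sketch below shows that generic formulation: vertex offsets linear in the shape coefficients, joints regressed linearly from the shaped mesh, and pose-dependent corrective offsets linear in the flattened `(R_k - I)` of the non-root joint rotations. It is not NIMBLE's code: NIMBLE's layered bone/muscle/skin structure is not modelled, and the final skinning step is omitted.

```python
import numpy as np

def rodrigues(axis_angle):
    """Axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-12:
        return np.eye(3)
    k = axis_angle / theta
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * Kx @ Kx

def blendshape_model(template, shape_dirs, pose_dirs, joint_reg, betas, pose):
    """Generic SMPL-style linear model (sketch; skinning step omitted).

    template: (V, 3); shape_dirs: (V, 3, S); pose_dirs: (V, 3, 9 * (K - 1));
    joint_reg: (K, V); betas: (S,); pose: (K, 3) axis-angle per joint.
    """
    v_shaped = template + shape_dirs @ betas           # shape blend shapes
    joints = joint_reg @ v_shaped                      # joint regressor
    rots = np.stack([rodrigues(p) for p in pose])
    pose_feat = (rots[1:] - np.eye(3)).reshape(-1)     # zero in the rest pose
    v_posed = v_shaped + pose_dirs @ pose_feat         # pose blend shapes
    return v_posed, joints, rots
```

Using `R - I` rather than the raw rotations makes the pose correction vanish in the rest pose, so the template stays exact when no pose is applied.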
The combination of artist-curated scans and deep implicit functions (IF) is enabling the creation of detailed, clothed 3D humans from images. However, existing methods are far from perfect. IF-based methods recover free-form geometry but produce disembodied limbs or degenerate shapes for unseen poses or clothes. To increase robustness in these cases, existing work uses an explicit parametric body model to constrain surface reconstruction, but this limits the recovery of free-form surfaces such as loose clothing that deviates from the body. What we want is a method that combines the best properties of implicit and explicit methods. To this end, we make two key observations: (1) current networks are better at inferring detailed 2D maps than full-3D surfaces, and (2) a parametric model can be seen as a "canvas" for stitching together detailed surface patches. ECON infers high-fidelity 3D humans even in loose clothes and challenging poses, while having realistic faces and fingers. This goes beyond previous methods. Quantitative evaluation on the CAPE and Renderpeople datasets shows that ECON is more accurate than the state of the art. Perceptual studies also show that ECON's perceived realism is better by a large margin. Code and models are available for research purposes at https://xiuyuliang.cn/econ
Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a prior learned from large-scale, human-specific 3D datasets so that reconstruction can be performed from sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable data-efficient creation of realistic, animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can easily reconstruct body geometry and infer full-body clothing from a single image, we leverage two priors in ELICIT: a 3D geometry prior and a visual semantic prior. Specifically, ELICIT introduces the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and implements the visual clothing semantic prior with CLIP-based pre-trained models. Both priors jointly guide the optimization to create plausible content in the invisible areas. To further improve visual details, we propose a segmentation-based sampling strategy that locally refines different parts of the avatar. Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT outperforms current state-of-the-art avatar creation methods when only a single image is available. Code will be made public for research purposes at https://elicit3d.github.io
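Recovering a textured 3D mesh from a monocular image is highly challenging, particularly for in-the-wild objects that lack 3D ground truth. In this work, we present MeshInversion, a novel framework that improves reconstruction by exploiting the generative prior of a 3D GAN pre-trained for textured 3D mesh synthesis. Reconstruction is achieved by searching the latent space of the 3D GAN for the code whose output best resembles the target mesh, as judged against the single-view observation. Since the pre-trained GAN encapsulates rich 3D semantics in terms of mesh geometry and texture, searching within the GAN manifold naturally regularizes the realism and fidelity of the reconstruction. Importantly, this regularization is applied directly in 3D space, providing crucial guidance for mesh parts that are unobserved in 2D. Experiments on standard benchmarks show that our framework obtains faithful 3D reconstructions with consistent geometry and texture across both observed and unobserved parts. Moreover, it generalizes well to less commonly seen meshes, such as deformable objects with extended articulations. Code is released at https://github.com/junzhezhang/mesh-inversion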
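The core idea, searching a generator's latent space so that its output matches only what the single view observes, can be shown with a deliberately tiny stand-in. In the sketch below a linear map `G(z) = A z + b` replaces the pre-trained 3D mesh GAN, and a binary mask marks the "observed" output entries; both are assumptions for illustration only. Gradient descent fits the observed entries, and the generator itself fills in the unobserved ones, even when the target is garbage there.

```python
import numpy as np

def invert_linear_generator(A, b, target, mask, lr=0.05, steps=2000):
    """Gradient-descent search for the latent z whose output G(z) = A z + b best
    matches `target` on the observed entries (mask == 1) only.

    A toy stand-in for a pre-trained 3D GAN: unobserved entries of the result
    are filled in by the generator itself rather than by the observation.
    """
    z = np.zeros(A.shape[1])
    for _ in range(steps):
        # loss = 0.5 * ||mask * (G(z) - target)||^2; mask is 0/1, so mask**2 == mask
        residual = mask * (A @ z + b - target)
        z -= lr * (A.T @ residual)
    return z, A @ z + b
```

With a real GAN the generator is non-linear, the loss compares rendered images (silhouette, texture) rather than raw outputs, and the gradient comes from automatic differentiation, but the role of the generator as a prior over unobserved parts is the same.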
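Current methods for learning realistic and animatable 3D clothed avatars require either posed 3D scans or 2D images with carefully controlled user poses. In contrast, our goal is to learn an avatar from only 2D images of people in unconstrained poses. Given a set of images, our method estimates a detailed 3D surface from each image and then combines these into an animatable avatar. Implicit functions are well suited to the first task, as they can capture details such as hair and clothes. However, current methods are not robust to varied human poses and often produce 3D surfaces with broken or disembodied limbs, missing details, or non-human shapes. The problem is that these methods use global feature encoders that are sensitive to global pose. To address this, we propose ICON ("Implicit Clothed humans Obtained from Normals"), which uses local features instead. ICON has two main modules, both of which exploit the SMPL(-X) body model. First, ICON infers detailed clothed-human normals (front/back) conditioned on the SMPL(-X) normals. Second, a visibility-aware implicit surface regressor produces an iso-surface of a human occupancy field. Importantly, at inference time, a feedback loop alternates between refining the SMPL(-X) mesh using the inferred clothed normals and then refining the normals. Given multiple reconstructed frames of a subject in varied poses, we use SCANimate to produce an animatable avatar from them. Evaluation on the AGORA and CAPE datasets shows that ICON outperforms the state of the art in reconstruction, even with heavily limited training data. Additionally, it is much more robust to out-of-distribution samples, e.g., in-the-wild poses/images and out-of-frame cropping. ICON takes a step towards robust 3D clothed human reconstruction from in-the-wild images. This enables creating avatars directly from video, with personalized and natural pose-dependent cloth deformation.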
Image and video synthesis has become a blooming topic in the computer vision and machine learning communities, driven by the development of deep generative models and by its great academic and application value. Many researchers have devoted themselves to synthesizing high-fidelity images of humans, one of the most commonly seen object categories in daily life, and a large number of studies have been performed based on various deep generative models, task settings, and applications. It is therefore necessary to give a comprehensive overview of these methods for human image generation. In this paper, we divide human image generation techniques into three paradigms, i.e., data-driven methods, knowledge-guided methods, and hybrid methods. For each paradigm, the most representative models and their variants are presented, and the advantages and characteristics of the different methods are summarized in terms of model architectures and input/output requirements. We also summarize the main public human image datasets and evaluation metrics in the literature. Furthermore, given their wide application potential, we cover two typical downstream uses of synthesized human images, i.e., data augmentation for person recognition tasks and virtual try-on for fashion customers. Finally, we discuss the challenges and potential directions of human image generation to shed light on future research.
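Unsupervised learning of 3D-aware generative adversarial networks (GANs) using only collections of single-view 2D photographs has recently made much progress. However, these 3D GANs have not been demonstrated on human bodies, and the radiance fields generated by existing frameworks are not directly editable, which limits their applicability to downstream tasks. We address these challenges by developing a 3D GAN framework that learns to generate radiance fields of human bodies or faces in a canonical pose and warps them into a desired body pose or facial expression using an explicit deformation field. Using our framework, we demonstrate the first high-quality radiance field generation results for human bodies. Moreover, we show that our deformation-aware training procedure significantly improves the quality of the generated bodies or faces when editing their poses or facial expressions, compared with a 3D GAN that is not trained with explicit deformations.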
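Rendering a canonical radiance field in a new pose works by warping backwards: every sample along a camera ray in posed space is mapped into canonical space, where density and colour are queried, and the samples are then alpha-composited as in standard volume rendering. The sketch below uses a toy canonical field (a red sphere) and a rigid translation as the deformation; both are hypothetical stand-ins, since the abstract does not specify the generator or the deformation field in detail.

```python
import numpy as np

def render_ray(origin, direction, canonical_field, warp_to_canonical,
               near=0.0, far=4.0, n_samples=128):
    """Volume-render one ray through a posed radiance field.

    Posed-space samples are mapped back to canonical space by
    `warp_to_canonical` (a backward deformation), where `canonical_field`
    returns (density, rgb). Standard alpha compositing is then applied.
    """
    t = np.linspace(near, far, n_samples)
    delta = np.append(np.diff(t), 1e10)                    # last interval treated as open-ended
    x_posed = origin + t[:, None] * direction
    sigma, rgb = canonical_field(warp_to_canonical(x_posed))
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))   # transmittance before each sample
    weights = trans * alpha
    return weights @ rgb, weights.sum()                    # colour, accumulated opacity

def sphere_field(x):
    """Toy canonical field: an opaque red sphere of radius 0.5 at the origin."""
    sigma = np.where(np.linalg.norm(x, axis=-1) < 0.5, 50.0, 0.0)
    return sigma, np.tile([1.0, 0.0, 0.0], (len(x), 1))
```

For example, `render_ray(np.array([1.0, 0, -2]), np.array([0.0, 0, 1]), sphere_field, lambda x: x - np.array([1.0, 0, 0]))` sees the sphere that the warp has moved to x = 1, while the same ray from x = 0 sees nothing. Because the canonical field is never edited, changing the pose only means changing the warp.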
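This paper presents a new large multi-view dataset of human body expressions with natural clothing, called HUMBI. The goal of HUMBI is to facilitate modeling the view-specific appearance and geometry of five primary body signals, namely gaze, face, hand, body, and garment, from a wide variety of people. 107 synchronized HD cameras are used to capture 772 distinctive subjects across gender, ethnicity, age, and style. From the multi-view image streams, we reconstruct high-fidelity body expressions using 3D mesh models, which allows view-specific appearance to be represented. We demonstrate that HUMBI is highly effective for learning and reconstructing a complete human model and is complementary to existing datasets of human body expressions with limited views and subjects, such as MPII-Gaze, Multi-PIE, Human3.6M, and the Panoptic Studio dataset. Based on HUMBI, we formulate a new benchmark challenge for a pose-guided appearance rendering task, which aims to substantially extend photorealism in modeling diverse human expressions in 3D, a key enabling capability for authentic social telepresence. HUMBI is publicly available at http://humbi-data.net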