An outfit visualization method generates an image of a person wearing real garments from images of those garments. Current methods can produce images that look realistic and preserve garment identity, captured in details such as collar, cuffs, texture, hem, and sleeve length. However, no current method can both control how the garment is worn -- including tuck or untuck, opened or closed, high or low on the waist, etc.. -- and generate realistic images that accurately preserve the properties of the original garment. We describe an outfit visualization method that controls drape while preserving garment identity. Our system allows instance independent editing of garment drape, which means a user can construct an edit (e.g. tucking a shirt in a specific way) that can be applied to all shirts in a garment collection. Garment detail is preserved by relying on a warping procedure to place the garment on the body and a generator then supplies fine shading detail. To achieve instance independent control, we use control points with garment category-level semantics to guide the warp. The method produces state-of-the-art quality images, while allowing creative ways to style garments, including allowing tops to be tucked or untucked; jackets to be worn open or closed; skirts to be worn higher or lower on the waist; and so on. The method allows interactive control to correct errors in individual renderings too. Because the edits are instance independent, they can be applied to large pools of garments automatically and can be conditioned on garment metadata (e.g. all cropped jackets are worn closed or all bomber jackets are worn closed).
translated by 谷歌翻译
基于图像的虚拟试验是以人为中心的现实潜力,是以人为中心的图像生成的最有希望的应用之一。在这项工作中,我们迈出了一步,探索多功能的虚拟尝试解决方案,我们认为这应该具有三个主要属性,即,它们应支持无监督的培训,任意服装类别和可控的服装编辑。为此,我们提出了一个特征性的端到端网络,即用空间自适应的斑点适应性GAN ++(Pasta-gan ++),以实现用于高分辨率不合规的虚拟试验的多功能系统。具体而言,我们的意大利面++由一个创新的贴布贴片的拆卸模块组成,可以将完整的服装切换为归一化贴剂,该贴片能够保留服装样式信息,同时消除服装空间信息,从而减轻在未受监督训练期间过度适应的问题。此外,面食++引入了基于贴片的服装表示和一个贴片引导的解析合成块,使其可以处理任意服装类别并支持本地服装编辑。最后,为了获得具有逼真的纹理细节的尝试结果,面食gan ++结合了一种新型的空间自适应残留模块,以将粗翘曲的服装功能注入发电机。对我们新收集的未配对的虚拟试验(UPT)数据集进行了广泛的实验,证明了面食gan ++比现有SOTA的优越性及其可控服装编辑的能力。
translated by 谷歌翻译
Previous virtual try-on methods usually focus on aligning a clothing item with a person, limiting their ability to exploit the complex pose, shape and skin color of the person, as well as the overall structure of the clothing, which is vital to photo-realistic virtual try-on. To address this potential weakness, we propose a fill in fabrics (FIFA) model, a self-supervised conditional generative adversarial network based framework comprised of a Fabricator and a unified virtual try-on pipeline with a Segmenter, Warper and Fuser. The Fabricator aims to reconstruct the clothing image when provided with a masked clothing as input, and learns the overall structure of the clothing by filling in fabrics. A virtual try-on pipeline is then trained by transferring the learned representations from the Fabricator to Warper in an effort to warp and refine the target clothing. We also propose to use a multi-scale structural constraint to enforce global context at multiple scales while warping the target clothing to better fit the pose and shape of the person. Extensive experiments demonstrate that our FIFA model achieves state-of-the-art results on the standard VITON dataset for virtual try-on of clothing items, and is shown to be effective at handling complex poses and retaining the texture and embroidery of the clothing.
translated by 谷歌翻译
基于图像的虚拟试验努力将服装的外观转移到目标人的图像上。先前的工作主要集中在上身衣服(例如T恤,衬衫和上衣)上,并忽略了全身或低身物品。这种缺点来自一个主要因素:用于基于图像的虚拟试验的当前公开可用数据集并不解释此品种,从而限制了该领域的进度。为了解决这种缺陷,我们介绍着着装代码,其中包含多类服装的图像。着装代码比基于图像的虚拟试验的公共可用数据集大于3倍以上,并且具有前视图,全身参考模型的高分辨率配对图像(1024x768)。为了生成具有高视觉质量且细节丰富的高清尝试图像,我们建议学习细粒度的区分功能。具体而言,我们利用一种语义意识歧视器,该歧视器在像素级而不是图像级或贴片级上进行预测。广泛的实验评估表明,所提出的方法在视觉质量和定量结果方面超过了基线和最先进的竞争者。着装码数据集可在https://github.com/aimagelab/dress-code上公开获得。
translated by 谷歌翻译
由于发型的复杂性和美味,编辑发型是独一无二的,而且具有挑战性。尽管最近的方法显着改善了头发的细节,但是当源图像的姿势与目标头发图像的姿势大不相同时,这些模型通常会产生不良的输出,从而限制了其真实世界的应用。发型是一种姿势不变的发型转移模型,可以减轻这种限制,但在保留精致的头发质地方面仍然表现出不令人满意的质量。为了解决这些局限性,我们提出了配备潜在优化和新呈现的局部匹配损失的高性能姿势不变的发型转移模型。在stylegan2潜在空间中,我们首先探索目标头发的姿势对准的潜在代码,并根据本地风格匹配保留了详细纹理。然后,我们的模型对源的遮挡构成了对齐的目标头发的遮挡,并将两个图像混合在一起以产生最终输出。实验结果表明,我们的模型在在较大的姿势差异和保留局部发型纹理下转移发型方面具有优势。
translated by 谷歌翻译
尽管基于深度学习的面部相关模型成功显着,但这些模型仍然仅限于真正人类面的领域。另一方面,由于缺乏组织良好的数据集,由于缺乏组织的数据集,动画面的域已经不太积极地研究。在本文中,我们通过可控的合成动画模型介绍了一个大规模动画CeleBfaces数据集(AnimeCeleb),以提高动画面域的研究。为了促进数据生成过程,我们基于开放式3D软件和开发的注释系统构建半自动管道。这导致构建大型动画面部数据集,包括具有丰富注释的多姿态和多样式动画面。实验表明,我们的数据集适用于各种动画相关的任务,如头部重新创建和着色。
translated by 谷歌翻译
基于深度学习的虚拟试用系统最近取得了一些令人鼓舞的进展,但仍然存在需要解决的几个重要挑战,例如尝试所有类型的任意衣服,从一个类别到另一个类别的衣服尝试 - 少数文物的结果。要处理这个问题,我们在本文中首先使用各种类型的衣服,\即顶部,底部和整个衣服收集新的数据集,每个人都有多个类别,具有模式,徽标和其他细节的服装特性丰富的信息。基于此数据集,我们提出了用于全型衣服的任意虚拟试验网络(Avton),这可以通过保存和交易目标衣服和参考人员的特性来综合实际的试验图像。我们的方法包括三个模块:1)四肢预测模块,其用于通过保留参考人物的特性来预测人体部位。这对于处理交叉类别的试验任务(例如长袖\(\ Leftrightarrow \)短袖或长裤(\ Leftrightarrow \)裙子,\等),\等)特别适合,其中暴露的手臂或腿部有皮肤可以合理地预测颜色和细节; 2)改进的几何匹配模块,该模块设计成根据目标人的几何形状的扭曲衣服。通过紧凑的径向功能(Wendland的\(\ PSI \) - 功能),我们改进了基于TPS的翘曲方法; 3)折衷融合模块,即扭转翘曲衣服和参考人员的特点。该模块是基于网络结构的微调对称性来使生成的试验图像看起来更加自然和现实。进行了广泛的模拟,与最先进的虚拟试用方法相比,我们的方法可以实现更好的性能。
translated by 谷歌翻译
The vision community has explored numerous pose guided human editing methods due to their extensive practical applications. Most of these methods still use an image-to-image formulation in which a single image is given as input to produce an edited image as output. However, the problem is ill-defined in cases when the target pose is significantly different from the input pose. Existing methods then resort to in-painting or style transfer to handle occlusions and preserve content. In this paper, we explore the utilization of multiple views to minimize the issue of missing information and generate an accurate representation of the underlying human model. To fuse the knowledge from multiple viewpoints, we design a selector network that takes the pose keypoints and texture from images and generates an interpretable per-pixel selection map. After that, the encodings from a separate network (trained on a single image human reposing task) are merged in the latent space. This enables us to generate accurate, precise, and visually coherent images for different editing tasks. We show the application of our network on 2 newly proposed tasks - Multi-view human reposing, and Mix-and-match human image generation. Additionally, we study the limitations of single-view editing and scenarios in which multi-view provides a much better alternative.
translated by 谷歌翻译
我们提出了一种新的姿势转移方法,用于从由一系列身体姿势控制的人的单个图像中综合人类动画。现有的姿势转移方法在申请新颖场景时表现出显着的视觉伪影,从而导致保留人的身份和纹理的时间不一致和失败。为了解决这些限制,我们设计了一种构成神经网络,预测轮廓,服装标签和纹理。每个模块化网络明确地专用于可以从合成数据学习的子任务。在推理时间,我们利用训练有素的网络在UV坐标中产生统一的外观和标签,其横跨姿势保持不变。统一的代表提供了一个不完整的且强烈指导,以响应姿势变化而产生外观。我们使用训练有素的网络完成外观并呈现背景。通过这些策略,我们能够以时间上连贯的方式综合人类动画,这些动画可以以时间上连贯的方式保护人的身份和外观,而无需在测试场景上进行任何微调。实验表明,我们的方法在合成质量,时间相干性和泛化能力方面优于最先进的。
translated by 谷歌翻译
Facial image manipulation has achieved great progress in recent years. However, previous methods either operate on a predefined set of face attributes or leave users little freedom to interactively manipulate images. To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation. Our key insight is that semantic masks serve as a suitable intermediate representation for flexible face manipulation with fidelity preservation. MaskGAN has two main components: 1) Dense Mapping Network (DMN) and 2) Editing Behavior Simulated Training (EBST). Specifically, DMN learns style mapping between a free-form user modified mask and a target image, enabling diverse generation results. EBST models the user editing behavior on the source mask, making the overall framework more robust to various manipulated inputs. Specifically, it introduces dual-editing consistency as the auxiliary supervision signal. To facilitate extensive studies, we construct a large-scale high-resolution face dataset with fine-grained mask annotations named CelebAMask-HQ. MaskGAN is comprehensively evaluated on two challenging tasks: attribute transfer and style copy, demonstrating superior performance over other state-of-the-art methods. The code, models, and dataset are available at https://github.com/switchablenorms/CelebAMask-HQ.
translated by 谷歌翻译
虚拟试验旨在在店内服装和参考人员图像的情况下产生光真实的拟合结果。现有的方法通常建立多阶段框架来分别处理衣服翘曲和身体混合,或严重依赖基于中间解析器的标签,这些标签可能嘈杂甚至不准确。为了解决上述挑战,我们通过开发一种新型的变形注意流(DAFLOF)提出了一个单阶段的尝试框架,该框架将可变形的注意方案应用于多流量估计。仅将姿势关键点作为指导,分别为参考人员和服装图像估计了自我和跨跨性别的注意力流。通过对多个流场进行采样,通过注意机制同时提取并合并了来自不同语义区域的特征级和像素级信息。它使衣服翘曲和身体合成,同时以端到端的方式导致照片真实的结果。在两个尝试数据集上进行的广泛实验表明,我们提出的方法在定性和定量上都能达到最先进的性能。此外,其他两个图像编辑任务上的其他实验说明了我们用于多视图合成和图像动画方法的多功能性。
translated by 谷歌翻译
Online clothing catalogs lack diversity in body shape and garment size. Brands commonly display their garments on models of one or two sizes, rarely including plus-size models. In this work, we propose a new method, SizeGAN, for generating images of garments on different-sized models. To change the garment and model size while maintaining a photorealistic image, we incorporate image alignment ideas from the medical imaging literature into the StyleGAN2-ADA architecture. Our method learns deformation fields at multiple resolutions and uses a spatial transformer to modify the garment and model size. We evaluate our approach along three dimensions: realism, garment faithfulness, and size. To our knowledge, SizeGAN is the first method to focus on this size under-representation problem for modeling clothing. We provide an analysis comparing SizeGAN to other plausible approaches and additionally provide the first clothing dataset with size labels. In a user study comparing SizeGAN and two recent virtual try-on methods, we show that our method ranks first in each dimension, and was vastly preferred for realism and garment faithfulness. In comparison to most previous work, which has focused on generating photorealistic images of garments, our work shows that it is possible to generate images that are both photorealistic and cover diverse garment sizes.
translated by 谷歌翻译
由于其语义上的理解和用户友好的可控性,通过三维引导,通过三维引导的面部图像操纵已广泛应用于各种交互式场景。然而,现有的基于3D形式模型的操作方法不可直接适用于域名面,例如非黑色素化绘画,卡通肖像,甚至是动物,主要是由于构建每个模型的强大困难具体面部域。为了克服这一挑战,据我们所知,我们建议使用人为3DMM操纵任意域名的第一种方法。这是通过两个主要步骤实现的:1)从3DMM参数解开映射到潜在的STYLEGO2的潜在空间嵌入,可确保每个语义属性的解除响应和精确的控制; 2)通过实施一致的潜空间嵌入,桥接域差异并使人类3DMM适用于域外面的人类3DMM。实验和比较展示了我们高质量的语义操作方法在各种面部域中的优越性,所有主要3D面部属性可控姿势,表达,形状,反照镜和照明。此外,我们开发了直观的编辑界面,以支持用户友好的控制和即时反馈。我们的项目页面是https://cassiepython.github.io/cddfm3d/index.html
translated by 谷歌翻译
最近已经示出了从2D图像中提取隐式3D表示的生成神经辐射场(GNERF)模型,以产生代表刚性物体的现实图像,例如人面或汽车。然而,他们通常难以产生代表非刚性物体的高质量图像,例如人体,这对许多计算机图形应用具有很大的兴趣。本文提出了一种用于人类图像综合的3D感知语义导向生成模型(3D-SAGGA),其集成了GNERF和纹理发生器。前者学习人体的隐式3D表示,并输出一组2D语义分段掩模。后者将这些语义面部掩模转化为真实的图像,为人类的外观添加了逼真的纹理。如果不需要额外的3D信息,我们的模型可以使用照片现实可控生成学习3D人类表示。我们在Deepfashion DataSet上的实验表明,3D-SAGGAN显着优于最近的基线。
translated by 谷歌翻译
人物图像的旨在在源图像上执行非刚性变形,这通常需要未对准数据对进行培训。最近,自我监督的方法通过合并自我重建的解除印章表达来表达这项任务的巨大前景。然而,这些方法未能利用解除戒断功能之间的空间相关性。在本文中,我们提出了一种自我监督的相关挖掘网络(SCM-NET)来重新排列特征空间中的源图像,其中两种协作模块是集成的,分解的样式编码器(DSE)和相关挖掘模块(CMM)。具体地,DSE首先在特征级别创建未对齐的对。然后,CMM建立用于特征重新排列的空间相关领域。最终,翻译模块将重新排列的功能转换为逼真的结果。同时,为了提高跨尺度姿态变换的保真度,我们提出了一种基于曲线图的体结构保持损失(BSR损耗),以保持半体上的合理的身体结构到全身。与Deepfashion DataSet进行的广泛实验表明了与其他监督和无监督和无监督的方法相比的方法的优势。此外,对面部的令人满意的结果显示了我们在其他变形任务中的方法的多功能性。
translated by 谷歌翻译
我们为场景的生成模型提出了一个无监督的中级表示。该表示是中等水平的,因为它既不是人均也不是每图像。相反,场景被建模为一系列空间,深度订购的特征“斑点”。斑点分化在特征网格上,该特征网格被生成对抗网络解码为图像。由于斑点的空间均匀性和卷积固有的局部性,我们的网络学会了将不同的斑点与场景中的不同实体相关联,并安排这些斑点以捕获场景布局。我们通过证明,尽管没有任何监督训练,但我们的方法启用了诸如场景中的物体(例如,移动,卸下和修复家具),创建可行场景(例如,可靠的,Plaausible(例如,可靠),我们的方法可以轻松地操纵对象(例如,可行的情况)来证明这种紧急行为。带抽屉在特定位置的房间),将现实世界图像解析为组成部分。在充满挑战的室内场景的多类数据集上,Blobgan在FID测量的图像质量中优于图像质量。有关视频结果和交互式演示,请参见我们的项目页面:https://www.dave.ml/blobgan
translated by 谷歌翻译
语义图像编辑利用本地语义标签图来生成所需的内容。最近的工作借用了Spade Block来实现语义图像编辑。但是,由于编辑区域和周围像素之间的样式差异,它无法产生令人愉悦的结果。我们将其归因于以下事实:Spade仅使用与图像无关的局部语义布局,但忽略了已知像素中包含的图像特定样式。为了解决此问题,我们提出了一个样式保存的调制(SPM),其中包括两个调制过程:第一个调制包含上下文样式和语义布局,然后生成两个融合的调制参数。第二次调制采用融合参数来调制特征图。通过使用这两种调制,SPM可以在保留特定图像的上下文样式的同时注入给定的语义布局。此外,我们设计了一种渐进式体系结构,以粗到精细的方式生成编辑的内容。提出的方法可以获得上下文一致的结果,并显着减轻生成区域和已知像素之间的不愉快边界。
translated by 谷歌翻译
在本文中,我们专注于人物图像的生成,即在各种条件下产生人物图像,例如腐败的纹理或不同的姿势。在此任务中解决纹理遮挡和大构成错位,以前的作品只使用相应的区域的风格来推断遮挡区域并依靠点明智的对齐来重新组织上下文纹理信息,缺乏全局关联地区的能力代码并保留源的局部结构。为了解决这些问题,我们提出了一种Glocal框架,通过全球推理不同语义区域之间的样式相互关系来改善遮挡感知纹理估计,这也可以用于恢复纹理染色中的损坏图像。对于本地结构信息保存,我们进一步提取了源图像的本地结构,并通过本地结构传输在所生成的图像中重新获得。我们基准测试我们的方法,以充分表征其对Deepfashion DataSet的性能,并显示出突出我们方法的新颖性的广泛消融研究。
translated by 谷歌翻译
人类视频运动转移(HVMT)的目的是鉴于源头的形象,生成了模仿驾驶人员运动的视频。 HVMT的现有方法主要利用生成对抗网络(GAN),以根据根据源人员图像和每个驾驶视频框架估计的流量来执行翘曲操作。但是,由于源头,量表和驾驶人员之间的巨大差异,这些方法始终会产生明显的人工制品。为了克服这些挑战,本文提出了基于gan的新型人类运动转移(远程移动)框架。为了产生逼真的动作,远遥采用了渐进的一代范式:它首先在没有基于流动的翘曲的情况下生成每个身体的零件,然后将所有零件变成驾驶运动的完整人。此外,为了保留自然的全球外观,我们设计了一个全球对齐模块,以根据其布局与驾驶员的规模和位置保持一致。此外,我们提出了一个纹理对准模块,以使人的每个部分都根据纹理的相似性对齐。最后,通过广泛的定量和定性实验,我们的远及以两个公共基准取得了最先进的结果。
translated by 谷歌翻译
Image and video synthesis has become a blooming topic in computer vision and machine learning communities along with the developments of deep generative models, due to its great academic and application value. Many researchers have been devoted to synthesizing high-fidelity human images as one of the most commonly seen object categories in daily lives, where a large number of studies are performed based on various deep generative models, task settings and applications. Thus, it is necessary to give a comprehensive overview on these variant methods on human image generation. In this paper, we divide human image generation techniques into three paradigms, i.e., data-driven methods, knowledge-guided methods and hybrid methods. For each route, the most representative models and the corresponding variants are presented, where the advantages and characteristics of different methods are summarized in terms of model architectures and input/output requirements. Besides, the main public human image datasets and evaluation metrics in the literature are also summarized. Furthermore, due to the wide application potentials, two typical downstream usages of synthesized human images are covered, i.e., data augmentation for person recognition tasks and virtual try-on for fashion customers. Finally, we discuss the challenges and potential directions of human image generation to shed light on future research.
translated by 谷歌翻译