Figure 1: (a) and (b): input images; (c): the "two-face" generated by naively copying the left half from (a) and the right half from (b); (d): the "two-face" generated by our Image2StyleGAN++ framework.
We propose an efficient algorithm to embed a given image into the latent space of StyleGAN. This embedding enables semantic image editing operations that can be applied to existing photographs. Taking the StyleGAN trained on the FFHQ dataset as an example, we show results for image morphing, style transfer, and expression transfer. Studying the results of the embedding algorithm provides valuable insights into the structure of the StyleGAN latent space. We propose a set of experiments to test what class of images can be embedded, how they are embedded, what latent space is suitable for embedding, and whether the embedding is semantically meaningful.
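As a rough illustration of this kind of embedding, the sketch below optimizes an extended latent code against a pixel loss. Here `generator` stands for a pretrained StyleGAN synthesis network taking an (18, 512) W+ code, and the single MSE term stands in for the paper's combined pixel-plus-perceptual objective; all of these are assumptions for illustration.

```python
# Minimal sketch of optimization-based embedding into a StyleGAN-like
# latent space. `generator`, the code shape, and the loss are
# assumptions for illustration, not the paper's exact setup.
import torch
import torch.nn.functional as F

def embed_image(generator, target, n_steps=1000, lr=0.01):
    w = torch.zeros(1, 18, 512, requires_grad=True)  # W+ code; mean-W init in practice
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        img = generator(w)
        loss = F.mse_loss(img, target)  # the paper also adds a VGG perceptual term
        loss.backward()
        opt.step()
    return w.detach()
```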
We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g., FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region. Code: https://github.com/ZPdesu/SEAN .
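To make the normalization idea concrete, here is a hedged sketch of a SEAN-style layer: per-region style vectors are broadcast over their mask regions into spatially varying scale and shift parameters. Shapes and module names are illustrative, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAdaptiveNorm(nn.Module):
    """SEAN-style normalization sketch: one style vector per semantic region."""
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Linear(style_dim, channels)
        self.to_beta = nn.Linear(style_dim, channels)

    def forward(self, x, seg, styles):
        # x: (B, C, H, W); seg: (B, R, H, W) one-hot region masks;
        # styles: (B, R, style_dim), one style vector per region.
        seg = F.interpolate(seg, size=x.shape[2:], mode='nearest')
        # Broadcast each region's scale/shift over its mask pixels.
        gamma = torch.einsum('brhw,brc->bchw', seg, self.to_gamma(styles))
        beta = torch.einsum('brhw,brc->bchw', seg, self.to_beta(styles))
        return self.norm(x) * (1 + gamma) + beta
```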
Our goal with this survey is to provide an overview of state-of-the-art deep learning technologies for face generation and editing. We cover the latest popular architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross-domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantic editing while preserving photo quality. We aim to provide an entry point into the field for readers who have basic knowledge of deep learning and are looking for an accessible introduction and overview.
We propose Exe-GAN, a novel exemplar-guided facial inpainting framework using generative adversarial networks. Our approach not only preserves the quality of the input facial image but also completes the image with exemplar-like facial attributes. We achieve this by simultaneously leveraging the global style of the input image, the stochastic style generated from a random latent code, and the exemplar style of the exemplar image. We introduce a novel attribute similarity metric to encourage the network to learn the style of facial attributes from the exemplar in a self-supervised way. To guarantee natural transitions across region boundaries, we introduce a novel spatially variant gradient backpropagation technique that adjusts loss gradients based on spatial location. Extensive evaluations on the public CelebA-HQ and FFHQ datasets, together with practical applications, validate the superiority of Exe-GAN in terms of the visual quality of facial inpainting.
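A hedged sketch of the spatially variant weighting idea: blurring the inpainting mask yields a smooth spatial map that modulates per-pixel loss gradients near region boundaries. This illustrates the concept only; the paper's exact gradient backpropagation scheme may differ.

```python
import torch
import torch.nn.functional as F

def spatially_weighted_l1(pred, target, mask, blur_kernel=11):
    # mask: (B, 1, H, W), 1 inside the hole. Average-pooling it produces
    # a soft transition band, so gradient magnitude varies with position.
    weight = F.avg_pool2d(mask, blur_kernel, stride=1,
                          padding=blur_kernel // 2)
    per_pixel = (pred - target).abs()
    return (weight * per_pixel).mean()
```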
Figure 1: Multi-code GAN prior facilitates many image processing applications using the reconstruction from fixed PGGAN [23] models: (a) image reconstruction; (b) image colorization; (c) image super-resolution; (d) image denoising; (e) image inpainting; (f) semantic manipulation.
Recent work has shown that a variety of semantics emerge in the latent space of Generative Adversarial Networks (GANs) when trained to synthesize images. However, it is difficult to use these learned semantics for real image editing. A common practice of feeding a real image to a trained GAN generator is to invert it back to a latent code. However, existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space. As a result, the reconstructed image cannot well support semantic editing through varying the inverted code. To solve this problem, we propose an in-domain GAN inversion approach, which not only faithfully reconstructs the input image but also ensures that the inverted code is semantically meaningful for editing. We first learn a novel domain-guided encoder to project a given image to the native latent space of GANs. We then propose domain-regularized optimization, which involves the encoder as a regularizer to fine-tune the code produced by the encoder and better recover the target image. Extensive experiments suggest that our inversion method achieves satisfying real image reconstruction and, more importantly, facilitates various image editing tasks, significantly outperforming the state of the art.
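The two-term objective described above could look roughly like the following sketch, where `G` is the generator, `E` the domain-guided encoder, and the weight `lam` is illustrative; the paper's full loss also includes a perceptual term.

```python
import torch
import torch.nn.functional as F

def domain_regularized_inversion(G, E, x, n_steps=100, lam=2.0, lr=0.01):
    z = E(x).detach().clone().requires_grad_(True)  # encoder initialization
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        rec = G(z)
        loss = F.mse_loss(rec, x)          # reconstruction term
        # Domain regularizer: the code should agree with what the
        # encoder produces for its own reconstruction.
        loss = loss + lam * F.mse_loss(z, E(rec))
        loss.backward()
        opt.step()
    return z.detach()
```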
We present a novel image inversion framework and a training pipeline to achieve high-fidelity image inversion with high-quality attribute editing. Inverting real images into StyleGAN's latent space is an extensively studied problem, yet the trade-off between image reconstruction fidelity and image editing quality remains an open challenge. Low-rate latent spaces are limited in their expressive power for high-fidelity reconstruction. On the other hand, high-rate latent spaces result in degraded editing quality. In this work, to achieve high-fidelity inversion, we learn residual features in higher latent codes that lower latent codes were not able to encode, which preserves image details in reconstruction. To achieve high-quality editing, we learn how to transform the residual features to adapt to manipulations in the latent codes. We train the framework to extract residual features and transform them via a novel architecture pipeline and cycle-consistency losses. We run extensive experiments and compare our method with state-of-the-art inversion methods. Quantitative metrics and visual comparisons show significant improvements. Code: https://github.com/hamzapehlivan/StyleRes
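A conceptual sketch of the residual idea: a small module transforms encoder-predicted residual features conditioned on the edited generator features, so details survive manipulation. The architecture below is a stand-in, not the paper's pipeline.

```python
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Stand-in module: adapts residual features to an edited latent state."""
    def __init__(self, feat_ch):
        super().__init__()
        # Maps (residual features, edited generator features) to a
        # residual aligned with the edit; a plain conv stack stands in
        # for the paper's architecture.
        self.transform = nn.Sequential(
            nn.Conv2d(feat_ch * 2, feat_ch, 3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )

    def forward(self, residual, edited_feats):
        return self.transform(torch.cat([residual, edited_feats], dim=1))
```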
Editing hairstyles is uniquely challenging due to the complexity and delicacy of hair. Although recent approaches have significantly improved hair details, these models often produce undesirable outputs when the pose of the source image differs considerably from that of the target hair image, limiting their real-world applicability. HairFIT, a pose-invariant hairstyle transfer model, alleviates this limitation but still shows unsatisfactory quality in preserving delicate hair textures. To address these limitations, we propose a high-performing pose-invariant hairstyle transfer model equipped with latent optimization and a newly presented local style-matching loss. In the StyleGAN2 latent space, we first explore a pose-aligned latent code of the target hair whose detailed textures are preserved based on local style matching. Our model then inpaints the occlusions of the source considering the aligned target hair and blends both images to produce the final output. Experimental results demonstrate that our model excels at transferring hairstyles under large pose differences and at preserving local hairstyle textures.
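The local style matching could be sketched as Gram statistics restricted to the hair mask, so texture is compared per region rather than globally. This is an illustration under assumptions, not the paper's exact loss.

```python
import torch

def masked_gram(feat, mask):
    # feat: (B, C, H, W); mask: (B, 1, H, W) in [0, 1].
    f = (feat * mask).flatten(2)                     # (B, C, H*W)
    denom = mask.flatten(2).sum(-1).clamp(min=1.0)   # pixels per region
    return f @ f.transpose(1, 2) / denom.unsqueeze(-1)

def local_style_loss(feat_a, feat_b, mask):
    # Compare Gram statistics only inside the masked (hair) region.
    return (masked_gram(feat_a, mask) - masked_gram(feat_b, mask)).pow(2).mean()
```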
Generative adversarial networks (GANs) have recently found applications in image editing. However, most GAN-based image editing methods require large-scale datasets with semantic segmentation annotations for training, only provide high-level control, or merely interpolate between different images. Here, we propose EditGAN, a novel method for high-quality, high-precision semantic image editing that allows users to edit images by modifying their highly detailed part segmentation masks, e.g., drawing a new mask for the headlight of a car. EditGAN builds on a GAN framework that jointly models images and their semantic segmentations, requiring only a handful of labeled examples, which makes it a scalable tool for editing. Specifically, we embed an image into the GAN's latent space and perform conditional latent code optimization according to the segmentation edit, which effectively modifies the image. To amortize the optimization, we find editing vectors in latent space that realize the edits. The framework allows us to learn an arbitrary number of editing vectors, which can then be directly applied to other images at interactive rates. We experimentally show that EditGAN can manipulate images with an unprecedented level of detail and freedom while preserving full image quality. We can also easily combine multiple edits and perform plausible edits beyond EditGAN's training data. We demonstrate EditGAN on a wide variety of image types and quantitatively outperform several previous editing methods on standard editing benchmark tasks.
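A sketch of the editing-vector idea under assumed interfaces: `G_seg` denotes the joint model's segmentation head (the name is an assumption), and the optimized latent offset `delta` can be reused on other codes at interactive rates.

```python
import torch
import torch.nn.functional as F

def learn_edit_vector(G_seg, w, target_mask, steps=100, lr=0.05):
    # w: latent code of the embedded image; target_mask: (B, H, W) long
    # tensor holding the user-edited part labels.
    delta = torch.zeros_like(w, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        seg_logits = G_seg(w + delta)          # (B, K, H, W) part logits
        loss = F.cross_entropy(seg_logits, target_mask)
        # The paper additionally constrains appearance outside the edit.
        loss.backward()
        opt.step()
    return delta.detach()  # reusable on other images: generate(w_new + delta)
```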
Our method performs local semantic editing on GAN output images, transferring the appearance of a specific object part from a reference image to a target image.
Recent advances in generative adversarial networks (GANs) have led to remarkable results in facial image synthesis. While methods using style-based GANs can generate sharp, photorealistic facial images, it is often difficult to control the characteristics of the generated faces in a meaningful and disentangled way. Prior approaches aim to achieve such semantic control and disentanglement within the latent space of a previously trained GAN. In contrast, we propose a framework that explicitly models physical attributes of the face, such as 3D shape, albedo, pose, and lighting, thereby providing disentanglement by design. Our method, MOST-GAN, integrates the expressive power and photorealism of style-based GANs with the physical disentanglement and flexibility of nonlinear 3D morphable models, which we combine with a state-of-the-art 2D hair manipulation network. MOST-GAN achieves photorealistic manipulation of portrait images with fully disentangled 3D control over their physical attributes, enabling extreme manipulation of lighting, facial expression, and pose variations, up to full profile views.
Recent studies have shown that StyleGANs provide promising prior models for downstream tasks of image synthesis and editing. However, since the latent codes of StyleGANs are designed to control global styles, it is difficult to achieve fine-grained control over synthesized images. We present SemanticStyleGAN, in which a generator is trained to model local semantic parts separately and synthesizes images in a compositional way. The structure and texture of different local parts are controlled by corresponding latent codes. Experimental results demonstrate that our model provides strong disentanglement between different spatial regions. When combined with editing methods designed for StyleGANs, it can achieve more fine-grained control for editing synthesized or real images. The model can also be extended to other domains via transfer learning. Thus, as a generic prior model with built-in disentanglement, it can facilitate the development of GAN-based applications and enable more potential downstream tasks.
Existing GAN inversion and editing methods work well for aligned objects with clean backgrounds, such as portraits and animal faces, but often struggle for more difficult categories with complex scene layouts and object occlusions, such as cars, animals, and outdoor images. We propose a new method to invert and edit such complex images in the latent space of GANs, such as StyleGAN2. Our key idea is to explore inversion with a collection of layers, adapting the inversion process to the difficulty of the image. We learn to predict the "invertibility" of different image segments and project each segment into a latent layer. Easier regions can be inverted into an earlier layer of the generator's latent space, while more challenging regions can be inverted into a later feature space. Experiments show that our method obtains better inversion results on complex categories compared to state-of-the-art methods, while maintaining downstream editability. Please refer to our project page at https://www.cs.cmu.edu/~saminversion.
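The per-segment layer assignment might be sketched as follows, with an assumed per-segment invertibility score and illustrative thresholds; the paper learns this prediction rather than hand-tuning it.

```python
def assign_layers(segment_scores, thresholds=(0.8, 0.5)):
    """Map each segment to an inversion depth based on predicted invertibility.

    segment_scores: dict of segment id -> score in [0, 1] (assumed to come
    from a learned predictor). Thresholds are illustrative only.
    """
    assignment = {}
    for seg_id, score in segment_scores.items():
        if score >= thresholds[0]:
            assignment[seg_id] = 'early_latent'   # e.g. W+ layers, most editable
        elif score >= thresholds[1]:
            assignment[seg_id] = 'mid_feature'
        else:
            assignment[seg_id] = 'late_feature'   # hardest regions
    return assignment
```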
Figure 1. The proposed pixel2style2pixel framework can be used to solve a wide variety of image-to-image translation tasks. Here we show results of pSp on StyleGAN inversion, multi-modal conditional image synthesis, facial frontalization, inpainting and super-resolution.
The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradations and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides, for free, an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
Generative adversarial networks (GANs) have attracted enormous attention due to their simple yet effective training mechanism and superior image generation quality. With the ability to generate photorealistic high-resolution (e.g., 1024×1024) images, recent GAN models have greatly narrowed the gap between generated images and real ones. Therefore, many recent works show emerging interest in taking advantage of pre-trained GAN models by exploiting their well-behaved latent spaces and learned GAN priors. In this paper, we briefly review recent progress on leveraging pre-trained large-scale GAN models from three aspects: 1) the training of large-scale generative adversarial networks, 2) exploring and understanding pre-trained GAN models, and 3) leveraging these models for subsequent tasks such as image restoration and editing. For more information about relevant methods and repositories, please visit https://github.com/csmliu/pretretaining-gans.
We present a new method for one-shot domain adaptation. The input to our method is a trained GAN that can produce images in domain A, together with a single reference image I_B from domain B. The proposed algorithm can translate any output of the trained GAN from domain A to domain B. Our method has two main advantages over the current state of the art: first, our solution achieves higher visual quality, e.g., by noticeably reducing overfitting; second, our solution allows more degrees of freedom to control the domain gap, i.e., which aspects of image I_B are used to define domain B. Technically, we realize the new method by building on a pre-trained StyleGAN generator as the GAN and a pre-trained CLIP model for representing the domain gap. We propose several new regularizers for controlling the domain gap, optimizing the weights of the pre-trained StyleGAN generator so that it outputs images in domain B instead of domain A. The regularizers prevent the optimization from taking on too many attributes of the single reference image. Our results show significant visual improvements over the state of the art as well as multiple applications that highlight the improved control.
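One plausible form of a CLIP-based domain-gap objective in this spirit is a directional loss, sketched below. `clip_encode` is an assumed wrapper around a pretrained CLIP image encoder, and the paper's actual regularizers are richer than this single term.

```python
import torch
import torch.nn.functional as F

def domain_gap_loss(clip_encode, fake_b, ref_b, src_a):
    # Embed the adapted output, the reference I_B, and the source-domain
    # image, then align the fake-vs-source direction with the
    # reference-vs-source direction so only the domain gap is transferred.
    e_fake, e_ref, e_src = (F.normalize(clip_encode(x), dim=-1)
                            for x in (fake_b, ref_b, src_a))
    dir_fake = F.normalize(e_fake - e_src, dim=-1)
    dir_ref = F.normalize(e_ref - e_src, dim=-1)
    return 1.0 - (dir_fake * dir_ref).sum(-1).mean()
```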
Real-world image manipulation has achieved remarkable progress in recent years thanks to the exploration and exploitation of GAN latent spaces. GAN inversion is the first step in this pipeline, and it aims to map the real image to a latent code faithfully. Unfortunately, most existing GAN inversion methods fail to meet at least one of the following three requirements: high reconstruction quality, editability, and fast inference. In this work, we present a novel two-phase strategy that fits all requirements simultaneously. In the first phase, we train an encoder to map the input image to the StyleGAN2 $\mathcal{W}$-space, which was proven to have excellent editability but lower reconstruction quality. In the second phase, we supplement the reconstruction ability of the initial phase by leveraging a series of hypernetworks to recover the missing information during inversion. These two phases complement each other, yielding high reconstruction quality thanks to the hypernetwork branch and excellent editability due to the inversion done in the $\mathcal{W}$-space. Our method is entirely encoder-based, resulting in extremely fast inference. Extensive experiments on two challenging datasets demonstrate the superiority of our method.
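The two-phase inference path could be sketched as below; the module names and the `weight_offsets` interface on the generator are assumptions for illustration, not the paper's actual API.

```python
import torch

def two_stage_invert(encoder, hypernet, generator, x):
    # Stage 1: an editable but lossy W-space code from a feed-forward encoder.
    w = encoder(x)
    coarse = generator(w)
    # Stage 2: hypernetworks map the residual information between the
    # input and the coarse reconstruction to per-layer weight offsets
    # applied to the generator for this image only.
    weight_offsets = hypernet(x, coarse)
    return generator(w, weight_offsets=weight_offsets)
```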