GAN的进展使高分辨率的感性质量形象产生了产生。 stylegans允许通过数学操作对W/W+空间中的潜在样式向量进行数学操作进行引人入胜的属性修改,从而有效调节生成器的丰富层次结构表示。最近,此类操作已被推广到原始StyleGan纸中的属性交换之外,以包括插值。尽管StyleGans有许多重大改进,但仍被认为会产生不自然的图像。生成的图像的质量基于两个假设。 (a)生成器学到的层次表示的丰富性,以及(b)样式空间的线性和平滑度。在这项工作中,我们提出了一个层次的语义正常化程序(HSR),该层次正常化程序将生成器学到的层次表示与大量数据学到的相应的强大功能保持一致。 HSR不仅可以改善发电机的表示,还可以改善潜在风格空间的线性和平滑度,从而导致产生更自然的样式编辑的图像。为了证明线性改善,我们提出了一种新型的度量 - 属性线性评分(ALS)。通过改善感知路径长度(PPL)度量的改善,在不同的标准数据集中平均16.19%的不自然图像的生成显着降低,同时改善了属性编辑任务中属性变化的线性变化。
translated by 谷歌翻译
Recent 3D-aware GANs rely on volumetric rendering techniques to disentangle the pose and appearance of objects, de facto generating entire 3D volumes rather than single-view 2D images from a latent code. Complex image editing tasks can be performed in standard 2D-based GANs (e.g., StyleGAN models) as manipulation of latent dimensions. However, to the best of our knowledge, similar properties have only been partially explored for 3D-aware GAN models. This work aims to fill this gap by showing the limitations of existing methods and proposing LatentSwap3D, a model-agnostic approach designed to enable attribute editing in the latent space of pre-trained 3D-aware GANs. We first identify the most relevant dimensions in the latent space of the model controlling the targeted attribute by relying on the feature importance ranking of a random forest classifier. Then, to apply the transformation, we swap the top-K most relevant latent dimensions of the image being edited with an image exhibiting the desired attribute. Despite its simplicity, LatentSwap3D provides remarkable semantic edits in a disentangled manner and outperforms alternative approaches both qualitatively and quantitatively. We demonstrate our semantic edit approach on various 3D-aware generative models such as pi-GAN, GIRAFFE, StyleSDF, MVCGAN, EG3D and VolumeGAN, and on diverse datasets, such as FFHQ, AFHQ, Cats, MetFaces, and CompCars. The project page can be found: \url{https://enisimsar.github.io/latentswap3d/}.
translated by 谷歌翻译
现在,使用最近的生成对抗网络(GAN)可以使用高现实主义的不受约束图像产生。但是,用给定的一组属性生成图像非常具有挑战性。最近的方法使用基于样式的GAN模型来执行图像编辑,通过利用发电机层中存在的语义层次结构。我们提出了一些基于潜在的属性操纵和编辑(火焰),这是一个简单而有效的框架,可通过潜在空间操纵执行高度控制的图像编辑。具体而言,我们估计了控制生成图像中语义属性的潜在空间(预训练样式的)中的线性方向。与以前的方法相反,这些方法依赖于大规模属性标记的数据集或属性分类器,而火焰则使用一些策划的图像对的最小监督来估算删除的编辑指示。火焰可以在保留身份的同时,在各种图像集上同时进行高精度和顺序编辑。此外,我们提出了一项新颖的属性样式操纵任务,以生成各种样式的眼镜和头发等属性。我们首先编码相同身份的一组合成图像,但在潜在空间中具有不同的属性样式,以估计属性样式歧管。从该歧管中采样新的潜在将导致生成图像中的新属性样式。我们提出了一种新颖的抽样方法,以从歧管中采样潜在的样品,使我们能够生成各种属性样式,而不是训练集中存在的样式。火焰可以以分离的方式生成多种属性样式。我们通过广泛的定性和定量比较来说明火焰与先前的图像编辑方法相对于先前的图像编辑方法的卓越性能。火焰在多个数据集(例如汽车和教堂)上也很好地概括了。
translated by 谷歌翻译
Our goal with this survey is to provide an overview of the state of the art deep learning technologies for face generation and editing. We will cover popular latest architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantics editing and preserving photo quality. We aim to provide an entry point into the field for readers that have basic knowledge about the field of deep learning and are looking for an accessible introduction and overview.
translated by 谷歌翻译
Recent work has shown that a variety of semantics emerge in the latent space of Generative Adversarial Networks (GANs) when being trained to synthesize images. However, it is difficult to use these learned semantics for real image editing. A common practice of feeding a real image to a trained GAN generator is to invert it back to a latent code. However, existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space. As a result, the reconstructed image cannot well support semantic editing through varying the inverted code. To solve this problem, we propose an in-domain GAN inversion approach, which not only faithfully reconstructs the input image but also ensures the inverted code to be semantically meaningful for editing. We first learn a novel domain-guided encoder to project a given image to the native latent space of GANs. We then propose domain-regularized optimization by involving the encoder as a regularizer to fine-tune the code produced by the encoder and better recover the target image. Extensive experiments suggest that our inversion method achieves satisfying real image reconstruction and more importantly facilitates various image editing tasks, significantly outperforming start-of-the-arts. 1
translated by 谷歌翻译
尽管使用StyleGan进行语义操纵的最新进展,但对真实面孔的语义编辑仍然具有挑战性。 $ W $空间与$ W $+空间之间的差距需要重建质量与编辑质量之间的不良权衡。为了解决这个问题,我们建议通过用基于注意的变压器代替Stylegan映射网络中的完全连接的层来扩展潜在空间。这种简单有效的技术将上述两个空间整合在一起,并将它们转换为一个名为$ W $ ++的新的潜在空间。我们的修改后的Stylegan保持了原始StyleGan的最新一代质量,并具有中等程度的多样性。但更重要的是,提议的$ W $ ++空间在重建质量和编辑质量方面都取得了卓越的性能。尽管有这些显着优势,但我们的$ W $ ++空间支持现有的反转算法和编辑方法,仅由于其与$ w/w $+空间的结构相似性,因此仅可忽略不计的修改。 FFHQ数据集上的广泛实验证明,我们提出的$ W $ ++空间显然比以前的$ w/w $+空间更可取。该代码可在https://github.com/anonsubm2021/transstylegan上公开提供。
translated by 谷歌翻译
In this work, we are dedicated to text-guided image generation and propose a novel framework, i.e., CLIP2GAN, by leveraging CLIP model and StyleGAN. The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network. In the training stage, we encode an image with CLIP and map the output feature to a latent code, which is further used to reconstruct the image. In this way, the mapping network is optimized in a self-supervised learning way. In the inference stage, since CLIP can embed both image and text into a shared feature embedding space, we replace CLIP image encoder in the training architecture with CLIP text encoder, while keeping the following mapping network as well as StyleGAN model. As a result, we can flexibly input a text description to generate an image. Moreover, by simply adding mapped text features of an attribute to a mapped CLIP image feature, we can effectively edit the attribute to the image. Extensive experiments demonstrate the superior performance of our proposed CLIP2GAN compared to previous methods.
translated by 谷歌翻译
Figure 1: Manipulating various facial attributes through varying the latent codes of a well-trained GAN model. The first column shows the original synthesis from PGGAN [21], while each of the other columns shows the results of manipulating a specific attribute.
translated by 谷歌翻译
由于简单但有效的训练机制和出色的图像产生质量,生成的对抗网络(GAN)引起了极大的关注。具有生成照片现实的高分辨率(例如$ 1024 \ times1024 $)的能力,最近的GAN模型已大大缩小了生成的图像与真实图像之间的差距。因此,许多最近的作品表明,通过利用良好的潜在空间和博学的gan先验来利用预先训练的GAN模型的新兴兴趣。在本文中,我们简要回顾了从三个方面利用预先培训的大规模GAN模型的最新进展,即1)大规模生成对抗网络的培训,2)探索和理解预训练的GAN模型,以及预先培训的GAN模型,以及3)利用这些模型进行后续任务,例如图像恢复和编辑。有关相关方法和存储库的更多信息,请访问https://github.com/csmliu/pretretaining-gans。
translated by 谷歌翻译
We present a novel image inversion framework and a training pipeline to achieve high-fidelity image inversion with high-quality attribute editing. Inverting real images into StyleGAN's latent space is an extensively studied problem, yet the trade-off between the image reconstruction fidelity and image editing quality remains an open challenge. The low-rate latent spaces are limited in their expressiveness power for high-fidelity reconstruction. On the other hand, high-rate latent spaces result in degradation in editing quality. In this work, to achieve high-fidelity inversion, we learn residual features in higher latent codes that lower latent codes were not able to encode. This enables preserving image details in reconstruction. To achieve high-quality editing, we learn how to transform the residual features for adapting to manipulations in latent codes. We train the framework to extract residual features and transform them via a novel architecture pipeline and cycle consistency losses. We run extensive experiments and compare our method with state-of-the-art inversion methods. Qualitative metrics and visual comparisons show significant improvements. Code: https://github.com/hamzapehlivan/StyleRes
translated by 谷歌翻译
最近在图像编辑中找到了生成的对抗网络(GANS)。但是,大多数基于GaN的图像编辑方法通常需要具有用于训练的语义分段注释的大规模数据集,只提供高级控制,或者仅在不同图像之间插入。在这里,我们提出了EditGan,一种用于高质量,高精度语义图像编辑的新方法,允许用户通过修改高度详细的部分分割面罩,例如,为汽车前灯绘制新掩模来编辑图像。编辑登上的GAN框架上建立联合模型图像及其语义分割,只需要少数标记的示例,使其成为编辑的可扩展工具。具体地,我们将图像嵌入GaN潜在空间中,并根据分割编辑执行条件潜代码优化,这有效地修改了图像。算优化优化,我们发现在实现编辑的潜在空间中找到编辑向量。该框架允许我们学习任意数量的编辑向量,然后可以直接应用于交互式速率的其他图像。我们通过实验表明,EditGan可以用前所未有的细节和自由来操纵图像,同时保留完整的图像质量。我们还可以轻松地组合多个编辑并执行超出EditGan训练数据的合理编辑。我们在各种图像类型上展示编辑,并定量优于标准编辑基准任务的几种先前编辑方法。
translated by 谷歌翻译
Although Generative Adversarial Networks (GANs) have made significant progress in face synthesis, there lacks enough understanding of what GANs have learned in the latent representation to map a random code to a photo-realistic image. In this work, we propose a framework called InterFaceGAN to interpret the disentangled face representation learned by the state-of-the-art GAN models and study the properties of the facial semantics encoded in the latent space. We first find that GANs learn various semantics in some linear subspaces of the latent space. After identifying these subspaces, we can realistically manipulate the corresponding facial attributes without retraining the model. We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection, resulting in more precise control of the attribute manipulation. Besides manipulating the gender, age, expression, and presence of eyeglasses, we can even alter the face pose and fix the artifacts accidentally made by GANs. Furthermore, we perform an in-depth face identity analysis and a layer-wise analysis to evaluate the editing results quantitatively. Finally, we apply our approach to real face editing by employing GAN inversion approaches and explicitly training feed-forward models based on the synthetic data established by InterFaceGAN. Extensive experimental results suggest that learning to synthesize faces spontaneously brings a disentangled and controllable face representation.
translated by 谷歌翻译
An open secret in contemporary machine learning is that many models work beautifully on standard benchmarks but fail to generalize outside the lab. This has been attributed to biased training data, which provide poor coverage over real world events. Generative models are no exception, but recent advances in generative adversarial networks (GANs) suggest otherwise -these models can now synthesize strikingly realistic and diverse images. Is generative modeling of photos a solved problem? We show that although current GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold. In particular, we study their ability to fit simple transformations such as camera movements and color changes. We find that the models reflect the biases of the datasets on which they are trained (e.g., centered objects), but that they also exhibit some capacity for generalization: by "steering" in latent space, we can shift the distribution while still creating realistic images. We hypothesize that the degree of distributional shift is related to the breadth of the training data distribution. Thus, we conduct experiments to quantify the limits of GAN transformations and introduce techniques to mitigate the problem. Code is released on our project page: https://ali-design.github.io/gan_steerability/.
translated by 谷歌翻译
The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradation and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides for free an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
translated by 谷歌翻译
事实证明,通过倒转和操纵与输入真实图像相对应的潜在代码,生成的对抗网络(GAN)对于图像编辑非常有效。这种编辑属性来自潜在空间的分离性质。在本文中,我们确定面部属性分离不是最佳的,因此依靠线性属性分离的面部编辑是有缺陷的。因此,我们建议通过监督改善语义分解。我们的方法包括使用归一化流量学习代理潜在表示,我们证明这会为面部图像编辑提供更有效的空间。
translated by 谷歌翻译
大规模训练的出现产生了强大的视觉识别模型的聚宝盆。然而,传统上以无人监督的方式从划痕训练的生成模型。可以利用来自一大堆预用的视觉模型的集体“知识”来改善GaN培训吗?如果是这样,有这么多的模型可供选择,应该选择哪一个,并且以什么方式最有效?我们发现预磨削的计算机视觉模型可以在鉴别器的集合中使用时显着提高性能。值得注意的是,所选模型的特定子集极大地影响性能。我们提出了一种有效的选择机制,通过探测预训练模型嵌入的实际和假样本之间的线性可分性,选择最准确的模型,并逐步将其添加到鉴别器集合中。有趣的是,我们的方法可以在有限的数据和大规模设置中提高GaN培训。只有10K培训样本,我们的LSUN猫的FID与1.6M图像培训的风格挂牌匹配。在完整的数据集上,我们的方法将FID提高了1.5倍的LSUN猫,教堂和马类的2倍。
translated by 谷歌翻译
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
translated by 谷歌翻译
我们通过将此任务视为视觉令牌生成问题来提出新的视角来实现图像综合。与现有的范例不同,即直接从单个输入(例如,潜像)直接合成完整图像,新配方使得能够为不同的图像区域进行灵活的本地操作,这使得可以学习内容感知和细粒度的样式控制用于图像合成。具体地,它需要输入潜像令牌的序列,以预测用于合成图像的视觉令牌。在这种观点来看,我们提出了一个基于令牌的发电机(即Tokengan)。特别是,Tokengan输入了两个语义不同的视觉令牌,即,来自潜在空间的学习常量内容令牌和风格代币。鉴于一系列风格令牌,Tokengan能够通过用变压器将样式分配给内容令牌来控制图像合成。我们进行了广泛的实验,并表明拟议的Tokengan在几个广泛使用的图像综合基准上实现了最先进的结果,包括FFHQ和LSUN教会,具有不同的决议。特别地,发电机能够用1024x1024尺寸合成高保真图像,完全用卷曲分配。
translated by 谷歌翻译
最近,由于高质量的发电和解除戒开的潜在空间,Stylegan已经启用了各种图像操纵和编辑任务。但是,通常需要额外的架构或特定于特定的培训范式来实现不同的任务。在这项工作中,我们深入了解样式甘蓝的空间属性。我们展示使用普雷雷达的样式总是以及一些操作,没有任何额外的架构,我们可以相当于各种任务的最先进的方法执行,包括图像混合,全景生成,从单个图像,可控的生成本地多模式图像到图像转换和属性传输。所提出的方法简单,有效,有效,适用于任何现有的预制样式模型。
translated by 谷歌翻译