由于简单但有效的训练机制和出色的图像产生质量,生成的对抗网络(GAN)引起了极大的关注。具有生成照片现实的高分辨率(例如$ 1024 \ times1024 $)的能力,最近的GAN模型已大大缩小了生成的图像与真实图像之间的差距。因此,许多最近的作品表明,通过利用良好的潜在空间和博学的gan先验来利用预先训练的GAN模型的新兴兴趣。在本文中,我们简要回顾了从三个方面利用预先培训的大规模GAN模型的最新进展,即1)大规模生成对抗网络的培训,2)探索和理解预训练的GAN模型,以及预先培训的GAN模型,以及3)利用这些模型进行后续任务,例如图像恢复和编辑。有关相关方法和存储库的更多信息,请访问https://github.com/csmliu/pretretaining-gans。
translated by 谷歌翻译
Our goal with this survey is to provide an overview of the state of the art deep learning technologies for face generation and editing. We will cover popular latest architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantics editing and preserving photo quality. We aim to provide an entry point into the field for readers that have basic knowledge about the field of deep learning and are looking for an accessible introduction and overview.
translated by 谷歌翻译
随着信息中的各种方式存在于现实世界中的各种方式,多式联信息之间的有效互动和融合在计算机视觉和深度学习研究中的多模式数据的创造和感知中起着关键作用。通过卓越的功率,在多式联运信息中建模互动,多式联运图像合成和编辑近年来已成为一个热门研究主题。与传统的视觉指导不同,提供明确的线索,多式联路指南在图像合成和编辑方面提供直观和灵活的手段。另一方面,该领域也面临着具有固有的模态差距的特征的几个挑战,高分辨率图像的合成,忠实的评估度量等。在本调查中,我们全面地阐述了最近多式联运图像综合的进展根据数据模型和模型架构编辑和制定分类。我们从图像合成和编辑中的不同类型的引导方式开始介绍。然后,我们描述了多模式图像综合和编辑方法,其具有详细的框架,包括生成的对抗网络(GAN),GaN反转,变压器和其他方法,例如NERF和扩散模型。其次是在多模式图像合成和编辑中广泛采用的基准数据集和相应的评估度量的综合描述,以及分析各个优点和限制的不同合成方法的详细比较。最后,我们为目前的研究挑战和未来的研究方向提供了深入了解。与本调查相关的项目可在HTTPS://github.com/fnzhan/mise上获得
translated by 谷歌翻译
Recent 3D-aware GANs rely on volumetric rendering techniques to disentangle the pose and appearance of objects, de facto generating entire 3D volumes rather than single-view 2D images from a latent code. Complex image editing tasks can be performed in standard 2D-based GANs (e.g., StyleGAN models) as manipulation of latent dimensions. However, to the best of our knowledge, similar properties have only been partially explored for 3D-aware GAN models. This work aims to fill this gap by showing the limitations of existing methods and proposing LatentSwap3D, a model-agnostic approach designed to enable attribute editing in the latent space of pre-trained 3D-aware GANs. We first identify the most relevant dimensions in the latent space of the model controlling the targeted attribute by relying on the feature importance ranking of a random forest classifier. Then, to apply the transformation, we swap the top-K most relevant latent dimensions of the image being edited with an image exhibiting the desired attribute. Despite its simplicity, LatentSwap3D provides remarkable semantic edits in a disentangled manner and outperforms alternative approaches both qualitatively and quantitatively. We demonstrate our semantic edit approach on various 3D-aware generative models such as pi-GAN, GIRAFFE, StyleSDF, MVCGAN, EG3D and VolumeGAN, and on diverse datasets, such as FFHQ, AFHQ, Cats, MetFaces, and CompCars. The project page can be found: \url{https://enisimsar.github.io/latentswap3d/}.
translated by 谷歌翻译
Face Restoration (FR) aims to restore High-Quality (HQ) faces from Low-Quality (LQ) input images, which is a domain-specific image restoration problem in the low-level computer vision area. The early face restoration methods mainly use statistic priors and degradation models, which are difficult to meet the requirements of real-world applications in practice. In recent years, face restoration has witnessed great progress after stepping into the deep learning era. However, there are few works to study deep learning-based face restoration methods systematically. Thus, this paper comprehensively surveys recent advances in deep learning techniques for face restoration. Specifically, we first summarize different problem formulations and analyze the characteristic of the face image. Second, we discuss the challenges of face restoration. Concerning these challenges, we present a comprehensive review of existing FR methods, including prior based methods and deep learning-based methods. Then, we explore developed techniques in the task of FR covering network architectures, loss functions, and benchmark datasets. We also conduct a systematic benchmark evaluation on representative methods. Finally, we discuss future directions, including network designs, metrics, benchmark datasets, applications,etc. We also provide an open-source repository for all the discussed methods, which is available at https://github.com/TaoWangzj/Awesome-Face-Restoration.
translated by 谷歌翻译
The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradation and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides for free an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
translated by 谷歌翻译
edu.hk (a) Image Reconstruction (b) Image Colorization (c) Image Super-Resolution (d) Image Denoising (e) Image Inpainting (f) Semantic Manipulation Figure 1: Multi-code GAN prior facilitates many image processing applications using the reconstruction from fixed PGGAN [23] models.
translated by 谷歌翻译
与传统的头像创建管道相反,这是一个昂贵的过程,现代生成方法直接从照片中学习数据分布,而艺术的状态现在可以产生高度的照片现实图像。尽管大量作品试图扩展无条件的生成模型并达到一定程度的可控性,但要确保多视图一致性,尤其是在大型姿势中,仍然具有挑战性。在这项工作中,我们提出了一个3D肖像生成网络,该网络可产生3D一致的肖像,同时根据有关姿势,身份,表达和照明的语义参数可控。生成网络使用神经场景表示在3D中建模肖像,其生成以支持明确控制的参数面模型为指导。尽管可以通过将图像与部分不同的属性进行对比,但可以进一步增强潜在的分离,但在非面积区域(例如,在动画表达式)时,仍然存在明显的不一致。我们通过提出一种体积混合策略来解决此问题,在该策略中,我们通过将动态和静态辐射场融合在一起,形成一个复合输出,并从共同学习的语义场中分割了两个部分。我们的方法在广泛的实验中优于先前的艺术,在自由视点中观看时,在自然照明中产生了逼真的肖像。所提出的方法还证明了真实图像以及室外卡通面孔的概括能力,在实际应用中显示出巨大的希望。其他视频结果和代码将在项目网页上提供。
translated by 谷歌翻译
由于其语义上的理解和用户友好的可控性,通过三维引导,通过三维引导的面部图像操纵已广泛应用于各种交互式场景。然而,现有的基于3D形式模型的操作方法不可直接适用于域名面,例如非黑色素化绘画,卡通肖像,甚至是动物,主要是由于构建每个模型的强大困难具体面部域。为了克服这一挑战,据我们所知,我们建议使用人为3DMM操纵任意域名的第一种方法。这是通过两个主要步骤实现的:1)从3DMM参数解开映射到潜在的STYLEGO2的潜在空间嵌入,可确保每个语义属性的解除响应和精确的控制; 2)通过实施一致的潜空间嵌入,桥接域差异并使人类3DMM适用于域外面的人类3DMM。实验和比较展示了我们高质量的语义操作方法在各种面部域中的优越性,所有主要3D面部属性可控姿势,表达,形状,反照镜和照明。此外,我们开发了直观的编辑界面,以支持用户友好的控制和即时反馈。我们的项目页面是https://cassiepython.github.io/cddfm3d/index.html
translated by 谷歌翻译
Although Generative Adversarial Networks (GANs) have made significant progress in face synthesis, there lacks enough understanding of what GANs have learned in the latent representation to map a random code to a photo-realistic image. In this work, we propose a framework called InterFaceGAN to interpret the disentangled face representation learned by the state-of-the-art GAN models and study the properties of the facial semantics encoded in the latent space. We first find that GANs learn various semantics in some linear subspaces of the latent space. After identifying these subspaces, we can realistically manipulate the corresponding facial attributes without retraining the model. We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection, resulting in more precise control of the attribute manipulation. Besides manipulating the gender, age, expression, and presence of eyeglasses, we can even alter the face pose and fix the artifacts accidentally made by GANs. Furthermore, we perform an in-depth face identity analysis and a layer-wise analysis to evaluate the editing results quantitatively. Finally, we apply our approach to real face editing by employing GAN inversion approaches and explicitly training feed-forward models based on the synthetic data established by InterFaceGAN. Extensive experimental results suggest that learning to synthesize faces spontaneously brings a disentangled and controllable face representation.
translated by 谷歌翻译
本文的目标是对面部素描合成(FSS)问题进行全面的研究。然而,由于获得了手绘草图数据集的高成本,因此缺乏完整的基准,用于评估过去十年的FSS算法的开发。因此,我们首先向FSS引入高质量的数据集,名为FS2K,其中包括2,104个图像素描对,跨越三种类型的草图样式,图像背景,照明条件,肤色和面部属性。 FS2K与以前的FSS数据集不同于难度,多样性和可扩展性,因此应促进FSS研究的进展。其次,我们通过调查139种古典方法,包括34个手工特征的面部素描合成方法,37个一般的神经式传输方法,43个深映像到图像翻译方法,以及35个图像 - 素描方法。此外,我们详细说明了现有的19个尖端模型的综合实验。第三,我们为FSS提供了一个简单的基准,名为FSGAN。只有两个直截了当的组件,即面部感知屏蔽和风格矢量扩展,FSGAN将超越所提出的FS2K数据集的所有先前最先进模型的性能,通过大边距。最后,我们在过去几年中汲取的经验教训,并指出了几个未解决的挑战。我们的开源代码可在https://github.com/dengpingfan/fsgan中获得。
translated by 谷歌翻译
Figure 1. The proposed pixel2style2pixel framework can be used to solve a wide variety of image-to-image translation tasks. Here we show results of pSp on StyleGAN inversion, multi-modal conditional image synthesis, facial frontalization, inpainting and super-resolution.
translated by 谷歌翻译
多年来,2d Gans在影像肖像的一代中取得了巨大的成功。但是,他们在生成过程中缺乏3D理解,因此他们遇到了多视图不一致问题。为了减轻这个问题,已经提出了许多3D感知的甘斯,并显示出显着的结果,但是3D GAN在编辑语义属性方面努力。 3D GAN的可控性和解释性并未得到太多探索。在这项工作中,我们提出了两种解决方案,以克服2D GAN和3D感知gan的这些弱点。我们首先介绍了一种新颖的3D感知gan,Surf-Gan,它能够在训练过程中发现语义属性,并以无监督的方式控制它们。之后,我们将先验的Surf-GAN注入stylegan,以获得高保真3D控制的发电机。与允许隐姿姿势控制的现有基于潜在的方法不同,所提出的3D控制样式gan可实现明确的姿势控制对肖像生成的控制。这种蒸馏允许3D控制与许多基于样式的技术(例如,反转和风格化)之间的直接兼容性,并且在计算资源方面也带来了优势。我们的代码可从https://github.com/jgkwak95/surf-gan获得。
translated by 谷歌翻译
生成照片 - 现实图像,语义编辑和表示学习是高分辨率生成模型的许多潜在应用中的一些。最近在GAN的进展将它们建立为这些任务的绝佳选择。但是,由于它们不提供推理模型,因此使用GaN潜在空间无法在实际图像上完成诸如分类的图像编辑或下游任务。尽管培训了训练推理模型或设计了一种迭代方法来颠覆训练有素的发生器,但之前的方法是数据集(例如人类脸部图像)和架构(例如样式)。这些方法是非延伸到新型数据集或架构的。我们提出了一般框架,该框架是不可知的架构和数据集。我们的主要识别是,通过培训推断和生成模型在一起,我们允许它们彼此适应并收敛到更好的质量模型。我们的\ textbf {invang},可逆GaN的简短,成功将真实图像嵌入到高质量的生成模型的潜在空间。这使我们能够执行图像修复,合并,插值和在线数据增强。我们展示了广泛的定性和定量实验。
translated by 谷歌翻译
In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions. The proposed method consists of three components: StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization. The inversion module maps real images to the latent space of a well-trained StyleGAN. The visual-linguistic similarity learns the text-image matching by mapping the image and text into a common embedding space. The instancelevel optimization is for identity preservation in manipulation. Our model can produce diverse and high-quality images with an unprecedented resolution at 1024 2 . Using a control mechanism based on style-mixing, our Tedi-GAN inherently supports image synthesis with multi-modal inputs, such as sketches or semantic labels, with or without instance guidance. To facilitate text-guided multimodal synthesis, we propose the Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images and corresponding semantic segmentation map, sketch, and textual descriptions. Extensive experiments on the introduced dataset demonstrate the superior performance of our proposed method. Code and data are available at https://github.com/weihaox/TediGAN.
translated by 谷歌翻译
In this work, we are dedicated to text-guided image generation and propose a novel framework, i.e., CLIP2GAN, by leveraging CLIP model and StyleGAN. The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network. In the training stage, we encode an image with CLIP and map the output feature to a latent code, which is further used to reconstruct the image. In this way, the mapping network is optimized in a self-supervised learning way. In the inference stage, since CLIP can embed both image and text into a shared feature embedding space, we replace CLIP image encoder in the training architecture with CLIP text encoder, while keeping the following mapping network as well as StyleGAN model. As a result, we can flexibly input a text description to generate an image. Moreover, by simply adding mapped text features of an attribute to a mapped CLIP image feature, we can effectively edit the attribute to the image. Extensive experiments demonstrate the superior performance of our proposed CLIP2GAN compared to previous methods.
translated by 谷歌翻译
GAN的进展使高分辨率的感性质量形象产生了产生。 stylegans允许通过数学操作对W/W+空间中的潜在样式向量进行数学操作进行引人入胜的属性修改,从而有效调节生成器的丰富层次结构表示。最近,此类操作已被推广到原始StyleGan纸中的属性交换之外,以包括插值。尽管StyleGans有许多重大改进,但仍被认为会产生不自然的图像。生成的图像的质量基于两个假设。 (a)生成器学到的层次表示的丰富性,以及(b)样式空间的线性和平滑度。在这项工作中,我们提出了一个层次的语义正常化程序(HSR),该层次正常化程序将生成器学到的层次表示与大量数据学到的相应的强大功能保持一致。 HSR不仅可以改善发电机的表示,还可以改善潜在风格空间的线性和平滑度,从而导致产生更自然的样式编辑的图像。为了证明线性改善,我们提出了一种新型的度量 - 属性线性评分(ALS)。通过改善感知路径长度(PPL)度量的改善,在不同的标准数据集中平均16.19%的不自然图像的生成显着降低,同时改善了属性编辑任务中属性变化的线性变化。
translated by 谷歌翻译
生成的对抗网络(GANS)最近引入了执行图像到图像翻译的有效方法。这些模型可以应用于图像到图像到图像转换中的各种域而不改变任何参数。在本文中,我们调查并分析了八个图像到图像生成的对策网络:PIX2PX,Cyclegan,Cogan,Stargan,Munit,Stargan2,Da-Gan,以及自我关注GaN。这些模型中的每一个都呈现了最先进的结果,并引入了构建图像到图像的新技术。除了对模型的调查外,我们还调查了他们接受培训的18个数据集,并在其上进行了评估的9个指标。最后,我们在常见的一组指标和数据集中呈现6种这些模型的受控实验的结果。结果混合并显示,在某些数据集,任务和指标上,某些型号优于其他型号。本文的最后一部分讨论了这些结果并建立了未来研究领域。由于研究人员继续创新新的图像到图像GAN,因此他们非常重要地了解现有方法,数据集和指标。本文提供了全面的概述和讨论,以帮助构建此基础。
translated by 谷歌翻译
基于生成神经辐射场(GNERF)基于生成神经辐射场(GNERF)的3D感知gan已达到令人印象深刻的高质量图像产生,同时保持了强3D一致性。最显着的成就是在面部生成领域中取得的。但是,这些模型中的大多数都集中在提高视图一致性上,但忽略了分离的方面,因此这些模型无法提供高质量的语义/属性控制对生成。为此,我们引入了一个有条件的GNERF模型,该模型使用特定属性标签作为输入,以提高3D感知生成模型的控制能力和解散能力。我们利用预先训练的3D感知模型作为基础,并集成了双分支属性编辑模块(DAEM),该模块(DAEM)利用属性标签来提供对生成的控制。此外,我们提出了一个Triot(作为INIT的训练,并针对调整进行优化),以优化潜在矢量以进一步提高属性编辑的精度。广泛使用的FFHQ上的广泛实验表明,我们的模型在保留非目标区域的同时产生具有更好视图一致性的高质量编辑。该代码可在https://github.com/zhangqianhui/tt-gnerf上找到。
translated by 谷歌翻译