Colorization has attracted increasing interest in recent years. Classic reference-based methods usually rely on external color images for plausible results. Retrieving such exemplars inevitably requires a large image database or online search engine. Recent deep-learning-based methods can colorize images automatically at a low cost, but unsatisfactory artifacts and incoherent colors are always accompanied. In this work, we propose GCP-Colorization, which leverages the rich and diverse color priors encapsulated in a pretrained generative adversarial network (GAN) for automatic colorization. Specifically, we first "retrieve" matched features (similar to exemplars) via a GAN encoder and then incorporate these features into the colorization process with feature modulations. Thanks to the powerful generative color prior (GCP) and delicate designs, our GCP-Colorization can produce vivid colors with a single forward pass. Moreover, it is highly convenient to obtain diverse results by modifying the GAN latent codes. GCP-Colorization also inherits the merit of interpretable controls of GANs and can attain controllable and smooth transitions by walking through the GAN latent space. Extensive experiments and user studies demonstrate that GCP-Colorization achieves superior performance over previous works. Code is available at https://github.com/tothebeginning/gcp-colorization.
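The feature modulation step can be pictured as a spatial feature transform: features retrieved from the GAN prior predict a per-pixel scale and shift applied to the colorization branch. The sketch below is a minimal NumPy illustration of that idea under stated assumptions, not the paper's actual layer; the function names and the 1×1-convolution-like weight matrices are hypothetical.

```python
import numpy as np

def feature_modulation(content_feat, prior_feat, w_scale, w_shift):
    """Modulate content features with a per-pixel scale and shift predicted
    from GAN-prior features (an SFT-style operation; GCP-Colorization's
    real layers may differ). Arrays are (C, H, W); w_* act like 1x1 convs."""
    C, H, W = prior_feat.shape
    flat = prior_feat.reshape(C, -1)           # (C, H*W)
    scale = (w_scale @ flat).reshape(C, H, W)  # predicted per-pixel scale
    shift = (w_shift @ flat).reshape(C, H, W)  # predicted per-pixel shift
    return content_feat * (1.0 + scale) + shift

rng = np.random.default_rng(0)
content = rng.standard_normal((8, 4, 4))
prior = rng.standard_normal((8, 4, 4))
out = feature_modulation(content, prior, np.zeros((8, 8)), np.zeros((8, 8)))
# with zero modulation weights the operation reduces to the identity
print(np.allclose(out, content))
```

With trained (non-zero) weights, the modulation injects the color statistics carried by the prior features into the colorization branch while preserving the input's spatial structure.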
Automatic image colorization is a particularly challenging problem. Due to the ill-posed nature of the problem and multi-modal uncertainty, directly training a deep neural network usually leads to incorrect semantic colors and low color richness. Existing transformer-based methods can deliver better results but highly depend on hand-crafted dataset-level empirical distribution priors. In this work, we propose DDColor, a new end-to-end method with dual decoders, for image colorization. More specifically, we design a multi-scale image decoder and a transformer-based color decoder. The former manages to restore the spatial resolution of the image, while the latter establishes the correlation between semantic representations and color queries via cross-attention. The two decoders cooperate to learn semantic-aware color embeddings by leveraging multi-scale visual features. With the help of these two decoders, our method succeeds in producing semantically consistent and visually plausible colorization results without any additional priors. In addition, a simple but effective colorfulness loss is introduced to further improve the color richness of generated results. Our extensive experiments demonstrate that the proposed DDColor achieves significantly superior performance to existing state-of-the-art works both quantitatively and qualitatively. Codes will be made publicly available.
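A colorfulness loss of this kind can be grounded in the classic Hasler–Süsstrunk colorfulness metric; the sketch below computes that metric, as a plausible basis for such a loss rather than DDColor's exact formulation.

```python
import numpy as np

def colorfulness(img):
    """Hasler-Susstrunk colorfulness of an RGB image with values in [0, 1].
    A loss encouraging colorful outputs could maximize a metric like this;
    the exact loss in the DDColor paper may differ."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    rg = r - g                      # red-green opponent channel
    yb = 0.5 * (r + g) - b          # yellow-blue opponent channel
    std = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std + 0.3 * mean

gray = np.full((16, 16, 3), 0.5)   # a flat gray image scores exactly 0
vivid = np.zeros((16, 16, 3))
vivid[:8, :, 0] = 1.0              # red top half
vivid[8:, :, 1] = 1.0              # green bottom half
print(colorfulness(gray), colorfulness(vivid))
```

Desaturated outputs (a common failure mode of L2-trained colorization networks) score near zero on this metric, which is why penalizing low colorfulness pushes generated results toward richer colors.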
For realistic and vivid colorization, generative priors have recently been exploited. However, such generative priors often fail on in-the-wild complex images due to their limited representation space. In this paper, we propose BigColor, a novel colorization approach that provides vivid colorization for diverse in-the-wild images with complex structures. While previous generative priors are trained to synthesize both image structures and colors, we learn a generative color prior that focuses on color synthesis given the spatial structure of an image. In this way, we relieve the burden of synthesizing image structures from the generative prior and expand its representation space to cover diverse images. To this end, we propose a BigGAN-inspired encoder-generator network that uses a spatial feature map instead of a spatially flattened BigGAN latent code, resulting in an enlarged representation space. Our method enables robust colorization for diverse inputs in a single forward pass, supports arbitrary input resolutions, and provides multi-modal colorization results. We demonstrate that BigColor significantly outperforms existing methods, especially on in-the-wild images with complex structures.
We show that pretrained generative adversarial networks (GANs) such as StyleGAN and BigGAN can be used as a latent bank to improve the performance of image super-resolution. While most existing perceptual-oriented approaches attempt to generate realistic outputs through learning with adversarial losses, our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging the rich and diverse priors encapsulated in a pretrained GAN. But unlike prevalent GAN inversion methods that require expensive image-specific optimization at runtime, our approach only needs a single forward pass for restoration. GLEAN can be easily incorporated in a simple encoder-bank-decoder architecture with multi-resolution skip connections. Adopting priors from different generative models allows GLEAN to be applied to diverse categories (e.g., human faces, cats, buildings, and cars). We further present a lightweight version of GLEAN, named LightGLEAN, which retains only the critical components of GLEAN. Notably, LightGLEAN consists of only 21% of the parameters and 35% of the FLOPs while achieving comparable image quality. We extend our method to different tasks, including image colorization and blind image restoration, and extensive experiments show that our proposed models perform favorably against existing methods. Code and models are available at https://github.com/open-mmlab/mmediting.
A key challenge of real-world image super-resolution (SR) is to recover the missing details in low-resolution (LR) images with complex unknown degradations (e.g., downsampling, noise, and compression). Most previous works restore such missing details in the image space. To cope with the high diversity of natural images, they either rely on unstable GANs that are difficult to train and prone to artifacts, or resort to explicit references from high-resolution (HR) images that are usually unavailable. In this work, we propose Feature Matching SR (FeMaSR), which restores realistic HR images in a much more compact feature space. Unlike image-space methods, our FeMaSR restores HR images by matching distorted LR image features to their distortion-free HR counterparts in our pretrained HR priors, and decoding the matched features to obtain realistic HR images. Specifically, our HR priors contain a discrete feature codebook and its associated decoder, which are pretrained on HR images with a Vector Quantized Generative Adversarial Network (VQGAN). Notably, we incorporate a novel semantic regularization in VQGAN to improve the quality of reconstructed images. For the feature matching, we first extract LR features with an LR encoder and then follow a simple nearest-neighbour strategy to match them with the pretrained codebook. In particular, we equip the LR encoder with residual shortcut connections to the decoder, which are critical to the optimization of the feature matching loss and also help to compensate for possible feature matching errors. Experimental results show that our approach produces more realistic HR images than previous methods. Code is released at https://github.com/chaofengc/femasr.
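The nearest-neighbour matching step against a discrete codebook can be sketched in a few lines of NumPy. Here the codebook is a random stand-in purely for illustration; FeMaSR's real codebook is learned with VQGAN, and the function names are made up.

```python
import numpy as np

def match_to_codebook(feats, codebook):
    """Replace each feature vector with its nearest codebook entry
    (the simple nearest-neighbour strategy described above).
    feats: (N, D), codebook: (K, D) -> quantized (N, D), indices (N,)."""
    # squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2ab + ||b||^2
    d = (feats ** 2).sum(1, keepdims=True) \
        - 2.0 * feats @ codebook.T \
        + (codebook ** 2).sum(1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.standard_normal((32, 8))            # stand-in learned codebook
feats = codebook[[3, 17, 5]] + 0.01 * rng.standard_normal((3, 8))
quant, idx = match_to_codebook(feats, codebook)    # slightly perturbed queries
print(idx.tolist())
```

Because each distorted query snaps to a clean codebook entry, decoding the quantized features can only produce details that exist in the HR prior, which is what keeps the outputs realistic.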
[Figure 1: Multi-code GAN prior facilitates many image processing applications using the reconstruction from fixed PGGAN [23] models. Panels: (a) image reconstruction, (b) image colorization, (c) image super-resolution, (d) image denoising, (e) image inpainting, (f) semantic manipulation.]
Exemplar-based colorization methods rely on a reference image to provide plausible colors for the target grayscale image. The key to, and difficulty of, exemplar-based colorization is establishing an accurate correspondence between these two images. Previous methods have attempted to construct such a correspondence but face two obstacles. First, using luminance channels to compute the correspondence is inaccurate. Second, the dense correspondence they build introduces wrong matching results and raises the computational burden. To address these two problems, we propose a Semantic-Sparse Colorization Network (SSCN) to transfer both the global image style and detailed semantic-related colors to the grayscale image in a coarse-to-fine manner. Our network can fully balance global and local colors while alleviating the ambiguous matching problem. Experiments show that our method outperforms existing methods in both quantitative and qualitative evaluations, achieving state-of-the-art performance.
In recent years, face semantic guidance (including facial landmarks, facial heatmaps, and facial parsing maps) and facial generative adversarial networks (GANs) have been widely used in blind face restoration (BFR). Although existing BFR methods achieve good performance in ordinary cases, these solutions have limited resilience when facing severely degraded images with large pose variations (e.g., looking right, looking left, laughing, etc. in real-world scenarios). In this work, we propose a well-designed blind face restoration network with generative facial priors. The proposed network is mainly composed of an asymmetric codec and a StyleGAN2 prior network. In the asymmetric codec, we adopt a mixed multi-path residual block (MMRB) to gradually extract weak texture features of the input image, which can better preserve the original facial features and avoid excessive hallucination. The MMRB can also be plugged into other networks. Besides, thanks to the rich and diverse facial priors of the StyleGAN2 model, we adopt a fine-tuning approach to flexibly restore natural and realistic facial details. Moreover, a novel self-supervised training strategy is specially designed for face restoration tasks to bring the distribution closer to the target and maintain training stability. Extensive experiments on synthetic and real-world datasets demonstrate that our model achieves superior performance on face restoration and face super-resolution tasks.
Face Restoration (FR) aims to restore High-Quality (HQ) faces from Low-Quality (LQ) input images, which is a domain-specific image restoration problem in the low-level computer vision area. The early face restoration methods mainly use statistical priors and degradation models, which are difficult to meet the requirements of real-world applications in practice. In recent years, face restoration has witnessed great progress after stepping into the deep learning era. However, there are few works that study deep learning-based face restoration methods systematically. Thus, this paper comprehensively surveys recent advances in deep learning techniques for face restoration. Specifically, we first summarize different problem formulations and analyze the characteristics of the face image. Second, we discuss the challenges of face restoration. Concerning these challenges, we present a comprehensive review of existing FR methods, including prior-based methods and deep learning-based methods. Then, we explore developed techniques in the task of FR covering network architectures, loss functions, and benchmark datasets. We also conduct a systematic benchmark evaluation on representative methods. Finally, we discuss future directions, including network designs, metrics, benchmark datasets, applications, etc. We also provide an open-source repository for all the discussed methods, which is available at https://github.com/TaoWangzj/Awesome-Face-Restoration.
We propose UniColor, the first unified framework to support colorization in multiple modalities, including both unconditional and conditional ones, such as stroke, exemplar, text, and even a mix of them. Rather than learning a separate model for each type of condition, we introduce a two-stage colorization framework that incorporates various conditions into a single model. In the first stage, multi-modal conditions are converted into a common representation of hint points. In particular, we propose a novel CLIP-based method to convert text into hint points. In the second stage, we propose a Transformer-based network composed of Chroma-VQGAN and Hybrid-Transformer to generate diverse and high-quality colorization results conditioned on hint points. Both qualitative and quantitative comparisons demonstrate that our method outperforms state-of-the-art methods in every control modality and further enables multi-modal colorization that was not feasible before. Moreover, we design an interactive interface that shows the effectiveness of our unified framework in practical usage, including automatic colorization, hybrid-control colorization, local recolorization, and iterative color editing. Our code and models are available at https://luckyhzt.github.io/unicolor.
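The common "hint point" representation that every condition is reduced to can be illustrated as sparse (location, color) pairs. The toy converter below samples hints from an exemplar image on a regular grid; this is only a simplified illustration under my own naming, as UniColor's actual hint extraction is more elaborate.

```python
import numpy as np

def example_to_hints(example_rgb, stride=4):
    """Reduce a reference image to sparse hint points, the unified
    representation described above (illustrative only; the paper's
    extraction differs). Returns ((row, col), (r, g, b)) pairs."""
    h, w, _ = example_rgb.shape
    hints = []
    for y in range(stride // 2, h, stride):      # sample on a coarse grid
        for x in range(stride // 2, w, stride):
            hints.append(((y, x), tuple(example_rgb[y, x])))
    return hints

img = np.zeros((8, 8, 3))
img[..., 2] = 1.0                                # a solid blue reference
hints = example_to_hints(img, stride=4)          # 2x2 grid on an 8x8 image
print(len(hints), hints[0])
```

Strokes and text conditions would be mapped into the same ((row, col), color) form, which is what lets one generative model in the second stage serve all modalities.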
Learning a good image prior is a long-term goal for image restoration and manipulation. While existing methods like deep image prior (DIP) capture low-level image statistics, there are still gaps toward an image prior that captures rich image semantics including color, spatial coherence, textures, and high-level concepts. This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. As shown in Fig. 1, the deep generative prior (DGP) provides compelling results to restore missing semantics, e.g., color, patch, resolution, of various degraded images. It also enables diverse image manipulation including random jittering, image morphing, and category transfer. Such highly flexible restoration and manipulation are made possible through relaxing the assumption of existing GAN-inversion methods, which tend to fix the generator. Notably, we allow the generator to be fine-tuned on-the-fly in a progressive manner regularized by the feature distance obtained from the discriminator in the GAN. We show that these easy-to-implement and practical changes help the reconstruction remain in the manifold of natural images, and thus lead to more precise and faithful reconstructions of real images. Code is available at https://github.com/XingangPan/deepgenerative-prior.
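The discriminator-feature-distance regularizer can be sketched as an L1 distance accumulated over the discriminator's intermediate activations. In the sketch below the "discriminator" is a stand-in stack of random tanh layers, purely for illustration of the idea; DGP's actual discriminator and layer choice differ.

```python
import numpy as np

def discriminator_feature_distance(x, y, disc_layers):
    """Accumulated L1 distance between intermediate 'discriminator'
    features of two inputs; a toy version of the regularizer used while
    fine-tuning the generator on-the-fly."""
    dist, fx, fy = 0.0, x, y
    for w in disc_layers:
        fx, fy = np.tanh(fx @ w), np.tanh(fy @ w)  # one layer's activations
        dist += np.abs(fx - fy).mean()             # compare at every depth
    return dist

rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 16)) * 0.1 for _ in range(3)]
a = rng.standard_normal(16)
b = rng.standard_normal(16)
d_same = discriminator_feature_distance(a, a, layers)
d_diff = discriminator_feature_distance(a, b, layers)
print(d_same, d_diff)  # identical inputs give exactly 0.0
```

Penalizing this distance between the reconstruction and the target keeps the fine-tuned generator close to what the discriminator considers the natural-image manifold, instead of matching pixels alone.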
We present SeamlessGAN, a method capable of automatically generating tileable texture maps from a single input exemplar. In contrast to most existing methods, which focus solely on the synthesis problem, our work tackles both problems, synthesis and tileability, simultaneously. Our key idea is the realization that tiling the latent space of a generative network trained using adversarial expansion techniques produces outputs with continuity at the seam intersections, which can then be turned into tileable images by cropping the central area. Since not every value of the latent space is valid for producing high-quality outputs, we leverage the discriminator as a perceptual error metric capable of identifying artifact-free textures during the sampling process. Further, in contrast to previous work on deep texture synthesis, our model is designed and optimized to work with multi-layered texture representations, enabling textures composed of multiple maps such as albedo, normals, etc. We extensively test our design choices for the network architecture, loss functions, and sampling parameters. We show, both qualitatively and quantitatively, that our approach outperforms previous methods and works for textures of different types.
We propose EXE-GAN, a novel exemplar-guided facial inpainting framework using generative adversarial networks. Our approach can not only preserve the quality of the input facial image but also complete the image with exemplar-like facial attributes. We achieve this by simultaneously leveraging the global style of the input image, the stochastic style generated from a random latent code, and the exemplar style of the exemplar image. We introduce a novel attribute similarity metric to encourage the network to learn the style of facial attributes from the exemplar in a self-supervised way. To guarantee natural transitions across region boundaries, we introduce a novel spatial variant gradient backpropagation technique that adjusts the loss gradients based on spatial location. Extensive evaluations and practical applications on the public CelebA-HQ and FFHQ datasets validate the superiority of EXE-GAN in terms of the visual quality of facial inpainting.
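One way to picture a spatially variant gradient weighting is a per-pixel factor that is strongest at the hole boundary and decays toward the interior, so transitions at region borders are emphasized. The sketch below is a Gaussian-of-distance illustration of that general idea, not EXE-GAN's actual weighting; all names are hypothetical.

```python
import numpy as np

def spatial_gradient_weights(mask, sigma=2.0):
    """Per-pixel weights decaying with distance from the hole boundary.
    This is only an illustration of location-dependent gradient scaling;
    the paper's exact scheme differs. mask: (H, W), 1 inside the hole."""
    h, w = mask.shape
    # hole pixels that touch a non-hole pixel (or the image edge)
    border = [(y, x) for y in range(h) for x in range(w)
              if mask[y, x] == 1 and (
                  y in (0, h - 1) or x in (0, w - 1) or
                  0 in (mask[y - 1, x], mask[y + 1, x],
                        mask[y, x - 1], mask[y, x + 1]))]
    weights = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            if mask[y, x] == 1:
                d2 = min((y - by) ** 2 + (x - bx) ** 2 for by, bx in border)
                weights[y, x] = np.exp(-d2 / (2.0 * sigma ** 2))
    return weights

mask = np.zeros((8, 8), dtype=int)
mask[2:6, 2:6] = 1                   # a square hole in the center
weights = spatial_gradient_weights(mask)
# loss gradients inside the hole would be multiplied by these weights,
# so boundary pixels receive full gradient strength
print(weights[2, 2], weights[3, 3])
```

During training, the per-pixel loss gradient would be multiplied elementwise by such a weight map before backpropagation, making the optimization pay more attention to the transition zone.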
Generative adversarial networks (GANs) have attracted great attention due to their simple yet effective training mechanism and superior image generation quality. With the capability of generating photo-realistic high-resolution (e.g., $1024\times1024$) images, recent GAN models have greatly narrowed the gap between generated images and real ones. Therefore, many recent works show emerging interest in taking advantage of pretrained GAN models by exploiting the learned latent space and GAN priors. In this paper, we briefly review recent progress on leveraging pretrained large-scale GAN models from three aspects, i.e., 1) the training of large-scale generative adversarial networks, 2) exploring and understanding pretrained GAN models, and 3) leveraging these models for subsequent tasks such as image restoration and editing. More information about relevant methods and repositories can be found at https://github.com/csmliu/pretretaining-gans.
Existing GAN inversion and editing methods work well for aligned objects with a clean background, such as portraits and animal faces, but often struggle for more difficult categories with complex scene layouts and object occlusions, such as cars, animals, and outdoor images. We propose a new method to invert and edit such complex images in the latent space of GANs, such as StyleGAN2. Our key idea is to explore inversion with a collection of layers, spatially adapting the inversion process to the difficulty of the image. We learn to predict the "invertibility" of different image segments and project each segment into a latent layer. Easier regions can be inverted into an earlier layer in the generator's latent space, while more challenging regions can be inverted into a later feature space. Experiments show that our method obtains better inversion results compared to state-of-the-art approaches on complex categories, while maintaining downstream editability. Please refer to our project page at https://www.cs.cmu.edu/~saminversion.
We present a novel image inversion framework and a training pipeline to achieve high-fidelity image inversion with high-quality attribute editing. Inverting real images into StyleGAN's latent space is an extensively studied problem, yet the trade-off between image reconstruction fidelity and image editing quality remains an open challenge. Low-rate latent spaces are limited in their expressive power for high-fidelity reconstruction. On the other hand, high-rate latent spaces result in degraded editing quality. In this work, to achieve high-fidelity inversion, we learn residual features in higher latent codes that lower latent codes were not able to encode. This enables preserving image details in reconstruction. To achieve high-quality editing, we learn how to transform the residual features to adapt to manipulations in the latent codes. We train the framework to extract residual features and transform them via a novel architecture pipeline and cycle consistency losses. We run extensive experiments and compare our method with state-of-the-art inversion methods. Quantitative metrics and visual comparisons show significant improvements. Code: https://github.com/hamzapehlivan/StyleRes
Although generative facial priors and geometric priors have recently demonstrated high-quality results for blind face restoration, producing fine-grained facial details that are faithful to the inputs remains a challenging problem. Motivated by classical dictionary-based methods and the recent vector quantization (VQ) technique, we propose a VQ-based face restoration method, VQFR. VQFR takes advantage of high-quality low-level feature banks extracted from high-quality faces and can thus help recover realistic facial details. However, the simple application of a VQ codebook cannot achieve good results with faithful details and identity preservation. Therefore, we further introduce two special network designs. 1) We first investigate the compression patch size in the VQ codebook and find that a VQ codebook designed with a proper compression patch size is crucial for balancing quality and fidelity. 2) To further fuse low-level features from the input without "contaminating" the realistic details generated from the VQ codebook, we propose a parallel decoder consisting of a texture decoder and a main decoder. The two decoders then interact through a texture warping module with deformable convolutions. Equipped with the VQ codebook as a facial detail dictionary and the parallel decoder design, the proposed VQFR can largely enhance the restored quality of facial details while keeping fidelity to the input, improving over previous methods.
Our goal with this survey is to provide an overview of the state-of-the-art deep learning technologies for face generation and editing. We will cover popular latest architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross-domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantics editing and preserving photo quality. We aim to provide an entry point into the field for readers who have basic knowledge about the field of deep learning and are looking for an accessible introduction and overview.
In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions. The proposed method consists of three components: StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization. The inversion module maps real images to the latent space of a well-trained StyleGAN. The visual-linguistic similarity learns the text-image matching by mapping the image and text into a common embedding space. The instance-level optimization is for identity preservation in manipulation. Our model can produce diverse and high-quality images with an unprecedented resolution of 1024². Using a control mechanism based on style-mixing, our TediGAN inherently supports image synthesis with multi-modal inputs, such as sketches or semantic labels, with or without instance guidance. To facilitate text-guided multimodal synthesis, we propose the Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images and corresponding semantic segmentation map, sketch, and textual descriptions. Extensive experiments on the introduced dataset demonstrate the superior performance of our proposed method. Code and data are available at https://github.com/weihaox/TediGAN.
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing. While studies over extending 2D StyleGAN to 3D faces have emerged, a corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing. In this paper, we study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures. The problem is ill-posed: innumerable compositions of shape and texture could be rendered to the current image. Furthermore, with the limited capacity of a global latent code, 2D inversion methods cannot preserve faithful shape and texture at the same time when applied to 3D models. To solve this problem, we devise an effective self-training scheme to constrain the learning of inversion. The learning is done efficiently without any real-world 2D-3D training pairs but proxy samples generated from a 3D GAN. In addition, apart from a global latent code that captures the coarse shape and texture information, we augment the generation network with a local branch, where pixel-aligned features are added to faithfully reconstruct face details. We further consider a new pipeline to perform 3D view-consistent editing. Extensive experiments show that our method outperforms state-of-the-art inversion methods in both shape and texture reconstruction quality. Code and data will be released.