尽管无条件的特征反演是许多图像合成应用的基础,但训练逆变器需要高计算预算,大型解码容量和强加的条件,例如自回旋先验。为了解决这些局限性,我们建议使用对抗强大的表示作为特征反演的感知原始。我们训练一个对抗性稳健的编码器,以提取分离和感知对齐的图像表示,使其容易逆转。通过使用编码器的镜像架构训练简单的发电机,我们实现了优于标准模型的卓越重建质量和概括。基于此,我们提出了一个具有对抗性的自动编码器,并展示了其在样式转移,图像denoisising和异常检测任务方面的改进性能。与最近的Imagenet特征反演方法相比,我们的模型的性能提高了,复杂性的性能明显较小。
translated by 谷歌翻译
Learning a good image prior is a long-term goal for image restoration and manipulation. While existing methods like deep image prior (DIP) capture low-level image statistics, there are still gaps toward an image prior that captures rich image semantics including color, spatial coherence, textures, and high-level concepts. This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. As shown in Fig. 1, the deep generative prior (DGP) provides compelling results to restore missing semantics, e.g., color, patch, resolution, of various degraded images. It also enables diverse image manipulation including random jittering, image morphing, and category transfer. Such highly flexible restoration and manipulation are made possible through relaxing the assumption of existing GAN-inversion methods, which tend to fix the generator. Notably, we allow the generator to be fine-tuned on-the-fly in a progressive manner regularized by feature distance obtained by the discriminator in GAN. We show that these easy-to-implement and practical changes help preserve the reconstruction to remain in the manifold of nature image, and thus lead to more precise and faithful reconstruction for real images. Code is available at https://github.com/XingangPan/deepgenerative-prior.
translated by 谷歌翻译
Image-generating machine learning models are typically trained with loss functions based on distance in the image space. This often leads to over-smoothed results. We propose a class of loss functions, which we call deep perceptual similarity metrics (DeePSiM), that mitigate this problem. Instead of computing distances in the image space, we compute distances between image features extracted by deep neural networks. This metric better reflects perceptually similarity of images and thus leads to better results. We show three applications: autoencoder training, a modification of a variational autoencoder, and inversion of deep convolutional networks. In all cases, the generated images look sharp and resemble natural images.
translated by 谷歌翻译
随着脑成像技术和机器学习工具的出现,很多努力都致力于构建计算模型来捕获人脑中的视觉信息的编码。最具挑战性的大脑解码任务之一是通过功能磁共振成像(FMRI)测量的脑活动的感知自然图像的精确重建。在这项工作中,我们调查了来自FMRI的自然图像重建的最新学习方法。我们在架构设计,基准数据集和评估指标方面检查这些方法,并在标准化评估指标上呈现公平的性能评估。最后,我们讨论了现有研究的优势和局限,并提出了潜在的未来方向。
translated by 谷歌翻译
Deep convolutional networks have become a popular tool for image generation and restoration. Generally, their excellent performance is imputed to their ability to learn realistic image priors from a large number of example images. In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. In order to do so, we show that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, superresolution, and inpainting. Furthermore, the same prior can be used to invert deep neural representations to diagnose them, and to restore images based on flash-no flash input pairs.
translated by 谷歌翻译
We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
translated by 谷歌翻译
经典图像恢复算法使用各种前瞻性,无论是明确的还是明确的。他们的前沿是手工设计的,它们的相应权重是启发式分配的。因此,深度学习方法通​​常会产生优异的图像恢复质量。然而,深度网络是能够诱导强烈且难以预测的幻觉。在学习图像时,网络隐含地学会联合忠于观察到的数据;然后是不可能的原始数据和下游的幻觉数据的分离。这限制了它们在图像恢复中的广泛采用。此外,通常是降解模型过度装备的受害者的幻觉部分。我们提出了一种具有解耦的网络先前的幻觉和数据保真度的方法。我们将我们的框架称为贝叶斯队的生成先前(BigPrior)的集成。我们的方法植根于贝叶斯框架中,并将其紧密连接到经典恢复方法。实际上,它可以被视为大型经典恢复算法的概括。我们使用网络反转来从生成网络中提取图像先前信息。我们表明,在图像着色,染色和去噪,我们的框架始终如一地提高了反演结果。我们的方法虽然部分依赖于生成网络反演的质量,具有竞争性的监督和任务特定的恢复方法。它还提供了一种额外的公制,其阐述了每像素的先前依赖程度相对于数据保真度。
translated by 谷歌翻译
通过将图像形成过程分解成逐个申请的去噪自身额,扩散模型(DMS)实现了最先进的合成导致图像数据和超越。另外,它们的配方允许引导机构来控制图像生成过程而不会再刷新。然而,由于这些模型通常在像素空间中直接操作,因此强大的DMS的优化通常消耗数百个GPU天,并且由于顺序评估,推理是昂贵的。为了在保留其质量和灵活性的同时启用有限计算资源的DM培训,我们将它们应用于强大的佩带自动化器的潜在空间。与以前的工作相比,这种代表上的培训扩散模型允许第一次达到复杂性降低和细节保存之间的近乎最佳点,极大地提高了视觉保真度。通过将跨关注层引入模型架构中,我们将扩散模型转化为强大而柔性的发电机,以进行诸如文本或边界盒和高分辨率合成的通用调节输入,以卷积方式变得可以实现。我们的潜在扩散模型(LDMS)实现了一种新的技术状态,可在各种任务中进行图像修复和高竞争性能,包括无条件图像生成,语义场景合成和超级分辨率,同时与基于像素的DMS相比显着降低计算要求。代码可在https://github.com/compvis/lattent-diffusion获得。
translated by 谷歌翻译
edu.hk (a) Image Reconstruction (b) Image Colorization (c) Image Super-Resolution (d) Image Denoising (e) Image Inpainting (f) Semantic Manipulation Figure 1: Multi-code GAN prior facilitates many image processing applications using the reconstruction from fixed PGGAN [23] models.
translated by 谷歌翻译
Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image superresolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4× upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.
translated by 谷歌翻译
基于图像的艺术渲染可以使用算法图像过滤合成各种表达式。与基于深度学习的方法相反,这些基于启发式的过滤技术可以在高分辨率图像上运行,可以解释,并且可以根据各个设计方面进行参数化。但是,适应或扩展这些技术生产新样式通常是一项乏味且容易出错的任务,需要专家知识。我们提出了一个新的范式来减轻此问题:实现算法图像过滤技术作为可区分的操作,可以学习与某些参考样式一致的参数化。为此,我们提出了明智的,这是一个基于示例的图像处理系统,可以在公共框架内处理多种风格化技术,例如水彩,油或卡通风格。通过训练全局和本地滤波器参数化的参数预测网络,我们可以同时适应参考样式和图像内容,例如增强面部特征。我们的方法可以在样式转移框架中进行优化,也可以在用于图像到图像翻译的生成对流设置中学习。我们证明,共同训练XDOG滤波器和用于后处理的CNN可以与基于GAN的最新方法获得可比的结果。
translated by 谷歌翻译
由于简单但有效的训练机制和出色的图像产生质量,生成的对抗网络(GAN)引起了极大的关注。具有生成照片现实的高分辨率(例如$ 1024 \ times1024 $)的能力,最近的GAN模型已大大缩小了生成的图像与真实图像之间的差距。因此,许多最近的作品表明,通过利用良好的潜在空间和博学的gan先验来利用预先训练的GAN模型的新兴兴趣。在本文中,我们简要回顾了从三个方面利用预先培训的大规模GAN模型的最新进展,即1)大规模生成对抗网络的培训,2)探索和理解预训练的GAN模型,以及预先培训的GAN模型,以及3)利用这些模型进行后续任务,例如图像恢复和编辑。有关相关方法和存储库的更多信息,请访问https://github.com/csmliu/pretretaining-gans。
translated by 谷歌翻译
We present an extension to masked autoencoders (MAE) which improves on the representations learnt by the model by explicitly encouraging the learning of higher scene-level features. We do this by: (i) the introduction of a perceptual similarity term between generated and real images (ii) incorporating several techniques from the adversarial training literature including multi-scale training and adaptive discriminator augmentation. The combination of these results in not only better pixel reconstruction but also representations which appear to capture better higher-level details within images. More consequentially, we show how our method, Perceptual MAE, leads to better performance when used for downstream tasks outperforming previous methods. We achieve 78.1% top-1 accuracy linear probing on ImageNet-1K and up to 88.1% when fine-tuning, with similar results for other downstream tasks, all without use of additional pre-trained models or data.
translated by 谷歌翻译
在这项工作中,我们研究了面部重建的问题,鉴于从黑框面部识别引擎中提取的面部特征表示。确实,由于引擎中抽象信息的局限性,在实践中,这是非常具有挑战性的问题。因此,我们在蒸馏框架(dab-gan)中引入了一种名为基于注意力的生成对抗网络的新方法,以合成受试者的面孔,鉴于其提取的面部识别功能。鉴于主题的任何不受约束的面部特征,Dab-Gan可以在高清上重建他/她的脸。 DAB-GAN方法包括一种新型的基于注意力的生成结构,采用新的定义的Bioxtive Metrics学习方法。该框架首先引入徒图,以便可以在图像域中直接采用距离测量和度量学习过程,以进行图像重建任务。来自Blackbox面部识别引擎的信息将使用全局蒸馏过程最佳利用。然后,提出了一个基于注意力的发电机,以使一个高度可靠的发电机通过ID保存综合逼真的面孔。我们已经评估了有关具有挑战性的面部识别数据库的方法,即Celeba,LF​​W,AgeDB,CFP-FP,并始终取得了最新的结果。 Dab-Gan的进步也得到了图像现实主义和ID保存属性的证明。
translated by 谷歌翻译
Single image super-resolution is the task of inferring a high-resolution image from a single low-resolution input. Traditionally, the performance of algorithms for this task is measured using pixel-wise reconstruction measures such as peak signal-to-noise ratio (PSNR) which have been shown to correlate poorly with the human perception of image quality. As a result, algorithms minimizing these metrics tend to produce over-smoothed images that lack highfrequency textures and do not look natural despite yielding high PSNR values.We propose a novel application of automated texture synthesis in combination with a perceptual loss focusing on creating realistic textures rather than optimizing for a pixelaccurate reproduction of ground truth images during training. By using feed-forward fully convolutional neural networks in an adversarial training setting, we achieve a significant boost in image quality at high magnification ratios. Extensive experiments on a number of datasets show the effectiveness of our approach, yielding state-of-the-art results in both quantitative and qualitative benchmarks.
translated by 谷歌翻译
The Super-Resolution Generative Adversarial Network (SR-GAN) [1] is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGANnetwork architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN [2] to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge 1 [3]. The code is available at https://github.com/xinntao/ESRGAN.
translated by 谷歌翻译
扩散概率模型(DPMS)在竞争对手GANS的图像生成中取得了显着的质量。但与GAN不同,DPMS使用一组缺乏语义含义的一组潜在变量,并且不能作为其他任务的有用表示。本文探讨了使用DPMS进行表示学习的可能性,并寻求通过自动编码提取输入图像的有意义和可解码的表示。我们的主要思想是使用可学习的编码器来发现高级语义,以及DPM作为用于建模剩余随机变化的解码器。我们的方法可以将任何图像编码为两部分潜在的代码,其中第一部分是语义有意义和线性的,第二部分捕获随机细节,允许接近精确的重建。这种功能使当前箔基于GaN的方法的挑战性应用,例如实际图像上的属性操作。我们还表明,这两级编码可提高去噪效率,自然地涉及各种下游任务,包括几次射击条件采样。
translated by 谷歌翻译
在本文中,我们通过添加Laplacian Pyramid(LP)概念来开发Laplacian类似于类似的自动编码器(LPAE),以广泛用于分析信号处理中的图像。LPAE将图像分解为近似图像和编码器部分中的详细图像,然后尝试使用两个组件在解码器部分中重建原始图像。我们使用LPAE进行分类和超分辨率领域的实验。使用详细图像和较小尺寸的近似图像作为分类网络的输入,我们的LPAE使模型更轻。此外,我们表明连接分类网络的性能仍然很高。在超分辨率区域中,我们表明解码器部分通过设置类似于LP的结构来获得高质量的重建图像。因此,LPAE通过组合自动编码器的解码器和超分辨率网络来改善原始结果。
translated by 谷歌翻译
基于神经网络的图像压缩已经过度研究。模型稳健性很大程度上被忽视,但它对服务能够实现至关重要。我们通过向原始源图像注入少量噪声扰动来执行对抗攻击,然后使用主要学习的图像压缩模型来编码这些对抗示例。实验报告对逆势实例的重建中的严重扭曲,揭示了现有方法的一般漏洞,无论用于底层压缩模型(例如,网络架构,丢失功能,质量标准)和用于注射扰动的优化策略(例如,噪声阈值,信号距离测量)。后来,我们应用迭代对抗的FineTuning来细化掠夺模型。在每次迭代中,将随机源图像和对抗示例混合以更新底层模型。结果通过大大提高压缩模型稳健性来表明提出的FineTuning策略的有效性。总体而言,我们的方法是简单,有效和更广泛的,使其具有开发稳健的学习图像压缩解决方案的吸引力。所有材料都在HTTPS://njuvision.github.io/trobustn中公开访问,以便可重复研究。
translated by 谷歌翻译
Our goal with this survey is to provide an overview of the state of the art deep learning technologies for face generation and editing. We will cover popular latest architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantics editing and preserving photo quality. We aim to provide an entry point into the field for readers that have basic knowledge about the field of deep learning and are looking for an accessible introduction and overview.
translated by 谷歌翻译