Figure 1: Image generation learned from a single training image. We propose SinGAN, a new unconditional generative model trained on a single natural image. Our model learns the image's patch statistics across multiple scales, using a dedicated multi-scale adversarial training scheme; it can then be used to generate new realistic image samples that preserve the original patch distribution while creating new object configurations and structures.
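As a rough sketch of the multi-scale idea described above (not the authors' released code; the `PatchGenerator` module, the number of scales, and the 4/3 scale factor are illustrative assumptions), each scale refines an upsampled version of the coarser sample, plus injected noise, with a small fully convolutional generator:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchGenerator(nn.Module):
    """Small fully convolutional generator; its receptive field covers only
    patches, so it models patch statistics rather than whole-image structure."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, prev_up, noise):
        # Residual refinement of the upsampled coarser sample.
        return prev_up + self.body(prev_up + noise)

def sample_pyramid(generators, base_size=(25, 33), scale=4 / 3):
    """Coarse-to-fine sampling: start from noise at the coarsest scale,
    then upsample and refine at every finer scale."""
    h, w = base_size
    out = torch.zeros(1, 3, h, w)
    for i, g in enumerate(generators):
        size = (round(h * scale ** i), round(w * scale ** i))
        out = F.interpolate(out, size=size, mode='bilinear', align_corners=False)
        out = g(out, torch.randn(1, 3, *size))
    return out

# Example: an (untrained) 5-scale pyramid; the output is about (4/3)^4 ~ 3.2x the base size.
gens = [PatchGenerator() for _ in range(5)]
print(sample_pyramid(gens).shape)
```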
Training a generative model on a single image has drawn significant attention in recent years. Single image generative methods are designed to learn the internal patch distribution of a single natural image at multiple scales. These models can be used for drawing diverse samples that semantically resemble the training image, as well as for solving many image editing and restoration tasks that involve that particular image. Here, we introduce an extended framework, which allows the internal distributions of several images to be learned simultaneously, using a single model with spatially varying image-identity conditioning. Our BlendGAN opens the door to applications that are not supported by single-image models, including morphing, melding, and structure-texture fusion between two or more arbitrary images.
Generating images from a single sample, as a newly developed branch of image synthesis, has attracted extensive attention. In this paper, we formulate this problem as sampling from the conditional distribution of a single image, and propose a hierarchical framework that simplifies the learning of such a complicated conditional distribution by successively learning distributions over structure, semantics and texture, making both learning and generation comprehensible. On this basis, we design ExSinGAN, composed of three cascaded GANs, for learning an explainable generative model from a given image, where the cascaded GANs model the distributions of structure, semantics and texture successively. ExSinGAN learns not only from the internal patches of the given image, as previous works do, but also from external priors obtained via a GAN-inversion technique. Benefiting from the appropriate combination of internal and external information, ExSinGAN achieves more powerful generation and competitive generalization ability on image manipulation tasks compared with previous works.
Denoising diffusion models (DDMs) have led to staggering performance leaps in image generation, editing and restoration. However, existing DDMs use very large datasets for training. Here, we introduce a framework for training a DDM on a single image. Our method, which we coin SinDDM, learns the internal statistics of the training image by using a multi-scale diffusion process. To drive the reverse diffusion process, we use a fully-convolutional light-weight denoiser, which is conditioned on both the noise level and the scale. This architecture allows generating samples of arbitrary dimensions, in a coarse-to-fine manner. As we illustrate, SinDDM generates diverse high-quality samples, and is applicable in a wide array of tasks, including style transfer and harmonization. Furthermore, it can be easily guided by external supervision. Particularly, we demonstrate text-guided generation from a single image using a pre-trained CLIP model.
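A minimal sketch of a denoiser conditioned on both noise level and scale, in the spirit described above (the layer widths, activations, and the way conditioning is injected are assumptions, not the SinDDM architecture); being fully convolutional, it accepts inputs of arbitrary spatial dimensions:

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Fully convolutional denoiser conditioned on (noise level, scale).

    The two scalars are injected as constant extra feature maps, so the
    network stays agnostic to the spatial size of its input."""
    def __init__(self, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 2, width, 3, padding=1), nn.GELU(),
            nn.Conv2d(width, width, 3, padding=1), nn.GELU(),
            nn.Conv2d(width, width, 3, padding=1), nn.GELU(),
            nn.Conv2d(width, 3, 3, padding=1),
        )

    def forward(self, x, noise_level, scale):
        b, _, h, w = x.shape
        # Broadcast the two conditioning scalars to constant maps and concatenate.
        cond = torch.stack([noise_level, scale], dim=1).view(b, 2, 1, 1)
        cond = cond.expand(b, 2, h, w)
        return self.net(torch.cat([x, cond], dim=1))

# Works on arbitrary spatial dimensions, as needed for coarse-to-fine sampling.
net = ConditionedDenoiser()
x = torch.randn(2, 3, 40, 56)
out = net(x, noise_level=torch.tensor([0.7, 0.3]), scale=torch.tensor([0.0, 1.0]))
print(out.shape)  # torch.Size([2, 3, 40, 56])
```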
In recent years, multi-scale generative adversarial networks (GANs) have been proposed to build generalized image processing models based on a single sample. Constrained by the sample size, multi-scale GANs have great difficulty converging to the global optimum, which ultimately limits their capabilities. In this paper, we pioneer the introduction of PAC-Bayes generalization bound theory into the training analysis of specific models under different adversarial training methods, which yields a non-vacuous upper bound on the generalization error for the specified multi-scale GAN structure. Based on the drastic changes we observed in the generalization error bound under different adversarial attacks and different training states, we propose an adaptive training method that greatly improves the image manipulation ability of multi-scale GANs. The final experimental results show that our adaptive training method contributes substantially to the quality of the images generated by multi-scale GANs on several image manipulation tasks. In particular, for the image super-resolution restoration task, the multi-scale GAN model trained with the proposed method achieves a 100% reduction in the natural image quality evaluator (NIQE) score and a 60% reduction in root mean squared error (RMSE), which is better than many models trained on large-scale datasets.
Figure 1: We propose a generative adversarial framework for synthesizing 2048 × 1024 images from semantic label maps (lower left corner in (a)). Compared to previous work [5], our results express more natural textures and details. (b) We can change labels in the original label map to create new scenes, like replacing trees with buildings. (c) Our framework also allows a user to edit the appearance of individual objects in the scene, e.g. changing the color of a car or the texture of a road. Please visit our website for more side-by-side comparisons as well as interactive editing demos.
Existing generative models for 3D shapes are typically trained on large 3D datasets of a specific object category. In this paper, we investigate deep generative models that learn from only a single reference 3D shape. Specifically, we propose a multi-scale GAN-based model designed to capture the geometric features of the input shape across a range of spatial scales. To avoid the large memory and computational cost incurred by operating on 3D volumes, we build our generator on a tri-plane hybrid representation, which requires only 2D convolutions. We train our generative model on a voxel pyramid of the reference shape, without any external supervision or manual annotation. Once trained, our model can generate diverse, high-quality 3D shapes of varying sizes and aspect ratios. The resulting shapes present variations across different scales while retaining the global structure of the reference shape. Through extensive evaluations, both qualitative and quantitative, we demonstrate that our model can generate 3D shapes of various types.
Learning a good image prior is a long-term goal for image restoration and manipulation. While existing methods like deep image prior (DIP) capture low-level image statistics, there are still gaps toward an image prior that captures rich image semantics including color, spatial coherence, textures, and high-level concepts. This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. As shown in Fig. 1, the deep generative prior (DGP) provides compelling results to restore missing semantics, e.g., color, patch, resolution, of various degraded images. It also enables diverse image manipulation including random jittering, image morphing, and category transfer. Such highly flexible restoration and manipulation are made possible through relaxing the assumption of existing GAN-inversion methods, which tend to fix the generator. Notably, we allow the generator to be fine-tuned on-the-fly in a progressive manner, regularized by the feature distance obtained from the discriminator of the GAN. We show that these easy-to-implement and practical changes help the reconstruction remain in the manifold of natural images, and thus lead to more precise and faithful reconstruction for real images. Code is available at https://github.com/XingangPan/deepgenerative-prior.
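A toy illustration of the discriminator-feature-distance regularizer mentioned above (the block split, the L1 metric, and the usage comments are assumptions for illustration, not the DGP code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def discriminator_feature_distance(disc_blocks, x, y):
    """Sum of L1 distances between intermediate discriminator features of x and y.

    Measuring reconstruction error in discriminator feature space (rather than
    raw pixels) is what regularizes the on-the-fly generator fine-tuning."""
    loss = x.new_zeros(())
    fx, fy = x, y
    for block in disc_blocks:
        fx, fy = block(fx), block(fy)
        loss = loss + F.l1_loss(fx, fy)
    return loss

# Toy discriminator split into blocks (a stand-in for the real GAN discriminator).
disc_blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2)),
    nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2)),
])
recon, target = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(discriminator_feature_distance(disc_blocks, recon, target))
```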
Generative models operate at fixed resolution, even though natural images come in a variety of sizes. As high-resolution details are downsampled away and low-resolution images are discarded altogether, precious supervision is lost. We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions. To take advantage of varied-size data, we introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions. First, conditioning the generator on a target scale allows us to generate higher-resolution images than previously possible, without adding layers to the model. Second, by conditioning on continuous coordinates, we can sample patches that still obey a consistent global layout, which also allows for scalable training at higher resolutions. Controlled FFHQ experiments show that our method takes better advantage of multi-resolution training data than discrete multi-scale approaches, achieving better FID scores and cleaner high-frequency details. We also train on other natural image domains including churches, mountains and birds, and demonstrate arbitrary-scale synthesis with coherent global layouts and realistic local details, going beyond 2K resolution in our experiments. Our project page is available at: https://chail.github.io/anyres-gan/.
In most interactive image generation tasks, given a user's region of interest (ROI), the generated result is expected to exhibit sufficient appearance diversity while maintaining the correct and reasonable structure of the original image. Such tasks become more challenging when only limited data are available. Recently proposed generative models complete training based on only one image. They pay great attention to the overall features of the sample while ignoring the actual semantic information of the different objects within it. As a result, for ROI-based generation tasks, they may produce inappropriate samples with excessive randomness that fail to maintain the correct structure of the relevant objects. To address this problem, this work introduces a morphological-structure-aware generative adversarial network named MOGAN, which produces random samples with diverse appearances and reliable structures based on only one image. For training on the ROI, we propose to utilize data augmented from the original image and introduce a novel module to transform such augmented data into knowledge covering both structure and appearance, thereby enhancing the model's understanding of the sample. To learn the regions other than the ROI, we employ binary masks to ensure generation isolated from the ROI. Finally, we set up parallel and hierarchical branches for the above learning processes. Compared with other single-image GAN schemes, our approach focuses on internal features, including maintaining rational structures and varying appearances. Experiments confirm our model's capability on ROI-based image generation tasks compared with its competitive counterparts.
Swapping autoencoders achieve state-of-the-art performance in deep image manipulation and image-to-image translation. We improve upon this work by introducing a simple yet effective auxiliary module based on the gradient reversal layer. The auxiliary module's loss forces the generator to learn to reconstruct an image with an all-zero texture code, encouraging better disentanglement between structure and texture information. The proposed attribute-based transfer method enables refined control in style transfer while preserving structural information, without using a semantic mask. To manipulate an image, we encode both the geometry of the objects and the general style of the input image into two latent codes, with an additional constraint that enforces structural consistency. Moreover, training time is significantly reduced thanks to the auxiliary loss. The superiority of the proposed model is demonstrated on complex domains, such as satellite images, where state-of-the-art methods are known to have difficulty. Finally, we show that our model improves quality metrics across a wide range of datasets while achieving results comparable to multi-modal image generation techniques.
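For reference, a generic PyTorch gradient reversal layer of the kind the auxiliary module is built on (this is the standard construction, not the paper's module) looks like this:

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the
    backward pass, so preceding layers are trained adversarially against
    whatever loss is attached downstream."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Gradients flowing back through grad_reverse arrive negated at `feat`.
feat = torch.randn(4, 8, requires_grad=True)
loss = grad_reverse(feat, lam=1.0).sum()
loss.backward()
print(feat.grad[0, :3])  # all -1, i.e. the reversed gradient of sum()
```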
In this paper, we present DeepSIM, a generative model for conditional image manipulation based on a single image. We find that extensive augmentation is key to enabling single-image training, and incorporate the use of thin-plate-spline (TPS) as an effective augmentation. Our network learns to map between a primitive representation of the image and the image itself. The choice of primitive representation affects the ease and expressiveness of the manipulations, and can be automatic (e.g. edges), manual (e.g. segmentation), or hybrid, such as edges on top of segmentation. At manipulation time, our generator allows for complex image changes by modifying the primitive input representation and mapping it through the network. Our method is shown to achieve remarkable performance on image manipulation tasks.
We present SeamlessGAN, a method capable of automatically generating tileable texture maps from a single input exemplar. In contrast to most existing methods, which focus exclusively on the synthesis problem, our work tackles both problems, synthesis and tileability, simultaneously. Our key idea is to recognize that the latent space of a generative network trained with an adversarial expansion technique produces outputs with continuity at the seam intersection, which can then be turned into tileable images by cropping the central region. Since not every value of the latent space yields valid high-quality outputs, we leverage the discriminator as a perceptual error metric capable of identifying artifact-free textures during the sampling process. Further, in contrast to previous work on deep texture synthesis, our model is designed and optimized to work with multi-layered texture representations, enabling textures composed of multiple maps such as albedo, normals, etc. We extensively test our design choices for the network architecture, loss functions and sampling parameters. We show, both qualitatively and quantitatively, that our approach outperforms previous methods and works for textures of different types.
Semantic image editing utilizes local semantic label maps to generate the desired content in an edited region. Recent work borrows the SPADE block to achieve semantic image editing. However, it cannot produce pleasing results, owing to the style discrepancy between the edited region and the surrounding pixels. We attribute this to the fact that SPADE uses only an image-independent local semantic layout but ignores the image-specific style contained in the known pixels. To address this issue, we propose style-preserved modulation (SPM), comprising two modulation processes: the first modulation incorporates the contextual style and the semantic layout, and then generates two fused modulation parameters; the second modulation employs the fused parameters to modulate feature maps. By using these two modulations, SPM can inject the given semantic layout while preserving the image-specific context style. Moreover, we design a progressive architecture to generate the edited content in a coarse-to-fine manner. The proposed method obtains context-consistent results and significantly alleviates the unpleasant boundary between the generated region and the known pixels.
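For context, a minimal sketch of the SPADE-style spatially varying modulation that SPM builds on (layer widths and names are placeholders; SPM's second, style-fusing modulation is not shown):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialModulation(nn.Module):
    """SPADE-style block: normalize the feature map, then modulate it with
    per-pixel (gamma, beta) predicted from the semantic layout."""
    def __init__(self, feat_channels, label_channels, hidden=128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.to_gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, feat, layout):
        # Resize the label map to the feature resolution, predict gamma/beta,
        # and apply a spatially varying affine transform after normalization.
        layout = F.interpolate(layout, size=feat.shape[-2:], mode='nearest')
        h = self.shared(layout)
        return self.norm(feat) * (1 + self.to_gamma(h)) + self.to_beta(h)

# A 10-class semantic layout modulating a 64-channel feature map.
mod = SpatialModulation(feat_channels=64, label_channels=10)
feat = torch.randn(1, 64, 32, 32)
layout = torch.randn(1, 10, 128, 128)
print(mod(feat, layout).shape)  # torch.Size([1, 64, 32, 32])
```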
Patch-based methods and deep networks have both been employed to tackle the image inpainting problem, each with its own strengths and weaknesses. Patch-based methods can restore a missing region with high-quality texture by searching nearest-neighbor patches from the unmasked regions. However, these methods produce problematic content when recovering large missing regions. Deep networks, on the other hand, show promising results in completing large regions. Nonetheless, the results often lack the faithful, sharp details of the surrounding area. By bringing together the best of both paradigms, we propose a new deep inpainting framework in which texture generation is guided by a texture memory of patch samples extracted from the unmasked regions. The framework has a novel design that allows the texture memory retrieval to be trained jointly with the deep inpainting network. In addition, we introduce a patch distribution loss to encourage high-quality patch synthesis. The proposed method shows superior performance, both qualitatively and quantitatively, on three challenging image benchmarks, namely the Places, CelebA-HQ, and Paris Street-View datasets.
Single image super-resolution is the task of inferring a high-resolution image from a single low-resolution input. Traditionally, the performance of algorithms for this task is measured using pixel-wise reconstruction measures such as peak signal-to-noise ratio (PSNR), which have been shown to correlate poorly with the human perception of image quality. As a result, algorithms minimizing these metrics tend to produce over-smoothed images that lack high-frequency textures and do not look natural despite yielding high PSNR values. We propose a novel application of automated texture synthesis in combination with a perceptual loss focusing on creating realistic textures rather than optimizing for a pixel-accurate reproduction of ground truth images during training. By using feed-forward fully convolutional neural networks in an adversarial training setting, we achieve a significant boost in image quality at high magnification ratios. Extensive experiments on a number of datasets show the effectiveness of our approach, yielding state-of-the-art results in both quantitative and qualitative benchmarks.
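For reference, the PSNR mentioned above is a simple function of the mean squared error; a small NumPy sketch:

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(max_value ** 2 / mse)

# A slightly noisy 8-bit image scores roughly 30-40 dB against the original.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noisy = np.clip(img + rng.normal(0, 5, img.shape), 0, 255).astype(np.uint8)
print(f"{psnr(img, noisy):.1f} dB")
```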
Blind image super-resolution (SR) is a long-standing task in computer vision that aims to restore low-resolution images suffering from unknown and complex distortions. Recent work has largely focused on adopting more complicated degradation models to emulate real-world degradations. The resulting models have made breakthroughs in perceptual loss and yield perceptually convincing results. However, the limitation imposed by current generative adversarial network structures is still significant: treating pixels equally ignores the structural features of the image, leading to performance drawbacks such as twisted lines and over-sharpened or blurred backgrounds. In this paper, we present A-ESRGAN, a GAN model for the blind SR task featuring an attention U-Net based, multi-scale discriminator that can be seamlessly integrated with other generators. To our knowledge, this is the first work to introduce an attention U-Net structure as the discriminator of a GAN to solve the blind SR problem. The paper also gives an interpretation of the mechanism behind the model's multi-scale attention. Through comparative experiments with existing works, our model achieves state-of-the-art performance on the non-reference natural image quality evaluator (NIQE) metric. Our ablation studies show that, with our discriminator, the RRDB-based generator can leverage the structural features of an image at multiple scales, and consequently yields more perceptually realistic high-resolution images compared to prior works.
Figure 1: Many problems in image processing, graphics, and vision involve translating an input image into a corresponding output image (examples shown: labels to street scene, labels to facade, BW to color, aerial to map, day to night, edges to photo). These problems are often treated with application-specific algorithms, even though the setting is always the same: map pixels to pixels. Conditional adversarial nets are a general-purpose solution that appears to work well on a wide variety of these problems. Here we show results of the method on several. In each case we use the same architecture and objective, and simply train on different data.
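The shared objective referred to in the caption is, in its common pix2pix form, a conditional adversarial term plus an L1 reconstruction term; a compact sketch (the λ = 100 weight and the toy patch discriminator below are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cgan_generator_loss(discriminator, x, fake, target, lam=100.0):
    """Conditional-GAN generator objective: fool D on (input, output) pairs,
    plus an L1 term pulling the output toward the paired target."""
    logits = discriminator(torch.cat([x, fake], dim=1))
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return adv + lam * F.l1_loss(fake, target)

def cgan_discriminator_loss(discriminator, x, fake, target):
    """D is trained to score real (input, target) pairs high and fakes low."""
    real_logits = discriminator(torch.cat([x, target], dim=1))
    fake_logits = discriminator(torch.cat([x, fake.detach()], dim=1))
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

# Toy patch discriminator over concatenated (input, output) channels.
disc = nn.Conv2d(6, 1, 4, stride=2, padding=1)
x, fake, target = (torch.randn(1, 3, 64, 64) for _ in range(3))
print(cgan_generator_loss(disc, x, fake, target).item(),
      cgan_discriminator_loss(disc, x, fake, target).item())
```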
The goal of this paper is a comprehensive study of the facial sketch synthesis (FSS) problem. However, owing to the high cost of obtaining hand-drawn sketch datasets, there has been no complete benchmark for assessing the development of FSS algorithms over the last decade. We therefore first introduce a high-quality dataset for FSS, named FS2K, which consists of 2,104 image-sketch pairs spanning three types of sketch styles, image backgrounds, lighting conditions, skin tones, and facial attributes. FS2K differs from previous FSS datasets in difficulty, diversity, and scalability, and should thus facilitate the progress of FSS research. Second, we review 139 classical methods, including 34 handcrafted-feature-based facial sketch synthesis approaches, 37 general neural style transfer methods, 43 deep image-to-image translation methods, and 35 image-to-sketch approaches. In addition, we elaborate comprehensive experiments on 19 existing cutting-edge models. Third, we present a simple baseline for FSS, named FSGAN. With only two straightforward components, i.e., facial-aware masking and style-vector expansion, FSGAN surpasses the performance of all previous state-of-the-art models on the proposed FS2K dataset by a large margin. Finally, we conclude with the lessons we have learned over the past years and point out several unsolved challenges. Our open-source code is available at https://github.com/dengpingfan/fsgan.