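We propose a stable, parallel approach to training Wasserstein conditional generative adversarial neural networks (W-CGANs) under the constraint of a fixed computational budget. Unlike previous distributed GAN training techniques, our approach avoids inter-process communication, reduces the risk of mode collapse, and enhances scalability by using multiple generators, each trained concurrently on a single data label. The use of the Wasserstein metric also reduces the risk of cycling by stabilizing the training of each generator. We illustrate the approach on three standard benchmark image datasets, CIFAR10, CIFAR100, and ImageNet1k, maintaining the original resolution of the images for each dataset. Performance is assessed in terms of scalability and final accuracy within a limited, fixed computational time and fixed computational resources. To measure accuracy, we use the inception score, the Fréchet inception distance, and image quality. Compared with previous results obtained by applying the parallel approach to deep convolutional conditional generative adversarial neural networks (DC-CGANs), we show an improvement in inception score and Fréchet inception distance, as well as an improvement in the quality of the new images created by the GAN approach. Weak scaling is attained on both datasets using up to 2,000 NVIDIA V100 GPUs on the OLCF supercomputer Summit.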
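The communication-free layout described in the abstract above can be pictured with a minimal sketch. It assumes the dataset is a list of `(sample, label)` pairs; `partition_by_label`, `train_one_generator`, and `train_all` are hypothetical names introduced here for illustration, and the training step is a placeholder rather than the authors' W-CGAN code.

```python
from collections import defaultdict


def partition_by_label(dataset):
    """Group (sample, label) pairs into one shard per label."""
    shards = defaultdict(list)
    for sample, label in dataset:
        shards[label].append(sample)
    return dict(shards)


def train_one_generator(label, shard):
    """Hypothetical placeholder for training one generator on one label.

    A real implementation would run a Wasserstein GAN training loop on
    `shard`; here we only report which data the worker would see.
    """
    return {"label": label, "num_samples": len(shard)}


def train_all(dataset):
    """Run one independent training job per label.

    Each job reads only its own shard, so the jobs could be placed on
    separate processes or GPUs without exchanging any messages.
    """
    shards = partition_by_label(dataset)
    return [train_one_generator(label, shard)
            for label, shard in sorted(shards.items())]


if __name__ == "__main__":
    toy_dataset = [("img_a", 0), ("img_b", 1), ("img_c", 0)]
    print(train_all(toy_dataset))
```

Because no job depends on another job's state, adding labels adds independent work rather than synchronization, which is the property that makes weak scaling plausible in this setup.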
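We study the GAN conditioning problem, whose goal is to convert a pretrained unconditional GAN into a conditional GAN using labeled data. We first identify and analyze three approaches to this problem: conditional GAN training from scratch, fine-tuning, and input reprogramming. Our analysis reveals that input reprogramming performs best when the amount of labeled data is small. Motivated by real-world scenarios with scarce labeled data, we focus on the input reprogramming approach and carefully analyze the existing algorithm. After identifying several critical issues in the previous input reprogramming approach, we propose a new algorithm called InRep+. InRep+ addresses these issues through novel uses of invertible neural networks and positive-unlabeled (PU) learning. Through extensive experiments, we show that InRep+ outperforms all existing methods, particularly when label information is scarce, noisy, and/or imbalanced. For instance, on the task of conditioning a CIFAR10 GAN with 1% labeled data, InRep+ achieves an average Intra-FID of 82.13, whereas the second-best method achieves 114.51.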
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CELEBA images at $1024^2$. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CELEBA dataset.
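One way to picture how a new, higher-resolution layer can be introduced without disrupting training is a smooth blend between the upsampled output of the old resolution and the output of the new layer. The sketch below is an illustration under assumptions: images are plain nested lists of floats, and `upsample_nearest_2x`, `fade_in`, and the blending weight `alpha` are names chosen here, not taken from the abstract.

```python
def upsample_nearest_2x(image):
    """Double the height and width of a 2D image by repeating pixels."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fade_in(low_res, high_res, alpha):
    """Blend the upsampled low-resolution image with the new layer's output.

    alpha = 0.0 returns only the (upsampled) old output, alpha = 1.0 returns
    only the new layer's output; values in between mix the two linearly.
    """
    upsampled = upsample_nearest_2x(low_res)
    return [
        [(1.0 - alpha) * old + alpha * new for old, new in zip(old_row, new_row)]
        for old_row, new_row in zip(upsampled, high_res)
    ]


if __name__ == "__main__":
    low = [[1.0]]
    high = [[0.0, 2.0], [4.0, 6.0]]
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, fade_in(low, high, alpha))
```

Raising `alpha` gradually from 0 to 1 over training means the network starts each new resolution from a state equivalent to the previous one, rather than from an abrupt architectural change.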
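In contrast to classification, segmentation, or object detection with CNNs, generative networks have fundamentally different goals and methods. Originally, they were intended not as image analysis tools but as a means of generating natural-looking images. The adversarial training paradigm was proposed to stabilize generative methods and has proven highly successful, though by no means on the first attempt. This chapter gives a basic introduction to the motivation for generative adversarial networks (GANs) and traces the path of their success by abstracting the basic task and working mechanism and by deriving the difficulties of early practical approaches. Methods for more stable training are presented, along with typical signs of poor convergence and their causes. Although this chapter focuses on GANs for image generation and image analysis, the adversarial training paradigm itself is not specific to images and also generalizes to tasks in image analysis. Examples of architectures for semantic image segmentation and anomaly detection are presented before GANs are contrasted with other generative modeling approaches that have recently entered the scene. This allows a contextualized view of the limitations, but also the benefits, of GANs.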
Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this through deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution and classification. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application.
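Generative adversarial networks (GANs) are among the state-of-the-art generative models for realistic image synthesis. While training and evaluating GANs becomes increasingly important, the current GAN research ecosystem does not provide reliable benchmarks on which evaluation is conducted consistently and fairly. Furthermore, because there are few validated GAN implementations, researchers devote considerable time to reproducing baselines. We study the taxonomy of GAN approaches and present a new open-source library named StudioGAN. StudioGAN supports 7 GAN architectures, 9 conditioning methods, 4 adversarial losses, 13 regularization modules, 3 differentiable augmentations, 7 evaluation metrics, and 5 evaluation backbones. With our training and evaluation protocol, we present a large-scale benchmark using various datasets (CIFAR10, ImageNet, AFHQv2, FFHQ, and Baby/Papa/Granpa-ImageNet) and 3 different evaluation backbones (InceptionV3, SwAV, and Swin Transformer). Unlike other benchmarks used in the GAN community, we train representative GANs, including BigGAN, StyleGAN2, and StyleGAN3, in a unified training pipeline and quantify generation performance with 7 evaluation metrics. The benchmark also evaluates other cutting-edge generative models (e.g., StyleGAN-XL, ADM, MaskGIT, and RQ-Transformer). StudioGAN provides GAN implementations, training scripts, and evaluation scripts together with pre-trained weights. StudioGAN is available at https://github.com/postech-cvlab/pytorch-studiogan.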
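Generative adversarial networks (GANs) are the most popular image generation models and have achieved remarkable progress on various computer vision tasks. However, training instability remains an open problem for all GAN-based algorithms. Many methods have been proposed to stabilize GAN training, focusing respectively on loss functions, regularization and normalization techniques, training algorithms, and model architectures. Unlike these methods, this paper presents a new perspective on stabilizing GAN training. We find that the images produced by the generator sometimes act like adversarial examples for the discriminator during training, which may be part of the reason GAN training is unstable. Based on this finding, we propose the Direct Adversarial Training (DAT) method to stabilize the training process of GANs. Furthermore, we prove that the DAT method can adaptively minimize the Lipschitz constant of the discriminator. The performance of DAT is verified across multiple loss functions, network architectures, hyperparameters, and datasets. Specifically, based on SSGAN, DAT improves FID by 11.5% on CIFAR-100 unconditional generation, improves FID on STL-10 unconditional generation, and improves FID by 13.2% on LSUN-Bedroom unconditional generation. Code will be available at https://github.com/iceli1007/dat-gan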
We propose a novel, projection-based way to incorporate conditional information into the discriminator of GANs that respects the role of the conditional information in the underlying probabilistic model. This approach is in contrast with most frameworks of conditional GANs used in applications today, which use the conditional information by concatenating the (embedded) conditional vector to the feature vectors. With this modification, we were able to significantly improve the quality of class-conditional image generation on the ILSVRC2012 (ImageNet) 1000-class image dataset over the current state-of-the-art result, and we achieved this with a single pair of a discriminator and a generator. We were also able to extend the application to super-resolution and succeeded in producing highly discriminative super-resolution images. This new structure also enabled high-quality category transformation based on parametric functional transformation of conditional batch normalization layers in the generator. The code with Chainer (Tokui et al., 2015), generated images and pretrained models are available at https://github.com/pfnet-research/sngan_projection.
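To make the contrast with concatenation concrete, here is a minimal sketch of a projection-style discriminator output: an unconditional score plus an inner product between a per-class embedding and the image features. The feature vector is assumed to be already computed, the unconditional head is assumed to be a single linear layer, and all names (`projection_logit`, `class_embeddings`, `psi_weights`, `psi_bias`) are illustrative rather than taken from the released code.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def projection_logit(features, label, class_embeddings, psi_weights, psi_bias=0.0):
    """Discriminator logit with label information injected by projection.

    features:          feature vector of the image (assumed precomputed)
    label:             integer class index
    class_embeddings:  one embedding vector per class, same size as features
    psi_weights/bias:  assumed linear head giving the label-independent score
    """
    unconditional = dot(psi_weights, features) + psi_bias
    conditional = dot(class_embeddings[label], features)
    return unconditional + conditional


if __name__ == "__main__":
    features = [1.0, 2.0]
    class_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    psi_weights = [0.5, 0.5]
    for label in (0, 1):
        print(label, projection_logit(features, label, class_embeddings, psi_weights))
```

In this form the label only affects the output through the inner product term, so the same features are scored differently depending on how well they align with the embedding of the claimed class.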
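Synthesizing realistic images from text descriptions is a major challenge in computer vision. Current text-to-image synthesis approaches fall short of producing high-resolution images that represent the text descriptor. Most existing studies rely on either generative adversarial networks (GANs) or variational autoencoders (VAEs). GANs can produce sharper images but lack diversity in their outputs, whereas VAEs are good at producing diverse outputs but the generated images are often blurry. Taking into account the relative advantages of GANs and VAEs, we propose a new conditional VAE (CVAE) and conditional GAN (CGAN) network architecture for synthesizing images conditioned on a text description. This study uses a conditional VAE as the initial generator to produce a high-level sketch of the text descriptor. The high-level sketch output by the first stage, together with the text descriptor, is used as input to the conditional GAN network. The second-stage GAN produces a 256x256 high-resolution image. The proposed architecture benefits from conditioning augmentation and from residual blocks in the conditional GAN network to achieve its results. Multiple experiments were conducted using the CUB and Oxford-102 datasets, and the results of the proposed approach are compared against state-of-the-art techniques such as StackGAN. The experiments show that the proposed method generates high-resolution images conditioned on text descriptions and yields competitive results on both datasets based on the Inception and Fréchet Inception scores.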
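This paper proposes two important contributions for conditional generative adversarial networks (cGANs) to improve the wide variety of applications that exploit this architecture. The first main contribution is an analysis of cGANs showing that they are not explicitly conditional. In particular, it is shown that the discriminator, and subsequently the cGAN, does not automatically learn the conditionality between inputs. The second contribution is a new method, called a contrario cGAN, that explicitly models conditionality for both parts of the adversarial architecture via a novel a contrario loss, which involves training the discriminator to learn unconditional (adverse) examples. This leads to a novel type of data augmentation for GANs (a contrario learning) that restricts the search space of the generator to conditional outputs using adverse examples. Extensive experiments are carried out to evaluate the conditionality of the discriminator by means of a proposed probability distribution analysis. Comparisons with the cGAN architecture on different applications show significant performance improvements on well-known datasets, covering semantic image synthesis, image segmentation, monocular depth prediction, and "single label"-to-image generation, using metrics including Fréchet Inception Distance (FID), mean Intersection over Union (mIoU), Root Mean Square Error log (RMSE log), and Number of statistically-Different Bins (NDB).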
Generative Adversarial Networks (GANs) were introduced by Goodfellow in 2014, and since then have become popular for constructing generative artificial intelligence models. However, the drawbacks of such networks are numerous, including long training times, sensitivity to hyperparameter tuning, the many possible choices of loss and optimization functions, and other difficulties such as mode collapse. Current applications of GANs include generating photo-realistic human faces, animals and objects. However, I wanted to explore the artistic ability of GANs in more detail, by using existing models and learning from them. This dissertation covers the basics of neural networks and works its way up to the particular aspects of GANs, together with experimentation and modification of existing available models, from least complex to most. The intention is to see if state-of-the-art GANs (specifically StyleGAN2) can generate album art covers and if it is possible to tailor them by genre. This was attempted by first familiarizing myself with 3 existing GAN architectures, including the state-of-the-art StyleGAN2. The StyleGAN2 code was used to train a model with a dataset containing 80K album cover images, then used to style images by picking curated images and mixing their styles.
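Conditional generative adversarial networks (cGANs) extend the standard unconditional GAN framework to learning joint data-label distributions from samples, and have been established as powerful generative models capable of producing high-fidelity imagery. A challenge in training such a model lies in properly infusing class information into its generator and discriminator. For the discriminator, class conditioning can be achieved by either (1) directly incorporating labels as input or (2) involving labels in an auxiliary classification loss. In this paper, we show that the former directly aligns the class-conditioned fake and real data distributions $p(\text{image} \mid \text{class})$ ({\em data matching}), while the latter aligns the data-conditioned class distributions $p(\text{class} \mid \text{image})$ ({\em label matching}). Although class separability does not directly translate to sample quality, and becomes a burden if classification itself is intrinsically difficult, the discriminator cannot provide useful guidance to the generator if features of distinct classes are mapped to the same point and thus become inseparable. Motivated by this intuition, we propose a Dual Projection GAN (P2GAN) model that learns to balance between {\em data matching} and {\em label matching}. We then propose an improved cGAN model with auxiliary classification that directly aligns the fake and real conditionals $p(\text{class} \mid \text{image})$ by minimizing their $f$-divergence. Experiments on a synthetic Mixture of Gaussian (MoG) dataset and a variety of real-world datasets, including CIFAR100, ImageNet, and VGGFace2, demonstrate the efficacy of our proposed models.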
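The two alignment targets named in the abstract above can be written side by side. The notation is an assumption made here for illustration: $p_r$ denotes the real joint distribution over an image $x$ and a class $y$, and $p_g$ denotes the distribution induced by the generator.

```latex
% Assumed notation: p_r = real distribution, p_g = generator distribution,
% x = image, y = class label.
\begin{align}
  \text{data matching:}  \quad & p_g(x \mid y) \approx p_r(x \mid y), \\
  \text{label matching:} \quad & p_g(y \mid x) \approx p_r(y \mid x), \\
  \text{joint factorization:} \quad & p(x, y) = p(x \mid y)\, p(y) = p(y \mid x)\, p(x).
\end{align}
```

By the factorization in the last line, matching $p(x \mid y)$ together with the class marginal $p(y)$, or matching $p(y \mid x)$ together with the image marginal $p(x)$, each amounts to matching the joint distribution; the two conditionals are therefore two different routes to the same goal, which is why a model may want to balance between them.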
We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, nor do we require the model to be able to learn well without using any labels. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. We also present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.
One of the challenges in the study of generative adversarial networks is the instability of their training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on the CIFAR10, STL-10, and ILSVRC2012 datasets, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to the previous training stabilization techniques. The code with Chainer (Tokui et al., 2015), generated images and pretrained models are available at https://github.com/pfnet-research/sngan_projection.
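The core computation behind spectral normalization is dividing a weight matrix by its largest singular value, which can be estimated cheaply with power iteration. The sketch below is a plain-Python illustration under assumptions: the matrix is a nested list, the starting vector is a deterministic all-ones vector (chosen here for reproducibility), and the function names are hypothetical rather than taken from the released code.

```python
import math


def matvec(W, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matTvec(W, u):
    """Compute W^T @ u."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def normalize(v):
    """Scale v to unit Euclidean length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_norm(W, num_iters=20):
    """Estimate the largest singular value of W by power iteration."""
    u = normalize([1.0] * len(W))
    for _ in range(num_iters):
        v = normalize(matTvec(W, u))
        u = normalize(matvec(W, v))
    # sigma is approximated by u^T W v once u and v have converged.
    return sum(ui * wi for ui, wi in zip(u, matvec(W, v)))


def spectrally_normalize(W, num_iters=20):
    """Return W divided by its estimated largest singular value."""
    sigma = spectral_norm(W, num_iters)
    return [[w / sigma for w in row] for row in W]


if __name__ == "__main__":
    W = [[3.0, 0.0], [0.0, 1.0]]
    print(spectral_norm(W))
    print(spectrally_normalize(W))
```

After normalization the layer's largest singular value is approximately 1, which bounds how much the layer can stretch its input. In a training loop the iteration count can be kept very small by carrying the vector `u` over from one update to the next instead of restarting it each time.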
We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to different distances between distributions.
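A small worked example may help show why a Wasserstein-type distance can give a useful training signal. For two equal-size samples of one-dimensional data, the Wasserstein-1 distance has a closed form: sort both samples and average the absolute differences. This is only an illustration of the distance itself; the algorithm in the abstract estimates it with a learned critic, and `wasserstein_1d` is a name introduced here.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1D empirical samples.

    In one dimension the optimal transport plan pairs the i-th smallest
    point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


if __name__ == "__main__":
    real = [0.0, 1.0]
    # Generated samples that start far from the data and move closer.
    for shift in (4.0, 2.0, 1.0):
        fake = [x + shift for x in real]
        print(shift, wasserstein_1d(real, fake))
```

Even when the two samples do not overlap at all, the distance shrinks steadily as the generated points move toward the real ones, which is the kind of behavior that makes a learning curve based on it informative.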
