We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation. * This work was done during an internship at NVIDIA. 35th Conference on Neural Information Processing Systems (NeurIPS 2021).
Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.
The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.
Our goal with this survey is to provide an overview of state-of-the-art deep learning technologies for face generation and editing. We cover popular recent architectures and discuss the key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross-domain style transfer. We focus in particular on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantic editing while preserving photo quality. We aim to provide an entry point into the field for readers who have basic knowledge of deep learning and are looking for an accessible introduction and overview.
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
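Generative adversarial networks have attracted researchers' attention because of their state-of-the-art performance in generating new images using only a dataset of the target distribution. It has been shown that there is a discrepancy between the frequency spectra of real and fake images. Since the Fourier transform is a bijective mapping, it is fair to conclude that the model has a significant problem in learning the original distribution. In this work, we investigate possible causes of this shortcoming in the architecture and mathematical theory of current GANs. We then propose a new model that reduces the discrepancy between the spectra of real and fake images. To this end, we design a brand-new architecture for the frequency domain using the blueprint of geometric deep learning. We then show promising improvements in the quality of generated images by treating the Fourier-domain representation of the original data as a principal feature during training.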
The principle of equivariance to symmetry transformations enables a theoretically grounded approach to neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. Here we show how this principle can be extended beyond global symmetries to local gauge transformations. This enables the development of a very general class of convolutional neural networks on manifolds that depend only on the intrinsic geometry, and which includes many popular methods from equivariant and geometric deep learning. We implement gauge equivariant CNNs for signals defined on the surface of the icosahedron, which provides a reasonable approximation of the sphere. By choosing to work with this very regular manifold, we are able to implement the gauge equivariant convolution using a single conv2d call, making it a highly scalable and practical alternative to Spherical CNNs. Using this method, we demonstrate substantial improvements over previous methods on the task of segmenting omnidirectional images and global climate patterns.
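Generative models operate at a fixed resolution, even though natural images come in a variety of sizes. As high-resolution details are downsampled away and low-resolution images are discarded altogether, precious supervision is lost. We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions. To take advantage of this varied-size data, we introduce continuous-scale training, a process that samples at random scales to train a new generator with variable output resolutions. First, conditioning the generator allows us to generate higher-resolution images than previously possible, without adding layers to the model. Second, by conditioning on continuous coordinates, we can sample patches that still obey a consistent global layout, which also allows for scalable training at higher resolutions. Controlled FFHQ experiments show that our method takes advantage of multi-resolution training data better than discrete multi-scale approaches, achieving better FID scores and cleaner high-frequency details. We also train on other natural image domains, including churches, mountains, and birds, and demonstrate arbitrary-scale synthesis with both coherent global layouts and realistic local details, going beyond 2K resolution in our experiments. Our project page is available at: https://chail.github.io/anyres-gan/.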
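The key goal of generative adversarial networks (GANs) is to generate new data with the same statistics as the provided training data. However, multiple recent works show that state-of-the-art architectures still struggle to achieve this goal. In particular, they report an elevated amount of high frequencies in the spectral statistics, which makes it straightforward to distinguish real from generated images. Explanations for this phenomenon are controversial: while most works attribute the artifacts to the generator, other works point to the discriminator. We take a sober look at these explanations and provide insights into what makes proposed measures against high-frequency artifacts effective. To achieve this, we first independently assess the architectures of both the generator and the discriminator and investigate whether they exhibit a frequency bias that makes learning the distribution of high-frequency content particularly problematic. Based on these experiments, we make the following four observations: 1) Different upsampling operations bias the generator towards different spectral properties. 2) Checkerboard artifacts introduced by upsampling cannot explain the spectral discrepancies alone, as the generator is able to compensate for these artifacts. 3) The discriminator does not struggle with detecting high frequencies per se, but rather struggles with frequencies of low magnitude. 4) The downsampling operations in the discriminator can impair the quality of the training signal it provides. In light of these findings, we analyze proposed measures against high-frequency artifacts in state-of-the-art GAN training, but find that none of the existing approaches can fully resolve spectral artifacts yet. Our results suggest that there is great potential in improving the discriminator, and that this could be key to matching the distribution of the training data more closely.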
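The spectral statistics discussed in this abstract are often summarized as an azimuthally averaged power spectrum. The sketch below is one common way to compute such a profile with NumPy; the abstract does not say how the authors measure spectra, so the function name `radial_power_spectrum` and the integer binning of radii are assumptions made for illustration only.

```python
import numpy as np

def radial_power_spectrum(image):
    """Average the 2D power spectrum over rings of equal integer radius.

    Hypothetical helper for illustration; not taken from any of the papers listed here.
    """
    h, w = image.shape
    # Centered 2D power spectrum (zero frequency moved to index (h // 2, w // 2)).
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    yy, xx = np.indices((h, w))
    # Integer distance of every frequency bin from the spectrum center.
    radius = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    total = np.bincount(radius.ravel(), weights=power.ravel())
    count = np.bincount(radius.ravel())
    return total / count

# A pure horizontal cosine with 5 cycles across a 32-pixel image:
n, cycles = 32, 5
x = np.arange(n)
image = np.tile(np.cos(2 * np.pi * cycles * x / n), (n, 1))
profile = radial_power_spectrum(image)
peak_radius = int(np.argmax(profile))  # radius of the strongest ring
```

Comparing such profiles for real and generated image sets is a simple way to see an excess of high-frequency energy, although a one-dimensional profile discards directional information.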
Modern convolutional networks are not shift-invariant, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe increased accuracy in ImageNet classification, across several commonly-used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservingly overlooked in modern deep networks.
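To make the aliasing argument concrete, the following is a minimal 1D sketch of low-pass filtering before subsampling, compared with plain strided subsampling. The `[1, 2, 1] / 4` binomial kernel, the circular padding, and the function names are assumptions chosen for illustration; this is not the paper's implementation.

```python
import numpy as np

def downsample_naive(x, stride=2):
    """Keep every `stride`-th sample with no filtering (aliasing-prone)."""
    return x[::stride]

def downsample_blur(x, stride=2):
    """Low-pass filter with a small binomial kernel, then subsample."""
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    padded = np.pad(x, 1, mode="wrap")  # circular padding keeps the length
    blurred = np.convolve(padded, kernel, mode="valid")
    return blurred[::stride]

# Worst case: a signal oscillating at the Nyquist frequency, and the same
# signal shifted by one sample.
x = np.tile([1.0, 0.0], 8)
shifted = np.roll(x, 1)

gap_naive = np.abs(downsample_naive(x) - downsample_naive(shifted)).max()
gap_blur = np.abs(downsample_blur(x) - downsample_blur(shifted)).max()
```

Here the naive path maps the two inputs to all ones and all zeros respectively, while the blurred path maps both to the constant 0.5, so a one-sample shift no longer changes the downsampled output.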
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CELEBA images at 1024². We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CELEBA dataset.
Translating or rotating an input image should not affect the results of many computer vision tasks. Convolutional neural networks (CNNs) are already translation equivariant: input image translations produce proportionate feature map translations. This is not the case for rotations. Global rotation equivariance is typically sought through data augmentation, but patch-wise equivariance is more difficult. We present Harmonic Networks or H-Nets, a CNN exhibiting equivariance to patch-wise translation and 360°-rotation. We achieve this by replacing regular CNN filters with circular harmonics, returning a maximal response and orientation for every receptive field patch. H-Nets use a rich, parameter-efficient and fixed computational complexity representation, and we show that deep feature maps within the network encode complicated rotational invariants. We demonstrate that our layers are general enough to be used in conjunction with the latest architectures and techniques, such as deep supervision and batch normalization. We also achieve state-of-the-art classification on rotated-MNIST, and competitive results on other benchmark challenges.
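We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. We compare the performance of the group equivariant networks known as S2CNNs with that of standard non-equivariant CNNs trained with an increasing amount of data augmentation. The chosen architectures can be considered baseline references for the respective design paradigms. Our models are trained and evaluated on the MNIST or FashionMNIST datasets projected onto the sphere. For the task of image classification, which is inherently rotationally invariant, we find that by considerably increasing the amount of data augmentation and the size of the networks, the standard CNNs can reach at least the same performance as the equivariant network. In contrast, for the inherently equivariant task of semantic segmentation, the non-equivariant networks are consistently outperformed by equivariant networks with fewer parameters. We also analyze and compare the inference latency and training times of the different networks, enabling detailed trade-off considerations between equivariant architectures and data augmentation for practical problems. The equivariant spherical networks used in the experiments are available at https://github.com/janegerken/sem_seg_s2cnn.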
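The Image Biomarker Standardisation Initiative (IBSI) aims to improve the reproducibility of radiomics studies by standardising the computational process of extracting image biomarkers (features) from images. We have previously established reference values for 169 commonly used features, created a standard radiomics image processing scheme, and developed reporting guidelines for radiomics studies. However, several aspects are not yet standardised. Here we present a preliminary version of a reference manual on the use of convolutional image filters in radiomics. Filters, such as wavelets or Laplacian of Gaussian filters, play an important part in emphasising specific image characteristics such as edges and blobs. Features derived from filter response maps have been found to be poorly reproducible. This reference manual forms the basis of ongoing work on standardising convolutional filters in radiomics, and will be updated as this work progresses.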
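State-of-the-art 2D image compression schemes rely on the power of convolutional neural networks (CNNs). Although CNNs offer promising perspectives for 2D image compression, extending such models to omnidirectional images is not straightforward. First, omnidirectional images have specific spatial and statistical properties that cannot be fully captured by current CNN models. Second, basic mathematical operations composing a CNN architecture, e.g., translation and sampling, are not well defined on the sphere. In this paper, we study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images. In particular, we: i) propose the definition of a new convolution operation on the sphere that keeps the high expressiveness and low complexity of a classical 2D convolution; ii) adapt standard CNN techniques such as stride, iterative aggregation, and pixel shuffling to the spherical domain; and then iii) apply our new framework to the task of omnidirectional image compression. Our experiments show that our proposed on-the-sphere solution leads to a better compression gain, saving 13.7% of the bit rate compared to similar learned models applied to equirectangular images. Also, compared to learning models based on graph convolutional networks, our solution supports more expressive filters that can preserve high frequencies and provide a better perceptual quality of the compressed images. Such results demonstrate the efficiency of the proposed framework, which opens new research avenues for other omnidirectional vision tasks to be effectively implemented on the sphere manifold.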
We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without increasing the number of parameters. Group convolution layers are easy to use and can be implemented with negligible computational overhead for discrete groups generated by translations, reflections and rotations. G-CNNs achieve state of the art results on CIFAR10 and rotated MNIST.
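As a rough illustration of the weight sharing described in this abstract, the sketch below builds a first-layer ("lifting") correlation for the group of translations and 90° rotations, then checks its equivariance numerically. It is a NumPy toy under stated assumptions: the helper names `correlate_valid` and `p4_lifting_correlation` are hypothetical, only a single channel is handled, and reflections are left out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_valid(image, kernel):
    """Plain 2D cross-correlation without padding."""
    windows = sliding_window_view(image, kernel.shape)  # (H-K+1, W-K+1, K, K)
    return np.einsum("ijab,ab->ij", windows, kernel)

def p4_lifting_correlation(image, kernel):
    """Correlate with the four 90-degree rotations of one shared kernel.

    Returns an array of shape (4, H-K+1, W-K+1): one map per orientation.
    """
    return np.stack(
        [correlate_valid(image, np.rot90(kernel, r)) for r in range(4)]
    )

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

out = p4_lifting_correlation(image, kernel)
out_rot = p4_lifting_correlation(np.rot90(image), kernel)

# Rotating the input rotates every feature map and cyclically shifts the
# orientation axis by one step.
expected = np.rot90(np.roll(out, 1, axis=0), axes=(1, 2))
equivariance_error = np.abs(out_rot - expected).max()
```

The four orientation maps come from a single set of 3×3 weights, which is the sense in which the layer gains capacity without adding parameters; the check confirms that a rotated input yields a predictably transformed output rather than an unrelated one.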
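With the advent of deep learning (DL), super-resolution (SR) has also become a thriving research area. However, despite promising results, the field still faces challenges that require further research, e.g., allowing flexible upsampling, more effective loss functions, and better evaluation metrics. We review the domain of SR in light of recent advances and examine state-of-the-art models such as diffusion (DDPM) and transformer-based SR models. We present a critical discussion of contemporary strategies used in SR and identify promising yet unexplored research directions. We complement previous surveys by incorporating the latest developments in the field, such as uncertainty-driven losses, wavelet networks, neural architecture search, novel normalization methods, and the latest evaluation techniques. We also include several visualizations of the models and methods throughout each chapter to facilitate a global understanding of the trends in the field. This review is ultimately aimed at helping researchers push the boundaries of DL applied to SR.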
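We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time while maintaining the consistencies expected in real environments, such as plausible dynamics and object persistence. A common failure case is for content to never change because of over-reliance on inductive biases to provide temporal consistency, such as a single latent code that dictates the content of the entire video. At the other extreme, without long-term consistency, generated videos may morph unrealistically between different scenes. To address these limitations, we prioritize the time axis by redesigning the temporal latent representation and learning long-term consistency from data by training on longer videos. To this end, we leverage a two-phase training strategy, in which we separately train on longer videos at a low resolution and on shorter videos at a high resolution. To evaluate the capabilities of our model, we introduce two new benchmark datasets with an explicit focus on long-term temporal dynamics.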
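Convolutional networks are successful because of their equivariance/invariance under translations. However, rotatable data such as images, volumes, shapes, or point clouds require processing with equivariance/invariance under rotations in cases where the rotational orientation of the coordinate system does not affect the meaning of the data (e.g., object classification). On the other hand, estimating or processing rotations is necessary in cases where rotations are important (e.g., motion estimation). There has been recent progress in methods and theory in all these regards. Here we provide an overview of existing methods for both 2D and 3D rotations (and translations), and identify commonalities and links between them.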
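Convolutional neural networks (CNNs) are extremely efficient, since they exploit the inherent translation invariance of natural images. However, translation is just one of a myriad of useful spatial transformations. Can the same efficiency be attained when considering other spatial invariances? Such generalized convolutions have been considered in the past, but at a high computational cost. We present a construction that is simple and exact, yet has the same computational complexity as standard convolutions. It consists of a constant image warp followed by a simple convolution, both of which are standard blocks in deep learning toolboxes. With a carefully crafted warp, the resulting architecture can be made equivariant to a wide range of two-parameter spatial transformations. We show encouraging results in realistic scenarios, including the estimation of vehicle poses in the Google Earth dataset (rotation and scale), and of face poses in Annotated Facial Landmarks in the Wild (3D rotations under perspective).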