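Image-based artistic rendering can synthesize a variety of expressive styles using algorithmic image filtering. In contrast to deep learning-based methods, these heuristics-based filtering techniques can operate on high-resolution images, are interpretable, and can be parameterized according to various design aspects. However, adapting or extending these techniques to produce new styles is often a tedious and error-prone task that requires expert knowledge. We propose a new paradigm to alleviate this problem: implementing algorithmic image filtering techniques as differentiable operations that can learn parameterizations aligned to certain reference styles. To this end, we present WISE, an example-based image-processing system that can handle a multitude of stylization techniques, such as watercolor, oil, or cartoon stylization, within a common framework. By training parameter prediction networks for global and local filter parameterizations, we can simultaneously adapt to reference styles and image content, e.g., to enhance facial features. Our method can be optimized in a style-transfer framework or learned in a generative-adversarial setting for image-to-image translation. We demonstrate that jointly training an XDoG filter and a CNN for postprocessing can achieve results comparable to a state-of-the-art GAN-based method.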
Translated by Google Translate
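The abstract above describes implementing algorithmic filters as differentiable operations and names the XDoG filter. As a rough illustration only, the sketch below evaluates an XDoG-style response on a 1D signal in plain Python. The parameter names (`sigma`, `k`, `p`, `eps`, `phi`) follow the commonly cited XDoG formulation and are assumptions here, not details taken from the abstract; the 1D setting and the clamped borders are simplifications. No automatic differentiation is performed: the point is that every step is a smooth or piecewise-smooth function of the parameters, which is what would let an autodiff framework propagate gradients to them.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Gaussian blur of a list of floats with edge-replicating borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp to valid range
            acc += w * signal[idx]
        out.append(acc)
    return out

def xdog(signal, sigma=1.0, k=1.6, p=20.0, eps=0.5, phi=10.0):
    """XDoG-style response: sharpened difference of Gaussians plus soft threshold."""
    g1 = blur(signal, sigma)
    g2 = blur(signal, k * sigma)
    out = []
    for a, b in zip(g1, g2):
        s = (1 + p) * a - p * b  # sharpened difference of Gaussians
        # Soft threshold: 1 above eps, smooth tanh ramp below it.
        out.append(1.0 if s >= eps else 1.0 + math.tanh(phi * (s - eps)))
    return out

# A dark-to-bright step edge; the response stays within [0, 1].
edge_response = xdog([0.0] * 10 + [1.0] * 10)
```

In a learned setting, parameters such as `p`, `eps`, and `phi` would be the quantities predicted or optimized; here they are fixed defaults chosen for illustration.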
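The goal of this paper is to conduct a comprehensive study of the facial sketch synthesis (FSS) problem. However, due to the high cost of obtaining hand-drawn sketch datasets, there is no complete benchmark for assessing the development of FSS algorithms over the last decade. We therefore first introduce a high-quality dataset for FSS, named FS2K, which consists of 2,104 image-sketch pairs spanning three types of sketch styles, image backgrounds, lighting conditions, skin colors, and facial attributes. FS2K differs from previous FSS datasets in difficulty, diversity, and scalability, and should thus facilitate the progress of FSS research. Second, we present an FSS study by investigating 139 classical methods, including 34 handcrafted-feature-based facial sketch synthesis approaches, 37 general neural style transfer methods, 43 deep image-to-image translation methods, and 35 image-to-sketch approaches. In addition, we report comprehensive experiments on 19 existing cutting-edge models. Third, we present a simple baseline for FSS, named FSGAN. With only two straightforward components, i.e., facial-aware masking and style-vector expansion, FSGAN surpasses the performance of all previous state-of-the-art models on the proposed FS2K dataset by a large margin. Finally, we conclude with lessons learned over the past years and point out several unsolved challenges. Our open-source code is available at https://github.com/dengpingfan/fsgan.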
Our goal with this survey is to provide an overview of state-of-the-art deep learning technologies for face generation and editing. We cover popular recent architectures and discuss the key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross-domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantic editing while preserving photo quality. We aim to provide an entry point into the field for readers who have basic knowledge of deep learning and are looking for an accessible introduction and overview.
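Furnishing and rendering indoor scenes has been a long-standing task in interior design, where artists create a conceptual design for the space, build a 3D model of the space, decorate it, and then perform rendering. Although the task is important, it is tedious and requires tremendous effort. In this paper, we introduce a new problem of domain-specific indoor scene image synthesis, namely neural scene decoration. Given a photograph of an empty indoor space and a list of decorations with a user-determined layout, we aim to synthesize a new image of the same space with the desired furnishing and decorations. Neural scene decoration can be applied to create conceptual interior designs in a simple yet effective manner. Our approach to this research problem is a novel scene generation architecture that transforms an empty scene and an object layout into a realistic furnished scene photograph. We demonstrate the performance of our proposed method by comparing it with conditional image synthesis baselines, both qualitatively and quantitatively. We conduct extensive experiments to further validate the plausibility and aesthetics of our generated scenes. Our implementation is available at \url{https://github.com/hkust-vgd/neural_scene_decoration}.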
Photo-realistic style transfer aims at migrating the artistic style from an exemplar style image to a content image, producing a result image without spatial distortions or unrealistic artifacts. Impressive results have been achieved by recent deep models. However, deep neural network based methods are too expensive to run in real-time. Meanwhile, bilateral grid based methods are much faster but still contain artifacts like overexposure. In this work, we propose the \textbf{Adaptive ColorMLP (AdaCM)}, an effective and efficient framework for universal photo-realistic style transfer. First, we find the complex non-linear color mapping between input and target domain can be efficiently modeled by a small multi-layer perceptron (ColorMLP) model. Then, in \textbf{AdaCM}, we adopt a CNN encoder to adaptively predict all parameters for the ColorMLP conditioned on each input content and style image pair. Experimental results demonstrate that AdaCM can generate vivid and high-quality stylization results. Meanwhile, our AdaCM is ultrafast and can process a 4K resolution image in 6ms on one V100 GPU.
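The ColorMLP idea described above, a small per-pixel MLP whose parameters are supplied from outside, can be sketched as follows. This is a minimal illustration under stated assumptions, not the AdaCM implementation: the layer sizes, the ReLU activation, and the function names `linear`, `color_mlp`, and `apply_to_image` are hypothetical, and the CNN encoder that would predict `params` from a content and style image pair is not modeled.

```python
def linear(weights, bias, x):
    """Compute weights @ x + bias for a row-major weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in x]

def color_mlp(pixel, params):
    """Map one RGB pixel through a tiny MLP.

    `params` is a list of (weights, bias) pairs, one per layer. In the
    setting described above these would be predicted per image pair;
    here they are simply passed in by the caller.
    """
    h = list(pixel)
    for i, (weights, bias) in enumerate(params):
        h = linear(weights, bias, h)
        if i < len(params) - 1:  # no activation after the output layer
            h = relu(h)
    return h

def apply_to_image(image, params):
    """Apply the same color mapping independently to every pixel."""
    return [[color_mlp(px, params) for px in row] for row in image]

# Hypothetical parameters: an identity layer followed by a red/blue swap.
IDENTITY = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
SWAP_RB = ([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]], [0.0, 0.0, 0.0])
swapped = apply_to_image([[[0.2, 0.5, 0.9]]], [IDENTITY, SWAP_RB])
```

Because each pixel is mapped independently with the same small set of parameters, the cost of applying the mapping grows linearly with the number of pixels, which is one plausible reason such a design can scale to high resolutions.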
With the advent of Neural Style Transfer (NST), stylizing an image has become quite popular. A convenient way of extending stylization techniques to videos is to apply them on a per-frame basis. However, such per-frame application usually lacks temporal consistency, which shows up as undesirable flickering artifacts. Most existing approaches for enforcing temporal consistency suffer from one or more of the following drawbacks. They (1) are only suitable for a limited range of stylization techniques, (2) can only be applied in an offline fashion requiring the complete video as input, (3) cannot provide consistency for the task of stylization, or (4) do not provide interactive consistency control. Note that existing consistent video-filtering approaches aim to completely remove flickering artifacts and thus do not respect any specific consistency-control aspect. For stylization tasks, however, consistency control is an essential requirement, since a certain amount of flickering can add to the artistic look and feel. Moreover, making this control interactive is paramount from a usability perspective. To meet the above requirements, we propose an approach that can stylize video streams while providing interactive consistency control. Apart from stylization, our approach also supports various other image processing filters. To achieve interactive performance, we develop a lightweight optical-flow network that operates at 80 frames per second (FPS) on desktop systems with sufficient accuracy. We show that the final consistent video output using our flow network is comparable to that obtained using a state-of-the-art optical-flow network. Further, we employ an adaptive combination of locally and globally consistent features and enable interactive selection between the two. By objective and subjective evaluation, we show that our method is superior to state-of-the-art approaches.
We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g., FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region. Code: https://github.com/ZPdesu/SEAN .
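Recent techniques for photorealistic style transfer with deep convolutional neural networks (CNNs) generally require intensive training on large-scale datasets, and thus have limited applicability and poor generalization to unseen images or styles. To overcome this, we propose a novel framework, dubbed Deep Translation Prior (DTP), that accomplishes photorealistic style transfer through test-time training on a given input image pair with untrained networks, learning an image-pair-specific translation prior and thus yielding better performance and generalization. Tailored to such test-time training for style transfer, we present novel network architectures with two sub-modules, a correspondence module and a generation module, and loss functions consisting of contrastive content, style, and cycle consistency losses. Our framework does not require an offline training phase for style transfer, which has been one of the main challenges in existing methods; instead, the networks are learned solely at test time. Experimental results show that our framework generalizes better to unseen image pairs and even outperforms state-of-the-art methods.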
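Image color harmonization algorithms aim to automatically match the color distributions of foreground and background images captured under different conditions. Previous deep learning-based models neglect two issues that are critical for practical applications, namely high-resolution (HR) image processing and model comprehensibility. In this paper, we propose a novel Deep Comprehensible Color Filter (DCCF) learning framework for high-resolution image harmonization. Specifically, DCCF first downsamples the original input image to its low-resolution (LR) counterpart, then learns four human-comprehensible neural filters (i.e., hue, saturation, value, and attentive rendering filters) in an end-to-end manner, and finally applies these filters to the original input image to obtain the harmonized result. Benefiting from the comprehensible neural filters, we can provide a simple yet efficient handle for users to cooperate with the deep model and obtain the desired results with very little effort when necessary. Extensive experiments demonstrate the effectiveness of the DCCF learning framework: it outperforms the state-of-the-art post-processing method on the iHarmony4 dataset at full image resolution, achieving 7.63% and 1.69% relative improvements in MSE and PSNR, respectively.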
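Image cartoonization has recently been dominated by generative adversarial networks (GANs) from the perspective of unsupervised image-to-image translation, in which an inherent challenge is to precisely capture and sufficiently transfer characteristic cartoon styles (e.g., clear edges, smooth color shading, abstract fine structures, etc.). Existing advanced models try to enhance the cartoonization effect by learning to promote edges adversarially, introducing a style transfer loss, or learning to align style across multiple representation spaces. This paper demonstrates that a more distinct and vivid cartoonization effect can be easily achieved with only a basic adversarial loss. Observing that cartoon style is more evident in cartoon-texture-salient local image regions, we build a region-level adversarial learning branch in parallel with the normal image-level one, which constrains adversarial learning to cartoon-texture-salient local patches to better perceive and transfer cartoon texture features. To this end, a novel cartoon-texture-saliency-sampler (CTSS) module is proposed to dynamically sample cartoon-texture-salient patches from the training data. With extensive experiments, we demonstrate that texture-saliency-adaptive attention in adversarial learning, a missing ingredient of related methods in image cartoonization, is of significant importance in facilitating and enhancing image cartoon stylization, especially for high-resolution input pictures.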
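With the advent of deep learning (DL), super-resolution (SR) has also become a thriving research area. However, despite promising results, the field still faces challenges that require further research, e.g., allowing flexible upsampling, more effective loss functions, and better evaluation metrics. We review the domain of SR in light of recent advances and examine state-of-the-art models such as diffusion (DDPM) and transformer-based SR models. We present a critical discussion of contemporary strategies used in SR and identify promising yet unexplored research directions. We complement previous surveys by incorporating the latest developments in the field, such as uncertainty-driven losses, wavelet networks, neural architecture search, novel normalization methods, and the latest evaluation techniques. We also include several visualizations of the models and methods throughout each chapter to facilitate a global understanding of the trends in the field. Ultimately, this review aims to help researchers push the boundaries of DL applied to SR.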
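Image transformation, a class of vision and graphics problems whose goal is to learn the mapping between an input image and an output image, has developed rapidly in the context of deep neural networks. In computer vision (CV), many problems can be regarded as image transformation tasks, e.g., semantic segmentation and style transfer. These works have different topics and motivations, which has made image transformation a flourishing area. Some surveys review only the research on style transfer or image-to-image translation, each of which is just one branch of image transformation. However, to the best of our knowledge, no survey summarizes these works together in a unified framework. This paper proposes a novel learning framework comprising Independent learning, Guided learning, and Cooperative learning, called the IGC learning framework. The image transformation we discuss mainly involves general image-to-image translation and style transfer with deep neural networks. From the perspective of this framework, we review these subtasks and give a unified interpretation of various scenarios. We categorize related image transformation subtasks according to similar development trends. Furthermore, experiments have been performed to verify the effectiveness of IGC learning. Finally, new research directions and open problems are discussed for future research.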
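Architectural photography is a genre of photography that focuses on capturing a building or structure in the foreground with dramatic lighting. Inspired by the success of image-to-image translation methods, we aim to perform style transfer for architectural photographs. However, the special composition of architectural photography poses great challenges for style transfer in this type of photograph. Existing neural style transfer methods treat an architectural image as a single entity, which can damage the geometric features of the original architecture and yield unrealistic lighting, wrong color rendition, and visual artifacts such as ghosting, appearance distortion, or color mismatching. In this paper, we specialize a neural style transfer method for architectural photography. Our method addresses the composition of foreground and background in an architectural photograph with a two-branch neural network that considers the style transfer of the foreground and the background separately. Our method comprises a segmentation module, a learning-based image-to-image translation module, and an image blending optimization module. We trained our image-to-image translation neural network on a new dataset of unconstrained outdoor architectural photographs captured at different magic times of the day, utilizing additional semantic information for better matching and geometry preservation. Our experiments show that our method can produce photorealistic lighting and color rendition on both the foreground and background, and outperforms general image-to-image translation and arbitrary style transfer baselines both quantitatively and qualitatively. Our code and data are available at https://github.com/hkust-vgd/architectural_style_transfer.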
Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer. However, their framework requires a slow iterative optimization process, which limits its practical application. Fast approximations with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is usually tied to a fixed set of styles and cannot adapt to arbitrary new styles. In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. At the heart of our method is a novel adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Our method achieves speed comparable to the fastest existing approach, without the restriction to a pre-defined set of styles. In addition, our approach allows flexible user controls such as content-style trade-off, style interpolation, color & spatial controls, all using a single feed-forward neural network.
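The AdaIN operation described above, aligning the per-channel mean and variance of the content features with those of the style features, can be written as AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y). The sketch below applies this to feature maps stored as plain Python lists. It is a minimal illustration: the list-based layout (one flat list of activations per channel), the small `eps` added for numerical stability, and the function names are assumptions made for this example.

```python
import math
from statistics import fmean

def channel_stats(values, eps=1e-5):
    """Mean and standard deviation of one channel (eps avoids division by zero)."""
    mean = fmean(values)
    var = fmean([(v - mean) ** 2 for v in values])
    return mean, math.sqrt(var + eps)

def adain(content, style):
    """Adaptive instance normalization on list-based feature maps.

    `content` and `style` are lists of channels; each channel is a flat
    list of activations. Each content channel is normalized with its own
    statistics, then rescaled and shifted with the statistics of the
    corresponding style channel.
    """
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = channel_stats(c_chan)
        s_mean, s_std = channel_stats(s_chan)
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_chan])
    return out

# One-channel toy example: the output takes on the style channel's statistics.
stylized = adain([[0.0, 1.0, 2.0, 3.0]], [[10.0, 20.0, 30.0, 40.0]])
```

Note that the layer has no learned parameters of its own: the scale and shift come entirely from the style features, which is consistent with the abstract's claim that the approach is not restricted to a pre-defined set of styles.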
We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
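To make the idea of a perceptual loss more concrete, the sketch below computes two quantities commonly used for this purpose from feature maps that are assumed to have already been extracted by a pretrained network: a feature reconstruction loss and a Gram-matrix style loss. The feature extraction itself is not modeled, and the list-based layout, the normalization by channels times positions, and the function names are assumptions for illustration rather than details from the abstract.

```python
def feature_loss(feat_out, feat_target):
    """Mean squared difference between two feature maps.

    Each feature map is a list of channels; each channel is a flat list
    of activations over spatial positions.
    """
    c, n = len(feat_out), len(feat_out[0])
    total = sum((a - b) ** 2
                for chan_out, chan_tgt in zip(feat_out, feat_target)
                for a, b in zip(chan_out, chan_tgt))
    return total / (c * n)

def gram_matrix(feat):
    """Channel-by-channel correlations, normalized by channels * positions."""
    c, n = len(feat), len(feat[0])
    return [[sum(a * b for a, b in zip(feat[i], feat[j])) / (c * n)
             for j in range(c)]
            for i in range(c)]

def style_loss(feat_out, feat_style):
    """Squared Frobenius distance between the two Gram matrices."""
    g_out, g_style = gram_matrix(feat_out), gram_matrix(feat_style)
    return sum((a - b) ** 2
               for row_out, row_style in zip(g_out, g_style)
               for a, b in zip(row_out, row_style))

# Toy features: the second map is the first with spatial positions swapped.
features = [[1.0, 2.0], [3.0, 4.0]]
shuffled = [[2.0, 1.0], [4.0, 3.0]]
content_term = feature_loss(features, shuffled)  # sensitive to spatial layout
style_term = style_loss(features, shuffled)      # insensitive to spatial layout
```

The toy example highlights the difference between the two terms: swapping spatial positions changes the feature loss but leaves the Gram matrix, and therefore the style loss, unchanged.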
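We present SeamlessGAN, a method capable of automatically generating tileable texture maps from a single input exemplar. In contrast to most existing methods, which focus solely on the synthesis problem, our work tackles both problems, synthesis and tileability, simultaneously. Our key idea is to realize that tiling a latent space within a generative network trained using adversarial expansion techniques produces outputs with continuity at the seam intersections, which can then be turned into tileable images by cropping the central area. Since not every value of the latent space is valid for producing high-quality outputs, we leverage the discriminator as a perceptual error metric capable of identifying artifact-free textures during the sampling process. Further, in contrast to previous work on deep texture synthesis, our model is designed and optimized to work with multi-layered texture representations, enabling textures composed of multiple maps such as albedo, normals, etc. We extensively test our design choices for the network architecture, loss function, and sampling parameters. We show qualitatively and quantitatively that our approach outperforms previous methods and works for textures of different types.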
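Image enhancement aims to improve the aesthetic visual quality of photos by retouching color and tone, and is an essential technology for professional digital photography. In recent years, learning-based image enhancement algorithms have achieved promising performance and attracted increasing attention. However, typical approaches attempt to construct a uniform enhancer for the color transformation of all pixels. This ignores the pixel differences between different content (e.g., sky, ocean, etc.) that matter for photographs, leading to unsatisfactory results. In this paper, we propose a novel learnable context-aware 4-dimensional lookup table (4D LUT), which achieves content-dependent enhancement of different content in each image by adaptively learning the photo context. In particular, we first introduce a lightweight context encoder and a parameter encoder to learn a context map of pixel-level categories and a set of image-adaptive coefficients, respectively. Then, the context-aware 4D LUT is generated by integrating multiple basis 4D LUTs via the coefficients. Finally, the enhanced image is obtained by feeding the source image and the context map into the fused context-aware 4D LUT. Compared with the traditional 3D LUT, i.e., RGB mapping to RGB, which is commonly used in camera imaging pipelines or tools, the 4D LUT, i.e., RGBC (RGB + context) mapping to RGB, enables finer control of color transformations for pixels with different content in each image, even when they have the same RGB values. Experimental results demonstrate that our method outperforms other state-of-the-art methods on widely used benchmarks.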
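The two steps described above, fusing several basis 4D LUTs with image-adaptive coefficients and then looking up each pixel by its RGB value plus a context value, can be sketched as follows. This is a toy illustration under stated assumptions: the nested-list layout `lut[r][g][b][c]`, the nearest-neighbor lookup (used here instead of interpolation to keep the code short), the helper names, and the hand-picked coefficients are all hypothetical, and the encoders that would predict the context map and coefficients are not modeled.

```python
def make_identity_lut(size, contexts):
    """4D LUT on a size^3 RGB grid with `contexts` context bins.

    Every entry maps a grid color to itself, regardless of context.
    """
    step = 1.0 / (size - 1)
    return [[[[(r * step, g * step, b * step) for _ in range(contexts)]
              for b in range(size)]
             for g in range(size)]
            for r in range(size)]

def scale_context(lut, context_bin, factor):
    """Copy of `lut` whose outputs are scaled by `factor` in one context bin."""
    return [[[[tuple(factor * v for v in entry) if c == context_bin else entry
               for c, entry in enumerate(cell)]
              for cell in row]
             for row in plane]
            for plane in lut]

def fuse_luts(basis_luts, coeffs):
    """Weighted sum of basis LUTs, entry by entry."""
    size = len(basis_luts[0])
    contexts = len(basis_luts[0][0][0][0])
    return [[[[tuple(sum(w * lut[r][g][b][c][ch]
                         for w, lut in zip(coeffs, basis_luts))
                     for ch in range(3))
               for c in range(contexts)]
              for b in range(size)]
             for g in range(size)]
            for r in range(size)]

def lookup(lut, rgb, context):
    """Nearest-neighbor lookup for RGB and context values in [0, 1]."""
    size = len(lut)
    contexts = len(lut[0][0][0])
    r, g, b = (min(size - 1, int(round(v * (size - 1)))) for v in rgb)
    c = min(contexts - 1, int(round(context * (contexts - 1))))
    return lut[r][g][b][c]

# Two basis LUTs: identity, and one that halves brightness in context bin 1.
identity = make_identity_lut(size=2, contexts=2)
darker = scale_context(identity, context_bin=1, factor=0.5)
fused = fuse_luts([identity, darker], coeffs=[0.5, 0.5])
```

With this setup, two pixels with the same RGB value but different context values are mapped to different output colors, which is the property that distinguishes a 4D LUT from a 3D LUT in the description above.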
Photorealistic style transfer aims to transfer the artistic style of an image onto an input image or video while preserving photorealism. In this paper, we argue that the summary-statistics matching scheme used in existing algorithms is what leads to unrealistic stylization. To avoid employing the popular Gram loss, we propose a self-supervised style transfer framework that contains a style removal part and a style restoration part. The style removal network removes the original image styles, and the style restoration network recovers image styles in a supervised manner. Meanwhile, to address the problems in current feature transformation methods, we propose decoupled instance normalization, which decomposes feature transformation into style whitening and restylization. It works well in ColoristaNet and can transfer image styles efficiently while preserving photorealism. To ensure temporal coherency, we also incorporate optical flow methods and ConvLSTM to embed contextual information. Experiments demonstrate that ColoristaNet achieves better stylization effects than state-of-the-art algorithms.
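Super-resolution (SR) is a fundamental and representative task in low-level vision. It is generally thought that the features extracted by an SR network carry no specific semantic information, and that the network simply learns a complex non-linear mapping from input to output. Can we find any "semantics" in SR networks? In this paper, we give an affirmative answer to this question. By analyzing the feature representations with dimensionality reduction and visualization, we successfully discover deep semantic representations in SR networks, \textit{i.e.}, deep degradation representations (DDR), which relate to image degradation types and degrees. We also reveal the differences in representation semantics between classification and SR networks. Through extensive experiments and analysis, we draw a series of observations and conclusions that are of great significance for future work, such as interpreting the intrinsic mechanisms of low-level CNN networks and developing new evaluation approaches for blind SR.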
With the development of convolutional neural networks, hundreds of deep learning-based dehazing methods have been proposed. In this paper, we provide a comprehensive survey of supervised, semi-supervised, and unsupervised single-image dehazing. We first discuss the physical model, datasets, network modules, loss functions, and evaluation metrics that are commonly used. Then, the main contributions of various dehazing algorithms are categorized and summarized. Further, quantitative and qualitative experiments on various baseline methods are carried out. Finally, unsolved issues and challenges that can inspire future research are pointed out. A collection of useful dehazing materials is available at \url{https://github.com/Xiaofeng-life/AwesomeDehazing}.