Existing perceptual similarity metrics assume that an image and its reference are well aligned. As a result, these metrics are often sensitive to small alignment errors that are imperceptible to the human eye. This paper studies the effect of small misalignment, specifically a small shift between the input and reference images, on existing metrics, and accordingly develops a shift-tolerant similarity metric. The paper builds upon LPIPS, a widely used learned perceptual similarity metric, and explores architectural design considerations that make it robust against imperceptible misalignment. Specifically, we study a wide spectrum of neural network elements, such as anti-aliasing filtering, pooling, striding, padding, and skip connections, and discuss their roles in building a robust metric. Based on our studies, we develop a new deep neural network-based perceptual similarity metric. Our experiments show that our metric is tolerant to imperceptible shifts while remaining consistent with human similarity judgments.
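To make one of the named architectural ingredients concrete, below is a minimal sketch of anti-aliased downsampling (blur, then subsample); the module name, binomial kernel, and reflect padding are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling: low-pass filter, then subsample.

    Replacing plain strided convolution/pooling with a blur followed by
    subsampling is one standard way to make deep features less sensitive
    to small input shifts.
    """
    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        # 3x3 binomial (approximately Gaussian) low-pass kernel.
        k = torch.tensor([1., 2., 1.])
        k = torch.outer(k, k)
        k = k / k.sum()
        self.register_buffer("kernel", k.expand(channels, 1, 3, 3))
        self.stride = stride
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")  # avoid border artifacts
        return F.conv2d(x, self.kernel, stride=self.stride, groups=self.channels)

if __name__ == "__main__":
    pool = BlurPool2d(channels=64)
    feats = torch.randn(1, 64, 56, 56)
    print(pool(feats).shape)  # torch.Size([1, 64, 28, 28])
```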
The perceptual distances between images, measured in the space of pre-trained deep features, have outperformed prior low-level, pixel-based metrics at assessing image similarity. While the capabilities of older and less accurate models such as AlexNet and VGG to capture perceptual similarity are well known, modern and more accurate models have been less studied. In this paper, we present a large-scale empirical study to assess how well ImageNet classifiers perform at perceptual similarity. First, we observe an inverse correlation between ImageNet accuracy and the perceptual scores of modern networks such as ResNets, EfficientNets, and Vision Transformers: better classifiers achieve worse perceptual scores. We then examine the ImageNet accuracy/perceptual score relationship while varying depth, width, number of training steps, weight decay, label smoothing, and dropout. Higher accuracy improves the perceptual score up to a certain point, but we uncover a Pareto frontier between accuracy and perceptual score in the mid-to-high accuracy regime. We explore this relationship further using a number of plausible hypotheses, such as distortion invariance, spatial frequency sensitivity, and alternative perceptual functions. Interestingly, we find shallow ResNets, trained for fewer than 5 epochs only on ImageNet, whose emergent perceptual scores match those of prior best networks trained directly on supervised human perceptual judgments.
Measuring the similarity of images is a fundamental problem in computer vision for which no universal solution exists. While simple metrics such as the pixel-wise L2-norm have been shown to have significant flaws, they remain popular. One group of recent state-of-the-art metrics that mitigates some of those flaws is Deep Perceptual Similarity (DPS) metrics, where similarity is evaluated as a distance in the deep features of neural networks. However, DPS metrics themselves have not been thoroughly examined for their benefits and, especially, their flaws. This work investigates the most common DPS metric, in which deep features are compared by spatial position, along with metrics comparing averaged and sorted deep features. The metrics are analyzed in depth to understand their strengths and weaknesses by using images designed specifically to challenge them. This work contributes new insights into the flaws of DPS and further suggests improvements to the metrics. An implementation of this work is available online: https://github.com/guspih/deep_perceptual_similarity_analysis/
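The three feature-comparison modes the abstract distinguishes can be sketched as follows; the distance aggregation is an illustrative assumption, not the repository's exact implementation.

```python
import torch

def dps_distance(feat_x: torch.Tensor, feat_y: torch.Tensor,
                 mode: str = "spatial") -> torch.Tensor:
    """Compare two deep feature maps of shape (B, C, H, W).

    'spatial' - compare features at matching spatial positions (most common DPS form)
    'mean'    - compare spatially averaged features (discards layout)
    'sort'    - compare channel activations sorted over space (layout-invariant)
    """
    if mode == "spatial":
        diff = feat_x - feat_y
    elif mode == "mean":
        diff = feat_x.mean(dim=(2, 3)) - feat_y.mean(dim=(2, 3))
    elif mode == "sort":
        diff = (feat_x.flatten(2).sort(dim=2).values
                - feat_y.flatten(2).sort(dim=2).values)
    else:
        raise ValueError(mode)
    return diff.pow(2).flatten(1).mean(dim=1)  # per-image mean squared distance
```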
Given a grayscale photograph as input, this paper attacks the problem of hallucinating a plausible color version of the photograph. This problem is clearly underconstrained, so previous approaches have either relied on significant user interaction or resulted in desaturated colorizations. We propose a fully automatic approach that produces vibrant and realistic colorizations. We embrace the underlying uncertainty of the problem by posing it as a classification task and use class-rebalancing at training time to increase the diversity of colors in the result. The system is implemented as a feed-forward pass in a CNN at test time and is trained on over a million color images. We evaluate our algorithm using a "colorization Turing test," asking human participants to choose between a generated and ground truth color image. Our method successfully fools humans on 32% of the trials, significantly higher than previous methods. Moreover, we show that colorization can be a powerful pretext task for self-supervised feature learning, acting as a cross-channel encoder. This approach results in state-of-the-art performance on several feature learning benchmarks.
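As a rough illustration of the classification-with-rebalancing idea, here is a hedged PyTorch sketch; the prior-smoothing scheme loosely follows the paper's formulation, while the function name and default blend factor are assumptions.

```python
import torch
import torch.nn.functional as F

def rebalanced_colorization_loss(logits, targets, class_prior, blend=0.5):
    """Per-pixel classification loss over quantized color bins with
    class rebalancing: rare (saturated) colors get up-weighted so the
    model is not dominated by desaturated backgrounds.

    logits:      (B, Q, H, W) scores over Q color bins
    targets:     (B, H, W) ground-truth bin index per pixel
    class_prior: (Q,) empirical bin frequencies, summing to 1
    """
    # Weights are the inverse of the prior blended with a uniform
    # distribution, renormalized so the expected weight under the prior is 1.
    q = class_prior.numel()
    w = 1.0 / (blend * class_prior + (1.0 - blend) / q)
    w = w / (w * class_prior).sum()
    return F.cross_entropy(logits, targets, weight=w)

# Usage sketch: logits = net(gray_image); loss = rebalanced_colorization_loss(...)
```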
Image Quality Assessment (IQA) metrics are widely used to quantitatively estimate the extent of image degradation following some forming, restoration, transmission, or enhancement algorithm. We present PyTorch Image Quality (PIQ), a usability-centric library that contains the most popular modern IQA algorithms, guaranteed to be correctly implemented according to their original propositions and thoroughly verified. In this paper, we detail the principles behind the library's foundation, describe the evaluation strategy that makes it reliable, provide benchmarks that showcase the performance-time trade-offs, and underline the benefits of GPU acceleration given the PyTorch backend. PyTorch Image Quality is open-source software: https://github.com/photosynthesis-team/piq/.
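A minimal usage sketch of the library's functional API is shown below; it reflects the documented entry points at the time of writing, so consult the repository for current signatures.

```python
# pip install piq
import torch
import piq

# Random tensors stand in for a distorted image and its reference,
# both scaled to [0, 1].
x = torch.rand(1, 3, 256, 256)
y = torch.rand(1, 3, 256, 256)

ssim_score = piq.ssim(x, y, data_range=1.0)     # full-reference, higher is better
psnr_score = piq.psnr(x, y, data_range=1.0)     # full-reference, in dB
brisque_score = piq.brisque(x, data_range=1.0)  # no-reference, lower is better

print(f"SSIM: {ssim_score.item():.4f}  PSNR: {psnr_score.item():.2f} dB  "
      f"BRISQUE: {brisque_score.item():.2f}")
```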
Image quality assessment (IQA) is a fundamental metric for image processing tasks such as compression. For full-reference IQA, traditional metrics such as PSNR and SSIM have been used. Recently, deep neural network-based IQA metrics (deep IQA), such as LPIPS and DISTS, have also been used. It is known that image scaling is handled inconsistently across deep IQA methods, as some perform downscaling as preprocessing while others use the original image size. In this paper, we show that image scale is an influential factor that affects deep IQA performance. We comprehensively evaluate four deep IQA metrics on the same five datasets, and the experimental results show that image scale significantly influences IQA performance. We find that the most appropriate image scale is often neither the default size nor the original size, and that the choice depends on the method and dataset used. We also examine stability and find that PieAPP is the most stable of the four deep IQA metrics.
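The paper evaluates existing metrics rather than proposing code, but the scale sensitivity it studies can be probed with a small helper like the following; the metric callable and scale set are illustrative choices.

```python
import torch
import torch.nn.functional as F

def evaluate_at_scales(metric, x, y, scales=(0.25, 0.5, 1.0)):
    """Evaluate a full-reference metric after bilinearly rescaling both
    images, to probe how sensitive the score is to image scale.

    `metric` is any callable taking two (B, C, H, W) tensors and
    returning a scalar tensor.
    """
    scores = {}
    for s in scales:
        xs = F.interpolate(x, scale_factor=s, mode="bilinear", align_corners=False)
        ys = F.interpolate(y, scale_factor=s, mode="bilinear", align_corners=False)
        scores[s] = metric(xs, ys).item()
    return scores
```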
Quantifying the perceptual similarity of two images is a long-standing problem in low-level computer vision. The natural image domain commonly relies on supervised learning, e.g., a pre-trained VGG, to obtain a latent representation. However, due to domain shift, pre-trained models from the natural image domain might not apply to other image domains, such as medical imaging. Notably, in medical imaging, evaluating the perceptual similarity is exclusively performed by specialists trained extensively in diverse medical fields. Thus, medical imaging remains devoid of task-specific, objective perceptual measures. This work answers the question: Is it necessary to rely on supervised learning to obtain an effective representation that could measure perceptual similarity, or is self-supervision sufficient? To understand whether recent contrastive self-supervised representation (CSR) may come to the rescue, we start with natural images and systematically evaluate CSR as a metric across numerous contemporary architectures and tasks and compare them with existing methods. We find that in the natural image domain, CSR behaves on par with the supervised one on several perceptual tests as a metric, and in the medical domain, CSR better quantifies perceptual similarity concerning the experts' ratings. We also demonstrate that CSR can significantly improve image quality in two image synthesis tasks. Finally, our extensive results suggest that perceptuality is an emergent property of CSR, which can be adapted to many image domains without requiring annotations.
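A schematic of using a contrastively pre-trained encoder as a full-reference metric; the frozen encoder, flattened embedding, and cosine distance below are illustrative assumptions rather than the paper's exact protocol.

```python
import torch
import torch.nn.functional as F

def csr_perceptual_distance(encoder, x, y):
    """Use a frozen contrastively pre-trained encoder as a
    full-reference similarity metric: embed both images and report
    one minus the cosine similarity of their representations.
    """
    encoder.eval()
    with torch.no_grad():
        fx = F.normalize(encoder(x).flatten(1), dim=1)
        fy = F.normalize(encoder(y).flatten(1), dim=1)
    return 1.0 - (fx * fy).sum(dim=1)  # in [0, 2]; 0 means identical direction
```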
Single image super-resolution is the task of inferring a high-resolution image from a single low-resolution input. Traditionally, the performance of algorithms for this task is measured using pixel-wise reconstruction measures such as peak signal-to-noise ratio (PSNR), which have been shown to correlate poorly with the human perception of image quality. As a result, algorithms minimizing these metrics tend to produce over-smoothed images that lack high-frequency textures and do not look natural despite yielding high PSNR values. We propose a novel application of automated texture synthesis in combination with a perceptual loss focusing on creating realistic textures rather than optimizing for a pixel-accurate reproduction of ground truth images during training. By using feed-forward fully convolutional neural networks in an adversarial training setting, we achieve a significant boost in image quality at high magnification ratios. Extensive experiments on a number of datasets show the effectiveness of our approach, yielding state-of-the-art results in both quantitative and qualitative benchmarks.
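The texture-statistics part of such a loss is commonly written with Gram matrices of deep features; the following is a generic sketch of that idea, not the paper's exact texture-matching procedure.

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-by-channel correlations of a feature map (B, C, H, W),
    the standard texture descriptor from neural style transfer."""
    b, c, h, w = feat.shape
    f = feat.flatten(2)                       # (B, C, H*W)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(feat_sr: torch.Tensor, feat_hr: torch.Tensor) -> torch.Tensor:
    """Match texture statistics of super-resolved and ground-truth
    features instead of their exact spatial arrangement."""
    return (gram_matrix(feat_sr) - gram_matrix(feat_hr)).pow(2).mean()
```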
Existing deep learning-based full-reference IQA (FR-IQA) models usually predict image quality in a deterministic way by explicitly comparing features, gauging how severely distorted an image is by how far its features lie from those of the reference image. In this paper, we look at the problem from a different viewpoint and propose to model quality degradation in perceptual space from a statistical distribution perspective. The quality is thus measured based upon the Wasserstein distance in the deep feature domain. More specifically, the 1D Wasserstein distance is measured at each stage of a pre-trained VGG network, based upon which the final quality score is computed. The deep Wasserstein distance (DeepWSD), computed on features from neural networks, better accounts for the quality contamination caused by various types of distortion and offers improved quality prediction capability. Extensive experiments and theoretical analysis show the superiority of the proposed DeepWSD in terms of both quality prediction and optimization.
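The 1D Wasserstein distance between equally sized samples has a closed form via sorting, which makes it cheap to compute on feature maps; the per-channel treatment and stage averaging below are illustrative assumptions, not DeepWSD's exact formulation.

```python
import torch

def wasserstein_1d(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Closed-form 1D Wasserstein-1 distance between the empirical
    distributions of two equally sized samples: sort both and average
    the absolute differences of the order statistics."""
    return (u.sort().values - v.sort().values).abs().mean()

def featurewise_wsd(feats_x, feats_y):
    """Average the 1D Wasserstein distance over per-channel activation
    distributions at each network stage. `feats_x`/`feats_y` are lists
    of (C, H, W) feature maps from matching layers."""
    total = 0.0
    for fx, fy in zip(feats_x, feats_y):
        d = torch.stack([wasserstein_1d(fx[i].flatten(), fy[i].flatten())
                         for i in range(fx.shape[0])])
        total = total + d.mean()
    return total / len(feats_x)
```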
With the advent of brain imaging techniques and machine learning tools, much effort has been devoted to building computational models that capture the encoding of visual information in the human brain. One of the most challenging brain decoding tasks is the accurate reconstruction of perceived natural images from brain activity measured by functional magnetic resonance imaging (fMRI). In this work, we survey the most recent learning-based methods for natural image reconstruction from fMRI. We examine these methods in terms of architectural design, benchmark datasets, and evaluation metrics, and present a fair performance comparison across standardized evaluation metrics. Finally, we discuss the strengths and limitations of existing studies and suggest potential future directions.
This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. Given only a large, unlabeled image collection, we extract random pairs of patches from each image and train a convolutional neural net to predict the position of the second patch relative to the first. We argue that doing well on this task requires the model to learn to recognize objects and their parts. We demonstrate that the feature representation learned using this within-image context indeed captures visual similarity across images. For example, this representation allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset. Furthermore, we show that the learned ConvNet can be used in the R-CNN framework [21] and provides a significant boost over a randomly-initialized ConvNet, resulting in state-of-the-art performance among algorithms which use only Pascal-provided training set annotations.
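The pretext task reduces to eight-way classification over relative patch positions; here is a hedged sampling sketch, where the patch size, gap, and bounds handling are simplified assumptions.

```python
import random
import torch

# Eight possible positions of the second patch relative to the first
# (the 3x3 grid of neighbors, minus the center).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def sample_patch_pair(image: torch.Tensor, patch: int = 96, gap: int = 48):
    """Crop a center patch and one of its eight neighbors from a (C, H, W)
    image; the label is the index of the chosen relative position.
    Assumes the image is at least 2 * (patch + gap) + patch pixels per side.
    """
    _, h, w = image.shape
    step = patch + gap
    cy = random.randint(step, h - step - patch)
    cx = random.randint(step, w - step - patch)
    label = random.randrange(8)
    dy, dx = OFFSETS[label]
    p1 = image[:, cy:cy + patch, cx:cx + patch]
    p2 = image[:, cy + dy * step:cy + dy * step + patch,
                  cx + dx * step:cx + dx * step + patch]
    return p1, p2, label  # train a net to predict `label` from (p1, p2)
```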
Image quality assessment (IQA) forms a natural and often straightforward undertaking for humans, yet effective automation of the task remains highly challenging. Recent metrics from the deep learning community commonly compare image pairs during training to improve upon traditional metrics such as PSNR or SSIM. However, current comparisons ignore the fact that image content affects quality assessment as comparisons only occur between images of similar content. This restricts the diversity and number of image pairs that the model is exposed to during training. In this paper, we strive to enrich these comparisons with content diversity. Firstly, we relax comparison constraints, and compare pairs of images with differing content. This increases the variety of available comparisons. Secondly, we introduce listwise comparisons to provide a holistic view to the model. By including differentiable regularizers, derived from correlation coefficients, models can better adjust predicted scores relative to one another. Evaluation on multiple benchmarks, covering a wide range of distortions and image content, shows the effectiveness of our learning scheme for training image quality assessment models.
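One plausible form of a correlation-derived listwise regularizer is one minus a differentiable Pearson correlation over a batch of predictions; the exact coefficients the paper regularizes (e.g., surrogates of PLCC or SROCC) may differ.

```python
import torch

def pearson_regularizer(pred: torch.Tensor, mos: torch.Tensor) -> torch.Tensor:
    """Listwise regularizer: one minus the Pearson correlation between a
    batch of predicted quality scores and their mean-opinion scores.
    Differentiable, so it can be added to a pairwise training loss to
    push predicted scores into the right order relative to one another."""
    p = pred - pred.mean()
    m = mos - mos.mean()
    corr = (p * m).sum() / (p.norm() * m.norm() + 1e-8)
    return 1.0 - corr
```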
Unsupervised visual representation learning remains a largely unsolved problem in computer vision research. Among a big body of recently proposed approaches for unsupervised learning of visual representations, a class of self-supervised techniques achieves superior performance on many challenging benchmarks. A large number of the pretext tasks for self-supervised learning have been studied, but other important aspects, such as the choice of convolutional neural networks (CNN), have not received equal attention. Therefore, we revisit numerous previously proposed self-supervised models, conduct a thorough large scale study and, as a result, uncover multiple crucial insights. We challenge a number of common practices in self-supervised visual representation learning and observe that standard recipes for CNN design do not always translate to self-supervised representation learning. As part of our study, we drastically boost the performance of previously proposed techniques and outperform previously published state-of-the-art results by a large margin.
Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4× upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.
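SRGAN's training objective combines a VGG-feature content loss with a weighted adversarial term; the sketch below follows that shape, with `disc` and `vgg_features` assumed callables and the 1e-3 weighting taken from the paper.

```python
import torch
import torch.nn.functional as F

def srgan_perceptual_loss(disc, vgg_features, sr, hr, adv_weight=1e-3):
    """Perceptual loss in the spirit of SRGAN: a VGG-feature content
    loss plus a weighted adversarial loss from a discriminator `disc`.
    The choice of VGG layer for `vgg_features` is up to the user."""
    content = F.mse_loss(vgg_features(sr), vgg_features(hr))
    # Generator-side adversarial term: push D(sr) toward "real".
    adversarial = -torch.log(torch.sigmoid(disc(sr)) + 1e-8).mean()
    return content + adv_weight * adversarial
```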
Recent studies of deep learning-based stereo image super-resolution (StereoSR) have promoted the development of StereoSR. However, existing StereoSR models mainly concentrate on improving quantitative evaluation metrics and neglect the visual quality of super-resolved stereo images. To improve the perceptual performance, this paper proposes the first perception-oriented stereo image super-resolution approach, which exploits the feedback provided by evaluating the perceptual quality of StereoSR results. To provide accurate guidance to the StereoSR model, we develop the first specialized stereo image super-resolution quality assessment (StereoSRQA) model and further construct a StereoSRQA database. Extensive experiments demonstrate that our StereoSR approach significantly improves perceptual quality and enhances the reliability of stereo images for disparity estimation.
Super-resolution (SR) is a fundamental and representative task of the low-level vision area. It is generally thought that the features extracted from an SR network carry no specific semantic information, and that the network simply learns a complex non-linear mapping from input to output. Can we find any "semantics" in SR networks? In this paper, we give an affirmative answer to this question. By analyzing the feature representations with dimensionality reduction and visualization, we successfully discover deep semantic representations in SR networks, i.e., deep degradation representations (DDR), which relate to image degradation types and degrees. We also reveal the differences in representation semantics between classification and SR networks. Through extensive experiments and analysis, we draw a series of observations and conclusions that are of significance for future work, such as interpreting the intrinsic mechanisms of low-level CNN networks and developing new evaluation approaches for blind SR.
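The dimensionality-reduction step can be pictured with a standard t-SNE projection; the parameters below are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.manifold import TSNE

def project_features(features: np.ndarray) -> np.ndarray:
    """Project deep features (N, D) to 2D with t-SNE; coloring the
    result by degradation type (blur, noise, ...) shows whether an SR
    network's features cluster by degradation, i.e., carry
    'degradation semantics'."""
    return TSNE(n_components=2, init="pca", perplexity=30).fit_transform(features)
```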
Our goal with this survey is to provide an overview of state-of-the-art deep learning technologies for face generation and editing. We will cover popular latest architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross-domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantics editing and preserving photo quality. We aim to provide an entry point into the field for readers who have basic knowledge about the field of deep learning and are looking for an accessible introduction and overview.
Low-light image enhancement (LLIE) aims at improving the perception or interpretability of an image captured in an environment with poor illumination. Recent advances in this area are dominated by deep learning-based solutions, in which many learning strategies, network structures, loss functions, and kinds of training data have been employed. In this paper, we provide a comprehensive survey covering various aspects, from algorithm taxonomy to open issues. To examine the generalization of existing methods, we propose a low-light image and video dataset in which the images and videos are taken by the cameras of different mobile phones under diverse illumination conditions. Besides, for the first time, we provide a unified online platform that covers many popular LLIE methods, whose results can be produced through a user-friendly web interface. In addition to qualitatively and quantitatively evaluating existing methods on publicly available datasets and our proposed one, we also validate their performance in face detection in the dark. This survey, together with the proposed dataset and online platform, can serve as a reference source for future study and promote the development of this research field. The proposed platform and dataset, as well as the collected methods, datasets, and evaluation metrics, are publicly available and will be regularly updated.
Training supervised image synthesis models requires a critic to compare two images: the ground truth and the result. Yet this basic functionality remains an open problem. A popular line of approaches uses the L1 (mean absolute error) loss, either in pixel space or in the feature space of a pre-trained deep network. However, we observe that these losses tend to produce overly blurry and grey images, and other techniques such as GANs must be employed to fight these artifacts. In this work, we introduce an information theory-based approach to measuring the similarity between two images. We argue that a good reconstruction should have high mutual information with the ground truth. This view enables learning a lightweight critic to "calibrate" a feature space in a contrastive manner, such that reconstructions of corresponding spatial patches are brought together while other patches are repulsed. We show that our formulation immediately boosts the perceptual realism of output images when used as a drop-in replacement for the L1 loss, with or without an additional GAN loss.
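A contrastive patch loss of this kind is typically written as InfoNCE over matched spatial locations; the sketch below is one such form, with the temperature and in-image negatives as illustrative choices rather than the paper's exact critic.

```python
import torch
import torch.nn.functional as F

def patch_infonce_loss(feat_pred, feat_gt, temperature=0.07):
    """Patch-wise InfoNCE: for feature maps (B, C, H, W) of a
    reconstruction and its ground truth, pull together features at the
    same spatial location and push apart all other locations. A loss of
    this shape lower-bounds the mutual information between matched patches."""
    b, c, h, w = feat_pred.shape
    p = F.normalize(feat_pred.flatten(2), dim=1)  # (B, C, H*W)
    g = F.normalize(feat_gt.flatten(2), dim=1)
    logits = torch.einsum("bci,bcj->bij", p, g) / temperature  # (B, HW, HW)
    target = torch.arange(h * w, device=feat_pred.device).expand(b, -1)
    return F.cross_entropy(logits.flatten(0, 1), target.flatten())
```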