We propose a deep learning method for single image superresolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) [15] that takes the lowresolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage.
translated by 谷歌翻译
Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image superresolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require the bicubic interpolation as the pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitates resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.
translated by 谷歌翻译
We present a highly accurate single-image superresolution (SR) method. Our method uses a very deep convolutional network inspired by VGG-net used for ImageNet classification [19]. We find increasing our network depth shows a significant improvement in accuracy. Our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, however, convergence speed becomes a critical issue during training. We propose a simple yet effective training procedure. We learn residuals only and use extremely high learning rates (10 4 times higher than SRCNN [6]) enabled by adjustable gradient clipping. Our proposed method performs better than existing methods in accuracy and visual improvements in our results are easily noticeable.
translated by 谷歌翻译
Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.
translated by 谷歌翻译
自从Dong等人的第一个成功以来,基于深度学习的方法已在单像超分辨率领域中占主导地位。这取代了使用深神经网络的传统基于稀疏编码方法的所有手工图像处理步骤。与明确创建高/低分辨率词典的基于稀疏编码的方法相反,基于深度学习的方法中的词典被隐式地作为多种卷积的非线性组合被隐式获取。基于深度学习方法的缺点是,它们的性能因与训练数据集(室外图像)不同的图像而降低。我们提出了一个带有深层字典(SRDD)的端到端超分辨率网络,在该网络中,高分辨率词典在不牺牲深度学习优势的情况下明确学习。广泛的实验表明,高分辨率词典的显式学习使网络在维持内域测试图像的性能的同时更加强大。
translated by 谷歌翻译
Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image superresolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4× upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.
translated by 谷歌翻译
Discriminative model learning for image denoising has been recently attracting considerable attentions due to its favorable denoising performance. In this paper, we take one step forward by investigating the construction of feed-forward denoising convolutional neural networks (DnCNNs) to embrace the progress in very deep architecture, learning algorithm, and regularization method into image denoising. Specifically, residual learning and batch normalization are utilized to speed up the training process as well as boost the denoising performance. Different from the existing discriminative denoising models which usually train a specific model for additive white Gaussian noise (AWGN) at a certain noise level, our DnCNN model is able to handle Gaussian denoising with unknown noise level (i.e., blind Gaussian denoising). With the residual learning strategy, DnCNN implicitly removes the latent clean image in the hidden layers. This property motivates us to train a single DnCNN model to tackle with several general image denoising tasks such as Gaussian denoising, single image super-resolution and JPEG image deblocking. Our extensive experiments demonstrate that our DnCNN model can not only exhibit high effectiveness in several general image denoising tasks, but also be efficiently implemented by benefiting from GPU computing.
translated by 谷歌翻译
In this paper we use sparse-representation modeling for the single image scale-up problem. The goal is to recover an original image from its blurred and down-scaled noisy version. Since this problem is highly ill-posed, a prior is needed in order to solve it in a robust fashion. The literature offers various ways to address this problem, ranging from simple linear space-invariant interpolation schemes (e.g., bicubic interpolation), to spatially adaptive and non-linear filters of various sorts.In this paper, we embark from a recently-proposed algorithm by Yang et. al. [1,2], and similarly assume a local Sparse-Land model on image patches, thus stabilizing the problem. We introduce several important modifications to the above-mentioned solution, and show that these lead to improved results. These modifications include a major simplification of the overall process both in terms of the computational complexity and the algorithm architecture, using a different training approach for the dictionary-pair, and operating without a training-set by boot-strapping the scale-up task from the given low-resolution image. We demonstrate the results on true images, showing both visual and PSNR improvements.
translated by 谷歌翻译
Single image super-resolution is the task of inferring a high-resolution image from a single low-resolution input. Traditionally, the performance of algorithms for this task is measured using pixel-wise reconstruction measures such as peak signal-to-noise ratio (PSNR) which have been shown to correlate poorly with the human perception of image quality. As a result, algorithms minimizing these metrics tend to produce over-smoothed images that lack highfrequency textures and do not look natural despite yielding high PSNR values.We propose a novel application of automated texture synthesis in combination with a perceptual loss focusing on creating realistic textures rather than optimizing for a pixelaccurate reproduction of ground truth images during training. By using feed-forward fully convolutional neural networks in an adversarial training setting, we achieve a significant boost in image quality at high magnification ratios. Extensive experiments on a number of datasets show the effectiveness of our approach, yielding state-of-the-art results in both quantitative and qualitative benchmarks.
translated by 谷歌翻译
We propose an image super-resolution method (SR) using a deeply-recursive convolutional network (DRCN). Our network has a very deep recursive layer (up to 16 recursions). Increasing recursion depth can improve performance without introducing new parameters for additional convolutions. Albeit advantages, learning a DRCN is very hard with a standard gradient descent method due to exploding/vanishing gradients. To ease the difficulty of training, we propose two extensions: recursive-supervision and skip-connection. Our method outperforms previous methods by a large margin.
translated by 谷歌翻译
Recent research on super-resolution has progressed with the development of deep convolutional neural networks (DCNN). In particular, residual learning techniques exhibit improved performance. In this paper, we develop an enhanced deep super-resolution network (EDSR) with performance exceeding those of current state-of-the-art SR methods. The significant performance improvement of our model is due to optimization by removing unnecessary modules in conventional residual networks. The performance is further improved by expanding the model size while we stabilize the training procedure. We also propose a new multi-scale deep super-resolution system (MDSR) and training method, which can reconstruct high-resolution images of different upscaling factors in a single model. The proposed methods show superior performance over the state-of-the-art methods on benchmark datasets and prove its excellence by winning the NTIRE2017 Super-Resolution Challenge [26].
translated by 谷歌翻译
卷积神经网络(CNNS)成功地进行了压缩图像感测。然而,由于局部性和重量共享的归纳偏差,卷积操作证明了建模远程依赖性的内在限制。变压器,最初作为序列到序列模型设计,在捕获由于基于自我关注的架构而捕获的全局背景中,即使它可以配备有限的本地化能力。本文提出了一种混合框架,一个混合框架,其集成了从CNN提供的借用的优点以及变压器提供的全局上下文,以获得增强的表示学习。所提出的方法是由自适应采样和恢复组成的端到端压缩图像感测方法。在采样模块中,通过学习的采样矩阵测量图像逐块。在重建阶段,将测量投射到双杆中。一个是用于通过卷积建模邻域关系的CNN杆,另一个是用于采用全球自我关注机制的变压器杆。双分支结构是并发,并且本地特征和全局表示在不同的分辨率下融合,以最大化功能的互补性。此外,我们探索一个渐进的战略和基于窗口的变压器块,以降低参数和计算复杂性。实验结果表明了基于专用变压器的架构进行压缩感测的有效性,与不同数据集的最先进方法相比,实现了卓越的性能。
translated by 谷歌翻译
Deep convolutional networks have become a popular tool for image generation and restoration. Generally, their excellent performance is imputed to their ability to learn realistic image priors from a large number of example images. In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. In order to do so, we show that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, superresolution, and inpainting. Furthermore, the same prior can be used to invert deep neural representations to diagnose them, and to restore images based on flash-no flash input pairs.
translated by 谷歌翻译
Informative features play a crucial role in the single image super-resolution task. Channel attention has been demonstrated to be effective for preserving information-rich features in each layer. However, channel attention treats each convolution layer as a separate process that misses the correlation among different layers. To address this problem, we propose a new holistic attention network (HAN), which consists of a layer attention module (LAM) and a channel-spatial attention module (CSAM), to model the holistic interdependencies among layers, channels, and positions. Specifically, the proposed LAM adaptively emphasizes hierarchical features by considering correlations among layers. Meanwhile, CSAM learns the confidence at all the positions of each channel to selectively capture more informative features. Extensive experiments demonstrate that the proposed HAN performs favorably against the state-ofthe-art single image super-resolution approaches.
translated by 谷歌翻译
We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
translated by 谷歌翻译
Deep neural networks provide unprecedented performance gains in many real world problems in signal and image processing. Despite these gains, future development and practical deployment of deep networks is hindered by their blackbox nature, i.e., lack of interpretability, and by the need for very large training sets. An emerging technique called algorithm unrolling or unfolding offers promise in eliminating these issues by providing a concrete and systematic connection between iterative algorithms that are used widely in signal processing and deep neural networks. Unrolling methods were first proposed to develop fast neural network approximations for sparse coding. More recently, this direction has attracted enormous attention and is rapidly growing both in theoretic investigations and practical applications. The growing popularity of unrolled deep networks is due in part to their potential in developing efficient, high-performance and yet interpretable network architectures from reasonable size training sets. In this article, we review algorithm unrolling for signal and image processing. We extensively cover popular techniques for algorithm unrolling in various domains of signal and image processing including imaging, vision and recognition, and speech processing. By reviewing previous works, we reveal the connections between iterative algorithms and neural networks and present recent theoretical results. Finally, we provide a discussion on current limitations of unrolling and suggest possible future research directions.
translated by 谷歌翻译
单像超分辨率(SISR),作为传统的不良反对问题,通过最近的卷积神经网络(CNN)的发展得到了极大的振兴。这些基于CNN的方法通常将低分辨率图像映射到其相应的高分辨率版本,具有复杂的网络结构和损耗功能,显示出令人印象深刻的性能。本文对传统的SISR算法提供了新的洞察力,并提出了一种基本上不同的方法,依赖于迭代优化。提出了一种新颖的迭代超分辨率网络(ISRN),顶部是迭代优化。我们首先分析图像SR问题的观察模型,通过以更一般和有效的方式模仿和融合每次迭代来激发可行的解决方案。考虑到批量归一化的缺点,我们提出了一种特征归一化(F-NOM,FN)方法来调节网络中的功能。此外,开发了一种具有FN的新颖块以改善作为FNB称为FNB的网络表示。剩余剩余结构被提出形成一个非常深的网络,其中FNBS与长时间跳过连接,以获得更好的信息传递和稳定训练阶段。对BICUBIC(BI)降解的测试基准的广泛实验结果表明我们的ISRN不仅可以恢复更多的结构信息,而且还可以获得竞争或更好的PSNR / SSIM结果,与其他作品相比,参数更少。除BI之外,我们除了模拟模糊(BD)和低级噪声(DN)的实际降级。 ISRN及其延伸ISRN +两者都比使用BD和DN降级模型的其他产品更好。
translated by 谷歌翻译
Model-based optimization methods and discriminative learning methods have been the two dominant strategies for solving various inverse problems in low-level vision. Typically, those two kinds of methods have their respective merits and drawbacks, e.g., model-based optimization methods are flexible for handling different inverse problems but are usually time-consuming with sophisticated priors for the purpose of good performance; in the meanwhile, discriminative learning methods have fast testing speed but their application range is greatly restricted by the specialized task. Recent works have revealed that, with the aid of variable splitting techniques, denoiser prior can be plugged in as a modular part of model-based optimization methods to solve other inverse problems (e.g., deblurring). Such an integration induces considerable advantage when the denoiser is obtained via discriminative learning. However, the study of integration with fast discriminative denoiser prior is still lacking. To this end, this paper aims to train a set of fast and effective CNN (convolutional neural network) denoisers and integrate them into model-based optimization method to solve other inverse problems. Experimental results demonstrate that the learned set of denoisers not only achieve promising Gaussian denoising results but also can be used as prior to deliver good performance for various low-level vision applications.
translated by 谷歌翻译
The feed-forward architectures of recently proposed deep super-resolution networks learn representations of low-resolution inputs, and the non-linear mapping from those to high-resolution output. However, this approach does not fully address the mutual dependencies of low-and high-resolution images. We propose Deep Back-Projection Networks (DBPN), that exploit iterative up-and downsampling layers, providing an error feedback mechanism for projection errors at each stage. We construct mutuallyconnected up-and down-sampling stages each of which represents different types of image degradation and highresolution components. We show that extending this idea to allow concatenation of features across up-and downsampling stages (Dense DBPN) allows us to reconstruct further improve super-resolution, yielding superior results and in particular establishing new state of the art results for large scaling factors such as 8× across multiple data sets.
translated by 谷歌翻译
四分之一采样和四分之三抽样是新型传感器概念,可实现高分辨率图像而无需增加像素的数量。这是通过非规范覆盖低分辨率传感器的每个像素的一部分来实现的,使每个像素的传感器区域的一个象限或三个象限对光敏感。与使用低分辨率传感器和随后的更新采样相比,结合了正确设计的面膜和高质量的重建算法,可以实现更高的图像质量。对于后一种情况,可以使用超级分辨率网络(VDSR)等超级分辨率算法进一步增强图像质量。在本文中,我们提出了一个新型的端到端神经网络,以从非规范采样的传感器数据中重建高分辨率图像。该网络是本地完全连接的重建网络(LFCR)和标准VDSR网络的串联。总的来说,使用我们的新型神经网络布局,使用四分之三的采样传感器,与最先进的方法相比,URBAN100数据集的PSNR图像质量可以增加2.96 dB。与使用VDSR的低分辨率传感器相比,获得1.11 dB的增益。
translated by 谷歌翻译