Image super-resolution (SR) serves as a fundamental tool for the processing and transmission of multimedia data. Recently, Transformer-based models have achieved competitive performances in image SR. They divide images into fixed-size patches and apply self-attention on these patches to model long-range dependencies among pixels. However, this architecture design is originated for high-level vision tasks, which lacks design guideline from SR knowledge. In this paper, we aim to design a new attention block whose insights are from the interpretation of Local Attribution Map (LAM) for SR networks. Specifically, LAM presents a hierarchical importance map where the most important pixels are located in a fine area of a patch and some less important pixels are spread in a coarse area of the whole image. To access pixels in the coarse area, instead of using a very large patch size, we propose a lightweight Global Pixel Access (GPA) module that applies cross-attention with the most similar patch in an image. In the fine area, we use an Intra-Patch Self-Attention (IPSA) module to model long-range pixel dependencies in a local patch, and then a $3\times3$ convolution is applied to process the finest details. In addition, a Cascaded Patch Division (CPD) strategy is proposed to enhance perceptual quality of recovered images. Extensive experiments suggest that our method outperforms state-of-the-art lightweight SR methods by a large margin. Code is available at https://github.com/passerer/HPINet.
translated by 谷歌翻译
通过利用大型内核分解和注意机制,卷积神经网络(CNN)可以在许多高级计算机视觉任务中与基于变压器的方法竞争。但是,由于远程建模的优势,具有自我注意力的变压器仍然主导着低级视野,包括超分辨率任务。在本文中,我们提出了一个基于CNN的多尺度注意网络(MAN),该网络由多尺度的大内核注意力(MLKA)和一个封闭式的空间注意单元(GSAU)组成,以提高卷积SR网络的性能。在我们的MLKA中,我们使用多尺度和栅极方案纠正LKA,以在各种粒度水平上获得丰富的注意图,从而共同汇总了全局和局部信息,并避免了潜在的阻塞伪像。在GSAU中,我们集成了栅极机制和空间注意力,以消除不必要的线性层和汇总信息丰富的空间环境。为了确认我们的设计的有效性,我们通过简单地堆叠不同数量的MLKA和GSAU来评估具有多种复杂性的人。实验结果表明,我们的人可以在最先进的绩效和计算之间实现各种权衡。代码可从https://github.com/icandle/man获得。
translated by 谷歌翻译
Recently, Transformer-based image restoration networks have achieved promising improvements over convolutional neural networks due to parameter-independent global interactions. To lower computational cost, existing works generally limit self-attention computation within non-overlapping windows. However, each group of tokens are always from a dense area of the image. This is considered as a dense attention strategy since the interactions of tokens are restrained in dense regions. Obviously, this strategy could result in restricted receptive fields. To address this issue, we propose Attention Retractable Transformer (ART) for image restoration, which presents both dense and sparse attention modules in the network. The sparse attention module allows tokens from sparse areas to interact and thus provides a wider receptive field. Furthermore, the alternating application of dense and sparse attention modules greatly enhances representation ability of Transformer while providing retractable attention on the input image.We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks. Experimental results validate that our proposed ART outperforms state-of-the-art methods on various benchmark datasets both quantitatively and visually. We also provide code and models at the website https://github.com/gladzhang/ART.
translated by 谷歌翻译
Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from lowquality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14∼0.45dB, while the total number of parameters can be reduced by up to 67%.
translated by 谷歌翻译
Informative features play a crucial role in the single image super-resolution task. Channel attention has been demonstrated to be effective for preserving information-rich features in each layer. However, channel attention treats each convolution layer as a separate process that misses the correlation among different layers. To address this problem, we propose a new holistic attention network (HAN), which consists of a layer attention module (LAM) and a channel-spatial attention module (CSAM), to model the holistic interdependencies among layers, channels, and positions. Specifically, the proposed LAM adaptively emphasizes hierarchical features by considering correlations among layers. Meanwhile, CSAM learns the confidence at all the positions of each channel to selectively capture more informative features. Extensive experiments demonstrate that the proposed HAN performs favorably against the state-ofthe-art single image super-resolution approaches.
translated by 谷歌翻译
随着深度学习的发展,单图像超分辨率(SISR)取得了重大突破。最近,已经提出了基于全局特征交互的SISR网络性能的方法。但是,需要动态地忽略对上下文的响应的神经元的功能。为了解决这个问题,我们提出了一个轻巧的交叉障碍性推理网络(CFIN),这是一个由卷积神经网络(CNN)和变压器组成的混合网络。具体而言,一种新型的交叉磁场导向变压器(CFGT)旨在通过使用调制卷积内核与局部代表性语义信息结合来自适应修改网络权重。此外,提出了基于CNN的跨尺度信息聚合模块(CIAM),以使模型更好地专注于潜在的实用信息并提高变压器阶段的效率。广泛的实验表明,我们提出的CFIN是一种轻巧有效的SISR模型,可以在计算成本和模型性能之间达到良好的平衡。
translated by 谷歌翻译
Convolutional neural network (CNN) depth is of crucial importance for image super-resolution (SR). However, we observe that deeper networks for image SR are more difficult to train. The lowresolution inputs and features contain abundant low-frequency information, which is treated equally across channels, hence hindering the representational ability of CNNs. To solve these problems, we propose the very deep residual channel attention networks (RCAN). Specifically, we propose a residual in residual (RIR) structure to form very deep network, which consists of several residual groups with long skip connections. Each residual group contains some residual blocks with short skip connections. Meanwhile, RIR allows abundant low-frequency information to be bypassed through multiple skip connections, making the main network focus on learning high-frequency information. Furthermore, we propose a channel attention mechanism to adaptively rescale channel-wise features by considering interdependencies among channels. Extensive experiments show that our RCAN achieves better accuracy and visual improvements against state-of-the-art methods.
translated by 谷歌翻译
Recently, great progress has been made in single-image super-resolution (SISR) based on deep learning technology. However, the existing methods usually require a large computational cost. Meanwhile, the activation function will cause some features of the intermediate layer to be lost. Therefore, it is a challenge to make the model lightweight while reducing the impact of intermediate feature loss on the reconstruction quality. In this paper, we propose a Feature Interaction Weighted Hybrid Network (FIWHN) to alleviate the above problem. Specifically, FIWHN consists of a series of novel Wide-residual Distillation Interaction Blocks (WDIB) as the backbone, where every third WDIBs form a Feature shuffle Weighted Group (FSWG) by mutual information mixing and fusion. In addition, to mitigate the adverse effects of intermediate feature loss on the reconstruction results, we introduced a well-designed Wide Convolutional Residual Weighting (WCRW) and Wide Identical Residual Weighting (WIRW) units in WDIB, and effectively cross-fused features of different finenesses through a Wide-residual Distillation Connection (WRDC) framework and a Self-Calibrating Fusion (SCF) unit. Finally, to complement the global features lacking in the CNN model, we introduced the Transformer into our model and explored a new way of combining the CNN and Transformer. Extensive quantitative and qualitative experiments on low-level and high-level tasks show that our proposed FIWHN can achieve a good balance between performance and efficiency, and is more conducive to downstream tasks to solve problems in low-pixel scenarios.
translated by 谷歌翻译
As the quality of optical sensors improves, there is a need for processing large-scale images. In particular, the ability of devices to capture ultra-high definition (UHD) images and video places new demands on the image processing pipeline. In this paper, we consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution. We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms. As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method. The core components of LLFormer are the axis-based multi-head self-attention and cross-layer attention fusion block, which significantly reduces the linear complexity. Extensive experiments on the new dataset and existing public datasets show that LLFormer outperforms state-of-the-art methods. We also show that employing existing LLIE methods trained on our benchmark as a pre-processing step significantly improves the performance of downstream tasks, e.g., face detection in low-light conditions. The source code and pre-trained models are available at https://github.com/TaoWangzj/LLFormer.
translated by 谷歌翻译
基于深度卷积神经网络(CNN)的单图像飞机方法已取得了重大成功。以前的方法致力于通过增加网络的深度和宽度来改善网络的性能。当前的方法着重于增加卷积内核的大小,以通过受益于更大的接受场来增强其性能。但是,直接增加卷积内核的大小会引入大量计算开销和参数。因此,本文设计了一个新型的大内核卷积驱动块(LKD块),该磁带(LKD块)由分解深度大核卷积块(DLKCB)和通道增强的进料前向前网络(CEFN)组成。设计的DLKCB可以将深度大的内核卷积分为较小的深度卷积和深度扩张的卷积,而无需引入大量参数和计算开销。同时,设计的CEFN将通道注意机制纳入馈电网络中,以利用重要的通道并增强鲁棒性。通过组合多个LKD块和上向下的采样模块,可以进行大内核卷积DeHaze网络(LKD-NET)。评估结果证明了设计的DLKCB和CEFN的有效性,而我们的LKD-NET优于最先进的功能。在SOTS室内数据集上,我们的LKD-NET极大地优于基于变压器的方法Dehamer,只有1.79%#PARAM和48.9%的FLOPS。我们的LKD-NET的源代码可在https://github.com/swu-cs-medialab/lkd-net上获得。
translated by 谷歌翻译
基于变压器的方法与基于CNN的方法相比,由于其对远程依赖性的模型,因此获得了令人印象深刻的图像恢复性能。但是,像Swinir这样的进步采用了基于窗口的和本地注意力的策略来平衡性能和计算开销,这限制了采用大型接收领域来捕获全球信息并在早期层中建立长期依赖性。为了进一步提高捕获全球信息的效率,在这项工作中,我们建议Swinfir通过更换具有整个图像范围的接收场的快速傅立叶卷积(FFC)组件来扩展Swinir。我们还重新访问其他先进技术,即数据增强,预训练和功能集合,以改善图像重建的效果。并且我们的功能合奏方法使模型的性能得以大大增强,而无需增加训练和测试时间。与现有方法相比,我们将算法应用于多个流行的大规模基准,并实现了最先进的性能。例如,我们的Swinfir在漫画109数据集上达到了32.83 dB的PSNR,该PSNR比最先进的Swinir方法高0.8 dB。
translated by 谷歌翻译
现实世界图像Denoising是一个实用的图像恢复问题,旨在从野外嘈杂的输入中获取干净的图像。最近,Vision Transformer(VIT)表现出强大的捕获远程依赖性的能力,许多研究人员试图将VIT应用于图像DeNosing任务。但是,现实世界的图像是一个孤立的框架,它使VIT构建了内部贴片的远程依赖性,该依赖性将图像分为贴片并混乱噪声模式和梯度连续性。在本文中,我们建议通过使用连续的小波滑动转换器来解决此问题,该小波滑动转换器在现实世界中构建频率对应关系,称为dnswin。具体而言,我们首先使用CNN编码器从嘈杂的输入图像中提取底部功能。 DNSWIN的关键是将高频和低频信息与功能和构建频率依赖性分开。为此,我们提出了小波滑动窗口变压器,该变压器利用离散的小波变换,自我注意力和逆离散小波变换来提取深度特征。最后,我们使用CNN解码器将深度特征重建为DeNo的图像。对现实世界的基准测试的定量和定性评估都表明,拟议的DNSWIN对最新方法的表现良好。
translated by 谷歌翻译
基于卷积神经网络的单图像超分辨率(SISR)近年来取得了很大进展。然而,由于计算和内存成本,难以将这些方法应用于现实世界场景。同时,如何充分利用中间特征在有限的参数和计算的约束下是一个巨大的挑战。为了减轻这些问题,我们提出了一种轻量级但有效的特征蒸馏交互加权网络(FDIWN)。具体地,FDIWN利用一系列专门设计的特征随机加权组(FSWG)作为骨干,以及几种新的相互宽残留蒸馏相互作用块(WDIB)形成FSWG。另外,将宽相同的残余加权(WIRW)单元和宽卷积残余加权(WCRW)单元引入WDIB以进行更好的特征蒸馏。此外,提出了一种宽残留的蒸馏连接(WRDC)框架和自校准融合(SCF)单元,以更灵活和有效地与不同的尺度相互作用。扩大实验表明,我们的FDIWN优于其他模型来攻击良好的模型模型性能与效率之间的平衡。代码可在https://github.com/iviplab/fdiwn获得。
translated by 谷歌翻译
联合超分辨率和反音调映射(联合SR-ITM)旨在增加低分辨率和标准动态范围图像的分辨率和动态范围。重点方法主要是诉诸图像分解技术,使用多支化的网络体系结构。 ,这些方法采用的刚性分解在很大程度上将其力量限制在各种图像上。为了利用其潜在能力,在本文中,我们将分解机制从图像域概括为更广泛的特征域。为此,我们提出了一个轻巧的特征分解聚合网络(FDAN)。特别是,我们设计了一个功能分解块(FDB),可以实现功能细节和对比度的可学习分离。通过级联FDB,我们可以建立一个用于强大的多级特征分解的分层功能分解组。联合SR-ITM,\ ie,SRITM-4K的新基准数据集,该数据集是大规模的,为足够的模型培训和评估提供了多功能方案。两个基准数据集的实验结果表明,我们的FDAN表明我们的FDAN有效,并且胜过了以前的方法sr-itm.ar代码和数据集将公开发布。
translated by 谷歌翻译
由于卷积神经网络(CNNS)在从大规模数据中进行了学习的可概括图像前沿执行井,因此这些模型已被广泛地应用于图像恢复和相关任务。最近,另一类神经架构,变形金刚表现出对自然语言和高级视觉任务的显着性能。虽然变压器模型减轻了CNNS的缺点(即,有限的接收领域并对输入内容而无关),但其计算复杂性以空间分辨率二次大转,因此可以对涉及高分辨率图像的大多数图像恢复任务应用得不可行。在这项工作中,我们通过在构建块(多头关注和前锋网络)中进行多个关键设计,提出了一种有效的变压器模型,使得它可以捕获远程像素相互作用,同时仍然适用于大图像。我们的模型,命名恢复变压器(RESTORMER),实现了最先进的结果,导致几种图像恢复任务,包括图像派生,单图像运动脱棕,散焦去纹(单图像和双像素数据)和图像去噪(高斯灰度/颜色去噪,真实的图像去噪)。源代码和预先训练的型号可在https://github.com/swz30/restormer上获得。
translated by 谷歌翻译
卷积神经网络在过去十年中允许在单个图像超分辨率(SISR)中的显着进展。在SISR最近的进展中,关注机制对于高性能SR模型至关重要。但是,注意机制仍然不清楚为什么它在SISR中的工作原理。在这项工作中,我们试图量化和可视化SISR中的注意力机制,并表明并非所有关注模块都同样有益。然后,我们提出了关注网络(A $ ^ 2 $ n)的注意力,以获得更高效和准确的SISR。具体来说,$ ^ 2 $ n包括非关注分支和耦合注意力分支。提出了一种动态注意力模块,为这两个分支产生权重,以动态地抑制不需要的注意力调整,其中权重根据输入特征自适应地改变。这允许注意模块专门从事惩罚的有益实例,从而大大提高了注意力网络的能力,即几个参数开销。实验结果表明,我们的最终模型A $ ^ 2 $ n可以实现与类似尺寸的最先进网络相比的卓越的权衡性能。代码可以在https://github.com/haoyuc/a2n获得。
translated by 谷歌翻译
随着卷积神经网络最近的大规模发展,已经提出了用于边缘设备上实用部署的大量基于CNN的显着图像超分辨率方法。但是,大多数现有方法都集中在一个特定方面:网络或损失设计,这导致难以最大程度地减少模型大小。为了解决这个问题,我们得出结论,设计,架构搜索和损失设计,以获得更有效的SR结构。在本文中,我们提出了一个名为EFDN的边缘增强功能蒸馏网络,以保留在约束资源下的高频信息。详细说明,我们基于现有的重新处理方法构建了一个边缘增强卷积块。同时,我们提出了边缘增强的梯度损失,以校准重新分配的路径训练。实验结果表明,我们的边缘增强策略可以保持边缘并显着提高最终恢复质量。代码可在https://github.com/icandle/efdn上找到。
translated by 谷歌翻译
否决单图是一项普遍但又具有挑战性的任务。复杂的降雪降解和各种降解量表需要强大的代表能力。为了使否定的网络看到各种降雪并建模本地细节和全球信息的上下文相互作用,我们提出了一种称为Snowformer的功能强大的建筑。首先,它在编码器中执行比例感知功能聚合,以捕获各种降解的丰富积雪信息。其次,为了解决大规模降级,它使用了解码器中的新颖上下文交互变压器块,该互动器块在全球上下文交互中从前范围内的局部细节和全局信息进行了上下文交互。并引入本地上下文互动可改善场景细节的恢复。第三,我们设计了一个异质的特征投影头,该功能投影头逐渐融合了编码器和解码器的特征,并将精制功能投影到干净的图像中。广泛的实验表明,所提出的雪诺形雪孔比其他SOTA方法取得了重大改进。与SOTA单图像HDCW-NET相比,它在CSD测试集上将PSNR度量提高了9.2dB。此外,与一般图像恢复体系结构NAFNET相比,PSNR的增加5.13db,这验证了我们的雪诺形雪地降雪任务的强大表示能力。该代码在\ url {https://github.com/ephemeral182/snowformer}中发布。
translated by 谷歌翻译
基于卷积神经网络(CNN)的现代单图像超分辨率(SISR)系统实现了花哨的性能,而需要巨大的计算成本。在视觉识别任务中对特征冗余的问题进行了很好的研究,但很少在SISR中进行讨论。基于这样的观察,SISR模型中的许多功能也彼此相似,我们建议使用Shift操作来生成冗余功能(即幽灵功能)。与在类似GPU的设备上耗时的深度卷积相比,Shift操作可以为CNN带来实用的推理加速度。我们分析了SISR操作对SISR任务的好处,并根据Gumbel-SoftMax技巧使Shift取向可学习。此外,基于预训练的模型探索了聚类过程,以识别用于生成内在特征的内在过滤器。幽灵功能将通过沿特定方向移动这些内在功能来得出。最后,完整的输出功能是通过将固有和幽灵特征串联在一起来构建的。在几个基准模型和数据集上进行的广泛实验表明,嵌入了所提出方法的非压缩和轻质SISR模型都可以实现与基准的可比性能,并大大降低了参数,拖台和GPU推荐延迟。例如,我们将参数降低46%,FLOPS掉落46%,而GPU推断潜伏期则减少了$ \ times2 $ EDSR网络的42%,基本上是无损的。
translated by 谷歌翻译
卷积神经网络(CNNS)成功地进行了压缩图像感测。然而,由于局部性和重量共享的归纳偏差,卷积操作证明了建模远程依赖性的内在限制。变压器,最初作为序列到序列模型设计,在捕获由于基于自我关注的架构而捕获的全局背景中,即使它可以配备有限的本地化能力。本文提出了一种混合框架,一个混合框架,其集成了从CNN提供的借用的优点以及变压器提供的全局上下文,以获得增强的表示学习。所提出的方法是由自适应采样和恢复组成的端到端压缩图像感测方法。在采样模块中,通过学习的采样矩阵测量图像逐块。在重建阶段,将测量投射到双杆中。一个是用于通过卷积建模邻域关系的CNN杆,另一个是用于采用全球自我关注机制的变压器杆。双分支结构是并发,并且本地特征和全局表示在不同的分辨率下融合,以最大化功能的互补性。此外,我们探索一个渐进的战略和基于窗口的变压器块,以降低参数和计算复杂性。实验结果表明了基于专用变压器的架构进行压缩感测的有效性,与不同数据集的最先进方法相比,实现了卓越的性能。
translated by 谷歌翻译