轻巧的超级分辨率(SR)模型因其在移动设备中的可用性而受到了极大的关注。许多努力采用网络量化来压缩SR模型。但是,当将SR模型定量为具有低成本层量化的超低精度(例如2位和3位)时,这些方法会遭受严重的性能降解。在本文中,我们确定性能下降来自于层的对称量化器与SR模型中高度不对称的激活分布之间的矛盾。这种差异导致量化水平上的浪费或重建图像中的细节损失。因此,我们提出了一种新型的激活量化器,称为动态双训练边界(DDTB),以适应激活的不对称性。具体而言,DDTB在:1)具有可训练上限和下限的层量化器中,以应对高度不对称的激活。 2)一个动态栅极控制器,可在运行时自适应地调整上和下限,以克服不同样品上的急剧变化的激活范围。为了减少额外的开销,将动态栅极控制器定量到2位,并仅应用于部分的一部分SR网络根据引入的动态强度。广泛的实验表明,我们的DDTB在超低精度方面表现出显着的性能提高。例如,当将EDSR量化为2位并将输出图像扩展为X4时,我们的DDTB在Urban100基准测试基准上实现了0.70dB PSNR的增加。代码位于\ url {https://github.com/zysxmu/ddtb}。
translated by 谷歌翻译
量化图像超分辨率的深卷积神经网络大大降低了它们的计算成本。然而,现有的作品既不患有4个或低位宽度的超低精度的严重性能下降,或者需要沉重的微调过程以恢复性能。据我们所知,这种对低精度的漏洞依赖于特征映射值的两个统计观察。首先,特征贴图值的分布每个通道和每个输入图像都变化显着变化。其次,特征映射具有可以主导量化错误的异常值。基于这些观察,我们提出了一种新颖的分布感知量化方案(DAQ),其促进了超低精度的准确训练量化。 DAQ的简单功能确定了具有低计算负担的特征图和权重的动态范围。此外,我们的方法通过计算每个通道的相对灵敏度来实现混合精度量化,而无需涉及任何培训过程。尽管如此,量化感知培训也适用于辅助性能增益。我们的新方法优于最近的培训甚至基于培训的量化方法,以超低精度为最先进的图像超分辨率网络。
translated by 谷歌翻译
尽管具有卷积神经网络(CNN)的图像超分辨率(SR)的突破性进步,但由于SR网络的计算复杂性很高,SR尚未享受无处不在的应用。量化是解决此问题的有前途方法之一。但是,现有的方法无法量化低于8位的位宽度的SR模型,由于固定的位宽度量化量的严重精度损失。在这项工作中,为了实现高平均比重减少,准确性损失较低,我们建议针对SR网络的新颖的内容感知动态量化(CADYQ)方法,该方法将最佳位置分配给本地区域和层,并根据输入的本地内容适应。图片。为此,引入了一个可训练的位选择器模块,以确定每一层和给定的本地图像补丁的适当位宽度和量化水平。该模块受量化灵敏度的控制,该量化通过使用贴片的图像梯度的平均幅度和层的输入特征的标准偏差来估计。拟议的量化管道已在各种SR网络上进行了测试,并对几个标准基准进行了广泛评估。计算复杂性和升高恢复精度的显着降低清楚地表明了SR提出的CADYQ框架的有效性。代码可从https://github.com/cheeun/cadyq获得。
translated by 谷歌翻译
模型量化已成为加速深度学习推理的不可或缺的技术。虽然研究人员继续推动量化算法的前沿,但是现有量化工作通常是不可否认的和不可推销的。这是因为研究人员不选择一致的训练管道并忽略硬件部署的要求。在这项工作中,我们提出了模型量化基准(MQBench),首次尝试评估,分析和基准模型量化算法的再现性和部署性。我们为实际部署选择多个不同的平台,包括CPU,GPU,ASIC,DSP,并在统一培训管道下评估广泛的最新量化算法。 MQBENCK就像一个连接算法和硬件的桥梁。我们进行全面的分析,并找到相当大的直观或反向直观的见解。通过对齐训练设置,我们发现现有的算法在传统的学术轨道上具有大致相同的性能。虽然用于硬件可部署量化,但有一个巨大的精度差距,仍然不稳定。令人惊讶的是,没有现有的算法在MQBench中赢得每一项挑战,我们希望这项工作能够激发未来的研究方向。
translated by 谷歌翻译
训练后量化(PTQ)由于其在部署量化的神经网络方面的便利性而引起了越来越多的关注。 Founding是量化误差的主要来源,仅针对模型权重进行了优化,而激活仍然使用圆形至最终操作。在这项工作中,我们首次证明了精心选择的激活圆形方案可以提高最终准确性。为了应对激活舍入方案动态性的挑战,我们通过简单的功能适应圆形边框,以在推理阶段生成圆形方案。边界函数涵盖了重量误差,激活错误和传播误差的影响,以消除元素误差的偏差,从而进一步受益于模型的准确性。我们还使边境意识到全局错误,以更好地拟合不同的到达激活。最后,我们建议使用Aquant框架来学习边界功能。广泛的实验表明,与最先进的作品相比,Aquant可以通过可忽略不计的开销来取得明显的改进,并将Resnet-18的精度提高到2位重量和激活后训练后量化下的精度最高60.3 \%。
translated by 谷歌翻译
In recent years, image and video delivery systems have begun integrating deep learning super-resolution (SR) approaches, leveraging their unprecedented visual enhancement capabilities while reducing reliance on networking conditions. Nevertheless, deploying these solutions on mobile devices still remains an active challenge as SR models are excessively demanding with respect to workload and memory footprint. Despite recent progress on on-device SR frameworks, existing systems either penalize visual quality, lead to excessive energy consumption or make inefficient use of the available resources. This work presents NAWQ-SR, a novel framework for the efficient on-device execution of SR models. Through a novel hybrid-precision quantization technique and a runtime neural image codec, NAWQ-SR exploits the multi-precision capabilities of modern mobile NPUs in order to minimize latency, while meeting user-specified quality constraints. Moreover, NAWQ-SR selectively adapts the arithmetic precision at run time to equip the SR DNN's layers with wider representational power, improving visual quality beyond what was previously possible on NPUs. Altogether, NAWQ-SR achieves an average speedup of 7.9x, 3x and 1.91x over the state-of-the-art on-device SR systems that use heterogeneous processors (MobiSR), CPU (SplitSR) and NPU (XLSR), respectively. Furthermore, NAWQ-SR delivers an average of 3.2x speedup and 0.39 dB higher PSNR over status-quo INT8 NPU designs, but most importantly mitigates the negative effects of quantization on visual quality, setting a new state-of-the-art in the attainable quality of NPU-based SR.
translated by 谷歌翻译
虽然训练后量化受到普及,但由于其逃避访问原始的完整培训数据集,但其性能差也源于此限制。为了减轻这种限制,在本文中,我们利用零击量化引入的合成数据与校准数据集,我们提出了一种细粒度的数据分布对准(FDDA)方法来提高训练后量化的性能。该方法基于我们在训练网络的深层观察到的批量归一化统计(BNS)的两个重要属性,即,阶级间分离和级别的含量。为了保留这种细粒度分布信息:1)我们计算校准数据集的每级BNS作为每个类的BNS中心,并提出了BNS集中丢失,以强制不同类的合成数据分布靠近其自己的中心。 2)我们将高斯噪声添加到中心中,以模仿压力,并提出BNS扭曲的损失,以强迫同一类的合成数据分布接近扭曲的中心。通过引入这两个细粒度的损失,我们的方法显示了在想象中心上的最先进的性能,特别是当第一层和最后一层也被量化为低比特时。我们的项目可在https://github.com/zysxmu/fdda获得。
translated by 谷歌翻译
通过利用大型内核分解和注意机制,卷积神经网络(CNN)可以在许多高级计算机视觉任务中与基于变压器的方法竞争。但是,由于远程建模的优势,具有自我注意力的变压器仍然主导着低级视野,包括超分辨率任务。在本文中,我们提出了一个基于CNN的多尺度注意网络(MAN),该网络由多尺度的大内核注意力(MLKA)和一个封闭式的空间注意单元(GSAU)组成,以提高卷积SR网络的性能。在我们的MLKA中,我们使用多尺度和栅极方案纠正LKA,以在各种粒度水平上获得丰富的注意图,从而共同汇总了全局和局部信息,并避免了潜在的阻塞伪像。在GSAU中,我们集成了栅极机制和空间注意力,以消除不必要的线性层和汇总信息丰富的空间环境。为了确认我们的设计的有效性,我们通过简单地堆叠不同数量的MLKA和GSAU来评估具有多种复杂性的人。实验结果表明,我们的人可以在最先进的绩效和计算之间实现各种权衡。代码可从https://github.com/icandle/man获得。
translated by 谷歌翻译
Although considerable progress has been obtained in neural network quantization for efficient inference, existing methods are not scalable to heterogeneous devices as one dedicated model needs to be trained, transmitted, and stored for one specific hardware setting, incurring considerable costs in model training and maintenance. In this paper, we study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one. With this representation, we can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model. To this end, we propose a simple once quantization-aware training (QAT) scheme for obtaining high-performance vertical-layered models. Our design incorporates a cascade downsampling mechanism which allows us to obtain multiple quantized networks from one full precision source model by progressively mapping the higher precision weights to their adjacent lower precision counterparts. Then, with networks of different bit-widths from one source model, multi-objective optimization is employed to train the shared source model weights such that they can be updated simultaneously, considering the performance of all networks. By doing this, the shared weights will be optimized to balance the performance of different quantized models, thus making the weights transferable among different bit widths. Experiments show that the proposed vertical-layered representation and developed once QAT scheme are effective in embodying multiple quantized networks into a single one and allow one-time training, and it delivers comparable performance as that of quantized models tailored to any specific bit-width. Code will be available.
translated by 谷歌翻译
本文提出了一种任何时间的超分辨率方法(ARM),以解决过度参数化的单图像超分辨率(SISR)模型。我们的手臂是由三个观察结果激励的:(1)不同图像贴片的性能随不同大小的SISR网络而变化。 (2)计算开销与重建图像的性能之间存在权衡。 (3)给定输入图像,其边缘信息可以是估计其PSNR的有效选择。随后,我们训练包含不同尺寸的SISR子网的手臂超网,以处理各种复杂性的图像斑块。为此,我们构建了一个边缘到PSNR查找表,该表将图像补丁的边缘分数映射到每个子网的PSNR性能,以及子网的一组计算成本。在推论中,图像贴片单独分配给不同的子网,以获得更好的计算绩效折衷。此外,每个SISR子网都共享手臂超网的权重,因此不引入额外的参数。多个子网的设置可以很好地使SISR模型的计算成本适应动态可用的硬件资源,从而可以随时使用SISR任务。对不同大小的分辨率数据集的广泛实验和流行的SISR网络作为骨架验证了我们的手臂的有效性和多功能性。源代码可在https://github.com/chenbong/arm-net上找到。
translated by 谷歌翻译
Real-world image super-resolution (RISR) has received increased focus for improving the quality of SR images under unknown complex degradation. Existing methods rely on the heavy SR models to enhance low-resolution (LR) images of different degradation levels, which significantly restricts their practical deployments on resource-limited devices. In this paper, we propose a novel Dynamic Channel Splitting scheme for efficient Real-world Image Super-Resolution, termed DCS-RISR. Specifically, we first introduce the light degradation prediction network to regress the degradation vector to simulate the real-world degradations, upon which the channel splitting vector is generated as the input for an efficient SR model. Then, a learnable octave convolution block is proposed to adaptively decide the channel splitting scale for low- and high-frequency features at each block, reducing computation overhead and memory cost by offering the large scale to low-frequency features and the small scale to the high ones. To further improve the RISR performance, Non-local regularization is employed to supplement the knowledge of patches from LR and HR subspace with free-computation inference. Extensive experiments demonstrate the effectiveness of DCS-RISR on different benchmark datasets. Our DCS-RISR not only achieves the best trade-off between computation/parameter and PSNR/SSIM metric, and also effectively handles real-world images with different degradation levels.
translated by 谷歌翻译
随着深度学习(DL)的出现,超分辨率(SR)也已成为一个蓬勃发展的研究领域。然而,尽管结果有希望,但该领域仍然面临需要进一步研究的挑战,例如,允许灵活地采样,更有效的损失功能和更好的评估指标。我们根据最近的进步来回顾SR的域,并检查最新模型,例如扩散(DDPM)和基于变压器的SR模型。我们对SR中使用的当代策略进行了批判性讨论,并确定了有前途但未开发的研究方向。我们通过纳入该领域的最新发展,例如不确定性驱动的损失,小波网络,神经体系结构搜索,新颖的归一化方法和最新评估技术来补充先前的调查。我们还为整章中的模型和方法提供了几种可视化,以促进对该领域趋势的全球理解。最终,这篇综述旨在帮助研究人员推动DL应用于SR的界限。
translated by 谷歌翻译
基于卷积神经网络(CNN)的现代单图像超分辨率(SISR)系统实现了花哨的性能,而需要巨大的计算成本。在视觉识别任务中对特征冗余的问题进行了很好的研究,但很少在SISR中进行讨论。基于这样的观察,SISR模型中的许多功能也彼此相似,我们建议使用Shift操作来生成冗余功能(即幽灵功能)。与在类似GPU的设备上耗时的深度卷积相比,Shift操作可以为CNN带来实用的推理加速度。我们分析了SISR操作对SISR任务的好处,并根据Gumbel-SoftMax技巧使Shift取向可学习。此外,基于预训练的模型探索了聚类过程,以识别用于生成内在特征的内在过滤器。幽灵功能将通过沿特定方向移动这些内在功能来得出。最后,完整的输出功能是通过将固有和幽灵特征串联在一起来构建的。在几个基准模型和数据集上进行的广泛实验表明,嵌入了所提出方法的非压缩和轻质SISR模型都可以实现与基准的可比性能,并大大降低了参数,拖台和GPU推荐延迟。例如,我们将参数降低46%,FLOPS掉落46%,而GPU推断潜伏期则减少了$ \ times2 $ EDSR网络的42%,基本上是无损的。
translated by 谷歌翻译
联合超分辨率和反音调映射(联合SR-ITM)旨在增加低分辨率和标准动态范围图像的分辨率和动态范围。重点方法主要是诉诸图像分解技术,使用多支化的网络体系结构。 ,这些方法采用的刚性分解在很大程度上将其力量限制在各种图像上。为了利用其潜在能力,在本文中,我们将分解机制从图像域概括为更广泛的特征域。为此,我们提出了一个轻巧的特征分解聚合网络(FDAN)。特别是,我们设计了一个功能分解块(FDB),可以实现功能细节和对比度的可学习分离。通过级联FDB,我们可以建立一个用于强大的多级特征分解的分层功能分解组。联合SR-ITM,\ ie,SRITM-4K的新基准数据集,该数据集是大规模的,为足够的模型培训和评估提供了多功能方案。两个基准数据集的实验结果表明,我们的FDAN表明我们的FDAN有效,并且胜过了以前的方法sr-itm.ar代码和数据集将公开发布。
translated by 谷歌翻译
最近,低精确的深度学习加速器(DLA)由于其在芯片区域和能源消耗方面的优势而变得流行,但是这些DLA上的低精确量化模型导致严重的准确性降解。达到高精度和高效推断的一种方法是在低精度DLA上部署高精度神经网络,这很少被研究。在本文中,我们提出了平行的低精确量化(PALQUANT)方法,该方法通过从头开始学习并行低精度表示来近似高精度计算。此外,我们提出了一个新型的循环洗牌模块,以增强平行低精度组之间的跨组信息通信。广泛的实验表明,PALQUANT的精度和推理速度既优于最先进的量化方法,例如,对于RESNET-18网络量化,PALQUANT可以获得0.52 \%的准确性和1.78 $ \ times $ speedup同时获得在最先进的2位加速器上的4位反片机上。代码可在\ url {https://github.com/huqinghao/palquant}中获得。
translated by 谷歌翻译
Convolutional neural network (CNN) depth is of crucial importance for image super-resolution (SR). However, we observe that deeper networks for image SR are more difficult to train. The lowresolution inputs and features contain abundant low-frequency information, which is treated equally across channels, hence hindering the representational ability of CNNs. To solve these problems, we propose the very deep residual channel attention networks (RCAN). Specifically, we propose a residual in residual (RIR) structure to form very deep network, which consists of several residual groups with long skip connections. Each residual group contains some residual blocks with short skip connections. Meanwhile, RIR allows abundant low-frequency information to be bypassed through multiple skip connections, making the main network focus on learning high-frequency information. Furthermore, we propose a channel attention mechanism to adaptively rescale channel-wise features by considering interdependencies among channels. Extensive experiments show that our RCAN achieves better accuracy and visual improvements against state-of-the-art methods.
translated by 谷歌翻译
Recently, Convolutional Neural Network (CNN) based models have achieved great success in Single Image Super-Resolution (SISR). Owing to the strength of deep networks, these CNN models learn an effective nonlinear mapping from the low-resolution input image to the high-resolution target image, at the cost of requiring enormous parameters. This paper proposes a very deep CNN model (up to 52 convolutional layers) named Deep Recursive Residual Network (DRRN) that strives for deep yet concise networks. Specifically, residual learning is adopted, both in global and local manners, to mitigate the difficulty of training very deep net-works; recursive learning is used to control the model parameters while increasing the depth. Extensive benchmark evaluation shows that DRRN significantly outperforms state of the art in SISR, while utilizing far fewer parameters. Code is available at https://github.com/tyshiwo /DRRN CVPR17.
translated by 谷歌翻译
不断需要在低容量设备上使用的图像超分辨率(SR)的高性能和计算有效的神经网络模型。获取此类模型的一种方法是压缩现有体系结构,例如量化。另一个选择是发现新的有效解决方案的神经体系结构搜索(NAS)。我们为专门设计的SR搜索空间提出了一种新颖的量化NAS程序。我们的方法执行NAS以找到量化友好的SR模型。搜索依赖于将量化噪声添加到参数和激活中,而不是直接量化参数。我们的Quontnas比固定体系结构的均匀或混合精度量化找到了具有更好的PSNR/BITOP权衡的体系结构。此外,我们对噪声过程的搜索比直接量化权重的速度快30%。
translated by 谷歌翻译
二进制神经网络(BNN)提供了一种有希望的解决方案,可以将参数密集的深度单像超分辨率(SISR)模型部署到具有有限的存储和计算资源的真实设备上。为了实现与完整精确的对应物的可比性能,大多数现有的SISR现有BNN主要集中于补偿通过更好地近似于二进制卷积,将网络中的权重和网络激活产生的信息损失。在这项研究中,我们重新审视了BNN及其全精度对应物之间的差异,并认为BNN的良好概括性能的关键在于保留完整的完整过度信息流以及通过每个二进制卷积层经过的准确梯度流量。受此启发的启发,我们建议在整个网络上引入完整的跳过连接或其在每个二元卷积层上的变体,这可以提高正向表达能力和背部传播梯度的准确性,从而提高概括性能。更重要的是,此类方案适用于任何现有的BNN骨架用于SISR,而无需引入任何其他计算成本。为了证明其功效,我们使用四个基准数据集中使用SISR的四个不同的骨干对其进行评估,并报告明显优于现有BNN甚至一些4位竞争对手。
translated by 谷歌翻译
Image super-resolution (SR) is a technique to recover lost high-frequency information in low-resolution (LR) images. Spatial-domain information has been widely exploited to implement image SR, so a new trend is to involve frequency-domain information in SR tasks. Besides, image SR is typically application-oriented and various computer vision tasks call for image arbitrary magnification. Therefore, in this paper, we study image features in the frequency domain to design a novel scale-arbitrary image SR network. First, we statistically analyze LR-HR image pairs of several datasets under different scale factors and find that the high-frequency spectra of different images under different scale factors suffer from different degrees of degradation, but the valid low-frequency spectra tend to be retained within a certain distribution range. Then, based on this finding, we devise an adaptive scale-aware feature division mechanism using deep reinforcement learning, which can accurately and adaptively divide the frequency spectrum into the low-frequency part to be retained and the high-frequency one to be recovered. Finally, we design a scale-aware feature recovery module to capture and fuse multi-level features for reconstructing the high-frequency spectrum at arbitrary scale factors. Extensive experiments on public datasets show the superiority of our method compared with state-of-the-art methods.
translated by 谷歌翻译