单像超分辨率可以在需要可靠的视觉流以监视任务,处理远程操作或研究相关视觉细节的环境中支持机器人任务。在这项工作中,我们为实时超级分辨率提出了一个有效的生成对抗网络模型。我们采用了原始SRGAN的量身定制体系结构和模型量化,以提高CPU和Edge TPU设备上的执行,最多达到200 fps的推断。我们通过将其知识提炼成较小版本的网络,进一步优化我们的模型,并与标准培训方法相比获得显着的改进。我们的实验表明,与较重的最新模型相比,我们的快速和轻量级模型可保持相当令人满意的图像质量。最后,我们对图像传输进行带宽降解的实验,以突出提出的移动机器人应用系统的优势。
translated by 谷歌翻译
近年来,使用基于深入学习的架构的状态,在图像超分辨率的任务中有几个进步。先前发布的许多基于超分辨率的技术,需要高端和顶部的图形处理单元(GPU)来执行图像超分辨率。随着深度学习方法的进步越来越大,神经网络已经变得越来越多地计算饥饿。我们返回了一步,并专注于创建实时有效的解决方案。我们提出了一种在其内存足迹方面更快更小的架构。所提出的架构使用深度明智的可分离卷积来提取特征,并且它与其他超分辨率的GAN(生成对抗网络)进行接受,同时保持实时推断和低存储器占用。即使在带宽条件不佳,实时超分辨率也能够流式传输高分辨率介质内容。在维持准确性和延迟之间的有效权衡之间,我们能够生产可比较的性能模型,该性能模型是超分辨率GAN的大小的一个 - 八(1/8),并且计算的速度比超分辨率的GAN快74倍。
translated by 谷歌翻译
Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image superresolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4× upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.
translated by 谷歌翻译
最近的研究通过卷积神经网络(CNNS)显着提高了单图像超分辨率(SR)的性能。虽然可以有许多用于给定输入的高分辨率(HR)解决方案,但大多数现有的基于CNN的方法在推理期间不会探索替代解决方案。获得替代SR结果的典型方法是培训具有不同丢失权重的多个SR模型,并利用这些模型的组合。我们通过利用多任务学习,我们提出了一种更有效的方法来培训单个可调SR模型的单一可调SR模型。具体地,我们在训练期间优化具有条件目标的SR模型,其中目标是不同特征级别的多个感知损失的加权之和。权重根据给定条件而变化,并且该组重量被定义为样式控制器。此外,我们提出了一种适用于该训练方案的架构,该架构是配备有空间特征变换层的残留残余密集块。在推理阶段,我们培训的模型可以在样式控制地图上生成局部不同的输出。广泛的实验表明,所提出的SR模型在没有伪影的情况下产生各种所需的重建,并对最先进的SR方法产生相当的定量性能。
translated by 谷歌翻译
Single image super-resolution is the task of inferring a high-resolution image from a single low-resolution input. Traditionally, the performance of algorithms for this task is measured using pixel-wise reconstruction measures such as peak signal-to-noise ratio (PSNR) which have been shown to correlate poorly with the human perception of image quality. As a result, algorithms minimizing these metrics tend to produce over-smoothed images that lack highfrequency textures and do not look natural despite yielding high PSNR values.We propose a novel application of automated texture synthesis in combination with a perceptual loss focusing on creating realistic textures rather than optimizing for a pixelaccurate reproduction of ground truth images during training. By using feed-forward fully convolutional neural networks in an adversarial training setting, we achieve a significant boost in image quality at high magnification ratios. Extensive experiments on a number of datasets show the effectiveness of our approach, yielding state-of-the-art results in both quantitative and qualitative benchmarks.
translated by 谷歌翻译
基于深度学习的单图像超分辨率(SISR)方法引起了人们的关注,并在现代高级GPU上取得了巨大的成功。但是,大多数最先进的方法都需要大量参数,记忆和计算资源,这些参数通常会显示在当前移动设备CPU/NPU上时显示出较低的推理时间。在本文中,我们提出了一个简单的普通卷积网络,该网络具有快速最近的卷积模块(NCNET),该模块对NPU友好,可以实时执行可靠的超级分辨率。提出的最近的卷积具有与最近的UP采样相同的性能,但更快,更适合Android NNAPI。我们的模型可以很容易地在具有8位量化的移动设备上部署,并且与所有主要的移动AI加速器完全兼容。此外,我们对移动设备上的不同张量操作进行了全面的实验,以说明网络体系结构的效率。我们的NCNET在DIV2K 3X数据集上进行了训练和验证,并且与其他有效的SR方法的比较表明,NCNET可以实现高保真SR结果,同时使用更少的推理时间。我们的代码和预估计的模型可在\ url {https://github.com/algolzw/ncnet}上公开获得。
translated by 谷歌翻译
尽管应用于自然图像的大量成功的超分辨率重建(SRR)模型,但它们在遥感图像中的应用往往会产生差的结果。遥感图像通常比自然图像更复杂,并且具有较低分辨率的特殊性,它包含噪音,并且通常描绘了大质感表面。结果,将非专业的SRR模型应用于遥感图像,从而导致人工制品和不良的重建。为了解决这些问题,本文提出了一种受到先前研究工作启发的体系结构,引入了一种新的方法来迫使SRR模型输出现实的遥感图像:而不是依靠功能空间相似性作为感知损失,而是将其视为Pixel-从图像的归一化数字表面模型(NDSM)推断出的级别信息。该策略允许在训练模型期间应用更具信息的更新,该模型从任务(高程图推理)源中源,该模型与遥感密切相关。但是,在生产过程中不需要NDSM辅助信息,因此该模型除了其低分辨率对以外没有任何其他数据,因此该模型还没有任何其他数据。我们在两个远程感知的不同空间分辨率的数据集上评估了我们的模型,这些数据集也包含图像的DSM对:DFC2018数据集和包含卢森堡国家激光雷达飞行的数据集。根据视觉检查,推断的超分辨率图像表现出特别优越的质量。特别是,高分辨率DFC2018数据集的结果是现实的,几乎与地面真相图像没有区别。
translated by 谷歌翻译
随着深度学习(DL)的出现,超分辨率(SR)也已成为一个蓬勃发展的研究领域。然而,尽管结果有希望,但该领域仍然面临需要进一步研究的挑战,例如,允许灵活地采样,更有效的损失功能和更好的评估指标。我们根据最近的进步来回顾SR的域,并检查最新模型,例如扩散(DDPM)和基于变压器的SR模型。我们对SR中使用的当代策略进行了批判性讨论,并确定了有前途但未开发的研究方向。我们通过纳入该领域的最新发展,例如不确定性驱动的损失,小波网络,神经体系结构搜索,新颖的归一化方法和最新评估技术来补充先前的调查。我们还为整章中的模型和方法提供了几种可视化,以促进对该领域趋势的全球理解。最终,这篇综述旨在帮助研究人员推动DL应用于SR的界限。
translated by 谷歌翻译
图像超分辨率(SR)是重要的图像处理方法之一,可改善计算机视野领域的图像分辨率。在过去的二十年中,在超级分辨率领域取得了重大进展,尤其是通过使用深度学习方法。这项调查是为了在深度学习的角度进行详细的调查,对单像超分辨率的最新进展进行详细的调查,同时还将告知图像超分辨率的初始经典方法。该调查将图像SR方法分类为四个类别,即经典方法,基于学习的方法,无监督学习的方法和特定领域的SR方法。我们还介绍了SR的问题,以提供有关图像质量指标,可用参考数据集和SR挑战的直觉。使用参考数据集评估基于深度学习的方法。一些审查的最先进的图像SR方法包括增强的深SR网络(EDSR),周期循环gan(Cincgan),多尺度残留网络(MSRN),Meta残留密度网络(META-RDN) ,反复反射网络(RBPN),二阶注意网络(SAN),SR反馈网络(SRFBN)和基于小波的残留注意网络(WRAN)。最后,这项调查以研究人员将解决SR的未来方向和趋势和开放问题的未来方向和趋势。
translated by 谷歌翻译
The Super-Resolution Generative Adversarial Network (SR-GAN) [1] is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGANnetwork architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN [2] to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge 1 [3]. The code is available at https://github.com/xinntao/ESRGAN.
translated by 谷歌翻译
我们考虑单个图像超分辨率(SISR)问题,其中基于低分辨率(LR)输入产生高分辨率(HR)图像。最近,生成的对抗性网络(GANS)变得幻觉细节。大多数沿着这条线的方法依赖于预定义的单个LR-intle-hr映射,这对于SISR任务来说是足够灵活的。此外,GaN生成的假细节可能经常破坏整个图像的现实主义。我们通过为Rich-Detail SISR提出最好的伙伴GANS(Beby-GaN)来解决这些问题。放松不变的一对一的约束,我们允许估计的贴片在培训期间动态寻求最佳监督,这有利于产生更合理的细节。此外,我们提出了一种区域感知的对抗性学习策略,指导我们的模型专注于自适应地为纹理区域发电细节。广泛的实验证明了我们方法的有效性。还构建了超高分辨率4K数据集以促进未来的超分辨率研究。
translated by 谷歌翻译
Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image superresolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require the bicubic interpolation as the pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitates resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.
translated by 谷歌翻译
Because of the necessity to obtain high-quality images with minimal radiation doses, such as in low-field magnetic resonance imaging, super-resolution reconstruction in medical imaging has become more popular (MRI). However, due to the complexity and high aesthetic requirements of medical imaging, image super-resolution reconstruction remains a difficult challenge. In this paper, we offer a deep learning-based strategy for reconstructing medical images from low resolutions utilizing Transformer and Generative Adversarial Networks (T-GAN). The integrated system can extract more precise texture information and focus more on important locations through global image matching after successfully inserting Transformer into the generative adversarial network for picture reconstruction. Furthermore, we weighted the combination of content loss, adversarial loss, and adversarial feature loss as the final multi-task loss function during the training of our proposed model T-GAN. In comparison to established measures like PSNR and SSIM, our suggested T-GAN achieves optimal performance and recovers more texture features in super-resolution reconstruction of MRI scanned images of the knees and belly.
translated by 谷歌翻译
当前的深层图像超分辨率(SR)方法试图从下采样的图像或假设简单高斯内核和添加噪声中降解来恢复高分辨率图像。但是,这种简单的图像处理技术代表了降低图像分辨率的现实世界过程的粗略近似。在本文中,我们提出了一个更现实的过程,通过引入新的内核对抗学习超分辨率(KASR)框架来处理现实世界图像SR问题,以降低图像分辨率。在提议的框架中,降解内核和噪声是自适应建模的,而不是明确指定的。此外,我们还提出了一个迭代监督过程和高频选择性目标,以进一步提高模型SR重建精度。广泛的实验验证了对现实数据集中提出的框架的有效性。
translated by 谷歌翻译
当网络条件恶化时,视频会议系统的用户体验差,因为当前的视频编解码器根本无法在极低的比特率下运行。最近,已经提出了几种神经替代方案,可以使用每个框架的稀疏表示,例如面部地标信息,以非常低的比特率重建说话的头视频。但是,这些方法在通话过程中具有重大运动或遮挡的情况下会产生不良的重建,并且不会扩展到更高的分辨率。我们设计了Gemino,这是一种基于新型高频条件超分辨率管道的新型神经压缩系统,用于视频会议。 Gemino根据从单个高分辨率参考图像中提取的信息来增强高频细节(例如,皮肤纹理,头发等),为每个目标框架的一个非常低分辨率的版本(例如,皮肤纹理,头发等)。我们使用多尺度体系结构,该体系结构在不同的分辨率下运行模型的不同组件,从而使其扩展到可与720p相当的分辨率,并且我们个性化模型以学习每个人的特定细节,在低比特率上实现了更好的保真度。我们在AIORTC上实施了Gemino,这是WEBRTC的开源Python实现,并表明它在A100 GPU上实时在1024x1024视频上运行,比比特率的比特率低于传统的视频Codecs,以相同的感知质量。
translated by 谷歌翻译
知识蒸馏(KD)可以有效地将知识从繁琐的网络(教师)转移到紧凑的网络(学生),在某些计算机视觉应用中证明了其优势。知识的表示对于知识转移和学生学习至关重要,这通常以手工制作的方式定义或直接使用中间功能。在本文中,我们建议在教师学生架构下为单像超级分辨率任务提出一种模型 - 不足的元知识蒸馏方法。它提供了一种更灵活,更准确的方法,可以通过知识代表网络(KRNET)的能力来帮助教师通过具有可学习参数的知识传输知识。为了提高知识表示对学生需求的看法能力,我们建议通过采用学生特征以及KRNET中的教师和学生之间的相关性来解决从中间产出到转移知识的转型过程。具体而言,生成纹理感知的动态内核,然后提取要改进的纹理特征,并将相应的教师指导分解为质地监督,以进一步促进高频细节的恢复质量。此外,KRNET以元学习方式进行了优化,以确保知识转移和学生学习有益于提高学生的重建质量。在各种单个图像超分辨率数据集上进行的实验表明,我们所提出的方法优于现有的定义知识表示相关的蒸馏方法,并且可以帮助超分辨率算法实现更好的重建质量,而无需引入任何推理复杂性。
translated by 谷歌翻译
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
面部超分辨率(FSR),也称为面部幻觉,其旨在增强低分辨率(LR)面部图像以产生高分辨率(HR)面部图像的分辨率,是特定于域的图像超分辨率问题。最近,FSR获得了相当大的关注,并目睹了深度学习技术的发展炫目。迄今为止,有很少有基于深入学习的FSR的研究摘要。在本次调查中,我们以系统的方式对基于深度学习的FSR方法进行了全面审查。首先,我们总结了FSR的问题制定,并引入了流行的评估度量和损失功能。其次,我们详细说明了FSR中使用的面部特征和流行数据集。第三,我们根据面部特征的利用大致分类了现有方法。在每个类别中,我们从设计原则的一般描述开始,然后概述代表方法,然后讨论其中的利弊。第四,我们评估了一些最先进的方法的表现。第五,联合FSR和其他任务以及与FSR相关的申请大致介绍。最后,我们设想了这一领域进一步的技术进步的前景。在\ URL {https://github.com/junjun-jiang/face-hallucination-benchmark}上有一个策划的文件和资源的策划文件和资源清单
translated by 谷歌翻译
Recent efforts in Neural Rendering Fields (NeRF) have shown impressive results on novel view synthesis by utilizing implicit neural representation to represent 3D scenes. Due to the process of volumetric rendering, the inference speed for NeRF is extremely slow, limiting the application scenarios of utilizing NeRF on resource-constrained hardware, such as mobile devices. Many works have been conducted to reduce the latency of running NeRF models. However, most of them still require high-end GPU for acceleration or extra storage memory, which is all unavailable on mobile devices. Another emerging direction utilizes the neural light field (NeLF) for speedup, as only one forward pass is performed on a ray to predict the pixel color. Nevertheless, to reach a similar rendering quality as NeRF, the network in NeLF is designed with intensive computation, which is not mobile-friendly. In this work, we propose an efficient network that runs in real-time on mobile devices for neural rendering. We follow the setting of NeLF to train our network. Unlike existing works, we introduce a novel network architecture that runs efficiently on mobile devices with low latency and small size, i.e., saving $15\times \sim 24\times$ storage compared with MobileNeRF. Our model achieves high-resolution generation while maintaining real-time inference for both synthetic and real-world scenes on mobile devices, e.g., $18.04$ms (iPhone 13) for rendering one $1008\times756$ image of real 3D scenes. Additionally, we achieve similar image quality as NeRF and better quality than MobileNeRF (PSNR $26.15$ vs. $25.91$ on the real-world forward-facing dataset).
translated by 谷歌翻译
Single-image super-resolution (SISR) networks trained with perceptual and adversarial losses provide high-contrast outputs compared to those of networks trained with distortion-oriented losses, such as L1 or L2. However, it has been shown that using a single perceptual loss is insufficient for accurately restoring locally varying diverse shapes in images, often generating undesirable artifacts or unnatural details. For this reason, combinations of various losses, such as perceptual, adversarial, and distortion losses, have been attempted, yet it remains challenging to find optimal combinations. Hence, in this paper, we propose a new SISR framework that applies optimal objectives for each region to generate plausible results in overall areas of high-resolution outputs. Specifically, the framework comprises two models: a predictive model that infers an optimal objective map for a given low-resolution (LR) input and a generative model that applies a target objective map to produce the corresponding SR output. The generative model is trained over our proposed objective trajectory representing a set of essential objectives, which enables the single network to learn various SR results corresponding to combined losses on the trajectory. The predictive model is trained using pairs of LR images and corresponding optimal objective maps searched from the objective trajectory. Experimental results on five benchmarks show that the proposed method outperforms state-of-the-art perception-driven SR methods in LPIPS, DISTS, PSNR, and SSIM metrics. The visual results also demonstrate the superiority of our method in perception-oriented reconstruction. The code and models are available at https://github.com/seungho-snu/SROOE.
translated by 谷歌翻译