translated by 谷歌翻译
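The recent success of NeRF and other related implicit neural representation methods has opened a new path for continuous image representation, in which pixel values no longer need to be looked up from stored discrete 2D arrays but can instead be inferred from a neural network model over a continuous spatial domain. Although the recent work LIIF has shown that this novel approach can achieve good performance on arbitrary-scale super-resolution tasks, its upscaled images frequently show structural distortion because high-frequency textures are predicted inaccurately. In this work, we propose UltraSR, a simple yet effective new network design based on implicit image functions, in which spatial coordinates and periodic encoding are deeply integrated with the implicit neural representation. Through extensive experiments and ablation studies, we show that spatial encoding is a missing key towards the next stage of high-performing implicit image functions. Compared with previous state-of-the-art methods, UltraSR sets new state-of-the-art performance on the DIV2K benchmark at all super-resolution scales. UltraSR also achieves superior performance on other standard benchmark datasets, where it outperforms prior work in almost all experiments.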
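The abstract above does not spell out the form of the periodic encoding, so the following is only a minimal sketch under an assumption: it uses the NeRF-style sinusoidal encoding with octave-spaced frequencies, and the function name `periodic_encoding` and the parameter `num_freqs` are hypothetical, not taken from UltraSR.

```python
import math


def periodic_encoding(x, y, num_freqs=4):
    """Map a 2D coordinate in [-1, 1] to sin/cos features.

    Assumption: octave-spaced frequencies (2^k * pi), as in NeRF-style
    positional encoding. The actual UltraSR encoding may differ.
    """
    features = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        for value in (x, y):
            features.append(math.sin(freq * value))
            features.append(math.cos(freq * value))
    return features


encoded = periodic_encoding(0.25, -0.5)
# 4 frequencies * 2 axes * (sin, cos) features
print(len(encoded))
```

In a network of this kind, such features would typically be concatenated with the raw coordinate and the local image feature before being passed to the decoder, which gives the decoder direct access to high-frequency functions of position.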
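Nowadays, screen content is growing explosively owing to the wide application of screen sharing, remote cooperation, and online education. To match limited terminal bandwidth, high-resolution (HR) screen content may be downsampled and compressed. At the receiver side, super-resolution (SR) of low-resolution (LR) screen content images (SCIs) is in high demand, either for HR displays or for users who zoom in to observe details. However, image SR methods are mostly designed for natural images and do not generalize well to SCIs, because the image characteristics are very different and because SCIs need to be browsed at arbitrary scales. To this end, we propose a novel Implicit Transformer Super-Resolution Network (ITSRN) for SCISR. For high-quality continuous SR at arbitrary ratios, the proposed implicit transformer infers pixel values at query coordinates from image features at key coordinates, and an implicit position encoding scheme is proposed to aggregate neighboring pixel values similar to the query one. We construct the benchmark SCI1K and SCI1K-compression datasets with LR and HR SCI pairs. Extensive experiments show that the proposed ITSRN significantly outperforms several competitive continuous and discrete SR methods on both compressed and uncompressed SCIs.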
How to represent an image? While the visual world is presented in a continuous manner, machines store and see images in a discrete way, as 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by the recent progress in 3D reconstruction with implicit neural representation, we propose Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around the coordinate as inputs and predicts the RGB value at the given coordinate as output. Since the coordinates are continuous, LIIF can be presented in arbitrary resolution. To generate the continuous representation for images, we train an encoder with the LIIF representation via a self-supervised super-resolution task. The learned continuous representation can be presented in arbitrary resolution and can even extrapolate to ×30 higher resolution, for which no training tasks are provided. We further show that the LIIF representation builds a bridge between discrete and continuous representation in 2D: it naturally supports learning tasks with size-varied image ground truths and significantly outperforms the method that resizes the ground truths. Our project page with code is at https://yinboc.github.io/liif/.
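The idea of decoding a value from a continuous coordinate plus nearby features can be illustrated with a deliberately simplified sketch. It is not the LIIF implementation: it uses a single scalar latent code per cell, picks only the nearest code, and replaces the learned decoder with a hypothetical `toy_decoder`; all function names here are assumptions made for illustration.

```python
def cell_centers(n):
    """Centers of n equally sized cells covering [-1, 1]."""
    return [-1.0 + (2 * i + 1) / n for i in range(n)]


def query(features, decoder, x, y):
    """Predict a value at continuous (x, y) from the nearest latent code."""
    h, w = len(features), len(features[0])
    ys, xs = cell_centers(h), cell_centers(w)
    i = min(range(h), key=lambda r: abs(ys[r] - y))
    j = min(range(w), key=lambda c: abs(xs[c] - x))
    # The decoder sees the latent code and the offset from its cell center.
    rel = (x - xs[j], y - ys[i])
    return decoder(features[i][j], rel)


def render(features, decoder, out_h, out_w):
    """Sample the continuous function on an out_h x out_w pixel grid."""
    return [[query(features, decoder, x, y) for x in cell_centers(out_w)]
            for y in cell_centers(out_h)]


def toy_decoder(latent, rel):
    # Hypothetical stand-in for a learned MLP: linear in the offset.
    return latent + 0.5 * (rel[0] + rel[1])


features = [[0.0, 1.0], [2.0, 3.0]]              # a 2 x 2 grid of latent codes
image = render(features, toy_decoder, 3, 5)      # any output size works
print(len(image), len(image[0]))
```

Because `query` accepts any real-valued coordinate, the same 2 x 2 feature grid can be rendered at 3 x 5 or at any other resolution, which is the property the abstract refers to as presenting the image in arbitrary resolution.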
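Image representation is critical for many visual tasks. Instead of representing images discretely with 2D arrays of pixels, a recent study, the local implicit image function (LIIF), represents an image as a continuous function in which pixel values are obtained by using the corresponding coordinates as inputs. Owing to its continuous nature, LIIF can be adopted for arbitrary-scale image super-resolution, yielding a single effective and efficient model for various up-scaling factors. However, LIIF often suffers from structural distortions and ringing artifacts around edges, mainly because all pixels share the same model, which ignores the local properties of the image. In this paper, we propose a novel adaptive local image function (A-LIIF) to alleviate this problem. Specifically, A-LIIF consists of two main components: an encoder and an expansion network. The former captures cross-scale image features, while the latter models the continuous up-scaling function as a weighted combination of multiple local implicit image functions. Accordingly, A-LIIF can reconstruct high-frequency textures and structures more accurately. Experiments on multiple benchmark datasets verify the effectiveness of our method. Our code is available at https://github.com/leehw-thu/a-liif.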
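A weighted combination of several local implicit functions can be sketched as follows. The abstract does not say how the weights are produced or normalized, so the softmax over per-pixel scores used here is an assumption, and `mixture_predict` and the two toy functions are hypothetical names for illustration only.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_predict(functions, scores, latent, rel):
    """Blend the outputs of several implicit functions at one query point.

    Assumption: weights come from a softmax over per-pixel scores; the
    weighting used in A-LIIF itself may be different.
    """
    weights = softmax(scores)
    return sum(w * f(latent, rel) for w, f in zip(weights, functions))


def smooth_fn(latent, rel):
    # Hypothetical function suited to flat regions: ignores the offset.
    return latent


def edge_fn(latent, rel):
    # Hypothetical function suited to edges: reacts strongly to the offset.
    return latent + 4.0 * rel[0]


value = mixture_predict([smooth_fn, edge_fn], [0.0, 0.0], 1.0, (0.5, 0.0))
print(value)
```

With per-pixel scores, a pixel on an edge can lean on one function while a pixel in a flat region leans on another, so the pixels no longer all share a single model.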
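360° imaging has recently attracted great attention; however, its angular resolution is relatively lower than that of a narrow field-of-view (FOV) perspective image, because it is captured using fisheye lenses with the same sensor size. It is therefore beneficial to super-resolve 360° images. Some attempts have been made, but most of them treat the equirectangular projection (ERP) as the way to represent 360° images, despite its latitude-dependent distortions. In that case, because the output high-resolution (HR) image is always in the same ERP format as the low-resolution (LR) input, further information loss may occur when the HR image is transformed to other projection types. In this paper, we propose a novel framework that generates a continuous spherical image representation from an LR 360° image, aiming to predict the RGB values at given spherical coordinates for super-resolution under an arbitrary 360° image projection. Specifically, we first propose a feature extraction module that represents the spherical data based on an icosahedron and efficiently extracts features on the spherical surface. We then propose a spherical local implicit image function (SLIIF) to predict RGB values at the spherical coordinates. In this way, SphereSR flexibly reconstructs an HR image under an arbitrary projection type. Experiments on various benchmark datasets show that our method significantly surpasses existing methods.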
Learning continuous image representations is recently gaining popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images with arbitrary scales from low-resolution inputs. Existing methods mostly ensemble nearby features to predict the new pixel at any queried coordinate in the SR image. Such a local ensemble suffers from some limitations: i) it has no learnable parameters and it neglects the similarity of the visual features; ii) it has a limited receptive field and cannot ensemble relevant features in a large field which are important in an image; iii) it inherently has a gap with real camera imaging since it only depends on the coordinate. To address these issues, this paper proposes a continuous implicit attention-in-attention network, called CiaoSR. We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features. Furthermore, we embed a scale-aware attention in this implicit attention network to exploit additional non-local information. Extensive experiments on benchmark datasets demonstrate CiaoSR significantly outperforms the existing single image super resolution (SISR) methods with the same backbone. In addition, the proposed method also achieves the state-of-the-art performance on the arbitrary-scale SR task. The effectiveness of the method is also demonstrated on the real-world SR setting. More importantly, CiaoSR can be flexibly integrated into any backbone to improve the SR performance.
This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation, while the other module warps another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations adaptively via the learned attention maps, leading to the final high-resolution LF image with satisfactory results on both plain textured areas and depth discontinuous boundaries. Besides, to promote the effectiveness of our method trained with simulated hybrid data on real hybrid data captured by a hybrid LF imaging system, we carefully design the network architecture and the training strategy. Extensive experiments on both real and simulated hybrid data demonstrate the significant superiority of our approach over state-of-the-art ones. To the best of our knowledge, this is the first end-to-end deep learning method for LF reconstruction from a real hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.
Neural volumetric representations have become a widely adopted model for radiance fields in 3D scenes. These representations are fully implicit or hybrid function approximators of the instantaneous volumetric radiance in a scene, which are typically learned from multi-view captures of the scene. We investigate the new task of neural volume super-resolution - rendering high-resolution views corresponding to a scene captured at low resolution. To this end, we propose a neural super-resolution network that operates directly on the volumetric representation of the scene. This approach allows us to exploit an advantage of operating in the volumetric domain, namely the ability to guarantee consistent super-resolution across different viewing directions. To realize our method, we devise a novel 3D representation that hinges on multiple 2D feature planes. This allows us to super-resolve the 3D scene representation by applying 2D convolutional networks on the 2D feature planes. We validate the proposed method's capability of super-resolving multi-view consistent views both quantitatively and qualitatively on a diverse set of unseen 3D scenes, demonstrating a significant advantage over existing approaches.
High-resolution (HR) medical images provide rich anatomical structure details to facilitate early and accurate diagnosis. In MRI, restricted by hardware capacity, scan time, and patient cooperation ability, isotropic 3D HR image acquisition typically requires a long scan time and results in small spatial coverage and low SNR. Recent studies showed that, with deep convolutional neural networks, isotropic HR MR images can be recovered from low-resolution (LR) input via single image super-resolution (SISR) algorithms. However, most existing SISR methods learn a scale-specific projection between LR and HR images, so they can only deal with a fixed up-sampling rate. To achieve different up-sampling rates, a separate SR network has to be built for each rate, which is very time-consuming and resource-intensive. In this paper, we propose ArSSR, an Arbitrary Scale Super-Resolution approach for recovering 3D HR MR images. In the ArSSR model, the reconstruction of HR images with different up-scaling rates is defined as learning a continuous implicit voxel function from the observed LR images. The SR task is then converted into representing the implicit voxel function with deep neural networks, learned from a set of paired HR-LR training examples. The ArSSR model consists of an encoder network and a decoder network. Specifically, the convolutional encoder network extracts feature maps from the LR input images, and the fully-connected decoder network approximates the implicit voxel function. Due to the continuity of the learned function, a single ArSSR model can, after training, reconstruct HR images at an arbitrary up-sampling rate from any input LR image. Experimental results on three datasets show that the ArSSR model can achieve state-of-the-art SR performance for 3D HR MR image reconstruction while using a single trained model for arbitrary up-sampling scales.
The rendering procedure used by neural radiance fields (NeRF) samples a scene with a single ray per pixel and may therefore produce renderings that are excessively blurred or aliased when training or testing images observe scene content at different resolutions. The straightforward solution of supersampling by rendering with multiple rays per pixel is impractical for NeRF, because rendering each ray requires querying a multilayer perceptron hundreds of times. Our solution, which we call "mip-NeRF" (à la "mipmap"), extends NeRF to represent the scene at a continuously-valued scale. By efficiently rendering anti-aliased conical frustums instead of rays, mip-NeRF reduces objectionable aliasing artifacts and significantly improves NeRF's ability to represent fine details, while also being 7% faster than NeRF and half the size. Compared to NeRF, mip-NeRF reduces average error rates by 17% on the dataset presented with NeRF and by 60% on a challenging multiscale variant of that dataset that we present. Mip-NeRF is also able to match the accuracy of a brute-force supersampled NeRF on our multiscale dataset while being 22× faster.
Informative features play a crucial role in the single image super-resolution task. Channel attention has been demonstrated to be effective for preserving information-rich features in each layer. However, channel attention treats each convolution layer as a separate process that misses the correlation among different layers. To address this problem, we propose a new holistic attention network (HAN), which consists of a layer attention module (LAM) and a channel-spatial attention module (CSAM), to model the holistic interdependencies among layers, channels, and positions. Specifically, the proposed LAM adaptively emphasizes hierarchical features by considering correlations among layers. Meanwhile, CSAM learns the confidence at all the positions of each channel to selectively capture more informative features. Extensive experiments demonstrate that the proposed HAN performs favorably against the state-of-the-art single image super-resolution approaches.
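Hyperspectral images (HSIs) with narrow spectral bands can capture rich spectral information, but they sacrifice spatial resolution in the process. Many machine-learning-based HSI super-resolution (SR) algorithms have been proposed recently. However, one fundamental limitation of these approaches is that they depend heavily on the image and camera settings and can only learn to map an input HSI with one specific setting to an output HSI with another specific setting. Yet, because of the diversity of HSI cameras, different cameras capture images with different spectral response functions and numbers of bands. Consequently, existing machine-learning-based approaches fail to learn to super-resolve HSIs for a wide variety of input-output band settings. We propose a Meta-Learning-Based Super-Resolution (MLSR) model, which can take in HSIs at an arbitrary number of input bands' peak wavelengths and generate SR HSIs with an arbitrary number of output bands' peak wavelengths. We use the NTIRE2020 and ICVL datasets to train the MLSR model and validate its performance. The results show that the single proposed model can successfully generate super-resolved HSI bands under arbitrary input-output band settings. The results are better than, or at least comparable to, baselines trained separately on a specific input-output band setting.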