Despite deep end-to-end learning methods have shown their superiority in removing non-uniform motion blur, there still exist major challenges with the current multi-scale and scale-recurrent models: 1) Deconvolution/upsampling operations in the coarse-to-fine scheme result in expensive runtime; 2) Simply increasing the model depth with finer-scale levels cannot improve the quality of deblurring. To tackle the above problems, we present a deep hierarchical multi-patch network inspired by Spatial Pyramid Matching to deal with blurry images via a fine-tocoarse hierarchical representation. To deal with the performance saturation w.r.t. depth, we propose a stacked version of our multi-patch model. Our proposed basic multi-patch model achieves the state-of-the-art performance on the Go-Pro dataset while enjoying a 40× faster runtime compared to current multi-scale methods. With 30ms to process an image at 1280×720 resolution, it is the first real-time deep motion deblurring model for 720p images at 30fps. For stacked networks, significant improvements (over 1.2dB) are achieved on the GoPro dataset by increasing the network depth. Moreover, by varying the depth of the stacked model, one can adapt the performance and runtime of the same network for different application scenarios.
translated by 谷歌翻译
In single image deblurring, the "coarse-to-fine" scheme, i.e. gradually restoring the sharp image on different resolutions in a pyramid, is very successful in both traditional optimization-based methods and recent neural-networkbased approaches. In this paper, we investigate this strategy and propose a Scale-recurrent Network (SRN-DeblurNet) for this deblurring task. Compared with the many recent learning-based approaches in [25], it has a simpler network structure, a smaller number of parameters and is easier to train. We evaluate our method on large-scale deblurring datasets with complex motion. Results show that our method can produce better quality results than state-of-thearts, both quantitatively and qualitatively.
translated by 谷歌翻译
Non-uniform blind deblurring for general dynamic scenes is a challenging computer vision problem as blurs arise not only from multiple object motions but also from camera shake, scene depth variation. To remove these complicated motion blurs, conventional energy optimization based methods rely on simple assumptions such that blur kernel is partially uniform or locally linear. Moreover, recent machine learning based methods also depend on synthetic blur datasets generated under these assumptions. This makes conventional deblurring methods fail to remove blurs where blur kernel is difficult to approximate or parameterize (e.g. object motion boundaries). In this work, we propose a multi-scale convolutional neural network that restores sharp images in an end-to-end manner where blur is caused by various sources. Together, we present multiscale loss function that mimics conventional coarse-to-fine approaches. Furthermore, we propose a new large-scale dataset that provides pairs of realistic blurry image and the corresponding ground truth sharp image that are obtained by a high-speed camera. With the proposed model trained on this dataset, we demonstrate empirically that our method achieves the state-of-the-art performance in dynamic scene deblurring not only qualitatively, but also quantitatively.
translated by 谷歌翻译
大多数现有的基于深度学习的单图像动态场景盲目脱毛(SIDSBD)方法通常设计深网络,以直接从一个输入的运动模糊图像中直接删除空间变化的运动模糊,而无需模糊的内核估计。在本文中,受投射运动路径模糊(PMPB)模型和可变形卷积的启发,我们提出了一个新颖的约束可变形的卷积网络(CDCN),以进行有效的单图像动态场景,同时实现了准确的空间变化,以及仅观察到的运动模糊图像的高质量图像恢复。在我们提出的CDCN中,我们首先构建了一种新型的多尺度多级多输入多输出(MSML-MIMO)编码器架构,以提高功能提取能力。其次,与使用多个连续帧的DLVBD方法不同,提出了一种新颖的约束可变形卷积重塑(CDCR)策略,其中首先将可变形的卷积应用于输入的单运动模糊图像的模糊特征,用于学习学习的抽样点,以学习学习的采样点每个像素的运动模糊内核类似于PMPB模型中摄像机震动的运动密度函数的估计,然后提出了一种基于PMPB的新型重塑损耗函数来限制学习的采样点收敛,这可以使得可以使得可以使其产生。学习的采样点与每个像素的相对运动轨迹匹配,并促进空间变化的运动模糊内核估计的准确性。
translated by 谷歌翻译
使用注意机制的深度卷积神经网络(CNN)在动态场景中取得了巨大的成功。在大多数这些网络中,只能通过注意图精炼的功能传递到下一层,并且不同层的注意力图彼此分开,这并不能充分利用来自CNN中不同层的注意信息。为了解决这个问题,我们引入了一种新的连续跨层注意传播(CCLAT)机制,该机制可以利用所有卷积层的分层注意信息。基于CCLAT机制,我们使用非常简单的注意模块来构建一个新型残留的密集注意融合块(RDAFB)。在RDAFB中,从上述RDAFB的输出中推断出的注意图和每一层直接连接到后续的映射,从而导致CRLAT机制。以RDAFB为基础,我们为动态场景Deblurring设计了一个名为RDAFNET的有效体系结构。基准数据集上的实验表明,所提出的模型的表现优于最先进的脱毛方法,并证明了CCLAT机制的有效性。源代码可在以下网址提供:https://github.com/xjmz6/rdafnet。
translated by 谷歌翻译
图像运动模糊通常是由于移动物体或摄像头摇动而导致的。这种模糊通常是方向性的,不均匀。先前的研究工作试图通过使用自我注意力的自我次数多尺度或多斑架构来解决非均匀的模糊。但是,使用自我电流框架通常会导致更长的推理时间,而像素间或通道间的自我注意力可能会导致过度记忆使用。本文提出了模糊的注意力网络(BANET),该网络通过单个正向通行证完成了准确有效的脱脂。我们的Banet利用基于区域的自我注意力,并通过多内核条池汇总到不同程度的模糊模式,并具有级联的平行扩张卷积,以汇总多尺度内容特征。关于GoPro和Hide基准的广泛实验结果表明,所提出的班轮在模糊的图像修复中表现出色,并可以实时提供Deblurred结果。
translated by 谷歌翻译
We present a new end-to-end generative adversarial network (GAN) for single image motion deblurring, named DeblurGAN-v2, which considerably boosts state-of-the-art deblurring efficiency, quality, and flexibility. DeblurGAN-v2 is based on a relativistic conditional GAN with a doublescale discriminator. For the first time, we introduce the Feature Pyramid Network into deblurring, as a core building block in the generator of DeblurGAN-v2. It can flexibly work with a wide range of backbones, to navigate the balance between performance and efficiency. The plugin of sophisticated backbones (e.g., Inception-ResNet-v2) can lead to solid state-of-the-art deblurring. Meanwhile, with light-weight backbones (e.g., MobileNet and its variants), DeblurGAN-v2 reaches 10-100 times faster than the nearest competitors, while maintaining close to state-ofthe-art results, implying the option of real-time video deblurring. We demonstrate that DeblurGAN-v2 obtains very competitive performance on several popular benchmarks, in terms of deblurring quality (both objective and subjective), as well as efficiency. Besides, we show the architecture to be effective for general image restoration tasks too. Our codes, models and data are available at: https: //github.com/KupynOrest/DeblurGANv2.
translated by 谷歌翻译
在本文中,我们研究了现实世界图像脱毛的问题,并考虑了改善深度图像脱布模型的性能的两个关键因素,即培训数据综合和网络体系结构设计。经过现有合成数据集训练的脱毛模型在由于域移位引起的真实模糊图像上的表现较差。为了减少合成和真实域之间的域间隙,我们提出了一种新颖的现实模糊合成管道来模拟摄像机成像过程。由于我们提出的合成方法,可以使现有的Deblurring模型更强大,以处理现实世界的模糊。此外,我们开发了一个有效的脱蓝色模型,该模型同时捕获特征域中的非本地依赖性和局部上下文。具体而言,我们将多路径变压器模块介绍给UNET架构,以进行丰富的多尺度功能学习。在三个现实世界数据集上进行的全面实验表明,所提出的Deblurring模型的性能优于最新方法。
translated by 谷歌翻译
在本文中,我们介绍了一种快速运动脱棕色条件的生成对抗网络(FMD-CGAN),其有助于单个图像的盲运动去纹理。 FMD-CGAN在去修改图像后提供令人印象深刻的结构相似性和视觉外观。与其他深度神经网络架构一样,GAN也遭受大型模型大小(参数)和计算。在诸如移动设备和机器人等资源约束设备上部署模型并不容易。借助MobileNet基于MobileNet的架构,包括深度可分离卷积,我们降低了模型大小和推理时间,而不会丢失图像的质量。更具体地说,我们将模型大小与最近的竞争对手相比将3-60倍。由此产生的压缩去掩盖CGAN比其最接近的竞争对手更快,甚至定性和定量结果优于各种最近提出的最先进的盲运动去误紧模型。我们还可以使用我们的模型进行实时映像解擦干任务。标准数据集的当前实验显示了该方法的有效性。
translated by 谷歌翻译
最近的变形金刚和多层Perceptron(MLP)模型的进展为计算机视觉任务提供了新的网络架构设计。虽然这些模型在许多愿景任务中被证明是有效的,但在图像识别之类的愿景中,仍然存在挑战,使他们适应低级视觉。支持高分辨率图像和本地注意力的局限性的不灵活性可能是使用变压器和MLP在图像恢复中的主要瓶颈。在这项工作中,我们介绍了一个多轴MLP基于MARIC的架构,称为Maxim,可用作用于图像处理任务的高效和灵活的通用视觉骨干。 Maxim使用UNET形的分层结构,并支持由空间门控MLP启用的远程交互。具体而言,Maxim包含两个基于MLP的构建块:多轴门控MLP,允许局部和全球视觉线索的高效和可扩展的空间混合,以及交叉栅栏,替代跨关注的替代方案 - 细分互补。这两个模块都仅基于MLP,而且还受益于全局和“全卷积”,两个属性对于图像处理是可取的。我们广泛的实验结果表明,所提出的Maxim模型在一系列图像处理任务中实现了十多个基准的最先进的性能,包括去噪,失败,派热,脱落和增强,同时需要更少或相当的数量参数和拖鞋而不是竞争模型。
translated by 谷歌翻译
Image restoration tasks demand a complex balance between spatial details and high-level contextualized information while recovering images. In this paper, we propose a novel synergistic design that can optimally balance these competing goals. Our main proposal is a multi-stage architecture, that progressively learns restoration functions for the degraded inputs, thereby breaking down the overall recovery process into more manageable steps. Specifically, our model first learns the contextualized features using encoder-decoder architectures and later combines them with a high-resolution branch that retains local information. At each stage, we introduce a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features. A key ingredient in such a multi-stage architecture is the information exchange between different stages. To this end, we propose a twofaceted approach where the information is not only exchanged sequentially from early to late stages, but lateral connections between feature processing blocks also exist to avoid any loss of information. The resulting tightly interlinked multi-stage architecture, named as MPRNet, delivers strong performance gains on ten datasets across a range of tasks including image deraining, deblurring, and denoising. The source code and pre-trained models are available at https://github.com/swz30/MPRNet.
translated by 谷歌翻译
This paper tackles the problem of motion deblurring of dynamic scenes. Although end-to-end fully convolutional designs have recently advanced the state-of-the-art in nonuniform motion deblurring, their performance-complexity trade-off is still sub-optimal. Existing approaches achieve a large receptive field by increasing the number of generic convolution layers and kernel-size, but this comes at the expense of of the increase in model size and inference speed. In this work, we propose an efficient pixel adaptive and feature attentive design for handling large blur variations across different spatial locations and process each test image adaptively. We also propose an effective content-aware global-local filtering module that significantly improves performance by considering not only global dependencies but also by dynamically exploiting neighboring pixel information. We use a patch-hierarchical attentive architecture composed of the above module that implicitly discovers the spatial variations in the blur present in the input image and in turn, performs local and global modulation of intermediate features. Extensive qualitative and quantitative comparisons with prior art on deblurring benchmarks demonstrate that our design offers significant improvements over the state-of-the-art in accuracy as well as speed.
translated by 谷歌翻译
在极低光线条件下捕获图像会对标准相机管道带来重大挑战。图像变得太黑了,太吵了,这使得传统的增强技术几乎不可能申请。最近,基于学习的方法已经为此任务显示了非常有希望的结果,因为它们具有更大的表现力能力来允许提高质量。这些研究中的激励,在本文中,我们的目标是利用爆破摄影来提高性能,并从极端暗的原始图像获得更加锐利和更准确的RGB图像。我们提出的框架的骨干是一种新颖的粗良好网络架构,逐步产生高质量的输出。粗略网络预测了低分辨率,去噪的原始图像,然后将其馈送到精细网络以恢复微尺的细节和逼真的纹理。为了进一步降低噪声水平并提高颜色精度,我们将该网络扩展到置换不变结构,使得它作为输入突发为低光图像,并在特征级别地合并来自多个图像的信息。我们的实验表明,我们的方法通过生产更详细和相当更高的质量的图像来引起比最先进的方法更令人愉悦的结果。
translated by 谷歌翻译
Multi-Scale and U-shaped Networks are widely used in various image restoration problems, including deblurring. Keeping in mind the wide range of applications, we present a comparison of these architectures and their effects on image deblurring. We also introduce a new block called as NFResblock. It consists of a Fast Fourier Transformation layer and a series of modified Non-Linear Activation Free Blocks. Based on these architectures and additions, we introduce NFResnet and NFResnet+, which are modified multi-scale and U-Net architectures, respectively. We also use three different loss functions to train these architectures: Charbonnier Loss, Edge Loss, and Frequency Reconstruction Loss. Extensive experiments on the Deep Video Deblurring dataset, along with ablation studies for each component, have been presented in this paper. The proposed architectures achieve a considerable increase in Peak Signal to Noise (PSNR) ratio and Structural Similarity Index (SSIM) value.
translated by 谷歌翻译
尽管运动补偿大大提高了视频质量,但单独执行运动补偿和视频脱张需要大量的计算开销。本文提出了一个实时视频Deblurring框架,该框架由轻巧的多任务单元组成,该单元以有效的方式支持视频脱张和运动补偿。多任务单元是专门设计的,用于使用单个共享网络处理两个任务的大部分,并由多任务详细网络和简单的网络组成,用于消除和运动补偿。多任务单元最大程度地减少了将运动补偿纳入视频Deblurring的成本,并实现了实时脱毛。此外,通过堆叠多个多任务单元,我们的框架在成本和过度质量之间提供了灵活的控制。我们通过实验性地验证了方法的最先进的质量,与以前的方法相比,该方法的运行速度要快得多,并显示了实时的实时性能(在DVD数据集中测量了30.99db@30fps)。
translated by 谷歌翻译
Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image superresolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require the bicubic interpolation as the pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitates resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.
translated by 谷歌翻译
本文铲球动态场景去模糊的问题。虽然终端到终端的全卷积的设计最近提出的国家的最先进的非匀速运动去模糊,他们的表现复杂的权衡仍是次优的。现有的方法在普通卷积层,内核尺寸的数量,来与模型的大小和推理速度的增加的负担,一个简单的增量实现大的感受野。在这项工作中,我们提出了一个有效的像素适应并配内和跨不同的图像处理大量的模糊变化周到的设计。我们还提出了一种有效的内容感知全局 - 局部滤波模块通过不仅考虑像素的全局依赖关系还动态使用相邻像素是显著提高性能。我们使用上述模块构成的补丁分层架构周到隐式地发现存在于所述输入图像并依次模糊的空间变化进行的中间特征局部和全局调制。与现有技术的上去模糊基准广泛的定性和定量的比较表明了该网络的优越性。
translated by 谷歌翻译
模糊文物可以严重降低图像的视觉质量,并且已经提出了许多用于特定场景的脱模方法。然而,在大多数现实世界的图像中,模糊是由不同因素引起的,例如运动和散焦。在本文中,我们解决了不同的去纹身方法如何在一般类型的模糊上进行。对于深入的性能评估,我们构建一个名为(MC-Blur)的新型大规模的多个原因图像去孔数据集,包括现实世界和合成模糊图像,具有模糊的混合因素。采用不同的技术收集所提出的MC-Blur数据集中的图像:卷积超高清(UHD)具有大核的锐利图像,平均由1000 FPS高速摄像头捕获的清晰图像,向图像添加Defocus,而且真实-world模糊的图像由各种相机型号捕获。这些结果概述了当前的去纹理方法的优缺点。此外,我们提出了一种新的基线模型,适应多种模糊的原因。通过包括对不同程度的特征的不同重量,所提出的网络导出更强大的特征,重量分配给更重要的水平,从而增强了特征表示。新数据集上的广泛实验结果展示了多原因模糊情景所提出的模型的有效性。
translated by 谷歌翻译
用于深度卷积神经网络的视频插值的现有方法,因此遭受其内在限制,例如内部局限性核心权重和受限制的接收领域。为了解决这些问题,我们提出了一种基于变换器的视频插值框架,允许内容感知聚合权重,并考虑具有自我关注操作的远程依赖性。为避免全球自我关注的高计算成本,我们将当地注意的概念引入视频插值并将其扩展到空间域。此外,我们提出了一个节省时间的分离策略,以节省内存使用,这也提高了性能。此外,我们开发了一种多尺度帧合成方案,以充分实现变压器的潜力。广泛的实验证明了所提出的模型对最先进的方法来说,定量和定性地在各种基准数据集上进行定量和定性。
translated by 谷歌翻译
在本文中,我们考虑了Defocus图像去缩合中的问题。以前的经典方法遵循两步方法,即首次散焦映射估计,然后是非盲目脱毛。在深度学习时代,一些研究人员试图解决CNN的这两个问题。但是,代表模糊级别的Defocus图的简单串联导致了次优性能。考虑到Defocus Blur的空间变体特性和Defocus Map中指示的模糊级别,我们采用Defocus Map作为条件指导来调整输入模糊图像而不是简单串联的特征。然后,我们提出了一个基于Defocus图的空间调制的简单但有效的网络。为了实现这一目标,我们设计了一个由三个子网络组成的网络,包括DeFocus Map估计网络,该网络将DeFocus Map编码为条件特征的条件网络以及根据条件功能执行空间动态调制的DeFocus Deblurring网络。此外,空间动态调制基于仿射变换函数,以调整输入模糊图像的特征。实验结果表明,与常用的公共测试数据集中的现有最新方法相比,我们的方法可以实现更好的定量和定性评估性能。
translated by 谷歌翻译