在过去十年中,深度学习的开花目睹了现场文本识别的快速发展。然而,识别低分辨率场景文本图像仍然是一个挑战。尽管已经提出了一些超分辨率的方法来解决这个问题,但它们通常将文本图像视为一般图像,同时忽略了中风的视觉质量(文本原子单位)的事实扮演文本识别的重要作用。根据Gestalt心理学,人类能够将部分细节构成为先前知识所指导的最相似的物体。同样,当人类观察低分辨率文本图像时,它们将本质上使用部分笔划级细节来恢复整体字符的外观。灵感来自Gestalt心理学,我们提出了一个中风感知的场景文本图像超分辨率方法,其中包含带有冲程的模块(SFM),专注于文本图像中的字符的行程级内部结构。具体而言,我们尝试设计用于在笔划级别分解英语字符和数字的规则,然后预先列车文本识别器以提供笔划级注意映射作为位置线索,目的是控制所生成的超分辨率图像之间的一致性和高分辨率的地面真相。广泛的实验结果验证了所提出的方法确实可以在Textoom和手动构建中文字符数据集DegraDed-IC13上生成更可区分的图像。此外,由于所提出的SFM仅用于在训练时提供笔划级别指导,因此在测试阶段不会带来任何时间开销。代码可在https://github.com/fudanvi/fudanocr/tree/main/text -GETALT中获得。
translated by 谷歌翻译
近年来,深入学习的蓬勃发展的开花目睹了文本认可的快速发展。但是,现有的文本识别方法主要用于英语文本,而忽略中文文本的关键作用。作为另一种广泛的语言,中文文本识别各种方式​​都有广泛的应用市场。根据我们的观察,我们将稀缺关注缺乏对缺乏合理的数据集建设标准,统一评估方法和现有基线的结果。为了填补这一差距,我们手动收集来自公开的竞争,项目和论文的中文文本数据集,然后将它们分为四类,包括场景,网络,文档和手写数据集。此外,我们在这些数据集中评估了一系列代表性的文本识别方法,具有统一的评估方法来提供实验结果。通过分析实验结果,我们令人惊讶地观察到识别英语文本的最先进的基线不能很好地表现出对中国情景的良好。由于中国文本的特征,我们认为仍然存在众多挑战,这与英文文本完全不同。代码和数据集在https://github.com/fudanvi/benchmarking-chinese-text-recognition中公开使用。
translated by 谷歌翻译
Face Restoration (FR) aims to restore High-Quality (HQ) faces from Low-Quality (LQ) input images, which is a domain-specific image restoration problem in the low-level computer vision area. The early face restoration methods mainly use statistic priors and degradation models, which are difficult to meet the requirements of real-world applications in practice. In recent years, face restoration has witnessed great progress after stepping into the deep learning era. However, there are few works to study deep learning-based face restoration methods systematically. Thus, this paper comprehensively surveys recent advances in deep learning techniques for face restoration. Specifically, we first summarize different problem formulations and analyze the characteristic of the face image. Second, we discuss the challenges of face restoration. Concerning these challenges, we present a comprehensive review of existing FR methods, including prior based methods and deep learning-based methods. Then, we explore developed techniques in the task of FR covering network architectures, loss functions, and benchmark datasets. We also conduct a systematic benchmark evaluation on representative methods. Finally, we discuss future directions, including network designs, metrics, benchmark datasets, applications,etc. We also provide an open-source repository for all the discussed methods, which is available at https://github.com/TaoWangzj/Awesome-Face-Restoration.
translated by 谷歌翻译
Informative features play a crucial role in the single image super-resolution task. Channel attention has been demonstrated to be effective for preserving information-rich features in each layer. However, channel attention treats each convolution layer as a separate process that misses the correlation among different layers. To address this problem, we propose a new holistic attention network (HAN), which consists of a layer attention module (LAM) and a channel-spatial attention module (CSAM), to model the holistic interdependencies among layers, channels, and positions. Specifically, the proposed LAM adaptively emphasizes hierarchical features by considering correlations among layers. Meanwhile, CSAM learns the confidence at all the positions of each channel to selectively capture more informative features. Extensive experiments demonstrate that the proposed HAN performs favorably against the state-ofthe-art single image super-resolution approaches.
translated by 谷歌翻译
面部超分辨率(FSR),也称为面部幻觉,其旨在增强低分辨率(LR)面部图像以产生高分辨率(HR)面部图像的分辨率,是特定于域的图像超分辨率问题。最近,FSR获得了相当大的关注,并目睹了深度学习技术的发展炫目。迄今为止,有很少有基于深入学习的FSR的研究摘要。在本次调查中,我们以系统的方式对基于深度学习的FSR方法进行了全面审查。首先,我们总结了FSR的问题制定,并引入了流行的评估度量和损失功能。其次,我们详细说明了FSR中使用的面部特征和流行数据集。第三,我们根据面部特征的利用大致分类了现有方法。在每个类别中,我们从设计原则的一般描述开始,然后概述代表方法,然后讨论其中的利弊。第四,我们评估了一些最先进的方法的表现。第五,联合FSR和其他任务以及与FSR相关的申请大致介绍。最后,我们设想了这一领域进一步的技术进步的前景。在\ URL {https://github.com/junjun-jiang/face-hallucination-benchmark}上有一个策划的文件和资源的策划文件和资源清单
translated by 谷歌翻译
Recently, Transformer-based image restoration networks have achieved promising improvements over convolutional neural networks due to parameter-independent global interactions. To lower computational cost, existing works generally limit self-attention computation within non-overlapping windows. However, each group of tokens are always from a dense area of the image. This is considered as a dense attention strategy since the interactions of tokens are restrained in dense regions. Obviously, this strategy could result in restricted receptive fields. To address this issue, we propose Attention Retractable Transformer (ART) for image restoration, which presents both dense and sparse attention modules in the network. The sparse attention module allows tokens from sparse areas to interact and thus provides a wider receptive field. Furthermore, the alternating application of dense and sparse attention modules greatly enhances representation ability of Transformer while providing retractable attention on the input image.We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks. Experimental results validate that our proposed ART outperforms state-of-the-art methods on various benchmark datasets both quantitatively and visually. We also provide code and models at the website https://github.com/gladzhang/ART.
translated by 谷歌翻译
深度神经网络极大地促进了单图超分辨率(SISR)的性能。传统方法仍然仅基于图像模态的输入来恢复单个高分辨率(HR)解决方案。但是,图像级信息不足以预测大型展望因素面临的足够细节和光真逼真的视觉质量(x8,x16)。在本文中,我们提出了一种新的视角,将SISR视为语义图像详细信息增强问题,以产生忠于地面真理的语义合理的HR图像。为了提高重建图像的语义精度和视觉质量,我们通过提出文本指导的超分辨率(TGSR)框架来探索SISR中的多模式融合学习,该框架可以从文本和图像模态中有效地利用信息。与现有方法不同,提出的TGSR可以生成通过粗到精细过程匹配文本描述的HR图像详细信息。广泛的实验和消融研究证明了TGSR的效果,该效果利用文本参考来恢复逼真的图像。
translated by 谷歌翻译
近年来,在光场(LF)图像超分辨率(SR)中,深度神经网络(DNN)的巨大进展。但是,现有的基于DNN的LF图像SR方法是在单个固定降解(例如,双学的下采样)上开发的,因此不能应用于具有不同降解的超级溶解实际LF图像。在本文中,我们提出了第一种处理具有多个降解的LF图像SR的方法。在我们的方法中,开发了一个实用的LF降解模型,以近似于真实LF图像的降解过程。然后,降解自适应网络(LF-DANET)旨在将降解之前纳入SR过程。通过对具有多种合成降解的LF图像进行训练,我们的方法可以学会适应不同的降解,同时结合了空间和角度信息。对合成降解和现实世界LFS的广泛实验证明了我们方法的有效性。与现有的最新单一和LF图像SR方法相比,我们的方法在广泛的降解范围内实现了出色的SR性能,并且可以更好地推广到真实的LF图像。代码和模型可在https://github.com/yingqianwang/lf-danet上找到。
translated by 谷歌翻译
Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than pure visual classification. However, how to effectively model the linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting. Firstly, the autonomous suggests enforcing explicitly language modeling by decoupling the recognizer into vision model and language model and blocking gradient flow between both models. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for the language model which can effectively alleviate the impact of noise input. Finally, to polish ABINet++ in long text recognition, we propose to aggregate horizontal features by embedding Transformer units inside a U-Net, and design a position and content attention module which integrates character order and content to attend to character features precisely. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, which consistently demonstrates the superiority of our method in various environments especially on low-quality images. Besides, extensive experiments including in English and Chinese also prove that, a text spotter that incorporates our language modeling method can significantly improve its performance both in accuracy and speed compared with commonly used attention-based recognizers.
translated by 谷歌翻译
汉字带有大量的形态和语义信息;因此,汉字形态的语义增强引起了极大的关注。先前的方法旨在直接从整个汉字图像中提取信息,这些图像通常无法同时捕获全球和本地信息。在本文中,我们开发了一种基于中风的自动编码器(SAE),以用自我监督的方法对汉字的复杂形态进行建模。按照其规范写作顺序,我们首先将汉字作为一系列带有固定写作顺序的中风图像,然后我们的SAE模型经过训练以重建此中风图像序列。只要训练集中出现这种预训练的SAE模型,只要它们的中风或激进分出现在看不见的字符中。我们在不同形式的中风图像上设计了两个对比的SAE架构。一种是对现有基于中风的方法进行微调的,用于零拍识别手写的汉字,另一个用于从其形态特征中富含中文单词的嵌入。实验结果证明,在预训练之后,我们的SAE架构以零拍的识别优于其他现有方法,并以其丰富的形态和语义信息增强了汉字的表示。
translated by 谷歌翻译
Convolutional Neural Network (CNN)-based image super-resolution (SR) has exhibited impressive success on known degraded low-resolution (LR) images. However, this type of approach is hard to hold its performance in practical scenarios when the degradation process is unknown. Despite existing blind SR methods proposed to solve this problem using blur kernel estimation, the perceptual quality and reconstruction accuracy are still unsatisfactory. In this paper, we analyze the degradation of a high-resolution (HR) image from image intrinsic components according to a degradation-based formulation model. We propose a components decomposition and co-optimization network (CDCN) for blind SR. Firstly, CDCN decomposes the input LR image into structure and detail components in feature space. Then, the mutual collaboration block (MCB) is presented to exploit the relationship between both two components. In this way, the detail component can provide informative features to enrich the structural context and the structure component can carry structural context for better detail revealing via a mutual complementary manner. After that, we present a degradation-driven learning strategy to jointly supervise the HR image detail and structure restoration process. Finally, a multi-scale fusion module followed by an upsampling layer is designed to fuse the structure and detail features and perform SR reconstruction. Empowered by such degradation-based components decomposition, collaboration, and mutual optimization, we can bridge the correlation between component learning and degradation modelling for blind SR, thereby producing SR results with more accurate textures. Extensive experiments on both synthetic SR datasets and real-world images show that the proposed method achieves the state-of-the-art performance compared to existing methods.
translated by 谷歌翻译
当前的深层图像超分辨率(SR)方法试图从下采样的图像或假设简单高斯内核和添加噪声中降解来恢复高分辨率图像。但是,这种简单的图像处理技术代表了降低图像分辨率的现实世界过程的粗略近似。在本文中,我们提出了一个更现实的过程,通过引入新的内核对抗学习超分辨率(KASR)框架来处理现实世界图像SR问题,以降低图像分辨率。在提议的框架中,降解内核和噪声是自适应建模的,而不是明确指定的。此外,我们还提出了一个迭代监督过程和高频选择性目标,以进一步提高模型SR重建精度。广泛的实验验证了对现实数据集中提出的框架的有效性。
translated by 谷歌翻译
盲级超分辨率(SR)旨在从低分辨率(LR)图像中恢复高质量的视觉纹理,通常通过下采样模糊内核和添加剂噪声来降解。由于现实世界中复杂的图像降解的挑战,此任务非常困难。现有的SR方法要么假定预定义的模糊内核或固定噪声,这限制了这些方法在具有挑战性的情况下。在本文中,我们提出了一个用于盲目超级分辨率(DMSR)的降解引导的元修复网络,该网络促进了真实病例的图像恢复。 DMSR由降解提取器和元修复模块组成。萃取器估计LR输入中的降解,并指导元恢复模块以预测恢复参数的恢复参数。 DMSR通过新颖的降解一致性损失和重建损失共同优化。通过这样的优化,DMSR在三个广泛使用的基准上以很大的边距优于SOTA。一项包括16个受试者的用户研究进一步验证了现实世界中的盲目SR任务中DMSR的优势。
translated by 谷歌翻译
在恢复低分辨率灰度图像的实际应用中,我们通常需要为目标设备运行三个单独的图像着色,超分辨率和Dows采样操作。但是,该管道对于独立进程是冗余的并且低效,并且可以共享一些内部特征。因此,我们提出了一种有效的范例来执行{s} {s} {c} olorization和{s} Uper分辨率(SCS),并提出了端到端的SCSNet来实现这一目标。该方法由两部分组成:用于学习颜色信息的彩色分支,用于采用所提出的即插即用\ EMPH {金字塔阀跨关注}(PVCATTN)模块来聚合源和参考图像之间的特征映射;和超分辨率分支集成颜色和纹理信息以预测使用设计的\ emph {连续像素映射}(CPM)模块的目标图像来预测连续放大率的高分辨率图像。此外,我们的SCSNet支持对实际应用更灵活的自动和参照模式。丰富的实验证明了我们通过最先进的方法生成真实图像的方法的优越性,例如,平均降低了1.8 $ \ Depararrow $和5.1 $ \ Downarrow $相比,与自动和参照模式的最佳分数相比,分别在拥有更少的参数(超过$ \ \倍$ 2 $ \ dovearrow $)和更快的运行速度(超过$ \ times $ 3 $ \ Uprarow $)。
translated by 谷歌翻译
在过去的几年中,目睹了基于无人机的应用,计算机视觉起着至关重要的作用。但是,大多数基于公共无人机的视力数据集都集中在检测和跟踪上。另一方面,大多数现有图像超分辨率方法的性能对数据集敏感,特别是高分辨率和低分辨率图像之间的退化模型。在本文中,我们提出了第一个用于无人机视觉的超分辨率数据集。图像对由具有不同焦距的无人机上的两个摄像机捕获。我们在不同的高度收集数据,然后提出预处理步骤以对齐图像对。广泛的经验研究表明,在不同高度捕获的图像之间存在域间隙。同时,经过验证的图像超分辨率网络的性能在我们的数据集上也有所下降,并且海拔不同。最后,我们提出了两种方法,以在不同高度建立强大的图像超分辨率网络。第一个通过高度感知的层将高度信息馈送到网络中。第二个使用单次学习来快速使超分辨率模型适应未知高度。我们的结果表明,所提出的方法可以有效地提高不同海拔高度的超分辨率网络的性能。
translated by 谷歌翻译
注意机制已成为场景文本识别方法(STR)方法中的事实上的模块,因为它有能力提取字符级表示。可以将这些方法汇总到基于隐性注意力的基于隐性的注意力和受监督的注意力中,取决于如何计算注意力,即分别从序列级别的文本注释和字符级别的边界框注释中学到隐性注意和监督注意力。隐含的注意力可能会提取出粗略甚至不正确的空间区域作为性格的注意,这很容易受到对齐拖延问题的困扰。受到监督的注意力可以减轻上述问题,但它是特定于类别的问题,它需要额外费力的角色级边界框注释,并且当角色类别的数量较大时,将是记忆密集的。为了解决上述问题,我们提出了一种新型的关注机制,用于STR,自我保护的隐式字形注意力(SIGA)。 Siga通过共同自我监督的文本分割和隐性注意对准来描述文本图像的字形结构,这些文本分割和隐性注意对准可以作为监督,以提高注意力正确性,而无需额外的角色级注释。实验结果表明,就注意力正确性和最终识别性能而言,SIGA的性能始终如一地比以前的基于注意力的STR方法更好,并且在公开可用的上下文基准上以及我们的无上下文基准。
translated by 谷歌翻译
现实世界图像超分辨率(SR)的关键挑战是在低分辨率(LR)图像中恢复具有复杂未知降解(例如,下采样,噪声和压缩)的缺失细节。大多数以前的作品还原图像空间中的此类缺失细节。为了应对自然图像的高度多样性,他们要么依靠难以训练和容易训练和伪影的不稳定的甘体,要么诉诸于通常不可用的高分辨率(HR)图像中的明确参考。在这项工作中,我们提出了匹配SR(FEMASR)的功能,该功能在更紧凑的特征空间中恢复了现实的HR图像。与图像空间方法不同,我们的FEMASR通过将扭曲的LR图像{\ IT特征}与我们预读的HR先验中的无失真性HR对应物匹配来恢复HR图像,并解码匹配的功能以获得现实的HR图像。具体而言,我们的人力资源先验包含一个离散的特征代码簿及其相关的解码器,它们在使用量化的生成对抗网络(VQGAN)的HR图像上预估计。值得注意的是,我们在VQGAN中结合了一种新型的语义正则化,以提高重建图像的质量。对于功能匹配,我们首先提取由LR编码器组成的LR编码器的LR功能,然后遵循简单的最近邻居策略,将其与预读的代码簿匹配。特别是,我们为LR编码器配备了与解码器的残留快捷方式连接,这对于优化功能匹配损耗至关重要,还有助于补充可能的功能匹配错误。实验结果表明,我们的方法比以前的方法产生更现实的HR图像。代码以\ url {https://github.com/chaofengc/femasr}发布。
translated by 谷歌翻译
Image super-resolution (SR) serves as a fundamental tool for the processing and transmission of multimedia data. Recently, Transformer-based models have achieved competitive performances in image SR. They divide images into fixed-size patches and apply self-attention on these patches to model long-range dependencies among pixels. However, this architecture design is originated for high-level vision tasks, which lacks design guideline from SR knowledge. In this paper, we aim to design a new attention block whose insights are from the interpretation of Local Attribution Map (LAM) for SR networks. Specifically, LAM presents a hierarchical importance map where the most important pixels are located in a fine area of a patch and some less important pixels are spread in a coarse area of the whole image. To access pixels in the coarse area, instead of using a very large patch size, we propose a lightweight Global Pixel Access (GPA) module that applies cross-attention with the most similar patch in an image. In the fine area, we use an Intra-Patch Self-Attention (IPSA) module to model long-range pixel dependencies in a local patch, and then a $3\times3$ convolution is applied to process the finest details. In addition, a Cascaded Patch Division (CPD) strategy is proposed to enhance perceptual quality of recovered images. Extensive experiments suggest that our method outperforms state-of-the-art lightweight SR methods by a large margin. Code is available at https://github.com/passerer/HPINet.
translated by 谷歌翻译
突发超级分辨率(SR)提供了从低质量图像恢复丰富细节的可能性。然而,由于实际应用中的低分辨率(LR)图像具有多种复杂和未知的降级,所以现有的非盲(例如,双臂)设计的网络通常导致恢复高分辨率(HR)图像的严重性能下降。此外,处理多重未对准的嘈杂的原始输入也是具有挑战性的。在本文中,我们解决了从现代手持设备获取的原始突发序列重建HR图像的问题。中央观点是一个内核引导策略,可以用两个步骤解决突发SR:内核建模和HR恢复。前者估计来自原始输入的突发内核,而后者基于估计的内核预测超分辨图像。此外,我们引入了内核感知可变形对准模块,其可以通过考虑模糊的前沿而有效地对准原始图像。对综合和现实世界数据集的广泛实验表明,所提出的方法可以在爆发SR问题中对最先进的性能进行。
translated by 谷歌翻译
Reference-based Super-resolution (RefSR) approaches have recently been proposed to overcome the ill-posed problem of image super-resolution by providing additional information from a high-resolution image. Multi-reference super-resolution extends this approach by allowing more information to be incorporated. This paper proposes a 2-step-weighting posterior fusion approach to combine the outputs of RefSR models with multiple references. Extensive experiments on the CUFED5 dataset demonstrate that the proposed methods can be applied to various state-of-the-art RefSR models to get a consistent improvement in image quality.
translated by 谷歌翻译