Smart Paper Notes

Image-based Automatic Dial Meter Reading in Unconstrained Scenarios

Gabriel Salomon , Rayson Laroca , David Menotti

Category: Computer Vision

2022-01-08

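Replacing analog meters with smart meters is costly, laborious, and far from complete in developing countries. The Energy Company of Paraná (Copel), in Brazil, performs more than 4 million meter readings per month (almost entirely of non-smart devices), and we estimate that 850 thousand of them are from dial meters. An image-based automatic reading system can therefore reduce human errors, create a proof of reading, and enable customers to perform the reading themselves through a mobile application. We propose novel approaches for Automatic Dial Meter Reading (ADMR) and introduce a new dataset for ADMR in unconstrained scenarios, called UFPR-ADMR-v2. Our best-performing method combines YOLOv4 with a novel regression approach (AngReg) and explores several post-processing techniques. Compared to previous works, it decreased the Mean Absolute Error (MAE) from 1,343 to 129 and achieved a meter recognition rate (MRR) of 98.90%, with an error tolerance of 1 kilowatt-hour (kWh).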
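The two metrics quoted in this abstract can be illustrated with a minimal sketch. It assumes that MAE is the mean absolute difference between predicted and true readings, and that the recognition rate is the share of meters whose reading falls within the stated tolerance; the abstract does not spell out the exact definitions, and the function names and sample values below are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true readings."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def recognition_rate(predicted: Sequence[float], actual: Sequence[float],
                     tolerance: float = 1.0) -> float:
    """Fraction of meters whose predicted reading is within `tolerance` of the truth.

    Assumption: this is one plausible reading of "recognition rate with an
    error tolerance of 1 kWh"; the paper may define it differently.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(predicted)


# Made-up readings in kWh: one large miss dominates the MAE,
# while the recognition rate only counts it as a single failure.
predicted_readings = [1234, 5678, 9012, 3456]
true_readings = [1234, 5679, 9012, 4456]
print(mean_absolute_error(predicted_readings, true_readings))  # 250.25
print(recognition_rate(predicted_readings, true_readings))     # 0.75
```

The example shows why the two numbers are reported together: a single badly misread dial can inflate the MAE even when most meters are read within tolerance.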

Translated by Google Translate

On the Cross-dataset Generalization for License Plate Recognition

Rayson Laroca , Everton V. Cardoso , Diego R. Lucio , Valter Estevam , David Menotti

Category: Computer Vision

2022-01-02

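Automatic License Plate Recognition (ALPR) systems have shown remarkable performance on license plates (LPs) from multiple regions, thanks to advances in deep learning and the increasing availability of datasets. Deep ALPR systems are usually evaluated within each dataset, so it is questionable whether such results are a reliable indicator of generalization ability. In this paper, we propose a traditional-split versus leave-one-dataset-out experimental setup to empirically assess the cross-dataset generalization of 12 Optical Character Recognition (OCR) models applied to LP recognition on nine public datasets that vary widely in several aspects (e.g., acquisition settings, image resolution, and LP layouts). We also introduce a public dataset for end-to-end ALPR that is the first to contain images of vehicles with Mercosur LPs and the one with the highest number of motorcycle images. The experimental results shed light on the limitations of the traditional-split protocol for evaluating approaches in the ALPR context, as performance drops significantly on most datasets when the models are trained and tested in a leave-one-dataset-out fashion.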
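The leave-one-dataset-out protocol mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes each dataset is just a named list of samples; the function name and the toy dataset names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_dataset_out(
    datasets: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_name, train_samples, test_samples) for every dataset.

    Each dataset is held out once for testing while the model would be
    trained on the union of all remaining datasets.
    """
    for held_out in datasets:
        train = [
            sample
            for name, samples in datasets.items()
            if name != held_out
            for sample in samples
        ]
        test = list(datasets[held_out])
        yield held_out, train, test


# Toy example with three hypothetical datasets of image file names.
toy = {"A": ["a1.jpg", "a2.jpg"], "B": ["b1.jpg"], "C": ["c1.jpg"]}
for name, train, test in leave_one_dataset_out(toy):
    print(name, len(train), len(test))
```

In contrast, a traditional split would train and test on disjoint subsets of the same dataset, so the test images share acquisition conditions with the training images.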

Proceedings of the 3rd International Workshop on Reading Music Systems

Jorge Calvo-Zaragoza , Alexander Pacha

Category: Computer Vision | Machine Learning

2022-12-01

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.

Proceedings of the 2nd International Workshop on Reading Music Systems

Jorge Calvo-Zaragoza , Alexander Pacha

Category: Computer Vision | Machine Learning

2022-12-01

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 2nd International Workshop on Reading Music Systems, held in Delft on the 2nd of November 2019.

Computer Vision on X-ray Data in Industrial Production and Security Applications: A survey

Mehdi Rafiei , Jenni Raitoharju , Alexandros Iosifidis

Category: Computer Vision

2022-11-10

X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.

Towards End-to-end Car License Plate Location and Recognition in Unconstrained Scenarios

Shuxin Qin , Sijiang Liu

Category: Computer Vision | Artificial Intelligence | Machine Learning

2020-08-25

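Benefiting from the rapid development of convolutional neural networks, the performance of car license plate detection and recognition has improved considerably. However, most existing methods solve the detection and recognition problems separately and focus on specific scenarios, which hinders deployment in real-world applications. To overcome these challenges, we present an efficient and accurate framework that solves the license plate detection and recognition tasks simultaneously. It is a lightweight, unified deep neural network that can be optimized end-to-end and runs in real time. Specifically, for unconstrained scenarios, an anchor-free method is adopted to efficiently detect the bounding box and four corners of a license plate, which are used to extract and rectify the target region features. Then, a novel convolutional neural network branch is designed to further extract character features without segmentation. Finally, the recognition task is treated as a sequence labeling problem, which is solved with Connectionist Temporal Classification (CTC). Several public datasets, including images collected from different scenarios under various conditions, are chosen for evaluation. Experimental results indicate that the proposed method significantly outperforms previous state-of-the-art methods in both speed and precision.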
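To illustrate how a CTC output is turned into a plate string without character segmentation, here is a minimal sketch of greedy (best-path) CTC decoding. It is a generic textbook procedure, not the authors' implementation; the class layout (index 0 as the blank symbol), the function name, and the toy scores are assumptions made for the example.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str, blank: int = 0) -> str:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    Assumption: class index 0 is the blank symbol and class index i > 0
    corresponds to alphabet[i - 1].
    """
    decoded: List[str] = []
    previous = None
    for scores in frame_scores:
        # Pick the highest-scoring class for this time step.
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Emit a character only when the class changes and is not blank.
        if best != previous and best != blank:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)


# Toy scores over classes [blank, 'A', 'B', '1'] for eight time steps.
# Per-frame argmax is 1, 1, 0, 1, 2, 0, 3, 3.
frames = [
    [0.1, 0.8, 0.05, 0.05],
    [0.1, 0.7, 0.10, 0.10],
    [0.9, 0.05, 0.03, 0.02],
    [0.2, 0.6, 0.10, 0.10],
    [0.1, 0.1, 0.70, 0.10],
    [0.8, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.10, 0.70],
    [0.2, 0.1, 0.10, 0.60],
]
print(ctc_greedy_decode(frames, "AB1"))  # AAB1
```

The blank between the second and fourth frames is what allows the repeated character "A" to be emitted twice; without it, consecutive identical predictions collapse into one.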

Two Decades of Bengali Handwritten Digit Recognition: A Survey

A. B. M. Ashikur Rahman , Md. Bakhtiar Hasan , Sabbir Ahmed , Tasnim Ahmed , Md. Hamjajul Ashmafee , Mohammad Ridwan Kabir , Md. Hasanul Kabir

Category: Computer Vision

2022-06-05

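Handwritten Digit Recognition (HDR) is one of the most challenging tasks in the domain of Optical Character Recognition (OCR). Irrespective of language, HDR has some inherent challenges, which arise mostly from variations in writing style across individuals, variations in writing medium and environment, the inability to maintain the same strokes when writing a digit repeatedly, and so on. In addition, the structural complexity of the digits of a particular language may lead to ambiguous cases in HDR. Over the years, researchers have developed numerous offline and online HDR pipelines in which different image processing techniques are combined with traditional Machine Learning (ML)-based and/or Deep Learning (DL)-based architectures. Although the literature contains extensive review studies on HDR for languages such as English, Arabic, Indian, Farsi, and Chinese, there are few surveys on Bengali HDR (BHDR), and those lack a comprehensive analysis of the challenges, the underlying recognition process, and possible future directions. In this paper, the characteristics and inherent ambiguities of Bengali handwritten digits are analyzed, along with a comprehensive insight into two decades of state-of-the-art datasets and approaches for offline BHDR. Furthermore, several real-life application-specific studies that involve BHDR are discussed in detail. This paper will also serve as a compendium for researchers interested in the science behind offline BHDR, encouraging the exploration of new avenues of relevant research that may lead to better offline recognition of Bengali handwritten digits in different application areas.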

Ocular Recognition Databases and Competitions: A Survey

Luiz A. Zanlorensi , Rayson Laroca , Eduardo Luz , Alceu S. Britto Jr. , Luiz S. Oliveira , David Menotti

Category: Computer Vision

2019-11-21

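The use of the iris and periocular region as biometric traits has been extensively investigated, mainly due to the singularity of the iris features and the use of the periocular region when the image resolution is not sufficient to extract iris information. In addition to providing information about an individual's identity, features extracted from these traits can also be explored to obtain other information, such as the individual's gender, the influence of drug use, the use of contact lenses, and spoofing. This work presents a survey of the databases created for ocular recognition, detailing their protocols and how their images were acquired. We also describe and discuss the most popular ocular recognition competitions (contests), highlighting the submitted algorithms that achieved the best results using only iris features and those that fused iris and periocular region information. Finally, we describe some relevant works applying deep learning techniques to ocular recognition and point out new challenges and future directions. Considering that there are a large number of ocular databases, and that each one is usually designed for a specific problem, we believe this survey can provide a broad overview of the challenges in ocular biometrics.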

1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

Benjamin Kiefer , Matej Kristan , Janez Perš , Lojze Žust , Fabio Poiesi , Fabio Augusto de Alcantara Andrade , Alexandre Bernardino , Matthew Dawkins , Jenni Raitoharju , Yitong Quan

Category: Computer Vision | Artificial Intelligence | Machine Learning | Robotics

2022-11-24

The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.

Applications of Deep Learning in Fish Habitat Monitoring: A Tutorial and Survey

Alzayat Saleh , Marcus Sheaves , Dean Jerry , Mostafa Rahimi Azghadi

Category: Computer Vision

2022-06-11

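Marine ecosystems and their fish habitats are becoming increasingly important due to their integral role in providing a valuable food source and in conservation outcomes. Because they are remote and difficult to access, marine environments and fish habitats are often monitored using underwater cameras. These cameras generate a massive volume of digital data that cannot be analyzed efficiently by current manual processing methods, which involve a human observer. Deep learning (DL) is a cutting-edge AI technology that has demonstrated unprecedented performance in analyzing visual data. Despite its application to a myriad of domains, its use in underwater fish habitat monitoring remains underexplored. In this paper, we provide a tutorial that covers the key concepts of DL and helps the reader gain a high-level understanding of how DL works. The tutorial also explains a step-by-step procedure for developing DL algorithms for challenging applications such as underwater fish monitoring. In addition, we provide a comprehensive survey of key deep learning techniques for fish habitat monitoring, including classification, counting, localization, and segmentation. Furthermore, we survey publicly available underwater fish datasets and compare various DL techniques in the underwater fish monitoring domain. We also discuss some challenges and opportunities in the emerging field of deep learning for fish habitat processing. This paper is written to serve as a tutorial for marine scientists who would like to gain a high-level understanding of DL, develop it for their applications by following our step-by-step tutorial, and see how it is evolving to facilitate their research efforts. At the same time, it is suitable for computer scientists who would like to survey state-of-the-art DL-based methodologies for fish habitat monitoring.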

Region-based Layout Analysis of Music Score Images

Francisco J. Castellanos , Carlos Garrido-Munoz , Antonio Ríos-Vila , Jorge Calvo-Zaragoza

Category: Computer Vision

2022-01-11

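The Layout Analysis (LA) stage is of vital importance to the correct performance of an Optical Music Recognition (OMR) system. It identifies the regions of interest, such as staves or lyrics, which must then be processed in order to transcribe their content. Despite the existence of modern approaches based on deep learning, no exhaustive study of LA in OMR has yet been carried out with regard to the precision of different models, their generalization to different domains or, more importantly, their impact on subsequent stages of the pipeline. This work focuses on filling this gap in the literature by means of an experimental study of different neural architectures, music document types and evaluation scenarios. The need for training data has also led to the proposal of a new semi-synthetic data generation technique that enables LA approaches to be applied efficiently in real scenarios. Our results show that: (i) the choice of the model and its performance are crucial for the entire transcription process; (ii) the metrics commonly used to evaluate the LA stage do not always correlate with the final performance of the OMR system; and (iii) the proposed data generation technique enables state-of-the-art results to be achieved with a limited set of labeled data.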

A Survey on Computer Vision based Human Analysis in the COVID-19 Era

Fevziye Irem Eyiokur , Alperen Kantarcı , Mustafa Ekrem Erakın , Naser Damer , Ferda Ofli , Muhammad Imran , Janez Križaj , Albert Ali Salah , Alexander Waibel , Vitomir Štruc

Category: Computer Vision

2022-11-07

The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here are computer vision techniques that focus on the analysis of people and faces in visual data and have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and recent solutions to mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is provided. Finally, to help advance the field further, a discussion of the main open challenges and future research directions is given.

PACMAN: a framework for pulse oximeter digit detection and reading in a low-resource setting

Chiraphat Boonnag , Wanumaidah Saengmolee , Narongrid Seesawad , Amrest Chinkamol , Saendee Rattanasomrerk , Kanyakorn Veerakanjana , Kamonwan Thanontip , Warissara Limpornchitwilai , Piyalitt Ittichaiwong , Theerawit Wilaiprasitporn

Category: Computer Vision

2022-12-09

In light of the COVID-19 pandemic, patients were required to manually input their daily oxygen saturation (SpO2) and pulse rate (PR) values into a health monitoring system. Unfortunately, such a process tends to be prone to typing errors. Several studies attempted to detect the physiological values from a captured image using optical character recognition (OCR). However, the technology has limited availability and a high cost. Thus, this study aimed to propose a novel framework called PACMAN (Pandemic Accelerated Human-Machine Collaboration) based on low-resource deep learning-based computer vision. We compared state-of-the-art object detection algorithms (scaled YOLOv4, YOLOv5, and YOLOR), as well as commercial OCR tools, for digit recognition on images captured from pulse oximeter displays. All images were derived from crowdsourced data collection with varying quality and alignment. YOLOv5 was the best-performing model in the comparison across all datasets, notably on the correctly orientated image dataset. We further improved the model performance with a digit auto-orientation algorithm and applied a clustering algorithm to extract the SpO2 and PR values. With these additions, the accuracy of YOLOv5 was approximately 81.0-89.5%, an improvement over the model without them. Accordingly, this study highlighted the completion of the PACMAN framework to detect and read digits in real-world datasets. The proposed framework is currently integrated into the patient monitoring system used by hospitals nationwide.

Object Detection with Deep Learning: A Review

Zhong-Qiu Zhao , Peng Zheng , Shou-tao Xu , Xindong Wu

Category:

2018-07-15

Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.

A Survey on Masked Facial Detection Methods and Datasets for Fighting Against COVID-19

Bingshu Wang , Jiangbin Zheng , C. L. Philip Chen

Category: Computer Vision | Machine Learning

2022-01-13

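Coronavirus disease 2019 (COVID-19) has continued to pose a great challenge to the world since its outbreak. To fight the disease, a series of artificial intelligence (AI) techniques have been developed and applied to real-world scenarios such as safety monitoring, disease diagnosis, infection risk assessment, and lesion segmentation of COVID-19 CT scans. The coronavirus epidemic has forced people to wear masks to counteract the transmission of the virus, which also makes it difficult to monitor large groups of people wearing masks. In this paper, we focus primarily on AI techniques for masked facial detection and on related datasets. We survey the recent advances, beginning with descriptions of masked facial detection datasets. Thirteen available datasets are described and discussed in detail. The methods are then roughly categorized into two classes: conventional methods and neural network-based methods. Conventional methods, which account for a small proportion, are usually trained by boosting algorithms with hand-crafted features. Neural network-based methods are further classified into three groups according to the number of processing stages. Representative algorithms are described in detail, coupled with brief descriptions of some typical techniques. Finally, we summarize recent benchmarking results, discuss the limitations of the datasets and methods, and outline future research directions. To our knowledge, this is the first survey of masked facial detection methods and datasets. We hope our survey can provide some help in fighting epidemics.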

Visual and Object Geo-localization: A Comprehensive Survey

Daniel Wilson , Xiaohan Zhang , Waqas Sultani , Safwan Wshah

Category: Computer Vision

2021-12-30

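The concept of geo-localization refers to the process of determining where on Earth some "entity" is located, typically using Global Positioning System (GPS) coordinates. The entity of interest may be an image, a sequence of images, a video, a satellite image, or even objects visible within an image. As massive datasets of GPS-tagged media have rapidly become available due to smartphones and the internet, and deep learning has risen to enhance the performance of machine learning models, the fields of visual and object geo-localization have emerged because of their significant impact on a wide range of applications such as augmented reality, robotics, self-driving vehicles, road maintenance, and 3D reconstruction. This paper provides a comprehensive survey of geo-localization involving images, which covers either determining where an image was captured (image geo-localization) or geo-locating objects within an image (object geo-localization). We provide an in-depth study, including a summary of popular algorithms, a description of proposed datasets, and an analysis of performance results to illustrate the current state of each field.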

An advanced combination of semi-supervised Normalizing Flow & Yolo (YoloNF) to detect and recognize vehicle license plates

Khalid Oublal , Xinyi Dai

Category: Computer Vision | Artificial Intelligence

2022-07-21

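Fully Automatic License Plate Recognition (ALPR) has been a frequent research topic due to its many practical applications. However, many current solutions are still not robust enough in real situations and commonly depend on many constraints. This paper presents a robust and efficient ALPR system based on the state-of-the-art YOLO object detector and normalizing flows. The model uses two new strategies. First, a two-stage network using YOLO and a normalizing flow-based model detects license plates (LPs) and recognizes LPs containing digits and Arabic characters. Second, multi-scale image transformations are implemented to address the problem of YOLO-cropped LP detections that include significant background noise. Furthermore, experiments are conducted on a new dataset with realistic scenarios: we introduce a larger public annotated dataset collected from Moroccan plates. We demonstrate that our proposed model can learn from a small number of samples free of single or multiple characters. The dataset will also be made publicly available to encourage further studies and research on plate detection and recognition.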

Visual Object Tracking in First Person Vision

Matteo Dunnhofer , Antonino Furnari , Giovanni Maria Farinella , Christian Micheloni

Category: Computer Vision

2022-09-27

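The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information to model such interactions effectively. In recent years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used "off-the-shelf" or whether more domain-specific investigations should be carried out. This paper aims to provide answers to such questions. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyzes the performance of 42 algorithms, including generic object trackers and baseline FPV-specific trackers. The analysis is carried out by focusing on different aspects of the FPV setting, introducing new performance measures, and relating the results to FPV-specific tasks. The study is made possible through the introduction of TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing this behavior and point out possible research directions. Despite their difficulties, we show that trackers bring benefits to FPV downstream tasks requiring short-term object tracking. We expect that generic object tracking will gain popularity in FPV as new, FPV-specific methodologies are investigated.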

Detecting Rotated Objects as Gaussian Distributions and Its 3-D Generalization

Xue Yang , Gefan Zhang , Xiaojiang Yang , Yue Zhou , Wentao Wang , Jin Tang , Tao He , Junchi Yan

Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-09-22

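Existing detection methods commonly use a parameterized bounding box (BBox) to model and detect (horizontal) objects, and an additional rotation angle parameter is used for rotated objects. We argue that such a mechanism has fundamental limitations in building an effective regression loss for rotation detection, especially for high-precision detection with high IoU (e.g., 0.75). Instead, we propose to model rotated objects as Gaussian distributions. A direct advantage is that our new regression loss, based on the distance between two Gaussians, e.g., the Kullback-Leibler Divergence (KLD), aligns well with the actual detection performance metric, which is not well addressed in existing methods. Moreover, two bottlenecks, namely boundary discontinuity and the square-like problem, also disappear. We also propose an efficient Gaussian metric-based label assignment strategy to further boost performance. Interestingly, by analyzing the gradients of the BBox parameters under our Gaussian-based KLD loss, we show that these parameters are dynamically updated with interpretable physical meaning, which helps explain the effectiveness of our approach, especially for high-precision detection. We extend our approach from 2-D to 3-D with a tailored algorithm design to handle heading estimation, and experimental results on twelve public datasets (2-D/3-D, aerial/text/face images) with various base detectors show its superiority.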
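The idea of treating a rotated box as a Gaussian and comparing two boxes with the KL divergence can be sketched in plain Python. The conversion used here (mean at the box center, covariance R diag(w²/4, h²/4) Rᵀ) is a common choice and is an assumption of this sketch, as are the function names; the abstract itself does not give the formulas, and the paper's actual loss may apply further transformations to the divergence.

```python
import math
from typing import Tuple

Gaussian = Tuple[Tuple[float, float], Tuple[float, float, float]]


def box_to_gaussian(cx: float, cy: float, w: float, h: float, theta: float) -> Gaussian:
    """Map a rotated box to (mean, covariance) with covariance stored as (sxx, sxy, syy).

    Assumed mapping: Sigma = R(theta) * diag(w^2/4, h^2/4) * R(theta)^T.
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = w * w / 4.0, h * h / 4.0
    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), (sxx, sxy, syy)


def kl_divergence(p: Gaussian, t: Gaussian) -> float:
    """Closed-form KL(p || t) between two 2-D Gaussians."""
    (mpx, mpy), (pxx, pxy, pyy) = p
    (mtx, mty), (txx, txy, tyy) = t
    det_p = pxx * pyy - pxy * pxy
    det_t = txx * tyy - txy * txy
    # Inverse of the symmetric 2x2 covariance of t.
    ixx, ixy, iyy = tyy / det_t, -txy / det_t, txx / det_t
    dx, dy = mpx - mtx, mpy - mty
    mahalanobis = dx * (ixx * dx + ixy * dy) + dy * (ixy * dx + iyy * dy)
    trace = ixx * pxx + 2.0 * ixy * pxy + iyy * pyy
    return 0.5 * (mahalanobis + trace + math.log(det_t / det_p) - 2.0)


# A square box yields the same Gaussian at any rotation, so the angle
# ambiguity of square-like objects does not affect the divergence.
square = box_to_gaussian(0.0, 0.0, 2.0, 2.0, 0.0)
rotated_square = box_to_gaussian(0.0, 0.0, 2.0, 2.0, math.pi / 4)
print(kl_divergence(rotated_square, square))

# Shifting a 4x2 box by one unit along its long axis gives a small positive value.
target = box_to_gaussian(0.0, 0.0, 4.0, 2.0, 0.0)
shifted = box_to_gaussian(1.0, 0.0, 4.0, 2.0, 0.0)
print(kl_divergence(shifted, target))
```

Because the divergence depends on the box only through its mean and covariance, swapping width and height while rotating by 90 degrees also leaves it unchanged, which is one way to see why an angle-boundary discontinuity does not arise in this representation.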

Robust Table Detection and Structure Recognition from Heterogeneous Document Images

Chixiang Ma , Weihong Lin , Lei Sun , Qiang Huo

Category: Computer Vision

2022-03-17

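We introduce a new table detection and structure recognition approach named RobusTabNet to detect the boundaries of tables and reconstruct the cellular structure of each table from heterogeneous document images. For table detection, we propose to use CornerNet as a new region proposal network to generate higher-quality table proposals for Faster R-CNN, which significantly improves the localization accuracy of Faster R-CNN for table detection. Consequently, our table detection approach achieves state-of-the-art performance on three public table detection benchmarks, namely cTDaR TrackA, PubLayNet and IIIT-AR-13K, using only a lightweight ResNet-18 backbone network. Furthermore, we propose a new split-and-merge based table structure recognition approach, in which a novel spatial CNN based separation line prediction module splits each detected table into a grid of cells, and a Grid CNN based cell merging module is applied to recover the spanning cells. Because the spatial CNN module can effectively propagate contextual information across the whole table image, our table structure recognizer can robustly recognize tables with large blank spaces and geometrically distorted (even curved) tables. Thanks to these two techniques, our table structure recognition approach achieves state-of-the-art performance on three public benchmarks, including SciTSR, PubTabNet and cTDaR TrackB2-Modern. Moreover, we further demonstrate the advantages of our approach in recognizing tables with complex structures, large blank spaces, and geometrically distorted or even curved shapes.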
