从空中和卫星图像提取自动化路线图是一个长期存在的挑战。现有算法基于像素级分段,然后是矢量化,或者使用下一个移动预测的迭代图构造。这两种策略都遭受了严重的缺点,特别是高计算资源和不完整的产出。相比之下,我们提出了一种直接在单次通过中缩小最终道路图的方法。关键思想包括组合完全卷积的网络,这些网络负责定位点,例如交叉点,死头和转弯,以及预测这些点之间的链路的图形神经网络。这种策略比迭代方法更有效,并允许我们通过在保持训练端到端的同时消除生成起始位置的需要来简化培训过程。我们评估我们对流行的道路流数据集上现有工作的方法,并实现竞争结果。我们还将速度基准测试,并表明它优于现有的方法。这为嵌入式设备打开了飞行中的可能性。
translated by 谷歌翻译
我们介绍了一种名为RobustAbnet的新表检测和结构识别方法,以检测表的边界并从异质文档图像中重建每个表的细胞结构。为了进行表检测,我们建议将Cornernet用作新的区域建议网络来生成更高质量的表建议,以更快的R-CNN,这显着提高了更快的R-CNN的定位准确性以进行表检测。因此,我们的表检测方法仅使用轻巧的RESNET-18骨干网络,在三个公共表检测基准(即CTDAR TRACKA,PUBLAYNET和IIIT-AR-13K)上实现最新性能。此外,我们提出了一种新的基于分裂和合并的表结构识别方法,其中提出了一个新型的基于CNN的新空间CNN分离线预测模块将每个检测到的表分为单元格,并且基于网格CNN的CNN合并模块是应用用于恢复生成细胞。由于空间CNN模块可以有效地在整个表图像上传播上下文信息,因此我们的表结构识别器可以坚固地识别具有较大的空白空间和几何扭曲(甚至弯曲)表的表。得益于这两种技术,我们的表结构识别方法在包括SCITSR,PubTabnet和CTDAR TrackB2-Modern在内的三个公共基准上实现了最先进的性能。此外,我们进一步证明了我们方法在识别具有复杂结构,大空间以及几何扭曲甚至弯曲形状的表上的表格上的优势。
translated by 谷歌翻译
Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.
translated by 谷歌翻译
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
translated by 谷歌翻译
Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.
translated by 谷歌翻译
Graph classification is an important area in both modern research and industry. Multiple applications, especially in chemistry and novel drug discovery, encourage rapid development of machine learning models in this area. To keep up with the pace of new research, proper experimental design, fair evaluation, and independent benchmarks are essential. Design of strong baselines is an indispensable element of such works. In this thesis, we explore multiple approaches to graph classification. We focus on Graph Neural Networks (GNNs), which emerged as a de facto standard deep learning technique for graph representation learning. Classical approaches, such as graph descriptors and molecular fingerprints, are also addressed. We design fair evaluation experimental protocol and choose proper datasets collection. This allows us to perform numerous experiments and rigorously analyze modern approaches. We arrive to many conclusions, which shed new light on performance and quality of novel algorithms. We investigate application of Jumping Knowledge GNN architecture to graph classification, which proves to be an efficient tool for improving base graph neural network architectures. Multiple improvements to baseline models are also proposed and experimentally verified, which constitutes an important contribution to the field of fair model comparison.
translated by 谷歌翻译
图提供了一种自然的方式来制定多个对象跟踪(MOT)和多个对象跟踪和分割(MOTS),逐个检测范式中。但是,他们还引入了学习方法的主要挑战,因为定义可以在这种结构化领域运行的模型并不是微不足道的。在这项工作中,我们利用MOT的经典网络流程公式来定义基于消息传递网络(MPN)的完全微分框架。通过直接在图形域上操作,我们的方法可以在整个检测和利用上下文特征上全球推理。然后,它共同预测了数据关联问题的最终解决方案和场景中所有对象的分割掩码,同时利用这两个任务之间的协同作用。我们在几个公开可用的数据集中获得跟踪和细分的最新结果。我们的代码可在github.com/ocetintas/mpntrackseg上找到。
translated by 谷歌翻译
机器学习,在深入学习的进步,在过去分析时间序列方面表现出巨大的潜力。但是,在许多情况下,可以通过将其结合到学习方法中可能改善预测的附加信息。这对于由例如例如传感器位置的传感器网络而产生的数据至关重要。然后,可以通过通过图形结构建模,以及顺序(时间)信息来利用这种空间信息。适应深度学习的最新进展在各种图形相关任务中表明了有希望的潜力。但是,这些方法尚未在很大程度上适用于时间序列相关任务。具体而言,大多数尝试基本上围绕空间 - 时间图形神经网络巩固了时间序列预测的小序列长度。通常,这些架构不适合包含大数据序列的回归或分类任务。因此,在这项工作中,我们使用图形神经网络的好处提出了一种能够在多变量时间序列回归任务中处理这些长序列的架构。我们的模型在包含地震波形的两个地震数据集上进行测试,其中目标是预测在一组站的地面摇动的强度测量。我们的研究结果表明了我们的方法的有希望的结果,这是深入讨论的额外消融研究。
translated by 谷歌翻译
在过去的几年中,已经开发了图形绘图技术,目的是生成美学上令人愉悦的节点链接布局。最近,利用可区分损失功能的使用已为大量使用梯度下降和相关优化算法铺平了道路。在本文中,我们提出了一个用于开发图神经抽屉(GND)的新框架,即依靠神经计算来构建有效且复杂的图的机器。 GND是图形神经网络(GNN),其学习过程可以由任何提供的损失函数(例如图形图中通常使用的损失函数)驱动。此外,我们证明,该机制可以由通过前馈神经网络计算的损失函数来指导,并根据表达美容特性的监督提示,例如交叉边缘的最小化。在这种情况下,我们表明GNN可以通过位置功能很好地丰富与未标记的顶点处理。我们通过为边缘交叉构建损失函数来提供概念验证,并在提议的框架下工作的不同GNN模型之间提供定量和定性的比较。
translated by 谷歌翻译
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has become even thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
translated by 谷歌翻译
现代组织病理学图像分析依赖于细胞结构的分割,以导出生物医学研究和临床诊断所需的定量度量。最先进的深度学习方法主要在分割中施加卷积层,通常以特定的实验配置高度定制;通常无法概括到未知数据。随着经典卷积层的模型容量受到有限的学习内核的限制,我们的方法使用图像的图形表示,并在多个放大率上侧重于节点转换。我们提出了一种新的细胞核的语义分段建筑,对实验配置的差异如染色和细胞类型的变异。该结构包括基于残留图注意层的新型神经塑形图注意网络,并同时优化表示组织病理学图像的多倍倍率水平的图形结构。通过投影生成节点特征的图形结构的修改对于架构是图形神经网络本身的重要性。它确定可能的消息流和关键属性以优化关注,图形结构和节点更新在平衡放大率损耗中。在实验评估中,我们的框架优于最先进的神经网络的整合,通常需要一部分神经元,并为新的核数据集进行分割的新标准。
translated by 谷歌翻译
基于无人机(UAV)基于无人机的视觉对象跟踪已实现了广泛的应用,并且由于其多功能性和有效性而引起了智能运输系统领域的越来越多的关注。作为深度学习革命性趋势的新兴力量,暹罗网络在基于无人机的对象跟踪中闪耀,其准确性,稳健性和速度有希望的平衡。由于开发了嵌入式处理器和深度神经网络的逐步优化,暹罗跟踪器获得了广泛的研究并实现了与无人机的初步组合。但是,由于无人机在板载计算资源和复杂的现实情况下,暹罗网络的空中跟踪仍然在许多方面都面临严重的障碍。为了进一步探索基于无人机的跟踪中暹罗网络的部署,这项工作对前沿暹罗跟踪器进行了全面的审查,以及使用典型的无人机板载处理器进行评估的详尽无人用分析。然后,进行板载测试以验证代表性暹罗跟踪器在现实世界无人机部署中的可行性和功效。此外,为了更好地促进跟踪社区的发展,这项工作分析了现有的暹罗跟踪器的局限性,并进行了以低弹片评估表示的其他实验。最后,深入讨论了基于无人机的智能运输系统的暹罗跟踪的前景。领先的暹罗跟踪器的统一框架,即代码库及其实验评估的结果,请访问https://github.com/vision4robotics/siamesetracking4uav。
translated by 谷歌翻译
我们提出了一种新型的图形神经网络(GNN)方法,用于高通量显微镜视频中的细胞跟踪。通过将整个延时序列建模为直接图,其中细胞实例由其节点及其边缘表示,我们通过查找图中的最大路径来提取整个细胞轨迹。这是由纳入端到端深度学习框架中的几个关键贡献来完成的。我们利用深度度量学习算法来提取细胞特征向量,以区分不同生物细胞的实例并组装相同的细胞实例。我们引入了一种新的GNN块类型,该类型可以对节点和边缘特征向量进行相互更新,从而促进基础消息传递过程。消息传递概念的范围由GNN块的数量确定,这是至关重要的,因为它可以在连续的框架中实现节点和边缘之间的“节点和边缘”之间的“流动”。最后,我们解决了边缘分类问题,并使用已确定的活动边缘来构建单元格的轨道和谱系树。我们通过将其应用于不同细胞类型,成像设置和实验条件的2D和3D数据集,来证明所提出的细胞跟踪方法的强度。我们表明,我们的框架在大多数评估的数据集上都优于当前最新方法。该代码可在我们的存储库中获得:https://github.com/talbenha/cell-tracker-gnn。
translated by 谷歌翻译
深神网络的对象探测器正在不断发展,并用于多种应用程序,每个应用程序都有自己的要求集。尽管关键安全应用需要高准确性和可靠性,但低延迟任务需要资源和节能网络。不断提出了实时探测器,在高影响现实世界中是必需的,但是它们过分强调了准确性和速度的提高,而其他功能(例如多功能性,鲁棒性,资源和能源效率)则被省略。现有网络的参考基准不存在,设计新网络的标准评估指南也不存在,从而导致比较模棱两可和不一致的比较。因此,我们对广泛的数据集进行了多个实时探测器(基于锚点,关键器和变压器)的全面研究,并报告了一系列广泛指标的结果。我们还研究了变量,例如图像大小,锚固尺寸,置信阈值和架构层对整体性能的影响。我们分析了检测网络的鲁棒性,以防止分配变化,自然腐败和对抗性攻击。此外,我们提供了校准分析来评估预测的可靠性。最后,为了强调现实世界的影响,我们对自动驾驶和医疗保健应用进行了两个独特的案例研究。为了进一步衡量关键实时应用程序中网络的能力,我们报告了在Edge设备上部署检测网络后的性能。我们广泛的实证研究可以作为工业界对现有网络做出明智选择的指南。我们还希望激发研究社区的设计和评估网络的新方向,该网络着重于更大而整体的概述,以实现深远的影响。
translated by 谷歌翻译
Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A topdown architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art singlemodel results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 6 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.
translated by 谷歌翻译
大规模矢量映射对于运输,城市规划,调查和人口普查很重要。我们提出了GraphMapper,这是从卫星图像中提取端到端向量图的统一框架。我们的关键思想是一种新颖的统一表示,称为“原始图”的不同拓扑的形状,这是一组形状原语及其成对关系矩阵。然后,我们将向量形状的预测,正则化和拓扑重构转换为独特的原始图学习问题。具体而言,GraphMapper是一个基于多头注意的全局形状上下文建模的通用原始图形学习网络。开发了一种嵌入式空间排序方法,用于准确的原始关系建模。我们从经验上证明了GraphMapper对两个具有挑战性的映射任务的有效性,即建立足迹正则化和道路网络拓扑重建。我们的模型在公共基准上的两项任务中都优于最先进的方法。所有代码将公开可用。
translated by 谷歌翻译
建模长期依赖关系对于理解计算机视觉中的任务至关重要。尽管卷积神经网络(CNN)在许多视觉任务中都表现出色,但由于它们通常由当地核层组成,因此它们仍然限制捕获长期结构化关系。但是,完全连接的图(例如变形金刚中的自我发项操作)对这种建模是有益的,但是,其计算开销非常有用。在本文中,我们提出了一个动态图形消息传递网络,与建模完全连接的图形相比,该网络大大降低了计算复杂性。这是通过在图表中自适应采样节点(以输入为条件)来实现的,以传递消息传递。基于采样节点,我们动态预测节点依赖性滤波器权重和亲和力矩阵,以在它们之间传播信息。这种公式使我们能够设计一个自我发挥的模块,更重要的是,我们将基于变压器的新骨干网络用于图像分类预处理,并用于解决各种下游任务(对象检测,实例和语义细分)。使用此模型,我们在四个不同任务上的强,最先进的基线方面显示出显着改进。我们的方法还优于完全连接的图形,同时使用较少的浮点操作和参数。代码和型号将在https://github.com/fudan-zvg/dgmn2上公开提供。
translated by 谷歌翻译
In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises of a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of December 2022, the GitHub repository has reached 2,000 stars and 380 forks, which demonstrates the utility of the proposed open-source framework through the wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics, an additional medium-sized molecular dataset AQSOL, similar to the popular ZINC, but with a real-world measured chemical target, and discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest of exploring more powerful PE for Transformers and GNNs in a robust experimental setting.
translated by 谷歌翻译
基于1-HOP邻居之间的消息传递(MP)范式交换信息的图形神经网络(GNN),以在每一层构建节点表示。原则上,此类网络无法捕获在图形上学习给定任务的可能或必需的远程交互(LRI)。最近,人们对基于变压器的图的开发产生了越来越多的兴趣,这些方法可以考虑超出原始稀疏结构以外的完整节点连接,从而实现了LRI的建模。但是,仅依靠1跳消息传递的MP-gnn与位置特征表示形式结合使用时通常在几个现有的图形基准中表现得更好,因此,限制了Transferter类似体系结构的感知效用和排名。在这里,我们介绍了5个图形学习数据集的远程图基准(LRGB):Pascalvoc-SP,Coco-SP,PCQM-Contact,Peptides-Func和肽结构,可以说需要LRI推理以在给定的任务中实现强大的性能。我们基准测试基线GNN和Graph Transformer网络,以验证捕获长期依赖性的模型在这些任务上的性能明显更好。因此,这些数据集适用于旨在捕获LRI的MP-GNN和Graph Transformer架构的基准测试和探索。
translated by 谷歌翻译
X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.
translated by 谷歌翻译