语义分割是图像的像素明智标记。由于在像素级别定义了问题,因此确定图像类标签是不可接受的,而是在原始图像像素分辨率下本地化它们是必要的。通过卷积神经网络(CNN)在创建语义,高级和分层图像特征方面的非凡能力推动;在过去十年中提出了几种基于深入的学习的2D语义分割方法。在本调查中,我们主要关注最近的语义细分科学发展,特别是在使用2D图像的基于深度学习的方法。我们开始分析了对2D语义分割的公共图像集和排行榜,概述了性能评估中使用的技术。在研究现场的演变时,我们按时间顺序分类为三个主要时期,即预先和早期的深度学习时代,完全卷积的时代和后FCN时代。我们在技术上分析了解决领域的基本问题的解决方案,例如细粒度的本地化和规模不变性。在借阅我们的结论之前,我们提出了一张来自所有提到的时代的方法表,每个方法都概述了他们对该领域的贡献。我们通过讨论现场当前的挑战以及他们已经解决的程度来结束调查。
translated by 谷歌翻译
Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.
translated by 谷歌翻译
Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.
translated by 谷歌翻译
深度学习已被广​​泛用于医学图像分割,并且录制了录制了该领域深度学习的成功的大量论文。在本文中,我们使用深层学习技术对医学图像分割的全面主题调查。本文进行了两个原创贡献。首先,与传统调查相比,直接将深度学习的文献分成医学图像分割的文学,并为每组详细介绍了文献,我们根据从粗略到精细的多级结构分类目前流行的文献。其次,本文侧重于监督和弱监督的学习方法,而不包括无监督的方法,因为它们在许多旧调查中引入而且他们目前不受欢迎。对于监督学习方法,我们分析了三个方面的文献:骨干网络的选择,网络块的设计,以及损耗功能的改进。对于虚弱的学习方法,我们根据数据增强,转移学习和交互式分割进行调查文献。与现有调查相比,本调查将文献分类为比例不同,更方便读者了解相关理由,并将引导他们基于深度学习方法思考医学图像分割的适当改进。
translated by 谷歌翻译
Semantic segmentation works on the computer vision algorithm for assigning each pixel of an image into a class. The task of semantic segmentation should be performed with both accuracy and efficiency. Most of the existing deep FCNs yield to heavy computations and these networks are very power hungry, unsuitable for real-time applications on portable devices. This project analyzes current semantic segmentation models to explore the feasibility of applying these models for emergency response during catastrophic events. We compare the performance of real-time semantic segmentation models with non-real-time counterparts constrained by aerial images under oppositional settings. Furthermore, we train several models on the Flood-Net dataset, containing UAV images captured after Hurricane Harvey, and benchmark their execution on special classes such as flooded buildings vs. non-flooded buildings or flooded roads vs. non-flooded roads. In this project, we developed a real-time UNet based model and deployed that network on Jetson AGX Xavier module.
translated by 谷歌翻译
视频分析的图像分割在不同的研究领域起着重要作用,例如智能城市,医疗保健,计算机视觉和地球科学以及遥感应用。在这方面,最近致力于发展新的细分策略;最新的杰出成就之一是Panoptic细分。后者是由语义和实例分割的融合引起的。明确地,目前正在研究Panoptic细分,以帮助获得更多对视频监控,人群计数,自主驾驶,医学图像分析的图像场景的更细致的知识,以及一般对场景更深入的了解。为此,我们介绍了本文的首次全面审查现有的Panoptic分段方法,以获得作者的知识。因此,基于所采用的算法,应用场景和主要目标的性质,执行现有的Panoptic技术的明确定义分类。此外,讨论了使用伪标签注释新数据集的Panoptic分割。继续前进,进行消融研究,以了解不同观点的Panoptic方法。此外,讨论了适合于Panoptic分割的评估度量,并提供了现有解决方案性能的比较,以告知最先进的并识别其局限性和优势。最后,目前对主题技术面临的挑战和吸引不久的将来吸引相当兴趣的未来趋势,可以成为即将到来的研究研究的起点。提供代码的文件可用于:https://github.com/elharroussomar/awesome-panoptic-egation
translated by 谷歌翻译
视频分割,即将视频帧分组到多个段或对象中,在广泛的实际应用中扮演关键作用,例如电影中的视觉效果辅助,自主驾驶中的现场理解,以及视频会议中的虚拟背景创建,名称一些。最近,由于计算机愿景中的联系复兴,一直存在众多深度学习的方法,这一直专用于视频分割并提供引人注目的性能。在这项调查中,通过引入各自的任务设置,背景概念,感知需要,开发历史,以及开发历史,综合审查这一领域的两种基本研究,即在视频和视频语义分割中,即视频和视频语义分割中的通用对象分段(未知类别)。主要挑战。我们还提供关于两种方法和数据集的代表文学的详细概述。此外,我们在基准数据集中呈现了审查方法的定量性能比较。最后,我们指出了这一领域的一套未解决的开放问题,并提出了进一步研究的可能机会。
translated by 谷歌翻译
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has become even thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
translated by 谷歌翻译
深度学习属于人工智能领域,机器执行通常需要某种人类智能的任务。类似于大脑的基本结构,深度学习算法包括一种人工神经网络,其类似于生物脑结构。利用他们的感官模仿人类的学习过程,深入学习网络被送入(感官)数据,如文本,图像,视频或声音。这些网络在不同的任务中优于最先进的方法,因此,整个领域在过去几年中看到了指数增长。这种增长在过去几年中每年超过10,000多种出版物。例如,只有在医疗领域中的所有出版物中覆盖的搜索引擎只能在Q3 2020中覆盖所有出版物的子集,用于搜索术语“深度学习”,其中大约90%来自过去三年。因此,对深度学习领域的完全概述已经不可能在不久的将来获得,并且在不久的将来可能会难以获得难以获得子场的概要。但是,有几个关于深度学习的综述文章,这些文章专注于特定的科学领域或应用程序,例如计算机愿景的深度学习进步或在物体检测等特定任务中进行。随着这些调查作为基础,这一贡献的目的是提供对不同科学学科的深度学习的第一个高级,分类的元调查。根据底层数据来源(图像,语言,医疗,混合)选择了类别(计算机愿景,语言处理,医疗信息和其他工程)。此外,我们还审查了每个子类别的常见架构,方法,专业,利弊,评估,挑战和未来方向。
translated by 谷歌翻译
X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.
translated by 谷歌翻译
组织学图像中核和腺体的实例分割是用于癌症诊断,治疗计划和生存分析的计算病理学工作流程中的重要一步。随着现代硬件的出现,大规模质量公共数据集的最新可用性以及社区组织的宏伟挑战已经看到了自动化方法的激增,重点是特定领域的挑战,这对于技术进步和临床翻译至关重要。在这项调查中,深入分析了过去五年(2017-2022)中发表的原子核和腺体实例细分的126篇论文,进行了深入分析,讨论了当前方法的局限性和公开挑战。此外,提出了潜在的未来研究方向,并总结了最先进方法的贡献。此外,还提供了有关公开可用数据集的概括摘要以及关于说明每种挑战的最佳性能方法的巨大挑战的详细见解。此外,我们旨在使读者现有研究的现状和指针在未来的发展方向上开发可用于临床实践的方法,从而可以改善诊断,分级,预后和癌症的治疗计划。据我们所知,以前没有工作回顾了朝向这一方向的组织学图像中的实例细分。
translated by 谷歌翻译
海洋生态系统及其鱼类栖息地越来越重要,因为它们在提供有价值的食物来源和保护效果方面的重要作用。由于它们的偏僻且难以接近自然,因此通常使用水下摄像头对海洋环境和鱼类栖息地进行监测。这些相机产生了大量数字数据,这些数据无法通过当前的手动处理方法有效地分析,这些方法涉及人类观察者。 DL是一种尖端的AI技术,在分析视觉数据时表现出了前所未有的性能。尽管它应用于无数领域,但仍在探索其在水下鱼类栖息地监测中的使用。在本文中,我们提供了一个涵盖DL的关键概念的教程,该教程可帮助读者了解对DL的工作原理的高级理解。该教程还解释了一个逐步的程序,讲述了如何为诸如水下鱼类监测等挑战性应用开发DL算法。此外,我们还提供了针对鱼类栖息地监测的关键深度学习技术的全面调查,包括分类,计数,定位和细分。此外,我们对水下鱼类数据集进行了公开调查,并比较水下鱼类监测域中的各种DL技术。我们还讨论了鱼类栖息地加工深度学习的新兴领域的一些挑战和机遇。本文是为了作为希望掌握对DL的高级了解,通过遵循我们的分步教程而为其应用开发的海洋科学家的教程,并了解如何发展其研究,以促进他们的研究。努力。同时,它适用于希望调查基于DL的最先进方法的计算机科学家,以进行鱼类栖息地监测。
translated by 谷歌翻译
随着深度学习方法的进步,如深度卷积神经网络,残余神经网络,对抗网络的进步。 U-Net架构最广泛利用生物医学图像分割,以解决目标区域或子区域的识别和检测的自动化。在最近的研究中,基于U-Net的方法在不同应用中显示了最先进的性能,以便在脑肿瘤,肺癌,阿尔茨海默,乳腺癌等疾病的早期诊断和治疗中发育计算机辅助诊断系统等,使用各种方式。本文通过描述U-Net框架来提出这些方法的成功,然后通过执行1)型号的U-Net变体进行综合分析,2)模特内分类,建立更好的见解相关的挑战和解决方案。此外,本文还强调了基于U-Net框架在持续的大流行病,严重急性呼吸综合征冠状病毒2(SARS-COV-2)中的贡献也称为Covid-19。最后,分析了这些U-Net变体的优点和相似性以及生物医学图像分割所涉及的挑战,以发现该领域的未来未来的研究方向。
translated by 谷歌翻译
语义分割是将类标签分配给图像中每个像素的问题,并且是自动车辆视觉堆栈的重要组成部分,可促进场景的理解和对象检测。但是,许多表现最高的语义分割模型非常复杂且笨拙,因此不适合在计算资源有限且低延迟操作的板载自动驾驶汽车平台上部署。在这项调查中,我们彻底研究了旨在通过更紧凑,更有效的模型来解决这种未对准的作品,该模型能够在低内存嵌入式系统上部署,同时满足实时推理的限制。我们讨论了该领域中最杰出的作品,根据其主要贡献将它们置于分类法中,最后我们评估了在一致的硬件和软件设置下,所讨论模型的推理速度,这些模型代表了具有高端的典型研究环境GPU和使用低内存嵌入式GPU硬件的现实部署方案。我们的实验结果表明,许多作品能够在资源受限的硬件上实时性能,同时说明延迟和准确性之间的一致权衡。
translated by 谷歌翻译
In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
translated by 谷歌翻译
Human parsing aims to partition humans in image or video into multiple pixel-level semantic parts. In the last decade, it has gained significantly increased interest in the computer vision community and has been utilized in a broad range of practical applications, from security monitoring, to social media, to visual special effects, just to name a few. Although deep learning-based human parsing solutions have made remarkable achievements, many important concepts, existing challenges, and potential research directions are still confusing. In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing, by introducing their respective task settings, background concepts, relevant problems and applications, representative literature, and datasets. We also present quantitative performance comparisons of the reviewed methods on benchmark datasets. Additionally, to promote sustainable development of the community, we put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research through universal, concise, and extensible solutions. Finally, we point out a set of under-investigated open issues in this field and suggest new directions for future study. We also provide a regularly updated project page, to continuously track recent developments in this fast-advancing field: https://github.com/soeaver/awesome-human-parsing.
translated by 谷歌翻译
Australian Centre for Robotic Vision {guosheng.lin;anton.milan;chunhua.shen;
translated by 谷歌翻译
现有的计算机视觉系统可以与人类竞争,以理解物体的可见部分,但在描绘部分被遮挡物体的无形部分时,仍然远远远远没有达到人类。图像Amodal的完成旨在使计算机具有类似人类的Amodal完成功能,以了解完整的对象,尽管该对象被部分遮住。这项调查的主要目的是对图像Amodal完成领域的研究热点,关键技术和未来趋势提供直观的理解。首先,我们对这个新兴领域的最新文献进行了全面的评论,探讨了图像Amodal完成中的三个关键任务,包括Amodal形状完成,Amodal外观完成和订单感知。然后,我们检查了与图像Amodal完成有关的流行数据集及其共同的数据收集方法和评估指标。最后,我们讨论了现实世界中的应用程序和未来的研究方向,以实现图像的完成,从而促进了读者对现有技术和即将到来的研究趋势的挑战的理解。
translated by 谷歌翻译
深度学习技术导致了通用对象检测领域的显着突破,近年来产生了很多场景理解的任务。由于其强大的语义表示和应用于场景理解,场景图一直是研究的焦点。场景图生成(SGG)是指自动将图像映射到语义结构场景图中的任务,这需要正确标记检测到的对象及其关系。虽然这是一项具有挑战性的任务,但社区已经提出了许多SGG方法并取得了良好的效果。在本文中,我们对深度学习技术带来了近期成就的全面调查。我们审查了138个代表作品,涵盖了不同的输入方式,并系统地将现有的基于图像的SGG方法从特征提取和融合的角度进行了综述。我们试图通过全面的方式对现有的视觉关系检测方法进行连接和系统化现有的视觉关系检测方法,概述和解释SGG的机制和策略。最后,我们通过深入讨论当前存在的问题和未来的研究方向来完成这项调查。本调查将帮助读者更好地了解当前的研究状况和想法。
translated by 谷歌翻译
语义分割在广泛的计算机视觉应用中起着基本作用,提供了全球对图像​​的理解的关键信息。然而,最先进的模型依赖于大量的注释样本,其比在诸如图像分类的任务中获得更昂贵的昂贵的样本。由于未标记的数据替代地获得更便宜,因此无监督的域适应达到了语义分割社区的广泛成功并不令人惊讶。本调查致力于总结这一令人难以置信的快速增长的领域的五年,这包含了语义细分本身的重要性,以及将分段模型适应新环境的关键需求。我们提出了最重要的语义分割方法;我们对语义分割的域适应技术提供了全面的调查;我们揭示了多域学习,域泛化,测试时间适应或无源域适应等较新的趋势;我们通过描述在语义细分研究中最广泛使用的数据集和基准测试来结束本调查。我们希望本调查将在学术界和工业中提供具有全面参考指导的研究人员,并有助于他们培养现场的新研究方向。
translated by 谷歌翻译