Smart Paper Notes

Panoptic Segmentation Meets Remote Sensing

Osmar Luiz Ferreira de Carvalho, Osmar Abílio de Carvalho Júnior, Cristiano Rosa e Silva, Anesmar Olino de Albuquerque, Nickolas Castro Santana, Dibio Leandro Borges, Roberto Arnaldo Trancoso Gomes, Renato Fontes Guimarães

Categories: Computer Vision | Artificial Intelligence

2021-11-23

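Panoptic segmentation combines instance and semantic predictions, allowing the simultaneous detection of "things" and "stuff". Effectively approaching panoptic segmentation in remotely sensed data can be auspicious in many challenging problems, since it allows continuous mapping and specific target counting. Several difficulties have prevented the growth of this task in remote sensing: (a) most algorithms are designed for traditional images, (b) image labels must encompass both "things" and "stuff" classes, and (c) the annotation format is complex. Thus, aiming to solve these problems and increase the operability of panoptic segmentation in remote sensing, this study has five objectives: (1) create a novel data preparation pipeline for panoptic segmentation, (2) propose annotation conversion software to generate panoptic annotations, (3) propose a novel dataset on urban areas, (4) modify Detectron2 for the task, and (5) evaluate the difficulties of this task in the urban setting. We used an aerial image with a 0.24-meter spatial resolution, considering 14 classes. Our pipeline considers three image inputs, and the proposed software uses point shapefiles to create samples in the COCO format. Our study generated 3,400 samples with 512x512 pixel dimensions. We used Panoptic-FPN with two backbones (ResNet-50 and ResNet-101), and the model evaluation considered semantic, instance, and panoptic metrics. We obtained 93.9, 47.7, and 64.9 for the mean IoU, box AP, and PQ, respectively. Our study presents the first effective pipeline for panoptic segmentation in remote sensing and an extensive database that other researchers can use to address other data or related problems requiring a thorough scene understanding.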
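
The mean IoU figure reported above can be illustrated with a small sketch. This is not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes.

    pred and target are equal-length sequences of integer class labels
    (one label per pixel). Classes that appear in neither sequence are
    skipped so they do not distort the average (an assumption here).
    """
    ious = []
    for cls in range(num_classes):
        # Pixels where both prediction and ground truth are this class.
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Pixels where at least one of them is this class.
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Hypothetical six-pixel example with three classes.
    target = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, which the function averages.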

Bounding Box-Free Instance Segmentation Using Semi-Supervised Learning for Generating a City-Scale Vehicle Dataset

Osmar Luiz Ferreira de Carvalho, Osmar Abílio de Carvalho Júnior, Anesmar Olino de Albuquerque, Nickolas Castro Santana, Dibio Leandro Borges, Roberto Arnaldo Trancoso Gomes, Renato Fontes Guimarães

Categories: Computer Vision | Artificial Intelligence

2021-11-23

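Vehicle classification is a hot computer vision topic, with studies ranging from ground-view to top-view imagery. In remote sensing, top-view images allow for understanding city patterns, vehicle concentration, traffic management, and more. However, pixel-wise classification faces some difficulties: (a) most vehicle classification studies use object detection methods, and most publicly available datasets are designed for that task, (b) creating instance segmentation datasets is laborious, and (c) traditional instance segmentation methods underperform on this task because the objects are small. Thus, the objectives of the present research are to: (1) propose a novel semi-supervised iterative learning approach using GIS software, (2) propose a box-free instance segmentation approach, and (3) provide a city-scale vehicle dataset. The iterative learning procedure is: (1) label a small number of vehicles, (2) train on those samples, (3) use the model to classify the entire image, (4) convert the image prediction into a polygon shapefile, (5) correct some areas with errors and include them in the training data, and (6) repeat until the results are satisfactory. To separate instances, we considered vehicle interiors and vehicle borders, and the DL model was a U-Net with an EfficientNet-B7 backbone. When the borders are removed, the vehicle interiors become isolated, allowing for unique object identification. To recover the deleted 1-pixel borders, we propose a simple method that expands each prediction. The results show better pixel-wise metrics than Mask-RCNN (82% against 67% in IoU). In the per-object analysis, the overall accuracy, precision, and recall were greater than 90%. This pipeline applies to any remote sensing target and is very efficient for segmentation and dataset generation.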
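
The border-removal idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class encoding (0 = background, 1 = vehicle interior, 2 = vehicle border), the 4-connected labeling, the function names, and the rule that an unlabeled pixel takes the first labeled neighbour found are all assumptions.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected components of a binary grid (list of lists of 0/1)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                current += 1
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current


def expand_one_pixel(labels):
    """Grow each instance by one pixel into unlabeled 8-neighbours."""
    h, w = len(labels), len(labels[0])
    grown = [row[:] for row in labels]
    for r in range(h):
        for c in range(w):
            if labels[r][c]:
                continue  # already part of an instance
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = r + dy, c + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx]:
                        grown[r][c] = labels[ny][nx]  # first neighbour wins
                        break
                if grown[r][c]:
                    break
    return grown


# Hypothetical prediction: two vehicles that touch through a shared border.
seg = [
    [2, 2, 2, 2, 2, 2, 2],
    [2, 1, 1, 2, 1, 1, 2],
    [2, 1, 1, 2, 1, 1, 2],
    [2, 2, 2, 2, 2, 2, 2],
]
interior = [[1 if v == 1 else 0 for v in row] for row in seg]
instances, count = label_components(interior)  # borders dropped: interiors are isolated
recovered = expand_one_pixel(instances)        # give each instance its border back
```

Note that this sketch grows instances into any unlabeled neighbour; restricting growth to pixels predicted as border would be a reasonable variation.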

Panoptic Segmentation: A Review

Omar Elharrouss, Somaya Al-Maadeed, Nandhini Subramanian, Najmath Ottakath, Noor Almaadeed, Yassine Himeur

Categories: Computer Vision

2021-11-19

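Image segmentation for video analysis plays an essential role in different research fields such as smart cities, healthcare, computer vision, geoscience, and remote sensing applications. In this regard, significant effort has recently been devoted to developing novel segmentation strategies; one of the latest outstanding achievements is panoptic segmentation, which results from the fusion of semantic and instance segmentation. Specifically, panoptic segmentation is currently under study to help gain a more nuanced knowledge of image scenes for video surveillance, crowd counting, autonomous driving, and medical image analysis, and a deeper understanding of scenes in general. To that end, we present in this paper what is, to the best of the authors' knowledge, the first comprehensive review of existing panoptic segmentation methods. Accordingly, a well-defined taxonomy of existing panoptic techniques is provided based on the nature of the adopted algorithms, application scenarios, and primary objectives. Moreover, the use of panoptic segmentation for annotating new datasets by pseudo-labeling is discussed. Ablation studies are then carried out to understand the panoptic methods from different perspectives. In addition, evaluation metrics suitable for panoptic segmentation are discussed, and a comparison of the performance of existing solutions is provided to inform the state of the art and identify their limitations and strengths. Lastly, the current challenges the technology faces and the future trends expected to attract considerable interest in the near future are elaborated, which can serve as a starting point for upcoming research studies. The papers provided with code are available at: https://github.com/elharroussomar/awesome-panoptic-egation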

Image Segmentation Using Deep Learning: A Survey

Shervin Minaee, Yuri Boykov, Fatih Porikli, Antonio Plaza, Nasser Kehtarnavaz, Demetri Terzopoulos

Categories:

2020-01-15

Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial number of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths, and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.

OmniCity: Omnipotent City Understanding with Multi-level and Multi-view Images

Weijia Li, Yawen Lai, Linning Xu, Yuanbo Xiangli, Jinhua Yu, Conghui He, Gui-Song Xia, Dahua Lin

Categories: Computer Vision

2022-08-01

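This paper presents OmniCity, a new dataset for omnipotent city understanding from multi-level and multi-view images. More precisely, OmniCity contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well aligned and collected from 25K geo-locations in New York City. To alleviate the substantial pixel-wise annotation effort, we propose an efficient street-view image annotation pipeline that leverages the existing label maps of the satellite view and the transformation relations between different views (satellite, panorama, and mono-view). With the new OmniCity dataset, we provide benchmarks for a variety of tasks, including building footprint extraction, height estimation, and building plane/instance/fine-grained segmentation. We also analyze the impact of view on each task, the performance of different models, the limitations of existing methods, and so on. Compared with existing multi-level and multi-view benchmarks, OmniCity contains a larger number of images with richer annotation types and more views, provides more baseline results obtained from state-of-the-art models, and introduces a novel task of fine-grained building instance segmentation on street-level panorama images. Moreover, OmniCity provides new problem settings for existing tasks, such as cross-view matching, synthesis, segmentation, and detection, and facilitates the development of new methods for large-scale city understanding, reconstruction, and simulation. The OmniCity dataset as well as the benchmarks will be available at https://city-super.github.io/omnicity.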

Panoptic Segmentation of Satellite Image Time Series with Convolutional Temporal Attention Networks

Vivien Sainte Fare Garnot, Loic Landrieu

Categories: Computer Vision

2021-07-16

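Unprecedented access to multi-temporal satellite imagery has opened new perspectives for a variety of Earth observation tasks. Among them, pixel-precise panoptic segmentation of agricultural parcels has major economic and environmental implications. While researchers have explored this problem for single images, we argue that the complex temporal patterns of crop phenology are better addressed with temporal sequences of images. In this paper, we present the first end-to-end, single-stage method for panoptic segmentation of Satellite Image Time Series (SITS). This module can be combined with our novel image sequence encoding network, which relies on temporal self-attention to extract rich and adaptive multi-scale spatio-temporal features. We also introduce PASTIS, the first open-access SITS dataset with panoptic annotations. We demonstrate the superiority of our encoder for semantic segmentation against multiple competing architectures, and set up the first state of the art for panoptic segmentation of SITS. Our implementation and PASTIS are publicly available.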

Panoptic Segmentation

Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár

Categories:

2018-01-03

We propose and study a task we name panoptic segmentation (PS). Panoptic segmentation unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). The proposed task requires generating a coherent scene segmentation that is rich and complete, an important step toward real-world vision systems. While early work in computer vision addressed related image/scene parsing tasks, these are not currently popular, possibly due to lack of appropriate metrics or associated recognition challenges. To address this, we propose a novel panoptic quality (PQ) metric that captures performance for all classes (stuff and things) in an interpretable and unified manner. Using the proposed metric, we perform a rigorous study of both human and machine performance for PS on three existing datasets, revealing interesting insights about the task. The aim of our work is to revive the interest of the community in a more unified view of image segmentation.
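
As a rough illustration of the PQ metric, the sketch below uses the definition PQ = (sum of IoU over matched pairs) / (|TP| + 0.5 |FP| + 0.5 |FN|), where a predicted and a ground-truth segment match when their IoU exceeds 0.5. Representing segments as sets of pixel coordinates, the function names, and the single-class scope are assumptions for illustration; this is not the reference implementation.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred_segments, gt_segments):
    """PQ for a single class.

    Both arguments are lists of pixel sets. Segments within each list are
    assumed not to overlap, so a match with IoU > 0.5 is unique and a
    simple first-hit search is enough.
    """
    matched_gt = set()
    iou_sum = 0.0
    tp = 0
    for pred in pred_segments:
        for j, gt in enumerate(gt_segments):
            if j in matched_gt:
                continue
            score = iou(pred, gt)
            if score > 0.5:  # matching threshold
                matched_gt.add(j)
                iou_sum += score
                tp += 1
                break
    fp = len(pred_segments) - tp  # unmatched predictions
    fn = len(gt_segments) - tp    # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical example: one good match, one spurious prediction,
    # one missed ground-truth segment.
    gt = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(3, 3), (3, 4)}]
    pred = [{(0, 0), (0, 1), (1, 0)}, {(5, 5)}]
    print(panoptic_quality(pred, gt))
```

In the example, the single matched pair has IoU 0.75 and the denominator is 1 + 0.5 + 0.5 = 2. For a multi-class evaluation, the per-class values would be averaged over classes.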

A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images

Irem Ulku, Erdem Akagunduz

Categories: Computer Vision

2019-12-21

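Semantic segmentation is the pixel-wise labelling of an image. Since the problem is defined at the pixel level, determining image class labels alone is not sufficient; localising them at the original image pixel resolution is also necessary. Boosted by the extraordinary ability of convolutional neural networks (CNNs) to create semantic, high-level, and hierarchical image features, several deep learning-based 2D semantic segmentation approaches have been proposed within the last decade. In this survey, we mainly focus on recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images. We start with an analysis of the public image sets and leaderboards for 2D semantic segmentation, with an overview of the techniques employed in performance evaluation. In examining the evolution of the field, we chronologically categorise the approaches into three main periods: the pre- and early deep learning era, the fully convolutional era, and the post-FCN era. We technically analyse the solutions put forward for the fundamental problems of the field, such as fine-grained localisation and scale invariance. Before drawing our conclusions, we present a table of methods from all the mentioned eras, with a summary of each approach that explains its contribution to the field. We conclude the survey by discussing the current challenges of the field and the extent to which they have been solved.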

Building Height Prediction with Instance Segmentation

Furkan Burak Bagci, Ahmet Alp Kindriroglu, Metehan Yalcin, Ufuk Uyan, Mahiye Uluyagmur Ozturk

Categories: Computer Vision

2022-12-19

Extracting building heights from satellite images is an active research area used in many fields such as telecommunications and city planning. Many studies utilize DSMs (Digital Surface Models) generated with lidar or stereo images for this purpose. Predicting the height of buildings using only RGB images is challenging due to the insufficient amount of data, low data quality, variations in building types, different angles of light and shadow, and so on. In this study, we present an instance segmentation-based building height extraction method to predict building masks with their respective heights from a single RGB satellite image. We used satellite images with building height annotations of certain cities along with an open-source satellite dataset with the transfer learning approach. We reached a bounding box mAP of 59, a mask mAP of 52.6, and an average accuracy of 70% for buildings belonging to each height class in our test set.

Image Amodal Completion: A Survey

Jiayang Ao, Krista A. Ehinger, Qiuhong Ke

Categories: Computer Vision | Machine Learning

2022-07-05

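Existing computer vision systems can compete with humans in understanding the visible parts of objects, but still fall far short of humans when it comes to depicting the invisible parts of partially occluded objects. Image amodal completion aims to equip computers with human-like amodal completion functions so that they can understand an intact object even when it is partially occluded. The main purpose of this survey is to provide an intuitive understanding of the research hotspots, key technologies, and future trends in the field of image amodal completion. First, we present a comprehensive review of the latest literature in this emerging field, exploring three key tasks in image amodal completion: amodal shape completion, amodal appearance completion, and order perception. Then, we examine popular datasets related to image amodal completion along with their common data collection methods and evaluation metrics. Finally, we discuss real-world applications and future research directions for image amodal completion, facilitating the reader's understanding of the challenges of existing technologies and upcoming research trends.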

Towards Large-Scale Small Object Detection: Survey and Benchmarks

Gong Cheng, Xiang Yuan, Xiwen Yao, Kebing Yan, Qinghua Zeng, Junwei Han

Categories: Computer Vision

2022-07-28

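With the rise of deep convolutional neural networks, object detection has achieved prominent advances in the past few years. However, such prosperity cannot camouflage the unsatisfactory situation of Small Object Detection (SOD), one of the notoriously challenging tasks in computer vision, owing to the poor visual appearance and noisy representation caused by the intrinsic structure of small targets. In addition, large-scale datasets for benchmarking small object detection methods remain a bottleneck. In this paper, we first conduct a thorough review of small object detection. Then, to catalyze the development of SOD, we construct two large-scale Small Object Detection dAtasets (SODA), SODA-D and SODA-A, which focus on driving and aerial scenarios respectively. SODA-D includes 24704 high-quality traffic images and 277596 instances of 9 categories. For SODA-A, we collect 2510 high-resolution aerial images and annotate 800203 instances over 9 classes. To our knowledge, the proposed datasets are the first attempt at large-scale benchmarks with a vast collection of annotated instances tailored for multi-category SOD. Finally, we evaluate the performance of mainstream methods on SODA. We expect the released benchmarks to facilitate the development of SOD and spawn more breakthroughs in this field. Datasets and code will be available soon at: https://shaunyuan22.github.io/soda.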

The MIS Check-Dam Dataset for Object Detection and Instance Segmentation Tasks

Chintan Tundia, Rajiv Kumar, Om Damani, G. Sivakumar

Categories: Computer Vision

2021-11-30

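Deep learning has led to many recent advances in object detection and instance segmentation, among other computer vision tasks. These advances have led to the wide application of deep learning-based methods and related methodologies in object detection tasks for satellite imagery. In this paper, we introduce MIS Check-Dam, a new dataset of check-dams from satellite imagery for building an automated system for the detection and mapping of check-dams, focusing on the importance of irrigation structures used for agriculture. We review some of the most recent object detection and instance segmentation methods and assess their performance on our new dataset. We evaluate several single-stage, two-stage, and attention-based methods under various network configurations and backbone architectures. The dataset and the pre-trained models are available at https://www.cse.iitb.ac.in.in/gramdridisti/.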

UPSNet: A Unified Panoptic Segmentation Network

Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun

Categories:

2019-01-12

In this paper, we propose a unified panoptic segmentation network (UPSNet) for tackling the newly proposed panoptic segmentation task. On top of a single backbone residual network, we first design a deformable convolution based semantic segmentation head and a Mask R-CNN style instance segmentation head which solve these two subtasks simultaneously. More importantly, we introduce a parameter-free panoptic head which solves the panoptic segmentation via pixel-wise classification. It first leverages the logits from the previous two heads and then innovatively expands the representation for enabling prediction of an extra unknown class which helps better resolve the conflicts between semantic and instance segmentation. Additionally, it handles the challenge caused by the varying number of instances and permits back propagation to the bottom modules in an end-to-end manner. Extensive experimental results on Cityscapes, COCO and our internal dataset demonstrate that our UPSNet achieves state-of-the-art performance with much faster inference. Code has been made available at: https://github.com/uber-research/UPSNet. (* Equal contribution. † This work was done when Hengshuang Zhao was an intern at Uber ATG.)

Panoptic Feature Pyramid Networks

Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár

Categories:

2019-01-08

The recently introduced panoptic segmentation task has renewed our community's interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art methods for this joint task use separate and dissimilar networks for instance and semantic segmentation, without performing any shared computation. In this work, we aim to unify these methods at the architectural level, designing a single network for both tasks. Our approach is to endow Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone. Surprisingly, this simple baseline not only remains effective for instance segmentation, but also yields a lightweight, top-performing method for semantic segmentation. In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks. Given its effectiveness and conceptual simplicity, we hope our method can serve as a strong baseline and aid future research in panoptic segmentation.

Camouflaged Instance Segmentation In-The-Wild: Dataset, Method, and Benchmark Suite

Trung-Nghia Le, Yubo Cao, Tan-Cong Nguyen, Minh-Quan Le, Khanh-Duy Nguyen, Thanh-Toan Do, Minh-Triet Tran, Tam V. Nguyen

Categories: Computer Vision

2021-03-31

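This paper pushes the envelope on decomposing camouflaged regions in an image into meaningful components, namely camouflaged instances. To promote the new task of camouflaged instance segmentation, we introduce a dataset, dubbed CAMO++, that extends our preliminary CAMO dataset in terms of quantity and diversity. The new dataset substantially increases the number of images with hierarchical pixel-wise ground truths. We also provide a benchmark suite for the task of camouflaged instance segmentation. In particular, we conduct an extensive evaluation on our newly constructed CAMO++ dataset in various scenarios. We also present a camouflage fusion learning (CFL) framework for camouflaged instance segmentation to further improve the performance of state-of-the-art methods. The dataset, model, evaluation suite, and benchmark will be made publicly available on our project page: https://sites.google.com/view/ltnghia/research/camo_plus_plus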

Computer Vision on X-ray Data in Industrial Production and Security Applications: A survey

Mehdi Rafiei, Jenni Raitoharju, Alexandros Iosifidis

Categories: Computer Vision

2022-11-10

X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.

Applications of Deep Learning in Fish Habitat Monitoring: A Tutorial and Survey

Alzayat Saleh, Marcus Sheaves, Dean Jerry, Mostafa Rahimi Azghadi

Categories: Computer Vision

2022-06-11

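Marine ecosystems and their fish habitats are becoming increasingly important due to their integral role in providing a valuable food source and conservation outcomes. Because they are remote and difficult to access, marine environments and fish habitats are often monitored using underwater cameras. These cameras generate a massive volume of digital data that cannot be efficiently analysed by current manual processing methods, which involve a human observer. Deep learning (DL) is a cutting-edge AI technology that has demonstrated unprecedented performance in analysing visual data. Despite its application to a myriad of domains, its use in underwater fish habitat monitoring remains underexplored. In this paper, we provide a tutorial that covers the key concepts of DL and helps the reader gain a high-level understanding of how DL works. The tutorial also explains a step-by-step procedure for developing DL algorithms for challenging applications such as underwater fish monitoring. In addition, we provide a comprehensive survey of key deep learning techniques for fish habitat monitoring, including classification, counting, localization, and segmentation. Furthermore, we survey publicly available underwater fish datasets and compare various DL techniques in the underwater fish monitoring domain. We also discuss some challenges and opportunities in the emerging field of deep learning for fish habitat processing. This paper is written to serve as a tutorial for marine scientists who would like to gain a high-level understanding of DL, develop it for their applications by following our step-by-step tutorial, and see how it is evolving to facilitate their research efforts. At the same time, it is suitable for computer scientists who would like to survey state-of-the-art DL-based methodologies for fish habitat monitoring.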

Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning

Alexandru Munteanu, Marian Neagul

Categories: Computer Vision | Artificial Intelligence

2022-09-28

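In recent years, the geospatial industry has been developing at a steady pace. This growth implies the addition of satellite constellations that produce a copious supply of satellite imagery and other remote sensing data on a daily basis. Sometimes this information, even when it is publicly available, sits unused because of its sheer size. Processing such large amounts of data with human labour or with traditional automation methods is not always a viable solution from the standpoint of time and other resources. In the present work, we propose an approach for creating a multi-modal and spatio-temporal dataset composed of publicly available remote sensing data, and we test its feasibility using state-of-the-art Machine Learning (ML) techniques. Specifically, we use Convolutional Neural Network (CNN) models capable of separating the different classes of vegetation present in the proposed dataset. The popularity and success of similar methods in the context of Geographical Information Systems (GIS) and Computer Vision (CV) more generally indicate that such methods should be considered and further analysed and developed.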

BuildMapper: A Fully Learnable Framework for Vectorized Building Contour Extraction

Shiqing Wei, Tao Zhang, Shunping Ji, Muying Luo, Jianya Gong

Categories: Computer Vision

2022-11-07

Deep learning based methods have significantly boosted the study of automatic building extraction from remote sensing images. However, delineating vectorized and regular building contours like a human does remains very challenging, due to the difficulty of the methodology, the diversity of building structures, and the imperfect imaging conditions. In this paper, we propose the first end-to-end learnable building contour extraction framework, named BuildMapper, which can directly and efficiently delineate building polygons just as a human does. BuildMapper consists of two main components: 1) a contour initialization module that generates initial building contours; and 2) a contour evolution module that performs both contour vertex deformation and reduction, which removes the need for complex empirical post-processing used in existing methods. In both components, we provide new ideas, including a learnable contour initialization method to replace the empirical methods, dynamic predicted and ground truth vertex pairing for the static vertex correspondence problem, and a lightweight encoder for vertex information extraction and aggregation, which benefit a general contour-based method; and a well-designed vertex classification head for building corner vertices detection, which casts light on direct structured building contour extraction. We also built a suitable large-scale building dataset, the WHU-Mix (vector) building dataset, to benefit the study of contour-based building extraction methods. The extensive experiments conducted on the WHU-Mix (vector) dataset, the WHU dataset, and the CrowdAI dataset verified that BuildMapper can achieve a state-of-the-art performance, with a higher mask average precision (AP) and boundary AP than both segmentation-based and contour-based methods.

SUNet: Scale-aware Unified Network for Panoptic Segmentation

Weihao Yan, Yeqiang Qian, Chunxiang Wang, Ming Yang

Categories: Computer Vision

2022-09-07

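Panoptic segmentation combines the advantages of semantic and instance segmentation and can provide both pixel-level and instance-level environmental perception information for intelligent vehicles. However, it is challenged by objects of various scales, especially extremely large and small ones. In this work, we propose two lightweight modules to mitigate this problem. First, the Pixel-relation Block is designed to model global context information for large-scale things; it is based on a query-independent formulation and brings only a small parameter increment. Then, the Convectional Network is constructed to collect extra high-resolution information for small-scale stuff, supplying more appropriate semantic features for the downstream segmentation branches. Based on these two modules, we present an end-to-end Scale-aware Unified Network (SUNet), which is more adaptable to multi-scale objects. Extensive experiments on Cityscapes and COCO demonstrate the effectiveness of the proposed methods.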
