Smart Paper Notes

Global and Local Features through Gaussian Mixture Models on Image Semantic Segmentation

Darwin Saire , Adín Ramírez Rivera

Category: Computer Vision

2022-07-19

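The semantic segmentation task aims at dense classification at the pixel level. Deep models have shown progress on this task. However, one remaining problem with these approaches is the loss of spatial precision, which often occurs at the boundaries of the segmented objects. Our proposed model addresses this problem by providing an internal structure for the feature representations while extracting a global representation that supports the former. To fit the internal structure, during training we predict a Gaussian Mixture Model from the data, which, merged with the skip connections and the decoding stage, helps avoid wrong inductive biases. Furthermore, our results show that we can improve semantic segmentation by giving both learned representations (global and local) a clustering behavior and combining them. Finally, we present results demonstrating our advances on the Cityscapes and Synthia datasets.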

translated by Google Translate

On Efficient Real-Time Semantic Segmentation: A Survey

Christopher J. Holder , Muhammad Shafique

Category: Computer Vision | Machine Learning

2022-06-17

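Semantic segmentation is the problem of assigning a class label to every pixel in an image, and it is an important component of an autonomous vehicle's vision stack, facilitating scene understanding and object detection. However, many of the top-performing semantic segmentation models are extremely complex and cumbersome, and are therefore not suited to deployment onboard autonomous vehicle platforms, where computational resources are limited and low-latency operation is required. In this survey, we take a thorough look at works that aim to address this misalignment with more compact and efficient models that can be deployed on low-memory embedded systems while meeting the constraint of real-time inference. We discuss the most prominent works in the field, place them within a taxonomy based on their main contributions, and finally evaluate the inference speed of the discussed models under consistent hardware and software setups that represent a typical research environment with a high-end GPU and a realistic deployment scenario using low-memory embedded GPU hardware. Our experimental results show that many works are capable of real-time performance on resource-constrained hardware, while illustrating the consistent trade-off between latency and accuracy.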


Image Segmentation Using Deep Learning: A Survey

Shervin Minaee , Yuri Boykov , Fatih Porikli , Antonio Plaza , Nasser Kehtarnavaz , Demetri Terzopoulos

Category:

2020-01-15

Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.


MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded Context

Weijun Wang , Andrew Howard

Category: Computer Vision | Artificial Intelligence

2021-12-22

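We present a next-generation neural network architecture, MOSAIC, for efficient and accurate semantic image segmentation on mobile devices. MOSAIC is designed using neural operations that are commonly supported by diverse mobile hardware platforms, so that it can be deployed flexibly across them. With a simple asymmetric encoder-decoder structure, consisting of an efficient multi-scale context encoder and a lightweight hybrid decoder that recovers spatial details from aggregated information, MOSAIC achieves new state-of-the-art performance while balancing accuracy and computational cost. Deployed on top of a tailored feature extraction backbone based on a searched classification network, MOSAIC achieves a 5% absolute accuracy gain over the current industry-standard MLPerf models and state-of-the-art architectures.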


Uncertainty, Edge, and Reverse-Attention Guided Generative Adversarial Network for Automatic Building Detection in Remotely Sensed Images

Somrita Chattopadhyay , Avinash C. Kak

Category: Computer Vision | Machine Learning

2021-12-10

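Despite recent advances in deep-learning based semantic segmentation, automatic building detection from remotely sensed imagery is still a challenging problem owing to the large variability in the appearance of buildings across the globe. The errors occur mostly around the boundaries of the building footprints, in shadow areas, and when detecting buildings whose exterior surfaces have reflectivity properties very similar to those of the surrounding regions. To overcome these problems, we propose a generative adversarial network based segmentation framework with an uncertainty attention unit and a refinement module embedded in the generator. The refinement module, composed of edge and reverse attention units, is designed to refine the predicted building map. The edge attention enhances the boundary features so that boundaries are estimated with greater precision, and the reverse attention allows the network to explore the features missing in the previously estimated regions. The uncertainty attention unit helps the network resolve uncertainties in classification. As a measure of the power of our approach, as of December 4, 2021, it ranks second on DeepGlobe's public leaderboard, even though the main focus of our approach (refinement of building edges) does not align exactly with the metrics used for leaderboard rankings. Our overall F1 score on DeepGlobe's challenging dataset is 0.745. We also report improvements on the previous best results for the challenging INRIA Validation Dataset, on which our network achieves an overall IoU of 81.28% and an overall accuracy of 97.03%. Along the same lines, on the official INRIA Test Dataset, our network scores 77.86% and 96.41% in overall IoU and accuracy, respectively.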


Erfnet: Efficient residual factorized convnet for real-time semantic segmentation

Category:

Semantic segmentation is a challenging task that addresses most of the perception needs of Intelligent Vehicles (IV) in a unified way. Deep Neural Networks excel at this task, as they can be trained end-to-end to accurately classify multiple object categories in an image at pixel level. However, a good trade-off between high quality and computational resources is not yet present in state-of-the-art semantic segmentation approaches, limiting their application in real vehicles. In this paper, we propose a deep architecture that is able to run in real-time while providing accurate semantic segmentation. The core of our architecture is a novel layer that uses residual connections and factorized convolutions in order to remain efficient while retaining remarkable accuracy. Our approach is able to run at over 83 FPS in a single Titan X, and 7 FPS in a Jetson TX1 (embedded GPU). A comprehensive set of experiments on the publicly available Cityscapes dataset demonstrates that our system achieves an accuracy that is similar to the state of the art, while being orders of magnitude faster to compute than other architectures that achieve top precision. The resulting trade-off makes our model an ideal approach for scene understanding in IV applications. The code is publicly available at: https://github.com/Eromera/erfnet
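The abstract above credits much of ERFNet's efficiency to factorized convolutions. As a rough illustration of the general idea only (this is not the ERFNet layer, and all names here are hypothetical), the sketch below shows that a 3x3 kernel built as the outer product of a 3x1 and a 1x3 kernel gives the same result as applying the two 1D kernels in sequence, while storing 6 weights instead of 9. The equivalence holds only for such rank-1 kernels.

```python
def correlate2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation on nested lists (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical example weights: a 3x1 column kernel and a 1x3 row kernel.
col_kernel = [[1], [2], [1]]
row_kernel = [[1, 0, -1]]

# The equivalent full 3x3 kernel is their outer product (rank 1).
full_kernel = [[c[0] * r for r in row_kernel[0]] for c in col_kernel]

# A small deterministic integer test image (5 rows, 6 columns).
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]

direct = correlate2d_valid(image, full_kernel)
factorized = correlate2d_valid(correlate2d_valid(image, col_kernel), row_kernel)

weights_full = 3 * 3        # 9 weights for the 3x3 kernel
weights_factorized = 3 + 3  # 6 weights for the 3x1 + 1x3 pair
```

Here `direct` and `factorized` are identical 3x4 outputs. In a trained network the two 1D kernels are learned freely, usually with a nonlinearity in between, so the factorized layer is a cheaper approximation rather than an exact replacement.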


Object Detection with Deep Learning: A Review

Zhong-Qiu Zhao , Peng Zheng , Shou-tao Xu , Xindong Wu

Category:

2018-07-15

Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.


Multi-modal land cover mapping of remote sensing images using pyramid attention and gated fusion networks

Qinghui Liu , Michael Kampffmeyer , Robert Jenssen , Arnt-Børre Salberg

Category: Computer Vision

2021-11-06

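Multi-modal data is becoming readily available in remote sensing (RS) and can provide complementary information about the Earth's surface. Effective fusion of multi-modal information is therefore important for various RS applications, but it is also very challenging because of large domain differences, noise, and redundancy. Effective and scalable fusion techniques for bridging multiple modality encoders and fully exploiting complementary information are lacking. To this end, we propose a new multi-modality network (MultiModNet) for land cover mapping of multi-modal remote sensing data, based on a novel pyramid attention fusion (PAF) module and a gated fusion unit (GFU). The PAF module is designed to efficiently obtain rich fine-grained contextual representations from each modality with a built-in cross-level and cross-view attention fusion mechanism, and the GFU module uses a novel gating mechanism for early merging of features, thereby reducing hidden redundancy and noise. This enables supplementary modalities to effectively extract the most valuable and complementary information for late feature fusion. Extensive experiments on two representative RS benchmark datasets demonstrate the effectiveness, robustness, and superiority of MultiModNet for multi-modal land cover classification.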


Deep Co-supervision and Attention Fusion Strategy for Automatic COVID-19 Lung Infection Segmentation on CT Images

Haigen Hu , Leizhao Shen , Qiu Guan , Xiaoxin Li , Qianwei Zhou , Su Ruan

Category: Computer Vision

2021-12-20

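Because of the irregular shapes, varying sizes, and indistinguishable boundaries between normal and infected tissue, accurately segmenting COVID-19 infection lesions on CT images remains a challenging task. In this paper, a novel segmentation scheme for COVID-19 infections is proposed that enhances supervised information and fuses multi-scale feature maps of different levels, based on the encoder-decoder architecture. To this end, a deep collaborative supervision (Co-supervision) scheme is proposed to guide the network in learning edge and semantic features. More specifically, an Edge Supervised Module (ESM) is first designed to highlight low-level boundary features by incorporating edge supervision into the initial stage of down-sampling. Meanwhile, an Auxiliary Semantic Supervised Module (ASSM) is proposed to strengthen high-level semantic information by integrating mask supervision into the later stage. Then an Attention Fusion Module (AFM) is developed to fuse multi-scale feature maps of different levels, using an attention mechanism to reduce the semantic gap between high-level and low-level feature maps. Finally, the effectiveness of the proposed scheme is demonstrated on four different COVID-19 CT datasets. The results show that all three proposed modules are promising. Relative to the baseline (ResUnet), using ESM, ASSM, or AFM alone increases the Dice metric by 1.12%, 1.95%, and 1.63%, respectively, on our dataset, while combining the three modules yields an increase of 3.97%. Compared with existing approaches on various datasets, the proposed method obtains better segmentation performance on some of the main metrics and achieves the best generalization and overall performance.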
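The gains above are reported in terms of the Dice metric. For readers unfamiliar with it, here is a minimal sketch of the standard Dice coefficient for binary masks, Dice = 2|P ∩ T| / (|P| + |T|). The function name, the flat-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration and are not taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice coefficient between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels in each mask.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted foreground pixels, 3 true ones, 2 in common.
pred_mask = [1, 1, 0, 0, 1]
true_mask = [1, 0, 0, 1, 1]
score = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

In this toy case the score is 4/6, about 0.667. Real evaluations typically compute the metric per scan or per dataset on full-resolution masks.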


Medical Image Segmentation Using Deep Learning: A Survey

Risheng Wang , Tao Lei , Ruixia Cui , Bingtao Zhang , Hongying Meng , Asoke K. Nandi

Category: Computer Vision

2020-09-28

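Deep learning has been widely used for medical image segmentation, and a large number of papers have recorded its success in this field. In this paper, we present a comprehensive thematic survey of medical image segmentation using deep learning techniques. The paper makes two original contributions. First, in contrast to traditional surveys, which divide the deep learning literature on medical image segmentation directly into many groups and describe each group in detail, we classify the currently popular literature according to a multi-level structure from coarse to fine. Second, the paper focuses on supervised and weakly supervised learning approaches and excludes unsupervised approaches, since these have been covered in many older surveys and are not currently popular. For supervised learning approaches, we analyze the literature in three respects: the selection of backbone networks, the design of network blocks, and the improvement of loss functions. For weakly supervised learning approaches, we review the literature on data augmentation, transfer learning, and interactive segmentation. Compared with existing surveys, this survey classifies the literature very differently, makes it easier for readers to understand the relevant rationale, and will guide them toward appropriate improvements in deep-learning-based medical image segmentation.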


RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Guosheng Lin , Anton Milan , Chunhua Shen , Ian Reid

Category:

2016-11-20

Australian Centre for Robotic Vision {guosheng.lin;anton.milan;chunhua.shen;


S²-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation

Mohammed A. M. Elhassan , Chenhui Yang , Chenxi Huang , Tewodros Legesse Munea , Xin Hong

Category: Computer Vision | Artificial Intelligence

2022-06-15

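Modern high-performance semantic segmentation methods employ a heavy backbone and dilated convolutions to extract the relevant features. Although extracting features with both contextual and semantic information is critical for segmentation tasks, it incurs a large memory footprint and high computational cost for real-time applications. This paper presents a new model that achieves a trade-off between accuracy and speed for real-time road scene semantic segmentation. Specifically, we propose a lightweight model named Scale-aware Strip Attention Guided Feature Pyramid Network (S²-FPN). Our network consists of three main modules: the Attention Pyramid Fusion (APF) module, the Scale-aware Strip Attention Module (SSAM), and the Global Feature Upsample (GFU) module. APF adopts an attention mechanism to learn discriminative multi-scale features and helps close the semantic gap between different levels. APF uses scale-aware attention to encode global context with a vertical stripping operation and models long-range dependencies, which helps relate pixels with similar semantic labels. In addition, APF employs a channel-wise reweighting block (CRB) to emphasize the channel features. Finally, the decoder of S²-FPN adopts GFU, which fuses the features from APF and the encoder. Extensive experiments on two challenging semantic segmentation benchmarks show that our approach achieves a better accuracy/speed trade-off under different model settings. The proposed models achieve 76.2% mIoU at 87.3 FPS, 77.4% mIoU at 67 FPS, and 77.8% mIoU at 30.5 FPS on the Cityscapes dataset, and 69.6% mIoU, 71.0% mIoU, and 74.2% mIoU on the CamVid dataset. The code for this work will be made available at https://github.com/mohamedac29/s2-fpn.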
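The results above are reported as mIoU (mean intersection over union). As a minimal sketch of how this metric is commonly computed, assuming flat lists of integer class labels and assuming that classes absent from both prediction and ground truth are skipped (conventions differ between benchmarks, and none of this is taken from the paper's code):

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for flat lists of integer class labels."""
    if len(pred) != len(target):
        raise ValueError("label lists must have the same length")
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        union = tp + fp + fn
        if union == 0:
            # Assumed convention: skip classes that never occur.
            continue
        ious.append(tp / union)
    if not ious:
        raise ValueError("no class occurs in prediction or target")
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels.
pred_labels = [0, 0, 1, 1, 2, 2]
true_labels = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred_labels, true_labels, num_classes=3)
```

For this toy input the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is approximately 0.5.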


A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images

Irem Ulku , Erdem Akagunduz

Category: Computer Vision

2019-12-21

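Semantic segmentation is the pixel-wise labelling of an image. Since the problem is defined at the pixel level, determining image class labels alone is not sufficient; they must also be localised at the original image pixel resolution. Boosted by the extraordinary ability of convolutional neural networks (CNNs) to create semantic, high-level, and hierarchical image features, several deep learning-based 2D semantic segmentation approaches have been proposed within the last decade. In this survey, we focus mainly on recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images. We start with an analysis of the public image sets and leaderboards for 2D semantic segmentation, with an overview of the techniques used in performance evaluation. In examining the evolution of the field, we chronologically categorise the approaches into three main periods: the pre- and early deep learning era, the fully convolutional era, and the post-FCN era. We technically analyse the solutions put forward for the fundamental problems of the field, such as fine-grained localisation and scale invariance. Before drawing our conclusions, we present a table of methods from all of these eras, with a summary of each approach that explains its contribution to the field. We conclude the survey by discussing the current challenges of the field and the extent to which they have been solved.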


Dense Prediction with Attentive Feature Aggregation

Yung-Hsu Yang , Thomas E. Huang , Samuel Rota Bulò , Peter Kontschieder , Fisher Yu

Category: Computer Vision

2021-11-01

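Aggregating information from features across different layers is an essential operation for dense prediction models. Despite its limited expressiveness, feature concatenation dominates the choice of aggregation operations. In this paper, we introduce Attentive Feature Aggregation (AFA) to fuse different network layers with more expressive non-linear operations. AFA exploits both spatial and channel attention to compute a weighted average of the layer activations. Inspired by neural volume rendering, we extend AFA with Scale-Space Rendering (SSR) to perform late fusion of multi-scale predictions. AFA is applicable to a wide range of existing network designs. Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead. In particular, AFA improves the performance of the Deep Layer Aggregation (DLA) model by nearly 6% mIoU on Cityscapes. Our experimental analyses show that AFA learns to progressively refine segmentation maps and to improve boundary details, leading to new state-of-the-art results on the BSDS500 and NYUDv2 boundary detection benchmarks. Code and video resources are available at http://vis.xyz/pub/dla-afa.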
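The central idea in the abstract above, replacing concatenation with an attention-weighted average of layer activations, can be sketched in a few lines. This is a deliberately simplified illustration and not the AFA module itself: it assumes one scalar gate logit per feature element, whereas the paper derives its weights from spatial and channel attention. The function and parameter names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def attentive_fuse(shallow, deep, gate_logits):
    """Element-wise weighted average: w * shallow + (1 - w) * deep.

    shallow, deep: flat lists of activations from two layers.
    gate_logits:   one logit per element (hypothetical attention output).
    """
    if not (len(shallow) == len(deep) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for s, d, g in zip(shallow, deep, gate_logits):
        w = sigmoid(g)
        fused.append(w * s + (1.0 - w) * d)
    return fused


# With zero logits the gate is 0.5, so fusion reduces to a plain average.
averaged = attentive_fuse([1.0, 4.0], [3.0, 0.0], [0.0, 0.0])

# A large positive logit makes the output follow the shallow features.
mostly_shallow = attentive_fuse([1.0], [3.0], [10.0])
```

Because the weights always sum to one, the fused value stays within the range spanned by the two inputs, which is one way a weighted average differs from concatenation followed by an unconstrained convolution.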


Modality specific U-Net variants for biomedical image segmentation: A survey

Narinder Singh Punn , Sonali Agarwal

Category: Computer Vision

2021-07-09

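With the advance of deep learning approaches such as deep convolutional neural networks, residual neural networks, and adversarial networks, U-Net architectures have become the most widely used in biomedical image segmentation for automating the identification and detection of target regions or sub-regions. In recent studies, U-Net based approaches have shown state-of-the-art performance in different applications, supporting the development of computer-aided diagnosis systems for early diagnosis and treatment of diseases such as brain tumors, lung cancer, Alzheimer's disease, and breast cancer, using various modalities. This article presents the success of these approaches by describing the U-Net framework, followed by a comprehensive analysis of U-Net variants through 1) inter-modality and 2) intra-modality categorization, to give better insight into the associated challenges and solutions. In addition, the article highlights the contribution of U-Net based frameworks during the ongoing pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), also known as COVID-19. Finally, the strengths and similarities of these U-Net variants are analysed, along with the challenges involved in biomedical image segmentation, to uncover promising future research directions in this area.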


DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Liang-Chieh Chen , George Papandreou , Iasonas Kokkinos , Kevin Murphy , Alan L. Yuille

Category:

2016-06-02

In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
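The abstract above describes atrous convolution as a way to enlarge a filter's field of view without adding parameters or computation per output. A minimal 1D sketch of that idea follows; it is an illustration under simplifying assumptions (pure Python, no padding, a hypothetical three-tap kernel), not DeepLab's implementation. The same three weights are applied with a sampling rate of 1 and then 2, and the field of view grows from 3 to 5 input samples.

```python
def field_of_view(kernel_size, rate):
    """Number of input samples spanned by a kernel at a given atrous rate."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1D cross-correlation with taps spaced `rate` samples apart."""
    span = field_of_view(len(kernel), rate)
    return [
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


signal = list(range(10))   # a simple ramp: 0, 1, ..., 9
kernel = [1, 0, -1]        # three weights, reused at every rate

dense = atrous_conv1d(signal, kernel, rate=1)   # spans 3 samples
atrous = atrous_conv1d(signal, kernel, rate=2)  # spans 5 samples
```

On the ramp input, every rate-1 output is -2 and every rate-2 output is -4, since the kernel compares samples that are 2 and 4 positions apart respectively. Both use exactly three multiplications per output.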


LDNet: End-to-End Lane Marking Detection Approach Using a Dynamic Vision Sensor

Farzeen Munir , Shoaib Azam , Moongu Jeon , Byung-Geun Lee , Witold Pedrycz

Category: Computer Vision

2020-09-17

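Modern vehicles are equipped with various driver-assistance systems, including automatic lane keeping, which prevents unintended lane departures. Traditional lane detection methods use handcrafted or deep learning-based features, followed by post-processing techniques for lane extraction, with frame-based RGB cameras. Frame-based RGB cameras are prone to illumination variations, sun glare, and motion blur, which limits the performance of lane detection methods. Incorporating an event camera for lane detection in the perception stack of autonomous driving is one of the most promising solutions for mitigating the challenges encountered by frame-based RGB cameras. The main contribution of this work is the design of a lane marking detection model that employs a dynamic vision sensor. The paper explores the novel application of lane marking detection with an event camera by designing a convolutional encoder followed by an attention-guided decoder. The spatial resolution of the encoded features is retained by a dense atrous spatial pyramid pooling (ASPP) block. The additive attention mechanism in the decoder improves performance on high-dimensional encoded input features, which promotes lane localization and relieves post-processing computation. The efficacy of the proposed work is evaluated on the DVS dataset for lane extraction (DET). The experimental results show significant improvements of 5.54% and 5.03% in F1 score on the multiclass and binary-class lane marking detection tasks, respectively. In addition, the intersection over union (IoU) scores of the proposed method surpass those of the best-performing state-of-the-art method by 6.50% and 9.37%, respectively.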


Computer Vision on X-ray Data in Industrial Production and Security Applications: A survey

Mehdi Rafiei , Jenni Raitoharju , Alexandros Iosifidis

Category: Computer Vision

2022-11-10

X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.


Atrous Space Bender U-Net (ASBU-Net/LogiNet)

Anurag Bansal , Oleg Ostap , Miguel Maestre Trueba , Kristopher Perry

Category: Computer Vision

2022-12-16

With recent advances in CNNs, exceptional improvements have been made in semantic segmentation of high resolution images in terms of accuracy and latency. However, challenges still remain in detecting objects in crowded scenes, large scale variations, partial occlusion, and distortions, while still maintaining mobility and latency. We introduce a fast and efficient convolutional neural network, ASBU-Net, for semantic segmentation of high resolution images that addresses these problems and uses no novelty layers for ease of quantization and embedded hardware support. ASBU-Net is based on a new feature extraction module, atrous space bender layer (ASBL), which is efficient in terms of computation and memory. The ASB layers form a building block that is used to make ASBNet. Since this network does not use any special layers it can be easily implemented, quantized and deployed on FPGAs and other hardware with limited memory. We present experiments on resource and accuracy trade-offs and show strong performance compared to other popular models.


Exploring the Effects of Data Augmentation for Drivable Area Segmentation

Srinjoy Bhuiya , Ayushman Kumar , Sankalok Sen

Category: Computer Vision | Artificial Intelligence

2022-08-06

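Real-time segmentation of drivable areas plays a vital role in achieving autonomous perception in cars. Recently there has been rapid progress in image segmentation models based on deep learning. However, most of the advances have been made in model architecture design. In any supervised deep learning problem related to segmentation, the success of the model depends on the amount and quality of the input training data used for it. This data should contain well-annotated and varied images for the segmentation model to work well. Problems with the annotations in a dataset can lead the model to produce overwhelming Type I and Type II errors in testing and validation, causing serious problems when tackling real-world tasks. To address this problem and to make our model more accurate, dynamic, and robust, data augmentation is used, as it helps expand the sample training data and makes it better and more diverse overall. Hence, in our study, we focus on investigating the benefits of data augmentation by analyzing pre-existing image datasets and performing augmentations accordingly. Our results show that the performance and robustness of existing state-of-the-art (SOTA) models can be increased dramatically without any increase in model complexity or inference time. The augmentations used in this paper were chosen only after thorough research into several other augmentation methods and strategies in widespread use today and their corresponding effects. All our results are reported on the widely used Cityscapes dataset.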
