Pedestrian detection is one of the most critical modules in autonomous driving systems. Although cameras are commonly used for this purpose, their image quality degrades severely in low-light night-time driving scenarios. Thermal camera images, on the other hand, remain unaffected under similar conditions. This paper proposes an end-to-end multimodal fusion model for pedestrian detection using RGB and thermal images. Its novel spatio-contextual deep network architecture is able to exploit the multimodal input effectively. It consists of two distinct deformable ResNeXt-50 encoders for feature extraction from the two modalities. Fusion of the two encoded features takes place inside a multimodal feature embedding module (MuFEm) consisting of several graph attention networks and feature fusion units. The output of the last feature fusion unit of MuFEm is subsequently passed to two CRFs for spatial refinement. Further enhancement of the features is achieved by applying channel-wise attention and extracting contextual information with the help of four RNNs traversing in four different directions. Finally, a single-stage decoder uses these feature maps to generate the bounding box of each pedestrian and a score map. We performed extensive experiments with the framework on three publicly available multimodal pedestrian detection benchmark datasets, namely KAIST, CVC-14, and UTokyo. The results on each of them improve upon the respective state-of-the-art performance. A short video overview of this work, along with its qualitative results, can be seen at https://youtu.be/fdjdsifuucs. Our source code will be released upon publication of the paper.
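The channel-wise attention and four-directional RNN described above are in the spirit of spatial-RNN context modules. The sketch below is a hedged PyTorch rendering of the four-directional sweep only; the module name, the choice of a GRU cell, and the hidden size are our assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FourWayContext(nn.Module):
    """Illustrative context module: a shared bidirectional GRU sweeps the
    feature map along rows (left-right, right-left) and columns (top-bottom,
    bottom-top), i.e. four directions; the gathered context is projected
    back to the input width and added residually."""
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Conv2d(4 * hidden, channels, kernel_size=1)

    def _sweep(self, x):
        # Treat each row of the (B, C, H, W) map as a length-W sequence.
        b, c, h, w = x.shape
        seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.rnn(seq)                      # (B*H, W, 2*hidden)
        return out.reshape(b, h, w, -1).permute(0, 3, 1, 2)

    def forward(self, x):
        horiz = self._sweep(x)                                  # 2 directions
        vert = self._sweep(x.transpose(2, 3)).transpose(2, 3)   # 2 more
        return x + self.proj(torch.cat([horiz, vert], dim=1))

feat = torch.randn(2, 256, 32, 40)
print(FourWayContext(256)(feat).shape)  # torch.Size([2, 256, 32, 40])
```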
Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance plateaus easily, even when complex ensembles are constructed that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development of deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, have been introduced to address the problems of traditional architectures. These models differ in network architecture, training strategy, optimization function, and so on. In this paper, we provide a review of deep-learning-based object detection frameworks. Our review begins with a brief introduction to the history of deep learning and its representative tool, the Convolutional Neural Network (CNN). We then focus on typical generic object detection architectures, along with some modifications and useful tricks that further improve detection performance. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are suggested to serve as guidelines for future work in both object detection and relevant neural-network-based learning systems.
Thermal infrared (TIR) images have proven effective in providing temperature cues for multispectral pedestrian detection. Most existing methods directly inject the TIR modality into RGB-based frameworks or simply ensemble the results of the two modalities. However, this may lead to inferior detection performance, since RGB and TIR features generally carry modality-specific noise, which may worsen as the features propagate through the network. Therefore, this work proposes an effective and efficient cross-modality fusion module called the Bidirectional Adaptive Attention Gate (BAA-Gate). Based on the attention mechanism, the BAA-Gate is designed to distill informative features and recalibrate the representations asymptotically. Concretely, a bidirectional multi-stage fusion strategy is adopted to progressively optimize the features of the two modalities and retain their specificity during propagation. Furthermore, an adaptive interaction is introduced into the BAA-Gate through an illumination-based weighting strategy, so as to adaptively adjust its recalibration and aggregation strength and to enhance robustness against illumination changes. Considerable experiments on the challenging KAIST dataset demonstrate the superior performance of our method at a satisfactory speed.
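As a rough illustration of the two ingredients named above, attention-based recalibration and illumination-based weighting, here is a minimal PyTorch sketch; the squeeze-and-excitation-style gates and the way the scalar illumination weight is applied are our assumptions, not the published BAA-Gate design.

```python
import torch
import torch.nn as nn

class IlluminationGate(nn.Module):
    """Hedged sketch, not the published BAA-Gate: channel attention
    recalibrates each stream with cues from the other, and a scalar
    illumination weight w in [0, 1] (e.g. predicted by a small day/night
    classifier on the RGB frame) sets each modality's contribution."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.att_rgb, self.att_ir = gate(), gate()

    def forward(self, f_rgb, f_ir, w):
        r = f_rgb * self.att_ir(f_ir)   # thermal cues recalibrate RGB
        t = f_ir * self.att_rgb(f_rgb)  # RGB cues recalibrate thermal
        # Rely more on RGB in daylight (w -> 1), more on thermal at night.
        return w * r + (1.0 - w) * t

g = IlluminationGate(64)
print(g(torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40), 0.8).shape)
```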
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has been thriving, with numerous methods proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and increasing demand for intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the components involved in developing a person Re-ID system, we categorize it into the closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis for closed-world person Re-ID from three different perspectives, including deep feature representation learning, deep metric learning and ranking optimization. With the performance saturation under the closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, facing more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost of finding all the correct matches, which provides an additional criterion to evaluate the Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.
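The mINP metric has a compact closed form: for each query, INP = |G| / R_hard, where |G| is the number of correct gallery matches and R_hard is the rank of the hardest (last-retrieved) correct match; mINP averages this over queries. A minimal sketch, assuming each query is given as a boolean match mask over its ranked gallery:

```python
import numpy as np

def mean_inp(ranked_match_flags):
    """mINP: average of INP = |G| / R_hard per query, where R_hard is the
    1-based rank of the last correct match in the ranked gallery list.
    `ranked_match_flags`: list of boolean arrays, one per query, marking
    which ranked gallery entries are correct matches."""
    inps = []
    for flags in ranked_match_flags:
        flags = np.asarray(flags, dtype=bool)
        num_gt = flags.sum()
        if num_gt == 0:                      # query with no gallery match
            continue
        r_hard = np.where(flags)[0][-1] + 1  # rank of the hardest match
        inps.append(num_gt / r_hard)
    return float(np.mean(inps))

# Two matches ranked 1st and 4th out of 5 -> INP = 2/4 = 0.5
print(mean_inp([[True, False, False, True, False]]))  # 0.5
```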
Pedestrian detection in the wild remains a challenging problem, especially for scenes containing serious occlusion. In this paper, we propose a novel feature learning method in the deep learning framework, referred to as Feature Calibration Network (FC-Net), to adaptively detect pedestrians under various occlusions. FC-Net is based on the observation that the visible parts of pedestrians are selective and decisive for detection, and is implemented as a self-paced feature learning framework with a self-activation (SA) module and a feature calibration (FC) module. In a new self-activated manner, FC-Net learns features which highlight the visible parts and suppress the occluded parts of pedestrians. The SA module estimates pedestrian activation maps by reusing classifier weights, without any additional parameters, therefore resulting in an extremely parsimonious model that reinforces the semantics of features, while the FC module calibrates the convolutional features for adaptive pedestrian representation in both pixel-wise and region-based ways. Experiments on the CityPersons and Caltech datasets demonstrate that FC-Net improves detection performance on occluded pedestrians by up to 10% while maintaining excellent performance on non-occluded instances.
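The SA module's trick of reusing classifier weights is closely related to class activation mapping. A hedged sketch of that idea (the function name and the normalization are illustrative, not FC-Net's exact formulation):

```python
import torch

def self_activation_map(features, classifier_weight):
    """CAM-style self-activation sketch: the detector's own classifier
    weights are reused, at no extra parameter cost, to weight feature
    channels and produce an activation map highlighting visible parts.
    features: (B, C, H, W); classifier_weight: (C,) weights of the
    pedestrian class in the classification layer."""
    act = torch.relu(torch.einsum('bchw,c->bhw', features, classifier_weight))
    # Normalize to [0, 1] per image so the map can rescale features.
    peak = act.flatten(1).max(dim=1).values.view(-1, 1, 1)
    return (act / (peak + 1e-8)).unsqueeze(1)   # (B, 1, H, W)

f = torch.randn(2, 512, 32, 16)
w = torch.randn(512)
print(self_activation_map(f, w).shape)  # torch.Size([2, 1, 32, 16])
```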
Object detection with multimodal inputs can improve many safety-critical systems such as autonomous vehicles (AVs). Motivated by AVs that operate in both day and night, we study multimodal object detection with RGB and thermal cameras, since the latter provides much stronger object signatures under poor illumination. We explore strategies for fusing information from the different modalities. Our key contribution is a probabilistic ensembling technique, ProbEn, a simple non-learned method that fuses together detections from multiple modalities. We derive ProbEn from Bayes' rule and first principles that assume conditional independence across modalities. Through probabilistic marginalization, ProbEn elegantly handles missing modalities when detectors do not fire on the same object. Importantly, ProbEn also notably improves multimodal detection even when the conditional independence assumption does not hold, e.g., when fusing outputs from other fusion methods (both off-the-shelf and trained in-house). We validate ProbEn on two benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal images, showing that ProbEn outperforms prior work by more than 13% in relative performance!
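Taking the abstract's description literally, Bayes' rule with conditional independence gives p(y | x_1..x_M) proportional to p(y) * prod_m p(y | x_m) / p(y), and a missing modality simply drops out of the product. A minimal sketch of the class-score part of such a fusion (matching detections across modalities and fusing boxes are omitted):

```python
import numpy as np

def proben_class_fusion(posteriors, prior):
    """Fuse per-modality class posteriors for one matched detection under
    the conditional-independence assumption stated above.
    `posteriors`: list of class-probability vectors, one per available
    modality; a missing modality is marginalized out by omission."""
    prior = np.asarray(prior, dtype=float)
    log_p = np.log(prior)
    for post in posteriors:
        log_p += np.log(np.asarray(post, dtype=float)) - np.log(prior)
    fused = np.exp(log_p - log_p.max())   # stabilized exponentiation
    return fused / fused.sum()            # renormalize to a distribution

# An unsure RGB detector plus a confident thermal one -> a sharper fused score.
rgb, thermal, uniform = [0.6, 0.4], [0.9, 0.1], [0.5, 0.5]
print(proben_class_fusion([rgb, thermal], uniform))  # ~[0.93, 0.07]
```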
Perception of the environment plays a decisive role in the safe and secure operation of autonomous vehicles. This perception of the surroundings resembles human visual representation: the human brain perceives the environment by utilizing different sensory channels and developing a view-invariant representation model. Keeping this in context, different exteroceptive sensors are deployed on autonomous vehicles for perceiving the environment. The most common exteroceptive sensors for autonomous-vehicle perception are cameras, LiDAR, and radar. Although these sensors have demonstrated their benefit in the visible spectrum domain, they have limited operational capability under adverse weather conditions, for instance at night, which may lead to fatal accidents. In this work, we explore thermal object detection to model a view-invariant representation by employing a self-supervised contrastive learning approach. To this end, we propose a deep neural network, Self-Supervised Thermal Network (SSTN), for learning feature embeddings that maximize the information between the visible and infrared spectrum domains via contrastive learning, and then use these learned feature representations for thermal object detection with a multi-scale encoder-decoder transformer network. The proposed method is extensively evaluated on two publicly available datasets: the FLIR-ADAS dataset and the KAIST multispectral dataset. The experimental results illustrate the efficacy of the proposed method.
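The abstract does not spell out the contrastive objective; the standard choice for maximizing information between two views is InfoNCE, sketched here under that assumption with visible/infrared embeddings of the same scene as positive pairs:

```python
import torch
import torch.nn.functional as F

def cross_spectral_info_nce(z_vis, z_ir, temperature=0.07):
    """Generic InfoNCE sketch (not necessarily SSTN's exact loss): the
    visible and infrared embeddings of the same scene form a positive pair,
    all other pairings in the batch are negatives, which maximizes a lower
    bound on the mutual information between the two domains.
    z_vis, z_ir: (N, D) batches where row i of each comes from scene i."""
    z_vis = F.normalize(z_vis, dim=1)
    z_ir = F.normalize(z_ir, dim=1)
    logits = z_vis @ z_ir.t() / temperature   # (N, N) cosine similarities
    targets = torch.arange(z_vis.size(0))     # diagonal = positive pairs
    # Symmetrized cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

print(cross_spectral_info_nce(torch.randn(8, 128), torch.randn(8, 128)))
```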
Deep-learning-based detection networks have made remarkable progress in autonomous driving systems (ADS). ADS should have reliable performance across diverse ambient lighting and adverse weather conditions. However, luminance degradation and visual obstructions (such as glare and fog) result in poor-quality images from the visual camera, which leads to performance degradation. To overcome these challenges, we explore the idea of leveraging a data modality that is disparate from visual data. We propose a comprehensive detection system based on a multimodal collaborative framework that learns from both RGB (from visual cameras) and thermal (from infrared cameras) data. This framework gives each modality the flexibility to learn its own optimal features while also incorporating the complementary knowledge of the other. Our extensive empirical results show that while the improvement in accuracy is nominal, the value lies in the challenging and extremely difficult edge cases, which are crucial in safety-critical applications of ADS. We provide a holistic view of both the benefits and the limitations of using a thermal imaging system for detection.
An RGB complementary metal-oxide-semiconductor (CMOS) sensor operates in the visible light spectrum and is therefore very sensitive to ambient light conditions. In contrast, a long-wave infrared (LWIR) sensor operating in the 8-14 micron spectral band is independent of visible light. In this paper, we exploit both visual and thermal perception units for robust object detection. After careful synchronization and (cross-)labeling of the FLIR [1] dataset, this multimodal perception data is passed through a convolutional neural network (CNN) to detect three critical objects on the road, namely pedestrians, bicycles, and cars. After evaluating the RGB and infrared (the terms thermal and infrared are often used interchangeably) sensors separately, various network structures are compared to fuse the data effectively at the feature level. Our RGB-Thermal (RGBT) fusion network, which leverages a novel Entropy-Block Attention Module (EBAM), outperforms the state-of-the-art network [2] with an mAP of 82.9%.
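The abstract does not detail the EBAM internals, so the following is a purely hypothetical sketch of entropy-gated RGB-thermal fusion: the modality with the lower (more confident) local channel-entropy receives the larger weight. The names and the gating rule are our assumptions.

```python
import torch
import torch.nn.functional as F

def entropy_map(feat, eps=1e-8):
    """Per-pixel Shannon entropy over the channel dimension, treating the
    softmax of channel activations as a distribution. (B,C,H,W) -> (B,1,H,W)."""
    p = F.softmax(feat, dim=1)
    return -(p * (p + eps).log()).sum(dim=1, keepdim=True)

def entropy_gated_fusion(f_rgb, f_ir):
    """Hypothetical entropy-gated fusion (not the published EBAM): a peaked,
    low-entropy response pattern counts as confident, so the modality with
    lower local entropy gets the larger weight at that location."""
    e_rgb, e_ir = entropy_map(f_rgb), entropy_map(f_ir)
    w = torch.sigmoid(e_ir - e_rgb)     # low RGB entropy -> w close to 1
    return w * f_rgb + (1 - w) * f_ir

print(entropy_gated_fusion(torch.randn(2, 64, 40, 40),
                           torch.randn(2, 64, 40, 40)).shape)
```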
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common salient objects of an aligned visible and thermal infrared image pair and accurately segment all the pixels belonging to those objects. It is promising in challenging scenes such as nighttime and complex backgrounds due to the insensitivity of thermal images to illumination conditions. The key problem of RGB-T SOD is therefore to make the features of the two modalities complement and adjust each other flexibly, since it is inevitable that either modality suffers from defects caused by, for example, extreme light conditions or thermal crossover. In this paper, we propose a novel mirror-complementary Transformer network (MCNet) for RGB-T SOD. Specifically, we introduce a Transformer-based feature extraction module to effectively extract hierarchical features from the RGB and thermal images. Then, through attention-based feature interaction and a serial multiscale dilated convolution (SDC)-based feature fusion module, the proposed model achieves complementary interaction of low-level features and semantic fusion of deep features. Finally, based on the mirror-complementary structure, the salient regions of the two modalities can be accurately extracted even when one modality is invalid. To demonstrate the robustness of the proposed model under challenging real-world scenes, we build a novel RGB-T SOD dataset, VT723, based on a large public semantic segmentation RGB-T dataset used in the autonomous driving domain. Extensive experiments on benchmark datasets and VT723 show that the proposed method outperforms state-of-the-art approaches, including CNN-based and Transformer-based methods. The code and dataset will be released later at https://github.com/jxr326/swinmcnet.
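A serial multiscale dilated-convolution (SDC) block can be sketched as successive dilated convolutions with growing rates; the rates (1, 2, 4) and the residual connections below are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SerialDilatedConv(nn.Module):
    """Illustrative SDC-style block: convolutions with growing dilation are
    applied one after another so the receptive field expands while spatial
    resolution is preserved."""
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
            for r in rates)

    def forward(self, x):
        for stage in self.stages:
            x = x + stage(x)   # residual connection at every dilation stage
        return x

print(SerialDilatedConv(64)(torch.randn(1, 64, 56, 56)).shape)
```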
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
Computer vision applications in intelligent transportation systems (ITS) and autonomous driving (AD) have gravitated towards deep neural network architectures in recent years. While performance seems to be improving on benchmark datasets, many real-world challenges are yet to be adequately considered in research. This paper conducts an extensive literature review on the applications of computer vision in ITS and AD, and discusses challenges related to data, models, and complex urban environments. The data challenges are associated with the collection and labeling of training data and its relevance to real-world conditions, bias inherent in datasets, the high volume of data to be processed, and privacy concerns. Deep learning (DL) models are commonly too complex for real-time processing on embedded hardware, lack explainability and generalizability, and are hard to test in real-world settings. Complex urban traffic environments have irregular lighting and occlusions, and surveillance cameras may be mounted at a variety of angles, gather dirt, or shake in the wind, while traffic conditions are highly heterogeneous, with rule violations and complex interactions in crowded scenarios. Some representative applications that suffer from these problems are traffic flow estimation, congestion detection, autonomous driving perception, vehicle interaction, and edge computing for practical deployment. The possible ways of dealing with these challenges are also explored, with practical deployment prioritized.
Multiple object tracking (MOT) has gained significant interest from researchers in recent years and has become one of the trending problems in computer vision, especially with the recent developments in autonomous driving. MOT is a key visual task confronted with various challenges, such as occlusion in crowded scenes, similar appearances, difficulty detecting small objects, ID switching, and so on. To tackle these challenges, researchers have tried to exploit the attention mechanism of Transformers, the interrelations among tracklets with graph convolutional neural networks, and the appearance similarity of objects across frames with Siamese networks; they have also tried classical IoU-matching-based CNN networks and motion prediction with LSTMs. To bring these scattered techniques under one umbrella, we have studied more than a hundred papers published over the last three years and have tried to extract the techniques that recent researchers focus on to solve the problems of MOT. We have enlisted numerous applications and possibilities, and discussed how MOT relates to real life. Our review attempts to show the different perspectives of the techniques researchers have used and to give some future directions for potential researchers. Moreover, we include the popular benchmark datasets and metrics in this review.
Besides standard cameras, autonomous vehicles typically include multiple additional sensors, such as lidars and radars, which help acquire richer information for perceiving the content of the driving scene. While several recent works focus on fusing certain pairs of sensors, such as camera and lidar or camera and radar, using architectural components specific to the examined setting, a generic and modular sensor fusion architecture is missing from the literature. In this work, we focus on 2D object detection, a fundamental high-level task defined on the 2D image domain, and propose HRFuser, a multi-resolution sensor fusion architecture that scales straightforwardly to an arbitrary number of input modalities. The design of HRFuser is based on state-of-the-art high-resolution networks for image-only dense prediction and incorporates a novel multi-window cross-attention block as the means to perform fusion of multiple modalities at multiple resolutions. Even though cameras alone provide very informative features for 2D detection, we demonstrate through extensive experiments on nuScenes and the adverse-condition Seeing Through Fog dataset that our model effectively leverages complementary features from additional modalities, substantially improving upon camera-only performance and consistently outperforming state-of-the-art fusion methods for 2D detection in both normal and adverse conditions. The source code will be made publicly available.
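A stripped-down sketch of cross-attention fusion in this spirit follows, with camera tokens as queries and another modality as keys/values; HRFuser's actual block additionally restricts attention to local windows and runs at multiple resolutions, which this sketch omits.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Minimal cross-attention fusion sketch: each camera location attends
    to tokens of a complementary sensor (e.g. projected lidar or radar
    features), and the result is added residually to the camera stream."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cam_tokens, aux_tokens):
        fused, _ = self.attn(query=cam_tokens, key=aux_tokens, value=aux_tokens)
        return self.norm(cam_tokens + fused)  # residual keeps camera intact

cam = torch.randn(2, 196, 128)    # e.g. 14x14 camera feature tokens
lidar = torch.randn(2, 196, 128)
print(CrossAttentionFusion(128)(cam, lidar).shape)  # torch.Size([2, 196, 128])
```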
The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment, and covert observation capabilities. This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective. It aims to provide the reader with an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs, and other airborne platforms. The main objects of interest are humans, where single or multiple subjects are to be detected, identified, tracked, and re-identified, and have their behavior analyzed. More specifically, for each of these four tasks, we first discuss the unique challenges of performing them in an aerial setting compared to a ground-based setting. We then review and analyze the publicly available aerial datasets for each task, take a deep dive into the approaches in the aerial literature, and investigate how they currently address the aerial challenges. We conclude with a discussion of the missing gaps and open research questions to inform future research avenues.
Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of work aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.
Cross-modality fusing of complementary information from multispectral image pairs can improve the perception ability of detection algorithms, making them more robust and reliable for a wider range of applications, such as nighttime detection. Compared with previous methods, we argue that different features should be processed specifically: modality-specific features should be retained and enhanced, while modality-shared features should be cherry-picked from the RGB and thermal IR modalities. Following this idea, a novel and lightweight multispectral feature fusion approach with joint common-modality and differential-modality attention, named Cross-Modality Attentive Feature Fusion (CMAFF), is proposed. Given the intermediate feature maps of RGB and IR images, our module infers attention maps from two separate modalities, common-modality and differential-modality, in parallel, and the attention maps are then multiplied with the input features for adaptive feature enhancement or selection. Extensive experiments demonstrate that our proposed approach can achieve state-of-the-art performance at a low computational cost.
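One plausible reading of the joint common-/differential-modality attention is sketched below: the sum of the two feature maps stands in for the modality-shared component and the difference for the modality-specific one, each driving its own channel-attention gate. This decomposition and the fusion rule are our assumptions, not the published CMAFF module.

```python
import torch
import torch.nn as nn

class CMAFFSketch(nn.Module):
    """Hedged common/differential attention sketch: shared and specific
    components are gated separately and then recombined."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        def channel_att():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.common_att, self.diff_att = channel_att(), channel_att()

    def forward(self, f_rgb, f_ir):
        common = 0.5 * (f_rgb + f_ir)   # modality-shared component
        diff = f_rgb - f_ir             # modality-specific component
        return (common * self.common_att(common) +
                diff * self.diff_att(diff))   # fused multispectral feature

m = CMAFFSketch(64)
print(m(torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)).shape)
```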
Modern vehicles are equipped with various driver-assistance systems, including automatic lane keeping, which prevents unintended lane departures. Traditional lane detection methods incorporate handcrafted or deep-learning-based features followed by post-processing techniques for lane extraction using frame-based RGB cameras. The use of frame-based RGB cameras for lane detection is prone to illumination variations, sun glare, and motion blur, which limit the performance of lane detection methods. Incorporating an event camera into the perception stack for autonomous driving is one of the most promising solutions for mitigating the challenges encountered by frame-based RGB cameras. The main contribution of this work is the design of a lane marking detection model that employs a dynamic vision sensor. This paper explores the novel application of lane marking detection using an event camera by designing a convolutional encoder followed by an attention-guided decoder. The spatial resolution of the encoded features is retained by a dense atrous spatial pyramid pooling (ASPP) block. The additive attention mechanism in the decoder improves performance on high-dimensional encoded input features, promotes lane localization, and relieves post-processing computation. The efficacy of the proposed work is evaluated using the DVS dataset for lane extraction (DET). Experimental results show improvements of 5.54% and 5.03% in the F1 scores of the multi-class and binary-class lane marking detection tasks, respectively. Furthermore, the intersection-over-union (IoU) scores of the proposed method surpass those of the best state-of-the-art method by 6.50% and 9.37%, respectively.
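The additive attention in the decoder is plausibly of the Bahdanau / Attention-U-Net kind, sketched here as a stand-in (not necessarily the authors' exact layer): a coarse decoder signal g gates the encoder skip feature x per spatial location.

```python
import torch
import torch.nn as nn

class AdditiveAttentionGate(nn.Module):
    """Additive attention gate sketch: project the gating signal and the
    skip feature to a common width, combine additively, and squeeze to a
    per-pixel gate that suppresses background and keeps lane pixels."""
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.wg = nn.Conv2d(g_ch, inter_ch, 1)
        self.wx = nn.Conv2d(x_ch, inter_ch, 1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, 1), nn.Sigmoid())

    def forward(self, g, x):
        a = torch.relu(self.wg(g) + self.wx(x))  # additive combination
        return x * self.psi(a)                   # gated skip feature

g = torch.randn(1, 256, 32, 32)  # decoder feature (upsampled to match)
x = torch.randn(1, 128, 32, 32)  # encoder skip feature
print(AdditiveAttentionGate(256, 128, 64)(g, x).shape)  # (1, 128, 32, 32)
```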
Learning powerful representations in bird's-eye view (BEV) for perception tasks is trending and attracting extensive attention from both industry and academia. Conventional approaches for most autonomous driving algorithms perform detection, segmentation, tracking, etc., in a front or perspective view. As sensor configurations become increasingly complex, integrating multi-source information from different sensors and representing features in a unified view becomes vital. BEV perception inherits several advantages, as representing surrounding scenes in BEV is intuitive and fusion-friendly, and representing objects in BEV is most desirable for subsequent modules such as planning and/or control. The core problems of BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground-truth annotations in the BEV grid; (c) how to formulate the pipeline to incorporate features from different sources and views; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios. In this survey, we review the most recent work on BEV perception and provide an in-depth analysis of different solutions. Furthermore, several system designs of BEV approaches from industry are described. We also introduce a full suite of practical guidebooks to improve the performance of BEV perception tasks, covering camera, lidar, and fusion inputs. Finally, we point out future research directions in this area. We hope this report sheds light on the community and encourages more research effort on BEV perception. We keep an active repository to collect the most recent work and provide a toolbox with a bag of tricks at https://github.com/openperceptionx/bevperception-survey-recipe.
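For problem (a), the simplest closed-form view transformation is inverse perspective mapping under a flat-ground assumption; learned BEV methods replace it with depth distributions or attention, but it shows exactly which 3D information a perspective view loses. A small numpy sketch (the camera convention and mounting height are illustrative):

```python
import numpy as np

def pixels_to_bev(uv, K, cam_height=1.5):
    """Inverse perspective mapping of image pixels onto a flat ground plane.
    Camera convention: x right, y down, z forward, mounted `cam_height`
    metres above the ground, so the ground plane is y = cam_height in
    camera coordinates. uv: (N, 2) pixels; returns (N, 2) BEV points (x, z)
    in metres."""
    uv1 = np.c_[uv, np.ones(len(uv))]
    rays = (np.linalg.inv(K) @ uv1.T).T   # pixel back-projection rays
    s = cam_height / rays[:, 1]           # scale so that y = cam_height
    ground = s[:, None] * rays            # 3D ray-plane intersections
    return ground[:, [0, 2]]              # BEV coordinates (x, z)

K = np.array([[1000., 0., 640.], [0., 1000., 360.], [0., 0., 1.]])
# A pixel 140 rows below the principal point maps ~10.7 m ahead of the car.
print(pixels_to_bev(np.array([[640., 500.]]), K))
```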