Smart Paper Notes

OCR-RTPS: An OCR-based real-time positioning system for the valet parking

Zizhang Wu , Xinyuan Chen , Jizheng Wang , Xiaoquan Wang , Yuanzhu Gan , Muqing Fang , Tianhao Xu

Categories: Computer Vision | Robotics

2022-12-08

Obtaining the position of the ego-vehicle is a crucial prerequisite for automatic control and path planning in autonomous driving. Most existing positioning systems rely on GPS, RTK, or wireless signals, which struggle to provide effective localization under weak-signal conditions. This paper proposes a real-time positioning system based on detecting parking numbers, which are unique positioning marks in a parking lot scene. The system can not only assist positioning in open areas but also run independently in isolated environments. Results on both public datasets and a self-collected dataset show that the system outperforms others in performance and is applicable in practice. In addition, the code and dataset will be released later.

Translated by Google Translate

Surround-view Fisheye BEV-Perception for Valet Parking: Dataset, Baseline and Distortion-insensitive Multi-task Framework

Zizhang Wu , Yuanzhu Gan , Xianzhi Li , Yunzhe Wu , Xiaoquan Wang , Tianhao Xu , Fan Wang

Categories: Computer Vision

2022-12-08

Surround-view fisheye perception in valet parking scenes is fundamental and crucial in autonomous driving. Environmental conditions in parking lots, such as imperfect lighting and opacity, differ from those in common public datasets, which substantially affects perception performance. Most existing networks trained on public datasets may generalize suboptimally to these valet parking scenes and are also affected by fisheye distortion. In this article, we introduce a new large-scale fisheye dataset called Fisheye Parking Dataset (FPD) to promote research on diverse real-world surround-view parking cases. Notably, our compiled FPD exhibits excellent characteristics for different surround-view perception tasks. In addition, we propose a real-time distortion-insensitive multi-task framework, Fisheye Perception Network (FPNet), which improves surround-view fisheye BEV perception by enhancing the fisheye distortion operation and multi-task lightweight designs. Extensive experiments validate the effectiveness of our approach and the dataset's exceptional generalizability.

Autonomous Driving in Adverse Weather Conditions: A Survey

Yuxiao Zhang , Alexander Carballo , Hanting Yang , Kazuya Takeda

Categories: Robotics

2021-12-16

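Automated driving systems (ADS) open up a new domain for the automotive industry and offer new possibilities for future transportation with higher efficiency and a more comfortable experience. However, autonomous driving in adverse weather conditions has long been the problem that keeps autonomous vehicles (AVs) from reaching higher levels of autonomy. This paper assesses, analytically and statistically, the influences and challenges that weather brings to ADS sensors, and surveys solutions for adverse weather conditions. State-of-the-art techniques for perception enhancement under each kind of weather are thoroughly reported. External auxiliary solutions such as V2X technology, the weather coverage of currently available datasets, simulators, and experimental facilities with weather chambers are clearly sorted out. By pointing out the major weather problems that the autonomous driving field currently faces and reviewing both hardware and computer science solutions from recent years, this survey gives an overview of the obstacles and directions of ADS development under adverse weather driving conditions.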

Multi-modal Fusion Technology based on Vehicle Information: A Survey

Yan Gong , Jianli Lu , Jiayi Wu , Wenzhuo Liu

Categories: Robotics | Computer Vision

2022-11-11

Multi-modal fusion is a basic task of perception in autonomous driving systems and has attracted many scholars' interest in recent years. Current multi-modal fusion methods mainly focus on camera data and LiDAR data, but pay little attention to the kinematic information provided by the bottom sensors of the vehicle, such as acceleration, vehicle speed, and angle of rotation. This information is not affected by complex external scenes, so it is more robust and reliable. In this paper, we introduce the existing application fields of vehicle bottom information and the research progress of related methods, as well as the multi-modal fusion methods based on bottom information. We also describe the available vehicle bottom information datasets in detail to help researchers get started quickly. In addition, new ideas for multi-modal fusion technology in autonomous driving tasks are proposed to promote the further utilization of vehicle bottom information.

Vision-Based Environmental Perception for Autonomous Driving

Fei Liu , Zihao Lu , Xianke Lin

Categories: Computer Vision

2022-12-22

Visual perception plays an important role in autonomous driving. One of the primary tasks is object detection and identification. Since the vision sensor is rich in color and texture information, it can quickly and accurately identify various road information. The commonly used technique is based on extracting and calculating various features of the image. Recently developed deep learning-based methods offer better reliability and processing speed and have a greater advantage in recognizing complex elements. For depth estimation, vision sensors are also used for ranging because of their small size and low cost. A monocular camera uses image data from a single viewpoint as input to estimate object depth. In contrast, stereo vision is based on parallax and on matching feature points across different views, and the application of deep learning further improves the accuracy. In addition, Simultaneous Localization and Mapping (SLAM) can establish a model of the road environment, thus helping the vehicle perceive the surrounding environment and complete its tasks. In this paper, we introduce and compare various methods of object detection and identification, then explain the development of depth estimation and compare various methods based on monocular, stereo, and RGB-D sensors, next review and compare various methods of SLAM, and finally summarize the current problems and present the future development trends of vision technologies.

Providentia -- A Large-Scale Sensor System for the Assistance of Autonomous Vehicles and Its Evaluation

Annkathrin Krämmer , Christoph Schöller , Dhiraj Gulati , Venkatnarayanan Lakshminarasimhan , Franz Kurz , Dominik Rosenbaum , Claus Lenz , Alois Knoll

Categories: Robotics | Computer Vision

2019-06-16

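The environmental perception of an autonomous vehicle is limited by its physical sensor ranges and algorithmic performance, as well as by occlusions that degrade its understanding of the ongoing traffic situation. This not only poses a significant threat to safety and limits driving speeds, but can also lead to inconvenient maneuvers. Intelligent infrastructure systems can help alleviate these problems. An intelligent infrastructure system can fill the gaps in a vehicle's perception and extend its field of view by providing additional detailed information about its surroundings in the form of a digital model of the current traffic situation, that is, a digital twin. However, detailed descriptions of such systems and working prototypes that demonstrate their feasibility are scarce. In this paper, we propose a hardware and software architecture that enables such a reliable intelligent infrastructure system to be built. We have implemented this system in the real world and demonstrate its ability to create an accurate digital twin of an extended highway stretch, thereby enhancing an autonomous vehicle's perception beyond the limits of its on-board sensors. Furthermore, we evaluate the accuracy and reliability of the digital twin by using aerial images and earth observation methods to generate ground truth data.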

Towards End-to-end Car License Plate Location and Recognition in Unconstrained Scenarios

Shuxin Qin , Sijiang Liu

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2020-08-25

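Benefiting from the rapid development of convolutional neural networks, the performance of car license plate detection and recognition has improved greatly. However, most existing methods solve detection and recognition separately and focus on specific scenarios, which hinders deployment in real-world applications. To overcome these challenges, we present an efficient and accurate framework that solves the license plate detection and recognition tasks simultaneously. It is a lightweight, unified deep neural network that can be optimized end-to-end and runs in real time. Specifically, for unconstrained scenarios, an anchor-free method is adopted to efficiently detect the bounding box and four corners of a license plate, which are used to extract and rectify the target region features. Then, a novel convolutional neural network branch is designed to further extract character features without segmentation. Finally, the recognition task is treated as a sequence labeling problem, which is solved by Connectionist Temporal Classification (CTC). Several public datasets, including images collected from different scenarios under various conditions, are chosen for evaluation. Experimental results show that the proposed method significantly outperforms previous state-of-the-art methods in both speed and precision.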
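To make the CTC step more concrete, here is a minimal sketch of greedy (best-path) CTC decoding, assuming the network has already produced one most-likely label index per time step. The alphabet, the blank index, and the function name are illustrative assumptions and are not taken from the paper.

```python
from itertools import groupby

# Hypothetical alphabet: index 0 is the CTC blank, shown here as "-".
ALPHABET = "-ABC123"
BLANK = 0


def ctc_greedy_decode(frame_labels, alphabet=ALPHABET, blank=BLANK):
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks."""
    # groupby collapses runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # A blank between two identical labels keeps them as separate characters.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Per-frame argmax labels for a short, made-up plate crop.
frames = [1, 1, 0, 1, 2, 2, 0, 0, 4, 4, 5]
print(ctc_greedy_decode(frames))
```

Because characters are read out of a per-frame label sequence, no explicit character segmentation is needed, which matches the segmentation-free design the abstract describes.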

A Survey and Framework of Cooperative Perception: From Heterogeneous Singleton to Hierarchical Cooperation

Zhengwei Bai , Guoyuan Wu , Matthew J. Barth , Yongkang Liu , Emrah Akin Sisbot , Kentaro Oguchi , Zhitong Huang

Categories: Computer Vision

2022-08-22

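Perceiving the environment is one of the most fundamental keys to enabling Cooperative Driving Automation (CDA), which is regarded as a revolutionary solution to the safety, mobility, and sustainability issues of contemporary transportation systems. Although an unprecedented evolution is now happening in computer vision for object perception, state-of-the-art perception methods still struggle in complex real-world traffic environments because of unavoidable physical occlusion and the limited receptive field of single-vehicle systems. Based on multiple spatially separated perception nodes, Cooperative Perception (CP) was born to unlock the perception bottleneck for driving automation. In this paper, we comprehensively review and analyze the research progress on CP and, to the best of our knowledge, propose a unified CP framework for the first time. The architectures and taxonomy of CP systems based on different types of sensors are reviewed to give a high-level description of the workflow and structures of CP systems. Node structure, sensor modality, and fusion schemes are reviewed and analyzed, with detailed explanations drawn from comprehensive literature. A hierarchical CP framework is proposed, followed by a review of existing datasets and simulators, to sketch the overall landscape of CP. The discussion highlights current opportunities, open challenges, and anticipated future trends.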

High-Definition Map Generation Technologies For Autonomous Driving: A Review

Zhibin Bao , Sabir Hossain , Haoxiang Lang , Xianke Lin

Categories: Robotics | Computer Vision

2022-06-11

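Autonomous driving has been among the most popular and challenging topics of the past few years. On the road to full autonomy, researchers have used various sensors, such as LiDAR, cameras, inertial measurement units (IMUs), and GPS, and have developed intelligent algorithms for autonomous driving applications such as object detection, object segmentation, obstacle avoidance, and path planning. High-definition (HD) maps have drawn much attention in recent years. Because of their high precision and level of information in localization, HD maps have quickly become a key component of autonomous driving. From large organizations such as Baidu Apollo, NVIDIA, and TomTom to individual researchers, HD maps have been created for different scenes and purposes in autonomous driving. It is therefore necessary to review the state-of-the-art methods for HD map generation. This paper reviews recent HD map generation technologies that leverage both 2D and 3D map generation. It introduces the concept of HD maps and their usefulness in autonomous driving, and gives a detailed overview of HD map generation techniques. We also discuss the limitations of current HD map generation technologies to motivate future research.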

Deep Learning based Computer Vision Methods for Complex Traffic Environments Perception: A Review

Talha Azfar , Jinlong Li , Hongkai Yu , Ruey Long Cheu , Yisheng Lv , Ruimin Ke

Categories: Computer Vision

2022-11-09

Computer vision applications in intelligent transportation systems (ITS) and autonomous driving (AD) have gravitated towards deep neural network architectures in recent years. While performance seems to be improving on benchmark datasets, many real-world challenges are yet to be adequately considered in research. This paper presents an extensive literature review on the applications of computer vision in ITS and AD, and discusses challenges related to data, models, and complex urban environments. The data challenges are associated with the collection and labeling of training data and its relevance to real-world conditions, bias inherent in datasets, the high volume of data that needs to be processed, and privacy concerns. Deep learning (DL) models are commonly too complex for real-time processing on embedded hardware, lack explainability and generalizability, and are hard to test in real-world settings. Complex urban traffic environments have irregular lighting and occlusions; surveillance cameras can be mounted at a variety of angles, gather dirt, and shake in the wind; and traffic conditions are highly heterogeneous, with rule violations and complex interactions in crowded scenarios. Some representative applications that suffer from these problems are traffic flow estimation, congestion detection, autonomous driving perception, vehicle interaction, and edge computing for practical deployment. Possible ways of dealing with the challenges are also explored, with priority given to practical deployment.

BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images

Zhihuang Zhang , Meng Xu , Wenqiang Zhou , Tao Peng , Liang Li , Stefan Poslad

Categories: Computer Vision | Artificial Intelligence

2022-11-27

Accurate localization is fundamental to autonomous driving. Traditional visual localization frameworks approach the semantic map-matching problem with geometric models, which rely on complex parameter tuning and thus hinder large-scale deployment. In this paper, we propose BEV-Locator: an end-to-end visual semantic localization neural network using multi-view camera images. Specifically, a visual BEV (Bird's-Eye-View) encoder extracts and flattens the multi-view images into BEV space, while the semantic map features are structurally embedded as a sequence of map queries. Then a cross-model transformer associates the BEV features and the semantic map queries. The localization information of the ego-car is recursively queried out by cross-attention modules. Finally, the ego pose can be inferred by decoding the transformer outputs. We evaluate the proposed method on the large-scale nuScenes and Qcraft datasets. The experimental results show that BEV-Locator is capable of estimating vehicle poses under versatile scenarios and effectively associates the cross-model information from multi-view images and global semantic maps. The experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m, and 0.251° in lateral translation, longitudinal translation, and heading angle respectively.
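The reported figures are mean absolute errors over lateral offset, longitudinal offset, and heading. As a rough illustration of how such a metric can be computed, the sketch below assumes each pose is a `(lateral_m, longitudinal_m, heading_deg)` tuple and wraps heading differences to [-180°, 180°). The tuple layout and function names are assumptions for illustration, not the paper's evaluation code.

```python
def wrap_deg(angle):
    """Wrap an angle difference in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def pose_mae(predicted, truth):
    """Mean absolute error per pose component.

    Both arguments are equal-length lists of
    (lateral_m, longitudinal_m, heading_deg) tuples (assumed layout).
    """
    n = len(predicted)
    lateral = sum(abs(p[0] - t[0]) for p, t in zip(predicted, truth)) / n
    longitudinal = sum(abs(p[1] - t[1]) for p, t in zip(predicted, truth)) / n
    # Wrapping makes 359 deg vs 1 deg count as a 2 deg error, not 358 deg.
    heading = sum(abs(wrap_deg(p[2] - t[2])) for p, t in zip(predicted, truth)) / n
    return lateral, longitudinal, heading


# Made-up poses, only to show the call shape.
pred = [(0.1, 0.2, 359.0), (-0.1, 0.0, 1.0)]
true = [(0.0, 0.0, 1.0), (0.0, 0.1, 0.0)]
print(pose_mae(pred, true))
```

Without the wrap step, headings on either side of the 0°/360° boundary would be scored as almost a full turn apart.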

Near-field Perception for Low-Speed Vehicle Automation using Surround-view Fisheye Cameras

Ciaran Eising , Jonathan Horgan , Senthil Yogamani

Categories: Computer Vision | Robotics

2021-03-31

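Cameras are the primary sensor in automated driving systems. They provide high information density and are optimal for detecting road infrastructure cues laid out for human vision. Surround-view camera systems typically comprise four fisheye cameras with a 190°+ field of view, covering the entire 360° around the vehicle and focused on near-field sensing. They are the principal sensors for low-speed, high-accuracy, close-range sensing applications such as automated parking, traffic jam assistance, and low-speed emergency braking. In this work, we provide a detailed survey of such vision systems, setting the survey in the context of an architecture that can be decomposed into four modular components: Recognition, Reconstruction, Relocalization, and Reorganization. We jointly call this the 4R Architecture. We discuss how each component accomplishes a specific aspect and argue that they can be combined synergistically to form a complete perception system for low-speed automation. We support this argument by presenting results from previous works and by presenting architecture proposals for such a system. Qualitative results are presented in a video (HTTPS://youtu.be/ae8bcof7777uy).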

Map Container: A Map-based Framework for Cooperative Perception

Kun Jiang , Yining Shi , Benny Wijaya , Mengmeng Yang , Tuopu Wen , Zhongyang Xiao , Diange Yang

Categories: Robotics

2022-08-28

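The idea of cooperative perception is to benefit from perception data shared among multiple vehicles and to overcome the limitations of on-board sensors on a single vehicle. However, fusing multi-vehicle information remains challenging because of inaccurate localization, limited communication bandwidth, and ambiguous fusion. Past practice simplifies the problem by deploying a precise GNSS localization system, manually specifying the number of connected vehicles, and fixing the fusion strategy. This paper proposes a map-based cooperative perception framework, named Map Container, to improve the accuracy and robustness of cooperative perception and ultimately overcome this problem. The concept "Map Container" denotes that the map serves as the platform for transforming all information into the map coordinate space and for incorporating different sources of information in a distributed fusion architecture. In the proposed Map Container, the GNSS signal and the matching relationship between sensor features and map features are used to optimize the estimation of environment states. Evaluation on a simulation dataset and on a real-vehicle platform validates the effectiveness of the proposed method.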

Public Parking Spot Detection And Geo-localization Using Transfer Learning

Moseli Mots'oehli , Yao Chao Yang

Categories: Computer Vision

2022-09-01

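In cities around the world, locating public parking lots with vacant parking spots is a major problem that costs commuters time and adds to traffic congestion. This work illustrates how a dataset of geo-tagged images from a mobile phone camera can be used to navigate to the most convenient public parking lot in Johannesburg with an available parking space, as detected by a neural-network-powered public camera. The images are used to fine-tune a detection model pre-trained on the ImageNet dataset to demonstrate detection and segmentation of vacant parking spots. We then add the parking lot's corresponding longitude and latitude coordinates to recommend the most convenient parking lot to the driver based on distance and the number of available parking spots. Using the VGG Image Annotator (VIA), we take 76 images from an expanded image dataset and annotate them with polygon outlines of four types of objects of interest: cars, open parking spots, people, and car number plates. We use the segmentation model to ensure that number plates can be occluded in production, so that car registrations are anonymized. We obtain intersection over union scores of 89% and 82% on cars and parking spaces respectively. This work has the potential to reduce the time commuters spend searching for free public parking, thereby easing traffic congestion in and around shopping complexes and other public places, and to maximize people's utility when driving on public roads.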
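For readers unfamiliar with the intersection over union (IoU) score quoted above, the following is a minimal sketch for binary segmentation masks, assuming each mask is a nested list of 0/1 values with the same shape. The function name and mask representation are illustrative assumptions, not the authors' evaluation code.

```python
def mask_iou(predicted, truth):
    """Intersection over union of two same-shaped binary masks."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += bool(p and t)  # pixel set in both masks
            union += bool(p or t)          # pixel set in at least one mask
    # Two empty masks have no union; report 0.0 rather than dividing by zero.
    return intersection / union if union else 0.0


# Tiny made-up masks: 2 shared pixels out of 4 covered pixels.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
print(mask_iou(pred_mask, true_mask))
```

An IoU of 1.0 means the predicted and annotated regions coincide exactly, so scores of 89% and 82% indicate substantial but imperfect overlap.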

nuScenes: A multimodal dataset for autonomous driving

Holger Caesar , Varun Bankiti , Alex H. Lang , Sourabh Vora , Venice Erin Liong , Qiang Xu , Anush Krishnan , Yu Pan , Giancarlo Baldan , Oscar Beijbom

Categories:

2019-03-26

Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image based benchmark datasets have driven development in computer vision tasks such as object detection, tracking and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We define novel 3D detection and tracking metrics. We also provide careful dataset analysis as well as baselines for lidar and image based detection and tracking. Data, development kit and more information are available online.

A Survey of Deep Learning Techniques for Autonomous Driving

Sorin Grigorescu , Bogdan Trasnea , Tiberiu Cocias , Gigel Macesanu

Categories:

2019-10-17

The last decade witnessed increasingly rapid progress in self-driving vehicle technology, mainly backed up by advances in the area of deep learning and artificial intelligence. The objective of this paper is to survey the current state-of-the-art on deep learning technologies used in autonomous driving. We start by presenting AI-based self-driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration and motion control algorithms. We investigate both the modular perception-planning-action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources and computational hardware. The comparison presented in this survey helps to gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assists with design choices.

Semantic Visual Simultaneous Localization and Mapping: A Survey

Kaiqi Chen , Jianhua Zhang , Jialing Liu , Qiyi Tong , Ruyu Liu , Shengyong Chen

Categories: Computer Vision

2022-09-14

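Visual Simultaneous Localization and Mapping (VSLAM) has made great progress in the computer vision and robotics communities and has been successfully used in many fields, such as autonomous robot navigation and AR/VR. However, VSLAM cannot achieve good localization in dynamic and complex environments. Numerous publications report that, by combining semantic information with VSLAM, semantic VSLAM systems have in recent years gained the ability to solve these problems. Nevertheless, there is no comprehensive survey of semantic VSLAM. To fill the gap, this paper first reviews the development of semantic VSLAM, explicitly focusing on its strengths and differences. Second, we explore three main issues of semantic VSLAM: the extraction and association of semantic information, the application of semantic information, and the advantages of semantic VSLAM. Then, we collect and analyze the current state-of-the-art SLAM datasets that have been widely used in semantic VSLAM systems. Finally, we discuss future directions that will provide a blueprint for the future development of semantic VSLAM.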

TAD: A Large-Scale Benchmark for Traffic Accidents Detection from Video Surveillance

Yajun Xu , Chuwen Huang , Yibing Nan , Shiguo Lian

Categories: Computer Vision

2022-09-26

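Automatic traffic accident detection has appealed to the machine vision community because of its implications for the development of autonomous intelligent transportation systems (ITS) and its importance to traffic safety. However, most previous studies on efficient analysis and prediction of traffic accidents have used small-scale datasets with limited coverage, which limits their effect and applicability. Existing traffic accident datasets are either small-scale, not from surveillance cameras, not open-sourced, or not built for freeway scenes. Since accidents on freeways tend to cause serious damage and happen too fast to catch on the spot, an open-sourced dataset of freeway traffic accidents collected from surveillance cameras is greatly needed and of practical importance. To help the vision community address these shortcomings, we collect video data of real traffic accidents covering abundant scenes. After integration and annotation along various dimensions, a large-scale traffic accident dataset named TAD is proposed in this work. Various experiments on image classification, object detection, and video classification tasks, using public mainstream vision algorithms or frameworks, are conducted to demonstrate the performance of different methods. The proposed dataset, together with the experimental results, is presented as a new benchmark to improve computer vision research, especially in ITS.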

Traffic-Net: 3D Traffic Monitoring Using a Single Camera

Mahdi Rezaei , Mohsen Azarmi , Farzam Mohammad Pour Mir

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2021-09-19

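Computer vision has played a major role in Intelligent Transportation Systems (ITS) and traffic surveillance. Along with the rapidly growing number of automated vehicles and crowded cities, automated and advanced traffic management systems (ATMS) using video surveillance infrastructure have evolved through the implementation of deep neural networks. In this research, we provide a practical platform for real-time traffic monitoring, including 3D vehicle/pedestrian detection, speed detection, trajectory estimation, congestion detection, and monitoring of the interaction between vehicles and pedestrians, all using a single CCTV traffic camera. We adapt a custom YOLOv5 deep neural network model for vehicle/pedestrian detection and an enhanced SORT tracking algorithm. A hybrid satellite-ground based inverse perspective mapping (SG-IPM) method for camera auto-calibration is also developed, which leads to accurate 3D object detection and visualization. We also develop a hierarchical traffic modeling solution based on short- and long-term temporal video data streams to understand traffic flow, bottlenecks, and risky spots for vulnerable road users. Several experiments on real-world scenarios and comparisons with the state of the art are conducted using various traffic monitoring datasets, including MIO-TCD, UA-DETRAC, and GRAM-RTM, collected from highways, intersections, and urban areas under different lighting and weather conditions.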
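As background for the inverse perspective mapping and speed detection mentioned above, the sketch below shows what a ground-plane homography does once it is known: it maps image pixels to road-plane coordinates in metres, from which a speed can be estimated. It does not reproduce the paper's SG-IPM auto-calibration; the matrix `H`, the pixel coordinates, and the function names are hypothetical.

```python
import math


def apply_homography(H, u, v):
    """Map pixel (u, v) to ground-plane (x, y) using a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return x / w, y / w  # divide by w to leave homogeneous coordinates


def speed_mps(H, pixel_a, pixel_b, dt_seconds):
    """Speed of a tracked point seen at pixel_a, then pixel_b dt_seconds later."""
    xa, ya = apply_homography(H, *pixel_a)
    xb, yb = apply_homography(H, *pixel_b)
    return math.hypot(xb - xa, yb - ya) / dt_seconds


# Hypothetical calibration: a pure scale where one pixel spans 0.05 m.
H = [[0.05, 0.0, 0.0],
     [0.0, 0.05, 0.0],
     [0.0, 0.0, 1.0]]
print(speed_mps(H, (100, 200), (160, 280), 0.5))
```

A real camera homography has non-zero perspective terms in its last row, which is why the division by `w` is kept even though this toy matrix makes it trivial.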

Indian Licence Plate Dataset in the wild

Sanchit Tanwar , Ayush Tiwari , Ritesh Chowdhry

Categories: Computer Vision

2021-11-11

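Indian licence plate detection is a problem that has not been explored much at an open-source level. Proprietary solutions are available, but there is no large open-source dataset that can be used to run experiments and test different approaches. The large datasets that are available cover countries such as China and Brazil, but models trained on those datasets do not perform well on Indian plates because font styles and plate designs vary significantly from country to country. This paper introduces an Indian licence plate dataset with 16,192 images and 21,683 number plates, annotated with 4 points for each plate and with each character in the corresponding plate. We present a benchmark model that uses semantic segmentation to solve number plate detection. We propose a two-stage approach in which the first stage localizes the plate and the second stage reads the text in the cropped plate image. We tested benchmark object detection and semantic segmentation models; for the second stage, we used an LPRNet-based OCR.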
