Smart Paper Notes

Automatic vehicle trajectory data reconstruction at scale

Yanbing Wang , Derek Gloudemans , Zi Nean Teoh , Lisa Liu , Gergely Zachár , William Barbour , Daniel Work

Category: Computer Vision

2022-12-15

Vehicle trajectory data has received increasing research attention over the past decades. With improvements in sensing technology such as high-resolution video cameras, in-vehicle radars and lidars, abundant individual and contextual traffic data is now available. However, though the data quantity is massive, it is by itself of limited utility for traffic research because of noise and systematic sensing errors, and thus requires proper processing to ensure data quality. We draw particular attention to extracting high-resolution vehicle trajectory data from video cameras as traffic monitoring cameras are becoming increasingly ubiquitous. We explore methods for automatic trajectory data reconciliation, given "raw" vehicle detection and tracking information from automatic video processing algorithms. We propose a pipeline including a) an online data association algorithm to match fragments that are associated with the same object (vehicle), which is formulated as a min-cost network flow problem of a graph, and b) a trajectory reconciliation method formulated as a quadratic program to enhance raw detection data. The pipeline leverages vehicle dynamics and physical constraints to associate tracked objects when they become fragmented, remove measurement noise on trajectories and impute missing data due to fragmentation. The accuracy is benchmarked on a sample of manually-labeled data, which shows that the reconciled trajectories improve the accuracy on all the tested input data for a wide range of measures. An online version of the reconciliation pipeline is implemented and will be applied in a continuous video processing system running on a camera network covering a 4-mile stretch of Interstate-24 near Nashville, Tennessee.
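
The reconciliation step in (b) can be illustrated with a much simpler quadratic program than the one the paper presumably solves. The sketch below is an assumption-laden toy, not the authors' formulation: it fits a 1-D position series by penalizing the squared second difference (a stand-in for acceleration) and solves the resulting unconstrained QP through its normal equations. The function name `reconcile_trajectory`, the weight `lam`, and the use of NumPy are all hypothetical choices made for illustration.

```python
import numpy as np


def reconcile_trajectory(z, observed, lam=10.0):
    """Smooth a 1-D position series and impute its gaps (illustrative only).

    Solves  minimize  sum_{i observed} (x_i - z_i)^2 + lam * ||D x||^2,
    where D is the second-difference operator. This is an unconstrained
    simplification; a full pipeline would add physical constraints.
    """
    z = np.asarray(z, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    n = z.size
    # Diagonal selector: 1 where a measurement exists, 0 where it is missing.
    M = np.diag(observed.astype(float))
    # Second-difference operator, shape (n-2, n): row i is x_i - 2*x_{i+1} + x_{i+2}.
    D = np.diff(np.eye(n), n=2, axis=0)
    # Replace missing entries (possibly NaN) by 0 so they do not propagate.
    z_filled = np.where(observed, z, 0.0)
    # Normal equations of the QP; the matrix is positive definite when at
    # least two samples are observed.
    A = M + lam * D.T @ D
    return np.linalg.solve(A, M @ z_filled)


if __name__ == "__main__":
    t = np.arange(10.0)
    z = 2.0 * t + 1.0 + np.random.default_rng(0).normal(0.0, 0.3, t.size)
    seen = np.ones(t.size, dtype=bool)
    seen[3:6] = False          # simulate a fragmentation gap
    z[3:6] = np.nan
    print(reconcile_trajectory(z, seen))
```

Larger `lam` favors smoother (closer to constant-velocity) trajectories, while smaller `lam` follows the raw detections more closely; the gap is filled by whatever values minimize the smoothness penalty.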

translated by Google Translate

Providentia -- A Large-Scale Sensor System for the Assistance of Autonomous Vehicles and Its Evaluation

Annkathrin Krämmer , Christoph Schöller , Dhiraj Gulati , Venkatnarayanan Lakshminarasimhan , Franz Kurz , Dominik Rosenbaum , Claus Lenz , Alois Knoll

Category: Robotics | Computer Vision

2019-06-16

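The environmental perception of an autonomous vehicle is limited by its physical sensor ranges and algorithmic performance, as well as by occlusions that degrade its understanding of the ongoing traffic situation. This not only poses a significant threat to safety and limits driving speeds, but can also lead to inconvenient maneuvers. Intelligent infrastructure systems can help alleviate these problems. An intelligent infrastructure system can fill the gaps in a vehicle's perception and extend its field of view by providing additional detailed information about its surroundings in the form of a digital model of the current traffic situation, i.e. a digital twin. However, detailed descriptions of such systems and working prototypes demonstrating their feasibility are scarce. In this paper, we propose a hardware and software architecture that enables such a reliable intelligent infrastructure system to be built. We have implemented this system in the real world and demonstrate its ability to create an accurate digital twin of an extended highway stretch, thus enhancing an autonomous vehicle's perception beyond the limits of its on-board sensors. Furthermore, we evaluate the accuracy and reliability of the digital twin by using aerial images and earth observation methods to generate ground truth data.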

High-temporal-resolution event-based vehicle detection and tracking

Zaid El-Shair , Samir Rawashdeh

Category: Computer Vision

2022-12-29

Event-based vision has been growing rapidly in recent years, driven by its unique characteristics such as high temporal resolution (~1 µs), high dynamic range (>120 dB), and output latency of only a few microseconds. This work further explores a hybrid, multi-modal approach for object detection and tracking that leverages state-of-the-art frame-based detectors complemented by hand-crafted event-based methods to improve the overall tracking performance with minimal computational overhead. The methods presented include event-based bounding box (BB) refinement that improves the precision of the resulting BBs, as well as a continuous event-based object detection method, to recover missed detections and generate inter-frame detections that enable a high-temporal-resolution tracking output. The advantages of these methods are quantitatively verified by an ablation study using the higher order tracking accuracy (HOTA) metric. Results show significant performance gains, reflected in an improvement in HOTA from 56.6%, using only frames, to 64.1% and 64.9%, for the event and edge-based mask configurations combined with the two methods proposed, at the baseline framerate of 24 Hz. Likewise, incorporating these methods with the same configurations has improved HOTA from 52.5% to 63.1%, and from 51.3% to 60.2% at the high-temporal-resolution tracking rate of 384 Hz. Finally, a validation experiment is conducted to analyze the real-world single-object tracking performance using high-speed LiDAR. Empirical evidence shows that our approaches provide significant advantages compared to using frame-based object detectors at the baseline framerate of 24 Hz and higher tracking rates of up to 500 Hz.

Spatial-Temporal Map Vehicle Trajectory Detection Using Dynamic Mode Decomposition and Res-UNet+ Neural Networks

Tianya T. Zhang , Peter J. Jin

Category: Computer Vision

2022-01-13

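This paper presents a machine-learning-enhanced longitudinal scanline method for extracting vehicle trajectories from high-angle traffic cameras. The Dynamic Mode Decomposition (DMD) method is applied to extract vehicle strands by decomposing the Spatial-Temporal Map (STMap) into a sparse foreground and a low-rank background. A deep neural network named Res-UNet+ was designed by adapting two prevalent deep learning architectures. The Res-UNet+ neural network significantly improves the performance of STMap-based vehicle detection, and the DMD model provides many interesting insights into the evolution of the underlying spatial-temporal structures preserved by the STMap. The model outputs were compared with a previous image processing model and mainstream semantic segmentation deep neural networks. After a thorough evaluation, the model is shown to be accurate and robust against many challenging factors. Last but not least, this paper fundamentally addresses many quality issues found in the NGSIM trajectory data. The cleaned, high-quality trajectory data are meant to support future theoretical and modeling research on traffic flow and microscopic vehicle control. The method is a reliable solution for video-based trajectory extraction and has wide applicability.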

A Survey of Deep Learning Techniques for Autonomous Driving

Sorin Grigorescu , Bogdan Trasnea , Tiberiu Cocias , Gigel Macesanu

Category:

2019-10-17

The last decade witnessed increasingly rapid progress in self-driving vehicle technology, mainly backed up by advances in the area of deep learning and artificial intelligence. The objective of this paper is to survey the current state-of-the-art on deep learning technologies used in autonomous driving. We start by presenting AI-based self-driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration and motion control algorithms. We investigate both the modular perception-planning-action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources and computational hardware. The comparison presented in this survey helps to gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assist with design choices.

Deep Learning based Computer Vision Methods for Complex Traffic Environments Perception: A Review

Talha Azfar , Jinlong Li , Hongkai Yu , Ruey Long Cheu , Yisheng Lv , Ruimin Ke

Category: Computer Vision

2022-11-09

Computer vision applications in intelligent transportation systems (ITS) and autonomous driving (AD) have gravitated towards deep neural network architectures in recent years. While performance seems to be improving on benchmark datasets, many real-world challenges are yet to be adequately considered in research. This paper presents an extensive literature review on the applications of computer vision in ITS and AD, and discusses challenges related to data, models, and complex urban environments. The data challenges are associated with the collection and labeling of training data and its relevance to real-world conditions, bias inherent in datasets, the high volume of data that needs to be processed, and privacy concerns. Deep learning (DL) models are commonly too complex for real-time processing on embedded hardware, lack explainability and generalizability, and are hard to test in real-world settings. Complex urban traffic environments have irregular lighting and occlusions, and surveillance cameras can be mounted at a variety of angles, gather dirt, and shake in the wind, while the traffic conditions are highly heterogeneous, with violation of rules and complex interactions in crowded scenarios. Some representative applications that suffer from these problems are traffic flow estimation, congestion detection, autonomous driving perception, vehicle interaction, and edge computing for practical deployment. The possible ways of dealing with the challenges are also explored while prioritizing practical deployment.

Network-level Safety Metrics for Overall Traffic Safety Assessment: A Case Study

Xiwen Chen , Hao Wang , Abolfazl Razi , Brendan Russo , Jason Pacheco , John Roberts , Jeffrey Wishart , Larry Head , Alonso Granados Baca

Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-01-27

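Driving safety analysis has recently seen unprecedented improvements thanks to technological advances in precise positioning sensors, artificial intelligence (AI)-based safety features, autonomous driving systems, connected vehicles, high-throughput computing, and edge computing servers. In particular, deep learning (DL) methods enable high-volume video processing to extract safety-related features from the massive videos captured by roadside units (RSU). Safety metrics are commonly used measures for investigating crashes and near-conflict events. However, these metrics provide limited insight into overall network-level traffic management. On the other hand, some safety assessment efforts are devoted to processing crash reports and identifying spatial and temporal patterns of crashes that correlate with road geometry, traffic volume, and weather conditions. This approach relies solely on crash reports and ignores the rich information in traffic videos, which can help identify the role of violations in crashes. To bridge these two perspectives, we define a new set of network-level safety metrics (NSM) to assess the overall safety of traffic flow by processing images taken by RSU cameras. Our analysis shows that the NSMs have a significant statistical association with crash rates. This approach differs from simply generalizing the results of individual crash analyses, since all vehicles contribute to the calculation of the NSMs, not only those involved in crash incidents. This perspective treats traffic flow as a complex dynamic system in which the actions of some nodes can propagate through the network and influence the crash risk of other nodes. We also provide a comprehensive review of surrogate safety metrics (SSM) in Appendix A.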

Traffic-Net: 3D Traffic Monitoring Using a Single Camera

Mahdi Rezaei , Mohsen Azarmi , Farzam Mohammad Pour Mir

Category: Computer Vision | Artificial Intelligence | Machine Learning

2021-09-19

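Computer vision has played an important role in intelligent transportation systems (ITS) and traffic surveillance. Along with the rapidly growing number of automated vehicles and crowded cities, automated and advanced traffic management systems (ATMS) using video surveillance infrastructure have evolved through the implementation of deep neural networks. In this research, we provide a practical platform for real-time traffic monitoring, including 3D vehicle/pedestrian detection, speed detection, trajectory estimation, congestion detection, and monitoring of the interaction between vehicles and pedestrians, all using a single CCTV traffic camera. We adapt a customized YOLOv5 deep neural network model for vehicle/pedestrian detection and an enhanced SORT tracking algorithm. A hybrid satellite-ground based inverse perspective mapping (SG-IPM) method is also developed for automatic camera calibration, which leads to accurate 3D object detection and visualization. We also develop a hierarchical traffic modeling solution based on short-term and long-term temporal video data streams to understand traffic flow, bottlenecks, and risky spots for vulnerable road users. Several experiments on real-world scenarios and comparisons with the state of the art are conducted using various traffic monitoring datasets, including MIO-TCD, UA-DETRAC, and GRAM-RTM, collected from highways, intersections, and urban areas under different lighting and weather conditions.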

Argoverse: 3D Tracking and Forecasting with Rich Maps

Ming-Fang Chang , John Lambert , Patsorn Sangkloy , Jagjeet Singh , Slawomir Bak , Andrew Hartnett , De Wang , Peter Carr , Simon Lucey , Deva Ramanan

Category:

2019-11-06

Figure 1: We introduce datasets for 3D tracking and motion forecasting with rich maps for autonomous driving. Our 3D tracking dataset contains sequences of LiDAR measurements, 360° RGB video, front-facing stereo (middle-right), and 6-dof localization. All sequences are aligned with maps containing lane center lines (magenta), driveable region (orange), and ground height. Sequences are annotated with 3D cuboid tracks (green). A wider map view is shown in the bottom-right.

UAVs Beneath the Surface: Cooperative Autonomy for Subterranean Search and Rescue in DARPA SubT

Matej Petrlik , Pavel Petracek , Vit Kratky , Tomas Musil , Yurii Stasinchuk , Matous Vrba , Tomas Baca , Daniel Hert , Martin Pecka , Tomas Svoboda

Category: Robotics | Artificial Intelligence

2022-06-16

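This paper presents a novel approach for autonomous cooperating UAVs in search and rescue operations in subterranean domains with complex topology. As part of the CTU-CRAS-NORLAB team, the proposed system ranked second in the Virtual Track of the DARPA SubT Finals. In contrast to the winning solution, which was developed specifically for the Virtual Track, the proposed solution also proved to be a robust system for deployment onboard physical UAVs flying in the extremely harsh and confined environment of the real-world competition. The proposed approach enables fully autonomous and decentralized deployment of a UAV team with seamless simulation-to-world transfer, and demonstrates its advantage over less mobile UGV teams in the flyable space of diverse environments. The main contributions of the paper lie in the mapping and navigation pipelines. The mapping approach employs novel map representations: one for efficient risk-aware long-distance planning, one oriented towards coverage, and the compressed topological-volumetric LTVMap, which allows multi-robot cooperation under low-bandwidth communication. These representations are used in navigation together with novel methods for visibility-constrained informed search in general 3D environments, with no assumptions about the environment structure, while balancing deep exploration with the exploitation of sensor coverage. The proposed solution also includes a visual perception pipeline for onboard detection and localization of objects of interest in four RGB streams at 5 Hz without a dedicated GPU. Apart from participation in DARPA SubT, the performance of the UAV system is supported by extensive experimental verification in diverse environments, with both qualitative and quantitative evaluation.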

LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking

Duy M. H. Nguyen , Roberto Henschel , Bodo Rosenhahn , Daniel Sonntag , Paul Swoboda

Category: Computer Vision

2021-11-23

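Multi-camera multi-object tracking is currently attracting attention in the computer vision field because of its superior performance in real-world applications such as video surveillance of crowded scenes or vast spaces. In this work, we propose a mathematically elegant multi-camera multi-object tracking approach based on a spatial-temporal lifted multicut formulation. Our model uses state-of-the-art tracklets produced by single-camera trackers as proposals. Because these tracklets may contain ID-switch errors, we refine them through a novel pre-clustering obtained from 3D geometry projections. As a result, we derive a better tracking graph without ID switches and more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short- and long-range temporal interactions between tracklets located in the same camera as well as across cameras. Experiments on the WildTrack dataset yield near-perfect results, and the method outperforms state-of-the-art trackers on Campus while being on par on the PETS-09 dataset. We will make our implementation available upon acceptance of the paper.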

iDriving: Toward Safe and Efficient Infrastructure-directed Autonomous Driving

Fawad Ahmad , Christina Shin , Weiwu Pang , Jacob Cashman , Branden Leong , Ramesh Govindan

Category: Robotics

2022-07-18

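Autonomous driving will become pervasive in the coming decades. iDriving improves the safety of autonomous driving at intersections and increases efficiency by improving traffic throughput at intersections. In iDriving, roadside infrastructure remotely drives an autonomous vehicle at an intersection by offloading perception and planning from the vehicle to the roadside infrastructure. To achieve this, iDriving must be able to process voluminous sensor data at full frame rate with a tail latency of less than 100 ms, without sacrificing accuracy. We describe algorithms and optimizations that enable it to achieve this goal using an accurate and lightweight perception component that reasons over a composite view derived from overlapping sensors, and a planner that jointly plans the trajectories of multiple vehicles. In our evaluations, iDriving always ensures the safe passage of vehicles, while autonomous driving can do so only 27% of the time. iDriving also results in 5x lower wait times than other approaches because it enables traffic-light-free intersections.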

Deep Learning-Driven Edge Video Analytics: A Survey

Renjie Xu , Saiedeh Razavi , Rong Zheng

Category: Computer Vision | Machine Learning

2022-11-28

Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), begins to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. A dedicated venue for collecting and summarizing the latest advances of EVA is highly desired by the community. Besides, the basic concepts of EVA (e.g., definition, architectures, etc.) are ambiguous and neglected by these surveys due to the rapid development of this domain. A thorough clarification is needed to facilitate a consensus on these concepts. To fill in these gaps, we conduct a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.

Autonomous Driving in Adverse Weather Conditions: A Survey

Yuxiao Zhang , Alexander Carballo , Hanting Yang , Kazuya Takeda

Category: Robotics

2021-12-16

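Automated driving systems (ADS) open up a new domain for the automotive industry and offer new possibilities for future transportation with higher efficiency and a more comfortable experience. However, autonomous driving under adverse weather conditions has long been the problem that keeps autonomous vehicles (AVs) from reaching higher levels of autonomy. This paper assesses the influences and challenges that weather brings to ADS sensors in an analytic and statistical way, and surveys solutions against adverse weather conditions. State-of-the-art techniques for perception enhancement with regard to each kind of weather are thoroughly reported. External auxiliary solutions such as V2X technology, the coverage of weather conditions in currently available datasets, simulators, and experimental facilities with weather chambers are clearly sorted out. By pointing out the major weather problems the autonomous driving field is currently facing, and reviewing both hardware and computer science solutions from recent years, this survey provides a holistic overview of the obstacles to and directions of ADS development with respect to adverse-weather driving conditions.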

Spatial-Temporal Deep Embedding for Vehicle Trajectory Reconstruction from High-Angle Video

Tianya T. Zhang , Peter J. Jin , Han Zhou , Benedetto Piccoli

Category: Computer Vision | Artificial Intelligence

2022-09-17

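Spatial-temporal map (STMap)-based methods have shown great potential for processing high-angle videos for vehicle trajectory reconstruction, which can meet the needs of various data-driven modeling and imitation learning applications. In this paper, we develop a Spatial-Temporal Deep Embedding (STDE) model that imposes parity constraints at both the pixel and instance levels to generate instance-aware embeddings for vehicle stripe segmentation on the STMap. At the pixel level, each pixel is encoded with its 8-neighbor pixels at different ranges, and this encoding is subsequently used to guide a neural network to learn the embedding mechanism. At the instance level, a discriminative loss function is designed to pull pixels belonging to the same instance closer together and to push the means of different instances apart. The output of the spatial-temporal affinity is then optimized by the mutex-watershed algorithm to obtain the final clustering results. Based on segmentation metrics, our model outperforms five other baselines used for STMap processing and shows robustness under the influence of shadows, static noise, and overlapping. The designed model is applied to process all public NGSIM US-101 videos to generate complete vehicle trajectories, indicating good scalability and adaptability. Last but not least, the strengths of the scanline method with STDE and future directions are discussed. Code, the STMap dataset, and video trajectories are publicly available in an online repository. GitHub link: shorturl.at/jklt0.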
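
The instance-level idea in this abstract, pulling pixels of the same instance together and pushing the means of different instances apart, matches the general shape of the widely used discriminative embedding loss. The sketch below shows that generic form only; it is an assumption that the paper's loss resembles it, and the function name `discriminative_loss` and the margins `delta_v` and `delta_d` are hypothetical values chosen for illustration.

```python
import numpy as np


def discriminative_loss(embeddings, labels, delta_v=0.5, delta_d=1.5):
    """Generic pull/push embedding loss (illustrative, not the paper's code).

    embeddings: (n_pixels, dim) array of per-pixel embedding vectors.
    labels:     (n_pixels,) array of instance ids.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    # Mean embedding of each instance.
    means = np.stack([embeddings[labels == k].mean(axis=0) for k in ids])

    # Pull term: penalize pixels farther than delta_v from their instance mean.
    pull = 0.0
    for k, mu in zip(ids, means):
        dist = np.linalg.norm(embeddings[labels == k] - mu, axis=1)
        pull += np.mean(np.maximum(dist - delta_v, 0.0) ** 2)
    pull /= len(ids)

    # Push term: penalize pairs of instance means closer than 2 * delta_d.
    push, n_pairs = 0.0, 0
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            gap = np.linalg.norm(means[a] - means[b])
            push += np.maximum(2.0 * delta_d - gap, 0.0) ** 2
            n_pairs += 1
    if n_pairs:
        push /= n_pairs
    return float(pull + push)
```

With tight, well-separated clusters both terms vanish; the loss grows when pixels stray from their instance mean or when two instance means sit within `2 * delta_d` of each other.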

Multiple Object Tracking in Recent Times: A Literature Review

Mk Bashar , Samia Islam , Kashifa Kawaakib Hussain , Md. Bakhtiar Hasan , A. B. M. Ashikur Rahman , Md. Hasanul Kabir

Category: Computer Vision

2022-09-11

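In recent years, multiple object tracking (MOT) has attracted great interest among researchers and has become one of the trending problems in computer vision, especially with the recent advances in autonomous driving. MOT is one of the critical vision tasks and faces various issues such as occlusion in crowded scenes, similar appearance, difficulty in detecting small objects, and ID switching. To tackle these challenges, researchers have tried to exploit the attention mechanism of transformers, the interrelation of tracklets with graph convolutional neural networks, and the appearance similarity of objects in different frames with Siamese networks; they have also tried CNN networks based on IoU matching and motion prediction with LSTMs. To bring these scattered techniques under one umbrella, we studied more than a hundred papers published over the last three years and tried to extract the techniques that researchers have focused on most in recent times to solve the problems of MOT. We list numerous applications and possibilities, and describe how MOT relates to real life. Our review tries to show the different perspectives on the techniques researchers have used over time and gives some future directions for prospective researchers. In addition, we include popular benchmark datasets and metrics in this review.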
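
Of the techniques this review lists, IoU matching is the simplest to show concretely. The following is a minimal sketch of the association step only, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` and a greedy assignment; practical trackers often use the Hungarian algorithm instead, and the names `iou`, `greedy_iou_match`, and the `threshold` value are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_iou_match(tracks, detections, threshold=0.3):
    """Greedily pair track boxes with detection boxes by descending IoU.

    Returns a list of (track_index, detection_index) pairs; tracks and
    detections whose best IoU is below the threshold stay unmatched.
    """
    scored = sorted(
        ((iou(t, d), i, j)
         for i, t in enumerate(tracks)
         for j, d in enumerate(detections)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # all remaining candidates overlap even less
        if i in used_tracks or j in used_dets:
            continue
        used_tracks.add(i)
        used_dets.add(j)
        matches.append((i, j))
    return matches
```

Unmatched detections would typically start new tracks and unmatched tracks would be aged out, but that bookkeeping is left out of this sketch.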

SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset

Fatih Emre Simsek , Cevahir Cigla , Koray Kayabol

Category: Computer Vision

2022-08-04

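Because of the success of convolutional neural networks (CNNs) in detection over the past decade, multi-object tracking (MOT) has been dominated by tracking-by-detection approaches. With the release of datasets and benchmarking sites, the research direction has shifted towards achieving the best accuracy on generic scenarios, including re-identification (reID) of objects while tracking. In this study, we narrow the scope of MOT to surveillance by providing a dedicated pedestrian dataset and focusing on an in-depth analysis of well-performing multi-object trackers and techniques for real-world applications. To this end, we introduce the SOMPT22 dataset: a new set for multi-person tracking with annotated short videos captured from static cameras mounted on poles at a height of 6-8 meters for city surveillance. Compared with public MOT datasets, this provides a more focused and specific benchmark of MOT for outdoor surveillance. On this new dataset, we analyze MOT trackers, classified as one-shot and two-stage according to the way they use detection and reID networks. The experimental results on our new dataset show that the state of the art (SOTA) is still far from high efficiency, and that single-shot trackers are good candidates for unifying fast execution and accuracy, with competitive performance. The dataset will be available at: sompt22.github.io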

Vehicle Trajectory Tracking Through Magnetic Sensors: A Case Study of Two-lane Road

Xiaojiang Ren , Yuanfa Tu , Yingfan Geng

Category: Robotics

2022-09-09

Traffic surveillance is an important issue in Intelligent Transportation Systems (ITS). In this paper, we propose a novel surveillance system to detect and track vehicles using ubiquitously deployed magnetic sensors. That is, multiple magnetic sensors, mounted roadside and along lane boundary lines, are used to track various vehicles. Real-time vehicle detection data are reported from the magnetic sensors, collected into a data center via base stations, and processed to depict vehicle trajectories including vehicle position, timestamp, speed and type. We first define a vehicle trajectory tracking problem. We then propose a graph-based data association algorithm to track each detected vehicle, and design a related online algorithm framework. We finally validate the performance via both experimental simulation and a real-world road test. The experimental results demonstrate that the proposed solution is a cost-effective way to capture the driving status of vehicles and, on that basis, to build various traffic safety and efficiency applications.

Autonomous Aerial Delivery Vehicles, a Survey of Techniques on how Aerial Package Delivery is Achieved

Jack Saunders , Sajad Saeedi , Wenbin Li

Category: Robotics

2021-10-06

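Autonomous aerial delivery vehicles have attracted significant interest over the past decade. This has been enabled by technological advances in aerial manipulators and novel grippers. In addition, improved control schemes and vehicle dynamics are better able to model the payload, and improved perception algorithms can detect key features in the environment of an unmanned aerial vehicle (UAV). In this survey, a systematic review of the technological advances and open research problems of autonomous aerial delivery vehicles is conducted. First, various types of manipulators and grippers are discussed in detail, along with dynamic modeling and control methods. Then, landing on static and dynamic platforms is discussed. Subsequently, risks such as weather conditions, state estimation, and collision avoidance are examined with a view to ensuring safe transit. Finally, delivery UAV routing is investigated, which divides the topic into two areas: drone operations and collaborative drone operations.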

Visual and Object Geo-localization: A Comprehensive Survey

Daniel Wilson , Xiaohan Zhang , Waqas Sultani , Safwan Wshah

Category: Computer Vision

2021-12-30

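The concept of geo-localization refers to the process of determining where on Earth some "entity" is located, typically using Global Positioning System (GPS) coordinates. The entity of interest may be an image, a sequence of images, a video, a satellite image, or even objects visible within an image. As massive datasets of GPS-tagged media have rapidly become available thanks to smartphones and the internet, and as deep learning has risen to enhance the performance of machine learning models, the fields of visual and object geo-localization have emerged because of their significant impact on a wide range of applications such as augmented reality, robotics, self-driving vehicles, road maintenance, and 3D reconstruction. This paper provides a comprehensive survey of geo-localization involving images, which covers either determining where an image was captured (image geo-localization) or geo-locating objects within an image (object geo-localization). We provide an in-depth study, including a summary of popular algorithms, a description of proposed datasets, and an analysis of performance results to illustrate the current state of each field.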
