The task of driver attention prediction is of considerable interest to researchers in robotics and the autonomous vehicle industry. Driver attention prediction can play an instrumental role in mitigating and preventing high-risk events such as collisions and casualties. However, existing driver attention prediction models neglect the driver's distraction state and intention, which can significantly influence how the driver observes the surroundings. To address these issues, we present a new driver attention dataset, CoCAtt (Cognitive-Conditioned Attention). Unlike previous driver attention datasets, CoCAtt includes per-frame annotations that describe the driver's distraction state and intention. In addition, the attention data in our dataset is captured in both manual and autopilot modes using eye-tracking devices of different resolutions. Our results demonstrate that incorporating the two driver states above into attention modeling can improve the performance of driver attention prediction. To the best of our knowledge, this work is the first to provide autopilot attention data. Furthermore, CoCAtt is currently the largest and most diverse driver attention dataset in terms of autonomy levels, eye-tracker resolutions, and driving scenarios.
The task of driver attention prediction has drawn considerable interest from researchers in robotics and the autonomous vehicle industry. Driver attention prediction can play an instrumental role in mitigating and preventing high-risk events such as collisions and casualties. However, existing driver attention prediction models neglect the driver's distraction state and intention, which can significantly influence how they observe their surroundings. To address these issues, we present a new driver attention dataset, CoCAtt (Cognitive-Conditioned Attention). Unlike previous driver attention datasets, CoCAtt includes per-frame annotations that describe the driver's distraction state and intention. In addition, the attention data in our dataset is captured in both manual and autopilot modes using eye-tracking devices of different resolutions. Our results demonstrate that incorporating the two driver states above into attention modeling can improve the performance of driver attention prediction. To the best of our knowledge, this work is the first to provide autopilot attention data. Furthermore, CoCAtt is currently the largest and most diverse driver attention dataset in terms of autonomy levels, eye-tracker resolutions, and driving scenarios. CoCAtt is available for download at https://cocatt-dataset.github.io.
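To illustrate the conditioning idea described in this abstract, the sketch below shows one plausible way to inject discrete driver-state labels into a saliency-prediction head. It is an assumed illustration, not the CoCAtt authors' architecture; all layer sizes, label vocabularies, and module names are hypothetical.

```python
import torch
import torch.nn as nn

class StateConditionedAttention(nn.Module):
    """Illustrative sketch: predict a driver attention (saliency) map from image
    features, conditioned on the driver's distraction state and intention."""

    def __init__(self, feat_channels=256, n_distraction_states=2, n_intentions=4, emb_dim=32):
        super().__init__()
        # Embeddings for the two per-frame driver-state annotations (assumed label sets).
        self.distraction_emb = nn.Embedding(n_distraction_states, emb_dim)
        self.intention_emb = nn.Embedding(n_intentions, emb_dim)
        # Decoder that fuses image features with the spatially broadcast state embeddings.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_channels + 2 * emb_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),  # one-channel attention logits
        )

    def forward(self, image_feats, distraction, intention):
        # image_feats: (B, C, H, W); distraction, intention: (B,) integer labels
        b, _, h, w = image_feats.shape
        state = torch.cat([self.distraction_emb(distraction),
                           self.intention_emb(intention)], dim=1)      # (B, 2*emb_dim)
        state_map = state[:, :, None, None].expand(-1, -1, h, w)       # broadcast over the feature map
        logits = self.decoder(torch.cat([image_feats, state_map], dim=1))
        return torch.sigmoid(logits)  # per-pixel attention probability
```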
Eye gaze analysis is an important research problem in computer vision and human-computer interaction. Even with notable progress over the last decade, automatic gaze analysis remains challenging due to the uniqueness of eye appearance, eye-head interplay, occlusion, image quality, and illumination conditions. Several open questions remain, including what the important cues are for interpreting gaze direction in unconstrained environments without prior knowledge, and how to encode them in real time. We review progress across a range of gaze analysis tasks and applications to shed light on these fundamental questions, identify effective methods in gaze analysis, and provide possible future directions. We analyze recent gaze estimation and segmentation methods according to their strengths and reported evaluation metrics, especially in unsupervised and weakly supervised settings. Our analysis shows that the development of robust and generic gaze analysis methods still needs to address real-world challenges such as unconstrained settings and learning with reduced supervision. Finally, we discuss future research directions for designing realistic gaze analysis systems that can carry over to other domains, including computer vision, augmented reality (AR), virtual reality (VR), and human-computer interaction (HCI). Project page: https://github.com/i-am-shreya/eyegazesurvey
This survey reviews explainability methods for vision-based self-driving systems trained with behavior cloning. The concept of explainability has several facets, and the need for explainability is strong in driving, a safety-critical application. Gathering contributions from several research fields, namely computer vision, deep learning, autonomous driving, and explainable AI (X-AI), this survey addresses several points. First, it discusses definitions, context, and the motivation for gaining more interpretability and explainability from self-driving systems, as well as the challenges specific to this application. Second, methods providing explanations for black-box self-driving systems in a post-hoc fashion are comprehensively organized and detailed. Third, approaches that aim to build more interpretable self-driving systems by design are presented and discussed in detail. Finally, remaining open challenges and potential future research directions are identified and examined.
Traffic accident anticipation is an essential capability for automated driving systems (ADSs) to deliver a driving experience with safety assurance. An accident anticipation model aims to predict accidents promptly and accurately before they occur. Existing artificial intelligence (AI) models for accident anticipation lack human-interpretable explanations of their decision making. Although these models perform well, they remain black boxes to ADS users and thus struggle to earn their trust. To this end, this paper presents a gated recurrent unit (GRU) network that learns spatio-temporal relational features for the early anticipation of traffic accidents from dashcam video data. A post-hoc attention mechanism named Grad-CAM is integrated into the network to generate saliency maps as visual explanations of the accident anticipation decision. An eye tracker captures human eye fixation points to produce human attention maps. The explainability of the network-generated saliency maps is evaluated by comparison with the human attention maps. Qualitative and quantitative results on a public crash dataset confirm that the proposed explainable network can anticipate an accident on average 4.57 seconds before it occurs, with 94.02% average precision. Furthermore, a variety of post-hoc attention-based XAI methods are evaluated, confirming that Grad-CAM, chosen in this study, can generate high-quality, human-interpretable saliency maps (with a normalized scanpath saliency of 1.23) to explain the crash anticipation decision. Importantly, the results confirm that the proposed AI model, with its human-inspired design, can outperform humans in accident anticipation time.
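As a rough illustration of the kind of recurrent anticipation model described above (not the paper's exact network), the sketch below runs per-frame dashcam features through a GRU and emits an accident probability at every time step; the feature dimensions and the exponentially weighted anticipation loss are assumptions commonly made in this line of work.

```python
import torch
import torch.nn as nn

class AccidentAnticipationGRU(nn.Module):
    """Illustrative sketch: per-frame accident probability from dashcam features."""

    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, frame_feats):
        # frame_feats: (B, T, feat_dim) features from a CNN backbone (assumed precomputed)
        hidden_seq, _ = self.gru(frame_feats)                      # (B, T, hidden_dim)
        return torch.sigmoid(self.head(hidden_seq)).squeeze(-1)   # (B, T) per-frame risk scores

def anticipation_loss(risk, accident_frame):
    """Assumed exponentially weighted cross-entropy for one positive clip:
    frames closer to the accident onset are penalized more for low risk scores."""
    t = torch.arange(risk.shape[0], dtype=risk.dtype)
    weights = torch.exp(-torch.clamp(accident_frame - t, min=0) / 20.0)
    return -(weights * torch.log(risk + 1e-8)).mean()
```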
A key factor in the acceptance and comfort of automated vehicle features is the driving style. Mismatches between the automated driving style and the driver's preferred one can lead users to take over more frequently or even disable the automation features. This work proposes identifying user driving-style preferences from multi-modal signals, so that the vehicle can match the user's preference in a continuous and automatic way. We conducted a driving simulator study with 36 participants and collected extensive multi-modal data, including behavioral, physiological, and situational data. This includes eye gaze, steering grip, driving maneuvers, brake and throttle pedal inputs, foot distance from the pedals, pupil diameter, galvanic skin response, heart rate, and the situational driving context. We then built machine learning models to identify the preferred driving style and confirmed that all modalities are important for identifying user preferences. This work paves the road for implicitly adaptive driving styles in automated vehicles.
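A minimal sketch of how a preference classifier over such multi-modal features might look is given below. The feature set, labels, and model choice are assumptions for illustration; the study's actual models and feature engineering are not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical per-window feature vectors aggregating the multi-modal signals
# (gaze dispersion, grip force, pedal inputs, pupil diameter, GSR, heart rate, ...).
X = np.random.rand(200, 12)          # 200 windows x 12 aggregated features (placeholder data)
y = np.random.randint(0, 2, 200)     # 0 = prefers defensive style, 1 = prefers aggressive style

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)   # cross-validated preference-recognition accuracy
print(scores.mean())
```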
The last decade witnessed increasingly rapid progress in self-driving vehicle technology, mainly backed up by advances in the area of deep learning and artificial intelligence. The objective of this paper is to survey the current state-of-the-art on deep learning technologies used in autonomous driving. We start by presenting AI-based self-driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration and motion control algorithms. We investigate both the modular perception-planning-action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources and computational hardware. The comparison presented in this survey helps to gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assist with design choices.
Automated driving systems (ADSs) open up a new domain for the automotive industry and offer new possibilities for future transportation with higher efficiency and more comfortable experiences. However, autonomous driving in adverse weather conditions has long been the problem that keeps autonomous vehicles (AVs) from reaching level 4 or higher autonomy. This paper assesses, both analytically and statistically, the influences and challenges that weather brings to ADS sensors, and surveys solutions for adverse weather conditions. State-of-the-art techniques for perception enhancement with regard to each kind of weather are thoroughly reported. External auxiliary solutions such as V2X technology, as well as the weather-condition coverage of currently available datasets, simulators, and experimental facilities with weather chambers, are also reviewed. By pointing out the main weather-related problems the autonomous driving field is currently facing and reviewing the hardware and computer science solutions of recent years, this survey outlines the obstacles and directions for research on driving in adverse weather conditions.
Figure 1: We introduce datasets for 3D tracking and motion forecasting with rich maps for autonomous driving. Our 3D tracking dataset contains sequences of LiDAR measurements, 360° RGB video, front-facing stereo (middle-right), and 6-dof localization. All sequences are aligned with maps containing lane center lines (magenta), driveable region (orange), and ground height. Sequences are annotated with 3D cuboid tracks (green). A wider map view is shown in the bottom-right.
In this work, we tackle two vital tasks in automated driving systems, i.e., driver intent prediction and risk object identification from egocentric images. Mainly, we investigate the question: what would be good road scene-level representations for these two tasks? We contend that a scene-level representation must capture higher-level semantic and geometric representations of traffic scenes around ego-vehicle while performing actions to their destinations. To this end, we introduce the representation of semantic regions, which are areas where ego-vehicles visit while taking an afforded action (e.g., left-turn at 4-way intersections). We propose to learn scene-level representations via a novel semantic region prediction task and an automatic semantic region labeling algorithm. Extensive evaluations are conducted on the HDD and nuScenes datasets, and the learned representations lead to state-of-the-art performance for driver intention prediction and risk object identification.
Multi-modal fusion is a basic task in autonomous driving system perception, and it has attracted the interest of many scholars in recent years. Current multi-modal fusion methods mainly focus on camera data and LiDAR data, but pay little attention to the kinematic information provided by the vehicle's bottom-level sensors, such as acceleration, vehicle speed, and steering angle. This information is not affected by complex external scenes, so it is more robust and reliable. In this paper, we introduce the existing application fields of vehicle bottom-level information and the research progress of related methods, as well as multi-modal fusion methods based on this information. We also describe the relevant vehicle bottom-level information datasets in detail to facilitate further research. In addition, new ideas for future multi-modal fusion technology for autonomous driving tasks are proposed to promote the further utilization of vehicle bottom-level information.
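One common way to combine low-dimensional kinematic signals with camera features is simple late fusion by concatenation. The module below is an assumed illustration of that idea, not a specific method from the surveyed literature; all names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class CameraKinematicFusion(nn.Module):
    """Illustrative late fusion of camera features with vehicle kinematic signals."""

    def __init__(self, img_feat_dim=512, kin_dim=6, out_dim=128):
        super().__init__()
        # kin_dim covers e.g. acceleration, speed, steering angle, yaw rate, brake, throttle.
        self.kin_encoder = nn.Sequential(nn.Linear(kin_dim, 64), nn.ReLU())
        self.fusion = nn.Sequential(nn.Linear(img_feat_dim + 64, out_dim), nn.ReLU())

    def forward(self, img_feat, kinematics):
        # img_feat: (B, img_feat_dim) pooled camera features; kinematics: (B, kin_dim)
        fused = torch.cat([img_feat, self.kin_encoder(kinematics)], dim=-1)
        return self.fusion(fused)   # joint representation for a downstream perception head
```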
Reinforcement learning (RL) has been shown to reach super-human performance on a variety of tasks. However, unlike supervised machine learning, learning policies that generalize to a wide variety of situations remains one of the most challenging problems for real-world applications. Autonomous driving (AD) provides a multi-faceted experimental field, since correct behavior must be learned over a large distribution of road layouts and possible traffic situations, including individual driver personalities and hard-to-predict traffic events. In this paper, we propose a challenging benchmark for AD based on a configurable, flexible, and performant code base. Our benchmark uses a catalog of randomized scenario generators, including multiple mechanisms for road layout and traffic variation, different numerical and visual observation types, distinct action spaces, different vehicle models, and allows for use under static scenario definitions. Beyond purely algorithmic insights, our application-oriented benchmark also enables a better understanding of the impact of design decisions, such as action and observation spaces, on the generalizability of policies. Our benchmark aims to encourage researchers to propose solutions that can successfully generalize across scenarios, a task on which current RL methods fail. The benchmark code is available at https://github.com/seawee1/driver-dojo.
Traffic accident prediction in driving videos aims to provide an early warning of the accident occurrence, and supports the decision making of safe driving systems. Previous works usually concentrate on the spatial-temporal correlation of object-level context, while they do not fit the inherent long-tailed data distribution well and are vulnerable to severe environmental change. In this work, we propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition of text description on the visual observation and the driver attention to facilitate model training. In particular, the text description provides a dense semantic description guidance for the primary context of the traffic scene, while the driver attention provides a traction to focus on the critical region closely correlating with safe driving. CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and the driver attention guided accident prediction module. We leverage the attention mechanism in these modules to explore the core semantic cues for accident prediction. In order to train CAP, we extend an existing self-collected DADA-2000 dataset (with annotated driver attention for each frame) with further factual text descriptions for the visual observations before the accidents. Besides, we construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames (named as CAP-DATA) together with labeled fact-effect-reason-introspection description and temporal accident frame label. Based on extensive experiments, the superiority of CAP is validated compared with state-of-the-art approaches. The code, CAP-DATA, and all results will be released in \url{https://github.com/JWFanggit/LOTVS-CAP}.
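The attentive text-to-vision fusion described above can be pictured with standard cross-attention, where visual tokens attend to text-description tokens. The module below is only an assumed sketch of that general mechanism, not the CAP implementation; dimensions and names are hypothetical.

```python
import torch
import torch.nn as nn

class TextToVisionFusion(nn.Module):
    """Illustrative cross-attention: visual tokens attend to text-description tokens."""

    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens, text_tokens):
        # vis_tokens: (B, Nv, dim) frame-region features; text_tokens: (B, Nt, dim) description embeddings
        shifted, _ = self.attn(query=vis_tokens, key=text_tokens, value=text_tokens)
        return self.norm(vis_tokens + shifted)  # text-conditioned visual features
```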
Recently, road scene-graph representations used in conjunction with graph learning techniques have been shown to outperform state-of-the-art deep learning techniques on tasks including action classification, risk assessment, and collision prediction. To enable the exploration of applications of road scene-graph representations, we introduce roadscene2vec: an open-source tool for extracting and embedding road scene graphs. The goal of roadscene2vec is to enable research into the applications and capabilities of road scene graphs by providing tools for generating scene graphs, for generating spatio-temporal scene-graph embeddings, and for visualizing and analyzing scene-graph-based methodologies. The capabilities of roadscene2vec include (i) customized scene-graph generation from either video clips or data from the CARLA simulator, (ii) multiple configurable spatio-temporal graph embedding models and baseline CNN-based models, (iii) built-in functionality for using graph and sequence embeddings for risk assessment and collision prediction applications, (iv) tools for evaluating transfer learning, and (v) utilities for visualizing scene graphs and analyzing the explainability of graph learning models. We demonstrate the utility of roadscene2vec for these use cases with experimental results and qualitative evaluations of both graph learning models and CNN-based models. roadscene2vec is available at https://github.com/aicps/roadscene2vec.
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
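The kind of per-scenario record a motion forecasting model consumes (track histories with location, heading, velocity, and category, plus a set of "scored actors") can be pictured as below. The field names and types are illustrative assumptions, not the AV2 schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrackState:
    x: float          # map-aligned position (m)
    y: float
    heading: float    # radians
    vx: float         # velocity components (m/s)
    vy: float

@dataclass
class ActorTrack:
    actor_id: str
    category: str               # e.g. "vehicle", "pedestrian", "cyclist"
    is_scored: bool             # whether future motion must be predicted for this actor
    history: List[TrackState]   # observed states up to the prediction time

# A forecasting model receives the histories of all actors plus the local HD map
# (lane and crosswalk geometry) and outputs future trajectories for the scored actors.
```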
Alertness monitoring in the context of driving improves safety and saves lives. Computer-vision-based alertness monitoring is an active area of research. However, existing algorithms and datasets for alertness monitoring are primarily aimed at younger adults (18-50 years old). We present a system for in-vehicle alertness monitoring of older adults. Through a design study, we identified the variables and parameters appropriate for older adults traveling independently in Level 5 vehicles. We implemented a prototype traveler monitoring system and evaluated the alertness detection algorithm on ten older adults (aged 70 and older). We report the system design and implementation at a level of detail suitable for novices and practitioners. Our study suggests that dataset development is the foremost challenge in developing alertness monitoring systems targeted at older adults. This study is the first of its kind for a population that has so far been under-studied, and it has implications for future algorithm development and system design through participatory methods.
Computer vision applications in intelligent transportation systems (ITS) and autonomous driving (AD) have gravitated towards deep neural network architectures in recent years. While performance seems to be improving on benchmark datasets, many real-world challenges are yet to be adequately considered in research. This paper conducted an extensive literature review on the applications of computer vision in ITS and AD, and discusses challenges related to data, models, and complex urban environments. The data challenges are associated with the collection and labeling of training data and its relevance to real world conditions, bias inherent in datasets, the high volume of data needed to be processed, and privacy concerns. Deep learning (DL) models are commonly too complex for real-time processing on embedded hardware, lack explainability and generalizability, and are hard to test in real-world settings. Complex urban traffic environments have irregular lighting and occlusions, and surveillance cameras can be mounted at a variety of angles, gather dirt, shake in the wind, while the traffic conditions are highly heterogeneous, with violation of rules and complex interactions in crowded scenarios. Some representative applications that suffer from these problems are traffic flow estimation, congestion detection, autonomous driving perception, vehicle interaction, and edge computing for practical deployment. The possible ways of dealing with the challenges are also explored while prioritizing practical deployment.
Autonomous driving in multi-agent dynamic traffic scenarios is challenging: the behaviors of road users are uncertain and hard to model explicitly, and the ego vehicle should apply complicated negotiation skills with them, such as yielding, merging, and taking turns, to achieve safe and efficient driving in various settings. In these complex dynamic scenarios, traditional planning methods are mostly rule-based and often lead to reactive or even overly conservative behaviors; as a result, they require tedious human effort to maintain their feasibility. Recently, deep-learning-based methods have shown promising results with better generalization capability and less hand-engineering effort. However, they are either implemented with supervised imitation learning (IL), which suffers from dataset bias and distribution mismatch, or trained with deep reinforcement learning (DRL) but focused on one specific traffic scenario. In this work, we propose DQ-GAT to achieve scalable and proactive autonomous driving, where graph-attention-based networks are used to implicitly model interactions and deep Q-learning is employed to train the network end-to-end. Extensive experiments in a high-fidelity driving simulator show that our method achieves higher success rates than previous learning-based and traditional rule-based methods, and better trades off safety and efficiency in both seen and unseen scenarios. Moreover, qualitative results on a trajectory dataset indicate that the learned policy can be transferred to the real world at real-time speed. Demonstration videos are available at https://caipeide.github.io/dq-gat/.
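To make the interaction-modeling idea concrete, here is a minimal assumed sketch (not the authors' code) of an attention-based encoder over the ego vehicle and its neighbors, followed by a Q-value head over a discrete set of driving actions; all dimensions and names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionQNet(nn.Module):
    """Illustrative sketch: attention over neighbor-vehicle features, then Q-values."""

    def __init__(self, feat_dim=16, hidden=64, n_actions=5):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden)
        self.attn_score = nn.Linear(2 * hidden, 1)   # scores ego-neighbor pairs
        self.q_head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, n_actions))

    def forward(self, ego_feat, neighbor_feats):
        # ego_feat: (B, feat_dim); neighbor_feats: (B, N, feat_dim) surrounding vehicles
        ego = self.proj(ego_feat)                                # (B, H)
        nbrs = self.proj(neighbor_feats)                         # (B, N, H)
        pairs = torch.cat([ego.unsqueeze(1).expand_as(nbrs), nbrs], dim=-1)
        alpha = F.softmax(self.attn_score(pairs), dim=1)         # attention weights over neighbors
        context = (alpha * nbrs).sum(dim=1)                      # (B, H) aggregated interaction feature
        return self.q_head(torch.cat([ego, context], dim=-1))    # (B, n_actions) Q-values
```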
Safe path planning in autonomous driving is a complex task due to the interplay of static scene elements and uncertain surrounding agents. While all static scene elements are a source of information, the information available to the ego vehicle is of asymmetric importance. We present a dataset with a novel feature, sign salience, defined as an indication of whether a sign is visibly relevant to the ego vehicle's goal with respect to traffic rules. Using convolutional networks on cropped signs, in experiments augmented with road type, image coordinates, and the planned maneuver, we predict sign salience with 76% accuracy, finding the largest improvement when vehicle maneuver information is used together with the sign image.
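The described setup, a convolutional network on the cropped sign augmented with road type, image coordinates, and planned maneuver, could look roughly like the sketch below. The backbone, one-hot encodings, and sizes are assumptions, not the dataset authors' model.

```python
import torch
import torch.nn as nn
from torchvision import models

class SignSalienceClassifier(nn.Module):
    """Illustrative sketch: CNN on the cropped sign plus auxiliary context features."""

    def __init__(self, n_road_types=5, n_maneuvers=4):
        super().__init__()
        self.backbone = models.resnet18(weights=None)     # assumed backbone choice
        self.backbone.fc = nn.Identity()                   # expose the 512-d image embedding
        aux_dim = n_road_types + n_maneuvers + 2           # one-hot context + (x, y) sign coordinates
        self.head = nn.Sequential(nn.Linear(512 + aux_dim, 128), nn.ReLU(),
                                  nn.Linear(128, 1))

    def forward(self, sign_crop, road_type_onehot, maneuver_onehot, sign_xy):
        img = self.backbone(sign_crop)                     # (B, 512)
        aux = torch.cat([road_type_onehot, maneuver_onehot, sign_xy], dim=-1)
        return torch.sigmoid(self.head(torch.cat([img, aux], dim=-1)))  # salience probability
```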
Predicting pedestrian behavior is critical for fully autonomous vehicles to drive safely and efficiently on busy urban streets. Future autonomous vehicles will need to fit into mixed conditions with not only technical but also social capabilities. While more algorithms and datasets have been developed for predicting pedestrian behavior, these efforts lack benchmark labels and the capability to estimate pedestrians' temporally dynamic intent changes, to provide explanations of interaction scenes, and to support algorithms with social intelligence. This paper proposes and shares a representative dataset, called the IUPUI-CSRC Pedestrian Situated Intent (PSI) data, which, in addition to comprehensive computer vision labels, contains two innovative labels. The first novel label is the dynamic intent change of pedestrians crossing in front of the ego vehicle, obtained from 24 drivers with diverse backgrounds. The second is a text-based explanation of the driver's reasoning process when estimating pedestrian intent and predicting pedestrian behavior during the interaction. These innovative labels can enable several computer vision tasks, including pedestrian intent/behavior prediction, vehicle-pedestrian interaction segmentation, and video-to-language mapping for explainable algorithms. The released dataset can fundamentally improve the development of pedestrian behavior prediction models and support the development of socially intelligent autonomous vehicles that interact with pedestrians effectively. The dataset has been evaluated on different tasks and is released for public access.