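Object detection is a comprehensively studied problem in autonomous driving. However, it has been relatively less explored in the case of fisheye cameras. The strong radial distortion breaks the translation-invariance inductive bias of convolutional neural networks. Thus, we present the WoodScape fisheye object detection challenge for autonomous driving, held as part of the CVPR 2022 Workshop on Omnidirectional Computer Vision (OmniCV). This is one of the first competitions focused on fisheye camera object detection. We encouraged participants to design models that work natively on fisheye images without rectification. We used CodaLab to host the competition, based on the publicly available WoodScape fisheye dataset. In this paper, we provide a detailed analysis of the competition, which attracted the participation of 120 teams worldwide and 1492 submissions. We briefly discuss the details of the winning methods and analyze their qualitative and quantitative results.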
The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
The task of locating and classifying different types of vehicles has become a vital element in numerous applications of automation and intelligent systems, ranging from traffic surveillance to vehicle identification. In recent times, deep learning models have dominated the field of vehicle detection. Yet, Bangladeshi vehicle detection has remained a relatively unexplored area. One of the main goals of vehicle detection is real-time application, where 'You Only Look Once' (YOLO) models have proven to be the most effective architecture. In this work, aiming to find the best-suited YOLO architecture for fast and accurate vehicle detection from traffic images in Bangladesh, we conducted a performance analysis of different YOLO-based variants: YOLOv3, YOLOv5s, and YOLOv5x. The models were trained on a dataset of 7390 images covering 21 vehicle types, comprising samples from the DhakaAI dataset, the Poribohon-BD dataset, and our self-collected images. After thorough quantitative and qualitative analysis, we found YOLOv5x to be the best-suited model, outperforming YOLOv3 and YOLOv5s by 7 and 4 percent in mAP and by 12 and 8.5 percent in accuracy, respectively.
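Traffic light detection is crucial for autonomous vehicles to navigate safely in urban areas. Publicly available traffic light datasets are inadequate for developing algorithms that detect distant traffic lights, which provide important navigation information. We introduce a novel benchmark traffic light dataset captured using a pair of narrow-angle and wide-angle cameras covering urban and semi-urban roads. We provide 1032 images for training and 813 synchronized image pairs for testing. Additionally, we provide synchronized video pairs for qualitative analysis. The dataset includes images at a resolution of 1920 × 1080 covering 10 different classes. Furthermore, we propose a post-processing algorithm for combining the outputs of the two cameras. Results show that our technique strikes a balance between speed and accuracy compared to the conventional approach of using a single camera frame.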
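Real-time machine learning detection algorithms are common in autonomous vehicle technology and depend on quality datasets. These algorithms need to work correctly in everyday conditions as well as under strong sunlight. Reports indicate that glare is one of the two most prominent causes of crashes. However, existing datasets, such as LISA and the German Traffic Sign Recognition Benchmark, do not reflect the presence of sun glare at all. This paper presents the GLARE traffic sign dataset: a collection of images of U.S.-based traffic signs under heavy visual interference from sunlight. GLARE contains 2,157 images of traffic signs with sun glare, pulled from 33 videos of U.S. roads. It provides an essential enrichment to the widely used LISA traffic sign dataset. Our experimental study shows that although several state-of-the-art baseline methods perform well when trained and tested on traffic sign datasets without sun glare, they suffer greatly when tested on GLARE (e.g., average mAP ranging from 9% to 21%, significantly lower than their performance on the LISA dataset). We also observe that current architectures achieve better detection accuracy when trained on images of traffic signs in sun glare (e.g., an average mAP gain of 42% for mainstream algorithms).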
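Autonomous vehicles are conceived to provide safe and secure services by validating safety standards such as SOTIF-ISO/PAS-21448 (Safety of the Intended Functionality). In this context, perception of the environment plays an instrumental role in conjunction with the localization, planning, and control modules. As a pivotal algorithm in the perception stack, object detection provides extensive insight into the autonomous vehicle's surroundings. Among the different sensor modalities, cameras and LiDAR are widely used for object detection, but these exteroceptive sensors have limitations in resolution and in adverse weather conditions. In this work, radar-based object detection is explored as a counterpart sensor modality that can be deployed and used in adverse weather conditions. Radar provides complex data; for this purpose, a channel-boosting feature ensemble method with a transformer encoder-decoder network is proposed. The radar object detection task is formulated as a set prediction problem and evaluated on a publicly available dataset in both good and adverse weather conditions. The efficacy of the proposed method is extensively evaluated using the COCO evaluation metrics, and the best proposed model surpasses its state-of-the-art counterpart method by 12.55% and 12.48%.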
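Treating detection as set prediction (commonly done DETR-style in transformer detectors) means each ground-truth object is assigned to exactly one prediction through a minimum-cost one-to-one matching before the loss is computed. The minimal sketch below brute-forces that matching for a handful of 2D box centres. The L1 cost, the toy coordinates, and the brute-force search are illustrative assumptions, not the paper's implementation. Practical systems use the Hungarian algorithm instead.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def l1_cost(p: Point, g: Point) -> float:
    # Illustrative (assumed) matching cost: L1 distance between box centres.
    return abs(p[0] - g[0]) + abs(p[1] - g[1])

def match_set(preds: Sequence[Point], gts: Sequence[Point]) -> List[Tuple[int, int]]:
    """Return (pred_idx, gt_idx) pairs of a minimum-cost one-to-one matching.

    Requires len(preds) >= len(gts); unmatched predictions would be trained
    as 'no object'. Brute force over permutations, so only for tiny sets."""
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground-truth objects")
    best_cost, best_pairs = float("inf"), []
    # chosen[g] is the prediction index assigned to ground-truth object g.
    for chosen in permutations(range(len(preds)), len(gts)):
        cost = sum(l1_cost(preds[p], gts[g]) for g, p in enumerate(chosen))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(chosen)]
    return sorted(best_pairs)

# Hypothetical predicted and ground-truth box centres.
preds = [(0.9, 1.1), (5.0, 5.0), (3.1, 2.9)]
gts = [(3.0, 3.0), (1.0, 1.0)]
print(match_set(preds, gts))
```

Because the matching is one-to-one, a duplicate prediction near an already matched object cannot also be counted as correct. This is what lets set-prediction detectors avoid a separate non-maximum-suppression step.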
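The sixth edition of the AI City Challenge specifically focused on problems in two domains with tremendous unlocked potential at the intersection of computer vision and artificial intelligence: Intelligent Traffic Systems (ITS), and brick-and-mortar retail businesses. The four challenge tracks of the 2022 AI City Challenge received participation requests from 254 teams across 27 countries. Track 1 addressed city-scale multi-target multi-camera (MTMC) vehicle tracking. Track 2 addressed natural-language-based vehicle track retrieval. Track 3 was a brand-new track for naturalistic driving analysis, where the data were captured by several cameras mounted inside the vehicle with a focus on driver safety, and the task was to classify driver actions. Track 4 was another new track, aiming to achieve automated checkout in retail stores using only a single-view camera. We released two leaderboards for submissions based on different methods: a public leaderboard for the contest, where the use of external data is not allowed, and a general leaderboard for all submitted results. The top-performing participating teams established strong baselines and even outperformed the state of the art in the proposed challenge tracks.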
Continual Learning, also known as Lifelong or Incremental Learning, has recently gained renewed interest among the Artificial Intelligence research community. Recent research efforts have quickly led to the design of novel algorithms able to reduce the impact of the catastrophic forgetting phenomenon in deep neural networks. Due to this surge of interest in the field, many competitions have been held in recent years, as they are an excellent opportunity to stimulate research in promising directions. This paper summarizes the ideas, design choices, rules, and results of the challenge held at the 3rd Continual Learning in Computer Vision (CLVision) Workshop at CVPR 2022. The focus of this competition is the complex continual object detection task, which is still underexplored in the literature compared to classification tasks. The challenge is based on the challenge version of the novel EgoObjects dataset, a large-scale egocentric object dataset explicitly designed to benchmark continual learning algorithms for egocentric category-/instance-level object understanding, which covers more than 1k unique main objects and 250+ categories in around 100k video frames.
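Environmental perception with multi-modal fusion of radar and camera is crucial in autonomous driving for increasing accuracy, completeness, and robustness. This paper focuses on how to use millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection. A novel method is proposed that realizes feature-level fusion under the bird's-eye view (BEV) for a better feature representation. First, radar features are augmented through temporal accumulation and sent to a temporal-spatial encoder for radar feature extraction. Meanwhile, multi-scale 2D image features adapted to various spatial scales are obtained by image backbone and neck models. Then, the image features are transformed into BEV using a designed view transformer. In addition, this work fuses the multi-modal features with a two-stage fusion model consisting of point fusion and ROI fusion. Finally, a detection head regresses object categories and 3D locations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the most important detection metrics, mean average precision (mAP) and the nuScenes detection score (NDS).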
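Electric vehicles are becoming increasingly common, and inductive chargepads are considered a convenient and efficient means of charging them. However, drivers are typically poor at aligning the vehicle to the accuracy necessary for efficient inductive charging, which makes automated alignment of the two charging plates desirable. In parallel with the electrification of the vehicle fleet, automated parking systems that use surround-view camera systems are becoming increasingly popular. In this work, we propose a system based on the surround-view camera architecture to detect, localize, and automatically align the vehicle with the inductive chargepad. The visual design of chargepads is not standardized and not necessarily known beforehand. Therefore, a system that relies on offline training will fail in some situations. We thus propose an online learning method that leverages the driver's actions when manually aligning the vehicle with the chargepad, and combines them with weak supervision from semantic segmentation and depth to learn a classifier that auto-annotates the chargepad in the video for further training. In this way, when faced with a previously unseen chargepad, the driver only needs to align the vehicle manually. Because the chargepad lies flat on the ground, it is not easy to detect from a distance. Thus, we propose using a Visual SLAM pipeline to learn landmarks relative to the chargepad to enable alignment from a greater range. We demonstrate the working system on an automated vehicle, as shown in the video https://youtu.be/_CLCMKW4UYO. To encourage further research, we will share the chargepad dataset used in this work.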
Surround-view fisheye perception in valet parking scenes is fundamental and crucial for autonomous driving. Environmental conditions in parking lots differ from those in common public datasets, for example through imperfect lighting and opacity, which substantially impacts perception performance. Most existing networks trained on public datasets may generalize suboptimally to these valet parking scenes, which are further affected by fisheye distortion. In this article, we introduce a new large-scale fisheye dataset called the Fisheye Parking Dataset (FPD) to promote research on diverse real-world surround-view parking cases. Notably, our compiled FPD exhibits excellent characteristics for different surround-view perception tasks. In addition, we propose a real-time, distortion-insensitive multi-task framework, the Fisheye Perception Network (FPNet), which improves surround-view fisheye BEV perception through an enhanced fisheye distortion operation and lightweight multi-task designs. Extensive experiments validate the effectiveness of our approach and the dataset's exceptional generalizability.
Timely and effective feedback within surgical training plays a critical role in developing the skills required to perform safe and efficient surgery. Feedback from expert surgeons, while especially valuable in this regard, is challenging to acquire due to their typically busy schedules, and may be subject to biases. Formal assessment procedures like OSATS and GEARS attempt to provide objective measures of skill, but remain time-consuming. With advances in machine learning there is an opportunity for fast and objective automated feedback on technical skills. The SimSurgSkill 2021 challenge (hosted as a sub-challenge of EndoVis at MICCAI 2021) aimed to promote and foster work in this endeavor. Using virtual reality (VR) surgical tasks, competitors were tasked with localizing instruments and predicting surgical skill. Here we summarize the winning approaches and how they performed. Using this publicly available dataset and results as a springboard, future work may enable more efficient training of surgeons with advances in surgical data science. The dataset can be accessed from https://console.cloud.google.com/storage/browser/isi-simsurgskill-2021.
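Surround-view cameras are a primary sensor for automated driving, used for near-field perception. They are among the most commonly used sensors in commercial vehicles, primarily for parking visualization and automated parking. Four fisheye cameras with a 190° field of view cover 360° around the vehicle. Due to their high radial distortion, standard algorithms do not extend to them easily. Previously, we released the first public fisheye surround-view dataset, named WoodScape. In this work, we release a synthetic version of the surround-view dataset that addresses many of its weaknesses and extends it. First, it is not possible to obtain ground truth for pixel-wise optical flow and depth in real data. Second, in order to sample diverse frames, WoodScape did not have all four cameras annotated simultaneously. However, this means that multi-camera algorithms cannot be designed to obtain a unified output in bird's-eye space, which the new dataset enables. We implemented surround-view fisheye geometric projections in the CARLA simulator matching WoodScape's configuration and created SynWoodScape.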
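The short sketch below shows why a 190° field of view rules out standard pinhole-based algorithms. It compares the pinhole model r = f·tan θ with the equidistant fisheye model r = f·θ, where θ is the angle between an incoming ray and the optical axis and r is the radial image distance. The equidistant model is assumed here only for simplicity; this sketch does not claim it is the projection model WoodScape or SynWoodScape uses. The focal length is a hypothetical value.

```python
import math

def pinhole_radius(theta_rad: float, f: float) -> float:
    """Radial image distance under the pinhole model r = f * tan(theta).

    Only defined for rays in front of the camera (theta < 90 degrees)."""
    if theta_rad >= math.pi / 2:
        raise ValueError("pinhole model cannot image rays at or beyond 90 degrees")
    return f * math.tan(theta_rad)

def equidistant_radius(theta_rad: float, f: float) -> float:
    """Radial image distance under the (assumed) equidistant fisheye model r = f * theta."""
    return f * theta_rad

F = 300.0  # hypothetical focal length in pixels
for deg in (10, 45, 80, 95):  # 95 degrees is the edge of a 190-degree field of view
    theta = math.radians(deg)
    fish = equidistant_radius(theta, F)
    try:
        pin = f"{pinhole_radius(theta, F):8.1f}"
    except ValueError:
        pin = "     n/a"
    print(f"theta={deg:3d} deg  pinhole r={pin}  fisheye r={fish:6.1f}")
```

The pinhole radius grows without bound as θ approaches 90° and is undefined beyond it, while the fisheye radius stays finite. The resulting radially varying magnification is the distortion that standard CNN-based algorithms do not handle out of the box.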
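Employing vehicle-to-vehicle communication to enhance perception performance in self-driving technology has recently attracted considerable attention; however, the absence of a suitable open dataset for benchmarking algorithms has made it difficult to develop and assess cooperative perception technologies. To this end, we present the first large-scale open simulated dataset for vehicle-to-vehicle perception. It contains over 70 interesting scenes, 11,464 frames, and 232,913 annotated 3D vehicle bounding boxes, collected from 8 towns in CARLA and a digital town of Los Angeles. We then construct a comprehensive benchmark with a total of 16 implemented models to evaluate several information fusion strategies (i.e., early, late, and intermediate fusion) with state-of-the-art LiDAR detection algorithms. Moreover, we propose a new Attentive Intermediate Fusion pipeline to aggregate information from multiple connected vehicles. Our experiments show that the proposed pipeline can be easily integrated with existing 3D LiDAR detectors and achieves outstanding performance even at large compression rates. To encourage more researchers to investigate vehicle-to-vehicle perception, we will release the dataset, benchmark methods, and all related code at https://mobility-lab.seas.ucla.edu/opv2v/.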
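Late fusion is the easiest of the listed strategies to illustrate. Each connected vehicle runs its own detector, and only the resulting boxes are shared and transformed into the ego vehicle's frame before merging. The minimal sketch below shows that coordinate step for 2D box centres. It assumes each vehicle's planar pose (x, y, yaw) is known in a shared world frame. The poses and detections are hypothetical. A real pipeline would also de-duplicate overlapping boxes, for example with non-maximum suppression, which is omitted here.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # assumed: (x, y, yaw in radians) in a shared world frame
Point = Tuple[float, float]

def local_to_world(pt: Point, pose: Pose) -> Point:
    """Rotate by yaw, then translate by the vehicle position."""
    x, y, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * pt[0] - s * pt[1], y + s * pt[0] + c * pt[1])

def world_to_local(pt: Point, pose: Pose) -> Point:
    """Inverse of local_to_world: translate, then apply the transposed rotation."""
    x, y, yaw = pose
    dx, dy = pt[0] - x, pt[1] - y
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy)

def late_fuse(ego_pose: Pose, ego_dets: List[Point],
              others: List[Tuple[Pose, List[Point]]]) -> List[Point]:
    """Merge box centres from other vehicles into the ego frame (no de-duplication)."""
    fused = list(ego_dets)
    for pose, dets in others:
        fused.extend(world_to_local(local_to_world(d, pose), ego_pose) for d in dets)
    return fused

ego = (0.0, 0.0, 0.0)
other = (10.0, 0.0, math.pi)  # hypothetical vehicle 10 m ahead, facing the ego vehicle
print(late_fuse(ego, [(5.0, 1.0)], [(other, [(2.0, 0.0)])]))
```

Early and intermediate fusion share raw point clouds or learned features instead of boxes. They typically need more bandwidth, which is why the compression rate matters for the intermediate-fusion pipeline described above.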
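Benchmarks such as COCO play a crucial role in object detection. However, existing benchmarks are insufficient in scale variation, and their protocols are inadequate for fair comparison. In this paper, we introduce the Universal-Scale object detection Benchmark (USB). USB has variations in object scales and image domains, obtained by incorporating COCO with the recently proposed Waymo Open Dataset and Manga109-s dataset. To enable fair comparison and inclusive research, we propose training and evaluation protocols. They have multiple divisions for training epochs and evaluation image resolutions, like weight classes in sports, and compatibility across training protocols, like the backward compatibility of the Universal Serial Bus. Specifically, we request participants to report results not only with higher protocols (longer training) but also with lower protocols (shorter training). Using the proposed benchmark and protocols, we analyzed eight methods and found weaknesses of existing COCO-biased methods. The code is available at https://github.com/shinya7y/universenet.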
Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image-based benchmark datasets have driven development in computer vision tasks such as object detection, tracking and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine-learning-based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20 s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We define novel 3D detection and tracking metrics. We also provide careful dataset analysis as well as baselines for lidar- and image-based detection and tracking. Data, development kit and more information are available online.
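One of the detection metrics defined for nuScenes is the nuScenes detection score (NDS). NDS combines mAP with five true-positive error metrics: translation (mATE), scale (mASE), orientation (mAOE), velocity (mAVE), and attribute (mAAE). Each error is clipped at 1 and turned into a score: NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))]. The sketch below computes NDS from already-computed metric values. It does not reproduce the official devkit's matching or averaging, and the input numbers are hypothetical.

```python
from typing import Mapping

TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")

def nuscenes_detection_score(m_ap: float, tp_errors: Mapping[str, float]) -> float:
    """NDS = (1/10) * [5 * mAP + sum over TP metrics of (1 - min(1, mTP))]."""
    missing = set(TP_METRICS) - set(tp_errors)
    if missing:
        raise KeyError(f"missing TP metrics: {sorted(missing)}")
    # Errors above 1 are clipped, so each TP term contributes a score in [0, 1].
    tp_scores = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    return (5.0 * m_ap + tp_scores) / 10.0

# Hypothetical metric values, for illustration only.
errors = {"mATE": 0.5, "mASE": 0.3, "mAOE": 0.4, "mAVE": 1.2, "mAAE": 0.2}
print(round(nuscenes_detection_score(0.4, errors), 3))
```

Half of the score comes from mAP and half from the five TP terms. A detector therefore cannot reach a high NDS on localization quality alone, and a very large velocity error (here mAVE = 1.2) costs at most one TP term.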
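The accelerating development of autonomous driving technology has created a greater demand for large amounts of high-quality data. Labeled, representative real-world data is the fuel for training deep learning networks and is critical for improving self-driving perception algorithms. In this paper, we introduce PandaSet, the first dataset produced by a complete, high-precision autonomous vehicle sensor kit with a no-cost commercial license. The dataset was collected using one 360° mechanical spinning LiDAR, one forward-facing long-range LiDAR, and 6 cameras. It contains more than 100 scenes, each 8 seconds long, and provides 28 types of labels for object classification and 37 types of labels for semantic segmentation. We provide baselines for LiDAR-only 3D object detection, LiDAR-camera fusion 3D object detection, and LiDAR point cloud segmentation. For more details about PandaSet and the development kit, see https://scale.com/open-datasets/pandaset.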
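Smart city applications such as intelligent traffic routing or accident prevention rely on computer vision methods for exact vehicle localization and tracking. Due to the scarcity of accurately labeled data, detecting and tracking vehicles in 3D from multiple cameras has proven challenging to explore. We present a massive synthetic dataset for multiple vehicle tracking and segmentation in multiple overlapping and non-overlapping camera views. Unlike existing datasets, which only provide tracking ground truth for 2D bounding boxes, our dataset additionally contains perfect labels for 3D bounding boxes in camera and world coordinates, depth estimation, and instance, semantic, and panoptic segmentation. The dataset consists of 17 hours of labeled video material, recorded from 340 cameras in 64 diverse day, rain, dawn, and night scenes, making it the most extensive dataset for multi-target multi-camera tracking to date. We provide baselines for detection, vehicle re-identification, and single-camera tracking. Code and data are publicly available.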
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real-world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large-scale, high-quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well-synchronized and calibrated high-quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available, based on our proposed geographical coverage metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.
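Computer vision has played a major role in Intelligent Transportation Systems (ITS) and traffic surveillance. With the rapid growth of automated vehicles and increasingly crowded cities, automated and advanced traffic management systems (ATM) can be realized on video surveillance infrastructure through the deployment of deep neural networks. In this study, we provide a practical platform for real-time traffic monitoring that includes 3D vehicle/pedestrian detection, speed detection, trajectory estimation, congestion detection, and monitoring of vehicle-pedestrian interactions, all using a single CCTV traffic camera. We adapt a customized YOLOv5 deep neural network model for vehicle/pedestrian detection and an enhanced SORT tracking algorithm. A hybrid satellite-ground based inverse perspective mapping (SG-IPM) method for camera auto-calibration is also developed, which leads to accurate 3D object detection and visualization. We also develop a hierarchical traffic modelling solution based on short- and long-term temporal video data streams to understand traffic flow, bottlenecks, and risky spots for vulnerable road users. Several experiments on real-world scenarios, including comparisons with the state of the art, are conducted using various traffic monitoring datasets, including MIO-TCD, UA-DETRAC and GRAM-RTM, collected from highways, intersections, and urban areas under different lighting and weather conditions.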
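Inverse perspective mapping is built on a planar homography H that maps image pixels to ground-plane coordinates. Once a tracked vehicle's ground-contact point has been mapped into metres in two consecutive frames, speed follows from the displacement divided by the frame interval. The minimal sketch below shows only that generic step. The homography values, frame rate, and pixel positions are hypothetical, and the sketch does not reproduce the paper's hybrid satellite-ground (SG-IPM) calibration.

```python
import math
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]

def apply_homography(H: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Map pixel (u, v) to ground-plane (X, Y) using homogeneous coordinates."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to the horizon (w = 0)")
    return x / w, y / w

def speed_kmh(H: Matrix3, px_prev: Tuple[float, float], px_curr: Tuple[float, float],
              dt_s: float) -> float:
    """Ground-plane speed of a tracked point between two frames, in km/h."""
    x0, y0 = apply_homography(H, *px_prev)
    x1, y1 = apply_homography(H, *px_curr)
    return math.hypot(x1 - x0, y1 - y0) / dt_s * 3.6  # m/s -> km/h

# Hypothetical homography: uniform 0.05 m per pixel, no perspective term.
H_demo = [[0.05, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical track: ground-contact point moves 20 px between frames 1/25 s apart.
print(round(speed_kmh(H_demo, (400.0, 600.0), (400.0, 580.0), 1 / 25), 1))
```

In this toy case the point moves 1 m in 0.04 s, which is 25 m/s or 90 km/h. With a real calibrated H, the third row carries the perspective term, so the same pixel displacement corresponds to larger ground distances farther from the camera.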