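This paper explores methods for incorporating the uncertainty of data-driven object detectors into object tracking algorithms. Object tracking methods rely on measurement error models, typically in the form of measurement noise, false-positive rates, and missed-detection rates. In general, each of these quantities can depend on the object or the measurement location. However, for detections produced from camera inputs processed by a neural network, these measurement error statistics are insufficient to represent the primary source of error, namely the dissimilarity between the run-time sensor input and the data on which the detector was trained. To this end, we investigate incorporating data uncertainty into object tracking methods so as to improve the ability to track objects, particularly those that lie outside the training data. The proposed methods are validated on an object tracking benchmark and in experiments with a real autonomous aircraft.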
translated by 谷歌翻译
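Multi-object tracking (MOT) is one of the key applications of modern advanced driver assistance systems (ADAS) and autonomous driving (AD) systems. Most MOT solutions are based on random-vector Bayesian filters, such as the global nearest neighbor (GNN) filter, combined with rule-based heuristic track maintenance. With the development of random finite set (RFS) theory, RFS Bayesian filters have recently been applied to MOT tasks in ADAS and AD systems. However, their usefulness in real traffic is open to doubt because of computational cost and implementation complexity. This paper shows that GNN with rule-based heuristic track maintenance is insufficient for LiDAR-based MOT tasks in ADAS and AD systems. This judgment is supported by a systematic comparison of several object-filter-based tracking frameworks, including traditional random-vector Bayesian filters with rule-based heuristic track maintenance and RFS Bayesian filters. In addition, a simple and effective tracker, the Poisson multi-Bernoulli filter using global nearest neighbor (GNN-PMB), is proposed for LiDAR-based MOT tasks. The proposed GNN-PMB tracker achieves competitive results on the nuScenes test dataset and shows better tracking performance than other state-of-the-art LiDAR-only trackers and LiDAR-and-camera-based trackers.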
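In collaborative human-robot semantic sensing problems, such as scientific exploration, a robot may place too much trust in the information provided by its human partner, leading to suboptimal state estimation and poor team performance. When the human cannot be treated as an oracle, the robot needs to update its state beliefs so as to correctly account for possible discrepancies between human semantic observations and the real-world states that gave rise to them. This work develops a strategy for rigorous online computation of probabilistic semantic data association (PSDA) probabilities for semantic likelihoods in general settings, unlike previous work, which developed naive or heuristic approximations for specific settings. The new PSDA method is incorporated into a hybrid Bayesian data fusion scheme that uses Gaussian mixture priors for object states and softmax functions for semantic human sensor observation likelihoods, and it is demonstrated in Monte Carlo simulations of a cooperative multi-object search task over a range of human sensing characteristics (e.g. false detection rates). The results show that PSDA leads to robust estimates of observation association probabilities under a wide range of conditions whenever semantic human sensor data contain significant target-reference ambiguity for autonomous object search and localization.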
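Object detection in autonomous driving applications implies the detection and tracking of semantic objects that are commonly native to urban driving environments, such as pedestrians and vehicles. One of the major challenges in state-of-the-art deep-learning-based object detection is false positives that occur with overconfident scores. Because of safety concerns, this is highly undesirable in autonomous driving and other critical robotic perception domains. This paper proposes a method to mitigate the problem of overconfident predictions by introducing a novel probabilistic layer into deep object detection networks at test time. The proposed approach avoids the traditional sigmoid or softmax prediction layer, which often produces overconfident predictions. The proposed technique is shown to reduce overconfidence in false positives without degrading performance on true positives. The approach is validated on 2D-KITTI object detection with YOLOv4 and SECOND (a LiDAR-based detector). The method enables interpretable probabilistic predictions without retraining the network and is therefore very practical.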
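Recent multi-object tracking (MOT) systems leverage highly accurate object detectors; however, training such detectors requires large amounts of labeled data. Although such data is widely available for humans and vehicles, it is significantly scarcer for other animal species. We present Robust Confidence Tracking (RCT), an algorithm designed to maintain robust performance even when detection quality is poor. In contrast to prior methods that discard detection confidence information, RCT takes a fundamentally different approach, relying on the exact detection confidence values to initialize tracks, extend tracks, and filter tracks. In particular, RCT minimizes identity switches by making efficient use of low-confidence detections (along with a single-object tracker) to keep continuous track of objects. To evaluate trackers in the presence of unreliable detections, we present a challenging real-world underwater fish tracking dataset, FISHTRAC. In an evaluation on FISHTRAC as well as the UA-DETRAC dataset, we find that RCT outperforms other algorithms when provided with imperfect detections, including state-of-the-art deep single- and multi-object trackers as well as more classical approaches. Specifically, RCT has the best average HOTA among methods that successfully return results for all sequences, and it has fewer identity switches than other methods.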
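When deployed in the open world, object detectors are prone to open-set errors: false-positive detections of object classes not present in the training dataset. We propose GMM-Det, a real-time method for extracting epistemic uncertainty from object detectors in order to identify and reject open-set errors. GMM-Det trains the detector to produce a structured logit space that is modeled with class-specific Gaussian mixture models. At test time, open-set errors are identified by their low log-probability under all Gaussian mixture models. We test two common detector architectures, Faster R-CNN and RetinaNet, across three varied datasets spanning robotics and computer vision. Our results show that GMM-Det consistently outperforms existing uncertainty techniques for identifying and rejecting open-set detections, especially at the low-error-rate operating points required for safety-critical applications. GMM-Det maintains object detection performance and introduces only minimal computational overhead. We also introduce a methodology for converting existing object detection datasets into specific open-set datasets in order to evaluate open-set performance in object detection.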
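The test-time rule described in this abstract (flag a detection when its log-probability is low under every class-specific Gaussian mixture) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the diagonal covariances, the two-dimensional feature vectors, the `CLASS_GMMS` values, and the threshold are all hypothetical choices made for illustration, and fitting the mixtures is not shown.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_prob(x, components):
    """Log density of a mixture given as a list of (weight, mean, var).

    Uses the log-sum-exp trick so that very small densities do not underflow.
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def is_open_set(x, class_gmms, threshold):
    """True if x has log-probability below threshold under every class GMM."""
    best = max(gmm_log_prob(x, gmm) for gmm in class_gmms.values())
    return best < threshold

# Hypothetical per-class mixtures over a 2-D feature (logit) space.
CLASS_GMMS = {
    "car": [(1.0, [5.0, 0.0], [1.0, 1.0])],
    "person": [(0.5, [0.0, 5.0], [1.0, 1.0]), (0.5, [0.0, 6.0], [1.0, 1.0])],
}

print(is_open_set([5.0, 0.0], CLASS_GMMS, threshold=-10.0))     # near the "car" mode
print(is_open_set([-20.0, -20.0], CLASS_GMMS, threshold=-10.0))  # far from every mode
```

Taking the maximum over classes before thresholding is equivalent to requiring a low score under all mixtures, which is the condition the abstract states.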
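There has long been interest in capturing the erroneous behavior of object detectors by finding images on which their performance is likely to be unsatisfactory. In real-world applications such as autonomous driving, it is also crucial to characterize potential failures beyond simple requirements on detection performance. For example, a missed detection of a pedestrian close to the ego vehicle generally requires closer inspection than a missed detection of a car in the distance. The problem of predicting such potential failures at test time has largely been overlooked in the literature, and conventional approaches based on detection uncertainty fall short because they are agnostic to such fine-grained characterization of errors. In this work, we propose to reformulate the problem of finding "hard" images as a query-based hard-image retrieval task, where queries are specific definitions of "hardness", and we offer a simple and intuitive method that can solve this task for a large family of queries. Our method is entirely post hoc, does not require ground-truth annotations, is independent of the choice of detector, and relies on an efficient Monte Carlo estimation that uses a simple stochastic model in place of the ground truth. We show experimentally that it can be applied successfully to a wide variety of queries, for which it can reliably identify hard images for a given detector without any labeled data. We provide results on ranking and classification tasks using the widely used RetinaNet, Faster-RCNN, Mask-RCNN, and Cascade Mask-RCNN object detectors.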
Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms. In this paper, we integrate appearance information to improve the performance of SORT. Due to this extension we are able to track objects through longer periods of occlusion, effectively reducing the number of identity switches. In the spirit of the original framework, we place much of the computational complexity into an offline pre-training stage where we learn a deep association metric on a large-scale person re-identification dataset. During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.
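As a rough illustration of a nearest neighbor query in appearance space, the Python sketch below matches one detection embedding against per-track galleries of stored embeddings. The cosine distance, the gallery layout, the `max_distance` gate, and the toy two-dimensional features are assumptions made for this example; the abstract does not specify them, and a real system would use embeddings produced by the learned association metric.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def nearest_track(detection_feature, galleries, max_distance=0.2):
    """Find the track whose stored appearance is closest to the detection.

    galleries maps track_id -> non-empty list of stored feature vectors.
    Returns (track_id, distance), or (None, distance) when even the closest
    track is farther away than max_distance (a hypothetical gate).
    """
    best_id, best_dist = None, float("inf")
    for track_id, features in galleries.items():
        # Distance to a track is the smallest distance to any stored sample.
        dist = min(cosine_distance(detection_feature, f) for f in features)
        if dist < best_dist:
            best_id, best_dist = track_id, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_id, best_dist

# Toy galleries with 2-D features for two tracks.
galleries = {1: [[1.0, 0.0], [0.9, 0.1]], 2: [[0.0, 1.0]]}
print(nearest_track([1.0, 0.05], galleries))   # closest to track 1
print(nearest_track([-1.0, 0.0], galleries))   # nothing within the gate
```

Keeping several samples per track lets the query tolerate appearance changes over time, at the cost of a linear scan over the gallery.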
This paper explores a pragmatic approach to multiple object tracking where the main focus is to associate objects efficiently for online and realtime applications. To this end, detection quality is identified as a key factor influencing tracking performance, where changing the detector can improve tracking by up to 18.9%. Despite only using a rudimentary combination of familiar techniques such as the Kalman Filter and Hungarian algorithm for the tracking components, this approach achieves an accuracy comparable to state-of-the-art online trackers. Furthermore, due to the simplicity of our tracking method, the tracker updates at a rate of 260 Hz which is over 20x faster than other state-of-the-art trackers.
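The association step of such a tracker can be sketched in a few lines of Python. The example below assumes intersection-over-union (IoU) between predicted track boxes and detections as the matching score and gates matches with a hypothetical `min_iou`. To stay dependency-free it finds the optimal assignment by brute force over permutations, which stands in for the Hungarian algorithm and is only practical for a handful of boxes. The Kalman filter prediction is not shown.

```python
from itertools import permutations

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Return (track_index, detection_index) pairs maximising total IoU.

    Pairs whose IoU falls below min_iou are dropped after the assignment,
    leaving those tracks and detections unmatched.
    """
    n_t, n_d = len(tracks), len(detections)
    if n_t == 0 or n_d == 0:
        return []
    scores = [[iou(t, d) for d in detections] for t in tracks]
    if n_t <= n_d:
        # Every track is assigned a distinct detection.
        candidates = (list(enumerate(p)) for p in permutations(range(n_d), n_t))
    else:
        # Every detection is assigned a distinct track.
        candidates = ([(t, d) for d, t in enumerate(p)]
                      for p in permutations(range(n_t), n_d))
    best_pairs, best_total = [], -1.0
    for pairs in candidates:
        total = sum(scores[t][d] for t, d in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total
    return [(t, d) for t, d in best_pairs if scores[t][d] >= min_iou]

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```

In the example, the third detection overlaps no track and stays unmatched; a tracker would typically treat it as a candidate for a new track.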
Accurate representation and localization of relevant objects is important for robots to perform tasks. Building a generic representation that can be used across different environments and tasks is not easy, as the relevant objects vary depending on the environment and the task. Furthermore, another challenge arises in agro-food environments due to their complexity and high levels of clutter and occlusion. In this paper, we present a method to build generic representations in highly occluded agro-food environments using multi-view perception and 3D multi-object tracking. Our representation is built upon a detection algorithm that generates a partial point cloud for each detected object. The detected objects are then passed to a 3D multi-object tracking algorithm that creates and updates the representation over time. The whole process is performed at a rate of 10 Hz. We evaluated the accuracy of the representation in a real-world agro-food environment, where it was able to successfully represent and locate tomatoes in tomato plants despite a high level of occlusion. We were able to estimate the total count of tomatoes with a maximum error of 5.08% and to track tomatoes with a tracking accuracy of up to 71.47%. Additionally, we showed that an evaluation using tracking metrics gives more insight into the errors in localizing and representing the fruits.
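3D multi-object tracking aims to uniquely and consistently identify all mobile entities. Despite the rich spatiotemporal information available in this setting, current 3D tracking methods rely primarily on abstracted information and limited history, e.g. single-frame object bounding boxes. In this work, we develop a holistic representation of traffic scenes that leverages both the spatial and the temporal information of the actors in the scene. Specifically, we reformulate tracking as a spatiotemporal problem by representing tracked objects as sequences of spatiotemporal points and bounding boxes over a long temporal history. At each timestamp, we improve the location and motion estimates of tracked objects through refinement over the full sequence of object history. By considering time and space jointly, our representation naturally encodes fundamental physical priors such as object permanence and consistency across time. Our spatiotemporal tracking framework achieves state-of-the-art performance on the Waymo and nuScenes benchmarks.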
We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios where targets correspond to passengers and their baggage items. We propose a Self-Supervised Learning (SSL) technique to provide the model information about instance segmentation uncertainty from overhead images. Our SSL approach improves object detection by employing a test-time data augmentation and a regression-based, rotation-invariant pseudo-label refinement technique. Our pseudo-label generation method provides multiple geometrically-transformed images as inputs to a Convolutional Neural Network (CNN), regresses the augmented detections generated by the network to reduce localization errors, and then clusters them using the mean-shift algorithm. The self-supervised detector model is used in a single-camera tracking algorithm to generate temporal identifiers for the targets. Our method also incorporates a multi-view trajectory association mechanism to maintain consistent temporal identifiers as passengers travel across camera views. An evaluation of detection, tracking, and association performances on videos obtained from multiple overhead cameras in a realistic airport checkpoint environment demonstrates the effectiveness of the proposed approach. Our results show that self-supervision improves object detection accuracy by up to 42% without increasing the inference time of the model. Our multi-camera association method achieves up to 89% multi-object tracking accuracy with an average computation time of less than 15 ms.
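The clustering step mentioned in this abstract can be illustrated with a small, generic mean-shift sketch in Python. It is not the authors' pipeline: it assumes a flat kernel, treats each augmented detection as a 2D center point, and uses a hypothetical `bandwidth` and a simple merge rule (modes closer than half the bandwidth are treated as one cluster).

```python
import math

def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift on 2D points; returns the list of cluster modes."""
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            # Average all points within `bandwidth` of the current estimate.
            near = [q for q in points if math.dist((x, y), q) <= bandwidth]
            nx = sum(q[0] for q in near) / len(near)
            ny = sum(q[1] for q in near) / len(near)
            shift = math.dist((x, y), (nx, ny))
            x, y = nx, ny
            if shift < tol:
                break
        # Keep the converged position only if no existing mode is nearby.
        if not any(math.dist((x, y), m) <= bandwidth / 2 for m in modes):
            modes.append((x, y))
    return modes

# Detection centers from several augmented views of two objects (toy data).
centers = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (10.0, 10.0), (10.2, 10.0)]
print(mean_shift(centers, bandwidth=1.0))
```

Each starting point climbs to the local mean of its neighborhood, so detections of the same object from different augmented views collapse onto one mode without the number of clusters being fixed in advance.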
Event-based vision has grown rapidly in recent years, driven by its unique characteristics such as high temporal resolution (~1 µs), high dynamic range (>120 dB), and output latency of only a few microseconds. This work further explores a hybrid, multi-modal approach for object detection and tracking that leverages state-of-the-art frame-based detectors complemented by hand-crafted event-based methods to improve the overall tracking performance with minimal computational overhead. The methods presented include event-based bounding box (BB) refinement, which improves the precision of the resulting BBs, and a continuous event-based object detection method, which recovers missed detections and generates inter-frame detections that enable a high-temporal-resolution tracking output. The advantages of these methods are quantitatively verified by an ablation study using the higher order tracking accuracy (HOTA) metric. Results show significant performance gains, reflected in an improvement in HOTA from 56.6%, using only frames, to 64.1% and 64.9% for the event and edge-based mask configurations combined with the two proposed methods, at the baseline frame rate of 24 Hz. Likewise, incorporating these methods with the same configurations improved HOTA from 52.5% to 63.1%, and from 51.3% to 60.2%, at the high-temporal-resolution tracking rate of 384 Hz. Finally, a validation experiment is conducted to analyze the real-world single-object tracking performance using high-speed LiDAR. Empirical evidence shows that our approaches provide significant advantages compared to using frame-based object detectors at the baseline frame rate of 24 Hz and at higher tracking rates of up to 500 Hz.
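Wide-area motion imagery (WAMI) yields high-resolution images containing a large number of extremely small objects. Target objects exhibit large spatial displacements across consecutive frames. This nature of WAMI images makes object tracking and detection challenging. In this paper, we present our deep-neural-network-based combined object detection and tracking model, the Heat Map Network (HM-Net). HM-Net is significantly faster than state-of-the-art frame-differencing and background-subtraction-based methods, without compromising detection and tracking performance. HM-Net follows an object-based joint detection and tracking paradigm. Simple heat-map-based predictions support an unlimited number of simultaneous detections. The proposed method takes two consecutive frames and the object detection heat map from the previous frame as input, which helps HM-Net monitor spatiotemporal changes between frames and keep track of previously predicted objects. Although reusing the prior object detection heat map serves as a vital feedback-based memory element, it can lead to an unintended surge of false-positive detections. To increase the robustness of the method against false positives and to eliminate low-confidence detections, HM-Net employs novel feedback filters and advanced data augmentations. HM-Net outperforms state-of-the-art WAMI moving-object detection and tracking methods on the WPAFB dataset, with 96.2% F1 and 94.4% mAP detection scores, while achieving a 61.8% mAP tracking score on the same dataset. This performance corresponds to an improvement over the state of the art of 2.1% in F1 and 6.1% in mAP for detection, and of 9.5% in mAP for tracking.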
Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That's it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.8% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.
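When performing visual servoing or object tracking tasks, effective sensor planning is essential to keep targets in sight or to relocate them when they are missing. In particular, when dealing with a known target that is missing from the sensor's field of view, we propose using prior knowledge related to contextual information to estimate its possible location. To this end, this study proposes a dynamic Bayesian network that uses contextual information to search for the target effectively. A Monte Carlo particle filter is used to approximate the posterior probability of the target state, from which the uncertainty is defined. We define the robot's utility function through an information-theoretic formalism as seeking the optimal action that reduces the uncertainty of the task, prompting the robot agent to investigate the locations where the target is most likely to be. Using a context state model, we design the agent's high-level decision framework as a partially observable Markov decision process. Based on the belief state of the underlying context, estimated through sequential observations, the robot's navigation actions are determined so as to carry out exploratory and detection tasks. By using this multi-modal context model, our agent can effectively handle basic dynamic events, such as obstruction of the target or its absence from the field of view. We implement and demonstrate these capabilities on a mobile robot in real time.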
Object permanence is the concept that objects do not suddenly disappear in the physical world. Humans understand this concept at a young age and know that another person is still there even when temporarily occluded. Neural networks currently often struggle with this challenge. Thus, we introduce explicit object permanence into two-stage detection approaches, drawing inspiration from particle filters. At the core, our detector uses the predictions of previous frames as additional proposals for the current one at inference time. Experiments confirm that the feedback loop improves detection performance by up to 10.3 mAP with little computational overhead. Our approach is suited to extending two-stage detectors for stabilized and reliable detections even under heavy occlusion. Additionally, the ability to apply our method without retraining an existing model promises wide application in real-world tasks.
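In this paper, we address the problem of vision-based detection and tracking of multiple aerial vehicles using a single camera and an inertial measurement unit (IMU), as well as the corresponding perception consensus problem (i.e. uniqueness and identical IDs across all observers). We design several vision-based decentralized Bayesian multi-tracking filtering strategies to resolve the association between the incoming unsorted measurements obtained by a visual detector algorithm and the tracked agents. We compare their accuracy under different operating conditions as well as their scalability with respect to the number of agents in the team. This analysis provides useful insights into the most appropriate design choice for a given task. We further show that the proposed perception and inference pipeline, which includes a deep neural network (DNN) as the visual target detector, is lightweight and can run concurrently with control and planning on board robots with size, weight, and power (SWaP) constraints. Experimental results show effective tracking of multiple drones in various challenging scenarios, such as heavy occlusion.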
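This paper presents a new method, called PolyTrack, for fast multi-object tracking and segmentation using bounding polygons. PolyTrack detects objects by producing heatmaps of their center keypoints. For each of them, a rough segmentation is obtained by computing a bounding polygon over each instance instead of the traditional bounding box. Tracking is done by taking two consecutive frames as input and computing a center offset for each object detected in the first frame to predict its location in the second frame. A Kalman filter is also applied to reduce the number of ID switches. Since our target application is automated driving systems, we apply our method to urban environment videos. We train and evaluate PolyTrack on the MOTS and KITTIMOTS datasets. Results show that tracking polygons can be a good alternative to bounding box and mask tracking. The PolyTrack code is available at https://github.com/gafaua/polytrack.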
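Accurate uncertainty estimates are essential for deploying deep object detectors in safety-critical systems. The development and evaluation of probabilistic object detectors have been hindered by shortcomings of existing performance measures, which tend to involve arbitrary thresholds or to limit the detector's choice of distributions. In this work, we propose to view object detection as a set prediction task in which detectors predict the distribution over the set of objects. Using the negative log-likelihood for random finite sets, we present a proper scoring rule for evaluating and training probabilistic object detectors. The proposed method can be applied to existing probabilistic detectors, is free from thresholds, and enables fair comparison between architectures. Three different types of detectors are evaluated on the COCO dataset. Our results indicate that the training of existing detectors is optimized toward non-probabilistic metrics. We hope to encourage the development of new object detectors that can accurately estimate their own uncertainty. Code is available at https://github.com/georghess/pmb-nll.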