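Previous online 3D multi-object tracking (3DMOT) methods terminate a tracklet when it has not been associated with new detections for a few frames. But if an object merely goes dark, for example because it is temporarily occluded by other objects or simply leaves the field of view (FOV), terminating the tracklet prematurely results in an identity switch. We reveal that premature tracklet termination is the main cause of identity switches in modern 3DMOT systems. To address this, we propose Immortal Tracker, a simple tracking system that uses trajectory prediction to maintain tracklets for objects that have gone dark. We employ a simple Kalman filter for trajectory prediction and preserve the tracklet by prediction when the target is not visible. With this method, we can avoid 96% of the vehicle identity switches that result from premature tracklet termination. Without any learned parameters, our method achieves a mismatch ratio at the 0.0001 level and competitive MOTA for the vehicle class on the Waymo Open Dataset test set. Our mismatch ratio is tens of times lower than that of any previously published method. Similar results are reported on nuScenes. We believe the proposed Immortal Tracker offers a simple yet powerful solution for pushing the limits of 3DMOT. Our code is available at https://github.com/immortaltracker/immortaltracker.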
translated by 谷歌翻译
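The idea in the Immortal Tracker abstract, keeping a tracklet alive on Kalman prediction alone while its object is unobserved, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration (a 1D constant-velocity state, the noise values `q` and `r`, and the names `ConstantVelocityKalman1D` and `Track`); it is not the authors' implementation, which operates on 3D boxes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConstantVelocityKalman1D:
    """1D Kalman filter over the state [position, velocity] (illustrative)."""
    p: float            # position estimate
    v: float = 0.0      # velocity estimate
    # Covariance entries [[a, b], [c, d]]; velocity starts highly uncertain.
    a: float = 1.0
    b: float = 0.0
    c: float = 0.0
    d: float = 100.0
    q: float = 0.01     # process noise added to both diagonal entries (assumed)
    r: float = 1.0      # measurement noise variance (assumed)

    def predict(self, dt: float = 1.0) -> float:
        # x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
        self.p += self.v * dt
        a, b, c, d = self.a, self.b, self.c, self.d
        self.a = a + dt * (b + c) + dt * dt * d + self.q
        self.b = b + dt * d
        self.c = c + dt * d
        self.d = d + self.q
        return self.p

    def update(self, z: float) -> None:
        # Position-only measurement, H = [1, 0].
        s = self.a + self.r
        k0, k1 = self.a / s, self.c / s
        y = z - self.p
        self.p += k0 * y
        self.v += k1 * y
        a, b, c, d = self.a, self.b, self.c, self.d
        self.a, self.b = (1 - k0) * a, (1 - k0) * b
        self.c, self.d = c - k1 * a, d - k1 * b


@dataclass
class Track:
    track_id: int
    kf: ConstantVelocityKalman1D
    frames_unseen: int = 0

    def step(self, detection: Optional[float]) -> float:
        """Advance one frame; the track is never terminated on a miss."""
        self.kf.predict()
        if detection is None:
            self.frames_unseen += 1   # coast on the prediction
        else:
            self.kf.update(detection)
            self.frames_unseen = 0
        return self.kf.p


# An object moving at about 1 unit/frame is occluded for three frames.
track = Track(0, ConstantVelocityKalman1D(p=0.0))
observations = [1.0, 2.0, 3.0, 4.0, 5.0, None, None, None, 9.0]
positions = [track.step(z) for z in observations]
```

Because the track keeps its identity and a predicted position through the gap, the detection that reappears near position 9 can be associated with track 0 instead of starting a new identity.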
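3D multi-object tracking (MOT) has witnessed numerous novel benchmarks and approaches in recent years, especially those under the "tracking-by-detection" paradigm. Despite their progress and usefulness, an in-depth analysis of their strengths and weaknesses is not yet available. In this paper, we summarize current 3D MOT methods by decomposing them into four constituent parts: pre-processing of detections, association, motion model, and life cycle management. We then attribute the failure cases of existing algorithms to each component and investigate them in detail. Based on these analyses, we propose corresponding improvements that lead to a strong yet simple baseline: SimpleTrack. Comprehensive experimental results on the Waymo Open Dataset and nuScenes demonstrate that our final method achieves new state-of-the-art results with minor modifications. Furthermore, we take additional steps and rethink whether current benchmarks authentically reflect the ability of algorithms to handle real-world challenges. We delve into the details of existing benchmarks and find some intriguing facts. Finally, we analyze the distribution and causes of the remaining failures in SimpleTrack and propose future directions for 3D MOT. Our code is available at https://github.com/tusimple/simpletrack.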
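3D multi-object tracking (MOT) ensures consistency during continuous dynamic detection, which benefits subsequent motion planning and navigation tasks in autonomous driving. However, camera-based methods suffer under occlusion, and LiDAR-based methods can struggle to track the irregular motion of objects accurately. Some fusion methods work well but do not consider that appearance features become untrustworthy under occlusion. At the same time, false detections also significantly affect tracking. We therefore propose a novel camera-LiDAR fusion 3D MOT framework based on Combined Appearance-Motion Optimization (CAMO-MOT), which uses both camera and LiDAR data and significantly reduces tracking failures caused by occlusion and false detection. For the occlusion problem, we are the first to propose an occlusion head that effectively selects the best object appearance features, reducing the influence of occlusion. To reduce the impact of false detections on tracking, we design a motion cost matrix based on confidence scores, which improves positioning and object prediction accuracy in 3D space. Because existing multi-object tracking methods consider only a single category, we also propose a multi-category loss to implement multi-object tracking in multi-category scenes. A series of validation experiments are conducted on the KITTI and nuScenes tracking benchmarks. Our proposed method achieves state-of-the-art performance and the lowest identity switch (IDS) values (23 for Car and 137 for Pedestrian) among all multi-modal MOT methods on the KITTI test dataset. It also achieves state-of-the-art performance among all algorithms on the nuScenes test dataset, with 75.3% AMOTA.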
To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our method fuses 3D detections from multiple cameras before association, reducing identity switches significantly and improving motion modeling. Our experiments on large-scale driving datasets show that fusion before association leads to a large margin of improvement over post-hoc fusion. We set a new state-of-the-art with 12.6% improvement in average multi-object tracking accuracy (AMOTA) among all camera-based methods on the competitive NuScenes 3D tracking benchmark, outperforming previously published methods by 6.5% in AMOTA with the same 3D detector.
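3D multi-object tracking aims to uniquely and consistently identify all mobile entities through time. Despite the rich spatiotemporal information available in this setting, current 3D tracking methods rely primarily on abstracted information and limited history, such as single-frame object bounding boxes. In this work, we develop a holistic representation of traffic scenes that leverages both the spatial and the temporal information of the actors in the scene. Specifically, we reformulate tracking as a spatiotemporal problem by representing tracked objects as sequences of time-stamped points and bounding boxes over a long temporal history. At each timestamp, we improve the location and motion estimates of the tracked objects through refinement over the full sequence of object history. By considering time and space jointly, our representation naturally encodes fundamental physical priors such as object permanence and consistency across time. Our spatiotemporal tracking framework achieves state-of-the-art performance on the Waymo and nuScenes benchmarks.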
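In the recent literature, on the one hand, many 3D multi-object tracking (MOT) works have focused on tracking accuracy and neglected computation speed, commonly by designing rather complex cost functions and feature extractors. On the other hand, some methods have focused too much on computation speed at the expense of tracking accuracy. In view of these issues, this paper proposes a robust and fast camera-LiDAR fusion-based MOT method that achieves a good trade-off between accuracy and speed. Relying on the characteristics of camera and LiDAR sensors, an effective deep association mechanism is designed and embedded in the proposed MOT method. This association mechanism tracks an object in the 2D domain when the object is far away and detected only by the camera, and updates the 2D trajectory with the 3D information obtained once the object appears in the LiDAR field of view, achieving a smooth fusion of 2D and 3D trajectories. Extensive experiments on typical datasets indicate that our proposed method has clear advantages over state-of-the-art MOT methods in terms of both tracking accuracy and processing speed. Our code is made publicly available for the benefit of the community.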
Multi-object tracking is a cornerstone capability of any robotic system. Most approaches follow a tracking-by-detection paradigm. However, within this framework, detectors function in a low precision-high recall regime, ensuring a low number of false-negatives while producing a high rate of false-positives. This can negatively affect the tracking component by making data association and track lifecycle management more challenging. Additionally, false-negative detections due to difficult scenarios like occlusions can negatively affect tracking performance. Thus, we propose a method that learns shape and spatio-temporal affinities between consecutive frames to better distinguish between true-positive and false-positive detections and tracks, while compensating for false-negative detections. Our method provides a probabilistic matching of detections that leads to robust data association and track lifecycle management. We quantitatively evaluate our method through ablative experiments and on the nuScenes tracking benchmark where we achieve state-of-the-art results. Our method not only estimates accurate, high-quality tracks but also decreases the overall number of false-positive and false-negative tracks. Please see our project website for source code and demo videos: sites.google.com/view/shasta-3d-mot/home.
This paper explores a pragmatic approach to multiple object tracking where the main focus is to associate objects efficiently for online and realtime applications. To this end, detection quality is identified as a key factor influencing tracking performance, where changing the detector can improve tracking by up to 18.9%. Despite only using a rudimentary combination of familiar techniques such as the Kalman Filter and Hungarian algorithm for the tracking components, this approach achieves an accuracy comparable to state-of-the-art online trackers. Furthermore, due to the simplicity of our tracking method, the tracker updates at a rate of 260 Hz which is over 20x faster than other state-of-the-art trackers.
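The association step named in this abstract, matching predicted track boxes to new detections with an optimal assignment, can be sketched as follows. Several things here are assumptions for illustration: the cost is taken to be `1 - IoU` between axis-aligned 2D boxes given as `(x1, y1, x2, y2)`, the `min_iou` gate is an arbitrary value, and an exhaustive search over permutations stands in for the Hungarian algorithm (it finds the same minimum-cost assignment, but is only practical for a handful of boxes).

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Return (matches, unmatched_track_ids, unmatched_detection_ids)."""
    n, m = len(tracks), len(detections)
    cost = [[1.0 - iou(t, d) for d in detections] for t in tracks]

    # Minimum-cost one-to-one assignment by brute force (Hungarian stand-in).
    if n <= m:
        best = min(permutations(range(m), n),
                   key=lambda p: sum(cost[i][p[i]] for i in range(n)))
        pairs = [(i, best[i]) for i in range(n)]
    else:
        best = min(permutations(range(n), m),
                   key=lambda p: sum(cost[p[j]][j] for j in range(m)))
        pairs = [(best[j], j) for j in range(m)]

    # Reject assigned pairs whose overlap is too small to be plausible.
    matches = [(i, j) for i, j in pairs if 1.0 - cost[i][j] >= min_iou]
    matched_t = {i for i, _ in matches}
    matched_d = {j for _, j in matches}
    unmatched_tracks = [i for i in range(n) if i not in matched_t]
    unmatched_dets = [j for j in range(m) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets


predicted_tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
new_detections = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
matches, lost, born = associate(predicted_tracks, new_detections)
```

In this example both tracks find their shifted boxes, and the third detection is left unmatched, which a tracker would typically treat as a candidate new track.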
Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That's it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.8% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.
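The goal of multi-object tracking (MOT) is to detect and track all the objects in a scene while keeping a unique identifier for each object. In this paper, we present a new robust state-of-the-art tracker that combines the advantages of motion and appearance information with camera-motion compensation and a more accurate Kalman filter state vector. Our new trackers, BoT-SORT and BoT-SORT-ReID, rank first on the MOTChallenge [29, 11] datasets, on both the MOT17 and MOT20 test sets, in terms of all the main MOT metrics: MOTA, IDF1, and HOTA. For MOT17, they achieve 80.5 MOTA, 80.2 IDF1, and 65.0 HOTA. The source code and pre-trained models are available at https://github.com/niraharon/bot-sort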
Three-dimensional objects are commonly represented as 3D boxes in a point-cloud. This representation mimics the well-studied image-based 2D bounding-box detection but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objects as points. Our framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity. In a second stage, it refines these estimates using additional point features on the object. In CenterPoint, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective. CenterPoint achieved state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model. On the Waymo Open Dataset, CenterPoint outperforms all previous single model methods by a large margin and ranks first among all Lidar-only submissions. The code and pretrained models are available at https://github.com/tianweiy/CenterPoint.
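A generic form of greedy closest-point matching can be sketched in a few lines. This is an illustration of the general technique under stated assumptions, not the CenterPoint code: object centers are plain 2D tuples, the `max_dist` gate is an arbitrary value, and the function name `greedy_match` is hypothetical. Unlike an optimal assignment, the greedy pass simply accepts the closest remaining pair first.

```python
import math


def greedy_match(track_centers, det_centers, max_dist=2.0):
    """Greedily pair tracks and detections by increasing center distance."""
    candidates = sorted(
        (math.dist(t, d), i, j)
        for i, t in enumerate(track_centers)
        for j, d in enumerate(det_centers)
    )
    used_tracks, used_dets, matches = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break                      # all remaining pairs are farther apart
        if i in used_tracks or j in used_dets:
            continue                   # one side is already taken
        used_tracks.add(i)
        used_dets.add(j)
        matches.append((i, j))
    return matches


tracks = [(0.0, 0.0), (5.0, 5.0)]
detections = [(5.2, 5.1), (0.3, -0.1), (20.0, 20.0)]
pairs = greedy_match(tracks, detections)
```

The detection at `(20.0, 20.0)` lies beyond the distance gate from every track, so it stays unmatched and would start a new track in a full pipeline.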
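Recent research in multi-task learning reveals the benefit of solving related problems in a single neural network. 3D object detection and multi-object tracking (MOT) are two heavily intertwined problems: predicting and associating an object instance's location across time. However, most previous works in 3D MOT treat the detector as a separate preceding pipeline, disjointly taking the output of the detector as the input to the tracker. In this work, we present Minkowski Tracker, a sparse spatio-temporal R-CNN that jointly solves object detection and tracking. Inspired by region-based CNN (R-CNN), we propose to solve tracking as a second stage of the object detector R-CNN that predicts assignment probabilities to tracks. First, Minkowski Tracker takes 4D point clouds as input and generates a spatio-temporal bird's-eye-view (BEV) feature map through a 4D sparse convolutional encoder network. Then, our proposed TrackAlign aggregates the track region-of-interest (ROI) features from the BEV features. Finally, Minkowski Tracker updates each track and its confidence score based on the detection-to-track match probability predicted from the ROI features. We show in large-scale experiments that the overall performance gain of our method is due to four factors: (1) the temporal reasoning of the 4D encoder improves detection performance; (2) the multi-task learning of object detection and MOT jointly enhances both tasks; (3) the detection-to-track match score learns an implicit motion model that enhances track assignment; (4) the detection-to-track match score improves the quality of the track confidence score. As a result, Minkowski Tracker achieves state-of-the-art performance on the nuScenes tracking task without hand-designed motion models.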
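Object motion and object appearance are commonly used information in multiple object tracking (MOT) applications, either for associating detections across frames in tracking-by-detection methods or for direct track prediction in joint-detection-and-tracking methods. However, not only are these two types of information often considered separately, but they also do not help make direct use of the visual information in the current frame of interest. In this paper, we present PatchTrack, a Transformer-based joint-detection-and-tracking system that predicts tracks using patches of the current frame of interest. We use a Kalman filter to predict the locations of existing tracks in the current frame from the previous frame. Patches cropped from the predicted bounding boxes are sent to the Transformer decoder to infer new tracks. By using both the object motion and the object appearance information encoded in the patches, the proposed method pays more attention to where new tracks are more likely to occur. We show the effectiveness of PatchTrack on recent MOT benchmarks, including MOT16 (MOTA 73.71%, IDF1 65.77%) and MOT17 (MOTA 73.59%, IDF1 65.23%). The results are published at https://motchallenge.net/method/mot=4725&chl=10.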
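Multi-object tracking in videos requires solving a fundamental problem: one-to-one assignment between objects in adjacent frames. Most methods address the problem by first discarding impossible pairs whose feature distances are larger than a threshold, and then linking objects with the Hungarian algorithm to minimize the overall distance. However, we find that the distribution of distances computed from Re-ID features may vary significantly across videos, so there is no single optimal threshold that allows us to safely discard impossible pairs. To address the problem, we present an efficient approach that computes a marginal probability for each pair of objects in real time. The marginal probability can be regarded as a normalized distance that is significantly more stable than the original feature distance. As a result, we can use a single threshold for all videos. The approach is general and can be applied to existing trackers to obtain an improvement of about one point in the IDF1 metric. It achieves competitive results on the MOT17 and MOT20 benchmarks. In addition, the computed probability is more interpretable, which facilitates subsequent post-processing operations.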
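One way to read "marginal probability for each pair" is as the total probability of all one-to-one matchings that contain that pair. The sketch below illustrates that reading only; it is not the paper's formulation or its efficient algorithm. It assumes a square distance matrix in which every track is matched to exactly one detection, scores each matching by `exp(-total distance)`, and enumerates all matchings, which is feasible only for very small inputs. The name `pair_marginals` is hypothetical.

```python
import math
from itertools import permutations


def pair_marginals(dist):
    """dist[i][j]: feature distance between track i and detection j (n x n).

    Returns marg[i][j], the summed probability of every one-to-one matching
    that assigns track i to detection j.
    """
    n = len(dist)
    # Unnormalized weight of each complete matching.
    weights = {
        perm: math.exp(-sum(dist[i][perm[i]] for i in range(n)))
        for perm in permutations(range(n))
    }
    z = sum(weights.values())
    marg = [[0.0] * n for _ in range(n)]
    for perm, w in weights.items():
        for i in range(n):
            marg[i][perm[i]] += w / z
    return marg


distances = [[0.1, 2.0],
             [2.0, 0.1]]
marginals = pair_marginals(distances)
```

Each row and each column of the result sums to 1, so the values live on a common scale regardless of how large the raw distances are; that normalization is what makes a single threshold plausible across inputs in this toy setting.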
The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-by-detection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks; in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state-of-the-art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation. We then perform an analysis on the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields tracking performance superior to that of any current tracking method, and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.
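To overcome the challenges of multiple object tracking, recent algorithms use interaction cues alongside motion and appearance features. These algorithms use graph neural networks or transformers to extract interaction features, which leads to high computational cost. In this paper, a novel interaction cue based on geometric features is presented, aiming to detect occlusion and re-identify lost targets at low computational cost. Moreover, most algorithms treat camera motion as negligible, a strong assumption that does not always hold and that leads to ID switches or target mismatches. In this paper, a method for measuring camera motion and removing its effect is presented, which efficiently reduces the impact of camera motion on tracking. The proposed algorithm is evaluated on the MOT17 and MOT20 datasets; it achieves state-of-the-art performance on MOT17 and comparable results on MOT20. The code is also publicly available.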
How would you fairly evaluate two multi-object tracking algorithms (i.e. trackers), each one employing a different object detector? Detectors keep improving, thus trackers can make less effort to estimate object states over time. Is it then fair to compare a new tracker employing a new detector with another tracker using an old detector? In this paper, we propose a novel performance measure, named Tracking Effort Measure (TEM), to evaluate trackers that use different detectors. TEM estimates the improvement that the tracker does with respect to its input data (i.e. detections) at frame level (intra-frame complexity) and sequence level (inter-frame complexity). We evaluate TEM over well-known datasets, four trackers and eight detection sets. Results show that, unlike conventional tracking evaluation measures, TEM can quantify the effort done by the tracker with a reduced correlation on the input detections. Its implementation is publicly available online at https://github.com/vpulab/MOT-evaluation.
Existing Multiple Object Tracking (MOT) methods design complex architectures for better tracking performance. However, without a proper organization of input information, they still fail to perform tracking robustly and suffer from frequent identity switches. In this paper, we propose two novel methods together with a simple online Message Passing Network (MPN) to address these limitations. First, we explore different integration methods for the graph node and edge embeddings and put forward a new IoU (Intersection over Union) guided function, which improves long-term tracking and handles identity switches. Second, we introduce a hierarchical sampling strategy to construct sparser graphs, which allows training to focus on more difficult samples. Experimental results demonstrate that a simple online MPN with these two contributions can perform better than many state-of-the-art methods. In addition, our association method generalizes well and can also improve the results of methods based on private detections.
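This paper aims to tackle multiple object tracking (MOT), an important problem in computer vision that remains challenging because of many practical issues, especially occlusion. We propose a new real-time Depth Perspective-aware Multiple Object Tracking (DP-MOT) approach to tackle the occlusion problem in MOT. A simple yet efficient Subject-Ordered Depth Estimation (SODE) is first proposed to automatically order the depth positions of detected subjects in a 2D scene in an unsupervised manner. Using the output of SODE, a new Active pseudo-3D Kalman filter, a simple but effective extension of the Kalman filter with dynamic control variables, is then proposed to dynamically update the movement of objects. In addition, a new high-order association approach is presented in the data association step to incorporate first-order and second-order relationships between the detected objects. The proposed approach consistently achieves state-of-the-art performance compared with recent MOT methods on standard MOT benchmarks.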
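3D multi-object tracking (MOT) has made tremendous progress thanks to the rapid development of 3D object detection and 2D MOT. Recent advanced works generally employ a series of object attributes, such as position, size, velocity, and appearance, to provide cues for association in 3D MOT. However, these cues may be unreliable because of visual noise such as occlusion and blur, which creates a bottleneck in tracking performance. To reveal the dilemma, we conduct an extensive empirical analysis that exposes the key bottleneck of each cue and how the cues correlate with one another. The analysis results motivate us to efficiently absorb the merits of all cues and adaptively produce an optimal tracking strategy. Specifically, we present Location and Velocity Quality Learning, which efficiently guides the network to estimate the quality of predicted object attributes. Based on these quality estimates, we propose a quality-aware object association (QOA) strategy that leverages the quality score as an important reference factor for achieving robust association. Despite its simplicity, extensive experiments indicate that the proposed strategy boosts tracking performance by 2.2% AMOTA, and our method outperforms all existing state-of-the-art works on nuScenes. Moreover, QTrack achieves 48.0% and 51.1% AMOTA on the nuScenes validation and test sets, which significantly reduces the performance gap between pure camera-based and LiDAR-based trackers.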