多对象跟踪(MOT)的目标是检测和跟踪场景中的所有对象,同时为每个对象保留唯一的标识符。在本文中,我们提出了一种新的可靠的最新跟踪器,该跟踪器可以结合运动和外观信息的优势,以及摄像机运动补偿以及更准确的Kalman滤波器状态矢量。我们的新跟踪器在Mot17和Mot20测试集的Motchallenge [29,11]的数据集[29,11]中,Bot-Sort-Reid排名第一,就所有主要MOT指标而言:MOTA,IDF1和HOTA。对于Mot17:80.5 Mota,80.2 IDF1和65.0 HOTA。源代码和预培训模型可在https://github.com/niraharon/bot-sort上找到
translated by 谷歌翻译
本文旨在解决多个对象跟踪(MOT),这是计算机视觉中的一个重要问题,但由于许多实际问题,尤其是阻塞,因此仍然具有挑战性。确实,我们提出了一种新的实时深度透视图 - 了解多个对象跟踪(DP-MOT)方法,以解决MOT中的闭塞问题。首先提出了一个简单但有效的主题深度估计(SODE),以在2D场景中自动以无监督的方式自动订购检测到的受试者的深度位置。使用SODE的输出,提出了一个新的活动伪3D KALMAN滤波器,即具有动态控制变量的Kalman滤波器的简单但有效的扩展,以动态更新对象的运动。此外,在数据关联步骤中提出了一种新的高阶关联方法,以合并检测到的对象之间的一阶和二阶关系。与标准MOT基准的最新MOT方法相比,提出的方法始终达到最先进的性能。
translated by 谷歌翻译
This paper explores a pragmatic approach to multiple object tracking where the main focus is to associate objects efficiently for online and realtime applications. To this end, detection quality is identified as a key factor influencing tracking performance, where changing the detector can improve tracking by up to 18.9%. Despite only using a rudimentary combination of familiar techniques such as the Kalman Filter and Hungarian algorithm for the tracking components, this approach achieves an accuracy comparable to state-of-the-art online trackers. Furthermore, due to the simplicity of our tracking method, the tracker updates at a rate of 260 Hz which is over 20x faster than other state-of-the-art trackers.
translated by 谷歌翻译
The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-bydetection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks, in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state-of-the-art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation.We then perform an analysis on the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields superior tracking performance than any current tracking method and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.
translated by 谷歌翻译
Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That's it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.8% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.
translated by 谷歌翻译
多目标跟踪(MOT)的典型管道是使用探测器进行对象本地化,并在重新识别(RE-ID)之后进行对象关联。该管道通过对象检测和重新ID的最近进展部分而部分地激励,并且部分地通过现有的跟踪数据集中的偏差激励,其中大多数物体倾向于具有区分外观和RE-ID模型足以建立关联。为了响应这种偏见,我们希望重新强调多目标跟踪的方法也应该在对象外观不充分辨别时起作用。为此,我们提出了一个大型数据集,用于多人跟踪,人类具有相似的外观,多样化的运动和极端关节。由于数据集包含主要组跳舞视频,我们将其命名为“DanceTrack”。我们预计DanceTrack可以提供更好的平台,以开发更多的MOT算法,这些算法依赖于视觉识别并更依赖于运动分析。在我们的数据集上,我们在数据集上基准测试了几个最先进的追踪器,并在与现有基准测试中遵守DanceTrack的显着性能下降。 DataSet,项目代码和竞争服务器播放:\ url {https://github.com/danceTrack}。
translated by 谷歌翻译
Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms. In this paper, we integrate appearance information to improve the performance of SORT. Due to this extension we are able to track objects through longer periods of occlusions, effectively reducing the number of identity switches. In spirit of the original framework we place much of the computational complexity into an offline pre-training stage where we learn a deep association metric on a largescale person re-identification dataset. During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.
translated by 谷歌翻译
视频中的多目标跟踪需要解决相邻帧中对象之间一对一分配的基本问题。大多数方法通过首先丢弃不可能的对距离大于阈值的不可能对解决问题,然后使用匈牙利算法将对象链接起来以最大程度地减少整体距离。但是,我们发现从重新ID特征计算出的距离的分布可能在不同的视频中有很大差异。因此,没有一个最佳阈值可以使我们安全丢弃不可能的对。为了解决该问题,我们提出了一种有效的方法来实时计算每对对象的边际概率。边际概率可以视为标准化距离,比原始特征距离明显稳定。结果,我们可以为所有视频使用一个阈值。该方法是一般的,可以应用于现有的跟踪器,以在IDF1度量方面获得大约一个点改进。它在MOT17和MOT20基准上取得了竞争成果。此外,计算的概率更容易解释,从而有助于后续后期处理操作。
translated by 谷歌翻译
To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our method fuses 3D detections from multiple cameras before association, reducing identity switches significantly and improving motion modeling. Our experiments on large-scale driving datasets show that fusion before association leads to a large margin of improvement over post-hoc fusion. We set a new state-of-the-art with 12.6% improvement in average multi-object tracking accuracy (AMOTA) among all camera-based methods on the competitive NuScenes 3D tracking benchmark, outperforming previously published methods by 6.5% in AMOTA with the same 3D detector.
translated by 谷歌翻译
We propose a Cascaded Buffered IoU (C-BIoU) tracker to track multiple objects that have irregular motions and indistinguishable appearances. When appearance features are unreliable and geometric features are confused by irregular motions, applying conventional Multiple Object Tracking (MOT) methods may generate unsatisfactory results. To address this issue, our C-BIoU tracker adds buffers to expand the matching space of detections and tracks, which mitigates the effect of irregular motions in two aspects: one is to directly match identical but non-overlapping detections and tracks in adjacent frames, and the other is to compensate for the motion estimation bias in the matching space. In addition, to reduce the risk of overexpansion of the matching space, cascaded matching is employed: first matching alive tracks and detections with a small buffer, and then matching unmatched tracks and detections with a large buffer. Despite its simplicity, our C-BIoU tracker works surprisingly well and achieves state-of-the-art results on MOT datasets that focus on irregular motions and indistinguishable appearances. Moreover, the C-BIoU tracker is the dominant component for our 2-nd place solution in the CVPR'22 SoccerNet MOT and ECCV'22 MOTComplex DanceTrack challenges. Finally, we analyze the limitation of our C-BIoU tracker in ablation studies and discuss its application scope.
translated by 谷歌翻译
3D多对象跟踪(MOT)确保在连续动态检测过程中保持一致性,有利于自动驾驶中随后的运动计划和导航任务。但是,基于摄像头的方法在闭塞情况下受到影响,准确跟踪基于激光雷达的方法的对象的不规则运动可能是具有挑战性的。某些融合方法效果很好,但不认为在遮挡下出现外观特征的不可信问题。同时,错误检测问题也显着影响跟踪。因此,我们根据组合的外观运动优化(Camo-Mot)提出了一种新颖的相机融合3D MOT框架,该框架使用相机和激光镜数据,并大大减少了由遮挡和错误检测引起的跟踪故障。对于遮挡问题,我们是第一个提出遮挡头来有效地选择最佳对象外观的人,从而减少了闭塞的影响。为了减少错误检测在跟踪中的影响,我们根据置信得分设计一个运动成本矩阵,从而提高了3D空间中的定位和对象预测准确性。由于现有的多目标跟踪方法仅考虑一个类别,因此我们还建议建立多类损失,以在多类别场景中实现多目标跟踪。在Kitti和Nuscenes跟踪基准测试上进行了一系列验证实验。我们提出的方法在KITTI测试数据集上的所有多模式MOT方法中实现了最先进的性能和最低的身份开关(IDS)值(CAR为23,行人为137)。并且我们提出的方法在Nuscenes测试数据集上以75.3%的AMOTA进行了所有算法中的最新性能。
translated by 谷歌翻译
由于卷积神经网络(CNN)在过去的十年中检测成功,多对象跟踪(MOT)通过检测方法的使用来控制。随着数据集和基础标记网站的发布,研究方向已转向在跟踪时在包括重新识别对象的通用场景(包括重新识别(REID))上的最佳准确性。在这项研究中,我们通过提供专用的行人数据集并专注于对性能良好的多对象跟踪器的深入分析来缩小监视的范围)现实世界应用的技术。为此,我们介绍SOMPT22数据集;一套新的,用于多人跟踪的新套装,带有带注释的简短视频,该视频从位于杆子上的静态摄像头捕获,高度为6-8米,用于城市监视。与公共MOT数据集相比,这提供了室外监视的MOT的更为集中和具体的基准。我们分析了该新数据集上检测和REID网络的使用方式,分析了将MOT跟踪器分类为单发和两阶段。我们新数据集的实验结果表明,SOTA远非高效率,而单一跟踪器是统一快速执行和准确性的良好候选者,并具有竞争性的性能。该数据集将在以下网址提供:sompt22.github.io
translated by 谷歌翻译
以前的在线3D多对象跟踪(3DMOT)方法在与几帧的新检测无关时终止ROCKET。但是如果一个物体刚刚变暗,就像被其他物体暂时封闭或者只是从FOV暂时封闭一样,过早地终止ROCKET将导致身份切换。我们揭示了过早的轨迹终端是现代3DMOT系统中身份开关的主要原因。为了解决这个问题,我们提出了一个不朽的跟踪器,一个简单的跟踪系统,它利用轨迹预测来维护对象变暗的物体的轨迹。我们使用一个简单的卡尔曼滤波器进行轨迹预测,并在目标不可见时通过预测保留轨迹。通过这种方法,我们可以避免由过早托管终止产生的96%的车辆标识开关。如果没有任何学习的参数,我们的方法在Waymo Open DataSet测试集上的车载类别的0.0001级和竞争Mota处实现了不匹配的比率。我们的不匹配比率比任何先前发表的方法低一倍。在NUSCENes上报告了类似的结果。我们相信拟议的不朽追踪器可以为推动3DMOT的极限提供简单而强大的解决方案。我们的代码可在https://github.com/immortaltracker/immortaltracker中找到。
translated by 谷歌翻译
最近的多目标跟踪(MOT)系统利用高精度的对象探测器;然而,培训这种探测器需要大量标记的数据。虽然这种数据广泛适用于人类和车辆,但其他动物物种显着稀缺。我们目前稳健的置信跟踪(RCT),一种算法,旨在保持鲁棒性能,即使检测质量差。与丢弃检测置信信息的先前方法相比,RCT采用基本上不同的方法,依赖于精确的检测置信度值来初始化曲目,扩展轨道和滤波器轨道。特别地,RCT能够通过有效地使用低置信度检测(以及单个物体跟踪器)来最小化身份切换,以保持对象的连续轨道。为了评估在存在不可靠的检测中的跟踪器,我们提出了一个挑战的现实世界水下鱼跟踪数据集,Fishtrac。在对FISHTRAC以及UA-DETRAC数据集的评估中,我们发现RCT在提供不完美的检测时优于其他算法,包括最先进的深单和多目标跟踪器以及更经典的方法。具体而言,RCT具有跨越方法的最佳平均热量,可以成功返回所有序列的结果,并且具有比其他方法更少的身份交换机。
translated by 谷歌翻译
为了克服多个对象跟踪任务中的挑战,最近的算法将交互线索与运动和外观特征一起使用。这些算法使用图形神经网络或变压器来提取导致高计算成本的交互功能。在本文中,提出了一种基于几何特征的新型交互提示,旨在检测遮挡和重新识别计算成本低的丢失目标。此外,在大多数算法中,摄像机运动被认为可以忽略不计,这是一个强有力的假设,并不总是正确的,并且导致目标转换或目标不匹配。在本文中,提出了一种测量相机运动和删除其效果的方法,可有效地降低相机运动对跟踪的影响。该算法在MOT17和MOT20数据集上进行了评估,并在MOT20上实现了MOT17的最先进性能和可比较的结果。该代码也可以公开使用。
translated by 谷歌翻译
3D多对象跟踪(MOT)近年来目睹了众多新颖的基准和方法,尤其是那些在“逐侦测”范式下的基准。尽管他们的进步和有用,但对他们的优势和劣势的深入分析尚不可用。在本文中,我们通过将它们分解为四个组成部分来总结当前的3D MOL方法:检测,关联,运动模型和生命周期管理的预处理。然后,我们将现有算法的故障情况归因于每个组件并详细研究它们。基于分析,我们提出了相应的改进,导致强大但简单的基线:简单进展。 Waymo Open DataSet和Nuscenes上的综合实验结果表明,我们的最终方法可以通过微小的修改来实现新的最先进的结果。此外,我们采取额外的步骤并重新思考当前的基准面是否真实地反映了真实挑战的算法能力。我们深入了解现有基准的细节,并找到一些有趣的事实。最后,我们分析了\ name \中剩余失败的分布和原因,并提出了3D MOT的未来方向。我们的代码可在https://github.com/tusimple/simpletrack获得。
translated by 谷歌翻译
对象运动和对象外观是多个对象跟踪(MOT)应用中的常用信息,用于将帧跨越帧的检测相关联,或用于联合检测和跟踪方法的直接跟踪预测。然而,不仅是这两种类型的信息通常是单独考虑的,而且它们也没有帮助直接从当前感兴趣帧中使用视觉信息的用法。在本文中,我们提出了PatchTrack,一种基于变压器的联合检测和跟踪系统,其使用当前感兴趣的帧帧的曲线预测曲目。我们使用卡尔曼滤波器从前一帧预测当前帧中的现有轨道的位置。从预测边界框裁剪的补丁被发送到变压器解码器以推断新曲目。通过利用在补丁中编码的对象运动和对象外观信息,所提出的方法将更多地关注新曲目更有可能发生的位置。我们展示了近期MOT基准的Patchtrack的有效性,包括MOT16(MOTA 73.71%,IDF1 65.77%)和MOT17(MOTA 73.59%,IDF1 65.23%)。结果在https://motchallenge.net/method/mot=4725&chl=10上发布。
translated by 谷歌翻译
近年来,多个对象跟踪引起了研究人员的极大兴趣,它已成为计算机视觉中的趋势问题之一,尤其是随着自动驾驶的最新发展。 MOT是针对不同问题的关键视觉任务之一,例如拥挤的场景中的闭塞,相似的外观,小物体检测难度,ID切换等,以应对这些挑战,因为研究人员试图利用变压器的注意力机制,与田径的相互关系,与田径的相互关系,图形卷积神经网络,与暹罗网络不同帧中对象的外观相似性,他们还尝试了基于IOU匹配的CNN网络,使用LSTM的运动预测。为了将这些零散的技术在雨伞下采用,我们研究了过去三年发表的一百多篇论文,并试图提取近代研究人员更关注的技术来解决MOT的问题。我们已经征集了许多应用,可能性以及MOT如何与现实生活有关。我们的评论试图展示研究人员使用过时的技术的不同观点,并为潜在的研究人员提供了一些未来的方向。此外,我们在这篇评论中包括了流行的基准数据集和指标。
translated by 谷歌翻译
一方面,在最近的文献中,许多3D多对象跟踪(MOT)的作品集中在跟踪准确性和被忽视的计算速度上,通常是通过设计相当复杂的成本功能和功能提取器来进行的。另一方面,某些方法以跟踪准确性为代价过多地关注计算速度。鉴于这些问题,本文提出了一种强大而快速的基于相机融合的MOT方法,该方法在准确性和速度之间取决于良好的权衡。依靠相机和激光雷达传感器的特性,设计并嵌入了提出的MOT方法中的有效的深层关联机制。该关联机制在对象远处并仅由摄像机检测到2D域中的对象,并在对象出现在LIDAR的视野中以实现平滑融合时获得的2D轨迹进行更新,并更新2D轨迹。 2D和3D轨迹。基于典型数据集的广泛实验表明,就跟踪准确性和处理速度而言,我们提出的方法在最先进的MOT方法上具有明显的优势。我们的代码可公开用于社区的利益。
translated by 谷歌翻译
Existing Multiple Object Tracking (MOT) methods design complex architectures for better tracking performance. However, without a proper organization of input information, they still fail to perform tracking robustly and suffer from frequent identity switches. In this paper, we propose two novel methods together with a simple online Message Passing Network (MPN) to address these limitations. First, we explore different integration methods for the graph node and edge embeddings and put forward a new IoU (Intersection over Union) guided function, which improves long term tracking and handles identity switches. Second, we introduce a hierarchical sampling strategy to construct sparser graphs which allows to focus the training on more difficult samples. Experimental results demonstrate that a simple online MPN with these two contributions can perform better than many state-of-the-art methods. In addition, our association method generalizes well and can also improve the results of private detection based methods.
translated by 谷歌翻译