Data association in multi-target multi-camera tracking (MTMCT) usually estimates affinity directly from re-identification (Re-ID) feature distances. However, we argue that it may not be the best choice owing to the difference in matching scopes between Re-ID and MTMCT problems. Re-ID systems focus on global matching, which retrieves targets from all cameras at all times. In contrast, data association in tracking is a local matching problem, since its candidates only come from neighboring locations and time frames. In this paper, we design experiments to verify this misfit between global Re-ID feature distances and local matching in tracking, and propose a simple yet effective approach to adapt affinity estimation to the corresponding matching scopes in MTMCT. Instead of trying to deal with all appearance changes, we tailor the affinity metric to specialize in the ones that may emerge during data association. To this end, we introduce a new data sampling scheme with temporal windows originally employed for data association in tracking. The adaptive affinity module minimizes mismatches, yields significant improvements over the global Re-ID distance, and produces competitive performance on the CityFlow and DukeMTMC datasets.
Multi-object tracking (MOT) is one of the most fundamental computer vision tasks, supporting a wide range of video analysis applications. Despite recent promising progress, current MOT research is still limited to a fixed sampling frame rate of the input stream. In fact, we empirically find that the accuracy of all recent state-of-the-art trackers drops dramatically when the input frame rate changes. For a more intelligent tracking solution, we shift our research attention to the problem of frame-rate-agnostic MOT (FraMOT). In this paper, we propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time. Specifically, we propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes frame rate information to aid identity matching across inputs of multiple frame rates, improving the capability of the learned model to handle the complex motion-appearance relations in FraMOT. Moreover, the association gap between training and inference is widened in FraMOT, because the post-processing steps not included in training have a larger impact in lower-frame-rate scenarios. To address this, we propose a Periodic Training Scheme (PTS) that reflects all post-processing steps in training via tracking pattern matching and fusion. Along with the proposed approaches, we make the first attempt to establish an evaluation method for this new task in two different modes (i.e., known frame rate and unknown frame rate), aiming to handle more complex situations. Quantitative experiments on challenging MOT datasets (FraMOT versions) clearly demonstrate that the proposed approaches handle different frame rates better and improve robustness in complex scenarios.
3D multi-object tracking (MOT) ensures consistency during continuous dynamic detection, benefiting subsequent motion planning and navigation tasks in autonomous driving. However, camera-based methods suffer in occlusion scenarios, and it can be challenging for LiDAR-based methods to accurately track objects with irregular motion. Some fusion methods work well but do not consider the untrustworthiness of appearance features under occlusion. Meanwhile, the false-detection problem also significantly affects tracking. Therefore, we propose a novel camera-LiDAR fusion 3D MOT framework based on Combined Appearance-Motion Optimization (CAMO-MOT), which uses both camera and LiDAR data and significantly reduces tracking failures caused by occlusion and false detections. For the occlusion problem, we are the first to propose an occlusion head to effectively select the best object appearance, reducing the influence of occlusion. To reduce the impact of false detections on tracking, we design a motion cost matrix based on confidence scores, which improves localization and object-prediction accuracy in 3D space. Since existing multi-object tracking methods only consider a single category, we also propose a multi-category loss to achieve multi-object tracking in multi-category scenes. A series of validation experiments are conducted on the KITTI and nuScenes tracking benchmarks. Our proposed method achieves state-of-the-art performance and the lowest identity switch (IDS) value (23 for cars and 137 for pedestrians) among all multi-modal MOT methods on the KITTI test dataset, and it achieves state-of-the-art performance among all algorithms on the nuScenes test dataset with 75.3% AMOTA.
Multi-camera multi-object tracking is currently drawing attention in computer vision because of its superior performance in real-world applications such as video surveillance of crowded scenes or large spaces. In this work, we propose a mathematically elegant multi-camera multi-object tracking approach based on a spatial-temporal lifted multicut formulation. Our model utilizes state-of-the-art tracklets produced by single-camera trackers as proposals. As these tracklets may contain ID-switch errors, we refine them through a novel pre-clustering obtained from 3D geometric projections. As a consequence, we derive a better tracking graph free of ID switches and with more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short- and long-range temporal interactions on tracklets located in the same camera as well as across cameras. Experimental results on the WILDTRACK dataset are near perfect, and our method outperforms state-of-the-art trackers on Campus while being on par on the PETS-09 dataset. We will release our implementation upon acceptance of the paper.
Multi-object tracking in videos requires solving the fundamental problem of one-to-one assignment between objects in adjacent frames. Most methods address it by first discarding impossible pairs whose distance exceeds a threshold, and then linking objects with the Hungarian algorithm to minimize the overall distance. However, we find that the distribution of distances computed from Re-ID features can vary significantly across videos. There is therefore no single optimal threshold that lets us safely discard impossible pairs. To address this problem, we propose an effective approach to compute a marginal probability for each pair of objects in real time. The marginal probability can be regarded as a normalized distance that is considerably more stable than the original feature distance. As a result, we can use a single threshold for all videos. The approach is general and can be applied to existing trackers to obtain an improvement of about one point in the IDF1 metric. It achieves competitive results on the MOT17 and MOT20 benchmarks. In addition, the computed probabilities are more interpretable, which facilitates subsequent post-processing operations.
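To make the idea above concrete: a rough, illustrative way to turn a Re-ID distance matrix into per-pair match probabilities (not the authors' exact procedure) is to normalize negated distances into a near doubly-stochastic matrix with Sinkhorn iterations; the temperature, iteration count, and gate below are assumed values.

```python
import numpy as np

def marginal_match_probabilities(dist, temperature=0.1, n_iters=50):
    """Approximate per-pair match probabilities from a (tracks x dets)
    Re-ID distance matrix via Sinkhorn normalization, an illustrative
    stand-in for the paper's marginal-probability computation."""
    logits = -dist / temperature              # smaller distance -> higher score
    p = np.exp(logits - logits.max())         # numerically stable exponentiation
    for _ in range(n_iters):                  # alternate row/column normalization
        p /= p.sum(axis=1, keepdims=True)
        p /= p.sum(axis=0, keepdims=True)
    return p

dist = np.random.rand(5, 5)                   # toy track-detection distance matrix
p = marginal_match_probabilities(dist)
keep = p > 0.5                                # one fixed gate, reused for all videos
```

Because the output is a normalized probability rather than a raw feature distance, a single fixed gate such as p > 0.5 behaves consistently across videos, which is the stability property the abstract emphasizes.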
Multiple object tracking has attracted significant interest from researchers in recent years and has become one of the trending problems in computer vision, especially with the recent developments in autonomous driving. MOT is a key vision task that confronts various problems such as occlusion in crowded scenes, similar appearances, difficulty detecting small objects, ID switches, and so on. To cope with these challenges, researchers have tried to exploit the attention mechanism of transformers, the interrelations among tracklets with graph convolutional neural networks, the appearance similarity of objects across frames with Siamese networks, CNN-based networks with IoU matching, and motion prediction with LSTMs. To bring these scattered techniques under one umbrella, we have studied more than a hundred papers published over the last three years and have tried to extract the techniques that recent researchers have focused on to address the problems of MOT. We have enumerated numerous applications and possibilities and discussed how MOT relates to real life. Our review attempts to show the different perspectives of the techniques researchers have used and to provide some future directions for potential researchers. Moreover, we include the popular benchmark datasets and metrics in this review.
The recent trend in multiple object tracking (MOT) is jointly solving detection and tracking, where object detection and appearance feature (or motion) are learned simultaneously. Despite competitive performance, in crowded scenes, joint detection and tracking usually fail to find accurate object associations due to missed or false detections. In this paper, we jointly model counting, detection and re-identification in an end-to-end framework, named CountingMOT, tailored for crowded scenes. By imposing mutual object-count constraints between detection and counting, the CountingMOT tries to find a balance between object detection and crowd density map estimation, which can help it to recover missed detections or reject false detections. Our approach is an attempt to bridge the gap between object detection, counting, and re-identification. This is in contrast to prior MOT methods that either ignore the crowd density and thus are prone to failure in crowded scenes, or depend on local correlations to build a graphical relationship for matching targets. The proposed MOT tracker can perform online and real-time tracking, and achieves the state-of-the-art results on the public benchmarks MOT16 (MOTA of 77.6%), MOT17 (MOTA of 78.0%) and MOT20 (MOTA of 70.2%).
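The mutual object-count constraint can be pictured as a consistency term tying the integral of the predicted crowd density map to the number of confident detections. Below is a minimal sketch under assumed tensor shapes; the function name, threshold, and loss form are illustrative rather than CountingMOT's exact formulation.

```python
import torch

def count_consistency_loss(density_map, det_scores, score_thresh=0.5):
    """density_map: (B, 1, H, W) predicted crowd density, whose spatial sum
    approximates the object count; det_scores: (B, N) detection confidences."""
    count_from_density = density_map.sum(dim=(1, 2, 3))               # (B,)
    count_from_dets = (det_scores > score_thresh).float().sum(dim=1)  # (B,)
    # Penalizing disagreement pushes the model to recover misses in dense
    # regions and to suppress spurious boxes in empty ones.
    return (count_from_density - count_from_dets).abs().mean()
```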
The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-by-detection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks, in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state-of-the-art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation. We then perform an analysis on the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields superior tracking performance to any current tracking method, and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.
Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That's it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.8% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.
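The association step sketched below is a simplified reading of this design: each detection regresses a 2D offset pointing back to its previous-frame center and is greedily linked to the closest unclaimed prior detection. The array shapes, pixel distance gate, and the distance-ordered (rather than confidence-ordered) greedy pass are assumptions.

```python
import numpy as np

def greedy_offset_association(curr_centers, offsets, prev_centers, max_dist=50.0):
    """curr_centers, offsets: (N, 2); prev_centers: (M, 2). Returns a dict
    mapping current detection index -> matched previous detection index."""
    projected = curr_centers - offsets        # estimated previous-frame positions
    d = np.linalg.norm(projected[:, None] - prev_centers[None, :], axis=2)
    matches, used = {}, set()
    for i, j in sorted(np.ndindex(*d.shape), key=lambda ij: d[ij]):
        if i not in matches and j not in used and d[i, j] < max_dist:
            matches[i] = j                    # link; unmatched dets start new tracks
            used.add(j)
    return matches
```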
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and increasing demand of intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis for closed-world person Re-ID from three different perspectives, including deep feature representation learning, deep metric learning and ranking optimization. With the performance saturation under closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, facing more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize the open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost for finding all the correct matches, which provides an additional criterion to evaluate the Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.
To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our method fuses 3D detections from multiple cameras before association, reducing identity switches significantly and improving motion modeling. Our experiments on large-scale driving datasets show that fusion before association leads to a large margin of improvement over post-hoc fusion. We set a new state-of-the-art with 12.6% improvement in average multi-object tracking accuracy (AMOTA) among all camera-based methods on the competitive NuScenes 3D tracking benchmark, outperforming previously published methods by 6.5% in AMOTA with the same 3D detector.
The goal of multi-object tracking (MOT) is to detect and track all the objects in a scene while keeping a unique identifier for each object. In this paper, we present a new robust state-of-the-art tracker that can combine the advantages of motion and appearance information together with camera-motion compensation and a more accurate Kalman filter state vector. Our new tracker, BoT-SORT-ReID, ranks first on the MOTChallenge datasets [29, 11] on both the MOT17 and MOT20 test sets in terms of all the main MOT metrics: MOTA, IDF1, and HOTA. For MOT17 it achieves 80.5 MOTA, 80.2 IDF1, and 65.0 HOTA. The source code and pre-trained models are available at https://github.com/niraharon/bot-sort
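One common recipe for combining the two cues, shown here as a hedged sketch rather than BoT-SORT's exact fusion rule, is to veto pairs that either cue finds implausible and otherwise keep the cheaper cost per pair; the gate values are assumptions.

```python
import numpy as np

def fused_cost(iou_dist, emb_dist, iou_gate=0.5, emb_gate=0.25):
    """iou_dist, emb_dist: (tracks x dets) cost matrices in [0, 1].
    Appearance is trusted only where both cues agree a pair is plausible."""
    emb = emb_dist.copy()
    emb[(emb_dist > emb_gate) | (iou_dist > iou_gate)] = 1.0  # veto implausible pairs
    return np.minimum(iou_dist, emb)  # per pair, keep whichever cue is cheaper
```

The fused matrix then feeds the usual Hungarian assignment, with camera-motion compensation applied to the Kalman-predicted boxes before the IoU costs are computed.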
One-shot multi-object tracking, which integrates object detection and ID embedding extraction into a unified network, has achieved groundbreaking results in recent years. However, current one-shot trackers rely solely on single-frame detections to predict candidate bounding boxes, which may be unreliable under disastrous visual degradation such as motion blur and occlusion. Once a target box is mistakenly classified as background by the detector, the temporal consistency of its corresponding tracklet is no longer maintained. In this paper, we set out to restore the bounding boxes misclassified as "fake background" by proposing a re-check network. The re-check network innovatively expands the role of ID embeddings from data association to motion forecasting by effectively propagating previous tracklets to the current frame with small overhead. Note that the propagation results are produced by an independent and efficient embedding search, preventing the model from over-relying on detection results. Eventually, it helps to reload the "fake background" and repair broken tracklets. Built on a strong baseline, CSTrack, our new one-shot tracker improves MOTA from 70.7 to 76.4 on MOT16 and from 70.6 to 76.3 on MOT17, reaching new state-of-the-art MOTA and IDF1 performance. Code is released at https://github.com/judasdie/sots.
Current multi-category multi-object tracking (MOT) metrics use class labels to group tracking results for per-class evaluation. Similarly, MOT methods typically only associate objects with the same class prediction. These two prevalent strategies in MOT implicitly assume that classification performance is near perfect. However, this is far from the case in recent large-scale MOT datasets, which contain many classes that are rare or semantically similar. The resulting incorrect classifications therefore lead to sub-optimal tracking and inadequate benchmarking of trackers. We address these issues by disentangling classification from tracking. We introduce a new metric, Track Every Thing Accuracy (TETA), which breaks tracking measurement into three sub-factors: localization, association, and classification, allowing comprehensive benchmarking of tracking performance even under inaccurate classification. TETA also deals with the challenging problem of incomplete annotations in large-scale tracking datasets. We further introduce a Track Every Thing tracker (TETer) that performs association using Class Exemplar Matching (CEM). Our experiments show that TETA evaluates trackers more comprehensively and that TETer achieves significant improvements over the state of the art on the challenging large-scale datasets BDD100K and TAO.
How would you fairly evaluate two multi-object tracking algorithms (i.e. trackers), each one employing a different object detector? Detectors keep improving, thus trackers can make less effort to estimate object states over time. Is it then fair to compare a new tracker employing a new detector with another tracker using an old detector? In this paper, we propose a novel performance measure, named Tracking Effort Measure (TEM), to evaluate trackers that use different detectors. TEM estimates the improvement that the tracker makes with respect to its input data (i.e. detections) at the frame level (intra-frame complexity) and the sequence level (inter-frame complexity). We evaluate TEM over well-known datasets, four trackers and eight detection sets. Results show that, unlike conventional tracking evaluation measures, TEM can quantify the effort done by the tracker with a reduced correlation on the input detections. Its implementation is publicly available online at https://github.com/vpulab/MOT-evaluation.
The tracking-by-detection paradigm today has become the dominant method for multi-object tracking and works by detecting objects in each frame and then performing data association across frames. However, its sequential frame-wise matching property fundamentally suffers from the intermediate interruptions in a video, such as object occlusions, fast camera movements, and abrupt light changes. Moreover, it typically overlooks temporal information beyond the two frames for matching. In this paper, we investigate an alternative by treating object association as clip-wise matching. Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips. The benefits of this new approach are twofold. First, our method is robust to tracking error accumulation or propagation, as the video chunking allows bypassing the interrupted frames, and the short clip tracking avoids the conventional error-prone long-term track memory management. Second, the multiple frame information is aggregated during the clip-wise matching, resulting in a more accurate long-range track association than the current frame-wise matching. Given the state-of-the-art tracking-by-detection tracker, QDTrack, we showcase how the tracking performance improves with our new tracking formulation. We evaluate our proposals on two tracking benchmarks, TAO and MOT17, which have complementary characteristics and challenge each other.
Previous online 3D multi-object tracking (3DMOT) methods terminate a tracklet when it is not associated with new detections for a few frames. But if an object just goes dark, for example when it is temporarily occluded by other objects or simply leaves the field of view, terminating a tracklet prematurely will cause an identity switch. We reveal that premature tracklet termination is the main cause of identity switches in modern 3DMOT systems. To address this, we propose Immortal Tracker, a simple tracking system that utilizes trajectory prediction to maintain tracklets for objects that have gone dark. We employ a simple Kalman filter for trajectory prediction and preserve a tracklet by prediction when the target is not visible. With this method, we can avoid 96% of the vehicle identity switches caused by premature tracklet termination. Without any learned parameters, our method achieves a mismatch ratio at the 0.0001 level and competitive MOTA for the vehicle class on the Waymo Open Dataset test set. Our mismatch ratio is tens of times lower than any previously published method. Similar results are reported on nuScenes. We believe the proposed Immortal Tracker can offer a simple yet powerful solution for pushing the limits of 3DMOT. Our code is available at https://github.com/immortaltracker/immortaltracker.
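The core mechanism is the Kalman predict step applied on every frame, matched or not, so a track lives on while its object is dark. A minimal constant-velocity sketch follows; the state layout and noise magnitudes are assumptions, not the paper's tuned values.

```python
import numpy as np

class ImmortalTrack:
    """Constant-velocity Kalman track that keeps predicting while unmatched."""
    def __init__(self, xyz, dt=0.1):
        self.x = np.hstack([xyz, np.zeros(3)])  # state: [x, y, z, vx, vy, vz]
        self.P = np.eye(6)
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)         # position += velocity * dt
        self.H = np.eye(3, 6)                   # we observe position only
        self.Q, self.R = 0.01 * np.eye(6), 0.1 * np.eye(3)

    def predict(self):                          # called every frame, matched or not
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]                       # predicted position while "dark"

    def update(self, z):                        # only when a detection is associated
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
```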
In surveillance and search-and-rescue applications, it is important to perform multi-target tracking (MOT) in real time on low-end devices. Today's MOT solutions employ deep neural networks, which tend to have high computational complexity. Recognizing the effect of frame size on tracking performance, we propose DeepScale, a model-agnostic frame-size selection approach that operates on top of existing fully convolutional network-based trackers to accelerate tracking throughput. In the training stage, we incorporate detectability scores into a one-shot tracker architecture so that DeepScale learns representation estimation at different frame sizes in a self-supervised manner. During inference, it can adapt frame sizes to the complexity of the visual content based on user-controlled parameters. To exploit the computational resources on edge servers, we propose two computation partitioning modes for MOT, namely edge-server-only with adaptive frame-size transmission and edge-server-assisted tracking. Extensive experiments and benchmark tests on MOT datasets demonstrate the effectiveness and flexibility of DeepScale. Compared with state-of-the-art trackers, DeepScale++, a variant of DeepScale, achieves a 1.57x speedup with only a moderate loss of tracking accuracy on the MOT15 dataset in one configuration. We have implemented and evaluated DeepScale++ and the proposed computation partitioning schemes on a small-scale testbed consisting of an NVIDIA Jetson TX2 board and a GPU server. The experiments show non-trivial trade-offs between tracking performance and latency compared with server-only or smart-camera-only solutions.
Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms. In this paper, we integrate appearance information to improve the performance of SORT. Due to this extension we are able to track objects through longer periods of occlusions, effectively reducing the number of identity switches. In spirit of the original framework we place much of the computational complexity into an offline pre-training stage where we learn a deep association metric on a largescale person re-identification dataset. During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.
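The online association amounts to a nearest-neighbor query in the learned appearance space: each track keeps a gallery of recent embeddings, and a detection's cost is its smallest cosine distance to any of them. A compact sketch, where the per-track gallery structure is an assumption consistent with the abstract:

```python
import numpy as np

def cosine_nn_cost(track_galleries, det_features):
    """track_galleries: list of (K_i, D) arrays of past per-track embeddings.
    det_features: (N, D) embeddings of current detections. Returns the
    (tracks x dets) matrix of smallest cosine distances."""
    dets = det_features / np.linalg.norm(det_features, axis=1, keepdims=True)
    cost = np.zeros((len(track_galleries), len(dets)))
    for t, gallery in enumerate(track_galleries):
        g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
        cost[t] = 1.0 - (g @ dets.T).max(axis=0)  # best match over the gallery
    return cost  # feed to e.g. scipy.optimize.linear_sum_assignment
```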
On one hand, many recent works on 3D multi-object tracking (MOT) in the literature focus on tracking accuracy and neglect computational speed, usually by designing rather complex cost functions and feature extractors. On the other hand, some methods focus excessively on computational speed at the cost of tracking accuracy. In view of these issues, this paper proposes a robust and fast camera-LiDAR fusion-based MOT method that achieves a good trade-off between accuracy and speed. Relying on the characteristics of camera and LiDAR sensors, an effective deep association mechanism is designed and embedded in the proposed MOT method. This association mechanism tracks an object in the 2D domain when the object is far away and only detected by the camera, and updates the 2D trajectory with the 3D information obtained once the object appears in the LiDAR field of view, achieving a smooth fusion of 2D and 3D trajectories. Extensive experiments on typical datasets indicate that our proposed method presents obvious advantages over state-of-the-art MOT methods in terms of both tracking accuracy and processing speed. Our code is made publicly available for the benefit of the community.