作为现代深度学习框架的静态计算图的一部分,评估可可平均平均精度(MAP)和可可召回指标会带来一系列独特的挑战。这些挑战包括需要保持动态大小的状态以计算平均平均精度,对全局数据集级别统计数据计算指标的依赖,以及管理批次中图像之间的边界框不同的数量。结果,研究人员和从业人员将可可指标评估为培训后评估步骤是普遍的实践。使用图形友好的算法来计算可可平均的平均精度和回忆,可以在训练时间评估这些指标,从而提高通过训练曲线图的指标演变的可见性,并在原型进行新模型版本时降低迭代时间。我们的贡献包括平均平均精度的准确近似算法,可可平均平均精度和可可召回的开源实现,广泛的数值基准测试以验证我们实施的准确性以及包括火车时间评估的开源培训循环平均平均精度和回忆。
translated by 谷歌翻译
流行的对象检测度量平均精度(3D AP)依赖于预测的边界框和地面真相边界框之间的结合。但是,基于摄像机的深度估计的精度有限,这可能会导致其他合理的预测,这些预测遭受了如此纵向定位错误,被视为假阳性和假阴性。因此,我们提出了流行的3D AP指标的变体,这些变体旨在在深度估计误差方面更具允许性。具体而言,我们新颖的纵向误差耐受度指标,Let-3D-AP和Let-3D-APL,允许预测的边界框的纵向定位误差,最高为给定的公差。所提出的指标已在Waymo Open DataSet 3D摄像头仅检测挑战中使用。我们认为,它们将通过提供更有信息的性能信号来促进仅相机3D检测领域的进步。
translated by 谷歌翻译
本文研究了涉及对象集,对象检测,实例级分段和多对象跟踪的基本视觉任务的性能评估标准。现有标准的算法排名可能会以不同的参数选择波动,例如联合(IOU)阈值的交叉点使他们的评估不可靠。更重要的是,没有能够验证我们是否可以相信标准的评估。这项工作提出了对性能标准的可信赖性的概念,该概念需要(i)对可靠性的参数鲁棒性,(ii)理智测试中的上下文意义,以及(iii)与数学要求(例如度量属性)的一致性。我们观察到这些要求被许多广泛使用的标准忽略了,并使用一组形状的指标探索替代标准。我们还根据建议的可信度要求评估所有这些标准。
translated by 谷歌翻译
Confluence是对对象检测的边界框后处理中的非墨西哥抑制(NMS)替代的新型非交流(IOU)替代方案。它克服了基于IOU的NMS变体的固有局限性,以通过使用归一化的曼哈顿距离启发的接近度度量来表示边界框聚类的更稳定,一致的预测指标来表示边界框群集。与贪婪和柔软的NMS不同,它不仅依赖分类置信度得分来选择最佳边界框,而是选择与给定群集中最接近其他盒子的框并删除高度汇合的相邻框。在MS Coco和CrowdHuman基准测试中,汇合的平均精度最高2.3-3.8%,而平均召回率则与DEACTO标准和ART NMS NMS变体相比,平均召回率最高为5.3-7.2%。广泛的定性分析和阈值灵敏度分析实验支持了定量结果,这支持了结论,即汇合比NMS变体更健壮。 Confluence代表边界框处理中的范式变化,有可能在边界框回归过程中替换IOU。
translated by 谷歌翻译
空中无人机镜头的视觉检查是当今土地搜索和救援(SAR)运营的一个组成部分。由于此检查是对人类的缓慢而繁琐,令人疑惑的工作,我们提出了一种新颖的深入学习算法来自动化该航空人员检测(APD)任务。我们试验模型架构选择,在线数据增强,转移学习,图像平铺和其他几种技术,以提高我们方法的测试性能。我们将新型航空检验视网膜(空气)算法呈现为这些贡献的结合。空中探测器在精度(〜21个百分点增加)和速度方面,在常用的SAR测试数据上表现出最先进的性能。此外,我们为SAR任务中的APD问题提供了新的正式定义。也就是说,我们提出了一种新的评估方案,在现实世界SAR本地化要求方面排名探测器。最后,我们提出了一种用于稳健的新型后处理方法,近似对象定位:重叠边界框(MOB)算法的合并。在空中检测器中使用的最终处理阶段在真实的空中SAR任务面前显着提高了其性能和可用性。
translated by 谷歌翻译
Asteroids are an indelible part of most astronomical surveys though only a few surveys are dedicated to their detection. Over the years, high cadence microlensing surveys have amassed several terabytes of data while scanning primarily the Galactic Bulge and Magellanic Clouds for microlensing events and thus provide a treasure trove of opportunities for scientific data mining. In particular, numerous asteroids have been observed by visual inspection of selected images. This paper presents novel deep learning-based solutions for the recovery and discovery of asteroids in the microlensing data gathered by the MOA project. Asteroid tracklets can be clearly seen by combining all the observations on a given night and these tracklets inform the structure of the dataset. Known asteroids were identified within these composite images and used for creating the labelled datasets required for supervised learning. Several custom CNN models were developed to identify images with asteroid tracklets. Model ensembling was then employed to reduce the variance in the predictions as well as to improve the generalisation error, achieving a recall of 97.67%. Furthermore, the YOLOv4 object detector was trained to localize asteroid tracklets, achieving a mean Average Precision (mAP) of 90.97%. These trained networks will be applied to 16 years of MOA archival data to find both known and unknown asteroids that have been observed by the survey over the years. The methodologies developed can be adapted for use by other surveys for asteroid recovery and discovery.
translated by 谷歌翻译
Intersection over Union (IoU) is the most popular evaluation metric used in the object detection benchmarks. However, there is a gap between optimizing the commonly used distance losses for regressing the parameters of a bounding box and maximizing this metric value. The optimal objective for a metric is the metric itself. In the case of axisaligned 2D bounding boxes, it can be shown that IoU can be directly used as a regression loss. However, IoU has a plateau making it infeasible to optimize in the case of nonoverlapping bounding boxes. In this paper, we address the weaknesses of IoU by introducing a generalized version as both a new loss and a new metric. By incorporating this generalized IoU (GIoU ) as a loss into the state-of-the art object detection frameworks, we show a consistent improvement on their performance using both the standard, IoU based, and new, GIoU based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.
translated by 谷歌翻译
最近的多目标跟踪(MOT)系统利用高精度的对象探测器;然而,培训这种探测器需要大量标记的数据。虽然这种数据广泛适用于人类和车辆,但其他动物物种显着稀缺。我们目前稳健的置信跟踪(RCT),一种算法,旨在保持鲁棒性能,即使检测质量差。与丢弃检测置信信息的先前方法相比,RCT采用基本上不同的方法,依赖于精确的检测置信度值来初始化曲目,扩展轨道和滤波器轨道。特别地,RCT能够通过有效地使用低置信度检测(以及单个物体跟踪器)来最小化身份切换,以保持对象的连续轨道。为了评估在存在不可靠的检测中的跟踪器,我们提出了一个挑战的现实世界水下鱼跟踪数据集,Fishtrac。在对FISHTRAC以及UA-DETRAC数据集的评估中,我们发现RCT在提供不完美的检测时优于其他算法,包括最先进的深单和多目标跟踪器以及更经典的方法。具体而言,RCT具有跨越方法的最佳平均热量,可以成功返回所有序列的结果,并且具有比其他方法更少的身份交换机。
translated by 谷歌翻译
Non-maximum suppression is an integral part of the object detection pipeline. First, it sorts all detection boxes on the basis of their scores. The detection box M with the maximum score is selected and all other detection boxes with a significant overlap (using a pre-defined threshold) with M are suppressed. This process is recursively applied on the remaining boxes. As per the design of the algorithm, if an object lies within the predefined overlap threshold, it leads to a miss. To this end, we propose Soft-NMS, an algorithm which decays the detection scores of all other objects as a continuous function of their overlap with M. Hence, no object is eliminated in this process. Soft-NMS obtains consistent improvements for the coco-style mAP metric on standard datasets like PASCAL VOC 2007 (1.7% for both R-FCN and Faster-RCNN) and MS-COCO (1.3% for R-FCN and 1.1% for Faster-RCNN) by just changing the NMS algorithm without any additional hyper-parameters. UsingDeformable-RFCN, Soft-NMS improves state-of-the-art in object detection from 39.8% to 40.9% with a single model. Further, the computational complexity of Soft-NMS is the same as traditional NMS and hence it can be efficiently implemented. Since Soft-NMS does not require any extra training and is simple to implement, it can be easily integrated into any object detection pipeline. Code for Soft-NMS is publicly available on GitHub http://bit.ly/ 2nJLNMu.
translated by 谷歌翻译
自动对象检测器的本地化质量通常通过联合(IOU)分数进行评估。在这项工作中,我们表明人类对本地化质量有不同的看法。为了评估这一点,我们对70多名参与者进行了调查。结果表明,对于以完全相同的评分而言,人类可能不会认为这些错误是相等的,并且表达了偏好。我们的工作是第一个与人类一起评估IOU的工作,并清楚地表明,仅依靠IOU分数来评估本地化错误可能还不够。
translated by 谷歌翻译
在本文中,我们通过将无线电信息结合到最先进的检测方法中提出了一种无线电辅助人类检测框架,包括基于锚的oneStage检测器和两级检测器。我们从无线电信号中提取无线电定位和标识符信息以帮助人类检测,由于哪种错误阳性和假否定的问题可能会大大缓解。对于两个探测器,我们使用基于无线电定位的置信度评分修订来提高检测性能。对于两级检测方法,我们建议利用无线电定位产生的区域提案,而不是依赖于区域提案网络(RPN)。此外,利用无线电标识符信息,还提出了具有无线电定位约束的非最大抑制方法,以进一步抑制假检测并减少错过的检测。模拟Microsoft Coco DataSet和CALTECH步行数据集的实验表明,借助无线电信息可以改善平均平均精度(地图)和最先进的检测方法的错过率。最后,我们在现实世界的情况下进行实验,以展示我们在实践中的提出方法的可行性。
translated by 谷歌翻译
通过查找图像可能不满意的图像来捕获对象检测器的错误行为,这一兴趣很长。在实际应用(例如自动驾驶)中,对于表征除了简单的检测性能要求之外的潜在失败也至关重要。例如,与远处未遗漏的汽车检测相比,错过对靠近自我车辆的行人的侦查通常需要更仔细的检查。在测试时间预测这种潜在失败的问题在文献和基于检测不确定性的传统方法中被忽略了,因为它们对这种错误的细粒度表征不可知。在这项工作中,我们建议将查找“硬”图像作为基于查询的硬图像检索任务的问题进行重新制定,其中查询是“硬度”的特定定义,并提供了一种简单而直观的方法,可以解决此任务大型查询家庭。我们的方法完全是事后的,不需要地面真相注释,独立于检测器的选择,并且依赖于有效的蒙特卡洛估计,该估计使用简单的随机模型代替地面真相。我们通过实验表明,它可以成功地应用于各种查询中,它可以可靠地识别给定检测器的硬图像,而无需任何标记的数据。我们使用广泛使用的视网膜,更快的RCNN,Mask-RCNN和CASCADE MASK-RCNN对象检测器提供有关排名和分类任务的结果。
translated by 谷歌翻译
交通灯检测对于自动驾驶汽车在城市地区安全导航至关重要。公开可用的交通灯数据集不足以开发用于检测提供重要导航信息的遥远交通信号灯的算法。我们介绍了一个新颖的基准交通灯数据集,该数据集使用一对涵盖城市和半城市道路的狭窄角度和广角摄像机捕获。我们提供1032张训练图像和813个同步图像对进行测试。此外,我们提供同步视频对进行定性分析。该数据集包括第1920 $ \ times $ 1080的分辨率图像,覆盖10个不同类别。此外,我们提出了一种用于结合两个相机输出的后处理算法。结果表明,与使用单个相机框架的传统方法相比,我们的技术可以在速度和准确性之间取得平衡。
translated by 谷歌翻译
在对象检测中,当检测器未能检测到目标对象时,会出现假阴性。为了了解为什么对象检测产生假阴性,我们确定了五个“假负机制”,其中每个机制都描述了检测器体系结构内部的特定组件如何失败。着眼于两阶段和一阶段锚点对象检测器体系结构,我们引入了一个框架,用于量化这些虚假的负面机制。使用此框架,我们调查了为什么更快的R-CNN和视网膜无法检测基准视觉数据集和机器人数据集中的对象。我们表明,检测器的假负机制在计算机视觉基准数据集和机器人部署方案之间存在显着差异。这对为机器人应用程序开发的对象检测器的翻译具有影响。
translated by 谷歌翻译
The task of locating and classifying different types of vehicles has become a vital element in numerous applications of automation and intelligent systems ranging from traffic surveillance to vehicle identification and many more. In recent times, Deep Learning models have been dominating the field of vehicle detection. Yet, Bangladeshi vehicle detection has remained a relatively unexplored area. One of the main goals of vehicle detection is its real-time application, where `You Only Look Once' (YOLO) models have proven to be the most effective architecture. In this work, intending to find the best-suited YOLO architecture for fast and accurate vehicle detection from traffic images in Bangladesh, we have conducted a performance analysis of different variants of the YOLO-based architectures such as YOLOV3, YOLOV5s, and YOLOV5x. The models were trained on a dataset containing 7390 images belonging to 21 types of vehicles comprising samples from the DhakaAI dataset, the Poribohon-BD dataset, and our self-collected images. After thorough quantitative and qualitative analysis, we found the YOLOV5x variant to be the best-suited model, performing better than YOLOv3 and YOLOv5s models respectively by 7 & 4 percent in mAP, and 12 & 8.5 percent in terms of Accuracy.
translated by 谷歌翻译
尽管广泛用作可视检测任务的性能措施,但平均精度(AP)In(i)的限制在反映了本地化质量,(ii)对其计算的设计选择的鲁棒性以及其对输出的适用性没有信心分数。 Panoptic质量(PQ),提出评估Panoptic Seationation(Kirillov等,2019)的措施,不会遭受这些限制,而是限于Panoptic Seationation。在本文中,我们提出了基于其本地化和分类质量的视觉检测器的平均匹配误差,提出了定位召回精度(LRP)误差。 LRP错误,最初仅为Oksuz等人进行对象检测。 (2018),不遭受上述限制,适用于所有视觉检测任务。我们还介绍了最佳LRP(OLRP)错误,因为通过置信区获得的最小LRP错误以评估视觉检测器并获得部署的最佳阈值。我们提供对AP和PQ的LRP误差的详细比较分析,并使用七个可视检测任务(即对象检测,关键点检测,实例分割,Panoptic分段,视觉关系检测,使用近100个最先进的视觉检测器零拍摄检测和广义零拍摄检测)使用10个数据集来统一地显示LRP误差提供比其对应物更丰富和更辨别的信息。可用的代码:https://github.com/kemaloksuz/lrp-error
translated by 谷歌翻译
Object detectors, which are widely deployed in security-critical systems such as autonomous vehicles, have been found vulnerable to patch hiding attacks. An attacker can use a single physically-realizable adversarial patch to make the object detector miss the detection of victim objects and undermine the functionality of object detection applications. In this paper, we propose ObjectSeeker for certifiably robust object detection against patch hiding attacks. The key insight in ObjectSeeker is patch-agnostic masking: we aim to mask out the entire adversarial patch without knowing the shape, size, and location of the patch. This masking operation neutralizes the adversarial effect and allows any vanilla object detector to safely detect objects on the masked images. Remarkably, we can evaluate ObjectSeeker's robustness in a certifiable manner: we develop a certification procedure to formally determine if ObjectSeeker can detect certain objects against any white-box adaptive attack within the threat model, achieving certifiable robustness. Our experiments demonstrate a significant (~10%-40% absolute and ~2-6x relative) improvement in certifiable robustness over the prior work, as well as high clean performance (~1% drop compared with undefended models).
translated by 谷歌翻译
布局分析(LA)阶段对光学音乐识别(OMR)系统的正确性能至关重要。它标识了感兴趣的区域,例如Staves或歌词,然后必须处理,以便转录它们的内容。尽管存在基于深度学习的现代方法,但在不同模型的精度,它们对不同领域的概括或更重要的是,它们尚未开展对OMR的详尽研究,或者更重要的是,它们对后续阶段的影响管道。这项工作侧重于通过对不同神经结构,音乐文档类型和评估方案的实验研究填补文献中的这种差距。培训数据的需求也导致了一种新的半合成数据生成技术的提议,这使得LA方法在真实情况下能够有效适用性。我们的结果表明:(i)该模型的选择及其性能对于整个转录过程至关重要; (ii)(ii)常用于评估LA阶段的指标并不总是与OMR系统的最终性能相关,并且(iii)所提出的数据生成技术使最先进的结果能够以有限的限制实现标记数据集。
translated by 谷歌翻译
水果苍蝇是果实产量最有害的昆虫物种之一。在AlertTrap中,使用不同的最先进的骨干功能提取器(如MobiLenetv1和MobileNetv2)的SSD架构的实现似乎是实时检测问题的潜在解决方案。SSD-MobileNetv1和SSD-MobileNetv2表现良好并导致AP至0.5分别为0.957和1.0。YOLOV4-TINY优于SSD家族,在AP@0.5中为1.0;但是,其吞吐量速度略微慢。
translated by 谷歌翻译
How would you fairly evaluate two multi-object tracking algorithms (i.e. trackers), each one employing a different object detector? Detectors keep improving, thus trackers can make less effort to estimate object states over time. Is it then fair to compare a new tracker employing a new detector with another tracker using an old detector? In this paper, we propose a novel performance measure, named Tracking Effort Measure (TEM), to evaluate trackers that use different detectors. TEM estimates the improvement that the tracker does with respect to its input data (i.e. detections) at frame level (intra-frame complexity) and sequence level (inter-frame complexity). We evaluate TEM over well-known datasets, four trackers and eight detection sets. Results show that, unlike conventional tracking evaluation measures, TEM can quantify the effort done by the tracker with a reduced correlation on the input detections. Its implementation is publicly available online at https://github.com/vpulab/MOT-evaluation.
translated by 谷歌翻译