Crowd localization aims to predict the spatial positions of people in crowd scenes. We observe that the performance of existing methods is limited in two respects: (i) ranking inconsistency between the training and test phases; and (ii) a fixed anchor resolution may underfit or overfit the crowd densities of local regions. To address these problems, we design a supervision-target reassignment strategy for training that reduces ranking inconsistency, and we propose an anchor pyramid scheme that adaptively determines the anchor density in each image region. Extensive experiments on three widely adopted datasets (ShanghaiTech A&B, JHU-CROWD++, UCF-QNRF) demonstrate favorable performance against several state-of-the-art methods.
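In this paper, we focus on the crowd localization task, a key topic in crowd analysis. Most regression-based methods use convolutional neural networks (CNNs) to regress a density map. Such maps cannot accurately localize people in extremely dense scenes, for two crucial reasons: 1) the density map consists of a series of blurry Gaussian blobs, and 2) there is severe overlap in the dense regions of the density map. To tackle this issue, we propose a novel Focal Inverse Distance Transform (FIDT) map for the crowd localization task. Compared with density maps, FIDT maps accurately describe people's locations without overlap in dense regions. Based on FIDT maps, a Local-Maxima-Detection-Strategy (LMDS) is derived to effectively extract the center point of each person. Furthermore, we introduce an Independent SSIM (I-SSIM) loss that makes the model tend to learn local structural information, which helps it recognize local maxima more reliably. Extensive experiments demonstrate that the proposed method reports state-of-the-art localization performance on six crowd datasets and one vehicle dataset. Additionally, we find that the proposed method shows superior robustness on negative and extremely dense scenes, which further verifies the effectiveness of FIDT maps. The code and models will be available at https://github.com/dk-liang/fidtm.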
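To make the FIDT map and the local-maxima step concrete, here is a minimal pure-Python sketch on a toy grid. It assumes the form I = 1 / (P^(αP+β) + C) with α = 0.02, β = 0.75, C = 1, where P is the distance to the nearest head. That form is our reading of the FIDT paper and does not come from this abstract. The 3x3 window and the relative threshold in `local_maxima` are illustrative choices, not necessarily those used by LMDS.

```python
import math


def fidt_map(height, width, heads, alpha=0.02, beta=0.75, c=1.0):
    """Focal inverse distance transform for a list of (row, col) head points.

    Assumed form: I = 1 / (P ** (alpha * P + beta) + c), where P is the
    Euclidean distance from a pixel to its nearest head. Needs >= 1 head.
    """
    fidt = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for col in range(width):
            # Brute-force nearest-head distance; adequate for toy grids only.
            p = min(math.hypot(r - hr, col - hc) for hr, hc in heads)
            fidt[r][col] = 1.0 / (p ** (alpha * p + beta) + c)
    return fidt


def local_maxima(fmap, rel_threshold=0.5):
    """Return (row, col) pixels that are 3x3 local maxima above a relative threshold."""
    h, w = len(fmap), len(fmap[0])
    peak = max(max(row) for row in fmap)
    points = []
    for r in range(h):
        for col in range(w):
            v = fmap[r][col]
            if v < rel_threshold * peak:
                continue  # too weak to be a head centre
            window = [fmap[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, col - 1), min(w, col + 2))]
            if v >= max(window):
                points.append((r, col))
    return points


print(local_maxima(fidt_map(10, 10, [(2, 2), (6, 7)])))  # [(2, 2), (6, 7)]
```

Every head becomes an isolated peak of value 1, and all other pixels are strictly smaller. As a result, nearby heads do not merge the way overlapping Gaussian blobs in a density map can.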
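Crowd localization (predicting head positions) is a more practical and higher-level task than mere counting. Existing methods employ pseudo-bounding boxes or pre-designed localization maps and rely on complex post-processing to obtain head positions. In this paper, we propose an elegant, end-to-end Crowd Localization TRansformer, named CLTR, that solves the task in a regression-based paradigm. The proposed method treats crowd localization as a direct set prediction problem, taking extracted features and trainable embeddings as input to the transformer decoder. To reduce ambiguous points and generate more reasonable matching results, we introduce a KMO-based Hungarian matcher, which uses the nearby context as an auxiliary matching cost. Extensive experiments on five datasets under various data settings show the effectiveness of our method. In particular, the proposed method achieves the best localization performance on NWPU-Crowd, UCF-QNRF, and ShanghaiTech Part A.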
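We present ObjectBox, a novel single-stage, anchor-free, and highly generalizable object detection approach. Existing anchor-based and anchor-free detectors are biased toward specific object scales in their label assignment. In contrast, we use only object center locations as positive samples and treat all objects equally at different feature levels, regardless of the objects' sizes or shapes. Specifically, our label assignment strategy considers the object center locations as shape- and size-agnostic anchors in an anchor-free fashion, and allows learning to occur at all scales for every object. To support this, we define new regression targets as the distances from two corners of the center cell location to the four sides of the bounding box. Moreover, to handle scale-variant objects, we propose a tailored loss to deal with boxes of different sizes. As a result, our proposed object detector does not need any dataset-dependent hyperparameters to be tuned across datasets. We evaluate our method on the MS-COCO 2017 and PASCAL VOC 2012 datasets and compare our results with state-of-the-art methods. We observe that ObjectBox performs favorably in comparison with prior works. Furthermore, we perform rigorous ablation experiments to evaluate the different components of our method. Our code is available at: https://github.com/mohsenzand/objectbox.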
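One way to read the regression target described above is sketched below. The sketch is hypothetical. It assumes that the left/top offsets are measured from the top-left corner of the grid cell containing the box centre, that the right/bottom offsets are measured from its bottom-right corner, and that all distances are in pixels. ObjectBox's exact normalisation may differ.

```python
def center_cell_targets(box, stride):
    """Hypothetical ObjectBox-style regression targets for box = (x1, y1, x2, y2).

    Assumption: left/top are measured from the top-left corner of the grid
    cell containing the box centre, right/bottom from its bottom-right corner.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    row, col = int(cy // stride), int(cx // stride)
    tl_x, tl_y = col * stride, row * stride      # top-left corner of the cell
    br_x, br_y = tl_x + stride, tl_y + stride    # bottom-right corner of the cell
    return (row, col), (tl_x - x1, tl_y - y1, x2 - br_x, y2 - br_y)


def decode(cell, targets, stride):
    """Invert center_cell_targets: recover (x1, y1, x2, y2) from cell and distances."""
    row, col = cell
    left, top, right, bottom = targets
    tl_x, tl_y = col * stride, row * stride
    return (tl_x - left, tl_y - top, tl_x + stride + right, tl_y + stride + bottom)


cell, t = center_cell_targets((10, 20, 50, 60), stride=8)
print(cell, t)                    # (5, 3) (14, 20, 18, 12)
print(decode(cell, t, stride=8))  # (10, 20, 50, 60)
```

For boxes smaller than one cell, some of these targets become negative. The abstract does not say how that case is handled.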
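Object detection has made tremendous progress in computer vision. Detecting small objects with degraded appearance remains a prominent challenge, especially for aerial (bird's-eye-view) observation. To collect sufficient positive/negative samples for heuristic training, most object detectors preset region anchors and compute their Intersection-over-Union (IoU) against the ground-truth data. In this setting, small objects are frequently discarded or mislabeled. In this paper, we propose an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator. Unlike other state-of-the-art techniques, the proposed network uses a sample discriminator to perform interactive sample screening between an anchor-based unit and an anchor-free unit, generating eligible samples. In addition, multi-task joint training with a conservative anchor-based inference scheme enhances the performance of the proposed model while reducing computational complexity. The proposed scheme supports both oriented and horizontal object detection tasks. Extensive experiments on two challenging aerial benchmarks (i.e., DOTA and HRSC2016) indicate that our method achieves state-of-the-art accuracy with moderate inference speed and moderate computational overhead for training. On DOTA, our DEA-Net integrated with the RoI-Transformer baseline surpasses the advanced method by 0.40% mean Average Precision (mAP) with a weaker backbone network (ResNet-101 vs. ResNet-152), and by 3.08% mAP for horizontal object detection with the same backbone. Moreover, our DEA-Net integrated with the ReDet baseline achieves state-of-the-art performance of 80.37%. On HRSC2016, it surpasses the previous best model by 1.1% using only 3 horizontal anchors.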
Crowd counting is usually handled as density-map regression, supervised via an L2 loss between the predicted density map and the ground truth. To regulate models effectively, various improved L2 loss functions have been proposed to find a better correspondence between the predicted density and the annotated positions. In this paper, we propose to predict the density map at one resolution but measure it at multiple resolutions. By maximizing the posterior probability in this setting, we obtain a log-formed multi-resolution L2-difference loss, of which the traditional single-resolution L2 loss is a particular case. We mathematically prove that it is superior to a single-resolution L2 loss. Without bells and whistles, the proposed loss substantially improves several baselines and performs favorably against state-of-the-art methods on four crowd counting datasets: ShanghaiTech A & B, UCF-QNRF, and JHU-Crowd++.
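The sketch below illustrates the idea of measuring one predicted density map at several resolutions. It sum-pools both maps and accumulates squared errors at each scale. It is a simplification, not the paper's loss: the paper's loss is log-formed, with weights derived from the posterior, whereas this sketch uses a plain weighted sum. The pooling factors (1, 2, 4) are illustrative.

```python
def sum_pool(dmap, factor):
    """Sum-pool a 2-D density map by `factor`; the total count is preserved."""
    h, w = len(dmap) // factor, len(dmap[0]) // factor
    return [[sum(dmap[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor))
             for c in range(w)]
            for r in range(h)]


def multi_resolution_l2(pred, gt, factors=(1, 2, 4), weights=None):
    """Weighted sum of squared errors between pooled prediction and ground truth."""
    weights = weights if weights is not None else [1.0] * len(factors)
    total = 0.0
    for factor, weight in zip(factors, weights):
        p, g = sum_pool(pred, factor), sum_pool(gt, factor)
        total += weight * sum((a - b) ** 2
                              for p_row, g_row in zip(p, g)
                              for a, b in zip(p_row, g_row))
    return total


def point_map(r, c, size=4):
    """Toy density map with a single unit mass at (r, c)."""
    m = [[0.0] * size for _ in range(size)]
    m[r][c] = 1.0
    return m


gt = point_map(0, 0)
print(multi_resolution_l2(point_map(0, 1), gt))                # 2.0
print(multi_resolution_l2(point_map(3, 3), gt))                # 4.0
print(multi_resolution_l2(point_map(3, 3), gt, factors=(1,)))  # 2.0 (plain L2)
```

A single-resolution L2 loss scores a one-pixel misplacement and a distant misplacement identically (both 2.0). The multi-resolution version penalises the distant one more, because the error survives pooling to coarser scales.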
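Anchor-free detectors essentially formulate object detection as dense classification and regression. In popular anchor-free detectors, it is common to introduce an individual prediction branch to estimate localization quality. Examining the practices of classification and quality estimation closely, we observe the following inconsistencies. First, for some adjacent samples that are assigned completely different labels, the trained model produces similar classification scores. This violates the training objective and leads to performance degradation. Second, detected bounding boxes with higher confidence are found to have smaller overlaps with the corresponding ground truth. As a result, accurately localized bounding boxes get suppressed by less accurate ones during non-maximum suppression (NMS). To address these inconsistencies, we propose the Dynamic Smooth Label Assignment (DSLA) method. Building on the concept of centerness originally developed in FCOS, we propose a smooth assignment strategy. The label is smoothed to a continuous value in [0, 1] to give a steady transition between positive and negative samples. Intersection-over-Union (IoU) is predicted dynamically during training and coupled with the smoothed label. The resulting dynamic smooth label supervises the classification branch. Under such supervision, the quality estimation branch is naturally merged into the classification branch, which simplifies the architecture of anchor-free detectors. Comprehensive experiments are conducted on the MS COCO benchmark. They demonstrate that DSLA significantly boosts detection accuracy by alleviating the above inconsistencies in anchor-free detectors. Our code is released at https://github.com/yonghaohe/dsla.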
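Here is a minimal sketch of a smooth label that couples FCOS centerness with IoU. The FCOS centerness formula is standard. The combination is an assumption: the soft target is taken to be the product centerness × IoU(predicted box, GT), and locations outside the GT box get 0. DSLA's actual smoothing function may differ.

```python
import math


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def centerness(l, t, r, b):
    """FCOS centerness: 1 at the box centre, decaying toward the edges."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def smooth_label(point, gt_box, pred_box):
    """Assumed soft classification target: centerness x IoU(pred, gt); 0 outside the box."""
    x, y = point
    x1, y1, x2, y2 = gt_box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    if min(l, t, r, b) <= 0:
        return 0.0
    return centerness(l, t, r, b) * iou(pred_box, gt_box)


gt = (0, 0, 10, 10)
print(smooth_label((5, 5), gt, gt))             # 1.0
print(smooth_label((5, 5), gt, (0, 0, 10, 5)))  # 0.5 (poorly localised prediction)
print(smooth_label((2, 5), gt, gt))             # 0.5 (off-centre location)
```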
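In real-world crowd counting applications, crowd density varies greatly within an image. When facing density variation, humans tend to locate and count targets in low-density regions and to reason about the number in high-density regions. We observe that CNNs focus on local information correlation through fixed-size convolution kernels, whereas Transformers can effectively extract semantic crowd information through the global self-attention mechanism. Thus, CNNs can locate and estimate crowds accurately in low-density regions but struggle to perceive density correctly in high-density regions. Conversely, Transformers are highly reliable in high-density regions but fail to locate targets in sparse regions. Neither CNNs nor Transformers alone handle such density variation well. To address this problem, we propose a CNN and Transformer Adaptive Selection Network (CTASNet), which adaptively selects the appropriate counting branch for regions of different density. First, CTASNet generates the predictions of both the CNN and the Transformer. Then, since CNNs suit low-density regions and Transformers suit high-density regions, a density-guided adaptive selection module is designed to combine the two predictions automatically. Moreover, to reduce the influence of annotation noise, we introduce a Correntropy-based optimal transport loss. Extensive experiments on four challenging crowd counting datasets validate the proposed method.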
Single-frame InfraRed Small Target (SIRST) detection has been a challenging task due to the lack of distinctive intrinsic features, imprecise bounding-box regression, the scarcity of real-world datasets, and sensitive localization evaluation. In this paper, we propose a comprehensive solution to these challenges. First, we find that the existing anchor-free label assignment method is prone to mislabeling small targets as background, leading detectors to miss them. To overcome this issue, we propose an all-scale pseudo-box-based label assignment scheme that relaxes the constraints on scale and decouples the spatial assignment from the size of the ground-truth target. Second, motivated by the structured prior of feature pyramids, we introduce the one-stage cascade refinement network (OSCAR), which uses the high-level head as soft proposals for the low-level refinement head. This allows OSCAR to process the same target in a cascaded, coarse-to-fine manner. Finally, we present a new research benchmark for infrared small target detection, consisting of the SIRST-V2 dataset of real-world, high-resolution single-frame targets, the normalized contrast evaluation metric, and the DeepInfrared toolkit for detection. We conduct extensive ablation studies to evaluate the components of OSCAR and compare its performance with state-of-the-art model-driven and data-driven methods on the SIRST-V2 benchmark. Our results demonstrate that a top-down cascade refinement framework can improve the accuracy of infrared small target detection without sacrificing efficiency. The DeepInfrared toolkit, dataset, and trained models are available at https://github.com/YimianDai/open-deepinfrared to advance further research in this field.
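To illustrate why relaxing the scale constraint helps tiny targets, the sketch below expands each GT box to a minimum pseudo-box size, then checks which feature-cell centres fall inside it. `min_size` and the centre-in-box rule are assumptions for illustration, not OSCAR's exact scheme.

```python
def pseudo_box_positives(gt_box, stride, feat_h, feat_w, min_size=16):
    """Hypothetical pseudo-box label assignment on one feature level.

    The GT box is expanded (around its centre) to at least `min_size` per
    side, so even a target smaller than one stride receives positives.
    """
    x1, y1, x2, y2 = gt_box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = max(x2 - x1, min_size) / 2
    half_h = max(y2 - y1, min_size) / 2
    positives = []
    for row in range(feat_h):
        for col in range(feat_w):
            px, py = (col + 0.5) * stride, (row + 0.5) * stride  # cell centre in pixels
            if abs(px - cx) <= half_w and abs(py - cy) <= half_h:
                positives.append((row, col))
    return positives


tiny = (30, 30, 33, 33)  # a 3x3-pixel target
print(pseudo_box_positives(tiny, 8, 8, 8, min_size=0))   # []
print(pseudo_box_positives(tiny, 8, 8, 8, min_size=16))  # [(3, 3), (3, 4), (4, 3), (4, 4)]
```

With the raw box (`min_size=0`), no cell centre lands inside the 3x3 target, so it would be labelled background. The pseudo-box recovers four positives.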
Recent one-stage object detectors follow a per-pixel prediction approach that predicts both the object category scores and the boundary positions from every single grid location. However, the most suitable positions for inferring different targets, i.e., the object category and the boundaries, are generally different. Predicting all these targets from the same grid location may therefore lead to sub-optimal results. In this paper, we analyze the suitable inference positions for the object category and boundaries, and propose a prediction-target-decoupled detector named PDNet to establish a more flexible detection paradigm. With its prediction decoupling mechanism, PDNet encodes different targets separately at different locations. A learnable prediction collection module is devised with two sets of dynamic points, i.e., dynamic boundary points and semantic points, to collect and aggregate predictions from the regions most favorable for localization and classification. We adopt a two-step strategy to learn these dynamic point positions: prior positions are first estimated for the different targets, and the network then predicts residual offsets to these positions to better perceive the object properties. Extensive experiments on the MS COCO benchmark demonstrate the effectiveness and efficiency of our method. With a single ResNeXt-64x4d-101-DCN backbone, our detector achieves 50.1 AP with single-scale testing, outperforming state-of-the-art methods by an appreciable margin under the same experimental settings. Moreover, our detector is highly efficient as a one-stage framework. Our code is public at https://github.com/yangli18/PDNet.
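Label assignment plays a significant role in modern object detection models. Detection models may perform very differently under different label assignment strategies. For anchor-based detection models, the IoU (Intersection over Union) between anchors and their corresponding ground-truth bounding boxes is the key element, since positive and negative samples are divided by an IoU threshold. Early object detectors simply used a fixed threshold for all training samples, whereas recent detection algorithms focus on adaptive thresholds based on the distribution of IoUs to ground-truth boxes. In this paper, we introduce a simple yet effective approach that performs label assignment dynamically based on the training status of the predictions. By introducing predictions into label assignment, more high-quality samples with higher IoUs to ground-truth objects are selected as positives. This reduces the discrepancy between classification scores and IoU scores and yields higher-quality bounding boxes. Our approach improves the performance of detection models that use adaptive label assignment algorithms and lowers the bounding-box losses of the positive samples, indicating that more samples with higher-quality predicted boxes are selected as positives.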
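One plausible form of prediction-aware assignment mixes each anchor's IoU with the IoU of the box regressed from that anchor. The mixing weight `lam` and the fixed `threshold` below are hypothetical. The paper combines predictions with adaptive thresholds whose exact form this abstract does not give.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def dynamic_assign(anchors, preds, gt, lam=0.5, threshold=0.7):
    """Hypothetical prediction-aware label assignment for one GT box.

    Quality mixes the anchor's IoU with the IoU of the box predicted from that
    anchor; lam=0 recovers a fixed anchor-IoU threshold rule.
    """
    positives = []
    for idx, (anchor, pred) in enumerate(zip(anchors, preds)):
        quality = (1 - lam) * iou(anchor, gt) + lam * iou(pred, gt)
        if quality >= threshold:
            positives.append(idx)
    return positives


gt = (0, 0, 10, 10)
anchors = [(0, 0, 10, 8), (0, 0, 10, 6)]  # anchor IoUs with gt: 0.8 and 0.6
preds = [(20, 20, 30, 30), gt]            # anchor 0 predicts badly, anchor 1 perfectly
print(dynamic_assign(anchors, preds, gt, lam=0.0))  # [0]
print(dynamic_assign(anchors, preds, gt, lam=0.5))  # [1]
```

With `lam=0`, the anchor whose own box overlaps most wins, even though its prediction is useless. Mixing in the prediction moves the positive label to the anchor that actually regresses a good box.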
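In this study, we dive deep into the unique challenges faced by semi-supervised object detection (SSOD). We observe that current detectors generally suffer from three inconsistency problems. 1) Assignment inconsistency: conventional assignment strategies are sensitive to labeling noise. 2) Subtask inconsistency: the classification and regression predictions are misaligned at the same feature point. 3) Temporal inconsistency: pseudo-bboxes vary dramatically across training steps. These problems lead to inconsistent optimization objectives for the student network, which deteriorates performance and slows model convergence. We therefore propose a systematic solution, termed Consistent Teacher, to remedy the above challenges. First, adaptive anchor assignment replaces the static strategy, which makes the student network resistant to noisy pseudo-bboxes. Then, we calibrate the subtask predictions by designing a feature alignment module. Finally, we adopt a Gaussian Mixture Model (GMM) to dynamically adjust the pseudo-box threshold. Consistent Teacher provides a new strong baseline on a wide range of SSOD evaluations. With only 10% of annotated MS-COCO data, it achieves 40.0 mAP with a ResNet-50 backbone, surpassing previous pseudo-label-based baselines by 4 mAP. When trained on fully annotated MS-COCO with additional unlabeled data, the performance further increases to 49.1 mAP. Our code will be open-sourced soon.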
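The GMM-based pseudo-box threshold can be sketched as follows. A two-component 1-D mixture is fitted to teacher confidence scores with EM, and boxes claimed by the high-score component are kept. This is an assumption about how such a threshold could be derived, not Consistent Teacher's published procedure. The initialisation, iteration count, and variance floor are illustrative.

```python
import math


def _normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_threshold(scores, iters=50, min_var=1e-4):
    """Fit a two-component 1-D Gaussian mixture to scores with EM and return the
    lowest score whose posterior for the higher-mean component exceeds 0.5."""
    lo, hi = min(scores), max(scores)
    means = [lo, hi]  # initialise one component at each extreme
    variances = [max(((hi - lo) / 4) ** 2, min_var)] * 2
    weights = [0.5, 0.5]

    def joint(s):
        return [weights[k] * _normal_pdf(s, means[k], variances[k]) for k in range(2)]

    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        resp = []
        for s in scores:
            p = joint(s)
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixture weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            variances[k] = max(
                sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk,
                min_var)

    high = 0 if means[0] > means[1] else 1

    def posterior_high(s):
        p = joint(s)
        return p[high] / (sum(p) or 1e-300)

    confident = [s for s in scores if posterior_high(s) > 0.5]
    return min(confident) if confident else hi


scores = [0.10, 0.12, 0.15, 0.20, 0.18, 0.80, 0.85, 0.90, 0.88, 0.82]
print(gmm_threshold(scores))  # 0.8
```

Because the threshold is re-fitted to each batch of scores, it can follow the teacher's confidence distribution as training progresses instead of staying fixed.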
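Estimating the mask-wearing ratio in public places is important, as it enables health authorities to analyze situations and implement policies promptly. Methods for estimating the mask-wearing ratio through image analysis have been reported. However, comprehensive studies of both methods and datasets are still lacking. Recent reports propose estimating the ratio directly by applying conventional object detection and classification methods. Regression-based approaches are a feasible way to estimate the number of people wearing masks, especially in crowded scenes with tiny and occluded faces, but they have not been well studied. A large-scale, well-annotated dataset is also still needed. In this paper, we present two ratio estimation methods, one detection-based and one regression-based. For the detection-based approach, we improve the state-of-the-art face detector RetinaFace to estimate the ratio. For the regression-based approach, we fine-tune the baseline network CSRNet to estimate density maps for masked and unmasked faces. We also present the first large-scale dataset for this task, containing 581,108 face annotations extracted from 18,088 video frames of 17 street-view videos. Experiments show that the RetinaFace-based method achieves higher accuracy in various situations, whereas the CSRNet-based method has a shorter running time owing to its compactness.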
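Both estimation routes reduce to the same ratio, masked / (masked + unmasked). They differ only in where the counts come from: counted detections, or integrated density maps. Below is a minimal sketch. The label strings and toy density maps are hypothetical.

```python
def ratio_from_detections(labels):
    """Detection-based estimate: fraction of detected faces labelled 'mask'."""
    if not labels:
        return float("nan")
    return sum(1 for lab in labels if lab == "mask") / len(labels)


def ratio_from_density(masked_map, unmasked_map):
    """Regression-based estimate: integrate each density map to get a count."""
    masked = sum(map(sum, masked_map))
    unmasked = sum(map(sum, unmasked_map))
    total = masked + unmasked
    return masked / total if total > 0 else float("nan")


print(ratio_from_detections(["mask", "no_mask", "mask", "mask"]))                 # 0.75
print(ratio_from_density([[0.5, 1.0], [1.5, 0.0]], [[0.25, 0.25], [0.5, 0.0]]))  # 0.75
```

The density route never localises individual faces, which is why it can remain usable when faces are too small or occluded to detect.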
Object detection has been dominated by anchor-based detectors for several years. Recently, anchor-free detectors have become popular following the proposal of FPN and Focal Loss. In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is how positive and negative training samples are defined, and that this leads to the performance gap between them. If both adopt the same definition of positive and negative samples during training, there is no obvious difference in final performance, regardless of whether they regress from a box or a point. This shows that how positive and negative training samples are selected is important for current object detectors. We then propose Adaptive Training Sample Selection (ATSS) to automatically select positive and negative samples according to the statistical characteristics of objects. It significantly improves the performance of both anchor-based and anchor-free detectors and bridges the gap between them. Finally, we discuss the necessity of tiling multiple anchors per image location to detect objects. Extensive experiments on MS COCO support the above analysis and conclusions. With the newly introduced ATSS, we improve state-of-the-art detectors by a large margin, to 50.7% AP, without introducing any overhead. The code is available at https://github.com/sfzhang15/ATSS.
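ATSS's selection rule has three steps:

1. On each pyramid level, take the k anchors whose centres are closest to the GT centre.
2. Set the IoU threshold to the mean plus the standard deviation of the candidates' IoUs.
3. Keep candidates at or above the threshold whose centres lie inside the GT box.

The sketch below implements that rule for a single GT box in pure Python. Two simplifications apply: it uses the sample standard deviation, and it omits the tie-break used when one anchor matches several GT boxes (the highest-IoU GT wins).

```python
import math
import statistics


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def _center(box):
    return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2


def atss_positives(anchors_per_level, gt, k=9):
    """ATSS assignment for a single GT box.

    anchors_per_level: one list of (x1, y1, x2, y2) anchors per pyramid level.
    Returns (level, index) pairs chosen as positives.
    """
    g_center = _center(gt)
    candidates = []
    for lvl, anchors in enumerate(anchors_per_level):
        # Step 1: the k anchors on this level whose centres are closest to the GT centre.
        order = sorted(range(len(anchors)),
                       key=lambda i: math.dist(_center(anchors[i]), g_center))
        candidates += [(lvl, i) for i in order[:k]]
    # Step 2: adaptive IoU threshold = mean + standard deviation over candidates.
    ious = [iou(anchors_per_level[lvl][i], gt) for lvl, i in candidates]
    std = statistics.stdev(ious) if len(ious) > 1 else 0.0
    threshold = statistics.mean(ious) + std
    # Step 3: keep candidates above the threshold whose centre lies inside the GT box.
    positives = []
    for (lvl, i), v in zip(candidates, ious):
        cx, cy = _center(anchors_per_level[lvl][i])
        if v >= threshold and gt[0] < cx < gt[2] and gt[1] < cy < gt[3]:
            positives.append((lvl, i))
    return positives


levels = [
    [(0, 0, 10, 10), (2, 2, 12, 12), (5, 5, 15, 15), (40, 40, 50, 50)],  # level 0
    [(-5, -5, 15, 15), (30, 30, 50, 50)],                                 # level 1
]
print(atss_positives(levels, (0, 0, 10, 10), k=3))  # [(0, 0)]
```

The only tunable value is k, and the IoU threshold adapts per object. This is what lets ATSS drop the fixed IoU threshold of classic anchor-based assignment.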
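Existing instance segmentation methods have achieved impressive performance but still suffer from a common dilemma: redundant representations (e.g., multiple boxes, grids, and anchor points) are inferred for one instance, which leads to multiple duplicate predictions. Thus, mainstream methods usually rely on a hand-designed non-maximum suppression (NMS) post-processing step to select the optimal prediction, which hinders end-to-end training. To address this issue, we propose a box-free and NMS-free instance segmentation framework, termed UniInst, that yields only one unique representation for each instance. Specifically, we design an instance-aware one-to-one assignment scheme, namely Only Yield One Representation (OYOR), which dynamically assigns one unique representation to each instance according to the matching quality between predictions and ground truths. Then, a novel prediction re-ranking strategy is integrated into the framework to address the misalignment between the classification score and the mask quality, making the learned representation more discriminative. With these techniques, UniInst, the first FCN-based box-free and NMS-free instance segmentation framework, achieves competitive performance against mainstream methods on COCO test-dev with ResNet-50-FPN and ResNet-101-FPN backbones, e.g., 40.2 mask AP with ResNet-101-FPN. Moreover, the proposed instance-aware method is robust to occluded scenes, outperforming common baselines by a remarkable margin in mask AP on the heavily occluded OCHuman benchmark. Our code will be available upon publication.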
We develop a Synthetic Fusion Pyramid Network (SPF-Net) with a scale-aware loss function design for accurate crowd counting. Existing crowd-counting methods assume that the training annotation points are accurate. They thus ignore the fact that noisy annotations can lead to large model-learning bias and counting error, especially when counting highly dense crowds far from the camera. To the best of our knowledge, this work is the first to properly handle such noise at multiple scales in an end-to-end loss design, thereby advancing the state of the art in crowd counting. We model the noise of crowd annotation points as a Gaussian and derive the crowd probability density map from the input image. We then approximate the joint distribution of crowd density maps with the full covariance across multiple scales, and derive a low-rank approximation for tractability and efficient implementation. The derived scale-aware loss function is used to train SPF-Net. We show that it outperforms various loss functions on four public datasets: UCF-QNRF, UCF_CC_50, NWPU, and ShanghaiTech A and B. The proposed SPF-Net can accurately predict the locations of people in a crowd despite being trained on noisy annotations.
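Acne detection is crucial for interpretable diagnosis and precise treatment of skin disease. The arbitrary boundaries and small size of acne lesions lead to a large number of poor-quality proposals in two-stage detection. In this paper, we propose a novel head structure for the Region Proposal Network that improves proposal quality in two ways. First, a Spatial Aware Double Head (SADH) structure is proposed to disentangle the representations for classification and localization from two different spatial perspectives. The proposed SADH ensures a steeper classification confidence gradient and suppresses proposals with low Intersection-over-Union (IoU) relative to the matched ground truth. Second, we propose a Normalized Wasserstein Distance prediction branch to improve the correlation between proposal classification scores and IoUs. In addition, to facilitate further research on acne detection, we construct a new dataset named AcneSCU, with high-resolution imaging, precise annotations, and fine-grained lesion categories. Extensive experiments are conducted on both AcneSCU and the public dataset ACNE04. The results demonstrate that the proposed method improves proposal quality and consistently outperforms state-of-the-art approaches. The code and the collected dataset are available at https://github.com/pingguokiller/acnedetection.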
We propose a fully convolutional one-stage object detector (FCOS) that solves object detection in a per-pixel prediction fashion, analogous to semantic segmentation. Almost all state-of-the-art object detectors, such as RetinaNet, SSD, YOLOv3, and Faster R-CNN, rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is free of both anchor boxes and proposals. By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes, such as calculating overlaps during training. More importantly, it also avoids all hyper-parameters related to anchor boxes, to which the final detection performance is often very sensitive. With non-maximum suppression (NMS) as the only post-processing step, FCOS with ResNeXt-64x4d-101 achieves 44.7% AP with single-model and single-scale testing, surpassing previous one-stage detectors while being much simpler. For the first time, we demonstrate a much simpler and more flexible detection framework that achieves improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks. Code is available at: tinyurl.com/FCOSv1
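For reference, FCOS's per-pixel regression targets can be sketched as follows. Each feature-map location (x, y) at stride s maps back to the image point (floor(s/2) + x*s, floor(s/2) + y*s). A location is positive if it lies inside the GT box and the largest of its four distances (l, t, r, b) to the box sides falls within the FPN level's size range. The default range (0, 64) is the one FCOS uses for P3. The sketch covers one GT box on one level and omits the rule for locations that fall inside several boxes.

```python
def fcos_targets(gt_box, stride, feat_h, feat_w, size_range=(0, 64)):
    """Per-location FCOS regression targets (l, t, r, b) for one GT box at one FPN level."""
    x1, y1, x2, y2 = gt_box
    lo, hi = size_range
    targets = {}
    for row in range(feat_h):
        for col in range(feat_w):
            x = stride // 2 + col * stride  # image-space location of this cell
            y = stride // 2 + row * stride
            l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
            # Positive only if inside the box and within this level's size range.
            if min(l, t, r, b) > 0 and lo <= max(l, t, r, b) <= hi:
                targets[(row, col)] = (l, t, r, b)
    return targets


print(sorted(fcos_targets((0, 0, 20, 20), stride=8, feat_h=4, feat_w=4).items()))
# [((0, 0), (4, 4, 16, 16)), ((0, 1), (12, 4, 8, 16)), ((1, 0), (4, 12, 16, 8)), ((1, 1), (12, 12, 8, 8))]
```

Moving the same box to a level whose range is (64, 128) yields no positives, which is how FCOS routes objects of different sizes to different pyramid levels.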
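Detecting tiny objects is a very challenging problem, since a tiny object contains only a few pixels. We demonstrate that state-of-the-art detectors do not produce satisfactory results on tiny objects because of the lack of appearance information. Our key observation is that Intersection over Union (IoU)-based metrics, such as IoU itself and its extensions, are very sensitive to location deviations of tiny objects, and they drastically deteriorate detection performance when used in anchor-based detectors. To alleviate this, we propose a new evaluation metric for tiny object detection based on the Wasserstein distance. Specifically, we first model bounding boxes as 2D Gaussian distributions. We then propose a new metric called Normalized Wasserstein Distance (NWD), which computes the similarity between boxes from their corresponding Gaussian distributions. The proposed NWD metric can easily be embedded into the assignment, non-maximum suppression, and loss function of any anchor-based detector to replace the commonly used IoU metric. We evaluate our metric on a new dataset for tiny object detection (AI-TOD), in which the average object size is much smaller than in existing object detection datasets. Extensive experiments show that, when equipped with the NWD metric, our approach outperforms a standard fine-tuning baseline by 6.7 AP points and state-of-the-art competitors by 6.0 AP points. Code is available at: https://github.com/jwwangchn/nwd.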
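The NWD metric has a closed form worth spelling out. A box (cx, cy, w, h) is modelled as a 2-D Gaussian with mean (cx, cy) and covariance diag(w^2/4, h^2/4). For two such Gaussians, the squared 2-Wasserstein distance equals the squared Euclidean distance between the vectors (cx, cy, w/2, h/2), and NWD = exp(-W2 / C). The constant C is dataset-dependent; 12.8 below is only an illustrative value.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between (cx, cy, w, h) boxes.

    W2 between N([cx, cy], diag(w^2/4, h^2/4)) Gaussians is the Euclidean
    distance between (cx, cy, w/2, h/2) vectors; `c` is a dataset constant.
    """
    va = (box_a[0], box_a[1], box_a[2] / 2, box_a[3] / 2)
    vb = (box_b[0], box_b[1], box_b[2] / 2, box_b[3] / 2)
    return math.exp(-math.dist(va, vb) / c)


def iou_cxcywh(a, b):
    """Ordinary IoU for (cx, cy, w, h) boxes, for comparison."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


a, b, far = (10, 10, 6, 6), (12, 10, 6, 6), (20, 10, 6, 6)
print(iou_cxcywh(a, b), round(nwd(a, b), 3))      # 0.5 0.855
print(iou_cxcywh(a, far), round(nwd(a, far), 3))  # 0.0 0.458
```

A 2-pixel shift halves the IoU of a 6x6 box, while NWD degrades smoothly. For non-overlapping boxes, IoU is stuck at 0 and gives no signal about how far apart they are, whereas NWD still does.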
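Global attention is an appealing way to guide the optimization of learnable parameters such as feature maps, and it enlightens network intelligence at a fraction of the cost. However, its loss calculation process still falls short in two ways. 1) It can only produce one-dimensional "pseudo labels", because the hand-set threshold involved in the process is not robust. 2) The attention awaiting loss calculation is necessarily high-dimensional, and reducing its dimensionality by convolution inevitably introduces additional learnable parameters, which confuses the source of the loss. To this end, we design a simple but effective indirect attention optimization (IIAO) module based on softmax attention. It mathematically transforms the high-dimensional attention map into a one-dimensional feature map, allowing the loss to be computed midway through the network, while automatically providing adaptive multi-scale fusion for the pyramid module. This transformation yields relatively coarse features, and the predictive fallibility of regions varies with the crowd density distribution. We therefore tailor a Regional Correlation Loss (RCLoss) to retrieve continuous error-prone regions and smooth spatial information. Extensive experiments show that our method surpasses previous state-of-the-art (SOTA) methods on many benchmark datasets.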