Smart Paper Notes

Adversarial Focal Loss: Asking Your Discriminator for Hard Examples

Chen Liu , Xiaomeng Dong , Michael Potter , Hsi-Ming Chang , Ravi Soni

Categories: Computer Vision | Machine Learning

2022-07-15

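Focal Loss has become remarkably popular because it uses a simple technique to identify and exploit hard examples and thereby improve classification performance. However, the method does not generalize easily beyond classification tasks, for example to keypoint detection. In this paper, we propose a novel adaptation of Focal Loss for keypoint detection tasks, called Adversarial Focal Loss (AFL). AFL is not only semantically similar to Focal Loss but also serves as a plug-in upgrade for arbitrary loss functions. Whereas Focal Loss requires the output of a classifier, AFL uses a separate adversarial network to produce a difficulty score for each input. This difficulty score can then be used to prioritize learning on hard examples, even in the absence of a classifier. In this work, we show AFL's effectiveness in enhancing existing keypoint detection methods and verify its ability to re-weight examples according to difficulty.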

Translated by Google Translate
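
The re-weighting idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `focal_loss` is the standard focal loss for a single example, while `difficulty_weighted_loss` and its mean-one normalization are assumptions made for this sketch. In AFL the difficulty scores would come from a discriminator; here they are plain numbers.

```python
import math


def focal_loss(p_true, gamma=2.0):
    """Standard focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    p_true is the probability the classifier assigns to the correct class.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def difficulty_weighted_loss(per_example_losses, difficulty_scores):
    """Re-weight arbitrary per-example losses by external difficulty scores.

    Assumption of this sketch: scores are non-negative and are normalized
    to mean 1 so that the overall loss scale stays unchanged.
    Returns (mean weighted loss, list of weights).
    """
    mean_score = sum(difficulty_scores) / len(difficulty_scores)
    weights = [s / mean_score for s in difficulty_scores]
    weighted = [w * l for w, l in zip(weights, per_example_losses)]
    return sum(weighted) / len(weighted), weights
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy, which is why it needs a classifier probability; the second function only needs a score per example, so it can wrap any loss.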

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

Bowen Cheng , Bin Xiao , Jingdong Wang , Honghui Shi , Thomas S. Huang , Lei Zhang

Categories:

2019-08-27

Bottom-up human pose estimation methods have difficulties in predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach is able to solve the scale variation challenge in bottom-up multi-person pose estimation and localize keypoints more precisely, especially for small persons. The feature pyramid in HigherHRNet consists of feature map outputs from HRNet and upsampled higher-resolution outputs through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium persons on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves a new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes. The code and models are available at https://github.com/HRNet/Higher-HRNet-Human-Pose-Estimation.
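
As a rough illustration of multi-resolution aggregation at inference time, the sketch below upsamples a low-resolution heatmap and averages it with a higher-resolution one. The function names are hypothetical, and nearest-neighbour upsampling on nested lists is an assumption chosen to keep the example dependency-free; it is not taken from the paper's code.

```python
def upsample_nearest(heatmap, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    out = []
    for row in heatmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def aggregate_heatmaps(low_res, high_res):
    """Average an upsampled low-res heatmap with a high-res heatmap.

    Assumes high_res is an integer multiple of low_res in both dimensions.
    """
    factor = len(high_res) // len(low_res)
    up = upsample_nearest(low_res, factor)
    return [[(a + b) / 2.0 for a, b in zip(ra, rb)]
            for ra, rb in zip(up, high_res)]
```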

Computer Vision on X-ray Data in Industrial Production and Security Applications: A survey

Mehdi Rafiei , Jenni Raitoharju , Alexandros Iosifidis

Categories: Computer Vision

2022-11-10

X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.

2D Human Pose Estimation with Explicit Anatomical Keypoints Structure Constraints

Zhangjian Ji , Zilong Wang , Ming Zhang , Yapeng Chen , Yuhua Qian

Categories: Computer Vision

2022-12-05

Recently, research on human pose estimation has mainly focused on designing more effective deep network structures as human feature extractors, and most such feature extraction networks use only the position of each anatomical keypoint to guide their training. However, we found that some human anatomical keypoints keep their topology invariant, which can help to localize them more accurately when detecting keypoints on the feature map. To the best of our knowledge, no prior work has specifically studied this. Thus, in this paper, we present a novel 2D human pose estimation method with explicit anatomical keypoint structure constraints, which introduces into the loss objective a topology constraint term consisting of the differences between the keypoint-to-keypoint distances and directions and their ground-truth values. More importantly, our proposed model can be plugged into most existing bottom-up or top-down human pose estimation methods and improve their performance. Extensive experiments on the COCO keypoint benchmark show that our methods perform favorably against most existing bottom-up and top-down human pose estimation methods; in particular, when our model is plugged into Lite-HRNet, its AP scores rise by 2.9% and 3.3% on the COCO val2017 and test-dev2017 datasets, respectively.
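
A topology constraint of this kind can be sketched as a penalty on the distance and direction of keypoint-to-keypoint vectors. The function below is a hypothetical illustration: the name `topology_loss`, the use of absolute differences, the wrapped angle difference, and the `direction_weight` parameter are all assumptions of this sketch rather than the paper's exact formulation.

```python
import math


def topology_loss(pred, gt, pairs, direction_weight=1.0):
    """Penalize distance and direction mismatches between keypoint pairs.

    pred, gt: lists of (x, y) keypoint coordinates.
    pairs: list of (i, j) index pairs that define a structural link.
    """
    total = 0.0
    for i, j in pairs:
        pdx, pdy = pred[j][0] - pred[i][0], pred[j][1] - pred[i][1]
        gdx, gdy = gt[j][0] - gt[i][0], gt[j][1] - gt[i][1]
        # Difference in link length.
        dist_term = abs(math.hypot(pdx, pdy) - math.hypot(gdx, gdy))
        # Difference in link direction, wrapped to [-pi, pi].
        diff = math.atan2(pdy, pdx) - math.atan2(gdy, gdx)
        angle_term = abs(math.atan2(math.sin(diff), math.cos(diff)))
        total += dist_term + direction_weight * angle_term
    return total
```

Such a term would be added to the usual keypoint position loss, so a prediction with correct individual positions incurs no extra penalty.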

Focal loss for dense object detection

Categories:

SpineOne: A One-Stage Detection Framework for Degenerative Discs and Vertebrae

Jiabo He , Wei Liu , Yu Wang , Xingjun Ma , Xian-Sheng Hua

Categories: Computer Vision

2021-10-28

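Spinal degeneration plagues many elderly people, office workers, and even younger generations. Effective pharmaceutical or surgical interventions can help relieve degenerative spine conditions. However, the traditional diagnostic procedure is often too laborious. Clinical experts need to detect discs and vertebrae in spinal magnetic resonance imaging (MRI) or computed tomography (CT) images as a preliminary step for pathological diagnosis or preoperative evaluation. Machine learning systems have been developed to aid this procedure, generally following a two-stage methodology: anatomical localization first, then pathological classification. Toward more efficient and accurate diagnosis, we propose a one-stage detection framework, termed SpineOne, that simultaneously localizes and classifies degenerative discs and vertebrae from MRI slices. SpineOne is built on the following three key techniques: 1) a new design of the keypoint heatmap to facilitate simultaneous keypoint localization and classification; 2) the use of attention modules to better differentiate the representations of discs and vertebrae; and 3) a novel gradient-guided objective association mechanism that associates multiple learning objectives at the later training stage. Empirical results on the Spinal Disease Intelligent Diagnosis Tianchi Competition (SDID-TC) dataset of 550 exams show that our approach surpasses existing methods by a large margin.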

Class-Difficulty Based Methods for Long-Tailed Visual Recognition

Saptarshi Sinha , Hiroki Ohashi , Katsuyuki Nakamura

Categories: Computer Vision | Artificial Intelligence

2022-07-29

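Long-tailed datasets, in which a few classes or categories (known as majority or head classes) have far more data samples than the others (known as minority or tail classes), are frequently encountered in real-world use cases. Training deep neural networks on such datasets yields results biased toward the head classes. So far, researchers have proposed multiple weighted-loss and data re-sampling techniques to reduce this bias. However, most such techniques assume that the tail classes are always the hardest classes to learn and therefore need more weight or attention. Here, we argue that this assumption may not always hold. We therefore propose a novel approach that dynamically measures the instantaneous difficulty of each class during the training phase of the model. Furthermore, we use the per-class difficulty measures to design a novel weighted-loss technique called "class-wise difficulty based weighted (CDB-W) loss" and a novel data sampling technique called "class-wise difficulty based sampling (CDB-S)". To verify the broad usability of the CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation, and video action classification. The results verify that CDB-W loss and CDB-S can achieve state-of-the-art results on many class-imbalanced datasets that resemble real-world use cases (e.g., ImageNet-LT, LVIS, and EGTEA).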
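
To make the idea of difficulty-based weighting concrete, here is a small sketch. It assumes that a class's difficulty is one minus its current accuracy and that the weight is the difficulty raised to a power `tau`; the abstract does not state these details, so both the formula and the function name are illustrative assumptions.

```python
def class_difficulty_weights(correct_per_class, total_per_class, tau=1.0):
    """Compute per-class loss weights from measured per-class accuracy.

    Assumption of this sketch: difficulty = 1 - accuracy, weight = difficulty ** tau.
    correct_per_class, total_per_class: dicts keyed by class name.
    """
    weights = {}
    for cls, total in total_per_class.items():
        accuracy = correct_per_class.get(cls, 0) / total
        difficulty = 1.0 - accuracy
        weights[cls] = difficulty ** tau
    return weights
```

Because the weights depend on measured accuracy rather than on sample counts, a frequent class that the model still gets wrong can receive more weight than a rare class it already handles well. Recomputing the weights during training makes them follow the model's current state.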

End-to-End Trainable Multi-Instance Pose Estimation with Transformers

Lucas Stoffl , Maxime Vidal , Alexander Mathis

Categories: Computer Vision

2021-03-22

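We propose an end-to-end trainable approach for multi-instance pose estimation, called POET (POse Estimation Transformer). Combining a convolutional neural network with a transformer encoder-decoder architecture, we formulate multi-instance pose estimation from images as a direct set prediction problem. Our model is able to directly regress the poses of all individuals using a bipartite matching scheme. POET is trained with a set-based global loss that consists of a keypoint loss, a visibility loss, and a further loss term (rendered as "载重损失" in the source). POET reasons about the relations between multiple detected individuals and the full image context to directly predict their poses in parallel. We show that POET achieves high accuracy on the COCO keypoint detection task while having fewer parameters and higher inference speed than other bottom-up and top-down approaches. Moreover, we show successful transfer learning when applying POET to animal pose estimation. To the best of our knowledge, this model is the first end-to-end trainable multi-instance pose estimation method, and we hope it will serve as a simple and promising alternative.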
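
Bipartite matching for set prediction pairs each ground-truth instance with exactly one prediction so that the total cost is minimal. The sketch below finds that assignment by brute force over permutations, which is only practical for tiny examples; real systems typically use the Hungarian algorithm. The function name and the cost-matrix layout are assumptions of this sketch.

```python
from itertools import permutations


def match_predictions(cost):
    """Find the minimum-cost one-to-one assignment of predictions to targets.

    cost[i][j] is the cost of assigning prediction i to ground-truth j.
    Requires at least as many predictions as ground-truth instances.
    Returns (best_total, assignment), where assignment[j] is the index of
    the prediction matched to ground-truth j.
    """
    n_pred, n_gt = len(cost), len(cost[0])
    best_total, best_assignment = float("inf"), None
    for perm in permutations(range(n_pred), n_gt):
        total = sum(cost[perm[j]][j] for j in range(n_gt))
        if total < best_total:
            best_total, best_assignment = total, perm
    return best_total, list(best_assignment)
```

Predictions left unmatched would be trained to predict "no instance", which is what lets a fixed number of output slots cover a variable number of people.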

KTN: Knowledge Transfer Network for Learning Multi-person 2D-3D Correspondences

Xuanhan Wang , Lianli Gao , Yixuan Zhou , Jingkuan Song , Meng Wang

Categories: Computer Vision

2022-06-21

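Human DensePose estimation, which aims to establish dense correspondences between 2D pixels of the human body and a 3D human body template, is a key technique for enabling machines to understand people in images. It still poses several challenges because real-world scenes are complex and only partial annotations are available, leading to incomplete or false estimations. In this work, we present a novel framework for detecting the DensePose of multiple people in an image. The proposed method, which we refer to as the Knowledge Transfer Network (KTN), tackles two main problems: 1) how to refine the image representation to alleviate incomplete estimations, and 2) how to reduce false estimations caused by low-quality training labels (i.e., limited annotations and class-imbalanced labels). Unlike existing works that directly propagate the pyramidal features of regions for DensePose estimation, KTN uses a refinement of the pyramidal representation that simultaneously maintains feature resolution and suppresses background pixels, and this strategy results in a substantial increase in accuracy. Moreover, KTN enhances the ability of 3D-based body parsing with external knowledge: through a structural body knowledge graph, it casts 2D-based body parsers trained from sufficient annotations as a 3D-based body parser. In this way, it significantly reduces the adverse effects caused by low-quality annotations. The effectiveness of KTN is demonstrated by its superior performance over state-of-the-art methods on the DensePose-COCO dataset. Extensive ablation studies and experimental results on representative tasks (e.g., human body segmentation, human part segmentation, and keypoint detection) and on two popular DensePose estimation pipelines (i.e., RCNN and fully convolutional frameworks) further indicate the generalizability of the proposed method.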

HoughNet: Integrating near and long-range evidence for visual detection

Nermin Samet , Samet Hicsonmez , Emre Akbas

Categories: Computer Vision

2021-04-14

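This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies only on local evidence. On the COCO dataset, HoughNet's best model achieves 46.4 AP (and 65.1 AP50), performing on par with the state of the art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of our proposal on other visual detection tasks, namely video object detection, instance segmentation, 3D object detection, and keypoint detection for human pose estimation, as well as on an additional "labels to photo" image generation task, where integrating our voting module consistently improves performance in all cases. Code is available at https://github.com/nerminsamet/houghnet.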

Instance Semantic Segmentation Benefits from Generative Adversarial Networks

Quang H. Le , Kamal Youcef-Toumi , Dzmitry Tsetserukou , Ali Jahanian

Categories: Computer Vision

2020-10-26

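In the design of instance segmentation networks that reconstruct masks, segmentation is often taken at its literal definition: assigning a label to each pixel. This has led to treating the problem as a matching problem whose goal is to minimize the loss between the reconstructed and ground-truth pixels. Rethinking the reconstruction network as a generator, we define the problem of predicting masks as a GAN game: a segmentation network generates the masks, and a discriminator network judges the quality of the masks. To demonstrate this game, we show effective modifications to the general segmentation framework of Mask R-CNN. We find that playing the game in feature space is more effective than playing it in pixel space and leads to stable training between the discriminator and the generator, that predicting object coordinates should be replaced by predicting contextual regions for objects, and that overall the adversarial loss helps performance and removes the need for any custom settings per data domain. We test our framework in various domains and report results on cellphone recycling, autonomous driving, large-scale object detection, and medical glands. We observe that, in general, GANs yield masks that account for crisper boundaries, clutter, small objects, and details, in domains of regular shapes as well as heterogeneous and coalescing shapes. Our code for reproducing the results is publicly available.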

DeepSportLab: a Unified Framework for Ball Detection, Player Instance Segmentation and Pose Estimation in Team Sports Scenes

Seyed Abolfazl Ghasemzadeh , Gabriel Van Zandycke , Maxime Istasse , Niels Sayez , Amirafshar Moshtaghpour , Christophe De Vleeschouwer

Categories: Computer Vision

2021-12-01

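This paper presents a unified framework to (i) locate the ball, (ii) predict the pose, and (iii) segment the instance masks of players in team sports scenes. These problems are of high interest in automated sports analytics, production, and broadcasting. A common practice is to solve each problem individually by exploiting general state-of-the-art models, e.g., Panoptic-DeepLab for player segmentation. In addition to the increased complexity that results from multiplying single-task models, the use of off-the-shelf models also impedes performance because of the complexity and specificity of team sports scenes, such as strong occlusion and motion blur. To circumvent these limitations, our paper proposes to train a single model that simultaneously predicts the ball and the player masks and poses by combining the part intensity fields and spatial embeddings principles. Part intensity fields provide the ball and player locations, as well as the player joint locations. Spatial embeddings are then exploited to associate player instance pixels with their respective player centers, and also to group player joints into skeletons. We demonstrate the effectiveness of the proposed model on the DeepSport basketball dataset, achieving performance comparable to that of state-of-the-art models that address each task separately.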

Cascaded Pyramid Network for Multi-Person Pose Estimation

Yilun Chen , Zhicheng Wang , Yuxiang Peng , Zhiqiang Zhang , Gang Yu , Jian Sun

Categories:

2017-11-20

Multi-person pose estimation has improved greatly in recent years, especially with the development of convolutional neural networks. However, there still exist many challenging cases, such as occluded keypoints, invisible keypoints, and complex backgrounds, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN), which aims to relieve the problem posed by these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the "simple" keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet explicitly handles the "hard" keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted to first generate a set of human bounding boxes based on a detector, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-the-art results on the COCO keypoint benchmark, with average precision at 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, which is a 19% relative improvement compared with 60.5 from the COCO 2016 keypoint challenge. Code and the detection results are publicly available for further research.
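
An online hard keypoint mining loss can be illustrated as keeping only the largest per-keypoint losses when computing the training signal. The sketch below is a simplified assumption of how such a loss might look; the function name and the choice to average the top-k values are not taken from the paper.

```python
def ohkm_loss(per_keypoint_losses, top_k):
    """Average only the top_k largest per-keypoint losses.

    per_keypoint_losses: one loss value per keypoint of a single person.
    Keypoints outside the top_k contribute nothing to the result.
    """
    hardest = sorted(per_keypoint_losses, reverse=True)[:top_k]
    return sum(hardest) / len(hardest)
```

For losses `[0.1, 0.9, 0.5, 0.2]` and `top_k=2`, only the keypoints with losses 0.9 and 0.5 are used, so easy keypoints no longer dilute the gradient.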

Deep Learning-Based Automatic Diagnosis System for Developmental Dysplasia of the Hip

Yang Li , Leo Yan Li-Han , Hua Tian

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2022-09-07

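As the first-line diagnostic imaging modality, radiography plays an essential role in the early detection of developmental dysplasia of the hip (DDH). Clinically, the diagnosis of DDH relies on manual measurements and subjective evaluation of different anatomical features in pelvic radiographs. This process is inefficient and error-prone and requires years of clinical experience. In this study, we propose a deep learning-based system that automatically detects 14 keypoints from a radiograph, measures three anatomical angles (center-edge, Tönnis, and Sharp angles), and classifies DDH hips as grades I-IV. In addition, a novel data-driven scoring system is proposed to quantitatively integrate the information for DDH diagnosis. The proposed keypoint detection model achieved a mean (95% confidence interval [CI]) average precision of 0.807 (0.804-0.810) and 0.953 (0.947-0.960), which were significantly higher than those of experienced orthopedic surgeons (p < 0.0001). Furthermore, the mean (95% CI) test diagnostic agreement (Cohen's kappa) obtained using the proposed scoring system was 0.84 (0.83-0.85), which was significantly higher than that obtained from the diagnostic criteria for individual angles (0.76 [0.75-0.77]) and from orthopedists (0.71 [0.63-0.79]). To the best of our knowledge, this is the first study of objective DDH diagnosis that leverages deep learning keypoint detection and integrates different anatomical measurements, which can provide reliable and explainable support for clinical decision-making.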
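
Cohen's kappa, the agreement measure reported above, corrects the raw agreement rate between two raters for the agreement expected by chance. The sketch below computes it for two label lists; the function name is hypothetical, and the grade labels in the usage note are made-up illustrations, not data from the study.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of categorical labels.

    kappa = (p_observed - p_expected) / (1 - p_expected).
    Undefined (division by zero) if both raters always use one same label.
    """
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1.0 - expected)
```

For example, `cohens_kappa(["I", "II", "I", "II"], ["I", "II", "I", "II"])` gives 1.0 for perfect agreement, while two raters who agree only as often as chance predicts score 0.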

Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation

Zhehan Kan , Shuoshuo Chen , Zeng Li , Zhihai He

Categories: Computer Vision | Artificial Intelligence

2022-07-06

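We observe that human poses exhibit strong group-wise structural correlation and spatial coupling between keypoints due to the biological constraints of different body parts. This group-wise structural correlation can be explored to improve the accuracy and robustness of human pose estimation. In this work, we develop a self-constrained prediction-verification network to characterize and learn the structural correlation between keypoints during training. During the inference stage, the feedback information from the verification network allows us to further optimize the pose prediction, which significantly improves the performance of human pose estimation. Specifically, we partition the keypoints into groups according to the biological structure of the human body. Within each group, the keypoints are further partitioned into two subsets: high-confidence base keypoints and low-confidence terminal keypoints. We develop a self-constrained prediction-verification network to perform forward and backward predictions between these keypoint subsets. One fundamental challenge in pose estimation, as well as in generic prediction tasks, is that there is no mechanism to verify whether the obtained pose estimation or prediction results are accurate, since the ground truth is not available. Once successfully learned, the verification network serves as an accuracy verification module for the forward pose prediction. During the inference stage, it can be used to guide the local optimization of the pose estimation results for low-confidence keypoints, with the self-constrained loss on high-confidence keypoints as the objective function. Our extensive experimental results on the benchmark MS COCO and CrowdPose datasets demonstrate that the proposed method can significantly improve pose estimation results.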

Equalization Loss for Long-Tailed Object Recognition

Jingru Tan , Changbao Wang , Buyu Li , Quanquan Li , Wanli Ouyang , Changqing Yin , Junjie Yan

Categories:

2020-03-11

Object recognition techniques using convolutional neural networks (CNN) have achieved great success. However, state-of-the-art object detection methods still perform poorly on large-vocabulary and long-tailed datasets, e.g., LVIS. In this work, we analyze this problem from a novel perspective: each positive sample of one category can be seen as a negative sample for other categories, making the tail categories receive more discouraging gradients. Based on this observation, we propose a simple but effective loss, named equalization loss, to tackle the problem of long-tailed rare categories by simply ignoring those gradients for rare categories. The equalization loss protects the learning of rare categories from being at a disadvantage during network parameter updates. Thus the model is capable of learning better discriminative features for objects of rare classes. Without any bells and whistles, our method achieves AP gains of 4.1% and 4.8% for the rare and common categories on the challenging LVIS benchmark, compared to the Mask R-CNN baseline. Using the equalization loss, we won 1st place in the LVIS Challenge 2019. Code has been made available at: https://github.com/tztztztztz/eql.detectron2
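
The idea of ignoring discouraging gradients for rare categories can be sketched as a per-category weight on a binary cross-entropy loss. In the sketch below, deciding "rare" by a frequency threshold, restricting the rule to foreground samples, and all function and parameter names are assumptions made for illustration; the abstract itself only states that the gradients for rare categories are ignored.

```python
import math


def eql_weights(targets, class_freqs, freq_threshold, is_foreground=True):
    """Weight 0 for the negative term of rare categories, 1 otherwise.

    targets: 0/1 label per category for one sample.
    class_freqs: frequency of each category in the training set.
    """
    weights = []
    for y, f in zip(targets, class_freqs):
        is_rare = f < freq_threshold
        ignore = is_foreground and is_rare and y == 0
        weights.append(0.0 if ignore else 1.0)
    return weights


def equalization_loss(probs, targets, class_freqs, freq_threshold,
                      is_foreground=True):
    """Weighted sum of per-category binary cross-entropy terms.

    probs: predicted probability per category, each strictly in (0, 1).
    """
    weights = eql_weights(targets, class_freqs, freq_threshold, is_foreground)
    loss = 0.0
    for p, y, w in zip(probs, targets, weights):
        bce = -math.log(p) if y == 1 else -math.log(1.0 - p)
        loss += w * bce
    return loss
```

A rare category that is not the label of the current sample gets weight 0, so a confident prediction for it is not penalized; the positive term for that category's own samples is kept.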

Deep High-Resolution Representation Learning for Human Pose Estimation

Ke Sun , Bin Xiao , Dong Liu , Jingdong Wang

Categories:

2019-02-25

In this paper, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. In addition, we show the superiority of our network in pose tracking on the PoseTrack dataset. The code and models are publicly available at https://github.com/leoxiaobin/deep-high-resolution-net.pytorch.

Transfer Learning for Pose Estimation of Illustrated Characters

Shuhong Chen , Matthias Zwicker

Categories: Computer Vision

2021-08-04

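Human pose information is a critical component in many downstream image processing tasks, such as activity recognition and motion tracking. Likewise, a pose estimator for the illustrated character domain would provide a valuable prior for assistive content creation tasks, such as reference pose retrieval and automatic character animation. But while modern data-driven techniques have substantially improved pose estimation performance on natural images, little work has been done for illustrations. In our work, we bridge this domain gap by efficiently transfer-learning from both domain-specific and task-specific source models. Additionally, we upgrade and expand an existing illustrated pose estimation dataset, and introduce two new datasets for classification and segmentation subtasks. We then apply the resulting state-of-the-art character pose estimator to solve the novel task of pose-guided illustration retrieval. All data, models, and code will be made publicly available.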

Poseur: Direct Human Pose Regression with Transformers

Weian Mao , Yongtao Ge , Chunhua Shen , Zhi Tian , Xinlong Wang , Zhibin Wang , Anton van den Hengel

Categories: Computer Vision

2022-01-19

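We propose a direct, regression-based approach to 2D human pose estimation from single images. We formulate the problem as a sequence prediction task, which we solve using a Transformer network. This network directly learns a regression mapping from images to keypoint coordinates, without resorting to intermediate representations such as heatmaps. This approach avoids much of the complexity associated with heatmap-based approaches. To overcome the feature misalignment issues of previous regression-based methods, we propose an attention mechanism that adaptively attends to the features most relevant to the target keypoints, considerably improving accuracy. Importantly, our framework is end-to-end differentiable and naturally learns to exploit the dependencies between keypoints. Experiments on MS-COCO and MPII, two predominant pose estimation datasets, demonstrate that our method significantly improves upon the state of the art in regression-based pose estimation. More notably, ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.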

SOLO: Segmenting Objects by Locations

Xinlong Wang , Tao Kong , Chunhua Shen , Yuning Jiang , Lei Li

Categories:

2019-12-10

We present a new, embarrassingly simple approach to instance segmentation. Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that have made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the "detect-then-segment" strategy (e.g., Mask R-CNN), or predict embedding vectors first then use clustering techniques to group pixels into individual instances. We view the task of instance segmentation from a completely new perspective by introducing the notion of "instance categories", which assigns categories to each pixel within an instance according to the instance's location and size, thus nicely converting instance segmentation into a single-shot classification-solvable problem. We demonstrate a much simpler and flexible instance segmentation framework with strong performance, achieving on par accuracy with Mask R-CNN and outperforming recent single-shot instance segmenters in accuracy. We hope that this simple and strong framework can serve as a baseline for many instance-level recognition tasks besides instance segmentation. Code is available at https://git.io/AdelaiDet
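
One way to picture location-based "instance categories" is to divide the image into a grid and let the cell containing an instance's center identify that instance. The sketch below is a simplified assumption of such an assignment: the function name, the square grid, and the row-major category index are illustrative choices, and the size-dependent part of the assignment is left out.

```python
def instance_grid_cell(center_x, center_y, image_w, image_h, grid_size):
    """Map an instance center to a cell of a grid_size x grid_size grid.

    Returns (row, col, index), where index = row * grid_size + col acts as
    the location-based category of the instance in this sketch.
    """
    # Clamp so that a center on the right or bottom border stays in range.
    col = min(int(center_x / image_w * grid_size), grid_size - 1)
    row = min(int(center_y / image_h * grid_size), grid_size - 1)
    return row, col, row * grid_size + col
```

With a 5 x 5 grid on a 100 x 100 image, an instance centered at (50, 50) falls into row 2, column 2, which is index 12. Predicting a mask per grid cell then turns instance segmentation into a classification-style problem over cells.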
