Smart Paper Notes

Feature Erasing and Diffusion Network for Occluded Person Re-Identification

Zhikang Wang , Feng Zhu , Shixiang Tang , Rui Zhao , Lihuo He , Jiangning Song

Category: Computer Vision

2021-12-16

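Occluded person re-identification (ReID) aims to match occluded person images to holistic ones across different camera views. Target Pedestrians (TP) are usually disturbed by Non-Pedestrian Occlusions (NPO) and Non-Target Pedestrians (NTP). Previous methods mainly focus on increasing the model's robustness against NPO while ignoring feature contamination from NTP. In this paper, we propose a novel Feature Erasing and Diffusion Network (FED) to handle NPO and NTP simultaneously. Specifically, NPO features are eliminated by our proposed Occlusion Erasing Module (OEM), aided by an NPO augmentation strategy that simulates NPO on holistic pedestrian images and generates precise occlusion masks. Subsequently, we diffuse the pedestrian representations with other memorized features to synthesize NTP characteristics in the feature space, which is achieved by a novel Feature Diffusion Module (FDM) through a learnable cross-attention mechanism. Guided by the occlusion scores from OEM, the feature diffusion process is conducted mainly on visible body parts, which guarantees the quality of the synthesized NTP characteristics. By jointly optimizing OEM and FDM in our proposed FED network, we can greatly improve the model's perception ability towards TP and alleviate the influence of NPO and NTP. Furthermore, the proposed FDM works only as an auxiliary module for training and is discarded in the inference phase, thus introducing little inference computational overhead. Experiments on occluded and holistic person ReID benchmarks demonstrate the superiority of FED over the state of the art: FED achieves 86.3% Rank-1 accuracy on Occluded-REID, surpassing other methods by at least 4.7%.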
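The NPO augmentation idea in this abstract (simulate an occlusion on a holistic image and obtain an exact occlusion mask) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nested-list image representation, and the uniform random placement are all assumptions made for illustration.

```python
import random


def paste_occluder(image, occluder, top, left):
    """Paste `occluder` onto a copy of `image` and return (augmented, mask).

    image, occluder: 2-D lists of pixel values (rows of equal length).
    mask[r][c] is 1 where the occluder covers the image, else 0.
    (Hypothetical helper, not taken from the FED code base.)
    """
    height, width = len(image), len(image[0])
    augmented = [row[:] for row in image]  # copy so the input stays untouched
    mask = [[0] * width for _ in range(height)]
    for dr, occ_row in enumerate(occluder):
        for dc, value in enumerate(occ_row):
            r, c = top + dr, left + dc
            if 0 <= r < height and 0 <= c < width:  # clip at the image border
                augmented[r][c] = value
                mask[r][c] = 1
    return augmented, mask


def random_occlusion(image, occluder, rng=random):
    """Paste the occluder at a uniformly random position (an assumed policy).

    When the occluder is smaller than the image it stays fully inside;
    otherwise it is anchored at the top-left corner and clipped.
    """
    max_top = max(len(image) - len(occluder), 0)
    max_left = max(len(image[0]) - len(occluder[0]), 0)
    return paste_occluder(image, occluder,
                          rng.randint(0, max_top), rng.randint(0, max_left))
```

Because the occluder is pasted synthetically, the mask is exact by construction, which is what makes such masks usable as supervision for an occlusion-scoring module.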

translated by Google Translate

Quality-aware Part Models for Occluded Person Re-identification

Pengfei Wang , Changxing Ding , Zhiyin Shao , Zhibin Hong , Shengli Zhang , Dacheng Tao

Category: Computer Vision

2022-01-01

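Occlusion poses a major challenge for person re-identification (ReID). Existing approaches typically rely on external tools to infer visible body parts, which may be suboptimal in terms of both computational efficiency and ReID accuracy. In particular, they may fail when facing complex occlusions, such as those between pedestrians. Accordingly, in this paper we propose a novel method named Quality-aware Part Models (QPM) for occlusion-robust ReID. First, we propose to jointly learn part features and predict part quality scores. As no quality annotation is available, we introduce a strategy that automatically assigns low scores to occluded body parts, thereby weakening the impact of occluded body parts on ReID results. Second, based on the predicted part quality scores, we propose a novel identity-aware spatial attention (ISA) module. In this module, a coarse identity-aware feature is used to highlight the pixels of the target pedestrian, so as to handle occlusion between pedestrians. Third, we design an adaptive and efficient approach for generating global features from the common non-occluded regions of each image pair. This design is crucial, but is often ignored by existing methods. QPM has three key advantages: 1) it does not rely on any external tools in either the training or inference stages; 2) it handles occlusions caused by both objects and other pedestrians; 3) it is highly computationally efficient. Experimental results on four popular databases for occluded ReID demonstrate that QPM consistently outperforms state-of-the-art methods by significant margins. The code of QPM will be released.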
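To make the role of part quality scores concrete, here is a minimal sketch of one way such scores could down-weight occluded parts when two images are compared. The weighting rule (product of the two images' scores, normalized) and all names are assumptions for illustration; the abstract does not specify QPM's actual matching formula.

```python
import math


def part_distance(a, b):
    """Euclidean distance between two part feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quality_weighted_distance(parts_a, quality_a, parts_b, quality_b, eps=1e-12):
    """Average per-part distances, weighted by both images' quality scores.

    parts_*: list of part feature vectors (same part order in both images).
    quality_*: list of scores in [0, 1]; low means the part is likely occluded.
    A part only contributes if it is judged visible in BOTH images.
    (Assumed rule, chosen for illustration only.)
    """
    weights = [qa * qb for qa, qb in zip(quality_a, quality_b)]
    total = sum(weights)
    if total < eps:
        return float("inf")  # no commonly visible part: treat as non-match
    dists = [part_distance(pa, pb) for pa, pb in zip(parts_a, parts_b)]
    return sum(w * d for w, d in zip(weights, dists)) / total
```

With this rule, a part that is occluded in either image receives a weight near zero, so a large feature mismatch on that part barely affects the final distance.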

Dynamic Prototype Mask for Occluded Person Re-Identification

Lei Tan , Pingyang Dai , Rongrong Ji , Yongjian Wu

Category: Computer Vision

2022-07-19

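Although person re-identification has improved impressively in recent years, the common occlusion cases caused by various obstacles remain an unsettled issue in real application scenarios. Existing methods mainly address this issue by using body clues provided by an extra network to distinguish the visible parts. However, the inevitable domain gap between the assistant model and the ReID datasets greatly increases the difficulty of obtaining an effective and efficient model. To dispense with extra pre-trained networks and achieve automatic alignment in an end-to-end trainable network, we propose a novel Dynamic Prototype Mask (DPM) based on two pieces of self-evident prior knowledge. Specifically, we first devise a Hierarchical Mask Generator, which uses hierarchical semantics to select the visible pattern space between the high-quality holistic prototype and the feature representation of the occluded input image. Under this condition, the occluded representation can be well aligned in the selected subspace spontaneously. Then, to enrich the feature representation of the high-quality holistic prototype and provide a more complete feature space, we introduce a Head Enrich Module that encourages different heads to aggregate different pattern representations across the whole image. Extensive experimental evaluations on occluded and holistic person re-identification benchmarks demonstrate the superior performance of DPM over state-of-the-art methods. The code is released at https://github.com/stone96123/dpm.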

Deep Learning-based Occluded Person Re-identification: A Survey

Yunjie Peng , Saihui Hou , Chunshui Cao , Xu Liu , Yongzhen Huang , Zhiqiang He

Category: Computer Vision

2022-07-29

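Occluded person re-identification (Re-ID) aims to address the occlusion problem when retrieving a person of interest across multiple cameras. With the advance of deep learning and the growing demand for intelligent video surveillance, the frequent occlusions in real-world applications have drawn considerable research interest to occluded person Re-ID. A large number of occluded person Re-ID methods have been proposed, yet few surveys focus on occlusion. To fill this gap and help boost future research, this paper provides a systematic survey of occluded person Re-ID. Through an in-depth analysis of occlusion in person Re-ID, most existing methods are found to consider only part of the problems that occlusion brings. We therefore review occlusion-related person Re-ID methods from the perspective of issues and solutions. We summarize four issues caused by occlusion in person Re-ID: position misalignment, scale misalignment, noisy information, and missing information. The occlusion-related methods addressing the different issues are then categorized and introduced accordingly. After that, we summarize and compare the performance of recent occluded person Re-ID methods on four popular datasets: Partial-ReID, Partial-iLIDS, Occluded-ReID, and Occluded-DukeMTMC. Finally, we provide insights on promising future research directions.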

Dynamic Template Initialization for Part-Aware Person Re-ID

Kalana Abeywardena , Shechem Sumanthiran , Sanoojan Baliah , Nadarasar Bahavan , Nalith Udugampola , Ajith Pasqual , Chamira Edussooriya , Ranga Rodrigo

Category: Computer Vision | Artificial Intelligence

2022-08-24

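Many existing person re-identification (Re-ID) approaches depend on feature maps that are either partitioned to localize parts of a person or reduced to create a global representation. While part localization has shown significant success, it uses either position-based partitions or static feature templates. These, however, presuppose the existence of the parts in a given image, or their positions, and ignore image-specific information, which limits their usability in challenging scenarios such as Re-ID with partial occlusions and partial probe images. In this paper, we introduce a spatial attention-based Dynamic Part Template Initialization module that dynamically generates part templates using mid-level semantic features from the earlier layers of the backbone. Following a self-attention layer, the human part-level features of the backbone are used to extract template features of diverse human body parts through a simplified cross-attention scheme, increasing the discriminative ability of the entire model. We further explore adaptive weighting of part descriptors to quantify the absence or occlusion of local attributes and suppress the contribution of the corresponding part descriptors to the matching criteria. Extensive experiments on holistic, occluded, and partial Re-ID benchmarks demonstrate that our proposed architecture achieves competitive performance. Code will be included in the supplementary material and will be made publicly available.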

PGGANet: Pose Guided Graph Attention Network for Person Re-identification

Zhijun He , Hongbo Zhao , Wenquan Feng

Category: Computer Vision

2021-11-29

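Person re-identification (ReID) aims to retrieve a person from images captured by different cameras. For deep learning-based ReID methods, it has been shown that using local features together with the global feature of a person image helps to produce robust feature representations for person retrieval. Human pose information provides the locations of the human skeleton, which can effectively guide the network to pay more attention to these key areas and can also help reduce noise and distraction from background or occlusions. However, the methods proposed in previous pose-related works may not fully exploit the benefits of pose information and do not consider the different contributions of different local features. In this paper, we propose a pose guided graph attention network, a multi-branch architecture consisting of one branch for the global feature, one branch for mid-granularity body features, and one branch for fine-granularity keypoint features. We use a pre-trained pose estimator to generate keypoint heatmaps for local feature learning, and carefully design a graph convolution layer to re-evaluate the contribution weights of the extracted local features by modeling their similarity relations. Experimental results demonstrate the effectiveness of our approach for discriminative feature learning, and we show that our model achieves state-of-the-art performance on several mainstream evaluation datasets. We also conduct extensive ablation studies and design different kinds of comparison experiments to demonstrate the network's effectiveness and robustness, covering holistic datasets, partial datasets, occluded datasets, and cross-domain tests.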
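As an illustration of how a keypoint heatmap can drive local feature learning, the sketch below pools a feature map with heatmap weights to obtain one keypoint-level feature vector. This is a generic technique chosen for illustration, with hypothetical names; the abstract does not state that PGGANet pools features in exactly this way.

```python
def heatmap_pooled_feature(feature_map, heatmap, eps=1e-12):
    """Heatmap-weighted average of a feature map.

    feature_map[r][c]: list of channel values at spatial position (r, c).
    heatmap[r][c]: non-negative keypoint confidence at the same position.
    Returns one vector with a value per channel.
    (Illustrative helper, not taken from the PGGANet code.)
    """
    channels = len(feature_map[0][0])
    total = sum(w for row in heatmap for w in row)
    if total < eps:
        return [0.0] * channels  # keypoint not detected anywhere
    pooled = [0.0] * channels
    for feat_row, heat_row in zip(feature_map, heatmap):
        for feat, w in zip(feat_row, heat_row):
            for ch in range(channels):
                pooled[ch] += w * feat[ch]
    return [v / total for v in pooled]
```

Positions where the pose estimator is confident dominate the pooled vector, so background and occluder regions with near-zero heatmap response contribute little.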

DiP: Learning Discriminative Implicit Parts for Person Re-Identification

Dengjie Li , Siyu Chen , Yujie Zhong , Fan Liang , Lin Ma

Category: Computer Vision

2022-12-24

In person re-identification (ReID) tasks, many works explore the learning of part features to improve the performance over global image features. Existing methods extract part features in an explicit manner, by either using a hand-designed image division or keypoints obtained with external visual systems. In this work, we propose to learn Discriminative implicit Parts (DiPs) which are decoupled from explicit body parts. Therefore, DiPs can learn to extract any discriminative features that can benefit in distinguishing identities, which is beyond predefined body parts (such as accessories). Moreover, we propose a novel implicit position to give a geometric interpretation for each DiP. The implicit position can also serve as a learning signal to encourage DiPs to be more position-equivariant with the identity in the image. Lastly, a set of attributes and auxiliary losses are introduced to further improve the learning of DiPs. Extensive experiments show that the proposed method achieves state-of-the-art performance on multiple person ReID benchmarks.

Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer

Tao Wang , Hong Liu , Pinhao Song , Tianyu Guo , Wei Shi

Category: Computer Vision

2021-12-05

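Occluded person re-identification is a challenging task because human body parts can be occluded by obstacles (e.g. trees, cars, and pedestrians) in certain scenes. Some existing pose-guided methods address this problem by aligning body parts through graph matching, but these graph-based methods are complicated and not intuitive. We therefore propose a transformer-based Pose-guided Feature Disentangling (PFD) method, which uses pose information to clearly disentangle semantic components (e.g. human body or joint parts) and selectively match the non-occluded parts accordingly. First, a Vision Transformer (ViT) is used to extract patch features, owing to its strong capability. Second, to preliminarily disentangle the pose information from the patch information, a matching and distributing mechanism is employed in the Pose-guided Feature Aggregation (PFA) module. Third, a set of learnable semantic views is introduced in the transformer decoder to implicitly enhance the disentangled body part features. However, without additional supervision, those semantic views are not guaranteed to be related to the body. Therefore, a Pose-View Matching (PVM) module is proposed to explicitly match visible body parts and automatically separate occlusion features. Fourth, to better prevent interference from occlusions, we design a Pose-guided Push Loss that emphasizes the features of visible body parts. Extensive experiments on five challenging datasets for two tasks (occluded and holistic Re-ID) demonstrate that our proposed PFD is highly promising and performs favorably against state-of-the-art methods. Code is available at https://github.com/wangtaoas/pfd_net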

Short Range Correlation Transformer for Occluded Person Re-Identification

Yunbin Zhao , Songhao Zhu , Dongsheng Wang , Zhiwei Liang

Category: Computer Vision | Artificial Intelligence

2022-01-04

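Occluded person re-identification is one of the challenging areas of computer vision, facing problems such as inefficient feature representation and low recognition accuracy. Convolutional neural networks pay more attention to extracting local features, so they have difficulty extracting features of occluded pedestrians and their results are unsatisfactory. Recently, the vision transformer has been introduced into the re-identification field and achieves state-of-the-art results by modeling the relationships among global features across patch sequences. However, the vision transformer is inferior to convolutional neural networks at extracting local features. We therefore design a person re-identification framework based on a partial feature transformer, named PFT. The proposed PFT uses three modules to enhance the efficiency of the vision transformer. (1) Patch full dimension enhancement module. We design a learnable tensor of the same size as the patch sequences, which is full-dimensional and deeply embedded in the patch sequences to enrich the diversity of training samples. (2) Fusion and reconstruction module. We extract the less important part of the obtained patch sequences and fuse it with the original patch sequence to reconstruct the original patch sequence. (3) Spatial slicing module. We slice and group the patch sequences along the spatial direction, which effectively improves the short-range correlation of the patch sequences. Experimental results on occluded and holistic re-identification datasets demonstrate that the proposed PFT network consistently achieves superior performance and outperforms state-of-the-art methods.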

Deep Learning for Person Re-identification: A Survey and Outlook

Mang Ye , Jianbing Shen , Gaojie Lin , Tao Xiang , Ling Shao , Steven C. H. Hoi

Category:

2020-01-13

Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and increasing demand of intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis for closed-world person Re-ID from three different perspectives, including deep feature representation learning, deep metric learning and ranking optimization. With the performance saturation under closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, facing more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize the open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for FOUR different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost for finding all the correct matches, which provides an additional criteria to evaluate the Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.
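The mINP metric mentioned in this abstract can be sketched as follows. The abstract itself only says the metric reflects the cost of finding all correct matches; the formula used here (number of correct matches divided by the rank of the last-retrieved correct match, averaged over queries) is an assumption based on how the metric is commonly described, and the function names are hypothetical.

```python
def inverse_negative_penalty(ranked_matches):
    """INP for one query.

    ranked_matches: list of booleans over the ranked gallery,
    True where the gallery item shares the query's identity.
    Assumed definition: (#correct matches) / (rank of the hardest correct match).
    """
    num_correct = sum(ranked_matches)
    if num_correct == 0:
        raise ValueError("query has no correct match in the gallery")
    # 1-based rank position of the last correct match in the list
    hardest_rank = max(i for i, hit in enumerate(ranked_matches, start=1) if hit)
    return num_correct / hardest_rank


def mean_inp(all_ranked_matches):
    """Average INP over all queries."""
    scores = [inverse_negative_penalty(r) for r in all_ranked_matches]
    return sum(scores) / len(scores)
```

Under this definition a query scores 1.0 only when every correct match is ranked ahead of every wrong one, so a single hard positive buried deep in the list lowers the score even if Rank-1 is correct.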

Deep learning-based person re-identification methods: A survey and outlook of recent works

Zhangqiang Ming , Min Zhu , Xiangkun Wang , Jiamin Zhu , Junlong Cheng , Yong Yang , Xiaoyong Wei

Category: Computer Vision

2021-10-10

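In recent years, with the increasing demand for public safety and the rapid development of intelligent surveillance networks, person re-identification (Re-ID) has become one of the hot research topics in computer vision. The main research goal of person Re-ID is to retrieve persons with the same identity from different cameras. However, traditional person Re-ID methods require manual marking of person targets, which incurs a high labor cost. With the widespread application of deep neural networks, many deep learning-based person Re-ID methods have emerged. This paper therefore aims to help researchers understand the latest research results and future trends in the field. First, we summarize several recently published person Re-ID surveys and supplement them with the latest research methods in order to systematically classify deep learning-based person Re-ID methods. Second, we propose a multi-dimensional taxonomy that classifies deep learning-based person Re-ID methods into four categories according to metric and representation learning: deep metric learning, local feature learning, generative adversarial learning, and sequence feature learning. Furthermore, we subdivide these four categories according to their methodologies and motivations, and discuss the advantages and limitations of some subcategories. Finally, we discuss some challenges and possible research directions for person Re-ID.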

Multiplex-detection Based Multiple Instance Learning Network for Whole Slide Image Classification

Zhikang Wang , Yue Bi , Tong Pan , Chris Bain , Richard Bassed , Seiya Imoto , Jianhua Yao , Jiangning Song

Category: Computer Vision

2022-08-06

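Multiple instance learning (MIL) is a powerful approach for classifying whole slide images (WSIs) in diagnostic pathology. A fundamental challenge of MIL for WSI classification is to discover the critical instances that trigger the bag label. However, previous methods are primarily designed under the independent and identically distributed (i.i.d.) assumption, ignoring either the correlations between instances or the heterogeneity of tumours. In this paper, we propose a novel multiplex-detection-based multiple instance learning (MDMIL) method to tackle these issues. Specifically, MDMIL is built from an internal query generation module (IQGM) and a multiplex detection module (MDM), and is assisted by a memory-based contrastive loss during training. First, IQGM gives the probability of each instance and generates the internal query (IQ) for the subsequent MDM by aggregating highly reliable features after a distribution analysis. Second, in MDM, multiplex-detection cross-attention (MDCA) and multi-head self-attention (MHSA) cooperate to generate the final representation of the WSI. In this process, the IQ and a trainable variational query (VQ) build up the connections between instances and significantly improve the model's robustness to heterogeneous tumours. Finally, to further enforce constraints in the feature space and stabilize training, we adopt a memory-based contrastive loss, which is practicable for WSI classification even with a single sample as input in each iteration. We conduct experiments on three computational pathology datasets: CAMELYON16, TCGA-NSCLC, and TCGA-RCC. The superior accuracy and AUC demonstrate the advantage of our proposed MDMIL over other state-of-the-art methods.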

Learning to Disentangle Scenes for Person Re-identification

Xianghao Zang , Ge Li , Wei Gao , Xiujun Shu

Category: Computer Vision

2021-11-10

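There are many challenging problems in the person re-identification (ReID) task, such as occlusion and scale variation. Existing works usually try to solve them with a single-branch network. This single-branch network needs to be robust to all of these challenging problems, which leaves it overburdened. This paper proposes to divide and conquer the ReID task. To this end, we employ several self-supervision operations to simulate different challenging problems and handle each of them with a different network. Specifically, we use the random erasing operation and propose a novel random scaling operation to generate new images with controllable characteristics. A general multi-branch network, comprising one master branch and two servant branches, is introduced to handle the different scenes. These branches learn collaboratively and acquire different perceptive abilities. In this way, the complex scenes in the ReID task are effectively disentangled and the burden on each branch is relieved. Results from extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on three ReID benchmarks and two occluded ReID benchmarks. An ablation study also shows that the proposed scheme and operations significantly improve performance in various scenes.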

Learning Context-Aware Embedding for Person Search

Shihui Chen , Yueqing Zhuang , Boxun Li

Category: Computer Vision

2021-11-29

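Person search is a task that aims to jointly solve person detection and person re-identification (Re-ID). Although most previous methods focus on learning robust individual features, it is still hard to distinguish confusable persons because of illumination, large pose variance, and occlusion. Contextual information is practically available in the person search task and benefits the search by reducing confusion. To this end, we present a novel contextual feature head named Attention Context-Aware Embedding (ACAE), which enhances contextual information. ACAE repeatedly reviews the person features within and across images to find similar pedestrian patterns, allowing it to implicitly learn to find possible co-travelers and to efficiently model the relations among contextually relevant instances. Moreover, we propose an Image Memory Bank to improve training efficiency. Experimentally, ACAE brings broad improvements when built on different one-step methods. Our overall method achieves state-of-the-art results compared with previous one-step methods.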

Weakly-supervised Part-Attention and Mentored Networks for Vehicle Re-Identification

Lisha Tang , Yi Wang , Lap-Pui Chau

Category: Computer Vision

2021-07-17

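Vehicle re-identification (Re-ID) aims to retrieve images with the same vehicle ID across different cameras. Current part-level feature learning methods typically detect vehicle parts via uniform division, external tools, or attention modeling. However, such part features often require expensive additional annotations and lead to suboptimal performance when part mask predictions are unreliable. In this paper, we propose a weakly-supervised Part-Attention Network (PANet) and a Part-Mentored Network (PMNet) for vehicle Re-ID. First, PANet localizes vehicle parts via part-relevant channel recalibration and cluster-based mask generation, without vehicle part supervision. Second, PMNet uses teacher-guided learning to distill vehicle part-specific features from PANet and performs multi-scale global-part feature extraction. During inference, PMNet can adaptively extract discriminative part features without part localization by PANet, which avoids unstable part mask predictions. We treat the Re-ID problem as a multi-task problem and adopt homoscedastic uncertainty to learn the optimal weighting of the ID losses. Experiments on two public benchmarks show that our approach, which requires no extra annotations, outperforms recent methods, with an average gain of 3.0% in CMC@5 and over 1.4% in mAP on VeRi776. Moreover, our method can be extended to the occluded vehicle Re-ID task and exhibits good generalization ability.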
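The idea of weighting several losses by homoscedastic uncertainty can be sketched with one widely used formulation, in which each task has a learnable log-variance s and contributes exp(-s) * loss + s to the total. This is an assumed form given for illustration; the abstract does not state the exact expression the paper uses, and the function name is hypothetical.

```python
import math


def uncertainty_weighted_loss(losses, log_variances):
    """Combine task losses using learnable log-variances (assumed formulation).

    Each task contributes exp(-s) * loss + s, where s = log(sigma^2).
    A large s down-weights a noisy task, while the additive s term
    keeps s from growing without bound.
    """
    if len(losses) != len(log_variances):
        raise ValueError("need one log-variance per loss")
    return sum(math.exp(-s) * loss + s
               for loss, s in zip(losses, log_variances))
```

In training, the log-variances would be optimized jointly with the network weights, so the balance between losses is learned rather than hand-tuned. With all log-variances at zero the expression reduces to a plain sum of the losses.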

Deep Multimodal Fusion for Generalizable Person Re-identification

Suncheng Xiang , Hao Chen , Wei Ran , Zefang Yu , Ting Liu , Dahong Qian , Yuzhuo Fu

Category: Computer Vision

2022-11-02

Person re-identification plays a significant role in realistic scenarios due to its various applications in public security and video surveillance. Recently, leveraging the supervised or semi-unsupervised learning paradigms, which benefits from the large-scale datasets and strong computing performance, has achieved a competitive performance on a specific target domain. However, when Re-ID models are directly deployed in a new domain without target samples, they always suffer from considerable performance degradation and poor domain generalization. To address this challenge, we propose a Deep Multimodal Fusion network to elaborate rich semantic knowledge for assisting in representation learning during the pre-training. Importantly, a multimodal fusion strategy is introduced to translate the features of different modalities into the common space, which can significantly boost generalization capability of Re-ID model. As for the fine-tuning stage, a realistic dataset is adopted to fine-tune the pre-trained model for better distribution alignment with real-world data. Comprehensive experiments on benchmarks demonstrate that our method can significantly outperform previous domain generalization or meta-learning methods with a clear margin. Our source code will also be publicly available at https://github.com/JeremyXSC/DMF.

Body Part-Based Representation Learning for Occluded Person Re-Identification

Vladimir Somers , Christophe De Vleeschouwer , Alexandre Alahi

Category: Computer Vision

2022-11-07

Occluded person re-identification (ReID) is a person retrieval task which aims at matching occluded person images with holistic ones. For addressing occluded ReID, part-based methods have been shown beneficial as they offer fine-grained information and are well suited to represent partially visible human bodies. However, training a part-based model is a challenging task for two reasons. Firstly, individual body part appearance is not as discriminative as global appearance (two distinct IDs might have the same local appearance), this means standard ReID training objectives using identity labels are not adapted to local feature learning. Secondly, ReID datasets are not provided with human topographical annotations. In this work, we propose BPBreID, a body part-based ReID model for solving the above issues. We first design two modules for predicting body part attention maps and producing body part-based features of the ReID target. We then propose GiLt, a novel training scheme for learning part-based representations that is robust to occlusions and non-discriminative local appearance. Extensive experiments on popular holistic and occluded datasets show the effectiveness of our proposed method, which outperforms state-of-the-art methods by 0.7% mAP and 5.6% rank-1 accuracy on the challenging Occluded-Duke dataset. Our code is available at https://github.com/VlSomers/bpbreid.

A Survey of Face Recognition

Xinyi Wang , Jianteng Peng , Sufang Zhang , Bihui Chen , Yi Wang , Yandong Guo

Category: Computer Vision

2022-12-26

Recent years witnessed the breakthrough of face recognition with deep convolutional neural networks. Dozens of papers in the field of FR are published every year. Some of them were applied in the industrial community and played an important role in human life such as device unlock, mobile payment, and so on. This paper provides an introduction to face recognition, including its history, pipeline, algorithms based on conventional manually designed features or deep learning, mainstream training, evaluation datasets, and related applications. We have analyzed and compared state-of-the-art works as many as possible, and also carefully designed a set of experiments to find the effect of backbone size and data distribution. This survey is a material of the tutorial named The Practical Face Recognition Technology in the Industrial World in the FG2023.

Learning Discriminative Features with Multiple Granularities for Person Re-Identification

Guanshuo Wang , Yufeng Yuan , Xiong Chen , Jiwei Li , Xi Zhou

Category:

2018-04-04

The combination of global and partial features has been an essential solution to improve discriminative performances in person re-identification (Re-ID) tasks. Previous part-based methods mainly focus on locating regions with specific pre-defined semantics to learn local representations, which increases learning difficulty but not efficient or robust to scenarios with large variances. In this paper, we propose an end-to-end feature learning strategy integrating discriminative information with various granularities. We carefully design the Multiple Granularity Network (MGN), a multi-branch deep network architecture consisting of one branch for global feature representations and two branches for local feature representations. Instead of learning on semantic regions, we uniformly partition the images into several stripes, and vary the number of parts in different local branches to obtain local feature representations with multiple granularities. Comprehensive experiments implemented on the mainstream evaluation datasets including Market-1501, DukeMTMC-reid and CUHK03 indicate that our method robustly achieves state-of-the-art performances and outperforms any existing approaches by a large margin. For example, on Market-1501 dataset in single query mode, we obtain a top result of Rank-1/mAP=96.6%/94.2% with this method after re-ranking.
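The uniform stripe partition described in this abstract can be sketched as below: one feature map is split into different numbers of horizontal stripes, giving local descriptors at several granularities. The single-channel nested-list feature map, average pooling, and the stripe counts (1, 2, 3) are assumptions for illustration; the abstract does not fix these details.

```python
def stripe_features(feature_map, num_stripes):
    """Split a feature map (list of rows) into equal horizontal stripes
    and average-pool each stripe into one value."""
    height = len(feature_map)
    if height % num_stripes != 0:
        raise ValueError("height must be divisible by num_stripes")
    rows_per_stripe = height // num_stripes
    pooled = []
    for s in range(num_stripes):
        rows = feature_map[s * rows_per_stripe:(s + 1) * rows_per_stripe]
        values = [v for row in rows for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def multi_granularity(feature_map, stripe_counts=(1, 2, 3)):
    """Pool the same map at several granularities.

    One stripe acts as the global descriptor; larger counts give finer
    local descriptors. The counts are illustrative defaults.
    """
    return {n: stripe_features(feature_map, n) for n in stripe_counts}
```

Varying the stripe count is what distinguishes the branches: coarse stripes tolerate misalignment, while fine stripes capture more localized detail.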

Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification

Ziyi Tang , Ruimao Zhang , Zhanglin Peng , Jinrui Chen , Liang Lin

Category: Computer Vision

2023-01-02

In recent years, the Transformer architecture has shown its superiority in the video-based person re-identification task. Inspired by video representation learning, these methods mainly focus on designing modules to extract informative spatial and temporal features. However, they are still limited in extracting local attributes and global identity information, which are critical for the person re-identification task. In this paper, we propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two novel designed proxy embedding modules to address the above issue. Specifically, MSTAT consists of three stages to encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips, respectively, achieving the holistic perception of the input person. We combine the outputs of all the stages for the final identification. In practice, to save the computational cost, the Spatial-Temporal Aggregation (STA) modules are first adopted in each stage to conduct the self-attention operations along the spatial and temporal dimensions separately. We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract the informative and discriminative feature representations at different stages. All of them are realized by employing newly designed self-attention operations with specific meanings. Moreover, temporal patch shuffling is also introduced to further improve the robustness of the model. Extensive experimental results demonstrate the effectiveness of the proposed modules in extracting the informative and discriminative information from the videos, and illustrate the MSTAT can achieve state-of-the-art accuracies on various standard benchmarks.
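The computational saving from running self-attention along the spatial and temporal dimensions separately can be checked with simple arithmetic. The sketch below counts query-key pairs for joint attention over all tokens of a clip versus the factorized scheme; the function name and the example clip size are hypothetical, and the count ignores heads, channels, and any proxy tokens.

```python
def attention_pairs(num_frames, patches_per_frame):
    """Return (joint, factorized) query-key pair counts for one attention layer.

    joint: every token attends to every token in the clip.
    factorized: spatial attention within each frame, plus temporal
    attention across frames at each patch position.
    """
    tokens = num_frames * patches_per_frame
    joint = tokens ** 2
    spatial = num_frames * patches_per_frame ** 2
    temporal = patches_per_frame * num_frames ** 2
    return joint, spatial + temporal
```

For a hypothetical clip of 8 frames with 128 patches each, joint attention needs 1,048,576 pairs while the factorized scheme needs 139,264, roughly a 7.5x reduction.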
