In person re-identification (ReID), many works learn part features to improve on the performance of global image features. Existing methods extract part features explicitly, using either a hand-designed image division or keypoints obtained from external visual systems. In this work, we propose to learn Discriminative implicit Parts (DiPs), which are decoupled from explicit body parts. DiPs can therefore learn to extract any discriminative feature that helps distinguish identities, including cues beyond predefined body parts (such as accessories). Moreover, we propose a novel implicit position to give a geometric interpretation for each DiP. The implicit position also serves as a learning signal that encourages DiPs to be more position-equivariant with the identity in the image. Lastly, a set of attributes and auxiliary losses is introduced to further improve the learning of DiPs. Extensive experiments show that the proposed method achieves state-of-the-art performance on multiple person ReID benchmarks.
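Occluded person re-identification (ReID) aims at matching occluded person images to holistic ones across different camera views. Target Pedestrians (TP) are usually disturbed by Non-Pedestrian Occlusions (NPO) and Non-Target Pedestrians (NTP). Previous methods mainly focus on increasing the model's robustness against NPO while ignoring feature contamination from NTP. In this paper, we propose a novel Feature Erasing and Diffusion Network (FED) to handle NPO and NTP simultaneously. Specifically, NPO features are eliminated by our proposed Occlusion Erasing Module (OEM), aided by an NPO augmentation strategy that simulates NPO on holistic pedestrian images and generates precise occlusion masks. Subsequently, we diffuse the pedestrian representations with other memorized features to synthesize NTP characteristics in the feature space, which is achieved by a novel Feature Diffusion Module (FDM) through a learnable cross-attention mechanism. Guided by the occlusion scores from OEM, the feature diffusion process is mainly conducted on visible body parts, which guarantees the quality of the synthesized NTP characteristics. By jointly optimizing OEM and FDM in the proposed FED network, we can greatly improve the model's perception of TP and alleviate the influence of NPO and NTP. Furthermore, the proposed FDM works only as an auxiliary module for training and is discarded in the inference phase, thus introducing little inference overhead. Experiments on occluded and holistic person ReID benchmarks demonstrate the superiority of FED over the state of the art: FED achieves 86.3% Rank-1 accuracy on Occluded-REID, surpassing other methods by at least 4.7%.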
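Although person re-identification has improved impressively in recent years, the common occlusion case caused by various obstacles is still an unsettled issue in real application scenarios. Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible parts. Nevertheless, the inevitable domain gap between the assistant model and the ReID datasets greatly increases the difficulty of obtaining an effective and efficient model. To dispense with extra pre-trained networks and achieve automatic alignment in an end-to-end trainable network, we propose a novel Dynamic Prototype Mask (DPM) based on two self-evident pieces of prior knowledge. Specifically, we first devise a Hierarchical Mask Generator that uses hierarchical semantics to select the visible pattern space between the high-quality holistic prototype and the feature representation of the occluded input image. Under this condition, the occluded representation can be well aligned in the selected subspace spontaneously. Then, to enrich the feature representation of the high-quality holistic prototype and provide a more complete feature space, we introduce a Head Enrich Module that encourages different heads to aggregate different pattern representations over the whole image. Extensive experimental evaluations on occluded and holistic person re-identification benchmarks demonstrate the superior performance of DPM over state-of-the-art methods. The code is released at https://github.com/stone96123/dpm.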
Occluded person re-identification (ReID) is a person retrieval task which aims at matching occluded person images with holistic ones. For addressing occluded ReID, part-based methods have been shown to be beneficial, as they offer fine-grained information and are well suited to represent partially visible human bodies. However, training a part-based model is challenging for two reasons. Firstly, individual body part appearance is not as discriminative as global appearance (two distinct IDs might have the same local appearance), which means that standard ReID training objectives using identity labels are not adapted to local feature learning. Secondly, ReID datasets are not provided with human topographical annotations. In this work, we propose BPBreID, a body part-based ReID model for solving the above issues. We first design two modules for predicting body part attention maps and producing body part-based features of the ReID target. We then propose GiLt, a novel training scheme for learning part-based representations that is robust to occlusions and non-discriminative local appearance. Extensive experiments on popular holistic and occluded datasets show the effectiveness of our proposed method, which outperforms state-of-the-art methods by 0.7% mAP and 5.6% rank-1 accuracy on the challenging Occluded-Duke dataset. Our code is available at https://github.com/VlSomers/bpbreid.
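Occluded person re-identification is a challenging area of computer vision that faces problems such as inefficient feature representation and low recognition accuracy. Convolutional neural networks pay more attention to the extraction of local features, so they struggle to extract features of occluded pedestrians and the results are unsatisfactory. Recently, the vision transformer has been introduced into the field of re-identification and achieves state-of-the-art results by modeling the relationships of global features between patch sequences. However, the vision transformer is inferior to convolutional neural networks at extracting local features. We therefore design a partial feature transformer-based person re-identification framework named PFT. The proposed PFT uses three modules to enhance the efficiency of the vision transformer. (1) Patch full dimension enhancement module. We design a learnable tensor with the same size as the patch sequences, which is full-dimensional and deeply embedded in the patch sequences to enrich the diversity of training samples. (2) Fusion and reconstruction module. We extract the less important part of the obtained patch sequences and fuse it with the original patch sequence to reconstruct the original patch sequences. (3) Spatial slicing module. We slice and group patch sequences along the spatial direction, which effectively improves the short-range correlation of patch sequences. Experimental results on occluded and holistic re-identification datasets demonstrate that the proposed PFT network consistently achieves superior performance and outperforms state-of-the-art methods.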
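Occluded person re-identification is a challenging task, as human body parts can be occluded by obstacles (e.g. trees, cars, and pedestrians) in certain scenes. Some existing pose-guided methods solve this problem by aligning body parts through graph matching, but these graph-based methods are unintuitive and complicated. We therefore propose a transformer-based Pose-guided Feature Disentangling (PFD) method that uses pose information to clearly disentangle semantic components (e.g. human body or joint parts) and selectively match non-occluded parts correspondingly. First, a Vision Transformer (ViT) is used to extract patch features with its strong capability. Second, to preliminarily disentangle the pose information from the patch information, a matching and distributing mechanism is leveraged in the Pose-guided Feature Aggregation (PFA) module. Third, a set of learnable semantic views is introduced in the transformer decoder to implicitly enhance the disentangled body part features. However, without additional supervision, those semantic views are not guaranteed to be related to the body. Therefore, a Pose-View Matching (PVM) module is proposed to explicitly match visible body parts and automatically separate occlusion features. Fourth, to better prevent interference from occlusions, we design a Pose-guided Push Loss that emphasizes the features of visible body parts. Extensive experiments on five challenging datasets for two tasks (occluded and holistic Re-ID) demonstrate that the proposed PFD is highly promising and performs favorably against state-of-the-art methods. Code is available at https://github.com/wangtaoas/pfd_net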
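Occlusion poses a major challenge for person re-identification (ReID). Existing approaches typically rely on outside tools to infer visible body parts, which may be suboptimal in terms of both computational efficiency and ReID accuracy. In particular, they may fail when facing complex occlusions, such as occlusions between pedestrians. Accordingly, in this paper we propose a novel method named Quality-aware Part Models (QPM) for occlusion-robust ReID. First, we propose to jointly learn part features and predict part quality scores. As no quality annotation is available, we introduce a strategy that automatically assigns low scores to occluded body parts, thereby weakening the impact of occluded body parts on ReID results. Second, based on the predicted part quality scores, we propose a novel identity-aware spatial attention (ISA) module. In this module, a coarse identity-aware feature is used to highlight the pixels of the target pedestrian, so as to handle occlusion between pedestrians. Third, we design an adaptive and efficient approach for generating global features from the common non-occluded regions of each image pair. This design is crucial but is often ignored by existing methods. QPM has three key advantages: 1) it does not rely on any outside tools in either the training or inference stage; 2) it handles occlusions caused by both objects and other pedestrians; 3) it is highly computationally efficient. Experimental results on four popular databases for occluded ReID demonstrate that QPM consistently outperforms state-of-the-art methods by significant margins. The code of QPM will be released.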
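Person re-identification (ReID) aims to retrieve a person from images captured by different cameras. For deep-learning-based ReID methods, it has been shown that using local features together with the global feature of a person image helps provide robust feature representations for person retrieval. Human pose information gives the locations of the human skeleton, which can effectively guide the network to pay more attention to these key areas and may also help reduce noise distractions from background or occlusion. However, the methods proposed by previous pose-related works may not fully exploit the benefits of pose information and do not consider the different contributions of different local features. In this paper, we propose a pose-guided graph attention network, a multi-branch architecture consisting of one branch for the global feature, one branch for mid-granularity body features, and one branch for fine-granularity keypoint features. We use a pre-trained pose estimator to generate keypoint heatmaps for local feature learning and carefully design a graph convolution layer that re-evaluates the contribution weights of the extracted local features by modeling their similarity relations. Experimental results demonstrate the effectiveness of our approach for discriminative feature learning, and we show that our model achieves state-of-the-art performance on several mainstream evaluation datasets. We also conduct extensive ablation studies and design different kinds of comparison experiments to demonstrate the effectiveness and robustness of our network, covering holistic datasets, partial datasets, occluded datasets, and cross-domain tests.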
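In person re-identification (ReID), recent studies have validated that pre-training models on unlabeled person images is much better than pre-training on ImageNet. However, these studies directly apply existing self-supervised learning (SSL) methods designed for image classification to ReID without any adaptation of the framework. These SSL methods match the outputs of local views (e.g. red T-shirt, blue shorts) to those of the global views at the same time, losing many details. In this paper, we propose a ReID-specific pre-training method, Part-Aware Self-Supervised pre-training (PASS), which generates part-level features to offer fine-grained information and is more suitable for ReID. PASS divides each image into several local areas, and the local views randomly cropped from each area are assigned a specific learnable [PART] token. In addition, the [PART] tokens of all local areas are appended to the global views. PASS learns to match the outputs of the local views and the global views on the same [PART]. That is, the [PART] learned from the local views of a local area is matched only with the corresponding [PART] learned from the global views. As a result, each [PART] can focus on a specific local area of the image and extract fine-grained information from that area. Experiments show that PASS sets new state-of-the-art performance on Market1501 and MSMT17 across various ReID tasks; for example, a vanilla ViT-S/16 pre-trained by PASS achieves 92.2%/90.2%/88.5% mAP on Market1501 for supervised/UDA/USL ReID. Our code is available at https://github.com/casia-iva-lab/pass-reid.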
Person re-identification plays a significant role in realistic scenarios due to its various applications in public security and video surveillance. Recently, supervised and semi-unsupervised learning paradigms, which benefit from large-scale datasets and strong computing performance, have achieved competitive performance on specific target domains. However, when Re-ID models are directly deployed in a new domain without target samples, they suffer from considerable performance degradation and poor domain generalization. To address this challenge, we propose a Deep Multimodal Fusion network that exploits rich semantic knowledge to assist representation learning during pre-training. Importantly, a multimodal fusion strategy is introduced to translate the features of different modalities into a common space, which can significantly boost the generalization capability of Re-ID models. In the fine-tuning stage, a realistic dataset is adopted to fine-tune the pre-trained model for better distribution alignment with real-world data. Comprehensive experiments on benchmarks demonstrate that our method significantly outperforms previous domain generalization and meta-learning methods by a clear margin. Our source code will also be publicly available at https://github.com/JeremyXSC/DMF.
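Occluded person re-identification (Re-ID) aims at addressing the occlusion problem when retrieving a person of interest across multiple cameras. With the advance of deep learning technology and the increasing demand for intelligent video surveillance, the frequent occlusion in real-world applications has made occluded person Re-ID draw considerable interest from researchers. A large number of occluded person Re-ID methods have been proposed, while few surveys focus on occlusion. To fill this gap and help boost future research, this paper provides a systematic survey of occluded person Re-ID. Through an in-depth analysis of occlusion in person Re-ID, we find that most existing methods consider only part of the problems caused by occlusion. Therefore, we review occlusion-related person Re-ID methods from the perspective of issues and solutions. We summarize four issues caused by occlusion in person Re-ID: position misalignment, scale misalignment, noisy information, and missing information. The occlusion-related methods addressing the different issues are then categorized and introduced accordingly. After that, we summarize and compare the performance of recent occluded person Re-ID methods on four popular datasets: Partial-ReID, Partial-iLIDS, Occluded-ReID, and Occluded-DukeMTMC. Finally, we provide insights on promising future research directions.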
In recent years, the Transformer architecture has shown its superiority in the video-based person re-identification task. Inspired by video representation learning, these methods mainly focus on designing modules to extract informative spatial and temporal features. However, they are still limited in extracting local attributes and global identity information, which are critical for the person re-identification task. In this paper, we propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two newly designed proxy embedding modules to address this issue. Specifically, MSTAT consists of three stages that encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips, respectively, achieving a holistic perception of the input person. We combine the outputs of all the stages for the final identification. In practice, to save computational cost, Spatial-Temporal Aggregation (STA) modules are first adopted in each stage to conduct self-attention along the spatial and temporal dimensions separately. We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract informative and discriminative feature representations at different stages. All of them are realized by newly designed self-attention operations with specific meanings. Moreover, temporal patch shuffling is introduced to further improve the robustness of the model. Extensive experimental results demonstrate the effectiveness of the proposed modules in extracting informative and discriminative information from videos, and show that MSTAT achieves state-of-the-art accuracy on various standard benchmarks.
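Transformers have shown preferable performance on many vision tasks. However, for the task of person re-identification (ReID), vanilla transformers leave the rich context of high-order feature relations under-exploited and degrade local feature details, which is insufficient given the dramatic variations of pedestrians. In this work, we propose an Omni-Relational High-Order Transformer (OH-Former) to model omni-relational features for ReID. First, to strengthen the capacity of the visual representation, instead of obtaining the attention matrix from pairs of queries and isolated keys at each spatial location, we go a step further and model high-order statistical information for the non-local mechanism. We share the attention weights in the corresponding layer of each order with a prior mixing mechanism to reduce the computational cost. Then, a convolution-based local relation perception module is proposed to extract local relations and 2D position information. The experimental results of our model are highly promising, showing state-of-the-art performance on the Market-1501, DukeMTMC, MSMT17, and Occluded-Duke datasets.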
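Many existing person re-identification (Re-ID) approaches depend on feature maps that are either partitioned to localize parts of a person or reduced to create a global representation. While part localization has shown significant success, it uses either location-based partitions or static feature templates. These, however, presuppose the existence of the parts in a given image or their positions, ignoring image-specific information, which limits their usability in challenging scenarios such as Re-ID with partial occlusions and partial probe images. In this paper, we introduce a spatial attention-based Dynamic Part Template Initialization module that dynamically generates part templates using mid-level semantic features from the earlier layers of the backbone. Following a self-attention layer, a simplified cross-attention scheme uses the human part-level features of the backbone to extract template features for diverse human body parts, increasing the discriminative ability of the entire model. We further explore adaptive weighting of part descriptors to quantify the absence or occlusion of local attributes and suppress the contribution of the corresponding part descriptors to the matching criterion. Extensive experiments on holistic, occluded, and partial Re-ID benchmarks demonstrate that our proposed architecture achieves competitive performance. Code will be included in the supplementary material and will be made publicly available.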
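The adaptive weighting of part descriptors described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each image comes with one descriptor and one visibility weight in [0, 1] per part, and that the matching distance is the average of per-part Euclidean distances weighted by the product of the two weights. The function name `weighted_part_distance` and this particular weighting rule are hypothetical.

```python
import math


def weighted_part_distance(parts_a, weights_a, parts_b, weights_b):
    """Visibility-weighted average of per-part Euclidean distances.

    Assumption (not from the abstract): a part contributes in proportion
    to the product of its visibility weights in both images, so a part
    that is occluded in either image is suppressed.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for pa, wa, pb, wb in zip(parts_a, weights_a, parts_b, weights_b):
        joint_weight = wa * wb
        distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
        weighted_sum += joint_weight * distance
        weight_total += joint_weight
    if weight_total == 0.0:
        raise ValueError("no part is visible in both images")
    return weighted_sum / weight_total


# The second part is marked as occluded in image B (weight 0.0),
# so only the first part contributes to the distance.
distance = weighted_part_distance(
    [[0.0, 0.0], [0.0, 0.0]], [1.0, 1.0],
    [[3.0, 4.0], [100.0, 100.0]], [1.0, 0.0],
)
```

With this rule, a large mismatch on an occluded part does not dominate the comparison, which is the effect the abstract attributes to suppressing occluded part descriptors.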
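The person re-identification (ReID) task involves many challenging problems, such as occlusion and scale variation. Existing works usually try to solve them with a one-branch network. This single branch has to be robust to all of these problems at once, which overburdens the network. This paper proposes to divide and conquer the ReID task. For this purpose, we employ several self-supervision operations to simulate different challenging problems and handle each problem with a different network. Concretely, we use the random erasing operation and propose a novel random scaling operation to generate new images with controllable characteristics. A general multi-branch network, including one master branch and two servant branches, is introduced to handle different scenes. These branches learn collaboratively and acquire different perceptive abilities. In this way, the complex scenes in the ReID task are effectively disentangled and the burden on each branch is relieved. Results from extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on three ReID benchmarks and two occluded ReID benchmarks. Ablation studies also show that the proposed scheme and operations significantly improve performance in various scenes.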
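As a rough illustration of the random erasing operation mentioned above, the sketch below overwrites one randomly placed rectangle of a grayscale image with a constant value and returns the erased box, so that the simulated occlusion is known. It is a simplified variant under stated assumptions: the image is a list of rows, and the side-length fractions `min_frac` and `max_frac` are hypothetical parameters. The abstract's random scaling operation is not sketched, since the abstract does not describe it.

```python
import random


def random_erase(image, rng, min_frac=0.1, max_frac=0.4, fill=0):
    """Return a copy of `image` with one random rectangle set to `fill`.

    `image` is a list of rows (lists of pixel values). The rectangle's
    height and width are random fractions of the image size; these
    fraction bounds are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    erase_h = max(1, int(height * rng.uniform(min_frac, max_frac)))
    erase_w = max(1, int(width * rng.uniform(min_frac, max_frac)))
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)

    erased = [row[:] for row in image]  # leave the input untouched
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            erased[r][c] = fill
    return erased, (top, left, erase_h, erase_w)


rng = random.Random(0)  # seeded only to make the example repeatable
image = [[1] * 10 for _ in range(10)]
erased, box = random_erase(image, rng)
```

Returning the box alongside the image is a design choice for this sketch: it lets a training loop know which region was erased.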
Pre-trained vision-language models like CLIP have recently shown superior performance on various downstream tasks, including image classification and segmentation. However, in fine-grained image re-identification (ReID), the labels are indexes, lacking concrete text descriptions. Therefore, it remains to be determined how such models could be applied to these tasks. This paper first finds that simply fine-tuning the visual model initialized by the image encoder in CLIP already obtains competitive performance in various ReID tasks. We then propose a two-stage strategy to facilitate a better visual representation. The key idea is to fully exploit the cross-modal description ability of CLIP through a set of learnable text tokens for each ID, which are given to the text encoder to form ambiguous descriptions. In the first training stage, the image and text encoders from CLIP are kept fixed, and only the text tokens are optimized from scratch by the contrastive loss computed within a batch. In the second stage, the ID-specific text tokens and their encoder become static, providing constraints for fine-tuning the image encoder. With the help of the designed loss in the downstream task, the image encoder is able to accurately represent data as vectors in the feature embedding space. The effectiveness of the proposed strategy is validated on several datasets for person and vehicle ReID tasks. Code is available at https://github.com/Syliz517/CLIP-ReID.
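Learning representative, robust, and discriminative information from images is essential for effective person re-identification (Re-ID). In this paper, we propose a compound approach for end-to-end discriminative deep feature learning for person Re-ID based on both body and hand images. We carefully design the Local-Aware Global Attention Network (LAGA-Net), a multi-branch deep network architecture that includes one branch for spatial attention and one for channel attention. The attention branches focus on the relevant features of the image while suppressing irrelevant background. To overcome a weakness of attention mechanisms, namely their equivariance to pixel shuffling, we integrate relative positional encodings into the spatial attention module to capture the spatial positions of pixels. The global branch is intended to preserve global context or structural information. For the local branch, which is intended to capture fine-grained information, we perform uniform partitioning to generate horizontal stripes on the conv layer. We retrieve the parts by soft partitioning, without explicitly partitioning the images or requiring external cues such as pose estimation. A set of ablation studies shows that each component contributes to the performance of LAGA-Net. Extensive evaluations on four popular body-based person Re-ID benchmarks and two publicly available hand datasets demonstrate that the proposed method consistently outperforms existing state-of-the-art methods.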
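The uniform horizontal partitioning of the local branch can be sketched as follows. This is a generic stripe-pooling example under assumptions, not LAGA-Net itself: the feature map is a nested list of shape channels x height x width, each stripe is average-pooled into one vector, and the function name `stripe_pool` is hypothetical.

```python
def stripe_pool(feature_map, num_stripes):
    """Average-pool a C x H x W feature map into `num_stripes` part vectors.

    The map is split into equal-height horizontal stripes; each stripe
    yields one vector of length C. Assumes H is divisible by num_stripes.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    if height % num_stripes != 0:
        raise ValueError("height must be divisible by num_stripes")
    rows_per_stripe = height // num_stripes

    parts = []
    for stripe in range(num_stripes):
        top = stripe * rows_per_stripe
        part = []
        for c in range(channels):
            rows = feature_map[c][top:top + rows_per_stripe]
            values = [v for row in rows for v in row]
            part.append(sum(values) / len(values))
        parts.append(part)
    return parts


# One channel, four rows, two columns, split into an upper and a lower stripe.
feature_map = [[[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]]
parts = stripe_pool(feature_map, num_stripes=2)
```

Each returned vector acts as one part-level descriptor; no pose estimator or other external cue is needed to produce it.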
Person re-identification is a challenging task because of the high intra-class variance induced by the unrestricted nuisance factors of variations such as pose, illumination, viewpoint, background, and sensor noise. Recent approaches postulate that powerful architectures have the capacity to learn feature representations invariant to nuisance factors, by training them with losses that minimize intra-class variance and maximize inter-class separation, without modeling nuisance factors explicitly. The dominant approaches use either a discriminative loss with margin, like the softmax loss with the additive angular margin, or a metric learning loss, like the triplet loss with batch hard mining of triplets. Since the softmax imposes feature normalization, it limits the gradient flow supervising the feature embedding. We address this by joining the losses and leveraging the triplet loss as a proxy for the missing gradients. We further improve invariance to nuisance factors by adding the discriminative task of predicting attributes. Our extensive evaluation highlights that when only a holistic representation is learned, we consistently outperform the state-of-the-art on the three most challenging datasets. Such representations are easier to deploy in practical systems. Finally, we found that joining the losses removes the requirement for having a margin in the softmax loss while increasing performance.
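The triplet loss with batch hard mining mentioned above can be sketched in plain Python. This is a minimal illustration, not the authors' code: it covers only the triplet term (not the joint softmax and triplet objective), uses Euclidean distance, and the margin of 0.3 is an assumed value rather than one taken from the abstract.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Mean over anchors of max(0, margin + d(hardest pos) - d(hardest neg)).

    For each anchor, the hardest positive is the farthest sample with the
    same label and the hardest negative is the closest sample with a
    different label. Anchors lacking either are skipped.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        positives = [
            euclidean(anchor, e)
            for j, (e, l) in enumerate(zip(embeddings, labels))
            if l == label and j != i
        ]
        negatives = [
            euclidean(anchor, e)
            for e, l in zip(embeddings, labels)
            if l != label
        ]
        if not positives or not negatives:
            continue
        losses.append(max(0.0, margin + max(positives) - min(negatives)))
    return sum(losses) / len(losses) if losses else 0.0


# Two identities whose samples are interleaved along one axis (hard batch)
# and two identities that are well separated (easy batch).
hard = batch_hard_triplet_loss(
    [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]], [0, 0, 1, 1]
)
easy = batch_hard_triplet_loss(
    [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]], [0, 0, 1, 1]
)
```

The loss is zero once every identity is separated from the others by more than the margin, as in the easy batch.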
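Fine-grained visual categorization (FGVC) aims at recognizing objects from similar subordinate categories, which is challenging and of practical value for accurate automatic recognition. Most FGVC approaches focus on attention mechanisms for mining discriminative regions while neglecting the interdependencies of those regions and the holistic object structure they compose, which are essential for the model's ability to localize and understand discriminative information. To address these limitations, we propose the Structure Information Modeling Transformer (SIM-Trans), which incorporates object structure information into the transformer to enhance discriminative representation learning so that it contains both appearance information and structure information. Specifically, we encode the image into a sequence of patch tokens and build a strong vision transformer framework with two well-designed modules: (i) the structure information learning (SIL) module mines the spatial context relations of significant patches within the object extent with the help of the transformer's self-attention weights, and these relations are further injected into the model to import structure information; (ii) the multi-level feature boosting (MFB) module exploits the complementarity of multi-level features and contrastive learning among classes to enhance feature robustness for accurate recognition. The two proposed modules are lightweight, can be plugged into any transformer network, and are easily trained end to end, depending only on the attention weights that come with the vision transformer itself. Extensive experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks. The code is available at https://github.com/pku-icst-mipl/sim-trans_acmmm2022.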
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and the increasing demand for intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the components involved in developing a person Re-ID system, we categorize it into closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis of closed-world person Re-ID from three different perspectives: deep feature representation learning, deep metric learning, and ranking optimization. With performance saturating under the closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, which faces more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost of finding all the correct matches, which provides an additional criterion for evaluating a Re-ID system in real applications. Finally, some important yet under-investigated open issues are discussed.
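The mINP metric can be illustrated with a short sketch. The abstract only says that it reflects the cost of finding all the correct matches; the formula below is an assumption based on the commonly cited definition, in which the inverse negative penalty (INP) of one query is the number of correct matches divided by the rank of the last (hardest) correct match, and mINP is the mean over queries. The function names are hypothetical.

```python
def inverse_negative_penalty(ranked_matches):
    """INP for one query, assuming INP = |correct matches| / rank of last one.

    `ranked_matches` is the gallery ranking for the query as booleans,
    True where the gallery item at that rank has the query's identity.
    """
    num_correct = sum(1 for m in ranked_matches if m)
    if num_correct == 0:
        raise ValueError("query has no correct match in the gallery")
    hardest_rank = max(
        rank for rank, m in enumerate(ranked_matches, start=1) if m
    )
    return num_correct / hardest_rank


def mean_inp(all_ranked_matches):
    """mINP: the average INP over all queries."""
    scores = [inverse_negative_penalty(r) for r in all_ranked_matches]
    return sum(scores) / len(scores)


# Query 1 needs 3 ranks to recover its 2 matches; query 2 needs exactly 2.
score = mean_inp([
    [True, False, True, False],
    [True, True, False],
])
```

Under this definition, a query scores 1.0 only when all of its correct matches are ranked ahead of every incorrect gallery item, so a single hard match that is ranked late lowers the score even if rank-1 accuracy is perfect.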