Smart Paper Notes

Benchmarking person re-identification datasets and approaches for practical real-world implementations

Jose Huaman , Felix O. Sumari , Luigy Machaca , Esteban Clua , Joris Guerin

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-12-20

Recently, Person Re-Identification (Re-ID) has received a lot of attention. Large datasets containing labeled images of various individuals have been released, allowing researchers to develop and test many successful approaches. However, when such Re-ID models are deployed in new cities or environments, the task of searching for people within a network of security cameras is likely to face an important domain shift, thus resulting in decreased performance. Indeed, while most public datasets were collected in a limited geographic area, images from a new city present different features (e.g., people's ethnicity and clothing style, weather, architecture, etc.). In addition, the whole frames of the video streams must be converted into cropped images of people using pedestrian detection models, which behave differently from the human annotators who created the dataset used for training. To better understand the extent of this issue, this paper introduces a complete methodology to evaluate Re-ID approaches and training datasets with respect to their suitability for unsupervised deployment for live operations. This method is used to benchmark four Re-ID approaches on three datasets, providing insight and guidelines that can help to design better Re-ID pipelines in the future.

TrADe Re-ID -- Live Person Re-Identification using Tracking and Anomaly Detection

Luigy Machaca , F. Oliver Sumari H , Jose Huaman , Esteban Clua , Joris Guerin

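Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-09-14

Person re-identification (Re-ID) aims to search for a person of interest (the query) in a network of cameras. In the classic Re-ID setting, the query is sought in a gallery containing properly cropped images of entire bodies. Recently, the live Re-ID setting was introduced to better represent the practical application context of Re-ID. It consists of searching for the query in short videos that contain whole scene frames. The initial live Re-ID baseline used a pedestrian detector to build a large search gallery and a classic Re-ID model to find the query in that gallery. However, the resulting galleries were too large and contained low-quality images, which degraded live Re-ID performance. Here, we present a new live Re-ID approach called TrADe, which generates smaller, higher-quality galleries. TrADe first uses a tracking algorithm to identify sequences of images of the same individual in the gallery. An anomaly detection model is then used to select a single good representative of each track. TrADe is validated on the live Re-ID version of the PRID-2011 dataset and shows significant improvements over the baseline.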
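
The gallery-reduction step described in this abstract (group detections into tracks, then keep one representative per track) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which anomaly detection model is used, so distance to the track centroid stands in as a hypothetical anomaly score, and the function name `select_representatives` and the toy feature vectors are made up for the example.

```python
from math import dist
from statistics import fmean


def select_representatives(tracks):
    """Pick one crop index per track.

    tracks: dict mapping a track id to a non-empty list of equal-length
    feature vectors, one per detected crop in that track.
    """
    gallery = {}
    for track_id, features in tracks.items():
        # Per-dimension mean of the track's feature vectors.
        centroid = [fmean(column) for column in zip(*features)]
        # Stand-in anomaly score (an assumption, not the paper's model):
        # Euclidean distance of each crop's features to the centroid.
        scores = [dist(vector, centroid) for vector in features]
        # Keep the least anomalous crop as the track's representative.
        gallery[track_id] = min(range(len(scores)), key=scores.__getitem__)
    return gallery


# Toy tracks: the third crop of track_a is an outlier (e.g. a bad detection).
tracks = {
    "track_a": [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]],
    "track_b": [[2.0, 2.0]],
}
representatives = select_representatives(tracks)
```

With one entry per track, the gallery size drops from the number of detections to the number of tracks, which is the effect the abstract attributes to TrADe.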

Deep learning-based person re-identification methods: A survey and outlook of recent works

Zhangqiang Ming , Min Zhu , Xiangkun Wang , Jiamin Zhu , Junlong Cheng , Yong Yang , Xiaoyong Wei

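Categories: Computer Vision

2021-10-10

In recent years, with the growing demand for public safety and the rapid development of intelligent surveillance networks, person re-identification (Re-ID) has become one of the hot research topics in computer vision. The main research goal of person Re-ID is to retrieve persons with the same identity from different cameras. However, traditional person Re-ID methods require manual labeling of person targets, which consumes a great deal of labor. With the widespread application of deep neural networks, many deep learning-based person Re-ID methods have emerged. This paper therefore aims to help researchers understand the latest research results and future trends in the field. First, we summarize several recently published person Re-ID surveys and complement them with the latest research methods, systematically classifying deep learning-based person Re-ID methods. Second, we propose a multi-dimensional taxonomy that divides deep learning-based person Re-ID methods into four categories according to metric and representation learning: deep metric learning, local feature learning, generative adversarial learning, and sequence feature learning. Furthermore, we subdivide these four categories according to their methodologies and motivations, and discuss the advantages and limitations of some of the subcategories. Finally, we discuss some challenges and possible research directions for person Re-ID.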

Deep Learning for Person Re-identification: A Survey and Outlook

Mang Ye , Jianbing Shen , Gaojie Lin , Tao Xiang , Ling Shao , Steven C. H. Hoi

Categories:

2020-01-13

Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and the increasing demand for intelligent video surveillance, it has attracted significantly more interest in the computer vision community. By dissecting the components involved in developing a person Re-ID system, we categorize it into closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis of closed-world person Re-ID from three different perspectives: deep feature representation learning, deep metric learning, and ranking optimization. With performance saturating under the closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, which faces more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost of finding all the correct matches, which provides an additional criterion for evaluating a Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.
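
To make "the cost of finding all the correct matches" concrete, here is a minimal sketch of how mINP can be computed. It assumes the definition given in the paper rather than in this abstract: for each query, the inverse negative penalty is the number of correct gallery matches divided by the rank position of the hardest (last-found) correct match, and mINP is the mean over queries. The function names and the toy ranking lists are illustrative only.

```python
def inverse_negative_penalty(ranked_hits):
    """INP for one query.

    ranked_hits: list of booleans for the gallery sorted by decreasing
    similarity to the query; True marks a correct match.
    """
    positions = [rank for rank, hit in enumerate(ranked_hits, start=1) if hit]
    if not positions:
        raise ValueError("query has no correct match in the gallery")
    # Number of correct matches divided by the rank of the hardest one.
    return len(positions) / positions[-1]


def mean_inp(ranked_hits_per_query):
    """mINP: average INP over all queries."""
    scores = [inverse_negative_penalty(hits) for hits in ranked_hits_per_query]
    return sum(scores) / len(scores)


# Query 1 needs 3 gallery inspections to find its 2 matches (INP = 2/3);
# query 2 finds both matches in the first 2 positions (INP = 1).
queries = [[True, False, True, False], [True, True, False]]
minp = mean_inp(queries)
```

A value close to 1 means every correct match appears near the top of the ranking; a single correct match buried deep in the list pulls the score down even when Rank-1 accuracy is perfect.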

Proceedings of the 3rd International Workshop on Reading Music Systems

Jorge Calvo-Zaragoza , Alexander Pacha

Categories: Computer Vision | Machine Learning

2022-12-01

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.

Visual Object Tracking in First Person Vision

Matteo Dunnhofer , Antonino Furnari , Giovanni Maria Farinella , Christian Micheloni

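Categories: Computer Vision

2022-09-27

Understanding human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information for modeling such interactions effectively. In recent years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used "off the shelf" or whether more domain-specific investigation is needed. This paper aims to answer such questions. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyzes the performance of 42 algorithms, including generic object trackers and baseline FPV-specific trackers. The analysis is carried out by focusing on different aspects of the FPV setting, introducing new performance measures, and relating the results to FPV-specific tasks. The study is made possible by the introduction of TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing this behavior and point out possible research directions. Despite these difficulties, we show that trackers bring benefits to FPV downstream tasks that require short-term object tracking. We expect generic object tracking to gain popularity in FPV as new, FPV-specific methodologies are investigated.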

Scalable person re-identification: A benchmark

Categories:

This paper contributes a new high-quality dataset for person re-identification, named "Market-1501". Generally, current datasets: 1) are limited in scale; 2) consist of hand-drawn bboxes, which are unavailable under realistic settings; 3) have only one ground truth and one query image for each identity (close environment). To tackle these problems, the proposed Market-1501 dataset is featured in three aspects. First, it contains over 32,000 annotated bboxes, plus a distractor set of over 500K images, making it the largest person re-id dataset to date. Second, images in the Market-1501 dataset are produced using the Deformable Part Model (DPM) as the pedestrian detector. Third, our dataset is collected in an open system, where each identity has multiple images under each camera. As a minor contribution, inspired by recent advances in large-scale image search, this paper proposes an unsupervised Bag-of-Words descriptor. We view person re-identification as a special task of image search. In experiments, we show that the proposed descriptor yields competitive accuracy on the VIPeR, CUHK03, and Market-1501 datasets, and is scalable on the large-scale 500k dataset.

Re-ranking person re-identification with k-reciprocal encoding

Categories:

When considering person re-identification (re-ID) as a retrieval process, re-ranking is a critical step to improve its accuracy. Yet in the re-ID community, limited effort has been devoted to re-ranking, especially to fully automatic, unsupervised solutions. In this paper, we propose a k-reciprocal encoding method to re-rank the re-ID results. Our hypothesis is that if a gallery image is similar to the probe in the k-reciprocal nearest neighbors, it is more likely to be a true match. Specifically, given an image, a k-reciprocal feature is calculated by encoding its k-reciprocal nearest neighbors into a single vector, which is used for re-ranking under the Jaccard distance. The final distance is computed as the combination of the original distance and the Jaccard distance. Our re-ranking method does not require any human interaction or any labeled data, so it is applicable to large-scale datasets. Experiments on the large-scale Market-1501, CUHK03, MARS, and PRW datasets confirm the effectiveness of our method.
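
The core idea in this abstract (k-reciprocal neighbor sets compared under the Jaccard distance, then combined with the original distance) can be sketched as below. This is a deliberately simplified illustration, not the full method: it compares plain neighbor sets instead of the encoded k-reciprocal feature vectors, it treats probe and gallery images as one pool, and the weighting parameter `lam`, the choice `k=1`, and the one-dimensional toy data are assumptions made for the example.

```python
def k_nearest(distances, i, k):
    """Image i together with its k nearest other images."""
    others = (j for j in range(len(distances)) if j != i)
    order = sorted(others, key=lambda j: (distances[i][j], j))
    return set(order[:k]) | {i}


def k_reciprocal(distances, i, k):
    """Neighbours j of i such that i is also among the neighbours of j."""
    return {j for j in k_nearest(distances, i, k)
            if i in k_nearest(distances, j, k)}


def rerank(distances, k=1, lam=0.3):
    """Blend the Jaccard distance of k-reciprocal sets with the original one."""
    n = len(distances)
    neighbours = [k_reciprocal(distances, i, k) for i in range(n)]
    final = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = len(neighbours[i] & neighbours[j])
            union = len(neighbours[i] | neighbours[j])
            jaccard = 1.0 - shared / union
            # Final distance: weighted sum of Jaccard and original distance.
            final[i][j] = (1.0 - lam) * jaccard + lam * distances[i][j]
    return final


# Toy 1-D "features": images 0 and 1 are close, as are images 2 and 3.
positions = [0.0, 1.0, 10.0, 11.0]
distances = [[abs(a - b) for b in positions] for a in positions]
final = rerank(distances, k=1, lam=0.3)
```

Images 0 and 1 share the same reciprocal neighbor set, so their Jaccard distance is 0 and only the down-weighted original distance remains; images 0 and 2 share nothing, so the Jaccard term contributes its maximum of 1.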

Generalizable Re-Identification from Videos with Cycle Association

Zhongdao Wang , Zhaopeng Dou , Jingwei Zhang , Liang Zheng , Yifan Sun , Yali Li , Shengjin Wang

Categories: Computer Vision

2022-11-07

In this paper, we are interested in learning a generalizable person re-identification (re-ID) representation from unlabeled videos. Compared with 1) the popular unsupervised re-ID setting where the training and test sets are typically under the same domain, and 2) the popular domain generalization (DG) re-ID setting where the training samples are labeled, our novel scenario combines their key challenges: the training samples are unlabeled, and collected from various domains which do not align with the test domain. In other words, we aim to learn a representation in an unsupervised manner and directly use the learned representation for re-ID in novel domains. To fulfill this goal, we make two main contributions: First, we propose Cycle Association (CycAs), a scalable self-supervised learning method for re-ID with low training complexity; and second, we construct a large-scale unlabeled re-ID dataset named LMP-video, tailored for the proposed method. Specifically, CycAs learns re-ID features by enforcing cycle consistency of instance association between temporally successive video frame pairs, and the training cost is merely linear in the data size, making large-scale training possible. On the other hand, the LMP-video dataset is extremely large, containing 50 million unlabeled person images cropped from over 10K YouTube videos, and is therefore sufficient to serve as fertile soil for self-supervised learning. Trained on LMP-video, we show that CycAs learns good generalization towards novel domains. The achieved results sometimes even outperform supervised domain generalizable models. Remarkably, CycAs achieves 82.2% Rank-1 on Market-1501 and 49.0% Rank-1 on MSMT17 with zero human annotation, surpassing state-of-the-art supervised DG re-ID methods. Moreover, we also demonstrate the superiority of CycAs under the canonical unsupervised re-ID and the pretrain-and-finetune scenarios.

Large-Scale Spatio-Temporal Person Re-identification: Algorithms and Benchmark

Xiujun Shu , Xiao Wang , Xianghao Zang , Shiliang Zhang , Yuanqi Chen , Ge Li , Qi Tian

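Categories: Computer Vision

2021-05-31

Person re-identification (Re-ID) in scenarios with large spatial and temporal spans has not been fully explored. This is partly because existing benchmark datasets were mainly collected within limited spatial and temporal ranges, e.g., using videos recorded by cameras in a specific area of a campus. Such limited spatial and temporal ranges make it hard to simulate the difficulties of person Re-ID in real scenarios. In this work, we contribute a novel large-scale spatio-temporal person Re-ID dataset, LaST, including 10,862 identities with more than 228k images. Compared with existing datasets, LaST presents a more challenging and more diverse Re-ID setting, with significantly larger spatial and temporal ranges. For instance, each person can appear in different cities or countries, in various time slots from daytime to night, and in different seasons from spring to winter. To the best of our knowledge, LaST is a novel person Re-ID dataset with the largest spatio-temporal range. Based on LaST, we verify its difficulty by conducting a comprehensive performance evaluation of 14 Re-ID algorithms. We further propose an easy-to-implement baseline that works well in such a challenging Re-ID setting. We also verify that models pre-trained on LaST generalize well to existing datasets with short-term and cloth-changing scenarios. We expect LaST to inspire future work toward more realistic and challenging Re-ID tasks. More information about the dataset is available at https://github.com/shuxjweb/last.git.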

UPAR: Unified Pedestrian Attribute Recognition and Person Retrieval

Andreas Specker , Mickael Cormier , Jürgen Beyerer

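Categories: Computer Vision

2022-09-06

Recognizing soft-biometric pedestrian attributes is essential in video surveillance and fashion retrieval. Recent works show promising results on single datasets. Nevertheless, the generalization ability of these methods under different attribute distributions, viewpoints, varying illumination, and low resolution is still poorly understood because of strong biases and varying attributes in current datasets. To close this gap and support a systematic investigation, we present UPAR, the Unified Person Attribute Recognition Dataset. It is based on four well-known person attribute recognition datasets: PA100K, PETA, RAPv2, and Market1501. We unify these datasets by providing 3.3M additional annotations to harmonize 40 important binary attributes over 12 attribute categories across the datasets. We thus enable research on generalizable pedestrian attribute recognition as well as attribute-based person retrieval for the first time. Due to the vast variance in image distribution, pedestrian pose, scale, and occlusion, existing approaches are greatly challenged in terms of both accuracy and efficiency. Furthermore, we develop a strong baseline for PAR and attribute-based person retrieval based on a thorough analysis of regularization methods. Our models achieve state-of-the-art performance in cross-domain and specialization settings on PA100K, PETA, RAPv2, Market1501-Attributes, and UPAR. We believe UPAR and our strong baseline will contribute to the artificial intelligence community and promote research on large-scale, generalizable attribute recognition systems.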

Is Synthetic Dataset Reliable for Benchmarking Generalizable Person Re-Identification?

Cuicui Kang

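Categories: Computer Vision

2022-09-12

Recent studies show that models trained on synthetic datasets can achieve better generalizable person re-identification (GPReID) performance than models trained on public real-world datasets. On the other hand, due to the limitations of real-world person ReID datasets, it would also be important and interesting to use large-scale synthetic datasets as test sets to benchmark person ReID algorithms. Yet this raises a critical question: is a synthetic dataset reliable for benchmarking generalizable person re-identification? There is no evidence for this in the literature. To address this, we design a method called Pairwise Ranking Analysis (PRA) to quantitatively measure ranking similarity and perform a statistical test of identical distributions. Specifically, we employ Kendall rank correlation coefficients to evaluate pairwise similarity values between algorithm rankings on different datasets. Then, a non-parametric two-sample Kolmogorov-Smirnov (KS) test is performed to judge whether the algorithm ranking correlations between synthetic and real-world datasets and those between real-world datasets only follow identical distributions. We conduct comprehensive experiments with ten representative algorithms, three popular real-world person ReID datasets, and three recently released large-scale synthetic datasets. Through the designed pairwise ranking analysis and comprehensive evaluations, we conclude that a recent large-scale synthetic dataset, ClonedPerson, can be reliably used to benchmark GPReID, statistically the same as real-world datasets. This study therefore supports the use of synthetic datasets for both the source training set and the target test set, with no privacy concerns from real-world surveillance data. In addition, the study in this paper may also inspire future designs of synthetic datasets.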
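
The two statistical ingredients named in this abstract, the Kendall rank correlation and the two-sample Kolmogorov-Smirnov statistic, can be sketched in plain Python as follows. This is a rough illustration under assumptions: it implements the tau-a variant (no tie handling), it computes only the KS statistic and not the p-value needed for the actual test, and the scores are hypothetical numbers, not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two score lists (assumes no tied scores)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1  # both datasets order this pair the same way
        elif product < 0:
            discordant += 1  # the two datasets disagree on this pair
    pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / pairs


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    def ecdf(sample, x):
        return sum(value <= x for value in sample) / len(sample)

    points = set(sample_a) | set(sample_b)
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)


# Hypothetical Rank-1 scores of the same three algorithms on two datasets.
real = [0.62, 0.70, 0.81]
synthetic = [0.55, 0.74, 0.69]
tau = kendall_tau(real, synthetic)
```

In a PRA-style analysis, one such tau would be computed for every pair of datasets, and the KS statistic would then compare the set of synthetic-versus-real correlations with the set of real-versus-real correlations.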

The State of Aerial Surveillance: A Survey

Kien Nguyen , Clinton Fookes , Sridha Sridharan , Yingli Tian , Xiaoming Liu , Feng Liu , Arun Ross

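Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-01-09

The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance, thanks to their unprecedented advantages in scale, mobility, deployment, and covert observation capabilities. This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective. It aims to give readers an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs, and other airborne platforms. The main objects of interest are humans, where single or multiple subjects are to be detected, identified, tracked, re-identified, and have their behavior analyzed. More specifically, for each of these tasks, we first discuss the unique challenges of performing it in an aerial setting compared to a ground-based setting. We then review and analyze the aerial datasets publicly available for each task, examine in depth the approaches in the aerial literature, and investigate how they currently address the aerial challenges. We conclude with a discussion of the missing gaps and open research questions to inform future research avenues.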

Less is More: Learning from Synthetic Data with Fine-grained Attributes for Person Re-Identification

Suncheng Xiang , Guanjie You , Mengyuan Guan , Hao Chen , Binjie Yan , Ting Liu , Yuzhuo Fu

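Categories: Computer Vision

2021-09-22

Person re-identification (Re-ID) plays an important role in applications such as public security and video surveillance. Recently, learning from synthetic data, which benefits from the popularity of synthetic data engines, has attracted great public attention. However, existing datasets are limited in quantity, diversity, and realism, and cannot be used efficiently for the Re-ID problem. To address this challenge, we manually construct a large-scale person dataset named FineGPR with fine-grained attribute annotations. Moreover, to fully exploit the potential of FineGPR and promote efficient training from millions of synthetic images, we propose an attribute analysis pipeline called AOST, which dynamically learns the attribute distribution in the real domain, then eliminates the gap between synthetic and real-world data, and can thus be freely deployed to new scenarios. Experiments conducted on benchmarks demonstrate that FineGPR with AOST outperforms (or is on par with) existing real and synthetic datasets, which suggests its feasibility for the Re-ID task and supports the proverbial less-is-more principle. Our synthetic FineGPR dataset is publicly available at https://github.com/jeremyxsc/finegpr.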

RealGait: Gait Recognition for Person Re-Identification

Shaoxiong Zhang , Yunhong Wang , Tianrui Chai , Annan Li , Anil K. Jain

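Categories: Computer Vision

2022-01-13

Human gait is considered a unique biometric identifier that can be acquired covertly at a distance. However, models trained on existing public-domain gait datasets, which are captured in controlled scenarios, suffer a drastic performance drop when applied to real-world unconstrained gait data. On the other hand, video person re-identification techniques have achieved promising performance on large-scale publicly available datasets. Given the diversity of clothing characteristics, clothing cues are not reliable for person recognition. It is therefore not actually clear why state-of-the-art person re-identification methods work as well as they do. In this paper, we construct a new gait dataset by extracting silhouettes from an existing video person re-identification challenge, which consists of 1,404 persons walking in an unconstrained manner. Based on this dataset, a consistent and comparative study between gait recognition and person re-identification can be carried out. Given that our experimental results show that current gait recognition approaches, designed on data collected in controlled scenarios, are unsuitable for real surveillance scenarios, we propose a novel gait recognition method called RealGait. Our results suggest that recognizing people by their gait in real surveillance scenarios is feasible, and that the underlying gait pattern is probably the true reason why video person re-identification works in practice.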

Person Transfer GAN to Bridge Domain Gap for Person Re-Identification

Longhui Wei , Shiliang Zhang , Wen Gao , Qi Tian

Categories:

2017-11-23

Although the performance of person Re-Identification (ReID) has been significantly boosted, many challenging issues in real scenarios have not been fully investigated, e.g., complex scenes and lighting variations, viewpoint and pose changes, and the large number of identities in a camera network. To facilitate research toward conquering those issues, this paper contributes a new dataset called MSMT17 with many important features, e.g., 1) the raw videos are taken by a 15-camera network deployed in both indoor and outdoor scenes, 2) the videos cover a long period of time and present complex lighting variations, and 3) it currently contains the largest number of annotated identities, i.e., 4,101 identities and 126,441 bounding boxes. We also observe that a domain gap commonly exists between datasets, which causes a severe performance drop when training and testing on different datasets. As a result, available training data cannot be effectively leveraged for new testing domains. To relieve the expensive cost of annotating new training samples, we propose a Person Transfer Generative Adversarial Network (PTGAN) to bridge the domain gap. Comprehensive experiments show that the domain gap can be substantially narrowed by PTGAN.

Improving Person Re-Identification with Temporal Constraints

Julia Dietlmeier , Feiyan Hu , Frances Ryan , Noel E. O'Connor , Kevin McGuinness

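Categories: Computer Vision

2021-11-17

In this paper, we introduce an image-based person re-identification dataset collected in the large and busy airport in Dublin, Ireland. Unlike all publicly available image-based datasets, our dataset contains timestamp information in addition to frame numbers and camera and person IDs. Our dataset has also been fully anonymized to comply with modern data privacy regulations. We apply state-of-the-art person re-identification models to our dataset and show that, by leveraging the available timestamp information, we achieve a significant gain of 37.43% in mAP and a gain of 30.22% in Rank-1 accuracy. We also propose a Bayesian temporal re-ranking post-processing step, which adds a further 10.03% gain in mAP and a 9.95% gain in Rank-1 accuracy. This work on combining visual and temporal information is not possible on other image-based person re-identification datasets. We believe the proposed dataset will enable further development of person re-identification research for challenging real-world applications. The DAA dataset can be downloaded from https://bit.ly/3Atxtd6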

Domain Camera Adaptation and Collaborative Multiple Feature Clustering for Unsupervised Person Re-ID

Yuanpeng Tu

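Categories: Computer Vision

2022-08-18

Recently, unsupervised person re-identification (Re-ID) has drawn much attention because of its open-world scenario setting, in which only limited annotated data is available. Existing supervised methods often fail to generalize well to unseen domains, while unsupervised methods mostly lack multi-granularity information and are prone to confirmation bias. In this paper, we aim to find better feature representations on the unseen target domain from two aspects: 1) performing unsupervised domain adaptation on the labeled source domain and 2) mining potential similarities on the unlabeled target domain. In addition, a collaborative pseudo-labeling strategy is proposed to alleviate the influence of confirmation bias. First, a generative adversarial network is used to transfer images from the source domain to the target domain. Person identity preserving and identity mapping losses are introduced to improve the quality of the generated images. Second, we propose a novel collaborative multiple feature clustering framework (CMFC) to learn the internal data structure of the target domain, including global feature and partial feature branches. The global feature branch (GB) applies unsupervised clustering to the global features of person images, while the partial feature branch (PB) mines similarities within different body regions. Finally, extensive experiments on two benchmark datasets show the competitive performance of our method under the unsupervised person Re-ID setting.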

Applications of Deep Learning in Fish Habitat Monitoring: A Tutorial and Survey

Alzayat Saleh , Marcus Sheaves , Dean Jerry , Mostafa Rahimi Azghadi

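Categories: Computer Vision

2022-06-11

Marine ecosystems and their fish habitats are becoming increasingly important because of their integral role in providing a valuable food source and conservation outcomes. Because they are remote and difficult to access, marine environments and fish habitats are often monitored using underwater cameras. These cameras generate a massive volume of digital data that cannot be analyzed efficiently by current manual processing methods, which rely on human observers. Deep learning (DL) is a cutting-edge AI technology that has demonstrated unprecedented performance in analyzing visual data. Despite its application to a myriad of domains, its use in underwater fish habitat monitoring remains underexplored. In this paper, we provide a tutorial that covers the key concepts of DL and helps the reader gain a high-level understanding of how DL works. The tutorial also explains a step-by-step procedure for developing DL algorithms for challenging applications such as underwater fish monitoring. In addition, we provide a comprehensive survey of key deep learning techniques for fish habitat monitoring, including classification, counting, localization, and segmentation. Furthermore, we survey publicly available underwater fish datasets and compare various DL techniques in the underwater fish monitoring domain. We also discuss some challenges and opportunities in the emerging field of deep learning for fish habitat processing. This paper is written to serve as a tutorial for marine scientists who would like to gain a high-level understanding of DL, develop it for their applications by following our step-by-step tutorial, and see how it is evolving to facilitate their research efforts. At the same time, it is suitable for computer scientists who would like to survey state-of-the-art DL-based methodologies for fish habitat monitoring.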

Rethinking Person Re-Identification via Semantic-Based Pretraining

Suncheng Xiang , Jingsheng Gao , Zirui Zhang , Mengyuan Guan , Binjie Yan , Ting Liu , Dahong Qian , Yuzhuo Fu

Categories: Computer Vision

2021-10-11

Pretraining is a dominant paradigm in computer vision. Supervised ImageNet pretraining is commonly used to initialize the backbones of person re-identification (Re-ID) models. However, recent works show a surprising result: CNN-based pretraining on ImageNet has limited impact on Re-ID systems because of the large domain gap between ImageNet and person Re-ID data. To seek an alternative to traditional pretraining, here we investigate semantic-based pretraining as another method that uses additional textual data instead of ImageNet pretraining. Specifically, we manually construct a diversified FineGPR-C caption dataset for the first time on person Re-ID events. Based on it, a pure semantic-based pretraining approach named VTBR is proposed, which adopts dense captions to learn visual representations with fewer images. We train convolutional neural networks from scratch on the captions of the FineGPR-C dataset, and then transfer them to downstream Re-ID tasks. Comprehensive experiments conducted on benchmark datasets show that our VTBR can achieve competitive performance compared with ImageNet pretraining - despite using up to 1.4x fewer images, revealing its potential in Re-ID pretraining.
