近年来,已经产生了大量的视觉内容,并从许多领域共享,例如社交媒体平台,医学成像和机器人。这种丰富的内容创建和共享引入了新的挑战,特别是在寻找类似内容内容的图像检索(CBIR)-A的数据库中,即长期建立的研究区域,其中需要改进的效率和准确性来实时检索。人工智能在CBIR中取得了进展,并大大促进了实例搜索过程。在本调查中,我们审查了最近基于深度学习算法和技术开发的实例检索工作,通过深网络架构类型,深度功能,功能嵌入方法以及网络微调策略组织了调查。我们的调查考虑了各种各样的最新方法,在那里,我们识别里程碑工作,揭示各种方法之间的联系,并呈现常用的基准,评估结果,共同挑战,并提出未来的未来方向。
translated by 谷歌翻译
Image descriptors based on activations of Convolutional Neural Networks (CNNs) have become dominant in image retrieval due to their discriminative power, compactness of representation, and search efficiency. Training of CNNs, either from scratch or fine-tuning, requires a large amount of annotated data, where a high quality of annotation is often crucial. In this work, we propose to fine-tune CNNs for image retrieval on a large collection of unordered images in a fully automated manner. Reconstructed 3D models obtained by the state-of-the-art retrieval and structure-from-motion methods guide the selection of the training data. We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval. CNN descriptor whitening discriminatively learned from the same training data outperforms commonly used PCA whitening. We propose a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and average pooling and show that it boosts retrieval performance. Applying the proposed method to the VGG network achieves state-of-the-art performance on the standard benchmarks: Oxford Buildings, Paris, and Holidays datasets.
translated by 谷歌翻译
细粒度的图像分析(FGIA)是计算机视觉和模式识别中的长期和基本问题,并为一组多种现实世界应用提供了基础。 FGIA的任务是从属类别分析视觉物体,例如汽车或汽车型号的种类。细粒度分析中固有的小阶级和阶级阶级内变异使其成为一个具有挑战性的问题。利用深度学习的进步,近年来,我们在深入学习动力的FGIA中见证了显着进展。在本文中,我们对这些进展的系统进行了系统的调查,我们试图通过巩固两个基本的细粒度研究领域 - 细粒度的图像识别和细粒度的图像检索来重新定义和扩大FGIA领域。此外,我们还审查了FGIA的其他关键问题,例如公开可用的基准数据集和相关域的特定于应用程序。我们通过突出几个研究方向和开放问题,从社区中突出了几个研究方向和开放问题。
translated by 谷歌翻译
Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.
translated by 谷歌翻译
准确且强大的视觉对象跟踪是最具挑战性和最基本的计算机视觉问题之一。它需要在图像序列中估计目标的轨迹,仅给出其初始位置和分段,或者在边界框的形式中粗略近似。判别相关滤波器(DCF)和深度暹罗网络(SNS)被出现为主导跟踪范式,这导致了重大进展。在过去十年的视觉对象跟踪快速演变之后,该调查介绍了90多个DCFS和暹罗跟踪器的系统和彻底审查,基于九个跟踪基准。首先,我们介绍了DCF和暹罗跟踪核心配方的背景理论。然后,我们在这些跟踪范式中区分和全面地审查共享以及具体的开放研究挑战。此外,我们彻底分析了DCF和暹罗跟踪器对九个基准的性能,涵盖了视觉跟踪的不同实验方面:数据集,评估度量,性能和速度比较。通过提出根据我们的分析提出尊重开放挑战的建议和建议来完成调查。
translated by 谷歌翻译
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and increasing demand of intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis for closed-world person Re-ID from three different perspectives, including deep feature representation learning, deep metric learning and ranking optimization. With the performance saturation under closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, facing more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize the open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for FOUR different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost for finding all the correct matches, which provides an additional criteria to evaluate the Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.
translated by 谷歌翻译
基于无人机(UAV)基于无人机的视觉对象跟踪已实现了广泛的应用,并且由于其多功能性和有效性而引起了智能运输系统领域的越来越多的关注。作为深度学习革命性趋势的新兴力量,暹罗网络在基于无人机的对象跟踪中闪耀,其准确性,稳健性和速度有希望的平衡。由于开发了嵌入式处理器和深度神经网络的逐步优化,暹罗跟踪器获得了广泛的研究并实现了与无人机的初步组合。但是,由于无人机在板载计算资源和复杂的现实情况下,暹罗网络的空中跟踪仍然在许多方面都面临严重的障碍。为了进一步探索基于无人机的跟踪中暹罗网络的部署,这项工作对前沿暹罗跟踪器进行了全面的审查,以及使用典型的无人机板载处理器进行评估的详尽无人用分析。然后,进行板载测试以验证代表性暹罗跟踪器在现实世界无人机部署中的可行性和功效。此外,为了更好地促进跟踪社区的发展,这项工作分析了现有的暹罗跟踪器的局限性,并进行了以低弹片评估表示的其他实验。最后,深入讨论了基于无人机的智能运输系统的暹罗跟踪的前景。领先的暹罗跟踪器的统一框架,即代码库及其实验评估的结果,请访问https://github.com/vision4robotics/siamesetracking4uav。
translated by 谷歌翻译
实例级图像检索(IIR)或简单的实例检索,涉及在数据集中查找包含查询实例(例如对象)的数据集中所有图像的问题。本文首次尝试使用基于实例歧视的对比学习(CL)解决此问题。尽管CL在许多计算机视觉任务中表现出令人印象深刻的性能,但在IIR领域也从未找到过类似的成功。在这项工作中,我们通过探索从预先训练和微调的CL模型中得出判别表示的能力来解决此问题。首先,我们通过比较预先训练的深度神经网络(DNN)分类器与CL模型学到的功能相比,研究了IIR转移学习的功效。这些发现启发了我们提出了一种新的培训策略,该策略通过使用平均精度(AP)损失以及微调方法来学习针对IIR量身定制的对比功能表示形式,从而优化CL以学习为导向IIR的功能。我们的经验评估表明,从挑战性的牛津和巴黎数据集中的预先培训的DNN分类器中学到的现成的特征上的表现显着提高。
translated by 谷歌翻译
Recent years witnessed the breakthrough of face recognition with deep convolutional neural networks. Dozens of papers in the field of FR are published every year. Some of them were applied in the industrial community and played an important role in human life such as device unlock, mobile payment, and so on. This paper provides an introduction to face recognition, including its history, pipeline, algorithms based on conventional manually designed features or deep learning, mainstream training, evaluation datasets, and related applications. We have analyzed and compared state-of-the-art works as many as possible, and also carefully designed a set of experiments to find the effect of backbone size and data distribution. This survey is a material of the tutorial named The Practical Face Recognition Technology in the Industrial World in the FG2023.
translated by 谷歌翻译
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has become even thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
translated by 谷歌翻译
近年来,随着对公共安全的需求越来越多,智能监测网络的快速发展,人员重新识别(RE-ID)已成为计算机视野领域的热门研究主题之一。人员RE-ID的主要研究目标是从不同的摄像机中检索具有相同身份的人。但是,传统的人重新ID方法需要手动标记人的目标,这消耗了大量的劳动力成本。随着深度神经网络的广泛应用,出现了许多基于深入的基于学习的人物的方法。因此,本文促进研究人员了解最新的研究成果和该领域的未来趋势。首先,我们总结了对几个最近公布的人的研究重新ID调查,并补充了系统地分类基于深度学习的人的重新ID方法的最新研究方法。其次,我们提出了一种多维分类,根据度量标准和表示学习,将基于深度学习的人的重新ID方法分为四类,包括深度度量学习,本地特征学习,生成的对抗学习和序列特征学习的方法。此外,我们根据其方法和动机来细分以上四类,讨论部分子类别的优缺点。最后,我们讨论了一些挑战和可能的研究方向的人重新ID。
translated by 谷歌翻译
人类自然有效地在复杂的场景中找到突出区域。通过这种观察的动机,引入了计算机视觉中的注意力机制,目的是模仿人类视觉系统的这一方面。这种注意机制可以基于输入图像的特征被视为动态权重调整过程。注意机制在许多视觉任务中取得了巨大的成功,包括图像分类,对象检测,语义分割,视频理解,图像生成,3D视觉,多模态任务和自我监督的学习。在本调查中,我们对计算机愿景中的各种关注机制进行了全面的审查,并根据渠道注意,空间关注,暂时关注和分支注意力进行分类。相关的存储库https://github.com/menghaoguo/awesome-vision-tions致力于收集相关的工作。我们还建议了未来的注意机制研究方向。
translated by 谷歌翻译
Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequence as compared to recurrent networks e.g., Long short-term memory (LSTM). Different from convolutional networks, Transformers require minimal inductive biases for their design and are naturally suited as set-functions. Furthermore, the straightforward design of Transformers allows processing multiple modalities (e.g., images, videos, text and speech) using similar processing blocks and demonstrates excellent scalability to very large capacity networks and huge datasets. These strengths have led to exciting progress on a number of vision tasks using Transformer networks. This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline. We start with an introduction to fundamental concepts behind the success of Transformers i.e., self-attention, large-scale pre-training, and bidirectional feature encoding. We then cover extensive applications of transformers in vision including popular recognition tasks (e.g., image classification, object detection, action recognition, and segmentation), generative modeling, multi-modal tasks (e.g., visual-question answering, visual reasoning, and visual grounding), video processing (e.g., activity recognition, video forecasting), low-level vision (e.g., image super-resolution, image enhancement, and colorization) and 3D analysis (e.g., point cloud classification and segmentation). We compare the respective advantages and limitations of popular techniques both in terms of architectural design and their experimental value. Finally, we provide an analysis on open research directions and possible future works. We hope this effort will ignite further interest in the community to solve current challenges towards the application of transformer models in computer vision.
translated by 谷歌翻译
即使在几个例子中,人类能够学会识别新物品。相比之下,培训基于深度学习的对象探测器需要大量的注释数据。为避免需求获取和注释这些大量数据,但很少拍摄的对象检测旨在从目标域中的新类别的少数对象实例中学习。在本调查中,我们在几次拍摄对象检测中概述了本领域的状态。我们根据培训方案和建筑布局分类方法。对于每种类型的方法,我们描述了一般的实现以及提高新型类别性能的概念。在适当的情况下,我们在这些概念上给出短暂的外卖,以突出最好的想法。最终,我们介绍了常用的数据集及其评估协议,并分析了报告的基准结果。因此,我们强调了评估中的共同挑战,并确定了这种新兴对象检测领域中最有前景的电流趋势。
translated by 谷歌翻译
This paper reviews the recent progress of remote sensing image scene classification, proposes a large-scale benchmark dataset, and evaluates a number of state-of-the-art methods using the proposed dataset.
translated by 谷歌翻译
在深度学习研究中,自学学习(SSL)引起了极大的关注,引起了计算机视觉和遥感社区的兴趣。尽管计算机视觉取得了很大的成功,但SSL在地球观测领域的大部分潜力仍然锁定。在本文中,我们对在遥感的背景下为计算机视觉的SSL概念和最新发展提供了介绍,并回顾了SSL中的概念和最新发展。此外,我们在流行的遥感数据集上提供了现代SSL算法的初步基准,从而验证了SSL在遥感中的潜力,并提供了有关数据增强的扩展研究。最后,我们确定了SSL未来研究的有希望的方向的地球观察(SSL4EO),以铺平了两个领域的富有成效的相互作用。
translated by 谷歌翻译
对象检测是计算机视觉和图像处理中的基本任务。基于深度学习的对象探测器非常成功,具有丰富的标记数据。但在现实生活中,它不保证每个对象类别都有足够的标记样本进行培训。当训练数据有限时,这些大型物体探测器易于过度装备。因此,有必要将几次拍摄的学习和零射击学习引入对象检测,这可以将低镜头对象检测命名在一起。低曝光对象检测(LSOD)旨在检测来自少数甚至零标记数据的对象,其分别可以分为几次对象检测(FSOD)和零拍摄对象检测(ZSD)。本文对基于深度学习的FSOD和ZSD进行了全面的调查。首先,本调查将FSOD和ZSD的方法分类为不同的类别,并讨论了它们的利弊。其次,本调查审查了数据集设置和FSOD和ZSD的评估指标,然后分析了在这些基准上的不同方法的性能。最后,本调查讨论了FSOD和ZSD的未来挑战和有希望的方向。
translated by 谷歌翻译
地理定位的概念是指确定地球上的某些“实体”的位置的过程,通常使用全球定位系统(GPS)坐标。感兴趣的实体可以是图像,图像序列,视频,卫星图像,甚至图像中可见的物体。由于GPS标记媒体的大规模数据集由于智能手机和互联网而迅速变得可用,而深入学习已经上升以提高机器学习模型的性能能力,因此由于其显着影响而出现了视觉和对象地理定位的领域广泛的应用,如增强现实,机器人,自驾驶车辆,道路维护和3D重建。本文提供了对涉及图像的地理定位的全面调查,其涉及从捕获图像(图像地理定位)或图像内的地理定位对象(对象地理定位)的地理定位的综合调查。我们将提供深入的研究,包括流行算法的摘要,对所提出的数据集的描述以及性能结果的分析来说明每个字段的当前状态。
translated by 谷歌翻译
标记数据通常昂贵且耗时,特别是对于诸如对象检测和实例分割之类的任务,这需要对图像的密集标签进行密集的标签。虽然几张拍摄对象检测是关于培训小说中的模型(看不见的)对象类具有很少的数据,但它仍然需要在许多标记的基础(见)类的课程上进行训练。另一方面,自我监督的方法旨在从未标记数据学习的学习表示,该数据转移到诸如物体检测的下游任务。结合几次射击和自我监督的物体检测是一个有前途的研究方向。在本调查中,我们审查并表征了几次射击和自我监督对象检测的最新方法。然后,我们给我们的主要外卖,并讨论未来的研究方向。https://gabrielhuang.github.io/fsod-survey/的项目页面
translated by 谷歌翻译
深度学习技术导致了通用对象检测领域的显着突破,近年来产生了很多场景理解的任务。由于其强大的语义表示和应用于场景理解,场景图一直是研究的焦点。场景图生成(SGG)是指自动将图像映射到语义结构场景图中的任务,这需要正确标记检测到的对象及其关系。虽然这是一项具有挑战性的任务,但社区已经提出了许多SGG方法并取得了良好的效果。在本文中,我们对深度学习技术带来了近期成就的全面调查。我们审查了138个代表作品,涵盖了不同的输入方式,并系统地将现有的基于图像的SGG方法从特征提取和融合的角度进行了综述。我们试图通过全面的方式对现有的视觉关系检测方法进行连接和系统化现有的视觉关系检测方法,概述和解释SGG的机制和策略。最后,我们通过深入讨论当前存在的问题和未来的研究方向来完成这项调查。本调查将帮助读者更好地了解当前的研究状况和想法。
translated by 谷歌翻译