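We present a new method for image saliency prediction, Clustered Saliency Prediction. The method divides individuals into clusters based on their personal features and their known saliency maps, and generates a separate image saliency model for each cluster. We test our approach on a public dataset of personalized saliency maps, varying the importance weights of the personal feature factors and observing the effect on the clusters. For each cluster, we use an image-to-image translation method (mainly the Pix2Pix model) to convert universal saliency maps into saliency maps of that cluster. We try three state-of-the-art universal saliency prediction methods, DeepGaze II, ML-Net and SalGAN, and examine their impact on the results. We show that our Clustered Saliency Prediction technique outperforms the state-of-the-art universal saliency prediction models. We also demonstrate the effectiveness of our clustering method by comparing the results of Clustered Saliency Prediction using clusters obtained by the Subject Similarity Clustering algorithm against two baseline methods. We propose an approach for assigning new people to the most appropriate cluster, based on their personal features and any known saliency maps. In our experiments, this method of assigning new people to a cluster on average chooses the cluster that yields higher saliency scores.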
translated by Google Translate
X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System to classify DR Grading, localize lesion areas, and provide visual explanations; (ii) DRG-Expert-Interaction to receive feedback from user-expert and improve the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. Besides, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and loss functions constraint between lesion features and classification features, our approach can be robust given a certain level of noise in the feedback of users. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.
Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.
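Memorability measures how easily an image is remembered after a glance, which may help in designing magazine covers, tourism publicity materials, and so forth. Recent works have shed light on the visual features that make generic images, object images or face photographs memorable. However, these methods cannot effectively predict the memorability of outdoor natural scene images. To overcome this shortcoming of previous works, in this paper we attempt to answer the question: "what exactly makes an outdoor natural scene memorable?" To this end, we first establish a large-scale outdoor natural scene image memorability (LNSIM) database, containing 2,632 outdoor natural scene images with their ground-truth memorability scores and multi-label scene category annotations. Then, similar to previous works, we mine our database to investigate how low-, middle- and high-level handcrafted features affect the memorability of outdoor natural scenes. In particular, we find that the high-level feature of scene category is rather correlated with outdoor natural scene memorability, and that the deep features learned by a deep neural network (DNN) are also effective in predicting memorability scores. Moreover, combining the deep features with the category feature can further boost the performance of memorability prediction. Therefore, we propose an end-to-end DNN-based outdoor natural scene memorability (DeepNSM) predictor, which takes advantage of the learned category-related features. The experimental results validate the effectiveness of our DeepNSM model, which exceeds the state-of-the-art methods. Finally, we try to understand the reasons for the good performance of our DeepNSM model, and study the cases in which it succeeds or fails to accurately predict the memorability of outdoor natural scenes. Code: github.com/jiaxinlu-home/natural-cene-memorability-dataset.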
The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.
Saliency detection is one of the most challenging problems in image analysis and computer vision. Many approaches propose different architectures based on the psychological and biological properties of the human visual attention system. However, there is still no abstract framework that summarizes the existing methods. In this paper, we offer a general framework for saliency models, which consists of five main steps: pre-processing, feature extraction, saliency map generation, saliency map combination, and post-processing. We also study different saliency models that implement each of these steps and compare their performance. This framework gives researchers a comprehensive view when studying new methods.
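Convolutional neural networks (CNNs) have significantly advanced computational modelling for saliency prediction. However, accurately simulating the mechanisms of visual attention in the human cortex remains an academic challenge. It is critical to integrate properties of human vision into the design of CNN architectures, leading to perceptually more relevant saliency prediction. Due to the inherent inductive biases of CNN architectures, there is a lack of sufficient long-range contextual encoding capacity. This hinders CNN-based saliency models from capturing properties that emulate human viewing behaviour. Transformers have shown great potential in encoding long-range information by leveraging the self-attention mechanism. In this paper, we propose a novel saliency model that integrates transformer components into CNNs to capture long-range contextual visual information. Experimental results show that the transformers provide added value to saliency prediction, enhancing its perceptual relevance in the performance. Our proposed saliency model using transformers has achieved superior results on public benchmarks and competitions for saliency prediction models. The source code of our proposed saliency model TranSalNet is available at: https://github.com/ljovo/transalnet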
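We propose the Unified Model of Saliency and Scanpaths (UMSS), a model that learns to predict visual saliency and scanpaths (i.e. sequences of eye fixations) on information visualisations. Although scanpaths provide rich information about the importance of different visualisation elements during the visual exploration process, prior work has been limited to predicting aggregated attention statistics, such as visual saliency. We present in-depth analyses of gaze behaviour for different information visualisation elements (e.g. title, label, data) on the popular MASSVIS dataset. We show that while, overall, gaze patterns are surprisingly consistent across visualisations and viewers, there are also structural differences in gaze dynamics for different elements. Informed by our analyses, UMSS first predicts multi-duration element-level saliency maps, then probabilistically samples scanpaths from them. Extensive experiments on MASSVIS show that our method consistently outperforms state-of-the-art methods with respect to several widely used scanpath and saliency evaluation metrics. Our method achieves a relative improvement in sequence score of 11.5% for scanpath prediction, and a relative improvement in Pearson correlation coefficient of up to 23.6% for saliency prediction. These results are auspicious and point towards richer user models and simulations of visual attention on visualisations without the need for any eye tracking equipment.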
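The step of probabilistically sampling a scanpath from a sequence of saliency maps can be illustrated with a minimal sketch. It assumes one map per successive viewing duration and draws one fixation per map in proportion to saliency; the function name `sample_scanpath` and this one-fixation-per-map scheme are illustrative assumptions, not the UMSS sampling procedure itself.

```python
import numpy as np


def sample_scanpath(saliency_maps, rng):
    """Draw one fixation (row, col) from each saliency map, in order.

    Assumption for illustration: each map is treated as an unnormalised
    probability distribution over pixels.
    """
    fixations = []
    for sal in saliency_maps:
        sal = np.asarray(sal, dtype=float)
        probs = sal.ravel() / sal.sum()          # normalise to sum to 1
        flat_index = rng.choice(probs.size, p=probs)
        row, col = np.unravel_index(flat_index, sal.shape)
        fixations.append((int(row), int(col)))
    return fixations


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = [rng.random((6, 8)) for _ in range(4)]  # toy saliency maps
    print(sample_scanpath(maps, rng))
```

Passing an explicit `numpy.random.Generator` keeps the sampling reproducible, which matters when sampled scanpaths are compared against recorded ones.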
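Despite recent advances in deep-learning-based semantic segmentation, automatic building detection from remotely sensed imagery is still a challenging problem owing to the large variability in the appearance of buildings across the globe. Errors occur mostly around the boundaries of building footprints, in shadow areas, and when detecting buildings whose exterior surfaces have reflectivity properties very similar to those of the surrounding regions. To overcome these problems, we propose a generative adversarial network based segmentation framework with an uncertainty attention unit and a refinement module embedded in the generator. The refinement module, composed of edge and reverse attention units, is designed to refine the predicted building map. The edge attention enhances the boundary features to estimate building boundaries with greater precision, and the reverse attention allows the network to explore the features missing in the previously estimated regions. The uncertainty attention unit assists the network in resolving uncertainties in classification. As a measure of the power of our approach, as of December 4, 2021, it ranks second on DeepGlobe's public leaderboard, even though the main focus of our approach, the refinement of building edges, does not align exactly with the metrics used for leaderboard rankings. Our overall F1-score on DeepGlobe's challenging dataset is 0.745. We also report improvements on the previous best results for the challenging INRIA validation dataset, for which our network achieves an overall IoU of 81.28% and an overall accuracy of 97.03%. Along the same lines, for the official INRIA test dataset, our network scores 77.86% and 96.41% in overall IoU and accuracy, respectively.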
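With the rapid development of multimedia technology, Augmented Reality (AR) has become a promising next-generation mobile platform. The primary theory underlying AR is human visual confusion, which allows users to perceive real-world scenes and augmented contents (virtual-world scenes) simultaneously by superimposing them. To achieve a good Quality of Experience (QoE), it is important to understand the interaction between the two scenarios and to display AR contents harmoniously. However, studies on how this superimposition influences human visual attention are lacking. Therefore, in this paper, we mainly analyze the interaction effect between background (BG) scenes and AR contents, and study the saliency prediction problem in AR. Specifically, we first construct a Saliency in AR Dataset (SARD), which contains 450 BG images, 450 AR images, and 1350 superimposed images generated by superimposing BG and AR images in pairs at three mixing levels. A large-scale eye-tracking experiment among 60 subjects is conducted to collect eye movement data. To better predict saliency in AR, we propose a quantized saliency prediction method and generalize it for AR saliency prediction. For comparison, three benchmark methods are proposed and evaluated together with our proposed method on SARD. Experimental results demonstrate the superiority of our proposed method over the benchmark methods on both the common saliency prediction problem and the AR saliency prediction problem. Our dataset and code are available at: https://github.com/duanhuiyu/arsality.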
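This paper introduces a new framework to predict visual attention on omnidirectional images. The key feature of our architecture is the simultaneous prediction of the saliency map and a corresponding scanpath for a given stimulus. The framework implements a fully encoder-decoder convolutional neural network augmented by an attention module to generate representative saliency maps. In addition, an auxiliary network is employed to generate probable viewport-center fixation points through the SoftArgMax function, which allows fixation points to be derived from feature maps. To take advantage of the scanpath prediction, an adaptive joint probability distribution model is then applied to construct the final unbiased saliency map by leveraging the encoder-decoder-based saliency map and the scanpath-based saliency heatmap. The proposed framework is evaluated in terms of saliency and scanpath prediction, and the results are compared to state-of-the-art methods on the Salient360! dataset. The results show the relevance of our framework and the benefits of such an architecture for further omnidirectional visual attention prediction tasks.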
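To make the role of SoftArgMax concrete, the following is a minimal NumPy sketch of a 2D soft-argmax: a softmax turns the feature map into weights, and the expected row and column under those weights give a differentiable approximation of the peak location. The function name `soft_argmax_2d` and the sharpness parameter `beta` are assumptions for illustration; the abstract does not specify how the auxiliary network applies the function.

```python
import numpy as np


def soft_argmax_2d(feature_map, beta=10.0):
    """Return the softmax-weighted expected (row, col) of a 2D map.

    `beta` (an illustrative parameter) controls sharpness: larger values
    move the result closer to the hard argmax.
    """
    fmap = np.asarray(feature_map, dtype=float)
    height, width = fmap.shape
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(beta * (fmap - fmap.max()))
    weights /= weights.sum()
    rows, cols = np.mgrid[0:height, 0:width]
    return float((weights * rows).sum()), float((weights * cols).sum())


if __name__ == "__main__":
    fmap = np.zeros((5, 7))
    fmap[2, 3] = 1.0
    print(soft_argmax_2d(fmap, beta=50.0))
```

Unlike a hard argmax, this expectation varies smoothly with the feature values, which is why it is commonly used when coordinates must be learned by gradient descent.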
Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still large room for improvement over the generic FCN models that do not explicitly deal with the scale-space problem. Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over the existing algorithms. Beyond that, we conduct an exhaustive analysis of the role of training data in performance. Our experimental results provide a more reasonable and powerful training set for future research and fair comparisons.
Deep domain adaptation has emerged as a new learning technique to address the lack of massive amounts of labeled data. Compared to conventional methods, which learn shared feature subspaces or reuse important source instances with shallow representations, deep domain adaptation methods leverage deep networks to learn more transferable representations by embedding domain adaptation in the pipeline of deep learning. There have been comprehensive surveys of shallow domain adaptation, but few timely reviews of the emerging deep learning based methods. In this paper, we provide a comprehensive survey of deep domain adaptation methods for computer vision applications with four major contributions. First, we present a taxonomy of different deep domain adaptation scenarios according to the properties of data that define how two domains diverge. Second, we summarize deep domain adaptation approaches into several categories based on training loss, and briefly analyze and compare the state-of-the-art methods under these categories. Third, we give an overview of the computer vision applications that go beyond image classification, such as face recognition, semantic segmentation and object detection. Fourth, some potential deficiencies of current methods and several future directions are highlighted.
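Machine learning models often encounter samples that diverge from the training distribution. Failing to recognize an out-of-distribution (OOD) sample, and consequently assigning that sample to an in-class label, significantly compromises the reliability of a model. The problem has gained significant attention due to its importance for safely deploying models in open-world settings. Detecting OOD samples is challenging because modeling all possible unknown distributions is intractable. To date, several research domains tackle the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open-set recognition, and out-of-distribution detection. Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection have been investigated independently. Accordingly, these research avenues have not cross-pollinated, creating research barriers. While some surveys intend to provide an overview of these approaches, they seem to focus only on a specific domain without examining the relationship between different domains. This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in the respective areas while identifying their commonalities. Researchers can benefit from the overview of research advances in different fields and develop future methodology synergistically. Furthermore, to the best of our knowledge, while there are surveys on anomaly detection or one-class learning, there is no comprehensive or up-to-date survey on out-of-distribution detection, which our survey covers extensively. Finally, having a unified cross-domain perspective, we discuss and shed light on future lines of research, intending to bring these fields closer together.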
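This article introduces the concept of image "culturization", defined as the process of altering the "brushstroke of cultural features" that makes objects be perceived as belonging to a given culture while preserving their functionalities. First, we propose a pipeline, based on Generative Adversarial Networks (GANs), for translating images of objects from a source to a target cultural domain. Then, we gather data through an online questionnaire to test four hypotheses concerning the preferences of Italian participants towards objects and environments belonging to different cultures. As expected, the results depend on individual tastes and preferences; however, they are in line with our conjecture that some people, during interaction with a robot or another intelligent system, might prefer to be shown images whose cultural domain has been modified to match their cultural background.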
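With the advent of brain imaging techniques and machine learning tools, much effort has been devoted to building computational models that capture the encoding of visual information in the human brain. One of the most challenging brain decoding tasks is the accurate reconstruction of perceived natural images from brain activity measured by functional magnetic resonance imaging (fMRI). In this work, we survey the most recent deep learning methods for natural image reconstruction from fMRI. We examine these methods in terms of architectural design, benchmark datasets, and evaluation metrics, and present a fair performance evaluation across standardized evaluation metrics. Finally, we discuss the strengths and limitations of existing studies and present potential future directions.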
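Despite their significant representation capabilities, feed-forward-only convolutional neural networks (CNNs) may ignore the intrinsic relationships and potential benefits of feedback connections in vision tasks. In this work, we propose a feedback-recursive convolutional framework (SalFBNet) for saliency detection. The proposed feedback model can learn abundant contextual representations by bridging a recursive pathway from higher-level feature blocks to low-level layers. Moreover, we create a large-scale Pseudo-Saliency dataset to alleviate the problem of data deficiency in saliency detection. We first use the proposed feedback model to learn the saliency distribution from the pseudo ground truth. Afterwards, we fine-tune the feedback model on existing eye-fixation datasets. Furthermore, we present a novel Selective Fixation and Non-Fixation Error (sFNE) loss to help the proposed feedback model better learn distinguishable eye-fixation-based features. Extensive experimental results show that our SalFBNet, with fewer parameters, achieves competitive results on public saliency detection benchmarks, which demonstrates the effectiveness of the proposed feedback model and the Pseudo-Saliency data. Source code and the Pseudo-Saliency dataset can be found at https://github.com/gqding/salfbnet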
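Recently, facial biometrics has received tremendous attention as a convenient replacement for traditional authentication systems. Consequently, detecting malicious attempts has gained great significance, leading to extensive studies in face anti-spoofing (FAS), i.e., face presentation attack detection. In contrast to hand-crafted features, deep feature learning techniques have promised a dramatic increase in the accuracy of FAS systems, tackling the key challenges of bringing such systems to real-world application. Hence, a new research area dealing with the development of more generalized as well as more accurate models is increasingly attracting the attention of the research community and industry. In this paper, we present a comprehensive survey of the literature on deep-feature-based FAS methods since 2017. To shed light on this topic, we present a semantic taxonomy based on various features and learning methodologies. Further, we cover the predominant public datasets for FAS in chronological order, their evolutionary progress, and the evaluation criteria (both intra-dataset and inter-dataset). Finally, we discuss open research challenges and future directions.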
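Human prioritization of image regions can be modeled in a time-invariant fashion with saliency maps or sequentially with scanpath models. However, while both types of models have steadily improved on several benchmarks and datasets, there is still a considerable gap in predicting human gaze. Here, we leverage two recent developments to reduce this gap: theoretical analyses establishing a principled framework for predicting the next gaze target, and the empirical measurement of the human cost of gaze switches independently of image content. We introduce an algorithm in the framework of sequential decision making, which converts any static saliency map into a sequence of dynamic, history-dependent value maps that are recomputed after each gaze shift. These maps are based on 1) a saliency map provided by an arbitrary saliency model, 2) the recently measured human cost function quantifying preferences in the magnitude and direction of eye movements, and 3) a sequential exploration bonus, which changes with each subsequent gaze shift. The parameters for the spatial extent and temporal decay of this exploration bonus are estimated from human gaze data. The relative contributions of these three components were optimized on the MIT1003 dataset for the NSS score and are sufficient to significantly outperform predictions of the next gaze target on NSS and AUC scores for five state-of-the-art saliency models on three image datasets. Thus, we provide an implementation of human gaze preferences that can be used to improve arbitrary saliency models' predictions of humans' next gaze targets.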
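The idea of turning a static saliency map into a history-dependent value map can be sketched as follows. This is only an illustration under loudly stated assumptions: the cost term is a hypothetical penalty proportional to saccade amplitude (the measured human cost function is not reproduced here), the exploration term is expressed as a Gaussian penalty around past fixations that fades geometrically with age, and the weights `w_sal`, `w_cost`, `w_explore` and the parameters `sigma` and `decay` are placeholder values rather than fitted ones.

```python
import numpy as np


def next_gaze_target(saliency, fixations, w_sal=1.0, w_cost=0.5,
                     w_explore=1.0, sigma=2.0, decay=0.5):
    """Return ((row, col), value_map) for the next gaze target.

    `fixations` is the gaze history as (row, col) pairs, oldest first.
    All functional forms and parameter values are illustrative assumptions.
    """
    sal = np.asarray(saliency, dtype=float)
    rows, cols = np.mgrid[0:sal.shape[0], 0:sal.shape[1]]

    # Hypothetical eye-movement cost: saccade amplitude from the current
    # fixation, normalised by the image diagonal.
    cur_r, cur_c = fixations[-1]
    cost = np.hypot(rows - cur_r, cols - cur_c) / np.hypot(*sal.shape)

    # Hypothetical exploration term: visited regions are penalised, with
    # spatial extent `sigma` and a temporal decay applied to older fixations.
    visited = np.zeros_like(sal)
    for age, (r, c) in enumerate(reversed(fixations)):
        bump = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
        visited += (decay ** age) * bump

    value = w_sal * sal - w_cost * cost - w_explore * visited
    target = np.unravel_index(np.argmax(value), value.shape)
    return (int(target[0]), int(target[1])), value


if __name__ == "__main__":
    sal = np.zeros((5, 10))
    sal[2, 2], sal[2, 7] = 1.0, 0.9
    history = [(2, 2)]
    for _ in range(3):
        target, _ = next_gaze_target(sal, history)
        history.append(target)
    print(history)
```

Because the value map is recomputed from the full history on every call, the same static saliency map yields a different target after each gaze shift, which is the behaviour the abstract describes.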