Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevance to the given seeds or queries. We represent the image as a closed-loop graph with superpixels as nodes. These nodes are ranked by their similarity to background and foreground queries, based on affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate that the proposed method performs well against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.
translated by Google Translate
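The ranking step can be illustrated with the closed form usually associated with this formulation, f* = (D − αW)^{-1} y, where W is the affinity matrix, D its diagonal degree matrix, and y marks the query nodes. The following is a minimal pure-Python sketch on a toy graph, assuming that unnormalized form and α = 0.99; superpixel extraction, the closed-loop graph construction, and the two-stage scheme are not shown.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(b)
    # Work on an augmented copy [a | b] so the caller's data is untouched.
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def manifold_rank(w: Matrix, queries: Sequence[int], alpha: float = 0.99) -> List[float]:
    """Ranking scores f* = (D - alpha * W)^-1 y for the given query nodes."""
    n = len(w)
    degree = [sum(row) for row in w]
    # D - alpha * W, where D is the diagonal degree matrix of the affinity matrix W.
    a = [[(degree[i] if i == j else 0.0) - alpha * w[i][j] for j in range(n)]
         for i in range(n)]
    # Indicator vector y: 1 for query (seed) nodes, 0 elsewhere.
    y = [1.0 if i in queries else 0.0 for i in range(n)]
    return solve_linear(a, y)


# Toy chain graph 0 - 1 - 2 with node 0 as the only query.
scores = manifold_rank([[0, 1, 0], [1, 0, 1], [0, 1, 0]], queries=[0])
```

Relevance decays with graph distance from the query. When the queries are background (e.g. border) nodes, the scores would have to be normalized and inverted to read as foreground saliency; that post-processing is an assumption here, not shown above.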
Reliable estimation of visual saliency allows appropriate processing of images without prior knowledge of their contents, and thus remains an important step in many computer vision tasks including image segmentation, object recognition, and adaptive compression. We propose a regional contrast based saliency extraction algorithm, which simultaneously evaluates global contrast differences and spatial coherence. The proposed algorithm is simple, efficient, and yields full resolution saliency maps. Our algorithm consistently outperformed existing saliency detection methods, yielding higher precision and better recall rates, when evaluated using one of the largest publicly available data sets. We also demonstrate how the extracted saliency map can be used to create high quality segmentation masks for subsequent image processing.
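The combination of global contrast and spatial coherence can be illustrated with a spatially weighted region-contrast score: each region accumulates its color distance to every other region, weighted by that region's size and damped by their spatial distance. This is a minimal sketch under simplifying assumptions: regions are summarized by a mean color and a normalized centroid, color distance is Euclidean between mean colors (instead of a histogram-based distance), and `sigma_s2 = 0.4` is an assumed default.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    mean_color: Tuple[float, float, float]  # e.g. mean Lab color (assumed summary)
    centroid: Tuple[float, float]           # normalized to [0, 1] x [0, 1]
    pixel_count: int


def region_contrast(regions: List[Region], sigma_s2: float = 0.4) -> List[float]:
    """Saliency per region: size-weighted color contrast, damped by spatial distance."""
    scores = []
    for k, rk in enumerate(regions):
        s = 0.0
        for i, ri in enumerate(regions):
            if i == k:
                continue
            spatial = math.dist(rk.centroid, ri.centroid)
            color = math.dist(rk.mean_color, ri.mean_color)
            # Nearby, large, differently colored regions contribute the most.
            s += math.exp(-spatial / sigma_s2) * ri.pixel_count * color
        scores.append(s)
    return scores


# Two similar grayish regions and one distinctly colored region in the middle.
regions = [
    Region((50.0, 0.0, 0.0), (0.2, 0.5), 100),
    Region((52.0, 0.0, 0.0), (0.8, 0.5), 100),
    Region((50.0, 60.0, 40.0), (0.5, 0.5), 100),
]
rc_scores = region_contrast(regions)
```

The distinctly colored region ends up with the highest score, because it differs from both neighbors while the two gray regions mostly cancel each other's contrast.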
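Salient object detection (SOD) has several applications in image analysis. Deep-learning-based SOD methods are among the most effective, but they may miss foreground parts with similar colors. To circumvent this problem, we introduce a post-processing method, named Saliency Enhancement over Superpixel Similarity (SESS), which alternately executes two operations for saliency completion: object-based superpixel segmentation and superpixel-based saliency estimation. SESS uses an input saliency map to estimate seeds for superpixel delineation and to define superpixel queries in the foreground and background. A new saliency map results from color similarities between queries and superpixels. The process repeats for a given number of iterations, and all generated saliency maps are combined into a single one by cellular automata. Finally, the post-processed and initial maps are merged using their average values. We show that SESS can consistently and considerably improve the results of three deep-learning-based SOD methods on five image datasets.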
Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still large room for improvement over the generic FCN models that do not explicitly deal with the scale-space problem. Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over the existing algorithms. Beyond that, we conduct an exhaustive analysis on the role of training data on performance. Our experimental results provide a more reasonable and powerful training set for future research and fair comparisons.
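In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect salient objects in images and videos. With this approach, the image patches that compose an image or video are organized into a fully connected graph, where the edge between each pair of patches is labeled with a similarity score between the patches, computed from the features learned by the transformer. Detection and segmentation of salient objects are then formulated as a graph-cut problem and solved using the classical Normalized Cut algorithm. Despite the simplicity of this approach, it achieves state-of-the-art results on several common image and video detection and segmentation tasks. For unsupervised object discovery, this approach outperforms the competing approaches by a margin of 6.1%, 5.7%, and 2.6%, respectively, when tested on the VOC07, VOC12, and COCO20K datasets. For the unsupervised saliency detection task in images, this method improves the Intersection over Union (IoU) score by 4.4%, 5.6%, and 5.2% over the current state of the art when tested on the ECSSD, DUTS, and DUT-OMRON datasets, respectively. The method also achieves competitive results for unsupervised video object segmentation on the DAVIS, SegTV2, and FBMS datasets.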
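The graph formulation above can be sketched in a few lines: build a thresholded cosine-similarity graph over patch features, take the second-smallest generalized eigenvector of (D − W)x = λDx, and split the patches at its mean. This is a toy pure-Python illustration, not the authors' code; the threshold `tau = 0.2`, the small weight `eps` for dissimilar pairs, and the power-iteration solver are assumptions made to keep it self-contained, and the step that decides which side of the cut is the foreground is omitted.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ncut_bipartition(features: List[List[float]], tau: float = 0.2,
                     eps: float = 1e-5, iters: int = 500) -> List[int]:
    """Split patches into two groups with a normalized-cut relaxation."""
    n = len(features)
    # Patch graph: weight 1 for similar pairs (cosine >= tau), eps otherwise.
    w = [[1.0 if cosine(features[i], features[j]) >= tau else eps
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # (I + D^-1/2 W D^-1/2) / 2: same eigenvectors, eigenvalues shifted into [0, 1].
    m = [[0.5 * ((1.0 if i == j else 0.0) + inv_sqrt[i] * w[i][j] * inv_sqrt[j])
          for j in range(n)] for i in range(n)]
    # Its leading eigenvector is known analytically: proportional to sqrt(deg).
    u1 = [math.sqrt(d) for d in deg]
    norm = math.sqrt(sum(x * x for x in u1))
    u1 = [x / norm for x in u1]
    # Power iteration with deflation converges to the second eigenvector.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(v, u1))
        v = [a - proj * b for a, b in zip(v, u1)]
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Back to the generalized problem (D - W) x = lambda D x via x = D^-1/2 v.
    x = [a * b for a, b in zip(v, inv_sqrt)]
    mean_x = sum(x) / n
    return [1 if xi > mean_x else 0 for xi in x]


# Toy "patch features": three similar patches, then three different ones.
labels = ncut_bipartition([[1.0, 0.0], [0.95, 0.05], [1.0, 0.02],
                           [0.0, 1.0], [0.05, 0.95], [0.02, 1.0]])
```

The sign of an eigenvector is arbitrary, so only the grouping (not which group gets label 1) is meaningful here.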
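Existing approaches to unsupervised object discovery (UOD) do not scale up to large datasets without approximations that compromise their performance. We propose a novel formulation of UOD as a ranking problem, amenable to the arsenal of distributed methods available for eigenvalue problems and link analysis. Through the use of self-supervised features, we also demonstrate the first effective fully unsupervised pipeline for UOD. Extensive experiments on COCO and OpenImages show that, in the single-object discovery setting where a single prominent object is sought in each image, the proposed LOD (Large-scale Object Discovery) approach is on par with, or better than, the state of the art for medium-scale datasets (up to 120K images), and over 37% better than the only other algorithms capable of scaling up to 1.7M images. In the multi-object discovery setting where multiple objects are sought in each image, the proposed LOD is better, in average precision (AP), than all other methods for datasets ranging from 20K to 1.7M images. Using self-supervised features, we also show that the proposed method obtains state-of-the-art UOD performance on OpenImages. Our code is publicly available at https://github.com/huyvvo/lod.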
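As an illustration of the link-analysis primitives mentioned above (not of the LOD algorithm itself), the following sketch ranks the nodes of a weighted similarity graph with PageRank-style power iteration. The damping factor 0.85, the uniform teleport term, and the toy star graph are assumptions for the example.

```python
from typing import List


def pagerank(w: List[List[float]], damping: float = 0.85, iters: int = 100) -> List[float]:
    """PageRank scores for a weighted graph given as an adjacency matrix w[i][j]."""
    n = len(w)
    out_weight = [sum(row) for row in w]
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Teleport term: every node receives an equal share.
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_weight[i] == 0:
                # Dangling node: spread its rank uniformly.
                for j in range(n):
                    new_rank[j] += damping * rank[i] / n
            else:
                # Distribute rank along outgoing edges in proportion to their weight.
                for j in range(n):
                    new_rank[j] += damping * rank[i] * w[i][j] / out_weight[i]
        rank = new_rank
    return rank


# Toy star graph: node 0 is similar to every other node.
pr_scores = pagerank([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
```

Each iteration is a sparse matrix-vector product, which is what makes this family of methods easy to distribute over very large graphs.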
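Existing salient instance detection (SID) methods typically learn from pixel-level annotated datasets. In this paper, we present the first weakly-supervised approach to the SID problem. Although weak supervision has been considered in general saliency detection, it is mainly based on using class labels for object localization. However, it is non-trivial to use only class labels to learn instance-aware saliency information, as salient instances with high semantic affinities may not be easily separated by the labels. As subitizing information provides an instant judgement on the number of salient items, it is naturally related to detecting salient instances and may help separate instances of the same class while grouping different parts of the same instance. Inspired by this observation, we propose to use class and subitizing labels as weak supervision for the SID problem. We propose a novel weakly-supervised network with three branches: a saliency detection branch leveraging class consistency information to locate candidate objects; a boundary detection branch exploiting class discrepancy information to delineate object boundaries; and a centroid detection branch using subitizing information to detect salient instance centroids. This complementary information is then fused to produce a salient instance map. To facilitate the learning process, we further propose a progressive training scheme to reduce label noise and the corresponding noise learned by the model, through reciprocal salient instance prediction and model refreshing. Our extensive evaluations show that the proposed method performs favorably against carefully designed baseline methods adapted from related tasks.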
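Video salient object detection aims to find the most visually distinctive objects in a video. To explore temporal dependencies, existing methods usually resort to recurrent neural networks or optical flow. However, these approaches incur high computational cost and tend to accumulate inaccuracies over time. In this paper, we propose a network with attention modules to learn contrastive features for video salient object detection without computationally expensive temporal modeling techniques. We develop a non-local self-attention scheme to capture the global information in a video frame. A co-attention formulation is used to combine low-level and high-level features. We further apply contrastive learning to improve the feature representations, where foreground region pairs from the same video are pulled together, and foreground-background region pairs are pushed apart in the latent space. The intra-frame contrastive loss helps separate foreground and background features, and the inter-frame contrastive loss improves temporal consistency. We conduct extensive experiments on several benchmark datasets for video salient object detection and unsupervised video object segmentation, and show that the proposed method requires less computation and performs favorably against state-of-the-art approaches.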
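The pull-together/push-apart objective described above is commonly implemented as an InfoNCE-style loss over region embeddings. The sketch below assumes each region (foreground or background) has already been pooled into a feature vector and uses cosine similarity with an assumed temperature of 0.1; the paper's exact intra-frame and inter-frame losses may differ.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """-log of the softmax probability assigned to the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0]


fg_a = [1.0, 0.0]                       # pooled foreground feature, frame A (toy values)
fg_b = [0.9, 0.1]                       # pooled foreground feature, frame B
bg_feats = [[0.0, 1.0], [-1.0, 0.0]]    # pooled background region features
aligned = info_nce(fg_a, fg_b, bg_feats)                       # foreground pair as positive
misaligned = info_nce(fg_a, bg_feats[0], [fg_b, bg_feats[1]])  # background wrongly as positive
```

The loss is near zero when the foreground pair is the most similar candidate and grows when a background region is treated as the positive.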
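Skin lesion segmentation is one of the crucial steps for an efficient, non-invasive, computer-aided early diagnosis of melanoma. This paper investigates how color information, in addition to saliency, can be used to automatically determine the pigmented lesion region. Unlike most existing segmentation methods, which use only saliency to discriminate a skin lesion from the surrounding regions, we propose a novel method employing a binarization process coupled with new perceptual criteria, inspired by human visual perception, related to the properties of saliency and color of the input image data distribution. As a means of refining the accuracy of the proposed method, the segmentation step is preceded by a pre-processing step aimed at reducing the computational burden, removing artifacts, and improving contrast. We have assessed the method on two public databases, including 1497 dermoscopic images. We have also compared its performance with classical and recent saliency-based methods designed explicitly for dermoscopic images. The qualitative and quantitative evaluation indicates that the proposed method is promising, as it produces accurate skin lesion segmentation and performs satisfactorily compared with other existing saliency-based segmentation methods.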
Foreground map evaluation is crucial for gauging the progress of object segmentation algorithms, in particular in the field of salient object detection where the purpose is to accurately detect and segment the most salient object in a scene. Several widely-used measures such as Area Under the Curve (AUC), Average Precision (AP) and the recently proposed $F^{\omega}_{\beta}$ (Fbw) have been used to evaluate the similarity between a non-binary saliency map (SM) and a ground-truth (GT) map. These measures are based on pixel-wise errors and often ignore the structural similarities. Behavioral vision studies, however, have shown that the human visual system is highly sensitive to structures in scenes. Here, we propose a novel, efficient, and easy to calculate measure known as structural similarity measure (Structure-measure) to evaluate non-binary foreground maps. Our new measure simultaneously evaluates region-aware and object-aware structural similarity between a SM and a GT map. We demonstrate the superiority of our measure over existing ones using 5 meta-measures on 5 benchmark datasets.
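The Structure-measure combines its two terms as S = α·S_o + (1 − α)·S_r, with α = 0.5 being the commonly used setting. The region-aware term S_r compares blocks of the saliency map and ground truth with an SSIM-style index. The sketch below computes that block-level index and the final combination on flattened lists; splitting the maps into blocks at the ground-truth centroid and the object-aware term S_o are not shown, and the `eps` constant and the handling of constant blocks are assumptions.

```python
from statistics import fmean
from typing import Sequence


def block_ssim(sm: Sequence[float], gt: Sequence[float], eps: float = 1e-8) -> float:
    """SSIM-style similarity between one block of a saliency map and the GT block."""
    n = len(sm)  # needs at least two pixels
    mx, my = fmean(sm), fmean(gt)
    var_x = sum((x - mx) ** 2 for x in sm) / (n - 1)
    var_y = sum((y - my) ** 2 for y in gt) / (n - 1)
    cov = sum((x - mx) * (y - my) for x, y in zip(sm, gt)) / (n - 1)
    # Luminance, contrast and structure terms folded into one ratio.
    alpha = 4 * mx * my * cov
    beta = (mx ** 2 + my ** 2) * (var_x + var_y)
    if alpha != 0:
        return alpha / (beta + eps)
    # Degenerate case, e.g. both blocks all zeros: treat them as identical.
    if beta == 0:
        return 1.0
    return 0.0


def s_measure(s_object: float, s_region: float, alpha: float = 0.5) -> float:
    """Combine the object-aware and region-aware scores."""
    return alpha * s_object + (1 - alpha) * s_region
```

Identical blocks score close to 1, while inverted structure yields a negative score, which a purely pixel-wise error would not distinguish as sharply.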
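Glass is very common in our daily life. Existing computer vision systems neglect it, which may have severe consequences; for example, a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects or scenes can appear behind the glass. In this paper, we raise an important problem: detecting glass surfaces from a single RGB image. To address this problem, we construct the first large-scale glass detection dataset (GDD) and propose a novel glass detection network, called GDNet-B, which explores abundant contextual cues in a large field of view via a novel large-field contextual feature integration (LCFI) module and integrates both high-level and low-level boundary features with a boundary feature enhancement (BFE) module. Extensive experiments demonstrate that our GDNet-B achieves satisfying glass detection results on images within and beyond the GDD testing set. We further validate the effectiveness and generalization capability of our proposed GDNet-B by applying it to other vision tasks, including mirror segmentation and salient object detection. Finally, we show the potential applications of glass detection and discuss possible future research directions.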
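The human ability to easily detect salient objects has been the subject of research in several fields, including computer vision, as it has many applications. However, salient object detection remains a challenge for many computer models dealing with color and textured images. Here, we propose a novel and efficient strategy, through a simple model with almost no internal parameters, that generates a robust saliency map for a natural image. This strategy consists of integrating color information into local textural patterns to characterize a color micro-texture. Most models in the literature that use color and texture features treat them separately. In our case, it is the simple yet powerful LTP (Local Ternary Patterns) texture descriptor, applied to opposing color pairs of a color space, that allows us to achieve this end. Each color micro-texture is represented by a vector whose components come from a superpixel obtained by the SLICO (Simple Linear Iterative Clustering with zero parameter) algorithm, which is simple, fast, and exhibits state-of-the-art boundary adherence. The degree of dissimilarity between each pair of color micro-textures is computed by the FastMap method, a fast version of MDS (Multi-Dimensional Scaling), which accounts for the non-linearity of the color micro-textures while preserving their distances. These degrees of dissimilarity give us an intermediate saliency map for each of the RGB, HSL, LUV, and CMY color spaces. The final saliency map is their combination, taking advantage of the strengths of each of them. The MAE (Mean Absolute Error) and $F_{\beta}$ measures of our saliency maps on the complex ECSSD dataset show that our model is both simple and efficient, outperforming several state-of-the-art models.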
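For reference, a local ternary pattern (LTP) quantizes each neighbor p of a center value c into +1 (p ≥ c + t), −1 (p ≤ c − t), or 0 otherwise, and is usually stored as two binary codes: an "upper" pattern for the +1 positions and a "lower" pattern for the −1 positions. The sketch below computes both codes for one 3×3 patch of a single channel; the clockwise neighbor ordering is an assumed convention, and the paper's application to color pairs, superpixel pooling, and FastMap are not shown.

```python
from typing import List, Tuple


def ltp_codes(patch: List[List[float]], t: float) -> Tuple[int, int]:
    """Upper and lower binary codes of the local ternary pattern of a 3x3 patch."""
    center = patch[1][1]
    # Clockwise neighbor order starting at the top-left corner (assumed convention).
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    upper = lower = 0
    for bit, p in enumerate(neighbors):
        if p >= center + t:
            upper |= 1 << bit   # ternary +1
        elif p <= center - t:
            lower |= 1 << bit   # ternary -1
        # |p - center| < t maps to ternary 0 and sets no bit.
    return upper, lower


patch = [[10, 50, 50],
         [10, 30, 32],
         [30, 10, 50]]
codes = ltp_codes(patch, t=5)
```

The tolerance t makes the codes insensitive to small fluctuations around the center value, which is the usual motivation for LTP over binary LBP.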
Unsupervised object discovery aims to localize objects in images, while removing the dependence on annotations required by most deep learning-based methods. To address this problem, we propose a fully unsupervised, bottom-up approach for multi-object discovery. The proposed approach is a two-stage framework. First, instances of object parts are segmented by using the intra-image similarity between self-supervised local features. The second step merges and filters the object parts to form complete object instances. The latter is performed by two CNN models that capture semantic information on objects from the entire dataset. We demonstrate that the pseudo-labels generated by our method provide a better precision-recall trade-off than existing single- and multi-object discovery methods. In particular, we provide state-of-the-art results for both unsupervised class-agnostic object detection and unsupervised image segmentation.
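Superpixels are used as a powerful preprocessing tool in numerous computer vision tasks. By using a superpixel representation, the number of image primitives can be reduced by orders of magnitude. With the rise of deep learning in recent years, a few works have attempted to feed deeply learned features or graphs into existing classical superpixel techniques. However, none of them is able to produce superpixels in near real-time, which is crucial to the applicability of superpixels in practice. In this work, we propose a two-stage graph-based framework for superpixel segmentation. In the first stage, we introduce an efficient Deep Affinity Learning (DAL) network that learns pairwise pixel affinities by aggregating multi-scale information. In the second stage, we propose a highly efficient superpixel method called Hierarchical Entropy Rate Segmentation (HERS). Using the affinities learned in the first stage, HERS builds a hierarchical tree structure that can produce any number of highly adaptive superpixels instantaneously. We demonstrate, through visual and numerical experiments, the effectiveness and efficiency of our method compared with various state-of-the-art superpixel methods.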
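RGB-Thermal salient object detection (SOD) combines two spectra to segment visually conspicuous regions in images. Most existing methods use boundary maps to learn sharp boundaries. These methods ignore the interactions between isolated boundary pixels and other confident pixels, leading to sub-optimal performance. To address this problem, we propose a position-aware relation learning network (PRLNet) for RGB-T SOD based on Swin Transformer. PRLNet explores the distance and direction relationships between pixels to strengthen intra-class compactness and inter-class separation, generating salient object masks with clear boundaries and homogeneous regions. Specifically, we develop a novel signed distance map auxiliary module (SDMAM) to improve the encoder feature representation, which takes into account the distance relation of different pixels in boundary neighborhoods. Then, we design a feature refinement approach with directional field (FRDF), which rectifies the features of the boundary neighborhood by exploiting the features inside salient objects. FRDF utilizes the directional information between object pixels to effectively enhance the intra-class compactness of salient regions. In addition, we construct a pure transformer encoder-decoder network to enhance the multispectral feature representation for RGB-T SOD. Finally, we conduct quantitative and qualitative experiments on three public benchmark datasets. The results demonstrate that our proposed method outperforms the state-of-the-art methods.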
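One way to obtain the kind of signed-distance supervision such a module relies on is to compute, for every pixel of a binary ground-truth mask, the Euclidean distance to the nearest pixel of the opposite label, signed positive inside the object and negative outside. This brute-force sketch (quadratic in the number of pixels) only illustrates the idea; the sign convention and the absence of normalization are assumptions, not details taken from PRLNet.

```python
import math
from typing import List


def signed_distance_map(mask: List[List[int]]) -> List[List[float]]:
    """Signed distance to the nearest opposite-label pixel (+ inside, - outside)."""
    h, w = len(mask), len(mask[0])
    fg = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    bg = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            opposite = bg if mask[r][c] else fg
            if not opposite:
                continue  # mask is all one label: distance undefined, keep 0.0
            d = min(math.dist((r, c), p) for p in opposite)
            out[r][c] = d if mask[r][c] else -d
    return out


# 5x5 mask with a 3x3 object in the middle.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
sdm = signed_distance_map(mask)
```

Pixels right at the object boundary get magnitude 1, and the magnitude grows toward the object center or away from it, so the map encodes both which side of the boundary a pixel lies on and how far it is.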
Deep Convolutional Neural Networks have been adopted for salient object detection and achieved the state-of-the-art performance. Most of the previous works however focus on region accuracy but not on the boundary quality. In this paper, we propose a predict-refine architecture, BASNet, and a new hybrid loss for Boundary-Aware Salient object detection. Specifically, the architecture is composed of a densely supervised Encoder-Decoder network and a residual refinement module, which are respectively in charge of saliency prediction and saliency map refinement. The hybrid loss guides the network to learn the transformation between the input image and the ground truth in a three-level hierarchy (pixel-, patch- and map-level) by fusing Binary Cross Entropy (BCE), Structural SIMilarity (SSIM) and Intersection-over-Union (IoU) losses. Equipped with the hybrid loss, the proposed predict-refine architecture is able to effectively segment the salient object regions and accurately predict the fine structures with clear boundaries. Experimental results on six public datasets show that our method outperforms the state-of-the-art methods both in terms of regional and boundary evaluation measures. Our method runs at over 25 fps on a single GPU. The code is available at: https://github.com/NathanUA/BASNet.
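The hybrid loss can be sketched as the sum of three terms on a predicted probability map and its binary ground truth: pixel-level BCE, a structural term 1 − SSIM, and a map-level soft IoU loss. To stay short and dependency-free, this sketch computes SSIM over the whole flattened map as a single window with the usual constants C1 = 0.01² and C2 = 0.03²; BASNet applies SSIM on local patches, so treat this as a simplification rather than the released implementation.

```python
import math
from typing import Sequence


def bce_loss(pred: Sequence[float], gt: Sequence[float], eps: float = 1e-7) -> float:
    """Pixel-level binary cross-entropy, averaged over pixels."""
    total = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return total / len(pred)


def ssim_loss(pred: Sequence[float], gt: Sequence[float],
              c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> float:
    """1 - SSIM, computed over the whole map as one window (simplification)."""
    n = len(pred)
    mx, my = sum(pred) / n, sum(gt) / n
    vx = sum((p - mx) ** 2 for p in pred) / n
    vy = sum((g - my) ** 2 for g in gt) / n
    cov = sum((p - mx) * (g - my) for p, g in zip(pred, gt)) / n
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return 1.0 - ssim


def iou_loss(pred: Sequence[float], gt: Sequence[float], eps: float = 1e-7) -> float:
    """Map-level soft IoU loss."""
    inter = sum(p * g for p, g in zip(pred, gt))
    union = sum(pred) + sum(gt) - inter
    return 1.0 - inter / (union + eps)


def hybrid_loss(pred: Sequence[float], gt: Sequence[float]) -> float:
    return bce_loss(pred, gt) + ssim_loss(pred, gt) + iou_loss(pred, gt)


gt_map = [0, 0, 1, 1]
good = hybrid_loss([0.1, 0.1, 0.9, 0.9], gt_map)
bad = hybrid_loss([0.9, 0.9, 0.1, 0.1], gt_map)
```

Each term penalizes a different kind of mistake: BCE treats pixels independently, SSIM rewards matching local structure, and IoU weighs the whole foreground region at once.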
Fully convolutional neural networks (FCNs) have shown their advantages in the salient object detection task. However, most existing FCN-based methods still suffer from coarse object boundaries. In this paper, to solve this problem, we focus on the complementarity between salient edge information and salient object information. Accordingly, we present an edge guidance network (EGNet) for salient object detection with three steps to simultaneously model these two kinds of complementary information in a single network. In the first step, we extract the salient object features in a progressive fusion manner. In the second step, we integrate the local edge information and global location information to obtain the salient edge features. Finally, to sufficiently leverage these complementary features, we couple the same salient edge features with salient object features at various resolutions. Benefiting from the rich edge information and location information in salient edge features, the fused features can help locate salient objects, especially their boundaries, more accurately. Experimental results demonstrate that the proposed method performs favorably against the state-of-the-art methods on six widely used datasets without any pre-processing and post-processing. The source code is available at http://mmcheng.net/egnet/.
Fully supervised salient object detection (SOD) has made considerable progress based on expensive and time-consuming data with pixel-wise annotations. Recently, to relieve the labeling burden while maintaining performance, some scribble-based SOD methods have been proposed. However, learning precise boundary details from scribble annotations that lack edge information is still difficult. In this paper, we propose to learn precise boundaries from our designed synthetic images and labels without introducing any extra auxiliary data. The synthetic image creates boundary information by inserting synthetic concave regions that simulate the real concave regions of salient objects. Furthermore, we propose a novel self-consistent framework that consists of a global integral branch (GIB) and a boundary-aware branch (BAB) to train a saliency detector. GIB takes the original image as input and aims to identify integral salient objects. BAB takes the synthetic image as input and aims to help predict accurate boundaries. These two branches are connected through a self-consistent loss that guides the saliency detector to predict precise boundaries while identifying salient objects. Experimental results on five benchmarks demonstrate that our method outperforms the state-of-the-art weakly supervised SOD methods and further narrows the gap with the fully supervised methods.
We propose a unified approach for bottom-up hierarchical image segmentation and object candidate generation for recognition, called Multiscale Combinatorial Grouping (MCG). For this purpose, we first develop a fast normalized cuts algorithm. We then propose a high-performance hierarchical segmenter that makes effective use of multiscale information. Finally, we propose a grouping strategy that combines our multiscale regions into highly-accurate object candidates by efficiently exploring their combinatorial space. We conduct extensive experiments on both the BSDS500 and the PASCAL 2012 segmentation datasets, showing that MCG produces state-of-the-art contours, hierarchical regions and object candidates.
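Detecting other road users at night has the potential to improve road safety. To this end, humans intuitively use visual cues, such as light cones and light reflections caused by other road users, to be able to react to oncoming traffic at an early stage. With computer vision methods, this behavior can be imitated by predicting the appearance of vehicles based on the light reflections caused by their headlights. Since current object detection algorithms are mainly based on directly visible objects annotated with bounding boxes, the detection and annotation of light reflections without sharp boundaries is challenging. For this reason, the extensive open-source dataset PVDN (Provident Vehicle Detection at Night) was published, which includes traffic scenarios at night with light reflections annotated by keypoints. In this paper, we explore the potential of saliency-based approaches to create different object representations based on the visual saliency and sparse keypoint annotations of the PVDN dataset. For that, we extend the general idea of Boolean map saliency towards a context-aware approach by taking into account sparse keypoint annotations by humans. We show that this approach allows for an automated derivation of different object representations, such as binary maps or bounding boxes, so that detection models can be trained on different annotation variants and the problem of providently detecting vehicles at night can be tackled from different perspectives. With that, we provide further powerful tools and methods to study the problem of detecting vehicles at night before they are actually visible.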
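The general Boolean map saliency idea that this work extends can be sketched for a single channel: threshold the image at several levels, keep the regions of each Boolean map (and of its complement) that are not connected to the image border, and average the resulting maps. This is a minimal pure-Python illustration with assumed thresholds and 4-connectivity; the context-aware extension using keypoint annotations is not shown.

```python
from collections import deque
from typing import List

Grid = List[List[float]]


def surrounded(bmap: List[List[bool]]) -> List[List[int]]:
    """1 where a True pixel is not 4-connected to the image border, else 0."""
    h, w = len(bmap), len(bmap[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed a flood fill from every True pixel on the border.
    for r in range(h):
        for c in range(w):
            if (r in (0, h - 1) or c in (0, w - 1)) and bmap[r][c]:
                reached[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and bmap[nr][nc] and not reached[nr][nc]:
                reached[nr][nc] = True
                queue.append((nr, nc))
    return [[1 if bmap[r][c] and not reached[r][c] else 0 for c in range(w)]
            for r in range(h)]


def boolean_map_saliency(img: Grid, thresholds: List[float]) -> Grid:
    """Average surroundedness over Boolean maps and their complements."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    count = 0
    for t in thresholds:
        bmap = [[v > t for v in row] for row in img]
        complement = [[not v for v in row] for row in bmap]
        for b in (bmap, complement):
            s = surrounded(b)
            for r in range(h):
                for c in range(w):
                    acc[r][c] += s[r][c]
            count += 1
    return [[v / count for v in row] for row in acc]


# A bright 3x3 blob in the middle of a dark 5x5 image.
image = [[1.0 if 1 <= r <= 3 and 1 <= c <= 3 else 0.0 for c in range(5)] for r in range(5)]
saliency = boolean_map_saliency(image, thresholds=[0.5])
```

The border-connectivity test is what makes the method favor compact, enclosed regions such as an isolated light reflection over large areas touching the image frame.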