The Jaccard index, also referred to as the intersection-over-union score, is commonly employed in the evaluation of image segmentation results given its perceptual qualities, scale invariance (which lends appropriate relevance to small objects), and appropriate counting of false negatives, in comparison to per-pixel losses. We present a method for direct optimization of the mean intersection-over-union loss in neural networks, in the context of semantic image segmentation, based on the convex Lovász extension of submodular losses. The loss is shown to perform better with respect to the Jaccard index measure than the traditionally used cross-entropy loss. We show quantitative and qualitative differences between optimizing the Jaccard index per image versus optimizing the Jaccard index taken over an entire dataset. We evaluate the impact of our method in a semantic segmentation pipeline and show substantially improved intersection-over-union segmentation scores on the Pascal VOC and Cityscapes datasets using state-of-the-art deep learning segmentation architectures.
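To make the Lovász extension concrete, here is a minimal, forward-only NumPy sketch of the binary Lovász hinge. Pixel errors are sorted in decreasing order, and each one is weighted by the discrete gradient of the Jaccard loss along that ordering. The function names are our own illustrative choices, and this is not the authors' reference implementation. A practical version would run inside an autodiff framework so that the loss can be backpropagated.

```python
import numpy as np

def lovasz_grad(gt_sorted):
    """Discrete gradient of the Jaccard loss along a given pixel ordering.

    gt_sorted: 0/1 ground-truth labels, ordered by decreasing error.
    """
    p = len(gt_sorted)
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1 - gt_sorted)
    jaccard = 1.0 - intersection / union       # Jaccard loss of each prefix
    if p > 1:
        # Successive differences: the weight assigned to each sorted error.
        jaccard[1:p] = jaccard[1:p] - jaccard[0:-1]
    return jaccard

def lovasz_hinge_flat(logits, labels):
    """Binary Lovász hinge for flattened real-valued logits and 0/1 labels."""
    signs = 2.0 * labels - 1.0                 # map {0, 1} -> {-1, +1}
    errors = 1.0 - logits * signs              # hinge errors (positive = margin violated)
    order = np.argsort(-errors)                # sort errors in decreasing order
    errors_sorted = errors[order]
    grad = lovasz_grad(labels[order])
    # Lovász extension: dot product of the clipped sorted errors and the gradient.
    return float(np.dot(np.maximum(errors_sorted, 0.0), grad))
```

When every margin is zero, the sketch returns a loss of 1, which matches the Jaccard loss of getting every pixel wrong. On error vectors with entries in {0, 1}, it reduces to the Jaccard loss of the set of mispredicted pixels.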
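The Jaccard index, also known as intersection-over-union (IoU), is one of the most critical evaluation metrics in semantic image segmentation. However, direct optimization of the IoU score is very difficult because the learning objective is neither differentiable nor decomposable. Although some algorithms have been proposed to optimize its surrogates, no guarantee of their generalization ability has been provided. In this paper, we propose a margin calibration method that can be used directly as a learning objective, improving the generalization of IoU over the data distribution, underpinned by a rigid lower bound. This scheme theoretically ensures better segmentation performance in terms of IoU score. We evaluate the effectiveness of the proposed margin calibration method on seven image datasets, showing substantial improvements in IoU score over other learning objectives when using deep segmentation models.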
The semantic image segmentation task presents a trade-off between test-time accuracy and training-time annotation cost. Detailed per-pixel annotations enable training accurate models but are very time-consuming to obtain; image-level class labels are an order of magnitude cheaper but result in less accurate models. We take a natural step from image-level annotation towards stronger supervision: we ask annotators to point to an object if one exists. We incorporate this point supervision along with a novel objectness potential in the training loss function of a CNN model. Experimental results on the PASCAL VOC 2012 benchmark reveal that the combined effect of point-level supervision and objectness potential yields an improvement of 12.9% mIOU over image-level supervision. Further, we demonstrate that models trained with point-level supervision are more accurate than models trained with image-level, squiggle-level or full supervision given a fixed annotation budget.
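The point-level part of such a loss can be sketched as a cross-entropy evaluated only at the annotated pixels. This NumPy sketch assumes a (C, H, W) score tensor and a hypothetical list of (row, col, class) clicks. It omits the image-level term, the objectness potential, and any per-point weighting the paper may apply.

```python
import numpy as np

def log_softmax(scores, axis=0):
    """Numerically stable log-softmax over the class axis."""
    m = scores.max(axis=axis, keepdims=True)
    shifted = scores - m
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def point_supervised_ce(scores, points):
    """Mean cross-entropy over annotated pixels only.

    scores: (C, H, W) unnormalized class scores.
    points: list of (row, col, class_id) clicks (hypothetical format).
    """
    if not points:
        return 0.0                      # no clicks: the point term contributes nothing
    log_probs = log_softmax(np.asarray(scores, dtype=float), axis=0)
    losses = [-log_probs[c, r, col] for r, col, c in points]
    return float(np.mean(losses))
```

All unannotated pixels are ignored by this term, which is why the abstract pairs it with an additional objectness potential.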
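This paper introduces a solid baseline for the class-incremental semantic segmentation (CISS) problem. While recent CISS algorithms use variants of the knowledge distillation (KD) technique to tackle the problem, they fail to fully address the critical challenges in CISS that cause catastrophic forgetting: the semantic drift of the background class and the multi-label prediction issue. To better address these challenges, we propose a new method, dubbed SSUL-M (Semantic Segmentation with Unknown Label with Memory), which carefully combines techniques tailored for semantic segmentation. Specifically, we claim three main contributions: (1) defining unknown classes within the background class to help learn future classes (helping plasticity), (2) freezing the backbone network and past classifiers, combined with a binary cross-entropy loss and pseudo-labeling, to overcome catastrophic forgetting (helping stability), and (3) utilizing a tiny exemplar memory for the first time in CISS to improve both plasticity and stability. Extensive experiments show that our method achieves significantly better performance than recent state-of-the-art baselines on the standard benchmark datasets. Furthermore, we justify our contributions with thorough ablation analyses and discuss how the nature of the CISS problem differs from that of traditional class-incremental learning for classification. The official code is available at https://github.com/clovaai/ssul.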
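Segmentation has become a fundamental field in computer vision and natural language processing. It assigns a label to every pixel/feature to extract regions of interest from an image/text. To evaluate segmentation performance, the Dice and IoU metrics are used to measure the degree of overlap between the ground truth and the predicted segmentation. In this paper, we establish a theoretical foundation of segmentation with respect to the Dice/IoU metrics, including the Bayes rule and Dice/IoU-calibration, analogous to classification-calibration or Fisher consistency in classification. We prove that the existing thresholding-based framework, with most operating losses, is not consistent with respect to the Dice/IoU metrics and thus may lead to a suboptimal solution. To address this pitfall, we propose a novel consistent ranking-based framework, namely RankDice/RankIoU, inspired by plug-in rules of the Bayes segmentation rule. Three numerical algorithms with GPU parallel execution are developed to implement the proposed framework in large-scale and high-dimensional segmentation. We study statistical properties of the proposed framework. We show that it is Dice-calibrated, and we also provide its excess risk bounds and rate of convergence. The numerical effectiveness of RankDice/mRankDice is demonstrated on various simulated examples and on the fine-annotated Cityscapes and Pascal VOC datasets with state-of-the-art deep learning architectures.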
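The contrast between thresholding and ranking can be illustrated with a small sketch. It ranks pixels by predicted probability and keeps the top k, choosing k to maximize an approximate expected Dice score. The approximation used here, 2 * (sum of the top-k probabilities) / (k + sum of all probabilities), is a simple ratio of expectations that we assume for illustration. It is not the exact RankDice algorithm, which the abstract says is computed with dedicated GPU-parallel numerical methods, and this sketch never predicts an empty mask.

```python
import numpy as np

def approx_rank_dice(probs, smooth=1e-8):
    """Top-k segmentation with k chosen by an approximate expected Dice.

    probs: per-pixel foreground probabilities (any shape).
    Returns (boolean mask of the same shape, approximate Dice of that mask).
    """
    p = np.asarray(probs, dtype=float).ravel()
    order = np.argsort(-p)                    # rank pixels by probability
    top_sum = np.cumsum(p[order])             # sum of the top-k probabilities
    k = np.arange(1, p.size + 1)
    # Ratio-of-expectations approximation of E[Dice] when predicting the top k.
    approx = 2.0 * top_sum / (k + p.sum() + smooth)
    best_k = int(np.argmax(approx)) + 1
    mask = np.zeros(p.size, dtype=bool)
    mask[order[:best_k]] = True
    return mask.reshape(np.shape(probs)), float(approx[best_k - 1])
```

Consider probs = [0.9, 0.45, 0.45, 0.45, 0.1]. A 0.5 threshold keeps only the first pixel (approximate Dice about 0.54). The ranking rule keeps four pixels (about 0.71), because each 0.45 pixel raises the approximate Dice.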
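Architectural advancements in deep neural networks have led to huge leaps across a spectrum of computer vision tasks. Instead of relying on human expertise, neural architecture search (NAS) has emerged as a promising avenue toward automating architecture design. While recent achievements in image classification suggest opportunities, the promise of NAS has yet to be thoroughly assessed on the more challenging task of semantic segmentation. The main challenges of applying NAS to semantic segmentation arise from two aspects: (i) the high-resolution images to be processed; and (ii) the additional requirement of real-time inference speed (i.e., real-time semantic segmentation) for applications such as autonomous driving. To meet these challenges, we propose a surrogate-assisted multi-objective method in this paper. Through a series of customized prediction models, our method effectively transforms the original NAS task into an ordinary multi-objective optimization problem. Followed by a hierarchical pre-screening criterion for infill selection, our method progressively achieves a set of efficient architectures trading off segmentation accuracy against inference speed. Empirical evaluations on three benchmark datasets, together with an application using the Huawei Atlas 200 DK, suggest that our method can identify architectures that significantly outperform existing state-of-the-art architectures, both those designed manually by human experts and those designed automatically by other NAS methods.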
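The final output of such a search is a set of architectures that trade off accuracy against speed, i.e. a Pareto front. The sketch below shows only that non-dominance filter, applied to hypothetical (name, accuracy, latency) tuples. The surrogate prediction models and the hierarchical pre-screening criterion are not reproduced.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, float]  # (name, accuracy, latency in ms); hypothetical format

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    _, acc_a, lat_a = a
    _, acc_b, lat_b = b
    no_worse = acc_a >= acc_b and lat_a <= lat_b
    strictly_better = acc_a > acc_b or lat_a < lat_b
    return no_worse and strictly_better

def pareto_front(candidates: List[Candidate]) -> List[str]:
    """Names of candidates not dominated by any other candidate, in input order."""
    # A candidate never dominates itself, so no self-exclusion is needed.
    return [c[0] for c in candidates if not any(dominates(o, c) for o in candidates)]
```

For the made-up candidates A (0.70, 10 ms), B (0.75, 20 ms), C (0.72, 25 ms) and D (0.68, 12 ms), the front is A and B. C is dominated by B, and D is dominated by A.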
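Semantic segmentation has been widely investigated in the community, where state-of-the-art techniques are based on supervised models. These models have reported unprecedented performance at the cost of requiring a large set of high-quality segmentation masks. Obtaining such annotations is highly expensive and time-consuming, particularly in semantic segmentation, where pixel-level annotations are required. In this work, we address this problem by proposing a holistic solution framed as a three-stage self-training framework for semi-supervised semantic segmentation. The key idea of our technique is to extract pseudo-mask statistical information to decrease the uncertainty in the predicted probability, while enforcing segmentation consistency in a multi-task fashion. We achieve this through a three-stage solution. Firstly, we train a segmentation network to produce rough pseudo-masks whose predicted probability is highly uncertain. Secondly, we decrease the uncertainty of the pseudo-masks using a multi-task model that enforces consistency while exploiting the rich statistical information of the data. We compare our approach with existing methods for semi-supervised semantic segmentation and demonstrate its state-of-the-art performance in extensive experiments.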
In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
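Atrous convolution inserts rate - 1 gaps between filter taps. A filter with k taps therefore covers rate * (k - 1) + 1 input positions while keeping only k parameters. The following is a minimal 1-D NumPy sketch that computes only valid positions, as a correlation (the convention most deep learning libraries use). It is an illustration, not DeepLab's implementation.

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """'Valid' 1-D atrous (dilated) correlation: y[i] = sum_j x[i + rate*j] * w[j]."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    k = len(w)
    span = rate * (k - 1) + 1                 # effective field of view of the filter
    out_len = len(x) - span + 1
    if out_len <= 0:
        return np.zeros(0)
    # Column j holds x[i + rate*j] for every output position i.
    taps = np.stack([x[j * rate : j * rate + out_len] for j in range(k)], axis=1)
    return taps @ w                           # same k weights regardless of rate
```

With the same three weights, rate=1 covers 3 inputs and rate=2 covers 5. ASPP applies several such rates in parallel to one feature map.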
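We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs). The algorithm optimizes a nonconvex continuous relaxation of the CRF inference problem using vanilla Frank-Wolfe with approximate updates, which are equivalent to minimizing a regularized energy function. Our proposed method is a generalization of existing algorithms such as mean field or the concave-convex procedure. This perspective not only provides a unified analysis of these algorithms, but also offers an easy way to explore different variants that may yield better performance. We illustrate this with empirical results on standard semantic segmentation datasets, where several instantiations of our regularized Frank-Wolfe outperform mean field inference, both as a standalone component and as an end-to-end trainable layer in a neural network. We also show that dense CRFs, coupled with our new algorithms, produce significant improvements over strong CNN baselines.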
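For intuition, here is a toy NumPy sketch of vanilla Frank-Wolfe on a relaxed Potts energy over per-pixel label simplices. An explicit dense pairwise matrix K (symmetric, zero diagonal) stands in for the Gaussian kernels and fast filtering that a real dense CRF would use. The `entropic` option swaps the linear oracle for a softmax of the negative gradient with a unit step, which for this energy gives the familiar parallel mean-field update. The energy form, the function names and this option are our assumptions for illustration, not the paper's code.

```python
import numpy as np

def energy(X, U, K):
    """Relaxed Potts energy: <U, X> + 0.5 * sum_ij K_ij * (1 - <X_i, X_j>)."""
    return float(np.sum(U * X) + 0.5 * np.sum(K * (1.0 - X @ X.T)))

def frank_wolfe(U, K, iters=50, entropic=False):
    """U: (n, L) unary costs; K: (n, n) symmetric pairwise weights, zero diagonal."""
    n, L = U.shape
    X = np.full((n, L), 1.0 / L)              # start at the center of each simplex
    for t in range(iters):
        grad = U - K @ X                      # dE/dX for the energy above
        if entropic:
            # Entropy-regularized oracle: softmax(-grad), taken with a unit step.
            z = -grad - (-grad).max(axis=1, keepdims=True)
            S = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
            step = 1.0
        else:
            # Linear minimization oracle: per-pixel one-hot vertex at argmin grad.
            S = np.eye(L)[grad.argmin(axis=1)]
            step = 2.0 / (t + 2.0)            # classic diminishing step size
        X = X + step * (S - X)                # convex combination stays feasible
    return X
```

Every iterate is a convex combination of simplex points, so no projection step is needed. This is the main appeal of Frank-Wolfe for this relaxation.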
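While fine-tuning pre-trained networks has become a popular way to train image segmentation models, the backbone networks for image segmentation are frequently pre-trained on image classification source datasets, e.g., ImageNet. Although image classification datasets can provide backbone networks with rich visual features and discriminative ability, they cannot fully pre-train the target model (i.e., backbone plus segmentation modules) in an end-to-end manner. Because classification datasets lack segmentation labels, the segmentation modules are left randomly initialized during fine-tuning. In our work, we propose a method that leverages Pseudo Semantic Segmentation Labels (PSSL) to enable end-to-end pre-training of image segmentation models based on classification datasets. PSSL was inspired by the observation that the explanation results of classification models, obtained through explanation algorithms such as CAM, SmoothGrad and LIME, are close to the pixel clusters of visual objects. Specifically, PSSL is obtained for each image by interpreting the classification results and aggregating an ensemble of explanations queried from multiple classifiers, which lowers the bias caused by any single model. With PSSL for every image of ImageNet, the proposed method uses a weighted segmentation learning procedure to pre-train the segmentation network. Experimental results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models, i.e., PSPNet-ResNet50, DeepLabV3-ResNet50, and OCRNet-HRNetW18, on a number of segmentation tasks, such as CamVid, VOC-A, VOC-C, ADE20K, and Cityscapes, with significant improvements. The source code is available at https://github.com/paddlepaddle/paddleseg.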
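The ensemble idea behind PSSL can be sketched generically. Each classifier's explanation map is min-max normalized, the maps are averaged, and the average is thresholded into a pseudo label for the image's class. The normalization, the 0.5 threshold, the background id 0 and the returned soft map (a possible per-pixel weight) are all assumptions of this sketch. The abstract does not specify PSSL's exact aggregation or weighting.

```python
import numpy as np

def ensemble_pseudo_label(saliency_maps, class_id, threshold=0.5, background=0):
    """saliency_maps: list of (H, W) explanation maps from different classifiers.

    Returns (hard pseudo-label map, averaged soft map).
    """
    normed = []
    for s in saliency_maps:
        s = np.asarray(s, dtype=float)
        spread = s.max() - s.min()
        # Min-max normalize so maps from different explainers are comparable.
        normed.append((s - s.min()) / spread if spread > 0 else np.zeros_like(s))
    mean = np.mean(normed, axis=0)            # averaging reduces single-model bias
    label = np.where(mean >= threshold, class_id, background)
    return label, mean
```

Averaging means a pixel is labeled as the object only when the explanations broadly agree. A single classifier's idiosyncratic hot spot is diluted.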
Deep Convolutional Neural Networks (DCNNs) have recently shown state-of-the-art performance in high-level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models to address the task of pixel-level classification (also called "semantic image segmentation"). We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high-level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Qualitatively, our "DeepLab" system is able to localize segment boundaries at a level of accuracy beyond previous methods. Quantitatively, our method sets the new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 71.6% IOU accuracy on the test set. We show how these results can be obtained efficiently: careful network re-purposing and a novel application of the 'hole' algorithm from the wavelet community allow dense computation of neural net responses at 8 frames per second on a modern GPU.
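Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions using only image-level labels for training. To this end, previous methods adopt a common pipeline: they generate pseudo masks from class activation maps (CAMs) and use these masks to supervise segmentation networks. However, it is challenging to derive comprehensive pseudo masks that cover the full extent of objects, owing to the local property of CAMs: they tend to focus solely on small discriminative object parts. In this paper, we associate the locality of CAMs with the texture-biased property of convolutional neural networks (CNNs). Accordingly, we propose to exploit shape information to supplement the texture-biased CNN features, thereby encouraging mask predictions to be not only comprehensive but also well aligned with object boundaries. We further refine the predictions in an online fashion with a novel refinement method that takes into account both class and color affinities, in order to generate reliable pseudo masks to supervise the model. Importantly, our model is trained end-to-end within a single-stage framework and is therefore efficient in terms of training cost. Through extensive experiments on PASCAL VOC 2012, we validate the effectiveness of our method in producing precise and shape-aligned segmentation results. Specifically, our model surpasses the existing state-of-the-art single-stage approaches. In addition, when adopted in a simple two-stage pipeline without bells and whistles, it also achieves new state-of-the-art performance over multi-stage approaches.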
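Although existing semantic segmentation approaches achieve impressive results, they still struggle to update their models incrementally as new categories are discovered. Furthermore, pixel-by-pixel annotations are expensive and time-consuming. This paper proposes a novel framework for weakly incremental learning for semantic segmentation, which aims to learn new classes from cheap and widely available image-level labels. In contrast to existing approaches, which need to generate pseudo-labels offline, we use an auxiliary classifier, trained with image-level labels and regularized by the segmentation model, to obtain pseudo-supervision online and update the model incrementally. We cope with the inherent noise in this process by using soft labels generated by the auxiliary classifier. We demonstrate the effectiveness of our approach on the Pascal VOC and COCO datasets, where it outperforms offline weakly supervised methods and obtains results comparable to incremental learning methods with full supervision.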
Class-Incremental Learning is a challenging problem in machine learning that aims to extend previously trained neural networks with new classes. This is especially useful if the system is able to classify new objects even though the original training data is unavailable. While the semantic segmentation problem has received less attention than classification, it poses distinct problems and challenges, since previous and future target classes can be unlabeled in the images of a single increment. In this case, the background, past and future classes are correlated, and there is a background shift. In this paper, we address the problem of how to model unlabeled classes while avoiding spurious feature clustering of future uncorrelated classes. We propose to use Evidential Deep Learning to model the evidence of the classes as a Dirichlet distribution. Our method factorizes the problem into a separate foreground class probability, calculated from the expected value of the Dirichlet distribution, and an unknown class (background) probability corresponding to the uncertainty of the estimate. In our novel formulation, the background probability is implicitly modeled, avoiding the feature space clustering that comes from forcing the model to output a high background score for pixels that are not labeled as objects. Experiments on the incremental Pascal VOC and ADE20k benchmarks show that our method is superior to the state of the art, especially when repeatedly learning new classes with an increasing number of increments.
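The factorization into an expected class probability and an uncertainty can be sketched with the standard evidential deep learning quantities. With non-negative evidence e_k, the Dirichlet parameters are alpha_k = e_k + 1, the strength is S = sum_k alpha_k, the expected probabilities are alpha_k / S, and the uncertainty is K / S. Using softplus to produce the evidence is an assumption of this sketch, and how the paper combines these quantities into final per-pixel scores is not specified here.

```python
import numpy as np

def dirichlet_fg_bg(logits):
    """Per-pixel Dirichlet outputs for K foreground classes.

    logits: array of shape (..., K). Returns (expected class probabilities,
    uncertainty); the uncertainty plays the role of the unknown/background score.
    """
    logits = np.asarray(logits, dtype=float)
    evidence = np.logaddexp(0.0, logits)              # softplus: non-negative evidence (assumed)
    alpha = evidence + 1.0                            # Dirichlet concentration parameters
    strength = alpha.sum(axis=-1, keepdims=True)      # S = sum_k alpha_k
    expected = alpha / strength                       # E[p_k] = alpha_k / S
    uncertainty = logits.shape[-1] / strength[..., 0] # u = K / S, in (0, 1]
    return expected, uncertainty
```

A pixel with no evidence for any foreground class gets u close to 1, so it reads as unknown/background. The model never has to be pushed to output a high explicit background score.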
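Deep neural networks are prone to learning biased models with entangled feature representations, which may lead to subpar performance on various downstream tasks. This is particularly true for under-represented classes, where a lack of diversity in the data exacerbates the tendency. This limitation has been addressed mostly in classification tasks, but there has been little study of the additional challenges that may appear in more complex dense prediction problems, including semantic segmentation. To this end, we propose a model-agnostic and stochastic training scheme for semantic segmentation that facilitates the learning of debiased and disentangled representations. For each class, we first extract class-specific information from the highly entangled feature map. Then, information related to a randomly sampled class is suppressed by a feature selection process in the feature space. By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes, and the model is able to learn more debiased and disentangled feature representations. Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks, with especially notable performance gains on under-represented classes.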
Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks. Such pixel-accurate supervision demands expensive labeling effort and limits the performance of deep networks that usually benefit from more training data. In this paper, we propose a method that achieves competitive accuracy but only requires easily obtained bounding box annotations. The basic idea is to iterate between automatically generating region proposals and training convolutional networks. These two steps gradually recover segmentation masks for improving the networks, and vice versa. Our method, called "BoxSup", produces competitive results (e.g., 62.0% mAP for validation) supervised by boxes only, on par with strong baselines (e.g., 63.8% mAP) fully supervised by masks under the same setting. By leveraging a large number of bounding boxes, BoxSup further unleashes the power of deep convolutional networks and yields state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT [24].
The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. Since 2012, there have been significant advances in the representation capabilities of models and the computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen if we increase the dataset size by 10× or 100×? This paper takes a step towards clearing the clouds of mystery surrounding the relationship between 'enormous data' and visual deep learning. By exploiting the JFT-300M dataset, which has more than 375M noisy labels for 300M images, we investigate how the performance of current vision tasks would change if this data was used for representation learning. Our paper delivers some surprising (and some expected) findings. First, we find that performance on vision tasks increases logarithmically with the volume of training data. Second, we show that representation learning (or pretraining) still holds a lot of promise. One can improve performance on many vision tasks just by training a better base model. Finally, as expected, we present new state-of-the-art results for different vision tasks, including image classification, object detection, semantic segmentation and human pose estimation. Our sincere hope is that this inspires the vision community not to undervalue data and to develop collective efforts in building larger datasets.
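Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner, using exactly the same model, loss, and training procedure. Following this observation, we propose a simple mask classification model that predicts a set of binary masks, each associated with a single global class label prediction. Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results. In particular, we observe that MaskFormer outperforms per-pixel classification baselines when the number of classes is large. Our mask classification-based method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.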
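To see how a set of (class, mask) pairs yields a per-pixel semantic map, here is a NumPy sketch of one common inference rule. Each query's class probabilities (with a trailing "no object" class dropped) are multiplied by its sigmoid mask, and the results are summed over queries before taking the per-pixel argmax. The array shapes and the function name are our assumptions. This is a sketch of the idea, not MaskFormer's released code.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def semantic_inference(class_logits, mask_logits):
    """class_logits: (Q, C+1) with a trailing 'no object' class; mask_logits: (Q, H, W)."""
    class_logits = np.asarray(class_logits, dtype=float)
    mask_logits = np.asarray(mask_logits, dtype=float)
    probs = softmax(class_logits)[:, :-1]           # drop the 'no object' column
    masks = 1.0 / (1.0 + np.exp(-mask_logits))      # per-query sigmoid masks
    # Score of class c at each pixel: sum over queries of p(c | q) * mask_q(pixel).
    scores = np.einsum("qc,qhw->chw", probs, masks)
    return scores.argmax(axis=0)                    # (H, W) semantic label map
```

The per-pixel classification view is recovered from the mask view here. Because of this, the same trained model can serve semantic segmentation without a separate per-pixel head.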
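Classification networks have been used in weakly supervised semantic segmentation (WSSS) to segment objects by means of class activation maps (CAMs). However, without pixel-level annotations, they are known to (1) mainly focus on discriminative regions, and (2) produce diffuse CAMs without well-defined prediction contours. In this work, we alleviate both problems by improving CAM learning. First, we incorporate importance sampling based on the class-wise probability mass function induced by the CAMs to produce stochastic image-level class predictions. As shown in our empirical studies, this leads to segmentations that cover a larger extent of the objects. Second, we formulate a feature similarity loss term, which further improves the alignment of predicted contours with edges in the image. Furthermore, we shed new light on the problem of WSSS by measuring the contour F-score as a complement to the common area mIoU metric. We show that our methods significantly outperform previous methods in terms of contour quality, while matching state-of-the-art approaches on region similarity.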
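The importance-sampling step can be sketched as follows. For each class, the CAM is turned into a probability mass function over pixels. Here we assume a spatial softmax, since the abstract does not say how the PMF is formed. One pixel is drawn from that PMF, and the CAM value at the drawn pixel serves as the stochastic image-level logit for that class. The function name is hypothetical.

```python
import numpy as np

def sampled_class_logits(cams, rng):
    """Stochastic image-level logits from CAMs via per-class pixel sampling.

    cams: (C, H, W) class activation maps; rng: a numpy Generator.
    For each class, draw one pixel from a spatial softmax over that class's
    CAM (assumed PMF) and return the CAM value at the drawn pixel.
    """
    cams = np.asarray(cams, dtype=float)
    C = cams.shape[0]
    flat = cams.reshape(C, -1)
    z = flat - flat.max(axis=1, keepdims=True)              # stabilize exp
    pmf = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # one PMF per class
    idx = np.array([rng.choice(flat.shape[1], p=pmf[c]) for c in range(C)])
    return flat[np.arange(C), idx]
```

Global average pooling always aggregates the whole map. Sampling instead lets lower-scoring object pixels sometimes serve as the class evidence, which encourages the CAM to cover more of the object. In an autodiff framework, gradients would flow only through the sampled values.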
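Weakly supervised image segmentation trained with image-level labels usually suffers from inaccurate coverage of object areas when generating the pseudo ground truth. This is because the object activation maps are trained with the classification objective and lack the ability to generalize. To improve the generality of the object activation maps, we propose a region prototypical network, RPNet, to explore the cross-image object diversity of the training set. Similar object parts across images are identified via region feature comparison. Object confidence is propagated between regions to discover new object areas while suppressing background regions. Experiments show that the proposed method generates more complete and accurate pseudo object masks, while achieving state-of-the-art performance on PASCAL VOC 2012 and MS COCO. In addition, we investigate the robustness of the proposed method on reduced training sets.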