Accurate polyp segmentation is of great importance for colorectal cancer diagnosis and treatment. However, due to the high cost of producing accurate mask annotations, existing polyp segmentation methods suffer from severe data shortage and impaired model generalization. Conversely, coarse polyp bounding box annotations are much more accessible. Thus, in this paper, we propose a boosted BoxPolyp model that makes full use of both accurate mask and extra coarse box annotations. In practice, box annotations are applied to alleviate the over-fitting issue of previous polyp segmentation models, and fine-grained polyp areas are generated through an iteratively boosted segmentation model. To achieve this goal, a fusion filter sampling (FFS) module is first proposed to generate pixel-wise pseudo labels from box annotations with less noise, leading to significant performance improvements. Besides, considering the appearance consistency of the same polyp, an image consistency (IC) loss is designed. Such an IC loss explicitly narrows the distance between features extracted by two different networks, which improves the robustness of the model. Note that our BoxPolyp is a plug-and-play model that can be merged into any appealing backbone. Quantitative and qualitative experimental results on five challenging benchmarks confirm that our proposed model outperforms previous state-of-the-art methods by a large margin.
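As a rough sketch of how an appearance-level consistency term of this kind can be implemented (the exact IC loss in BoxPolyp may differ; the names below are placeholders, not the authors' code), one can penalize the distance between channel-normalized feature maps that two different networks extract from the same image:

```python
import torch
import torch.nn.functional as F

def image_consistency_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Hypothetical IC loss: pull together features of the same image
    extracted by two different networks (shapes: B x C x H x W)."""
    # Normalize along the channel dimension so the loss measures
    # directional agreement rather than raw feature magnitude.
    feat_a = F.normalize(feat_a, dim=1)
    feat_b = F.normalize(feat_b, dim=1)
    return F.mse_loss(feat_a, feat_b)

# Toy usage: features from two backbones for the same batch.
fa, fb = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
loss = image_consistency_loss(fa, fb)
```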
In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are employed to evolve the level-set curves implicitly, and a local consistency module based on a pixel affinity kernel is used to mine the local context and spatial relations. Two types of single-stage frameworks, i.e., CNN-based and transformer-based frameworks, are developed to empower the level-set evolution for box-supervised instance segmentation, and each framework consists of three essential components: instance-aware decoder, box-level matching assignment and level-set evolution. By minimizing the level-set energy function, the mask map of each instance can be iteratively optimized within its bounding box annotation. The experimental results on five challenging testbeds, covering general scenes, remote sensing, medical and scene text images, demonstrate the outstanding performance of our proposed Box2Mask approach for box-supervised instance segmentation. In particular, with the Swin-Transformer large backbone, our Box2Mask obtains 42.4% mask AP on COCO, which is on par with the recently developed fully mask-supervised methods. The code is available at: https://github.com/LiWentomng/boxlevelset.
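For reference, the classical Chan-Vese level-set energy that such box-supervised evolution builds on has the standard form (the actual Box2Mask objective augments it with deep features and further terms):

```latex
E(\phi, c_1, c_2) = \mu \int_{\Omega} \lvert \nabla H(\phi) \rvert \, dx
  + \lambda_1 \int_{\Omega} \lvert I(x) - c_1 \rvert^2 \, H(\phi) \, dx
  + \lambda_2 \int_{\Omega} \lvert I(x) - c_2 \rvert^2 \, \bigl(1 - H(\phi)\bigr) \, dx
```

where $H$ is the Heaviside function, $\phi$ the level-set function, $I$ the image, and $c_1$, $c_2$ the mean intensities inside and outside the evolving curve; minimizing $E(\phi, c_1, c_2)$ drives the curve toward region boundaries.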
Open-World Instance Segmentation (OWIS) aims to segment class-agnostic instances from images, which has a wide range of real-world applications such as autonomous driving. Most existing approaches follow a two-stage pipeline: class-agnostic detection is performed first, followed by class-specific mask segmentation. In contrast, this paper proposes a single-stage framework that directly produces a mask for each instance. In addition, instance mask annotations in existing datasets can be noisy. To overcome this issue, we introduce a new regularization loss. Specifically, we first train an extra branch to perform the auxiliary task of predicting foreground regions (i.e., regions belonging to any object instance), and then encourage the prediction of the auxiliary branch to be consistent with the predicted instance masks. The key insight is that such a cross-task consistency loss acts as an error-correcting mechanism that combats errors in the annotations. Moreover, we find that the proposed cross-task consistency loss can be applied to images without any annotation, lending itself to a semi-supervised learning setting. Through extensive experiments, we demonstrate that the proposed method achieves impressive results in both the fully supervised and semi-supervised settings. Compared with SOTA methods, the proposed method improves the $AP_{100}$ score by 4.75% and 4.05% in the two $\rightarrow$UVO evaluation settings. In the semi-supervised setting, our model trained with only 30% labeled data even surpasses its fully supervised counterpart trained with 50% labeled data. The code will be released soon.
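A minimal sketch of what such a cross-task consistency loss could look like, assuming the soft union of per-instance masks is compared against the auxiliary foreground prediction (the aggregation and names are assumptions, not the paper's code):

```python
import torch

def cross_task_consistency_loss(instance_probs: torch.Tensor,
                                fg_prob: torch.Tensor) -> torch.Tensor:
    """instance_probs: N x H x W per-instance mask probabilities.
    fg_prob: H x W foreground probability from the auxiliary branch."""
    # Soft union of all instance masks: P(any instance covers the pixel).
    union = 1.0 - torch.prod(1.0 - instance_probs, dim=0)
    # Encourage the two tasks to agree pixel-wise.
    return torch.mean((union - fg_prob) ** 2)

inst = torch.rand(5, 64, 64)   # toy predictions for 5 instances
fg = torch.rand(64, 64)        # toy auxiliary foreground prediction
loss = cross_task_consistency_loss(inst, fg)
```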
Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such automated methods requires a large amount of data with precisely annotated masks, which is hard to obtain. Training with weakly labeled data is a popular solution for reducing the annotation workload. In this paper, we propose a novel meta-learning-based nuclei segmentation method that follows the label-correction paradigm to leverage data with noisy masks. Specifically, we design a fully convolutional meta-model that can correct noisy masks using a small amount of clean meta-data. The corrected masks can then be used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model in an end-to-end manner. Extensive experimental results on two nuclei segmentation datasets show that our method achieves state-of-the-art results. In some noisy settings it even achieves performance comparable to training the model on supervised data.
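A compact sketch of one such alternating bi-level update, using toy one-layer stand-ins for the two models and PyTorch's `torch.func.functional_call` (the paper's actual architectures, losses, and training schedule differ; this only illustrates the gradient flow):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

# Toy one-layer stand-ins; the actual models are fully convolutional nets.
seg_model = nn.Conv2d(1, 1, 3, padding=1)    # main segmentation model
meta_model = nn.Conv2d(2, 1, 3, padding=1)   # label-correction meta-model
inner_lr = 0.1

img = torch.rand(2, 1, 32, 32)
noisy_mask = (torch.rand(2, 1, 32, 32) > 0.5).float()
clean_img = torch.rand(2, 1, 32, 32)                   # small clean meta-set
clean_mask = (torch.rand(2, 1, 32, 32) > 0.5).float()

# 1) The meta-model corrects the noisy mask, conditioned on image + mask.
corrected = torch.sigmoid(meta_model(torch.cat([img, noisy_mask], dim=1)))

# 2) Virtual inner step: fit the segmentation model to the corrected mask,
#    keeping the graph so the update stays differentiable w.r.t. meta_model.
inner_loss = F.mse_loss(torch.sigmoid(seg_model(img)), corrected)
grads = torch.autograd.grad(inner_loss, tuple(seg_model.parameters()),
                            create_graph=True)
virtual = {name: p - inner_lr * g
           for (name, p), g in zip(seg_model.named_parameters(), grads)}

# 3) Outer step: evaluate the virtually updated model on clean meta-data;
#    gradients flow back through `virtual` into the meta-model's parameters.
meta_out = functional_call(seg_model, virtual, (clean_img,))
meta_loss = F.binary_cross_entropy_with_logits(meta_out, clean_mask)
meta_loss.backward()    # now step a meta-optimizer, then update seg_model
```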
The success of convolutional neural networks (CNNs) in 3D medical image segmentation relies on a large number of fully annotated 3D volumes for training, which are time-consuming and labor-intensive to acquire. In this paper, we propose to annotate a segmentation target with only seven points in 3D medical images, and design a two-stage weakly supervised learning framework, PA-Seg. In the first stage, we employ the geodesic distance transform to expand the seed points so as to provide more supervision signals. To further deal with unannotated image regions during training, we propose two contextual regularization strategies, i.e., a multi-view Conditional Random Field (mCRF) loss and a Variance Minimization (VM) loss, where the first encourages pixels with similar features to have consistent labels, and the second minimizes the intensity variance of the segmented foreground and background, respectively. In the second stage, we use the predictions obtained by the model pre-trained in the first stage as pseudo labels. To overcome noise in the pseudo labels, we introduce a Self and Cross Monitoring (SCM) strategy, which combines self-training with Cross Knowledge Distillation (CKD) between a primary model and an auxiliary model that learn from the soft labels generated by each other. Experiments on public datasets for Vestibular Schwannoma (VS) segmentation and Brain Tumor Segmentation (BraTS) show that our model trained in the first stage outperforms existing state-of-the-art weakly supervised methods, and that after additional training with SCM, the model can achieve performance competitive with its fully supervised counterpart on the BraTS dataset.
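The VM loss is straightforward to express; below is a hedged sketch, assuming soft foreground probabilities are used as region weights (the function name and exact weighting are assumptions):

```python
import torch

def variance_minimization_loss(image: torch.Tensor,
                               fg_prob: torch.Tensor,
                               eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical VM loss: minimize the intensity variance of the soft
    foreground and background regions. image, fg_prob: same shape."""
    bg_prob = 1.0 - fg_prob
    mean_fg = (image * fg_prob).sum() / (fg_prob.sum() + eps)
    mean_bg = (image * bg_prob).sum() / (bg_prob.sum() + eps)
    var_fg = (fg_prob * (image - mean_fg) ** 2).sum() / (fg_prob.sum() + eps)
    var_bg = (bg_prob * (image - mean_bg) ** 2).sum() / (bg_prob.sum() + eps)
    return var_fg + var_bg

vol = torch.rand(1, 48, 48, 48)    # toy 3D volume
prob = torch.rand(1, 48, 48, 48)   # predicted foreground probability
loss = variance_minimization_loss(vol, prob)
```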
Deep learning-based methods for retinal lesion segmentation usually require a large amount of precisely annotated, pixel-level data. However, coarse annotations such as circles or ellipses outlining lesion areas can be six times more efficient to produce than pixel-level annotations. Therefore, this paper proposes an annotation refinement network that converts coarse annotations into pixel-level segmentation masks. Our main novelty is the application of the prototype learning paradigm to enhance generalization across different datasets and lesion types. We also introduce a prototype weighing module to handle the challenging case of overly small lesions. The proposed method was trained on the publicly available IDRiD dataset and then generalized to the public DDR dataset as well as our real-world private dataset. Experiments show that our approach substantially improves the initial coarse masks and outperforms the non-prototypical baseline by a large margin. Furthermore, we demonstrate the usefulness of the prototype weighing module in both cross-dataset and cross-class settings.
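A minimal sketch of the prototype-matching idea, assuming pixel embeddings are softly assigned to learned class prototypes by cosine similarity (the paper's prototype weighing module adds further machinery; names and the temperature are assumptions):

```python
import torch
import torch.nn.functional as F

def prototype_segmentation(feats: torch.Tensor,
                           prototypes: torch.Tensor,
                           tau: float = 0.1) -> torch.Tensor:
    """feats: C x H x W pixel embeddings; prototypes: K x C class prototypes.
    Returns K x H x W soft assignments via cosine similarity."""
    feats = F.normalize(feats, dim=0)
    prototypes = F.normalize(prototypes, dim=1)
    sim = torch.einsum('kc,chw->khw', prototypes, feats)
    return torch.softmax(sim / tau, dim=0)

f = torch.randn(64, 32, 32)
protos = torch.randn(3, 64)   # e.g., background and two lesion prototypes
masks = prototype_segmentation(f, protos)
```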
In contrast to fully supervised methods that use pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted much research attention. In this paper, we propose a novel single-shot box-supervised instance segmentation approach that elegantly integrates the classical level-set model with deep neural network learning. Specifically, our proposed method iteratively learns a series of level sets through a Chan-Vese-based continuous energy functional in an end-to-end fashion. A simple mask-supervised SOLOv2 model is adapted to predict the instance-aware mask map as the level set for each instance. Both the input image and its deep features are employed as the input data to evolve the level-set curves, where a box projection function is used to obtain the initial boundary. By minimizing a fully differentiable energy function, the level set of each instance is iteratively optimized within its corresponding bounding box annotation. Experimental results on four challenging benchmarks demonstrate the leading performance of our proposed robust instance segmentation approach in various scenarios. The code is available at: https://github.com/liwentomng/boxlevelset.
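A hedged sketch of the Chan-Vese region term used as a training loss, treating the predicted mask probability as a smooth Heaviside of the level set (the full energy also includes a length-regularization term and operates on deep features as well as the raw image; names are placeholders):

```python
import torch

def chan_vese_region_loss(image: torch.Tensor, phi: torch.Tensor,
                          eps: float = 1e-6) -> torch.Tensor:
    """image: H x W crop inside a bounding box; phi: H x W mask
    probabilities acting as a smooth Heaviside of the level set."""
    c1 = (image * phi).sum() / (phi.sum() + eps)                 # mean inside
    c2 = (image * (1 - phi)).sum() / ((1 - phi).sum() + eps)     # mean outside
    region = phi * (image - c1) ** 2 + (1 - phi) * (image - c2) ** 2
    return region.mean()

crop = torch.rand(64, 64)
phi = torch.rand(64, 64, requires_grad=True)   # stands in for a network output
chan_vese_region_loss(crop, phi).backward()    # drives one evolution step
```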
Trichomoniasis is a common infectious disease caused by the parasite Trichomonas vaginalis, and it increases the risk of HIV infection in humans if left untreated. Automated detection of Trichomonas vaginalis from microscopic images can provide vital information for the diagnosis of trichomoniasis. However, accurate Trichomonas vaginalis segmentation (TVS) is a challenging task due to the high appearance similarity between trichomonads and other cells (e.g., leukocytes), the large appearance variation caused by their motility, and, most importantly, the lack of large-scale annotated data for deep model training. To address these challenges, we elaborately collect the first large-scale microscopic image dataset of Trichomonas vaginalis, named TVMI3K, which consists of 3,158 images covering trichomonads in various appearances and backgrounds, with high-quality annotations including object-level mask labels, object boundaries, and challenging attributes. Besides, we propose a simple yet effective baseline, named TVNet, to automatically segment trichomonads from microscopic images, featuring high-resolution fusion and foreground-background attention modules. Extensive experiments demonstrate that our model achieves superior segmentation performance and outperforms various cutting-edge object detection models both quantitatively and qualitatively, making it a promising framework for promoting future research on TVS tasks. The dataset and results will be publicly available at: https://github.com/cellrecog/cellrecog.
Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions using only image-level labels for training. To this end, previous methods adopt a common pipeline: they generate pseudo masks from class activation maps (CAMs) and use such masks to supervise a segmentation network. However, it is challenging to derive comprehensive pseudo masks that cover the whole extent of objects, due to the local property of CAMs, i.e., they tend to focus solely on small discriminative object parts. In this paper, we associate the locality of CAMs with the texture-bias property of convolutional neural networks (CNNs). Accordingly, we propose to exploit shape information to supplement the texture-biased CNN features, thereby encouraging mask predictions to be not only comprehensive but also well aligned with object boundaries. We further refine the predictions in an online fashion with a novel refinement method that takes into account both class and color affinities, in order to generate reliable pseudo masks to supervise the model. Importantly, our model is trained end-to-end within a single-stage framework and is therefore efficient in terms of training cost. Through extensive experiments on PASCAL VOC 2012, we validate the effectiveness of our method in producing precise and shape-aligned segmentation results. Specifically, our model surpasses the existing state-of-the-art single-stage methods. Moreover, when adopted in a simple two-stage pipeline without bells and whistles, it also achieves new state-of-the-art performance over multi-stage methods.
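A simplified sketch of one refinement step using color affinity alone (the paper's refinement also incorporates class affinity; the kernel size, bandwidth, and names here are assumptions):

```python
import torch
import torch.nn.functional as F

def color_affinity_refine(probs: torch.Tensor, image: torch.Tensor,
                          k: int = 5, sigma: float = 0.1) -> torch.Tensor:
    """probs: B x C x H x W class probabilities; image: B x 3 x H x W.
    One step of neighborhood refinement weighted by color similarity."""
    b, c, h, w = probs.shape
    pad = k // 2
    img_patches = F.unfold(image, k, padding=pad).view(b, 3, k * k, h * w)
    center = image.view(b, 3, 1, h * w)
    # Gaussian affinity on the color distance to each neighbor.
    aff = torch.exp(-((img_patches - center) ** 2).sum(1) / (2 * sigma ** 2))
    aff = aff / aff.sum(1, keepdim=True)                     # B x k*k x HW
    prob_patches = F.unfold(probs, k, padding=pad).view(b, c, k * k, h * w)
    refined = (prob_patches * aff.unsqueeze(1)).sum(2)       # B x C x HW
    return refined.view(b, c, h, w)

p = torch.softmax(torch.randn(1, 21, 64, 64), dim=1)
im = torch.rand(1, 3, 64, 64)
p_refined = color_affinity_refine(p, im)
```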
Fully supervised salient object detection (SOD) methods have made great progress, but such methods often rely on a large number of pixel-level annotations, which are time-consuming and labor-intensive. In this paper, we focus on a new weakly supervised SOD task under hybrid labels, where the supervision labels include a large number of coarse labels generated by traditional unsupervised methods and a small number of real labels. To address the issues of label noise and quantity imbalance in this task, we design a new pipeline framework with three sophisticated training strategies. In terms of the model framework, we decompose the task into a label refinement sub-task and a salient object detection sub-task, which cooperate with each other and are trained alternately. Specifically, the R-Net is designed as a two-stream encoder-decoder model equipped with a Blender with Guidance and Aggregation mechanisms (BGA), aiming to rectify the coarse labels into more reliable pseudo labels, while the S-Net is a replaceable SOD network supervised by the pseudo labels generated by the current R-Net. Note that we only need the well-trained S-Net for testing. Moreover, to guarantee the effectiveness and efficiency of network training, we design three training strategies, including an alternate iteration mechanism, a group-wise incremental mechanism, and a credibility verification mechanism. Experiments on five SOD benchmarks show that our method achieves competitive performance against weakly supervised/unsupervised methods both qualitatively and quantitatively.
The rapid development of deep learning has brought great progress in segmentation, one of the fundamental tasks of computer vision. However, current segmentation algorithms mostly depend on the availability of pixel-level annotations, which are often expensive, tedious, and laborious to obtain. To alleviate this burden, the past few years have witnessed increasing attention to building label-efficient, deep-learning-based segmentation algorithms. This paper offers a comprehensive review of label-efficient segmentation methods. To this end, we first develop a taxonomy that organizes these methods according to the supervision provided by different types of weak labels (including no supervision, coarse supervision, incomplete supervision, and noisy supervision), supplemented by the type of segmentation problem (including semantic segmentation, instance segmentation, and panoptic segmentation). Next, we summarize the existing label-efficient segmentation methods from a unified perspective that discusses an important question: how to bridge the gap between weak supervision and dense prediction. Current methods are mostly based on heuristic priors, such as cross-pixel similarity, cross-label constraints, cross-view consistency, and cross-image relations. Finally, we share our views on future research directions for label-efficient deep segmentation.
Image instance segmentation is a fundamental research topic in autonomous driving, which is crucial for scene understanding and road safety. Advanced learning-based approaches often rely on the costly 2D mask annotations for training. In this paper, we present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS), which leverages the off-the-shelf 3D data, i.e., Point Cloud, together with the 3D boxes, as natural weak supervisions for training the 2D image instance segmentation models. Our LWSIS not only exploits the complementary information in multimodal data during training, but also significantly reduces the annotation cost of the dense 2D masks. In detail, LWSIS consists of two crucial modules, Point Label Assignment (PLA) and Graph-based Consistency Regularization (GCR). The former module aims to automatically assign the 3D point cloud as 2D point-wise labels, while the latter further refines the predictions by enforcing geometry and appearance consistency of the multimodal data. Moreover, we conduct a secondary instance segmentation annotation on the nuScenes, named nuInsSeg, to encourage further research on multimodal perception tasks. Extensive experiments on the nuInsSeg, as well as the large-scale Waymo, show that LWSIS can substantially improve existing weakly supervised segmentation models by only involving 3D data during training. Additionally, LWSIS can also be incorporated into 3D object detectors like PointPainting to boost the 3D detection performance for free. The code and dataset are available at https://github.com/Serenos/LWSIS.
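A hedged sketch of the projection step behind point label assignment, assuming calibrated LiDAR-to-camera extrinsics and, for simplicity, 2D boxes in place of the projected 3D boxes the paper uses (the real PLA module also resolves overlaps and carries instance identity from the 3D boxes):

```python
import numpy as np

def assign_point_labels(points: np.ndarray, K: np.ndarray, T: np.ndarray,
                        boxes_2d: np.ndarray) -> np.ndarray:
    """Hypothetical PLA-style assignment: project LiDAR points into the
    image and label each projected point by the 2D box it falls into.
    points: N x 3 (LiDAR frame); K: 3 x 3 intrinsics; T: 4 x 4 LiDAR->cam;
    boxes_2d: M x 4 as (x1, y1, x2, y2). Returns N labels in {-1, 0..M-1}."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (T @ pts_h.T).T[:, :3]
    labels = np.full(len(points), -1, dtype=np.int64)
    valid = cam[:, 2] > 0                      # keep points in front of camera
    uv = (K @ cam[valid].T).T
    uv = uv[:, :2] / uv[:, 2:3]                # perspective division
    for i, (x1, y1, x2, y2) in enumerate(boxes_2d):
        inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) \
               & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
        labels[np.where(valid)[0][inside]] = i
    return labels
```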
Learning dense point-wise semantics from unstructured 3D point clouds with fewer labels, although a realistic problem, has been under-explored in the literature. While existing weakly supervised methods can effectively learn semantics with only a small fraction of point-level annotations, we find that vanilla bounding-box-level annotations are also informative for the semantic segmentation of large-scale 3D point clouds. In this paper, we introduce a neural architecture, termed Box2Seg, to learn point-level semantics of 3D point clouds with bounding-box-level supervision. The key to our approach is to generate accurate pseudo labels by exploring the geometric and topological structure inside and outside each bounding box. Specifically, an attention-based self-training (AST) technique and point class activation mapping (PCAM) are utilized to estimate the pseudo labels. The network is then further trained and refined with the pseudo labels. Experiments on two large-scale benchmarks, including S3DIS and ScanNet, demonstrate the competitive performance of the proposed method. In particular, the proposed network can be trained with even coarse bounding-box-level annotations and subcloud-level tags.
Recently deep neural networks, which require a large number of annotated samples, have been widely applied in nuclei instance segmentation of H\&E stained pathology images. However, it is inefficient and unnecessary to label all pixels for a dataset of nuclei images, which usually contain similar and redundant patterns. Although unsupervised and semi-supervised learning methods have been studied for nuclei segmentation, very few works have delved into the selective labeling of samples to reduce the workload of annotation. Thus, in this paper, we propose a novel full nuclei segmentation framework that chooses only a few image patches to be annotated, augments the training set from the selected samples, and achieves nuclei segmentation in a semi-supervised manner. In the proposed framework, we first develop a novel consistency-based patch selection method to determine which image patches are the most beneficial to the training. Then we introduce a conditional single-image GAN with a component-wise discriminator to synthesize more training samples. Lastly, our proposed framework trains an existing segmentation model with the above augmented samples. The experimental results show that our proposed method could obtain the same-level performance as a fully-supervised baseline by annotating less than 5% of the pixels on some benchmarks.
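A hedged sketch of a consistency-based selection criterion in the spirit of the paper, assuming prediction disagreement under cheap input perturbations is used as the score (the actual method's notion of consistency and its perturbations may differ; names are placeholders):

```python
import torch

@torch.no_grad()
def rank_patches_by_consistency(model, patches: torch.Tensor,
                                n_views: int = 4) -> torch.Tensor:
    """Hypothetical criterion: patches whose predictions vary most under
    simple perturbations are assumed most informative to annotate.
    patches: N x C x H x W. Returns indices sorted by descending disagreement."""
    preds = []
    for _ in range(n_views):
        noisy = patches + 0.05 * torch.randn_like(patches)  # cheap perturbation
        preds.append(torch.sigmoid(model(noisy)))
    preds = torch.stack(preds)                          # V x N x 1 x H x W
    disagreement = preds.var(dim=0).mean(dim=(1, 2, 3))  # per-patch score
    return torch.argsort(disagreement, descending=True)

# Toy usage with a stand-in model.
net = torch.nn.Conv2d(3, 1, 3, padding=1)
order = rank_patches_by_consistency(net, torch.rand(8, 3, 64, 64))
```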
Water extraction with deep learning requires precise pixel-level labels. However, labeling high-resolution remote sensing images at the pixel level is very difficult. Therefore, we study how to utilize point labels to extract water bodies and propose a new method called the Neighbor Feature Aggregation Network (NFANet). Compared with pixel-level labels, point labels are much easier to obtain, but they lose a great deal of information. In this paper, we take advantage of the similarity between the adjacent pixels of a local water body and propose a neighbor sampler to resample the remote sensing images. The sampled images are then sent to the network for feature aggregation. In addition, we use an improved recursive training algorithm to further improve the extraction accuracy and make the water boundaries more natural. Furthermore, our method utilizes neighboring features instead of global or local ones to learn more representative features. Experimental results show that the proposed NFANet method not only outperforms other studied weakly supervised approaches, but also achieves results similar to the state of the art.
The cup-to-disc ratio (CDR) is one of the most significant indicators for glaucoma diagnosis. Different from the costly fully supervised learning formulations with pixel-wise annotations in the literature, this study investigates the feasibility of accurate CDR measurement in fundus images using only tight bounding box supervision. For this purpose, we develop a two-task network named CDRNet for accurate CDR measurement: one task performs weakly supervised image segmentation, and the other performs bounding-box regression. The weakly supervised image segmentation task is implemented based on a generalized multiple instance learning formulation and a smooth maximum approximation, and the bounding-box regression task outputs class-specific bounding box predictions at a single scale at the original image resolution. To get accurate bounding box predictions, a class-specific bounding-box normalizer and an expected intersection-over-union are proposed. In the experiments, the proposed approach was evaluated on a testing set of 1200 images, using CDR error and $F_1$ score for CDR measurement and the Dice coefficient for image segmentation. A grader study was conducted to compare the performance of the proposed approach with those of individual graders. The experimental results indicate that the proposed approach outperforms the state-of-the-art performance obtained from the fully supervised image segmentation (FSIS) approach using pixel-wise annotations for CDR measurement. Its performance is also better than those of the individual graders. In addition, the proposed approach achieves performance close to the state of the art obtained from FSIS, and to the performance of individual graders, for optic cup and disc segmentation. The codes are available at \url{https://github.com/wangjuan313/CDRNet}.
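A minimal sketch of the generalized MIL idea with a smooth maximum approximation, assuming log-sum-exp as the soft max over the bag of pixels inside a tight box (the paper's formulation also constrains the rows and columns crossing the box; names and the temperature are placeholders):

```python
import torch
import torch.nn.functional as F

def mil_box_loss(logits: torch.Tensor, box: tuple, r: float = 4.0) -> torch.Tensor:
    """Hypothetical MIL term with a smooth maximum: the bag of pixels inside
    a tight box must contain the object, so its (soft) max logit should be
    positive. logits: H x W; box: (x1, y1, x2, y2); r controls smoothness."""
    x1, y1, x2, y2 = box
    bag = logits[y1:y2, x1:x2].reshape(-1)
    # log-sum-exp as a differentiable approximation of the max over the bag
    smooth_max = torch.logsumexp(r * bag, dim=0) / r
    return F.binary_cross_entropy_with_logits(smooth_max, torch.tensor(1.0))

scores = torch.randn(128, 128, requires_grad=True)
loss = mil_box_loss(scores, (30, 40, 90, 100))
loss.backward()
```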
We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, developments in VPS have not moved forward easily due to the lack of a large-scale dataset with fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 frames from the well-known SUN database. We provide additional annotations of diverse types, i.e., attributes, object masks, boundaries, scribbles, and polygons. Second, we design a simple but efficient baseline, named PNS+, which consists of a global encoder, a local encoder, and normalized self-attention (NS) blocks. The global and local encoders receive an anchor frame and multiple successive frames to extract long-term and short-term spatial-temporal representations, which are then progressively refined by two NS blocks. Extensive experiments show that PNS+ achieves the best performance and real-time inference speed (170 fps), making it a promising solution for the VPS task. Third, we extensively evaluate 13 representative polyp/object segmentation models on our SUN-SEG dataset and provide an attribute-based comparison. Finally, we discuss several open issues and suggest possible research directions for the VPS community.
Training a DNN for fish tracking and segmentation based on high-quality labels is expensive. Alternative unsupervised approaches rely on the spatial and temporal variations that naturally occur in video data to generate noisy pseudo ground-truth labels. These pseudo labels are then used to train a multi-task deep neural network. In this paper, we propose a three-stage framework for robust fish tracking and segmentation, where the first stage is an optical flow model that generates pseudo labels using the spatial and temporal consistency between frames. In the second stage, a self-supervised model incrementally refines the pseudo labels. In the third stage, the refined labels are used to train a segmentation network. No human annotations are used during training or inference. Extensive experiments are performed to validate our method on three public underwater video datasets, and they demonstrate that it is highly effective for video annotation and segmentation. We also evaluate the robustness of our framework to different imaging conditions and discuss the limitations of our current implementation.
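A hedged sketch of the first stage's idea, assuming pseudo masks are obtained by thresholding motion that deviates from the dominant (background) flow; the paper's optical-flow model is considerably more elaborate, and the threshold below is illustrative:

```python
import numpy as np

def flow_pseudo_mask(flow: np.ndarray, tau: float = 1.5) -> np.ndarray:
    """Hypothetical stage-one labeling: mark pixels whose motion differs
    from the (background-dominated) median flow as moving fish.
    flow: H x W x 2 optical flow between consecutive frames."""
    residual = flow - np.median(flow.reshape(-1, 2), axis=0)  # remove camera motion
    magnitude = np.linalg.norm(residual, axis=2)
    return (magnitude > tau).astype(np.uint8)   # 1 = pseudo foreground

mask = flow_pseudo_mask(np.random.randn(240, 320, 2).astype(np.float32))
```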
Instance segmentation in videos, which aims to segment and track multiple objects in video frames, has garnered a flurry of research attention in recent years. In this paper, we present a novel weakly supervised framework with \textbf{S}patio-\textbf{T}emporal \textbf{C}ollaboration for instance \textbf{Seg}mentation in videos, namely \textbf{STC-Seg}. Concretely, STC-Seg demonstrates four contributions. First, we leverage the complementary representations from unsupervised depth estimation and optical flow to produce effective pseudo-labels for training deep networks and predicting high-quality instance masks. Second, to enhance the mask generation, we devise a puzzle loss, which enables end-to-end training using box-level annotations. Third, our tracking module jointly utilizes bounding-box diagonal points with spatio-temporal discrepancy to model movements, which largely improves the robustness to different object appearances. Finally, our framework is flexible and enables image-level instance segmentation methods to operate on the video-level task. We conduct an extensive set of experiments on the KITTI MOTS and YT-VIS datasets. Experimental results demonstrate that our method achieves strong performance and even outperforms fully supervised TrackR-CNN and MaskTrack R-CNN. We believe that STC-Seg can be a valuable addition to the community, as it reflects the tip of an iceberg about the innovative opportunities in the weakly supervised paradigm for instance segmentation in videos.
This paper presents the first attempt to learn semantic boundary detection using image-level class labels as supervision. Our method starts by estimating coarse areas of object classes through attention maps drawn by an image classification network. Since boundaries lie somewhere between such areas of different classes, our task is formulated as a multiple instance learning (MIL) problem, where pixels on a line segment connecting areas of two different classes are regarded as a bag of boundary candidates. Moreover, we design a new neural network architecture that can learn to estimate semantic boundaries reliably even with the uncertain supervision given by the MIL strategy. Our network is used to generate pseudo semantic boundary labels for training images, which are in turn used to train fully supervised models. The final model trained with our pseudo labels achieves an outstanding performance on the SBD dataset, where it is as competitive as some of the previous arts trained with stronger supervision.
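A minimal sketch of how such a bag of boundary candidates can be built and scored, assuming a positive-bag max objective over pixels sampled along the connecting segment (coordinates, sampling density, and names are illustrative, not the paper's code):

```python
import numpy as np

def boundary_bag(pt_a: tuple, pt_b: tuple, n: int = 32) -> np.ndarray:
    """Sample n pixel coordinates along the segment joining points drawn
    from two different class regions; somewhere in this bag lies the
    semantic boundary."""
    a, b = np.asarray(pt_a, float), np.asarray(pt_b, float)
    t = np.linspace(0.0, 1.0, n)[:, None]
    return np.rint(a * (1 - t) + b * t).astype(int)   # n x 2 (row, col)

def bag_loss(boundary_scores: np.ndarray, bag: np.ndarray) -> float:
    """MIL objective on one positive bag: the most boundary-like pixel on
    the segment should score high (negative log of the bag maximum)."""
    probs = boundary_scores[bag[:, 0], bag[:, 1]]
    return float(-np.log(probs.max() + 1e-6))

scores = np.random.rand(100, 100)        # toy boundary probability map
loss = bag_loss(scores, boundary_bag((10, 12), (80, 70)))
```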