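Incremental few-shot semantic segmentation (IFSS) aims to incrementally expand a model's capacity to segment new classes of images supervised by only a few samples. However, features learned on old classes can drift significantly, causing catastrophic forgetting. Moreover, having only a few pixel-level segmentation samples for new classes leads to notorious overfitting in each learning session. In this paper, we explicitly represent class-based knowledge for semantic segmentation as a category embedding and a hyper-class embedding, where the former describes exclusive semantic properties and the latter expresses hyper-class knowledge as class-shared semantic properties. To solve the IFSS problem, we present EHNet, an Embedding adaptive-update and Hyper-class representation Network, from two aspects. First, we propose an embedding adaptive-update strategy to avoid feature drift, which maintains old knowledge through the hyper-class representation and adaptively updates category embeddings with a class-attention scheme to incorporate new classes learned in individual sessions. Second, to resist the overfitting caused by few training samples, a hyper-class embedding is learned by clustering all category embeddings for initialization and is aligned with the category embedding of the new class for enhancement, so that learned knowledge assists in learning new knowledge, alleviating the dependence of performance on training data scale. Significantly, these two designs provide representation capability for classes with sufficient semantics and limited biases, enabling segmentation tasks that require high semantic dependence. Experiments on the PASCAL-5i and COCO datasets show that EHNet achieves new state-of-the-art performance with remarkable advantages.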
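Automatically measuring lesion/tumor size with RECIST (Response Evaluation Criteria In Solid Tumors) diameters and segmentation is important for computer-aided diagnosis. Although it has been studied in recent years, there is still room to improve its accuracy and robustness, for example by (1) enhancing features through rich contextual information while keeping a high spatial resolution and (2) involving new tasks and losses for joint optimization. To reach this goal, this paper proposes a transformer-based network (MeaFormer, Measurement transFormer) for lesion RECIST diameter prediction and segmentation (LRDPS). It is formulated as three correlated and complementary tasks: lesion segmentation, heatmap prediction, and keypoint regression. To the best of our knowledge, this is the first time keypoint regression has been used for RECIST diameter prediction. MeaFormer enhances high-resolution features by employing transformers to capture their long-range dependencies. Two consistency losses are introduced to explicitly build relationships among these tasks for better optimization. Experiments show that MeaFormer achieves state-of-the-art LRDPS performance on the large-scale DeepLesion dataset and produces promising results on two downstream clinically relevant tasks, namely 3D lesion segmentation and RECIST assessment in longitudinal studies.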
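As a rough illustration of how regressed keypoints translate into RECIST measurements, the sketch below turns four keypoints into a long-axis and a short-axis diameter. The function name `recist_diameters`, the keypoint ordering, and the `spacing_mm` parameter are assumptions made for this example and are not taken from the paper.

```python
import math


def recist_diameters(keypoints, spacing_mm=1.0):
    """Return (long, short) diameters in millimetres.

    keypoints: four (x, y) pixel coordinates, assumed to be ordered as the two
    long-axis endpoints followed by the two short-axis endpoints.
    spacing_mm: assumed isotropic pixel spacing of the image slice.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = keypoints
    long_d = math.hypot(x2 - x1, y2 - y1) * spacing_mm
    short_d = math.hypot(x4 - x3, y4 - y3) * spacing_mm
    return long_d, short_d


# Hypothetical keypoints: the short axis is perpendicular to the long axis.
long_d, short_d = recist_diameters(
    [(10, 20), (40, 60), (30, 30), (18, 39)], spacing_mm=0.8
)
```

A regression head that predicts these eight coordinates directly would only need this kind of post-processing to report diameters, whereas a heatmap head would first require peak extraction.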
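Training deep graph neural networks (GNNs) is a challenging task, as the performance of GNNs may deteriorate with the number of hidden message-passing layers. The literature has focused on over-smoothing and under-reaching as explanations for the performance deterioration of deep GNNs. In this paper, we propose a new explanation for this phenomenon, mis-simplification, that is, mistakenly simplifying graphs by preventing self-loops and forcing edges to be unweighted. We show that such simplification can reduce the potential of message-passing layers to capture the structural information of graphs. In view of this, we propose a new framework, the edge enhanced graph neural network (EEGNN). EEGNN uses the structural information extracted from the proposed Dirichlet mixture Poisson graph model, a Bayesian nonparametric model for graphs, to improve the performance of various deep message-passing GNNs. Experiments on different datasets show that our method achieves a considerable performance increase compared to baselines.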
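To make the effect of mis-simplification concrete, the sketch below runs one mean-style message-passing step on a small graph twice: once with self-loops and edge weights, and once after removing self-loops and forcing weights to 1. This is a toy illustration under assumed names (`propagate`, `simplify`) and an assumed row-normalised aggregation rule; it is not the EEGNN model itself.

```python
def propagate(adj, feats):
    """One message-passing step: weighted mean of neighbour features per node."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([
            sum(w * f[d] for w, f in zip(row, feats)) / total if total else 0.0
            for d in range(len(feats[0]))
        ])
    return out


def simplify(adj):
    """Drop self-loops and force every remaining edge weight to 1."""
    n = len(adj)
    return [
        [0.0 if i == j else (1.0 if adj[i][j] > 0 else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical weighted graph with self-loops on nodes 0 and 2.
weighted = [
    [2.0, 3.0, 0.0],
    [3.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]
feats = [[1.0], [0.0], [4.0]]

full = propagate(weighted, feats)              # uses weights and self-loops
simple = propagate(simplify(weighted), feats)  # simplified graph
```

In this toy graph, nodes 0 and 2 lose their own features entirely after simplification, since each has a single non-self neighbour whose feature is 0.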
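To address the growing demand for labeled data and the privacy concerns in human detection, synthetic data has been used as a substitute and has shown promising results in human detection and tracking tasks. We participated in the 7th Workshop on Benchmarking Multi-Target Tracking (BMTT), themed "How Far Can Synthetic Data Take Us?". Our solution, PieTrack, is developed on synthetic data without using any pre-trained weights. We propose a self-supervised domain adaptation method that mitigates the domain shift between synthetic data (e.g., MOTSynth) and real data (e.g., MOT17) without involving extra human labels. By leveraging the proposed multi-scale ensemble inference, we achieved a final HOTA score of 58.7 on the MOT17 test set, ranking third in the challenge.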
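Knee osteoarthritis (OA) is a common degenerative joint disorder that affects a large population of elderly people worldwide. Accurate radiographic assessment of knee OA severity plays a critical role in chronic patient management. The knee OA grading systems currently adopted in clinical practice are observer-subjective and suffer from inter-rater disagreement. In this work, we propose a computer-aided diagnosis approach that provides more accurate and consistent assessments of both composite and fine-grained OA grades simultaneously. A novel semi-supervised learning method is presented to exploit the underlying coherence between the composite and fine-grained OA grades by learning from unlabeled data. By representing the grade coherence with the log-probability of a pre-trained Gaussian mixture model, we formulate an incoherence loss that incorporates unlabeled data in training. The proposed method also includes a keypoint-based pooling network, in which deep image features are pooled from disease-targeted keypoints (extracted along the knee joint) to provide more aligned and pathologically informative feature representations for accurate OA grade assessment. The proposed method is comprehensively evaluated on the public Osteoarthritis Initiative (OAI) data, a multi-center ten-year observational study of 4,796 subjects. Experimental results show that our method yields significant improvements over previous strong whole-image deep classification baselines (such as ResNet-50).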
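The incoherence loss can be pictured as the negative log-probability of a predicted grade vector under a mixture model fitted to coherent grade combinations. The sketch below assumes a diagonal-covariance Gaussian mixture with hand-set parameters and a two-dimensional grade vector; the function names and all numbers are hypothetical and are not taken from the paper.

```python
import math


def gmm_log_prob(x, weights, means, variances):
    """Log-density of x under a diagonal-covariance Gaussian mixture."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        log_pdf = sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, mu, var)
        )
        comps.append(math.log(w) + log_pdf)
    top = max(comps)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps))


def incoherence_loss(pred, weights, means, variances):
    """Negative log-probability: large when the grades are jointly unlikely."""
    return -gmm_log_prob(pred, weights, means, variances)


# Hypothetical mixture: grades are coherent when both are low or both are high.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[0.5, 0.5], [0.5, 0.5]]

coherent = incoherence_loss([3.0, 3.0], weights, means, variances)
incoherent = incoherence_loss([0.0, 3.0], weights, means, variances)
```

Because the loss needs only the predicted grades and not a ground-truth label, it can be evaluated on unlabeled images, which is what makes it usable in a semi-supervised setting.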
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
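For readers unfamiliar with the simplex equiangular tight frame, the sketch below constructs one for K classes using the standard form sqrt(K/(K-1)) * (I - (1/K) 1 1^T) and computes its Gram matrix. The choice of embedding dimension equal to K and the helper names are assumptions made for this example.

```python
import math


def simplex_etf(k):
    """Return k unit vectors in R^k forming a simplex equiangular tight frame."""
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


vertices = simplex_etf(4)
# Gram matrix: 1 on the diagonal, -1/(k-1) everywhere else.
gram = [[dot(u, v) for v in vertices] for u in vertices]
```

Every pair of distinct vertices has the same inner product, -1/(K-1), which is the most negative value K unit vectors can share, hence the "equiangular and maximally separated" structure the abstract refers to.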
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only image-level labels. Most existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet they ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. In addition, the use of CAM brings a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma between classification and localization performance.
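As background for the CAM-based methods mentioned above, the sketch below computes a plain class activation map as a classifier-weighted sum of final-layer feature maps, followed by min-max normalisation. It illustrates standard CAM only, not KD-CI-CAM; the function names and the toy inputs are assumptions.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) using one class's weights."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += weight * fmap[y][x]
    return cam


def normalise(cam):
    """Rescale a map to [0, 1] so it can be thresholded into a box or mask."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span for v in row] for row in cam]


# Hypothetical 2 x 2 feature maps from two channels and one class's weights.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
cam = normalise(class_activation_map(feature_maps, [0.5, 1.0]))
```

Because the weights come from a classifier trained on image-level labels, any channel that fires on co-occurring context (water for fish) contributes to the map just as object channels do, which is the entanglement the abstract describes.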
Witnessing the impressive achievements of pre-training techniques on large-scale data in the fields of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem in visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive amounts of information irrelevant to decision making, which makes the predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pre-training in visuomotor driving. We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on the current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving-policy-related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios demonstrate the superiority of our proposed approach, with improvements ranging from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
Nearest-Neighbor (NN) classification has proven to be a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false predictions if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique, which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation that is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support sample based on its similarity to different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show that our efficient non-learning based method can outperform, or is at least comparable to, SOTA methods that need additional learning steps.
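The general idea of calibrating support features with base-class prototypes before nearest-neighbor matching can be sketched as follows. The mixing rule used here (a softmax over cosine similarities, blended with the support feature by a factor `alpha`) is an assumption chosen for illustration and is not the paper's exact calibration operation; all names and values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def calibrate(support, base_prototypes, alpha=0.5, temperature=0.1):
    """Pull a support feature towards the base prototypes it resembles most."""
    sims = [cosine(support, p) for p in base_prototypes]
    top = max(sims)
    exps = [math.exp((s - top) / temperature) for s in sims]
    total = sum(exps)
    prior = [
        sum((e / total) * p[d] for e, p in zip(exps, base_prototypes))
        for d in range(len(support))
    ]
    return [(1 - alpha) * s + alpha * q for s, q in zip(support, prior)]


def nearest_class(query, calibrated_support):
    """Return the label whose calibrated support feature is most similar."""
    return max(
        calibrated_support,
        key=lambda label: cosine(query, calibrated_support[label]),
    )


# Hypothetical 2-D features: two base prototypes and a 2-way 1-shot task.
base_prototypes = [[1.0, 0.0], [0.0, 1.0]]
support = {"cat": [0.9, 0.3], "dog": [0.2, 0.8]}
calibrated = {k: calibrate(v, base_prototypes) for k, v in support.items()}
prediction = nearest_class([1.0, 0.1], calibrated)
```

No parameters are trained at any point: calibration and matching are closed-form operations on pretrained features, which matches the "non-learning based" description in the abstract.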