Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task that frees practitioners from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise segmentation performance on the target domain. A key idea for tackling this problem is to perform image-level and feature-level adaptation jointly. Unfortunately, such unified approaches for UDA tasks are lacking in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; we further regularize category centers in the source domain through a category-oriented triplet loss and perform consistency regularization over augmented target-domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. On the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method with a DeepLab V3+ backbone surpasses the previous state of the art by 8%, reaching 58.2% mIoU.
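As a hedged illustration of the category-center regularization mentioned in the abstract, the sketch below assumes per-pixel source-domain embeddings and labels, computes L2-normalized category centers, and applies a margin-based triplet objective that pulls each pixel toward its own category center and away from the nearest other center. The margin, normalization, and sampling choices are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def category_triplet_loss(features, labels, num_classes, margin=0.5):
    """features: (N, D) pixel embeddings; labels: (N,) category ids."""
    feats = F.normalize(features, dim=1)
    centers, present = [], []
    for c in range(num_classes):
        mask = labels == c
        present.append(bool(mask.any()))
        if present[-1]:
            centers.append(F.normalize(feats[mask].mean(dim=0), dim=0))
        else:
            centers.append(feats.new_zeros(feats.size(1)))   # placeholder for absent class
    centers = torch.stack(centers)                           # (C, D)
    present = torch.tensor(present, device=feats.device)

    dists = torch.cdist(feats, centers)                      # (N, C) pixel-to-center distances
    dists = dists.masked_fill(~present, float("inf"))        # ignore absent categories
    pos = dists.gather(1, labels.view(-1, 1)).squeeze(1)     # distance to own-class center
    neg = dists.scatter(1, labels.view(-1, 1), float("inf")).min(dim=1).values
    return F.relu(pos - neg + margin).mean()
```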
Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative, with the goal of preserving invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose to incorporate the task of pixel restoration so that more pixel-level information is explicitly encoded into high-level semantics. We also address the preservation of scale information, a powerful tool for aiding image understanding that has not drawn much attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose a non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.
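As a rough sketch of the multi-task formulation on the feature pyramid, the snippet below combines per-level pixel-restoration losses with per-level siamese feature comparison; the pyramid construction, projection heads, stop-gradient placement, and loss weights are simplifying assumptions rather than the exact PCRLv2 recipe.

```python
import torch
import torch.nn.functional as F

def pyramid_ssl_loss(restorations, targets, feats_a, feats_b,
                     w_restore=1.0, w_compare=1.0):
    """
    restorations / targets: lists of (B, C, H_l, W_l) tensors, one per pyramid level.
    feats_a / feats_b: lists of (B, D) projected features from two siamese views.
    """
    # Multi-scale pixel restoration: reconstruct the original pixels at each level.
    restore_loss = sum(F.mse_loss(r, t) for r, t in zip(restorations, targets))
    # Siamese feature comparison: align the two views' features at each level.
    compare_loss = sum(
        1 - F.cosine_similarity(a, b.detach(), dim=1).mean()
        for a, b in zip(feats_a, feats_b)
    )
    return w_restore * restore_loss + w_compare * compare_loss
```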
Measuring and alleviating the discrepancies between synthetic (source) and real scene (target) data is the core issue in domain adaptive semantic segmentation. Although recent works have introduced depth information in the source domain to reinforce geometric and semantic knowledge transfer, they cannot extract the intrinsic 3D information of objects, including positions and shapes, based merely on 2D estimated depth. In this work, we propose a novel Geometry-Aware Network for Domain Adaptation (GANDA), leveraging more compact 3D geometric point cloud representations to shrink the domain gaps. In particular, we first utilize the auxiliary depth supervision from the source domain to obtain depth predictions in the target domain and accomplish structure-texture disentanglement. Beyond depth estimation, we explicitly exploit the 3D topology of the point clouds generated from RGB-D images for further coordinate-color disentanglement and pseudo-label refinement in the target domain. Moreover, to improve the 2D classifier in the target domain, we perform domain-invariant geometric adaptation from source to target and unify the 2D semantic and 3D geometric segmentation results in the two domains. Note that our GANDA is plug-and-play in any existing UDA framework. Qualitative and quantitative results demonstrate that our model outperforms the state of the art on GTA5->Cityscapes and SYNTHIA->Cityscapes.
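Since GANDA operates on point clouds generated from RGB-D images, the following minimal sketch shows the standard pinhole back-projection that turns a depth map and its RGB image into a colored point cloud; the intrinsics are placeholders, and the paper's depth prediction, disentanglement, and refinement steps are not reproduced here.

```python
import torch

def rgbd_to_point_cloud(rgb, depth, fx, fy, cx, cy):
    """rgb: (3, H, W) in [0, 1]; depth: (H, W) in meters; fx/fy/cx/cy: pinhole intrinsics."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    z = depth
    x = (u.float() - cx) * z / fx                    # back-project pixel columns
    y = (v.float() - cy) * z / fy                    # back-project pixel rows
    xyz = torch.stack((x, y, z), dim=-1).reshape(-1, 3)        # (H*W, 3) coordinates
    colors = rgb.permute(1, 2, 0).reshape(-1, 3)               # (H*W, 3) per-point color
    valid = z.reshape(-1) > 0                                  # drop pixels with no depth
    return xyz[valid], colors[valid]
```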
This paper introduces a new few-shot learning pipeline that casts relevance ranking for image retrieval as binary ranking relation classification. In comparison to image classification, ranking relation classification is sample efficient and domain agnostic. Moreover, it provides a new perspective on few-shot learning and is complementary to state-of-the-art methods. The core component of our deep neural network is a simple MLP, which takes as input an image triplet encoded as the difference between two vector-Kronecker products and outputs a binary relevance ranking order. The proposed RankMLP can be built on top of any state-of-the-art feature extractor, and the entire deep neural network is called the ranking deep neural network, or RankDNN. Meanwhile, RankDNN can be flexibly fused with other post-processing methods. During meta-testing, RankDNN ranks support images according to their similarity with the query samples, and each query sample is assigned the class label of its nearest neighbor. Experiments demonstrate that RankDNN can effectively improve the performance of its baselines based on a variety of backbones, and it outperforms previous state-of-the-art algorithms on multiple few-shot learning benchmarks, including miniImageNet, tieredImageNet, Caltech-UCSD Birds, and CIFAR-FS. Furthermore, experiments on the cross-domain challenge demonstrate the superior transferability of RankDNN. The code is available at: https://github.com/guoqianyu-alberta/RankDNN.
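A minimal sketch of the RankMLP core as described in the abstract: an image triplet (query, candidate A, candidate B) is encoded as the difference between two vector-Kronecker products and classified by a small MLP into a binary ranking order. Feature dimensions and MLP widths below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RankMLP(nn.Module):
    def __init__(self, feat_dim=64, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim * feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    @staticmethod
    def kron_batch(x, y):
        # Batched Kronecker product of 1-D vectors: kron(x, y) == outer(x, y) flattened.
        return torch.einsum("bi,bj->bij", x, y).flatten(1)

    def forward(self, query, cand_a, cand_b):
        # Positive logit => cand_a is ranked as more relevant to the query than cand_b.
        diff = self.kron_batch(query, cand_a) - self.kron_batch(query, cand_b)
        return self.mlp(diff).squeeze(1)

# Usage: logits = RankMLP()(q, a, b)
#        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, order_labels)
```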
Graph neural networks (GNNs) have gained momentum in graph representation learning and have been applied in a wide variety of domains, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. This task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of representative methods, as well as discussions of insights, limitations, and future directions.
In clinical practice, radiologists often use attributes, such as the morphological and appearance characteristics of a lesion, to aid disease diagnosis. Effectively modeling attributes, as well as all the relations involving attributes, can improve the generalization ability and verifiability of medical image diagnosis algorithms. In this paper, we introduce a hybrid neural reasoning algorithm for verifiable attribute-based medical image diagnosis. Our hybrid algorithm has two parallel branches: a Bayesian network branch that performs probabilistic causal reasoning, and a graph convolutional network branch that performs more general relational modeling and reasoning over feature representations. Tight coupling between the two branches is achieved through a cross-network attention mechanism and the fusion of their classification results. We have successfully applied our hybrid reasoning algorithm to two challenging medical image diagnosis tasks. On the LIDC-IDRI benchmark dataset for benign-malignant classification of pulmonary nodules in CT images, our method achieves a new state-of-the-art accuracy of 95.36% with an AUC of 96.54%. Our method also improves accuracy by 3.24% on an in-house chest X-ray image dataset for tuberculosis diagnosis. Our ablation study shows that, under very limited training data, our hybrid algorithm generalizes considerably better than a pure neural network architecture.
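The abstract describes fusing the classification results of the Bayesian network branch and the graph convolutional network branch; the sketch below shows only a generic learned gating over the two branches' class probabilities, as a stand-in for the paper's cross-network attention and result fusion, whose exact form is not specified here.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Generic gated fusion of two branches' class-probability vectors (a simplifying stand-in)."""
    def __init__(self, num_classes):
        super().__init__()
        # The gate sees both branches' outputs and predicts two fusion weights per sample.
        self.gate = nn.Sequential(nn.Linear(2 * num_classes, 2), nn.Softmax(dim=1))

    def forward(self, p_bayes, p_gcn):
        # p_bayes, p_gcn: (B, num_classes) probabilities from the two branches.
        w = self.gate(torch.cat([p_bayes, p_gcn], dim=1))      # (B, 2) fusion weights
        return w[:, :1] * p_bayes + w[:, 1:] * p_gcn           # (B, num_classes)
```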
Learning with noisy labels (LNL) aims to design strategies that improve model performance and generalization by mitigating the effects of overfitting to noisy labels. The key to success in LNL lies in identifying as many clean samples as possible from massive noisy data while rectifying the wrongly assigned noisy labels. Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, which easily gives rise to confirmation bias. To mitigate this issue, we propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its nearest neighbors in feature space. Specifically, our method consists of two steps: 1) Neighborhood Collective Noise Verification, which separates all training samples into clean and noisy subsets, and 2) Neighborhood Collective Label Correction, which relabels the noisy samples, after which auxiliary techniques are used to assist further model optimization. Extensive experiments on four commonly used benchmark datasets, i.e., CIFAR-10, CIFAR-100, Clothing-1M, and WebVision-1.0, demonstrate that our proposed method considerably outperforms state-of-the-art methods.
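A minimal sketch of the neighborhood-collective idea: instead of trusting a sample's own prediction, its given label is verified against the aggregated predictions of its nearest neighbors in feature space. The neighborhood size, aggregation rule, and threshold below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def neighborhood_clean_mask(features, probs, given_labels, k=20, threshold=0.5):
    """
    features: (N, D) embeddings; probs: (N, C) softmax predictions;
    given_labels: (N,) possibly noisy labels. Returns a boolean clean-sample mask.
    """
    feats = F.normalize(features, dim=1)
    sims = feats @ feats.t()                            # cosine similarity matrix
    sims.fill_diagonal_(-float("inf"))                  # exclude the sample itself
    nbr_idx = sims.topk(k, dim=1).indices               # (N, k) nearest neighbors
    nbr_probs = probs[nbr_idx].mean(dim=1)              # (N, C) collective estimate
    agreement = nbr_probs.gather(1, given_labels.view(-1, 1)).squeeze(1)
    return agreement > threshold                        # trust the label if neighbors agree
```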
As an important upstream task for many medical applications, supervised landmark localization still requires non-negligible annotation costs to achieve desirable performance. Moreover, due to cumbersome collection procedures, the limited size of medical landmark datasets affects the effectiveness of large-scale self-supervised pre-training methods. To address these challenges, we propose a two-stage framework for one-shot medical landmark localization, which first infers landmarks on unlabeled targets by unsupervised registration from the labeled exemplar, and then utilizes these noisy pseudo-labels to train a robust detector. To handle significant structural variations, we learn an end-to-end cascade of global alignment and local deformations, guided by a novel loss function that incorporates edge information. In the second stage, we explore self-consistency for selecting reliable pseudo-labels and cross-consistency for semi-supervised learning. Our method achieves state-of-the-art performance on public datasets of different body parts, demonstrating its general applicability.
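A small sketch of the first-stage pseudo-labeling step under an assumed convention: a registration model produces a dense displacement field from the labeled exemplar to an unlabeled target, and the exemplar's landmark coordinates are pushed through that field to obtain noisy pseudo-labels. The field convention and coordinate ordering are assumptions; the cascaded registration and noise-robust training stage are not reproduced.

```python
import torch

def warp_landmarks(landmarks, displacement):
    """
    landmarks: (K, 2) integer (y, x) positions annotated on the exemplar image.
    displacement: (2, H, W) field giving, for each exemplar pixel, its offset
    (dy, dx) in the target image. Returns pseudo landmark positions, shape (K, 2).
    """
    ys, xs = landmarks[:, 0].long(), landmarks[:, 1].long()
    offsets = displacement[:, ys, xs].t()               # (K, 2) offsets at the landmarks
    return landmarks.float() + offsets                  # noisy pseudo-labels on the target
```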
Deep models trained with noisy labels easily overfit and struggle to generalize. Most existing solutions are based on the ideal assumption that the label noise is class-conditional, i.e., instances of the same class share the same noise model and the noise is independent of features. In practice, real-world noise patterns are usually more fine-grained and instance-dependent, which poses a great challenge, especially in the presence of inter-class imbalance. In this paper, we propose a two-stage clean-sample identification method to address this challenge. First, we employ a class-level feature clustering procedure for the early identification of clean samples that lie near the class-wise prediction centers. Notably, we address the class imbalance problem according to the prediction entropy of the rare classes. Second, for the remaining clean samples that are close to the ground-truth class boundaries (usually mixed with instance-dependent noisy samples), we propose a novel consistency-based classification method that identifies them using the consistency of two classifier heads: the higher the consistency, the more likely a sample is clean. Extensive experiments on several challenging benchmarks demonstrate the superior performance of our method against state-of-the-art methods.
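For the second stage, the sketch below turns the agreement of two classifier heads on a sample's given label into a cleanliness score; how the two heads are trained and the exact consistency measure are simplified assumptions.

```python
import torch
import torch.nn.functional as F

def two_head_consistency(logits_a, logits_b, given_labels):
    """Higher score => the two heads jointly agree with the given (possibly noisy) label."""
    p_a = F.softmax(logits_a, dim=1).gather(1, given_labels.view(-1, 1))
    p_b = F.softmax(logits_b, dim=1).gather(1, given_labels.view(-1, 1))
    return (p_a * p_b).squeeze(1)        # (N,) consistency-based cleanliness score

# Usage: clean = two_head_consistency(head1(x), head2(x), y) > tau   # tau is a chosen threshold
```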
Although deep learning algorithms have been intensively developed for computer-aided tuberculosis diagnosis (CTD), they mainly depend on carefully annotated datasets, leading to considerable time and resource consumption. Weakly supervised learning (WSL), which leverages coarse-grained labels to accomplish fine-grained tasks, has the potential to solve this problem. In this paper, we first propose a new large-scale tuberculosis (TB) chest X-ray dataset, namely the Tuberculosis chest X-ray Attribute dataset (TBX-Att), and then establish an attribute-assisted weakly supervised framework to classify and localize TB by leveraging attribute information to overcome the insufficient supervision in WSL scenarios. Specifically, first, the TBX-Att dataset contains 2000 X-ray images with seven kinds of attributes for TB relational reasoning, annotated by experienced radiologists. It also incorporates the public TBX11K dataset with 11200 X-ray images to facilitate weakly supervised detection. Second, we exploit a multi-scale feature interaction model for TB area classification and detection with attribute relational reasoning. The proposed model is evaluated on the TBX-Att dataset and will serve as a solid baseline for future research. The code and data will be made available at https://github.com/gangmingzhao/tb-attribute-weak-localization.