Name ambiguity, such as multiple authors sharing the same name, is common in academic digital libraries. It creates challenges for academic data management and analysis, so name disambiguation becomes necessary. Name disambiguation divides publications carrying the same author name into groups, each belonging to a unique author. The large amount of attribute information in publications traps traditional methods in the quagmire of feature selection: they tend to select attributes manually and weight them equally, which usually hurts accuracy. The proposed method is based on representation learning for heterogeneous networks and clustering, and exploits the self-attention mechanism to solve the problem. The representation of a publication is a synthesis of structural and semantic representations. The structural representation is obtained by meta-path-based sampling and a skip-gram-based embedding method, with meta-path-level attention introduced to automatically learn the weight of each feature. The semantic representation is generated using NLP tools. Our proposal outperforms the baselines in name-disambiguation accuracy, and ablation experiments demonstrate the improvements brought by feature selection and the meta-path-level attention. The experimental results show the superiority of our method in capturing most of the attributes in publications while reducing the impact of redundant information.
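As a rough illustration of the structural branch described above, the sketch below combines a meta-path-constrained random walk with a softmax attention fusion over per-meta-path embeddings. The toy graph, the query vector, and all names are illustrative assumptions; the skip-gram training step between the two functions is omitted.

```python
import random
import numpy as np

def metapath_walk(graph, start, metapath, walk_len):
    """Random walk constrained to follow the node types in `metapath`.
    `graph` maps a node to a list of (neighbor, neighbor_type) pairs."""
    walk = [start]
    for i in range(walk_len - 1):
        wanted = metapath[(i + 1) % len(metapath)]
        candidates = [n for n, t in graph[walk[-1]] if t == wanted]
        if not candidates:
            break
        walk.append(random.choice(candidates))
    return walk

def fuse_metapath_embeddings(per_path_embs, query):
    """Meta-path-level attention: score each meta-path's embedding
    against a query vector and combine with softmax weights."""
    scores = np.array([emb @ query for emb in per_path_embs])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    fused = sum(w * emb for w, emb in zip(weights, per_path_embs))
    return fused, weights

# Toy usage: author (A) / paper (P) graph, meta-path A-P-A.
g = {"a1": [("p1", "P")], "p1": [("a1", "A"), ("a2", "A")], "a2": [("p1", "P")]}
print(metapath_walk(g, "a1", ["A", "P"], 5))
```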
Fully supervised semantic segmentation learns from dense masks, which incurs a heavy annotation cost in the closed-set setting. In this paper, we use natural language as supervision, without any pixel-level annotation, for open-world segmentation. We call the proposed framework FreeSeg, in which masks are obtained for free from the raw feature maps of a pre-trained model. Compared with zero-shot or open-set segmentation, FreeSeg requires no annotated masks and can broadly predict categories beyond what class-agnostic unsupervised segmentation offers. Specifically, FreeSeg obtains free masks from the Image-Text Similarity Map (ITSM) of an Interpretable Contrastive Language-Image Pre-training (ICLIP) model. Our core improvements are smoothed min pooling for dense ICLIP, together with partial-label and pixel-level strategies for segmentation. Moreover, FreeSeg is simple, with no complex designs such as grouping, clustering, or retrieval. Besides its simplicity, FreeSeg surpasses the previous state of the art by large margins, e.g., by 13.4% mIoU in the same setting.
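A minimal sketch of how a "free" mask could be read off an image-text similarity map by normalizing and thresholding patch-text cosine similarities; the feature shapes, the threshold, and the random inputs are assumptions, not the paper's exact recipe (which relies on smoothed min pooling and partial-label strategies).

```python
import numpy as np

def free_mask_from_itsm(patch_feats, text_emb, grid_hw, thresh=0.5):
    """Binarize a patch-level image-text similarity map (ITSM) into a
    coarse segmentation mask. patch_feats: (H*W, D); text_emb: (D,)."""
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    sim = (p @ t).reshape(grid_hw)                        # raw ITSM
    sim = (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)
    return sim >= thresh                                  # binary mask

# Toy usage with random features (real ones would come from ICLIP).
mask = free_mask_from_itsm(np.random.randn(16 * 16, 64),
                           np.random.randn(64), (16, 16))
print(mask.shape, mask.sum())
```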
Contrastive Language-Image Pre-training (CLIP) learns rich representations via readily available natural language supervision. It improves general performance on downstream vision tasks, including but not limited to zero-shot recognition, long-tail classification, segmentation, retrieval, captioning, and video. However, to the best of our knowledge, the visual interpretability of CLIP has not yet been studied. To provide visual explanations for its predictions, we propose the Image-Text Similarity Map (ITSM). Based on it, we surprisingly find that CLIP prefers background regions over the foreground and produces erroneous visualizations with respect to human understanding. Experimentally, we find that the devil is in the pooling part, where inappropriate pooling methods lead to a phenomenon called semantic shift. To correct and improve the visualization results, we propose masked max pooling, using the attention map from a self-supervised image encoder. Meanwhile, the interpretability task and the recognition task require different representations; to resolve this, we propose dual prediction to meet the requirement. We integrate the above methods as Interpretable Contrastive Language-Image Pre-training (ICLIP). Experiments show that ICLIP greatly improves interpretability: for example, on the VOC 2012 dataset, the nontrivial improvements are $32.85\%$ and $49.10\%$, respectively.
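To make the pooling argument concrete, here is a toy contrast between mean pooling, where dominant background patches can shift the image-level score (the "semantic shift"), and masked max pooling gated by an attention mask. The random similarity map and the way the mask is built are assumptions.

```python
import numpy as np

def mean_pool(sim_map):
    """Conventional mean pooling: every patch votes, so abundant
    background similarities can dominate the image-level score."""
    return sim_map.mean()

def masked_max_pool(sim_map, attn_mask):
    """Masked max pooling: only patches kept by an attention mask
    (e.g., from a self-supervised encoder) compete via max."""
    return sim_map[attn_mask].max()

sim = np.random.rand(14, 14)     # stand-in patch-text similarity map
mask = sim > sim.mean()          # stand-in foreground attention mask
print(mean_pool(sim), masked_max_pool(sim, mask))
```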
Deep hashing has been extensively utilized in massive image retrieval because of its efficiency and effectiveness. However, deep hashing models are vulnerable to adversarial examples, making it essential to develop adversarial defense methods for image retrieval. Existing solutions achieve limited defense performance because they use weak adversarial samples for training and lack discriminative optimization objectives for learning robust features. In this paper, we present a min-max based Center-guided Adversarial Training, namely CgAT, to improve the robustness of deep hashing networks through worst-case adversarial examples. Specifically, we first formulate the center code as a semantically discriminative representative of the input image content, which preserves the semantic similarity with positive samples and the dissimilarity with negative examples. We prove that the center code can be obtained immediately through a closed-form mathematical formula. After the center codes are obtained in each optimization iteration of the deep hashing network, they are adopted to guide the adversarial training process. On the one hand, CgAT generates the worst-case adversarial examples as augmented data by maximizing the Hamming distance between the hash codes of the adversarial examples and the center codes. On the other hand, CgAT learns to mitigate the effects of adversarial samples by minimizing the Hamming distance to the center codes. Extensive experiments on benchmark datasets demonstrate the effectiveness of our adversarial training algorithm in defending against adversarial attacks on deep hashing-based retrieval. Compared with the current state-of-the-art defense method, we significantly improve the defense performance by an average of 18.61%, 12.35%, and 11.56% on FLICKR-25K, NUS-WIDE, and MS-COCO, respectively.
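A hedged sketch of the inner maximization: PGD-style steps that push a tanh-relaxed hash code away from its center code in a surrogate Hamming distance. The linear "hashing network", code length, and step sizes are illustrative assumptions, not CgAT's actual settings.

```python
import torch
import torch.nn as nn

def hamming_surrogate(h, c):
    """For ±1 codes of length K, Hamming distance = (K - <h, c>) / 2."""
    return (h.shape[-1] - (h * c).sum(-1)) / 2

def center_guided_attack(model, x, center, eps=8/255, alpha=2/255, steps=10):
    """PGD-style ascent on the Hamming distance to the center code,
    using tanh as a smooth stand-in for the sign() binarization."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        h = torch.tanh(model(x + delta))
        hamming_surrogate(h, center).mean().backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # ascent: maximize distance
            delta.clamp_(-eps, eps)              # stay in the eps-ball
        delta.grad.zero_()
    return (x + delta).detach()

# Toy usage: a linear "hashing network" producing 16-bit codes.
net = nn.Linear(32, 16)
x = torch.rand(4, 32)
center = torch.sign(torch.randn(16))
x_adv = center_guided_attack(net, x, center)
print(hamming_surrogate(torch.tanh(net(x_adv)), center))
```

The outer minimization would then train the network on `x_adv` while minimizing the same surrogate distance to the center codes.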
Surgical phase recognition is a fundamental task in computer-assisted surgery systems. Most existing works are supervised by expensive and time-consuming full annotations, which require surgeons to repeatedly watch videos to find the precise start and end time of each surgical phase. In this paper, we introduce timestamp supervision for surgical phase recognition, training models with timestamp annotations in which surgeons are asked to identify only a single timestamp within the temporal boundary of a phase. This annotation scheme significantly reduces the manual annotation cost compared to full annotation. To make full use of such timestamp supervision, we propose a novel method called uncertainty-aware temporal diffusion (UATD) to generate trustworthy pseudo labels for training. UATD is motivated by a property of surgical videos: phases are long events consisting of consecutive frames. Specifically, UATD iteratively diffuses the single labelled timestamp to its high-confidence (i.e., low-uncertainty) neighbouring frames. Our study uncovers unique insights into surgical phase recognition with timestamp supervision: 1) timestamp annotation reduces annotation time by 74% compared with full annotation, and surgeons tend to annotate timestamps near the middle of phases; 2) extensive experiments demonstrate that our method achieves results competitive with fully supervised methods while reducing the manual annotation cost; 3) less is more in surgical phase recognition, i.e., fewer but discriminative pseudo labels outperform full ones that contain ambiguous frames; 4) the proposed UATD can be used as a plug-and-play method to clean ambiguous labels near the boundaries between phases, improving the performance of current surgical phase recognition methods.
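A minimal sketch of the diffusion idea: starting from the single annotated frame, pseudo labels grow outwards while neighbouring frames stay below an uncertainty threshold. The threshold, the per-frame uncertainties, and the single-pass expansion (the paper's version is iterative) are assumptions.

```python
import numpy as np

def diffuse_timestamp(uncertainty, t, label, thresh=0.2):
    """Grow pseudo labels outwards from the single annotated frame t
    while neighbouring frames stay below the uncertainty threshold."""
    pseudo = np.full(len(uncertainty), -1)   # -1 marks "unlabelled"
    lo, hi = t, t
    while lo - 1 >= 0 and uncertainty[lo - 1] < thresh:
        lo -= 1
    while hi + 1 < len(uncertainty) and uncertainty[hi + 1] < thresh:
        hi += 1
    pseudo[lo:hi + 1] = label
    return pseudo

u = np.array([0.9, 0.1, 0.05, 0.1, 0.8, 0.1])   # toy per-frame uncertainty
print(diffuse_timestamp(u, t=2, label=3))        # -> [-1  3  3  3 -1 -1]
```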
Cascade prediction aims to model information diffusion in a network. Most previous methods focus on mining structural or sequential features from the network and the propagation paths. Recent efforts combine network structure and sequential features with graph neural networks and recurrent neural networks. However, the limitations of spectral and spatial methods cap the improvement in prediction performance. Moreover, recurrent neural networks are time-consuming and computationally expensive, which makes prediction inefficient. Here we propose CCasGNN, a novel method that considers individual profiles, structural features, and sequence information. The method uses a collaborative framework of GAT and GCN and stacks positional encodings into the graph neural network layers, which differs from all existing approaches and shows good performance. Experiments conducted on two real-world datasets confirm that our method significantly improves prediction accuracy compared with state-of-the-art approaches. Furthermore, an ablation study investigates the contribution of each component in our method.
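A loose sketch of the described collaboration, under assumptions: dense adjacency, positional encodings added to node features, a mean-aggregating GCN branch and an attention-based GAT branch processing the same graph, and their messages summed. All sizes and the fusion rule are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CollabLayer(nn.Module):
    """Toy layer in the spirit of a GAT+GCN collaboration."""
    def __init__(self, dim):
        super().__init__()
        self.w_gcn = nn.Linear(dim, dim)
        self.w_gat = nn.Linear(dim, dim)
        self.att = nn.Linear(2 * dim, 1)

    def forward(self, x, adj, pos_enc):
        x = x + pos_enc                              # stacked positional encoding
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        gcn = self.w_gcn(adj @ x / deg)              # mean-aggregating GCN branch
        h = self.w_gat(x)
        n = x.size(0)
        e = self.att(torch.cat([h.unsqueeze(1).expand(n, n, -1),
                                h.unsqueeze(0).expand(n, n, -1)], -1)).squeeze(-1)
        e = e.masked_fill(adj == 0, float('-inf'))   # attend to neighbours only
        gat = torch.softmax(e, -1) @ h               # attention-based GAT branch
        return torch.relu(gcn + gat)

# Toy usage: 4 users, self-loops so every row can attend somewhere.
adj = ((torch.eye(4) + torch.bernoulli(torch.full((4, 4), 0.5))) > 0).float()
layer = CollabLayer(8)
print(layer(torch.randn(4, 8), adj, torch.randn(4, 8)).shape)
```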
Automatic delineation of organs at risk (OARs) and gross tumor volumes (GTVs) is of great significance for radiotherapy planning. However, learning powerful representations for accurate delineation under limited pixel-(voxel-)wise annotations is a challenging task. Contrastive learning at the pixel level can alleviate the dependency on annotations by learning dense representations from unlabeled data. Recent studies in this direction design various contrastive losses on feature maps to produce discriminative features for each pixel in the map. However, pixels in the same map inevitably share semantics, which may affect the discrimination of pixels within the same map and lead to unfair comparisons with pixels in other maps. To address these issues, we propose a separated region-level contrastive learning scheme, namely SepaReg, whose core is to separate each image into regions and encode each region separately. Specifically, SepaReg consists of two components: a structure-aware image separation (SIS) module and an intra- and inter-organ distillation (IID) module. SIS operates on the image set to rebuild a region set under the guidance of structural information. The inter-organ representation is learned from this set via a typical contrastive loss across regions. The IID module addresses the quantity imbalance in the region set, as tiny organs may produce fewer regions, by exploiting intra-organ representations. We conducted extensive experiments to evaluate the proposed model on a public dataset and two private datasets. The experimental results demonstrate the effectiveness of the proposed model, which consistently achieves better performance than state-of-the-art approaches. Code is available at https://github.com/jcwang123/separate_cl.
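A sketch of a cross-region contrastive objective under the stated setup: region embeddings from the same organ act as positives and all other regions as negatives. This is a generic supervised-InfoNCE form, not necessarily SepaReg's exact loss; the temperature and toy data are assumptions.

```python
import torch
import torch.nn.functional as F

def region_contrastive_loss(z, organ_ids, tau=0.1):
    """Cross-region InfoNCE: same-organ regions are positives."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    self_mask = torch.eye(len(z), dtype=torch.bool)
    pos = ((organ_ids.unsqueeze(0) == organ_ids.unsqueeze(1)) & ~self_mask).float()
    sim = sim.masked_fill(self_mask, -1e9)            # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    return -((log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)).mean()

# Toy usage: 6 region embeddings from 3 organs (2 regions each).
z = torch.randn(6, 32)
organs = torch.tensor([0, 0, 1, 1, 2, 2])
print(region_contrastive_loss(z, organs))
```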
Regularization-based methods are beneficial for alleviating the catastrophic forgetting problem in class-incremental learning. Lacking old-task images, they usually assume that old knowledge is well preserved if the classifier produces similar outputs on new images. In this paper, we find that their effectiveness largely depends on the nature of the old classes: they work well on classes that are easily distinguishable from each other, but may fail on more fine-grained ones, e.g., boys and girls. In spirit, such methods project new data into the feature space spanned by the weight vectors in the fully connected layer that correspond to old classes. The resulting projections are similar across fine-grained old classes, and as a consequence, the new classifier gradually loses its discriminative ability on these classes. To address this issue, we propose a memory-free generative replay strategy that preserves fine-grained old-class features by generating representative old images directly from the old classifier and combining them with new data for new-classifier training. To solve the homogenization problem of the generated samples, we also propose a diversity loss that maximizes the Kullback-Leibler (KL) divergence between generated samples. Our method is best complemented by prior regularization-based methods, which are proved effective for easily distinguishable old classes. We validate the above designs and insights on CUB-200-2011, Caltech-101, CIFAR-100, and Tiny ImageNet, and show that our strategy outperforms existing memory-free methods by a clear margin. Code is available at https://github.com/xmengxin/MFGR.
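A minimal sketch of the diversity term as described: the mean pairwise KL divergence between the predictive distributions of the generated samples, negated so that minimizing the loss maximizes diversity. The batch size and class count are illustrative.

```python
import torch
import torch.nn.functional as F

def diversity_loss(logits):
    """Anti-homogenization term: negative mean pairwise KL divergence
    between generated samples' predictive distributions."""
    p = F.softmax(logits, dim=1)
    logp = F.log_softmax(logits, dim=1)
    # kl[i, j] = KL(p_i || p_j), via broadcasting over all pairs
    kl = (p.unsqueeze(1) * (logp.unsqueeze(1) - logp.unsqueeze(0))).sum(-1)
    return -kl.mean()

# Toy usage: logits the old classifier assigns to 8 generated images.
print(diversity_loss(torch.randn(8, 100)))
```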
Predicting information diffusion on social networks is of great practical significance for marketing and public-opinion monitoring. It aims to predict which individuals are likely to repost a message on a social network. One type of method builds an interpretable model to simulate and predict the propagation process based on demographics, complex networks, and other prior knowledge, while another type is fully data-driven and maps nodes into a latent space for propagation prediction. Existing latent-space designs and embedding methods fail to capture the interactions among users. In this paper, we propose an independent asymmetric embedding method that embeds each individual into one latent influence space and multiple latent susceptibility spaces. Based on the similarity between information diffusion and the heat-diffusion phenomenon, a heat-diffusion kernel is exploited in our model to establish the embedding rules. Furthermore, our method captures the co-occurrence regulation of user combinations in cascades to improve calculation effectiveness. Extensive experimental results on real-world datasets verify the prediction accuracy and cost-effectiveness of our method.
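A hedged sketch of the heat-kernel pairing between an influence embedding and a susceptibility embedding, using the standard Gaussian heat kernel exp(-||x - y||² / 4t). The paper's exact embedding rule and its use of multiple susceptibility spaces are not reproduced here.

```python
import numpy as np

def activation_prob(influence_u, suscept_v, t=1.0):
    """Heat-kernel score between u's influence embedding and v's
    susceptibility embedding; swapping the roles of u and v generally
    yields a different value, hence 'asymmetric'."""
    d2 = np.sum((influence_u - suscept_v) ** 2)
    return np.exp(-d2 / (4.0 * t))

iu, sv = np.random.randn(16), np.random.randn(16)
iv, su = np.random.randn(16), np.random.randn(16)
print(activation_prob(iu, sv), activation_prob(iv, su))   # u->v vs v->u
```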
Training deep convolutional neural networks usually requires a large amount of labeled data. However, it is expensive and time-consuming to annotate data for medical image segmentation tasks. In this paper, we present a novel uncertainty-aware semi-supervised framework for left atrium segmentation from 3D MR images. Our framework effectively leverages the unlabeled data by encouraging consistent predictions of the same input under different perturbations. Concretely, the framework consists of a student model and a teacher model, and the student model learns from the teacher model by minimizing a segmentation loss and a consistency loss with respect to the targets of the teacher model. We design a novel uncertainty-aware scheme that enables the student model to gradually learn from meaningful and reliable targets by exploiting the uncertainty information. Experiments show that our method achieves high performance gains by incorporating the unlabeled data. Our method outperforms the state-of-the-art semi-supervised methods, demonstrating the potential of our framework for challenging semi-supervised problems.
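A minimal sketch of the uncertainty-aware consistency term: predictive entropy from Monte-Carlo dropout passes of the teacher gates a voxel-wise MSE between student and teacher predictions. The fixed entropy threshold and toy shapes are assumptions; in practice such a threshold may be scheduled during training.

```python
import torch

def uncertainty_masked_consistency(student_logits, teacher_mc_probs, thresh=0.7):
    """teacher_mc_probs: (T, N, C) softmax outputs from T Monte-Carlo
    dropout passes of the teacher; entropy gates the MSE targets."""
    mean_p = teacher_mc_probs.mean(0)                        # (N, C)
    entropy = -(mean_p * (mean_p + 1e-8).log()).sum(-1)      # (N,)
    mask = (entropy < thresh).float()                        # keep reliable targets
    mse = ((torch.softmax(student_logits, -1) - mean_p) ** 2).sum(-1)
    return (mask * mse).sum() / mask.sum().clamp(min=1)

# Toy usage: 5 voxels, 2 classes, 8 stochastic teacher passes.
student = torch.randn(5, 2)
teacher = torch.softmax(torch.randn(8, 5, 2), dim=-1)
print(uncertainty_masked_consistency(student, teacher))
```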