Automatically detecting lung infections from computed tomography (CT) data plays an important role in combating COVID-19. However, several challenges remain in developing such AI systems. 1) Most current COVID-19 infection segmentation methods rely on 2D CT images, which lack 3D sequential constraints. 2) Existing 3D CT segmentation methods focus on single-scale representations, which do not provide receptive fields of multiple sizes on 3D volumes. 3) The sudden outbreak of COVID-19 makes it hard to annotate enough CT volumes to train a deep model. To address these issues, we first build a multiple dimensional-attention convolutional neural network (MDA-CNN) that aggregates multi-scale information along different dimensions of the input feature maps and imposes supervision on multiple predictions from different CNN layers. Second, we use this MDA-CNN as the basic network of a novel dual multi-scale mean teacher network (DM${^2}$T-Net) for semi-supervised COVID-19 lung infection segmentation on CT volumes, which leverages unlabeled data and exploits multi-scale information. DM${^2}$T-Net encourages the multiple predictions at different CNN layers of the student and teacher networks to be consistent, yielding a multi-scale consistency loss on unlabeled data, which is then added to the supervised loss computed on labeled data from the multiple predictions of MDA-CNN. Third, we collect two COVID-19 segmentation datasets to evaluate our method. The experimental results show that our network consistently outperforms the compared state-of-the-art methods.
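The mean teacher idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the teacher weights are an exponential moving average of the student weights (the standard mean teacher recipe), uses mean squared error as the consistency measure, and the function names, the `alpha` decay, and the `weight` factor are all hypothetical choices made for the example.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the matching student weight (EMA).

    Assumption: weights are flat lists of floats; alpha is a hypothetical decay.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def mse(a, b):
    """Mean squared error between two equally sized prediction lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_scale_consistency(student_preds, teacher_preds):
    """Average the per-scale MSE between student and teacher predictions.

    Each argument is a list with one prediction list per CNN layer (scale).
    """
    per_scale = [mse(s, t) for s, t in zip(student_preds, teacher_preds)]
    return sum(per_scale) / len(per_scale)


def total_loss(supervised, consistency, weight=0.1):
    """Add the weighted consistency term (unlabeled data) to the supervised loss."""
    return supervised + weight * consistency


# Two scales: the first disagrees on one voxel, the second agrees exactly.
student = [[1.0, 0.0], [0.5]]
teacher = [[0.0, 0.0], [0.5]]
consistency = multi_scale_consistency(student, teacher)
loss = total_loss(supervised=1.0, consistency=consistency)
```

In a real training loop the predictions would be tensors produced by the two networks, and the teacher would be updated with `ema_update` after every student optimization step.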
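Medical image segmentation under federated learning (FL) is a promising direction because it allows multiple clinical sites to collaboratively learn a global model without centralizing their datasets. However, adapting a single model to the varied data distributions of different sites is extremely challenging. Personalized FL tackles this issue by using only part of the model parameters shared by the global server, while keeping the rest to adapt to each site's own data distribution during local training. Most existing methods, however, concentrate on how to split the parameters and do not consider inter-site inconsistencies during local training, which can in fact facilitate knowledge exchange across sites and thereby improve local accuracy. In this paper, we propose a personalized federated framework with Local Calibration (LC-Fed) that leverages inter-site inconsistencies at both the feature level and the prediction level to boost segmentation. Concretely, since each local site attends to different features, we first design a contrastive site embedding coupled with a channel selection operation to calibrate the encoded features. Moreover, we propose to exploit prediction-level inconsistency to guide personalized modeling in ambiguous regions, such as anatomical boundaries. This is achieved by computing a disagreement-aware map to calibrate the prediction. The effectiveness of our method has been verified on three medical image segmentation tasks with different modalities, where it consistently outperforms state-of-the-art personalized FL methods. Code is available at https://github.com/jcwang123/fedlc.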
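Breast lesion detection in ultrasound is critical for breast cancer diagnosis. Existing methods mainly rely on individual 2D ultrasound images, or combine unlabeled videos with labeled 2D images, to train models for breast lesion detection. In this paper, we first collect and annotate an ultrasound video dataset (188 videos) for breast lesion detection. We then propose a clip-level and video-level feature aggregation network (CVA-Net) that addresses breast lesion detection in ultrasound videos by aggregating video-level lesion classification features and clip-level temporal features. The clip-level temporal features encode local temporal information from ordered video frames and global temporal information from shuffled video frames. In CVA-Net, an inter-video fusion module fuses local features from the original video frames with global features from the shuffled video frames, and an intra-video fusion module learns temporal information among adjacent video frames. In addition, we learn video-level features to classify the breast lesions of the original video as benign or malignant, which further improves the final lesion detection performance in ultrasound videos. Experimental results on our annotated dataset show that CVA-Net clearly outperforms state-of-the-art methods. The corresponding code and dataset are publicly available at https://github.com/jhl-det/cva-net.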
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps, or events, leaving out fine-grained interaction details about the surgical activity; yet those details are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as <instrument, verb, target> triplets delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from competing teams are presented; they recognize surgical action triplets directly from surgical videos and achieve mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison and an in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and it highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
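Automatic delineation of organs at risk (OAR) and gross tumor volume (GTV) is of great significance for radiotherapy planning. However, learning powerful representations for accurate delineation under limited pixel-wise (voxel-wise) annotations is a challenging task. Pixel-level contrastive learning can alleviate the dependency on annotations by learning dense representations from unlabeled data. Recent studies in this direction design various contrastive losses on feature maps to yield discriminative features for each pixel in the map. However, pixels in the same map inevitably share semantics and therefore appear closer than they actually are, which may hurt the discrimination of pixels within the same map and lead to unfair comparisons with pixels in other maps. To address these issues, we propose a separated region-level contrastive learning scheme, namely SepaReg, whose core idea is to separate each image into regions and encode each region separately. Specifically, SepaReg comprises two components: a structure-aware image separation (SIS) module and an intra- and inter-organ distillation (IID) module. SIS operates on the image set to rebuild a region set under the guidance of structural information. Inter-organ representations are then learned from this set via typical contrastive losses across regions. IID, in turn, tackles the quantity imbalance in the region set, which arises because tiny organs may produce fewer regions, by exploiting intra-organ representations. We conduct extensive experiments to evaluate the proposed model on a public dataset and two private datasets. The experimental results demonstrate the effectiveness of the proposed model, which consistently achieves better performance than state-of-the-art approaches. Code is available at https://github.com/jcwang123/separate_cl.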
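We propose a novel shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection (ESD) surgery. This task is of great clinical significance but extremely challenging due to bleeding, lighting reflection, and motion blur in the complicated surgical environment. Compared with existing solutions, which either neglect the geometric relationships among target objects or capture those relationships with complicated aggregation schemes, the proposed network achieves satisfactory accuracy while maintaining real-time performance by taking full advantage of the spatial relations among landmarks. We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent prior knowledge of the spatial relations among landmarks without any extra manual annotation effort. We then develop two complementary regularization schemes to progressively incorporate this prior knowledge into the training process. One scheme introduces pixel-level regularization through multi-task learning, while the other integrates global-level regularization through a newly designed grouped consistency evaluator, which adds relation constraints to the proposed network in an adversarial manner. Both schemes benefit the model during training and can be readily removed at inference time to achieve real-time detection. We establish a large in-house dataset of ESD surgery for esophageal cancer to validate the effectiveness of our proposed method. Extensive experimental results demonstrate that our approach outperforms state-of-the-art methods in both accuracy and efficiency, achieving better detection results faster. Promising results on two downstream applications further corroborate the great potential of our method in ESD clinical practice.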
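Pre-training lays the foundation for recent successes in deep-learning-based radiograph analysis. It learns transferable image representations by conducting large-scale fully supervised or self-supervised learning on a source domain. However, supervised pre-training requires a complex and labor-intensive two-stage human-assisted annotation process, while self-supervised learning cannot compete with the supervised paradigm. To tackle these issues, we propose a cross-supervised methodology named Reviewing Free-text Reports for Supervision (REFERS), which acquires free supervision signals from the original radiology reports accompanying the radiographs. The proposed approach employs a vision transformer and is designed to learn joint representations from the multiple views within each patient study. Under extremely limited supervision, REFERS outperforms its transfer learning and self-supervised learning counterparts on 4 well-known X-ray datasets. Moreover, REFERS even surpasses methods based on a source domain of radiographs with human-assisted structured labels. It therefore has the potential to replace canonical pre-training methodologies.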
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of the multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) serve as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations, which have few known triples for training. In light of this, few-shot KG completion (FKGC), which combines the strengths of graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges and commonly used KGs and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) that fully explores the relationship between support and query features using a Transformer-like architecture. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. With these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
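As a rough illustration of how a support mask can yield a class center that re-weights query features, the sketch below uses masked average pooling followed by channel-wise scaling. Both choices are assumptions made for this example: the abstract does not state which pooling or weighting operation RefT uses, and the function names and the flat per-pixel feature layout are hypothetical.

```python
def masked_average_pooling(features, mask):
    """Average the feature vectors of the pixels selected by a binary mask.

    features: list of per-pixel channel vectors (hypothetical flat layout).
    mask: list of 0/1 flags, one per pixel, marking the support object.
    """
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no pixels")
    channels = len(selected[0])
    return [sum(f[c] for f in selected) / len(selected) for c in range(channels)]


def reweight(query_features, center):
    """Scale every query feature vector channel-wise by the class center."""
    return [[v * w for v, w in zip(f, center)] for f in query_features]


# Three support pixels with two channels each; the mask keeps pixels 0 and 2.
support = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
center = masked_average_pooling(support, [1, 0, 1])
weighted = reweight([[1.0, 1.0]], center)
```

Because the center is computed only from pixels inside the mask, background pixels in the support image do not dilute the class representation, which is the usual motivation for mask-guided pooling.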