Accurate and automated tumor segmentation plays an important role in both clinical practice and radiomics research. In medicine, segmentation is still often performed manually by experts, which is a laborious, expensive, and error-prone task. Manual annotation relies heavily on the experience and knowledge of these experts, and there is considerable intra- and inter-observer variation. It is therefore of great significance to develop a method that automatically segments tumor target regions. In this paper, we propose a deep learning segmentation method based on multimodal positron emission tomography-computed tomography (PET-CT), which combines the high sensitivity of PET with the precise anatomical information of CT. We design an improved spatial attention network (ISA-Net) to increase the accuracy of PET or CT in detecting tumors; it uses multi-scale convolution operations to extract feature information, highlighting tumor-region location information while suppressing non-tumor-region location information. In addition, our network uses dual-channel inputs in the encoding stage and fuses them in the decoding stage, which exploits the differences and complementarities between PET and CT. We validated the proposed ISA-Net on two clinical datasets, a soft tissue sarcoma (STS) dataset and a head and neck tumor (HECKTOR) dataset, and compared it with other attention methods for tumor segmentation. DSC scores of 0.8378 on the STS dataset and 0.8076 on the HECKTOR dataset show that ISA-Net achieves better segmentation performance and generalizes better. Conclusions: the proposed method performs multi-modal medical image tumor segmentation and effectively exploits the differences and complementarities between modalities; it can also be applied to other multi-modal or single-modal data with proper adjustments.
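As a rough illustration of the multi-scale spatial attention idea described in this abstract, here is a minimal PyTorch sketch. The module name, the (avg, max) channel pooling, and the kernel sizes are assumptions for illustration, not the authors' ISA-Net implementation:

```python
import torch
import torch.nn as nn

class MultiScaleSpatialAttention(nn.Module):
    """Hypothetical multi-scale spatial attention block (not the authors' exact ISA-Net).

    Channel-pooled statistics are filtered at several kernel sizes and fused
    into a single spatial map that re-weights the input feature map.
    """
    def __init__(self, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Each branch maps the 2-channel (avg, max) pooled map to a 1-channel response.
        self.branches = nn.ModuleList(
            nn.Conv2d(2, 1, k, padding=k // 2, bias=False) for k in kernel_sizes
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_pool = x.mean(dim=1, keepdim=True)          # (B, 1, H, W)
        max_pool = x.amax(dim=1, keepdim=True)          # (B, 1, H, W)
        stats = torch.cat([avg_pool, max_pool], dim=1)  # (B, 2, H, W)
        attn = self.sigmoid(sum(b(stats) for b in self.branches))
        return x * attn  # emphasize tumor-like regions, suppress the rest
```

In a dual-encoder setup as the abstract describes, one such block could sit in each of the PET and CT encoder paths before the decoder fuses them.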
Effective scaling and a flexible task interface enable large language models to excel at many tasks. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaLI, we make use of large pre-trained encoder-decoder language models and Vision Transformers (ViTs). This allows us to capitalize on their existing capabilities and leverage the substantial cost of training them. We find that joint scaling of the vision and language components is important. Since existing Transformers for language are much larger than their vision counterparts, we train the largest ViT to date (ViT-e) to quantify the benefits of even larger-capacity vision models. To train PaLI, we create a large multilingual mix based on a new image-text training set containing 10B images and texts in over 100 languages. PaLI achieves state-of-the-art results on multiple vision and language tasks (such as captioning, visual question answering, and scene-text understanding), while retaining a simple, modular, and scalable design.
This work presents a simple vision transformer design as a strong baseline for object localization and instance segmentation tasks. Transformers have recently demonstrated competitive performance in image classification. To adapt ViT to object detection and dense prediction tasks, many works inherit the multi-stage design from convolutional networks and build highly customized ViT architectures. Behind this design, the goal is a better trade-off between computational cost and effective aggregation of multi-scale global context. However, existing works adopt the multi-stage architectural design as a black-box solution, without a clear understanding of its real benefits. In this paper, we comprehensively study three architectural design choices on ViT (spatial reduction, doubled channels, and multi-scale features) and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafted multi-scale features, maintaining the original ViT design philosophy. We further complete a scaling rule to optimize the model's trade-off between accuracy and computational cost / model size. By leveraging a constant feature resolution and hidden size throughout the encoder blocks, we propose a simple and compact ViT architecture called the Universal Vision Transformer (UViT) that achieves strong performance on COCO object detection and instance segmentation tasks.
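To make the single-scale idea concrete, here is a hedged sketch (not the authors' code) of a plain ViT backbone whose token resolution and hidden size never change, reshaped into one feature map for a localization head; all hyperparameters below are illustrative:

```python
import torch
import torch.nn as nn

class SingleScaleViTBackbone(nn.Module):
    """Sketch of a UViT-style backbone: constant token resolution and hidden
    size through all blocks; hyperparameters here are illustrative only."""
    def __init__(self, img_size=256, patch=16, dim=384, depth=12, heads=6):
        super().__init__()
        self.grid = img_size // patch
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos
        tokens = self.encoder(tokens)  # resolution and width never change
        B, N, C = tokens.shape
        # One single-scale feature map for the detection / segmentation head.
        return tokens.transpose(1, 2).reshape(B, C, self.grid, self.grid)
```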
Real-world data often follows a long-tailed distribution, which degrades the performance of existing classification algorithms. A key problem is that samples in the tail categories fail to depict the full diversity of their classes. Humans can imagine a sample in new poses, scenes, and viewing angles, even when seeing the category for the first time. Inspired by this, we propose a novel reasoning-based implicit semantic data augmentation method that borrows transformation directions from other classes. Since the covariance matrix of each category represents its feature transformation directions, we can sample new directions from similar categories to generate definitively different instances. Specifically, the long-tailed data is first used to train the backbone and classifier. Then, the covariance matrix of each category is estimated, and a knowledge graph is built to store the relations between any two categories. Finally, tail samples are adaptively augmented by propagating information from all similar categories in the knowledge graph. Experimental results on CIFAR-100-LT, ImageNet-LT, and iNaturalist 2018 demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.
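A minimal sketch of the covariance-borrowing idea, assuming plain per-class feature covariance estimation; the knowledge-graph propagation and adaptive weighting from the abstract are omitted for brevity:

```python
import numpy as np

def augment_tail_features(tail_feats, similar_class_feats, n_new, rng=None):
    """Hedged sketch of covariance-based implicit augmentation (not the
    authors' exact procedure): sample transformation directions from a
    similar head class's feature covariance and add them to tail features.

    tail_feats, similar_class_feats: arrays of shape (n_samples, n_features).
    """
    rng = rng or np.random.default_rng(0)
    # Covariance of a similar, well-populated class encodes its variation directions.
    cov = np.cov(similar_class_feats, rowvar=False)
    base = tail_feats[rng.integers(len(tail_feats), size=n_new)]
    directions = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_new)
    return base + directions  # virtual tail samples with head-class diversity
```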
Unsupervised salient object detection (USOD) is of great significance for industrial applications and downstream tasks. Existing deep learning (DL) based USOD methods take low-quality saliency predictions extracted by several traditional SOD methods as saliency cues, which mainly capture some of the salient regions in an image. They then refine these saliency cues with the assistance of semantic information obtained from models trained by supervised learning on other related vision tasks. In this work, we propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues and uses these cues to train a robust saliency detector; more importantly, no human annotation is involved anywhere in the training process. In the first stage, we transform a pretrained network (MoCo v2) to aggregate multi-level features into a single activation map, where an Adaptive Decision Boundary (ADB) is proposed to assist training of the transformed network. To facilitate the generation of high-quality pseudo labels, we propose a loss function to enlarge the feature distances between pixels and their means. In the second stage, an Online Label Rectifying (OLR) strategy updates the pseudo labels during training to reduce the negative impact of distractors. In addition, we construct a lightweight saliency detector using two Residual Attention Modules (RAMs), which refine high-level features using complementary information in low-level features, such as edges and colors. Extensive experiments on several SOD benchmarks prove that our framework achieves significant performance compared with existing USOD methods. Moreover, training our framework on 3000 images takes about one hour, over 30x faster than previous state-of-the-art methods.
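The loss for enlarging the distance between pixel features and their means could look roughly like the following sketch; the cosine formulation here is an assumption for illustration, not the A2S paper's exact loss:

```python
import torch.nn.functional as F

def mean_separation_loss(feats):
    """Hedged sketch: push pixel features away from their mean (in angle),
    so the activation map separates cleanly around the decision boundary.
    feats: (B, C, H, W) feature map."""
    pix = F.normalize(feats.flatten(2), dim=1)                 # (B, C, H*W), unit length
    mean = F.normalize(pix.mean(dim=2, keepdim=True), dim=1)   # (B, C, 1)
    cos = (pix * mean).sum(dim=1)                              # cosine to the mean feature
    # Driving |cos| toward 1 moves every pixel away from the boundary region
    # around the mean, which is where pseudo labels are most ambiguous.
    return (1.0 - cos.abs()).mean()
```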
Accurate organ-at-risk (OAR) segmentation is critical for radiotherapy, to reduce post-treatment complications. Guidelines recommend a set of more than 40 OARs in the head and neck (H&N) region; however, due to the predictably prohibitive labor cost of this task, most institutions choose a substantially simplified protocol, delineating a smaller subset of OARs and neglecting the dose distributions associated with the other OARs. In this work, we propose a novel, automated, and highly efficient stratified OAR segmentation (SOARS) system using deep learning that precisely delineates a comprehensive set of 42 H&N OARs. SOARS stratifies the 42 OARs into anchor, mid-level, and small-and-hard subcategories, with a neural network architecture specifically derived for each category via neural architecture search (NAS) principles. We built the SOARS model with 176 training patients from an internal institution and independently evaluated it on 1327 external patients across six different institutions. In each institutional evaluation, it consistently outperformed other state-of-the-art methods by at least 3-5% in Dice score (a 36% relative error reduction in other metrics). More importantly, an extensive multi-user study clearly demonstrated that 98% of SOARS predictions require only very minor or no revisions for direct clinical acceptance (saving 90% of the radiation oncologists' workload), and that their segmentation and dosimetric accuracy lie within or below the inter-user variation. These findings confirm strong clinical applicability to the OAR delineation process in H&N cancer radiotherapy workflows, with improved efficiency, comprehensiveness, and quality.
With the development of a series of galaxy sky surveys in recent years, observations have increased rapidly, making machine learning methods for galaxy image recognition a hot research topic. Existing automatic galaxy image recognition research is plagued by large differences in similarity between categories, imbalanced data between classes, and the discrepancy between the discrete representation of galaxy classes and the essentially gradual change from one morphological class to the adjacent class (DDRGC). These limitations have motivated several astronomers and machine learning experts to design projects with improved galaxy image recognition capabilities. Therefore, this paper proposes a novel learning method, "Hierarchical Imbalanced data learning with Weighted sampling and Label smoothing" (HIWL). HIWL consists of three key techniques, each addressing one of the problems above: (1) a hierarchical galaxy classification model based on an efficient backbone network; (2) a weighted sampling scheme to deal with the imbalance problem; and (3) a label smoothing technique to alleviate the DDRGC problem. We applied this method to galaxy photometric images from the Galaxy Zoo - The Galaxy Challenge, recognizing completely round smooth, in-between smooth, cigar-shaped, edge-on, and spiral galaxies. The overall classification accuracy is 96.32%, and HIWL shows advantages in recall, precision, and F1-score in comparison with related works. In addition, we explored visualizations of the galaxy image features and model attention to understand the foundations of the proposed scheme.
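Two of the three HIWL ingredients map directly onto standard PyTorch utilities; the following sketch is illustrative (the toy labels and the 0.1 smoothing factor are placeholders, not the paper's settings):

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from collections import Counter

# Toy imbalanced label list standing in for the real per-image class ids.
labels = [0] * 900 + [1] * 80 + [2] * 20

# Weighted sampling: inverse-frequency weight per sample, so rare galaxy
# classes are drawn as often as common ones.
class_count = Counter(labels)
weights = torch.tensor([1.0 / class_count[y] for y in labels])
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
# loader = DataLoader(dataset, batch_size=64, sampler=sampler)

# Label smoothing, as built into PyTorch's cross entropy (torch >= 1.10),
# softens the discrete class targets, easing the gradual-morphology (DDRGC) issue.
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
```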
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
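A hedged sketch of the core trick, assuming a patch embedding trained at one base size: FlexiViT resizes the embedding weights with a pseudo-inverse (PI) resize, for which plain bilinear interpolation is substituted here as a simplification:

```python
import torch.nn.functional as F

def flexible_patch_embed(images, weight, bias, patch_size):
    """Hedged sketch of patch-size randomization. FlexiViT uses a
    pseudo-inverse (PI) resize of the patch embedding; plain bilinear
    resize is used here as a simplification.
    weight: (dim, 3, p0, p0), trained at base patch size p0."""
    w = F.interpolate(weight, size=(patch_size, patch_size),
                      mode="bilinear", align_corners=False)
    return F.conv2d(images, w, bias, stride=patch_size)

# During training, sample a patch size per step from a fixed set, e.g.:
#   p = random.choice([8, 10, 12, 15, 16, 20, 24, 30, 40, 48])
#   tokens = flexible_patch_embed(batch, base_weight, base_bias, p)
#   tokens = tokens.flatten(2).transpose(1, 2)   # (B, N, dim), N varies with p
```

Note that positional embeddings must be resized to match the resulting token grid as well, which is omitted here.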
The remarkable progress of deep learning in recent years is largely driven by improvements in scale, where bigger models are trained on larger datasets for longer schedules. To predict the benefit of scale empirically, we argue for a more rigorous methodology based on extrapolation loss, rather than reporting the best-fitting (interpolating) parameters. We then present a recipe for reliably estimating scaling law parameters from learning curves. We demonstrate that it extrapolates more accurately than previous methods across a wide range of architecture families in several domains, including image classification, neural machine translation (NMT), and language modeling, in addition to tasks from the BIG-Bench evaluation benchmark. Finally, we release a benchmark dataset of 90 evaluation tasks to facilitate research in this area.
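As a hedged illustration of extrapolation-based evaluation, the following fits a saturating power law to the small-scale portion of a toy learning curve and scores it on held-out larger scales; the functional form and the numbers are illustrative, not the paper's exact estimator:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Saturating power law: loss decays as n^-b toward an irreducible floor c.
    return a * np.power(n, -b) + c

n = np.array([1e4, 3e4, 1e5, 3e5, 1e6, 3e6])         # toy training-set sizes
loss = np.array([3.00, 2.52, 2.12, 1.85, 1.63, 1.48])  # toy losses

# Fit only on the four smallest scales, then judge by extrapolation error
# on the two largest, rather than by in-sample (interpolating) fit quality.
(a, b, c), _ = curve_fit(power_law, n[:4], loss[:4], p0=(20.0, 0.3, 1.0), maxfev=10000)
extrapolation_error = np.abs(power_law(n[4:], a, b, c) - loss[4:])
print(f"fitted exponent b={b:.3f}, extrapolation error={extrapolation_error}")
```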
In recent years, convolutional neural networks (CNNs) have shown great potential for synthetic aperture radar (SAR) target recognition. SAR images have a strong sense of granularity and distinct texture features, such as speckle noise, target dominant scatterers, and target contours, which are rarely considered in traditional CNN models. This paper proposes two residual blocks, EMC2A blocks with multi-scale receptive fields (RFs) based on a multi-branch structure, and then designs an efficient isotope-architecture deep CNN (DCNN), EMC2A-Net. The EMC2A blocks utilize parallel dilated convolutions with different dilation rates, which can effectively capture multi-scale context features without significantly increasing the computational burden. To further improve the efficiency of multi-scale feature fusion, this paper proposes a multi-scale feature cross-channel attention module, the EMC2A module, which adopts a local multi-scale feature interaction strategy without dimensionality reduction. This strategy adaptively adjusts the weight of each channel through an efficient one-dimensional (1D) circular convolution and a sigmoid function, to guide attention at the global channel level. Comparison results on the MSTAR dataset show that EMC2A-Net outperforms existing models of the same type with a relatively lightweight network structure. Ablation results show that, using only a few parameters and appropriate cross-channel interaction, the EMC2A module significantly improves model performance.
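A minimal sketch combining the two ideas named in the abstract, parallel dilated convolutions and 1D circular-convolution channel attention; the structure and hyperparameters are assumptions, not the authors' EMC2A-Net:

```python
import torch
import torch.nn as nn

class EMC2ASketch(nn.Module):
    """Hedged sketch (not the authors' code): multi-branch dilated convolutions
    for multi-scale context, plus ECA-style channel attention via a 1D circular
    convolution with no dimensionality reduction."""
    def __init__(self, channels, dilations=(1, 2, 4), k=3):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False)
            for d in dilations
        )
        # 1D circular convolution over the channel dimension: local
        # cross-channel interaction without reducing dimensionality.
        self.channel_conv = nn.Conv1d(1, 1, k, padding=k // 2,
                                      padding_mode="circular", bias=False)

    def forward(self, x):
        feat = sum(b(x) for b in self.branches)            # multi-scale context
        gap = feat.mean(dim=(2, 3)).unsqueeze(1)           # (B, 1, C) channel stats
        w = torch.sigmoid(self.channel_conv(gap))          # (B, 1, C) channel weights
        w = w.transpose(1, 2).unsqueeze(-1)                # (B, C, 1, 1)
        return x + feat * w                                # residual block output
```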