Three-dimensional (3D) images, such as CT, MRI, and PET, are common in medical imaging applications and important in clinical diagnosis. Semantic ambiguity is a typical feature of many medical image labels. It can be caused by many factors, such as imaging properties, pathological anatomy, and the weak representation of binary masks, which brings challenges to accurate 3D segmentation. In 2D medical images, characterizing lesions with soft masks produced by image matting, instead of binary masks, provides rich semantic information, describes the structural characteristics of lesions more comprehensively, and thus benefits subsequent diagnosis and analysis. In this work, we introduce image matting into the 3D scene to describe lesions in 3D medical images. Research on image matting in 3D modalities is limited, and there is no high-quality annotated dataset related to 3D matting, which slows the development of data-driven deep learning approaches. To address this issue, we construct the first 3D medical matting dataset and convincingly verify its validity through quality control and a downstream experiment on lung nodule classification. We then adapt four selected state-of-the-art 2D image matting algorithms to the 3D scene and further customize the methods for CT images. In addition, we propose the first end-to-end deep 3D matting network and implement a solid 3D medical image matting benchmark, which will be released to encourage further research.
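To make the soft-mask idea concrete, here is a minimal NumPy sketch (ours, not the paper's code) of the per-voxel compositing model underlying matting, where each voxel intensity is a convex combination of foreground (lesion) and background values weighted by the alpha matte:

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (32, 32, 32)                      # a small synthetic CT patch
alpha = np.clip(rng.normal(0.5, 0.3, shape), 0.0, 1.0)   # ground-truth matte
F = rng.normal(1.0, 0.05, shape)          # lesion (foreground) intensities
B = rng.normal(0.0, 0.05, shape)          # parenchyma (background) intensities
I = alpha * F + (1 - alpha) * B           # observed volume: I = aF + (1 - a)B

# With F and B known (they are not, in practice), the matte is recoverable
# in closed form; a matting network must estimate it from I alone.
alpha_hat = np.clip((I - B) / (F - B + 1e-8), 0.0, 1.0)
print(np.abs(alpha_hat - alpha).mean())   # ~0: the toy matte is identifiable
```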
In practice, many medical datasets have an underlying taxonomy defined over the disease label space. However, existing algorithms for medical diagnosis typically assume semantically independent labels. In this study, we aim to leverage class hierarchies with deep learning algorithms for more accurate and reliable skin lesion recognition. We propose a hyperbolic network to jointly learn image embeddings and class prototypes. Hyperbolic space has been shown to provide a better geometry than Euclidean space for modeling hierarchical relations. Meanwhile, we constrain the distribution of the hyperbolic prototypes with a distance matrix encoded from the class hierarchy. The learned prototypes therefore preserve the semantic class relations in the embedding space, and we can predict an image's label by assigning its feature to the nearest hyperbolic class prototype. We verify our method on an in-house skin lesion dataset consisting of around 230k dermoscopy images of 65 skin diseases. Extensive experiments provide evidence that our model achieves higher accuracy and fewer severe classification errors than models that do not consider class relations.
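As a minimal illustration of nearest-prototype classification in hyperbolic space, the sketch below (our own, with made-up prototypes rather than ones learned from the class hierarchy) computes geodesic distances in the Poincaré ball and assigns an embedding to the closest class prototype:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def poincare_distance(x, y, eps=1e-6):
    """Geodesic distance in the Poincare ball model (curvature -1)."""
    sq = torch.sum((x - y) ** 2, dim=-1)
    nx = torch.clamp(1 - torch.sum(x * x, dim=-1), min=eps)
    ny = torch.clamp(1 - torch.sum(y * y, dim=-1), min=eps)
    return torch.acosh(1 + 2 * sq / (nx * ny))

# Five made-up class prototypes inside the 2-D Poincare ball; an image is
# labeled with the class of its nearest hyperbolic prototype.
prototypes = 0.7 * F.normalize(torch.randn(5, 2), dim=-1)
z = 0.3 * F.normalize(torch.randn(1, 2), dim=-1)   # an image embedding
pred = torch.argmin(poincare_distance(z, prototypes))
print(pred.item())
```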
Existing unsupervised domain adaptation methods based on adversarial learning have achieved good performance on multiple medical imaging tasks. However, these methods focus only on global distribution adaptation and ignore distribution constraints at the category level, which leads to suboptimal adaptation performance. This paper proposes an unsupervised domain adaptation framework based on category-level regularization that regularizes the category distribution from three perspectives. Specifically, for inter-domain category regularization, an adaptive prototype alignment module is proposed to align the feature prototypes of the same category in the source and target domains. In addition, for intra-domain category regularization, we tailor regularization techniques to the source and target domains respectively. In the source domain, a prototype-guided discriminative loss is proposed to learn more discriminative feature representations by enforcing intra-class compactness and inter-class separability, serving as a complement to the conventional supervised loss. In the target domain, an augmented-consistency category regularization loss is proposed to force the model to produce consistent predictions for augmented/unaugmented target images, which encourages semantically similar regions to receive the same label. Extensive experiments on two public fundus datasets demonstrate that the proposed method significantly outperforms other state-of-the-art comparison algorithms.
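A minimal sketch of the inter-domain prototype-alignment idea is shown below; the helper names and the plain squared-distance loss are our assumptions for illustration, not the paper's implementation:

```python
import torch

torch.manual_seed(0)

def class_prototypes(features, labels, num_classes):
    """Mean feature per class (hypothetical helper, not the paper's code)."""
    protos = []
    for c in range(num_classes):
        mask = labels == c
        protos.append(features[mask].mean(dim=0) if mask.any()
                      else torch.zeros(features.size(1)))
    return torch.stack(protos)

# Inter-domain category regularization: pull the prototype of each class in
# the source domain (true labels) toward its counterpart in the target
# domain (pseudo-labels).
C, D = 3, 16
src_feat, src_lab = torch.randn(40, D), torch.randint(0, C, (40,))
tgt_feat, tgt_lab = torch.randn(40, D), torch.randint(0, C, (40,))  # pseudo-labels
diff = class_prototypes(src_feat, src_lab, C) - class_prototypes(tgt_feat, tgt_lab, C)
loss_align = (diff ** 2).sum(dim=1).mean()
print(loss_align.item())
```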
Generating graph-structured data requires learning the underlying distribution of graphs. Yet, this is a challenging problem, and previous graph generative methods either fail to capture the permutation-invariance property of graphs or cannot sufficiently model the complex dependency between nodes and edges, which is crucial for generating real-world graphs such as molecules. To overcome such limitations, we propose a novel score-based generative model for graphs with a continuous-time framework. Specifically, we propose a new graph diffusion process that models the joint distribution of nodes and edges through a system of stochastic differential equations (SDEs). We then derive novel score matching objectives tailored to the proposed diffusion process to estimate the gradient of the joint log-density with respect to each component, and introduce a new solver for the SDE system to efficiently sample from the reverse diffusion process. We validate our graph generation method on diverse datasets, on which it achieves either significantly superior or competitive performance over the baselines. Further analysis shows that our method is able to generate molecules that lie close to the training distribution yet do not violate the chemical valency rules, demonstrating the effectiveness of the SDE system in modeling node-edge relationships. Our code is available at https://github.com/harryjo97/gdss.
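For intuition, the following toy sketch (not the GDSS code; the constant noise schedule is our assumption) perturbs node features and adjacency jointly with a VP-type SDE, sampling from its closed-form marginal at time t:

```python
import torch

torch.manual_seed(0)

# Toy forward diffusion in the spirit of the paper's SDE system: node
# features X and adjacency A are jointly perturbed by a VP-type SDE,
# dZ = -0.5*beta*Z dt + sqrt(beta) dW, whose marginal at time t has mean
# exp(-0.5*beta*t) * Z_0 and variance 1 - exp(-beta*t).
def forward_diffuse(X, A, t, beta=1.0):
    mean_coef = torch.exp(torch.tensor(-0.5 * beta * t))
    std = torch.sqrt(1 - torch.exp(torch.tensor(-beta * t)))
    eps_a = torch.triu(torch.randn_like(A), 1)
    eps_a = eps_a + eps_a.T            # keep the perturbed adjacency symmetric
    Xt = mean_coef * X + std * torch.randn_like(X)
    At = mean_coef * A + std * eps_a
    return Xt, At

X = torch.randn(8, 4)                  # 8 nodes with 4-dimensional features
A = (torch.rand(8, 8) < 0.3).float()
A = torch.triu(A, 1); A = A + A.T      # random symmetric graph, no self-loops
Xt, At = forward_diffuse(X, A, t=0.5)  # a score network would be trained on (Xt, At)
```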
In the real world, medical datasets often exhibit a long-tailed data distribution (i.e., a few classes occupy most of the data, while most classes have only a few samples), which results in a challenging imbalanced learning scenario. For example, there are an estimated more than 40 different kinds of retinal disease with variable morbidity, yet more than 30 of these conditions are very rare across the global patient cohort, which leads to a typical long-tailed learning problem for deep-learning-based screening models. Moreover, multiple diseases may coexist in the retina, which results in a multi-label scenario and brings label co-occurrence issues for re-sampling strategies. In this work, we propose a novel framework that leverages the prior knowledge of retinal diseases to train a more robust representation of the model under a hierarchy-aware constraint. An instance-wise class-balanced sampling strategy and a hybrid knowledge distillation scheme are then introduced, for the first time, to learn from the long-tailed multi-label distribution. Our experiments on a retinal dataset of more than one million samples demonstrate the superiority of the proposed methods, which outperform all competitors and significantly improve the recognition accuracy of most diseases, especially the rare ones.
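One simple way to realize instance-wise class-balanced sampling for multi-label data, sketched below with made-up labels (the paper's exact weighting may differ), is to draw each sample with probability proportional to the rarity of its rarest positive label:

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.array([[1, 0, 0],    # sample with a common disease only
                   [1, 1, 0],    # common disease co-occurring with a rarer one
                   [0, 0, 1]])   # sample with a rare disease
class_freq = labels.sum(axis=0) / len(labels)
inv_freq = 1.0 / np.maximum(class_freq, 1e-8)
sample_w = (labels * inv_freq).max(axis=1)   # rarity of the rarest positive label
sample_p = sample_w / sample_w.sum()
batch = rng.choice(len(labels), size=16, p=sample_p)   # rebalanced mini-batch
```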
It is difficult to accurately label ambiguous and complex-shaped targets manually with binary masks. This weakness of binary masks is especially prominent in medical image segmentation, where blurring is ubiquitous. In the case of multiple annotations, it is even more challenging for clinicians to reach consensus through binary masks. Moreover, these uncertain regions are related to the lesion structure and may contain anatomical information beneficial to diagnosis. However, current research on uncertainty mainly focuses on the uncertainty of model training and data labels; none of it investigates the influence of the ambiguous nature of the lesion itself. Inspired by image matting, we introduce the alpha matte as a soft mask to represent uncertain regions in medical scenes, and accordingly propose a new uncertainty quantification method to fill the gap in uncertainty research for lesion structure. In this work, we introduce a new architecture within a multi-task framework to generate binary masks and alpha mattes, which outperforms all state-of-the-art matting algorithms. The proposed uncertainty map is able to highlight the ambiguous regions, and our novel multi-task loss weighting strategy can further improve performance and demonstrate its concrete benefits. To fully evaluate the effectiveness of the proposed method, we first labeled three medical datasets with alpha mattes to address the shortage of available matting datasets in medical scenes, and demonstrated that the alpha matte is a more effective labeling method than the binary mask, both qualitatively and quantitatively.
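As a hedged illustration of how an alpha matte can induce an uncertainty map (our own formulation for illustration, not necessarily the paper's), the sketch below scores each pixel by the binary entropy of its alpha value, peaking where alpha is near 0.5:

```python
import numpy as np

alpha = np.random.rand(64, 64)          # a predicted alpha matte in [0, 1]
eps = 1e-8
entropy = -(alpha * np.log(alpha + eps)
            + (1 - alpha) * np.log(1 - alpha + eps))
uncertainty = entropy / np.log(2)       # 1 at alpha = 0.5, 0 at alpha in {0, 1}
```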
Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data; more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques, reconstructing lost features with a pixel-to-pixel approach and a modified super-resolution generative adversarial network (SRGAN) architecture, to better aid clinicians in their decision-making and improve patient outcomes.
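The following NumPy sketch illustrates the described simulation under our own illustrative parameters: a Gaussian window applied in the spectral (Fourier) domain narrows the effective bandwidth of an A-scan and thus broadens its axial point-spread function, lowering axial resolution:

```python
import numpy as np

ascan = np.random.rand(1024)                 # a synthetic A-scan (depth profile)
spectrum = np.fft.fft(ascan)                 # back to the spectral domain
k = np.fft.fftfreq(ascan.size)
window = np.exp(-0.5 * (k / 0.05) ** 2)      # narrower sigma -> lower resolution
low_res = np.real(np.fft.ifft(spectrum * window))
```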
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action instances with only category labels. Most methods adopt off-the-shelf Classification-Based Pre-training (CBP) to generate video features for action localization. However, the differing optimization objectives of classification and localization cause the temporally localized results to suffer from a serious incompleteness issue. To tackle this issue without additional annotations, this paper considers distilling free action knowledge from Vision-Language Pre-training (VLP), since we surprisingly observe that the localization results of vanilla VLP have an over-completeness issue, which is exactly complementary to the CBP results. To fuse such complementarity, we propose a novel distillation-collaboration framework with two branches acting as CBP and VLP respectively. The framework is optimized through a dual-branch alternate training strategy: during the B step, we distill confident background pseudo-labels from the CBP branch, while during the F step, confident foreground pseudo-labels are distilled from the VLP branch. As a result, the dual-branch complementarity is effectively fused into a strong alliance. Extensive experiments and ablation studies on THUMOS14 and ActivityNet1.2 reveal that our method significantly outperforms state-of-the-art methods.
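The sketch below illustrates the B/F pseudo-label exchange with toy per-snippet foreground scores; the thresholds, losses, and variable names are our assumptions, not the released code:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

T = 100
cbp_scores = torch.rand(T)   # classification-pre-trained branch, per snippet
vlp_scores = torch.rand(T)   # vision-language-pre-trained branch, per snippet

# B step: snippets the CBP branch confidently marks as background
# supervise the VLP branch toward score 0 there.
bg_mask = cbp_scores < 0.1
loss_vlp = F.binary_cross_entropy(vlp_scores[bg_mask],
                                  torch.zeros_like(vlp_scores[bg_mask]))

# F step: snippets the VLP branch confidently marks as foreground
# supervise the CBP branch toward score 1 there.
fg_mask = vlp_scores > 0.9
loss_cbp = F.binary_cross_entropy(cbp_scores[fg_mask],
                                  torch.ones_like(cbp_scores[fg_mask]))
```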
Photometric stereo recovers the surface normals of an object from multiple images with varying shading cues, i.e., by modeling the relationship between surface orientation and intensity at each pixel. Photometric stereo excels in per-pixel resolution and fine reconstruction detail. However, it is a complicated problem because of the non-linear relationship caused by non-Lambertian surface reflectance. Recently, various deep learning methods have shown a powerful ability to handle photometric stereo with non-Lambertian surfaces. This paper provides a comprehensive review of existing deep-learning-based calibrated photometric stereo methods. We first analyze these methods from different perspectives, including input processing, supervision, and network architecture. We then summarize the performance of deep learning photometric stereo models on the most widely used benchmark dataset, which demonstrates the advanced performance of deep-learning-based photometric stereo methods. Finally, we give suggestions and propose future research trends based on the limitations of existing models.
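As a reference point for the deep methods surveyed, here is the classical calibrated Lambertian baseline in a few lines of NumPy (shadows ignored for simplicity): per pixel, intensity i_j = rho * (n . l_j) for light direction l_j, so stacking m lights gives a linear system solvable by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10
L = rng.normal(size=(m, 3))
L[:, 2] = np.abs(L[:, 2])                     # lights from the upper hemisphere
L /= np.linalg.norm(L, axis=1, keepdims=True)
n_true = np.array([0.2, 0.1, 0.97]); n_true /= np.linalg.norm(n_true)
rho = 0.8                                     # diffuse albedo
I = rho * (L @ n_true)                        # shadow-free rendered intensities
g, *_ = np.linalg.lstsq(L, I, rcond=None)     # solve I = L @ g, where g = rho * n
rho_hat = np.linalg.norm(g); n_hat = g / rho_hat
print(rho_hat, n_hat)                         # recovers rho and n exactly here
```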
With the success of Vision Transformers (ViTs) in computer vision tasks, recent works have tried to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices. Multiple approaches have been proposed to accelerate the attention mechanism, improve inefficient designs, or incorporate mobile-friendly lightweight convolutions to form hybrid architectures. However, ViT and its variants still have higher latency or considerably more parameters than lightweight CNNs, even compared with the years-old MobileNet. In practice, latency and size are both crucial for efficient deployment on resource-constrained hardware. In this work, we investigate a central question: can transformer models run as fast as MobileNet while maintaining a similar size? We revisit the design choices of ViTs and propose an improved supernet with low latency and high parameter efficiency. We further introduce a fine-grained joint search strategy that finds efficient architectures by optimizing latency and the number of parameters simultaneously. The proposed models, EfficientFormerV2, achieve about $4\%$ higher top-1 accuracy than MobileNetV2 and MobileNetV2$\times1.4$ on ImageNet-1K with similar latency and parameters. We demonstrate that properly designed and optimized vision transformers can achieve high performance with MobileNet-level size and speed.
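A joint latency/size-aware search could rank supernet candidates with a multi-objective score such as the hypothetical one below; the numbers and the MnasNet-style multiplicative penalty are our illustration, not the paper's objective:

```python
# Made-up candidate triples: (top-1 accuracy, latency in ms, params in millions).
def candidate_score(acc, latency_ms, params_m,
                    target_latency=2.0, target_params=12.0):
    # Multiplicative penalty for exceeding the latency and size budgets,
    # similar in spirit to MnasNet-style multi-objective rewards.
    return (acc * min(1.0, target_latency / latency_ms)
                * min(1.0, target_params / params_m))

candidates = [
    (0.78, 1.8, 11.0),
    (0.80, 2.6, 13.5),
    (0.79, 2.0, 12.0),
]
best = max(candidates, key=lambda c: candidate_score(*c))
print(best)   # picks the best accuracy among candidates within budget
```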