深度学习(DL)模型为各种医学成像基准挑战提供了最先进的性能,包括脑肿瘤细分(BRATS)挑战。然而,局灶性病理多隔室分割(例如,肿瘤和病变子区)的任务特别具有挑战性,并且潜在的错误阻碍DL模型转化为临床工作流程。量化不确定形式的DL模型预测的可靠性,可以实现最不确定的地区的临床审查,从而建立信任并铺平临床翻译。最近,已经引入了许多不确定性估计方法,用于DL医学图像分割任务。开发指标评估和比较不确定性措施的表现将有助于最终用户制定更明智的决策。在本研究中,我们探索并评估在Brats 2019-2020任务期间开发的公制,以对不确定量化量化(Qu-Brats),并旨在评估和排列脑肿瘤多隔室分割的不确定性估计。该公制(1)奖励不确定性估计,对正确断言产生高置信度,以及在不正确的断言处分配低置信水平的估计数,(2)惩罚导致更高百分比的无关正确断言百分比的不确定性措施。我们进一步基准测试由14个独立参与的Qu-Brats 2020的分割不确定性,所有这些都参与了主要的Brats细分任务。总体而言,我们的研究结果证实了不确定性估计提供了分割算法的重要性和互补价值,因此突出了医学图像分析中不确定性量化的需求。我们的评估代码在HTTPS://github.com/ragmeh11/qu-brats公开提供。
translated by 谷歌翻译
Automatic segmentation is essential for the brain tumor diagnosis, disease prognosis, and follow-up therapy of patients with gliomas. Still, accurate detection of gliomas and their sub-regions in multimodal MRI is very challenging due to the variety of scanners and imaging protocols. Over the last years, the BraTS Challenge has provided a large number of multi-institutional MRI scans as a benchmark for glioma segmentation algorithms. This paper describes our contribution to the BraTS 2022 Continuous Evaluation challenge. We propose a new ensemble of multiple deep learning frameworks namely, DeepSeg, nnU-Net, and DeepSCAN for automatic glioma boundaries detection in pre-operative MRI. It is worth noting that our ensemble models took first place in the final evaluation on the BraTS testing dataset with Dice scores of 0.9294, 0.8788, and 0.8803, and Hausdorf distance of 5.23, 13.54, and 12.05, for the whole tumor, tumor core, and enhancing tumor, respectively. Furthermore, the proposed ensemble method ranked first in the final ranking on another unseen test dataset, namely Sub-Saharan Africa dataset, achieving mean Dice scores of 0.9737, 0.9593, and 0.9022, and HD95 of 2.66, 1.72, 3.32 for the whole tumor, tumor core, and enhancing tumor, respectively. The docker image for the winning submission is publicly available at (https://hub.docker.com/r/razeineldin/camed22).
translated by 谷歌翻译
尽管脑肿瘤分割的准确性最近取得了进步,但结果仍然遭受低可靠性和鲁棒性的影响。不确定性估计是解决此问题的有效解决方案,因为它提供了对分割结果的信心。当前的不确定性估计方法基于分位数回归,贝叶斯神经网络,集合和蒙特卡洛辍学者受其高计算成本和不一致的限制。为了克服这些挑战,在最近的工作中开发了证据深度学习(EDL),但主要用于自然图像分类。在本文中,我们提出了一个基于区域的EDL分割框架,该框架可以生成可靠的不确定性图和可靠的分割结果。我们使用证据理论将神经网络的输出解释为从输入特征收集的证据价值。遵循主观逻辑,将证据作为差异分布进行了参数化,预测的概率被视为主观意见。为了评估我们在分割和不确定性估计的模型的性能,我们在Brats 2020数据集上进行了定量和定性实验。结果证明了所提出的方法在量化分割不确定性和稳健分割肿瘤方面的最高性能。此外,我们提出的新框架保持了低计算成本和易于实施的优势,并显示了临床应用的潜力。
translated by 谷歌翻译
深度学习技术在检测医学图像中的对象方面取得了成功,但仍然遭受虚假阳性预测,可能会阻碍准确的诊断。神经网络输出的估计不确定性已用于标记不正确的预测。我们研究了来自神经网络不确定性估计的功能和基于形状的特征,这些特征是根据二进制预测计算出的,从二进制预测中,通过开发基于分类的后处理步骤来减少肝病病变检测中的假阳性,以用于不同的不确定性估计方法。我们证明了两个数据集上所有不确定性估计方法的神经网络的病变检测性能(相对于F1分数)的改善,分别包括腹部MR和CT图像。我们表明,根据神经网络不确定性估计计算的功能往往不会有助于降低假阳性。我们的结果表明,诸如阶级不平衡(真实假阳性比率)和从不确定性图提取的基于形状的特征之类的因素在区分假阳性和真实阳性预测方面起着重要作用
translated by 谷歌翻译
Glioblastomas是最具侵略性的快速生长的主要脑癌,起源于大脑的胶质细胞。准确鉴定恶性脑肿瘤及其子区域仍然是医学图像分割中最具挑战性问题之一。脑肿瘤分割挑战(Brats)是自动脑胶质细胞瘤分割算法的流行基准,自于其启动。在今年的挑战中,Brats 2021提供了2,000名术前患者的最大多参数(MPMRI)数据集。在本文中,我们提出了两个深度学习框架的新聚合,即在术前MPMRI中的自动胶质母细胞瘤识别的Deepseg和NNU-Net。我们的集合方法获得了92.00,87.33和84.10和Hausdorff距离为3.81,8.91和16.02的骰子相似度分数,用于增强肿瘤,肿瘤核心和全肿瘤区域,单独进行。这些实验结果提供了证据表明它可以在临床上容易地应用,从而助攻脑癌预后,治疗计划和治疗反应监测。
translated by 谷歌翻译
Objective: Convolutional neural networks (CNNs) have demonstrated promise in automated cardiac magnetic resonance image segmentation. However, when using CNNs in a large real-world dataset, it is important to quantify segmentation uncertainty and identify segmentations which could be problematic. In this work, we performed a systematic study of Bayesian and non-Bayesian methods for estimating uncertainty in segmentation neural networks. Methods: We evaluated Bayes by Backprop, Monte Carlo Dropout, Deep Ensembles, and Stochastic Segmentation Networks in terms of segmentation accuracy, probability calibration, uncertainty on out-of-distribution images, and segmentation quality control. Results: We observed that Deep Ensembles outperformed the other methods except for images with heavy noise and blurring distortions. We showed that Bayes by Backprop is more robust to noise distortions while Stochastic Segmentation Networks are more resistant to blurring distortions. For segmentation quality control, we showed that segmentation uncertainty is correlated with segmentation accuracy for all the methods. With the incorporation of uncertainty estimates, we were able to reduce the percentage of poor segmentation to 5% by flagging 31--48% of the most uncertain segmentations for manual review, substantially lower than random review without using neural network uncertainty (reviewing 75--78% of all images). Conclusion: This work provides a comprehensive evaluation of uncertainty estimation methods and showed that Deep Ensembles outperformed other methods in most cases. Significance: Neural network uncertainty measures can help identify potentially inaccurate segmentations and alert users for manual review.
translated by 谷歌翻译
通过磁共振成像(MRI)评估肿瘤负担对于评估胶质母细胞瘤的治疗反应至关重要。由于疾病的高异质性和复杂性,该评估的性能很复杂,并且与高变异性相关。在这项工作中,我们解决了这个问题,并提出了一条深度学习管道,用于对胶质母细胞瘤患者进行全自动的端到端分析。我们的方法同时确定了肿瘤的子区域,包括第一步的肿瘤,周围肿瘤和手术腔,然后计算出遵循神经符号学(RANO)标准的当前响应评估的体积和双相测量。此外,我们引入了严格的手动注释过程,其随后是人类专家描绘肿瘤子区域的,并捕获其分割的信心,后来在训练深度学习模型时被使用。我们广泛的实验研究的结果超过了760次术前和504例从公共数据库获得的神经胶质瘤后患者(2021 - 2020年在19个地点获得)和临床治疗试验(47和69个地点,可用于公共数据库(在19个地点获得)(47和69个地点)术前/术后患者,2009-2011)并以彻底的定量,定性和统计分析进行了备份,表明我们的管道在手动描述时间的一部分中对术前和术后MRI进行了准确的分割(最高20比人更快。二维和体积测量与专家放射科医生非常吻合,我们表明RANO测量并不总是足以量化肿瘤负担。
translated by 谷歌翻译
The clinical interest is often to measure the volume of a structure, which is typically derived from a segmentation. In order to evaluate and compare segmentation methods, the similarity between a segmentation and a predefined ground truth is measured using popular discrete metrics, such as the Dice score. Recent segmentation methods use a differentiable surrogate metric, such as soft Dice, as part of the loss function during the learning phase. In this work, we first briefly describe how to derive volume estimates from a segmentation that is, potentially, inherently uncertain or ambiguous. This is followed by a theoretical analysis and an experimental validation linking the inherent uncertainty to common loss functions for training CNNs, namely cross-entropy and soft Dice. We find that, even though soft Dice optimization leads to an improved performance with respect to the Dice score and other measures, it may introduce a volume bias for tasks with high inherent uncertainty. These findings indicate some of the method's clinical limitations and suggest doing a closer ad-hoc volume analysis with an optional re-calibration step.
translated by 谷歌翻译
心肌活力的评估对于患有心肌梗塞的患者的诊断和治疗管理是必不可少的,并且心肌病理学的分类是本评估的关键。这项工作定义了医学图像分析的新任务,即进行心肌病理分割(MYOPS)结合三个序列的心脏磁共振(CMR)图像,该图像首次与Mycai 2020一起在Myops挑战中提出的。挑战提供了45个配对和预对准的CMR图像,允许算法将互补信息与三个CMR序列组合到病理分割。在本文中,我们提供了挑战的详细信息,从十五个参与者的作品调查,并根据五个方面解释他们的方法,即预处理,数据增强,学习策略,模型架构和后处理。此外,我们对不同因素的结果分析了结果,以检查关键障碍和探索解决方案的潜力,以及为未来的研究提供基准。我们得出结论,虽然报告了有前途的结果,但研究仍处于早期阶段,在成功应用于诊所之前需要更深入的探索。请注意,MyOPS数据和评估工具继续通过其主页(www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20 /)注册注册。
translated by 谷歌翻译
多发性硬化症(MS)是中枢神经系统的慢性炎症和退行性疾病,其特征在于,白色和灰质的外观与个体患者的神经症状和标志进行地平整相关。磁共振成像(MRI)提供了详细的体内结构信息,允许定量和分类MS病变,其批判性地通知疾病管理。传统上,MS病变在2D MRI切片上手动注释,一个流程效率低,易于观察室内误差。最近,已经提出了自动统计成像分析技术以基于MRI体素强度检测和分段段病变。然而,它们的有效性受到MRI数据采集技术的异质性和MS病变的外观的限制。通过直接从图像学习复杂的病变表现,深度学习技术已经在MS病变分割任务中取得了显着的突破。在这里,我们提供了全面审查最先进的自动统计和深度学习MS分段方法,并讨论当前和未来的临床应用。此外,我们审查了域适应等技术策略,以增强现实世界临床环境中的MS病变分段。
translated by 谷歌翻译
分配转移或培训数据和部署数据之间的不匹配是在高风险工业应用中使用机器学习的重要障碍,例如自动驾驶和医学。这需要能够评估ML模型的推广以及其不确定性估计的质量。标准ML基线数据集不允许评估这些属性,因为培训,验证和测试数据通常相同分布。最近,已经出现了一系列专用基准测试,其中包括分布匹配和转移的数据。在这些基准测试中,数据集在任务的多样性以及其功能的数据模式方面脱颖而出。虽然大多数基准测试由2D图像分类任务主导,但Shifts包含表格天气预测,机器翻译和车辆运动预测任务。这使得可以评估模型的鲁棒性属性,并可以得出多种工业规模的任务以及通用或直接适用的特定任务结论。在本文中,我们扩展了偏移数据集,其中两个数据集来自具有高社会重要性的工业高风险应用程序。具体而言,我们考虑了3D磁共振脑图像中白质多发性硬化病变的分割任务以及海洋货物容器中功耗的估计。两项任务均具有无处不在的分配变化和由于错误成本而构成严格的安全要求。这些新数据集将使研究人员能够进一步探索新情况下的强大概括和不确定性估计。在这项工作中,我们提供了两个任务的数据集和基线结果的描述。
translated by 谷歌翻译
Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility, maintaining low confidence in such black box systems, with this problem being exacerbated by low generalization for out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the box transparent in a quantifiable way by uncertainty estimation. EvidenceCap not only makes AI visible in regions of uncertainty and OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap in three segmentation datasets and apply it to the clinic. Our work sheds light on clinical safe applications and explainable AI, and can contribute towards trustworthiness in the medical domain.
translated by 谷歌翻译
域适应(DA)最近在医学影像社区提出了强烈的兴趣。虽然已经提出了大量DA技术进行了用于图像分割,但大多数这些技术已经在私有数据集或小公共可用数据集上验证。此外,这些数据集主要解决了单级问题。为了解决这些限制,与第24届医学图像计算和计算机辅助干预(Miccai 2021)结合第24届国际会议组织交叉模态域适应(Crossmoda)挑战。 Crossmoda是无监督跨型号DA的第一个大型和多级基准。挑战的目标是分割参与前庭施瓦新瘤(VS)的后续和治疗规划的两个关键脑结构:VS和Cochleas。目前,使用对比度增强的T1(CET1)MRI进行VS患者的诊断和监测。然而,使用诸如高分辨率T2(HRT2)MRI的非对比度序列越来越感兴趣。因此,我们创建了一个无人监督的跨模型分段基准。训练集提供注释CET1(n = 105)和未配对的非注释的HRT2(n = 105)。目的是在测试集中提供的HRT2上自动对HRT2进行单侧VS和双侧耳蜗分割(n = 137)。共有16支球队提交了评估阶段的算法。顶级履行团队达成的表现水平非常高(最佳中位数骰子 - vs:88.4%; Cochleas:85.7%)并接近完全监督(中位数骰子 - vs:92.5%;耳蜗:87.7%)。所有顶级执行方法都使用图像到图像转换方法将源域图像转换为伪目标域图像。然后使用这些生成的图像和为源图像提供的手动注释进行培训分割网络。
translated by 谷歌翻译
State-of-the-art brain tumor segmentation is based on deep learning models applied to multi-modal MRIs. Currently, these models are trained on images after a preprocessing stage that involves registration, interpolation, brain extraction (BE, also known as skull-stripping) and manual correction by an expert. However, for clinical practice, this last step is tedious and time-consuming and, therefore, not always feasible, resulting in skull-stripping faults that can negatively impact the tumor segmentation quality. Still, the extent of this impact has never been measured for any of the many different BE methods available. In this work, we propose an automatic brain tumor segmentation pipeline and evaluate its performance with multiple BE methods. Our experiments show that the choice of a BE method can compromise up to 15.7% of the tumor segmentation performance. Moreover, we propose training and testing tumor segmentation models on non-skull-stripped images, effectively discarding the BE step from the pipeline. Our results show that this approach leads to a competitive performance at a fraction of the time. We conclude that, in contrast to the current paradigm, training tumor segmentation models on non-skull-stripped images can be the best option when high performance in clinical practice is desired.
translated by 谷歌翻译
脑小血管疾病的成像标记提供了有关脑部健康的宝贵信息,但是它们的手动评估既耗时又受到实质性内部和间际变异性的阻碍。自动化评级可能受益于生物医学研究以及临床评估,但是现有算法的诊断可靠性尚不清楚。在这里,我们介绍了\ textIt {血管病变检测和分割}(\ textit {v textit {where valdo?})挑战,该挑战是在国际医学图像计算和计算机辅助干预措施(MICCAI)的卫星事件中运行的挑战(MICCAI) 2021.这一挑战旨在促进大脑小血管疾病的小而稀疏成像标记的自动检测和分割方法的开发,即周围空间扩大(EPVS)(任务1),脑微粒(任务2)和预先塑造的鞋类血管起源(任务3),同时利用弱和嘈杂的标签。总体而言,有12个团队参与了针对一个或多个任务的解决方案的挑战(任务1 -EPVS 4,任务2 -Microbleeds的9个,任务3 -lacunes的6个)。多方数据都用于培训和评估。结果表明,整个团队和跨任务的性能都有很大的差异,对于任务1- EPV和任务2-微型微型且对任务3 -lacunes尚无实际的结果,其结果尤其有望。它还强调了可能阻止个人级别使用的情况的性能不一致,同时仍证明在人群层面上有用。
translated by 谷歌翻译
本文提出了第二版的头部和颈部肿瘤(Hecktor)挑战的概述,作为第24届医学图像计算和计算机辅助干预(Miccai)2021的卫星活动。挑战由三个任务组成与患有头颈癌(H&N)的患者的PET / CT图像的自动分析有关,专注于oropharynx地区。任务1是FDG-PET / CT图像中H&N主肿瘤肿瘤体积(GTVT)的自动分割。任务2是来自同一FDG-PET / CT的进展自由生存(PFS)的自动预测。最后,任务3与任务2的任务2与参与者提供的地面真理GTVT注释相同。这些数据从六个中心收集,总共325个图像,分为224个培训和101个测试用例。通过103个注册团队和448个结果提交的重要参与,突出了对挑战的兴趣。在第一任务中获得0.7591的骰子相似度系数(DSC),分别在任务2和3中的0.7196和0.6978的一致性指数(C-Index)。在所有任务中,发现这种方法的简单性是确保泛化性能的关键。 PFS预测性能在任务2和3中的比较表明,提供GTVT轮廓对于实现最佳结果,这表明可以使用完全自动方法。这可能避免了对GTVT轮廓的需求,用于可重复和大规模的辐射瘤研究的开头途径,包括千元潜在的受试者。
translated by 谷歌翻译
Clinical diagnostic and treatment decisions rely upon the integration of patient-specific data with clinical reasoning. Cancer presents a unique context that influence treatment decisions, given its diverse forms of disease evolution. Biomedical imaging allows noninvasive assessment of disease based on visual evaluations leading to better clinical outcome prediction and therapeutic planning. Early methods of brain cancer characterization predominantly relied upon statistical modeling of neuroimaging data. Driven by the breakthroughs in computer vision, deep learning became the de facto standard in the domain of medical imaging. Integrated statistical and deep learning methods have recently emerged as a new direction in the automation of the medical practice unifying multi-disciplinary knowledge in medicine, statistics, and artificial intelligence. In this study, we critically review major statistical and deep learning models and their applications in brain imaging research with a focus on MRI-based brain tumor segmentation. The results do highlight that model-driven classical statistics and data-driven deep learning is a potent combination for developing automated systems in clinical oncology.
translated by 谷歌翻译
乳腺癌是女性最常见的恶性肿瘤,每年负责超过50万人死亡。因此,早期和准确的诊断至关重要。人类专业知识是诊断和正确分类乳腺癌并定义适当的治疗,这取决于评价不同生物标志物如跨膜蛋白受体HER2的表达。该评估需要几个步骤,包括免疫组织化学或原位杂交等特殊技术,以评估HER2状态。通过降低诊断中的步骤和人类偏差的次数的目标,赫洛挑战是组织的,作为第16届欧洲数字病理大会的并行事件,旨在自动化仅基于苏木精和曙红染色的HER2地位的评估侵袭性乳腺癌的组织样本。评估HER2状态的方法是在全球21个团队中提出的,并通过一些提议的方法实现了潜在的观点,以推进最先进的。
translated by 谷歌翻译
肺癌是最致命的癌症之一,部分诊断和治疗取决于肿瘤的准确描绘。目前是最常见的方法的人以人为本的分割,须遵守观察者间变异性,并且考虑到专家只能提供注释的事实,也是耗时的。最近展示了有前途的结果,自动和半自动肿瘤分割方法。然而,随着不同的研究人员使用各种数据集和性能指标验证了其算法,可靠地评估这些方法仍然是一个开放的挑战。通过2018年IEEE视频和图像处理(VIP)杯竞赛创建的计算机断层摄影扫描(LOTUS)基准测试的肺起源肿瘤分割的目标是提供唯一的数据集和预定义的指标,因此不同的研究人员可以开发和以统一的方式评估他们的方法。 2018年VIP杯始于42个国家的全球参与,以获得竞争数据。在注册阶段,有129名成员组成了来自10个国家的28个团队,其中9个团队将其达到最后阶段,6队成功完成了所有必要的任务。简而言之,竞争期间提出的所有算法都是基于深度学习模型与假阳性降低技术相结合。三种决赛选手开发的方法表明,有希望的肿瘤细分导致导致越来越大的努力应降低假阳性率。本次竞争稿件概述了VIP-Cup挑战,以及所提出的算法和结果。
translated by 谷歌翻译
This paper focuses on the uncertainty estimation of white matter lesions (WML) segmentation in magnetic resonance imaging (MRI). On one side, voxel-scale segmentation errors cause the erroneous delineation of the lesions; on the other side, lesion-scale detection errors lead to wrong lesion counts. Both of these factors are clinically relevant for the assessment of multiple sclerosis patients. This work aims to compare the ability of different voxel- and lesion- scale uncertainty measures to capture errors related to segmentation and lesion detection respectively. Our main contributions are (i) proposing new measures of lesion-scale uncertainty that do not utilise voxel-scale uncertainties; (ii) extending an error retention curves analysis framework for evaluation of lesion-scale uncertainty measures. Our results obtained on the multi-center testing set of 58 patients demonstrate that the proposed lesion-scale measures achieves the best performance among the analysed measures. All code implementations are provided at https://github.com/NataliiaMolch/MS_WML_uncs
translated by 谷歌翻译