Insufficient training samples are a common problem in neural network applications. While data augmentation methods require at least a minimum number of samples, we propose a novel, rendering-based pipeline for synthesizing annotated data sets. Rather than modifying existing samples, our approach synthesizes entirely new samples. The proposed rendering-based pipeline is capable of generating and annotating synthetic and partly real image and video data in a fully automatic procedure. Moreover, the pipeline can aid the acquisition of real data. The proposed pipeline is based on a rendering process that generates the synthetic data. The partly real data bring the synthetic sequences closer to reality by incorporating real cameras during the acquisition process. Extensive experimental validation in the context of automatic license plate recognition demonstrates the benefit of the proposed data generation pipeline, especially for machine learning scenarios with limited available training data. Compared to an OCR algorithm trained exclusively on real data, the experiments show that the character error rate and miss rate decrease from 73.74% and 100% to 14.11% and 41.27%, respectively. These improvements are achieved by training the algorithm solely on synthetic data. When real data are additionally incorporated, the error rates can be reduced further: the character error rate and miss rate then drop to 11.90% and 39.88%, respectively. All data used in the experiments, as well as the proposed rendering-based pipeline for automatic data generation, are publicly available (URL will be revealed upon publication).
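As a hedged illustration of the underlying idea of rendering annotated samples from scratch (this is not the authors' pipeline, which targets full image and video sequences), the minimal Python sketch below draws a random plate-like string onto a canvas, applies mild pose and blur variation, and returns the ground-truth label together with the image, so every sample is annotated by construction. The canvas size, character set, and degradation parameters are illustrative assumptions.

```python
# Minimal sketch, not the proposed rendering pipeline: synthesize an annotated
# plate-like image whose label is known by construction.
import random
import string
from PIL import Image, ImageDraw, ImageFilter

def synthesize_plate(width=256, height=64):
    text = "".join(random.choices(string.ascii_uppercase + string.digits, k=7))
    img = Image.new("RGB", (width, height), color=(240, 240, 240))
    ImageDraw.Draw(img).text((16, 24), text, fill=(20, 20, 20))          # default bitmap font
    img = img.rotate(random.uniform(-5, 5), fillcolor=(240, 240, 240))   # mild pose variation
    img = img.filter(ImageFilter.GaussianBlur(random.uniform(0.0, 1.5))) # camera blur
    return img, text                                                      # image + annotation

# A fully synthetic, fully annotated toy data set:
dataset = [synthesize_plate() for _ in range(1000)]
```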
Neural networks are used for virtually any task that involves recognizing image content. While considerable effort has been devoted to researching efficient network architectures, optimizers, and training strategies, the influence of image interpolation on the performance of neural networks has not been studied thoroughly. Furthermore, research has shown that neural networks are often sensitive to minor changes in the input image, which can lead to drastic drops in performance. Therefore, we propose in this paper the use of keypoint-agnostic frequency-selective mesh-to-grid resampling (FSMR) for processing the input data of neural networks. This model-based interpolation method has already been shown to outperform common interpolation methods in terms of PSNR. Using an extensive experimental evaluation, we show that, depending on the network architecture and the classification task, applying FSMR during training aids the learning process. Furthermore, we show that using FSMR in the application phase is beneficial. For ResNet50 and the Oxflower17 dataset, the classification accuracy can be increased by up to 4.31 percentage points.
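FSMR itself is a model-based, frequency-selective resampler and is not reproduced here; the sketch below only shows the integration point described in the abstract, i.e., resampling every input before it reaches the network. The function `fsmr_resample` is a stand-in (an assumption) that falls back to bicubic interpolation so the code runs; it would be replaced by an actual FSMR implementation.

```python
# Sketch of the preprocessing hook only; `fsmr_resample` is a placeholder, not FSMR.
import torch
import torch.nn.functional as F

def fsmr_resample(img: torch.Tensor, size=(224, 224)) -> torch.Tensor:
    # Placeholder: substitute a real FSMR implementation here.
    return F.interpolate(img.unsqueeze(0), size=size, mode="bicubic",
                         align_corners=False).squeeze(0)

class ResampleTransform:
    """Applies the chosen resampler to each input image (C, H, W) before the network."""
    def __init__(self, resampler=fsmr_resample):
        self.resampler = resampler

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        return self.resampler(img)

x = torch.rand(3, 123, 187)    # arbitrarily sized input image
y = ResampleTransform()(x)     # resampled to the network's input resolution
```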
In this paper, we propose a data augmentation framework for optical character recognition (OCR). The proposed framework is able to synthesize new viewing angles and illumination scenarios, effectively enriching any available OCR data set. Its modular structure allows it to be adapted to individual user requirements. The framework makes it possible to conveniently scale the augmentation factor applied to an available data set. Furthermore, the proposed approach is not limited to single-frame OCR but can also be applied to video OCR. We demonstrate the performance of the framework by augmenting a 15% subset of the common Brno Mobile OCR data set. Our proposed framework is capable of improving the performance of OCR applications, especially for small data sets. Applying the proposed method, improvements of up to 2.79 percentage points in terms of the character error rate (CER) are achieved, with gains of up to 7.88 percentage points on the subset. In particular, the recognition of challenging text lines can be improved; for this category, the CER can be reduced by up to 14.92 and up to 18.19 percentage points, respectively. Moreover, we are able to achieve smaller error rates when training on the 15% subset with the proposed method than on the original, non-augmented full data set.
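Purely as an illustration of the two augmentation aspects named above, viewpoint and illumination, the sketch below multiplies a small set of (image, text) pairs by a chosen factor using off-the-shelf torchvision transforms. The actual framework is modular and synthesis-based; the transform choices and parameter values here are assumptions.

```python
# Illustration only: viewpoint and illumination variation via standard transforms,
# not the proposed modular framework.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomPerspective(distortion_scale=0.4, p=1.0),  # new viewing angle
    transforms.ColorJitter(brightness=0.5, contrast=0.4),       # new illumination
])

def augment_dataset(samples, factor=5):
    """Return the original (image, text) pairs plus `factor` augmented copies of each."""
    out = []
    for image, text in samples:
        out.append((image, text))
        out.extend((augment(image), text) for _ in range(factor))
    return out
```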
Forensic license plate recognition (FLPR) remains an open challenge in legal contexts such as criminal investigations, where unreadable license plates (LPs) need to be deciphered from highly compressed and/or low-resolution footage, e.g., from surveillance cameras. In this work, we propose a side-information transformer architecture that embeds knowledge of the input compression level in order to improve recognition under strong compression. We show the effectiveness of transformers for license plate recognition (LPR) on a low-quality, real-world data set. We also provide a synthetic data set containing strongly degraded, illegible LP images and analyze the impact of the embedded knowledge on it. The network outperforms existing FLPR methods and standard state-of-the-art image recognition models while requiring fewer parameters. For the most severely degraded images, we can improve recognition by up to 8.9 percent.
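The following PyTorch sketch illustrates one plausible way to embed compression-level knowledge into a transformer, namely adding a learned embedding of the (quantized) compression level to every input token. The layer sizes, the number of compression bins, and the injection mechanism are assumptions and not necessarily the architecture used in the paper.

```python
# Hedged sketch: condition a transformer encoder on a known compression level.
import torch
import torch.nn as nn

class CompressionInformedEncoder(nn.Module):
    def __init__(self, d_model=256, n_levels=8, n_layers=4, n_heads=8):
        super().__init__()
        self.level_emb = nn.Embedding(n_levels, d_model)   # compression-level embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens, level):
        # tokens: (batch, seq, d_model); level: (batch,) integer compression bin
        side = self.level_emb(level).unsqueeze(1)           # (batch, 1, d_model)
        return self.encoder(tokens + side)                  # inject the side information

tokens = torch.randn(2, 32, 256)                            # e.g. patch embeddings of an LP image
features = CompressionInformedEncoder()(tokens, torch.tensor([0, 5]))
```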
Coronary Computed Tomography Angiography (CCTA) provides information on the presence, extent, and severity of obstructive coronary artery disease. Large-scale clinical studies analyzing CCTA-derived metrics typically require ground-truth validation in the form of high-fidelity 3D intravascular imaging. However, manual rigid alignment of intravascular images to corresponding CCTA images is both time-consuming and user-dependent. Moreover, intravascular modalities suffer from several non-rigid motion-induced distortions arising from distortions in the imaging catheter path. To address these issues, we here present a semi-automatic segmentation-based framework for both rigid and non-rigid matching of intravascular images to CCTA images. We formulate the problem in terms of finding the optimal \emph{virtual catheter path} that samples the CCTA data to recapitulate the coronary artery morphology found in the intravascular image. We validate our co-registration framework on a cohort of $n=40$ patients using bifurcation landmarks as ground truth for longitudinal and rotational registration. Our results indicate that our non-rigid registration significantly outperforms other co-registration approaches for luminal bifurcation alignment in both the longitudinal (mean mismatch: 3.3 frames) and rotational directions (mean mismatch: 28.6 degrees). By providing a differentiable framework for automatic multi-modal intravascular data fusion, our co-registration modules significantly reduce the manual effort required to conduct large-scale multi-modal clinical studies while also providing a solid foundation for the development of machine learning-based co-registration approaches.
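The sketch below is a strongly simplified, conceptual rendering of the virtual-catheter-path idea as stated here: the path is parameterized by 3D points, the CCTA volume is sampled along it differentiably, and the points are optimized against per-frame intravascular measurements. The volume, the target signal, and the mean-squared-error loss are placeholders, not the segmentation-based objective or the bifurcation landmarks used in the paper.

```python
# Conceptual sketch of optimizing a virtual catheter path by differentiable sampling.
import torch
import torch.nn.functional as F

volume = torch.rand(1, 1, 64, 64, 64)                    # CCTA volume (placeholder data)
path = torch.nn.Parameter(torch.zeros(1, 40, 1, 1, 3))   # 40 path points, coords in [-1, 1]
target = torch.rand(1, 1, 40, 1, 1)                      # per-frame intravascular measurement

optimizer = torch.optim.Adam([path], lr=1e-2)
for _ in range(200):
    sampled = F.grid_sample(volume, path, align_corners=True)  # CCTA values along the path
    loss = F.mse_loss(sampled, target)                          # placeholder similarity term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```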
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.
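For readers who want to reproduce the kind of prompting examined here, the sketch below issues a simplification request through the OpenAI API. The study itself evaluated reports simplified via the ChatGPT interface; the prompt wording, model name, and placeholder report are assumptions.

```python
# Sketch of programmatic report simplification; prompt and model choice are illustrative.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
report = "..."     # the radiology report to be simplified (placeholder)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Explain this radiology report in simple, patient-friendly language:\n" + report,
    }],
)
print(response.choices[0].message.content)
```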
Artificial Intelligence (AI) has become commonplace for solving routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed the MONAI Consortium, an open-source community that is building standards for AI deployment in healthcare institutions and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem-solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes that take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
The future of population-based breast cancer screening likely lies in personalized strategies based on clinically relevant risk models. Mammography-based risk models should remain robust to domain shifts caused by different populations and mammographic devices. Modern risk models do not ensure adaptation across vendor-domains and are often conflated, unintentionally relying on both precursors of cancer and systemic/global mammographic information associated with short- and long-term risk, respectively, which might limit performance. We developed a robust, cross-vendor model for long-term risk assessment. An augmentation-based domain adaptation technique, based on flavorization of mammographic views, ensured generalization to an unseen vendor-domain. We trained on samples without diagnosed/potential malignant findings to learn systemic/global breast tissue features, called mammographic texture, indicative of future breast cancer. However, training in this way may cause erratic convergence. By excluding noise-inducing samples and designing a case-control dataset, a robust ensemble texture model was trained. This model was validated in two independent datasets. In 66,607 Danish women with flavorized Siemens views, the AUC was 0.71 and 0.65 for prediction of interval cancers within two years (ICs) and of cancers from two years after screening (LTCs), respectively. In combination with established risk factors, the model's AUC increased to 0.68 for LTCs. In 25,706 Dutch women with Hologic-processed views, the AUCs were not different from the AUCs in Danish women with flavorized views. The results suggested that the model robustly estimated long-term risk while adapting to an unseen processed vendor-domain. The model identified 8.1% of Danish women accounting for 20.9% of ICs and 14.2% of LTCs.
Quaternion-valued neural networks have experienced rising popularity and interest from researchers in recent years, whereby the derivatives with respect to quaternions needed for optimization are calculated as the sum of the partial derivatives with respect to the real and imaginary parts. However, we can show that the product and chain rules do not hold with this approach. We solve this by employing the GHR calculus and derive quaternion backpropagation based on it. Furthermore, we experimentally demonstrate the functionality of the derived quaternion backpropagation.
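To make the criticized construction explicit (the notation here is an assumption, not taken from the paper): writing a quaternion as $q = q_0 + q_1 i + q_2 j + q_3 k$, the component-wise gradient commonly used for optimizing a function $f:\mathbb{H}\to\mathbb{H}$ is assembled from real partial derivatives as

```latex
\frac{\partial f}{\partial q}
  \,=\,
  \frac{\partial f}{\partial q_0}
  + \frac{\partial f}{\partial q_1}\, i
  + \frac{\partial f}{\partial q_2}\, j
  + \frac{\partial f}{\partial q_3}\, k ,
\qquad q = q_0 + q_1 i + q_2 j + q_3 k \in \mathbb{H},
```

and it is for this construction that, as stated above, the product and chain rules fail in general, which motivates replacing it with derivatives defined via the GHR calculus.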
In this work, a method for obtaining pixel-wise error bounds in Bayesian regularization of inverse imaging problems is introduced. The proposed method employs estimates of the posterior variance together with techniques from conformal prediction in order to obtain coverage guarantees for the error bounds, without making any assumption on the underlying data distribution. It is generally applicable to Bayesian regularization approaches, independent, for example, of the concrete choice of the prior. Furthermore, the coverage guarantees can also be obtained when only approximate sampling from the posterior is possible. In particular, this enables the proposed framework to incorporate any learned prior in a black-box manner. Guaranteed coverage without assumptions on the underlying distributions is only achievable because the magnitude of the error bounds is, in general, unknown in advance. Nevertheless, experiments with multiple regularization approaches presented in the paper confirm that, in practice, the obtained error bounds are rather tight. To realize the numerical experiments, a novel primal-dual Langevin algorithm for sampling from non-smooth distributions is also introduced in this work.
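A minimal split-conformal sketch of the mechanism described above, under simplifying assumptions that are not the paper's exact procedure: per-pixel nonconformity scores of the form |ground truth - posterior mean| / posterior std are computed on a calibration set, their finite-sample-corrected quantile is taken, and the pixel-wise error bound for a new image is that quantile times the image's posterior standard deviation. Pooling all calibration pixels into one score set is an assumption made here for brevity.

```python
# Split-conformal sketch: calibrate a scaling of the posterior std into an error bound.
import numpy as np

def calibrate(gt, post_mean, post_std, alpha=0.1):
    """Return the conformal quantile of |gt - mean| / std over a calibration set."""
    scores = np.abs(gt - post_mean) / post_std               # pixel-wise nonconformity
    k = int(np.ceil((1 - alpha) * (scores.size + 1)))        # finite-sample corrected rank
    return np.sort(scores.ravel())[min(k, scores.size) - 1]

def pixelwise_error_bound(post_std_new, q):
    return q * post_std_new                                  # bound with roughly (1 - alpha) coverage

rng = np.random.default_rng(0)
gt = rng.normal(size=(20, 8, 8))                             # calibration ground truth
mean = gt + rng.normal(scale=0.1, size=gt.shape)             # stand-in posterior means
std = np.full(gt.shape, 0.1)                                 # stand-in posterior stds
bounds = pixelwise_error_bound(std, calibrate(gt, mean, std))
```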