Smart Paper Notes

An Efficient Multi-Scale Fusion Network for 3D Organ at Risk (OAR) Segmentation

Abhishek Srivastava, Debesh Jha, Elif Keles, Bulent Aydogan, Mohamed Abazeed, Ulas Bagci

Category: Computer Vision

2022-08-15

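Accurate segmentation of organs at risk (OARs) is a precursor to optimizing radiation therapy planning. Existing deep learning-based multi-scale fusion architectures have demonstrated a tremendous capacity for 2D medical image segmentation. The key to their success is aggregating global context while maintaining high-resolution representations. However, when translated to 3D segmentation problems, existing multi-scale fusion architectures may underperform because of their heavy computational overhead and substantial data requirements. To address this issue, we propose a new OAR segmentation framework, called OARFocalFuseNet, which fuses multi-scale features and employs focal modulation to capture global-local context across multiple scales. Each resolution stream is enriched with features from different resolution scales, and multi-scale information is aggregated to model diverse contextual ranges. As a result, feature representations are further boosted. Comprehensive comparisons in our experimental setup, covering both OAR segmentation and multi-organ segmentation, show that the proposed OARFocalFuseNet outperforms recent state-of-the-art methods on the publicly available OpenKBP dataset and the Synapse multi-organ segmentation dataset. Both proposed methods (3D-MSF and OARFocalFuseNet) perform well in terms of standard evaluation metrics. Our best-performing method (OARFocalFuseNet) obtains a Dice coefficient of 0.7995 and a Hausdorff distance of 5.1435 on the OpenKBP dataset, and a Dice coefficient of 0.8137 on the Synapse multi-organ segmentation dataset.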
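To make the two metrics reported above concrete, the following is a minimal pure-Python sketch of a Dice coefficient for binary masks and a symmetric Hausdorff distance between point sets. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `dice_coefficient` and `hausdorff_distance` are hypothetical, masks are flat sequences of 0/1 values, and surfaces are given as coordinate tuples.

```python
import math


def dice_coefficient(pred, target):
    """Dice = 2|A & B| / (|A| + |B|) for flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(points_a, points_b), directed(points_b, points_a))
```

In practice these metrics are usually computed per organ on 3D volumes and then averaged; the flat-list form here only shows the arithmetic.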

Translated by Google Translate

Video Capsule Endoscopy Classification using Focal Modulation Guided Convolutional Neural Network

Abhishek Srivastava, Nikhil Kumar Tomar, Ulas Bagci, Debesh Jha

Category: Computer Vision

2022-06-16

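Video capsule endoscopy is a hot topic in computer vision and medicine. Deep learning can have a positive impact on the future of video capsule endoscopy technology: it can improve the anomaly detection rate, reduce physicians' screening time, and aid real-world clinical analysis. CADx classification systems for video capsule endoscopy have shown great promise for further improvement. For example, detecting cancerous polyps and bleeding can lead to a swift medical response and improve patient survival rates. To this end, an automated CADx system must have high throughput and decent accuracy. In this paper, we propose FocalConvNet, a focal modulation network integrated with lightweight convolutional layers for classifying small bowel anatomical landmarks and luminal findings. FocalConvNet leverages focal modulation to attain global context and allows global-local spatial interactions throughout the forward pass. Moreover, the convolutional block, with its intrinsic inductive/learning bias and its capacity to extract hierarchical features, allows FocalConvNet to achieve favourable results with high throughput. We compare FocalConvNet with other SOTA methods on Kvasir-Capsule, a large-scale VCE dataset with 44,228 frames covering 13 classes of different anomalies. Our proposed method outperforms other SOTA methodologies in weighted F1-score, recall, and MCC. Furthermore, we report the highest throughput, 148.02 images/second, to establish the potential of FocalConvNet in a real-time clinical environment. The code of the proposed FocalConvNet is available at https://github.com/noviceman-prog/focalconvnet.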
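For readers unfamiliar with the weighted F1-score named above, here is a small sketch of how it is commonly computed for an imbalanced multi-class problem such as this one: each class's F1 is weighted by its support (number of true samples). The function name `weighted_f1` and the `(tp, fp, fn)` input layout are assumptions made for illustration, not taken from the paper or its repository.

```python
def weighted_f1(per_class_counts):
    """Support-weighted mean of per-class F1 scores.

    per_class_counts: list of (tp, fp, fn) tuples, one per class.
    """
    total_support = 0
    weighted_sum = 0.0
    for tp, fp, fn in per_class_counts:
        support = tp + fn  # number of ground-truth samples in this class
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is empty.
        f1 = 2 * tp / denom if denom else 0.0
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0
```

Weighting by support means frequent classes dominate the score, which is why it is often reported alongside a class-balance-aware metric such as MCC.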

Automatic Polyp Segmentation with Multiple Kernel Dilated Convolution Network

Nikhil Kumar Tomar, Abhishek Srivastava, Ulas Bagci, Debesh Jha

Category: Computer Vision

2022-06-13

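Detecting and removing precancerous polyps through colonoscopy is the primary technique for preventing colorectal cancer worldwide. However, the colorectal polyp miss rate varies significantly among endoscopists. It is well known that a computer-aided diagnosis (CAD) system can assist endoscopists in detecting colon polyps and minimize the variation among endoscopists. In this study, we introduce a novel deep learning architecture, named MKDCNet, for automatic polyp segmentation that is robust to significant changes in polyp data distribution. MKDCNet is simply an encoder-decoder neural network that uses a pre-trained ResNet50 as the encoder and a novel multiple kernel dilated convolution (MKDC) block that expands the field of view to learn more robust and heterogeneous representations. Extensive experiments on four publicly available polyp datasets and a cell nuclei dataset show that the proposed MKDCNet outperforms state-of-the-art methods both when trained and tested on the same dataset and when tested on unseen polyp datasets from different distributions. With these results, we demonstrate the robustness of the proposed architecture. From an efficiency perspective, our algorithm can process approximately 45 frames per second on an RTX 3090 GPU. MKDCNet can be a strong benchmark for building real-time systems for clinical colonoscopy. The code of the proposed MKDCNet is available at https://github.com/nikhilroxtomar/mkdcnet.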
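The claim that dilated convolution expands the field of view can be checked with a little arithmetic: a kernel of size k with dilation d spans d(k - 1) + 1 input positions while keeping only k weights. The sketch below illustrates this in 1D with plain Python. It does not reproduce the MKDC block, whose exact layout the abstract does not describe; `effective_kernel_size` and `dilated_conv1d` are hypothetical helper names.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by a dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation with a dilated kernel."""
    span = effective_kernel_size(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        outputs.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return outputs


# The same 3-tap kernel covers a wider span as the dilation rate grows.
for rate in (1, 2, 3):
    print(rate, effective_kernel_size(3, rate))
```

Running several dilation rates in parallel and merging their outputs is one common way to mix narrow and wide context at the same parameter cost per branch.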

MHATC: Autism Spectrum Disorder identification utilizing multi-head attention encoder along with temporal consolidation modules

Ranjeet Ranjan Jha, Abhishek Bhardwaj, Devin Garg, Arnav Bhavsar, Aditya Nigam

Category: Computer Vision | Machine Learning

2021-12-27

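Resting-state fMRI is commonly used to diagnose Autism Spectrum Disorder (ASD) by means of network-based functional connectivity. It has been shown that ASD is associated with brain regions and their inter-connections. However, discriminating between imaging data from a control population and imaging data from the brains of ASD patients is a non-trivial task. To tackle this classification task, we propose a novel deep learning architecture (MHATC) consisting of multi-head attention and temporal consolidation modules for classifying an individual as a patient with ASD. The architecture results from an in-depth analysis of the limitations of current deep neural network solutions for similar applications. Our approach is not only robust but also computationally efficient, which allows it to be adopted in a variety of other research and clinical settings.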

Exploring Low-Cost Transformer Model Compression for Large-Scale Commercial Reply Suggestions

Vaishnavi Shrivastava, Radhika Gaonkar, Shashank Gupta, Abhishek Jha

Category: Natural Language Processing

2021-11-27

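Fine-tuning pre-trained language models improves the quality of commercial reply suggestion systems, but at the cost of unsustainable training times. Popular approaches to reducing training time are resource-intensive, so we explore low-cost model compression techniques such as layer dropping and layer freezing. We demonstrate the efficacy of these techniques in large-data scenarios, reducing the training time of a commercial email reply suggestion system by 42% without affecting model relevance or user engagement. We further study the robustness of these techniques through ablations over pre-trained models and dataset sizes, and share several insights and recommendations for commercial applications.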
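The two techniques named above can be sketched on a toy model description. This is a hedged illustration, not the paper's procedure: the `Layer` record, the policy of dropping the top layers, and the policy of freezing the bottom layers are all assumptions, since the abstract does not say which layers are dropped or frozen.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Layer:
    """Hypothetical stand-in for one transformer layer."""
    name: str
    num_params: int
    trainable: bool = True


def drop_layers(layers, keep):
    """Layer dropping: keep only the first `keep` layers (assumed policy)."""
    return list(layers[:keep])


def freeze_layers(layers, num_frozen):
    """Layer freezing: mark the first `num_frozen` layers as non-trainable."""
    return [
        replace(layer, trainable=False) if index < num_frozen else layer
        for index, layer in enumerate(layers)
    ]


def trainable_params(layers):
    """Count parameters that would still receive gradient updates."""
    return sum(layer.num_params for layer in layers if layer.trainable)


encoder = [Layer(f"encoder.{i}", 100) for i in range(12)]
compact = freeze_layers(drop_layers(encoder, keep=6), num_frozen=2)
print(trainable_params(encoder), trainable_params(compact))
```

Dropping shrinks both the forward and backward pass, while freezing only removes gradient computation for the frozen layers, which is why the two are often combined when training time is the constraint.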

PAANet: Progressive Alternating Attention for Automatic Medical Image Segmentation

Abhishek Srivastava, Sukalpa Chanda, Debesh Jha, Michael A. Riegler, Pål Halvorsen, Dag Johansen, Umapada Pal

Category: Computer Vision

2021-11-20

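Medical image segmentation can provide detailed information for clinical analysis, which is useful in scenarios where the precise location of a finding matters. Knowing the location of a disease can play a vital role in treatment and decision-making. Encoder-decoder techniques based on convolutional neural networks (CNNs) have advanced the performance of automated medical image segmentation systems. Several CNN-based methods use techniques such as spatial and channel-wise attention to enhance performance. Another technique that has drawn attention in recent years is the residual dense block (RDB). The successive convolutional layers in densely connected blocks can extract diverse features with varied receptive fields, thereby enhancing performance. However, consecutively stacked convolutional operators do not necessarily generate features that help identify the target structures. In this paper, we propose a progressive alternating attention network (PAANet). We develop progressive alternating attention dense (PAAD) blocks, which construct a guiding attention map (GAM) at every convolutional layer in the dense block using features from all scales. The GAM allows the following layers in the dense block to focus on the spatial locations relevant to the target region. Every alternate PAAD block inverts the GAM to generate a reverse attention map, which guides the ensuing layers to extract boundary- and edge-related information, refining the segmentation process. Our experiments on three different biomedical image segmentation datasets show that PAANet achieves favourable performance compared with other state-of-the-art methods.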
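The inversion step described above can be illustrated with a few lines of plain Python. This is a sketch under assumptions rather than the authors' implementation: the abstract does not state the inversion operator, so the code assumes the GAM is a sigmoid-normalized map in [0, 1] and that the reverse map is 1 - GAM, a common formulation of reverse attention. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def guiding_attention_map(logits):
    """Squash per-pixel logits into [0, 1] attention weights."""
    return [[sigmoid(value) for value in row] for row in logits]


def reverse_attention_map(gam):
    """Assumed inversion: emphasize what the GAM currently suppresses."""
    return [[1.0 - weight for weight in row] for row in gam]


def apply_attention(features, attention):
    """Element-wise gating of a 2D feature map by an attention map."""
    return [
        [f * a for f, a in zip(feature_row, attention_row)]
        for feature_row, attention_row in zip(features, attention)
    ]
```

Under this assumption, pixels the GAM marks as confident foreground receive low weight in the reverse map, so later layers gated by it attend to uncertain regions such as object boundaries.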

GMSRF-Net: An improved generalizability with global multi-scale residual fusion network for polyp segmentation

Abhishek Srivastava, Sukalpa Chanda, Debesh Jha, Umapada Pal, Sharib Ali

Category: Computer Vision

2021-11-20

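Colonoscopy is a gold-standard procedure, but it is highly operator-dependent. Efforts have been made to automate the detection and segmentation of polyps, a precancerous precursor, to effectively reduce the miss rate. Widely used computer-aided polyp segmentation systems driven by encoder-decoder architectures achieve high performance in terms of accuracy. However, polyp segmentation datasets collected from different centers can follow different imaging protocols, leading to differences in data distribution. As a result, most methods suffer a drop in performance and require re-training for each specific dataset. We address this generalizability issue by proposing a global multi-scale residual fusion network (GMSRF-Net). Our proposed network maintains high-resolution representations while performing multi-scale fusion operations for all resolution scales. To further leverage scale information, we design cross multi-scale attention (CMSA) and multi-scale feature selection (MSFS) modules within GMSRF-Net. The repeated fusion operations gated by CMSA and MSFS demonstrate the improved generalizability of the network. Experiments conducted on two different polyp segmentation datasets show that, in terms of Dice coefficient, our proposed GMSRF-Net outperforms the previous top-performing state-of-the-art method on unseen CVC-ClinicDB and unseen Kvasir-SEG.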

Morphology-based non-rigid registration of coronary computed tomography and intravascular images through virtual catheter path optimization

Karim Kadry, Abhishek Karmakar, Andreas Schuh, Kersten Peterson, Michiel Schaap, David Marlevi, Charles Taylor, Elazer Edelman, Farhad Nezami

Category: Computer Vision

2022-12-30

Coronary Computed Tomography Angiography (CCTA) provides information on the presence, extent, and severity of obstructive coronary artery disease. Large-scale clinical studies analyzing CCTA-derived metrics typically require ground-truth validation in the form of high-fidelity 3D intravascular imaging. However, manual rigid alignment of intravascular images to corresponding CCTA images is both time-consuming and user-dependent. Moreover, intravascular modalities suffer from several non-rigid, motion-induced distortions arising from distortions in the imaging catheter path. To address these issues, we present a semi-automatic segmentation-based framework for both rigid and non-rigid matching of intravascular images to CCTA images. We formulate the problem in terms of finding the optimal virtual catheter path that samples the CCTA data to recapitulate the coronary artery morphology found in the intravascular image. We validate our co-registration framework on a cohort of n = 40 patients using bifurcation landmarks as ground truth for longitudinal and rotational registration. Our results indicate that our non-rigid registration significantly outperforms other co-registration approaches for luminal bifurcation alignment in both longitudinal (mean mismatch: 3.3 frames) and rotational directions (mean mismatch: 28.6 degrees). By providing a differentiable framework for automatic multi-modal intravascular data fusion, our co-registration modules significantly reduce the manual effort required to conduct large-scale multi-modal clinical studies while also providing a solid foundation for the development of machine learning-based co-registration approaches.

Visualizing Information Bottleneck through Variational Inference

Cipta Herwana, Abhishek Kadian

Category: Machine Learning

2022-12-24

The Information Bottleneck theory provides a theoretical and computational framework for finding approximate minimum sufficient statistics. Analysis of the Stochastic Gradient Descent (SGD) training of a neural network on a toy problem has shown the existence of two phases, fitting and compression. In this work, we analyze the SGD training process of a Deep Neural Network on MNIST classification and confirm the existence of two phases of SGD training. We also propose a setup for estimating the mutual information for a Deep Neural Network through Variational Inference.

Linear features segmentation from aerial images

Zhipeng Chang, Siddharth Jha, Yunfei Xia

Category: Computer Vision | Artificial Intelligence

2022-12-23

Remote sensing technologies have developed rapidly and gained significant attention due to their ability to accurately localize, classify, and segment objects from aerial images. These technologies are commonly used in unmanned aerial vehicles (UAVs) equipped with high-resolution cameras or sensors to capture data over large areas. This data is useful for various applications, such as monitoring and inspecting cities, towns, and terrains. In this paper, we present a method for classifying and segmenting dashed traffic lines on city roads from aerial images using deep learning models such as U-Net and SegNet. Annotated data is used to train these models, which then classify and segment each aerial image into two classes: dashed lines and non-dashed lines. However, a deep learning model may fail to identify all dashed lines because of poor painting or occlusion by trees or shadows. To address this issue, we propose a method to add missed lines to the segmentation output. We also extract the x and y coordinates of each dashed line from the segmentation output, which city planners can use to construct a CAD file for digital visualization of the roads.
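One plausible way to obtain per-dash coordinates from a binary segmentation output is to label connected components and take each component's centroid. The sketch below does this with a breadth-first search in plain Python. It is an assumed approach for illustration only; the abstract does not describe how the authors extract coordinates, and `dashed_line_centroids` is a hypothetical name.

```python
from collections import deque


def dashed_line_centroids(mask):
    """Return the (x, y) centroid of each 4-connected foreground component.

    mask: 2D list of 0/1 values where 1 marks a dashed-line pixel.
    x is the column index and y is the row index.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for y0 in range(height):
        for x0 in range(width):
            if mask[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Flood-fill one component, collecting its pixel coordinates.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xs, ys = [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centroids
```

Pixel coordinates produced this way would still need to be mapped to real-world coordinates (for example via the image's georeferencing) before they are useful in a CAD file.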
