Convolutional Neural Networks (CNNs) with U-shaped architectures have dominated medical image segmentation, which is crucial for various clinical purposes. However, the inherent locality of convolution prevents CNNs from fully exploiting global context, which is essential for recognizing some structures, e.g., brain lesions. Transformers have recently shown promising performance on vision tasks, including semantic segmentation, mainly due to their ability to model long-range dependencies. Nevertheless, because of the quadratic complexity of attention, existing Transformer-based models apply self-attention layers only after reducing the image resolution, which limits their ability to capture global context present at higher resolutions. Therefore, this work introduces a family of models, dubbed Factorizer, which leverages low-rank matrix factorization to construct an end-to-end segmentation model. Specifically, we propose a linearly scalable approach to context modeling, formulating Nonnegative Matrix Factorization (NMF) as a differentiable layer integrated into a U-shaped architecture. The shifted window technique is also used in combination with NMF to effectively aggregate local information. Factorizers compete favorably with CNNs and Transformers in terms of accuracy, scalability, and interpretability, achieving state-of-the-art results on the BraTS dataset for brain tumor segmentation and the ISLES'22 dataset for stroke lesion segmentation. Highly meaningful NMF components give Factorizers an additional interpretability advantage over CNNs and Transformers. Moreover, our ablation studies reveal a distinctive feature of Factorizers that enables a significant inference speed-up for a trained Factorizer without any extra steps and without sacrificing much accuracy. The code and models are publicly available at https://github.com/pashtari/factorizer.
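As a rough illustration of the factorization named in this abstract, the sketch below runs NMF with the classic multiplicative update rules in NumPy. It is not taken from the Factorizer code base: the function name `nmf`, the choice of multiplicative updates, the iteration count, and the toy matrix are all assumptions made for illustration. Each iteration costs on the order of m·n·r operations, so for a fixed rank r the cost grows linearly with the number of columns n, which is one plausible reading of "linearly scalable". Because every step is an elementwise or matrix operation, a fixed number of such iterations could in principle be unrolled inside an autodiff framework; whether the paper does exactly this is not stated in the abstract.

```python
import numpy as np


def nmf(X, rank, n_iter=200, eps=1e-8, seed=0):
    """Approximate a nonnegative matrix X (m x n) as W @ H with W, H >= 0.

    Hypothetical helper using multiplicative updates for the Frobenius loss;
    an illustrative assumption, not the Factorizer implementation.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Update H with W fixed; every factor stays nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Update W with H fixed.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


# Toy input: an exactly rank-3 nonnegative matrix (made-up data).
rng = np.random.default_rng(1)
X = rng.random((64, 3)) @ rng.random((3, 32))

W, H = nmf(X, rank=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a segmentation setting, the columns of `X` would presumably correspond to spatial positions and the rows to feature channels, so that the rows of `H` act as soft assignments of positions to a small number of components; that mapping is an interpretation of the abstract, not a confirmed detail.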
The 6D object pose estimation problem has been extensively studied in computer vision and robotics. It has a wide range of applications, such as robot manipulation, augmented reality, and 3D scene understanding. With the advent of deep learning, many breakthroughs have been made; however, approaches continue to struggle when they encounter unseen instances, new categories, or real-world challenges such as cluttered backgrounds and occlusions. In this study, we survey the available methods according to their input modality, their problem formulation, and whether they take a category-level or an instance-level approach. As part of our discussion, we focus on how 6D object pose estimation can be used for understanding 3D scenes.
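In this paper, we introduce a novel end-to-end traffic classification method that distinguishes between traffic classes, including VPN traffic, across three layers of the Open Systems Interconnection (OSI) model. Because VPN traffic is encrypted, classifying it with traditional classification approaches is not trivial. We use two well-known neural networks, a multilayer perceptron and a recurrent neural network, to build a cascade neural network that focuses on two metrics: class scores and distance from the class centers. This approach combines feature extraction, selection, and classification into a single end-to-end system that systematically learns the nonlinear relationship between the input and the predicted performance. As a result, we can distinguish VPN traffic from non-VPN traffic by rejecting features unrelated to the VPN class. Moreover, we simultaneously obtain the application type of the non-VPN traffic. The method is evaluated on the general traffic dataset ISCX VPN-nonVPN and on an acquired dataset. The results demonstrate the efficacy of the framework for encrypted traffic classification: it achieves a very high accuracy of 95 percent, exceeding that of state-of-the-art models, along with strong generalization capability.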