Smart Paper Notes

Self-Supervised Implicit Attention: Guided Attention by The Model Itself

Jinyi Wu , Xun Gong , Zhemin Zhang

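Category: Computer Vision

2022-06-15

We propose Self-Supervised Implicit Attention (SSIA), a new approach that adaptively guides deep neural network models to gain attention by exploiting the properties of the models themselves. SSIA is a novel attention mechanism that, in contrast to existing attention mechanisms, requires no extra parameters, computation, or memory access cost during inference. In short, by treating attention weights as higher-level semantic information, we reconsider how existing attention mechanisms are implemented and further propose generating supervisory signals from higher network layers to guide the parameter updates of lower network layers. We achieve this by building a self-supervised learning task from the hierarchical features of the network itself, which operates only during the training stage. To verify the effectiveness of SSIA, we implement a specific instance (called the SSIA block) in convolutional neural network models and validate it on several image classification datasets. The experimental results show that an SSIA block can significantly improve model performance, even outperforming many popular attention methods that require additional parameters and computation, such as Squeeze-and-Excitation and the Convolutional Block Attention Module. Our implementation will be available on GitHub.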

translated by Google Translate

Deep Reinforced Attention Learning for Quality-Aware Visual Recognition

Duo Li , Qifeng Chen

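Category: Computer Vision

2020-07-13

In this paper, we build upon the weakly supervised generation mechanism of intermediate attention maps in any convolutional neural network and expose the effectiveness of attention modules more directly in order to fully exploit their potential. Given an existing neural network equipped with arbitrary attention modules, we introduce a meta critic network to evaluate the quality of the attention maps in the main network. Because the reward we design is discrete, the proposed learning method is arranged in a reinforcement learning setting, in which the attention actors and recurrent critics are optimized alternately to provide instant critique and revision of the temporary attention representation; hence the name Deep REinforced Attention Learning (DREAL). It can be applied universally to network architectures with different types of attention modules, and it promotes their expressive ability by maximizing the relative gain in final recognition performance arising from each individual attention module, as demonstrated by extensive experiments on both category and instance recognition benchmarks.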

A Discriminative Channel Diversification Network for Image Classification

Krushi Patel , Guanghui Wang

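Category: Computer Vision

2021-12-10

Channel attention mechanisms in convolutional neural networks have proven effective in a variety of computer vision tasks. However, the performance improvement comes with additional model complexity and computation cost. In this paper, we propose a lightweight and effective attention module, called the channel diversification block, that enhances the global context by establishing channel relationships at the global level. Unlike other channel attention mechanisms, the proposed module focuses on the most discriminative features by paying more attention to spatially distinguishable channels while taking channel activation into account. Unlike other attention models that insert a module between several intermediate layers, the proposed module is embedded at the end of the backbone network, which makes it easy to implement. Extensive experiments on the CIFAR-10, SVHN, and Tiny-ImageNet datasets show that the proposed module improves the performance of the baseline networks by a margin of 3% on average.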

CBAM: Convolutional Block Attention Module

Sanghyun Woo , Jongchan Park , Joon-Young Lee , In So Kweon

Category:

2018-07-17

We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architectures seamlessly with negligible overheads and is end-to-end trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.
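
The sequential channel-then-spatial refinement described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `shared_mlp` argument defaults to an identity function, and the spatial step uses a hypothetical per-pixel weighting (`w_avg`, `w_max`) in place of a learned convolution.

```python
import math


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(x, shared_mlp=lambda v: v):
    """One weight per channel from average- and max-pooled descriptors."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]
    # shared_mlp is an assumption: identity here, a small learned MLP in practice.
    return [_sigmoid(a + m) for a, m in zip(shared_mlp(avg), shared_mlp(mx))]


def spatial_attention(x, w_avg=1.0, w_max=1.0):
    """One weight per pixel from the channel-wise average and maximum."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    att = []
    for i in range(height):
        row = []
        for j in range(width):
            vals = [x[c][i][j] for c in range(channels)]
            # Hypothetical stand-in for a learned convolution over [avg, max].
            row.append(_sigmoid(w_avg * sum(vals) / channels + w_max * max(vals)))
        att.append(row)
    return att


def cbam(x, shared_mlp=lambda v: v):
    """Apply channel attention first, then spatial attention, by multiplication."""
    ca = channel_attention(x, shared_mlp)
    x = [[[ca[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]
    sa = spatial_attention(x)
    return [[[sa[i][j] * v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in x]


# Feature map laid out as [channel][row][column].
feature_map = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
refined = cbam(feature_map)
```

Both attention maps lie in (0, 1), so for a non-negative input every refined activation is at most the original value, and the output keeps the input shape.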

Deeply Supervised Layer Selective Attention Network: Towards Label-Efficient Learning for Medical Image Classification

Peng Jiang , Juan Liu , Lang Wang , Zhihui Ynag , Hongyu Dong , Jing Feng

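Category: Computer Vision

2022-09-28

Labeling medical images depends on professional knowledge, so it is difficult to acquire a large number of high-quality annotated medical images in a short time. Making good use of the limited labeled samples in a small dataset to build a high-performance model is therefore the key to medical image classification. In this paper, we propose a deeply supervised Layer Selective Attention Network (LSANet), which comprehensively uses label information in both feature-level and prediction-level supervision. For feature-level supervision, in order to better fuse low-level and high-level features, we propose a novel visual attention module, Layer Selective Attention (LSA), that focuses on feature selection across different layers. LSA introduces a weight allocation scheme that dynamically adjusts the weighting factor of each auxiliary branch throughout training, further enhancing deeply supervised learning and ensuring its generalization. For prediction-level supervision, we adopt a knowledge synergy strategy that promotes hierarchical information interaction among all supervision branches via pairwise knowledge matching. Using the public dataset MedMNIST, a large-scale benchmark for biomedical image classification covering diverse medical specialties, we evaluate LSANet on multiple mainstream CNN architectures and various visual attention modules. The experimental results show substantial improvements of the proposed method over its counterparts, demonstrating that LSANet can provide a promising solution for label-efficient learning in medical image classification.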

TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs

Shantanu Jaiswal , Basura Fernando , Cheston Tan

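Category: Computer Vision

2021-11-26

Attention modules for convolutional neural networks (CNNs) are an effective way to enhance network performance on multiple computer vision tasks. While many works focus on building more effective modules through appropriate modelling of channel, spatial, and self-attention, they primarily operate in a feed-forward manner. Consequently, the attention mechanism depends strongly on the representational capacity of a single input feature activation, and it can benefit from semantically richer higher-level activations that specify "what and where to look" through a top-down information flow. Such feedback connections are also prevalent in the primate visual cortex and are recognized by neuroscientists as a key component of primate visual attention. Accordingly, in this work we propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs, and consequently outputs more selective feature activations at each computation step. Our experiments indicate that integrating TD into CNNs enhances their performance on ImageNet-1k classification and outperforms prominent attention modules while being more parameter- and memory-efficient. Furthermore, our models are more robust to changes in input resolution during inference and learn to "shift attention" by localizing individual objects or features without any explicit supervision. This capability yields a 5% improvement for ResNet50 on weakly supervised object localization, in addition to improvements in fine-grained and multi-label classification.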

SDA-$x$Net: Selective Depth Attention Networks for Adaptive Multi-scale Feature Representation

Qingbei Guo , Xiao-Jun Wu , Zhiquan Feng , Tianyang Xu , Cong Hu

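Category: Computer Vision

2022-09-21

Existing multi-scale solutions risk merely increasing receptive field sizes while neglecting small receptive fields. Effectively constructing adaptive neural networks that recognize objects at various spatial scales is therefore a challenging problem. To tackle this issue, we first introduce a new attention dimension, depth, in addition to existing attention dimensions such as channel, spatial, and branch, and present a novel selective depth attention network that handles multi-scale objects symmetrically across various vision tasks. Specifically, the blocks within each stage of a given neural network, e.g., ResNet, output hierarchical feature maps that share the same resolution but have different receptive field sizes. Based on this structural property, we design a stage-wise building module, SDA, which comprises a trunk branch and an SE-like attention branch. The block outputs of the trunk branch are fused to guide their depth attention allocation through the attention branch. With the proposed attention mechanism, we can dynamically select features of different depths, which helps adaptively adjust the receptive field size for variable-sized input objects. In this way, cross-block information interaction leads to long-range dependencies along the depth direction. Compared with other multi-scale approaches, our SDA method combines multiple receptive fields from previous blocks into the stage output, thereby offering a wider and richer range of effective receptive fields. Moreover, our method can serve as a pluggable module for other multi-scale networks as well as attention networks, coined SDA-$x$Net. Their combination further extends the range of effective receptive fields and enables interpretable neural networks. Our source code is available at https://github.com/qingbeiguo/sda-xnet.git.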

GhostNet: More Features from Cheap Operations

Kai Han , Yunhe Wang , Qi Tian , Jianyuan Guo , Chunjing Xu , Chang Xu

Category:

2019-11-27

Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources. The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that could fully reveal information underlying intrinsic features. The proposed Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. Ghost bottlenecks are designed to stack Ghost modules, and then the lightweight GhostNet can be easily established. Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative to convolution layers in baseline models, and our GhostNet can achieve higher recognition performance (e.g. 75.7% top-1 accuracy) than MobileNetV3 with similar computational cost on the ImageNet ILSVRC-2012 classification dataset. Code is available at https://github.com/huawei-noah/ghostnet.
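
The idea of deriving extra feature maps from a few intrinsic ones can be sketched as follows. This is an illustrative sketch, not the released GhostNet code: the intrinsic maps are supplied directly rather than computed by a convolution, and the cheap operations `scale` and `shift_right` are hypothetical stand-ins for the cheap linear transformations the abstract mentions.

```python
def scale(factor):
    """Hypothetical cheap linear op: multiply every activation by a constant."""
    return lambda fm: [[factor * v for v in row] for row in fm]


def shift_right(fm):
    """Hypothetical cheap linear op: shift each row one pixel right, zero-padded."""
    return [[0.0] + row[:-1] for row in fm]


def ghost_module(intrinsic, cheap_ops):
    """Return the intrinsic maps followed by one ghost map per (map, op) pair."""
    ghosts = [op(fm) for fm in intrinsic for op in cheap_ops]
    return list(intrinsic) + ghosts


# Two intrinsic maps (each 2x2) and two cheap ops give 2 + 2 * 2 = 6 output maps.
intrinsic = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
outputs = ghost_module(intrinsic, [scale(0.5), shift_right])
print(len(outputs))
```

With m intrinsic maps and k cheap operations the module emits m * (k + 1) maps, while only the m intrinsic maps would need a full convolution.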

Efficient deep learning models for land cover image classification

Ioannis Papoutsis , Nikolaos-Ioannis Bountos , Angelos Zavras , Dimitrios Michail , Christos Tryfonopoulos

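Category: Computer Vision

2021-11-18

The availability of the sheer volume of Copernicus Sentinel imagery has created new opportunities for large-scale land use land cover (LULC) mapping using deep learning. Training on such large datasets, however, is a non-trivial task. In this work, we experiment with the BigEarthNet dataset for LULC image classification and benchmark different state-of-the-art models, including convolutional neural networks, multi-layer perceptrons, vision transformers, EfficientNets, and Wide Residual Network (WRN) architectures. Our aim is to balance classification accuracy, training time, and inference rate. We propose a framework based on EfficientNets for compound scaling of WRNs in terms of network depth, width, and input data resolution, in order to train and test different model setups efficiently. We design a novel scaled WRN architecture enhanced with an Efficient Channel Attention mechanism. Our proposed lightweight model has fewer trainable parameters, achieves a 4.5% higher average F-score classification accuracy over all 19 LULC classes, and trains twice as fast as the state-of-the-art ResNet50 model that we use as a baseline. We provide access to more than 50 trained models, along with our code for distributed training on multiple GPU nodes.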

Attention Mechanisms in Computer Vision: A Survey

Meng-Hao Guo , Tian-Xing Xu , Jiang-Jiang Liu , Zheng-Ning Liu , Peng-Tao Jiang , Tai-Jiang Mu , Song-Hai Zhang , Ralph R. Martin , Ming-Ming Cheng , Shi-Min Hu

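Category: Computer Vision

2021-11-15

Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on the features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of the various attention mechanisms in computer vision and categorize them by approach, such as channel attention, spatial attention, temporal attention, and branch attention. A related repository, https://github.com/MenghaoGuo/Awesome-Vision-Attentions, is dedicated to collecting related work. We also suggest future directions for attention mechanism research.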

CAT: Learning to Collaborate Channel and Spatial Attention from Multi-Information Fusion

Zizhang Wu , Man Wang , Weiwei Sun , Yuchen Li , Tianhao Xu , Fan Wang , Keke Huang

Category: Computer Vision

2022-12-13

Channel and spatial attention mechanisms have been shown to provide an evident performance boost for deep convolutional neural networks (CNNs). Most existing methods focus on one of them or run them in parallel (or in series), neglecting the collaboration between the two attentions. In order to better establish the feature interaction between the two types of attention, we propose a plug-and-play attention module, which we term "CAT", activating the Collaboration between spatial and channel Attentions based on learned Traits. Specifically, we represent traits as trainable coefficients (i.e., colla-factors) to adaptively combine the contributions of different attention modules so that they better fit different image hierarchies and tasks. Moreover, in addition to the global average pooling (GAP) and global maximum pooling (GMP) operators, we propose global entropy pooling (GEP), an effective component for suppressing noise signals by measuring the information disorder of feature maps. We introduce a three-way pooling operation into attention modules and apply the adaptive mechanism to fuse their outcomes. Extensive experiments on MS COCO, Pascal-VOC, Cifar-100, and ImageNet show that our CAT outperforms existing state-of-the-art attention mechanisms in object detection, instance segmentation, and image classification. The model and code will be released soon.
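
A rough sketch of three-way pooling with learned fusion coefficients follows. The abstract does not give the exact formulas, so two things here are assumptions: entropy pooling is taken to be the Shannon entropy of a softmax over the spatial positions of one channel, and the colla-factors are normalised with a softmax before weighting the pooled values. The function names are illustrative only.

```python
import math


def _flatten(fm):
    return [v for row in fm for v in row]


def global_avg_pool(fm):
    flat = _flatten(fm)
    return sum(flat) / len(flat)


def global_max_pool(fm):
    return max(_flatten(fm))


def global_entropy_pool(fm):
    """Assumed form: Shannon entropy of a softmax over spatial positions."""
    flat = _flatten(fm)
    peak = max(flat)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in flat]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse(pooled, colla_factors):
    """Assumed form: softmax-normalised coefficients weight the pooled values."""
    peak = max(colla_factors)
    exps = [math.exp(c - peak) for c in colla_factors]
    total = sum(exps)
    return sum(e / total * p for e, p in zip(exps, pooled))


channel = [[1.0, 1.0], [1.0, 1.0]]  # a perfectly uniform 2x2 channel
pooled = [global_avg_pool(channel), global_max_pool(channel),
          global_entropy_pool(channel)]
descriptor = fuse(pooled, colla_factors=[0.0, 0.0, 0.0])
```

A uniform channel reaches the maximum entropy log(H * W), whereas a channel dominated by a single strong activation has entropy close to zero, which is the sense in which entropy measures the disorder of a feature map.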

GhostNets on Heterogeneous Devices via Cheap Operations

Kai Han , Yunhe Wang , Chang Xu , Jianyuan Guo , Chunjing Xu , Enhua Wu , Qi Tian

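Category: Computer Vision

2022-01-10

Deploying convolutional neural networks (CNNs) on mobile devices is difficult due to the limited memory and computation resources. We aim to design efficient neural networks for heterogeneous devices, including CPUs and GPUs, by exploiting the redundancy in feature maps, which has rarely been investigated in neural architecture design. For CPU-like devices, we propose a novel CPU-efficient Ghost (C-Ghost) module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that could fully reveal information underlying intrinsic features. The proposed C-Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. C-Ghost bottlenecks are designed to stack C-Ghost modules, and then the lightweight C-GhostNet can be easily established. We further consider efficient networks for GPU devices. To avoid involving too many GPU-inefficient operations (e.g., depth-wise convolution) in a building stage, we propose to utilize stage-wise feature redundancy to formulate a GPU-efficient Ghost (G-Ghost) stage structure. The features in a stage are split into two parts: the first part is processed by the original block with fewer output channels to generate intrinsic features, and the other part is generated by cheap operations that exploit stage-wise redundancy. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed C-Ghost module and G-Ghost stage. C-GhostNet and G-GhostNet achieve the best trade-off between accuracy and latency on CPU and GPU, respectively. Code is available at https://github.com/huawei-noah/cv-backbones.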

Densely connected convolutional networks

Category:

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections, one between each layer and its subsequent layer, our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
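
The connection count can be checked with a few lines of Python. The sketch assumes the counting convention implied by the abstract: layer l (1-indexed) receives the network input plus the outputs of all l - 1 earlier layers, so it has l incoming direct connections. The function names are illustrative only.

```python
def chain_connections(num_layers: int) -> int:
    """A plain feed-forward stack: one connection into each layer."""
    return num_layers


def dense_connections(num_layers: int) -> int:
    """Closed form for a densely connected stack: L(L+1)/2."""
    return num_layers * (num_layers + 1) // 2


def dense_connections_by_enumeration(num_layers: int) -> int:
    """Count explicitly: layer l takes the input and the l - 1 earlier outputs."""
    return sum(layer for layer in range(1, num_layers + 1))


for depth in (5, 12, 40):
    print(depth, chain_connections(depth), dense_connections(depth))
```

The gap grows quadratically with depth: a 5-layer dense stack has 15 direct connections against 5 in a plain chain.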

Dense Prediction with Attentive Feature Aggregation

Yung-Hsu Yang , Thomas E. Huang , Samuel Rota Bulò , Peter Kontschieder , Fisher Yu

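Category: Computer Vision

2021-11-01

Aggregating information from features across different layers is an essential operation for dense prediction models. Despite its limited expressiveness, feature concatenation dominates the choice of aggregation operations. In this paper, we introduce Attentive Feature Aggregation (AFA) to fuse different network layers with more expressive non-linear operations. AFA exploits both spatial and channel attention to compute a weighted average of the layer activations. Inspired by neural volume rendering, we extend AFA with Scale-Space Rendering (SSR) to perform late fusion of multi-scale predictions. AFA is applicable to a wide range of existing network designs. Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead. In particular, AFA improves the performance of the Deep Layer Aggregation (DLA) model by nearly 6% mIoU on Cityscapes. Our experimental analysis shows that AFA learns to progressively refine segmentation maps and improve boundary details, leading to new state-of-the-art results on the BSDS500 and NYUDv2 boundary detection benchmarks. Code and video resources are available at http://vis.xyz/pub/dla-afa.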

Learning of Frequency-Time Attention Mechanism for Automatic Modulation Recognition

Shangao Lin , Yuan Zeng , Yi Gong

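Category: Computer Vision

2021-11-05

Recent learning-based image classification and speech recognition approaches make extensive use of attention mechanisms to achieve state-of-the-art recognition performance, which demonstrates the effectiveness of attention. Motivated by the fact that the frequency and time information of modulated radio signals is crucial for recognizing the modulation mode, this paper proposes a frequency-time attention mechanism for a convolutional neural network (CNN)-based modulation recognition framework. The proposed frequency-time attention module is designed to learn which channel, frequency, and time information is more meaningful in the CNN for modulation recognition. We analyze the effectiveness of the proposed frequency-time attention mechanism and compare the proposed method with two existing learning-based methods. Experiments on an open-source modulation recognition dataset show that the proposed framework outperforms both the framework without frequency-time attention and the existing learning-based methods.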

Efficient Multi-order Gated Aggregation Network

Siyuan Li , Zedong Wang , Zicheng Liu , Cheng Tan , Haitao Lin , Di Wu , Zhiyuan Chen , Jiangbin Zheng , Stan Z. Li

Category: Computer Vision | Artificial Intelligence

2022-11-07

Since the recent success of Vision Transformers (ViTs), explorations toward transformer-style architectures have triggered the resurgence of modern ConvNets. In this work, we explore the representation ability of DNNs through the lens of interaction complexities. We empirically show that interaction complexity is an overlooked but essential indicator for visual recognition. Accordingly, a new family of efficient ConvNets, named MogaNet, is presented to pursue informative context mining in pure ConvNet-based models, with preferable complexity-performance trade-offs. In MogaNet, interactions across multiple complexities are facilitated and contextualized by leveraging two specially designed aggregation blocks in both spatial and channel interaction spaces. Extensive studies are conducted on ImageNet classification, COCO object detection, and ADE20K semantic segmentation tasks. The results demonstrate that our MogaNet establishes new state-of-the-art over other popular methods in mainstream scenarios and all model scales. Typically, the lightweight MogaNet-T achieves 80.0% top-1 accuracy with only 1.44G FLOPs using a refined training setup on ImageNet-1K, surpassing ParC-Net-S by 1.4% accuracy but saving 59% (2.04G) FLOPs.
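
The FLOPs figures in the last sentence can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the abstract; the ParC-Net-S values it derives (about 3.48G FLOPs and 78.6% top-1) are inferred from those numbers, not quoted from the source.

```python
moga_t_flops_g = 1.44  # MogaNet-T FLOPs, as stated in the abstract
saved_flops_g = 2.04   # FLOPs saved relative to ParC-Net-S, as stated
moga_t_top1 = 80.0     # MogaNet-T top-1 accuracy in percent, as stated
accuracy_gain = 1.4    # stated accuracy margin over ParC-Net-S

# Inferred baseline figures (an assumption derived from the stated numbers).
baseline_flops_g = moga_t_flops_g + saved_flops_g
baseline_top1 = moga_t_top1 - accuracy_gain
saving_fraction = saved_flops_g / baseline_flops_g

print(f"implied baseline: {baseline_flops_g:.2f}G FLOPs, {baseline_top1:.1f}% top-1")
print(f"relative saving: {saving_fraction:.1%}")
```

The ratio 2.04 / 3.48 is roughly 0.586, which rounds to the 59% quoted in the abstract, so the absolute and relative savings are consistent with each other.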

Semiconductor Defect Pattern Classification by Self-Proliferation-and-Attention Neural Network

YuanFu Yang , Min Sun

Category: Computer Vision

2022-12-01

Semiconductor manufacturing is on the cusp of a revolution: the Internet of Things (IoT). With IoT we can connect all the equipment and feed information back to the factory so that quality issues can be detected. In this situation, more and more edge devices are used in wafer inspection equipment. This edge device must have the ability to quickly detect defects. Therefore, how to develop a high-efficiency architecture for automatic defect classification to be suitable for edge devices is the primary task. In this paper, we present a novel architecture that can perform defect classification in a more efficient way. The first function is self-proliferation, using a series of linear transformations to generate more feature maps at a cheaper cost. The second function is self-attention, capturing the long-range dependencies of feature map by the channel-wise and spatial-wise attention mechanism. We named this method as self-proliferation-and-attention neural network. This method has been successfully applied to various defect pattern classification tasks. Compared with other latest methods, SP&A-Net has higher accuracy and lower computation cost in many defect inspection tasks.

SSBNet: Improving Visual Recognition Efficiency by Adaptive Sampling

Ho Man Kwan , Shenghui Song

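Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-07-23

Downsampling is widely adopted to achieve a good trade-off between accuracy and latency in visual recognition. Unfortunately, the commonly used pooling layers are not learned and therefore cannot preserve important information. As an alternative dimension-reduction method, adaptive sampling weights and processes the regions that are relevant to the task, and is thus better able to preserve useful information. However, the use of adaptive sampling has been limited to certain layers. In this paper, we show that using adaptive sampling in the building blocks of a deep neural network can improve its efficiency. In particular, we propose SSBNet, which is built by repeatedly inserting sampling layers into existing networks such as ResNet. Experimental results show that the proposed SSBNet achieves competitive image classification and object detection performance on the ImageNet and COCO datasets. For example, SSB-ResNet-RS-200 reaches 82.6% accuracy on the ImageNet dataset, which is 0.6% higher than the baseline ResNet-RS-152 at similar complexity. Visualizations show the advantage of SSBNet in allowing different layers to focus on different positions, and ablation studies further validate the advantage of adaptive sampling over uniform methods.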

S²-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation

Mohammed A. M. Elhassan , Chenhui Yang , Chenxi Huang , Tewodros Legesse Munea , Xin Hong

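Category: Computer Vision | Artificial Intelligence

2022-06-15

Modern high-performance semantic segmentation methods employ a heavy backbone and dilated convolution to extract the relevant features. Although extracting features with both contextual and semantic information is critical for segmentation tasks, it brings a large memory footprint and high computation cost for real-time applications. This paper presents a new model that achieves a trade-off between accuracy and speed for real-time semantic segmentation of road scenes. Specifically, we propose a lightweight model named Scale-aware Strip Attention Guided Feature Pyramid Network (S²-FPN). Our network consists of three main modules: the Attention Pyramid Fusion (APF) module, the Scale-aware Strip Attention Module (SSAM), and the Global Feature Upsample (GFU) module. APF adopts an attention mechanism to learn discriminative multi-scale features and helps close the semantic gap between different levels. APF uses scale-aware attention to encode global context with a vertical stripping operation and to model long-range dependencies, which helps relate pixels with similar semantic labels. In addition, APF employs a channel-wise reweighting block (CRB) to emphasize channel features. Finally, the decoder of S²-FPN adopts GFU, which fuses the features from APF and the encoder. Extensive experiments on two challenging semantic segmentation benchmarks demonstrate that our approach achieves a better accuracy/speed trade-off across different model settings. The proposed models achieve 76.2% mIoU at 87.3 FPS, 77.4% mIoU at 67 FPS, and 77.8% mIoU at 30.5 FPS on the Cityscapes dataset, and 69.6%, 71.0%, and 74.2% mIoU on the CamVid dataset. The code for this work will be made available at https://github.com/mohamedac29/s2-fpn.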

Squeeze-and-excitation networks

Category:

Convolutional neural networks are built upon the convolution operation, which extracts informative features by fusing spatial and channel-wise information together within local receptive fields. In order to boost the representational power of a network, several recent approaches have shown the benefit of enhancing spatial encoding. In this work, we focus on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We demonstrate that by stacking these blocks together, we can construct SENet architectures that generalise extremely well across challenging datasets. Crucially, we find that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission which won first place and significantly reduced the top-5 error to 2.251%, achieving a ∼25% relative improvement over the winning entry of 2016.
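
The channel recalibration can be sketched in plain Python as a squeeze step (global average pooling per channel) followed by an excitation step (a small bottleneck of two fully connected layers ending in a sigmoid) and a per-channel rescale. The weights below are hypothetical hand-picked values for illustration; in a real network they are learned, and this sketch is not the authors' code.

```python
import math


def squeeze(feature_maps):
    """Global average pooling: one scalar descriptor per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]


def dense(vec, weights, biases):
    """Fully connected layer; weights is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, biases)]


def relu(vec):
    return [max(0.0, v) for v in vec]


def sigmoid(vec):
    return [1.0 / (1.0 + math.exp(-v)) for v in vec]


def se_block(feature_maps, w1, b1, w2, b2):
    """Return the recalibrated maps and the per-channel scales in (0, 1)."""
    z = squeeze(feature_maps)
    scales = sigmoid(dense(relu(dense(z, w1, b1)), w2, b2))
    out = [[[s * v for v in row] for row in ch]
           for s, ch in zip(scales, feature_maps)]
    return out, scales


# Two 2x2 channels; the bottleneck reduces 2 channels to 1 hidden unit.
feature_maps = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1, b1 = [[0.5, -0.5]], [0.0]              # hypothetical weights, 2 -> 1
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]       # hypothetical weights, 1 -> 2
recalibrated, scales = se_block(feature_maps, w1, b1, w2, b2)
```

With these weights the first channel is emphasised (scale about 0.73) and the second is suppressed (scale about 0.27), and the output keeps the input shape.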
