Channel attention mechanisms in convolutional neural networks have proven effective in various computer vision tasks. However, the performance improvement comes at the cost of additional model complexity and computation. In this paper, we propose a lightweight and effective attention module, called the channel diversification block, to enhance the global context by establishing channel relationships at the global level. Unlike other channel attention mechanisms, the proposed module focuses on the most discriminative features by paying greater attention to spatially distinguishable channels while also considering channel activation. Unlike other attention models that are inserted between intermediate layers, the proposed module is embedded at the end of the backbone network, making it easy to implement. Extensive experiments on the CIFAR-10, SVHN, and Tiny-ImageNet datasets show that the proposed module improves the performance of baseline networks by a margin of 3% on average.
We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architectures seamlessly with negligible overheads and is end-to-end trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.
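To make the two-step refinement concrete, here is a minimal PyTorch sketch of a CBAM-style module. The abstract only states that channel and spatial attention maps are inferred sequentially and multiplied onto the input; the dual average/max pooling, the shared MLP with a reduction ratio of 16, and the 7x7 spatial kernel below follow the published paper's design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP over both pooled descriptors
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))   # average-pool branch
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))    # max-pool branch
        return torch.sigmoid(avg + mx)                     # (B, C, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)                  # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)      # refine along the channel dimension first,
        return x * self.sa(x)   # then along the spatial dimension
```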
The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies, via aggregating query-specific global context to each query position. However, through a rigorous empirical analysis, we have found that the global contexts modeled by the non-local network are almost the same for different query positions within an image. In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation. We further observe that this simplified design shares a similar structure with the Squeeze-Excitation Network (SENet). Hence we unify them into a three-step general framework for global context modeling. Within the general framework, we design a better instantiation, called the global context (GC) block, which is lightweight and can effectively model the global context. The lightweight property allows us to apply it to multiple layers in a backbone network to construct a global context network (GCNet), which generally outperforms both the simplified NLNet and SENet on major benchmarks for various recognition tasks. The code and configurations are released at https://github.com/xvjiarui/GCNet.
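The three steps named above (context modeling, transform, fusion) can be sketched in PyTorch as follows. The softmax-pooled query-independent context, the bottleneck transform with LayerNorm, and the additive fusion follow the paper's GC block instantiation; the reduction ratio of 16 is a typical choice, not prescribed by the abstract.

```python
import torch
import torch.nn as nn

class GCBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # query-independent attention
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # (a) context modeling: one softmax-normalized spatial attention map
        #     shared by all query positions (the query-independent simplification).
        weights = self.attn(x).view(b, 1, h * w).softmax(dim=-1)            # (B, 1, HW)
        context = torch.bmm(x.view(b, c, h * w), weights.transpose(1, 2))  # (B, C, 1)
        context = context.view(b, c, 1, 1)
        # (b) transform via a bottleneck, then (c) fusion via broadcast addition.
        return x + self.transform(context)
```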
Standard convolutional neural network (CNN) designs rarely focus on the importance of explicitly capturing diverse features to enhance network performance. Instead, most existing methods follow the indirect approach of increasing or tuning network depth and width, which in many cases significantly increases the computational cost. Inspired by the biological visual system, we propose a diversified and adaptive attention convolutional network (DA$^2$-Net), which enables any feed-forward CNN to explicitly capture diverse features and to adaptively select and emphasize the most informative ones, effectively boosting the network's performance. DA$^2$-Net incurs negligible computational overhead and is designed to be easily integrated with any CNN architecture. We extensively evaluated DA$^2$-Net on benchmark datasets, including CIFAR-100, SVHN, and ImageNet, with various CNN architectures. The experimental results show that DA$^2$-Net provides significant performance improvement with very minimal computational overhead.
In this paper, we propose a general framework for image classification that uses an attention mechanism and global context, and that can be combined with various network architectures to improve their performance. To investigate the capability of the global context, we compare four mathematical models and observe that the global context encoded in a disentangled conditional generative model can provide more guidance, since "knowing what is task-irrelevant also means knowing what is relevant". Based on this observation, we define a novel disentangled global context (CDGC) and design a deep network to obtain it. By attending to the CDGC, the baseline networks can identify the objects of interest more accurately, thereby improving performance. We apply the framework to many different network architectures and compare it with the state of the art on four publicly available datasets. Extensive results demonstrate the effectiveness and superiority of our method. The code will be made publicly available upon acceptance of the paper.
We propose Self-Supervised Implicit Attention (SSIA), a new approach that adaptively guides deep neural network models to gain attention by exploiting the properties of the model itself. SSIA is a novel attention mechanism that does not require any extra parameters, computation, or memory access costs during inference, in contrast to existing attention mechanisms. In short, by regarding attention weights as high-level semantic information, we reconsider the implementations of existing attention mechanisms and further propose generating supervisory signals from higher network layers to guide lower network layers for parameter updates. We achieve this by building a self-supervised learning task using the hierarchical features of the network itself, which operates only during the training phase. To verify the effectiveness of SSIA, we implement a specific instantiation in convolutional neural network models (called the SSIA block) and validate it on several image classification datasets. The experimental results show that the SSIA block significantly improves model performance, even outperforming many popular attention methods that require additional parameters and computation costs, such as Squeeze-and-Excitation and the Convolutional Block Attention Module. Our implementation will be available on GitHub.
Channel and spatial attention mechanisms have proven to provide an evident performance boost for deep convolutional neural networks (CNNs). Most existing methods focus on only one of them, or run the two in parallel or in series, neglecting the collaboration between the two attentions. In order to better establish the feature interaction between the two types of attention, we propose a plug-and-play attention module, which we term "CAT": activating the Collaboration between spatial and channel Attentions based on learned Traits. Specifically, we represent traits as trainable coefficients (i.e., colla-factors) to adaptively combine the contributions of different attention modules, so as to better fit different image hierarchies and tasks. Moreover, in addition to the global average pooling (GAP) and global maximum pooling (GMP) operators, we propose global entropy pooling (GEP), an effective component for suppressing noise signals by measuring the information disorder of feature maps. We introduce this three-way pooling operation into the attention modules and apply an adaptive mechanism to fuse their outcomes, as sketched below. Extensive experiments on MS COCO, Pascal-VOC, CIFAR-100, and ImageNet show that our CAT outperforms existing state-of-the-art attention mechanisms in object detection, instance segmentation, and image classification. The model and code will be released soon.
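Below is a hedged PyTorch sketch of the three-way pooling idea. The abstract defines GEP only as measuring the information disorder of feature maps, so the spatial softmax normalization used here is an assumption, as is the softmax fusion of the learned colla-factors; this is an illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

def global_entropy_pooling(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Reduce (B, C, H, W) -> (B, C, 1, 1) via per-channel spatial entropy."""
    b, c, h, w = x.shape
    p = x.view(b, c, h * w).softmax(dim=-1)        # spatial distribution per channel (assumed)
    entropy = -(p * (p + eps).log()).sum(dim=-1)   # information disorder per channel
    return entropy.view(b, c, 1, 1)

class ThreeWayPooling(nn.Module):
    def __init__(self):
        super().__init__()
        self.colla = nn.Parameter(torch.ones(3))   # learned colla-factors

    def forward(self, x):
        gap = x.mean(dim=(2, 3), keepdim=True)
        gmp = x.amax(dim=(2, 3), keepdim=True)
        gep = global_entropy_pooling(x)
        w = self.colla.softmax(dim=0)              # adaptive fusion of the three outcomes
        return w[0] * gap + w[1] * gmp + w[2] * gep
```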
Semiconductor manufacturing is on the cusp of a revolution: the Internet of Things (IoT). With IoT we can connect all the equipment and feed information back to the factory so that quality issues can be detected. In this situation, more and more edge devices are used in wafer inspection equipment, and these edge devices must be able to detect defects quickly. Therefore, developing a high-efficiency architecture for automatic defect classification suitable for edge devices is the primary task. In this paper, we present a novel architecture that can perform defect classification more efficiently. The first function is self-proliferation, which uses a series of linear transformations to generate more feature maps at a cheaper cost. The second function is self-attention, which captures the long-range dependencies of feature maps via channel-wise and spatial-wise attention mechanisms. We name this method the Self-Proliferation-and-Attention neural network (SP&A-Net). The method has been successfully applied to various defect pattern classification tasks. Compared with other recent methods, SP&A-Net achieves higher accuracy and lower computation cost on many defect inspection tasks.
Recently, channel attention mechanisms have demonstrated great potential in improving the performance of deep convolutional neural networks (CNNs). However, most existing methods are dedicated to developing more sophisticated attention modules for better performance, which inevitably increases model complexity. To overcome the paradox of the performance-complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which involves only a handful of parameters while bringing a clear performance gain. By dissecting the channel attention module in SENet, we empirically show that avoiding dimensionality reduction is important for learning channel attention, and that appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. Therefore, we propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via 1D convolution. Furthermore, we develop a method to adaptively select the kernel size of the 1D convolution, which determines the coverage of local cross-channel interaction. The proposed ECA module is efficient yet effective: e.g., against a ResNet50 backbone, our module uses 80 parameters vs. 24.37M and 4.7e-4 GFLOPs vs. 3.86 GFLOPs, while boosting Top-1 accuracy by more than 2%. We extensively evaluate our ECA module on image classification, object detection, and instance segmentation with ResNet and MobileNetV2 backbones. The experimental results show that our module is more efficient while performing favorably against its counterparts.
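A minimal PyTorch sketch of the ECA module follows. The GAP-then-1D-convolution structure and the adaptive kernel-size rule, k = |log2(C)/gamma + b/gamma| rounded to the nearest odd number with gamma = 2 and b = 1, follow the published paper.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive kernel size: k = |log2(C)/gamma + b/gamma|, forced odd.
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        y = x.mean(dim=(2, 3))                        # GAP: (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)      # 1D conv across channels
        return x * torch.sigmoid(y)[..., None, None]  # no dimensionality reduction
```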
Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention. A related repository, https://github.com/MenghaoGuo/Awesome-Vision-Attentions, is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
Convolutional neural networks have enabled remarkable progress in single image super-resolution (SISR) over the past decade. Among recent advances in SISR, attention mechanisms have been crucial for high-performance SR models. However, it remains unclear why attention mechanisms work in SISR and how they work. In this work, we attempt to quantify and visualize attention mechanisms in SISR and show that not all attention modules are equally beneficial. We then propose an attention-in-attention network (A$^2$N) for more efficient and accurate SISR. Specifically, A$^2$N consists of a non-attention branch and a coupled attention branch. A dynamic attention module is proposed to generate weights for these two branches so as to dynamically suppress unwanted attention adjustments, where the weights change adaptively according to the input features. This allows attention modules to specialize on beneficial examples without otherwise penalties, thereby greatly improving the capacity of the attention network at little parameter overhead. Experimental results demonstrate that our final model, A$^2$N, achieves a superior trade-off compared with state-of-the-art networks of similar size. The code is available at https://github.com/haoyuc/A2N.
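The dynamic two-branch weighting can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact design: the internals of the attention and non-attention branches are simplified placeholders, and a softmax over two branch weights is one plausible reading of the dynamic attention module.

```python
import torch
import torch.nn as nn

class AttentionInAttention(nn.Module):
    """Sketch: input-dependent weighting of attention / non-attention branches."""
    def __init__(self, channels: int):
        super().__init__()
        self.non_attn = nn.Conv2d(channels, channels, 3, padding=1)  # plain branch
        self.attn = nn.Sequential(                    # simplified pixel-attention branch
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )
        self.dynamic = nn.Sequential(                 # dynamic attention module (assumed form)
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, 2),
        )

    def forward(self, x):
        w = self.dynamic(x).softmax(dim=1)            # (B, 2), adapts to the input features
        return (w[:, 0, None, None, None] * self.non_attn(x)
                + w[:, 1, None, None, None] * (x * self.attn(x)))
```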
Existing multi-scale solutions run the risk of merely increasing receptive field sizes while neglecting small receptive fields. Effectively constructing adaptive neural networks for recognizing objects at various spatial scales is therefore a challenging problem. To tackle this issue, we first introduce a new attention dimension, namely depth, in addition to the existing attention dimensions such as channel, spatial, and branch, and present a novel selective depth attention network to symmetrically handle multi-scale objects in various vision tasks. Specifically, the blocks within each stage of a given neural network, e.g., a ResNet, output hierarchical feature maps sharing the same resolution but with different receptive field sizes. Based on this structural property, we design a stage-wise building module, namely SDA, which comprises a trunk branch and an SE-like attention branch. The block outputs of the trunk branch are fused to guide their depth attention allocation through the attention branch. With the proposed attention mechanism, we can dynamically select different depth features, which helps adaptively adjust the receptive field sizes for variable-sized input objects. In this way, the cross-block information interaction produces long-range dependencies along the depth direction. Compared with other multi-scale approaches, our SDA method combines multiple receptive fields from previous blocks into the stage output, thus offering a wider and richer effective receptive field. Moreover, our method can serve as a pluggable module for other multi-scale networks as well as attention networks, coined as SDA-xNet. Their combination further extends the range of effective receptive fields, enabling interpretable neural networks. Our source code is available at https://github.com/QingbeiGuo/SDA-xNet.git.
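A hedged PyTorch sketch of the stage-wise SDA idea follows. The abstract specifies only a trunk branch whose block outputs are fused to guide an SE-like attention branch; the element-wise sum fusion and the softmax over per-block (depth) weights used here are assumptions.

```python
import torch
import torch.nn as nn

class SDA(nn.Module):
    """Sketch of stage-wise selective depth attention over n block outputs."""
    def __init__(self, channels: int, num_blocks: int, reduction: int = 16):
        super().__init__()
        self.num_blocks = num_blocks
        self.fc = nn.Sequential(                   # SE-like attention branch
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels * num_blocks),
        )

    def forward(self, feats):
        # feats: list of per-block outputs of one stage, each (B, C, H, W) with
        # the same resolution but different receptive field sizes.
        b, c, _, _ = feats[0].shape
        stacked = torch.stack(feats, dim=1)                  # (B, n, C, H, W)
        fused = stacked.sum(dim=1)                           # trunk fusion (assumed: sum)
        logits = self.fc(fused.mean(dim=(2, 3)))             # (B, C * n)
        weights = logits.view(b, self.num_blocks, c).softmax(dim=1)
        return (stacked * weights[..., None, None]).sum(dim=1)  # depth-wise selection
```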
In this paper, we build upon the weakly supervised generation mechanism of intermediate attention maps in any convolutional neural network, and disclose the effectiveness of attention modules more straightforwardly so as to fully exploit their potential. Given an existing neural network equipped with arbitrary attention modules, we introduce a meta critic network to evaluate the quality of the attention maps in the main network. Owing to the discreteness of our designed reward, the proposed learning method is arranged in a reinforcement learning setting, in which the attention actors and recurrent critics are alternately optimized to provide instant critique and revision of the temporary attention representation; hence it is coined Deep REinforced Attention Learning (DREAL). It can be applied universally to network architectures with different types of attention modules, and it promotes their expressive ability by maximizing the relative gain in the final recognition performance contributed by each individual attention module, as demonstrated by extensive experiments on both category and instance recognition benchmarks.
In recent years, channel attention mechanisms have been widely studied for their great potential in improving the performance of deep convolutional neural networks (CNNs). However, in most existing methods, only the output of the adjacent convolution layer is fed to the attention layer for computing channel weights; the information of other convolution layers is ignored. Motivated by these observations, a simple strategy, named Bridge Attention Net (BA-Net), is proposed to achieve better channel attention. The main idea of this design is to bridge the outputs of previous convolution layers into the channel weight generation through skip connections. BA-Net can not only provide richer features for computing channel weights in the forward pass, but also multiply the paths for parameter updates in the backward pass. Comprehensive evaluations demonstrate that the proposed approach achieves state-of-the-art performance compared with existing methods in terms of both accuracy and speed. Bridge attention provides a fresh perspective on neural network architecture design and shows great potential for improving the performance of existing channel attention mechanisms. The code is available at https://github.com/zhaoy376/
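The bridging idea can be sketched as below. This is an assumption-laden illustration rather than the paper's exact design: earlier layers' outputs are summarized by global average pooling, concatenated, and mapped to sigmoid channel weights for the current block.

```python
import torch
import torch.nn as nn

class BridgeAttention(nn.Module):
    """Sketch: channel weights computed from several earlier layers' outputs."""
    def __init__(self, in_channels_list, out_channels: int, reduction: int = 16):
        super().__init__()
        total = sum(in_channels_list)           # channels bridged in via skip connections
        self.fc = nn.Sequential(
            nn.Linear(total, out_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(out_channels // reduction, out_channels),
            nn.Sigmoid(),
        )

    def forward(self, x, bridged_feats):
        # x: output of the current block, (B, C_out, H, W).
        # bridged_feats: outputs of previous conv layers, each (B, C_i, H_i, W_i).
        descriptors = [f.mean(dim=(2, 3)) for f in bridged_feats]   # GAP: (B, C_i) each
        weights = self.fc(torch.cat(descriptors, dim=1))            # (B, C_out)
        return x * weights[..., None, None]
```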
Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources. The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that can fully reveal the information underlying the intrinsic features. The proposed Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. Ghost bottlenecks are designed to stack Ghost modules, and then the lightweight GhostNet can be easily established. Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative to convolution layers in baseline models, and our GhostNet can achieve higher recognition performance (e.g. 75.7% top-1 accuracy) than MobileNetV3 with similar computational cost on the ImageNet ILSVRC-2012 classification dataset. Code is available at https://github.com/huawei-noah/ghostnet.
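A minimal PyTorch sketch of a Ghost module follows. The pointwise primary convolution and the depthwise "cheap" transformations mirror the paper's design; the kernel size and the ratio of intrinsic to ghost maps are configurable, and out_channels is assumed divisible by ratio.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, ratio: int = 2,
                 cheap_kernel: int = 3):
        super().__init__()
        intrinsic = out_channels // ratio          # intrinsic feature maps
        self.primary = nn.Sequential(              # ordinary (costly) convolution
            nn.Conv2d(in_channels, intrinsic, 1, bias=False),
            nn.BatchNorm2d(intrinsic),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(                # cheap linear transforms (depthwise)
            nn.Conv2d(intrinsic, out_channels - intrinsic, cheap_kernel,
                      padding=cheap_kernel // 2, groups=intrinsic, bias=False),
            nn.BatchNorm2d(out_channels - intrinsic),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        intrinsic = self.primary(x)                # a few intrinsic maps
        ghost = self.cheap(intrinsic)              # many "ghost" maps at cheap cost
        return torch.cat([intrinsic, ghost], dim=1)
```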
In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well known in the neuroscience community that the receptive field sizes of visual cortical neurons are modulated by the stimulus, which has rarely been considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called the Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked into a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects at different scales, which verifies the capability of neurons to adaptively adjust their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.
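The split-fuse-select mechanism can be sketched as follows. The two branches (a 3x3 and a dilated 3x3 standing in for a 5x5) and the softmax selection across branches follow the paper; the grouped convolutions and other efficiency details of the published SK unit are omitted for clarity.

```python
import torch
import torch.nn as nn

class SKUnit(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Split: two branches with different receptive fields.
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
        d = max(channels // reduction, 32)
        self.fc_z = nn.Sequential(nn.Linear(channels, d), nn.ReLU(inplace=True))
        self.fc_attn = nn.Linear(d, channels * 2)     # per-branch selection logits

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)
        # Fuse: element-wise sum, GAP, and a compact descriptor z.
        z = self.fc_z((u3 + u5).mean(dim=(2, 3)))
        b, c = x.shape[0], x.shape[1]
        # Select: softmax across branches per channel, then weighted sum.
        attn = self.fc_attn(z).view(b, 2, c).softmax(dim=1)
        return attn[:, 0, :, None, None] * u3 + attn[:, 1, :, None, None] * u5
```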
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections, one between each layer and its subsequent layer, our network has L(L+1)/2 direct connections. For each layer, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs to all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
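A minimal PyTorch sketch of one dense block follows, showing how concatenating all preceding feature maps yields the L(L+1)/2 direct connections counted above. The BN-ReLU-Conv ordering follows the paper; bottleneck (DenseNet-B) and compression layers are omitted.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Sketch of dense connectivity: each layer sees all preceding feature maps."""
    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate   # grows with concatenation
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, 3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Each layer takes the concatenation of all previous feature maps,
            # giving L(L+1)/2 direct connections in an L-layer block.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```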
Attention mechanisms have raised significant interest in the research community because they promise to improve the performance of neural network architectures. However, for any specific problem, we still lack a principled approach for choosing the specific mechanisms and hyperparameters that lead to guaranteed improvements. More recently, self-attention has been proposed and widely used in transformer-like architectures, leading to significant breakthroughs in some applications. In this work, we focus on two forms of attention mechanisms: attention modules and self-attention. Attention modules are used to reweight the features of each layer's input tensor, and different modules reweight them in different ways, in fully connected or convolutional layers. The attention models studied are fully modular, and in this work they are used with the popular ResNet architecture. Self-attention, originally proposed in the area of natural language processing, makes it possible to relate all the items in an input sequence. Self-attention is becoming increasingly popular in computer vision, where it is sometimes combined with convolutional layers, although some recent architectures do away with convolutions entirely. In this work, we study and perform an objective comparison of a number of different attention mechanisms on a specific computer vision task, the classification of samples in the widely used Skin Cancer MNIST dataset. The results show that attention modules do sometimes improve the performance of convolutional neural network architectures, but also that this improvement, although noticeable and statistically significant, is not consistent across different settings. The results obtained with self-attention mechanisms, on the other hand, show consistent and significant improvements, leading to the best results even in architectures with a reduced number of parameters.
In this work, we propose the "Residual Attention Network", a convolutional neural network using an attention mechanism that can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion. Our Residual Attention Network is built by stacking Attention Modules which generate attention-aware features. The attention-aware features from different modules change adaptively as layers go deeper. Inside each Attention Module, a bottom-up top-down feedforward structure is used to unfold the feedforward and feedback attention process into a single feedforward process. Importantly, we propose attention residual learning to train very deep Residual Attention Networks which can be easily scaled up to hundreds of layers. Extensive analyses are conducted on the CIFAR-10 and CIFAR-100 datasets to verify the effectiveness of every module mentioned above. Our Residual Attention Network achieves state-of-the-art object recognition performance on three benchmark datasets, including CIFAR-10 (3.90% error), CIFAR-100 (20.45% error), and ImageNet (4.8% single-model, single-crop top-5 error). Notably, our method achieves a 0.6% top-1 accuracy improvement over ResNet-200 with only 46% of the trunk depth and 69% of the forward FLOPs. The experiments also demonstrate that our network is robust against noisy labels.
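The attention residual learning rule can be sketched directly: the soft mask M(x) modulates the trunk features F(x) as (1 + M(x)) * F(x), so that a near-zero mask leaves the trunk signal intact instead of suppressing it. In the sketch below, the trunk and mask sub-networks are placeholders; the paper builds the mask with a bottom-up top-down (hourglass) structure.

```python
import torch
import torch.nn as nn

class AttentionResidual(nn.Module):
    """Sketch of attention residual learning: H(x) = (1 + M(x)) * F(x)."""
    def __init__(self, trunk: nn.Module, mask: nn.Module):
        super().__init__()
        self.trunk = trunk   # F(x): ordinary feature branch
        self.mask = mask     # M(x): soft mask branch, squashed to [0, 1] below

    def forward(self, x):
        f = self.trunk(x)
        m = torch.sigmoid(self.mask(x))
        # The identity term keeps good features from being degraded by the
        # mask, which is what allows very deep stacks of Attention Modules.
        return (1 + m) * f
```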
In this paper, we present a robust, low-complexity deep learning model for Remote Sensing Image Classification (RSIC), the task of identifying the scene of a remote sensing image. In particular, we first evaluate different low-complexity benchmark deep neural networks: MobileNetV1, MobileNetV2, NASNetMobile, and EfficientNetB0, each with fewer than 5 million (M) trainable parameters. After identifying the best network architecture, we further improve network performance by applying attention schemes to multiple feature maps extracted from the middle layers of the network. To address the increase in model footprint caused by the attention schemes, we apply a quantization technique to keep the number of trainable parameters below 5 M. By conducting extensive experiments on the benchmark dataset NWPU-RESISC45, we achieve a robust, low-complexity model that is very competitive with state-of-the-art systems and has potential for real-life applications on edge devices.