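In recent years, convolutional neural networks (CNNs) have shown great potential for synthetic aperture radar (SAR) target recognition. SAR images have a strong sense of granularity and contain distinct texture features, such as speckle noise, dominant target scatterers, and target contours, which are rarely considered in traditional CNN models. This paper proposes two residual blocks with multiscale receptive fields (RFs), namely EMC2A blocks, based on a multibranch structure, and then designs an efficient isotopic-architecture deep CNN (DCNN), EMC2A-Net. The EMC2A blocks use parallel dilated convolutions with different dilation rates, which effectively capture multiscale context features without significantly increasing the computational burden. To further improve the efficiency of multiscale feature fusion, this paper proposes a multiscale feature cross-channel attention module, namely the EMC2A module, which adopts a local multiscale feature interaction strategy without dimensionality reduction. This strategy adaptively adjusts the weight of each channel through an efficient one-dimensional (1D) circular convolution and a sigmoid function to guide attention at the global channel-wise level. Comparative results on the MSTAR dataset show that EMC2A-Net outperforms existing models of the same type and has a relatively lightweight network structure. Ablation results show that the EMC2A module significantly improves model performance using only a few parameters and appropriate cross-channel interactions.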
Translated by Google Translate
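The channel-attention step described in the EMC2A-Net abstract above (a 1D circular convolution across channel descriptors followed by a sigmoid) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the EMC2A implementation: the function names, the use of global average pooling to obtain one descriptor per channel, and the kernel values are all assumptions made for the example.

```python
import math


def circular_conv1d(values, kernel):
    """1D convolution along the channel axis with wrap-around (circular) padding.

    An odd-length kernel is assumed so that it is centred on each channel.
    """
    n, k = len(values), len(kernel)
    half = k // 2
    return [
        sum(kernel[j] * values[(i + j - half) % n] for j in range(k))
        for i in range(n)
    ]


def channel_attention(feature_maps, kernel):
    """Reweight channels; feature_maps is a list of channels, each an H x W nested list."""
    # One descriptor per channel via global average pooling (assumed squeeze step).
    descriptors = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Local cross-channel interaction: the channel count is never reduced.
    mixed = circular_conv1d(descriptors, kernel)
    # Sigmoid maps each mixed descriptor to a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    # Scale every value of a channel by that channel's weight.
    scaled = [
        [[w * v for v in row] for row in channel]
        for channel, w in zip(feature_maps, weights)
    ]
    return weights, scaled


if __name__ == "__main__":
    # Four hypothetical 2x2 channels and a hypothetical 3-tap kernel.
    maps = [[[float(c)] * 2] * 2 for c in range(4)]
    weights, _ = channel_attention(maps, [0.25, 0.5, 0.25])
    print(weights)
```

Because the kernel slides over neighbouring channels only, the number of learnable weights equals the kernel length, which is consistent with the abstract's claim that the module needs only a few parameters.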
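Effective early detection of potato late blight (PLB) is an essential aspect of potato cultivation. However, detecting late blight at an early stage in the field with conventional imaging approaches is challenging because of the lack of visual cues at the canopy level. Hyperspectral imaging can capture spectral signals from a wide range of wavelengths, including those outside the visible range. In this context, a deep learning classification architecture for hyperspectral images is proposed that combines a 2D convolutional neural network (2D-CNN) and a 3D-CNN with a deep cooperative attention network (PLB-2D-3D-A). First, the 2D-CNN and 3D-CNN are used to extract rich spectral-spatial features; then an attention block and SE-ResNet are used to emphasize the salient features in the feature maps and to improve the generalization ability of the model. The dataset consists of 15,360 images (64x64x204), cropped from 240 raw images captured in an experimental field with more than 20 potato genotypes. Accuracy on the test dataset of 2000 images reached 0.739 with the full band and 0.790 with specific bands (492nm, 519nm, 560nm, 592nm, 717nm and 765nm). This study shows encouraging results for early detection of PLB with deep learning and proximal hyperspectral imaging.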
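Existing multi-scale solutions risk merely increasing the receptive field size while neglecting small receptive fields. Effectively constructing adaptive neural networks that recognize objects at various spatial scales is therefore a challenging problem. To tackle this issue, we first introduce a new attention dimension, depth, in addition to existing attention dimensions such as channel, spatial, and branch, and present a novel selective depth attention network that symmetrically handles multi-scale objects in various vision tasks. Specifically, the blocks within each stage of a given neural network, e.g., ResNet, output hierarchical feature maps that share the same resolution but have different receptive field sizes. Based on this structural property, we design a stage-wise building module, namely SDA, which includes a trunk branch and an SE-like attention branch. The block outputs of the trunk branch are fused to guide their depth attention allocation through the attention branch. With the proposed attention mechanism, we can dynamically select features from different depths, which helps to adaptively adjust the receptive field size for variable-sized input objects. In this way, cross-block information interaction leads to long-range dependencies along the depth direction. Compared with other multi-scale approaches, our SDA method combines multiple receptive fields from previous blocks into the stage output, thus offering a wider and richer range of effective receptive fields. Moreover, our method can serve as a pluggable module for other multi-scale networks as well as attention networks, coined SDA-$x$Net. Their combination further extends the range of effective receptive fields and enables interpretable neural networks. Our source code is available at \url{https://github.com/qingbeiguo/sda-xnet.git}.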
In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well known in the neuroscience community that the receptive field sizes of visual cortical neurons are modulated by the stimulus, a property that has rarely been considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called the Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attention weights on these branches yield different effective receptive field sizes for neurons in the fusion layer. Multiple SK units are stacked into a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects at different scales, which verifies the capability of neurons to adaptively adjust their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.
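The fusion step of the SK unit (softmax attention across branches, applied per channel) can be illustrated with a small sketch. This is a simplified pure-Python example under stated assumptions, not the SKNet code: `selective_fuse` is a hypothetical name, each branch is reduced to one value per channel, and the attention logits are passed in directly rather than being computed from the branch features by learned layers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def selective_fuse(branch_features, branch_logits):
    """Fuse branches channel by channel with softmax attention.

    branch_features[b][c]: value of channel c produced by branch b
                           (each branch stands for one kernel size).
    branch_logits[b][c]:   attention logit for branch b and channel c
                           (assumed given here; learned in a real network).
    """
    num_branches = len(branch_features)
    num_channels = len(branch_features[0])
    fused, attention = [], []
    for c in range(num_channels):
        # The softmax runs across branches, so weights for one channel sum to 1.
        weights = softmax([branch_logits[b][c] for b in range(num_branches)])
        attention.append(weights)
        fused.append(
            sum(weights[b] * branch_features[b][c] for b in range(num_branches))
        )
    return fused, attention


if __name__ == "__main__":
    # Two hypothetical branches (e.g. 3x3 and 5x5 kernels), two channels.
    features = [[1.0, 2.0], [3.0, 4.0]]
    logits = [[2.0, 0.0], [0.0, 2.0]]
    fused, attention = selective_fuse(features, logits)
    print(fused, attention)
```

A channel whose logits favour the large-kernel branch ends up dominated by that branch's output, which is how different attention weights translate into different effective receptive field sizes.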
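With the development of deep learning, single-image super-resolution (SISR) has achieved significant breakthroughs. Recently, methods that improve the performance of SISR networks through global feature interactions have been proposed. However, the ability of neurons to adjust their function dynamically in response to context has been neglected. To address this issue, we propose a lightweight Cross-receptive Focused Inference Network (CFIN), a hybrid network composed of a convolutional neural network (CNN) and a Transformer. Specifically, a novel Cross-receptive Field Guided Transformer (CFGT) is designed to adaptively modify the network weights by using modulated convolution kernels combined with local representative semantic information. In addition, a CNN-based Cross-scale Information Aggregation Module (CIAM) is proposed to help the model focus on potentially useful information and to improve the efficiency of the Transformer stage. Extensive experiments show that the proposed CFIN is a lightweight and efficient SISR model that achieves a good balance between computational cost and model performance.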
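The acquisition and evaluation of pavement surface data play an essential role in pavement condition assessment. In this paper, an efficient end-to-end network for automatic pavement crack segmentation, called RHA-Net, is proposed to improve pavement crack segmentation accuracy. RHA-Net is built by integrating residual blocks (ResBlocks) and hybrid attention blocks into an encoder-decoder architecture. The ResBlocks improve the ability of RHA-Net to extract high-level abstract features. The hybrid attention blocks are designed to fuse low-level and high-level features to help the model focus on the correct channels and crack regions, thereby improving the feature representation ability of RHA-Net. An image dataset containing 789 pavement crack images collected by a self-designed mobile robot is constructed and used to train and evaluate the proposed model. The proposed model achieves better performance than other state-of-the-art networks, and the contributions of the residual blocks and the hybrid attention mechanism are validated in a comprehensive ablation study. In addition, a lightweight version of the model, obtained by introducing depthwise separable convolutions, achieves better performance and much faster processing with 1/30 of the number of parameters of U-Net. The developed system can segment pavement cracks in real time on a Jetson TX2 embedded device (25 FPS). The video taken during the real-time experiments is released at https://youtu.be/3xiogk0fig4.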
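Why depthwise separable convolutions shrink a model can be seen from a parameter count for a single layer. The sketch below uses the standard counting formulas with biases ignored; the layer sizes are hypothetical and are not taken from RHA-Net, so the ratio it prints is not the 1/30 figure quoted in the abstract.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2D convolution (bias omitted)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one spatial filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
    standard = standard_conv_params(3, 64, 128)
    separable = depthwise_separable_params(3, 64, 128)
    print(standard, separable, standard / separable)
```

The saving grows with the kernel size and the number of output channels, since the spatial filtering is no longer repeated for every output channel.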
Semantic segmentation of UAV aerial remote sensing images offers a more efficient and convenient alternative to traditional surveying and mapping. To make the model lightweight while improving accuracy, this research develops a new lightweight and efficient network for extracting ground features from UAV aerial remote sensing images, called LDMCNet. This research also develops a powerful lightweight backbone network for the proposed semantic segmentation model, called LDCNet, which is intended to serve as a backbone for a new generation of lightweight semantic segmentation algorithms. The proposed model uses dual multi-scale context modules, namely the Atrous Spatial Pyramid Pooling (ASPP) module and the Object Context Representation (OCR) module. In addition, this research constructs a private dataset for semantic segmentation of aerial remote sensing images from drones, containing 2431 training, 945 validation, and 475 test samples. The proposed model performs well on this dataset: with only 1.4M parameters and 5.48G floating-point operations (FLOPs), it achieves a mean intersection-over-union (mIoU) of 71.12%, which is 7.88% higher than the baseline model. To verify the effectiveness of the proposed model, it was also trained on the public datasets "LoveDA" and "CITY-OSM", achieving mIoU of 65.27% and 74.39%, respectively.
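The mIoU metric reported above is commonly computed from a confusion matrix as the average of the per-class intersection-over-union. The following is a minimal sketch of that common definition; the function name and the example matrix are hypothetical, and the abstract does not state whether the authors handle absent classes in the same way.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[t][p] is the number of pixels of true class t predicted as class p.
    Classes that never occur in the labels or predictions are skipped
    (an assumption; evaluation protocols differ on this point).
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_positive = confusion[c][c]
        false_negative = sum(confusion[c]) - true_positive
        false_positive = sum(confusion[r][c] for r in range(num_classes)) - true_positive
        union = true_positive + false_positive + false_negative
        if union > 0:
            ious.append(true_positive / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix (pixel counts).
    print(mean_iou([[3, 1], [1, 5]]))
```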
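The availability of the sheer volume of Copernicus Sentinel imagery has created new opportunities for land use land cover (LULC) mapping at large scales using deep learning. However, training on such large datasets is a non-trivial task. In this work we experiment with the BigEarthNet dataset for LULC image classification and benchmark different state-of-the-art models, including convolutional neural networks, multi-layer perceptrons, visual transformers, EfficientNets, and Wide Residual Network (WRN) architectures. Our aim is to balance classification accuracy, training time, and inference rate. We propose a framework based on EfficientNets for compound scaling of WRNs in terms of network depth, width, and input data resolution, in order to train and test different model setups efficiently. We design a novel scaled WRN architecture enhanced with an efficient channel attention mechanism. Our proposed lightweight model has fewer trainable parameters, achieves a 4.5% higher average F-score classification accuracy across all 19 LULC classes, and trains two times faster than the state-of-the-art ResNet50 model that we use as a baseline. We provide access to more than 50 trained models, along with our code for distributed training on multiple GPU nodes.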
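Traffic sign detection is a challenging task for unmanned driving systems, especially with respect to detecting multi-scale targets and achieving real-time detection. During traffic sign detection, the scale of the targets varies greatly, which affects detection accuracy. Feature pyramids are widely used to address this problem, but they may break the feature consistency across traffic signs of different scales. Moreover, in practical applications, it is difficult for common methods to improve the detection accuracy of multi-scale traffic signs while ensuring real-time detection. In this paper, we propose an improved feature pyramid model, named AF-FPN, which uses an adaptive attention module (AAM) and a feature enhancement module (FEM) to reduce information loss during feature map generation and to enhance the representation ability of the feature pyramid. We replace the original feature pyramid network in YOLOv5 with AF-FPN, which improves the detection performance of the YOLOv5 network on multi-scale targets while maintaining real-time detection. Furthermore, a new automatic learning data augmentation method is proposed to enrich the dataset and improve the robustness of the model, making it more suitable for practical scenarios. Extensive experimental results on the Tsinghua-Tencent 100K (TT100K) dataset demonstrate the effectiveness and superiority of the proposed method compared with several state-of-the-art methods.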
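Land cover classification is a multi-class segmentation task that assigns each pixel to a natural or man-made category of the earth's surface, such as water, soil, natural vegetation, crops, and human infrastructure. Limited by hardware computational resources and memory capacity, most existing studies preprocess the original remote sensing images by downsampling them or cropping them into small patches of less than 512*512 pixels before sending them to a deep neural network. However, downsampling images causes a loss of spatial detail, makes small segments hard to discriminate, and reverses the progress in spatial resolution achieved through decades of effort. Cropping images into small patches causes a loss of long-range context information, and restoring the predicted results to their original size introduces extra latency. In response to these weaknesses, we present an efficient lightweight semantic segmentation network termed MKANet. Targeting the characteristics of top-view high-resolution remote sensing imagery, MKANet uses shared kernels to handle ground segments of inconsistent scales simultaneously and equally, and employs a parallel and shallow architecture to boost inference speed and to support image patches more than 10X larger. To enhance the discrimination of boundaries and small segments, we also propose a method that captures category impurity areas, exploits boundary information, and imposes an extra penalty on misjudged boundaries and small segments. Both visual interpretations and quantitative metrics from extensive experiments demonstrate that MKANet achieves state-of-the-art accuracy on two land-cover classification datasets and infers 2X faster than other competitive lightweight networks. All these merits highlight the potential of MKANet in practical applications.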
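Recently, convolutional neural network (CNN) techniques have gained popularity as a tool for hyperspectral image classification (HSIC). To improve the feature extraction efficiency of HSIC when samples are limited, current methods generally use deep models with many layers. However, deep network models are prone to overfitting and vanishing gradients when samples are limited. In addition, spatial resolution decreases severely with increasing depth, which is very detrimental to spatial edge feature extraction. Therefore, this letter proposes a shallow model for HSIC, called the depthwise over-parameterized convolutional neural network (DOCNN). To ensure effective feature extraction by the shallow model, the depthwise over-parameterized convolution (DO-Conv) kernel is introduced to extract discriminative features. The DO-Conv kernel is composed of a standard convolution kernel and a depthwise convolution kernel, which can extract the spatial features of different channels individually and fuse the spatial features of all channels simultaneously. Moreover, to further reduce the loss of spatial edge features caused by the convolution operation, a dense residual connection (DRC) structure is proposed and applied to the feature extraction part of the whole network. Experimental results on three benchmark datasets show that the proposed method outperforms other state-of-the-art methods in terms of classification accuracy and computational efficiency.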
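One way to picture how a depthwise kernel and a standard kernel can act as a single kernel is to fold them together per input channel. The sketch below assumes the composition takes the form folded[o][p][c] = sum over d of depthwise[p][d][c] * conventional[o][d][c], where p indexes spatial positions and d an extra depth dimension; this form, the index layout, and the name `fold_kernels` are assumptions for illustration and are not stated in the abstract.

```python
def fold_kernels(depthwise, conventional):
    """Fold a depthwise kernel and a conventional kernel into one kernel.

    depthwise[p][d][c]:    spatial position p, extra depth index d, input channel c
    conventional[o][d][c]: output channel o, extra depth index d, input channel c
    Returns folded[o][p][c], which has the shape of an ordinary convolution kernel
    with its spatial positions flattened into the index p.
    """
    num_positions = len(depthwise)
    depth_mul = len(depthwise[0])
    in_channels = len(depthwise[0][0])
    out_channels = len(conventional)
    return [
        [
            [
                sum(depthwise[p][d][c] * conventional[o][d][c] for d in range(depth_mul))
                for c in range(in_channels)
            ]
            for p in range(num_positions)
        ]
        for o in range(out_channels)
    ]


if __name__ == "__main__":
    # Hypothetical sizes: 2 spatial positions, depth multiplier 2, 1 input channel.
    identity_depthwise = [[[1.0], [0.0]], [[0.0], [1.0]]]
    conventional = [[[5.0], [7.0]]]
    # With an identity depthwise kernel the folded kernel equals the conventional one.
    print(fold_kernels(identity_depthwise, conventional))
```

Under this assumption the extra parameters exist only during training, because the folded kernel has the same size as an ordinary one and can be used directly at inference time.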
Learning-based infrared small object detection methods currently rely heavily on classification backbone networks. This tends to result in the loss of tiny objects and limited feature distinguishability as the network depth increases. Furthermore, small objects in infrared images frequently appear either bright or dark, which places severe demands on obtaining precise object contrast information. For this reason, we propose a simple and effective ``U-Net in U-Net'' framework, UIU-Net for short, to detect small objects in infrared images. As the name suggests, UIU-Net embeds a tiny U-Net into a larger U-Net backbone, enabling multi-level and multi-scale representation learning of objects. Moreover, UIU-Net can be trained from scratch, and the learned features can effectively enhance global and local contrast information. More specifically, the UIU-Net model is divided into two modules: the resolution-maintenance deep supervision (RM-DS) module and the interactive-cross attention (IC-A) module. RM-DS integrates Residual U-blocks into a deep supervision network to generate deep multi-scale resolution-maintenance features while learning global context information. Further, IC-A encodes the local context information between low-level details and high-level semantic features. Extensive experiments conducted on two infrared single-frame image datasets, i.e., the SIRST and Synthetic datasets, show the effectiveness and superiority of the proposed UIU-Net in comparison with several state-of-the-art infrared small object detection methods. The proposed UIU-Net also generalizes well to video sequence infrared small object datasets, e.g., the ATR ground/air video sequence dataset. The code for this work is openly available at \url{https://github.com/danfenghong/IEEE_TIP_UIU-Net}.
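Benefiting from the rapid development of deep learning, many CNN-based image super-resolution methods have emerged and achieved better results. However, most algorithms struggle to adapt to spatial regions and channel features at the same time, let alone to exchange information between them. In addition, the exchange of information between attention modules has received even less attention from researchers. To solve these problems, we propose a lightweight spatial-channel adaptive coordination multilevel refinement enhancement network (MREN). Specifically, we construct a spatial-channel adaptive coordination block, which enables the network to learn the information of interest in spatial regions and channel features under different receptive fields. In addition, information at corresponding feature processing levels is exchanged between the spatial part and the channel part with the help of skip connections to achieve coordination between the two. We establish a communication bridge between attention modules through a simple linear combination operation, so as to guide the network more accurately and continuously toward the information of interest. Extensive experiments on several standard test sets show that our MREN achieves superior performance over other advanced algorithms with a very small number of parameters and very low computational complexity.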
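Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on the features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them into channel attention, spatial attention, temporal attention, and branch attention. A related repository, https://github.com/menghaoguo/awesome-vision-tions, is dedicated to collecting related work. We also suggest future directions for research on attention mechanisms.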
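The rapid development of self-supervised learning has lowered the bar for learning feature representations from massive unlabeled data and has triggered a series of studies on change detection in remote sensing images. The challenges in adapting self-supervised learning from natural image classification to remote sensing change detection arise from the differences between the two tasks. The learned patch-level feature representations are not satisfactory for precise pixel-level change detection. In this paper, we propose a novel pixel-level self-supervised hyperspectral spatial-spectral understanding network (HyperNet) that produces pixel-wise feature representations for effective hyperspectral change detection. Specifically, whole images rather than patches are fed into the network, and the multi-temporal spatial-spectral features are compared pixel by pixel. Instead of processing the two-dimensional imaging space and the spectral response dimension jointly, a powerful spatial-spectral attention module is proposed to explore the spatial correlation and the discriminative spectral features of multi-temporal hyperspectral images (HSIs) separately. Only positive samples at the same location of the bi-temporal HSIs are created and forced to be aligned, with the aim of learning features that are invariant to spectral differences. Moreover, a new similarity loss function is proposed to address the imbalance between easy and hard positive samples, in which the weights of hard samples are enlarged and highlighted to promote network training. Six hyperspectral datasets are used to test the validity and generalization of the proposed HyperNet. Extensive experiments demonstrate that HyperNet outperforms state-of-the-art algorithms on downstream hyperspectral change detection tasks.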
X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to process X-ray images automatically, and several machine learning-based object (anomaly) detection, classification, and segmentation methods have recently been employed in X-ray image analysis. Because of the high potential of deep learning in related image processing applications, it has been used in most of these studies. This survey reviews recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications, and covers the applications, techniques, evaluation metrics, datasets, and performance comparisons of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.
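Distinguishing between computer-generated (CG) images and natural photographic (PG) images is of great importance for verifying the authenticity and originality of digital images. However, recent cutting-edge generation methods achieve high synthesis quality in CG images, which makes this challenging task even trickier. To address this issue, a joint learning strategy with deep texture and high-frequency features is proposed for CG image detection. We first formulate and deeply analyze the different acquisition processes of CG and PG images. Based on the finding that the multiple different modules in image acquisition lead to inconsistent sensitivities to convolutional neural network (CNN)-based rendering in images, we propose a deep texture rendering module for texture difference enhancement and discriminative texture representation. Specifically, a semantic segmentation map is generated to guide an affine transformation operation, which is used to recover the texture in different regions of the input image. Then, the combination of the original image and the high-frequency components of the original and rendered images is fed into a multi-branch neural network equipped with attention mechanisms, which refine intermediate features and facilitate trace exploration in the spatial and channel dimensions, respectively. Extensive experiments on two public datasets and a newly constructed dataset with more realistic and diverse images show that the proposed approach outperforms existing methods by a clear margin. In addition, the results demonstrate the robustness and generalization ability of the proposed approach with respect to post-processing operations and images generated by generative adversarial networks (GANs).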
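Labeling medical images depends on professional knowledge, which makes it difficult to acquire a large quantity of high-quality annotated medical images in a short time. Making good use of the limited labeled samples in a small dataset to build a high-performance model is therefore the key to medical image classification. In this paper, we propose a deeply supervised Layer Selective Attention Network (LSANet), which comprehensively uses label information in feature-level and prediction-level supervision. For feature-level supervision, in order to better fuse low-level and high-level features, we propose a novel visual attention module, Layer Selective Attention (LSA), which focuses on feature selection across different layers. LSA introduces a weight allocation scheme that dynamically adjusts the weighting factor of each auxiliary branch throughout training to further enhance deeply supervised learning and ensure its generalization. For prediction-level supervision, we adopt a knowledge synergy strategy to promote hierarchical information interaction among all supervision branches via pairwise knowledge matching. Using the public dataset MedMNIST, a large-scale benchmark for biomedical image classification covering diverse medical specialties, we evaluate LSANet on multiple mainstream CNN architectures and various visual attention modules. The experimental results show substantial improvements of the proposed method over its counterparts, demonstrating that LSANet can provide a promising solution for label-efficient learning in medical image classification.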
Recently, great progress has been made in single-image super-resolution (SISR) based on deep learning. However, existing methods usually incur a large computational cost. Meanwhile, the activation function causes some features of the intermediate layers to be lost. It is therefore a challenge to make the model lightweight while reducing the impact of intermediate feature loss on reconstruction quality. In this paper, we propose a Feature Interaction Weighted Hybrid Network (FIWHN) to alleviate this problem. Specifically, FIWHN consists of a series of novel Wide-residual Distillation Interaction Blocks (WDIB) as the backbone, where every three WDIBs form a Feature Shuffle Weighted Group (FSWG) through mutual information mixing and fusion. In addition, to mitigate the adverse effects of intermediate feature loss on the reconstruction results, we introduce well-designed Wide Convolutional Residual Weighting (WCRW) and Wide Identical Residual Weighting (WIRW) units in the WDIB, and effectively cross-fuse features of different granularities through a Wide-residual Distillation Connection (WRDC) framework and a Self-Calibrating Fusion (SCF) unit. Finally, to supply the global features that the CNN model lacks, we introduce the Transformer into our model and explore a new way of combining the CNN and the Transformer. Extensive quantitative and qualitative experiments on low-level and high-level tasks show that the proposed FIWHN achieves a good balance between performance and efficiency and is better suited to downstream tasks in low-pixel scenarios.
Dunhuang murals combine Chinese style and national style, forming a self-contained Chinese-style Buddhist art. They have very high historical and cultural value and research significance. The lines of the Dunhuang murals in particular are highly generalized and expressive, reflecting each figure's distinctive character and complex inner emotions. The outline drawing of the murals is therefore of great significance to research on Dunhuang culture. Contour generation for Dunhuang murals is an image edge detection task, an important branch of computer vision that aims to extract salient contour information from images. Convolution-based deep learning networks have achieved good results in image edge extraction by exploring the contextual and semantic features of images. However, as the receptive field enlarges, some local detail information is lost, which makes it impossible for them to generate reasonable outline drawings of murals. In this paper, we propose a novel edge detector based on self-attention combined with convolution to generate line drawings of Dunhuang murals. First, a new residual self-attention and convolution mixed module (Ramix) is proposed to fuse local and global features in feature maps. Second, a novel densely connected backbone extraction network is designed to efficiently propagate rich edge feature information from shallow layers into deep layers. Experiments on different public datasets show that our method generates sharper and richer edge maps than existing methods. In addition, testing on the Dunhuang mural dataset shows that our method achieves very competitive performance.