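Building detection and change detection using remote sensing images can help urban and rescue planning. They can also be used for building damage assessment after natural disasters. Currently, most existing models for building detection use only one image (the pre-disaster image) to detect buildings. This is based on the idea that post-disaster images reduce model performance because of the presence of destroyed buildings. In this paper, we propose a Siamese model, called SiamixFormer, which uses pre- and post-disaster images as input. Our model has two encoders and a hierarchical transformer architecture. The output of each stage in both encoders is given to a temporal transformer for feature fusion, such that the query is generated from the pre-disaster image and the (key, value) pair is generated from the post-disaster image. In this way, temporal features are also considered in feature fusion. Another advantage of using temporal transformers in feature fusion is that, compared with CNNs, they can better maintain the large receptive fields produced by the transformer encoders. Finally, at each stage, the output of the temporal transformer is given to a simple MLP decoder. The SiamixFormer model is evaluated on the xBD and WHU datasets for building detection and on the LEVIR-CD and CDD datasets for change detection, and it outperforms the state of the art.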
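The query/key/value fusion described in this abstract can be illustrated with a small single-head cross-attention sketch. This is not the authors' implementation: it assumes queries come from pre-disaster features and keys/values from post-disaster features, and the function name `temporal_cross_attention`, the projection matrices, and all sizes are hypothetical choices made only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_cross_attention(pre_feat, post_feat, w_q, w_k, w_v):
    """Fuse two feature maps flattened to (num_tokens, channels).

    Assumption: queries are projected from the pre-disaster features,
    keys and values from the post-disaster features.
    """
    q = pre_feat @ w_q                        # (N, D)
    k = post_feat @ w_k                       # (N, D)
    v = post_feat @ w_v                       # (N, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, N) scaled dot products
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ v, attn

# Hypothetical sizes: 16 tokens, 8 input channels, 4 attention dimensions.
rng = np.random.default_rng(0)
n_tokens, channels, dim = 16, 8, 4
pre = rng.normal(size=(n_tokens, channels))
post = rng.normal(size=(n_tokens, channels))
w_q, w_k, w_v = (rng.normal(size=(channels, dim)) for _ in range(3))
fused, attn = temporal_cross_attention(pre, post, w_q, w_k, w_v)
```

Because every query token attends to every token of the other image, the fused output at one location can draw on the whole scene, which matches the receptive-field argument made in the abstract.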
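This paper presents DAHiTrA, a novel deep learning model with hierarchical transformers that classifies building damage from satellite images in the aftermath of hurricanes. Automated building damage assessment provides critical information for decision making and resource allocation in rapid emergency response. Satellite imagery provides real-time, high-coverage information and offers an opportunity to inform large-scale post-disaster building damage assessment. In addition, deep learning methods have shown promise in classifying building damage. In this work, a novel transformer-based network is proposed for assessing building damage. The network leverages hierarchical spatial features at multiple resolutions and captures temporal differences in the feature domain after applying a transformer encoder to the spatial features. The network achieves state-of-the-art performance when tested on a large-scale disaster damage dataset (xBD) for building localization and damage classification, as well as on the LEVIR-CD dataset for change detection. In addition, we introduce a new high-resolution satellite imagery dataset, Ida-BD (related to Hurricane Ida in Louisiana in 2021), for domain adaptation, to further evaluate the model's ability to be applied to newly damaged areas. The domain adaptation results indicate that the proposed model can be adapted to a new event with only limited fine-tuning. Hence, the proposed model advances the current state of the art through better performance and domain adaptation. Ida-BD also provides a high-resolution annotated dataset for future studies in this field.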
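This paper presents a transformer-based Siamese network architecture (abbreviated as ChangeFormer) for change detection (CD) from a pair of co-registered remote sensing images. Unlike recent CD frameworks, which are based on fully convolutional networks (ConvNets), the proposed method unifies a hierarchically structured transformer encoder with a multi-layer perceptron (MLP) decoder in a Siamese network architecture to efficiently render the multi-scale long-range details required for accurate CD. Experiments on two CD datasets show that the proposed end-to-end trainable ChangeFormer architecture achieves better CD performance than previous counterparts. Our code is available at https://github.com/wgcban/changeFormer.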
Deep learning based change detection methods have received wide attention, thanks to their strong capability in obtaining rich features from images. However, existing AI-based CD methods largely rely on three functionality-enhancing modules, i.e., semantic enhancement, attention mechanisms, and correspondence enhancement. Stacking these modules leads to great model complexity. To unify these three modules into a simple pipeline, we introduce the Relational Change Detection Transformer (RCDT), a novel and simple framework for remote sensing change detection tasks. The proposed RCDT consists of three major components: a weight-sharing Siamese Backbone to obtain bi-temporal features, a Relational Cross Attention Module (RCAM) that implements offset cross attention to obtain bi-temporal relation-aware features, and a Features Constrain Module (FCM) to achieve the final refined predictions with high-resolution constraints. Extensive experiments on four different publicly available datasets suggest that our proposed RCDT exhibits superior change detection performance compared with other competing methods. The theoretical, methodological, and experimental knowledge of this study is expected to benefit future change detection efforts that involve the cross attention mechanism.
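Semantic segmentation requires methods that learn high-level features while handling large amounts of data. Convolutional neural networks (CNNs) can learn unique and adaptive features to achieve this goal. However, because of the large size and high spatial resolution of remote sensing images, these networks cannot analyze an entire scene efficiently. Recently, deep transformers have demonstrated their ability to capture global interactions between different objects in an image. In this paper, we propose a new segmentation model that combines convolutional neural networks with transformers, and show that this mixture of local and global feature extraction techniques provides significant advantages in remote sensing segmentation. In addition, the proposed model includes two fusion layers designed to represent the multi-modal inputs and outputs of the network efficiently. The input fusion layer extracts feature maps that summarize the relationship between image content and elevation maps (DSM). The output fusion layer uses a novel multi-task segmentation strategy in which class labels are identified using class-specific feature extraction layers and loss functions. Finally, a fast-marching method is used to convert all unidentified class labels to their nearest known neighbors. Our results show that the proposed method improves segmentation accuracy compared with state-of-the-art techniques.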
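Over the past decade, deep learning based algorithms have become widely popular in different areas of remote sensing image analysis. Recently, transformer-based architectures, originally introduced in natural language processing, have pervaded the computer vision field, where the self-attention mechanism has been used as a replacement for the popular convolution operator to capture long-range dependencies. Inspired by recent advances in computer vision, the remote sensing community has also witnessed increased exploration of vision transformers for a diverse set of tasks. Although a number of surveys have focused on transformers in computer vision, to the best of our knowledge we are the first to present a systematic review of recent transformer-based advances in remote sensing. Our survey covers more than 60 recent transformer-based methods for different remote sensing problems in three sub-areas of remote sensing: very high-resolution (VHR), hyperspectral (HSI), and synthetic aperture radar (SAR) imagery. We conclude the survey by discussing different challenges and open issues of transformers in remote sensing. Additionally, we intend to frequently update and maintain a list of the latest papers on transformers in remote sensing, with their respective code, at: https://github.com/virobo-15/transformer-in-in-remote-sensing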
Change detection (CD) aims to find the differences between two images taken at different times and outputs a change map indicating whether each region has changed. To generate better change maps, many state-of-the-art (SoTA) methods design a deep learning model with a powerful discriminative ability. However, these methods still underperform because they ignore spatial information and scale changes between objects, which gives rise to blurry or incorrect boundaries. They also neglect the interactive information between the two images. To alleviate these problems, we propose the Scale and Relation-Aware Siamese Network (SARAS-Net). In this paper, three modules are proposed (relation-aware, scale-aware, and cross-transformer) to tackle the problem of scene change detection more effectively. To verify our model, we evaluated it on three public datasets, LEVIR-CD, WHU-CD, and DSFIN, and obtained SoTA accuracy. Our code is available at https://github.com/f64051041/SARAS-Net.
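The aim of change detection (CD) is to detect changes by comparing two images taken at different times. The challenging part of CD is to keep track of the changes the user wants to highlight, such as new buildings, while ignoring changes caused by external factors such as the environment, lighting conditions, fog, or seasonal variation. Recent developments in deep learning have enabled researchers to achieve outstanding performance in this area. In particular, different space-time attention mechanisms make it possible to exploit the spatial features extracted by the model and to correlate them temporally by using both available images. The downside is that these models have become increasingly complex and large, and are often infeasible for edge applications. These are limitations when a model must be applied in industry or in applications that require real-time performance. In this work, we propose a novel model, called TinyCD, that proves to be both lightweight and effective, achieving state-of-the-art performance with 13-150x fewer parameters. In our approach, we exploit the importance of low-level features for comparing images. To do this, we use only a few backbone blocks. This strategy lets us keep the number of network parameters low. To combine the features extracted from the two images, we introduce a novel, parameter-efficient mixing block capable of cross-correlating features in both the space and time domains. Finally, to fully exploit the information contained in the computed features, we define the PW-MLP block, which performs pixel-wise classification. Source code, models, and results are available here: https://github.com/andreacodegoni/tiny_model_4_cd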
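Pixel-wise classification with an MLP, as mentioned in this abstract, can be sketched as the same small MLP applied independently to each pixel's feature vector. The layer sizes, the ReLU and sigmoid choices, and the name `pixelwise_mlp` below are assumptions for illustration only; the abstract does not specify the actual PW-MLP block.

```python
import numpy as np

def pixelwise_mlp(features, w1, b1, w2, b2):
    """Apply one two-layer MLP to every pixel of an (H, W, C) feature map.

    Returns an (H, W) map of change probabilities. The weights are shared
    across pixels, so no spatial context is mixed at this step.
    """
    hidden = np.maximum(features @ w1 + b1, 0.0)   # ReLU, shape (H, W, hidden)
    logits = hidden @ w2 + b2                      # shape (H, W, 1)
    return 1.0 / (1.0 + np.exp(-logits[..., 0]))   # sigmoid, shape (H, W)

# Hypothetical sizes: a 4x5 feature map with 6 channels and 8 hidden units.
rng = np.random.default_rng(0)
h, w, c, hidden_dim = 4, 5, 6, 8
feats = rng.normal(size=(h, w, c))
w1 = rng.normal(size=(c, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(size=(hidden_dim, 1))
b2 = np.zeros(1)

prob = pixelwise_mlp(feats, w1, b1, w2, b2)
change_map = prob > 0.5   # binary change map under an assumed 0.5 threshold
```

Sharing the weights across pixels keeps the parameter count independent of image size, which fits the lightweight goal stated in the abstract.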
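Building segmentation is a fundamental task in earth observation and aerial image analysis. Most existing deep learning based algorithms in the literature can only be applied to imagery of a fixed or narrow range of spatial resolutions. In practical scenarios, users deal with a wide range of image resolutions and therefore often need to resample a given aerial image to match the spatial resolution of the dataset used to train the deep learning model. This, however, severely degrades the quality of the output segmentation masks. To deal with this issue, we propose a scale-invariant neural network (Sci-Net) that can segment buildings present in aerial images at different spatial resolutions. Specifically, we modify the U-Net architecture and fuse it with dense Atrous Spatial Pyramid Pooling (ASPP) to extract fine-grained multi-scale representations. We compare the proposed model with several state-of-the-art models on the Open Cities AI dataset and show that Sci-Net provides a steady margin of improvement across all resolutions available in the dataset.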
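Convolutional neural networks (CNNs) have become the consensus choice for medical image segmentation tasks. However, because of the nature of the convolution operation, they are limited in modeling long-range dependencies and spatial correlations. Although transformers were first developed to address this issue, they fail to capture low-level features. In contrast, it has been shown that both local and global features are crucial for dense prediction, such as segmentation in challenging contexts. In this paper, we propose a novel method that efficiently bridges a CNN and a transformer for medical image segmentation. Specifically, we design two multi-scale feature representations using the seminal Swin Transformer module and a CNN-based encoder. To ensure a fine fusion of the global and local features obtained from these two representations, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure. Extensive experiments on various medical image segmentation datasets demonstrate the effectiveness of HiFormer over other CNN-based, transformer-based, and hybrid methods in terms of computational complexity and of quantitative and qualitative results. Our code is publicly available at: https://github.com/amirhossein-kz/hiformer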
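Change detection (CD) of remote sensing images detects changed regions by analyzing the differences between two bitemporal images. It is widely used in land resource planning, natural hazard monitoring, and other fields. In our study, we propose a novel Siamese neural network for the change detection task, namely Dual-UNet. In contrast to previous approaches that encode the bitemporal images separately, we design an encoder differential-attention module that focuses on the spatial difference relationships between pixels. To improve the generalization of the network, the module computes attention weights between any pair of pixels across the bitemporal images and uses them to produce more discriminative features. To improve feature fusion and avoid vanishing gradients, a multi-scale weighted variance map fusion strategy is proposed in the decoding stage. Experiments show that the proposed approach consistently outperforms state-of-the-art methods on the popular seasonal change detection datasets.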
Change detection (CD) aims to detect change regions within an image pair captured at different times, playing a significant role in diverse real-world applications. Nevertheless, most of the existing works focus on designing advanced network architectures to map the feature difference to the final change map while ignoring the influence of the quality of the feature difference. In this paper, we study the CD from a different perspective, i.e., how to optimize the feature difference to highlight changes and suppress unchanged regions, and propose a novel module denoted as iterative difference-enhanced transformers (IDET). IDET contains three transformers: two transformers for extracting the long-range information of the two images and one transformer for enhancing the feature difference. In contrast to the previous transformers, the third transformer takes the outputs of the first two transformers to guide the enhancement of the feature difference iteratively. To achieve more effective refinement, we further propose the multi-scale IDET-based change detection that uses multi-scale representations of the images for multiple feature difference refinements and proposes a coarse-to-fine fusion strategy to combine all refinements. Our final CD method outperforms seven state-of-the-art methods on six large-scale datasets under diverse application scenarios, which demonstrates the importance of feature difference enhancements and the effectiveness of IDET.
Semantic segmentation of UAV aerial remote sensing images offers a more efficient and convenient alternative to traditional surveying and mapping. To keep the model lightweight while improving accuracy, this research develops a new lightweight and efficient network for extracting ground features from UAV aerial remote sensing images, called LDMCNet. This research also develops a powerful lightweight backbone network for the proposed semantic segmentation model. It is called LDCNet, and it is hoped that it can become the backbone network of a new generation of lightweight semantic segmentation algorithms. The proposed model uses dual multi-scale context modules, namely the Atrous Spatial Pyramid Pooling (ASPP) module and the Object Context Representation (OCR) module. In addition, this research constructs a private dataset for semantic segmentation of aerial remote sensing images from drones. This dataset contains 2431 training sets, 945 validation sets, and 475 test sets. The proposed model performs well on this dataset: with only 1.4M parameters and 5.48G floating-point operations (FLOPs), it achieves a mean intersection-over-union (mIoU) of 71.12%, which is 7.88% higher than the baseline model. To verify the effectiveness of the proposed model, training on the public datasets "LoveDA" and "CITY-OSM" also achieved excellent results, with mIoU of 65.27% and 74.39%, respectively.
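For readers unfamiliar with the mIoU metric reported here, the following is a minimal sketch of how it is commonly computed from integer label maps. The function name `mean_iou` and the toy arrays are illustrative, and the choice to skip classes absent from both prediction and ground truth is one common convention, not necessarily the one used in the paper.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for integer label maps."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps; skipped by assumption
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x3 label maps with three classes.
pred = np.array([[0, 0, 1],
                 [1, 2, 2]])
target = np.array([[0, 1, 1],
                   [1, 2, 0]])
score = mean_iou(pred, target, num_classes=3)
# Per-class IoU is 1/3, 2/3, and 1/2, so the mean is 0.5.
```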
We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder which outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes, which leads to decreased performance when the testing resolution differs from training. 2) SegFormer avoids complex decoders. The proposed MLP decoder aggregates information from different layers, thus combining both local attention and global attention to render powerful representations. We show that this simple and lightweight design is the key to efficient segmentation on Transformers. We scale our approach up to obtain a series of models from SegFormer-B0 to SegFormer-B5, reaching significantly better performance and efficiency than previous counterparts. For example, SegFormer-B4 achieves 50.3% mIoU on ADE20K with 64M parameters, being 5× smaller and 2.2% better than the previous best method. Our best model, SegFormer-B5, achieves 84.0% mIoU on the Cityscapes validation set and shows excellent zero-shot robustness on Cityscapes-C. Code will be released at: github.com/NVlabs/SegFormer.
Change detection (CD) is an essential earth observation technique. It captures the dynamic information of land objects. With the rise of deep learning, convolutional neural networks (CNN) have shown great potential in CD. However, current CNN models introduce backbone architectures that lose detailed information during learning. Moreover, current CNN models are heavy in parameters, which prevents their deployment on edge devices such as UAVs. In this work, we tackle this issue by proposing RDP-Net: a region detail preserving network for CD. We propose an efficient training strategy that constructs the training tasks during the warmup period of CNN training and lets the CNN learn from easy to hard. The training strategy enables CNN to learn more powerful features with fewer FLOPs and achieve better performance. Next, we propose an effective edge loss that increases the penalty for errors on details and improves the network's attention to details such as boundary regions and small areas. Furthermore, we provide a CNN model with a brand new backbone that achieves the state-of-the-art empirical performance in CD with only 1.70M parameters. We hope our RDP-Net would benefit the practical CD applications on compact devices and could inspire more people to bring change detection to a new level with the efficient training strategy. The code and models are publicly available at https://github.com/Chnja/RDPNet.
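Despite recent advances in deep learning based semantic segmentation, automatic building detection from remotely sensed imagery remains a challenging problem because of the large variability in the appearance of buildings across the globe. Errors occur mostly around the boundaries of building footprints, in shadow areas, and when detecting buildings whose exterior surfaces have reflectivity properties very similar to those of the surrounding regions. To overcome these problems, we propose a generative adversarial network based segmentation framework with an uncertainty attention unit and a refinement module embedded in the generator. The refinement module, composed of edge and reverse attention units, is designed to refine the predicted building map. The edge attention enhances boundary features to estimate building boundaries with greater precision, and the reverse attention allows the network to explore features missing in the previously estimated regions. The uncertainty attention unit helps the network resolve uncertainties in classification. As a measure of the power of our approach, as of December 4, 2021, it ranks second on DeepGlobe's public leaderboard, even though the main focus of our approach (the building edges) does not align exactly with the metrics used for leaderboard rankings. Our overall F1-score on DeepGlobe's challenging dataset is 0.745. We also report improvements on the previous best results for the challenging Inria validation dataset, for which our network achieves an overall IoU of 81.28% and an overall accuracy of 97.03%. Along the same lines, for the official Inria test dataset, our network scores 77.86% and 96.41% in overall IoU and accuracy, respectively.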
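The F1-score, IoU, and accuracy figures quoted in this abstract are standard binary segmentation metrics. The sketch below computes pixel-wise versions from two binary masks. It is only an illustration with a hypothetical helper name, `binary_mask_metrics`, and the official benchmark evaluations may differ (for example, by scoring at the object level rather than per pixel).

```python
import numpy as np

def binary_mask_metrics(pred, target):
    """Pixel-wise F1, IoU, and accuracy for binary building masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = int(np.sum(pred & target))     # building predicted and present
    fp = int(np.sum(pred & ~target))    # building predicted but absent
    fn = int(np.sum(~pred & target))    # building missed
    tn = int(np.sum(~pred & ~target))   # background correctly predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    accuracy = (tp + tn) / pred.size
    return {"f1": f1, "iou": iou, "accuracy": accuracy}

# Toy 2x4 masks: 2 true positives, 1 false positive, 1 false negative.
pred = np.array([[1, 1, 0, 0],
                 [0, 1, 0, 0]])
target = np.array([[1, 0, 0, 0],
                   [0, 1, 1, 0]])
metrics = binary_mask_metrics(pred, target)
```

On sparse building masks, accuracy is dominated by background pixels, which is one reason IoU and F1 are usually reported alongside it.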
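Identifying polyps is challenging for the automatic analysis of endoscopic images in computer-aided clinical support systems. Models based on convolutional neural networks (CNNs), transformers, and their combinations have been proposed to segment polyps, with promising results. However, these approaches are limited either in modeling the local appearance of polyps or in lacking multi-level features for spatial dependency in the decoding process. This paper proposes a novel network, ColonFormer, to address these limitations. ColonFormer is an encoder-decoder architecture capable of modeling long-range semantic information in both the encoder and decoder branches. The encoder is a lightweight transformer-based architecture for modeling global semantic relations at multiple scales. The decoder is a hierarchical network structure designed to learn multi-level features that enrich the feature representation. In addition, a new skip connection technique is added to refine the boundaries of polyp objects in the global map for accurate segmentation. Extensive experiments have been conducted on five popular benchmark datasets for polyp segmentation, including Kvasir, CVC-Clinic DB, CVC-ColonDB, CVC-T, and ETIS-Larib. Experimental results show that ColonFormer outperforms other state-of-the-art methods on all benchmark datasets.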
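Thanks to their ability to model long-range dependencies, transformers have shown impressive performance in various natural language processing and computer vision tasks. Recent progress has shown that combining such transformers with CNN-based semantic image segmentation models is very promising. However, how well a pure transformer-based approach can perform for image segmentation has not yet been well studied. In this work, we explore a novel framework for semantic image segmentation: encoder-decoder based Fully Transformer Networks (FTN). Specifically, we first propose a Pyramid Group Transformer (PGT) as the encoder for progressively learning hierarchical features while reducing the computational complexity of the standard vision transformer (ViT). Then, we propose a Feature Pyramid Transformer (FPT) to fuse semantic-level and spatial-level information from multiple levels of the PGT encoder for semantic image segmentation. Surprisingly, this simple baseline achieves better results on multiple challenging semantic segmentation and face parsing benchmarks, including PASCAL Context, ADE20K, COCO-Stuff, and CelebAMask-HQ. The source code will be released at https://github.com/br -dl/paddlevit.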
Semantic Change Detection (SCD) refers to the task of simultaneously extracting the changed areas and the semantic categories (before and after the changes) in Remote Sensing Images (RSIs). This is more meaningful than Binary Change Detection (BCD) since it enables detailed change analysis in the observed areas. Previous works established triple-branch Convolutional Neural Network (CNN) architectures as the paradigm for SCD. However, it remains challenging to exploit semantic information with a limited number of change samples. In this work, we investigate jointly considering the spatio-temporal dependencies to improve the accuracy of SCD. First, we propose a SCanFormer (Semantic Change Transformer) to explicitly model the 'from-to' semantic transitions between the bi-temporal RSIs. Then, we introduce a semantic learning scheme that leverages the spatio-temporal constraints, which are coherent with the SCD task, to guide the learning of semantic changes. The resulting network (ScanNet) significantly outperforms the baseline method in terms of both detection of critical semantic changes and semantic consistency in the obtained bi-temporal results. It achieves SOTA accuracy on two benchmark datasets for SCD.
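Most recent semantic segmentation methods adopt a U-Net framework with an encoder-decoder architecture. It remains challenging for U-Net, with its simple skip connection scheme, to model the global multi-scale context: 1) not every skip connection setting is effective, because of incompatible feature sets in the encoder and decoder stages, and some skip connections even harm segmentation performance; 2) the original U-Net performs worse than a U-Net without any skip connections on some datasets. Based on our findings, we propose a new segmentation framework, named UCTransNet (with a proposed CTrans module in U-Net), that approaches the problem from the channel perspective with an attention mechanism. Specifically, the CTrans module is an alternative to the U-Net skip connections. It consists of a sub-module that performs multi-scale channel cross fusion with a transformer (named CCT) and a channel-wise cross-attention sub-module (named CCA) that guides the fused multi-scale channel-wise information to connect effectively to the decoder features and eliminate ambiguity. Hence, the proposed connection, composed of the CCT and CCA, can replace the original skip connections to bridge the semantic gaps for accurate automatic medical image segmentation. Experimental results show that UCTransNet produces more precise segmentation and achieves consistent improvements over the state of the art for semantic segmentation across different datasets and conventional architectures involving transformers or U-shaped frameworks. Code: https://github.com/mcgregorwwwww/uctransnet.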