Because they offer more comprehensive information than single-view data, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are increasingly used in remote sensing tasks. However, as the number of views grows, data quality issues become more apparent and limit the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn data weights adaptively, the lack of research on explicitly quantifying the data quality of each view during fusion leaves these models hard to interpret, unsatisfactory in performance, and inflexible in downstream remote sensing tasks. To fill this gap, this paper introduces evidential deep learning to aerial-ground dual-view remote sensing scene classification in order to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value that describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
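The idea of turning per-view evidence into an uncertainty value and then weighting views by it can be illustrated with a minimal sketch. It assumes the common subjective-logic formulation used in evidential deep learning (Dirichlet parameters alpha = evidence + 1, uncertainty u = K / sum(alpha)) and a simple hypothetical fusion rule that weights each view by 1 - u. The abstract does not specify the authors' actual fusion strategy, so the function names and the weighting rule below are illustrative assumptions only.

```python
def dirichlet_opinion(evidence):
    """Map non-negative per-class evidence to (expected class probabilities, uncertainty).

    Assumes the usual subjective-logic mapping: alpha_k = e_k + 1, u = K / sum(alpha).
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)                      # Dirichlet strength S
    probs = [a / strength for a in alpha]      # expected class probabilities
    uncertainty = num_classes / strength       # u in (0, 1]; u = 1 means no evidence
    return probs, uncertainty


def fuse_views(evidence_aerial, evidence_ground):
    """Hypothetical decision-level fusion: the less uncertain view gets more weight."""
    p_a, u_a = dirichlet_opinion(evidence_aerial)
    p_g, u_g = dirichlet_opinion(evidence_ground)
    w_a, w_g = 1.0 - u_a, 1.0 - u_g
    total = w_a + w_g
    if total == 0.0:                           # neither view carries any evidence
        w_a = w_g = 0.5
    else:
        w_a, w_g = w_a / total, w_g / total
    fused = [w_a * a + w_g * g for a, g in zip(p_a, p_g)]
    return fused, (w_a, w_g)


if __name__ == "__main__":
    # Aerial view is confident about class 0; ground view has little evidence.
    fused, weights = fuse_views([9.0, 0.0, 0.0], [0.0, 1.0, 0.0])
    print("weights (aerial, ground):", weights)
    print("fused probabilities:", fused)
```

In this toy case the aerial view has the lower uncertainty and therefore dominates the fused prediction, which is the qualitative behaviour the abstract describes.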
Accurate airway extraction from computed tomography (CT) images is a critical step for planning navigation bronchoscopy and for quantitative assessment of airway-related chronic obstructive pulmonary disease (COPD). Existing methods struggle to segment the airway sufficiently, especially high-generation airways, under the constraint of limited labels, and cannot meet the needs of clinical use in COPD. We propose a novel two-stage 3D contextual transformer-based U-Net for airway segmentation using CT images. The method consists of two stages, performing initial and refined airway segmentation. The two stages share the same subnetwork but take different airway masks as input. Contextual transformer blocks are used in both the encoder and decoder paths of the subnetwork to achieve high-quality airway segmentation effectively. In the first stage, the total airway mask and CT images are provided to the subnetwork; in the second stage, the intrapulmonary airway mask and corresponding CT scans are provided. The predictions of the two stages are then merged as the final prediction. Extensive experiments were performed on in-house and multiple public datasets. Quantitative and qualitative analyses demonstrate that the proposed method extracts many more branches and a greater length of the airway tree while achieving state-of-the-art airway segmentation performance. The code is available at https://github.com/zhaozsq/airway_segmentation.
The self-configuring nnU-Net has achieved leading performance in a wide range of medical image segmentation challenges. It is widely considered the model of choice and a strong baseline for medical image segmentation. However, despite its extraordinary performance, nnU-Net does not supply a measure of uncertainty to indicate its possible failure. This can be problematic for large-scale image segmentation applications, where data are heterogeneous and nnU-Net may fail without notice. In this work, we introduce a novel method to estimate nnU-Net uncertainty for medical image segmentation. We propose a highly effective scheme for posterior sampling of weight space for Bayesian uncertainty estimation. Unlike previous baseline methods such as Monte Carlo Dropout and mean-field Bayesian Neural Networks, our proposed method does not require a variational architecture and keeps the original nnU-Net architecture intact, thereby preserving its excellent performance and ease of use. Additionally, we boost the segmentation performance over the original nnU-Net by marginalizing multi-modal posterior models. We applied our method to the public ACDC and M&M datasets of cardiac MRI and demonstrated improved uncertainty estimation over a range of baseline methods. The proposed method further strengthens nnU-Net for medical image segmentation in terms of both segmentation accuracy and quality control.
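Marginalizing over posterior samples can be sketched generically: average the class probabilities predicted by several weight samples, then use the entropy of the averaged prediction as a per-voxel uncertainty. The sketch below assumes the per-sample probabilities are already available as plain lists. How the weight samples are drawn is not described in the abstract, and the function name is hypothetical.

```python
import math


def marginalize_predictions(prob_samples):
    """Average class probabilities over posterior samples and return (mean, entropy).

    prob_samples: list of per-sample class-probability lists for one voxel,
    e.g. one entry per sampled set of network weights (an assumption here).
    """
    if not prob_samples:
        raise ValueError("need at least one posterior sample")
    num_samples = len(prob_samples)
    num_classes = len(prob_samples[0])
    mean = [
        sum(sample[c] for sample in prob_samples) / num_samples
        for c in range(num_classes)
    ]
    # Predictive entropy of the marginal distribution (natural log).
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


if __name__ == "__main__":
    # Samples that agree give low entropy; samples that disagree give high entropy.
    print(marginalize_predictions([[0.9, 0.1], [0.95, 0.05]]))
    print(marginalize_predictions([[0.9, 0.1], [0.1, 0.9]]))
```

Voxels where the sampled models disagree receive high entropy, which is the kind of signal that could be used for quality control.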
Deep learning has achieved notable success in 3D object detection with the advent of large-scale point cloud datasets. However, severe performance degradation on previously trained classes, i.e., catastrophic forgetting, remains a critical issue for real-world deployment when the number of classes is unknown or may vary. Moreover, existing 3D class-incremental detection methods are developed for the single-domain scenario and fail when they encounter domain shift caused by different datasets, varying environments, etc. In this paper, we identify an unexplored yet valuable scenario, i.e., class-incremental learning under domain shift, and propose a novel 3D domain adaptive class-incremental object detection framework, DA-CIL. In this framework, we design a novel dual-domain copy-paste augmentation method that constructs multiple augmented domains to diversify training distributions, thereby facilitating gradual domain adaptation. Then, multi-level consistency is explored to facilitate dual-teacher knowledge distillation from different domains for domain adaptive class-incremental learning. Extensive experiments on various datasets demonstrate the effectiveness of the proposed method over baselines in the domain adaptive class-incremental learning scenario.
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments posed a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe how we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13%, respectively, at the two live experiment sites.
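Although current deep learning-based methods have achieved promising performance in the blind single image super-resolution (SISR) task, most of them focus mainly on heuristically constructing diverse network architectures and put less emphasis on explicitly embedding the physical generation mechanism between blur kernels and high-resolution (HR) images. To alleviate this issue, we propose a model-driven deep neural network, called KXNet, for blind SISR. Specifically, to solve the classical SISR model, we propose a simple-yet-effective iterative algorithm. Then, by unfolding the involved iterative steps into the corresponding network modules, we naturally construct KXNet. The main specificity of the proposed KXNet is that the entire learning process is fully and explicitly integrated with the inherent physical mechanism underlying the SISR task. Thus, the learned blur kernel has clear physical patterns, and the mutually iterative process between the blur kernel and the HR image can soundly guide KXNet to evolve in the right direction. Extensive experiments on synthetic and real data demonstrate the superior accuracy and generality of our method over current representative state-of-the-art blind SISR methods. Code is available at: https://github.com/jiahong-fu/kxnet.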
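Convolutional neural networks (CNNs) are widely used in fault diagnosis of mechanical systems because of their powerful feature extraction and classification capabilities. However, a CNN is a typical black-box model, and the mechanism behind its decisions is unclear, which limits its application in fault diagnosis scenarios with high reliability requirements. To tackle this issue, we propose a novel interpretable neural network termed the Time-Frequency Network (TFN), in which a physically meaningful time-frequency transform (TFT) method is embedded into the traditional convolutional layer as an adaptive preprocessing layer. This preprocessing layer, named the time-frequency convolutional (TFconv) layer, is constrained by a well-designed kernel function to extract fault-related time-frequency information. It not only improves diagnostic performance but also reveals the logical foundation of the CNN prediction in the frequency domain. Different TFT methods correspond to different kernel functions of the TFconv layer. In this study, four typical TFT methods are considered to formulate TFNs, and their effectiveness and interpretability are demonstrated through three mechanical fault diagnosis experiments. Experimental results also show that the proposed TFconv layer can be easily generalized to other CNNs with different depths. The code of TFN is available at https://github.com/chenqian0618/tfn.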
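To make the idea of a kernel-constrained convolution concrete, here is a minimal sketch in which the convolution kernel is not free-form but generated from a single frequency parameter, in the style of a short-time Fourier transform. The Hann-windowed sinusoid used here is an assumption chosen for illustration; the abstract does not state which kernel functions the authors use, and the function names are hypothetical.

```python
import math


def stft_style_kernel(freq, length):
    """Build a Hann-windowed cosine/sine kernel pair.

    freq is a normalized frequency in cycles per sample; in a trainable layer
    it would be the learnable parameter that constrains the whole kernel.
    """
    real, imag = [], []
    for n in range(length):
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
        real.append(window * math.cos(2.0 * math.pi * freq * n))
        imag.append(window * math.sin(2.0 * math.pi * freq * n))
    return real, imag


def tf_response(signal, freq, length=32):
    """Slide the kernel pair over the signal and return the magnitude response."""
    real, imag = stft_style_kernel(freq, length)
    out = []
    for start in range(len(signal) - length + 1):
        segment = signal[start:start + length]
        re = sum(s * r for s, r in zip(segment, real))
        im = sum(s * i for s, i in zip(segment, imag))
        out.append(math.hypot(re, im))
    return out


if __name__ == "__main__":
    # A pure tone at 0.125 cycles/sample excites the matched kernel far more
    # strongly than a kernel tuned to a distant frequency.
    tone = [math.sin(2.0 * math.pi * 0.125 * n) for n in range(128)]
    matched = tf_response(tone, 0.125)
    mismatched = tf_response(tone, 0.3)
    print(sum(matched) / len(matched), sum(mismatched) / len(mismatched))
```

Because each output channel is tied to one frequency parameter, inspecting the learned frequencies shows which bands the layer attends to, which is the kind of frequency-domain interpretability the abstract refers to.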
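AI creation, such as poem or lyrics generation, has attracted increasing attention from both industry and academia, and many promising models have been proposed in the past few years. Existing methods usually estimate the output based on single and independent visual or textual information. In reality, however, humans usually create according to their experiences, which may involve different modalities and be sequentially correlated. To model this human capability, in this paper we define and solve a novel AI creation problem based on human experiences. More specifically, we study how to generate text based on sequential multi-modal information. Compared with previous work, this task is much more difficult because the designed model has to understand and adapt the semantics among different modalities well and effectively convert them into the output in a sequential manner. To alleviate these difficulties, we first design a multi-channel sequence-to-sequence architecture equipped with a multi-modal attention network. For more effective optimization, we then propose a curriculum negative sampling strategy tailored to the sequential inputs. To benchmark this problem and demonstrate the effectiveness of our model, we manually labeled a new multi-modal experience dataset. With this dataset, we conduct extensive experiments comparing our model with a series of representative baselines, and demonstrate significant improvements on both automatic and human-centered metrics. The code and data are available at: https://github.com/aman-4-real/mmtg.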
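Human pose estimation aims to locate the keypoints of all people in different scenes. Despite promising results, current approaches still face some challenges. Existing top-down methods deal with each person individually, without modeling the interaction between different people and the scene they are in. Consequently, the performance of human detection degrades when severe occlusion occurs. Existing bottom-up methods, on the other hand, consider all people at the same time and capture global knowledge of the entire image. However, they are less accurate than top-down methods because of scale variation. To address these problems, we propose a novel Dual-Pipeline Integrated Transformer (DPIT) that integrates the top-down and bottom-up pipelines to explore visual clues from different receptive fields and achieve their complementarity. Specifically, DPIT consists of two branches: the bottom-up branch processes the whole image to capture global visual information, while the top-down branch extracts local visual feature representations from single-human bounding boxes. The feature representations extracted from the bottom-up and top-down branches are then fed into a transformer encoder to fuse global and local knowledge interactively. Moreover, we define keypoint queries to explore both full-scene and single-human posture visual clues, realizing the mutual complementarity of the two pipelines. To the best of our knowledge, this is one of the first works to integrate the bottom-up and top-down pipelines with transformers for human pose estimation. Extensive experiments on the COCO and MPII datasets demonstrate that our DPIT achieves performance comparable to state-of-the-art methods.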
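The 3D human pose estimation task involves challenging problems, such as poor performance caused by occlusion and self-occlusion. Recently, IMU-vision sensor fusion has been regarded as valuable for solving these problems. However, previous research on fusing IMU and vision data, which are heterogeneous, fails to make adequate use of either raw IMU data or reliable high-level vision features. To facilitate more efficient sensor fusion, in this work we propose a framework called FusePose under a parametric human kinematic model. Specifically, we aggregate different information from IMU or vision data and introduce three distinctive sensor fusion approaches: NaiveFuse, KineFuse, and AdaDeepFuse. NaiveFuse serves as a basic approach that only fuses simplified IMU data and the estimated 3D pose in Euclidean space. In kinematic space, KineFuse is able to integrate the calibrated and aligned raw IMU data with converted 3D pose parameters. AdaDeepFuse further develops this kinematic fusion process into an adaptive and end-to-end trainable form. Comprehensive experiments with ablation studies demonstrate the rationality and superiority of the proposed framework. The performance of 3D human pose estimation is improved compared to the baseline result. On the Total Capture dataset, KineFuse surpasses the previous state of the art that uses IMU only for testing by 8.6%. AdaDeepFuse surpasses the state of the art that uses IMU for both training and testing by 8.5%. Moreover, we validate the generalization capability of our framework through experiments on the Human3.6M dataset.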