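Lightweight time-of-flight (ToF) depth sensors are small, cheap, and low-power, and have been massively deployed on mobile devices for purposes such as autofocus and obstacle detection. However, because of their specific measurements (a depth distribution over a region rather than a depth value at a certain pixel) and extremely low resolution, they are insufficient for applications requiring high-fidelity depth, such as 3D reconstruction. In this paper, we propose Deltar, a novel method that empowers lightweight ToF sensors with high-resolution, accurate depth by cooperating with a color image. As the core of Deltar, a feature extractor customized for depth distributions and an attention-based neural architecture are proposed to fuse information from the color and ToF domains efficiently. To evaluate our system in real-world scenarios, we design a data collection device and propose a new approach to calibrate the RGB camera and the ToF sensor. Experiments show that our method produces more accurate depth than existing frameworks and achieves performance on par with a commodity-level RGB-D sensor. Code and data are available at https://zju3dv.github.io/deltar/.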
translated by 谷歌翻译
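Convolutional neural networks have been widely applied to medical image segmentation and have achieved considerable performance. However, performance can be significantly affected by the domain gap between training data (source domain) and testing data (target domain). To address this issue, we propose a data-manipulation-based domain generalization method called Automated Augmentation for Domain Generalization (AADG). Our AADG framework can effectively sample data augmentation policies that generate novel domains and diversify the training set from an appropriate search space. Specifically, we introduce a novel proxy task that maximizes the diversity among multiple augmented novel domains, as measured by the Sinkhorn distance in a unit sphere space, making automated augmentation tractable. Adversarial training and deep reinforcement learning are employed to search the objectives efficiently. Quantitative and qualitative experiments are comprehensively performed on 11 publicly accessible fundus image datasets (four for retinal vessel segmentation, four for optic disc and cup (OD/OC) segmentation, and three for retinal lesion segmentation). Two OCTA datasets for retinal vasculature segmentation are further included to validate cross-modality generalization. Our proposed AADG exhibits state-of-the-art generalization performance and outperforms existing approaches by considerable margins on retinal vessel, OD/OC, and lesion segmentation tasks. The learned policies are empirically validated to be model-agnostic and transfer well to other models. The source code is available at https://github.com/crazorback/aadg.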
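While cameras and LiDARs are widely used in most assisted and autonomous driving systems, only a few works have been proposed that associate temporal synchronization with extrinsic calibration of the camera and LiDAR for online sensor data fusion. Temporal and spatial calibration techniques face the challenges of a lack of relevance and of real-time capability. In this paper, we introduce a pose estimation model and environmentally robust line feature extraction to improve the relevance of data fusion and the ability to correct errors online in real time. Dynamic-target elimination seeks an optimal policy that takes into account the point cloud matching correspondences between adjacent moments. The search optimization process aims to provide accurate parameters with both computational accuracy and efficiency. To demonstrate the benefits of this approach, we evaluate it on the KITTI benchmark against ground-truth values. In online experiments, our approach improves accuracy by 38.5% over the soft synchronization method in temporal calibration. In spatial calibration, our approach automatically corrects disturbance errors within 0.4 seconds and achieves an accuracy of 0.3 degrees. This work can promote the research and application of sensor fusion.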
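Resolving morphological and chemical phase transformations at the nanoscale is vital to many scientific and industrial applications across various disciplines. The TXM-XANES imaging technique, which combines full-field transmission X-ray microscopy (TXM) and X-ray absorption near-edge structure (XANES), is an emerging tool that operates by acquiring a series of microscopy images with multi-energy X-rays and fitting them to obtain a chemical map. Its capability, however, is limited by poor signal-to-noise ratios caused by system errors and the low-exposure illumination used for fast acquisition. In this work, by exploiting the intrinsic properties and subspace modeling of TXM-XANES imaging data, we introduce a simple and robust denoising approach that improves image quality and enables fast, high-sensitivity chemical imaging. Extensive experiments on both synthetic and real datasets demonstrate the superior performance of the proposed method.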
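Studies in neuroscience have revealed the relationship between emotional patterns and functional brain regions, demonstrating that the dynamic relationships between different brain regions are an essential factor affecting emotion recognition determined through electroencephalography (EEG). Moreover, in EEG emotion recognition, we can observe that, based on the same EEG data, clearer boundaries exist between coarse-grained emotions than between fine-grained emotions; this indicates the concurrence of large coarse-grained and small fine-grained emotion variations. Thus, a progressive classification process from coarse-grained to fine-grained categories may be helpful for EEG emotion recognition. Consequently, in this study, we propose a progressive graph convolution network (PGCN) for capturing this inherent characteristic of EEG emotional signals and progressively learning discriminative EEG features. To fit different EEG patterns, we construct a dual-graph module to characterize the intrinsic relationships between different EEG channels, containing the dynamic functional connections and the static spatial proximity information of brain regions from neuroscience research. Moreover, motivated by the observed relationship between coarse-grained and fine-grained emotions, we adopt a dual-head module that enables the PGCN to progressively learn more discriminative EEG features, from coarse-grained (easy) to fine-grained (difficult) categories, in line with the hierarchical characteristic of emotion. To verify the performance of our model, extensive experiments were conducted on two public datasets: SEED-IV and the multi-modal physiological emotion database (MPED).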
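Sparse general matrix multiplication (SpGEMM) is a fundamental building block in many scientific applications. One critical task of SpGEMM is to compute or predict the structure of the output matrix (i.e., the number of nonzero elements per output row) for efficient memory allocation and load balancing, which affects the overall performance of SpGEMM. Existing work either computes the output structure exactly or adopts upper-bound or sampling-based methods to predict it. However, these methods either take too much execution time or are not accurate enough. In this paper, we propose a novel sampling-based method with better accuracy and low cost compared with the existing sampling-based method. The proposed method first predicts the compression ratio of SpGEMM by leveraging the number of intermediate products (denoted FLOP) and the number of nonzero elements (denoted NNZ) of the same sampled result matrix. The predicted output structure is then obtained by dividing the FLOP of each output row by the predicted compression ratio. We also propose a reference design of the existing sampling-based method with optimized computing overhead to demonstrate the better accuracy of the proposed method. We construct 625 test cases with various matrix dimensions and sparse structures to evaluate the prediction accuracy. Experimental results show that the absolute relative errors of the proposed method and the reference design are 1.56% and 8.12%, respectively, on average, and 25% and 156%, respectively, in the worst case.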
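The two-step prediction described in this abstract (estimate a compression ratio FLOP/NNZ from a sample, then divide each row's FLOP by it) can be illustrated with a minimal sketch. This is not the authors' implementation: the uniform row sampling, the pattern-only row lists, and the names `row_flops`, `row_nnz`, and `predict_structure` are all assumptions made for illustration, since the abstract does not specify how the sample is drawn.

```python
import random

def row_flops(a_rows, b_rows):
    """FLOP per output row: for each nonzero column k in a row of A,
    count the nonzeros in row k of B (one intermediate product each)."""
    return [sum(len(b_rows[k]) for k in cols) for cols in a_rows]

def row_nnz(a_cols, b_rows):
    """Exact NNZ of one output row: size of the union of the selected B rows."""
    out = set()
    for k in a_cols:
        out.update(b_rows[k])
    return len(out)

def predict_structure(a_rows, b_rows, sample_size, seed=0):
    """Predict NNZ per output row of C = A * B from a row sample.

    a_rows / b_rows hold only sparsity patterns: a_rows[i] is the list of
    column indices of the nonzeros in row i (no duplicates assumed).
    Sampling rows uniformly at random is an assumption of this sketch.
    """
    flops = row_flops(a_rows, b_rows)
    rng = random.Random(seed)
    sample = rng.sample(range(len(a_rows)), min(sample_size, len(a_rows)))
    sampled_flop = sum(flops[i] for i in sample)
    sampled_nnz = sum(row_nnz(a_rows[i], b_rows) for i in sample)
    # Compression ratio = FLOP / NNZ measured on the same sampled rows.
    ratio = sampled_flop / sampled_nnz if sampled_nnz else 1.0
    # Predicted structure: FLOP of each output row divided by the ratio.
    return [f / ratio for f in flops], ratio

if __name__ == "__main__":
    A = [[0, 1], [1], [0, 2]]      # pattern of a 3x3 matrix A
    B = [[0, 1], [1, 2], [2]]      # pattern of a 3x3 matrix B
    predicted, ratio = predict_structure(A, B, sample_size=3)
    print(predicted, ratio)
```

When every row is sampled, the predicted row counts sum exactly to the true total NNZ, although individual rows can still be over- or under-estimated; with a smaller sample, the total is only an estimate.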
Most multimodal multi-objective evolutionary algorithms (MMEAs) aim to find all global Pareto optimal sets (PSs) for a multimodal multi-objective optimization problem (MMOP). However, in real-world problems, decision makers (DMs) may also be interested in local PSs. Searching for both global and local PSs is also a more general way of dealing with MMOPs, and can be seen as a generalized MMOP. In addition, state-of-the-art MMEAs exhibit poor convergence on high-dimensional MMOPs. To address these two issues, this study proposes a novel coevolutionary framework for multimodal multi-objective optimization, termed CoMMEA, to better obtain both global and local PSs and, at the same time, to improve convergence on high-dimensional MMOPs. Specifically, CoMMEA introduces two archives into the search process and coevolves them simultaneously through effective knowledge transfer. The convergence archive helps CoMMEA quickly approach the Pareto optimal front (PF). The knowledge of the converged solutions is then transferred to the diversity archive, which uses the local convergence indicator and the $\epsilon$-dominance-based method to obtain global and local PSs effectively. Experimental results show that CoMMEA is competitive with seven state-of-the-art MMEAs on fifty-four complex MMOPs.
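For readers unfamiliar with the $\epsilon$-dominance relation mentioned here, the sketch below contrasts it with ordinary Pareto dominance. It assumes minimization and the common additive form of $\epsilon$-dominance; the abstract does not say which variant CoMMEA uses, and the function names are hypothetical.

```python
def dominates(a, b):
    """Pareto dominance for minimization: a is no worse than b in every
    objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def eps_dominates(a, b, eps):
    """Additive epsilon-dominance for minimization (an assumed variant):
    a, relaxed by eps in every objective, is no worse than b."""
    return all(x - eps <= y for x, y in zip(a, b))

if __name__ == "__main__":
    local_sol = (1.05, 3.0)   # slightly worse than the global solution
    global_sol = (1.0, 3.0)
    print(dominates(global_sol, local_sol))
    print(eps_dominates(local_sol, global_sol, eps=0.1))
```

The relaxation is what lets a slightly worse solution count as acceptable: under this definition, `local_sol` is strictly dominated by `global_sol` yet still $\epsilon$-dominates it for `eps=0.1`, which is the kind of tolerance an archive needs in order to retain local PSs.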
Conditional variational models, using either continuous or discrete latent variables, are powerful for open-domain dialogue response generation. However, previous works show that continuous latent variables tend to reduce the coherence of generated responses. In this paper, we also found that discrete latent variables have difficulty capturing more diverse expressions. To tackle these problems, we combine the merits of both continuous and discrete latent variables and propose a Hybrid Latent Variable (HLV) method. Specifically, HLV constrains the global semantics of responses through discrete latent variables and enriches responses with continuous latent variables. Thus, we diversify the generated responses while maintaining relevance and coherence. In addition, we propose Conditional Hybrid Variational Transformer (CHVT) to construct and to utilize HLV with transformers for dialogue generation. Through fine-grained symbolic-level semantic information and additive Gaussian mixing, we construct the distribution of continuous variables, prompting the generation of diverse expressions. Meanwhile, to maintain the relevance and coherence, the discrete latent variable is optimized by self-separation training. Experimental results on two dialogue generation datasets (DailyDialog and Opensubtitles) show that CHVT is superior to traditional transformer-based variational mechanism w.r.t. diversity, relevance and coherence metrics. Moreover, we also demonstrate the benefit of applying HLV to fine-tuning two pre-trained dialogue models (PLATO and BART-base).
Cross-modality magnetic resonance (MR) image synthesis aims to produce missing modalities from existing ones. Currently, several methods based on deep neural networks have been developed using both source and target modalities in a supervised learning manner. However, it remains challenging to obtain a large amount of completely paired multi-modal training data, which limits the effectiveness of existing methods. In this paper, we propose a novel Self-supervised Learning-based Multi-scale Transformer Network (SLMT-Net) for cross-modality MR image synthesis, consisting of two stages, i.e., a pre-training stage and a fine-tuning stage. During the pre-training stage, we propose an Edge-preserving Masked AutoEncoder (Edge-MAE), which preserves contextual and edge information by simultaneously performing image reconstruction and edge generation. In addition, a patch-wise loss is proposed to treat the input patches differently according to their reconstruction difficulty, by measuring the difference between the reconstructed image and the ground truth. In this way, our Edge-MAE can fully leverage a large amount of unpaired multi-modal data to learn effective feature representations. During the fine-tuning stage, we present a Multi-scale Transformer U-Net (MT-UNet) to synthesize the target-modality images, in which a Dual-scale Selective Fusion (DSF) module is proposed to fully integrate multi-scale features extracted from the encoder of the pre-trained Edge-MAE. Moreover, we use the pre-trained encoder as a feature consistency module to measure the difference between the high-level features of the synthesized image and those of the ground truth. Experimental results show the effectiveness of the proposed SLMT-Net, and our model can reliably synthesize high-quality images when the training set is partially unpaired. Our code will be publicly available at https://github.com/lyhkevin/SLMT-Net.
Offline reinforcement learning (RL) enables the agent to effectively learn from logged data, which significantly extends the applicability of RL algorithms in real-world scenarios where exploration can be expensive or unsafe. Previous works have shown that extracting primitive skills from the recurring and temporally extended structures in the logged data yields better learning. However, these methods suffer greatly when the primitives have limited representation ability to recover the original policy space, especially in offline settings. In this paper, we give a quantitative characterization of the performance of offline hierarchical learning and highlight the importance of learning lossless primitives. To this end, we propose to use a flow-based structure as the representation for low-level policies. This allows us to represent the behaviors in the dataset faithfully while keeping the expression ability to recover the whole policy space. We show that such lossless primitives can drastically improve the performance of hierarchical policies. The experimental results and extensive ablation studies on the standard D4RL benchmark show that our method has a good representation ability for policies and achieves superior performance in most tasks.