Robot navigation based on deep reinforcement learning (RL) achieves high performance and works well in complex environments. At the same time, explaining the decisions of deep RL models has become a key issue for the safety and reliability of autonomous robots. In this paper, we propose a visual explanation method for deep RL models based on an attention branch. We attach an attention branch to a pre-trained deep RL model and train the branch in a supervised manner, using the outputs of the trained deep RL model as correct labels. Because the attention branch is trained to output the same result as the deep RL model, the obtained attention maps correspond to the agent's actions with higher interpretability. Experimental results on a robot navigation task show that the proposed method can generate interpretable attention maps for visual explanation.
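A minimal PyTorch-style sketch of this idea is shown below: an attention branch is attached to a frozen, pre-trained deep RL policy and trained with the policy's own action outputs as supervision, so that its attention map can serve as a visual explanation. Module names and sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    def __init__(self, in_channels, num_actions):
        super().__init__()
        # 1x1 convolution producing a single-channel spatial attention map
        self.attn_conv = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(in_channels, num_actions))

    def forward(self, features):
        attn = torch.sigmoid(self.attn_conv(features))   # attention map in [0, 1]
        weighted = features * attn                        # re-weight features by attention
        return self.head(weighted), attn

def train_step(frozen_rl_model, feature_extractor, branch, optimizer, obs):
    with torch.no_grad():
        target_actions = frozen_rl_model(obs).argmax(dim=1)  # RL outputs used as labels
        features = feature_extractor(obs)
    logits, attn_map = branch(features)
    loss = nn.functional.cross_entropy(logits, target_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), attn_map  # attn_map serves as the visual explanation
```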
Deep image prior (DIP) has recently attracted attention owing to its ability to perform unsupervised positron emission tomography (PET) image reconstruction without any prior training dataset. In this paper, we present the first attempt to implement an end-to-end DIP-based fully 3D PET image reconstruction method that incorporates a forward-projection model into the loss function. To make fully 3D PET image reconstruction practical, which previously could not be performed owing to graphics processing unit memory limitations, we modify the DIP optimization into a block iteration and sequentially learn an ordered sequence of block sinograms. Furthermore, a relative difference penalty (RDP) term is added to the loss function to enhance the quantitative accuracy of the PET images. We evaluated our proposed method using a Monte Carlo simulation with [$^{18}$F]FDG PET data of a human brain and a preclinical study on monkey-brain [$^{18}$F]FDG PET data. The proposed method was compared with maximum-likelihood expectation maximization (EM), maximum a posteriori EM with RDP, and hybrid DIP-based PET reconstruction methods. The simulation results showed that the proposed method improved the PET image quality by reducing statistical noise and preserved the contrast of brain structures and an inserted tumor better than the other algorithms. In the preclinical experiment, finer structures and better contrast recovery were obtained with the proposed method. This indicates that the proposed method can produce high-quality images without a prior training dataset. Thus, the proposed method is a key enabling technology for the straightforward and practical implementation of end-to-end DIP-based fully 3D PET image reconstruction.
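As a rough sketch of the per-block objective this describes (assuming a least-squares data-fidelity term; the paper's exact data term may differ), the network output $f_\theta(z)$ is forward-projected by the block system matrix $A_b$, compared against the block sinogram $y_b$, and regularized by the RDP:
\[
\mathcal{L}_b(\theta) = \left\| A_b f_\theta(z) - y_b \right\|_2^2 + \beta \sum_{j} \sum_{k \in \mathcal{N}_j} \frac{(f_j - f_k)^2}{f_j + f_k + \gamma \left| f_j - f_k \right|}, \qquad f = f_\theta(z),
\]
where $\mathcal{N}_j$ is the neighborhood of voxel $j$, $\beta$ controls the regularization strength, and $\gamma$ is the RDP edge-preservation parameter; the blocks $b$ are visited sequentially in the block iteration.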
In this paper, we propose a low-error-rate, real-time stereo vision system on a GPU. Many stereo vision systems on GPUs have been proposed to date, and in those systems the error rate and the processing speed are in a trade-off relationship. We propose a real-time stereo vision system on a GPU for high-resolution images that also maintains a low error rate compared with other fast systems. In our approach, we implement the cost aggregation (CA), cross-checking, and median filtering on the GPU to realize real-time processing. The processing speed is 40 fps for 1436x992-pixel images with a maximum disparity of 145, and the error rate is the lowest among GPU systems that run faster than 30 fps.
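As a small illustration of the cross-checking step mentioned above, the NumPy sketch below invalidates disparities that disagree between the left and right disparity maps; the threshold and invalid value are illustrative assumptions rather than the system's actual settings.

```python
import numpy as np

def cross_check(disp_left, disp_right, threshold=1, invalid=-1):
    # Invalidate left-map disparities that are inconsistent with the right map.
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    # For each left pixel x with disparity d, look up the right map at x - d
    x_right = np.clip(xs - disp_left.astype(int), 0, w - 1)
    d_right = disp_right[np.arange(h)[:, None], x_right]
    checked = disp_left.copy()
    checked[np.abs(disp_left - d_right) > threshold] = invalid
    return checked
```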
Mobile stereo-matching systems have become an important part of many applications, such as automated-driving vehicles and autonomous robots. Accurate stereo-matching methods usually entail high computational complexity, whereas mobile platforms have only limited hardware resources in order to keep their power consumption low; this makes it difficult to maintain both acceptable processing speed and accuracy on mobile platforms. To resolve this trade-off, we herein propose a novel acceleration approach for the well-known zero-mean normalized cross-correlation (ZNCC) matching cost calculation algorithm on a Jetson TX2 embedded GPU. In our method for accelerating ZNCC, target images are scanned in a zigzag fashion to efficiently reuse one pixel's computation for its neighboring pixels; this reduces the amount of data transmission and increases the utilization of on-chip registers, thus increasing the processing speed. As a result, our method is 2X faster than the traditional image scanning method and 26% faster than the latest NCC method. By combining this technique with the domain transformation (DT) algorithm, our system achieves a real-time processing speed of 32 fps on a Jetson TX2 GPU for 1,280x384-pixel images with a maximum disparity of 128. Additionally, evaluation results on the KITTI 2015 benchmark show that our combined system is 7.26% more accurate than the same algorithm combined with the census transform, while maintaining almost the same processing speed.
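For concreteness, the NumPy sketch below computes the ZNCC matching cost for a single pixel and disparity; the window size is an illustrative assumption, and the GPU zigzag scanning and register reuse of the actual system are not reproduced here.

```python
import numpy as np

def zncc_cost(left, right, y, x, d, win=5):
    # ZNCC between a window in the left image and the window shifted by
    # disparity d in the right image; assumes the windows stay inside the images.
    r = win // 2
    wl = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)
    wr = right[y - r:y + r + 1, x - d - r:x - d + r + 1].astype(np.float64)
    wl = wl - wl.mean()                               # zero-mean normalization
    wr = wr - wr.mean()
    denom = np.sqrt((wl ** 2).sum() * (wr ** 2).sum()) + 1e-12
    return (wl * wr).sum() / denom                    # in [-1, 1]; higher is a better match
```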
The black-box nature of end-to-end speech translation (E2E ST) systems makes it difficult to understand how source language inputs are being mapped to the target language. To solve this problem, we would like to simultaneously generate automatic speech recognition (ASR) and ST predictions such that each source language word is explicitly mapped to a target language word. A major challenge arises from the fact that translation is a non-monotonic sequence transduction task due to word ordering differences between languages -- this clashes with the monotonic nature of ASR. Therefore, we propose to generate ST tokens out-of-order while remembering how to re-order them later. We achieve this by predicting a sequence of tuples consisting of a source word, the corresponding target words, and post-editing operations dictating the correct insertion points for the target word. We examine two variants of such operation sequences which enable generation of monotonic transcriptions and non-monotonic translations from the same speech input simultaneously. We apply our approach to offline and real-time streaming models, demonstrating that we can provide explainable translations without sacrificing quality or latency. In fact, the delayed re-ordering ability of our approach improves performance during streaming. As an added benefit, our method performs ASR and ST simultaneously, making it faster than using two separate systems to perform these tasks.
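A toy sketch of the tuple-sequence idea is given below: each source word is paired with its target words and an operation indicating where to insert them in the growing translation, so a monotonic transcript and a re-ordered translation are produced in the same pass. The tuple format and insertion semantics are simplified assumptions, not the paper's exact operation set.

```python
def apply_tuples(tuples):
    transcript, translation = [], []
    for src_word, tgt_words, insert_pos in tuples:
        transcript.append(src_word)                 # monotonic ASR output
        for i, w in enumerate(tgt_words):
            translation.insert(insert_pos + i, w)   # possibly out-of-order ST output
    return " ".join(transcript), " ".join(translation)

# English "I eat sushi" -> Japanese "私は 寿司を 食べる": the verb is emitted early
# in source order but inserted at the end of the translation via its insertion point.
print(apply_tuples([("I", ["私は"], 0), ("eat", ["食べる"], 1), ("sushi", ["寿司を"], 1)]))
```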
Speakers build rapport in the process of aligning their language with one another. Rapport with a tutor has been shown to promote learning while domain material is being taught. Past work on lexical alignment in the education domain has suffered from limitations both in the measures used to quantify alignment and in the types of interactions in which alignment with an agent has been studied. In this paper, we apply a data-driven measure of alignment based on the concept of shared expressions (which may consist of multiple words), and we compare alignment in one-on-one human-robot (H-R) interactions with alignment in the H-R portions of collaborative human-human-robot (H-H-R) interactions. We find that students in the H-R setting align with a teachable robot more than in the H-H-R setting, and that the relationship between lexical alignment and rapport is more complex than predicted by previous theoretical and empirical work.
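As a rough illustration of a data-driven shared-expression measure, the sketch below collects word sequences (up to trigrams) that both speakers have used; this n-gram-overlap definition is a simplification for illustration, not the paper's exact alignment measure.

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shared_expressions(utts_a, utts_b, max_n=3):
    # Expressions (1- to max_n-grams) used by a speaker across their utterances
    def expressions(utts):
        exprs = set()
        for u in utts:
            toks = u.lower().split()
            for n in range(1, max_n + 1):
                exprs |= ngrams(toks, n)
        return exprs
    # Shared expressions are those produced by both speakers
    return expressions(utts_a) & expressions(utts_b)
```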
IceCube is a cubic-kilometer array of optical sensors for detecting atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in IceCube data analysis. Reconstructing and classifying events is challenging because of the detector geometry, the inhomogeneous scattering and absorption of light in the ice, and, below 100 GeV, the relatively small number of signal photons produced per event. To address this challenge, IceCube events can be represented as point-cloud graphs, and a graph neural network (GNN) can serve as the classification and reconstruction method. The GNN is able to distinguish neutrino events from cosmic-ray backgrounds, classify different neutrino event types, and reconstruct the deposited energy, direction, and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range with the state-of-the-art maximum-likelihood techniques used in current IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false-positive rate (FPR) compared with the current IceCube method. Alternatively, the GNN reduces the FPR by more than a factor of 8 (to below half a percent) at a fixed signal efficiency. For the reconstruction of energy, direction, and interaction vertex, the resolution improves by 13%-20% on average compared with the current maximum-likelihood techniques. When run on a GPU, the GNN is able to process IceCube events at a rate close to the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low-energy neutrinos in online searches for transient events.
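A minimal PyTorch sketch of the point-cloud-graph idea is shown below: each sensor hit becomes a graph node and a simple message-passing network produces an event-level prediction. The node features, layer sizes, and aggregation scheme are illustrative assumptions, not the analysis code.

```python
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, x, edge_index):
        src, dst = edge_index                      # edges between nearby hits
        messages = self.mlp(torch.cat([x[dst], x[src]], dim=1))
        out = torch.zeros_like(x)
        out.index_add_(0, dst, messages)           # sum-aggregate incoming messages
        return out

class EventClassifier(nn.Module):
    def __init__(self, in_dim=5, hidden=64, num_classes=3):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.layers = nn.ModuleList([SimpleGNNLayer(hidden) for _ in range(3)])
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, hits, edge_index):
        x = torch.relu(self.embed(hits))           # hits: [num_hits, (x, y, z, t, charge)]
        for layer in self.layers:
            x = x + layer(x, edge_index)           # residual message passing
        return self.head(x.mean(dim=0, keepdim=True))  # event-level prediction
```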
Federated learning (FL) is a paradigm for jointly training machine learning algorithms in a decentralized manner. Most research in FL has focused on neural-network-based approaches, whereas tree-based methods such as XGBoost have received little attention in federated learning owing to the challenges posed by the iterative and additive nature of the algorithm. Decision-tree-based models, in particular XGBoost, can handle non-IID data, which is important for algorithms used in federated learning frameworks because the underlying data are decentralized and inherently at risk of being non-IID. In this paper, we focus on studying how tree-based models are affected by non-IID distributions, conducting experiments on various sample-size-based data-skew scenarios and examining the performance of these models under various non-IID settings. We run an extensive set of experiments on multiple different datasets with different data-skew partitions. Our experimental results show that, despite the various partition ratios, the performance of the models remains consistent and is close to, or equally as good as, models trained in a centralized manner.
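The sketch below illustrates the kind of sample-size-based (quantity-skew) partitioning used in such experiments: clients receive unequal shares of the data, and models trained on the partitions can then be compared with a centrally trained model. The partition ratios are illustrative assumptions.

```python
import numpy as np

def quantity_skew_partition(n_samples, ratios, seed=0):
    # Split sample indices into per-client partitions with the given size ratios.
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    sizes = (np.array(ratios) * n_samples).astype(int)
    splits = np.cumsum(sizes)[:-1]
    return np.split(idx, splits)     # one index array per client

# e.g. 4 clients with a heavily skewed split of 10,000 samples
parts = quantity_skew_partition(10_000, [0.6, 0.2, 0.15, 0.05])
print([len(p) for p in parts])
```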
Black-box optimization has potential in many applications, such as hyperparameter optimization in machine learning and optimization in experimental design. Ising machines are useful for binary optimization problems because each variable can be represented by a single binary variable of the Ising machine. However, conventional approaches using an Ising machine cannot handle black-box optimization problems with non-binary values. To overcome this limitation, we propose a method for black-box optimization problems with integer variables that uses Ising/annealing machines and factorization machines in cooperation with three different integer-encoding methods. The proposed method is numerically evaluated with the different encoding methods on a simple problem: calculating the energy of a hydrogen molecule in its most stable state. The proposed method can calculate the energy with any of the integer-encoding methods; however, one-hot encoding is useful for problems of small size.
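As a small illustration of one of the encodings mentioned above, the sketch below shows one-hot integer encoding: an integer taking values 0..K-1 is represented by K binary variables, with a quadratic penalty that can be added to the QUBO energy to enforce that exactly one bit is set. The penalty strength is an illustrative assumption.

```python
import numpy as np

def one_hot_decode(bits):
    # bits: binary vector of length K with exactly one 1
    return int(np.argmax(bits))

def one_hot_penalty(bits, strength=10.0):
    # Adds strength * (sum(bits) - 1)^2 so invalid encodings raise the QUBO energy
    return strength * (bits.sum() - 1) ** 2

bits = np.array([0, 0, 1, 0])        # encodes the integer 2
print(one_hot_decode(bits), one_hot_penalty(bits))
```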
In object detection, data amount and cost are a trade-off, and collecting a large amount of data in a specific domain is labor intensive. Existing large-scale datasets are therefore used for pre-training. However, conventional transfer learning and domain adaptation cannot bridge the domain gap when the target domain differs significantly from the source domain. We propose a data-synthesis method that can solve this large-domain-gap problem. In this method, a part of a target image is pasted onto a source image, and the position of the pasted region is aligned by utilizing the information of the object bounding boxes. In addition, we introduce adversarial learning to discriminate whether a region is original or pasted. The proposed method is trained on a large number of source images and a few target-domain images. In a setting with very different domains, where RGB images are the source domain and thermal infrared images are the target domain, the proposed method achieves higher accuracy than conventional methods. Likewise, the proposed method achieves higher accuracy in the case of simulation images versus real images.
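A rough sketch of the paste-based synthesis is given below: an object region cropped from a target-domain image is resized and pasted onto a source image at the position of a source bounding box. The OpenCV usage and the exact alignment rule are illustrative assumptions, and the adversarial discriminator is omitted.

```python
import cv2

def paste_target_on_source(src_img, src_box, tgt_img, tgt_box):
    # Assumes both images have the same number of channels.
    sx1, sy1, sx2, sy2 = src_box                # source bounding box (x1, y1, x2, y2)
    tx1, ty1, tx2, ty2 = tgt_box                # target-domain bounding box
    patch = tgt_img[ty1:ty2, tx1:tx2]
    patch = cv2.resize(patch, (sx2 - sx1, sy2 - sy1))
    synthetic = src_img.copy()
    synthetic[sy1:sy2, sx1:sx2] = patch         # align the pasted region with the source box
    return synthetic
```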