Traditional statistical inference is static, in the sense that the estimate of the quantity of interest does not affect the future evolution of that quantity. In some sequential estimation problems, however, the future values of the quantity to be estimated depend on the estimate of its current value. This type of estimation problem has been formulated as the dynamic inference problem. In this work, we formulate the Bayesian learning problem for dynamic inference, where the unknown quantity-generation model is assumed to be specified by a random model parameter. We derive the optimal Bayesian learning rules, both offline and online, that minimize the inference loss. Moreover, learning for dynamic inference can serve as a meta problem, such that familiar machine learning problems, including supervised learning, imitation learning, and reinforcement learning, can be cast as its special cases or variants. A good understanding of this unifying meta problem therefore also sheds light on a broad spectrum of machine learning problems.
Event cameras, which asynchronously output low-latency event streams, offer great opportunities for state estimation in challenging situations. Although event-based visual odometry has been extensively studied in recent years, most existing methods are monocular, and little research has addressed stereo event vision. In this paper, we present ESVIO, the first event-based stereo visual-inertial odometry, which leverages the complementary advantages of event streams, standard images, and inertial measurements. Our proposed pipeline achieves temporal tracking and instantaneous matching between consecutive stereo event streams, thereby obtaining robust state estimation. In addition, a motion compensation method is designed to emphasize scene edges by warping each event to reference moments using the IMU and the ESVIO back-end. We validate that both ESIO (purely event-based) and ESVIO (event-based with image aiding) outperform other image-based and event-based baseline methods on public and self-collected datasets. Furthermore, we use our pipeline to perform onboard quadrotor flights in low-light environments. A real-world large-scale experiment is also conducted to demonstrate long-term effectiveness. We highlight that this work is a real-time, accurate system aimed at robust state estimation in challenging environments.
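The following is a minimal sketch of rotation-only event warping. It assumes a pinhole camera with intrinsics `K_cam`, pure camera rotation, and a constant body-frame angular velocity `omega` taken from the gyroscope over the event window. ESVIO's actual compensation also uses back-end state estimates, which this sketch does not model. The function names are hypothetical.

```python
import numpy as np

def rodrigues(rotvec):
    """Rotation matrix for a rotation vector (axis * angle) via Rodrigues' formula."""
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rotvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def warp_events(xy, t, t_ref, omega, K_cam):
    """Warp events (pixel coords xy, timestamps t) to the reference time t_ref.

    Assumes pure rotation with constant body-frame angular velocity omega (rad/s).
    """
    K_inv = np.linalg.inv(K_cam)
    out = np.empty_like(xy, dtype=float)
    for i, ((u, v), ti) in enumerate(zip(xy, t)):
        bearing = K_inv @ np.array([u, v, 1.0])   # back-project the pixel to a ray
        R = rodrigues(omega * (ti - t_ref))       # maps camera frame at t_i to frame at t_ref
        b_ref = R @ bearing
        p = K_cam @ (b_ref / b_ref[2])            # re-project at t_ref
        out[i] = p[:2]
    return out
```

Warping every event in a batch to a common `t_ref` makes events triggered by the same edge collapse onto the same pixel. That collapse is what sharpens the accumulated edge image.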
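Time-series data appear in a variety of applications, such as smart transportation and environmental monitoring. One of the fundamental problems in time-series analysis is time-series forecasting. Despite the success of recent deep time-series forecasting methods, they still require sufficient observations of historical values to make accurate forecasts. In other words, the ratio of the output length (or forecasting horizon) to the sum of the input and output lengths should be sufficiently low (e.g., 0.3). As the ratio increases (e.g., to 0.8), the uncertainty of the forecasting accuracy increases significantly. In this paper, we show both theoretically and empirically that this uncertainty can be effectively reduced by retrieving relevant time series as references. In the theoretical analysis, we first quantify the uncertainty and show its connection to the mean squared error (MSE). We then prove that models with references are easier to learn than models without references, since the retrieved references can reduce the uncertainty. To empirically demonstrate the effectiveness of retrieval-based time-series forecasting models, we introduce a simple yet effective two-stage method, called ReTime, which consists of relational retrieval and content synthesis. We also show that ReTime can be easily adapted to the spatial-temporal time-series and time-series imputation settings. Finally, we evaluate ReTime on real-world datasets to demonstrate its effectiveness.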
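The sketch below is a deliberately simple stand-in for the two-stage retrieve-then-synthesize idea. Retrieval picks the `k` database series whose first `len(query)` values are most Pearson-correlated with the query window. "Synthesis" then averages those references' continuations after matching their level and scale to the query. ReTime's learned relational retrieval and synthesis modules are not reproduced, and all names here are hypothetical.

```python
import numpy as np

def retrieve_references(query, database, k):
    """Indices of the k series whose first len(query) values best correlate with query."""
    L = len(query)
    scores = [np.corrcoef(query, series[:L])[0, 1] for series in database]
    return np.argsort(scores)[::-1][:k]

def forecast_with_references(query, database, horizon, k=3):
    """Forecast `horizon` steps by averaging level/scale-matched reference continuations."""
    L = len(query)
    preds = []
    for i in retrieve_references(query, database, k):
        ref = database[i]
        past, future = ref[:L], ref[L:L + horizon]
        scale = query.std() / (past.std() + 1e-8)   # match the query's amplitude
        preds.append((future - past.mean()) * scale + query.mean())
    return np.mean(preds, axis=0)
```

With an input length of 30 and a horizon of 20, the ratio discussed above is 20 / (30 + 20) = 0.4.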
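Event cameras are motion-activated sensors that capture pixel-level illumination changes instead of intensity images at a fixed frame rate. Compared with standard cameras, they provide reliable visual perception during high-speed motion and in high-dynamic-range scenes. However, when the relative motion between the camera and the scene is limited, such as in a static state, event cameras output little information or even only noise. Standard cameras, in contrast, provide rich perceptual information in most scenarios, especially under good lighting conditions. The two cameras are therefore complementary. In this paper, we propose a robust, highly accurate, real-time optimization-based event-based visual-inertial odometry (VIO) method that uses event-corner features, line-based event features, and point-based image features. The proposed method leverages point-based features in natural scenes and line-based features in human-made scenes to provide additional structural constraints through well-designed feature management. Experiments on public benchmark datasets show that our method achieves superior performance compared with image-based or event-based VIO. Finally, we use our method to demonstrate onboard closed-loop autonomous quadrotor flight and large-scale outdoor experiments. Videos of the evaluations are presented on our project website: https://b23.tv/oe3qm6j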
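The abstract does not spell out how line features enter the optimization. One common choice in point-line VIO is a point-to-line reprojection residual, sketched below as an assumption rather than as this paper's formulation. The 3D segment endpoints are projected with a world-to-camera pose `(R, t)` and pinhole intrinsics `K`. The residual is the signed pixel distance from each observed endpoint to the projected line. Names are hypothetical.

```python
import numpy as np

def project(K, R, t, P):
    """Homogeneous pixel [u, v, 1] of world point P under world-to-camera pose (R, t)."""
    pc = R @ P + t
    return K @ (pc / pc[2])

def line_residual(K, R, t, P1, P2, obs_start, obs_end):
    """Signed distances (pixels) of the observed endpoints to the projected 3D line."""
    l = np.cross(project(K, R, t, P1), project(K, R, t, P2))  # homogeneous image line
    l = l / np.hypot(l[0], l[1])                               # normalize so l . x is a distance
    return np.array([l @ np.append(obs_start, 1.0), l @ np.append(obs_end, 1.0)])
```

Because the residual measures only the perpendicular offset, it does not penalize endpoints sliding along the line. This matters for event-based line detections, whose extent is often unreliable.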
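Magnetic resonance imaging (MRI) is an essential tool for clinical diagnosis, but it suffers from long acquisition times. Deep learning, especially deep generative models, offers aggressive acceleration and better reconstruction in MRI. Nevertheless, learning the data distribution as prior knowledge and reconstructing images from limited data remain challenging. In this work, we propose a novel Hankel-k-space generative model (HKGM) that can generate samples from a training set consisting of a single k-space dataset. In the prior-learning stage, we first construct a large Hankel matrix from the k-space data and then extract multiple structured k-space patches from it to capture the internal distribution among different patches. Extracting patches from a Hankel matrix allows the generative model to be learned from a redundant, low-rank data space. In the iterative reconstruction stage, the desired solution is observed to obey the learned prior knowledge. The intermediate reconstruction is updated by feeding it to the generative model, and the updated result is then alternately refined by imposing a low-rank penalty on its Hankel matrix and a data-consistency constraint on the measured data. Experimental results confirm that the internal statistics of patches within a single k-space dataset carry enough information to learn a powerful generative model and to provide state-of-the-art reconstruction.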
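The sketch below illustrates the two non-learned steps of the alternating reconstruction: a low-rank projection of a k-space Hankel matrix and a data-consistency step. It assumes the Hankel matrix is formed by stacking every `p x p` k-space window as a row (a block-Hankel lifting), and that the low-rank penalty is replaced by hard truncated-SVD projection. The generative-model update is omitted, and all names are hypothetical.

```python
import numpy as np

def hankel_from_kspace(k, p):
    """Stack every p x p window of 2D k-space k as one row (block-Hankel lifting)."""
    H, W = k.shape
    rows = [k[i:i + p, j:j + p].ravel()
            for i in range(H - p + 1) for j in range(W - p + 1)]
    return np.array(rows)

def kspace_from_hankel(Hm, shape, p):
    """Put each row back into its window and average overlapping entries."""
    H, W = shape
    acc = np.zeros(shape, dtype=Hm.dtype)
    cnt = np.zeros(shape)
    r = 0
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            acc[i:i + p, j:j + p] += Hm[r].reshape(p, p)
            cnt[i:i + p, j:j + p] += 1
            r += 1
    return acc / cnt

def low_rank_step(k, p, rank):
    """Project the Hankel matrix of k onto the given rank via truncated SVD."""
    U, s, Vh = np.linalg.svd(hankel_from_kspace(k, p), full_matrices=False)
    Hm_lr = (U[:, :rank] * s[:rank]) @ Vh[:rank]
    return kspace_from_hankel(Hm_lr, k.shape, p)

def data_consistency(k, measured, mask):
    """Keep measured samples where mask is True and the current estimate elsewhere."""
    return np.where(mask, measured, k)
```

For example, k-space consisting of a single 2D complex exponential yields a rank-1 Hankel matrix. This kind of redundancy is what makes the low-rank prior informative.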
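Proteins perform biological functions by folding into specific 3D structures. To model protein structures accurately, both the overall geometric topology and the local fine-grained relations between amino acids (e.g., side-chain torsion angles and inter-amino-acid orientations) should be carefully considered. In this work, we propose the Directed Weight Neural Network to better capture geometric relations among different amino acids. Our new framework extends a single weight from a scalar to a 3D directed vector and supports a rich set of geometric operations on both classical and SO(3)-representation features, on top of which we build a perceptron unit for processing amino-acid information. In addition, we introduce an equivariant message-passing paradigm on proteins for plugging the directed weight perceptrons into existing graph neural networks, showing superior versatility in maintaining SO(3)-equivariance at the global scale. Experiments show that our network represents geometric relations more expressively than classical neural networks and (globally) equivariant networks. It also achieves state-of-the-art performance on various computational biology applications related to protein 3D structures.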
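The sketch below is one simplified reading of "weights as 3D directed vectors", not the paper's exact operations. It assumes each directed weight is expressed in a residue-local frame built from backbone atoms N, CA, and C, and is rotated into global coordinates before use. Two operations are shown: scalars times directed weights give vector outputs (SO(3)-equivariant), and dot products of vector features with directed weights give scalar outputs (invariant). The layer and parameter names are hypothetical.

```python
import numpy as np

def local_frame(n, ca, c):
    """Orthonormal residue frame from backbone atoms via Gram-Schmidt (columns = axes)."""
    e1 = c - ca
    e1 /= np.linalg.norm(e1)
    u = n - ca
    e2 = u - (u @ e1) * e1
    e2 /= np.linalg.norm(e2)
    e3 = np.cross(e1, e2)
    return np.stack([e1, e2, e3], axis=1)

def directed_weight_layer(s, V, W_s, W_v, frame):
    """s: (d_in,) scalars; V: (c_in, 3) global vectors.
    W_s: (d_out, d_in, 3) and W_v: (d_out, c_in, 3) directed weights in the local frame."""
    Ws = W_s @ frame.T                       # rotate each directed weight to global coords
    Wv = W_v @ frame.T
    vec_out = np.einsum("oik,i->ok", Ws, s)  # sum_i s_i * w_oi        -> equivariant vectors
    sca_out = np.einsum("ock,ck->o", Wv, V)  # sum_c <w_oc, V_c>        -> invariant scalars
    return vec_out, sca_out
```

Rotating the whole protein by Q rotates the local frame to Q times the frame. As a result, the vector outputs rotate by Q and the scalar outputs are unchanged.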
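Reconfigurable intelligent surface (RIS) is an emerging technology for future wireless communication systems. In this work, we consider RIS-enabled downlink spatial multiplexing for weighted sum-rate (WSR) maximization. In the literature, most solutions use alternating gradient-based optimization, which has moderate performance, high complexity, and limited scalability. We propose to solve this problem with a fully convolutional network (FCN), an architecture originally designed for semantic segmentation of images. Two properties motivate us to apply the FCN to RIS configuration: the rectangular shape of the RIS, and the spatial correlation between the channels of adjacent RIS antennas caused by their short separation. We design a set of channel features that includes both the cascaded channel via the RIS and the direct channel. At the base station (BS), a differentiable minimum mean squared error (MMSE) precoder is used for pretraining. A weighted minimum mean squared error (WMMSE) precoder is then applied for fine-tuning; it is non-differentiable and more complex but achieves better performance. Evaluation results show that the proposed solution achieves higher performance and allows faster evaluation than the baselines. Hence, it scales better to a large number of antennas, bringing RIS one step closer to practical deployment.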
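The sketch below shows the quantities the learned RIS configuration feeds into. It computes the effective channel (direct link plus the cascaded BS-RIS-user link for unit-modulus phases `theta`), an MMSE precoder, and the resulting WSR for `y = H W s + n`. It assumes the MMSE precoder takes the regularized zero-forcing form with regularizer `K * noise_var / P`, scaled to total power `P`. The FCN and the WMMSE fine-tuning are not shown, and the names are hypothetical.

```python
import numpy as np

def effective_channel(H_d, H_r, G, theta):
    """K x M channel: direct link H_d plus cascaded link H_r diag(theta) G."""
    return H_d + H_r @ np.diag(theta) @ G

def mmse_precoder(H, P, noise_var):
    """Regularized zero-forcing (MMSE) precoder with Frobenius norm^2 equal to P."""
    K = H.shape[0]
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T + (K * noise_var / P) * np.eye(K))
    return W * np.sqrt(P) / np.linalg.norm(W)

def weighted_sum_rate(H, W, weights, noise_var):
    """sum_k weights_k * log2(1 + SINR_k), treating other users' streams as interference."""
    gains = np.abs(H @ W) ** 2          # gains[k, j] = |h_k w_j|^2
    signal = np.diag(gains)
    interference = gains.sum(axis=1) - signal
    sinr = signal / (interference + noise_var)
    return float(np.sum(weights * np.log2(1 + sinr)))
```

Because every step is differentiable in `theta`, a WSR computed this way can serve as a training objective. By contrast, WMMSE is an iterative algorithm.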
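Automatic pain recognition is paramount for medical diagnosis and treatment. Existing works fall into three categories: assessing facial appearance changes, exploiting physiological cues, or fusing the two in a multimodal manner. However, (1) appearance changes are easily affected by subjective factors, which hinders objective pain recognition. Moreover, appearance-based methods ignore the long-range spatial-temporal dependencies that are important for modeling expressions over time. (2) Physiological cues are obtained by attaching sensors to the human body, which is inconvenient and uncomfortable. In this paper, we propose a novel multi-task learning framework that encodes both appearance changes and physiological cues in a non-contact manner for pain recognition. The framework captures both local and long-range dependencies through the proposed attention mechanism applied to the learned appearance representations. These representations are further enriched by temporally attended physiological cues (remote photoplethysmography, rPPG) recovered from videos in an auxiliary task. The framework, dubbed the rPPG-enriched Spatio-Temporal Attention Network (rSTAN), establishes state-of-the-art performance for non-contact pain recognition on publicly available pain databases. It demonstrates that rPPG prediction can serve as an auxiliary task to facilitate non-contact automatic pain recognition.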
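The sketch below shows how an auxiliary rPPG task can be combined with the main pain-classification objective. The abstract does not state the losses, so two common choices are assumed: cross-entropy for the pain level and a negative-Pearson loss for the recovered rPPG signal, weighted by a hypothetical `lam`. Function names are also hypothetical.

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stabilized)."""
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[label])

def neg_pearson(pred, target):
    """1 - Pearson correlation: 0 for a perfectly correlated rPPG waveform, 2 for inverted."""
    p, t = pred - pred.mean(), target - target.mean()
    return float(1.0 - (p @ t) / (np.linalg.norm(p) * np.linalg.norm(t)))

def multitask_loss(pain_logits, pain_label, rppg_pred, rppg_target, lam=0.5):
    """Main pain-recognition loss plus a weighted auxiliary rPPG loss."""
    return cross_entropy(pain_logits, pain_label) + lam * neg_pearson(rppg_pred, rppg_target)
```

A correlation-based rPPG loss ignores amplitude and offset, which are usually not meaningful for video-derived pulse signals.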
In recent years, arbitrary image style transfer has attracted increasing attention. Given a pair of content and style images, the goal is to produce a stylized image that retains the content of the former while capturing the style patterns of the latter. However, it is difficult to strike a good balance between content details and style features. When an image is stylized with abundant style patterns, the content details may be damaged, and objects in the image sometimes cannot be clearly distinguished. To address this, we present STT, a new transformer-based method for image style transfer, together with an edge loss that noticeably enhances content details and avoids the blurred results caused by excessive rendering of style features. Qualitative and quantitative experiments demonstrate that STT achieves performance comparable to state-of-the-art image style transfer methods while alleviating the content leak problem.
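The abstract does not define the edge loss, so the sketch below is an assumption rather than STT's formulation. It compares Sobel gradient-magnitude maps of the grayscale stylized and content images with a mean absolute error. The names are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d_valid(img, kernel):
    """'Valid' 3x3 cross-correlation of a grayscale image."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * img[di:di + H - 2, dj:dj + W - 2]
    return out

def edge_map(img):
    """Sobel gradient magnitude."""
    gx = filter2d_valid(img, SOBEL_X)
    gy = filter2d_valid(img, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def edge_loss(stylized, content):
    """Mean absolute difference between the edge maps of the two images."""
    return float(np.mean(np.abs(edge_map(stylized) - edge_map(content))))
```

Sobel kernels sum to zero, so a global brightness or color shift introduced by the style does not change the loss. Washed-out or blurred edges, however, do increase it.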
In recent years, the Transformer architecture has shown its superiority in video-based person re-identification. Inspired by video representation learning, existing methods mainly focus on designing modules to extract informative spatial and temporal features. However, they remain limited in extracting local attributes and global identity information, which are critical for person re-identification. In this paper, we propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two newly designed proxy embedding modules to address this issue. Specifically, MSTAT consists of three stages that encode attribute-associated, identity-associated, and attribute-identity-associated information from the video clips, respectively, achieving a holistic perception of the input person. We combine the outputs of all stages for the final identification. In practice, to reduce computational cost, Spatial-Temporal Aggregation (STA) modules are first adopted in each stage to perform self-attention along the spatial and temporal dimensions separately. We further introduce Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract informative and discriminative feature representations at different stages. All of these modules are realized with newly designed self-attention operations with specific meanings. Moreover, temporal patch shuffling is introduced to further improve the robustness of the model. Extensive experimental results demonstrate that the proposed modules effectively extract informative and discriminative information from videos, and show that MSTAT achieves state-of-the-art accuracy on various standard benchmarks.
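The sketch below illustrates why separating spatial and temporal self-attention saves computation, and what a temporal patch shuffle might look like. It assumes single-head attention with hypothetical projection matrices and residual connections. The shuffle is one plausible variant: each spatial position's patches are permuted across frames with probability `prob`. Neither function is MSTAT's exact STA module or augmentation.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over the second-to-last axis."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def spatial_temporal_attention(x, params_s, params_t):
    """x: (T, N, D) tokens for T frames of N patches. Spatial attention within each
    frame, then temporal attention per patch position: score cost O(T*N^2 + N*T^2)
    instead of O((T*N)^2) for joint attention."""
    x = x + self_attention(x, *params_s)        # attends over N, batched over T
    xt = np.swapaxes(x, 0, 1)                   # (N, T, D)
    xt = xt + self_attention(xt, *params_t)     # attends over T, batched over N
    return np.swapaxes(xt, 0, 1)

def temporal_patch_shuffle(x, rng, prob=0.5):
    """With probability prob per spatial position, permute that position's patches across frames."""
    x = x.copy()
    T, N, _ = x.shape
    for n in range(N):
        if rng.random() < prob:
            x[:, n] = x[rng.permutation(T), n]
    return x
```

For example, with T = 8 frames and N = 128 patches, the factorized scores require 8 * 128^2 + 128 * 8^2 = 139,264 dot products. Joint attention over all 1,024 tokens would require 1,048,576.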