We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By assuming that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive a tractable reformulation of our model. In particular, we show that the return-risk model can also account for risk from an uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
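As a rough illustration of the "weighted average of mean and percentile performances" in the abstract above, the following sketch evaluates such an objective on a finite sample of returns. It is not the paper's distributionally robust formulation: there is no Wasserstein ambiguity set here, and the names `percentile`, `return_risk`, `weight`, and `alpha` are assumptions introduced only for this example.

```python
import math


def percentile(values, alpha):
    """Lower empirical alpha-quantile: the k-th smallest sample, k = ceil(alpha * n)."""
    ordered = sorted(values)
    k = max(1, math.ceil(alpha * len(ordered)))
    return ordered[k - 1]


def return_risk(returns, weight, alpha):
    """Weighted average of the sample mean and the empirical alpha-percentile.

    weight = 1 recovers the plain mean; weight = 0 recovers the percentile.
    Both parameters are illustrative assumptions, not the paper's notation.
    """
    mean = sum(returns) / len(returns)
    return weight * mean + (1.0 - weight) * percentile(returns, alpha)


# Hypothetical sampled returns of one fixed policy.
sample_returns = [1.0, 2.0, 3.0, 10.0]
score = return_risk(sample_returns, weight=0.5, alpha=0.25)
```

Varying `weight` between 0 and 1 moves the objective from a purely percentile-based criterion to a purely mean-based one.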
translated by Google Translate
Robust Markov decision processes (RMDPs) are promising models that provide reliable policies under ambiguities in model parameters. As opposed to nominal Markov decision processes (MDPs), however, the state-of-the-art solution methods for RMDPs are limited to value-based methods, such as value iteration and policy iteration. This paper proposes Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs with a global convergence guarantee in tabular problems. Unlike value-based methods, DRPG does not rely on dynamic programming techniques. In particular, the inner-loop robust policy evaluation problem is solved via projected gradient descent. Finally, our experimental results demonstrate the performance of our algorithm and verify our theoretical guarantees.
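The abstract above states that the inner-loop problem is solved via projected gradient descent. A minimal sketch of that generic technique follows, under strong simplifying assumptions: the feasible set is the whole probability simplex (a stand-in for an ambiguity set over one transition distribution) and the objective is linear. The function names and the toy `values` vector are hypothetical and do not come from the paper.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_i >= 0, sum(p) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t  # keep the threshold from the last index that satisfies the condition
    return [max(x - theta, 0.0) for x in v]


def projected_gradient_descent(values, steps=200, step_size=0.1):
    """Minimize sum_i p_i * values_i over the simplex.

    The gradient of this linear objective with respect to p is `values` itself,
    so each iteration takes a gradient step and projects back onto the simplex.
    """
    n = len(values)
    p = [1.0 / n] * n
    for _ in range(steps):
        p = project_to_simplex([p_i - step_size * g for p_i, g in zip(p, values)])
    return p


# Toy next-state values: the minimizing distribution puts all mass on the smallest one.
worst_case = projected_gradient_descent([1.0, 0.2, 0.7])
```

For a linear objective the minimizer is a vertex of the simplex, which makes the result easy to check by hand; a structured ambiguity set would need its own projection step.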
Deep neural networks have strong capabilities of memorizing the underlying training data, which can be a serious privacy concern. An effective solution to this problem is to train models with differential privacy (DP), which provides rigorous privacy guarantees by injecting random noise into the gradients. This paper focuses on the scenario where sensitive data are distributed among multiple participants, who jointly train a model through federated learning (FL), using both secure multiparty computation (MPC) to ensure the confidentiality of each gradient update, and differential privacy to avoid data leakage in the resulting model. A major challenge in this setting is that common mechanisms for enforcing DP in deep learning, which inject real-valued noise, are fundamentally incompatible with MPC, which exchanges finite-field integers among the participants. Consequently, most existing DP mechanisms require rather high noise levels, leading to poor model utility. Motivated by this, we propose the Skellam mixture mechanism (SMM), an approach to enforce DP on models built via FL. Compared to existing methods, SMM eliminates the assumption that the input gradients must be integer-valued, and thus reduces the amount of noise injected to preserve DP. Further, SMM allows tight privacy accounting due to the favorable composition and sub-sampling properties of the Skellam distribution, which are key to accurate deep learning with DP. The theoretical analysis of SMM is highly non-trivial, especially considering (i) the complicated math of differentially private deep learning in general and (ii) the fact that the mixture of two Skellam distributions is rather complex and, to our knowledge, has not been studied in the DP literature. Extensive experiments in various practical settings demonstrate that SMM consistently and significantly outperforms existing solutions in terms of the utility of the resulting model.
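To make the integer-valued noise mentioned in the abstract above concrete, the sketch below samples Skellam noise as the difference of two independent Poisson variates with the same mean `mu`, which gives a symmetric integer distribution with variance `2 * mu`. This shows only the basic Skellam distribution, not the SMM mechanism itself, and it performs no privacy accounting. The function names are assumptions for this example, and the Poisson sampler uses Knuth's multiplication method, which is only practical for small `mu`.

```python
import math
import random


def sample_poisson(mu, rng):
    """Knuth's multiplication method for a Poisson(mu) variate; suitable only for small mu."""
    limit = math.exp(-mu)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def sample_skellam(mu, rng):
    """Difference of two independent Poisson(mu) draws: integer-valued, mean 0, variance 2 * mu."""
    return sample_poisson(mu, rng) - sample_poisson(mu, rng)


def add_skellam_noise(int_vector, mu, rng):
    """Add independent Skellam noise to each integer coordinate (illustrative only)."""
    return [x + sample_skellam(mu, rng) for x in int_vector]


rng = random.Random(0)
noisy = add_skellam_noise([3, -1, 0], mu=2.0, rng=rng)
```

Because the noise is integer-valued, the perturbed vector stays in the integers, which is the property that makes this kind of noise compatible with protocols that exchange finite-field integers.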
In recent years, deep-learning-based approaches have been introduced to solve time-series forecasting problems. These novel methods have demonstrated impressive performance in univariate and low-dimensional multivariate time-series forecasting tasks. However, when these methods are applied to high-dimensional multivariate forecasting problems, their performance is severely constrained by practical training time and reasonable GPU memory configurations. In this paper, inspired by a change of basis in the Hilbert space, we propose a flexible data feature extraction technique that excels in high-dimensional multivariate forecasting tasks. Our approach was originally developed for the National Science Foundation (NSF) Algorithms for Threat Detection (ATD) 2022 Challenge. Implemented using the attention mechanism and a Convolutional Neural Network (CNN) architecture, our method demonstrates strong performance and compatibility. Our models trained on the GDELT Dataset finished in 1st and 2nd place in the ATD sprint series and hold promise for other time-series forecasting datasets.
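Persistence diagrams (PDs), often characterized as sets of deaths and births of homology classes, are known to provide a topological representation of a graph structure, which is often useful in machine learning tasks. Prior works rely on a single graph signature to construct PDs. In this paper, we explore the use of a family of multi-scale graph signatures to enhance the robustness of topological features. We propose a deep learning architecture to handle this set input. Experiments on benchmark graph classification datasets demonstrate that our proposed architecture outperforms other persistent-homology-based methods and achieves competitive performance compared to state-of-the-art methods using graph neural networks. In addition, our approach can be easily applied to large input graphs, as it does not suffer from the limited scalability that can be an issue for graph kernel methods.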
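Event sensing is a major component in bio-inspired flight guidance and control systems. We explore the use of event cameras for estimating time-to-contact (TTC) with the surface during ventral landing. This is achieved by estimating divergence (inverse TTC), which is the rate of radial optic flow, from the event stream generated during landing. Our core contributions are a novel contrast maximization formulation for event-based divergence estimation, and a branch-and-bound algorithm that exactly maximizes contrast and finds the optimal divergence value. GPU acceleration is used to speed up the global algorithm. Another contribution is a new dataset containing real event streams from ventral landing, which was used to test and benchmark our method. Owing to global optimization, our algorithm is more capable of recovering the true divergence than other heuristic divergence estimators or event-based optic flow methods. With GPU acceleration, our method also achieves competitive runtimes.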
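Collective perception is a foundational problem in swarm robotics, in which the swarm must reach consensus on a coherent representation of the environment. An important variant of collective perception casts it as a best-of-n decision-making process, in which the swarm must identify the most likely representation out of a set of alternatives. Past work on this variant has primarily focused on characterizing how different algorithms navigate the speed-vs-accuracy tradeoff in a scenario where the swarm must decide on the most frequent environmental feature. Crucially, past work on best-of-n decision-making assumes that the robot sensors are perfect (free of noise and faults), which limits the real-world applicability of these algorithms. In this paper, we derive from first principles an optimal, probabilistic framework for minimalistic swarm robots equipped with flawed sensors. We then validate our approach in a scenario where the swarm collectively decides the frequency of a certain environmental feature. We study the speed and accuracy of the decision-making process with respect to several parameters of interest. Our approach can provide timely and accurate frequency estimates even in the presence of severe sensory noise.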
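Deep models trained using synthetic data require domain adaptation to bridge the gap between the simulation and target environments. State-of-the-art domain adaptation methods often demand sufficient amounts of (unlabeled) data from the target domain. However, this need is difficult to meet when the target domain is an extreme environment, such as space. In this paper, our target problem is close-proximity satellite pose estimation, where it is costly to obtain images of satellites from actual rendezvous missions. We demonstrate that event sensing offers a promising solution for generalizing from the simulation to the target domain under stark illumination differences. Our main contribution is an event-based satellite pose estimation technique, trained purely on synthetic event data with basic data augmentation to improve robustness against practical (noisy) event sensors. Underpinning our method is a novel dataset with carefully calibrated ground truth, comprising real event data obtained by emulating satellite rendezvous scenarios in the lab under drastic lighting conditions. Results on the dataset show that our event-based satellite pose estimation method, trained only on synthetic data without adaptation, can generalize effectively to the target domain.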
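Music contains hierarchical structures beyond beats and measures. While hierarchical structure annotations are helpful for music information retrieval and computer musicology, such annotations are scarce in current digital music databases. In this paper, we explore a data-driven approach to automatically extract hierarchical metrical structures from scores. We propose a new model with a Temporal Convolutional Network-Conditional Random Field (TCN-CRF) architecture. Given a symbolic music score, our model takes in an arbitrary number of voices in a beat-quantized form and predicts a 4-level hierarchical metrical structure from the downbeat level to the section level. We also annotate a dataset using RWC-POP MIDI files to facilitate training and evaluation. We show by experiments that the proposed method performs better than the rule-based approach under different orchestration settings. We also perform some simple musicological analysis on the model predictions. All demos, datasets, and pre-trained models are publicly available on GitHub.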
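In this work, we study different approaches to self-supervised pretraining of object detection models. We first design a general framework to learn a spatially consistent dense representation from an image, by randomly sampling boxes, projecting them to each augmented view, and maximizing the similarity between the corresponding box features. We study existing design choices in the literature, such as box generation, feature extraction strategies, and the use of multiple views, inspired by its success in instance-level image representation learning techniques. Our results suggest that the method is robust to different choices of hyperparameters, and that using multiple views is not as effective as has been shown for instance-level image representation learning. We also design two auxiliary tasks that predict boxes in one view from their features in the other view, by (1) predicting boxes from the sampled set using a contrastive loss, and (2) predicting box coordinates using a transformer, which could potentially benefit downstream object detection tasks. We find that these tasks do not lead to better object detection performance when fine-tuning the pretrained model on labeled data.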