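We present a robust data-driven control scheme for an unknown linear system model with bounded process and measurement noise. Instead of depending on a system model, as in traditional predictive control, we propose a controller that uses data-driven reachable regions. These reachable regions are based on a matrix zonotope recursion and are computed solely from noisy input-output data of a system trajectory. We assume that the measurement and process noise are contained in bounded sets. While we assume knowledge of these bounds, no knowledge of the statistical properties of the noise is assumed. In the noise-free case, we prove that the presented purely data-driven control scheme yields closed-loop behavior equivalent to that of a nominal model predictive control scheme. In the presence of measurement and process noise, our proposed scheme guarantees robust constraint satisfaction, which is essential in safety-critical applications. Numerical experiments show the effectiveness of the proposed data-driven controller in comparison to model-based control schemes.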
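To make the notion of a reachable region concrete, here is a minimal sketch of one set-propagation step with ordinary zonotopes. It is not the scheme from the abstract: it assumes the matrices `A` and `B` are known, whereas the data-driven approach replaces them with a set of models (a matrix zonotope) consistent with the data. All names (`propagate`, `interval_hull`, `noise_generators`) are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def propagate(center, generators, A, B, u, noise_generators):
    """One reachability step for x+ = A x + B u + w.

    The state set is the zonotope {center + sum_i b_i * generators[i], |b_i| <= 1}
    and the noise w lies in the zonotope spanned by noise_generators (zero center).
    ASSUMPTION: A and B are known here; the abstract's method does not need them.
    """
    Ac = mat_vec(A, center)
    Bu = mat_vec(B, u)
    new_center = [a + b for a, b in zip(Ac, Bu)]
    # Linear map: transform every generator. Minkowski sum: append noise generators.
    new_generators = [mat_vec(A, g) for g in generators]
    new_generators += [list(g) for g in noise_generators]
    return new_center, new_generators


def interval_hull(center, generators):
    """Tightest axis-aligned box containing the zonotope."""
    radius = [sum(abs(g[i]) for g in generators) for i in range(len(center))]
    lower = [c - r for c, r in zip(center, radius)]
    upper = [c + r for c, r in zip(center, radius)]
    return lower, upper
```

A constraint check then amounts to testing whether the box (or the zonotope itself) stays inside the admissible state set at every step of the prediction horizon.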
Translated by Google Translate
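We consider risk-averse learning in repeated unknown games, where the goal of each agent is to minimize its individual risk of incurring a high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback, in the form of the cost values of the actions selected in each episode, to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that each agent can only access its own cost values, which nevertheless depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm that utilizes the full historical information on the cost values. We show that this algorithm achieves sublinear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game, which show that our method outperforms existing methods.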
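As a point of reference for the risk measure, the following is a minimal sketch of a plain empirical CVaR estimate from a batch of observed costs. It is not the algorithm from the abstract (which reuses historical cost values across episodes); the function name and the `tail_fraction` convention (the share of worst outcomes that is averaged) are assumptions made for illustration.

```python
import math


def empirical_cvar(costs, tail_fraction):
    """Average of the worst ceil(tail_fraction * n) costs (higher cost = worse).

    ASSUMPTION: tail_fraction in (0, 1] is the share of worst outcomes averaged;
    other texts parameterize CVaR by the complementary confidence level.
    """
    if not 0 < tail_fraction <= 1:
        raise ValueError("tail_fraction must lie in (0, 1]")
    if not costs:
        raise ValueError("costs must be non-empty")
    k = math.ceil(tail_fraction * len(costs))
    worst = sorted(costs, reverse=True)[:k]
    return sum(worst) / k
```

With `tail_fraction=1.0` the estimate reduces to the sample mean, so the parameter interpolates between risk-neutral and worst-case behavior.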
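This paper proposes a data-driven set-based estimation algorithm for a class of nonlinear systems with polynomial nonlinearities. Using the system's input-output data, the proposed method computes, in real time, a set that is guaranteed to contain the system's state. Although the system is assumed to be of polynomial type, the exact polynomial functions and their coefficients need not be known. To this end, the estimator relies on an offline and an online phase. The offline phase uses past input-output data to estimate a set of possible coefficients of the polynomial system. Then, using this estimated set of coefficients and side information about the system, the online phase provides a set estimate of the state. Finally, the proposed method is evaluated through its application to an SIR (Susceptible, Infected, Recovered) epidemic model.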
This paper considers the distributed online convex optimization problem with time-varying constraints over a network of agents. This is a sequential decision-making problem with two sequences of arbitrarily varying convex loss and constraint functions. At each round, each agent selects a decision from the decision set, and then only a portion of the loss function and a coordinate block of the constraint function at this round are privately revealed to this agent. The goal of the network is to minimize the network-wide loss accumulated over time. Two distributed online algorithms, one with full-information feedback and one with bandit feedback, are proposed. Both dynamic and static network regret bounds are analyzed for the proposed algorithms, and network cumulative constraint violation is used to measure constraint violation, which excludes the situation in which strictly feasible constraints compensate for the effects of violated constraints. In particular, we show that the proposed algorithms achieve $\mathcal{O}(T^{\max\{\kappa,1-\kappa\}})$ static network regret and $\mathcal{O}(T^{1-\kappa/2})$ network cumulative constraint violation, where $T$ is the time horizon and $\kappa\in(0,1)$ is a user-defined trade-off parameter. Moreover, if the loss functions are strongly convex, then the static network regret bound can be reduced to $\mathcal{O}(T^{\kappa})$. Finally, numerical simulations are provided to illustrate the effectiveness of the theoretical results.
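To see how the trade-off parameter acts, it may help to evaluate the two stated exponents at a few values of $\kappa$. The table below is only arithmetic on the bounds quoted in the abstract; the particular values of $\kappa$ are chosen for illustration.

```latex
\begin{array}{c|c|c}
\kappa & \text{static regret } \mathcal{O}\!\left(T^{\max\{\kappa,\,1-\kappa\}}\right)
       & \text{constraint violation } \mathcal{O}\!\left(T^{1-\kappa/2}\right) \\ \hline
1/4 & \mathcal{O}(T^{3/4}) & \mathcal{O}(T^{7/8}) \\
1/2 & \mathcal{O}(T^{1/2}) & \mathcal{O}(T^{3/4}) \\
3/4 & \mathcal{O}(T^{3/4}) & \mathcal{O}(T^{5/8})
\end{array}
```

The regret exponent is smallest at $\kappa = 1/2$, while the violation exponent keeps decreasing as $\kappa$ grows, so choosing $\kappa > 1/2$ trades regret for fewer violations. In the strongly convex case the regret bound $\mathcal{O}(T^{\kappa})$ instead favors small $\kappa$, for example $\mathcal{O}(T^{1/4})$ at $\kappa = 1/4$.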
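Mathematical models are fundamental building blocks in the design of dynamical control systems. As control systems become increasingly complex and networked, approaches based on first principles reach their limits. Data-driven methods provide an alternative. However, without structural knowledge, these methods are prone to finding spurious correlations in the training data, which can hamper the generalization capabilities of the obtained models. This can significantly lower control and prediction performance when the system is exposed to unknown situations. A preceding causal identification can prevent this pitfall. In this paper, we propose a method that identifies the causal structure of control systems. We design experiments based on the concept of controllability, which provides a systematic way to compute input trajectories that steer the system to specific regions of its state space. We then analyze the resulting data using powerful techniques from causal inference, which we extend to control systems. Further, we derive conditions that guarantee the discovery of the true causal structure of the system. Experiments on a robot arm demonstrate reliable causal identification from real-world data and enhanced generalization capabilities.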
Markowitz mean-variance portfolios with sample mean and covariance as input parameters exhibit numerous issues in practice. They perform poorly out of sample due to estimation error, and they show extreme weights together with high sensitivity to changes in the input parameters. The heavy-tailed nature of financial time series is in fact the cause of these erratic fluctuations of weights, which in turn create substantial transaction costs. To robustify the weights, we present a toolbox for stabilizing costs and weights for global minimum Markowitz portfolios. Utilizing a projected gradient descent (PGD) technique, we avoid the estimation and inversion of the covariance operator as a whole and concentrate on robust estimation of the gradient descent increment. Using modern tools of robust statistics, we construct a computationally efficient estimator with almost Gaussian properties based on median-of-means uniformly over weights. This robustified Markowitz approach is confirmed by empirical studies on equity markets. We demonstrate that robustified portfolios reach the lowest turnover compared to shrinkage-based and constrained portfolios while preserving or slightly improving out-of-sample performance.
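The median-of-means idea mentioned above can be illustrated with a minimal scalar sketch. This is only the textbook estimator, not the paper's construction, which applies the idea to the gradient increment uniformly over portfolio weights; the function name and the choice to drop leftover samples are assumptions.

```python
import statistics


def median_of_means(samples, n_blocks):
    """Split samples into n_blocks equal blocks, average each, return the median.

    ASSUMPTION: samples are already in random (e.g. i.i.d.) order; any
    leftover samples that do not fill a complete block are dropped.
    """
    if n_blocks < 1 or n_blocks > len(samples):
        raise ValueError("n_blocks must be between 1 and len(samples)")
    block_size = len(samples) // n_blocks
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return statistics.median(block_means)
```

For the samples `[1, 2, 3, 4, 5, 1000]` with three blocks, the block means are 1.5, 3.5 and 502.5, so the estimate is 3.5, whereas the plain sample mean is pulled to roughly 169 by the single outlier. This insensitivity to a few extreme observations is what makes the estimator attractive for heavy-tailed return data.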
In this paper we take the first steps in studying a new approach to the synthesis of efficient communication schemes in multi-agent systems trained via reinforcement learning. We combine symbolic methods with machine learning, in what is referred to as a neuro-symbolic system. The agents are not restricted to using only the initial primitives: reinforcement learning is interleaved with steps that extend the current language with novel higher-level concepts, allowing generalisation and more informative communication via shorter messages. We demonstrate that this approach allows agents to converge more quickly on a small collaborative construction task.
Recommendation Systems (RSs) are ubiquitous in modern society and are one of the largest points of interaction between humans and AI. Modern RSs are often implemented using deep learning models, which are infamously difficult to interpret. This problem is particularly exacerbated in the context of recommendation scenarios, as it erodes the user's trust in the RS. In contrast, the newly introduced Tsetlin Machines (TMs) possess some valuable properties due to their inherent interpretability. TMs are still fairly young as a technology. As no RS has been developed for TMs before, it has become necessary to perform some preliminary research regarding the practicality of such a system. In this paper, we develop the first RS based on TMs to evaluate its practicality in this application domain. This paper compares the viability of TMs with other machine learning models prevalent in the field of RS. We train and investigate the performance of the TM compared with a vanilla feed-forward deep learning model. These comparisons are based on model performance, interpretability/explainability, and scalability. Further, we provide some benchmark performance comparisons to similar machine learning solutions relevant to RSs.
Riemannian geometry provides powerful tools to explore the latent space of generative models while preserving the inherent structure of the data manifold. Lengths, energies and volume measures can be derived from a pullback metric, defined through the immersion that maps the latent space to the data space. However, most generative models are stochastic, and so is the pullback metric. Manipulating stochastic objects is strenuous in practice. In order to perform operations such as interpolating or measuring the distance between data points, we need a deterministic approximation of the pullback metric. In this work, we define a new metric as the expected length derived from the stochastic pullback metric. We show that this metric is Finslerian, and we compare it with the expected pullback metric. In high dimensions, we show that the metrics converge to each other at a rate of $\mathcal{O}\left(\frac{1}{D}\right)$.
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g. a robotic manipulator in a simulated kitchen. Instead of imitating low-level actions like joint velocities, our approach imitates the sequence of demonstrated semantic skills like "opening the microwave" or "turning on the stove". This allows us to transfer demonstrations across environments (e.g. real-world to simulated kitchen) and agent embodiments (e.g. bimanual human demonstration to robotic arm). We evaluate on three challenging cross-domain learning problems and match the performance of demonstration-accelerated RL approaches that require in-domain demonstrations. In a simulated kitchen environment, our approach learns long-horizon robot manipulation tasks, using less than 3 minutes of human video demonstrations from a real-world kitchen. This enables scaling robot learning via the reuse of demonstrations, e.g. collected as human videos, for learning in any number of target domains.