Smart Paper Notes

Learning robust marking policies for adaptive mesh refinement

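Andrew Gillette, Brendan Keith, Socratis Petrides

Category: Machine Learning

2022-07-13

In this work, we revisit the marking decisions made in the standard adaptive finite element method (AFEM). Experience shows that a naïve marking policy leads to inefficient use of computational resources for adaptive mesh refinement (AMR). Consequently, using AFEM in practice often involves ad hoc or time-consuming offline parameter tuning to set appropriate parameters for the marking subroutine. To address these practical concerns, we recast AMR as a Markov decision process in which refinement parameters can be selected on the fly at run time, without the need for pre-tuning by expert users. In this new paradigm, the refinement parameters are also chosen adaptively via a marking policy that can be optimized using methods from reinforcement learning. We use the Poisson equation to demonstrate our techniques on $h$- and $hp$-refinement benchmark problems, and our experiments suggest that superior marking policies remain undiscovered for many classical AFEM applications. Furthermore, an unexpected observation from this work is that marking policies trained on one family of PDEs are sometimes robust enough to perform well on problems outside the training family. For illustration, we show that a simple $hp$-refinement policy trained on 2D domains with only a single re-entrant corner can be deployed on more complicated 2D domains, and even 3D domains, without significant performance loss. For reproduction and broader adoption, we accompany this work with an open-source implementation of our methods.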
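
The marking step that the abstract above revisits can be illustrated with a minimal sketch. It assumes the well-known maximum marking rule (mark every element whose error indicator is at least `theta` times the largest indicator); the abstract does not state which rule, observation space, or policy class the authors use, so `mark_elements` and `fixed_policy` are hypothetical names chosen for illustration only.

```python
import numpy as np


def mark_elements(eta, theta):
    """Maximum marking rule (assumed here, not taken from the paper):
    return indices of elements with eta_K >= theta * max(eta)."""
    eta = np.asarray(eta, dtype=float)
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return np.flatnonzero(eta >= theta * eta.max())


def fixed_policy(observation):
    """Hypothetical stand-in for a trained marking policy.
    A learned policy would map the observation to theta at every
    refinement step; this one ignores it and returns a constant."""
    return 0.5


# One refinement step viewed as one step of a decision process:
# observe the mesh state, let the policy choose theta, then mark.
eta = np.array([0.9, 0.1, 0.5, 0.05])        # per-element error indicators
observation = {"num_elements": eta.size,
               "total_error": float(np.sqrt(np.sum(eta ** 2)))}
theta = fixed_policy(observation)
marked = mark_elements(eta, theta)
print(marked)  # [0 2]
```

With a fixed `theta`, this reduces to the classical hand-tuned setting; replacing `fixed_policy` with a function that depends on the observation is what makes the parameter choice adaptive.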

Translated by Google Translate

Deep Reinforcement Learning for Adaptive Mesh Refinement

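Corbin Foucart, Aaron Charous, Pierre F. J. Lermusiaux

Category: Machine Learning

2022-09-25

Finite element discretizations of problems in computational physics often rely on adaptive mesh refinement (AMR) to preferentially resolve regions containing important features during simulation. However, these spatial refinement strategies are often heuristic and rely on domain-specific knowledge or trial and error. We treat the process of adaptive mesh refinement as a local, sequential decision-making problem under incomplete information, formulating AMR as a partially observable Markov decision process. Using a deep reinforcement learning approach, we train policy networks for AMR strategy directly from numerical simulation. The training process does not require an exact solution or a high-fidelity ground truth for the partial differential equation at hand, nor does it require a pre-computed training dataset. The local nature of our reinforcement learning formulation allows the policy network to be trained inexpensively on much smaller problems than those on which it is deployed. The methodology is not specific to any particular partial differential equation, problem dimension, or numerical discretization, and can flexibly incorporate diverse problem physics. To that end, we apply the approach to a variety of partial differential equations, using a variety of high-order discontinuous Galerkin and hybridizable discontinuous Galerkin finite element discretizations. We show that the resulting deep reinforcement learning policies are competitive with common AMR heuristics, generalize across problem classes, and strike a favorable balance between accuracy and cost such that they often lead to a higher accuracy per problem degree of freedom.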

E2N: Error Estimation Networks for Goal-Oriented Mesh Adaptation

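Joseph G. Wallwork, Jingyi Lu, Mingrui Zhang, Matthew D. Piggott

Category: Machine Learning

2022-07-22

Given a partial differential equation (PDE), goal-oriented error estimation allows us to understand how errors in a diagnostic quantity of interest (QoI), or goal, occur and accumulate in a numerical approximation, for example one obtained with the finite element method. By decomposing the error estimates into contributions from individual elements, it is possible to formulate adaptation methods that modify the mesh with the objective of minimising the resulting QoI error. However, the standard error estimate formulation involves the true adjoint solution, which is unknown in practice. As such, it is common practice to approximate it with an "enriched" approximation (e.g. in a higher-order space or on a refined mesh). Doing so generally results in a significant increase in computational cost, which can be a bottleneck compromising the competitiveness of (goal-oriented) adaptive simulations. The central idea of this paper is to develop a "data-driven" goal-oriented mesh adaptation approach through the selective replacement of the expensive error estimation step with an appropriately configured and trained neural network. In doing so, the error estimator may be obtained without even constructing the enriched spaces. An element-by-element construction is employed here, whereby local values of various parameters related to the mesh geometry and underlying problem physics are taken as inputs, and the corresponding contribution to the error estimator is taken as output. We demonstrate that this approach is able to obtain the same accuracy at a reduced computational cost for adaptive mesh test cases related to flow around tidal turbines, which interact via their downstream wakes, and where the overall power output of the farm is taken as the QoI. Moreover, we demonstrate that the element-by-element approach implies reasonably low training costs.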

Multi-Agent Reinforcement Learning for Adaptive Mesh Refinement

Jiachen Yang, Ketan Mittal, Tarik Dzanic, Socratis Petrides, Brendan Keith, Brenden Petersen, Daniel Faissol, Robert Anderson

Category: Machine Learning | Artificial Intelligence

2022-11-02

Adaptive mesh refinement (AMR) is necessary for efficient finite element simulations of complex physical phenomena, as it allocates limited computational budget based on the need for higher or lower resolution, which varies over space and time. We present a novel formulation of AMR as a fully-cooperative Markov game, in which each element is an independent agent who makes refinement and de-refinement choices based on local information. We design a novel deep multi-agent reinforcement learning (MARL) algorithm called Value Decomposition Graph Network (VDGN), which solves the two core challenges that AMR poses for MARL: posthumous credit assignment due to agent creation and deletion, and unstructured observations due to the diversity of mesh geometries. For the first time, we show that MARL enables anticipatory refinement of regions that will encounter complex features at future times, thereby unlocking entirely new regions of the error-cost objective landscape that are inaccessible by traditional methods based on local error estimators. Comprehensive experiments show that VDGN policies significantly outperform error threshold-based policies in global error and cost metrics. We show that learned policies generalize to test problems with physical features, mesh geometries, and longer simulation times that were not seen in training. We also extend VDGN with multi-objective optimization capabilities to find the Pareto front of the tradeoff between cost and error.

Recent Advances in Reinforcement Learning in Finance

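Ben Hambly, Renyuan Xu, Huining Yang

Category: Machine Learning

2021-12-08

The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques for data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems, which rely heavily on model assumptions, new developments in reinforcement learning (RL) are able to make full use of large amounts of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which are the setting for many of the commonly used RL approaches. Various algorithms are then introduced, with a focus on value-based and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms to a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.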

Physics-based Deep Learning

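Nils Thuerey, Philipp Holl, Maximilian Mueller, Patrick Schnell, Felix Trost, Kiwon Um

Category: Machine Learning

2021-09-11

This digital book contains a practical and comprehensive introduction to everything related to deep learning in the context of physical simulations. As much as possible, all topics come with hands-on code examples in the form of Jupyter notebooks to quickly get started. Beyond standard supervised learning from data, we'll look at physical loss constraints, more tightly coupled learning algorithms with differentiable simulations, as well as reinforcement learning and uncertainty modeling. We live in exciting times: these methods have a huge potential to fundamentally change what computer simulations can achieve.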

Investigation of reinforcement learning for shape optimization of profile extrusion dies

Clemens Fricke, Daniel Wolff, Marco Kemmerling, Stefanie Elgeti

Category: Machine Learning

2022-12-23

Profile extrusion is a continuous production process for manufacturing plastic profiles from molten polymer. Especially interesting is the design of the die, through which the melt is pressed to attain the desired shape. However, due to an inhomogeneous velocity distribution at the die exit or residual stresses inside the extrudate, the final shape of the manufactured part often deviates from the desired one. To avoid these deviations, the shape of the die can be computationally optimized, which has already been investigated in the literature using classical optimization approaches. A new approach in the field of shape optimization is the utilization of Reinforcement Learning (RL) as a learning-based optimization algorithm. RL is based on trial-and-error interactions of an agent with an environment. For each action, the agent is rewarded and informed about the subsequent state of the environment. While not necessarily superior to classical, e.g., gradient-based or evolutionary, optimization algorithms for one single problem, RL techniques are expected to perform especially well when similar optimization tasks are repeated since the agent learns a more general strategy for generating optimal shapes instead of concentrating on just one single problem. In this work, we investigate this approach by applying it to two 2D test cases. The flow-channel geometry can be modified by the RL agent using so-called Free-Form Deformation, a method where the computational mesh is embedded into a transformation spline, which is then manipulated based on the control-point positions. In particular, we investigate the impact of utilizing different agents on the training progress and the potential of wall time saving by utilizing multiple environments during training.

Neural Approaches to Co-Optimization in Robotics

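Charles Schaff

Category: Robotics

2022-09-01

Robots and intelligent systems that sense or interact with the world are increasingly being used to automate a wide array of tasks. The ability of these systems to complete these tasks depends on the mechanical and electrical parts that constitute the robot's physical body and its sensors, the perception algorithms that sense the environment, and the planning and control algorithms that produce meaningful actions. As a result, it is often necessary to consider the interactions between these components when designing an embodied system. This thesis explores work on the task-driven co-optimization of robotic systems in an end-to-end manner, simultaneously optimizing the physical components of the system together with inference or control algorithms directly for task performance. We start by considering the problem of optimizing a beacon-based localization system directly for localization accuracy. Designing such a system involves placing beacons throughout the environment and inferring location from sensor readings. In our work, we develop a deep learning approach to optimize both beacon placement and location inference directly for localization accuracy. We then turn our attention to the related problem of task-driven optimization of robots and their controllers. In this work, we start by proposing a data-efficient algorithm based on multi-task reinforcement learning. Our approach efficiently optimizes both physical design and control parameters directly for task performance by leveraging a design-conditioned controller capable of generalizing over the space of physical designs. We then follow this up with an extension that allows optimization over discrete morphological parameters, such as the number and configuration of limbs. Finally, we conclude by exploring the fabrication and deployment of optimized soft robots.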

Variational Physics Informed Neural Networks: the role of quadratures and test functions

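Stefano Berrone, Claudio Canuto, Moreno Pintore

Category: Machine Learning

2021-09-05

In this work, we analyze how quadrature rules of different precisions and piecewise polynomial test functions of different degrees affect the convergence rate of Variational Physics Informed Neural Networks (VPINN) when solving elliptic boundary-value problems. Using a Petrov-Galerkin framework relying on an inf-sup condition, we derive an a priori error estimate between the exact solution and a suitable high-order piecewise interpolant of a computed neural network. Numerical experiments confirm the theoretical predictions and highlight the importance of the inf-sup condition. Our results suggest, somewhat counterintuitively, that for smooth solutions the best strategy to achieve a high decay rate of the error consists in choosing test functions of the lowest polynomial degree, while using quadrature formulas of suitably high precision.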

Global Optimality Guarantees For Policy Gradient Methods

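Jalaj Bhandari, Daniel Russo

Category: Machine Learning | Machine Learning (Statistics)

2019-06-05

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by standard dynamic programming techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point. This work identifies structural properties, shared by several classic control problems, that ensure the policy gradient objective function has no suboptimal stationary points despite being non-convex. When these conditions are strengthened, this objective satisfies a Polyak-Łojasiewicz (gradient dominance) condition that yields convergence rates. We also provide bounds on the optimality gap of any stationary point when some of these conditions are relaxed.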
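
For orientation, the gradient dominance condition named in the abstract above is commonly written as follows. The notation is an assumption made here, not taken from the paper: $f$ denotes a loss to be minimized (for instance the negated policy gradient objective), $f^\star$ its optimal value, and $\mu, L > 0$ are constants; the paper's exact statement and constants may differ.

```latex
% Polyak-Lojasiewicz (gradient dominance) condition, minimization form:
\frac{1}{2}\,\lVert \nabla f(\theta) \rVert^{2} \;\ge\; \mu\,\bigl(f(\theta) - f^\star\bigr)
\qquad \text{for all } \theta .

% Standard consequence for an L-smooth f under exact gradient descent
% with step size 1/L, i.e. \theta_{k+1} = \theta_k - \tfrac{1}{L}\nabla f(\theta_k):
f(\theta_k) - f^\star \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\,\bigl(f(\theta_0) - f^\star\bigr).
```

The first inequality rules out suboptimal stationary points, since a vanishing gradient forces $f(\theta) = f^\star$; the second is the kind of convergence rate such a condition yields, stated here for the deterministic case only.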

Learning Mean Field Games: A Survey

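Mathieu Laurière, Sarah Perrin, Matthieu Geist, Olivier Pietquin

Category: Machine Learning | Artificial Intelligence

2022-05-25

Non-cooperative and cooperative games with a very large number of players have many applications but generally remain intractable when the number of players increases. Introduced by Lasry and Lions, and by Huang, Caines and Malhamé, Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity. Traditional methods for solving these games generally rely on solving partial or stochastic differential equations with full knowledge of the model. Recently, Reinforcement Learning (RL) has appeared promising for solving complex problems. By combining MFGs and RL, we hope to solve games at a very large scale, both in terms of population size and environment complexity. In this survey, we review the recent literature on learning Nash equilibria in MFGs. We first identify the most common settings (static, stationary, and evolutive). We then present a general framework for classical iterative methods (based on best-response computation or policy evaluation) to solve MFGs in an exact way. Building on these algorithms and the connection with Markov Decision Processes, we explain how RL can be used to learn MFG solutions in a model-free way. Last, we present numerical illustrations on a benchmark problem, and conclude with some perspectives.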

Generalised Policy Improvement with Geometric Policy Composition

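Shantanu Thakoor, Mark Rowland, Diana Borsa, Will Dabney, Rémi Munos, André Barreto

Category: Machine Learning (Statistics) | Machine Learning

2022-06-17

We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the planning approach typical of model-based RL. The new method builds on the concept of a geometric horizon model (GHM, also known as a gamma-model), which models the discounted state-visitation distribution of a given policy. We show that we can evaluate any non-Markov policy that switches between a set of base Markov policies with fixed probability by a careful composition of the base policy GHMs, without any additional learning. We can then apply generalised policy improvement (GPI) to collections of such non-Markov policies to obtain a new Markov policy that will in general outperform its precursors. We provide a thorough theoretical analysis of this approach, develop applications to transfer and standard RL, and empirically demonstrate its effectiveness over standard GPI on a challenging deep RL continuous control task. We also provide an analysis of GHM training methods, proving a novel convergence result regarding previously proposed methods and showing how to train these models stably in deep RL settings.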

Adapting the Exploration Rate for Value-of-Information-Based Reinforcement Learning

Isaac J. Sledge, Jose C. Principe

Category: Machine Learning | Artificial Intelligence

2022-12-20

In this paper, we consider the problem of adjusting the exploration rate when using value-of-information-based exploration. We do this by converting the value-of-information optimization into a problem of finding equilibria of a flow for a changing exploration rate. We then develop an efficient path-following scheme for converging to these equilibria and hence uncovering optimal action-selection policies. Under this scheme, the exploration rate is automatically adapted according to the agent's experiences. Global convergence is theoretically assured. We first evaluate our exploration-rate adaptation on the Nintendo GameBoy games Centipede and Millipede. We demonstrate aspects of the search process. We show that our approach yields better policies in fewer episodes than conventional search strategies relying on heuristic, annealing-based exploration-rate adjustments. We then illustrate that these trends hold for deep, value-of-information-based agents that learn to play ten simple games and over forty more complicated games for the Nintendo GameBoy system. Performance either near or well above the level of human play is observed.

Deep Reinforcement Learning for Turbulence Modeling in Large Eddy Simulations

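Marius Kurz, Philipp Offenhäuser, Andrea Beck

Category: Artificial Intelligence | Machine Learning

2022-06-21

Over the last few years, supervised learning (SL) has established itself as the state of the art for data-driven turbulence modeling. In the SL paradigm, models are trained on a dataset that is typically computed a priori from a high-fidelity solution by applying the respective filter function, which separates the resolved and the unresolved flow scales. For implicitly filtered large eddy simulation (LES), this approach is infeasible, since here the employed discretization itself acts as an implicit filter function. As a consequence, the exact filter form is generally not known, and thus the corresponding closure terms cannot be computed even if the full solution is available. The reinforcement learning (RL) paradigm can be used to avoid this inconsistency by training not on a previously obtained training dataset, but instead by interacting directly with the dynamical LES environment itself. This allows the potentially complex implicit LES filter to be incorporated into the training process by design. In this work, we apply a reinforcement learning framework to find an optimal eddy viscosity for implicitly filtered large eddy simulations of forced homogeneous isotropic turbulence. For this, we formulate the task of turbulence modeling as an RL task with a policy network based on convolutional neural networks that adapts the eddy viscosity in LES dynamically in space and time, based on the local flow state only. We demonstrate that the trained models can provide long-term stable simulations and that they outperform established analytical models in terms of accuracy. In addition, the models generalize well to other resolutions and discretizations. We thus demonstrate that RL can provide a framework for consistent, accurate, and stable turbulence modeling, especially for implicitly filtered LES.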

Risk-Adaptive Approaches to Learning and Decision Making: A Survey

Johannes O. Royset

Category: Machine Learning | Machine Learning (Statistics)

2022-12-01

Uncertainty is prevalent in engineering design, statistical learning, and decision making broadly. Due to inherent risk-averseness and ambiguity about assumptions, it is common to address uncertainty by formulating and solving conservative optimization models expressed using measures of risk and related concepts. We survey the rapid development of risk measures over the last quarter century. From their beginning in financial engineering, we recount their spread to nearly all areas of engineering and applied mathematics. Solidly rooted in convex analysis, risk measures furnish a general framework for handling uncertainty with significant computational and theoretical advantages. We describe the key facts, list several concrete algorithms, and provide an extensive list of references for further reading. The survey recalls connections with utility theory and distributionally robust optimization, points to emerging application areas such as fair machine learning, and defines measures of reliability.

Exact imposition of boundary conditions with distance functions in physics-informed deep neural networks

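N. Sukumar, Ankit Srivastava

Category: Neural and Evolutionary Computing

2021-04-17

In this paper, we introduce a new approach based on distance fields to exactly impose boundary conditions in physics-informed deep neural networks. The challenges in satisfying Dirichlet boundary conditions in meshfree and particle methods are well known. This issue is also pertinent in the development of physics-informed neural networks (PINN) for the solution of partial differential equations. We introduce geometry-aware trial functions in artificial neural networks to improve the training in deep learning for partial differential equations. To this end, we use concepts from constructive solid geometry (R-functions) and generalized barycentric coordinates (mean value potential fields) to construct $\phi$, an approximate distance function to the boundary of a domain. To exactly impose homogeneous Dirichlet boundary conditions, the trial function is taken as $\phi$ multiplied by the PINN approximation, and its generalization via transfinite interpolation is used to satisfy a priori inhomogeneous Dirichlet (essential), Neumann (natural), and Robin boundary conditions on complex geometries. In doing so, we eliminate the modeling error associated with the satisfaction of boundary conditions in a collocation method and ensure that kinematic admissibility is met pointwise in a Ritz method. We present numerical solutions for linear and nonlinear boundary-value problems over domains with affine and curved boundaries. Benchmark problems in 1D for linear elasticity, advection-diffusion, and beam bending, and in 2D for the Poisson equation, the biharmonic equation, and the nonlinear Eikonal equation, are considered. The approach extends to higher dimensions, and we showcase its use by solving a Poisson problem with homogeneous Dirichlet boundary conditions over the 4D hypercube. This study provides a pathway for meshfree analysis to be conducted on the exact geometry without domain discretization.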
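
The core idea in the abstract above, multiplying the network output by a function that vanishes on the boundary, can be sketched in one dimension. This is an illustration under stated assumptions: the domain is the interval [0, 1], `phi` is a simple hand-picked function that is zero at both endpoints and behaves like the distance near them, and `network` is a hypothetical stand-in for a trained PINN. The paper's R-function and mean value constructions are not reproduced here.

```python
import numpy as np


def phi(x):
    """Assumed approximate distance function for the interval [0, 1]:
    zero at x = 0 and x = 1, positive in between."""
    return x * (1.0 - x)


def network(x):
    """Hypothetical stand-in for a neural network output; any function
    of x would do, since the boundary values come from phi alone."""
    return np.sin(3.0 * x) + 2.0


def trial(x):
    """Trial function u(x) = phi(x) * network(x), which satisfies the
    homogeneous Dirichlet conditions u(0) = u(1) = 0 by construction."""
    return phi(x) * network(x)


x = np.array([0.0, 0.5, 1.0])
u = trial(x)
print(u[0], u[2])  # 0.0 0.0
```

Because the boundary condition holds for every choice of network parameters, no boundary penalty term is needed in the training loss for this case.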

Scientific Machine Learning through Physics-Informed Neural Networks: Where we are and What's next

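Salvatore Cuomo, Vincenzo Schiano di Cola, Fabio Giampaolo, Gianluigi Rozza, Maziar Raissi, Francesco Piccialli

Category: Machine Learning | Artificial Intelligence

2022-01-14

Physics-Informed Neural Networks (PINN) are neural networks (NNs) that encode model equations, such as partial differential equations (PDEs), as a component of the neural network itself. PINNs are nowadays used to solve PDEs, fractional equations, integro-differential equations, and stochastic PDEs. This novel methodology has arisen as a multi-task learning framework in which a NN must fit observed data while reducing a PDE residual. This article provides a comprehensive review of the literature on PINNs; the primary goal of the study is to characterize these networks and their related advantages and disadvantages. The review also attempts to incorporate publications on a broader range of collocation-based physics-informed neural networks, starting from the vanilla PINN and including many other variants, such as physics-constrained neural networks (PCNN), variational hp-VPINN, and conservative PINN (CPINN). The study indicates that most research has focused on customizing the PINN through different activation functions, gradient optimization techniques, neural network structures, and loss function structures. Despite the wide range of applications for which PINNs have been used, and their demonstrated ability to be more feasible in some contexts than classical numerical techniques such as the finite element method (FEM), advancements are still possible, most notably on theoretical issues that remain unresolved.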
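
The multi-task view described in the abstract above, fitting observed data while reducing a PDE residual, is often written as a weighted sum of two mean-squared terms. The notation below is an assumption introduced for illustration: $u_\theta$ is the network, $(x_i, u_i)$ are $N_d$ observed data points, $x_j$ are $N_r$ collocation points, $\mathcal{N}[u] = f$ is the PDE, and $\lambda > 0$ is a weighting factor. Individual PINN variants differ in the terms and weights they use.

```latex
\mathcal{L}(\theta)
  \;=\;
  \underbrace{\frac{1}{N_d}\sum_{i=1}^{N_d}
      \bigl| u_\theta(x_i) - u_i \bigr|^{2}}_{\text{data misfit}}
  \;+\;
  \lambda\,
  \underbrace{\frac{1}{N_r}\sum_{j=1}^{N_r}
      \bigl| \mathcal{N}[u_\theta](x_j) - f(x_j) \bigr|^{2}}_{\text{PDE residual}}
```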

A Provably Efficient Sample Collection Strategy for Reinforcement Learning

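Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Category: Machine Learning | Machine Learning (Statistics)

2020-07-13

One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior. Whether we optimize for regret, sample complexity, state-space coverage, or model estimation, we need to strike a different exploration-exploitation trade-off. In this paper, we propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) an "objective-specific" algorithm that (adaptively) prescribes how many samples to collect at which states, as if it had access to a generative model (i.e., a simulator of the environment); 2) an "objective-agnostic" sample collection exploration strategy responsible for generating the prescribed samples as fast as possible. Building on recent methods for exploration in the stochastic shortest path problem, we first provide an algorithm that, given as input the number of samples $b(s,a)$ needed in each state-action pair, requires $\tilde{O}(BD + D^{3/2} S^2 A)$ time steps to collect the $B = \sum_{s,a} b(s,a)$ desired samples, in an MDP with $S$ states, $A$ actions, and diameter $D$. We then show how this general-purpose exploration algorithm can be paired with "objective-specific" strategies that prescribe the sample requirements to tackle a variety of settings (e.g., model estimation, sparse reward discovery, goal-free cost-free exploration in communicating MDPs), for which we obtain improved or novel sample complexity guarantees.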

Deep Learning Methods for Partial Differential Equations and Related Parameter Identification Problems

Derick Nganyu Tanyu, Jianfeng Ning, Tom Freudenberg, Nick Heilenkötter, Andreas Rademacher, Uwe Iben, Peter Maass

Category: Machine Learning

2022-12-06

Recent years have witnessed a growth in mathematics for deep learning--which seeks a deeper understanding of the concepts of deep learning with mathematics, and explores how to make it more robust--and deep learning for mathematics, where deep learning algorithms are used to solve problems in mathematics. The latter has popularised the field of scientific machine learning where deep learning is applied to problems in scientific computing. Specifically, more and more neural network architectures have been developed to solve specific classes of partial differential equations (PDEs). Such methods exploit properties that are inherent to PDEs and thus solve the PDEs better than classical feed-forward neural networks, recurrent neural networks, and convolutional neural networks. This has had a great impact in the area of mathematical modeling where parametric PDEs are widely used to model most natural and physical processes arising in science and engineering. In this work, we review such methods and extend them for parametric studies as well as for solving the related inverse problems. We equally proceed to show their relevance in some industrial applications.

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel

Category: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2021-06-15

Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model. Due to its scaling to continuous spaces, we focus on policy search where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. In tabular Markov Decision Problems (MDPs), under persistent exploration and suitable parameterization, global optimality may be obtained. By contrast, in continuous space, the non-convexity poses a pathological challenge as evidenced by existing convergence results being mostly limited to stationarity or arbitrary local extrema. To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by distributions of heavier tails defined by tail-index parameter alpha, which increases the likelihood of jumping in state space. Doing so invalidates smoothness conditions of the score function common to PG. Thus, we establish how the convergence rate to stationarity depends on the policy's tail index alpha, a Holder continuity parameter, integrability conditions, and an exploration tolerance parameter introduced here for the first time. Further, we characterize the dependence of the set of local maxima on the tail index through an exit and transition time analysis of a suitably defined Markov chain, identifying that policies associated with Levy Processes of a heavier tail converge to wider peaks. This phenomenon yields improved stability to perturbations in supervised learning, which we corroborate also manifests in improved performance of policy search, especially when myopic and farsighted incentives are misaligned.
