Smart Paper Notes

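We consider the problem of estimating the skeleton of a large causal polytree from a relatively small i.i.d. sample. This is motivated by the problem of determining causal structure when the number of variables is very large compared to the sample size, as in gene regulatory networks. We give an algorithm that recovers the tree with high accuracy in such settings. The algorithm works under essentially no distributional or modeling assumptions other than some mild non-degeneracy conditions.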

Translated by Google Translate

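In this paper, we study convex optimization using a very general formulation called BSGD (Block Stochastic Gradient Descent). At each iteration, some but not necessarily all components of the argument are updated. The direction of the update can be one of two possibilities: (i) a noise-corrupted measurement of the gradient, or (ii) an approximate gradient computed by a first-order approximation from function values that may themselves be corrupted by noise. This formulation embraces most of the currently used stochastic gradient methods. We establish conditions for BSGD to converge to the global minimum, based on stochastic approximation theory. We then verify the predicted convergence through numerical experiments. Our results show that when approximate gradients are used, BSGD converges while momentum-based methods can diverge. However, not just our BSGD but also standard (full-update) gradient descent and various momentum-based methods all converge, even with noisy gradients.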
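The block-update idea described above can be illustrated with a minimal sketch. It covers only option (i), a noise-corrupted gradient, and is not the authors' implementation: the function name `bsgd`, the Bernoulli rule for choosing which components to update, the step-size schedule, and the quadratic test objective are all assumptions made for illustration.

```python
import random


def bsgd(grad, theta0, steps=5000, update_prob=0.5, noise_std=0.1, seed=0):
    """Block stochastic gradient descent sketch (all names are hypothetical).

    At each iteration every component is updated independently with
    probability `update_prob`, using a gradient corrupted by Gaussian noise.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for t in range(steps):
        # Assumed step sizes: sum(alpha) diverges, sum(alpha^2) converges.
        alpha = 10.0 / (t + 10)
        g = grad(theta)
        for i in range(len(theta)):
            if rng.random() < update_prob:  # only a random block is updated
                noisy_g = g[i] + rng.gauss(0.0, noise_std)
                theta[i] -= alpha * noisy_g
    return theta


# Assumed convex test objective: J(theta) = 0.5 * sum_i (theta_i - target_i)^2
target = [1.0, -2.0, 3.0]


def quadratic_grad(theta):
    return [th - tg for th, tg in zip(theta, target)]


theta_hat = bsgd(quadratic_grad, [0.0, 0.0, 0.0])
print(theta_hat)  # expected to land close to `target`
```

Setting `update_prob=1.0` recovers full-update noisy gradient descent, which makes it easy to compare the two regimes on the same objective.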

In this paper, we study the almost sure boundedness and the convergence of the stochastic approximation (SA) algorithm. At present, most available convergence proofs are based on the ODE method, and the almost sure boundedness of the iterations is an assumption and not a conclusion. In Borkar-Meyn (2000), it is shown that if the ODE has only one globally attractive equilibrium, then under additional assumptions, the iterations are bounded almost surely, and the SA algorithm converges to the desired solution. Our objective in the present paper is to provide an alternate proof of the above, based on martingale methods, which are simpler and less technical than those based on the ODE method. As a prelude, we prove a new sufficient condition for the global asymptotic stability of an ODE. Next we prove a "converse" Lyapunov theorem on the existence of a suitable Lyapunov function with a globally bounded Hessian, for a globally exponentially stable system. Both theorems are of independent interest to researchers in stability theory. Then, using these results, we provide sufficient conditions for the almost sure boundedness and the convergence of the SA algorithm. We show through examples that our theory covers some situations that are not covered by currently known results, specifically Borkar-Meyn (2000).
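For orientation, the SA recursion, its associated ODE, and the classical Robbins-Monro step-size conditions are usually written as below. The symbols $\theta_t$ (iterate), $\alpha_t$ (step size), $f$ (function whose zero is sought), and $\xi_{t+1}$ (measurement noise) follow common textbook notation and are an assumption here, since the abstract does not fix any notation.

```latex
\begin{align*}
\theta_{t+1} &= \theta_t + \alpha_t \bigl[ f(\theta_t) + \xi_{t+1} \bigr], \\
\dot{\theta} &= f(\theta), \\
\sum_{t=0}^{\infty} \alpha_t &= \infty, \qquad \sum_{t=0}^{\infty} \alpha_t^2 < \infty .
\end{align*}
```

In the ODE method, convergence of the iterates is inferred from the stability of the second line, typically after assuming that the sequence $\{\theta_t\}$ stays bounded almost surely.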

Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

Rajeeva L. Karandikar, M. Vidyasagar

Categories: Machine Learning (Statistics) | Artificial Intelligence | Machine Learning

2021-09-08

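The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a zero of a vector-valued function when only noisy measurements of the function are available. In the literature to date, one can distinguish between "synchronous" updating, whereby every component of the current guess is updated at each time, and "asynchronous" updating, whereby only one component is updated. In principle, it is also possible to update, at each time instant, some but not all components of $\theta_t$, which might be termed "batch asynchronous stochastic approximation" (BASA). One can also distinguish between using a "local" clock and a "global" clock. In this paper, we propose a unified formulation of BASA algorithms and develop a general methodology for proving that such algorithms converge, irrespective of whether global or local clocks are used. These convergence proofs make use of weaker hypotheses than existing results. For example, existing convergence proofs for the case of a local clock require the measurement noise to be an i.i.d. sequence. Here, it is assumed only that the measurement errors form a martingale difference sequence. Also, all results to date assume that the stochastic step sizes satisfy a probabilistic analog of the Robbins-Monro conditions. We replace this by a purely deterministic condition on the irreducibility of the underlying Markov processes. As specific applications to reinforcement learning, we introduce "batch" versions of the temporal difference algorithm $TD(0)$ for value iteration and of the $Q$-learning algorithm for finding the optimal action-value function, and we also permit the use of local clocks instead of a global clock. In all cases, we establish the convergence of these algorithms under milder conditions than in the existing literature.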
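The distinction between batch updating and local versus global clocks can be made concrete with a small sketch. It is an illustration under simplifying assumptions, not the paper's algorithm: the function name `basa`, the i.i.d. Bernoulli selection of components (the abstract refers to underlying Markov processes), the i.i.d. Gaussian noise (a special case of a martingale difference sequence), the step-size schedule, and the test function $f(\theta) = b - \theta$ are all hypothetical choices.

```python
import random


def basa(f, theta0, steps=20000, update_prob=0.3, noise_std=0.1,
         local_clock=True, seed=0):
    """Batch asynchronous stochastic approximation sketch (hypothetical names).

    Seeks a zero of f using noisy measurements, updating a random subset of
    components at each time instant.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    counts = [0] * len(theta)  # local clocks: updates seen by each component
    for t in range(steps):
        values = f(theta)
        for i in range(len(theta)):
            if rng.random() >= update_prob:
                continue  # component i is not in the current batch
            counts[i] += 1
            # Local clock: step size driven by this component's own counter.
            # Global clock: step size driven by the global time index.
            k = counts[i] if local_clock else t + 1
            alpha = 10.0 / (k + 9)  # assumed schedule; equals 1.0 when k == 1
            noisy_value = values[i] + rng.gauss(0.0, noise_std)
            theta[i] += alpha * noisy_value
    return theta


# Assumed test problem: f(theta) = b - theta has its unique zero at theta = b.
b = [0.5, -1.0, 2.0]


def f(theta):
    return [bi - th for bi, th in zip(b, theta)]


print(basa(f, [0.0, 0.0, 0.0], local_clock=True))
print(basa(f, [0.0, 0.0, 0.0], local_clock=False))
```

Both calls are expected to return vectors close to `b`; the only difference between them is which counter drives the step size.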
