我们研究了在线马尔可夫决策过程(MDP),具有对抗性变化的损失功能和已知过渡。我们选择动态遗憾作为绩效度量,定义为学习者和任何可行的变化策略序列之间的绩效差异。这项措施严格比标准的静态遗憾要强得多,该标准遗憾的是,基准通过固定的政策将学习者的绩效表现为学习者的表现。我们考虑了三种在线MDP的基础模型,包括无情节循环随机路径(SSP),情节SSP和Infinite-Horizo​​n MDP。对于这三个模型,我们提出了新颖的在线集合算法并分别建立了动态​​遗憾保证,在这种情况下,情节性(无环)SSP的结果在时间范围和某些非平稳性度量方面是最佳的最低限度。此外,当学习者遇到的在线环境是可以预测的时,我们设计了改进的算法并为情节(无环)SSP实现更好的动态遗憾界限;此外,我们证明了无限 - 摩恩MDP的不可能结果。
translated by 谷歌翻译
当培训数据共享与即将到来的测试样本相同的分布时,标准监督学习范式有效地工作。但是,在现实世界中,通常会违反此假设,尤其是在以在线方式出现测试数据时。在本文中,我们制定和调查了在线标签转移(OLAS)的问题:学习者从标记的离线数据训练初始模型,然后将其部署到未标记的在线环境中,而基础标签分布会随着时间的推移而变化,但标签 - 条件密度没有。非平稳性和缺乏监督使问题具有挑战性。为了解决难度,我们构建了一个新的无偏风险估计器,该风险估计器利用了未标记的数据,该数据表现出许多良性特性,尽管具有潜在的非跨性别性。在此基础上,我们提出了新颖的在线合奏算法来应对环境的非平稳性。我们的方法享有最佳的动态遗憾,表明该性能与千里眼的千里眼竞争,后者是事后看来的在线环境,然后选择每轮的最佳决定。获得的动态遗憾结合量表与标签分布转移的强度和模式,因此在OLAS问题中表现出适应性。进行广泛的实验以验证有效性和支持我们的理论发现。
translated by 谷歌翻译
传统的机器学习研究通常假设学习过程不变的重要因素。随着当今机器学习的巨大成功,越来越多的实用任务,尤其是那些涉及开放环境场景的那些重要因素可能会发生变化,称为开放环境机器学习(Open ML),在本文中都存在于社区中。显然,对于机器学习从近距离环境转变为开放环境来说,这是一个巨大的挑战。它变得更加具有挑战性,因为在各种大数据任务中,数据通常会随着时间的流逝而累积,例如流,而在收集所有数据之后,很难训练机器学习模型。本文简要介绍了这一研究领域的一些进步,重点介绍了有关新的类别,减少/增量特征,变化的数据分布,各种学习目标的技术,并讨论了一些理论问题。
translated by 谷歌翻译
我们在非静止环境中调查在线凸优化,然后选择\ emph {动态后悔}作为性能测量,定义为在线算法产生的累积损失与任何可行比较器序列之间的差异。让$ t $是$ p_t $ be的路径长度,基本上反映了环境的非平稳性,最先进的动态遗憾是$ \ mathcal {o}(\ sqrt {t( 1 + p_t)})$。虽然这一界限被证明是凸函数最佳的最低限度,但在本文中,我们证明可以进一步提高一些简单的问题实例的保证,特别是当在线功能平滑时。具体而言,我们提出了新的在线算法,可以利用平滑度并替换动态遗憾的$ t $替换依据\ {问题依赖性}数量:损耗函数梯度的变化,比较器序列的累积损失,以及比较器序列的累积损失最低术语的最低限度。这些数量是大多数$ \ mathcal {o}(t)$,良性环境中可能更小。因此,我们的结果适应了问题的内在难度,因为边界比现有结果更严格,以便在最坏的情况下保证相同的速率。值得注意的是,我们的算法只需要\ emph {一个}渐变,这与开发的方法共享相同的渐变查询复杂性,以优化静态遗憾。作为进一步的应用,我们将来自全信息设置的结果扩展到具有两点反馈的强盗凸优化,从而达到此类强盗任务的第一个相关的动态遗憾。
translated by 谷歌翻译
灵活的变送器网络(FTNET)是最近提出的生物合理的神经网络,并在处理时间空间数据时与最先进的模型实现了竞争性能。但是,关于FTNET的理论理解仍然存在开放问题。从近似和局部最小值的角度来看,这项工作调查了一个隐藏层FTNET的理论属性。在温和的假设下,我们表明:i)Ftnet是一个普遍的近似器; ii)FTNET的近似复杂度可以是指数相比于具有前馈/复发架构的实值神经网络的复杂性,并且在最坏情况下具有相同的顺序; III)任何本地FTNET都是全局最小值,这表明本地搜索算法可以收敛到全局最小值。我们的理论结果表明,FTNET可以有效地表达目标功能,并且对局部最小值没有担忧,这补充了FTNET的理论空白,并表现出改善FTNET的可能性。
translated by 谷歌翻译
模仿和学习高效市场的长期记忆是机器学习与金融经济学与顺序数据相互作用的根本问题。尽管存在这个问题的突出,目前的治疗方法要么仍然很大程度上限于启发式技术或依赖于期间地点或高斯假设。在本文中,我们提出了对研究有效市场的非周期性半参数(出现)过程。出现的过程被制定为一些已知过程的无限和函数,采用非周期性频谱估计来确定密钥超参数,从而具有使用长期内存,非实用性建模价格数据的功率和可能性,和非周期性光谱。从理论上,我们进一步表明,没有循环系统和高斯假设的均匀加入,一致性和渐近常态。在实践中,我们应用出现的过程来确定现实世界市场的效率。此外,我们还提供了两种替代方案:研究各种机器学习模型的长期难忘性,并开发潜在的状态空间模型,推断和预测时间序列。数值实验证实了我们提出的方法的优势。
translated by 谷歌翻译
本文调查了非静止线性匪徒的问题,其中未知的回归参数随着时间的推移而发展。现有的研究开发了各种算法并显示他们享受$ \ widetilde {\ mathcal {p_t ^ {1/3})$动态遗憾,其中$ t $是时间范围和$ p_t $是测量演化未知参数的波动的路径长度。在本文中,我们发现一个严肃的技术缺陷使其结果未接地,然后呈现一个FIX,它给出$ \ WidTilde {\ Mathcal {o}}(t ^ {3/4} p_t ^ {1/4} )$动态遗憾而不修改原始算法。此外,我们证明了代替使用复杂的机制,例如滑动窗口或加权罚款,简单的重启策略足以实现相同的遗憾保证。具体而言,我们设计了UCB型算法来平衡利用和探索,并定期重新启动它以处理未知参数的漂移。我们的方法享有$ \ widetilde {\ mathcal {o}}(t ^ {3/4} p_t ^ {1/4})$动态遗憾。请注意,为了实现这一界限,该算法需要Oracle知识路径长度$ P_T $。将强盗带式机制组合通过将我们的算法视为基础学习者,我们可以通过无参数方式实现相同的遗憾。实证研究还验证了我们方法的有效性。
translated by 谷歌翻译
Evolutionary algorithms (EAs) are a kind of nature-inspired general-purpose optimization algorithm, and have shown empirically good performance in solving various real-word optimization problems. During the past two decades, promising results on the running time analysis (one essential theoretical aspect) of EAs have been obtained, while most of them focused on isolated combinatorial optimization problems, which do not reflect the general-purpose nature of EAs. To provide a general theoretical explanation of the behavior of EAs, it is desirable to study their performance on general classes of combinatorial optimization problems. To the best of our knowledge, the only result towards this direction is the provably good approximation guarantees of EAs for the problem class of maximizing monotone submodular functions with matroid constraints. The aim of this work is to contribute to this line of research. Considering that many combinatorial optimization problems involve non-monotone or non-submodular objective functions, we study the general problem classes, maximizing submodular functions with/without a size constraint and maximizing monotone approximately submodular functions with a size constraint. We prove that a simple multi-objective EA called GSEMO-C can generally achieve good approximation guarantees in polynomial expected running time.
translated by 谷歌翻译
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable for ignoring important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译