Feature transformation for AI is an essential task for boosting the effectiveness and interpretability of machine learning (ML). It aims to transform the original data to identify an optimal feature space that enhances the performance of a downstream ML model. Existing studies either combine preprocessing, feature selection, and generation skills to empirically transform data, or automate feature transformation with machine intelligence such as reinforcement learning. However, existing studies suffer from: 1) a high-dimensional, non-discriminative feature space; 2) an inability to represent complex situational states; 3) inefficiency in integrating local and global feature information. To fill this research gap, we formulate the feature transformation task as an iterative, nested process of feature generation and selection, where feature generation generates and adds new features based on the original features, and feature selection removes redundant features to control the size of the feature space. Finally, we present extensive experiments and case studies showing 24.7% improvements in F1 scores compared with SOTAs and robustness on high-dimensional data.
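As a minimal illustration of the nested generate-then-select loop described above, here is a hedged Python sketch; the helper names, the pairwise-product generator, and the random-forest evaluator are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an iterative, nested generation-selection loop;
# the transformation family and evaluator are illustrative choices only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score

def generate_features(X):
    """Add simple pairwise products as candidate new features."""
    n = X.shape[1]
    products = [X[:, i] * X[:, j] for i in range(n) for j in range(i + 1, n)]
    return np.column_stack([X] + products) if products else X

def select_features(X, y, k):
    """Keep the k most discriminative features to bound the space."""
    return SelectKBest(f_classif, k=min(k, X.shape[1])).fit_transform(X, y)

def transform(X, y, iterations=3, max_features=32):
    best_X, best_score = X, -np.inf
    for _ in range(iterations):
        X = select_features(generate_features(X), y, k=max_features)
        score = cross_val_score(RandomForestClassifier(), X, y, cv=3).mean()
        if score > best_score:
            best_X, best_score = X, score
    return best_X
```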
Feature transformation aims to extract a good representation (feature) space by mathematically transforming existing features. It is crucial for tackling the curse of dimensionality, enhancing model generalization, overcoming data sparsity, and expanding the applicability of classical models. Current research focuses on domain-knowledge-based feature engineering or on learning latent representations; however, these methods are not fully automated and cannot produce a traceable and optimal representation space. Can these limitations be addressed simultaneously when reconstructing the feature space for a machine learning task? In this extended study, we present a self-optimizing framework for feature transformation. To achieve better performance, we improve upon our preliminary work by (1) obtaining an advanced state representation so that the reinforcement agents better understand the current feature set, and (2) resolving Q-value overestimation to learn unbiased and effective policies. Finally, to make the experiments more convincing than in the preliminary work, we add an outlier detection task on five datasets, evaluate various state representation approaches, and compare different training strategies. Extensive experiments and case studies show that our work is more effective and efficient.
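The abstract names the Q-value overestimation problem without detailing the fix, so here is a generic tabular double Q-learning sketch of the standard remedy (decoupling action selection from evaluation); it is an assumption that the paper's remedy resembles this.

```python
# Double Q-learning: two value tables; one selects the greedy next action,
# the other evaluates it, which counters the max-operator's upward bias.
import random
from collections import defaultdict

def double_q_update(Q1, Q2, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    if random.random() < 0.5:
        a_star = max(actions, key=lambda x: Q1[(s_next, x)])
        Q1[(s, a)] += alpha * (r + gamma * Q2[(s_next, a_star)] - Q1[(s, a)])
    else:
        a_star = max(actions, key=lambda x: Q2[(s_next, x)])
        Q2[(s, a)] += alpha * (r + gamma * Q1[(s_next, a_star)] - Q2[(s, a)])

# Toy usage with dict-backed tables:
Q1, Q2 = defaultdict(float), defaultdict(float)
double_q_update(Q1, Q2, s=0, a=1, r=0.5, s_next=2, actions=[0, 1])
```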
Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks, and it has achieved great success in real-world applications. Current AFE methods mainly focus on improving the effectiveness of the produced features but ignore the low-efficiency issue that hinders large-scale deployment. Therefore, in this work, we propose a generic framework to improve the efficiency of AFE. Specifically, we construct the AFE pipeline in a reinforcement learning setting, where each feature is assigned an agent to perform feature transformation and selection, and the evaluation score of the produced features on downstream tasks serves as the reward to update the policy. We improve the efficiency of AFE from two perspectives. On the one hand, we develop a Feature Pre-Evaluation (FPE) model to reduce the sample size and feature size, the two main factors undermining the efficiency of feature evaluation. On the other hand, we devise a two-stage policy training strategy that runs FPE on the pre-evaluation task to initialize the policy, avoiding training the policy from scratch. We conduct comprehensive experiments on 36 datasets covering both classification and regression tasks. The results show 2.9% higher performance on average and 2x higher computational efficiency compared to state-of-the-art AFE methods.
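The abstract does not detail how pre-evaluation cuts cost, so the following is a hedged sketch of one plausible reading: score a candidate feature set cheaply on a row/feature subsample and pay for the full evaluation only when the proxy looks promising. All sizes, the threshold, and the logistic-regression evaluator are illustrative assumptions, not the paper's FPE model.

```python
# Toy pre-evaluation filter: a cheap subsampled score gates the expensive
# full cross-validated evaluation of a candidate feature set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def evaluate(X, y):
    return cross_val_score(LogisticRegression(max_iter=500), X, y, cv=3).mean()

def pre_evaluate(X, y, n_rows=500, n_cols=20, rng=np.random.default_rng(0)):
    # Cheap proxy: evaluate on a subsample of rows and features.
    rows = rng.choice(len(X), size=min(n_rows, len(X)), replace=False)
    cols = rng.choice(X.shape[1], size=min(n_cols, X.shape[1]), replace=False)
    return evaluate(X[np.ix_(rows, cols)], y[rows])

def score_candidate(X, y, threshold=0.6):
    cheap = pre_evaluate(X, y)
    # Only promising candidates pay for the full evaluation.
    return evaluate(X, y) if cheap >= threshold else cheap
```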
Graph mining tasks arise in many application domains, ranging from social networks and transportation to e-commerce, and have been receiving great attention from the theoretical and algorithmic design communities in recent years; there has also been pioneering work employing well-studied Reinforcement Learning (RL) techniques to address graph data mining tasks. However, these graph mining methods and RL models are dispersed across different research areas, which makes it hard to compare them. In this survey, we provide a comprehensive overview of RL and graph mining methods and generalize these methods into Graph Reinforcement Learning (GRL) as a unified formulation. We further discuss the applications of GRL methods across various domains and summarize the method descriptions, open-source code, and benchmark datasets of GRL methods. Furthermore, we propose important directions and challenges to be solved in the future. To the best of our knowledge, this is the most recent comprehensive survey of GRL; it provides a global view and a learning resource for scholars. In addition, we create an online open-source repository both for interested scholars who want to enter this rapidly developing domain and for experts who would like to compare GRL methods.
Deep reinforcement learning (DRL) has empowered a variety of artificial intelligence fields, including pattern recognition, robotics, recommender systems, and gaming. Similarly, graph neural networks (GNNs) have demonstrated superior performance in supervised learning on graph-structured data. Recently, the fusion of GNNs with DRL for graph-structured environments has attracted much attention. This paper provides a comprehensive review of these hybrid works. These works can be classified into two categories: (1) algorithmic enhancement, where DRL and GNN complement each other for better utility; and (2) application-specific enhancement, where DRL and GNN support each other. This fusion effectively addresses various complex problems in engineering and the life sciences. Based on the review, we further analyze the applicability and benefits of fusing these two domains, especially in terms of improving generality and reducing computational complexity. Finally, the key challenges in integrating DRL and GNNs and potential future research directions are highlighted, which will be of interest to the broader machine learning community.
Deep reinforcement learning (DRL) and deep multi-agent reinforcement learning (MARL) have achieved significant success across a wide range of domains, including game AI, autonomous vehicles, and robotics. However, DRL and deep MARL agents are widely known to be sample-inefficient, and millions of interactions are usually required even for relatively simple problem settings, preventing wide application and deployment in real-world scenarios. One bottleneck challenge behind this is the well-known exploration problem: how to efficiently explore the environment and collect informative experiences that could benefit policy learning toward optimal ones. This problem becomes more challenging in complex environments with sparse rewards, noisy distractions, long horizons, and non-stationary co-learners. In this paper, we conduct a comprehensive survey of existing exploration methods for both single-agent and multi-agent RL. We start the survey by identifying several key challenges to efficient exploration. Beyond the two main branches of methods surveyed, we also include other notable exploration methods with different ideas and techniques. In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of DRL exploration methods on a set of commonly used benchmarks. Based on our algorithmic and empirical investigation, we finally summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.
Despite many recent successes of deep reinforcement learning (RL), its methods are still sample-inefficient, which makes solving many problems prohibitively expensive in terms of data. We aim to address this issue by learning state representations that exploit the rich supervisory signals available in unlabeled data. This paper introduces three different representation-learning algorithms that have access to different subsets of the data sources a traditional RL algorithm uses: (i) GRICA is inspired by independent component analysis (ICA) and trains a deep neural network to output statistically independent features of the input. GRICA does this by minimizing the mutual information between each feature and the other features, and it only requires unlabeled environment states. (ii) Latent Representation Prediction (LARP) requires more context: in addition to a state as input, it also needs the previous state and the action that connects them. This method learns a state representation by predicting the next state of the environment from the current state and action; the predictor is used together with a graph-search algorithm. (iii) The third method learns a state representation by training a deep neural network to learn a smoothed version of the reward function. The representation is used to preprocess inputs to deep RL, while the reward predictor is used for reward shaping. This method only requires state-reward pairs from the environment to learn the representation. We find that each method has its strengths and weaknesses, and we conclude from our experiments that including unsupervised representation learning in RL problem-solving pipelines can speed up learning.
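As a concrete illustration of the LARP idea (predicting the next state's latent from the current state and action), here is a hedged PyTorch sketch; the architecture, sizes, and the stop-gradient target are illustrative assumptions rather than the paper's exact design.

```python
# Minimal latent next-state predictor in the spirit of LARP.
import torch
import torch.nn as nn

class LatentPredictor(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim))

    def forward(self, state, action):
        z = self.encoder(state)
        return self.dynamics(torch.cat([z, action], dim=-1))

def larp_loss(model, state, action, next_state):
    # Regress the predicted latent onto the (detached) next-state latent.
    with torch.no_grad():
        z_next = model.encoder(next_state)
    return nn.functional.mse_loss(model(state, action), z_next)
```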
Motivation: Medical image analysis involves tasks that help physicians perform qualitative and quantitative analysis of lesions or anatomical structures, significantly improving the accuracy and reliability of diagnosis and prognosis. Traditionally, these tasks are performed by physicians or medical physicists and carry two major problems: (i) low efficiency and (ii) bias from personal experience. Over the past decade, many machine learning methods have been applied to accelerate and automate the image analysis process. Compared to the massive deployment of supervised and unsupervised learning models, attempts to use reinforcement learning in medical image analysis are still scarce. This review article can serve as a stepping stone for related research. Significance: From our observations, although reinforcement learning has gradually gained momentum in recent years, many researchers in the medical analysis field find it hard to understand and deploy in clinics. One reason is the lack of well-organized review articles intended for readers without a professional computer-science background. Rather than providing a comprehensive list of all reinforcement learning models in medical image analysis, this article may help readers learn how to formulate and solve their medical image analysis research as reinforcement learning problems. Methods and Results: We selected published articles from Google Scholar and PubMed. Given the scarcity of related articles, we also included some excellent recent preprints. The papers were carefully reviewed and categorized by the type of image analysis task. We first review the basic concepts and popular models of reinforcement learning. Then we explore the applications of reinforcement learning models in landmark detection. Finally, we conclude the article by discussing the limitations of the reviewed reinforcement learning methods and possible improvements.
Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamic environments, in order to improve system utility, reduce overall cost, and increase mission success probability. Deep reinforcement learning (DRL) helps organize an agent's behaviors and actions based on its state and represents complex strategies (compositions of actions). This paper proposes a novel hierarchical strategy decomposition approach based on Bayesian chaining, which separates a complex policy into several simple sub-policies and organizes them as a Bayesian Strategy Network (BSN). We integrate this approach into the state-of-the-art DRL method Soft Actor-Critic (SAC) and build the corresponding Bayesian Soft Actor-Critic (BSAC) model by organizing several sub-policies as a joint policy. We compare the proposed BSAC method with SAC and other state-of-the-art approaches such as TD3, DDPG, and PPO on standard continuous control benchmarks (Hopper-v2, Walker2d-v2, and Humanoid-v2) in the MuJoCo environments with OpenAI Gym. The results demonstrate that the BSAC method has promising potential to significantly improve training efficiency. The open-source code of BSAC is available at https://github.com/herolab-uga/bsac.
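The abstract does not spell out how sub-policies are chained into a joint policy, so the following PyTorch sketch shows one plausible Bayesian-chain factorization, in which each sub-policy conditions on the state and the sub-actions sampled before it; all architectural details here are assumptions, not the BSAC implementation.

```python
# Joint policy factored as p(a|s) = p(a1|s) p(a2|s,a1) ... via chained heads.
import torch
import torch.nn as nn

class ChainedGaussianPolicy(nn.Module):
    def __init__(self, state_dim, sub_action_dims):
        super().__init__()
        self.heads = nn.ModuleList()
        cond_dim = state_dim
        for d in sub_action_dims:
            # Each head outputs mean and log-std for its sub-action block.
            self.heads.append(nn.Sequential(
                nn.Linear(cond_dim, 64), nn.ReLU(), nn.Linear(64, 2 * d)))
            cond_dim += d  # the next head also conditions on this sub-action

    def forward(self, state):
        cond, parts, log_prob = state, [], 0.0
        for head in self.heads:
            mean, log_std = head(cond).chunk(2, dim=-1)
            dist = torch.distributions.Normal(mean, log_std.exp())
            a = dist.rsample()
            log_prob = log_prob + dist.log_prob(a).sum(-1)
            parts.append(a)
            cond = torch.cat([cond, a], dim=-1)
        return torch.cat(parts, dim=-1), log_prob
```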
Asset allocation (or portfolio management) is the task of determining how to optimally allocate a finite budget of funds across a range of financial instruments/assets, such as stocks. This study investigates the performance of reinforcement learning (RL) applied to portfolio management using model-free deep RL agents. We trained several RL agents on real-world stock prices to learn how to perform asset allocation. We compared the performance of these RL agents against some baseline agents, and we also compared the RL agents among themselves to understand which classes of agents perform better. From our analysis, RL agents can perform the task of portfolio management, since they significantly outperformed the baseline agents (random allocation and uniform allocation). Four RL agents (A2C, SAC, PPO, and TRPO) outperformed the best baseline, MPT, overall, demonstrating the ability of RL agents to discover more profitable trading strategies. Furthermore, there were no significant performance differences between value-based and policy-based RL agents. Actor-critic agents performed better than the other types of agents, and on-policy agents performed better than off-policy ones, because they are better at policy evaluation and sample efficiency is not a significant issue in portfolio management. This study shows that RL agents can substantially improve asset allocation, since they outperform strong baselines. Based on our analysis, on-policy actor-critic RL agents show the most promise.
The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques for data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily rely on model assumptions, new developments in reinforcement learning (RL) are able to make full use of large amounts of financial data with fewer model assumptions and to improve decision making in complex financial environments. This survey paper aims to review the recent developments in and use of RL approaches in finance. We introduce Markov decision processes, the setting for many commonly used RL approaches. Various algorithms are then introduced, with a focus on value-based and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the applications of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
A hybrid FSO/RF system requires an efficient FSO/RF link-switching mechanism to improve system capacity by realizing the complementary benefits of both links. Dynamic network conditions, such as fog, dust, and sand storms, compound the link-switching problem and control complexity. To address this problem, we initiate the study of deep reinforcement learning (DRL) for link switching in hybrid FSO/RF systems. Specifically, in this work, we focus on an actor-critic approach called Actor/Critic-FSO/RF and a Deep Q-Network (DQN) approach called DQN-FSO/RF for FSO/RF link switching under atmospheric turbulence. To formulate the problem, we define the state, action, and reward function of a hybrid FSO/RF system. DQN-FSO/RF frequently updates the deployed policy that interacts with the environment in a hybrid FSO/RF system, resulting in high switching costs. To overcome this, we lift the problem to ensemble consensus-based representation learning for deep reinforcement learning, called DQNEnsemble-FSO/RF. The proposed novel DQNEnsemble-FSO/RF DRL approach updates the deployed policy using feature representations learned by consensus over an ensemble of asynchronous threads. Experimental results corroborate that the proposed DQNEnsemble-FSO/RF's consensus-learned feature switching achieves better performance than Actor/Critic-FSO/RF, DQN-FSO/RF, and MyOpic for FSO/RF link switching while keeping the switching cost significantly low.
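Since the abstract mentions defining the state, action, and reward of the hybrid FSO/RF system without giving details, here is a hypothetical Python encoding of such an MDP interface; the field names, action set, and throughput-minus-switching-cost reward are illustrative guesses, not the paper's definitions.

```python
# Hypothetical state/action/reward encoding for FSO/RF link switching.
from dataclasses import dataclass

@dataclass
class LinkState:
    fso_snr_db: float      # current FSO signal-to-noise ratio
    rf_snr_db: float       # current RF signal-to-noise ratio
    active_link: int       # 0 = FSO, 1 = RF

ACTIONS = (0, 1)  # which link to use next: FSO or RF

def reward(state: LinkState, action: int,
           throughput: float, switch_penalty: float = 0.5) -> float:
    # Reward achieved throughput, but charge a cost whenever the agent
    # switches links, discouraging overly frequent switching.
    cost = switch_penalty if action != state.active_link else 0.0
    return throughput - cost
```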
Reinforcement Learning (RL) is a popular machine learning paradigm in which intelligent agents interact with an environment to fulfill a long-term goal. Driven by the resurgence of deep learning, Deep RL (DRL) has witnessed great success over a wide spectrum of complex control tasks. Despite the encouraging results achieved, the deep neural network backbone is widely regarded as a black box that impedes practitioners from trusting and employing trained agents in realistic scenarios where high security and reliability are essential. To alleviate this issue, a large body of literature has been devoted to shedding light on the inner workings of intelligent agents, either by constructing intrinsic interpretability or by providing post-hoc explainability. In this survey, we provide a comprehensive review of existing works on eXplainable RL (XRL) and introduce a new taxonomy in which prior works are clearly categorized into model-explaining, reward-explaining, state-explaining, and task-explaining methods. We also review and highlight RL methods that conversely leverage human knowledge to promote the learning efficiency and performance of agents, a kind of method often ignored in the XRL field. Some challenges and opportunities in XRL are discussed. This survey intends to provide a high-level summarization of XRL and to motivate future research on more effective XRL solutions. Corresponding open-source codes are collected and categorized at https://github.com/Plankson/awesome-explainable-reinforcement-learning.
To track a target in a video, current visual trackers usually adopt a greedy search for target localization in each frame; that is, the candidate region with the maximum response score is selected as the tracking result of each frame. However, we found that this may not be the optimal choice, especially in challenging tracking scenarios such as heavy occlusion and fast motion. To address this issue, we propose to maintain multiple tracking trajectories and apply a beam-search strategy to visual tracking, so that trajectories with fewer accumulated errors can be identified. Accordingly, this paper introduces a novel multi-agent reinforcement-learning-based beam-search tracking strategy. It is mainly inspired by the image captioning task, which takes an image as input and generates diverse descriptions with a beam-search algorithm. We thus formulate tracking as a sample selection problem carried out by multiple parallel decision-making processes, each of which aims to select one sample as the tracking result in each frame. Each maintained trajectory is associated with an agent that performs decision making and determines which actions should be taken to update the related information. After all frames are processed, we select the trajectory with the maximum accumulated score as the tracking result. Extensive experiments on seven popular tracking benchmark datasets confirm the effectiveness of the proposed algorithm.
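To make the beam-search idea concrete, here is a generic Python sketch that keeps the top-k trajectories by accumulated response score instead of a per-frame greedy argmax; it is a simplification of the paper's multi-agent formulation, not its implementation.

```python
# Generic beam search over per-frame candidate scores.
import heapq

def beam_search_track(frames_candidates, beam_width=3):
    """frames_candidates: list (per frame) of (score, box) candidates."""
    beams = [(0.0, [])]  # (accumulated score, chosen boxes so far)
    for candidates in frames_candidates:
        expanded = [(acc + score, path + [box])
                    for acc, path in beams
                    for score, box in candidates]
        # Keep only the top-k trajectories by accumulated score.
        beams = heapq.nlargest(beam_width, expanded, key=lambda b: b[0])
    return max(beams, key=lambda b: b[0])[1]  # best trajectory of boxes
```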
While it is well known that exploration plays a key role in reinforcement learning (RL), prevailing exploration strategies for continuous control tasks in RL are mainly based on naive isotropic Gaussian noise, regardless of the causal relationship between the action space and the task, treating all dimensions of the action as equally important. In this work, we propose to intervene on the original action space to discover the causal relationship between the action space and the task reward. We propose the state-wise action refinement method (SWAR), which addresses the action-space redundancy problem and promotes causality discovery in RL. We formulate causality discovery in RL tasks as a state-dependent action-space selection problem and propose two practical algorithms as solutions. The first method, TD-SWAR, detects task-relevant actions during temporal-difference learning, while the second method, Dyn-SWAR, reveals important actions through dynamics-model predictions. Empirically, both methods provide ways to understand the decisions made by RL agents and to improve learning efficiency in action-redundant tasks.
In this effort, we consider a reinforcement learning (RL) technique for solving personalization tasks with complex reward signals. In particular, our approach is based on state-space clustering with a simple k-means algorithm, together with conventional choices of network architecture and optimization algorithm. Numerical examples demonstrate the efficiency of different RL procedures and illustrate that this technique accelerates the agent's ability to learn without limiting the agent's performance.
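As a hedged sketch of the described technique, the snippet below clusters logged state vectors with scikit-learn's KMeans and uses the cluster index as a compact discrete state; the dimensions, cluster count, and downstream use are illustrative assumptions.

```python
# State-space clustering: map raw continuous states to cluster ids that a
# downstream (e.g., tabular) RL procedure can use as discrete states.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
states = rng.normal(size=(10_000, 8))      # stand-in for logged raw states

kmeans = KMeans(n_clusters=16, n_init=10).fit(states)

def discretize(state):
    """Map a raw continuous state to its cluster id."""
    return int(kmeans.predict(state.reshape(1, -1))[0])

# Example: use the cluster id as the state key for Q-learning.
s = discretize(rng.normal(size=8))
```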
Conventional control methods create obstacles due to the complexity of systems and an intense demand on data density, so modern and more efficient control methods are required. Off-policy, model-free reinforcement learning algorithms help avoid working with complex models. They have become prominent methods in terms of speed and accuracy because they use their past experience to learn optimal policies. In this study, three reinforcement learning algorithms, DDPG, TD3, and SAC, have been used to train a Fetch robotic manipulator on four different tasks in the MuJoCo simulation environment. All of these algorithms are off-policy and able to achieve their desired targets by optimizing both policy and value functions. The efficiency and speed of these three algorithms are analyzed in a controlled environment.
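The abstract names the algorithms and environment but not the tooling, so here is a hedged sketch of such an experiment assuming the stable-baselines3 and gymnasium-robotics packages (not necessarily what the authors used); the environment id, hyperparameters, and timestep budget are illustrative.

```python
# Hedged setup sketch: train SAC on a Fetch task; DDPG and TD3 expose the
# same stable-baselines3 interface and can be swapped in directly.
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch environments on import
                           # (newer versions may need gym.register_envs(...))
from stable_baselines3 import SAC

env = gym.make("FetchReach-v2")
# Fetch observations are dicts (observation/achieved_goal/desired_goal),
# so a multi-input policy is required.
model = SAC("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("sac_fetch_reach")
```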
Policy learning in multi-agent reinforcement learning (MARL) is challenging because the joint state-action space grows exponentially with the number of agents. To achieve higher scalability, the paradigm of centralized training with decentralized execution (CTDE) is widely adopted with factorized structures in MARL. However, we observe that existing CTDE algorithms for cooperative MARL fail to achieve optimality even in simple matrix games. To understand this phenomenon, we introduce a generalized multi-agent actor-critic framework with policy factorization (GPF-MAC), characterized by the learning of factorized joint policies, i.e., each agent's policy depends only on its own observation-action history. We show that the most popular CTDE MARL algorithms are special instances of GPF-MAC and may get stuck in suboptimal joint policies. To address this issue, we propose a novel transformation framework that reformulates a multi-agent MDP as a special "single-agent" MDP with a sequential structure, allowing off-the-shelf single-agent reinforcement learning (SARL) algorithms to efficiently learn the corresponding multi-agent tasks. This transformation retains the optimality guarantees of SARL algorithms for cooperative MARL. To instantiate this transformation framework, we propose a transformed PPO, called T-PPO, which can theoretically perform optimal policy learning in finite multi-agent MDPs and shows significantly superior performance on a large set of cooperative multi-agent tasks.
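To illustrate the sequential "single-agent" reformulation, here is a toy wrapper sketch in which each step supplies one agent's sub-action and the wrapped environment advances only once the joint action is complete; the interface is a hedged simplification, not the paper's construction.

```python
class SequentialWrapper:
    """Toy reformulation of an n-agent environment as a sequential
    single-agent one: each step() supplies one agent's sub-action, and the
    wrapped environment only advances when the joint action is complete."""

    def __init__(self, multi_agent_env, n_agents):
        self.env, self.n = multi_agent_env, n_agents
        self.pending, self.obs = [], None

    def reset(self):
        self.pending = []
        self.obs = self.env.reset()
        # The pending sub-actions are folded into the observation so the
        # augmented state stays Markov.
        return (self.obs, tuple(self.pending))

    def step(self, sub_action):
        self.pending.append(sub_action)
        if len(self.pending) < self.n:
            # Intermediate decision point: no transition, zero reward.
            return (self.obs, tuple(self.pending)), 0.0, False, {}
        joint, self.pending = tuple(self.pending), []
        self.obs, reward, done, info = self.env.step(joint)
        return (self.obs, ()), reward, done, info
```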
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
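For concreteness, here is a minimal, hedged PyTorch sketch of the deep Q-network update covered by the survey: a TD target bootstrapped from a frozen target network. It is a toy core update only, omitting the replay buffer, exploration policy, and the target-network sync schedule.

```python
# Minimal DQN update step on a 4-dim state, 2-action toy problem.
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())
optim = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_step(s, a, r, s_next, done, gamma=0.99):
    with torch.no_grad():
        # Bootstrap from the frozen target network for stability.
        target = r + gamma * (1 - done) * target_net(s_next).max(-1).values
    q_sa = q_net(s).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    loss = nn.functional.mse_loss(q_sa, target)
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()
```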
Reinforcement learning and, more recently, deep reinforcement learning are popular approaches for solving sequential decision-making problems modeled as Markov decision processes. The RL modeling of a problem and the selection of algorithms and hyperparameters require careful consideration, as different configurations may yield drastically different performance. These considerations are mainly the task of RL experts; however, RL is progressively becoming popular in other fields whose researchers and system designers are not RL experts. Moreover, many modeling decisions, such as defining the state and action spaces, the batch size and batch-update frequency, and the number of time steps, are typically made manually. For these reasons, automating the different components of the RL framework is of great importance, and it has attracted much attention in recent years. Automated RL provides a framework in which the different components of RL, including MDP modeling, algorithm selection, and hyperparameter optimization, are modeled and defined automatically. In this article, we explore the literature and present recent work that can be used in automated RL. Moreover, we discuss the challenges, open questions, and research directions in AutoRL.
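As a toy illustration of the hyperparameter-optimization component of AutoRL, here is a hedged random-search sketch; the search space and the evaluate() stub are illustrative assumptions standing in for a full RL training run.

```python
# Random search over an RL hyperparameter space, scored by a stub objective.
import random

SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [32, 64, 128],
    "gamma": [0.95, 0.99, 0.999],
}

def evaluate(config):
    """Stub: train an agent with `config` and return mean episode return."""
    return -abs(config["learning_rate"] - 3e-4)  # placeholder objective

def random_search(n_trials=20, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```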