A central challenge in the widely applicable online matching problem is making irrevocable assignments under uncertainty about future inputs. Most theoretically grounded policies are myopic or greedy in nature. In real-world applications where the matching process is repeated on a regular basis, the underlying data distribution can be leveraged for better decision-making. We propose an end-to-end reinforcement learning framework for deriving better matching policies via trial and error on historical data. We devise a set of neural network architectures, design feature representations, and empirically evaluate them on two online matching problems: edge-weighted online bipartite matching and online submodular bipartite matching. We show that most of the learned approaches consistently outperform classical baseline algorithms on four synthetic and real-world datasets. On average, our proposed models improve matching quality by 3-10% across a variety of synthetic and real-world datasets. Our code is publicly available at https://github.com/lyeskhalil/corl.
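To make the setting concrete, the following is a minimal sketch of edge-weighted online bipartite matching with a pluggable scoring policy. It is a toy environment for illustration only; the architectures, feature representations, and training loop of the paper are not reproduced, and `score_fn` merely stands in for a trained network.

```python
# Minimal sketch of the edge-weighted online bipartite matching setting
# (hypothetical toy environment, not the architectures or features used in the paper).
import numpy as np

def greedy_match(weights):
    """Greedy baseline: each arriving online node takes the best free offline node.

    weights: (n_online, n_offline) array; rows arrive one at a time.
    Returns the total matched weight, built from irrevocable decisions.
    """
    free = np.ones(weights.shape[1], dtype=bool)
    total = 0.0
    for w in weights:                        # online nodes arrive sequentially
        masked = np.where(free, w, -np.inf)  # only free offline nodes are eligible
        j = int(np.argmax(masked))
        if np.isfinite(masked[j]) and masked[j] > 0:
            total += w[j]
            free[j] = False                  # the assignment is irrevocable
    return total

def policy_match(weights, score_fn):
    """Same loop, but a learned scoring function replaces the myopic argmax."""
    free = np.ones(weights.shape[1], dtype=bool)
    total = 0.0
    for t, w in enumerate(weights):
        scores = score_fn(w, free, t)        # e.g. the output of a trained neural network
        scores = np.where(free, scores, -np.inf)
        j = int(np.argmax(scores))
        if np.isfinite(scores[j]) and w[j] > 0:
            total += w[j]
            free[j] = False
    return total

rng = np.random.default_rng(0)
W = rng.random((8, 5))
print(greedy_match(W), policy_match(W, lambda w, free, t: w))  # identical with a myopic score
```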
Combinatorial optimization is a well-established area of operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. In recent years, however, there has been a surge of interest in using machine learning, and especially graph neural networks (GNNs), as a key building block for combinatorial tasks, either directly as solvers or by enhancing exact solvers. The inductive bias of GNNs effectively encodes combinatorial and relational input thanks to their invariance to permutations and their awareness of input sparsity. This paper presents a conceptual review of recent key advancements in this emerging field, aimed at both optimization and machine learning researchers.
The design of good heuristics or approximation algorithms for NP-hard combinatorial optimization problems often requires significant specialized knowledge and trial-and-error. Can we automate this challenging, tedious process, and learn the algorithms instead? In many real-world applications, it is typically the case that the same optimization problem is solved again and again on a regular basis, maintaining the same problem structure but differing in the data. This provides an opportunity for learning heuristic algorithms that exploit the structure of such recurring problems. In this paper, we propose a unique combination of reinforcement learning and graph embedding to address this challenge. The learned greedy policy behaves like a meta-algorithm that incrementally constructs a solution, and the action is determined by the output of a graph embedding network capturing the current state of the solution. We show that our framework can be applied to a diverse range of optimization problems over graphs, and learns effective algorithms for the Minimum Vertex Cover, Maximum Cut and Traveling Salesman problems.
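The greedy meta-algorithm described above can be summarised in a few lines: a solution is grown one node at a time, and a learned value function over the current partial solution decides each addition. The sketch below uses a plain degree-based stand-in for the graph embedding Q-network of the paper, applied to Minimum Vertex Cover, purely to make the construction loop explicit.

```python
# Sketch of the greedy meta-algorithm: a solution is grown one node at a time, and a learned
# Q-function over graph embeddings picks each addition. The embedding/Q-function here is a
# stand-in, not the structure2vec network of the paper.
import numpy as np

def greedy_construct(graph, q_fn, is_complete):
    """graph: adjacency matrix; q_fn(graph, partial) -> score per node; returns node list."""
    partial = []
    candidates = set(range(len(graph)))
    while not is_complete(graph, partial) and candidates:
        scores = q_fn(graph, partial)                      # state = current partial solution
        best = max(candidates, key=lambda v: scores[v])    # greedy w.r.t. learned values
        partial.append(best)
        candidates.remove(best)
    return partial

# Toy instantiation for Minimum Vertex Cover: stop once every edge is covered,
# and use node degree among uncovered edges as a stand-in "Q-value".
def mvc_done(graph, partial):
    covered = set(partial)
    return all(i in covered or j in covered
               for i in range(len(graph)) for j in range(i + 1, len(graph)) if graph[i][j])

def degree_q(graph, partial):
    covered = set(partial)
    g = np.asarray(graph, dtype=float)
    g[list(covered), :] = 0
    g[:, list(covered)] = 0
    return g.sum(axis=1)

A = [[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
print(greedy_construct(A, degree_q, mvc_done))
```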
This paper surveys the recent attempts, both from the machine learning and operations research communities, at leveraging machine learning to solve combinatorial optimization problems. Given the hard nature of these problems, state-of-the-art algorithms rely on handcrafted heuristics for making decisions that are otherwise too expensive to compute or mathematically not well defined. Thus, machine learning looks like a natural candidate to make such decisions in a more principled and optimized way. We advocate for pushing further the integration of machine learning and combinatorial optimization and detail a methodology to do so. A main point of the paper is seeing generic optimization problems as data points and inquiring what is the relevant distribution of problems to use for learning on a given task.
The Steiner Tree Problem (STP) in graphs aims to find a minimum-weight tree that connects a given set of vertices. It is a classic NP-hard combinatorial optimization problem with many real-world applications (e.g., VLSI chip design, transportation network planning, and wireless sensor networks). Many exact and approximation algorithms have been developed for the STP, but they suffer from high computational complexity and weak worst-case solution guarantees, respectively. Heuristic algorithms have also been developed; however, each of them requires application-domain knowledge to design and is only suitable for specific scenarios. Motivated by the recently reported observation that instances of the same NP-hard problem may retain the same or similar combinatorial structure but differ mainly in their data, we investigate the feasibility and benefits of applying machine learning techniques to the STP. To this end, we design a novel model, Vulcan, based on a new graph neural network and deep reinforcement learning. At the core of Vulcan is a novel, compact graph embedding that transforms high-dimensional graph-structured data (i.e., path-related information) into a low-dimensional vector representation. Given an STP instance, Vulcan uses this embedding to encode its path-related information and sends the encoded graph to a deep reinforcement learning component based on a Double Deep Q-Network (DDQN) to find a solution. Beyond the STP, Vulcan can also find solutions to a wide range of NP-hard problems (e.g., SAT, MVC, and X3C) by reducing them to the STP. We implement a prototype of Vulcan and demonstrate its efficacy and efficiency with extensive experiments on real-world and synthetic datasets.
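The DDQN component mentioned above decouples action selection from action evaluation: the online network picks the next action and the target network scores it. Below is a minimal, generic sketch of that target computation; the graph embedding that would produce the `state` vectors is Vulcan-specific and is omitted, with small MLPs standing in for the learned Q-networks.

```python
# Generic Double DQN target: online network selects the action, target network evaluates it.
import torch

def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)       # selection
        next_value = target_net(next_state).gather(1, next_action).squeeze(1)  # evaluation
        return reward + gamma * (1.0 - done) * next_value

# Toy usage with small MLPs standing in for the learned graph-embedding Q-networks.
online = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
target = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
target.load_state_dict(online.state_dict())
s_next = torch.randn(8, 16)
y = ddqn_target(torch.zeros(8), s_next, torch.zeros(8), online, target)
print(y.shape)  # torch.Size([8])
```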
Graph problems such as the traveling salesman problem, or finding a minimum Steiner tree, are widely studied and used in data engineering and computer science. Typically, in real-world applications, the features of a graph tend to change over time, which makes finding a solution to the problem challenging. The dynamic versions of many graph problems are key to a plethora of real-world problems in transportation, telecommunication, and social networks. In recent years, using deep learning techniques to find heuristic solutions for NP-hard graph combinatorial problems has gained much interest, as these learned heuristics can find near-optimal solutions efficiently. However, most existing learned heuristics focus on static graph problems. The dynamic nature makes NP-hard graph problems much more challenging, and existing methods fail to find reasonable solutions. In this paper, we propose a novel architecture named Graph Temporal Attention with Reinforcement Learning (GTA-RL) to learn heuristic solutions for graph-based dynamic combinatorial optimization problems. The GTA-RL architecture consists of an encoder capable of embedding the temporal features of a combinatorial problem instance and a decoder capable of dynamically attending to the embedded features to find a solution for a given combinatorial problem instance. We then extend our architecture to learn heuristics for the real-time version of combinatorial optimization problems, in which the input features of the problem are not all known in advance but are learned in real time. Our experimental results against several state-of-the-art learning-based algorithms and optimal solvers demonstrate that our approach outperforms the state-of-the-art learning-based approaches in terms of effectiveness, and optimal solvers in terms of efficiency, on dynamic and real-time graph combinatorial optimization.
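As a rough illustration of the decoder side of such encoder-decoder construction heuristics, the sketch below performs a single decoding step with masked attention over node embeddings. It is a generic step, not the GTA-RL architecture: the temporal encoder, the dynamic updating of embeddings, and the training procedure are all omitted.

```python
# Generic single decoding step: masked scaled dot-product attention over node embeddings.
import torch

def decode_step(node_emb, context, visited_mask, temperature=1.0):
    """node_emb: (n, d) embeddings; context: (d,) query; visited_mask: (n,) bool.
    Returns a probability distribution over the next node to visit."""
    scores = node_emb @ context / node_emb.shape[1] ** 0.5    # scaled dot-product attention
    scores = scores.masked_fill(visited_mask, float('-inf'))  # already-routed nodes excluded
    return torch.softmax(scores / temperature, dim=0)

emb = torch.randn(10, 64)
ctx = torch.randn(64)
mask = torch.zeros(10, dtype=torch.bool); mask[:3] = True     # first three nodes already visited
print(decode_step(emb, ctx, mask))
```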
Influence Maximization (IM) is a classical combinatorial optimization problem, which can be widely used in mobile networks, social computing, and recommendation systems. It aims at selecting a small number of users so as to maximize the influence spread across the online social network. Because of its potential commercial and academic value, many researchers study the IM problem from different perspectives. The main challenge comes from the NP-hardness of the IM problem and the \#P-hardness of estimating the influence spread; traditional algorithms for overcoming these can be categorized into two classes: heuristic algorithms and approximation algorithms. However, there is no theoretical guarantee for heuristic algorithms, and the theoretical design of approximation algorithms is close to its limit, so it is almost impossible to further optimize and improve their performance. With the rapid development of artificial intelligence, technology based on Machine Learning (ML) has achieved remarkable results in many fields. In view of this, a number of new methods have emerged in recent years that solve combinatorial optimization problems with ML-based techniques. These methods have the advantages of fast solving speed and strong generalization to unknown graphs, which provide a brand-new direction for solving combinatorial optimization problems. Therefore, we set aside the traditional algorithms based on iterative search and review the recent development of ML-based methods, especially Deep Reinforcement Learning, for solving the IM problem and its variants in social networks. We focus on summarizing the relevant background knowledge, basic principles, common methods, and applied research. Finally, the challenges that need to be solved urgently in future IM research are pointed out.
End-to-end training of neural network solvers for graph combinatorial optimization problems such as the Travelling Salesman Problem (TSP) has recently seen a surge of interest, but remains intractable and inefficient beyond graphs with a few hundred nodes. While state-of-the-art learning-driven approaches for TSP perform closely to classical solvers when trained on trivially small sizes, they are unable to generalize to larger instances of practical scale. This work presents an end-to-end neural combinatorial optimization pipeline that unifies several recent papers in order to identify the inductive biases, model architectures, and learning algorithms that promote generalization to instances larger than those seen in training. Our controlled experiments provide the first principled investigation into such zero-shot generalization, revealing that extrapolating beyond training data requires rethinking the neural combinatorial optimization pipeline, from network layers and learning paradigms to evaluation protocols. Additionally, we analyze recent advances in deep learning for routing problems through the lens of our pipeline and provide new directions to stimulate future research.
We present a universal graph neural network architecture that can be trained as an end-to-end search heuristic for any constraint satisfaction problem (CSP). Our architecture can be trained unsupervised with policy gradient descent to generate problem-specific heuristics for any CSP in a purely data-driven manner. The approach is based on a novel graph representation of CSPs that is both generic and compact and enables us to process every possible CSP instance with one GNN, regardless of constraint arity, relations, or domain size. Unlike previous RL-based methods, we operate on a global search action space and allow our GNN to modify any number of variables in every step of the stochastic search. This enables our method to properly leverage the inherent parallelism of GNNs. We perform a thorough empirical evaluation in which we learn heuristics for well-known and important CSPs from random data, including graph coloring, MaxCut, 3-SAT, and Max-k-SAT. Our approach outperforms prior approaches for neural combinatorial optimization. It can compete with, and even improve upon, conventional search heuristics on test instances that are orders of magnitude larger and structurally more complex than those seen during training.
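The "global search action space" idea can be illustrated with a tiny stochastic search loop: instead of changing one variable per step, the policy outputs a distribution over values for every variable, and all variables are resampled at once. The sketch below uses a uniform placeholder policy and a toy 3-coloring instance; the GNN that would produce the per-variable distributions is not shown.

```python
# Sketch of a global-action stochastic search: every variable is resampled in each step
# from a per-variable value distribution (here a uniform placeholder for the GNN output).
import numpy as np

def global_search(num_vars, domain, violations, policy, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    assign = rng.integers(domain, size=num_vars)
    best, best_cost = assign.copy(), violations(assign)
    for _ in range(steps):
        probs = policy(assign)                       # (num_vars, domain) value distributions
        assign = np.array([rng.choice(domain, p=p) for p in probs])  # all variables move at once
        cost = violations(assign)
        if cost < best_cost:
            best, best_cost = assign.copy(), cost
        if best_cost == 0:
            break
    return best, best_cost

# Toy instance: graph 3-coloring on a triangle plus a pendant vertex.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
viol = lambda a: sum(a[i] == a[j] for i, j in edges)
uniform = lambda a: np.full((len(a), 3), 1.0 / 3.0)
print(global_search(4, 3, viol, uniform))
```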
Backtracking search algorithms are often used to solve Constraint Satisfaction Problems (CSPs). The efficiency of backtracking search depends greatly on the variable ordering heuristic. Currently, the most commonly used heuristics are hand-crafted based on expert knowledge. In this paper, we propose a deep reinforcement learning based approach to automatically discover new variable ordering heuristics that are better suited to a given class of CSP instances. We show that directly optimizing the search cost is hard for bootstrapping, and propose instead to optimize the expected cost of reaching a leaf node in the search tree. To capture the complex relations among the variables and constraints, we design a representation scheme based on graph neural networks that can process CSP instances with different sizes and constraint arities. Experimental results on random CSP instances show that the learned policies outperform classical hand-crafted heuristics in terms of minimizing the search tree size, and can effectively generalize to instances larger than those used in training.
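To make the role of the variable ordering heuristic concrete, the sketch below is a bare-bones backtracking solver whose branching variable is chosen by a pluggable score function, which is exactly the component such an approach learns. The classic minimum-remaining-values rule stands in for the learned policy; the CSP encoding (binary constraints as allowed pairs) is a simplification for illustration only.

```python
# Backtracking with a pluggable variable-ordering score; a learned policy would replace `mrv`.
def backtrack(domains, constraints, score):
    """domains: {var: set(values)}; constraints: list of (var_a, var_b, allowed_pairs)."""
    if all(len(d) == 1 for d in domains.values()):
        return {v: next(iter(d)) for v, d in domains.items()}
    var = min((v for v, d in domains.items() if len(d) > 1), key=lambda v: score(v, domains))
    for value in sorted(domains[var]):
        pruned = {v: (d & {value} if v == var else set(d)) for v, d in domains.items()}
        if consistent(pruned, constraints):
            result = backtrack(pruned, constraints, score)
            if result is not None:
                return result
    return None  # triggers backtracking in the caller

def consistent(domains, constraints):
    return all(any((a, b) in allowed for a in domains[x] for b in domains[y])
               for x, y, allowed in constraints)

mrv = lambda var, domains: len(domains[var])   # minimum remaining values, as a stand-in scorer
doms = {0: {0, 1}, 1: {0, 1}, 2: {0, 1}}
cons = [(0, 1, {(0, 1), (1, 0)}), (1, 2, {(0, 1), (1, 0)})]
print(backtrack(doms, cons, mrv))
```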
Deep reinforcement learning (DRL) has empowered a variety of artificial intelligence fields, including pattern recognition, robotics, recommendation systems, and gaming. Similarly, graph neural networks (GNNs) have demonstrated superior performance in supervised learning on graph-structured data. Recently, the fusion of GNNs with DRL for graph-structured environments has attracted a lot of attention. This paper provides a comprehensive review of these hybrid works. These works can be classified into two categories: (1) algorithmic enhancement, where DRL and GNN complement each other for better utility; and (2) application-specific enhancement, where DRL and GNN support each other. This fusion effectively addresses various complex problems in engineering and the life sciences. Based on the review, we further analyze the applicability and benefits of fusing these two domains, especially in terms of increasing generalizability and reducing computational complexity. Finally, the key challenges of integrating DRL and GNN, as well as potential future research directions, are highlighted, which will be of interest to the wider machine learning community.
Combinatorial optimization problems (COPs) on graphs are a fundamental challenge in optimization. Reinforcement learning (RL) has recently emerged as a new framework for solving these problems and has demonstrated promising results. However, most RL solutions employ a greedy manner to incrementally construct the solution, thus inevitably imposing unnecessary dependency on the action sequence and requiring many problem-specific designs. We propose a general RL framework that not only exhibits state-of-the-art empirical performance but also generalizes to a variety of COPs. Specifically, we define the state as a solution to a problem instance and the action as a perturbation of this solution. We utilize graph neural networks (GNNs) to extract latent representations of given problem instances, and then apply deep Q-learning to obtain a policy that gradually refines the solution by flipping or swapping vertex labels. Experiments are conducted on the Maximum $k$-Cut and traveling salesman problems, and performance improvements are achieved over a range of learning-based heuristic baselines.
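The "state is a full solution, action is a perturbation" view can be made concrete for MaxCut: a state is a +/-1 labelling of the vertices, an action flips one vertex, and a learned Q-function would rank the flips. In the sketch below the exact one-flip gain plays the role of that Q-function; the GNN, the swap actions, and the training loop are omitted.

```python
# Refining a MaxCut solution by repeatedly applying the best single-vertex flip.
import numpy as np

def cut_value(W, labels):
    return 0.25 * float(W.sum() - labels @ W @ labels)   # weight of edges crossing the partition

def flip_gains(W, labels):
    return labels * (W @ labels)             # exact gain of flipping each vertex (+/-1 labels)

def refine(W, labels, steps=50):
    labels = labels.copy()
    for _ in range(steps):
        gains = flip_gains(W, labels)        # a trained GNN-based Q-network would go here
        v = int(np.argmax(gains))
        if gains[v] <= 0:                    # no improving perturbation left
            break
        labels[v] = -labels[v]
    return labels

rng = np.random.default_rng(1)
W = rng.integers(0, 2, size=(12, 12)); W = np.triu(W, 1); W = W + W.T   # random unweighted graph
x = rng.choice([-1, 1], size=12)
print(cut_value(W, x), cut_value(W, refine(W, x)))
```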
Graph mining tasks arise from many different application domains, ranging from social networks and transportation to E-commerce, and have been receiving great attention from the theoretical and algorithmic design communities in recent years; there has also been some pioneering work employing the research-rich Reinforcement Learning (RL) techniques to address graph data mining tasks. However, these graph mining methods and RL models are dispersed across different research areas, which makes it hard to compare them. In this survey, we provide a comprehensive overview of RL and graph mining methods and generalize these methods to Graph Reinforcement Learning (GRL) as a unified formulation. We further discuss the applications of GRL methods across various domains and summarize the method descriptions, open-source codes, and benchmark datasets of GRL methods. Furthermore, we propose important directions and challenges to be addressed in the future. As far as we know, this is the most recent comprehensive survey of GRL; it provides a global view and a learning resource for scholars. In addition, we create an online open-source repository for both interested scholars who want to enter this rapidly developing domain and experts who would like to compare GRL methods.
Adequately assigning credit to actions for future outcomes based on their contributions is a long-standing open challenge in Reinforcement Learning. The assumptions of the most commonly used credit assignment method are disadvantageous in tasks where the effects of decisions are not immediately evident. Furthermore, this method can only evaluate actions that have been selected by the agent, making it highly inefficient. Still, no alternative methods have been widely adopted in the field. Hindsight Credit Assignment is a promising, but still unexplored candidate, which aims to solve the problems of both long-term and counterfactual credit assignment. In this thesis, we empirically investigate Hindsight Credit Assignment to identify its main benefits, and key points to improve. Then, we apply it to factored state representations, and in particular to state representations based on the causal structure of the environment. In this setting, we propose a variant of Hindsight Credit Assignment that effectively exploits a given causal structure. We show that our modification greatly decreases the workload of Hindsight Credit Assignment, making it more efficient and enabling it to outperform the baseline credit assignment method on various tasks. This opens the way to other methods based on given or learned causal structures.
Reinforcement learning, and more recently deep reinforcement learning, are popular methods for solving sequential decision-making problems modeled as Markov decision processes. Modeling a problem with RL and choosing algorithms and hyperparameters require careful consideration, since different configurations can lead to entirely different performance. These considerations are mainly the task of RL experts; however, RL is progressively becoming popular in other fields where the researchers and system designers are not RL experts. Moreover, many modeling decisions, such as defining the state and action spaces, the batch size and frequency of batch updates, and the number of timesteps, are typically made manually. For these reasons, automating the different components of the RL framework is of great importance and has attracted much attention in recent years. Automated RL provides a framework in which different components of RL, including MDP modeling, algorithm selection, and hyperparameter optimization, are modeled and defined automatically. In this article, we explore the literature and present recent work that can be used in automated RL. Moreover, we discuss the challenges, open questions, and research directions in AutoRL.
In this paper, we present a comprehensive, in-depth survey of reinforcement learning approaches to decision optimization problems in a typical ridesharing system. Papers on the topics of rideshare matching, vehicle repositioning, ride-pooling, routing, and dynamic pricing are covered. Most of this literature has emerged in the past few years, and several core challenges remain to be addressed: model complexity, agent coordination, and the joint optimization of multiple levers. Hence, we also introduce popular datasets and open simulation environments to facilitate further research and development. Subsequently, we discuss a number of challenges and opportunities for reinforcement learning research on this important domain.
In recent years, reinforcement learning combined with graph neural network (GNN) architectures has been proposed to learn to solve hard combinatorial optimization problems: given raw input data and an evaluator to guide the process, the idea is to automatically learn a policy that returns feasible and high-quality outputs. Recent work has shown promising results, but these were mostly evaluated on the traveling salesman problem (TSP) and similar abstract variants such as the Split Delivery Vehicle Routing Problem (SDVRP). In this paper, we analyze how and whether recent neural architectures can be applied to graph problems of practical importance. We thus systematically transfer these architectures to the Power and Channel Allocation Problem (PCAP), which has practical relevance for, e.g., radio resource allocation in wireless networks. Our experimental results show that the existing architectures (i) are still unable to capture graph structural features and (ii) are not suited to problems where the actions on the graph change the graph attributes. On a positive note, we show that augmenting the structural representation of the problem with distance encodings is a promising step toward the still-ambitious goal of learning multi-purpose autonomous solvers.
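As an illustration of what a distance-encoding augmentation of the node representation can look like, the sketch below appends all-pairs shortest-path distances (computed with Floyd-Warshall) to each node's raw features before they would enter a GNN. This is a generic construction under the assumption of a small, unweighted graph; the encoding actually used in the paper may differ.

```python
# Appending shortest-path distance columns to node features as a simple structural encoding.
import numpy as np

def distance_encoding(adj, node_feats):
    n = adj.shape[0]
    dist = np.where(adj > 0, adj.astype(float), np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):                                    # Floyd-Warshall
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    dist[np.isinf(dist)] = n                              # cap unreachable pairs
    return np.concatenate([node_feats, dist], axis=1)     # (n, f + n) augmented features

A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
X = np.eye(4)
print(distance_encoding(A, X).shape)  # (4, 8)
```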
Reinforcement learning has recently shown promise in learning quality solutions for many combinatorial optimization problems. In particular, attention-based encoder-decoder models show high effectiveness on various routing problems, including the Traveling Salesman Problem (TSP). Unfortunately, they perform poorly for the TSP with Drone (TSP-D), which requires routing a heterogeneous fleet of vehicles in coordination, namely a truck and a drone. In TSP-D, the two vehicles move in tandem and may need to wait at a node for the other vehicle to join. State-less attention-based decoders fail to make such coordination between vehicles. We propose a hybrid model with an attention encoder and an LSTM decoder, in which the decoder's hidden state can represent the sequence of actions made. We empirically demonstrate that such a hybrid model improves upon a purely attention-based model in both solution quality and computational efficiency. Our experiments on the min-max Capacitated Vehicle Routing Problem (mmCVRP) also confirm that the hybrid model is better suited to the coordinated routing of multiple vehicles than the attention-based model.
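The hybrid idea can be sketched in a few lines: an LSTM cell carries the sequence of actions taken so far, and its hidden state serves as the attention query over the encoder's node embeddings. The dimensions and wiring below are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal attention-encoder / LSTM-decoder style decoding loop (greedy, single tour).
import torch
import torch.nn as nn

class LSTMAttnDecoder(nn.Module):
    def __init__(self, d_model=128):
        super().__init__()
        self.cell = nn.LSTMCell(d_model, d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, node_emb, mask, steps):
        """node_emb: (batch, n, d); mask: (batch, n), True = forbidden. Returns visit order."""
        batch, n, d = node_emb.shape
        h = node_emb.mean(dim=1)                    # initialise from the graph embedding
        c = torch.zeros_like(h)
        last = node_emb.mean(dim=1)
        tour = []
        for _ in range(steps):
            h, c = self.cell(last, (h, c))          # hidden state tracks the actions taken so far
            scores = torch.einsum('bnd,bd->bn', node_emb, self.proj(h)) / d ** 0.5
            scores = scores.masked_fill(mask, float('-inf'))
            choice = scores.argmax(dim=1)           # greedy decoding; sampling also works
            mask = mask.scatter(1, choice.unsqueeze(1), True)
            last = node_emb[torch.arange(batch), choice]
            tour.append(choice)
        return torch.stack(tour, dim=1)

dec = LSTMAttnDecoder()
emb = torch.randn(2, 6, 128)
print(dec(emb, torch.zeros(2, 6, dtype=torch.bool), steps=6))
```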
In recent years, methods based on deep neural networks, and especially Neural Improvement (NI) models, have led to a revolution in the field of combinatorial optimization. Given an instance of a graph-based problem and a candidate solution, they are able to propose a modification rule that improves its quality. However, existing NI approaches only consider node features and node-wise positional encodings to extract the instance and solution information, respectively. Thus, they are not suitable for problems where the essential information is encoded in the edges. In this paper, we present an NI model to solve graph-based problems where the information is stored either in the nodes, in the edges, or in both of them. We incorporate the NI model as a building block of hill-climbing-based algorithms to efficiently guide the selection of neighborhood operations, taking the solution at each iteration into account. The conducted experiments show that the model is able to recommend neighborhood operations that are in the $99^{th}$ percentile for the Preference Ranking Problem. Moreover, when incorporated into hill-climbing algorithms, such as Iterated or Multi-start Local Search, the NI model systematically outperforms the conventional versions. Finally, we demonstrate the flexibility of the model by extending its application to two well-known problems: the Traveling Salesman Problem and the Graph Partitioning Problem.
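Using a proposal model inside hill climbing can be sketched as follows: at each iteration a model emits a pairwise operation (here a swap in a permutation) and the move is applied only if it improves the objective. An exhaustive ranking of all swaps stands in for the neural improvement model, and the toy objective is purely illustrative.

```python
# Hill climbing driven by a pluggable operation-proposal function.
import itertools, random

def apply_swap(perm, i, j):
    perm = perm[:]; perm[i], perm[j] = perm[j], perm[i]
    return perm

def hill_climb(perm, cost, propose, max_iters=500):
    for _ in range(max_iters):
        i, j = propose(perm)                         # the NI model would emit this operation
        cand = apply_swap(perm, i, j)
        if cost(cand) >= cost(perm):
            break                                    # proposed move does not improve: stop
        perm = cand
    return perm

# Stand-in proposal: exhaustively rank all swaps (what a learned model approximates cheaply).
def exhaustive_propose(cost):
    def propose(perm):
        return min(itertools.combinations(range(len(perm)), 2),
                   key=lambda ij: cost(apply_swap(perm, *ij)))
    return propose

# Toy objective: number of adjacent inversions in a permutation (0 when sorted).
cost = lambda p: sum(p[k] > p[k + 1] for k in range(len(p) - 1))
random.seed(0)
start = random.sample(range(8), 8)
print(start, hill_climb(start, cost, exhaustive_propose(cost)))
```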
Even though machine learning algorithms already play a significant role in data science, many current methods pose unrealistic assumptions on the input data. It is difficult to apply such methods because of incompatible data formats, or heterogeneous, hierarchical, or entirely missing data fragments in the dataset. As a solution, we propose a versatile, unified framework for sample representation, model definition, and training, called HMill. We review in depth the multi-instance learning paradigm that the framework builds on and extends. To theoretically justify the design of key components of HMill, we show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework. The text also contains a detailed discussion of the technical and performance improvements in our implementation, which will be released for download under the MIT license. The main asset of the framework is its flexibility, which makes it possible to model diverse real-world data sources with the same tool. In addition to the standard setting in which a set of attributes is observed for each object individually, we explain how message-passing inference in graphs representing whole systems of objects can be implemented in the framework. To support our claims, we solve three different problems from the cybersecurity domain using the framework. The first use case concerns IoT device identification from raw network observations. In the second problem, we study how snapshots of an operating system represented as directed graphs can be used to classify malicious binary files. The last example is the task of domain blacklist extension through modeling interactions between entities in the network. In all three problems, the solution based on the proposed framework achieves performance comparable to specialized approaches.
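The hierarchical multi-instance idea the framework builds on can be illustrated generically: a sample is a nested structure of bags, and each bag is reduced by an order-invariant aggregation of its embedded instances. The sketch below is plain Python for illustration only; it is not the HMill API, and the weight matrices and the "bag of flows of packets" interpretation are assumptions.

```python
# Nested bag-of-bags embedding with mean aggregation at every level.
import numpy as np

def embed_instance(x, W):
    return np.maximum(W @ x, 0.0)                       # per-instance feature map (ReLU layer)

def embed_bag(instances, W):
    if not instances:
        return np.zeros(W.shape[0])                     # empty bags map to a fixed vector
    return np.mean([embed_instance(x, W) for x in instances], axis=0)

def embed_sample(sample, W_inner, W_outer):
    """sample: list of bags, each bag a list of raw feature vectors (a bag of bags)."""
    inner = [embed_bag(bag, W_inner) for bag in sample]
    return embed_bag(inner, W_outer)

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((8, 4)), rng.standard_normal((8, 8))
device = [[rng.standard_normal(4) for _ in range(3)],   # e.g. one bag per network flow,
          [rng.standard_normal(4) for _ in range(5)]]   # instances are per-packet features
print(embed_sample(device, W1, W2).shape)  # (8,)
```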