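Machine learning (ML) methods have become a useful tool for tackling vehicle routing problems, either in combination with popular heuristics or as standalone models. However, current methods generalize poorly to problems of different sizes or different distributions. As a result, ML in vehicle routing has gone through an expansion phase in which new methods are created for particular problem instances and become infeasible at larger problem sizes. This paper aims to encourage consolidation of the field by understanding and improving an existing model, namely the attention model of Kool et al. We identify two categories of discrepancy that affect VRP generalization. The first stems from differences inherent to the problems themselves; the second relates to architectural weaknesses that limit the model's ability to generalize. Our contribution is threefold. First, we target model discrepancies by adapting the method of Kool et al. and its loss function to sparse dynamic attention based on the alpha-entmax activation. Second, we target inherent differences with a mixed-instance training method, which has been shown to outperform single-instance training in certain scenarios. Finally, we introduce a framework for inference-level data augmentation that improves performance by exploiting the model's lack of invariance to rotation and dilation.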
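The inference-level augmentation idea can be illustrated with a minimal sketch. It assumes a solver that maps 2D coordinates to a visiting order and is not invariant to rotation; the function names, the unit-square centre, and the toy `sweep_solver` (which simply visits nodes by increasing x-coordinate) are hypothetical stand-ins for a trained model, not the authors' implementation. Each rotated and dilated copy of the instance is solved, and every resulting tour is scored on the original coordinates.

```python
import math

def tour_length(coords, tour):
    """Length of the closed tour that visits `coords` in the order given by `tour`."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def transform(coords, angle, scale):
    """Rotate by `angle` radians about (0.5, 0.5), then dilate by `scale` about the same point."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y in coords:
        dx, dy = x - 0.5, y - 0.5
        out.append((0.5 + scale * (c * dx - s * dy), 0.5 + scale * (s * dx + c * dy)))
    return out

def sweep_solver(coords):
    """Hypothetical stand-in for a trained model: visit nodes by increasing x.
    Its output changes under rotation (but not dilation), which is what the demo needs."""
    return sorted(range(len(coords)), key=lambda i: (coords[i][0], i))

def augmented_inference(coords, solver, angles, scales):
    """Solve every augmented copy and keep the tour that is shortest on the ORIGINAL instance."""
    best_tour, best_len = None, math.inf
    for angle in angles:
        for scale in scales:
            tour = solver(transform(coords, angle, scale))
            length = tour_length(coords, tour)  # always score on the untransformed coordinates
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len
```

Because the identity transform (angle 0, scale 1) can be included among the candidates, the augmented result is never worse than a single forward pass; the cost is one extra solver call per augmented copy.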
Translated by Google Translate
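This work presents solutions to the traveling salesperson problem with precedence constraints (TSPPC) using deep reinforcement learning (DRL), by adapting recent approaches that work well for the regular TSP. Common to these approaches is the use of graph models based on multi-head attention (MHA) layers. One idea for solving the pickup and delivery problem (PDP) is to use heterogeneous attention to embed the different roles each node can take. In this work, we generalize this concept of heterogeneous attention to the TSPPC. Furthermore, we adapt recent ideas for sparsifying attention to obtain better scalability. Overall, we contribute to the research community by applying and evaluating recent DRL methods for solving the TSPPC.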
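Combinatorial optimization problems are encountered in many practical contexts, such as logistics and production, but exact solutions are particularly hard to find and the problems are usually NP-hard for considerable problem sizes. To compute approximate solutions, a zoo of generic and problem-specific variants of local search is commonly used. However, deciding which variant to apply to which particular problem is difficult even for experts. In this paper, we identify three independent algorithmic aspects of such local search algorithms and formalize their sequential selection over the course of an optimization process as a Markov decision process (MDP). We design a deep graph neural network as the policy model for this MDP, yielding a learned controller for local search called NeuroLS. Ample experimental evidence shows that NeuroLS is able to outperform both well-known general-purpose local search controllers from operations research and the latest machine-learning-based approaches.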
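Learning to solve combinatorial optimization problems, such as the vehicle routing problem, offers great computational advantages over classical operations research solvers and heuristics. Recently developed deep reinforcement learning approaches either iteratively improve an initially given solution or sequentially construct a set of individual tours. However, most existing learning-based approaches cannot work with a fixed number of vehicles, and thus bypass the complex problem of assigning customers to an a priori given number of available vehicles. This makes them less suitable for real-world applications, since many logistics service providers rely on solutions for a specific bounded fleet size and cannot accommodate short-term changes in the number of vehicles. In contrast, we propose a powerful supervised deep learning framework that constructs a complete tour plan while respecting an a priori fixed number of available vehicles. Combined with an efficient post-processing scheme, our supervised approach is not only much faster and easier to train, but also achieves competitive results that incorporate the practical aspect of vehicle costs. In thorough controlled experiments, we compare our method with multiple state-of-the-art approaches, demonstrating stable performance while using fewer vehicles, and we shed some light on the experimental protocols of related work.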
The analysis of network structure is essential to many scientific areas, ranging from biology to sociology. As the computational task of clustering these networks into partitions, i.e., solving the community detection problem, is generally NP-hard, heuristic solutions are indispensable. The search for effective heuristics has led to particularly promising approaches in the emerging technology of quantum computing. Motivated by the substantial hardware demands of all established quantum community detection approaches, we introduce a novel QUBO-based approach that needs only as many qubits as the graph has nodes and is represented by a QUBO matrix as sparse as the input graph's adjacency matrix. This substantial improvement in the sparsity of the QUBO matrix, which is typically very dense in related work, is achieved through the novel concept of separation nodes. Instead of assigning every node to a community directly, the approach identifies a separation-node set whose removal from the graph yields a set of connected components, which represent the core components of the communities. A greedy heuristic then assigns the nodes of the separation-node set to the identified community cores, and the subsequent experimental results provide a proof of concept. This work thus presents a promising approach to NISQ-ready quantum community detection, catalyzing the application of quantum computers to the network structure analysis of large-scale, real-world problem instances.
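The classical post-processing described above can be sketched as follows. The sketch takes a separation-node set as given (in the paper it comes from the QUBO), removes it, treats the remaining connected components as community cores, and then assigns the separation nodes greedily. The abstract does not specify the greedy rule, so the one used here (attach each separation node to the community holding most of its already-labelled neighbours) is an assumption, as are all function names.

```python
from collections import deque

def connected_components(adj, removed):
    """Connected components of the graph after deleting the nodes in `removed`.
    `adj` maps each node to the set of its neighbours."""
    seen, components = set(removed), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(component)
    return components

def assign_separation_nodes(adj, separation_nodes):
    """Label community cores, then greedily attach each separation node (assumed rule)."""
    cores = connected_components(adj, separation_nodes)
    label = {node: i for i, core in enumerate(cores) for node in core}
    next_label = len(cores)
    pending = set(separation_nodes)

    def strongest_link(node):
        # Count already-labelled neighbours per community; ties go to the lower label.
        counts = {}
        for nb in adj[node]:
            if nb in label:
                counts[label[nb]] = counts.get(label[nb], 0) + 1
        if not counts:
            return 0, None
        community = min(counts, key=lambda c: (-counts[c], c))
        return counts[community], community

    while pending:
        # Handle the most strongly connected pending node first.
        node = min(pending, key=lambda n: (-strongest_link(n)[0], n))
        _, community = strongest_link(node)
        if community is None:  # no labelled neighbour: open a new community
            community, next_label = next_label, next_label + 1
        label[node] = community
        pending.remove(node)
    return label
```

For two triangles joined through a single bridging node, passing that node as the separation set yields the two triangles as cores, and the bridging node joins whichever triangle it shares more edges with.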
Efficient surrogate modelling is a key requirement for uncertainty quantification in data-driven scenarios. This work describes a novel approach that uses Sparse Random Features for surrogate modelling in combination with self-supervised dimensionality reduction. The method is compared with other methods on synthetic data and on real data obtained from crashworthiness analyses. The results show that the approach described here outperforms the state-of-the-art surrogate modelling techniques Polynomial Chaos Expansions and Neural Networks.
In the era of noisy intermediate-scale quantum devices, variational quantum circuits (VQCs) are currently one of the main strategies for building quantum machine learning models. These models are made up of a quantum part and a classical part. The quantum part is given by a parametrization $U$, which, in general, is obtained from the product of different quantum gates. The classical part, in turn, corresponds to an optimizer that updates the parameters of $U$ in order to minimize a cost function $C$. However, despite the many applications of VQCs, there are still questions to be answered, such as: What is the best sequence of gates to use? How should their parameters be optimized? Which cost function should be used? How does the architecture of the quantum chips influence the final results? In this article, we focus on answering the last question. We show that, in general, the cost function tends to a typical average value the closer the parametrization is to a $2$-design. Therefore, the closer the parametrization is to a $2$-design, the less the result of the quantum neural network model depends on its parametrization. As a consequence, we can use the architecture of the quantum chips itself to define the VQC parametrization, avoiding additional swap gates and thus reducing the VQC depth and the associated errors.
Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting.
This short paper discusses continually updated causal abstractions as a potential direction of future research. The key idea is to revise the existing level of causal abstraction to a different level of detail that is both consistent with the history of observed data and more effective in solving a given task.
State-of-the-art poetry generation systems are often complex. They either consist of task-specific model pipelines, incorporate prior knowledge in the form of manually created constraints, or both. In contrast, end-to-end models would not suffer from the overhead of having to model prior knowledge and could learn the nuances of poetry from data alone, reducing the degree of human supervision required. In this work, we investigate end-to-end poetry generation conditioned on styles such as rhyme, meter, and alliteration. We identify a lack of training data and mismatched tokenization algorithms as possible limitations of past attempts, and we address both. In particular, we successfully pre-train and release ByGPT5, a new token-free decoder-only language model, and fine-tune it on a large custom corpus of English and German quatrains annotated with our styles. We show that ByGPT5 outperforms other models such as mT5, ByT5, GPT-2 and ChatGPT, while also being more parameter-efficient and performing favorably compared to humans. In addition, we analyze its runtime performance and introspect the model's understanding of style conditions. We make our code, models, and datasets publicly available.