Smart Paper Notes

Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor Problem

Breno Serrano , Stefan Minner , Maximilian Schiffer , Thibaut Vidal

Category: Machine Learning

2022-09-12

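We study the feature-based newsvendor problem, in which a decision-maker has access to historical data consisting of demand observations and exogenous features. In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance. So far, state-of-the-art methods rely on regularization, which penalizes the number of selected features or the norm of the solution vector. As an alternative, we introduce a novel bilevel programming formulation. The upper-level problem selects a subset of features that minimizes an estimate of the out-of-sample cost of ordering decisions, based on a held-out validation set. The lower-level problem learns the optimal coefficients of the decision function on a training set, using only the features selected by the upper level. We present a mixed-integer linear program reformulation of the bilevel program, which can be solved to optimality with standard optimization solvers. Our computational experiments show that the method accurately recovers the ground truth for instances with only a few hundred observations. In contrast, regularization-based techniques often fail at feature recovery or require thousands of observations to reach similar accuracy. Regarding out-of-sample generalization, we achieve improved or comparable cost performance.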
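As background for the abstract above, the feature-free special case of the newsvendor problem can be sketched in a few lines. This is not the paper's bilevel formulation: it only shows the cost function that both levels evaluate, and the well-known fact that the sample-optimal order is the empirical quantile at the critical ratio underage / (underage + overage). The function names and the demand history are hypothetical.

```python
import math


def newsvendor_cost(order, demand, underage, overage):
    """Cost of ordering `order` units when `demand` units are requested."""
    shortage = max(demand - order, 0.0)   # unmet demand, penalized by `underage`
    leftover = max(order - demand, 0.0)   # unsold units, penalized by `overage`
    return underage * shortage + overage * leftover


def average_cost(order, demands, underage, overage):
    """Average newsvendor cost of a single order quantity over a demand sample."""
    total = sum(newsvendor_cost(order, d, underage, overage) for d in demands)
    return total / len(demands)


def sample_optimal_order(demands, underage, overage):
    """Empirical quantile at the critical ratio; it minimizes the average sample cost."""
    ratio = underage / (underage + overage)
    ranked = sorted(demands)
    index = max(math.ceil(ratio * len(ranked)) - 1, 0)
    return ranked[index]


# Hypothetical demand history, for illustration only.
demands = [10, 20, 30, 40, 50]
order = sample_optimal_order(demands, underage=3.0, overage=1.0)
print(order, average_cost(order, demands, underage=3.0, overage=1.0))
```

In the feature-based setting, the single order quantity is replaced by a decision function of the features, and the same cost is evaluated on a training set (lower level) and on a validation set (upper level).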


Integrated Conditional Estimation-Optimization

Paul Grigas , Meng Qi , Zuo-Jun Shen

Category: Machine Learning (Statistics) | Machine Learning

2021-10-24

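Many real-world optimization problems involve uncertain parameters whose probability distributions can be estimated using contextual feature information. In contrast to the standard approach of first estimating the distribution of the uncertain parameters and then optimizing the objective based on that estimate, we propose an integrated conditional estimation-optimization (ICEO) framework that estimates the underlying conditional distribution of the random parameter while taking the structure of the optimization problem into account. We directly model the relationship between the conditional distribution of the random parameter and the contextual features, and then estimate the probabilistic model with an objective that is aligned with the downstream optimization problem. We show that our ICEO approach is asymptotically consistent under moderate regularity conditions and provide finite-sample performance guarantees in the form of generalization bounds. Computationally, estimation with the ICEO approach is a non-convex and often non-differentiable optimization problem. We propose a general methodology for approximating the potentially non-differentiable mapping from the estimated conditional distribution to the optimal decision by a differentiable function, which greatly improves the performance of gradient-based algorithms applied to the non-convex problem. We also provide a polynomial optimization solution approach for the semi-algebraic case. Numerical experiments demonstrate the empirical success of our approach in different settings, including limited data samples and model mismatch.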


Decomposition and Adaptive Sampling for Data-Driven Inverse Linear Optimization

Rishabh Gupta , Qi Zhang

Category: Machine Learning

2020-09-16

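This work addresses inverse linear optimization, where the goal is to infer the unknown cost vector of a linear program. Specifically, we consider the data-driven setting in which the available data are noisy observations of optimal solutions corresponding to different instances of the linear program. We introduce a new formulation of the problem that, compared to existing methods, allows the recovery of a less restrictive and generally more appropriate admissible set of cost estimates. It can be shown that this inverse optimization problem yields a finite number of solutions, and we develop an exact two-phase algorithm to determine all of them. Moreover, we propose an efficient decomposition algorithm to solve large instances of the problem. The algorithm extends naturally to an online learning setting, where it can be used to provide quick updates of the cost estimate as new data become available over time. For the online setting, we further develop an effective adaptive sampling strategy that guides the selection of the next samples. The efficacy of the proposed methods is demonstrated in computational experiments involving two applications: customer preference learning and cost estimation for production planning. The results show significant reductions in computation and sampling effort.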


Efficient Learning of Decision-Making Models: A Penalty Block Coordinate Descent Algorithm for Data-Driven Inverse Optimization

Rishabh Gupta , Qi Zhang

Category: Machine Learning

2022-10-27

Decision-making problems are commonly formulated as optimization problems, which are then solved to make optimal decisions. In this work, we consider the inverse problem where we use prior decision data to uncover the underlying decision-making process in the form of a mathematical optimization model. This statistical learning problem is referred to as data-driven inverse optimization. We focus on problems where the underlying decision-making process is modeled as a convex optimization problem whose parameters are unknown. We formulate the inverse optimization problem as a bilevel program and propose an efficient block coordinate descent-based algorithm to solve large problem instances. Numerical experiments on synthetic datasets demonstrate the computational advantage of our method compared to standard commercial solvers. Moreover, the real-world utility of the proposed approach is highlighted through two realistic case studies in which we consider estimating risk preferences and learning local constraint parameters of agents in a multiplayer Nash bargaining game.


Predictive Machine Learning of Objective Boundaries for Solving COPs

Helge Spieker , Arnaud Gotlieb

Category: Artificial Intelligence | Machine Learning

2021-11-04

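Solving constraint optimization problems (COPs) can be dramatically simplified by boundary estimation, that is, by providing tight boundaries on the cost function. By feeding a supervised machine learning (ML) model with data composed of known boundaries and features extracted from COPs, the model can be trained to estimate the boundaries of a new COP instance. In this paper, we first give an overview of the existing body of knowledge on ML for constraint programming (CP) that learns from problem instances. Second, we introduce a boundary estimation framework that is applied as a tool to support a CP solver. Within this framework, different ML models are discussed and evaluated regarding their suitability for boundary estimation, and countermeasures are shown against infeasible estimations that would prevent the solver from finding an optimal solution. Third, we present an experimental study on seven COPs with different CP solvers. Our results show that near-optimal boundaries can be learned for these COPs with only little overhead. These estimated boundaries reduce the objective domain size by 60-88% and can help the solver find near-optimal solutions early during search.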


Margin Optimal Classification Trees

Federico D'Onofrio , Giorgio Grani , Marta Monaci , Laura Palagi

Category: Machine Learning

2022-10-19

In recent years there has been growing attention to interpretable machine learning models which can give explanatory insights on their behavior. Thanks to their interpretability, decision trees have been intensively studied for classification tasks, and due to the remarkable advances in mixed-integer programming (MIP), various approaches have been proposed to formulate the problem of training an Optimal Classification Tree (OCT) as a MIP model. We present a novel mixed-integer quadratic formulation for the OCT problem, which exploits the generalization capabilities of Support Vector Machines for binary classification. Our model, denoted as Margin Optimal Classification Tree (MARGOT), encompasses the use of maximum margin multivariate hyperplanes nested in a binary tree structure. To enhance the interpretability of our approach, we analyse two alternative versions of MARGOT, which include feature selection constraints inducing local sparsity of the hyperplanes. First, MARGOT has been tested on non-linearly separable synthetic datasets in 2-dimensional feature space to provide a graphical representation of the maximum margin approach. Finally, the proposed models have been tested on benchmark datasets from the UCI repository. The MARGOT formulation turns out to be easier to solve than other OCT approaches, and the generated tree better generalizes on new observations. The two interpretable versions are effective in selecting the most relevant features and maintaining good prediction quality.


Data-Driven Sample Average Approximation with Covariate Information

Rohit Kannan , Güzin Bayraksan , James R. Luedtke

Category: Machine Learning (Statistics)

2022-07-27

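We study optimization for data-driven decision-making when we have observations of the uncertain parameters in the optimization model together with concurrent observations of covariates. Given a new covariate observation, the goal is to choose a decision that minimizes the expected cost conditioned on this observation. We investigate three data-driven frameworks that integrate a machine learning prediction model within a stochastic programming sample average approximation (SAA) to approximate the solution of this problem. Two of the SAA frameworks are new and use out-of-sample residuals of leave-one-out prediction models for scenario generation. The frameworks we investigate are flexible and accommodate parametric, nonparametric, and semiparametric regression techniques. We derive conditions on the data generation process, the prediction model, and the stochastic program under which solutions of these data-driven SAAs are consistent and asymptotically optimal, and we also derive convergence rates and finite-sample guarantees. Computational experiments validate our theoretical results, demonstrate the potential advantages of our data-driven formulations over existing approaches (even when the prediction model is misspecified), and illustrate the benefits of our new data-driven formulations in the limited-data regime.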
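The residuals-based idea in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses a one-dimensional least-squares line as the prediction model and in-sample residuals for brevity, whereas the two new frameworks in the paper use leave-one-out residuals. The data, the cost function, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one covariate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_scenarios(xs, ys, new_x):
    """Scenarios = point prediction at new_x plus each in-sample residual."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    prediction = intercept + slope * new_x
    return [prediction + r for r in residuals]


def saa_decision(scenarios, candidates, cost):
    """Pick the candidate decision with the lowest average cost over the scenarios."""
    return min(candidates, key=lambda z: sum(cost(z, s) for s in scenarios) / len(scenarios))


def cost(decision, demand):
    """Hypothetical newsvendor-style cost: shortage costs 3 per unit, surplus 1."""
    return 3.0 * max(demand - decision, 0.0) + 1.0 * max(decision - demand, 0.0)


# Hypothetical covariate and demand history, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 2.0, 4.0, 8.0]
scenarios = residual_scenarios(xs, ys, new_x=4.0)
decision = saa_decision(scenarios, candidates=scenarios, cost=cost)
print(scenarios, decision)
```

Restricting the candidates to the scenario values is a convenience that works for this piecewise-linear cost; a general stochastic program would be handed to a solver instead.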


Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning

Felipe Nazare , Alexandre Street

Category: Machine Learning | Machine Learning (Statistics)

2021-10-07

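The solution of multistage stochastic linear problems (MSLP) is a challenge in many applications. Long-term hydrothermal dispatch planning (LHDP) materializes this challenge in a real-world problem that affects electricity markets, economies, and natural resources worldwide. No closed-form solutions are available for MSLP, and defining non-anticipative policies of high quality is crucial. Linear decision rules (LDR) provide an interesting simulation-based framework for finding high-quality policies for MSLP through two-stage stochastic models. In practical applications, however, the number of parameters to be estimated when using an LDR may be close to or higher than the number of scenarios of the sample average approximation problem, which leads to in-sample overfitting and poor performance in out-of-sample simulations. In this paper, we propose a novel regularized LDR to solve MSLP, based on the AdaLASSO (adaptive least absolute shrinkage and selection operator). The goal is to use the parsimony principle, as studied in high-dimensional linear regression models, to obtain better out-of-sample performance for an LDR applied to MSLP. Computational experiments show that the threat of overfitting is non-negligible when using the classical non-regularized LDR to solve the LHDP, one of the most studied MSLPs with relevant applications in industry. Our analysis highlights the following benefits of the proposed framework compared to the non-regularized benchmark: 1) a significant reduction in the number of non-zero coefficients (model parsimony), 2) a substantial cost reduction in out-of-sample evaluations, and 3) improved spot-price profiles.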


Surgical Scheduling via Optimization and Machine Learning with Long-Tailed Data

Yuan Shi , Saied Mahdian , Jose Blanchet , Peter Glynn , Andrew Y. Shin , David Scheinker

Category: Machine Learning

2022-02-13

Using data from cardiovascular surgery patients with long and highly variable post-surgical lengths of stay (LOS), we develop a modeling framework to reduce recovery unit congestion. We estimate the LOS and its probability distribution using machine learning models, schedule procedures on a rolling basis using a variety of optimization models, and estimate performance with simulation. The machine learning models achieved only modest LOS prediction accuracy, despite access to a very rich set of patient characteristics. Compared to the current paper-based system used in the hospital, most optimization models failed to reduce congestion without increasing wait times for surgery. A conservative stochastic optimization with sufficient sampling to capture the long tail of the LOS distribution outperformed the current manual process and other stochastic and robust optimization approaches. These results highlight the perils of using oversimplified distributional models of LOS for scheduling procedures and the importance of using optimization methods well-suited to dealing with long-tailed behavior.


Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Bernd Bischl , Martin Binder , Michel Lang , Tobias Pielok , Jakob Richter , Stefan Coors , Janek Thomas , Theresa Ullmann , Marc Becker , Anne-Laure Boulesteix

Category: Machine Learning (Statistics) | Machine Learning

2021-07-13

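Most machine learning algorithms are configured by one or more hyperparameters that must be chosen carefully and that often have a considerable impact on performance. To avoid a time-consuming and irreproducible manual trial-and-error process for finding well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods can be employed, for example, methods based on resampling error estimation for supervised machine learning. After introducing HPO, this paper reviews important HPO methods such as grid or random search, evolutionary algorithms, Bayesian optimization, Hyperband, and racing. It gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with ML pipelines, runtime improvements, and parallelization. This work is accompanied by an appendix that contains information on specific software packages in R and Python, as well as information on and recommended hyperparameter search spaces for specific learning algorithms. We also provide notebooks that demonstrate concepts from this work as supplementary files.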
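Of the methods listed in the abstract above, random search is the simplest to write down. The sketch below is a minimal illustration under stated assumptions, not code from the paper's notebooks: the search space, the hyperparameter names, and the toy objective (which stands in for a resampling-based validation error) are all hypothetical.

```python
import random


def random_search(objective, space, budget, seed=0):
    """Sample `budget` configurations uniformly from `space` and keep the best one.

    `space` maps each hyperparameter name to a (low, high) interval;
    `objective` returns a validation error to be minimized.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(budget):
        config = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_validation_error(config):
    """Hypothetical stand-in for a cross-validated error estimate."""
    return (config["log10_learning_rate"] + 2.0) ** 2 + (config["dropout"] - 0.1) ** 2


space = {"log10_learning_rate": (-5.0, 0.0), "dropout": (0.0, 0.5)}
best_config, best_score = random_search(toy_validation_error, space, budget=200, seed=1)
print(best_config, best_score)
```

Sampling the learning rate on a log scale is a common convention for scale-type hyperparameters; here it is encoded by searching over the base-10 exponent.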


Minimax risk classifiers with 0-1 loss

Santiago Mazuelas , Mauricio Romero , Peter Grünwald

Category: Machine Learning (Statistics) | Machine Learning

2022-01-17

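Supervised classification techniques use training samples to learn a classification rule with small expected 0-1 loss (error probability). Conventional methods enable tractable learning and provide out-of-sample generalization by using surrogate losses instead of the 0-1 loss and by considering specific families of rules (hypothesis classes). This paper presents minimax risk classifiers (MRCs), which minimize the worst-case 0-1 loss over general classification rules and provide tight performance guarantees at learning. We show that MRCs are strongly universally consistent when using feature mappings given by characteristic kernels. The paper also proposes efficient optimization techniques for MRC learning and shows that the proposed methods can provide accurate classification together with tight performance guarantees in practice.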


Mixed-Integer Optimization with Constraint Learning

Donato Maragno , Holly Wiberg , Dimitris Bertsimas , S. Ilker Birbil , Dick den Hertog , Adejuyigbe Fajemisin

Category: Machine Learning | Machine Learning (Statistics)

2021-11-04

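We establish a broad methodological foundation for mixed-integer optimization with learned constraints. We propose an end-to-end pipeline for data-driven decision making in which constraints and objectives are learned directly from data using machine learning, and the trained models are embedded in an optimization formulation. We exploit the mixed-integer optimization representability of many machine learning methods, including linear models, decision trees, ensembles, and multi-layer perceptrons. Considering multiple methods allows us to capture various underlying relationships between decisions, contextual variables, and outcomes. We also characterize a decision trust region using the convex hull of the observations, to ensure credible recommendations and avoid extrapolation. We incorporate this representation efficiently using column generation and clustering. In combination with domain-driven constraints and objective terms, the embedded models and the trust region define a mixed-integer optimization problem for prescription generation. We implement this framework as a Python package (OptiCL) for practitioners. We demonstrate the method in both chemotherapy optimization and World Food Programme planning. The case studies illustrate the benefit of the framework in generating high-quality prescriptions, the value added by the trust region, the incorporation of multiple machine learning methods, and the inclusion of multiple learned constraints.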


On data-driven chance constraint learning for mixed-integer optimization problems

Antonio Alcántara , Carlos Ruiz

Category: Machine Learning (Statistics)

2022-07-08

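When dealing with real-world optimization problems, decision-makers usually face high levels of uncertainty associated with partial information, unknown parameters, or complex relationships between these and the problem's decision variables. In this work, we develop a novel Chance Constraint Learning (CCL) methodology, with a focus on mixed-integer linear optimization problems, that combines ideas from the chance constraint and constraint learning literature. Chance constraints set a probabilistic confidence level for a single constraint or a set of constraints to be fulfilled, whereas constraint learning aims to model the functional relationship between the problem variables through predictive models. One of the main issues arises when we need to set further bounds on the response variable of a learned constraint: the fulfillment of these bounds is directly related to the accuracy of the predictive model and its probabilistic behavior. In this sense, CCL makes use of linearizable machine learning models to estimate conditional quantiles of the learned variables, providing a data-driven solution for chance constraints. Open-access software has been developed for use by practitioners. Furthermore, the benefits of CCL have been tested in two real-world case studies, showing how robustness is added to optimal solutions when probabilistic bounds are set for learned constraints.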


Learning Optimal Solutions via an LSTM-Optimization Framework

Dogacan Yilmaz , İ. Esra Büyüktahtakın

Category: Machine Learning

2022-07-06

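In this study, we present a deep learning-optimization framework to tackle dynamic mixed-integer programs. Specifically, we develop a bidirectional Long Short-Term Memory (LSTM) framework that can process information forward and backward in time to learn optimal solutions to sequential decision-making problems. We demonstrate our approach by predicting the optimal decisions for the single-item capacitated lot-sizing problem (CLSP), where a binary variable denotes whether or not to produce in a period. Due to the dynamic nature of the problem, the CLSP can be treated as a sequence labeling task in which a recurrent neural network can capture the problem's temporal dynamics. Computational results show that our LSTM-Optimization (LSTM-Opt) framework significantly reduces the solution time of benchmark CLSP problems without much loss in feasibility and optimality. For example, for over 240,000 test instances, predictions at the 85% level reduce the CPLEX solution time by a factor of 9 on average, with an optimality gap of less than 0.05% and 0.4% infeasibility. Moreover, models trained on shorter planning horizons can successfully predict the optimal solutions of instances with longer planning horizons. For the hardest data set, LSTM predictions at the 25% level reduce the solution time from 70 CPU hours to less than 2 CPU minutes, with an optimality gap of 0.8% and without any infeasibility. The LSTM-Opt framework outperforms classical ML algorithms, such as logistic regression and random forest, in terms of solution quality, and exact approaches, such as the ($\ell$, S) and dynamic programming-based inequalities, in terms of solution time improvement. Our machine learning approach could be beneficial for tackling sequential decision-making problems similar to the CLSP that need to be solved repeatedly, frequently, and quickly.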


Data-driven Prediction of Relevant Scenarios for Robust Combinatorial Optimization

Marc Goerigk , Jannis Kurtz

Category: Machine Learning

2022-03-30

We study iterative methods for (two-stage) robust combinatorial optimization problems with discrete uncertainty. We propose a machine-learning-based heuristic to determine starting scenarios that provide strong lower bounds. To this end, we design dimension-independent features and train a Random Forest Classifier on small-dimensional instances. Experiments show that our method improves the solution process for larger instances than contained in the training set and also provides a feature importance-score which gives insights into the role of scenario properties.


Sinkhorn Distributionally Robust Optimization

Jie Wang , Rui Gao , Yao Xie

Category: Machine Learning | Machine Learning (Statistics)

2021-09-24

We study distributionally robust optimization (DRO) with Sinkhorn distance -- a variant of Wasserstein distance based on entropic regularization. We provide convex programming dual reformulation for a general nominal distribution. Compared with Wasserstein DRO, it is computationally tractable for a larger class of loss functions, and its worst-case distribution is more reasonable. We propose an efficient first-order algorithm with bisection search to solve the dual reformulation. We demonstrate that our proposed algorithm finds $\delta$-optimal solution of the new DRO formulation with computation cost $\tilde{O}(\delta^{-3})$ and memory cost $\tilde{O}(\delta^{-2})$, and the computation cost further improves to $\tilde{O}(\delta^{-2})$ when the loss function is smooth. Finally, we provide various numerical examples using both synthetic and real data to demonstrate its competitive performance and light computational speed.
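The Sinkhorn distance mentioned in the abstract above is an entropy-regularized optimal transport cost. The sketch below shows the standard Sinkhorn scaling iteration between two small discrete distributions; it illustrates the distance only, not the paper's DRO dual reformulation or its first-order algorithm. The regularization strength `epsilon`, the iteration count, and the example data are assumptions chosen for illustration.

```python
import math


def sinkhorn_plan(a, b, cost, epsilon=0.5, iterations=500):
    """Entropy-regularized transport plan between discrete distributions a and b.

    Alternately rescales the rows and columns of the Gibbs kernel
    exp(-cost / epsilon) until the plan's marginals match a and b.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of moving mass according to `plan`."""
    return sum(p * c for plan_row, cost_row in zip(plan, cost) for p, c in zip(plan_row, cost_row))


# Two hypothetical distributions on the same two points, with unit cost to move between them.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn_plan(a, b, cost)
print(plan, transport_cost(plan, cost))
```

Smaller values of `epsilon` bring the plan closer to the unregularized optimal transport plan, at the price of slower convergence and possible numerical underflow in the kernel.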


Multi-Objective Hyperparameter Optimization -- An Overview

Florian Karl , Tobias Pielok , Julia Moosbauer , Florian Pfisterer , Stefan Coors , Martin Binder , Lennart Schneider , Janek Thomas , Jakob Richter , Michel Lang

Category: Machine Learning | Machine Learning (Statistics)

2022-06-15

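Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and the corresponding preprocessing steps often yield optimal performance only when hyperparameters are properly tuned. In many applications, however, we are not interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, which results in a multi-objective optimization problem. This is often neglected in practice, due to a lack of knowledge and of readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies from the domains of evolutionary algorithms and Bayesian optimization. We illustrate the utility of multi-objective optimization (MOO) in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability, and robustness.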
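The core notion behind multi-objective hyperparameter optimization is Pareto dominance: a configuration is kept only if no other configuration is at least as good in every objective and strictly better in one. A minimal sketch, assuming both objectives are minimized and using hypothetical (validation error, prediction time) pairs:

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the non-dominated points, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (validation error, prediction time in ms) results of five configurations.
results = [(0.10, 5), (0.12, 3), (0.15, 3), (0.20, 1), (0.11, 6)]
print(pareto_front(results))
```

This quadratic-time filter is adequate for the few hundred configurations a typical tuning run evaluates; the survey's evolutionary and Bayesian methods are concerned with how to propose the configurations in the first place.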


Interpretable and Fair Boolean Rule Sets via Column Generation

Connor Lawless , Sanjeeb Dash , Oktay Gunluk , Dennis Wei

Category: Machine Learning | Artificial Intelligence

2021-11-16

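This paper considers the learning of Boolean rules in either disjunctive normal form (DNF, OR-of-ANDs, equivalent to decision rule sets) or conjunctive normal form (CNF, AND-of-ORs) as an interpretable model for classification. An integer program is formulated to optimally trade classification accuracy for rule simplicity. We also consider the fairness setting and extend the formulation to include explicit constraints on two different measures of classification parity: equality of opportunity and equalized odds. Column generation (CG) is used to efficiently search over an exponential number of candidate clauses (conjunctions or disjunctions) without the need for heuristic rule mining. This approach also bounds the gap between the selected rule set and the best possible rule set on the training data. To handle large datasets, we propose an approximate CG algorithm using randomization. Compared to three recently proposed alternatives, the CG algorithm dominates the accuracy-simplicity trade-off in 8 out of 16 datasets. When maximized for accuracy, CG is competitive with rule learners designed for this purpose, sometimes finding significantly simpler solutions that are no less accurate. Compared to other fair and interpretable classifiers, our method is able to find rule sets that meet stricter notions of fairness with a modest trade-off in accuracy.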


Machine Learning for Combinatorial Optimization: a Methodological Tour d'Horizon

Yoshua Bengio , Andrea Lodi , Antoine Prouvost

Category:

2018-11-15

This paper surveys the recent attempts, both from the machine learning and operations research communities, at leveraging machine learning to solve combinatorial optimization problems. Given the hard nature of these problems, state-of-the-art algorithms rely on handcrafted heuristics for making decisions that are otherwise too expensive to compute or mathematically not well defined. Thus, machine learning looks like a natural candidate to make such decisions in a more principled and optimized way. We advocate for pushing further the integration of machine learning and combinatorial optimization and detail a methodology to do so. A main point of the paper is seeing generic optimization problems as data points and inquiring what is the relevant distribution of problems to use for learning on a given task.


Support Vector Machines with the Hard-Margin Loss: Optimal Training via Combinatorial Benders' Cuts

Ítalo Santana , Breno Serrano , Maximilian Schiffer , Thibaut Vidal

Category: Machine Learning

2022-07-15

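The classical hinge-loss support vector machine (SVM) model is sensitive to outlier observations because its loss function is unbounded. To circumvent this issue, recent studies have focused on non-convex loss functions, such as the hard-margin loss, which associates a constant penalty with any misclassified or within-margin sample. Applying this loss function brings much-needed robustness to critical applications, but it also leads to an NP-hard model that makes training difficult: current exact optimization algorithms show limited scalability, whereas heuristics are not able to find high-quality solutions consistently. Against this background, we propose new integer programming strategies that significantly improve our ability to train the hard-margin SVM model to global optimality. We introduce an iterative sampling and decomposition approach, in which smaller subproblems are used to separate combinatorial Benders' cuts. These cuts, used within a branch-and-cut algorithm, permit much faster convergence toward a global optimum. Through extensive numerical analyses on classical benchmark data sets, our solution algorithm solves 117 new data sets to optimality for the first time and achieves a 50% reduction in the average optimality gap for the hardest data sets of the benchmark.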
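The contrast between the two losses described above can be made concrete with a short sketch. It assumes labels in {-1, +1} and a real-valued classifier score, and it normalizes the constant penalty of the hard-margin loss to 1; the function names are illustrative and not taken from the paper.

```python
def hinge_loss(label, score):
    """Classical hinge loss: grows without bound as the sample moves to the wrong side."""
    return max(0.0, 1.0 - label * score)


def hard_margin_loss(label, score):
    """Constant penalty for any sample that is misclassified or inside the margin."""
    return 1.0 if label * score < 1.0 else 0.0


# A correctly classified sample, a within-margin sample, and a far-away outlier.
for label, score in [(1, 2.0), (1, 0.5), (1, -100.0)]:
    print(label, score, hinge_loss(label, score), hard_margin_loss(label, score))
```

A single outlier therefore contributes arbitrarily much to the hinge objective but at most one unit to the hard-margin objective, which is the source of both the robustness and the combinatorial difficulty that the abstract refers to.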
