In this paper we survey the primary research, both theoretical and applied, in the area of Robust Optimization (RO). Our focus is on the computational attractiveness of RO approaches, as well as the modeling power and broad applicability of the methodology. In addition to surveying prominent theoretical results of RO, we also present some recent results linking RO to adaptable models for multi-stage decision-making problems. Finally, we highlight applications of RO across a wide spectrum of domains, including finance, statistics, learning, and various areas of engineering.
Stochastic programming can effectively describe many decision-making problems in uncertain environments. Unfortunately, such programs are often computationally demanding to solve. In addition, their solution can be misleading when there is ambiguity in the choice of a distribution for the random parameters. In this paper, we propose a model that describes uncertainty in both the distribution form (discrete, Gaussian, exponential, etc.) and moments (mean and covariance matrix). We demonstrate that for a wide range of cost functions the associated distributionally robust (or min-max) stochastic program can be solved efficiently. Furthermore, by deriving a new confidence region for the mean and the covariance matrix of a random vector, we provide probabilistic arguments for using our model in problems that rely heavily on historical data. These arguments are confirmed in a practical example of portfolio selection, where our framework leads to better performing policies on the "true" distribution underlying the daily returns of financial assets.
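One ingredient of such min-max models can be sketched concretely: when only the mean is uncertain, lying in a Euclidean ball around the empirical mean, the worst-case expectation of a linear cost has a closed form. The sketch below uses hypothetical numbers and is not the paper's model (which also handles covariance and distribution-form ambiguity); it checks the closed form against a crude sampling search.

```python
import numpy as np

# Worst case of a linear cost <mu, x> over means mu in a ball of radius rho
# around mu_hat: the maximizer is mu_hat + rho * x / ||x||, giving the
# closed form mu_hat' x + rho * ||x||_2.
rng = np.random.default_rng(6)
mu_hat = np.array([0.5, -0.2, 0.1])   # illustrative empirical mean
x = np.array([1.0, 2.0, -1.0])        # illustrative decision vector
rho = 0.3                             # radius of the ambiguity ball

closed_form = mu_hat @ x + rho * np.linalg.norm(x)

# Crude numerical confirmation: sample many means on the sphere of radius rho.
best = -np.inf
for _ in range(20000):
    d = rng.normal(size=3)
    d = d / np.linalg.norm(d) * rho
    best = max(best, (mu_hat + d) @ x)
```

The sampled maximum approaches the closed form from below, illustrating why such worst-case expectations stay tractable for linear costs.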
This paper surveys recent theoretical advances in convex optimization approaches for community detection. We introduce some important theoretical techniques and results for establishing the consistency of convex community detection under various statistical models. In particular, we discuss the basic techniques based on primal and dual analysis. We also present results that demonstrate several distinctive advantages of convex community detection, including robustness against outlier nodes, consistency under weak assortativity, and adaptivity to heterogeneous degrees. This survey is not intended to be a complete overview of the vast literature on this fast-growing topic. Instead, we aim to provide a big picture of the remarkable recent developments in this area and to make the survey accessible to a broad audience. We hope that this expository article can serve as an introductory guide for readers interested in using, designing, and analyzing convex relaxation methods in network analysis.
The stochastic block model (SBM) is a popular framework for studying community detection in networks. This model is limited by the assumption that all nodes in the same community are statistically equivalent and have equal expected degrees. The degree-corrected stochastic block model (DCSBM) is a natural extension of SBM that allows for degree heterogeneity within communities. This paper proposes a convexified modularity maximization approach for estimating the hidden communities under DCSBM. Our approach is based on a convex programming relaxation of the classical (generalized) modularity maximization formulation, followed by a novel doubly-weighted 1-norm k-median procedure. We establish non-asymptotic theoretical guarantees for both approximate clustering and perfect clustering. Our approximate clustering results are insensitive to the minimum degree, and hold even in the sparse regime with bounded average degrees. In the special case of SBM, these theoretical results match the best-known performance guarantees of computationally feasible algorithms. Numerically, we provide an efficient implementation of our algorithm, which is applied to both synthetic and real-world networks. Experimental results show that our method enjoys competitive performance compared to the state of the art in the literature.
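The modularity matrix at the heart of such relaxations is easy to write down. The toy sketch below (a hypothetical 6-node graph, not the paper's code) builds the (generalized) modularity matrix and reads the two communities off the sign pattern of its leading eigenvector, a simpler spectral surrogate for the full convex program.

```python
import numpy as np

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

d = A.sum(axis=1)                   # degrees
m = d.sum() / 2                     # number of edges
B = A - np.outer(d, d) / (2 * m)    # modularity matrix (resolution parameter 1)

# The convex relaxation searches over a spectahedron instead of partitions;
# here we only inspect the leading eigenvector of B, whose sign pattern
# recovers the two planted communities in this toy example.
w, V = np.linalg.eigh(B)
labels = (V[:, -1] > 0).astype(int)
```

By construction the rows of B sum to zero, and the leading eigenvector splits the two triangles regardless of its (arbitrary) global sign.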
We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added 1-norm penalty term. The problem as formulated is convex but the memory requirements and complexity of existing interior point methods are prohibitive for problems with more than tens of nodes. We present two new algorithms for solving problems with at least a thousand nodes in the Gaussian case. Our first algorithm uses block coordinate descent, and can be interpreted as recursive 1-norm penalized regression. Our second algorithm, based on Nesterov's first order method, yields a complexity estimate with a better dependence on problem size than existing interior point methods. Using a log determinant relaxation of the log partition function (Wainwright and Jordan, 2006), we show that these same algorithms can be used to solve an approximate sparse maximum likelihood problem for the binary case. We test our algorithms on synthetic data, as well as on gene expression and senate voting records data.
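A minimal sketch of the penalized-likelihood idea, assuming a tiny 3-node Gaussian and a plain proximal-gradient solver (not the block coordinate descent or Nesterov-type methods of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 3-node Gaussian whose precision matrix is sparse
# (no edge between nodes 0 and 2).
Theta_true = np.array([[2.0, 0.6, 0.0],
                       [0.6, 2.0, 0.6],
                       [0.0, 0.6, 2.0]])
Sigma = np.linalg.inv(Theta_true)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=5000)
S = np.cov(X, rowvar=False)         # empirical covariance

def objective(Theta, lam):
    """Penalized negative log-likelihood:
    -logdet(Theta) + tr(S Theta) + lam * ||off-diagonal of Theta||_1."""
    off = np.abs(Theta).sum() - np.abs(np.diag(Theta)).sum()
    return -np.linalg.slogdet(Theta)[1] + np.trace(S @ Theta) + lam * off

# Proximal-gradient sketch: gradient of the smooth part is S - inv(Theta);
# the prox of the l1 term is entrywise soft-thresholding (off-diagonal only).
lam, step = 0.05, 0.05
off_mask = ~np.eye(3, dtype=bool)
Theta = np.eye(3)
for _ in range(500):
    G = S - np.linalg.inv(Theta)
    T = Theta - step * G
    T[off_mask] = np.sign(T[off_mask]) * np.maximum(np.abs(T[off_mask]) - step * lam, 0.0)
    Theta = (T + T.T) / 2           # keep the iterate symmetric
```

The estimate stays positive definite, improves on the identity initializer, and shrinks the absent (0,2) interaction more than the present (0,1) one.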
This paper considers the problem of clustering a partially observed unweighted graph---i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters, and sparse across clusters. We take a novel yet natural approach to this problem, by focusing on finding the clustering that minimizes the number of "disagreements"---i.e., the sum of the number of (observed) missing edges within clusters, and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) low-rank matrix and an (unknown) sparse matrix from their partially observed sum. We evaluate the performance of our algorithm on the classical Planted Partition/Stochastic Block Model. Our main theorem provides sufficient conditions for the success of our algorithm as a function of the minimum cluster size, edge density and observation probability; in particular, the results characterize the tradeoff between the observation probability and the edge density gap. When there are a constant number of clusters of equal size, our results are optimal up to logarithmic factors.
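The reduction can be illustrated in miniature: the ideal cluster matrix is low rank (its rank equals the number of clusters) and the disagreements form a sparse matrix. A toy construction (not the recovery algorithm itself):

```python
import numpy as np

# Ideal cluster matrix Y: Y_ij = 1 iff nodes i and j share a cluster.
# Two clusters of three nodes each, plus a few "disagreement" entries.
labels = np.array([0, 0, 0, 1, 1, 1])
Y = (labels[:, None] == labels[None, :]).astype(float)   # low-rank part

S = np.zeros_like(Y)                                     # sparse disagreements
S[0, 4] = S[4, 0] = 1.0      # an observed edge across clusters
S[1, 2] = S[2, 1] = -1.0     # an observed missing edge inside a cluster

A_obs = Y + S                # observed adjacency-like matrix = low rank + sparse
rank_Y = np.linalg.matrix_rank(Y)
```

Recovering Y (hence the clustering) from A_obs, with some entries additionally unobserved, is exactly the low-rank-plus-sparse decomposition problem the abstract describes.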
Graph clustering involves the task of dividing nodes into clusters, so that the edge density is higher within clusters as opposed to across clusters. A natural, classic and popular statistical setting for evaluating solutions to this problem is the stochastic block model, also referred to as the planted partition model. In this paper we present a new algorithm, a convexified version of Maximum Likelihood, for graph clustering. We show that, in the classic stochastic block model setting, it outperforms existing methods by polynomial factors when the cluster size is allowed to have general scalings. In fact, it is within logarithmic factors of known lower bounds for spectral methods, and there is evidence suggesting that no polynomial time algorithm would do significantly better. We then show that this guarantee carries over to a more general extension of the stochastic block model. Our method can handle the settings of semi-random graphs, heterogeneous degree distributions, unequal cluster sizes, unaffiliated nodes, partially observed graphs, planted clique/coloring, etc. In particular, our results provide the best exact recovery guarantees to date for the planted partition, planted k-disjoint-cliques and planted noisy coloring models with general cluster sizes; in other settings, we match the best existing results up to logarithmic factors.
To address difficult optimization problems, convex relaxations based on semidefinite programming are now commonplace in many fields. Although solvable in polynomial time, large semidefinite programs tend to be computationally challenging. Over a decade ago, exploiting the fact that in many applications of interest the desired solutions are low rank, Burer and Monteiro proposed a heuristic to solve such semidefinite programs by restricting the search space to low-rank matrices. The accompanying theory does not explain the extent of the empirical success. We focus on Synchronization and Community Detection problems and provide theoretical guarantees shedding light on the remarkable efficiency of this heuristic.
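A minimal sketch of the heuristic on a MaxCut-type SDP with a unit-diagonal constraint: factor the PSD variable as Y Yᵀ with only a few columns and run projected gradient ascent on unit-norm rows. The cost matrix, rank, and step size below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

# SDP surrogate: maximize tr(W X) over X >= 0 with X_ii = 1.
# Burer-Monteiro: set X = Y Y^T with Y of shape (n, r), r small, and keep
# each row of Y on the unit sphere (so that diag(Y Y^T) = 1).
n, r = 10, 3
W = rng.normal(size=(n, n))
W = (W + W.T) / 2                   # symmetric cost matrix

def objective(Y):
    return np.trace(W @ (Y @ Y.T))

Y = rng.normal(size=(n, r))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
obj0 = objective(Y)

step = 0.01
for _ in range(2000):
    G = 2 * W @ Y                   # gradient of tr(W Y Y^T) in Y
    Y = Y + step * G                # ascent step ...
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # ... then re-project rows
obj1 = objective(Y)
```

The iterate stays feasible (unit rows) and improves the objective over the random initializer; the cited theory concerns when such low-rank factorizations have no spurious local optima.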
We present a new algorithm for overcomplete independent component analysis (ICA), where the number of latent sources k exceeds the dimension p of the observed variables. Previous algorithms either suffer from high computational complexity or make strong assumptions about the form of the mixing matrix. Our algorithm does not make any sparsity assumption, yet enjoys favorable computational and theoretical properties. The algorithm consists of two main steps: (a) estimating the Hessians of the cumulant generating function (as opposed to the fourth- and higher-order cumulants used by most algorithms) and (b) a new semidefinite programming (SDP) relaxation for recovering a column of the mixing matrix. We show that this relaxation can be solved efficiently with a projected accelerated gradient descent method, which makes the whole algorithm computationally practical. Moreover, we conjecture that the proposed procedure recovers the mixing components at the rate k < p^2/4, and prove that a mixing component can be recovered with high probability when k < (2 - epsilon) p log p and the original components are sampled uniformly at random from the hypersphere. Experiments are provided on synthetic data and the CIFAR-10 dataset of real images.
In data-driven inverse optimization an observer aims to learn the preferences of an agent who solves a parametric optimization problem depending on an exogenous signal. Thus, the observer seeks the agent's objective function that best explains a historical sequence of signals and corresponding optimal actions. We focus here on situations where the observer has imperfect information, that is, where the agent's true objective function is not contained in the search space of candidate objectives, where the agent suffers from bounded rationality or implementation errors, or where the observed signal-response pairs are corrupted by measurement noise. We formalize this inverse optimization problem as a distributionally robust program minimizing the worst-case risk that the predicted decision (i.e., the decision implied by a particular candidate objective) differs from the agent's actual response to a random signal. We show that our framework offers rigorous out-of-sample guarantees for different loss functions used to measure prediction errors and that the emerging inverse optimization problems can be exactly reformulated as (or safely approximated by) tractable convex programs when a new suboptimality loss function is used. We show through extensive numerical tests that the proposed distributionally robust approach to inverse optimization often attains better out-of-sample performance than the state-of-the-art approaches.
We perform a finite sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade-off between statistical and computational performance.
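The sparse eigenvalue statistic itself is simple to state: the largest eigenvalue over all k-by-k principal submatrices. A brute-force sketch, feasible only in tiny dimensions, which is exactly why the paper needs a convex relaxation:

```python
import numpy as np
from itertools import combinations

def sparse_eigenvalue(S, k):
    """Largest eigenvalue of any k x k principal submatrix of S, i.e. the
    maximum of v' S v over unit vectors v with at most k nonzeros.
    Brute force over supports: exponential in p, viable only for small p."""
    p = S.shape[0]
    best = -np.inf
    for idx in combinations(range(p), k):
        sub = S[np.ix_(idx, idx)]
        best = max(best, np.linalg.eigvalsh(sub)[-1])
    return best

# Hypothetical toy check: a spiked covariance with a 2-sparse principal component.
v = np.zeros(6)
v[:2] = 1 / np.sqrt(2)
Sigma = np.eye(6) + 3.0 * np.outer(v, v)   # spike of strength 3 on support {0, 1}
stat = sparse_eigenvalue(Sigma, 2)
```

On the spiked toy matrix the statistic equals 1 + 3 = 4, attained exactly on the planted support.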
A common assumption in supervised machine learning is that the training examples provided to the learning algorithm are statistically identical to the instances encountered later on, during the classification phase. This assumption is unrealistic in many real-world situations where machine learning techniques are used. We focus on the case where features of a binary classification problem, which were available during the training phase, are either deleted or become corrupted during the classification phase. We prepare for the worst by assuming that the subset of deleted and corrupted features is controlled by an adversary, and may vary from instance to instance. We design and analyze two novel learning algorithms that anticipate the actions of the adversary and account for them when training a classifier. Our first technique formulates the learning problem as a linear program. We discuss how the particular structure of this program can be exploited for computational efficiency and we prove statistical bounds on the risk of the resulting classifier. Our second technique addresses the robust learning problem by combining a modified version of the Perceptron algorithm with an online-to-batch conversion technique, and also comes with statistical generalization guarantees. We demonstrate the effectiveness of our approach with a set of experiments.
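The inner worst case is easy to evaluate for a fixed linear classifier: an adversary deleting at most k features removes the ones helping the margin most. A hypothetical toy evaluation (not the paper's linear program, which optimizes the classifier against this adversary):

```python
import numpy as np

def robust_margin(w, x, y, k):
    """Margin y * <w, x> after an adversary zeroes up to k features.
    The adversary deletes the features contributing most positively to the
    margin (never the harmful ones, which would only help the classifier)."""
    contrib = y * (w * x)               # per-feature margin contribution
    drop = np.sort(contrib)[-k:]        # the k largest contributions
    return contrib.sum() - drop[drop > 0].sum()

# Illustrative example: feature 1 carries most of the margin.
w = np.array([1.0, 2.0, -1.0, 0.5])
x = np.array([1.0, 1.0, 1.0, 1.0])
margin_clean = 1.0 * (w @ x)            # unperturbed margin
margin_adv = robust_margin(w, x, 1.0, k=1)
```

A training objective that sums such worst-case margins over examples is what makes the resulting learning problem a linear program.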
This paper is concerned with the optimal distributed control (ODC) problem for discrete-time deterministic and stochastic systems. The objective is to design a fixed-order distributed controller with a pre-specified structure that is globally optimal with respect to a quadratic cost functional. It is shown that this NP-hard problem has a quadratic formulation, which can be relaxed to a semidefinite program (SDP). If the SDP relaxation has a rank-1 solution, a globally optimal distributed controller can be recovered from this solution. By utilizing the notion of treewidth, it is proved that the nonlinearity of the ODC problem appears in such a sparse way that an SDP relaxation of this problem has a matrix solution with rank at most 3. Since the proposed SDP relaxation is computationally expensive for a large-scale system, a computationally cheap SDP relaxation is also developed with the property that its objective function indirectly penalizes the rank of the SDP solution. Various techniques are proposed to approximate a low-rank SDP solution with a rank-1 matrix, leading to recovering a near-global controller together with a bound on its optimality degree. The above results are developed for both finite-horizon and infinite-horizon ODC problems. While the finite-horizon ODC is investigated using a time-domain formulation, the infinite-horizon ODC problem for both deterministic and stochastic systems is studied via a Lyapunov formulation. The SDP relaxations developed in this work are exact for the design of a centralized controller, hence serving as an alternative for solving Riccati equations. The efficacy of the proposed SDP relaxations is elucidated in numerical examples.
This paper addresses the optimal control problem known as the Linear Quadratic Regulator in the case when the dynamics are unknown. We propose a multi-stage procedure, called Coarse-ID control, that estimates a model from a few experimental trials, estimates the error in that model with respect to the truth, and then designs a controller using both the model and uncertainty estimate. Our technique uses contemporary tools from random matrix theory to bound the error in the estimation procedure. We also employ a recently developed approach to control synthesis called System Level Synthesis that enables robust control design by solving a quasiconvex optimization problem. We provide end-to-end bounds on the relative error in control cost that are optimal in the number of parameters and that highlight salient properties of the system to be controlled such as closed-loop sensitivity and optimal control magnitude. We show experimentally that the Coarse-ID approach enables efficient computation of a stabilizing controller in regimes where simple control schemes that do not take the model uncertainty into account fail to stabilize the true system.
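The first stage, estimating a model from experimental trials, can be sketched as ordinary least squares on state transitions. The system matrices and noise level below are illustrative, and the uncertainty quantification and robust synthesis stages are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-state, 1-input linear system x_{t+1} = A x_t + B u_t + noise.
A_true = np.array([[0.9, 0.2],
                   [0.0, 0.8]])
B_true = np.array([[0.0],
                   [1.0]])

# Collect one excited trajectory: random inputs, small process noise.
X, U, Xn = [], [], []
x = np.zeros(2)
for _ in range(200):
    u = rng.normal(size=1)
    x_next = A_true @ x + B_true @ u + 0.01 * rng.normal(size=2)
    X.append(x); U.append(u); Xn.append(x_next)
    x = x_next

# Least squares: stack regressors z_t = [x_t, u_t] and solve Xn ~ Z C,
# so that C' = [A_hat, B_hat].
Z = np.hstack([np.array(X), np.array(U)])
C, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
AB = C.T
A_hat, B_hat = AB[:, :2], AB[:, 2:]
```

With persistently exciting inputs and small noise, the estimates land close to the true matrices; the paper's contribution is bounding this estimation error and propagating it through the controller design.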
We study the computational complexity of approximating the 2-to-q norm of linear operators (defined as ‖A‖_{2→q} = max_{v≠0} ‖Av‖_q / ‖v‖_2) for q > 2, as well as connections between this question and issues arising in quantum information theory and the study of Khot's Unique Games Conjecture (UGC). We show the following: 1. For any constant even integer q ≥ 4, a graph G is a small-set expander if and only if the projector onto the span of the top eigenvectors of G's adjacency matrix has bounded 2 → q norm. As a corollary, a good approximation to the 2 → q norm will refute the Small-Set Expansion Conjecture, a close variant of the UGC. We also show that such a good approximation can be computed in exp(n^{2/q}) time, thus obtaining a different proof of the known subexponential algorithm for Small-Set Expansion. 2. Constant rounds of the "Sum of Squares" semidefinite programming hierarchy certify an upper bound on the 2 → 4 norm of the projector onto low-degree polynomials over the Boolean cube, as well as certify the unsatisfiability of the "noisy cube" and "short code" based instances of Unique Games considered by prior works. This improves on the previous upper bound of exp(log^{O(1)} n) rounds (for the "short code"), as well as separates the "Sum of Squares"/"Lasserre" hierarchy from weaker hierarchies that were known to require ω(1) rounds. 3. We show reductions between computing the 2 → 4 norm and computing the injective tensor norm of a tensor, a problem with connections to quantum information theory. Three corollaries are: (i) the 2 → 4 norm is NP-hard to approximate to precision inverse-polynomial in the dimension, (ii) the 2 → 4 norm does not have a good approximation (in the sense above) unless 3-SAT can be solved in time exp(√n polylog(n)), and (iii) known algorithms for the quantum separability problem imply a non-trivial additive approximation for the 2 → 4 norm.
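The norm in question can at least be probed numerically. Below is a crude random-search lower bound for ‖A‖_{2→q} (not one of the certified algorithms discussed above), sanity-checked on a diagonal matrix where the norm is attained at a standard basis vector:

```python
import numpy as np

def norm_2_to_q(A, q, iters=2000, seed=0):
    """Estimate ||A||_{2->q} = max_{v != 0} ||Av||_q / ||v||_2 by random
    search over unit vectors. This only yields a lower bound; certifying
    upper bounds is the hard problem the abstract is about."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(iters):
        v = rng.normal(size=A.shape[1])
        v /= np.linalg.norm(v)
        best = max(best, np.linalg.norm(A @ v, ord=q))
    return best

# For diag(3, 1) and q = 4 the maximum is attained at e_1, giving norm 3.
est = norm_2_to_q(np.diag([3.0, 1.0]), 4)
```

Random search approaches the true value 3 from below in this two-dimensional example; in high dimensions it degrades quickly, which is one way to see why convex certificates matter.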
We consider the least-squares regression problem with regularization by a block 1-norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1-norm where all spaces have dimension one, where it is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of the group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for nonlinear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite-dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non-adaptive scheme is not satisfied.
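The block 1-norm enters algorithms through its proximal operator, group-wise soft-thresholding, which shrinks each group's Euclidean norm and zeroes out small groups entirely. A minimal sketch:

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 (the block 1-norm of the
    group Lasso): shrink each group's Euclidean norm by lam; groups whose
    norm is below lam are set exactly to zero."""
    out = np.zeros_like(w)
    for g in groups:
        nrm = np.linalg.norm(w[g])
        if nrm > lam:
            out[g] = (1 - lam / nrm) * w[g]
    return out

# Illustrative vector with two groups: one large (kept, shrunk), one small (zeroed).
w = np.array([3.0, 4.0, 0.3, 0.4])
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(w, groups, lam=1.0)
```

The first group has norm 5, so it is scaled by 1 - 1/5 = 0.8; the second has norm 0.5 < 1 and vanishes, which is exactly the group-level sparsity the penalty induces.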
We present a Distributionally Robust Optimization (DRO) approach to estimate a robustified regression plane in a linear regression setting, when the observed samples are potentially contaminated with adversarially corrupted outliers. Our approach mitigates the impact of outliers by hedging against a family of probability distributions on the observed data, some of which assign very low probabilities to the outliers. The set of distributions under consideration are close to the empirical distribution in the sense of the Wasserstein metric. We show that this DRO formulation can be relaxed to a convex optimization problem which encompasses a class of models. By selecting proper norm spaces for the Wasserstein metric, we are able to recover several commonly used regularized regression models. We provide new insights into the regularization term and give guidance on the selection of the regularization coefficient from the standpoint of a confidence region. We establish two types of performance guarantees for the solution to our formulation under mild conditions. One is related to its out-of-sample behavior (prediction bias), and the other concerns the discrepancy between the estimated and true regression planes (estimation bias). Extensive numerical results demonstrate the superiority of our approach to a host of regression models, in terms of the prediction and estimation accuracies. We also consider the application of our robust learning procedure to outlier detection, and show that our approach achieves a much higher AUC (Area Under the ROC Curve) than M-estimation (Huber, 1964, 1973).
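The regularized form such Wasserstein DRO problems reduce to can be sketched as an absolute-loss regression plus a norm penalty on the coefficients. The subgradient solver, data, and the particular l1 penalty below are illustrative; the exact norm pairing depends on the ground metric chosen for the Wasserstein ball.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical regression data with a few adversarial outliers.
n, p = 200, 3
beta_true = np.array([1.0, -2.0, 0.0])
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)
y[:5] += 15.0                        # corrupt 5 of 200 responses

def objective(beta, lam):
    """Absolute (l1) regression loss plus an l1 coefficient penalty,
    a stand-in for the norm-regularized surrogate of the DRO problem."""
    return np.abs(y - X @ beta).mean() + lam * np.linalg.norm(beta, 1)

# Plain subgradient descent on the convex, nonsmooth objective.
lam, step = 0.01, 0.02
beta = np.zeros(p)
for _ in range(3000):
    r = np.sign(X @ beta - y)
    g = X.T @ r / n + lam * np.sign(beta)
    beta -= step * g
```

The absolute loss caps each outlier's influence at its sign, so the fit stays near the true plane despite the corrupted responses, which is the robustness the DRO hedge formalizes.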
This paper attempts to bridge the conceptual gap between two widely used approaches based on positive definite kernels: on one side, Bayesian learning and inference with Gaussian processes, and on the other, the common kernel methods built on reproducing kernel Hilbert spaces. It is well known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, which makes it difficult to transfer results seamlessly between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side and juxtapose algorithmic quantities in each framework to highlight the similarities. We also discuss subtle philosophical and theoretical differences between the two approaches.
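The equivalence mentioned above can be checked in a few lines: with the kernel ridge regularization parameter set equal to the Gaussian process noise variance, the two predictors coincide. Data and hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 1-D regression data.
Xtr = rng.uniform(-2, 2, size=(15, 1))
ytr = np.sin(Xtr[:, 0]) + 0.1 * rng.normal(size=15)
Xte = np.linspace(-2, 2, 7)[:, None]

def gauss_kernel(A, B, ell=0.7):
    """Gaussian (RBF) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

sigma2 = 0.01                        # GP noise variance = KRR regularization
K = gauss_kernel(Xtr, Xtr)
Ks = gauss_kernel(Xte, Xtr)

# GP posterior mean: k(x, X) (K + sigma2 I)^{-1} y.
gp_mean = Ks @ np.linalg.solve(K + sigma2 * np.eye(15), ytr)

# Kernel ridge regression with lambda = sigma2: alpha = (K + lambda I)^{-1} y,
# predictions k(x, X) alpha -- the same linear system, hence the same output.
alpha = np.linalg.solve(K + sigma2 * np.eye(15), ytr)
krr_pred = Ks @ alpha
```

Both predictors solve the identical linear system, so they agree to machine precision; the interesting differences between the frameworks lie elsewhere (e.g., in uncertainty quantification).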
We consider the problem of incorporating extra knowledge when estimating sparse Gaussian graphical models (sGGMs) from aggregated samples, a task that arises frequently in bioinformatics and neuroimaging applications. Previous joint sGGM estimators either fail to use existing knowledge or cannot scale to many tasks (large K) in the high-dimensional (large p) situation. In this paper, we propose a novel Joint Elementary Estimator incorporating additional Knowledge (JEEK) to infer multiple related sparse Gaussian graphical models from large-scale heterogeneous data. Using domain knowledge as weights, we design a novel hybrid norm as the minimization objective to enforce the superposition of two weighted sparsity constraints, one on the shared interactions and one on the task-specific structural patterns. This enables JEEK to elegantly accommodate various forms of existing knowledge based on the domain at hand, without the need to design knowledge-specific optimizations. JEEK is solved through a fast and entry-wise parallelizable solution that largely improves the computational efficiency of the state of the art from O(p^5 K^4) to O(p^2 K^4). We conduct a rigorous statistical analysis showing that JEEK achieves the same convergence rate O(log(Kp)/n_tot) as the state-of-the-art estimators that are computationally more expensive. Empirically, on multiple synthetic datasets and two real-world datasets, JEEK significantly outperforms the state of the art in speed while achieving the same level of prediction accuracy. Available as the R tool "jeek".
Phase retrieval seeks to recover a signal x from the amplitude |Ax| of linear measurements. We cast the phase retrieval problem as a non-convex quadratic program over a complex phase vector and formulate a tractable relaxation (called PhaseCut) similar to the classical MaxCut semidefinite program. We solve this problem using a provably convergent block coordinate descent algorithm whose structure is similar to that of the original greedy algorithm in Gerchberg-Saxton, where each iteration is a matrix-vector product. Numerical results show the performance of this approach over three different phase retrieval problems, in comparison with greedy phase retrieval algorithms and matrix completion formulations.
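The greedy baseline that PhaseCut convexifies can be sketched directly: alternately impose the measured amplitudes and project back onto the range of A (a Gerchberg-Saxton-style alternating projection). The toy random instance below is illustrative, and recovery is only defined up to a global phase.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy complex instance: 40 amplitude-only measurements of an 8-dim signal.
m, n = 40, 8
A = (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))) / np.sqrt(2)
x_true = rng.normal(size=n) + 1j * rng.normal(size=n)
b = np.abs(A @ x_true)               # amplitude-only measurements

def res(v):
    """Amplitude residual ||  |Av| - b  ||_2; non-increasing along the iteration."""
    return np.linalg.norm(np.abs(A @ v) - b)

Ap = np.linalg.pinv(A)
x = rng.normal(size=n) + 1j * rng.normal(size=n)   # random initializer
res_init = res(x)
for _ in range(500):
    z = A @ x
    # Keep the current phase, impose the measured amplitude, project to range(A).
    x = Ap @ (b * z / np.maximum(np.abs(z), 1e-12))
res_final = res(x)

# Align the unavoidable global phase before comparing with the truth.
phase = np.vdot(x, x_true)
phase /= abs(phase)
err = np.linalg.norm(x * phase - x_true) / np.linalg.norm(x_true)
```

Because each sweep is an alternating projection between the amplitude set and the range of A, the residual never increases; the iteration can still stall at local minima, which is the failure mode the convex PhaseCut relaxation avoids.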