Smart Paper Notes

Distributionally Robust Graph Learning from Smooth Signals under Moment Uncertainty

Xiaolu Wang, Yuen-Man Pun, Anthony Man-Cho So

Category: Machine Learning

2021-05-12

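We consider the problem of learning a graph from a finite set of noisy graph signal observations, where the goal is to find a smooth representation of the graph signals. This problem is motivated by the desire to infer relational structure in large datasets and has been studied extensively in recent years. Most existing approaches focus on learning a graph on which the observed signals are smooth. However, the learned graph is prone to overfitting, as it does not take unobserved signals into account. To address this issue, we propose a novel graph learning model based on the distributionally robust optimization methodology, which aims to identify a graph that not only provides a smooth representation of the observed signals but is also robust against uncertainties in them. On the statistics side, we establish out-of-sample performance guarantees for our proposed model. On the optimization side, we show that, under a mild assumption on the graph signal distribution, our proposed model admits a smooth non-convex optimization formulation. We then develop a projected gradient method to tackle this formulation and establish its convergence guarantees. Our formulation provides a new perspective on regularization in the graph learning setting. Moreover, extensive numerical experiments on both synthetic and real-world data show that, according to various metrics, our model performs comparably to existing non-robust models yet is more robust across different populations of observed signals.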
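
A minimal sketch of the smoothness notion behind this line of work, assuming the standard Laplacian quadratic form $x^\top L x = \sum_{i<j} w_{ij}(x_i - x_j)^2$ as the smoothness measure (the abstract does not state the measure explicitly). The function names and the toy path graph are illustrative and not taken from the paper.

```python
def laplacian(weights):
    """Build the combinatorial Laplacian L = D - W from a symmetric
    weight matrix given as nested lists (zero diagonal assumed)."""
    n = len(weights)
    lap = [[-weights[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entry is the weighted degree of node i.
        lap[i][i] = sum(weights[i][j] for j in range(n) if j != i)
    return lap


def smoothness(weights, signal):
    """Return x^T L x; small values mean the signal varies little
    across strongly connected nodes."""
    lap = laplacian(weights)
    n = len(signal)
    return sum(signal[i] * lap[i][j] * signal[j]
               for i in range(n) for j in range(n))


# Toy example: a path graph on three nodes with unit edge weights.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
constant_value = smoothness(path, [2, 2, 2])  # constant signal: 0
spiky_value = smoothness(path, [0, 1, 0])     # two unit jumps: 2
```

A graph learning method of this kind searches over `weights` so that the observed signals yield small values of this quantity; the distributionally robust model in the paper additionally hedges against uncertainty in the signal moments, which this sketch does not attempt.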

Translated by Google Translate

The Performance of Wasserstein Distributionally Robust M-Estimators in High Dimensions

Liviu Aolaritei, Soroosh Shafieezadeh-Abadeh, Florian Dörfler

Categories: Machine Learning (Statistics) | Machine Learning

2022-06-27

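Wasserstein distributionally robust optimization has recently emerged as a powerful framework for robust estimation, enjoying good out-of-sample performance guarantees, well-understood regularization effects, and computationally tractable dual reformulations. In this framework, the estimator is obtained by minimizing the worst-case expected loss over all probability distributions that are close, in the Wasserstein sense, to the empirical distribution. In this paper, we propose a Wasserstein distributionally robust M-estimation framework for estimating an unknown parameter from noisy linear measurements, and we focus on the important and challenging task of analyzing the squared error performance of such estimators. Our study is carried out in the modern high-dimensional proportional regime, where both the ambient dimension and the number of samples go to infinity at a proportional rate that encodes the under- or over-parametrization of the problem. Under an isotropic Gaussian features assumption, we show that the squared error can be recovered as the solution of a convex-concave optimization problem which, surprisingly, involves at most four scalar variables. To the best of our knowledge, this is the first work to study this problem in the context of Wasserstein distributionally robust M-estimation.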

Optimal Convex and Nonconvex Regularizers for a Data Source

Oscar Leong, Eliza O'Reilly, Yong Sheng Soh, Venkat Chandrasekaran

Category: Machine Learning (Statistics)

2022-12-27

In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment the objective with a regularizer to address challenges associated with ill-posedness. The choice of a suitable regularizer is typically driven by prior domain information and computational considerations. Convex regularizers are attractive as they are endowed with certificates of optimality as well as the toolkit of convex analysis, but exhibit a computational scaling that makes them ill-suited beyond moderate-sized problem instances. On the other hand, nonconvex regularizers can often be deployed at scale, but do not enjoy the certification properties associated with convex regularizers. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what are the optimal regularizers, both convex and nonconvex, for data drawn from the distribution? What properties of a data source govern whether it is amenable to convex regularization? We address these questions for the class of continuous and positively homogeneous regularizers for which convex and nonconvex regularizers correspond, respectively, to convex bodies and star bodies. By leveraging dual Brunn-Minkowski theory, we show that a radial function derived from a data distribution is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization. Using tools such as $\Gamma$-convergence, we show that our results are robust in the sense that the optimal regularizers for a sample drawn from a distribution converge to their population counterparts as the sample size grows large. Finally, we give generalization guarantees that recover previous results for polyhedral regularizers (i.e., dictionary learning) and lead to new ones for semidefinite regularizers.

Sinkhorn Distributionally Robust Optimization

Jie Wang, Rui Gao, Yao Xie

Categories: Machine Learning | Machine Learning (Statistics)

2021-09-24

We study distributionally robust optimization (DRO) with the Sinkhorn distance, a variant of the Wasserstein distance based on entropic regularization. We provide a convex programming dual reformulation for a general nominal distribution. Compared with Wasserstein DRO, it is computationally tractable for a larger class of loss functions, and its worst-case distribution is more reasonable. We propose an efficient first-order algorithm with bisection search to solve the dual reformulation. We demonstrate that our proposed algorithm finds a $\delta$-optimal solution of the new DRO formulation with computation cost $\tilde{O}(\delta^{-3})$ and memory cost $\tilde{O}(\delta^{-2})$, and the computation cost further improves to $\tilde{O}(\delta^{-2})$ when the loss function is smooth. Finally, we provide various numerical examples using both synthetic and real data to demonstrate its competitive performance and fast computation.
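
For readers unfamiliar with the entropic regularization mentioned above, the following is a minimal sketch of the classical Sinkhorn iterations for the entropy-regularized optimal transport plan between two discrete distributions. It illustrates only the underlying distance, not the paper's DRO dual reformulation or its bisection-based algorithm; the function names, the regularization value and the toy data are assumptions made for illustration.

```python
import math


def sinkhorn_plan(a, b, cost, reg, n_iters=500):
    """Approximate the entropic-regularized transport plan between
    histograms a and b by alternating row and column rescaling."""
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / reg).
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)]
              for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)]
            for i in range(n)]


def transport_cost(plan, cost):
    """Linear transport cost <P, C> of a plan."""
    return sum(p * c for p_row, c_row in zip(plan, cost)
               for p, c in zip(p_row, c_row))


# Toy example: two points at unit distance, mass moved from the first
# point to the second.
a = [0.5, 0.5]
b = [0.25, 0.75]
C = [[0.0, 1.0], [1.0, 0.0]]
P = sinkhorn_plan(a, b, C, reg=0.5)
```

With a larger `reg` the plan spreads mass more evenly; as `reg` shrinks the plan approaches an unregularized optimal transport plan, at the price of slower and numerically more delicate iterations.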

Controlling Wasserstein distances by Kernel norms with application to Compressive Statistical Learning

Titouan Vayer, Rémi Gribonval

Categories: Machine Learning (Statistics) | Machine Learning

2021-12-01

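Comparing probability distributions is at the core of many machine learning algorithms. Maximum mean discrepancies (MMD) and optimal transport (OT) distances are two classes of distances between probability measures that have attracted abundant attention in recent years. This paper establishes conditions under which the Wasserstein distance can be controlled by MMD norms. Our work is motivated by compressive statistical learning (CSL) theory, a general framework for resource-efficient large-scale learning in which the training data is summarized in a single vector (called a sketch) that captures the information relevant to the learning task under consideration. Inspired by existing results in CSL, we introduce the Hölder Lower Restricted Isometric Property (Hölder LRIP) and show that this property comes with interesting guarantees for compressive statistical learning. Based on the relations between the MMD and the Wasserstein distance, we provide guarantees for compressive statistical learning by introducing and studying the concept of Wasserstein learnability of the learning task, that is, the situation where some task-specific metric between probability distributions can be bounded by a Wasserstein distance.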
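
As a concrete reference point for the MMD discussed above, here is a minimal sketch of the biased (V-statistic) estimate of the squared MMD between two samples. The Gaussian kernel, its bandwidth and the function names are assumptions chosen for illustration; the abstract does not fix a particular kernel.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as lists."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of MMD^2: the squared RKHS distance between
    the empirical kernel mean embeddings of the two samples."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth)
                    for u in us for v in vs)
        return total / (len(us) * len(vs))

    return (mean_kernel(xs, xs) + mean_kernel(ys, ys)
            - 2.0 * mean_kernel(xs, ys))


# Toy example: two well-separated one-dimensional samples.
sample_p = [[0.0], [1.0]]
sample_q = [[3.0], [4.0]]
gap = mmd_squared(sample_p, sample_q)
```

Because the estimate only needs averages of kernel evaluations, it can be computed from a fixed-size summary of the data, which is the property that makes MMD-type metrics natural in sketching-based learning.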

Sparse Graph Learning Under Laplacian-Related Constraints

Jitendra K. Tugnait

Categories: Machine Learning (Statistics) | Machine Learning

2021-11-16

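We consider the problem of learning a sparse undirected graph underlying a given set of multivariate data. We focus on graph Laplacian-related constraints on the sparse precision matrix, which encodes conditional dependence between the random variables associated with the graph nodes. Under these constraints, the off-diagonal elements of the precision matrix are non-positive (total positivity), and the precision matrix may not be full-rank. We investigate modifications to widely used penalized log-likelihood approaches that enforce total positivity but not the Laplacian structure. The graph Laplacian can then be extracted from the off-diagonal part of the precision matrix. An alternating direction method of multipliers (ADMM) algorithm is presented and analyzed for constrained optimization under Laplacian-related constraints with lasso as well as adaptive lasso penalties. Numerical results based on synthetic data show that the proposed constrained adaptive lasso approach significantly outperforms existing Laplacian-based approaches. We also evaluate our approach on real financial data.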

On Asymptotic Linear Convergence of Projected Gradient Descent for Constrained Least Squares

Trung Vu, Raviv Raich

Category: Machine Learning

2021-12-22

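Many recent problems in signal processing and machine learning, such as compressed sensing, image restoration, matrix/tensor recovery, and non-negative matrix factorization, can be cast as constrained optimization. Projected gradient descent (PGD) is a simple yet efficient method for solving such constrained optimization problems. Local convergence analysis furthers our understanding of its asymptotic behavior near the solution, offering sharper bounds on the convergence rate than global convergence analysis. However, local guarantees often appear scattered across problem-specific areas of machine learning and signal processing. This manuscript presents a unified framework for the local convergence analysis of projected gradient descent in the context of constrained least squares. The proposed analysis offers insights into pivotal local convergence properties such as the conditions for linear convergence, the region of convergence, the exact asymptotic rate of convergence, and a bound on the number of iterations needed to reach a given level of accuracy. To demonstrate the applicability of the proposed approach, we present a recipe for the convergence analysis of PGD and demonstrate it via a beginning-to-end application of the recipe to four fundamental problems, namely linearly constrained least squares, sparse recovery, least squares with a unit-norm constraint, and matrix completion.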
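
The projected gradient descent iteration discussed above can be sketched for one of the four listed problems, least squares with a unit-norm constraint. This is an illustrative sketch rather than the paper's analysis or code: the function names, the fixed step size and the toy data are assumptions.

```python
import math


def project_unit_sphere(x):
    """Euclidean projection onto the unit sphere (x must be nonzero)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def pgd_least_squares(A, b, x0, step, project, n_iters=200):
    """Minimize 0.5 * ||A x - b||^2 over a constraint set by
    alternating a gradient step with a projection onto the set."""
    rows, cols = len(A), len(A[0])
    x = project(x0)
    for _ in range(n_iters):
        residual = [sum(A[i][j] * x[j] for j in range(cols)) - b[i]
                    for i in range(rows)]
        # Gradient of the objective is A^T (A x - b).
        grad = [sum(A[i][j] * residual[i] for i in range(rows))
                for j in range(cols)]
        x = project([x[j] - step * grad[j] for j in range(cols)])
    return x


# Toy example: with A = I the constrained minimizer is b / ||b||,
# here (0.6, 0.8).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 4.0]
x_hat = pgd_least_squares(A, b, x0=[1.0, 0.0], step=0.5,
                          project=project_unit_sphere)
```

Swapping `project_unit_sphere` for another projection (for example, clipping negative entries to zero for a nonnegativity constraint) changes the constraint set without touching the rest of the loop, which is why a unified local analysis of this iteration is attractive.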

Variable Clustering via Distributionally Robust Nodewise Regression

Kaizheng Wang, Xiao Xu, Xun Yu Zhou

Category: Machine Learning

2022-12-15

We study a multi-factor block model for variable clustering and connect it to the regularized subspace clustering by formulating a distributionally robust version of the nodewise regression. To solve the latter problem, we derive a convex relaxation, provide guidance on selecting the size of the robust region, and hence the regularization weighting parameter, based on the data, and propose an ADMM algorithm for implementation. We validate our method in an extensive simulation study. Finally, we propose and apply a variant of our method to stock return data, obtain interpretable clusters that facilitate portfolio selection and compare its out-of-sample performance with other clustering methods in an empirical study.

Asymptotics of Network Embeddings Learned via Subsampling

Andrew Davison, Morgane Austern

Categories: Machine Learning (Statistics) | Machine Learning

2021-07-06

Network data are ubiquitous in modern machine learning, with tasks of interest including node classification, node clustering and link prediction. A frequent approach begins by learning a Euclidean embedding of the network, to which algorithms developed for vector-valued data are applied. For large networks, embeddings are learned using stochastic gradient methods where the sub-sampling scheme can be freely chosen. Despite the strong empirical performance of such methods, they are not well understood theoretically. Our work encapsulates representation methods using a subsampling approach, such as node2vec, into a single unifying framework. We prove, under the assumption that the graph is exchangeable, that the distribution of the learned embedding vectors asymptotically decouples. Moreover, we characterize the asymptotic distribution and provide rates of convergence, in terms of the latent parameters, which include the choice of loss function and the embedding dimension. This provides a theoretical foundation to understand what the embedding vectors represent and how well these methods perform on downstream tasks. Notably, we observe that typically used loss functions may lead to shortcomings, such as a lack of Fisher consistency.

A Non-Asymptotic Framework for Approximate Message Passing in Spiked Models

Gen Li, Yuting Wei

Categories: Machine Learning | Machine Learning (Statistics)

2022-08-05

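Approximate message passing (AMP) has emerged as an effective iterative paradigm for solving high-dimensional statistical problems. However, existing AMP theory falls short of predicting the AMP dynamics once the number of iterations exceeds $o\big(\frac{\log n}{\log\log n}\big)$ (with $n$ the problem dimension). To address this inadequacy, this paper develops a non-asymptotic framework for understanding AMP in spiked matrix estimation. Based on a new decomposition of AMP updates and controllable residual terms, we lay out an analysis recipe to characterize the finite-sample behavior of AMP in the presence of an independent initialization, which is further generalized to allow for spectral initialization. As two concrete consequences of the proposed analysis recipe: (i) when solving $\mathbb{Z}_2$ synchronization, we predict the behavior of spectrally initialized AMP for up to $O\big(\frac{n}{\mathrm{poly}\log n}\big)$ iterations, showing that the algorithm succeeds without the need for a subsequent refinement stage (as recently conjectured by Celentano et al., 2021); (ii) we characterize the non-asymptotic behavior of AMP in sparse PCA (in the spiked Wigner model) for a broad range of signal-to-noise ratios.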

Minimax Optimal Regression over Sobolev Spaces via Laplacian Eigenmaps on Neighborhood Graphs

Alden Green, Sivaraman Balakrishnan, Ryan J. Tibshirani

Category: Machine Learning (Statistics)

2021-11-14

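In this paper we study the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for nonparametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses ${\bf y} = (y_1, \ldots, y_n)$ onto a subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s+d)}$) and goodness-of-fit testing ($n^{-4s/(4s+d)}$). We also show that PCR-LE is manifold adaptive: that is, we consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s+m)}$) and testing ($n^{-4s/(4s+m)}$) rates of convergence. Interestingly, these rates are almost always much faster than the known rates of convergence of graph Laplacian eigenvectors; in other words, for this problem, regression with estimated features appears to be much easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.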

Learning Transition Operators From Sparse Space-Time Samples

Christian Kümmerle, Mauro Maggioni, Sui Tang

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-01

We consider the nonlinear inverse problem of learning a transition operator $\mathbf{A}$ from partial observations at different times, in particular from sparse observations of entries of its powers $\mathbf{A},\mathbf{A}^2,\cdots,\mathbf{A}^{T}$. This Spatio-Temporal Transition Operator Recovery problem is motivated by the recent interest in learning time-varying graph signals that are driven by graph operators depending on the underlying graph topology. We address the nonlinearity of the problem by embedding it into a higher-dimensional space of suitable block-Hankel matrices, where it becomes a low-rank matrix completion problem, even if $\mathbf{A}$ is of full rank. For both a uniform and an adaptive random space-time sampling model, we quantify the recoverability of the transition operator via suitable measures of incoherence of these block-Hankel embedding matrices. For graph transition operators these measures of incoherence depend on the interplay between the dynamics and the graph topology. We develop a suitable non-convex iterative reweighted least squares (IRLS) algorithm, establish its quadratic local convergence, and show that, in optimal scenarios, no more than $\mathcal{O}(rn \log(nT))$ space-time samples are sufficient to ensure accurate recovery of a rank-$r$ operator $\mathbf{A}$ of size $n \times n$. This establishes that spatial samples can be substituted by a comparable number of space-time samples. We provide an efficient implementation of the proposed IRLS algorithm with space complexity of order $O(r n T)$ and per-iteration time complexity linear in $n$. Numerical experiments for transition operators based on several graph models confirm that the theoretical findings accurately track empirical phase transitions, and illustrate the applicability and scalability of the proposed algorithm.

Hedging against Complexity: Distributionally Robust Optimization with Parametric Approximation

Garud Iyengar, Henry Lam, Tianyu Wang

Category: Machine Learning

2022-12-03

Empirical risk minimization (ERM) and distributionally robust optimization (DRO) are popular approaches for solving stochastic optimization problems that appear in operations management and machine learning. Existing generalization error bounds for these methods depend on either the complexity of the cost function or dimension of the uncertain parameters; consequently, the performance of these methods is poor for high-dimensional problems with objective functions under high complexity. We propose a simple approach in which the distribution of uncertain parameters is approximated using a parametric family of distributions. This mitigates both sources of complexity; however, it introduces a model misspecification error. We show that this new source of error can be controlled by suitable DRO formulations. Our proposed parametric DRO approach has significantly improved generalization bounds over existing ERM / DRO methods and parametric ERM for a wide variety of settings. Our method is particularly effective under distribution shifts. We also illustrate the superior performance of our approach on both synthetic and real-data portfolio optimization and regression tasks.

Analysis of Generalized Bregman Surrogate Algorithms for Nonsmooth Nonconvex Statistical Learning

Yiyuan She, Zhifeng Wang, Jiuwu Jin

Category: Machine Learning (Statistics)

2021-12-16

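Modern statistical applications often involve minimizing an objective function that may be nonsmooth and/or nonconvex. This paper focuses on a broad Bregman-surrogate algorithm framework that includes local linear approximation, mirror descent, iterative thresholding, DC programming, and many others as particular instances. A recharacterization via generalized Bregman functions enables us to construct suitable error measures and establish global convergence rates for nonconvex and nonsmooth objectives in possibly high dimensions. For sparse learning problems, under some regularity conditions, the obtained estimators, as fixed points of the surrogate though not necessarily local minimizers, enjoy provable statistical guarantees, and the sequence of iterates can be shown to approach the statistical truth within the desired accuracy geometrically fast. The paper also studies how to design adaptive momentum-based accelerations without assuming convexity or smoothness, by carefully controlling the step size and relaxation parameters.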

Understanding Entropic Regularization in GANs

Daria Reshetova, Yikun Bai, Xiugang Wu, Ayfer Ozgur

Categories: Machine Learning | Machine Learning (Statistics)

2021-11-02

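Generative adversarial networks (GANs) are a popular method for learning distributions from data by modeling the target distribution as a function of a known distribution. The function, often referred to as the generator, is optimized to minimize a chosen distance measure between the generated and target distributions. One commonly used measure for this purpose is the Wasserstein distance. However, the Wasserstein distance is hard to compute and optimize, and in practice entropic regularization techniques are used to improve numerical convergence. The influence of regularization on the learned solution, however, remains poorly understood. In this paper, we study how several popular entropic regularizations of the Wasserstein distance affect the solution in a simple benchmark setting where the generator is linear and the target distribution is a high-dimensional Gaussian. We show that entropic regularization promotes sparsification of the solution, while replacing the Wasserstein distance with the Sinkhorn divergence recovers the unregularized solution. Both regularization techniques remove the curse of dimensionality suffered by the Wasserstein distance. We show that the optimal generator can be learned to accuracy $\epsilon$ with $O(1/\epsilon^2)$ samples from the target distribution. We thus conclude that these regularization techniques can improve the quality of the generator learned from empirical data for a large class of distributions.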

Dimension-agnostic inference using cross U-statistics

Ilmun Kim, Aaditya Ramdas

Category: Machine Learning (Statistics)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
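
The sample-splitting and self-normalization idea described above can be sketched for one-sample mean testing (null hypothesis: the mean is zero). The construction below is an assumption about how such a statistic may be formed, not a transcription of the paper: the second half of the data estimates a direction, the first half is projected onto it, and the projected values are studentized. Function names and the simulated data are hypothetical.

```python
import math
import random


def cross_mean_statistic(sample):
    """Studentized cross statistic for H0: mean = 0.
    Uses only inner products between the two halves of the sample,
    i.e. the off-diagonal block of the Gram matrix."""
    half = len(sample) // 2
    first, second = sample[:half], sample[half:]
    dim = len(sample[0])
    mean_second = [sum(x[k] for x in second) / len(second)
                   for k in range(dim)]
    # Project each first-half point onto the second-half mean.
    f = [sum(x[k] * mean_second[k] for k in range(dim)) for x in first]
    f_bar = sum(f) / half
    variance = sum((v - f_bar) ** 2 for v in f) / half
    # Self-normalization; assumes the projections are not all equal.
    return math.sqrt(half) * f_bar / math.sqrt(variance)


def gaussian_sample(n, dim, mean, rng):
    """Draw n points with independent N(mean, 1) coordinates."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


# 100 samples in 20 dimensions, as in the abstract's motivating example.
rng = random.Random(0)
stat_null = cross_mean_statistic(gaussian_sample(100, 20, 0.0, rng))
stat_shift = cross_mean_statistic(gaussian_sample(100, 20, 1.0, rng))
```

Under these assumptions the statistic would be compared with a standard normal quantile (for instance, rejecting at the 5% level when it exceeds about 1.645), with the same calibration used whatever the ratio of `dim` to the sample size.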

Efficient Clustering for Stretched Mixtures: Landscape and Optimality

Kaizheng Wang, Yuling Yan, Mateo Díaz

Categories: Machine Learning (Statistics) | Machine Learning

2020-03-22

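This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods, including PCA and k-means, require the individual components of the mixture to be somewhat spherical, and perform poorly when they are stretched. To overcome this issue, we propose a non-convex program seeking an affine transform that turns the data into a one-dimensional point cloud concentrating around $-1$ and $1$, after which clustering becomes easy. Our theoretical contributions are two-fold: (1) we show that the non-convex loss function exhibits desirable geometric properties when the sample size exceeds some constant multiple of the dimension, and (2) we leverage this to prove that an efficient first-order algorithm achieves near-optimal statistical precision without good initialization. We also propose a general methodology for clustering with flexible choices of feature transforms and loss objectives.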

The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu, Alden Green, Ryan J. Tibshirani

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.

BRIDGE: Byzantine-resilient Decentralized Gradient Descent

Cheng Fang, Zhixiong Yang, Waheed U. Bajwa

Categories: Machine Learning (Statistics) | Machine Learning

2019-08-21

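Machine learning has begun to play a central role in many applications. Many of these applications also involve datasets that are distributed across multiple computing devices or machines, due to either design constraints (e.g., multi-agent systems) or computational/privacy reasons (e.g., learning on smartphone data). Such applications often require the learning tasks to be carried out in a decentralized fashion, in which there is no central server directly connected to all nodes. In real-world decentralized settings, nodes are prone to undetected failures due to malfunctioning equipment, cyberattacks, and so on, which are likely to crash non-robust learning algorithms. The focus of this paper is on the robustification of decentralized learning in the presence of nodes that have undergone Byzantine failures. The Byzantine failure model allows faulty nodes to deviate arbitrarily from their intended behavior, thereby ensuring the design of the most robust algorithms. However, the study of Byzantine resilience within decentralized learning, in contrast to distributed learning, is still in its infancy. In particular, existing Byzantine-resilient decentralized learning methods either do not scale well to large-scale machine learning models, or lack statistical convergence guarantees that help characterize their generalization errors. In this paper, a scalable, Byzantine-resilient decentralized machine learning framework termed Byzantine-resilient decentralized gradient descent (BRIDGE) is introduced. Algorithmic and statistical convergence guarantees for both strongly convex problems and a class of nonconvex problems are also provided. In addition, large-scale decentralized learning experiments are used to establish that the BRIDGE framework is scalable and that it delivers competitive results for Byzantine-resilient convex and nonconvex learning.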

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

Frederic Koehler, Lijia Zhou, Danica J. Sutherland, Nathan Srebro

Categories: Machine Learning (Statistics) | Machine Learning

2021-06-17

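We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum $\ell_1$-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.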
