Smart Paper Notes

Improved Generalization Bound and Learning of Sparsity Patterns for Data-Driven Low-Rank Approximation

Shinsaku Sakaue, Taihei Oki

Category: Machine Learning

2022-09-17

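Learning sketching matrices for fast and accurate low-rank approximation (LRA) has gained increasing attention. Recently, Bartlett, Indyk, and Wagner (COLT 2022) presented a generalization bound for learning-based LRA. Specifically, for rank-$k$ approximation using an $m \times n$ learned sketching matrix with $s$ non-zeros in each column, they proved an $\tilde{\mathrm{O}}(nsm)$ bound on the fat shattering dimension ($\tilde{\mathrm{O}}$ hides logarithmic factors). We build on their work and make two contributions. 1. We present a better $\tilde{\mathrm{O}}(nsk)$ bound ($k \le m$). En route to obtaining the bound, we give a low-complexity Goldberg-Jerrum algorithm for computing pseudo-inverse matrices, which may be of independent interest. 2. We relax an assumption of the previous study that the sparsity pattern of the sketching matrices is fixed. We prove that learning the positions of the non-zeros increases the fat shattering dimension only by $\mathrm{O}(ns \log n)$. In addition, experiments confirm the practical benefit of learning sparsity patterns.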
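To make the learned object concrete, the following is a minimal sketch of an $m \times n$ sketching matrix with exactly $s$ non-zeros per column and of forming the sketch $SA$. It assumes an OSNAP-style random initialization (uniformly chosen rows, values $\pm 1/\sqrt{s}$); in the learning-based setting the non-zero values, and under the second contribution also their positions, would then be trained. The function names are hypothetical and not taken from the paper.

```python
import random

def sparse_sketch(m, n, s, seed=0):
    """Random m x n sketching matrix with exactly s non-zeros per column.

    Hypothetical OSNAP-style initialization: for each column, s distinct rows
    are chosen uniformly and filled with random signs scaled by 1/sqrt(s).
    """
    if s > m:
        raise ValueError("need s <= m")
    rng = random.Random(seed)
    scale = 1.0 / s ** 0.5
    S = [[0.0] * n for _ in range(m)]
    for j in range(n):
        for i in rng.sample(range(m), s):  # s distinct row positions for column j
            S[i][j] = rng.choice((-1.0, 1.0)) * scale
    return S

def apply_sketch(S, A):
    """Compute the m x d product S @ A, skipping zero entries of S."""
    m, d = len(S), len(A[0])
    out = [[0.0] * d for _ in range(m)]
    for i, row in enumerate(S):
        for j, sij in enumerate(row):
            if sij != 0.0:
                for c in range(d):
                    out[i][c] += sij * A[j][c]
    return out
```

Because each column of $S$ touches only $s$ rows, forming $SA$ costs $O(s \cdot \mathrm{nnz}(A))$ with a sparse representation; the dense loops above are only for readability.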

Generalization Bounds for Data-Driven Numerical Linear Algebra

Peter Bartlett, Piotr Indyk, Tal Wagner

Category: Machine Learning

2022-06-16

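Data-driven algorithms can adapt their internal structure or parameters to inputs from unknown application-specific distributions by learning from a training sample of inputs. Several recent works have applied this approach to problems in numerical linear algebra, obtaining significant empirical gains in performance. However, no theoretical explanation for their success was known. In this work, we prove generalization bounds for these algorithms within the PAC-learning framework for data-driven algorithm selection proposed by Gupta and Roughgarden (SICOMP 2017). Our main result is a nearly tight bound on the fat shattering dimension of the learning-based low-rank approximation algorithm of Indyk et al. (NeurIPS 2019). Our techniques are general and provide generalization bounds for many other recently proposed data-driven algorithms in numerical linear algebra, covering both sketching-based and multigrid-based methods. This considerably broadens the class of data-driven algorithms for which a PAC-learning analysis is available.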

Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp

Category:

2009-09-22

Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
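The two-stage scheme described above (sample the range of the matrix with a random test matrix, then compress deterministically) can be sketched in plain Python. This is a minimal illustration assuming a Gaussian test matrix with `ell` columns (target rank plus a little oversampling) and modified Gram-Schmidt for orthonormalization; the helper names are hypothetical. A practical version would use an optimized QR routine and would finish by taking the SVD of the small matrix `B`.

```python
import random

def matmul(X, Y):
    """Dense product of nested-list matrices X (a x b) and Y (b x c)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def orthonormalize_columns(Y, rel_tol=1e-8):
    """Modified Gram-Schmidt on the columns of Y.

    Columns whose residual is tiny relative to their original norm are
    treated as linearly dependent and dropped.
    """
    basis = []
    for col in transpose(Y):
        v = list(col)
        orig_norm = sum(a * a for a in v) ** 0.5
        for q in basis:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > rel_tol * orig_norm:
            basis.append([a / norm for a in v])
    return transpose(basis)  # m x r matrix with orthonormal columns

def randomized_range_finder(A, ell, seed=0):
    """Stage A: find Q with orthonormal columns such that A ~= Q Q^T A."""
    rng = random.Random(seed)
    n = len(A[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(ell)] for _ in range(n)]
    return orthonormalize_columns(matmul(A, omega))  # basis for range(A @ omega)

def low_rank_factors(A, ell, seed=0):
    """Stage B: compress A to the small matrix B = Q^T A, so that A ~= Q B."""
    Q = randomized_range_finder(A, ell, seed)
    return Q, matmul(transpose(Q), A)
```

For a matrix of exact rank $r \le$ `ell`, the sampled columns span the range of $A$ with probability one, so `Q @ B` reproduces $A$ up to rounding.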

Low-Rank Approximation with $1/ε^{1/3}$ Matrix-Vector Products

Ainesh Bakshi, Kenneth L. Clarkson, David P. Woodruff

Category: Machine Learning

2022-02-10

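We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm. Here, given access to a matrix $A$ through matrix-vector products, an accuracy parameter $\epsilon$, and a target rank $k$, the goal is to find a rank-$k$ matrix $Z$ with orthonormal columns such that $\|A(I - ZZ^\top)\|_{S_p} \leq (1+\epsilon)\min_{U^\top U = I_k} \|A(I - UU^\top)\|_{S_p}$, where $\|M\|_{S_p}$ denotes the $\ell_p$ norm of the singular values of $M$. For the special cases of $p = 2$ (Frobenius norm) and $p = \infty$ (spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products, improving on the naïve $\tilde{O}(k/\epsilon)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses $\mathrm{poly}(\log(dk/\epsilon))$ factors. Our main result is an algorithm that uses only $\tilde{O}(kp^{1/6}/\epsilon^{1/3})$ matrix-vector products and works for all $p \geq 1$. For $p = 2$, our bound improves the previous $\tilde{O}(k/\epsilon^{1/2})$ bound to $\tilde{O}(k/\epsilon^{1/3})$. Since the Schatten-$p$ and Schatten-$\infty$ norms are the same up to a $(1+\epsilon)$ factor when $p \geq (\log d)/\epsilon$, our bound recovers the result of Musco and Musco for $p = \infty$. Further, we prove a matrix-vector query lower bound of $\Omega(1/\epsilon^{1/3})$ for any fixed constant $p \geq 1$, showing that, surprisingly, $\tilde{\Theta}(1/\epsilon^{1/3})$ is the optimal complexity for constant $k$. To obtain our results, we introduce several new techniques, including optimizing over multiple Krylov subspaces simultaneously and pinching inequalities for partitioned operators. Our lower bound for $p \in [1,2]$ uses the Araki-Lieb-Thirring trace inequality, whereas for $p > 2$ we appeal to a norm-compression inequality for aligned partitioned operators.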
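The claim that the Schatten-$p$ and Schatten-$\infty$ norms agree up to a $(1+\epsilon)$ factor once $p \geq (\log d)/\epsilon$ follows from a one-line comparison of singular values. A sketch of the derivation, assuming $M$ has at most $d$ non-zero singular values $\sigma_1(M) \ge \dots \ge \sigma_d(M)$, natural logarithms, and $0 < \epsilon \le 1$ (constants are absorbed into $\epsilon$):

```latex
\|M\|_{S_\infty} = \sigma_1(M) \le \|M\|_{S_p} = \Big(\sum_{i=1}^{d} \sigma_i(M)^p\Big)^{1/p}
  \le \big(d\,\sigma_1(M)^p\big)^{1/p} = d^{1/p}\,\|M\|_{S_\infty},
\qquad
p \ge \frac{\ln d}{\epsilon} \;\Longrightarrow\; d^{1/p} = e^{(\ln d)/p} \le e^{\epsilon} \le 1 + 2\epsilon .
```

Hence a $(1+\epsilon)$-approximate solution under $\|\cdot\|_{S_p}$ is also a $(1+O(\epsilon))$-approximate solution under the spectral norm.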

Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning

Nai-Hui Chia, András Gilyén, Tongyang Li, Han-Hsuan Lin, Ewin Tang, Chunhao Wang

Category: Machine Learning

2019-10-14

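We present an algorithmic framework for quantum-inspired classical algorithms on close-to-low-rank matrices, generalizing the series of results started by Tang's breakthrough quantum-inspired algorithm for recommendation systems [STOC'19]. Motivated by quantum linear algebra algorithms and the quantum singular value transformation (SVT) framework of Gilyén, Su, Low, and Wiebe [STOC'19], we develop classical algorithms for SVT that run in time independent of the input dimension, under suitable quantum-inspired sampling assumptions. Our results give compelling evidence that, in the corresponding QRAM data structure input model, quantum SVT does not yield exponential quantum speedups. Since the quantum SVT framework generalizes essentially all known techniques for quantum linear algebra, our results, combined with sampling lemmas from previous work, suffice to generalize all recent results about dequantizing quantum machine learning algorithms. In particular, our classical SVT framework recovers and often improves the dequantization results on recommendation systems, principal component analysis, supervised clustering, support vector machines, low-rank regression, and semidefinite program solving. We also give additional dequantization results on low-rank Hamiltonian simulation and discriminant analysis. Our improvements come from identifying the key feature of the quantum-inspired input model that is at the core of all prior quantum-inspired results: $\ell^2$-norm sampling can approximate matrix products in time independent of their dimension. We reduce all our main results to this fact, making our exposition concise, self-contained, and intuitive.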
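The key primitive named above, approximating a matrix product by $\ell^2$-norm sampling, can be illustrated with the standard row-sampling estimator for $A^\top B$. This is a minimal sketch, not the paper's SVT framework: it assumes sampling access is simulated by computing the row norms directly, whereas in the quantum-inspired input model that access is assumed to come from a data structure, which is what makes the cost independent of the number of rows. The function name is hypothetical.

```python
import random

def l2_sample_matmul(A, B, num_samples, seed=0):
    """Estimate A^T B (A is n x p, B is n x q) from l2-norm row samples.

    Row i is drawn with probability p_i = ||A_i||^2 / ||A||_F^2, and each draw
    contributes A_i^T B_i / (num_samples * p_i), so the estimate is unbiased.
    Row norms are computed here only to simulate the sampling access.
    """
    rng = random.Random(seed)
    row_norms = [sum(a * a for a in row) for row in A]
    total = sum(row_norms)
    probs = [r / total for r in row_norms]
    idx = rng.choices(range(len(A)), weights=probs, k=num_samples)
    p, q = len(A[0]), len(B[0])
    est = [[0.0] * q for _ in range(p)]
    for i in idx:
        w = 1.0 / (num_samples * probs[i])
        for r in range(p):
            for c in range(q):
                est[r][c] += w * A[i][r] * B[i][c]
    return est
```

The expected squared Frobenius error is at most $\|A\|_F^2 \|B\|_F^2 / c$ for $c$ samples, so the accuracy depends on the sample count and the norms, not on the number of rows.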

Arithmetic Circuits, Structured Matrices and (not so) Deep Learning

Atri Rudra

Category: Machine Learning

2022-06-24

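This survey presents a necessarily incomplete (and biased) overview of results at the intersection of arithmetic circuit complexity, structured matrices, and deep learning. Recently, there has been some research activity in replacing unstructured weight matrices in neural networks with structured ones (with the aim of reducing the size of the corresponding deep learning models). Most of this work has been experimental; in this survey, we formalize the research question and show how a recent work that combines arithmetic circuit complexity, structured matrices, and deep learning essentially answers it. This survey is targeted at complexity theorists who might enjoy reading about how tools developed in arithmetic circuit complexity helped design (to the best of our knowledge) a new family of structured matrices, which in turn seem well suited for applications in deep learning. However, we hope that people primarily interested in deep learning will also appreciate the connections to complexity theory.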

Low Rank Approximation for General Tensor Networks

Arvind V. Mahankali, David P. Woodruff, Ziyu Zhang

Category: Machine Learning

2022-07-15

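We study the problem of approximating a given tensor with $q$ modes, $A \in \mathbb{R}^{n \times \ldots \times n}$, with an arbitrary tensor network of rank $k$, that is, a graph $G = (V, E)$, where $|V| = q$, together with a collection of tensors $\{U_v \mid v \in V\}$ which are contracted in the manner specified by $G$ to obtain a tensor $T$. For each mode of $U_v$ corresponding to an edge incident to $v$, the dimension is $k$, and we wish to find $U_v$ such that the Frobenius norm distance between $T$ and $A$ is minimized. This generalizes a number of well-known tensor network decompositions, such as the Tensor Train, Tensor Ring, Tucker, and PEPS decompositions. We approximate $A$ by a binary tree network $T'$ with $O(q)$ cores, such that the dimension on each edge of this network is at most $\widetilde{O}(k^{O(dt)} \cdot q/\varepsilon)$, where $d$ is the maximum degree of $G$ and $t$ is its treewidth, and such that $\|A - T'\|_F^2 \leq (1 + \varepsilon)\|A - T\|_F^2$. The running time of our algorithm is $O(q \cdot \mathrm{nnz}(A)) + n \cdot \mathrm{poly}(k^{dt} q/\varepsilon)$, where $\mathrm{nnz}(A)$ is the number of non-zero entries of $A$. Our algorithm is based on a new dimensionality reduction technique for tensor decomposition which may be of independent interest. We also develop fixed-parameter tractable $(1 + \varepsilon)$-approximation algorithms for Tensor Train and Tucker decompositions, improving the running time of Song, Woodruff, and Zhong (SODA 2019) and avoiding the use of generic polynomial system solvers. We show that our algorithms have a nearly optimal dependence on $1/\varepsilon$, assuming that there is no $O(1)$-approximation algorithm for the $2 \to 4$ norm with better running time than brute force. Finally, we give additional results for Tucker decomposition with robust loss functions, and fixed-parameter tractable CP decomposition.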
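A Tensor Train is the special case in which $G$ is a path. As a minimal illustration of what "contracted in the manner specified by $G$" means (not the paper's binary-tree dimensionality reduction), the sketch below contracts a 3-core tensor train $v_1 - v_2 - v_3$ into the full tensor $T$ and evaluates the objective $\|A - T\|_F^2$. The core layout `U1[i1][a]`, `U2[a][i2][b]`, `U3[b][i3]` and the function names are assumptions made for this example; the edge dimensions play the role of $k$.

```python
def tt_full(U1, U2, U3):
    """Contract a 3-core tensor train over the path graph v1 - v2 - v3.

    Returns T with T[i1][i2][i3] = sum_{a,b} U1[i1][a] * U2[a][i2][b] * U3[b][i3],
    where a and b index the two edges of the path.
    """
    n1, r1 = len(U1), len(U1[0])
    n2, r2 = len(U2[0]), len(U2[0][0])
    n3 = len(U3[0])
    return [[[sum(U1[i][a] * U2[a][j][b] * U3[b][l]
                  for a in range(r1) for b in range(r2))
              for l in range(n3)]
             for j in range(n2)]
            for i in range(n1)]

def frobenius_dist_sq(A, T):
    """Squared Frobenius distance ||A - T||_F^2 between order-3 nested lists."""
    return sum((x - y) ** 2
               for pa, pt in zip(A, T)
               for ra, rt in zip(pa, pt)
               for x, y in zip(ra, rt))
```

Materializing $T$ costs $n^q$ entries, which is exactly what network-based methods avoid; the dense form is used here only to make the contraction rule explicit.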

A Strongly Polynomial Algorithm for Approximate Forster Transforms and its Application to Halfspace Learning

Ilias Diakonikolas, Christos Tzamos, Daniel M. Kane

Category: Machine Learning | Machine Learning (Statistics)

2022-12-06

The Forster transform is a method of regularizing a dataset by placing it in radial isotropic position while maintaining some of its essential properties. Forster transforms have played a key role in a diverse range of settings spanning computer science and functional analysis. Prior work had given weakly polynomial time algorithms for computing Forster transforms, when they exist. Our main result is the first strongly polynomial time algorithm to compute an approximate Forster transform of a given dataset or certify that no such transformation exists. By leveraging our strongly polynomial Forster algorithm, we obtain the first strongly polynomial time algorithm for distribution-free PAC learning of halfspaces. This learning result is surprising because proper PAC learning of halfspaces is equivalent to linear programming. Our learning approach extends to give a strongly polynomial halfspace learner in the presence of random classification noise and, more generally, Massart noise.
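Radial isotropic position, the target of a Forster transform, has a simple test: non-zero points $x_1, \dots, x_n \in \mathbb{R}^d$ are in radial isotropic position when $\frac{1}{n}\sum_i \frac{x_i x_i^\top}{\|x_i\|^2} = \frac{1}{d} I$, and a Forster transform is a linear map $A$ after which the points $Ax_i$ satisfy this. The sketch below only checks that condition for a candidate map; it does not compute the transform (the paper's strongly polynomial algorithm is not shown). Function names are hypothetical.

```python
def radial_second_moment(points):
    """M = (1/n) * sum_i u_i u_i^T, where u_i = x_i / ||x_i||."""
    n, d = len(points), len(points[0])
    M = [[0.0] * d for _ in range(d)]
    for x in points:
        sq = sum(v * v for v in x)
        if sq == 0:
            raise ValueError("radial isotropy is defined for non-zero points")
        for r in range(d):
            for c in range(d):
                M[r][c] += x[r] * x[c] / (sq * n)
    return M

def is_radial_isotropic(points, tol=1e-9):
    """True if M equals I/d up to tol (the target condition of a Forster transform)."""
    d = len(points[0])
    M = radial_second_moment(points)
    return all(abs(M[r][c] - (1.0 / d if r == c else 0.0)) <= tol
               for r in range(d) for c in range(d))

def apply_linear(A, points):
    """Map each point x to A x (a candidate Forster transform)."""
    return [[sum(A[r][k] * x[k] for k in range(len(x))) for r in range(len(A))]
            for x in points]
```

Because only directions matter, rescaling individual points never changes the result, while a point set with too much mass on one direction (for example, a repeated point) cannot be fixed by any linear map.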

Sharp Analysis of Sketch-and-Project Methods via a Connection to Randomized Singular Value Decomposition

Michał Dereziński, Elizaveta Rebrova

Category: Machine Learning (Statistics)

2022-08-20

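Sketch-and-project is a framework which unifies many known iterative methods for solving linear systems and their variants, and further extends to non-linear optimization problems. It includes popular methods such as randomized Kaczmarz, coordinate descent, variants of the Newton method in convex optimization, and others. In this paper, we obtain sharp guarantees for the convergence rate of sketch-and-project via new tight spectral bounds for the expected sketched projection matrix. Our estimates reveal a connection between the sketch-and-project convergence rate and the approximation error of another well-known but seemingly unrelated family of algorithms, which use sketching to accelerate popular matrix factorizations such as QR and SVD. This connection brings us closer to precisely quantifying how the performance of sketch-and-project solvers depends on their sketch size. Our analysis covers not only Gaussian and sub-gaussian sketching matrices, but also a family of efficient sparse sketching methods known as LESS embeddings. Our experiments back up the theory and demonstrate that even extremely sparse sketches show the same convergence properties in practice.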
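Randomized Kaczmarz, one of the methods that sketch-and-project unifies, is the case where the sketch selects a single row: each step projects the current iterate onto the solution hyperplane of one randomly chosen equation. A minimal sketch for a consistent system $Ax = b$, assuming rows are sampled with probability proportional to their squared norms (the classical choice); the function name is hypothetical and this is not the paper's analysis.

```python
import random

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Randomized Kaczmarz for a consistent linear system A x = b.

    Each step samples row i with probability ||a_i||^2 / ||A||_F^2 and
    projects x onto the hyperplane {x : <a_i, x> = b_i}, i.e. a
    sketch-and-project step whose sketch picks out a single row.
    """
    rng = random.Random(seed)
    n = len(A[0])
    weights = [sum(a * a for a in row) for row in A]
    x = [0.0] * n
    for _ in range(iters):
        i = rng.choices(range(len(A)), weights=weights)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / weights[i]
        x = [xj + step * a for xj, a in zip(x, row)]
    return x
```

Larger sketches (several rows per step) give block variants whose per-step progress improves with sketch size, which is the dependence the paper quantifies.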

Algorithmic Gaussianization through Sketching: Converting Data into Sub-gaussian Random Designs

Michał Dereziński

Category: Machine Learning | Machine Learning (Statistics)

2022-06-21

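Algorithmic Gaussianization is a phenomenon that can arise when randomized sketching or sampling methods are used to produce smaller representations of large datasets: for certain tasks, these sketched representations have been observed to exhibit many robust performance characteristics that are known to occur when a data sample comes from a sub-gaussian random design, which is a powerful statistical model of data distributions. However, this phenomenon has only been studied for specific tasks and metrics, or by relying on computationally expensive methods. We address this by providing an algorithmic framework for gaussianizing data distributions via averaging, proving that it is possible to efficiently construct data sketches that are nearly indistinguishable (in terms of total variation distance) from sub-gaussian random designs. In particular, relying on a recently introduced sketching technique called Leverage Score Sparsified (LESS) embeddings, we show that one can construct an $n \times d$ sketch of an $N \times d$ matrix $A$, where $n \ll N$, that is nearly indistinguishable from a sub-gaussian design, with a running time stated in terms of $\mathrm{nnz}(A)$, the number of non-zero entries in $A$. As a consequence, statistical guarantees available for estimators produced from sub-gaussian designs can be directly adapted to our sketching framework. We illustrate this with a new approximation guarantee for sketched least squares.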

Active Sampling for Linear Regression Beyond the $\ell_2$ Norm

Cameron Musco, Christopher Musco, David P. Woodruff, Taisuke Yasuda

Category: Machine Learning | Machine Learning (Statistics)

2021-11-09

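We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector $b \in \mathbb{R}^n$ and output a near minimizer to $\min_{x \in \mathbb{R}^d} \|Ax - b\|$, where $A \in \mathbb{R}^{n \times d}$ is a design matrix and $\|\cdot\|$ is some loss function. For $\ell_p$ norm regression for any $0 < p < \infty$, we give an algorithm based on Lewis weight sampling that outputs a $(1+\epsilon)$-approximate solution using just $\tilde{O}(d^{\max(1, p/2)}/\mathrm{poly}(\epsilon))$ queries to $b$. We show that this dependence on $d$ is optimal, up to logarithmic factors. Our result resolves a recent open question of Chen and Dereziński, who gave near-optimal bounds for the $\ell_1$ norm and suboptimal bounds for $\ell_p$ regression with $p \in (1,2)$. We also provide the first total sensitivity upper bound of $O(d^{\max\{1, p/2\}} \log^2 n)$ for loss functions with at most degree-$p$ polynomial growth. This improves a recent result of Tukan, Maalouf, and Feldman. By combining this with our techniques for the $\ell_p$ regression result, we obtain an active regression algorithm making $\tilde{O}(d^{1+\max\{1, p/2\}}/\mathrm{poly}(\epsilon))$ queries, answering another open question of Chen and Dereziński. For the important special case of the Huber loss, we further improve our bound to an active sample complexity of $\tilde{O}(d^{(1+\sqrt{2})/2}/\epsilon^c)$ and a non-active sample complexity of $\tilde{O}(d^{4-2\sqrt{2}}/\epsilon^c)$, improving a previous $d^4$ bound for Huber regression due to Clarkson and Woodruff. Our sensitivity bounds have further implications, improving a variety of previous results using sensitivity sampling, including Orlicz norm subspace embeddings and robust subspace approximation. Finally, our active sampling results give the first sublinear-time algorithms for every $\ell_p$ norm.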

Fast and Near-Optimal Diagonal Preconditioning

Arun Jambulapati, Jerry Li, Christopher Musco, Aaron Sidford, Kevin Tian

Category: Machine Learning | Machine Learning (Statistics)

2020-08-04

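The convergence rates of iterative methods for solving a linear system $\mathbf{A}x = b$ typically depend on the condition number of the matrix $\mathbf{A}$. Preconditioning is a common way of speeding up these methods by reducing that condition number in a computationally inexpensive way. In this paper, we revisit the decades-old problem of how best to improve $\mathbf{A}$'s condition number by left or right diagonal rescaling. We make progress on this problem in several directions. First, we provide new bounds for the classic heuristic of scaling $\mathbf{A}$ by its diagonal values (a.k.a. Jacobi preconditioning). We prove that this approach reduces $\mathbf{A}$'s condition number to within a quadratic factor of the best possible scaling. Second, we give a solver for structured mixed packing and covering semidefinite programs (MPC SDPs) which computes a constant-factor optimal scaling for $\mathbf{A}$ in $\widetilde{O}(\mathrm{nnz}(\mathbf{A}) \cdot \mathrm{poly}(\kappa^\star))$ time; this matches the cost of solving the linear system after scaling, up to a $\widetilde{O}(\mathrm{poly}(\kappa^\star))$ factor. Third, we demonstrate that a sufficiently general width-independent MPC SDP solver would imply near-optimal runtimes for the scaling problems we consider, and for natural variants concerned with measures of average conditioning. Finally, we highlight connections of our preconditioning techniques to semi-random noise models, as well as applications in reducing risk in several statistical regression models.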
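The classic Jacobi heuristic mentioned above can be seen on a $2 \times 2$ example. This is a minimal sketch assuming a symmetric positive definite matrix and the two-sided symmetric variant $D^{-1/2} \mathbf{A} D^{-1/2}$ with $D = \mathrm{diag}(\mathbf{A})$; the eigenvalue formula is the closed form for symmetric $2 \times 2$ matrices, and the function names are hypothetical.

```python
import math

def cond_sym_2x2(M):
    """Condition number lambda_max / lambda_min of a symmetric positive definite 2 x 2 matrix."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = (a + d) / 2.0
    rad = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return (mean + rad) / (mean - rad)

def jacobi_scale(M):
    """Symmetric Jacobi scaling D^{-1/2} M D^{-1/2}, where D = diag(M)."""
    s = [1.0 / math.sqrt(M[i][i]) for i in range(len(M))]
    return [[s[i] * M[i][j] * s[j] for j in range(len(M))] for i in range(len(M))]
```

For `M = [[1e4, 10.0], [10.0, 1.0]]` the condition number is about $10^4$, while the scaled matrix `[[1, 0.1], [0.1, 1]]` has condition number $1.1/0.9 \approx 1.22$: the badly scaled diagonal accounted for almost all of the ill-conditioning.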

Hierarchical Identifiability in Multi-layer Sparse Matrix Factorization

Léon Zheng, Elisa Riccietti, Rémi Gribonval

Category: Machine Learning

2021-10-04

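Many well-known matrices $Z$ are associated with fast transforms corresponding to factorizations of the form $Z = X^J \ldots X^1$, where each factor $X^\ell$ is sparse and possibly structured. This paper investigates the essential uniqueness of such factorizations. Our first main contribution is to prove that any $n \times n$ matrix having the so-called butterfly structure admits an essentially unique factorization into $J$ butterfly factors (where $n = 2^J$), and that these factors can be recovered by a hierarchical factorization method. This contrasts with existing approaches, which fit the product of butterfly factors to a given matrix via gradient descent. The proposed method can be applied in particular to retrieve the factorization of the Hadamard or discrete Fourier transform matrices of size $2^J$. Computing such factorizations costs $\mathcal{O}(n^2)$, which is of the order of a dense matrix-vector multiplication, while the obtained factorizations enable fast $\mathcal{O}(n \log n)$ matrix-vector multiplications. This hierarchical identifiability property relies on a simple identifiability condition recently established in the two-layer, fixed-support setting. While the butterfly structure corresponds to a fixed, prescribed support for each factor, our second contribution is to obtain identifiability results with more general families of allowed sparsity patterns, while taking into account the unavoidable scaling ambiguities. In particular, using the hierarchical paradigm, we show that the butterfly factorization of the discrete Fourier transform matrix of size $2^J$ is its unique sparse factorization into $J$ factors when a $2$-adic sparsity structure is imposed on each factor.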
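The Hadamard case gives a concrete view of a butterfly factorization: the fast Walsh-Hadamard transform applies $J = \log_2 n$ sparse factors in sequence, each with two non-zeros per row and column, for $O(n \log n)$ total work instead of the $O(n^2)$ dense product. A minimal sketch assuming the Sylvester ordering of the Hadamard matrix (the factor ordering convention may differ from the paper's); function names are hypothetical.

```python
def hadamard_dense(J):
    """Sylvester Hadamard matrix of size 2^J with +-1 entries, built recursively."""
    H = [[1]]
    for _ in range(J):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def fwht(x):
    """Multiply x by the 2^J x 2^J Sylvester Hadamard matrix in O(n log n).

    Each pass of the while loop applies one sparse butterfly factor (every
    output entry combines exactly two inputs), so H = X^J ... X^1.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    return y
```

Comparing `fwht(x)` with the dense product `hadamard_dense(J) @ x` confirms that the $J$ butterfly passes multiply out to the full matrix.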

Sketching Algorithms and Lower Bounds for Ridge Regression

Praneeth Kacham, David P. Woodruff

Category: Machine Learning

2022-04-13

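We give a sketching-based iterative algorithm that computes a $1+\varepsilon$ approximate solution for the ridge regression problem $\min_x \|Ax - b\|_2^2 + \lambda\|x\|_2^2$, where $A \in \mathbb{R}^{n \times d}$ with $d \ge n$. For a constant number of iterations (requiring a constant number of passes over the input), our algorithm improves upon earlier work (Chowdhury et al.) by requiring only that the sketching matrix have a weaker Approximate Matrix Multiplication (AMM) guarantee that depends on $\varepsilon$, along with a constant subspace embedding guarantee. In contrast, the earlier work requires the sketching matrix to have a subspace embedding guarantee that depends on $\varepsilon$. For example, to produce a $1+\varepsilon$ approximate solution in $1$ iteration, which requires $2$ passes over the input, our algorithm requires the OSNAP embedding to have $m = O(n\sigma^2/\lambda\varepsilon)$ rows with a sparsity parameter $s = O(\log n)$, whereas the earlier algorithm of Chowdhury et al., with the same number of OSNAP rows, requires a sparsity $s = O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log n)$, where $\sigma = \|A\|_2$ is the spectral norm of the matrix $A$. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression. Finally, we show that the sketch size required by our algorithm is essentially optimal for a natural framework of algorithms for ridge regression, by proving lower bounds on oblivious sketching matrices for AMM. The sketch size lower bounds for AMM may be of independent interest.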
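As background for the $d \ge n$ regime (not the paper's sketching algorithm, which approximates this solution), the exact ridge minimizer can be written as $x^\star = A^\top (AA^\top + \lambda I)^{-1} b$, which only requires solving an $n \times n$ system. A minimal sketch assuming a small dense $A$ and a plain Gaussian-elimination solver; the function names are hypothetical.

```python
def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(a) for a in M[i]] + [float(v[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z

def ridge_dual(A, b, lam):
    """Exact ridge solution x = A^T (A A^T + lam I)^{-1} b for an n x d matrix A.

    Equivalent to (A^T A + lam I)^{-1} A^T b, but solves an n x n system,
    which is the cheaper side when d >= n.
    """
    n, d = len(A), len(A[0])
    K = [[sum(A[i][k] * A[j][k] for k in range(d)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, b)
    return [sum(A[i][k] * alpha[i] for i in range(n)) for k in range(d)]
```

A quick correctness check is that the gradient $A^\top(Ax^\star - b) + \lambda x^\star$ vanishes; forming $AA^\top$ costs $O(n^2 d)$, which is the kind of cost a sketching-based solver aims to reduce.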

Precise expressions for random projections: Low-rank approximation and randomized Newton

Michał Dereziński, Feynman Liang, Zhenyu Liao, Michael W. Mahoney

Category: Machine Learning | Machine Learning (Statistics)

2020-06-18

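It is often desirable to reduce the dimensionality of a large dataset by projecting it onto a low-dimensional subspace. Matrix sketching has emerged as a powerful technique for performing such dimensionality reduction very efficiently. Even though there is an extensive literature on the worst-case performance of sketching, existing guarantees are typically very different from what is observed in practice. We exploit recent developments in the spectral analysis of random matrices to develop novel techniques that provide provably accurate expressions for the expected value of random projection matrices obtained via sketching. These expressions can be used to characterize the performance of dimensionality reduction in a variety of common machine learning tasks, ranging from low-rank approximation to iterative stochastic optimization. Our results apply to several popular sketching methods, including Gaussian and Rademacher sketches, and they enable precise analysis of these methods in terms of spectral properties of the data. Empirical results show that the expressions we derive reflect the practical performance of these sketching methods, down to lower-order effects and even constant factors.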

A Unifying Theory of Distance from Calibration

Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran

Category: Machine Learning

2022-11-30

We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors. While the notion of perfect calibration is well understood, there is no consensus on how to quantify the distance from perfect calibration. Numerous calibration measures have been proposed in the literature, but it is unclear how they compare to each other, and many popular measures such as Expected Calibration Error (ECE) fail to satisfy basic properties like continuity. We present a rigorous framework for analyzing calibration measures, inspired by the literature on property testing. We propose a ground-truth notion of distance from calibration: the $\ell_1$ distance to the nearest perfectly calibrated predictor. We define a consistent calibration measure as one that is a polynomial factor approximation to this distance. Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently: smooth calibration, interval calibration, and Laplace kernel calibration. The former two give quadratic approximations to the ground-truth distance, which we show is information-theoretically optimal. Our work thus establishes fundamental lower and upper bounds on measuring distance to calibration, and also provides theoretical justification for preferring certain metrics (like Laplace kernel calibration) in practice.
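The continuity failure of ECE can be seen on a toy example. A constant predictor $f \equiv 0.5$ on balanced labels is perfectly calibrated (ECE $= 0$). Moving every prediction by only $\delta$ toward its own label keeps the predictor within $\ell_1$ distance $\delta$ of that calibrated predictor, yet each distinct prediction value now determines the label, so ECE jumps to $0.5 - \delta$. The sketch below assumes the exact-value definition of ECE (grouping by distinct prediction values rather than by bins); the toy data are hypothetical.

```python
from collections import defaultdict

def ece(preds, labels):
    """Expected calibration error, grouping samples by their exact prediction value.

    ECE = sum over distinct values v of P(f = v) * |E[y | f = v] - v|.
    Binned variants instead group predictions into intervals.
    """
    groups = defaultdict(list)
    for p, y in zip(preds, labels):
        groups[p].append(y)
    n = len(preds)
    return sum(len(ys) / n * abs(sum(ys) / len(ys) - v) for v, ys in groups.items())
```

With `labels = [1, 0] * 50`, the constant predictions `[0.5] * 100` give an ECE of 0, while predictions of `0.51` for positive and `0.49` for negative examples give an ECE of about `0.49`, even though the two predictors differ by only 0.01 per example.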

Hardness and Algorithms for Robust and Sparse Optimization

Eric Price, Sandeep Silwal, Samson Zhou

Category: Machine Learning

2022-06-29

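We explore algorithms and limitations for sparse optimization problems such as sparse linear regression and robust linear regression. The goal of the sparse linear regression problem is to identify a small number of key features, while the goal of the robust linear regression problem is to identify a small number of erroneous measurements. Specifically, the sparse linear regression problem seeks a $k$-sparse vector $x \in \mathbb{R}^d$ to minimize $\|Ax - b\|_2$, given an input matrix $A \in \mathbb{R}^{n \times d}$ and a target vector $b \in \mathbb{R}^n$, while the robust linear regression problem seeks a set $S$ that ignores at most $k$ rows and a vector $x$ minimizing $\|(Ax - b)_S\|_2$. We first show bicriteria hardness of approximation for robust regression, building on the work of [OWZ15], which implies a similar result for sparse regression. We further show fine-grained hardness of robust regression through a reduction from the $k$-clique conjecture. On the positive side, we give an algorithm for robust regression that achieves arbitrarily accurate additive error and uses runtime that closely matches the lower bound from the fine-grained hardness result, as well as an algorithm for sparse regression with similar runtime. Both our upper and lower bounds rely on a general reduction from robust linear regression to sparse regression that we introduce. Our algorithms, inspired by the 3SUM problem, use approximate nearest neighbor data structures and may be of independent interest for solving sparse optimization problems. For instance, we demonstrate that our techniques can also be used for the sparse PCA problem.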
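To make the sparse regression objective concrete, the trivial baseline is exhaustive search over supports. For $k = 1$ each candidate support $\{j\}$ has a closed form: the best coefficient is $\langle a_j, b\rangle / \|a_j\|^2$ and the squared residual is $\|b\|^2 - \langle a_j, b\rangle^2/\|a_j\|^2$. The sketch below implements only this $k = 1$ brute force; general $k$ would enumerate all $\binom{d}{k}$ supports with a least-squares solve each. It is not the paper's approximate-nearest-neighbor algorithm, and the function name is hypothetical.

```python
def best_one_sparse(A, b):
    """Exhaustive search for the 1-sparse x minimizing ||A x - b||_2.

    Returns (column index j, coefficient, residual norm). All-zero columns
    are skipped; at least one non-zero column is assumed.
    """
    n, d = len(A), len(A[0])
    bb = sum(v * v for v in b)
    best = None
    for j in range(d):
        col = [A[i][j] for i in range(n)]
        cc = sum(v * v for v in col)
        if cc == 0:
            continue
        cb = sum(c * v for c, v in zip(col, b))
        res_sq = max(bb - cb * cb / cc, 0.0)  # guard against tiny negative rounding
        if best is None or res_sq < best[2]:
            best = (j, cb / cc, res_sq)
    j, coef, res_sq = best
    return j, coef, res_sq ** 0.5
```

For example, if $b$ equals $3$ times the second column of $A$, the search returns that column with coefficient $3$ and residual $0$.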

Sub-quadratic Algorithms for Kernel Matrices via Kernel Density Estimation

Ainesh Bakshi, Piotr Indyk, Praneeth Kacham, Sandeep Silwal, Samson Zhou

Category: Machine Learning

2022-12-01

Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency: given $n$ input points, most kernel-based algorithms need to materialize the full $n \times n$ kernel matrix before performing any subsequent computation, thus incurring $\Omega(n^2)$ runtime. Breaking this quadratic barrier for various problems has therefore been a subject of extensive research efforts. We break the quadratic barrier and obtain subquadratic time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving linear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix. In particular, we develop efficient reductions from weighted vertex and weighted edge sampling on kernel graphs, simulating random walks on kernel graphs, and importance sampling on matrices to Kernel Density Estimation, and show that we can generate samples from these distributions in sublinear (in the support of the distribution) time. Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsification, where we observe a 9x decrease in the number of kernel evaluations over baselines for LRA and a 41x reduction in the graph size for spectral sparsification.
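The quantity a Kernel Density Estimation query returns is a row sum of the kernel matrix. The sketch below shows that quantity computed exactly (linear time per row, hence quadratic time for all rows) next to a naive uniform-sampling estimate. It assumes a Gaussian kernel with a fixed bandwidth, and it is not the KDE data structure the paper builds on: uniform sampling is unbiased but can have high relative variance when a row sum is small. Function names are hypothetical.

```python
import math
import random

def gaussian_kernel(x, y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / bandwidth^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / bandwidth ** 2)

def kde_row_sum_exact(points, i, bandwidth=1.0):
    """Row i of the kernel matrix, summed: O(n) per row, O(n^2) for all rows."""
    return sum(gaussian_kernel(points[i], p, bandwidth) for p in points)

def kde_row_sum_sampled(points, i, num_samples, bandwidth=1.0, seed=0):
    """Unbiased estimate of the same row sum from uniformly sampled columns."""
    rng = random.Random(seed)
    n = len(points)
    total = sum(gaussian_kernel(points[i], points[rng.randrange(n)], bandwidth)
                for _ in range(num_samples))
    return n * total / num_samples
```

Weighted vertex sampling on the kernel graph, one of the reductions listed above, needs exactly these row sums as sampling weights.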

Robust Sparse Mean Estimation via Sum of Squares

Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Ankit Pensia, Thanasis Pittas

Category: Machine Learning | Machine Learning (Statistics)

2022-06-07

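We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers. Prior work obtained sample- and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For distributions on $\mathbb{R}^d$ with "certifiably bounded" $t$-th moments and sufficiently light tails, our algorithm achieves error of $O(\epsilon^{1-1/t})$ with sample complexity $m = (k \log d)^{O(t)}/\epsilon^{2-2/t}$. For the special case of the Gaussian distribution, our algorithm achieves near-optimal error of $\tilde{O}(\epsilon)$ with sample complexity $m = O(k^4 \,\mathrm{polylog}(d))/\epsilon^2$. Our algorithms follow the Sum-of-Squares-based, proofs-to-algorithms approach. We complement our upper bounds with Statistical Query and low-degree polynomial testing lower bounds, providing evidence that the sample-time-error tradeoffs achieved by our algorithms are qualitatively the best possible.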

Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization

Benjamin Recht, Maryam Fazel, Pablo A. Parrilo

Category:

2007-06-28

The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NP-hard, because it contains vector cardinality minimization as a special case. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability, provided the codimension of the subspace is Ω(r(m + n) log mn), where m, n are the dimensions of the matrix, and r is its rank. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this pre-existing concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization. We also discuss several algorithmic approaches to solving the norm minimization relaxations, and illustrate our results with numerical examples.
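The convex relaxation described above can be written out explicitly. In the notation below, $X \in \mathbb{R}^{m \times n}$ is the unknown matrix, $\mathcal{A}: \mathbb{R}^{m \times n} \to \mathbb{R}^p$ is the linear map defining the constraints, and $b \in \mathbb{R}^p$ (the symbols $\mathcal{A}$, $b$, $p$, $W_1$, $W_2$ are introduced here for illustration). The nuclear norm is the sum of singular values, and the relaxed problem can be posed as a semidefinite program:

```latex
\begin{aligned}
&\text{rank minimization:} && \min_{X} \ \operatorname{rank}(X) && \text{s.t. } \mathcal{A}(X) = b,\\
&\text{nuclear norm relaxation:} && \min_{X} \ \|X\|_* = \textstyle\sum_i \sigma_i(X) && \text{s.t. } \mathcal{A}(X) = b,\\
&\text{equivalent SDP:} && \min_{X, W_1, W_2} \ \tfrac{1}{2}\big(\operatorname{tr} W_1 + \operatorname{tr} W_2\big) && \text{s.t. } \begin{bmatrix} W_1 & X \\ X^\top & W_2 \end{bmatrix} \succeq 0, \ \ \mathcal{A}(X) = b.
\end{aligned}
```

When $X$ is restricted to be diagonal, the rank becomes the number of non-zero entries and the nuclear norm becomes the $\ell_1$ norm, which is the cardinality-minimization special case mentioned in the abstract.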
