Smart Paper Notes

Generative modeling via tensor train sketching

Y. Hur , J. G. Hoskins , M. Lindsey , E. M. Stoudenmire , Y. Khoo

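Category: Machine Learning

2022-02-23

In this paper we introduce a sketching algorithm for constructing a tensor train representation of a probability density from its samples. Our method deviates from the standard recursive SVD-based procedure for constructing a tensor train. Instead, we formulate and solve a sequence of small linear systems for the individual tensor train cores. This approach can avoid the curse of dimensionality that threatens both the algorithmic and sample complexities of the recovery problem. Specifically, for Markov models, we prove that the tensor cores can be recovered with a sample complexity that is constant with respect to the dimension. Finally, we illustrate the performance of the method with several numerical experiments.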


Generative Modeling via Tree Tensor Network States

Xun Tang , Yoonhaeng Hur , Yuehaw Khoo , Lexing Ying

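Categories: Machine Learning (Statistics) | Machine Learning

2022-09-03

In this paper, we present a density estimation framework based on tree tensor-network states. The proposed method consists of determining the tree topology with the Chow-Liu algorithm and obtaining, via sketching techniques, a linear system of equations that defines the tensor-network components. Novel choices of sketch functions are developed to handle graphical models that contain loops. Sample complexity guarantees are provided and further corroborated by numerical experiments.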


High-dimensional density estimation with tensorizing flow

Yinuo Ren , Hongli Zhao , Yuehaw Khoo , Lexing Ying

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-01

We propose the tensorizing flow method for estimating high-dimensional probability density functions from the observed data. The method is based on tensor-train and flow-based generative modeling. Our method first efficiently constructs an approximate density in the tensor-train form via solving the tensor cores from a linear system based on the kernel density estimators of low-dimensional marginals. We then train a continuous-time flow model from this tensor-train density to the observed empirical distribution by performing a maximum likelihood estimation. The proposed method combines the optimization-less feature of the tensor-train with the flexibility of the flow-based generative models. Numerical results are included to demonstrate the performance of the proposed method.


Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements

Tian Tong , Cong Ma , Ashley Prater-Bennette , Erin Tripp , Yuejie Chi

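Categories: Machine Learning | Machine Learning (Statistics)

2021-04-29

Tensors, which provide a powerful and flexible model for representing multi-attribute data and multi-way interactions, play an indispensable role in modern data science across various fields in science and engineering. A fundamental task is to faithfully recover the tensor from highly incomplete measurements in a statistically and computationally efficient manner. Harnessing the low-rank structure of tensors in the Tucker decomposition, this paper develops a scaled gradient descent (ScaledGD) algorithm to directly recover the tensor factors with tailored spectral initializations, and shows that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems, tensor completion and tensor regression, as soon as the sample size is above the order of $n^{3/2}$ ignoring other parameter dependencies, where $n$ is the dimension of the tensor. This leads to an extremely scalable approach to low-rank tensor estimation compared with prior art, which suffers from at least one of the following drawbacks: extreme sensitivity to ill-conditioning, high per-iteration costs in terms of memory and computation, or poor sample complexity guarantees. To the best of our knowledge, ScaledGD is the first algorithm that achieves near-optimal statistical and computational complexities simultaneously for low-rank tensor completion with the Tucker decomposition. Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the underlying symmetry in low-rank tensor factorization.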


Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

Nathan Halko , Per-Gunnar Martinsson , Joel A. Tropp

Category:

2009-09-22

Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
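The two-stage scheme described in this abstract (random sampling to find a subspace, then a deterministic factorization of the compressed matrix) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' reference code: the function name `randomized_svd`, the `oversample` parameter, and the synthetic test matrix are assumptions made for the example. It uses a dense Gaussian test matrix, which costs O(mnk) flops; the O(mn log(k)) count quoted above relies on a structured random test matrix instead.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate the k dominant singular triplets of A by random sampling."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Stage A: sample the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)  # orthonormal basis of the sampled range
    # Stage B: compress A to that subspace, then factor the small matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

# Usage on a synthetic matrix of exact rank 5 (hypothetical test data).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, k=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(f"relative error: {rel_err:.2e}")
```

Because the test matrix has exact rank 5 and the sketch uses 15 columns, the sampled subspace captures the full range and the reconstruction error is at the level of rounding.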


Tensor-train decomposition

Category:

A simple nonrecursive form of the tensor decomposition in d dimensions is presented. It does not inherently suffer from the curse of dimensionality, it has asymptotically the same number of parameters as the canonical decomposition, but it is stable and its computation is based on low-rank approximation of auxiliary unfolding matrices. The new form gives a clear and convenient way to implement all basic operations efficiently. A fast rounding procedure is presented, as well as basic linear algebra operations. Examples showing the benefits of the decomposition are given, and the efficiency is demonstrated by the computation of the smallest eigenvalue of a 19-dimensional operator.
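The phrase "low-rank approximation of auxiliary unfolding matrices" can be made concrete with a short sketch of the sequential-SVD construction commonly called TT-SVD. The helper names `tt_svd` and `tt_to_full`, the fixed `max_rank` truncation rule, and the test tensor are assumptions for illustration; the paper itself truncates by a prescribed accuracy rather than a fixed rank.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Sequential truncated SVDs of unfoldings; returns cores of shape (r_prev, n_k, r_next)."""
    dims = tensor.shape
    cores = []
    r_prev = 1
    C = tensor.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        # Carry the remaining factor on to the next unfolding.
        C = (s[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract the cores back into a dense tensor (for checking only)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])

# sin(i + j + k + l) has TT ranks equal to 2, so rank-2 cores reproduce it.
X = np.sin(np.indices((4, 5, 6, 3)).sum(axis=0))
cores = tt_svd(X, max_rank=2)
err = np.linalg.norm(tt_to_full(cores) - X) / np.linalg.norm(X)
print([c.shape for c in cores], f"relative error: {err:.2e}")
```

The dense tensor has 360 entries while the four cores hold 8 + 20 + 24 + 6 = 58 numbers, which is the parameter saving the abstract refers to.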


Low Rank Approximation for General Tensor Networks

Arvind V. Mahankali , David P. Woodruff , Ziyu Zhang

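Category: Machine Learning

2022-07-15

We study the problem of approximating a given tensor with $q$ modes $A \in \mathbb{R}^{n \times \ldots \times n}$ with an arbitrary tensor network of rank $k$, that is, a graph $G = (V, E)$, where $|V| = q$, together with a collection of tensors $\{U_v \mid v \in V\}$ which are contracted in the manner specified by $G$ to obtain a tensor $T$. For each mode of $U_v$ corresponding to an edge incident to $v$, the dimension is $k$, and we wish to find $U_v$ such that the Frobenius norm distance between $T$ and $A$ is minimized. This generalizes a number of well-known tensor network decompositions, such as the Tensor Train, Tensor Ring, Tucker, and PEPS decompositions. We approximate $A$ by a binary tree network $T'$ with $O(q)$ cores, such that the dimension on each edge of this network is at most $\widetilde{O}(k^{O(dt)} \cdot q/\varepsilon)$, where $d$ is the maximum degree of $G$ and $t$ is its treewidth, such that $\|A - T'\|_F^2 \leq (1 + \varepsilon) \|A - T\|_F^2$. The running time of our algorithm is $O(q \cdot \text{nnz}(A)) + n \cdot \text{poly}(k^{dt} q/\varepsilon)$, where $\text{nnz}(A)$ is the number of nonzero entries of $A$. Our algorithm is based on a new dimensionality reduction technique for tensor decomposition which may be of independent interest. We also develop fixed-parameter tractable $(1 + \varepsilon)$-approximation algorithms for Tensor Train and Tucker decompositions, improving the running time of Song, Woodruff and Zhong (SODA, 2019) and avoiding the use of generic polynomial system solvers. We show that our algorithms have a nearly optimal dependence on $1/\varepsilon$, assuming that there is no $O(1)$-approximation algorithm for the $2 \to 4$ norm with better running time than brute force. Finally, we give additional results for Tucker decomposition with robust loss functions, and fixed-parameter tractable CP decomposition.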


Perturbation Analysis of Randomized SVD and its Applications to High-dimensional Statistics

Yichi Zhang , Minh Tang

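Category: Machine Learning (Statistics)

2022-03-19

Randomized singular value decomposition (RSVD) is a class of computationally efficient algorithms for computing the truncated SVD of large data matrices. Given an $n \times n$ symmetric matrix $\mathbf{M}$, the prototypical RSVD algorithm outputs an approximation of the $k$ leading singular vectors of $\mathbf{M}$ by computing the SVD of $\mathbf{M}^{g} \mathbf{G}$; here $g \geq 1$ is an integer and $\mathbf{G} \in \mathbb{R}^{n \times k}$ is a random Gaussian sketching matrix. In this paper we study the statistical properties of RSVD under a general "signal-plus-noise" framework, i.e., the observed matrix $\hat{\mathbf{M}}$ is assumed to be an additive perturbation of some true but unknown signal matrix $\mathbf{M}$. We first derive upper bounds for the $\ell_2$ (spectral norm) and $\ell_{2 \to \infty}$ (maximum row-wise $\ell_2$ norm) distances between the approximate singular vectors of $\hat{\mathbf{M}}$ and the true singular vectors of the signal matrix $\mathbf{M}$. These upper bounds depend on the signal-to-noise ratio (SNR) and the number of power iterations $g$. A phase transition phenomenon is observed in which a smaller SNR requires larger values of $g$ to guarantee convergence of the $\ell_2$ and $\ell_{2 \to \infty}$ distances. We also show that the thresholds for $g$ where these phase transitions occur are sharp whenever the noise matrices satisfy a certain trace growth condition. Finally, we derive normal approximations for the row-wise fluctuations of the approximate singular vectors and the entrywise fluctuations of the approximate matrix. We illustrate our theoretical results by deriving nearly optimal performance guarantees for RSVD when applied to three statistical inference problems, namely community detection, matrix completion, and principal component analysis with missing data.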
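The prototypical algorithm in this abstract, taking the SVD of $\hat{\mathbf{M}}^{g} \mathbf{G}$, is short enough to write out. The sketch below is an illustration under assumptions: the names `rsvd_power` and `subspace_dist`, the rank-2 signal with eigenvalues 100 and 80, and the noise scale are all made up for the example and are not taken from the paper. Practical implementations usually re-orthonormalize between multiplications for numerical stability; that step is omitted here to mirror the formula.

```python
import numpy as np

def rsvd_power(M_hat, k, g, seed=0):
    """Left singular vectors of M_hat^g @ G, with G an n-by-k Gaussian sketch."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((M_hat.shape[0], k))
    for _ in range(g):
        Y = M_hat @ Y  # after the loop, Y = M_hat^g @ G
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U

def subspace_dist(U, U_true):
    """Spectral norm of the part of U_true that lies outside span(U)."""
    return np.linalg.norm(U_true - U @ (U.T @ U_true), 2)

# Hypothetical signal-plus-noise instance: rank-2 signal, symmetric Gaussian noise.
rng = np.random.default_rng(1)
n, k = 200, 2
U0, _ = np.linalg.qr(rng.standard_normal((n, k)))
M = U0 @ np.diag([100.0, 80.0]) @ U0.T
W = 0.05 * rng.standard_normal((n, n))
M_hat = M + (W + W.T) / 2

dist = {g: subspace_dist(rsvd_power(M_hat, k, g), U0) for g in (1, 3)}
print(dist)
```

Comparing the two printed distances gives a small-scale view of how the number of power iterations $g$ affects the recovered subspace at a fixed signal-to-noise ratio.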


More Efficient Sampling for Tensor Decomposition With Worst-Case Guarantees

Osman Asif Malik

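Category: Machine Learning

2021-10-14

Recent papers have developed alternating least squares (ALS) methods for CP and tensor ring decomposition with a per-iteration cost that is sublinear in the number of input tensor entries for low-rank decomposition. In this paper, we propose sampling-based ALS methods for the CP and tensor ring decompositions whose cost has no exponential dependence on the number of tensor modes, thereby significantly improving on the previous state of the art. We provide a detailed theoretical analysis and also apply the methods in a feature extraction experiment.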


Tensor train completion: local recovery guarantees via Riemannian optimization

Stanislav Budzinskiy , Nikolai Zamarashkin

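Category: Machine Learning

2021-10-08

In this work we estimate the number of randomly selected elements of a tensor that, with high probability, guarantees local convergence of Riemannian gradient descent for tensor train completion. We derive a new bound for the orthogonal projections onto the tangent spaces based on the harmonic mean of the unfoldings' singular values, and introduce a notion of core coherence for tensor trains. We also extend the results to tensor train completion with side information and obtain the corresponding local convergence guarantees.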


Subquadratic Kronecker Regression with Applications to Tensor Decomposition

Matthew Fahrbach , Thomas Fu , Mehrdad Ghadiri

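Category: Machine Learning

2022-09-11

Kronecker regression is a highly structured least squares problem $\min_{\mathbf{x}} \lVert \mathbf{K}\mathbf{x} - \mathbf{b} \rVert_{2}^2$, where the design matrix $\mathbf{K} = \mathbf{A}^{(1)} \otimes \cdots \otimes \mathbf{A}^{(N)}$ is a Kronecker product of factor matrices. This regression problem arises in each step of the widely used alternating least squares (ALS) algorithm for computing the Tucker decomposition of a tensor. We present the first subquadratic-time algorithm for solving Kronecker regression to a $(1+\varepsilon)$-approximation that avoids the exponential term $O(\varepsilon^{-N})$ in the running time. Our techniques combine leverage score sampling and iterative methods. By extending our approach to block-design matrices where one block is a Kronecker product, we also achieve subquadratic-time algorithms for (1) Kronecker ridge regression and (2) updating the factor matrices of a Tucker decomposition in ALS, which is not a pure Kronecker regression problem, thereby improving the running time of all steps of Tucker ALS. We demonstrate the speed and accuracy of this Kronecker regression algorithm on synthetic data and real-world image tensors.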
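To make the structure of the problem concrete, the sketch below solves a two-factor Kronecker regression exactly without ever forming $\mathbf{K}$, using the standard identity $(\mathbf{A}^{(1)} \otimes \mathbf{A}^{(2)})^{+} = \mathbf{A}^{(1)+} \otimes \mathbf{A}^{(2)+}$. This is the classical exact baseline, not the subquadratic sampling algorithm of the paper; the function name `kron_lstsq` and the random test data are assumptions for illustration.

```python
import numpy as np

def kron_lstsq(A1, A2, b):
    """Solve min_x ||(A1 kron A2) x - b||_2 without forming the Kronecker product."""
    m1, n1 = A1.shape
    m2, n2 = A2.shape
    # With NumPy's row-major flattening, (A1 kron A2) vec(X) = vec(A1 @ X @ A2.T).
    B = b.reshape(m1, m2)
    X = np.linalg.pinv(A1) @ B @ np.linalg.pinv(A2).T
    return X.reshape(n1 * n2)

# Check against a dense solve on small random factors (hypothetical data).
rng = np.random.default_rng(0)
A1 = rng.standard_normal((8, 3))
A2 = rng.standard_normal((7, 4))
b = rng.standard_normal(8 * 7)

x_fast = kron_lstsq(A1, A2, b)
x_ref = np.linalg.lstsq(np.kron(A1, A2), b, rcond=None)[0]
print(np.max(np.abs(x_fast - x_ref)))
```

The factored solve touches only the small factor matrices and the reshaped right-hand side, whereas the dense reference builds the full 56 × 12 design matrix.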


Tensor decompositions for learning latent variable models

Anima Anandkumar , Rong Ge , Daniel Hsu , Sham M. Kakade , Matus Telgarsky

Category:

2012-10-29

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation) which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
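The power iteration mentioned in this abstract can be illustrated on a synthetic orthogonally decomposable tensor $T = \sum_i \lambda_i \, v_i \otimes v_i \otimes v_i$. The sketch below is the plain iteration $\theta \leftarrow T(I, \theta, \theta) / \lVert T(I, \theta, \theta) \rVert$ only; the robust method analyzed in the paper adds random restarts and deflation, which are left out. The function name, the weights `[3, 2, 1]`, and the dimensions are assumptions for the example.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=50, seed=0):
    """Plain power iteration for a symmetric third-order tensor."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(T.shape[0])
    theta /= np.linalg.norm(theta)
    for _ in range(n_iter):
        theta = np.einsum("ijk,j,k->i", T, theta, theta)  # T(I, theta, theta)
        theta /= np.linalg.norm(theta)
    lam = np.einsum("ijk,i,j,k->", T, theta, theta, theta)  # T(theta, theta, theta)
    return lam, theta

# Synthetic tensor with orthonormal components (hypothetical test data).
rng = np.random.default_rng(1)
V, _ = np.linalg.qr(rng.standard_normal((6, 3)))  # columns are the v_i
weights = np.array([3.0, 2.0, 1.0])
T = np.einsum("r,ir,jr,kr->ijk", weights, V, V, V)

lam, theta = tensor_power_iteration(T)
overlaps = V.T @ theta
j = int(np.argmax(np.abs(overlaps)))
print(f"converged to component {j} with eigenvalue {lam:.6f}")
```

Which component the iteration lands on depends on the random start, which is why the full method repeats it from several starting points and deflates the tensor after each recovered pair.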


A Non-Asymptotic Framework for Approximate Message Passing in Spiked Models

Gen Li , Yuting Wei

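Categories: Machine Learning | Machine Learning (Statistics)

2022-08-05

Approximate message passing (AMP) is an effective iterative paradigm for solving high-dimensional statistical problems. However, existing AMP theory falls short of predicting the AMP dynamics when the number of iterations surpasses $o\big(\frac{\log n}{\log \log n}\big)$ (with $n$ the problem dimension). To address this inadequacy, this paper develops a non-asymptotic framework for understanding AMP in spiked matrix estimation. Built upon a new decomposition of AMP updates and controllable residual terms, we lay out an analysis recipe to characterize the finite-sample behavior of AMP in the presence of an independent initialization, which is further generalized to allow for spectral initialization. As two concrete consequences of the proposed analysis recipe: (i) when solving $\mathbb{Z}_2$ synchronization, we predict the behavior of spectrally initialized AMP for up to $O\big(\frac{n}{\mathrm{poly} \log n}\big)$ iterations, showing that the algorithm succeeds without the need for a subsequent refinement stage (as conjectured recently by Celentano et al. (2021)); (ii) we characterize the non-asymptotic behavior of AMP in sparse PCA (in the spiked Wigner model) for a broad range of signal-to-noise ratios.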


Clustering Mixtures with Almost Optimal Separation in Polynomial Time

Jerry Li , Allen Liu

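Categories: Machine Learning | Machine Learning (Statistics)

2021-12-01

We consider the problem of clustering mixtures of mean-separated Gaussians in high dimensions. We are given samples from a mixture of $k$ identity covariance Gaussians, such that the minimum pairwise distance between any two means is at least $\Delta$, for some parameter $\Delta > 0$, and the goal is to recover the ground truth clustering of these samples. It is folklore that separation $\Delta = \Theta(\sqrt{\log k})$ is both necessary and sufficient to recover a good clustering, at least information-theoretically. However, the estimators which achieve this guarantee are inefficient. We give the first algorithm which runs in polynomial time and almost matches this guarantee. More precisely, we give an algorithm which takes polynomially many samples and time, and which can successfully recover a good clustering, so long as the separation is $\Delta = \Omega(\log^{1/2 + c} k)$, for any $c > 0$. Previously, polynomial time algorithms were only known for this problem when the separation was polynomial in $k$, and all algorithms which could tolerate $\textsf{poly}(\log k)$ separation required quasipolynomial time. We also extend our result to mixtures of translations of a distribution which satisfies the Poincaré inequality, under additional mild assumptions. Our main technical tool, which we believe is of independent interest, is a novel way to implicitly represent and estimate high-degree moments of a distribution, which allows us to extract important information about high-degree moments without ever writing down the full moment tensors explicitly.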


On the representation and learning of monotone triangular transport maps

Ricardo Baptista , Youssef Marzouk , Olivier Zahm

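Categories: Machine Learning (Statistics) | Machine Learning

2020-09-22

Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling, and beyond. Monotone triangular transport maps, which are approximations of the Knothe-Rosenblatt (KR) rearrangement, are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on properties of the optimization problem that arises in learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima; and we show, for target distributions satisfying certain tail conditions, that the unique global minimizer corresponds to the KR map. Given a sample from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.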


Tractability from overparametrization: The example of the negative perceptron

Andrea Montanari , Yiqiao Zhong , Kangjie Zhou

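Category: Machine Learning

2021-10-28

In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i, y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i \in \{+1, -1\}$ is a binary label. The data are not linearly separable, and hence we content ourselves with finding a linear classifier with the largest possible negative margin. In other words, we want to find a unit norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i \le n} y_i \langle {\boldsymbol \theta}, {\boldsymbol x}_i \rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n, d \to \infty$ with $n/d \to \delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or, equivalently, on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d \le \delta_{\text{s}}(\kappa) - \varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d \ge \delta_{\text{s}}(\kappa) + \varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match to the leading order as $\kappa \to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.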


Error Analysis of Tensor-Train Cross Approximation

Zhen Qin , Alexander Lidiak , Zhexuan Gong , Gongguo Tang , Michael B. Wakin , Zhihui Zhu

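Category: Machine Learning

2022-07-09

Tensor train decomposition is widely used in machine learning and quantum physics due to its concise representation of high-dimensional tensors, overcoming the curse of dimensionality. Cross approximation, originally developed for representing a matrix from a set of selected rows and columns, is an efficient method for constructing a tensor train decomposition of a tensor from few of its entries. While tensor train cross approximation has achieved remarkable performance in practical applications, its theoretical analysis, in particular regarding the error of the approximation, is so far lacking. To our knowledge, existing results only provide element-wise approximation accuracy guarantees, which lead to a very loose bound when extended to the entire tensor. In this paper, we bridge this gap by providing accuracy guarantees in terms of the entire tensor for both exact and noisy measurements. Our results illustrate how the choice of selected subtensors affects the quality of the cross approximation, and that the approximation error caused by model error and/or measurement error may not grow exponentially with the order of the tensor. These results are verified by numerical experiments, and may have important implications for the usefulness of cross approximations for high-order tensors, such as those encountered in the description of quantum many-body states.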


Tight bounds on the hardness of learning simple nonparametric mixtures

Bryon Aragam , Wai Ming Tai

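Categories: Machine Learning | Machine Learning (Statistics)

2022-03-28

We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $f$ where $$ f = \sum_{i=1}^k w_i f_i, \quad \sum_{i=1}^k w_i = 1, \quad w_i > 0 $$ and we are interested in learning each component $f_i$. Without any assumptions on $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_i) \cap \text{supp}(\nu_j) = \emptyset$. Our main result shows that $(\frac{1}{\varepsilon})^{\Omega(\log \log \frac{1}{\varepsilon})}$ samples are required for estimating each $f_i$. Unlike parametric mixtures, the difficulty does not arise from the order $k$ or small weights $w_i$, and unlike nonparametric density estimation it does not arise from the curse of dimensionality, irregularity, or inhomogeneity. The proof relies on a fast rate for approximation with Gaussians, which may be of independent interest. To show this is tight, we also propose an algorithm that uses $(\frac{1}{\varepsilon})^{O(\log \log \frac{1}{\varepsilon})}$ samples to estimate each $f_i$. Unlike existing approaches to learning latent variable models based on moment matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem properly lies in between polynomial and exponential, which is not common in learning theory.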


Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales

Jonathan Kelner , Annie Marsden , Vatsal Sharan , Aaron Sidford , Gregory Valiant , Honglin Yuan

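Categories: Machine Learning | Machine Learning (Statistics)

2021-11-04

We provide new gradient-based methods for efficiently solving a broad class of ill-conditioned optimization problems. We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions, and provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square roots of the condition numbers of the components. This complexity bound (which we prove is nearly optimal) can improve almost exponentially on that of accelerated gradient methods, which grow as the square root of the condition number of $f$. Additionally, we provide efficient methods for solving stochastic, quadratic variants of this multiscale optimization problem. Rather than learn the decomposition of $f$ (which would be prohibitively expensive), our methods apply a clean recursive "Big-Step-Little-Step" interleaving of standard methods. The resulting algorithms use $\tilde{\mathcal{O}}(d m)$ space, are numerically stable, and open the door to a more fine-grained understanding of the complexity of convex optimization beyond condition number.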


Exact Matrix Completion via Convex Optimization

Emmanuel J. Candes , Benjamin Recht

Category:

2008-05-29

We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys $m \ge C\, n^{1.2}\, r \log n$ for some positive numerical constant C, then with very high probability, most n × n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
