Kernel machines have sustained continuous progress in the field of quantum chemistry. In particular, they have proven to be successful in the low-data regime of force field reconstruction. This is because many physical invariances and symmetries can be incorporated into the kernel function to compensate for much larger datasets. So far, the scalability of this approach has, however, been hindered by its cubic runtime in the number of training points. While it is known that iterative Krylov subspace solvers can overcome these burdens, they crucially rely on effective preconditioners, which are elusive in practice. Practical preconditioners need to be computationally efficient and numerically robust at the same time. Here, we consider the broad class of Nystr\"om-type methods to construct preconditioners based on successively more sophisticated low-rank approximations of the original kernel matrix, each of which provides a different set of computational trade-offs. All considered methods estimate the relevant subspace spanned by the kernel matrix columns using different strategies to identify a representative set of inducing points. Our comprehensive study covers the full spectrum of approaches, starting from naive random sampling to leverage score estimates and incomplete Cholesky factorizations, up to exact SVD decompositions.
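For concreteness, a Nystr\"om-type preconditioner built from $m$ inducing points can be written as below. The regularization $\lambda > 0$ and the symbols $K_{nm}$ (the $n \times m$ block of kernel columns at the inducing points) and $K_{mm}$ (the kernel matrix among the inducing points) are notation assumed here for illustration; the abstract does not fix them. The inverse follows from the Woodbury identity.

```latex
K \approx \hat{K} = K_{nm} K_{mm}^{-1} K_{nm}^{\top}, \qquad
P = \hat{K} + \lambda I, \qquad
P^{-1} = \lambda^{-1} \left[ I - K_{nm} \left( \lambda K_{mm} + K_{nm}^{\top} K_{nm} \right)^{-1} K_{nm}^{\top} \right]
```

Under these assumptions, applying $P^{-1}$ to a vector costs $O(nm)$ once the $m \times m$ inner matrix has been formed and factorized in $O(nm^2)$ operations. The methods in the study differ in how the inducing points are chosen.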
translated by Google Translate
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
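The first stage described above (random sampling to find a subspace, then compressing the matrix to it) can be sketched in pure Python as follows. This is a minimal illustration under simplifying assumptions: a Gaussian test matrix, no oversampling, and modified Gram-Schmidt instead of a library QR. The function names are hypothetical, and the final deterministic step (for example an SVD of the small matrix `B`) is omitted.

```python
import random


def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def orthonormalize(Y):
    """Modified Gram-Schmidt on the columns of Y; returns Q with orthonormal columns."""
    basis = []
    for v in transpose(Y):
        for q in basis:
            c = sum(a * b for a, b in zip(q, v))
            v = [a - c * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-12:  # drop numerically dependent directions
            basis.append([a / norm for a in v])
    return transpose(basis)


def randomized_range_approx(A, k, seed=0):
    """Return (Q, B) with A approximately equal to Q B, Q of size m x k, B of size k x n."""
    rng = random.Random(seed)
    n = len(A[0])
    # Gaussian test matrix: its image under A samples the range of A.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    Q = orthonormalize(matmul(A, omega))
    B = matmul(transpose(Q), A)  # compress A to the sampled subspace
    return Q, B
```

For a matrix of exact rank `k`, `Q B` reproduces `A` up to rounding; for general matrices the error depends on the decay of the singular values, and a few extra sample columns usually help.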
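Randomly pivoted Cholesky (RPCholesky) is a natural algorithm for computing a rank-k approximation of an N × N positive semidefinite (psd) matrix. RPCholesky can be implemented with just a few lines of code. It requires only (k+1)N entry evaluations and O(k^2 N) additional arithmetic operations. This paper offers the first serious investigation of its experimental and theoretical behavior. Empirically, RPCholesky matches or improves on the performance of alternative algorithms for low-rank psd approximation. Furthermore, RPCholesky provably achieves near-optimal approximation guarantees. The simplicity, effectiveness, and robustness of this algorithm strongly support its use in scientific computing and machine learning applications.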
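A minimal pure-Python sketch of randomly pivoted Cholesky is given below. It assumes the usual formulation in which each pivot is sampled with probability proportional to the diagonal of the current residual matrix. The names `rp_cholesky` and `get_entry` are hypothetical. The matrix is read only through `get_entry`, once for each diagonal entry and once for each entry of the k selected columns, which matches the (k+1)N entry evaluations quoted above.

```python
import random


def rp_cholesky(get_entry, n, k, seed=0):
    """Rank-k approximation A ~ F F^T of an n x n PSD matrix.

    get_entry(i, j) returns A[i][j]. Returns (F, pivots), where F is a list
    of n rows, each holding one coefficient per selected pivot.
    """
    rng = random.Random(seed)
    diag = [get_entry(i, i) for i in range(n)]  # diagonal of the residual
    F = [[] for _ in range(n)]
    pivots = []
    for _ in range(k):
        if sum(diag) <= 1e-12:
            break  # residual is numerically zero
        # Sample a pivot with probability proportional to the residual diagonal.
        s = rng.choices(range(n), weights=diag, k=1)[0]
        # Column s of the residual: A[:, s] minus the part already captured by F.
        col = [get_entry(i, s) - sum(a * b for a, b in zip(F[i], F[s]))
               for i in range(n)]
        if col[s] <= 0.0:
            break  # guard against rounding in a nearly exhausted residual
        scale = col[s] ** 0.5
        pivots.append(s)
        for i in range(n):
            F[i].append(col[i] / scale)
            diag[i] = max(diag[i] - F[i][-1] ** 2, 0.0)
    return F, pivots
```

Because the selected pivots double as a set of landmark points, the same routine can serve to choose columns for a Nystr\"om-type approximation.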
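Preconditioning has long been a staple technique in optimization and machine learning. It often reduces the condition number of the matrix it is applied to, thereby speeding up the convergence of optimization algorithms. Although there are many popular preconditioning techniques in practice, most lack theoretical guarantees for reductions in condition number. In this paper, we study the problem of optimal diagonal preconditioning to achieve maximal reduction in the condition number of any full-rank matrix by scaling its rows or columns separately or simultaneously. We first reformulate the problem as a quasi-convex problem and provide a baseline bisection algorithm that is easy to implement in practice, where each iteration consists of an SDP feasibility problem. Then we propose a polynomial-time potential reduction algorithm with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of a Newton update based on the Nesterov-Todd direction. Our algorithm is based on a formulation of the problem which is a generalized version of the von Neumann optimal growth problem. Next, we specialize to one-sided optimal diagonal preconditioning problems, and demonstrate that they can be formulated as standard dual SDP problems, to which we apply efficient customized solvers and study the empirical performance of our optimal diagonal preconditioners. Our extensive experiments on large matrices demonstrate the practical appeal of optimal diagonal preconditioners at reducing condition numbers compared to heuristics-based preconditioners.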
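To illustrate what diagonal preconditioning does to a condition number, the sketch below applies symmetric Jacobi scaling, $D^{-1/2} A D^{-1/2}$ with $D = \mathrm{diag}(A)$, to a 2 × 2 symmetric positive definite matrix. Jacobi scaling is one of the heuristic baselines, not the optimal SDP-based method studied in the paper, and both function names are hypothetical.

```python
def cond_2x2_spd(a, b, d):
    """Condition number of the SPD matrix [[a, b], [b, d]] from its two eigenvalues."""
    mean = (a + d) / 2.0
    radius = (((a - d) / 2.0) ** 2 + b * b) ** 0.5
    return (mean + radius) / (mean - radius)


def jacobi_scale_2x2(a, b, d):
    """Entries (a', b', d') of D^{-1/2} A D^{-1/2} with D = diag(a, d)."""
    return 1.0, b / (a * d) ** 0.5, 1.0


# A badly scaled example: the diagonal entries differ by two orders of magnitude.
before = cond_2x2_spd(100.0, 1.0, 1.0)
after = cond_2x2_spd(*jacobi_scale_2x2(100.0, 1.0, 1.0))
```

In this example the condition number drops from slightly above 100 to about 1.22. How close such a heuristic gets to the best achievable diagonal scaling is the question the paper addresses.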
FIG. 1. Schematic diagram of a Variational Quantum Algorithm (VQA). The inputs to a VQA are: a cost function C(θ), with θ a set of parameters that encodes the solution to the problem, an ansatz whose parameters are trained to minimize the cost, and (possibly) a set of training data {ρ_k} used during the optimization. Here, the cost can often be expressed in the form in Eq. (3), for some set of functions {f_k}. Also, the ansatz is shown as a parameterized quantum circuit (on the left), which is analogous to a neural network (also shown schematically on the right). At each iteration of the loop one uses a quantum computer to efficiently estimate the cost (or its gradients). This information is fed into a classical computer that leverages the power of optimizers to navigate the cost landscape C(θ) and solve the optimization problem in Eq. (1). Once a termination condition is met, the VQA outputs an estimate of the solution to the problem. The form of the output depends on the precise task at hand. The red box indicates some of the most common types of outputs.
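Symmetry considerations are at the core of the major frameworks used to provide an effective mathematical representation of atomic configurations, which are then used in machine-learning models to predict the properties associated with each structure. In most cases, the models rely on a description of atom-centered environments, and are suitable for learning atomic properties, or global observables that can be decomposed into atomic contributions. However, many quantities that are relevant for quantum mechanical calculations (most notably the single-particle Hamiltonian matrix when written in an atomic-orbital basis) are not associated with a single center, but with two (or more) atoms in the structure. We discuss a family of structural descriptors that generalize the very successful atom-centered density correlation features to the N-center case, and show in particular how this construction can be applied to efficiently learn the matrix elements of the (effective) single-particle Hamiltonian written in an atom-centered orbital basis. These N-center features are fully equivariant, not only with respect to translations and rotations but also with respect to permutations of the indices associated with the atoms, and are suitable for constructing symmetry-adapted machine-learning models of new classes of properties of molecules and materials.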
We consider the nonlinear inverse problem of learning a transition operator $\mathbf{A}$ from partial observations at different times, in particular from sparse observations of entries of its powers $\mathbf{A},\mathbf{A}^2,\cdots,\mathbf{A}^{T}$. This Spatio-Temporal Transition Operator Recovery problem is motivated by the recent interest in learning time-varying graph signals that are driven by graph operators depending on the underlying graph topology. We address the nonlinearity of the problem by embedding it into a higher-dimensional space of suitable block-Hankel matrices, where it becomes a low-rank matrix completion problem, even if $\mathbf{A}$ is of full rank. For both a uniform and an adaptive random space-time sampling model, we quantify the recoverability of the transition operator via suitable measures of incoherence of these block-Hankel embedding matrices. For graph transition operators these measures of incoherence depend on the interplay between the dynamics and the graph topology. We develop a suitable non-convex iterative reweighted least squares (IRLS) algorithm, establish its quadratic local convergence, and show that, in optimal scenarios, no more than $\mathcal{O}(rn \log(nT))$ space-time samples are sufficient to ensure accurate recovery of a rank-$r$ operator $\mathbf{A}$ of size $n \times n$. This establishes that spatial samples can be substituted by a comparable number of space-time samples. We provide an efficient implementation of the proposed IRLS algorithm with space complexity of order $O(r n T)$ and per-iteration time complexity linear in $n$. Numerical experiments for transition operators based on several graph models confirm that the theoretical findings accurately track empirical phase transitions, and illustrate the applicability and scalability of the proposed algorithm.
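We introduce a machine-learning (ML) framework for high-throughput benchmarking of diverse representations of chemical systems against datasets of materials and molecules. The guiding principle underlying the benchmarking approach is to evaluate raw descriptor performance by limiting model complexity to simple regression schemes while enforcing best ML practices, and by assessing learning progress through learning curves along series of synchronized train-test splits. The resulting models are intended as baselines that can inform future method development, in addition to indicating how easily a given dataset can be learned. Through a comparative analysis of the training outcomes across a diverse set of physicochemical, topological, and geometric representations, we gain insight into the relative merits of these representations as well as their interrelatedness.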
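We present an algorithmic framework for quantum-inspired classical algorithms on close-to-low-rank matrices, generalizing the series of results started by Tang's breakthrough quantum-inspired algorithm for recommendation systems [STOC'19]. Motivated by quantum linear algebra algorithms and the quantum singular value transformation (SVT) framework of Gily\'en, Su, Low, and Wiebe [STOC'19], we develop classical algorithms for SVT under suitable quantum-inspired sampling assumptions. Our results give compelling evidence that, in the corresponding QRAM data structure input model, quantum SVT does not yield exponential quantum speedups. Since the quantum SVT framework generalizes essentially all known techniques for quantum linear algebra, our results, combined with sampling lemmas from previous work, suffice to generalize all recent results about dequantizing quantum machine learning algorithms. In particular, our classical SVT framework recovers and often improves the dequantization results on recommendation systems, principal component analysis, supervised clustering, support vector machines, low-rank regression, and semidefinite program solving. We also give additional dequantization results on low-rank Hamiltonian simulation and discriminant analysis. Our improvements come from identifying the key feature of the quantum-inspired input model that is at the core of all prior quantum-inspired results: $\ell^2$-norm sampling can approximate matrix products in time independent of their dimension. We reduce all our main results to this fact, making our exposition concise, self-contained, and intuitive.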
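The key primitive named above, approximating a matrix product by $\ell^2$-norm sampling, can be illustrated with the classical sampling estimator below. It draws `s` column indices of `A` with probability proportional to their squared norms and averages the rescaled outer products, which gives an unbiased estimate of `A B`. This is a plain dense sketch under assumed names (`sampled_matmul` is hypothetical); it does not model the sampling data structures that the quantum-inspired input model provides.

```python
import random


def sampled_matmul(A, B, s, seed=0):
    """Unbiased estimate of A @ B from s sampled column/row pairs.

    A is m x n and B is n x p, both lists of rows. Index j is drawn with
    probability proportional to the squared norm of column j of A.
    """
    rng = random.Random(seed)
    m, n, p = len(A), len(B), len(B[0])
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    total = sum(col_sq)
    probs = [c / total for c in col_sq]
    out = [[0.0] * p for _ in range(m)]
    for j in rng.choices(range(n), weights=probs, k=s):
        w = 1.0 / (s * probs[j])  # importance weight keeps the estimate unbiased
        for i in range(m):
            for c in range(p):
                out[i][c] += w * A[i][j] * B[j][c]
    return out
```

The work of the loop scales with `s`, `m`, and `p` but not with the inner dimension `n`. Computing the column norms here is an O(mn) preprocessing step that the input model is assumed to make available in advance.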
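Koopman operators are infinite-dimensional operators that globally linearize nonlinear dynamical systems, making their spectral information valuable for understanding dynamics. However, Koopman operators can have continuous spectra and infinite-dimensional invariant subspaces, making the computation of their spectral information a considerable challenge. This paper describes data-driven algorithms with rigorous convergence guarantees for computing spectral information of Koopman operators from trajectory data. We introduce residual dynamic mode decomposition (ResDMD), which provides the first scheme for computing the spectra and pseudospectra of general Koopman operators without spectral pollution. Using the resolvent operator and ResDMD, we also compute smoothed approximations of spectral measures associated with measure-preserving dynamical systems. We prove explicit convergence theorems for our algorithms, which can achieve high-order convergence even for chaotic systems, when computing the density of the continuous spectrum and the discrete spectrum. We demonstrate our algorithms on the tent map, the Gauss iterated map, the nonlinear pendulum, the double pendulum, the Lorenz system, and an $11$-dimensional extended Lorenz system. Finally, we provide kernelized variants of our algorithms for dynamical systems with a high-dimensional state space. This allows us to compute the spectral measure associated with the dynamics of a protein molecule that has a 20,046-dimensional state space, and to compute nonlinear Koopman modes with error bounds for turbulent flow past aerofoils with Reynolds number $>10^5$ that has a 295,122-dimensional state space.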
The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NP-hard, because it contains vector cardinality minimization as a special case. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability, provided the codimension of the subspace is Ω(r(m + n) log mn), where m, n are the dimensions of the matrix, and r is its rank. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this pre-existing concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization. We also discuss several algorithmic approaches to solving the norm minimization relaxations, and illustrate our results with numerical examples.
Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency -- given $n$ input points, most kernel-based algorithms need to materialize the full $n \times n$ kernel matrix before performing any subsequent computation, thus incurring $\Omega(n^2)$ runtime. Breaking this quadratic barrier for various problems has therefore been a subject of extensive research efforts. We break the quadratic barrier and obtain $\textit{subquadratic}$ time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving linear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix. In particular, we develop efficient reductions from $\textit{weighted vertex}$ and $\textit{weighted edge sampling}$ on kernel graphs, $\textit{simulating random walks}$ on kernel graphs, and $\textit{importance sampling}$ on matrices to Kernel Density Estimation and show that we can generate samples from these distributions in $\textit{sublinear}$ (in the support of the distribution) time. Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsification, where we observe a $\textbf{9x}$ decrease in the number of kernel evaluations over baselines for LRA and a $\textbf{41x}$ reduction in the graph size for spectral sparsification.
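Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory, and energy requirements. However, despite its promise, low-precision arithmetic has received little attention for Gaussian processes (GPs), largely because GPs require sophisticated linear algebra routines that are unstable in low precision. We study the different failure modes that can occur when training GPs in half precision. To circumvent these failure modes, we propose a multi-faceted approach involving conjugate gradients with re-orthogonalization, mixed precision, and preconditioning. Our approach significantly improves the numerical stability and practical performance of conjugate gradients in low precision over a wide range of settings, enabling GPs to be trained on very large datasets on a single GPU, without any sparse approximations.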
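Selecting diverse and important items, called landmarks, from a large set is a problem of interest in machine learning. As a specific example, in order to deal with large training sets, kernel methods often rely on low-rank matrix Nystr\"om approximations based on the selection or sampling of landmarks. In this context, we propose a deterministic and a randomized adaptive algorithm for selecting landmark points within a training data set. These landmarks are related to the minima of a sequence of kernelized Christoffel functions. Beyond the known connection between Christoffel functions and leverage scores, a connection of our method with finite determinantal point processes (DPPs) is also explained. Namely, our construction promotes diversity among important landmark points in a way similar to DPPs. In addition, we explain how our randomized adaptive algorithm can influence the accuracy of kernel ridge regression.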
Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware. We present an efficient and general approach to GP inference based on Blackbox Matrix-Matrix multiplication (BBMM). BBMM inference uses a modified batched version of the conjugate gradients algorithm to derive all terms for training and inference in a single call. BBMM reduces the asymptotic complexity of exact GP inference from $O(n^3)$ to $O(n^2)$. Adapting this algorithm to scalable approximations and complex GP models simply requires a routine for efficient matrix-matrix multiplication with the kernel and its derivative. In addition, BBMM uses a specialized preconditioner to substantially speed up convergence. In experiments we show that BBMM effectively uses GPU hardware to dramatically accelerate both exact GP inference and scalable approximations. Additionally, we provide GPyTorch, a software platform for scalable GP inference via BBMM, built on PyTorch.
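The reason a matrix-multiplication routine suffices is that conjugate gradients touches the kernel matrix only through products with vectors. The sketch below is a textbook preconditioned conjugate gradient solver for a single right-hand side, written in pure Python with hypothetical names (`pcg`, `matvec`, `precond`). It is not the modified batched variant used by BBMM and does not reproduce any GPyTorch API.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, b, precond=None, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    matvec(v) returns A v; precond(r) returns an approximation of A^{-1} r
    (identity if omitted). A itself is never formed or factorized.
    """
    if precond is None:
        precond = lambda r: list(r)
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Each iteration costs one product with `A`, which is $O(n^2)$ for a dense kernel matrix, so the total cost stays quadratic as long as the iteration count is small. Keeping that count small is the role of the preconditioner.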
We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network's Fisher information matrix which is neither diagonal nor low-rank, and in some cases is completely non-sparse. It is derived by approximating various large blocks of the Fisher (corresponding to entire layers) as being the Kronecker product of two much smaller matrices. While only several times more expensive to compute than the plain stochastic gradient, the updates produced by K-FAC make much more progress optimizing the objective, which results in an algorithm that can be much faster than stochastic gradient descent with momentum in practice. And unlike some previously proposed approximate natural-gradient/Newton methods which use high-quality non-diagonal curvature matrices (such as Hessian-free optimization), K-FAC works very well in highly stochastic optimization regimes. This is because the cost of storing and inverting K-FAC's approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.
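The efficiency of inverting a Kronecker-factored block rests on a standard identity, stated here with assumed notation: $A$ and $G$ stand for the two small factors, $V$ for a gradient reshaped to a matrix of matching size, and $\mathrm{vec}$ stacks columns. None of these symbols appear in the abstract.

```latex
(A \otimes G)\,\mathrm{vec}(X) = \mathrm{vec}\!\left(G X A^{\top}\right)
\quad\Longrightarrow\quad
(A \otimes G)^{-1}\,\mathrm{vec}(V) = \mathrm{vec}\!\left(G^{-1} V A^{-\top}\right)
```

Only the two small factors need to be inverted, and their sizes are set by the layer dimensions, not by the amount of data used to estimate them.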
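Sketch-and-project is a framework which unifies many known iterative methods for solving linear systems and their variants, as well as further extensions to non-linear optimization problems. It includes popular methods such as randomized Kaczmarz, coordinate descent, variants of the Newton method in convex optimization, and others. In this paper, we obtain sharp guarantees for the convergence rate of sketch-and-project methods via new tight spectral bounds for the expected sketched projection matrix. Our estimates reveal a connection between the sketch-and-project convergence rate and the approximation error associated with another well-known but seemingly unrelated family of algorithms, which use sketching to accelerate popular matrix factorizations such as QR and SVD. This connection brings us closer to precisely quantifying how the performance of sketch-and-project solvers depends on their sketch size. Our analysis covers not only Gaussian and sub-Gaussian sketching matrices, but also a family of efficient sparse sketching methods known as LESS embeddings. Our experiments back up the theory and demonstrate that even extremely sparse sketches show the same convergence properties in practice.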
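Randomized Kaczmarz is the simplest member of this family: the sketch is a single row of the system, and each step projects the iterate onto the hyperplane defined by that row. The pure-Python sketch below samples rows with probability proportional to their squared norms; the function name and default arguments are illustrative assumptions.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximate a solution of the consistent system A x = b.

    A is a list of rows. At each step one row is sampled with probability
    proportional to its squared norm and x is projected onto the set of
    points satisfying that single equation.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    row_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * n
    for _ in range(iters):
        i = rng.choices(range(m), weights=row_sq, k=1)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / row_sq[i]
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x
```

Replacing the single sampled row by a sketch of several rows gives the block variants whose dependence on sketch size the paper analyzes.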
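The electron density of a molecule or material has recently received major attention as a target quantity of machine-learning models. A natural choice for constructing a model that yields transferable and linear-scaling predictions is to represent the scalar field using a multi-centered atomic basis analogous to that routinely used in density-fitting approximations. However, the non-orthogonality of the basis poses challenges for the learning exercise, as it requires accounting for all the atomic density components at once. We devise a gradient-based approach to directly minimize the loss function of the regression problem in an optimized and highly sparse feature space. In so doing, we overcome the limitations associated with adopting an atom-centered model to learn the electron density over arbitrarily complex datasets, obtaining extremely accurate predictions. The enhanced framework is tested on periodic cells of 32 liquid-water molecules, which present enough complexity to require an optimal balance between accuracy and computational efficiency. We show that, starting from the predicted density, a single Kohn-Sham diagonalization step can be performed to access total energy components that carry an error of just 0.1 meV/atom with respect to the reference density-functional calculations. Finally, we test our method on the highly heterogeneous QM9 benchmark dataset, showing that a small fraction of the training data is enough to derive ground-state total energies within chemical accuracy.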
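Convex (specifically semidefinite) relaxation provides a powerful approach to constructing robust machine perception systems, enabling the recovery of certifiably globally optimal solutions of challenging estimation problems in many practical settings. However, solving the large-scale semidefinite relaxations underpinning this approach remains a formidable computational challenge. A dominant cost in many state-of-the-art (Burer-Monteiro factorization-based) certifiable estimation methods is solution verification (testing the global optimality of a given candidate solution), which entails computing a minimum eigenpair of a certain symmetric certificate matrix. In this paper, we show how to significantly accelerate this verification step, and thereby the overall speed of certifiable estimation methods. First, we show that the certificate matrices arising in the Burer-Monteiro approach generically possess spectra that make the verification problem expensive to solve using standard iterative eigenvalue methods. We then show how to address this challenge using preconditioned eigensolvers; specifically, we design a specialized solution verification algorithm based upon the locally optimal block preconditioned conjugate gradient (LOBPCG) method together with a simple yet highly effective algebraic preconditioner. Experimental evaluation on a variety of simulated and real-world examples shows that our proposed verification scheme is very effective in practice, accelerating solution verification by up to 280x, and the overall Burer-Monteiro method by up to 16x, versus the standard Lanczos method when applied to relaxations derived from large-scale benchmarks.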
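Network data are commonly collected in a variety of applications, representing either directly measured or statistically inferred connections between features of interest. In an increasing number of domains, these networks are collected over time, such as interactions between users of a social media platform on different days, or across multiple subjects, such as in multi-subject studies of brain connectivity. When analyzing multiple large networks, dimensionality reduction techniques are often used to embed networks in a more tractable low-dimensional space. To this end, we develop a framework for principal components analysis (PCA) on collections of networks via a specialized tensor decomposition we term Semi-Symmetric Tensor PCA or SS-TPCA. We derive computationally efficient algorithms for computing our proposed SS-TPCA decomposition and establish the statistical efficiency of our approach under a standard low-rank signal-plus-noise model. Remarkably, we show that SS-TPCA achieves the same estimation accuracy as classical matrix PCA, with error proportional to the square root of the number of vertices in the network and not the number of edges as might be expected. Our framework inherits many of the strengths of classical PCA and is suitable for a wide range of unsupervised learning tasks, including identifying principal networks, isolating meaningful changepoints or outlying observations, and characterizing the "variability network" of the most varying edges. Finally, we demonstrate the effectiveness of our proposal on simulated data and on examples from empirical legal studies. The techniques used to establish our main consistency results are surprisingly straightforward and may find use in a variety of other network analysis problems.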