这项正在进行的工作旨在为统计学习提供统一的介绍,从诸如GMM和HMM等经典模型到现代神经网络(如VAE和扩散模型)缓慢地构建。如今,有许多互联网资源可以孤立地解释这一点或新的机器学习算法,但是它们并没有(也不能在如此简短的空间中)将这些算法彼此连接起来,或者与统计模型的经典文献相连现代算法出现了。同样明显缺乏的是一个单一的符号系统,尽管对那些已经熟悉材料的人(如这些帖子的作者)不满意,但对新手的入境造成了重大障碍。同样,我的目的是将各种模型(尽可能)吸收到一个用于推理和学习的框架上,表明(以及为什么)如何以最小的变化将一个模型更改为另一个模型(其中一些是新颖的,另一些是文献中的)。某些背景当然是必要的。我以为读者熟悉基本的多变量计算,概率和统计以及线性代数。这本书的目标当然不是​​完整性,而是从基本知识到过去十年中极强大的新模型的直线路径或多或少。然后,目标是补充而不是替换,诸如Bishop的\ emph {模式识别和机器学习}之类的综合文本,该文本现在已经15岁了。
translated by 谷歌翻译
A fundamental problem in neural network research, as well as in many other disciplines, is finding a suitable representation of multivariate data, i.e. random vectors. For reasons of computational and conceptual simplicity, the representation is often sought as a linear transformation of the original data. In other words, each component of the representation is a linear combination of the original variables. Well-known linear transformation methods include principal component analysis, factor analysis, and projection pursuit. Independent component analysis (ICA) is a recently developed method in which the goal is to find a linear representation of nongaussian data so that the components are statistically independent, or as independent as possible. Such a representation seems to capture the essential structure of the data in many applications, including feature extraction and signal separation. In this paper, we present the basic theory and applications of ICA, and our recent work on the subject.
translated by 谷歌翻译
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact
translated by 谷歌翻译
本论文主要涉及解决深层(时间)高斯过程(DGP)回归问题的状态空间方法。更具体地,我们代表DGP作为分层组合的随机微分方程(SDES),并且我们通过使用状态空间过滤和平滑方法来解决DGP回归问题。由此产生的状态空间DGP(SS-DGP)模型生成丰富的电视等级,与建模许多不规则信号/功能兼容。此外,由于他们的马尔可道结构,通过使用贝叶斯滤波和平滑方法可以有效地解决SS-DGPS回归问题。本论文的第二次贡献是我们通过使用泰勒力矩膨胀(TME)方法来解决连续离散高斯滤波和平滑问题。这诱导了一类滤波器和SmooThers,其可以渐近地精确地预测随机微分方程(SDES)解决方案的平均值和协方差。此外,TME方法和TME过滤器和SmoOthers兼容模拟SS-DGP并解决其回归问题。最后,本文具有多种状态 - 空间(深)GPS的应用。这些应用主要包括(i)来自部分观察到的轨迹的SDES的未知漂移功能和信号的光谱 - 时间特征估计。
translated by 谷歌翻译
or if the system is intended to service several individual channels in multiplex; (e) Several functions of several variables --in color television the message consists of three functions f (x,y, t), g(x, y, t), h(x, y, t) defined in a three-dimensional continuum --we may also think of these three functions as components of a vector field defined in the region --similarly, several black and white television sources would produce "messages" consisting of a number of functions of three variables; (f) Various combinations also occur, for example in television with an associated audio channel.. A transmitter which operates on the message in some way to produce a signal suitable for transmission over the channel. In telephony this operation consists merely of changing sound pressure into a proportional electrical current. In telegraphy we have an encoding operation which produces a sequence of dots, dashes and spaces on the channel corresponding to the message. In a multiplex PCM system the different speech functions must be sampled, compressed, quantized and encoded, and finally interleaved properly to construct the signal. Vocoder systems, television and frequency modulation are other examples of complex operations applied to the message to obtain the signal.3. The channel is merely the medium used to transmit the signal from transmitter to receiver. It may be a pair of wires, a coaxial cable, a band of radio frequencies, a beam of light, etc. 4. The receiver ordinarily performs the inverse operation of that done by the transmitter, reconstructing the message from the signal.5. The destination is the person (or thing) for whom the message is intended.We wish to consider certain general problems involving communication systems. To do this it is first necessary to represent the various elements involved as mathematical entities, suitably idealized from their physical counterparts. We may roughly classify communication systems into three main categories: discrete, continuous and mixed. By a discrete system we will mean one in which both the message and the signal are a sequence of discrete symbols. A typical case is telegraphy where the message is a sequence of letters and the signal a sequence of dots, dashes and spaces. A continuous system is one in which the message and signal are both treated
translated by 谷歌翻译
这项教程调查概述了统计学习理论中最新的非征血性进步与控制和系统识别相关。尽管在所有控制领域都取得了重大进展,但在线性系统的识别和学习线性二次调节器时,该理论是最发达的,这是本手稿的重点。从理论的角度来看,这些进步的大部分劳动都在适应现代高维统计和学习理论的工具。虽然与控制对机器学习的工具感兴趣的理论家高度相关,但基础材料并不总是容易访问。为了解决这个问题,我们提供了相关材料的独立介绍,概述了基于最新结果的所有关键思想和技术机械。我们还提出了许多开放问题和未来的方向。
translated by 谷歌翻译
In my previous article I mentioned for the first time that a classical neural network may have quantum properties as its own structure may be entangled. The question one may ask now is whether such a quantum property can be used to entangle other systems? The answer should be yes, as shown in what follows.
translated by 谷歌翻译
这项调查旨在提供线性模型及其背后的理论的介绍。我们的目标是对读者进行严格的介绍,并事先接触普通最小二乘。在机器学习中,输出通常是输入的非线性函数。深度学习甚至旨在找到需要大量计算的许多层的非线性依赖性。但是,这些算法中的大多数都基于简单的线性模型。然后,我们从不同视图中描述线性模型,并找到模型背后的属性和理论。线性模型是回归问题中的主要技术,其主要工具是最小平方近似,可最大程度地减少平方误差之和。当我们有兴趣找到回归函数时,这是一个自然的选择,该回归函数可以最大程度地减少相应的预期平方误差。这项调查主要是目的的摘要,即线性模型背后的重要理论的重要性,例如分布理论,最小方差估计器。我们首先从三种不同的角度描述了普通的最小二乘,我们会以随机噪声和高斯噪声干扰模型。通过高斯噪声,该模型产生了可能性,因此我们引入了最大似然估计器。它还通过这种高斯干扰发展了一些分布理论。最小二乘的分布理论将帮助我们回答各种问题并引入相关应用。然后,我们证明最小二乘是均值误差的最佳无偏线性模型,最重要的是,它实际上接近了理论上的极限。我们最终以贝叶斯方法及以后的线性模型结束。
translated by 谷歌翻译
Uncertainty is prevalent in engineering design, statistical learning, and decision making broadly. Due to inherent risk-averseness and ambiguity about assumptions, it is common to address uncertainty by formulating and solving conservative optimization models expressed using measure of risk and related concepts. We survey the rapid development of risk measures over the last quarter century. From its beginning in financial engineering, we recount their spread to nearly all areas of engineering and applied mathematics. Solidly rooted in convex analysis, risk measures furnish a general framework for handling uncertainty with significant computational and theoretical advantages. We describe the key facts, list several concrete algorithms, and provide an extensive list of references for further reading. The survey recalls connections with utility theory and distributionally robust optimization, points to emerging applications areas such as fair machine learning, and defines measures of reliability.
translated by 谷歌翻译
We consider autocovariance operators of a stationary stochastic process on a Polish space that is embedded into a reproducing kernel Hilbert space. We investigate how empirical estimates of these operators converge along realizations of the process under various conditions. In particular, we examine ergodic and strongly mixing processes and obtain several asymptotic results as well as finite sample error bounds. We provide applications of our theory in terms of consistency results for kernel PCA with dependent data and the conditional mean embedding of transition probabilities. Finally, we use our approach to examine the nonparametric estimation of Markov transition operators and highlight how our theory can give a consistency analysis for a large family of spectral analysis methods including kernel-based dynamic mode decomposition.
translated by 谷歌翻译
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets.This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed-either explicitly or implicitly-to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis.The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
translated by 谷歌翻译
最近有一项激烈的活动在嵌入非常高维和非线性数据结构的嵌入中,其中大部分在数据科学和机器学习文献中。我们分四部分调查这项活动。在第一部分中,我们涵盖了非线性方法,例如主曲线,多维缩放,局部线性方法,ISOMAP,基于图形的方法和扩散映射,基于内核的方法和随机投影。第二部分与拓扑嵌入方法有关,特别是将拓扑特性映射到持久图和映射器算法中。具有巨大增长的另一种类型的数据集是非常高维网络数据。第三部分中考虑的任务是如何将此类数据嵌入中等维度的向量空间中,以使数据适合传统技术,例如群集和分类技术。可以说,这是算法机器学习方法与统计建模(所谓的随机块建模)之间的对比度。在论文中,我们讨论了两种方法的利弊。调查的最后一部分涉及嵌入$ \ mathbb {r}^ 2 $,即可视化中。提出了三种方法:基于第一部分,第二和第三部分中的方法,$ t $ -sne,UMAP和大节。在两个模拟数据集上进行了说明和比较。一个由嘈杂的ranunculoid曲线组成的三胞胎,另一个由随机块模型和两种类型的节点产生的复杂性的网络组成。
translated by 谷歌翻译
This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underly all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions, (also called "causal effects" or "policy evaluation") (2) queries about probabilities of counterfactuals, (including assessment of "regret," "attribution" or "causes of effects") and (3) queries about direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both.
translated by 谷歌翻译
我们为特殊神经网络架构,称为运营商复发性神经网络的理论分析,用于近似非线性函数,其输入是线性运算符。这些功能通常在解决方案算法中出现用于逆边值问题的问题。传统的神经网络将输入数据视为向量,因此它们没有有效地捕获与对应于这种逆问题中的数据的线性运算符相关联的乘法结构。因此,我们介绍一个类似标准的神经网络架构的新系列,但是输入数据在向量上乘法作用。由较小的算子出现在边界控制中的紧凑型操作员和波动方程的反边值问题分析,我们在网络中的选择权重矩阵中促进结构和稀疏性。在描述此架构后,我们研究其表示属性以及其近似属性。我们还表明,可以引入明确的正则化,其可以从所述逆问题的数学分析导出,并导致概括属性上的某些保证。我们观察到重量矩阵的稀疏性改善了概括估计。最后,我们讨论如何将运营商复发网络视为深度学习模拟,以确定诸如用于从边界测量的声波方程中重建所未知的WAVESTED的边界控制的算法算法。
translated by 谷歌翻译
量子哈密顿学习和量子吉布斯采样的双重任务与物理和化学中的许多重要问题有关。在低温方案中,这些任务的算法通常会遭受施状能力,例如因样本或时间复杂性差而遭受。为了解决此类韧性,我们将量子自然梯度下降的概括引入了参数化的混合状态,并提供了稳健的一阶近似算法,即量子 - 固定镜下降。我们使用信息几何学和量子计量学的工具证明了双重任务的数据样本效率,因此首次将经典Fisher效率的开创性结果推广到变异量子算法。我们的方法扩展了以前样品有效的技术,以允许模型选择的灵活性,包括基于量子汉密尔顿的量子模型,包括基于量子的模型,这些模型可能会规避棘手的时间复杂性。我们的一阶算法是使用经典镜下降二元性的新型量子概括得出的。两种结果都需要特殊的度量选择,即Bogoliubov-Kubo-Mori度量。为了从数值上测试我们提出的算法,我们将它们的性能与现有基准进行了关于横向场ISING模型的量子Gibbs采样任务的现有基准。最后,我们提出了一种初始化策略,利用几何局部性来建模状态的序列(例如量子 - 故事过程)的序列。我们从经验上证明了它在实际和想象的时间演化的经验上,同时定义了更广泛的潜在应用。
translated by 谷歌翻译
强大的机器学习模型的开发中的一个重要障碍是协变量的转变,当训练和测试集的输入分布时发生的分配换档形式在条件标签分布保持不变时发生。尽管现实世界应用的协变量转变普遍存在,但在现代机器学习背景下的理论理解仍然缺乏。在这项工作中,我们检查协变量的随机特征回归的精确高尺度渐近性,并在该设置中提出了限制测试误差,偏差和方差的精确表征。我们的结果激发了一种自然部分秩序,通过协变速转移,提供足够的条件来确定何时何时损害(甚至有助于)测试性能。我们发现,过度分辨率模型表现出增强的协会转变的鲁棒性,为这种有趣现象提供了第一个理论解释之一。此外,我们的分析揭示了分销和分发外概率性能之间的精确线性关系,为这一令人惊讶的近期实证观察提供了解释。
translated by 谷歌翻译
近似消息传递(AMP)是解决高维统计问题的有效迭代范式。但是,当迭代次数超过$ o \ big(\ frac {\ log n} {\ log log \ log \ log n} \时big)$(带有$ n $问题维度)。为了解决这一不足,本文开发了一个非吸附框架,用于理解峰值矩阵估计中的AMP。基于AMP更新的新分解和可控的残差项,我们布置了一个分析配方,以表征在存在独立初始化的情况下AMP的有限样本行为,该过程被进一步概括以进行光谱初始化。作为提出的分析配方的两个具体后果:(i)求解$ \ mathbb {z} _2 $同步时,我们预测了频谱初始化AMP的行为,最高为$ o \ big(\ frac {n} {\ mathrm {\ mathrm { poly} \ log n} \ big)$迭代,表明该算法成功而无需随后的细化阶段(如最近由\ citet {celentano2021local}推测); (ii)我们表征了稀疏PCA中AMP的非反应性行为(在尖刺的Wigner模型中),以广泛的信噪比。
translated by 谷歌翻译
We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M . Can we complete the matrix and recover the entries that we have not seen?We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys m ≥ C n 1.2 r log n for some positive numerical constant C, then with very high probability, most n × n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
translated by 谷歌翻译
开发了一种使用多个辅助变量的非静止空间建模算法。它将Geodatistics与Simitile随机林结合起来,以提供一种新的插值和随机仿真算法。本文介绍了该方法,并表明它具有与施加地统计学建模和定量随机森林的那些相似的一致性结果。该方法允许嵌入更简单的插值技术,例如Kriging,以进一步调节模型。该算法通过估计每个目标位置处的目标变量的条件分布来工作。这种分布的家庭称为目标变量的包络。由此,可以获得空间估计,定量和不确定性。还开发了一种从包络产生条件模拟的算法。随着它们从信封中的样本,因此通过相对变化的次要变量,趋势和可变性的相对变化局部地影响。
translated by 谷歌翻译