Since annotated data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning. This is the aim of semi-supervised learning. To benefit from the access to unlabelled data, it is natural to diffuse knowledge smoothly from labelled data to unlabelled data, which motivates the use of Laplacian regularization. However, current implementations of Laplacian regularization suffer from several drawbacks, notably the well-known curse of dimensionality. In this paper, we provide a statistical analysis that overcomes these issues and unveils a large class of spectral filtering methods exhibiting the desired behavior. These methods are implemented through (reproducing) kernel methods, for which we provide realistic computational guidelines that make our approach usable with large amounts of data.
The workhorse of machine learning is stochastic gradient descent. To access stochastic gradients, it is common to iterate over the input/output pairs of a training dataset. Interestingly, it appears that one does not need full supervision to access stochastic gradients, which is the main motivation of this paper. After formalizing the "active labeling" problem, which focuses on active learning with partial supervision, we provide a streaming technique that provably minimizes the ratio of generalization error over the number of samples. We illustrate our technique in depth for robust regression.
This paper studies the statistical properties of principal components regression with Laplacian eigenmaps (PCR-LE), a method for nonparametric regression based on Laplacian eigenmaps (LE). PCR-LE works by projecting the vector of observed responses ${\bf Y} = (Y_1,\ldots,Y_n)$ onto a subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random-design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s+d)}$) and goodness-of-fit testing ($n^{-4s/(4s+d)}$). We also show that PCR-LE is \emph{manifold adaptive}: that is, when the design is supported on a manifold of small intrinsic dimension $m$, we give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s+m)}$) and testing ($n^{-4s/(4s+m)}$) rates of convergence. Interestingly, these rates are almost always faster than the known rates of convergence of graph Laplacian eigenvectors; in other words, for this problem, regression with estimated features appears to be statistically easier than estimating the features themselves. We support these theoretical results with empirical evidence.
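Below is a minimal Python sketch of the PCR-LE idea described in the abstract above: build a neighborhood graph Laplacian over the design points and project the response vector onto its $K$ lowest eigenvectors. The truncated-Gaussian graph construction, the bandwidth, and the choice of $K$ are illustrative assumptions, not the paper's tuning.

```python
import numpy as np
from scipy.spatial.distance import cdist

def pcr_le(X, y, bandwidth=0.15, K=25):
    D2 = cdist(X, X, "sqeuclidean")
    W = np.exp(-D2 / (2 * bandwidth**2)) * (D2 <= (3 * bandwidth) ** 2)  # truncated Gaussian affinities
    L = np.diag(W.sum(axis=1)) - W                                       # unnormalized neighborhood graph Laplacian
    _, evecs = np.linalg.eigh(L)
    V = evecs[:, :K]                                                     # K lowest-frequency eigenvectors
    return V @ (V.T @ y)                                                 # principal components regression of y on V

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 2))
f = np.sin(2 * np.pi * X[:, 0])
y = f + 0.2 * rng.standard_normal(400)
print(np.mean((pcr_le(X, y) - f) ** 2))  # compare to the noise variance 0.04
```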
We study the learning properties of nonparametric ridgeless least squares. In particular, we consider the common case of estimators defined by scale-dependent kernels and focus on the role of the scale. These estimators interpolate the data, and the scale can be shown to control their stability through the condition number. Our analysis shows that there are different regimes depending on the interplay between the sample size, its dimension, and the smoothness of the problem. Indeed, when the sample size is less than exponential in the data dimension, the scale can be chosen so that the learning error decreases. As the sample size becomes larger, the overall error stops decreasing, but interestingly the scale can be chosen in such a way that the variance due to noise remains bounded. Our analysis combines probabilistic results with a number of analytic techniques from interpolation theory.
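The following sketch illustrates the kind of estimator studied above: the ridgeless (minimum-norm) kernel least-squares fit with a Gaussian kernel, computed for several scales, together with the condition number of the kernel matrix through which the scale controls stability. The data model, kernel, and grid of scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 300, 5
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(n)
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

for sigma in [0.1, 0.5, 2.0, 10.0]:                    # the scale of the Gaussian kernel
    K = np.exp(-D2 / (2 * sigma**2))
    alpha, *_ = np.linalg.lstsq(K, y, rcond=None)      # (near-)interpolating minimum-norm solution
    # the scale controls stability through the condition number of K
    print(sigma, np.linalg.cond(K), np.abs(K @ alpha - y).max())
```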
We study a natural extension of classical empirical risk minimization, where the hypothesis space is a random subspace of a given space. In particular, we consider possibly data-dependent subspaces spanned by a random subset of the data, recovering as a special case Nystrom approaches for kernel methods. Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded. These statistical-computational tradeoffs have recently been explored for the least squares loss and self-concordant loss functions, such as the logistic loss. Here, we work to extend these results to convex Lipschitz loss functions that might not be smooth, such as the hinge loss used in support vector machines. This unified analysis requires developing new proofs that use different technical tools, such as sub-Gaussian inputs, to achieve fast rates. Our main results show the existence of different settings, depending on how hard the learning problem is, for which computational efficiency can be improved with no loss in performance.
Comparing probability distributions is at the crux of many machine learning algorithms. Maximum Mean Discrepancies (MMD) and Optimal Transport (OT) distances are two classes of distances between probability measures that have attracted abundant attention in recent years. This paper establishes conditions under which the Wasserstein distance can be controlled by MMD norms. Our work is motivated by compressive statistical learning (CSL) theory, a general framework for resource-efficient large-scale learning in which the training data is summarized in a single vector (called a sketch) that captures the information relevant to the learning task at hand. Inspired by existing results in CSL, we introduce the Hölder Lower Restricted Isometric Property (Hölder LRIP) and show that this property comes with interesting guarantees for compressive statistical learning. Based on the relations between the MMD and the Wasserstein distance, we provide guarantees for compressive statistical learning by introducing and studying the notion of Wasserstein learnability of a learning task, that is, when some task-specific metric between probability distributions can be bounded by a Wasserstein distance.
The theory of spectral filtering is a remarkable tool for understanding the statistical properties of learning with kernels. For least squares, it allows one to derive various regularization schemes whose excess-risk convergence rates are faster than those of Tikhonov regularization. This is typically achieved by leveraging classical assumptions called source and capacity conditions, which characterize the difficulty of the learning task. To understand estimators derived from other loss functions, Marteau-Ferey et al. have extended the theory of Tikhonov regularization to generalized self-concordant (GSC) loss functions, which include, for instance, the logistic loss. In this paper, we go a step further and show that fast and optimal rates can be achieved for GSC losses by using the iterated Tikhonov regularization scheme, which is intrinsically related to the proximal point method in optimization and overcomes the limitations of classical Tikhonov regularization.
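A minimal sketch of iterated Tikhonov regularization in the kernel setting, using the squared loss for simplicity (the abstract above treats generalized self-concordant losses); the kernel, regularization level, and number of iterations are illustrative assumptions. Taking a single iteration recovers ordinary kernel ridge regression.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def iterated_tikhonov(K, y, lam=1e-2, t=5):
    """t rounds of Tikhonov regularization; each round re-centers the penalty at the
    previous iterate (a proximal-point step). t = 1 is ordinary kernel ridge regression."""
    n = len(y)
    alpha = np.zeros(n)
    A = K + n * lam * np.eye(n)
    for _ in range(t):
        alpha = np.linalg.solve(A, y + n * lam * alpha)
    return alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
y = np.sign(np.sin(3 * X[:, 0])) + 0.1 * rng.standard_normal(200)
K = gaussian_kernel(X, X)
alpha = iterated_tikhonov(K, y)
print(np.mean((K @ alpha - y) ** 2))   # training error after t iterations
```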
In machine learning and statistics, it is often desirable to reduce the dimensionality of a sample of data points in a high-dimensional space $\mathbb{R}^d$. This paper introduces a dimensionality reduction method in which the embedding coordinates are the eigenvectors of a positive semi-definite kernel obtained as the solution of an infinite-dimensional analogue of a semi-definite program. This embedding is adaptive and non-linear. We discuss the problem under weak and strong smoothness assumptions on the learned kernel. A main feature of our approach is the existence of an out-of-sample extension formula for the embedding coordinates in both cases. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust to outliers than spectral embedding methods.
Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumption of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalisation of the model in which the teacher and student can act on different spaces, generated with fixed, but generic, feature maps. While still solvable in closed form, this generalisation is able to capture the learning curves of a broad range of realistic data sets, thereby redeeming the potential of the teacher-student framework. Our contribution is two-fold: first, we prove rigorous formulas for the asymptotic training loss and generalisation error. Second, we present a number of situations where the learning curves of the model capture those of realistic data sets learned with kernel regression and classification, with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned features, such as those learned by training multi-layer neural networks. We discuss both the power and the limitations of the framework.
In this work, we consider the linear inverse problem $y = Ax + \epsilon$, where $A\colon X \to Y$ is a known linear operator between the separable Hilbert spaces $X$ and $Y$, $x$ is a random variable in $X$, and $\epsilon$ is a zero-mean random process in $Y$. This setting covers several inverse problems in imaging, including denoising, deblurring, and X-ray tomography. Within the classical framework of regularization, we focus on the case where the regularization functional is not given a priori but learned from data. Our first result is a characterization of the optimal generalized Tikhonov regularizer with respect to the mean squared error. We find that it is completely independent of the forward operator $A$ and depends only on the mean and covariance of $x$. Then, we consider the problem of learning the regularizer from a finite training set in two different frameworks: one supervised, based on samples of both $x$ and $y$, and one unsupervised, based only on samples of $x$. In both cases, we prove generalization bounds, under some weak assumptions on the distributions of $x$ and $\epsilon$, including the case of sub-Gaussian variables. Our bounds hold in infinite-dimensional spaces, thereby showing that finer and finer discretizations do not make this learning problem harder. The results are validated through numerical simulations.
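As a finite-dimensional illustration of the point that the optimal estimator depends only on the mean and covariance of $x$, the sketch below builds an affine reconstruction rule for $y = Ax + \epsilon$ from the empirical mean and covariance of a training set of $x$ samples (the standard linear-MMSE form; whether it coincides exactly with the paper's optimal generalized Tikhonov estimator is not claimed here). Dimensions and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n_train, sigma = 20, 15, 2000, 0.1
A = rng.standard_normal((m, d)) / np.sqrt(d)            # known forward operator

# "unsupervised" training data: samples of x only, used to estimate its mean and covariance
X_train = rng.standard_normal((n_train, d)) @ np.diag(np.linspace(1.0, 0.1, d))
mu = X_train.mean(axis=0)
Sigma = np.cov(X_train, rowvar=False)

def reconstruct(y):
    # affine rule using only (mu, Sigma): linear-MMSE form for y = A x + eps
    G = A @ Sigma @ A.T + sigma**2 * np.eye(m)
    return mu + Sigma @ A.T @ np.linalg.solve(G, y - A @ mu)

x = rng.standard_normal(d) @ np.diag(np.linspace(1.0, 0.1, d))
y = A @ x + sigma * rng.standard_normal(m)
print(np.linalg.norm(reconstruct(y) - x) / np.linalg.norm(x))  # relative reconstruction error
```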
Kernel methods are learning algorithms that enjoy solid theoretical foundations while suffering from important computational limitations. Sketching, which consists in looking for solutions within a subspace of reduced dimension, is a well-studied approach to alleviate this numerical burden. However, fast sketching strategies, such as non-adaptive subsampling, significantly degrade the guarantees of the algorithms, while theoretically accurate sketches, such as the Gaussian sketch, turn out to remain relatively slow in practice. In this paper, we introduce $p$-sparsified sketches, which combine the benefits of both approaches to achieve a good tradeoff between statistical accuracy and computational efficiency. To support our method, we derive excess risk bounds for both single- and multiple-output problems with generic Lipschitz losses, providing new guarantees for a wide range of applications, from robust regression to multiple quantile regression. We also provide empirical evidence of the superiority of our sketches over recent SOTA approaches.
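A minimal sketch of sketched kernel ridge regression with a sparsified Gaussian sketch matrix: entries are kept with probability $p$ and rescaled so that $\mathbb{E}[S^\top S] = I$. The squared loss is used for simplicity (the abstract above covers generic Lipschitz losses), and the exact normalization and hyperparameters are illustrative assumptions rather than the paper's precise construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, p, lam = 1000, 100, 0.1, 1e-3

X = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)
K = np.exp(-(X - X.T) ** 2 / 0.1)                            # Gaussian kernel matrix

keep = rng.random((s, n)) < p                                # keep each entry with probability p
S = keep * rng.standard_normal((s, n)) / np.sqrt(s * p)      # sparsified Gaussian sketch, E[S^T S] = I

SK = S @ K
gamma = np.linalg.solve(SK @ K @ S.T + n * lam * (SK @ S.T), SK @ y)
alpha = S.T @ gamma                                          # coefficients of f in the span of the sketched data
print(np.mean((K @ alpha - y) ** 2))                         # training error of the sketched estimator
```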
We propose a family of learning algorithms based on a new form of regularization that allows us to exploit the geometry of the marginal distribution. We focus on a semi-supervised framework that incorporates labeled and unlabeled data in a general-purpose learner. Some transductive graph learning algorithms and standard methods including support vector machines and regularized least squares can be obtained as special cases. We use properties of reproducing kernel Hilbert spaces to prove new Representer theorems that provide theoretical basis for the algorithms. As a result (in contrast to purely graph-based approaches) we obtain a natural out-of-sample extension to novel examples and so are able to handle both transductive and truly semi-supervised settings. We present experimental evidence suggesting that our semi-supervised algorithms are able to use unlabeled data effectively. Finally we have a brief discussion of unsupervised and fully supervised learning within our general framework.
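A minimal sketch of one member of this family, Laplacian Regularized Least Squares (LapRLS): a kernel expansion over both labeled and unlabeled points, with an ambient RKHS penalty and an intrinsic graph-Laplacian penalty. The kernel, graph construction, and regularization weights are illustrative assumptions, and scaling constants may differ from the paper's exact formulation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lap_rls(X, y_labeled, n_labeled, sigma=0.5, gamma_A=1e-2, gamma_I=1e-2):
    n = X.shape[0]
    D2 = cdist(X, X, "sqeuclidean")
    K = np.exp(-D2 / (2 * sigma**2))                       # kernel over labeled + unlabeled points
    W = np.exp(-D2 / (2 * sigma**2))                       # graph affinities (same bandwidth, for simplicity)
    L = np.diag(W.sum(axis=1)) - W                         # unnormalized graph Laplacian
    J = np.diag([1.0] * n_labeled + [0.0] * (n - n_labeled))
    y = np.concatenate([y_labeled, np.zeros(n - n_labeled)])
    # ambient penalty gamma_A ||f||_K^2 and intrinsic penalty (gamma_I / n^2) f^T L f
    M = J @ K + gamma_A * n_labeled * np.eye(n) + gamma_I * (n_labeled / n**2) * (L @ K)
    alpha = np.linalg.solve(M, J @ y)
    return K @ alpha            # f at all points; new points get f(x) = sum_i alpha_i k(x_i, x)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y_all = np.sign(X[:, 0])
fitted = lap_rls(X, y_all[:20], n_labeled=20)
print(np.mean(np.sign(fitted[20:]) == y_all[20:]))          # accuracy on the unlabeled points
```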
We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from various oracles on the probability distributions. We also consider product spaces and show that for tensor product kernels we can define notions of mutual information and joint entropy, which can then characterize independence perfectly, but conditional independence only partially. We finally show how these new notions of relative entropy lead to new upper bounds on log-partition functions, which can be used together with convex optimization within variational inference methods, providing a new family of probabilistic inference methods.
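A minimal sketch of the central quantity above: the von Neumann entropy of a trace-normalized empirical kernel covariance operator, computed by plugging in the eigenvalues of the kernel matrix. The Gaussian kernel and this naive plug-in estimator are illustrative assumptions (the paper discusses estimation from various oracles in more detail).

```python
import numpy as np

def kernel_von_neumann_entropy(X, sigma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))                  # Gaussian kernel matrix
    evals = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    p = evals / evals.sum()                           # spectrum of the trace-normalized covariance operator
    p = p[p > 1e-12]
    return float(-(p * np.log(p)).sum())              # -tr(rho log rho)

rng = np.random.default_rng(0)
print(kernel_von_neumann_entropy(rng.standard_normal((200, 2))))           # spread-out sample
print(kernel_von_neumann_entropy(0.05 * rng.standard_normal((200, 2))))    # concentrated sample: lower entropy
```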
Interacting particle or agent systems that display a rich variety of swarming behaviours are ubiquitous in science and engineering. A fundamental and challenging goal is to understand the link between individual interaction rules and swarming. In this paper, we study the data-driven discovery of a second-order particle swarming model that describes the evolution of $N$ particles in $\mathbb{R}^d$ under radial interactions. We propose a learning approach that models the latent radial interaction function as a Gaussian process, which can simultaneously fulfill two inference goals: one is the nonparametric inference of the interaction function with pointwise uncertainty quantification, and the other is the inference of unknown scalar parameters in the non-collective friction forces of the system. We formulate the learning problem as a statistical inverse problem and provide a detailed analysis of recoverability conditions, establishing that a coercivity condition is sufficient for recoverability. Given data collected from $M$ i.i.d. trajectories with independent Gaussian observational noise, we provide a finite-sample analysis, showing that our posterior mean estimator converges in a reproducing kernel Hilbert space norm at an optimal rate in $M$ equal to that of classical one-dimensional kernel ridge regression. As a byproduct, we show we can obtain a parametric learning rate in $M$ for the posterior marginal variance using the $L^{\infty}$ norm, and the rate could also involve $N$ and $L$ (the number of observation time instances for each trajectory), depending on the condition number of the inverse problem. Numerical results on systems that exhibit different swarming behaviors demonstrate efficient learning of our approach from scarce noisy trajectory data.
Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in $d$ dimensions, and $N$ hidden neurons. We assume that both the sample size $n$ and the dimension $d$ are large, and that they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the overparametrized regime. This characterization implies, as a corollary, that the minimum eigenvalue of the empirical NT kernel is bounded away from zero as soon as $Nd \gg n$, and therefore the network can exactly interpolate arbitrary labels in the same regime. Our second main result is a characterization of the generalization error of NT ridge regression, including the special case of minimum-$\ell_2$-norm interpolation. We prove that, as soon as $Nd \gg n$, the test error is well approximated by that of kernel ridge regression with respect to the infinite-width kernel. The latter is in turn well approximated by the error of polynomial ridge regression, whereby the regularization parameter is increased by a 'self-induced' term related to the high-degree components of the activation function. The polynomial degree depends on the sample size and the dimension (in particular on $\log n/\log d$).
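A minimal sketch of the empirical neural tangent (NT) kernel of a two-layer ReLU network at random initialization, followed by NT ridge regression with a tiny ridge (close to minimum-$\ell_2$-norm interpolation). The width, dimension, the choice to differentiate only the first-layer weights, and the ridge level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 10, 2000                              # samples, dimension, hidden neurons (N * d >> n)
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.sign(X[:, 0]) + 0.1 * rng.standard_normal(n)

W = rng.standard_normal((N, d)) / np.sqrt(d)         # first-layer weights at initialization
a = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)     # fixed second-layer weights

def nt_features(X):
    # gradient of f(x) = sum_j a_j * relu(w_j . x) with respect to the first-layer weights,
    # flattened into one feature vector per input
    act = (X @ W.T > 0).astype(float)                # relu'(w_j . x)
    return ((act * a)[:, :, None] * X[:, None, :]).reshape(len(X), -1)

Phi = nt_features(X)
K_nt = Phi @ Phi.T                                    # empirical NT kernel matrix
lam = 1e-6                                            # tiny ridge, close to minimum-norm interpolation
alpha = np.linalg.solve(K_nt + lam * np.eye(n), y)
print(np.mean((K_nt @ alpha - y) ** 2))               # near-zero training error in the overparametrized regime
```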
We consider neural networks with a single hidden layer and non-decreasing positively homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical non-Euclidean regularization tools on the output weights, they lead to a convex optimization problem and we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace. Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations. However, solving this convex optimization problem in infinite dimensions is only possible if the non-convex subproblem of addition of a new unit can be solved efficiently. We provide a simple geometric interpretation for our choice of activation functions and describe simple conditions for convex relaxations of the finite-dimensional non-convex subproblem to achieve the same generalization error bounds, even when constant-factor approximations cannot be found. We were not able to find strong enough convex relaxations to obtain provably polynomial-time algorithms and leave open the existence or non-existence of such tractable algorithms with non-exponential sample complexities.
Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks. A widely accepted explanation for the success of these architectures is that they encode hypothesis classes that are suited to natural images. However, understanding the precise interplay between approximation and generalization in convolutional architectures remains a challenge. In this paper, we consider the stylized setting of covariates (image pixels) uniformly distributed on the hypercube, and fully characterize the RKHS of kernels composed of single layers of convolution, pooling, and downsampling operations. We then use these kernels to study the gain in sample efficiency of kernel methods over standard inner-product kernels. In particular, we show that 1) the convolution layer breaks the curse of dimensionality by restricting the RKHS to 'local' functions; 2) local pooling biases learning towards low-frequency functions, which are stable under small translations; 3) downsampling may modify the high-frequency eigenspaces but leaves the low-frequency part approximately unchanged. Notably, our results quantify how choosing an architecture adapted to the target function leads to a large improvement in sample complexity.
We study a class of dynamical systems modelled as Markov chains that admit an invariant distribution via the corresponding transfer, or Koopman, operator. While data-driven algorithms to reconstruct such operators are well known, their relationship with statistical learning is largely unexplored. We formalize a framework to learn the Koopman operator from finite data trajectories of the dynamical system. We consider the restriction of this operator to a reproducing kernel Hilbert space and introduce a notion of risk, from which different estimators naturally arise. We link the risk with the estimation of the spectral decomposition of the Koopman operator. These observations motivate a reduced-rank operator regression (RRR) estimator. We derive learning bounds for the proposed estimator, holding both in i.i.d. and non-i.i.d. settings, the latter in terms of mixing coefficients. Our results suggest RRR might be beneficial over other widely used estimators, as confirmed in numerical experiments both for forecasting and mode decomposition.
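A minimal sketch of a reduced-rank regression estimate of the Koopman operator in a finite feature dictionary (random Fourier features), obtained by truncating the ordinary least-squares operator estimate to a fixed rank. The dictionary, the rank, and this least-squares-then-truncate form are illustrative assumptions; the RRR estimator in the abstract above is formulated directly in an RKHS and with regularization.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, m, rank = 2000, 2, 200, 10

# a simple dynamical system: a noisy, slightly contracting rotation
theta = 0.05
A = 0.99 * np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
x = np.zeros((T, d))
x[0] = [1.0, 0.0]
for t in range(T - 1):
    x[t + 1] = A @ x[t] + 0.01 * rng.standard_normal(d)

Wrf = rng.standard_normal((d, m))
b = rng.uniform(0, 2 * np.pi, m)
phi = lambda z: np.cos(z @ Wrf + b) * np.sqrt(2.0 / m)      # random Fourier feature dictionary

PX, PY = phi(x[:-1]), phi(x[1:])                            # snapshot feature matrices
B_ols, *_ = np.linalg.lstsq(PX, PY, rcond=None)             # full least-squares operator estimate
_, _, Vt = np.linalg.svd(PX @ B_ols, full_matrices=False)
B_rrr = B_ols @ (Vt[:rank].T @ Vt[:rank])                   # project onto the top "rank" response directions
print(np.sort(np.abs(np.linalg.eigvals(B_rrr)))[::-1][:5])  # magnitudes of the leading estimated Koopman eigenvalues
```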
This work studies the spectral convergence of graph Laplacian eigenvalues and eigenvectors to those of the Laplace-Beltrami operator when the graph affinity matrix is constructed from $n$ random samples on a $d$-dimensional manifold. By analyzing Dirichlet form convergence and constructing candidate approximate eigenfunctions via convolution with the manifold heat kernel, we prove that, with a Gaussian kernel, one can set the kernel bandwidth parameter $\epsilon \sim (\log n/n)^{1/(d/2+2)}$ such that the eigenvalue convergence rate is $n^{-1/(d/2+2)}$ and the eigenvector convergence rate in 2-norm is $n^{-1/(d+4)}$; when $\epsilon \sim (\log n/n)^{1/(d/2+3)}$, both the eigenvalue and eigenvector rates are $n^{-1/(d/2+3)}$. These rates hold up to $\log n$ factors and are proved for finitely many low-lying eigenvalues. The results apply to the unnormalized and random-walk graph Laplacians when data are uniformly sampled on the manifold, as well as to the density-corrected graph Laplacian (where the affinity matrix is normalized by the degree matrix on both sides) with data sampled from a non-uniform density. As an intermediate result, we prove new pointwise and Dirichlet form convergence rates for the density-corrected graph Laplacian. Numerical results are provided to verify the theory.
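A small numerical illustration of the kind of spectral convergence described above: points sampled uniformly on the unit circle (a one-dimensional manifold), Gaussian affinities with bandwidth $\epsilon \sim (\log n/n)^{1/(d/2+2)}$, and the low-lying spectrum of the random-walk graph Laplacian. The prefactor in the bandwidth, the kernel normalization, and the overall constant relating graph and manifold Laplacians are ad hoc assumptions, so only the eigenvalue ratio pattern of the circle ($1, 1, 4, 4, \ldots$) is checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 1                                       # n samples on a manifold of intrinsic dimension d = 1
t = rng.uniform(0, 2 * np.pi, n)
X = np.column_stack([np.cos(t), np.sin(t)])          # uniform samples on the unit circle

eps = 0.1 * (np.log(n) / n) ** (1.0 / (d / 2 + 2))   # bandwidth scaling from the abstract, ad hoc prefactor
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-D2 / (4 * eps))                          # Gaussian (heat-kernel style) affinities
deg = W.sum(axis=1)
Dm12 = 1.0 / np.sqrt(deg)
L = (np.eye(n) - (W * Dm12[:, None]) * Dm12[None, :]) / eps   # same spectrum as the random-walk Laplacian

evals = np.sort(np.linalg.eigvalsh(L))
print(evals[1:5] / evals[1])   # circle spectrum pattern approx [1, 1, 4, 4], up to bias and sampling error
```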
Recent years have witnessed growing interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work, we present a new quasi-Bayesian procedure for IV regression, building on the recently developed kernelized IV models and the dual/minimax formulation of IV regression. We analyze the frequentist behavior of the proposed method by establishing minimax optimal contraction rates in $L_2$ and Sobolev norms, and by discussing the frequentist validity of credible balls. We further derive a scalable inference algorithm that can be extended to work with wide neural network models. Empirical evaluation shows that our method produces informative uncertainty estimates on complex high-dimensional problems.