Smart Paper Notes

An $l_1$-oracle inequality for the Lasso in high-dimensional mixtures of experts models

TrungTin Nguyen , Hien D Nguyen , Faicel Chamroukhi , Geoffrey J McLachlan

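Categories: Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2020-09-22

Mixtures of experts (MoE) models are a popular framework for modeling heterogeneity in data, for both regression and classification problems in statistics and machine learning, owing to their flexibility and the abundance of available statistical estimation and model selection tools. This flexibility comes from allowing the mixture weights (or gating functions) in the MoE model to depend on the explanatory variables, along with the experts (or component densities). Compared with classical finite mixtures and finite mixtures of regression models, whose mixing parameters are independent of the covariates, this permits the modeling of data arising from more complex data-generating processes. The use of MoE models in a high-dimensional setting, where the number of explanatory variables can be much larger than the sample size, is challenging from a computational point of view and especially from a theoretical point of view, where the literature still lacks results for dealing with the curse of dimensionality in both the statistical estimation and feature selection problems. We consider the finite MoE model with soft-max gating functions and Gaussian experts for high-dimensional regression on heterogeneous data, and its $l_1$-regularized estimation via the Lasso. We focus on the estimation properties of the Lasso rather than its feature selection properties. We provide a lower bound on the regularization parameter of the Lasso function that ensures that the Lasso estimator satisfies an $l_1$-oracle inequality with respect to the Kullback-Leibler loss.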
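
For orientation, the model class and the estimator described in this abstract can be written roughly as follows. This is only a sketch in generic notation that is assumed here rather than taken from the paper: $K$ experts, gating parameters $\gamma_k$, expert regression coefficients $\beta_k$, expert variances $\sigma_k^2$, Gaussian density $\phi$, and regularization parameter $\lambda$. Which coordinates of $\psi$ are penalized is likewise an assumption.

```latex
\begin{align*}
s_\psi(y \mid x)
  &= \sum_{k=1}^{K} g_k(x;\gamma)\,
     \phi\bigl(y;\ \beta_{k0} + \beta_k^\top x,\ \sigma_k^2\bigr),
\qquad
g_k(x;\gamma)
   = \frac{\exp(\gamma_{k0} + \gamma_k^\top x)}
          {\sum_{l=1}^{K}\exp(\gamma_{l0} + \gamma_l^\top x)},\\[4pt]
\widehat{\psi}^{\mathrm{Lasso}}(\lambda)
  &\in \operatorname*{arg\,min}_{\psi}
     \Bigl\{ -\frac{1}{n}\sum_{i=1}^{n}\log s_\psi(y_i \mid x_i)
       + \lambda \sum_{k=1}^{K}\bigl(\|\gamma_k\|_1 + \|\beta_k\|_1\bigr) \Bigr\}.
\end{align*}
```

In this notation, the result stated in the abstract is a lower bound on $\lambda$ above which the penalized estimator satisfies an $l_1$-oracle inequality in Kullback-Leibler loss.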

translated by Google Translate

A non-asymptotic approach for model selection via penalization in high-dimensional mixture of experts models

TrungTin Nguyen , Hien Duy Nguyen , Faicel Chamroukhi , Florence Forbes

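Categories: Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2021-04-06

Mixtures of experts (MoE) are a popular class of statistical and machine learning models that have gained attention over the years due to their flexibility and efficiency. In this work, we consider Gaussian-gated localized MoE (GLoME) and block-diagonal covariance localized MoE (BLoME) regression models to represent nonlinear relationships in heterogeneous data with potential hidden graph-structured interactions between high-dimensional predictors. These models pose difficult statistical estimation and model selection questions, from both a computational and a theoretical perspective. This paper is devoted to the study of the model selection problem among a collection of GLoME or BLoME models characterized by the number of mixture components, the complexity of the Gaussian mean experts, and the hidden block-diagonal structures of the covariance matrices, in a penalized maximum likelihood estimation framework. In particular, we establish non-asymptotic risk bounds that take the form of weak oracle inequalities, provided that lower bounds on the penalties hold. The good empirical behavior of our models is then demonstrated on synthetic and real datasets.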

On lower bounds for the bias-variance trade-off

Alexis Derumigny , Johannes Schmidt-Hieber

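Categories: Machine Learning (Statistics)

2020-05-30

It is a common phenomenon that, for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and makes it possible to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures, as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In the second part of the article, the abstract lower bounds are applied to several statistical models, including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model, and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we combine the general strategy for lower bounds with a reduction technique. This allows us to link the original problem to the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. In the Gaussian sequence model, different phase transitions of the bias-variance trade-off occur. Although there is a non-trivial interplay between bias and variance, the rates of the squared bias and the variance do not have to be balanced in order to achieve the minimax estimation rate.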

Minimax Semiparametric Learning With Approximate Sparsity

Jelena Bradic , Victor Chernozhukov , Whitney K. Newey , Yinchu Zhu

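Categories: Machine Learning (Statistics)

2019-12-27

This paper is about the feasibility and means of root-n consistently estimating linear, mean-square continuous functionals of a high-dimensional, approximately sparse regression. Such objects include a wide variety of interesting parameters, such as regression coefficients, average derivatives, and the average treatment effect. We give lower bounds on the convergence rate of estimators of a regression slope and an average derivative, and find that these bounds are substantially larger than in a low-dimensional, semiparametric setting. We also give debiased machine learners that are root-n consistent under either a minimal approximate sparsity condition or rate double robustness. These estimators improve on existing estimators in being root-n consistent under more general conditions than previously known.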

An improper estimator with optimal excess risk in misspecified density estimation and logistic regression

Jaouad Mourtada , Stéphane Gaïffas

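Categories: Machine Learning | Machine Learning (Statistics)

2019-12-23

We introduce a procedure for conditional density estimation under logarithmic loss, which we call SMP (Sample Minmax Predictor). This estimator minimizes a new general excess risk bound for statistical learning. On standard examples, this bound scales as $d/n$, with $d$ the model dimension and $n$ the sample size, and critically remains valid under model misspecification. Being an improper (out-of-model) procedure, SMP improves over within-model estimators such as the maximum likelihood estimator, whose excess risk degrades under misspecification. Compared to approaches that reduce to the sequential problem, our bounds remove suboptimal $\log n$ factors and can handle unbounded classes. For the Gaussian linear model, the predictions and risk bound of SMP are governed by the leverage scores of the covariates, nearly matching the optimal risk without conditions on the noise variance or the approximation error of the linear model. For logistic regression, SMP provides a non-Bayesian approach to the calibration of probabilistic predictions that relies on virtual samples, and it can be computed by solving two logistic regressions. It achieves a non-asymptotic excess risk of $O((d + B^2R^2)/n)$, where $R$ bounds the norm of the features and $B$ that of the comparison parameter. By contrast, no within-model estimator can achieve a better rate than $\min({BR}/{\sqrt{n}}, {d e^{BR}}/{n})$. This provides a more practical alternative to Bayesian approaches, which require approximate posterior sampling, thereby partly addressing a question raised by Foster et al. (2018).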

Strong identifiability and parameter learning in regression with heterogeneous response

Dat Do , Linh Do , XuanLong Nguyen

Categories: Machine Learning (Statistics)

2022-12-08

Mixtures of regression are a powerful class of models for regression learning with respect to a highly uncertain and heterogeneous response variable of interest. In addition to being a rich predictive model for the response given some covariates, the parameters in this model class provide useful information about the heterogeneity in the data population, which is represented by the conditional distributions for the response given the covariates associated with a number of distinct but latent subpopulations. In this paper, we investigate conditions of strong identifiability, rates of convergence for conditional density and parameter estimation, and the Bayesian posterior contraction behavior arising in finite mixture of regression models, under exact-fitted and over-fitted settings and when the number of components is unknown. This theory is applicable to common choices of link functions and families of conditional distributions employed by practitioners. We provide simulation studies and data illustrations, which shed some light on the parameter learning behavior found in several popular regression mixture models reported in the literature.

Asymptotic Statistical Analysis of $f$-divergence GAN

Xinwei Shen , Kani Chen , Tong Zhang

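Categories: Machine Learning | Machine Learning (Statistics)

2022-09-14

Generative Adversarial Networks (GANs) have achieved great success in data generation. However, their statistical properties are not fully understood. In this paper, we consider the statistical behavior of the general $f$-divergence formulation of GAN, which includes the Kullback-Leibler divergence that is closely related to the maximum likelihood principle. We show that for parametric generative models that are correctly specified, all $f$-divergence GANs with the same discriminator classes are asymptotically equivalent under suitable regularity conditions. Moreover, with an appropriately chosen local discriminator, they become equivalent to the maximum likelihood estimate asymptotically. For generative models that are misspecified, GANs with different $f$-divergences converge to different estimators, and thus cannot be directly compared. However, it is shown that for some commonly used $f$-divergences, the original $f$-GAN is not optimal, in that one can achieve a smaller asymptotic variance when the discriminator training in the original $f$-GAN formulation is replaced by logistic regression. The resulting estimation method is referred to as Adversarial Gradient Estimation (AGE). Empirical studies are provided to support the theory and to demonstrate the advantage of AGE over the original $f$-GANs under model misspecification.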

Non-Asymptotic Guarantees for Robust Statistical Learning under $(1+\varepsilon)$-th Moment Assumption

Lihu Xu , Fang Yao , Qiuran Yao , Huiming Zhang

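Categories: Machine Learning (Statistics) | Machine Learning

2022-01-10

There has been a surge of interest in developing robust estimators for models with heavy-tailed data in statistics and machine learning. This paper proposes a log-truncated M-estimator for a large family of statistical regressions and establishes its excess risk bound under the condition that the data have a finite $(1+\varepsilon)$-th moment with $\varepsilon \in (0,1]$. With an additional assumption on the associated risk function, we obtain an $\ell_2$-error bound for the estimation. Our theorems are applied to establish robust M-estimators for concrete regressions. Besides convex regressions such as quantile regression and generalized linear models, many non-convex regressions also fit into our theorems; we focus on robust deep neural network regressions, which can be solved by stochastic gradient descent algorithms. Simulations and real data analysis demonstrate the superiority of log-truncated estimators over standard estimators.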

Concentration analysis of multivariate elliptic diffusion processes

Cathrine Aeckerle-Willems , Claudia Strauch , Lukas Trottner

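Categories: Machine Learning (Statistics)

2022-06-07

We prove concentration inequalities and associated PAC bounds for continuous- and discrete-time additive functionals for possibly unbounded functions of multivariate, nonreversible diffusion processes. Our analysis relies on an approach via the Poisson equation, which allows us to consider a very broad class of subexponentially ergodic processes. These results add to existing concentration inequalities for additive functionals of diffusion processes, which have so far been available only for bounded functions or for unbounded functions of processes from a significantly smaller class. We demonstrate the power of these exponential inequalities with two examples from very different areas. Considering a possibly high-dimensional parametric nonlinear drift model under sparsity constraints, we apply the continuous-time concentration results to validate the restricted eigenvalue condition for Lasso estimation, which is fundamental to the derivation of oracle inequalities. The results for discrete additive functionals are used to investigate the unadjusted Langevin MCMC algorithm for sampling from moderately heavy-tailed densities $\pi$. In particular, we provide PAC bounds for the sample Monte Carlo estimator of integrals $\pi(f)$ for polynomially growing functions $f$, which quantify sample and step sizes sufficient for approximation within a prescribed margin with high probability.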

Triangular Flows for Generative Modeling: Statistical Consistency, Smoothness Classes, and Fast Rates

Nicholas J. Irons , Meyer Scetbon , Soumik Pal , Zaid Harchaoui

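Categories: Machine Learning (Statistics) | Machine Learning

2021-12-31

Triangular flows, also known as Knöthe-Rosenblatt measure couplings, comprise an important building block of normalizing flow models for generative modeling and density estimation, including popular autoregressive flow models such as real-valued non-volume preserving transformation models (Real NVP). We present statistical guarantees and sample complexity bounds for triangular flow statistical models. In particular, we establish the statistical consistency and the finite-sample convergence rates of the Kullback-Leibler estimator of the Knöthe-Rosenblatt measure coupling using tools from empirical process theory. Our results highlight the anisotropic geometry of the function classes at play in triangular flows, shed light on optimal coordinate ordering, and lead to statistical guarantees for Jacobian flows. We conduct numerical experiments on synthetic data to illustrate the practical implications of our theoretical findings.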

The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg , Ilmun Kim , Rajen D. Shah , Richard J. Samworth

Categories: Machine Learning (Statistics)

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.

Retire: Robust Expectile Regression in High Dimensions

Rebeka Man , Kean Ming Tan , Zian Wang , Wen-Xin Zhou

Categories: Machine Learning (Statistics)

2022-12-11

High-dimensional data can often display heterogeneity due to heteroscedastic variance or inhomogeneous covariate effects. Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data. The former is computationally challenging due to the non-smooth nature of the check loss, and the latter is sensitive to heavy-tailed error distributions. In this paper, we propose and study (penalized) robust expectile regression (retire), with a focus on iteratively reweighted $\ell_1$-penalization which reduces the estimation bias from $\ell_1$-penalization and leads to oracle properties. Theoretically, we establish the statistical properties of the retire estimator under two regimes: (i) low-dimensional regime in which $d \ll n$; (ii) high-dimensional regime in which $s\ll n\ll d$ with $s$ denoting the number of significant predictors. In the high-dimensional setting, we carefully characterize the solution path of the iteratively reweighted $\ell_1$-penalized retire estimation, adapted from the local linear approximation algorithm for folded-concave regularization. Under a mild minimum signal strength condition, we show that after as many as $\log(\log d)$ iterations the final iterate enjoys the oracle convergence rate. At each iteration, the weighted $\ell_1$-penalized convex program can be efficiently solved by a semismooth Newton coordinate descent algorithm. Numerical studies demonstrate the competitive performance of the proposed procedure compared with either non-robust or quantile regression based alternatives.
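
To make the notion of an expectile concrete, here is a minimal sketch of the plain sample expectile, computed by iteratively reweighted averaging of the asymmetric squared loss. It is not the retire estimator from the paper: it has no covariates, no robustification of the loss, and no $\ell_1$ penalty. The function name `sample_expectile` and its parameters are illustrative assumptions.

```python
import numpy as np


def sample_expectile(y, tau=0.5, tol=1e-10, max_iter=100):
    """Sample expectile at level tau (illustrative helper, not from the paper).

    Minimizes sum_i |tau - 1{y_i <= m}| * (y_i - m)^2 over m by solving the
    first-order condition sum_i w_i * (y_i - m) = 0, where w_i = tau for
    observations above m and 1 - tau otherwise.
    """
    y = np.asarray(y, dtype=float)
    m = y.mean()  # start from the tau = 0.5 solution
    for _ in range(max_iter):
        # Asymmetric weights given the current estimate m.
        w = np.where(y > m, tau, 1.0 - tau)
        m_new = np.sum(w * y) / np.sum(w)  # weighted mean solves the condition
        if abs(m_new - m) < tol:
            return float(m_new)
        m = m_new
    return float(m)
```

With `tau = 0.5` all weights are equal and the result is the sample mean; levels above 0.5 pull the estimate toward the upper tail, which is why expectiles at several levels can reveal heteroscedasticity.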

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou , Martin J. Wainwright , Peter L. Bartlett

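Categories: Machine Learning (Statistics)

2022-09-26

The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that, in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.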

Data-Driven Sample Average Approximation with Covariate Information

Rohit Kannan , Güzin Bayraksan , James R. Luedtke

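Categories: Machine Learning (Statistics)

2022-07-27

We study optimization for data-driven decision-making when we have observations of the uncertain parameters within the optimization model together with concurrent observations of covariates. Given a new covariate observation, the goal is to choose a decision that minimizes the expected cost conditioned on this observation. We investigate three data-driven frameworks that integrate a machine learning prediction model within a stochastic programming sample average approximation (SAA) for approximating the solution to this problem. Two of the SAA frameworks are new and use out-of-sample residuals of leave-one-out prediction models for scenario generation. The frameworks we investigate are flexible and accommodate parametric, nonparametric, and semiparametric regression techniques. We derive conditions on the data generation process, the prediction model, and the stochastic program under which solutions of these data-driven SAAs are consistent and asymptotically optimal, and we also derive convergence rates and finite-sample guarantees. Computational experiments validate our theoretical results, demonstrate the potential advantages of our data-driven formulations over existing approaches (even when the prediction model is misspecified), and illustrate the benefits of our new data-driven formulations in the limited-data regime.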

Optimal learning of high-dimensional classification problems using deep neural networks

Philipp Petersen , Felix Voigtlaender

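Categories: Machine Learning | Machine Learning (Statistics)

2021-12-23

We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we find that the optimal estimation rates are essentially independent of the underlying dimension and can be realized by empirical risk minimization methods over a suitable class of deep neural networks. These results are based on novel estimates of the $L^1$ and $L^\infty$ entropies of the class of Barron-regular functions.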

The Lasso with general Gaussian designs with applications to hypothesis testing

Michael Celentano , Andrea Montanari , Yuting Wei

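Categories: Machine Learning | Machine Learning (Statistics)

2020-07-27

The Lasso is a method for high-dimensional regression that is now commonly used when the number of covariates $p$ is of the same order as, or larger than, the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: $(1)$ the regularized risk is non-smooth; $(2)$ the distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, the standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates; here we generalize it to Gaussian correlated designs with non-singular covariance structure. This is expressed in terms of a simpler ``fixed-design'' model. We establish non-asymptotic bounds on the distance between the distributions of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.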

On minimax density estimation via measure transport

Sven Wang , Youssef Marzouk

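Categories: Machine Learning (Statistics)

2022-07-20

We study the convergence properties, in Hellinger and related distances, of nonparametric density estimators based on measure transport. These estimators represent the measure of interest as the pushforward of a chosen reference distribution under a transport map, where the map is chosen via a maximum likelihood objective (equivalently, by minimizing an empirical Kullback-Leibler loss) or a penalized version thereof. We establish concentration inequalities for a general class of penalized measure transport estimators by combining techniques from M-estimation with analytical properties of the transport-based density representation. We then demonstrate the implications of our theory for the case of triangular Knothe-Rosenblatt (KR) transports on the $d$-dimensional unit cube, and show that both penalized and unpenalized versions of such estimators achieve minimax optimal convergence rates over Hölder classes of densities. Specifically, we establish optimal rates for unpenalized nonparametric maximum likelihood estimation over bounded Hölder-type balls, and then for certain Sobolev-penalized estimators and sieved wavelet estimators.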

Off-the-grid learning of sparse mixtures from a continuous dictionary

Cristina Butucea , Jean-François Delmas , Anne Dutfoy , Clément Hardy

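Categories: Machine Learning (Statistics) | Machine Learning

2022-06-29

We consider a general non-linear model in which the signal is a finite mixture of an unknown, possibly increasing, number of features issued from a continuous dictionary parameterized by a real non-linear parameter. The signal is observed with Gaussian (possibly correlated) noise in either a continuous or a discrete setup. We propose an off-the-grid optimization method, that is, a method that does not use any discretization scheme on the parameter space, to estimate both the non-linear parameters of the features and the linear parameters of the mixture. We use recent results on the geometry of off-the-grid methods to give a minimal separation on the true underlying non-linear parameters such that interpolating certificate functions can be constructed. Using also tail bounds for suprema of Gaussian processes, we bound the prediction error with high probability. Assuming that the certificate functions can be constructed, our prediction error bound is, up to log-factors, similar to the rates attained by the Lasso predictor in the linear regression model. We also establish convergence rates that quantify, with high probability, the quality of estimation for both the linear and the non-linear parameters.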

Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks

Andrea Montanari , Kangjie Zhou

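Categories: Machine Learning (Statistics) | Machine Learning

2022-06-14

Given a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$ and, for each such projection, the empirical distribution of the projected points. What does this collection of probability distributions look like when $n, d$ grow large? We consider this question under the null model in which the points are i.i.d. standard Gaussian vectors, focusing on the asymptotic regime in which $n, d \to \infty$, with $n/d \to \alpha \in (0, \infty)$, while $m$ is fixed. Denoting by $\mathscr{F}_{m,\alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m,\alpha}$. In particular, we characterize the Wasserstein radius of $\mathscr{F}_{m,\alpha}$ up to logarithmic factors, and determine it exactly for $m = 1$. We also prove sharp bounds in terms of Kullback-Leibler divergence and Rényi information dimension. The previous question has applications to unsupervised learning methods, such as projection pursuit and independent component analysis. We introduce a version of the same problem that is relevant for supervised learning, and prove a sharp Wasserstein radius bound. As an application, we establish an upper bound on the interpolation threshold of two-layer neural networks with $m$ hidden neurons.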

Estimating the minimizer and the minimum value of a regression function under passive design

Arya Akhavan , Davit Gogolashvili , Alexandre B. Tsybakov

Categories: Machine Learning (Statistics)

2022-11-29

We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$ from the observations contaminated by random noise. Our estimator $\boldsymbol{z}_n$ of the minimizer $\boldsymbol{x}^*$ is based on a version of the projected gradient descent with the gradient estimated by a regularized local polynomial algorithm. Next, we propose a two-stage procedure for estimation of the minimum value $f^*$ of regression function $f$. At the first stage, we construct an accurate enough estimator of $\boldsymbol{x}^*$, which can be, for example, $\boldsymbol{z}_n$. At the second stage, we estimate the function value at the point obtained in the first stage using a rate optimal nonparametric procedure. We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$. We establish minimax lower bounds showing that, under certain choice of parameters, the proposed algorithms achieve the minimax optimal rates of convergence on the class of smooth and strongly convex functions.
