Smart Paper Notes

Smooth Nested Simulation: Bridging Cubic and Square Root Convergence Rates in High Dimensions

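Wenjia Wang, Yanyuan Wang, Xiaowei Zhang

Category: Machine Learning (Statistics)

2022-01-09

Nested simulation concerns estimating functionals of a conditional expectation via simulation. In this paper, we propose a new method based on kernel ridge regression that exploits the smoothness of the conditional expectation as a function of the multidimensional conditioning variable. Asymptotic analysis shows that, as the simulation budget increases, the proposed method can effectively alleviate the curse of dimensionality on the convergence rate, provided that the conditional expectation is sufficiently smooth. The smoothness bridges the gap between the cubic root convergence rate (that is, the optimal rate for standard nested simulation) and the square root convergence rate (that is, the canonical rate for standard Monte Carlo simulation). We demonstrate the performance of the proposed method via numerical examples from portfolio risk management and input uncertainty quantification.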
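
The idea of smoothing inner simulation output across outer scenarios with kernel ridge regression can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the single inner sample per scenario, the toy model `Y = X + noise`, and the names `gaussian_kernel` and `krr_nested_estimate` are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * lengthscale ** 2))

def krr_nested_estimate(x, y, functional, lam=1e-2, lengthscale=1.0):
    """Estimate E[functional(E[Y | X])] by smoothing inner samples with KRR.

    x: (n, d) outer scenarios; y: (n,) one noisy inner sample per scenario.
    lam and lengthscale are hypothetical tuning values, not taken from the paper.
    """
    n = x.shape[0]
    K = gaussian_kernel(x, x, lengthscale)
    # Standard KRR coefficients: solve (K + n * lam * I) alpha = y.
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    f_hat = K @ alpha  # fitted conditional expectation at each outer scenario
    return float(np.mean(functional(f_hat)))

# Toy check: X ~ N(0, 1), E[Y | X] = X, functional = positive part.
rng = np.random.default_rng(0)
x = rng.normal(size=(400, 1))
y = x[:, 0] + 0.5 * rng.normal(size=400)
estimate = krr_nested_estimate(x, y, lambda f: np.maximum(f, 0.0))
print(estimate)
```

In this toy model the target is E[max(X, 0)] = 1/sqrt(2*pi), so the printed value can be compared against roughly 0.399.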

translated by Google Translate

Sample and Computationally Efficient Stochastic Kriging in High Dimensions

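Liang Ding, Xiaowei Zhang

Category: Machine Learning (Statistics)

2020-10-14

Stochastic kriging has been widely employed in simulation metamodeling to predict the response surface of complex simulation models. However, its use is limited to cases where the design space is low-dimensional because, in general, the sample complexity (i.e., the number of design points required for stochastic kriging to produce an accurate prediction) grows exponentially in the dimensionality of the design space. The large sample size results in both a prohibitive sample cost for running the simulation model and a severe computational challenge due to the need to invert large covariance matrices. Based on tensor Markov kernels and sparse grid experimental designs, we develop a novel methodology that dramatically alleviates the curse of dimensionality. We show that the sample complexity of the proposed methodology grows only slightly in the dimensionality, even under model misspecification. We also develop fast algorithms that compute stochastic kriging in its exact form without any approximation schemes. We demonstrate via extensive numerical experiments that our methodology can handle problems with a design space of more than 10,000 dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.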

Quasi-Bayesian Dual Instrumental Variable Regression

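Ziyu Wang, Yuhao Zhou, Tongzheng Ren, Jun Zhu

Categories: Machine Learning (Statistics) | Machine Learning

2021-06-16

Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work we present a novel quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models and the dual/minimax formulation of IV regression. We analyze the frequentist behavior of the proposed method by establishing minimax optimal contraction rates in $L_2$ and Sobolev norms, and by discussing the frequentist validity of credible balls. We further derive a scalable inference algorithm that can be extended to work with wide neural network models. Empirical evaluation shows that our method produces informative uncertainty estimates on complex high-dimensional problems.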

On the Estimation of Derivatives Using Plug-in KRR Estimators

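Zejian Liu, Meng Li

Categories: Machine Learning (Statistics) | Machine Learning

2020-06-02

We study the problem of estimating the derivatives of a regression function, which has a wide range of applications as a key nonparametric functional of unknown functions. Standard analysis may be tailored to specific derivative orders, and parameter tuning remains a daunting challenge, particularly for high-order derivatives. In this article, we propose a simple plug-in kernel ridge regression (KRR) estimator in nonparametric regression with random design that is broadly applicable to multi-dimensional support and arbitrary mixed-partial derivatives. We provide a non-asymptotic analysis to study the behavior of the proposed estimator in a unified manner that encompasses the regression function and its derivatives, leading to two error bounds for a general class of kernels under the strong $L_\infty$ norm. In a concrete example specialized to kernels with polynomially decaying eigenvalues, the proposed estimator recovers the minimax optimal rate, up to a logarithmic factor, for estimating derivatives of H\"older-class functions. This rate is attained with the same choice of tuning parameter for any order of derivative. Hence, the proposed estimator enjoys a \textit{plug-in property} for derivatives in that it automatically adapts to the order of the derivative to be estimated, enabling easy tuning in practice. Our simulation studies show favorable finite-sample performance of the proposed method relative to several existing methods and corroborate the theoretical findings on its minimax optimality.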
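
The plug-in idea (fit KRR once, then differentiate the fitted function) can be sketched in one dimension as below. This is an illustrative sketch under assumptions, not the authors' code: the Gaussian kernel, the bandwidth `h`, the penalty `lam`, and the function names are all chosen for the example.

```python
import numpy as np

def fit_krr(x, y, lam, h):
    """Fit 1-D kernel ridge regression with a Gaussian kernel of bandwidth h."""
    n = len(x)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * h ** 2))
    # KRR coefficients: solve (K + n * lam * I) alpha = y.
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def predict(x_new, x, alpha, h):
    """Evaluate the fitted regression function at x_new."""
    K_new = np.exp(-(x_new[:, None] - x[None, :]) ** 2 / (2.0 * h ** 2))
    return K_new @ alpha

def predict_derivative(x_new, x, alpha, h):
    """Plug-in derivative: differentiate the kernel in its first argument."""
    diff = x_new[:, None] - x[None, :]
    dK = -diff / h ** 2 * np.exp(-diff ** 2 / (2.0 * h ** 2))
    return dK @ alpha

# Toy data: y = sin(x) + noise, so the true derivative is cos(x).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))
y = np.sin(x) + 0.05 * rng.normal(size=200)
alpha = fit_krr(x, y, lam=1e-3, h=0.5)
grid = np.linspace(1.0, 5.0, 21)
d_hat = predict_derivative(grid, x, alpha, h=0.5)
```

Note that the same `alpha` serves both `predict` and `predict_derivative`; no separate tuning is done for the derivative, which mirrors the plug-in property described in the abstract.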

Dimension-agnostic inference using cross U-statistics

Ilmun Kim, Aaditya Ramdas

Category: Machine Learning (Statistics)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
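
For the one-sample mean problem, the combination of sample splitting and self-normalization can be sketched as follows. The exact form of the statistic here is an assumption based on the description above (project the first half onto the mean of the second half, then studentize), and the name `cross_u_mean_test` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def cross_u_mean_test(x, alpha=0.05):
    """One-sided test of H0: mean = 0 using sample splitting and studentization.

    x: (n, d) data matrix. Returns (statistic, reject).
    """
    n = x.shape[0]
    m = n // 2
    first, second = x[:m], x[m:]
    # Direction estimated on the second half only.
    direction = second.mean(axis=0)
    # Scalar projections of the first half onto that direction.
    f = first @ direction
    # Self-normalized statistic, compared with a standard normal quantile.
    stat = np.sqrt(m) * f.mean() / f.std()
    threshold = NormalDist().inv_cdf(1.0 - alpha)
    return stat, stat > threshold

rng = np.random.default_rng(2)
x_alt = rng.normal(size=(100, 20)) + 1.0  # mean shifted away from zero
stat, reject = cross_u_mean_test(x_alt)
print(stat, reject)
```

The calibration uses only a normal quantile, so nothing in the procedure requires deciding in advance how the dimension compares with the sample size.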

The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg, Ilmun Kim, Rajen D. Shah, Richard J. Samworth

Category: Machine Learning (Statistics)

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

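Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett

Category: Machine Learning (Statistics)

2022-09-26

The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that, in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.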

Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization

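Vishal Gupta, Michael Huang, Paat Rusmevichientong

Categories: Machine Learning | Machine Learning (Statistics)

2021-07-26

Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data, and uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set, uses all data when training and, hence, is well suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is "weakly coupled" in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and the policy class's complexity. Our bounds show that, under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. Said differently, we prove that our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.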

Selection of the Most Probable Best

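Taeho Kim, Kyoung-kuk Kim, Eunhye Song

Categories: Machine Learning | Machine Learning (Statistics)

2022-07-15

We consider an expected-value ranking and selection problem in which the simulation outputs of all k solutions depend on a common uncertain input model. Given that the uncertainty of the input model is captured by a probability simplex on a finite support, we define the most probable best (MPB) to be the solution whose probability of being optimal is the largest. To devise an efficient sampling algorithm to find the MPB, we first derive a lower bound on the large deviation rate of the probability of falsely selecting the MPB, then formulate an optimal computing budget allocation (OCBA) problem to find the optimal static sampling ratios for all solution-input model pairs that maximize the lower bound. We devise a series of sequential algorithms that apply interpretable and computationally efficient sampling rules, and prove that their sampling ratios achieve the optimality conditions of the OCBA problem as the simulation budget increases. The algorithms are benchmarked against a state-of-the-art sequential sampling algorithm designed for contextual ranking and selection problems, and are shown to have superior empirical performance at finding the MPB.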

A Cross Validation framework for Signal Denoising with Applications to Trend Filtering, Dyadic CART and Beyond

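Anamitra Chaudhuri, Sabyasachi Chatterjee

Category: Machine Learning (Statistics)

2022-01-07

This paper formulates a general cross-validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as Trend Filtering and Dyadic CART. The resulting cross-validated versions are then shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. No previous theoretical analyses existed for cross-validated versions of Trend Filtering or Dyadic CART. To illustrate the generality of the framework, we also propose and study cross-validated versions of two fundamental estimators: the lasso for high-dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods that use tuning parameters.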

Quantifying the Effects of Data Augmentation

Kevin H. Huang, Peter Orbanz, Morgane Austern

Categories: Machine Learning | Machine Learning (Statistics)

2022-02-18

We provide results that exactly quantify how data augmentation affects the convergence rate and variance of estimates. They lead to some unexpected findings: Contrary to common intuition, data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables. The pathological behavior we identify is not a consequence of complex models, but can occur even in the simplest settings -- one of our examples is a ridge regressor with two parameters. On the other hand, our results also show that data augmentation can have real, quantifiable benefits.

How do noise tails impact on deep ReLU networks?

Jianqing Fan, Yihong Gu, Wen-Xin Zhou

Category: Machine Learning (Statistics)

2022-03-20

This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by the ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: constructing a deep ReLU network estimator that has a better empirical loss than the true function and the difference between these two functions furnishes a lower bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.
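
For reference, the Huber loss with robustification parameter `tau` can be written as below. This is the standard textbook definition; how `tau` should adapt to the sample size, smoothness, and moment parameters is specific to the paper and is not reproduced here, so the value used in the example is arbitrary.

```python
import numpy as np

def huber_loss(residual, tau):
    """Huber loss: quadratic for |r| <= tau, linear beyond that."""
    r = np.abs(residual)
    return np.where(r <= tau, 0.5 * r ** 2, tau * r - 0.5 * tau ** 2)

# Small residuals are penalized quadratically, large ones only linearly,
# which limits the influence of heavy-tailed noise.
values = huber_loss(np.array([0.5, 3.0]), tau=1.0)
print(values)  # [0.125 2.5]
```

As `tau` grows the loss approaches ordinary least squares, and as it shrinks the loss behaves like a scaled absolute loss.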

Communication-Constrained Distributed Quantile Regression with Optimal Statistical Guarantees

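Kean Ming Tan, Heather Battey, Wen-Xin Zhou

Category: Machine Learning (Statistics)

2021-10-25

We address the problem of how to achieve optimal inference in distributed quantile regression without stringent scaling conditions. This is challenging due to the non-smooth nature of the quantile regression (QR) loss function, which invalidates the use of existing methodology. The difficulties are resolved through a double-smoothing approach that is applied to the local (at each data source) and global objective functions. Despite the reliance on a delicate combination of local and global smoothing parameters, the quantile regression model is fully parametric, thereby facilitating interpretation. In the low-dimensional regime, we establish a finite-sample theoretical framework for the sequentially defined distributed QR estimators. This reveals a trade-off between the communication cost and statistical error. We further discuss and compare several alternative confidence set constructions, based on inversion of Wald and score-type tests and on resampling techniques, detailing an improvement that is effective for more extreme quantile coefficients. In high dimensions, a sparse framework is adopted, where the proposed doubly-smoothed objective function is complemented with an $\ell_1$-penalty. We show that the corresponding distributed QR estimator achieves the global convergence rate after a near-constant number of communication rounds. A thorough simulation study further elucidates our findings.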

Jump Interval-Learning for Individualized Decision Making

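Hengrui Cai, Chengchun Shi, Rui Song, Wenbin Lu

Categories: Machine Learning | Machine Learning (Statistics)

2021-11-17

An individualized decision rule (IDR) is a decision function that assigns each individual a given treatment based on his/her observed characteristics. Most existing work in the literature considers settings with binary or finitely many treatment options. In this paper, we focus on the continuous treatment setting and propose jump interval-learning to develop an individualized interval-valued decision rule (I2DR) that maximizes the expected outcome. Unlike IDRs that recommend a single treatment, the proposed I2DR yields an interval of treatment options for each individual, making it more flexible to implement in practice. To derive an optimal I2DR, our jump interval-learning method estimates the conditional mean of the outcome given the treatment and the covariates via jump penalized regression, and derives the corresponding optimal I2DR based on the estimated outcome regression function. The regressor is allowed to be either linear, for clear interpretation, or a deep neural network, to model complex treatment-covariate interactions. To implement jump interval-learning, we develop a searching algorithm based on dynamic programming that efficiently computes the outcome regression function. Statistical properties of the resulting I2DR are established when the outcome regression function is either a piecewise or a continuous function over the treatment space. We further develop a procedure to infer the mean outcome under the (estimated) optimal policy. Extensive simulations and a real data application to a warfarin study are conducted to demonstrate the empirical validity of the proposed I2DR.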

The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu, Alden Green, Ryan J. Tibshirani

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.

Minimax Optimal Regression over Sobolev Spaces via Laplacian Eigenmaps on Neighborhood Graphs

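Alden Green, Sivaraman Balakrishnan, Ryan J. Tibshirani

Category: Machine Learning (Statistics)

2021-11-14

In this paper we study the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for nonparametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses $\mathbf{Y} = (Y_1,\ldots,Y_n)$ onto a subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s+d)}$) and goodness-of-fit testing ($n^{-4s/(4s+d)}$). We also show that PCR-LE is \emph{manifold adaptive}: that is, we consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s+m)}$) and testing ($n^{-4s/(4s+m)}$) rates of convergence. Interestingly, these rates are almost always faster than the known rates of convergence of graph Laplacian eigenvectors; in other words, for this problem, regression with estimated features appears to be easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.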
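
The projection step of PCR-LE can be sketched as follows. This is a minimal illustration assuming an unweighted epsilon-neighborhood graph, the unnormalized Laplacian, and the eigenvectors with the `k` smallest eigenvalues; the function name `pcr_le` and the parameters `radius` and `k` are hypothetical, and the paper's exact graph construction may differ.

```python
import numpy as np

def pcr_le(x, y, radius, k):
    """Project responses y onto the first k eigenvectors of a graph Laplacian.

    x: (n, d) design points; y: (n,) responses.
    radius: connect two points when their distance is at most radius.
    """
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    W = (sq_dist <= radius ** 2).astype(float)
    np.fill_diagonal(W, 0.0)             # no self-loops
    L = np.diag(W.sum(axis=1)) - W       # unnormalized graph Laplacian
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, :k]
    return V @ (V.T @ y)                 # fitted values at the design points

rng = np.random.default_rng(3)
x = rng.uniform(size=(30, 2))
y = np.sin(4.0 * x[:, 0]) + 0.1 * rng.normal(size=30)
fitted = pcr_le(x, y, radius=0.4, k=5)
```

Two sanity checks follow from the construction: with `k` equal to the number of points the projection returns `y` itself, and on a connected graph `k = 1` returns the sample mean of `y` at every point.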

Batch Policy Learning in Average Reward Markov Decision Processes

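Peng Liao, Zhengling Qi, Predrag Klasnja, Susan Murphy

Category: Machine Learning (Statistics)

2020-07-23

We consider the batch (off-line) policy learning problem in the infinite-horizon Markov decision process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further, we develop an optimization algorithm to compute the optimal policy in a parameterized stochastic policy class. The performance of the estimated policy is measured by the difference between the optimal average reward in the policy class and the average reward of the estimated policy, and we establish a finite-sample regret guarantee. The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.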

The Lasso with general Gaussian designs with applications to hypothesis testing

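Michael Celentano, Andrea Montanari, Yuting Wei

Categories: Machine Learning | Machine Learning (Statistics)

2020-07-27

The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order as or larger than the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: $(1)$ the regularized risk is non-smooth; $(2)$ the distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, the standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates: here we generalize it to correlated Gaussian designs with non-singular covariance structure. This is expressed in terms of a simpler ``fixed-design'' model. We establish non-asymptotic bounds on the distance between the distributions of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.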

Bless and curse of smoothness and phase transitions in nonparametric regressions: a nonasymptotic perspective

Ying Zhu

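Category: Machine Learning

2021-12-07

When the regression function belongs to the standard smooth classes consisting of univariate functions with derivatives up to the $(\gamma+1)$th order bounded by a common constant everywhere or a.e., it is well known that the minimax optimal rate of convergence in mean squared error (MSE) is $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\gamma$ is finite and the sample size $n\rightarrow\infty$. From a nonasymptotic viewpoint that considers finite $n$, this paper shows that, for the standard H\"older and Sobolev classes, the minimax optimal rate is $\frac{\sigma^{2}\left(\gamma\vee1\right)}{n}$ when $\frac{n}{\sigma^{2}}\precsim\left(\gamma\vee1\right)^{2\gamma+3}$ and $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\frac{n}{\sigma^{2}}\succsim\left(\gamma\vee1\right)^{2\gamma+3}$. To establish these results, we derive upper and lower bounds on the covering and packing numbers for the generalized H\"older class where the $k$th ($k=0,...,\gamma$) derivative is bounded from above by a parameter $R_{k}$ and the $\gamma$th derivative is $R_{\gamma+1}$-Lipschitz (and also for the generalized ellipsoid class of smooth functions). Our bounds sharpen the classical metric entropy results for the standard classes, and give the general dependence on $\gamma$ and $R_{k}$. By deriving the minimax optimal MSE rates under $R_{k}=1$, $R_{k}\leq\left(k-1\right)!$ and $R_{k}=k!$ (with the latter two cases motivated in our introduction) with the help of our new entropy bounds, we show a couple of interesting results that cannot be shown with the existing entropy bounds in the literature. For the H\"older class of $d$-variate functions, our result suggests that the classical asymptotic rate $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+2+d}}$ could be an underestimate of the MSE in finite samples.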
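
As a quick consistency check on the two finite-sample regimes in this abstract (rate $\frac{\sigma^{2}(\gamma\vee1)}{n}$ below the threshold and $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ above it), the worked equation below evaluates both at the threshold. It is a sketch that treats $\frac{n}{\sigma^{2}} = (\gamma\vee1)^{2\gamma+3}$ as an exact equality and ignores constants; it is not taken from the paper.

```latex
\[
\frac{\sigma^{2}}{n} = (\gamma\vee 1)^{-(2\gamma+3)}
\;\Longrightarrow\;
\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}
= (\gamma\vee 1)^{-(2\gamma+2)}
= (\gamma\vee 1)\cdot(\gamma\vee 1)^{-(2\gamma+3)}
= \frac{\sigma^{2}\,(\gamma\vee 1)}{n}.
\]
```

The two rates therefore coincide at the threshold, so the phase transition between the regimes is continuous in $n/\sigma^{2}$.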

Sinkhorn Distributionally Robust Optimization

Jie Wang, Rui Gao, Yao Xie

Categories: Machine Learning | Machine Learning (Statistics)

2021-09-24

We study distributionally robust optimization (DRO) with Sinkhorn distance -- a variant of Wasserstein distance based on entropic regularization. We provide convex programming dual reformulation for a general nominal distribution. Compared with Wasserstein DRO, it is computationally tractable for a larger class of loss functions, and its worst-case distribution is more reasonable. We propose an efficient first-order algorithm with bisection search to solve the dual reformulation. We demonstrate that our proposed algorithm finds $\delta$-optimal solution of the new DRO formulation with computation cost $\tilde{O}(\delta^{-3})$ and memory cost $\tilde{O}(\delta^{-2})$, and the computation cost further improves to $\tilde{O}(\delta^{-2})$ when the loss function is smooth. Finally, we provide various numerical examples using both synthetic and real data to demonstrate its competitive performance and light computational speed.
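
To make the notion of entropic regularization concrete, the sketch below runs the classical Sinkhorn iterations for entropy-regularized optimal transport between two discrete distributions. It illustrates only the regularized transport problem that underlies the Sinkhorn distance, not the DRO dual reformulation or the first-order algorithm proposed in the paper; the function name `sinkhorn_plan` and the parameter values are assumptions.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, eps, n_iter=500):
    """Entropy-regularized transport plan between discrete marginals a and b.

    a: (n,) and b: (m,) probability vectors; cost: (n, m) cost matrix;
    eps: regularization strength (larger eps gives a more diffuse plan).
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(4)
xs = rng.uniform(size=(3, 1))
ys = rng.uniform(size=(4, 1))
cost = (xs - ys.T) ** 2              # squared distance, shape (3, 4)
a = np.full(3, 1.0 / 3.0)
b = np.full(4, 0.25)
plan = sinkhorn_plan(a, b, cost, eps=0.5)
regularized_cost = float((plan * cost).sum())
```

The returned plan has strictly positive entries, which is one way to see why a worst-case distribution built from such a plan is spread out rather than concentrated on a few points.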
