Smart Paper Notes

Wasserstein multivariate auto-regressive models for modeling distributional time series and its application in graph learning

Yiye Jiang

Categories: Machine Learning (Statistics) | Machine Learning

2022-07-12

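We propose a new autoregressive model for the statistical analysis of multivariate distributional time series. The data of interest consist of a collection of multiple series of probability measures supported on a bounded interval of the real line and indexed by distinct time instants. The probability measures are modeled as random objects in the Wasserstein space. We establish the autoregressive model in the tangent space at the Lebesgue measure by first centering all the raw measures so that their Fréchet means become the Lebesgue measure. Using the theory of iterated random function systems, we provide results on the existence, uniqueness, and stationarity of the solution of such a model. We also propose a consistent estimator of the model coefficients. In addition to the analysis of simulated data, the proposed model is illustrated with two real data sets made of age distributions in different countries and the bike-sharing network in Paris. Finally, because of the positivity and boundedness constraints that we impose on the model coefficients, the proposed estimator, which is learned under these constraints, naturally has a sparse structure. This sparsity allows the proposed model to be applied to learning a graph of temporal dependency from multivariate distributional time series.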
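The centering step above relies on the Fréchet mean in the Wasserstein space. For measures on the real line, the quantile function of the 2-Wasserstein Fréchet mean is the average of the individual quantile functions. The following is a minimal sketch of that fact only, not the paper's estimator; the function name `frechet_mean_quantiles`, the probability grid, and the toy samples are assumptions made for illustration.

```python
import numpy as np

def frechet_mean_quantiles(samples, grid):
    """Average the empirical quantile functions of several 1-D samples.

    samples: list of 1-D arrays, one per probability measure (toy data here).
    grid:    probabilities in (0, 1) at which quantiles are evaluated.
    Returns the quantile function of the 2-Wasserstein Frechet mean on `grid`.
    """
    quantiles = np.stack([np.quantile(np.asarray(s), grid) for s in samples])
    return quantiles.mean(axis=0)

grid = np.linspace(0.05, 0.95, 19)
mu_a = np.linspace(0.0, 1.0, 101)   # evenly spaced points on [0, 1]
mu_b = mu_a + 2.0                   # the same sample shifted by 2
mean_q = frechet_mean_quantiles([mu_a, mu_b], grid)
```

In this toy case the mean measure is the first sample shifted by 1, so `mean_q` should agree with `grid + 1.0` up to floating-point error.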

Translated by Google Translate

Functional Linear Regression of Cumulative Distribution Functions

Qian Zhang , Anuran Makur , Kamyar Azizzadenesheli

Categories: Machine Learning

2022-05-28

The estimation of cumulative distribution functions (CDFs) is an important learning task with a great variety of downstream applications, such as risk assessments in predictions and decision making. In this paper, we study functional regression of contextual CDFs where each data point is sampled from a linear combination of context dependent CDF basis functions. We propose functional ridge-regression-based estimation methods that estimate CDFs accurately everywhere. In particular, given $n$ samples with $d$ basis functions, we show estimation error upper bounds of $\widetilde{O}(\sqrt{d/n})$ for fixed design, random design, and adversarial context cases. We also derive matching information theoretic lower bounds, establishing minimax optimality for CDF functional regression. Furthermore, we remove the burn-in time in the random design setting using an alternative penalized estimator. Then, we consider agnostic settings where there is a mismatch in the data generation process. We characterize the error of the proposed estimators in terms of the mismatched error, and show that the estimators are well-behaved under model mismatch. Finally, to complete our study, we formalize infinite dimensional models where the parameter space is an infinite dimensional Hilbert space, and establish self-normalized estimation error upper bounds for this setting.


Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models

Yihan Zhang , Marco Mondelli , Ramji Venkataramanan

Categories: Machine Learning | Machine Learning (Statistics)

2022-11-21

In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators which output the top two eigenvectors of a suitable data-dependent matrix. However, despite the wide applicability, their design is still obtained via heuristic considerations, and the number of samples $n$ needed to guarantee recovery is super-linear in the signal dimension $d$. In this paper, we develop exact asymptotics on spectral methods in the challenging proportional regime in which $n, d$ grow large and their ratio converges to a finite constant. By doing so, we are able to optimize the design of the spectral method, and combine it with a simple linear estimator, in order to minimize the estimation error. Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms. Numerical simulations for mixed linear regression and phase retrieval display the advantage enabled by our analysis over existing designs of spectral methods.
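A spectral estimator of the kind described above can be sketched in a few lines: form a data-dependent matrix from the covariates and preprocessed observations, then return its top two eigenvectors. This is only an illustrative sketch. The function name `spectral_estimate`, the truncated-square preprocessing, and the toy mixed linear regression data are assumptions, and the preprocessing is a placeholder rather than the optimized design derived in the paper.

```python
import numpy as np

def spectral_estimate(A, y, preprocess, k=2):
    """Top-k eigenvectors of D = (1/n) * sum_i preprocess(y_i) * a_i a_i^T."""
    n, _ = A.shape
    w = preprocess(y)                        # one scalar weight per sample
    D = (A * w[:, None]).T @ A / n           # symmetric d x d matrix
    eigvals, eigvecs = np.linalg.eigh(D)     # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]           # columns for the k largest eigenvalues

# Toy mixed linear regression: each sample uses one of two signals, label unobserved.
rng = np.random.default_rng(0)
n, d = 2000, 20
A = rng.normal(size=(n, d))                  # Gaussian covariates
x1, x2 = rng.normal(size=d), rng.normal(size=d)
labels = rng.integers(0, 2, size=n)
signals = np.where(labels[:, None] == 0, x1, x2)
y = np.sum(A * signals, axis=1)

# Hypothetical preprocessing: a truncated square of the observation.
V = spectral_estimate(A, y, preprocess=lambda r: np.minimum(r ** 2, 10.0))
```

The columns of `V` are orthonormal directions in the signal space. How well they align with the two signals depends on the preprocessing and on the ratio of `n` to `d`, which is the question the paper's asymptotic analysis addresses.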


Nonlinear Sufficient Dimension Reduction for Distribution-on-Distribution Regression

Qi Zhang , Bing Li , Lingzhou Xue

Categories: Machine Learning (Statistics)

2022-07-11

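We introduce a new framework for nonlinear sufficient dimension reduction in which both the predictor and the response are distributional data, modeled as members of a metric space. The key step in achieving nonlinear sufficient dimension reduction is to build universal kernels on the metric spaces, which yields reproducing kernel Hilbert spaces for the predictor and the response that are rich enough to characterize the conditional independence that determines sufficient dimension reduction. For univariate distributions, we use the well-known quantile representation of the Wasserstein distance to construct the universal kernel. For multivariate distributions, we resort to the recently developed sliced Wasserstein distance for this purpose. Since the sliced Wasserstein distance can be computed through the quantile representation of the univariate Wasserstein distance, the computation for multivariate distributions is kept at a manageable level. The method is applied to several data sets, including fertility and mortality distribution data and Calgary temperature data.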
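The quantile representation mentioned above states that, for distributions on the real line, the squared 2-Wasserstein distance equals the integral over (0, 1) of the squared difference of the two quantile functions. A minimal sketch of that computation from samples follows; the function name `wasserstein2_1d`, the midpoint grid, and the toy data are assumptions for illustration, and no kernel construction from the paper is reproduced here.

```python
import numpy as np

def wasserstein2_1d(x, y, n_grid=200):
    """Approximate W2 between two 1-D samples via their empirical quantile functions."""
    p = (np.arange(n_grid) + 0.5) / n_grid        # midpoint grid on (0, 1)
    qx = np.quantile(np.asarray(x), p)
    qy = np.quantile(np.asarray(y), p)
    return float(np.sqrt(np.mean((qx - qy) ** 2)))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
shift = 1.5
d_shift = wasserstein2_1d(x, x + shift)   # a pure shift moves every quantile by `shift`
d_self = wasserstein2_1d(x, x)            # distance from a sample to itself
```

A pure shift changes every quantile by the same amount, so `d_shift` should equal the shift size up to rounding, and `d_self` is zero.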


Projected Statistical Methods for Distributional Data on the Real Line with the Wasserstein Metric

Matteo Pegoraro , Mario Beraha

Categories: Machine Learning (Statistics)

2021-01-22

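We present a novel class of projected methods for the statistical analysis of data sets of probability distributions on the real line, equipped with the 2-Wasserstein metric. We focus in particular on principal component analysis (PCA) and regression. To define these models, we exploit a representation of the Wasserstein space closely related to its weak Riemannian structure, mapping the data to a suitable linear space and using a metric projection operator to constrain the results to the Wasserstein space. By carefully choosing the tangent point, we are able to derive fast empirical methods that exploit a constrained B-spline approximation. As a byproduct of our approach, we also obtain faster routines for previous work on PCA for distributions. Through simulation studies, we compare our approach with previously proposed methods, showing that our projected PCA has similar performance and is extremely flexible even under misspecification. Several theoretical properties of the models are investigated, and asymptotic consistency is proven. Two real-world applications are discussed: Covid-19 mortality in the United States and wind speed forecasting.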


An Interpretable and Efficient Infinite-Order Vector Autoregressive Model for High-Dimensional Time Series

Yao Zheng , Shibo Li

Categories: Machine Learning (Statistics)

2022-09-02

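As a special infinite-order vector autoregressive (VAR) model, the vector autoregressive moving average (VARMA) model can capture much richer temporal patterns than the widely used finite-order VAR model. However, its practicality has long been hindered by its non-identifiability, computational intractability, and relative difficulty of interpretation. This paper introduces a novel infinite-order VAR model that not only avoids the drawbacks of the VARMA model but also inherits its favorable temporal patterns. As another attractive feature, the temporal and cross-sectional dependence structures of this model can be interpreted separately, since they are characterized by different sets of parameters. For high-dimensional time series, this separation motivates us to impose sparsity on the parameters that determine the cross-sectional dependence. As a result, greater statistical efficiency and interpretability can be achieved without sacrificing any temporal information. We introduce an $\ell_1$-regularized estimator for the proposed model and derive the corresponding non-asymptotic error bounds. An efficient block coordinate descent algorithm and a consistent model order selection method are developed. The merits of the proposed approach are supported by simulation studies and a real-world macroeconomic data analysis.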


Time-uniform central limit theory, asymptotic confidence sequences, and anytime-valid causal inference

Ian Waudby-Smith , David Arbour , Ritwik Sinha , Edward H. Kennedy , Aaditya Ramdas

Categories: Machine Learning (Statistics)

2021-03-11

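Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under very weak assumptions and can often be applied to problems even when non-asymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals. To elaborate, our methods take the form of confidence sequences (CSs), that is, sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times and incur no penalty for "peeking" at the data, unlike classical confidence intervals, which require the sample size to be fixed in advance. Existing CSs in the literature are non-asymptotic and hence do not enjoy the broad applicability of asymptotic confidence intervals described above. Our work bridges this gap by giving a definition of "asymptotic CSs" and deriving a universal asymptotic CS that requires only CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles (stemming from the seminal 1970s work of Komlos, Major, and Tusnady) to uniformly approximate the entire sample average process by an implicit Gaussian process. We demonstrate their utility by deriving nonparametric asymptotic CSs based on doubly robust estimators in observational studies, for which no non-asymptotic method can exist even in the fixed-time regime (due to confounding bias). These enable doubly robust causal inference that can be continuously monitored and adaptively stopped.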


Stochastic Subgradient Descent Escapes Active Strict Saddles

Pascal Bianchi , Walid Hachem , Sholom Schechtman

Categories: Machine Learning (Statistics)

2021-08-04

In non-smooth stochastic optimization, we establish the non-convergence of the stochastic subgradient descent (SGD) to the critical points recently called active strict saddles by Davis and Drusvyatskiy. Such points lie on a manifold $M$ where the function $f$ has a direction of second-order negative curvature. Off this manifold, the norm of the Clarke subdifferential of $f$ is lower-bounded. We require two conditions on $f$. The first assumption is a Verdier stratification condition, which is a refinement of the popular Whitney stratification. It allows us to establish a reinforced version of the projection formula of Bolte et al. for Whitney stratifiable functions, which is of independent interest. The second assumption, termed the angle condition, allows us to control the distance of the iterates to $M$. When $f$ is weakly convex, our assumptions are generic. Consequently, generically in the class of definable weakly convex functions, the SGD converges to a local minimizer.


Debiased Machine Learning of Set-Identified Linear Models

Vira Semenova

Categories: Machine Learning (Statistics) | Machine Learning

2017-12-28

This paper provides estimation and inference methods for an identified set's boundary (i.e., support function) where the selection among a very large number of covariates is based on modern regularized tools. I characterize the boundary using a semiparametric moment equation. Combining Neyman-orthogonality and sample splitting ideas, I construct a root-N consistent, uniformly asymptotically Gaussian estimator of the boundary and propose a multiplier bootstrap procedure to conduct inference. I apply this result to the partially linear model, the partially linear IV model and the average partial derivative with an interval-valued outcome.


Unsupervised learning of observation functions in state-space models by nonparametric moment methods

Qingci An , Yannis Kevrekidis , Fei Lu , Mauro Maggioni

Categories: Machine Learning (Statistics) | Machine Learning

2022-07-12

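We investigate the unsupervised learning of non-invertible observation functions in nonlinear state-space models. Assuming abundant data from the observation process together with the distribution of the state process, we introduce a nonparametric generalized moment method to estimate the observation function via constrained regression. The main challenge comes from the non-invertibility of the observation function and the lack of data pairs between the state and the observation. We address the fundamental issue of identifiability from quadratic loss functionals and show that the function space of identifiability is the closure of an RKHS intrinsic to the state process. Numerical results show that the first two moments and temporal correlations, along with upper and lower bounds, can identify functions ranging from piecewise polynomials to smooth functions, leading to convergent estimators. The limitations of the method, such as non-identifiability due to symmetry and stationarity, are also discussed.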


Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms

Milad Sefidgaran , Amin Gohari , Gaël Richard , Umut Şimşekli

Categories: Machine Learning (Statistics) | Machine Learning

2022-03-04

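Understanding generalization in modern machine learning settings has been one of the major challenges in statistical learning theory. In this context, recent years have witnessed the development of various generalization bounds suggesting different complexity notions, such as the mutual information between the data sample and the algorithm output, the compressibility of the hypothesis space, and the fractal dimension of the hypothesis space. While these bounds have illuminated the problem at hand from different angles, the complexity notions they suggest may appear unrelated, which limits their high-level impact. In this study, we prove novel generalization bounds through the lens of rate-distortion theory and explicitly relate the concepts of mutual information, compressibility, and fractal dimension. Our approach consists of (i) defining a generalized notion of compressibility by using source coding concepts, and (ii) showing that the "compression error rate" can be linked to the generalization error both in expectation and with high probability. We show that in the "lossless compression" setting we recover and improve existing mutual-information-based bounds, whereas a "lossy compression" scheme allows us to link generalization to the rate-distortion dimension, a particular notion of fractal dimension. Our results bring a more unified perspective on generalization and open up several future research directions.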


The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks

Takuo Matsubara , Chris J. Oates , François-Xavier Briol

Categories: Machine Learning (Statistics) | Machine Learning

2020-10-16

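Bayesian neural networks attempt to combine the strong predictive performance of neural networks with a formal quantification of the uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is proposed that enables the user to posit an appropriate Gaussian process covariance function for the task at hand. Our approach constructs a prior distribution for the parameters of the network, called a ridgelet prior, that approximates the posited Gaussian process in the output space of the network. In contrast to existing work on the connection between neural networks and Gaussian processes, our analysis is non-asymptotic and provides finite-sample-size error bounds. This establishes that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular. Our experimental assessment is limited to a proof of concept, in which we demonstrate that the ridgelet prior can outperform an unstructured prior on regression problems for which a suitable Gaussian process prior can be provided.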


On Variance Estimation of Random Forests

Tianning Xu , Ruoqing Zhu , Xiaofeng Shao

Categories: Machine Learning (Statistics) | Machine Learning

2022-02-18

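Ensemble methods, such as random forests, are popular in applications because of their high predictive accuracy. The existing literature views a random forest prediction as an infinite-order incomplete U-statistic in order to quantify its uncertainty. However, these methods focus on a small subsampling size for each tree, which is theoretically valid but practically limited. This paper develops an unbiased variance estimator based on incomplete U-statistics, which allows the tree size to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimators enjoy lower bias and more accurate coverage rates without additional computational cost. We also propose a local smoothing procedure to reduce the variation of our estimator, which shows improved numerical performance when the number of trees is relatively small. In addition, we investigate the ratio consistency of the proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.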


Tensor-on-Tensor Regression: Riemannian Optimization, Over-parameterization, Statistical-computational Gap, and Their Interplay

Yuetian Luo , Anru R. Zhang

Categories: Machine Learning | Machine Learning (Statistics)

2022-06-17

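We study tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates through a parameter tensor/matrix of low Tucker rank, without prior knowledge of its intrinsic rank. We propose Riemannian gradient descent (RGD) and Riemannian Gauss-Newton (RGN) methods and cope with the challenge of unknown rank by studying the effect of rank over-parameterization. We provide the first convergence guarantee for general tensor-on-tensor regression by showing that RGD and RGN converge linearly and quadratically, respectively, to a statistically optimal estimate in both the correctly specified and the over-parameterized rank settings. Our theory reveals an intriguing phenomenon: Riemannian optimization methods naturally adapt to over-parameterization without any modification to their implementation. We also provide the first rigorous evidence for the statistical-computational gap in scalar-on-tensor regression under the low-degree polynomials framework. Our theory demonstrates a "blessing of statistical-computational gap" phenomenon: in a wide range of scenarios in tensor-on-tensor regression for tensors of order three or higher, the computationally required sample size matches what is needed under moderate rank over-parameterization when considering computationally feasible estimators, while there is no such benefit in the matrix setting. This shows that moderate rank over-parameterization is essentially "cost-free" in terms of sample size in tensor-on-tensor regression of order three or higher. Finally, we conduct simulation studies to show the advantages of our proposed methods and to corroborate our theoretical findings.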


Risk Measures and Upper Probabilities: Coherence and Stratification

Christian Fröhlich , Robert C. Williamson

Categories: Machine Learning

2022-06-07

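Machine learning typically presupposes classical probability theory, which implies that aggregation is built upon expectation. There are now multiple reasons to look at richer alternatives to classical probability theory as a mathematical foundation for machine learning. We systematically examine a powerful and rich class of such alternatives, known variously as spectral risk measures, Choquet integrals, or Lorentz norms. We present a range of characterization results and demonstrate what makes this spectral family so special. In doing so, we demonstrate a natural stratification of all coherent risk measures in terms of the upper probabilities that they induce, by exploiting results from the theory of rearrangement-invariant Banach spaces. We empirically demonstrate how this new approach to uncertainty helps tackle practical machine learning problems.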
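For a finite sample of losses, a spectral risk measure replaces the plain average by a weighted average of the sorted losses, with nondecreasing weights so that larger losses count at least as much as smaller ones. The sketch below illustrates this discrete form only; the function names `spectral_risk` and `cvar_weights` and the toy losses are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def spectral_risk(losses, weights):
    """Weighted average of sorted losses with nonnegative, nondecreasing weights."""
    losses = np.sort(np.asarray(losses, dtype=float))
    weights = np.asarray(weights, dtype=float)
    if losses.shape != weights.shape:
        raise ValueError("need exactly one weight per loss")
    if np.any(weights < 0) or np.any(np.diff(weights) < 0):
        raise ValueError("weights must be nonnegative and nondecreasing")
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return float(losses @ weights)

def cvar_weights(n, k):
    """Spectrum that averages the k largest of n losses (a CVaR-type choice)."""
    w = np.zeros(n)
    w[n - k:] = 1.0 / k
    return w

losses = [3.0, 1.0, 4.0, 1.0, 5.0]
expectation = spectral_risk(losses, np.full(5, 0.2))       # uniform spectrum
tail_average = spectral_risk(losses, cvar_weights(5, 2))   # mean of the two largest losses
```

With a uniform spectrum the measure reduces to the ordinary expectation (2.8 for these losses), while the tail spectrum averages the two worst losses (4.5), which shows how the choice of weights moves the aggregation away from the mean.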


On High dimensional Poisson models with measurement error: hypothesis testing for nonlinear nonconvex optimization

Fei Jiang , Yeqing Zhou , Jianxuan Liu , Yanyuan Ma

Categories: Machine Learning

2022-12-31

We study estimation and testing in the Poisson regression model with noisy high dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a non-convex target function to minimize. Handling the high dimensionality further leads us to add an amenable penalty term to the target function. We propose to estimate the regression parameter by minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove the variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permits testing of linear functions of the members of the subset. We examine the finite sample performance of the proposed tests by extensive simulation. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which motivated this work initially.


Optimal high-dimensional and nonparametric distributed testing under communication constraints

Botond Szabó , Lasse Vuursteen , Harry van Zanten

Categories: Machine Learning (Statistics)

2022-02-02

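We derive minimax testing errors in a distributed framework where the data are split over multiple machines and their communication with a central machine is limited to $b$ bits. We investigate both the $d$-dimensional and the infinite-dimensional signal detection problem under Gaussian white noise. We also derive distributed testing algorithms that attain the theoretical lower bounds. Our results show that distributed testing is subject to fundamentally different phenomena that are not observed in distributed estimation. Among our findings, we show that testing protocols with access to shared randomness can perform strictly better in some regimes than those without. We also observe that consistent nonparametric distributed testing is always possible, even with as little as $1$ bit of communication, and that the corresponding test outperforms the best local test that uses only the information available at a single local machine. Furthermore, we derive adaptive nonparametric distributed testing strategies and the corresponding theoretical lower bounds.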


Double Robust Bayesian Inference on Average Treatment Effects

Christoph Breunig , Ruixuan Liu , Zhengfei Yu

Categories: Machine Learning (Statistics)

2022-11-29

We study a double robust Bayesian inference procedure on the average treatment effect (ATE) under unconfoundedness. Our Bayesian approach involves a correction term for prior distributions adjusted by the propensity score. We prove asymptotic equivalence of our Bayesian estimator and efficient frequentist estimators by establishing a new semiparametric Bernstein-von Mises theorem under double robustness; i.e., the lack of smoothness of conditional mean functions can be compensated by high regularity of the propensity score and vice versa. Consequently, the resulting Bayesian point estimator internalizes the bias correction as the frequentist-type doubly robust estimator, and the Bayesian credible sets form confidence intervals with asymptotically exact coverage probability. In simulations, we find that this corrected Bayesian procedure leads to significant bias reduction of point estimation and accurate coverage of confidence intervals, especially when the dimensionality of covariates is large relative to the sample size and the underlying functions become complex. We illustrate our method in an application to the National Supported Work Demonstration.
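The abstract compares the Bayesian procedure with the frequentist-type doubly robust estimator without writing it down. A common instance is the augmented inverse-probability-weighted (AIPW) estimator, sketched below under that assumption. The function name `aipw_ate`, the argument names, and the toy data are hypothetical, and the paper's Bayesian correction term is not reproduced.

```python
import numpy as np

def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """AIPW estimate of the average treatment effect.

    y: outcomes, t: binary treatment indicators, e_hat: estimated propensity scores,
    mu1_hat / mu0_hat: estimated conditional means under treatment / control.
    """
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    e_hat = np.asarray(e_hat, dtype=float)
    mu1_hat = np.asarray(mu1_hat, dtype=float)
    mu0_hat = np.asarray(mu0_hat, dtype=float)
    treated = mu1_hat + t * (y - mu1_hat) / e_hat
    control = mu0_hat + (1.0 - t) * (y - mu0_hat) / (1.0 - e_hat)
    return float(np.mean(treated - control))

# Toy data: the outcome regressions are exact, the propensity score is deliberately wrong.
t = np.array([1, 0, 1, 0, 1, 0])
mu0 = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
mu1 = mu0 + 2.0                          # true treatment effect is 2 for every unit
y = np.where(t == 1, mu1, mu0)
wrong_e = np.full(6, 0.9)
ate = aipw_ate(y, t, wrong_e, mu1, mu0)
```

Because the outcome regressions are exact in this toy example, the residual terms vanish and the estimate equals 2 even though the propensity score is misspecified, which is one half of the double robustness property.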


Entropy Regularized Optimal Transport Independence Criterion

Lang Liu , Soumik Pal , Zaid Harchaoui

Categories: Machine Learning (Statistics) | Machine Learning

2021-12-31

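Optimal transport (OT) and its entropy-regularized offspring have recently gained a lot of attention in both the machine learning and AI domains. In particular, optimal transport has been used to develop probability metrics between probability distributions. In this paper we introduce an independence criterion based on entropy-regularized optimal transport. Our criterion can be used to test for independence between two samples. We establish non-asymptotic bounds for our test statistic and study its statistical behavior under both the null and the alternative hypothesis. Our theoretical results involve tools from U-process theory and optimal transport theory. We present experimental results on existing benchmarks, illustrating the interest of the proposed criterion.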
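Entropy-regularized optimal transport is usually computed with Sinkhorn iterations, which alternately rescale the rows and columns of a Gibbs kernel until the transport plan has the prescribed marginals. The sketch below shows only this building block. The abstract does not spell out the criterion itself, so nothing here should be read as the authors' test statistic; the function name `sinkhorn_plan`, the regularization strength, and the toy histograms are assumptions.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, eps=0.5, n_iter=500):
    """Entropy-regularized OT plan between histograms a and b via Sinkhorn iterations."""
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                  # rescale rows toward marginal a
        v = b / (K.T @ u)                # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 4)
cost = (x[:, None] - y[None, :]) ** 2    # squared-distance cost between support points
a = np.full(5, 1 / 5)
b = np.full(4, 1 / 4)
plan = sinkhorn_plan(a, b, cost)
transport_cost = float(np.sum(plan * cost))
```

An independence criterion in this spirit would compare the empirical joint distribution of two samples with the product of their marginals through such a regularized transport cost, but the exact form used in the paper is not given in the abstract.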


Uncertainty Quantification of the 4th kind; optimal posterior accuracy-uncertainty tradeoff with the minimum enclosing ball

Hamed Hamze Bajgiran , Pau Batlle Franch , Houman Owhadi , Mostafa Samir , Clint Scovel , Mahdy Shirdel , Michael Stanley , Peyman Tavallali

Categories: Machine Learning

2021-08-24

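There are essentially three kinds of approaches to uncertainty quantification (UQ): (a) robust optimization, (b) Bayesian, and (c) decision-theoretic. Although (a) is robust, it is unfavorable with respect to accuracy and data assimilation. (b) requires a prior, is generally brittle, and its posterior estimation can be slow. Although (c) leads to the identification of an optimal prior, its approximation suffers from the curse of dimensionality, and its notion of risk is averaged with respect to the distribution of the data. We introduce a fourth kind, which is a hybrid between (a), (b), (c), and hypothesis testing. It can be summarized as follows: after observing a sample $x$, (1) define a likelihood region through the relative likelihood, and (2) play a minmax game in that region to define optimal estimators and their risk. The resulting method has several desirable properties: (a) an optimal prior is identified after measuring the data, and the notion of risk is a posterior one; (b) the determination of the optimal estimate and its risk can be reduced to computing the minimum enclosing ball of the image of the likelihood region under the quantity-of-interest map (which is fast and not subject to the curse of dimensionality). The method is characterized by a parameter in $[0,1]$ that acts as an assumed lower bound on the rarity of the observed data (the relative likelihood). When that parameter is near $1$, the method produces a posterior distribution concentrated around a maximum likelihood estimate, with low-confidence UQ estimates. When that parameter is near $0$, the method produces a maximal-risk posterior distribution with high-confidence UQ estimates. In addition to navigating the accuracy-uncertainty tradeoff, the proposed method addresses the brittleness of Bayesian inference by navigating the robustness-accuracy tradeoff associated with data assimilation.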
