Smart Paper Notes

CovNet: Covariance Networks for Functional Data on Multidimensional Domains

Soham Sarkar, Victor M. Panaretos

Category: Machine Learning (stat)

2021-04-11

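Covariance estimation is ubiquitous in functional data analysis. Yet the case of functional observations over multidimensional domains introduces computational and statistical challenges that render the standard methods effectively inapplicable. To address this problem, we introduce Covariance Networks (CovNet) as a modeling and estimation tool. The CovNet model is "universal": it can be used to approximate any covariance up to a desired precision. Moreover, the model can be fitted efficiently to the data, and its neural network architecture allows us to employ modern computational tools in the implementation. The CovNet model also admits a closed-form eigendecomposition, which can be computed efficiently without constructing the covariance itself. This facilitates easy storage and subsequent manipulation of a covariance in the context of CovNet. We establish the consistency of the proposed estimator and derive its rate of convergence. The usefulness of the proposed method is demonstrated through an extensive simulation study and an application to resting-state fMRI data.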

translated by Google Translate
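
A minimal sketch of why a shallow-network covariance of this general kind is always a valid covariance. It assumes the form c(u, v) = sum_r lambda_r * relu(w_r . u + b_r) * relu(w_r . v + b_r) with lambda_r >= 0 on a 2-D domain; the names `covnet_kernel`, `weights`, `biases`, `lambdas` and all parameter values are hypothetical illustrations, not the paper's fitted model or its exact parameterization. On any grid the resulting matrix equals Phi^T Lambda Phi for an R x m feature matrix Phi, so it is symmetric, positive semidefinite, and of rank at most R, which is what makes compact storage plausible.

```python
import random

def relu(t):
    return max(t, 0.0)

def covnet_kernel(u, v, weights, biases, lambdas):
    """Hypothetical shallow-network covariance:
    c(u, v) = sum_r lambda_r * relu(w_r . u + b_r) * relu(w_r . v + b_r)."""
    total = 0.0
    for w, b, lam in zip(weights, biases, lambdas):
        fu = relu(sum(wi * ui for wi, ui in zip(w, u)) + b)
        fv = relu(sum(wi * vi for wi, vi in zip(w, v)) + b)
        total += lam * fu * fv
    return total

def covariance_matrix(points, weights, biases, lambdas):
    # Evaluate the kernel on a finite set of domain points.
    return [[covnet_kernel(p, q, weights, biases, lambdas) for q in points] for p in points]

rng = random.Random(0)
R = 5  # number of hidden units (illustrative)
weights = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(R)]
biases = [rng.gauss(0, 1) for _ in range(R)]
lambdas = [abs(rng.gauss(0, 1)) for _ in range(R)]  # nonnegative scales keep C PSD

grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]  # 25 points in [0, 1]^2
C = covariance_matrix(grid, weights, biases, lambdas)

# a^T C a for a random vector a: nonnegative up to rounding error.
a = [rng.gauss(0, 1) for _ in grid]
quad = sum(a[i] * C[i][j] * a[j] for i in range(len(grid)) for j in range(len(grid)))
```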

Nonparametric regression using deep neural networks with ReLU activation function

Johannes Schmidt-Hieber

Category:

2017-08-22

Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$ factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with number of potential network parameters exceeding the sample size. The analysis gives some insights into why multilayer feedforward neural networks perform well in practice. Interestingly, for ReLU activation function the depth (number of layers) of the neural network architectures plays an important role and our theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural. It is also shown that under the composition assumption wavelet estimators can only achieve suboptimal rates.
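
A minimal sketch of the objects in this setting, assuming weights stored as plain Python lists: a deep ReLU network is a chain of affine maps with ReLU between them, and its sparsity s is the number of nonzero parameters. The two hand-built networks are illustrative only (they are not from the paper); the second composes |x| with t -> |t - 1|, echoing the composition structure assumed on the regression function.

```python
def relu(h):
    return [max(t, 0.0) for t in h]

def forward(x, layers):
    """Deep ReLU network: each layer is (W, b); ReLU is applied after every
    layer except the last, which is a linear output layer."""
    h = list(x)
    for k, (W, b) in enumerate(layers):
        h = [sum(wij * hj for wij, hj in zip(row, h)) + bi for row, bi in zip(W, b)]
        if k < len(layers) - 1:
            h = relu(h)
    return h

def sparsity(layers):
    # Number of nonzero weights and biases (the sparsity parameter s).
    n_w = sum(1 for W, _ in layers for row in W for w in row if w != 0)
    n_b = sum(1 for _, b in layers for bi in b if bi != 0)
    return n_w + n_b

# |x| = relu(x) + relu(-x): width 2, one hidden layer, 4 nonzero parameters.
abs_net = [([[1.0], [-1.0]], [0.0, 0.0]),
           ([[1.0, 1.0]], [0.0])]

# ||x| - 1| = relu(|x| - 1) + relu(1 - |x|): a deeper composition.
deep_net = [([[1.0], [-1.0]], [0.0, 0.0]),
            ([[1.0, 1.0], [-1.0, -1.0]], [-1.0, 1.0]),
            ([[1.0, 1.0]], [0.0])]
```

For example, `forward([0.25], deep_net)` evaluates ||0.25| - 1| = 0.75 using 10 nonzero parameters.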

Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach

Nathan Kallus, Xiaojie Mao, Masatoshi Uehara

Category: Machine Learning (stat) | Machine Learning

2021-03-25

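We study the estimation of causal parameters when not all confounders are observed and negative controls are available instead. Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions. In this paper, we tackle the primary challenge of causal inference with negative controls: the identification and estimation of these bridge functions. Previous work has relied on completeness conditions on these functions to identify the causal parameters, has required uniqueness assumptions in estimation, and has focused on parametric estimation of the bridge functions. Instead, we provide a new identification strategy that avoids the completeness condition. Moreover, we provide new estimators for these functions based on minimax learning formulations. These estimators accommodate general function classes such as reproducing kernel Hilbert spaces and neural networks. We study finite-sample convergence results both for estimating the bridge functions themselves and for the final estimation of the causal parameter, under a variety of combinations of assumptions. We avoid uniqueness conditions on the bridge functions as much as possible.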
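
To illustrate the minimax learning idea in the simplest possible case, the sketch below assumes a scalar negative-control outcome W, a scalar negative-control exposure Z, and linear classes h(w) = beta * w and f(z) = gamma * z. These are hypothetical simplifications, not the paper's RKHS or neural-network estimators. For the objective max over gamma of gamma * E_n[(Y - beta W) Z] - gamma^2 * E_n[Z^2], the inner maximum is E_n[(Y - beta W) Z]^2 / (4 E_n[Z^2]), which is minimized at the IV-type solution beta = E_n[Y Z] / E_n[W Z].

```python
def mean(xs):
    return sum(xs) / len(xs)

def inner_max(beta, y, w, z):
    """Closed-form inner maximum over gamma of
    gamma * E_n[(Y - beta W) Z] - gamma^2 * E_n[Z^2]."""
    m = mean([(yi - beta * wi) * zi for yi, wi, zi in zip(y, w, z)])
    return m * m / (4.0 * mean([zi * zi for zi in z]))

def minimax_bridge(y, w, z):
    # Minimizer of inner_max over beta (IV-type closed form).
    return mean([yi * zi for yi, zi in zip(y, z)]) / mean([wi * zi for wi, zi in zip(w, z)])

# Toy data (hypothetical): Y = 2 W + noise, with noise uncorrelated with Z
# in the sample but correlated with W.
z = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5]
w = [1.5, -0.5, 2.5, -1.5, 1.0, 0.0]
noise = [0.3, 0.3, -0.3, -0.3, 0.2, 0.2]
y = [2 * wi + ei for wi, ei in zip(w, noise)]

beta_hat = minimax_bridge(y, w, z)  # recovers the coefficient 2
```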

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett

Category: Machine Learning (stat)

2022-09-26

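The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that, in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted-norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.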

Neural Operator: Learning Maps Between Function Spaces

Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar

Category: Machine Learning

2021-08-19

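The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces or finite sets. We propose a generalization of neural networks to learn operators that map between infinite-dimensional function spaces. We formulate the approximation of operators as a composition of a class of linear integral operators and nonlinear activation functions, so that the composed operator can approximate complex nonlinear operators. We prove a universal approximation theorem for our construction. Furthermore, we introduce four classes of operator parameterizations: graph-based operators, low-rank operators, multipole graph-based operators, and Fourier operators, and we describe efficient algorithms for computing with each one. The proposed neural operators are resolution-invariant: they share the same network parameters between different discretizations of the underlying function spaces and can be used for zero-shot super-resolution. Numerically, the proposed models show superior performance compared to existing machine-learning-based methodologies on Darcy flow and the Navier-Stokes equation, while being several orders of magnitude faster than conventional PDE solvers.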
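
A minimal single-channel sketch of a Fourier-type layer, v -> relu(w * v + F^-1(R * F v)), assuming a 1-D periodic grid, a scalar pointwise weight `w`, a hypothetical list `R` of complex weights for the lowest modes, and a naive O(n^2) DFT in place of an FFT. Because `R` is indexed by frequency rather than by grid point, the same parameters apply on grids of different resolution; for a band-limited input the two outputs below agree at the shared grid points, which is one concrete reading of the resolution invariance described above.

```python
import cmath
import math

def dft(v):
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) for k in range(n)]

def idft(c):
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n for j in range(n)]

def spectral_conv(v, R):
    """Multiply the lowest len(R) Fourier modes by complex weights R and drop the
    rest. Mirrored (negative) frequencies get conj(R) so the output stays real;
    assumes len(R) < len(v) / 2 and a real-valued R[0]."""
    n = len(v)
    c = dft(v)
    out = [0j] * n
    for k, r in enumerate(R):
        out[k] = r * c[k]
        if k > 0:
            out[n - k] = r.conjugate() * c[n - k]
    return [z.real for z in idft(out)]

def fourier_layer(v, w, R):
    # relu(w * v(x) + (K v)(x)), with K the truncated spectral convolution.
    s = spectral_conv(v, R)
    return [max(w * vj + sj, 0.0) for vj, sj in zip(v, s)]

w = 0.7                                            # hypothetical pointwise weight
R = [0.5 + 0j, 1 - 0.5j, 0.25 + 0.25j, -0.3 + 0j]  # hypothetical mode weights

def f(x):
    return math.sin(2 * math.pi * x) + 0.5 * math.cos(4 * math.pi * x)

# The same parameters applied to two discretizations of the same input function.
v16 = [f(j / 16) for j in range(16)]
out16 = fourier_layer(v16, w, R)
out32 = fourier_layer([f(j / 32) for j in range(32)], w, R)
```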

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

Hao Liu, Haizhao Yang, Minshuo Chen, Tuo Zhao, Wenjing Liao

Category: Machine Learning (stat) | Machine Learning

2022-01-01

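Learning operators between infinite-dimensional spaces is an important learning task arising in a wide range of applications, including machine learning, imaging science, mathematical modeling and simulation. This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks. Non-asymptotic upper bounds are derived for the generalization error of the empirical risk minimizer over a properly chosen network class. Under the assumption that the target operator exhibits a low-dimensional structure, our error bounds decay as the training sample size increases, with an attractive fast rate that depends on the intrinsic dimension in our estimation. Our assumptions cover most scenarios in real applications, and our results give rise to fast rates by exploiting low-dimensional structures in operator estimation. We also investigate the influence of network structures (e.g., network width, depth, and sparsity) on the generalization error of the neural network estimator, and propose general guidance on the choice of network structures to quantitatively maximize learning efficiency.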

On the Universality of the Double Descent Peak in Ridgeless Regression

David Holzmüller

Category: Machine Learning (stat) | Machine Learning | Neural and Evolutionary Computing

2020-10-05

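We prove a non-asymptotic, distribution-independent lower bound for the expected mean squared generalization error caused by label noise in ridgeless linear regression. Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime. In contrast to most previous works, our analysis applies to a broad class of input distributions with almost surely full-rank feature matrices, which allows us to cover various types of deterministic or random feature maps. Our lower bound is asymptotically sharp and implies that, in the presence of label noise, ridgeless linear regression does not perform well around the interpolation threshold for any of these feature maps. We analyze the imposed assumptions in detail and provide a theory for analytic (random) feature maps. Using this theory, we show that our assumptions are satisfied for input distributions with a (Lebesgue) density and for feature maps given by random deep neural networks with analytic activation functions such as sigmoid, tanh, softplus or GELU. As further examples, we show that feature maps from random Fourier features and polynomial kernels also satisfy our assumptions. We complement our theory with further experimental and analytic results.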
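
A small simulation sketch of the phenomenon the lower bound describes, under assumptions that are ours rather than the paper's: isotropic Gaussian features, a true coefficient vector of zero, and unit-variance label noise. Under these assumptions the excess test error of the ridgeless (minimum-norm least-squares) fit equals ||beta_hat||^2, and its average is largest when the number of features p equals the sample size n (the interpolation threshold).

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridgeless_fit(X, y):
    """Minimum-norm least-squares solution (the ridge solution as lambda -> 0)."""
    n, p = len(X), len(X[0])
    if p >= n:  # interpolating regime: beta = X^T (X X^T)^{-1} y
        gram = [[sum(u * v for u, v in zip(xi, xj)) for xj in X] for xi in X]
        alpha = solve(gram, y)
        return [sum(alpha[i] * X[i][k] for i in range(n)) for k in range(p)]
    xtx = [[sum(X[i][k] * X[i][l] for i in range(n)) for l in range(p)] for k in range(p)]
    xty = [sum(X[i][k] * y[i] for i in range(n)) for k in range(p)]
    return solve(xtx, xty)

def noise_error(n, p, trials, rng):
    # Average ||beta_hat||^2 when the labels are pure N(0, 1) noise.
    total = 0.0
    for _ in range(trials):
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        total += sum(b * b for b in ridgeless_fit(X, y))
    return total / trials

rng = random.Random(1)
n = 10
errors = {p: noise_error(n, p, 200, rng) for p in (5, n, 2 * n)}
```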

How do noise tails impact on deep ReLU networks?

Jianqing Fan, Yihong Gu, Wen-Xin Zhou

Category: Machine Learning (stat)

2022-03-20

This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by the ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: constructing a deep ReLU network estimator that has a better empirical loss than the true function and the difference between these two functions furnishes a lower bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.
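
The Huber loss behind these estimators can be sketched as follows. The clipping level `tau` plays the role of the robustification parameter that, per the abstract, should adapt to sample size, smoothness and moment parameters; this sketch simply takes `tau` as an input and, as a deliberate simplification, estimates a location parameter by gradient steps instead of fitting a ReLU network.

```python
def huber_loss(r, tau):
    """Huber loss: quadratic for |r| <= tau, linear beyond (robust to heavy tails)."""
    a = abs(r)
    return 0.5 * r * r if a <= tau else tau * a - 0.5 * tau * tau

def huber_psi(r, tau):
    # Derivative of the Huber loss: the residual clipped to [-tau, tau].
    return max(-tau, min(tau, r))

def huber_location(ys, tau, iters=200, step=1.0):
    """Huber M-estimate of location by gradient steps (illustrative)."""
    theta = sorted(ys)[len(ys) // 2]  # start at the (upper) median
    for _ in range(iters):
        theta += step * sum(huber_psi(y - theta, tau) for y in ys) / len(ys)
    return theta

# One gross outlier: the sample mean is 20, the Huber estimate stays near 0.
ys = [0.0, 0.0, 0.0, 0.0, 100.0]
theta_hat = huber_location(ys, tau=1.0)
```

Here `theta_hat` solves 4 * psi(-theta) + psi(100 - theta) = 0, i.e. theta = 0.25, because the outlier's influence is capped at `tau`.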

Quasi-Bayesian Dual Instrumental Variable Regression

Ziyu Wang, Yuhao Zhou, Tongzheng Ren, Jun Zhu

Category: Machine Learning (stat) | Machine Learning

2021-06-16

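Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work, we present a novel quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models and the dual/minimax formulation of IV regression. We analyze the frequentist behavior of the proposed method by establishing minimax optimal contraction rates in $L_2$ and Sobolev norms, and by discussing the frequentist validity of credible balls. We further derive a scalable inference algorithm that can be extended to work with wide neural network models. Empirical evaluation shows that our method produces informative uncertainty estimates on complex high-dimensional problems.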

Deep learning architectures for nonlinear operator functions and nonlinear inverse problems

Maarten V. de Hoop, Matti Lassas, Christopher A. Wong

Category: Machine Learning

2019-12-23

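We develop a theoretical analysis for special neural network architectures, termed operator recurrent neural networks, for approximating nonlinear functions whose inputs are linear operators. Such functions commonly arise in solution algorithms for inverse boundary value problems. Traditional neural networks treat input data as vectors, and thus they do not effectively capture the multiplicative structure associated with the linear operators that correspond to the data in such inverse problems. We therefore introduce a new family that resembles a standard neural architecture, but in which the input data acts multiplicatively on vectors. Motivated by compact operators appearing in boundary control and in the analysis of inverse boundary value problems for the wave equation, we promote structure and sparsity in selected weight matrices in the network. After describing this architecture, we study its representation properties as well as its approximation properties. We also show that an explicit regularization can be introduced, which can be derived from the mathematical analysis of the mentioned inverse problems and which leads to certain guarantees on the generalization properties. We observe that sparsity of the weight matrices improves the generalization estimates. Finally, we discuss how operator recurrent networks can be viewed as a deep learning analogue of deterministic algorithms, such as boundary control, for reconstructing the unknown wavespeed in the acoustic wave equation from boundary measurements.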

Neural Estimation of Statistical Divergences

Sreejith Sreekumar, Ziv Goldfeld

Category: Machine Learning (stat)

2021-10-07

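Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating these divergences relies on parametrizing an empirical variational form by a neural network (NN) and optimizing over the parameter space. Such neural estimators are abundantly used in practice, but the corresponding performance guarantees are partial and call for further exploration. In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation. While the former needs the NN class to be rich and expressive, the latter relies on controlling complexity. We explore this tradeoff for an estimator based on a shallow NN by means of non-asymptotic error bounds, focusing on four popular $\mathsf{f}$-divergences: Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory. The bounds reveal the tension between the NN size and the number of samples, and make it possible to characterize scaling rates thereof that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above, with an appropriate NN growth rate, are near minimax rate-optimal, achieving the parametric rate up to logarithmic factors.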
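
A minimal sketch of the variational form that neural estimators of KL divergence optimize, assuming small discrete distributions so that the exact value is available for comparison. For every critic g, E_P[g] - E_Q[exp(g - 1)] <= KL(P || Q), with equality at g = 1 + log(p / q). A neural estimator would parametrize g by a network and maximize the empirical objective over its weights; here the optimal critic is plugged in directly, and the distributions and sample sizes are hypothetical.

```python
import math
import random

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def variational_kl(g, p, q):
    """E_P[g] - E_Q[exp(g - 1)]: a lower bound on KL(P || Q) for every critic g."""
    return sum(pi * gi for pi, gi in zip(p, g)) - sum(qi * math.exp(gi - 1) for qi, gi in zip(q, g))

def empirical_objective(g, xs_p, xs_q):
    # Sample version of the variational objective (the quantity a neural estimator maximizes).
    return sum(g[x] for x in xs_p) / len(xs_p) - sum(math.exp(g[x] - 1) for x in xs_q) / len(xs_q)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
g_star = [1 + math.log(pi / qi) for pi, qi in zip(p, q)]  # optimal critic

rng = random.Random(0)
xs_p = rng.choices(range(3), weights=p, k=20000)
xs_q = rng.choices(range(3), weights=q, k=20000)
estimate = empirical_objective(g_star, xs_p, xs_q)
```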

Dimension-Free Average Treatment Effect Inference with Deep Neural Networks

Xinze Du, Yingying Fan, Jinchi Lv, Tianshu Sun, Patrick Vossler

Category: Machine Learning (stat) | Machine Learning

2021-12-02

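This paper investigates the estimation of, and inference on, the average treatment effect (ATE) using deep neural networks (DNNs) in the potential outcomes framework. Under some regularity conditions, the observed response can be formulated as the response of a mean regression problem with the confounding variables and the treatment indicator as the independent variables. Using this formulation, we investigate two methods for ATE estimation and inference based on the mean regression function estimated via DNN regression with a specific network architecture. We show that both DNN estimates of the ATE are consistent, with dimension-free consistency rates, under some assumptions on the underlying true mean regression model. Our model assumptions accommodate a potentially complicated dependence structure of the observed response on the covariates, including latent factors and nonlinear interactions between the treatment indicator and the confounding variables. We also establish the asymptotic normality of our estimators based on the idea of sample splitting, ensuring precise inference and uncertainty quantification. Simulation studies and a real data application justify our theoretical findings and support our DNN estimation and inference methods.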

On the Estimation of Derivatives Using Plug-in KRR Estimators

Zejian Liu, Meng Li

Category: Machine Learning (stat) | Machine Learning

2020-06-02

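We study the problem of estimating the derivatives of a regression function, which has a wide range of applications as a key nonparametric functional of unknown functions. Standard analyses may be tailored to specific derivative orders, and parameter tuning remains a daunting challenge, particularly for high-order derivatives. In this article, we propose a simple plug-in kernel ridge regression (KRR) estimator for nonparametric regression with random design that is broadly applicable to multidimensional support and arbitrary mixed-partial derivatives. We provide a non-asymptotic analysis that studies the behavior of the proposed estimator in a unified manner, encompassing the regression function and its derivatives, and leads to two error bounds for a general class of kernels under the strong $L_\infty$ norm. In a concrete example specialized to kernels with polynomially decaying eigenvalues, the proposed estimator recovers the minimax optimal rate, up to a logarithmic factor, for estimating derivatives of functions in Hölder and Sobolev classes. Interestingly, the proposed estimator achieves the optimal rate of convergence with the same choice of tuning parameter for any order of derivatives. Hence, the proposed estimator enjoys a \textit{plug-in property} for derivatives, in that it automatically adapts to the order of derivatives to be estimated, enabling easy tuning in practice. Our simulation studies show favorable finite-sample performance of the proposed method relative to several existing methods and corroborate the theoretical findings on its minimax optimality.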
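
A minimal 1-D sketch of the plug-in idea: fit KRR once, then differentiate the fitted function f_hat(x) = sum_i alpha_i k(x, x_i) by differentiating the kernel, keeping the same coefficients and the same tuning parameters. The Gaussian kernel, bandwidth `h`, ridge parameter `lam`, and the noise-free sine data are hypothetical choices for illustration; the paper's analysis covers a general class of kernels and random designs.

```python
import math

def gaussian_kernel(x, z, h):
    return math.exp(-((x - z) ** 2) / (2 * h * h))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def krr_fit(xs, ys, lam, h):
    # alpha = (K + n * lam * I)^{-1} y
    n = len(xs)
    K = [[gaussian_kernel(xi, xj, h) + (n * lam if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    return solve(K, ys)

def krr_predict(x, xs, alpha, h):
    return sum(a * gaussian_kernel(x, xi, h) for a, xi in zip(alpha, xs))

def krr_derivative(x, xs, alpha, h):
    """Plug-in first derivative: d/dx k(x, x_i) = -(x - x_i) / h^2 * k(x, x_i),
    with the same alpha (no separate tuning for the derivative)."""
    return sum(a * (-(x - xi) / (h * h)) * gaussian_kernel(x, xi, h) for a, xi in zip(alpha, xs))

xs = [-3 + 0.3 * i for i in range(21)]
ys = [math.sin(x) for x in xs]
alpha = krr_fit(xs, ys, lam=1e-8, h=1.0)
```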

Asymptotics of Network Embeddings Learned via Subsampling

Andrew Davison, Morgane Austern

Category: Machine Learning (stat) | Machine Learning

2021-07-06

Network data are ubiquitous in modern machine learning, with tasks of interest including node classification, node clustering and link prediction. A frequent approach begins by learning a Euclidean embedding of the network, to which algorithms developed for vector-valued data are applied. For large networks, embeddings are learned using stochastic gradient methods where the sub-sampling scheme can be freely chosen. Despite the strong empirical performance of such methods, they are not well understood theoretically. Our work encapsulates representation methods using a subsampling approach, such as node2vec, into a single unifying framework. We prove, under the assumption that the graph is exchangeable, that the distribution of the learned embedding vectors asymptotically decouples. Moreover, we characterize the asymptotic distribution and provide rates of convergence, in terms of the latent parameters, which include the choice of loss function and the embedding dimension. This provides a theoretical foundation to understand what the embedding vectors represent and how well these methods perform on downstream tasks. Notably, we observe that typically used loss functions may lead to shortcomings, such as a lack of Fisher consistency.
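
A minimal sketch of one subsampling scheme of the kind this framework covers: sample a random walk, form (center, context) pairs within a window, draw negative pairs, and score embeddings with a skip-gram negative-sampling loss. Uniform (DeepWalk-style) walks stand in for node2vec's biased second-order walks, and negatives are drawn uniformly over nodes; both are simplifying assumptions, and all function names are hypothetical.

```python
import math
import random

def random_walk(adj, start, length, rng):
    walk = [start]
    while len(walk) < length:
        nbrs = adj[walk[-1]]
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk

def skipgram_pairs(walk, window):
    """Positive (center, context) pairs: nodes within `window` steps on the walk."""
    pairs = []
    for i, u in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((u, walk[j]))
    return pairs

def negative_pairs(pairs, nodes, k, rng):
    # k negatives per positive pair, drawn uniformly over nodes (simplifying assumption).
    return [(u, rng.choice(nodes)) for u, _ in pairs for _ in range(k)]

def log_sigmoid(t):
    # Numerically stable log(1 / (1 + exp(-t))).
    return -math.log1p(math.exp(-t)) if t >= 0 else t - math.log1p(math.exp(t))

def sgns_loss(emb, pos, neg):
    """Skip-gram negative-sampling loss on the subsample."""
    dot = lambda a, b: sum(x * y for x, y in zip(emb[a], emb[b]))
    return (-sum(log_sigmoid(dot(u, v)) for u, v in pos)
            - sum(log_sigmoid(-dot(u, v)) for u, v in neg))

rng = random.Random(0)
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}  # 6-node cycle graph
walk = random_walk(adj, 0, 10, rng)
pos = skipgram_pairs(walk, window=2)
neg = negative_pairs(pos, list(adj), k=2, rng=rng)
emb = {v: [rng.gauss(0, 0.1) for _ in range(4)] for v in adj}
loss = sgns_loss(emb, pos, neg)
```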

The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu, Alden Green, Ryan J. Tibshirani

Category: Machine Learning (stat) | Machine Learning

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.
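
A one-dimensional sketch of the Voronoigram, under assumptions we make for simplicity: in 1-D the Voronoi cells of sorted design points are intervals split at midpoints, each cell neighbors only the adjacent ones, and all neighbor weights are taken as 1. The estimator then reduces to minimizing 0.5 * sum (y_i - theta_i)^2 + lam * sum |theta_{i+1} - theta_i|, solved here by projected gradient on the dual (|u_i| <= lam, step 1/4); predictions are piecewise constant over cells, i.e. the value at the nearest design point. Higher dimensions would need an actual Voronoi construction and the paper's boundary weights.

```python
import math

def voronoi_cells_1d(xs):
    """Voronoi cells of sorted 1-D points: intervals split at midpoints."""
    mids = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return list(zip([-math.inf] + mids, mids + [math.inf]))

def tv_denoise_1d(y, lam, iters=5000):
    """min_theta 0.5*||y - theta||^2 + lam * sum_i |theta_{i+1} - theta_i|
    via projected gradient on the dual variables u (|u_i| <= lam)."""
    m = len(y)
    u = [0.0] * (m - 1)

    def primal(u):
        # theta = y - D^T u, where (D^T u)_j = u_{j-1} - u_j (zero outside the range)
        return [y[j] - ((u[j - 1] if j > 0 else 0.0) - (u[j] if j < m - 1 else 0.0))
                for j in range(m)]

    for _ in range(iters):
        theta = primal(u)
        u = [max(-lam, min(lam, u[i] + 0.25 * (theta[i + 1] - theta[i]))) for i in range(m - 1)]
    return primal(u)

def voronoigram_1d(x, y, lam):
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    theta = tv_denoise_1d([y[i] for i in order], lam)

    def predict(t):
        # Piecewise constant over Voronoi cells = value at the nearest design point.
        k = min(range(len(xs)), key=lambda i: abs(xs[i] - t))
        return theta[k]

    return xs, theta, predict

x = [0.95, 0.05, 0.55, 0.25, 0.75, 0.15, 0.45, 0.85, 0.35, 0.65]
y = [1.0 if xi >= 0.5 else 0.0 for xi in x]  # a noiseless step function
xs, theta, predict = voronoigram_1d(x, y, lam=0.1)
```

With `lam = 0.1` the jump is kept and each five-point block is shrunk toward the other by lam / 5, giving fitted levels 0.02 and 0.98.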

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

Category: Machine Learning | Machine Learning (stat)

2021-12-23

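We study stochastic approximation (SA) procedures for approximately solving a $d$-dimensional linear fixed-point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of order $t_{\mathrm{mix}} \tfrac{d}{n}$, where $t_{\mathrm{mix}}$ is the mixing time. We then prove a non-asymptotic, instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher-order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise, covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$, and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).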
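
A scalar sketch of Markovian linear SA with Polyak-Ruppert averaging, assuming a hypothetical two-state Markov chain and hypothetical state-dependent coefficients a(s), b(s). The target is the fixed point of a_bar * theta = b_bar, with a_bar and b_bar the stationary averages; TD with linear features has the same structure with matrices in place of scalars. With the values below the stationary distribution is uniform, so theta* = 2.5 / 1.5 = 5/3.

```python
import random

def markov_linear_sa(n, eta, stay, rng):
    """theta <- theta + eta * (b(s_t) - a(s_t) * theta), with s_t a two-state
    Markov chain that keeps its state with probability `stay`.
    Returns the last iterate and the running average of all iterates."""
    a = (1.0, 2.0)  # hypothetical state-dependent coefficients
    b = (1.0, 4.0)
    s, theta, avg = 0, 0.0, 0.0
    for t in range(1, n + 1):
        theta += eta * (b[s] - a[s] * theta)
        avg += (theta - avg) / t  # Polyak-Ruppert running average
        if rng.random() > stay:   # switch state with probability 1 - stay
            s = 1 - s
    return theta, avg

rng = random.Random(0)
last, averaged = markov_linear_sa(n=50000, eta=0.01, stay=0.7, rng=rng)
theta_star = 2.5 / 1.5  # stationary fixed point (uniform stationary distribution)
```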

Dimension-agnostic inference using cross U-statistics

Ilmun Kim, Aaditya Ramdas

Category: Machine Learning (stat)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
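
A sketch of a cross U-statistic for the one-sample mean problem H0: E[X] = 0, written from our reading of the construction rather than the paper's exact definition: split the sample in half, project each first-half point onto the second-half mean (keeping only the off-diagonal cross terms), then studentize over the first half. The result is compared with N(0, 1) whatever the relation between d and n. The sample sizes, dimension and mean shift are hypothetical.

```python
import math
import random

def cross_u_mean_test(X):
    """Studentized cross U-statistic for H0: E[X] = 0 (sketch)."""
    n1 = len(X) // 2
    first, second = X[:n1], X[n1:]
    d = len(X[0])
    mean2 = [sum(x[k] for x in second) / len(second) for k in range(d)]
    f = [sum(xk * mk for xk, mk in zip(x, mean2)) for x in first]  # cross terms only
    fbar = sum(f) / n1
    sd = math.sqrt(sum((fi - fbar) ** 2 for fi in f) / (n1 - 1))
    return math.sqrt(n1) * fbar / sd

def p_value_one_sided(T):
    # Reject for large T: P(N(0, 1) > T).
    return 0.5 * math.erfc(T / math.sqrt(2))

rng = random.Random(0)
n, d = 200, 50
X_null = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
X_alt = [[rng.gauss(0.5, 1) for _ in range(d)] for _ in range(n)]
T_null = cross_u_mean_test(X_null)
T_alt = cross_u_mean_test(X_alt)
```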

The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training

Andrea Montanari, Yiqiao Zhong

Category: Machine Learning (stat) | Machine Learning

2020-07-25

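Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if the actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in $d$ dimensions, and $N$ hidden neurons. We assume that both the sample size $n$ and the dimension $d$ are large, and that they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the overparametrized regime $Nd \gg n$. This characterization implies as a corollary that the minimum eigenvalue of the empirical NT kernel is bounded away from zero as soon as $Nd \gg n$, and therefore the network can exactly interpolate arbitrary labels in the same regime. Our second main result is a characterization of the generalization error of NT ridge regression, including, as a special case, min-$\ell_2$ norm interpolation. We prove that, as soon as $Nd \gg n$, the test error is well approximated by that of kernel ridge regression with respect to the infinite-width kernel. The latter is in turn well approximated by the error of polynomial ridge regression, whereby the regularization parameter is increased by a "self-induced" term related to the high-degree components of the activation function. The polynomial degree depends on the sample size and the dimension (in particular on $\log n/\log d$).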

Spectral Representation Learning for Conditional Moment Models

Ziyu Wang, Yucen Luo, Yueru Li, Jun Zhu, Bernhard Schölkopf

Category: Machine Learning (stat) | Machine Learning

2022-10-29

Many problems in causal inference and economics can be formulated in the framework of conditional moment models, which characterize the target function through a collection of conditional moment restrictions. For nonparametric conditional moment models, efficient estimation often relies on preimposed conditions on various measures of ill-posedness of the hypothesis space, which are hard to validate when flexible models are used. In this work, we address this issue by proposing a procedure that automatically learns representations with controlled measures of ill-posedness. Our method approximates a linear representation defined by the spectral decomposition of a conditional expectation operator, which can be used for kernelized estimators and is known to facilitate minimax optimal estimation in certain settings. We show this representation can be efficiently estimated from data, and establish L2 consistency for the resulting estimator. We evaluate the proposed method on proximal causal inference tasks, exhibiting promising performance on high-dimensional, semi-synthetic data.

Smooth Nested Simulation: Bridging Cubic and Square Root Convergence Rates in High Dimensions

Wenjia Wang, Yanyuan Wang, Xiaowei Zhang

Category: Machine Learning (stat)

2022-01-09

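Nested simulation concerns estimating functionals of a conditional expectation via simulation. In this paper, we propose a new method based on kernel ridge regression that exploits the smoothness of the conditional expectation as a function of the multidimensional conditioning variable. Asymptotic analysis shows that the proposed method can effectively alleviate the curse of dimensionality on the convergence rate as the simulation budget increases, provided that the conditional expectation is sufficiently smooth. The smoothness bridges the gap between the cubic root convergence rate (the optimal rate for standard nested simulation) and the square root convergence rate (the canonical rate for standard Monte Carlo simulation). We demonstrate the performance of the proposed method through numerical examples from portfolio risk management and input uncertainty quantification.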