We bound the excess risk of deep linear networks trained with gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can closely approximate, or even match, known bounds for the minimum $\ell_2$-norm interpolant. Our analysis also reveals that interpolating deep linear models have exactly the same conditional variance as the minimum $\ell_2$-norm solution. Since the noise affects the excess risk only through the conditional variance, this implies that depth does not improve the algorithm's ability to "hide the noise". Our simulations verify that aspects of our bounds reflect typical behavior for simple data distributions. We also find that similar phenomena appear in simulations with ReLU networks, although the situation there is more nuanced.
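A minimal numerical sketch of the comparison described above, under assumptions of my own choosing (isotropic Gaussian data, a depth-3 linear network, full-batch gradient descent as a stand-in for gradient flow, standard $1/\sqrt{\text{fan-in}}$ initialization); it is an illustration, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 20, 50, 3                  # overparameterized: d > n
X = rng.normal(size=(n, d))              # isotropic Gaussian covariates
theta_star = np.zeros(d); theta_star[0] = 1.0
y = X @ theta_star + 0.2 * rng.normal(size=n)

# Minimum l2-norm interpolant.
theta_min_norm = np.linalg.pinv(X) @ y

# Depth-3 linear network f(x) = x @ W1 @ W2 @ W3 with standard 1/sqrt(fan-in)
# random initialization, trained by full-batch gradient descent on the
# squared loss (a discrete stand-in for gradient flow).
widths = [d, d, d, 1]
Ws = [rng.normal(size=(widths[i], widths[i + 1])) / np.sqrt(widths[i])
      for i in range(depth)]

lr = 0.01
for _ in range(20000):
    acts = [X]
    for W in Ws:                          # forward pass
        acts.append(acts[-1] @ W)
    resid = acts[-1][:, 0] - y
    grad_out = resid[:, None] / n         # dLoss/d(output), loss = ||resid||^2 / (2n)
    grads = []
    for i in reversed(range(depth)):      # manual backprop through the product
        grads.append(acts[i].T @ grad_out)
        grad_out = grad_out @ Ws[i].T
    for W, g in zip(Ws, reversed(grads)):
        W -= lr * g

theta_deep = np.linalg.multi_dot(Ws)[:, 0]    # end-to-end linear map
# For isotropic covariates, excess risk = ||theta_hat - theta_star||^2.
print("train MSE (deep):", np.mean((X @ theta_deep - y) ** 2))
print("distance to min-norm interpolant:", np.linalg.norm(theta_deep - theta_min_norm))
print("excess risk (deep, min-norm):",
      np.linalg.norm(theta_deep - theta_star) ** 2,
      np.linalg.norm(theta_min_norm - theta_star) ** 2)
```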
Van Rooyen et al. introduced a notion of convex loss functions being robust to random classification noise, and established that the "unhinged" loss function is robust in this sense. In this note, we study the accuracy of binary classifiers obtained by minimizing the unhinged loss, and observe that, even for simple linearly separable data distributions, minimizing the unhinged loss may only yield a binary classifier whose accuracy is no better than random guessing.
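To make the mechanism concrete, here is a small sketch under assumptions of my own (a norm-constrained linear classifier, for which the unhinged risk $E[1 - y\langle w, x\rangle]$ is minimized by the class-mean direction $E[yx]$, and a toy separable distribution constructed for illustration; it is not the construction from the note):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable distribution in R^2 (my own construction, not the
# one from the note): 90% of the mass sits in a "bulk" cluster near the
# separating hyperplane, 10% in a far-away cluster that dominates the mean.
def sample(n):
    s = rng.choice([-1.0, 1.0], size=n)                    # labels
    bulk = rng.random(n) < 0.9
    x = np.where(bulk[:, None],
                 np.stack([0.01 * s, s], axis=1),            # bulk points
                 np.stack([10.0 * s, -20.0 * s], axis=1))    # outlier points
    return x, s

X, y = sample(100_000)

# Under a unit-norm constraint, the unhinged risk E[1 - y <w, x>] is linear in
# w, so it is minimized by w proportional to the class-mean direction E[y x].
w_unhinged = (y[:, None] * X).mean(axis=0)
w_unhinged /= np.linalg.norm(w_unhinged)

w_sep = np.array([1.0, 0.1])                                 # separates both clusters

acc = lambda w: np.mean(np.sign(X @ w) == y)
print("accuracy of the unhinged-loss minimizer:", acc(w_unhinged))
print("accuracy of a separating direction:     ", acc(w_sep))
```

On this construction the class-mean direction is pulled toward the rare far-away cluster and misclassifies the bulk of the distribution, even though a separating direction exists.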
We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime. We apply this result to obtain a lower bound for basis pursuit (the minimum $\ell_1$-norm interpolant), which implies that its excess risk can converge at an exponentially slower rate than OLS (the minimum $\ell_2$-norm interpolant), even when the ground truth is sparse. Our analysis exposes the benefit of an effect analogous to the "wisdom of the crowd", except that here the harm of fitting the $\textit{noise}$ is ameliorated by spreading it across many directions --- the benefit comes from a $\textit{foolish}$ crowd.
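A quick simulation comparing the two interpolants, under assumptions chosen purely for illustration (isotropic Gaussian design, a 1-sparse ground truth, a single random draw, and scipy's LP solver for the $\ell_1$ problem); a single run at these sizes need not display the asymptotic separation the bound describes:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d, sigma = 50, 400, 0.5                      # overparameterized: d >> n
theta_star = np.zeros(d); theta_star[0] = 1.0   # sparse ground truth
X = rng.normal(size=(n, d))                     # isotropic Gaussian design
y = X @ theta_star + sigma * rng.normal(size=n)

# Minimum l2-norm interpolant ("OLS" in the overparameterized regime).
theta_l2 = np.linalg.pinv(X) @ y

# Minimum l1-norm interpolant (basis pursuit), via the standard LP
# reformulation: theta = u - v, u, v >= 0, minimize sum(u + v) s.t. X(u - v) = y.
res = linprog(c=np.ones(2 * d), A_eq=np.hstack([X, -X]), b_eq=y,
              bounds=(0, None), method="highs")
assert res.success
theta_l1 = res.x[:d] - res.x[d:]

# For isotropic Gaussian covariates, the excess risk of an interpolant
# theta_hat is ||theta_hat - theta_star||^2.
for name, th in [("min l2-norm", theta_l2), ("min l1-norm (basis pursuit)", theta_l1)]:
    print(f"{name}: excess risk {np.linalg.norm(th - theta_star) ** 2:.3f}")
```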
The recent success of neural network models has revealed a surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$ has attracted intense theoretical and empirical study. In this paper, we consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss, and derive bounds on the excess risk when the covariates satisfy sub-Gaussianity and anti-concentration properties and the noise is independent and sub-Gaussian. By leveraging recent results characterizing the implicit bias of this estimator, our bounds emphasize the role of both the quality of the initialization and the properties of the data covariance matrix in achieving low excess risk.
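A toy illustration of the initialization's role, under assumptions of my own (isotropic Gaussian covariates, a width-50 two-layer linear network $f(x) = (xW)a$, full-batch gradient descent as a proxy for gradient flow, and two arbitrary initialization scales); this is not the paper's setting, and the step counts are ad hoc:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m, sigma = 30, 100, 50, 0.3        # samples, ambient dim, hidden width, noise
theta_star = np.zeros(d); theta_star[0] = 1.0
X = rng.normal(size=(n, d))              # isotropic Gaussian covariates
y = X @ theta_star + sigma * rng.normal(size=n)

def train_two_layer(init_scale, steps=20000, lr=0.02):
    """Full-batch gradient descent (a discrete proxy for gradient flow) on the
    squared loss of the two-layer linear network f(x) = (x @ W) @ a."""
    W = init_scale * rng.normal(size=(d, m)) / np.sqrt(d)
    a = init_scale * rng.normal(size=m) / np.sqrt(m)
    for _ in range(steps):
        resid = (X @ W) @ a - y                       # shape (n,)
        grad_W = X.T @ np.outer(resid, a) / n
        grad_a = (X @ W).T @ resid / n
        W -= lr * grad_W
        a -= lr * grad_a
    return W @ a                                       # end-to-end linear predictor

for scale in [0.1, 1.0]:
    theta_hat = train_two_layer(scale)
    train_mse = np.mean((X @ theta_hat - y) ** 2)
    excess = np.linalg.norm(theta_hat - theta_star) ** 2   # excess risk for isotropic x
    print(f"init scale {scale}: train MSE {train_mse:.2e}, excess risk {excess:.3f}")
```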
The Neural Tangent Kernel (NTK) is the wide-network limit of a kernel defined using a neural network at initialization, whose embedding is the gradient of the network's output with respect to its parameters. We study the "after kernel", which is defined using the same embedding, except after training, for neural networks with standard architectures on binary classification problems extracted from MNIST and CIFAR-10, trained with SGD in a standard way. For some dataset-architecture pairs, after a few epochs of neural network training, a hard-margin SVM using the network's after kernel is more accurate than one using the network's initial kernel. For networks with a VGG-like architecture, the after kernel is more "global", in the sense that it is less invariant to transformations of an input image that disrupt the global structure of the image while leaving the local statistics largely intact. For fully connected networks, the after kernel is less global in this sense. The after kernel tends to be more invariant to small shifts, rotations and zooms; data augmentation does not improve these invariances. The (finite approximation to the) conjugate kernel, obtained using the last layer of hidden nodes, sometimes, but not always, provides a good approximation to the NTK and to the after kernel. Training a network with a larger learning rate (while holding the training error constant) produces a better kernel, as measured by the test error of a hard-margin SVM. The after kernels of networks trained with larger learning rates tend to be more global and more invariant to small shifts, rotations and zooms.
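To pin down what the tangent/after-kernel embedding is, here is a small sketch with assumptions of my own (a one-hidden-layer tanh network, synthetic two-class data in place of MNIST/CIFAR-10, full-batch gradient descent in place of SGD, and sklearn's SVC with a large C as a stand-in for a hard-margin SVM); whether the after kernel beats the initial kernel on this toy data is not guaranteed and depends on the setup:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic two-class data as a stand-in for the MNIST/CIFAR-10 tasks.
def make_data(n, d=20):
    y = rng.choice([-1.0, 1.0], size=n)
    shift = np.r_[np.ones(5), np.zeros(d - 5)]        # class-dependent mean shift
    return rng.normal(size=(n, d)) + 1.5 * y[:, None] * shift, y

Xtr, ytr = make_data(200)
Xte, yte = make_data(1000)

d, m = Xtr.shape[1], 64
W = rng.normal(size=(m, d)) / np.sqrt(d)              # hidden-layer weights
v = rng.normal(size=m) / np.sqrt(m)                   # output weights

def embed(X, W, v):
    """Tangent-kernel feature map: the gradient of f(x) = v . tanh(W x)
    with respect to all parameters (W, v), stacked into one vector."""
    H = np.tanh(X @ W.T)                              # (n, m)
    grad_v = H                                        # df/dv_j = tanh(w_j . x)
    grad_W = (v * (1 - H ** 2))[:, :, None] * X[:, None, :]  # df/dW_jk
    return np.concatenate([grad_v, grad_W.reshape(len(X), -1)], axis=1)

def svm_accuracy(W, v):
    Ftr, Fte = embed(Xtr, W, v), embed(Xte, W, v)
    svm = SVC(kernel="precomputed", C=1e6)            # large C ~ hard margin
    svm.fit(Ftr @ Ftr.T, ytr)
    return np.mean(svm.predict(Fte @ Ftr.T) == yte)

print("SVM test accuracy, initial kernel:", svm_accuracy(W, v))

# Train the network itself (full-batch gradient descent on the squared loss),
# then re-embed to obtain the "after kernel".
for _ in range(3000):
    H = np.tanh(Xtr @ W.T)
    resid = H @ v - ytr
    grad_v = H.T @ resid / len(Xtr)
    grad_W = ((resid[:, None] * (1 - H ** 2)) * v).T @ Xtr / len(Xtr)
    v -= 0.05 * grad_v
    W -= 0.05 * grad_W

print("SVM test accuracy, after kernel:  ", svm_accuracy(W, v))
```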
The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology: deep neural networks seem to predict well, even with a perfect fit to noisy training data. Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction. We give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization is in terms of two notions of the effective rank of the data covariance. It shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size. By studying examples of data covariance properties that this characterization shows are required for benign overfitting, we find an important role for finite-dimensional data: the accuracy of the minimum norm interpolating prediction rule approaches the best possible accuracy for a much narrower range of properties of the data distribution when the data lies in an infinite dimensional space versus when the data lies in a finite dimensional space whose dimension grows faster than the sample size.
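For concreteness, the two effective-rank notions can be sketched as follows (stated from memory, so the exact constants and conditions should be checked against the paper): for a covariance $\Sigma$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$,
$$ r_k(\Sigma) = \frac{\sum_{i>k} \lambda_i}{\lambda_{k+1}}, \qquad R_k(\Sigma) = \frac{\left(\sum_{i>k} \lambda_i\right)^2}{\sum_{i>k} \lambda_i^2}. $$
Roughly, with $k^* = \min\{k \ge 0 : r_k(\Sigma) \ge b n\}$ for a suitable constant $b$, benign overfitting requires $r_0(\Sigma) = o(n)$, $k^* = o(n)$, and $n = o(R_{k^*}(\Sigma))$: many low-variance directions must be available to absorb the noise without distorting prediction.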
In off-policy reinforcement learning, a behaviour policy performs exploratory interactions with the environment to obtain state-action-reward samples which are then used to learn a target policy that optimises the expected return. This leads to a problem of off-policy evaluation, where one needs to evaluate the target policy from samples collected by the often unrelated behaviour policy. Importance sampling is a traditional statistical technique that is often applied to off-policy evaluation. While importance sampling estimators are unbiased, their variance increases exponentially with the horizon of the decision process due to computing the importance weight as a product of action probability ratios, yielding estimates with low accuracy for domains involving long-term planning. This paper proposes state-based importance sampling (SIS), which drops the action probability ratios of sub-trajectories with "negligible states" -- roughly speaking, those for which the chosen actions have no impact on the return estimate -- from the computation of the importance weight. Theoretical results show that this yields a reduction of the exponent in the variance upper bound as well as an improvement in the mean squared error. An automated search algorithm based on covariance testing is proposed to identify a negligible state set with minimal MSE when performing state-based importance sampling. Experiments are conducted on a lift domain, which includes "lift states" where the action has no impact on the following state and reward. The results demonstrate that, using the search algorithm, SIS yields reduced variance and improved accuracy compared to traditional importance sampling, per-decision importance sampling, and incremental importance sampling.
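A minimal sketch of the weight computation being contrasted here, under toy assumptions of my own (a hand-built episodic domain with "lift" steps whose actions are irrelevant to the return, and a negligible-state set fixed by hand rather than found by the paper's covariance-testing search):

```python
import numpy as np

rng = np.random.default_rng(0)

# A hand-built episodic domain: "decision" steps where the action determines
# the reward, and "lift" steps where it does not.
NEGLIGIBLE = {"lift"}

def make_traj(horizon=20):
    traj, ret = [], 0.0
    for t in range(horizon):
        a = rng.integers(2)                     # behaviour policy: uniform, b(a|s) = 0.5
        if t % 5 == 0:                          # decision step: target prefers action 0
            pi_p = 0.9 if a == 0 else 0.1
            ret += 1.0 if a == 0 else 0.0       # reward only for action 0
            traj.append(("decision", pi_p, 0.5))
        else:                                   # lift step: action irrelevant to return
            pi_p = 0.7 if a == 0 else 0.3
            traj.append(("lift", pi_p, 0.5))
    return traj, ret

def weight(traj, use_sis):
    """Importance weight: product of pi(a|s)/b(a|s); SIS skips negligible states."""
    w = 1.0
    for state, pi_p, b_p in traj:
        if use_sis and state in NEGLIGIBLE:
            continue
        w *= pi_p / b_p
    return w

trajs, returns = zip(*(make_traj() for _ in range(5000)))
returns = np.array(returns)

for use_sis, name in [(False, "ordinary IS   "), (True, "state-based IS")]:
    ws = np.array([weight(t, use_sis) for t in trajs])
    est = np.mean(ws * returns)                 # true target value is 4 * 0.9 = 3.6
    print(f"{name}: estimate {est:.2f}, variance of weighted returns {np.var(ws * returns):.2f}")
```

Both estimators target the same expectation because the dropped ratios have unit mean and are independent of the return; SIS simply avoids multiplying in their noise.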
In this paper, we assess the viability of transformer models in end-to-end InfoSec settings, in which no intermediate feature representations or processing steps occur outside the model. We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files - in a novel end-to-end approach, and explore a variety of architectural designs, training regimes, and experimental settings to determine the ingredients necessary for performant detection models. We show that in contrast to conventional transformers trained on more standard NLP-related tasks, our URL transformer model requires a different training approach to reach high performance levels. Specifically, we show that 1) pre-training on a massive corpus of unlabeled URL data for an auto-regressive task does not readily transfer to binary classification of malicious or benign URLs, but 2) that using an auxiliary auto-regressive loss improves performance when training from scratch. We introduce a method for mixed objective optimization, which dynamically balances contributions from both loss terms so that neither one of them dominates. We show that this method yields quantitative evaluation metrics comparable to those of several top-performing benchmark classifiers. Unlike URLs, binary executables contain longer and more distributed sequences of information-rich bytes. To accommodate such lengthy byte sequences, we introduce additional context length into the transformer by providing its self-attention layers with an adaptive span similar to Sukhbaatar et al. We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets, but also point out the need for further exploration into model improvements in scalability and compute efficiency.
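Since the mixed-objective idea is the key training ingredient here, the following PyTorch sketch shows one way to combine a classification loss with an auxiliary auto-regressive loss, using a balancing rule of my own (matching per-batch magnitudes); the paper's actual architecture, tokenization, and balancing scheme are not reproduced, and all names below are illustrative:

```python
import torch
import torch.nn as nn

class TinyURLTransformer(nn.Module):
    """Byte-level transformer with a classification head and an auxiliary
    next-byte (auto-regressive) head. Positional encodings omitted for brevity."""
    def __init__(self, vocab=256, dim=64, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.lm_head = nn.Linear(dim, vocab)    # auxiliary next-byte prediction
        self.cls_head = nn.Linear(dim, 1)       # malicious / benign score

    def forward(self, tokens):                  # tokens: (batch, seq) byte ids
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.encoder(self.embed(tokens), mask=mask)   # causal self-attention
        return self.cls_head(h.mean(dim=1)).squeeze(-1), self.lm_head(h)

model = TinyURLTransformer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

tokens = torch.randint(0, 256, (8, 128))        # stand-in for byte-encoded URLs
labels = torch.randint(0, 2, (8,)).float()      # stand-in malicious/benign labels

for step in range(3):
    cls_logit, lm_logits = model(tokens)
    loss_cls = bce(cls_logit, labels)
    # auto-regressive loss: predict each next byte from the causally masked context
    loss_ar = ce(lm_logits[:, :-1].reshape(-1, 256), tokens[:, 1:].reshape(-1))
    # dynamic balancing (an illustrative rule, not the paper's): scale the
    # auxiliary term so neither loss dominates on this batch
    alpha = (loss_cls.detach() / (loss_ar.detach() + 1e-8)).clamp(max=1.0)
    loss = loss_cls + alpha * loss_ar
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"step {step}: cls {loss_cls.item():.3f}, ar {loss_ar.item():.3f}, alpha {alpha.item():.3f}")
```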
DBLP is the largest open-access repository of scientific articles in computer science and provides metadata associated with publications, authors, and venues. We retrieved more than 6 million publications from DBLP and extracted pertinent metadata (e.g., abstracts, author affiliations, citations) from the publication texts to create the DBLP Discovery Dataset (D3). D3 can be used to identify trends in the research activity, productivity, bias, accessibility, and impact of computer science research. We present initial analyses of the volume of computer science research (e.g., number of papers, authors, research activity), topics of interest, and citation patterns. Our findings show that computer science is a growing research field (approximately 15% per year) with an active, collaborative researcher community. Compared to earlier decades, papers in recent years include more bibliographic entries, yet the average number of citations received is still declining. Examining paper abstracts reveals that recent topic trends are clearly reflected in D3. Finally, we list further applications of D3 and pose supplementary research questions. The D3 dataset, our findings, and the source code are publicly available for research purposes.
Human space exploration beyond Earth orbit will involve missions of significant distance and duration. To effectively mitigate myriad space health hazards, paradigm shifts in data and space health systems are necessary to enable Earth-independence rather than Earth-reliance. Promising developments in the fields of artificial intelligence and machine learning for biology and health can address these needs. We propose an appropriately autonomous and intelligent Precision Space Health system that will monitor, aggregate, and assess biomedical statuses; analyze and predict personalized adverse health outcomes; adapt and respond to newly accumulated data; and provide preventive, actionable, and timely insights to individual deep space crew members, as well as iterative decision support to their crew medical officer. Here we present a summary of recommendations from a workshop organized by NASA on future applications of artificial intelligence in space biology and health. In the next decade, biomonitoring technology, biomarker science, spacecraft hardware, intelligent software, and streamlined data management must mature and be woven together into a Precision Space Health system to enable humans to thrive in deep space.