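We present a meta-method for initializing (seeding) the $k$-means clustering algorithm, called PNN-smoothing. It consists of splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that any seeding algorithm can be used when clustering the individual subsets. If the computational complexity of that seeding algorithm is linear in the size of the data $n$ and in the number of clusters $k$, PNN-smoothing is also almost linear with an appropriate choice of $J$, and it is competitive in practice. We show empirically, using several existing seeding methods and testing on several synthetic and real datasets, that this procedure systematically yields better costs. Our implementation is publicly available at https://github.com/carlobaldassi/kmeanspnnsmoothing.jl.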
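The split, cluster, and merge steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' Julia implementation: the function names, the uniform-sampling placeholder seeder, the few Lloyd iterations per subset, and the size-weighted PNN merge cost n_a n_b / (n_a + n_b) * ||c_a - c_b||^2 are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def lloyd(points, centers, iters=5):
    """Run iters >= 1 Lloyd iterations; return (centroid, size) per non-empty cluster."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [(mean(g), len(g)) for g in groups if g]

def pnn_merge(clusters, k):
    """Greedily merge the cheapest pair of (centroid, size) clusters until k remain.

    Assumed merge cost: n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2.
    The exhaustive pair search is naive and only meant for small inputs.
    """
    clusters = list(clusters)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ca, na), (cb, nb) = clusters[a], clusters[b]
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ca, na), (cb, nb) = clusters[a], clusters[b]
        merged = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        del clusters[b]                  # b > a, so index a is unaffected
        clusters[a] = (merged, na + nb)
    return [c for c, _ in clusters]

def pnn_smoothing_seed(points, k, J, rng):
    """Split into J random subsets, cluster each, then PNN-merge down to k seeds."""
    shuffled = list(points)
    rng.shuffle(shuffled)
    subsets = [shuffled[j::J] for j in range(J)]
    pooled = []
    for subset in subsets:
        seeds = rng.sample(subset, k)    # placeholder seeder (assumption)
        pooled.extend(lloyd(subset, seeds))
    return pnn_merge(pooled, k)
```

The returned centroids would then be passed as the initial configuration to a full $k$-means run on the whole dataset. Each subset must contain at least $k$ points for the placeholder seeder to work.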
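We systematically study the landscape of deep neural networks by basing its geometry on the space of implemented functions rather than on the space of parameters. Grouping classifiers into equivalence classes, we develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology. On this space, we explore the error landscape rather than the loss. This lets us derive meaningful notions of the flatness of minimizers and of the geodesic paths connecting them. Using different optimization algorithms that sample minimizers with different flatness, we study mode connectivity and relative distances. Testing a variety of state-of-the-art architectures and benchmark datasets, we confirm the correlation between flatness and generalization performance; we further show that in function space minima are closer to each other and that the barriers along the geodesics connecting them are small. We also find that minimizers found by variants of gradient descent can be connected by zero-error paths composed of two straight lines in parameter space, i.e. polygonal chains with a single bend. We observe similar qualitative results in neural networks with binary weights and activations, providing one of the first results on connectivity in this setting. Our results hinge on the removal of symmetries and are in remarkable agreement with the rich phenomenology described by some analytical studies performed on simple shallow models.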
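We apply digitized quantum annealing (QA) and the quantum approximate optimization algorithm (QAOA) to a paradigmatic task of supervised learning in artificial neural networks: the optimization of synaptic weights for the binary perceptron. At variance with the usual QAOA applications to MaxCut, or to the preparation of ground states of quantum spin chains, the classical Hamiltonian is characterized by highly non-local multi-spin interactions. Nevertheless, we provide evidence for the existence of optimal smooth solutions for the QAOA parameters, which are transferable among typical instances of the same problem, and we demonstrate an enhanced performance of QAOA over traditional QA. We also investigate the role of the geometry of the QAOA optimization landscape in this problem, showing that the detrimental effect of the gap-closing transition encountered in QA also degrades the performance of our QAOA implementation.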
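Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy the predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from the statistical physics of disordered systems to analytically study the computational consequences of overparameterization in non-convex binary neural network models, trained on data generated by a structurally simpler but "hidden" network. As the number of connection weights increases, we follow the changes in the geometrical structure of the different minima of the error loss function and relate them to learning and generalization performance. A first transition happens at the so-called interpolation point, when solutions begin to exist (perfect fitting becomes possible). This transition reflects the properties of typical solutions, which however lie in sharp minima and are hard to sample. After a gap, a second transition occurs, with the discontinuous appearance of a different kind of "atypical" structure: wide regions of the weight space that are particularly dense in solutions and have good generalization properties. The two kinds of solutions coexist, with the typical ones being exponentially more numerous, but empirically we find that efficient algorithms sample the atypical, rare ones. This suggests that the atypical phase transition is the relevant one for learning. The results of numerical tests with realistic networks on observables suggested by the theory are consistent with this scenario.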
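The success of deep learning has revealed the application potential of neural networks across the sciences and has opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly non-convex loss functions is an unexpected feature of neural networks. Moreover, such algorithms are able to fit the data even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima reached by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in non-convex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat structures emerge from the coalescence of minima that correspond to "high-margin" (i.e. locally robust) configurations. Despite being exponentially rare compared to zero-margin solutions, high-margin minima tend to concentrate in particular regions. These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to regions that are dense in solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies.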
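The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests that they generalize better than sharp ones. First, we discuss Gaussian mixture classification models and show analytically that there exist Bayes-optimal pointwise estimators which correspond to minimizers belonging to wide flat regions. These estimators can be found by applying maximum-flatness algorithms either directly to the classifier (which is norm independent) or to the differentiable loss function used in learning. Next, we extend the analysis to the deep learning scenario through extensive numerical validation. Using two algorithms, Entropy-SGD and Replicated-SGD, that explicitly include in the optimization objective a non-local flatness measure known as local entropy, we consistently improve the generalization error for common architectures (e.g. ResNet, EfficientNet). An easy-to-compute flatness measure shows a clear correlation with test accuracy.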
This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and show improved generalization over SGD using uniform stability, under certain assumptions. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time.
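The two nested loops described in this abstract can be illustrated on a one-dimensional toy problem. This is a simplified sketch under stated assumptions, not the authors' implementation: the hyperparameter names (`eta`, `eta_inner`, `gamma`, `alpha`, `noise`), their default values, and the use of an exponential running average `mu` to estimate the local-entropy gradient as `gamma * (x - mu)` are choices made for the example.

```python
import math
import random

def entropy_sgd(grad_f, x0, outer_steps=50, inner_steps=20, eta=0.5,
                eta_inner=0.1, gamma=1.0, alpha=0.75, noise=1e-3, seed=0):
    """Toy 1-D sketch of an Entropy-SGD-style update (all names are assumptions).

    Inner loop: Langevin dynamics on f(x') + gamma/2 * (x - x')^2, tracking
    a running average mu of the samples x'.
    Outer loop: step along the estimated local-entropy gradient gamma * (x - mu).
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(outer_steps):
        x_prime = x
        mu = x
        for _ in range(inner_steps):
            # Gradient of the coupled inner objective with respect to x'.
            drift = grad_f(x_prime) - gamma * (x - x_prime)
            x_prime += (-eta_inner * drift
                        + math.sqrt(eta_inner) * noise * rng.gauss(0.0, 1.0))
            mu = (1 - alpha) * mu + alpha * x_prime
        x -= eta * gamma * (x - mu)
    return x

# Usage on f(x) = x^2 / 2, whose gradient is x; noise is switched off here
# so that the run is deterministic.
x_final = entropy_sgd(lambda x: x, 2.0, noise=0.0)
```

In a real training setting `grad_f` would be a stochastic minibatch gradient and `x` a parameter vector; the scalar version only shows how the inner sampler feeds the outer update.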
Computational units in artificial neural networks follow a simplified model of biological neurons. In the biological model, the output signal of a neuron runs down the axon, splits following the many branches at its end, and passes identically to all the downward neurons of the network. Each of the downward neurons uses its copy of this signal as one of its many dendritic inputs, integrates them all and fires an output if the result is above some threshold. In the artificial neural network, this translates to the fact that the nonlinear filtering of the signal is performed in the upward neuron, meaning that in practice the same activation is shared between all the downward neurons that use that signal as their input. Dendrites thus play a passive role. We propose a slightly more complex model for the biological neuron, where dendrites play an active role: the activation in the output of the upward neuron becomes optional, and instead the signals going through each dendrite undergo independent nonlinear filterings before the linear combination. We implement this new model in a ReLU computational unit and discuss its biological plausibility. We compare this new computational unit with the standard one and describe it from a geometrical point of view. We provide a Keras implementation of this unit in fully connected and convolutional layers and estimate the resulting change in FLOPs and number of weights. We then use these layers in ResNet architectures on CIFAR-10, CIFAR-100, Imagenette, and Imagewoof, obtaining performance improvements over standard ResNets of up to 1.73%. Finally, we prove a universal representation theorem for continuous functions on compact sets and show that this new unit has more representational power than its standard counterpart.
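The contrast between the standard unit and a unit with active dendrites can be illustrated with a few lines of plain Python. The abstract does not give the exact functional form, so this is a hypothetical sketch: the function names and the per-dendrite `shifts` parameter are assumptions, and the code is not the Keras implementation the authors mention.

```python
def relu(z):
    """Rectified linear activation."""
    return max(0.0, z)

def standard_unit(x, w, b):
    """Standard unit: a single nonlinearity applied after the linear combination."""
    return relu(sum(wi * xi for wi, xi in zip(w, x)) + b)

def dendritic_unit(x, w, shifts, b):
    """Hypothetical active-dendrite unit: each incoming signal passes through
    its own ReLU (with an assumed per-dendrite shift) before the linear
    combination."""
    return sum(wi * relu(xi + si) for wi, xi, si in zip(w, x, shifts)) + b

# Same inputs and weights, different outputs: the negative input cancels the
# positive one in the standard unit but is filtered out per dendrite here.
x, w = [1.0, -1.0], [1.0, 1.0]
y_standard = standard_unit(x, w, 0.0)
y_dendritic = dendritic_unit(x, w, [0.0, 0.0], 0.0)
```

Under this assumed form, the filtering depends on which dendrite carries the signal rather than being shared by every receiving neuron, which is the distinction the abstract draws.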
Non-linear state-space models, also known as general hidden Markov models, are ubiquitous in statistical machine learning, being the most classical generative models for serial data and sequences in general. The particle-based, rapid incremental smoother (PaRIS) is a sequential Monte Carlo (SMC) technique allowing for efficient online approximation of expectations of additive functionals under the smoothing distribution in these models. Such expectations appear naturally in several learning contexts, such as maximum likelihood estimation (MLE) and Markov score climbing (MSC). PaRIS has linear computational complexity, limited memory requirements and comes with non-asymptotic bounds, convergence results and stability guarantees. Still, being based on self-normalised importance sampling, the PaRIS estimator is biased. Our first contribution is to design a novel additive smoothing algorithm, the Parisian particle Gibbs (PPG) sampler, which can be viewed as a PaRIS algorithm driven by conditional SMC moves, resulting in bias-reduced estimates of the targeted quantities. We substantiate the PPG algorithm with theoretical results, including new bounds on bias and variance as well as deviation inequalities. Our second contribution is to apply PPG in a learning framework, covering MLE and MSC as special examples. In this context, we establish, under standard assumptions, non-asymptotic bounds highlighting the value of bias reduction and the implicit Rao--Blackwellization of PPG. These are the first non-asymptotic results of this kind in this setting. We illustrate our theoretical results with numerical experiments supporting our claims.
The visual dimension of cities has been a fundamental subject in urban studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim, and Jacobs. Several decades later, big data and artificial intelligence (AI) are revolutionizing how people move, sense, and interact with cities. This paper reviews the literature on the appearance and function of cities to illustrate how visual information has been used to understand them. A conceptual framework, Urban Visual Intelligence, is introduced to systematically elaborate on how new image data sources and AI techniques are reshaping the way researchers perceive and measure cities, enabling the study of the physical environment and its interactions with socioeconomic environments at various scales. The paper argues that these new approaches enable researchers to revisit the classic urban theories and themes, and potentially help cities create environments that are more in line with human behaviors and aspirations in the digital age.