Two features distinguish the Bayesian approach to learning models from data. First, beliefs derived from background knowledge are used to select a prior probability distribution for the model parameters. Second, predictions of future observations are made by integrating the model's predictions with respect to the posterior parameter distribution obtained by updating this prior to take account of the data. For neural network models, both these aspects present difficulties: the prior over network parameters has no obvious relation to our prior knowledge, and integration over the posterior is computationally very demanding. I address the first problem by defining classes of prior distributions for network parameters that reach sensible limits as the size of the network goes to infinity. In this limit, the properties of these priors can be elucidated. Some priors converge to Gaussian processes, in which functions computed by the network may be smooth, Brownian, or fractionally Brownian. Other priors converge to non-Gaussian stable processes. Interesting effects are obtained by combining priors of both sorts in networks with more than one hidden layer. The problem of integrating over the posterior can be solved using Markov chain Monte Carlo methods. I demonstrate that the hybrid Monte Carlo algorithm, which is based on dynamical simulation, is superior to methods based on simple random walks. I use a hybrid Monte Carlo implementation to test the performance of Bayesian neural network models on several synthetic and real data sets. Good results are obtained on small data sets when large networks are used in conjunction with priors designed to reach limits as network size increases, confirming that with Bayesian learning one need not restrict the complexity of the network based on the size of the data set. A Bayesian approach is also found to be effective in automatically determining the relevance of inputs.
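The limiting behaviour described above can be illustrated numerically. The sketch below (my own illustration, not code from the thesis) draws functions from a one-hidden-layer tanh network whose output weights are scaled by 1/sqrt(H); as the number of hidden units H grows, the prior over the computed function approaches a Gaussian process:

```python
import numpy as np

rng = np.random.default_rng(0)

def wide_net_sample(x, n_hidden=10_000, sigma_w=1.0, sigma_b=1.0):
    """Draw one function from a 1-hidden-layer tanh network prior.
    The output weights are scaled by 1/sqrt(n_hidden), so the output
    variance stays O(1); in the limit of many hidden units the prior
    over functions converges to a Gaussian process."""
    w = rng.normal(0.0, sigma_w, n_hidden)              # input-to-hidden weights
    b = rng.normal(0.0, sigma_b, n_hidden)              # hidden biases
    v = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), n_hidden)  # scaled output weights
    h = np.tanh(np.outer(x, w) + b)                     # (len(x), n_hidden)
    return h @ v

x = np.linspace(-3, 3, 50)
samples = np.stack([wide_net_sample(x) for _ in range(200)])
# Each marginal f(x_i) should look approximately Gaussian with mean ~0.
print(samples.shape)
```

Plotting a few rows of `samples` shows smooth random functions, consistent with the Gaussian-process limit for smooth priors.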
Abstract. Bayesian probability theory provides a unifying framework for data modelling. In this framework the overall aims are to find models that are well-matched to the data, and to use these models to make optimal predictions. Neural network learning is interpreted as an inference of the most probable parameters for the model, given the training data. The search in model space (i.e., the space of architectures, noise models, preprocessings, regularizers and weight decay constants) can then also be treated as an inference problem, in which we infer the relative probability of alternative models, given the data. This review describes practical techniques based on Gaussian approximations for implementation of these powerful methods for controlling, comparing and using adaptive networks. 1. Probability theory and Occam's razor. Bayesian probability theory provides a unifying framework for data modelling. A Bayesian data-modeller's aim is to develop probabilistic models that are well-matched to the data, and make optimal predictions using those models. The Bayesian framework has several advantages. Probability theory forces us to make explicit all our modelling assumptions, whereupon our inferences are mechanistic. Once a model space has been defined, then, whatever question we wish to pose, the rules of probability theory give a unique answer which consistently takes into account all the given information. This is in contrast to orthodox (also known as 'frequentist' or 'sampling-theoretical') statistics, in which one must invent 'estimators' of quantities of interest and then choose between those estimators using some criterion measuring their sampling properties; there is no clear principle for deciding which criterion to use to measure the performance of an estimator; nor, for most criteria, is there any systematic procedure for the construction of optimal estimators. Bayesian inference satisfies the likelihood principle (Berger 1985): our inferences depend only on the probabilities assigned to the data that were received, not on properties of other data sets which might have occurred but did not. Probabilistic modelling handles uncertainty in a natural manner. There is a unique prescription (marginalization) for incorporating uncertainty about parameters into our predictions of other variables. Finally, Bayesian model comparison embodies Occam's razor, the principle that states a preference for simple models. This point will be expanded on in a moment. The remainder of section 1 reviews Bayesian model comparison, with particular emphasis on the automatic complexity control that it provides. In section 2 the Bayesian © 1995 IOP Publishing Ltd
Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences. Fitting functions that exhibit multiple solutions as local minima can be analysed in terms of the corresponding machine learning landscape. Methods to explore and visualise molecular potential energy landscapes can be applied to these machine learning landscapes to gain new insight into the solution space involved in training and the nature of the corresponding predictions. In particular, we can define quantities analogous to molecular structure, thermodynamics, and kinetics, and relate these emergent properties to the structure of the underlying landscape. This Perspective aims to describe these analogies with examples from recent applications, and suggest avenues for new interdisciplinary research.
The high complexity of various inverse problems poses a significant challenge to model-based reconstruction schemes, which in such situations frequently reach their limits. At the same time, we have witnessed extraordinary success from data-based methods such as deep learning. However, in the context of inverse problems, deep neural networks mostly act as black-box routines, used for instance to remove artifacts in classical image reconstruction in a somewhat unspecified manner. In this paper, we focus on the severely ill-posed inversion of limited-angle computed tomography, in which entire boundary sections are not captured in the measurements. We develop a hybrid reconstruction framework that fuses model-based sparse regularization with data-driven deep learning. Our approach is reliable in the sense that we only learn the part that provably cannot be handled by model-based methods, while applying the theoretically controllable sparse-regularization technique to the remaining part. Such a decomposition into visible and invisible segments is achieved by means of the shearlet transform, which allows the wavefront set to be resolved in phase space. Moreover, this splitting enables us to assign an explicit task to the neural network, leading to an interpretation of its performance in the context of limited-angle computed tomography. Our numerical experiments show that our algorithm significantly surpasses both purely model-based and more data-based reconstruction methods.
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
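As a concrete reference point for the SG method discussed above, here is a minimal sketch (the problem, step size, batch size, and epoch count are illustrative choices of my own) of plain stochastic gradient descent on a least-squares objective, where each step uses a noisy but unbiased mini-batch estimate of the full gradient:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic least-squares problem: recover w_true from noisy targets.
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = np.arange(1.0, d + 1)
y = X @ w_true + 0.01 * rng.normal(size=n)

def sgd(X, y, lr=0.05, epochs=30, batch=32):
    """Plain stochastic gradient descent on the mean-squared error.
    Each step uses a small random batch, so the gradient is a noisy
    but unbiased estimate of the full gradient."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            b = idx[start:start + batch]
            grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

w_hat = sgd(X, y)
print(np.max(np.abs(w_hat - w_true)))
```

The noise-reduction and second-order streams mentioned in the abstract modify exactly this inner update, by averaging gradients or preconditioning them.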
Characterizing the statistical properties of solutions of inverse problems is essential for decision making. Bayesian inversion offers a tractable framework for this purpose, but current approaches are computationally infeasible for realistic imaging applications in the clinic. We introduce two novel deep-learning-based methods for solving large-scale inverse problems using Bayesian inversion: a sampling-based approach using a WGAN with a novel mini-discriminator, and a direct approach that trains a neural network using a novel loss function. The performance of both methods is demonstrated on image reconstruction in low-dose 3D helical CT. We compute the posterior mean and standard deviation of the 3D images, and then perform a hypothesis test to assess whether a "dark spot" in the liver of a cancer patient is present. Both methods are computationally efficient, and our evaluation shows very promising performance that clearly supports the claim that Bayesian inversion is usable for 3D imaging in time-critical applications.
This thesis describes the Generative Topographic Mapping (GTM), a non-linear latent variable model intended for modelling continuous, intrinsically low-dimensional probability distributions embedded in high-dimensional spaces. It can be seen as a non-linear form of principal component analysis or factor analysis. It also provides a principled alternative to the self-organizing map (a widely established neural network model for unsupervised learning), resolving many of its associated theoretical problems. An important potential application of the GTM is visualization of high-dimensional data. Since the GTM is non-linear, the relationship between data and its visual representation may be far from trivial, but a better understanding of this relationship can be gained by computing the so-called magnification factor. In essence, the magnification factor relates the distances between data points, as they appear when visualized, to the actual distances between those data points. There are two principal limitations of the basic GTM model. The computational effort required will grow exponentially with the intrinsic dimensionality of the density model. However, if the intended application is visualization, this will typically not be a problem. The other limitation is the inherent structure of the GTM, which makes it most suitable for modelling moderately curved probability distributions of approximately rectangular shape. When the target distribution is very different from that, the aim of maintaining an 'interpretable' structure, suitable for visualizing data, may come in conflict with the aim of providing a good density model. The fact that the GTM is a probabilistic model means that results from probability theory and statistics can be used to address problems such as model complexity. Furthermore, this framework provides solid ground for extending the GTM to wider contexts than that of this thesis.
Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.
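The Laplace-approximation idea at the core of the approach can be shown in a deliberately tiny one-parameter example (a toy of my own, not the authors' full nested scheme): fit a Gaussian at the posterior mode with variance given by the local curvature, and compare it with brute-force quadrature:

```python
import numpy as np

# Toy Laplace approximation: Poisson counts with a flat prior on the
# log rate theta. The posterior mode and the inverse curvature there
# define a Gaussian approximation to the posterior marginal.
data = np.array([3, 4, 2, 5, 4])            # Poisson counts
n, S = len(data), data.sum()

def neg_log_post(theta):                    # theta = log rate, flat prior
    return n * np.exp(theta) - S * theta

mode = np.log(S / n)                        # solves d/dtheta = 0 analytically
var = 1.0 / (n * np.exp(mode))              # inverse curvature at the mode

# Compare the Gaussian (Laplace) approximation with quadrature.
grid = np.linspace(mode - 3, mode + 3, 4001)
w = np.exp(neg_log_post(mode) - neg_log_post(grid))
exact_mean = (grid * w).sum() / w.sum()
exact_var = ((grid - exact_mean) ** 2 * w).sum() / w.sum()
print(mode, exact_mean, var, exact_var)
```

For this near-Gaussian posterior the approximation is already accurate to a few percent; INLA nests such approximations over the hyperparameters to get accurate posterior marginals without sampling.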
Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations [13, 78, 31]. The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.
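A minimal sketch of the basic GP regression computation that such applications rest on (squared-exponential kernel; the hyperparameters and data here are arbitrary illustrative choices):

```python
import numpy as np

def gp_posterior(X_train, y_train, X_test, ell=1.0, sf=1.0, noise=0.1):
    """Exact GP regression with a squared-exponential kernel:
    returns the posterior mean and variance at the test inputs."""
    def k(a, b):
        d2 = (a[:, None] - b[None, :]) ** 2
        return sf**2 * np.exp(-0.5 * d2 / ell**2)
    K = k(X_train, X_train) + noise**2 * np.eye(len(X_train))
    Ks = k(X_train, X_test)
    Kss = k(X_test, X_test)
    L = np.linalg.cholesky(K)                      # stable solve via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v**2, axis=0)      # pointwise posterior variance
    return mean, var

X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(X)
mu, var = gp_posterior(X, y, X)
print(mu, var)
```

The O(n^3) Cholesky factorization in this snippet is precisely the scaling bottleneck that the sparse approximations cited in the abstract are designed to alleviate.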
Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science. We survey the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics. ML is a broad family of statistical techniques for automatically detecting and exploiting patterns in data. Relative to conventional acoustics and signal processing, ML is data-driven. Given sufficient training data, ML can discover complex relationships between features. With large amounts of training data, ML can discover models that describe complex acoustic phenomena such as human speech and reverberation. ML in acoustics is developing rapidly, with compelling results and significant future promise. We first introduce ML, then highlight ML developments in five acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, seismic exploration, and environmental sounds in everyday scenes.
We develop, discuss, and compare several inference techniques for constraining theory parameters in collider experiments. By exploiting the latent-space structure of particle physics processes, we extract additional information from the simulator. This augmented data can be used to train neural networks that precisely estimate the likelihood ratio. The new methods scale well to many observables and high-dimensional parameter spaces, do not require any approximation of the parton shower and detector response, and can be evaluated in microseconds. Using weak-boson-fusion Higgs production as an example process, we compare the performance of several techniques. The best results are found for likelihood-ratio estimators trained with additional information about the score, the gradient of the log-likelihood function with respect to the theory parameters. The score also provides sufficient statistics that contain all the information needed for inference in the neighbourhood of the Standard Model. These methods enable us to place significantly stronger bounds on effective dimension-six operators than the traditional histogram-based approach. They also outperform generic machine learning methods that do not exploit the particle physics structure, demonstrating their potential to substantially improve the new-physics reach of LHC legacy results.
The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007).
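For contrast with the posterior-uncertainty methods the paper develops: the MAP sparse estimate under a Laplace prior is the lasso solution, which a few lines of iterative soft-thresholding can compute. This is an illustrative baseline of my own (not the paper's expectation-propagation algorithm), with a hypothetical underdetermined problem:

```python
import numpy as np

rng = np.random.default_rng(2)

def ista(X, y, lam, steps=2000):
    """Iterative soft-thresholding (ISTA): computes the MAP estimate
    under a Laplace (L1) prior -- the sparse point estimate that, as
    the paper notes, discards posterior uncertainty."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        g = X.T @ (X @ w - y)              # gradient of the squared error
        z = w - g / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return w

# Underdetermined problem with a 3-sparse ground truth.
n, d = 40, 100
X = rng.normal(size=(n, d)) / np.sqrt(n)
w_true = np.zeros(d)
w_true[[5, 20, 60]] = [3.0, -2.0, 4.0]
y = X @ w_true
w_hat = ista(X, y, lam=0.01)
print(np.sum(np.abs(w_hat) > 0.5))
```

The estimate recovers the sparse support but yields a single point, with no credible intervals; producing those intervals is exactly what the EP approach in the paper adds.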
We review some recent approaches to analysing cardiac electrophysiology data using machine learning and predictive modelling. Cardiac arrhythmias, particularly atrial fibrillation, are a major global healthcare challenge. Treatment is often via catheter ablation, which involves the targeted localised destruction of regions of the myocardium responsible for initiating or perpetuating the arrhythmia. Ablation targets are determined either anatomically or by their functional properties, as assessed through the analysis of contact intracardiac electrograms acquired at increasing spatial density by modern electroanatomic mapping systems. While numerous quantitative approaches have been investigated over the past decade to identify these critical curative sites, few have provided a reliable and reproducible improvement in success rates. Machine learning techniques, including recent deep learning approaches, offer a potential route to gaining new insight from the wealth of highly complex spatio-temporal information that existing methods struggle to analyse. Coupled with predictive modelling, these techniques offer exciting opportunities to advance the field and to deliver more accurate diagnoses and robust personalised treatment. We give an overview of some of these methods and illustrate their use for making predictions from contact electrograms and for augmenting predictive modelling tools, both by predicting future states of the system more rapidly and by inferring the parameters of these models from experimental observations.
We present a principled Bayesian framework for signal reconstruction, in which the signal is modelled by basis functions whose number (and form, if required) is determined by the data themselves. This approach is based on a Bayesian interpretation of conventional sparse reconstruction and regularisation techniques, in which sparsity is imposed through priors via Bayesian model selection. We demonstrate our method for noisy 1- and 2-dimensional signals, including astronomical images. Furthermore, by using a product-space approach, the number and type of basis functions can be treated as integer parameters and their posterior distributions sampled directly. We show that this technique can deliver an order-of-magnitude increase in computational efficiency compared with computing the Bayesian evidence separately for each model, and that it can be combined with dynamic nested sampling for further computational gains. Our approach can be readily applied to neural networks, where it allows the network architecture to be determined by the data in a principled Bayesian manner by treating the numbers of nodes and hidden layers as parameters.
We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise. 1. Overview 1.1. Computer models and calibration. Various sciences use mathematical models to describe processes that would otherwise be very difficult to analyse, and these models are typically implemented in computer codes. Often, the mathematical model is highly complex, and the resulting computer code is large and may be expensive in terms of the computer time required for a single run. Nevertheless, running the computer model will be much cheaper than making direct observations of the process. Sacks, Welch, Mitchell and Wynn (1989) have given several examples. The codes that we consider are deterministic, i.e. running the code with the same inputs always produces the same output. Computer models are generally designed to be applicable to a wide range of particular contexts. However, to use a model to make predictions in a specific context it may be necessary first to calibrate the model by using some observed data. To illustrate this process we introduce a simple example. Two more examples are described in detail in Section 2.2. To decide on a dose regime (e.g. size, frequency and release rates of tablets) for a new drug, a pharmacokinetic model is used. This models the movement of the drug through various 'compartments' of the patient's body and its eventual elimination (e.g. by chemical reactions
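The calibration step can be sketched in miniature (a hypothetical toy simulator with a flat prior and no discrepancy term, so far simpler than the method described above): infer the unknown simulator input from noisy field observations while retaining the full posterior, rather than a single best-fitting value:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical cheap "computer code": an exponential decay in x
# governed by a single unknown calibration input theta.
def simulator(x, theta):
    return np.exp(-theta * x)

theta_true, sigma = 0.7, 0.05
x_obs = np.linspace(0.1, 3.0, 15)
y_obs = simulator(x_obs, theta_true) + sigma * rng.normal(size=15)

# Grid posterior under a flat prior: p(theta | y) ∝ likelihood.
thetas = np.linspace(0.0, 2.0, 2001)
log_lik = np.array([
    -0.5 * np.sum((y_obs - simulator(x_obs, t)) ** 2) / sigma**2
    for t in thetas
])
post = np.exp(log_lik - log_lik.max())
post /= post.sum()
mean = (thetas * post).sum()
sd = np.sqrt(((thetas - mean) ** 2 * post).sum())
print(mean, sd)
```

Retaining `sd` (rather than only the fitted point `mean`) is the first of the paper's two improvements; the second, the model-discrepancy term, is omitted in this toy.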
The mass, or binding energy, is a fundamental property of the atomic nucleus. It determines the nucleus's stability and its reaction and decay rates. Quantifying nuclear binding is important for understanding the origin of the elements in the universe. The astrophysical processes responsible for nucleosynthesis in stars often take place far from the valley of stability, where experimental masses are unknown. In such cases, missing nuclear information must be provided by theoretical predictions using extreme extrapolations. Bayesian machine learning techniques can be applied to improve predictions by taking full advantage of the information contained in the deviations between experiment and theory. We consider 10 global models based on nuclear density functional theory as well as two phenomenological mass models. Emulators of the two-neutron separation energy (S2n) residuals, and credibility intervals defining theoretical error bars, are constructed using Bayesian Gaussian processes and Bayesian neural networks. We consider a large training dataset of nuclei whose masses were measured before 2003. For the testing dataset, we consider those exotic nuclei whose masses were determined after 2003. We then carry out extrapolations towards the two-neutron drip line. While both the Gaussian process and Bayesian neural network significantly reduce the root-mean-square (rms) deviation from experiment, the GP offers better and more stable performance. The improvement in predictive power is striking: the rms deviations from experiment on the testing dataset become comparable to those of the more phenomenological models. The empirical coverage-probability curves we obtain match the reference values well, which is highly desirable to ensure the consistency of the uncertainty quantification, and the estimated credibility intervals of the predictions make it possible to evaluate the predictive power of individual models.
We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems. The method is based on the invariances or stability of predicted results when known data are represented as expansions in terms of various stochastic functions. The algorithm predicts the classification/regression values of new data by combining (via voting) the outputs of these numerous linear expansions in randomly chosen functions. The few parameters (typically only one parameter is used in all studied examples) that this model has may be automatically optimized. The algorithm has been tested on 10 diverse training data sets of various types and feature space dimensions. It has been shown to consistently exhibit high accuracy and readily allow for optimization of parameters, while simultaneously avoiding pitfalls of existing algorithms such as those associated with class imbalance. The ensemble of stochastic functions that we use suggests a way of deriving algorithm independent bounds on the accuracy. We very briefly speculate on whether spatial coordinates in physical theories may be viewed as emergent "features" that enable a robust machine learning type description of data with generic low order smooth functions.
Recent work has introduced a simple numerical method for solving partial differential equations (PDEs) with deep neural networks (DNNs). This paper reviews and extends that method while applying it to analyse one of the most fundamental features of numerical PDEs and nonlinear analysis: irregular solutions. First, the Sod shock-tube solution of the compressible Euler equations is discussed and analysed, and then compared with conventional finite element and finite volume methods. These methods are extended to consider performance improvements and simultaneous parameter-space exploration. Next, shock solutions of compressible magnetohydrodynamics (MHD) are solved for and used in a setting where experimental data are leveraged to augment a PDE system that is under-specified for validation against observational/experimental data. This is achieved by enriching the model PDE system with source terms and using supervised training on synthetic experimental data. The resulting DNN framework for PDEs appears to offer remarkably simple system prototyping and natural integration of large datasets (whether synthetic or experimental), while also enabling single-pass exploration of the entire parameter space.
This paper considers Bayesian counterparts of the classical tests for goodness of fit and their use in judging the fit of a single Bayesian model to the observed data. We focus on posterior predictive assessment, in a framework that also includes conditioning on auxiliary statistics. The Bayesian formulation facilitates the construction and calculation of a meaningful reference distribution not only for any (classical) statistic, but also for any parameter-dependent "statistic" or discrepancy. The latter allows us to propose the realized discrepancy assessment of model fitness, which directly measures the true discrepancy between data and the posited model, for any aspect of the model which we want to explore. The computation required for the realized discrepancy assessment is a straightforward byproduct of the posterior simulation used for the original Bayesian analysis. We illustrate with three applied examples. The first example, which serves mainly to motivate the work, illustrates the difficulty of classical tests in assessing the fitness of a Poisson model to a positron emission tomography image that is constrained to be nonnegative. The second and third examples illustrate the details of the posterior predictive approach in two problems: estimation in a model with inequality constraints on the parameters, and estimation in a mixture model. In all three examples, standard test statistics (either a χ2 or a likelihood ratio) are not pivotal: the difficulty is not just how to compute the reference distribution for the test, but that in the classical framework no such distribution exists, independent of the unknown model parameters.
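The posterior predictive recipe described above can be sketched for a simple Poisson model (an illustrative toy of my own, not the paper's PET example): draw the parameter from its posterior, simulate replicate data, and record how often the replicated discrepancy exceeds the realized one:

```python
import numpy as np

rng = np.random.default_rng(4)

# "Observed" counts, and a conjugate Gamma(a0, b0) prior on the rate,
# so the posterior for the rate is also Gamma.
data = rng.poisson(5.0, size=50)
a0, b0 = 1.0, 0.1
a, b = a0 + data.sum(), b0 + len(data)

def discrepancy(x, lam):
    """Chi-squared-type discrepancy, which depends on the parameter:
    a 'realized discrepancy' in the paper's terminology."""
    return np.sum((x - lam) ** 2 / lam)

n_rep, extreme = 2000, 0
for _ in range(n_rep):
    lam = rng.gamma(a, 1.0 / b)              # posterior draw of the rate
    rep = rng.poisson(lam, size=len(data))   # replicate data set
    if discrepancy(rep, lam) >= discrepancy(data, lam):
        extreme += 1
p_value = extreme / n_rep                    # posterior predictive p-value
print(p_value)
```

Because the discrepancy is evaluated at each posterior draw, no pivotal reference distribution is needed; the check reuses the same posterior simulation as the original analysis.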
The computational effort of evaluating numerical simulations based on, e.g., the finite element method is high. Metamodels can be utilized to create low-cost alternatives. However, the number of samples required to create a sufficient metamodel should be kept low, which can be achieved by using adaptive sampling techniques. In this master's thesis, adaptive sampling techniques are investigated for creating metamodels with the Kriging technique, which interpolates values via a Gaussian process governed by prior covariances. A Kriging framework extended to multi-fidelity problems is presented and used to compare adaptive sampling techniques from the literature on benchmark problems, as well as on an application from contact mechanics. This thesis provides the first comprehensive comparison of a large spectrum of adaptive techniques for the Kriging framework. In addition, the flexibility of adaptive techniques is introduced into multi-fidelity Kriging as well as into Kriging models with reduced hyperparameter dimension, known as partial least squares Kriging. Furthermore, an innovative adaptive scheme for binary classification is proposed and applied to identifying chaotic motion of a Duffing-type oscillator.