A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor,'' penalizing overflexible and overcomplex models.The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.This paper makes use of the Bayesian framework for regularization and model comparison described in the companion paper "Bayesian Interpolation" (MacKay 1992a). This framework is due to Gull and Skilling (Gull 1989).
The Gaps in BackpropThere are many knobs on the black box of "backprop" [learning by backpropagation of errors (Rumelhart et al. 198611. Generally these knobs are set by rules of thumb, trial and error, and the use of reserved test data to assess generalization ability (or more sophisticated cross-validation). The knobs fall into two classes: (1) parameters that change the effective learning model, for example, number of hidden units, and weight decay
translated by 谷歌翻译