Bio-inspired learning has been gaining popularity recently given that Backpropagation (BP) is not considered biologically plausible. Many algorithms have been proposed in the literature, all of which are more biologically plausible than BP. However, apart from overcoming the biological implausibility of BP, a strong motivation for using Bio-inspired algorithms remains lacking. In this study, we undertake a holistic comparison of BP vs. multiple Bio-inspired algorithms to answer the question of whether Bio-learning offers additional benefits over BP, rather than just biological plausibility. We test Bio-algorithms under different design choices such as access to only partial training data, resource constraints in terms of the number of training epochs, sparsification of the neural network parameters and addition of noise to input samples. Through these experiments, we notably find two key advantages of Bio-algorithms over BP. Firstly, Bio-algorithms perform much better than BP when the entire training dataset is not supplied. Four of the five Bio-algorithms tested outperform BP by up to 5% accuracy when only 20% of the training dataset is available. Secondly, even when the full dataset is available, Bio-algorithms learn much more quickly and converge to a stable accuracy in far fewer training epochs than BP. Hebbian learning, specifically, is able to learn in just 5 epochs compared to around 100 epochs required by BP. These insights present practical reasons for utilising Bio-learning rather than just its biological plausibility and also point towards interesting new directions for future work on Bio-learning.
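Supervised learning in artificial neural networks typically relies on backpropagation, where the weights are updated based on the gradients of the error function, which are propagated sequentially from the output layer to the input layer. Although this approach has proven effective in a wide range of applications, it lacks biological plausibility in many respects, including the weight symmetry problem, the dependence of learning on non-local signals, the freezing of neural activity during error propagation, and the update locking problem. Alternative training schemes have been introduced, including sign symmetry, feedback alignment, and direct feedback alignment, but they invariably rely on a backward pass, which hinders the possibility of solving all these issues simultaneously. Here, we propose to replace the backward pass with a second forward pass in which the input signal is modulated based on the error of the network. We show that this novel learning rule comprehensively addresses all of the above issues and can be applied to both fully connected and convolutional models. We test this learning rule on MNIST, CIFAR-10, and CIFAR-100. These results help incorporate biological principles into machine learning.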
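Recent approximations to backpropagation (BP) have mitigated many of BP's computational inefficiencies and incompatibilities with biology, but important limitations remain. Moreover, the approximations significantly decrease accuracy on benchmarks, suggesting that an entirely different approach may be more fruitful. Here, grounded in recent theory for Hebbian learning in soft winner-take-all networks, we present multilayer SoftHebb, an algorithm that trains deep neural networks without any feedback, target, or error signals. As a result, it achieves efficiency by avoiding weight transport, non-local plasticity, time-locking of layer updates, iterative equilibria, and (self-)supervisory or other feedback signals, all of which are necessary in other approaches. Its increased efficiency and biological compatibility do not come at the cost of accuracy compared to state-of-the-art bio-plausible learning, but rather improve it. With up to five hidden layers and an added linear classifier, accuracies on MNIST, CIFAR-10, STL-10, and ImageNet reach 99.4%, 80.3%, 76.2%, and 27.3%, respectively. In conclusion, SoftHebb shows, with an approach radically different from BP, that deep learning over a few layers may be plausible in the brain, and it increases the accuracy of bio-plausible machine learning.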
The error Backpropagation algorithm (BP) is a key method for training deep neural networks. While performant, it is also resource-demanding in terms of computation, memory usage and energy. This makes it unsuitable for online learning on edge devices that require a high processing rate and low energy consumption. More importantly, BP does not take advantage of the parallelism and local characteristics offered by dedicated neural processors. There is therefore a demand for alternative algorithms to BP that could improve the latency, memory requirements, and energy footprint of neural networks on hardware. In this work, we propose a novel method based on Direct Feedback Alignment (DFA) which uses Forward-Mode Automatic Differentiation to estimate backpropagation paths and learn feedback connections in an online manner. We experimentally show that Directional DFA achieves performances that are closer to BP than other feedback methods on several benchmark datasets and architectures while benefiting from the locality and parallelization characteristics of DFA. Moreover, we show that, unlike other feedback learning algorithms, our method provides stable learning for convolution layers.
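Predictive coding (PC) is an influential theory in computational neuroscience, which argues that the cortex forms unsupervised world models by implementing a hierarchical process of prediction error minimization. PC networks (PCNs) operate in two phases. First, neural activities are updated to optimize the network's response to external stimuli. Second, synaptic weights are updated to consolidate this change in activity, an algorithm called prospective configuration. While previous work has shown how PCNs can be found to approximate backpropagation (BP) in various limits, recent work has demonstrated that PCNs operating in this standard regime, which does not approximate BP, nevertheless obtain training and generalization performance competitive with BP-trained networks, while outperforming them on tasks such as online, few-shot, and continual learning, where brains are known to excel. Despite this promising empirical performance, little is understood theoretically about the properties and dynamics of PCNs in this regime. In this paper, we provide a comprehensive theoretical analysis of the properties of PCNs trained with prospective configuration. We first derive analytical results concerning the inference equilibrium of PCNs and its close connection to target propagation (TP). Second, we provide a theoretical analysis of learning in PCNs as a variant of generalized expectation-maximization and use it to prove the convergence of PCNs to critical points of the BP loss function, thus showing that deep PCNs can, in theory, achieve the same generalization performance as BP while maintaining their unique advantages.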
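Top-down connections in the biological brain have been shown to be important in high-level cognitive functions. However, the function of this mechanism in machine learning has not been clearly defined. In this study, we propose a framework composed of a bottom-up network and a top-down network. Here, we use a Top-Down Credit Assignment network (TDCA-network) to replace the loss function and backpropagation (BP), which serve as the feedback mechanism in the traditional bottom-up network training paradigm. Our results show that the credit given by a well-trained TDCA-network outperforms the gradient from backpropagation in classification tasks under different settings on multiple datasets. In addition, we successfully use a credit diffusion trick, which keeps training and testing performance unchanged, to reduce the parameter complexity of the TDCA-network. More importantly, by comparing their trajectories in the parameter landscape, we find that the TDCA-network directly reaches a global optimum, whereas backpropagation can only attain a local optimum. Our results therefore indicate that the TDCA-network not only provides a biologically plausible learning mechanism but also has the potential to reach a global optimum directly, suggesting that top-down credit assignment can substitute for backpropagation and provide a better learning framework for deep neural networks.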
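How the brain performs credit assignment is a fundamental unsolved problem in neuroscience. Many "biologically plausible" algorithms have been proposed, which compute gradients that approximate those computed by backpropagation (BP) and which operate in ways that more closely satisfy the constraints imposed by neural circuitry. Many such algorithms use the framework of energy-based models (EBMs), in which all free variables in the model are optimized to minimize a global energy function. However, in the literature, these algorithms exist in isolation, and no unified theory links them together. Here, we provide a comprehensive theory of the conditions under which EBMs can approximate BP, which lets us unify many of the BP approximation results in the literature (namely, predictive coding, equilibrium propagation, and Hebbian learning) and demonstrate that their approximation to BP arises from a simple and general mathematical property of EBMs at free-phase equilibrium. This property can then be exploited in different ways with different energy functions, and these specific choices yield a family of BP-approximating algorithms, which both includes the known results in the literature and can be used to derive new ones.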
Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which however requires biologically implausible weight transport from feed-forward to feedback paths. We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered cortical hierarchies. This is achieved by exploiting the noise naturally found in biophysical systems as an additional carrier of information. In our dynamical system, all weights are learned simultaneously with always-on plasticity and using only information locally available to the synapses. Our method is completely phase-free (no forward and backward passes or phased learning) and allows for efficient error propagation across multi-layer cortical hierarchies, while maintaining biologically plausible signal transport and learning. Our method is applicable to a wide class of models and improves on previously known biologically plausible ways of credit assignment: compared to random synaptic feedback, it can solve complex tasks with fewer neurons and learn more useful latent representations. We demonstrate this on various classification tasks using a cortical microcircuit model with prospective coding.
Backpropagation is widely used to train artificial neural networks, but its relationship to synaptic plasticity in the brain is unknown. Some biological models of backpropagation rely on feedback projections that are symmetric with feedforward connections, but experiments do not corroborate the existence of such symmetric backward connectivity. Random feedback alignment offers an alternative model in which errors are propagated backward through fixed, random backward connections. This approach successfully trains shallow models, but learns slowly and does not perform well with deeper models or online learning. In this study, we develop a novel meta-plasticity approach to discover interpretable, biologically plausible plasticity rules that improve online learning performance with fixed random feedback connections. The resulting plasticity rules show improved online training of deep models in the low data regime. Our results highlight the potential of meta-plasticity to discover effective, interpretable learning rules satisfying biological constraints.
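Predictive coding (PC) is a general theory of cortical function. The local, gradient-based learning rules found in one kind of PC model have recently been shown to closely approximate backpropagation. This finding suggests that this gradient-based PC model may be useful for understanding how the brain solves the credit assignment problem. The model may also be useful for developing local learning algorithms that are compatible with neuromorphic hardware. In this paper, we modify this PC model so that it better fits biological constraints, including the constraint that neurons can only have positive firing rates and the constraint that synapses only transmit in one direction. We also compute the gradient-based weight and activity updates given the modified activity values. We show that, under certain conditions, these modified PC networks perform as well or nearly as well on MNIST data as the unmodified PC model and as networks trained with backpropagation.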
With an ever-growing number of parameters defining increasingly complex networks, Deep Learning has led to several breakthroughs surpassing human performance. As a result, data movement for these millions of model parameters causes a growing imbalance known as the memory wall. Neuromorphic computing is an emerging paradigm that confronts this imbalance by performing computations directly in analog memories. On the software side, the sequential Backpropagation algorithm prevents efficient parallelization and thus fast convergence. A novel method, Direct Feedback Alignment, resolves inherent layer dependencies by directly passing the error from the output to each layer. At the intersection of hardware/software co-design, there is a demand for developing algorithms that are tolerant of hardware nonidealities. Therefore, this work explores the interrelationship of implementing bio-plausible learning in-situ on neuromorphic hardware, emphasizing energy, area, and latency constraints. Using the benchmarking framework DNN+NeuroSim, we investigate the impact of hardware nonidealities and quantization on algorithm performance, as well as how network topologies and algorithm-level design choices can scale latency, energy and area consumption of a chip. To the best of our knowledge, this work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware and vice versa. The best results achieved for accuracy remain Backpropagation-based, notably when facing hardware imperfections. Direct Feedback Alignment, on the other hand, allows for significant speedup due to parallelization, reducing training time by a factor approaching N for N-layered networks.
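
The phrase "directly passing the error from the output to each layer" can be made concrete with a small NumPy sketch. This is an illustration only, not the implementation used in the work above: the tanh hidden units, linear output, mean-squared-error loss, initialization scale, and the helper names `init_dfa` and `dfa_step` are all assumptions chosen for brevity.

```python
import numpy as np

def init_dfa(sizes, seed=0):
    """Create forward weights W and fixed random feedback matrices B.

    sizes is e.g. [4, 16, 16, 2]: input width, hidden widths, output width.
    (Hypothetical helper; the 0.5 init scale is an arbitrary assumption.)
    """
    rng = np.random.default_rng(seed)
    W = [rng.normal(0.0, 0.5, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    # One feedback matrix per hidden layer, mapping the output error
    # (width sizes[-1]) straight to that layer. These are never trained.
    B = [rng.normal(0.0, 0.5, (sizes[-1], n)) for n in sizes[1:-1]]
    return W, B

def dfa_step(W, B, X, Y, lr=0.05):
    """Apply one DFA update in place and return the loss before the update."""
    # Forward pass through tanh hidden layers, keeping every activation.
    acts = [X]
    for Wl in W[:-1]:
        acts.append(np.tanh(acts[-1] @ Wl))
    out = acts[-1] @ W[-1]          # linear output layer
    e = out - Y                     # output error
    n = X.shape[0]

    grads = [None] * len(W)
    grads[-1] = acts[-1].T @ e / n  # output layer: ordinary delta rule
    # Hidden layers: each receives e through its own fixed matrix B[l].
    # No hidden layer needs another layer's delta, so this loop has no
    # sequential dependency and could run in parallel.
    for l, Bl in enumerate(B):
        delta = (e @ Bl) * (1.0 - acts[l + 1] ** 2)  # tanh derivative
        grads[l] = acts[l].T @ delta / n
    for Wl, g in zip(W, grads):
        Wl -= lr * g
    return float(np.mean(e ** 2))
```

In backpropagation, the delta of layer l is computed from the delta of layer l+1, which forces a sequential backward sweep. In the sketch, every hidden delta depends only on `e`, which is the property the abstract credits for the parallel speedup.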
Target Propagation (TP) is a biologically more plausible algorithm than error backpropagation (BP) for training deep networks, and improving the practicality of TP is an open issue. TP methods require the feedforward and feedback networks to form layer-wise autoencoders for propagating the target values generated at the output layer. However, this causes certain drawbacks; e.g., careful hyperparameter tuning is required to synchronize the feedforward and feedback training, and the feedback path usually requires more frequent updates than the feedforward path. Learning of the feedforward and feedback networks is sufficient to make TP methods capable of training, but is having these layer-wise autoencoders a necessary condition for TP to work? We answer this question by presenting Fixed-Weight Difference Target Propagation (FW-DTP), which keeps the feedback weights constant during training. We confirmed that this simple method, which naturally resolves the abovementioned problems of TP, can still deliver informative target values to hidden layers for a given task; indeed, FW-DTP consistently achieves higher test performance than a baseline, the Difference Target Propagation (DTP), on four classification datasets. We also present a novel propagation architecture that explains the exact form of the feedback function of DTP to analyze FW-DTP.
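Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware. However, the supervised training of SNNs remains a hard problem due to the discontinuity of the spiking neuron model. Most existing methods imitate the backpropagation framework and feedforward architectures of artificial neural networks, and use surrogate derivatives or compute gradients with respect to the spiking time to deal with the problem. These approaches either accumulate approximation errors or propagate information only in a limited way through existing spikes, and they usually require information propagation along time steps, with large memory costs and biological implausibility. In this work, we consider feedback spiking neural networks, which are more brain-like, and propose a novel training method that does not rely on the exact reverse of the forward computation. First, we show that the average firing rates of SNNs with feedback connections gradually evolve to an equilibrium state over time, which follows a fixed-point equation. Then, by viewing the forward computation of feedback SNNs as a black-box solver for this equation and leveraging implicit differentiation on the equation, we can compute the gradient for the parameters without considering the exact forward procedure. In this way, the forward and backward procedures are decoupled, and the problem of non-differentiable spiking functions is avoided. We also briefly discuss the biological plausibility of implicit differentiation, which only requires computing another equilibrium. Extensive experiments on MNIST, Fashion-MNIST, N-MNIST, CIFAR-10, and CIFAR-100 demonstrate the superior performance of our method for feedback models with fewer neurons and parameters in a small number of time steps. Our code is available at https://github.com/pkuxmq/ide-fsnn.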
Spiking neural networks (SNNs) are a viable alternative to conventional artificial neural networks when energy efficiency and computational complexity are of importance. A major advantage of SNNs is their binary information transfer through spike trains. The training of SNNs has, however, been a challenge, since neuron models are non-differentiable and traditional gradient-based backpropagation algorithms cannot be applied directly. Furthermore, spike-timing-dependent plasticity (STDP), albeit being a spike-based learning rule, updates weights locally and does not optimize for the output error of the network. We present desire backpropagation, a method to derive the desired spike activity of neurons from the output error. The loss function can then be evaluated locally for every neuron. Incorporating the desire values into the STDP weight update leads to global error minimization and increasing classification accuracy. At the same time, the neuron dynamics and computational efficiency of STDP are maintained, making it a spike-based supervised learning rule. We trained three-layer networks to classify MNIST and Fashion-MNIST images and reached an accuracy of 98.41% and 87.56%, respectively. Furthermore, we show that desire backpropagation is computationally less complex than backpropagation in traditional neural networks.
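Hebbian plasticity in winner-take-all (WTA) networks is highly attractive for neuromorphic on-chip learning, owing to its efficient, local, unsupervised, and online nature. Moreover, its biological plausibility may help overcome important limitations of artificial algorithms, such as their susceptibility to adversarial attacks and long training times. However, Hebbian WTA learning has found little use in machine learning (ML), likely because it has been missing an optimization theory compatible with deep learning (DL). Here we show rigorously that WTA networks constructed from standard DL elements, combined with a Hebbian-like plasticity that we derive, maintain a Bayesian generative model of the data. Importantly, without any supervision, our algorithm, SoftHebb, minimizes cross-entropy, a common loss function in supervised DL. We show this theoretically and in practice. The key is a "soft" WTA, where there is no absolute "hard" winner neuron. Strikingly, in shallow-network comparisons with backpropagation (BP), SoftHebb shows advantages beyond its Hebbian efficiency. Namely, it converges faster and is more robust to noise and adversarial attacks. Notably, attacks that maximally confuse SoftHebb are also confusing to the human eye, potentially linking human perceptual robustness with Hebbian WTA circuits of the cortex. Finally, SoftHebb can generate synthetic objects as interpolations of real object classes. All in all, Hebbian efficiency, theoretical underpinning, cross-entropy minimization, and surprising empirical advantages suggest that SoftHebb may inspire highly neuromorphic and radically different, but practical and advantageous, learning algorithms and hardware accelerators.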
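
The distinction between a "hard" and a "soft" winner can be illustrated with a few lines of NumPy. The sketch below assumes that the soft competition is realized as a softmax over the neurons' pre-activations, with a `temperature` parameter added here purely for illustration; it shows only the competition step and does not attempt to reproduce the plasticity rule derived in the work above.

```python
import numpy as np

def hard_wta(u):
    """Hard winner-take-all: only the most activated neuron responds."""
    y = np.zeros_like(u, dtype=float)
    y[np.argmax(u)] = 1.0
    return y

def soft_wta(u, temperature=1.0):
    """Soft winner-take-all, sketched as a softmax (an assumption).

    Every neuron keeps a nonzero share of the response, so there is no
    absolute winner. Lower temperatures sharpen the competition.
    """
    z = (u - np.max(u)) / temperature  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

u = np.array([1.0, 2.0, 0.5])   # example pre-activations of three neurons
hard = hard_wta(u)              # one-hot on the second neuron
soft = soft_wta(u)              # graded response that sums to 1
```

Because the softmax output sums to one, it can be read as a probability distribution over the competing neurons, which is one way to see why such a layer lends itself to a probabilistic (generative-model) interpretation.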
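Pruning neural networks has become popular in the last decade, after it was shown that a large number of weights can be safely removed from modern neural networks without compromising accuracy. Numerous pruning methods have been proposed since then, each presented as better than the last. Many state-of-the-art (SOTA) techniques today rely on complex pruning methodologies that use importance scores, obtain feedback through backpropagation, or apply heuristics-based pruning rules, among others. We question this pattern of introducing complexity in order to achieve better pruning results. We benchmark these SOTA techniques against Global Magnitude Pruning (Global MP), a naive pruning baseline, to evaluate whether complexity is really needed to achieve higher performance. Global MP ranks weights in order of their magnitudes and prunes the smallest ones. Hence, in its vanilla form, it is one of the simplest pruning techniques. Surprisingly, we find that vanilla Global MP outperforms all the other SOTA techniques and achieves a new SOTA result. It also achieves good performance on FLOPs sparsification, which we find is enhanced when pruning is conducted in a gradual fashion. We also find that Global MP generalizes across tasks, datasets, and models with superior performance. Moreover, a common issue that many pruning algorithms run into at high sparsity rates can be easily fixed in Global MP by setting a minimum threshold of weights to be retained in each layer. Lastly, unlike many other SOTA techniques, Global MP does not require any additional algorithm-specific hyperparameters and is very straightforward to tune and implement. We showcase our findings on various models (WRN-28-8, ResNet-32, ResNet-50, MobileNet-V1 and FastGRNN) and multiple datasets (CIFAR-10, ImageNet and HAR-2). Code is available at https://github.com/manasgupta-1/globalmp.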
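
The ranking-and-thresholding idea is simple enough to sketch in NumPy. The following is an illustrative reading of the description above, not the code from the linked repository: the function name `global_magnitude_prune`, the dictionary-of-arrays layout, and the `min_keep` parameter (standing in for the per-layer minimum) are assumptions.

```python
import numpy as np

def global_magnitude_prune(weights, sparsity, min_keep=0):
    """Return a boolean keep-mask per layer (True = weight is retained).

    weights  : dict mapping layer name -> NumPy array
    sparsity : fraction of ALL weights to prune, ranked across layers
    min_keep : hypothetical per-layer floor on the number of kept weights
    """
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    k = int(round(sparsity * all_mags.size))  # number of weights to prune
    if k == 0:
        return {name: np.ones(w.shape, dtype=bool) for name, w in weights.items()}
    # Magnitude of the k-th smallest weight across the whole network.
    threshold = np.partition(all_mags, k - 1)[k - 1]

    masks = {}
    for name, w in weights.items():
        mag = np.abs(w)
        # Keep strictly larger magnitudes; exact ties at the threshold
        # are pruned too, so slightly more than k weights may be removed.
        mask = mag > threshold
        if mask.sum() < min_keep:
            # Guard against a layer losing (nearly) all of its weights:
            # retain its min_keep largest-magnitude entries instead.
            keep = min(min_keep, w.size)
            top = np.argsort(mag.ravel())[-keep:]
            flat = np.zeros(w.size, dtype=bool)
            flat[top] = True
            mask = flat.reshape(w.shape)
        masks[name] = mask
    return masks

layers = {
    "a": np.array([0.1, -2.0, 3.0, -0.2]),
    "b": np.array([0.05, 0.01, 0.3, 0.02]),
}
half = global_magnitude_prune(layers, 0.5)
collapsed = global_magnitude_prune(layers, 0.75)
guarded = global_magnitude_prune(layers, 0.75, min_keep=1)
```

In this toy example, pruning 75% of the weights globally removes every weight in layer `b`, because all of its magnitudes fall at or below the global threshold. With `min_keep=1` the layer retains its largest weight. A pruned model would then be obtained by multiplying each weight array by its mask.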
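The backpropagation that drives the success of deep learning is most likely different from the learning mechanism of the brain. In this paper, we develop a biology-inspired learning rule that discovers features through local competition, following the idea of Hebb's famous proposal. We demonstrate that the unsupervised features learned by this local learning rule can serve as a pre-training model to improve performance on some supervised learning tasks. More importantly, this local learning rule enables us to build a new learning paradigm very different from backpropagation, named activation learning, in which the output activation of the neural network roughly measures how probable the input patterns are. Activation learning is capable of learning rich local features from only a few input patterns and performs significantly better than the backpropagation algorithm when the number of training samples is relatively small. This learning paradigm unifies unsupervised learning, supervised learning, and generative models, and is also more secure against adversarial attacks, paving the way to some possibilities for creating general-task neural networks.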
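To enable learning on edge devices with fast convergence and low memory, we present a novel backpropagation-free optimization algorithm called Target Projection Stochastic Gradient Descent (TPSGD). TPSGD generalizes direct random target projection to work with arbitrary loss functions and extends target projection to the training of recurrent neural networks (RNNs). TPSGD uses layer-wise stochastic gradient descent (SGD) and local targets generated via random projections of the labels to train the network layer by layer with only forward passes. TPSGD does not require retaining gradients during optimization, which greatly reduces memory allocation compared to SGD backpropagation (BP) methods that require multiple instances of the entire neural network's weights, inputs/outputs, and intermediate results. Our method performs comparably to BP gradient descent, within 5% accuracy, on relatively shallow networks of fully connected, convolutional, and recurrent layers. TPSGD also outperforms other state-of-the-art gradient-free algorithms in shallow models consisting of multi-layer perceptrons, convolutional neural networks (CNNs), and RNNs, with competitive accuracy and less memory and time. We evaluate the performance of TPSGD in training deep neural networks (e.g., VGG) and extend the approach to multi-layer RNNs. These experiments highlight new research directions related to optimized layer-based adaptor training for domain shift using TPSGD at the edge.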
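Recent works have examined the theoretical and empirical properties of wide neural networks trained in the Neural Tangent Kernel (NTK) regime. Given that biological neural networks are much wider than their artificial counterparts, we consider wide neural networks in the NTK regime as a possible model of biological neural networks. Leveraging NTK theory, we show theoretically that gradient descent drives layerwise weight updates that are aligned with their input activity correlations weighted by error, and we demonstrate empirically that the result also holds in finite-width wide networks. The alignment result allows us to formulate biologically motivated, backpropagation-free learning rules that are theoretically equivalent to backpropagation in infinite-width networks. We test these learning rules on benchmark problems in feedforward and recurrent neural networks and demonstrate, in wide networks, performance comparable to backpropagation. The proposed rules are particularly effective in low-data regimes, which are common in biological learning settings.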
A large amount of recent research has the far-reaching goal of finding training methods for deep neural networks that can serve as alternatives to backpropagation (BP). A prominent example is predictive coding (PC), which is a neuroscience-inspired method that performs inference on hierarchical Gaussian generative models. These methods, however, fail to keep up with modern neural networks, as they are unable to replicate the dynamics of complex layers and activation functions. In this work, we solve this problem by generalizing PC to arbitrary probability distributions, enabling the training of architectures, such as transformers, that are hard to approximate with only Gaussian assumptions. We perform three experimental analyses. First, we study the gap between our method and the standard formulation of PC on multiple toy examples. Second, we test the reconstruction quality on variational autoencoders, where our method reaches the same reconstruction quality as BP. Third, we show that our method allows us to train transformer networks and achieve a performance comparable with BP on conditional language models. More broadly, this method allows neuroscience-inspired learning to be applied to multiple domains, since the internal distributions can be flexibly adapted to the data, tasks, and architectures used.