Rectified Linear Units are currently the state-of-the-art activation function in deep convolutional neural networks. To combat ReLU's dying-neuron problem, we propose the Parametric Variational Linear Unit (PVLU), which adds a sinusoidal function with trainable coefficients to ReLU. Besides introducing nonlinearity and non-zero gradients across the entire real domain, PVLU acts as a mechanism for fine-tuning when implemented in the context of transfer learning. On a simple, non-transfer sequential CNN, replacing ReLU with PVLU allowed relative error reductions of 16.3% and 11.3% (without and with data augmentation) on CIFAR-100. PVLU was also tested on transfer-learning models. VGG-16 and VGG-19 experienced relative error reductions of 9.5% and 10.7%, respectively, on CIFAR-10 after ReLU was replaced with PVLU. Similar improvements were observed for the VGG models when trained on Gaussian-filtered CIFAR-10 images. Most notably, fine-tuning with PVLU allowed relative error reductions at and above 10% on near state-of-the-art residual neural network architectures on the CIFAR datasets.
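The abstract does not spell out the exact parameterization, so the sketch below is only one plausible reading of "ReLU plus a sinusoid with trainable coefficients": a per-channel f(x) = relu(x) + a·sin(b·x) with learnable a and b. The class name, parameter shapes, and initial values are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PVLUSketch(nn.Module):
    """Illustrative ReLU + trainable sinusoid, loosely following the PVLU idea.

    f(x) = relu(x) + a * sin(b * x), with `a` and `b` learned per channel.
    The parameter shapes and initial values here are assumptions.
    """

    def __init__(self, num_channels: int, init_a: float = 0.1, init_b: float = 1.0):
        super().__init__()
        self.a = nn.Parameter(torch.full((num_channels,), init_a))
        self.b = nn.Parameter(torch.full((num_channels,), init_b))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-channel coefficients over (N, C, H, W) inputs.
        a = self.a.view(1, -1, 1, 1)
        b = self.b.view(1, -1, 1, 1)
        return torch.relu(x) + a * torch.sin(b * x)

if __name__ == "__main__":
    act = PVLUSketch(num_channels=16)
    y = act(torch.randn(2, 16, 8, 8))
    print(y.shape)  # torch.Size([2, 16, 8, 8])
```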
An activation function has a significant impact on the efficiency and robustness of neural networks. As an alternative, we evolved a cutting-edge non-monotonic activation function, the Negative Stimulated Hybrid Activation Function (Nish). It acts as a Rectified Linear Unit (ReLU) function for the positive region and a sinus-sigmoidal function for the negative region. In other words, it incorporates a sigmoid and a sine function, gaining new dynamics over the classical ReLU. We analyzed the consistency of Nish for different combinations of essential networks and the most common activation functions on several of the most popular benchmarks. From the experimental results, we report that the accuracy rates achieved by Nish are slightly better than those of Mish in classification.
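The exact Nish formula is not given in this abstract, so the snippet below is only a generic illustration of the described structure (identity/ReLU branch for x > 0, a sine-times-sigmoid branch for x ≤ 0); the negative branch used here is an assumed placeholder, not the authors' definition.

```python
import torch

def nish_like(x: torch.Tensor) -> torch.Tensor:
    """Hybrid activation in the spirit described above: the ReLU branch for
    x > 0 and a sine-times-sigmoid branch for x <= 0.

    NOTE: the negative branch below is a placeholder assumption; the actual
    Nish definition is given in the paper, not in this abstract.
    """
    negative_branch = torch.sin(x) * torch.sigmoid(x)
    return torch.where(x > 0, x, negative_branch)

print(nish_like(torch.tensor([-2.0, -0.5, 0.0, 1.5])))
```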
In order to classify linearly non-separable data, neurons are typically organized into multi-layer neural networks with at least one hidden layer. Inspired by recent discoveries in neuroscience, we propose a new neuron model along with a novel activation function that enables learning nonlinear decision boundaries using a single neuron. We show that a standard neuron followed by the novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy. Furthermore, we conduct experiments on five benchmark datasets from computer vision, signal processing, and natural language processing, namely MOROCO, UTKFace, CREMA-D, Fashion-MNIST, and Tiny ImageNet, showing that the ADA and leaky ADA functions provide superior results to Rectified Linear Units (ReLU), leaky ReLU, RBF, and Swish for various neural network architectures, e.g. one-hidden-layer or two-hidden-layer multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs) such as LeNet, VGG, ResNet, and character-level CNNs. We obtain further performance improvements when we replace the standard model of the neuron with a pyramidal neuron with apical dendrite activations (PyNADA). Our code is available at: https://github.com/raduionescu/pynada.
In neural networks, non-linearity is introduced by activation functions. One commonly used activation function is the Rectified Linear Unit (ReLU). ReLU has been a popular activation function, but it has flaws. State-of-the-art functions like Swish and Mish are now gaining attention as better alternatives because they combat many of the flaws presented by other activation functions. CoLU is an activation function similar to Swish and Mish. It is defined as $f(x) = x / (1 - x e^{-(x + e^x)})$. It is smooth, continuously differentiable, unbounded above, bounded below, non-saturating, and non-monotonic. Based on experiments comparing CoLU with different activation functions, it is observed that CoLU usually performs better than other functions on deeper neural networks. While training different neural networks on MNIST with a progressively increasing number of convolutional layers, CoLU retained the highest accuracy for more layers. On a smaller network with 8 convolutional layers, CoLU had the highest mean accuracy, closely followed by ReLU. On VGG-13 trained on Fashion-MNIST, CoLU had 4.20% higher accuracy than Mish and 3.31% higher accuracy than ReLU. On ResNet-9 trained on CIFAR-10, CoLU had 0.05% higher accuracy than Swish, 0.09% higher accuracy than Mish, and 0.29% higher accuracy than ReLU. It is observed that activation functions may behave better than others depending on different factors, including the number of layers, types of layers, number of parameters, learning rate, optimizer, etc. Further research can be done on these factors and activation functions to obtain more optimal activation functions and more knowledge of their behavior.
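Since the abstract states the closed form explicitly, a direct sketch of the function is straightforward; the plotting range below is arbitrary.

```python
import torch

def colu(x: torch.Tensor) -> torch.Tensor:
    """CoLU as defined above: f(x) = x / (1 - x * exp(-(x + exp(x))))."""
    return x / (1.0 - x * torch.exp(-(x + torch.exp(x))))

# Smooth, non-monotonic, bounded below and unbounded above (like Swish/Mish).
x = torch.linspace(-5.0, 5.0, steps=11)
print(colu(x))
```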
Inspired by biological neurons, activation functions play an important role in the learning process of any artificial neural network commonly used in many real-world tasks. Various activation functions have been proposed in the literature for classification as well as regression tasks. In this work, we survey the activation functions that have been employed in the past as well as the current state-of-the-art ones. In particular, we present the various developments in activation functions over the years and the advantages as well as the disadvantages or limitations of these activation functions. We also discuss classical (fixed) activation functions, including rectifier units, and adaptive activation functions. In addition to a taxonomy of activation functions based on characterization, a taxonomy based on applications is also presented. To this end, a systematic comparison of various fixed and adaptive activation functions is performed on classification datasets such as MNIST, CIFAR-10, and CIFAR-100. In recent years, a physics-informed machine learning framework has emerged for solving problems related to scientific computing. For this purpose, we also discuss the various requirements on activation functions used in the physics-informed machine learning framework. Furthermore, various comparisons of different fixed and adaptive activation functions are made using various machine learning libraries such as TensorFlow, PyTorch, and JAX.
The choice of activation functions and their motivation is a long-standing issue within the neural network community. Neuronal representations within artificial neural networks are commonly understood as logits, representing the log-odds score of presence of features within the stimulus. We derive logit-space operators equivalent to probabilistic Boolean logic-gates AND, OR, and XNOR for independent probabilities. Such theories are important to formalize more complex dendritic operations in real neurons, and these operations can be used as activation functions within a neural network, introducing probabilistic Boolean-logic as the core operation of the neural network. Since these functions involve taking multiple exponents and logarithms, they are computationally expensive and not well suited to be directly used within neural networks. Consequently, we construct efficient approximations named $\text{AND}_\text{AIL}$ (the AND operator Approximate for Independent Logits), $\text{OR}_\text{AIL}$, and $\text{XNOR}_\text{AIL}$, which utilize only comparison and addition operations, have well-behaved gradients, and can be deployed as activation functions in neural networks. Like MaxOut, $\text{AND}_\text{AIL}$ and $\text{OR}_\text{AIL}$ are generalizations of ReLU to two-dimensions. While our primary aim is to formalize dendritic computations within a logit-space probabilistic-Boolean framework, we deploy these new activation functions, both in isolation and in conjunction to demonstrate their effectiveness on a variety of tasks including image classification, transfer learning, abstract reasoning, and compositional zero-shot learning.
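The closed forms of the efficient AND_AIL/OR_AIL/XNOR_AIL approximations are not given in this abstract, so the sketch below implements only the exact logit-space gates described above, assuming independent probabilities; it is a minimal illustration of why the exact operators are expensive (exponentials and logarithms), not the paper's efficient operators.

```python
import torch

def _logit(p: torch.Tensor) -> torch.Tensor:
    return torch.log(p) - torch.log1p(-p)

def and_logit(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Exact logit-space AND for independent probabilities:
    logit(sigmoid(x) * sigmoid(y))."""
    return _logit(torch.sigmoid(x) * torch.sigmoid(y))

def or_logit(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Exact logit-space OR: logit(1 - (1 - sigmoid(x)) * (1 - sigmoid(y)))."""
    p = 1.0 - (1.0 - torch.sigmoid(x)) * (1.0 - torch.sigmoid(y))
    return _logit(p)

def xnor_logit(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Exact logit-space XNOR: probability that both are true or both false."""
    px, py = torch.sigmoid(x), torch.sigmoid(y)
    return _logit(px * py + (1.0 - px) * (1.0 - py))

x, y = torch.tensor([3.0, -3.0]), torch.tensor([3.0, 3.0])
print(and_logit(x, y), or_logit(x, y), xnor_logit(x, y))
```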
Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior over their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layer-deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and significant improvements on ImageNet. Our code and models are available at https://github.com/szagoruyko/wide-residual-networks.
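A minimal sketch of the widening idea: a pre-activation residual block whose channel count is multiplied by a widening factor k. The kernel sizes and the dropout-free layout here are simplifications, not the exact WRN block definition.

```python
import torch
import torch.nn as nn

class WideBasicBlock(nn.Module):
    """Pre-activation residual block with width scaled by a widening factor k
    (a sketch of the WRN idea; dropout placement and other details from the
    paper are omitted)."""

    def __init__(self, in_planes: int, planes: int, k: int = 4, stride: int = 1):
        super().__init__()
        width = planes * k
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, width, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, 3, stride=1, padding=1, bias=False)
        self.shortcut = (
            nn.Identity()
            if stride == 1 and in_planes == width
            else nn.Conv2d(in_planes, width, 1, stride=stride, bias=False)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return out + self.shortcut(x)

block = WideBasicBlock(in_planes=16, planes=16, k=4, stride=1)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```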
Very deep convolutional networks with hundreds of layers have led to significant reductions in error on competitive benchmarks. Although the unmatched expressiveness of the many layers can be highly desirable at test time, training very deep networks comes with its own set of challenges. The gradients can vanish, the forward flow often diminishes, and the training time can be painfully slow. To address these problems, we propose stochastic depth, a training procedure that enables the seemingly contradictory setup to train short networks and use deep networks at test time. We start with very deep networks but during training, for each mini-batch, randomly drop a subset of layers and bypass them with the identity function. This simple approach complements the recent success of residual networks. It reduces training time substantially and improves the test error significantly on almost all data sets that we used for evaluation. With stochastic depth we can increase the depth of residual networks even beyond 1200 layers and still yield meaningful improvements in test error (4.91% on CIFAR-10).
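A minimal sketch of the training procedure described above: for each mini-batch the residual branch of a block survives with probability p_survive, otherwise the block is bypassed with the identity; at test time the branch output is scaled by p_survive. The wrapper class, parameter name, and survival probability are illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block with stochastic depth: during training the residual
    branch is dropped with probability 1 - p_survive (one Bernoulli draw per
    mini-batch), and at test time its output is scaled by p_survive."""

    def __init__(self, branch: nn.Module, p_survive: float = 0.8):
        super().__init__()
        self.branch = branch
        self.p_survive = p_survive

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            if torch.rand(()) < self.p_survive:
                return x + self.branch(x)
            return x  # block bypassed with the identity for this mini-batch
        return x + self.p_survive * self.branch(x)

block = StochasticDepthBlock(nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8)))
block.train()
print(block(torch.randn(4, 8)).shape)
```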
Post-hurricane damage assessment is crucial for managing resource allocation and executing an effective response. Traditionally, this assessment is performed through field reconnaissance, which is slow, hazardous, and arduous. Instead, in this paper we further the idea of implementing deep learning through convolutional neural networks in order to classify post-hurricane satellite imagery of buildings as flooded/damaged or undamaged. The experiment was conducted using a dataset of post-hurricane satellite imagery of the Greater Houston area after Hurricane Harvey in 2017. This paper implements three convolutional neural network model architectures paired with additional model considerations in order to achieve high accuracies (surpassing 99%), reinforcing the effective use of machine learning in post-hurricane disaster assessment.
Neural networks require careful weight initialization to prevent signals from exploding or vanishing. Existing initialization schemes solve this problem in specific cases by assuming that the network has a certain activation function or topology. It is difficult to derive such weight initialization strategies, and modern architectures therefore often use these same initialization schemes even though their assumptions do not hold. This paper introduces AutoInit, a weight initialization algorithm that automatically adapts to different neural network architectures. By analytically tracking the mean and variance of signals as they propagate through the network, AutoInit appropriately scales the weights at each layer to avoid exploding or vanishing signals. Experiments demonstrate that AutoInit improves performance of convolutional, residual, and transformer networks across a range of activation function, dropout, weight decay, learning rate, and normalizer settings, and does so more reliably than data-dependent initialization methods. This flexibility allows AutoInit to initialize models for everything from small tabular tasks to large datasets such as ImageNet. Such generality turns out particularly useful in neural architecture search and in activation function discovery. In these settings, AutoInit initializes each candidate appropriately, making performance evaluations more accurate. AutoInit thus serves as an automatic configuration tool that makes design of new neural network architectures more robust. The AutoInit package provides a wrapper around TensorFlow models and is available at https://github.com/cognizant-ai-labs/autoinit.
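This is not the AutoInit package itself, only a toy illustration of the underlying idea under stated simplifications: analytically track the second moment of a zero-mean unit-variance signal through a Linear + ReLU stack and scale each layer's weights so the signal neither explodes nor vanishes. The function name and the restriction to Linear/ReLU are assumptions for the sketch.

```python
import math
import torch
import torch.nn as nn

def autoscale_init(model: nn.Sequential) -> None:
    """Toy variance-tracking initializer (not the actual AutoInit algorithm).

    Tracks the second moment of the signal analytically: a Linear layer with
    zero-mean weights of std `s` maps second moment m -> fan_in * s**2 * m,
    and ReLU of a zero-mean Gaussian halves the second moment.
    """
    moment2 = 1.0  # running estimate of the signal's second moment
    for layer in model:
        if isinstance(layer, nn.Linear):
            fan_in = layer.in_features
            # Choose the weight std so that fan_in * std**2 * moment2 == 1.
            std = math.sqrt(1.0 / (fan_in * moment2))
            nn.init.normal_(layer.weight, mean=0.0, std=std)
            nn.init.zeros_(layer.bias)
            moment2 = 1.0
        elif isinstance(layer, nn.ReLU):
            moment2 *= 0.5

net = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256), nn.ReLU())
autoscale_init(net)
with torch.no_grad():
    # Second moment stays close to 0.5 after the final ReLU, as tracked above.
    print(net(torch.randn(1024, 256)).pow(2).mean().item())
```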
Neural networks have shown tremendous growth in recent years to solve numerous problems. Various types of neural networks have been introduced to deal with different types of problems. However, the main goal of any neural network is to transform non-linearly separable input data into more linearly separable abstract features using a hierarchy of layers. These layers are combinations of linear and nonlinear functions. The most popular and common non-linearity layers are activation functions (AFs), such as Logistic Sigmoid, Tanh, ReLU, ELU, Swish, and Mish. In this paper, a comprehensive overview and survey of AFs in neural networks for deep learning is presented. Different classes of AFs are covered, such as Logistic Sigmoid and Tanh based, ReLU based, ELU based, and learning based. Several characteristics of AFs, such as output range, monotonicity, and smoothness, are also pointed out. A performance comparison is also performed among 18 state-of-the-art AFs with different networks on different types of data. Insights into AFs are presented to benefit researchers in conducting further research and practitioners in selecting among the different choices. The code used for the experimental comparison is released at: https://github.com/shivram1987/activationfunctions.
Training a very deep neural network is a challenging task, as the deeper a neural network is, the more non-linear it is. We compare the performances of various preconditioned Langevin algorithms with their non-Langevin counterparts for the training of neural networks of increasing depth. For shallow neural networks, Langevin algorithms do not lead to any improvement; however, the deeper the network is, the greater the gains provided by Langevin algorithms. Adding noise to the gradient descent allows escaping from local traps, which are more frequent for very deep neural networks. Following this heuristic, we introduce a new Langevin algorithm called Layer Langevin, which consists in adding Langevin noise only to the weights associated with the deepest layers. We then prove the benefits of Langevin and Layer Langevin algorithms for the training of popular deep residual architectures for image classification.
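A minimal sketch of the Layer Langevin idea under stated assumptions: a plain SGD step where Gaussian noise is injected only into the parameters of the last few modules (the deepest layers). The noise scale, the `noisy_depth` parameter, and the absence of preconditioning are illustrative simplifications, not the paper's exact algorithm.

```python
import math
import torch
import torch.nn as nn

def layer_langevin_step(model: nn.Sequential, loss: torch.Tensor,
                        lr: float = 1e-2, sigma: float = 1e-3,
                        noisy_depth: int = 2) -> None:
    """One SGD step with Langevin noise added only to the deepest layers."""
    model.zero_grad()
    loss.backward()
    noisy_modules = list(model)[-noisy_depth:]
    noisy_params = {p for m in noisy_modules for p in m.parameters()}
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
            if p in noisy_params:
                # Gaussian perturbation, scaled with the step size.
                p += sigma * math.sqrt(lr) * torch.randn_like(p)

net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(),
                    nn.Linear(16, 16), nn.ReLU(),
                    nn.Linear(16, 1))
loss = net(torch.randn(8, 4)).pow(2).mean()
layer_langevin_step(net, loss, noisy_depth=2)
```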
Most modern convolutional neural networks (CNNs) used for object recognition are built using the same principles: Alternating convolution and max-pooling layers followed by a small number of fully connected layers. We re-evaluate the state of the art for object recognition from small images with convolutional networks, questioning the necessity of different components in the pipeline. We find that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks. Following this finding -and building on other recent work for finding simple network structures -we propose a new architecture that consists solely of convolutional layers and yields competitive or state of the art performance on several object recognition datasets (CIFAR-10, CIFAR-100, ImageNet). To analyze the network we introduce a new variant of the "deconvolution approach" for visualizing features learned by CNNs, which can be applied to a broader range of network structures than existing approaches.
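A small sketch of the substitution described above: a conv + max-pooling stage next to an all-convolutional stage where pooling is replaced by a stride-2 convolution, both halving the spatial resolution. The channel counts and kernel sizes are illustrative, only loosely in the style of the paper's CIFAR networks.

```python
import torch
import torch.nn as nn

# Conventional downsampling: convolution followed by max-pooling.
pooled = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
# All-convolutional variant: the pooling layer becomes a stride-2 convolution.
all_conv = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)
x = torch.randn(1, 3, 32, 32)
print(pooled(x).shape, all_conv(x).shape)  # both downsample 32x32 -> 16x16
```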
Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer. An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks. However, this hypothesis has never been systematically tested. Here, we compare the performance of 16 classification networks on 12 image classification datasets. We find that, when networks are used as fixed feature extractors or fine-tuned, there is a strong correlation between ImageNet accuracy and transfer accuracy (r = 0.99 and 0.96, respectively). In the former setting, we find that this relationship is very sensitive to the way in which networks are trained on ImageNet; many common forms of regularization slightly improve ImageNet accuracy but yield penultimate layer features that are much worse for transfer learning. Additionally, we find that, on two small fine-grained image classification datasets, pretraining on ImageNet provides minimal benefits, indicating the learned features from ImageNet do not transfer well to fine-grained tasks. Together, our results show that ImageNet architectures generalize well across datasets, but ImageNet features are less general than previously suggested.
Spiking neural networks (SNNs) have become an interesting alternative to conventional artificial neural networks (ANNs) thanks to their temporal processing capabilities, their low SWaP (size, weight, and power), and their energy-efficient implementation in neuromorphic hardware. However, the challenges involved in training SNNs have limited their performance in terms of accuracy and thus restricted their applications. Improving learning algorithms and neural architectures for more accurate feature extraction is therefore one of the current priorities in SNN research. In this paper, we present a study of the key components of modern spiking architectures. We empirically compare different techniques on image classification datasets, taken from the best-performing networks. We design a spiking version of the successful residual network (ResNet) architecture and test different components and training strategies on it. Our results provide a state-of-the-art guide to SNN design, allowing informed choices when trying to build the optimal visual feature extractor. Finally, our network outperforms previous SNN architectures on the CIFAR-10 (94.1%) and CIFAR-100 (74.5%) datasets and matches the state of the art on DVS-CIFAR10 (71.3%), with fewer parameters than the previous state of the art and without requiring ANN-to-SNN conversion. Code is available at https://github.com/vicenteax/spiking_resnet.
This paper proposes a novel and insightful activation method, termed FPLUS, which exploits a mathematical power function with a polar sign in its form. It is enlightened by common inverse operations while endowed with an intuitive meaning of bionics. The formulation is theoretically derived under conditions of some prior knowledge and anticipated properties, and its feasibility is then verified through a series of experiments on typical benchmark datasets, whose results indicate that our method possesses superior competitiveness among numerous activation functions, as well as compatible stability across many CNN architectures. Furthermore, we extend the presented function to a more generalized type called PFPLUS, with two parameters that can be fixed or learnable, so as to increase its expressive capacity, and the same test results verify this improvement.
Recent results suggest that reinitializing a subset of the parameters of a neural network during training can improve generalization, particularly for small training sets. We study the impact of different reinitialization methods in several convolutional architectures across 12 benchmark image classification datasets, analyzing their potential gains and highlighting their limitations. We also introduce a new layerwise reinitialization algorithm that outperforms previous methods and suggest explanations for the observed improved generalization. First, we show that layerwise reinitialization increases the margin on the training examples without increasing the norm of the weights, hence leading to an improvement of margin-based generalization bounds for neural networks. Second, we demonstrate that it settles in flatter local minima of the loss surface. Third, it encourages learning general rules and discourages memorization by emphasizing the lower layers of the neural network. Our takeaway message is that the accuracy of convolutional neural networks can be improved for small datasets using bottom-up layerwise reinitialization, where the number of reinitialized layers may vary depending on the available compute budget.
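A minimal sketch of the bottom-up idea under stated assumptions: keep the trained weights of the first few (lower) modules and reinitialize everything above them between training stages. The helper name, the `keep_bottom` parameter, and the single-stage usage are illustrative; the paper's actual reinitialization schedule is not reproduced here.

```python
import torch.nn as nn

def reinit_above(model: nn.Sequential, keep_bottom: int) -> None:
    """Reinitialize every module above the first `keep_bottom` ones while the
    lower layers keep their trained weights (sketch of bottom-up layerwise
    reinitialization)."""
    for module in list(model)[keep_bottom:]:
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 10))
# ... train `net` for a stage, then restart training of the upper layers:
reinit_above(net, keep_bottom=2)
```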
To enhance the nonlinearity of neural networks and increase their mapping ability between inputs and response variables, activation functions play an essential role in modeling more complex relationships and patterns in the data. In this work, a novel methodology is proposed to adaptively customize activation functions by adding only very few parameters to traditional activation functions such as Sigmoid, Tanh, and ReLU. To verify the effectiveness of the proposed methodology, some theoretical and experimental analyses on accelerating convergence and improving performance are presented, and a series of experiments are conducted based on various network models (such as AlexNet, VGGNet, GoogLeNet, ResNet, and DenseNet) and various datasets (such as CIFAR-10, CIFAR-100, Mini-ImageNet, PASCAL VOC, and COCO). To further verify its validity and suitability under various optimization strategies and usage scenarios, comparative experiments are also implemented among different optimization strategies (such as SGD, Momentum, AdaGrad, AdaDelta, and Adam) and with different recognition tasks such as classification and detection. The results show that the proposed methodology is very simple but achieves significant performance in convergence speed, accuracy, and generalization, and it can surpass other popular methods such as ReLU and adaptive activation functions in almost all experiments in terms of overall performance. The code is publicly available at https://github.com/huhaigen/aptove-custivation. The package includes the proposed three adaptive activation functions for reproducibility purposes.
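The abstract does not give the exact parameterization, so the sketch below is only a generic illustration of "adding very few parameters to a traditional activation": a wrapper with a trainable input slope and output gain, f(x) = b·act(a·x). The class name and initial values are assumptions, not the authors' three proposed functions.

```python
import torch
import torch.nn as nn

class AdaptiveActivation(nn.Module):
    """Wrap a fixed activation with two trainable scalars: an input slope `a`
    and an output gain `b`, i.e. f(x) = b * act(a * x). Generic illustration
    only; the exact parameterization proposed in the paper may differ."""

    def __init__(self, act: nn.Module, init_a: float = 1.0, init_b: float = 1.0):
        super().__init__()
        self.act = act
        self.a = nn.Parameter(torch.tensor(init_a))
        self.b = nn.Parameter(torch.tensor(init_b))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.b * self.act(self.a * x)

layer = AdaptiveActivation(nn.Tanh())
print(layer(torch.randn(4, 8)).shape)  # the two scalars are learned with the network
```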
Capsule networks are a class of neural networks that achieve promising results on many computer vision tasks. However, baseline capsule networks fail to reach state-of-the-art results on more complex datasets due to their high computation and memory requirements. We address this problem by proposing a new network architecture, called Momentum Capsule Network (MoCapsNet). MoCapsNets are inspired by Momentum ResNets, a type of network that applies reversible residual building blocks. Reversible networks allow the activations of the forward pass to be recomputed in the backpropagation algorithm, so memory requirements can be drastically reduced. In this paper, we provide a framework for how invertible residual building blocks can be applied to capsule networks. We show that MoCapsNet beats the accuracy of baseline capsule networks on MNIST, SVHN, CIFAR-10, and CIFAR-100 while using considerably less memory. The source code is available at https://github.com/moejoe95/mocapsnet.
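To illustrate why reversible blocks save memory, the sketch below implements a plain additive reversible (RevNet-style) residual block whose inverse is exact, so forward activations can be recomputed rather than stored. This is a sketch of the general building block only; MoCapsNet builds on the momentum (velocity-based) formulation of Momentum ResNets, which is not reproduced here.

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Additive reversible residual block: the input is split into two halves
    and mapped as y1 = x1 + F(x2), y2 = x2 + G(y1). Because the inverse is
    exact, activations need not be stored for backpropagation."""

    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1: torch.Tensor, y2: torch.Tensor):
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

blk = ReversibleBlock(nn.Linear(8, 8), nn.Linear(8, 8))
x1, x2 = torch.randn(2, 8), torch.randn(2, 8)
y1, y2 = blk(x1, x2)
r1, r2 = blk.inverse(y1, y2)
print(torch.allclose(r1, x1, atol=1e-6), torch.allclose(r2, x2, atol=1e-6))
```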
Nonlinear activation functions endow neural networks with the ability to learn complex high-dimensional functions. The choice of activation function is an important hyperparameter that determines the performance of deep neural networks. It significantly affects the gradient flow, the speed of training, and ultimately the representational power of the neural network. Saturating activation functions like sigmoids suffer from the vanishing gradient problem and cannot be used in deep neural networks. Universal approximation theorems guarantee that multilayer networks of sigmoids and ReLU can learn arbitrarily complex continuous functions to any accuracy. Despite the ability of multilayer neural networks to learn arbitrarily complex functions, each neuron in a conventional neural network (networks using sigmoids, ReLU, and the like) has a single hyperplane as its decision boundary and hence performs a linear classification. Thus, single neurons with sigmoid, ReLU, Swish, and Mish activation functions cannot learn the XOR function. Recent research has discovered biological neurons, in layers two and three of the human cortex, that have oscillating activation functions and are capable of individually learning the XOR function. The presence of oscillating activation functions in biological neurons might partially explain the performance gap between biological and artificial neural networks. This paper proposes four new oscillating activation functions that enable a single neuron to learn the XOR function without manual feature engineering. The paper explores the possibility of using oscillating activation functions to solve classification problems with fewer neurons and to reduce training time.
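The four functions proposed in the paper are not reproduced here; as a generic illustration of why an oscillating activation suffices, a single neuron with weights (π, π), zero bias, and a cosine-shaped activation outputs +1 on {(0,0), (1,1)} and -1 on {(0,1), (1,0)}, so thresholding at zero yields XOR without any hidden layer.

```python
import math
import torch

def single_neuron_xor(x: torch.Tensor) -> torch.Tensor:
    """One neuron, weights (pi, pi), bias 0, cosine activation: the output is
    +1 when the two binary inputs agree and -1 when they differ. The cosine
    is a stand-in oscillating activation, not one of the paper's four."""
    w = torch.tensor([math.pi, math.pi])
    z = x @ w            # single pre-activation
    return torch.cos(z)  # oscillating activation

inputs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
print((single_neuron_xor(inputs) < 0).int())  # tensor([0, 1, 1, 0]) == XOR
```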