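In recent years, using orthogonal matrices has been shown to be a promising approach to improving the training, stability, and convergence of recurrent neural networks (RNNs), particularly by controlling gradients. Gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, but they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent exploding gradients and enhance long-term memory. We study where to use orthogonal matrices, and we propose a Neumann series-based scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley Orthogonal GRU, or simply NC-GRU. We present detailed experiments with our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU as well as several other RNNs.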
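To make the idea in the abstract above more concrete, here is a minimal sketch of a scaled Cayley transform whose matrix inverse is replaced by a truncated Neumann series. It is not the authors' implementation. It assumes the common parametrization W = (I + A)^{-1}(I - A)D with A skew-symmetric and D diagonal with entries of ±1, and the helper names (`neumann_inverse`, `scaled_cayley`, `orthogonality_error`) as well as the truncation order are hypothetical. The series only converges when the norm of A is below 1.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y, sign=1.0):
    # Elementwise X + sign * Y.
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def neumann_inverse(A, order):
    """Approximate (I + A)^{-1} by sum_{k=0}^{order} (-A)^k (assumes ||A|| < 1)."""
    n = len(A)
    neg_A = [[-a for a in row] for row in A]
    term = identity(n)
    total = identity(n)
    for _ in range(order):
        term = matmul(term, neg_A)  # next power of (-A)
        total = add(total, term)
    return total


def scaled_cayley(A, D, order=20):
    """Approximate W = (I + A)^{-1} (I - A) D with a truncated Neumann series."""
    n = len(A)
    inv = neumann_inverse(A, order)
    return matmul(matmul(inv, add(identity(n), A, sign=-1.0)), D)


def orthogonality_error(W):
    """Largest absolute entry of W^T W - I."""
    n = len(W)
    Wt = [list(col) for col in zip(*W)]
    G = matmul(Wt, W)
    I = identity(n)
    return max(abs(G[i][j] - I[i][j]) for i in range(n) for j in range(n))


# Hypothetical example: a small skew-symmetric A (A^T = -A) and a +/-1 diagonal D.
A = [[0.0, 0.3, -0.1],
     [-0.3, 0.0, 0.2],
     [0.1, -0.2, 0.0]]
D = [[1.0, 0.0, 0.0],
     [0.0, -1.0, 0.0],
     [0.0, 0.0, 1.0]]
W = scaled_cayley(A, D)
print(orthogonality_error(W))
```

With an exact inverse, W would be exactly orthogonal. With the truncated series it is only approximately so, and the error shrinks as the truncation order grows.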
Translated by Google Translate
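Methods such as Layer Normalization (LN) and Batch Normalization (BN) have proven effective in improving the training of recurrent neural networks (RNNs). However, existing methods normalize using only the instantaneous information at one particular time step, and the result of the normalization is a pre-activation state with a time-independent distribution. This implementation fails to account for certain temporal differences inherent in the inputs and the architecture of RNNs. Since these networks share weights across time steps, it may also be desirable to account for the connections between time steps in the normalization scheme. In this paper, we propose a normalization method called Assorted-Time Normalization (ATN), which preserves information from multiple consecutive time steps and normalizes using them. This setup allows us to introduce longer time dependencies into traditional normalization methods without introducing any new trainable parameters. We present theoretical derivations for the gradient propagation and prove the weight scaling invariance property. Our experiments applying ATN to LN demonstrate consistent improvements on various tasks, such as the Adding, Copying, and Denoise problems and language modeling problems.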
In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on the tasks of polyphonic music modeling and speech signal modeling. Our experiments revealed that these advanced recurrent units are indeed better than more traditional recurrent units such as tanh units. Also, we found GRU to be comparable to LSTM.
We introduce a novel gated recurrent unit (GRU) with a weighted time-delay feedback mechanism in order to improve the modeling of long-term dependencies in sequential data. This model is a discretized version of a continuous-time formulation of a recurrent unit, where the dynamics are governed by delay differential equations (DDEs). By considering a suitable time-discretization scheme, we propose $\tau$-GRU, a discrete-time gated recurrent unit with delay. We prove the existence and uniqueness of solutions for the continuous-time model, and we demonstrate that the proposed feedback mechanism can help improve the modeling of long-term dependencies. Our empirical results show that $\tau$-GRU can converge faster and generalize better than state-of-the-art recurrent units and gated recurrent architectures on a range of tasks, including time-series classification, human activity recognition, and speech recognition.
Common to all kinds of recurrent neural networks (RNNs) is the intention to model relations between data points through time. When there is no immediate relationship between subsequent data points (for example, when the data points are generated at random), we show that RNNs are still able to remember a few data points back into the sequence by memorizing them by heart using standard backpropagation. However, we also show that for classical RNNs, LSTM and GRU networks the distance of data points between recurrent calls that can be reproduced this way is highly limited (compared to even a loose connection between data points) and subject to various constraints imposed by the type and size of the RNN in question. This implies the existence of a hard limit (way below the information-theoretic one) for the distance between related data points within which RNNs are still able to recognize said relation.
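Predicting fund performance is beneficial to both investors and fund managers, yet it is a challenging task. In this paper, we test whether deep learning models can predict fund performance more accurately than traditional statistical techniques. Fund performance is typically evaluated by the Sharpe ratio, which represents risk-adjusted performance to ensure meaningful comparability across funds. We calculated annualised Sharpe ratios from monthly return time series for more than 600 open-end mutual funds investing in listed large-cap equities in the United States. We find that long short-term memory (LSTM) and gated recurrent unit (GRU) deep learning methods, both trained with modern Bayesian optimization, forecast funds' Sharpe ratios more accurately than traditional statistical methods. An ensemble method that combines the forecasts from LSTM and GRU achieves the best performance of all models. There is evidence that deep learning and ensembling offer promising solutions to the challenge of fund performance forecasting.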
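As a rough illustration of the quantity being forecast, the sketch below computes an annualised Sharpe ratio from a monthly return series. The abstract does not state the exact convention used, so this assumes the widespread one: mean monthly excess return divided by the sample standard deviation of monthly excess returns, scaled by the square root of 12. The function name, the constant risk-free rate, and the sample returns are all hypothetical.

```python
import math
import statistics


def annualised_sharpe(monthly_returns, monthly_risk_free=0.0):
    """Annualised Sharpe ratio from monthly returns (assumed sqrt(12) scaling)."""
    excess = [r - monthly_risk_free for r in monthly_returns]
    mean_excess = statistics.mean(excess)
    std_excess = statistics.stdev(excess)  # sample standard deviation
    if std_excess == 0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return mean_excess / std_excess * math.sqrt(12)


# Hypothetical monthly returns for one fund (made-up numbers, not real data).
returns = [0.01, 0.03, -0.01, 0.02, 0.00]
print(annualised_sharpe(returns))
```

A forecasting model such as an LSTM or GRU would then be trained to predict this ratio for a future period from past return sequences.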
We explore relations between the hyper-parameters of a recurrent neural network (RNN) and the complexity of string sequences it is able to memorize. We compare long short-term memory (LSTM) networks and gated recurrent units (GRUs). We find that an increase of RNN depth does not necessarily result in better memorization capability when the training time is constrained. Our results also indicate that the learning rate and the number of units per layer are among the most important hyper-parameters to be tuned. Generally, GRUs outperform LSTM networks on low complexity sequences while on high complexity sequences LSTMs perform better.
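Recurrent neural networks (RNNs) are powerful tools for sequential modeling, but they typically require significant overparameterization and regularization to achieve optimal performance. This makes it difficult to deploy large RNNs in resource-limited settings, while also complicating hyperparameter selection and training. To address these issues, we introduce a "fully tensorized" RNN architecture that jointly encodes the separate weight matrices within each recurrent cell using a lightweight tensor-train (TT) factorization. This approach represents a novel form of weight sharing that reduces model size by several orders of magnitude while maintaining similar or better performance compared to standard RNNs. Experiments on image classification and speaker verification tasks demonstrate further benefits in reducing inference time and stabilizing model training and hyperparameter selection.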
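Target propagation (TP) algorithms compute targets instead of gradients along neural networks and propagate them backward in a way that is similar to, yet different from, gradient back-propagation (BP). The idea was first presented as a perturbative alternative to back-propagation that may achieve greater accuracy in gradient evaluation when training multi-layer neural networks (LeCun et al., 1989). However, TP has remained more of a template algorithm with many variations than a well-identified algorithm. Revisiting the insights of LeCun et al. (1989) and, more recently, of Lee et al. (2015), we present a simple version of target propagation based on regularized inversion of network layers, which is implementable in a differentiable programming framework. We compare its computational complexity to that of BP and delineate the regimes in which TP can be attractive compared to BP. We show how our TP can be used to train recurrent neural networks with long sequences on various sequence modeling problems. The experimental results underscore the importance of regularization in TP in practice.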
The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. We propose a novel simplified gating mechanism that outperforms Oord et al. (2016b) and investigate the impact of key architectural decisions. The proposed approach achieves state-of-the-art on the WikiText-103 benchmark, even though it features long-term dependencies, as well as competitive results on the Google Billion Words benchmark. Our model reduces the latency to score a sentence by an order of magnitude compared to a recurrent baseline. To our knowledge, this is the first time a non-recurrent approach is competitive with strong recurrent models on these large scale language tasks.
Recurrent neural networks are a widely used class of neural architectures. They have, however, two shortcomings. First, they are often treated as black-box models and as such it is difficult to understand what exactly they learn as well as how they arrive at a particular prediction. Second, they tend to work poorly on sequences requiring long-term memorization, despite having this capacity in principle. We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. This mechanism, which we term state-regularization, makes RNNs transition between a finite set of learnable states. We evaluate state-regularized RNNs on (1) regular languages for the purpose of automata extraction; (2) non-regular languages such as balanced parentheses and palindromes where external memory is required; and (3) real-world sequence learning tasks for sentiment analysis, visual object recognition and text categorisation. We show that state-regularization (a) simplifies the extraction of finite state automata that display an RNN's state transition dynamic; (b) forces RNNs to operate more like automata with external memory and less like finite state machines, which potentially leads to a more structural memory; (c) leads to better interpretability and explainability of RNNs by leveraging the probabilistic finite state transition mechanism over time steps.
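The scalability of recurrent neural networks (RNNs) is hindered by the sequential dependence of each time step's computation on the previous time step's output. Therefore, one way to speed up and scale RNNs is to reduce the computation required at each time step, independent of model size and task. In this paper, we propose a model that reformulates gated recurrent units (GRU) as an event-based, activity-sparse model that we call the Event-based GRU (EGRU), in which units compute updates only on receipt of input events (event-based) from other units. When combined with having only a small fraction of the units active at a time (activity-sparse), this model has the potential to be far more compute-efficient than current RNNs. Notably, activity sparsity in our model also translates into sparse parameter updates during gradient descent, extending this compute efficiency to the training phase. We show that the EGRU achieves competitive performance compared to state-of-the-art recurrent network models on real-world tasks, including language modeling, while naturally maintaining high activity sparsity during inference and training. This sets the stage for the next generation of recurrent networks that are scalable and better suited to novel neuromorphic hardware.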
Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful fANOVA framework. In total, we summarize the results of 5400 experimental runs (≈ 15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
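Recurrent neural networks with a gating mechanism, such as LSTM or GRU, are powerful tools for modeling sequential data. Within this mechanism, the forget gate, which was introduced to control information flow in the hidden state of the RNN, has recently been re-interpreted as representing the time scale of the state, i.e., a measure of how long the RNN retains information about its inputs. On the basis of this interpretation, several parameter initialization methods that exploit prior knowledge of temporal dependencies in the data have been proposed to improve learnability. However, the interpretation relies on various unrealistic assumptions, such as that there are no inputs after a certain time point. In this work, we reconsider this interpretation of the forget gate in a more realistic setting. We first generalize the existing theory of gated RNNs so that we can consider the case where inputs are given successively. We then argue that the interpretation of the forget gate as a temporal representation is valid when the gradient of the loss with respect to the state decreases as time goes back. We empirically demonstrate that existing RNNs satisfy this gradient condition in the initial training phase on several tasks, which is in good agreement with previous initialization methods. On the basis of this finding, we propose an approach for constructing new RNNs that can represent longer time scales than conventional models, which will improve learnability for long-term sequential data. We verify the effectiveness of our method through experiments with real-world datasets.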
Recent developments in quantum computing and machine learning have propelled the interdisciplinary study of quantum machine learning. Sequential modeling is an important task with high scientific and commercial value. Existing VQC or QNN-based methods require significant computational resources to perform the gradient-based optimization of a larger number of quantum circuit parameters. The major drawback is that such quantum gradient calculation requires a large amount of circuit evaluation, posing challenges in current near-term quantum hardware and simulation software. In this work, we approach sequential modeling by applying a reservoir computing (RC) framework to quantum recurrent neural networks (QRNN-RC) that are based on classical RNN, LSTM and GRU. The main idea to this RC approach is that the QRNN with randomly initialized weights is treated as a dynamical system and only the final classical linear layer is trained. Our numerical simulations show that the QRNN-RC can reach results comparable to fully trained QRNN models for several function approximation and time series prediction tasks. Since the QRNN training complexity is significantly reduced, the proposed model trains notably faster. In this work we also compare to corresponding classical RNN-based RC implementations and show that the quantum version learns faster by requiring fewer training epochs in most cases. Our results demonstrate a new possibility to utilize quantum neural network for sequential modeling with greater quantum hardware efficiency, an important design consideration for noisy intermediate-scale quantum (NISQ) computers.
Recurrent neural networks (RNNs) are the backbone of many text and speech applications. These architectures are typically made up of several computationally complex components such as non-linear activation functions, normalization, bi-directional dependence and attention. In order to maintain good accuracy, these components are frequently run using full-precision floating-point computation, making them slow, inefficient and difficult to deploy on edge devices. In addition, the complex nature of these operations makes them challenging to quantize using standard quantization methods without a significant performance drop. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions, to serve a wide range of state-of-the-art RNNs. The proposed method enables RNN-based language models to run on edge devices with a $2\times$ improvement in runtime and a $4\times$ reduction in model size while maintaining accuracy similar to that of their full-precision counterparts.
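The piecewise linear (PWL) approximation mentioned in the abstract above can be illustrated with a small sketch. This is not the paper's method: the abstract describes an adaptive, integer-only scheme, whereas the code below uses fixed, uniformly spaced knots and ordinary floating-point arithmetic purely to show the basic idea of replacing `tanh` with linear segments. The function names, the knot range, and the number of pieces are hypothetical.

```python
import bisect
import math


def build_pwl(fn, lo, hi, pieces):
    """Sample fn at uniformly spaced knots on [lo, hi] (assumed, non-adaptive)."""
    xs = [lo + (hi - lo) * i / pieces for i in range(pieces + 1)]
    ys = [fn(x) for x in xs]
    return xs, ys


def eval_pwl(xs, ys, x):
    """Evaluate the piecewise linear interpolant, clamping outside the knot range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x) - 1  # index of the segment containing x
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Approximate tanh with 16 linear pieces on [-4, 4] and measure the worst error
# on a grid that also extends beyond the knot range.
xs, ys = build_pwl(math.tanh, -4.0, 4.0, 16)
max_err = max(
    abs(eval_pwl(xs, ys, k / 100) - math.tanh(k / 100)) for k in range(-600, 601)
)
print(max_err)
```

An adaptive scheme would instead place more knots where the function curves most, which reduces the error for the same number of pieces.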
The time-series forecasting (TSF) problem is a traditional problem in the field of artificial intelligence. Models such as the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) have contributed to improving the predictive accuracy of TSF. Furthermore, model structures have been proposed that incorporate time-series decomposition methods, such as seasonal-trend decomposition using Loess (STL), to further improve predictive accuracy. However, because this approach learns an independent model for each component, it cannot learn the relationships between time-series components. In this study, we propose a new neural architecture called a correlation recurrent unit (CRU) that can perform time-series decomposition within a neural cell and learn correlations (autocorrelation and correlation) between the decomposition components. The proposed neural architecture was evaluated through comparative experiments with previous studies using five univariate time-series datasets and four multivariate time-series datasets. The results showed that long- and short-term predictive performance was improved by more than 10%. The experimental results show that the proposed CRU is an excellent method for TSF problems compared to other neural architectures.
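Attention-based deep networks have been successfully applied to textual data in the field of NLP. However, unlike plain text words, their application to protein sequences poses additional challenges. These unexplored challenges faced by the standard attention technique include (i) the vanishing attention score problem and (ii) high variation in the attention distribution. In this regard, we introduce a novel $\lambda$-scaled attention technique for fast and efficient modeling of protein sequences that addresses both of the above problems. It is used to develop the $\lambda$-scaled attention network, which is evaluated on the task of protein function prediction implemented at the protein sequence level. Experiments on the biological process (BP) and molecular function (MF) datasets showed significant improvements in F1 score for the proposed $\lambda$-scaled attention technique over its counterpart based on the standard attention technique (+2.01% for BP and +4.67% for MF) and over the state-of-the-art ProtVecGen-Plus approach (2.61% for BP and 2.20% for MF). In addition, fast convergence (in half the number of epochs) and efficient learning (in terms of the difference between training and validation losses) were observed during training.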
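Autism datasets were studied to identify the differences between autistic and healthy groups. For this purpose, the resting-state functional magnetic resonance imaging (rs-fMRI) data of the two groups were analyzed, and networks of connections between brain regions were created. Several classification frameworks were developed to distinguish the connectivity patterns between the groups. The best models for statistical inference and for accuracy were compared, and the trade-off between accuracy and model interpretability was analyzed. Finally, classification accuracy measures are reported to demonstrate the performance of our framework. Our best model can classify autistic and healthy patients on the multi-site I data with 71% accuracy.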
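Deep learning uses neural networks that are parameterised by their weights. Neural networks are usually trained by tuning the weights to directly minimise a given loss function. In this paper, we propose to re-parameterise the weights into targets for the firing strengths of the individual nodes in the network. Given a set of targets, it is possible to calculate the weights that make the firing strengths best meet those targets. It is argued that using targets for training addresses the problem of exploding gradients, through a process we call cascade untangling, and makes the loss-function surface smoother, and so leads to easier and faster training, and potentially better generalisation, of the neural network. It also allows for easier learning of deeper and recurrent network structures. The necessary conversion of targets to weights comes at an extra computational expense, which is manageable in many cases. Learning in target space can be combined with existing neural-network optimisers for extra gain. Experimental results show the speed of using target space, along with examples of improved generalisation, for fully connected networks and convolutional networks, as well as the ability to recall and process long time sequences and to perform natural-language processing with recurrent networks.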