We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
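The "simple and effective method" this abstract alludes to is a mixture-of-softmaxes output layer, which breaks the rank limit of a single softmax by mixing several softmax distributions with context-dependent weights. The sketch below is a minimal NumPy illustration under that assumption; the shapes, the tanh projection, and all parameter names are simplifications, not the paper's exact parameterization.

```python
import numpy as np

def mixture_of_softmaxes(h, W_ctx, W_prior, E, K):
    """Mixture-of-softmaxes output layer (illustrative sketch).

    h: context vector (d,); W_ctx: (K, d, d) per-component projections;
    W_prior: (K, d) mixture-weight projections; E: word embeddings (V, d).
    """
    # Context-dependent mixture weights pi_k (softmax over K components)
    prior_logits = W_prior @ h                       # (K,)
    pi = np.exp(prior_logits - prior_logits.max())
    pi /= pi.sum()

    probs = np.zeros(E.shape[0])
    for k in range(K):
        g = np.tanh(W_ctx[k] @ h)                    # component-specific context (d,)
        logits = E @ g                               # (V,) word scores
        p = np.exp(logits - logits.max())
        probs += pi[k] * p / p.sum()                 # mix K softmax distributions
    return probs                                     # valid distribution over V words

rng = np.random.default_rng(0)
d, V, K = 8, 20, 3
p = mixture_of_softmaxes(rng.normal(size=d),
                         rng.normal(size=(K, d, d)),
                         rng.normal(size=(K, d)),
                         rng.normal(size=(V, d)), K)
print(round(p.sum(), 6))  # 1.0
```

Because the mixture weights vary with the context, the resulting log-probability matrix is no longer constrained to the low rank of a single softmax.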
This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from a final RNN layer but also from intermediate layers. Our proposed method raises the expressive power of a language model, building on the matrix factorization view of language modeling introduced by Yang et al. (2018). The proposed method improves on current state-of-the-art language models and achieves the best scores on Penn Treebank and WikiText-2, which are standard benchmark datasets. Moreover, we show that our proposed method contributes to two application tasks: machine translation and headline generation. Our code is publicly available at: https://github.com/nttcslab-nlp/doc_lm.
Highly regularized LSTMs have achieved impressive results on several benchmark datasets in language modeling. We propose a new regularization method based on decoding the last token in the context using the predicted distribution of the next token. This biases the model toward retaining more contextual information, in turn improving its ability to predict the next token. With negligible overhead in the number of parameters and training time, our past-decode regularization (PDR) method achieves state-of-the-art word-level perplexity on the Penn Treebank (55.6) and WikiText-2 (63.5) datasets, and state-of-the-art results on the Penn Treebank Character (1.169) dataset for character-level language modeling. Using dynamic evaluation, we also achieve the first sub-50 perplexity of 49.3 on the Penn Treebank test set.
Automatic evaluation of semantic rationality is an important yet challenging task, and current automatic techniques cannot reliably identify whether a sentence is semantically rational. Language-model-based methods measure a sentence not by its rationality but by its commonness, while methods based on similarity to human-written sentences fail when no human-written references are available. In this paper, we propose a novel model, the Sememe-Word-Matching Neural Network (SWM-NN), to tackle semantic rationality evaluation by leveraging the sememe knowledge base HowNet. The advantage is that our model can utilize a proper combination of sememes to represent the fine-grained semantic meaning of a word in a specific context. We use these fine-grained semantic representations to help the model learn semantic dependencies among words. To evaluate the effectiveness of the proposed model, we build a large-scale rationality evaluation dataset. Experimental results on this dataset show that the proposed model outperforms competitive baselines with a 5.4% improvement in accuracy.
The long short-term memory (LSTM) language model (LM) has been widely investigated for automatic speech recognition (ASR) and natural language processing (NLP). Although excellent performance is obtained for large vocabulary tasks, tremendous memory consumption prohibits the use of LSTM LMs in low-resource devices. The memory consumption mainly comes from the word embedding layer. In this paper, a novel binarized LSTM LM is proposed to address the problem. Words are encoded into binary vectors and other LSTM parameters are further binarized to achieve high memory compression. This is the first effort to investigate binary LSTMs for large vocabulary language modeling. Experiments on both English and Chinese LM and ASR tasks showed that binarization can achieve a compression ratio of 11.3 without any loss of LM and ASR performance and a compression ratio of 31.6 with acceptable minor performance degradation.
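The compression idea can be sketched in a few lines: store only the sign of each weight plus a small floating-point scale, so each entry costs one bit instead of 32. The per-row mean-absolute-value scaling below follows the common XNOR-style scheme and is an assumption about the general technique, not the paper's exact binarization.

```python
import numpy as np

def binarize(W):
    """Binarize a weight matrix: keep a per-row scale, store signs only.

    Approximates W as alpha * sign(W), where alpha is the per-row mean
    absolute value. The sign matrix B needs 1 bit per entry, which is
    where the memory compression comes from.
    """
    alpha = np.abs(W).mean(axis=1, keepdims=True)   # (rows, 1) float scales
    B = np.where(W >= 0, 1.0, -1.0)                 # entries in {-1, +1}
    return alpha, B

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))                         # stand-in for an embedding block
alpha, B = binarize(W)
W_hat = alpha * B                                   # dequantized approximation of W
print(sorted(set(B.ravel().tolist())))              # [-1.0, 1.0]
```

At inference time only `alpha` and the packed sign bits need to be stored; `W_hat` is reconstructed on the fly.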
Sememes are minimum semantic units of word meanings, and the meaning of each word sense is typically composed of several sememes. Since sememes are not explicit for each word, people manually annotate word sememes and form linguistic common-sense knowledge bases. In this paper, we show that word sememe information can improve word representation learning (WRL), which maps words into a low-dimensional semantic space and serves as a fundamental step for many NLP tasks. The key idea is to utilize word sememes to accurately capture the exact meanings of a word within specific contexts. More specifically, we follow the framework of Skip-gram and present three sememe-encoded models to learn representations of sememes, senses, and words, where we apply an attention scheme to detect word senses in various contexts. We conduct experiments on two tasks, word similarity and word analogy, and our models significantly outperform baselines. The results indicate that WRL can benefit from sememes via the attention scheme, and also confirm that our models are capable of correctly modeling sememe information.
In natural language processing (NLP), it is important to detect the relationship between two sequences, or to generate a sequence of tokens given another observed sequence. We refer to this class of problems, which models sequence pairs, as sequence-to-sequence (seq2seq) mapping problems. A great deal of research has been devoted to finding ways to tackle these problems, with traditional approaches relying on a combination of hand-crafted features, alignment models, segmentation heuristics, and external linguistic resources. Although great progress has been made, these traditional approaches suffer from various drawbacks, such as complicated pipelines, laborious feature engineering, and difficulty with domain adaptation. Recently, neural networks have emerged as a solution to many problems in NLP, speech recognition, and computer vision. Neural models are powerful because they can be trained end to end, they generalize well to unseen examples, and the same framework can be easily adapted to a new domain. The aim of this thesis is to advance the state of the art in seq2seq mapping problems with neural networks. We explore solutions from three major aspects: investigating neural models for representing sequences, modeling interactions between sequences, and using unpaired data to boost the performance of neural models. For each aspect, we propose novel models and evaluate their efficacy on various tasks of seq2seq mapping.
Sememes are the minimum semantic units of concepts in human languages; for example, a word sense is composed of one or more sememes. Words are usually manually annotated with their sememes by linguists, forming linguistic common-sense knowledge bases widely used in various NLP tasks. Recently, the lexical sememe prediction task was introduced: it aims to automatically recommend sememes for words, which is expected to improve annotation efficiency and consistency. However, existing sememe prediction methods usually rely on the external context of words to represent their meanings, and thus typically fail on low-frequency and out-of-vocabulary words. To address this issue for Chinese, we propose a novel framework that takes advantage of both the internal character information and the external context information of words. We conduct experiments on HowNet, a Chinese sememe knowledge base, and demonstrate that our framework substantially outperforms state-of-the-art baselines, maintaining robust performance even for low-frequency words.
Fixed-vocabulary language models fail to account for one of the most characteristic statistical facts of natural language: the frequent creation and reuse of new word types. Although character-level language models offer a partial solution in that they can create word types not attested in the training corpus, they do not capture the "bursty" distribution of such words. In this paper, we augment a hierarchical LSTM language model that generates sequences of word tokens character by character with a caching mechanism that learns to reuse previously generated words. To validate our model we construct a new open-vocabulary language modeling corpus (the Multilingual Wikipedia Corpus; MWC) from comparable Wikipedia articles in 7 typologically diverse languages and demonstrate the effectiveness of our model across this range of languages.
A recent "third wave" of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research.
Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This survey offers a brief introduction to the field and a quick overview of deep learning architectures and methods. It then sifts through a plethora of recent studies and summarizes a large assortment of relevant contributions. The analyzed research areas include several core linguistic processing issues as well as many applications of computational linguistics. A discussion of the current state of the art is then provided, along with recommendations for future research in the field.
Language models are at the core of many NLP problems and have always been of great interest to researchers. Neural language models enjoy the advantages of distributed representations and long-range context. With its particular dynamics that allow information to cycle within the network, the recurrent neural network (RNN) became an ideal paradigm for neural language modeling. The long short-term memory (LSTM) architecture addresses the shortcomings of the standard RNN in modeling long-range context. Despite a plethora of RNN variants, adding multiple memory cells within an LSTM node has seldom been explored. Here, we propose a multi-cell node architecture for LSTMs and study its applicability to neural language modeling. The proposed multi-cell LSTM language model outperforms state-of-the-art results on the well-known Penn Treebank (PTB) setup.
Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary consisting of a limited word set. Indeed, while subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate the ability of these models to assist next-word prediction in language modeling tasks. Such subword-informed models should be particularly effective for morphologically rich languages (MRLs) that exhibit high type-to-token ratios. In this work, we present a large-scale LM study on 50 typologically diverse languages covering a wide variety of morphological systems, and offer new LM benchmarks to the community, while considering subword-level information. The main technical contribution of our work is a novel method for injecting subword-level information into semantic word vectors, integrated into neural language model training, to facilitate word-level prediction. We conduct experiments in the LM setting where the number of infrequent words is large, and demonstrate strong perplexity gains across our 50 languages, especially for morphologically rich languages. Our code and data sets are publicly available.
Recurrent neural networks (RNNs) have been widely used for natural language tasks and have achieved great success. A conventional RNN usually treats each token in a sentence uniformly and equally. However, this may miss the rich semantic structure information of a sentence, which is useful for understanding natural language. Since semantic structures such as word dependency patterns are not parameterized, it is a challenge to capture and leverage structure information. In this paper, we propose an improved RNN variant, the multi-channel RNN (MC-RNN), to dynamically capture and leverage local semantic structure information. Concretely, MC-RNN contains multiple channels, each of which represents a local dependency pattern at a time. An attention mechanism is introduced to combine these patterns at each step, according to the semantic information. We then parameterize the structure information by adaptively selecting the most appropriate connection structure among the channels. In this way, MC-RNN can well capture diverse local structures and dependency patterns in sentences. To verify the effectiveness of MC-RNN, we conduct extensive experiments on typical natural language processing tasks, including neural machine translation, abstractive summarization, and language modeling. The experimental results on these tasks all show significant improvements of MC-RNN over current top systems.
Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model becomes very big (e.g., possibly beyond the memory capacity of a GPU device) and its training becomes very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use a 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column associated with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Since the words in the same row share the row vector and the words in the same column share the column vector, we only need 2√|V| vectors to represent a vocabulary of |V| unique words, which are far fewer than the |V| vectors required by existing approaches. Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up the training process, without sacrificing accuracy (it achieves similar, if not better, perplexity compared to state-of-the-art language models). Remarkably, on the One-Billion-Word benchmark dataset, our algorithm achieves comparable perplexity to previous language models, whilst reducing the model size by a factor of 40-100, and speeding up the training process by a factor of 2. We name our proposed algorithm LightRNN to reflect its very small model size and very high training speed.
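The 2-Component embedding can be sketched as follows: lay the vocabulary out in a √|V| × √|V| table and represent each word by a shared row vector and a shared column vector. The combination by concatenation below is one simple illustrative choice (the paper feeds the two components to the RNN at successive steps), and all names are ours, not the paper's.

```python
import numpy as np

def two_component_embed(word_id, R, C):
    """2-Component shared embedding (LightRNN-style sketch).

    Words are laid out in a sqrt(V) x sqrt(V) table; word_id's row index
    selects a row vector from R and its column index selects a column
    vector from C. All words in a row share R[r]; all words in a column
    share C[c].
    """
    n_cols = C.shape[0]
    r, c = divmod(word_id, n_cols)       # table position of this word
    return np.concatenate([R[r], C[c]])  # one simple way to combine components

V, d = 10_000, 16
n = int(np.ceil(np.sqrt(V)))             # a 100 x 100 table covers 10,000 words
rng = np.random.default_rng(0)
R, C = rng.normal(size=(n, d)), rng.normal(size=(n, d))
vec = two_component_embed(4_242, R, C)
print(vec.shape, 2 * n)                  # (32,) 200
```

Here 200 shared vectors stand in for the 10,000 per-word vectors a conventional embedding table would need, which is the source of the memory savings.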
We propose a selective encoding model to extend the sequence-to-sequence framework for abstractive sentence summarization. It consists of a sentence encoder, a selective gate network, and an attention-equipped decoder. The sentence encoder and decoder are built with recurrent neural networks. The selective gate network constructs a second-level sentence representation by controlling the information flow from encoder to decoder. The second-level representation is tailored for the sentence summarization task, which leads to better performance. We evaluate our model on the English Gigaword, DUC 2004 and MSR abstractive sentence summarization datasets. The experimental results show that the proposed selective encoding model outperforms the state-of-the-art baseline models.
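The selective gate described above can be sketched as an element-wise sigmoid gate computed from each encoder state and a whole-sentence vector; the gated states form the second-level representation passed to the decoder. The parameter names `W`, `U`, `b` and the exact gate formula below are illustrative assumptions, not the paper's precise equations.

```python
import numpy as np

def selective_gate(h_states, s, W, U, b):
    """Selective gate over encoder states (sketch of the abstract's idea).

    h_states: encoder hidden states (T, d); s: whole-sentence vector (d,).
    Each state is scaled element-wise by a sigmoid gate computed from the
    state itself and the sentence vector, filtering the information that
    flows from encoder to decoder.
    """
    gate = 1.0 / (1.0 + np.exp(-(h_states @ W.T + s @ U.T + b)))  # (T, d) in (0, 1)
    return h_states * gate   # second-level representation fed to the decoder

rng = np.random.default_rng(0)
T, d = 5, 8
h = rng.normal(size=(T, d))
s = rng.normal(size=d)
out = selective_gate(h, s, rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                     np.zeros(d))
print(out.shape)  # (5, 8)
```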
Continuous word representations (aka word embeddings) are a basic building block of many neural-network-based models used in natural language processing tasks. Although it is widely accepted that words with similar semantics should be close to each other in the embedding space, we find that the word embeddings learned in several tasks are biased toward word frequency: the embeddings of high-frequency and low-frequency words lie in different subregions of the embedding space, and the embeddings of a rare word and a popular word can be far from each other even if they are semantically similar. This makes the learned word embeddings ineffective, especially for rare words, and consequently limits the performance of these neural network models. In this paper, we develop a neat and effective way to learn FRequency-AGnostic word Embeddings (FRAGE) using adversarial training. We conduct comprehensive studies on ten datasets across four natural language processing tasks, including word similarity, language modeling, machine translation, and text classification. The results show that with FRAGE, we achieve higher performance than the baselines in all tasks.
In this paper, we propose a novel neural approach for paraphrase generation. Conventional paraphrase generation methods either leverage hand-written rules and thesauri-based alignments, or use statistical machine learning principles. To the best of our knowledge, this work is the first to explore deep learning models for paraphrase generation. Our primary contribution is a stacked residual LSTM network, where we add residual connections between LSTM layers. This allows for efficient training of deep LSTMs. We evaluate our model and other state-of-the-art deep learning models on three different datasets: PPDB, WikiAnswers and MSCOCO. Evaluation results demonstrate that our model outperforms sequence-to-sequence, attention-based and bi-directional LSTM models on BLEU, METEOR, TER and an embedding-based sentence similarity metric.
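The residual stacking the abstract describes follows the standard pattern of adding each layer's input to its output, which is what keeps gradients flowing through a deep stack. The sketch below shows only that wiring, with plain callables standing in for the LSTM layers.

```python
import numpy as np

def stacked_residual(x, layers):
    """Residual stacking pattern (framework-agnostic sketch).

    Each layer's output is added to its input before being fed to the
    next layer. `layers` are arbitrary callables standing in for LSTM
    layers; only the residual wiring is the point here.
    """
    h = x
    for layer in layers:
        h = h + layer(h)    # residual connection between consecutive layers
    return h

rng = np.random.default_rng(0)
d = 8
Ws = [rng.normal(size=(d, d)) * 0.1 for _ in range(4)]
layers = [lambda h, W=W: np.tanh(h @ W) for W in Ws]  # toy stand-in layers
y = stacked_residual(rng.normal(size=d), layers)
print(y.shape)  # (8,)
```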
We propose BlackOut, an approximation algorithm to efficiently train massive recurrent neural network language models (RNNLMs) with million-word vocabularies. BlackOut is motivated by using a discriminative loss, and we describe a weighted sampling strategy which significantly reduces computation while improving stability, sample efficiency, and rate of convergence. One way to understand BlackOut is to view it as an extension of the DropOut strategy to the output layer, wherein we use a discriminative training loss and a weighted sampling scheme. We also establish close connections between BlackOut, importance sampling, and noise contrastive estimation (NCE). Our experiments, on the recently released one-billion-word language modeling benchmark, demonstrate the scalability and accuracy of BlackOut; we outperform the state of the art, and achieve the lowest perplexity scores on this dataset. Moreover, unlike other established methods which typically require GPUs or CPU clusters, we show that a carefully implemented version of BlackOut requires only 1-10 days on a single machine to train an RNNLM with a million-word vocabulary and billions of parameters on one billion words. Although we describe BlackOut in the context of RNNLM training, it can be applied to any network with a large softmax output layer.
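The core trick can be sketched as a discriminative sampled loss: draw a few negatives from a power-raised unigram distribution and discriminate the target against only those samples, so the full |V|-way softmax is never materialized. The constants, the importance-weighted scoring, and all names below are illustrative assumptions in the spirit of the abstract, not the paper's precise equations.

```python
import numpy as np

def sampled_discriminative_loss(h, W, target, unigram, k, rng, alpha=0.4):
    """Weighted-sampling discriminative loss (BlackOut-style sketch).

    h: context vector (d,); W: output weights (V, d); unigram: word
    frequencies normalized to a distribution. Only k+1 rows of W are
    touched per example instead of all |V|.
    """
    q = unigram ** alpha            # power-raised proposal distribution
    q /= q.sum()
    q_neg = q.copy()
    q_neg[target] = 0.0             # never draw the target as a negative
    q_neg /= q_neg.sum()
    neg = rng.choice(len(q), size=k, replace=False, p=q_neg)
    idx = np.concatenate(([target], neg))
    s = np.exp(W[idx] @ h) / q[idx]     # importance-weighted candidate scores
    return -np.log(s[0] / s.sum())      # discriminative negative log-likelihood

rng = np.random.default_rng(0)
V, d = 50, 8
W = rng.normal(size=(V, d))
counts = rng.integers(1, 100, size=V).astype(float)
loss = sampled_discriminative_loss(rng.normal(size=d), W, target=3,
                                   unigram=counts / counts.sum(), k=10, rng=rng)
print(loss > 0.0)  # True: the target never takes all the candidate mass
```

Training would backpropagate through this loss; only the sampled rows of `W` receive gradient, which is where the speedup over a full softmax comes from.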
Many tasks, including language generation, benefit from learning the structure of the output space, particularly when the space of output labels is large and the data is sparse. State-of-the-art neural language models capture the output space structure directly in their classifier weights, since they lack parameter sharing across output labels. Learning shared output label mappings helps, but existing methods have limited expressivity and are prone to overfitting. In this paper, we investigate the usefulness of more powerful shared mappings for output labels, and propose a deep residual output mapping with dropout between layers to better capture the structure of the output space and avoid overfitting. Evaluations on three language generation tasks show that our output label mappings can match or improve state-of-the-art recurrent and self-attention architectures, and suggest that the classifier does not necessarily need to be high-rank to better model natural language, if it is better at capturing the structure of the output space.