Recent work on the problem of latent tree learning has made it possible to train neural networks that learn to both parse a sentence and use the resulting parse to interpret the sentence, all without exposure to ground-truth parse trees at training time. Surprisingly, these models often perform better at sentence understanding tasks than models that use parse trees from conventional parsers. This paper aims to investigate what these latent tree learning models learn. We replicate two such models in a shared codebase and find that (i) only one of these models outperforms conventional tree-structured models on sentence classification, (ii) its parsing strategies are not especially consistent across random restarts, (iii) the parses it produces tend to be shallower than standard Penn Treebank (PTB) parses, and (iv) they do not resemble those of PTB or any other semantic or syntactic formalism that the authors are aware of.
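Comparisons like (ii)-(iv) are commonly quantified with unlabeled bracketing F1 between the constituent spans of two trees. A minimal sketch of that metric, assuming binary trees encoded as nested Python tuples (whether the full-sentence span is counted varies by convention):

```python
def spans(tree, start=0):
    """Collect (start, end) spans of all constituents in a nested-tuple binary tree.

    Leaves are single tokens; returns the set of spans and the span length consumed."""
    if not isinstance(tree, tuple):          # leaf token
        return set(), 1
    left_spans, left_len = spans(tree[0], start)
    right_spans, right_len = spans(tree[1], start + left_len)
    total = left_len + right_len
    return left_spans | right_spans | {(start, start + total)}, total

def bracketing_f1(tree_a, tree_b):
    """Unlabeled F1 between the constituent spans of two trees over the same sentence."""
    spans_a, _ = spans(tree_a)
    spans_b, _ = spans(tree_b)
    overlap = len(spans_a & spans_b)
    precision = overlap / len(spans_a)
    recall = overlap / len(spans_b)
    return 2 * precision * recall / (precision + recall) if overlap else 0.0

# Example: two different binary parses of a 4-word sentence -> F1 of about 0.667
print(bracketing_f1((("a", "b"), ("c", "d")), ((("a", "b"), "c"), "d")))
```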
We use reinforcement learning to learn tree-structured neural networks for computing representations of natural language sentences. In contrast with prior work on tree-structured models in which the trees are either provided as input or predicted using supervision from explicit treebank annotations, the tree structures in this work are optimized to improve performance on a downstream task. Experiments demonstrate the benefit of learning task-specific composition orders, outperforming both sequential encoders and recursive encoders based on treebank annotations. We analyze the induced trees and show that while they discover some linguistically intuitive structures (e.g., noun phrases, simple verb phrases), they are different than conventional English syntactic structures.
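As a rough illustration of the policy-gradient idea (not the paper's actual shift-reduce parameterization), the sketch below samples a sequence of adjacent merges from a toy scoring policy and accumulates the log-probability that a REINFORCE update would weight by the downstream reward; the tanh composition is a stand-in for a learned composition function.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tree(vectors, w):
    """Sample a sequence of adjacent merges under a toy scoring policy.

    vectors: list of word vectors; w: the policy's scoring parameters.
    Returns the final sentence vector and the total log-probability of the
    sampled merge sequence (the quantity a REINFORCE update differentiates).
    """
    nodes = [v.copy() for v in vectors]
    log_prob = 0.0
    while len(nodes) > 1:
        # Score every adjacent pair, then sample which pair to merge.
        scores = np.array([w @ (nodes[i] + nodes[i + 1]) for i in range(len(nodes) - 1)])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        log_prob += np.log(probs[k])
        merged = np.tanh(nodes[k] + nodes[k + 1])   # placeholder composition function
        nodes[k:k + 2] = [merged]
    return nodes[0], log_prob

# REINFORCE: with downstream reward R (e.g. minus the classification loss),
# the gradient for the tree-choice parameters is (R - baseline) * d(log_prob)/dw.
sent_vec, logp = sample_tree([rng.standard_normal(8) for _ in range(5)], rng.standard_normal(8))
print(sent_vec.shape, logp)
```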
Sentence embeddings are an effective feature representation for most deep-learning-based NLP tasks. A popular approach is to use recursive tree-structured networks to embed sentences with task-specific structures. However, existing models have no explicit mechanism for emphasizing the informative words in the tree structure. To this end, we propose an Attentive Recursive Tree model (AR-Tree), in which words are dynamically positioned according to their importance to the task. Specifically, we construct the latent tree for a sentence with a proposed importance-first strategy, placing words that deserve more attention closer to the root; as a result, AR-Tree can inherently emphasize important words during the bottom-up composition of the sentence embedding. We propose an end-to-end reinforcement-learning training strategy for AR-Tree, which consistently outperforms, or is at least comparable to, state-of-the-art sentence embedding methods on three sentence understanding tasks.
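One way the importance-first strategy could be realized is to place the highest-scoring word of a span at the root and recurse on the words to its left and right. A hedged sketch, with arbitrary stand-in scores in place of the learned attention scores:

```python
def importance_first_tree(words, scores):
    """Build a latent tree by recursively placing the most important word at the root.

    Returns a (left_subtree, root_word, right_subtree) tuple; an empty side is None.
    """
    if not words:
        return None
    if len(words) == 1:
        return words[0]
    r = max(range(len(words)), key=lambda i: scores[i])      # most important word
    return (importance_first_tree(words[:r], scores[:r]),
            words[r],
            importance_first_tree(words[r + 1:], scores[r + 1:]))

words = ["the", "movie", "was", "not", "good"]
scores = [0.1, 0.4, 0.2, 0.8, 0.9]     # hypothetical task-importance scores
print(importance_first_tree(words, scores))
```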
Recurrent neural network (RNN) models are widely used for processing sequential data governed by a latent tree structure. Previous work has shown that RNN models (in particular Long Short-Term Memory (LSTM) based models) can learn to exploit the underlying tree structure, yet their performance consistently lags behind that of tree-based models. This work proposes a new inductive bias, Ordered Neurons, which enforces an order of update frequencies between hidden-state neurons. We show that ordered neurons can explicitly integrate the latent tree structure into recurrent models. To this end, we propose a new RNN unit, ON-LSTM, which achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.
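The key ingredient is the cumulative softmax ("cumax"), which yields monotonically increasing gate values so that higher-indexed neurons are overwritten less often. A simplified sketch of the master gating arithmetic, roughly following the ON-LSTM formulation (biases, input projections, and the candidate update are omitted):

```python
import numpy as np

def cumax(x):
    """Cumulative softmax: monotonically increasing values in (0, 1)."""
    e = np.exp(x - x.max())
    return np.cumsum(e / e.sum())

def ordered_neuron_gates(master_f_logits, master_i_logits, f, i):
    """Combine standard LSTM forget/input gates (f, i) with the master gates.

    The master forget gate rises from 0 to 1 across the hidden dimension, so
    later (higher-ranked) neurons keep their content across more steps; this is
    what lets the hidden state encode a hierarchy of update frequencies.
    """
    f_master = cumax(master_f_logits)
    i_master = 1.0 - cumax(master_i_logits)
    omega = f_master * i_master                  # overlap of the two master gates
    f_hat = f * omega + (f_master - omega)
    i_hat = i * omega + (i_master - omega)
    return f_hat, i_hat                          # used as c_t = f_hat*c_{t-1} + i_hat*c_tilde

d = 8
rng = np.random.default_rng(0)
print(ordered_neuron_gates(rng.standard_normal(d), rng.standard_normal(d),
                           rng.uniform(size=d), rng.uniform(size=d)))
```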
Deep NLP models benefit from underlying structure in the data (e.g., parse trees), typically extracted with off-the-shelf parsers. Recent attempts to learn the latent structure jointly run into a trade-off: either make factorization assumptions that limit expressiveness, or sacrifice end-to-end differentiability. Using the recently proposed SparseMAP inference, which yields a sparse distribution over latent structures, we propose a new approach for end-to-end learning of latent structure predictors jointly with a downstream predictor. To our knowledge, our method is the first to enable unrestricted dynamic computation graph construction from the global latent structure while maintaining differentiability.
For years, recursive neural networks (RvNNs) have been shown to be suitable for representing text into fixed-length vectors and achieved good performance on several natural language processing tasks. However, the main drawback of RvNNs is that they require structured input, which makes data preparation and model implementation hard. In this paper, we propose Gumbel Tree-LSTM, a novel tree-structured long short-term memory architecture that learns how to compose task-specific tree structures only from plain text data efficiently. Our model uses Straight-Through Gumbel-Softmax estimator to decide the parent node among candidates dynamically and to calculate gradients of the discrete decision. We evaluate the proposed model on natural language inference and sentiment analysis, and show that our model outperforms or is at least comparable to previous models. We also find that our model converges significantly faster than other models.
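A minimal numpy sketch of the Straight-Through Gumbel-Softmax selection over candidate merges: the forward pass takes a hard one-hot sample, while in an autodiff framework the gradient would flow through the soft distribution (noted in the comments, since numpy does not differentiate):

```python
import numpy as np

rng = np.random.default_rng(0)

def st_gumbel_softmax_select(scores, temperature=1.0):
    """One-hot sample over candidate merges via the Gumbel-Max trick.

    In an autodiff framework the straight-through estimator would return
    y = stop_gradient(y_hard - y_soft) + y_soft, so the forward pass uses the
    hard one-hot sample while gradients flow through the soft distribution.
    """
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    logits = (scores + gumbel) / temperature
    y_soft = np.exp(logits - logits.max())
    y_soft /= y_soft.sum()
    y_hard = np.zeros_like(y_soft)
    y_hard[np.argmax(y_soft)] = 1.0
    return y_hard, y_soft

# Scores for merging each adjacent pair of nodes in a 5-word sentence.
hard, soft = st_gumbel_softmax_select(np.array([0.2, 1.3, -0.5, 0.7]))
print(hard, soft.round(3))
```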
This paper describes a neural semantic parser that maps natural language utterances onto logical forms which can be executed against a task-specific environment, such as a knowledge base or a database, to produce a response. The parser generates tree-structured logical forms with a transition-based approach that combines a generic tree-generation algorithm with domain-general operations defined by the logical language. The generation process is modeled by structured recurrent neural networks, which provide a rich encoding of the sentential context and generation history for making predictions. To tackle mismatches between natural language and logical-form tokens, various attention mechanisms are explored. Finally, we consider different training settings for the neural semantic parser, including fully supervised training where annotated logical forms are given, weakly supervised training where denotations are provided, and distant supervision where only unlabeled sentences and a knowledge base are available. Experiments across a wide range of datasets demonstrate the effectiveness of our parser.
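A toy sketch of transition-based generation of a tree-structured logical form: a stack of partially built subtrees manipulated by open-nonterminal, emit-terminal, and reduce actions. The action names and the example logical form are purely illustrative, not the paper's inventory:

```python
def execute(actions):
    """Build a nested logical form from a sequence of transition actions."""
    stack = []
    for act, arg in actions:
        if act == "NT":                 # open a new predicate node
            stack.append([arg])
        elif act == "TER":              # attach a terminal argument to the open node
            stack[-1].append(arg)
        elif act == "RED":              # close the top node and attach it to its parent
            node = stack.pop()
            if not stack:
                return node
            stack[-1].append(node)
    return stack[0] if stack else None

# Hypothetical logical form for "capital of France":
actions = [("NT", "capital_of"), ("TER", "France"), ("RED", None)]
print(execute(actions))    # -> ['capital_of', 'France']
```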
Recurrent neural network grammars (RNNGs) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.
We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word/character-level language model tasks.
Most existing recursive neural network (RvNN) architectures utilize only the structure of the parse tree, ignoring the syntactic tags which are provided as a by-product of parsing. We present a novel RvNN architecture that can provide dynamic compositionality by considering comprehensive syntactic information derived from both structure and linguistic tags. Specifically, we introduce a structure-aware tag representation constructed by a separate tag-level tree-LSTM. With this, we can control the composition function of the existing word-level tree-LSTM by augmenting the representation as a supplementary input to the gate functions of the tree-LSTM. We show that models built on the proposed architecture obtain superior performance on several sentence-level tasks, such as sentiment analysis and natural language inference, when compared against previous tree-structured models and other sophisticated neural models. In particular, our models achieve new state-of-the-art results on the Stanford Sentiment Treebank, Movie Review, and Text Retrieval Conference datasets.
Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic nor semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem at the cost of huge feature spaces and sparseness. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. The CVG improves the PCFG of the Stanford Parser by 3.8% to obtain an F1 score of 90.4%. It is fast to train and, implemented approximately as an efficient reranker, it is about 20% faster than the current Stanford factored parser. The CVG learns a soft notion of head words and improves performance on the types of ambiguities that require semantic information such as PP attachments.
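A rough sketch of one syntactically untied composition step: the weight matrix and scoring vector depend on the children's syntactic categories, and the neural score is combined with the PCFG log-probability. Dimensions, initialization, and the exact scoring combination are simplified assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

# One composition matrix and scoring vector per (left-category, right-category) pair.
W = {("DT", "NN"): rng.standard_normal((d, 2 * d)) * 0.1,
     ("VBZ", "NP"): rng.standard_normal((d, 2 * d)) * 0.1}
v = {k: rng.standard_normal(d) for k in W}

def compose(cat_left, a, cat_right, b, pcfg_logprob, alpha=1.0):
    """Compose child vectors with category-specific weights and score the parent node."""
    key = (cat_left, cat_right)
    parent = np.tanh(W[key] @ np.concatenate([a, b]))
    score = v[key] @ parent + alpha * pcfg_logprob      # neural score plus PCFG score
    return parent, score

a, b = rng.standard_normal(d), rng.standard_normal(d)
parent, score = compose("DT", a, "NN", b, pcfg_logprob=-2.3)
print(parent.shape, round(float(score), 3))
```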
Over the past few years, neural networks have re-emerged as powerful machine-learning models, yielding state-of-the-art results in fields such as image recognition and speech processing. More recently, neural network models have started to be applied to textual natural language signals, again with very promising results. This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques. The tutorial covers input encoding for natural language tasks, feed-forward networks, convolutional networks, recurrent networks, and recursive networks, as well as the computation-graph abstraction for automatic gradient computation.
Answering compositional questions that require multi-step reasoning is challenging. We introduce an end-to-end differentiable model for interpreting questions about a knowledge graph (KG), which is inspired by formal approaches to semantics. Each span of text is represented by a denotation in the KG and a vector that captures ungrounded aspects of meaning. Learned composition modules recursively combine constituent spans, culminating in a grounding for the complete sentence that answers the question. For example, to interpret "not green", the model represents "green" as a set of KG entities and "not" as a trainable ungrounded vector, and then uses this vector to parameterize a composition module that performs a complement operation. For each sentence, we build a parse chart subsuming all possible parses, allowing the model to jointly learn both the composition operators and the output structure via gradients from end-task supervision. The model learns a variety of challenging semantic operators, such as quantifiers, disjunctions, and composed relations, and infers latent syntactic structure. It also generalizes well to longer questions than those seen in its training data, in contrast to RNN, tree-based, and semantic-parsing baselines.
We propose a technique for learning representations of parser states in transition-based dependency parsers. Our primary innovation is a new control structure for sequence-to-sequence neural networks: the stack LSTM. Like the conventional stack data structures used in transition-based parsing, elements can be pushed to or popped from the top of the stack in constant time, but, in addition, an LSTM maintains a continuous space embedding of the stack contents. This lets us formulate an efficient parsing model that captures three facets of a parser's state: (i) unbounded look-ahead into the buffer of incoming words, (ii) the complete history of actions taken by the parser, and (iii) the complete contents of the stack of partially built tree fragments, including their internal structures. Standard backpropagation techniques are used for training and yield state-of-the-art parsing performance.
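A compact sketch of the control structure: the stack is a list of recurrent states plus a pointer, so push runs one recurrence step from the state under the pointer and pop just moves the pointer back, both in constant time. A plain tanh recurrence stands in for the LSTM cell:

```python
import numpy as np

class StackRNN:
    """Stack with a continuous summary of its contents (stack-LSTM idea, simple RNN cell)."""

    def __init__(self, dim, rng):
        self.W = rng.standard_normal((dim, 2 * dim)) * 0.1
        self.states = [np.zeros(dim)]    # states[0] is the empty-stack embedding
        self.top = 0                     # pointer into `states`

    def push(self, x):
        # One recurrence step from the state currently on top of the stack.
        h = np.tanh(self.W @ np.concatenate([self.states[self.top], x]))
        self.states = self.states[: self.top + 1] + [h]
        self.top += 1

    def pop(self):
        self.top -= 1                    # constant time: just move the pointer back

    def summary(self):
        return self.states[self.top]     # embedding of the current stack contents

rng = np.random.default_rng(0)
s = StackRNN(dim=6, rng=rng)
s.push(rng.standard_normal(6)); s.push(rng.standard_normal(6)); s.pop()
print(s.summary().shape, s.top)
```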
Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
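For concreteness, a numpy sketch of the Child-Sum variant of the Tree-LSTM cell: the input, output, and update gates see the sum of the children's hidden states, while each child receives its own forget gate (biases omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def child_sum_tree_lstm(x, child_h, child_c, params):
    """One Child-Sum Tree-LSTM node.

    x: input vector at this node; child_h, child_c: lists of children's hidden/cell states.
    """
    W_i, U_i, W_f, U_f, W_o, U_o, W_u, U_u = params
    h_sum = np.sum(child_h, axis=0) if child_h else np.zeros(U_i.shape[1])
    i = sigmoid(W_i @ x + U_i @ h_sum)                      # input gate
    o = sigmoid(W_o @ x + U_o @ h_sum)                      # output gate
    u = np.tanh(W_u @ x + U_u @ h_sum)                      # candidate update
    f = [sigmoid(W_f @ x + U_f @ h_k) for h_k in child_h]   # one forget gate per child
    c = i * u + sum(f_k * c_k for f_k, c_k in zip(f, child_c))
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d = 5
params = tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(8))
h1, c1 = child_sum_tree_lstm(rng.standard_normal(d), [], [], params)    # a leaf node
h2, c2 = child_sum_tree_lstm(rng.standard_normal(d), [h1], [c1], params)
print(h2.shape)
```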
Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network. However, for many tasks we may want to model richer structural dependencies without abandoning end-to-end training. In this work, we experiment with incorporating richer structural distributions, encoded using graphical models, within deep networks. We show that these structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees. We experiment with two different classes of structured attention networks: a linear-chain conditional random field and a graph-based parsing model, and describe how these models can be practically implemented as neural network layers. Experiments show that this approach is effective for incorporating structural biases, and structured attention networks outperform baseline attention models on a variety of synthetic and real tasks: tree transduction, neural machine translation, question answering, and natural language inference. We further find that models trained in this way learn interesting unsupervised hidden representations that generalize simple attention.
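For the linear-chain case, forward-backward over the log-potentials gives posterior marginals that can serve as structured attention weights in place of an independent softmax. A sketch with random stand-in potentials:

```python
import numpy as np
from scipy.special import logsumexp

def crf_marginals(unary, trans):
    """Posterior marginals p(y_t = k | x) for a linear-chain CRF.

    unary: (T, K) log-potentials per position; trans: (K, K) log transition potentials.
    """
    T, K = unary.shape
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    alpha[0] = unary[0]
    for t in range(1, T):                       # forward recursion
        alpha[t] = unary[t] + logsumexp(alpha[t - 1][:, None] + trans, axis=0)
    for t in range(T - 2, -1, -1):              # backward recursion
        beta[t] = logsumexp(trans + (unary[t + 1] + beta[t + 1])[None, :], axis=1)
    log_z = logsumexp(alpha[-1])
    return np.exp(alpha + beta - log_z)         # each row sums to 1

rng = np.random.default_rng(0)
marginals = crf_marginals(rng.standard_normal((6, 2)), rng.standard_normal((2, 2)))
print(marginals.sum(axis=1))                    # ~[1, 1, 1, 1, 1, 1]
```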
Tree-structured neural networks exploit valuable syntactic parse information as they interpret the meanings of sentences. However, they suffer from two key technical problems that make them slow and unwieldy for large-scale NLP tasks: they usually operate on parsed sentences, and they do not directly support batched computation. We address these issues by introducing the Stack-augmented Parser-Interpreter Neural Network (SPINN), which combines parsing and interpretation within a single tree-sequence hybrid model by integrating tree-structured sentence interpretation into the linear sequential structure of a shift-reduce parser. Our model supports batched computation with a speedup of up to 25x over other tree-structured models, and its integrated parser can operate on unparsed data with little loss in accuracy. We evaluate it on the Stanford NLI entailment task and show that it significantly outperforms other sentence-encoding models.
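A toy sketch of the tree-sequence hybrid control flow: a buffer of word vectors, a stack, and a sequence of SHIFT/REDUCE transitions, where REDUCE composes the top two stack entries. The composition is a placeholder rather than SPINN's actual TreeLSTM-style function, and the transitions are given rather than predicted:

```python
import numpy as np

def spinn_like_encode(word_vecs, transitions):
    """Interpret a sentence with shift-reduce transitions ('S' = SHIFT, 'R' = REDUCE)."""
    buffer = list(word_vecs)[::-1]       # reversed so pop() takes the next word
    stack = []
    for t in transitions:
        if t == "S":
            stack.append(buffer.pop())
        else:                            # REDUCE: compose the top two stack entries
            right, left = stack.pop(), stack.pop()
            stack.append(np.tanh(left + right))   # placeholder composition
    return stack[-1]                     # final sentence representation

rng = np.random.default_rng(0)
vecs = [rng.standard_normal(4) for _ in range(3)]
# Transitions for the right-branching tree (w1 (w2 w3)).
print(spinn_like_encode(vecs, ["S", "S", "S", "R", "R"]).shape)
```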
Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on syntactic tree. In this paper, we introduce a robust syntactic parsing-independent tree structured model, Neural Tree Indexers (NTI) that provides a middle ground between the sequential RNNs and the syntactic tree-based recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. Attention mechanism can then be applied to both structure and node function. We implemented and evaluated a binary tree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks.
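A small sketch of composing a sequence bottom-up into a full binary tree with a node function: each level combines adjacent pairs until a single root vector remains (an odd leftover node is carried up unchanged, a simplification of the padding scheme):

```python
import numpy as np

def full_binary_tree_encode(leaves, node_fn):
    """Compose a sequence bottom-up into a (near-)full binary tree.

    Adjacent pairs at each level are combined with node_fn; an odd leftover
    node is carried up unchanged.
    """
    level = list(leaves)
    while len(level) > 1:
        next_level = [node_fn(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            next_level.append(level[-1])
        level = next_level
    return level[0]

rng = np.random.default_rng(0)
leaves = [rng.standard_normal(4) for _ in range(5)]
root = full_binary_tree_encode(leaves, lambda a, b: np.tanh(a + b))   # placeholder node function
print(root.shape)
```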
We introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure. We explain efficient inference procedures that allow application to both parsing and language modeling. Experiments show that they provide better parsing in English than any single previously published supervised generative model, and better language modeling than state-of-the-art sequential RNNs in both English and Chinese.
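A toy sketch of how a top-down action sequence jointly builds a tree and its sentence, with NT(X) opening a constituent, GEN(w) emitting a word, and REDUCE closing the innermost open constituent; in the full model each action is scored by RNN encodings of the stack, buffer, and history, which are omitted here:

```python
def generate(actions):
    """Replay a top-down action sequence into a (tree, sentence) pair."""
    stack, words = [], []
    for act, arg in actions:
        if act == "NT":                     # open a constituent labelled `arg`
            stack.append([arg])
        elif act == "GEN":                  # generate a word under the open constituent
            stack[-1].append(arg)
            words.append(arg)
        elif act == "REDUCE":               # close the innermost open constituent
            done = stack.pop()
            if not stack:
                return done, words
            stack[-1].append(done)
    return stack[0], words

actions = [("NT", "S"), ("NT", "NP"), ("GEN", "the"), ("GEN", "hungry"), ("GEN", "cat"),
           ("REDUCE", None), ("NT", "VP"), ("GEN", "meows"), ("REDUCE", None), ("REDUCE", None)]
tree, sentence = generate(actions)
print(tree)        # ['S', ['NP', 'the', 'hungry', 'cat'], ['VP', 'meows']]
print(sentence)    # ['the', 'hungry', 'cat', 'meows']
```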
One major idea in structured prediction is to assume that the predictor computes its output by finding the maximum of a score function. The training of such a predictor can then be cast as the problem of finding weights of the score function so that the output of the predictor on the inputs matches the corresponding structured labels on the training set. A similar problem is studied in inverse reinforcement learning (IRL) where one is given an environment and a set of trajectories and the problem is to find a reward function such that an agent acting optimally with respect to the reward function would follow trajectories that match those in the training set. In this paper we show how IRL algorithms can be applied to structured prediction, in particular to parser training. We present a number of recent incremental IRL algorithms in a unified framework and map them to parser training algorithms. This allows us to recover some existing parser training algorithms, as well as to obtain a new one. The resulting algorithms are compared in terms of their sensitivity to the choice of various parameters and generalization ability on the Penn Treebank WSJ corpus.