We study grammar induction with mildly context-sensitive grammars for unsupervised discontinuous parsing. Using the probabilistic linear context-free rewriting system (LCFRS) formalism, our approach fixes the rule structure in advance and focuses on parameter learning with maximum likelihood. To reduce the computational complexity of both parsing and parameter estimation, we restrict the grammar formalism to LCFRS-2 (i.e., binary LCFRS with fan-out two) and further discard rules that require O(n^6) time to parse, reducing inference to O(n^5). We find that using a large number of nonterminals is beneficial and thus make use of tensor decomposition-based rank-space dynamic programming with an embedding-based parameterization of rule probabilities to scale up the number of nonterminals. Experiments on German and Dutch show that our approach is able to induce linguistically meaningful trees with continuous and discontinuous structures.
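Structured distributions, i.e., distributions over combinatorial spaces, are commonly used to learn latent probabilistic representations from observed data. However, scaling these models is bottlenecked by the high computational and memory complexity with respect to the size of the latent representations. Common models such as hidden Markov models (HMMs) and probabilistic context-free grammars (PCFGs) require time and space that are quadratic and cubic, respectively, in the number of hidden states. This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models. We show that by viewing the central inference step as a matrix-vector product and using a low-rank constraint, we can trade off model expressivity and speed via the rank. Experiments with neurally parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces while providing practical speedups.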
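As a rough illustration of the low-rank idea in the abstract above, the sketch below compares a dense matrix-vector product with one computed through a rank-r factorization. The factor names `U` and `V`, the helper functions, and the toy numbers are all hypothetical and are not taken from the paper; the paper's models and parameterization are more involved.

```python
def matvec_dense(A, x):
    """Multiply an n x n matrix by a length-n vector: O(n^2) work."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def matvec_low_rank(U, V, x):
    """Compute (U V^T) x without forming U V^T: O(n * r) work."""
    r = len(U[0])
    # First project x into the rank space: z = V^T x (length r).
    z = [sum(V[i][k] * x[i] for i in range(len(x))) for k in range(r)]
    # Then expand back to the state space: y = U z (length n).
    return [sum(U[i][k] * z[k] for k in range(r)) for i in range(len(U))]


# Hypothetical factors for a 3-state model with rank 2 (illustrative values only).
U = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
V = [[0.2, 0.1], [0.3, 0.0], [0.5, 0.4]]
x = [1.0, 2.0, 3.0]

# Dense matrix A = U V^T, built here only to check the low-rank path.
A = [[sum(U[i][k] * V[j][k] for k in range(2)) for j in range(3)] for i in range(3)]

dense = matvec_dense(A, x)
low_rank = matvec_low_rank(U, V, x)
```

Both paths give the same vector; the low-rank path never materializes the n x n matrix, which is where the savings come from when r is much smaller than n.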
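Sequence-to-sequence learning with neural networks has become the de facto standard for sequence prediction tasks. This approach typically models the local distribution over the next word with a powerful neural network that can condition on arbitrary context. While flexible and performant, these models often require large datasets for training and can fail spectacularly on benchmarks designed to test compositional generalization. This work explores an alternative, hierarchical approach to sequence-to-sequence learning with quasi-synchronous grammars, where each node in the target tree is transduced by a node in the source tree. Both the source and target trees are treated as latent and induced during training. We develop a neural parameterization of the grammar that enables parameter sharing over the combinatorial space of derivation rules without the need for manual feature engineering. We apply this latent neural grammar to various domains (a diagnostic language navigation task designed to test compositional generalization (SCAN), style transfer, and small-scale machine translation) and find that it performs respectably compared to standard baselines.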
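It is generally believed that language models can encode syntax [Tenney et al., 2019; Jawahar et al., 2019; Hewitt and Manning, 2019]. In this paper, we propose UPOA, an unsupervised constituency parsing model that computes an out association score, based solely on the self-attention weight matrix learned by a pretrained language model, and uses it as the syntactic distance for span segmentation. We further propose an enhanced version, UPIO, which exploits both inside association and outside association scores to estimate the likelihood of a span. Experiments with UPOA and UPIO reveal that the linear projection matrices for the query and key in the self-attention mechanism play an important role in parsing. We therefore extend the unsupervised models to few-shot parsing models (FPOA, FPIO) that use a few annotated trees to learn better linear projection matrices for parsing. Experiments on the Penn Treebank show that our unsupervised parsing model UPIO achieves comparable results on short sentences (length <= 10). Our few-shot parsing model FPIO, trained with only 20 annotated trees, outperforms a previous few-shot parsing method trained with 50 annotated trees. Experiments on cross-lingual parsing show that both the unsupervised and the few-shot parsing methods are better than previous methods on most SPMRL languages [Seddah et al., 2013].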
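To make the phrase "syntactic distance for span segmentation" more concrete, here is a minimal sketch of one common way such distances can be turned into a binary tree: recursively split at the boundary with the largest distance. This is a generic illustration under stated assumptions, not the UPOA or UPIO algorithm; the function `build_tree` and the distance values are hypothetical, and the abstract does not specify how the association scores are computed from attention weights.

```python
def build_tree(words, distances):
    """Build a binary tree by splitting at the largest boundary distance.

    distances[i] is an assumed score for the boundary between
    words[i] and words[i + 1], so len(distances) == len(words) - 1.
    """
    if len(words) == 1:
        return words[0]
    # Pick the boundary with the largest distance as the split point.
    split = max(range(len(distances)), key=lambda i: distances[i])
    left = build_tree(words[:split + 1], distances[:split])
    right = build_tree(words[split + 1:], distances[split + 1:])
    return (left, right)


# Hypothetical sentence and boundary distances (illustrative values only).
words = ["the", "cat", "sat", "down"]
distances = [0.1, 0.9, 0.3]
tree = build_tree(words, distances)
```

With these toy values the largest distance falls between "cat" and "sat", so the sentence is first split there and each half is then split recursively.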
We propose a transition-based approach that, by training a single model, can efficiently parse any input sentence with both constituent and dependency trees, supporting both continuous/projective and discontinuous/non-projective syntactic structures. To that end, we develop a Pointer Network architecture with two separate task-specific decoders and a common encoder, and follow a multitask learning strategy to jointly train them. The resulting quadratic system not only becomes the first parser that can jointly produce both unrestricted constituent and dependency trees from a single model, but also proves that both syntactic formalisms can benefit from each other during training, achieving state-of-the-art accuracies on several widely used benchmarks such as the continuous English and Chinese Penn Treebanks, as well as the discontinuous German NEGRA and TIGER datasets.
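In this work, we try to build a connection between the two schools by introducing syntactic inductive biases for deep learning models. We propose two families of inductive biases, one for constituency structure and another for dependency structure. The constituency inductive bias encourages deep learning models to use different units (or neurons) to process long-term and short-term information separately. This separation gives deep learning models a way to build latent hierarchical representations from sequential inputs, in which a higher-level representation is composed of, and can be decomposed into, a series of lower-level representations. For example, without knowing the ground-truth structure, our proposed model learns to process logical expressions by composing representations of variables and operators according to their syntactic structure. The dependency inductive bias, on the other hand, encourages models to find latent relations between entities in the input sequence. For natural language, the latent relations are usually modeled as a directed dependency graph, where a word has exactly one parent node and zero or more child nodes. After applying this constraint to a Transformer-like model, we find that the model is capable of inducing directed graphs that are close to human expert annotations, and that it also outperforms the standard Transformer model on different tasks. We believe these experimental results demonstrate an interesting alternative for the future development of deep learning models.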
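Many natural language processing tasks, such as coreference resolution and semantic role labeling, require selecting text spans and making decisions about them. A typical approach to such tasks is to score all possible spans and greedily select spans for task-specific downstream processing. This approach, however, does not incorporate any inductive bias about what sort of spans ought to be selected, e.g., that selected spans tend to be syntactic constituents. In this paper, we propose a novel grammar-based structured selection model that learns to make use of the partial span-level annotation provided for such problems. Compared to previous approaches, our approach gets rid of the heuristic greedy span selection scheme, allowing us to model the downstream task on an optimal set of spans. We evaluate our model on two popular span prediction tasks: coreference resolution and semantic role labeling. We show empirical improvements on both.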
In order to achieve deep natural language understanding, syntactic constituent parsing is a vital step, highly demanded by many artificial intelligence systems to process both text and speech. One of the most recent proposals is the use of standard sequence-to-sequence models to perform constituent parsing as a machine translation task, instead of applying task-specific parsers. While they show a competitive performance, these text-to-parse transducers are still lagging behind classic techniques in terms of accuracy, coverage and speed. To close the gap, we here extend the framework of sequence-to-sequence models for constituent parsing, not only by providing a more powerful neural architecture for improving their performance, but also by enlarging their coverage to handle the most complex syntactic phenomena: discontinuous structures. To that end, we design several novel linearizations that can fully produce discontinuities and, for the first time, we test a sequence-to-sequence model on the main discontinuous benchmarks, obtaining competitive results on par with task-specific discontinuous constituent parsers and achieving state-of-the-art scores on the (discontinuous) English Penn Treebank.
Most recent studies on neural constituency parsing focus on encoder structures, while few developments are devoted to decoders. Previous research has demonstrated that probabilistic statistical methods based on syntactic rules are particularly effective in constituency parsing, whereas syntactic rules are not used during the training of neural models in prior work probably due to their enormous computation requirements. In this paper, we first implement a fast CKY decoding procedure harnessing GPU acceleration, based on which we further derive a syntactic rule-based (rule-constrained) CKY decoding. In the experiments, our method obtains 95.89 and 92.52 F1 on the datasets of PTB and CTB respectively, which shows significant improvements compared with previous approaches. Besides, our parser achieves strong and competitive cross-domain performance in zero-shot settings.
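For readers unfamiliar with CKY decoding over span scores, the following is a minimal CPU sketch of the basic dynamic program: it finds the binary bracketing that maximizes the sum of span scores. It is not the GPU-accelerated or rule-constrained decoder described in the abstract above; the function `cky_decode`, the dictionary-based score table, and the toy scores are assumptions made for illustration.

```python
def cky_decode(span_score, n):
    """Return (best total score, sorted spans) of the best binary tree.

    span_score maps (i, j) to an assumed score for the span covering
    words i..j-1, with 0 <= i < j <= n; missing spans score 0.0.
    """
    best = {}  # best[(i, j)]: best total score of any tree over span (i, j)
    back = {}  # back[(i, j)]: split point that achieves best[(i, j)]
    for i in range(n):
        best[(i, i + 1)] = span_score.get((i, i + 1), 0.0)
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Choose the split point k that maximizes the two sub-tree scores.
            k_best = max(range(i + 1, j),
                         key=lambda k: best[(i, k)] + best[(k, j)])
            best[(i, j)] = (span_score.get((i, j), 0.0)
                            + best[(i, k_best)] + best[(k_best, j)])
            back[(i, j)] = k_best
    # Follow back-pointers from the root span to recover the tree's spans.
    spans, stack = [], [(0, n)]
    while stack:
        i, j = stack.pop()
        spans.append((i, j))
        if j - i > 1:
            k = back[(i, j)]
            stack.append((k, j))
            stack.append((i, k))
    return best[(0, n)], sorted(spans)


# Hypothetical span scores for a 4-word sentence (illustrative values only).
scores = {(0, 2): 1.0, (2, 4): 1.0, (1, 3): 1.5, (0, 4): 0.5}
total, spans = cky_decode(scores, 4)
```

In this toy case the span (1, 3) has the highest individual score, but the two compatible spans (0, 2) and (2, 4) together score more, so the decoder picks them. The triple loop is cubic in sentence length, which is the cost that batched GPU implementations aim to hide.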
We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics. Additionally, we find that the recursive syntactic composition bottleneck which represents each sentence as a single vector harms perplexity on document-level language modeling, providing evidence that a different kind of memory mechanism -- one that is independent of composed syntactic representations -- plays an important role in current successful models of long text.
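For over thirty years, researchers have developed and analyzed methods for latent tree induction as an approach to unsupervised syntactic parsing. Nonetheless, modern systems still do not perform well enough, compared to their supervised counterparts, to be of any practical use as structural annotation of text. In this work, we present a technique that uses distant supervision in the form of span constraints (i.e., phrase bracketing) to improve performance in unsupervised constituency parsing. Using a relatively small number of span constraints, we can substantially improve the output of DIORA, an already competitive unsupervised parsing system. Compared with full parse tree annotation, span constraints can be acquired with minimal effort, for example by using a lexicon derived from Wikipedia to find exact text matches. Our experiments show that span constraints based on entities improve constituency parsing on the English WSJ Penn Treebank by more than 5 F1. Furthermore, our method extends to any domain where span constraints are easily attainable, and as a case study we demonstrate its effectiveness by parsing biomedical text from the CRAFT dataset.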
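Semantic role labeling (SRL) is a fundamental yet challenging task in the NLP community. Recent work on SRL mainly falls into two lines: 1) BIO-based; 2) span-based. Despite their ubiquity, both share the intrinsic drawback of not considering internal argument structures, which may hinder the model's expressiveness. The key challenge is that arguments are flat structures, and there are no determined subtree realizations for the words inside arguments. To remedy this, we propose to regard flat argument spans as latent subtrees, thereby reducing SRL to a tree parsing task. In particular, we equip our formulation with a novel span-constrained TreeCRF to make tree structures span-aware, and further extend it to the second-order case. We conduct extensive experiments on the CoNLL05 and CoNLL12 benchmarks. The results show that our methods perform better than all previous syntax-agnostic work, achieving new state-of-the-art results under both the end-to-end and the gold-predicate settings.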
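Generic unstructured neural networks have been shown to struggle with out-of-distribution compositional generalization. Compositional data augmentation via example recombination has transferred some prior knowledge about compositionality to such black-box neural models for several semantic parsing tasks, but this has often required task-specific engineering or provided limited gains. We present a more powerful data recombination method using a model called the Compositional Structure Learner (CSL). CSL is a generative model with a quasi-synchronous context-free grammar backbone, which we induce from the training data. We sample recombined examples from CSL and add them to the fine-tuning data of a pre-trained sequence-to-sequence model (T5). This procedure effectively transfers most of CSL's compositional bias to T5 for diagnostic tasks, and results in a model even stronger than a T5-CSL ensemble on two real-world compositional generalization tasks. This yields new state-of-the-art performance on these challenging semantic parsing tasks, which require generalization to both natural language variation and novel compositions of elements.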
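We present the first parsing results on the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME), a 1.9-million-word treebank that is an important resource for research on syntactic change. We describe key features of PPCEME that make it challenging for parsing, including a larger and more varied set of function tags than in the Penn Treebank. We present results for this corpus using a modified version of the Berkeley Neural Parser and the approach to function tag recovery of Gabbard et al. (2006). Despite its simplicity, this approach works surprisingly well, suggesting that it is possible to recover the original structure with sufficient accuracy to support linguistic applications (e.g., searching for syntactic structures of interest). However, for a subset of function tags (e.g., the tag indicating direct speech), additional work is needed, and we discuss some further limitations of this approach. The resulting parser will be used to parse Early English Books Online, a 1.1-billion-word corpus whose utility for the study of syntactic change will be greatly increased by the addition of accurate parse trees.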
In constituency parsing, span-based decoding is an important direction. However, for Chinese sentences, because of their linguistic characteristics, it is necessary to use other models to perform word segmentation first, which introduces a series of uncertainties and generally leads to errors in the subsequent computation of the constituency tree. This work proposes a method for joint Chinese word segmentation and span-based constituency parsing that adds extra labels to individual Chinese characters in the parse trees. In experiments, the proposed algorithm outperforms recent models for joint segmentation and constituency parsing on CTB 5.1.
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.
Are extralinguistic signals such as image pixels crucial for inducing constituency grammars? While past work has shown substantial gains from multimodal cues, we investigate whether such gains persist in the presence of rich information from large language models (LLMs). We find that our approach, LLM-based C-PCFG (LC-PCFG), outperforms previous multi-modal methods on the task of unsupervised constituency parsing, achieving state-of-the-art performance on a variety of datasets. Moreover, LC-PCFG results in an over 50% reduction in parameter count, and speedups in training time of 1.7x for image-aided models and more than 5x for video-aided models, respectively. These results challenge the notion that extralinguistic signals such as image pixels are needed for unsupervised grammar induction, and point to the need for better text-only baselines in evaluating the need of multi-modality for the task.
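We explore deep clustering of text representations for unsupervised model interpretation and induction of syntax. As these representations are high-dimensional, out-of-the-box methods like KMeans do not work well. Thus, our approach jointly transforms the representations into a lower-dimensional, cluster-friendly space and clusters them. We consider two notions of syntax in this work: part-of-speech induction (POSI) and constituency labelling (CoLab). Interestingly, we find that multilingual BERT (mBERT) contains a surprising amount of syntactic knowledge of English, possibly even as much as English BERT (EBERT). Our model can be used as a supervision-free probe, which is arguably a less biased way of probing. We find that unsupervised probes benefit from higher layers more than supervised probes do. We further note that our unsupervised probe uses EBERT and mBERT representations differently, especially for POSI. We validate the efficacy of our probe by demonstrating its capabilities as an unsupervised syntax induction technique. Our probe works well for both syntactic formalisms by simply adapting the input representations. We report competitive performance of our probe on 45-tag English POSI, state-of-the-art performance on 12-tag POSI across 10 languages, and competitive results on CoLab. We also perform zero-shot syntax induction on resource-poor languages and report strong results.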
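Popular models such as Transformers and LSTMs use tokens as their unit of information. That is, each token is encoded into a vector representation, and those vectors are used directly in computation. However, humans frequently consider spans of tokens (i.e., phrases) rather than their constituent tokens. In this paper, we introduce TreeFormer, an architecture inspired by the CKY algorithm and the Transformer that learns a composition operator and a pooling function in order to construct hierarchical encodings for phrases and sentences. Our extensive experiments demonstrate the benefits of incorporating hierarchical structure into the Transformer and show significant improvements over a baseline Transformer in machine translation, abstractive summarization, and various natural language understanding tasks.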
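Previous part-of-speech (POS) induction models usually make certain independence assumptions (e.g., Markov, unidirectional, local dependency) that do not hold in real languages. For example, subject-verb agreement can be both long-range and bidirectional. To facilitate flexible dependency modeling, we propose a Masked Part-of-Speech Model (MPOSM), inspired by the recent success of masked language models (MLM). MPOSM can model arbitrary tag dependencies and performs POS induction through the objective of masked POS reconstruction. We achieve competitive results on both the English Penn WSJ dataset and the universal treebank containing 10 diverse languages. Although modeling long-range dependencies should ideally help this task, our ablation study shows mixed trends across languages. To better understand this phenomenon, we design a novel synthetic experiment that can specifically diagnose the model's ability to learn tag agreement. Surprisingly, we find that even strong baselines fail to solve this problem consistently in a very simplified setting: agreement between adjacent words. Nonetheless, MPOSM achieves better overall performance. Finally, we conduct a detailed error analysis to shed light on other remaining challenges. Our code is available at https://github.com/owenzx/mposm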