In this work, we unify several existing decoding strategies for punctuation prediction within one framework and introduce a novel strategy that uses multiple predictions for each word across different windows. We show that significant improvements can be achieved by optimising these strategies after training the model, requiring no retraining and leading only to a potential increase in inference time. We further use our decoding-strategy framework for the first comparison of tagging and classification approaches for punctuation prediction in a real-time setting. Our results show that a classification approach to punctuation prediction can be beneficial when little or no right-side context is available.
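A minimal sketch of one way to combine per-word predictions from overlapping windows, here by averaging class probabilities per word; the aggregation rule and the function name are assumptions for illustration, not necessarily the strategy introduced in the paper.

```python
from collections import defaultdict
import numpy as np

def aggregate_window_predictions(window_preds):
    """window_preds: one list per window of (word_index, prob_vector) pairs."""
    buckets = defaultdict(list)
    for preds in window_preds:
        for word_idx, probs in preds:
            buckets[word_idx].append(np.asarray(probs, dtype=float))
    # Average the probabilities that every overlapping window assigned to the
    # same word, then keep the punctuation class with the highest mean score.
    return {idx: int(np.mean(np.stack(vecs), axis=0).argmax())
            for idx, vecs in buckets.items()}
```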
Ensuring proper punctuation and letter casing is a key pre-processing step towards applying complex natural language processing algorithms. This is especially significant for text sources in which punctuation and casing are missing, such as the raw output of automatic speech recognition systems. In addition, short text messages and micro-blogging platforms offer unreliable and often erroneous punctuation and casing. This survey gives an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing. Furthermore, current challenges and research directions are highlighted.
A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we fine-tune a general-purpose, large language model to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We compare to several segmentation strategies and find that our approach improves BLEU score on three languages by an average of 2.7 BLEU overall compared to an automatic punctuation baseline. Further, we demonstrate the effectiveness of two constrained decoding strategies to improve well-formedness of the model output from above 99% to 100%.
This paper presents a unified end-to-end framework for both streaming and non-streaming speech translation. While training recipes for non-streaming speech translation are mature, recipes for streaming speech translation have yet to be established. In this work, we focus on developing a unified model (UniST) that supports both streaming and non-streaming ST from the perspective of its fundamental components, including the training objective, the attention mechanism, and the decoding policy. Experiments on the most popular speech-to-text translation benchmark dataset, MuST-C, show a better trade-off between BLEU score and latency metrics for streaming ST compared with standard end-to-end baselines and cascaded models. We will make our code and evaluation tools publicly available.
Currently, the most widely used neural network architecture for training language models is the so-called BERT, which has led to improvements on a variety of natural language processing (NLP) tasks. In general, the larger the number of parameters in a BERT model, the better the results obtained on these NLP tasks. Unfortunately, memory consumption and training duration increase drastically with the size of these models. In this paper, we investigate various training techniques for smaller BERT models, combining different methods from other BERT variants such as ALBERT, RoBERTa, and relative positional encoding. In addition, we propose two new fine-tuning modifications that lead to better performance: class-start-end tagging and a modified form of linear-chain conditional random fields. Furthermore, we introduce whole-word attention, which reduces BERT's memory usage and yields a small performance gain compared to classical multi-head attention. We evaluate these techniques on five public German named entity recognition (NER) tasks, two of which are introduced in this article.
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
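A simplified sketch of the masked-language-modelling idea the abstract refers to (predicting hidden tokens from both left and right context); it omits BERT's 80/10/10 replacement scheme and subword handling, and the function name is illustrative.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Randomly hide tokens; the model must recover each hidden token from
    the full bidirectional context of the remaining sequence."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)      # position is scored against the original token
        else:
            masked.append(tok)
            targets.append(None)     # position is not scored
    return masked, targets
```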
Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an "open vocabulary". This approach relies on consistent and correct underlying Unicode sequences and makes models susceptible to degradation from common types of noise and variation. Motivated by the robustness of human language processing, we propose the use of visual text representations, which dispense with a finite set of text embeddings in favour of continuous vocabularies created by processing visually rendered text with sliding windows. We show that models using visual text representations approach or match the performance of traditional text models on both small and larger datasets. More importantly, models with visual embeddings demonstrate significant robustness to various types of noise, achieving, for example, 25.9 BLEU on a character-permuted German-English task on which subword models degrade to 1.9.
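A minimal sketch of slicing a rendered text line into fixed-width windows that stand in for subword embeddings; the window and stride values, and the rendering pipeline they presume, are assumptions rather than the paper's configuration.

```python
import numpy as np

def slide_windows(rendered_line: np.ndarray, window: int = 32, stride: int = 16):
    """rendered_line: (height, width) grayscale image of a rendered sentence."""
    _, width = rendered_line.shape
    starts = range(0, max(width - window, 0) + 1, stride)
    # Each slice becomes one "visual token" fed to the encoder in place of a
    # discrete subword embedding.
    return [rendered_line[:, s:s + window] for s in starts]
```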
Many NLP tasks require processing long contexts beyond the length limit of pretrained models. To scale these models to longer text sequences, many efficient long-range attention variants have been proposed. Despite the abundance of research in this direction, it remains difficult to gauge the relative effectiveness of these models in practical use cases, for example when we apply them following the pretrain-and-finetune paradigm. In this work, we aim to conduct a thorough analysis of these emerging models with large-scale and controlled experiments. For each attention variant, we pretrain models on the same long-document corpus and then finetune them on real-world long-context tasks. Our findings reveal pitfalls of an existing, widely used long-range benchmark and show that none of the tested efficient attention variants beats a simple local window attention under the standard pretraining paradigm. Further analysis of local attention variants suggests that even the commonly used attention-window overlap is not necessary for good downstream results: using disjoint local attention, we are able to build a simpler and more efficient long-document QA model that matches the performance of Longformer~\citep{longformer} with half of its pretraining cost.
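A minimal sketch of a block-local attention mask, assuming "disjoint local attention" means each token attends only within its own fixed-size, non-overlapping block; the function name and API are hypothetical.

```python
import torch

def disjoint_local_attention_mask(seq_len: int, block_size: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask; True marks allowed attention pairs."""
    block_ids = torch.arange(seq_len) // block_size
    # A token may only attend to tokens in its own fixed, non-overlapping block,
    # so no attention window overlaps with its neighbours.
    return block_ids.unsqueeze(0) == block_ids.unsqueeze(1)
```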
Autoregressive (AR) and non-autoregressive (NAR) models each have their own advantages in performance and latency, and combining them into one model may exploit the best of both. Current combination frameworks focus more on integrating multiple decoding paradigms with a unified generative model, e.g. a masked language model. However, due to the gap between the training objective and inference, such generalization can be harmful to performance. In this paper, we aim to close this gap by preserving the original objectives of AR and NAR under a unified framework. Specifically, we propose the Directional Transformer (Diformer), which jointly models AR and NAR across three generation directions (left-to-right, right-to-left, and straight) with a newly introduced direction variable that controls the prediction of each token so that it has the specific dependencies of that direction. The unification achieved through direction successfully preserves the original dependency assumptions used in AR and NAR, retaining both generalization and performance. Experiments on 4 WMT benchmarks show that Diformer outperforms current unified-modelling works by more than 1.5 BLEU points for both AR and NAR decoding, and is also competitive with state-of-the-art independent AR and NAR models.
In order to achieve deep natural language understanding, syntactic constituent parsing is a vital step, highly demanded by many artificial intelligence systems to process both text and speech. One of the most recent proposals is the use of standard sequence-to-sequence models to perform constituent parsing as a machine translation task, instead of applying task-specific parsers. While they show a competitive performance, these text-to-parse transducers are still lagging behind classic techniques in terms of accuracy, coverage and speed. To close the gap, we here extend the framework of sequence-to-sequence models for constituent parsing, not only by providing a more powerful neural architecture for improving their performance, but also by enlarging their coverage to handle the most complex syntactic phenomena: discontinuous structures. To that end, we design several novel linearizations that can fully produce discontinuities and, for the first time, we test a sequence-to-sequence model on the main discontinuous benchmarks, obtaining competitive results on par with task-specific discontinuous constituent parsers and achieving state-of-the-art scores on the (discontinuous) English Penn Treebank.
We present a direct simultaneous speech-to-speech translation (Simul-S2ST) model in which the generation of the translation is, moreover, independent of intermediate text representations. Our approach leverages recent progress in direct speech-to-speech translation with discrete units, in which a sequence of discrete representations, learned in an unsupervised manner, is predicted by the model instead of continuous spectrogram features and passed directly to a vocoder for speech synthesis. We also introduce variational monotonic multihead attention (V-MMA) to handle the challenge of inefficient policy learning in simultaneous speech translation. The simultaneous policy then operates on source speech features and target discrete units. We carry out empirical studies comparing the cascaded and direct approaches on the Fisher Spanish-English and MuST-C English-Spanish datasets. The direct simultaneous model is shown to outperform the cascaded model by achieving a better trade-off between translation quality and latency.
The powerful modelling capabilities of fully attention-based Transformer architectures often lead to overfitting and, for natural language processing tasks, to an implicitly learned internal language model in the autoregressive Transformer decoder that complicates the integration of external language models. In this paper, we explore relaxed attention, a simple and easy-to-implement smoothing of the attention weights, applied first to the self-attention in the encoder. Second, we show that it naturally supports the integration of an external language model, as it suppresses the implicitly learned internal language model by relaxing the cross-attention in the decoder. We demonstrate the benefit of relaxed attention across several tasks, with clear improvements in combination with recent benchmark approaches. Specifically, we surpass the former state-of-the-art word error rate of 26.90% on the largest public lip-reading benchmark, LRS3, with a word error rate of 26.31%, and we achieve a top-performing BLEU score of 37.67 on the IWSLT14 (DE$\rightarrow$EN) machine translation task without an external language model and with virtually no additional model parameters. Code and models will be made publicly available.
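A minimal sketch of smoothing attention weights toward a uniform distribution, one plausible reading of the "relaxed attention" idea described above; the blending form, the `gamma` value, and where it is applied are assumptions rather than the paper's exact formulation.

```python
import torch

def relax_attention(attn_weights: torch.Tensor, gamma: float = 0.1) -> torch.Tensor:
    """attn_weights: (..., num_keys) softmax weights, each row summing to 1."""
    num_keys = attn_weights.size(-1)
    uniform = torch.full_like(attn_weights, 1.0 / num_keys)
    # Blend the learned attention distribution with a uniform one; the result
    # still sums to 1 along the key dimension, but its peaks are flattened.
    return (1.0 - gamma) * attn_weights + gamma * uniform
```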
Agent assistance during human-human customer-support spoken interactions requires triggering workflows based on the caller's intent (reason for the call). Timeliness of the prediction is essential for a good user experience. The goal is for the system to detect the caller's intent at the point at which an agent would be able to detect it (the intent boundary). Some approaches focus on predicting the output offline, i.e., once the full spoken input (e.g., an entire conversational turn) has been processed by the ASR system. This introduces an undesirable latency in the prediction whenever the intent could have been detected earlier in the turn. Recent work on voice assistants has used incremental, real-time prediction at the word level to detect the intent before the end of a command. However, human-directed and machine-directed speech have very different characteristics. In this work, we propose to apply a method developed in the voice-assistant context to the problem of online, real-time caller intent detection in human-human spoken interactions. We use a dual architecture in which two LSTMs are jointly trained: one predicts the intent boundary (IB) and the other predicts the intent class at the IB. We conduct experiments on a private dataset comprising transcripts of human-human telephone conversations from the telecom customer-support domain. We report results analysing the accuracy of the system as well as the impact of different architectures on the trade-off between overall accuracy and prediction latency.
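A minimal sketch of a dual architecture with two jointly trainable LSTMs, one head tagging each word as an intent boundary and one classifying the intent at that position; the layer sizes, wiring, and names are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DualIntentModel(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, num_intents):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.boundary_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.intent_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.boundary_head = nn.Linear(hidden_dim, 2)           # IB / not-IB per word
        self.intent_head = nn.Linear(hidden_dim, num_intents)   # intent class at the IB

    def forward(self, token_ids):
        x = self.emb(token_ids)
        b_states, _ = self.boundary_lstm(x)
        i_states, _ = self.intent_lstm(x)
        # Both heads produce per-word logits; the intent logits are read out at
        # positions the boundary head flags as an IB.
        return self.boundary_head(b_states), self.intent_head(i_states)
```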
Recent research in neural machine translation has explored flexible generation orders as an alternative to left-to-right generation. However, training non-monotonic models brings a new complication: how do we search for a good order when there is a combinatorial explosion of orders that arrive at the same final result? Moreover, how do these automatic orderings compare with the actual behaviour of human translators? Current models rely on manually built biases or are left to explore all possibilities on their own. In this paper, we analyse the orderings produced by human post-editors and use them to train an automatic post-editing system. We compare the resulting system with systems trained on left-to-right and random editing orderings. We observe that humans tend to follow a nearly left-to-right order, but with interesting deviations, such as preferring to start by correcting punctuation or verbs.
The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation. Although its patterns have been exploited to perform different tasks, from neural network understanding to textual alignment, no previous work has analysed the encoder-decoder attention behavior in speech translation (ST) nor used it to improve ST on a specific task. In this paper, we fill this gap by proposing an attention-based policy (EDAtt) for simultaneous ST (SimulST) that is motivated by an analysis of the existing attention relations between audio input and textual output. Its goal is to leverage the encoder-decoder attention scores to guide inference in real time. Results on en->{de, es} show that the EDAtt policy achieves overall better results compared to the SimulST state of the art, especially in terms of computational-aware latency.
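A minimal sketch of how encoder-decoder attention scores could drive an emit/wait decision in real time; the last-frames threshold rule and the `last_frames` and `alpha` parameters are assumptions inspired by the description above, not the exact EDAtt policy.

```python
import numpy as np

def should_emit(cross_attention: np.ndarray, last_frames: int = 2, alpha: float = 0.5) -> bool:
    """cross_attention: the candidate token's attention weights over the audio
    frames received so far (summing to 1)."""
    # If a large share of the attention mass falls on the newest frames, the
    # token may depend on audio that has not fully arrived yet: wait instead.
    return float(cross_attention[-last_frames:].sum()) < alpha
```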
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are required for training and applying such big models, resulting in a large carbon footprint and making it difficult for researchers and practitioners to use them. We show that performance similar to GPT-3 can be obtained with language models that are much "greener" in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements. We identify key factors required for successful natural language understanding with small language models.
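A minimal sketch of reformulating a classification input as a cloze question with a verbalizer, in the spirit of the approach described above; the pattern text and label words are illustrative assumptions.

```python
def to_cloze(review: str) -> str:
    # Wrap the input in a pattern whose [MASK] slot a masked language model fills.
    return f"Review: {review} All in all, it was [MASK]."

# Verbalizer: map each label word the model can predict to a class.
VERBALIZER = {"great": "positive", "terrible": "negative"}
```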
This paper presents a new UNIfied pre-trained Language Model (UNILM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UNILM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UNILM achieves new state-of-the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm.
Transfer learning, where a model is first pre-trained on a data-rich task before being finetuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
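A minimal sketch of casting different tasks into a single text-to-text format as described above; the translation prefix mirrors the paper's running example, while the sentiment prefix, field names, and function are assumptions for illustration.

```python
def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    """Return an (input_text, target_text) pair in a unified text-to-text format."""
    if task == "translation":
        return (f"translate English to German: {example['en']}", example["de"])
    if task == "sentiment":
        return (f"classify sentiment: {example['text']}", example["label"])
    raise ValueError(f"unknown task: {task}")
```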
Simultaneous speech translation (SimulST) is the task in which output generation has to be performed on partial, incremental speech input. In recent years, SimulST has become popular due to the spread of cross-lingual application scenarios, such as international live conferences and streamed lectures, in which on-the-fly speech translation can facilitate users' access to audio-visual content. In this paper, we analyse the characteristics of the SimulST systems developed so far, discussing their strengths and weaknesses. We then focus on the evaluation framework required to properly assess systems' effectiveness. To this end, we raise the need for a broader performance analysis that also includes the user-experience standpoint. Indeed, SimulST systems should be evaluated not only in terms of quality/latency measures, but also via task-oriented metrics accounting, for instance, for the visualization strategy adopted. In light of this, we highlight which goals have been achieved by the community and which are still missing.