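Question answering systems these days typically use template-based language generation. Though adequate for domain-specific tasks, these systems are too restrictive and predefined for domain-independent systems. This paper proposes a system that outputs a full-length answer given a question and the extracted factoid answer (a short span such as a named entity) as input. Our system uses constituency and dependency parse trees of the questions. A transformer-based grammar error correction model, GECToR (2020), is used as a post-processing step for better fluency. We compare our system with (i) a modified pointer generator (SOTA) and (ii) fine-tuned DialoGPT. We also test our approach on (yes-no) questions, with better results. Our model generates more accurate and fluent answers than the state-of-the-art (SOTA) approaches. The evaluation is done on the NewsQA and SQuAD datasets, with increments of 0.4 and 0.9 percentage points in ROUGE-1 score, respectively. Inference time is also reduced by 85\% compared to the SOTA. The improved datasets used for our evaluation will be released as part of the research contribution.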
translated by 谷歌翻译
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
We introduce a large scale MAchine Reading COmprehension dataset, which we name MS MARCO. The dataset comprises 1,010,916 anonymized questions, sampled from Bing's search query logs, each with a human generated answer, and 182,669 completely human rewritten generated answers. In addition, the dataset contains 8,841,823 passages, extracted from 3,563,535 web documents retrieved by Bing, that provide the information necessary for curating the natural language answers. A question in the MS MARCO dataset may have multiple answers or no answers at all. Using this dataset, we propose three different tasks with varying levels of difficulty: (i) predict if a question is answerable given a set of context passages, and extract and synthesize the answer as a human would; (ii) generate a well-formed answer (if possible) based on the context passages that can be understood with the question and passage context; and finally (iii) rank a set of retrieved passages given a question. The size of the dataset and the fact that the questions are derived from real user search queries distinguish MS MARCO from other well-known publicly available datasets for machine reading comprehension and question answering. We believe that the scale and the real-world nature of this dataset make it attractive for benchmarking machine reading comprehension and question-answering models.
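While exam-style questions are a fundamental educational tool serving a variety of purposes, manual construction of questions is a complex process that requires training, experience, and resources. To reduce the expenses associated with manual construction and to satisfy the need for a continuous supply of new questions, automatic question generation (QG) techniques can be used. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG, and answer extraction tasks using a Turkish QA dataset. To the best of our knowledge, this is the first academic work that attempts to perform automated text-to-text question generation from Turkish texts. Evaluation results show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance on the TQuADv1 and TQuADv2 datasets and the XQuAD Turkish split. The source code and pre-trained models are available at https://github.com/obss/turkish-question-generation.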
Clickbait articles often have a title that is phrased as a question or vague teaser that entices the user to click on the link and read the article to find the explanation. We developed a system that automatically finds the answer or explanation of the clickbait hook in the website text so that the user does not need to read through the text themselves. We fine-tune an extractive question answering model (RoBERTa) and an abstractive one (T5), using data scraped from the 'StopClickbait' Facebook pages and Reddit's 'SavedYouAClick' subforum. We find that both extractive and abstractive models improve significantly after fine-tuning. The extractive model performs slightly better according to ROUGE scores, while the abstractive one has a slight edge in terms of BERTScore.
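Automatic question answering is an important yet challenging task in e-commerce, given the millions of questions users post about products they are interested in purchasing. Hence, there is great demand for automatic answer generation systems that provide quick responses using relevant information about the product. Three sources of knowledge are available for answering a user-posted query: reviews, duplicate or similar questions, and specifications. Using these information sources effectively will greatly help in answering complex questions. However, there are two main challenges in exploiting these sources: (i) the presence of irrelevant information and (ii) the presence of sentiment ambiguity in reviews and similar questions. In this work, we propose a novel pipeline (MSQAP) that uses the rich information present in the aforementioned sources by separately performing relevancy and ambiguity prediction before generating a response. Experimental results show that our relevancy prediction model (BERT-QA) outperforms all other variants and improves the F1 score by 12.36% compared to the BERT-base baseline. Our generation model (T5-QA) outperforms the baselines on all content preservation metrics such as BLEU and ROUGE, with an average improvement of 35.02% in ROUGE and 198.75% in BLEU compared to the highest-performing baseline (HSSC-q). Human evaluation of our pipeline shows that our method achieves an overall accuracy improvement of 30.7% over the generation model (T5-QA), so that our full pipeline-based approach (MSQAP) provides more accurate answers. To the best of our knowledge, this is the first work in the e-commerce domain that automatically generates natural language answers by combining the information present in specifications, similar questions, and review data.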
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
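The triple-to-question step described above can be illustrated with a minimal sketch. The `Triple` type, the `questions_from_triple` helper, and the fixed "what" template are hypothetical names and simplifications introduced here for illustration; the abstract does not specify how PIE-QG actually phrases its questions.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    """A <subject, predicate, object> triple, as an OpenIE system might emit."""
    subject: str
    predicate: str
    obj: str


def questions_from_triple(triple: Triple, wh: str = "what") -> List[Tuple[str, str]]:
    """Form two (question, answer) pairs from one triple.

    Assumption: a naive fixed template with a single wh-word. A real system
    would choose the wh-word and word order more carefully.
    """
    # Subject + predicate form the question; the object is the answer.
    ask_object = (f"{triple.subject} {triple.predicate} {wh}?", triple.obj)
    # Predicate + object form the question; the subject is the answer.
    ask_subject = (f"{wh.capitalize()} {triple.predicate} {triple.obj}?", triple.subject)
    return [ask_object, ask_subject]


if __name__ == "__main__":
    example = Triple("Marie Curie", "discovered", "polonium")
    for question, answer in questions_from_triple(example):
        print(question, "->", answer)
```

Each triple yields one pair whose answer is the object and one whose answer is the subject, which mirrors the "subjects (or objects)" symmetry in the description.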
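Current studies in extractive question answering (EQA) model the single-span extraction setting, where a single answer span is the label to predict for a given question-passage pair. This setting is natural for general-domain EQA, since most questions in the general domain can be answered with a single span. Following general-domain EQA models, current biomedical EQA (BioEQA) models use the single-span extraction setting together with post-processing steps. In this paper, we investigate the question distribution across the general and biomedical domains and find that biomedical questions are more likely to require list-type answers (multiple answers) than factoid-type answers (a single answer). This calls for models that can provide multiple answers to a question. Based on this preliminary study, we propose a sequence tagging approach for BioEQA, which is a multi-span extraction setting. Our approach directly handles questions with a variable number of phrases as their answer and can learn to decide the number of answers for a question from the training data. Our experimental results on the BioASQ 7B and 8B list-type questions outperform the best-performing existing models without requiring post-processing steps. The source code and resources are freely available for download at https://github.com/dmis-lab/seqtagqa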
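To make the multi-span idea concrete, the sketch below decodes per-token tags into a variable number of answer phrases. It assumes a plain B/I/O tag scheme and whitespace-joined tokens; the `decode_spans` helper is a hypothetical illustration, not the code from the seqtagqa repository.

```python
from typing import List


def decode_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Turn per-token B/I/O tags into a list of answer phrases.

    "B" starts a new answer, "I" continues the current one, and any other
    tag (e.g. "O") closes it. The number of answers is not fixed in advance;
    it follows from how many "B" tags the tagger predicts.
    """
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O", or a stray "I" with no open span: close whatever is open.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


if __name__ == "__main__":
    tokens = ["aspirin", ",", "ibuprofen", "and", "naproxen", "sodium"]
    tags = ["B", "O", "B", "O", "B", "I"]
    print(decode_spans(tokens, tags))
```

Because the answer count falls out of the predicted tags, no separate post-processing step is needed to decide how many spans to return, which is the property the abstract highlights.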
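The objective of automated question answering (QA) systems is to provide answers to user queries in a time-efficient manner. The answers are usually found either in databases (or knowledge bases) or in a collection of documents commonly referred to as the corpus. In the past few decades, acquired knowledge has proliferated, and consequently the number of new scientific articles in the biomedical field has grown exponentially. It has therefore become difficult to keep track of all the information in the domain, even for domain experts. With the improvements in commercial search engines, users can type in their queries and get a small set of the most relevant documents and, in some cases, relevant snippets from those documents. However, manually looking for the required information or answers can still be tedious and time-consuming. This has necessitated the development of efficient QA systems that aim to find exact and precise answers to natural language questions posed by users in the biomedical domain. In this paper, we introduce the basic methodologies used for developing general-domain QA systems and then thoroughly investigate different aspects of biomedical QA systems, including benchmark datasets and several proposed approaches that use structured databases and collections of texts. We also examine the limitations of current systems and explore potential avenues for further advancement.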
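The amount of information available on the Internet has increased over the past decade. This digitization creates a need for automated answering systems that extract useful information from redundant and transitional knowledge sources. Such systems are designed to deliver the most prominent answer from this giant knowledge source to a user's query using natural language understanding (NLU), and they therefore depend heavily on the question answering (QA) field. Question answering involves, but is not limited to, steps such as mapping the user's question to a pertinent query, retrieving relevant information, and finding the most suitable answer in the retrieved information. Recent improvements in deep learning models show compelling performance gains on all of these tasks. In this review, the research directions of the QA field are analyzed by question type, answer type, source of evidence and answer, and modeling approach. This analysis is followed by open challenges of the field, such as automatic question generation, similarity detection, and low resource availability for a language. Finally, a survey of available datasets and evaluation measures is presented.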
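Existing general-purpose evaluation metrics for machine translation or natural language generation have several issues, and the evaluation of question answering (QA) systems is no different in this respect. To build robust QA systems, we need equally robust evaluation systems that can verify whether a model's predictions for questions are similar to the ground-truth annotations. The ability to compare similarity based on semantics rather than pure string overlap is important for comparing models fairly and for indicating more realistic acceptance criteria in real-life applications. We build upon what is, to our knowledge, the first paper that uses transformer-based model metrics to assess semantic answer similarity, and we achieve higher correlations with human judgement in the case of no lexical overlap. We propose cross-encoder augmented bi-encoder and BERTScore models for semantic answer similarity, trained on a new dataset consisting of name pairs of US-American public figures. As far as we know, we provide the first dataset of co-referent name string pairs along with their similarities, which can be used for training. Machine Learning and Applications: 4th International Conference on Machine Learning and Applications (CMLA 2022), June 2022, Copenhagen, Denmark. Volume editors: David C. Wyld, Dhinaharan Nagamalai (Eds).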
Question Generation (QG) is fundamentally a simple syntactic transformation; however, many aspects of semantics influence what questions are good to form. We implement this observation by developing SynQG, a set of transparent syntactic rules leveraging universal dependencies, shallow semantic parsing, lexical resources, and custom rules which transform declarative sentences into question-answer pairs. We utilize PropBank argument descriptions and VerbNet state predicates to incorporate shallow semantic content, which helps generate questions of a descriptive nature and produce inferential and semantically richer questions than existing systems. In order to improve syntactic fluency and eliminate grammatically incorrect questions, we employ back-translation over the output of these syntactic rules. A set of crowd-sourced evaluations shows that our system can generate a larger number of highly grammatical and relevant questions than previous QG systems and that back-translation drastically improves grammaticality at a slight cost of generating irrelevant questions.
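Conversational question answering requires the ability to interpret a question correctly. However, current models are still unsatisfactory because co-reference and ellipsis in everyday conversation are difficult to understand. Although generative approaches have made remarkable progress, they are still hampered by semantic incompleteness. This paper proposes an action-based approach to recover the complete expression of the question. Specifically, we first locate the positions of co-reference or ellipsis in the question while assigning the corresponding action to each candidate span. We then look for matching phrases related to the candidate clues in the conversation context. Finally, according to the predicted action, we decide whether to replace the co-reference or supplement the ellipsis with the matched information. We demonstrate the effectiveness of our method on both English and Chinese utterance rewrite tasks, improving the state-of-the-art EM (exact match) by 3.9\%, and also improving ROUGE-L, on the Restoration-200K dataset.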
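The Query Focused Text Summarization (QFTS) task aims to build systems that generate a summary of the text document(s) based on a given query. A key challenge in addressing this task is the lack of large labeled datasets for training the summarization model. In this paper, we address this challenge by exploring a series of domain adaptation techniques. Given the recent success of pre-trained transformer models in a wide range of natural language processing tasks, we use such models to generate abstractive summaries for the QFTS task in both single-document and multi-document scenarios. For domain adaptation, we apply a variety of techniques to pre-trained transformer-based summarization models, including transfer learning, weakly supervised learning, and distant supervision. Extensive experiments on six datasets show that our proposed approach is very effective at generating abstractive summaries for the QFTS task while setting new state-of-the-art results across a set of automatic and human evaluation metrics.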
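The ability to convey relevant and faithful information is critical for many tasks in conditional generation, yet it remains elusive for neural seq-to-seq models, whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. Our work proposes a new conceptualization of text plans as a sequence of question-answer (QA) pairs. We enhance existing datasets (e.g., for summarization) with a QA blueprint that operates as a proxy for both content selection (i.e., what to say) and planning (i.e., in what order). We obtain blueprints automatically by exploiting state-of-the-art question generation technology and convert input-output pairs into input-blueprint-output tuples. We develop Transformer-based models, each varying in how it incorporates the blueprint into the generated output (e.g., as a global plan or iteratively). Evaluation across metrics and datasets demonstrates that blueprint models are more factual than alternatives that do not resort to planning, and that they allow tighter control over the generated output.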
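Academic research is an exploratory activity that aims to solve problems that have never been solved before. By its nature, each academic research work must include a literature review to distinguish the novelties that have not been addressed by prior work. In natural language processing, this literature review is usually presented in the "Related Work" section. The task of automatic related work generation aims to generate the "Related Work" section automatically, given the rest of the research paper and a list of cited papers. Although this task was proposed 10 years ago, it has only recently been treated as a variant of the scientific multi-document summarization problem. Even today, however, the problems of automatic related work generation and citation text generation are not yet standardized. In this survey, we conduct a meta-study that compares the existing literature on related work generation from the perspectives of problem formulation, dataset collection, methodological approach, performance evaluation, and future prospects, in order to give the reader insight into the progress of state-of-the-art studies and into how future studies can be conducted. We also survey relevant fields of study that we suggest future work consider integrating.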
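In recent years, low-resource Machine Reading Comprehension (MRC) has made significant progress, with models achieving remarkable performance on datasets in various languages. However, none of these models has been customized for the Urdu language. This work explores the semi-automated creation of the Urdu Question Answering Dataset (UQuAD1.0) by combining machine-translated SQuAD with human-generated samples derived from Wikipedia articles and Urdu RC worksheets from Cambridge O-level books. UQuAD1.0 is a large-scale Urdu dataset for extractive machine reading comprehension tasks, consisting of 49k question-answer pairs in question, passage, and answer format. In UQuAD1.0, 45,000 QA pairs were generated by machine translation of the original SQuAD1.0 and approximately 4,000 pairs via crowdsourcing. In this study, we used two types of MRC models: a rule-based baseline and advanced Transformer-based models. We found that the latter outperform the former, and we therefore decided to focus on Transformer-based architectures. Using XLMRoBERTa and multilingual BERT, we obtained F1 scores of 0.66 and 0.63, respectively.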
The General QA field has been developing the methodology referencing the Stanford Question answering dataset (SQuAD) as the significant benchmark. However, compiling factual questions is accompanied by time- and labour-consuming annotation, limiting the training data's potential size. We present the WikiOmnia dataset, a new publicly available set of QA-pairs and corresponding Russian Wikipedia article summary sections, composed with a fully automated generative pipeline. The dataset includes every available article from Wikipedia for the Russian language. The WikiOmnia pipeline is available open-source and is also tested for creating SQuAD-formatted QA on other domains, like news texts, fiction, and social media. The resulting dataset includes two parts: raw data on the whole Russian Wikipedia (7,930,873 QA pairs with paragraphs for ruGPT-3 XL and 7,991,040 QA pairs with paragraphs for ruT5-large) and cleaned data with strict automatic verification (over 160,000 QA pairs with paragraphs for ruGPT-3 XL and over 3,400,000 QA pairs with paragraphs for ruT5-large).
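Large language models can produce fluent dialogue but often hallucinate factual inaccuracies. While retrieval-augmented models help alleviate this issue, they still face the difficult challenge of reasoning to provide correct knowledge and generating conversation at the same time. In this work, we propose a modular model, Knowledge to Response (K2R), for incorporating knowledge into conversational agents, which breaks this problem down into two easier steps. K2R first generates a knowledge sequence, given a dialogue context, as an intermediate step. After this "reasoning step", the model attends to its own generated knowledge sequence, as well as the dialogue context, to produce a final response. In detailed experiments, we find that such a model hallucinates less in knowledge-grounded dialogue tasks and has advantages in terms of interpretability and modularity. In particular, it can be used to fuse QA and dialogue systems together, enabling dialogue agents to give knowledgeable answers or QA models to give conversational responses in a zero-shot setting.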
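The two-step decomposition can be sketched as a small wrapper around two callables. Everything here is an assumption made for illustration: `k2r_respond`, `knowledge_model`, and `response_model` are hypothetical names, and the stand-in lambdas in the usage example replace what would be trained sequence-to-sequence models in the actual work.

```python
from typing import Callable, Tuple


def k2r_respond(
    context: str,
    knowledge_model: Callable[[str], str],
    response_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Run a knowledge step followed by a response step.

    Step 1: generate a knowledge sequence from the dialogue context alone.
    Step 2: generate the final response from the context plus that knowledge.
    Returning both makes the intermediate knowledge inspectable.
    """
    knowledge = knowledge_model(context)
    response = response_model(context, knowledge)
    return knowledge, response


if __name__ == "__main__":
    # Stand-in models (assumptions): a QA-style knowledge step and a
    # template-based response step.
    fake_knowledge = lambda ctx: "Paris"
    fake_response = lambda ctx, k: f"I believe it is {k}."
    print(k2r_respond("What is the capital of France?", fake_knowledge, fake_response))
```

Keeping the two steps behind separate callables reflects the modularity claim: either component can be swapped (for example, a QA model as the knowledge step) without touching the other.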
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed, with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.