Table-and-text hybrid question answering (HybridQA) is a widely used and challenging NLP task commonly applied in the financial and scientific domain. The early research focuses on migrating other QA task methods to HybridQA, while with further research, more and more HybridQA-specific methods have been present. With the rapid development of HybridQA, the systematic survey is still under-explored to summarize the main techniques and advance further research. So we present this work to summarize the current HybridQA benchmarks and methods, then analyze the challenges and future directions of this task. The contributions of this paper can be summarized in three folds: (1) first survey, to our best knowledge, including benchmarks, methods and challenges for HybridQA; (2) systematic investigation with the reasonable comparison of the existing systems to articulate their advantages and shortcomings; (3) detailed analysis of challenges in four important dimensions to shed light on future directions.
translated by 谷歌翻译
表问题回答(表QA)是指从表中提供精确的答案来回答用户的问题。近年来,在表质量检查方面有很多作品,但是对该研究主题缺乏全面的调查。因此,我们旨在提供表QA中可用数据集和代表性方法的概述。我们根据其技术将现有的表质量质量质量检查分为五个类别,其中包括基于语义的,生成,提取,基于匹配的基于匹配的方法和基于检索的方法。此外,由于表质量质量质量检查仍然是现有方法的一项艰巨的任务,因此我们还识别和概述了一些关键挑战,并讨论了表质量质量检查的潜在未来方向。
translated by 谷歌翻译
Structured tabular data exist across nearly all fields. Reasoning task over these data aims to answer questions or determine the truthiness of hypothesis sentences by understanding the semantic meaning of a table. While previous works have devoted significant efforts to the tabular reasoning task, they always assume there are sufficient labeled data. However, constructing reasoning samples over tables (and related text) is labor-intensive, especially when the reasoning process is complex. When labeled data is insufficient, the performance of models will suffer an unendurable decline. In this paper, we propose a unified framework for unsupervised complex tabular reasoning (UCTR), which generates sufficient and diverse synthetic data with complex logic for tabular reasoning tasks, assuming no human-annotated data at all. We first utilize a random sampling strategy to collect diverse programs of different types and execute them on tables based on a "Program-Executor" module. To bridge the gap between the programs and natural language sentences, we design a powerful "NL-Generator" module to generate natural language sentences with complex logic from these programs. Since a table often occurs with its surrounding texts, we further propose novel "Table-to-Text" and "Text-to-Table" operators to handle joint table-text reasoning scenarios. This way, we can adequately exploit the unlabeled table resources to obtain a well-performed reasoning model under an unsupervised setting. Our experiments cover different tasks (question answering and fact verification) and different domains (general and specific), showing that our unsupervised methods can achieve at most 93% performance compared to supervised models. We also find that it can substantially boost the supervised performance in low-resourced domains as a data augmentation technique. Our code is available at https://github.com/leezythu/UCTR.
translated by 谷歌翻译
在现实世界中的问题回答场景中,将表格和文本内容均结合的混合形式吸引了越来越多的关注,其中数值推理问题是最典型和最具挑战性的问题之一。现有方法通常采用编码器框架来表示混合内容并生成答案。但是,它无法捕获编码器侧数值,表格架构和文本信息之间的丰富关系。解码器使用一个简单的预定义运算符分类器,该分类器的灵活性不足以处理具有不同表达式的数值推理过程。为了解决这些问题,本文提出了一个\ textbf {re} lational \ textbf {g} raph增强\ textbf {h} ybrid table-text \ textbf {n}带有\ textbf {t textbf {t text} ree decoder(\ textbff recoder(\ textbf) {reghnt})。它模拟了对表 - 文本混合内容的回答的数值问题,作为表达树的生成任务。此外,我们提出了一种新颖的关系图建模方法,该方法模拟了问题,表和段落之间的对齐方式。我们验证了公开可用的Table-Text混合质量质量质量标准(TAT-QA)的模型。拟议的reghnt显着胜过基线模型,并实现最新结果\脚注{我们在〜\ url {https://github.com/lfy79001/reghnt}}}〜(20222)公开发布了源代码和数据-05-05)。
translated by 谷歌翻译
Machine reading comprehension (MRC) is a long-standing topic in natural language processing (NLP). The MRC task aims to answer a question based on the given context. Recently studies focus on multi-hop MRC which is a more challenging extension of MRC, which to answer a question some disjoint pieces of information across the context are required. Due to the complexity and importance of multi-hop MRC, a large number of studies have been focused on this topic in recent years, therefore, it is necessary and worth reviewing the related literature. This study aims to investigate recent advances in the multi-hop MRC approaches based on 31 studies from 2018 to 2022. In this regard, first, the multi-hop MRC problem definition will be introduced, then 31 models will be reviewed in detail with a strong focus on their multi-hop aspects. They also will be categorized based on their main techniques. Finally, a fine-grain comprehensive comparison of the models and techniques will be presented.
translated by 谷歌翻译
目前用于开放域问题的最先进的生成模型(ODQA)专注于从非结构化文本信息生成直接答案。但是,大量的世界知识存储在结构化数据库中,并且需要使用SQL等查询语言访问。此外,查询语言可以回答需要复杂推理的问题,以及提供完全的解释性。在本文中,我们提出了一个混合框架,将文本和表格证据占据了输入,并根据哪种形式更好地回答这个问题而生成直接答案或SQL查询。然后可以在关联的数据库上执行生成的SQL查询以获得最终答案。据我们所知,这是第一种将Text2SQL与ODQA任务应用于ODQA任务的论文。凭经验,我们证明,在几个ODQA数据集上,混合方法始终如一地优于仅采用大边缘的均匀输入的基线模型。具体地,我们使用T5基础模型实现OpenSquad数据集的最先进的性能。在一个详细的分析中,我们证明能够生成结构的SQL查询可以始终带来增益,特别是对于那些需要复杂推理的问题。
translated by 谷歌翻译
Multi-hop Machine reading comprehension is a challenging task with aim of answering a question based on disjoint pieces of information across the different passages. The evaluation metrics and datasets are a vital part of multi-hop MRC because it is not possible to train and evaluate models without them, also, the proposed challenges by datasets often are an important motivation for improving the existing models. Due to increasing attention to this field, it is necessary and worth reviewing them in detail. This study aims to present a comprehensive survey on recent advances in multi-hop MRC evaluation metrics and datasets. In this regard, first, the multi-hop MRC problem definition will be presented, then the evaluation metrics based on their multi-hop aspect will be investigated. Also, 15 multi-hop datasets have been reviewed in detail from 2017 to 2022, and a comprehensive analysis has been prepared at the end. Finally, open issues in this field have been discussed.
translated by 谷歌翻译
知识基础问题回答(KBQA)旨在通过知识库(KB)回答问题。早期研究主要集中于回答有关KB的简单问题,并取得了巨大的成功。但是,他们在复杂问题上的表现远非令人满意。因此,近年来,研究人员提出了许多新颖的方法,研究了回答复杂问题的挑战。在这项调查中,我们回顾了KBQA的最新进展,重点是解决复杂问题,这些问题通常包含多个主题,表达复合关系或涉及数值操作。详细说明,我们从介绍复杂的KBQA任务和相关背景开始。然后,我们描述用于复杂KBQA任务的基准数据集,并介绍这些数据集的构建过程。接下来,我们提出两个复杂KBQA方法的主流类别,即基于语义解析的方法(基于SP)的方法和基于信息检索的方法(基于IR)。具体而言,我们通过流程设计说明了他们的程序,并讨论了它们的主要差异和相似性。之后,我们总结了这两类方法在回答复杂问题时会遇到的挑战,并解释了现有工作中使用的高级解决方案和技术。最后,我们结论并讨论了与复杂的KBQA有关的几个有希望的方向,以进行未来的研究。
translated by 谷歌翻译
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
translated by 谷歌翻译
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.
translated by 谷歌翻译
自动问题应答(QA)系统的目的是以时间有效的方式向用户查询提供答案。通常在数据库(或知识库)或通常被称为语料库的文件集合中找到答案。在过去的几十年里,收购知识的扩散,因此生物医学领域的新科学文章一直是指数增长。因此,即使对于领域专家,也难以跟踪域中的所有信息。随着商业搜索引擎的改进,用户可以在某些情况下键入其查询并获得最相关的一小组文档,以及在某些情况下从文档中的相关片段。但是,手动查找所需信息或答案可能仍然令人疑惑和耗时。这需要开发高效的QA系统,该系统旨在为用户提供精确和精确的答案提供了生物医学领域的自然语言问题。在本文中,我们介绍了用于开发普通域QA系统的基本方法,然后彻底调查生物医学QA系统的不同方面,包括使用结构化数据库和文本集合的基准数据集和几种提出的方​​法。我们还探讨了当前系统的局限性,并探索潜在的途径以获得进一步的进步。
translated by 谷歌翻译
使用来自表格(TableQA)的信息回答自然语言问题是最近的兴趣。在许多应用程序中,表未孤立,但嵌入到非结构化文本中。通常,通过将其部分与表格单元格内容或非结构化文本跨度匹配,并从任一源中提取答案来最佳地回答问题。这导致了HybridQA数据集引入的TextableQA问题的新空间。现有的表格表示对基于变换器的阅读理解(RC)架构的适应性未通过单个系统解决两个表示的不同模式。培训此类系统因对遥远监督的需求而进一步挑战。为了降低认知负担,培训实例通常包括问题和答案,后者匹配多个表行和文本段。这导致嘈杂的多实例培训制度不仅涉及表的行,而且涵盖了链接文本的跨度。我们通过提出Mitqa来回应这些挑战,这是一个新的TextableQA系统,明确地模拟了表行选择和文本跨度选择的不同但密切相关的概率空间。与最近的基线相比,我们的实验表明了我们的方法的优越性。该方法目前在HybridQA排行榜的顶部,并进行了一个试验集,在以前公布的结果上实现了对em和f1的21%的绝对改善。
translated by 谷歌翻译
问题回答(QA)是最重要的自然语言处理(NLP)任务之一。它旨在使用NLP技术根据大规模的非结构化语料库生成对给定问题的相应答案。随着深度学习的发展,正在提出越来越具有挑战性的质量检查数据集,并且许多用于解决它们的新方法也正在出现。在本文中,我们研究了在深度学习时代发布的有影响力的质量检查数据集。具体来说,我们首先引入两个最常见的质量检查任务 - 文本问题答案和视觉问题 - 分别涵盖最具代表性的数据集,然后给出质量检查研究的一些当前挑战。
translated by 谷歌翻译
过去十年互联网上可用的信息和信息量增加。该数字化导致自动应答系统需要从冗余和过渡知识源中提取富有成效的信息。这些系统旨在利用自然语言理解(NLU)从此巨型知识源到用户查询中最突出的答案,从而取决于问题答案(QA)字段。问题答案涉及但不限于用户问题映射的步骤,以获取相关查询,检索相关信息,从检索到的信息等找到最佳合适的答案等。当前对深度学习模型的当前改进估计所有这些任务的令人信服的性能改进。在本综述工作中,根据问题的类型,答案类型,证据答案来源和建模方法进行分析QA场的研究方向。此细节随后是自动问题生成,相似性检测和语言的低资源可用性等领域的开放挑战。最后,提出了对可用数据集和评估措施的调查。
translated by 谷歌翻译
文本到SQL解析是一项必不可少且具有挑战性的任务。文本到SQL解析的目的是根据关系数据库提供的证据将自然语言(NL)问题转换为其相应的结构性查询语言(SQL)。来自数据库社区的早期文本到SQL解析系统取得了显着的进展,重度人类工程和用户与系统的互动的成本。近年来,深层神经网络通过神经生成模型显着提出了这项任务,该模型会自动学习从输入NL问题到输出SQL查询的映射功能。随后,大型的预训练的语言模型将文本到SQL解析任务的最新作品带到了一个新级别。在这项调查中,我们对文本到SQL解析的深度学习方法进行了全面的评论。首先,我们介绍了文本到SQL解析语料库,可以归类为单转和多转。其次,我们提供了预先训练的语言模型和现有文本解析方法的系统概述。第三,我们向读者展示了文本到SQL解析所面临的挑战,并探索了该领域的一些潜在未来方向。
translated by 谷歌翻译
文档视觉问题回答(VQA)旨在了解视觉上富裕的文档,以自然语言回答问题,这是自然语言处理和计算机视觉的新兴研究主题。在这项工作中,我们介绍了一个名为TAT-DQA的新文档VQA数据集,该数据集由3,067个文档页面组成,其中包含半结构化表和非结构化文本以及16,558个问答,通过扩展Tat-QA Dataset。这些文档是从现实世界中的财务报告中取样的,并包含大量数字,这意味着要求离散的推理能力回答该数据集上的问题。基于TAT-DQA,我们进一步开发了一个名为MHST的新型模型,该模型在包括文本,布局和视觉图像在内的多模式中考虑了信息,以智能地以相应的策略(即提取或推理)智能地解决不同类型的问题。广泛的实验表明,MHST模型明显优于基线方法,证明其有效性。但是,表演仍然远远落后于专家人类。我们预计,我们的新Tat-DQA数据集将有助于研究对视觉和语言结合的视觉丰富文档的深入理解,尤其是对于需要离散推理的场景。另外,我们希望拟议的模型能够激发研究人员将来设计更高级的文档VQA模型。
translated by 谷歌翻译
Pre-trained Language Models (PLMs) which are trained on large text corpus through the self-supervised learning method, have yielded promising performance on various tasks in Natural Language Processing (NLP). However, though PLMs with huge parameters can effectively possess rich knowledge learned from massive training text and benefit downstream tasks at the fine-tuning stage, they still have some limitations such as poor reasoning ability due to the lack of external knowledge. Incorporating knowledge into PLMs has been tried to tackle these issues. In this paper, we present a comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) to provide a clear insight into this thriving field. We introduce appropriate taxonomies respectively for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight the focus of these two kinds of tasks. For NLU, we take several types of knowledge into account and divide them into four categories: linguistic knowledge, text knowledge, knowledge graph (KG), and rule knowledge. The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods. Finally, we point out some promising future directions of KE-PLMs.
translated by 谷歌翻译
文本到SQL引起了自然语言处理和数据库社区的关注,因为它能够将自然语言中的语义转换为SQL查询及其在构建自然语言接口到数据库系统中的实际应用。文本到SQL的主要挑战在于编码自然话语的含义,解码为SQL查询,并翻译这两种形式之间的语义。这些挑战已被最近的进步解决了不同的范围。但是,对于这项任务仍缺乏全面的调查。为此,我们回顾了有关数据集,方法和评估的文本到SQL的最新进展,并提供了这项系统的调查,解决了上述挑战并讨论潜在的未来方向。我们希望这项调查可以作为快速获取现有工作并激励未来的研究。
translated by 谷歌翻译
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, \textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained \underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to overcome the above-mentioned limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMS, present the applications of KEPLMs in downstream tasks, and discuss the future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
translated by 谷歌翻译
Hybrid tabular-textual question answering (QA) requires reasoning from heterogeneous information, and the types of reasoning are mainly divided into numerical reasoning and span extraction. Despite being the main challenge of the task compared to extractive QA, current numerical reasoning method simply uses LSTM to autoregressively decode program sequences, and each decoding step produces either an operator or an operand. However, the step-by-step decoding suffers from exposure bias, and the accuracy of program generation drops sharply with progressive decoding. In this paper, we propose a non-autoregressive program generation framework, which facilitates program generation in parallel. Our framework, which independently generates complete program tuples containing both operators and operands, can significantly boost the speed of program generation while addressing the error accumulation issue. Our experiments on the MultiHiertt dataset shows that our model can bring about large improvements (+7.97 EM and +6.38 F1 points) over the strong baseline, establishing the new state-of-the-art performance, while being much faster (21x) in program generation. The performance drop of our method is also significantly smaller than the baseline with increasing numbers of numerical reasoning steps.
translated by 谷歌翻译