Reading comprehension (RC), in contrast to information retrieval, requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecting answers using superficial information (e.g., local context similarity or global term frequency); they thus fail to test for the essential integrative aspect of RC. To encourage progress on deeper comprehension of language, we present a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. These tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. We show that although humans solve the tasks easily, standard RC models struggle on the tasks presented here. We provide an analysis of the dataset and the challenges it presents.
We present WIKIREADING, a large-scale natural language understanding task and publicly available dataset with 18 million instances. The task is to predict textual values from the structured knowledge base Wikidata by reading the text of the corresponding Wikipedia articles. The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs). We compare various state-of-the-art DNN-based architectures for document classification, information extraction, and question answering. We find that models supporting a rich answer space, such as word or character sequences, perform best. Our best-performing model, a word-level sequence-to-sequence model with a mechanism to copy out-of-vocabulary words, obtains an accuracy of 71.8%.
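To make the task concrete, here is a minimal sketch of what a WIKIREADING-style instance might look like; the field names and the example are illustrative assumptions, not the dataset's actual schema.

```python
# A minimal sketch of a WIKIREADING-style instance; field names are
# illustrative assumptions, not the dataset's actual schema.
from dataclasses import dataclass

@dataclass
class Instance:
    document: str   # text of the Wikipedia article
    prop: str       # Wikidata property to predict (named to avoid `property`)
    answer: str     # textual value of the property

ex = Instance(
    document="Berlin is the capital and largest city of Germany...",
    prop="capital of",
    answer="Germany",
)

# Extraction sub-tasks have the answer verbatim in the text;
# classification sub-tasks (e.g. broad categories) often do not.
is_extractive = ex.answer.lower() in ex.document.lower()
print(is_extractive)  # True for this example
```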
This paper presents an end-to-end neural network model, named Neural Generative Question Answering (GENQA), that can generate answers to simple factoid questions based on the facts in a knowledge base. More specifically, the model is built on the encoder-decoder framework for sequence-to-sequence learning, while equipped with the ability to query the knowledge base, and is trained on a corpus of question-answer pairs with their associated triples in the knowledge base. An empirical study shows the proposed model can effectively deal with variations of questions and answers, and generate correct and natural answers by referring to the facts in the knowledge base. An experiment on question answering demonstrates that the proposed model can outperform an embedding-based QA model as well as a neural dialogue model trained on the same data.
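As a rough illustration of the mechanism described above, the following toy sketch mixes a vocabulary softmax with the probability of copying a KB-retrieved answer word via a sigmoid gate. The shapes, the single-fact setup, and the gate form are assumptions for illustration, not the paper's exact parameterization.

```python
# Hedged sketch of the central GENQA idea: at each decoding step a
# sigmoid gate decides how much probability mass to put on the answer
# word retrieved from the KB triple versus common vocabulary words.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

hidden = rng.normal(size=16)            # decoder state at this step
W_vocab = rng.normal(size=(100, 16))    # projection to a toy 100-word vocab
w_gate = rng.normal(size=16)

gate = 1.0 / (1.0 + np.exp(-(w_gate @ hidden)))  # P(emit the KB word)
p_common = (1.0 - gate) * softmax(W_vocab @ hidden)

# The full output distribution puts `gate` mass on the retrieved KB word
# and the rest on common words; it still sums to one.
print(round(gate + p_common.sum(), 6))  # 1.0
```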
This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task.
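A minimal sketch of the retrieval component can make the idea concrete: unigrams and bigrams are hashed into a fixed number of buckets and documents are ranked by TF-IDF-weighted overlap with the question. The bucket count and weighting below are simplifications, not the paper's exact setup.

```python
# Toy bigram-hashing + TF-IDF retriever in the spirit of the search
# component described above; a simplification, not the paper's code.
import math
from collections import Counter

N_BUCKETS = 2 ** 20

def ngrams(text):
    toks = text.lower().split()
    return toks + [" ".join(p) for p in zip(toks, toks[1:])]

def hashed_counts(text):
    return Counter(hash(g) % N_BUCKETS for g in ngrams(text))

docs = [
    "Paris is the capital of France .",
    "Berlin is the capital of Germany .",
    "The Eiffel Tower is in Paris .",
]
doc_vecs = [hashed_counts(d) for d in docs]

# Inverse document frequency per hashed n-gram bucket.
df = Counter(b for v in doc_vecs for b in v)
idf = {b: math.log(len(docs) / df[b]) for b in df}

def score(question, doc_vec):
    q = hashed_counts(question)
    return sum(q[b] * doc_vec[b] * idf.get(b, 0.0) for b in q)

q = "What is the capital of France ?"
best = max(range(len(docs)), key=lambda i: score(q, doc_vecs[i]))
print(docs[best])  # expected: the France sentence
```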
With the rapid growth of knowledge bases (KBs) on the web, how to take full advantage of them becomes increasingly important. Question answering over knowledge base (KB-QA) is one of the promising approaches to access this substantial knowledge. Meanwhile, as neural network-based (NN-based) methods develop, NN-based KB-QA has already achieved impressive results. However, previous work did not put much emphasis on question representation: the question is converted into a fixed vector regardless of its candidate answers. This simple representation strategy cannot easily express the proper information in the question. Hence, we present an end-to-end neural network model to represent the questions and their corresponding scores dynamically according to the various candidate answer aspects via a cross-attention mechanism. In addition, we leverage the global knowledge inside the underlying KB, aiming at integrating the rich KB information into the representation of the answers. As a result, it can alleviate the out-of-vocabulary (OOV) problem, which helps the cross-attention model represent the question more precisely. The experimental results on WebQuestions demonstrate the effectiveness of the proposed approach.
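A small sketch of the cross-attention idea, under illustrative assumptions (dimensions, dot-product scoring): each answer aspect attends over the question words to build an aspect-specific question summary, and the final score sums the aspect-summary similarities.

```python
# Toy cross-attention between a question and one candidate answer's
# aspects (entity, relation, type, context); shapes and the scoring
# function are illustrative assumptions, not the paper's exact model.
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

q_words = rng.normal(size=(6, 8))      # 6 question word vectors, dim 8
aspects = rng.normal(size=(4, 8))      # 4 answer-aspect embeddings

score = 0.0
for a in aspects:
    attn = softmax(q_words @ a)        # attention of this aspect over words
    q_summary = attn @ q_words         # aspect-specific question vector
    score += float(q_summary @ a)      # similarity of summary and aspect
print(score)
```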
A long-term goal of machine learning is to build intelligent conversational agents. One recent popular approach is to train end-to-end models on a large amount of real dialog transcripts between humans (Sordoni et al., 2015; Vinyals & Le, 2015; Shang et al., 2015). However, this approach leaves many questions unanswered, as the precise successes and shortcomings of each model are hard to assess. A contrasting recent proposal is the bAbI tasks (Weston et al., 2015b), which are synthetic data that measure the ability of learning machines at various reasoning tasks over toy language. Unfortunately, those tests are very small and hence may encourage methods that do not scale. In this work, we propose a suite of new tasks of a much larger scale that attempt to bridge the gap between the two regimes. Choosing the domain of movies, we provide tasks that test the ability of models to answer factual questions (utilizing OMDB), provide personalization (utilizing MovieLens), carry short conversations about the two, and finally to perform on natural dialogs from Reddit. We provide a dataset covering ∼75k movie entities and with ∼3.5M training examples. We present results of various models on these tasks, and evaluate their performance.
Information Extraction (IE) refers to automatically extracting structured relation tuples from unstructured texts. Common IE solutions, including Relation Extraction (RE) and open IE systems, can hardly handle cross-sentence tuples, and are severely restricted by limited relation types as well as informal relation specifications (e.g., free-text based relation tuples). In order to overcome these weaknesses, we propose a novel IE framework named QA4IE, which leverages flexible question answering (QA) approaches to produce high-quality relation triples across sentences. Based on the framework, we develop a large IE benchmark with high-quality human evaluation. This benchmark contains 293K documents, 2M golden relation triples, and 636 relation types. We compare our system with some IE baselines on our benchmark and the results show that our system achieves great improvements.
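Conceptually, casting IE as QA can be sketched as follows: for each entity and relation type, pose a templated question and let a QA model return a span that becomes the object of a triple. The `qa_model` below is a stand-in stub, not QA4IE's actual reader.

```python
# Conceptual sketch of QA-driven IE: templated questions per entity and
# relation type, answered by a QA model, yield relation triples.
def qa_model(question, document):
    # Stub: a real system would run a trained reader here.
    if "born" in question and "born in" in document:
        return document.split("born in ")[1].split(".")[0]
    return None

def extract_triples(document, entities, relations):
    triples = []
    for ent in entities:
        for rel, template in relations.items():
            answer = qa_model(template.format(ent), document)
            if answer:
                triples.append((ent, rel, answer))
    return triples

doc = "Ada Lovelace was born in London. She worked with Charles Babbage."
rels = {"place_of_birth": "Where was {} born?"}
print(extract_triples(doc, ["Ada Lovelace"], rels))
# [('Ada Lovelace', 'place_of_birth', 'London')]
```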
Existing knowledge-based question answering systems often rely on small annotated training data. While shallow methods like relation extraction are robust to data scarcity, they are less expressive than deep meaning representation methods like semantic parsing, thereby failing at answering questions involving multiple constraints. Here we alleviate this problem by empowering a relation extraction method with additional evidence from Wikipedia. We first present a neural network based relation extractor to retrieve the candidate answers from Freebase, and then infer over Wikipedia to validate these answers. Experiments on the WebQuestions question answering dataset show that our method achieves an F1 of 53.3%, a substantial improvement over the state-of-the-art.
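The two-stage idea can be sketched roughly as follows: candidates retrieved from the KB are re-ranked by whether the relevant Wikipedia text supports them. The string-match scoring is a toy stand-in for the paper's neural extractor and inference step.

```python
# Toy sketch of KB retrieval followed by Wikipedia validation; the
# evidence bonus is a stand-in for the paper's inference over Wikipedia.
def rank_with_wikipedia(candidates, wiki_text):
    validated = []
    for cand, kb_score in candidates:
        # Evidence bonus if the candidate is mentioned on the page.
        bonus = 1.0 if cand.lower() in wiki_text.lower() else 0.0
        validated.append((cand, kb_score + bonus))
    return sorted(validated, key=lambda x: -x[1])

wiki = "Barack Obama was born in Honolulu, Hawaii."
cands = [("Honolulu", 0.6), ("Chicago", 0.7)]  # KB scores alone mislead
print(rank_with_wikipedia(cands, wiki))
# [('Honolulu', 1.6), ('Chicago', 0.7)]
```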
Question answering is one of the most important and most difficult applications at the border of information retrieval and natural language processing, especially when we talk about complex science questions that require some form of inference to determine the correct answer. In this paper, we present a two-step method that combines information retrieval techniques optimized for question answering with deep learning models for natural language inference, in order to solve multiple-choice questions in the science domain. For each question-answer pair, we use standard retrieval-based models to find relevant candidate contexts and decompose the main problem into two distinct sub-problems. First, using retrieval models from Lucene, a correctness score is assigned to each candidate answer based on its context. Second, we use a deep learning architecture to compute whether a candidate answer can be inferred from selected contexts composed of sentences retrieved from a knowledge base. Finally, all these solvers are combined using a simple neural network to predict the correct answer. The proposed two-step model outperforms the best retrieval-based solver by over 3% in absolute accuracy.
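A small sketch of the final combination step, with made-up weights (the paper learns the combiner): each candidate answer gets a retrieval score and an entailment score, and a linear combination picks the answer.

```python
# Toy combiner over per-candidate scores for a 4-way multiple-choice
# question; the weights are illustrative, not the learned values.
import numpy as np

retrieval = np.array([0.8, 0.3, 0.5, 0.2])   # e.g. Lucene-based scores
entailment = np.array([0.4, 0.9, 0.3, 0.1])  # e.g. NLI model scores

w = np.array([0.6, 0.4])                     # combiner weights (illustrative)
combined = w[0] * retrieval + w[1] * entailment
print(int(np.argmax(combined)))              # index of the predicted answer
```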
Conversational machine comprehension requires a deep understanding of the conversation history. To enable traditional single-turn models to encode the history comprehensively, we introduce Flow, a mechanism that incorporates intermediate representations generated during the process of answering previous questions, through an alternating parallel processing structure. Compared to shallow approaches that concatenate previous questions/answers as input, Flow integrates the latent semantics of the conversation history more deeply. Our model, FlowQA, shows superior performance on two recently proposed conversational challenges (+7.2% on CoQA and +4.0% on QuAC). The effectiveness of Flow also shows in other tasks: by reducing sequential instruction understanding to conversational machine comprehension, FlowQA outperforms the best models on all three domains in SCONE, with improvements of +1.8% to +4.4% in accuracy.
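A schematic sketch of the Flow idea, under toy assumptions: the intermediate per-context-word representations produced while answering question t-1 are fed into the encoding of question t, so reasoning can carry across turns. The mixing function below stands in for the paper's stacked recurrent layers.

```python
# Schematic Flow-style sketch: intermediate representations from the
# previous turn become input to the current turn's reasoning layer.
import numpy as np

rng = np.random.default_rng(2)
CTX_LEN, DIM = 5, 8

def encode_turn(ctx, prev_flow):
    # Toy "reasoning layer": mix each context vector with the flow state
    # carried over from the previous turn (a real model uses RNN layers).
    mixed = np.tanh(ctx + prev_flow)
    return mixed  # becomes the flow input for the next turn

context = rng.normal(size=(CTX_LEN, DIM))
flow = np.zeros((CTX_LEN, DIM))      # no history before the first turn
for turn in range(3):                # three questions in the dialog
    flow = encode_turn(context, flow)
    print(f"turn {turn}: flow norm {np.linalg.norm(flow):.3f}")
```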
Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires reasoning over visual elements of the image and general knowledge to infer the correct answer. In the first part of this survey, we examine the state of the art by comparing modern approaches to the problem. We classify methods by their mechanism to connect the visual and textual modalities. In particular, we examine the common approach of combining convolutional and recurrent neural networks to map images and questions to a common feature space. We also discuss memory-augmented and modular architectures that interface with structured knowledge bases. In the second part of this survey, we review the datasets available for training and evaluating VQA systems. The various datasets contain questions at different levels of complexity, which require different capabilities and types of reasoning. We examine in depth the question/answer pairs from the Visual Genome project, and evaluate the relevance of the structured annotations of images with scene graphs for VQA. Finally, we discuss promising future directions for the field, in particular the connection to structured knowledge bases and the use of natural language processing models.
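A bare-bones sketch of the joint-embedding approach mentioned above, with illustrative shapes: pretrained CNN image features and an RNN question encoding are projected into a shared space, fused elementwise, and classified over candidate answers.

```python
# Toy joint-embedding VQA model; all shapes and the elementwise-product
# fusion are illustrative assumptions, not any specific paper's model.
import numpy as np

rng = np.random.default_rng(4)

img_feat = rng.normal(size=2048)           # e.g. pooled CNN features
q_feat = rng.normal(size=512)              # e.g. final RNN hidden state

W_img = rng.normal(size=(256, 2048)) * 0.01
W_q = rng.normal(size=(256, 512)) * 0.01
W_ans = rng.normal(size=(10, 256)) * 0.01  # 10 candidate answers

joint = np.tanh(W_img @ img_feat) * np.tanh(W_q @ q_feat)  # fusion
logits = W_ans @ joint
print(int(np.argmax(logits)))              # predicted answer index
```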
Recent development of large-scale question answering (QA) datasets triggered a substantial amount of research into end-to-end neural architectures for QA. Increasingly complex systems have been conceived without comparison to simpler neural baseline systems that would justify their complexity. In this work, we propose a simple heuristic that guides the development of neural baseline systems for the extractive QA task. We find that there are two ingredients necessary for building a high-performing neural QA system: first, awareness of question words while processing the context and, second, a composition function that goes beyond simple bag-of-words modeling, such as recurrent neural networks. Our results show that FastQA, a system that meets these two requirements, can achieve very competitive performance compared with existing models. We argue that this surprising finding puts the results of previous systems and the complexity of recent QA datasets into perspective.
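The first ingredient, question-word awareness, can be sketched as a simple feature computation: each context token gets a binary word-in-question feature (plus a weighted variant) before being fed to the RNN composition function. The weighting below is a simplification of FastQA's actual feature definitions.

```python
# Toy word-in-question features per context token; a simplification of
# FastQA's feature definitions, for illustration only.
from collections import Counter

def wiq_features(context_tokens, question_tokens):
    q_set = set(question_tokens)
    counts = Counter(context_tokens)
    feats = []
    for tok in context_tokens:
        binary = 1.0 if tok in q_set else 0.0
        # Rarer overlapping tokens get more weight (illustrative choice).
        weighted = binary / counts[tok]
        feats.append((tok, binary, weighted))
    return feats

ctx = "the capital of france is paris and paris is large".split()
q = "what is the capital of france".split()
for tok, b, w in wiq_features(ctx, q):
    print(f"{tok:8s} binary={b:.0f} weighted={w:.2f}")
```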
A critical task for question answering is the final answer selection stage, which has to combine multiple signals available about each answer candidate. This paper proposes EviNets: a novel neural network architecture for factoid question answering. EviNets scores candidate answer entities by combining the available supporting evidence, e.g., structured knowledge bases and unstructured text documents. EviNets represents each piece of evidence with a dense embedding vector, scores their relevance to the question, and aggregates the support for each candidate to predict their final scores. Each of the components is generic and allows plugging in a variety of models for semantic similarity scoring and information aggregation. We demonstrate the effectiveness of EviNets in experiments on the existing TREC QA and WikiMovies benchmarks, and on the new Yahoo! Answers dataset introduced in this paper. EviNets can be extended to other information types and could facilitate future work on combining evidence signals for joint reasoning in question answering.
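A minimal sketch of the aggregate-the-evidence idea: each evidence snippet gets a dense embedding, its relevance to the question is a dot product, and support is summed per candidate. One-hot word vectors keep the toy deterministic; EviNets learns dense embeddings and scoring functions.

```python
# Toy evidence-aggregation in the spirit of EviNets: embed each snippet,
# score it against the question, sum support per candidate answer.
import numpy as np

words = "who directed alien ridley scott james cameron titanic".split()
VOCAB = {w: np.eye(len(words))[i] for i, w in enumerate(words)}

def embed(text):
    return np.mean([VOCAB[w] for w in text.split() if w in VOCAB], axis=0)

question = "who directed alien"
evidence = {
    "ridley scott": ["ridley scott directed alien"],
    "james cameron": ["james cameron directed titanic"],
}

q_vec = embed(question)
scores = {cand: sum(float(q_vec @ embed(e)) for e in evs)
          for cand, evs in evidence.items()}
print(max(scores, key=scores.get))  # "ridley scott"
```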
Question answering (QA) has benefited from deep learning techniques in recent years, but domain-specific QA remains a challenge due to the large amounts of data required to train neural networks. This paper studies the task of answer sentence selection in the Bible domain: answering questions by selecting relevant verses from the Bible. To this end, we create a new dataset, BibleQA, based on Bible trivia questions, and propose three neural network models for our task. We pre-train our models on a larger QA dataset, SQuAD, and study the effect of transferring weights on the models. We also measure model accuracy with different context lengths and different Bible translations. We confirm that transfer learning yields significant improvements in model accuracy. Using shorter context lengths yields relatively better results, whereas longer context lengths reduce model accuracy. We also find that using more modern Bible translations has a positive effect on the task.
We describe a new class of learning models called memory networks. Memory networks reason using inference components combined with a long-term memory component; they learn how to use these jointly. The long-term memory can be read and written, with the goal of using it for prediction. We investigate these models in the context of question answering (QA), where the long-term memory effectively acts as a (dynamic) knowledge base and the output is a textual response. We evaluate them on a large-scale QA task and on a smaller, but more complex, toy task generated from a simulated world. In the latter, we show the reasoning power of such models by chaining multiple supporting sentences to answer questions that require understanding the intension of verbs.
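A compact sketch of a single memory lookup in this spirit, using one-hot bag-of-words vectors as a deterministic stand-in for learned embeddings: the query is matched against stored memory embeddings and the best-matching memory supports the response.

```python
# Toy single-hop memory lookup in the spirit of memory networks;
# bag-of-words vectors stand in for learned embeddings.
import numpy as np

sentences = [
    "john went to the kitchen",
    "mary picked up the ball",
    "john dropped the apple",
]
vocab = sorted({w for s in sentences for w in s.split()} | {"who"})
idx = {w: i for i, w in enumerate(vocab)}

def bow(text):
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in idx:
            v[idx[w]] += 1.0
    return v

memories = np.stack([bow(s) for s in sentences])
query = bow("who picked up the ball")

match = memories @ query               # match score per memory
best = int(np.argmax(match))
print(sentences[best])                 # supporting memory for the answer
```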
We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. A thorough analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment. We measure human performance on the dataset and compare it to several strong neural models. The performance gap between humans and machines (0.198 in F1) indicates that significant progress can be made on NewsQA through future research. The dataset is freely available at https://datasets.maluuba.com/NewsQA.
Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide the sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain their predictions; (4) we offer a new type of factoid comparison question to test QA systems' ability to extract relevant facts and perform the necessary comparisons. We show that HotpotQA is challenging for the latest QA systems, and that the supporting facts enable models to improve performance and make explainable predictions.
We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total). Each dialog involves two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as we show in a detailed qualitative evaluation. We also report results for a number of reference models, including a recent state-of-the-art reading comprehension architecture extended to model dialog context. Our best model underperforms humans by 20 F1, suggesting there is ample room for future work on this data. The dataset, baselines, and a leaderboard are available at http://quac.ai.
We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from English exams for Chinese middle and high school students aged 12 to 18, RACE consists of nearly 28,000 passages and nearly 100,000 questions generated by human experts (English instructors), and covers a variety of topics carefully designed for evaluating the students' ability in understanding and reasoning. In particular, the proportion of questions that require reasoning is much larger in RACE than in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of state-of-the-art models (43%) and ceiling human performance (95%). We hope this new dataset can serve as a valuable resource for research and evaluation in machine comprehension. The dataset is freely available at
This paper aims at improving how machines can answer questions directly from text, with a focus on models that can correctly answer multiple types of questions from various types of texts, documents, or even large collections of them. To that end, we introduce the Weaver model, which uses a new way to relate a question to a textual context by weaving layers of recurrent networks, with the goal of making as few assumptions as possible as to how the information from both question and context should be combined to form the answer. We show empirically on six datasets that Weaver performs well in multiple conditions. For instance, it produces solid results on the very popular SQuAD dataset (Rajpurkar et al., 2016), solves almost all bAbI tasks (Weston et al., 2015), and greatly outperforms state-of-the-art methods for open-domain question answering from text (Chen et al., 2017).