Story generation and understanding -- as with all NLG/NLU tasks -- has seen a surge in neurosymbolic work. Researchers have recognized that, while large language models (LLMs) have tremendous utility, they can be augmented with symbolic means to be even better and to make up for any flaws that the neural networks might have. However, symbolic methods are extremely costly in terms of the amount of time and expertise needed to create them. In this work, we capitalize on state-of-the-art Code-LLMs, such as Codex, to bootstrap the use of symbolic methods for tracking the state of stories and aiding in story understanding. We show that our CoRRPUS system and abstracted prompting procedures can beat current state-of-the-art structured LLM techniques on pre-existing story understanding tasks (bAbI task 2 and Re^3) with minimal hand engineering. We hope that this work can help highlight the importance of symbolic representations and specialized prompting for LLMs as these models require some guidance for performing reasoning tasks properly.
translated by 谷歌翻译
人工推理通常可以理解为两个系统之间的相互作用:直观和关联(“系统1”)和审议和逻辑(“系统2”)。神经序列模型 - 在执行复杂,结构化任务时越来越成功 - 表现出系统1的优点和故障模式:它们是快速和学习数据的模式,但通常不一致和不连贯。在这项工作中,我们通过添加系统2-Inspired逻辑推理,寻求一种轻量级,无培训的手段来改善现有系统1样序列模型。我们探讨了该主题的几种变体,其中通过符号推理模块检查来自神经序列模型的候选几代,可以通过符号推理模块来接受或拒绝几代人。我们的方法使用神经推理来介导神经系统1和逻辑系统2.导致强大的故事生成和接地的指示,表明这种方法可以增加神经基代的一致性和准确性。
translated by 谷歌翻译
自动化讲故事长期以来一直抓住了研究人员在日常生活中的叙述中的难以感受。但是,在用神经语言模型产生叙述时,保持一致性并保持对特定结束的特定结束挑战。在本文中,我们介绍了读者模型(Storm)的故事生成,这是一个框架,其中读者模型用于推理故事的推理应该进步。读者模型是人类读者相信关于虚构故事世界的概念,实体和关系的人。我们展示了如何作为知识图表所代表的明确读者模型提供故事一致性,并以实现给定的故事世界目标的形式提供可控性。实验表明,我们的模型产生了显着更加连贯和主题的故事,优于尺寸的基线,包括情节合理性并保持主题。我们的系统也优于在未订购的情况下在组成给定概念时占总引导的故事生成基线。
translated by 谷歌翻译
Automated plot generation is the challenge of generating a sequence of events that will be perceived by readers as the plot of a coherent story. Traditional symbolic planners plan a story from a goal state and guarantee logical causal plot coherence but rely on a library of hand-crafted actions with their preconditions and effects. This closed world setting limits the length and diversity of what symbolic planners can generate. On the other hand, pre-trained neural language models can generate stories with great diversity, while being generally incapable of ending a story in a specified manner and can have trouble maintaining coherence. In this paper, we present an approach to story plot generation that unifies causal planning with neural language models. We propose to use commonsense knowledge extracted from large language models to recursively expand a story plot in a backward chaining fashion. Specifically, our system infers the preconditions for events in the story and then events that will cause those conditions to become true. We performed automatic evaluation to measure narrative coherence as indicated by the ability to answer questions about whether different events in the story are causally related to other events. Results indicate that our proposed method produces more coherent plotlines than several strong baselines.
translated by 谷歌翻译
The rapid development and application of natural language generation (NLG) techniques has revolutionized the field of automatic text production. However, these techniques are still limited in their ability to produce human-like text that is truly reasonable and informative. In this paper, we explore the importance of NLG being guided by knowledge, in order to convey human-like reasoning through language generation. We propose ten goals for intelligent NLG systems to pursue, and briefly review the achievement of NLG techniques guided by knowledge and reasoning. We also conclude by envisioning future directions and challenges in the pursuit of these goals.
translated by 谷歌翻译
Pre-trained language models (LMs) have shown remarkable reasoning performance using explanations (or ``chain-of-thought'' (CoT)) for in-context learning. On the other hand, these reasoning tasks are usually presumed to be more approachable for symbolic programming. To make progress towards understanding in-context learning, we curate synthetic datasets containing equivalent (natural, symbolic) data pairs, where symbolic examples contain first-order logic rules and predicates from knowledge bases (KBs). Then we revisit neuro-symbolic approaches and use Language Models as Logic Programmer (LMLP) that learns from demonstrations containing logic rules and corresponding examples to iteratively reason over KBs, recovering Prolog's backward chaining algorithm. Comprehensive experiments are included to systematically compare LMLP with CoT in deductive reasoning settings, showing that LMLP enjoys more than 25% higher accuracy than CoT on length generalization benchmarks even with fewer parameters.
translated by 谷歌翻译
我们介绍了一项对自然语言(NL)推理的人类通知,开放域和逻辑上复杂且多样的数据集,配备了一阶逻辑(fol)注释。对开本由1,435个示例(独特的结论)组成,每个示例与487组前提之一搭配,这些场所作为规则,可用于演绎理由,以理解每个结论的有效性。前提和结论的逻辑正确性是通过其平行注释来确保的,这些注释会自动由我们的FOL推理引擎验证。除了主要的NL推理任务外,对开本中的NL-FOL对自动构成了使用FOL作为逻辑形式的新的NL-FOL翻译数据集。我们对广泛的实验系统地评估了对中型语言模型(BERT,ROBERTA)进行微调的FOL推理能力,并且在大型语言模型(GPT-NEOX,OPT,OPT,GPT-3,Codex)上促成了很少的射击。对于NL-FOL翻译,我们尝试使用GPT-3和Codex。我们的结果表明,公开可用的最强大的大语言模型之一(LLM),GPT-3 Davinci,仅比随机结果略好,而在一部分集的一部分中,该模型尤其不好,并且在预测该模型方面尤其不好。纠正虚假和未知结论的真实价值。我们的数据集和代码可在https://github.com/yale-lily/folio上找到。
translated by 谷歌翻译
Despite the success of large language models (LLMs) in various natural language processing (NLP) tasks, the stored knowledge in these models may inevitably be incomplete, out-of-date, or incorrect. This motivates the need to utilize external knowledge to assist LLMs. Unfortunately, current methods for incorporating external knowledge often require additional training or fine-tuning, which can be costly and may not be feasible for LLMs. To address this issue, we propose a novel post-processing approach, rethinking with retrieval (RR), which retrieves relevant external knowledge based on the decomposed reasoning steps obtained from the chain-of-thought (CoT) prompting. This lightweight approach does not require additional training or fine-tuning and is not limited by the input length of LLMs. We evaluate the effectiveness of RR through extensive experiments with GPT-3 on three complex reasoning tasks: commonsense reasoning, temporal reasoning, and tabular reasoning. Our results show that RR can produce more faithful explanations and improve the performance of LLMs.
translated by 谷歌翻译
Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We also discuss the potential reasons for emerging such reasoning abilities and highlight future research directions.
translated by 谷歌翻译
We address the general task of structured commonsense reasoning: given a natural language input, the goal is to generate a graph such as an event -- or a reasoning-graph. To employ large language models (LMs) for this task, existing approaches ``serialize'' the output graph as a flat list of nodes and edges. Although feasible, these serialized graphs strongly deviate from the natural language corpora that LMs were pre-trained on, hindering LMs from generating them correctly. In this paper, we show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language, even when the downstream task does not involve source code at all. We demonstrate our approach across three diverse structured commonsense reasoning tasks. In all these natural language tasks, we show that using our approach, a code generation LM (CODEX) outperforms natural-LMs that are fine-tuned on the target task (e.g., T5) and other strong LMs such as GPT-3 in the few-shot setting.
translated by 谷歌翻译
对事件序列的预测对于信息检索和自然语言处理中的许多现实世界应用至关重要。在事件序列预测中,未来的活动生成(FEG)是一项具有挑战性的任务,因为它不仅需要流利的文本生成,而且需要常识性推理才能保持整个事件故事的逻辑连贯性。在本文中,我们提出了一个新颖的可解释的FEG框架COEP。它突出并整合了两种类型的事件知识,对直接事件事件关系的顺序知识以及推论知识,这些知识反映了事件之间的中间角色心理学(例如意图,原因,反应),这些心理本质地将故事推向了故事。为了减轻知识遗忘问题,我们为每种类型的知识设计了两个模块,即IM和GM,它们是通过及时调整组合的。首先,IM专注于理解推论知识,以产生常识性解释并为通用汽车提供软提示向量。我们还设计了一种对比歧视器,以提高概括能力。其次,GM通过用IM的指导对直接顺序知识进行建模来生成未来事件。自动和人类评估表明,我们的方法可以产生更连贯,具体和逻辑的未来事件。
translated by 谷歌翻译
Storytelling and narrative are fundamental to human experience, intertwined with our social and cultural engagement. As such, researchers have long attempted to create systems that can generate stories automatically. In recent years, powered by deep learning and massive data resources, automatic story generation has shown significant advances. However, considerable challenges, like the need for global coherence in generated stories, still hamper generative models from reaching the same storytelling ability as human narrators. To tackle these challenges, many studies seek to inject structured knowledge into the generation process, which is referred to as structure knowledge-enhanced story generation. Incorporating external knowledge can enhance the logical coherence among story events, achieve better knowledge grounding, and alleviate over-generalization and repetition problems in stories. This survey provides the latest and comprehensive review of this research field: (i) we present a systematical taxonomy regarding how existing methods integrate structured knowledge into story generation; (ii) we summarize involved story corpora, structured knowledge datasets, and evaluation metrics; (iii) we give multidimensional insights into the challenges of knowledge-enhanced story generation and cast light on promising directions for future study.
translated by 谷歌翻译
We consider the problem of automatically generating stories in multiple languages. Compared to prior work in monolingual story generation, crosslingual story generation allows for more universal research on story planning. We propose to use Prompting Large Language Models with Plans to study which plan is optimal for story generation. We consider 4 types of plans and systematically analyse how the outputs differ for different planning strategies. The study demonstrates that formulating the plans as question-answer pairs leads to more coherent generated stories while the plan gives more control to the story creators.
translated by 谷歌翻译
我们探索如何产生一系列思想 - 一系列中间推理步骤 - 显着提高了大语言模型执行复杂推理的能力。特别是,我们通过一种称为“思想链”提示的简单方法在足够大的语言模型中自然出现这种推理能力,在此过程中,一些思想示范被作为提示的示例提供了。三种大语模型的实验表明,促使思想链提高了一系列算术,常识和象征性推理任务的性能。经验收益可能会引人注目。例如,仅使用八个思想范围的540B参数语言模型才能在数学单词问题的GSM8K基准上实现最新的精度,甚至超过了带有验证器的Fineted GPT-3。
translated by 谷歌翻译
基于神经语言模型的自动化故事生成方法遭受了两个重要的限制。首先,基于语言模型的故事生成器通常不适用于给定的目标或结束。其次,当故事变长时,他们经常失去一致。我们提出了一种新的自动化故事生成方法,将问题视为生成的问答之一。我们所提出的故事生成系统从封装故事的最终事件的句子开始。然后系统迭代地(1)分析描述最新事件的文本,(2)生成关于“为什么”一个字符正在执行他们在事件中执行的事情的问题,然后(3)尝试生成另一个前面的回答这个问题的事件。
translated by 谷歌翻译
预训练的语言模型(PLM)无法生成长形式的叙事文本,因为它们不考虑全局结构。结果,生成的文本通常是不巧妙的,重复的或缺乏内容的。故事发电的最新工作以提示,关键字或语义框架的形式重新引入了明确的内容计划。经过大型平行语料库的培训,这些模型可以生成更合乎逻辑的事件序列,从而产生更满足的故事。但是,这些中间表示通常不使用自然语言,并且不需要微调就无法使用。我们建议使用现成的PLM生成故事情节,同时保持内容计划的好处,以产生凝聚力和满足的故事。我们提出的方法ScratchPlot首先提示PLM构成内容计划。然后,我们生成故事的身体并以内容计划结束。此外,我们通过使用其他PLM来对生成的(故事,结尾)对进行排名。我们用各种基线基准测试我们的方法,并在人类和自动评估中取得了卓越的结果。
translated by 谷歌翻译
动机,情感和行动是人类活动中相关的基本因素。尽管长期以来一直认为动机和情感是探索人们如何在人类活动中采取行动的核心,但几乎没有研究支持分析人类精神状态与行动之间的关系。我们介绍了第一项研究,该研究研究了基于语言的人类活动中建模动机,情感和行动的生存能力,即逗号(人类活动的认知框架)。在逗号的指导下,我们定义了三个自然语言处理任务(情感理解,动机理解和有条件的动作生成),并通过自动从故事常识中提取样本来建立一个具有挑战性的数据集冰雹。 NLP应用程序的实验结果证明了建模关系的有效性。此外,与现有方法相比,受逗号启发的模型可以更好地揭示动机,情感和行动之间的基本关系。
translated by 谷歌翻译
近年来带来了对自然语言理解领域的勤义代表和推理的重新兴趣。新的致辞知识图表(CSKG)的发展是这些进步的核心,因为他们的不同事实可以通过机器学习模型来解决新的和具有挑战性的任务。与此同时,由于全面地涵盖了一般勤杂朗知识所需的大规模规模,对这些资源的质量和覆盖率仍存在疑问。在这项工作中,我们将手动构建的CSKGS分配在NLP代理商遇到的所有情况下,我们将永远不会实现适用所需的覆盖范围。因此,我们提出了一种新的评估框架,用于测试KGS的效用,基于如何从中学习有效的隐式知识表示。通过这一新目标,我们提出了一个含有知识的全新CSKG的新CSKG,该知识不容易获得预用的语言模型。我们与其他领先的CSKG相比,评估其属性,表现了对勤杂朗语言知识资源的第一个大规模对研究。接下来,我们显示原子2020更适合培训知识模型,可以为新的,看不见的实体和事件产生准确,代表知识。最后,通过人类评估,我们表明,尽管使用超过430倍的参数,但GPT-3(175B参数)的几次射击性能较低,而令人印象深刻,令人印象深刻,令人印象深刻,令人印象深刻,仍然低于原子型2020的巴特的知识模型。
translated by 谷歌翻译
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.
translated by 谷歌翻译
Reasoning is a fundamental aspect of human intelligence that plays a crucial role in activities such as problem solving, decision making, and critical thinking. In recent years, large language models (LLMs) have made significant progress in natural language processing, and there is observation that these models may exhibit reasoning abilities when they are sufficiently large. However, it is not yet clear to what extent LLMs are capable of reasoning. This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs, including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions on future directions. Our aim is to provide a detailed and up-to-date review of this topic and stimulate meaningful discussion and future work.
translated by 谷歌翻译