Large Language Models (LLMs) have been the subject of active research, significantly advancing the field of Natural Language Processing (NLP). From BERT to BLOOM, LLMs have surpassed state-of-the-art results in various natural language tasks such as question answering, summarization, and text generation. Many ongoing efforts focus on understanding LLMs' capabilities, including their knowledge of the world, syntax, and semantics. However, extending the textual prowess of LLMs to symbolic reasoning has been slow and predominantly focused on tackling problems related to the mathematical field. In this paper, we explore the use of LLMs for automated planning - a branch of AI concerned with the realization of action sequences (plans) to achieve a goal, typically executed by intelligent agents, autonomous robots, and unmanned vehicles. We introduce Plansformer; an LLM fine-tuned on planning problems and capable of generating plans with favorable behavior in terms of correctness and length with reduced knowledge-engineering efforts. We also demonstrate the adaptability of Plansformer in solving different planning domains with varying complexities, owing to the transfer learning abilities of LLMs. For one configuration of Plansformer, we achieve ~97% valid plans, out of which ~95% are optimal for Towers of Hanoi - a puzzle-solving domain.
translated by 谷歌翻译
大型语言模型(LLM)的最新进展已改变了自然语言处理(NLP)的领域。从GPT-3到Palm,每种新的大型语言模型都在推动自然语言任务的最新表现。除了自然语言的能力外,人们还对理解这种模型(接受大量数据,具有推理能力的培训)也引起了重大兴趣。因此,人们有兴趣为各种推理任务开发基准,并且在此类基准测试中测试LLM的初步结果似乎主要是积极的。但是,目前的基准相对简单,这些基准的性能不能用作支持的证据,很多时候是古怪的,对LLMS的推理能力提出了主张。截至目前,这些基准仅代表了一组非常有限的简单推理任务集,如果我们要衡量此类基于LLM的系统的真实限制,我们需要研究更复杂的推理问题。通过这种动机,我们提出了一个可扩展的评估框架,以测试LLM在人类智能的中心方面的能力,这是关于行动和变化的推理。我们提供的多个测试案例比任何先前建立的推理基准都更重要,并且每个测试案例都评估了有关行动和变化的推理的某些方面。对GPT-3(Davinci)基本版本的初步评估结果,在这些基准测试中显示了Subpar的性能。
translated by 谷歌翻译
语言模型(LMS)被证明具有对物理世界的常识知识,这对于在日常情况下完成任务至关重要。但是,LMS是否有能力为具体任务生成扎根的可执行计划,这仍然是一个悬而未决的问题。这是非常具有挑战性的,因为LMS没有“眼睛”或“手”来感知现实的环境。在这项工作中,我们展示了有关这个重要研究问题的第一个研究。我们首先提出了一个名为G-Planet的新型问题公式,它将其作为输入一个高级目标和在特定环境中的对象表。预期输出是一个计划,该计划包括逐步指令供代理执行。为了实现此问题的研究,我们建立了一个评估协议,并设计了一个专门的指标来评估计划的质量。在我们的广泛实验中,我们表明,为编码环境添加扁平表并使用迭代解码策略都可以提高LMS的基础计划能力。我们对结果的分析也导致有趣的非平凡发现。
translated by 谷歌翻译
预训练模型已在许多代码智能任务中有效。这些模型在大规模未标记的语料库中进行了预训练,然后在下游任务中进行了微调。但是,由于预训练和下游任务的输入是不同的形式,因此很难充分探索预训练模型的知识。此外,微调的性能强烈依赖于下游数据的量,而实际上,具有稀缺数据的场景很常见。自然语言处理(NLP)领域的最新研究表明,迅速调整,一种调整的新范式,减轻上述问题并在各种NLP任务中实现了有希望的结果。在迅速调整中,在调整过程中插入的提示提供了特定于任务的知识,这对于具有相对较少数据的任务特别有益。在本文中,我们凭经验评估了代码智能任务中迅速调整的用法和效果。我们对流行的预训练模型Codebert和codet5进行及时调整,并尝试三个代码智能任务,包括缺陷预测,代码摘要和代码翻译。我们的实验结果表明,在所有三个任务中,迅速调整始终优于微调。此外,及时调整在低资源场景中显示出很大的潜力,例如,对于代码摘要,平均将微调的BLEU分数提高了26%以上。我们的结果表明,我们可以调整代码智能任务的迅速调整,以实现更好的性能,尤其是在缺乏特定于任务的数据时,我们可以调整及时调整。
translated by 谷歌翻译
This study focuses on embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. Existing methods rely on a large amount of (instruction, gold trajectory) pairs to learn a good policy. The high data cost and poor sample efficiency prevents the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a novel method, LLM-Planner, that harnesses the power of large language models (LLMs) such as GPT-3 to do few-shot planning for embodied agents. We further propose a simple but effective way to enhance LLMs with physical grounding to generate plans that are grounded in the current environment. Experiments on the ALFRED dataset show that our method can achieve very competitive few-shot performance, even outperforming several recent baselines that are trained using the full training data despite using less than 0.5% of paired training data. Existing methods can barely complete any task successfully under the same few-shot setting. Our work opens the door for developing versatile and sample-efficient embodied agents that can quickly learn many tasks.
translated by 谷歌翻译
任务计划可能需要定义有关机器人需要采取行动的世界的无数领域知识。为了改善这项工作,可以使用大型语言模型(LLM)在任务计划期间为潜在的下一个操作评分,甚至直接生成动作序列,鉴于没有其他域信息的自然语言指令。但是,这样的方法要么需要列举所有可能的下一步评分,要么生成可能包含在当前机器人中给定机器人上不可能操作的自由形式文本。我们提出了一个程序化的LLM提示结构,该结构能够跨越位置环境,机器人功能和任务的计划生成功能。我们的关键见解是提示LLM具有环境中可用操作和对象的类似程序的规格,以及可以执行的示例程序。我们通过消融实验提出了有关迅速结构和生成约束的具体建议,证明了虚拟屋家庭任务中最先进的成功率,并将我们的方法部署在桌面任务的物理机器人组上。网站progprompt.github.io
translated by 谷歌翻译
我们提出了Pangu-Coder,这是一种仅预读的解码器语言模型,该模型采用pangu-alpha架构进行文本到代码生成,即给定自然语言问题描述的编程语言解决方案的合成。我们使用两阶段策略训练Pangu-Coder:第一阶段采用因果语言建模(CLM)来预先培训原始编程语言数据,而第二阶段则使用因果语言建模和掩盖语言建模(MLM)的组合培训目标,专注于文本到代码生成的下游任务,并培训松散的自然语言程序定义和代码功能。最后,我们讨论了pangu-coder-ft,该pander the是通过竞争性编程问题和代码与持续集成测试的结合进行了微调的。我们评估了pangu-coder,重点是它是否生成功能上正确的程序,并证明它在参加较小的上下文窗口和较少的数据培训的同时,它比诸如Codex之类的类似大小的模型(例如Codex)实现等效性或更好的性能。
translated by 谷歌翻译
最近的作品表明,如何将大语言模型(LLM)的推理能力应用于自然语言处理以外的领域,例如机器人的计划和互动。这些具体的问题要求代理商了解世界上许多语义方面:可用技能的曲目,这些技能如何影响世界以及对世界的变化如何映射回该语言。在体现环境中规划的LLMS不仅需要考虑要做什么技能,还需要考虑如何以及何时进行操作 - 答案随着时间的推移而变化,以响应代理商自己的选择。在这项工作中,我们调查了在这种体现的环境中使用的LLM在多大程度上可以推论通过自然语言提供的反馈来源,而无需任何其他培训。我们建议,通过利用环境反馈,LLM能够形成内部独白,使他们能够在机器人控制方案中进行更丰富的处理和计划。我们研究了各种反馈来源,例如成功检测,场景描述和人类互动。我们发现,闭环语言反馈显着改善了三个领域的高级指导完成,包括模拟和真实的桌面顶部重新排列任务以及现实世界中厨房环境中的长途移动操作任务。
translated by 谷歌翻译
语言规划旨在通过分解为更简单的低级步骤来实现复杂的高级目标。这种程序推理能力对于诸如家用机器人和虚拟助手等应用至关重要。尽管语言规划是日常生活中人类的基本技能,但对于缺乏现实世界中缺乏深层常识性知识的大型语言模型(LLM)来说,这仍然是一个挑战。以前的方法需要手动示例或带注释的程序才能从LLM中获取此类能力。相比之下,本文提出了神经符号的因果语言规划师(CLAP),该策划者通过注入常识的提示从LLM中引起了程序知识。 LLMS中的预训练知识本质上是一种未观察到的混杂因素,它在任务和行动计划之间引起虚假的相关性。通过结构性因果模型(SCM)的镜头,我们提出了一个有效的策略,以构建提示作为对SCM的因果干预。我们的策略使用图形采样技术和符号程序执行者,正式从常识知识基础上形成结构化因果提示。拍手在Wikihow和机器人上获得最新的表现,在反事实环境下,人类评估的相对提高了5.28%。这表明在语义和顺序的因果语言规划中拍手的优势。
translated by 谷歌翻译
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.
translated by 谷歌翻译
当前独立于域的经典计划者需要问题域和实例作为输入的符号模型,从而导致知识采集瓶颈。同时,尽管深度学习在许多领域都取得了重大成功,但知识是在与符号系统(例如计划者)不兼容的亚符号表示中编码的。我们提出了Latplan,这是一种无监督的建筑,结合了深度学习和经典计划。只有一组未标记的图像对,显示了环境中允许的过渡子集(训练输入),Latplan学习了环境的完整命题PDDL动作模型。稍后,当给出代表初始状态和目标状态(计划输入)的一对图像时,Latplan在符号潜在空间中找到了目标状态的计划,并返回可视化的计划执行。我们使用6个计划域的基于图像的版本来评估LATPLAN:8个插头,15个式嘴,Blockworld,Sokoban和两个LightsOut的变体。
translated by 谷歌翻译
We address the general task of structured commonsense reasoning: given a natural language input, the goal is to generate a graph such as an event -- or a reasoning-graph. To employ large language models (LMs) for this task, existing approaches ``serialize'' the output graph as a flat list of nodes and edges. Although feasible, these serialized graphs strongly deviate from the natural language corpora that LMs were pre-trained on, hindering LMs from generating them correctly. In this paper, we show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language, even when the downstream task does not involve source code at all. We demonstrate our approach across three diverse structured commonsense reasoning tasks. In all these natural language tasks, we show that using our approach, a code generation LM (CODEX) outperforms natural-LMs that are fine-tuned on the target task (e.g., T5) and other strong LMs such as GPT-3 in the few-shot setting.
translated by 谷歌翻译
This paper provides an introductory survey to GPT-3. We cover some of the historical development behind this technology, some of the key features of GPT-3, and discuss the machine learning model and the datasets used. We survey both academic and commercial efforts applying GPT-3 in diverse domains such as developing conversational AI chatbots, software development, creative work, domain knowledge, and business productivity. We discuss some of the challenges that GPT-3 faces such as the problems of training complexity, bias, and hallucination/incorrect answers. We also discuss the future research opportunities in this area.
translated by 谷歌翻译
从制造环境到个人房屋的最终用户任务的巨大多样性使得预编程机器人非常具有挑战性。事实上,教学机器人从划痕的新行动可以重复使用以前看不见的任务仍然是一个艰难的挑战,一般都留给了机器人专家。在这项工作中,我们展示了Iropro,这是一个交互式机器人编程框架,允许最终用户没有技术背景,以教授机器人新的可重用行动。我们通过演示和自动规划技术将编程结合起来,以允许用户通过通过动力学示范教授新的行动来构建机器人的知识库。这些行动是概括的,并重用任务计划程序来解决用户定义的先前未经调查的问题。我们将iropro作为Baxter研究机器人的端到端系统实施,同时通过演示通过示范来教授低级和高级操作,以便用户可以通过图形用户界面自定义以适应其特定用例。为了评估我们的方法的可行性,我们首先进行了预设计实验,以更好地了解用户采用所涉及的概念和所提出的机器人编程过程。我们将结果与设计后实验进行比较,在那里我们进行了用户学习,以验证我们对真实最终用户的方法的可用性。总体而言,我们展示了具有不同编程水平和教育背景的用户可以轻松学习和使用Iropro及其机器人编程过程。
translated by 谷歌翻译
Automated plot generation is the challenge of generating a sequence of events that will be perceived by readers as the plot of a coherent story. Traditional symbolic planners plan a story from a goal state and guarantee logical causal plot coherence but rely on a library of hand-crafted actions with their preconditions and effects. This closed world setting limits the length and diversity of what symbolic planners can generate. On the other hand, pre-trained neural language models can generate stories with great diversity, while being generally incapable of ending a story in a specified manner and can have trouble maintaining coherence. In this paper, we present an approach to story plot generation that unifies causal planning with neural language models. We propose to use commonsense knowledge extracted from large language models to recursively expand a story plot in a backward chaining fashion. Specifically, our system infers the preconditions for events in the story and then events that will cause those conditions to become true. We performed automatic evaluation to measure narrative coherence as indicated by the ability to answer questions about whether different events in the story are causally related to other events. Results indicate that our proposed method produces more coherent plotlines than several strong baselines.
translated by 谷歌翻译
Transfer learning, where a model is first pre-trained on a data-rich task before being finetuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
translated by 谷歌翻译
预审前的语言模型已被证明在许多与软件有关的一代任务中都是有效的。但是,它们不适合编辑任务,因为它们不是为了推理编辑的原因。为了解决这个问题,我们提出了一个新颖的预处理目标,该目标明确地对编辑进行了建模并使用它来构建Coditt5,这是一种用于软件相关编辑任务的大型语言模型,该任务是在大量源代码和自然语言评论中鉴定的。我们将其对各种下游编辑任务进行微调,包括评论更新,错误修复和自动代码审核。通过优于基于纯生成的模型,我们证明了方法的普遍性及其对编辑任务的适用性。我们还展示了纯生成模型和我们的基于编辑的模型如何通过简单的重读策略相互补充,我们可以通过该策略实现三个下游编辑任务的最新性能。
translated by 谷歌翻译
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
translated by 谷歌翻译
A key missing ability of current language models (LMs) is grounding to real-world environments. Most existing work for grounded language understanding uses LMs to directly generate plans that can be executed in the environment to achieve the desired effects. It casts the burden of ensuring grammaticality, faithfulness, and controllability all on the LMs. We propose Pangu, a generic framework for grounded language understanding that capitalizes on the discriminative ability of LMs instead of their generative ability. Pangu consists of a symbolic agent and a neural LM working in a concerted fashion: the agent explores the environment to incrementally construct valid candidate plans, and the LM evaluates the plausibility of the candidate plans to guide the search process. A case study on the challenging problem of knowledge base question answering (KBQA), which features a massive environment, demonstrates the remarkable effectiveness and flexibility of Pangu: A BERT-base LM is sufficient for achieving a new state of the art on standard KBQA datasets, and larger LMs further improve the performance by a large margin. Pangu also enables, for the first time, effective few-shot in-context learning for KBQA with large LMs such as Codex.
translated by 谷歌翻译
在对关节对象表示表示的工作之后,引入了面向对象的网络(FOON)作为机器人的知识图表示。以双方图的形式,Foon包含符号(高级)概念,可用于机器人对任务及其对象级别计划的环境的理解及其环境。在本文之前,几乎没有做任何事情来证明如何通过任务树检索从FOON获取的任务计划如何由机器人执行,因为Foon中的概念太抽象了,无法立即执行。我们提出了一种分层任务计划方法,该方法将FOON图转换为基于PDDL的域知识表示操作计划的表示。由于这个过程,可以获取一个任务计划,即机器人可以从头到尾执行,以利用动态运动原始功能(DMP)的形式使用动作上下文和技能。我们演示了从计划到使用Coppeliasim执行的整个管道,并展示如何将学习的动作上下文扩展到从未见过的场景。
translated by 谷歌翻译