Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. On the one hand, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods and discuss future research directions in this domain.
Logical reasoning over text is an important ability: it requires understanding the information present in the text and how its parts are interconnected, and then reasoning over them to infer new conclusions. Prior work on improving the logical reasoning ability of language models requires complex processing of training data (e.g., aligning symbolic knowledge to text), yielding task-specific data augmentation solutions that restrict the learning of general logical reasoning skills. In this work, we propose APOLLO, an adaptively pretrained language model with improved logical reasoning abilities. We select a subset of Wikipedia, based on a set of logical inference keywords, for continued pretraining of a language model. We use two self-supervised loss functions: a modified masked language modeling loss in which only words of specific parts of speech (those likely to require more reasoning than basic language understanding) are masked, and a sentence-level classification loss that teaches the model to distinguish between entailment and contradiction types of sentences. The proposed training paradigm is both simple and independent of task formats. We demonstrate the effectiveness of APOLLO by comparing it with prior baselines on two logical reasoning datasets: APOLLO performs comparably on ReClor and outperforms the baselines on LogiQA.
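The two data-side ideas above, keyword-based corpus selection and part-of-speech-restricted masking, can be sketched as follows. This is a minimal illustration, not APOLLO's implementation: the keyword list `LOGIC_KEYWORDS` and the tag set `TARGET_POS` are hypothetical, since the abstract names neither, and the POS tags are assumed to come from an external tagger.

```python
import random
import re

# Hypothetical keyword list; the abstract does not enumerate APOLLO's keywords.
LOGIC_KEYWORDS = {"therefore", "because", "unless", "hence", "consequently"}
# Hypothetical POS tags to mask; the abstract only says "specific parts of speech".
TARGET_POS = {"VERB", "ADJ", "ADV"}
MASK = "[MASK]"


def select_reasoning_sentences(sentences, keywords=LOGIC_KEYWORDS):
    """Keep sentences that contain at least one logical-inference keyword."""
    selected = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if words & keywords:
            selected.append(sentence)
    return selected


def selective_mask(tagged_tokens, target_pos=TARGET_POS, mask_prob=1.0, seed=0):
    """Mask only tokens whose POS tag is in target_pos.

    tagged_tokens: list of (token, pos_tag) pairs from any POS tagger.
    Returns (inputs, labels); a label is None where no MLM loss applies.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for token, pos in tagged_tokens:
        if pos in target_pos and rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(token)  # the model must reconstruct this token
        else:
            inputs.append(token)
            labels.append(None)   # unmasked position: excluded from the loss
    return inputs, labels


tagged = [("The", "DET"), ("rain", "NOUN"), ("caused", "VERB"),
          ("heavy", "ADJ"), ("flooding", "NOUN")]
print(selective_mask(tagged))
```

Nouns and function words stay visible here, so the model has to recover the masked predicate and modifier from context. In a real pipeline, `mask_prob` below 1.0 would keep the masking rate bounded.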
This article proposes a model-based deep reinforcement learning (DRL) method to design emergency control strategies for short-term voltage stability problems in power systems. Recent advances show promising results for model-free DRL-based methods in power systems, but model-free methods suffer from poor sample efficiency and long training times, both of which are critical to making state-of-the-art DRL algorithms practically applicable. A DRL agent learns an optimal policy by trial and error while interacting with the real-world environment, and it is desirable to minimize the agent's direct interaction with the real-world power grid because of its safety-critical nature. Additionally, state-of-the-art DRL-based policies are mostly trained using a physics-based grid simulator in which dynamic simulation is computationally intensive, lowering training efficiency. We propose a novel model-based DRL framework in which a deep neural network (DNN)-based dynamic surrogate model, instead of a real-world power grid or a physics-based simulation, is used within the policy learning framework, making the process faster and more sample-efficient. However, stabilizing model-based DRL is challenging because of the complex system dynamics of large-scale power systems. We addressed these issues by incorporating imitation learning to warm-start policy learning, reward shaping, and a multi-step surrogate loss. Finally, we achieved 97.5% sample efficiency and 87.7% training efficiency in an application to the IEEE 300-bus test system.
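To illustrate why a multi-step surrogate loss helps, the sketch below compares a teacher-forced one-step loss with a free-running multi-step loss. It assumes a toy scalar linear system in place of the paper's DNN surrogate and power-grid dynamics; `true_step` and `surrogate_step` are hypothetical stand-ins.

```python
import numpy as np


def rollout(step_fn, x0, controls):
    """Roll a one-step dynamics function forward over a control sequence."""
    states, x = [], x0
    for u in controls:
        x = step_fn(x, u)
        states.append(x)
    return states


def one_step_loss(step_fn, x0, controls, true_states):
    """Teacher-forced MSE: each prediction starts from the true previous state."""
    prev = [x0] + list(true_states[:-1])
    errors = [np.mean((step_fn(x, u) - y) ** 2)
              for x, u, y in zip(prev, controls, true_states)]
    return float(np.mean(errors))


def multi_step_loss(step_fn, x0, controls, true_states):
    """Free-running MSE: the surrogate is fed its own predictions, so errors
    that compound over the horizon are penalized during training."""
    predicted = rollout(step_fn, x0, controls)
    errors = [np.mean((x - y) ** 2) for x, y in zip(predicted, true_states)]
    return float(np.mean(errors))


def true_step(x, u):
    """Hypothetical 'true' system dynamics."""
    return 0.90 * x + u


def surrogate_step(x, u):
    """Hypothetical, slightly mis-identified surrogate."""
    return 0.95 * x + u


x0 = np.array([1.0])
controls = np.zeros((5, 1))
true_states = rollout(true_step, x0, controls)
print(one_step_loss(surrogate_step, x0, controls, true_states))
print(multi_step_loss(surrogate_step, x0, controls, true_states))
```

The one-step loss only sees each single-step error. The multi-step loss grows as the surrogate drifts, which is the compounding error a policy trained inside the surrogate would actually experience.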
Autonomous driving confronts great challenges in complex traffic scenarios, where Safety of the Intended Functionality (SOTIF) risk can be triggered by the dynamic operational environment and by system insufficiencies. SOTIF risk manifests not only explicitly, as the risk of collision with objects outside the autonomous vehicle (AV), but also inherently, as the performance-limitation risk of the implemented algorithms themselves. How to minimize SOTIF risk in autonomous driving is currently a critical, difficult, and unresolved issue. This paper therefore proposes the "Self-Surveillance and Self-Adaption System" as a systematic approach to minimizing SOTIF risk online, providing a solution for monitoring, quantifying, and mitigating both inherent and external risks. The core of this system is risk monitoring of the artificial intelligence algorithms implemented in the AV. As a demonstration of the Self-Surveillance and Self-Adaption System, risk monitoring of the perception algorithm, YOLOv5, is highlighted. Moreover, the inherent perception-algorithm risk and the external collision risk are jointly quantified via SOTIF entropy, which is then propagated downstream to the decision-making module and mitigated. Finally, several challenging scenarios are demonstrated, and Hardware-in-the-Loop experiments are conducted to verify the efficiency and effectiveness of the system. The results demonstrate that the Self-Surveillance and Self-Adaption System enables dependable online monitoring, quantification, and mitigation of SOTIF risk in real-time critical traffic environments.
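As a rough intuition for entropy-based perception-risk monitoring, the sketch below computes the Shannon entropy of a detector's class-probability vector and flags detections above a threshold. This is an assumption-laden stand-in, not the paper's SOTIF entropy, whose exact definition (including how collision risk is folded in) is not given in the abstract. The threshold and the example detections are hypothetical.

```python
import numpy as np


def shannon_entropy(probs, eps=1e-12):
    """Entropy (in nats) of a class-probability vector."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())


def flag_uncertain(detections, threshold):
    """Return indices of detections whose class entropy exceeds a threshold."""
    return [i for i, probs in enumerate(detections)
            if shannon_entropy(probs) > threshold]


detections = [
    [0.97, 0.01, 0.01, 0.01],  # hypothetical confident detection
    [0.40, 0.35, 0.15, 0.10],  # hypothetical ambiguous detection
]
print(flag_uncertain(detections, threshold=0.5))
```

High entropy marks detections where the perception output is least trustworthy. In a system like the one described, this kind of signal would be passed downstream to decision-making rather than used in isolation.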
Entities, as important carriers of real-world knowledge, play a key role in many NLP tasks. We focus on incorporating entity knowledge into an encoder-decoder framework for informative text generation. Existing approaches index, retrieve, and read external documents as evidence, but they incur a large computational overhead. In this work, we propose EDMem, an encoder-decoder framework with an entity memory. The entity knowledge is stored in the memory as latent representations, and the memory is pre-trained on Wikipedia along with the encoder-decoder parameters. To generate entity names precisely, we design three decoding methods that constrain entity generation by linking entities in the memory. EDMem is a unified framework that can be used on various entity-intensive question answering and generation tasks. Extensive experimental results show that EDMem outperforms both memory-based auto-encoder models and non-memory encoder-decoder models.
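An entity memory stored as latent vectors is commonly read with dot-product attention. The sketch below shows such a read, assuming a single query projection and a toy three-entity table. It is a generic illustration, not EDMem's exact architecture: the abstract does not say which layers access the memory or how its three decoding methods use the linked entity.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def read_entity_memory(hidden, query_proj, memory):
    """Dot-product attention over an entity-embedding table.

    hidden:     (d_model,) hidden state at an entity mention
    query_proj: (d_mem, d_model) projection into memory space
    memory:     (n_entities, d_mem) one latent vector per entity
    """
    query = query_proj @ hidden
    weights = softmax(memory @ query)  # one attention weight per entity
    readout = weights @ memory         # weighted mix fed back to the model
    return readout, weights


memory = 5.0 * np.eye(3)  # three hypothetical toy entities
readout, weights = read_entity_memory(np.array([0.0, 0.0, 1.0]), np.eye(3), memory)
print(int(weights.argmax()))  # index of the most strongly linked entity
```

The readout gives the model entity knowledge without retrieving documents at inference time, which is the overhead the abstract contrasts against. The argmax over weights is one simple way to "link" a mention to a memory entry.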
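Specialized motions on quadruped robots are typically realized by solving a trajectory optimization problem and executing the trajectory with a tracking controller. This approach exists in parallel to model predictive control (MPC) strategies, which commonly control regular gaits through online replanning. In this work, we propose a nonlinear MPC (NMPC) technique that naturally replans both specialized motor skills and regular locomotion within a unified framework. The NMPC reasons about a hybrid dynamics model and is solved using a variant of a constrained differential dynamic programming (DDP) solver. The proposed NMPC enables the robot to perform a variety of agile skills, such as jumping, bounding, and trotting, as well as rapid transitions between these skills. We evaluate the proposed algorithm on three challenging motion sequences that combine multiple agile skills, on two quadruped platforms, Unitree A1 and MIT Mini Cheetah, demonstrating its effectiveness and generality.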
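The core of a DDP solver is a backward pass that expands the value function to second order, followed by a forward pass that applies the resulting feedforward and feedback corrections. The sketch below shows one such iteration on a hypothetical 1-D double integrator with quadratic costs. Because these toy dynamics are linear, DDP's second-order dynamics terms vanish and a single iteration already reaches the optimum. The constraints and hybrid contact modes of the paper's solver are not modeled.

```python
import numpy as np

# Hypothetical toy problem: a 1-D double integrator (position, velocity).
DT = 0.1
A = np.array([[1.0, DT], [0.0, 1.0]])  # x_{t+1} = A x_t + B u_t
B = np.array([[0.0], [DT]])
Q = np.eye(2)           # running state-cost weight
R = 0.1 * np.eye(1)     # running control-cost weight
QF = 100.0 * np.eye(2)  # terminal state-cost weight


def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs)


def total_cost(xs, us):
    running = sum(0.5 * x @ Q @ x + 0.5 * u @ R @ u for x, u in zip(xs[:-1], us))
    return running + 0.5 * xs[-1] @ QF @ xs[-1]


def ddp_iteration(xs, us):
    """One backward pass (value-function expansion) and one forward pass."""
    horizon = len(us)
    v_x, v_xx = QF @ xs[-1], QF.copy()  # terminal-cost gradient and Hessian
    ks, gains = [None] * horizon, [None] * horizon
    for t in reversed(range(horizon)):
        q_x = Q @ xs[t] + A.T @ v_x
        q_u = R @ us[t] + B.T @ v_x
        q_xx = Q + A.T @ v_xx @ A
        q_uu = R + B.T @ v_xx @ B
        q_ux = B.T @ v_xx @ A
        k = -np.linalg.solve(q_uu, q_u)   # feedforward correction
        K = -np.linalg.solve(q_uu, q_ux)  # feedback gain
        v_x = q_x + K.T @ q_uu @ k + K.T @ q_u + q_ux.T @ k
        v_xx = q_xx + K.T @ q_uu @ K + K.T @ q_ux + q_ux.T @ K
        ks[t], gains[t] = k, K
    # Forward pass: apply the updated policy to the dynamics.
    new_xs, new_us = [xs[0]], []
    for t in range(horizon):
        u = us[t] + ks[t] + gains[t] @ (new_xs[t] - xs[t])
        new_us.append(u)
        new_xs.append(A @ new_xs[t] + B @ u)
    return np.array(new_xs), np.array(new_us)


x0 = np.array([1.0, 0.0])
us0 = np.zeros((50, 1))
xs, us = rollout(x0, us0), us0
for _ in range(3):
    xs, us = ddp_iteration(xs, us)
print(total_cost(xs, us), xs[-1])
```

In an MPC setting, this solve would be repeated at every control step from the latest measured state, warm-started with the previous solution.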
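Reinforcement learning (RL) has driven great strides in quadruped locomotion, with continued progress in reliable sim-to-real policy transfer. However, reusing a policy trained for another robot, which could save retraining time, remains a challenge. In this work, we propose a framework for zero-shot policy retargeting in which diverse motor skills can be transferred between robots of different shapes and sizes. The new framework centers on a planning-and-control pipeline that systematically integrates RL and model predictive control (MPC). The planning stage employs RL to generate dynamically plausible trajectories together with contact schedules, avoiding the combinatorial complexity of contact-sequence optimization. This information is then used to seed the MPC, which stabilizes and robustifies the policy rollout via a new Hybrid Kinodynamic (HKD) model that implicitly optimizes foothold locations. Hardware results demonstrate the ability to transfer policies from both the A1 and Laikago robots to the MIT Mini Cheetah robot without any policy re-tuning.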
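A contact schedule can be represented as a boolean stance/swing table over the MPC horizon. Supplying that table to the MPC removes the discrete choice of which feet touch the ground at which steps, a search space that grows as 2^(legs × horizon). In the paper the schedule comes from the RL planner. In this sketch, a fixed periodic trot (hypothetical offsets and duty factor) stands in for it, and the leg order FL, FR, RL, RR is likewise an assumption.

```python
import numpy as np


def contact_schedule(phase_offsets, duty_factor, steps_per_cycle, horizon, start=0):
    """Boolean (horizon, n_legs) array: True where a foot is in stance."""
    t = np.arange(start, start + horizon)[:, None]
    phase = (t / steps_per_cycle + np.asarray(phase_offsets)[None, :]) % 1.0
    return phase < duty_factor


def num_candidate_schedules(n_legs, horizon):
    """Size of the discrete search space if every foot's contact state at
    every step were a free decision variable."""
    return 2 ** (n_legs * horizon)


# Hypothetical trot: diagonal legs (FL/RR and FR/RL) share a phase offset.
schedule = contact_schedule([0.0, 0.5, 0.5, 0.0], duty_factor=0.5,
                            steps_per_cycle=10, horizon=10)
print(schedule[0])  # stance flags for FL, FR, RL, RR at the first step
print(num_candidate_schedules(4, 10))
```

With the schedule fixed, what remains for the MPC is a continuous optimization over forces and motions, which is what makes seeding it from a learned planner attractive.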
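Knowledge-intensive tasks, such as open-domain question answering (QA), require access to a large amount of world or domain knowledge. A common approach for knowledge-intensive tasks is to employ a retrieve-then-read pipeline that first retrieves a handful of relevant contextual documents from an external corpus such as Wikipedia and then predicts an answer conditioned on the retrieved documents. In this paper, we present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators. We call our method generate-then-read (GenRead): it first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer. Furthermore, we propose a novel clustering-based prompting method that selects distinct prompts, producing generated documents that cover different perspectives and thus yielding better recall of acceptable answers. We conduct extensive experiments on three different knowledge-intensive tasks: open-domain QA, fact checking, and dialogue systems. Notably, GenRead achieves exact match scores of 71.6 and 54.4 on TriviaQA and WebQ, significantly outperforming the state-of-the-art retrieve-then-read pipeline DPR-FiD by +4.0 and +3.9, without retrieving any documents from any external knowledge source. Lastly, we demonstrate that model performance can be further improved by combining retrieval and generation.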
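The clustering-based prompting idea can be sketched as: embed candidate (question, document) demonstrations, cluster them with k-means, and build one prompt per cluster so that each generated document is conditioned on a different style of example. This is a minimal sketch under stated assumptions. The embeddings and demonstration pairs are toy values standing in for a real embedding model. Picking the member nearest each centroid is a deterministic simplification, not necessarily the paper's sampling rule. The prompt template is also hypothetical.

```python
import numpy as np


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers


def diverse_prompts(pairs, embeddings, question, k):
    """Build one prompt per cluster of (question, document) demonstrations."""
    labels, centers = kmeans(embeddings, k)
    prompts = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        # Deterministic choice: the member nearest the cluster centroid.
        dists = np.linalg.norm(embeddings[members] - centers[j], axis=1)
        demo_q, demo_doc = pairs[members[np.argmin(dists)]]
        prompts.append(f"Question: {demo_q}\nBackground: {demo_doc}\n\n"
                       f"Question: {question}\nBackground:")
    return prompts


pairs = [("art question 1", "art document 1"), ("art question 2", "art document 2"),
         ("physics question 1", "physics document 1"),
         ("physics question 2", "physics document 2")]
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
prompts = diverse_prompts(pairs, embeddings, "Who wrote Hamlet?", k=2)
for prompt in prompts:
    print(prompt, end="\n---\n")
```

Each prompt would then be sent to the language model to generate one background document, and the reader consumes all generated documents together.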
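Image-text retrieval (ITR) is challenging because it must bridge the visual and linguistic modalities. Contrastive learning has been adopted by most prior work. Apart from the limited number of negative image-text pairs, the capability of contrastive learning is restricted by manual weighting of negative pairs and by unawareness of external knowledge. In this paper, we propose novel Coupled Diversity-Sensitive Momentum Contrastive Learning (CODER) for improving cross-modal representation. First, a novel diversity-sensitive contrastive learning (DCL) architecture is introduced. We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitivity is achieved through adaptive negative-pair weighting. Furthermore, CODER has two branches. One learns instance-level embeddings from images/texts, and it also generates pseudo online clustering labels for its input images/texts based on their embeddings. Meanwhile, the other branch learns to query a commonsense knowledge graph to form concept-level descriptors for both modalities. Afterwards, both branches leverage DCL to align the cross-modal embedding spaces, while an extra pseudo-clustering-label prediction loss is used to promote concept-level representation learning in the second branch. Extensive experiments on two popular benchmarks, MSCOCO and Flickr30K, validate that CODER significantly outperforms state-of-the-art approaches.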
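The two ingredients of DCL described above can be sketched together: an InfoNCE loss in which each negative's contribution is scaled by a weight, and a FIFO dictionary (queue) of negative keys that enlarges the pool beyond the batch. The weighting rule in `adaptive_weights` is hypothetical. It simply up-weights more similar (harder) negatives and is not CODER's actual scheme, which the abstract does not specify. With all weights equal to one, the loss reduces to standard InfoNCE.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def weighted_info_nce(query, positive, negatives, weights, tau=0.07):
    """InfoNCE where each negative's exp-similarity is scaled by a weight > 0."""
    s_pos = query @ positive / tau
    s_neg = negatives @ query / tau
    # Fold the weights into the logits, then use log-sum-exp for stability.
    logits = np.concatenate([[s_pos], s_neg + np.log(weights)])
    m = logits.max()
    log_denom = m + np.log(np.exp(logits - m).sum())
    return log_denom - s_pos


def adaptive_weights(query, negatives, beta=5.0):
    """Hypothetical weighting: harder (more similar) negatives count more.
    Weights are rescaled to have mean 1."""
    w = np.exp(beta * (negatives @ query))
    return w * len(w) / w.sum()


def enqueue(queue, new_keys):
    """FIFO dynamic dictionary: append the newest keys, drop the oldest."""
    return np.concatenate([queue, new_keys])[len(new_keys):]


rng = np.random.default_rng(0)
q = l2_normalize(rng.standard_normal(16))                 # image embedding
p = l2_normalize(q + 0.1 * rng.standard_normal(16))       # matching text
negs = l2_normalize(rng.standard_normal((8, 16)))         # queued text keys
print(weighted_info_nce(q, p, negs, adaptive_weights(q, negs)))
```

In a momentum-contrast setup, the keys pushed into the queue would come from a slowly updated momentum encoder, so the dictionary stays consistent across training steps.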
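Evolution strategy (ES) algorithms have shown promising results in training complex robot control policies because of their massive parallelism, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large-capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging representation learning to reduce the parameter search space of ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks in which a quadruped robot must walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance than the ARS baseline. We further validate our algorithm by showing that the learned policies transfer successfully to a real quadruped robot, for example achieving a 100% success rate in a real-world stepping-stone environment and dramatically improving on prior results, which achieved 40% success.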
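The gradient-free half of PI-ARS is Augmented Random Search. The sketch below shows a basic ARS update: sample random directions, evaluate antithetic perturbations, keep the top-b directions, and scale the step by the standard deviation of the collected rewards. It optimizes a hypothetical toy quadratic "return" over a 4-parameter vector. In PI-ARS the searched parameters would be a policy head on top of the PI-learned representation, which this sketch does not model. All hyperparameter values are illustrative.

```python
import numpy as np


def ars_step(theta, reward_fn, rng, n_dirs=8, top_b=4, step_size=0.02, noise=0.03):
    """One basic Augmented Random Search update (no state normalization)."""
    deltas = rng.standard_normal((n_dirs, theta.size))
    r_plus = np.array([reward_fn(theta + noise * d) for d in deltas])
    r_minus = np.array([reward_fn(theta - noise * d) for d in deltas])
    # Keep the top_b directions ranked by max(r_plus, r_minus).
    top = np.argsort(np.maximum(r_plus, r_minus))[::-1][:top_b]
    sigma_r = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
    update = ((r_plus[top] - r_minus[top])[:, None] * deltas[top]).sum(axis=0)
    return theta + step_size / (top_b * sigma_r) * update


TARGET = np.array([1.0, -1.0, 0.5, 2.0])  # hypothetical optimum


def reward(theta):
    """Toy stand-in for an episode return."""
    return -np.sum((theta - TARGET) ** 2)


rng = np.random.default_rng(0)
theta0 = np.zeros(4)
theta = theta0.copy()
for _ in range(300):
    theta = ars_step(theta, reward, rng)
print(reward(theta0), reward(theta))
```

Each update needs only reward evaluations, so the 2 × n_dirs rollouts can run fully in parallel. The cost is that the search dimension must stay small, which is the limitation PI-ARS addresses by shrinking the parameter space with a learned representation.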