We introduce an information-theoretic model predictive control (MPC) algorithm capable of handling complex cost criteria and general nonlinear dynamics. The generality of the approach makes it possible to use multi-layer neural networks as dynamics models, which we incorporate into our MPC algorithm in order to solve model-based reinforcement learning tasks. We test the algorithm in simulation on a cart-pole swing-up and quadrotor navigation task, as well as on actual hardware in an aggressive driving task. Empirical results demonstrate that the algorithm is capable of achieving a high level of performance and does so only utilizing data collected from the system.
translated by Google Translate
Locally weighted learning (LWL) is a class of techniques from nonparametric statistics that provides useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of robotic systems. This paper introduces several LWL algorithms that have been tested successfully in real-time learning of complex robot tasks. We discuss two major classes of LWL, memory-based LWL and purely incremental LWL that does not need to remember any data explicitly. In contrast to the traditional belief that LWL methods cannot work well in high-dimensional spaces, we provide new algorithms that have been tested on up to 90 dimensional learning problems. The applicability of our LWL algorithms is demonstrated in various robot learning examples, including the learning of devil-sticking, pole-balancing by a humanoid robot arm, and inverse-dynamics learning for a seven and a 30 degree-of-freedom robot. In all these examples, the application of our statistical neural networks techniques allowed either faster or more accurate acquisition of motor control than classical control engineering.
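The memory-based flavour of LWL described above can be illustrated with a minimal sketch. It assumes one-dimensional inputs and a Gaussian weighting kernel; the function name `lwr_predict` and the `bandwidth` parameter are illustrative choices and are not taken from the paper. For each query, the stored data are reweighted by distance to the query and a local linear model is fitted by weighted least squares.

```python
import math


def lwr_predict(xs, ys, x_query, bandwidth=0.5):
    """Predict y at x_query with a locally weighted linear fit (1-D inputs)."""
    # Gaussian kernel weights centred on the query point (assumed kernel).
    ws = [math.exp(-0.5 * ((x - x_query) / bandwidth) ** 2) for x in xs]
    sw = sum(ws)
    # Weighted means of inputs and targets.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    # Weighted variance and covariance give the local slope.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    # Evaluate the local linear model at the query.
    return my + slope * (x_query - mx)


# Usage: approximate sin(x) from stored samples without a global model.
samples_x = [i * 0.1 for i in range(63)]
samples_y = [math.sin(x) for x in samples_x]
print(lwr_predict(samples_x, samples_y, 1.0, bandwidth=0.2))
```

Because all data are kept and a fresh fit is made per query, this corresponds to the memory-based class; the purely incremental class mentioned above would instead update local models as data arrive and discard the samples.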
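In this paper, we present a framework for combining deep-learning-based road detection, particle filters, and model predictive control (MPC), using only a monocular camera, an IMU, and wheel speed sensors. The framework uses deep convolutional neural networks combined with LSTMs to learn a local cost map representation of the track in front of the vehicle. A particle filter uses this dynamic observation model to localize in a schematic map, and MPC is used to drive aggressively using this particle-filter-based state estimate. We show extensive real-world testing results and demonstrate reliable operation of the vehicle at the friction limits on a complex dirt track. Using our 1:5 scale test vehicle, we reach speeds above 27 mph (12 m/s) on a dirt track 105 feet (32 m) long.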
In this paper, a model predictive control (MPC) approach for controlling an active front steering system in an autonomous vehicle is presented. At each time step, a trajectory is assumed to be known over a finite horizon, and an MPC controller computes the front steering angle in order to follow the trajectory on slippery roads at the highest possible entry speed. We present two approaches with different computational complexities. In the first approach, we formulate the MPC problem by using a nonlinear vehicle model. The second approach is based on successive online linearization of the vehicle model. Discussions on computational complexity and performance of the two schemes are presented. The effectiveness of the proposed MPC formulation is demonstrated by simulation and experimental tests up to 21 m/s on icy roads.
We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost on-board sensors. By imitating a model predictive controller equipped with advanced sensors, we train a deep neural network control policy to map raw, high-dimensional observations to continuous steering and throttle commands. Compared with recent approaches to similar tasks, our method requires neither state estimation nor on-the-fly planning to navigate the vehicle. Our approach relies on, and experimentally validates, recent imitation learning theory. Empirically, we show that policies trained with online imitation learning overcome well-known challenges related to covariate shift and generalize better than policies trained with batch imitation learning. Built on these insights, our autonomous driving system demonstrates successful high-speed off-road driving, matching the state-of-the-art performance.
In this paper we present a model predictive control algorithm designed for optimizing non-linear systems subject to complex cost criteria. The algorithm is based on a stochastic optimal control framework using a fundamental relationship between the information theoretic notions of free energy and relative entropy. The optimal controls in this setting take the form of a path integral, which we approximate using an efficient importance sampling scheme. We experimentally verify the algorithm by implementing it on a Graphics Processing Unit (GPU) and apply it to the problem of controlling a fifth-scale Auto-Rally vehicle in an aggressive driving task.
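The importance-sampling approximation of the path integral can be sketched as follows. This is a simplified illustration, not the authors' GPU implementation: it assumes scalar controls, isotropic Gaussian perturbations, and a user-supplied `trajectory_cost` function, and it omits any control-cost term that a full formulation would add to the trajectory cost. The names `path_integral_update`, `sigma`, and `lam` (the temperature) are hypothetical.

```python
import math
import random


def path_integral_update(u_nominal, trajectory_cost, num_samples=256,
                         sigma=0.5, lam=1.0, rng=None):
    """One sampling-based update of a nominal control sequence."""
    rng = rng or random.Random(0)
    horizon = len(u_nominal)
    noises, costs = [], []
    for _ in range(num_samples):
        # Sample a perturbation for every time step and score the rollout.
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        perturbed = [u + e for u, e in zip(u_nominal, eps)]
        noises.append(eps)
        costs.append(trajectory_cost(perturbed))
    # Exponentiated negative costs act as importance weights; subtracting
    # the minimum cost keeps the exponentials numerically well behaved.
    best = min(costs)
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    # The update is the weighted average of the sampled perturbations.
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises))
            for t, u in enumerate(u_nominal)]


# Usage: drive a 3-step control sequence toward the minimiser of a toy cost.
def toy_cost(u):
    return sum((x - 1.0) ** 2 for x in u)

controls = [0.0, 0.0, 0.0]
shared_rng = random.Random(1)
for _ in range(20):
    controls = path_integral_update(controls, toy_cost, rng=shared_rng)
print(controls, toy_cost(controls))
```

In a receding-horizon setting, the first control of the updated sequence would be applied and the remainder shifted forward as the next nominal sequence. Since each rollout is independent, the sampling loop is the part that maps naturally onto a GPU.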
We present a control method for improved repetitive path following for a ground vehicle that is geared towards long-term operation where the operating conditions can change over time and are initially unknown. We use weighted Bayesian Linear Regression (wBLR) to model the unknown dynamics, and show how this simple model is more accurate in both its estimate of the mean behaviour and model uncertainty than Gaussian Process Regression and generalizes to novel operating conditions with little or no tuning. In addition, wBLR allows us to use fast adaptation and long-term learning in one, unified framework, to adapt quickly to new operating conditions and learn repetitive model errors over time. This comes with the added benefit of lower computational cost, longer look-ahead, and easier optimization when the model is used in a stochastic Model Predictive Controller (MPC). In order to fully capitalize on the long prediction horizons that are possible with this new approach, we use Tube MPC to reduce the growth of predicted uncertainty. We demonstrate the effectiveness of our approach in experiment on a 900 kg ground robot showing results over 3.0 km of driving with both physical and artificial changes to the robot's dynamics. All of our experiments are conducted using a stereo camera for localization.
I. INTRODUCTION This paper presents a new probabilistic method for modelling robot dynamics geared towards stochastic Model Predictive Control (MPC) and repetitive path following tasks. The goal of our approach is to enable a robot to operate in challenging and changing environments with minimal expert input and prior knowledge of the operating conditions. Our study is motivated by our previous work with Gaussian Processes (GPs) on this topic [1] and an interest in deploying robots in a wide range of operating conditions. Our method requires the unknown part of the dynamics to be linear in a set of model parameters.
Safe control methods have emerged as a way to guarantee that safety constraints (e.g. a bound on maximum path tracking error) are kept in the face of model errors. Having an accurate estimate of model error is of critical importance to the validity of these safety guarantees. In order to derive models for complex systems or systems operating in challenging operating conditions, researchers increasingly rely on tools from machine learning. In particular, probabilistic models are used since they provide a measure of model uncertainty which can naturally be used to derive an upper bound on model error. Two common methods for doing this are GP regression [1]-[3] and various forms of local linear regression [4]-[6].
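To make the idea of a weighted Bayesian linear model with an uncertainty estimate concrete, here is a generic sketch. It is not the paper's wBLR formulation: it assumes a known noise variance, an isotropic Gaussian prior on the parameters, features `[1, x]`, and per-sample weights that simply scale each sample's contribution to the posterior. The names `wblr_fit`, `wblr_predict`, `noise_var`, and `prior_precision` are hypothetical.

```python
def wblr_fit(xs, ys, weights, noise_var=0.01, prior_precision=1e-3):
    """Posterior mean and covariance of [bias, slope] for features [1, x]."""
    # Accumulate the 2x2 posterior precision matrix A and the vector b.
    a00 = a11 = prior_precision
    a01 = 0.0
    b0 = b1 = 0.0
    for x, y, w in zip(xs, ys, weights):
        s = w / noise_var  # a larger weight makes the sample count for more
        a00 += s
        a01 += s * x
        a11 += s * x * x
        b0 += s * y
        b1 += s * x * y
    # Invert the symmetric 2x2 precision matrix in closed form.
    det = a00 * a11 - a01 * a01
    cov = [[a11 / det, -a01 / det], [-a01 / det, a00 / det]]
    mean = [cov[0][0] * b0 + cov[0][1] * b1,
            cov[1][0] * b0 + cov[1][1] * b1]
    return mean, cov


def wblr_predict(mean, cov, x, noise_var=0.01):
    """Predictive mean and variance at input x."""
    phi = [1.0, x]
    mu = mean[0] * phi[0] + mean[1] * phi[1]
    # Parameter uncertainty projected onto the features, plus sensor noise.
    var = noise_var + sum(phi[i] * cov[i][j] * phi[j]
                          for i in range(2) for j in range(2))
    return mu, var


# Usage: emphasise recent samples so the fit tracks a changed operating condition.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 1.0, 2.0, 9.0, 12.0, 15.0]      # slope changes from 1 to 3
recent = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]     # assumed weighting scheme
mean, cov = wblr_fit(xs, ys, recent)
print(mean, wblr_predict(mean, cov, 6.0))
```

The predictive variance is the quantity a safe or stochastic controller could use as a bound on model error, which is why the accuracy of that estimate matters in the passage above.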
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of (possibly redundant) inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
This paper presents a model predictive controller (MPC) structure for solving the path-tracking problem of terrestrial autonomous vehicles. To achieve the desired performance during high-speed driving, the controller architecture considers both the kinematic and the dynamic control in a cascade structure. Our study compares two kinematic linear predictive control strategies: the first is based on the successive linearization concept, and the other combines a local reference frame with an approaching path strategy. Our goal is to search for the strategy that best balances the performance and hardware-cost criteria. For the dynamic controller, a decentralized predictive controller based on a linearized model of the vehicle is used. Practical experiments obtained using an autonomous "Mini-Baja" vehicle equipped with an embedded computing system are presented. These results confirm that the proposed MPC structure is the solution that better matches the target criteria.
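For any autonomous driving vehicle, the control module determines its road performance and safety, i.e., its precision and stability should stay within a carefully designed range. Nonetheless, control algorithms require vehicle dynamics (such as longitudinal dynamics) as inputs, which, unfortunately, are difficult to calibrate in real time. As a result, to achieve reasonable performance, most, if not all, research-oriented autonomous vehicles are calibrated manually, one by one. Since manual calibration is not sustainable once vehicles enter the mass-production stage for industrial use, we introduce a machine-learning-based auto-calibration system for autonomous driving vehicles. In this paper, we show how to build a data-driven longitudinal calibration procedure using machine learning techniques. We first generate offline calibration tables from human driving data. The offline table serves as an initial guess for later use, and it needs only 20 minutes of data collection and processing. We then use an online-learning algorithm to appropriately update the initial table (the offline table) based on real-time performance analysis. This longitudinal auto-calibration system has been deployed to more than 100 Baidu self-driving vehicles (including hybrid family vehicles and electric delivery vehicles) since April 2018. By August 27, 2018, it had been tested for more than two thousand hours and ten thousand kilometers (6,213 miles) and has proven to be effective.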
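Gaussian Process (GP) regression has seen widespread use in robotics due to its generality, simplicity of use, and the utility of Bayesian predictions. In particular, the predominant implementation of GP regression is kernel-based, as it enables fitting of arbitrary nonlinear functions by leveraging kernel functions as infinite-dimensional features. While incorporating prior information has the potential to drastically improve the data efficiency of kernel-based GP regression, expressing complex priors through the choice of kernel function and associated hyperparameters is often challenging and unintuitive. Furthermore, the computational complexity of kernel-based GP regression scales poorly with the number of samples, limiting its application in regimes where a large amount of data is available. In this work, we propose ALPaCA, an efficient Bayesian regression algorithm that addresses these issues. ALPaCA uses a dataset of sample functions to learn a domain-specific, finite-dimensional feature encoding, as well as a prior over the associated weights, such that Bayesian linear regression in this feature space yields accurate online predictions of the posterior density. These features are neural networks, which are trained via a meta-learning approach. ALPaCA extracts all prior information from the dataset, rather than relying on the choice of arbitrary, restrictive kernel hyperparameters. Furthermore, it substantially reduces sample complexity and allows scaling to large systems. We investigate the performance of ALPaCA on two simple regression problems, two simulated robotic systems, and a lane-change driving task performed by humans. We find that our approach outperforms kernel-based GP regression, as well as state-of-the-art meta-learning approaches, thereby providing a promising plug-in tool for many regression tasks in robotics where scalability and data efficiency are important.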
Trajectory-following controllers for autonomous ground vehicles must carefully consider the possibility of vehicle instability. Previous approaches have provided a reference velocity along with a trajectory to follow, and a supervisory controller selects low velocity if turns are anticipated. However, such an approach is not robust across different vehicle platforms, and does not take into account passenger comfort. This paper provides a controller and design methodology to couple an existing trajectory controller with a speed-limiting controller, where the speed-limiting controller is created based on user driving data. The result is a controller that can be optimized using a set of linearized controllers, and which is also demonstrated to remain below the velocity/turn-rate thresholds established by human drivers: a conservative approximation for stability thresholds to prevent rollovers and skidding. Analysis is performed on data gathered at velocities of 0-18 m/s while driving on surface streets as well as during maneuvers in an open area, to demonstrate the validity of the thresholds in various conditions.
Precise models of the robot inverse dynamics allow the design of significantly more accurate, energy-efficient and more compliant robot control. However, in some cases the accuracy of rigid-body models does not suffice for sound control performance due to unmodeled nonlinearities arising from hydraulic cable dynamics, complex friction or actuator dynamics. In such cases, estimating the inverse dynamics model from measured data poses an interesting alternative. Nonparametric regression methods, such as Gaussian process regression (GPR) or locally weighted projection regression (LWPR), are not as restrictive as parametric models and, thus, offer a more flexible framework for approximating unknown nonlinearities. In this paper, we propose a local approximation to the standard GPR, called local GPR (LGP), for real-time model online-learning by combining the strengths of both regression methods, i.e., the high accuracy of GPR and the fast speed of LWPR. The approach is shown to have competitive learning performance for high-dimensional data while being sufficiently fast for real-time learning. The effectiveness of LGP is exhibited by a comparison with the state-of-the-art regression techniques, such as GPR, LWPR and ν-SVR. The applicability of the proposed LGP method is demonstrated by real-time online-learning of the inverse dynamics model for robot model-based control on a Barrett WAM robot arm.
Designing controllers for tasks with complex non-linear dynamics is extremely challenging, time-consuming, and in many cases, infeasible. This difficulty is exacerbated in tasks such as robotic food-cutting, in which dynamics might vary both with environmental properties, such as material and tool class, and with time while acting. In this work, we present DeepMPC, an online real-time model-predictive control approach designed to handle such difficult tasks. Rather than hand-design a dynamics model for the task, our approach uses a novel deep architecture and learning algorithm, learning controllers for complex tasks directly from data. We validate our method in experiments on a large-scale dataset of 1488 material cuts for 20 diverse classes, and in 450 real-world robotic experiments, demonstrating significant improvement over several other approaches.
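This survey presents an overview of verification techniques for autonomous systems, with a focus on safety-critical autonomous cyber-physical systems (CPS) and their subcomponents. Autonomy in CPS is enabled by recent advances in artificial intelligence (AI) and machine learning (ML), through approaches such as deep neural networks (DNNs) embedded in so-called learning-enabled components (LECs) that accomplish tasks ranging from classification to control. Recently, the formal methods and formal verification community has developed methods to characterize behaviors in these LECs, with the eventual goal of formally verifying specifications for LECs, and this article presents a survey of many of these recent approaches.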
Autonomous helicopter flight is widely regarded to be a highly challenging control problem. Despite this fact, human experts can reliably fly helicopters through a wide range of maneuvers, including aerobatic maneuvers at the edge of the helicopter's capabilities. We present apprenticeship learning algorithms, which leverage expert demonstrations to efficiently learn good controllers for tasks being demonstrated by an expert. These apprenticeship learning algorithms have enabled us to significantly extend the state of the art in autonomous helicopter aerobatics. Our experimental results include the first autonomous execution of a wide range of maneuvers, including but not limited to in-place flips, in-place rolls, loops and hurricanes, and even auto-rotation landings, chaos and tic-tocs, which only exceptional human pilots can perform. Our results also include complete airshows, which require autonomous transitions between many of these maneuvers. Our controllers perform as well as, and often even better than, our expert pilot.
This thesis presents a study of biped dynamic walking using reinforcement learning. A hardware biped robot was built. It uses low gear ratio DC motors in order to provide free leg movements. The Self Scaling Reinforcement learning algorithm was developed in order to deal with the problem of reinforcement learning in continuous action domains. A new learning architecture was designed to solve complex control problems. It uses different modules that consist of simple controllers and small neural networks. The architecture allows for easy incorporation of modules that represent new knowledge, or new requirements for the desired task. Control experiments were carried out using a simulator and the physical biped. The biped learned dynamic walking on flat surfaces without any previous knowledge about its dynamic model.
Dissertation submitted to the University of New Hampshire in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Engineering, December 1996. Copyright 1996 by Benbrahim, Hamid. UMI Number: 9717850.
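Convolutional neural networks are commonly used to control the steering angle of autonomous cars. Most of the time, multiple long-range cameras are used to generate lateral failure cases. In this paper, we present a novel model to generate this data and label augmentation using only one short-range fisheye camera. We present our simulator and show how it can be used as a consistent metric for evaluating lateral end-to-end control. Experiments are conducted on a custom dataset corresponding to more than 10,000 km and 200 hours of open-road driving. Finally, we evaluate this model on real-world driving scenarios, open roads, and a custom test track with challenging obstacle avoidance and sharp turns. In our simulator based on real-world videos, the final model achieves more than 99% autonomy on urban roads.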
One of the key challenges in applying reinforcement learning to complex robotic control tasks is the need to gather large amounts of experience in order to find an effective policy for the task at hand. Model-based reinforcement learning can achieve good sample efficiency, but requires the ability to learn a model of the dynamics that is good enough to learn an effective policy. In this work, we develop a model-based reinforcement learning algorithm that combines prior knowledge from previous tasks with online adaptation of the dynamics model. These two ingredients enable highly sample-efficient learning even in regimes where estimating the true dynamics is very difficult, since the online model adaptation allows the method to locally compensate for unmodeled variation in the dynamics. We encode the prior experience into a neural network dynamics model, adapt it online by progressively refitting a local linear model of the dynamics, and use model predictive control to plan under these dynamics. Our experimental results show that this approach can be used to solve a variety of complex robotic manipulation tasks in just a single attempt, using prior data from other manipulation behaviors.
Standardly, echo state networks (ESNs) are built from simple additive units with a sigmoid activation function. Here we investigate ESNs whose reservoir units are leaky integrator units. Units of this type have individual state dynamics, which can be exploited in various ways to accommodate the network to the temporal characteristics of a learning task. We present stability conditions, introduce and investigate a stochastic gradient descent method for the optimization of the global learning parameters (input and output feedback scalings, leaking rate, spectral radius), and demonstrate the usefulness of leaky integrator ESNs for (i) learning very slow dynamical systems and replaying the learnt system at different speeds, (ii) classifying relatively slow and noisy time series (the Japanese Vowel dataset; here we obtain a zero test error rate), and (iii) recognizing strongly time-warped dynamical patterns.
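The effect of a leaky integrator unit can be shown with a small sketch of a reservoir state update. It uses one common discrete-time form, in which the new state blends the old state with a tanh-activated drive according to a leaking rate; the paper's exact equations (for example, any separate time-constant parameter or output feedback) may differ. The name `leaky_esn_step` and the scalar-input assumption are illustrative.

```python
import math


def leaky_esn_step(state, u, w_in, w_res, leak_rate):
    """Advance a leaky-integrator reservoir by one step for scalar input u."""
    size = len(state)
    new_state = []
    for i in range(size):
        # Drive from the input and from the recurrent reservoir connections.
        pre = w_in[i] * u + sum(w_res[i][j] * state[j] for j in range(size))
        # Leaky integration: keep (1 - leak_rate) of the old state and mix in
        # leak_rate of the squashed drive. A small leak_rate gives slow units.
        new_state.append((1.0 - leak_rate) * state[i]
                         + leak_rate * math.tanh(pre))
    return new_state


# Usage: the same input sequence produces slower state changes at a low leak rate.
w_in = [0.5, -0.3]
w_res = [[0.1, 0.2], [-0.2, 0.1]]
fast = slow = [0.0, 0.0]
for u in [1.0, 1.0, 1.0]:
    fast = leaky_esn_step(fast, u, w_in, w_res, leak_rate=1.0)
    slow = leaky_esn_step(slow, u, w_in, w_res, leak_rate=0.1)
print(fast, slow)
```

With `leak_rate=1.0` the update reduces to the standard additive sigmoid unit, which is why the leaking rate can be treated as one of the global parameters tuned to the time scale of the task.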