Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learning that reduces the distribution shift suffered by the latter. However, AIL requires effective exploration during an online reinforcement learning phase. In this work, we show that the standard, naive approach to exploration can manifest as a suboptimal local maximum if a policy learned with AIL sufficiently matches the expert distribution without fully learning the desired task. This can be particularly catastrophic for manipulation tasks, where the difference between an expert and a non-expert state-action pair is often subtle. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory, auxiliary tasks in addition to a main task. The addition of these auxiliary tasks forces the agent to explore states and actions that standard AIL may learn to ignore. Additionally, this particular formulation allows for the reusability of expert data between main tasks. Our experimental results in a challenging multitask robotic manipulation domain indicate that LfGP significantly outperforms both AIL and behaviour cloning, while also being more expert sample efficient than these baselines. To explain this performance gap, we provide further analysis of a toy problem that highlights the coupling between a local maximum and poor exploration, and also visualize the differences between the learned models from AIL and LfGP.
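TensorFlow GNN (TF-GNN) is a scalable library for graph neural networks in TensorFlow. It is designed from the bottom up to support the kinds of rich heterogeneous graph data that occur in today's information ecosystems. Many production models at Google use TF-GNN, and it has recently been released as an open-source project. In this paper, we describe the TF-GNN data model, its Keras modeling API, and relevant capabilities such as graph sampling, distributed training, and accelerator support.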
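Effective exploration continues to be a significant challenge that prevents the deployment of reinforcement learning for many physical systems. This is particularly true for systems with continuous and high-dimensional state and action spaces, such as robotic manipulators. The challenge is accentuated in the sparse-reward setting, where the low-level state information required to design dense rewards is unavailable. Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour, which essentially provide a replacement for dense reward information. Unfortunately, the availability of expert demonstrations does not necessarily improve an agent's ability to explore effectively and, as we show empirically, can lead to inefficient or stagnated learning. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task. Subsequently, a hierarchical model is used to learn each task's reward and policy through a modified AIL procedure, in which exploration of all tasks is enforced via a scheduler that composes different tasks together. This affords several benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible. Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably to supervised imitation learning and to a state-of-the-art AIL method. Code is available at https://github.com/utiasstars/lfgp.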
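Given a cardiac-arrest patient being monitored in the ICU (intensive care unit) for brain activity, how can we predict their health outcome as early as possible? Early decision-making is critical in many applications; for example, monitoring patients may assist in early intervention and improved care. On the other hand, early prediction on EEG data poses several challenges: (i) an earliness-accuracy trade-off, since observing more data often increases accuracy but sacrifices earliness; (ii) large-scale (for training) and streaming (for online decision-making) data processing; and (iii) multivariate (due to multiple electrodes) and multi-length (due to patients' varying lengths of stay) time series. Motivated by this real-world application, we present BeneFitter, which folds the savings incurred from an early prediction and the cost of misclassification into a unified, domain-specific target called benefit. Unifying these two quantities allows us to directly estimate a single target (i.e., benefit) and, importantly, dictates exactly when to output a prediction: when the benefit estimate becomes positive. BeneFitter (a) is efficient and fast, with training time linear in the number of input sequences, and can operate in real time for decision-making; (b) can handle multivariate and variable-length time series, making it suitable for patient data; and (c) is effective, providing up to 2x time savings with equal or better accuracy compared to competitors.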
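The stopping rule described in the abstract (output a prediction once the benefit estimate becomes positive) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `first_positive_benefit` and the sample estimates are hypothetical, and the model that would produce the per-step benefit estimates is assumed to exist elsewhere.

```python
from typing import Optional, Sequence


def first_positive_benefit(benefit_estimates: Sequence[float]) -> Optional[int]:
    """Return the first time step whose estimated benefit is positive.

    Returns None if the estimate never becomes positive, meaning no
    early prediction would be emitted for this sequence.
    """
    for t, benefit in enumerate(benefit_estimates):
        if benefit > 0.0:
            return t
    return None


# Hypothetical per-step benefit estimates for one streaming sequence.
estimates = [-0.8, -0.3, -0.1, 0.2, 0.5]
decision_time = first_positive_benefit(estimates)
print(decision_time)  # 3
```

Because the rule only inspects the current estimate, it can run online as new observations arrive.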
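The increasing complexity of today's software requires the contribution of thousands of developers. This complex collaboration structure makes developers more likely to introduce defect-prone changes that lead to software faults. Determining when these defect-prone changes are introduced has proven challenging, and traditional machine learning (ML) methods for making these determinations seem to have reached a plateau. In this work, we build contribution graphs consisting of developers and source files to capture the nuanced complexity of the changes required to build software. By leveraging these contribution graphs, our research shows the potential of graph-based ML to improve Just-In-Time (JIT) defect prediction. We hypothesize that features extracted from the contribution graphs may be better predictors of defect-prone changes than intrinsic features derived from software characteristics. We corroborate our hypothesis by using graph-based ML to classify edges that represent defect-prone changes. This new framing of the JIT defect prediction problem leads to better results. We test our approach on 14 open-source projects and show that our best model can predict whether a code change will lead to a defect with an F1 score as high as 77.55%. This represents an increase of up to 46.72% over the state of the art in JIT defect prediction. We describe limitations, open challenges, and how this method can be used for operational JIT defect prediction.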
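Learning or identifying dynamics from a sequence of high-dimensional observations is a difficult challenge in many domains, including reinforcement learning and control. The problem has recently been studied from a generative perspective through latent dynamics: high-dimensional observations are embedded into a lower-dimensional space in which the dynamics can be learned. Despite some successes, latent dynamics models have not yet been applied to real-world robotic systems, where learned representations must be robust to a variety of perceptual confounds and noise sources. In this paper, we present a method to jointly learn a latent state representation and the associated dynamics that is amenable to long-term planning and closed-loop control under perceptually difficult conditions. As our main contribution, we describe how our representation is able to capture a notion of heteroscedastic, or input-specific, uncertainty at test time by detecting novel or out-of-distribution (OOD) inputs. We present results from prediction and control experiments on two image-based tasks: a simulated pendulum balancing task and a real-world robotic manipulator reaching task. We demonstrate that, in the presence of varying degrees of input degradation, our model produces more accurate predictions and exhibits improved control performance compared to a model that assumes homoscedastic uncertainty only.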
Accurate determination of the binding pose of a small-molecule candidate (ligand) in its target protein pocket is important for computer-aided drug discovery. Typical rigid-body docking methods ignore the flexibility of the protein pocket, while more accurate pose generation using molecular dynamics is hindered by slow protein dynamics. We develop a tiered tensor transform (3T) algorithm to rapidly generate diverse protein-ligand complex conformations for both pose and affinity estimation in drug screening, requiring neither machine learning training nor lengthy dynamics computation, while maintaining both coarse-grain-like coordinated protein dynamics and atomistic-level details of the complex pocket. The 3T conformation structures we generate are closer to experimental co-crystal structures than those generated by docking software and, more importantly, achieve significantly higher accuracy in active ligand classification than traditional ensemble docking using hundreds of experimental protein conformations. The 3T structure transformation is decoupled from the system physics, making future use in other computational scientific domains possible.
Location-aware networks will introduce new services and applications for modern convenience, surveillance, and public safety. In this paper, we consider the problem of cooperative localization in a wireless network where the position of certain anchor nodes can be controlled. We introduce an active planning method that aims at moving the anchors such that the information gain of future measurements is maximized. In the control layer of the proposed method, control inputs are calculated by minimizing the traces of approximate inverse Bayesian Fisher information matrices (FIMs). The estimation layer computes estimates of the agent states and provides Gaussian representations of marginal posteriors of agent positions to the control layer for approximate Bayesian FIM computations. Based on a cost function that accumulates Bayesian FIM contributions over a sliding window of discrete future timesteps, a receding horizon (RH) control is performed. Approximations that make it possible to solve the resulting tree-search problem efficiently are also discussed. A numerical case study demonstrates the intelligent behavior of a single controlled anchor in a 3-D scenario and the resulting significantly improved localization accuracy.
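To illustrate the idea of choosing a control input by minimizing the trace of an inverse Bayesian FIM, here is a deliberately simplified sketch. It is not the paper's method: it assumes a 2-D scene rather than 3-D, range-only measurements with Gaussian noise, an isotropic Gaussian prior on the agent position, and a greedy one-step horizon instead of a receding horizon over a sliding window. All function names and numeric values are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def range_fim(agent: Vec, anchors: List[Vec], sigma: float, prior: float):
    """Entries (j11, j12, j22) of a 2x2 Bayesian FIM for the agent position.

    Assumes one range measurement per anchor with noise std `sigma`, plus
    an isotropic prior information `prior` on the agent position.
    """
    j11, j12, j22 = prior, 0.0, prior
    weight = 1.0 / sigma ** 2
    for ax, ay in anchors:
        dx, dy = agent[0] - ax, agent[1] - ay
        dist = math.hypot(dx, dy)
        ux, uy = dx / dist, dy / dist  # unit vector from anchor to agent
        j11 += weight * ux * ux
        j12 += weight * ux * uy
        j22 += weight * uy * uy
    return j11, j12, j22


def trace_inverse(j11: float, j12: float, j22: float) -> float:
    """Trace of the inverse of the symmetric 2x2 matrix [[j11, j12], [j12, j22]]."""
    det = j11 * j22 - j12 * j12
    return (j11 + j22) / det


def best_move(agent_estimate: Vec, fixed_anchors: List[Vec],
              controlled_anchor: Vec, moves: List[Vec],
              sigma: float = 1.0, prior: float = 0.01) -> Vec:
    """Greedy one-step choice: the move minimizing trace(inverse FIM)."""
    def cost(move: Vec) -> float:
        moved = (controlled_anchor[0] + move[0], controlled_anchor[1] + move[1])
        fim = range_fim(agent_estimate, fixed_anchors + [moved], sigma, prior)
        return trace_inverse(*fim)
    return min(moves, key=cost)


# Hypothetical scene: the fixed anchor already informs the x direction,
# so the controlled anchor should move to improve information along y.
chosen = best_move(
    agent_estimate=(0.0, 0.0),
    fixed_anchors=[(10.0, 0.0)],
    controlled_anchor=(5.0, 1.0),
    moves=[(1, 0), (-1, 0), (0, 1), (0, -1)],
)
print(chosen)  # (0, 1)
```

In this toy scene the controlled anchor moves away from the line through the agent and the fixed anchor, which diversifies the measurement geometry.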
Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. In practice, often at least one of these sub-problems is overparameterized. In this case, there are many ways to choose among optima that achieve equivalent objective values. Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of gradient-based algorithms for bilevel optimization. We delineate two standard BLO methods -- cold-start and warm-start -- and show that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. We also show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer parameters are low-dimensional. We believe that implicit bias deserves as central a role in the study of bilevel optimization as it has attained in the study of single-level neural net optimization.
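The cold-start and warm-start distinction can be made concrete with a toy scalar problem. This sketch is not taken from the paper and is not overparameterized, so it only illustrates that the converged solution can depend on the algorithmic choice. The assumptions are: inner objective 0.5 * (w - lam)^2, outer objective 0.5 * (w - 1)^2, a single truncated inner gradient step per outer step, and a hypergradient approximation that uses the exact-solution sensitivity dw*/dlam = 1 evaluated at the current inner iterate. The function name `run_blo` and all step sizes are hypothetical.

```python
from typing import Tuple


def run_blo(warm_start: bool, outer_steps: int = 500, inner_steps: int = 1,
            inner_lr: float = 0.5, outer_lr: float = 0.1) -> Tuple[float, float]:
    """Gradient-based bilevel optimization on a toy scalar problem.

    Inner problem: min_w   0.5 * (w - lam)**2
    Outer problem: min_lam 0.5 * (w - 1)**2
    """
    lam = 0.0  # outer parameter
    w = 0.0    # inner parameter
    for _ in range(outer_steps):
        if not warm_start:
            w = 0.0  # cold-start: re-initialize the inner parameter every time
        for _ in range(inner_steps):
            w -= inner_lr * (w - lam)  # gradient step on the inner objective
        # Approximate hypergradient: dF/dw * dw*/dlam, with dw*/dlam = 1.
        lam -= outer_lr * (w - 1.0)
    return lam, w


warm_lam, _ = run_blo(warm_start=True)
cold_lam, _ = run_blo(warm_start=False)
print(round(warm_lam, 6), round(cold_lam, 6))  # 1.0 2.0
```

With these settings the warm-start run settles at lam = 1, where the inner solution is exact, while the cold-start run settles at lam = 2, because the outer parameter compensates for the truncated inner optimization.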
The Covid-19 pandemic induced a vast increase in adolescents diagnosed with eating disorders and hospitalized due to eating disorders. This immense growth stemmed partially from the stress of the pandemic but also from increased exposure to content that promotes eating disorders via social media, which, within the last decade, has become plagued by pro-eating disorder content. This study aimed to create a deep learning model capable of determining whether a given social media post promotes eating disorders based solely on image data. Tweets from hashtags that have been documented to promote eating disorders, along with tweets from unrelated hashtags, were collected. After preprocessing, these images were labeled as either pro-eating disorder or not based on which Twitter hashtag they were scraped from. Several deep-learning models were trained on the scraped dataset and were evaluated based on their accuracy, F1 score, precision, and recall. Ultimately, the vision transformer model was determined to be the most accurate, attaining an F1 score of 0.877 and an accuracy of 86.7% on the test set. The model, which was applied to unlabeled Twitter image data scraped from "#selfie", uncovered seasonal fluctuations in the relative abundance of pro-eating disorder content, which reached its peak in the summertime. These fluctuations correspond not only to the seasons, but also to stressors, such as the Covid-19 pandemic. Moreover, the Twitter image data indicated that the relative amount of pro-eating disorder content has been steadily rising over the last five years and is likely to continue increasing in the future.