This paper presents a technique for training a robot to perform a kick motion in AI Soccer using reinforcement learning (RL). In RL, an agent interacts with an environment and learns to choose an action for the state it observes at each step. When RL algorithms are trained, the curse of dimensionality (COD) can arise if the state dimension is high and the amount of training data is small, and it often degrades the performance of RL models. When a robot kicks a ball, it chooses its action as the ball approaches, based on information obtained from the soccer field. To avoid the COD, the training data, which in RL are experiences, should be collected evenly from all areas of the soccer field over a (theoretically infinite) period of time. In this paper, we use the relative coordinate system (RCS) rather than the absolute coordinate system (ACS) as the state for training the kick motion of the robot agent. Using the RCS removes the need for the agent to know the state information of the entire soccer field and reduces the dimension of the state it needs to perform the kick motion, which alleviates the COD. RCS-based training is performed with the widely used Deep Q-network (DQN) and tested in the AI Soccer environment implemented in the Webots simulation software.
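As a minimal sketch of the relative-coordinate idea, the conversion from absolute field coordinates to a robot-centred state might look like the following. The function name `to_relative_state`, the 2-D pose `(x, y, heading)`, and the choice of distance and bearing as state components are assumptions made for illustration; the abstract does not specify the exact state layout.

```python
import math


def to_relative_state(robot_x, robot_y, robot_heading, ball_x, ball_y):
    """Express the ball position in the robot's own frame (hypothetical layout).

    Returns (distance, bearing), where bearing is the angle to the ball
    measured from the robot's heading, wrapped to [-pi, pi).
    """
    dx = ball_x - robot_x
    dy = ball_y - robot_y
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - robot_heading
    # Wrap to [-pi, pi) so that equivalent angles map to the same state.
    bearing = (bearing + math.pi) % (2 * math.pi) - math.pi
    return distance, bearing
```

Two situations with the same robot-to-ball geometry produce the same state regardless of where on the field they occur, which is why such a state does not require experiences from every area of the field.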
translated by 谷歌翻译
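With the growth of smart building applications, occupancy information in residential buildings is becoming increasingly important. In the smart building paradigm, this information is needed for a wide range of purposes, including improving energy efficiency and occupant comfort. In this study, occupancy detection in residential buildings is implemented using deep learning based on the technical information of electric appliances. To this end, a novel occupancy detection approach for smart residential building systems is proposed. A dataset of electric appliances, sensors, lighting, and HVAC, measured by a smart metering system, is used for simulations. To classify the dataset, support vector machine and autoencoder algorithms are used. A confusion matrix is used to compute accuracy, precision, recall, and F1 and to demonstrate the comparative performance of the proposed method in occupancy detection. Using the technical information of electric appliances, the proposed algorithm achieves 95.7~98.4% in occupancy detection. To validate the occupancy detection data, principal component analysis and the t-distributed stochastic neighbor embedding (t-SNE) algorithm are employed. By using occupancy detection, the power consumption of the renewable energy system in smart buildings is reduced to 11.1~13.1%.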
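For reference, the four scores named above follow directly from the cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `classification_metrics` and the convention that "occupied" is the positive class are assumptions for illustration, not details taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives (positive = "occupied", assumed).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```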
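The Transformer is a deep learning language model used for natural language processing (NLP) services in datacenters. Among Transformer models, the Generative Pre-trained Transformer (GPT) has achieved remarkable performance in text generation, or natural language generation (NLG), which requires processing a large input context in the summarization stage, followed by a generation stage that produces one word at a time. Conventional platforms such as GPUs are specialized for processing large inputs in parallel in the summarization stage, but their performance degrades significantly in the generation stage because of its sequential nature. An efficient hardware platform is therefore needed to address the high latency caused by the sequential nature of text generation. In this paper, we present DFX, a multi-FPGA acceleration appliance that executes GPT-2 model inference end-to-end with low latency and high throughput in both the summarization and generation stages. DFX uses model parallelism and an optimized, model-and-hardware-aware dataflow for fast simultaneous workload execution across devices. Its compute cores operate on custom instructions and provide GPT-2 operations end-to-end. We implement the proposed hardware architecture on four Xilinx Alveo U280 FPGAs and use all channels of the high bandwidth memory (HBM) and the maximum number of compute resources for high hardware efficiency. DFX achieves a 5.58x speedup and 3.99x higher energy efficiency than four NVIDIA V100 GPUs on the modern GPT-2 model. DFX is also more cost-effective than the GPU appliance, suggesting that it is a promising solution for text generation workloads in cloud datacenters.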
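When replacing fossil fuels with renewable energy, the unbalanced resource production of intermittent wind and photovoltaic (PV) power is a critical issue for peer-to-peer (P2P) power trading. To resolve this problem, this paper introduces a reinforcement learning (RL) technique. For RL, a graph convolutional network (GCN) and a bi-directional long short-term memory (Bi-LSTM) network are jointly applied to P2P power trading between nanogrid clusters based on cooperative game theory. The flexible and reliable DC nanogrid is well suited to integrating renewable energy into the distribution system. Each local nanogrid cluster takes the position of a prosumer, focusing on power production and consumption at the same time. For the power management of the nanogrid clusters, multi-objective optimization is applied to each local nanogrid cluster using Internet of Things (IoT) technology. Charging and discharging of electric vehicles (EVs) is performed with the intermittent characteristics of wind and PV power production taken into account. RL algorithms, namely the deep Q-learning network (DQN), deep recurrent Q-learning network (DRQN), Bi-DRQN, proximal policy optimization (PPO), GCN-DQN, GCN-DRQN, GCN-Bi-DRQN, and GCN-PPO, are used for simulations. As a result, the cooperative P2P power trading system maximizes profit by using the time-of-use (ToU) tariff-based electricity cost and the system marginal price (SMP), and minimizes the amount of grid power consumed. Power management of the nanogrid clusters with P2P power trading is simulated in real time on a distribution test feeder, and the proposed GCN-PPO technique reduces the electricity cost of the nanogrid clusters by 36.7%.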
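Recent advances in self-supervised learning combined with the Transformer architecture enable natural language processing (NLP) to reach extremely low perplexity. Such powerful models demand ever-increasing model size and therefore large amounts of computation and memory. In this paper, we propose an efficient inference framework for large-scale generative language models. As the key to reducing model size, we quantize weights with a non-uniform quantization method. The quantized matrix multiplications are then accelerated by our proposed kernel, called nuQmm, which allows a wide trade-off between compression ratio and accuracy. The proposed nuQmm reduces not only the latency of each GPU but also that of the entire inference of large LMs, because a high compression ratio (through low-bit quantization) lowers the minimum number of GPUs required. We demonstrate that nuQmm can accelerate the inference speed of the GPT-3 (175B) model by about 14.4 times and reduce energy consumption by 93%.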
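To illustrate what "non-uniform" means here, the sketch below quantizes a list of weights to a small learned codebook using 1-D k-means (Lloyd iterations). This is one common non-uniform scheme, swapped in purely for illustration; the abstract does not say which scheme nuQmm uses, and the names `kmeans_codebook` and `dequantize` are hypothetical.

```python
def kmeans_codebook(weights, bits, iters=20):
    """Fit 2**bits non-uniformly spaced levels to `weights` (illustrative only).

    Returns (levels, codes): the codebook and, per weight, the index of
    its nearest level.
    """
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Start from evenly spaced levels, then let Lloyd iterations move them
    # toward where the weights actually concentrate.
    levels = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for w in weights:
            idx = min(range(k), key=lambda i: abs(w - levels[i]))
            buckets[idx].append(w)
        # Empty buckets keep their previous level.
        levels = [sum(b) / len(b) if b else levels[i] for i, b in enumerate(buckets)]
    codes = [min(range(k), key=lambda i: abs(w - levels[i])) for w in weights]
    return levels, codes


def dequantize(levels, codes):
    """Map stored indices back to their codebook values."""
    return [levels[c] for c in codes]
```

Only the `bits`-wide indices and the small codebook need to be stored, which is where the compression comes from; fewer bits give a higher compression ratio at the cost of larger reconstruction error.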
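Although diverse variations of diffusion models exist, only a few works have investigated extending linear diffusion to a nonlinear diffusion process. The effect of nonlinearity is hardly understood, but intuitively there should be more promising diffusion patterns for optimally training the generative distribution toward the data distribution. This paper introduces a data-adaptive, nonlinear diffusion process for score-based diffusion models. The proposed Implicit Nonlinear Diffusion Model (INDM) learns the nonlinear diffusion process by combining a normalizing flow with a diffusion process. Specifically, INDM implicitly constructs a nonlinear diffusion on the data space by leveraging a linear diffusion on the latent space through a flow network. Because the nonlinearity depends entirely on the flow network, this flow network is the key to forming a nonlinear diffusion. This flexible nonlinearity is what improves the learning curve of INDM to nearly maximum likelihood estimation (MLE) training, in contrast to the non-MLE training of DDPM++, which turns out to be a special case of INDM with the identity flow. In addition, training the nonlinear diffusion yields sampling that is robust to the discretization step size. In experiments, INDM achieves the state-of-the-art FID on CelebA.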
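Anomaly detection is relevant to a wide range of applications, such as fault detection, system monitoring, and event detection. Identifying anomalies in metering data obtained from a smart metering system is a critical task for improving the reliability, stability, and efficiency of the power system. This paper presents an anomaly detection process for finding outliers observed in a smart metering system. The proposed approach uses an autoencoder based on bi-directional long short-term memory (BiLSTM) to find anomalous data points. It computes the reconstruction error with an autoencoder trained on non-anomalous data, and outliers to be classified as anomalies are separated from the non-anomalous data by a predefined threshold. The BiLSTM autoencoder-based anomaly detection method is tested on metering data for four energy sources (electricity, water, heating, and hot water) collected from 985 households.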
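The thresholding step described above can be sketched as follows. The autoencoder itself is outside this sketch: `reconstructions` stands in for whatever the trained model outputs. Using mean squared error as the reconstruction error, and the function names shown, are assumptions for illustration.

```python
def reconstruction_errors(windows, reconstructions):
    """Mean squared error between each input window and its reconstruction."""
    return [
        sum((x - r) ** 2 for x, r in zip(window, recon)) / len(window)
        for window, recon in zip(windows, reconstructions)
    ]


def flag_anomalies(errors, threshold):
    """Mark a window as anomalous when its error exceeds the predefined threshold."""
    return [error > threshold for error in errors]
```

Because the autoencoder is fitted to non-anomalous data, normal windows are reconstructed closely and produce small errors, while unusual windows are reconstructed poorly and exceed the threshold.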
Training agents via off-policy deep reinforcement learning (RL) requires a large memory, called replay memory, that stores past experiences used for learning. These experiences are sampled, uniformly or non-uniformly, to create the batches used for training. When calculating the loss function, off-policy algorithms assume that all samples are of equal importance. In this paper, we hypothesize that training can be enhanced by assigning a different importance to each experience, based on its temporal-difference (TD) error, directly in the training objective. We propose a novel method that introduces a weighting factor for each experience when the loss function is calculated at the learning stage. In addition to improving convergence speed when used with uniform sampling, the method can be combined with prioritization methods for non-uniform sampling. Combining the proposed method with prioritization methods improves sampling efficiency while increasing the performance of TD-based off-policy RL algorithms. The effectiveness of the proposed method is demonstrated by experiments in six environments of the OpenAI Gym suite. The experimental results show that the proposed method achieves a 33%~76% reduction in convergence time in three environments, and an 11% increase in returns and a 3%~10% increase in success rate in the other three environments.
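A minimal sketch of a per-experience weighted loss is given below. The abstract does not state the form of the weighting factor, so the rule used here (weights proportional to the absolute TD error, normalized to a mean of 1) is an assumption chosen only to make the idea concrete; `td_errors` and `weighted_td_loss` are hypothetical names.

```python
def td_errors(rewards, q_values, next_q_max, dones, gamma=0.99):
    """One-step TD error per experience: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    return [
        r + (0.0 if done else gamma * nq) - q
        for r, q, nq, done in zip(rewards, q_values, next_q_max, dones)
    ]


def weighted_td_loss(errors, eps=1e-6):
    """Squared TD loss with a per-experience weight (assumed form: |TD error|).

    In a real training loop the weights would be treated as constants,
    so no gradient would flow through them.
    """
    magnitudes = [abs(e) + eps for e in errors]
    mean_magnitude = sum(magnitudes) / len(magnitudes)
    weights = [m / mean_magnitude for m in magnitudes]  # mean weight is 1
    return sum(w * e * e for w, e in zip(weights, errors)) / len(errors)
```

With equal TD errors the weights are all 1 and the loss equals the ordinary mean squared error; with unequal errors, experiences with larger TD error contribute more than they would under uniform weighting.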
As the demand for autonomous driving increases, ensuring safety is paramount. Early accident prediction using deep learning methods has recently gained much attention as a way to improve driving safety. In this task, an early accident prediction and a point prediction of where the driver should look are produced from dashcam video input. We propose to apply the double actors and regularized critics (DARC) method, for the first time, to this accident forecasting platform. We draw on DARC because it is currently a state-of-the-art reinforcement learning (RL) model for continuous action spaces and is suitable for accident anticipation. Results show that, by using DARC, we can make predictions 5% earlier on average while improving multiple precision metrics compared with existing methods. The results imply that our RL-based problem formulation could significantly increase the safety of autonomous driving.
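In multi-goal reinforcement learning, agents learn policies for achieving multiple goals by using experiences gained from interactions with the environment. Training agents with a sparse binary reward is particularly challenging because successful experiences are scarce. To address this problem, hindsight experience replay (HER) generates successful experiences from unsuccessful ones. However, generating successful experiences without considering the properties of the achieved goals is less efficient. In this paper, a cluster-based sampling strategy that exploits the properties of achieved goals is proposed. The proposed sampling strategy groups episodes with different achieved goals and samples experiences in the manner of HER. For the grouping, the K-means clustering algorithm is used. The centroids of the clusters are obtained from the distribution of failed goals, defined as the original goals that were not achieved. The method is validated by experiments on three robotic control tasks from OpenAI Gym. The experimental results show that the proposed method significantly reduces the number of epochs required for convergence in two of the three tasks and slightly increases the success rate in the remaining one. It is also shown that the proposed method can be combined with other sampling strategies for HER.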
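The grouping step can be sketched as below. The centroids are taken as given (in the abstract they come from K-means over the failed goals), each episode is assigned to the centroid nearest its achieved goal, and an episode is drawn by first picking a non-empty cluster uniformly. The uniform-over-clusters rule and all function names are assumptions for illustration; the abstract does not specify the sampling probabilities.

```python
import random


def nearest_centroid(goal, centroids):
    """Index of the centroid closest to `goal` in squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((g - c) ** 2 for g, c in zip(goal, centroids[i])),
    )


def group_episodes(achieved_goals, centroids):
    """Map each cluster index to the episode indices whose achieved goal falls in it."""
    groups = {i: [] for i in range(len(centroids))}
    for episode_index, goal in enumerate(achieved_goals):
        groups[nearest_centroid(goal, centroids)].append(episode_index)
    return groups


def sample_episode(groups, rng):
    """Pick a non-empty cluster uniformly, then an episode uniformly within it (assumed rule)."""
    non_empty = [members for members in groups.values() if members]
    return rng.choice(rng.choice(non_empty))
```

The sampled episode would then be relabeled and replayed in the usual HER fashion, so episodes whose achieved goals lie in sparsely populated regions are not drowned out by the many episodes that end near the same place.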