Machine ethics has received increasing attention over the past few years because of the need to ensure safe and reliable artificial intelligence (AI). The two main theories used in machine ethics are deontological and utilitarian ethics. Virtue ethics, on the other hand, has often been mentioned as an alternative ethical theory. While this interesting approach has certain advantages over the more popular ethical theories, little effort has gone into engineering artificial virtuous agents because of the challenges of formalizing and codifying virtues and of resolving ethical dilemmas in order to train virtuous agents. We propose to bridge this gap by using role-playing games rife with moral dilemmas. There are several such games, for example Papers, Please and Life is Strange, in which the main character encounters situations where they must choose the right course of action at the cost of giving up something else they hold dear. We draw inspiration from such games to show how a systemic role-playing game can be designed to develop virtues within an artificial agent. Using modern AI techniques, such as affinity-based reinforcement learning and explainable AI, we motivate the implementation of virtuous agents that play such role-playing games, and the examination of their decisions through the lens of virtue ethics. The development of such agents and environments is a first step towards practically formalizing and demonstrating the value of virtue ethics in the development of ethical agents.
The proliferation of artificial intelligence increasingly depends on model understanding. Understanding demands both an interpretation, i.e. human reasoning about a model's behaviour, and an explanation, i.e. a symbolic representation of a model's functioning. Notwithstanding the need for transparency for safety, trust, and acceptance, the opacity of state-of-the-art reinforcement learning algorithms conceals the underpinnings of their learned strategies. We have developed a policy regularization method that asserts the global intrinsic affinities of learned strategies. These affinities provide a means of reasoning about a policy's behaviour and thus make it inherently interpretable. We have demonstrated our method in personalized prosperity management, where an individual's spending behaviour over time dictates their investment strategy, i.e. distinct spending personalities may have dissimilar associations with different investment classes. We now explain our model by reproducing the underlying prototypical policies with discretized Markov models. These global surrogates are symbolic representations of the prototypical policies.
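As a rough, hedged illustration of the last step, the sketch below shows one way a discretized Markov-model surrogate could be extracted from rollouts of a learned policy: discretize the visited states into bins, count bin-to-bin transitions, and normalize them into a transition matrix that serves as a global symbolic representation of the policy's behaviour. The environment interface, the one-dimensional state, and the binning scheme are assumptions for illustration, not details from the work above.

```python
import numpy as np

def markov_surrogate(policy, env, n_episodes=200, n_bins=8, low=-1.0, high=1.0):
    """Fit a discretized Markov-chain surrogate from rollouts of a policy.

    Assumes a 1-D continuous state and an env with reset()/step() returning
    (next_state, reward, done); both are simplifications for illustration.
    """
    edges = np.linspace(low, high, n_bins + 1)
    counts = np.zeros((n_bins, n_bins))

    def bin_of(state):
        # Map a continuous state to its bin index, clipped to the valid range.
        return int(np.clip(np.digitize(state, edges) - 1, 0, n_bins - 1))

    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, _, done = env.step(action)
            counts[bin_of(state), bin_of(next_state)] += 1
            state = next_state

    # Row-normalize the transition counts; unvisited rows stay all-zero.
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

The resulting matrix can be read directly, e.g. to see which regions of the state space a prototypical policy tends to remain in or move between.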
A common purpose in applying reinforcement learning (RL) to asset management is the maximization of profit. The extrinsic reward function used to learn an optimal strategy typically does not take any other preferences or constraints into account. We have developed a regularization method that ensures that strategies have global intrinsic affinities, i.e. different personalities may have preferences for certain assets, and these preferences may change over time. We capitalize on these intrinsic policy affinities to make our RL model inherently interpretable. We demonstrate how RL agents can be trained to orchestrate such policies for particular personality profiles while still achieving high returns.
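A minimal sketch of what such an affinity regularizer could look like in code: the agent's average portfolio allocation over an episode is pulled towards a personality profile's target asset affinities by a KL penalty added to the usual return objective. The profile vector, the penalty coefficient, and the loss form are illustrative assumptions, not the method's actual formulation.

```python
import numpy as np

def affinity_regularized_objective(returns, allocations, target_affinity, beta=0.1):
    """Return maximization with a global affinity penalty.

    allocations:     (T, n_assets) portfolio weights chosen over an episode
    target_affinity: the profile's preferred long-run allocation (sums to 1)
    beta:            strength of the affinity regularizer (placeholder value)
    """
    avg_alloc = allocations.mean(axis=0) + 1e-8
    avg_alloc = avg_alloc / avg_alloc.sum()
    kl = np.sum(avg_alloc * np.log(avg_alloc / (target_affinity + 1e-8)))
    return returns.sum() - beta * kl


# Example: a cautious profile preferring bonds over equities and crypto.
cautious = np.array([0.6, 0.3, 0.1])
allocs = np.random.dirichlet(np.ones(3), size=50)        # stand-in policy output
episode_returns = np.random.normal(0.001, 0.01, size=50)  # stand-in returns
print(affinity_regularized_objective(episode_returns, allocs, cautious))
```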
The personalization of products and services is fast becoming a driver of success in banking and commerce. Machine learning holds the promise of a deeper understanding of, and tailoring to, customers' needs and preferences. Whereas traditional solutions to financial decision problems frequently rely on model assumptions, reinforcement learning is able to exploit large amounts of data to improve customer modelling and decision-making in complex financial environments with fewer assumptions. Explainability and interpretability present challenges from a regulatory perspective that demands transparency for acceptance; they also offer the opportunity to improve insight into, and understanding of, customers. Post-hoc approaches are typically used to explain pretrained reinforcement learning models. Building on our previous modelling of customer spending behaviour, we adapt our recent reinforcement learning algorithms that intrinsically characterize desirable behaviours, and we transition to the problem of asset management. We train inherently interpretable reinforcement learning agents to give investment advice that is aligned with prototypical financial personality traits and is combined into a final recommendation. We observe that the trained agents' advice adheres to their intended characteristics, that they learn the value of compound growth and, without any explicit reference, the notion of risk, and that policy convergence improves.
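The final aggregation step could, in the simplest case, be a convex combination of the prototype agents' recommendations weighted by how strongly a customer expresses each personality trait. The sketch below is an assumed illustration of such a blend, not the actual mechanism used above.

```python
import numpy as np

def blend_recommendations(prototype_allocations, trait_weights):
    """Blend prototype agents' recommended allocations into a single advice.

    prototype_allocations: (n_prototypes, n_assets), each row sums to 1
    trait_weights:         customer's affinity to each prototype personality
    """
    w = np.asarray(trait_weights, dtype=float)
    w = w / w.sum()                                    # normalize trait weights
    blended = w @ np.asarray(prototype_allocations)    # convex combination
    return blended / blended.sum()


prototypes = np.array([[0.7, 0.2, 0.1],   # saver
                       [0.3, 0.5, 0.2],   # balanced
                       [0.1, 0.4, 0.5]])  # risk-seeker
print(blend_recommendations(prototypes, [0.5, 0.3, 0.2]))
```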
The past decade has seen significant progress in artificial intelligence (AI), which has resulted in algorithms being adopted for solving a wide variety of problems. However, this success has been met with increasing model complexity and the adoption of black-box AI models that lack transparency. In response, Explainable AI (XAI) has been proposed to make AI more transparent and thus advance its adoption in critical domains. Although there are several reviews of XAI topics in the literature that have identified challenges and potential research directions for XAI, these challenges and research directions remain scattered. This study therefore presents a systematic meta-survey of challenges and future research directions in XAI, organized into two themes: (1) general challenges and research directions of XAI, and (2) challenges and research directions of XAI based on the phases of the machine learning life cycle: design, development, and deployment. We believe that our meta-survey contributes to the XAI literature by providing a guide for future exploration in the XAI area.
The micro-segmentation of customers in the financial sector is a non-trivial task and has been an atypical omission from recent scientific literature. Where traditional segmentation classifies customers according to coarse features such as demographics, micro-segmentation depicts more nuanced differences between individuals, offering several advantages, including the potential for improved personalization in financial services. AI and representation learning offer a unique opportunity to solve the problem of micro-segmentation. Although ubiquitous in many industries, the proliferation of AI in sensitive industries such as finance has become contingent on the explainability of deep models. We previously solved the micro-segmentation problem by extracting temporal features from the state space of a recurrent neural network (RNN). However, owing to the inherent opacity of RNNs, our solution lacked an explanation. In this study, we address this issue by extracting a symbolic explanation of our model and providing an interpretation of our temporal features. For the explanation, we use a linear regression model to reconstruct the features in the state space with high fidelity. We show that our linear regression coefficients have not only learned the rules used to recreate the features, but have also learned the relationships that are evident in the original data. Finally, we propose a novel method to interpret the dynamics of the state space by using the principles of inverse regression and dynamical systems to locate and label a set of attractors.
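To make the two analysis steps concrete, the sketch below uses stand-in data and weights to (1) fit a linear regression surrogate from RNN hidden states to the temporal features and (2) locate an approximate attractor of the hidden-state dynamics by minimizing the state-update "speed" for a frozen input, in the spirit of the dynamical-systems analysis described above. All shapes, weights, and data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.optimize import minimize

# Stand-in data: RNN hidden states and the temporal features derived from them.
hidden_states = np.random.randn(500, 16)
features = hidden_states @ np.random.randn(16, 3)

# 1) Symbolic surrogate: a linear map from the state space to the features.
surrogate = LinearRegression().fit(hidden_states, features)
print("surrogate R^2:", surrogate.score(hidden_states, features))

# 2) Attractor search: find h* with h* ~= step(h*, x_const) for a frozen input.
W_h = np.random.randn(16, 16) * 0.1      # stand-in recurrent weights
W_x = np.random.randn(16, 4) * 0.1
x_const = np.zeros(4)

def rnn_step(h):
    return np.tanh(W_h @ h + W_x @ x_const)

def speed(h):
    # Squared distance moved in one step; zero exactly at a fixed point.
    diff = h - rnn_step(h)
    return float(diff @ diff)

result = minimize(speed, x0=np.zeros(16), method="L-BFGS-B")
print("candidate attractor found with update speed:", result.fun)
```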
Robotic teleoperation is a key technology for a wide variety of applications. It allows sending robots instead of humans to remote, possibly dangerous locations while still using the human brain with its enormous knowledge and creativity, especially for solving unexpected problems. A main challenge in teleoperation is providing enough feedback to the human operator for situation awareness, and thus creating full immersion, as well as offering the operator suitable control interfaces to achieve efficient and robust task fulfillment. We present a bimanual telemanipulation system consisting of an anthropomorphic avatar robot and an operator station providing force and haptic feedback to the human operator. The avatar arms are controlled in Cartesian space with a direct mapping of the operator's movements. The forces and torques measured on the avatar side are haptically displayed to the operator. We developed a predictive avatar model for limit avoidance which runs on the operator side, ensuring low latency. The system was successfully evaluated during the ANA Avatar XPRIZE competition semifinals. In addition, we performed lab experiments and carried out a small user study with mostly untrained operators.
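A reduced, hedged sketch of the idea behind a predictive limit-avoidance model on the operator side: predict where each joint will be a short time ahead under the commanded velocity, clip that prediction to the joint limits with a safety margin, and re-derive the velocity that reaches the clipped prediction instead. The horizon, margin, and interface are illustrative assumptions, not the system's actual controller.

```python
import numpy as np

def limit_avoiding_command(q, q_dot_cmd, q_min, q_max, horizon=0.2, margin=0.05):
    """Scale a commanded joint velocity so the predicted position stays inside limits.

    q:          current joint positions (rad)
    q_dot_cmd:  operator-commanded joint velocities (rad/s)
    horizon:    look-ahead time (s) for a constant-velocity prediction
    margin:     safety margin (rad) kept away from the hard limits
    """
    q_pred = q + horizon * q_dot_cmd
    q_safe = np.clip(q_pred, q_min + margin, q_max - margin)
    return (q_safe - q) / horizon   # velocity that reaches the clipped prediction


q = np.array([0.1, 1.4])
q_dot = np.array([0.5, 1.0])
print(limit_avoiding_command(q, q_dot,
                             q_min=np.array([-1.5, -1.5]),
                             q_max=np.array([1.5, 1.5])))
```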
The purpose of this work was to tackle practical issues which arise when using a tendon-driven robotic manipulator with a long, passive, flexible proximal section in medical applications. A separable robot which overcomes difficulties in actuation and sterilization is introduced, in which the body containing the electronics is reusable and the remainder is disposable. A control input which resolves the redundancy in the kinematics and a physical interpretation of this redundancy are provided. The effect of a static change in the proximal section angle on bending angle error was explored under four testing conditions for a sinusoidal input. Bending angle error increased with increasing proximal section angle for all testing conditions, with an average error reduction of 41.48% for re-tension, 4.28% for hysteresis, and 52.35% for re-tension + hysteresis compensation relative to the baseline case. Two major sources of error in tracking the bending angle were identified: time delay from hysteresis and DC offset from the proximal section angle. Examination of these error sources revealed that the simple hysteresis compensation was most effective for removing the time delay and the re-tension compensation for removing the DC offset, which was the primary source of increasing error. The re-tension compensation was also tested for dynamic changes in the proximal section and reduced error in the final configuration of the tip by 89.14% relative to the baseline case.
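To make the two compensation ideas tangible, the following much-simplified sketch cancels the hysteresis-induced time delay with a phase lead on the sinusoidal bending command and removes the DC offset that grows with the proximal section angle with a bias term. The delay and gain values are placeholders, not parameters identified from the robot, and the actual compensators are more involved than this.

```python
import numpy as np

def compensated_command(t, amplitude, freq, proximal_angle,
                        delay=0.08, offset_gain=0.3):
    """Sinusoidal bending command with two simple corrections.

    delay:       assumed hysteresis time delay (s), cancelled by a phase lead
    offset_gain: assumed DC offset per radian of proximal angle, removed by a
                 re-tension-style bias correction
    """
    phase_lead = 2 * np.pi * freq * delay            # hysteresis compensation
    dc_correction = -offset_gain * proximal_angle    # re-tension compensation
    return amplitude * np.sin(2 * np.pi * freq * t + phase_lead) + dc_correction


t = np.linspace(0.0, 4.0, 400)
cmd = compensated_command(t, amplitude=np.deg2rad(30), freq=0.5,
                          proximal_angle=np.deg2rad(45))
```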
Learning enabled autonomous systems provide increased capabilities compared to traditional systems. However, the complexity and probabilistic nature of the underlying methods that enable such capabilities present challenges for current systems engineering processes for assurance and for test, evaluation, verification, and validation (TEVV). This paper provides a preliminary attempt to map recently developed technical approaches in the assurance and TEVV of learning enabled autonomous systems (LEAS) literature to a traditional systems engineering v-model. This mapping categorizes such techniques into three main approaches: development, acquisition, and sustainment. We review the latest techniques for developing safe, reliable, and resilient learning enabled autonomous systems, without recommending radical and impractical changes to existing systems engineering processes. By performing this mapping, we seek to assist acquisition professionals by (i) informing comprehensive test and evaluation planning, and (ii) objectively communicating risk to leaders.
In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts. However, many existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). We address two limitations of existing IRL techniques. First, they require an excessive amount of data due to the information asymmetry between the expert and the learner. Second, most of these IRL techniques require solving the computationally intractable forward problem -- computing an optimal policy given a reward function -- in POMDPs. The developed algorithm reduces the information asymmetry while increasing the data efficiency by incorporating task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori in addition to the demonstrations. Further, the algorithm avoids a common source of algorithmic complexity by building on causal entropy as the measure of the likelihood of the demonstrations as opposed to entropy. Nevertheless, the resulting problem is nonconvex due to the so-called forward problem. We solve the intrinsic nonconvexity of the forward problem in a scalable manner through a sequential linear programming scheme that guarantees to converge to a locally optimal policy. In a series of examples, including experiments in a high-fidelity Unity simulator, we demonstrate that even with a limited amount of data and POMDPs with tens of thousands of states, our algorithm learns reward functions and policies that satisfy the task while inducing similar behavior to the expert by leveraging the provided side information.
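One way to write down, schematically, the constrained problem this abstract describes: maximize the causal entropy of the learner's policy subject to matching the demonstrations' feature expectations and satisfying the temporal-logic side information with high probability. The notation below is a hedged sketch; the symbols, the feature-matching constraint, and the exact form of the specification constraint are assumptions rather than the paper's precise formulation.

```latex
\begin{align*}
\max_{\pi}\quad & H^{\mathrm{causal}}(\pi)
   = \mathbb{E}_{\pi}\!\Big[-\textstyle\sum_{t}\log \pi(a_t \mid o_{1:t}, a_{1:t-1})\Big] \\
\text{s.t.}\quad & \mathbb{E}_{\pi}\!\Big[\textstyle\sum_{t}\gamma^{t}\,\phi(s_t, a_t)\Big]
   = \widehat{\mathbb{E}}_{\mathrm{demo}}\!\Big[\textstyle\sum_{t}\gamma^{t}\,\phi(s_t, a_t)\Big] \\
& \Pr\nolimits_{\pi}\big(\text{trajectory satisfies the specification } \varphi\big) \;\ge\; \lambda
\end{align*}
```

Here $\phi$ denotes reward features, $\varphi$ the temporal-logic specification, and $\lambda$ a satisfaction threshold; evaluating the policy-dependent expectations involves the nonconvex forward problem that the sequential linear programming scheme addresses.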