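Camera-based contactless photoplethysmography (PPG) refers to a set of popular techniques for contactless physiological measurement. Current state-of-the-art neural models are typically trained in a supervised manner on videos accompanied by gold-standard physiological measurements. However, they often generalize poorly to out-of-domain examples (i.e., videos unlike those in the training set). Personalizing models can help improve generalizability, but many personalization techniques still require some gold-standard data. To help alleviate this dependency, in this paper we present a novel mobile sensing system called MobilePhys, the first mobile personalized remote physiological sensing system, which leverages both the front and rear cameras of a smartphone to generate high-quality self-supervised labels for training personalized contactless camera-based PPG models. To evaluate the robustness of MobilePhys, we conducted a user study with 39 participants who completed a set of tasks under different mobile devices, lighting conditions and intensities, motion tasks, and skin types. Our results show that MobilePhys significantly outperforms state-of-the-art on-device supervised training and few-shot adaptation methods. Through extensive user studies, we further examine how MobilePhys performs in complex real-world settings. We envision that calibrated, camera-based contactless PPG models generated by our proposed dual-camera mobile sensing system will open the door to numerous future applications such as smart mirrors, fitness, and mobile health.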
translated by Google Translate
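Deep reinforcement learning produces robust locomotion policies for legged robots over challenging terrain. To date, few studies have leveraged model-based methods to combine these locomotion skills with the precise control of manipulators. Here, we incorporate external dynamics plans into learning-based locomotion policies for mobile manipulation. We train the base policy by applying a random wrench sequence to the robot base in simulation and adding a noisy prediction of that wrench sequence to the policy observations. The policy then learns to counteract the partially known future disturbance. For deployment, the random wrench sequences are replaced with wrench predictions generated from the dynamics plans of a model predictive controller. We show zero-shot adaptation to manipulators unseen during training. On hardware, we demonstrate stable locomotion of a legged robot under external wrenches.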
A key feature of federated learning (FL) is that it preserves the data privacy of end users. However, exchanging gradients under FL can still leak private information. As a result, recent research often uses differential privacy (DP) approaches that add noise to the computed results to address privacy concerns with low overhead, which, however, degrades model performance. In this paper, we strike a balance between data privacy and efficiency by exploiting the pervasive social connections between users. Specifically, we propose SCFL, a novel Social-aware Clustered Federated Learning scheme, in which mutually trusted individuals can freely form a social cluster and aggregate their raw model updates (e.g., gradients) inside each cluster before uploading to the cloud for global aggregation. Because model updates are mixed within a social group, adversaries can only eavesdrop on the combined social-layer results, not on the private updates of individuals. We unfold the design of SCFL in three steps. i) Stable social cluster formation. Considering users' heterogeneous training samples and data distributions, we formulate the optimal social cluster formation problem as a federation game and devise a fair revenue allocation mechanism to resist free-riders. ii) Differentiated trust-privacy mapping. For clusters with low mutual trust, we design a customizable privacy preservation mechanism that adaptively sanitizes participants' model updates depending on social trust degrees. iii) Distributed convergence. A distributed two-sided matching algorithm is devised to attain an optimized disjoint partition with Nash-stable convergence. Experiments on the Facebook network and the MNIST/CIFAR-10 datasets validate that SCFL can effectively enhance learning utility, improve user payoff, and enforce customizable privacy protection.
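The two-level aggregation described above (mixing raw updates inside a social cluster, then aggregating across clusters in the cloud) can be illustrated with a minimal sketch. It assumes plain unweighted averaging over all users; the function names and the toy update vectors are hypothetical, and the federation game, trust-privacy mapping, and matching algorithm from the abstract are not modeled here.

```python
from typing import List, Tuple

Vector = List[float]


def sum_updates(updates: List[Vector]) -> Vector:
    """Element-wise sum of equally sized update vectors."""
    return [sum(values) for values in zip(*updates)]


def aggregate_cluster(updates: List[Vector]) -> Tuple[Vector, int]:
    """Mix the raw updates of one social cluster (hypothetical helper).

    Only the combined sum and the member count leave the cluster, so the
    uplink never carries an individual member's update.
    """
    return sum_updates(updates), len(updates)


def aggregate_global(reports: List[Tuple[Vector, int]]) -> Vector:
    """Cloud-side step: average over all users using the cluster reports."""
    total = sum_updates([cluster_sum for cluster_sum, _ in reports])
    n_users = sum(count for _, count in reports)
    return [value / n_users for value in total]


# Toy data: two clusters with two users and one user (assumed values).
clusters = [
    [[1.0, 2.0], [3.0, 4.0]],  # cluster A
    [[5.0, 6.0]],              # cluster B
]
reports = [aggregate_cluster(members) for members in clusters]
global_update = aggregate_global(reports)
print(global_update)
```

Under this assumption the global result equals the plain average over all three users, even though the cloud only receives one combined vector per cluster. Note that a single-member cluster, like cluster B here, gains no mixing protection.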
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and a reinforcement learning phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses. In the RL phase, we sample from the finetuned model, use a model to evaluate which of the two samples is better, and then train a preference model from this dataset of AI preferences. We then train with RL using the preference model as the reward signal, i.e. we use 'RL from AI Feedback' (RLAIF). As a result we are able to train a harmless but non-evasive AI assistant that engages with harmful queries by explaining its objections to them. Both the SL and RL methods can leverage chain-of-thought style reasoning to improve the human-judged performance and transparency of AI decision making. These methods make it possible to control AI behavior more precisely and with far fewer human labels.
Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think about this problem, with a focus on how to turn it into one that can be productively studied empirically. We first present an experimental design centered on choosing tasks for which human specialists succeed but unaided humans and current general AI systems fail. We then present a proof-of-concept experiment meant to demonstrate a key feature of this experimental design and show its viability with two question-answering tasks: MMLU and time-limited QuALITY. On these tasks, we find that human participants who interact with an unreliable large-language-model dialog assistant through chat -- a trivial baseline strategy for scalable oversight -- substantially outperform both the model alone and their own unaided performance. These results are an encouraging sign that scalable oversight will be tractable to study with present models and bolster recent findings that large language models can productively assist humans with difficult tasks.
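"Induction heads" are attention heads that implement a simple algorithm to complete token sequences such as [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for the hypothesis that induction heads might constitute the mechanism behind the majority of all "in-context learning" in large transformer models (i.e., decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden, sharp increase in in-context learning ability, which is visible as a bump in the training loss. We present six complementary lines of evidence arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence. For larger models with MLPs, we present correlational evidence.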
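The completion rule [A][B] ... [A] -> [B] can be written out as an ordinary function to make the pattern concrete. This is only a sketch of the rule itself, not of how an attention head computes it; the function name is hypothetical, and picking the most recent earlier occurrence is an assumption made for illustration.

```python
from typing import List, Optional


def induction_predict(tokens: List[str]) -> Optional[str]:
    """Predict the next token with the [A][B] ... [A] -> [B] rule.

    Looks for the most recent earlier occurrence of the current (last)
    token and returns the token that followed it, or None if the current
    token has not appeared before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, excluding the last one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


print(induction_predict(["A", "B", "C", "A"]))  # the earlier "A" was followed by "B"
print(induction_predict(["x", "y"]))            # "y" has no earlier occurrence
```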
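Modular design is the foundation of on-orbit construction technology for future large space facilities. A standard interface is the key technology for the modular design of future space robotic systems and space facilities. This paper presents the design and testing of Petlock, a standard, genderless interface that can transfer mechanical loads, power, and data between future modular space robotic manipulators and spacecraft. Petlock adopts a completely genderless design, including the connection face, locking mechanism, and data and power interfaces. The connection surface provides a large tolerance for translational and rotational misalignment, owing to its 120-degree-symmetric, 3D-shaped design. The locking mechanism uses a retraction structure with three locking pins, which is simple and reliable. With its advantages of high locking force, high tolerance, high reliability, and low cost, Petlock has great application potential in future on-orbit construction missions.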
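Forward and Backward Reaching Inverse Kinematics (FABRIK) is a heuristic inverse kinematics solver that is increasingly applied to manipulators because it converges quickly and generates more realistic configurations. However, under a high error constraint, FABRIK exhibits unstable convergence behavior, which is unsatisfactory for the real-time motion planning of manipulators. In this paper, we propose a novel inverse kinematics algorithm that combines FABRIK with the sequential quadratic programming (SQP) algorithm: the joint angles derived by FABRIK are used as the initial seed of the SQP algorithm to avoid getting stuck in local minima. The combined algorithm is evaluated experimentally; under a high error constraint, it achieves a higher success rate and a faster solution time than FABRIK. Furthermore, the combined algorithm can generate continuous trajectories in path tracking for the UR5 and KUKA LBR IIWA 14 R820 manipulators, with no pose error and with an end-effector position error within the permitted bound.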
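For readers unfamiliar with FABRIK, the following is a minimal sketch of the basic position-only iteration for a planar chain with a fixed base. It is the textbook form of the heuristic, not the combined algorithm of the abstract: joint limits, orientation, and the SQP refinement stage are omitted, and the helper name `_place`, the tolerance, and the example chain are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _place(anchor: Point, toward: Point, length: float) -> Point:
    """Return the point at distance `length` from `anchor` in the direction of `toward`."""
    d = math.dist(anchor, toward)
    if d == 0.0:
        # Degenerate case: no direction is defined, so pick the +x axis.
        return (anchor[0] + length, anchor[1])
    t = length / d
    return (anchor[0] + t * (toward[0] - anchor[0]),
            anchor[1] + t * (toward[1] - anchor[1]))


def fabrik(joints: List[Point], target: Point,
           tol: float = 1e-4, max_iter: int = 100) -> List[Point]:
    """Move the chain's end effector toward `target`, keeping link lengths fixed."""
    lengths = [math.dist(joints[i], joints[i + 1]) for i in range(len(joints) - 1)]
    base = joints[0]
    pts = list(joints)

    # Unreachable target: stretch the chain straight toward it and stop.
    if math.dist(base, target) > sum(lengths):
        for i, length in enumerate(lengths):
            pts[i + 1] = _place(pts[i], target, length)
        return pts

    for _ in range(max_iter):
        if math.dist(pts[-1], target) <= tol:
            break
        # Forward reaching: put the end effector on the target, work inward to the base.
        pts[-1] = target
        for i in range(len(pts) - 2, -1, -1):
            pts[i] = _place(pts[i + 1], pts[i], lengths[i])
        # Backward reaching: pin the base again, work outward to the end effector.
        pts[0] = base
        for i, length in enumerate(lengths):
            pts[i + 1] = _place(pts[i], pts[i + 1], length)
    return pts


# Example: three unit links starting along the x-axis (assumed values).
start = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
goal = (1.5, 1.5)
solved = fabrik(start, goal)
print(math.dist(solved[-1], goal))
```

In a seeded scheme like the one the abstract describes, joint angles recovered from such a solution would serve as the starting point for a constrained optimizer. That step is not shown here.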
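Demonstration learning aims to guide prompt prediction by providing answered demonstrations in few-shot settings. Despite achieving promising results, existing work only concatenates the answered examples, as demonstrations, to the prompt template (including the raw context) without any additional operation, thereby neglecting prompt-demonstration dependencies. Moreover, prior research found that randomly replacing the labels of demonstrations only marginally hurts performance, which indicates that the model cannot properly learn the knowledge carried by the demonstrations. Inspired by the human learning process, in this paper we introduce Imitation DEMOnstration Learning (Imitation-Demo) to strengthen demonstration learning by explicitly imitating human review behavior, which includes: (1) a contrastive learning mechanism to concentrate on similar demonstrations; and (2) a demonstration-label re-prediction method to consolidate known knowledge. Experimental results show that our proposed method achieves state-of-the-art performance on 11 out of 14 classification corpora. Further studies also show that Imitation-Demo strengthens the association between the prompt and the demonstrations, which could provide a basis for exploring how demonstration learning works.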