Smart Paper Notes

Reward Design For An Online Reinforcement Learning Algorithm Supporting Oral Self-Care

Anna L. Trella, Kelly W. Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, Susan A. Murphy

Categories: Artificial Intelligence | Machine Learning

2022-08-15

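Dental disease is one of the most common chronic diseases, despite being largely preventable. However, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Patients may therefore benefit from timely, personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for optimizing the delivery of mobile-based prompts that encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that it accounts for the impact of the current action on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been kept simple so that it can run stably and autonomously in a constrained, real-world setting (i.e., highly noisy, sparse data). We address this challenge by designing a quality reward that maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the reward's hyperparameters by building a simulation environment test bed and evaluating candidates on it. The RL algorithm discussed in this paper will be deployed in Oralytics, an oral self-care app that provides behavioral strategies to boost patient engagement in oral hygiene practices.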
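The trade-off between the health outcome and user burden can be illustrated with a minimal sketch. The functional form, the argument names, and the `burden_weight` hyperparameter below are hypothetical and chosen only for illustration; the abstract does not specify the paper's actual reward definition.

```python
def penalized_reward(brushing_quality, prompt_sent, recent_prompt_rate, burden_weight):
    """Illustrative reward: brushing quality minus a burden cost.

    All names and the linear cost are assumptions, not the paper's formula.
    The cost is charged only when a prompt is sent, and grows with how
    often the user has been prompted recently (a stand-in for burden).
    """
    burden_cost = burden_weight * recent_prompt_rate if prompt_sent else 0.0
    return brushing_quality - burden_cost


# Same brushing outcome, with and without sending a prompt.
reward_with_prompt = penalized_reward(120.0, True, 0.5, burden_weight=40.0)
reward_without_prompt = penalized_reward(120.0, False, 0.5, burden_weight=40.0)
```

In a setup like this, `burden_weight` is the kind of hyperparameter one would tune by comparing candidate values in a simulated environment before deployment.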


A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes

Kelly W. Zhang, Omer Gottesman, Finale Doshi-Velez

Categories: Machine Learning | Artificial Intelligence

2022-07-30

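The reinforcement learning literature contains many algorithms developed for either contextual bandit (CB) or Markov decision process (MDP) environments. However, when deploying reinforcement learning algorithms in the real world, even with domain expertise, it is often difficult to know whether a sequential decision-making problem should be treated as a CB or as an MDP. In other words, do actions affect future states, or only the immediate rewards? Making the wrong assumption about the nature of the environment can lead to inefficient learning, or can even prevent the algorithm from ever learning an optimal policy, even with infinite data. In this work, we develop an online algorithm that uses a Bayesian hypothesis testing approach to learn the nature of the environment. Our algorithm allows practitioners to incorporate prior knowledge about whether the environment is a CB or an MDP, and it effectively interpolates between classical CB and MDP-based algorithms to mitigate the effects of misspecifying the environment. We perform simulations and demonstrate that in CB settings our algorithm achieves lower regret than MDP-based algorithms, while in non-bandit MDP settings it is able to learn the optimal policy, often achieving regret comparable to that of MDP-based algorithms.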


Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-implementation Guidelines

Anna L. Trella, Kelly W. Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, Susan A. Murphy

Categories: Machine Learning | Artificial Intelligence

2022-06-08

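Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring that the algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., the lack of accurate mechanistic models of user dynamics. To guide how one can tackle these challenges, we extend the PCS (Predictability, Computability, Stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning (Yu and Kumbier, 2020), to the design of RL algorithms for the digital interventions setting. Further, we provide guidelines on how to design simulation environments, a crucial tool for evaluating candidate RL algorithms using the PCS framework. We illustrate the use of the PCS framework by designing an RL algorithm for Oralytics, a mobile health study that aims to improve users' tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022.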


Statistical Inference with M-Estimators on Adaptively Collected Data

Kelly W. Zhang, Lucas Janson, Susan A. Murphy

Categories: Machine Learning

2021-04-29

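Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to use the resulting datasets to answer scientific questions such as: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence intervals when applied to data collected with bandit algorithms. Alternative methods have recently been developed for simple models (e.g., comparison of means). Yet there is a lack of general methods for conducting statistical inference with more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on the parameters of a logistic regression model for a binary reward. In this work, we develop theory justifying the use of M-estimators, which include estimators based on empirical risk minimization as well as maximum likelihood, on data collected with adaptive algorithms, including (contextual) bandit algorithms. Specifically, we show that M-estimators, modified with particular adaptive weights, can be used to construct asymptotically valid confidence regions for a variety of inferential targets.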
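As a rough illustration of an M-estimator modified with adaptive weights, the sketch below solves a weighted least-squares problem in which each observation is weighted by the square root of the ratio between a fixed reference policy's action probability and the logged action probability. The square-root weight, the function name, and the synthetic data are assumptions made for this example; the abstract only says that "particular adaptive weights" are used.

```python
import numpy as np


def adaptively_weighted_least_squares(X, y, logged_probs, reference_probs):
    """Minimize sum_t w_t * (y_t - x_t @ theta)^2 over theta.

    w_t = sqrt(reference_probs[t] / logged_probs[t]) is an assumed choice
    of adaptive weight, used here purely for illustration.
    """
    weights = np.sqrt(np.asarray(reference_probs) / np.asarray(logged_probs))
    Xw = X * weights[:, None]          # each row of X scaled by its weight
    # Normal equations of the weighted problem: (X^T W X) theta = X^T W y
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


# Synthetic, noiseless data (hypothetical) so the estimate is exact.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
theta_true = np.array([1.5, -0.5])
y = X @ theta_true
logged = rng.uniform(0.1, 0.9, size=200)   # probabilities the bandit used
reference = np.full(200, 0.5)              # fixed reference policy
theta_hat = adaptively_weighted_least_squares(X, y, logged, reference)
```

The point estimate alone does not give valid inference; the confidence regions described in the abstract additionally rely on the asymptotic theory developed in the paper.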


Improving astroBERT using Semantic Textual Similarity

Felix Grezes, Thomas Allen, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Timothy W. Hostetler

Categories: Natural Language Processing

2022-11-29

The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we (1) announce the first public release of the astroBERT language model; (2) show how astroBERT improves over existing public language models on astrophysics-specific tasks; and (3) detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT.


Plex: Towards Reliability using Pretrained Large Model Extensions

Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu

Categories: Machine Learning | Machine Learning (Statistics)

2022-07-15

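A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in-distribution and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets to evaluate different aspects of reliability in both the vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large model extensions for the vision and language modalities, respectively. Plex greatly improves the state of the art across reliability tasks and simplifies the traditional protocol, as it improves out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.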


Building astroBERT, a language model for Astronomy & Astrophysics

Felix Grezes, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald

Categories: Natural Language Processing

2021-12-01

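The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., the similar and trending operators), but researchers are not yet able to fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between all the various meanings of Planck (person, mission, constant, institutions, and more) without further clarification from the user. At ADS, we are applying modern machine learning and natural language processing techniques to our dataset of recent astronomy publications to train astroBERT, a deeply contextual language model based on research at Google. Using astroBERT, we aim to enrich the ADS dataset and improve its discoverability; in particular, we are developing our own named entity recognition tool. We present here our preliminary results and lessons learned.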


Riemannian Optimization for Distance-Geometric Inverse Kinematics

Filip Marić, Matthew Giamou, Adam W. Hall, Soroush Khoubyarian, Ivan Petrović, Jonathan Kelly

Categories: Robotics

2021-08-31

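Solving the inverse kinematics problem is a fundamental challenge in motion planning, control, and calibration for articulated robots. Kinematic models for these robots are typically parametrized by joint angles, which produces a complicated mapping between the robot configuration and the end-effector pose. Alternatively, the kinematic model and task constraints can be represented using invariant distances between points attached to the robot. In this paper, we formalize the equivalence of distance-based inverse kinematics and the distance geometry problem for a large class of articulated robots and task constraints. Unlike previous approaches, we use the connection between distance geometry and low-rank matrix completion to find inverse kinematics solutions by completing a partial Euclidean distance matrix through local optimization. Furthermore, we parametrize the space of Euclidean distance matrices with the Riemannian manifold of fixed-rank Gram matrices, allowing us to leverage a variety of mature Riemannian optimization methods. Finally, we show that bound smoothing can be used to generate informed initializations without significant computational overhead, improving convergence. We demonstrate that our inverse kinematics solver achieves higher success rates than traditional techniques and substantially outperforms them on problems that involve many workspace constraints.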
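The link between Gram matrices and Euclidean distance matrices that this parametrization relies on is the standard identity D_ij = G_ii + G_jj - 2 G_ij, where G = P P^T for a matrix P of point coordinates. The sketch below illustrates only that identity on three hypothetical points; it is not the paper's solver, and the function name is an assumption.

```python
import numpy as np


def gram_to_edm(points):
    """Squared Euclidean distance matrix computed from the Gram matrix.

    points: (n, d) array of coordinates. G = points @ points.T has rank
    at most d, which is why a fixed-rank Gram matrix can stand in for a
    distance matrix of points in d dimensions.
    """
    G = points @ points.T
    diag = np.diag(G)
    return diag[:, None] + diag[None, :] - 2.0 * G


# Three illustrative points in the plane forming a 3-4-5 right triangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]])
D = gram_to_edm(points)          # entries are squared distances
gram_rank = np.linalg.matrix_rank(points @ points.T)
```

Here `D` holds squared distances (9, 16, and 25 off the diagonal), and the Gram matrix has rank 2, matching the dimension of the space the points live in.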


Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.


One-shot Machine Teaching: Cost Very Few Examples to Converge Faster

Chen Zhang, Xiaofeng Cao, Yi Chang, Ivor W Tsang

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-12-13

Artificial intelligence aims to teach machines to act like humans. To achieve intelligent teaching, the machine learning community has begun to study a promising topic named machine teaching, in which the teacher designs the optimal (usually minimal) teaching set given a target model and a specific learner. However, previous works usually require numerous teaching examples and many iterations to guide learners to convergence, which is costly. In this paper, we consider a more intelligent teaching paradigm named one-shot machine teaching, which needs fewer examples and converges faster. Unlike typical teaching, this paradigm establishes a tractable mapping from the teaching set to the model parameter. Theoretically, we prove that this mapping is surjective, which serves as an existence guarantee for the optimal teaching set. Then, relying on the surjective mapping from the teaching set to the parameter, we develop a design strategy for the optimal teaching set under appropriate settings, for which two popular efficiency metrics, the teaching dimension and the iterative teaching dimension, both equal one. Extensive experiments verify the efficiency of our strategy and further demonstrate the intelligence of this new teaching paradigm.
