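In the reinforcement learning literature, there are many algorithms developed for either contextual bandit (CB) or Markov decision process (MDP) environments. However, when deploying reinforcement learning algorithms in the real world, even with domain expertise, it is often difficult to know whether it is appropriate to treat a sequential decision-making problem as a CB or an MDP. In other words, do actions affect future states, or only the immediate rewards? Making the wrong assumption about the nature of the environment can lead to inefficient learning, or even prevent the algorithm from ever learning an optimal policy, even with infinite data. In this work, we develop an online algorithm that uses a Bayesian hypothesis testing approach to learn the nature of the environment. Our algorithm allows practitioners to incorporate prior knowledge about whether the environment is a CB or an MDP, and effectively interpolates between classical CB and MDP-based algorithms to mitigate the effects of misspecifying the environment. We perform simulations and demonstrate that in CB settings our algorithm achieves lower regret than MDP-based algorithms, while in non-bandit MDP settings our algorithm is able to learn the optimal policy, often achieving regret comparable to MDP-based algorithms.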
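The abstract does not specify the test itself, so the following is only a minimal sketch of one way a Bayesian test of "do actions affect future states?" could look, assuming a small finite state and action space and a symmetric Dirichlet prior on next-state distributions. H0 (bandit) says the next state depends only on the current state; H1 (MDP) lets it depend on the state-action pair. The names `posterior_bandit` and `interpolated_discount`, and the rule of shrinking the discount factor by the posterior, are hypothetical illustrations, not the authors' algorithm.

```python
import math

def log_dirichlet_multinomial(counts, alpha=1.0):
    """Log probability of an observed sequence of categorical outcomes,
    summarized by `counts`, under a symmetric Dirichlet(alpha) prior."""
    k, n = len(counts), sum(counts)
    return (math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
            + sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts))

def posterior_bandit(transitions, n_states, n_actions, prior_bandit=0.5, alpha=1.0):
    """Posterior probability of H0 (actions do not affect the next state)
    versus H1 (the next state depends on the state-action pair).

    transitions: iterable of (state, action, next_state) integer triples.
    prior_bandit: practitioner's prior belief in H0, strictly in (0, 1).
    """
    if not 0.0 < prior_bandit < 1.0:
        raise ValueError("prior_bandit must lie strictly between 0 and 1")
    # counts[s][a][s_next]: observed transition counts
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s, a, s_next in transitions:
        counts[s][a][s_next] += 1
    # H1 (MDP): a separate next-state distribution for every (s, a)
    log_ml_mdp = sum(log_dirichlet_multinomial(counts[s][a], alpha)
                     for s in range(n_states) for a in range(n_actions))
    # H0 (bandit): one next-state distribution per s, pooled over actions
    log_ml_cb = sum(
        log_dirichlet_multinomial(
            [sum(counts[s][a][s2] for a in range(n_actions)) for s2 in range(n_states)],
            alpha)
        for s in range(n_states))
    log_h0 = math.log(prior_bandit) + log_ml_cb
    log_h1 = math.log(1.0 - prior_bandit) + log_ml_mdp
    m = max(log_h0, log_h1)  # log-sum-exp trick for numerical stability
    return math.exp(log_h0 - m) / (math.exp(log_h0 - m) + math.exp(log_h1 - m))

def interpolated_discount(p_bandit, gamma):
    """Hypothetical interpolation rule: shrink the MDP discount factor toward 0
    (the bandit case) in proportion to the posterior belief in H0."""
    return (1.0 - p_bandit) * gamma
```

Domain knowledge enters through `prior_bandit`; with no data the posterior equals the prior, and as transitions accumulate the marginal likelihoods dominate. If the next state copies the chosen action, the posterior for H0 collapses toward 0; if next states are balanced regardless of the action, the simpler H0 is favored.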
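Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring that the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. To guide how one can tackle these challenges, we extend the PCS (Predictability, Computability, Stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning (Yu and Kumbier, 2020), to the design of RL algorithms for the digital interventions setting. Further, we provide guidelines on how to design simulation environments, a crucial tool for evaluating candidate RL algorithms using the PCS framework. We illustrate the use of the PCS framework for designing an RL algorithm for Oralytics, a mobile health study aiming to improve users' tooth-brushing behaviors through personalized intervention messages. Oralytics will go into the field in late 2022.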
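Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to use the resulting datasets to answer scientific questions such as: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence intervals when used with data collected with bandit algorithms. Alternative methods have recently been developed for simple models (e.g., comparison of means). Yet there is a lack of general methods for conducting statistical inference using more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on the parameters of a logistic regression model for a binary reward. In this work, we develop theory justifying the use of M-estimators (which include estimators based on empirical risk minimization as well as maximum likelihood) on data collected with adaptive algorithms, including (contextual) bandit algorithms. Specifically, we show that M-estimators, modified with particular adaptive weights, can be used to construct asymptotically valid confidence regions for a variety of inferential targets.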
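As a hedged illustration of what "an M-estimator modified with adaptive weights" can mean, the sketch below fits a linear reward model phi(x, a)·theta by weighted least squares on bandit-collected data. The weight W_t = sqrt(pi_stable / pi_taken), the square root of the ratio between a pre-specified stabilizing policy's probability and the bandit's probability for the action actually taken, is an assumption borrowed from the adaptive-weighting literature; the abstract only says "particular adaptive weights". The function names are hypothetical, and the sketch covers the point estimate only, not the variance estimator needed for confidence regions.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("weighted design matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def adaptively_weighted_least_squares(features, rewards, pi_taken, pi_stable):
    """Weighted M-estimator for a linear reward model:
        theta_hat = argmin_theta sum_t W_t * (R_t - phi_t . theta)^2,
    with assumed weights W_t = sqrt(pi_stable_t / pi_taken_t).

    features:  feature vectors phi(X_t, A_t)
    rewards:   observed rewards R_t
    pi_taken:  probability the bandit algorithm gave the chosen action
    pi_stable: probability a fixed stabilizing policy gives that action
    """
    d = len(features[0])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for phi, r, p, ps in zip(features, rewards, pi_taken, pi_stable):
        w = math.sqrt(ps / p)
        # Accumulate the weighted normal equations: sum W phi phi^T theta = sum W phi R
        for i in range(d):
            b[i] += w * phi[i] * r
            for j in range(d):
                A[i][j] += w * phi[i] * phi[j]
    return solve_linear(A, b)
```

Replacing the squared loss with a weighted negative log-likelihood (e.g., for logistic regression with binary rewards) would follow the same pattern but needs an iterative solver instead of closed-form normal equations.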
The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we announce the first public release of the astroBERT language model, show how astroBERT improves over existing public language models on astrophysics-specific tasks, and detail how ADS plans to harness the unique structure of scientific papers, namely the citation graph and citation context, to further improve astroBERT.
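A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability in both the vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large-model extensions for the vision and language modalities, respectively. Plex greatly improves the state of the art across reliability tasks, and simplifies the traditional protocol because it improves out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.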
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Artificial intelligence aims to teach machines to act like humans. To achieve intelligent teaching, the machine learning community has begun to consider a promising topic named machine teaching, in which the teacher designs the optimal (usually minimal) teaching set given a target model and a specific learner. However, previous works usually require numerous teaching examples and many iterations to guide learners to converge, which is costly. In this paper, we consider a more intelligent teaching paradigm named one-shot machine teaching, which needs fewer examples and converges faster. Unlike typical teaching, this advanced paradigm establishes a tractable mapping from the teaching set to the model parameter. Theoretically, we prove that this mapping is surjective, which serves as an existence guarantee for the optimal teaching set. Then, relying on the surjective mapping from the teaching set to the parameter, we develop a design strategy for the optimal teaching set under appropriate settings, in which two popular efficiency metrics, the teaching dimension and the iterative teaching dimension, are both equal to one. Extensive experiments verify the efficiency of our strategy and further demonstrate the intelligence of this new teaching paradigm.
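To make "one-shot" concrete, here is a minimal sketch under assumptions not stated in the abstract: the learner is a linear regression model updated by one gradient-descent step on the squared loss 0.5 * (w.x - y)^2, with a known learning rate eta. Choosing x = w0 - w_target and y = w0.x - 1/eta makes the residual equal to 1/eta, so the single update w0 - eta * (1/eta) * x lands exactly on w_target; one example suffices in this toy setting. This is a simple hand-derived construction for the toy learner, not the paper's design strategy, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def learner_step(w, x, y, eta):
    """One gradient-descent step on the squared loss 0.5 * (w.x - y)^2."""
    residual = dot(w, x) - y
    return [wi - eta * residual * xi for wi, xi in zip(w, x)]

def one_shot_example(w0, w_target, eta):
    """Build a single teaching example (x, y) that moves the learner
    from w0 to w_target in exactly one gradient step (toy setting)."""
    x = [a - b for a, b in zip(w0, w_target)]  # point x along w0 - w_target
    y = dot(w0, x) - 1.0 / eta                 # forces the residual to equal 1/eta
    return x, y

w0, w_star, eta = [0.0, 0.0], [1.0, -2.0], 0.5
x, y = one_shot_example(w0, w_star, eta)
print(learner_step(w0, x, y, eta))  # [1.0, -2.0]
```

Because y is tuned to the learner's current parameter and learning rate, this construction assumes an omniscient teacher who knows both; a teacher without that knowledge would need more examples.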
