The simple idea that not all things are equally difficult has surprising implications when applied in a fairness context. In this work we explore how "difficulty" is model-specific, such that different models find different parts of a dataset challenging. When difficulty correlates with group information, we term this difficulty disparity. Drawing a connection with recent work exploring the inductive bias towards simplicity of SGD-trained models, we show that when such a disparity exists, it is further amplified by commonly-used models. We quantify this amplification factor across a range of settings aiming towards a fuller understanding of the role of model bias. We also present a challenge to the simplifying assumption that "fixing" a dataset is sufficient to ensure unbiased performance.
translated by 谷歌翻译
在对机器学习研究的可靠性和可信度的越来越关注的越来越关注的情况下,我们提出了一个有原则的框架,用于提出可靠和可推广的主张:多元宇宙分析。我们的框架建立在多元宇宙分析(Steegen等,2016)的基础上,该框架是为了应对心理学自身的可重复性危机而引入的。为了有效地探索高维且经常连续的ML搜索空间,我们用高斯工艺替代品对多元宇宙进行建模,并应用贝叶斯实验设计。我们的框架旨在促进有关模型性能的强大科学结论,因此我们的方法着重于探索而不是常规优化。在两个案例研究中的第一个中,我们研究了关于自适应优化者相对优点的有争议的主张。其次,我们综合了关于学习率对大批次培训概括差距的影响的矛盾研究。对于机器学习社区而言,多元宇宙分析是一种简单有效的技术,用于识别稳定的主张,提高透明度以及迈向改善可重复性的一步。
translated by 谷歌翻译
In this paper we explore whether the fundamental tool of experimental psychology, the behavioral experiment, has the power to generate insight not only into humans and animals, but artificial systems too. We apply the techniques of experimental psychology to investigating catastrophic forgetting in neural networks. We present a series of controlled experiments with two-layer ReLU networks, and exploratory results revealing a new understanding of the behavior of catastrophic forgetting. Alongside our empirical findings, we demonstrate an alternative, behavior-first approach to investigating neural network phenomena.
translated by 谷歌翻译
Managing novelty in perception-based human activity recognition (HAR) is critical in realistic settings to improve task performance over time and ensure solution generalization outside of prior seen samples. Novelty manifests in HAR as unseen samples, activities, objects, environments, and sensor changes, among other ways. Novelty may be task-relevant, such as a new class or new features, or task-irrelevant resulting in nuisance novelty, such as never before seen noise, blur, or distorted video recordings. To perform HAR optimally, algorithmic solutions must be tolerant to nuisance novelty, and learn over time in the face of novelty. This paper 1) formalizes the definition of novelty in HAR building upon the prior definition of novelty in classification tasks, 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark KOWL-718, 3) analyzes the performance of current state-of-the-art HAR models when novelty is introduced over time, 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for modifying for any future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA. The protocol as an algorithm for reproducing experiments using the KOWL-718 benchmark will be publicly released with code and containers at https://github.com/prijatelj/human-activity-recognition-in-an-open-world. The code may be used to analyze different annotations and subsets of the Kinetics datasets in an incremental open world fashion, as well as be extended as further updates to Kinetics are released.
translated by 谷歌翻译
Climate change is threatening human health in unprecedented orders and many ways. These threats are expected to grow unless effective and evidence-based policies are developed and acted upon to minimize or eliminate them. Attaining such a task requires the highest degree of the flow of knowledge from science into policy. The multidisciplinary, location-specific, and vastness of published science makes it challenging to keep track of novel work in this area, as well as making the traditional knowledge synthesis methods inefficient in infusing science into policy. To this end, we consider developing multiple domain-specific language models (LMs) with different variations from Climate- and Health-related information, which can serve as a foundational step toward capturing available knowledge to enable solving different tasks, such as detecting similarities between climate- and health-related concepts, fact-checking, relation extraction, evidence of health effects to policy text generation, and more. To our knowledge, this is the first work that proposes developing multiple domain-specific language models for the considered domains. We will make the developed models, resources, and codebase available for the researchers.
translated by 谷歌翻译
Existing statistical methods can be used to estimate a policy, or a mapping from covariates to decisions, which can then instruct decision makers. There is great interest in using such data-driven policies in healthcare. In healthcare, however, it is often important to explain to the healthcare provider, and to the patient, how a new policy differs from the current standard of care. This end is facilitated if one can pinpoint the aspects (i.e., parameters) of the policy that change most when moving from the standard of care to the new, suggested policy. To this end, we adapt ideas from Trust Region Policy Optimization. In our work, however, unlike in Trust Region Policy Optimization, the difference between the suggested policy and standard of care is required to be sparse, aiding with interpretability. In particular, we trade off between maximizing expected reward and minimizing the $L_1$ norm divergence between the parameters of the two policies. This yields "relative sparsity," where, as a function of a tuning parameter, $\lambda$, we can approximately control the number of parameters in our suggested policy that differ from their counterparts in the standard of care. We develop our methodology for the observational data setting. We propose a problem-specific criterion for selecting $\lambda$, perform simulations, and illustrate our method with a real, observational healthcare dataset, deriving a policy that is easy to explain in the context of the current standard of care. Our work promotes the adoption of data-driven decision aids, which have great potential to improve health outcomes.
translated by 谷歌翻译
We study the use of model-based reinforcement learning methods, in particular, world models for continual reinforcement learning. In continual reinforcement learning, an agent is required to solve one task and then another sequentially while retaining performance and preventing forgetting on past tasks. World models offer a task-agnostic solution: they do not require knowledge of task changes. World models are a straight-forward baseline for continual reinforcement learning for three main reasons. Firstly, forgetting in the world model is prevented by persisting existing experience replay buffers across tasks, experience from previous tasks is replayed for learning the world model. Secondly, they are sample efficient. Thirdly and finally, they offer a task-agnostic exploration strategy through the uncertainty in the trajectories generated by the world model. We show that world models are a simple and effective continual reinforcement learning baseline. We study their effectiveness on Minigrid and Minihack continual reinforcement learning benchmarks and show that it outperforms state of the art task-agnostic continual reinforcement learning methods.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
在这项研究中,将放射学方法扩展到用于组织分类的光学荧光分子成像数据,称为“验光”。荧光分子成像正在出现在头颈部鳞状细胞癌(HNSCC)切除期间的精确手术引导。然而,肿瘤到正常的组织对比与靶分子表皮生长因子受体(EGFR)的异质表达的内在生理局限性混淆。验光学试图通过探测荧光传达的EGFR表达中的质地模式差异来改善肿瘤识别。从荧光图像样品中提取了总共1,472个标准化的验光特征。涉及支持矢量机分类器的监督机器学习管道接受了25个顶级功能的培训,这些功能由最小冗余最大相关标准选择。通过将切除组织的图像贴片分类为组织学确认的恶性肿瘤状态,将模型预测性能与荧光强度阈值方法进行了比较。与荧光强度阈值方法相比,验光方法在所有测试集样品中提供了一致的预测准确性(无剂量)(平均精度为89%vs. 81%; P = 0.0072)。改进的性能表明,将放射线学方法扩展到荧光分子成像数据为荧光引导手术中的癌症检测提供了有希望的图像分析技术。
translated by 谷歌翻译
我们介绍了一种考虑复杂的环境条件,在极地地区介绍了一种在极地地区长距离海上路线计划的方法。该方法允许构建优化的路线,描述了该过程的三个主要阶段:使用不均匀网格对环境条件进行离散建模,网格最佳路径的构建以及路径平滑。为了说明不同的车辆性能,我们构建了一系列数据驱动的功能,这些功能可以应用于环境网格,以确定给定容器和网格单元的速度限制和燃料要求,以图形和地理空间表示这些数量。在描述我们的结果时,我们展示了一个示例用途,用于Polar Research船RRS David Attenborough爵士(SDA)的路线规划,核算冰的性能特征,并验证韦德尔海地区的时空路线构建,南极洲。我们通过证明路线的变化取决于季节性海冰可变性,所使用的路线规划目标函数的差异以及其他环境条件(如电流)的存在来证明这种路线构建方法的多功能性。为了证明我们的方法的普遍性,我们在北极海洋和波罗的海中介绍了例子。本手稿中概述的技术是通用的,因此可以应用于具有不同特征的血管。我们的方法不仅可以拥有一个船只计划程序,而且我们概述了该工作流程如何适用于更广泛的社区,例如商业和乘客运输。
translated by 谷歌翻译