Background: Embedded feature selection in high-dimensional data with very small sample sizes requires optimizing the hyperparameters of the model-building process. For this hyperparameter optimization, nested cross-validation must be applied to avoid biased performance estimates. The resulting repeated training on high-dimensional data leads to very long computation times. Furthermore, a high variance of the individual performance evaluation metrics may be observed, caused by outliers in the tiny validation sets. Therefore, applying standard pruning algorithms for early stopping to save time risks discarding promising hyperparameter sets. Results: To speed up feature selection for high-dimensional data with tiny sample sizes, we adapted the state-of-the-art asynchronous successive halving pruner. In addition, we combined it with two complementary pruning strategies based on domain or prior knowledge. One pruning strategy immediately stops computing trials with semantically meaningless results for the selected hyperparameter combinations. The other is a new extrapolating threshold pruning strategy suitable for nested cross-validation with a high variance of performance evaluation metrics. In repeated experiments, our combined pruning strategies kept all promising trials. At the same time, the computation time was substantially reduced compared to using only the state-of-the-art asynchronous successive halving pruner: fewer than 81.3% of the models had to be trained to achieve the same optimization result. Conclusions: The proposed combined pruning strategies can accelerate data analysis or enable deeper searches of the hyperparameter space within the same computation time. This leads to significant savings of time, money, and energy consumption, opening the door to advanced, time-consuming analyses.
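As a concrete illustration of the extrapolating threshold idea, here is a minimal sketch assuming an Optuna-style pruner interface; the class, parameter names, and thresholding rule are our assumptions for the example, not the paper's implementation. One validation metric is reported per completed cross-validation fold, and a trial is pruned only when even an optimistic extrapolation of the remaining folds cannot reach a target derived from the best trial so far:

```python
# Minimal sketch of an extrapolating threshold pruner (illustrative only).
import optuna
from optuna.pruners import BasePruner


class ExtrapolatingThresholdPruner(BasePruner):
    def __init__(self, n_folds: int, optimistic_bound: float = 1.0,
                 margin: float = 0.05, warmup_folds: int = 3):
        self.n_folds = n_folds                     # total folds of the (nested) CV
        self.optimistic_bound = optimistic_bound   # best value a single fold can take
        self.margin = margin                       # tolerance for metric variance
        self.warmup_folds = warmup_folds           # never prune before this many folds

    def prune(self, study: optuna.Study, trial: optuna.trial.FrozenTrial) -> bool:
        values = list(trial.intermediate_values.values())
        if len(values) < self.warmup_folds:
            return False  # too few folds: outlier variance would dominate the decision
        # Optimistic extrapolation: assume all remaining folds hit the upper bound.
        remaining = self.n_folds - len(values)
        best_possible = (sum(values) + remaining * self.optimistic_bound) / self.n_folds
        try:
            target = study.best_value - self.margin
        except ValueError:
            return False  # no completed trial yet, nothing to compare against
        return best_possible < target


# Usage sketch: report one validation metric per fold inside the objective.
def objective(trial: optuna.Trial) -> float:
    scores = []
    for fold in range(10):
        score = train_and_validate_fold(trial, fold)  # hypothetical helper
        scores.append(score)
        trial.report(score, step=fold)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return sum(scores) / len(scores)


study = optuna.create_study(direction="maximize",
                            pruner=ExtrapolatingThresholdPruner(n_folds=10))
```

The margin term is what makes the rule tolerant to the high fold-to-fold variance described above: a trial is only discarded when its optimistic extrapolation falls clearly, not marginally, below the incumbent.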
The intersection of ground reaction forces in a small, point-like area above the center of mass has been observed in computer simulation models and human walking experiments. This intersection point is often called a virtual pivot point (VPP). With the VPP observed so ubiquitously, it is commonly assumed to provide postural stability for bipedal walking. In this study, we challenge this assumption by questioning if walking without a VPP is possible. Deriving gaits with a neuromuscular reflex model through multi-stage optimization, we found stable walking patterns that show no signs of the VPP-typical intersection of ground reaction forces. We, therefore, conclude that a VPP is not necessary for upright, stable walking. The non-VPP gaits found are stable and successfully rejected step-down perturbations, which indicates that a VPP is not primarily responsible for locomotion robustness or postural stability. However, a collision-based analysis indicates that non-VPP gaits increased the potential for collisions between the vectors of the center of mass velocity and ground reaction forces during walking, suggesting an increased mechanical cost of transport. Although our computer simulation results have yet to be confirmed through experimental studies, they already strongly challenge the existing explanation of the VPP's function and provide an alternative explanation.
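To make the VPP quantitative, the sketch below estimates it as the point minimizing the summed squared distances to the ground reaction force lines of action. This is a standard least-squares formulation we assume for illustration, not necessarily the analysis used in the study; the input names are also assumptions:

```python
# Hedged sketch: least-squares estimate of a virtual pivot point (VPP).
# cop[i] is the center of pressure and grf[i] the force vector at sample i,
# both in a sagittal-plane (2D) frame centered on the center of mass.
import numpy as np

def estimate_vpp(cop: np.ndarray, grf: np.ndarray) -> np.ndarray:
    """Return the point minimizing summed squared distances to the GRF lines."""
    d = grf / np.linalg.norm(grf, axis=1, keepdims=True)   # unit force directions
    eye = np.eye(cop.shape[1])
    # Projector onto the complement of each line direction: I - d d^T.
    P = eye[None, :, :] - d[:, :, None] * d[:, None, :]
    A = P.sum(axis=0)                       # normal equations: A x = b
    b = np.einsum("nij,nj->i", P, cop)
    return np.linalg.solve(A, b)

# A gait has a pronounced VPP if the residual distances to this point are small;
# the non-VPP gaits described above would instead show large residuals.
```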
It is no secret that deep learning models exhibit undesirable behaviors such as learning spurious correlations instead of the correct relationships between input/output pairs. Due to limited access to realistic datasets for comprehensive evaluation, prior work on robustness studies datasets that mix low-level features to quantify how spurious correlations affect predictions, rather than considering natural semantic factors. To bridge this gap, in this paper we first investigate how natural background colors play a role as spurious features in image classification tasks by manually splitting the test sets of CIFAR10 and CIFAR100 into subgroups based on the background color of each image. We name our datasets CIFAR10-B and CIFAR100-B. We find that while standard CNNs achieve human-level accuracy, the subgroup performances are not consistent, and the phenomenon persists even after data augmentation (DA). To alleviate this issue, we propose FlowAug, a semantic DA method that leverages the decoupled semantic representations captured by a pre-trained generative flow. Experimental results show that FlowAug achieves more consistent results across subgroups than other types of DA methods on CIFAR10 and CIFAR100, and that it generalizes better. Furthermore, we propose a generic metric for studying model robustness to spurious correlations: a macro average of the weighted standard deviations of subgroup performance across classes. Per our metric, FlowAug demonstrates less reliance on spurious correlations. Although this metric is proposed to study our curated datasets, it applies to any dataset with subgroups or subclasses. Lastly, aside from less dependence on spurious correlations and better generalization on in-distribution test sets, we also show superior out-of-distribution results on CIFAR10.1 and competitive performance on CIFAR10-C and CIFAR100-C.
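A minimal sketch of one reading of the described metric (our interpretation, not the authors' reference code): per class, take the standard deviation of subgroup accuracies weighted by subgroup size, then macro-average over classes:

```python
# Weighted-std robustness metric over background-color subgroups (illustrative).
import numpy as np

def spurious_reliance_score(acc: dict[str, dict[str, float]],
                            counts: dict[str, dict[str, int]]) -> float:
    per_class = []
    for cls, groups in acc.items():
        a = np.array([groups[g] for g in groups])
        w = np.array([counts[cls][g] for g in groups], dtype=float)
        w /= w.sum()
        mean = np.sum(w * a)
        per_class.append(np.sqrt(np.sum(w * (a - mean) ** 2)))  # weighted std
    return float(np.mean(per_class))  # macro average; lower = less reliance

# Hypothetical example: accuracy of class "ship" on red- vs. blue-background subgroups.
acc = {"ship": {"red": 0.71, "blue": 0.94}}
counts = {"ship": {"red": 120, "blue": 880}}
print(spurious_reliance_score(acc, counts))
```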
We investigate whether three types of post hoc model explanations--feature attribution, concept activation, and training point ranking--are effective for detecting a model's reliance on spurious signals in the training data. Specifically, we consider the scenario where the spurious signal to be detected is unknown, at test time, to the user of the explanation method. We design an empirical methodology that uses semi-synthetic datasets along with pre-specified spurious artifacts to obtain models that verifiably rely on these spurious training signals. We then provide a suite of metrics that assess an explanation method's reliability for spurious signal detection under various conditions. We find that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test time, especially for non-visible artifacts such as a background blur. Further, we find that feature attribution methods are susceptible to erroneously indicating dependence on spurious signals even when the model being explained does not rely on spurious artifacts. This finding casts doubt on the utility of these approaches, in the hands of a practitioner, for detecting a model's reliance on spurious signals.
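To make the methodology concrete, here is an illustrative sketch (our stand-in, not the paper's pipeline): inject a known spurious artifact into one class of a dataset, then score whether an attribution map concentrates on the artifact region. All names are assumptions for the example:

```python
# Semi-synthetic spurious artifact injection plus a simple detection score.
import numpy as np

def add_blur_artifact(images: np.ndarray, labels: np.ndarray,
                      target_class: int) -> np.ndarray:
    """Blur every image of `target_class` (a crude stand-in for a
    non-visible background-blur cue tied to that class)."""
    out = images.copy()
    mask = labels == target_class
    out[mask] = (out[mask]
                 + np.roll(out[mask], 1, axis=1)
                 + np.roll(out[mask], -1, axis=1)) / 3.0  # box blur along rows
    return out

def artifact_attribution_fraction(attribution: np.ndarray,
                                  artifact_mask: np.ndarray) -> float:
    """Fraction of total attribution mass inside the artifact region.
    A reliable detector should score high on artifact-trained models and
    low on models trained without the artifact."""
    total = np.abs(attribution).sum()
    return float(np.abs(attribution[artifact_mask]).sum() / total)
```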
Emotions play an important role in interpersonal interactions and social conflict, yet their function in the development of controversy and disagreement in online conversations has not been explored. To address this gap, we study controversy on Reddit, a popular network of online discussion forums. We collect discussions from a wide variety of topical forums and use emotion detection to recognize a range of emotions from text, including anger, fear, joy, and admiration. Our study has three main findings. First, controversial comments express more anger and less admiration, joy, and optimism than non-controversial comments. Second, controversial comments affect the emotions of downstream comments in a discussion, usually resulting in a long-term increase in anger and a decrease in positive emotions, although the magnitude and direction of the emotional change depend on the forum. Finally, we show that emotions help better predict which comments will become controversial. Understanding the emotional dynamics of online discussions can help communities better manage conversations.
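A minimal sketch of the prediction setup in the last finding (our illustration, not the authors' exact features or model): predict a comment's controversy label from its detected emotion scores. The features here are random placeholders:

```python
# Controversy prediction from emotion features (illustrative placeholder data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X: one row per comment, one column per emotion score (anger, fear, joy,
# admiration, optimism); y: 1 if the comment became controversial.
rng = np.random.default_rng(0)
X = rng.random((1000, 5))
y = (X[:, 0] - X[:, 3] + 0.2 * rng.standard_normal(1000) > 0).astype(int)

model = LogisticRegression(max_iter=1000)
print(cross_val_score(model, X, y, scoring="roc_auc", cv=5).mean())
```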
Design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application to modeling long sequences. In this paper, we introduce Mega, a simple, theoretically grounded, single-head gated attention mechanism equipped with an (exponential) moving average to incorporate the inductive bias of local dependencies into the position-agnostic attention mechanism. We further propose a variant with linear time and space complexity that incurs only minimal quality loss by splitting the whole sequence into multiple chunks of fixed length. Extensive experiments on a wide range of sequence modeling benchmarks, including the Long Range Arena, neural machine translation, auto-regressive language modeling, and image and speech classification, show that Mega achieves significant improvements over other sequence models, including variants of Transformers and recent state space models.
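A sketch of the damped exponential moving average that supplies the local inductive bias; this is a simplified reading of the mechanism, not the reference implementation, and in the full model alpha and delta would be learned per feature dimension:

```python
# Damped, per-dimension EMA: y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}
import numpy as np

def damped_ema(x: np.ndarray, alpha: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """x: (seq_len, d) inputs; alpha, delta: (d,) values in (0, 1)."""
    y = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        prev = alpha * x[t] + (1.0 - alpha * delta) * prev
        y[t] = prev
    return y

# The smoothed sequence y (rather than the raw x) then feeds the single-head
# gated attention; chunking y into fixed-length blocks yields the linear-time
# variant, since attention is computed only within each chunk.
```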
Data quality is a key factor in developing trustworthy AI in healthcare. A large volume of curated datasets with controlled confounding factors can help improve the accuracy, robustness, and privacy of downstream AI algorithms. However, access to high-quality datasets is limited by the technical difficulty of data acquisition, and strict ethical restrictions hinder the large-scale sharing of healthcare data. Data synthesis algorithms, which generate data with distributions similar to real clinical data, can serve as a potential solution to the lack of quality data in the development of trustworthy AI. However, state-of-the-art data synthesis algorithms, especially deep learning algorithms, focus more on imaging data and neglect the synthesis of non-imaging healthcare data, including clinical measurements, medical signals and waveforms, and electronic health records (EHRs). Therefore, in this paper we review synthesis algorithms, particularly for non-imaging medical data, with the aim of supporting trustworthy AI in this field. This tutorial-style review provides comprehensive descriptions of the algorithms, their evaluation, their limitations, and future research directions.
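As a toy illustration of distribution-matching tabular synthesis, the sketch below fits a Gaussian copula, one of the classical non-deep approaches such a review covers; it is an example of the technique family, not an algorithm proposed by the paper:

```python
# Gaussian-copula synthesis: preserve each feature's marginal distribution
# and the rank correlations between features.
import numpy as np
from scipy import stats

def fit_and_sample(real: np.ndarray, n: int, seed: int = 0) -> np.ndarray:
    """real: (rows, features) numeric table; returns n synthetic rows."""
    rng = np.random.default_rng(seed)
    rows, d = real.shape
    # Map each feature to standard normal via its empirical CDF (rank transform).
    ranks = stats.rankdata(real, axis=0) / (rows + 1)
    z = stats.norm.ppf(ranks)
    cov = np.corrcoef(z, rowvar=False)
    # Sample correlated normals, then invert through the empirical quantiles.
    z_new = rng.multivariate_normal(np.zeros(d), cov, size=n)
    u_new = stats.norm.cdf(z_new)
    synth = np.empty((n, d))
    for j in range(d):
        synth[:, j] = np.quantile(real[:, j], u_new[:, j])
    return synth
```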
Recent AI algorithms are black-box models whose decisions are difficult to interpret. Explainable AI (XAI) seeks to address AI's lack of explainability, and the resulting lack of trust, by explaining AI decisions to customers, such as the decision to reject a loan application. The conventional wisdom is that regulating AI by mandating fully transparent XAI leads to greater social welfare. This paper challenges this notion through a game-theoretic model with a policy-maker who maximizes social welfare, firms in duopoly competition that maximize profits, and heterogeneous consumers. The results show that XAI regulation may be redundant. In fact, mandating fully transparent XAI may make firms and customers worse off. This reveals a trade-off between maximizing welfare and obtaining explainable AI outputs. We also discuss managerial implications for policy-makers and firms.
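To give intuition for the trade-off, here is a deliberately simple numerical toy (entirely our construction, not the paper's model): two firms choose an explainability level in [0, 1]; higher explainability attracts explanation-sensitive consumers but lowers model accuracy and hence product quality. We compare welfare at the unregulated equilibrium with welfare under a full-transparency mandate:

```python
# Toy duopoly: explainability vs. quality, solved by best-response iteration.
import numpy as np

levels = np.linspace(0.0, 1.0, 21)
theta = np.linspace(0.0, 1.0, 200)   # heterogeneous taste for explainability

def demand(own, rival):
    # Consumer of taste theta buys from the firm offering higher utility:
    # quality (1 - 0.5 * x) plus taste-weighted explainability (theta * x).
    u_own = (1 - 0.5 * own) + theta * own
    u_rival = (1 - 0.5 * rival) + theta * rival
    return np.mean(u_own > u_rival) + 0.5 * np.mean(u_own == u_rival)

def profit(own, rival):
    return demand(own, rival) * (1 - 0.5 * own)  # margin shrinks with transparency

x = [0.0, 0.0]
for _ in range(50):  # grid best-response iteration toward a Nash equilibrium
    for i in range(2):
        x[i] = levels[np.argmax([profit(l, x[1 - i]) for l in levels])]

def welfare(x0, x1):
    u = np.maximum((1 - 0.5 * x0) + theta * x0, (1 - 0.5 * x1) + theta * x1)
    return u.mean() + profit(x0, x1) + profit(x1, x0)

print("equilibrium levels:", x, "welfare:", welfare(*x))
print("mandated full-transparency welfare:", welfare(1.0, 1.0))
```

Depending on the chosen quality penalty, the mandated outcome can yield lower total welfare than the equilibrium, which is the qualitative pattern the paper's richer model formalizes.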
Is more data always better for training vision-and-language models? We study knowledge transferability in multimodal tasks. The current tendency in machine learning is to assume that by combining multiple datasets from different tasks, overall performance will improve. However, we show that not all knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We perform an exhaustive analysis based on hundreds of cross-experiments over vision-and-language tasks categorized into four groups. While tasks in the same group are prone to improve each other, the results show that this is not always the case. Other factors, such as dataset size or the training stage, also have a great impact on how well knowledge is transferred.
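A sketch of how such a grid of cross-experiments could be summarized (an illustration of the analysis style, not the authors' code): entry (i, j) of the matrix holds the relative gain on task j when task i's data is added to training, with placeholder scores:

```python
# Relative transfer-gain matrix across tasks.
import numpy as np

def transfer_gain(baseline: np.ndarray, joint: np.ndarray) -> np.ndarray:
    """baseline[j]: score when training on task j alone;
    joint[i, j]: score on task j when jointly trained with task i."""
    return (joint - baseline[None, :]) / baseline[None, :]

baseline = np.array([62.0, 48.5, 71.2])          # placeholder task scores
joint = np.array([[62.0, 51.0, 70.1],
                  [63.4, 48.5, 71.5],
                  [60.9, 49.2, 71.2]])
print(np.round(100 * transfer_gain(baseline, joint), 1))
# Negative entries mark harmful transfer, i.e., more data making a task worse.
```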
Methods based on nonlinear attraction-repulsion forces (including t-SNE, UMAP, ForceAtlas2, LargeVis, and others) dominate modern approaches to dimensionality reduction. The purpose of this paper is to demonstrate that all such methods, by design, come with an additional feature that is computed automatically along the way, namely the vector field associated with these forces. We show how this vector field provides additional high-quality information and propose a general refinement strategy based on ideas from Morse theory. The effectiveness of these ideas is illustrated specifically with t-SNE on synthetic and real-life datasets.
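For t-SNE, the vector field in question is the standard gradient of the embedding objective, which falls out of the optimization for free. The sketch below evaluates it at each embedded point (our illustration of the idea; variable names are assumptions):

```python
# t-SNE gradient field at the embedded points:
#   G_i = 4 * sum_j (p_ij - q_ij) * (1 + ||y_i - y_j||^2)^(-1) * (y_i - y_j)
# The descent "force" acting on point i is the negative of G_i.
import numpy as np

def tsne_gradient_field(Y: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Y: (n, 2) embedding; P: (n, n) symmetrized input affinities."""
    diff = Y[:, None, :] - Y[None, :, :]              # (n, n, 2)
    dist2 = (diff ** 2).sum(-1)
    w = 1.0 / (1.0 + dist2)                           # Student-t kernel
    np.fill_diagonal(w, 0.0)
    Q = w / w.sum()                                   # low-dim affinities
    coeff = 4.0 * (P - Q) * w                         # attraction - repulsion
    return (coeff[:, :, None] * diff).sum(axis=1)     # (n, 2) gradient per point

# Near-zero gradients identify critical points of the underlying energy; a
# Morse-theoretic refinement reasons about the field's behavior around them.
```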