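We address the problem of selective classification, where the goal is to achieve the best performance at a desired coverage of the dataset. Recent state-of-the-art selective methods introduce architectural changes, either a separate selection head or an extra abstention logit. In this paper, we present surprising results for selective classification by confirming that the superior performance of state-of-the-art methods is owed to training a more generalizable classifier; their selection mechanism, however, is suboptimal. We argue that the selection mechanism should be rooted in the objective function rather than in a separately computed score. Accordingly, we motivate an alternative selection strategy based on the cross-entropy loss used in the classification setting, namely the maximum of the logits. Our proposed selection strategy achieves better results by a significant margin across all coverages and all datasets, without any additional computation. Finally, inspired by our superior selection mechanism, we propose to further regularize the objective function with entropy minimization. Our proposed max-logit selection with the modified loss function achieves new state-of-the-art results for selective classification.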
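As an illustration of the max-logit idea described above, the following is a minimal sketch, not the authors' implementation. It assumes each sample is scored by its largest raw logit and that a target coverage is met by keeping the top-scoring fraction of samples. The function names and the rounding rule for the number of kept samples are hypothetical choices made for this example.

```python
from typing import List


def max_logit_scores(logits: List[List[float]]) -> List[float]:
    """Confidence score per sample: the largest raw logit (no softmax applied)."""
    return [max(row) for row in logits]


def select_at_coverage(scores: List[float], coverage: float) -> List[int]:
    """Indices of the most confident samples for a target coverage in (0, 1].

    Assumption: the number of kept samples is round(coverage * n), at least 1.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    n_keep = max(1, round(coverage * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def selective_accuracy(
    logits: List[List[float]], labels: List[int], coverage: float
) -> float:
    """Accuracy measured only on the samples selected at the given coverage."""
    kept = select_at_coverage(max_logit_scores(logits), coverage)
    correct = sum(
        1 for i in kept if logits[i].index(max(logits[i])) == labels[i]
    )
    return correct / len(kept)
```

Because the score is read directly from the classifier output, this selection step needs no extra head and no extra forward pass; the classifier abstains on every sample that is not selected.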
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Recent advances in deep learning (DL) have led to the release of several DL software libraries, such as PyTorch, Caffe, and TensorFlow, to assist machine learning (ML) practitioners in developing and deploying state-of-the-art deep neural networks (DNNs). However, practitioners are not able to properly cope with limitations in the DL libraries, such as testing or data processing. In this paper, we present a qualitative and quantitative analysis of the most frequent combinations of DL libraries and of the distribution of DL library dependencies across the ML workflow, and we formulate a set of recommendations for (i) hardware builders, toward more optimized accelerators, and (ii) library builders, toward more refined future releases. Our study is based on 1,484 open-source DL projects with 46,110 contributors, selected based on their reputation. First, we found an increasing trend in the usage of DL libraries. Second, we highlight several usage patterns of DL libraries. In addition, we identify dependencies between DL libraries and the most frequent combinations: PyTorch with Scikit-learn and Keras with TensorFlow are the most frequent, appearing in 18% and 14% of the projects, respectively. Developers use two or three DL libraries in the same project and tend to use multiple different DL libraries in both the same function and the same file. Developers show patterns in their use of the various DL libraries and prefer simple functions with fewer arguments and straightforward goals. Finally, we present the implications of our findings for researchers, library maintainers, and hardware vendors.
Federated Learning (FL) is a machine learning paradigm that enables the training of a shared global model across distributed clients while keeping the training data local. While most prior work on designing systems for FL has focused on using stateful, always-running components, recent work has shown that components in an FL system can greatly benefit from the usage of serverless computing and Function-as-a-Service technologies. To this end, distributed training of models with serverless FL systems can be more resource-efficient and cheaper than conventional FL systems. However, serverless FL systems still suffer from the presence of stragglers, i.e., slow clients due to their resource and statistical heterogeneity. While several strategies have been proposed for mitigating stragglers in FL, most methodologies do not account for the particular characteristics of serverless environments, i.e., cold starts, performance variations, and the ephemeral, stateless nature of the function instances. Towards this, we propose FedLesScan, a novel clustering-based semi-asynchronous training strategy specifically tailored for serverless FL. FedLesScan dynamically adapts to the behaviour of clients and minimizes the effect of stragglers on the overall system. We implement our strategy by extending an open-source serverless FL system called FedLess. Moreover, we comprehensively evaluate our strategy using the 2nd generation Google Cloud Functions with four datasets and varying percentages of stragglers. Results from our experiments show that, compared to other approaches, FedLesScan reduces training time and cost by an average of 8% and 20%, respectively, while utilizing clients better, with an average increase in the effective update ratio of 17.75%.
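This study is the second phase in a series of investigations on Optical Character Recognition (OCR) of Arabic historical documents, and it examines how different modeling procedures interact with the problem. The first study examined the effect of Transformers on our custom-built Arabic dataset. One of the downsides of the first study was the size of the training data: due to a lack of resources, only 15,000 of our 30 million images were used. In addition, we add an image enhancement layer, time and space optimization, and a post-correction layer to help the model predict the correct word for the context. Notably, we propose an end-to-end text recognition approach that uses a Vision Transformer, namely BEIT, as the encoder and a vanilla Transformer as the decoder, eliminating CNNs for feature extraction and reducing the model's complexity. The experiments show that our end-to-end model outperforms convolutional backbones. The model attains a CER of 4.46%.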
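For readers unfamiliar with the metric, the character error rate (CER) reported above is commonly defined as the character-level edit distance between the recognized text and the reference, divided by the reference length. The sketch below assumes that standard definition; the abstract does not state which variant the authors used, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate, normalized by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Note that Python strings are compared per Unicode code point here, so Arabic text containing combining diacritics may need normalization before scoring.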
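Bi-directional object handover between a human and a robot is an important functional skill in human-centered robotic manufacturing and services. The difficulty in achieving this skill lies in the capacity of any solution to deal with three important aspects: (i) synchronizing the timing of the handover phases; (ii) handling object pose constraints; and (iii) understanding the haptic exchange in order to seamlessly achieve some steps of (i). We propose a new approach for (i) and (ii) that consists of explicitly formulating the handover process as constraints in a task-space quadratic programming control framework, so that the meeting in time and trajectory is achieved implicitly. Our method is implemented on a Panda robot arm taking objects from a human operator.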
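Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out of these datasets. This is primarily because such spoken languages are not well represented on the web and are therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a new African news corpus covering 16 languages, of which eight are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both to additional languages and to additional domains is to fine-tune large pre-trained models on small quantities of high-quality translation data.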
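This paper describes our submission to the constrained track of the WMT21 shared news translation task. We focus on three relatively low-resource language pairs: Bengali to and from Hindi, English to and from Hausa, and Xhosa to and from Zulu. To overcome the limitation of relatively scarce parallel data, we train a multilingual model using a multitask objective that employs both parallel and monolingual data. In addition, we augment the data using back translation. We also train a bilingual model incorporating back translation and knowledge distillation, and then combine the two models using sequence-to-sequence mapping. Compared with bilingual baselines, we see a relative gain of around 70% in BLEU points for English to and from Hausa, and relative improvements of around 25% for both Bengali to and from Hindi and Xhosa to and from Zulu.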
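In this paper, we investigate the impact of some commonly used settings for (a) preprocessing face images and (b) classification and training on Action Unit (AU) detection performance and complexity. Our investigation uses a large-scale dataset consisting of ~55K videos collected in the wild from participants watching commercial ads. The preprocessing settings include scaling the face to a fixed resolution, changing the color information (RGB to grayscale), aligning the face, and cropping AU regions, while the classification and training settings include the type of classifier (multi-label vs. binary) and the amount of data used to train the models. To the best of our knowledge, no prior work has investigated the effect of these settings on AU detection. In our analysis, we use CNNs as our baseline classification model.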
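Two of the preprocessing settings named above, color conversion and scaling to a fixed resolution, can be sketched in a few lines. This is an illustrative sketch only: the abstract does not say how the authors convert or resize images, so the ITU-R BT.601 luma weights and nearest-neighbor sampling used here are assumptions, and the function names are hypothetical. Images are represented as nested lists to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def to_grayscale(image: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Convert RGB to grayscale with BT.601 luma weights (an assumed choice)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]


def resize_nearest(image: Sequence[Sequence], out_h: int, out_w: int) -> List[List]:
    """Scale an image to a fixed resolution with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]
```

For example, `resize_nearest(to_grayscale(face), 64, 64)` would yield a 64x64 single-channel input, where `face` is a hypothetical cropped RGB face image.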
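In this paper, we explore the effects of some commonly used Convolutional Neural Networks (CNNs), training settings, and training set structures on Action Unit (AU) detection. Specifically, we first compare 10 different shallow CNNs in AU detection. Second, we investigate how different training settings (i.e., centering/normalizing the inputs, using different augmentation severities, and balancing the data) affect AU detection performance. Third, we explore the effect of increasing the number of labeled subjects and frames in the training set on AU detection performance. These comparisons provide the research community with useful tips on choosing among different CNNs and training settings for AU detection. In our analysis, we use a large-scale naturalistic dataset consisting of ~55K videos captured in the wild. To the best of our knowledge, no prior work has investigated the impact of such settings on a large-scale AU dataset.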