Transformers have become central to recent advances in computer vision. However, training a vision Transformer (ViT) model from scratch can be resource-intensive and time-consuming. In this paper, we explore approaches to reduce the training cost of ViT models. We introduce algorithmic improvements that enable training a ViT model from scratch under limited hardware (1 GPU) and time (24 hours) budgets. First, we propose an efficient approach to add locality to the ViT architecture. Second, we develop a new image-size curriculum learning strategy, which reduces the number of patches extracted from each image at the beginning of training. Finally, we propose a new variant of the popular ImageNet1k benchmark that adds hardware and time constraints. We evaluate our contributions on this benchmark and show that they significantly improve performance under the proposed training budget. We will share the code at https://github.com/BorealisAI/efficient-vit-training.
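As a rough illustration of the image-size curriculum idea, the sketch below linearly grows the input resolution during training so that early epochs process fewer patches per image. The schedule, resolutions, and on-the-fly resizing are assumptions made for illustration rather than the paper's exact recipe, and the model is assumed to handle variable input sizes (e.g. by interpolating its positional embeddings).

```python
import torch.nn.functional as F

def curriculum_resolution(epoch, total_epochs, start_res=128, final_res=224, patch=16):
    """Linearly grow the input resolution, rounded to a multiple of the patch size."""
    frac = min(1.0, epoch / max(1, int(0.75 * total_epochs)))  # reach full size at 75% of training
    res = start_res + frac * (final_res - start_res)
    return int(round(res / patch) * patch)

def train_one_epoch(model, loader, optimizer, epoch, total_epochs, device="cuda"):
    res = curriculum_resolution(epoch, total_epochs)
    model.train()
    for images, labels in loader:
        # Downsample on the fly: at resolution res, each image yields (res / patch)^2 patches.
        images = F.interpolate(images, size=(res, res), mode="bilinear", align_corners=False)
        images, labels = images.to(device), labels.to(device)
        loss = F.cross_entropy(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```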
Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the potential risk associated with the actions taken, which may be critical in certain applications. To address this issue, the present work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to risk, the latter being modelled by the tail of the return probability distribution. The core idea is to replace the $Q$ function that typically lies at the core of RL learning schemes with another function that accounts for both the expected return and the risk. Named the risk-based utility function $U$, it can be extracted from the random return distribution $Z$ naturally learnt by any distributional RL algorithm. This makes it possible to span the complete trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Fundamentally, this research yields a practical and accessible solution for learning risk-sensitive policies with minimal modification to a distributional RL algorithm, and with an emphasis on the interpretability of the resulting decision-making process.
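The abstract does not spell out the exact form of $U$; one natural instantiation (an assumption made here for illustration only) blends the mean of the learnt return distribution with a tail-risk measure such as the conditional value-at-risk:

\[
U(s, a) \;=\; (1 - \lambda)\, \mathbb{E}\big[Z(s, a)\big] \;+\; \lambda\, \mathrm{CVaR}_{\alpha}\big[Z(s, a)\big], \qquad \lambda \in [0, 1],
\]

where $\mathrm{CVaR}_{\alpha}$ averages the worst $\alpha$-fraction of returns. Setting $\lambda = 0$ recovers the usual expected-return criterion and $\lambda = 1$ a fully risk-averse one, while acting greedily with respect to $U$ instead of $Q$ requires no change to how $Z$ is learnt.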
Remote sensing of the Earth's surface water is critical to a wide range of environmental studies, from evaluating the societal impacts of seasonal droughts and floods to the large-scale implications of climate change. Consequently, a large literature exists on the classification of water from satellite imagery. Yet previous methods have been limited by 1) the spatial resolution of public satellite imagery, 2) classification schemes that operate at the pixel level, and 3) the need for multiple spectral bands. We advance the state of the art by 1) using commercial imagery with panchromatic and multispectral resolutions of 30 cm and 1.2 m, respectively, 2) developing multiple fully convolutional neural networks (FCNs) that can learn the morphological features of water bodies in addition to their spectral properties, and 3) developing FCNs that can classify water even from panchromatic imagery. This study focuses on rivers in the Arctic, using images from the QuickBird, WorldView, and GeoEye satellites. Because no training data are available at such high resolutions, we construct them manually. First, we use the RGB and NIR bands of the 8-band multispectral sensors. These trained models all achieve precision and recall above 90% on validation data, aided by on-the-fly preprocessing of the training data specific to satellite imagery. In a novel approach, we then use results from the multispectral model to generate training data for FCNs that only require panchromatic imagery, of which considerably more is available. Despite the smaller feature space, these models still achieve precision and recall above 85%. We provide our open-source code and trained model parameters to the remote sensing community, paving the way for a wide range of environmental hydrology applications at vastly superior accuracies and two orders of magnitude higher spatial resolution than previously possible.
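For concreteness, a minimal fully convolutional network for per-pixel water classification might look as follows; the layer sizes and encoder-decoder layout are illustrative assumptions, not the networks trained in this study (set in_bands to 1 for the panchromatic case).

```python
import torch
import torch.nn as nn

class TinyWaterFCN(nn.Module):
    def __init__(self, in_bands=4):  # 4 = RGB + NIR; use in_bands=1 for panchromatic imagery
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_bands, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1),  # per-pixel water logit
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Usage: per-pixel binary cross-entropy against manually labelled water masks.
model = TinyWaterFCN(in_bands=4)
logits = model(torch.randn(2, 4, 256, 256))                 # -> (2, 1, 256, 256)
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(2, 1, 256, 256))
```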
As of 2022, greenhouse gas (GHG) emissions reporting and auditing are not yet compulsory for all companies, and methodologies of measurement and estimation are not unified. We propose a machine learning-based model to estimate the scope 1 and scope 2 GHG emissions of companies that do not yet report them. Our model, specifically designed to be transparent and fully adapted to this use case, can estimate emissions for a large universe of companies. It shows good global out-of-sample performance as well as good granular out-of-sample performance when evaluated by sector, by country, or by revenue bucket. We also compare our results to those of other providers and find our estimates to be more accurate. Thanks to the proposed explainability tools based on Shapley values, our model is fully interpretable, allowing the user to understand which factors explain the GHG emissions of each particular company.
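A minimal sketch of how such an estimator can be explained with Shapley values is shown below; the model choice, feature names, and input file are hypothetical placeholders rather than the actual pipeline.

```python
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical features and training file of companies that already report scope 1 emissions.
features = ["log_revenue", "employees", "sector_energy_intensity", "country_carbon_intensity"]
train = pd.read_csv("companies_with_reported_scope1.csv")

model = GradientBoostingRegressor().fit(train[features], train["scope1_emissions"])

# Shapley values attribute each company's estimate to individual features,
# making the prediction auditable factor by factor.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(train[features])
shap.summary_plot(shap_values, train[features])
```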
Polypharmacy, most often defined as the simultaneous use of five or more drugs, is a prevalent phenomenon in the older population. Some of these polypharmacies, deemed inappropriate, may be associated with adverse health outcomes such as death or hospitalization. Given the combinatorial nature of the problem, the size of claims databases, and the cost of computing an exact association measure for a given drug combination, it is impossible to investigate every possible combination of drugs. We therefore propose to optimize the search for potentially inappropriate polypharmacies (PIPs). To this end, we propose the OptimNeuralTS strategy, based on Neural Thompson Sampling and differential evolution, to efficiently mine claims datasets and build a predictive model of the association between drug combinations and health outcomes. We benchmark our method using two datasets generated by an internally developed simulator of polypharmacy data containing 500 drugs and 100 000 distinct combinations. Empirically, our method can detect up to 33% of PIPs while maintaining an average precision score of 99% using 10 000 time steps.
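The sketch below conveys the flavour of Thompson-sampling-guided search over drug combinations; for brevity a Bayesian linear model stands in for the neural network and random bit-flips stand in for differential evolution, so it illustrates the general idea rather than OptimNeuralTS itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_drugs = 50
true_weights = rng.normal(0, 1, n_drugs)         # hidden ground truth used by the toy simulator

def association_measure(x):
    """Costly oracle: noisy risk score of a binary drug-combination vector (simulated here)."""
    return x @ true_weights + rng.normal(0, 0.1)

def propose(parent, n_children=20):
    """Mutate a parent combination by flipping one drug per child (stand-in for DE proposals)."""
    children = np.tile(parent, (n_children, 1))
    children[np.arange(n_children), rng.integers(0, n_drugs, n_children)] ^= 1
    return children

# Bayesian linear-regression posterior over combination weights: N(Sigma @ b, Sigma).
Sigma_inv, b = np.eye(n_drugs), np.zeros(n_drugs)
x = rng.integers(0, 2, n_drugs)

for t in range(1000):
    Sigma = np.linalg.inv(Sigma_inv)
    theta = rng.multivariate_normal(Sigma @ b, Sigma)   # Thompson sample of the parameters
    candidates = propose(x)
    x = candidates[np.argmax(candidates @ theta)]       # pick the candidate that looks riskiest
    r = association_measure(x)                          # query the expensive association measure
    Sigma_inv += np.outer(x, x)                         # conjugate posterior update
    b += r * x
```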
Users of video conferencing systems experience poor quality when network conditions deteriorate, because current video codecs simply cannot operate at extremely low bitrates. Recently, several neural alternatives have been proposed that use sparse per-frame representations, such as facial landmark information, to reconstruct talking-head video at very low bitrates. However, these approaches produce poor reconstructions when the call involves significant motion or occlusion, and they do not scale to higher resolutions. We design Gemino, a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline. Gemino upsamples a very low-resolution version of each target frame while enhancing high-frequency details (e.g., skin texture, hair) using information extracted from a single high-resolution reference image. We use a multi-scale architecture that runs different components of the model at different resolutions, allowing it to scale to resolutions comparable to 720p, and we personalize the model to learn each person's specific details, achieving much better fidelity at low bitrates. We implement Gemino on top of aiortc, an open-source Python implementation of WebRTC, and show that it runs in real time on 1024x1024 video on an A100 GPU, at a lower bitrate than traditional video codecs for the same perceptual quality.
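A toy sketch of the reference-conditioned upsampling idea is given below; the module shapes and the simple concatenation-based fusion are assumptions made for illustration and do not reflect Gemino's actual multi-scale, personalized architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefConditionedUpsampler(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.ref_encoder = nn.Sequential(      # extracts high-frequency detail cues from the reference
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.fuse = nn.Sequential(             # merges the upsampled target with the reference cues
            nn.Conv2d(3 + ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, lowres_target, highres_reference):
        up = F.interpolate(lowres_target, size=highres_reference.shape[-2:],
                           mode="bilinear", align_corners=False)
        ref_feat = self.ref_encoder(highres_reference)
        return self.fuse(torch.cat([up, ref_feat], dim=1))

# One low-bitrate target frame (e.g. 128x128) plus one high-resolution reference frame.
model = RefConditionedUpsampler()
out = model(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 1024, 1024))  # -> (1, 3, 1024, 1024)
```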
We study the problem of inferring an individual's biometric information from the shadows they cast on a diffuse surface. We show, via a maximum-likelihood analysis, that in representative scenarios the biometric information leaked in shadows can be sufficient for reliable identity inference. We then develop a learning-based method that demonstrates this phenomenon in real settings, exploiting the subtle cues in shadows as the source of leakage, without requiring any labeled real data. In particular, our approach relies on constructing synthetic scenes composed of 3D face models obtained from a single photograph of each identity. We transfer what we learn from the synthetic data to real data in a fully unsupervised manner. Our model generalizes well to the real domain and is robust to several variations in the scene. We report high classification accuracy on an identity-classification task in scenes with unknown geometry and occluding objects.
We study the feature-based newsvendor problem, in which the decision maker has access to historical data consisting of demand observations and exogenous features. In this setting, we investigate feature selection, aiming to derive sparse, interpretable models with improved out-of-sample performance. So far, state-of-the-art methods rely on regularization, which penalizes the number of selected features or the norm of the solution vector. As an alternative, we introduce a novel bilevel programming formulation. The upper-level problem selects a subset of features that minimizes an estimate of the out-of-sample cost of the ordering decisions, computed on a held-out validation set. The lower-level problem learns the optimal coefficients of the decision function on the training set, using only the features selected by the upper level. We provide a mixed-integer linear programming reformulation of the bilevel program, which can be solved to optimality with standard optimization solvers. Our computational experiments show that the approach accurately recovers the ground truth on instances with a few hundred observations. In contrast, regularization-based techniques often fail at feature recovery or require thousands of observations to achieve similar accuracy. Regarding out-of-sample generalization, we achieve improved or comparable cost performance.
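In our notation (which need not match the paper's), with binary selection variables $z_j$, newsvendor cost $C(q, d) = c_u\,(d - q)^+ + c_o\,(q - d)^+$, training set $T$, and validation set $V$, a schematic form of the bilevel program is:

\[
\min_{z \in \{0,1\}^p} \;\; \frac{1}{|V|} \sum_{i \in V} C\!\left(x_i^{\top} \beta^{*}(z),\, d_i\right)
\quad \text{s.t.} \quad
\beta^{*}(z) \in \arg\min_{\beta \,:\, \beta_j = 0 \text{ if } z_j = 0} \;\; \frac{1}{|T|} \sum_{i \in T} C\!\left(x_i^{\top} \beta,\, d_i\right),
\]

where $x_i$ are the feature vectors, $d_i$ the observed demands, and $c_u, c_o$ the underage and overage costs. The lower level fits the linear ordering rule on the training data using only the selected features, while the upper level scores that rule on held-out data.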
We propose an organization criterion which, within a variational Bayesian framework, yields a method for training a SOM with an adaptive neighborhood radius. The method is validated in non-stationary settings and compared with other adaptive approaches in high-dimensional settings.
The aim of this study is to evaluate the potential of history matching (HM) for tuning climate systems with multi-scale dynamics. By considering a toy climate model, namely the two-scale Lorenz96 model, and producing experiments in a perfect-model setting, we explore in detail how several built-in choices need to be carefully tested. We also demonstrate the importance of incorporating physical expertise into the prior ranges of the parameters, which must be specified before running HM. Finally, we revisit a classical procedure in climate model tuning that consists of tuning the slow and fast components separately. By doing so in the Lorenz96 model, we illustrate the non-uniqueness of plausible parameters and highlight the specificity of metrics that emerge from the coupling. This paper also contributes to bridging the communities of uncertainty quantification, machine learning, and climate modelling, by establishing a correspondence between the terms each community uses for the same concepts and by suggesting promising avenues of collaboration that would benefit climate modelling research.
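For reference, the two-scale Lorenz96 system used as the toy model couples $K$ slow variables $X_k$ to $J$ fast variables $Y_{j,k}$ per slow variable (sign and scaling conventions vary slightly across the literature):

\[
\frac{dX_k}{dt} = -X_{k-1}\left(X_{k-2} - X_{k+1}\right) - X_k + F - \frac{h c}{b} \sum_{j=1}^{J} Y_{j,k},
\qquad
\frac{dY_{j,k}}{dt} = -c b\, Y_{j+1,k}\left(Y_{j+2,k} - Y_{j-1,k}\right) - c\, Y_{j,k} + \frac{h c}{b}\, X_k,
\]

where the forcing $F$ and the coupling and time-scale parameters $h$, $b$, $c$ are natural tuning targets in the perfect-model experiments.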