There is an increasing need in our society to achieve faster advances in science to tackle urgent problems such as climate change, environmental hazards, sustainable energy systems, and pandemics, among others. In certain domains, such as chemistry, scientific discovery carries the extra burden of assessing the risks of proposed novel solutions before moving to the experimental stage. Despite several recent advances in machine learning and AI to address some of these challenges, there is still a gap in technologies to support end-to-end discovery applications, integrating the myriad of available technologies into a coherent, orchestrated, yet flexible discovery process. Such applications need to handle complex knowledge management at scale, enabling knowledge consumption and production in a timely and efficient way for subject matter experts (SMEs). Furthermore, the discovery of novel functional materials strongly relies on the development of exploration strategies in the chemical space. For instance, generative models have gained attention within the scientific community due to their ability to generate enormous volumes of novel molecules across material domains. However, these models exhibit extreme creativity that often translates into low viability of the generated candidates. In this work, we propose a workbench framework that aims to enable human-AI co-creation, reducing the time to first discovery and the opportunity costs involved. This framework relies on a knowledge base with domain and process knowledge, and on user-interaction components to acquire knowledge and advise the SMEs. Currently, the framework supports four main activities: generative modeling, dataset triage, molecule adjudication, and risk assessment.
The Elo algorithm, due to its simplicity, is widely used for rating in sports competitions as well as in other applications where a rating/ranking is a useful tool for predicting future results. However, despite its widespread use, a detailed understanding of the convergence properties of the Elo algorithm is still lacking. Aiming to fill this gap, this paper presents a comprehensive (stochastic) analysis of the Elo algorithm, considering round-robin (one-on-one) competitions. Specifically, analytical expressions are derived characterizing the behavior/evolution of the skills and of important performance metrics. Then, taking into account the relationship between the behavior of the algorithm and the step-size value, which is a hyperparameter that can be controlled, some design guidelines as well as discussions about the performance of the algorithm are provided. To illustrate the applicability of the theoretical findings, experimental results are shown, corroborating the very good match between the analytical predictions and those obtained by running the algorithm on real-world data (from the Italian SuperLega volleyball league).
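For reference, a minimal sketch of the standard Elo update that such an analysis concerns, with the step-size (often called the K-factor) exposed as the hyperparameter discussed above; the player names, ratings, and results below are illustrative, not taken from the paper.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Logistic win probability of player A against player B (standard Elo model)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def elo_update(r_a, r_b, score_a, k=16.0):
    """One Elo update; k is the step-size hyperparameter.
    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new

# Toy round-robin among three players with hypothetical results.
ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
for winner, loser in [("A", "B"), ("A", "C"), ("B", "C")]:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser], 1.0)
print(ratings)
```

A larger k typically tracks skill changes faster at the cost of larger steady-state fluctuations, which is the kind of trade-off the step-size analysis addresses.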
Persistent Memory (PMEM), also known as Non-Volatile Memory (NVM), can deliver higher density and lower cost per bit when compared with DRAM. Its main drawback is that it is typically slower than DRAM. On the other hand, DRAM has scalability problems due to its cost and energy consumption. Soon, PMEM will likely coexist with DRAM in computer systems, but the biggest challenge is knowing which data to allocate to each type of memory. This paper describes a methodology for identifying and characterizing the application objects that have the most influence on the application's performance, using Intel Optane DC Persistent Memory. In the first part of our work, we built a tool that automates the profiling and analysis of application objects. In the second part, we built a machine learning model to predict the most critical objects within large-scale graph-based applications. Our results show that using isolated features does not bring the same benefit as using a carefully chosen set of features. By performing data placement using our predictive model, we can reduce execution time degradation by 12\% (on average) and 30\% (at most) when compared to the baseline approach based on the LLC-misses indicator.
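As a rough illustration of the second part, a hedged sketch of a feature-based predictor of critical objects; the feature names, values, and the choice of a random forest are assumptions made for illustration, not the paper's actual model or feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-object profiling features: LLC misses, loads, stores, allocation size (bytes).
X = np.array([
    [1.2e6, 4.0e7, 1.0e7, 512e6],
    [3.0e3, 2.0e5, 9.0e4,   4e6],
    [8.9e5, 2.5e7, 6.0e6, 256e6],
    [1.1e3, 1.0e5, 5.0e4,   1e6],
])
y = np.array([1, 0, 1, 0])        # 1 = performance-critical object

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
new_object = np.array([[7.5e5, 1.8e7, 4.0e6, 128e6]])
print(model.predict(new_object))  # place in DRAM if predicted critical, else PMEM
```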
Scientific Machine Learning (SciML) is a field of growing interest across several application areas. In the context of optimization, SciML-based tools enable the development of more efficient optimization methods. However, SciML tools applied to optimization must be carefully evaluated and implemented. This work proposes a robustness test that ensures the robustness of multiphysics, SciML-based optimization by showing that its results respect the universal approximation theorem. The test is applied within the framework of a novel methodology, which is evaluated on a series of benchmarks to illustrate its consistency. Furthermore, the results of the proposed methodology are compared against the feasible regions of the corresponding optimization problems, which requires a higher computational effort. Thus, this work provides a robustness test for guaranteeing the application of SciML tools in multi-objective optimization with lower computational effort than the existing alternatives.
We introduce MR-Net, a general architecture for multiresolution neural networks, as well as a framework for imaging applications based on this architecture. Our coordinate-based networks are continuous in both space and scale, since they are composed of multiple stages that progressively add finer detail. Besides this, they provide a compact and efficient representation. We demonstrate multiresolution image representations, along with applications to texture magnification, minification, and antialiasing.
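A minimal sketch of the multiresolution idea, assuming a sum of coordinate-based stages where each stage adds finer detail; the layer sizes, frequencies, and activation choices below are illustrative assumptions, not MR-Net's actual architecture.

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One sinusoidal coordinate stage; a higher `freq` captures finer detail."""
    def __init__(self, freq, width=64, out_dim=3):
        super().__init__()
        self.freq = freq
        self.net = nn.Sequential(nn.Linear(4, width), nn.ReLU(),
                                 nn.Linear(width, out_dim))

    def forward(self, xy):
        feats = torch.cat([torch.sin(self.freq * xy), torch.cos(self.freq * xy)], dim=-1)
        return self.net(feats)

class MultiResImage(nn.Module):
    """Sum of stages: evaluating only the first k stages yields a coarser
    reconstruction, giving continuity in scale as well as in space."""
    def __init__(self, freqs=(1.0, 4.0, 16.0)):
        super().__init__()
        self.stages = nn.ModuleList(Stage(f) for f in freqs)

    def forward(self, xy, n_stages=None):
        outs = [stage(xy) for stage in list(self.stages)[:n_stages]]
        return torch.stack(outs).sum(dim=0)

model = MultiResImage()
xy = torch.rand(1024, 2)            # pixel coordinates in [0, 1]^2
coarse = model(xy, n_stages=1)      # low-frequency approximation
full = model(xy)                    # all stages: full detail
print(coarse.shape, full.shape)
```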
Classification is one of the most studied tasks in the data mining and machine learning fields, and many works in the literature have been proposed to solve classification problems in multiple knowledge domains, such as medicine, biology, security, and remote sensing. Since no single classifier achieves the best results for every application, a good alternative is to adopt classifier fusion strategies. A key point in the success of classifier fusion methods is the combination of diversity and accuracy among the classifiers belonging to the ensemble. With the large number of classification models available in the literature, one challenge is selecting the most suitable classifiers for the final classification system, which gives rise to the need for classifier selection strategies. We address this with a framework for classifier selection and fusion based on a four-step protocol called CIF-E (Classifiers, Initialization, Fitness function, and Evolutionary algorithm). We implement and evaluate 24 varied ensemble methods following the proposed CIF-E protocol and are able to find the most accurate approach. A comparative analysis is also carried out against the best methods in the literature and many other baselines. The experiments show that the proposed evolutionary approach, based on the Univariate Marginal Distribution Algorithm (UMDA), can outperform state-of-the-art literature methods on many well-known UCI datasets.
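To make the evolutionary step concrete, here is a hedged sketch of a UMDA loop over binary classifier-selection masks; the population sizes, fitness function, and marginal clipping are illustrative assumptions, not the CIF-E implementation.

```python
import numpy as np

def umda_select(fitness, n_items, pop_size=40, n_elite=10, n_gens=30, rng=None):
    """Minimal UMDA over binary selection vectors: estimate independent per-bit
    marginals from the elite individuals, then resample. `fitness` maps a 0/1 mask
    of length n_items (which classifiers enter the ensemble) to a score."""
    rng = rng or np.random.default_rng(0)
    p = np.full(n_items, 0.5)                        # univariate marginals
    for _ in range(n_gens):
        pop = (rng.random((pop_size, n_items)) < p).astype(int)
        scores = np.array([fitness(ind) for ind in pop])
        elite = pop[np.argsort(scores)[-n_elite:]]
        p = np.clip(elite.mean(axis=0), 0.05, 0.95)  # keep some exploration
    return (p > 0.5).astype(int)

# Hypothetical fitness: reward masks close to a "true" useful subset (in practice,
# this would be the validation accuracy of the fused ensemble).
target = np.array([1, 0, 1, 1, 0, 0, 1, 0])
best = umda_select(lambda m: -np.abs(m - target).sum(), n_items=8)
print(best)
```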
We propose a new approach that uses smooth numerical methods to build fuzzy clusters of large datasets. The usual criteria are relaxed so that the search for good fuzzy partitions is carried out over a continuous space, rather than a combinatorial one as in classical approaches \cite{hartigan}. Smoothness is obtained by using an infinite class of differentiable functions, transforming the strongly non-differentiable problem into differentiable optimization subproblems. To implement the algorithm, we used the statistical software $R$, and the results obtained were compared with the traditional fuzzy $C$-means method proposed by Bezdek.
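For context, a brief sketch of the classical fuzzy $C$-means baseline (Bezdek) that the smoothed method is compared against, alternating membership and centroid updates; written in Python here rather than $R$, with illustrative synthetic data.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, tol=1e-5, rng=None):
    """Classical fuzzy c-means: alternate centroid and membership updates."""
    rng = rng or np.random.default_rng(0)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # fuzzy memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2.0 / (m - 1.0)))
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(mu, 0.3, size=(50, 2)) for mu in (0, 3, 6)])
centers, U = fuzzy_c_means(X, c=3)
print(centers)
```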
Score-based generative models are a new class of generative algorithms that can produce realistic images even in high-dimensional spaces, currently surpassing other state-of-the-art models across several benchmark categories and applications. In this work, we introduce CaloScore, a score-based generative model applied to calorimeter showers. Three different diffusion models are investigated using the Fast Calorimeter Simulation Challenge 2022 datasets. CaloScore is the first application of a score-based generative model in collider physics and is able to generate high-fidelity calorimeter images for all datasets, providing an alternative paradigm for calorimeter shower simulation.
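A minimal sketch of denoising score matching, the training objective underlying score-based/diffusion models in general; the network size, noise schedule, and data placeholder are assumptions for illustration, not CaloScore's actual configuration.

```python
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Tiny stand-in for a score network; real calorimeter models are far larger."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(),
                                 nn.Linear(128, 128), nn.SiLU(),
                                 nn.Linear(128, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t[:, None]], dim=1))

def dsm_loss(model, x0, sigma_min=0.01, sigma_max=1.0):
    """Denoising score matching: perturb data with noise level sigma(t) and
    regress the score of the perturbed distribution, -noise / sigma."""
    t = torch.rand(x0.shape[0])
    sigma = sigma_min * (sigma_max / sigma_min) ** t        # geometric noise schedule
    noise = torch.randn_like(x0)
    x_t = x0 + sigma[:, None] * noise
    score = model(x_t, t)
    target = -noise / sigma[:, None]
    return ((sigma[:, None] * (score - target)) ** 2).mean()

model = ScoreNet(dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x0 = torch.randn(256, 16)           # placeholder for flattened shower images
loss = dsm_loss(model, x0)
loss.backward()
opt.step()
print(float(loss))
```

Sampling then integrates a reverse-time diffusion guided by the learned score, which is where the three diffusion-model variants mentioned above would differ.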
We introduce a neural implicit framework that exploits the differentiable properties of neural networks and the discrete geometry of point-sampled surfaces to approximate them as level sets of neural implicit functions. To train a neural implicit function, we propose a loss function that approximates the signed distance function and allows terms with high-order derivatives, such as the alignment between the principal directions of curvature, to learn more geometric detail. During training, we consider a non-uniform sampling strategy based on the curvatures of the point-sampled surface to prioritize points with more geometric detail. Compared with previous approaches, this sampling implies faster learning while preserving geometric accuracy. We also present analytical differential-geometry formulas for neural surfaces, such as normal vectors and curvatures.
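A hedged sketch of loss terms commonly used when fitting a neural SDF to point samples (on-surface vanishing, normal alignment, and an eikonal term); the weights and network below are illustrative assumptions, and the paper's higher-order curvature-alignment terms are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sdf_loss(f, pts, normals, off_pts):
    """f should vanish on surface samples, its gradient should align with the
    given normals, and it should satisfy the eikonal constraint |grad f| = 1."""
    pts = pts.clone().requires_grad_(True)
    off = off_pts.clone().requires_grad_(True)
    f_on = f(pts)
    grad_on = torch.autograd.grad(f_on.sum(), pts, create_graph=True)[0]
    f_off = f(off)
    grad_off = torch.autograd.grad(f_off.sum(), off, create_graph=True)[0]
    on_surface = f_on.abs().mean()
    normal_align = (1.0 - F.cosine_similarity(grad_on, normals, dim=1)).mean()
    eikonal = ((grad_off.norm(dim=1) - 1.0) ** 2).mean()
    return on_surface + normal_align + 0.1 * eikonal

mlp = nn.Sequential(nn.Linear(3, 64), nn.Softplus(beta=100), nn.Linear(64, 1))
pts = torch.rand(128, 3)                              # surface samples (illustrative)
normals = F.normalize(torch.randn(128, 3), dim=1)     # their normals (illustrative)
off_pts = torch.rand(256, 3)                          # off-surface samples
loss = sdf_loss(lambda x: mlp(x).squeeze(-1), pts, normals, off_pts)
loss.backward()
print(float(loss))
```

The curvature-based non-uniform sampling described above would replace the uniform `pts` batch with one biased toward high-curvature regions.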
We define the notion of a continuously differentiable perfect learning algorithm for multilayer neural network architectures and show that such algorithms do not exist, provided that the length of the dataset exceeds the number of parameters involved and that the activation functions are logistic, tanh, or sin.