Smart Paper Notes

Hyperactive Learning (HAL) for Data-Driven Interatomic Potentials

Cas van der Oord, Matthias Sachs, Dávid Péter Kovács, Christoph Ortner, Gábor Csányi

Categories: Machine Learning (Statistics)

2022-10-09

Data-driven interatomic potentials have emerged as a powerful class of surrogate models for ab initio potential energy surfaces that are able to reliably predict macroscopic properties with experimental accuracy. In generating accurate and transferable potentials the most time-consuming and arguably most important task is generating the training set, which still requires significant expert user input. To accelerate this process, this work presents hyperactive learning (HAL), a framework for formulating an accelerated sampling algorithm specifically for the task of training database generation. The key idea is to start from a physically motivated sampler (e.g., molecular dynamics) and add a biasing term that drives the system towards high uncertainty and thus to unseen training configurations. Building on this framework, general protocols for building training databases for alloys and polymers leveraging the HAL framework are presented. For alloys, ACE potentials for AlSi10 are created by fitting to a minimal HAL-generated database containing 88 configurations (32 atoms each) with fast evaluation times of <100 microseconds/atom/CPU core. These potentials are demonstrated to predict the melting temperature with excellent accuracy. For polymers, a HAL database is built using ACE, able to determine the density of a long polyethylene glycol (PEG) polymer formed of 200 monomer units with experimental accuracy by only fitting to small isolated PEG polymers with sizes ranging from 2 to 32.
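The biasing idea in this abstract can be illustrated with a small sketch. It assumes that uncertainty is estimated as the spread of a small committee of models and that the biased energy has the form mean minus `tau` times that spread. The committee, the parameter `tau`, and all function names are hypothetical choices for illustration and are not taken from the paper.

```python
import statistics

def committee_energies(models, x):
    """Evaluate each committee member (a callable) on configuration x."""
    return [m(x) for m in models]

def biased_energy(models, x, tau):
    """Return (mean energy - tau * committee std, committee std).

    Assumption: the committee standard deviation stands in for the
    model uncertainty; tau >= 0 sets the strength of the bias.
    """
    energies = committee_energies(models, x)
    mean = statistics.fmean(energies)
    sigma = statistics.pstdev(energies)
    return mean - tau * sigma, sigma

# Hypothetical 1D committee: quadratic wells with slightly different
# curvature, so the members disagree more the further x is from 0.
committee = [lambda x, k=k: 0.5 * k * x * x for k in (0.9, 1.0, 1.1)]

e0, s0 = biased_energy(committee, 0.0, tau=0.5)
e2, s2 = biased_energy(committee, 2.0, tau=0.5)
print(f"x=0: biased energy {e0:.4f}, uncertainty {s0:.4f}")
print(f"x=2: biased energy {e2:.4f}, uncertainty {s2:.4f}")
```

Because the bias lowers the energy where the committee disagrees, a sampler running on the biased surface is nudged toward configurations the models have not yet learned, which is where new training data is most useful.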

Translated by Google Translate

MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields

Ilyes Batatia, Dávid Péter Kovács, Gregor N. C. Simm, Christoph Ortner, Gábor Csányi

Categories: Machine Learning (Statistics) | Machine Learning

2022-06-15

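Creating fast and accurate force fields is a long-standing challenge in computational chemistry and materials science. Recently, several equivariant message passing neural networks (MPNNs) have been shown to outperform models built using other approaches in terms of accuracy. However, most MPNNs suffer from high computational cost and poor scalability. We propose that these limitations arise because MPNNs only pass two-body messages, leading to a direct relationship between the number of layers and the expressivity of the network. In this work, we introduce MACE, a new MPNN model that uses higher body order messages. In particular, we show that using four-body messages reduces the required number of message passing iterations to just two, resulting in a fast and highly parallelizable model that reaches or exceeds state-of-the-art accuracy on the rMD17, 3BPA, and AcAc benchmark tasks. We also demonstrate that using higher order messages leads to an improved steepness of the learning curves.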

Tensor-reduced atomic density representations

James P. Darby, Dávid P. Kovács, Ilyes Batatia, Miguel A. Caro, Gus L. W. Hart, Christoph Ortner, Gábor Csányi

Categories: Machine Learning

2022-10-02

Density based representations of atomic environments that are invariant under Euclidean symmetries have become a widely used tool in the machine learning of interatomic potentials, broader data-driven atomistic modelling and the visualisation and analysis of materials datasets. The standard mechanism used to incorporate chemical element information is to create separate densities for each element and form tensor products between them. This leads to a steep scaling in the size of the representation as the number of elements increases. Graph neural networks, which do not explicitly use density representations, escape this scaling by mapping the chemical element information into a fixed dimensional space in a learnable way. We recast this approach as tensor factorisation by exploiting the tensor structure of standard neighbour density based descriptors. In doing so, we form compact tensor-reduced representations whose size does not depend on the number of chemical elements, but remain systematically convergeable and are therefore applicable to a wide range of data analysis and regression tasks.

Backflipping with Miniature Quadcopters by Gaussian Process Based Control and Planning

Péter Antal, Tamás Péni, Roland Tóth

Categories: Robotics

2022-09-29

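The paper proposes two control methods for performing a backflip maneuver with miniature quadcopters. First, an existing feedforward control strategy designed specifically for backflipping is revised and improved. The optimal sequence of motion primitives is found by Bayesian optimization using a surrogate Gaussian process model, by repeatedly executing the flip maneuver in a simulation environment. The second method is based on closed-loop control and consists of two main steps. First, an adaptive controller is designed to provide reliable reference tracking even in the presence of model uncertainties. The controller is constructed by augmenting the nominal model of the drone with a Gaussian process that is tuned using measurement data. Second, an efficient trajectory planning algorithm is proposed, which designs feasible trajectories for the backflip maneuver using only quadratic programming. The two approaches are analyzed in simulation and using Bitcraze Crazyflie 2.1 quadcopters.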

Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings

Dávid Sztahó, Attila Fejes

Categories: Natural Language Processing

2022-09-26

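In forensic voice comparison, speaker embeddings have become widely popular over the last ten years. Most pretrained speaker embeddings are trained on English corpora, because these are easily accessible. Language dependency can therefore be an important factor in automatic forensic voice comparison, especially when the target language is linguistically very different. Numerous commercial systems are available, but their models are mainly trained on a language (mostly English) that differs from the target language. For a low-resource language, developing a corpus for forensic purposes that contains enough speakers to train deep learning models is costly. This study investigates whether a model pre-trained on an English corpus can be used on a low-resource target language (Hungarian) that differs from the language the model was trained on. In addition, multiple samples from the offender (unknown speaker) are often not available. Samples are therefore compared with and without speaker enrollment for suspect (known) speakers. Two corpora developed specifically for forensic purposes are used, along with a third intended for traditional speaker verification. Two deep learning based methods for extracting speaker embedding vectors are used: the x-vector and ECAPA-TDNN. Speaker verification was evaluated in the likelihood-ratio framework. Comparisons are made between the language combinations (modeling, LR calibration, evaluation). The results were evaluated with the minCllr and EER metrics. It was found that a model pre-trained on a different language, but on a corpus with a large number of speakers, performs well on samples with language mismatch. The effects of sample duration and speaking style were also examined. The longer the duration of the sample in question, the better the performance. There is no real difference when various speaking styles are used.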

Domain adaptation strategies for cancer-independent detection of lymph node metastases

Péter Bándi, Maschenka Balkenhol, Marcory van Dijk, Bram van Ginneken, Jeroen van der Laak, Geert Litjens

Categories: Computer Vision | Machine Learning

2022-07-13

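Recently, large, high-quality public datasets have led to the development of convolutional neural networks that can detect lymph node metastases of breast cancer at the level of expert pathologists. Many cancers, regardless of the site of origin, can metastasize to lymph nodes. However, collecting and annotating high-volume, high-quality datasets for every cancer type is challenging. In this paper we investigate how to leverage existing high-quality datasets most efficiently in multi-task settings for closely related tasks. Specifically, we explore different training and domain adaptation strategies, including prevention of catastrophic forgetting, for colon and head-and-neck cancer metastasis detection in lymph nodes. Our results show state-of-the-art performance on both cancer metastasis detection tasks. Furthermore, we show the effectiveness of repeated adaptation of networks from one cancer type to another to obtain multi-task metastasis detection networks. Finally, we show that leveraging existing high-quality datasets can significantly boost performance on new target tasks and that catastrophic forgetting can be effectively mitigated using regularization.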

BiometricBlender: Ultra-high dimensional, multi-class synthetic data generator to imitate biometric feature space

Marcell Stippinger, Dávid Hanák, Marcell T. Kurbucz, Gergely Hanczár, Olivér M. Törteli, Zoltán Somogyvári

Categories: Machine Learning | Artificial Intelligence

2022-06-21

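The lack of freely available (real-life or synthetic) high or ultra-high dimensional, multi-class datasets may hamper the rapidly growing research on feature screening, especially in the field of biometrics, where the use of such datasets is common. This paper reports a Python package called BiometricBlender, an ultra-high dimensional, multi-class synthetic data generator for benchmarking a wide range of feature screening methods. During data generation, the user can control the overall usefulness and the intercorrelations of the blended features, so the synthetic feature space is able to imitate the key properties of a real biometric dataset.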

Learning the parameters of a differential equation from its trajectory via the adjoint equation

Imre Fekete, András Molnár, Péter L. Simon

Categories: Machine Learning

2022-06-17

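The paper contributes to strengthening the relation between machine learning and the theory of differential equations. In this context, the inverse problem of fitting the parameters and the initial condition of a differential equation to some measurements constitutes a key issue. The paper explores an abstraction that can be used to construct a family of loss functions, with the aim of fitting the solution of an initial value problem to a set of discrete or continuous measurements. It is shown that an extension of the adjoint equation can be used to derive the gradient of the loss function as a continuous analogue of backpropagation in machine learning. Numerical evidence is presented that, under reasonably controlled circumstances, the gradients obtained this way can be used in gradient descent to fit the solution of an initial value problem to a set of continuous noisy measurements, and to a set of discrete noisy measurements recorded at uncertain times.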
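To make the adjoint idea concrete, here is a minimal sketch for a scalar test problem chosen purely for illustration: dx/dt = -theta * x with loss L = 0.5 * integral of (x - y)^2 over [0, T]. It uses the standard continuous adjoint (d lam/dt = -(x - y) + theta * lam with lam(T) = 0, and dL/dtheta = integral of lam * (-x) dt), discretized with simple Euler steps. The test equation, the step sizes, and all function names are assumptions of this sketch and do not reproduce the paper's general construction.

```python
import math

def simulate(theta, x0, dt, n_steps):
    """Forward Euler for dx/dt = -theta * x."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * (-theta * xs[-1]))
    return xs

def loss(theta, x0, ys, dt):
    """Left Riemann sum for 0.5 * integral of (x - y)^2 dt."""
    xs = simulate(theta, x0, dt, len(ys) - 1)
    return sum(0.5 * (x - y) ** 2 * dt for x, y in zip(xs[:-1], ys[:-1]))

def adjoint_gradient(theta, x0, ys, dt):
    """Approximate dL/dtheta by integrating the adjoint backward in time."""
    n = len(ys) - 1
    xs = simulate(theta, x0, dt, n)
    lam = [0.0] * (n + 1)            # terminal condition lam(T) = 0
    for k in range(n, 0, -1):        # step from t_k back to t_{k-1}
        dlam = -(xs[k] - ys[k]) + theta * lam[k]
        lam[k - 1] = lam[k] - dt * dlam
    # dL/dtheta = integral of lam * df/dtheta dt, with df/dtheta = -x
    return sum(l * (-x) * dt for l, x in zip(lam[:-1], xs[:-1]))

# Synthetic "measurements" from theta = 0.5; the current guess is theta = 1.0.
T, n = 2.0, 4000
dt = T / n
ys = [math.exp(-0.5 * k * dt) for k in range(n + 1)]

grad_adjoint = adjoint_gradient(1.0, 1.0, ys, dt)
h = 1e-5
grad_fd = (loss(1.0 + h, 1.0, ys, dt) - loss(1.0 - h, 1.0, ys, dt)) / (2 * h)
print(grad_adjoint, grad_fd)
```

The adjoint gradient and the finite-difference check should agree up to discretization error. The gradient is positive here, so a gradient descent step would lower the guess toward the value that generated the data.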

DialogueScript: Using Dialogue Agents to Produce a Script

Patrícia Schmidtová, Dávid Javorský, Christián Mikláš, Tomáš Musil, Rudolf Rosa, Ondřej Dušek

Categories: Natural Language Processing

2022-06-16

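We present a novel approach to generating scripts by using agents with different personality types. To manage character interaction in the script, we employ simulated dramatic networks. Automatic and human evaluation on multiple criteria shows that our approach outperforms a vanilla-GPT2-based baseline. We further introduce a new metric for evaluating dialogue consistency based on natural language inference and demonstrate its validity.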

Towards Robotic Laboratory Automation Plug & Play: Survey and Concept Proposal on Teaching-free Robot Integration with the LAPP Digital Twin

Ádám Wolf, Stefan Romeder-Finger, Károly Széll, Péter Galambos

Categories: Robotics

2022-05-17

The Laboratory Automation Plug & Play (LAPP) framework is an over-arching reference architecture concept for the integration of robots in life science laboratories. The plug & play nature lies in the fact that manual configuration is not required, including the teaching of the robots. In this paper a digital twin (DT) based concept is proposed that outlines the types of information that have to be provided for each relevant component of the system. In particular, for the devices interfacing with the robot, the robot positions have to be defined beforehand in a device-attached coordinate system (CS) by the vendor. This CS has to be detectable by the vision system of the robot by means of optical markers placed on the front side of the device. With that, the robot is capable of tending the machine by performing the pick-and-place type transportation of standard sample carriers. This basic use case is the primary scope of the LAPP-DT framework. The hardware scope is limited to simple benchtop and mobile manipulators with parallel grippers at this stage. This paper first provides an overview of relevant literature and state-of-the-art solutions, after which it outlines the framework on the conceptual level, followed by the specification of the relevant DT parameters for the robot, for the devices and for the facility. Finally, appropriate technologies and strategies are identified for the implementation.
