Recent wave energy converters (WECs) are equipped with multiple legs and generators to maximize energy generation. Traditional controllers have shown limitations in capturing complex wave patterns, and the controllers must efficiently maximize energy capture. This paper introduces a Multi-Agent Reinforcement Learning (MARL) controller that outperforms the traditionally used spring-damper controller. Our initial study showed that the complex nature of the problem makes it hard for training to converge. We therefore propose a novel skip-training approach that enables the MARL training to overcome performance saturation and converge to a more optimal controller than default MARL training, boosting energy generation. We further propose another novel hybrid training initialization (STHTI) approach, in which the individual agents of the MARL controller are first trained against the baseline spring-damper (SD) controller individually, and then trained one agent at a time or all together in future iterations to accelerate convergence. We achieved double-digit gains in energy efficiency over the baseline spring-damper controller with the proposed MARL controllers using the Asynchronous Advantage Actor-Critic (A3C) algorithm.
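The skip-training mechanics are not spelled out above, but the STHTI schedule is concrete enough to sketch. Below is a minimal, hypothetical Python sketch of that schedule; the environment, agents, and `train_step` update rule are stand-in stubs (not from the paper), and only the training order reflects the description.

```python
# Hypothetical sketch of the hybrid training initialization (STHTI)
# schedule described above. Everything except the training order is a
# stand-in assumption: `train_step` here only reports what would happen.
from typing import Callable, List, Set

def sthti_schedule(n_agents: int, solo_epochs: int, joint_epochs: int,
                   train_step: Callable[[List[str], Set[int]], None]) -> None:
    # Phase 1: each agent learns alone while the other legs are driven by
    # the baseline spring-damper (SD) controller.
    for i in range(n_agents):
        controllers = ["SD"] * n_agents
        controllers[i] = f"agent{i}"
        for _ in range(solo_epochs):
            train_step(controllers, {i})

    agents = [f"agent{i}" for i in range(n_agents)]
    # Phase 2, variant A: fine-tune one agent at a time (round-robin) ...
    for epoch in range(joint_epochs):
        train_step(agents, {epoch % n_agents})
    # Phase 2, variant B: ... or train all agents together.
    for _ in range(joint_epochs):
        train_step(agents, set(range(n_agents)))

# Toy usage: print which controllers act and which agent indices update.
sthti_schedule(3, solo_epochs=1, joint_epochs=3,
               train_step=lambda c, t: print(c, "trainable:", sorted(t)))
```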
The ability to predict chemical reactions and their properties is fundamental to science and technology. To achieve such skill, it is important to develop good representations of chemical reactions, or good deep-learning architectures that can learn such representations automatically from data. There is currently no universal and widely adopted method for robustly representing chemical reactions. Most existing methods suffer from one or more drawbacks, such as: (1) lacking generality; (2) lacking robustness; (3) lacking interpretability; or (4) requiring excessive manual preprocessing. Here, we exploit graph-based representations of molecular structures to develop and test a hypergraph attention neural network approach that tackles the reaction-representation and property-prediction problems at once, alleviating the aforementioned drawbacks. We evaluate this hypergraph representation in three experiments using three independent datasets of chemical reactions. In all experiments, the hypergraph-based approach matches or outperforms other representations and their corresponding models of chemical reactions while yielding interpretable multi-level representations.
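As a rough illustration of the core mechanism, here is a generic hypergraph-attention aggregation layer in PyTorch. This is not the paper's architecture; it is only a minimal sketch of how attention can pool the node features of one hyperedge (e.g., the molecules participating in a reaction) into a single hyperedge representation.

```python
# Generic attention pooling of node features into a hyperedge embedding.
# A minimal sketch, not the paper's model: sizes and scoring MLP assumed.
import torch
import torch.nn as nn

class HyperedgeAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, 1, bias=False))

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (n_nodes, dim) features of the nodes in one hyperedge.
        attn = torch.softmax(self.score(node_feats), dim=0)  # (n_nodes, 1)
        return (attn * node_feats).sum(dim=0)                # (dim,)

# Toy usage: pool 5 node embeddings of width 16 into one reaction vector.
layer = HyperedgeAttention(16)
print(layer(torch.randn(5, 16)).shape)  # torch.Size([16])
```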
The creation of unstable heavy particles at the Large Hadron Collider is the most direct way to address some of the deepest open questions in physics. Collisions typically produce variable-size sets of observed particles with inherent ambiguities that complicate the assignment of the observed particles to the decay products of the heavy particles. Current strategies in the physics community for tackling these challenges ignore the physical symmetries of the decay products, consider all possible assignment permutations, and do not scale to complex configurations. Attention-based deep-learning methods for sequence modelling have achieved state-of-the-art performance in natural language processing, but they lack built-in mechanisms to deal with the unique symmetries found in physical set-assignment problems. We introduce a novel method for constructing symmetry-preserving attention networks that reflect the problem's natural invariances and efficiently find assignments without evaluating all permutations. This general approach is applicable to arbitrarily complex configurations and significantly outperforms current methods, improving reconstruction efficiency by 19%-35% on typical benchmark problems while decreasing inference time by two to five orders of magnitude on the most complex events, making many important and previously intractable cases tractable. A complete code repository containing a general-purpose library, the specific configurations used, and a full dataset release is available at https://github.com/alexanders101/spanet
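The key structural idea, attention scores that are invariant under the problem's symmetries, can be sketched for the simplest case of two interchangeable decay products. The PyTorch snippet below enforces a symmetric pairwise score by symmetrizing the learned weight matrix; the paper's construction generalizes this to higher-rank tensors and richer symmetry groups, and the shapes and scoring form here are assumptions.

```python
# Minimal sketch of symmetry-preserving pairwise attention: for two
# interchangeable decay products, the joint score must satisfy
# S[i, j] == S[j, i], which the symmetrized weight guarantees.
import torch
import torch.nn as nn

class SymmetricPairScore(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.w = nn.Parameter(torch.randn(dim, dim) / dim ** 0.5)

    def forward(self, jets: torch.Tensor) -> torch.Tensor:
        # jets: (n_jets, dim). Only the symmetric part of w contributes,
        # so swapping the two assigned jets never changes the score.
        w_sym = 0.5 * (self.w + self.w.T)
        return jets @ w_sym @ jets.T       # (n_jets, n_jets), symmetric

scores = SymmetricPairScore(8)(torch.randn(6, 8))
assert torch.allclose(scores, scores.T)    # invariant under jet swap
```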
Top quarks, produced in large numbers at the Large Hadron Collider, have a complex detector signature and require special reconstruction techniques. The most common decay mode, the "all-jet" channel, results in a 6-jet final state that is particularly difficult to reconstruct in $pp$ collisions due to the large number of possible permutations. We present a novel approach to this problem based on neural networks with a generalized attention mechanism, which we call Symmetry Preserving Attention Networks (SPA-NET). We train one such network to identify the decay products of each top quark unambiguously and without combinatorial explosion, as an example of the power of this technique. This approach significantly outperforms existing state-of-the-art methods, correctly assigning all jets in $93.0\%$ of $6$-jet, $87.8\%$ of $7$-jet, and $82.6\%$ of $\geq 8$-jet events.
In recent years, deep learning has seen increasing use in histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole-Slide-Images under domain shift using the H\&E stained Camelyon17 breast cancer dataset. Although it is known that histopathological data can be subject to strong domain shift and label noise, to our knowledge this is the first work that compares the most common methods for uncertainty estimation under these aspects. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation, as well as combinations thereof. We observe that ensembles of methods generally lead to higher accuracies and better calibration, and that Test-Time Data Augmentation can be a promising alternative when choosing an appropriate set of augmentations. Across methods, rejecting the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution and out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise and evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
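Two of the compared ingredients, Monte-Carlo Dropout and rejection of the most uncertain tiles, are simple enough to sketch. The snippet below is a minimal PyTorch illustration with a stand-in model and random data, not the evaluated pipeline.

```python
# Minimal sketch: MC Dropout uncertainty plus rejection of the most
# uncertain tiles. Model, data, and thresholds are illustrative stand-ins.
import torch
import torch.nn as nn

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(-1) for _ in range(n_samples)])
    mean = probs.mean(0)                                      # predictive mean
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(-1)   # uncertainty
    return mean, entropy

def accuracy_after_rejection(mean, entropy, labels, reject_frac=0.2):
    # Keep the tiles with the lowest predictive entropy, drop the rest.
    keep = entropy.argsort()[: int(len(entropy) * (1 - reject_frac))]
    return (mean[keep].argmax(-1) == labels[keep]).float().mean().item()

# Toy usage with a dropout MLP on random "tiles".
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.5),
                      nn.Linear(64, 2))
x, y = torch.randn(100, 32), torch.randint(0, 2, (100,))
mean, ent = mc_dropout_predict(model, x)
print(accuracy_after_rejection(mean, ent, y, reject_frac=0.2))
```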
Charisma is considered one's ability to attract and potentially also influence others. Clearly, there can be considerable interest from an artificial intelligence's (AI) perspective to provide it with such skill. Beyond that, a plethora of use cases opens up for computational measurement of human charisma, such as tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. A number of models exist that base charisma on various dimensions, often following the idea that charisma is given if someone could and would help others. Examples include influence (could help) and affability (would help) in scientific studies, or power (could help), presence, and warmth (both would help) as a popular concept. Modelling high levels in these dimensions for humanoid robots or virtual agents seems accomplishable. Moreover, automatic measurement appears quite feasible with the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we therefore present a blueprint for building machines that can appear charismatic, but also analyse the charisma of others. To this end, we first provide the psychological perspective, including different models of charisma and behavioural cues of it. We then switch to conversational charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behaviour by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then name exemplary use cases of computational charismatic skills, before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.
Deep learning-based 3D human pose estimation performs best when trained on large amounts of labeled data, making combined learning from many datasets an important research direction. One obstacle to this endeavor is the different skeleton formats provided by different datasets, i.e., they do not label the same set of anatomical landmarks. There is little prior research on how to best supervise one model with such discrepant labels. We show that simply using separate output heads for different skeletons results in inconsistent depth estimates and insufficient information sharing across skeletons. As a remedy, we propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks. The discovered latent 3D points capture the redundancy among skeletons, enabling enhanced information sharing when used for consistency regularization. Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model, which outperforms prior work on a range of benchmarks, including the challenging 3D Poses in the Wild (3DPW) dataset. Our code and models are available for research purposes.
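The essence of the ACAE, latent points formed as affine combinations of landmarks (weights summing to 1, hence translation-equivariant), can be sketched as follows. This PyTorch snippet is an assumption-laden illustration: the latent size, initialization, and normalization are placeholders, not the paper's exact construction.

```python
# Illustrative affine-combining autoencoder: latent 3D points are affine
# combinations of input landmarks, and landmarks are reconstructed as
# affine combinations of latent points. Details here are assumptions.
import torch
import torch.nn as nn

class ACAE(nn.Module):
    def __init__(self, n_landmarks: int, n_latent: int):
        super().__init__()
        self.enc = nn.Parameter(torch.rand(n_latent, n_landmarks))
        self.dec = nn.Parameter(torch.rand(n_landmarks, n_latent))

    @staticmethod
    def affine(w: torch.Tensor) -> torch.Tensor:
        # Rescale rows to sum to 1, so each output point is an affine
        # combination of its inputs (translation-equivariant by design).
        return w / w.sum(dim=-1, keepdim=True)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, n_landmarks, 3)
        latent = self.affine(self.enc) @ points   # (batch, n_latent, 3)
        return self.affine(self.dec) @ latent     # reconstruction

model = ACAE(n_landmarks=28, n_latent=10)
print(model(torch.randn(4, 28, 3)).shape)  # torch.Size([4, 28, 3])
```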
This article concerns Bayesian inference using deep linear networks with output dimension one. In the interpolating (zero noise) regime we show that with Gaussian weight priors and MSE negative log-likelihood loss both the predictive posterior and the Bayesian model evidence can be written in closed form in terms of a class of meromorphic special functions called Meijer-G functions. These results are non-asymptotic and hold for any training dataset, network depth, and hidden layer widths, giving exact solutions to Bayesian interpolation using a deep Gaussian process with a Euclidean covariance at each layer. Through novel asymptotic expansions of Meijer-G functions, a rich new picture of the role of depth emerges. Specifically, we find that the posteriors in deep linear networks with data-independent priors are the same as in shallow networks with evidence maximizing data-dependent priors. In this sense, deep linear networks make provably optimal predictions. We also prove that, starting from data-agnostic priors, Bayesian model evidence in wide networks is only maximized at infinite depth. This gives a principled reason to prefer deeper networks (at least in the linear case). Finally, our results show that with data-agnostic priors a novel notion of effective depth given by \[\#\text{hidden layers}\times\frac{\#\text{training data}}{\text{network width}}\] determines the Bayesian posterior in wide linear networks, giving rigorous new scaling laws for generalization error.
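A quick arithmetic illustration of this effective-depth notion (the numbers are arbitrary, chosen only for illustration): a network with 10 hidden layers trained on 1000 points at width 500 has \[\#\text{hidden layers}\times\frac{\#\text{training data}}{\text{network width}}=10\times\frac{1000}{500}=20,\] i.e., in this Bayesian sense it behaves like a network of effective depth 20.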
In this paper we study the smooth strongly convex minimization problem $\min_{x}\min_y f(x,y)$. The existing optimal first-order methods require $\mathcal{O}(\sqrt{\max\{\kappa_x,\kappa_y\}} \log 1/\epsilon)$ of computations of both $\nabla_x f(x,y)$ and $\nabla_y f(x,y)$, where $\kappa_x$ and $\kappa_y$ are condition numbers with respect to variable blocks $x$ and $y$. We propose a new algorithm that only requires $\mathcal{O}(\sqrt{\kappa_x} \log 1/\epsilon)$ of computations of $\nabla_x f(x,y)$ and $\mathcal{O}(\sqrt{\kappa_y} \log 1/\epsilon)$ computations of $\nabla_y f(x,y)$. In some applications $\kappa_x \gg \kappa_y$, and computation of $\nabla_y f(x,y)$ is significantly cheaper than computation of $\nabla_x f(x,y)$. In this case, our algorithm substantially outperforms the existing state-of-the-art methods.
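A hypothetical numerical reading of these bounds (values chosen only for illustration): with $\kappa_x=10^4$ and $\kappa_y=10^2$, the existing optimal methods require on the order of \[\sqrt{\max\{\kappa_x,\kappa_y\}}\log 1/\epsilon = 100\log 1/\epsilon\] computations of both gradients, whereas the proposed algorithm still needs about $100\log 1/\epsilon$ computations of $\nabla_x f(x,y)$ but only $\sqrt{\kappa_y}\log 1/\epsilon = 10\log 1/\epsilon$ computations of the (possibly much cheaper) $\nabla_y f(x,y)$.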
This paper presents a solution to the GenChal 2022 shared task dedicated to feedback comment generation for writing learning. In this task, given a text with an error and the span of the error, a system generates an explanatory note that helps the writer (a language learner) improve their writing skills. Our solution is based on fine-tuning the T5 model on the initial dataset, augmented according to the syntactic dependencies of the words located within the indicated error span. The solution of our team "nigula" obtained second place according to manual evaluation by the organizers.
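A minimal sketch of the described setup, fine-tuning T5 to map an erroneous sentence with a marked error span to a feedback comment, could look as follows with Hugging Face transformers. The input/output formatting, the `<err>` markers, and all hyperparameters are assumptions, not the team's exact configuration.

```python
# Hypothetical sketch of feedback-comment generation with T5; the data
# formatting and <err> span markers are illustrative assumptions.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# One training pair: the error span is wrapped in <err> ... </err>.
source = "generate feedback: I <err>look forward to hear</err> from you."
target = "After 'look forward to', use the gerund form: 'to hearing'."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss  # standard seq2seq loss
loss.backward()                             # an optimizer step would follow

# At inference time, feedback comments are generated autoregressively.
print(tokenizer.decode(model.generate(**inputs, max_length=48)[0],
                       skip_special_tokens=True))
```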