许多经济比赛和机器学习方法可以作为竞争优化问题,其中多个代理可以最大限度地减少其各自的目标函数,这取决于所有代理的行动。虽然梯度下降是单代理优化的可靠基本工作,但它通常会导致竞争优化的振荡。在这项工作中,我们提出了PolyATrix竞争梯度下降(PCGD)作为解决涉及任意数量的代理的通用和竞争优化的方法。我们的方法的更新是通过二次正则化的局部Polypatrix近似的纳什均衡,并且可以通过求解方程的线性系统有效地计算。我们证明了PCGD的本地融合以获得$ N $ -Player General Sum Games的稳定定点,并显示它不需要将步长调整到玩家交互的强度。我们使用PCGD优化多功能钢筋学习的政策,并展示其在蛇,马尔可夫足球和电力市场游戏中的优势。由PCGD优先效果培训的代理经过培训,具有同步梯度下降,辛渐变调整和蛇和马尔可夫足球比赛的Extragradient以及电力市场游戏,PCGD列达速度比同时梯度下降和自特殊方法。
translated by 谷歌翻译
深度神经网络通常以随机重量初始化,并具有足够选择的初始方差,以确保训练期间稳定的信号传播。但是,选择适当的方差变得具有挑战性,尤其是随着层数的增长。在这项工作中,我们用完全确定性的初始化方案(即零)代替随机权重初始化,该方案基于身份和Hadamard变换来初始用零和一个(最高范围化因子)开始网络的权重。通过理论和实证研究,我们证明了零能够训练网络而不会损害其表现力。在Resnet上应用零在包括Imagenet在内的各种数据集上实现最先进的性能,这表明随机权重可能不需要网络初始化。此外,零具有许多好处,例如训练超深网络(没有批处理规范化),表现出低级别的学习轨迹,从而导致低级和稀疏的解决方案,并提高培训可重复性。
translated by 谷歌翻译
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with such. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotation entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
translated by 谷歌翻译
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly adapted for high-dimensional directional data such as texts. We propose in this article to estimate a von Mises mixture using a l 1 penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood one with a path following algorithm. The model's behaviour is studied on simulated data and, we show the advantages of the approach on real data benchmark. We also introduce a new data set on financial reports and exhibit the benefits of our method for exploratory analysis.
translated by 谷歌翻译
Passive monitoring of acoustic or radio sources has important applications in modern convenience, public safety, and surveillance. A key task in passive monitoring is multiobject tracking (MOT). This paper presents a Bayesian method for multisensor MOT for challenging tracking problems where the object states are high-dimensional, and the measurements follow a nonlinear model. Our method is developed in the framework of factor graphs and the sum-product algorithm (SPA). The multimodal probability density functions (pdfs) provided by the SPA are effectively represented by a Gaussian mixture model (GMM). To perform the operations of the SPA in high-dimensional spaces, we make use of Particle flow (PFL). Here, particles are migrated towards regions of high likelihood based on the solution of a partial differential equation. This makes it possible to obtain good object detection and tracking performance even in challenging multisensor MOT scenarios with single sensor measurements that have a lower dimension than the object positions. We perform a numerical evaluation in a passive acoustic monitoring scenario where multiple sources are tracked in 3-D from 1-D time-difference-of-arrival (TDOA) measurements provided by pairs of hydrophones. Our numerical results demonstrate favorable detection and estimation accuracy compared to state-of-the-art reference techniques.
translated by 谷歌翻译
Location-aware networks will introduce new services and applications for modern convenience, surveillance, and public safety. In this paper, we consider the problem of cooperative localization in a wireless network where the position of certain anchor nodes can be controlled. We introduce an active planning method that aims at moving the anchors such that the information gain of future measurements is maximized. In the control layer of the proposed method, control inputs are calculated by minimizing the traces of approximate inverse Bayesian Fisher information matrixes (FIMs). The estimation layer computes estimates of the agent states and provides Gaussian representations of marginal posteriors of agent positions to the control layer for approximate Bayesian FIM computations. Based on a cost function that accumulates Bayesian FIM contributions over a sliding window of discrete future timesteps, a receding horizon (RH) control is performed. Approximations that make it possible to solve the resulting tree-search problem efficiently are also discussed. A numerical case study demonstrates the intelligent behavior of a single controlled anchor in a 3-D scenario and the resulting significantly improved localization accuracy.
translated by 谷歌翻译
This paper presents an introduction to the state-of-the-art in anomaly and change-point detection. On the one hand, the main concepts needed to understand the vast scientific literature on those subjects are introduced. On the other, a selection of important surveys and books, as well as two selected active research topics in the field, are presented.
translated by 谷歌翻译
Our aim is to build autonomous agents that can solve tasks in environments like Minecraft. To do so, we used an imitation learning-based approach. We formulate our control problem as a search problem over a dataset of experts' demonstrations, where the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model. The agent copies the actions from the expert trajectory as long as the distance between the state representations of the agent and the selected expert trajectory from the dataset do not diverge. Then the proximity search is repeated. Our approach can effectively recover meaningful demonstration trajectories and show human-like behavior of an agent in the Minecraft environment.
translated by 谷歌翻译
This short paper discusses continually updated causal abstractions as a potential direction of future research. The key idea is to revise the existing level of causal abstraction to a different level of detail that is both consistent with the history of observed data and more effective in solving a given task.
translated by 谷歌翻译
This project leverages advances in multi-agent reinforcement learning (MARL) to improve the efficiency and flexibility of order-picking systems for commercial warehouses. We envision a warehouse of the future in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput) under given resource constraints. Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, the MARL framework can be flexibly applied to any warehouse configuration (e.g. size, layout, number/types of workers, item replenishment frequency) and the agents learn via a process of trial-and-error how to optimally cooperate with one another. This paper details the current status of the R&D effort initiated by Dematic and the University of Edinburgh towards a general-purpose and scalable MARL solution for the order-picking problem in realistic warehouses.
translated by 谷歌翻译