An oft-cited open problem of federated learning is the existence of data heterogeneity at the clients. One pathway to understanding the drastic accuracy drop in federated learning is by scrutinizing the behavior of the clients' deep models on data with different levels of "difficulty", which has been left unaddressed. In this paper, we investigate a different and rarely studied dimension of FL: ordered learning. Specifically, we aim to investigate how ordered learning principles can contribute to alleviating the heterogeneity effects in FL. We present theoretical analysis and conduct extensive empirical studies on the efficacy of orderings spanning three kinds of learning: curriculum, anti-curriculum, and random curriculum. We find that curriculum learning largely alleviates non-IIDness. Interestingly, the more disparate the data distributions across clients the more they benefit from ordered learning. We provide analysis explaining this phenomenon, specifically indicating how curriculum training appears to make the objective landscape progressively less convex, suggesting fast converging iterations at the beginning of the training procedure. We derive quantitative results of convergence for both convex and nonconvex objectives by modeling the curriculum training on federated devices as local SGD with locally biased stochastic gradients. Also, inspired by ordered learning, we propose a novel client selection technique that benefits from the real-world disparity in the clients. Our proposed approach to client selection has a synergic effect when applied together with ordered learning in FL.
translated by 谷歌翻译
聚集的联合学习(FL)已显示通过将客户分组为群集,从而产生有希望的结果。这在单独的客户群在其本地数据的分布方面有显着差异的情况下特别有效。现有的集群FL算法实质上是在试图将客户群体组合在一起,以便同一集群中的客户可以利用彼此的数据来更好地执行联合学习。但是,先前的群集FL算法试图在培训期间间接学习这些分布相似性,这可能会很耗时,因为可能需要许多回合的联合学习,直到群集的形成稳定为止。在本文中,我们提出了一种新的联合学习方法,该方法直接旨在通过分析客户数据子空间之间的主要角度来有效地识别客户之间的分布相似性。每个客户端都以单一的方式在其本地数据上应用截断的奇异值分解(SVD)步骤,以得出一小部分主向量,该量提供了一个签名,可简洁地捕获基础分布的主要特征。提供了一组主要的主向量,以便服务器可以直接识别客户端之间的分布相似性以形成簇。这是通过比较这些主要向量跨越的客户数据子空间之间主要角度的相似性来实现的。该方法提供了一个简单而有效的集群FL框架,该框架解决了广泛的数据异质性问题,而不是标签偏斜的更简单的非iids形式。我们的聚类FL方法还可以为非凸目标目标提供融合保证。我们的代码可在https://github.com/mmorafah/pacfl上找到。
translated by 谷歌翻译
在存在参与者的非IID数据分布的情况下,经典联合学习方法会产生明显的绩效降解。当每个本地数据集的分布与全局数据集有很大不同时,每个客户端的本地目标将与全局Optima不一致,从而导致本地更新中的漂移。这种现象极大地影响了客户的表现。这是为了让客户参加联合学习的主要动力是获得更好的个性化模型。为了解决上述问题,我们提出了一种新的算法弗利斯(Flis),该算法通过利用客户模型的推理相似性,将客户人口与可共同训练数据分布的群集分组。该框架捕获了设置,其中不同的用户组具有自己的目标(学习任务),但通过在同一集群(相同的学习任务)中汇总其数据以执行更有效和个性化的联合学习。我们提出了实验结果,以证明FLIS比CIFAR-100/10,SVHN和FMNIST数据集的最先进基准的好处。我们的代码可在https://github.com/mmorafah/flis上找到。
translated by 谷歌翻译
Compliance in actuation has been exploited to generate highly dynamic maneuvers such as throwing that take advantage of the potential energy stored in joint springs. However, the energy storage and release could not be well-timed yet. On the contrary, for multi-link systems, the natural system dynamics might even work against the actual goal. With the introduction of variable stiffness actuators, this problem has been partially addressed. With a suitable optimal control strategy, the approximate decoupling of the motor from the link can be achieved to maximize the energy transfer into the distal link prior to launch. However, such continuous stiffness variation is complex and typically leads to oscillatory swing-up motions instead of clear launch sequences. To circumvent this issue, we investigate decoupling for speed maximization with a dedicated novel actuator concept denoted Bi-Stiffness Actuation. With this, it is possible to fully decouple the link from the joint mechanism by a switch-and-hold clutch and simultaneously keep the elastic energy stored. We show that with this novel paradigm, it is not only possible to reach the same optimal performance as with power-equivalent variable stiffness actuation, but even directly control the energy transfer timing. This is a major step forward compared to previous optimal control approaches, which rely on optimizing the full time-series control input.
translated by 谷歌翻译
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of such approaches of addressing the COD has led us to solving high-dimensional PDEs. This has resulted in opening doors to solving a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. Tackling these shortcomings, Tensor Neural Networks (TNN) demonstrate that they can provide significant parameter savings while attaining the same accuracy as compared to the classical Dense Neural Network (DNN). In addition, we also show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
translated by 谷歌翻译
In a sequential decision-making problem, having a structural dependency amongst the reward distributions associated with the arms makes it challenging to identify a subset of alternatives that guarantees the optimal collective outcome. Thus, besides individual actions' reward, learning the causal relations is essential to improve the decision-making strategy. To solve the two-fold learning problem described above, we develop the 'combinatorial semi-bandit framework with causally related rewards', where we model the causal relations by a directed graph in a stationary structural equation model. The nodal observation in the graph signal comprises the corresponding base arm's instantaneous reward and an additional term resulting from the causal influences of other base arms' rewards. The objective is to maximize the long-term average payoff, which is a linear function of the base arms' rewards and depends strongly on the network topology. To achieve this objective, we propose a policy that determines the causal relations by learning the network's topology and simultaneously exploits this knowledge to optimize the decision-making process. We establish a sublinear regret bound for the proposed algorithm. Numerical experiments using synthetic and real-world datasets demonstrate the superior performance of our proposed method compared to several benchmarks.
translated by 谷歌翻译
Remote state estimation of large-scale distributed dynamic processes plays an important role in Industry 4.0 applications. In this paper, we focus on the transmission scheduling problem of a remote estimation system. First, we derive some structural properties of the optimal sensor scheduling policy over fading channels. Then, building on these theoretical guidelines, we develop a structure-enhanced deep reinforcement learning (DRL) framework for optimal scheduling of the system to achieve the minimum overall estimation mean-square error (MSE). In particular, we propose a structure-enhanced action selection method, which tends to select actions that obey the policy structure. This explores the action space more effectively and enhances the learning efficiency of DRL agents. Furthermore, we introduce a structure-enhanced loss function to add penalties to actions that do not follow the policy structure. The new loss function guides the DRL to converge to the optimal policy structure quickly. Our numerical experiments illustrate that the proposed structure-enhanced DRL algorithms can save the training time by 50% and reduce the remote estimation MSE by 10% to 25% when compared to benchmark DRL algorithms. In addition, we show that the derived structural properties exist in a wide range of dynamic scheduling problems that go beyond remote state estimation.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent detection task: 1) text-based, 2) lattice-based, and a novel 3) multimodal approach. Our work provides a comprehensive analysis of what could be the achievable performance of different state-of-the-art SLU systems under different circumstances, e.g., automatically- vs. manually-generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) outputs allows SLU systems to improve in comparison to the 1-best setup (4% relative improvement). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtains performance similar to the oracle setup, and a relative improvement of 18% over the 1-best configuration. Thus, crossmodal architectures represent a good alternative to overcome the limitations of working purely automatically generated textual data.
translated by 谷歌翻译
This paper describes a simple yet efficient repetition-based modular system for speeding up air-traffic controllers (ATCos) training. E.g., a human pilot is still required in EUROCONTROL's ESCAPE lite simulator (see https://www.eurocontrol.int/simulator/escape) during ATCo training. However, this need can be substituted by an automatic system that could act as a pilot. In this paper, we aim to develop and integrate a pseudo-pilot agent into the ATCo training pipeline by merging diverse artificial intelligence (AI) powered modules. The system understands the voice communications issued by the ATCo, and, in turn, it generates a spoken prompt that follows the pilot's phraseology to the initial communication. Our system mainly relies on open-source AI tools and air traffic control (ATC) databases, thus, proving its simplicity and ease of replicability. The overall pipeline is composed of the following: (1) a submodule that receives and pre-processes the input stream of raw audio, (2) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (3) a high-level ATC-related entity parser, which extracts relevant information from the communication, i.e., callsigns and commands, and finally, (4) a speech synthesizer submodule that generates responses based on the high-level ATC entities previously extracted. Overall, we show that this system could pave the way toward developing a real proof-of-concept pseudo-pilot system. Hence, speeding up the training of ATCos while drastically reducing its overall cost.
translated by 谷歌翻译