Federated learning (FL) is a privacy-preserving distributed machine learning technique that trains models without direct access to the raw data generated on devices. Since devices may be resource constrained, offloading can be used to improve FL performance by transferring computational workloads from devices to edge servers. However, due to mobility, devices participating in FL may leave the network during training and need to connect to a different edge server. This is challenging because the computation offloaded to the edge server needs to be migrated. In line with this assertion, we present FedFly, which is, to the best of our knowledge, the first work to migrate a deep neural network (DNN) when devices move between edge servers during FL training. Our empirical results on the CIFAR-10 dataset, with both balanced and imbalanced data distributions, support our claim that FedFly can reduce training time by up to 33% when a device moves after 50% of the training is completed, and by up to 55% when 90% of the training is completed, compared to the state-of-the-art offloading approach in FL. FedFly has a negligible overhead of 2 seconds and does not compromise accuracy. Finally, we highlight a number of open research issues for further investigation. FedFly can be downloaded from https://github.com/qub-blesson/fedfly
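To make the migration step concrete, here is a minimal sketch of moving an offloaded model partition (weights plus optimizer state) between two edge servers. It is illustrative only and does not reproduce the actual FedFly implementation at the repository above; the function names are hypothetical.

```python
# Minimal sketch, assuming the server-side partition is a torch.nn.Module:
# serialize it on the source edge server, restore it on the destination.
import io
import torch
import torch.nn as nn

def serialize_partition(server_model: nn.Module,
                        optimizer: torch.optim.Optimizer) -> bytes:
    """Pack the offloaded layers and optimizer state into a byte blob."""
    buffer = io.BytesIO()
    torch.save({"model": server_model.state_dict(),
                "optim": optimizer.state_dict()}, buffer)
    return buffer.getvalue()

def restore_partition(blob: bytes, server_model: nn.Module,
                      optimizer: torch.optim.Optimizer) -> None:
    """Unpack the blob on the destination edge server and resume training."""
    state = torch.load(io.BytesIO(blob))
    server_model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optim"])

# Example: the device's offloaded layers move from edge server A to B.
server_side = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(server_side.parameters(), lr=0.01)
blob = serialize_partition(server_side, opt)   # on edge server A
restore_partition(blob, server_side, opt)      # on edge server B
```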
Applying federated learning (FL) to Internet of Things (IoT) devices is necessitated by the large volumes of data they produce and by growing concerns over data privacy. However, three challenges must be addressed to make FL efficient: (i) execution on devices with limited computational capability, (ii) accounting for stragglers caused by the computational heterogeneity of devices, and (iii) adapting to changing network bandwidth. This paper presents FedAdapt, an adaptive offloading FL framework that mitigates these challenges. FedAdapt accelerates local training on computationally constrained devices by offloading layers of a deep neural network (DNN) to a server. Furthermore, FedAdapt adopts reinforcement-learning-based optimization and clustering to adaptively identify, for each individual device, which layers of the DNN should be offloaded to the server, thereby addressing the challenges of computational heterogeneity and changing network bandwidth. Experimental studies are carried out on a lab-based testbed comprising five IoT devices. By offloading a DNN from the device to the server, FedAdapt reduces the training time of a typical IoT device by half compared to classic FL. The training time of extreme stragglers, and the overall training time, can be reduced by up to 57%. Furthermore, under changing network bandwidth, FedAdapt reduces training time by up to 40% compared to classic FL without sacrificing accuracy. FedAdapt can be downloaded from https://github.com/qub-blesson/fedadapt.
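The core mechanism is partitioning a DNN at a per-device split point so that early layers run on the device and the remainder on the server. The toy sketch below shows such a split; it only illustrates the idea, and the split point is a plain parameter here rather than the output of FedAdapt's learned policy.

```python
# Illustrative layer-offloading split: device runs layers[:split_point],
# the server runs the rest on the forwarded activations.
import torch
import torch.nn as nn

layers = [nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
          nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
          nn.Flatten(), nn.Linear(32 * 32 * 32, 10)]

def split_model(split_point: int):
    device_part = nn.Sequential(*layers[:split_point])
    server_part = nn.Sequential(*layers[split_point:])
    return device_part, server_part

device_part, server_part = split_model(split_point=2)
x = torch.randn(1, 3, 32, 32)
activations = device_part(x)        # computed on the IoT device
logits = server_part(activations)   # sent to and computed on the server
print(logits.shape)                 # torch.Size([1, 10])
```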
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct contrastive samples by adding perturbations to the graph structure or node attributes. Although impressive results are achieved, these methods are rather blind to the wealth of prior information this process assumes: as the degree of perturbation applied to the original graph increases, 1) the similarity between the original graph and the generated augmented graph gradually decreases; 2) the discrimination between all nodes within each augmented view gradually increases. In this paper, we argue that both kinds of prior information can be incorporated (differently) into the contrastive learning paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes is maintained and is less affected by perturbations of different degrees. Experimental results on various benchmark datasets verify the effectiveness of our algorithm compared with supervised and unsupervised models.
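A minimal sketch of the learning-to-rank view of CL, under the stated prior: given augmented views ordered from weak to strong perturbation, an anchor embedding should be more similar to weaker views than to stronger ones. This pairwise margin ranking loss is only an illustration of the general idea, not the paper's exact objective.

```python
# Pairwise ranking loss over positive views ordered by perturbation degree.
import torch
import torch.nn.functional as F

def ranking_contrastive_loss(anchor, views_ordered, margin=0.1):
    """anchor: (d,); views_ordered: list of (d,) from weak to strong perturbation."""
    sims = [F.cosine_similarity(anchor, v, dim=0) for v in views_ordered]
    loss = anchor.new_zeros(())
    # Each weaker (less perturbed) view should rank above each stronger one.
    for i in range(len(sims)):
        for j in range(i + 1, len(sims)):
            loss = loss + F.relu(margin - (sims[i] - sims[j]))
    return loss

anchor = torch.randn(64, requires_grad=True)
views = [anchor.detach() + torch.randn(64) * s for s in (0.1, 0.5, 1.0)]
print(ranking_contrastive_loss(anchor, views))
```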
Artificial intelligence aims to teach machines to take actions like humans. To achieve intelligent teaching, the machine learning community has begun to explore a promising topic named machine teaching, where the teacher designs the optimal (usually minimal) teaching set given a target model and a specific learner. However, previous work usually requires numerous teaching examples and many iterations to guide learners to convergence, which is costly. In this paper, we consider a more intelligent teaching paradigm named one-shot machine teaching, which uses fewer examples to converge faster. Unlike typical teaching, this advanced paradigm establishes a tractable mapping from the teaching set to the model parameter. Theoretically, we prove that this mapping is surjective, which provides an existence guarantee for the optimal teaching set. Then, relying on the surjective mapping from the teaching set to the parameter, we develop a design strategy for the optimal teaching set under appropriate settings, in which two popular efficiency metrics, the teaching dimension and the iterative teaching dimension, coincide. Extensive experiments verify the efficiency of our strategy and further demonstrate the intelligence of this new teaching paradigm.
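To make the teaching-set-to-parameter mapping tangible, here is a toy construction (not the paper's general strategy) for a linear learner with squared loss and a known learning rate: a single example (x, y) is crafted so that one gradient step carries the learner from its initialization exactly to the target parameter.

```python
# One-shot teaching of a linear learner on loss 0.5*(w.x - y)^2:
# choose (x, y) so that w0 - eta * grad == w_star after one step.
import numpy as np

rng = np.random.default_rng(0)
w0 = rng.normal(size=5)          # learner's initial parameter
w_star = rng.normal(size=5)      # target parameter
eta = 0.5                        # learner's known learning rate

d = (w0 - w_star) / eta          # required gradient: w0 - eta*d = w_star
x = d / np.linalg.norm(d)        # unit-norm teaching input along d
y = w0 @ x - np.linalg.norm(d)   # label so that (w0.x - y) * x == d

w1 = w0 - eta * (w0 @ x - y) * x # one gradient step
print(np.allclose(w1, w_star))   # True
```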
Recent graph-based models for joint multiple intent detection and slot filling have obtained promising results by modeling the guidance from the prediction of intents to the decoding of slot filling. However, existing methods (1) only model the \textit{unidirectional guidance} from intent to slot; (2) adopt \textit{homogeneous graphs} to model the interactions between the slot semantics nodes and intent label nodes, which limits performance. In this paper, we propose a novel model termed Co-guiding Net, which implements a two-stage framework achieving \textit{mutual guidance} between the two tasks. In the first stage, the initial estimated labels of both tasks are produced, and they are then leveraged in the second stage to model the mutual guidance. Specifically, we propose two \textit{heterogeneous graph attention networks} working on the proposed two \textit{heterogeneous semantics-label graphs}, which effectively represent the relations among the semantics nodes and label nodes. Experimental results show that our model outperforms existing models by a large margin, obtaining a relative improvement of 19.3\% in overall accuracy over the previous best model on the MixATIS dataset.
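The following schematic shows only the two-stage control flow of mutual guidance: stage 1 produces initial intent and slot estimates, and stage 2 conditions each task on the other's estimate. The heterogeneous graph attention networks are replaced here by plain linear layers, so this is a structural sketch, not the paper's architecture.

```python
# Two-stage mutual guidance between intent detection and slot filling.
import torch
import torch.nn as nn

class TwoStageCoGuiding(nn.Module):
    def __init__(self, hidden=64, n_intents=7, n_slots=20):
        super().__init__()
        self.intent1 = nn.Linear(hidden, n_intents)           # stage-1 intent
        self.slot1 = nn.Linear(hidden, n_slots)               # stage-1 slots
        # Stage 2 conditions each task on the other task's initial estimate.
        self.intent2 = nn.Linear(hidden + n_slots, n_intents)
        self.slot2 = nn.Linear(hidden + n_intents, n_slots)

    def forward(self, token_states):                          # (seq_len, hidden)
        utter = token_states.mean(dim=0)                      # utterance summary
        intent_init = self.intent1(utter)                     # initial intent logits
        slot_init = self.slot1(token_states)                  # initial slot logits
        slot_summary = slot_init.mean(dim=0)
        intent_final = self.intent2(torch.cat([utter, slot_summary]))
        intent_expand = intent_init.expand(token_states.size(0), -1)
        slot_final = self.slot2(torch.cat([token_states, intent_expand], dim=-1))
        return intent_final, slot_final

model = TwoStageCoGuiding()
intent, slots = model(torch.randn(12, 64))
print(intent.shape, slots.shape)  # torch.Size([7]) torch.Size([12, 20])
```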
Recent joint multiple intent detection and slot filling models employ label embeddings to achieve semantics-label interactions. However, they treat all labels and label embeddings as uncorrelated individuals, ignoring the dependencies among them. Besides, they conduct decoding for the two tasks independently, without leveraging the correlations between them. Therefore, in this paper, we first construct a Heterogeneous Label Graph (HLG) containing two kinds of topologies: (1) statistical dependencies based on labels' co-occurrence patterns and hierarchies in slot labels; (2) rich relations among the label nodes. Then we propose a novel model termed ReLa-Net, which can capture beneficial correlations among the labels from the HLG. The label correlations are leveraged to enhance semantics-label interactions. Moreover, we propose a label-aware inter-dependent decoding mechanism to further exploit the label correlations during decoding. Experimental results show that our ReLa-Net significantly outperforms previous models. Remarkably, ReLa-Net surpasses the previous best model by over 20\% in overall accuracy on the MixATIS dataset.
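As a small illustration of the statistical-dependency topology, the sketch below connects label nodes whose co-occurrence frequency in the training data exceeds a threshold. The real HLG also encodes slot-label hierarchies and richer relation types, so this covers only the co-occurrence part; the data and threshold are made up for the example.

```python
# Build a co-occurrence adjacency over label nodes from training label sets.
import numpy as np

# Each row lists the label indices appearing together in one utterance.
train_label_sets = [[0, 3], [0, 3, 4], [1, 2], [1, 2, 4], [0, 4]]
n_labels = 5

cooc = np.zeros((n_labels, n_labels))
for labels in train_label_sets:
    for i in labels:
        for j in labels:
            if i != j:
                cooc[i, j] += 1

adj = (cooc / len(train_label_sets)) >= 0.2   # keep frequent co-occurrences
print(adj.astype(int))
```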
Many automatic speech recognition (ASR) datasets include a single pre-defined test set consisting of one or more speakers whose speech never appears in the training set. However, for datasets in which the number of speakers is very small, this "held-out speaker" data-partitioning strategy may not be ideal. This study investigates ten different data split methods for five languages with minimal ASR training resources. We find that (1) model performance depends on which speaker is selected for testing; (2) the average word error rate (WER) across all held-out speakers is comparable not only to the average WER over multiple random splits but also to any given individual random split; (3) WER is also generally comparable when the data is split heuristically or adversarially; (4) utterance duration and intensity are comparatively strong predictors of variability regardless of the data split. These results suggest that the widely used held-out-speaker approach to partitioning ASR data can yield results that do not reflect model performance on unseen data or speakers. When facing data sparsity, random splits can yield more reliable and generalizable estimates.
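The two partitioning strategies being compared can be sketched as follows. The `train_and_score` stub stands in for training an ASR model on each split and computing its WER; in the study, the per-split WERs are then averaged and compared across strategies. All names and data here are hypothetical.

```python
# Held-out-speaker splits vs. random utterance-level splits.
import random

utterances = [{"speaker": s, "audio": f"utt_{s}_{i}.wav"}
              for s in ["spk1", "spk2", "spk3", "spk4"] for i in range(10)]

def hold_out_speaker_splits(data):
    speakers = sorted({u["speaker"] for u in data})
    for held_out in speakers:                      # one split per speaker
        train = [u for u in data if u["speaker"] != held_out]
        test = [u for u in data if u["speaker"] == held_out]
        yield train, test

def random_splits(data, n_splits=5, test_frac=0.25, seed=0):
    rng = random.Random(seed)
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * test_frac)
        yield shuffled[cut:], shuffled[:cut]

def train_and_score(train, test):
    return 0.30                                    # placeholder WER

for name, splitter in [("held-out", hold_out_speaker_splits(utterances)),
                       ("random", random_splits(utterances))]:
    wers = [train_and_score(tr, te) for tr, te in splitter]
    print(name, sum(wers) / len(wers))
```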
Learning on big data has brought success to artificial intelligence (AI), but annotation and training are costly. In the future, learning on small data is one of the ultimate goals of AI, requiring machines to recognize objects and scenarios relying on small data, as humans do. A series of machine learning models pursue this direction, such as active learning, few-shot learning, and deep clustering. However, there are few theoretical guarantees on their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by one specified sampling scheme. This survey follows agnostic active sampling under the PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data in both supervised and unsupervised fashions. With these theoretical analyses, we categorize small data learning models from two geometric perspectives: Euclidean and non-Euclidean (hyperbolic) mean representations, for which the optimization solutions are also presented and discussed. We then summarize some potential learning scenarios that may benefit from learning on small data and analyze their challenges. Finally, we survey some challenging applications, such as computer vision and natural language processing, that may benefit from learning on small data.
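The contrast between the two mean representations can be illustrated numerically: the Euclidean mean is a simple average, whereas a hyperbolic (Fréchet) mean in the Poincaré ball minimizes the summed squared hyperbolic distances and generally has no closed form. The sketch below computes the latter by naive gradient descent; it is a toy illustration, not one of the survey's optimization solutions.

```python
# Euclidean mean vs. a Fréchet mean in the Poincaré ball.
import torch

def poincare_dist(u, v, eps=1e-9):
    sq = torch.sum((u - v) ** 2, dim=-1)
    denom = (1 - torch.sum(u * u, dim=-1)) * (1 - torch.sum(v * v, dim=-1))
    arg = 1 + 2 * sq / (denom + eps)
    return torch.acosh(torch.clamp(arg, min=1 + 1e-7))

points = torch.rand(10, 2) * 0.6 - 0.3        # points inside the unit ball

euclidean_mean = points.mean(dim=0)

frechet = points.mean(dim=0).clone().requires_grad_(True)
opt = torch.optim.SGD([frechet], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = (poincare_dist(frechet, points) ** 2).sum()
    loss.backward()
    opt.step()
    with torch.no_grad():                      # keep the iterate inside the ball
        norm = frechet.norm()
        if norm >= 1:
            frechet.mul_(0.99 / norm)

print(euclidean_mean, frechet.detach())
```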
Active learning maximizes hypothesis updates in order to find the desired unlabeled data. An inherent assumption is that this learning manner can drive these updates toward the optimal hypothesis. However, convergence may not be well guaranteed if the incremental updates are negative and disordered. In this paper, we introduce a machine teacher who provides a black-box teaching hypothesis for an active learner, where the teaching hypothesis is an effective approximation of the optimal hypothesis. Theoretically, we prove that, under the guidance of this teaching hypothesis, the learner can converge to tighter generalization error and label complexity bounds than non-educated learners who receive no guidance from a teacher. We further consider two teaching scenarios, teaching a white-box learner and a black-box learner, and first propose self-improvement of teaching to improve teaching performance. Experiments verify this idea and show better performance than fundamental active learning strategies such as IWAL and IWAL-D.
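A toy sketch of the guidance idea: the learner queries a true label only where it strongly disagrees with the black-box teaching hypothesis, and otherwise adopts the teacher's prediction. This illustrates how a near-optimal teacher reduces label queries; the paper's IWAL-style sampling weights and analysis are not reproduced, and all thresholds here are made up.

```python
# Teacher-guided active learning on a synthetic linear problem.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
w_true = np.array([1.5, -2.0])
y = np.sign(X @ w_true)

w_teacher = w_true + rng.normal(scale=0.1, size=2)   # near-optimal teacher
w_learner = rng.normal(size=2)                        # untrained learner

queried = 0
for x, label in zip(X, y):
    p_teacher = np.tanh(x @ w_teacher)
    p_learner = np.tanh(x @ w_learner)
    if abs(p_teacher - p_learner) > 0.5:              # strong disagreement
        target, queried = label, queried + 1          # pay for a true label
    else:
        target = np.sign(p_teacher)                   # adopt teacher's guess
    w_learner += 0.1 * (target - p_learner) * x       # simple update
print("labels queried:", queried)
```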
Deep learning on large-scale data is dominant nowadays. The unprecedented scale of data is arguably one of the most important driving forces behind the success of deep learning. However, there remain scenarios in which collecting data or labels can be extremely expensive, such as medical imaging and robotics. To fill this gap, this paper considers the problem of learning from scratch using a small amount of representative data. First, we characterize this problem via active learning on homeomorphic tubes of spherical manifolds, which naturally yields a feasible hypothesis class. Using homologous topological properties, we identify an important connection: finding tube manifolds is equivalent to minimizing hyperspherical energy (MHE) in physical geometry. Inspired by this connection, we propose an MHE-based active learning (MHEAL) algorithm and provide comprehensive theoretical guarantees for MHEAL, covering both convergence and generalization analysis. Finally, we demonstrate the empirical performance of MHEAL in a wide range of applications for data-efficient learning, including deep clustering, distribution matching, version space sampling, and deep active learning.
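To convey the flavor of energy-based selection, the sketch below greedily picks a small subset of unit-normalized embeddings that keeps the pairwise Riesz energy low, i.e., points that spread out over the sphere. This illustrates the MHE objective only and is not the MHEAL algorithm with its theoretical guarantees.

```python
# Greedy low-hyperspherical-energy subset selection on the unit sphere.
import numpy as np

def hyperspherical_energy(points, s=1.0, eps=1e-9):
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(points), k=1)
    return np.sum(1.0 / (dists[iu] ** s + eps))

rng = np.random.default_rng(2)
pool = rng.normal(size=(100, 8))
pool /= np.linalg.norm(pool, axis=1, keepdims=True)   # project to the sphere

selected = [0]
while len(selected) < 10:
    best, best_energy = None, np.inf
    for i in range(len(pool)):
        if i in selected:
            continue
        energy = hyperspherical_energy(pool[selected + [i]])
        if energy < best_energy:
            best, best_energy = i, energy
    selected.append(best)
print("selected indices:", selected)
```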