Massively multi-task learning with large language models has recently made substantial progress on few-shot generalization. However, this is usually performed in a centralized learning fashion, ignoring the privacy sensitivity issue of (annotated) data used in multiple tasks. To mitigate this issue, we propose FewFedWeight, a few-shot federated learning framework across multiple tasks, to achieve the best of both worlds: privacy preservation and cross-task generalization. FewFedWeight trains client models in isolated devices without sharing data. It broadcasts the global model in the server to each client and produces pseudo data for clients so that knowledge from the global model can be explored to enhance few-shot learning of each client model. An energy-based algorithm is further proposed to weight pseudo samples in order to reduce the negative impact of noise from the generated pseudo data. Adaptive model weights of client models are also tuned according to their performance. We use these model weights to dynamically aggregate client models to update the global model. Experiments on 118 NLP tasks show that FewFedWeight can significantly improve the performance of client models on 61% tasks with an average performance improvement rate of 30.5% over the baseline and substantially outperform FedAvg and other decentralized learning methods.
translated by 谷歌翻译
Knowledge distillation (KD) has been widely used for model compression and knowledge transfer. Typically, a big teacher model trained on sufficient data transfers knowledge to a small student model. However, despite the success of KD, little effort has been made to study whether KD leaks the training data of the teacher model. In this paper, we experimentally reveal that KD suffers from the risk of privacy leakage. To alleviate this issue, we propose a novel knowledge distillation method, swing distillation, which can effectively protect the private information of the teacher model from flowing to the student model. In our framework, the temperature coefficient is dynamically and adaptively adjusted according to the degree of private information contained in the data, rather than a predefined constant hyperparameter. It assigns different temperatures to tokens according to the likelihood that a token in a position contains private information. In addition, we inject noise into soft targets provided to the student model, in order to avoid unshielded knowledge transfer. Experiments on multiple datasets and tasks demonstrate that the proposed swing distillation can significantly reduce (by over 80% in terms of canary exposure) the risk of privacy leakage in comparison to KD with competitive or better performance. Furthermore, swing distillation is robust against the increasing privacy budget.
translated by 谷歌翻译
从单眼视频中进行的3D人姿势估计最近看到了显着改善。但是,大多数最先进的方法都是基于运动学的,它容易出现具有明显伪影的物理上不可信的运动。当前基于动态的方法可以预测物理上合理的运动,但仅限于具有静态相机视图的简单场景。在这项工作中,我们介绍了D&D(从动态相机中学习人类动力学),该法律利用物理定律使用移动的摄像机从野外视频中重建3D人类运动。 D&D引入了惯性力控制(IFC),以考虑动态摄像机的惯性力来解释非惯性局部框架中的3D人运动。为了学习有限注释的接地接触,我们开发了概率接触扭矩(PCT),该概率是通过与接触概率的可区分抽样计算的,并用于生成运动。接触状态可以通过鼓励模型产生正确的动作来弱监督。此外,我们提出了一个细心的PD控制器,该控制器使用时间信息来调整目标姿势状态,以获得平稳而准确的姿势控制。我们的方法完全是基于神经的,并且在物理引擎中没有离线优化或模拟的情况下运行。大规模3D人体运动基准的实验证明了D&D的有效性,在该基于最新的运动学基于动力学和基于动力学的方法的情况下,我们表现出卓越的性能。代码可从https://github.com/jeffsjtu/dnd获得
translated by 谷歌翻译
多任务高斯流程(MTGP)是一种众所周知的非参数贝叶斯模型,用于通过跨任务传输知识来有效地学习相关任务。但是当前的MTGP通常仅限于在同一输入域中定义的多任务场景,没有留出空间来解决异质案例,即输入域的特征在任务上有所不同。为此,本文提出了一个新型的异质随机变化线性模型(\ texttt {hsvlmc})模型,用于同时学习具有不同输入域的任务。特别是,我们通过贝叶斯校准开发了随机变化框架,该框架(i)考虑了域映射提高的尺寸降低的影响,以实现有效的输入对准; (ii)采用残差建模策略来利用先前域映射带来的电感偏差来获得更好的模型推断。最后,对现有LMC模型的优势在各种异质的多任务案例和实用的多保真蒸汽轮机排气问题上进行了广泛的验证。
translated by 谷歌翻译
Evolutionary algorithms (EAs) are a sort of nature-inspired metaheuristics, which have wide applications in various practical optimization problems. In these problems, objective evaluations are usually inaccurate, because noise is almost inevitable in real world, and it is a crucial issue to weaken the negative effect caused by noise. Sampling is a popular strategy, which evaluates the objective a couple of times, and employs the mean of these evaluation results as an estimate of the objective value. In this work, we introduce a novel sampling method, median sampling, into EAs, and illustrate its properties and usefulness theoretically by solving OneMax, the problem of maximizing the number of 1s in a bit string. Instead of the mean, median sampling employs the median of the evaluation results as an estimate. Through rigorous theoretical analysis on OneMax under the commonly used onebit noise, we show that median sampling reduces the expected runtime exponentially. Next, through two special noise models, we show that when the 2-quantile of the noisy fitness increases with the true fitness, median sampling can be better than mean sampling; otherwise, it may fail and mean sampling can be better. The results may guide us to employ median sampling properly in practical applications.
translated by 谷歌翻译
Evolutionary algorithms (EAs) have found many successful real-world applications, where the optimization problems are often subject to a wide range of uncertainties. To understand the practical behaviors of EAs theoretically, there are a series of efforts devoted to analyzing the running time of EAs for optimization under uncertainties. Existing studies mainly focus on noisy and dynamic optimization, while another common type of uncertain optimization, i.e., robust optimization, has been rarely touched. In this paper, we analyze the expected running time of the (1+1)-EA solving robust linear optimization problems (i.e., linear problems under robust scenarios) with a cardinality constraint $k$. Two common robust scenarios, i.e., deletion-robust and worst-case, are considered. Particularly, we derive tight ranges of the robust parameter $d$ or budget $k$ allowing the (1+1)-EA to find an optimal solution in polynomial running time, which disclose the potential of EAs for robust optimization.
translated by 谷歌翻译
In noisy evolutionary optimization, sampling is a common strategy to deal with noise. By the sampling strategy, the fitness of a solution is evaluated multiple times (called \emph{sample size}) independently, and its true fitness is then approximated by the average of these evaluations. Most previous studies on sampling are empirical, and the few theoretical studies mainly showed the effectiveness of sampling with a sufficiently large sample size. In this paper, we theoretically examine what strategies can work when sampling with any fixed sample size fails. By constructing a family of artificial noisy examples, we prove that sampling is always ineffective, while using parent or offspring populations can be helpful on some examples. We also construct an artificial noisy example to show that when using neither sampling nor populations is effective, a tailored adaptive sampling (i.e., sampling with an adaptive sample size) strategy can work. These findings may enhance our understanding of sampling to some extent, but future work is required to validate them in natural situations.
translated by 谷歌翻译
In many real-world optimization problems, the objective function evaluation is subject to noise, and we cannot obtain the exact objective value. Evolutionary algorithms (EAs), a type of general-purpose randomized optimization algorithm, have been shown to be able to solve noisy optimization problems well. However, previous theoretical analyses of EAs mainly focused on noise-free optimization, which makes the theoretical understanding largely insufficient for the noisy case. Meanwhile, the few existing theoretical studies under noise often considered the one-bit noise model, which flips a randomly chosen bit of a solution before evaluation; while in many realistic applications, several bits of a solution can be changed simultaneously. In this paper, we study a natural extension of one-bit noise, the bit-wise noise model, which independently flips each bit of a solution with some probability. We analyze the running time of the (1+1)-EA solving OneMax and LeadingOnes under bit-wise noise for the first time, and derive the ranges of the noise level for polynomial and super-polynomial running time bounds. The analysis on LeadingOnes under bit-wise noise can be easily transferred to one-bit noise, and improves the previously known results. Since our analysis discloses that the (1+1)-EA can be efficient only under low noise levels, we also study whether the sampling strategy can bring robustness to noise. We prove that using sampling can significantly increase the largest noise level allowing a polynomial running time, that is, sampling is robust to noise.
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
translated by 谷歌翻译