The growing interest in intelligent services and privacy protection for mobile devices has given rise to the widespread application of federated learning in Multi-access Edge Computing (MEC). Diverse user behaviors call for personalized services with heterogeneous Machine Learning (ML) models on different devices. Federated Multi-task Learning (FMTL) has been proposed to train related but personalized ML models for different devices, but previous works suffer from excessive communication overhead during training and neglect the model heterogeneity among devices in MEC. Introducing knowledge distillation into FMTL can simultaneously enable efficient communication and model heterogeneity among clients, but existing methods rely on a public dataset, which is impractical in reality. To tackle this dilemma, we propose Federated MultI-task Distillation for Multi-access Edge CompuTing (FedICT). FedICT keeps local and global knowledge apart during the bidirectional distillation processes between clients and the server, aiming to support multi-task clients while alleviating the client drift that arises from the divergent optimization directions of client-side local models. Specifically, FedICT includes Federated Prior Knowledge Distillation (FPKD) and Local Knowledge Adjustment (LKA). FPKD reinforces the clients' fitting of local data by introducing prior knowledge of local data distributions. LKA corrects the distillation loss of the server, making the transferred local knowledge better match the generalized representation. Experiments on three datasets show that FedICT significantly outperforms all compared benchmarks under various data-heterogeneity and model-architecture settings, achieving higher accuracy with less than 1.2% of the training communication overhead of FedAvg and no more than 75% of the training communication rounds of FedGKT.
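Distillation-based federated learning exchanges soft predictions (logits) rather than model weights. As a point of reference, the sketch below shows the standard temperature-scaled knowledge distillation loss that bidirectional client-server distillation builds on. It is the generic Hinton-style loss, not FedICT's FPKD or LKA, and the function names and logits are hypothetical.

```python
import numpy as np

def softmax(z, t=1.0):
    # Temperature-scaled softmax over the last axis, shifted for numerical stability.
    z = np.asarray(z, dtype=float) / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, t=2.0):
    # Generic (not FedICT-specific) distillation loss: KL(teacher || student) on
    # temperature-softened distributions, averaged over samples and scaled by t**2.
    p_t = softmax(teacher_logits, t)
    p_s = softmax(student_logits, t)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return float(np.mean(kl) * t * t)

# Hypothetical logits that a client (student) and the server (teacher) might exchange.
client_logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
server_logits = np.array([[1.8, 0.7, -0.9], [0.0, 1.2, 0.9]])
print(kd_loss(client_logits, server_logits))
```

Only a few numbers per sample cross the network, and the loss does not require the client and server models to share an architecture, which is why distillation can provide both efficient communication and model heterogeneity.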
With the ever-growing sizes of pre-trained models (PTMs), it has become an emerging practice to provide only inference APIs to users, namely the model-as-a-service (MaaS) setting. To adapt PTMs with model parameters frozen, most current approaches focus on the input side, seeking powerful prompts that stimulate models to give correct answers. However, we argue that input-side adaptation can be arduous due to the lack of gradient signals, and such methods usually require thousands of API queries, resulting in high computation and time costs. In light of this, we present Decoder Tuning (DecT), which instead optimizes task-specific decoder networks on the output side. Specifically, DecT first extracts prompt-stimulated output scores for initial predictions. On top of that, we train an additional decoder network on the output representations to incorporate posterior data knowledge. Using gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample. Empirically, we conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $10^3\times$ speed-up.
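The key property of output-side tuning is that the PTM is queried once per sample and everything afterwards is ordinary gradient-based training on cached outputs. A minimal sketch of that idea follows, assuming the cached outputs are fixed feature vectors and the decoder is a single linear softmax layer whose logits are added to the prompt's initial scores. The actual DecT decoder architecture is not described in the abstract; the data, shapes, and function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cached API outputs, obtained with one query per sample.
n, d, k = 60, 8, 3
labels = rng.integers(0, k, size=n)
centers = rng.normal(size=(k, d)) * 3.0
feats = centers[labels] + rng.normal(size=(n, d))   # output representations
initial_scores = rng.normal(size=(n, k))            # prompt-stimulated scores (noisy here)

def train_decoder(x, s0, y, k, lr=0.01, steps=500):
    # Gradient descent on a linear decoder; its logits are added to the frozen initial scores.
    w = np.zeros((x.shape[1], k))
    b = np.zeros(k)
    onehot = np.eye(k)[y]
    for _ in range(steps):
        logits = s0 + x @ w + b
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(y)   # d(mean cross-entropy) / d(logits)
        w -= lr * x.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b

w, b = train_decoder(feats, initial_scores, labels, k)
base_acc = np.mean(np.argmax(initial_scores, axis=1) == labels)
acc = np.mean(np.argmax(initial_scores + feats @ w + b, axis=1) == labels)
print(f"initial: {base_acc:.2f}, with decoder: {acc:.2f}")
```

No further API calls happen inside the training loop, which is what makes second-scale training plausible once the outputs are cached.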
Anomaly detection is defined as discovering patterns that do not conform to expected behavior. Previously, anomaly detection was mostly conducted with traditional shallow learning techniques, which yielded limited progress. With the emergence of graph neural networks (GNNs), graph anomaly detection has developed greatly. However, recent studies have shown that GNN-based methods face a challenge: no graph anomaly detection algorithm generalizes across most datasets. To bridge this gap, we propose a multi-view fusion approach for graph anomaly detection (Mul-GAD). View-level fusion captures the relative significance of different views, while feature-level fusion makes full use of complementary information. We demonstrate the effectiveness of the fusion strategies both theoretically and experimentally. For a more comprehensive conclusion, we further investigate the effect of the objective function and the number of fused views on detection performance. Building on these findings, Mul-GAD combines the fusion strategies with the best-performing objective function. In a series of experiments on Pubmed, Amazon Computer, Amazon Photo, Weibo, and Books, Mul-GAD achieves better detection performance and generalization than other state-of-the-art detection methods in most scenarios. Our code is available at https://github.com/liuyishoua/Mul-Graph-Fusion.
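The two fusion levels can be illustrated with a small sketch. Assuming each view yields a node-embedding matrix of the same shape, view-level fusion is modeled here as a softmax-weighted sum over views (the per-view significance scores are hypothetical inputs) and feature-level fusion as concatenation. This illustrates the general idea only; Mul-GAD's actual fusion layers are not specified in the abstract.

```python
import numpy as np

def view_level_fusion(views, scores):
    # Softmax over hypothetical per-view significance scores, then a weighted sum of views.
    s = np.asarray(scores, dtype=float)
    w = np.exp(s - s.max())
    w /= w.sum()
    return np.tensordot(w, np.stack(views), axes=1), w

def feature_level_fusion(views):
    # Keep every view's features side by side so complementary dimensions are preserved.
    return np.concatenate(views, axis=1)

# Hypothetical embeddings for 4 nodes from 3 views, 2 dimensions each.
rng = np.random.default_rng(1)
views = [rng.normal(size=(4, 2)) for _ in range(3)]
fused, weights = view_level_fusion(views, scores=[0.5, 2.0, -1.0])
print(weights.round(3), fused.shape, feature_level_fusion(views).shape)
```

The weighted sum keeps the embedding width fixed and lets the most significant view dominate, while concatenation grows the width but discards nothing from any view.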
Metric-based meta-learning is one of the de facto standards in few-shot learning. It consists of representation learning and metric calculation designs. Previous works construct class representations in different ways, ranging from mean output embeddings to covariances and distributions. However, using embeddings in space lacks expressivity and cannot capture class information robustly, while complex statistical modeling makes metric design difficult. In this work, we use tensor fields ("areas") to model classes from a geometrical perspective for few-shot learning. We present a simple and effective method, dubbed hypersphere prototypes (HyperProto), in which class information is represented by hyperspheres of dynamic size with two sets of learnable parameters: the hypersphere's center and radius. Extending from points to areas, hyperspheres are much more expressive than embeddings. Moreover, performing metric-based classification with hypersphere prototypes is more convenient than with statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere. Following this idea, we also develop two variants of prototypes under other measurements. Extensive experiments and analysis on few-shot learning tasks across NLP and CV, with comparisons against more than 20 competitive baselines, demonstrate the effectiveness of our approach.
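For a hypersphere with center c and radius r, the distance from a point x to its surface is |‖x − c‖ − r|. The sketch below classifies a point by its nearest hypersphere surface. In HyperProto the centers and radii are learned, whereas here they are fixed hypothetical values, and using the unsigned surface distance as the classification score is an assumption.

```python
import numpy as np

def surface_distance(x, centers, radii):
    # |‖x - c_k‖ - r_k| for each class k: distance from x to the k-th hypersphere's surface.
    d = np.linalg.norm(centers - x, axis=1)
    return np.abs(d - radii)

def classify(x, centers, radii):
    # Assumed decision rule: pick the class whose hypersphere surface is nearest.
    return int(np.argmin(surface_distance(x, centers, radii)))

centers = np.array([[0.0, 0.0], [4.0, 0.0]])   # hypothetical class centers
radii = np.array([1.0, 2.5])                    # hypothetical class radii
x = np.array([2.0, 0.0])
print(surface_distance(x, centers, radii), classify(x, centers, radii))
```

The point is equidistant from both centers, so a point prototype could not separate the classes; the radii break the tie (surface distances 1.0 and 0.5), which is the extra expressivity that "areas" provide over points.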
Conceptual knowledge is fundamental to human cognition and knowledge bases. However, existing knowledge probing works focus only on evaluating the factual knowledge of pre-trained language models (PLMs) and ignore conceptual knowledge. Since conceptual knowledge often appears as implicit commonsense behind texts, designing probes for it is hard. Inspired by knowledge representation schemata, we comprehensively evaluate the conceptual knowledge of PLMs by designing three tasks that probe whether PLMs organize entities by conceptual similarities, learn conceptual properties, and conceptualize entities in contexts, respectively. For these tasks, we collect and annotate 24k data instances covering 393 concepts, forming COPEN, a COnceptual knowledge Probing bENchmark. Extensive experiments on PLMs of different sizes and types show that existing PLMs systematically lack conceptual knowledge and suffer from various spurious correlations. We believe this is a critical bottleneck for realizing human-like cognition in PLMs. COPEN and our code are publicly released at https://github.com/THU-KEG/COPEN.
Following the recent success of Vision Transformers (ViTs), explorations of transformer-style architectures have triggered a resurgence of modern ConvNets. In this work, we explore the representation ability of DNNs through the lens of interaction complexities. We empirically show that interaction complexity is an overlooked but essential indicator for visual recognition. Accordingly, we present a new family of efficient ConvNets, named MogaNet, to pursue informative context mining in pure ConvNet-based models with favorable complexity-performance trade-offs. In MogaNet, interactions across multiple complexities are facilitated and contextualized by two specially designed aggregation blocks in the spatial and channel interaction spaces. Extensive studies are conducted on ImageNet classification, COCO object detection, and ADE20K semantic segmentation. The results demonstrate that MogaNet establishes new state-of-the-art results over other popular methods in mainstream scenarios and at all model scales. For example, the lightweight MogaNet-T achieves 80.0% top-1 accuracy with only 1.44G FLOPs using a refined training setup on ImageNet-1K, surpassing ParC-Net-S by 1.4% accuracy while saving 59% (2.04G) of FLOPs.
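The MogaNet-T comparison can be cross-checked from the stated numbers alone. If MogaNet-T uses 1.44G FLOPs and saves 2.04G relative to ParC-Net-S, ParC-Net-S must use about 3.48G, and 2.04 / 3.48 rounds to the reported 59%. The ParC-Net-S figures below are implied by the abstract's numbers, not reported directly.

```python
moga_t_gflops = 1.44
saved_gflops = 2.04
parc_s_gflops = moga_t_gflops + saved_gflops   # implied ParC-Net-S cost
saving_ratio = saved_gflops / parc_s_gflops    # fraction of ParC-Net-S FLOPs saved
parc_s_top1 = 80.0 - 1.4                       # implied ParC-Net-S top-1 accuracy (%)
print(f"{parc_s_gflops:.2f}G, {saving_ratio:.1%}, {parc_s_top1:.1f}%")
```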
Pre-trained language models (PLMs) achieve remarkable performance on many downstream tasks but may fail to give reliable estimates of their predictive uncertainty. Given the lack of a comprehensive understanding of PLM calibration, we take a close look at this new research problem, aiming to answer two questions: (1) Do PLMs learn to become calibrated in the training process? (2) How effective are existing calibration methods? For the first question, we conduct fine-grained control experiments to study the dynamic change in PLMs' calibration performance during training. We consider six factors as control variables: dataset difficulty, available training samples, training steps, the number of tunable parameters, model scale, and pretraining. In experiments, we observe a consistent change in calibration performance across all six factors. We find that PLMs do not learn to become calibrated during training, as evidenced by a continual increase in confidence regardless of whether the predictions are correct. We highlight that this finding partly contradicts two established conclusions: (a) larger PLMs are more calibrated; (b) pretraining improves model calibration. Next, we study the effectiveness of existing calibration methods in mitigating the overconfidence issue, in both in-distribution and various out-of-distribution settings. Besides unlearnable calibration methods, we adapt two recently proposed learnable methods that directly collect data to train models to produce reasonable confidence estimates. We also propose extended learnable methods based on existing ones to further improve or maintain PLM calibration without sacrificing the original task performance. Experimental results show that learnable methods significantly reduce PLMs' confidence in wrong predictions, and our methods outperform previous ones.
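Calibration is commonly measured with the expected calibration error (ECE): predictions are binned by confidence, and the gap between each bin's accuracy and its average confidence is averaged, weighted by bin size. The abstract does not name its metric, so treating ECE as the measure here is an assumption. A minimal equal-width-bin sketch shows how confidence that keeps rising regardless of correctness inflates the error:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    conf = np.asarray(confidences, dtype=float)
    corr = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Right-closed bins (lo, hi]; top-class softmax confidence is never exactly 0.
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(corr[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap   # weight by the fraction of samples in the bin
    return float(ece)

# Overconfident: 95% confidence but only half the predictions are correct.
overconfident = expected_calibration_error([0.95] * 4, [1, 0, 1, 0])
# Calibrated: 50% confidence with half the predictions correct.
calibrated = expected_calibration_error([0.5, 0.5], [1, 0])
print(overconfident, calibrated)
```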
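Safety is a crucial property of every robotic platform: any control policy should always comply with actuator limits and avoid collisions with the environment and humans. In reinforcement learning, safety is even more fundamental for exploring an environment without causing any damage. While many solutions have been proposed for the safe exploration problem, only a few of them can deal with the complexity of the real world. This paper introduces a new formulation of safe exploration for reinforcement learning of various robotic tasks. Our approach applies to a wide class of robotic platforms and enforces safety even under complex collision constraints learned from data, by exploring the tangent space of the constraint manifold. Our proposed approach achieves state-of-the-art performance in simulated high-dimensional and dynamic tasks while avoiding collisions with the environment. We demonstrate safe real-world deployment on a TIAGo++ robot, achieving remarkable performance in manipulation and human-robot interaction tasks.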
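Exploring in the tangent space means that, for an equality constraint c(q) = 0 with Jacobian J(q), the agent's action α is mapped to a velocity N(q)α, where the columns of N span the null space of J, so c(q) stays constant to first order. The sketch below computes such a basis with an SVD for a hypothetical constraint (a 2-D point kept on the unit circle) and adds a drift-correction term, whose gain is an assumption of this sketch. It shows the geometric idea only, not the paper's full method or its learned collision constraints.

```python
import numpy as np

def constraint(q):
    # Hypothetical equality constraint c(q) = 0: a 2-D point must stay on the unit circle.
    return np.array([q @ q - 1.0])

def jacobian(q):
    # dc/dq for the constraint above, shape (1, 2).
    return 2.0 * q.reshape(1, -1)

def tangent_basis(J, tol=1e-10):
    # Right-singular vectors beyond the rank of J span its null space,
    # i.e. the tangent space of the constraint manifold at q.
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def safe_step(q, alpha, dt=1e-3, k=10.0):
    J = jacobian(q)
    N = tangent_basis(J)
    # Tangent-space motion plus a drift-correction term pulling c(q) back toward 0
    # (the correction gain k is an assumption of this sketch).
    dq = N @ alpha - k * np.linalg.pinv(J) @ constraint(q)
    return q + dt * dq

q = np.array([1.0, 0.0])
for _ in range(1000):
    q = safe_step(q, alpha=np.array([1.0]))   # hypothetical exploration action
print(q, constraint(q))
```

Without the correction term, the Euler steps would slowly drift off the manifold because each tangent step is only first-order accurate; with it, the constraint violation stays small.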
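Prompting, which casts downstream applications as language modeling tasks, has been shown to be more sample-efficient than standard fine-tuning of pre-trained models. However, one pitfall of prompting is the need for manually designed patterns, whose outcome can be unintuitive and which require large validation sets to tune. To tackle this challenge, we propose AutoSeq, a fully automatic prompting method: (1) we adopt natural language prompts on sequence-to-sequence models, enabling free-form generation and a larger label search space; (2) we propose label sequences, variable-length phrases that verbalize the labels, which eliminate the need for manual templates and are more expressive than single label words; (3) we use beam search to automatically generate a large number of label sequence candidates and propose contrastive re-ranking to obtain the best combinations. AutoSeq significantly outperforms other methods that require no manual design, such as soft prompt tuning, adapter tuning, and automatic search for single label words; the generated label sequences are even better than curated manual ones on a variety of tasks. Our method reveals the potential of sequence-to-sequence models in few-shot learning and sheds light on a path toward general-purpose, automatic prompting. The source code of this paper is available at https://github.com/thunlp/seq2seq-prompt.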
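Candidate generation in step (3) relies on beam search. The sketch below runs a generic beam search over a hypothetical toy table of next-token log-probabilities and then re-ranks the finished candidates. The toy vocabulary, `next_logprobs`, and the length-normalized re-ranking score are illustrative stand-ins; AutoSeq's contrastive re-ranking is not detailed in the abstract.

```python
import math

# Hypothetical toy "model": next-token log-probabilities given a prefix.
TABLE = {
    (): {"great": math.log(0.5), "very": math.log(0.3), "<eos>": math.log(0.2)},
    ("great",): {"<eos>": math.log(0.9), "movie": math.log(0.1)},
    ("very",): {"good": math.log(0.8), "<eos>": math.log(0.2)},
    ("very", "good"): {"<eos>": math.log(1.0)},
    ("great", "movie"): {"<eos>": math.log(1.0)},
}

def next_logprobs(prefix):
    return TABLE.get(tuple(prefix), {"<eos>": 0.0})

def beam_search(beam_width=2, max_len=4):
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_logprobs(seq).items():
                if tok == "<eos>":
                    if seq:  # skip empty label sequences
                        finished.append((seq, score + lp))
                else:
                    candidates.append((seq + (tok,), score + lp))
        # Keep only the highest-scoring unfinished prefixes.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break
    return sorted(finished, key=lambda c: c[1], reverse=True)

def rerank(finished):
    # Stand-in re-ranking score (not AutoSeq's): average log-probability per token.
    return sorted(finished, key=lambda c: c[1] / len(c[0]), reverse=True)

results = beam_search()
print([" ".join(s) for s, _ in results])
print([" ".join(s) for s, _ in rerank(results)])
```

Beam search alone favors the short sequence "great", while the re-ranking step promotes "very good"; the point is only that a second scoring pass can reorder beam candidates, which is the role re-ranking plays in the pipeline.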
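Modular design is the foundation of on-orbit construction technology for future large space facilities, and standard interfaces are a key technology for the modular design of future space robotic systems and space facilities. This paper presents the design and testing of PetLock, a standard and genderless interface that can transfer mechanical loads, power, and data between future modular space robotic manipulators and spacecraft. PetLock adopts a fully genderless design, including the connection face, the locking mechanism, and the data and power interfaces. The connection surface provides large translational and rotational misalignment tolerance thanks to its 120-degree symmetric, 3D-shaped design. The locking mechanism uses a simple and reliable three-locking-pin retraction structure. With the advantages of high locking force, high misalignment tolerance, high reliability, and low cost, PetLock has great application potential in future on-orbit construction missions.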