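Realizing general-purpose language intelligence has been a longstanding goal of natural language processing, and standard evaluation benchmarks play a fundamental and guiding role in it. We argue that, for evaluating general-purpose language intelligence, the benchmark itself needs to be comprehensive and systematic. To this end, we propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) a hierarchical benchmark framework, in which datasets are selected and organized in a principled way according to a language capability-task-dataset hierarchy; (2) a multi-level scoring strategy, in which model performance is reported at different levels of the hierarchical framework. To facilitate the use of CUGE, we provide a public leaderboard that can be customized to support flexible model judging criteria. Evaluation results on representative pre-trained language models indicate ample room for improvement towards general-purpose language intelligence. CUGE is publicly available at cuge.baai.ac.cn.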
translated by 谷歌翻译
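The multi-level scoring idea in the CUGE abstract can be illustrated with a small sketch. The abstract does not say how scores are aggregated, so the unweighted mean used here is an assumption, and every capability, task, and dataset name and every number is a hypothetical placeholder rather than a CUGE result.

```python
from statistics import mean

# Hypothetical scores (0-100) arranged as capability -> task -> dataset.
# All names and numbers are placeholders, not actual CUGE entries.
scores = {
    "understanding": {
        "task_a": {"dataset_1": 80.0, "dataset_2": 70.0},
        "task_b": {"dataset_3": 60.0},
    },
    "generation": {
        "task_c": {"dataset_4": 50.0, "dataset_5": 40.0},
    },
}


def multi_level_scores(hierarchy):
    """Aggregate scores bottom-up: dataset -> task -> capability -> overall.

    Assumption: each level is the unweighted mean of the level below it.
    """
    task_scores = {
        cap: {task: mean(list(datasets.values())) for task, datasets in tasks.items()}
        for cap, tasks in hierarchy.items()
    }
    capability_scores = {
        cap: mean(list(tasks.values())) for cap, tasks in task_scores.items()
    }
    overall = mean(list(capability_scores.values()))
    return task_scores, capability_scores, overall


task_scores, capability_scores, overall = multi_level_scores(scores)
print(task_scores)
print(capability_scores)
print(overall)
```

Reporting all three levels lets a reader compare models on a single dataset, on a task, on a capability, or overall, which is the kind of flexibility a customizable leaderboard would need.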
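With the explosive growth of data and wireless devices, federated learning (FL) has emerged as a promising technology for large-scale intelligent systems. By exploiting the analog superposition of electromagnetic waves, over-the-air computation is an appealing approach to reducing the communication burden of FL model aggregation. However, with the urgent demand for intelligent systems, training multiple tasks with over-the-air computation further aggravates the scarcity of communication resources. This issue can be alleviated to some extent by training multiple tasks simultaneously over shared communication resources, but doing so inevitably introduces inter-task interference. In this paper, we study over-the-air multi-task FL (OA-MTFL) over the multiple-input multiple-output (MIMO) interference channel. We propose a novel model aggregation method that aligns the local gradients of different devices, which alleviates the straggler problem that is widespread in over-the-air computation because of channel heterogeneity. Taking the spatial correlation between devices into account, we establish a unified communication-computation analysis framework for the proposed OA-MTFL scheme and formulate an optimization problem for designing the transceiver beamforming and device selection. We develop an algorithm based on alternating optimization (AO) and fractional programming (FP) to solve this problem, which effectively mitigates the impact of inter-task interference on FL learning performance. We show that, thanks to the new model aggregation method, device selection is no longer essential to our scheme, which avoids the heavy computational burden of implementing device selection. Numerical results demonstrate the correctness of the analysis and the outstanding performance of the proposed scheme.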
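To make the idea of aggregation by analog superposition concrete, here is a minimal single-task sketch. It is not the OA-MTFL scheme from the abstract: it assumes real-valued scalar channels known at each device, uses plain channel-inversion pre-scaling (a common textbook baseline), and ignores MIMO beamforming and inter-task interference. The function name and parameters are hypothetical.

```python
import random


def over_the_air_average(gradients, channels, noise_std=0.0, seed=0):
    """Average local gradients through a simulated superposition channel.

    gradients: list of equal-length lists, one local gradient per device.
    channels:  list of nonzero real channel gains, one per device (assumed known).
    """
    rng = random.Random(seed)
    dim = len(gradients[0])
    received = [0.0] * dim
    for grad, h in zip(gradients, channels):
        # Each device pre-scales its signal by 1/h (channel inversion).
        transmitted = [g / h for g in grad]
        # The channel applies gain h, and all signals add up in the air.
        for i, x in enumerate(transmitted):
            received[i] += h * x
    # Additive Gaussian receiver noise, one sample per dimension.
    received = [y + rng.gauss(0.0, noise_std) for y in received]
    # The receiver only rescales the sum; it never sees individual gradients.
    return [y / len(gradients) for y in received]


local_gradients = [[1.0, 2.0], [3.0, 4.0]]
channel_gains = [0.5, 2.0]
print(over_the_air_average(local_gradients, channel_gains))
```

In this toy setting all devices transmit at once, so the channel use does not grow with the number of devices. A device with a very small gain would need a large pre-scaling factor, which is one way to see why channel heterogeneity leads to the straggler problem mentioned above.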
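This work proposes a new computational framework for learning an explicit generative model for real-world datasets. Specifically, we propose to learn a closed-loop transcription between a multi-class, multi-dimensional data distribution and a linear discriminative representation (LDR) in a feature space that consists of multiple independent multi-dimensional linear subspaces. We argue that the optimal encoding and decoding mappings sought can be formulated as the equilibrium point of a two-player minimax game between the encoder and the decoder. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure of the distance between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback in control systems and avoids the expensive evaluation and minimization of approximate distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of auto-encoding and GAN, and naturally extends them to the setting of learning a representation that is both discriminative and generative for multi-class, multi-dimensional real-world data. Our extensive experiments on many benchmark image datasets demonstrate the great potential of this new closed-loop formulation: under fair comparison, the visual quality of the learned decoder and the classification performance of the encoder are competitive with, and often better than, existing methods based on GAN, VAE, or a combination of both. We observe that the features of different classes learned in this way are explicitly mapped onto approximately independent principal subspaces in the feature space, and that the diverse visual attributes within each class are modeled by the independent principal components within each subspace.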
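The abstract names rate reduction as the utility function but does not write it out. For orientation, the block below recalls the form commonly used in the rate-reduction literature. Treating it as the variant used in this work is an assumption, and the symbols are introduced here for illustration: features $Z \in \mathbb{R}^{d \times n}$, distortion $\epsilon$, and diagonal class-membership matrices $\Pi_j$.

```latex
\begin{align}
R(Z, \epsilon) &= \frac{1}{2} \log\det\Big( I + \frac{d}{n\epsilon^{2}}\, Z Z^{\top} \Big), \\
R_c(Z, \epsilon \mid \Pi) &= \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2n}
  \log\det\Big( I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top} \Big), \\
\Delta R(Z, \Pi, \epsilon) &= R(Z, \epsilon) - R_c(Z, \epsilon \mid \Pi).
\end{align}
```

Here $R$ measures the coding rate of all features together and $R_c$ the average rate when each class is coded separately, so $\Delta R$ is large when classes occupy distinct subspaces.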
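In recent years, convolutional neural networks (CNNs) have been successfully applied to the single-target tracking task. Training a deep CNN model generally requires numerous labeled training samples, and the number and quality of these samples directly affect the representational capability of the trained model. This requirement is restrictive in practice, however, because manually labeling such a large number of training samples is time-consuming and prohibitively expensive. In this paper, we propose an active learning method for deep visual tracking, which selects and annotates unlabeled samples to train the deep CNN model. Under the guidance of active learning, a tracker based on the trained deep CNN model can achieve competitive tracking performance while reducing the labeling cost. More specifically, to ensure the diversity of the selected samples, we propose an active learning method based on multi-frame collaboration to select the training samples that should be, and need to be, annotated. Meanwhile, to account for the representativeness of the selected samples, we adopt a nearest-neighbor discrimination method based on the average nearest-neighbor distance to screen out isolated and low-quality samples. As a result, the training subset selected by our method needs only a given budget to preserve the diversity and representativeness of the entire sample set. In addition, we adopt the Tversky loss to improve the tracker's bounding-box estimation, which helps the tracker achieve more accurate target states. Extensive experimental results confirm that our active-learning-based tracker (ALT) achieves competitive tracking accuracy and speed compared with state-of-the-art trackers on the seven most challenging evaluation benchmarks.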
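As a point of reference for the Tversky loss mentioned above, the following is a minimal sketch of the generic Tversky loss on flattened soft masks. How the tracker applies it to bounding-box estimation is not stated in the abstract, so the mask-based inputs, the function name, and the default weights are assumptions made for illustration.

```python
def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Generic Tversky loss: 1 - TP / (TP + alpha * FP + beta * FN).

    pred:   predicted foreground probabilities in [0, 1] (flattened).
    target: ground-truth labels, 0 or 1 (flattened, same length).
    alpha weights false positives, beta weights false negatives;
    alpha = beta = 0.5 reduces to the Dice loss.
    """
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    # eps keeps the ratio defined when both masks are empty.
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


perfect = tversky_loss([1, 0, 1, 0], [1, 0, 1, 0])
half = tversky_loss([1, 1, 0, 0], [1, 0, 1, 0])
print(perfect, half)
```

Raising `beta` above `alpha` penalizes missed foreground more than spurious foreground, which is the usual reason for choosing Tversky over a plain overlap loss.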
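Federated edge learning (FEEL) has emerged as a revolutionary paradigm for developing AI services at the edge of 6G wireless networks, because it supports collaborative model training across a massive number of mobile devices. However, model communication over wireless channels, especially uplink model uploading in FEEL, has been widely recognized as a bottleneck that severely limits the efficiency of FEEL. Although over-the-air computation can alleviate the excessive cost of radio resources in FEEL model uploading, practical implementations of over-the-air FEEL still face several challenges, including strong straggler issues, large communication overheads, and potential privacy leakage. In this article, we study these challenges in over-the-air FEEL and leverage the reconfigurable intelligent surface (RIS), a key enabler of future wireless systems, to address them. We survey the state-of-the-art solutions for RIS-empowered FEEL and explore promising research opportunities for adopting RIS to enhance FEEL performance.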
We present a one-stage Fully Convolutional Line Parsing network (F-Clip) that detects line segments from images. The proposed network is very simple and flexible, with variations that gracefully trade off between speed and accuracy for different applications. F-Clip detects line segments in an end-to-end fashion by predicting each line's center position, length, and angle. We further customize the design of the convolution kernels of our fully convolutional network to effectively exploit the statistical priors of the distribution of line angles in real image datasets. We conduct extensive experiments and show that our method achieves a significantly better trade-off between efficiency and accuracy, resulting in a real-time line detector at up to 73 FPS on a single GPU. Such inference speed makes our method readily applicable to real-time tasks without compromising the accuracy of previous methods. Moreover, when equipped with a performance-improving backbone network, F-Clip is able to significantly outperform all state-of-the-art line detectors on accuracy at a similar or even higher frame rate. In other words, at the same inference speed, F-Clip always achieves the best accuracy compared with other methods. Source code: https://github.com/Delay-Xili/F-Clip.
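The center, length, and angle parameterization described above maps to segment endpoints with basic trigonometry. The sketch below shows that conversion only. It is not taken from the F-Clip code base, and the angle convention (radians, measured from the positive x-axis) and the function name are assumptions.

```python
import math


def decode_line(cx, cy, length, angle):
    """Convert a (center, length, angle) line prediction into two endpoints.

    Assumes `angle` is in radians, measured from the positive x-axis.
    """
    # Half of the segment lies on each side of the center.
    dx = 0.5 * length * math.cos(angle)
    dy = 0.5 * length * math.sin(angle)
    return (cx - dx, cy - dy), (cx + dx, cy + dy)


start, end = decode_line(10.0, 20.0, 4.0, 0.0)
print(start, end)
```

Because a segment and the same segment rotated by pi are identical, a representation like this only needs angles in a half-circle range, which is one reason the distribution of line angles is a natural prior to exploit.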
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness considerations. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. We then propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
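For readers unfamiliar with how a student is "encouraged to mimic" a teacher, the sketch below shows the standard temperature-scaled distillation loss on a single node's class logits. It illustrates generic knowledge distillation only. It is not the RELIANT framework and contains no fairness term, and the function names and example logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumes the logits are moderate enough that no student probability
    underflows to zero.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return temperature ** 2 * kl


teacher = [2.0, 0.5, -1.0]  # hypothetical teacher logits for one node
student = [1.0, 1.0, 0.0]   # hypothetical student logits for the same node
print(distillation_loss(student, teacher))
```

Because this loss pulls the student toward whatever the teacher outputs, any systematic bias in the teacher's predictions is part of the training signal, which is the inheritance problem the abstract points to.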
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features, which significantly improves their temporal stability. Then, a reconstruction network based on a multi-scale U-Net with skip connections is adopted to reconstruct and generate the desired high-resolution image. Experimental results and comparisons show that our proposed method generates higher-quality supersampling results than current state-of-the-art methods without increasing the total number of ray-tracing samples.
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions, without shared computation or task association. We aim to unify these tasks at the architectural level by designing the first end-to-end unified framework, named Panoptic-PartFormer. Moreover, we find that the previous metric, PartPQ, is biased toward PQ. To handle both issues, we make the following contributions. Firstly, we design a meta-architecture that decouples part features from thing/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model Panoptic-PartFormer. Secondly, we propose a new metric, Part-Whole Quality (PWQ), to better measure this task from both pixel-region and part-whole perspectives. It can also decouple the errors of part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former and building on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole interaction scheme using masked cross attention to further boost part segmentation quality. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% in GFlops and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.