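Task-oriented communication, mostly using learning-based joint source-channel coding (JSCC), aims to design communication-efficient edge inference systems by transmitting task-relevant information to the receiver. However, transmitting only task-relevant information without introducing any redundancy may cause robustness issues in learning due to channel variations, and JSCC, which maps the source data directly into continuous channel input symbols, poses compatibility issues with existing digital communication systems. In this paper, we address these two issues by first investigating the inherent tradeoff between the informativeness of the encoded representations and their robustness to information distortion in the received representations, and then proposing a task-oriented communication scheme with digital modulation, named discrete task-oriented JSCC (DT-JSCC), in which the transmitter encodes the features into a discrete representation and transmits it to the receiver using a digital modulation scheme. In the DT-JSCC scheme, we develop a robust encoding framework, named robust information bottleneck (RIB), to improve robustness to channel variations, and we use variational approximation to derive a tractable variational upper bound of the RIB objective, which overcomes the computational intractability of mutual information. Experimental results show that the proposed DT-JSCC achieves better inference performance than the baseline methods with low communication latency, and exhibits robustness to channel variations owing to the applied RIB framework.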
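Recently, the artificial intelligence of things (AIoT) has been gaining increasing attention, with the intriguing vision of providing highly intelligent services through the network connection of things, leading to an advanced AI-driven ecology. However, recent regulatory restrictions on data privacy preclude uploading sensitive local data to data centers and using them in a centralized way. In this setting, directly applying federated learning algorithms can hardly meet industrial requirements for both efficiency and accuracy. We therefore propose an efficient industrial federated learning framework for AIoT in the context of a face recognition application. Specifically, we propose to use the concept of transfer learning to speed up federated training on devices, and we further present a novel design of a private projector that helps protect shared gradients without incurring additional memory consumption or computational cost. Empirical studies on a private Asian face dataset show that our approach achieves high recognition accuracy in only 20 communication rounds, demonstrating its effectiveness in prediction and its efficiency in training.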
The click-through rate (CTR) prediction task is to predict whether a user will click on the recommended item. As mind-boggling amounts of data are produced online daily, accelerating CTR prediction model training is critical to keeping the model up to date and reducing the training cost. One approach to increasing the training speed is large batch training. However, as shown in computer vision and natural language processing tasks, training with a large batch easily suffers from a loss of accuracy. Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks. To tackle this problem, we first show theoretically that the different frequencies of ids make it challenging to scale hyperparameters when scaling the batch size. To stabilize the training process in a large batch size setting, we develop adaptive Column-wise Clipping (CowClip). It enables an easy and effective scaling rule for the embeddings, which keeps the learning rate unchanged and scales the L2 loss. We conduct extensive experiments with four CTR prediction networks on two real-world datasets and successfully scale the batch size to 128 times its original value without accuracy loss. In particular, when training the DeepFM CTR prediction model on the Criteo dataset, our optimization framework enlarges the batch size from 1K to 128K with over 0.1% AUC improvement and reduces training time from 12 hours to 10 minutes on a single V100 GPU. Our code is available at https://github.com/bytedance/LargeBatchCTR.
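The idea of clipping embedding gradients per id, rather than with one global norm, can be illustrated with a minimal NumPy sketch. This is not the authors' CowClip implementation: the function name `clip_embedding_grad`, the parameters `ratio` and `floor`, and the threshold rule (proportional to each id's own weight norm, with a lower bound) are assumptions made for illustration only.

```python
import numpy as np

def clip_embedding_grad(weights, grads, ratio=1.0, floor=1e-3):
    """Clip each id's gradient vector separately (illustrative only).

    weights, grads: arrays of shape (num_ids, dim), one row per id.
    The per-id threshold max(ratio * ||w_id||, floor) is an assumption,
    not a rule taken from the abstract.
    """
    w_norm = np.linalg.norm(weights, axis=1, keepdims=True)
    g_norm = np.linalg.norm(grads, axis=1, keepdims=True)
    threshold = np.maximum(ratio * w_norm, floor)
    # The scale factor is 1 where the gradient is already within the threshold.
    scale = np.minimum(1.0, threshold / np.maximum(g_norm, 1e-12))
    return grads * scale
```

Because the threshold is computed per row, a rarely seen id with a small embedding is not rescaled by the large gradients of frequent ids, which is the kind of frequency imbalance the abstract points to.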
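Owing to limited communication resources at the clients and a massive number of model parameters, large-scale distributed learning tasks suffer from a communication bottleneck. Gradient compression is an effective way to reduce the communication load by transmitting compressed gradients. Motivated by the fact that, under stochastic gradient descent, gradients in adjacent rounds may be highly correlated because they aim to learn the same model, this paper proposes a practical gradient compression scheme for federated learning that uses historical gradients to compress gradients and is based on Wyner-Ziv coding, but without any probabilistic assumption. We also implement our gradient quantization method on a real dataset, and it outperforms previous schemes.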
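As a rough illustration of how a correlated historical gradient can serve as decoder-side information, the following sketch uses a textbook-style nested scalar quantizer. It is not the scheme proposed in the paper: the functions `encode` and `decode` and the parameters `step` and `num_cosets` are hypothetical, and the sketch assumes the current and previous gradients differ by less than about `(num_cosets / 2 - 0.5) * step` in every coordinate.

```python
import numpy as np

def encode(grad, step, num_cosets):
    """Quantize to a uniform grid and keep only the index modulo num_cosets."""
    index = np.rint(grad / step).astype(np.int64)
    return index % num_cosets  # values in [0, num_cosets)

def decode(coset, side_info, step, num_cosets):
    """Pick the grid point in the signalled coset closest to the side information."""
    s = side_info / step
    k = coset + num_cosets * np.rint((s - coset) / num_cosets)
    return k * step
```

With `num_cosets = 16`, each coordinate costs 4 bits regardless of the gradient's dynamic range, and the encoder never needs to see the previous gradient; only the decoder uses it to resolve the ambiguity.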
Scaling up deep neural network capacity is known to be an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other tasks. To address the need for efficient and task-independent model parallelism, we introduce GPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers. By pipelining different sub-sequences of layers on separate accelerators, GPipe provides the flexibility to scale a variety of different networks to gigantic sizes efficiently. Moreover, GPipe utilizes a novel batch-splitting pipelining algorithm, resulting in almost linear speedup when a model is partitioned across multiple accelerators. We demonstrate the advantages of GPipe by training large-scale neural networks on two different tasks with distinct network architectures: (i) Image Classification: We train a 557-million-parameter AmoebaNet model and attain a top-1 accuracy of 84.4% on ImageNet-2012, (ii) Multilingual Neural Machine Translation: We train a single 6-billion-parameter, 128-layer Transformer model on a corpus spanning over 100 languages and achieve better quality than all bilingual models. Preprint. Under review.
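Why splitting a batch into micro-batches helps a pipeline can be seen from a small scheduling sketch. This is not GPipe's API: `pipeline_schedule` and `utilization` are hypothetical helpers, and the model assumes a forward pass only, stages of equal cost, and no communication overhead.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return, per time step, the (stage, micro-batch) pairs that are active."""
    steps = []
    for t in range(num_stages + num_microbatches - 1):
        # Stage s works on micro-batch t - s, if that micro-batch exists.
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_microbatches]
        steps.append(active)
    return steps

def utilization(num_stages, num_microbatches):
    """Fraction of stage-time slots doing useful work under the assumptions above."""
    total_slots = num_stages * (num_stages + num_microbatches - 1)
    return num_stages * num_microbatches / total_slots
```

Under these assumptions, 4 stages with a single undivided batch keep each accelerator busy a quarter of the time, while 8 micro-batches raise that to 8/11; more micro-batches push the figure toward 1, which is consistent with the near-linear speedup the abstract reports.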
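Feature selection is an essential step in developing industry-scale deep click-through rate (CTR) prediction systems. The goal of neural feature selection (NFS) is to choose a relatively small subset of features with the best explanatory power, as a means of removing redundant features and reducing computational cost. Inspired by gradient-based neural architecture search (NAS) and network pruning methods, prior work has tackled the NFS problem with the Gating approach, which inserts a set of differentiable binary gates to drop less informative features. The binary gates are optimized together with the network parameters in an efficient end-to-end manner. In this paper, we analyze the gradient-based solution from an exploration-exploitation perspective and use empirical results to show that the Gating approach may suffer from insufficient exploration. To improve the exploration capacity of gradient-based solutions, we propose a simple but effective ensemble learning approach, named Ensemble Gating. We evaluate this approach on two public datasets, Avazu and Criteo. Our experiments show that, without adding any computational overhead or introducing any hyperparameter (other than the size of the ensemble), our method consistently improves on the Gating approach and finds a better feature subset on both datasets with three different underlying deep CTR prediction models.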
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
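The simplex equiangular tight frame mentioned above has a standard closed form, which the following NumPy sketch constructs. The function name `simplex_etf` is hypothetical, and the sketch builds the frame in a K-dimensional ambient space without the optional rotation; it illustrates the target geometry only, not the regularizer proposed in the paper.

```python
import numpy as np

def simplex_etf(num_classes):
    """Return a K x K matrix whose columns form a simplex equiangular tight frame."""
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    # The scaling makes every column unit-norm.
    return np.sqrt(k / (k - 1)) * centering
```

Each column has unit norm and every pair of distinct columns has cosine similarity -1/(K-1), the most negative value that K vectors can share pairwise; this is the "equiangular and maximally separated" structure that class imbalance and contextual correlation are said to break.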
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only image-level labels. Most existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet they ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Besides, the use of CAM also brings a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma between classification and localization performance.
Witnessing the impressive achievements of pre-training techniques on large-scale data in the fields of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem in visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive amounts of information irrelevant to decision making, which makes predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pre-training in visuomotor driving. We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on the current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving-policy-related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios demonstrate the superiority of our proposed approach, with improvements ranging from 2% to over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.