随着人工智能(AI)的积极发展,基于深神经网络(DNN)的智能应用会改变人们的生活方式和生产效率。但是,从网络边缘生成的大量计算和数据成为主要的瓶颈,传统的基于云的计算模式无法满足实时处理任务的要求。为了解决上述问题,通过将AI模型训练和推理功能嵌入网络边缘,Edge Intelligence(EI)成为AI领域的尖端方向。此外,云,边缘和终端设备之间的协作DNN推断提供了一种有希望的方法来增强EI。然而,目前,以EI为导向的协作DNN推断仍处于早期阶段,缺乏对现有研究工作的系统分类和讨论。因此,我们已经对有关以EI为导向的协作DNN推断的最新研究进行了全面调查。在本文中,我们首先回顾了EI的背景和动机。然后,我们为EI分类了四个典型的DNN推理范例,并分析其特征和关键技术。最后,我们总结了协作DNN推断的当前挑战,讨论未来的发展趋势并提供未来的研究方向。
translated by 谷歌翻译
未来的互联网涉及几种新兴技术,例如5G和5G网络,车辆网络,无人机(UAV)网络和物联网(IOT)。此外,未来的互联网变得异质并分散了许多相关网络实体。每个实体可能需要做出本地决定,以在动态和不确定的网络环境下改善网络性能。最近使用标准学习算法,例如单药强化学习(RL)或深入强化学习(DRL),以使每个网络实体作为代理人通过与未知环境进行互动来自适应地学习最佳决策策略。但是,这种算法未能对网络实体之间的合作或竞争进行建模,而只是将其他实体视为可能导致非平稳性问题的环境的一部分。多机构增强学习(MARL)允许每个网络实体不仅观察环境,还可以观察其他实体的政策来学习其最佳政策。结果,MAL可以显着提高网络实体的学习效率,并且最近已用于解决新兴网络中的各种问题。在本文中,我们因此回顾了MAL在新兴网络中的应用。特别是,我们提供了MARL的教程,以及对MARL在下一代互联网中的应用进行全面调查。特别是,我们首先介绍单代机Agent RL和MARL。然后,我们回顾了MAL在未来互联网中解决新兴问题的许多应用程序。这些问题包括网络访问,传输电源控制,计算卸载,内容缓存,数据包路由,无人机网络的轨迹设计以及网络安全问题。
translated by 谷歌翻译
本文开发了一个通用框架,用于通过核规范正则化估计高维条件因子模型。我们建立了估计器的较大样本属性,并提供了用于查找估计器的有效计算算法以及选择正则化参数的交叉验证程序。一般框架使我们能够以统一的方式估算各种条件因素模型,并迅速提供新的渐近结果。我们采用该方法来分析单个美国股票收益的横截面,并发现施加同质性可以改善模型的样本外可预测性。
translated by 谷歌翻译
在大多数交互式图像生成任务中,用户感兴趣的区域(ROI),预计生成的结果将具有足够的外观,同时保持原始图像中正确且合理的结构。如果只有有限的数据,此类任务将变得更具挑战性。最近提出的生成模型仅基于一个图像完成培训。他们非常关注样本的整体特征,同时忽略样本中不同对象的实际语义信息。结果,对于基于ROI的生成任务,它们可能会产生过多随机性的不适当样品,而无需维护相关对象的正确结构。为了解决这个问题,这项工作介绍了名为Mogan的形态结构感知的生成对抗网络,该网络仅基于一个图像而产生具有不同外观和可靠结构的随机样品。对于ROI的培训,我们建议利用来自原始图像的数据,并引入新型模块将这种增强数据转换为包含结构和外观的知识,从而增强了模型对样品的理解。要学习除ROI以外的其他领域,我们采用二进制面具来确保与ROI隔离的一代。最后,我们设置了上述学习过程的平行和分层分支。与其他单一图像GAN方案相比,我们的方法着重于内部特征,包括维持理性结构和外观变化。实验比其竞争同行确认了我们模型对基于ROI的图像生成任务的能力。
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
translated by 谷歌翻译
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
translated by 谷歌翻译
Nearest-Neighbor (NN) classification has been proven as a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false prediction if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation which is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support data based on its similarity to different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show our efficient non-learning based method can outperform or at least comparable to SOTA methods which need additional learning steps.
translated by 谷歌翻译
In this tutorial paper, we look into the evolution and prospect of network architecture and propose a novel conceptual architecture for the 6th generation (6G) networks. The proposed architecture has two key elements, i.e., holistic network virtualization and pervasive artificial intelligence (AI). The holistic network virtualization consists of network slicing and digital twin, from the aspects of service provision and service demand, respectively, to incorporate service-centric and user-centric networking. The pervasive network intelligence integrates AI into future networks from the perspectives of networking for AI and AI for networking, respectively. Building on holistic network virtualization and pervasive network intelligence, the proposed architecture can facilitate three types of interplay, i.e., the interplay between digital twin and network slicing paradigms, between model-driven and data-driven methods for network management, and between virtualization and AI, to maximize the flexibility, scalability, adaptivity, and intelligence for 6G networks. We also identify challenges and open issues related to the proposed architecture. By providing our vision, we aim to inspire further discussions and developments on the potential architecture of 6G.
translated by 谷歌翻译