In recent years, the exponential proliferation of smart devices and their intelligent applications has posed severe challenges to conventional cellular networks. Such challenges can potentially be overcome by integrating communication, computing, caching, and control (i4C) technologies. In this survey, we first give a snapshot of different aspects of i4C, comprising background, motivation, leading technological enablers, potential applications, and use cases. Next, we describe different models of communication, computing, caching, and control (4C) to lay the foundation of the integration approach. We review current state-of-the-art research efforts related to i4C, focusing on recent trends in both conventional and artificial intelligence (AI)-based integration approaches. We also highlight the need for intelligence in resource integration. Then, we discuss the integration of sensing and communication (ISAC) and classify the integration approaches into various classes. Finally, we propose open challenges and present future research directions for beyond-5G networks, such as 6G.
Translated by Google Translate
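Assuming the source label space subsumes the target one, Partial Video Domain Adaptation (PVDA) is a more general and practical scenario for cross-domain video classification. The key challenge of PVDA is to mitigate the negative transfer caused by source-only outlier classes. To tackle this challenge, a crucial step is to aggregate target predictions and assign class weights by up-weighting target classes and down-weighting outlier classes. However, incorrect class weight predictions can mislead the network and lead to negative transfer. Previous works improve class weight accuracy by using temporal features and attention mechanisms, but these methods may fall short of producing accurate class weights when the domain shift is significant, as in most real-world scenarios. To address these challenges, we propose the Multi-modality Cluster-calibrated partial Adversarial Network (MCAN). MCAN enhances video feature extraction with multi-modal features from multiple temporal scales to form more robust overall features. It uses a novel class weight calibration method to alleviate the negative transfer caused by incorrect class weights. The calibration method tries to identify and weigh correct and incorrect predictions using the distributional information implied by unsupervised clustering. Extensive experiments are conducted on prevailing PVDA benchmarks, and the proposed MCAN achieves significant improvements over state-of-the-art PVDA methods.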
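The class-weighting step described above can be illustrated with a minimal sketch. It covers only the generic aggregation idea (average the target predictions per class, then normalize), not MCAN's cluster-based calibration. The function name `class_weights` and the sample probabilities are hypothetical, chosen purely for illustration.

```python
def class_weights(target_probs):
    """Average per-class probabilities over target samples,
    then scale so that the largest weight equals 1."""
    num_classes = len(target_probs[0])
    mean = [
        sum(p[c] for p in target_probs) / len(target_probs)
        for c in range(num_classes)
    ]
    peak = max(mean)
    return [m / peak for m in mean]


# Hypothetical softmax outputs for three target videos over four source classes.
# Class 3 plays the role of a source-only outlier class.
target_probs = [
    [0.70, 0.20, 0.08, 0.02],
    [0.10, 0.80, 0.07, 0.03],
    [0.60, 0.30, 0.09, 0.01],
]
weights = class_weights(target_probs)
print(weights)
```

Classes that target samples are rarely assigned to receive small weights, so their source samples contribute less to adaptation. If the underlying predictions are wrong, the weights are wrong as well, which is the failure mode the abstract attributes to large domain shifts.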
Domain adaptation (DA) approaches address domain shift and enable networks to be applied to different scenarios. Although various image DA approaches have been proposed in recent years, there is limited research on video DA. This is partly due to the complexity of adapting the different modalities of features in videos, which include the correlation features extracted as long-term dependencies of pixels across spatiotemporal dimensions. The correlation features are highly associated with action classes and have proven effective for accurate video feature extraction in the supervised action recognition task. Yet correlation features of the same action differ across domains due to domain shift. We therefore propose a novel Adversarial Correlation Adaptation Network (ACAN) to align action videos by aligning pixel correlations. ACAN aims to minimize the discrepancy between the distributions of correlation information across domains, termed Pixel Correlation Discrepancy (PCD). Additionally, video DA research is limited by the lack of cross-domain video datasets with larger domain shifts. We therefore introduce a novel HMDB-ARID dataset with a larger domain shift caused by a larger statistical difference between domains. This dataset is built in an effort to leverage current datasets for dark video classification. Empirical results demonstrate the state-of-the-art performance of our proposed ACAN on both existing and the new video DA datasets.
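Innovations in neural architectures have fostered significant breakthroughs in language modeling and computer vision. Unfortunately, novel architectures often result in challenging hyperparameter choices and training instability if the network parameters are not properly initialized. A number of architecture-specific initialization schemes have been proposed, but these schemes are not always portable to new architectures. This paper presents GradInit, an automated and architecture-agnostic method for initializing neural networks. GradInit is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam with prescribed hyperparameters results in the smallest possible loss value. This adjustment is done by introducing a scalar multiplier variable in front of each parameter block and then optimizing these variables using a simple numerical scheme. GradInit accelerates the convergence and improves the test performance of many convolutional architectures, both with and without skip connections, and even without normalization layers. It also improves the stability of the original Transformer architecture for machine translation, enabling it to be trained without learning rate warmup using either Adam or SGD under a wide range of learning rates and momentum coefficients. Code is available at https://github.com/zhuchen03/gradinit.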
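Data augmentation helps neural networks generalize better by enlarging the training set, but how to effectively augment graph data to enhance the performance of GNNs (Graph Neural Networks) remains an open question. While most existing graph regularizers focus on manipulating graph topological structures by adding or removing edges, we offer a method that augments node features for better performance. We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. By making the model invariant to small fluctuations in the input data, our method helps models generalize to out-of-distribution samples and boosts model performance at test time. FLAG is a general-purpose approach for graph data that works across node classification, link prediction, and graph classification tasks. FLAG is also highly flexible and scalable, and is deployable with arbitrary GNN backbones and large-scale datasets. We demonstrate the efficacy and stability of our method through extensive experiments and ablation studies. We also provide intuitive observations for a deeper understanding of our method.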
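As a rough illustration of gradient-based adversarial feature augmentation, the sketch below perturbs input features in the direction that increases the loss while accumulating parameter gradients over several ascent steps. It is not the authors' implementation: the linear model, squared-error loss, sign-gradient ascent, and all names (`loss_and_grads`, `adversarial_feature_step`, `step_size`, `ascent_steps`) are assumptions made to keep the example self-contained, and no graph structure is modeled.

```python
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_and_grads(w, x, y):
    """Squared error of a linear model, with gradients
    with respect to the weights and the input features."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    grad_w = [2 * err * xi for xi in x]
    grad_x = [2 * err * wi for wi in w]
    return err * err, grad_w, grad_x


def adversarial_feature_step(w, x, y, ascent_steps=3, step_size=0.01,
                             lr=0.1, rng=None):
    """One training step: perturb the features adversarially for several
    ascent steps, average the weight gradients, then update the weights."""
    rng = rng or random.Random(0)
    # Start from a small random perturbation of the features.
    delta = [rng.uniform(-step_size, step_size) for _ in x]
    acc = [0.0] * len(w)
    for _ in range(ascent_steps):
        perturbed = [xi + di for xi, di in zip(x, delta)]
        _, grad_w, grad_x = loss_and_grads(w, perturbed, y)
        # Accumulate the averaged weight gradient across ascent steps.
        acc = [a + g / ascent_steps for a, g in zip(acc, grad_w)]
        # Move the perturbation in the direction that increases the loss.
        delta = [di + step_size * sign(g) for di, g in zip(delta, grad_x)]
    new_w = [wi - lr * a for wi, a in zip(w, acc)]
    return new_w, delta


# Hypothetical toy data: one sample with two features.
w, x, y = [0.5, -0.3], [1.0, 2.0], 1.0
new_w, delta = adversarial_feature_step(w, x, y)
```

The weight update uses gradients computed on perturbed features, so the model is pushed to fit the sample under small worst-case fluctuations rather than only at the clean input.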
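The task of action recognition in dark videos is useful in various scenarios, such as night surveillance and self-driving at night. Although progress has been made on action recognition for videos under normal illumination, few have studied action recognition in the dark. This is partly due to the lack of sufficient datasets for such a task. In this paper, we explore the task of action recognition in dark videos. We bridge the data gap for this task by collecting a new dataset: the Action Recognition in the Dark (ARID) dataset. It consists of over 3,780 video clips with 11 action categories. To the best of our knowledge, it is the first dataset focused on human actions in dark videos. To gain further understanding of our ARID dataset, we analyze it in detail and demonstrate its necessity over synthetic dark videos. Additionally, we benchmark the performance of several current action recognition models on our dataset and explore potential methods for improving their performance. Our results show that current action recognition models and frame enhancement methods may not be effective solutions for action recognition in dark videos.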
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous works on KG completion and CKG completion suffer from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness considerations. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
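The teacher-student setup mentioned above is commonly trained with a soft-label distillation loss. The sketch below shows that generic loss only (KL divergence between temperature-softened teacher and student distributions). It does not represent RELIANT or any fairness mechanism, and the function names, temperature value, and logits are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution
    to the softened student distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical logits for one node over three classes.
teacher = [2.0, 0.5, -1.0]
student = [0.2, 0.1, 0.0]
loss = distillation_loss(student, teacher)
print(loss)
```

Minimizing this loss pulls the student's predictions toward the teacher's, which is also why any bias encoded in the teacher's outputs can carry over to the student.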