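Convolutional neural networks (CNNs) have been deployed on many Internet of Things (IoT) devices for a variety of downstream tasks. However, as the volume of data on edge devices grows, CNNs running with limited computing and storage resources can hardly complete some tasks in time. Recently, filter pruning has been regarded as an effective technique for compressing and accelerating CNNs, but few existing methods prune CNNs from the perspective of compressing high-dimensional tensors. In this paper, we propose a novel theory for finding redundant information in three-dimensional tensors, namely Quantified Similarity between Feature Maps (QSFM), and use it to guide the filter pruning procedure. We evaluate QSFM on datasets (CIFAR-10, CIFAR-100, and ILSVRC-12) and on edge devices, demonstrating that the proposed method can find redundant information in neural networks with comparable compression and a tolerable drop in accuracy. Without any fine-tuning, QSFM can significantly compress ResNet-56 on CIFAR-10 (reducing FLOPs by 48.7% and parameters by 57.9%) with only a 0.54% loss in top-1 accuracy. For practical applications on edge devices, QSFM can accelerate MobileNet-V2 inference by a factor of 1.53 with only a 1.23% loss in ILSVRC-12 top-1 accuracy.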
translated by Google Translate
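The abstract does not spell out how QSFM quantifies similarity, so the following is only a minimal sketch of the general idea of similarity-guided filter pruning. It assumes cosine similarity between flattened feature maps as a stand-in measure and a hypothetical `threshold` parameter; it is not the paper's QSFM metric. Each filter produces one channel of the three-dimensional feature tensor, and a filter whose map nearly duplicates one already kept is flagged as a pruning candidate.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two flattened feature maps (stand-in measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def redundant_filters(feature_maps: List[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Indices of filters whose output map nearly duplicates a kept filter's map.

    feature_maps[i] is the flattened H*W output of filter i (one channel of the
    3-D feature tensor). Filters are scanned in order; a filter is marked
    redundant if its map is at least `threshold`-similar to a filter already kept.
    """
    kept: List[int] = []
    redundant: List[int] = []
    for i, fmap in enumerate(feature_maps):
        if any(cosine_similarity(fmap, feature_maps[j]) >= threshold for j in kept):
            redundant.append(i)
        else:
            kept.append(i)
    return redundant
```

With `maps = [[1, 0, 2, 0], [2, 0, 4, 0], [0, 1, 0, 1], [0, 3, 0, 3.1]]`, channel 1 is a scaled copy of channel 0 and channel 3 is almost a scaled copy of channel 2, so `redundant_filters(maps)` flags `[1, 3]`. The greedy scan keeps the first member of each near-duplicate group; a practical method would likely aggregate similarity over many input samples rather than a single map.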
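Conventional few-shot classification (FSC) aims to recognize samples from novel classes given limited labeled data. Recently, domain-generalized FSC (DG-FSC) has been proposed with the goal of recognizing novel-class samples from unseen domains. DG-FSC poses considerable challenges to many models due to the domain shift between base classes (used in training) and novel classes (encountered in evaluation). In this work, we make two novel contributions to tackle DG-FSC. Our first contribution is to propose Born-Again Network (BAN) episodic training and to comprehensively investigate its effectiveness for DG-FSC. As a specific form of knowledge distillation, BAN has been shown to improve generalization in conventional supervised classification with a closed-set setup. This improved generalization motivates us to study BAN for DG-FSC, and we show that BAN is promising for addressing the domain shift encountered in DG-FSC. Building on these encouraging findings, our second (major) contribution is to propose Few-Shot BAN (FS-BAN), a novel BAN approach for DG-FSC. Our proposed FS-BAN includes novel multi-task learning objectives: Mutual Regularization, Mismatched Teacher, and Meta-Control Temperature, each specifically designed to overcome the central and unique challenges in DG-FSC, namely overfitting and domain discrepancy. We analyze different design choices for these techniques. We conduct comprehensive quantitative and qualitative analysis and evaluation using six datasets and three baseline models. The results show that our proposed FS-BAN consistently improves the generalization performance of the baseline models and achieves state-of-the-art accuracy for DG-FSC.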
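Born-again networks are a form of knowledge distillation in which the student has the same architecture as its teacher. As a rough illustration of the distillation signal such training relies on, here is the standard temperature-scaled KD loss: hard-label cross-entropy plus a T²-scaled KL term between the softened teacher and student distributions. The weighting `alpha` and the `temperature` value are illustrative assumptions, and FS-BAN's specific objectives (Mutual Regularization, Mismatched Teacher, Meta-Control Temperature) are not reproduced here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    alpha: float = 0.5,
    temperature: float = 4.0,
) -> float:
    """alpha * CE(label, student) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label term at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: the student matches the teacher's softened distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Scaling by T^2 keeps the soft-term gradients on a comparable scale as T changes.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl
```

In a born-again setup, the logits of an already trained generation would play the role of `teacher_logits` while training the next, identically structured generation.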
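Salient object detection (SOD) is a popular and important topic that aims to precisely detect and segment the interesting regions in an image. We integrate linguistic information into a vision-based U-structure network designed for the salient object detection task. The experiments are based on the newly created DUTS Cross Modal (DUTS-CM) dataset, which contains both visual and linguistic labels. We propose a new module called Efficient Cross-Modal Self-Attention (ECMSA) to combine visual and linguistic features and improve the performance of the original U-structure network. Meanwhile, to reduce the heavy burden of labeling, we adopt a semi-supervised learning approach by training an image captioning model on the DUTS-CM dataset, which can automatically label other datasets such as DUT-OMRON and HKU-IS. Comprehensive experiments show that SOD performance can be improved with natural language input and is competitive with other SOD methods.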
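The abstract does not describe ECMSA's internals. As a minimal sketch of the general mechanism it builds on, the code below shows plain scaled dot-product cross-attention in which visual tokens act as queries over language tokens, followed by a residual addition. Learned query/key/value projections, multiple heads, and whatever makes ECMSA "efficient" are omitted; the token layout (one feature vector per row, same dimension for both modalities) is an assumption.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(visual: Matrix, text: Matrix) -> Matrix:
    """Fuse language features into visual tokens.

    visual: N x d visual tokens (queries); text: M x d word features (keys and values).
    Returns N x d features: each visual token plus an attention-weighted sum of words.
    """
    d = len(visual[0])
    fused: Matrix = []
    for query in visual:
        # Similarity of this visual token to every word, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in text]
        weights = softmax(scores)
        # Weighted sum of word features (values = keys here, for simplicity).
        attended = [sum(w * value[j] for w, value in zip(weights, text)) for j in range(d)]
        # Residual connection keeps the original visual signal.
        fused.append([v + a for v, a in zip(query, attended)])
    return fused
```

With a single word token, every visual token attends to it with weight 1, so `cross_attention([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]])` simply adds `[0.5, 0.5]` to each visual row.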
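This work investigates the compatibility between label smoothing (LS) and knowledge distillation (KD). Contemporary findings addressing this thesis statement take dichotomous standpoints: Muller et al. (2019) and Shen et al. (2021b). Critically, there has been no effort to understand and resolve these contradictory findings, leaving the primal question (to smooth or not to smooth a teacher network?) unanswered. The main contributions of our work are the discovery, analysis, and validation of systematic diffusion as the missing concept that is instrumental in understanding and resolving these contradictory findings. This systematic diffusion essentially curtails the benefits of distilling from an LS-trained teacher, thereby rendering KD at increased temperatures ineffective. Our discovery is comprehensively supported by large-scale experiments, analyses, and case studies, including image classification, neural machine translation, and compact student distillation tasks, spanning multiple datasets and teacher-student architectures. Based on our analysis, we suggest that practitioners use an LS-trained teacher with a low-temperature transfer to achieve high-performance students. Code and models are available at https://keshik6.github.io/revisiting-ls-kd-compatibility/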
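For readers less familiar with the two ingredients being combined, here is a small sketch of label-smoothed targets, (1 - ε)·one_hot + ε/K, and of the temperature-scaled softmax used when transferring a teacher's outputs. It does not model the systematic diffusion effect the paper identifies; `epsilon`, the logits, and the temperatures are illustrative values. It does show why temperature matters: raising T flattens the teacher's distribution (higher entropy), while a low-temperature transfer keeps it sharper.

```python
import math
from typing import List, Sequence


def smooth_labels(label: int, num_classes: int, epsilon: float) -> List[float]:
    """Label-smoothed target: (1 - epsilon) * one_hot(label) + epsilon / K."""
    uniform = epsilon / num_classes
    return [uniform + (1.0 - epsilon if k == label else 0.0) for k in range(num_classes)]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax (numerically stabilized by subtracting the max)."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: Sequence[float], probs: Sequence[float]) -> float:
    """Teacher training loss against (possibly smoothed) targets."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; larger means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

For example, `smooth_labels(0, 4, 0.1)` is approximately `[0.925, 0.025, 0.025, 0.025]`, and for non-uniform logits such as `[3.0, 1.0, 0.0]`, `entropy(softmax(logits, 4.0))` exceeds `entropy(softmax(logits, 1.0))`.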
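In this paper, we propose two techniques, namely joint modeling and data augmentation, to improve system performance on audio-visual scene classification (AVSC). We employ networks pre-trained only on image datasets to extract video embeddings, whereas the audio embedding models are trained from scratch. We explore different neural network architectures for joint modeling to effectively combine the video and audio modalities. Moreover, data augmentation strategies are investigated to increase the size of the audio-visual training set. For the video modality, the effectiveness of several operations in RandAugment is verified. An audio-video joint mixup scheme is proposed to further improve AVSC performance. Evaluated on the development set of TAU Urban Audio-Visual Scenes 2021, our final system achieves the best accuracy of 94.2% among all single AVSC systems submitted to DCASE 2021 Task 1b.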
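A joint mixup scheme for two modalities can be sketched as ordinary mixup in which a single mixing coefficient λ ~ Beta(α, α) is shared by the audio features, the video features, and the label, so the mixed pair stays consistent across modalities. This is a minimal sketch under that assumption; the paper's exact scheme, the value of `alpha`, and the feature layout are not given in the abstract.

```python
import random
from typing import List, Optional, Sequence, Tuple

# (audio features, video features, one-hot label)
Sample = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def _mix(a: Sequence[float], b: Sequence[float], lam: float) -> List[float]:
    """Convex combination lam * a + (1 - lam) * b, element-wise."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def joint_mixup(
    first: Sample,
    second: Sample,
    alpha: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float], List[float], float]:
    """Mix two audio-visual samples with ONE shared lambda for audio, video, and label."""
    rng = rng if rng is not None else random.Random()
    lam = rng.betavariate(alpha, alpha)
    audio = _mix(first[0], second[0], lam)
    video = _mix(first[1], second[1], lam)
    label = _mix(first[2], second[2], lam)
    return audio, video, label, lam
```

Sharing λ across modalities, instead of drawing one per modality, is what keeps the mixed audio, the mixed video, and the soft label describing the same blended scene.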
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay-based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains makes it possible to leverage and consolidate information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose the Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diverse relations. In MGTAB, we used as user features the 20 user property features with the greatest information gain, together with user tweet features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and outline potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
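Selecting "the 20 user property features with the greatest information gain" can be illustrated with a small sketch of information gain, IG(Y; X) = H(Y) - Σ_v P(X = v)·H(Y | X = v), for discrete features. The feature names and toy labels used with it are hypothetical, and continuous MGTAB properties would first need to be discretized (an assumption; the abstract does not say how this was done).

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(
    features: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable], k: int
) -> List[str]:
    """Names of the k features with the highest information gain about the labels."""
    ranked = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
    return ranked[:k]
```

On a toy set with labels `["bot", "bot", "human", "human"]`, a hypothetical feature `verified = [0, 0, 1, 1]` separates the classes perfectly (1 bit of gain), while `noise = [0, 1, 0, 1]` carries none, so `top_k_features` with `k = 1` returns `["verified"]`.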
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite considerable improvements in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are related. The results show that R2RISE successfully distinguishes important frames in the demonstrations.
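The masking idea behind R2RISE follows RISE: sample random binary masks, score each masked input, and accumulate return-weighted masks into an importance map. Below is a minimal sketch over demonstration frames. The expensive step, retraining the IL model on the kept frames and measuring its environment return, is abstracted as a hypothetical `evaluate(mask)` callback; `num_masks`, `keep_prob`, and the toy evaluator are illustrative assumptions.

```python
import random
from typing import Callable, List


def frame_importance(
    num_frames: int,
    evaluate: Callable[[List[int]], float],
    num_masks: int = 200,
    keep_prob: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """RISE-style importance score for each demonstration frame.

    evaluate(mask) is a hypothetical stand-in for: retrain the black-box IL model
    on the frames with mask[i] == 1, then report its environment return.
    """
    rng = random.Random(seed)
    importance = [0.0] * num_frames
    for _ in range(num_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        ret = evaluate(mask)
        # Each kept frame is credited with the return obtained while it was present.
        for i, kept in enumerate(mask):
            importance[i] += ret * kept
    # Normalize by the expected number of masks that keep any given frame.
    return [v / (num_masks * keep_prob) for v in importance]
```

With a toy evaluator in which only frame 2 matters, `frame_importance(5, lambda m: 10.0 * m[2])` gives frame 2 the largest score (close to 10), while the other frames receive roughly half of that, because they only share credit when frame 2 happens to be kept as well.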
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical to improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e., blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e., flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with low computational cost and higher consistency with human visual perception. For temporal artifacts, we improve the self-attention-based TimeSFormer to detect them. Based on the six types of PEAs, we propose a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM). Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers distributionally robust MDPs and distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. Assuming that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive a tractable reformulation of our model. In particular, we show that the return-risk model can also account for risk from an uncertain transition kernel when one seeks only deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. We design a scalable first-order algorithm to solve large-scale problems and demonstrate the advantages of our proposed model and algorithm through numerical experiments.
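For intuition about the objective only, here is a sketch that evaluates a weighted average of the mean and a lower percentile of sampled returns, w·E[R] + (1 - w)·q_α(R), for a fixed policy. It assumes an empirical quantile convention (the smallest sample with at least an α fraction of samples at or below it) and hypothetical parameter names `weight` and `alpha`; it involves neither the Wasserstein ambiguity set nor the paper's reformulation or first-order algorithm.

```python
import math
from typing import Sequence


def lower_percentile(returns: Sequence[float], alpha: float) -> float:
    """Empirical alpha-quantile: smallest sample with >= alpha of the samples at or below it."""
    ordered = sorted(returns)
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[index]


def return_risk(returns: Sequence[float], weight: float, alpha: float) -> float:
    """weight * mean + (1 - weight) * alpha-percentile of the sampled returns."""
    mean = sum(returns) / len(returns)
    return weight * mean + (1.0 - weight) * lower_percentile(returns, alpha)
```

For the returns `[4.0, 1.0, 3.0, 2.0, 8.0, 7.0, 6.0, 5.0]` with `alpha = 0.25`, the mean is 4.5 and the 25th percentile is 2.0, so `weight = 0.5` yields 3.25. The extremes `weight = 1` and `weight = 0` isolate the mean and percentile terms of the weighted average, respectively.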