Current methods for few-shot action recognition mainly fall into the metric learning framework following ProtoNet. However, they either ignore the effect of representative prototypes or fail to enhance the prototypes with multimodal information adequately. In this work, we propose a novel Multimodal Prototype-Enhanced Network (MORN) to use the semantic information of label texts as multimodal information to enhance prototypes, including two modality flows. A CLIP visual encoder is introduced in the visual flow, and visual prototypes are computed by the Temporal-Relational CrossTransformer (TRX) module. A frozen CLIP text encoder is introduced in the text flow, and a semantic-enhanced module is used to enhance text features. After inflating, text prototypes are obtained. The final multimodal prototypes are then computed by a multimodal prototype-enhanced module. Besides, there exist no evaluation metrics to evaluate the quality of prototypes. To the best of our knowledge, we are the first to propose a prototype evaluation metric called Prototype Similarity Difference (PRIDE), which is used to evaluate the performance of prototypes in discriminating different categories. We conduct extensive experiments on four popular datasets. MORN achieves state-of-the-art results on HMDB51, UCF101, Kinetics and SSv2. MORN also performs well on PRIDE, and we explore the correlation between PRIDE and accuracy.
translated by 谷歌翻译
对于许多技术领域的专业用户,例如医学,遥感,精密工程和科学研究,无损和近乎无情的图像压缩至关重要。但是,尽管在基于学习的图像压缩方面的研究兴趣迅速增长,但没有发表的方法提供无损和近乎无情的模式。在本文中,我们提出了一个统一而强大的深层损失加上残留(DLPR)编码框架,以实现无损和近乎无情的图像压缩。在无损模式下,DLPR编码系统首先执行有损压缩,然后执行残差的无损编码。我们在VAE的方法中解决了关节损失和残留压缩问题,并添加残差的自回归上下文模型以增强无损压缩性能。在近乎荒谬的模式下,我们量化了原始残差以满足给定的$ \ ell_ \ infty $错误绑定,并提出了可扩展的近乎无情的压缩方案,该方案适用于可变$ \ ell_ \ infty $ bunds而不是训练多个网络。为了加快DLPR编码,我们通过新颖的编码环境设计提高了算法并行化的程度,并以自适应残留间隔加速熵编码。实验结果表明,DLPR编码系统以竞争性的编码速度实现了最先进的无损和近乎无效的图像压缩性能。
translated by 谷歌翻译
变形金刚占据了自然语言处理领域,最近影响了计算机视觉区域。在医学图像分析领域中,变压器也已成功应用于全栈临床应用,包括图像合成/重建,注册,分割,检测和诊断。我们的论文旨在促进变压器在医学图像分析领域的认识和应用。具体而言,我们首先概述了内置在变压器和其他基本组件中的注意机制的核心概念。其次,我们回顾了针对医疗图像应用程序量身定制的各种变压器体系结构,并讨论其局限性。在这篇综述中,我们调查了围绕在不同学习范式中使用变压器,提高模型效率及其与其他技术的耦合的关键挑战。我们希望这篇评论可以为读者提供医学图像分析领域的读者的全面图片。
translated by 谷歌翻译
最近,对建立问题的兴趣越来越兴趣,其中跨多种模式(如文本和图像)的原因。但是,使用图像的QA通常仅限于从预定义的选项集中挑选答案。此外,在现实世界中的图像,特别是在新闻中,具有与文本共同参考的对象,其中来自两个模态的互补信息。在本文中,我们提出了一种新的QA评估基准,并在新闻文章中提出了1,384个问题,这些文章需要跨媒体接地图像中的物体接地到文本上。具体地,该任务涉及需要推理图像标题对的多跳问题,以识别接地的视觉对象,然后从新闻正文文本中预测跨度以回答问题。此外,我们介绍了一种新颖的多媒体数据增强框架,基于跨媒体知识提取和合成问题答案生成,自动增强可以为此任务提供弱监管的数据。我们在我们的基准测试中评估了基于管道和基于端到端的预先预测的多媒体QA模型,并表明他们实现了有希望的性能,而在人类性能之后大幅滞后,因此留下了未来工作的大型空间,以便在这一具有挑战性的新任务上的工作。
translated by 谷歌翻译
我们提出了一种与变压器的端到端图像压缩和分析模型,针对基于云的图像分类应用程序。代替将现有的变换器的图像分类模型直接放置在图像编解码器之后,我们的目的是重新设计视觉变换器(VIV)模型,以从压缩特征执行图像分类,并促进来自变压器的长期信息的图像压缩。具体而言,我们首先用由卷积神经网络建模的轻量级图像编码器更换vit模型的涂抹杆(即图像分裂和嵌入)。由图像编码器产生的压缩特征被注入卷积电感偏压,并被馈送到变压器,用于绕过图像重建。同时,我们提出了一种特征聚合模块,使压缩特征熔断具有变压器的所选中间特征,并将聚合特征馈送到用于图像重建的解卷积神经网络。聚合特征可以从变压器的自我关注机构获得长期信息,并提高压缩性能。速率 - 失真准确度优化问题最终通过两步培训策略解决。实验结果证明了所提出的模型在图像压缩和分类任务中的有效性。
translated by 谷歌翻译
我们将简要介绍本文Trecvid2021中WHU-nercms的实验方法和结果。今年,我们参加了实例搜索的自动和交互式任务(INS)。对于自动任务,检索目标分为两个部分,人检索和动作检索。我们采用了两阶段方法,包括对人检索的面部检测和面部识别以及由三种基于框架的人类对象相互作用检测方法和两种基于视频的一般动作检测方法组成的两种动作检测方法。在那之后,人的检索结果和动作检索结果被融合以初始化结果排名列表。此外,我们尝试使用互补方法进一步提高搜索性能。对于交互式任务,我们在融合结果上测试了两种不同的交互策略。我们分别为自动和交互式任务提交4次运行。每次运行的引入显示在表1中。官方评估表明,所提出的策略在自动和交互式轨道中排名第一。
translated by 谷歌翻译
Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult to inductively infer by KGEs. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework AnKGE to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects from entity-level, relation-level, and triple-level. And in AnKGE, we train an analogy function for each level of analogical inference with the original element embedding from a well-trained KGE model as input, which outputs the analogical object embedding. In order to combine inductive inference capability from the original KGE model and analogical inference capability enhanced by AnKGE, we interpolate the analogy score with the base model score and introduce the adaptive weights in the score function for prediction. Through extensive experiments on FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on link prediction task and well performs analogical inference.
translated by 谷歌翻译
For Prognostics and Health Management (PHM) of Lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. The existing empirical or physical models can reveal important information regarding the degradation dynamics. However, there is no general and flexible methods to fuse the information represented by those models. Physics-Informed Neural Network (PINN) is an efficient tool to fuse empirical or physical dynamic models with data-driven models. To take full advantage of various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion-batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of Li-ion Phosphate (LFP)/graphite batteries.
translated by 谷歌翻译
In this tutorial paper, we look into the evolution and prospect of network architecture and propose a novel conceptual architecture for the 6th generation (6G) networks. The proposed architecture has two key elements, i.e., holistic network virtualization and pervasive artificial intelligence (AI). The holistic network virtualization consists of network slicing and digital twin, from the aspects of service provision and service demand, respectively, to incorporate service-centric and user-centric networking. The pervasive network intelligence integrates AI into future networks from the perspectives of networking for AI and AI for networking, respectively. Building on holistic network virtualization and pervasive network intelligence, the proposed architecture can facilitate three types of interplay, i.e., the interplay between digital twin and network slicing paradigms, between model-driven and data-driven methods for network management, and between virtualization and AI, to maximize the flexibility, scalability, adaptivity, and intelligence for 6G networks. We also identify challenges and open issues related to the proposed architecture. By providing our vision, we aim to inspire further discussions and developments on the potential architecture of 6G.
translated by 谷歌翻译
In this paper, we investigate the joint device activity and data detection in massive machine-type communications (mMTC) with a one-phase non-coherent scheme, where data bits are embedded in the pilot sequences and the base station simultaneously detects active devices and their embedded data bits without explicit channel estimation. Due to the correlated sparsity pattern introduced by the non-coherent transmission scheme, the traditional approximate message passing (AMP) algorithm cannot achieve satisfactory performance. Therefore, we propose a deep learning (DL) modified AMP network (DL-mAMPnet) that enhances the detection performance by effectively exploiting the pilot activity correlation. The DL-mAMPnet is constructed by unfolding the AMP algorithm into a feedforward neural network, which combines the principled mathematical model of the AMP algorithm with the powerful learning capability, thereby benefiting from the advantages of both techniques. Trainable parameters are introduced in the DL-mAMPnet to approximate the correlated sparsity pattern and the large-scale fading coefficient. Moreover, a refinement module is designed to further advance the performance by utilizing the spatial feature caused by the correlated sparsity pattern. Simulation results demonstrate that the proposed DL-mAMPnet can significantly outperform traditional algorithms in terms of the symbol error rate performance.
translated by 谷歌翻译