Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE. Most existing efforts have focused on directly extracting potentially useful information from images (such as pixel-level features, identified objects, and associated captions). However, such extraction processes may not be knowledge-aware, resulting in information that may not be highly relevant. In this paper, we propose a novel Multi-modal Retrieval based framework (MoRe). MoRe contains a text retrieval module and an image-based retrieval module, which retrieve knowledge related to the input text and image, respectively, from a knowledge corpus. Next, the retrieval results are sent to the textual and visual models, respectively, for prediction. Finally, a Mixture of Experts (MoE) module combines the predictions of the two models to make the final decision. Our experiments show that both our textual model and our visual model achieve state-of-the-art performance on four multi-modal NER datasets and one multi-modal RE dataset. With MoE, performance improves further, and our analysis demonstrates the benefits of integrating both textual and visual cues for such tasks.
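One way to picture the final MoE step is as a gated weighted average of the two models' label distributions. The sketch below is a minimal illustration that rests on assumptions the abstract does not state. First, the gate scores are passed in directly, whereas a real system would obtain them from a learned gating network. Second, each expert is assumed to output a per-token probability distribution over labels. The name `moe_combine` and the example probabilities are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of scores."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_combine(expert_probs, gate_logits):
    """Mix expert label distributions with softmax gate weights.

    expert_probs: (num_experts, num_labels), each row sums to 1.
    gate_logits:  (num_experts,), hypothetical gate scores for this token.
    """
    weights = softmax(gate_logits)
    return weights @ np.asarray(expert_probs, dtype=float)

# Hypothetical per-token label distributions, e.g. over {PER, LOC, O}.
p_text = np.array([0.7, 0.2, 0.1])
p_visual = np.array([0.3, 0.6, 0.1])

final = moe_combine(np.stack([p_text, p_visual]), gate_logits=[0.0, 0.0])
print(final, final.argmax())  # equal gate scores give a plain average
```

Because the output is a convex combination of distributions, it still sums to 1. A strongly skewed gate effectively hands the decision to a single expert.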
Video super-resolution is one of the most popular tasks on mobile devices, widely used for automatically improving low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, showing low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs, optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
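Efficient mobile super-resolution networks often produce their 4X output with a depth-to-space (pixel shuffle) rearrangement rather than with learned upsampling layers. The abstract does not say which operations the challenge entries used, so treat this as an illustrative assumption. The sketch shows the rearrangement for an upscaling factor `r` in NumPy, using one of the two common channel orderings. `depth_to_space` is a hypothetical helper name.

```python
import numpy as np

def depth_to_space(x, r):
    """Rearrange an (H, W, C*r*r) feature map into an (H*r, W*r, C) image.

    Input channel (i*r + j)*C + c at pixel (h, w) becomes output pixel
    (h*r + i, w*r + j), channel c.
    """
    h, w, cr2 = x.shape
    if cr2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = cr2 // (r * r)
    # (h, w, i, j, c) -> (h, i, w, j, c) -> (h*r + i, w*r + j, c)
    return x.reshape(h, w, r, r, c).transpose(0, 2, 1, 3, 4).reshape(h * r, w * r, c)

# Hypothetical last-layer output for a 180x320 low-resolution RGB frame.
lr_features = np.random.rand(180, 320, 3 * 4 * 4).astype(np.float32)
hr = depth_to_space(lr_features, r=4)
print(hr.shape)
```

The operation is a pure memory rearrangement with no arithmetic. That makes it a cheap final layer on NPUs compared with transposed convolutions.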
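Although previous graph-based multi-view clustering algorithms have made significant progress, most of them still face three limitations. First, they often suffer from high computational complexity, which restricts their application in large-scale scenarios. Second, they usually perform graph learning at either the single-view level or the view-consensus level, but often neglect the possibility of jointly learning the single-view and consensus graphs. Third, many of them rely on $k$-means for the discretization of spectral embeddings, and thus lack the ability to directly learn a graph with a discrete cluster structure. In light of this, this paper proposes an efficient multi-view clustering approach via unified and discrete bipartite graph learning (UDBGL). Specifically, anchor-based subspace learning is incorporated to learn view-specific bipartite graphs from multiple views. On top of these, bipartite graph fusion is leveraged to learn a view-consensus bipartite graph with adaptive weight learning. Further, a Laplacian rank constraint is imposed to ensure that the fused bipartite graph has a discrete cluster structure (with a specific number of connected components). View-specific bipartite graph learning, view-consensus bipartite graph learning, and discrete cluster structure learning are formulated simultaneously in a unified objective function. An efficient minimization algorithm is then designed to solve this optimization problem and directly obtain a discrete clustering solution without requiring any additional partitioning; notably, it has linear time complexity in the data size. Experiments on a variety of multi-view datasets demonstrate the robustness and efficiency of our UDBGL approach.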
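The Laplacian rank constraint rests on a standard spectral fact: the number of connected components of a graph equals the multiplicity of the zero eigenvalue of its Laplacian. Requiring rank(L) = (n + m) - c therefore forces exactly c components, i.e., c clusters. The sketch below checks this on a tiny sample-to-anchor bipartite graph. It only illustrates the constraint and is not the UDBGL solver. Building the dense (n + m) x (n + m) Laplacian is fine for toy inputs but does not reflect the linear-time algorithm. Function names are hypothetical.

```python
import numpy as np

def bipartite_laplacian(B):
    """Laplacian of the (n+m)-node graph whose only edges join n samples to m anchors."""
    n, m = B.shape
    A = np.zeros((n + m, n + m))
    A[:n, n:] = B
    A[n:, :n] = B.T
    return np.diag(A.sum(axis=1)) - A

def count_components(B, tol=1e-9):
    """Number of (numerically) zero Laplacian eigenvalues = number of connected components."""
    eigvals = np.linalg.eigvalsh(bipartite_laplacian(B))
    return int(np.sum(eigvals < tol))

B_two = np.array([
    [0.5, 0.5, 0.0],  # samples 0-1 attach only to anchors 0-1
    [0.7, 0.3, 0.0],
    [0.0, 0.0, 1.0],  # samples 2-3 attach only to anchor 2
    [0.0, 0.0, 1.0],
])
B_one = B_two.copy()
B_one[1] = [0.6, 0.3, 0.1]  # one cross edge merges the two groups
print(count_components(B_two), count_components(B_one))
```

Once the fused graph has exactly c components, cluster labels can be read off the components directly, with no $k$-means post-processing step.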
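Over the past decade, with the development of big data technology, an increasing amount of patient information has been stored as electronic health records (EHRs). Leveraging these data, various doctor recommendation systems have been proposed. Typically, such studies process EHR data in a flat-structured manner, where each encounter is treated as an unordered set of features. Nevertheless, heterogeneous structured information stored in claims, such as service sequences, should not be ignored. This paper proposes a doctor recommendation system with time embedding that reconstructs the potential connections between patients and doctors using heterogeneous graph attention networks. Moreover, to address the privacy issue of sharing patient data across hospitals, a federated decentralized learning method based on a minimization optimization model is also proposed. The graph-based recommendation system has been validated on an EHR dataset. Compared with baseline models, the proposed method improves the AUC by 6.2%. Our proposed federated algorithm not only matches the performance of a fictitious fusion center but also enjoys a convergence rate of O(1/T).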
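The sketch below illustrates how hospitals can reach the solution a central fusion center would compute without pooling raw records. It uses generic decentralized gradient tracking (a DIGing-style update) on toy quadratic losses over a ring of nodes. This is a swapped-in textbook technique, not the paper's algorithm, and it does not reproduce the paper's O(1/T) analysis. The ring topology, step size, and function names are assumptions.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic weights (n >= 3): each node averages itself and its two ring neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def gradient_tracking(local_targets, alpha=0.1, iters=500):
    """Decentralized minimization of sum_i 0.5 * (x - a_i)**2 over a ring.

    Node i only ever uses its own a_i; neighbours exchange x and y, never a_i.
    """
    a = np.asarray(local_targets, dtype=float)
    W = ring_mixing_matrix(len(a))

    def grad(x):
        return x - a  # stacked local gradients, one entry per node

    x = np.zeros_like(a)
    y = grad(x)  # y tracks the network-average gradient
    for _ in range(iters):
        x_new = W @ x - alpha * y
        y = W @ y + grad(x_new) - grad(x)
        x = x_new
    return x

# Hypothetical private quantities held by four hospitals.
x = gradient_tracking([1.0, 2.0, 3.0, 6.0])
print(x)  # every node approaches mean(a) = 3.0, the fusion-center answer
```

The tracking variable `y` is what lets a constant step size reach the exact centralized minimizer. Plain decentralized gradient descent with a constant step would stall at a biased point.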
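Masked autoencoding has achieved great success for self-supervised learning in the image and language domains. However, mask-based pretraining has yet to show benefits for point cloud understanding. This is likely because standard backbones such as PointNet cannot properly handle the mismatch between training and testing distributions that masking introduces during training. In this paper, we bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint. Our key idea is to represent the point cloud as discrete occupancy values (1 if part of the point cloud; 0 if not) and to perform simple binary classification between masked object points and sampled noise points as the proxy task. In this way, our approach is robust to the point sampling variance in point clouds and facilitates learning rich representations. We evaluate our pretrained models on several downstream tasks, including 3D shape classification, segmentation, and real-world object detection. They achieve state-of-the-art results while pretraining significantly faster (e.g., 4.1x on ScanNet) than the prior state-of-the-art Transformer baseline. Code is available at https://github.com/haotian-liu/maskpoint.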
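The proxy task can be made concrete. Hide most of the points, then ask a model to classify query locations as either real (masked) object points, labeled 1, or sampled noise points, labeled 0. The sketch builds such queries and labels in NumPy. Two simplifications are assumptions rather than details from the abstract: individual points are masked instead of local patches, and noise is drawn uniformly from the masked points' bounding box. The helper names are hypothetical.

```python
import numpy as np

def split_visible_masked(points, mask_ratio, rng):
    """Randomly hide a fraction of the points (per point here, a simplification)."""
    idx = rng.permutation(len(points))
    n_mask = int(round(mask_ratio * len(points)))
    return points[idx[n_mask:]], points[idx[:n_mask]]

def occupancy_queries(masked_points, num_noise, rng):
    """Real masked points get label 1; uniform noise inside their bounding box gets label 0."""
    lo = masked_points.min(axis=0)
    hi = masked_points.max(axis=0)
    noise = rng.uniform(lo, hi, size=(num_noise, masked_points.shape[1]))
    queries = np.concatenate([masked_points, noise], axis=0)
    labels = np.concatenate([np.ones(len(masked_points)), np.zeros(num_noise)])
    perm = rng.permutation(len(queries))  # shuffle so position reveals nothing about the label
    return queries[perm], labels[perm]

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))  # hypothetical point cloud
visible, masked = split_visible_masked(cloud, mask_ratio=0.9, rng=rng)
queries, labels = occupancy_queries(masked, num_noise=len(masked), rng=rng)
print(visible.shape, queries.shape, labels.mean())
```

A decoder conditioned on the encoded `visible` points would then be trained with binary cross-entropy on `(queries, labels)`. The model never has to regress exact coordinates, which is why the task is insensitive to how points were sampled.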
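The MultiCoNER shared task aims to detect semantically ambiguous and complex named entities in short, low-context settings across multiple languages. The lack of context makes recognizing ambiguous named entities challenging. To alleviate this issue, our team, DAMO-NLP, proposes a knowledge-based system in which we build a multilingual knowledge base from Wikipedia to provide related context information to the named entity recognition (NER) model. Given an input sentence, our system effectively retrieves related contexts from the knowledge base. The original input sentence is then augmented with this context information, which allows significantly better contextualized token representations to be captured. Our system won 10 out of 13 tracks in the MultiCoNER shared task.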
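The retrieve-then-augment pipeline can be sketched as follows. A toy token-overlap scorer stands in for a real retriever over a Wikipedia-based index. The `[SEP]` separator, the knowledge-base entries, and all function names are hypothetical, since the abstract does not specify these details.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercased word tokens (Unicode-aware, so it also works for non-English text)."""
    return re.findall(r"\w+", text.lower())

def retrieve(query, knowledge_base, k=2):
    """Rank passages by token overlap with the query (a toy stand-in for a real retriever)."""
    q = Counter(tokenize(query))
    scored = []
    for passage in knowledge_base:
        overlap = sum((q & Counter(tokenize(passage))).values())
        scored.append((overlap, passage))
    scored.sort(key=lambda s: s[0], reverse=True)  # stable sort: ties keep KB order
    return [passage for score, passage in scored[:k] if score > 0]

def augment(sentence, contexts, sep=" [SEP] "):
    """Append retrieved contexts; the NER model would label only the original tokens."""
    return sentence + "".join(sep + c for c in contexts)

kb = [
    "Apple Inc. is an American technology company headquartered in Cupertino.",
    "The apple is an edible fruit produced by an apple tree.",
    "Cupertino is a city in Santa Clara County, California.",
]
query = "apple opens new office in cupertino"
ctx = retrieve(query, kb)
augmented = augment(query, ctx)
print(augmented)
```

The appended passages give the encoder disambiguating evidence (here, that "apple" co-occurs with a company and a city). Predictions on the context tokens themselves are discarded.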
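In this paper, we propose a novel mutual consistency network (MC-Net+) to effectively exploit unlabeled data for semi-supervised medical image segmentation. MC-Net+ is motivated by an observation about medical image segmentation: deep models trained with limited annotations tend to output highly uncertain, easily misclassified predictions in ambiguous regions (e.g., adhesive edges or thin branches). Leveraging these challenging samples can make semi-supervised segmentation training more effective. Our proposed MC-Net+ model therefore consists of two new designs. First, the model contains one shared encoder and multiple slightly different decoders (i.e., decoders using different up-sampling strategies). The statistical discrepancy among the decoders' outputs is computed to represent the model's uncertainty, which indicates the unlabeled hard regions. Second, we apply a novel mutual consistency constraint between one decoder's probability output and the other decoders' soft pseudo labels. In this way, we minimize the discrepancy among the multiple outputs (i.e., the model uncertainty) during training and force the model to generate invariant results in such challenging regions, thereby regularizing model training. We compared the segmentation results of MC-Net+ with five state-of-the-art semi-supervised approaches on three public medical datasets. Extensive experiments under two standard semi-supervised settings demonstrate that our model outperforms the other methods, setting a new state of the art for semi-supervised medical image segmentation. Our code will be released publicly at https://github.com/ycwu1997/mc-net.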
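Both designs fit in a few lines: decoder disagreement serves as the uncertainty estimate, and a mutual consistency term ties each decoder's probabilities to the others' soft pseudo labels. The sketch below assumes binary foreground probabilities and forms the soft pseudo labels with a temperature-based sharpening function. The temperature value and the MSE form of the loss are assumptions made for illustration. In an autodiff framework the pseudo labels would be treated as constants (no gradient); NumPy has no gradients, so that detail does not appear here. Names are hypothetical.

```python
import numpy as np

def sharpen(p, T=0.1):
    """Turn a foreground probability map into a soft pseudo label (lower T = sharper)."""
    num = p ** (1.0 / T)
    return num / (num + (1.0 - p) ** (1.0 / T))

def decoder_uncertainty(probs):
    """Voxel-wise variance across decoders: high where the decoders disagree."""
    return np.var(np.stack(probs), axis=0)

def mutual_consistency_loss(probs, T=0.1):
    """Mean MSE between each decoder's output and every other decoder's soft pseudo label."""
    losses = [
        np.mean((probs[i] - sharpen(probs[j], T)) ** 2)
        for i in range(len(probs))
        for j in range(len(probs))
        if i != j
    ]
    return float(np.mean(losses))

# Hypothetical outputs of two decoders on three voxels; the last voxel is ambiguous.
p_a = np.array([0.90, 0.20, 0.55])
p_b = np.array([0.85, 0.15, 0.30])
print(decoder_uncertainty([p_a, p_b]), mutual_consistency_loss([p_a, p_b]))
```

The loss vanishes only when every decoder agrees with the others' sharpened predictions. Minimizing it therefore pushes the decoders toward confident, consistent outputs exactly where the uncertainty map is high.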
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants. We introduce a new framework named Reference Twice (RefT), built on a Transformer-like architecture, to fully explore the relationship between support and query features. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at two levels: the feature level and the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries via cross-attention for better calibration. With these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our framework can be easily extended to incremental FSIS with minor modifications. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared with existing approaches across different shots. For example, we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
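A mask-based class center is commonly computed by masked average pooling. The support feature map is averaged over the object mask, and the result is then used to re-weight query features channel by channel. The sketch below is a parameter-free simplification built on that assumption. RefT's actual module is learned, and the abstract does not describe it in enough detail to reproduce. Shapes are (H, W, C), and the helper names are hypothetical.

```python
import numpy as np

def masked_class_center(support_feat, support_mask):
    """Average support features over the object mask: (H, W, C) x (H, W) -> (C,)."""
    m = support_mask[..., None].astype(float)
    return (support_feat * m).sum(axis=(0, 1)) / max(m.sum(), 1e-6)

def reweight_query(query_feat, class_center):
    """Channel-wise re-weighting of query features by a sigmoid of the class center."""
    weights = 1.0 / (1.0 + np.exp(-class_center))
    return query_feat * weights  # broadcasts over (H, W, C)

support = np.zeros((2, 2, 3))
support[0, 0] = [1.0, 2.0, 3.0]
support[1, 1] = [3.0, 2.0, 1.0]
mask = np.array([[1, 0], [0, 1]])  # only the two object pixels count
center = masked_class_center(support, mask)
query = reweight_query(np.ones((4, 4, 3)), center)
print(center, query.shape)
```

Pooling only inside the mask keeps background pixels from diluting the class center. This is the role the abstract assigns to the support masks.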
For Prognostics and Health Management (PHM) of lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. Existing empirical or physical models can reveal important information about the degradation dynamics. However, there are no general and flexible methods to fuse the information represented by these models. The Physics-Informed Neural Network (PINN) is an efficient tool for fusing empirical or physical dynamic models with data-driven models. To take full advantage of various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical, semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of lithium iron phosphate (LFP)/graphite batteries.
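One widely used form of uncertainty-based task weighting (homoscedastic-uncertainty weighting) gives each loss $L_i$ a learnable log-variance $s_i$ and minimizes $\sum_i e^{-s_i} L_i + s_i$. Whether the paper uses exactly this form is an assumption. The sketch shows the objective and a gradient step on $s_i$ with the task losses held fixed. Setting the derivative $1 - e^{-s_i} L_i$ to zero gives $s_i = \log L_i$, so each task ends up weighted by the inverse of its loss. Function names and loss values are hypothetical.

```python
import numpy as np

def weighted_total(losses, log_vars):
    """sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is learnable."""
    L = np.asarray(losses, dtype=float)
    s = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-s) * L + s))

def step_log_vars(losses, log_vars, lr=0.1):
    """One gradient-descent step on s with the task losses held fixed."""
    L = np.asarray(losses, dtype=float)
    s = np.asarray(log_vars, dtype=float)
    grad = 1.0 - np.exp(-s) * L  # d/ds_i of the objective above
    return s - lr * grad

# Hypothetical magnitudes: a large data-fit loss and a small PDE-residual loss.
losses = [4.0, 0.25]
s = np.zeros(2)
for _ in range(2000):
    s = step_log_vars(losses, s)
print(np.exp(-s))  # effective weights, roughly 1 / L_i
```

In actual PINN training, $s_i$ would be updated jointly with the network parameters. Tasks with large residuals are then automatically down-weighted, and the $+s_i$ term keeps the weights from collapsing to zero.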
New-architecture GPUs such as the A100 are now equipped with multi-instance GPU (MIG) technology, which allows a GPU to be partitioned into multiple small, isolated instances. This technology gives users more flexibility to support both deep learning training and inference workloads, but utilizing it efficiently can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study of MIG, eliminating the need for tedious manual benchmarking and tuning. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines benchmark studies for MIG. Using MIGPerf, the authors conduct a series of experiments covering three areas: characterization of deep learning training and inference on MIG, characterization of GPU sharing, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to employ MIG effectively. They also lay the foundation for further research on orchestrating hybrid training and inference workloads on MIGs. The code and results are released at https://github.com/MLSysOps/MIGProfiler. This work is still in progress, and more results will be published soon.
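The abstract does not describe MIGPerf's own interface. The following is therefore only a generic sketch of the kind of measurement such a benchmark automates: warm up, time repeated calls, and report median and tail latency plus throughput. The `benchmark` helper and the dummy workload are hypothetical. For GPU work, `fn` must block until the device has finished; otherwise the timer measures only kernel launches. Running the harness once per MIG instance would compare slice sizes. Each process is typically pointed at one instance by setting `CUDA_VISIBLE_DEVICES` to that instance's MIG device UUID.

```python
import statistics
import time

def benchmark(fn, batch_size, warmup=5, iters=50):
    """Time repeated calls of fn() and report latency percentiles and throughput."""
    for _ in range(warmup):  # warm-up runs (caches, lazy init) are discarded
        fn()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    p99_index = min(len(latencies) - 1, int(0.99 * len(latencies)))
    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        "p99_ms": latencies[p99_index] * 1e3,
        "throughput": batch_size * iters / sum(latencies),  # items per second
    }

def dummy_workload():
    # Stand-in for one inference batch; replace with a real, synchronized model call.
    return sum(i * i for i in range(10_000))

stats = benchmark(dummy_workload, batch_size=1, warmup=2, iters=20)
print(stats)
```

Reporting tail latency alongside the median matters for MIG studies. Co-located instances share memory bandwidth and can widen the tail even when median latency looks unchanged.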