The role of mobile cameras has increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline that replaces the standard mobile ISP and can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format Fujifilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon 8 Gen 1 GPU, which provides excellent acceleration for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs and can process Full HD photos in under 20-50 milliseconds while achieving high-fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
This paper reviews the AIM 2022 Challenge on Super-Resolution of Compressed Image and Video. The challenge includes two tracks. Track 1 targets the super-resolution of compressed images, while Track 2 targets the super-resolution of compressed videos. In Track 1, we use the popular DIV2K dataset as the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results for Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/renyang-home/ldv_dataset. The homepage of this challenge is at https://github.com/renyang-home/aim22_compresssr.
Network alignment (NA) is the task of discovering node correspondences across different networks. Although NA methods have achieved remarkable success in a myriad of scenarios, their satisfactory performance is not without prior anchor link information and/or node attributes, which may not always be available. In this paper, we propose Grad-Align+, a novel NA method built upon node attribute augmentation that is quite robust when such additional information is unavailable. Grad-Align+ builds upon Grad-Align, a recent state-of-the-art NA method that gradually discovers a part of the node pairs until all node pairs are found. Specifically, Grad-Align+ consists of the following key components: 1) augmenting node attributes based on nodes' centrality measures, 2) calculating an embedding similarity matrix extracted from a graph neural network into which the augmented node attributes are fed, and 3) gradually discovering node pairs by calculating similarities between cross-network node pairs with respect to the aligned cross-network neighbor pairs. Experimental results demonstrate (a) the superiority of Grad-Align+ over benchmark NA methods, (b) empirical validation of our theoretical findings, and (c) the effectiveness of our attribute augmentation module.
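The gradual pair-discovery step can be illustrated with a toy sketch: given an embedding-similarity matrix between the nodes of two networks, repeatedly commit the highest-scoring unmatched pair. This is only a minimal stand-in for Grad-Align+ (which computes similarities from GNN embeddings over augmented attributes and reweights them by aligned cross-network neighbor pairs); the `gradual_match` name and its plain greedy rule are illustrative assumptions.

```python
import numpy as np

def gradual_match(sim):
    """Greedily align node pairs from a similarity matrix, best pair first."""
    sim = sim.astype(float).copy()
    pairs = []
    for _ in range(min(sim.shape)):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        pairs.append((int(i), int(j)))
        sim[i, :] = -np.inf  # each node may be matched at most once
        sim[:, j] = -np.inf
    return pairs

sim = np.array([[0.9, 0.1, 0.2],
                [0.2, 0.8, 0.3],
                [0.1, 0.4, 0.7]])
print(gradual_match(sim))  # [(0, 0), (1, 1), (2, 2)]
```

In the actual method the similarity matrix itself is recomputed as aligned neighbor pairs accumulate, which is what makes the discovery "gradual" rather than one-shot.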
Vertebral fractures severely affect patients' quality of life, causing lumbar deformity and even paralysis. Computed tomography (CT) is a common clinical examination for screening this disease at an early stage. However, subtle radiological appearances and non-specific symptoms lead to a high risk of missed diagnosis. In particular, mild fractures and normal controls are difficult to distinguish, both for deep learning models and for inexperienced doctors. In this paper, we argue that enhancing the subtle features of fractures to encourage inter-class separability is the key to improving accuracy. Motivated by this, we propose a supervised-contrastive-learning-based model to estimate the Genant grade of vertebral fractures from CT scans. As an auxiliary task, supervised contrastive learning narrows the distance between features of the same class while pushing those of other classes apart, enhancing the model's ability to capture the subtle features of vertebral fractures. Considering the lack of datasets in this field, we construct a database of 208 samples annotated by experienced radiologists. Our method achieves a specificity of 99% and a sensitivity of 85% in binary classification, and a macro-F1 of 77% in multi-class classification, indicating that contrastive learning significantly improves the accuracy of vertebral fracture screening, especially for mild fractures versus normal controls. Our desensitized data and code will be made publicly available to the community.
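As a sketch of the auxiliary objective, the following is a minimal NumPy version of a supervised contrastive loss in the Khosla et al. style: samples of the same class are treated as positives of an anchor and all other samples as negatives. The feature values, temperature `tau`, and function name are illustrative assumptions; the paper's actual training setup is not specified by the abstract.

```python
import numpy as np

def supcon_loss(feats, labels, tau=0.1):
    """Supervised contrastive loss: pull features of the same class
    together and push features of other classes apart."""
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize
    sim = z @ z.T / tau                                       # scaled cosine sims
    total, anchors = 0.0, 0
    for i in range(len(labels)):
        pos = [p for p in range(len(labels)) if p != i and labels[p] == labels[i]]
        if not pos:
            continue  # an anchor needs at least one positive
        denom = sum(np.exp(sim[i, a]) for a in range(len(labels)) if a != i)
        total += -np.mean([np.log(np.exp(sim[i, p]) / denom) for p in pos])
        anchors += 1
    return total / anchors

# Well-separated classes yield a much lower loss than mixed-up labels.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(supcon_loss(feats, [0, 0, 1, 1]), supcon_loss(feats, [0, 1, 0, 1]))
```

In the paper this loss is an auxiliary term alongside the Genant-grade classification objective, which is how it sharpens the boundary between mild fractures and normal controls.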
Zebrafish is an excellent model organism that has been widely used in biological experiments, drug screening, and swarm intelligence. In recent years, many techniques for tracking zebrafish in behavioral studies have been developed, attracting the attention of scientists in many fields. Multi-target tracking of zebrafish still faces many challenges. Their high mobility and uncertainty make it difficult to predict their motion; their similar appearance and texture features make it difficult to build appearance models; and frequent occlusions make it hard even to link trajectories. In this paper, we use particle filters to approximate the uncertainty of motion. First, by analyzing the motion characteristics of zebrafish, we establish an efficient hybrid motion model to predict their positions. Then we build an appearance model based on the predicted positions to predict the pose of each target, weighting the particles by comparing the difference between the predicted and observed poses. Finally, we obtain the optimal position of each zebrafish from the weighted positions, and use a joint particle filter to handle trajectory linking for multiple zebrafish.
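The predict-weight-estimate cycle described above can be sketched with a plain bootstrap particle filter tracking a single 2D position. This is a simplified stand-in, assuming a Gaussian random-walk motion model and a Gaussian observation likelihood instead of the paper's hybrid motion and pose-based appearance models; all names and parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, obs, motion_std=1.0, obs_std=1.0):
    """One predict-weight-resample cycle of a bootstrap particle filter
    for a 2D position."""
    # Predict: diffuse each particle with the motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: Gaussian likelihood of the observation under each particle.
    d2 = np.sum((particles - obs) ** 2, axis=1)
    weights = weights * np.exp(-d2 / (2.0 * obs_std ** 2))
    weights = weights / weights.sum()
    # Estimate: weighted mean position.
    estimate = weights @ particles
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles)), estimate

particles = rng.normal(0.0, 5.0, (500, 2))        # diffuse initial guess
weights = np.full(500, 1.0 / 500)
for obs in ([1.0, 1.0], [2.0, 2.0], [3.0, 3.0]):  # a fish moving diagonally
    particles, weights, est = particle_filter_step(particles, weights, np.asarray(obs))
print(est)
```

For multiple fish, the paper runs a joint particle filter so that the association between observations and trajectories is resolved jointly rather than per fish.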
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM) for effectively understanding and representing mathematical problems. Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols, and formulas in the problem statements. Typically, solving mathematical problems requires complex mathematical logic and background knowledge. Considering the complex nature of mathematical texts, we design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of basic and advanced courses. In particular, we first perform token-level pre-training based on a position-biased masking strategy, and then design logic-based pre-training tasks that aim to recover shuffled sentences and formulas, respectively. Finally, we introduce a more difficult pre-training task that enforces the PLM to detect and correct errors in its generated solutions. We conduct extensive experiments on offline evaluation (including nine math-related tasks) and online A/B tests. Experimental results demonstrate the effectiveness of our approach compared with a number of competitive baselines. Our code is available at https://github.com/rucaibox/jiuzhang.
Span extraction, which aims to extract text spans (such as words or phrases) from plain text, is a fundamental process in information extraction. Recent works introduce label knowledge to enhance the text representation by formalizing span extraction as a question answering problem (QA formalization), achieving state-of-the-art performance. However, QA formalization does not fully exploit the label knowledge and suffers from low training/inference efficiency. To address these issues, we introduce a new paradigm for integrating label knowledge and further propose a novel model that explicitly and efficiently integrates label knowledge into text representations. Specifically, it encodes texts and label annotations independently, and then integrates the label knowledge into the text representation with a carefully designed semantic fusion module. We conduct extensive experiments on three typical span extraction tasks: flat NER, nested NER, and event detection. The empirical results show that our method achieves state-of-the-art performance on four benchmarks while reducing training time and inference time by 76% and 77%, respectively, compared with the QA formalization paradigm. Our code and data are available at https://github.com/apkepers/lear.
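A minimal sketch of the fusion idea: encode tokens and label annotations into the same vector space, let each token attend over the label embeddings, and add the attended label knowledge back into the token representation. This is an illustrative stand-in only; the paper's actual semantic fusion module, encoders, and dimensions are not specified by the abstract, so all names below are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_label_knowledge(token_reps, label_reps):
    """Each token attends over the label-annotation embeddings, and the
    attended label knowledge is added back into its representation."""
    attn = softmax(token_reps @ label_reps.T)  # (n_tokens, n_labels)
    return token_reps + attn @ label_reps

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 16))  # 6 token embeddings (dim 16)
labels = rng.normal(size=(3, 16))  # 3 label-annotation embeddings
fused = fuse_label_knowledge(tokens, labels)
print(fused.shape)  # (6, 16)
```

The efficiency gain over QA formalization comes from this structure: labels are encoded once and reused, instead of being concatenated to the text and re-encoded for every (label, text) pair.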
Various approaches have been proposed for out-of-distribution (OOD) detection by augmenting the model, the input examples, the training set, or the optimization objective. Deviating from existing work, we take the simple hypothesis that a standard off-the-shelf model may already contain sufficient information about the training set distribution that can be leveraged for reliable OOD detection. Our empirical study to validate this hypothesis, which measures the model's activation means for OOD and in-distribution (ID) mini-batches, finds that the activation means of OOD mini-batches consistently deviate more from those of the training data. In addition, the training data's activation means can be computed offline efficiently or retrieved from batch normalization layers as a "free lunch". Based on this observation, we propose a novel metric called Neural Mean Discrepancy (NMD), which compares the neural means of the input examples and the training data. Leveraging the simplicity of NMD, we propose an efficient OOD detector that computes the neural means with a standard forward pass followed by a lightweight classifier. Extensive experiments show that NMD outperforms state-of-the-art OOD approaches across multiple datasets and model architectures in terms of both detection accuracy and computational cost.
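The core measurement is easy to sketch: compare a mini-batch's per-channel activation means against the training-set means (e.g. the running-mean buffers of batch normalization layers). Below is a minimal NumPy illustration; the synthetic ID/OOD statistics are made up for the demo, and the final decision in the paper is made by a lightweight classifier on top of such vectors, not a fixed threshold.

```python
import numpy as np

def neural_mean_discrepancy(batch_activations, train_means):
    """Absolute per-channel gap between a mini-batch's activation means
    and the training-set activation means."""
    return np.abs(batch_activations.mean(axis=0) - train_means)

train_means = np.array([0.0, 1.0, -0.5])  # e.g. a BatchNorm running mean
id_batch = np.random.default_rng(0).normal([0.0, 1.0, -0.5], 0.1, (64, 3))
ood_batch = np.random.default_rng(1).normal([2.0, -1.0, 1.5], 0.1, (64, 3))
print(neural_mean_discrepancy(id_batch, train_means).sum())   # near zero
print(neural_mean_discrepancy(ood_batch, train_means).sum())  # clearly larger
```

Because the reference means come for free from the trained model, the detector adds only one forward pass plus a tiny classifier on the NMD vector.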
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
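The style-aware adaptation can be sketched, under simplifying assumptions, as a feed-forward layer whose weight matrix is modulated per output channel by a scale and shift predicted from the style code. The abstract does not specify the actual layer shapes or the mapping from style code to modulation parameters, so everything below (names, dimensions, the linear modulation form) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def style_modulated_ff(x, W, b, style_code, Ws, Wb):
    """Feed-forward layer whose weights are adjusted by a style code:
    the code is linearly mapped to a per-output-channel scale and shift
    that modulate W before the usual affine transform + ReLU."""
    scale = 1.0 + style_code @ Ws        # (d_out,) multiplicative adjustment
    shift = style_code @ Wb              # (d_out,) additive adjustment
    W_mod = W * scale[None, :] + shift[None, :]
    return np.maximum(x @ W_mod + b, 0.0)

d_in, d_out, d_style = 8, 16, 4
x = rng.normal(size=(2, d_in))           # two frames of motion features
W = rng.normal(size=(d_in, d_out)) * 0.1
b = np.zeros(d_out)
Ws = rng.normal(size=(d_style, d_out)) * 0.1
Wb = rng.normal(size=(d_style, d_out)) * 0.1
style = rng.normal(size=(d_style,))      # style code from the style encoder
out = style_modulated_ff(x, W, b, style, Ws, Wb)
print(out.shape)  # (2, 16)
```

The point of modulating the feed-forward weights, rather than merely concatenating the style code to the input, is that the same speech content is transformed differently for every reference style.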
Nowadays, time-stamped web documents related to general news queries flood the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time-series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the event-level attention feature during generation, with its sequential information retained, and use it to simulate the evolutionary attention of the ground-truth summary. The event-level attention can also be used to assist extractive summarization, where the extracted summary likewise comes in time order. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.