Smart Paper Notes

WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning

Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Sujian Li, Yajuan Lv

Category: Natural Language Processing

2022-12-20

A crucial issue of current text generation models is that they often uncontrollably generate text that is factually inconsistent with their inputs. Limited by the lack of annotated data, existing work on evaluating factual consistency directly transfers the reasoning ability of models trained on data-rich upstream tasks such as question answering (QA) and natural language inference (NLI), without any further adaptation. As a result, these metrics perform poorly on real generated text and are heavily biased by their single-source upstream tasks. To alleviate this problem, we propose a weakly supervised framework that aggregates multiple resources to train a precise and efficient factual metric, namely WeCheck. WeCheck first uses a generative model to accurately label a real generated sample by aggregating its weak labels, which are inferred from multiple resources. Then, we train the target metric model with the weak supervision while taking noise into consideration. Comprehensive experiments on a variety of tasks demonstrate the strong performance of WeCheck, which achieves a 3.4% absolute improvement on average over previous state-of-the-art methods on the TRUE benchmark.

Translated by Google Translate

Low-rank Tensor Assisted K-space Generative Model for Parallel Imaging Reconstruction

Wei Zhang, Zengwei Xiao, Hui Tao, Minghui Zhang, Xiaoling Xu, Qiegen Liu

Category: Computer Vision

2022-12-11

Although recent deep learning methods, especially generative models, have shown good performance in fast magnetic resonance imaging, there is still much room for improvement in high-dimensional generation. Considering that the internal dimensions of score-based generative models have a critical impact on estimating the gradient of the data distribution, we present a new idea, the low-rank tensor assisted k-space generative model (LR-KGM), for parallel imaging reconstruction. That is, we transform the original prior information into high-dimensional prior information for learning. More specifically, the multi-channel data is arranged into a large Hankel matrix, and the matrix is subsequently folded into a tensor for prior learning. In the testing phase, a low-rank rotation strategy is used to impose low-rank constraints on the tensor output of the generative network. Furthermore, we alternate between traditional generative iterations and low-rank high-dimensional tensor iterations for reconstruction. Experimental comparisons with state-of-the-art methods demonstrate that the proposed LR-KGM method achieves better performance.
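
The Hankel-then-fold step mentioned in the abstract can be illustrated with a deliberately simplified sketch. It uses a 1-D sequence instead of multi-channel k-space data, and both helper names (`hankel`, `fold`) and the folding rule (stacking row blocks) are assumptions made for illustration, not the construction used in the paper.

```python
def hankel(signal, rows):
    """Build a Hankel matrix H with H[i][j] = signal[i + j]."""
    cols = len(signal) - rows + 1
    return [[signal[i + j] for j in range(cols)] for i in range(rows)]


def fold(matrix, block_rows):
    """Split a matrix into stacked row blocks, giving a 3-D nested list.

    Hypothetical folding rule: the abstract does not say how the
    Hankel matrix is folded into a tensor.
    """
    return [matrix[r:r + block_rows] for r in range(0, len(matrix), block_rows)]


H = hankel([1, 2, 3, 4, 5, 6], rows=4)  # 4 x 3 matrix, constant anti-diagonals
T = fold(H, block_rows=2)               # 2 x 2 x 3 nested list
```

Each anti-diagonal of `H` repeats one input sample, which is the redundancy that low-rank constraints on such matrices exploit.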

RLogist: Fast Observation Strategy on Whole-slide Images with Deep Reinforcement Learning

Boxuan Zhao, Jun Zhang, Deheng Ye, Jian Cao, Xiao Han, Qiang Fu, Wei Yang

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-12-04

Whole-slide images (WSIs) in computational pathology have gigapixel resolution but generally contain only sparse regions of interest, which leads to weak diagnostic relevance and data inefficiency for each area in the slide. Most existing methods rely on a multiple instance learning framework that requires densely sampling local patches at high magnification. The limitation is evident in the application stage, where the heavy computation for extracting patch-level features is unavoidable. In this paper, we develop RLogist, a benchmarking deep reinforcement learning (DRL) method for a fast observation strategy on WSIs. Imitating the diagnostic logic of human pathologists, our RL agent learns how to find regions of observation value and obtain representative features across multiple resolution levels, without having to analyze each part of the WSI at high magnification. We benchmark our method on two whole-slide-level classification tasks: detection of metastases in WSIs of lymph node sections, and subtyping of lung cancer. Experimental results demonstrate that RLogist achieves competitive classification performance compared to typical multiple instance learning algorithms, while having a significantly shorter observation path. In addition, the observation path given by RLogist provides good decision-making interpretability, and its reading-path navigation ability can potentially be used by pathologists for educational or assistive purposes. Our code is available at: https://github.com/tencent-ailab/RLogist.

Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong

Category: Computer Vision

2022-11-07

Video super-resolution is one of the most popular tasks on mobile devices, widely used for automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, showing low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs, optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the MediaTek Dimensity 9000 platform, which has a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with this NPU, reaching up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.

High-Resolution Boundary Detection for Medical Image Segmentation with Piece-Wise Two-Sample T-Test Augmented Loss

Yucong Lin, Jinhua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai, Hong Song, Jingfan Fan

Categories: Computer Vision | Machine Learning

2022-11-04

Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on a suitable design of loss functions. Popular loss functions, including the cross-entropy and Dice losses, often fall short in boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function tailored to reflect boundary information and thus enhance boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss, which incorporates a statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component.
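
As a rough illustration of the statistical test the PTA loss builds on, the sketch below computes a two-sample t statistic between two groups of pixel intensities, for instance pixels just inside and just outside a predicted boundary. The function name `welch_t`, the sample values, and the choice of the unequal-variance (Welch) form are assumptions for illustration; the abstract does not specify which variant is used or how the test enters the loss.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic with unequal variances (Welch's form)."""
    va, vb = variance(a), variance(b)  # unbiased sample variances
    se = math.sqrt(va / len(a) + vb / len(b))  # standard error of the mean difference
    return (mean(a) - mean(b)) / se


# Hypothetical intensities on either side of a boundary.
inside = [0.82, 0.79, 0.85, 0.80]
outside = [0.21, 0.25, 0.19, 0.23]
t_stat = welch_t(inside, outside)
```

A large absolute value of `t_stat` indicates strong heterogeneity between the two pixel groups, which is the kind of contrast the abstract associates with a well-placed boundary.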

MTU-Net: Multi-level TransUNet for Space-based Infrared Tiny Ship Detection

Tianhao Wu, Boyang Li, Yihang Luo, Yingqian Wang, Chao Xiao, Ting Liu, Jungang Yang, Wei An, Yulan Guo

Category: Computer Vision

2022-09-28

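Space-based infrared tiny ship detection aims to separate tiny ships from images captured from Earth orbit. Because the image coverage area is extremely large (e.g., thousands of square kilometers), candidate targets in these images are much smaller, dimmer, and more changeable than targets observed by aerial and land-based imaging devices. Existing infrared datasets and target detection methods built for short imaging distances do not transfer well to the space-based surveillance task. To address these problems, we develop a space-based infrared tiny ship detection dataset (namely, NUDT-SIRST-Sea) with 48 space-based infrared images and 17598 pixel-level tiny ship annotations. Each image covers about 10000 square kilometers with 10000x10000 pixels. Considering the extreme characteristics (e.g., small, dim, changeable) of these tiny ships in such challenging scenes, we propose a multi-level TransUNet (MTU-Net) in this paper. Specifically, we design a Vision Transformer (ViT) and convolutional neural network (CNN) hybrid encoder to extract multi-level features. Local feature maps are first extracted by several convolutional layers and then fed into the multi-level feature extraction module (MVTM) to capture long-range dependencies. We further propose a copy-rotate-resize-paste (CRRP) data augmentation approach to accelerate the training phase, which effectively alleviates the sample imbalance between targets and background. In addition, we design a focal loss to achieve both target localization and shape description. Experimental results on the NUDT-SIRST-Sea dataset show that our MTU-Net outperforms traditional and existing deep learning based SIRST methods in terms of probability of detection, false alarm rate, and intersection over union.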

Intention Communication and Hypothesis Likelihood in Game-Theoretic Motion Planning

Makram Chahine, Roya Firoozi, Wei Xiao, Mac Schwager, Daniela Rus

Category: Robotics

2022-09-26

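Game-theoretic motion planners are an effective solution for controlling systems of multiple highly interactive robots. Most existing game-theoretic planners unrealistically assume that a priori knowledge of the objective functions is available to all agents. To address this, we propose a fault-tolerant receding horizon game-theoretic motion planner that leverages inter-agent communication together with intention hypothesis likelihood. Specifically, robots communicate their objective functions to convey their intentions. A discrete Bayesian filter is designed to infer the objectives in real time, based on the discrepancy between the observed trajectories and the trajectories implied by the communicated intentions. In simulation, we consider three safety-critical autonomous driving scenarios (overtaking, lane crossing, and intersection crossing) to demonstrate our planner's ability to exploit alternative intention hypotheses and generate safe trajectories in the presence of faulty transmissions in the communication network.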
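
A discrete Bayesian filter over intention hypotheses, as described in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions: the Gaussian likelihood, the parameter `sigma`, the hypothesis names, and the function `bayes_update` are all hypothetical and are not taken from the paper.

```python
import math


def bayes_update(prior, observed, predicted, sigma=1.0):
    """One discrete Bayes step over intention hypotheses.

    prior:     dict mapping hypothesis -> probability
    observed:  observed position (x, y)
    predicted: dict mapping hypothesis -> position (x, y) expected under it
    sigma:     assumed std. dev. of an isotropic Gaussian likelihood
    """
    unnorm = {}
    for h, p in prior.items():
        dx = observed[0] - predicted[h][0]
        dy = observed[1] - predicted[h][1]
        # Smaller discrepancy between observation and prediction -> higher likelihood.
        likelihood = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
        unnorm[h] = p * likelihood
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}


# Two hypotheses: the communicated intention and one alternative.
belief = {"communicated": 0.5, "alternative": 0.5}
predicted = {"communicated": (1.0, 0.0), "alternative": (1.0, 1.0)}
belief = bayes_update(belief, observed=(1.0, 0.9), predicted=predicted, sigma=0.5)
```

In this toy case the observation lies close to the alternative prediction, so the belief shifts away from the communicated intention, which is the situation a faulty transmission would produce.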

SAR Ship Detection based on Swin Transformer and Feature Enhancement Feature Pyramid Network

Xiao Ke, Xiaoling Zhang, Tianwen Zhang, Jun Shi, Shunjun Wei

Categories: Computer Vision | Artificial Intelligence

2022-09-21

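With the rapid development of convolutional neural networks (CNNs), CNNs such as VGG-16 and ResNet-50 are widely used as backbones in SAR ship detection. However, CNN-based backbones struggle to model long-range dependencies and leave shallow feature maps without enough high-quality semantic information, which leads to poor detection performance against complex backgrounds and for small ships. To address these problems, we propose a SAR ship detection method based on Swin Transformer and a Feature Enhancement Feature Pyramid Network (FEFPN). Swin Transformer serves as the backbone to model long-range dependencies and generate hierarchical feature maps. FEFPN is proposed to further improve the quality of the feature maps by gradually enhancing the semantic information of feature maps at all levels, especially those in shallow layers. Experiments conducted on the SAR Ship Detection Dataset (SSDD) show the advantages of our proposed method.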

Masked Autoencoders Enable Efficient Knowledge Distillers

Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

Category: Computer Vision

2022-08-25

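This paper studies the potential of distilling knowledge from pre-trained models, especially masked autoencoders (MAE). Our approach is simple: in addition to optimizing the pixel reconstruction loss on masked inputs, we minimize the distance between the intermediate feature map of the teacher model and that of the student model. This design leads to a computationally efficient knowledge distillation framework, given that 1) only a small visible subset of patches is used, and 2) the (cumbersome) teacher model only needs to be partially executed, i.e., inputs are forward-propagated through the first few layers to obtain the intermediate feature maps. Compared to directly distilling fine-tuned models, distilling pre-trained models substantially improves downstream performance. For example, by distilling knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84.0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1.2%. More intriguingly, our method can robustly distill knowledge from teacher models even with extremely high masking ratios: for example, with a 95% masking ratio, where only ten patches are visible during distillation, our ViT-B attains a competitive ImageNet top-1 accuracy of 83.6%; surprisingly, it still reaches 82.4% ImageNet top-1 accuracy when aggressively trained with just four visible patches (98% masking ratio). The code and models are publicly available at https://github.com/ucsc-vlaa/dmae.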
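
The quoted masking ratios can be checked with a short calculation. The sketch assumes a standard ViT setup with 224x224 inputs and 16x16 patches (196 patches in total); the abstract does not state these values, and the function name `masking_ratio` is hypothetical.

```python
def masking_ratio(visible, image_size=224, patch_size=16):
    """Fraction of patches masked when only `visible` patches are kept.

    image_size and patch_size are assumed defaults, not values from the abstract.
    """
    per_side = image_size // patch_size
    total = per_side * per_side  # 14 * 14 = 196 under the assumed defaults
    return 1.0 - visible / total


# Percentages for ten and four visible patches.
ratios = {v: 100 * masking_ratio(v) for v in (10, 4)}
```

Under these assumptions, ten visible patches correspond to roughly 95% masking and four visible patches to roughly 98%, matching the figures in the abstract.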

Transferable Cross-Tokamak Disruption Prediction with Deep Hybrid Neural Network Feature Extractor

Wei Zheng, Fengming Xue, Ming Zhang, Zhongyong Chen, Chengshuo Shen, Xinkun Ai, Nengchao Wang, Dalong Chen, Bihao Guo, Yonghua Ding

Category: Machine Learning

2022-08-20

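Predicting disruptions across different tokamaks is a great obstacle to overcome. Future tokamaks can hardly tolerate disruptions in high-performance discharges. The few disruptive discharges available at high performance can hardly make up an abundant training set, which makes it difficult for current data-driven methods to obtain acceptable results. A machine learning method capable of transferring a disruption prediction model trained on one tokamak to another is needed to solve this problem. The key is a disruption prediction model that contains a feature extractor able to extract common disruption precursor traces from tokamak diagnostic data, together with a transferable disruption classifier. Based on these considerations, the paper first presents a deep fusion feature extractor designed specifically to extract disruption precursor features from common tokamak diagnostics according to currently known disruption precursors, providing a promising foundation for transferable models. The fusion feature extractor is validated by comparison with manual feature extraction on J-TEXT. Based on the feature extractor trained on J-TEXT, the disruption prediction model was transferred to EAST data using only 20 discharges from EAST experiments. Its performance is comparable to that of a model trained on 1896 discharges. Comparison with other model training scenarios shows the potential of transfer learning for predicting disruptions across different tokamaks.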
