智能论文笔记

Visual Prompt Tuning for Test-time Domain Adaptation

Yunhe Gao , Xingjian Shi , Yi Zhu , Hao Wang , Zhiqiang Tang , Xiong Zhou , Mu Li , Dimitris N. Metaxas

分类：计算机视觉

2022-10-10

Models should be able to adapt to unseen data during test-time to avoid performance drops caused by inevitable distribution shifts in real-world deployment scenarios. In this work, we tackle the practical yet challenging test-time adaptation (TTA) problem, where a model adapts to the target domain without accessing the source data. We propose a simple recipe called \textit{Data-efficient Prompt Tuning} (DePT) with two key ingredients. First, DePT plugs visual prompts into the vision Transformer and only tunes these source-initialized prompts during adaptation. We find such parameter-efficient finetuning can efficiently adapt the model representation to the target domain without overfitting to the noise in the learning objective. Second, DePT bootstraps the source representation to the target domain by memory bank-based online pseudo-labeling. A hierarchical self-supervised regularization specially designed for prompts is jointly optimized to alleviate error accumulation during self-training. With much fewer tunable parameters, DePT demonstrates not only state-of-the-art performance on major adaptation benchmarks VisDA-C, ImageNet-C, and DomainNet-126, but also superior data efficiency, i.e., adaptation with only 1\% or 10\% data without much performance degradation compared to 100\% data. In addition, DePT is also versatile to be extended to online or multi-source TTA settings.

translated by 谷歌翻译

TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers

Di Liu , Yunhe Gao , Qilong Zhangli , Ligong Han , Xiaoxiao He , Zhaoyang Xia , Song Wen , Qi Chang , Zhennan Yan , Mu Zhou

分类：计算机视觉

2022-03-21

组合来自多视图图像的信息对于提高自动化方法的疾病诊断方法的性能和鲁棒性至关重要。但是，由于多视图图像的非对齐特性，跨视图的构建相关性和数据融合在很大程度上仍然是一个开放的问题。在这项研究中，我们提出了输血，这是一种基于变压器的体系结构，可使用卷积层和强大的注意机制合并不同的多视图成像信息。特别是，针对丰富的跨视图上下文建模和语义依赖性挖掘，提出了发散的融合注意（DIFA）模块，以解决从不同图像视图中捕获未对齐数据之间的长期相关性的关键问题。我们进一步提出了多尺度注意（MSA），以收集多尺度特征表示的全局对应关系。我们评估了心脏MRI（M \＆MS-2）挑战队列中多疾病，多视图\＆多中心右心室分段的输血。输血表明了针对最先进方法的领先绩效，并为多视图成像集成的新观点打开了稳健的医学图像分割。

translated by 谷歌翻译

Region Proposal Rectification Towards Robust Instance Segmentation of Biological Images

Qilong Zhangli , Jingru Yi , Di Liu , Xiaoxiao He , Zhaoyang Xia , Qi Chang , Ligong Han , Yunhe Gao , Song Wen , Haiming Tang

分类：计算机视觉

2022-03-06

自上而下的实例分割框架与自下而上的框架相比，它在对象检测方面表现出了优越性。虽然它有效地解决了过度细分，但自上而下的实例分割却遭受了过度处理问题。然而，完整的分割掩模对于生物图像分析至关重要，因为它具有重要的形态特性，例如形状和体积。在本文中，我们提出了一个区域建议纠正（RPR）模块，以解决这个具有挑战性的分割问题。特别是，我们提供了一个渐进式皇家模块，以逐渐将邻居信息引入一系列ROI。 ROI功能被馈入专门的进料网络（FFN）以进行提案框回归。有了其他邻居信息，提出的RPR模块显示了区域建议位置的校正显着改善，因此与最先进的基线方法相比，在三个生物图像数据集上表现出有利的实例分割性能。实验结果表明，所提出的RPR模块在基于锚固的和无锚的自上而下实例分割方法中有效，这表明该方法可以应用于生物学图像的一般自上而下实例分割。代码可用。

translated by 谷歌翻译

A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark

Yunhe Gao , Mu Zhou , Di Liu , Zhennan Yan , Shaoting Zhang , Dimitris N. Metaxas

分类：计算机视觉

2022-02-28

作为新一代神经体系结构的变形金刚在自然语言处理和计算机视觉方面表现出色。但是，现有的视觉变形金刚努力使用有限的医学数据学习，并且无法概括各种医学图像任务。为了应对这些挑战，我们将Medformer作为数据量表变压器呈现为可推广的医学图像分割。关键设计结合了理想的电感偏差，线性复杂性的层次建模以及以空间和语义全局方式以线性复杂性的关注以及多尺度特征融合。 Medformer可以在不预训练的情况下学习微小至大规模的数据。广泛的实验表明，Medformer作为一般分割主链的潜力，在三个具有多种模式（例如CT和MRI）和多样化的医学靶标（例如，健康器官，疾病，疾病组织和肿瘤）的三个公共数据集上优于CNN和视觉变压器。我们将模型和评估管道公开可用，为促进广泛的下游临床应用提供固体基线和无偏比较。

translated by 谷歌翻译

Pre-Trained Image Processing Transformer

Hanting Chen , Yunhe Wang , Tianyu Guo , Chang Xu , Yiping Deng , Zhenhua Liu , Siwei Ma , Chunjing Xu , Chao Xu , Wen Gao

分类：计算机视觉 | 机器学习

2020-12-01

由于现代硬件的计算能力强烈增加，在大规模数据集上学习的预训练的深度学习模型（例如，BERT，GPT-3）已经显示了它们对传统方法的有效性。巨大进展主要促进了变压器及其变体架构的代表能力。在本文中，我们研究了低级计算机视觉任务（例如，去噪，超级分辨率和派没），并开发了一个新的预先训练的模型，即图像处理变压器（IPT）。为了最大限度地挖掘变压器的能力，我们展示了利用众所周知的想象网基准，以产生大量损坏的图像对。 IPT模型在具有多头和多尾的这些图像上培训。此外，引入了对比度学习，以适应不同的图像处理任务。因此，在微调后，预先训练的模型可以有效地在所需的任务上使用。只有一个预先训练的模型，IPT优于当前的最先进方法对各种低级基准。代码可在https://github.com/huawei-noah/pretrate -ipt和https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/ipt

translated by 谷歌翻译

Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic Knowledge

Longxu Dou , Yan Gao , Xuqi Liu , Mingyang Pan , Dingzirui Wang , Wanxiang Che , Dechen Zhan , Min-Yen Kan , Jian-Guang Lou

分类：自然语言处理

2023-01-03

In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.

translated by 谷歌翻译

Further Improving Weakly-supervised Object Localization via Causal Knowledge Distillation

Feifei Shao , Yawei Luo , Shengjian Wu , Qiyi Li , Fei Gao , Yi Yang , Jun Xiao

分类：计算机视觉

2023-01-03

Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.

translated by 谷歌翻译

Deep Spectral Q-learning with Application to Mobile Health

Yuhe Gao , Chengchun Shi , Rui Song

分类： (统计)机器学习 | 机器学习

2023-01-03

Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.

translated by 谷歌翻译

Follow the Timeline! Generating Abstractive and Extractive Timeline Summary in Chronological Order

Xiuying Chen , Mingzhe Li , Shen Gao , Zhangming Chan , Dongyan Zhao , Xin Gao , Xiangliang Zhang , Rui Yan

分类：自然语言处理

2023-01-02

Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.

translated by 谷歌翻译

Lifting-wing Quadcopter Modeling and Unified Control

Quan Quan , Wang Shuai , Gao Wenhan

分类：机器人

2023-01-02

Hybrid unmanned aerial vehicles (UAVs) integrate the efficient forward flight of fixed-wing and vertical takeoff and landing (VTOL) capabilities of multicopter UAVs. This paper presents the modeling, control and simulation of a new type of hybrid micro-small UAVs, coined as lifting-wing quadcopters. The airframe orientation of the lifting wing needs to tilt a specific angle often within $ 45$ degrees, neither nearly $ 90$ nor approximately $ 0$ degrees. Compared with some convertiplane and tail-sitter UAVs, the lifting-wing quadcopter has a highly reliable structure, robust wind resistance, low cruise speed and reliable transition flight, making it potential to work fully-autonomous outdoor or some confined airspace indoor. In the modeling part, forces and moments generated by both lifting wing and rotors are considered. Based on the established model, a unified controller for the full flight phase is designed. The controller has the capability of uniformly treating the hovering and forward flight, and enables a continuous transition between two modes, depending on the velocity command. What is more, by taking rotor thrust and aerodynamic force under consideration simultaneously, a control allocation based on optimization is utilized to realize cooperative control for energy saving. Finally, comprehensive Hardware-In-the-Loop (HIL) simulations are performed to verify the advantages of the designed aircraft and the proposed controller.

translated by 谷歌翻译