The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practices and bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a considerable portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
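Of the workarounds listed above, patch-based training was the most common. A minimal sketch of the idea, with all names and sizes as illustrative assumptions rather than anything prescribed by the survey:

```python
# Minimal sketch of patch-based training: when an image is too large to
# process at once, sample fixed-size crops instead. Sizes are illustrative.
import numpy as np

def sample_patch(image: np.ndarray, patch_size: int = 256, rng=None) -> np.ndarray:
    """Crop one random square patch from a 2D (H, W) or 3D (H, W, C) image."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    top = rng.integers(0, h - patch_size + 1)
    left = rng.integers(0, w - patch_size + 1)
    return image[top:top + patch_size, left:left + patch_size]

# A 3D volume can likewise be handled as a series of 2D tasks by iterating
# over slices along one axis: [sample_patch(volume[z]) for z in range(depth)].
image = np.zeros((1024, 2048, 3), dtype=np.float32)  # placeholder large image
patch = sample_patch(image)                          # shape (256, 256, 3)
```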
Humans can continually learn new knowledge, but after learning a new task, the performance of machine learning models on previous tasks drops dramatically. Cognitive science points out that competition between similar knowledge is an important cause of forgetting. In this paper, we design a paradigm for lifelong learning based on the meta-learning and associative mechanisms of the brain. It tackles the problem from two aspects: extracting knowledge and memorizing knowledge. First, we disrupt the background distribution of samples through a background attack, which strengthens the model's ability to extract the key features of each task. Second, based on the similarity between incremental knowledge and base knowledge, we design an adaptive fusion of incremental knowledge, which helps the model allocate capacity to knowledge of different difficulty. We show theoretically that the proposed learning paradigm can make the models for different tasks converge to the same optimum. The proposed method is validated on the MNIST, CIFAR100, CUB200, and ImageNet100 datasets.
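As a rough illustration of the adaptive fusion described above, one could weight the fusion of incremental and base knowledge by their similarity; the cosine-similarity weighting and the vector "prototype" representation here are our own illustrative assumptions, not the paper's exact rule:

```python
# Hedged sketch: fuse an incremental-knowledge vector into a base-knowledge
# vector, with the mixing weight driven by their similarity.
import numpy as np

def adaptive_fuse(base: np.ndarray, incremental: np.ndarray) -> np.ndarray:
    """Similarity-weighted fusion; more similar knowledge keeps more of the base."""
    cos = base @ incremental / (np.linalg.norm(base) * np.linalg.norm(incremental) + 1e-12)
    w = (cos + 1.0) / 2.0          # map cosine similarity from [-1, 1] to [0, 1]
    return w * base + (1.0 - w) * incremental
```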
Open-set video anomaly detection (OpenVAD) aims to identify abnormal events in video data where both known anomalies and novel events exist at test time. Unsupervised models, learned only from normal videos, are applicable to any test anomaly but suffer from a high false-positive rate. In contrast, weakly supervised methods are effective at detecting known anomalies but may fail in an open world. We develop a novel weakly supervised method for the OpenVAD problem by integrating evidential deep learning (EDL) and normalizing flows (NFs) into a multiple instance learning (MIL) framework. Specifically, we propose to use graph neural networks and a triplet loss to learn discriminative features for training the EDL classifier, where EDL is able to identify unknown anomalies by quantifying uncertainty. Moreover, we devise an uncertainty-aware selection strategy to obtain clean anomaly instances, and an NF module to generate pseudo-anomalies. Our method outperforms existing approaches by inheriting the advantages of both unsupervised NFs and the weakly supervised MIL framework. Experimental results on multiple real-world video datasets show the effectiveness of our method.
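For readers unfamiliar with EDL, here is a minimal sketch of how an evidential classifier quantifies uncertainty, following the common subjective-logic formulation (Sensoy et al.); the paper above may use a variant:

```python
# Non-negative per-class "evidence" outputs parameterize a Dirichlet; the
# vacuity u = K / sum(alpha) is high when total evidence is low, which is
# what flags unknown anomalies.
import numpy as np

def edl_uncertainty(evidence: np.ndarray) -> tuple[np.ndarray, float]:
    """evidence: non-negative per-class outputs, shape (K,).
    Returns (expected class probabilities, vacuity uncertainty in [0, 1])."""
    alpha = evidence + 1.0                 # Dirichlet parameters
    s = alpha.sum()
    probs = alpha / s                      # expected class probabilities
    u = len(alpha) / s                     # high when total evidence is low
    return probs, float(u)

probs, u = edl_uncertainty(np.array([0.1, 0.2]))  # little evidence -> u ~ 0.87
```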
In many applications it is essential to understand why a machine learning model makes the decisions it does, but this is inhibited by the black-box nature of state-of-the-art neural networks. Explainability in deep learning has therefore attracted growing attention, including in the field of video understanding. Owing to the temporal dimension of video data, the main challenge in explaining a video action recognition model is to produce spatiotemporally consistent visual explanations, which has been overlooked in the existing literature. In this paper, we propose Frequency-based Extremal Perturbation (F-EP) to explain the decisions of video understanding models. Because the explanations given by perturbation methods are noisy and non-smooth both spatially and temporally, we propose to modulate the frequencies of the gradient maps from the neural network model with the discrete cosine transform (DCT). Across a range of experiments, we show that F-EP provides more spatiotemporally consistent explanations that represent the model's decisions more faithfully than existing state-of-the-art methods.
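One way to picture the DCT-based modulation is as frequency filtering of a 2D gradient map; the simple low-pass mask below is our own assumption, not the exact modulation scheme used by F-EP:

```python
# Keep only the lowest-frequency DCT coefficients of a gradient map,
# suppressing the high-frequency noise typical of perturbation explanations.
import numpy as np
from scipy.fft import dctn, idctn

def smooth_gradient_map(grad: np.ndarray, keep: int = 16) -> np.ndarray:
    """Zero out all but the lowest `keep` x `keep` DCT coefficients of a 2D map."""
    coeffs = dctn(grad, norm="ortho")
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0
    return idctn(coeffs * mask, norm="ortho")

# For a video explanation, the same idea can be applied per frame (and
# optionally along the temporal axis) to obtain spatiotemporally smoother maps.
```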
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions. For instance, OT is a popular loss function that quantifies the discrepancy between an empirical distribution and a parametric model. Recently, an entropic penalty term and the celebrated Sinkhorn algorithm have been commonly used to approximate the original OT in a computationally efficient way. However, since the Sinkhorn algorithm runs a projection associated with the Kullback-Leibler divergence, it is often vulnerable to outliers. To overcome this problem, we propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence, which was developed in robust statistics. Our theoretical analysis reveals that the $\beta$-potential can prevent the mass from being transported to outliers. We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers. In addition, our proposed method can successfully detect outliers from a contaminated dataset.
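For context, here is the classical entropic Sinkhorn iteration the abstract refers to, i.e. the KL-projection baseline that is vulnerable to outliers; the proposed $\beta$-potential regularization replaces this geometry and is not reproduced here:

```python
# Standard Sinkhorn iteration for entropy-regularized optimal transport.
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=200):
    """a, b: source/target weights (each sums to 1); C: (n, m) cost matrix;
    eps: entropic regularization strength. Returns the transport matrix P."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)      # alternate KL projections onto the
        u = a / (K @ v)        # row- and column-marginal constraints
    return u[:, None] * K * v[None, :]
```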
In the era of the Internet of Things (IoT), network-wide anomaly detection is a crucial part of monitoring IoT networks due to the inherent security vulnerabilities of most IoT devices. Principal Components Analysis (PCA) has been proposed to separate network traffic into two disjoint subspaces corresponding to normal and malicious behaviors for anomaly detection. However, privacy concerns and the limitations of devices' computing resources compromise the practical effectiveness of PCA. We propose a federated PCA-based Grassmannian optimization framework that coordinates IoT devices to aggregate a joint profile of normal network behaviors for anomaly detection. First, we introduce a privacy-preserving federated PCA framework to simultaneously capture the traffic profiles of various IoT devices. Then, we investigate alternating-direction-method-of-multipliers gradient-based learning on the Grassmann manifold to guarantee fast training and the absence of detection latency using limited computational resources. Empirical results on the NSL-KDD dataset demonstrate that our method outperforms baseline approaches. Finally, we show that the Grassmann manifold algorithm is highly suited to IoT anomaly detection, permitting a drastic reduction in the system's analysis time. To the best of our knowledge, this is the first federated PCA algorithm for anomaly detection meeting the requirements of IoT networks.
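A minimal centralized sketch of PCA-subspace anomaly detection as used in this line of work: fit a normal subspace from normal traffic, then score a sample by its residual energy. The federated, Grassmannian-optimized training of the paper is not reproduced here:

```python
# Split traffic into a normal subspace (top principal components) and a
# residual subspace; a large residual norm flags an anomaly.
import numpy as np

def fit_normal_subspace(X: np.ndarray, k: int):
    """X: (n_samples, n_features) of normal traffic. Returns (U, mean), where
    the columns of U span the normal subspace (top-k principal components)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return Vt[:k].T, mean

def anomaly_score(x: np.ndarray, U: np.ndarray, mean: np.ndarray) -> float:
    """Squared norm of x's projection onto the residual (anomalous) subspace."""
    r = (x - mean) - U @ (U.T @ (x - mean))
    return float(r @ r)
```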
In this paper, we propose a novel architecture, the Enhanced Interactive Transformer (EIT), to address the issue of head degradation in self-attention mechanisms. Our approach replaces the traditional multi-head self-attention mechanism with the Enhanced Multi-Head Attention (EMHA) mechanism, which relaxes the one-to-one mapping constraint between queries and keys, allowing each query to attend to multiple keys. Furthermore, we introduce two interaction models, Inner-Subspace Interaction and Cross-Subspace Interaction, to fully utilize the many-to-many mapping capabilities of EMHA. Extensive experiments on a wide range of tasks (e.g., machine translation, abstractive summarization, grammar correction, language modelling and brain disease automatic diagnosis) show its superiority with a very modest increase in model size.
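To make the many-to-many relaxation concrete, the sketch below lets each query head attend over the keys of every head, in contrast to the one-to-one head pairing of standard multi-head attention; the key pooling and output concatenation are illustrative assumptions, not the paper's exact EMHA formulation:

```python
# Standard MHA: query head i attends only to key/value head i. Sketched
# relaxation: each query head attends over the pooled keys of all heads.
import numpy as np

def emha_like(Q, K, V):
    """Q, K, V: (heads, seq, d). Returns (seq, heads * d)."""
    h, n, d = K.shape
    K_all = K.reshape(h * n, d)            # pool keys across subspaces
    V_all = V.reshape(h * n, d)
    outs = []
    for i in range(h):
        logits = Q[i] @ K_all.T / np.sqrt(d)           # (n, h*n)
        w = np.exp(logits - logits.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)             # softmax over all keys
        outs.append(w @ V_all)                         # (n, d)
    return np.concatenate(outs, axis=-1)               # (n, h*d)
```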
Task transfer learning is a popular technique in image processing applications that uses pre-trained models to reduce the supervision cost of related tasks. An important question is how to determine task transferability, i.e., given a common input domain, estimating to what extent representations learned from a source task can help in learning a target task. Typically, transferability is either measured experimentally or inferred through task relatedness, which is often defined without a clear operational meaning. In this paper, we present a novel metric, H-score, an easily computable evaluation function that estimates the performance of transferred representations from one task to another in classification problems using statistical and information-theoretic principles. Experiments on real image data show that our metric is not only consistent with empirical transferability measurements, but also useful to practitioners in applications such as source model selection and task transfer curriculum learning.
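A short sketch of the H-score computation, H(f) = tr(cov(f)^{-1} cov(E[f|Y])), under the usual formulation of this metric; details such as the use of a pseudo-inverse are our own implementation choices:

```python
# H-score of source-task features evaluated on labeled target data:
# higher values suggest the features transfer better to the target task.
import numpy as np

def h_score(features: np.ndarray, labels: np.ndarray) -> float:
    """features: (n, d) source-model features; labels: (n,) integer classes."""
    f = features - features.mean(axis=0)
    cov_f = f.T @ f / len(f)
    g = np.zeros_like(f)
    for c in np.unique(labels):
        idx = labels == c
        g[idx] = f[idx].mean(axis=0)       # class-conditional mean E[f | Y=c]
    cov_g = g.T @ g / len(g)
    return float(np.trace(np.linalg.pinv(cov_f) @ cov_g))
```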
Summary quality assessment metrics fall into two categories: reference-based and reference-free. Reference-based metrics are theoretically more accurate but are limited by the availability and quality of human-written references, both of which are difficult to ensure. This has inspired the development of reference-free metrics, which are independent of human-written references, in the past few years. However, existing reference-free metrics cannot be both zero-shot and accurate. In this paper, we propose a zero-shot but accurate reference-free approach in a sneaky way: feeding the documents from which the summaries were generated as references into reference-based metrics. Experimental results show that this zero-shot approach can give us the best-performing reference-free metrics on nearly all aspects of several recently released datasets, sometimes even beating reference-free metrics specifically trained for this task. We further investigate which reference-based metrics can benefit from such repurposing and whether our additional tweaks help.
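The repurposing trick is easy to state in code: call a reference-based metric, but pass the source document where the human reference would normally go. The sketch below uses ROUGE via the rouge-score package purely as one example; the paper evaluates a range of reference-based metrics:

```python
# Reference-free scoring by repurposing a reference-based metric: the source
# document stands in for the human-written reference.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

def reference_free_score(document: str, summary: str) -> dict:
    """Score a summary against its own source document instead of a reference."""
    return scorer.score(document, summary)  # (target, prediction) order
```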
Unsupervised pre-training on millions of digital-born or scanned documents has shown promising advances in visual document understanding (VDU). While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far. A document textline usually contains words that are spatially and semantically correlated, which can be easily obtained from OCR engines. In this paper, we propose Wukong-Reader, trained with new pre-training objectives to leverage the structural knowledge nested in document textlines. We introduce textline-region contrastive learning to achieve fine-grained alignment between the visual regions and texts of document textlines. Furthermore, masked region modeling and textline-grid matching are also designed to enhance the visual and layout representations of textlines. Experiments show that our Wukong-Reader has superior performance on various VDU tasks such as information extraction. The fine-grained alignment over textlines also empowers Wukong-Reader with promising localization ability.
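Textline-region contrastive learning presumably resembles CLIP-style alignment at textline granularity: matched region/text embedding pairs are pulled together and mismatched pairs pushed apart. The symmetric InfoNCE loss below is a hedged sketch under that assumption, not Wukong-Reader's exact objective:

```python
# Symmetric InfoNCE over textline pairs: row i of each matrix is a matched
# (visual region, OCR text) pair; the diagonal holds the positive pairs.
import numpy as np

def textline_contrastive_loss(region_emb, text_emb, tau=0.07):
    """region_emb, text_emb: (n, d) L2-normalized embeddings; returns the loss."""
    logits = region_emb @ text_emb.T / tau                 # (n, n) similarities
    log_p_r2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_t2r = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    return float(-(np.diag(log_p_r2t).mean() + np.diag(log_p_t2r).mean()) / 2)
```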