The detection and tracking of fast-moving objects is of broad utility in many fields. However, because of the heavy computation involved and limited data-processing capability, it is difficult to meet the demand for fast and efficient detection and tracking with image-based techniques. To address this problem, we propose an image-free method to achieve real-time detection and tracking of fast-moving objects. It employs Hadamard patterns to illuminate the fast-moving object via a spatial light modulator, and a single-pixel detector collects the resulting light signals. The single-pixel measurements are used directly to reconstruct the position information, without image reconstruction. Furthermore, a new sampling method is used to optimize the pattern-projection scheme and achieve an ultra-low sampling rate. Compared with state-of-the-art methods, our approach not only handles detection and tracking in real time, but also requires little computation and offers high efficiency. We experimentally demonstrate that the proposed method, using a 22 kHz digital micromirror device, can achieve a frame rate of 105 FPS at a 1.28% sampling rate while tracking. Our method breaks through the conventional tracking paradigm and enables real-time object tracking without image reconstruction.
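A minimal sketch of the image-free idea, assuming a moment-style localization scheme in place of the paper's exact Hadamard-based position reconstruction (all names and numbers below are illustrative):

```python
import numpy as np

# Locate a bright object from a handful of single-pixel measurements,
# with no image ever reconstructed (hypothetical moment-pattern scheme).
H, W = 64, 64
scene = np.zeros((H, W))
scene[40:44, 10:14] = 1.0                  # small bright object

xs, ys = np.meshgrid(np.arange(W), np.arange(H))
patterns = {
    "sum": np.ones((H, W)),                # total intensity
    "x":   xs / (W - 1),                   # x-coordinate ramp
    "y":   ys / (H - 1),                   # y-coordinate ramp
}

# Each measurement is a single number from the detector:
m = {k: float((p * scene).sum()) for k, p in patterns.items()}

cx = m["x"] / m["sum"] * (W - 1)           # first moments -> centroid
cy = m["y"] / m["sum"] * (H - 1)
print(f"estimated centroid: ({cx:.1f}, {cy:.1f})")   # ~ (11.5, 41.5)
```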
Disruption prediction has made rapid progress in recent years, especially with machine learning (ML) approaches. Understanding why a predictor makes a given prediction is as crucial as its accuracy for future tokamak disruption predictors. Most disruption predictors aim at accuracy or cross-machine capability. However, if a disruption prediction model can be interpreted, it can explain why certain samples are classified as disruption precursors. This allows us to identify the type of incoming disruption and gives insight into the disruption mechanism. This paper designs a disruption predictor, called Interpretable Disruption Predictor based on Physics-guided Feature Extraction (IDP-PGFE), on J-TEXT. The prediction performance of the model is effectively improved by extracting physics-guided features. A high-performance model is required to ensure the validity of the interpretation results. The interpretability study of IDP-PGFE provides an understanding of J-TEXT disruptions that is generally consistent with existing understanding. IDP-PGFE has been applied to disruptions induced by continuously increasing density toward the density limit in J-TEXT density-limit experiments. The time evolution of the PGFE feature contributions shows that the application of ECRH triggers radiation-caused disruption, which lowers the density at disruption, while the application of RMP indeed raises the density limit in J-TEXT. The interpretability study suggests the physical mechanism by which RMP delays density-limit disruption: RMP affects not only the MHD instabilities but also the radiation profile.
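A hedged sketch of the physics-guided-feature idea: hand-crafted features from diagnostics feed an interpretable classifier, and per-feature contributions explain each prediction. The feature names, model, and explainer below are stand-ins, not the paper's exact IDP-PGFE design:

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Toy physics-guided features (names are hypothetical placeholders).
rng = np.random.default_rng(0)
n = 500
feature_names = ["n_e / n_GW", "radiated_power_frac", "mode_amplitude"]
X = rng.normal(size=(n, len(feature_names)))
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n) > 0.7

model = GradientBoostingClassifier().fit(X, y)

# Per-sample feature contributions, analogous to the time-evolving
# contribution traces used to attribute a disruption to, e.g., rising
# radiation after ECRH.
contributions = shap.TreeExplainer(model).shap_values(X[:5])
for name, contrib in zip(feature_names, contributions.T):
    print(name, np.round(contrib, 3))
```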
Predicting disruptions across different tokamaks is a great obstacle to overcome. Future tokamaks can hardly tolerate disruptions at high performance. The few available high-performance disruptive discharges can hardly compose an abundant training set, which makes it difficult for current data-driven methods to obtain acceptable results. A machine learning method capable of transferring a disruption prediction model trained on one tokamak to another is required to solve the problem. The key is a disruption prediction model containing a feature extractor that is able to extract common disruption precursor traces in tokamak diagnostic data, together with a transferable disruption classifier. Based on the above issues, this paper first presents a deep fusion feature extractor designed specifically for extracting disruption precursor features from common diagnostics on tokamaks, according to currently known precursors of disruption, providing a promising foundation for transferable models. The fusion feature extractor is validated by comparison with manual feature extraction on J-TEXT. Based on the feature extractor trained on J-TEXT, the disruption prediction model was transferred to EAST data with only 20 discharges from EAST experiments. The performance is comparable to that of a model trained with 1896 discharges from EAST. From the comparison among other model-training scenarios, transfer learning shows its potential in predicting disruptions across different tokamaks.
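A minimal sketch of the transfer recipe, assuming a frozen feature extractor and a fine-tuned classifier head; module shapes and names are illustrative:

```python
import torch
import torch.nn as nn

# Stand-in for the deep fusion feature extractor trained on J-TEXT.
feature_extractor = nn.Sequential(
    nn.Conv1d(16, 32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
)
classifier = nn.Linear(32, 2)            # disruptive vs. non-disruptive

# feature_extractor.load_state_dict(torch.load("jtext_extractor.pt"))  # hypothetical path
for p in feature_extractor.parameters():
    p.requires_grad = False              # transfer: freeze shared features

optim = torch.optim.Adam(classifier.parameters(), lr=1e-3)
x = torch.randn(8, 16, 256)              # a batch of EAST signal windows
y = torch.randint(0, 2, (8,))

optim.zero_grad()
loss = nn.functional.cross_entropy(classifier(feature_extractor(x)), y)
loss.backward()
optim.step()                             # only the classifier head adapts
```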
Deep reinforcement learning (DRL) has achieved super-human performance in complex video games (e.g., StarCraft II and Dota 2). However, current DRL systems still suffer from challenges of multi-agent coordination, sparse rewards, stochastic environments, etc. In seeking to address these challenges, we employ a football video game, Google Research Football (GRF), as our testbed and develop an end-to-end learning-based AI system (denoted as TiKick) to complete this challenging task. In this work, we first generate a large replay dataset from the self-playing of single-agent experts obtained from league training. We then develop a distributed learning system and new offline algorithms to learn a powerful multi-agent AI from the fixed single-agent dataset. To the best of our knowledge, TiKick is the first learning-based AI system that can take over the multi-agent Google Research Football full game, while previous work could either control a single agent or experiment on toy academic scenarios. Extensive experiments further show that our pre-trained model can accelerate the training process of modern multi-agent algorithms and that our method achieves state-of-the-art performance on various academic scenarios.
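A minimal sketch of learning from a fixed replay dataset, the offline setting described above. Plain behavior cloning is shown as a stand-in; TiKick's actual offline algorithm and the GRF observation/action encoding are more involved:

```python
import torch
import torch.nn as nn

# Behavior cloning on (observation, expert action) pairs from replays.
policy = nn.Sequential(nn.Linear(115, 256), nn.ReLU(), nn.Linear(256, 19))
optim = torch.optim.Adam(policy.parameters(), lr=3e-4)

obs = torch.randn(64, 115)               # GRF "simple115"-style features
act = torch.randint(0, 19, (64,))        # GRF's 19 discrete actions

optim.zero_grad()
loss = nn.functional.cross_entropy(policy(obs), act)
loss.backward()
optim.step()
```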
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable, since they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters, by maximizing and minimizing the cross-view cosine similarity between positive and negative samples respectively. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
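A hedged sketch of the objective described above: maximize cross-view cosine similarity for positive pairs from the same high-confidence cluster and minimize it against the centers of other high-confidence clusters. Tensor shapes and the exact loss form are assumptions, not the official CCGC implementation:

```python
import torch
import torch.nn.functional as F

def ccgc_style_loss(z1, z2, centers2):
    """z1, z2: (n, d) embeddings of the same high-confidence nodes in two views;
    centers2: (k, d) centers of the other high-confidence clusters."""
    pos = F.cosine_similarity(z1, z2, dim=1)               # (n,) pull together
    neg = F.cosine_similarity(z1.unsqueeze(1),             # (n, k) push apart
                              centers2.unsqueeze(0), dim=2)
    return (-pos + neg.mean(dim=1)).mean()

z1, z2 = torch.randn(32, 16), torch.randn(32, 16)
centers2 = torch.randn(4, 16)
print(ccgc_style_loss(z1, z2, centers2))
```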
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from randomized masked demonstrations and uses the conventional evaluation outcome, environment returns, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and what connections exist between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
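A sketch of the RISE-style importance map described above: randomly mask frames, retrain/evaluate the IL model on the masked demonstrations, and accumulate masks weighted by the resulting environment return. The retrain-and-evaluate step is stubbed out and all names are illustrative:

```python
import numpy as np

def importance_map(num_frames, num_trials, evaluate_masked_policy, seed=0):
    rng = np.random.default_rng(seed)
    acc = np.zeros(num_frames)
    kept = np.zeros(num_frames)
    for _ in range(num_trials):
        mask = rng.random(num_frames) < 0.5      # keep ~half the frames
        ret = evaluate_masked_policy(mask)       # retrain IL model on kept
        acc += ret * mask                        # frames, return env. return
        kept += mask
    return acc / np.maximum(kept, 1)             # per-frame importance

# Toy stand-in for "retrain and evaluate": frames 10-19 matter.
toy = lambda mask: mask[10:20].mean()
imp = importance_map(num_frames=100, num_trials=500, evaluate_masked_policy=toy)
print(imp[10:20].mean() > imp[30:40].mean())     # True
```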
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then perform a clustering algorithm to obtain clusters. To promote topic extraction by clustering, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately fails to make them mutually benefit each other to achieve the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) which integrates text clustering and topic extraction into a unified framework and can achieve high-quality clustering results while extracting topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On one hand, it provides text embeddings with a strong cluster structure which facilitates effective text clustering; on the other hand, it pays high attention to topic-related words for topic extraction because of its self-attention architecture. Moreover, the training of the enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations in this framework.
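A hedged sketch of the four-stage pipeline (embed, reduce, cluster, extract per-cluster topics), using TF-IDF in place of the paper's enhanced language model so the example stays self-contained:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the game was a great win", "players scored in the match",
        "stocks fell on market fears", "investors sold shares today"]

vec = TfidfVectorizer(stop_words="english")      # embed (LM stand-in)
X = vec.fit_transform(docs)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)  # reduce
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

terms = np.array(vec.get_feature_names_out())
for c in range(2):                               # cluster-specific topics
    centroid = X[labels == c].mean(axis=0).A1    # mean TF-IDF per term
    print(f"cluster {c}:", terms[centroid.argsort()[::-1][:3]])
```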
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) to segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationships between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
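An illustrative sketch of the CLIP-driven idea: a text embedding of each class name conditions a small dynamic segmentation head, so adding a class means adding an embedding rather than retraining a fixed label space. Dimensions and the head design are assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

D_TXT, D_FEAT = 512, 32
controller = nn.Linear(D_TXT, D_FEAT + 1)        # per-class 1x1-conv params

def segment(feats, class_embedding):
    """feats: (B, D_FEAT, H, W); class_embedding: (D_TXT,) from a text encoder."""
    theta = controller(class_embedding)
    w = theta[:D_FEAT].view(1, D_FEAT, 1, 1)
    b = theta[D_FEAT]
    return torch.sigmoid((feats * w).sum(dim=1) + b)   # (B, H, W) binary mask

feats = torch.randn(2, D_FEAT, 64, 64)           # from any image backbone
liver_emb = torch.randn(D_TXT)                   # stand-in for CLIP("a CT of a liver")
print(segment(feats, liver_emb).shape)           # torch.Size([2, 64, 64])
```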
Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative; their goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics. We also address the preservation of scale information, a powerful tool in aiding image understanding that has not drawn much attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose a non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.
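A hedged sketch of the multi-task objective on a feature pyramid: at each scale, a pixel-restoration loss plus a siamese feature-comparison loss. The loss weights and the cosine form are assumptions:

```python
import torch
import torch.nn.functional as F

def pcrl_style_loss(pyramid1, pyramid2, recons, targets, alpha=1.0):
    loss = 0.0
    for f1, f2, rec, tgt in zip(pyramid1, pyramid2, recons, targets):
        restore = F.mse_loss(rec, tgt)                   # pixel restoration
        compare = 1 - F.cosine_similarity(               # siamese comparison
            f1.flatten(1), f2.flatten(1), dim=1).mean()
        loss = loss + restore + alpha * compare
    return loss

# Toy two-scale pyramid from two augmented views of the same volume/image.
p1 = [torch.randn(4, 64, 16, 16), torch.randn(4, 128, 8, 8)]
p2 = [torch.randn(4, 64, 16, 16), torch.randn(4, 128, 8, 8)]
rec = [torch.randn(4, 1, 32, 32), torch.randn(4, 1, 16, 16)]
tgt = [torch.randn(4, 1, 32, 32), torch.randn(4, 1, 16, 16)]
print(pcrl_style_loss(p1, p2, rec, tgt))
```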
Due to their ability to offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, the lack of research on explicitly quantifying the data quality of each view when fusing them renders these models inexplicable, and they perform unsatisfactorily and inflexibly in downstream remote sensing tasks. To fill this gap, in this paper, evidential deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value that describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
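A sketch of the evidential fusion described above under a standard Dirichlet formulation: per-view evidence yields an uncertainty u = K/S, and the lower-risk view receives more weight at decision level. The exact combination rule in the paper may differ:

```python
import torch

def evidential_fuse(logits_aerial, logits_ground):
    views = []
    for logits in (logits_aerial, logits_ground):
        evidence = torch.relu(logits)            # non-negative evidence
        alpha = evidence + 1                     # Dirichlet parameters
        S = alpha.sum(dim=1, keepdim=True)
        u = logits.shape[1] / S                  # uncertainty: K / S
        views.append((1 - u, alpha / S))         # (weight, expected probs)
    (w1, p1), (w2, p2) = views
    return (w1 * p1 + w2 * p2) / (w1 + w2)       # lower risk -> more weight

pa, pg = torch.randn(5, 7), torch.randn(5, 7)    # 2 views, 7 scene classes
print(evidential_fuse(pa, pg).sum(dim=1))        # rows sum to ~1
```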