Belief propagation is a fundamental message-passing algorithm used in many machine learning applications. The belief propagation algorithm is known to be exact on tree graphs. However, in most applications belief propagation is run on loopy graphs. Understanding the behavior of belief propagation on loopy graphs has therefore been a major topic for researchers in different fields. In this paper, we study the convergence behavior of the generalized belief propagation algorithm on graphs with motifs (triangles, loops, etc.). We show that, under certain initializations, generalized belief propagation converges to the global optimum of the Bethe free energy for ferromagnetic models on graphs with motifs.
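As a point of reference for the message-passing scheme discussed above, here is a minimal sketch of ordinary loopy sum-product belief propagation on a pairwise Ising-type model. It is illustrative only: the paper's generalized belief propagation on graphs with motifs is not reproduced, and the function and argument names (`loopy_bp`, `adj`, `J`, `h`) are our own.

```python
# Minimal sketch of loopy sum-product belief propagation on a pairwise
# Ising-type model. Illustrative only: the paper studies a *generalized*
# BP variant on graphs with motifs, which is not reproduced here.
import numpy as np

def loopy_bp(adj, J, h, n_iters=100, damping=0.0):
    """adj: list of neighbor lists; J: symmetric coupling matrix; h: fields."""
    n = len(adj)
    states = np.array([-1.0, 1.0])
    # messages[(i, j)] is a length-2 vector over the states of node j
    msgs = {(i, j): np.ones(2) / 2 for i in range(n) for j in adj[i]}
    for _ in range(n_iters):
        new = {}
        for (i, j), m in msgs.items():
            # aggregate incoming messages to i from all neighbors except j
            incoming = np.ones(2)
            for k in adj[i]:
                if k != j:
                    incoming *= msgs[(k, i)]
            # sum over the states of node i
            out = np.zeros(2)
            for b, xj in enumerate(states):
                out[b] = np.sum(np.exp(J[i][j] * states * xj + h[i] * states) * incoming)
            out /= out.sum()
            new[(i, j)] = (1 - damping) * out + damping * m
        msgs = new
    # single-node beliefs b_i(x_i) ∝ exp(h_i x_i) * prod_k m_{k->i}(x_i)
    beliefs = []
    for i in range(n):
        b = np.exp(h[i] * states)
        for k in adj[i]:
            b *= msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs
```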
This paper introduces a new data augmentation method for neural machine translation that enforces stronger semantic consistency both within and across languages. Our method is based on the Conditional Masked Language Model (CMLM), which is bidirectional and can be conditioned on both left and right context as well as on labels. We demonstrate that CMLM is a good technique for generating context-dependent word distributions. In particular, we show that CMLM can enforce semantic consistency by conditioning on both the source and the target during substitution. Furthermore, to enhance diversity, we incorporate the idea of soft word substitution into data augmentation, which replaces a word with a probabilistic distribution over the vocabulary. Experiments on four translation datasets of different scales show that the overall solution leads to more realistic data augmentation and better translation quality. Compared with recent state-of-the-art work, our method consistently achieves the best performance and improves over the baseline by 1.90 BLEU points.
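The paper's CMLM is conditioned on both the source and the target sentence, which cannot be reconstructed from the abstract alone; the sketch below only illustrates the simpler "soft word" substitution idea with an off-the-shelf masked language model from Hugging Face Transformers, replacing one token's embedding by the probability-weighted average of embeddings under the model's predicted distribution. The model choice and helper function are illustrative assumptions.

```python
# Minimal sketch of "soft word" substitution with a masked language model.
# The paper conditions its CMLM on both source and target; here an ordinary
# MLM is used purely to illustrate the probabilistic replacement idea.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def soft_embedding(sentence, position):
    """Return a soft embedding for the word piece at `position` (0-based)."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    masked = ids.clone()
    masked[0, position] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = mlm(masked).logits[0, position]           # (vocab,)
        probs = torch.softmax(logits, dim=-1)
        emb_table = mlm.get_input_embeddings().weight      # (vocab, dim)
        return probs @ emb_table                           # expectation over the vocabulary
```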
Automatic speaker verification (ASV) is widely used for identity authentication in real life. However, with the rapid development of voice conversion and speech synthesis algorithms and the improving quality of recording devices, ASV systems are vulnerable to spoofing attacks. In recent years there have been many works on synthetic and replay speech detection, and researchers have proposed a number of anti-spoofing methods based on hand-crafted features to improve the accuracy and robustness of synthetic and replay speech detection systems. However, using hand-crafted features rather than raw waveforms discards information that is useful for anti-spoofing, which degrades the detection performance of the system. Inspired by the promising performance of ConvNeXt on image classification tasks, we extend the ConvNeXt network architecture to the spoofing attack detection task and propose an end-to-end anti-spoofing model. By combining the extended architecture with a channel attention block, the proposed model can focus on the most informative sub-bands of the speech representation to improve anti-spoofing performance. Experiments show that our best single system achieves equal error rates of 1.88% and 2.79% on the ASVspoof 2019 LA and PA evaluation datasets, respectively, demonstrating the anti-spoofing capability of the model.
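The abstract does not specify the channel attention block; the following is a minimal squeeze-and-excitation style sketch of the kind of block that could be attached to a ConvNeXt-style backbone to re-weight sub-bands of a time-frequency input. All names and the reduction ratio are illustrative assumptions, not the paper's design.

```python
# Minimal sketch of a squeeze-and-excitation style channel-attention block
# that could be combined with a ConvNeXt-based anti-spoofing backbone.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # per-channel gate in (0, 1)
        )

    def forward(self, x):                            # x: (B, C, F, T) time-frequency map
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # re-weight channels / sub-bands
```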
Finding the active regions of a neural network tells us which regions the network focuses on when making a decision, and provides a basis for interpretability when the network makes a classification decision. We propose Multiple Dynamic Masks (MDM), a general saliency map search method with interpretability. It is based on the assumption that when an image is fed into a trained neural network, the activation features related to classification affect the classification result, while features unrelated to classification hardly affect it. MDM is a learning-based, end-to-end algorithm for finding the regions that a neural network attends to when classifying. It has the following advantages: 1. Its reasoning process is interpretable. 2. It is general: it can be applied to any neural network and does not depend on the network's internal structure. 3. It has better search performance: because the algorithm learns to generate the masks and can adapt to different data and networks, it performs better than previously proposed methods. For the MDM saliency map search algorithm, we experimentally compared the performance metrics of various saliency map search methods and MDM, using ResNet and DenseNet as the trained neural networks. The search performance of MDM reaches the state of the art. We applied MDM to the interpretable neural networks ProtoPNet and XProtoNet, improving their interpretability and prototype search performance, and we visualized the performance of convolutional and Transformer architectures in saliency map search.
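The MDM algorithm itself is not described here in enough detail to reproduce; as a rough illustration of the underlying assumption, the sketch below learns a single saliency mask for a trained classifier by gradient descent, keeping the class score high while penalizing the mask area. The helper name and hyperparameters are assumptions, not the paper's multiple-dynamic-mask procedure.

```python
# Minimal sketch of perturbation-based saliency: learn a mask that keeps the
# class score high while staying sparse. Not the actual MDM algorithm.
import torch
import torch.nn.functional as F

def learn_mask(model, image, target, steps=300, lam=0.05, lr=0.1):
    """image: (1, C, H, W); target: class index; returns a mask in [0, 1]."""
    model.eval()
    m = torch.zeros(1, 1, *image.shape[-2:], requires_grad=True)
    opt = torch.optim.Adam([m], lr=lr)
    for _ in range(steps):
        mask = torch.sigmoid(m)
        masked = image * mask                       # keep only the masked-in evidence
        score = F.log_softmax(model(masked), dim=1)[0, target]
        loss = -score + lam * mask.mean()           # classification term + area penalty
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(m).detach()
```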
Accuracy and diversity are two essential properties for generating natural and semantically correct captions. Because of the trade-off between them, many efforts have focused on enhancing only one of the two. In this work, we show that the lower accuracy reference derived from human annotations (leave one out) is not applicable to machine-generated captions. To improve diversity while keeping accuracy stable, we exploit a novel Variational Transformer framework. By introducing an "Invisible Information Prior" and an "Auto-selectable GMM", we guide the encoder to learn precise linguistic information and object relations in different scenes to ensure accuracy. By introducing a "Range-Median Reward" baseline, we retain more diverse candidates with higher rewards during the RL-based training that safeguards diversity. Experiments show that our method simultaneously improves accuracy (CIDEr) and diversity (self-CIDEr) by up to 1.1% and 4.8%. Moreover, compared with human annotations, our method achieves the most similar semantic retrieval performance, with an R@1 (I2T) of 50.3 (human: 50.6).
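The exact definition of the "Range-Median Reward" baseline is not given in the abstract; the sketch below shows one plausible reading, a median-over-samples baseline for self-critical policy-gradient training, and should be treated as an assumption rather than the paper's formulation.

```python
# Minimal sketch of a median-based reward baseline for self-critical sequence
# training. The paper's "Range-Median Reward" may differ; this is an assumed,
# simplified variant for illustration.
import torch

def policy_gradient_loss(log_probs, rewards):
    """log_probs: (N,) summed log-probabilities of N sampled captions for one image.
    rewards:   (N,) CIDEr (or other) rewards of those captions."""
    baseline = rewards.median()            # median of sampled rewards, not the usual mean/greedy baseline
    advantage = rewards - baseline         # candidates above the median are reinforced
    return -(advantage.detach() * log_probs).mean()
```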
In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT). Different from previous Chinese PTMs, CPT is designed to exploit the knowledge shared between natural language understanding (NLU) and natural language generation (NLG) to boost performance. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. The two specific decoders with the shared encoder are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively. With the partially shared architecture and multi-task pre-training, CPT can (1) learn specific knowledge for NLU or NLG tasks with the two decoders and (2) be fine-tuned flexibly so that the potential of the model is fully exploited. Moreover, the unbalanced Transformer saves computational and storage costs, which makes CPT competitive and greatly accelerates inference for text generation. Experimental results on a variety of Chinese NLU and NLG tasks show the effectiveness of CPT.
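As a rough illustration of the unbalanced encoder/two-decoder layout described above, the sketch below wires a shared Transformer encoder to a shallow bidirectional "understanding" branch with an MLM head and a shallow autoregressive "generation" branch with an LM head. Layer counts, heads, and class names are placeholders, not CPT's actual configuration or implementation.

```python
# Minimal sketch of an unbalanced encoder / two-decoder layout in the spirit
# of CPT. Sizes and layer counts are placeholders, not the paper's settings.
import torch
import torch.nn as nn

class CPTSketch(nn.Module):
    def __init__(self, vocab_size, d_model=768, nhead=12, enc_layers=10, dec_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), enc_layers)
        # shallow "understanding" branch: bidirectional layers + MLM head
        self.u_decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), dec_layers)
        self.mlm_head = nn.Linear(d_model, vocab_size)
        # shallow "generation" branch: autoregressive decoder + LM head
        self.g_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), dec_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids=None):
        memory = self.encoder(self.embed(src_ids))
        nlu_logits = self.mlm_head(self.u_decoder(memory))       # MLM-style predictions
        nlg_logits = None
        if tgt_ids is not None:                                   # teacher-forced generation (DAE)
            L = tgt_ids.size(1)
            causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
            nlg_logits = self.lm_head(
                self.g_decoder(self.embed(tgt_ids), memory, tgt_mask=causal))
        return nlu_logits, nlg_logits
```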
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
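GarDA's specific appearance-replay and adaptation losses are not given in the abstract; the sketch below only illustrates the generic generative-replay loop it builds on: adapt on unlabeled images of the new domain while distilling the previous model's predictions on images drawn from a frozen generator. `generator.sample`, `adaptation_loss`, and `distill_loss` are hypothetical placeholders supplied by the user, not functions from the paper.

```python
# Minimal sketch of a generative-replay adaptation step for continual UDA.
# Losses and the generator interface are placeholders (assumptions).
import copy
import torch

def adapt_to_new_domain(segmenter, generator, new_loader,
                        adaptation_loss, distill_loss, epochs=1, lr=1e-4):
    """segmenter: current segmentation network; generator: frozen generator of
    previous-domain appearance; new_loader: unlabeled images of the new domain."""
    teacher = copy.deepcopy(segmenter).eval()                # frozen pre-adaptation model
    opt = torch.optim.Adam(segmenter.parameters(), lr=lr)
    for _ in range(epochs):
        for new_images in new_loader:
            replayed = generator.sample(new_images.size(0))  # synthetic "old domain" images
            # unsupervised adaptation on the new domain (e.g. entropy/consistency term)
            loss = adaptation_loss(segmenter(new_images), new_images)
            # preserve old-domain behaviour via distillation on replayed images
            with torch.no_grad():
                old_preds = teacher(replayed)
            loss = loss + distill_loss(segmenter(replayed), old_preds)
            opt.zero_grad(); loss.backward(); opt.step()
    return segmenter
```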
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, hindering graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built from the largest original data collection in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
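As an illustration of the kind of multi-relational, graph-based detector that MGTAB is designed to benchmark (not the repository's reference code), the sketch below uses a two-layer relational GCN over a graph with 7 edge types; all names and sizes are assumptions.

```python
# Minimal sketch of a multi-relational GNN account classifier that could be
# evaluated on a 7-relation graph such as MGTAB. Requires torch_geometric.
import torch
import torch.nn.functional as F
from torch_geometric.nn import RGCNConv

class RelationalDetector(torch.nn.Module):
    def __init__(self, in_dim, hidden=64, num_classes=2, num_relations=7):
        super().__init__()
        self.conv1 = RGCNConv(in_dim, hidden, num_relations)
        self.conv2 = RGCNConv(hidden, num_classes, num_relations)

    def forward(self, x, edge_index, edge_type):
        # edge_type holds one of the 7 relation ids per edge
        h = F.relu(self.conv1(x, edge_index, edge_type))
        return self.conv2(h, edge_index, edge_type)
```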
As one of the prevalent methods for achieving automated systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, environment returns, as the coefficients to build an importance map. We also conducted experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are connected. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
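The sketch below illustrates the RISE-style aggregation that R2RISE builds on: sample random binary masks over demonstration frames, retrain the black-box IL model on each masked demonstration set, and accumulate the masks weighted by the resulting environment return. `train_il` and `evaluate_return` are placeholders for the user's own imitation-learning pipeline, not functions from the paper.

```python
# Minimal sketch of a RISE-style frame-importance map for imitation learning.
# `train_il(mask)` and `evaluate_return(policy)` are user-supplied placeholders.
import numpy as np

def frame_importance(demonstration_len, n_samples, keep_prob,
                     train_il, evaluate_return, rng=np.random.default_rng(0)):
    importance = np.zeros(demonstration_len)
    total_weight = 0.0
    for _ in range(n_samples):
        mask = rng.random(demonstration_len) < keep_prob   # which frames are kept
        policy = train_il(mask)                            # retrain IL model on masked demos
        ret = evaluate_return(policy)                      # average environment return
        importance += ret * mask
        total_weight += ret
    return importance / max(total_weight, 1e-8)            # per-frame importance map
```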
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
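SSTAM's actual pooling is not specified in the abstract; the sketch below only illustrates the general idea of weighting per-artifact strength maps by a visual-saliency map before aggregating them into a single score. All names and the weighting scheme are assumptions.

```python
# Minimal sketch of saliency-weighted pooling of per-artifact strength maps.
# The six PEA detectors and SSTAM's real aggregation are not reproduced.
import numpy as np

def saliency_weighted_score(artifact_maps, saliency, weights=None):
    """artifact_maps: dict name -> (H, W) per-pixel artifact strength in [0, 1];
    saliency: (H, W) visual-saliency map; weights: optional per-artifact weights."""
    weights = weights or {name: 1.0 for name in artifact_maps}
    s = saliency / (saliency.sum() + 1e-8)                 # normalise to a distribution
    score = sum(weights[n] * float((m * s).sum()) for n, m in artifact_maps.items())
    return score / sum(weights.values())                   # higher = more visible artifacts
```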