Smart Paper Notes

Interpretability and causal discovery of the machine learning models to predict the production of CBM wells after hydraulic fracturing

Chao Min , Guoquan Wen , Liangjie Gou , Xiaogang Li , Zhaozhong Yang

Category: Machine Learning

2022-12-21

Machine learning approaches are widely studied for predicting the production of CBM wells after hydraulic fracturing, but are rarely used in practice because of their low generalization ability and lack of interpretability. This article proposes a novel methodology for discovering latent causality from observed data, with the aim of finding an indirect way to interpret machine learning results. Based on the theory of causal discovery, a causal graph is derived with explicit input, output, treatment, and confounding variables. SHAP is then employed to analyze the influence of the factors on production capability, which indirectly interprets the machine learning models. The proposed method can capture the underlying nonlinear relationship between the factors and the output, which remedies a limitation of traditional machine learning routines based on correlation analysis of factors. Experiments on CBM data show that the relationship detected by the presented method between production and the geological and engineering factors is consistent with the actual physical mechanism. Moreover, compared with traditional methods, the interpretable machine learning models perform better in forecasting production capability, with an average accuracy improvement of 20%.

Translated by Google Translate

Distribution-based Emotion Recognition in Conversation

Wen Wu , Chao Zhang , Philip C. Woodland

Category: Natural Language Processing

2022-11-09

Automatic emotion recognition in conversation (ERC) is crucial for emotion-aware conversational artificial intelligence. This paper proposes a distribution-based framework that formulates ERC as a sequence-to-sequence problem for emotion distribution estimation. The inherent ambiguity of emotions and the subjectivity of human perception lead to disagreements in emotion labels, which are handled naturally in our framework from the perspective of uncertainty estimation in emotion distributions. A Bayesian training loss is introduced to improve the uncertainty estimation by conditioning each emotional state on an utterance-specific Dirichlet prior distribution. Experimental results on the IEMOCAP dataset show that ERC outperformed the single-utterance-based system, and that the proposed distribution-based ERC methods achieve not only better classification accuracy but also improved uncertainty estimation.


Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Andrey Ignatov , Radu Timofte , Maurizio Denna , Abdel Younes , Ganzorig Gankhuyag , Jingang Huh , Myeong Kyun Kim , Kihwan Yoon , Hyeon-Cheol Moon , Seungho Lee

Category: Computer Vision

2022-11-07

Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board, which has a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU and reach up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
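
To make the idea of an INT8 model concrete, the following is a minimal sketch of affine (scale and zero-point) quantization of a few pixel values. It is an illustration under stated assumptions, not the scheme used by any challenge entry: the helper names `choose_params`, `quantize_int8`, and `dequantize_int8` are hypothetical, and real deployments would rely on a framework's quantization tooling rather than hand-written code.

```python
def choose_params(values):
    """Pick scale and zero_point so the data range (extended to include 0) spans INT8."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are zero
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize_int8(values, scale, zero_point):
    """Map floats to signed 8-bit codes with an affine scheme."""
    codes = []
    for v in values:
        q = round(v / scale) + zero_point
        codes.append(max(-128, min(127, q)))  # clamp to the INT8 range
    return codes


def dequantize_int8(codes, scale, zero_point):
    """Recover approximate floats from INT8 codes."""
    return [(q - zero_point) * scale for q in codes]


# Hypothetical normalized pixel intensities.
pixels = [0.0, 0.25, 0.5, 1.0]
scale, zero_point = choose_params(pixels)
codes = quantize_int8(pixels, scale, zero_point)
restored = dequantize_int8(codes, scale, zero_point)
```

The round trip loses at most about half a quantization step per value, which is the precision cost that quantized super-resolution models have to absorb.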


Localized Sparse Incomplete Multi-view Clustering

Chengliang Liu , Zhihao Wu , Jie Wen , Chao Huang , Yong Xu

Category: Computer Vision | Artificial Intelligence

2022-08-05

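Incomplete multi-view clustering, which aims to solve the clustering problem on incomplete multi-view data with some views missing, has received increasing attention in recent years. Although many methods have been developed, most of them either cannot flexibly handle incomplete multi-view data with arbitrarily missing views or do not consider the negative effect of information imbalance among views. Moreover, some methods do not fully explore the local structure of all incomplete views. To tackle these problems, this paper proposes a simple but effective method called localized sparse incomplete multi-view clustering (LSIMVC). Unlike existing methods, LSIMVC learns a sparse and structured latent representation from incomplete multi-view data by optimizing a sparsity-regularized, novel graph-embedded multi-view matrix factorization model. Specifically, in this matrix-factorization-based model, an L1-norm-based sparse constraint is introduced to obtain sparse low-dimensional individual representations and a sparse consensus representation. In addition, a novel local graph embedding term is introduced to learn the structured consensus representation. Unlike existing works, our local graph embedding term combines the graph embedding task and the consensus representation learning task into a single concise term. Furthermore, to reduce the imbalance factor in multi-view learning, an adaptive weighted learning scheme is introduced into LSIMVC. Finally, an efficient optimization strategy is given to solve the optimization problem of the proposed model. Comprehensive experiments on six incomplete multi-view databases show that LSIMVC outperforms state-of-the-art IMC methods. The code is available at https://github.com/justsmart/lsimvc.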
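
As a rough illustration of how an L1-norm constraint yields sparse representations, the sketch below applies soft-thresholding, the standard proximal operator of the L1 norm. This is a generic technique shown under assumptions: the abstract does not state which update rule LSIMVC uses, and the names `soft_threshold`, `sparsify`, and the threshold `lam` are hypothetical.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0  # entries with magnitude at most lam become exactly zero


def sparsify(row, lam):
    """Apply soft-thresholding to every entry of a representation vector."""
    return [soft_threshold(v, lam) for v in row]


# Hypothetical low-dimensional representation of one sample.
representation = [0.9, -0.05, 0.02, -0.6]
sparse = sparsify(representation, lam=0.1)
```

Small entries are set exactly to zero while large ones are only shrunk, which is why L1 penalties tend to produce sparse solutions where an L2 penalty would not.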


Diffsound: Discrete Diffusion Model for Text-to-sound Generation

Dongchao Yang , Jianwei Yu , Helin Wang , Wen Wang , Chao Weng , Yuexian Zou , Dong Yu

Category: Artificial Intelligence

2022-07-20

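Generating the sound effects that humans want is an important topic. However, there are few studies on sound generation in this area. In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder. The framework first uses the decoder to transfer the text features extracted by the text encoder into a mel-spectrogram with the help of the VQ-VAE, and then uses the vocoder to transform the generated mel-spectrogram into a waveform. We found that the decoder significantly influences generation performance, so this study focuses on designing a good decoder. We begin with the traditional autoregressive (AR) decoder, which has been shown to be a state-of-the-art method in previous sound generation work. However, the AR decoder always predicts the mel-spectrogram tokens one by one in order, which introduces unidirectional bias and error accumulation. Moreover, with the AR decoder, the sound generation time increases linearly with the sound duration. To overcome these shortcomings of AR decoders, we propose a non-autoregressive decoder based on the discrete diffusion model, named Diffsound. Specifically, Diffsound predicts all of the mel-spectrogram tokens in one step and then refines the predicted tokens in the next step, so the best predicted results can be obtained after several steps. Our experiments show that, compared with the AR decoder, the proposed Diffsound not only produces better text-to-sound generation results (e.g., MOS: 3.56 vs. 2.786) but also generates five times faster.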


3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform

Yining Zhao , Chao Wen , Zhou Xue , Yue Gao

Category: Computer Vision

2022-07-19

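In the estimation of 3D room layout from a single panoramic image, geometric structures can be compactly described by global wireframes. Based on this observation, we present an alternative approach that estimates the walls in 3D space by modeling long-range geometric patterns in a learnable Hough Transform block. We transform the image features from a cubemap tile into the Hough space of a Manhattan world and map the features directly to the geometric output. The convolutional layers not only learn local gradient-like line features but also use global information to successfully predict occluded walls with a simple network structure. Unlike most previous work, predictions are performed individually on each cubemap tile and then assembled to obtain the layout estimate. Experimental results show that we achieve comparable results in terms of prediction accuracy and performance. Code is available at https://github.com/starrah/dmh-net.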


Estimating the Uncertainty in Emotion Class Labels with Utterance-Specific Dirichlet Priors

Wen Wu , Chao Zhang , Xixin Wu , Philip C. Woodland

Category: Natural Language Processing

2022-03-08

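Emotion recognition is a key attribute for artificial intelligence systems that need to interact naturally with humans. However, the task definition is still an open problem because of the inherent ambiguity of emotions. In this paper, a novel Bayesian training loss based on per-utterance Dirichlet prior distributions is proposed for verbal emotion recognition. It models the uncertainty in the one-hot labels that arises when human annotators assign the same utterance to different emotion classes. An additional metric is used to evaluate performance by detecting test utterances with high labelling uncertainty. This removes a major limitation, namely that emotion classification systems only consider utterances whose labels reflect majority agreement among annotators on the emotion class. Furthermore, a frequentist approach is studied that leverages the continuous-valued "soft" labels obtained by averaging the one-hot labels. We propose a two-branch model structure for emotion classification on a per-utterance basis, which achieves state-of-the-art classification results on the widely used IEMOCAP dataset. Based on this, uncertainty estimation experiments were performed. In terms of the area under the precision-recall curve when detecting utterances with high uncertainty, the best performance was achieved with a Kullback-Leibler divergence training loss on the soft labels. The generality of the proposed approach was verified using the MSP-Podcast dataset, which yielded the same pattern of results.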
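
The soft labels and the Kullback-Leibler divergence loss mentioned above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' training code: the function names `soft_label` and `kl_divergence`, the four-class setup, and the example votes and predictions are all hypothetical.

```python
import math


def soft_label(annotations, num_classes):
    """Average the annotators' one-hot labels into a class distribution."""
    counts = [0] * num_classes
    for class_index in annotations:
        counts[class_index] += 1
    return [c / len(annotations) for c in counts]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); terms with zero target probability contribute 0."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


# Hypothetical utterance: two annotators chose class 0, one chose class 2.
votes = [0, 0, 2]
target = soft_label(votes, num_classes=4)

# Hypothetical model output (a probability distribution over the 4 classes).
predicted = [0.6, 0.1, 0.2, 0.1]
loss = kl_divergence(target, predicted)
```

Unlike a majority-vote one-hot target, the soft label keeps the minority annotation, so the loss rewards a model for reproducing the annotators' disagreement rather than hiding it.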


ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Shuohuan Wang , Yu Sun , Yang Xiang , Zhihua Wu , Siyu Ding , Weibao Gong , Shikun Feng , Junyuan Shang , Yanbin Zhao , Chao Pang

Category: Natural Language Processing

2021-12-23

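Pre-trained language models have achieved state-of-the-art results in various natural language processing (NLP) tasks. GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential. A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge-enhanced models, and a model with 10 billion parameters was trained with it. ERNIE 3.0 outperformed state-of-the-art models on various NLP tasks. To explore the effect of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan, with up to 260 billion parameters, on the PaddlePaddle platform. Furthermore, we design a self-supervised adversarial loss and a controllable language modeling loss so that ERNIE 3.0 Titan generates credible and controllable text. To reduce computational overhead and carbon emissions, we propose an online distillation framework for ERNIE 3.0 Titan, in which the teacher model teaches students and trains itself simultaneously. ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that ERNIE 3.0 Titan outperforms state-of-the-art models on 68 NLP datasets.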


CATNet: Context AggregaTion Network for Instance Segmentation in Remote Sensing Images

Ye Liu , Huifang Li , Chao Hu , Shuang Luo , Huanfeng Shen , Chang Wen Chen

Category: Computer Vision

2021-11-22

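Instance segmentation in remote sensing images, which aims to perform per-pixel labeling of objects at the instance level, is of great importance for various civil applications. Despite previous successes, most existing instance segmentation methods are designed for natural images and suffer sharp performance degradation when applied directly to top-view remote sensing images. Through careful analysis, we observe that the challenges mainly come from a lack of discriminative object features caused by severe scale variations, low contrast, and clustered distributions. To address these problems, a novel context aggregation network (CATNet) is proposed to improve the feature extraction process. The proposed model exploits three lightweight plug-and-play modules, namely the dense feature pyramid network (DenseFPN), the spatial context pyramid (SCP), and the hierarchical region of interest extractor (HRoIE), to aggregate global visual context in the feature, spatial, and instance domains, respectively. DenseFPN is a multi-scale feature propagation module that establishes more flexible information flows by adopting inter-level residual connections, cross-level dense connections, and a feature re-weighting strategy. Leveraging the attention mechanism, SCP further augments the features by aggregating global spatial context into local regions. For each instance, HRoIE adaptively generates RoI features for different downstream tasks. We carry out extensive evaluations on the challenging iSAID, DIOR, NWPU VHR-10, and HRSID datasets. The results show that the proposed approach outperforms state-of-the-art methods at similar computational cost. Code is available at https://github.com/yeliudev/catnet.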


WHU-NERCMS at TRECVID2021: Instance Search Task

Yanrui Niu , Jingyao Yang , Ankang Lu , Baojin Huang , Yue Zhang , Ji Huang , Shishi Wen , Dongshu Xu , Chao Liang , Zhongyuan Wang

Category: Computer Vision

2021-10-30

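In this paper, we briefly introduce the experimental methods and results of WHU-NERCMS at TRECVID2021. This year we participated in the automatic and interactive tasks of Instance Search (INS). For the automatic task, the retrieval target is divided into two parts: person retrieval and action retrieval. For person retrieval, we adopt a two-stage method comprising face detection and face recognition. For action retrieval, we use two kinds of action detection methods, consisting of three frame-based human-object interaction detection methods and two video-based general action detection methods. The person retrieval results and action retrieval results are then fused to initialize the result ranking lists. In addition, we attempt to use complementary methods to further improve search performance. For the interactive task, we test two different interaction strategies on the fusion results. We submitted 4 runs each for the automatic and interactive tasks. Each run is described in Table 1. The official evaluations show that the proposed strategies rank first in both the automatic and interactive tracks.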
