Smart Paper Notes

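In this work, we identify open research opportunities in applying neural temporal point process (NTPP) models to industry-scale customer behavior data. We do so by carefully reproducing recently published NTPP models on known literature benchmarks and by applying them to a novel, real-world consumer behavior dataset that is twice as large as the largest publicly available NTPP benchmark. We identify the following challenges. First, NTPP models, despite their generative nature, remain vulnerable to dataset imbalance and cannot predict rare events. Second, NTPP models based on stochastic differential equations, despite their theoretical appeal and leading performance on literature benchmarks, do not scale easily to large industry-scale data. The first finding is consistent with previous observations on deep generative models. In addition, to address the cold-start problem, we explore a novel addition to NTPP models: a parameterization based on static user features.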


Speech MOS multi-task learning and rater bias correction

Haleh Akrami, Hannes Gamper

Category: Artificial Intelligence

2022-12-04

Perceptual speech quality is an important performance metric for teleconferencing applications. The mean opinion score (MOS) is standardized for the perceptual evaluation of speech quality and is obtained by asking listeners to rate the quality of a speech sample. Recently, there has been increasing research interest in developing models for estimating MOS blindly. Here we propose a multi-task framework to include additional labels and data in training to improve the performance of a blind MOS estimation model. Experimental results indicate that the proposed model can be trained to jointly estimate MOS, reverberation time (T60), and clarity (C50) by combining two disjoint data sets in training, one containing only MOS labels and the other containing only T60 and C50 labels. Furthermore, we use a semi-supervised framework to combine two MOS data sets in training, one containing only MOS labels (per ITU-T Recommendation P.808), and the other containing separate scores for speech signal, background noise, and overall quality (per ITU-T Recommendation P.835). Finally, we present preliminary results for addressing individual rater bias in the MOS labels.
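One common way to train a single model on disjoint label sets, as the abstract describes for MOS, T60, and C50, is to mask the loss so that each sample contributes only to the tasks it has labels for. The sketch below illustrates that general idea only; it is not the authors' implementation. The function name `masked_multitask_loss`, the dictionary layout, the squared-error loss, and the equal task weights are all assumptions made for illustration.

```python
def masked_multitask_loss(preds, targets, weights=None):
    """Sum of per-task mean squared errors, ignoring missing labels.

    preds:   list of dicts, task name -> predicted value (hypothetical layout)
    targets: list of dicts, task name -> label; a task is simply absent
             (or None) when the sample comes from a data set without it
    """
    tasks = ("mos", "t60", "c50")
    if weights is None:
        # Assumption: equal task weights; a real system would likely tune these.
        weights = {task: 1.0 for task in tasks}

    total = 0.0
    for task in tasks:
        # Keep only the samples that actually carry a label for this task.
        errors = [
            (p[task] - t[task]) ** 2
            for p, t in zip(preds, targets)
            if t.get(task) is not None
        ]
        if errors:  # a batch may contain no labels for a given task
            total += weights[task] * sum(errors) / len(errors)
    return total


# One sample from a MOS-only set, one from a T60/C50-only set.
preds = [
    {"mos": 3.0, "t60": 0.5, "c50": 10.0},
    {"mos": 4.0, "t60": 0.3, "c50": 15.0},
]
targets = [
    {"mos": 3.5},
    {"t60": 0.5, "c50": 13.0},
]
loss = masked_multitask_loss(preds, targets)
```

In this toy batch the MOS term comes only from the first sample and the T60 and C50 terms only from the second, so neither data set is penalized for labels it does not have.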
