歌词到融合的生成是歌曲创作的重要任务,并且由于其独特的特征也很具有挑战性:产生的旋律不仅应遵循良好的音乐模式,而且还应与节奏和结构等歌词中的功能保持一致。由于几个问题,这些特征无法通过以端到端学习抒情式映射的神经生成模型来很好地处理:(1)缺乏对齐的抒情式摩托律训练数据,以充分学习抒情液特征结盟; (2)发电中缺乏可控性,无法明确保证抒情特征对齐。在本文中,我们提出了ROC,这是一种新的抒情术的范式,该范式通过一代网络式管道解决了上述问题。具体而言,我们的范式有两个阶段:(1)创建阶段,其中大量音乐是由基于神经的旋律语言模型生成的,并通过几个关键功能(例如和弦,音调,节奏和节奏和节奏)在数据库中索引。结构信息,包括合唱或经文); (2)重新创建阶段,根据歌词的关键功能从数据库中检索音乐作品,并根据构图指南和旋律语言模型分数从数据库中检索音乐作品来重新创建旋律。我们的ROC范式具有多个优点:(1)它只需要未配对的旋律数据来训练旋律语言模型,而不是以前模型中配对的抒情数据。 (2)它在抒情循环的生成中实现了良好的抒情式特征对齐。关于英语和中文数据集的实验表明,ROC在客观和主观指标上都优于先前基于神经的抒情性循环模型。
translated by 谷歌翻译
人通常通过按音乐形式组织元素来表达音乐思想来创作音乐。但是,对于基于神经网络的音乐生成,由于缺乏音乐形式的标签数据,很难这样做。在本文中,我们开发了Meloform,该系统是使用专家系统和神经网络以音乐形式生成旋律的系统。具体而言,1)我们设计了一个专家系统,可以通过开发从图案到短语的音乐元素到并根据预授予的音乐形式进行重复和变化的部分来生成旋律; 2)考虑到产生的旋律缺乏音乐丰富性,我们设计了一个基于变压器的改进模型,以改善旋律而不改变其音乐形式。 Meloform享有专家系统和通过神经模型的音乐丰富性学习的精确音乐形式控制的优势。主观和客观的实验评估都表明,MeloForm以97.79%的精度生成具有精确的音乐形式控制的旋律,并且在主观评估评分方面的表现优于基线系统0.75、0.50、0.50、0.86和0.89,其结构,主题,丰富性和整体质量和整体质量无需主观评估,而没有主观评估。任何标记的音乐形式数据。此外,Meloform可以支持各种形式,例如诗歌和合唱形式,隆多形式,变异形式,奏鸣曲形式,等等。
translated by 谷歌翻译
实时音乐伴奏的生成在音乐行业(例如音乐教育和现场表演)中具有广泛的应用。但是,自动实时音乐伴奏的产生仍在研究中,并且经常在逻辑延迟和暴露偏见之间取决于权衡。在本文中,我们提出了Song Driver,这是一种无逻辑延迟或暴露偏见的实时音乐伴奏系统。具体而言,Songdriver将一个伴奏的生成任务分为两个阶段:1)安排阶段,其中变压器模型首先安排了和弦,以实时进行输入旋律,并在下一阶段加速了和弦,而不是播放它们。 2)预测阶段,其中CRF模型基于先前缓存的和弦生成了即将到来的旋律的可播放的多轨伴奏。通过这种两相策略,歌手直接生成即将到来的旋律的伴奏,从而达到了零逻辑延迟。此外,在预测时间步的和弦时,歌手是指第一阶段的缓存和弦,而不是其先前的预测,这避免了暴露偏见问题。由于输入长度通常在实时条件下受到限制,因此另一个潜在的问题是长期顺序信息的丢失。为了弥补这一缺点,我们在当前时间步骤作为全球信息之前从长期音乐作品中提取了四个音乐功能。在实验中,我们在一些开源数据集上训练歌手,以及由中国风格的现代流行音乐得分构建的原始\```````'''aisong数据集。结果表明,歌手在客观和主观指标上均优于现有的SOTA(最先进)模型,同时大大降低了物理潜伏期。
translated by 谷歌翻译
长期以来,流行音乐的一代一直是音乐家和科学家的吸引力。但是,以令人满意的结构自动编写流行音乐仍然是一个具有挑战性的问题。在本文中,我们建议利用和谐学习的学习来获得结构增强的流行音乐。一方面,和谐,和弦的参与者之一代表了多个音符的谐波集,该音符与音乐的空间结构紧密整合在一起。另一方面,另一个和谐,和弦进步的参与者通常伴随音乐的发展,从而促进了音乐的时间结构。此外,当和弦演变成和弦发展时,质地和形式可以由和谐自然地桥接,这有助于两种结构的共同学习。此外,我们提出了和谐感知的等级音乐变压器(帽子),可以从音乐中适应结构,并使音乐令牌在层次上进行层次相互作用,以增强多层音乐元素的结构。实验结果表明,与现有方法相比,HAT对结构有更好的了解,并且还可以提高产生的音乐的质量,尤其是形式和质地。
translated by 谷歌翻译
The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.
translated by 谷歌翻译
Transformers and variational autoencoders (VAE) have been extensively employed for symbolic (e.g., MIDI) domain music generation. While the former boast an impressive capability in modeling long sequences, the latter allow users to willingly exert control over different parts (e.g., bars) of the music to be generated. In this paper, we are interested in bringing the two together to construct a single model that exhibits both strengths. The task is split into two steps. First, we equip Transformer decoders with the ability to accept segment-level, time-varying conditions during sequence generation. Subsequently, we combine the developed and tested in-attention decoder with a Transformer encoder, and train the resulting MuseMorphose model with the VAE objective to achieve style transfer of long pop piano pieces, in which users can specify musical attributes including rhythmic intensity and polyphony (i.e., harmonic fullness) they desire, down to the bar level. Experiments show that MuseMorphose outperforms recurrent neural network (RNN) based baselines on numerous widely-used metrics for style transfer tasks.
translated by 谷歌翻译
即使具有像变形金刚这样的强序模型,使用远程音乐结构产生表现力的钢琴表演仍然具有挑战性。同时,构成结构良好的旋律或铅片(Melody + Chords)的方法,即更简单的音乐形式,获得了更大的成功。在观察上面的情况下,我们设计了一个基于两阶段变压器的框架,该框架首先构成铅片,然后用伴奏和表达触摸来修饰它。这种分解还可以预处理非钢琴数据。我们的客观和主观实验表明,构成和装饰会缩小当前最新状态和真实表演之间的结构性差异,并改善了其他音乐方面,例如丰富性和连贯性。
translated by 谷歌翻译
文本样式传输是自然语言生成中的重要任务,旨在控制生成的文本中的某些属性,例如礼貌,情感,幽默和许多其他特性。它在自然语言处理领域拥有悠久的历史,最近由于深神经模型带来的有希望的性能而重大关注。在本文中,我们对神经文本转移的研究进行了系统调查,自2017年首次神经文本转移工作以来跨越100多个代表文章。我们讨论了任务制定,现有数据集和子任务,评估,以及丰富的方法在存在并行和非平行数据存在下。我们还提供关于这项任务未来发展的各种重要主题的讨论。我们的策据纸张列表在https://github.com/zhijing-jin/text_style_transfer_survey
translated by 谷歌翻译
Controllable Text Generation (CTG) is emerging area in the field of natural language generation (NLG). It is regarded as crucial for the development of advanced text generation technologies that are more natural and better meet the specific constraints in practical applications. In recent years, methods using large-scale pre-trained language models (PLMs), in particular the widely used transformer-based PLMs, have become a new paradigm of NLG, allowing generation of more diverse and fluent text. However, due to the lower level of interpretability of deep neural networks, the controllability of these methods need to be guaranteed. To this end, controllable text generation using transformer-based PLMs has become a rapidly growing yet challenging new research hotspot. A diverse range of approaches have emerged in the recent 3-4 years, targeting different CTG tasks which may require different types of controlled constraints. In this paper, we present a systematic critical review on the common tasks, main approaches and evaluation methods in this area. Finally, we discuss the challenges that the field is facing, and put forward various promising future directions. To the best of our knowledge, this is the first survey paper to summarize CTG techniques from the perspective of PLMs. We hope it can help researchers in related fields to quickly track the academic frontier, providing them with a landscape of the area and a roadmap for future research.
translated by 谷歌翻译
The goal of building dialogue agents that can converse with humans naturally has been a long-standing dream of researchers since the early days of artificial intelligence. The well-known Turing Test proposed to judge the ultimate validity of an artificial intelligence agent on the indistinguishability of its dialogues from humans'. It should come as no surprise that human-level dialogue systems are very challenging to build. But, while early effort on rule-based systems found limited success, the emergence of deep learning enabled great advance on this topic. In this thesis, we focus on methods that address the numerous issues that have been imposing the gap between artificial conversational agents and human-level interlocutors. These methods were proposed and experimented with in ways that were inspired by general state-of-the-art AI methodologies. But they also targeted the characteristics that dialogue systems possess.
translated by 谷歌翻译
在本文中,我们介绍了联合主义者,这是一种能够感知的多仪器框架,能够转录,识别和识别和将多种乐器与音频剪辑分开。联合主义者由调节其他模块的仪器识别模块组成:输出仪器特异性钢琴卷的转录模块以及利用仪器信息和转录结果的源分离模块。仪器条件设计用于明确的多仪器功能,而转录和源分离模块之间的连接是为了更好地转录性能。我们具有挑战性的问题表述使该模型在现实世界中非常有用,因为现代流行音乐通常由多种乐器组成。但是,它的新颖性需要关于如何评估这种模型的新观点。在实验过程中,我们从各个方面评估了模型,为多仪器转录提供了新的评估观点。我们还认为,转录模型可以用作其他音乐分析任务的预处理模块。在几个下游任务的实验中,我们的转录模型提供的符号表示有助于解决降低检测,和弦识别和关键估计的频谱图。
translated by 谷歌翻译
Although lyrics generation has achieved significant progress in recent years, it has limited practical applications because the generated lyrics cannot be performed without composing compatible melodies. In this work, we bridge this practical gap by proposing a song rewriting system which rewrites the lyrics of an existing song such that the generated lyrics are compatible with the rhythm of the existing melody and thus singable. In particular, we propose SongRewriter, a controllable Chinese lyric generation and editing system which assists users without prior knowledge of melody composition. The system is trained by a randomized multi-level masking strategy which produces a unified model for generating entirely new lyrics or editing a few fragments. To improve the controllabiliy of the generation process, we further incorporate a keyword prompt to control the lexical choices of the content and propose novel decoding constraints and a vowel modeling task to enable flexible end and internal rhyme schemes. While prior rhyming metrics are mainly for rap lyrics, we propose three novel rhyming evaluation metrics for song lyrics. Both automatic and human evaluations show that the proposed model performs better than the state-of-the-art models in both contents and rhyming quality. Our code and models implemented in MindSpore Lite tool will be available.
translated by 谷歌翻译
我们考虑了自动生成音乐文本描述的新颖任务。与其他完善的文本生成任务(例如图像标题)相比,富裕的音乐和文本数据集的稀缺性使其成为更具挑战性的任务。在本文中,我们利用众包音乐评论来构建一个新的数据集,并提出一个序列到序列模型以生成音乐的文本描述。更具体地说,我们将扩张的卷积层用作编码器的基本组成部分,基于内存的复发性神经网络作为解码器。为了增强生成文本的真实性和主题,我们进一步建议用歧视者和新的主题评估者微调模型。为了衡量生成的文本的质量,我们还提出了两个新的评估指标,它们比人类评估比传统指标(例如BLEU)更加一致。实验结果验证了我们的模型能够在包含原始音乐的主题和内容信息的同时产生流利而有意义的评论。
translated by 谷歌翻译
The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 2nd International Workshop on Reading Music Systems, held in Delft on the 2nd of November 2019.
translated by 谷歌翻译
现有的使用变压器模型生成多功能音乐的方法仅限于一小部分乐器或简短的音乐片段。这部分是由于MultiTrack Music的现有表示形式所需的冗长输入序列的内存要求。在这项工作中,我们提出了一个紧凑的表示,该表示可以允许多种仪器,同时保持短序列长度。使用我们提出的表示形式,我们介绍了MultiTrack Music Transformer(MTMT),用于学习多领音乐中的长期依赖性。在主观的听力测试中,我们提出的模型针对两个基线模型实现了无条件生成的竞争质量。我们还表明,我们提出的模型可以生成样品,这些样品的长度是基线模型产生的样品,此外,可以在推理时间的一半中进行样本。此外,我们提出了一项新的措施,以分析音乐自我展示,并表明训练有素的模型学会更少注意与当前音符形成不和谐间隔的注释,但更多地却更多地掌握了与当前相距4N节奏的音符。最后,我们的发现为未来的工作提供了一个新颖的基础,探索了更长形式的多音阶音乐生成并改善音乐的自我吸引力。所有源代码和音频样本均可在https://salu133445.github.io/mtmt/上找到。
translated by 谷歌翻译
了解因果关系有助于构建干预措施,以实现特定的目标并在干预下实现预测。随着学习因果关系的越来越重要,因果发现任务已经从使用传统方法推断出潜在的因果结构从观察数据到深度学习涉及的模式识别领域。大量数据的快速积累促进了具有出色可扩展性的因果搜索方法的出现。因果发现方法的现有摘要主要集中在基于约束,分数和FCM的传统方法上,缺乏针对基于深度学习的方法的完美分类和阐述,还缺乏一些考虑和探索因果关系的角度来探索因果发现方法范式。因此,我们根据变量范式将可能的因果发现任务分为三种类型,并分别给出三个任务的定义,定义和实例化每个任务的相关数据集以及同时构建的最终因果模型,然后审查不同任务的主要因果发现方法。最后,我们从不同角度提出了一些路线图,以解决因果发现领域的当前研究差距,并指出未来的研究方向。
translated by 谷歌翻译
Achieving multiple genres and long-term choreography sequences from given music is a challenging task, due to the lack of a multi-genre dataset. To tackle this problem,we propose a Multi Art Genre Intelligent Choreography Dataset (MagicDance). The data of MagicDance is captured from professional dancers assisted by motion capture technicians. It has a total of 8 hours 3D motioncapture human dances with paired music, and 16 different dance genres. To the best of our knowledge, MagicDance is the 3D dance dataset with the most genres. In addition, we find that the existing two types of methods (generation-based method and synthesis-based method) can only satisfy one of the diversity and duration, but they can complement to some extent. Based on this observation, we also propose a generation-synthesis choreography network (MagicNet), which cascades a Diffusion-based 3D Diverse Dance fragments Generation Network (3DGNet) and a Genre&Coherent aware Retrieval Module (GCRM). The former can generate various dance fragments from only one music clip. The latter is utilized to select the best dance fragment generated by 3DGNet and switch them into a complete dance according to the genre and coherent matching score. Quantitative and qualitative experiments demonstrate the quality of MagicDance, and the state-of-the-art performance of MagicNet.
translated by 谷歌翻译
尽管不断努力提高代码搜索的有效性和效率,但仍未解决两个问题。首先,编程语言具有固有的牢固结构链接,并且代码的特征是文本表单将省略其中包含的结构信息。其次,代码和查询之间存在潜在的语义关系,跨序列对齐代码和文本是具有挑战性的,因此在相似性匹配期间,向量在空间上保持一致。为了解决这两个问题,在本文中,提出了一个名为CSSAM的代码搜索模型(代码语义和结构注意匹配)。通过引入语义和结构匹配机制,CSSAM有效提取并融合了多维代码功能。具体而言,开发了交叉和残留层,以促进代码和查询的高纬度空间比对。通过利用残差交互,匹配模块旨在保留更多的代码语义和描述性功能,从而增强了代码及其相应查询文本之间的附着力。此外,为了提高模型对代码固有结构的理解,提出了一个名为CSRG的代码表示结构(代码语义表示图),用于共同表示抽象语法树节点和代码的数据流。根据两个包含540K和330K代码段的公开可用数据集的实验结果,CSSAM在两个数据集中分别在获得最高的SR@1/5/10,MRR和NDCG@50方面大大优于基本线。此外,进行消融研究是为了定量衡量CSSAM每个关键组成部分对代码搜索效率和有效性的影响,这为改进高级代码搜索解决方案提供了见解。
translated by 谷歌翻译
Stack Overflow是最受欢迎的编程社区之一,开发人员可以为他们遇到的问题寻求帮助。然而,如果没有经验的开发人员无法清楚地描述他们的问题,那么他们很难吸引足够的关注并获得预期的答案。我们提出了M $ _3 $ NSCT5,这是一种自动从给定代码片段生成多个帖子标题的新颖方法。开发人员可以使用生成的标题查找密切相关的帖子并完成其问题描述。 M $ _3 $ NSCT5使用Codet5骨干,这是一种具有出色语言理解和发电能力的预训练的变压器模型。为了减轻歧义问题,即在不同背景下可以将相同的代码片段与不同的标题保持一致,我们提出了最大的边缘多元核抽样策略,以一次产生多个高质量和不同的标题候选者,以便开发人员选择。我们构建了一个大规模数据集,其中包含890,000个问题帖子,其中涵盖了八种编程语言,以验证M $ _3 $ NSCT5的有效性。 BLEU和胭脂指标的自动评估结果表明,M $ _3 $ NSCT5的优势比六个最先进的基线模型。此外,具有值得信赖结果的人类评估也证明了我们对现实世界应用方法的巨大潜力。
translated by 谷歌翻译
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
translated by 谷歌翻译