Diffusion models, a new generative modelling paradigm, have achieved great success in image, audio, and video generation. However, given the discrete, categorical nature of text, extending continuous diffusion models to natural language is non-trivial, and text diffusion models remain less studied. Sequence-to-sequence generation is one of the essential topics in natural language processing. In this work, we apply diffusion models to sequence-to-sequence text generation and explore whether the superior generation performance of diffusion models transfers to the natural language domain. We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation. SeqDiffuSeq uses an encoder-decoder Transformer architecture to model the denoising function. To improve generation quality, SeqDiffuSeq combines the self-conditioning technique with a newly proposed adaptive noise schedule. The adaptive noise schedule distributes the difficulty of denoising evenly across time steps and assigns exclusive noise schedules to tokens at different positions. Experimental results illustrate good performance on sequence-to-sequence generation in terms of both text quality and inference time.
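For illustration, here is a minimal sketch of the self-conditioning idea in an embedding-space denoising step; the `denoiser` callable and the concatenation layout are assumptions, not the authors' implementation:

```python
import torch

def self_conditioned_step(denoiser, z_t, t, prev_x0=None):
    # Self-conditioning: the denoiser also receives its own previous
    # estimate of the clean embedding; zeros are used on the first pass.
    if prev_x0 is None:
        prev_x0 = torch.zeros_like(z_t)
    x0_hat = denoiser(torch.cat([z_t, prev_x0], dim=-1), t)
    return x0_hat
```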
Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis and negotiation. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce the research works with comparisons and summaries and provide systematic resources to help beginners. We also discuss the potential reasons why such reasoning abilities emerge and highlight future research directions.
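As a concrete instance of one prompting technique covered by this line of work, here is a schematic few-shot chain-of-thought prompt; the wording is a standard illustrative example, not taken from this survey:

```python
# Few-shot chain-of-thought prompt: one worked example teaches the model
# to emit intermediate reasoning steps before the final answer.
prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls "
    "each. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\n"
    "A:"  # the model is expected to continue with step-by-step reasoning
)
```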
Language models built on the Transformer architecture have shown great performance in natural language processing. However, problems such as over-fitting or representation collapse still arise when fine-tuning pre-trained language models (PLMs) on downstream tasks. In this work, we propose HyPe, a simple yet effective fine-tuning technique that alleviates such problems by perturbing the hidden representations of Transformer layers. Unlike previous works that only add noise to inputs or parameters, we argue that the hidden representations of Transformer layers convey more diverse and meaningful language information. Therefore, making Transformer layers more robust to hidden-representation perturbations can further benefit the fine-tuning of PLMs en bloc. We conduct extensive experiments and analyses on GLUE and other natural language inference datasets. Results demonstrate that HyPe outperforms vanilla fine-tuning and enhances the generalization of hidden representations from different layers. In addition, HyPe incurs negligible computational overhead, and is better than and compatible with previous state-of-the-art fine-tuning techniques.
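A minimal sketch of the idea, assuming a Gaussian-noise variant applied to layer outputs during training; this is illustrative, not the authors' code:

```python
import torch

def hype_perturb(hidden_states, sigma=1e-5, training=True):
    # HyPe-style perturbation: add small Gaussian noise to the hidden
    # representations between Transformer layers, at fine-tuning time only.
    if not training:
        return hidden_states
    return hidden_states + torch.randn_like(hidden_states) * sigma
```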
Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance while still following a parametric learning paradigm; however, forgetting and rote memorization during learning can cause unstable generalization. In particular, vanilla prompt learning may struggle to exploit atypical instances by rote memorization during fully supervised training, or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt, motivated by decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast to vanilla prompt learning, RetroPrompt constructs an open-book knowledge store from the training instances and implements a retrieval mechanism during input, training, and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt obtains better performance in both few-shot and zero-shot settings. Moreover, we further illustrate that RetroPrompt yields better generalization on new datasets. A detailed analysis of memorization indeed shows that RetroPrompt reduces the language model's reliance on memorization, thus improving generalization on downstream tasks.
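A minimal sketch of the retrieval step under these assumptions: training instances are encoded into a dense key matrix, and nearest neighbors by cosine similarity are returned as cues for prompt augmentation (all names here are illustrative):

```python
import numpy as np

class KnowledgeStore:
    """Open-book store over training instances; retrieved neighbors serve
    as cues to augment the prompt at input, training, and inference time."""
    def __init__(self, keys, texts):
        # keys: (n, d) encoder embeddings of training instances
        self.keys = keys / np.linalg.norm(keys, axis=1, keepdims=True)
        self.texts = texts

    def retrieve(self, query, k=4):
        q = query / np.linalg.norm(query)
        top = np.argsort(-(self.keys @ q))[:k]  # top-k by cosine similarity
        return [self.texts[i] for i in top]
```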
We present a new open-source and extensible knowledge extraction toolkit called DeepKE (Deep-learning-based Knowledge Extraction), supporting standard fully supervised, low-resource few-shot, and document-level scenarios. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction, and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured text according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementations for different tasks and scenarios, but also organizes all components within a consistent framework to maintain sufficient modularity and extensibility. In addition, we present an online platform at http://deepke.zjukg.cn/ for real-time extraction across various tasks. DeepKE is equipped with Google Colab tutorials and comprehensive documentation for beginners. We release the source code at https://github.com/zjunlp/deepke, together with a demo video.
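To make the task concrete, here is a toy illustration of the kind of structured output an information extraction pipeline produces; this is not DeepKE code (see the repository for the toolkit's actual entry points):

```python
from dataclasses import dataclass

@dataclass
class Triple:
    head: str
    relation: str
    tail: str

# e.g. relation extraction over "Steve Jobs founded Apple in 1976."
extracted = Triple(head="Steve Jobs", relation="founded", tail="Apple")
print(extracted)
```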
Natural language generation from structured data mainly focuses on surface-level descriptions, suffering from uncontrollable content selection and low fidelity. Previous works leverage logical forms to facilitate logical-knowledge-conditioned text generation. Though remarkable progress has been achieved, these methods are data-hungry, which makes them hard to adopt in real-world applications with limited data. To this end, this paper proposes a unified framework for logical-knowledge-conditioned text generation in the few-shot setting. With only a few seed logical forms (e.g., 20/100 shot), our approach leverages self-training and samples pseudo logical forms based on content and structure consistency. Experimental results demonstrate that our approach obtains better few-shot performance than baselines.
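A schematic of such a self-training loop, assuming a hypothetical `model` with `fit` and `generate_logical_form` methods; the two consistency filters below are simplistic placeholders for the paper's actual criteria:

```python
def self_train(model, seed_pairs, unlabeled_tables, rounds=3):
    # Fit on seed logical forms, then keep only sampled pseudo logical
    # forms that pass both consistency filters, and retrain.
    data = list(seed_pairs)
    for _ in range(rounds):
        model.fit(data)
        for table in unlabeled_tables:
            lf = model.generate_logical_form(table)
            if content_consistent(lf, table) and structure_consistent(lf):
                data.append((table, lf))  # accept the pseudo label
    return model

def content_consistent(lf, table):
    # Placeholder: every token in the logical form should occur in the table.
    tokens = lf.replace("(", " ").replace(")", " ").split()
    return all(tok in table for tok in tokens)

def structure_consistent(lf):
    # Placeholder: parentheses in the logical form must be balanced.
    depth = 0
    for ch in lf:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            return False
    return depth == 0
```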
Most NER methods rely on extensive labeled data for model training, and they struggle in low-resource scenarios with limited training data. Compared with resource-rich source domains, existing dominant approaches usually encounter the challenge that the target domain has a different label set, which can be summarized as class transfer and domain transfer. In this paper, we propose a lightweight tuning paradigm for low-resource NER via pluggable prompting (LightNER). Specifically, we construct a unified learnable verbalizer of entity categories to generate the entity span sequence and entity categories without any label-specific classifiers, thereby addressing the class transfer problem. We further propose a pluggable guidance module that incorporates learnable parameters into the self-attention layers as guidance, which can re-modulate the attention and adapt pre-trained weights. Note that we only tune the inserted modules while keeping all parameters of the pre-trained language model fixed, which makes our approach lightweight and flexible for low-resource scenarios and better at transferring knowledge across domains. Experimental results show that LightNER obtains comparable performance in the standard supervised setting and outperforms strong baselines in low-resource settings. The code is available at https://github.com/zjunlp/deepke/tree/main/example/ner/few-shot.
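A minimal sketch of a pluggable guidance module in this spirit: learnable key/value vectors are prepended inside self-attention while the pre-trained projections stay frozen (shapes and names are assumptions, not the LightNER code):

```python
import torch
import torch.nn as nn

class PluggableGuidance(nn.Module):
    # Learnable key/value vectors prepended inside frozen self-attention,
    # re-modulating attention without updating pre-trained weights.
    def __init__(self, num_prompts, head_dim):
        super().__init__()
        self.pk = nn.Parameter(torch.randn(num_prompts, head_dim))
        self.pv = nn.Parameter(torch.randn(num_prompts, head_dim))

    def forward(self, k, v):  # k, v: (batch, seq_len, head_dim)
        b = k.size(0)
        return (torch.cat([self.pk.expand(b, -1, -1), k], dim=1),
                torch.cat([self.pv.expand(b, -1, -1), v], dim=1))
```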
Recent neural-based relation extraction approaches, though achieving promising improvements on benchmark datasets, have been reported to be vulnerable to adversarial attacks. Thus far, efforts have mostly focused on generating adversarial samples or defending against adversarial attacks, while little is understood about the differences between normal and adversarial samples. In this work, we take the first step in leveraging saliency-based methods to analyze such adversarial samples. We observe that salient tokens correlate directly with adversarial perturbations. We further find that adversarial perturbations are either tokens not present in the training set or superficial cues associated with relation labels. To some extent, our approach unveils the characteristics of adversarial samples. We release an open-source testbed, "DiagnoseAdv", at https://github.com/zjunlp/diagnoseadv.
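A sketch of the kind of gradient-based token saliency such an analysis relies on, assuming a HuggingFace-style sequence classifier that accepts `inputs_embeds` (illustrative, not the paper's tooling):

```python
import torch

def token_saliency(model, input_embeds, label):
    # Saliency as the norm of the gradient of the label's logit with
    # respect to each input embedding: one importance score per token.
    input_embeds = input_embeds.clone().requires_grad_(True)
    logits = model(inputs_embeds=input_embeds).logits  # (1, num_labels)
    logits[0, label].backward()
    return input_embeds.grad.norm(dim=-1)
```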
Triple extraction is an essential task in information extraction for natural language processing and knowledge graph construction. In this paper, we revisit end-to-end triple extraction as a sequence generation task. Since generative triple extraction may struggle to capture long-term dependencies and can generate unfaithful triples, we introduce a novel model: contrastive triple extraction with a generative transformer. Specifically, we introduce a single shared transformer module for encoder-decoder-based generation. To generate faithful results, we propose a novel triplet contrastive training objective. Moreover, we introduce two mechanisms to further improve model performance (i.e., batch-wise dynamic attention masking and triple-wise calibration). Experimental results on three datasets (i.e., NYT, WebNLG, and MIE) show that our approach achieves better performance than baselines.
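An illustrative margin-based form of a triplet contrastive objective; the paper's actual objective may differ, and the score tensors are assumed to be produced by the model elsewhere:

```python
import torch
import torch.nn.functional as F

def triplet_contrastive_loss(pos_score, neg_scores, margin=1.0):
    # pos_score: (batch,) score of the gold triple; neg_scores: (batch, n)
    # scores of corrupted triples. The gold triple should win by `margin`.
    return F.relu(margin - pos_score.unsqueeze(-1) + neg_scores).mean()
```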
As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains. In some domains, such as bioinformatics and robotics, it is very difficult to construct large-scale, well-annotated datasets due to the expense of data acquisition and costly annotation, which limits its development. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates the use of transfer learning to solve the problem of insufficient training data. This survey focuses on reviewing current research on transfer learning using deep neural networks and its applications. We define deep transfer learning, categorize it, and review recent research works based on the techniques used in deep transfer learning.
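A minimal example of one surveyed category, network-based deep transfer learning: reuse a pre-trained backbone, freeze the transferred layers, and train only a new task head (the backbone choice and class count are placeholders):

```python
import torch.nn as nn
from torchvision import models

# Transfer an ImageNet-pretrained backbone to a new 10-class task.
backbone = models.resnet18(weights="IMAGENET1K_V1")
for p in backbone.parameters():
    p.requires_grad = False          # freeze the transferred layers
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # trainable new head
```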