从自然语言嵌入中汲取灵感,我们提出了Astromer,这是一种基于变压器的模型,以创建光曲线的表示。Astromer接受了数以百万计的Macho R波段样品的培训,并且很容易对其进行微调以匹配与下游任务相关的特定域。例如,本文显示了使用预训练的表示形式对变量恒星进行分类的好处。此外,我们还提供了一个Python库,其中包括这项工作中使用的所有功能。我们的图书馆包括预先培训的模型,可用于增强深度学习模型的性能,减少计算资源,同时获得最新的结果。
translated by 谷歌翻译
Future surveys such as the Legacy Survey of Space and Time (LSST) of the Vera C. Rubin Observatory will observe an order of magnitude more astrophysical transient events than any previous survey before. With this deluge of photometric data, it will be impossible for all such events to be classified by humans alone. Recent efforts have sought to leverage machine learning methods to tackle the challenge of astronomical transient classification, with ever improving success. Transformers are a recently developed deep learning architecture, first proposed for natural language processing, that have shown a great deal of recent success. In this work we develop a new transformer architecture, which uses multi-head self attention at its core, for general multi-variate time-series data. Furthermore, the proposed time-series transformer architecture supports the inclusion of an arbitrary number of additional features, while also offering interpretability. We apply the time-series transformer to the task of photometric classification, minimising the reliance of expert domain knowledge for feature selection, while achieving results comparable to state-of-the-art photometric classification methods. We achieve a logarithmic-loss of 0.507 on imbalanced data in a representative setting using data from the Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC). Moreover, we achieve a micro-averaged receiver operating characteristic area under curve of 0.98 and micro-averaged precision-recall area under curve of 0.87.
translated by 谷歌翻译
基于变压器的大型语言模型在自然语言处理中表现出色。通过考虑这些模型在一个领域中获得的知识的可传递性,以及自然语言与高级编程语言(例如C/C ++)的亲密关系,这项工作研究了如何利用(大)基于变压器语言模型检测软件漏洞以及这些模型在漏洞检测任务方面的良好程度。在这方面,首先提出了一个系统的(凝聚)框架,详细介绍了源代码翻译,模型准备和推理。然后,使用具有多个漏洞的C/C ++源代码的软件漏洞数据集进行经验分析,该数据集对应于库功能调用,指针使用,数组使用情况和算术表达式。我们的经验结果证明了语言模型在脆弱性检测中的良好性能。此外,这些语言模型具有比当代模型更好的性能指标,例如F1得分,即双向长期记忆和双向封闭式复发单元。由于计算资源,平台,库和依赖项的要求,对语言模型进行实验始终是具有挑战性的。因此,本文还分析了流行的平台,以有效地微调这些模型并在选择平台时提出建议。
translated by 谷歌翻译
在这项工作中,我们审查并评估了一个具有公开可用和广泛使用的数据集的深度学习知识追踪(DLKT)模型,以及学习编程的新型学生数据集。评估的DLKT模型已重新实现,用于评估先前报告的结果的可重复性和可复制性。我们测试在与模型的主要架构上独立于模型的比较模型中找到的不同输入和输出层变化,以及在某些研究中隐含地和明确地使用的不同最大尝试计数选项。几个指标用于反映评估知识追踪模型的质量。评估的知识追踪模型包括Vanilla-DKT,两个长短期内存深度知识跟踪(LSTM-DKT)变体,两个动态键值存储器网络(DKVMN)变体,以及自我细致的知识跟踪(SAKT)。我们评估Logistic回归,贝叶斯知识跟踪(BKT)和简单的非学习模型作为基准。我们的结果表明,DLKT模型一般优于非DLKT模型,DLKT模型之间的相对差异是微妙的,并且在数据集之间经常变化。我们的研究结果还表明,通常的纯模型,例如平均预测,比更复杂的知识追踪模型更好地表现出更好的性能,尤其是在准确性方面。此外,我们的公制和封路数据分析显示,用于选择最佳模型的度量标准对模型的性能有明显的影响,并且该度量选择可以影响模型排名。我们还研究了输入和输出层变化的影响,过滤出长期尝试序列,以及随机性和硬件等非模型属性。最后,我们讨论模型性能可重量和相关问题。我们的模型实现,评估代码和数据作为本工作的一部分发布。
translated by 谷歌翻译
Transfer learning, where a model is first pre-trained on a data-rich task before being finetuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
translated by 谷歌翻译
目前,用于训练语言模型的最广泛的神经网络架构是所谓的BERT,导致各种自然语言处理(NLP)任务的改进。通常,BERT模型中的参数的数量越大,这些NLP任务中获得的结果越好。不幸的是,内存消耗和训练持续时间随着这些模型的大小而大大增加。在本文中,我们调查了较小的BERT模型的各种训练技术:我们将不同的方法与Albert,Roberta和相对位置编码等其他BERT变体相结合。此外,我们提出了两个新的微调修改,导致更好的性能:类开始终端标记和修改形式的线性链条条件随机字段。此外,我们介绍了整个词的注意力,从而降低了伯特存储器的使用,并导致性能的小幅增加,与古典的多重关注相比。我们评估了这些技术的五个公共德国命名实体识别(NER)任务,其中两条由这篇文章引入了两项任务。
translated by 谷歌翻译
在本文中,我们试图通过引入深度学习模型的句法归纳偏见来建立两所学校之间的联系。我们提出了两个归纳偏见的家族,一个家庭用于选区结构,另一个用于依赖性结构。选区归纳偏见鼓励深度学习模型使用不同的单位(或神经元)分别处理长期和短期信息。这种分离为深度学习模型提供了一种方法,可以从顺序输入中构建潜在的层次表示形式,即更高级别的表示由高级表示形式组成,并且可以分解为一系列低级表示。例如,在不了解地面实际结构的情况下,我们提出的模型学会通过根据其句法结构组成变量和运算符的表示来处理逻辑表达。另一方面,依赖归纳偏置鼓励模型在输入序列中找到实体之间的潜在关系。对于自然语言,潜在关系通常被建模为一个定向依赖图,其中一个单词恰好具有一个父节点和零或几个孩子的节点。将此约束应用于类似变压器的模型之后,我们发现该模型能够诱导接近人类专家注释的有向图,并且在不同任务上也优于标准变压器模型。我们认为,这些实验结果为深度学习模型的未来发展展示了一个有趣的选择。
translated by 谷歌翻译
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a;Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
translated by 谷歌翻译
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train. 1
translated by 谷歌翻译
预训练在机器学习的不同领域表现出成功,例如计算机视觉,自然语言处理(NLP)和医学成像。但是,尚未完全探索用于临床数据分析。记录了大量的临床记录,但是对于在小型医院收集的数据或处理罕见疾病的数据仍可能稀缺数据和标签。在这种情况下,对较大的未标记临床数据进行预训练可以提高性能。在本文中,我们提出了专为异质的多模式临床数据设计的新型无监督的预训练技术,用于通过蒙版语言建模(MLM)启发的患者预测,通过利用对人群图的深度学习来启发。为此,我们进一步提出了一个基于图形转换器的网络,该网络旨在处理异质临床数据。通过将基于掩盖的预训练与基于变压器的网络相结合,我们将基于掩盖的其他域中训练的成功转化为异质临床数据。我们使用三个医学数据集Tadpole,Mimic-III和一个败血症预测数据集,在自我监督和转移学习设置中展示了我们的预训练方法的好处。我们发现,我们提出的培训方法有助于对患者和人群水平的数据进行建模,并提高所有数据集中不同微调任务的性能。
translated by 谷歌翻译
人类活动识别是计算机视觉中的新出现和重要领域,旨在确定个体或个体正在执行的活动。该领域的应用包括从体育中生成重点视频到智能监视和手势识别。大多数活动识别系统依赖于卷积神经网络(CNN)的组合来从数据和复发性神经网络(RNN)中进行特征提取来确定数据的时间依赖性。本文提出并设计了两个用于人类活动识别的变压器神经网络:一个经常性变压器(RET),这是一个专门的神经网络,用于对数据序列进行预测,以及视觉变压器(VIT),一种用于提取显着的变压器的变压器(VIT)图像的特征,以提高活动识别的速度和可扩展性。我们在速度和准确性方面提供了对拟议的变压器神经网络与现代CNN和基于RNN的人类活动识别模型的广泛比较。
translated by 谷歌翻译
In recent years, deep learning has infiltrated every field it has touched, reducing the need for specialist knowledge and automating the process of knowledge discovery from data. This review argues that astronomy is no different, and that we are currently in the midst of a deep learning revolution that is transforming the way we do astronomy. We trace the history of astronomical connectionism from the early days of multilayer perceptrons, through the second wave of convolutional and recurrent neural networks, to the current third wave of self-supervised and unsupervised deep learning. We then predict that we will soon enter a fourth wave of astronomical connectionism, in which finetuned versions of an all-encompassing 'foundation' model will replace expertly crafted deep learning models. We argue that such a model can only be brought about through a symbiotic relationship between astronomy and connectionism, whereby astronomy provides high quality multimodal data to train the foundation model, and in turn the foundation model is used to advance astronomical research.
translated by 谷歌翻译
随着越来越多的可用文本数据,能够自动分析,分类和摘要这些数据的算法的开发已成为必需品。在本研究中,我们提出了一种用于关键字识别的新颖算法,即表示给定文档的关键方面的一个或多字短语的提取,称为基于变压器的神经标记器,用于关键字识别(TNT-KID)。通过将变压器架构适用于手头的特定任务并利用域特定语料库上的预先磨损的语言模型,该模型能够通过提供竞争和强大的方式克服监督和无监督的最先进方法的缺陷在各种不同的数据集中的性能,同时仅需要最佳执行系统所需的手动标记的数据。本研究还提供了彻底的错误分析,具有对模型内部运作的有价值的见解和一种消融研究,测量关键字识别工作流程的特定组分对整体性能的影响。
translated by 谷歌翻译
The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.
translated by 谷歌翻译
本教程展示了工作流程,将文本数据纳入精算分类和回归任务。主要重点是采用基于变压器模型的方法。平均长度为400个单词的车祸描述的数据集,英语和德语可用,以及具有简短财产保险索赔的数据集用来证明这些技术。案例研究应对与多语言环境和长输入序列有关的挑战。他们还展示了解释模型输出,评估和改善模型性能的方法,通过将模型调整到应用程序领域或特定预测任务。最后,该教程提供了在没有或仅有少数标记数据的情况下处理分类任务的实用方法。通过使用最少的预处理和微调的现成自然语言处理(NLP)模型的语言理解技能(NLP)模型实现的结果清楚地证明了用于实际应用的转移学习能力。
translated by 谷歌翻译
生态瞬间评估(EMAS)是用于测量移动卫生(MHECHEATH)研究和治疗方案的当前认知状态,影响,行为和环境因素的重要心理数据源。非反应,其中参与者未能响应EMA提示,是一个地方问题。准确预测非响应的能力可用于改善EMA交付和发展顺应性干预。事先工作已经探索了古典机器学习模型,以预测非反应。然而,正如越来越大的EMA数据集可用,有可能利用在其他领域有效的深度学习模型。最近,变压器模型在NLP和其他域中显示了最先进的性能。这项工作是第一个探索用于EMA数据分析的变压器的使用。我们在将变压器应用于EMA数据时解决了三个关键问题:1。输入表示,2.编码时间信息,3.预先培训提高下游预测任务性能的效用。变压器模型实现了0.77的非响应预测AUC,并且明显优于古典ML和基于LSTM的深度学习模型。我们将使我们的一个预测模型在研究界可自由地提供40k EMA样品的核查,以便于开发未来的基于变压器的EMA分析工作。
translated by 谷歌翻译
在培训数据中拟合复杂的模式,例如推理和争议,是语言预训练的关键挑战。根据最近的研究和我们的经验观察,一种可能的原因是训练数据中的一些易于适应的模式,例如经常共同发生的单词组合,主导和伤害预训练,使模型很难适合更复杂的信息。我们争辩说,错误预测可以帮助找到危害语言理解的这种主导模式。当发生错误预测时,应该经常与导致MIS预测的模型拟合的MIS预测字相同的模式。如果我们可以添加正规化以培训模型,当MIS预测发生并更多地对待更微妙的模式时,可以在更多信息上缩小到这种主导模式时,可以在预训练中有效地安装更多信息。在此动机之后,我们提出了一种新的语言预培训方法,错误预测作为伤害警报(MPA)。在MPA中,当在预训练期间发生错误预测时,我们使用其共同发生信息来指导自我关注模块的多个头部。变压器模块中的一些自我关注头经过优化,以将更低的注意重量分配给频繁地在误报中的输入句子中的单词,同时将更高权重分配给另一个单词。通过这样做,变压器模型训练,以依赖于主导的频繁共同发生模式,而在误报中,当发生错误预测时,在剩余更复杂的信息上更加关注更多。我们的实验表明,MPA加快了伯特和电器的预训练,并提高了他们对下游任务的表现。
translated by 谷歌翻译
Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio/speech signal processing, a wide range of features where engineered through decades of research efforts. As it turns out, learning to predict such features (a.k.a pseudo-labels) has proven to be a particularly relevant pretext task, leading to useful self-supervised representations which prove to be effective for downstream tasks. However, methods and common practices for combining such pretext tasks for better performance on the downstream task have not been explored and understood properly. In fact, the process relies almost exclusively on a computationally heavy experimental procedure, which becomes intractable with the increase of the number of pretext tasks. This paper introduces a method to select a group of pretext tasks among a set of candidates. The method we propose estimates calibrated weights for the partial losses corresponding to the considered pretext tasks during the self-supervised training process. The experiments conducted on automatic speech recognition, speaker and emotion recognition validate our approach, as the groups selected and weighted with our method perform better than classic baselines, thus facilitating the selection and combination of relevant pseudo-labels for self-supervised representation learning.
translated by 谷歌翻译
转移学习已通过深度审慎的语言模型广泛用于自然语言处理,例如来自变形金刚和通用句子编码器的双向编码器表示。尽管取得了巨大的成功,但语言模型应用于小型数据集时会过多地适合,并且很容易忘记与分类器进行微调时。为了解决这个忘记将深入的语言模型从一个域转移到另一个领域的问题,现有的努力探索了微调方法,以减少忘记。我们建议DeepeMotex是一种有效的顺序转移学习方法,以检测文本中的情绪。为了避免忘记问题,通过从Twitter收集的大量情绪标记的数据来仪器进行微调步骤。我们使用策划的Twitter数据集和基准数据集进行了一项实验研究。 DeepeMotex模型在测试数据集上实现多级情绪分类的精度超过91%。我们评估了微调DeepeMotex模型在分类Emoint和刺激基准数据集中的情绪时的性能。这些模型在基准数据集中的73%的实例中正确分类了情绪。所提出的DeepeMotex-Bert模型优于BI-LSTM在基准数据集上的BI-LSTM增长23%。我们还研究了微调数据集的大小对模型准确性的影响。我们的评估结果表明,通过大量情绪标记的数据进行微调提高了最终目标任务模型的鲁棒性和有效性。
translated by 谷歌翻译
Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The major motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, thus adding a residual connection can facilitate the information flow of inter-token relationships across layers. On the other hand, there is naturally an evolutionary trend among attention maps at different abstraction levels, so it is beneficial to exploit a dedicated convolution-based module to capture this process. Equipped with the proposed mechanism, the convolution-enhanced evolving attention networks achieve superior performance in various applications, including time-series representation, natural language understanding, machine translation, and image classification. Especially on time-series representation tasks, Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer outperforms state-of-the-art models significantly, achieving an average of 17% improvement compared to the best SOTA. To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps. Our implementation is available at https://github.com/pkuyym/EvolvingAttention
translated by 谷歌翻译