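Resource Description Framework (RDF) and Property Graph (PG) are the two most commonly used data models for representing, storing, and querying graph data. We present Expressive Reasoning Graph Store (ERGS), a graph store built on top of JanusGraph (a property graph store) that also allows storing and querying RDF datasets. First, we describe how RDF data is converted into a property graph representation, and then describe a query translation module that converts SPARQL queries into a series of Gremlin traversals. The resulting converter and translator allow any Apache TinkerPop-compliant graph database to store and query RDF datasets. We demonstrate the effectiveness of the proposed approach using JanusGraph as the underlying property graph store and compare its performance with standard RDF systems.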
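The RDF-to-property-graph conversion mentioned above can be illustrated with a minimal sketch. It assumes one common mapping convention, which is not necessarily the one ERGS uses: a triple whose object is a literal becomes a property on the subject vertex, and a triple whose object is an IRI becomes a labeled edge. The `PropertyGraph` class, the helper names, and the quoted-string test for literals are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Tiny in-memory property graph (hypothetical, for illustration only)."""
    vertices: dict = field(default_factory=dict)  # IRI -> {property: [values]}
    edges: list = field(default_factory=list)     # (source IRI, label, target IRI)

    def add_vertex(self, iri):
        # Create the vertex on first sight and return its property map.
        return self.vertices.setdefault(iri, {})


def is_literal(term):
    # Assumed convention for this sketch: literals are double-quoted strings,
    # everything else is treated as an IRI.
    return term.startswith('"')


def rdf_to_property_graph(triples):
    """Map (subject, predicate, object) triples onto vertices, properties, edges."""
    graph = PropertyGraph()
    for subject, predicate, obj in triples:
        props = graph.add_vertex(subject)
        if is_literal(obj):
            # Literal object: store it as a (multi-valued) vertex property.
            props.setdefault(predicate, []).append(obj.strip('"'))
        else:
            # IRI object: make sure the target vertex exists, then add an edge.
            graph.add_vertex(obj)
            graph.edges.append((subject, predicate, obj))
    return graph


triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
]
graph = rdf_to_property_graph(triples)
```

Under this convention, a SPARQL triple pattern with a literal object would correspond to a property filter and one with an IRI object to an edge step in a Gremlin traversal; the actual translation rules are those defined in the paper.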
translated by Google Translate
Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders -- respected allies and opposed bogeymen -- across U.S. political ideologies.
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community, but have not received as much attention as lower-level tasks like speech and speaker recognition. In particular, there are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers. Recent work has begun to introduce such benchmark datasets for several tasks. In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape. We contribute four tasks: question answering and summarization involve inference over longer speech sequences; named entity localization addresses the speech-specific task of locating the targeted content in the signal; dialog act classification identifies the function of a given speech utterance. We follow the blueprint of the Spoken Language Understanding Evaluation (SLUE) benchmark suite. In order to facilitate the development of SLU models that leverage the success of pre-trained speech representations, we will be publishing for each task (i) annotations for a relatively small fine-tuning set, (ii) annotated development and test sets, and (iii) baseline models for easy reproducibility and comparisons. In this work, we present the details of data collection and annotation and the performance of the baseline models. We also perform sensitivity analysis of pipeline models' performance (speech recognizer + text model) to the speech recognition accuracy, using more than 20 state-of-the-art speech recognition models.
Mixup is a popular data augmentation technique that creates new samples by linear interpolation between two given data samples, to improve both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning, and involves using a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different; however, we found that "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of mixup in knowledge distillation. In this paper, we present a detailed empirical study on various important dimensions of compatibility between mixup and knowledge distillation. We also scrutinize the behavior of networks trained with mixup in the light of knowledge distillation through extensive analysis, visualizations, and comprehensive experiments on image classification. Finally, based on our findings, we suggest improved strategies to guide the student network to enhance its effectiveness. Additionally, the findings of this study provide insightful suggestions to researchers and practitioners who commonly use techniques from KD. Our code is available at https://github.com/hchoi71/MIX-KD.
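The two ingredients discussed above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: `mixup` performs the linear interpolation of two samples and their labels, and `kd_loss` is the commonly used temperature-softened KL divergence between teacher and student outputs, scaled by the squared temperature. The function names, the default temperature, and the example logits are assumptions made for this example.

```python
import math


def mixup(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two samples and their (one-hot) labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a smoother output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Mix two one-hot-labeled samples with an assumed mixing coefficient of 0.75.
mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.75)

# Hypothetical logits: the loss is zero when the student matches the teacher.
teacher = [4.0, 1.0, 0.5]
loss_same = kd_loss(teacher, teacher)
loss_diff = kd_loss([0.5, 1.0, 4.0], teacher)
```

Both operations soften the training signal: mixup replaces one-hot labels with interpolated ones, and a temperature above 1 flattens the teacher distribution, which is one way to read the "smoothness" link the abstract describes.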
Many self-supervised speech models, varying in their pre-training objective, input modality, and pre-training data, have been proposed in the last few years. Despite impressive empirical successes on downstream tasks, we still have a limited understanding of the properties encoded by the models and the differences across models. In this work, we examine the intermediate representations for a variety of recent models. Specifically, we measure acoustic, phonetic, and word-level properties encoded in individual layers, using a lightweight analysis tool based on canonical correlation analysis (CCA). We find that these properties evolve across layers differently depending on the model, and the variations relate to the choice of pre-training objective. We further investigate the utility of our analyses for downstream tasks by comparing the property trends with performance on speech recognition and spoken language understanding tasks. We discover that CCA trends provide reliable guidance to choose layers of interest for downstream tasks and that single-layer performance often matches or improves upon using all layers, suggesting implications for more efficient use of pre-trained models.
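As demonstrated by GPT-3 and T5, transformers grow in capability as their parameter spaces become larger. However, for tasks that require a large amount of knowledge, non-parametric memory allows models to grow dramatically with only a sub-linear increase in computational cost and GPU memory requirements. Recent models such as RAG and REALM have introduced retrieval into conditional generation. These models incorporate neural initial retrieval from a corpus of passages. We build on this line of research and propose Re2G, which combines both neural initial retrieval and reranking into BART-based sequence-to-sequence generation. Our reranking approach also permits merging retrieval results from sources with incomparable scores, enabling an ensemble of BM25 and neural initial retrieval. To train our system end-to-end, we introduce a novel variation of knowledge distillation that trains the initial retrieval, reranker, and generation using only ground truth on the target sequence output. We find large gains on four diverse tasks: zero-shot slot filling, question answering, fact checking, and dialog, with relative gains of 9% to 34% over the previous state of the art on the KILT leaderboard. We make our code available as open source at https://github.com/ibm/kgi-slot-filling/tree/re2g.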
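The idea of merging results from retrievers with incomparable scores can be sketched as follows. This is a minimal illustration under stated assumptions, not the Re2G implementation: the candidates from both retrievers are pooled, their original scores are discarded, and a single reranker scores every candidate on a common scale. The `overlap_score` function is a hypothetical stand-in for a trained reranker, and the passages, IDs, and retriever scores are invented for the example.

```python
def merge_and_rerank(query, bm25_hits, neural_hits, passages, rerank_score, top_k=3):
    """Pool candidates from two retrievers and rank them with one shared scorer."""
    # Union of passage IDs in first-seen order; the retrievers' own scores are
    # dropped because BM25 scores and neural similarities are not comparable.
    pooled = [pid for pid, _ in bm25_hits] + [pid for pid, _ in neural_hits]
    candidate_ids = list(dict.fromkeys(pooled))
    scored = [(pid, rerank_score(query, passages[pid])) for pid in candidate_ids]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def overlap_score(query, passage):
    """Hypothetical stand-in reranker: fraction of query tokens found in the passage."""
    query_tokens = set(query.lower().split())
    passage_tokens = set(passage.lower().split())
    return len(query_tokens & passage_tokens) / len(query_tokens)


passages = {
    "p1": "Paris is the capital of France",
    "p2": "Berlin is the capital of Germany",
    "p3": "France borders Spain and Italy",
}
bm25_hits = [("p1", 12.3), ("p2", 9.8)]      # unbounded lexical scores
neural_hits = [("p3", 0.71), ("p1", 0.64)]   # similarity scores on another scale

ranked = merge_and_rerank("capital of France", bm25_hits, neural_hits,
                          passages, overlap_score)
ranked_ids = [pid for pid, _ in ranked]
```

Because only the reranker's score decides the final order, adding a further retriever to the ensemble requires no score calibration between sources.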
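Analog computing is attractive compared to digital computing because it can achieve higher computational density and higher energy efficiency. However, unlike digital circuits, conventional analog computing circuits cannot be easily mapped across different process nodes because of differences in transistor biasing regimes, temperature variations, and limited dynamic range. In this work, we generalize the previously reported margin-propagation-based analog computing framework to design novel shape-based analog computing (S-AC) circuits that can be easily cross-mapped across different process nodes. Like digital designs, S-AC designs can also be scaled for precision, speed, and power. As a proof of concept, we show several examples of S-AC circuits implementing mathematical functions that are commonly used in machine learning (ML) architectures. Using circuit simulations, we demonstrate that the circuit input/output characteristics remain robust when mapped from a planar CMOS 180nm process to a FinFET 7nm process. Also, using benchmark datasets, we demonstrate that the classification accuracy of an S-AC-based neural network remains robust when mapped across the two processes and under variations in temperature.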
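Question answering (QA) over knowledge bases (KBs) is challenging because of the diverse, essentially unbounded, types of reasoning patterns needed. However, we hypothesize that in a large KB, the reasoning patterns required to answer a query type recur for various entities in their respective subgraph neighborhoods. Leveraging this structural similarity between local neighborhoods of different subgraphs, we introduce a semiparametric model (CBR-SUBG) with (i) a nonparametric component that, for each query, dynamically retrieves other similar $k$-nearest neighbor (KNN) training queries along with query-specific subgraphs and (ii) a parametric component that is trained to identify the (latent) reasoning patterns from the subgraphs of the KNN queries and then apply them to the subgraph of the target query. We also propose an adaptive subgraph collection strategy to select a query-specific compact subgraph, allowing us to scale to the full Freebase KB containing billions of facts. We show that CBR-SUBG can answer queries requiring subgraph reasoning patterns and performs competitively with the best models on several KBQA benchmarks. Our subgraph collection strategy also produces more compact subgraphs (e.g., a 55% reduction in size for WebQSP while increasing answer recall by 4.85%). Code, models, and subgraphs are available at https://github.com/rajarshd/cbr-subg.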
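Bias-scalable analog computing is attractive for implementing machine learning (ML) processors with distinct power-performance specifications. For instance, ML implementations for server workloads focus on computational throughput and faster training, whereas ML implementations for edge devices focus on energy-efficient inference. In this paper, we demonstrate the implementation of bias-scalable analog computing circuits using a generalization of the margin-propagation (MP) principle called shape-based analog computing (S-AC). The resulting S-AC core integrates several near-memory compute elements, which include: (a) non-linear activation functions; (b) inner-product compute circuits; and (c) a mixed-signal compressive memory. Using measured results from prototypes fabricated in a 180nm CMOS process, we demonstrate that the performance of the computing modules remains robust to transistor biasing and variations in temperature. We also demonstrate bias scalability on a simple ML regression task.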
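Deep neural networks are parametrized by thousands or millions of parameters and have shown tremendous success in many classification problems. However, the large number of parameters makes it difficult to integrate these models into edge devices such as smartphones and wearable devices. To address this problem, knowledge distillation (KD) has been widely employed; it uses a pre-trained high-capacity network to train a much smaller network suitable for edge devices. In this paper, for the first time, we study the applicability and challenges of using KD for time-series data from wearable devices. Successful application of KD requires specific choices of data augmentation methods during training. However, it is not yet known whether a coherent strategy exists for choosing an augmentation approach during KD. In this paper, we report the results of a detailed study that compares and contrasts various common choices and some hybrid data augmentation strategies in KD-based human activity analysis. Research in this area is often limited because few comprehensive wearable-device databases are available in the public domain. Our study considers databases ranging from small-scale publicly available ones to one derived from a large-scale interventional study of human activity and sedentary behavior. We find that the choice of data augmentation technique during KD has a variable level of impact on end performance, and that the optimal network choice and data augmentation strategy are specific to the dataset at hand. However, we also conclude with a general set of recommendations that can provide strong baseline performance across databases.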