translated by 谷歌翻译
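The recent explosion of question answering (QA) datasets and models has increased interest in generalizing models across multiple domains and formats, either by training models on multiple datasets or by combining multiple models. We argue that, despite the promising results of multi-dataset models, some domains or QA formats may require specific architectures, so the adaptability of these models may be limited. In addition, current approaches for combining models disregard cues such as question-answer compatibility. In this work, we propose to combine expert agents using a novel, flexible, and training-efficient architecture that considers questions, answer predictions, and answer-prediction confidence scores to select the best answer from a list of answer candidates. Through quantitative and qualitative experiments, we show that our model i) creates a collaboration between agents that outperforms previous multi-agent and multi-dataset approaches in both in-domain and out-of-domain scenarios, ii) is extremely data-efficient to train, and iii) can be adapted to any QA format.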
Recent advances in deep learning have made it possible to address the curse of dimensionality (COD) by solving problems in higher dimensions. One family of these approaches targets high-dimensional PDEs, which opens the door to a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. To tackle these shortcomings, we demonstrate that Tensor Neural Networks (TNN) can provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN). In addition, we show that TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
Communication and computation are often viewed as separate tasks. This approach is very effective from the perspective of engineering as isolated optimizations can be performed. On the other hand, there are many cases where the main interest is a function of the local information at the devices instead of the local information itself. For such scenarios, information theoretical results show that harnessing the interference in a multiple-access channel for computation, i.e., over-the-air computation (OAC), can provide a significantly higher achievable computation rate than the one with the separation of communication and computation tasks. Besides, the gap between OAC and separation in terms of computation rate increases with more participating nodes. Given this motivation, in this study, we provide a comprehensive survey on practical OAC methods. After outlining fundamentals related to OAC, we discuss the available OAC schemes with their pros and cons. We then provide an overview of the enabling mechanisms and relevant metrics to achieve reliable computation in the wireless channel. Finally, we summarize the potential applications of OAC and point out some future directions.
An activation function has a significant impact on the efficiency and robustness of neural networks. As an alternative, we evolved a cutting-edge non-monotonic activation function, the Negative Stimulated Hybrid Activation Function (Nish). It acts as a Rectified Linear Unit (ReLU) function in the positive region and as a sinus-sigmoidal function in the negative region. In other words, it incorporates a sigmoid and a sine function, gaining new dynamics over the classical ReLU. We analyzed the consistency of Nish for different combinations of essential networks and the most common activation functions on several of the most popular benchmarks. The experimental results show that the accuracy rates achieved by Nish are slightly better than those of Mish in classification.
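The piecewise behavior described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the negative branch below (the product of a sine and a sigmoid) is only an assumption chosen to combine the two named functions; the name `nish_like` is hypothetical and this is not presented as the authors' definition.

```python
import math


def nish_like(x: float) -> float:
    """Hypothetical Nish-style activation (illustration only).

    Positive region: identity, as in ReLU (stated in the abstract).
    Negative region: sin(x) * sigmoid(x). This product form is an
    ASSUMPTION; the abstract only says the branch is "sinus-sigmoidal".
    """
    if x >= 0:
        return x
    # For x < 0, exp(x) <= 1, so this form of the sigmoid cannot overflow.
    sig = math.exp(x) / (1.0 + math.exp(x))
    return math.sin(x) * sig


if __name__ == "__main__":
    for v in (-5.0, -3.0, -1.0, 0.0, 2.0):
        print(v, nish_like(v))
```

Under this assumed form the function is continuous at zero and non-monotonic on the negative side, since the sine term changes sign while the sigmoid damps it toward zero.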
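In this study, a digital over-the-air computation (OAC) scheme for achieving continuous-valued gradient aggregation is proposed. It is shown that the average of a set of real-valued parameters can be calculated by using the average of the corresponding numerals, where the numerals are obtained based on a balanced number system. By using this property, the proposed scheme encodes the local gradients into a set of numerals. It then determines the positions of the activated orthogonal frequency division multiplexing (OFDM) subcarriers by using the values of the numerals. To eliminate the need for precise sample-level time synchronization, channel estimation overhead, and power instabilities due to channel inversion, the proposed scheme also uses a non-coherent receiver at the edge server (ES) and does not utilize pre-equalization at the edge devices (EDs). Finally, the theoretical mean squared error (MSE) performance of the proposed scheme is derived, and its performance for federated edge learning (FEEL) is demonstrated.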
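The averaging property mentioned in this abstract follows from the linearity of positional number systems, and can be sketched as follows. The base (balanced ternary), the quantization step, the digit width, and all function names are assumptions made for illustration; the sketch does not model the OFDM mapping, the channel, or the non-coherent receiver.

```python
def to_balanced_digits(n: int, base: int = 3, width: int = 8) -> list[int]:
    """Balanced base-`base` digits of n, least significant first.

    Each digit lies in [-(base-1)//2, (base-1)//2]; `base` must be odd.
    """
    half = (base - 1) // 2
    digits = []
    for _ in range(width):
        d = ((n + half) % base) - half
        digits.append(d)
        n = (n - d) // base
    if n != 0:
        raise ValueError("value does not fit in the given width")
    return digits


def from_digits(digits, base: int = 3) -> float:
    """Evaluate sum_i digits[i] * base**i (digits may be fractional averages)."""
    return sum(d * base**i for i, d in enumerate(digits))


def average_via_digits(values, step: float = 0.01, base: int = 3, width: int = 8) -> float:
    """Average real values by averaging their numerals position by position."""
    quantized = [round(v / step) for v in values]  # assumed uniform quantizer
    rows = [to_balanced_digits(q, base, width) for q in quantized]
    avg_digits = [sum(col) / len(rows) for col in zip(*rows)]
    return step * from_digits(avg_digits, base)


if __name__ == "__main__":
    grads = [0.12, -0.5, 0.33, 0.25]
    print(average_via_digits(grads), sum(grads) / len(grads))
```

Because each value is a weighted sum of its digits, the per-position digit averages determine the average of the quantized values exactly; any remaining error in this sketch comes only from the assumed quantization step.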
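Given a series of natural language descriptions, our task is to generate 3D human motions that correspond to the text and follow the temporal order of the instructions. In particular, our goal is to enable the synthesis of a series of actions, which we refer to as temporal action composition. The current state of the art in text-conditioned motion synthesis takes only a single action or a single sentence as input. This is partly due to the lack of suitable training data containing action sequences, but also due to the computational complexity of their non-autoregressive model formulation, which does not scale well to long sequences. In this work, we address both issues. First, we exploit the recent BABEL motion-text collection, which has a wide range of labeled actions, many of which occur in a sequence with transitions between them. Next, we design a Transformer-based approach that operates non-autoregressively within an action but autoregressively within the sequence of actions. This hierarchical formulation proves effective in our experiments when compared with multiple baselines. Our approach, called TEACH for "TEmporal Action Compositions for Human motions", produces realistic human motions for a wide variety of actions and temporal compositions from language descriptions. To encourage work on this new task, we make our code available for research purposes at $\href{toch.is.tue.mpg.de}{\textrm{our website}}$.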