Humans are adept at reading interlocutors' emotions from multimodal signals, such as speech content, voice tone, and facial expressions. Machines, however, may struggle to understand various emotions because it is difficult to decode emotions effectively from the complex interactions between multimodal signals. In this paper, we propose a multimodal emotion analysis framework, InterMulti, to capture complex multimodal interactions from different views and identify emotions from multimodal signals. Our proposed framework decomposes signals of different modalities into three kinds of multimodal interaction representations: a modality-full interaction representation, a modality-shared interaction representation, and three modality-specific interaction representations. Additionally, to balance the contribution of different modalities and learn a more informative latent interaction representation, we develop a novel Text-dominated Hierarchical High-order Fusion (THHF) module. The THHF module integrates the above three kinds of representations into a comprehensive multimodal interaction representation. Extensive experimental results on widely used datasets, i.e., MOSEI, MOSI, and IEMOCAP, demonstrate that our method outperforms the state-of-the-art.
Humans are skilled at reading an interlocutor's emotion from multimodal signals, including spoken words, simultaneous speech, and facial expressions. Effectively decoding emotions from the complex interactions of multimodal signals remains a challenge. In this paper, we design three kinds of multimodal latent representations to refine the emotion analysis process and capture complex multimodal interactions from different views: an intact three-modal integrating representation, a modality-shared representation, and three modality-individual representations. Then, a modality-semantic hierarchical fusion is proposed to incorporate these representations into a comprehensive interaction representation. The experimental results demonstrate that our EffMulti outperforms the state-of-the-art methods. The compelling performance benefits from its well-designed framework, which offers ease of implementation, lower computational complexity, and fewer trainable parameters.
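We present a parallel massage robot based on series elastic actuators (SEAs) that provides a unified force control method. First, kinematic and static force models are established to obtain the corresponding control variables. Then, a novel force-position control strategy is proposed to separately control the force-position along the surface-normal direction and the displacement in the other two directions, without requiring a dynamics model of the robot. To evaluate its performance, we conduct a series of robotic massage experiments. The results show that the proposed massage manipulator can successfully achieve the desired force and motion patterns of massage tasks, leading to a highly rated user experience.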
Recently, contrastive learning has attracted increasing interest in neural text generation as a new solution to alleviate the exposure bias problem. It introduces a sequence-level training signal, which is crucial to generation tasks that always rely on auto-regressive decoding. However, previous methods using contrastive learning in neural text generation usually lead to inferior performance. In this paper, we analyse the underlying reasons and propose a new Contrastive Neural Text generation framework, CoNT. CoNT addresses the bottlenecks that prevent contrastive learning from being widely adopted in generation tasks from three aspects: the construction of contrastive examples, the choice of the contrastive loss, and the decoding strategy. We validate CoNT on five generation tasks with ten benchmarks, including machine translation, summarization, code comment generation, data-to-text generation, and commonsense generation. Experimental results show that CoNT clearly outperforms the conventional training framework on all ten benchmarks by a convincing margin. In particular, CoNT surpasses the previously most competitive contrastive learning method for text generation by 1.50 BLEU on machine translation and 1.77 ROUGE-1 on summarization. It achieves a new state-of-the-art on summarization, code comment generation (without external data), and data-to-text generation.
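As a rough illustration of what a sequence-level contrastive signal can look like, the sketch below computes a pairwise margin loss over candidate sequences ranked from best to worst. It is not taken from the abstract and should not be read as CoNT's actual loss: the function name `pairwise_margin_loss`, the rank-scaled margin, and the toy similarity scores are all assumptions made for illustration.

```python
def pairwise_margin_loss(scores, margin=0.1):
    """Average hinge loss over all ordered candidate pairs.

    `scores` holds model similarity scores for candidates that are already
    sorted from best (index 0) to worst. A better-ranked candidate should
    score higher than a worse-ranked one by at least margin * rank_gap.
    Both the name and the rank-scaled margin are hypothetical choices.
    """
    total = 0.0
    pairs = 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            required_gap = margin * (j - i)
            actual_gap = scores[i] - scores[j]
            total += max(0.0, required_gap - actual_gap)
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy usage: one well-ordered and one badly-ordered score list.
well_ordered = pairwise_margin_loss([0.9, 0.5, 0.1])
badly_ordered = pairwise_margin_loss([0.1, 0.5, 0.9])
```

A score list that already respects the ranking with enough separation incurs no loss, while a reversed list is penalized, which is the kind of whole-sequence feedback that token-level training alone does not provide.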
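Current pre-training methods in computer vision focus on natural images from daily life. However, abstract diagrams such as icons and symbols are common and important in the real world. This work is inspired by Tangram, a game that requires replicating an abstract pattern from seven dissected shapes. By recording human experience in solving tangram puzzles, we present the Tangram dataset and show that a neural model pre-trained on Tangram helps solve some mini visual tasks based on low-resolution vision. Extensive experiments demonstrate that our proposed method generates intelligent solutions for aesthetic tasks such as folding clothes and evaluating room layouts. The pre-trained feature extractor can facilitate the convergence of few-shot learning tasks on human handwriting and improve the accuracy of identifying icons by their contours. The Tangram dataset is available at https://github.com/yizhouzhao/tangram.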
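Stochastic process models of volatile renewable energy sources (RESs) based on stochastic differential equations (SDEs) jointly capture the evolving probability distribution and temporal correlation in continuous time. They have enabled recent studies to remarkably improve the performance of dynamic uncertainty quantification and optimization in power systems. However, considering the non-homogeneous stochastic process nature of PV, a challenging question remains: how can a realistic and accurate SDE model for PV power be obtained that reflects weather-dependent uncertainty in online operation, especially when high-resolution numerical weather prediction (NWP) is unavailable for many distributed plants? To fill this gap, this paper finds that an accurate SDE model for PV power can be constructed using only cheap data from low-resolution public weather reports. Specifically, an hourly parameterized Jacobi diffusion process is constructed to recreate the temporal patterns of PV volatility within a day. Its parameters are mapped from the public weather report using an ensemble of extreme learning machines (ELMs) to reflect varying weather conditions. The SDE model jointly captures intraday and intrahour volatility. Statistical examination based on real-world data collected in Macau shows that the proposed approach outperforms a selection of state-of-the-art deep-learning-based time-series forecasting methods.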
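To make the Jacobi diffusion mentioned above more concrete, here is a minimal Euler-Maruyama simulation of one commonly used form, dX = theta (mu - X) dt + sqrt(2 a theta X (1 - X)) dW, whose state stays in [0, 1] and so suits a normalized power output. The abstract does not give the authors' parameterization: the function name `simulate_jacobi`, the symbols `theta`, `mu`, and `a`, the clipping to [0, 1], and the numeric values are all assumptions for illustration, and no weather-to-parameter mapping is attempted.

```python
import math
import random


def simulate_jacobi(x0, theta, mu, a, dt, n_steps, seed=0):
    """Euler-Maruyama path of dX = theta*(mu - X) dt + sqrt(2*a*theta*X*(1-X)) dW.

    theta: mean-reversion speed, mu: long-run mean in (0, 1),
    a: volatility scale. All names are hypothetical, not the paper's.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        drift = theta * (mu - x) * dt
        # The diffusion term vanishes at 0 and 1; max() guards against
        # tiny negative values caused by floating-point error.
        variance = max(2.0 * a * theta * x * (1.0 - x), 0.0) * dt
        noise = math.sqrt(variance) * rng.gauss(0.0, 1.0)
        # Discretization can overshoot the boundaries, so clip to [0, 1].
        x = min(max(x + drift + noise, 0.0), 1.0)
        path.append(x)
    return path


# Toy usage: one hour at one-minute resolution with made-up parameters.
hourly_path = simulate_jacobi(x0=0.5, theta=1.0, mu=0.6, a=0.1,
                              dt=1.0 / 60.0, n_steps=60, seed=42)
```

An hourly parameterization in the spirit of the abstract could, under these assumptions, call such a routine once per hour with a different (`theta`, `mu`, `a`) triple and chain the paths together.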
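Document-level relation extraction aims to identify relations between entities across a whole document. Prior efforts to capture long-range dependencies have relied heavily on implicitly powerful representations learned through (graph) neural networks, which makes the models less transparent. To tackle this challenge, in this paper we propose LogiRE, a novel probabilistic model for document-level relation extraction that learns logic rules. LogiRE treats logic rules as latent variables and consists of two modules: a rule generator and a relation extractor. The rule generator produces logic rules that potentially contribute to the final predictions, and the relation extractor outputs the final predictions based on the generated logic rules. The two modules can be optimized efficiently with the expectation-maximization (EM) algorithm. By introducing logic rules into neural networks, LogiRE can explicitly capture long-range dependencies and enjoys better interpretability. Empirical results show that LogiRE significantly outperforms several strong baselines in terms of relation performance (1.8 F1 score) and logical consistency (over 3.3 logic score). Our code is available at https://github.com/rudongyu/logire.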
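Multimodal salient object detection models based on RGB-D information are more robust in the real world. However, it remains difficult to properly balance the effective multimodal information in the feature fusion stage. In this letter, we propose a novel gated recoding network (GRNet) to evaluate the information validity of the two modalities and balance their influence. Our framework is divided into three stages: a perception stage, a recoding mixing stage, and a feature integration stage. First, a perception encoder is adopted to extract multi-level single-modal features, which lays the foundation for multimodal semantic comparative analysis. Then, a modality-adaptive gate unit (MGU) is proposed to suppress invalid information and pass the effective modal features to the recoding mixer and the hybrid branch decoder. The recoding mixer is responsible for recoding and mixing the balanced multimodal information. Finally, the hybrid branch decoder completes the multi-level feature integration under the guidance of an optional edge guidance stream (OEG). Experiments and analysis on eight popular benchmarks verify that our framework performs favorably against nine state-of-the-art methods.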
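Modeling and predicting solar events, in particular solar ramping events, is critical for improving the situational awareness of solar power generation systems. It has been recognized that weather conditions such as temperature, humidity, and cloud density can significantly affect the emergence and location of solar ramping events. As a result, modeling these events with their complex spatio-temporal correlations is highly challenging. To address this, we adopt a novel spatio-temporal categorical point process model that intuitively and effectively captures the correlations and interactions among ramping events. We demonstrate the interpretability and predictive power of our model in extensive real-data experiments.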
Accurate determination of the binding pose of a small-molecule candidate (ligand) in its target protein pocket is important for computer-aided drug discovery. Typical rigid-body docking methods ignore the flexibility of the protein pocket, while more accurate pose generation using molecular dynamics is hindered by slow protein dynamics. We develop a tiered tensor transform (3T) algorithm to rapidly generate diverse protein-ligand complex conformations for both pose and affinity estimation in drug screening. It requires neither machine learning training nor lengthy dynamics computation, while maintaining both coarse-grain-like coordinated protein dynamics and atomistic-level details of the complex pocket. The 3T conformation structures we generate are closer to experimental co-crystal structures than those generated by docking software and, more importantly, achieve significantly higher accuracy in active ligand classification than traditional ensemble docking using hundreds of experimental protein conformations. The 3T structure transformation is decoupled from the system physics, making future use in other computational scientific domains possible.