Depression is a leading cause of death worldwide, and diagnosing it is nontrivial. Multimodal learning is a popular approach to automatic depression diagnosis, but existing works suffer from two main drawbacks: 1) the high-order interactions between different modalities cannot be well exploited; and 2) the interpretability of the models is weak. To remedy these drawbacks, we propose a multimodal multi-order factor fusion (MMFF) method. Our method exploits the high-order interactions between different modalities by extracting and assembling modality factors under the guidance of a shared latent proxy. We conduct extensive experiments on two recent and popular datasets, E-DAIC-WOZ and CMDC, and the results show that our method achieves significantly better performance than existing approaches. In addition, by analyzing the process of factor assembly, our model can intuitively show the contribution of each factor, which helps us understand the fusion mechanism.
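Neural networks have been expanding rapidly in recent years, with novel strategies and applications. However, several challenges in neural network technologies remain unsolved, even though they will unavoidably have to be addressed for critical applications. Attempts have been made to overcome these challenges in neural network computing by representing and embedding domain knowledge in terms of symbolic representations. Thus the notion of neuro-symbolic learning (NeSyL) emerged, which incorporates aspects of symbolic representation and brings common sense into neural networks. In domains where interpretability, reasoning, and explainability are crucial, such as video and image captioning, question answering and reasoning, health informatics, and genomics, NeSyL has shown promising results. This review presents a comprehensive survey of state-of-the-art NeSyL approaches, their principles, advances in machine and deep learning algorithms, applications such as ophthalmology, and, most importantly, future perspectives of this emerging field.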
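Musculoskeletal and neurological disorders are the most common causes of walking problems among older people, and they often lead to a diminished quality of life. Analyzing walking motion data manually requires trained professionals, and the evaluations may not always be objective. To facilitate early diagnosis, recent deep learning-based methods have shown promising results for automated analysis, discovering patterns that traditional machine learning methods had not found. We observe that existing work mostly applies deep learning to individual joint features, such as the time series of joint positions. Because inter-joint features such as the distance between the feet (i.e. the stride width) are hard to discover from medical datasets, which are generally small, these methods usually perform sub-optimally. We therefore propose a solution that explicitly takes both individual joint features and inter-joint features as input, relieving the system of the need to discover more complicated features from small data. Owing to the distinct nature of the two types of features, we introduce a two-stream framework, with one stream learning from the time series of joint positions and the other from the time series of relative joint displacements. We further develop a mid-layer fusion module to combine the patterns discovered in the two streams for diagnosis, which yields a complementary representation of the data and better prediction performance. We validate our system on a benchmark dataset of 3D skeleton motion involving 45 patients with musculoskeletal and neurological disorders, and achieve a prediction accuracy of 95.56%, outperforming state-of-the-art methods.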
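To make the two input streams concrete, here is a minimal sketch of how per-frame joint positions and relative joint displacements might be derived from a skeleton sequence. It is an illustration only: the helper names (`relative_displacements`, `two_stream_inputs`), the three-joint toy skeleton, and the choice to use every pair of joints are assumptions, not details taken from the paper.

```python
from itertools import combinations


def relative_displacements(frame):
    """Return the displacement vector between every pair of joints in a frame.

    frame: list of (x, y, z) joint positions.
    Using all joint pairs is an assumption made for this sketch.
    """
    return [
        tuple(b - a for a, b in zip(frame[i], frame[j]))
        for i, j in combinations(range(len(frame)), 2)
    ]


def two_stream_inputs(sequence):
    """Split a skeleton sequence into two per-frame feature streams."""
    positions = [list(frame) for frame in sequence]  # stream 1: joint positions
    displacements = [relative_displacements(f) for f in sequence]  # stream 2
    return positions, displacements


# Toy sequence: 2 frames, 3 hypothetical joints (pelvis, left foot, right foot).
sequence = [
    [(0.0, 1.0, 0.0), (-0.1, 0.0, 0.0), (0.1, 0.0, 0.0)],
    [(0.0, 1.0, 0.1), (-0.1, 0.0, 0.3), (0.1, 0.0, 0.0)],
]
positions, displacements = two_stream_inputs(sequence)
```

In this toy example the last displacement of each frame is the vector between the two feet, so its lateral component corresponds to the stride width mentioned in the abstract.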
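Eye trackers can provide visual guidance to sonographers during ultrasound (US) scanning. Such guidance is potentially valuable for less experienced operators, helping them improve their skill in manipulating the probe to reach the desired plane. In this paper, a multimodal guidance approach (Multimodal-GuideNet) is proposed to capture the stepwise dependency between a real-world US video signal, synchronized gaze, and probe motion within a unified framework. To understand the causal relationship between gaze movement and probe motion, our model uses multitask learning to jointly learn two related tasks: predicting the gaze movements and the probe signals that an experienced sonographer would produce in routine obstetric scanning. The two tasks are associated through a modality-aware spatial graph that detects co-occurrence among the multimodal inputs and shares useful cross-modal information. Instead of a deterministic scanning path, Multimodal-GuideNet allows for scanning diversity by estimating the probability distribution of real scans. Experiments on three typical obstetric scanning examinations show that the new approach outperforms single-task learning for both probe motion guidance and gaze movement prediction. Multimodal-GuideNet also provides a visual guidance signal with an error of less than 10 pixels for a 224x288 US image.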
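Synthesizing multi-character interactions is a challenging task because the interactions between characters are complex and varied. In particular, generating close interactions such as dancing and fighting requires precise spatiotemporal alignment between the characters. Existing work on generating multi-character interactions focuses on producing a single type of reactive motion for a given sequence, which leads to a lack of variety in the resulting motions. In this paper, we propose a novel way to create realistic human reactive motions that are not present in the given dataset by mixing and matching different types of close interactions. We propose a conditional hierarchical generative adversarial network with multi-hot class embedding to generate mix-and-match reactive motions of the follower from a given motion sequence of the leader. Experiments are conducted on both noisy (depth-based) and high-quality (MoCap-based) interaction datasets. The quantitative and qualitative results show that our approach outperforms state-of-the-art methods on the given datasets. We also provide an augmented dataset with realistic reactive motions to stimulate future research in this area. The code is available at https://github.com/aman-goel1/imm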
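As a small illustration of the multi-hot class embedding idea, the sketch below encodes one or several interaction classes as a 0/1 vector, so that a mixed condition activates more than one class at once. The function name `multi_hot` and the class list are hypothetical; only "dancing" and "fighting" appear in the abstract, and the actual embedding used in the paper may differ.

```python
def multi_hot(active_classes, all_classes):
    """Encode a set of interaction classes as a 0/1 vector over all_classes."""
    active = set(active_classes)
    unknown = active - set(all_classes)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if name in active else 0 for name in all_classes]


# Hypothetical class list; "hugging" and "pushing" are placeholders.
CLASSES = ["dancing", "fighting", "hugging", "pushing"]

single = multi_hot(["fighting"], CLASSES)            # one-hot: a single class
mixed = multi_hot(["dancing", "fighting"], CLASSES)  # multi-hot: a mixed condition
```

A one-hot vector is the special case with exactly one active class, which is why a multi-hot condition can describe mixtures that a single class label cannot.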
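Human-object interaction (HOI) recognition in videos is important for analyzing human activity. Most existing work focuses on visual features and therefore often suffers from occlusion in real-world scenarios. The problem becomes even more complicated when multiple people and objects are involved in an HOI. Considering that geometric features such as human pose and object position provide meaningful information for understanding HOIs, we argue for combining the benefits of visual and geometric features in HOI recognition, and propose a novel two-level geometric feature-informed graph convolutional network (2G-GCN). The geometric-level graph models the interdependency between the geometric features of humans and objects, while the fusion-level graph further fuses them with the visual features of humans and objects. To demonstrate the novelty and effectiveness of our method in challenging scenarios, we propose a new multi-person HOI dataset (MPHOI-72). Extensive experiments on the MPHOI-72 (multi-person HOI), CAD-120 (single-human HOI), and Bimanual Actions (two-hand HOI) datasets demonstrate our superior performance compared with the state of the art.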
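Current keyword spotting systems are usually trained with a large number of predefined keywords. Recognizing keywords in an open-vocabulary setting is essential for personalizing smart device interaction. Towards this goal, we propose a pure MLP-based neural network built on MLPMixer, an MLP model architecture that effectively replaces the attention mechanism in Vision Transformers. We investigate different ways of adapting the MLPMixer architecture to the query-by-example (QbyE) open-vocabulary keyword spotting task. Comparisons with state-of-the-art RNN and CNN models show that our method achieves better performance in challenging conditions (10 dB and 6 dB environments) on both the publicly available Hey-Snips dataset and a larger-scale in-house dataset with 400 speakers. Our proposed model also has fewer parameters and MACs than the baseline models.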
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared with existing approaches across different shots, e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.
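The first insight, generating a class center from support features with the help of a support mask and using it to re-weight query features, can be sketched roughly as follows. This is a stand-in under stated assumptions rather than the RefT module itself: it uses plain masked average pooling for the class center and channel-wise multiplication for the re-weighting, and the names `masked_average` and `reweight` are hypothetical.

```python
def masked_average(features, mask):
    """Average C-dimensional feature vectors over positions where mask is 1.

    features: list of N feature vectors (each a list of C floats).
    mask: list of N values in {0, 1} marking the support object's positions.
    """
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no positions")
    return [sum(channel) / len(selected) for channel in zip(*selected)]


def reweight(query_features, center):
    """Scale each query feature vector channel-wise by the class center.

    Channel-wise multiplication is an assumption made for this sketch.
    """
    return [[q * c for q, c in zip(vec, center)] for vec in query_features]


# Toy data: 3 support positions with 2 channels; the mask keeps the first two.
support = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]
center = masked_average(support, mask)

query = [[1.0, 1.0], [0.5, 4.0]]
weighted = reweight(query, center)
```

Because the center is computed only from masked positions, background features in the support image (the third vector above) do not influence how the query is re-weighted.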
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
As one of the most important psychic stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Automatic ME recognition (MER) is therefore becoming increasingly crucial in the field of affective computing, and provides essential technical support for lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been built to alleviate this problem, the amount of available data is still small. To address this shortage of ME data, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos elicited from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.