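Modelling long-range dependencies is critical for understanding tasks in computer vision. Although convolutional neural networks (CNNs) excel at many vision tasks, they remain limited in capturing long-range structured relationships because they typically consist of layers of local kernels. A fully connected graph, such as the self-attention operation in Transformers, is beneficial for this kind of modelling, but its computational overhead is prohibitive. In this paper, we propose a dynamic graph message passing network that significantly reduces computational complexity compared with modelling a fully connected graph. This is achieved by adaptively sampling nodes in the graph, conditioned on the input, for message passing. Based on the sampled nodes, we dynamically predict node-dependent filter weights and the affinity matrix used to propagate information between them. This formulation allows us to design a self-attention module and, more importantly, a new Transformer-based backbone network that we use for image classification pretraining and for addressing various downstream tasks (object detection, instance segmentation, and semantic segmentation). Using this model, we show significant improvements over strong, state-of-the-art baselines on four different tasks. Our approach also outperforms fully connected graphs while using fewer floating-point operations and parameters. Code and models will be made publicly available at https://github.com/fudan-zvg/dgmn2.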
translated by 谷歌翻译
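The sampling idea in the dynamic graph message passing abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: uniform random sampling stands in for the learned, input-conditioned sampler, a scaled dot product stands in for the predicted affinity, and the node-dependent filter weights are omitted. The function name `sampled_message_passing` is hypothetical.

```python
import math
import random


def sampled_message_passing(features, num_samples, rng):
    """Aggregate messages for each node from a small sampled set of nodes.

    features: list of N feature vectors (lists of floats, all the same length).
    Cost is O(N * num_samples * dim) instead of O(N^2 * dim) for a full graph.
    """
    n = len(features)
    dim = len(features[0])
    outputs = []
    for i in range(n):
        # Assumption: uniform sampling replaces the learned, input-conditioned sampler.
        sampled = [rng.randrange(n) for _ in range(num_samples)]
        # Assumption: affinity is a scaled dot product, normalised with a softmax.
        logits = [
            sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(dim)
            for j in sampled
        ]
        peak = max(logits)
        weights = [math.exp(value - peak) for value in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        # The message is the affinity-weighted average of the sampled node features.
        message = [
            sum(w * features[j][c] for w, j in zip(weights, sampled))
            for c in range(dim)
        ]
        outputs.append(message)
    return outputs


features = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
updated = sampled_message_passing(features, num_samples=2, rng=random.Random(0))
```

With a fixed number of samples per node, the cost grows linearly in the number of nodes, which is the source of the savings over a fully connected graph.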
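This paper proposes a new approach to unsupervised fine-tuning and self-training with unlabeled speech data for recurrent neural network (RNN)-Transducer (RNN-T) end-to-end (E2E) automatic speech recognition (ASR) systems. When using unlabeled audio data, conventional systems fine-tune or self-train with ASR hypotheses as the targets, and are therefore susceptible to the ASR performance of the base model. To alleviate the influence of ASR errors when using unlabeled data, we propose a multiple-hypothesis RNN-T loss that incorporates multiple ASR 1-best hypotheses into the loss function. For the fine-tuning task, ASR experiments on LibriSpeech show that the multiple-hypothesis approach achieves a relative word error rate (WER) reduction of 14.2% on the test_other set compared with the single-hypothesis approach. For the self-training task, ASR models are trained using supervised data from Wall Street Journal (WSJ) and Aurora-4, together with CHiME-4 real noisy data as unlabeled data. Compared with the single-hypothesis approach, the multiple-hypothesis approach yields a relative WER reduction of 3.3% on the CHiME-4 single-channel real noisy evaluation set.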
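Recently, the development of machine learning (ML) potentials has made it possible to perform large-scale, long-time molecular simulations with the accuracy of quantum mechanical (QM) models. However, for high-level QM methods, such as density functional theory (DFT) at the meta-GGA level and/or with exact exchange, quantum Monte Carlo, and so on, generating a sufficient amount of training data remains computationally challenging because of their high cost. In this work, we demonstrate that this issue can be largely alleviated with Deep Kohn-Sham (DeePKS), an ML-based DFT model. DeePKS employs a computationally efficient neural-network-based functional model to construct a correction term that is added to a cheap DFT model. After training, DeePKS provides energies and forces that closely match those of high-level QM methods, yet the amount of training data it requires is orders of magnitude smaller than that needed to train a reliable ML potential. DeePKS can therefore serve as a bridge between expensive QM models and ML potentials: one can generate a moderate amount of high-accuracy QM data to train a DeePKS model, and then use the DeePKS model to label a much larger number of configurations for training an ML potential. This scheme for periodic systems is implemented in the DFT package ABACUS, which is open source and ready for use in various applications.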
The combination of conduct, emotion, motivation, and thinking is referred to as personality. To shortlist candidates more effectively, many organizations rely on personality predictions. By grouping applicants according to the required personality preferences, a firm can pick the best candidate for the desired job description. A model is created to identify applicants' personality types so that employers can find qualified candidates by examining a person's facial expression, speech intonation, and resume. Additionally, the paper emphasises detecting changes in employee behaviour. Employee attitudes and behaviour towards each set of questions are examined and analysed. Here, the K-Modes clustering method is used to predict employee well-being, including job pressure, the working environment, and relationships with peers, utilizing the OCEAN Model and the CNN algorithm in the AVI-AI administrative system. Findings imply that AVIs can be used for efficient candidate screening with an AI decision agent. Further study of this field is beyond the scope of the current work and will need deeper models and new configurations that can handle extremely complex operations.
Model Predictive Controllers (MPCs) are widely used for controlling cyber-physical systems. An MPC iteratively optimizes the predicted future states of a robot over a fixed time horizon. MPCs are effective in practice, but because they are computationally expensive and slow, they are not well suited to real-time applications. This flaw can be overcome by approximating an MPC's functionality. Neural networks are very good function approximators and are faster than an MPC. However, applying neural networks to control-based applications can be challenging because the data does not satisfy the i.i.d. assumption. This study investigates various imitation learning methods for using a neural network in a control-based environment and evaluates their benefits and shortcomings.
Abstractive dialogue summarization has received increasing attention recently. Although most current dialogue summarization systems are trained to maximize the likelihood of human-written summaries and have achieved significant results, there is still a large gap in generating summaries that humans judge to be of high quality, for example in coherence and faithfulness, partly due to the misalignment in maximizing the likelihood of a single human-written summary. To this end, we propose to incorporate different levels of human feedback into the training process. This will enable us to guide the models to capture the behaviors humans care about in summaries. Specifically, we ask humans to highlight the salient information to be included in summaries as local feedback, and to make overall comparisons among summaries in terms of coherence, accuracy, coverage, conciseness, and overall quality as global feedback. We then combine both local and global feedback to fine-tune the dialogue summarization policy with Reinforcement Learning. Experiments conducted on multiple datasets demonstrate the effectiveness and generalization of our methods over the state-of-the-art supervised baselines, especially in terms of human judgments.
Human activity recognition (HAR) using drone-mounted cameras has attracted considerable interest from the computer vision research community in recent years. A robust and efficient HAR system has a pivotal role in fields like video surveillance, crowd behavior analysis, sports analysis, and human-computer interaction. The task is challenging because of complex poses, varying viewpoints, and the environmental scenarios in which the action takes place. To address such complexities, in this paper, we propose a novel Sparse Weighted Temporal Attention (SWTA) module that uses sparsely sampled video frames to obtain global weighted temporal attention. The proposed SWTA comprises two parts. The first is a temporal segment network that sparsely samples a given set of frames. The second is weighted temporal attention, which fuses attention maps derived from optical flow with raw RGB images. This is followed by a basenet network, which comprises a convolutional neural network (CNN) module along with fully connected layers that perform activity recognition. The SWTA network can be used as a plug-in module for existing deep CNN architectures, enabling them to learn temporal information without a separate temporal stream. It has been evaluated on three publicly available benchmark datasets, namely Okutama, MOD20, and Drone-Action. The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets, surpassing the previous state-of-the-art performances by margins of 25.26%, 18.56%, and 2.94%, respectively.
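The sparse sampling step described above can be illustrated with a short sketch. It assumes the common temporal-segment scheme of splitting a clip into equal spans and taking one frame from each; the abstract does not give the exact sampling rule, and the function name `sparse_sample_indices` is hypothetical.

```python
import random


def sparse_sample_indices(num_frames, num_segments, rng=None):
    """Pick one frame index from each of num_segments near-equal spans.

    With rng=None the centre frame of each span is chosen (deterministic);
    otherwise a frame is drawn uniformly at random inside each span.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    indices = []
    for s in range(num_segments):
        start = s * num_frames // num_segments
        end = (s + 1) * num_frames // num_segments  # exclusive upper bound
        if rng is None:
            indices.append((start + end - 1) // 2)
        else:
            indices.append(rng.randrange(start, end))
    return indices


print(sparse_sample_indices(100, 4))  # [12, 37, 62, 87]
print(sparse_sample_indices(100, 4, rng=random.Random(0)))
```

Random draws within each span are typically used during training for augmentation, while the deterministic centre frames suit evaluation.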
Experimental sciences have come to depend heavily on our ability to organize, interpret and analyze high-dimensional datasets produced from observations of a large number of variables governed by natural processes. Natural laws, conservation principles, and dynamical structure introduce intricate inter-dependencies among these observed variables, which in turn yield geometric structure, with fewer degrees of freedom, on the dataset. We show how fine-scale features of this structure in data can be extracted from discrete approximations to quantum mechanical processes given by data-driven graph Laplacians and localized wavepackets. This data-driven quantization procedure leads to a novel, yet natural uncertainty principle for data analysis induced by limited data. We illustrate the new approach with algorithms and several applications to real-world data, including the learning of patterns and anomalies in social distancing and mobility behavior during the COVID-19 pandemic.
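As a rough illustration of what a data-driven graph Laplacian looks like, the sketch below builds the unnormalised Laplacian L = D - W from a Gaussian kernel on pairwise distances. The kernel, the bandwidth parameter `epsilon`, and the lack of normalisation are assumptions for illustration; the abstract does not specify which construction the authors use.

```python
import math


def graph_laplacian(points, epsilon):
    """Unnormalised graph Laplacian L = D - W for a Gaussian-kernel graph.

    points: list of equal-length coordinate tuples.
    epsilon: kernel bandwidth (assumed parameter), W_ij = exp(-|x_i - x_j|^2 / epsilon).
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist_sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = math.exp(-dist_sq / epsilon)
    # Off-diagonal entries are -W_ij; the diagonal holds the node degree.
    laplacian = [[-weights[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        laplacian[i][i] = sum(weights[i])
    return laplacian


samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
L = graph_laplacian(samples, epsilon=1.0)
```

Each row of this matrix sums to zero and the matrix is symmetric, two properties that make it a discrete analogue of the continuous Laplace operator.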
Generative Adversarial Networks (GANs) have received wide acclaim among the machine learning (ML) community for their ability to generate realistic 2D images. ML is being applied more often to complex problems beyond those of computer vision. However, current frameworks often serve as black boxes and lack physics embeddings, leading to poor ability in enforcing constraints and unreliable models. In this work, we develop physics embeddings that can be stringently imposed, referred to as hard constraints, in the neural network architecture. We demonstrate their capability for 3D turbulence by embedding them in GANs, particularly to enforce the mass conservation constraint in incompressible fluid turbulence. In doing so, we also explore and contrast the effects of other methods of imposing physics constraints within the GANs framework, especially penalty-based physics constraints popular in literature. By using physics-informed diagnostics and statistics, we evaluate the strengths and weaknesses of our approach and demonstrate its feasibility.
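One well-known way to make a velocity field satisfy mass conservation by construction is to output a vector potential A and take u = curl(A), since the divergence of a curl vanishes. The sketch below checks this numerically on a small periodic grid with central differences. It is an assumption that this resembles the hard constraint used in the paper; the abstract does not describe the mechanism, and all function names here are hypothetical.

```python
import random


def central_diff(field, axis, n):
    """Central difference along one axis on a periodic n^3 grid with unit spacing.

    field: dict mapping (i, j, k) index tuples to floats.
    """
    out = {}
    for idx in field:
        plus = list(idx)
        minus = list(idx)
        plus[axis] = (idx[axis] + 1) % n
        minus[axis] = (idx[axis] - 1) % n
        out[idx] = 0.5 * (field[tuple(plus)] - field[tuple(minus)])
    return out


def curl(potential, n):
    """Discrete curl of a vector potential (Ax, Ay, Az)."""
    ax, ay, az = potential
    daz_dy, day_dz = central_diff(az, 1, n), central_diff(ay, 2, n)
    dax_dz, daz_dx = central_diff(ax, 2, n), central_diff(az, 0, n)
    day_dx, dax_dy = central_diff(ay, 0, n), central_diff(ax, 1, n)
    u = {p: daz_dy[p] - day_dz[p] for p in ax}
    v = {p: dax_dz[p] - daz_dx[p] for p in ax}
    w = {p: day_dx[p] - dax_dy[p] for p in ax}
    return u, v, w


def divergence(velocity, n):
    """Discrete divergence of a velocity field (u, v, w)."""
    u, v, w = velocity
    du, dv, dw = central_diff(u, 0, n), central_diff(v, 1, n), central_diff(w, 2, n)
    return {p: du[p] + dv[p] + dw[p] for p in u}


n = 4
rng = random.Random(0)
grid = [(i, j, k) for i in range(n) for j in range(n) for k in range(n)]
# A random potential stands in for an unconstrained network output.
potential = tuple({p: rng.uniform(-1.0, 1.0) for p in grid} for _ in range(3))
velocity = curl(potential, n)
max_div = max(abs(value) for value in divergence(velocity, n).values())
```

Because the discrete difference operators along different axes commute, `max_div` is zero up to floating-point rounding for any potential, in contrast to a penalty term, which only pushes the divergence towards zero during training.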
The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report the deployment of technology-driven data collection methods for creating a corpus of more than 60,000 translations from Hindi to Gondi, a low-resource vulnerable language spoken by around 2.3 million tribal people in south and central India. During this process, we help expand information access in Gondi along two dimensions: (a) the creation of linguistic resources that can be used by the community, such as a dictionary, children's stories, Gondi translations from multiple sources, and an Interactive Voice Response (IVR) based mass awareness platform; and (b) enabling its use in the digital domain by developing a Hindi-Gondi machine translation model, which is compressed by nearly 4 times to enable its deployment on low-resource edge devices and in areas with little to no internet connectivity. We also present preliminary evaluations of using the developed machine translation model to assist volunteers who are involved in collecting more data for the target language. Through these interventions, we not only created a refined and evaluated corpus of 26,240 Hindi-Gondi translations that was used for building the translation model, but also engaged nearly 850 community members who can help take Gondi onto the internet.