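Facial expressions vary from person to person, and every image differs in brightness, contrast, and resolution. This is why recognizing facial expressions is very difficult. This paper proposes an efficient facial emotion recognition system using a convolutional neural network (CNN). It recognizes seven emotions (anger, disgust, fear, happiness, sadness, surprise, and neutral) and predicts and assigns a probability to each one. Because deep learning models learn from data, our proposed system passes each image through various preprocessing steps to improve prediction. Every image was first passed through a face detection algorithm before being included in the training dataset. Because a CNN requires a large amount of data, we duplicated our data by applying various filters to each image. The preprocessed images, of size 80×100, are passed as input to the first layer of the CNN. Three convolutional layers were used, followed by a pooling layer and three dense layers. The dropout rate for the dense layers was 20%. The model was trained on a combination of two publicly available datasets, JAFFE and KDEF. 90% of the data was used for training and 10% for testing. We achieved a maximum accuracy of 78.1% on the combined dataset. Furthermore, we designed an application of the proposed system with a graphical user interface that classifies emotions in real time.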
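The layer stack described in the abstract above can be traced with a little shape arithmetic. The following is a minimal sketch, not the authors' code. It rests on several assumptions that the abstract does not specify: 3×3 "valid" convolutions with stride 1, a single 2×2 max-pool with stride 2, filter counts of 32/64/128, and 80×100 read as height×width. The hypothetical `softmax_probabilities` helper shows how seven raw scores become the per-emotion probabilities that the system reports.

```python
import math

# The seven emotion classes named in the abstract.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]


def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Length of one spatial dimension after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_shape(height: int = 80, width: int = 100, filters=(32, 64, 128)):
    """Trace (H, W, C) through the conv/pool stack.

    Assumed (not from the paper): 3x3 'valid' convolutions with stride 1,
    one 2x2 max-pool with stride 2, and the filter counts given in `filters`.
    """
    h, w = height, width
    for _ in filters:  # one 3x3 convolution per entry in `filters`
        h, w = out_size(h, 3), out_size(w, 3)
    h, w = out_size(h, 2, stride=2), out_size(w, 2, stride=2)  # the single pooling layer
    return h, w, filters[-1]


def softmax_probabilities(logits):
    """Turn seven raw CNN scores into a probability for each emotion."""
    if len(logits) != len(EMOTIONS):
        raise ValueError("expected one score per emotion")
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(EMOTIONS, exps)}


h, w, c = feature_shape()
print(h, w, c, h * w * c)  # prints: 37 47 128 222592
```

Under these assumptions, the flattened feature vector entering the first dense layer has 37 × 47 × 128 = 222,592 elements. The 20% dropout would be applied to the dense layers during training only.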
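Deep learning plays a vital role in every field that involves data. It has become a powerful and efficient framework that can be applied to a wide range of complex learning problems that were previously difficult to solve with traditional machine learning techniques. In this study, we focus on protein sequence classification using deep learning techniques. The study of amino acid sequences is vital in the life sciences. We represent amino acid sequences as vectors using different word embedding techniques from natural language processing. Our main goal is to classify the sequences into four categories: DNA, RNA, protein, and hybrid. After several tests, we achieved nearly 99% training and test accuracy. We ran experiments with CNN, LSTM, bidirectional LSTM, and GRU models.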
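The abstract above does not say how an amino acid sequence is split into "words" before embedding. A common choice is overlapping k-mers (here k = 3), which turns each sequence into a sentence-like list of tokens. This is assumed here purely for illustration. The hypothetical helpers `kmers`, `build_vocab`, and `encode` build the integer ids that an embedding layer would look up.

```python
def kmers(sequence, k=3):
    """Split a sequence into overlapping k-mers, e.g. 'MKTAY' -> ['MKT', 'KTA', 'TAY']."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(sequences, k=3):
    """Give every k-mer seen an integer id; id 0 is reserved for padding/unknown."""
    vocab = {"<pad>": 0}
    for seq in sequences:
        for word in kmers(seq, k):
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(sequence, vocab, k=3):
    """Map a sequence to the token ids an embedding layer would look up."""
    return [vocab.get(word, 0) for word in kmers(sequence, k)]


vocab = build_vocab(["MKTAY", "KTAA"])
ids = encode("MKTAA", vocab)  # [1, 2, 4]
```

The k-mer lists produced by `kmers` are the kind of "sentences" a Word2Vec-style trainer would consume. In this sketch, id 0 serves as both the padding id and the unknown-token id.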
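Classifying amino acids and analyzing their sequences play a vital role in the life sciences and are challenging tasks. This paper uses and compares state-of-the-art deep learning models, namely convolutional neural networks (CNN), long short-term memory (LSTM), and gated recurrent units (GRU), to classify macromolecules from their amino acid sequences. Compared with traditional machine learning techniques, these models provide efficient frameworks for solving a wide range of complex learning problems. We use word embeddings to represent amino acid sequences as vectors. A CNN extracts features from the amino acid sequences, which are treated as vectors. These features are then fed into the models mentioned above to train robust classifiers. Our results show that Word2Vec embeddings combined with VGG-16 perform better than LSTM and GRU. The proposed approach achieves an error rate of 1.5%.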
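Feeding word-embedded sequences to an image CNN such as VGG-16 requires an input of fixed size. One plausible way to get there is to stack per-token embedding vectors into a `max_len × dim` matrix, truncating long sequences and zero-padding short ones. This is an assumption for illustration, not a step taken from the paper. `to_matrix` and the toy embedding table are hypothetical.

```python
def to_matrix(tokens, embeddings, max_len, dim):
    """Stack per-token embedding vectors into a max_len x dim matrix.

    Sequences longer than max_len are truncated; missing rows and unknown
    tokens are filled with zero vectors.
    """
    zero = [0.0] * dim
    rows = [list(embeddings.get(tok, zero)) for tok in tokens[:max_len]]
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))  # zero-pad
    return rows


# Hypothetical 2-dimensional embeddings for two 3-mers.
toy_embeddings = {"MKT": [1.0, 2.0], "KTA": [3.0, 4.0]}
matrix = to_matrix(["MKT", "KTA", "TAY"], toy_embeddings, max_len=4, dim=2)
```

VGG-16 is usually deployed on three-channel images, so a real pipeline would also need to resize or replicate this matrix. That step is omitted here. For scale, the reported 1.5% error rate corresponds to 98.5% accuracy.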
Multi-view projection techniques have proven highly effective for 3D shape recognition, achieving top-performing results. These methods learn how to combine information from multiple view-points. However, the camera view-points from which these views are obtained are often fixed for all shapes. To overcome the static nature of current multi-view techniques, we propose learning these view-points. Specifically, we introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition. As a result, MVTN can be trained end-to-end with any multi-view network for 3D shape classification. We integrate MVTN into a novel adaptive multi-view pipeline that can render both 3D meshes and point clouds. Our approach achieves state-of-the-art performance in 3D classification and shape retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55). Further analysis indicates that our approach is more robust to occlusion than other methods. We also investigate additional aspects of MVTN, such as 2D pretraining and its use for segmentation. To support further research in this area, we have released MVTorch, a PyTorch library for 3D understanding and generation using multi-view projections.
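To make the idea of learned view-points concrete, the sketch below places each camera on a sphere around the shape, parametrized by azimuth and elevation. It starts from a fixed circular configuration, the kind of static setup the abstract above criticizes. It then shifts each view by a bounded offset, obtained by squashing raw network outputs with `tanh`. The offset bounds, the circular initialization, and the function names are assumptions for illustration. The differentiable renderer that lets gradients reach these offsets in MVTN is not reproduced here.

```python
import math


def spherical_to_cartesian(azimuth_deg, elevation_deg, distance):
    """Camera position on a sphere around the object (y axis pointing up)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    x = distance * math.cos(el) * math.sin(az)
    y = distance * math.sin(el)
    z = distance * math.cos(el) * math.cos(az)
    return (x, y, z)


def circular_views(num_views, elevation_deg=30.0):
    """Fixed baseline: num_views cameras evenly spaced in azimuth at one elevation."""
    return [(360.0 * i / num_views, elevation_deg) for i in range(num_views)]


def adjust_views(views, raw_offsets, max_azimuth=45.0, max_elevation=30.0):
    """Shift each base view by a tanh-bounded offset (assumed bounds, in degrees)."""
    adjusted = []
    for (az, el), (raw_az, raw_el) in zip(views, raw_offsets):
        new_az = (az + max_azimuth * math.tanh(raw_az)) % 360.0
        new_el = el + max_elevation * math.tanh(raw_el)
        adjusted.append((new_az, new_el))
    return adjusted


base = circular_views(4)
# In MVTN the raw offsets would be predicted from the shape; here they are made up.
learned = adjust_views(base, [(0.5, -0.2), (0.0, 0.0), (-1.0, 0.3), (2.0, 0.0)])
cameras = [spherical_to_cartesian(az, el, 2.0) for az, el in learned]
```

Bounding the offsets with `tanh` keeps every view within a fixed range of its initial position while remaining differentiable, which is what end-to-end training needs.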
Automatic medical image classification is an important field in which AI has the potential to make a real social impact. However, many challenges still stand in the way of practically effective solutions. One of them is that most medical imaging datasets suffer from class imbalance. As a result, existing AI techniques, particularly deep learning methods based on neural networks, often perform poorly in such scenarios. This makes the problem an interesting and active research focus. In this study, we propose a novel loss function for training neural network models that mitigates this critical issue. Through rigorous experiments on three independently collected datasets from three different medical imaging domains, we empirically show that our proposed loss function consistently performs well. It improves macro F1 by between 2% and 10% compared with the baseline models. We hope that our work will spur new research toward a more generalized approach to medical image classification.
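The gains reported in the abstract above are measured in macro F1. This metric averages per-class F1 scores with equal weight, so it exposes models that ignore minority classes. The abstract does not describe the proposed loss itself. The sketch below therefore implements only the metric, in plain Python, and `macro_f1` is a hypothetical helper name.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Imbalanced toy labels: 8 majority-class examples, 2 minority-class examples.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10  # a model that always predicts the majority class
score = macro_f1(y_true, y_pred)
```

In this example, the majority-only model reaches 80% accuracy but only 4/9 ≈ 0.444 macro F1. That gap is why macro F1 is a natural yardstick for imbalanced medical datasets.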
Recent work has identified noisy and misannotated data as a core cause of hallucinations and unfaithful outputs in Natural Language Generation (NLG) tasks. Consequently, identifying and removing these examples is a key open challenge in creating reliable NLG systems. In this work, we introduce a framework to identify and remove low-quality training instances that lead to undesirable outputs, such as faithfulness errors in text summarization. We show that existing approaches for error tracing, such as gradient-based influence measures, do not perform reliably for detecting faithfulness errors in summarization. We overcome the drawbacks of existing error tracing methods through a new, contrast-based estimate that compares undesired generations to human-corrected outputs. Our proposed method achieves a mean average precision of 0.91 across synthetic tasks with known ground truth. It also achieves a two-fold reduction in hallucinations in a real entity hallucination evaluation on the NYT dataset.
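Mean average precision is the figure the abstract above reports for its synthetic tasks. Assuming it is computed in the usual way for error tracing, each method scores every training instance and the instances are sorted by score. The known-bad (synthetically corrupted) instances then count as the relevant items. The sketch below implements only the standard MAP computation; the paper's contrast-based score is not reproduced, and the function names are hypothetical.

```python
def rank_by_score(scores):
    """Order instance ids from most to least suspicious (highest score first)."""
    return sorted(scores, key=scores.get, reverse=True)


def average_precision(ranked_ids, relevant):
    """Sum of precision@k at each rank k holding a relevant id, over len(relevant)."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(tasks):
    """tasks: list of (scores, relevant_ids) pairs, one per synthetic task."""
    return sum(average_precision(rank_by_score(s), rel) for s, rel in tasks) / len(tasks)


# Two toy tasks with made-up scores; relevant ids are the known-noisy instances.
tasks = [
    ({"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.1}, {"a", "c"}),
    ({"x": 0.8, "y": 0.2}, {"y"}),
]
map_score = mean_average_precision(tasks)
```

Here the first task scores 5/6 (hits at ranks 1 and 3) and the second scores 1/2, for a MAP of 2/3.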
Pretrained language models (PLMs) often fail to fairly represent target users from certain world regions because those regions are under-represented in training datasets. With recent PLMs trained on enormous data sources, quantifying their potential biases is difficult, due to their black-box nature and the sheer scale of the data sources. In this work, we devise an approach to study the geographic bias (and knowledge) present in PLMs. We propose a Geographic-Representation Probing Framework that adopts a self-conditioning method coupled with entity-country mappings. Our findings suggest that PLMs' representations map surprisingly well to the physical world in terms of country-to-country associations, but that this knowledge is unequally shared across languages. Finally, we explain how large PLMs, despite exhibiting notions of geographical proximity, over-amplify geopolitical favouritism at inference time.
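As a rough illustration of what an entity-country mapping makes possible, the sketch below aggregates a list of entities into per-country shares. The entities might be ones a model generates or ranks highly for a prompt. Shares like these could then be compared across languages or against expected distributions. The mapping, the entity list, and the `country_shares` helper are hypothetical. This is not the paper's Geographic-Representation Probing Framework or its self-conditioning method.

```python
from collections import Counter


def country_shares(entities, entity_to_country):
    """Fraction of mapped entities attributed to each country (unmapped ones are skipped)."""
    counts = Counter(entity_to_country[e] for e in entities if e in entity_to_country)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()} if total else {}


# Hypothetical mapping and model outputs.
mapping = {"Paris": "FR", "Lyon": "FR", "Tokyo": "JP"}
shares = country_shares(["Paris", "Lyon", "Tokyo", "Atlantis"], mapping)
```

Entities missing from the mapping (here "Atlantis") are ignored, so the shares describe only the entities that can be placed on the map.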
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
Reliable forecasting of traffic flow requires efficient modeling of traffic data. A dynamic traffic network gives rise to varied correlations and influences, which makes modeling a complicated task. The existing literature has proposed many methods to capture the complex underlying spatial-temporal relations of traffic networks. However, these methods still struggle to capture long-range local and global dependencies. Moreover, as increasingly sophisticated methods are proposed, models are becoming memory-heavy and thus unsuitable for low-powered devices. In this paper, we address these problems by proposing a novel deep learning framework, STLGRU. Specifically, STLGRU effectively captures both local and global spatial-temporal relations of a traffic network using memory-augmented attention and a gating mechanism. Instead of employing separate temporal and spatial components, we show that our memory module and gated unit can successfully learn the spatial-temporal dependencies, reducing memory usage with fewer parameters. Extensive experiments on several real-world traffic prediction datasets show that our model outperforms existing methods while keeping a lower memory footprint. Code is available at https://github.com/Kishor-Bhaumik/STLGRU.
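The abstract above names two ingredients: memory-augmented attention and a gating mechanism. Both can be illustrated with generic building blocks. In the sketch below, a query reads from a bank of memory slots via scaled dot-product attention. A GRU-style gate then interpolates element-wise between a previous hidden state and a candidate. This is a textbook illustration of each technique, not STLGRU's actual architecture, and all names are hypothetical. In a trained model, the queries, memory slots, candidates, and gate logits would come from learned parameters.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query, memory):
    """Scaled dot-product attention of one query vector over a list of memory slots."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * s for q, s in zip(query, slot)) / scale for slot in memory])
    dim = len(memory[0])
    return [sum(w * slot[d] for w, slot in zip(weights, memory)) for d in range(dim)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_update(hidden, candidate, gate_logits):
    """Element-wise GRU-style update: z * candidate + (1 - z) * hidden, z = sigmoid(gate)."""
    out = []
    for h, c, g in zip(hidden, candidate, gate_logits):
        z = sigmoid(g)
        out.append(z * c + (1.0 - z) * h)
    return out


memory = [[1.0, 0.0], [0.0, 1.0]]           # two memory slots (made-up values)
read = memory_read([10.0, 0.0], memory)      # mostly reads the first slot
new_hidden = gated_update([1.0, 1.0], read, [0.0, 0.0])
```

Because one gated unit updates the state from a memory read, a single module can stand in for separate spatial and temporal components. That design is the one the abstract credits for the parameter savings.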
This paper introduces the shared task of summarizing documents in several creative domains, namely literary texts, movie scripts, and television scripts. Summarizing these creative documents requires complex literary interpretation as well as an understanding of non-trivial temporal dependencies in texts with varied styles of plot development and narrative structure. This poses unique challenges and remains underexplored for text summarization systems. In this shared task, we introduce four sub-tasks and their corresponding datasets, focusing on summarizing books, movie scripts, primetime television scripts, and daytime soap opera scripts. We detail the process of curating these datasets for the task, as well as the metrics used to evaluate the submissions. As part of the CREATIVESUMM workshop at COLING 2022, the shared task attracted 18 submissions in total. In this paper, we discuss the submissions and the baselines for each sub-task, along with directions for facilitating future work in the field.