Whether based on models, training data or a combination, classifiers place (possibly complex) input data into one of a relatively small number of output categories. In this paper, we study the structure of the boundary--those points for which a neighbor is classified differently--in the context of an input space that is a graph, so that there is a concept of neighboring inputs. The scientific setting is a model-based naive Bayes classifier for DNA reads produced by Next Generation Sequencers. We show that the boundary is both large and complicated in structure. We create a new measure of uncertainty, called Neighbor Similarity, that compares the result for a point to the distribution of results for its neighbors. This measure not only tracks two inherent uncertainty measures for the Bayes classifier, but also can be implemented, at a computational cost, for classifiers without inherent measures of uncertainty.
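As a rough illustration of the Neighbor Similarity idea, the sketch below treats two DNA reads as neighbors when they differ at exactly one base, and scores a read by the fraction of its neighbors that receive the same label. Both the neighbor definition and the scoring rule are assumptions made for this example, not the paper's definitions, and `toy_classify` is a hypothetical stand-in for the naive Bayes classifier.

```python
from collections import Counter

BASES = "ACGT"


def neighbors(read):
    """Yield every read that differs from `read` at exactly one position."""
    for i, base in enumerate(read):
        for other in BASES:
            if other != base:
                yield read[:i] + other + read[i + 1:]


def neighbor_similarity(read, classify):
    """Fraction of one-substitution neighbors labeled the same as `read`.

    Assumption: this is one simple way to compare a point's result with
    the distribution of its neighbors' results; the paper may differ.
    """
    label = classify(read)
    counts = Counter(classify(n) for n in neighbors(read))
    return counts[label] / sum(counts.values())


def toy_classify(read):
    """Hypothetical classifier: label a read by its GC content."""
    gc = sum(base in "GC" for base in read)
    return "high_gc" if 2 * gc > len(read) else "low_gc"
```

A score near 1 means the read sits well inside its class, while a lower score means it lies on or near the boundary. The cost is one classifier call per neighbor, which is the computational price the abstract mentions.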
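In this paper, we model the structure of a genome as a second-order Markov process specified by the distribution of consecutive triplets of bases. Using real coronavirus and adenovirus data, we apply this model to identifying outliers in genome databases and to classifying reads.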
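A second-order Markov model of this kind can be sketched as follows: estimate the distribution of consecutive base triplets, derive the probability of each base given the two preceding bases, and score a read by its log-likelihood. The function names and the pseudocount smoothing are assumptions introduced for this example and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]


def triplet_distribution(genome, pseudocount=1.0):
    """Smoothed distribution over the 64 triplets seen in `genome`.

    Triplets containing characters other than A, C, G, T are ignored.
    """
    counts = Counter(genome[i:i + 3] for i in range(len(genome) - 2))
    total = sum(counts[t] for t in TRIPLETS) + pseudocount * len(TRIPLETS)
    return {t: (counts[t] + pseudocount) / total for t in TRIPLETS}


def transition_probability(dist, pair, base):
    """P(base | previous two bases), derived from the triplet distribution."""
    denom = sum(dist[pair + b] for b in BASES)
    return dist[pair + base] / denom


def read_log_likelihood(dist, read):
    """Log-likelihood of a read from its third base on, given its first two."""
    return sum(
        math.log(transition_probability(dist, read[i - 2:i], read[i]))
        for i in range(2, len(read))
    )
```

Under these assumptions, a read could be assigned to whichever genome's model gives it the highest log-likelihood, and a genome whose triplet distribution lies far from the others in a database could be flagged as a candidate outlier.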
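We propose and apply a new paradigm for characterizing genomic data quality, one that quantifies the effect of deliberately degrading quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effect of degradation. We demonstrate that this phenomenon is ubiquitous and that quantified measures of degradation can be used for multiple purposes. We focus on identifying outliers that may be problematic with respect to data quality, but that may also be true anomalies or even attempts to subvert the database.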
Calculating an Air Quality Index (AQI) typically uses data streams from air quality sensors deployed at fixed locations, and the calculation is a real-time process. If one or more sensors are broken or offline, then the real-time AQI value cannot be computed. Estimating AQI values for some point in the future is a predictive process and uses historical AQI values to train and build models. In this work we focus on gap filling in air quality data, where the task is to predict the AQI at 1, 5 and 7 days into the future. We consider the scenario in which one or more air, weather and traffic sensors are offline, and we explore prediction accuracy under those conditions. The work is part of the MediaEval'2022 Urban Air: Urban Life and Air Pollution task submitted by the DCU-Insight-AQ team and uses multimodal and crossmodal data consisting of AQI, weather and CCTV traffic images for air pollution prediction.
As part of the MediaEval 2022 Predicting Video Memorability task we explore the relationship between visual memorability, the visual representation that characterises it, and the underlying concept portrayed by that visual representation. We achieve state-of-the-art memorability prediction performance with a model trained and tested exclusively on surrogate dream images, elevating concepts to the status of a cornerstone memorability feature, and finding strong evidence to suggest that the intrinsic memorability of visual content can be distilled to its underlying concept or meaning irrespective of its specific visual representation.
This paper describes the 5th edition of the Predicting Video Memorability Task as part of MediaEval2022. This year we have reorganised and simplified the task in order to lubricate a greater depth of inquiry. Similar to last year, two datasets are provided in order to facilitate generalisation; however, this year we have replaced the TRECVid2019 Video-to-Text dataset with the VideoMem dataset in order to remedy underlying data quality issues, and to prioritise short-term memorability prediction by elevating the Memento10k dataset as the primary dataset. Additionally, a fully fledged electroencephalography (EEG)-based prediction sub-task is introduced. In this paper, we outline the core facets of the task and its constituent sub-tasks, describing the datasets, evaluation metrics, and requirements for participant submissions.
Recent developments in in-situ monitoring and process control in Additive Manufacturing (AM), also known as 3D-printing, allow the collection of large amounts of emission data during the build process of the parts being manufactured. This data can be used as input into 3D and 2D representations of the 3D-printed parts. However, the analysis, use and characterization of this data remains a manual process. The aim of this paper is to propose an adaptive human-in-the-loop approach using Machine Learning techniques that automatically inspect and annotate the emissions data generated during the AM process. More specifically, this paper looks at two scenarios: firstly, using convolutional neural networks (CNNs) to automatically inspect and classify emission data collected by in-situ monitoring, and secondly, applying Active Learning techniques to the developed classification model to construct a human-in-the-loop mechanism that accelerates the labeling of the emission data. The CNN-based approach relies on transfer learning and fine-tuning, which makes the approach applicable to other industrial image patterns. The adaptive nature of the approach is enabled by an uncertainty sampling strategy that automatically selects the samples to be presented to human experts for annotation.
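The uncertainty sampling step can be sketched as below. This minimal example assumes the classifier outputs a class-probability vector per sample and ranks samples by predictive entropy, which is one common uncertainty sampling criterion; the abstract does not state which criterion the authors use, and the function names here are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, k):
    """Indices of the k samples with the highest predictive entropy.

    `predictions` is a list of per-sample class-probability vectors,
    for example the softmax outputs of a CNN on unlabeled images.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]
```

In a human-in-the-loop cycle of this kind, the selected samples would be sent to an expert for labeling, added to the training set, and the model fine-tuned again before the next selection round.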
The Predicting Media Memorability task in the MediaEval evaluation campaign has been running annually since 2018 and several different tasks and data sets have been used in this time. This has allowed us to compare the performance of many memorability prediction techniques on the same data and in a reproducible way and to refine and improve on those techniques. The resources created to compute media memorability are now being used by researchers well beyond the actual evaluation campaign. In this paper we present a summary of the task, including the collective lessons we have learned for the research community.
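We investigate the memorability of a 5-season span of the popular crime-drama TV series CSI by applying a vision transformer fine-tuned on the task of predicting video memorability. By investigating the popular genre of crime-drama TV through a detailed annotated corpus combined with video memorability scores, we show how to extrapolate meaning from the memorability scores generated on video shots. We perform a quantitative analysis to relate video shot memorability to a variety of aspects of the show. The insights we present in this paper illustrate the importance of video memorability in applications that use multimedia in areas such as education, marketing and indexing, as well as, in the case here, TV and film production.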
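Although computer vision models built using self-supervised approaches are now commonplace, some important questions remain. Do self-supervised models learn highly redundant channel features? What if a self-supervised network could dynamically select the important channels and get rid of the unnecessary ones? Currently, convnets pre-trained with self-supervision obtain performance on downstream tasks comparable to that of their supervised counterparts in computer vision. However, self-supervised models have drawbacks, including their large numbers of parameters, computationally expensive training strategies and a clear need for faster inference on downstream tasks. In this work, our goal is to address the latter by studying how a standard channel selection method developed for supervised learning can be applied to networks trained with self-supervision. We validate our findings on a range of target budgets $t_{d}$ for channel computation on image classification tasks across different datasets, specifically CIFAR-10, CIFAR-100 and ImageNet-100, obtaining performance comparable to that of the original network when selecting all channels, but at a significant reduction in computation reported in terms of FLOPs.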