The estimation of the generalization error of classifiers often relies on a validation set. Such a set is hardly available in few-shot learning scenarios, a largely overlooked shortcoming in the field. In these scenarios, it is common to rely on features extracted from pre-trained neural networks combined with distance-based classifiers such as nearest class mean. In this work, we introduce a Gaussian model of the feature distribution. By estimating the parameters of this model, we are able to predict the generalization error on new classification tasks with few samples. We observe that accurate distance estimates between class-conditional densities are the key to accurate estimates of the generalization performance. Therefore, we propose an unbiased estimator for these distances and integrate it in our numerical analysis. We show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy in few-shot settings.
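As context for the distance-based classifiers the abstract mentions, here is a minimal nearest class mean (NCM) classifier in NumPy. This is a generic sketch of the technique, not the paper's implementation, and it does not reproduce the unbiased distance estimator the authors propose.

```python
import numpy as np

def nearest_class_mean(train_feats, train_labels, test_feats):
    """Assign each test feature to the class whose mean feature vector
    is closest in squared Euclidean distance."""
    classes = np.unique(train_labels)
    means = np.stack([train_feats[train_labels == c].mean(axis=0)
                      for c in classes])
    # (n_test, n_classes) matrix of squared distances to each class mean
    dists = ((test_feats[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]
```

With features from a pre-trained backbone, this is the whole classifier; the paper's contribution is predicting how well it will generalize from the estimated class-conditional Gaussians.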
We consider a novel formulation, the problem of Active Few-Shot Classification (AFSC), whose goal is to classify a small, initially unlabeled dataset under a very restrictive labeling budget. This problem can be seen as a rival paradigm to classical Transductive Few-Shot Classification (TFSC), since both approaches apply under similar conditions. We first propose a method combining statistical inference with an original two-tier active learning strategy well suited to this framework. We then adapt several standard vision benchmarks from the TFSC field. Our experiments show that the potential benefits of AFSC can be substantial, with average weighted accuracy up to 10% higher than state-of-the-art TFSC methods for the same labeling budget. We believe this new paradigm could lead to new developments and standards in data-scarce learning settings.
Few-shot transfer learning is attracting growing interest, given the cost of data annotation and the accuracy gains offered by even a few labeled samples. In few-shot classification (FSC) in particular, recent works have explored feature distributions aiming to maximize the likelihood or the posterior with respect to the unknown parameters. Following this vein, and drawing a parallel between FSC and clustering, we seek to better account for the estimation uncertainty caused by the lack of data, as well as for the statistical properties of the clusters associated with each class. In this paper, we therefore propose a new clustering method based on variational Bayesian inference, further improved by adaptive dimensionality reduction based on probabilistic linear discriminant analysis. When applied to features used in previous studies, our proposed method significantly improves accuracy in the realistic imbalanced transductive setting on various few-shot benchmarks, with gains of up to $6\%$ in accuracy. In addition, when applied to balanced settings, we obtain very competitive results without resorting to the class-balance artifice, which is unsuited to practical use cases. We also report the performance of our method with high-performing backbones, whose reported results further surpass the current state-of-the-art accuracy, suggesting the generality of the method.
Labeling classification datasets means defining classes and associated coarse labels, which may approximate a smoother and more complex ground truth. For example, natural images may contain multiple objects, of which only one is labeled in many vision datasets, or the labels may result from the discretization of a regression problem. Training classification models with cross-entropy on such coarse labels may roughly shape the feature space, possibly neglecting the most meaningful features and, in particular, losing information relevant to the underlying fine-grained task. In this paper, we are interested in the problem of solving fine-grained classification or regression with models trained only on coarse-grained labels. We show that standard cross-entropy can lead to overfitting on coarse-correlated features. We introduce an entropy-based regularization to promote more diversity in the feature space of trained models, and we empirically demonstrate the efficacy of this approach for improving performance on fine-grained problems. Our results are supported by both theoretical developments and empirical validation.
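One plausible instantiation of an entropy-based regularizer on the feature space: a cross-entropy loss on the coarse labels, minus a bonus for high entropy in the softmax-normalized feature activations. The choice of softmax over features, the form of the bonus, and the weight `lam` are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def coarse_loss_with_entropy_reg(logits, coarse_labels, feats, lam=0.1):
    """Cross-entropy on coarse labels, minus an entropy bonus that
    rewards diversity in the normalized feature activations.
    A hypothetical instantiation of the idea in the abstract."""
    p = softmax(logits)
    ce = -np.log(p[np.arange(len(coarse_labels)), coarse_labels] + 1e-12).mean()
    q = softmax(feats)
    ent = -(q * np.log(q + 1e-12)).sum(axis=1).mean()
    return ce - lam * ent
```

Under this loss, two feature vectors that yield the same coarse prediction but spread mass over more dimensions are preferred, which is the diversity effect the abstract describes.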
Deep neural networks are the state of the art in many computer vision tasks. Their deployment in the context of autonomous vehicles is of particular interest, since limitations on energy consumption prohibit the use of the very large networks that typically reach the best performance. A common way to reduce the complexity of these architectures without sacrificing accuracy is pruning, in which the least important parts are eliminated. There is a large body of literature on the subject, but, interestingly, few works measure the actual impact of pruning on energy. In this work, we are interested in measuring it in the specific context of semantic segmentation, using the Cityscapes dataset. To this end, we analyze the impact of recently proposed structured pruning methods when the trained architectures are deployed on a Jetson Xavier embedded GPU.
Structured pruning is a popular method for reducing the cost of convolutional neural networks, which are the state of the art in many computer vision tasks. However, depending on the architecture, pruning introduces dimensional discrepancies that prevent the pruned network from actually being reduced. To address this problem, we propose a method able to take any structured pruning mask and generate a network that does not suffer from these discrepancies and can be exploited efficiently. We provide an accurate description of our solution and report the gains, in terms of energy consumption and inference time on embedded hardware, of pruned convolutional neural networks.
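To illustrate the dimensionality issue, consider two consecutive linear layers: dropping output channels of the first layer only yields a real reduction if the matching input channels of the second layer are dropped too. A minimal sketch of that bookkeeping follows; the paper addresses the harder case of arbitrary masks in convolutional architectures with skip connections.

```python
import numpy as np

def prune_chain(w1, w2, keep_mask):
    """Drop output channels of w1 (rows) and the matching input
    channels of w2 (columns) so the pruned layers still compose."""
    return w1[keep_mask, :], w2[:, keep_mask]
```

If the dropped channels of `w1` carry only zero weights, the pruned chain computes exactly the same function with fewer parameters, which is the reduction structured pruning is meant to deliver.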
Mixup is a data-dependent regularization technique that consists of linearly interpolating input samples and their associated outputs. It has been shown to improve accuracy when used to train on standard machine learning datasets. However, authors have pointed out that mixup can produce out-of-distribution virtual samples, or even contradictory ones, in the augmented training set, potentially leading to adversarial effects. In this paper, we introduce Local Mixup, in which distant input samples are down-weighted when computing the loss. In constrained settings, we prove that Local Mixup creates a trade-off between bias and variance, with the extreme cases reducing to vanilla training and classical mixup. Using standardized computer vision benchmarks, we also show that Local Mixup can improve test accuracy.
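A sketch of the idea behind Local Mixup: mix random pairs as in classical mixup, but attach to each pair a loss weight that decays with the distance between the two inputs, so distant (potentially contradictory) mixtures contribute less. The `exp(-d / tau)` weighting is an illustrative choice, not necessarily the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_mixup_batch(x, y, alpha=1.0, tau=1.0):
    """Mix each sample with a random partner; return mixed inputs,
    mixed targets, and a per-pair loss weight that decays with the
    distance between the two original inputs."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    d = np.linalg.norm(x - x[perm], axis=1)
    w = np.exp(-d / tau)  # distant pairs contribute less to the loss
    return x_mix, y_mix, w
```

With `tau -> infinity` all weights approach 1 and classical mixup is recovered; very small `tau` suppresses all mixed pairs except near-duplicates, approaching vanilla training, which mirrors the bias/variance trade-off the abstract describes.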
Charisma is considered one's ability to attract and potentially also influence others. Clearly, there can be considerable interest from an artificial intelligence's (AI) perspective to provide it with such skill. Beyond that, a plethora of use cases opens up for computational measurement of human charisma, such as for tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. A number of models exist that base charisma on various dimensions, often following the idea that charisma is given if someone could and would help others. Examples include influence (could help) and affability (would help) in scientific studies, or power (could help), presence, and warmth (both would help) as a popular concept. Modelling high levels in these dimensions for humanoid robots or virtual agents seems accomplishable. Moreover, automatic measurement appears quite feasible with the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we therefore present a blueprint for building machines that can appear charismatic, but also analyse the charisma of others. To this end, we first provide the psychological perspective, including different models of charisma and behavioural cues of it. We then switch to conversational charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behaviour by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.
There are two important things in science: (A) Finding answers to given questions, and (B) Coming up with good questions. Our artificial scientists not only learn to answer given questions, but also continually invent new questions, by proposing hypotheses to be verified or falsified through potentially complex and time-consuming experiments, including thought experiments akin to those of mathematicians. While an artificial scientist expands its knowledge, it remains biased towards the simplest, least costly experiments that still have surprising outcomes, until they become boring. We present an empirical analysis of the automatic generation of interesting experiments. In the first setting, we investigate self-invented experiments in a reinforcement-providing environment and show that they lead to effective exploration. In the second setting, pure thought experiments are implemented as the weights of recurrent neural networks generated by a neural experiment generator. Initially interesting thought experiments may become boring over time.
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of such approaches to addressing the COD has led us to solving high-dimensional PDEs. This has opened doors to solving a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. Tackling these shortcomings, Tensor Neural Networks (TNN) demonstrate that they can provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN). In addition, we show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
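The parameter savings can be illustrated with a simplified stand-in: replacing an m x n dense weight matrix with a rank-r factorization W ≈ A·B. Actual TNN layers use richer tensor-network decompositions, so the counts below only convey the order of magnitude of the effect.

```python
def dense_params(m, n):
    """Weights plus bias of a dense m -> n layer."""
    return m * n + n

def factored_params(m, n, r):
    """Weights plus bias when the m x n matrix is replaced by
    A (m x r) times B (r x n) -- a toy stand-in for a TNN layer."""
    return m * r + r * n + n
```

For a 512-to-512 layer at rank 16, this shrinks the parameter count from 262,656 to 16,896, roughly a 15x saving, at the cost of restricting the layer to low-rank weight matrices.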