Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part because of the radically different development and deployment profile of modern ML methods and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To this end, we describe a new conference, SysML, that explicitly targets research at the intersection of systems and machine learning, with a program committee split evenly between experts in systems and in ML.
Over the past decade, with the development of deep convolutional neural networks (CNNs), many state-of-the-art image classification and audio classification algorithms have achieved remarkable success. However, most of this work exploits only a single type of training data. In this paper, we study bird species classification by applying CNNs to a combination of visual (image) and audio (sound) data, a setting that has so far received little attention. Specifically, we propose CNN-based multimodal learning models with three types of fusion strategies (early, mid, and late fusion) to address the problem of combining training data across domains. The advantage of our proposed method is that CNNs can not only extract features from image and audio data (spectrograms), but also combine features across modalities. In experiments, we train and evaluate the network architectures on the standard CUB-200-2011 dataset combined with an audio dataset of the corresponding species that we collected ourselves. We observe that models exploiting the combination of both data types outperform models trained on either type alone. We also show that transfer learning can significantly improve classification performance.
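A minimal PyTorch sketch of the late-fusion variant described above, assuming two small CNN branches (one for bird images, one for audio spectrograms) whose pooled features are concatenated before a shared classifier; all layer sizes and names are illustrative, not the authors' exact architecture. Early fusion would instead combine the raw inputs or low-level feature maps, and mid fusion would merge intermediate feature maps.

```python
import torch
import torch.nn as nn

class LateFusionBirdClassifier(nn.Module):
    """Illustrative late-fusion model: separate CNN branches for images and
    spectrograms, with features concatenated and fed to a joint classifier."""

    def __init__(self, num_species=200):
        super().__init__()
        # Image branch (RGB input).
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Audio branch (single-channel spectrogram input).
        self.audio_branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Joint classifier over the concatenated modality features.
        self.classifier = nn.Linear(64 + 64, num_species)

    def forward(self, image, spectrogram):
        img_feat = self.image_branch(image)        # (B, 64)
        aud_feat = self.audio_branch(spectrogram)  # (B, 64)
        fused = torch.cat([img_feat, aud_feat], dim=1)
        return self.classifier(fused)

# Example forward pass with dummy data.
model = LateFusionBirdClassifier(num_species=200)
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 1, 128, 128))
print(logits.shape)  # torch.Size([4, 200])
```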
Bird classification has attracted increasing attention in computer vision because of its promising applications in biology and environmental studies. Recognizing bird species is difficult due to the challenges of precise region localization and fine-grained feature learning. In this paper, we introduce a multi-stage training approach based on transfer learning. We use a pretrained Mask-RCNN and an ensemble model consisting of Inception networks (InceptionV3 and InceptionResNetV2) to obtain, respectively, the localization and the species of the bird in an image. Our final model achieves an F1 score of 0.5567 (55.67%) on the dataset provided in the CVIP2018 Challenge.
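A hedged Python sketch of such a two-stage pipeline, assuming a recent torchvision: a COCO-pretrained Mask R-CNN localizes the bird, the crop is classified by an ensemble whose members here are inception_v3 plus resnet50 standing in for the paper's InceptionV3/InceptionResNetV2 pair. Thresholds and the confidence cut-off are illustrative.

```python
import torch
import torchvision
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import normalize, resize, resized_crop

# Stage 1: pretrained Mask R-CNN localizes the bird (COCO category 16, "bird").
detector = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

# Stage 2: ensemble of classifiers; these are stand-ins for the paper's
# InceptionV3 / InceptionResNetV2 pair.
clf_a = torchvision.models.inception_v3(weights="DEFAULT").eval()
clf_b = torchvision.models.resnet50(weights="DEFAULT").eval()

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

@torch.no_grad()
def classify_bird(image):
    """image: float tensor (3, H, W) with values in [0, 1]."""
    detections = detector([image])[0]
    # Keep the highest-scoring "bird" detection, if any.
    keep = (detections["labels"] == 16) & (detections["scores"] > 0.5)
    if keep.any():
        x1, y1, x2, y2 = detections["boxes"][keep][0].round().int().tolist()
        crop = resized_crop(image, y1, x1, y2 - y1, x2 - x1, [299, 299])
    else:
        crop = resize(image, [299, 299])
    crop = normalize(crop, IMAGENET_MEAN, IMAGENET_STD).unsqueeze(0)
    # Ensemble by averaging the softmax probabilities of both classifiers.
    probs = (clf_a(crop).softmax(1) + clf_b(crop).softmax(1)) / 2
    return probs.argmax(1).item()

print(classify_bird(torch.rand(3, 480, 640)))
```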
We evaluate the effectiveness of an automatic bird sound recognition system in conditions that emulate a realistic, typical application. We train classification algorithms on a crowd-sourced collection of audio recordings and restrict our training approach to require no manual intervention, so the method is directly applicable to the analysis of multi-species collections in which the labels are provided by crowd-sourced annotation. We evaluate the performance of the bird sound recognition system under realistic conditions and with a realistic number of candidate classes. We study two canonical classification methods, chosen for their widespread use and ease of interpretation: a k-nearest-neighbour (kNN) classifier with histogram-based features and a support vector machine (SVM) with time-summarisation features. We further investigate a certainty measure, derived from the output probabilities of the classifiers, to enhance the interpretability and reliability of class decisions. Our results show that both recognition methods achieve similar performance, although we argue that the kNN classifier offers more flexibility. Moreover, we show that the certainty measure provides a valuable and consistent indicator of the reliability of classification results. Our use of generic training data and our investigation of probabilistic classification methods that flexibly handle the variable number of candidate species/classes expected in the field contribute directly to the development of a practical bird sound recognition system with potentially global application. Furthermore, we show that certainty measures associated with recognition results can significantly improve the practical usability of the overall system.
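A short scikit-learn sketch of the kNN side of this setup, assuming histogram-style features as a stand-in for the paper's actual feature extraction; the margin between the top two output probabilities is used here as one possible certainty measure derived from the classifier's probabilities, not necessarily the paper's exact definition.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-in features: per-recording histograms of frame-level spectral
# measurements (the paper's real feature extraction is richer than this).
n_train, n_test, n_bins, n_species = 400, 10, 32, 20
X_train = rng.random((n_train, n_bins))
y_train = rng.integers(0, n_species, n_train)
X_test = rng.random((n_test, n_bins))

knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, y_train)

# Class probabilities come from the neighbour vote fractions; a simple
# certainty measure is the margin between the top two probabilities.
probs = knn.predict_proba(X_test)
top2 = np.sort(probs, axis=1)[:, -2:]
certainty = top2[:, 1] - top2[:, 0]
predictions = knn.classes_[np.argmax(probs, axis=1)]

for species, c in zip(predictions, certainty):
    flag = "accept" if c > 0.3 else "flag for review"
    print(f"predicted species {species} (certainty {c:.2f}) -> {flag}")
```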
Fine-grained image recognition is a challenging computer vision problem, due to the small inter-class variations caused by highly similar subordinate categories and the large intra-class variations in poses, scales and rotations. In this paper, we show that selecting useful deep descriptors contributes substantially to fine-grained image recognition. Specifically, a novel Mask-CNN model without fully connected layers is proposed. Based on the part annotations, the proposed model consists of a fully convolutional network that both locates the discriminative parts (e.g., head and torso) and, more importantly, generates weighted object/part masks for selecting useful and meaningful convolutional descriptors. After that, a three-stream Mask-CNN model is built to aggregate the selected object- and part-level descriptors simultaneously. Thanks to discarding the parameter-redundant fully connected layers, our Mask-CNN has a small feature dimensionality and efficient inference speed compared with other fine-grained approaches. Furthermore, we obtain a new state-of-the-art accuracy on two challenging fine-grained bird species categorization datasets, which validates the effectiveness of both the descriptor selection scheme and the proposed Mask-CNN model.
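A minimal PyTorch sketch of the descriptor-selection idea: convolutional descriptors are weighted by an object/part mask and aggregated by average- and max-pooling, with one pooled vector per stream concatenated for classification. This is an illustration of the general technique, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def masked_pool(feature_map, mask):
    """Select convolutional descriptors under a weighted object/part mask and
    aggregate them by average- and max-pooling.

    feature_map: (C, H, W) activations from a fully convolutional network.
    mask:        (h, w) weighted object or part mask with values in [0, 1].
    """
    C, H, W = feature_map.shape
    # Resize the mask to the spatial resolution of the feature map.
    mask = F.interpolate(mask[None, None], size=(H, W), mode="bilinear",
                         align_corners=False)[0, 0]
    weighted = feature_map * mask             # suppress background descriptors
    avg = weighted.sum(dim=(1, 2)) / mask.sum().clamp(min=1e-6)
    mx = weighted.flatten(1).max(dim=1).values
    return torch.cat([avg, mx])               # (2C,) descriptor for this stream

# Three streams (whole object, head, torso) concatenated for the classifier.
fmap = torch.randn(512, 14, 14)
object_mask = torch.rand(224, 224)
head_mask = torch.rand(224, 224)
torso_mask = torch.rand(224, 224)
feature = torch.cat([masked_pool(fmap, m)
                     for m in (object_mask, head_mask, torso_mask)])
print(feature.shape)  # torch.Size([3072])
```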
Bird sounds possess a distinctive spectral structure which may exhibit small shifts in spectrum depending on the bird species and environmental conditions. In this paper, we propose using convolutional recurrent neural networks for automated bird audio detection in real-life environments. In the proposed method, convolutional layers extract high-dimensional, local frequency-shift-invariant features, while recurrent layers capture longer-term dependencies between the features extracted from short time frames. This method achieves an 88.5% area under the ROC curve (AUC) score on the unseen evaluation data and obtained second place in the Bird Audio Detection challenge.
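A compact PyTorch sketch of a convolutional recurrent detector in this spirit: convolutional layers pool over frequency to gain shift invariance while preserving the time axis, and a GRU models longer-term temporal structure before a clip-level sigmoid output. Layer sizes and pooling choices are illustrative assumptions, not the challenge submission's configuration.

```python
import torch
import torch.nn as nn

class CRNNBirdDetector(nn.Module):
    """Sketch of a convolutional recurrent network for binary bird audio
    detection on log-mel spectrogram input."""

    def __init__(self, n_mels=40):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),   # pool over frequency only, keep time steps
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(input_size=64 * (n_mels // 4), hidden_size=64,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, 1)

    def forward(self, spectrogram):
        # spectrogram: (B, 1, n_mels, T)
        x = self.conv(spectrogram)                         # (B, 64, n_mels/4, T)
        B, C, Freq, T = x.shape
        x = x.permute(0, 3, 1, 2).reshape(B, T, C * Freq)  # time-major sequence
        x, _ = self.gru(x)                                 # (B, T, 128)
        return torch.sigmoid(self.head(x.mean(dim=1)))     # clip-level probability

model = CRNNBirdDetector(n_mels=40)
print(model(torch.randn(2, 1, 40, 500)).shape)  # torch.Size([2, 1])
```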
Highlights: A set of acoustic indices is selected to characterise one-minute audio clips. Three multi-label classifiers are compared in detecting five acoustic patterns. Multi-label classifiers have the advantage of classifying concomitant classes. Classification methods improve the efficiency of bird species surveys. The proposed methods are resilient to different weather conditions. Abstract: Acoustics is a rich source of environmental information that can reflect ecological dynamics. To deal with escalating volumes of acoustic data, a variety of automated classification techniques have been used for acoustic pattern or scene recognition, including urban soundscapes such as streets and restaurants, and natural soundscapes such as rain and thunder. It is common to classify acoustic patterns under the assumption that a single type of soundscape is present in an audio clip. This assumption is reasonable for some carefully selected audio clips; however, few experiments have focused on classifying simultaneous acoustic patterns in long-duration recordings. This paper proposes a binary-relevance-based multi-label classification approach to recognise simultaneous acoustic patterns in one-minute audio clips. By using acoustic indices as global features and a multilayer perceptron as the base classifier, we achieve good classification performance on in-the-field data. Compared with single-label classification, the multi-label approach provides more detailed information about the distribution of various acoustic patterns in long-duration recordings. These results will benefit further biodiversity investigations, such as bird species surveys.
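A scikit-learn sketch of binary-relevance multi-label classification with a multilayer perceptron base classifier, as described above; the acoustic-index features, label names, and layer sizes are placeholder assumptions rather than the paper's actual indices.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Stand-in data: each row holds acoustic indices summarising one one-minute
# clip; each clip may contain several acoustic patterns at once.
patterns = ["birds", "insects", "rain", "wind", "quiet"]
n_clips, n_indices = 300, 12
X = rng.random((n_clips, n_indices))
Y = rng.integers(0, 2, (n_clips, len(patterns)))  # multi-label indicator matrix

# Binary relevance: one MLP per acoustic pattern, trained independently.
model = OneVsRestClassifier(
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0))
model.fit(X, Y)

# A new clip can be assigned several concomitant patterns simultaneously.
new_clip = rng.random((1, n_indices))
predicted = model.predict(new_clip)[0]
print([p for p, flag in zip(patterns, predicted) if flag])
```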
This paper summarizes a method for purely audio-based bird species recognition through the application of convolutional neural networks. The approach is evaluated in the context of the LifeCLEF 2016 bird identification task, an open challenge conducted on a dataset containing 34,128 audio recordings representing 999 bird species from South America. Three different network architectures and a simple ensemble model are considered for this task, with the ensemble submission achieving a mean average precision of 41.2% (official score) and 52.9% for foreground species.
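A brief Python sketch of the usual pipeline behind such a system, assuming torchaudio is available: the waveform is converted to a log-mel spectrogram and the ensemble averages the per-class probabilities of several trained networks. The spectrogram parameters and the linear stand-in models are illustrative, not the submission's architectures.

```python
import torch
import torch.nn as nn
import torchaudio

# Turn a raw waveform into a log-mel spectrogram, the usual CNN input for
# audio-based species recognition (all parameters here are illustrative).
sample_rate = 22050
to_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate, n_fft=1024, hop_length=512, n_mels=64)
waveform = torch.randn(1, sample_rate * 2)            # stand-in 2-second clip
logmel = torch.log1p(to_mel(waveform)).unsqueeze(0)   # (1, 1, 64, frames)

# Stand-ins for the three trained CNN architectures; the ensemble averages
# their per-class probabilities over the 999 LifeCLEF 2016 species.
frames = logmel.shape[-1]
models = [nn.Sequential(nn.Flatten(), nn.Linear(64 * frames, 999))
          for _ in range(3)]
with torch.no_grad():
    probs = torch.stack([m(logmel).softmax(dim=1) for m in models]).mean(dim=0)
print(probs.argmax(dim=1))   # index of the most probable species
```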
We introduce tools and methodologies to collect high quality, large scale fine-grained computer vision datasets using citizen scientists: crowd annotators who are passionate and knowledgeable about specific domains such as birds or airplanes. We worked with citizen scientists and domain experts to collect NABirds, a new high quality dataset containing 48,562 images of North American birds with 555 categories, part annotations and bounding boxes. We find that citizen scientists are significantly more accurate than Mechanical Turkers at zero cost. We worked with bird experts to measure the quality of popular datasets like CUB-200-2011 and ImageNet and found class label error rates of at least 4%. Nevertheless, we found that learning algorithms are surprisingly robust to annotation errors and this level of training data corruption can lead to an acceptably small increase in test error if the training set has sufficient size. At the same time, we found that an expert-curated high quality test set like NABirds is necessary to accurately measure the performance of fine-grained computer vision systems. We used NABirds to train a publicly available bird recognition service deployed on the web site of the Cornell Lab of Ornithology.
We propose an architecture for fine-grained visual categorization that approaches expert human performance in the classification of bird species. Our architecture first computes an estimate of the object's pose; this is used to compute local image features which are, in turn, used for classification. The features are computed by applying deep convolutional nets to image patches that are located and normalized by the pose. We perform an empirical study of a number of pose normalization schemes, including an investigation of higher order geometric warping functions. We propose a novel graph-based clustering algorithm for learning a compact pose normalization space. We perform a detailed investigation of state-of-the-art deep convolutional feature implementations and fine-tuning feature learning for fine-grained classification. We observe that a model that integrates lower-level feature layers with pose-normalized extraction routines and higher-level feature layers with unaligned image features works best. Our experiments advance state-of-the-art performance on bird species recognition, with a large improvement of correct classification rates over previous methods (75% vs. 55-65%).
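A loose PyTorch sketch of the pose-normalized feature extraction idea: patches are cropped around predicted keypoints, each patch and the whole image are passed through a shared CNN, and the concatenated descriptors feed a linear classifier. The backbone, patch scheme, and keypoint names are stand-in assumptions and do not reproduce the paper's learned pose-normalization space or warping functions.

```python
import torch
import torch.nn as nn
import torchvision
from torchvision.transforms.functional import resize, resized_crop

backbone = torchvision.models.resnet18(weights=None)
backbone.fc = nn.Identity()              # 512-d feature per input image
classifier = nn.Linear(512 * 3, 200)     # whole image + two part patches

def pose_normalized_features(image, keypoints, patch_size=112):
    """image: (3, H, W) tensor; keypoints: dict name -> (x, y) pose estimate."""
    crops = [resize(image, [224, 224])]  # unaligned whole-image stream
    for (x, y) in keypoints.values():
        top, left = int(y) - patch_size // 2, int(x) - patch_size // 2
        crops.append(resized_crop(image, top, left, patch_size, patch_size,
                                  [224, 224]))
    batch = torch.stack(crops)           # (1 + n_parts, 3, 224, 224)
    feats = backbone(batch)              # (1 + n_parts, 512)
    return feats.flatten()               # concatenated descriptor

image = torch.rand(3, 480, 640)
keypoints = {"head": (320, 120), "torso": (300, 260)}  # assumed pose estimate
feature = pose_normalized_features(image, keypoints)
logits = classifier(feature.unsqueeze(0))
print(logits.shape)  # torch.Size([1, 200])
```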