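Topological Data Analysis (TDA) is an emerging field that aims to discover topological information hidden in a dataset. TDA tools are commonly used to create filters and topological descriptors that improve machine learning (ML) methods. This paper proposes an algorithm that applies TDA directly to multi-class classification problems, without any further ML stage, and shows advantages on imbalanced datasets. The algorithm builds a filtered simplicial complex on the dataset. Persistent homology (PH) is applied to guide the selection of a sub-complex in which each unlabeled point receives the label with the majority of votes from its labeled neighboring points. We select 8 datasets with different dimensions, degrees of class overlap, and imbalanced samples per class. On average, the proposed TDABC method outperforms KNN and weighted KNN. It performs competitively with the Local SVM and Random Forest baseline classifiers on balanced datasets, and it outperforms all baseline methods when classifying entangled and minority classes.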
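The voting step described above can be illustrated with a minimal sketch. It is not the TDABC algorithm: persistent homology is replaced by a simple sweep over hand-chosen scales, and only the edges (1-skeleton) of a Vietoris-Rips complex are used. The function names, the `scales` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def rips_neighbors(points, query, epsilon):
    """Indices of labeled points that share an edge with `query` in the
    Vietoris-Rips 1-skeleton at scale `epsilon` (distance <= epsilon)."""
    return [i for i, p in enumerate(points) if math.dist(p, query) <= epsilon]


def vote_label(points, labels, query, epsilon):
    """Majority vote among the labeled neighbors of `query` at one scale.
    Returns None when the query has no neighbor at this scale."""
    votes = Counter(labels[i] for i in rips_neighbors(points, query, epsilon))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def vote_over_filtration(points, labels, query, scales):
    """Assumption: a stand-in for the PH-guided sub-complex selection.
    Walk the filtration from small to large scales and vote at the first
    scale where the query is connected to at least one labeled point."""
    for epsilon in sorted(scales):
        label = vote_label(points, labels, query, epsilon)
        if label is not None:
            return label
    return None


# Toy data (hypothetical): two well-separated classes in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (2.0, 2.0), (2.1, 1.9)]
labels = ["a", "a", "a", "b", "b"]

print(vote_over_filtration(points, labels, (0.1, 0.1), [0.25, 0.5, 1.0]))
print(vote_over_filtration(points, labels, (2.0, 1.9), [0.25, 0.5, 1.0]))
```

Voting at the smallest connecting scale keeps distant majority-class points from outvoting a nearby minority class, which is one plausible reading of why a topology-guided neighborhood could help on imbalanced data.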
The generalisation performance of a convolutional neural network (CNN) is strongly influenced by the quantity, quality, and diversity of its training images. All training data must be annotated beforehand, yet in many real-world applications data is easy to acquire but expensive and time-consuming to label. The goal of active learning is to draw the most informative samples from the unlabeled pool, which can then be used for training after annotation. With an entirely different objective, self-supervised learning (SSL) has been gaining rapid popularity by closing the performance gap with supervised methods on large computer vision benchmarks. Recent SSL methods have been shown to produce low-level representations that are invariant to artificially created distortions of the input sample, e.g. rotation, solarization, and cropping, and they rely on simpler and more scalable learning frameworks. In this paper, we unify these two families of approaches from the angle of active learning on a self-supervised learning manifold and propose Deep Active Learning using BarlowTwins (DALBT), an active learning method for all the datasets that trains a classifier together with the self-supervised loss framework of Barlow Twins, so that the model can encode invariance to artificially created distortions, e.g. rotation, solarization, and cropping.
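As an illustration of the self-supervised component only, the following is a minimal pure-Python sketch of the Barlow Twins objective: the cross-correlation matrix between the embeddings of two distorted views is pushed toward the identity. How DALBT combines this loss with the classifier and with sample selection is not specified in the abstract and is not shown. The function names and the weight `lambda_offdiag` are assumptions made for this example.

```python
import statistics


def standardize(column):
    """Zero-mean, unit-variance version of one embedding dimension over the
    batch. Assumes the dimension is not constant (non-zero std)."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(value - mean) / std for value in column]


def barlow_twins_loss(z_a, z_b, lambda_offdiag=0.005):
    """Barlow Twins loss for two batches of embeddings (lists of rows).

    z_a[k] and z_b[k] are embeddings of two distorted views of sample k.
    The diagonal of the cross-correlation matrix is pulled toward 1
    (invariance); off-diagonal entries are pulled toward 0 (redundancy
    reduction), weighted by lambda_offdiag (an assumed value).
    """
    n = len(z_a)
    dim = len(z_a[0])
    cols_a = [standardize([row[d] for row in z_a]) for d in range(dim)]
    cols_b = [standardize([row[d] for row in z_b]) for d in range(dim)]

    loss = 0.0
    for i in range(dim):
        for j in range(dim):
            # Cross-correlation between dimension i of view A and j of view B.
            c_ij = sum(a * b for a, b in zip(cols_a[i], cols_b[j])) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2
            else:
                loss += lambda_offdiag * c_ij ** 2
    return loss


# Toy embeddings (hypothetical): two decorrelated dimensions, four samples.
view_a = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
view_b = [[-x for x in row] for row in view_a]

print(barlow_twins_loss(view_a, view_a))  # identical views
print(barlow_twins_loss(view_a, view_b))  # sign-flipped views
```

Identical, decorrelated views give a loss near zero, while views whose embeddings disagree are penalised through the diagonal term.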
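Organizations rely on machine learning engineers (MLEs) to operationalize ML, i.e., to deploy and maintain ML pipelines in production. The process of operationalizing ML, or MLOps, consists of a continual loop of (i) data collection and labeling, (ii) experimentation to improve ML performance, (iii) evaluation throughout a multi-staged deployment process, and (iv) monitoring for performance drops in production. Considered together, these responsibilities seem staggering: how does anyone do MLOps, what are the unaddressed challenges, and what are the implications for tool builders? We conducted semi-structured ethnographic interviews with 18 MLEs working across many applications, including chatbots, autonomous vehicles, and finance. Our interviews expose three variables that govern the success of a production ML deployment: Velocity, Validation, and Versioning. We summarize common practices for successful ML experimentation, deployment, and sustaining production performance. Finally, we discuss interviewees' pain points and anti-patterns, with implications for tool design.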