The most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series, Euclidean Distance and Dynamic Time Warping (DTW) distance are known to be extremely effective. However, for time series containing cyclical behaviors, the semantic meaningfulness of such comparisons is less clear. For example, on two separate days the telemetry from an athlete's workout routine might be very similar. On the second day, however, the athlete might change the order of the push-ups and squats, add repetitions of pull-ups, or omit the dumbbell curls entirely. Any of these minor changes would defeat existing time series distance measures. Some bag-of-features methods have been proposed to address this problem, but we argue that in many cases, similarity is intimately tied to the shapes of subsequences within these longer time series. In such cases, summative features will lack discrimination ability. In this work we introduce PRCIS, which stands for Pattern Representation Comparison in Series. PRCIS is a distance measure for long time series, which exploits recent progress in our ability to summarize time series with dictionaries. We will demonstrate the utility of our ideas on diverse tasks and datasets.
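The abstract leaves the dictionary mechanism unspecified, but the general flavor of a dictionary-based distance can be sketched. The snippet below is our own illustrative approximation, not the published PRCIS algorithm: each long series is summarized by a small codebook of z-normalized subsequence patterns, and two series are scored by how well each is reconstructed from the other's codebook. All function names and parameter choices are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def codebook(ts, window=64, k=8):
    """Summarize a long series as k z-normalized subsequence patterns."""
    wins = np.lib.stride_tricks.sliding_window_view(ts, window)[::window // 2]
    wins = (wins - wins.mean(axis=1, keepdims=True)) / (wins.std(axis=1, keepdims=True) + 1e-8)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(wins).cluster_centers_

def reconstruction_cost(ts, centers, window=64):
    """Mean distance from each subsequence of ts to its nearest dictionary pattern."""
    wins = np.lib.stride_tricks.sliding_window_view(ts, window)[::window // 2]
    wins = (wins - wins.mean(axis=1, keepdims=True)) / (wins.std(axis=1, keepdims=True) + 1e-8)
    d = np.linalg.norm(wins[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1).mean()

def dictionary_distance(a, b, window=64, k=8):
    """Symmetric cross-reconstruction score: small if the two series share shapes."""
    ca, cb = codebook(a, window, k), codebook(b, window, k)
    return 0.5 * (reconstruction_cost(a, cb, window) + reconstruction_cost(b, ca, window))
```

Under this kind of scoring, two workout recordings that reorder the same exercises share codebook patterns and thus score as similar, even though their point-wise alignment is poor.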
Non-IID data distribution across clients and poisoning attacks are two main challenges in real-world federated learning systems. While each has attracted great research interest, with dedicated strategies developed for it, no known solution manages to address both in a unified framework. To jointly overcome both challenges, we propose SmartFL, a generic approach that optimizes the server-side aggregation process with a small, clean, server-collected proxy dataset (e.g., around one hundred samples, 0.2% of the dataset) via a subspace training technique. Specifically, the aggregation weight of each participating client at each round is optimized using the server-collected proxy data; this is essentially the optimization of the global model within the convex hull spanned by the client models. Since at each round the number of tunable parameters optimized on the server side equals the number of participating clients (and is thus independent of the model size), we are able to train a global model with massive parameters using only a small amount of proxy data. We provide theoretical analyses of the convergence and generalization capacity of SmartFL. Empirically, SmartFL achieves state-of-the-art performance on both federated learning with non-IID data distribution and federated learning with malicious clients. The source code will be released.
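Since the server's only free parameters are one aggregation weight per client, the core update is easy to sketch. The following is a minimal sketch under our own assumptions (PyTorch, softmax-parametrized weights, cross-entropy proxy loss, floating-point state dicts), not the authors' released implementation:

```python
import torch
import torch.nn.functional as F

def optimize_aggregation(client_states, model, proxy_loader, steps=5, lr=0.1):
    """Optimize one aggregation weight per client on the proxy data, keeping
    the global model inside the convex hull of the client models."""
    logits = torch.zeros(len(client_states), requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    names = list(client_states[0])
    stacked = {n: torch.stack([s[n].float() for s in client_states]) for n in names}

    def merge(w):
        # Convex combination of client parameters, broadcast over each tensor.
        return {n: (w.view(-1, *[1] * (stacked[n].dim() - 1)) * stacked[n]).sum(0)
                for n in names}

    for _ in range(steps):
        for x, y in proxy_loader:
            w = F.softmax(logits, dim=0)   # weights are positive and sum to 1
            out = torch.func.functional_call(model, merge(w), (x,))
            loss = F.cross_entropy(out, y)
            opt.zero_grad(); loss.backward(); opt.step()
    return merge(F.softmax(logits.detach(), dim=0))
```

Note how the trainable vector `logits` has one entry per client, so the optimization cost on the server is independent of the model size, which is the property the abstract highlights.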
Speech representation learning has improved both speech understanding and speech synthesis tasks for a single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method to cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework, in which we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both training and inference, without any fine-tuning effort. Our experiments show that our model outperforms speaker-embedding-based multi-speaker TTS methods on both cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. The code and model are publicly available at PaddleSpeech.
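The joint masking step itself is simple to illustrate. Below is a toy sketch of our own (NumPy, frame-level masking, a hypothetical mask token id), not the paper's actual data pipeline: random spectrogram frames and random phoneme tokens are hidden, and a model would be trained to reconstruct the hidden parts from the remaining bilingual context.

```python
import numpy as np

def mask_speech_and_text(spectrogram, phonemes, mask_id=0, p_frame=0.15,
                         p_phone=0.15, seed=0):
    """Toy joint masking: hide random spectrogram frames and phoneme tokens.

    spectrogram: (frames, mel_bins) float array; phonemes: (tokens,) int array.
    Returns masked inputs plus the boolean masks marking reconstruction targets.
    """
    rng = np.random.default_rng(seed)
    spec = spectrogram.copy()
    frame_mask = rng.random(spec.shape[0]) < p_frame
    spec[frame_mask] = 0.0                     # zero out masked frames
    phones = phonemes.copy()
    phone_mask = rng.random(phones.shape[0]) < p_phone
    phones[phone_mask] = mask_id               # hypothetical [MASK] token id
    return spec, phones, frame_mask, phone_mask
```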
Semi-supervised learning (SSL) improves model generalization by leveraging large amounts of unlabeled data to augment limited labeled samples. However, popular SSL evaluation protocols are currently often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address the above issues, we construct a Unified SSL Benchmark (USB) by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing (Audio), on which we systematically evaluate the dominant SSL methods, and also open-source a modular and extensible codebase for fair evaluation of these SSL methods. We further provide pre-trained versions of state-of-the-art neural models for the CV tasks to make the cost of further tuning affordable. USB enables the evaluation of a single SSL algorithm on more tasks from multiple domains, but at lower cost. Specifically, on a single NVIDIA V100, only 37 GPU days are required to evaluate FixMatch on the 15 tasks in USB, whereas 335 GPU days (279 GPU days on the 4 CV datasets other than ImageNet) are needed for 5 CV tasks under the typical protocol.
Deep-learning-based models dominate the current landscape of production recommender systems. Moreover, recent years have witnessed exponential growth in model scale: from Google's 2016 model with 1 billion parameters to the latest Facebook model with 12 trillion parameters. Every jump in model capacity has brought significant quality improvement, which makes us believe the era of 100-trillion-parameter models is around the corner. However, the training of such models is challenging even within industrial-scale data centers. The difficulty is inherited from the staggering heterogeneity of the training computation: the model's embedding layer can account for more than 99.99% of the total model size and is extremely memory-intensive, while the rest of the neural network is increasingly computation-intensive. To support the training of such huge models, an efficient distributed training system is urgently needed. In this paper, we resolve this challenge by carefully co-designing both the optimization algorithm and the distributed system architecture. Specifically, to ensure both training efficiency and training accuracy, we design a novel hybrid training algorithm in which the embedding layer and the dense neural network are handled by different synchronization mechanisms; we then build a system called Persia (short for parallel recommendation training system with hybrid acceleration) to support this hybrid training algorithm. Both theoretical demonstrations and empirical studies at scales of up to 100 trillion parameters justify the system design and implementation of Persia. We make Persia publicly available (at https://github.com/persiamml/persia) so that anyone can easily train a recommender model at the scale of 100 trillion parameters.
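The hybrid scheme the abstract describes (asynchronous updates for the sparse embedding layer, synchronous updates for the dense network) can be caricatured in a few lines. This is a toy single-process sketch of ours, not Persia's actual API or its distributed implementation:

```python
import numpy as np

class HybridStepSketch:
    """Toy illustration: the memory-heavy embedding table tolerates stale,
    asynchronous gradient updates (each sample touches few rows, so conflicts
    are rare), while the compute-heavy dense part is updated synchronously."""

    def __init__(self, num_ids, emb_dim, dense_dim, lr=0.01, staleness=2):
        self.emb = np.zeros((num_ids, emb_dim))
        self.dense = np.zeros(dense_dim)
        self.lr = lr
        self.pending = []        # queue emulating bounded-staleness async updates
        self.staleness = staleness

    def step(self, ids, emb_grads, dense_grad):
        # Dense network: synchronous update (after an all-reduce in a real system).
        self.dense -= self.lr * dense_grad
        # Embedding layer: enqueue now, apply later against stale parameters.
        self.pending.append((ids, emb_grads))
        if len(self.pending) > self.staleness:
            old_ids, old_grads = self.pending.pop(0)
            np.add.at(self.emb, old_ids, -self.lr * old_grads)
```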
Soft actuators have shown great advantages in compliance and morphology for manipulating delicate objects and for inspection in confined spaces. There is an unmet need for soft actuators that can provide torsional motion to, for example, enlarge the workspace and increase the degrees of freedom. Towards this goal, we present origami-inspired soft pneumatic actuators (OSPAs) made from silicone. The prototype can output a rotation of more than one revolution (up to 435°), larger than its previous counterparts. We describe the design and fabrication method, build kinematics and simulation models, and analyze and optimize the parameters. Finally, we demonstrate the utility of the actuators by integrating them into a gripper capable of simultaneously grasping and lifting fragile or flat objects, a versatile robot arm capable of picking and placing items at the right angle with the twisting actuators, and a soft snake robot capable of changing attitude and direction by torsion of the twisting actuators.
Since its introduction two decades ago, there has been increasing interest in the problem of early classification of time series. This problem generalizes classic time series classification by asking whether we can classify, with sufficient accuracy and confidence, after seeing only some prefix of the target pattern. The idea is that earlier classification would allow us to take immediate action in a domain where some practical intervention is possible. For example, the intervention might be sounding an alarm or applying the brakes in an automobile. In this work, we make a surprising claim. In spite of the fact that there are dozens of papers on early classification of time series, it is not clear that any of them could work in a real-world setting. The problem is not with the algorithms per se, but with the vague and underspecified problem description. Essentially all algorithms make implicit and unwarranted assumptions about the problem that ensure they will be plagued by false positives and false negatives, even when their results suggest they can obtain near-perfect accuracy. We explain our findings with novel insights and experiments and offer recommendations to the community.
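To make the critique concrete, the decision rule implicit in most of this literature looks roughly like the sketch below (our own rendering, with an assumed probabilistic classifier): commit to a label once confidence on the growing prefix crosses a threshold. The unstated assumption, the paper argues, is that some target pattern is actually present, which is exactly what breaks down in deployment.

```python
import numpy as np

def early_classify(prefixes, predict_proba, threshold=0.95):
    """Generic early-classification rule: `prefixes` yields ever-longer prefixes
    of an incoming time series; `predict_proba` is any classifier returning
    class probabilities for a prefix (an assumption of this sketch).
    Returns (label, trigger_time), or (None, length) if never confident."""
    t = -1
    for t, prefix in enumerate(prefixes):
        probs = np.asarray(predict_proba(prefix))
        if probs.max() >= threshold:
            return int(probs.argmax()), t      # act now: alarm, brakes, etc.
    return None, t + 1                         # stream ended without a decision
```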
Time series anomaly detection has been a perennially important topic in data science, with papers dating back to the 1950s. In recent years, however, interest in the topic has exploded, driven in large part by the success of deep learning in other domains and in other time series tasks. Most of these papers test on one or more of the popular benchmark datasets created by Yahoo, Numenta, NASA, etc. In this work, we make a surprising claim. The majority of the individual exemplars in these datasets suffer from one or more of four flaws. Because of these four flaws, we believe that many published comparisons of anomaly detection algorithms may be unreliable and, more importantly, that much of the apparent progress in recent years may be illusory. In addition to demonstrating these claims, we introduce the UCR Time Series Anomaly Archive. We believe that this resource will play a role similar to that of the UCR Time Series Classification Archive, by providing the community with a benchmark that allows meaningful comparisons between approaches and a meaningful gauge of overall progress.
Many time series data mining problems can be solved with repeated use of a distance measure. Examples of such tasks include similarity search, clustering, classification, anomaly detection, and segmentation. It has been known for over two decades that, in most domains, the Dynamic Time Warping (DTW) distance measure is the best measure for most tasks. Because the classic DTW algorithm has quadratic time complexity, many ideas have been introduced to reduce its amortized time or to quickly approximate it. One of the most cited approximate approaches is FastDTW. The FastDTW algorithm has well over a thousand citations and has been explicitly used in several hundred research efforts. In this work, we make a surprising claim. In any realistic data mining application, the approximate FastDTW is much slower than the exact DTW. This fact clearly has implications for the community that uses this algorithm: it can address larger datasets, get exact results, and do so in less time.
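For reference, the exact quadratic algorithm being defended is short enough to state in full. Below is a plain, unoptimized rendering of the classic dynamic program; real implementations typically add a warping-window constraint and lower-bound pruning, which this sketch omits.

```python
import numpy as np

def dtw(a, b):
    """Exact DTW between 1-D sequences via the classic O(len(a)*len(b)) dynamic
    program, with squared point-wise costs and a square root at the end."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))
```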
Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions and can faithfully determine the difficulty level of the distillation task. Keeping a constant temperature, i.e., a fixed level of task difficulty, is usually sub-optimal for a growing student during its progressive learning stages. In this paper, we propose a simple curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD), which controls the task difficulty level during the student's learning career through a dynamic and learnable temperature. Specifically, following an easy-to-hard curriculum, we gradually increase the distillation loss w.r.t. the temperature, leading to increased distillation difficulty in an adversarial manner. As an easy-to-use plug-in technique, CTKD can be seamlessly integrated into existing knowledge distillation frameworks and brings general improvements at a negligible additional computation cost. Extensive experiments on CIFAR-100, ImageNet-2012, and MS-COCO demonstrate the effectiveness of our method. Our code is available at https://github.com/zhengli97/CTKD.
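The adversarial temperature update can be sketched compactly. The snippet below is our reading of the abstract, not the code at the linked repository: a gradient reversal trick makes the learnable temperature climb the distillation loss while the student descends it, and a scheduled scale `lam` supplies the easy-to-hard curriculum. The exact parametrization and schedule may differ from the paper's.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient in the
    backward pass, so the temperature is updated to *increase* the KD loss."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def curriculum_kd_loss(student_logits, teacher_logits, temperature, lam):
    """KD loss with an adversarially learned temperature. `temperature` is a
    learnable scalar tensor; `lam` ramps up over training (easy to hard)."""
    T = GradReverse.apply(temperature, lam)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T.pow(2)
```

Here `temperature` would be registered as an `nn.Parameter` and optimized jointly with the student; only the sign-flipped gradient distinguishes it from an ordinary learnable scale.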