Smart Paper Notes

Matrix Profile XXVII: A Novel Distance Measure for Comparing Long Time Series

Audrey Der , Chin-Chia Michael Yeh , Renjie Wu , Junpeng Wang , Yan Zheng , Zhongfang Zhuang , Liang Wang , Wei Zhang , Eamonn Keogh

Categories: Machine Learning | Artificial Intelligence

2022-12-09

The most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series, Euclidean Distance and Dynamic Time Warping distance are known to be extremely effective. However, for time series containing cyclical behaviors, the semantic meaningfulness of such comparisons is less clear. For example, on two separate days the telemetry from an athlete's workout routine might be very similar. On the second day, the athlete may change the order of push-ups and squats, add repetitions of pull-ups, or omit dumbbell curls entirely. Any of these minor changes would defeat existing time series distance measures. Some bag-of-features methods have been proposed to address this problem, but we argue that in many cases, similarity is intimately tied to the shapes of subsequences within these longer time series. In such cases, summative features will lack discrimination ability. In this work we introduce PRCIS, which stands for Pattern Representation Comparison in Series. PRCIS is a distance measure for long time series, which exploits recent progress in our ability to summarize time series with dictionaries. We will demonstrate the utility of our ideas on diverse tasks and datasets.

Error-bounded Approximate Time Series Joins using Compact Dictionary Representations of Time Series

Chin-Chia Michael Yeh , Yan Zheng , Junpeng Wang , Huiyuan Chen , Zhongfang Zhuang , Wei Zhang , Eamonn Keogh

Categories: Machine Learning

2021-12-24

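The matrix profile is an effective data mining tool that provides similarity join functionality for time series data. Users of the matrix profile can either join a time series with itself using an intra-similarity join (i.e., a self-join) or join a time series with another time series using an inter-similarity join. By invoking either or both types of joins, the matrix profile can help users discover both conserved and anomalous structures in the data. Since the introduction of the matrix profile five years ago, multiple efforts have been made to speed up the computation with approximate joins; however, the majority of these efforts focus only on self-joins. In this work, we show that it is possible to efficiently perform approximate inter-time series similarity joins with error-bounded guarantees by creating a compact "dictionary" representation of time series. Using the dictionary representation instead of the original time series, we are able to improve the throughput of an anomaly mining system by at least 20X, with essentially no decrease in accuracy. As a side effect, the dictionaries also summarize the time series in a semantically meaningful way and can provide intuitive and actionable insights. We demonstrate the utility of our dictionary-based inter-time series similarity joins on domains as diverse as medicine and transportation.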
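
To make the idea of an inter-similarity join concrete, here is a minimal brute-force sketch in Python. It is not the dictionary-based method of the paper and not any library's API. The function names `znorm` and `ab_join`, the use of z-normalized Euclidean distance, and the window length `m` are illustrative assumptions. For each length-`m` subsequence of a query series `a`, it reports the distance to the nearest subsequence of a reference series `b`; large values mark subsequences of `a` with no close match in `b`.

```python
import math

def znorm(x):
    """Z-normalize a window; a constant window maps to all zeros (assumption)."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if sd == 0:
        return [0.0] * len(x)
    return [(v - mean) / sd for v in x]

def ab_join(a, b, m):
    """Brute-force inter-similarity join (hypothetical helper).

    Returns, for every length-m subsequence of a, the z-normalized
    Euclidean distance to its nearest neighbor among the length-m
    subsequences of b. Runs in O(len(a) * len(b) * m) time.
    """
    subs_b = [znorm(b[j:j + m]) for j in range(len(b) - m + 1)]
    profile = []
    for i in range(len(a) - m + 1):
        q = znorm(a[i:i + m])
        best = min(
            math.sqrt(sum((x - y) ** 2 for x, y in zip(q, s)))
            for s in subs_b
        )
        profile.append(best)
    return profile

# Usage: a periodic reference series and a query with one injected spike.
reference = [0, 1, 0, -1] * 4
query = list(reference)
query[6] = 5  # injected anomaly
profile = ab_join(query, reference, 4)
```

Windows of `query` that overlap the spike receive large profile values, while untouched windows match the reference exactly. The quadratic cost of this naive version is the kind of expense that approximate joins aim to reduce.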

When is Early Classification of Time Series Meaningful?

Renjie Wu , Audrey Der , Eamonn J. Keogh

Categories: Machine Learning

2021-02-23

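Since its introduction two decades ago, there has been increasing interest in the problem of early classification of time series. This problem generalizes classic time series classification to ask whether we can classify a time series subsequence with sufficient accuracy and confidence after seeing only some prefix of a target pattern. The idea is that earlier classification would allow us to take immediate action in a domain where some practical intervention is possible. For example, that intervention might be sounding an alarm or applying the brakes in an automobile. In this work, we make a surprising claim. Although there are dozens of papers on early classification of time series, it is not clear that any of them could ever work in a real-world setting. The problem is not with the algorithms per se, but with the vague and underspecified problem description. Essentially all algorithms make implicit and unwarranted assumptions about the problem, which ensure that they will be plagued by false positives and false negatives even if their reported results suggest near-perfect performance. We will explain our findings with novel insights and experiments, and offer recommendations to the community.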

Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress

Renjie Wu , Eamonn J. Keogh

Categories: Machine Learning | Machine Learning (Statistics)

2020-09-29

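Time series anomaly detection has been a perennially important topic in data science, with papers dating back to the 1950s. However, in recent years there has been an explosion of interest in this topic, much of it driven by the success of deep learning in other domains and for other time series tasks. Most of these papers test on one or more of a handful of popular benchmark datasets created by Yahoo, Numenta, NASA, etc. In this work, we make a surprising claim. The majority of the individual exemplars in these datasets suffer from one or more of four flaws. Because of these four flaws, we believe that many published comparisons of anomaly detection algorithms may be unreliable and, more importantly, that much of the apparent progress in recent years may be illusory. In addition to demonstrating these claims, we introduce the UCR Time Series Anomaly Archive. We believe that this resource will play a role similar to that of the UCR Time Series Classification Archive, by providing the community with a benchmark that allows meaningful comparisons between approaches and a meaningful gauge of overall progress.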

FastDTW is approximate and Generally Slower than the Algorithm it Approximates

Renjie Wu , Eamonn J. Keogh

Categories: Machine Learning | Machine Learning (Statistics)

2020-03-25

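Many time series data mining problems can be solved with repeated use of a distance measure. Examples of such tasks include similarity search, clustering, classification, anomaly detection, and segmentation. For over two decades it has been known that the Dynamic Time Warping (DTW) distance measure is the best measure to use for most tasks, in most domains. Because the classic DTW algorithm has quadratic time complexity, many ideas have been introduced to reduce its amortized time or to quickly approximate it. One of the most cited approximate approaches is FastDTW. The FastDTW algorithm has well over a thousand citations and has been explicitly used in several hundred research efforts. In this work, we make a surprising claim. In any realistic data mining application, the approximate FastDTW is much slower than the exact DTW. This fact clearly has implications for the community that uses this algorithm: switching to exact DTW would allow it to address much larger datasets and obtain exact results, in less time.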
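
The quadratic cost of classic DTW mentioned above comes from filling a full dynamic-programming table. The following is a minimal textbook sketch in Python, not the authors' implementation and not FastDTW. The function names, the squared pointwise cost, and the absence of a warping window are assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Classic unconstrained DTW via dynamic programming.

    cost[i][j] holds the minimal cumulative squared cost of aligning
    a[:i] with b[:j]. Filling the table takes O(len(a) * len(b)) time.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also aligned with b[j-1] after a step in a
                cost[i][j - 1],      # step in b only
                cost[i - 1][j - 1],  # step in both series
            )
    return math.sqrt(cost[n][m])

def euclidean_distance(a, b):
    """Lock-step Euclidean distance between equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Usage: the same bump, shifted by one time step.
x = [0, 0, 1, 2, 1, 0]
y = [0, 1, 2, 1, 0, 0]
print(dtw_distance(x, y), euclidean_distance(x, y))
```

On this pair, warping absorbs the one-step shift, so the DTW distance is zero while the lock-step Euclidean distance is not. In practice, exact DTW is commonly run with a warping-window constraint that limits how many table cells are visited; that variant is omitted here for brevity.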

Depth-based Sampling and Steering Constraints for Memoryless Local Planners

Thai Binh Nguyen , Linh Nguyen , Tanveer Choudhury , Kathleen Keogh , Manzur Murshed

Categories: Robotics

2022-11-06

Using only depth information, the paper introduces a novel and efficient local planning approach that improves both the computational efficiency and the planning performance of memoryless local planners. First, sampling is based on depth data, which makes it possible to identify and eliminate a specific type of in-collision trajectory in the sampled motion primitive library. More specifically, all primitives with obscured endpoints are found by querying the depth values and are excluded from the sampled set, which can significantly reduce the computational workload required for collision checking. Second, we propose a steering mechanism, also based on depth information, that effectively prevents an autonomous vehicle from getting stuck when facing a large convex obstacle, providing a higher level of autonomy for a planning system. Our steering technique is theoretically proven to be complete in scenarios with convex obstacles. To evaluate the effectiveness of the proposed DEpth based both Sampling and Steering (DESS) methods, we implemented them in synthetic environments where a simulated quadrotor flew through a cluttered region with multiple obstacles of different sizes. The results demonstrate that the proposed approach can considerably decrease computing time in local planners, so that more trajectories can be evaluated and the best path, with much lower cost, can be found. More importantly, the success rates, measured by whether the robot successfully navigated to its destinations in different testing scenarios, are always higher than 99.6% on average.

Statistically-informed deep learning for gravitational wave parameter estimation

Hongyu Shen , E. A. Huerta , Eamonn O'Shea , Prayush Kumar , Zhizhen Zhao

Categories: Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2019-03-05

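We introduce deep learning models to estimate the masses of the binary components of black hole mergers, $(m_1, m_2)$, and three astrophysical properties of the post-merger compact remnant, namely the final spin, $a_f$, and the frequency and damping time of the ringdown oscillations of the fundamental $\ell = m = 2$ bar mode, $(\omega_R, \omega_I)$. Our neural networks combine a modified $\texttt{WaveNet}$ architecture with contrastive learning and normalizing flow. We validate these models against a Gaussian conjugate prior family whose posterior distribution is described by a closed analytical expression. Upon confirming that our models produce statistically consistent results, we used them to estimate the astrophysical parameters $(m_1, m_2, a_f, \omega_R, \omega_I)$ of five binary black holes: $\texttt{GW150914}$, $\texttt{GW170104}$, $\texttt{GW170814}$, $\texttt{GW190521}$ and $\texttt{GW190630}$. We use $\texttt{PyCBC Inference}$ to directly compare traditional Bayesian methodologies for parameter estimation with our deep-learning-based posterior distributions. Our results show that our neural network models predict posterior distributions that encode physical correlations, and that our data-driven median results and 90$\%$ confidence intervals are similar to those produced by gravitational wave Bayesian analyses. This methodology requires a single V100 $\texttt{NVIDIA}$ GPU to produce median values and posterior distributions within two milliseconds for each event. This neural network, and a tutorial for its use, are available at the $\texttt{Data and Learning Hub for Science}$.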
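
As a small illustration of why a Gaussian conjugate family is convenient for validation, the Python sketch below computes the closed-form posterior for the simplest textbook case: a Gaussian prior on an unknown mean with known noise variance. This is an assumed toy setting, not the specific family or code used in the paper, and the function name `gaussian_posterior` and its parameters are hypothetical.

```python
def gaussian_posterior(prior_mean, prior_var, noise_var, observations):
    """Closed-form posterior for a Gaussian mean with known noise variance.

    Prior:       theta ~ N(prior_mean, prior_var)
    Likelihood:  x_i | theta ~ N(theta, noise_var), independent
    Posterior:   theta | x ~ N(post_mean, post_var), where precisions add.
    """
    n = len(observations)
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var

# Usage: standard normal prior, unit noise, one observation at 2.0.
mean, var = gaussian_posterior(0.0, 1.0, 1.0, [2.0])
```

Because the exact posterior is known analytically in such a setting, the distribution predicted by a learned model can be compared against it directly, without running a sampler.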
