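Transformers have been actively studied for time-series forecasting in recent years. While they often show promising results in various scenarios, traditional Transformers are not designed to fully exploit the characteristics of time-series data and therefore suffer from some fundamental limitations: they generally lack decomposition capability and interpretability, and they are neither effective nor efficient for long-term forecasting. In this paper, we propose ETSformer, a novel time-series Transformer architecture that exploits the principle of exponential smoothing to improve Transformers for time-series forecasting. In particular, inspired by the classical exponential smoothing methods in time-series forecasting, we propose the novel exponential smoothing attention (ESA) and frequency attention (FA) to replace the self-attention mechanism in vanilla Transformers, improving both accuracy and efficiency. Building on these, we redesign the Transformer architecture with modular decomposition blocks so that it can learn to decompose time-series data into interpretable components such as level, growth, and seasonality. Extensive experiments on various time-series benchmarks validate the efficacy and advantages of the proposed method. Code is available at https://github.com/salesforce/etsformer.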
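As a point of reference for the exponential smoothing principle mentioned above, the following is a minimal sketch of classical simple exponential smoothing, not of ETSformer's ESA module. The function names `exponential_smoothing` and `smoothing_weights` and the choice of initialising the level with the first observation are assumptions made for illustration.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed level at each step: l_t = alpha * x_t + (1 - alpha) * l_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: initialise the level with the first observation
    levels = [level]
    for x in series[1:]:
        level = alpha * x + (1.0 - alpha) * level
        levels.append(level)
    return levels


def smoothing_weights(alpha, n):
    """Weights implied on the n most recent observations, newest first."""
    return [alpha * (1.0 - alpha) ** k for k in range(n)]


levels = exponential_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5)
weights = smoothing_weights(0.5, 3)
print(levels)   # smoothed levels
print(weights)  # geometrically decaying weights
```

The second helper makes the key property visible: older observations receive geometrically smaller weights, which is the kind of decaying weighting that an attention mechanism inspired by exponential smoothing would encode.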
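Extending the forecasting horizon is a critical demand in real applications, such as extreme weather early warning and long-term energy consumption planning. This paper studies the long-term forecasting problem for time series. Prior Transformer-based models adopt various self-attention mechanisms to discover long-range dependencies. However, the intricate temporal patterns of the long-term future prevent the model from finding reliable dependencies. In addition, Transformers have to adopt sparse versions of point-wise self-attention for efficiency on long series, which creates an information utilization bottleneck. Going beyond Transformers, we design Autoformer as a novel decomposition architecture with an Auto-Correlation mechanism. We break with the convention of using series decomposition as pre-processing and renovate it as a basic inner block of deep models. This design gives Autoformer progressive decomposition capacity for complex time series. Further, inspired by stochastic process theory, we design the Auto-Correlation mechanism based on series periodicity, which conducts dependency discovery and representation aggregation at the sub-series level. Auto-Correlation outperforms self-attention in both efficiency and accuracy. In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a 38% relative improvement on six benchmarks covering five practical applications: energy, traffic, economics, weather, and disease. Code is available at this repository: https://github.com/thuml/autoformer.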
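To illustrate what period-based dependency discovery can look like, here is a minimal sketch that scores candidate lags by their sample autocorrelation and picks the strongest ones. It uses a direct summation rather than any faster method, and the names `autocorrelation` and `top_lags` are hypothetical; it is not the Auto-Correlation block itself.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag`, computed by direct summation."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
    return num / denom


def top_lags(series, max_lag, k):
    """Return the k lags in [1, max_lag] with the highest autocorrelation."""
    scores = [(autocorrelation(series, lag), lag) for lag in range(1, max_lag + 1)]
    scores.sort(reverse=True)
    return [lag for _, lag in scores[:k]]


# A synthetic series with period 12 (for example, monthly data with yearly seasonality).
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
best = top_lags(series, max_lag=30, k=1)
print(best)
```

On this synthetic input the highest-scoring lag coincides with the true period, which is the signal a periodicity-based mechanism relies on when aligning sub-series.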
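Deep learning has been actively applied to time-series forecasting, leading to a deluge of new autoregressive model architectures. Yet time-index based models have received little attention, despite attractive properties such as being a continuous signal function over time, which leads to smooth representations. Indeed, while naive deep time-index based models are far more expressive than the manually predefined function representations of classical time-index based models, they are inadequate for forecasting because they lack inductive biases and because time series are non-stationary. In this paper, we propose DeepTime, a deep time-index based model trained via a meta-learning formulation that overcomes these limitations, yielding an efficient and accurate forecasting model. Extensive experiments on real-world datasets demonstrate that our approach achieves results competitive with state-of-the-art methods and is highly efficient. Code is available at https://github.com/salesforce/deeptime.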
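Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task. Despite the growing performance over the past few years, we question the validity of this line of research in this work. Specifically, Transformers are arguably the most successful solution for extracting semantic correlations among the elements of a long sequence. In time series modeling, however, the goal is to extract temporal relations in an ordered set of continuous points. While employing positional encoding and using tokens to embed sub-series in Transformers helps preserve some ordering information, the permutation-invariant nature of the self-attention mechanism inevitably results in a loss of temporal information. To validate our claim, we introduce a set of embarrassingly simple one-layer linear models, named LTSF-Linear, for comparison. Experimental results on nine real-life datasets show that LTSF-Linear outperforms existing Transformer-based LTSF models in all cases, often by a large margin. Moreover, we conduct comprehensive empirical studies to explore the impact of various design elements of LTSF models on their ability to extract temporal relations. We hope this surprising finding opens up new research directions for the LTSF task. We also advocate revisiting the validity of Transformer-based solutions for other time series analysis tasks (e.g., anomaly detection). Code is available at: https://github.com/cure-lab/ltsf-linear.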
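To make concrete how little machinery a one-layer linear forecaster needs, the sketch below maps a lookback window of length L to T future values with a single weight matrix. The function name `linear_forecast` and the hand-set weights are hypothetical; a trained model would learn the weights from data.

```python
def linear_forecast(window, weights, bias):
    """Map a lookback window of length L to T future values with one linear layer."""
    return [sum(w * x for w, x in zip(row, window)) + b
            for row, b in zip(weights, bias)]


# Hypothetical hand-set weights for L = 4 and T = 2 (illustration only):
# row 0 extrapolates the last difference one step ahead, row 1 two steps ahead.
weights = [
    [0.0, 0.0, -1.0, 2.0],
    [0.0, 0.0, -2.0, 3.0],
]
bias = [0.0, 0.0]

forecast = linear_forecast([1.0, 2.0, 3.0, 4.0], weights, bias)
print(forecast)
```

Because each output is a fixed weighted sum of specific input positions, the ordering of the lookback window is built into the model, in contrast to the permutation-invariant attention discussed above.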
Time series forecasting is a long-standing challenge because real-world information arises in many scenarios (e.g., energy, weather, traffic, economics, earthquake warning). However, the forecasts of some mainstream models deviate dramatically from the ground truth. We believe the reason is that these models lack the ability to capture the frequency information that real-world datasets richly contain. At present, the mainstream methods for extracting frequency information are based on the Fourier transform (FT). However, the use of FT is problematic because of the Gibbs phenomenon: if the values at the two ends of a sequence differ significantly, oscillatory approximations are observed around both ends and high-frequency noise is introduced. We therefore propose a novel frequency enhanced channel attention that adaptively models frequency interdependencies between channels based on the Discrete Cosine Transform, which intrinsically avoids the high-frequency noise caused by the problematic periodicity assumed by the Fourier transform, i.e., the Gibbs phenomenon. We show that this network generalizes extremely effectively across six real-world datasets and achieves state-of-the-art performance. We further demonstrate that the frequency enhanced channel attention module can be flexibly applied to different networks. This module can improve the prediction ability of existing mainstream networks, reducing MSE by 35.99% on LSTM, 10.01% on Reformer, 8.71% on Informer, 8.29% on Autoformer, 8.06% on Transformer, etc., at a slight computational cost and with just a few lines of code. Our code and data are available at https://github.com/Zero-coder/FECAM.
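The boundary effect described above can be checked numerically. The sketch below is an illustration only, not the FECAM module: it compares how much non-DC energy a linear ramp (whose two ends differ sharply) places on high frequencies under a plain DFT versus an unnormalised DCT-II. The helper names and the quarter-cycle-per-sample threshold for "high frequency" are assumptions.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the discrete Fourier transform, computed directly."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def dct2(x):
    """Unnormalised DCT-II coefficients, computed directly."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n))
            for k in range(n)]


def high_freq_share(energies, is_high):
    """Share of non-DC energy that falls on indices flagged as high frequency."""
    total = sum(e for k, e in enumerate(energies) if k != 0)
    high = sum(e for k, e in enumerate(energies) if k != 0 and is_high(k))
    return high / total


n = 16
ramp = [float(t) for t in range(n)]  # endpoints differ sharply: 0 versus 15

dft_energy = [m * m for m in dft_magnitudes(ramp)]
dct_energy = [c * c for c in dct2(ramp)]

# "High" means at least a quarter cycle per sample in both transforms:
# DFT bin k has frequency min(k, n - k) / n, DCT-II index k has frequency k / (2n).
dft_share = high_freq_share(dft_energy, lambda k: min(k, n - k) >= n // 4)
dct_share = high_freq_share(dct_energy, lambda k: k >= n // 2)
print(dft_share, dct_share)
```

The DFT implicitly repeats the ramp periodically, so the jump between its last and first sample spreads energy into high bins, whereas the DCT-II corresponds to an even (mirrored) extension with no such jump.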
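Although Transformer-based methods have significantly improved state-of-the-art results for long-term series forecasting, they are not only computationally expensive but, more importantly, unable to capture the global view of a time series (e.g., the overall trend). To address these problems, we propose to combine the Transformer with the seasonal-trend decomposition method, in which the decomposition captures the global profile of the time series while the Transformer captures more detailed structures. To further enhance the Transformer's performance for long-term prediction, we exploit the fact that most time series tend to have a sparse representation in a well-known basis such as the Fourier basis, and we develop a frequency enhanced Transformer. Besides being more effective, the proposed method, termed Frequency Enhanced Decomposed Transformer (FEDformer), is more efficient than the standard Transformer, with complexity linear in the sequence length. Our empirical studies on six benchmark datasets show that, compared with state-of-the-art methods, FEDformer reduces prediction error by 14.8% and 22.6% for multivariate and univariate time series, respectively. Code is publicly available at https://github.com/maziqing/fedformer.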
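The sparsity claim above can be illustrated with a small experiment: transform a signal, keep only a few frequency bins, and reconstruct. This is a sketch under stated assumptions, not FEDformer's frequency enhanced block; in particular, keeping the largest-magnitude bins is a selection rule chosen here for simplicity, and the helper names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform by direct summation."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def keep_top_modes(spectrum, m):
    """Zero every frequency bin except the m with the largest magnitude."""
    order = sorted(range(len(spectrum)), key=lambda k: abs(spectrum[k]), reverse=True)
    keep = set(order[:m])
    return [c if k in keep else 0j for k, c in enumerate(spectrum)]


n = 32
signal = [math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.cos(2 * math.pi * 5 * t / n)
          for t in range(n)]

compressed = keep_top_modes(dft(signal), m=4)  # two sinusoids need two conjugate pairs
restored = idft(compressed)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(max_error)
```

Four of the 32 bins are enough to reconstruct this signal to numerical precision, which is why operating on a small, fixed number of modes can keep the cost linear in the sequence length.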
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and inherent limitations of the encoder-decoder architecture. To address these issues, we design an efficient Transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a ProbSparse self-attention mechanism, which achieves O(L log L) time complexity and memory usage and has comparable performance on sequences' dependency alignment; (ii) self-attention distilling, which highlights dominating attention by halving the cascading layer input and efficiently handles extremely long input sequences; (iii) a generative-style decoder, which, while conceptually simple, predicts long time-series sequences in one forward operation rather than step by step, drastically improving the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
Time series, sets of sequences in chronological order, are essential data in statistical research with many forecasting applications. Although the recent performance of many Transformer-based models has been notable, long multi-horizon time series forecasting remains a very challenging task. Going beyond transformers in sequence translation and transduction research, we observe that down- and up-sampling can nudge temporal saliency patterns to emerge in time sequences. Motivated by this observation, we propose a novel architecture, Temporal Saliency Detection (TSD), built on top of the attention mechanism, and apply it to multi-horizon time series prediction. We renovate the traditional encoder-decoder architecture by making it a series of deep convolutional blocks that work in tandem with multi-head self-attention. The proposed TSD approach facilitates the multiresolution of saliency patterns upon condensed multi-heads, thus progressively enhancing complex time series forecasting. Experimental results illustrate that our proposed approach significantly outperforms existing state-of-the-art methods across multiple standard benchmark datasets in many far-horizon forecasting settings. Overall, TSD achieves 31% and 46% relative improvement over the current state-of-the-art models in multivariate and univariate time series forecasting scenarios on standard benchmarks. The Git repository is available at https://github.com/duongtrung/time-series-temporal-saliency-patterns.
Multivariate time series forecasting (MTSF) is a fundamental problem in numerous real-world applications. Recently, Transformer has become the de facto solution for MTSF, especially for long-term cases. However, except for the one forward operation, the basic configurations of existing MTSF Transformer architectures have rarely been carefully verified. In this study, we point out that the current tokenization strategy in MTSF Transformer architectures ignores the token uniformity inductive bias of Transformers. As a result, the vanilla MTSF transformer struggles to capture details in time series and delivers inferior performance. Based on this observation, we make a series of evolutions to the basic architecture of the vanilla MTSF transformer. We vary the flawed tokenization strategy, along with the decoder structure and embeddings. Surprisingly, the evolved simple transformer architecture is highly effective: it avoids the over-smoothing phenomenon in the vanilla MTSF transformer, achieves more detailed and accurate predictions, and even substantially outperforms state-of-the-art Transformers that are well designed for MTSF.
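Time is one of the most significant characteristics of time series, yet it has received insufficient attention. Prior time-series forecasting research has mainly focused on mapping a past sub-series (lookback window) to a future series (forecast window), while the time of the series usually plays only an auxiliary role. Because of the point-wise processing within these windows, extrapolating a series into the longer-term future is inherently difficult. To overcome this barrier, we propose a brand-new time-series forecasting framework named DateFormer, which turns its attention to modeling time instead of following the above practice. Specifically, time series are first split into patches to supervise the learning of dynamic date representations with Date Encoder Representations from Transformers (DERT). These representations are then fed into a simple decoder to produce a coarser (or global) prediction, and they are also used to help the model seek valuable information from the lookback window in order to learn a refined (or local) prediction. DateFormer obtains the final result by summing these two parts. Our empirical studies on seven benchmarks show that the time-modeling approach is more effective for long-term series forecasting than sequence-modeling approaches. DateFormer yields state-of-the-art accuracy with a 40% relative improvement and broadens the maximum credible forecasting range to a half-year level.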
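Time series data is ubiquitous in research as well as in a wide variety of industrial applications. Effectively analyzing the available historical data and providing insights into the future allows us to make effective decisions. Recent research has witnessed the superior performance of Transformer-based architectures, especially in the regime of far-horizon time series forecasting. However, current state-of-the-art sparse Transformer architectures fail to couple their down-sampling and up-sampling procedures so as to produce outputs at a resolution similar to that of the input. We propose the Yformer model, based on a novel Y-shaped encoder-decoder architecture that (1) uses direct connections from each downscaled encoder layer to the corresponding upsampled decoder layer in a U-Net inspired architecture, (2) combines downscaling/upsampling with sparse attention to capture long-range effects, and (3) stabilizes the encoder-decoder stacks by adding an auxiliary reconstruction loss. Extensive experiments have been conducted with relevant baselines on four benchmark datasets, demonstrating an average improvement of 19.82 and 18.41 percent in MSE and 13.62 and 11.85 percent in MAE over the current state of the art for the univariate and multivariate settings, respectively.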
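A reliable and efficient representation of multivariate time series is crucial in various downstream machine learning tasks. In multivariate time series forecasting, each variable depends on its own historical values, and there are also inter-dependencies among variables. Models have to be designed to capture both the intra- and inter-relationships among the time series. To move towards this goal, we propose the Time Series Attention Transformer (TSAT) for multivariate time series representation learning. Using TSAT, we represent both the temporal information and the inter-dependencies of multivariate time series in terms of edge-enhanced dynamic graphs. The intra-series correlations are represented by nodes in a dynamic graph, and a self-attention mechanism is modified to capture the inter-series correlations by using the super-empirical mode decomposition (SMD) module. We applied the embedded dynamic graphs to time series forecasting problems, including two real-world datasets and two benchmark datasets. Extensive experiments show that TSAT clearly outperforms six state-of-the-art baseline methods across various forecasting horizons. We further visualize the embedded dynamic graphs to illustrate the graph representation power of TSAT. We share our code at https://github.com/radiantresearch/tsat.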
Transformer-based models have gained large popularity and demonstrated promising results in long-term time-series forecasting in recent years. In addition to learning attention in time domain, recent works also explore learning attention in frequency domains (e.g., Fourier domain, wavelet domain), given that seasonal patterns can be better captured in these domains. In this work, we seek to understand the relationships between attention models in different time and frequency domains. Theoretically, we show that attention models in different domains are equivalent under linear conditions (i.e., linear kernel to attention scores). Empirically, we analyze how attention models of different domains show different behaviors through various synthetic experiments with seasonality, trend and noise, with emphasis on the role of softmax operation therein. Both these theoretical and empirical analyses motivate us to propose a new method: TDformer (Trend Decomposition Transformer), that first applies seasonal-trend decomposition, and then additively combines an MLP which predicts the trend component with Fourier attention which predicts the seasonal component to obtain the final prediction. Extensive experiments on benchmark time-series forecasting datasets demonstrate that TDformer achieves state-of-the-art performance against existing attention-based models.
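The seasonal-trend decomposition step and the additive recombination described above can be sketched as follows. The moving-average trend estimator, the edge padding, and the function names are assumptions made for illustration; the trend and seasonal predictors themselves (an MLP and Fourier attention in the abstract) are deliberately left out.

```python
def moving_average(series, window):
    """Centered moving average with edge padding so the output keeps the input length."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that the average is centered")
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def decompose(series, window):
    """Split a series into a smooth trend and the remaining seasonal part."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


series = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0]
trend, seasonal = decompose(series, window=3)

# Additive recombination: separate predictors would forecast each part,
# and their outputs would be summed in the same way.
recombined = [t + s for t, s in zip(trend, seasonal)]
print(trend)
print(seasonal)
```

Because the decomposition is additive, any pair of component forecasts can be combined by a plain sum, which is what allows a simple model for the trend to sit next to a more expressive one for the seasonal part.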
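The performance of time series forecasting has recently been greatly improved by the introduction of transformers. In this paper, we propose a general multi-scale framework that can be applied to state-of-the-art transformer-based time series forecasting models, including Autoformer and Informer. By iteratively refining a forecasted time series at multiple scales with shared weights, architecture adaptations, and a specially designed normalization scheme, we are able to achieve significant performance improvements with minimal additional computational overhead. Through detailed ablation studies, we demonstrate the effectiveness of our proposed architectural and methodological innovations. Furthermore, our experiments on four public datasets show that the proposed multi-scale framework outperforms the corresponding baselines, with average improvements of 13% and 38% over Autoformer and Informer, respectively.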
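Various deep learning models, especially some recent Transformer-based approaches, have greatly improved state-of-the-art performance for long-term time series forecasting. However, these Transformer-based models suffer a severe deterioration in performance as the input length grows, which prevents them from using extended historical information. Moreover, these methods tend to handle complex examples in long-term forecasting by increasing model complexity, which often leads to a significant increase in computation and to less robust performance (e.g., overfitting). We propose a novel neural network architecture, called TreeDRNet, for more effective long-term forecasting. Inspired by robust regression, we introduce a doubly residual link structure to make predictions more robust. Building on the Kolmogorov-Arnold representation theorem, we explicitly introduce feature selection, model ensembling, and a tree structure to further exploit the extended input sequence, which improves the robustness and representation power of TreeDRNet. Unlike previous deep models for sequential forecasting, TreeDRNet is built entirely on multilayer perceptrons and therefore enjoys high computational efficiency. Our extensive empirical studies show that TreeDRNet is significantly more effective than state-of-the-art methods, reducing prediction errors by 20% to 40%. In particular, TreeDRNet is over 10 times more efficient than Transformer-based methods. The code will be released soon.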
Time series forecasting is an important problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situations. In this paper, we propose to tackle such forecasting problems with Transformer [1]. Although impressed by its performance in our preliminary study, we found two major weaknesses: (1) locality-agnostics: the point-wise dot-product self-attention in the canonical Transformer architecture is insensitive to local context, which can make the model prone to anomalies in time series; (2) memory bottleneck: the space complexity of the canonical Transformer grows quadratically with sequence length L, making it infeasible to model long time series directly. To solve these two issues, we first propose convolutional self-attention, which produces queries and keys with causal convolution so that local context can be better incorporated into the attention mechanism. Then, we propose the LogSparse Transformer with only O(L(log L)^2) memory cost, improving forecasting accuracy for time series with fine granularity and strong long-term dependencies under a constrained memory budget. Our experiments on both synthetic data and real-world datasets show that it compares favorably to the state of the art.
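One way to picture a log-sparse attention pattern is sketched below: each position attends to itself and to earlier positions at exponentially growing distances, so the number of attended cells grows only logarithmically with the position. The exact index rule and the name `logsparse_indices` are assumptions for illustration; the abstract itself does not spell out the pattern.

```python
def logsparse_indices(position):
    """Indices a cell at `position` attends to: itself plus cells 1, 2, 4, ... steps back."""
    indices = [position]
    step = 1
    while position - step >= 0:
        indices.append(position - step)
        step *= 2
    return sorted(indices)


# Attention pattern for a short sequence (assumed rule, for illustration only).
pattern = {pos: logsparse_indices(pos) for pos in range(9)}
for pos, attended in pattern.items():
    print(pos, attended)
```

Under this rule a position attends to O(log L) cells instead of all L earlier cells; distant information can still reach it by passing through intermediate cells across stacked layers, which is where an additional logarithmic factor in the total cost would come from.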
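In this paper, we present SSDNet, a novel deep learning approach for time series forecasting. SSDNet combines the Transformer architecture with state space models to provide probabilistic and interpretable forecasts, including trend and seasonality components and the previous time steps that are important for the prediction. The Transformer architecture is used to learn temporal patterns and to estimate the parameters of the state space model directly and efficiently, without the need for Kalman filters. We comprehensively evaluate the performance of SSDNet on five datasets, showing that SSDNet is an effective method in terms of accuracy and speed, that it outperforms state-of-the-art deep learning and statistical methods, and that it is able to provide meaningful trend and seasonality components.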
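Recent studies have shown that deep learning models such as RNNs and Transformers have brought significant performance gains for long-term forecasting of time series because they effectively utilize historical information. We found, however, that there is still great room for improvement in how to preserve historical information in neural networks while avoiding overfitting to the noise present in the history. Addressing this allows better use of the capabilities of deep learning models. To this end, we design a Frequency improved Legendre Memory model, or FiLM: it applies Legendre polynomial projections to approximate historical information, uses Fourier projection to remove noise, and adds a low-rank approximation to speed up computation. Our empirical studies show that the proposed FiLM significantly improves the accuracy of state-of-the-art models in multivariate and univariate long-term forecasting, by 20.3% and 22.6%, respectively. We also demonstrate that the representation module developed in this work can be used as a general plug-in to improve the long-term prediction performance of other deep learning modules. Code is available at https://github.com/tianzhou2011/film/.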
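Recent progress in neural forecasting has accelerated improvements in the performance of large-scale forecasting systems. Yet long-horizon forecasting remains a very difficult task. Two common challenges afflicting the task are the volatility of the predictions and their computational complexity. We introduce N-HiTS, a model that addresses both challenges by incorporating novel hierarchical interpolation and multi-rate data sampling techniques. These techniques enable the proposed method to assemble its predictions sequentially, emphasizing components with different frequencies and scales while decomposing the input signal and synthesizing the forecast. We prove that the hierarchical interpolation technique can efficiently approximate arbitrarily long horizons in the presence of smoothness. Additionally, we conduct extensive experiments on large-scale datasets from the long-horizon forecasting literature, demonstrating the advantages of our method over state-of-the-art methods: N-HiTS provides an average accuracy improvement of 16% over the latest Transformer architectures while reducing computation time by an order of magnitude (50 times). Our code is available at https://bit.ly/3jlibp8.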
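The idea of assembling a forecast from components of different resolution can be sketched as follows: each component is described by a few knot values, interpolated onto the full horizon, and the components are summed. Linear interpolation, the function name `interpolate`, and the knot values are assumptions for illustration; this is not the N-HiTS implementation.

```python
def interpolate(knots, horizon):
    """Linearly interpolate a short list of knot values onto `horizon` evenly spaced points."""
    if len(knots) == 1 or horizon == 1:
        return [knots[0]] * horizon
    out = []
    for h in range(horizon):
        pos = h * (len(knots) - 1) / (horizon - 1)  # fractional position among the knots
        left = min(int(pos), len(knots) - 2)
        frac = pos - left
        out.append((1.0 - frac) * knots[left] + frac * knots[left + 1])
    return out


horizon = 5
coarse = interpolate([0.0, 4.0], horizon)      # low-frequency component from 2 knots
fine = interpolate([0.0, 1.0, 0.0], horizon)   # higher-frequency component from 3 knots

# The final forecast is the sum of the interpolated components.
forecast = [c + f for c, f in zip(coarse, fine)]
print(forecast)
```

Each component needs far fewer parameters than the horizon has steps, so the output size of a block no longer scales with the forecast length, and smooth long horizons can be covered with a handful of knots.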
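Deep learning based forecasting methods have become the methods of choice in many applications of time series prediction or forecasting, often outperforming other approaches. Consequently, over the last few years, these methods have become ubiquitous in large-scale industrial forecasting applications and have consistently ranked among the best entries in forecasting competitions (e.g., M4 and M5). This practical success has further increased academic interest in understanding and improving deep forecasting methods. In this article, we provide an introduction to and overview of the field: we present important building blocks for deep forecasting in some depth, and then, using these building blocks, we survey the breadth of the recent deep forecasting literature.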