Are Transformers Suitable for Time Series Forecasting?

Is the Transformer really suited to time series forecasting? - 知乎 (zhihu.com)

  • [[Autoformer]] #card

    • Whether a class of methods suits a given problem depends mainly on whether the two have matching [[归纳偏置]] (inductive biases)

    • The standard Transformer performs point-to-point feature fusion; this discrete approach ignores the continuity of time series.

    • Extends the Transformer to series-to-series feature fusion based on auto-correlation theory

  • [[MTGNN]]

  • [[TPA-LSTM]]

Is the Transformer suitable for time series forecasting outside NLP? - 知乎 (zhihu.com) [[2023/03/05]]

  • 吴海旭

    • Whether a class of methods suits a given problem depends mainly on whether the two have matching [[归纳偏置]] (inductive biases)
      #card

      • RNN

        • Markov assumption on sequence dependencies

        • Parameters are shared along the sequence dimension, so it naturally handles variable-length data

      • Transformer

        • The attention mechanism handles variable-length data

        • Point-to-point temporal connections model long-term dependencies

      • Linear models

        • Cannot handle variable-length data

        • The parameter count grows with the sequence length

    • [[@Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting]] non-stationary time series forecasting

  • 科研汪老徐

    • [[@Are Transformers Effective for Time Series Forecasting?]] Transformer-based methods do poorly at short-term forecasting, so they emphasize long-term forecasting, where the compared baselines are autoregressive models whose error accumulation makes their long-term forecasts weak. #card

      • We do not believe that a model that cannot even handle short-term forecasting can excel at long-term forecasting; after all, both problems extract temporal features from the same stretch of historical data.
    • Moreover, we do not think time series forecasting needs to consider variable-length data, nor therefore that it requires architectures like the Transformer that support modeling variable-length inputs. #card

      • Looking at it the other way: if a model can handle histories of arbitrary length at once, it at least suggests that the temporal information distinguishing those different lengths is lost during modeling.
    • SCINet: extracts temporal features with CNNs

    • Self-attention is mechanically anti-ordering, which is exactly at odds with the purpose of time series modeling. #card

      • [[Informer]] Judging from the experimental results, vanilla Informer performs worst. The reason is simple: raw data carries no temporal semantics beyond magnitude, so extracting point-to-point "correlations" from raw values via self-attention is meaningless. The only part of Informer that helps at all is the direct encoding of timestamps, and even that effect is limited.
    • [[PatchTST]] What we have kept questioning is the value of attention for extracting temporal information; the paper extracts features inside each patch with linear layers, which resembles our approach. #card

      • Intuitively, learning relations between different patches matters little for most time series; an ablation study could check whether attention actually helps, i.e., compare against the two-layer Linear structure left after removing attention. Also, the paper uses RIN by default to reduce the impact of distribution shift, so comparing against NLinear's results would be fairer.

      • RIN: a normalization method against distribution shift [[@Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift]] (see the sketch below)
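
      • A minimal sketch of the RevIN idea in PyTorch (normalize each instance by its own statistics before the forecaster, restore them on the output); this is an illustration of the mechanism, not the authors' exact code:

```python
import torch
import torch.nn as nn

class RevIN(nn.Module):
    """Minimal sketch of Reversible Instance Normalization.

    Each series instance is normalized by its own mean/std before entering
    the forecaster, and the statistics are restored on the output, which
    counters train/test distribution shift.
    """
    def __init__(self, num_features: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Learnable per-variable affine parameters.
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor, mode: str) -> torch.Tensor:
        # x: (batch, length, num_features)
        if mode == "norm":
            self.mean = x.mean(dim=1, keepdim=True).detach()
            self.std = torch.sqrt(
                x.var(dim=1, keepdim=True, unbiased=False) + self.eps
            ).detach()
            return (x - self.mean) / self.std * self.gamma + self.beta
        if mode == "denorm":
            return (x - self.beta) / (self.gamma + self.eps) * self.std + self.mean
        raise ValueError(mode)
```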

  • 的泼墨佛给克呢 - 知乎 (zhihu.com)

    • It's true: Informer piles on flashy designs that look plausible but really don't work well in practice. #card

      • I also noticed that these Transformer models set d_model to 512 on the ETT dataset (only 7 variables), close to the 768 commonly used in NLP, which is absurd.

      • With only 7 variables, using such a large model (even with fewer layers) makes no sense

    • And Pyraformer really has nothing particularly outstanding; no idea how it got an oral... #card

      • In my experiments, concatenating the variable and time dimensions at the output and passing them through a linear layer does indeed improve results,

      • I suspect its gains come from exactly this, but the resulting linear layer is enormous, with even more parameters than the Transformer itself, so I don't see the point... Its code was recently open-sourced, so you can check how it is implemented (a sketch of the idea follows below).
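
      • A hypothetical sketch of the flatten-then-linear output head described above; `FlattenLinearHead` and its sizes are illustrative, not Pyraformer's actual code. The final comment shows why the parameter count explodes:

```python
import torch
import torch.nn as nn

class FlattenLinearHead(nn.Module):
    """Flatten (time, variable) dims and map to the full forecast
    with one linear layer, as described in the comment above."""
    def __init__(self, in_len: int, out_len: int, n_vars: int):
        super().__init__()
        self.out_len, self.n_vars = out_len, n_vars
        self.proj = nn.Linear(in_len * n_vars, out_len * n_vars)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_len, n_vars)
        b = x.size(0)
        y = self.proj(x.reshape(b, -1))
        return y.reshape(b, self.out_len, self.n_vars)

# e.g. in_len=96, out_len=720, n_vars=7:
# the weight matrix is (96*7) x (720*7) ~ 3.4M parameters for the head alone.
```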


@Transformers in Time Series: A Survey

[[Abstract]]

  • Transformers have achieved superior performances in many tasks in natural language processing and computer vision, which also triggered great interests in the time series community.

  • Among multiple advantages of transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications.

    • In this paper, we systematically review transformer schemes for time series modeling by highlighting their strengths as well as limitations. In particular, we examine the development of time series transformers in two perspectives.

      • From the perspective of network structure, we summarize the adaptations and modifications that have been made to transformers in order to accommodate the challenges in time series analysis.

      • From the perspective of applications, we categorize time series transformers based on common tasks including forecasting, anomaly detection, and classification.

    • Empirically, we perform robust analysis, model size analysis, and seasonal-trend decomposition analysis to study how transformers perform in time series.

    • Finally, we discuss and suggest future directions to provide useful research guidance.

  • A corresponding resource list, which will be continuously updated, can be found in the GitHub repository.

[[Attachments]]

Input Encoding and Positional Encoding

  • Absolute Positional Encoding

  • Relative Positional Encoding

  • Hybrid Positional Encoding
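
  • For reference, the standard sinusoidal form of the absolute variant above (from the original Transformer), as a self-contained sketch:

```python
import math
import torch

def sinusoidal_position_encoding(length: int, d_model: int) -> torch.Tensor:
    """Standard absolute positional encoding ("Attention Is All You Need").

    Returns a (length, d_model) tensor that is added to the input
    embeddings so the otherwise order-agnostic attention can see
    timestep positions. Assumes d_model is even.
    """
    position = torch.arange(length, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(length, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dims
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dims
    return pe
```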

Network Modifications for Time Series

  • [[LogTrans]] [Li et al., 2019] and [[Pyraformer]] explicitly introduce a sparsity bias

  • [[@Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting]] and [[FEDformer]] remove selected values from the self-attention matrix
  • Architecture Level

    • renovate transformer

    • hierarchical architecture

Applications of Time Series Transformers

  • Forecasting

    • Time Series Forecasting

      • [[LogTrans]]

        • proposed convolutional self-attention by employing causal convolutions to generate queries and keys in the self-attention layer (see the sketch below)

        • a LogSparse mask
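
        • A rough PyTorch sketch of the convolutional query/key generation above; the kernel size and shapes are illustrative assumptions, not LogTrans's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvQK(nn.Module):
    """Queries/keys from a causal 1-D convolution (left padding only),
    so each position carries its local left context instead of a
    single point value before attention scores are computed."""
    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        self.pad = kernel_size - 1  # pad the left side only => causal
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size)

    def forward(self, x: torch.Tensor):
        # x: (batch, length, d_model) -> convolve over the time axis
        x = x.transpose(1, 2)
        x = F.pad(x, (self.pad, 0))
        q = self.q_conv(x).transpose(1, 2)
        k = self.k_conv(x).transpose(1, 2)
        return q, k  # each (batch, length, d_model)
```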

      • [[@Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting]]

      • AST [[Adversarial sparse transformer for time series forecasting]]

        • trains a sparse Transformer for time series forecasting with a generative-adversarial encoder-decoder framework

        • adversarial training directly shapes the network's output to improve forecasts and avoid the error accumulation of step-by-step prediction

          • directly shaping the output distribution of network to avoid the error accumulation through one-step ahead inference
      • [[Autoformer]]

        • simple seasonal-trend decomposition architecture

        • an auto-correlation mechanism working as an attention module, with $O(L \log L)$ complexity

          • measures the time-delay similarity between input signals and aggregates the top-k similar sub-series to produce the output
      • [[FEDformer]]

        • applies attention in the frequency domain via the [[Fourier transform]] and [[Wavelet transform]]

          • linear complexity
      • [[@Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting]]

        • multi-horizon forecasting model with static covariate encoders, gating feature selection and temporal self-attention decoder
      • [[SSDNet]] [[ProTran]]

        • combine Transformer with state space models to provide probabilistic forecasts
      • [[Pyraformer]]

        • hierarchical pyramidal attention module following a binary-tree path

      • [[Aliformer]]

        • Knowledge-guided attention
    • Spatio-Temporal Forecasting [[Traffic Flow Forecasting]]

      • Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting

        • self-attention module to capture temporal dependencies

        • graph neural network module to capture spatial dependencies

      • Spatial-temporal Transformer

        • a spatial Transformer assists the graph convolution network in capturing spatial dependencies
      • Spatio-temporal graph Transformer

        • attention-based graph convolution mechanism
    • Event Forecasting

      • temporal point processes (TPP)
  • Anomaly Detection

  • Classification

    • [[GTN]]

Experimental Evaluation and Discussion

Model robustness, model size, and the ability to capture seasonality and trend in time series

  • robustness analysis, model size analysis, and seasonal-trend decomposition analysis

  • seasonal-trend decomposition is a vital part of Transformer-based time series forecasting

  • Adding the proposed moving average trend decomposition architecture to every model improves performance over the original model

Future Research Opportunities

  • [[inductive bias]] for Time Series Transformers

    • Training Transformers requires large amounts of data to avoid overfitting.

    • Time series data carry seasonal/periodic and trend patterns

    • Bring the understanding of time series modeling and task-specific characteristics into the Transformer as inductive biases

  • [[GNN]]

    • Strengthen the ability to model spatial dependencies and relations across multiple dimensions
  • [[预训练]] (pre-training)

    • Pre-trained Transformers for time series so far concentrate on classification tasks
  • [[Neural architecture search]]

    • How to build efficient Transformer architectures

Ref


@Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

[[Attachments]]

Key Information

Core Contributions

  • Autoformer as a novel decomposition architecture with an Auto-Correlation mechanism
    ls-type:: annotation
    hl-page:: 1
    hl-color:: yellow
    [[Auto-Correlation Mechanism]]: replaces point-wise attention connections with series-level connections at lower complexity

    • conducts the dependencies discovery and representation aggregation at the sub-series level.
      ls-type:: annotation
      hl-page:: 1
      hl-color:: yellow
  • Decomposition Architecture: a deep decomposition architecture that separates more predictable components out of complex temporal patterns

    • reasons about intricate temporal patterns
      ls-type:: annotation
      hl-page:: 2
      hl-color:: yellow

      • process the complex time series and extract more predictable components.
        ls-type:: annotation
        hl-page:: 2
        hl-color:: yellow

      • Conventional usage decomposes only the past data, overlooking the potential future interactions among decomposed components

        • This common usage limits the capabilities of decomposition and overlooks the potential future interactions among decomposed components.
          ls-type:: annotation
          hl-page:: 2
          hl-color:: blue
      • decomposition can ravel out the entangled temporal patterns and highlight the inherent properties of time series
        ls-type:: annotation
        hl-page:: 2
        hl-color:: blue

      • Decompose into sub-series and build a series-level connection from the process similarity implied by periodicity: sub-series at the same phase position among periods often present similar temporal processes
        ls-type:: annotation
        hl-page:: 2
        hl-color:: yellow

      • Progressively decompose the hidden series throughout the whole forecasting process, covering both the past series and intermediate predictions

        • decompose the hidden series throughout the whole forecasting process, including both the past series and the predicted intermediate results.
          ls-type:: annotation
          hl-page:: 3
          hl-color:: yellow

Core Problems

  • Trend information in the raw series is entangled, so temporal dependencies cannot be discovered directly over long series

    • unreliable to discover the temporal dependencies directly from the long-term time series because the dependencies can be obscured by entangled temporal patterns.
      ls-type:: annotation
      hl-page:: 1
      hl-color:: green

    • handling intricate temporal patterns and breaking the bottleneck of computation efficiency and information utilization.
      ls-type:: annotation
      hl-page:: 3
      hl-color:: blue

    • The length of the series to predict far exceeds the input length

  • Quadratic complexity of the Transformer

    • Prior methods attempt sparse self-attention: improving self-attention to a sparse version
      ls-type:: annotation
      hl-page:: 1
      hl-color:: green

      • Sparse attention loses information and becomes the bottleneck for long-term forecasting
    • these models still utilize the point-wise representation aggregation
      ls-type:: annotation
      hl-page:: 1
      hl-color:: blue

  • Point-wise aggregation in the Transformer

    • self-attention captures dependencies between time points

      • hard to discover reliable temporal dependencies directly

Related Work

  • Prior methods concentrate on recurrent connections, temporal attention or causal convolution.
    ls-type:: annotation
    hl-page:: 2
    hl-color:: blue

    • [[DeepAR]] autoregression + RNN to model the probabilistic distribution of future series

      • combines autoregressive methods and RNNs to model the probabilistic distribution of future series.
        ls-type:: annotation
        hl-page:: 2
        hl-color:: blue
    • [[LSTNet]] CNNs + recurrent-skip connections to capture short-term and long-term temporal patterns

    • [[TCN]]

  • Transformer 类方法

    • [[Reformer]] [[locality-sensitive hashing attention]] #mark/paper

    • [[Informer]] KL + [[ProbSparse Attention]]

  • Decomposition of Time Series

    • Decompose the raw time series into several series, each of which is easier to predict

      • each representing one of the underlying categories of patterns that are more predictable.
        ls-type:: annotation
        hl-page:: 3
        hl-color:: blue
    • [[Prophet]] with trend-seasonality decomposition

    • [[N-BEATS]] with basis expansion

    • [[DeepGLO]] with matrix decomposition

    • Drawbacks

      • Limited by the plainness of the decomposition

        • limited by the plain decomposition effect of historical series
          ls-type:: annotation
          hl-page:: 3
          hl-color:: yellow
      • Overlooks hierarchical interactions

        • overlooks the hierarchical interaction between the underlying patterns of series in the long-term future.
          ls-type:: annotation
          hl-page:: 3
          hl-color:: yellow

        • Because the future is unknown, common methods first decompose the past series and then forecast each component separately; this caps the forecast at the quality of the decomposition and ignores interactions among the future components.

Proposed Solution

  • Decomposition Architecture
    ls-type:: annotation
    hl-page:: 3
    hl-color:: yellow

    • [:span]
      ls-type:: annotation
      hl-page:: 4
      hl-color:: yellow

tags:: #[[Model Architecture]] [[Encoder-Decoder]]

+ series decomposition block

ls-type:: annotation
hl-page:: 3
hl-color:: yellow
retains the seasonal part

  + separate the series into trend-cyclical and seasonal parts.

ls-type:: annotation
hl-page:: 3
hl-color:: yellow

  + During forecasting, the model alternates between refining the prediction and decomposing the series, progressively separating the trend and seasonal parts from the hidden variables

  + extract the long-term stationary trend from predicted intermediate hidden variables progressively.

ls-type:: annotation
hl-page:: 3
hl-color:: yellow

  + Uses a [[Moving Average]] to smooth out periodic fluctuations and highlight the long-term trends (see the sketch after the equations below)

ls-type:: annotation
hl-page:: 3
hl-color:: yellow

    + $\mathcal{X}_{\mathrm{s}}, \mathcal{X}_{\mathrm{t}}=\operatorname{SeriesDecomp}(\mathcal{X})$

      + $\begin{aligned} & \mathcal{X}_{\mathrm{t}}=\operatorname{Avg} \operatorname{Pool}(\operatorname{Padding}(\mathcal{X})) \\ & \mathcal{X}_{\mathrm{s}}=\mathcal{X}-\mathcal{X}_{\mathrm{t}}\end{aligned}$

      + $\mathcal{X}_{\mathrm{s}}$: the seasonal part

      + $\mathcal{X}_{\mathrm{t}}$: the trend-cyclical part
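
  + A PyTorch sketch of this series decomposition block, following the two equations above: a moving average with replicated end-padding gives the trend, and the residual is the seasonal part. The default kernel size is an assumption:

```python
import torch
import torch.nn as nn

class SeriesDecomp(nn.Module):
    """Series decomposition: X_t = AvgPool(Padding(X)), X_s = X - X_t."""
    def __init__(self, kernel_size: int = 25):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size, stride=1)

    def forward(self, x: torch.Tensor):
        # x: (batch, length, channels)
        # Replicate both ends so the moving average preserves the length.
        front = x[:, :1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
        end = x[:, -1:, :].repeat(1, self.kernel_size // 2, 1)
        padded = torch.cat([front, x, end], dim=1)
        trend = self.avg(padded.transpose(1, 2)).transpose(1, 2)
        seasonal = x - trend
        return seasonal, trend
```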

+ [[Encoder]]

  + The encoder takes the past $I$ steps as input: $\mathcal{X}_{\mathrm{en}} \in \mathbb{R}^{I \times d}$

  + Models the seasonal part and progressively eliminates the trend (recovered in the decoder via accumulation): focuses on the seasonal part modeling

ls-type:: annotation
hl-page:: 4
hl-color:: yellow

    + be used as the cross information to help the decoder refine prediction results

ls-type:: annotation
hl-page:: 4
hl-color:: yellow

  + Flow (see the encoder-layer sketch after the equations)

    + "_" is the eliminated trend part

ls-type:: annotation
hl-page:: 4
hl-color:: red

    + $\begin{aligned} & \mathcal{S}_{\text {en }}^{l, 1},_{-}=\operatorname{SeriesDecomp}\left(\text { Auto-Correlation }\left(\mathcal{X}_{\text {en }}^{l-1}\right)+\mathcal{X}_{\text {en }}^{l-1}\right) \\ & \mathcal{S}_{\text {en }}^{l, 2},_{-}=\operatorname{SeriesDecomp}\left(\text { FeedForward }\left(\mathcal{S}_{\text {en }}^{l, 1}\right)+\mathcal{S}_{\text {en }}^{l, 1}\right)\end{aligned}$
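
  + Putting the two encoder equations together, a sketch of one encoder layer; `auto_correlation` is a placeholder for the Auto-Correlation block described later (any module mapping (batch, length, d_model) to the same shape fits), and `SeriesDecomp` is the sketch above:

```python
import torch.nn as nn

class AutoformerEncoderLayer(nn.Module):
    """One encoder layer per the two equations above. Only the seasonal
    output S is kept at each step; the trend (the "_") is discarded."""
    def __init__(self, auto_correlation: nn.Module, d_model: int,
                 d_ff: int, kernel_size: int = 25):
        super().__init__()
        self.attn = auto_correlation
        self.decomp1 = SeriesDecomp(kernel_size)  # from the sketch above
        self.decomp2 = SeriesDecomp(kernel_size)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        s, _ = self.decomp1(self.attn(x) + x)  # S_en^{l,1}
        s, _ = self.decomp2(self.ff(s) + s)    # S_en^{l,2}
        return s
```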

+ [[Decoder]] models the trend and seasonal parts separately through decomposition

  + The second half of the past series + padding

    + $\begin{aligned} \mathcal{X}_{\text {ens }}, \mathcal{X}_{\text {ent }} & =\operatorname{SeriesDecomp}\left(\mathcal{X}_{\text {en } \frac{I}{2}: I}\right) \\ \mathcal{X}_{\text {des }} & =\operatorname{Concat}\left(\mathcal{X}_{\text {ens }}, \mathcal{X}_0\right) \\ \mathcal{X}_{\text {det }} & =\operatorname{Concat}\left(\mathcal{X}_{\text {ent }}, \mathcal{X}_{\text {Mean }}\right),\end{aligned}$

    + seasonal part $\mathcal{X}_{\mathrm{des}} \in \mathbb{R}^{\left(\frac{I}{2}+O\right) \times d}$

    + trend-cyclical part $\mathcal{X}_{\mathrm{det}} \in \mathbb{R}^{\left(\frac{I}{2}+O\right) \times d}$

  + the accumulation structure for trend-cyclical components

ls-type:: annotation
hl-page:: 4
hl-color:: yellow

    + Extracting the latent trend from intermediate hidden variables lets the model progressively refine the trend forecast and remove interference, easing the discovery of period-based dependencies in Auto-Correlation.

    + For the seasonal part, the auto-correlation mechanism exploits the series' periodicity to aggregate sub-series with similar processes across different periods;

    + Note that the model extracts the potential trend from the intermediate hidden variables during the decoder, allowing Autoformer to progressively refine the trend prediction and eliminate interference information for period-based dependencies discovery in Auto-Correlation. 

ls-type:: annotation
hl-page:: 4
hl-color:: red

  + the stacked Auto-Correlation mechanism for seasonal component

ls-type:: annotation
hl-page:: 4
hl-color:: yellow

  + Flow

    + $\begin{aligned} \mathcal{S}_{\mathrm{de}}^{l, 1}, \mathcal{T}_{\mathrm{de}}^{l, 1} & =\operatorname{SeriesDecomp}\left(\text { Auto-Correlation }\left(\mathcal{X}_{\mathrm{de}}^{l-1}\right)+\mathcal{X}_{\mathrm{de}}^{l-1}\right) \\ \mathcal{S}_{\mathrm{de}}^{l, 2}, \mathcal{T}_{\mathrm{de}}^{l, 2} & =\operatorname{SeriesDecomp}\left(\text { Auto-Correlation }\left(\mathcal{S}_{\mathrm{de}}^{l, 1}, \mathcal{X}_{\mathrm{en}}^N\right)+\mathcal{S}_{\mathrm{de}}^{l, 1}\right) \\ \mathcal{S}_{\mathrm{de}}^{l, 3}, \mathcal{T}_{\mathrm{de}}^{l, 3} & =\operatorname{SeriesDecomp}\left(\text { FeedForward }\left(\mathcal{S}_{\mathrm{de}}^{l, 2}\right)+\mathcal{S}_{\mathrm{de}}^{l, 2}\right) \end{aligned}$

    + The trend is extracted progressively from the predicted hidden variables by accumulation (see the sketch after the equation)

      + ${\mathcal{T}_{\mathrm{de}}^l =\mathcal{T}_{\mathrm{de}}^{l-1}+\mathcal{W}_{l, 1} * \mathcal{T}_{\mathrm{de}}^{l, 1}+\mathcal{W}_{l, 2} * \mathcal{T}_{\mathrm{de}}^{l, 2}+\mathcal{W}_{l, 3} * \mathcal{T}_{\mathrm{de}}^{l, 3}}$
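
    + A sketch of the trend accumulation equation above; the $\mathcal{W}_{l,i} *$ projections are stood in for by per-channel linear maps (an assumption for brevity; the released code uses convolutional projections):

```python
import torch.nn as nn

class TrendAccumulate(nn.Module):
    """Running trend update: T_de^l = T_de^{l-1} + sum_i W_{l,i} * T_de^{l,i}."""
    def __init__(self, d_model: int, c_out: int):
        super().__init__()
        # One projection per decomposition step in the decoder layer.
        self.projs = nn.ModuleList(
            [nn.Linear(d_model, c_out, bias=False) for _ in range(3)]
        )

    def forward(self, trend, t1, t2, t3):
        # trend: accumulated trend so far; t1..t3: trends from the three
        # SeriesDecomp steps of the decoder layer.
        for proj, t in zip(self.projs, (t1, t2, t3)):
            trend = trend + proj(t)
        return trend
```
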
  • [[Auto-Correlation Mechanism]]

    • [:span]
      ls-type:: annotation
      hl-page:: 5
      hl-color:: yellow

    • Efficient series-level connections that expand information utility

    • Period-based dependencies discovery

      • same phase position among periods naturally provides similar sub-processes.
        ls-type:: annotation
        hl-page:: 5
        hl-color:: yellow

      • [[Stochastic process theory]]: the [[autocorrelation]] of a discrete-time process

        • $\mathcal{R}_{\mathcal{X} \mathcal{X}}(\tau)=\lim _{L \rightarrow \infty} \frac{1}{L} \sum_{t=1}^L \mathcal{X}_t \mathcal{X}_{t-\tau}$

        • $\mathcal{R}_{\mathcal{X} \mathcal{X}}(\tau)$ measures the similarity between the series $\{ \mathcal{X}_t \}$ and its $\tau$-lagged version $\{ \mathcal{X}_{t - \tau} \}$

        • Treat this time-delay similarity as an unnormalized confidence of a period estimate, i.e., the confidence that the period length is $\tau$ is $\mathcal{R}(\tau)$

          • If the period is $\tau$, then $\mathcal{X}_{\tau: L-1}$ and $\mathcal{X}_{0: L-\tau-1}$ will be highly similar
      • choose the most possible k period lengths
        ls-type:: annotation
        hl-page:: 5
        hl-color:: yellow

    • Time delay aggregation

      • This step rolls the series based on the selected time delays and aggregates them
        ls-type:: annotation
        hl-page:: 5
        hl-color:: yellow

      • aggregates information from similar sub-series

      • Flow (see the sketch after the aggregation equation)

        • Select the top $k = c \times \log L$ delays

          • $\tau_1, \cdots, \tau_k=\underset{\tau \in\{1, \cdots, L\}}{\arg \operatorname{Topk}}\left(\mathcal{R}_{\mathcal{Q}, \mathcal{K}}(\tau)\right)$
        • Gather the correlations at the chosen delays and normalize them with softmax

          • $\widehat{\mathcal{R}}_{\mathcal{Q}, \mathcal{K}}\left(\tau_1\right), \cdots, \widehat{\mathcal{R}}_{\mathcal{Q}, \mathcal{K}}\left(\tau_k\right)=\operatorname{SoftMax}\left(\mathcal{R}_{\mathcal{Q}, \mathcal{K}}\left(\tau_1\right), \cdots, \mathcal{R}_{\mathcal{Q}, \mathcal{K}}\left(\tau_k\right)\right)$
        • $\operatorname{Roll}$ aligns the information: $\mathcal{X}_{0: L-\tau-1}$ is moved to the front of the series, so $\mathcal{X}_{0: L-\tau-1}$ and $\mathcal{X}_{\tau: L-1}$ hold similar trend information

          • during which elements that are shifted beyond the first position are re-introduced at the last position
            ls-type:: annotation
            hl-page:: 5
            hl-color:: red

          • $\text { Auto-Correlation }(\mathcal{Q}, \mathcal{K}, \mathcal{V})=\sum_{i=1}^k \operatorname{Roll}\left(\mathcal{V}, \tau_i\right) \widehat{\mathcal{R}}_{\mathcal{Q}, \mathcal{K}}\left(\tau_i\right)$
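
          • A sketch of this aggregation for a single head, assuming the full score vector $\mathcal{R}(\tau)$ is already computed (see the FFT sketch further below); sharing the top-k delays across the batch is a simplification in the spirit of the training-time variant:

```python
import torch

def time_delay_agg(values: torch.Tensor, corr: torch.Tensor, k: int) -> torch.Tensor:
    """Time delay aggregation for one head.

    values: (batch, length, channels) -- V
    corr:   (batch, length) -- R(tau) for every delay tau
    Pick the top-k delays, softmax their scores, then roll V by each
    delay and sum the weighted copies, per the equation above.
    """
    _, delays = torch.topk(corr.mean(dim=0), k)       # (k,) shared delays
    weights = torch.softmax(corr[:, delays], dim=-1)  # (batch, k)
    out = torch.zeros_like(values)
    for i in range(k):
        rolled = torch.roll(values, shifts=-int(delays[i]), dims=1)
        out = out + rolled * weights[:, i].view(-1, 1, 1)
    return out
```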

      • Multi-head

        • $\begin{aligned} \text { MultiHead }(\mathcal{Q}, \mathcal{K}, \mathcal{V}) & =\mathcal{W}_{\text {output }} * \text { Concat }\left(\operatorname{head}_1, \cdots, \operatorname{head}_h\right) \\ \text { where head }_i & =\text { Auto-Correlation }\left(\mathcal{Q}_i, \mathcal{K}_i, \mathcal{V}_i\right)\end{aligned}$
      • Complexity $\mathcal{O}(L \log L)$

        • Requires the correlation for every $\tau \in [1, L)$

        • By the Wiener-Khinchin theorem, the autocorrelation can be computed with the [[快速傅里叶变换]] Fast Fourier Transforms
          ls-type:: annotation
          hl-page:: 6
          hl-color:: yellow
          (see the sketch below)
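
        • A sketch of the FFT route: by Wiener-Khinchin, transforming Q and K, multiplying by the conjugate spectrum, and inverting yields $\mathcal{R}_{\mathcal{Q},\mathcal{K}}(\tau)$ for all $\tau$ at once in $O(L \log L)$:

```python
import torch

def autocorrelation_fft(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Compute R_{Q,K}(tau) for all tau via the Wiener-Khinchin theorem.

    q, k: (batch, length, channels). Returns (batch, length, channels),
    where entry tau along dim=1 holds R(tau).
    """
    fq = torch.fft.rfft(q, dim=1)
    fk = torch.fft.rfft(k, dim=1)
    spectrum = fq * torch.conj(fk)  # cross power spectrum
    # Inverse FFT of the spectrum is the (cross-)autocorrelation.
    return torch.fft.irfft(spectrum, n=q.size(1), dim=1)
```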

    • Comparison with other methods

      • [:span]
        ls-type:: annotation
        hl-page:: 6
        hl-color:: yellow
    • Efficient series-level connections

    • self-attention family only calculates the relation between scattered points
      ls-type:: annotation
      hl-page:: 6
      hl-color:: blue

    • we adopt the time delay block to aggregate the similar sub-series from underlying periods.
      ls-type:: annotation
      hl-page:: 6
      hl-color:: yellow

Experimental Findings

  • Settings

    • ADAM optimizer + early stopping

    • Autoformer contains 2 encoder layers and 1 decoder layer.
      ls-type:: annotation
      hl-page:: 7
      hl-color:: yellow

  • Baselines

    • Informer [48], Reformer [23], LogTrans [26], two RNN-based models: LSTNet [25], LSTM [17] and CNN-based TCN [4] as baselines.
      ls-type:: annotation
      hl-page:: 7
      hl-color:: yellow

    • N-BEATS [29], DeepAR [34], Prophet [39] and ARIMA
      ls-type:: annotation
      hl-page:: 7
      hl-color:: yellow

  • Results

    • Setup: fix the input length and vary the prediction length (e.g., use the previous 96 steps to predict the next 96)

      • we fix the input length and evaluate models with a wide range of prediction lengths: 96, 192, 336, 720.
        ls-type:: annotation
        hl-page:: 8
        hl-color:: yellow
    • [[multivariate]]

      • we can also find that the performance of Autoformer changes quite steadily as the prediction length O increases
        ls-type:: annotation
        hl-page:: 8
        hl-color:: yellow

      • [:span]
        ls-type:: annotation
        hl-page:: 7
        hl-color:: yellow

    • Univariate results
      ls-type:: annotation
      hl-page:: 8
      hl-color:: yellow

      • This situation of ARIMA can be benefited from its inherent capacity for non-stationary economic data but is limited by the intricate temporal patterns of real-world series.
        ls-type:: annotation
        hl-page:: 8
        hl-color:: yellow

      • [:span]
        ls-type:: annotation
        hl-page:: 8
        hl-color:: yellow

  • [[Ablation Study]]

    • Decomposition architecture
      ls-type:: annotation
      hl-page:: 8
      hl-color:: yellow

      • Generalizes well: other models also improve when the decomposition structure is added, and the gains grow as the forecast horizon lengthens

        • our method can generalize to other models and release the capacity of other dependencies learning mechanisms, alleviate the distraction caused by intricate patterns
          ls-type:: annotation
          hl-page:: 9
          hl-color:: yellow
      • Comparing the deep decomposition architecture against decomposing first and then predicting with two separate models: the latter has more parameters yet performs worse.

      • [:span]
        ls-type:: annotation
        hl-page:: 8
        hl-color:: yellow

    • Auto-Correlation vs. self-attention family
      ls-type:: annotation
      hl-page:: 9
      hl-color:: yellow

      • Outperforms full attention, the gain coming from series-level modeling

      • Can forecast longer sequences

      • [:span]
        ls-type:: annotation
        hl-page:: 9
        hl-color:: yellow

  • Model Analysis
    ls-type:: annotation
    hl-page:: 9
    hl-color:: yellow

    • time series decomposition

      • As the number of series decomposition blocks increases, the learned trend moves ever closer to the ground truth, and the seasonal part captures the series' fluctuations better.

      • [:span]
        ls-type:: annotation
        hl-page:: 9
        hl-color:: yellow

    • Dependencies learning

      • Autoformer can discover the relevant information more sufficiently and precisely.
        ls-type:: annotation
        hl-page:: 9
        hl-color:: yellow

      • The auto-correlation mechanism correctly uncovers the falling phase of each period, with neither false nor missed detections, while the attention mechanism shows both errors and omissions

      • [:span]
        ls-type:: annotation
        hl-page:: 10
        hl-color:: yellow

    • Complex seasonality modeling
      ls-type:: annotation
      hl-page:: 9
      hl-color:: yellow

      • The learned delays are meaningful: Autoformer can capture the complex seasonalities of real-world series from deep representations and further provide a human-interpretable prediction.
        ls-type:: annotation
        hl-page:: 10
        hl-color:: yellow

      • Peaks indicate corresponding periodicities

      • [:span]
        ls-type:: annotation
        hl-page:: 10
        hl-color:: yellow

    • Efficiency analysis

Summary

[[Autoformer Code]]