Temporal Fusion Decoder

Temporal Fusion Decoder
ls-type:: annotation
hl-page:: 10
hl-color:: yellow

Locality Enhancement with Sequence-to-Sequence Layer
ls-type:: annotation
hl-page:: 10
hl-color:: yellow
Before the TFD, a [[Seq2Seq]] layer is applied to enhance the inputs #card

  • points of significance are often identified in relation to their surrounding values – such as anomalies, change-points or cyclical patterns.
    ls-type:: annotation
    hl-page:: 10
    hl-color:: yellow

  • in attention-based architectures, constructing features that use pattern information on top of point-wise values can improve performance

    • [[@Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting]] single convolutional layer for locality enhancement
      ls-type:: annotation
      hl-page:: 10
      hl-color:: green
  • TFT instead uses an LSTM encoder-decoder

    • encoder input: $\tilde{\boldsymbol{\xi}}_{t-k:t}$, the features of the past $k$ steps

    • decoder input: $\tilde{\boldsymbol{\xi}}_{t+1:t+\tau_{\max}}$, the known future features over the next $\tau_{\max}$ steps

    • the temporal features generated by the encoder and decoder are denoted $\boldsymbol{\phi}(t, n) \in\left\{\boldsymbol{\phi}(t,-k), \ldots, \boldsymbol{\phi}\left(t, \tau_{\max }\right)\right\}$

      • where $n$ is the position index
    • before being fed into the TFD, the temporal features pass through one more gated non-linear transformation (a sketch follows this list)

      • $\tilde{\boldsymbol{\phi}}(t, n)=\operatorname{LayerNorm}\left(\tilde{\boldsymbol{\xi}}_{t+n}+\operatorname{GLU}_{\tilde{\phi}}(\boldsymbol{\phi}(t, n))\right)$
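
A minimal PyTorch sketch of this locality-enhancement step, assuming one shared hidden width `d_model`; the names `GatedSkip` and `LocalityEnhancement` are mine, and seeding the LSTM states from the static context vectors ($\boldsymbol{c}_c$, $\boldsymbol{c}_h$ in the paper) is omitted for brevity.

```python
import torch
import torch.nn as nn

class GatedSkip(nn.Module):
    """Gate-add-norm: LayerNorm(x + GLU(Linear(sub)))."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 2 * d_model)  # GLU halves the width again
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, sub):  # x: skip input, sub: sub-layer output
        return self.norm(x + nn.functional.glu(self.proj(sub), dim=-1))

class LocalityEnhancement(nn.Module):
    """LSTM encoder over past inputs, LSTM decoder over known future inputs."""
    def __init__(self, d_model: int):
        super().__init__()
        self.encoder = nn.LSTM(d_model, d_model, batch_first=True)
        self.decoder = nn.LSTM(d_model, d_model, batch_first=True)
        self.gate = GatedSkip(d_model)

    def forward(self, past, future):  # (B, k, d_model), (B, tau_max, d_model)
        enc_out, state = self.encoder(past)         # final state seeds the decoder
        dec_out, _ = self.decoder(future, state)
        phi = torch.cat([enc_out, dec_out], dim=1)  # phi(t,-k) ... phi(t,tau_max)
        xi = torch.cat([past, future], dim=1)       # skip input xi~(t+n)
        return self.gate(xi, phi)                   # phi~(t,n)
```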

Static Enrichment Layer
ls-type:: annotation
hl-page:: 10
hl-color:: yellow
Enrich the temporal features with static covariates #card

  • static covariates often have a significant influence on the temporal dynamics
    ls-type:: annotation
    hl-page:: 10
    hl-color:: yellow

  • $\boldsymbol{\theta}(t, n)=\operatorname{GRN}_\theta\left(\tilde{\boldsymbol{\phi}}(t, n), \boldsymbol{c}_e\right)$ (see the GRN sketch below)

    • $\boldsymbol{c}_e$ is a context vector from a static covariate encoder
      ls-type:: annotation
      hl-page:: 10
      hl-color:: yellow
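
The enrichment relies on the paper's Gated Residual Network, $\operatorname{GRN}(\boldsymbol{a}, \boldsymbol{c})=\operatorname{LayerNorm}\left(\boldsymbol{a}+\operatorname{GLU}(\boldsymbol{\eta}_1)\right)$ with $\boldsymbol{\eta}_1=\boldsymbol{W}_1 \boldsymbol{\eta}_2+\boldsymbol{b}_1$ and $\boldsymbol{\eta}_2=\operatorname{ELU}\left(\boldsymbol{W}_2 \boldsymbol{a}+\boldsymbol{W}_3 \boldsymbol{c}+\boldsymbol{b}_2\right)$. A sketch, with every layer kept at width `d_model` as a simplification:

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Gated Residual Network with an optional static context input."""
    def __init__(self, d_model: int):
        super().__init__()
        self.w2 = nn.Linear(d_model, d_model)              # on the primary input a
        self.w3 = nn.Linear(d_model, d_model, bias=False)  # on the context c
        self.w1 = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, 2 * d_model)        # feeds the GLU
        self.norm = nn.LayerNorm(d_model)

    def forward(self, a, c=None):
        eta2 = nn.functional.elu(self.w2(a) + (self.w3(c) if c is not None else 0))
        eta1 = self.w1(eta2)
        return self.norm(a + nn.functional.glu(self.gate(eta1), dim=-1))
```

Static enrichment is then `GRN(d_model)(phi_tilde, c_e.unsqueeze(1))`, broadcasting the per-series context $\boldsymbol{c}_e$ across the time dimension (shape conventions assumed).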

Temporal Self-Attention Layer
ls-type:: annotation
hl-page:: 11
hl-color:: yellow
Learn long-range dependencies across the time series and provide interpretability #card

  • [[Interpretable Multi-Head Attention]]: all heads share the same value weights, and the per-head attention scores are averaged, so the values weighted by this single averaged score matrix can be used for interpretation (sketch after this list)

    • $\boldsymbol{B}(t)=\operatorname{InterpretableMultiHead}\left(\boldsymbol{\Theta}(t), \boldsymbol{\Theta}(t), \boldsymbol{\Theta}(t)\right)$
  • $\boldsymbol{\delta}(t, n)=\operatorname{LayerNorm}\left(\boldsymbol{\theta}(t, n)+\operatorname{GLU}_\delta(\boldsymbol{\beta}(t, n))\right)$
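
A sketch of that shared-value attention under the same assumptions: each head keeps its own query/key projections, a single value projection is shared by all heads, and the head-averaged attention matrix is returned alongside the output so it can be inspected.

```python
import torch
import torch.nn as nn

class InterpretableMultiHead(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.h, self.d_k = n_heads, d_model // n_heads
        self.w_q = nn.ModuleList(nn.Linear(d_model, self.d_k) for _ in range(n_heads))
        self.w_k = nn.ModuleList(nn.Linear(d_model, self.d_k) for _ in range(n_heads))
        self.w_v = nn.Linear(d_model, self.d_k)  # value weights shared by all heads
        self.w_o = nn.Linear(self.d_k, d_model)

    def forward(self, q, k, v, mask=None):
        value = self.w_v(v)                      # one V, reused by every head
        scores = torch.stack([self.w_q[i](q) @ self.w_k[i](k).transpose(-2, -1)
                              for i in range(self.h)]) / self.d_k ** 0.5
        if mask is not None:                     # e.g. a causal decoder mask
            scores = scores.masked_fill(mask, float("-inf"))
        attn = torch.softmax(scores, dim=-1).mean(dim=0)  # average over heads
        return self.w_o(attn @ value), attn      # attn: one interpretable matrix
```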

Position-wise Feed-forward Layer
ls-type:: annotation
hl-page:: 11
hl-color:: yellow
Apply an additional non-linear transformation to the output of the self-attention layer (an end-to-end example follows) #card

  • $\boldsymbol{\psi}(t, n)=\operatorname{GRN}_\psi(\boldsymbol{\delta}(t, n))$

  • $\tilde{\boldsymbol{\psi}}(t, n)=\operatorname{LayerNorm}\left(\tilde{\boldsymbol{\phi}}(t, n)+\operatorname{GLU}_{\tilde{\psi}}(\boldsymbol{\psi}(t, n))\right)$
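
Putting the sub-layers together, a hypothetical forward pass using the sketches above (batch size, window lengths, and `n_heads=4` are illustrative values):

```python
import torch

B, k, tau_max, d = 8, 48, 12, 64
locality = LocalityEnhancement(d)
static_enrich, feed_forward = GRN(d), GRN(d)
attention = InterpretableMultiHead(d, n_heads=4)
attn_gate, final_gate = GatedSkip(d), GatedSkip(d)

phi_tilde = locality(torch.randn(B, k, d), torch.randn(B, tau_max, d))
theta = static_enrich(phi_tilde, torch.randn(B, 1, d))  # c_e broadcast over time
beta, weights = attention(theta, theta, theta)
delta = attn_gate(theta, beta)                          # delta(t, n)
psi_tilde = final_gate(phi_tilde, feed_forward(delta))  # psi~(t, n)
```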
