@DuETA: Traffic Congestion Propagation Pattern Modeling via Efficient Graph Learning for ETA Prediction at Baidu Maps

[[Attachments]]

核心贡献

  • 新颖性:通过 route-aware graph transformer 捕捉拥堵敏感图中长距离相关性,建模拥堵传播模式

    • The design of DuETA is driven by the novel ideas that directly capture the long-distance correlations through a congestion-sensitive graph, and that model traffic congestion propagation patterns via a route-aware graph transformer.
      ls-type:: annotation
      hl-page:: 2
      hl-color:: yellow

    • 捕捉任意两个( 距离很远,但是在路况状态上很相关的)segment 之间的交互

      • These designs enable DuETA to capture the interactions between any two road segment pairs that are spatially distant but highly correlated with traffic conditions.
        ls-type:: annotation
        hl-page:: 2
        hl-color:: yellow
  • 通过学习交通拥堵传播模式可以有效提高 ETA 预测效果

    • traffic congestion propagation patterns
      ls-type:: annotation
      hl-page:: 2
      hl-color:: yellow

核心问题

  • 业务需求

    • 预测的未来路况状态和真实状态不一致会导致 ETA 误差传播 we observed that a propagation of ETA errors arises from the sharp inconsistency between the predicted traffic condition in the future and ground truth.
      ls-type:: annotation
      hl-page:: 2
      hl-color:: green

    • 建模 traffic congestion propagation patter

      • Traffic congestion propagation pattern modeling is challenging, and it requires accounting for impact regions over time and cumulative effect of delay variations over time caused by traffic events on the road network.
        ls-type:: annotation
        hl-page:: 1
        hl-color:: red

      • 当前交通拥堵路段会影响路网上相邻道路的通行能力 As illustrated in it, the impact regions and cumulative delays over time caused by traffic congestion (the road segments in red) would inevitably affect all the interdependent segments on the road network.
        ls-type:: annotation
        hl-page:: 2
        hl-color:: green

        • 用户请求 ETA 时,只有 3-hop 拥堵,但是由于拥堵传播,等用户到达 target 时,2-hop 拥堵,部分y1-hop 缓行

        • 之前使用 [[STGNN]] 类方法建模直接相邻的路段 existing studies have applied spatial-temporal graph neural networks (STGNNs)[7 , 8 , 21 , 34, 35 , 38 ] to model traffic conditions
          ls-type:: annotation
          hl-page:: 2
          hl-color:: blue
          存在两个问题

          • 没有直接建模路网上不相邻 segment 的远距离相关性,网络传播过程中会有信息损失 The long-distance correlations of indirectly connected road segments are not explicitly modeled, which inevitably suffer from information loss during the multi-step message passing.
            ls-type:: annotation
            hl-page:: 2
            hl-color:: blue

          • 由于 STGCNN 方法计算的复杂度,大部分时候补数很少。两个距离较远的 segment 的路况状态特征不能很好传递。 Traffic conditions are not sufficiently transmitted between two road segments that are spatially distant, because they typically execute only a few steps of message passing (one step in most cases), due to the computational complexity of STGNNs.
            ls-type:: annotation
            hl-page:: 2
            hl-color:: blue

        • [:span]
          ls-type:: annotation
          hl-page:: 2
          hl-color:: green

  • 面临挑战

    • ETA 任务需要建模 contextual and predictive factors, such as spatial-temporal interaction, driving behavior, and traffic congestion propagation inference
      ls-type:: annotation
      hl-page:: 1
      hl-color:: green

    • 路网中新 segment 和 未知区域 we plan to investigate the transferability of our model to deal with unseen road segments or regions.
      ls-type:: annotation
      hl-page:: 9
      hl-color:: yellow

    • 路线旁边 poi 的影响 Second, given the observation that the travel times of some routes have a considerable correlation with the POIs distributed along the roads.
      ls-type:: annotation
      hl-page:: 9
      hl-color:: yellow

      • 特定地点特定时间

      • poi 密集区域对 eta 预测影响 To address this issue, we plan to utilize the POI retrieval system [5, 11, 13] as an auxiliary tool to forecast which POIs would be densely populated and how extensively they would affect the ETA prediction.
        ls-type:: annotation
        hl-page:: 9
        hl-color:: yellow

      • TODO 待找 poi 相关

        • MetaLearned Spatial-Temporal POI Auto-Completion for the Search Engine at Baidu Maps.

        • Personalized Prefix Embedding for POI Auto-Completion in the Search Engine of Baidu Maps

        • HGAMN: Heterogeneous Graph Attention Matching Network for Multilingual POI Retrieval at Baidu Maps

相关工作

  • ETA 任务方法

    • segment-based methods

      • computationally efficient and scalable
        ls-type:: annotation
        hl-page:: 9
        hl-color:: blue

      • do not account for the information of the travel route
        ls-type:: annotation
        hl-page:: 9
        hl-color:: blue

    • end-to-end methods

      • 之间方法对 拥堵传播建模不够 most existing methods are inefficient for modeling the traffic congestion propagation patterns along the route.
        ls-type:: annotation
        hl-page:: 9
        hl-color:: blue
  • STGCNN [[Traffic Flow Forecasting]]

    • 提升 GNN 层数感受野增加太多 increasing the depth of a GNN often means exponential expansion of the neighbor scope
      ls-type:: annotation
      hl-page:: 9
      hl-color:: blue

    • 子图 properly extracted subgraph
      ls-type:: annotation
      hl-page:: 9
      hl-color:: blue

  • [[@ConSTGAT: Contextual Spatial-Temporal Graph Attention Network for Travel Time Estimation at Baidu Maps]] 建模时空关系

  • [[@SSML: Self-Supervised Meta-Learner for En Route Travel Time Estimation at Baidu Maps]] 建模驾驶员行为

解决方法

  • traffic conditions 是动态特征

    • 过去 1 小时路况特征,每 5 分钟一个分桶,共 12 个 he traffic conditions of the past one hour are collected as features, which are divided into 12 time slots (5 minutes per time slot)
      ls-type:: annotation
      hl-page:: 3
      hl-color:: yellow

    • median speed, max speed, min speed, mean speed, and record counts as features
      ls-type:: annotation
      hl-page:: 3
      hl-color:: yellow

  • Congestion-sensitive Graph GCS=(L,{Erf}r=15,Eh)\mathcal{G}^{C S}=\left(\mathcal{L},\left.\left\{\mathcal{E}_r^f\right\}\right|_{r=1} ^5, \mathcal{E}^h\right)

    • we construct a congestion-sensitive graph based on the correlations of traffic patterns.
      ls-type:: annotation
      hl-page:: 2
      hl-color:: yellow

    • 对于某一个 link 找一阶相邻 link 以及高阶相邻link(可能和当前 link 的路况状态有关系)

      • we take advantage of the first-order neighbor links, as well as the high-order neighbor links whose traffic patterns are highly correlated to that of link 𝑙
        ls-type:: annotation
        hl-page:: 3
        hl-color:: yellow
    • First-order Neighbors
      ls-type:: annotation
      hl-page:: 3
      hl-color:: yellow

      • [[ConSTGAT]] 不同相邻 link 对于当前 link 的影响

        • 当前 link 的路况状态可能受下游影响大于上游 the traffic congestion is more likely to propagate from downstream links to upstream links.
          ls-type:: annotation
          hl-page:: 3
          hl-color:: blue
      • 具体过程

        • 定义多种 link 之间关系,并在建图中考虑这些关系 define multiple types of link relations and incorporate these relations into the construction of the congestion-sensitive graph
          ls-type:: annotation
          hl-page:: 3
          hl-color:: yellow

        • 用 attention 分别处理各种关系捕捉影响 use attention mechanism separately for each relation to capture the impact of neighbor links,
          ls-type:: annotation
          hl-page:: 3
          hl-color:: yellow

        • 用 edge 描述两个 link 之间的关系,一共有 5 种类型

          • [:span]
            ls-type:: annotation
            hl-page:: 3
            hl-color:: yellow

          • An edge describes the relation between two links
            ls-type:: annotation
            hl-page:: 3
            hl-color:: yellow

          • 2 是上游 link

          • 3 是下游 link

          • 剩余三种 link 不在路线中,但是这些 link 的路况状态可能影响目标 link(车辆阻塞路口)

    • High-order Neighbors
      ls-type:: annotation
      hl-page:: 3
      hl-color:: yellow

      • 间接连接 link 也很重要 the long-distance associations between indirectly connected links are also crucial for ETA prediction
        ls-type:: annotation
        hl-page:: 3
        hl-color:: yellow

      • 如何从高阶邻居采样?

        • link 从 historical route 从取 2-hop 到 5-hop 的邻居 link

        • 计算 link 和邻居 link 的 Pearson correlation
          ls-type:: annotation
          hl-page:: 4
          hl-color:: yellow
          ci,jr=cov(T1,T2)ρT1ρT2c^r_{i,j} = \frac{\operatorname{cov}\left(T_1, T_2\right)}{\rho_{T_1} \rho_{T_2}}

          • 取 link 过去 2 小时,每 5 分钟的平均通过时间序列 T1=[t10,t11,,t123]T_1=\left[t_1^0, t_1^1, \cdots, t_1^{23}\right]T2=[t20,t21,,t223]T_2=\left[t_2^0, t_2^1, \cdots, t_2^{23}\right]
        • 累加同一个 link pair 在不同 route 上的相关系数得到 ci,jfinalc^{final}_{i,j}

        • 每个 link 取相关系数 top5 的邻居 link

      • 连接 link 和 high-order neighbor links high-order edge is defined as an edge that connects a link and one of its high-order neighbor links.
        ls-type:: annotation
        hl-page:: 4
        hl-color:: yellow

      • [:span]
        ls-type:: annotation
        hl-page:: 3
        hl-color:: yellow

  • [[Graph Transformer]] Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification

    • 多头学习 edge 的权重 t adopts the multi-head attention mechanism [ 23] to learn edge weights.
      ls-type:: annotation
      hl-page:: 4
      hl-color:: yellow

      • 对于每个 edge 计算 attention score

        • qc,i=WcQxi+bcQ,kc,j=WcKxj+bcK,vc,j=WcVxj+bcV,αc,i,j=qc,i,kc,jkN(i)qc,i,kc,k,\begin{gathered}\mathbf{q}_{c, i}=\mathbf{W}_c^Q \mathbf{x}_i+\mathbf{b}_c^Q, \\ \mathbf{k}_{c, j}=\mathbf{W}_c^K \mathbf{x}_j+\mathbf{b}_c^K, \\ \mathbf{v}_{c, j}=\mathbf{W}_c^V \mathbf{x}_j+\mathbf{b}_c^V, \\ \alpha_{c, i, j}=\frac{\left\langle\mathbf{q}_{c, i}, \mathbf{k}_{c, j}\right\rangle}{\sum_{k \in \mathcal{N}(i)}\left\langle\mathbf{q}_{c, i}, \mathbf{k}_{c, k}\right\rangle},\end{gathered}
      • 计算 link i 的表示

        • hi=xi+1Cc=1CjN(i)αc,i,jvc,j\mathbf{h}_i=\mathbf{x}_i+\frac{1}{C} \sum_{c=1}^C \sum_{j \in \mathcal{N}(i)} \alpha_{c, i, j} \mathbf{v}_{c, j}
    • resnet 解决 [[GNN]] 的 oversmoothing 问题 t addresses the oversmoothing problem in vanilla GNNs by residual connections.
      ls-type:: annotation
      hl-page:: 4
      hl-color:: yellow

  • route-aware graph transformer

    • [:span]
      ls-type:: annotation
      hl-page:: 4
      hl-color:: yellow

tags:: #[[Model Architecture]] #[[Graph Transformer]]

+ 重新构建的图$\mathcal{G}^{C S}=\left(\mathcal{L},\left.\left\{\mathcal{E}_r\right\}\right|_{r=1} ^6\right)$有六种类型的边,拆分成六张子图,每一张子图用一个 transformer

  + $\mathbf{h}_i=\mathbf{x}_i+\frac{1}{6 C} \sum_{r=1}^6 \sum_{c=1}^C \sum_{j \in \mathcal{N}_r(i)} \alpha_{c, i, j}^{(r)} \mathbf{v}_{c, j}^{(r)}$

+ 之前的特征 transformer 无法区分一个 link 是否在路线上,无法生成不同的表示

  + the graph transformer is unable to identify whether a link belongs to a given route or not

ls-type:: annotation
hl-page:: 5
hl-color:: yellow

  + 5a 中路线不同,但是 a 通过 transformer 学习到表示可能相同

  + [:span]

ls-type:: annotation
hl-page:: 5
hl-color:: yellow

+ route-aware structural encoding

  + position encoding

    + 与当前 link 的最近距离

    + encode the order information of a link.

ls-type:: annotation
hl-page:: 5
hl-color:: yellow

    + 控制控制依赖这条 link 路况的程度 be regarded as a gate to control the degree of dependency of the traffic condition when a user requests the ETA.

ls-type:: annotation
hl-page:: 5
hl-color:: yellow

  + route identifier

    + 表示当前 link 是否在路线上

+ Integration

ls-type:: annotation
hl-page:: 5
hl-color:: yellow

  + a 1-D convolution layer (Conv1D) with a window size of 3

ls-type:: annotation
hl-page:: 5
hl-color:: yellow

  + MLP+ReLU 预估每条 link 的 travel time,累加得到整个行程的 eta

    + $\left[\hat{y}_1, \hat{y}_2, \cdots, \hat{y}_m\right]=\operatorname{MLP}\left(\operatorname{Conv1D}\left(\left[\mathbf{h}_1, \mathbf{h}_2, \cdots, \mathbf{h}_m\right]\right)\right)$

  + [\[\[Multi-Task Learning\]\]](/post/logseq/Multi-Task%20Learning.html) 优化 link 级别和路线级别 eta Multi-task learning is adopted to optimize the model parameters from both the link-level and the route-level.

ls-type:: annotation
hl-page:: 5
hl-color:: yellow

    + link-level loss function [[Huber Loss]]

      + $L_{l i n k}\left(\hat{y}_i, y_i\right)= \begin{cases}\frac{1}{2}\left(\hat{y}_i-y_i\right)^2, & \left|\hat{y}_i-y_i\right|<\delta, \\ \delta\left(\left|\hat{y}_i-y_i\right|-\frac{1}{\delta}\right), & \text { otherwise }\end{cases}$

    + route-level loss function

      + $L_{\text {route }}(\hat{y}, y)=\frac{|\hat{y}-y|}{y}$

    + 最终 loss

      + $L=\frac{1}{n} \sum_i^n\left(\frac{1}{m^{(i)}} \sum_{j=1}^{m^{(i)}} L_{\text {link }}\left(\hat{y}_j^{(i)}, y_j^{(i)}\right)+L_{\text {route }}\left(\hat{y}^{(i)}, y^{(i)}\right)\right)$

实验结论

  • 实验数据

    • 行程 link 数和 didi 差不多

    • 2021.10.10-2021.11.20,5周训练,1周测试

  • 指标

    • mae rmse mape
  • Baseline

    • AVG 请求时刻路况速度

    • STANN

      • STGNN、attention+LSTM

      • 只考虑相邻 link

    • [[DCRNN]]

      • GCN 处理 spatial info

      • LSTM 处理 temporal info

    • DeepTravel

      • bidirectional LSTM
    • [[ConSTGAT]]

    • DuETA

      • the embedding size and the hidden size of DuETA to be 32
        ls-type:: annotation
        hl-page:: 6
        hl-color:: green

      • attention heads 𝐶 is set to be 8
        ls-type:: annotation
        hl-page:: 6
        hl-color:: green

      • Adam

      • 3e-5

  • 结果

    • DeepTravel 和 ConSTGAT

      • End-to-end methods are more effective than the segment-based methods
        ls-type:: annotation
        hl-page:: 6
        hl-color:: yellow

      • the correlations of spatial and temporal information are jointly modeled.
        ls-type:: annotation
        hl-page:: 6
        hl-color:: yellow

    • DuETA

      • 对远距离拥堵更加敏感 sensitive to long-distance traffic congestion,
        ls-type:: annotation
        hl-page:: 6
        hl-color:: yellow

      • 处理交通事件带来的影响 On the other hand, the cumulative effect of delay variations over time caused by traffic events on the road network can be alleviated by the high efficiency of traffic congestion pattern modeling.
        ls-type:: annotation
        hl-page:: 6
        hl-color:: yellow

    • [:span]
      ls-type:: annotation
      hl-page:: 6
      hl-color:: yellow

  • Ablation Studies
    ls-type:: annotation
    hl-page:: 7
    hl-color:: yellow

    • 主要组件对比

      • removing both components hurts performance significantly in all three cities.
        ls-type:: annotation
        hl-page:: 7
        hl-color:: yellow

      • [:span]
        ls-type:: annotation
        hl-page:: 6
        hl-color:: yellow

  • route identifier 效果 To obtain an understanding of the effect of the route identifier, we visualize the distributions of the attention weights in Figure 7.
    ls-type:: annotation
    hl-page:: 7
    hl-color:: yellow

    • w/o 组 Off route 的 attention weight 比 Complete 组大,加上这个模块模型能更关注路线上的 link

      • enables DuETA to pay more attention to the links that are in the travel routes.
        ls-type:: annotation
        hl-page:: 7
        hl-color:: yellow
    • [:span]
      ls-type:: annotation
      hl-page:: 7
      hl-color:: yellow

  • congestion-sensitive graph
    ls-type:: annotation
    hl-page:: 7
    hl-color:: yellow

    • 存在部分远处 link 相关性比附近 link 强 he average Pearson correlation coefficients of our selected high-order neighbors is much higher than those of the second-order and third-order neighbors
      ls-type:: annotation
      hl-page:: 7
      hl-color:: yellow

    • [:span]
      ls-type:: annotation
      hl-page:: 7
      hl-color:: yellow

    • 远处拥堵 case 提升效果大 examine the relative improvements of high-order neighbors in cases of traffic congestion2 and normal traffic.
      ls-type:: annotation
      hl-page:: 7
      hl-color:: yellow

      • [:span]
        ls-type:: annotation
        hl-page:: 8
        hl-color:: yellow

  • Practical Applicability
    ls-type:: annotation
    hl-page:: 8
    hl-color:: yellow

    • P 发生拥堵,dueta 能预测未来路线上会堵

    • [:span]
      ls-type:: annotation
      hl-page:: 8
      hl-color:: yellow

  • Online Evaluation
    ls-type:: annotation
    hl-page:: 8
    hl-color:: yellow

    • 2022.4.12-2022.4.18

    • 全部、长短单、平峰和高峰

    • 在线 RMSE 高 averaged RMSE scores in the online evaluation of DuETA are higher than those in the offline evaluation
      ls-type:: annotation
      hl-page:: 8
      hl-color:: green

      • 数据噪音大
    • [:span]
      ls-type:: annotation
      hl-page:: 9
      hl-color:: yellow

读后总结

  • 只考虑下游关系,tcn 之类的考虑上游真的有用吗

@Interpreting Trajectories from Multiple Views: A Hierarchical Self-Attention Network for Estimating the Time of Arrival

[[Attachments]]

关键信息

相关工作

  • traffic flow prediction [[Traffic Flow Forecasting]]

    • GMAN: A graph multi-attention network for traffic prediction 基于图的多注意力机制来预测交通状况 GMAN [ 50] employs a graph multi-attention structure to extract the spatial and temporal relationships
      ls-type:: annotation
      hl-page:: 2
      hl-color:: green

      • 图学习通常会受到不相关的空间邻域区域的负面影响,尤其当区域变大,这种影响会导致误差传播 graph representation learning generally suffers from the negative impact from irrelevant spatial neighboring regions, resulting in error propagation especially when the involved area grows larger
        ls-type:: annotation
        hl-page:: 2
        hl-color:: green

      • 图建模被限制在狭窄的邻近区域,在开发大规模城市系统中存在不足 graph modeling is limited to process only narrow neighboring regions and falls short on developing large-scale urban-wise systems
        ls-type:: annotation
        hl-page:: 2
        hl-color:: yellow
        [[ConSTGAT]]

  • travel time estimation

  • trajectory recovery and inference

  • DeepTTE
    ls-type:: annotation
    hl-page:: 3
    hl-color:: yellow
    raw GPS sequences geo-convolutional network LSTM

  • [[WDR]] wide-deep-recurrent network

    • CoDriverETA 2020 滴滴
  • ConSTGAT 和 CompactETA 图建模

  • DeepGTT
    ls-type:: annotation
    hl-page:: 3
    hl-color:: yellow
    deep generative model for learning the distribution of travel time

  • [[HetETA]] learns the representation of spatio-temporal information using a multi-relational network;
    ls-type:: annotation
    hl-page:: 3
    hl-color:: blue

  • [[TTPNet]] 张量分解和图embedding从历史轨迹中学习速度和表示 extracts the travel speed and representation of road network from historical trajectories based on tensor decomposition and graph embedding.
    ls-type:: annotation
    hl-page:: 3
    hl-color:: blue

核心贡献

  • 利用三个视图之间的层次关系对道路底层结构进行建模 HierETA exploits the hierarchical relationship among the three views to portray the underlying road structure
    ls-type:: annotation
    hl-page:: 2
    hl-color:: yellow

  • 分层自自注意力网络根据 segment, link, intersection 之间自然关系进行高效组织 proposed hierarchical self-attention network organizes the segment-, link-, and intersection-views efficiently according to their natural relationships.
    ls-type:: annotation
    hl-page:: 2
    hl-color:: yellow

  • 自适应自注意力网络合并,以在多视图表示框架中共同利用全局和局部模式进行时空依赖建模。 adaptive self-attention network to jointly leverage the global and local patterns for spatio-temporal dependency modeling within the multi-view representation framework.
    ls-type:: annotation
    hl-page:: 2
    hl-color:: yellow

    • 利用多视图序列明确捕捉轨迹的时空依赖关系 we design an adaptive self-attention network to explicitly capture the spatio-temporal dependencies of the trajectory using multi-view sequences
      ls-type:: annotation
      hl-page:: 3
      hl-color:: yellow
  • hierarchy-aware attention decoder
    ls-type:: annotation
    hl-page:: 2
    hl-color:: yellow
    利用从不同粒度的信息上学习到上下文特征预估最终 ETA

核心问题

  • 传统 ETA 方法采用分治策略,将一个轨迹拆分成多个小段,然后累加每个小段的预测结果得到整体 ETA traditional ETA algorithms mainly employ the divide-and-conquer strategy by representing a trajectory as a segment sequence and then summing up the local predictions
    ls-type:: annotation
    hl-page:: 1
    hl-color:: blue

    • segment-view 将轨迹拆分成多个小段,然后通过小段计算同行时间 most of them decompose a trajectory into several segments and then compute the travel time by integrating the attributes from all segments
      ls-type:: annotation
      hl-page:: 1
      hl-color:: blue

      • 累积误差
  • 多视图下建模困难

    • 常规方法用 segment 建模,不考虑 link On the one hand
      ls-type:: annotation
      hl-page:: 2
      hl-color:: yellow

      • 不使用 link 建模,现有的研究很困难对同一个 link 中的多个段之间的一致性建模 However, without explicitly modeling the link-view characteristics, existing studies can hardly model the coherent consistency across segments within the same links.
        ls-type:: annotation
        hl-page:: 2
        hl-color:: red
    • segment 和 intersection 的属性是不一致的, 很难用同一个网络去建模,大部分选择忽视路口或简化建模 On the other hand
      ls-type:: annotation
      hl-page:: 2
      hl-color:: yellow

    • ETA 会受到路口等待影响

  • 什么是 trajectory

    • 三个角度 link、Intersection、Segment

      • segement

        • segment 是人工生成的,用来捕捉细粒度的局部交通情况,在表征道路网络结构方面并不完全 segment-view representation is artificially produced to capture the fined-grained local traffic conditions, which is however not comprehensive in characterizing the natural structure of the road network
          ls-type:: annotation
          hl-page:: 1
          hl-color:: green
      • link

        • 提供静态道路属性,pavement type,道路宽度、道路等级 preserve static road characteristics, such as pavement type, road width and road functional level
          ls-type:: annotation
          hl-page:: 1
          hl-color:: green
      • intersection

        • 等待时间、交通灯数量、历史车流量 valued information such as the waiting time, the number of traffic lights, and the historical traffic volume
          ls-type:: annotation
          hl-page:: 1
          hl-color:: green
    • link 和 intersection 粗粒度表示轨迹属性,link 可以进一步拆分成多个小段,segment 可以细粒度对空间依赖性进行建模 the link- and intersection-views characterize the trajectory attributes from a coarse perspective; a link can be further decomposed into several segments, and hence the segment-view representation models the spatial dependencies at a fine granularity
      ls-type:: annotation
      hl-page:: 2
      hl-color:: yellow

      • 猜测基于作者的假设,两个Intersection 之间的整条路被称之为 Link
    • [:span]
      ls-type:: annotation
      hl-page:: 2
      hl-color:: green

HierETA Hierarchical Self-Attention Network for Estimating the Time of Arrival

  • [:span]
    ls-type:: annotation
    hl-page:: 4
    hl-color:: yellow

tags:: [[Model Architecture]] [[ETA]]

  • Attribute Feature Extractor
    ls-type:: annotation
    hl-page:: 3
    hl-color:: yellow

    • 连续特征 z-score

    • 类别特征 embedding

    • 全局特征共享

  • Hierarchical Self-Attention Network for Multi-View Trajectory Representation
    ls-type:: annotation
    hl-page:: 4
    hl-color:: yellow

    • segment encoder

      • 对同一个 link 内的时空依赖进行建模 capture the spatiotemporal dependencies of segments in the same link
        ls-type:: annotation
        hl-page:: 4
        hl-color:: yellow

        • a segment encoder is developed to capture the spatio-temporal dependencies at a fine granularity
          ls-type:: annotation
          hl-page:: 1
          hl-color:: yellow
      • 利用 BiLSTM 处理 [xjsxr][x^s_j|x_r],正向和反向结果 concat 成 segment 的表示 HjsH^s_j

      • 同一个 link 内 segement 记作 Hs=[H1s,,Hns]Rn×dsH^s=\left[H_1^s, \ldots, H_n^s\right] \in \mathbb{R}^{n \times d_s}

        • 计算出 j-th segment 和 link 内其他 segment 的全局相似度 GPj=QjKTdsG P_j=\frac{Q_j K^T}{\sqrt{d}_s}

        • a local semantic pattern
          ls-type:: annotation
          hl-page:: 4
          hl-color:: yellow
          局部相似度

          • LPj(k)={GPj(k),jkω, otherwise L P_j(k)= \begin{cases}G P_j(k), & |j-k| \leq \omega \\ -\infty, & \text { otherwise }\end{cases}

          • 取 j 相邻 ω\omega 个 segment 计算相似度

          • 捕获局部 segment 的依赖,加强局部的拥堵转移

      • 用门控机制平衡全局和局部attention结果 a gating mechanism
        ls-type:: annotation
        hl-page:: 4
        hl-color:: yellow

        • Fjs=(1zj)Att(GPj)+zjAtt(LPj)F_j^s=\left(1-z_j\right) \odot \operatorname{Att}\left(G P_j\right)+z_j \odot \operatorname{Att}\left(L P_j\right)

          • 控制参数怎么学 zj=σ(WhHjs+WgAtt(GPj)+WlAtt(LPj)+bz)z_j=\sigma\left(W_h H_j^s+W_g A t t\left(G P_j\right)+W_l A t t\left(L P_j\right)+b_z\right)
        • ResNet + LN

      • 所有 link 的 encoder 参数共享以及并行计算

        • n adaptive self-attention module is designed to boost performance
          ls-type:: annotation
          hl-page:: 1
          hl-color:: yellow
    • Joint Link-Intersection Encoder.
      ls-type:: annotation
      hl-page:: 4
      hl-color:: yellow

      • 道路属性

      • o characterize the natural trajectory structure consisting of alternatively arranged links and intersections
        ls-type:: annotation
        hl-page:: 1
        hl-color:: yellow

      • 为什么要设计这个模块?

        • segment-view 无法对同一个 link 内 segment 共享的一致性进行建模

          • t fails to model the consistency shared within the same link
            ls-type:: annotation
            hl-page:: 4
            hl-color:: red
        • 粗粒度表示 coarse-scale representation
          ls-type:: annotation
          hl-page:: 4
          hl-color:: red

        • link 和 intersections 交替出现 as links and intersections appear alternatively
          ls-type:: annotation
          hl-page:: 4
          hl-color:: yellow

      • link 表示:xil=j=1nγijhijsx_i^l=\sum_{j=1}^n \gamma_{i j} h_{i j}^s

        • link 内的 segment 表示是 {hijs}j=1n\left\{h_{i j}^s\right\}_{j=1}^n

        • 加权融合 segment 得到 link 表示,权重计算方法 [[Attention]] γij=softmaxj(Wγhijs+bγ)\gamma_{i j}=\operatorname{softmax}_j\left(W_\gamma h_{i j}^s+b_\gamma\right)

      • 得到 link 和 intersections 的表示后,分别用编码 employ two BiLSTMs to respectively encode the links and intersections
        ls-type:: annotation
        hl-page:: 5
        hl-color:: yellow
        得到 Hil{H^l_i}Hic{H^c_i},concat 在一起得到 H^il=[HilHic]\hat{H}_i^l=\left[H_i^l \mid H_i^c\right]

      • 上一步得到向量经过 the joint link-intersection encoder also includes a self-attention layer, a residual connection and a layer normalization
        ls-type:: annotation
        hl-page:: 5
        hl-color:: yellow
        得到 {hil}i=1m\left\{h_i^l\right\}_{i=1}^m

      • 去除这个 encoder 中的 local pattern
        ls-type:: annotation
        hl-page:: 5
        hl-color:: red
        ,因为相邻 link 之间的交通影响更加弱和稀疏,避免过拟合

      • 总结

        • segement-view 捕捉局部交通信息 segment-view context feature that captures the local traffic conditions
          ls-type:: annotation
          hl-page:: 5
          hl-color:: yellow

        • link-intersection context 表达道路属性 joint link-intersection context feature that preserves the common road attributes
          ls-type:: annotation
          hl-page:: 5
          hl-color:: yellow

  • Hierarchy-Aware Attention Decoder
    ls-type:: annotation
    hl-page:: 5
    hl-color:: yellow
    层次感知注意力解码器

    • realize a tradeoff between the multi-view spatio-temporal features
      ls-type:: annotation
      hl-page:: 1
      hl-color:: yellow

    • sub-route 对于最后的 eta 贡献是不一样的(拥堵路口和道路应该给予更多关注)

      • travel time estimation is closely related to the critical components
        ls-type:: annotation
        hl-page:: 5
        hl-color:: green
    • ETA R=(1λ)i=1mj=1nαijhijs+λi=1mβihil\mathcal{R}=(1-\lambda) \sum_{i=1}^m \sum_{j=1}^n \alpha_{i j} h_{i j}^s+\lambda \sum_{i=1}^m \beta_i h_i^l

      • segment 的表示以及 link-intersection 的表示

      • alpha 和 beta 都是注意力权重

      • 设计注意力引导机制,利用 link-view 之间的关系调整 segment-view 之间的 attention attention guidance that adopts the link-view consistency to further adjust the segment-view attention
        ls-type:: annotation
        hl-page:: 5
        hl-color:: red

        • 先计算 link 的注意力 βi=softmaxi(fl(hil,xr))\beta_i=\underset{i}{\operatorname{softmax}}\left(f^l\left(h_i^l, x^r\right)\right)

          • fl(hil,xr)=vTtanh(w1hil+w2xr+b)f^l\left(h_i^l, x^r\right)=v^T \tanh \left(w_1 h_i^l+w_2 x^r+b\right)

          • xr 是外部影响因素

        • 根据 link 注意力计算 segment 之间注意力 αij=softmax(i,j)(βifs(hijs,xr))\alpha_{i j}=\underset{(i, j)}{\operatorname{softmax}}\left(\beta_i f^s\left(h_{i j}^s, x^r\right)\right)

          • 考虑 segment 之间的重要性,如果不考虑 link 之间的重要性, separately processing each segment without considering the link-view correlation is problematic as it lacks the feedback from the link-view consistency.
            ls-type:: annotation
            hl-page:: 5
            hl-color:: red
      • 改方法可以自适应选择不同表示粒度中最相关的特征 we can adaptively select the most relevant features from different representation granularities.
        ls-type:: annotation
        hl-page:: 5
        hl-color:: yellow

        • 可以实现是几个 link 权重大,还是几个 segment 权重大
    • L(Θ)=1Nk=1NYkY^k\mathcal{L}(\Theta)=\frac{1}{N} \sum_{k=1}^N\left|Y_k-\hat{Y}_k\right|

EXPERIMENTS
ls-type:: annotation
hl-page:: 5
hl-color:: green

  • 20 天训练,1 天评估,预测 7 天

  • 数据分布 probability density functions (PDFs) and cumulative distribution functions (CDFs)
    ls-type:: annotation
    hl-page:: 5
    hl-color:: green

    • [:span]
      ls-type:: annotation
      hl-page:: 6
      hl-color:: yellow
  • We repeat each experiment for five times except the statistics-based approach Route-ETA and report the mean and the standard deviation of different runs.
    ls-type:: annotation
    hl-page:: 6
    hl-color:: green
    训练 5 次取平均

  • 指标 mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and satisfaction rate (SR), similar to existing approaches [ 23 ]. Specifically, SR refers to the fraction of trips with error rates less than 10% and a higher SR indicates better performance and customer satisfaction
    ls-type:: annotation
    hl-page:: 6
    hl-color:: green

  • 实验结果

    • [:span]
      ls-type:: annotation
      hl-page:: 7
      hl-color:: yellow

    • [[ConSTGAT]] ConstGAT considers the graph structures of the road network to exploit the joint relations of spatio-temporal information.
      ls-type:: annotation
      hl-page:: 7
      hl-color:: blue

    • HierETA 更具有可解释性,对潜在道路网络结构进行建模

    • 误差分析:所有距离分桶中指标都提升了,长单提升更加明显。

      • 层次化建模效果好 That is, interpreting the trajectory from multiple views effectively portrays the hierarchical structure of road network and eases the error propagation for estimating the travel time.
        ls-type:: annotation
        hl-page:: 7
        hl-color:: green

      • [:span]
        ls-type:: annotation
        hl-page:: 7
        hl-color:: green

  • 模型分析

    • window sizes
      ls-type:: annotation
      hl-page:: 8
      hl-color:: green

      • 局部窗口效果好

      • segment 之间距离越远,之间的关联性越弱 he correlation between adjacent segments slightly decreases while the modeling uncertainty increases.
        ls-type:: annotation
        hl-page:: 8
        hl-color:: green

      • [:span]
        ls-type:: annotation
        hl-page:: 8
        hl-color:: yellow

    • segment 和 link 的权重

      • 只考虑其中一个指标差
    • [[Ablation Study]]

      • 有无 local 或 global 特征

        • 建模细粒度交通信息 The local attention in encoder is removed to verify the effectiveness for modeling the semantic traffic condition.
          ls-type:: annotation
          hl-page:: 8
          hl-color:: green

        • 提取结构化交通模式 verify the necessity of extracting the structural traffic pattern.
          ls-type:: annotation
          hl-page:: 8
          hl-color:: green

      • 有无 guide

        • 无引导信息
      • 有无 路况信息

      • 有无 层次化结构 removing the joint link-intersection encoder
        ls-type:: annotation
        hl-page:: 8
        hl-color:: green
        ,没有这个效果显著下降

      • 从 1s 就是变化很大来说,这些网络结构都挺重要的

        • HierETA performs better than both variants that eliminating local and global attentions, which is contributed to the introduction of the global structural and local semantic patterns.
          ls-type:: annotation
          hl-page:: 8
          hl-color:: yellow
      • [:span]
        ls-type:: annotation
        hl-page:: 8
        hl-color:: green


@Order Fulfillment Cycle Time Estimation for On-Demand Food Delivery

[[Abstract]]

  • 送达时间 Order Fulfillment Cycle Time (OFCT)

  • a novel post-processing layer

    • accounting for the distributional mismatch between the true OFCT values and those predicted by the model at initialization
  • By providing customers with conveniences such as easy access to an extensive variety of restaurants, effortless food ordering and fast delivery, on-demand food delivery (OFD) platforms have achieved explosive growth in recent years. A crucial machine learning task performed at OFD platforms is prediction of the Order Fulfillment Cycle Time (OFCT), which refers to the amount of time elapsed between a customer places an order and he/she receives the meal. The accuracy of predicted OFCT is important for customer satisfaction, as it needs to be communicated to a customer before he/she places the order, and is considered as a service promise that should be fulfilled as well as possible. As a result, the estimated OFCT also heavily influences planning decisions such as dispatching and routing.

[[Attachments]]

相关工作

  • ETA 分类

    • Route-based

    • OD-based

  • OFCT 比其他 ETA 独特的挑战

    • Significantly more influencing factors

      • location and characteristics of the restaurant

      • the food preparation time

      • the orders assigned to the same courier (which will alter his/her delivery itinerary) 骑手配送行为

    • Unavailability of critical information 预估时缺少关键信息

      • 分配到订单的骑手

      • 最终的路线

  • Time Window Assignment Problem

BackGroound

  • 城市划分成区域,区域内配送站

    • 订单终点对应一个指定的 poi 类型(学校、写字楼等)
  • 订单生命周期划分 PT(下单到取到餐),DT(取餐到送达),OFCT(下单到送达),CT(下单到做好)

  • 流程

    • 用户下单时,预估 OFCT

      • dispatching 分单

      • routing 路径规划

        • VRPTW
    • 骑手到达餐厅取餐

    • 骑手配送

Feature

  • 订单特征

    • Spartial feature 空间特征

      • 起点终点 id 和坐标,city/grid id
    • Temporal feature 时间特征

      • hour of the day

      • workday

    • Order size features 订单大小影响商家出餐速度

      • sku_num

      • price

        • sku_num 低估套餐的大小
  • Aggregation Features 聚合特征

    • 通过手机传感器收集信息(gps,wifi,bluetooth),得到 dt/pt/ofct

    • 对上面的信息进行聚合:20 分位数、平均、标准差、80 分位数

    • 通过一定规则选取订单集合,在集合中再分组计算统计信息。

      • 相同 city/grid/restaurant/hour of day 的订单

      • 发单前不同时长完成订单

  • Dish Features 餐品特征

    • sku 分配 uid,订单可以用包含的 uid 表示

    • 训练分类模型( 116 类),使用预测的分类结果

      • sku 标题做为输入,人工标签

      • fine-tuning 中文 Bert

  • Cooking Time Features

    • 出餐时间很难获取真值,从历史数据中挖掘。

    • 三种场景

      • 骑手到达太早,需要等到出餐完成

      • 骑手需要在同一个餐厅取多个订单,只有在餐厅完成最后一个订单才能离开

      • 骑手达到时,如果餐厅完成出餐可以立即进行配送统计

    • 聚合 departure time from the restaurant 得到 ct

  • Supply-and-Demand Features 供需特征

    • 供需比 demand-to-supply ratio (DSR) DSRo=C1Oouncompleted \mathrm{DSR}_{o}=|C|^{-1}\left|O_{o}^{\text {uncompleted }}\right|

      • 订餐需求在时间和空间上都会有突变,此时通过合并订单配送策略减轻影响。合单策略提升整体配送效率,但是单个订单配送时间变长(骑手需要绕路去取或送)。通过 DSR 来描述这种现象

      • Oouncompleted O_{o}^{\text {uncompleted }} 当前未完成订单,DSR 表示当前未完成订单和活跃配送员数的比例

    • 配送比 dropoff_ratio oo=Oouncompleted 1Oodropoff o_{o}=\left|O_{o}^{\text {uncompleted }}\right|^{-1}\left|O_{o}^{\text {dropoff }}\right|

      • 未完成订单中,已经取餐在配送中的订单占比
    • 餐厅出餐量 unfetched_order_count

      • 骑手未取订单数,值越大需要等待的时间越大
    • 单量预测相关,尝试后发现没用

      • 单量预测模型中使用的特征 OFCT 预估模型也使用,没有带来新的信息增益。
    • 图例

      • e DSR 越大,Bundle 越大,g 对应 OCFT 变大

      • DSR 一定,dropoff ratio 越大,代表快要释放的运力更多,然后bundle变小,ocft 变小。

  • Courier Features 骑手特征

    • 预测 OFCT 时,送单的骑手还没有确定,通过预测模型对附近的骑手进行排序,取 top-ranked 骑手做为派单对象。

    • 相关特征

      • 取单距离

      • work load 负载,骑手当前未完成订单数

      • urgency 紧急程度,骑手需要配送订单平均或最小剩余时间

      • Mutual Distances 订单终点、餐厅、骑手当前订单的终点的最短距离。距离越大,骑手越有可能绕路。

  • ETA-based Drop-off Time Feature 多维度相似订单的配送 ETA

    • 利用回归模型来学习配送段 ETA 无法很好处理长尾不规律 case,比如高峰期等待电梯

      • 利用历史订单作为配送段时间预估的语料

      • 历史订单用多维特征来表示

      • 新订单通过 k 近邻搜索出相似的历史订单

      • 对相似的历史订单真实配送段时间加权平均,作为新订单的预估配送段时间

    • 通过 OD-based ETA 方法(TEMP+R) 预估餐厅到终点的配送时间特征 DTo

      • xo~eta =[lon(ro~), lat (ro~),lon(po~), lat (po~),hr(aσ~creation )]Tw\mathbf{x}_{\widetilde{o}}^{\text {eta }}=\left[\operatorname{lon}\left(r_{\widetilde{o}}\right) \text {, lat }\left(r_{\widetilde{o}}\right), \operatorname{lon}\left(p_{\widetilde{o}}\right) \text {, lat }\left(p_{\widetilde{o}}\right), \operatorname{hr}\left(a_{\widetilde{\sigma}}^{\text {creation }}\right)\right]^{T} \odot w

        • 用经纬度+时间组成的向量表示订单,w 表示向量中每一项的权重
      • dissim(o~,o^)=xo~eta xo^eta 2\operatorname{dissim}(\widetilde{o}, \widehat{o})=\left\|\mathbf{x}_{\widetilde{o}}^{\text {eta }}-x_{\widehat{o}}^{\text {eta }}\right\|_{2}

        • 用 Euclidean distance 表示两个不同订单之间的距离
      • (i=1kwi)1i=1k(wiDToi)\left(\sum_{i=1}^{k} w_{i}\right)^{-1} \sum_{i=1}^{k}\left(w_{i} \cdot \mathrm{DT}_{o_{i}}\right)

        • 在历史订单中取 k 个最相似订单,构建该特征

        • wi=exp(dissim(oi,o)σ2)w_{i}=\exp \left(-\frac{\operatorname{dissim}\left(o_{i}, o\right)}{\sigma^{2}}\right) 代表权重

  • Meteorological Features 气象

    • 数值特征

      • 气温

      • 空气质量

      • 风速

    • 类别特征

      • 天气

Model

  • 网络结构

  • [[Feature Engineering]]

    • 数值特征 归一化+ ELU 升维

      • 维度太低限制神经网络表达
    • 类别特征 embedding + dropout 避免过拟合

      • 一个类别多个id 对应 embedding 进行 average pooling
    • 骑手特征

      • 召回 pickup 距离在 X 米范围内的骑手组成骑手集合

      • re-ranking module Query Invariant Listwise Context Modeling (QILCM)

        • 对于每个骑手 c 生成表示向量 hc 以及排序分 sc,利用加权得分表示骑手向量

          • cCcandidate sch~c\sum_{\forall c \in C^{\text {candidate }}} s_{c} \tilde{\mathbf{h}}_{c}
  • Postprocessing 后处理

    • 模型收敛速度慢,预测效果差

      • 模型拟合的数据分布与真实履约时间的分布发生偏移,真实的履约时间实际上是一个右偏长尾的分布,相当于有一小部分订单真实的配送时间偏长而模型没有学习到。
    • OFCTopredict =μreal +σreal yoout μini σini \mathrm{OFCT}_o^{\text {predict }}=\mu^{\text {real }}+\sigma^{\text {real }} \frac{y_o^{\text {out }}-\mu^{\text {ini }}}{\sigma^{\text {ini }}}

      • 实现拟合和缩放

      • y_out 是模型输出

      • real 是根据真实数据统计的均值和方差

      • ini 是整个训练集的均值和方差

    • 自适应 [[Box-Cox Transformation]] 逆变换

      • 更加能够拟合长尾数据分布
  • Objective Function

    • predict +λrank \ell_{\text {predict }}+\lambda \ell_{\text {rank }}

      • ranking loss + predictive loss
    • predict =OFCTopredict OFCTo\ell_{\text {predict }}=\left|\mathrm{OFCT}_o^{\text {predict }}-\mathrm{OFCT}_o\right|

    • rank =cCcandidate (I(c=co)log(sc))\ell_{\text {rank }}=-\sum_{\forall c \in C^{\text {candidate }}}\left(\mathbb{I}\left(c=c_o\right) \log \left(s_c\right)\right)

      • cross entropy loss

      • I()\mathbb{I}() 是指示函数

Experiments

  • Ablation Analysis of Feature Groups 消融实验

    • 订单特征、供需特征

  • 不同模型对比

    • GBM

    • [[DeepFM]] [[@xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems]]

      • 简单 concatenation&MLP 策略,不适合 OFCT 任务
    • [[TEMP+R]] [[MURAT]] OD-based ETA methods

Conclusion

  • 可以改进的点

    • cooking time features

    • weather prediction

Ref