BERT

[[@BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]]

Code: google-research/bert: TensorFlow code and pre-trained models for BERT

A large pre-trained model plus fine-tuning improves results on small downstream tasks

Input layer

BERT

Two approaches to NLP pre-training

[[ELMo]]

[[GPT]]


Contributions

Model input:

Training objectives

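  • [[Masked-Language Modeling]] → mask a subset of tokens; of the selected tokens, 80% become [MASK], 10% are replaced with a random (wrong) token, and 10% keep the original (correct) token
    • Purpose → train the model to learn bidirectional context (relationships between the words of a sentence).
      • The 80/10/10 split reduces the impact of the mismatch between the pre-training and fine-tuning objectives, since [MASK] never appears during fine-tuning
  • [[Next Sentence Prediction]] → predict whether sentence B is the sentence that follows sentence A
    • In 50% of the training pairs, sentence B really is the continuation of sentence A
    • Which downstream problems it helps with → QA and natural language inference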
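The 80/10/10 corruption rule for masked language modeling can be sketched as below. This is a minimal illustration, not the code from the google-research/bert repository: `mask_tokens`, `MASK_TOKEN`, and the toy `vocab` argument are hypothetical names, and the 15% selection rate is taken from the BERT paper rather than from these notes.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask symbol for this sketch


def mask_tokens(tokens, vocab, select_prob=0.15, rng=None):
    """Return (corrupted, labels); labels[i] is None where nothing is predicted."""
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # position not selected for prediction
        labels[i] = token  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN         # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place
    return corrupted, labels
```

Because some selected positions keep their original token or receive a random one, the model cannot rely on [MASK] alone to know where a prediction is required.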

[[Activation function]] [[GELU]]
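For reference, GELU multiplies its input by the standard normal CDF, GELU(x) = x·Φ(x). The sketch below shows the exact form and the commonly used tanh approximation; the function names `gelu` and `gelu_tanh` are illustrative only.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```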

Optimizer

Fine-tuning

Study of how using different embeddings affects results

Limitations

[[Ref]]


TCN


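  • In a TCN, a residual block's input and output may have different widths (channel counts); panel (c) of the figure shows a 1×1 convolution used to resize the input
    • Alternatively, the channels can be increased directly with zero padding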
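Both ways of matching widths on the residual shortcut can be sketched in plain Python. A 1×1 convolution over a sequence is just the same linear map across channels applied at every time step. The helper names (`conv1x1`, `zero_pad_channels`, `residual_add`) and the list-of-lists layout `[time][channel]` are assumptions made for this illustration.

```python
def conv1x1(x, weight, bias=None):
    """x: [T][C_in], weight: [C_out][C_in]; returns [T][C_out]."""
    bias = bias or [0.0] * len(weight)
    return [
        [sum(w * v for w, v in zip(row, step)) + b for row, b in zip(weight, bias)]
        for step in x
    ]


def zero_pad_channels(x, c_out):
    """Widen each time step to c_out channels by appending zeros (widening only)."""
    return [step + [0.0] * (c_out - len(step)) for step in x]


def residual_add(x, fx, weight=None):
    """Add the block output fx to a shortcut of x resized to the width of fx."""
    if weight is not None:
        shortcut = conv1x1(x, weight)                 # learned 1x1 projection
    else:
        shortcut = zero_pad_channels(x, len(fx[0]))   # parameter-free alternative
    return [[a + b for a, b in zip(s, f)] for s, f in zip(shortcut, fx)]
```

The 1×1 projection adds `C_out * C_in` parameters, while zero padding adds none but can only increase the channel count.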

TCN = 1D FCN + causal convolutions

Characteristics

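  • Uses causal convolutions, so no information leaks from the future.
    • The paper emphasizes comparison with RNN-style methods, so causality has to be respected.
  • Takes a sequence of any length and maps it to an output sequence of the same length.
  • Combining [[ResNet]]-style residual connections with dilated convolutions lets the network go deeper and enlarges the receptive field.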

Details

  • A TCN has no pooling layers
  • Normalization uses weight norm, which is better suited to sequence problems
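Weight normalization reparameterizes a weight vector as a direction `v` and a separate learned magnitude `g`, so that w = g · v / ‖v‖. A minimal sketch for a single weight vector, with the hypothetical helper name `weight_norm`:

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v||: direction from v, magnitude from g."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]
```

Unlike batch normalization, this depends only on the weights and not on batch or sequence statistics, which is one common argument for using it with variable-length sequences.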

Ways to enlarge the receptive field

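  • A larger kernel_size (adds parameters; large kernels tend to perform worse, and an excessively large kernel degenerates into a fully connected layer)
  • [[Dilated convolution]]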
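The trade-off between the two options can be made concrete by computing the receptive field of a stack of convolutions. The sketch assumes stride 1 and one convolution per entry in `dilations`; `receptive_field` is a hypothetical helper, and a block with two convolutions per dilation level would simply list each dilation twice.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions, one per dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Four layers with kernel_size 3:
plain = receptive_field(3, [1, 1, 1, 1])    # no dilation
dilated = receptive_field(3, [1, 2, 4, 8])  # dilation doubled per layer
single = receptive_field(31, [1])           # one very wide kernel
```

With doubling dilations the receptive field grows exponentially with depth, whereas a single kernel covering the same span needs a number of weights proportional to that span.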

Requirements for sequence problems

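    1. The input and output matrices have the same size
    2. Information from time steps that have not happened yet must not be used, hence causal convolution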
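Both requirements are met by a causal convolution with implicit left zero padding: the output at time t depends only on inputs at times ≤ t, and the output has the same length as the input. The single-channel sketch below uses a hypothetical function name `causal_conv1d` and is meant as an illustration, not as the TCN reference implementation.

```python
def causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_j kernel[j] * x[t - j * dilation], treating x[i] as 0 for i < 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:  # positions before the sequence start count as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y1 = causal_conv1d(x, [1.0, 1.0])              # x[t] + x[t-1]
y2 = causal_conv1d(x, [1.0, 1.0], dilation=2)  # x[t] + x[t-2]
```

Changing a later input value leaves all earlier outputs untouched, which is the no-leakage property the notes describe.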

[[ETA model]] implementation

  • ((f431b69d-e38f-4a1f-ac98-1c80d3e0bcbe))

@ETA Prediction with Graph Neural Networks in Google Maps

[[Abstract]]

  • Travel-time prediction constitutes a task of high importance in transportation networks, with web mapping services like Google Maps regularly serving vast quantities of travel time queries from users and enterprises alike. Further, such a task requires accounting for complex spatiotemporal interactions (modelling both the topological properties of the road network and anticipating events—such as rush hours—that may occur in the future). Hence, it is an ideal target for graph representation learning at scale. Here we present a graph neural network estimator for estimated time of arrival (ETA) which we have deployed in production at Google Maps. While our main architecture consists of standard GNN building blocks, we further detail the usage of training schedule methods such as MetaGradients in order to make our model robust and production-ready. We also provide prescriptive studies: ablating on various architectural decisions and training regimes, and qualitative analyses on real-world situations where our model provides a competitive edge. Our GNN proved powerful when deployed, significantly reducing negative ETA outcomes in several regions compared to the previous production baseline (40+% in cities like Sydney).

[[Attachments]]


@TabNet: Attentive Interpretable Tabular Learning

[[Abstract]]

  • We propose a novel high-performance and interpretable canonical deep tabular data learning architecture, TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning as the learning capacity is used for the most salient features. We demonstrate that TabNet outperforms other neural network and decision tree variants on a wide range of non-performance-saturated tabular datasets and yields interpretable feature attributions plus insights into the global model behavior. Finally, for the first time to our knowledge, we demonstrate self-supervised learning for tabular data, significantly improving performance with unsupervised representation learning when unlabeled data is abundant.

[[Attachments]]