Transformer/Ref
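TODO Born for Economy: From Standard Attention to Sparse Attention (为节约而生:从标准Attention到稀疏Attention) - 科学空间|Scientific Spaces
TODO What heavily modified Transformers have impressed you most? (有哪些令你印象深刻的魔改transformer?) - Zhihu (zhihu.com)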
- Character-level language modeling with deeper self-attention