Residual DCN
- Two layers of [[DCNv2]] provide sufficient feature crossing but significantly increase training and inference time. Two strategies improve efficiency: #card
	- replaced the weight matrix with two skinny matrices, resembling a low-rank approximation
	- reduced the input feature dimension by replacing sparse one-hot features with embedding-table look-ups, cutting it by nearly 30%
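- A minimal numpy sketch of the first strategy (all names here, e.g. `low_rank_cross_layer`, `U`, `V`, are illustrative, not from the source): a DCNv2-style cross layer whose dense weight matrix `W` is replaced by two skinny matrices `U` and `V`, so `W ≈ U @ V.T`.

```python
import numpy as np

def low_rank_cross_layer(x0, xl, U, V, b):
    """One cross layer with W factored as U @ V.T (two skinny matrices):
    x_{l+1} = x0 * (U @ (V.T @ xl) + b) + xl
    """
    return x0 * (U @ (V.T @ xl) + b) + xl

d, r = 8, 2                      # embedding dim and rank (illustrative sizes)
rng = np.random.default_rng(0)
U = 0.1 * rng.normal(size=(d, r))
V = 0.1 * rng.normal(size=(d, r))
b = np.zeros(d)
x0 = rng.normal(size=d)          # input embedding
out = low_rank_cross_layer(x0, x0, U, V, b)

# Per-layer parameter count: full-rank d*d vs low-rank 2*d*r
full_params, low_rank_params = d * d, 2 * d * r
```

With `r << d`, the two skinny matrices store `2*d*r` parameters instead of `d*d`, which is where the speedup comes from.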
- Introducing an attention mechanism into the low-rank cross net #card
	- balance the complexity of the learned feature interactions via an attention temperature
	- skip connections and fine-tuning the attention temperature help learn more complex feature interactions while keeping training stable
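- A hypothetical sketch of the attention idea (the branch count `K`, the gate `Wg`, and the function names are my assumptions, not the source's exact design): softmax attention with a temperature mixes several low-rank cross branches, and a skip connection carries the input through unchanged.

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def attentive_low_rank_cross(x0, xl, Us, Vs, bs, Wg, temperature=1.0):
    """Attention over K low-rank cross branches (illustrative sketch).
    A lower temperature sharpens the attention (more selective, complex
    interactions); a higher one flattens it toward uniform mixing.
    The `+ xl` skip connection helps keep training stable.
    """
    alphas = softmax((Wg @ xl) / temperature)   # (K,) attention weights
    mix = sum(a * (x0 * (U @ (V.T @ xl) + b))
              for a, U, V, b in zip(alphas, Us, Vs, bs))
    return mix + xl                              # skip connection

d, r, K = 8, 2, 3                # illustrative sizes
rng = np.random.default_rng(1)
Us = [0.1 * rng.normal(size=(d, r)) for _ in range(K)]
Vs = [0.1 * rng.normal(size=(d, r)) for _ in range(K)]
bs = [np.zeros(d) for _ in range(K)]
Wg = rng.normal(size=(K, d))     # attention projection (assumed form)
x0 = rng.normal(size=d)
out = attentive_low_rank_cross(x0, x0, Us, Vs, bs, Wg, temperature=0.5)
```

As the temperature grows, the attention weights approach uniform, so tuning it trades off between a single dominant branch and an even blend.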