Residual DCN
- Two layers of [[DCNv2]] provide sufficient feature crossing but significantly increase training and inference time. Two strategies improve efficiency: #card
	- replaced the weight matrix with two skinny matrices, resembling a low-rank approximation
	- reduced the input feature dimension by replacing sparse one-hot features with embedding-table look-ups, cutting it by nearly 30%
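- A minimal numpy sketch of the first strategy (all names here, e.g. `low_rank_cross_layer`, `U`, `V`, are illustrative, not from the source): a DCNv2-style cross layer whose dense weight matrix `W` is replaced by two skinny matrices `U` and `V`, so `W ≈ U @ V.T`.

```python
import numpy as np

def low_rank_cross_layer(x0, xl, U, V, b):
    """One cross layer with W factored as U @ V.T (two skinny matrices):
    x_{l+1} = x0 * (U @ (V.T @ xl) + b) + xl
    """
    return x0 * (U @ (V.T @ xl) + b) + xl

d, r = 8, 2                      # embedding dim and rank (illustrative sizes)
rng = np.random.default_rng(0)
U = 0.1 * rng.normal(size=(d, r))
V = 0.1 * rng.normal(size=(d, r))
b = np.zeros(d)
x0 = rng.normal(size=d)          # input embedding
out = low_rank_cross_layer(x0, x0, U, V, b)

# Per-layer parameter count: full-rank d*d vs low-rank 2*d*r
full_params, low_rank_params = d * d, 2 * d * r
```

With `r << d`, the two skinny matrices store `2*d*r` parameters instead of `d*d`, which is where the speedup comes from.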
- Introducing an attention mechanism into the low-rank cross net #card
	- balance the complexity of the learned feature interactions via an attention temperature
	- skip connections and fine-tuning the attention temperature help learn more complex feature interactions while keeping training stable
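- A hypothetical sketch of the attention idea (the branch count `K`, the gate `Wg`, and the function names are my assumptions, not the source's exact design): softmax attention with a temperature mixes several low-rank cross branches, and a skip connection carries the input through unchanged.

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def attentive_low_rank_cross(x0, xl, Us, Vs, bs, Wg, temperature=1.0):
    """Attention over K low-rank cross branches (illustrative sketch).
    A lower temperature sharpens the attention (more selective, complex
    interactions); a higher one flattens it toward uniform mixing.
    The `+ xl` skip connection helps keep training stable.
    """
    alphas = softmax((Wg @ xl) / temperature)   # (K,) attention weights
    mix = sum(a * (x0 * (U @ (V.T @ xl) + b))
              for a, U, V, b in zip(alphas, Us, Vs, bs))
    return mix + xl                              # skip connection

d, r, K = 8, 2, 3                # illustrative sizes
rng = np.random.default_rng(1)
Us = [0.1 * rng.normal(size=(d, r)) for _ in range(K)]
Vs = [0.1 * rng.normal(size=(d, r)) for _ in range(K)]
bs = [np.zeros(d) for _ in range(K)]
Wg = rng.normal(size=(K, d))     # attention projection (assumed form)
x0 = rng.normal(size=d)
out = attentive_low_rank_cross(x0, x0, Us, Vs, bs, Wg, temperature=0.5)
```

As the temperature grows, the attention weights approach uniform, so tuning it trades off between a single dominant branch and an even blend.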