2024-10-052024-10-05 随手记 1 分钟读完 (大约180个字) 0次访问

Learning Rate Decay

训练集的损失下降到一定程度后不在下降，training loss 一直在一个范围内震荡，遇到这种情况可以适当降低学习率。

常用方法

线性衰减，每过 5 个 epochs 学习率减半
[[指数衰减]]
- decayed_learning_rate=learning_rate*decay_rate^(global_step/decay_steps)

[[Adam]] 自适应的优化方法还需要学习率衰减吗？

neural network - Should we do learning rate decay for adam optimizer - Stack Overflow
自适应优化器Adam还需加learning-rate decay吗？ - 知乎 (zhihu.com)
- [[AdamW]] 中提到加 Learning Rate Decay 的 Adam 还是有效提升模型表现。

Learning Rate Decay

https://blog.xiang578.com/post/logseq/Learning Rate Decay.html

作者

Ryen Xiang

发布于

2024-10-05

更新于

2024-10-05

许可协议

网络回响

评论