Published 2024-10-05 · Updated 2025-04-29 · Quick notes

FTRL

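FTL (Follow The Leader): one approach to online learning #card

To reduce the random perturbation caused by any single sample, each step picks the parameters that minimize the sum of all loss functions seen so far.
$w=\operatorname{argmin}_{w} \sum_{i=1}^{t} f_{i}(w)$
FTRL (Follow The Regularized Leader): FTL with a regularization term #card
$w=\operatorname{argmin}_{w} \sum_{i=1}^{t} f_{i}(w)+R(w)$
In practice the minimization is solved through a surrogate (proxy) loss function.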
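The two argmin definitions above can be made concrete with a toy case. This is a minimal sketch, assuming 1-D squared losses $f_i(w)=\frac{1}{2}(w-x_i)^2$ and an L2 regularizer $R(w)=\frac{\lambda}{2}w^2$ (both choices, and the function names, are illustrative assumptions, not from the note). Under those assumptions both minimizers have closed forms: the FTL leader is the running mean, and the FTRL leader is the same sum shrunk toward zero.

```python
def ftl_squared(xs):
    """FTL for f_i(w) = 0.5 * (w - x_i)**2.

    Setting the derivative of the summed loss to zero gives the running mean.
    """
    total, leaders = 0.0, []
    for t, x in enumerate(xs, start=1):
        total += x
        leaders.append(total / t)
    return leaders


def ftrl_squared(xs, lam=1.0):
    """FTRL for the same losses plus R(w) = 0.5 * lam * w**2.

    The derivative t*w - sum(x) + lam*w = 0 gives w = sum(x) / (t + lam).
    """
    total, leaders = 0.0, []
    for t, x in enumerate(xs, start=1):
        total += x
        leaders.append(total / (t + lam))
    return leaders


samples = [1.0, 3.0, -2.0, 4.0]
print("FTL :", ftl_squared(samples))
print("FTRL:", ftrl_squared(samples, lam=1.0))
```

The regularizer damps the early leaders the most, which is where a single sample has the largest influence on plain FTL.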

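[[Sparsity]]: benefits of a sparse model

Lower memory use and compute cost at prediction time, because most parameters are zero.
L1 regularization not only yields sparsity but also lowers the risk of overfitting.
Sparse models are comparatively easier to interpret.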

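Why does SGD not necessarily guarantee a sparse model? #card

Unlike batch training, an online update does not descend along the global gradient; it follows the gradient produced by a single sample. The whole search therefore behaves like a "random" walk (the origin of "Stochastic" in SGD). Each noisy step almost never lands a weight exactly on zero, so even with L1 regularization, online optimization rarely produces a sparse solution.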
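A small experiment illustrates the point. This is a hedged sketch, not a benchmark: the synthetic data (10 features, only the first two relevant), the squared loss, the learning rate, and the L1 strength are all assumptions chosen for illustration. Plain SGD with an L1 subgradient keeps the irrelevant weights near zero, but they hover around it instead of settling on it.

```python
import random


def make_stream(n, dim, seed=0):
    """Yield (x, y) pairs where only the first two features matter."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = 2.0 * x[0] - 3.0 * x[1] + rng.gauss(0.0, 0.1)
        yield x, y


def sgd_l1(stream, dim, lr=0.01, lam=0.1):
    """Online SGD on squared loss, adding the L1 subgradient lam * sign(w)."""
    w = [0.0] * dim
    for x, y in stream:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j in range(dim):
            sign = (w[j] > 0) - (w[j] < 0)  # subgradient of |w_j|
            w[j] -= lr * (err * x[j] + lam * sign)
    return w


w = sgd_l1(make_stream(2000, 10), dim=10)
exact_zeros = sum(1 for v in w if v == 0.0)
print("weights:", [round(v, 3) for v in w])
print("weights that are exactly zero:", exact_zeros)
```

Getting exact zeros needs an explicit thresholding or truncation step, which is what methods such as FOBOS, RDA, and FTRL build into the update.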

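When the dataset is large, computing the global gradient at every step becomes too expensive, and training takes a very long time.

Online learning: samples are processed one at a time, and each sample is discarded once it has been processed.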

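Features of FTRL #card

A separate learning rate for each feature ([[Adam]] does this as well)
Fast convergence
L1 regularization brings sparsity and L2 regularization brings smoothness, as in [[Elastic net regression]]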
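The three features above show up directly in the commonly published per-coordinate FTRL-Proximal update for logistic regression. The following is a minimal sketch under stated assumptions: the class name `FTRLProximal`, the sparse-dict input format, and the default hyperparameters `alpha`, `beta`, `l1`, `l2` are illustrative choices, not something the note specifies.

```python
import math


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (sketch)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-feature adjusted gradient sum
        self.n = {}  # per-feature sum of squared gradients

    def weight(self, i):
        """Closed-form weight: exactly zero while |z_i| <= l1 (sparsity)."""
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        n = self.n.get(i, 0.0)
        sign = 1.0 if z > 0 else -1.0
        # (beta + sqrt(n)) / alpha is the inverse per-feature learning rate.
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        """x is a sparse sample: {feature_index: value}."""
        s = sum(self.weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """One online step on sample x with label y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of log loss for feature i
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            # The right-hand side still sees the old z_i and n_i.
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self.weight(i)
            self.n[i] = n + g * g
        return p


model = FTRLProximal()
for _ in range(3):
    model.update({0: 1.0}, 1)
print("weight of seen feature:", model.weight(0))
print("weight of unseen feature:", model.weight(99))
```

Only `z` and `n` are stored; weights are derived on demand, so a feature whose accumulated signal never exceeds `l1` costs nothing at prediction time.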

FTRL combines the high accuracy of [[FOBOS]] with the better sparsity of RDA. FOBOS and RDA differ in two ways:

How they choose to center the additional strong convexity used to guarantee low regret: RDA centers this regularization at the origin, while FOBOS centers it at the current feasible point.

How they handle an arbitrary non-smooth regularization function $\Psi$. This includes the mechanism of projection onto a feasible set and how $L_1$ regularization is handled.
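Writing the three updates side by side makes both differences visible. This is a simplified rendering with a scalar learning-rate schedule; the symbols are assumptions introduced here and do not appear in the note: $g_t=\nabla f_t(w_t)$, $g_{1:t}=\sum_{s=1}^{t} g_s$, $\lambda$ is the $L_1$ strength, and $\sigma_{1:t}=\sum_{s=1}^{t}\sigma_s=1/\eta_t$ for learning rate $\eta_t$.

$$
\begin{aligned}
\text{FOBOS:}\quad & w_{t+1}=\operatorname{argmin}_{w}\; g_{t}\cdot w+\lambda\|w\|_{1}+\frac{1}{2}\sigma_{1:t}\|w-w_{t}\|_{2}^{2} \\
\text{RDA:}\quad & w_{t+1}=\operatorname{argmin}_{w}\; g_{1:t}\cdot w+t\lambda\|w\|_{1}+\frac{1}{2}\sigma_{1:t}\|w-0\|_{2}^{2} \\
\text{FTRL-Proximal:}\quad & w_{t+1}=\operatorname{argmin}_{w}\; g_{1:t}\cdot w+t\lambda\|w\|_{1}+\frac{1}{2}\sum_{s=1}^{t}\sigma_{s}\|w-w_{s}\|_{2}^{2}
\end{aligned}
$$

In this form, FTRL-Proximal takes the cumulative gradient and cumulative $L_1$ penalty from RDA (better sparsity) and the centering at past iterates from FOBOS (better accuracy).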

Ref

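各大公司广泛使用的在线学习算法FTRL详解 (A detailed explanation of FTRL, an online learning algorithm widely used by major companies) - EE_NovRain - 博客园 (cnblogs.com). Includes some engineering details.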

https://blog.xiang578.com/post/logseq/FTRL.html

Author: Ryen Xiang
