Probability Matching

Notes. Author: Ryen Xiang. Published 2026-02-17, updated 2026-02-17.
https://blog.xiang578.com/post/logseq/89471.html

Boltzmann Exploration: the probability of pulling arm $i$ is

$$p(i)=\frac{\exp \frac{\bar{R}(i)}{\tau}}{\sum_{j=1}^N \exp \frac{\bar{R}(j)}{\tau}}$$

#card
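The Boltzmann exploration rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original note: it assumes $\bar{R}(i)$ is the running mean reward of arm $i$ and $\tau > 0$ is a temperature, and the names `boltzmann_probabilities` and `choose_arm` are hypothetical. Subtracting the maximum reward before exponentiating is a standard numerical-stability trick that leaves $p(i)$ unchanged.

```python
import math
import random


def boltzmann_probabilities(mean_rewards, tau):
    """Return p(i) = exp(R_bar(i)/tau) / sum_j exp(R_bar(j)/tau) for every arm.

    mean_rewards: list of mean rewards R_bar(i) (assumed interpretation).
    tau: temperature, must be positive (assumed interpretation).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Shifting by the max does not change the ratios but avoids overflow.
    shift = max(mean_rewards)
    weights = [math.exp((r - shift) / tau) for r in mean_rewards]
    total = sum(weights)
    return [w / total for w in weights]


def choose_arm(mean_rewards, tau, rng=random):
    """Sample an arm index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(mean_rewards, tau)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

A large $\tau$ pushes the probabilities toward uniform (more exploration), while a small $\tau$ concentrates them on the arm with the highest mean reward (more exploitation).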