GMM

解决高斯分布的单峰性,引入多个高斯模型加权平均拟合数据

几何角度

  • p(x)=k=1KαkN(μk,Σk),α=1p(x)=\sum _{k=1}^{K} \alpha _k N( \mu _k, \Sigma _k), \sum \alpha = 1

生成模型角度

  • 利用离散变量 z 来选择来自哪一个高斯分布

  • p(x)=zp(x,z)=k=1Kp(x,z=k)=k=1Kp(z=k)p(xz=k)p(x)=\sum\limits_zp(x,z)=\sum\limits_{k=1}^Kp(x,z=k)=\sum\limits_{k=1}^Kp(z=k)p(x|z=k)

  • 得到 $$p(x)=\sum\limits_{k=1}^Kp_k\mathcal{N}(x|\mu_k,\Sigma_k)$$
    [[MLE]] 求解

  • 求解上面的 px

  • θMLE=argmaxθlogp(X)=argmaxθi=1Nlogp(xi)=argmaxθi=1Nlogk=1KpkN(xiμk,Σk)\theta_{MLE}=\mathop{argmax}\limits_{\theta}\log p(X)=\mathop{argmax}_{\theta}\sum\limits_{i=1}^N\log p(x_i)=\mathop{argmax}_\theta\sum\limits_{i=1}^N\log \sum\limits_{k=1}^Kp_k\mathcal{N}(x_i|\mu_k,\Sigma_k)

  • log 中有连加号存在,无法求出解析解。
    [[EM]] 求解

  • EM 目标求解:θ(t+1)=argmaxθZlog[p(X,Zθ)]p(ZX,θ(t))dZ=argmaxθEzx,θt[logp(X,Zθ)]\theta ^{(t+1)} = \arg \max _{\theta} \int _Z \log [p(X,Z|\theta)]p(Z|X, \theta ^{(t)})dZ = \arg \max _{\theta} E _{z|x,{\theta ^t}} [\log p(X,Z|\theta)]

  • E-Step

    • Q(θ,θt)=z[logi=1Np(xi,ziθ)]i=1Np(zixi,θt)Q(\theta,\theta^t)=\sum\limits_z[\log\prod\limits_{i=1}^Np(x_i,z_i|\theta)]\prod \limits_{i=1}^Np(z_i|x_i,\theta^t)\\

+ 对第一个累加号展开,第一项为: + $$\sum\limits_z\log p(x_1,z_1|\theta)\prod\limits_{i=1}^Np(z_i|x_i,\theta^t)=\sum\limits_z\log p(x_1,z_1|\theta)p(z_1|x_1,\theta^t)\prod\limits_{i=2}^Np(z_i|x_i,\theta^t)\\ =\sum\limits_{z_1}\log p(x_1,z_1|\theta) p(z_1|x_1,\theta^t)\sum\limits_{z_2,\cdots,z_K}\prod\limits_{i=2}^Np(z_i|x_i,\theta^t)\\ =\sum\limits_{z_1}\log p(x_1,z_1|\theta)p(z_1|x_1,\theta^t)

  + 后面与 1 无关求和结果为1

+ $$Q(\theta,\theta^t)=\sum\limits_{i=1}^N\sum\limits_{z_i}\log p(x_i,z_i|\theta)p(z_i|x_i,\theta^t)$$

  + $$p(x,z|\theta)=p(z|\theta)p(x|z,\theta)=p_z\mathcal{N}(x|\mu_z,\Sigma_z)$$

  + $$p(z|x,\theta^t)=\frac{p(x,z|\theta^t)}{p(x|\theta^t)}=\frac{p_z^t\mathcal{N}(x|\mu_z^t,\Sigma_z^t)}{\sum\limits_kp_k^t\mathcal{N}(x|\mu_k^t,\Sigma_k^t)}$$

+ $$Q=\sum\limits_{i=1}^N\sum\limits_{z_i}\log p_{z_i}\mathcal{N(x_i|\mu_{z_i},\Sigma_{z_i})}\frac{p_{z_i}^t\mathcal{N}(x_i|\mu_{z_i}^t,\Sigma_{z_i}^t)}{\sum\limits_kp_k^t\mathcal{N}(x_i|\mu_k^t,\Sigma_k^t)}$$
  • M-Step

    • Q=k=1Ki=1N[logpk+logN(xiμk,Σk)]p(zi=kxi,θt)Q=\sum\limits_{k=1}^K\sum\limits_{i=1}^N[\log p_k + \log \mathcal{N}(x_i|\mu_k,\Sigma_k)]p(z_i=k|x_i,\theta^t)

    • pkt+1=argmaxpkk=1Ki=1N[logpk+logN(xiμk,Σk)]p(zi=kxi,θt) s.t. k=1Kpk=1p_k^{t+1}=\mathop{argmax}_{p_k}\sum\limits_{k=1}^K\sum\limits_{i=1}^N[\log p_k+\log \mathcal{N}(x_i|\mu_k,\Sigma_k)]p(z_i=k|x_i,\theta^t)\ s.t.\ \sum\limits_{k=1}^Kp_k=1

      • 化简 $$p_k{t+1}=\mathop{argmax}_{p_k}\sum\limits_{k=1}K\sum\limits_{i=1}^N\log p_kp(z_i=k|x_i,\thetat) s.t. \sum\limits_{k=1}Kp_k=1$$

      • [[Lagrange Multiplier]] $$L(p_k,\lambda)=\sum\limits_{k=1}K\sum\limits_{i=1}N\log p_kp(z_i=k|x_i,\thetat)-\lambda(1-\sum\limits_{k=1}Kp_k)$$

      • [[Lagrange]] $$L(p_k,\lambda)=\sum\limits_{k=1}K\sum\limits_{i=1}N\log p_kp(z_i=k|x_i,\thetat)-\lambda(1-\sum\limits_{k=1}Kp_k)$$

      • pkL=i=1N1pkp(zi=kxi,θt)+λ=0\frac{\partial}{\partial p_k}L=\sum\limits_{i=1}^N\frac{1}{p_k}p(z_i=k|x_i,\theta^t)+\lambda=0\\

\Rightarrow\lambda=-N$$

  + $$p_k^{t+1}=\frac{1}{N}\sum\limits_{i=1}^Np(z_i=k|x_i,\theta^t)$$

+ $$\mu_k,\Sigma_k$$ 无约束参数,直接求导

对比 MLE 和 EM 在求解 GMM 问题上的区别

  • θMLE=argmaxθlogp(X)=argmaxθi=1Nlogp(xi)=argmaxθi=1Nlogk=1KpkN(xiμk,Σk)\theta_{MLE}=\mathop{argmax}\limits_{\theta}\log p(X)=\mathop{argmax}_{\theta}\sum\limits_{i=1}^N\log p(x_i)=\mathop{argmax}_\theta\sum\limits_{i=1}^N\log \sum\limits_{k=1}^Kp_k\mathcal{N}(x_i|\mu_k,\Sigma_k)

  • Q=k=1Ki=1N[logpk+logN(xiμk,Σk)]p(zi=kxi,θt)Q=\sum\limits_{k=1}^K\sum\limits_{i=1}^N[\log p_k + \log \mathcal{N}(x_i|\mu_k,\Sigma_k)]p(z_i=k|x_i,\theta^t)

  • 极大似然需要计算 $$P(X)$$,EM 计算 $$P(X,Z)$$。$$P(X) = \sum _Z P(X,Z)$$,每次单独考虑 $$P(X,Z)$$,避免在 log 中求和。

E 和 M 具体的含义还没有整理!

作者

Ryen Xiang

发布于

2024-10-05

更新于

2024-10-05

许可协议


网络回响

评论