[Aggregated] Deriving the softmax partial derivatives and the BP process

Differentiating softmax

PyTorch carries out backpropagation (BP) automatically, so the derivation here is only for the sake of thoroughness.

A

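Part A proves the softmax derivative and the softmax BP process.
I originally meant to type every formula out by hand but decided against it; the cited results are written out as LaTeX formulas.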

A.1

The softmax derivative

A.2

Gradient descent for softmax

B

Most of this repeats what others have already worked out; the citations and references are given here.

References:

  • 矩阵求导术(下) (Matrix Calculus Techniques, Part 2) – Zhihu (zhihu.com)

  • nndl


Two cited results follow, labeled (B.15) and (B.26).

\((B.15)\)

\[ \begin{aligned} & \vec{x} \in R^{M \times 1},\ y \in R,\ \vec{z} \in R^{N \times 1} \ \text{(with } y \text{ and } \vec{z} \text{ functions of } \vec{x}\text{)}, \quad \text{then:} \\ & \frac{\partial (y \vec{z})}{\partial \vec{x}}=y \frac{\partial \vec{z}}{\partial \vec{x}}+\frac{\partial y}{\partial \vec{x}} \cdot \vec{z}^{\top} \in R^{M \times N} \end{aligned} \]

\[\begin{aligned} & \text{[Proof]:} \\ & d(y\vec{z}) \\ & =d y \cdot \vec{z}+y \cdot d \vec{z} \\ &=\vec{z} \cdot d y+y \cdot d \vec{z} \\ &=\vec{z} \cdot \left(\frac{\partial y}{\partial \vec{x}}\right)^{\top} d \vec{x}+y \cdot\left(\frac{\partial \vec{z}}{\partial \vec{x}}\right)^{\top} d \vec{x} \\ &=\left(\frac{\partial y}{\partial \vec{x}} \cdot \vec{z}^{\top}+y \cdot \frac{\partial \vec{z}}{\partial \vec{x}}\right)^{\top} d \vec{x} \\ & \therefore \frac{\partial (y \vec{z})}{\partial \vec{x}}=y \cdot \frac{\partial \vec{z}}{\partial \vec{x}}+\frac{\partial y}{\partial \vec{x}} \cdot \vec{z}^{\top} \end{aligned} \]
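As a sanity check on (B.15), the identity can be compared against finite differences. The sketch below is only an illustration under stated assumptions: `y_fn`, `z_fn`, and the test point are hypothetical choices not taken from the references, and `jacobian` is a small helper written for this check that uses the same denominator layout as the formulas (entry \([i][j]\) holds \(\partial f_j / \partial x_i\)).

```python
import math

def y_fn(x):
    # Hypothetical scalar function y(x), chosen only for illustration.
    return x[0] * x[1] + math.sin(x[2])

def z_fn(x):
    # Hypothetical vector function z(x) in R^2, chosen only for illustration.
    return [x[0] ** 2 + x[2], math.exp(x[1])]

def jacobian(f, x, eps=1e-6):
    """Denominator-layout Jacobian by central differences: J[i][j] = d f_j / d x_i."""
    n_out = len(f(x))
    J = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        fp, fm = f(xp), f(xm)
        J.append([(fp[j] - fm[j]) / (2 * eps) for j in range(n_out)])
    return J

x = [0.3, -1.2, 0.7]
y, z = y_fn(x), z_fn(x)

# Left-hand side: differentiate the product y(x) * z(x) directly.
lhs = jacobian(lambda v: [y_fn(v) * zj for zj in z_fn(v)], x)

# Right-hand side: y * dz/dx + (dy/dx) z^T, assembled entry by entry.
dz = jacobian(z_fn, x)                                     # M x N
dy = [row[0] for row in jacobian(lambda v: [y_fn(v)], x)]  # length M
rhs = [[y * dz[i][j] + dy[i] * z[j] for j in range(len(z))]
       for i in range(len(x))]

max_err = max(abs(lhs[i][j] - rhs[i][j])
              for i in range(len(x)) for j in range(len(z)))
print("max abs difference:", max_err)
```

The printed difference should be at the level of finite-difference noise, which is consistent with the \(M \times N\) shape and the term order in (B.15).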

\((B.26)\)

\[\begin{aligned} & \vec{x} \in R^N, \quad \vec{f}(\vec{x})=\left[f\left(x_1\right), f\left(x_2\right), \ldots, f\left(x_N\right)\right]^{\top} \in R^N, \quad \text{then:} \\ & \frac{\partial \vec{f}(\vec{x})}{\partial \vec{x}}=\operatorname{diag}\left(\vec{f}^{\prime}(\vec{x})\right) \end{aligned} \]

\[\begin{aligned} & \text{[Proof]: write } f_i=f\left(x_i\right). \text{ Since } f_j \text{ depends only on } x_j, \text{ every off-diagonal entry vanishes:} \\ & \frac{\partial \vec{f}(\vec{x})}{\partial \vec{x}}=\left[\begin{array}{cccc} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_2}{\partial x_1} & \cdots & \frac{\partial f_N}{\partial x_1} \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_1}{\partial x_N} & \frac{\partial f_2}{\partial x_N} & \cdots & \frac{\partial f_N}{\partial x_N} \end{array}\right]=\left[\begin{array}{cccc} f^{\prime}\left(x_1\right) & & & \\ & f^{\prime}\left(x_2\right) & & \\ & & \ddots & \\ & & & f^{\prime}\left(x_N\right) \end{array}\right]=\operatorname{diag}\left(\vec{f}^{\prime}(\vec{x})\right) \end{aligned} \]
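The diagonal structure in (B.26) is also easy to confirm numerically. The following is a minimal sketch, assuming \(f = \tanh\) as an example elementwise function (my choice, not one used in the references); `jacobian` is a hypothetical finite-difference helper in denominator layout.

```python
import math

def f_vec(x):
    # Elementwise function: the same scalar f (here tanh) applied to each entry.
    return [math.tanh(xi) for xi in x]

def f_prime(xi):
    # Derivative of tanh: 1 - tanh(x)^2.
    return 1.0 - math.tanh(xi) ** 2

def jacobian(f, x, eps=1e-6):
    """Denominator-layout Jacobian by central differences: J[i][j] = d f_j / d x_i."""
    n_out = len(f(x))
    J = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        fp, fm = f(xp), f(xm)
        J.append([(fp[j] - fm[j]) / (2 * eps) for j in range(n_out)])
    return J

x = [0.5, -1.0, 2.0]
n = len(x)

numeric = jacobian(f_vec, x)
# diag(f'(x)): f'(x_i) on the diagonal, zeros elsewhere.
analytic = [[f_prime(x[i]) if i == j else 0.0 for j in range(n)]
            for i in range(n)]

max_err = max(abs(numeric[i][j] - analytic[i][j])
              for i in range(n) for j in range(n))
print("max abs difference:", max_err)
```

Because the matrix is diagonal, it does not matter here whether numerator or denominator layout is used; the two layouts differ only by a transpose.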

Two derivations that Part A relies on:
\((1)\)

\[\begin{aligned} & \vec{x} \in R^n, \quad \exp (\vec{x})=\left[\begin{array}{c} \exp \left(x_1\right) \\ \vdots \\ \exp \left(x_n\right) \end{array}\right] \in R^n\\ & \text{so its partial derivative is: } \frac{\partial \exp (\vec{x})}{\partial \vec{x}}=\left[\begin{array}{ccc} \frac{\partial \exp \left(x_1\right)}{\partial x_1} & \cdots & \frac{\partial \exp \left(x_n\right)}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial \exp \left(x_1\right)}{\partial x_n} & \cdots & \frac{\partial \exp \left(x_n\right)}{\partial x_n} \end{array}\right]=\operatorname{diag}(\exp (\vec{x})) \end{aligned} \]

\((2)\)

\[\begin{aligned} & d\left(\vec{1}^{\top} \exp (\vec{x})\right) \\ & =\vec{1}^{\top} d \exp (\vec{x}) \\ &=\vec{1}^{\top}\left(\exp ^{\prime}(\vec{x}) \odot d \vec{x}\right) \\ &=\left(\vec{1} \odot \exp ^{\prime}(\vec{x})\right)^{\top} d \vec{x} \\ & \text{Hence: } \frac{\partial \left(\vec{1}^{\top} \exp (\vec{x})\right)}{\partial \vec{x}}=\vec{1} \odot \exp ^{\prime}(\vec{x})=\exp ^{\prime}(\vec{x})=\exp (\vec{x}) \end{aligned} \]
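To see how these pieces fit together: taking \(y = 1 / (\vec{1}^{\top} \exp(\vec{x}))\) and \(\vec{z} = \exp(\vec{x})\) in (B.15), and substituting results (1) and (2), gives the familiar softmax Jacobian \(\operatorname{diag}(\vec{s}) - \vec{s}\,\vec{s}^{\top}\) with \(\vec{s} = \operatorname{softmax}(\vec{x})\). That closed form is the standard result and is stated here as my own summary rather than quoted from the text above. The sketch below checks both result (2) and the closed form by finite differences; the `softmax` and `jacobian` helpers and the test point are assumptions made for this illustration.

```python
import math

def softmax(x):
    # Subtracting max(x) improves numerical stability and leaves the output unchanged.
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    total = sum(e)
    return [ei / total for ei in e]

def jacobian(f, x, eps=1e-6):
    """Denominator-layout Jacobian by central differences: J[i][j] = d f_j / d x_i."""
    n_out = len(f(x))
    J = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        fp, fm = f(xp), f(xm)
        J.append([(fp[j] - fm[j]) / (2 * eps) for j in range(n_out)])
    return J

x = [1.0, 2.0, 0.5]
n = len(x)

# Result (2): the gradient of 1^T exp(x) with respect to x is exp(x).
grad_sum = jacobian(lambda v: [sum(math.exp(vi) for vi in v)], x)
err_sum = max(abs(grad_sum[i][0] - math.exp(x[i])) for i in range(n))

# Softmax Jacobian: diag(s) - s s^T.
s = softmax(x)
analytic = [[(s[i] if i == j else 0.0) - s[i] * s[j] for j in range(n)]
            for i in range(n)]
numeric = jacobian(softmax, x)
err_softmax = max(abs(numeric[i][j] - analytic[i][j])
                  for i in range(n) for j in range(n))

print("result (2) max abs difference:", err_sum)
print("softmax Jacobian max abs difference:", err_softmax)
```

The softmax Jacobian is symmetric, so the layout convention does not affect this particular comparison.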

C

My understanding may be off in places.


Original article: https://www.cnblogs.com/aoidayo/p/18005371.html (please credit the source when reposting)
