Skip to content

Latest commit

 

History

History
47 lines (31 loc) · 2.85 KB

3.2-单层神经网络梯度计算例子.md

File metadata and controls

47 lines (31 loc) · 2.85 KB

3.2-单层神经网络的梯度计算例子

在本章节,我们来对一个单层神经网路求梯度验证我们对导数、矩阵运算的学习效果。为了能够对神经网络复杂的计算求导,我们将神经网络的步步拆开成下面的表达式来看。单层神经网络可以表示为: $$x = input \in \mathbb R^{D_x \times 1}$$ $$z = Wx + b_1 \in \mathbb R^{D_h \times 1}$$ $$h = ReLU(z) \in \mathbb R^{D_x \times 1}$$ $$\Theta = Uh + b_2 \in \mathbb R^{N_c \times 1}$$ $$\hat{y} = softmax(\Theta ) \in \mathbb R^{N_c \times 1}$$ $$J = cross-entropy(y, \hat{y} \in \mathbb R^{1}$$ 其中 $$x \in \mathbb R^{D_x \times 1}, b_1 \in \mathbb R^{D_h \times 1}, W \in \mathbb R^{D_h \times D_x}, b_2 \in \mathbb R^{N_c \times 1}, U \in \mathbb R^{N_c \times D_h}$$

$D_x, D_h$分别是输入和hidden layer的大小,$N_c$是分类的类别数量。

我们想计算的是损失函数对参数$U,W, b_1, b_2$的梯度,我们可以计算的梯度有: $$\frac{\partial J}{\partial U}, \frac{\partial J}{\partial b_2}, \frac{\partial J}{\partial W}, \frac{\partial J}{\partial b_1}, \frac{\partial J}{\partial x}$$

另外我们需要知道$ReLu(x) = max(x, 0)$,他的导数: $$ReLU'(x) = {^{1, x>0}_{0} = sgn(ReLu(x))$$

sgn是符号函数(输入大于0为输出1,输入小于等于0输出0)。

假设我们要求$U, b_2$的梯度,我们可以写出链式法则: $$\frac{\partial J}{\partial U} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Theta} \frac{\partial \Theta}{\partial U}$$ $$\frac{\partial J}{\partial b_2} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Theta} \frac{\partial \Theta}{\partial b_2}$$

从上面的两个式子可以看出$\frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Theta}$被用了两遍,所以我们再定义一个中间变量: $$\delta_1 = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Theta} = \frac{\partial J}{\partial \Theta}$$

根据2.9.7的推到可以得到 $$\delta_1 = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Theta} = \frac{\partial J}{\partial \Theta} = (\hat{y} - y)^T \mathbb R^{1 \times N_c}$$

进一步对$z$求梯度:

$$\frac{\partial J}{\partial z} = \frac{\partial J}{\partial \Theta} \frac{\partial \Theta}{\partial h} \frac{\partial h}{\partial z}$$

$$\frac{\partial J}{\partial z} = \delta_1 \frac{\partial \Theta}{\partial h} \frac{\partial h}{\partial z}, 带入\delta_1$$

$$\frac{\partial J}{\partial z} = \delta_1 U \frac{\partial h}{\partial z}, 根据2.9.1$$

$$\frac{\partial J}{\partial z} = \delta_1 U \odot ReLu'(z), 根据2.9.4, \odot代表elementwise乘法$$

$$\frac{\partial J}{\partial z} = \delta_1 U \odot sgn(h)$$

最后进行一下形状检查 $$\frac{\partial J}{\partial z} (1 \times D_h) = \delta_1(1 \times N_c)乘以 U (N_c \times D_h)\odot sgn(h) (D_h)$$