In this section, we compute the gradients of a single-layer neural network to check what we have learned about derivatives and matrix calculus. To differentiate through the network's composite computation, we break it into the step-by-step expressions below. A single-layer neural network can be written as:
$$x = \text{input} \in \mathbb R^{D_x \times 1}$$
$$z = Wx + b_1 \in \mathbb R^{D_h \times 1}$$
$$h = ReLU(z) \in \mathbb R^{D_h \times 1}$$
$$\Theta = Uh + b_2 \in \mathbb R^{N_c \times 1}$$
$$\hat{y} = softmax(\Theta ) \in \mathbb R^{N_c \times 1}$$
$$J = \text{cross-entropy}(y, \hat{y}) \in \mathbb R^{1}$$
where
$$x \in \mathbb R^{D_x \times 1}, b_1 \in \mathbb R^{D_h \times 1}, W \in \mathbb R^{D_h \times D_x}, b_2 \in \mathbb R^{N_c \times 1}, U \in \mathbb R^{N_c \times D_h}$$
Here $D_x$ and $D_h$ are the sizes of the input and the hidden layer, respectively, and $N_c$ is the number of classes.
We want the gradients of the loss with respect to the parameters $U, W, b_1, b_2$. The gradients we can compute are:
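To make the setup concrete, here is a minimal NumPy sketch of this forward pass (the dimension values, random initialization, and function names below are illustrative assumptions, not part of the original notes):

```python
import numpy as np

def softmax(theta):
    # shift by the max for numerical stability before exponentiating
    e = np.exp(theta - theta.max())
    return e / e.sum()

def forward(x, W, b1, U, b2, y):
    z = W @ x + b1                    # (D_h, 1)
    h = np.maximum(z, 0)              # ReLU, (D_h, 1)
    theta = U @ h + b2                # (N_c, 1)
    y_hat = softmax(theta)            # (N_c, 1)
    J = -float(y.T @ np.log(y_hat))   # cross-entropy with one-hot y, scalar
    return z, h, theta, y_hat, J
```

Each intermediate has exactly the shape listed in the equations above, which is what the final shape check at the end of this section relies on.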
$$\frac{\partial J}{\partial U}, \frac{\partial J}{\partial b_2}, \frac{\partial J}{\partial W}, \frac{\partial J}{\partial b_1}, \frac{\partial J}{\partial x}$$
We also need $ReLU(x) = \max(x, 0)$ and its derivative:
$$ReLU'(x) = \begin{cases} 1 & x > 0 \\ 0 & \text{otherwise} \end{cases} = sgn(ReLU(x))$$
where $sgn$ is the sign function (it outputs 1 when the input is greater than 0, and 0 when the input is less than or equal to 0).
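The identity $ReLU'(x) = sgn(ReLU(x))$ can be checked numerically; a small NumPy sketch (the sample values are arbitrary):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def relu_grad(x):
    # 1 where x > 0, 0 elsewhere (we take the derivative at 0 to be 0)
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
# since relu(x) is never negative, np.sign(relu(x)) matches the
# sgn convention in the text: 1 for positive input, 0 otherwise
assert np.array_equal(relu_grad(x), np.sign(relu(x)))
```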
Suppose we want the gradients with respect to $U$ and $b_2$; by the chain rule:
$$\frac{\partial J}{\partial U} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Theta} \frac{\partial \Theta}{\partial U}$$
$$\frac{\partial J}{\partial b_2} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Theta} \frac{\partial \Theta}{\partial b_2}$$
The two expressions above both contain the factor $\frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Theta}$, so we define an intermediate variable:
$$\delta_1 = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Theta} = \frac{\partial J}{\partial \Theta}$$
From the derivation in 2.9.7 we get
$$\delta_1 = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Theta} = \frac{\partial J}{\partial \Theta} = (\hat{y} - y)^T \in \mathbb R^{1 \times N_c}$$
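This closed form $\delta_1 = (\hat{y} - y)^T$ can be verified against central finite differences; a sketch (the dimension, random $\Theta$, and one-hot label are illustrative assumptions):

```python
import numpy as np

np.random.seed(0)
N_c = 4
theta = np.random.randn(N_c, 1)
y = np.zeros((N_c, 1)); y[1] = 1.0   # one-hot label

def softmax(t):
    e = np.exp(t - t.max())
    return e / e.sum()

def J(t):
    # cross-entropy loss as a function of the logits theta
    return -float(y.T @ np.log(softmax(t)))

analytic = (softmax(theta) - y).T     # delta_1, shape (1, N_c)

# central difference along each coordinate of theta
eps = 1e-6
numeric = np.zeros((1, N_c))
for i in range(N_c):
    e = np.zeros((N_c, 1)); e[i] = eps
    numeric[0, i] = (J(theta + e) - J(theta - e)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```

Note that the entries of $\delta_1$ sum to zero, since both $\hat{y}$ and the one-hot $y$ sum to 1.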
Going one step further, we take the gradient with respect to $z$:
$$\frac{\partial J}{\partial z} = \frac{\partial J}{\partial \Theta} \frac{\partial \Theta}{\partial h} \frac{\partial h}{\partial z}$$
$$\frac{\partial J}{\partial z} = \delta_1 \frac{\partial \Theta}{\partial h} \frac{\partial h}{\partial z}, \quad \text{substituting } \delta_1$$
$$\frac{\partial J}{\partial z} = \delta_1 U \frac{\partial h}{\partial z}, \quad \text{by 2.9.1}$$
$$\frac{\partial J}{\partial z} = \delta_1 U \odot ReLU'(z), \quad \text{by 2.9.4, where } \odot \text{ denotes elementwise multiplication}$$
$$\frac{\partial J}{\partial z} = \delta_1 U \odot sgn(h)$$
Finally, a shape check:
$$\underbrace{\frac{\partial J}{\partial z}}_{1 \times D_h} = \underbrace{\delta_1}_{1 \times N_c}\, \underbrace{U}_{N_c \times D_h} \odot \underbrace{sgn(h)}_{D_h}$$
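Putting it together, $\frac{\partial J}{\partial z} = \delta_1 U \odot sgn(h)$ can be verified end to end with a finite-difference check; a sketch with illustrative dimensions and random parameters (assuming no entry of $z$ lands exactly at the ReLU kink):

```python
import numpy as np

np.random.seed(1)
D_x, D_h, N_c = 3, 5, 4
x = np.random.randn(D_x, 1)
W = np.random.randn(D_h, D_x); b1 = np.random.randn(D_h, 1)
U = np.random.randn(N_c, D_h); b2 = np.random.randn(N_c, 1)
y = np.zeros((N_c, 1)); y[2] = 1.0   # one-hot label

def softmax(t):
    e = np.exp(t - t.max())
    return e / e.sum()

def loss_from_z(z):
    # the part of the forward pass downstream of z
    h = np.maximum(z, 0)
    return -float(y.T @ np.log(softmax(U @ h + b2)))

z = W @ x + b1
h = np.maximum(z, 0)
delta1 = (softmax(U @ h + b2) - y).T          # (1, N_c)
dJ_dz = (delta1 @ U) * np.sign(h).T           # (1, D_h), sgn(h) masks dead units

# central-difference check along each coordinate of z
eps = 1e-6
numeric = np.zeros((1, D_h))
for i in range(D_h):
    e = np.zeros((D_h, 1)); e[i] = eps
    numeric[0, i] = (loss_from_z(z + e) - loss_from_z(z - e)) / (2 * eps)

assert np.allclose(dJ_dz, numeric, atol=1e-5)
```

The shapes line up exactly as in the check above: `delta1 @ U` is $(1 \times N_c)(N_c \times D_h) = 1 \times D_h$, and the elementwise mask `np.sign(h).T` is also $1 \times D_h$.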