atom.xml
118 lines (68 loc) · 64.9 KB
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>此间不留白</title>
<link href="/atom.xml" rel="self"/>
<link href="https://elmadavies.github.io/"/>
<updated>2019-10-17T10:35:02.208Z</updated>
<id>https://elmadavies.github.io/</id>
<author>
<name>ElmaDavies</name>
</author>
<generator uri="http://hexo.io/">Hexo</generator>
<entry>
<title>deeplearning 课后作业1</title>
<link href="https://elmadavies.github.io/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B51/"/>
<id>https://elmadavies.github.io/深度学习/神经网络实践1/</id>
<published>2019-10-17T09:37:40.350Z</published>
<updated>2019-10-17T10:35:02.208Z</updated>
<content type="html"><![CDATA[<p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/4dzp9g.jpg" alt></p><a id="more"></a><blockquote><p>deeplearning.ai官网地址:<a href="https://www.deeplearning.ai/" target="_blank" rel="noopener">https://www.deeplearning.ai/</a><br>coursera地址:<a href="https://www.coursera.org/specializations/deep-learning" target="_blank" rel="noopener">https://www.coursera.org/specializations/deep-learning</a><br>网易视频地址:<a href="https://163.lu/nPtn42" target="_blank" rel="noopener">https://163.lu/nPtn42</a><br>课程一第二周课后作业1-1</p></blockquote><h1 id="使用numpy实现基本函数"><a href="#使用numpy实现基本函数" class="headerlink" title="使用numpy实现基本函数"></a>使用numpy实现基本函数</h1><h2 id="实现激活函数"><a href="#实现激活函数" class="headerlink" title="实现激活函数"></a>实现激活函数</h2><p>numpy是python中一个基础的科学计算包,以下练习中,将会实现一些数学中的基础函数,如<code>np.log(),np.exp()</code>等。</p><p>对于机器学习中的逻辑回归激活函数,表达式为$\delta = \frac{1}{1+e^{-z}}$,对于此公式,考虑到参数$z$是一个实数,可以简单地使用python中的<code>math</code>实现。</p>
<figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> math</span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">sigmoid</span><span class="params">(z)</span>:</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>/(<span class="number">1</span>+math.exp(-z))</span><br></pre></td></tr></table></figure>
<p>以上实现中,实际能够传入的参数<code>z</code>只能是一个数字,而在机器学习或者深度学习中,实际需要传入的参数是一个向量,所以<code>math.exp()</code>并不适用,可以使用<code>numpy.exp()</code>实现,如下所示:</p>
<figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">sigmoid</span><span class="params">(z)</span>:</span></span><br><span class="line"> sigmoid = <span class="number">1</span>/(<span class="number">1</span>+np.exp(-z))</span><br><span class="line"> <span class="keyword">return</span> sigmoid</span><br></pre></td></tr></table></figure>
<p>对于一个向量而言,<code>np.exp()</code>的实现过程就是对向量中的每一个元素计算<code>np.exp()</code>,可以如下图表示:</p><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/%E8%AF%BE%E7%A8%8B1-%E7%AC%AC%E4%BA%8C%E5%91%A8%E4%BD%9C%E4%B8%9A1-1.png" alt></p>
<h2 id="实现激活函数梯度"><a href="#实现激活函数梯度" class="headerlink" title="实现激活函数梯度"></a>实现激活函数梯度</h2><p>在之前的学习中,需要通过反向传播算法计算梯度来优化损失函数,此梯度的计算公式可以如下表示:<br>$\delta'(z) = \delta(z)*(1-\delta(z))$</p><p>代码实现如下所示:</p>
<figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">sigmoid_derivative</span><span class="params">(z)</span>:</span></span><br><span class="line"> s = sigmoid(z)</span><br><span class="line"> ds = s*(<span class="number">1</span>-s)</span><br><span class="line"> <span class="keyword">return</span> ds</span><br></pre></td></tr></table></figure>
<h2 id="改变矩阵(向量)形状"><a href="#改变矩阵(向量)形状" class="headerlink" title="改变矩阵(向量)形状"></a>改变矩阵(向量)形状</h2><p>深度学习中,numpy最常用到的两个跟矩阵或者向量形状有关的函数是<code>shape()</code>和<code>reshape()</code>。</p><ol><li><code>shape()</code>:返回当前矩阵或者向量的大小。</li><li><code>reshape()</code>:将当前数组转化为特定形状。</li></ol><p>具体用法如下所示:<br>将一个三维图像矩阵(shape=(length,height,depth)),转化为一维数组(shape = (length×height×depth,1))以便于处理,具体实现方法可以由以下代码所示:</p>
<figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">image2vector</span><span class="params">(image)</span>:</span></span><br><span class="line"> v = image.reshape(image.shape[<span class="number">0</span>] * image.shape[<span class="number">1</span>] * image.shape[<span class="number">2</span>], <span class="number">1</span>) </span><br><span class="line"> <span class="keyword">return</span> v</span><br></pre></td></tr></table></figure>
<h2 id="按行标准化:"><a href="#按行标准化:" class="headerlink" title="按行标准化:"></a>按行标准化:</h2><p>机器学习或者深度学习中,常常用到的一个数据预处理技巧是标准化,标准化后的数据能够使得算法运行速度更快,关于数据标准化的公式如下所示:</p><p>$normal = \frac{x}{||x||}$</p><p>其中,$||x||$表示范数,对于一个向量$[x_1,x_2……x_n]$,其范数计算公式为$\sqrt{x_1^{2}+x_2^{2}+···x_n^{2}}$。</p><p><code>numpy</code>已经提供了用于计算范数的函数,所以其数据标准化的处理过程可以用以下代码实现:</p>
<figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">normalizeRows</span><span class="params">(x)</span>:</span></span><br><span class="line"> x_norm = np.linalg.norm(x, axis = <span class="number">1</span>, keepdims = <span class="literal">True</span>)</span><br><span class="line"> x = x / x_norm</span><br><span class="line"> <span class="keyword">return</span> x</span><br></pre></td></tr></table></figure>
<h2 id="Python广播机制和-softmax-函数"><a href="#Python广播机制和-softmax-函数" class="headerlink" title="Python广播机制和$softmax$函数"></a>Python广播机制和$softmax$函数</h2><p>为了理解<code>python</code>中的广播机制,一种行之有效的方法是操作两个维数不同的矩阵。使用<code>numpy</code>中的<code>softmax</code>函数理解<code>python</code>中的广播机制,可以将<code>softmax</code>函数认为是你的算法需要实现二分类或者更多分类时的数据标准化函数,其数学表示如下所示:</p><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/%E8%AF%BE%E7%A8%8B1-%E7%AC%AC%E4%BA%8C%E5%91%A8%E4%BD%9C%E4%B8%9A1-2.png" alt></p><p>其实现代码如下所示:</p>
<figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">softmax</span><span class="params">(x)</span>:</span></span><br><span class="line"> x_exp = np.exp(x)</span><br><span class="line"> x_sum = np.sum(x_exp,axis=<span class="number">1</span>,keepdims = <span class="literal">True</span>)</span><br><span class="line"> <span class="keyword">return</span> x_exp/x_sum</span><br></pre></td></tr></table></figure>
<h1 id="2-关于两个损失函数的实现"><a href="#2-关于两个损失函数的实现" class="headerlink" title="2.关于两个损失函数的实现"></a>2.关于两个损失函数的实现</h1><p>在深度学习中,损失函数常用来评估模型性能,损失函数的值越大,代表模型的预测值与真实值之间的差距越大,深度学习中常用梯度下降算法最小化损失函数来优化模型。</p>
<h2 id="L1损失函数"><a href="#L1损失函数" class="headerlink" title="L1损失函数"></a>L1损失函数</h2><p>L1损失函数的定义如下所示:<br>$$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=0}^m|y^{(i)} - \hat{y}^{(i)}|\end{align*}$$</p><p>其代码实现如下所示:</p>
<figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">L1</span><span class="params">(y,yhat)</span>:</span></span><br><span class="line"> loss = np.sum(np.abs(y-yhat))</span><br><span class="line"> <span class="keyword">return</span> loss</span><br></pre></td></tr></table></figure>
<h2 id="L2损失函数"><a href="#L2损失函数" class="headerlink" title="L2损失函数"></a>L2损失函数</h2><p>L2损失函数的定义如下所示:<br>$$\begin{align*} & L_2(\hat{y},y) = \sum_{i=0}^m(y^{(i)} -\hat{y}^{(i)})^2 \end{align*}$$<br><strong>注意:</strong>对于一个向量$[X] = [x_1,x_2,···x_n]$,<code>np.dot()</code>的计算过程为:<code>np.dot(x,x) =</code>$\sum_{i=0}^{n}x_i^{2}$,所以L2损失函数的一种有效计算方式是利用<code>dot()</code>函数,具体实现代码如下所示:</p>
<figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">L2</span><span class="params">(y,yhat)</span>:</span></span><br><span class="line"> loss = np.sum(np.dot((y-yhat),(y-yhat).T))</span><br><span class="line"> <span class="keyword">return</span> loss</span><br></pre></td></tr></table></figure>]]></content>
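上文实现的各个 numpy 基础函数可以整理成如下一份可独立运行的示意代码(仅为按上文公式整理的草稿,并非课程官方实现;其中 `sigmoid_derivative` 按公式 $\delta'(z)=\delta(z)(1-\delta(z))$ 使用逐元素乘法):

```python
import numpy as np

def sigmoid(z):
    # 逐元素的 sigmoid 函数,标量与数组均适用
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    # sigmoid 的导数:s * (1 - s),逐元素相乘
    s = sigmoid(z)
    return s * (1 - s)

def normalizeRows(x):
    # 每一行除以该行的 L2 范数,利用广播机制
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / x_norm

def softmax(x):
    # 按行计算 softmax,同样依赖广播机制
    x_exp = np.exp(x)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)
```

其中 `normalizeRows` 和 `softmax` 假定输入为二维数组(按行处理),`sigmoid` 与 `sigmoid_derivative` 则对任意形状的数组逐元素工作。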
<summary type="html">
<p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/4dzp9g.jpg" alt></p>
</summary>
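对于上文的 L1 与 L2 损失函数,可以用一组假设的示例数据(非课程提供的数据)做一次数值验算;这里按公式把参数顺序写作 (yhat, y),与上文代码中 (y, yhat) 的写法在这两个损失下结果相同:

```python
import numpy as np

def L1(yhat, y):
    # 绝对误差之和
    return np.sum(np.abs(y - yhat))

def L2(yhat, y):
    # 平方误差之和,利用 np.dot(v, v) 等于各分量平方和
    diff = y - yhat
    return np.dot(diff, diff)

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1.0, 0.0, 0.0, 1.0, 1.0])
print(L1(yhat, y))  # 约 1.1
print(L2(yhat, y))  # 约 0.43
```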
<category term="深度学习" scheme="https://elmadavies.github.io/categories/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/"/>
<category term="深度学习" scheme="https://elmadavies.github.io/tags/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/"/>
<category term="神经网络" scheme="https://elmadavies.github.io/tags/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/"/>
</entry>
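下一篇文章会用逻辑回归搭建完整的猫图分类器,其前向/反向传播可以先按公式整理为如下示意代码(假设 X 的形状为 (n_x, m),Y 的形状为 (1, m);仅为示意草稿,并非课程官方实现):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    # 前向传播:A 的形状为 (1, m),cost 为平均交叉熵
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    # 反向传播:cost 对 w、b 的梯度
    dw = np.dot(X, (A - Y).T) / m
    db = np.mean(A - Y)
    return {"dw": dw, "db": db}, cost

def predict(w, b, X):
    # 以 0.5 为阈值把激活值转为 0/1 预测
    A = sigmoid(np.dot(w.T, X) + b)
    return (A > 0.5).astype(float)
```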
<entry>
<title>具有神经网络思维的逻辑回归算法</title>
<link href="https://elmadavies.github.io/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B52/"/>
<id>https://elmadavies.github.io/深度学习/神经网络实践2/</id>
<published>2019-10-17T09:37:40.350Z</published>
<updated>2019-10-17T10:36:21.311Z</updated>
<content type="html"><![CDATA[<p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/nm6318.jpg" alt><br>本次作业,将会构建一个逻辑回归分类器用以识别猫, 通过完成这次作业,也能了解一些神经网络的思维,并且能够建立一个对神经网络的基本认识。</p><a id="more"></a><blockquote><p>deeplearning.ai官网地址:<a href="https://www.deeplearning.ai/" target="_blank" rel="noopener">https://www.deeplearning.ai/</a><br>coursera地址:<a href="https://www.coursera.org/specializations/deep-learning" target="_blank" rel="noopener">https://www.coursera.org/specializations/deep-learning</a><br>网易视频地址:<a href="https://163.lu/nPtn42" target="_blank" rel="noopener">https://163.lu/nPtn42</a><br>课程一第二周课后作业1-2</p></blockquote><h1 id="1-导入相关包"><a href="#1-导入相关包" class="headerlink" title="1. 导入相关包"></a>1. 导入相关包</h1><p>首先,导入本次作业实现中所需要的所有相关包,具体实现如下:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="keyword">import</span> matplotlib.pyplot <span class="keyword">as</span> plt</span><br><span class="line"><span class="keyword">import</span> h5py</span><br><span class="line"><span class="keyword">import</span> scipy</span><br><span class="line"><span class="keyword">from</span> PIL <span class="keyword">import</span> Image</span><br><span class="line"><span class="keyword">from</span> scipy <span class="keyword">import</span> ndimage</span><br><span class="line"><span class="keyword">from</span> lr_utils <span class="keyword">import</span> load_dataset</span><br></pre></td></tr></table></figure><ul><li>一些库的简单介绍:<br><a href="http://www.h5py.org" target="_blank"
rel="noopener">h5py</a>是一个处理<code>h5</code>文件的交互包<br><a href="https://www.scipy.org/" target="_blank" rel="noopener">scipy</a>是一个科学计算库<br><a href="http://www.pythonware.com/products/pil/" target="_blank" rel="noopener">PIL</a>是一个图像处理库</li></ul><h1 id="2-数据集的概述"><a href="#2-数据集的概述" class="headerlink" title="2.数据集的概述"></a>2.数据集的概述</h1><p>作业所用到的数据集存储在<code>data.h5</code>文件中,对于数据集,有以下介绍:</p><ul><li>数据集中的训练集<code>m_train</code>图像,被标注为<code>cat (y=1)</code>和<code>non-cat (y=0)</code></li><li>数据集中的测试集<code>m_test</code>图像,同样,被标注为<code>cat</code>和<code>non-cat</code></li><li>每一幅图像的都是<code>(num_px,num_px,3)</code>的形式,其中3表示图像是3通道的RGB形式,而长和宽都是<code>num_px</code>和<code>num_px</code>的正方形图片。</li><li>加载数据前,可以利用python查看数据集的存储形式,具体实现代码如下所示.</li></ul><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">f = h5py.File(<span class="string">'dataset/train_catvnoncat.h5'</span>,<span class="string">'r'</span>) <span class="comment">#打开h5文件 </span></span><br><span class="line"><span class="comment">#可以查看所有的主键 </span></span><br><span class="line"><span class="keyword">for</span> key <span class="keyword">in</span> f.keys(): </span><br><span class="line"> print(f[key].name) <span class="comment">#输出数据集标签值,train_set_x,train_set_y</span></span><br><span class="line"> print(f[key].shape) <span class="comment">#输出数据集train_set_x,train_set_y的形式</span></span><br><span class="line"> print(f[key].value) <span class="comment">#输出数据集trian_set_x和train_set_y的具体值</span></span><br></pre></td></tr></table></figure><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/%E8%AF%BE%E7%A8%8B1-%E7%AC%AC%E4%BA%8C%E5%91%A8%E8%AF%BE%E5%90%8E%E4%BD%9C%E4%B8%9A2-1.png" alt></p><p><img 
src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/%E8%AF%BE%E7%A8%8B1-%E7%AC%AC%E4%BA%8C%E5%91%A8%E4%BD%9C%E4%B8%9A2-2.png" alt></p><p>根据以上数据形式,编写<code>loadset()</code>函数加载数据集的代码如下所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">load_dataset</span><span class="params">()</span>:</span></span><br><span class="line"> train_dataset = h5py.File(<span class="string">'datasets/train_catvnoncat.h5'</span>, <span class="string">"r"</span>)</span><br><span class="line"> train_set_x_orig = np.array(train_dataset[<span class="string">"train_set_x"</span>][:]) <span class="comment">#训练集数据特征</span></span><br><span class="line"> train_set_y_orig = np.array(train_dataset[<span class="string">"train_set_y"</span>][:]) <span class="comment"># 训练集数据标签</span></span><br><span class="line"></span><br><span class="line"> test_dataset = h5py.File(<span class="string">'datasets/test_catvnoncat.h5'</span>, <span class="string">"r"</span>)</span><br><span class="line"> test_set_x_orig = np.array(test_dataset[<span class="string">"test_set_x"</span>][:]) <span class="comment"># 测试集数据特征</span></span><br><span class="line"> test_set_y_orig = np.array(test_dataset[<span class="string">"test_set_y"</span>][:]) <span class="comment">#测试集数据标签</span></span><br><span class="line"></span><br><span class="line"> classes = np.array(test_dataset[<span 
class="string">"list_classes"</span>][:]) <span class="comment"># 数据类别构成的列表</span></span><br><span class="line"> </span><br><span class="line"> train_set_y_orig = train_set_y_orig.reshape((<span class="number">1</span>, train_set_y_orig.shape[<span class="number">0</span>]))</span><br><span class="line"> test_set_y_orig = test_set_y_orig.reshape((<span class="number">1</span>, test_set_y_orig.shape[<span class="number">0</span>]))</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">return</span> train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes</span><br></pre></td></tr></table></figure><p>简单地测试一下加载好的数据集,可以看到相关的输出信息如下所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">index = <span class="number">5</span></span><br><span class="line">plt.imshow(train_set_x_orig[index])</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"y = "</span> + str(train_set_y[:, index]) + <span class="string">", it's a '"</span> + classes[np.squeeze(train_set_y[:, index])].decode(<span class="string">"utf-8"</span>) + <span class="string">"' picture."</span>)</span><br></pre></td></tr></table></figure><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/%E8%AF%BE%E7%A8%8B1-%E7%AC%AC%E4%BA%8C%E5%91%A8%E4%BD%9C%E4%B8%9A2-3.png" alt></p><p>许多深度学习代码存在bug的原因之一就是矩阵或者向量维数不匹配,在算法实现之前,先输出数据集的维度是一个很明智的做法,具体实现代码如下所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">m_train = train_set_x_orig.shape[<span class="number">0</span>]</span><br><span class="line">m_test = test_set_x_orig.shape[<span class="number">0</span>]</span><br><span class="line">num_px = train_set_x_orig.shape[<span class="number">1</span>]</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">print</span> (<span class="string">"Number of training examples: m_train = "</span> + str(m_train))</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"Number of testing examples: m_test = "</span> + str(m_test))</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"Height/Width of each image: num_px = "</span> + str(num_px))</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"Each image is of size: ("</span> + str(num_px) + <span class="string">", "</span> + str(num_px) + <span class="string">", 3)"</span>)</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"train_set_x shape: "</span> + str(train_set_x_orig.shape))</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"train_set_y shape: "</span> + str(train_set_y.shape))</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"test_set_x shape: "</span> + str(test_set_x_orig.shape))</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"test_set_y shape: "</span> + str(test_set_y.shape))</span><br></pre></td></tr></table></figure><p>上述代码实现结果如下所示:</p><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/%E8%AF%BE%E7%A8%8B1-%E7%AC%AC%E4%BA%8C%E5%91%A8%E4%BD%9C%E4%B8%9A2-4.png" 
alt></p><p>在图像处理过程中,数据集中的图像数据是一个(num_px,num_px,3)的矩阵,需要将其转换为(num_px×num_px×3,1)的矩阵,在<code>python</code>中有一个实现技巧如下所示,可轻松实现矩阵形式的转化。</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">X_flatten = X.reshape(X.shape[<span class="number">0</span>], <span class="number">-1</span>).T</span><br></pre></td></tr></table></figure><p>综上所述,具体实现代码可以如下所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[<span class="number">0</span>], <span class="number">-1</span>).T</span><br><span class="line">test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[<span class="number">0</span>], <span class="number">-1</span>).T</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"train_set_x_flatten shape: "</span> + str(train_set_x_flatten.shape))</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"train_set_y shape: "</span> + str(train_set_y.shape))</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"test_set_x_flatten shape: "</span> + str(test_set_x_flatten.shape))</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"test_set_y shape: "</span> + str(test_set_y.shape))</span><br><span class="line"><span class="keyword">print</span> (<span class="string">"sanity check after reshaping: "</span> + str(train_set_x_flatten[<span class="number">0</span>:<span class="number">5</span>,<span class="number">0</span>]))</span><br></pre></td></tr></table></figure><p>以上代码结果如下所示:</p><p><img 
src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/replace.png" alt></p><p>图片的颜色由RGB三个通道表示,即每个像素的值实际上是由三个0-255之间的数组成的向量。</p><p>机器学习中一种通用的处理数据集的方法称之为标准化,对于图像数据集而言,一种简单而又方便的处理数据的方法是对每一行数据除以255,即其RGB的最大值。具体实现可以如以下代码所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">train_set_x = train_set_x_flatten/<span class="number">255.</span></span><br><span class="line">test_set_x = test_set_x_flatten/<span class="number">255.</span></span><br></pre></td></tr></table></figure><h1 id="3-学习算法的一般体系架构"><a href="#3-学习算法的一般体系架构" class="headerlink" title="3. 学习算法的一般体系架构"></a>3. 学习算法的一般体系架构</h1><p>有了如上的数据预处理流程,是时候构建一个具有神经网络思维的算法了,其算法实现过程基本如下图所示:</p><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/%E8%AF%BE%E7%A8%8B1-%E7%AC%AC%E4%BA%8C%E5%91%A8%E8%AF%BE%E5%90%8E%E4%BD%9C%E4%B8%9A2-5.png" alt></p><p>对于一个样本 $x^{(i)}$:<br>$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$<br>$$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})\tag{2}$$<br>$$ \mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)} ) \log(1-a^{(i)})\tag{3}$$</p><p>所有样本的损失可以由以下公式计算:<br>$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{6}$$</p><p>实现这个算法的关键步骤是:</p><ul><li>初始化模型的学习参数</li><li>通过最小化损失函数得到模型的学习参数</li><li>根据模型的学习参数做出相关预测</li><li>分析预测结果并得出相关结论</li></ul><h1 id="4-算法的构建过程"><a href="#4-算法的构建过程" class="headerlink" title="4. 算法的构建过程"></a>4. 
算法的构建过程</h1><p>构建分类算法的主要步骤可以如下步骤所示:</p><ul><li>构建模型架构,例如特征的个数</li><li>初始化模型参数</li><li>循环:<ul><li>计算当前损失</li><li>利用反向传播算法计算当前的梯度</li><li>通过梯度下降更新参数</li></ul></li></ul><h2 id="4-1-sigmoid函数"><a href="#4-1-sigmoid函数" class="headerlink" title="4.1 sigmoid函数"></a>4.1 sigmoid函数</h2><p>逻辑回归中的激活函数是sigmoid函数,其实现方式可以由以下代码所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">sigmoid</span><span class="params">(z)</span>:</span></span><br><span class="line"> s = <span class="number">1</span> / (<span class="number">1</span> + np.exp(-z)) </span><br><span class="line"> <span class="keyword">return</span> s</span><br></pre></td></tr></table></figure><h2 id="4-2-初始化参数"><a href="#4-2-初始化参数" class="headerlink" title="4.2 初始化参数"></a>4.2 初始化参数</h2><p>初始化权重参数$w$并使其为0,参数$b = 0$,具体实现代码如下所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">initialize_with_zeros</span><span class="params">(dim)</span>:</span> </span><br><span class="line"> w = np.zeros((dim, <span class="number">1</span>))</span><br><span class="line"> b = <span class="number">0</span></span><br><span class="line"> <span class="comment">#确保参数w的维数和数据类型正确</span></span><br><span class="line"> <span class="keyword">assert</span>(w.shape == (dim, <span class="number">1</span>))</span><br><span class="line"> <span class="keyword">assert</span>(isinstance(b, float) <span class="keyword">or</span> 
isinstance(b, int))</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">return</span> w, b</span><br></pre></td></tr></table></figure><h2 id="4-3-前向和反向传播算法"><a href="#4-3-前向和反向传播算法" class="headerlink" title="4.3 前向和反向传播算法"></a>4.3 前向和反向传播算法</h2><p>初始化参数之后,就可以通过传播算法计算学习参数了,相关计算公式如下所示:<br>前向传播算法:<br>通过输入变量$X$计算:<br> $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, …, a^{(m-1)}, a^{(m)})\tag{7}$<br>计算损失函数:<br> $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]\tag{8}$<br>可以通过以下两个公式计算相关参数:<br>$$ \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{9}$$<br>$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{10}$$</p><p>根据以上公式,算法实现的代码如下所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">propagate</span><span class="params">(w, b, X, Y)</span>:</span></span><br><span class="line"> m = X.shape[<span class="number">1</span>]</span><br><span class="line"> A = sigmoid(np.dot(w.T, X) + b) </span><br><span class="line"> cost = <span class="number">-1</span> / m * np.sum(Y * np.log(A) + (<span class="number">1</span> - Y) * np.log(<span class="number">1</span> - A)) </span><br><span class="line"> dw = <span class="number">1</span> / m * np.dot(X, (A - Y).T)</span><br><span class="line"> db = <span class="number">1</span> / m * np.sum(A - Y)</span><br><span class="line"> <span class="keyword">assert</span>(dw.shape == w.shape)</span><br><span class="line"> <span
class="keyword">assert</span>(db.dtype == float)</span><br><span class="line"> cost = np.squeeze(cost)</span><br><span class="line"> <span class="keyword">assert</span>(cost.shape == ())</span><br><span class="line"> grads = {<span class="string">"dw"</span>: dw, <span class="string">"db"</span>: db}</span><br><span class="line"> <span class="keyword">return</span> grads, cost</span><br></pre></td></tr></table></figure><h2 id="4-4-利用梯度下降优化算法"><a href="#4-4-利用梯度下降优化算法" class="headerlink" title="4.4 利用梯度下降优化算法"></a>4.4 利用梯度下降优化算法</h2><p>有了前向传播算法对参数的计算,接下来需要做的就是利用反向传播算法中的梯度下降算法更新参数,具体实现代码如下所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">optimize</span><span class="params">(w, b, X, Y, num_iterations, learning_rate, print_cost = False)</span>:</span></span><br><span class="line"> costs = []</span><br><span class="line"> <span class="keyword">for</span> i <span class="keyword">in</span> range(num_iterations):</span><br><span class="line"> grads, cost = propagate(w, b, X, Y)</span><br><span class="line"> dw = grads[<span class="string">"dw"</span>]</span><br><span class="line"> db = grads[<span class="string">"db"</span>]</span><br><span class="line"> w = w - learning_rate * dw</span><br><span 
class="line"> b = b - learning_rate * db</span><br><span class="line"> <span class="keyword">if</span> i % <span class="number">100</span> == <span class="number">0</span>:</span><br><span class="line"> costs.append(cost)</span><br><span class="line"> <span class="keyword">if</span> print_cost <span class="keyword">and</span> i % <span class="number">100</span> == <span class="number">0</span>:</span><br><span class="line"> <span class="keyword">print</span> (<span class="string">"Cost after iteration %i: %f"</span> %(i, cost))</span><br><span class="line"> </span><br><span class="line"> params = {<span class="string">"w"</span>: w,</span><br><span class="line"> <span class="string">"b"</span>: b}</span><br><span class="line"> </span><br><span class="line"> grads = {<span class="string">"dw"</span>: dw,</span><br><span class="line"> <span class="string">"db"</span>: db}</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">return</span> params, grads, costs</span><br></pre></td></tr></table></figure><p>有了通过反向传播算法得到的参数$w$和$b$,可以根据公式:<br>$\hat{y} = \delta(w^Tx+b)\tag{11}$做出预测了,具体实现代码如下所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">predict</span><span class="params">(w, b, X)</span>:</span></span><br><span class="line"> m = X.shape[<span class="number">1</span>]</span><br><span class="line"> Y_prediction = np.zeros((<span class="number">1</span>,m))</span><br><span class="line"> w = w.reshape(X.shape[<span 
class="number">0</span>], <span class="number">1</span>)</span><br><span class="line"> A = sigmoid(np.dot(w.T, X) + b)</span><br><span class="line"> <span class="keyword">for</span> i <span class="keyword">in</span> range(A.shape[<span class="number">1</span>]):</span><br><span class="line"> <span class="keyword">if</span> A[<span class="number">0</span>, i] <= <span class="number">0.5</span>:</span><br><span class="line"> Y_prediction[<span class="number">0</span>, i] = <span class="number">0</span></span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> Y_prediction[<span class="number">0</span>, i] = <span class="number">1</span></span><br><span class="line"> <span class="keyword">assert</span>(Y_prediction.shape == (<span class="number">1</span>, m))</span><br><span class="line"> <span class="keyword">return</span> Y_prediction</span><br></pre></td></tr></table></figure><h1 id="5-将所有函数合并在一起搭建模型"><a href="#5-将所有函数合并在一起搭建模型" class="headerlink" title="5. 将所有函数合并在一起搭建模型"></a>5. 
将所有函数合并在一起搭建模型</h1><p>通过以上,已经分模块实现了算法的所有部分,现在可以通过构建一个模型函数将所有的函数合并在一起,用以实现模型的参数更新和预测,具体实现代码如下所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">model</span><span class="params">(X_train, Y_train, X_test, Y_test, num_iterations = <span class="number">2000</span>, learning_rate = <span class="number">0.5</span>, print_cost = False)</span>:</span></span><br><span class="line"> w, b = initialize_with_zeros(X_train.shape[<span class="number">0</span>])</span><br><span class="line"> parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)</span><br><span class="line"> w = parameters[<span class="string">"w"</span>]</span><br><span class="line"> b = parameters[<span class="string">"b"</span>]</span><br><span class="line"> Y_prediction_test = predict(w, b, X_test)</span><br><span class="line"> Y_prediction_train = predict(w, b, X_train)</span><br><span class="line"> print(<span class="string">"train accuracy: {} %"</span>.format(<span class="number">100</span> - np.mean(np.abs(Y_prediction_train - Y_train)) * <span class="number">100</span>))</span><br><span class="line"> print(<span class="string">"test accuracy: {} %"</span>.format(<span class="number">100</span> - np.mean(np.abs(Y_prediction_test - Y_test)) * <span 
class="number">100</span>))</span><br><span class="line"> d = {<span class="string">"costs"</span>: costs,</span><br><span class="line"> <span class="string">"Y_prediction_test"</span>: Y_prediction_test, </span><br><span class="line"> <span class="string">"Y_prediction_train"</span> : Y_prediction_train, </span><br><span class="line"> <span class="string">"w"</span> : w, </span><br><span class="line"> <span class="string">"b"</span> : b,</span><br><span class="line"> <span class="string">"learning_rate"</span> : learning_rate,</span><br><span class="line"> <span class="string">"num_iterations"</span>: num_iterations}</span><br><span class="line"> <span class="keyword">return</span> d</span><br></pre></td></tr></table></figure><p>可以简单绘制模型训练过程中损失函数的变化曲线,如下所示:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">costs = np.squeeze(d[<span class="string">'costs'</span>])</span><br><span class="line">plt.plot(costs)</span><br><span class="line">plt.ylabel(<span class="string">'cost'</span>)</span><br><span class="line">plt.xlabel(<span class="string">'iterations (per hundreds)'</span>)</span><br><span class="line">plt.title(<span class="string">"Learning rate ="</span> + str(d[<span class="string">"learning_rate"</span>]))</span><br><span class="line">plt.show()</span><br></pre></td></tr></table></figure><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/%E8%AF%BE%E7%A8%8B1-%E7%AC%AC%E4%BA%8C%E5%91%A8%E8%AF%BE%E5%90%8E%E4%BD%9C%E4%B8%9A2-6.png" alt></p><p>如上图所示,可以看到,随着迭代次数的增加,模型的损失不断下降;若继续增加迭代次数,将会看到训练集的准确率逐渐上升,而测试集的准确率开始下降,这正是过拟合的表现。</p><h1 id="6-进一步的分析"><a href="#6-进一步的分析" class="headerlink" title="6.进一步的分析"></a>6.进一步的分析</h1><p>学习率的大小对于模型的训练也是至关重要的,学习率决定着参数更新的速度:当学习率过大时,模型不容易收敛;而当学习率过小时,模型需要迭代很多次才能收敛。可以用以下代码验证学习率对模型训练的影响。</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">learning_rates = [<span class="number">0.01</span>, <span class="number">0.001</span>, <span class="number">0.0001</span>]</span><br><span class="line">models = {}</span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> learning_rates:</span><br><span class="line"> <span class="keyword">print</span> (<span class="string">"learning rate is: "</span> + str(i))</span><br><span class="line"> models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = <span class="number">1500</span>, learning_rate = i, print_cost = <span class="literal">False</span>)</span><br><span class="line"> <span class="keyword">print</span> (<span class="string">'\n'</span> + <span class="string">"-------------------------------------------------------"</span> + <span class="string">'\n'</span>)</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> learning_rates:</span><br><span class="line"> plt.plot(np.squeeze(models[str(i)][<span class="string">"costs"</span>]), label= str(models[str(i)][<span class="string">"learning_rate"</span>]))</span><br><span class="line"></span><br><span class="line">plt.ylabel(<span class="string">'cost'</span>)</span><br><span class="line">plt.xlabel(<span class="string">'iterations'</span>)</span><br><span class="line"></span><br><span class="line">legend = plt.legend(loc=<span class="string">'upper center'</span>, shadow=<span class="literal">True</span>)</span><br><span class="line">frame = legend.get_frame()</span><br><span class="line">frame.set_facecolor(<span class="string">'0.90'</span>)</span><br></pre></td></tr></table></figure><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/%E8%AF%BE%E7%A8%8B1-%E7%AC%AC%E4%BA%8C%E5%91%A8%E4%BD%9C%E4%B8%9A2-7.png" alt></p><p>综上,可以得到如下结论:</p><ul><li>不同的学习率会得到不同的预测结果和损失</li><li>如果模型的学习率过大,可能会导致模型的损失上下振荡,如上图所示,学习率取0.001可能是一个不错的选择。</li><li>训练集上很小的损失并不一定意味着该模型是一个能做出准确预测的好模型,这很可能是过拟合造成的。这种过拟合情况通常发生在训练集的精确度比测试集高很多的情况下。</li><li>对于深度学习,通常有如下建议:<ul><li>为模型选择一个合适的学习率以得到更好的模型参数</li><li>如果模型过拟合,通常要采取其他措施消除这种过拟合,一种行之有效的方法是正则化方法。</li></ul></li></ul>]]></content>
<summary type="html">
<p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/nm6318.jpg" alt><br>本次作业,将会构建一个逻辑回归分类器用以识别猫, 通过完成这次作业,也能了解一些神经网络的思维,并且能够建立一个对神经网络的基本认识。</p>
</summary>
<category term="深度学习" scheme="https://elmadavies.github.io/categories/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/"/>
<category term="神经网络" scheme="https://elmadavies.github.io/tags/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/"/>
<category term="逻辑回归" scheme="https://elmadavies.github.io/tags/%E9%80%BB%E8%BE%91%E5%9B%9E%E5%BD%92/"/>
</entry>
<entry>
<title>神经网络基础</title>
<link href="https://elmadavies.github.io/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%9F%BA%E7%A1%80/"/>
<id>https://elmadavies.github.io/深度学习/神经网络基础/</id>
<published>2019-10-08T10:53:22.812Z</published>
<updated>2019-10-08T12:35:46.788Z</updated>
<content type="html"><![CDATA[<p> 假设有以下一张图片,要判断其输出是否为猫,若是,则可以用$y = 1$表示,否则用$y = 0$表示。</p><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/1-1.png" alt></p><a id="more"></a><h2 id="二元分类"><a href="#二元分类" class="headerlink" title="二元分类"></a>二元分类</h2><p>计算机保存一张图片通常使用3个独立矩阵,分别对应红、绿、蓝三个颜色通道。如果输入图片的像素是<code>64×64</code>,则会有3个<code>64×64</code>的矩阵,即输入是一个<code>3×64×64</code>的高维度数据,而输出是$y=0 \ or\ 1$。</p><p>与<a href="https://www.jianshu.com/p/c18c1e08f239" target="_blank" rel="noopener">机器学习之单变量线性回归</a>一样,可以用$(x,y)$表示一个训练样本,其中$x \in R^n$,而$y \in \{0,1\}$,通常用$\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)})……(x^{(m)},y^{(m)})\}$表示整个训练集,其中$m$表示训练样本的个数。由$m$个训练样本可以组成输入矩阵$X$,且$X \in R^{n×m}$,$Y$是一个输出矩阵,且$Y \in R^{1×m}$。</p><h2 id="逻辑回归"><a href="#逻辑回归" class="headerlink" title="逻辑回归"></a>逻辑回归</h2><p>对于以上分类问题,给定输入$x$,我们想预测输出$y = 1$的概率,$\hat{y}$的值即表示图片是猫的概率,可以简单表示为$\hat{y} = P(y=1|x)$,且$0 \le \hat{y} \le 1$。</p><p>对于线性回归,有$y = w^Tx+b$,而对于逻辑回归,采用激活函数,即:$ \hat{y} = \delta(w^Tx+b) $表示,令$z = w^Tx+b$,则:</p><p>$\delta(z) = \frac{1}{1+e^{-z}}$,其函数图像如下所示:</p><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/1-2.png" alt></p><p>当$\delta(z) > 0.5$,也就是$w^Tx+b > 0 $时,认为其输出$\hat{y} = 1$;<br>而$\delta(z) < 0.5$,也就是$w^Tx+b < 0 $时,其输出$\hat{y} = 0$。</p><h2 id="逻辑回归损失函数"><a href="#逻辑回归损失函数" class="headerlink" title="逻辑回归损失函数"></a>逻辑回归损失函数</h2><p>对于逻辑回归问题,给定训练样本:<br>$\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)})……(x^{(m)},y^{(m)})\}$,我们希望得到$\hat{y} \approx y$。逻辑回归的损失函数$L(\hat{y},y)$可以由以下公式定义:</p><p>$L(\hat{y},y) = -ylog(\hat{y}) -(1-y)log(1-\hat{y})$</p><p>对于以上损失函数,有:<br>若$y = 1$,则$L(y,\hat{y}) = -log(\hat{y})$,而想让损失函数$L(y,\hat{y})$尽可能小,意味着$\hat{y}$要尽可能大,又因为$0 \le\hat{y} \le 1$,所以$\hat{y} = 1$时,损失函数最小。</p><p>若$y = 0$,则$L(y,\hat{y}) = -log(1-\hat{y})$,损失函数要取得最小值,意味着$\hat{y}$需取得最小值,则需满足$\hat{y} = 0$。</p><p>以上损失函数只适用于单个样本,对于$m$个样本的损失函数可以有如下定义:</p><p>$J(w,b) = \frac{1}{m}\sum_{i=1}^{m}L(y^{(i)},\hat{y}^{(i)}) = 
\frac{1}{m}\sum_{i=1}^{m}-y^{(i)}log(\hat{y}^{(i)})-(1-y^{(i)})log(1-\hat{y}^{(i)})$</p><h2 id="梯度下降法"><a href="#梯度下降法" class="headerlink" title="梯度下降法"></a>梯度下降法</h2><p>对于以上损失函数,需要找到损失函数$J(w,b)$的最小值,最常用的算法就是梯度下降算法,即对于一个凸函数,总能通过梯度下降算法找到它的全局最优解,对于此损失函数的梯度下降算法,在<a href="https://www.jianshu.com/p/b603be6c9f0d" target="_blank" rel="noopener">机器学习之逻辑回归</a>的算法介绍中已经做了较为详细的推导,在此不再过多叙述,梯度下降算法的简单实现步骤如下所示:<br>$repeat \ \ \{$<br>$w := w - \alpha \frac{\partial J(w,b)}{\partial w}$</p><p>$\}$<br>重复以上过程,直到损失函数收敛,以求得参数$w$的值,其中,$\alpha$代表学习率。</p><h2 id="计算图"><a href="#计算图" class="headerlink" title="计算图"></a>计算图</h2><h3 id="计算图介绍:"><a href="#计算图介绍:" class="headerlink" title="计算图介绍:"></a>计算图介绍:</h3><p>假设有一函数表达式为:$J(a,b,c) = 3(a+bc)$,其计算过程可以简单分为三个步骤,如下所示:</p><ul><li>$u = bc$</li><li>$v = a +u $</li><li>$J =3*v $<br>对于以上三个步骤,用计算图可以有如下表示</li></ul><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/1-3.png" alt></p><h3 id="计算图的导数:"><a href="#计算图的导数:" class="headerlink" title="计算图的导数:"></a>计算图的导数:</h3><p>如上图所示,利用计算图从左向右的流程,一步步可以算出$J$的值,那么,依靠从右向左的反向传播就可以算出$J$对每个变量的导数,如下图所示:</p><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/1-4.png" alt></p><p>其反向传播过程如图中红色箭头所示,根据导数定义以及链式计算法则,有如下计算:</p><p>$\frac{\partial J}{\partial v} = 3$</p><p>$\frac{\partial J}{\partial u} =\frac{\partial J}{\partial v} \frac{\partial v}{\partial u} = 3×1 = 3$</p><p>$\frac{\partial J}{\partial a} =\frac{\partial J}{\partial v} \frac{\partial v}{\partial a} = 3×1 =3$</p><p>$\frac{\partial J}{\partial b} =\frac{\partial J}{\partial v} \frac{\partial v}{\partial u} \frac{\partial u}{\partial b} = 3×1×5 =15$</p><p>$\frac{\partial J}{\partial c} =\frac{\partial J}{\partial v} \frac{\partial v}{\partial u} \frac{\partial u}{\partial c} = 3×1×4 =12$</p><h2 id="逻辑回归中的梯度下降算法"><a href="#逻辑回归中的梯度下降算法" class="headerlink" title="逻辑回归中的梯度下降算法"></a>逻辑回归中的梯度下降算法</h2><h3 id="单个样本的逻辑回归梯度下降算法"><a href="#单个样本的逻辑回归梯度下降算法" class="headerlink" title="单个样本的逻辑回归梯度下降算法"></a>单个样本的逻辑回归梯度下降算法</h3><p>关于逻辑回归的损失函数,有如下公式:<br>$z= w^Tx+b$<br>$\hat{y} = a = \delta(z)$<br>$L(a,y) = -ylog(a)-(1-y)log(1-a)$<br>假设有两个输入特征$x_1,x_2$和两个参数$w_1,w_2$,则用计算图(流程图)表示其计算过程如下所示:</p><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/1-5.png" alt></p><p>依照其计算图中的反向传播过程和链式法则,其导数计算如下所示:</p><p>$\frac{\partial L(a,y)}{\partial a} = -\frac{y}{a}+\frac{1-y}{1-a}$</p><p>$\frac{\partial L(a,y)}{\partial z} = \frac{\partial L}{\partial a} \frac{\partial a}{\partial z} = a-y$</p><p>$\frac{\partial L(a,y)}{\partial w_1} =\frac{\partial L}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial w_1} =x_1dz = x_1*(a-y)$</p><p>$\frac{\partial L(a,y)}{\partial w_2} =\frac{\partial L}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial w_2} = x_2dz = x_2*(a-y)$</p><p>$\frac{\partial L(a,y)}{\partial b} =\frac{\partial L}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial b} = dz = (a-y)$</p><p>最后,参数$w_1,w_2,b$的更新规律为:<br>$w_1 := w_1 - \alpha dw_1$<br>$w_2 := w_2 - \alpha dw_2$<br>$b := b - \alpha db$<br>其中,$\alpha$表示学习率。</p><h3 id="m-个样本的逻辑回归"><a href="#m-个样本的逻辑回归" class="headerlink" title="$m$个样本的逻辑回归"></a>$m$个样本的逻辑回归</h3><p>$m$个样本的损失函数,如下所示:</p><p>$J(w,b) = \frac{1}{m}\sum_{i=1}^{m}L(a^{(i)},y^{(i)}) $</p><p>$a^{(i)} = \hat{y}^{(i)} = \delta(z^{(i)}) = \delta(w^Tx^{(i)} +b)$</p><p>其梯度计算公式,可以有如下表示:<br>$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial }{\partial w}L(a^{(i)},y^{(i)}) $</p><p>在实际计算过程中,需要计算每一个样本的关于$w$的梯度,最后求和取平均,在一个具体算法实现中,其伪代码可以如下所示:</p><p>假设有2个特征向量,$m$个样本,则有:<br>初始化:$J = 0, dw_1 = 0,dw_2 = 0,db = 0$<br>$for \ i = 1 \ to \ \ m:$</p><p>$ \ \ \ \ \ \ z^{(i)} = w^Tx^{(i)} +b;$</p><p>$ \ \ \ \ \ \ a^{(i)} =\delta(z^{(i)});$</p><p>$ \ \ \ \ \ \ J \ += -y^{(i)}log(a^{(i)})-(1-y^{(i)})log(1-a^{(i)});$</p><p>$ \ \ \ \ \ \ dz^{(i)} =a^{(i)} - y^{(i)};$</p><p>$ \ \ \ \ \ \ dw_1 +=x_1*dz^{(i)};$</p><p>$ \ \ \ \ \ \ dw_2 
+=x_2*dz^{(i)};$</p><p>$ \ \ \ \ \ \ db \ +=dz^{(i)};$</p><p>$J/=m,dw_1/=m,dw_2/=m,db/=m;$</p><p>以上,是应用一次梯度下降的过程,应用多次梯度下降算法之后,其参数的更新如下所示:<br>$w_1 := w_1 - \alpha dw_1$<br>$w_2 := w_2 - \alpha dw_2$<br>$b := b - \alpha db$</p><blockquote><p>注意:以上算法实现过程中,有两个特征和参数,分别是$x_1,x_2$和$w_1,w_2$,当有$n$个特征和参数时,可以利用循环完成。</p></blockquote><h2 id="向量化"><a href="#向量化" class="headerlink" title="向量化"></a>向量化</h2><h3 id="向量化的简单示例:"><a href="#向量化的简单示例:" class="headerlink" title="向量化的简单示例:"></a>向量化的简单示例:</h3><p>如以上算法所示,需要通过<code>for</code>循环来遍历$m$个样本和$n$个特征。在整个算法运行过程中,需要考虑运行时间的问题:当样本数量和特征足够大时,采用传统的<code>for</code>循环不是一个明智的选择,为了减少算法运行时间,特地引入了向量化的实现。<br>将以下代码作为示例:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="keyword">import</span> random</span><br><span class="line"><span class="keyword">import</span> time</span><br><span class="line">a = np.random.rand(<span class="number">1000000</span>)</span><br><span class="line">b = np.random.rand(<span class="number">1000000</span>)</span><br><span class="line">ts = time.time()</span><br><span class="line">c = np.dot(a,b)</span><br><span class="line">te = time.time()</span><br><span class="line">print(c)</span><br><span class="line">print(<span class="string">"向量化的代码实现花费时间:"</span>+str((te-ts)*<span class="number">1000</span>)+<span class="string">" ms"</span>)</span><br><span class="line"></span><br><span class="line">c = <span class="number">0</span></span><br><span class="line">ts = time.time()</span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> range(<span class="number">1000000</span>):</span><br><span class="line"> c += a[i]*b[i]</span><br><span class="line">te = time.time()</span><br><span class="line">print(c)</span><br><span class="line"></span><br><span class="line">print(<span class="string">"for循环代码实现花费时间:"</span>+str((te-ts)*<span class="number">1000</span>)+<span class="string">" ms"</span>)</span><br></pre></td></tr></table></figure><p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/1-6.png" alt></p><p>如上所示,同样实现两个数组(向量)相乘的过程,对于百万级别的数据,<code>for</code>循环的实现方式所花费的时间差不多是向量化的400倍左右。向量化的实现可以简单地理解为一个并行的过程,而<code>for</code>循环可以简单理解为串行的过程,所以通过向量化的实现,大大节省了运行程序所耗费的时间。在算法实现过程中,应该尽量避免使用<code>for</code>循环。</p><h3 id="用向量化实现逻辑回归:"><a href="#用向量化实现逻辑回归:" class="headerlink" title="用向量化实现逻辑回归:"></a>用向量化实现逻辑回归:</h3><p>对于逻辑回归的算法,需要考虑输入向量$X$和权重参数$W$,其中,$X \in R^{n×m}$,$W \in R^{n×1}$,而根据矩阵乘法运算法则和逻辑回归的实现原理,有:<br>$[z_1,z_2,……z_m] = [W^TX]+[b_1,b_2,……b_m]$</p><p>在<code>python</code>中用<code>numpy</code>库,可以简单地用以下一行代码实现(一般认为$b$是一个$R^{1×1}$的偏置常量):</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">z = np.dot(W.T,x)+b</span><br></pre></td></tr></table></figure><p>根据之前的学习,对于逻辑回归利用反向传播算法计算导数,有:</p><blockquote><p>$ \ \ \ \ \ \ dz^{(i)} =a^{(i)} - y^{(i)};$<br>$ \ \ \ \ \ \ dw_1 +=x_1*dz^{(i)};$<br>$ \ \ \ \ \ \ dw_2 +=x_2*dz^{(i)};$<br>$\ \ \ \ \ \ \ …$<br>$ \ \ \ \ \ \ dw_n +=x_n*dz^{(i)};$<br>$ \ \ \ \ \ \ db \ +=dz^{(i)};$<br>$J/=m,dw_1/=m,dw_2/=m,db/=m;$</p></blockquote><p>对于以上公式,有如下定义:<br>$dZ = 
[dz^{(1)},dz^{(2)}……dz^{(m)}]$<br>$A = [a^{(1)},a^{(2)},……a^{(m)}]$<br>$Y = [y^{(1)},y^{(2)}……y^{(m)}]$<br>$dZ = A - Y$<br>对于以上过程,摈弃传统的<code>for</code>循环实现,采用向量化的实现方式可以简单表示为:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">dw = <span class="number">1</span>/m*np.dot(X,dZ.T)</span><br><span class="line">db = <span class="number">1</span>/m*np.sum(dZ)</span><br></pre></td></tr></table></figure><p>综合以上所有向量化的实现,可以得到利用<code>python</code>实现的一个高度向量化的逻辑回归梯度下降算法(a代表学习率):</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">Z = np.dot(W.T,X)+b</span><br><span class="line">A = <span class="number">1</span>/(<span class="number">1</span>+np.exp(-Z))</span><br><span class="line">dZ = A-Y</span><br><span class="line">dw = <span class="number">1</span>/m*np.dot(X,dZ.T)</span><br><span class="line">db = <span class="number">1</span>/m*np.sum(dZ)</span><br><span class="line">w = w-a*dw</span><br><span class="line">b = b-a*db</span><br></pre></td></tr></table></figure><p>以上,只是实现一次梯度下降的伪代码,在实际算法运行过程中,我们仍然需要利用循环实现多次梯度下降。</p>]]></content>
<summary type="html">
<p> 假设有以下一张图片,要判断其输出是否为猫,若是,则可以用$y = 1$表示,否则用$y = 0$表示。</p>
<p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/deepLearning_course1/1-1.png" alt></p>
</summary>
<category term="深度学习" scheme="https://elmadavies.github.io/categories/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/"/>
<category term="深度学习" scheme="https://elmadavies.github.io/tags/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0/"/>
<category term="神经网络" scheme="https://elmadavies.github.io/tags/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/"/>
</entry>
<entry>
<title>新的开始</title>
<link href="https://elmadavies.github.io/%E9%9A%8F%E7%AC%94/%E6%96%B0%E7%9A%84%E5%BC%80%E5%A7%8B/"/>
<id>https://elmadavies.github.io/随笔/新的开始/</id>
<published>2019-10-07T12:59:24.661Z</published>
<updated>2019-10-07T14:25:12.147Z</updated>
<content type="html"><![CDATA[<p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/blog_background/bg1.jpg" alt></p><a id="more"></a><h4 id="一些说明"><a href="#一些说明" class="headerlink" title="一些说明"></a>一些说明</h4><p>花了一个国庆的时间,终于将这个博客搭建得差不多了,尽管还有很多不足,但是目前基本够用,也许在使用过程中还会出现很多bug,只能边用边解决了。<br><br><br>搭建一个自己的博客是我一直想要做的事,但是之前由于被各种繁杂的事务耽搁,导致一直没有时间动手去做,趁着这个国庆假期终于做好了。整个博客是用<code>hexo+github</code>搭建完成的,倒也没什么难度,就是需要一些耐心,当然,如果有一些前端方面的知识,搭建起来会更加顺利,可定制化也会更高。<br><br><br>以后,会慢慢地将自己简书上的文章迁移过来,以后也会在简书平台和这个博客网站共同记录自己的学习笔记,除此之外,博客平台除了技术也会记录一些自己的生活经历等。<br><br><br>在搭建博客的过程中,看到好多大佬的博客非常漂亮,也受到了好多大佬所写文章的指点,在此表示非常感谢。为此,附上几个教学链接,有需要搭建此类博客的同学,也可以了解一下。</p><h4 id="博客搭建与优化教程"><a href="#博客搭建与优化教程" class="headerlink" title="博客搭建与优化教程"></a>博客搭建与优化教程</h4><ul><li><p><a href="http://yearito.cn/categories/%E6%8A%80%E6%9C%AF/%E5%8D%9A%E5%AE%A2/" target="_blank" rel="noopener">搭建教程1</a></p></li><li><p><a href="https://bestzuo.cn/categories/%E5%8D%9A%E5%AE%A2/" target="_blank" rel="noopener">搭建教程2</a></p></li><li><p><a href="https://bestzuo.cn/categories/%E5%8D%9A%E5%AE%A2/" target="_blank" rel="noopener">搭建教程3</a></p></li></ul>]]></content>
<summary type="html">
<p><img src="https://elamdavies-1300381401.cos.ap-chengdu.myqcloud.com/blog_background/bg1.jpg" alt></p>
</summary>
<category term="随笔" scheme="https://elmadavies.github.io/categories/%E9%9A%8F%E7%AC%94/"/>
<category term="Start" scheme="https://elmadavies.github.io/tags/Start/"/>
</entry>
</feed>