-
Notifications
You must be signed in to change notification settings - Fork 0
/
kdump.html
245 lines (222 loc) · 18.9 KB
/
kdump.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<title>如何解决一个内核崩溃问题 — Feng's blog 1.0 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/custom.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="如何写一个 kprobe 模块" href="kprobe.html" />
<link rel="prev" title="pcap 库怎么获取抓到包的网卡信息" href="pcap-sll2.html" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
<meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9" />
</head><body>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<section id="ru-he-jie-jue-yi-ge-nei-he-beng-kui-wen-ti">
<h1>如何解决一个内核崩溃问题<a class="headerlink" href="#ru-he-jie-jue-yi-ge-nei-he-beng-kui-wen-ti" title="Permalink to this headline">¶</a></h1>
<section id="tong-guo-kdump-huo-qu-nei-he-beng-kui-xian-chang">
<h2>通过 kdump 获取内核崩溃现场<a class="headerlink" href="#tong-guo-kdump-huo-qu-nei-he-beng-kui-xian-chang" title="Permalink to this headline">¶</a></h2>
<p>内核崩溃的时候如果不能通过管理控制台获取最后的现场信息,可以通过 kdump 工具来转储崩溃现场。</p>
<blockquote>
<div><p>kdump is a feature of the Linux kernel that creates crash dumps in
the event of a kernel crash. When triggered, kdump exports a memory
image (also known as vmcore) that can be analyzed for the purposes of
debugging and determining the cause of a crash.</p>
<p>–- <a class="reference external" href="https://en.wikipedia.org/wiki/Kdump_(Linux">https://en.wikipedia.org/wiki/Kdump_(Linux</a>)</p>
</div></blockquote>
<p>CentOS 系统一般预装了 kdump ,没有的话可以通过以下命令安装。</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>yum install kexec-tools
</pre></div>
</div>
<p>修改内核启动参数添加 kdump 配置。</p>
<div class="highlight-ini notranslate"><div class="highlight"><pre><span></span><span class="c1"># 修改 /etc/default/grub 中的内核启动参数,添加下面 crashkernel 参数</span>
<span class="na">GRUB_CMDLINE_LINUX</span><span class="o">=</span><span class="s">"... crashkernel=auto"</span>
</pre></div>
</div>
<p>重新生成 grub2 配置:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
</pre></div>
</div>
<p>重启确认配置是否成功(成功后内核参数会多出上面新加的 crashkernel 参数)。</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp"># </span>cat /proc/cmdline
<span class="go">BOOT_IMAGE=/boot/vmlinuz-4.19.113-88.8bs.el7.x86_64 ... crashkernel=auto ...</span>
</pre></div>
</div>
<p>确认 kdump 服务是否正常</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp"># </span>systemctl status kdump
<span class="go">● kdump.service - Crash recovery kernel arming</span>
<span class="go">Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)</span>
<span class="go">Active: active (exited) since Tue 2024-05-14 10:36:56 CST; 1h 15min ago</span>
<span class="go">...</span>
</pre></div>
</div>
<p>有些时候 <code class="docutils literal notranslate"><span class="pre">crashkernel=auto</span></code> 参数会失败,这个时候可以尝试将 <code class="docutils literal notranslate"><span class="pre">auto</span></code> 改成具体的内存大小。</p>
</section>
<section id="yi-ge-zhen-shi-de-beng-kui-xian-chang">
<h2>一个真实的崩溃现场<a class="headerlink" href="#yi-ge-zhen-shi-de-beng-kui-xian-chang" title="Permalink to this headline">¶</a></h2>
<p>启用了 kdump 之后,每次内核崩溃,kdump 会在 <code class="docutils literal notranslate"><span class="pre">/var/crash</span></code> 下创建一个新的目录用来保存崩溃的现场信息,每个现场一般两个文件,一个 core 文件 <code class="docutils literal notranslate"><span class="pre">vmcore</span></code> ,一个崩溃时的 dmesg 文件。</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp"># </span>ll /var/crash/127.0.0.1-2023-12-20-10<span class="se">\:</span><span class="m">51</span><span class="se">\:</span><span class="m">07</span>/
<span class="go">total 160112</span>
<span class="go">-rw------- 1 root root 81906871 Dec 20 10:51 vmcore</span>
<span class="go">-rw-r--r-- 1 root root 137943 Dec 20 10:51 vmcore-dmesg.txt</span>
</pre></div>
</div>
<p>首先查看 dmesg 文件,一般崩溃通过 <code class="docutils literal notranslate"><span class="pre">dmesg</span></code> 就可以定位。</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>cat /var/crash/127.0.0.1-2023-12-20-10<span class="se">\:</span><span class="m">51</span><span class="se">\:</span><span class="m">07</span>/vmcore-dmesg.txt
<span class="go">...</span>
<span class="go">[ 1823.251269] RIP: 0010:iptunnel_xmit+0x17d/0x1c0</span>
<span class="go">[ 1823.253371] Code: ea 4c 89 e7 e8 d4 12 fb ff 48 85 ed 74 25 83 e0 fd 75 31 83 fb 00 7e 2a 48 8b 85 f0 04 00 00 65 48 03 05 ae 63 77 46 48 63 db <48> 83 40 10 01 48 01 58 18 48 83 c4 20 5b 5d 41 5c 41 5d 41 5e 41</span>
<span class="go">...</span>
<span class="go">[ 1823.280234] Call Trace:</span>
<span class="go">[ 1823.281826] <IRQ></span>
<span class="go">[ 1823.283814] send4+0xf2/0x1b0 [XxxWan]</span>
<span class="go">[ 1823.285593] hfunc_out6+0x2f7/0x510 [XxxWan]</span>
<span class="go">[ 1823.289267] ? ipt_do_table+0x351/0x680</span>
<span class="go">[ 1823.291741] nf_hook_slow+0x3d/0xb0</span>
<span class="go">[ 1823.293875] ip6_xmit+0x332/0x5e0</span>
<span class="go">[ 1823.295120] ? neigh_key_eq128+0x30/0x30</span>
<span class="go">[ 1823.296623] ? inet6_csk_route_socket+0x158/0x250</span>
<span class="go">[ 1823.298286] ? __kmalloc_node_track_caller+0x5d/0x280</span>
<span class="go">[ 1823.299848] inet6_csk_xmit+0x91/0xe0</span>
<span class="go">[ 1823.301455] __tcp_transmit_skb+0x536/0x9f0</span>
<span class="go">...</span>
<span class="go">[ 1823.367165] </IRQ></span>
<span class="go">...</span>
</pre></div>
</div>
<p>能从 dmesg 中获取到的有用信息:</p>
<ol class="arabic simple">
<li><p><code class="docutils literal notranslate"><span class="pre">RIP</span></code> 是程序指令指针(Instruction Pointer Relative),指向下一条即将被执行的指令,在上面 dmesg 中,RIP 指向 <code class="docutils literal notranslate"><span class="pre">iptunnel_xmit</span></code> 函数的 0x17d 偏移处。</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">Code</span></code> 行是内核崩溃时正在执行前后的指令 dump,其中 <code class="docutils literal notranslate"><span class="pre"><48></span></code> 为当前 RIP 指向的指令处。</p></li>
<li><p>最后 <code class="docutils literal notranslate"><span class="pre">Call</span> <span class="pre">Trace</span></code> 后面是内核崩溃时的函数调用栈,从调用栈可以看出内核是崩溃在了 <code class="docutils literal notranslate"><span class="pre">XxxWan</span></code> 模块的 <code class="docutils literal notranslate"><span class="pre">send4</span></code> 函数中,调用栈缺少 <code class="docutils literal notranslate"><span class="pre">send4</span></code> 函数到 <code class="docutils literal notranslate"><span class="pre">iptunnel_xmit</span></code> 的调用过程,并不完全。</p></li>
</ol>
</section>
<section id="fan-bian-yi-vmlinux-huo-qu-beng-kui-de-dai-ma-xing">
<h2>反编译 vmlinux 获取崩溃的代码行<a class="headerlink" href="#fan-bian-yi-vmlinux-huo-qu-beng-kui-de-dai-ma-xing" title="Permalink to this headline">¶</a></h2>
<p>系统日常启动加载的内核镜像(也就是内核的可执行文件)位于 <code class="docutils literal notranslate"><span class="pre">/boot/</span></code> 目录下,比如 <code class="docutils literal notranslate"><span class="pre">/boot/vmlinuz-4.19.113-88.8bs.el7.x86_64</span></code> ,这个内核镜像是去除了调试信息并压缩了的。各种内核调试工具需要用到是未压缩的原始
vmlinux 文件,这个文件一般在内核的 <code class="docutils literal notranslate"><span class="pre">debuginfo</span></code> 包中,安装。</p>
<p>对比可以看出未压缩的 vmlinux 镜像要比压缩后的 vmlinuz 镜像大两个数量级。</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp"># </span>ll -h /boot/vmlinuz-4.19.113-88.8bs.el7.x86_64
<span class="go">-rwxr-xr-x 1 root root 8.4M Nov 4 2022 /boot/vmlinuz-4.19.113-88.8bs.el7.x86_64</span>
<span class="gp"># </span>ll -h /usr/lib/debug/lib/modules/<span class="k">$(</span>uname -r<span class="k">)</span>/vmlinux
<span class="go">-rw-r--r-- 1 root root 495M Dec 22 14:59 /usr/lib/debug/lib/modules/4.19.113-88.8bs.el7.x86_64/vmlinux</span>
</pre></div>
</div>
<p>有了 vmlinux,就可以通过 <code class="docutils literal notranslate"><span class="pre">objdump</span> <span class="pre">-d</span></code> 反编译内核,根据前面 dmesg 信息,内核崩溃点在 <code class="docutils literal notranslate"><span class="pre">iptunnel_xmit</span></code> 函数中,所以先获取 <code class="docutils literal notranslate"><span class="pre">iptunnel_xmit</span></code> 的起始地址( <code class="docutils literal notranslate"><span class="pre">objdump</span></code>
不支持直接反编译某一个函数),然后从这个起始地址开始反编译内核并打印出汇编代码对应的 c 代码行号。</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>objdump -d /usr/lib/debug/lib/modules/<span class="k">$(</span>uname -r<span class="k">)</span>/vmlinux <span class="p">|</span> grep iptunnel_xmit
<span class="go">ffffffff81898c00 <iptunnel_xmit>:</span>
<span class="go">...</span>
<span class="gp">$ </span>objdump -d --start-address<span class="o">=</span>0xffffffff81898c00 --line-numbers /usr/lib/debug/lib/modules/<span class="k">$(</span>uname -r<span class="k">)</span>/vmlinux
<span class="go">/usr/lib/debug/lib/modules/4.19.113-88.8bs.el7.x86_64/vmlinux: file format elf64-x86-64</span>
<span class="go">Disassembly of section .text:</span>
<span class="go">...</span>
<span class="go">ffffffff81898c00 <iptunnel_xmit>:</span>
<span class="go">iptunnel_xmit():</span>
<span class="go">...</span>
<span class="go">/usr/src/debug/kernel-alt-4.19.113/linux-4.19.113-88.8bs.el7.x86_64/include/net/ip_tunnels.h:458</span>
<span class="go">ffffffff81898d6b: 48 8b 85 f0 04 00 00 mov 0x4f0(%rbp),%rax</span>
<span class="go">ffffffff81898d72: 65 48 03 05 ae 63 77 add %gs:0x7e7763ae(%rip),%rax # f128 <this_cpu_off></span>
<span class="go">ffffffff81898d79: 7e</span>
<span class="go">/usr/src/debug/kernel-alt-4.19.113/linux-4.19.113-88.8bs.el7.x86_64/include/net/ip_tunnels.h:461</span>
<span class="go">ffffffff81898d7a: 48 63 db movslq %ebx,%rbx</span>
<span class="go">/usr/src/debug/kernel-alt-4.19.113/linux-4.19.113-88.8bs.el7.x86_64/include/net/ip_tunnels.h:462</span>
<span class="go">ffffffff81898d7d: 48 83 40 10 01 addq $0x1,0x10(%rax)</span>
<span class="go">👆 RIP</span>
<span class="go">...</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">iptunnel_xmit</span></code> 函数 0x17d 偏移处为地址 <code class="docutils literal notranslate"><span class="pre">ffffffff81898d7d</span></code> 。</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="nb">hex</span><span class="p">(</span><span class="mh">0xffffffff81898c00</span><span class="o">+</span><span class="mh">0x17d</span><span class="p">)</span>
<span class="go">0xffffffff81898d7dL</span>
</pre></div>
</div>
<p>结合反编译内核得到的信息,可以获取到正在执行的代码位于 <code class="docutils literal notranslate"><span class="pre">ip_tunnels.h</span></code> 文件的 461 行。对应的 c 代码如下:</p>
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="c1">// ip_tunnels.h</span>
<span class="mi">455</span><span class="w"> </span><span class="k">static</span><span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="n">iptunnel_xmit_stats</span><span class="p">(</span><span class="k">struct</span> <span class="nc">net_device</span><span class="w"> </span><span class="o">*</span><span class="n">dev</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">pkt_len</span><span class="p">)</span><span class="w"></span>
<span class="mi">456</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="mi">457</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pkt_len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="mi">458</span><span class="w"> </span><span class="k">struct</span> <span class="nc">pcpu_sw_netstats</span><span class="w"> </span><span class="o">*</span><span class="n">tstats</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">get_cpu_ptr</span><span class="p">(</span><span class="n">dev</span><span class="o">-></span><span class="n">tstats</span><span class="p">);</span><span class="w"></span>
<span class="mi">459</span><span class="w"></span>
<span class="mi">460</span><span class="w"> </span><span class="n">u64_stats_update_begin</span><span class="p">(</span><span class="o">&</span><span class="n">tstats</span><span class="o">-></span><span class="n">syncp</span><span class="p">);</span><span class="w"></span>
<span class="mi">461</span><span class="w"> </span><span class="n">tstats</span><span class="o">-></span><span class="n">tx_bytes</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">pkt_len</span><span class="p">;</span><span class="w"></span>
<span class="p">...</span><span class="w"></span>
<span class="mi">475</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
</pre></div>
</div>
<p>自此分析结束,后面就是结合内核以及 <code class="docutils literal notranslate"><span class="pre">XxxWan</span></code> 的代码来看为什么这个地方会崩溃。</p>
<p>如果问题更复杂,还可以使用 <code class="docutils literal notranslate"><span class="pre">crash</span></code> 工具(类似 gdb)来从 core 文件中获取更多现场信息。</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>crash /usr/lib/debug/lib/modules/<span class="k">$(</span>uname -r<span class="k">)</span>/vmlinux /var/crash/.../vmcore
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">crash</span></code> 比较耗内存,在本次崩溃的小虚拟机上跑不起来报 <code class="docutils literal notranslate"><span class="pre">crash:</span> <span class="pre">cannot</span> <span class="pre">allocate</span> <span class="pre">any</span> <span class="pre">more</span> <span class="pre">memory!</span></code> 错误,以后有机会再说,详细可以参见: <a class="reference external" href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/kernel_administration_guide/kernel_crash_dump_guide#sect-crash-running-the-utility">https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/kernel_administration_guide/kernel_crash_dump_guide#sect-crash-running-the-utility</a> 。</p>
</section>
</section>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">如何解决一个内核崩溃问题</a><ul>
<li><a class="reference internal" href="#tong-guo-kdump-huo-qu-nei-he-beng-kui-xian-chang">通过 kdump 获取内核崩溃现场</a></li>
<li><a class="reference internal" href="#yi-ge-zhen-shi-de-beng-kui-xian-chang">一个真实的崩溃现场</a></li>
<li><a class="reference internal" href="#fan-bian-yi-vmlinux-huo-qu-beng-kui-de-dai-ma-xing">反编译 vmlinux 获取崩溃的代码行</a></li>
</ul>
</li>
</ul>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/kdump.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="https://cn.bing.com/search" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>
$('#searchbox').show(0);
document.getElementsByClassName('search')[0].addEventListener('submit', function(event) {
event.preventDefault();
var form = event.target;
var input = form.querySelector('input[name="q"]');
var value = input.value;
var q = 'ensearch=1&q=site%3Achanfung032.github.io++' + value;
var url = form.action + '?' + q;
window.location.href = url;
});
</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
©2017, chanfung032.
|
Powered by <a href="http://sphinx-doc.org/">Sphinx 4.1.2</a>
& <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.12</a>
|
<a href="_sources/kdump.rst.txt"
rel="nofollow">Page source</a>
</div>
</body>
</html>