Kernel panic in dmu_write() on i686 #1284

Closed
amospalla opened this issue Feb 10, 2013 · 10 comments
Labels
Type: Architecture Indicates an issue is specific to a single processor architecture
Comments

@amospalla

Screenshot: 20130209_007

Rebooted the machine with SysRq-b.

@amospalla
Author

Crash happened while browsing.

@behlendorf
Contributor

Thanks, we'll want to dig into this when we have the time. Was this a one-time event?

@amospalla
Author

I moved my data to ZFS, and after a couple of days this happened. After that I moved back to ext4.

@behlendorf
Contributor

OK, we'll leave the issue open for reference in case someone else hits something similar. This, however, is the first report I've heard of something like this.

@baryluk

baryluk commented Feb 26, 2013

Looks similar:

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.8.0-rc7-t43-prod-dirty (baryluk@sredniczarny) (gcc version 4.7.2 (Debian 4.7.2-5) ) #34 SMP Sun Feb 10 01:07:48 CET 2013
...
...
[  474.080157] BUG: unable to handle kernel paging request at ffca1000
[  474.081611] IP: [<cc67ef71>] dmu_write+0x1a1/0x260 [zfs]
[  474.083058] *pdpt = 0000000001b00001 *pde = 000000000a50e067 *pte = 0000000000000000 
[  474.084009] Oops: 0000 [#1] SMP 
[  474.084009] Modules linked in: pktcdvd cdrom ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM dummy ppdev decnet lp bnep rfcomm bluetooth libipw lib80211 uinput nfsd zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) hdaps pcmcia acpi_cpufreq mperf yenta_socket pcmcia_rsrc i2c_i801 pcmcia_core radeon gpio_ich video parport_pc drm_kms_helper floppy parport ttm drm cfbfillrect cfbimgblt cfbcopyarea intel_agp i2c_algo_bit intel_gtt agpgart xhci_hcd [last unloaded: ipw2200]
[  474.084009] Pid: 9035, comm: flush-zfs-27 Tainted: P           O 3.8.0-rc7-t43-prod-dirty #34 IBM 2669UYD/2669UYD
[  474.084009] EIP: 0060:[<cc67ef71>] EFLAGS: 00010202 CPU: 0
[  474.084009] EIP is at dmu_write+0x1a1/0x260 [zfs]
[  474.084009] EAX: 00000000 EBX: c30face8 ECX: 0000082a EDX: d0113800
[  474.084009] ESI: ffca1000 EDI: d0114800 EBP: c56a3c70 ESP: c56a3c28
[  474.084009]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  474.084009] CR0: 8005003b CR2: ffca1000 CR3: 073b7000 CR4: 000007f0
[  474.084009] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  474.084009] DR6: ffff0ff0 DR7: 00000400
[  474.084009] Process flush-zfs-27 (pid: 9035, ti=c56a2000 task=c576f480 task.ti=c56a2000)
[  474.084009] Stack:
[  474.084009]  00000000 00000000 000030a8 00000000 00000000 cc7224d6 c56a3c60 c56a3c5c
[  474.084009]  00000000 00000000 000030a8 00000000 000030a8 c7352650 00000001 00000010
[  474.084009]  c2d7c3c0 c5776000 c56a3d30 cc709e90 00000000 00000000 000030a8 00000000
[  474.084009] Call Trace:
[  474.084009]  [<cc709e90>] zfs_putpage+0x330/0x4e0 [zfs]
[  474.084009]  [<c1103a6c>] ? find_get_pages_tag+0xbc/0x160
[  474.084009]  [<cc71c13c>] zpl_putpage+0x2c/0x40 [zfs]
[  474.084009]  [<cc71c110>] ? zpl_readpage+0x60/0x60 [zfs]
[  474.084009]  [<c110bb61>] write_cache_pages+0x1b1/0x3c0
[  474.084009]  [<cc71c110>] ? zpl_readpage+0x60/0x60 [zfs]
[  474.084009]  [<c1089196>] ? dequeue_entity+0x116/0x580
[  474.084009]  [<cc71c0a8>] zpl_writepages+0x18/0x20 [zfs]
[  474.084009]  [<c110d1ba>] do_writepages+0x1a/0x40
[  474.084009]  [<c117ef4f>] __writeback_single_inode+0x2f/0x140
[  474.084009]  [<c1072f13>] ? wake_up_bit+0x23/0x30
[  474.084009]  [<c117fb62>] writeback_sb_inodes+0x162/0x330
[  474.084009]  [<c117fdac>] __writeback_inodes_wb+0x7c/0xb0
[  474.084009]  [<c117ffea>] wb_writeback+0x20a/0x290
[  474.084009]  [<c110c1ba>] ? global_dirty_limits+0x2a/0x110
[  474.084009]  [<c117da63>] ? over_bground_thresh+0x23/0xa0
[  474.084009]  [<c1181234>] wb_do_writeback+0x1f4/0x200
[  474.084009]  [<c11812b1>] bdi_writeback_thread+0x71/0x200
[  474.084009]  [<c1181240>] ? wb_do_writeback+0x200/0x200
[  474.084009]  [<c1072824>] kthread+0x94/0xa0
[  474.084009]  [<c1010000>] ? perf_trace_xen_mc_flush_reason+0x30/0xc0
[  474.084009]  [<c176d977>] ret_from_kernel_thread+0x1b/0x28
[  474.084009]  [<c1072790>] ? kthread_create_on_node+0xc0/0xc0
[  474.084009] Code: e8 e9 16 ff ff ff 8d 74 26 00 f6 c2 01 75 63 f7 c7 02 00 00 00 75 73 f7 c7 04 00 00 00 0f 85 87 00 00 00 89 c1 83 e0 03 c1 e9 02 <f3> a5 e9 2c ff ff ff 90 8d b4 26 00 00 00 00 e8 7b 95 ff ff 8b
[  474.084009] EIP: [<cc67ef71>] dmu_write+0x1a1/0x260 [zfs] SS:ESP 0068:c56a3c28
[  474.084009] CR2: 00000000ffca1000
[  474.084009] ---[ end trace a86b28d53f179a9f ]---

I was compiling a kernel. The system was still working and operational on other file systems. Running sync blocked, and unmounting or killing the blocked processes was impossible. After a while the system froze, but I was still able to perform SysRq-b (-s/-u/-i/-e were not operational, I think).
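
For anyone reading the oops above, here is a minimal Python sketch of how its key fields can be decoded. It assumes the standard x86 page-fault error-code layout (bit 0 = protection violation vs. page not present, bit 1 = write vs. read, bit 2 = user vs. kernel mode). The helper names `decode_oops_code` and `parse_ip` are made up for illustration and are not part of any kernel tooling. The input values are copied from the log.

```python
import re

# Low three bits of the x86 page-fault error code printed after "Oops:".
# Each entry: (bit mask, meaning when set, meaning when clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]


def decode_oops_code(code):
    """Describe the low three bits of a page-fault error code."""
    return [set_text if code & mask else clear_text
            for mask, set_text, clear_text in PF_BITS]


def parse_ip(line):
    """Split 'symbol+0xOFFSET/0xSIZE [module]' into its parts."""
    m = re.search(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?", line)
    symbol, offset, size, module = m.groups()
    return symbol, int(offset, 16), int(size, 16), module


# "Oops: 0000" from the log.
print(decode_oops_code(int("0000", 16)))

# "IP:" line from the log: offset 0x1a1 into a 0x260-byte function.
print(parse_ip("IP: [<cc67ef71>] dmu_write+0x1a1/0x260 [zfs]"))

# CR2 (the faulting address) and ESI, both copied from the register dump.
cr2, esi = 0xffca1000, 0xffca1000
print(cr2 == esi)
```

Error code 0000 decodes to a kernel-mode read of a page that is not present. The faulting address in CR2 equals ESI, and the marked bytes `<f3> a5` in the `Code:` line are a `rep movs` copy that reads through ESI. This suggests the fault hit the source buffer of a copy inside dmu_write(), but it does not by itself say why that page was unmapped.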

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.8.0-rc7-t43-prod-dirty root=/dev/mapper/sredniczarny-root ro vmalloc=840M resume=/dev/mapper/sredniczarny-swap_1 thinkpad_acpi.fan_control=1
$ cat /sys/module/zfs/parameters/zfs_arc_max
536870912

The machine has 2 GB of RAM.
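
To put the reported values in the same units, here is a small arithmetic sketch. It assumes "2 GB" means 2 GiB and that the `M` suffix in `vmalloc=840M` means MiB. The variable names are illustrative only.

```python
MIB = 1024 * 1024

zfs_arc_max = 536870912   # bytes, from /sys/module/zfs/parameters/zfs_arc_max
vmalloc_mib = 840         # from vmalloc=840M on the kernel command line
ram_mib = 2 * 1024        # "2 GB of RAM", assumed to be 2 GiB

arc_max_mib = zfs_arc_max // MIB
print(arc_max_mib)               # 512
print(arc_max_mib / ram_mib)     # 0.25
print(vmalloc_mib / ram_mib)     # vmalloc reservation relative to RAM size
```

So the ARC limit is 512 MiB, a quarter of the machine's RAM, and the vmalloc reservation of 840 MiB is larger than that limit.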

I can repeat this error quite reliably.

@baryluk

baryluk commented Feb 26, 2013

I will build a debug kernel (not sure how, since I have the sources on ZFS itself... will need to clone/download again) and see if it shows any more interesting information.

@ryao
Contributor

ryao commented Mar 3, 2013

This looks like an issue in the current mmap() code. I am doing a rewrite that will replace all of the code involved in this backtrace. It should fix this when it is done.

@ryao
Contributor

ryao commented Mar 3, 2013

@baryluk You might be able to catch the cause of this issue if you rebuild the spl and zfs with --enable-debug and you encounter a build failure.

@ryao
Contributor

ryao commented Apr 1, 2013

@baryluk Your issue could be related to #1342.

@behlendorf
Contributor

This was almost certainly addressed by the various mmap improvements and bug fixes over the last 18 months. Since there are no recent similar reports I'm closing this as stale.
