PANIC at spl-kmem.c:388:spl_kmem_free_track() VERIFY3(dptr != ((void *)0)) failed ( (null) != (null)) #4967

Closed
JuliaVixen opened this issue Aug 14, 2016 · 11 comments

@JuliaVixen

Well, I ran this command on an idle system...

zfs send -eLv -i more_disks@Aug_12_2016 more_disks@Aug_14_2016 | zfs recv -evsF U

And then this happened:

[93344.134345] VERIFY3(dptr != ((void *)0)) failed (          (null) !=           (null))
[93344.134472] PANIC at spl-kmem.c:388:spl_kmem_free_track()
[93344.134586] Showing stack for process 15374
[93344.134588] CPU: 2 PID: 15374 Comm: zfs Tainted: P           O    4.4.6-gentoo-debug2 #1
[93344.134589] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013
[93344.134590]  0000000000000000 ffff88043eea7888 ffffffff8130fa9d ffffffffa0d16b6a
[93344.134591]  ffffffffa0d16b4b ffff88043eea7898 ffffffffa0d10f13 ffff88043eea7a28
[93344.134593]  ffffffffa0d110cb 0000000000000000 0000000100000030 ffff88043eea7a38
[93344.134594] Call Trace:
[93344.134598]  [<ffffffff8130fa9d>] dump_stack+0x4d/0x63
[93344.134603]  [<ffffffffa0d10f13>] spl_dumpstack+0x3d/0x3f [spl]
[93344.134605]  [<ffffffffa0d110cb>] spl_panic+0xa9/0xdc [spl]
[93344.134608]  [<ffffffffa0d10593>] ? vn_rdwr+0x159/0x1b3 [spl]
[93344.134611]  [<ffffffff814d0a10>] ? mutex_unlock+0x9/0xb
[93344.134628]  [<ffffffffa0dddff9>] ? dump_bytes_cb+0x10f/0x123 [zfs]
[93344.134630]  [<ffffffffa0d0a412>] spl_kmem_free+0x44/0xbf [spl]
[93344.134633]  [<ffffffffa0d3b6ec>] fnvlist_pack_free+0x9/0xb [znvpair]
[93344.134644]  [<ffffffffa0ddf884>] dmu_send_impl+0x529/0x120a [zfs]
[93344.134645]  [<ffffffff814d0a10>] ? mutex_unlock+0x9/0xb
[93344.134657]  [<ffffffffa0de06fc>] dmu_send_obj+0x197/0x1bc [zfs]
[93344.134669]  [<ffffffffa0e5aa52>] zfs_ioc_send+0x1f7/0x236 [zfs]
[93344.134682]  [<ffffffffa0e60a63>] zfsdev_ioctl+0x40e/0x521 [zfs]
[93344.134684]  [<ffffffff81059337>] ? check_preempt_curr+0x3e/0x6c
[93344.134686]  [<ffffffff810fc891>] do_vfs_ioctl+0x3f5/0x43d
[93344.134688]  [<ffffffff8103fd2e>] ? _do_fork+0x229/0x24e
[93344.134689]  [<ffffffff810368d2>] ? __do_page_fault+0x24e/0x367
[93344.134691]  [<ffffffff810fc912>] SyS_ioctl+0x39/0x61
[93344.134692]  [<ffffffff814d2317>] entry_SYSCALL_64_fastpath+0x12/0x6a

I'm using the current GIT version as of about two days ago...

Aug 13 07:56:47 localhost kernel: SPL: Loaded module v0.6.5-1 (DEBUG mode)
Aug 13 08:02:50 localhost kernel: zavl: module license 'CDDL' taints kernel.
Aug 13 08:02:50 localhost kernel: Disabling lock debugging due to kernel taint
Aug 13 08:02:52 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5

localhost ~ # modinfo /lib/modules/4.4.6-gentoo-debug2/extra/zfs/zfs.ko | head -7
filename:       /lib/modules/4.4.6-gentoo-debug2/extra/zfs/zfs.ko
version:        0.6.5-1
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
srcversion:     1B0E25441FFC82D8549AB1B
depends:        spl,znvpair,zunicode,zcommon,zavl
localhost ~ # modinfo /lib/modules/4.4.6-gentoo-debug2/extra/spl/spl.ko | head -7
filename:       /lib/modules/4.4.6-gentoo-debug2/extra/spl/spl.ko
version:        0.6.5-1
license:        GPL
author:         OpenZFS on Linux
description:    Solaris Porting Layer
srcversion:     B6B4023A493DF4B6621F15E
depends:        zlib_deflate

And stuff...

localhost ~ # zpool status
  pool: U
 state: ONLINE
  scan: none requested
config:

    NAME                                          STATE     READ WRITE CKSUM
    U                                             ONLINE       0     0     0
      raidz1-0                                    ONLINE       0     0     0
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7TPZJE9  ONLINE       0     0     0
        ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E1430463  ONLINE       0     0     0
        ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E1457549  ONLINE       0     0     0

errors: No known data errors

  pool: more_disks
 state: ONLINE
  scan: resilvered 22.0G in 0h6m with 0 errors on Thu Aug 11 23:00:51 2016
config:

    NAME                          STATE     READ WRITE CKSUM
    more_disks                    ONLINE       0     0     0
      raidz1-0                    ONLINE       0     0     0
        ata-ST3750640AS_3QD06JCF  ONLINE       0     0     0
        ata-ST3750640AS_3QD06JS5  ONLINE       0     0     0
        ata-ST3750640AS_3QD06JZS  ONLINE       0     0     0

errors: No known data errors
@JuliaVixen
Author

I rebooted, ran this zfs send|recv again, and instantly kernel panic'd again. The recv'd data set from last time doesn't exist. I should have mentioned, this happens at the start of the recv...

send from @Aug_12_2016 to more_disks@Aug_14_2016 estimated size is 229G
total estimated size is 229G
TIME        SENT   SNAPSHOT
receiving incremental stream of more_disks@Aug_14_2016 into U/more_disks@Aug_14_2016
cannot receive incremental stream: dataset is busy
10:15:30     312   more_disks@Aug_14_2016
10:15:31     312   more_disks@Aug_14_2016
10:15:32     312   more_disks@Aug_14_2016
10:15:33     312   more_disks@Aug_14_2016
10:15:34     312   more_disks@Aug_14_2016
10:15:35     312   more_disks@Aug_14_2016
10:15:36     312   more_disks@Aug_14_2016
[etc.]

After a reboot...

localhost ~ # zfs get all U/more_disks@Aug_14_2016
cannot open 'U/more_disks@Aug_14_2016': dataset does not exist

@GeLiXin
Contributor

GeLiXin commented Aug 15, 2016

This issue may have been introduced by commit 47dfff3. Sorry for the trouble, and thanks for your report.

We will fix it soon. In the meantime, you can roll back to a version from before Jun 25, 2016, or wait for our fix PR.

@ironMann
Contributor

So far I've seen a few code paths where NULL is passed to free(), because the operation is actually ignored and it creates less 'clutter'.
In this case, it looks like the SPL is built with DEBUG_KMEM_TRACKING, which doesn't like that.

If this practice is preferable, then the kmem tracking code should be updated.

@behlendorf
Contributor

The illumos kmem interfaces we are emulating in the SPL are supposed to support passing NULL to free(), so this is preferable. @GeLiXin, can you propose an alternate patch against the SPL which handles this NULL case in spl_kmem_free_track()?

@behlendorf added this to the 0.7.0 milestone Aug 15, 2016
@JuliaVixen
Author

Oh yeah, I was going to mention, I build SPL with:
./configure --enable-debug --enable-debug-kmem --enable-debug-kmem-tracking
And ZFS with:
./configure --enable-debug --enable-debug-dmu-tx

@behlendorf
Contributor

@JuliaVixen just FYI --enable-debug-kmem-tracking will likely significantly hurt performance.

@JuliaVixen
Author

I was wondering why zfs export was going so slow... but anyway, I've been hitting so many kernel panics, at this point I figured it would be faster to just build-in debugging to begin with, rather than rebuilding and reproducing the bug to collect info.

I just rebuilt SPL with ./configure --enable-debug --enable-debug-kmem and started another zfs send|recv... No panic this time.

@GeLiXin
Contributor

GeLiXin commented Aug 19, 2016

@behlendorf A new PR, openzfs/spl#567, which handles a NULL pointer in spl_kmem_free_track(), is ready.

By the way, not all kernel versions can handle a NULL pointer in free(), so the patch #4969, which resolves this in the upper layer, still seems necessary.

@behlendorf
Contributor

@GeLiXin thanks for adding the SPL patch. Could you be more specific which versions of the kernel don't handle this? I know this used to be true, but that was a long time ago.

behlendorf pushed a commit to openzfs/spl that referenced this issue Aug 19, 2016
When DEBUG_KMEM_TRACKING is enabled in SPL, we keep tracking all
the buffers alloced by kmem_alloc() and kmem_zalloc().  If a NULL
pointer which indicates no track info in SPL is passed to
spl_kmem_free_track, we just ignore it.

Signed-off-by: GeLiXin <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs/zfs#4967
Closes #567
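
The behavior the commit message describes can be illustrated with a small userspace sketch. This is not the SPL source: `tracked_alloc()`, `tracked_free()`, `tracked_outstanding()`, and the linked list are hypothetical stand-ins for the tracking code, and `malloc()`/`free()` stand in for the kernel allocator. The point it shows is that a tracking free routine returns early on NULL instead of asserting, while still treating an untracked non-NULL pointer as a bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-buffer tracking record (not SPL code). */
struct track_entry {
    void *addr;
    size_t size;
    struct track_entry *next;
};

static struct track_entry *track_head = NULL;
static size_t tracked_bytes = 0;

/* Allocate a buffer and record it in the tracking list. */
void *tracked_alloc(size_t size)
{
    struct track_entry *e = malloc(sizeof(*e));

    if (e == NULL)
        return NULL;
    e->addr = malloc(size);
    if (e->addr == NULL) {
        free(e);
        return NULL;
    }
    e->size = size;
    e->next = track_head;
    track_head = e;
    tracked_bytes += size;
    return e->addr;
}

/* Free a tracked buffer; a NULL pointer has no tracking record and is ignored. */
void tracked_free(void *ptr)
{
    struct track_entry **pp;

    if (ptr == NULL)
        return;

    for (pp = &track_head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->addr == ptr) {
            struct track_entry *e = *pp;

            *pp = e->next;          /* unlink the record */
            tracked_bytes -= e->size;
            free(e->addr);
            free(e);
            return;
        }
    }
    /* A non-NULL pointer with no record is still a real bug. */
    assert(!"tracked_free: pointer was never tracked");
}

/* Number of bytes currently allocated and not yet freed. */
size_t tracked_outstanding(void)
{
    return tracked_bytes;
}
```

With this shape, a caller such as the send path in the stack trace could pass a buffer pointer that was never allocated without tripping the tracking assertion, and the accounting for live buffers stays unchanged.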
@GeLiXin
Contributor

GeLiXin commented Aug 20, 2016

@behlendorf I'm sorry, I made a mistake: it's not that the kernel doesn't handle a NULL pointer, but that the Intel DPDK memory management module doesn't handle it in my version.

Since the ABD patch isn't yet robust enough to resolve memory fragmentation in ZFS, I use the Intel DPDK memory management module instead of kmem in the SPL. It works well, and it gave me the illusion that my kernel can't handle a NULL pointer. Sorry.

Avoiding freeing a NULL pointer is convenient for secondary development and makes the code look more rigorous. On the other hand, it's understandable to close #4969 as well, since it won't cause actual trouble for ZoL.
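
The upper-layer approach mentioned here can be sketched as a caller-side guard. This is a minimal illustration, not the contents of #4969: `strict_free()` is a hypothetical allocator routine that rejects NULL (standing in for a backend like the one described above), and `release_buffer()` is a hypothetical caller that checks before freeing.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts successful frees so the behavior can be observed. */
int strict_free_calls = 0;

/*
 * Hypothetical free routine that does not accept NULL.
 * Returns 0 on success and -1 if handed a NULL pointer.
 */
int strict_free(void *ptr)
{
    if (ptr == NULL)
        return -1;
    free(ptr);
    strict_free_calls++;
    return 0;
}

/*
 * Caller-side guard: only hand the buffer to the allocator if one
 * was actually allocated, so the backend never sees NULL.
 */
int release_buffer(void *buf)
{
    if (buf == NULL)
        return 0;   /* nothing to release */
    return strict_free(buf);
}
```

The trade-off is the one discussed in the thread: guarding at every call site keeps the code portable to stricter allocators, while handling NULL once inside the free routine keeps callers shorter.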

@behlendorf
Contributor

@GeLiXin OK, that makes sense. Thanks for the clarification.

nedbass pushed a commit to nedbass/spl that referenced this issue Aug 26, 2016
tuxoko pushed a commit to tuxoko/spl that referenced this issue Sep 8, 2016