
ZFS kernel panic during scrub and following imports - CentOS #2678

Closed
cointer opened this issue Sep 7, 2014 · 15 comments


cointer commented Sep 7, 2014

I've had this issue on both CentOS 6 and 7: the same thing happened on 6, and I rebuilt the pool from scratch on 7 using the same hardware. The pool is roughly 28TB in size, with about 6-7TB used. It's made up of two 6-drive raidz2 vdevs, mirrored SLOG partitions (eMLC drives), and a 180GB L2ARC, on a machine with 64GB of ECC RAM. Dedup is disabled, lz4 compression is enabled, and the only other settings I've changed relate to xattr, nfsshare, and acltype.

All drives report "available" via zpool and all pass SMART status. I started a scrub last night and it was going fine, at roughly 300MB/s with an estimated 3-5 hrs to complete, so I left it and went to bed. I woke this morning to find there had been a kernel panic, a reboot, and another panic when the pool tried to import at boot.

I went into single-user mode and moved the zpool.cache file, and the system booted up normally. Trying to import via "zpool import -f pool" caused a kernel panic again. I tried "zpool import -F pool", but this would not run without the "-f" switch, which always induces the panic.

I was able to import the pool in read-only mode and all data seems intact (I am currently rsyncing data off of this pool to another, non-ZFS share while this is sorted out).

The pool is still in the middle of the scrub, so I suspect the scrub itself is hitting a certain point and causing the panic. Here are some relevant bits from zpool status and dmesg:

pool: pool
state: ONLINE
scan: scrub in progress since Sat Sep 6 17:43:08 2014
923G scanned out of 6.34T at 1/s, (scan is slow, no estimated time)
0 repaired, 14.21% done
config:

    NAME                                                      STATE     READ WRITE CKSUM
    pool                                                      ONLINE       0     0     0
      raidz2-0                                                ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
      raidz2-1                                                ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L0_WD-[redacted]            ONLINE       0     0     0
    logs
      mirror-2                                                ONLINE       0     0     0
        ata-Edge_Boost_Server_PF_MLC_[redacted]-part3  ONLINE       0     0     0
        ata-Edge_Boost_Server_PF_MLC_[redacted]-part3  ONLINE       0     0     0
    cache
      ata-EDGE_Boost_Express_SSD_[redacted]             ONLINE       0     0     0

errors: No known data errors

[1200156.671391] general protection fault: 0000 [#1] SMP
[1200156.671426] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding bridge stp llc sg iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core mei_me mei igb ixgbe ptp pps_core ioatdma mdio dca ses
[1200156.671838] enclosure ipmi_si ipmi_msghandler wmi mperf shpchp nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c raid1 sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ahci libahci ttm drm mpt2sas libata i2c_core raid_class scsi_transport_sas zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) zlib_deflate [last unloaded: ip_tables]
[1200156.672036] CPU: 2 PID: 3910 Comm: txg_sync Tainted: PF O-------------- 3.10.0-123.6.3.el7.x86_64 #1
[1200156.672079] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
[1200156.672119] task: ffff8810333c38e0 ti: ffff88103224c000 task.ti: ffff88103224c000
[1200156.672162] RIP: 0010:[] [] spl_kmem_cache_alloc+0x32/0x270 [spl]
[1200156.672227] RSP: 0018:ffff88103224d208 EFLAGS: 00010246
[1200156.672251] RAX: 00000007d74f6800 RBX: 0000000000c6cc00 RCX: ffffffffa01be4e0
[1200156.672282] RDX: ffff88103224d418 RSI: 0000000000000230 RDI: 657a5f73667a0065
[1200156.672313] RBP: ffff88103224d260 R08: 7e8e756e80cd7cfd R09: 7e8e756e80cd7cfd
[1200156.672344] R10: ffff88085f803900 R11: 69663a725f746365 R12: 0000000000000001
[1200156.672386] R13: ffff880c826e5310 R14: 0000000000000230 R15: 657a5f73667a0065
[1200156.672416] FS: 0000000000000000(0000) GS:ffff88085fd00000(0000) knlGS:0000000000000000
[1200156.672462] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1200156.672487] CR2: 00007effa9bf5880 CR3: 00000000018d0000 CR4: 00000000001407e0
[1200156.672518] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1200156.672549] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1200156.672579] Stack:
[1200156.672591] ffff88103224d240 ffffffff81090b04 ffff881047c78300 ffffc908fbe43870
[1200156.672629] ffffc908fbe434e0 0000000000000246 0000000000c6cc00 0000000000000001
[1200156.672666] ffff880c826e5310 ffff880c826e5348 0000000000006365 ffff88103224d270
[1200156.672702] Call Trace:
[1200156.672720] [] ? __wake_up+0x44/0x50
[1200156.672792] [] zio_buf_alloc+0x23/0x30 [zfs]
[1200156.672832] [] arc_get_data_buf.isra.19+0x345/0x4b0 [zfs]
[1200156.672878] [] arc_buf_alloc+0xdc/0x110 [zfs]
[1200156.672918] [] arc_read+0x392/0x920 [zfs]
[1200156.672946] [] ? ktime_get_ts+0x48/0xe0
[1200156.672985] [] ? arc_buf_remove_ref+0x100/0x100 [zfs]
[1200156.673036] [] dsl_scan_visitbp.isra.5+0x5f1/0xc40 [zfs]
[1200156.673086] [] dsl_scan_visitbp.isra.5+0x56a/0xc40 [zfs]
[1200156.673135] [] dsl_scan_visitbp.isra.5+0x73b/0xc40 [zfs]
[1200156.673184] [] dsl_scan_visitbp.isra.5+0x73b/0xc40 [zfs]
[1200156.673232] [] dsl_scan_visitbp.isra.5+0x73b/0xc40 [zfs]
[1200156.674379] [] dsl_scan_visitbp.isra.5+0x73b/0xc40 [zfs]
[1200156.675512] [] dsl_scan_visitbp.isra.5+0x73b/0xc40 [zfs]
[1200156.676631] [] dsl_scan_visitbp.isra.5+0x73b/0xc40 [zfs]
[1200156.677743] [] dsl_scan_visitbp.isra.5+0x88f/0xc40 [zfs]
[1200156.678784] [] dsl_scan_visitds+0xd8/0x570 [zfs]
[1200156.679819] [] dsl_scan_sync+0x16d/0xb60 [zfs]
[1200156.680862] [] spa_sync+0x492/0xb20 [zfs]
[1200156.681850] [] ? ktime_get_ts+0x48/0xe0
[1200156.682831] [] txg_sync_thread+0x37e/0x5c0 [zfs]
[1200156.683788] [] ? txg_fini+0x290/0x290 [zfs]
[1200156.684704] [] thread_generic_wrapper+0x7a/0x90 [spl]
[1200156.685604] [] ? __thread_exit+0xa0/0xa0 [spl]
[1200156.686457] [] kthread+0xcf/0xe0
[1200156.687279] [] ? kthread_create_on_node+0x140/0x140
[1200156.688082] [] ret_from_fork+0x7c/0xb0
[1200156.688852] [] ? kthread_create_on_node+0x140/0x140
[1200156.689606] Code: 89 e5 41 57 49 89 ff 41 56 41 89 f6 41 55 41 54 53 48 83 ec 30 f6 05 3d 45 01 00 01 74 0d f6 05 3d 45 01 00 08 0f 85 2e 01 00 00 41 ff 87 68 a0 00 00 41 f6 87 48 a0 00 00 80 0f 84 90 00 00
[1200156.691217] RIP [] spl_kmem_cache_alloc+0x32/0x270 [spl]
[1200156.691987] RSP

@behlendorf
Contributor

@cointer I suspect you're right. The stack you've posted shows the txg_sync thread in the middle of a scrub. It also shows the scan recursed quite deeply, which makes me suspect a stack overflow is causing the crash. We're limited to 8k stacks in the Linux kernel.

Importing the pool read-only as you've done will effectively disable the scrub and avoid this issue.
You could also import the pool using FreeBSD or Illumos to stop the scrub; both of those platforms have much larger default kernel stack sizes. Once stopped, you should be able to import the pool under Linux again.
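A read-only import of the kind described above uses standard zpool options; a sketch, with the pool name taken from the status output:

```shell
# Import read-only: nothing is written, the scrub does not resume,
# and the damaged block is never re-read by the txg_sync thread.
zpool import -o readonly=on -f pool

# Copy the data off, then export cleanly before trying a normal import.
zpool export pool
```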

What version of ZoL are you using?

@behlendorf behlendorf added the Bug label Sep 9, 2014

cointer commented Sep 9, 2014

@behlendorf Thanks for the possible temporary workaround, I may look into this today.

I currently have the latest rpm installed from the CentOS 7 ZoL repository:

zfs-0.6.3-1.el7.centos.x86_64.rpm


cointer commented Sep 10, 2014

@behlendorf So here's some interesting news. I booted a live FreeBSD 10, tried to import the pool and it still crashed in the same manner. What do you think?

@behlendorf
Contributor

@cointer then it's not a stack overflow. Can you provide the stack trace from FreeBSD?

Based on the stack, the next most likely reason would be that somehow a bogus size was passed to arc_read(). If you rebuild the spl and zfs code with the --enable-debug option, we can check for this by enabling all the assertions in the code. If you're using the dkms packages, set ZFS_DKMS_ENABLE_DEBUG=y in /etc/sysconfig/zfs and rebuild the packages.
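For a source (non-DKMS) build, the debug rebuild suggested above looks roughly like this; the directory layout is illustrative, and --with-spl was how 0.6.x-era zfs located the spl tree:

```shell
# Rebuild SPL and ZFS with all assertions compiled in.
cd spl && ./configure --enable-debug && make -j"$(nproc)" && sudo make install
cd ../zfs && ./configure --enable-debug --with-spl=../spl && make -j"$(nproc)" && sudo make install

# For the DKMS packages instead, enable debug in the sysconfig file,
# then rebuild the dkms modules as the comment above describes:
# echo 'ZFS_DKMS_ENABLE_DEBUG=y' >> /etc/sysconfig/zfs
```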


cointer commented Sep 10, 2014

I will try to get the stack trace from FreeBSD. I'm not too familiar with FreeBSD. Can you give advice on the best way to get the dump/stack trace? The live system instantly reboots when I import the pool so I see no output.

@behlendorf
Contributor

@cointer I'm not familiar with debugging on FreeBSD either; I mainly just wanted to verify it was the same failure. We can certainly debug this under Linux; the first step would be to build with debugging enabled.


cointer commented Sep 10, 2014

Here is what I get with debugging enabled:

Message from syslogd ...
kernel:SPLError: 29891:0:(zio.c:254:zio_buf_alloc()) ASSERTION(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) failed

Message from syslogd ...
kernel:SPLError: 29891:0:(zio.c:254:zio_buf_alloc()) SPL PANIC

behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 10, 2014
The general strategy used by ZFS to verify that blocks are valid is
to checksum everything.  This has the advantage of being extremely
robust and generically applicable regardless of the contents of
the block.  If a block's checksum is valid then its contents are
trusted by the higher layers.

This system works exceptionally well as long as bad data is never
written with a valid checksum.  However, if this does somehow
occur due to a software bug or a memory bit-flip on a non-ECC
system it may result in a kernel panic.

One such place where this could occur is if somehow the logical
size stored in a block pointer exceeds the maximum block size.
This will result in an attempt to allocate a buffer greater than
the maximum block size, causing a system panic.

To prevent this from happening the arc_read() function has been
updated to detect this specific case.  If a block pointer with an
invalid logical size is passed it will treat the block as if it
contained a checksum error.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2678
@behlendorf
Contributor

OK, that's what I was expecting. It shows that somehow the block pointer on disk contains an incorrect logical size despite having a valid checksum. This is what's resulting in the crash on Linux, FreeBSD, and almost certainly any other ZFS platform.

I've proposed a fix for this specific case in pull request #2685. Could you apply that patch to your source tree, rebuild, and import the pool again? It will allow the scrub to detect the on-disk damage and, if possible, fix it. In this case it may be fixable because multiple copies of the block pointer will be stored on disk.
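One way to pull in the pull-request patch uses GitHub's standard "<pr>.patch" URL form against a local source checkout (the repository path and build steps here are illustrative):

```shell
# Apply PR #2685 to a local ZFS source tree, then rebuild and reinstall.
cd zfs
curl -L https://github.com/zfsonlinux/zfs/pull/2685.patch | git am
./configure && make -j"$(nproc)"
sudo make install
```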

@behlendorf behlendorf added this to the 0.6.4 milestone Sep 10, 2014

cointer commented Sep 10, 2014

I applied the patch and it worked! The scrub is now continuing from that point and everything is back to read-write. Thanks for the fix!

@behlendorf
Contributor

@cointer Great news! Could you check the output of zpool status and see whether it logged the error and fixed it? I'm glad we could turn around a fix for you.


cointer commented Sep 11, 2014

@behlendorf According to zpool status after the scrub completed:

scrub repaired 0 in 99h29m with 10 errors
errors: No known data errors

Should I look into anything further?

@behlendorf
Contributor

@cointer Looks good. Apparently 10 metadata blocks were impacted and no data blocks, so things look to be in pretty good shape. The only additional thing I'd suggest is clearing the errors (zpool clear) and scrubbing the pool again. That will tell us whether those blocks were corrected.
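The clear-and-rescrub sequence suggested above, as plain commands:

```shell
zpool clear pool      # reset the error counters and the logged error list
zpool scrub pool      # re-verify every block; a clean pass means the damaged
                      # metadata was repaired from its redundant copies
zpool status -v pool  # watch progress and any newly logged errors
```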


cointer commented Sep 11, 2014

OK, I scrubbed the pool again and it found a couple of UREs and repaired them. I ran one more scrub after that to be sure; no more UREs, but it came back with 2 metadata block errors this time. zpool hasn't marked any disks as bad yet, so I'm going to assume things are OK at the moment unless you have any other suggestions.

@behlendorf
Contributor

@cointer it's a little concerning that you're still detecting errors after the first scrub; it should have corrected everything. It makes me wonder if there's some flaky hardware on your system (memory, cables, controller, drives, etc.) causing errors.


cointer commented Sep 11, 2014

@behlendorf I'm leaning toward that thought myself. All the UREs originated from a single drive, so that drive is most likely the problem child. I'll run a memtest first, but otherwise I'll keep an eye on that drive. I noticed zpool reported the drive the UREs were discovered on, but gave no such clues about the metadata errors. When a metadata error is discovered, is there a way to tell which drives it originated from?

behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 12, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Sep 14, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Sep 16, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Sep 17, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Sep 18, 2014
behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 18, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Sep 22, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Sep 22, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Sep 27, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Sep 30, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Oct 3, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Oct 11, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Oct 11, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Oct 19, 2014