Lots of 'attempt to access beyond end of device' in syslog #7724

Closed
gedia opened this issue Jul 17, 2018 · 7 comments

Comments

@gedia
Contributor

gedia commented Jul 17, 2018

System information

Type                  Version/Name
Distribution Name     Gentoo
Distribution Version  rolling
Linux Kernel          4.17.6-gentoo
Architecture          amd64
ZFS Version           0.7.0-1472_g2e5dc449
SPL Version           0.7.0-1472_g2e5dc449

Describe the problem you're observing

My system log is inundated with warnings about attempts to access beyond the end of the device. This has been the case with previous ZFS versions as well, dating back a couple of months to when the system was first built. Here's an example:

Jul 17 17:26:29 laptop kernel: attempt to access beyond end of device
Jul 17 17:26:29 laptop kernel: nvme0n1p1: rw=0, want=1000193568, limit=1000062976
Jul 17 17:26:29 laptop kernel: attempt to access beyond end of device
Jul 17 17:26:29 laptop kernel: nvme0n1p1: rw=0, want=1000194080, limit=1000062976

Under high I/O load I'm also experiencing some stability issues, but I'm not sure whether that is related to ZFS.

Please let me know how I can provide more useful debugging information.

Describe how to reproduce the problem

The errors are very frequent, about 1 per second on average (they actually come in bursts). They even appear when the system is generally idle, so I'm not sure what the trigger is.

My system has a few particularities I'd like to list here:

  • I only have a single ZFS pool, which consists of a single partition on an NVMe SSD.
  • Before creating the pool, I reformatted the SSD using nvme-cli and changed its (firmware-toggleable) logical addressing to 4096-byte sectors (a rough sketch of the commands appears after this list)
  • I then created the pool on the entire drive, in order to have the 'whole-disk' flag set
  • This created a GPT partition table with two partitions. The smaller of the two (with id=9) is used as an EFI system partition
  • I'm using native ZFS encryption on all datasets mounted on VFS. Cipher mode is aes-256-gcm
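A rough sketch of that reformat/creation sequence, for reference; the LBA format index below is illustrative (check which index corresponds to 4096-byte sectors on your drive), and the dataset/encryption options are omitted:

# List the LBA formats the namespace supports; note the index of the 4096-byte one
$ nvme id-ns -H /dev/nvme0n1 | grep 'LBA Format'
# Switch the namespace to that format (this destroys all data on the drive)
$ nvme format /dev/nvme0n1 --lbaf=1
# Hand ZFS the whole disk so it creates the GPT itself and sets whole_disk=1
$ zpool create gtank /dev/disk/by-id/nvme-KXG50ZNV512G_NVMe_TOSHIBA_512GB_184F74BGFQCS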
@richardelling
Contributor

richardelling commented Jul 17, 2018

Can you look at the comments in #7629 to see if it applies?
One other method is to look at the asize for the vdev in the label using zdb -l /dev/nvme0n1p1 and compare to the partition size of /dev/nvme0n1p1.

A workaround might be as simple as removing partition 9 (it isn't in use) and growing partition 1 to a large enough size.
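A minimal way to make that comparison, using the device path from the report (both values are in bytes):

# asize recorded in the vdev label
$ zdb -l /dev/nvme0n1p1 | grep asize
# size of the partition itself
$ blockdev --getsize64 /dev/nvme0n1p1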

@gedia
Contributor Author

gedia commented Jul 18, 2018

Thank you for your reply @richardelling

Here's a report from my system with the values in question:

Device     Start      End        Sectors    Bytes
nvme0n1    -          -          125026902  512110190592
nvme0n1p1  2048       125009919  125007872  512032243712
nvme0n1p9  125009920  125026303  16384      67108864
vdev       -          -          -          512027525120

It seems to me that the vdev size is smaller than the size of partition 1, so now the 'beyond end of device' error baffles me even more...
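For what it's worth, the gap between partition 1 and the vdev asize is:

$ echo $((512032243712 - 512027525120))
4718592

4718592 bytes is exactly 4.5 MiB, which (if I understand the on-disk layout correctly) is the space ZFS reserves for its labels: 4 MiB at the front of the vdev (two 256 KiB labels plus the boot block area) and 512 KiB at the end (the other two labels). So the asize being smaller than the partition by that amount looks expected rather than suspicious.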

I'm not sure how #7629 is related to this... Could you elaborate?

Lastly, I'm actually using partition 9 as the EFI system partition (I'm using EFISTUB for booting), so if I delete it I'll end up with an unbootable system.

And here's the full output of 'zdb -l' for reference. Is it normal to have two labels with identical data?

------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'gtank'
    state: 0
    txg: 556610
    pool_guid: 15423420548484792682
    errata: 0
    hostid: 8323328
    hostname: 'gdiamantopoulos-laptop'
    top_guid: 9629493302233033777
    guid: 9629493302233033777
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 9629493302233033777
        path: '/dev/disk/by-id/nvme-KXG50ZNV512G_NVMe_TOSHIBA_512GB_184F74BGFQCS-part1'
        devid: 'nvme-KXG50ZNV512G_NVMe_TOSHIBA_512GB_184F74BGFQCS-part1'
        whole_disk: 1
        metaslab_array: 65
        metaslab_shift: 32
        ashift: 13
        asize: 512027525120
        is_log: 0
        DTL: 295
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 
------------------------------------
LABEL 2
------------------------------------
    version: 5000
    name: 'gtank'
    state: 0
    txg: 235364
    pool_guid: 15423420548484792682
    errata: 0
    hostname: 'gdiamantopoulos-laptop'
    top_guid: 9629493302233033777
    guid: 9629493302233033777
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 9629493302233033777
        path: '/dev/disk/by-id/nvme-KXG50ZNV512G_NVMe_TOSHIBA_512GB_184F74BGFQCS-part1'
        devid: 'nvme-KXG50ZNV512G_NVMe_TOSHIBA_512GB_184F74BGFQCS-part1'
        whole_disk: 1
        metaslab_array: 65
        metaslab_shift: 32
        ashift: 13
        asize: 512027525120
        is_log: 0
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 2 3 

@DeHackEd
Contributor

DeHackEd commented Jul 26, 2018

# zpool create yeti-storage-toronto -m none -O compress=lz4 raidz2 sd[cdefghijkl] raidz2 sd[mnopqrstuv] raidz2 sd[wxyz] sda[abcdef] raidz2 sda[ghijklmnop] raidz2 sda[qrstuvwxyz] raidz2 sdb[a-j]
cannot create 'yeti-storage-toronto': I/O error
# dmesg
[119129.751439]  sdbi: sdbi1 sdbi9
[119129.762736]  sdbi: sdbi1 sdbi9
[119130.236607]  sdbj: sdbj1 sdbj9
[119130.270633]  sdbj: sdbj1 sdbj9
[119130.287245]  sdbj: sdbj1 sdbj9
[119130.716017] attempt to access beyond end of device
[119130.716022] attempt to access beyond end of device
[119130.716025] sdav1: rw=0, want=15628031520, limit=15627894784
[119130.716032] sdau1: rw=0, want=15628031520, limit=15627894784
[119130.716207] attempt to access beyond end of device
[119130.716211] sdav1: rw=0, want=15628032032, limit=15627894784
[119130.716341] attempt to access beyond end of device
[119130.716344] sdau1: rw=0, want=15628032032, limit=15627894784
[119130.716388] attempt to access beyond end of device
[119130.716395] sdl1: rw=0, want=15628031520, limit=15627894784
[119130.716573] attempt to access beyond end of device
[119130.716576] sdl1: rw=0, want=15628032032, limit=15627894784
[119130.716636] attempt to access beyond end of device
[119130.716641] sdm1: rw=0, want=15628031520, limit=15627894784
[119130.716821] attempt to access beyond end of device
[119130.716823] sdm1: rw=0, want=15628032032, limit=15627894784
[119130.942915] attempt to access beyond end of device
[119130.942922] sdl1: rw=536870919, want=15628031744, limit=15627894784
[119130.943056] attempt to access beyond end of device
[119130.943058] sdl1: rw=536870919, want=15628031520, limit=15627894784
[119130.945039] attempt to access beyond end of device
[119130.945042] sdl1: rw=536870919, want=15628032000, limit=15627894784
[119130.946771] attempt to access beyond end of device
[119130.946774] sdl1: rw=536870919, want=15628032256, limit=15627894784
[119130.946915] attempt to access beyond end of device
[119130.946917] sdl1: rw=536870919, want=15628032032, limit=15627894784
[119130.948890] attempt to access beyond end of device
[119130.948893] sdl1: rw=536870919, want=15628032512, limit=15627894784
[119130.952504] attempt to access beyond end of device
[119130.952509] sdl1: rw=0, want=15628031520, limit=15627894784
[119130.952519] attempt to access beyond end of device
[119130.952522] sdl1: rw=0, want=15628032032, limit=15627894784
[119130.970324] attempt to access beyond end of device
[119130.970329] sdl1: rw=536870912, want=15628031744, limit=15627894784
[119130.970450] attempt to access beyond end of device
[119130.970452] sdl1: rw=536870912, want=15628031520, limit=15627894784
[119130.972243] attempt to access beyond end of device
[119130.972248] sdl1: rw=536870912, want=15628032000, limit=15627894784
[119130.973797] attempt to access beyond end of device
[119130.973799] sdl1: rw=536870912, want=15628032256, limit=15627894784
[119130.973929] attempt to access beyond end of device
[119130.973931] sdl1: rw=536870912, want=15628032032, limit=15627894784
[119130.975686] attempt to access beyond end of device
[119130.975688] sdl1: rw=536870912, want=15628032512, limit=15627894784
[119130.977557] attempt to access beyond end of device
[119130.977562] sdl1: rw=0, want=15628031520, limit=15627894784
[119130.977573] attempt to access beyond end of device
[119130.977576] sdl1: rw=0, want=15628032032, limit=15627894784

# git bisect bad
74d4260 is the first bad commit

Mentioning @shartse as the author and @behlendorf for a bisectable bug.

Edit: git revert 74d42600 on top of master results in a usable pool.
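For anyone else hitting this in the meantime, the revert workaround looks roughly like this (the build and install steps are the usual ZFS-on-Linux autotools flow and may differ on your setup):

# git revert --no-edit 74d42600
# ./autogen.sh && ./configure && make -j$(nproc)
# make install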

@DeHackEd
Contributor

Additionally, I have trouble believing there is anything patently wrong with the ZFS code up to this point in time that would prevent pool creation, or else the test suite would be having a fit.

The only thing I can think of that is special about these disks is that they are 4096n, not 512e. That is, they do not offer an emulated view of 512-byte sectors; all I/O must be 4096 bytes on the wire.
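A quick way to confirm that on a given drive (sdX here is a placeholder for the device name):

# cat /sys/block/sdX/queue/logical_block_size
# cat /sys/block/sdX/queue/physical_block_size

A 4096-native drive reports 4096 for both, while a 512e drive reports 512 for the logical size and 4096 for the physical size.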

@shartse
Contributor

shartse commented Jul 26, 2018

I expect that this is caused by how I was computing available size in 74d4260 - #7629 should have fixed it. Can you see if this persists with that change?
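If that change isn't in your tree yet, one way to test it is to check out the PR branch directly via GitHub's generic pull/<id>/head ref (pr-7629 is just an arbitrary local branch name) and then rebuild as usual:

# git fetch origin pull/7629/head:pr-7629
# git checkout pr-7629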

@DeHackEd
Contributor

Okay, yes that specific commit does work properly.

Well, that satisfies my concerns. Sorry for the alarm.

@gedia
Contributor Author

gedia commented Jul 26, 2018

Using latest master I also no longer have this issue. Thanks! I'm closing this.
