Draid rebase [WIP] #8016

Closed · wants to merge 3 commits

Conversation

@don-brady (Contributor) commented Oct 12, 2018

Motivation and Context

This is a WIP rebase of dRAID from @thegreatgazoo's PR-7078, to allow continued community development on a branch based on the most recent master.

Description

Rebasing Notes:

  • Conditionally disabled dRAID mirrored metaslabs (they were causing failed assertions).

  • Excluded dRAID pools from device removal operations.

  • Added dRAID-specific testing to ztest and zloop that exercises dRAID configurations across a range of settings (see zloop.sh for details).

  • Removed the linking of libzpool into libzfs (moved common dRAID config code into zcommon/draid_config.c).

How Has This Been Tested?

Used the new zloop and ztest options to verify the dRAID code base.
Performed some manual testing of dRAID configurations.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

Don Brady added 3 commits October 11, 2018 17:03
Included new command options to control ztest testing with dRAID:
  -K draid|raidz|random -- kind of RAID to test
  -D <value> -- dRAID data drives per redundancy group
  -G <value> -- dRAID redundancy group count
  -S <value> -- dRAID distributed spare drives
  -R <value> -- RAID parity (raidz or dRAID)

The above values are used to generate a dRAID config from the
draidcfg command. This also allows for a more comprehensive set
of configurations when used with zloop.  By default, dRAID will
be provisioned 50% of the time in ztest runs.

Added dRAID specific section to zloop to exercise dRAID across a
wide range of configuration parameters. For example:

 ztest -VVVV -K draid -D 7 -G 6 -S 1 -m 0 -r 1 -R 1 -v 0 -a 12 -s 384m

Updated the zloop ztest time range to be 30 - 120 seconds.

Signed-off-by: Don Brady <[email protected]>
Removed libzfs dependency on libzpool

The new source relationship is as follows:

  [include/draid_config.h]<--------+
    ^        ^         ^           |
    |        |         | (include) |
    |        |         |           |
 [zpool] [draidcfg] [ztest]        |
    |        |         |           |
    |        |         |  [vdev_draid_impl.h]
    +-+    +-+         |           ^
      |    |   (link)  |           |
      v    v           v           |
     [libzfs]     [libzpool]   [zfs-mod]
        |              |           |
        +-----+        | (object)  |
              |        |           |
              v        v           v
             [zcommon/draid_config.c]

Signed-off-by: Don Brady <[email protected]>
@ahrens added the Type: Feature and Status: Work in Progress labels on Oct 12, 2018
codecov bot commented Oct 12, 2018

Codecov Report

Merging #8016 into master will increase coverage by 11.34%.
The diff coverage is 57.92%.


@@             Coverage Diff             @@
##           master    #8016       +/-   ##
===========================================
+ Coverage   66.74%   78.08%   +11.34%     
===========================================
  Files         314      383       +69     
  Lines       97292   116167    +18875     
===========================================
+ Hits        64934    90714    +25780     
+ Misses      32358    25453     -6905
Flag       Coverage Δ
#kernel    77.64% <14.74%> (?)
#user      67.32% <57.79%> (+0.58%) ⬆️


@don-brady (Contributor, Author)

So clearly we need some ZTS test coverage (the 67% codecov user score is too low).

As for the ztest runs, it looks like it's finding some bugs. We need some volunteers to chase them down (see buildbot runs for details). They are all failed assertions. Below are the top 3:

spa_config_held(os->os_spa, SCL_ALL, RW_WRITER) == 0 || (spa_is_root(os->os_spa) && spa_config_held(os->os_spa, SCL_STATE, RW_WRITER))
ASSERT at ../../module/zfs/dnode.c:1285:dnode_hold_impl()/sbin/ztest(+0x8c13)[0x55e9f52f0c13]
offset < vd->vdev_psize - VDEV_LABEL_START_SIZE - VDEV_LABEL_END_SIZE (0xffffffffffc3f000 < 0x7b80000)
ASSERT at ../../module/zfs/vdev_draid.c:1310:vdev_dspare_get_child()/sbin/ztest(+0x8c13)[0x55bded087c13]
!msp->ms_rebuilding
ASSERT at ../../module/zfs/metaslab.c:3603:metaslab_free_concrete()/usr/sbin/ztest(+0x8d67)[0x55d9a6cf5d67]

@behlendorf mentioned this pull request on Oct 12, 2018
@naclosagc commented Oct 13, 2018

Hi.

Trying out the new rebase. It's a VM on an Intel(R) Xeon(R) CPU X5650; the VM has 16 cores and 16 GB of memory. I'm using a SCSI Virtio controller with 28 disks attached: 1 for boot and 27 for a pool.
Created the pool with 27 drives, parity 1, 2 spares, and 4 data drives. I was continuously copying data to it for a couple of hours, and it was behaving better than the last build did; the last one would get checksum errors in less than an hour. I stopped the copy with the pool fairly full and did a scrub, which finished fine. I then failed a drive and did a replace. The replace failed after just a few minutes with:

gentoo ~ # zpool replace tank scsi-0QEMU_QEMU_HARDDISK_drive-scsi3-0-0-1 '%draid1-0-s0'
Message from syslogd@gentoo at Fri Oct 12 20:33:45 2018 ...
gentoo kernel: [14202.454290] PANIC at metaslab.c:3603:metaslab_free_concrete()

Message from syslogd@gentoo at Fri Oct 12 20:33:45 2018 ...
gentoo kernel: [14202.454288] VERIFY(!msp->ms_rebuilding) failed

I attached the dmesg output. Let me know what else I need to provide.

P.S. I tried this twice, and the same thing happened. The first time, I did not have debug enabled or the suggested zfs parameters set; the second time I did.

P.P.S. It was a QEMU VM under libvirt, with Gentoo as the OS. Going to try Ubuntu under ESXi.

dmesg.txt

@don-brady (Contributor, Author)

@naclosagc Thanks for the report. That looks to be the same failed assertion that ztest is hitting.

@carlesmateo commented Oct 16, 2018

Created the pool with 27 drives, parity 1, 2 spares, 4 drive data. I was continuously copying data to it for a couple of hours, and it was acting better than the last one did. The last one would get checksum errors in less than an hour. I stopped the copy with the pool fairly full, and did a scrub. The scrub finished fine. I then failed a drive, and did a replace. The replace failed after just a few minutes with:

When you say "I then failed a drive", do you mean that you forced Linux to stop seeing the drive, i.e. you made it UNAVAIL on purpose, or are you reporting a bug where ZFS saw the drive as failed?
Can you indicate the exact steps you took to make the drive appear as failing?
Can you paste your zpool history?

Did you have the ZED service running?

@naclosagc commented Oct 16, 2018 via email

@naclosagc commented Oct 16, 2018 via email

@feltstykket

Trying out the rebase as well; the EXPANDSZ value is wrong on a test pool:
root@gaia:~# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
gaia  70.7T  1.50M  70.7T        -     16.0E     0%     0%  1.00x  ONLINE  -

root@gaia:~# zpool status gaia
pool: gaia
state: ONLINE
scan: none requested
config:

    NAME                                            STATE     READ WRITE CKSUM
    gaia                                            ONLINE       0     0     0
      draid1-0                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK11A8B9KEE58F  ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK11A8B9KEYSUF  ONLINE       0     0     0
        ata-ST32000644NS_9WM1QR5H                   ONLINE       0     0     0
        ata-ST32000644NS_9WM1XP7W                   ONLINE       0     0     0
        ata-ST32000644NS_9WM1YJX3                   ONLINE       0     0     0
        ata-ST32000644NS_9WM1YLJC                   ONLINE       0     0     0
        ata-ST32000644NS_9WM1YLJG                   ONLINE       0     0     0
        ata-ST32000644NS_9WM1ZGLD                   ONLINE       0     0     0
        ata-ST32000644NS_9WM20JHL                   ONLINE       0     0     0
        ata-ST32000644NS_9WM2S91S                   ONLINE       0     0     0
        ata-ST32000644NS_9WM70FXN                   ONLINE       0     0     0
        ata-ST32000644NS_9WM72X94                   ONLINE       0     0     0
        ata-WDC_WD2000FYYZ-01UL1B0_WD-WCC1P0022601  ONLINE       0     0     0
        ata-WDC_WD2002FYPS-01U1B1_WD-WCAVY1425565   ONLINE       0     0     0
        ata-WDC_WD2002FYPS-01U1B1_WD-WCAVY2455644   ONLINE       0     0     0
        ata-WDC_WD2002FYPS-01U1B1_WD-WCAVY2523196   ONLINE       0     0     0
        ata-WDC_WD2002FYPS-01U1B1_WD-WCAVY2586569   ONLINE       0     0     0
        ata-WDC_WD2002FYPS-01U1B1_WD-WCAVY2588016   ONLINE       0     0     0
        ata-WDC_WD2002FYPS-02W3B0_WD-WCAVY6425945   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B0_WD-WMAY00055867   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B0_WD-WMAY00058754   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B0_WD-WMAY00691102   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B0_WD-WMAY01342101   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B0_WD-WMAY03223177   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00008597   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00012700   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00014911   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00015392   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00016755   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00017646   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00019016   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00022891   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00025608   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00038080   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00038451   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00040879   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00041792   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00087852   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00492436   ONLINE       0     0     0
        ata-WDC_WD2003FYYS-02W0B1_WD-WCAY00875337   ONLINE       0     0     0
    spares
      %draid1-0-s0                                  AVAIL

errors: No known data errors

test.nvl.gz

@prod-feng

Hi,

I tested draidcfg with the rebase, using an unusually small case: 3 data + 1 parity + 1 spare. It gives the base permutations below:

draidcfg -r my5.nvl

dRAID1 vdev of 5 child drives: 1 x (3 data + 1 parity) and 1 distributed spare
Using 32 base permutations

3, 0, 2, 1, 4,
1, 3, 4, 2, 0,
0, 4, 2, 3, 1,
3, 1, 2, 0, 4,
0, 1, 3, 2, 4,
0, 1, 2, 3, 4,
3, 2, 1, 4, 0,
2, 1, 0, 4, 3,
4, 3, 1, 2, 0,
4, 1, 2, 0, 3,
1, 4, 2, 0, 3,
2, 4, 1, 0, 3,
0, 3, 2, 4, 1,
4, 1, 3, 2, 0,
3, 4, 2, 0, 1,
0, 3, 2, 4, 1,
0, 4, 3, 1, 2,
4, 1, 3, 0, 2,
4, 2, 0, 3, 1,
1, 0, 4, 3, 2,
3, 2, 4, 1, 0,
3, 0, 2, 1, 4,
4, 3, 1, 0, 2,
3, 0, 1, 4, 2,
2, 0, 3, 4, 1,
1, 3, 0, 4, 2,
3, 2, 0, 1, 4,
4, 2, 3, 1, 0,
3, 2, 1, 4, 0,
2, 1, 0, 4, 3,
2, 0, 4, 1, 3,
3, 1, 2, 0, 4,

There are 5 pairs of identical rows, such as "3, 0, 2, 1, 4". It seems the result is not as random as expected?

Also, I am wondering whether dRAID can support the case of (n - s) % (p + d) != 0, for example n=10, d=4, p=2, s=1. Basically, the random permutation approach could also work in this case.

@richardelling (Contributor)

Rows can be repeated; the goal is to get a random spread as data is allocated according to rows. In this example there are 120 possible permutations, and draidcfg randomly picks 32. It does this a few times and keeps the set that has the fewest collisions (implying it is more random). This has no effect on data protection, though it might have a small impact on random read performance.

The answer to the second question is no, not today. What would be a good use case for having more disks than usable space or spares? IMHO, it would be better to use the space for spares, parity, or data; do you agree?
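
For illustration, here is a rough sketch of that selection idea in C. It is only an illustration with made-up parameters (N_DRIVES, N_PERMS, N_TRIALS) and a simple duplicate count as the collision metric, not the actual draidcfg code:

/*
 * Illustrative sketch only, not draidcfg itself: draw a few candidate
 * sets of random base permutations and keep the set with the fewest
 * duplicate rows, mirroring the "least collisions" selection described
 * above.  N_DRIVES, N_PERMS, and N_TRIALS are assumed values.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define	N_DRIVES	5	/* children in the dRAID vdev */
#define	N_PERMS		32	/* base permutations per candidate set */
#define	N_TRIALS	8	/* candidate sets to compare */

static void
random_perm(int *p, int n)
{
	/* Fisher-Yates shuffle of 0..n-1 */
	for (int i = 0; i < n; i++)
		p[i] = i;
	for (int i = n - 1; i > 0; i--) {
		int j = rand() % (i + 1);
		int t = p[i]; p[i] = p[j]; p[j] = t;
	}
}

static int
count_duplicate_rows(int perms[N_PERMS][N_DRIVES])
{
	int dups = 0;
	for (int a = 0; a < N_PERMS; a++)
		for (int b = a + 1; b < N_PERMS; b++)
			if (memcmp(perms[a], perms[b], sizeof (perms[a])) == 0)
				dups++;
	return (dups);
}

int
main(void)
{
	int cand[N_PERMS][N_DRIVES], best[N_PERMS][N_DRIVES];
	int best_dups = -1;

	srand((unsigned)time(NULL));
	for (int t = 0; t < N_TRIALS; t++) {
		/* generate a candidate set and keep it if it has fewer dups */
		for (int r = 0; r < N_PERMS; r++)
			random_perm(cand[r], N_DRIVES);
		int dups = count_duplicate_rows(cand);
		if (best_dups < 0 || dups < best_dups) {
			memcpy(best, cand, sizeof (best));
			best_dups = dups;
		}
	}
	printf("kept the set with %d duplicate rows out of %d\n",
	    best_dups, N_PERMS);
	return (0);
}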

@prod-feng

Thanks for your reply, Richard.

For the case of 3 data + 1 parity + 1 spare, the 32 base permutations x 5 = 160 combinations, which is more than 120, the total number of permutations of 5 disks. Since 5 base permutations are repeated, that leaves (32 - 5) x 5 = 135 combinations (still > 120), so some repetition is unavoidable and the chance that data is still distributed evenly is good. I just want to be sure that with more disks, like 20, 40, 80, etc., repeated base permutations would not badly hurt the expected even distribution (there the number of possible permutations is far larger than the number of base permutations). One easy fix might be to simply remove an identical base permutation and generate a new, different one to replace it.
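
As a rough sanity check (an illustration with assumed values, not project code): by linearity of expectation, the expected number of duplicate pairs among k random permutations of n drives is about C(k,2)/n!, so for n=5 and k=32 roughly 4 duplicate pairs are expected, while for 20 or more drives duplicates are vanishingly unlikely. A tiny C program tabulating this:

/*
 * Back-of-the-envelope estimate (illustration only): expected number of
 * duplicate pairs among k uniformly random permutations of n drives is
 * C(k,2) / n!.  For n = 5 and k = 32 this is 496 / 120 ~= 4.1, which
 * matches the handful of repeated rows seen above; for larger n the
 * expectation is effectively zero.
 */
#include <stdio.h>

static double
factorial(int n)
{
	double f = 1.0;
	for (int i = 2; i <= n; i++)
		f *= i;
	return (f);
}

int
main(void)
{
	const int k = 32;				/* base permutations drawn */
	const int sizes[] = { 5, 10, 20, 40, 80 };	/* drive counts */
	const double pairs = (double)k * (k - 1) / 2.0;	/* C(k,2) */

	for (unsigned i = 0; i < sizeof (sizes) / sizeof (sizes[0]); i++)
		printf("n=%2d: expected duplicate pairs ~ %g\n",
		    sizes[i], pairs / factorial(sizes[i]));
	return (0);
}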

For my second question: in that case, with 32/64/128 base permutations, would it eventually use all the disks, just like a regular random distribution of data?

Thanks

@richardelling (Contributor)

I'm not sure I follow your math. There are n! permutations, so for n=5 there are 120 of them.

In general, the way this works is that offset % base_permutations picks the row; parity, data, and spares are the columns. So for any offset there is a direct map to the disks and their roles (parity, data, spare). This just needs to be random enough to spread the workload across all of the disks. By contrast, RAID-5 typically rotates the parity through a much smaller number of combinations, n. This likely doesn't make much difference for small values of n. We use n=56 and base_permutations=64 (56! ≈ 7.1e74 possible permutations) for small products, and larger n for others, so we want a random choice.

If it helps, we could prevent duplicates if n! >= base_permutations. But I'm not sure how to quantify the impact.
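
To make the offset-to-row mapping above concrete, here is a minimal sketch in C. The table is a made-up 4-row example reusing rows from the draidcfg output above (real configs use 32 or more base permutations), and the parity/data/spare column layout is an assumption for illustration, not the vdev_draid on-disk format:

/*
 * Minimal illustration (assumed layout, not the vdev_draid implementation):
 * offset % N_PERMS selects a base permutation row, and the columns of
 * that row assign drives to parity, data, and spare roles.
 */
#include <stdio.h>

#define	N_DRIVES	5
#define	N_PERMS		4	/* tiny example; real configs use 32+ */
#define	N_PARITY	1
#define	N_DATA		3
#define	N_SPARE		1

static const int base_perms[N_PERMS][N_DRIVES] = {
	{ 3, 0, 2, 1, 4 },
	{ 1, 3, 4, 2, 0 },
	{ 0, 4, 2, 3, 1 },
	{ 3, 1, 2, 0, 4 },
};

int
main(void)
{
	for (unsigned long long offset = 0; offset < 8; offset++) {
		/* offset modulo the permutation count selects the row */
		const int *row = base_perms[offset % N_PERMS];
		int c = 0;

		printf("offset %llu: parity=", offset);
		for (int i = 0; i < N_PARITY; i++, c++)
			printf("d%d ", row[c]);
		printf("data=");
		for (int i = 0; i < N_DATA; i++, c++)
			printf("d%d ", row[c]);
		printf("spare=");
		for (int i = 0; i < N_SPARE; i++, c++)
			printf("d%d", row[c]);
		printf("\n");
	}
	return (0);
}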

@PrivatePuffin (Contributor)

Is there any chance @thegreatgazoo is able or willing to continue work on this any time soon?

It seems a bit odd that he is still listed as a ZoL organisation member but appears totally inactive and unresponsive...

@PrivatePuffin (Contributor)

@kpande The usual stuff, basically...
Although according to @ahrens it should still be actively developed?
https://twitter.com/mahrens1/status/1191717726118563842

That being said, it would have been nice of Intel (as a sponsor) to send a replacement if their developer leaves, or at least to make sure he posts about leaving.

Shame though, because this is very much in Intel's interest. Let's be real: that is why they started pushing this and allocation classes in the first place.

@ofaaland mentioned this pull request on Nov 6, 2019
@ahrens (Member) commented Nov 7, 2019

Superseded by #9558

@ahrens closed this on Nov 7, 2019