Parallelize several operations during zpool import #11470
Conversation
metaslab_init is the slowest part of importing a mature pool, and it must be repeated hundreds of times for each top-level vdev. But its speed is dominated by a few serialized disk accesses. That can lead to import times of > 1 hour for pools with many top-level vdevs on spinny disks. Speed up the import by using a taskqueue to parallelize vdev_load across all top-level vdevs. openzfs/zfs#11470 Sponsored by: Axcient
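A minimal sketch of the fan-out/join pattern this commit describes, assuming the standard SPL taskq API (taskq_create/taskq_dispatch/taskq_wait); the helper names and the per-child argument struct are placeholders for illustration, not the identifiers used by the actual patch:

```c
#include <sys/kmem.h>
#include <sys/taskq.h>
#include <sys/vdev_impl.h>

/* Hypothetical per-child argument; the real patch may store errors elsewhere. */
typedef struct vdev_load_arg {
	vdev_t	*vla_vd;
	int	vla_error;
} vdev_load_arg_t;

static void
vdev_load_task(void *arg)
{
	vdev_load_arg_t *vla = arg;

	/* Each child performs its own slow, serialized disk accesses here. */
	vla->vla_error = vdev_load(vla->vla_vd);
}

/* Fan vdev_load() out across every top-level vdev, then join once. */
static int
vdev_load_children_parallel(vdev_t *rvd)
{
	uint64_t children = rvd->vdev_children;
	vdev_load_arg_t *args;
	taskq_t *tq;
	int error = 0;

	args = kmem_zalloc(children * sizeof (*args), KM_SLEEP);
	tq = taskq_create("vdev_load", (int)children, defclsyspri,
	    1, INT_MAX, TASKQ_PREPOPULATE);

	for (uint64_t c = 0; c < children; c++) {
		args[c].vla_vd = rvd->vdev_child[c];
		(void) taskq_dispatch(tq, vdev_load_task, &args[c], TQ_SLEEP);
	}

	taskq_wait(tq);		/* single join point for all children */
	taskq_destroy(tq);

	for (uint64_t c = 0; c < children; c++) {
		if (args[c].vla_error != 0 && error == 0)
			error = args[c].vla_error;
	}
	kmem_free(args, children * sizeof (*args));
	return (error);
}
```

The disk time per top-level vdev is unchanged; the win comes purely from overlapping the serialized accesses of many vdevs instead of waiting for them one at a time.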
This is similar to what we already do in vdev_geom_read_config. openzfs/zfs#11470 Sponsored by: Axcient
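A rough sketch of what "similar to vdev_geom_read_config" means here: queue the vdev_phys read for every label under one parent zio and wait once, rather than read-and-wait per label. The surrounding function name is hypothetical, and vdev_label_read() refers to the existing helper in vdev_label.c (an assumption about where such code would live):

```c
#include <sys/abd.h>
#include <sys/zio.h>
#include <sys/vdev_impl.h>

/*
 * Illustrative only: issue the config read for all VDEV_LABELS labels in
 * flight at the same time.  The caller supplies one abd_t buffer per label.
 */
static int
vdev_label_read_all(vdev_t *vd, abd_t *bufs[VDEV_LABELS], int flags)
{
	spa_t *spa = vd->vdev_spa;
	zio_t *rio = zio_root(spa, NULL, NULL, flags);

	for (int l = 0; l < VDEV_LABELS; l++) {
		/* Queue the read; note there is no per-label zio_wait(). */
		vdev_label_read(rio, vd, l, bufs[l],
		    offsetof(vdev_label_t, vl_vdev_phys),
		    sizeof (vdev_phys_t), NULL, NULL, flags);
	}

	/* One join point: all label reads complete before we return. */
	return (zio_wait(rio));
}
```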
The runtime of vdev_validate is dominated by the disk accesses in vdev_label_read_config. Speed it up by validating all vdevs in parallel. openzfs/zfs#11470 Sponsored by: Axcient
Great idea! But just in case you hadn't noticed, this does appear to introduce a locking problem with the spa config lock.
Yep. I didn't notice that at first because assertions were accidentally disabled in my build. I'm testing a fix now. AFAICT it's safe for multiple threads to call …
Of the test cases that failed, none of them failed on more than one run. So I think they're all intermittent. They are: …
Nice improvement! I agree the failures look unrelated, but let's be sure. If you rebase this on the latest master and force-update the PR, you should get a clean run, including on FreeBSD HEAD.
Here we go!
Yes, I think it should be split into three commits. But the final commit actually needs to be squashed into the first, not the third. Would you like me to do it now, or wait until CI finishes?
@asomers it'd be great if you could sort it out now so each of the commits really can stand by itself. |
metaslab_init is the slowest part of importing a mature pool, and it must be repeated hundreds of times for each top-level vdev. But its speed is dominated by a few serialized disk accesses. That can lead to import times of > 1 hour for pools with many top-level vdevs on spinny disks. Speed up the import by using a taskqueue to parallelize vdev_load across all top-level vdevs. This also requires adding mutex protection to metaslab_class_t.mc_histogram. The mc_histogram fields were unprotected when that code was first written in "Illumos 4976-4984 - metaslab improvements" (OpenZFS f3a7f66). The lock wasn't added until 3dfb57a, though it's unclear exactly which fields it's supposed to protect. In any case, it wasn't until vdev_load was parallelized that any code attempted concurrent access to those fields. Sponsored by: Axcient Signed-off-by: Alan Somers <[email protected]>
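A small illustration of the locking need the commit message describes, using a slimmed-down stand-in for metaslab_class_t and a placeholder function name (neither is the exact code in metaslab.c); the point is only that once vdev_load() runs on several taskq threads, every read-modify-write of the shared class histogram must be serialized:

```c
#include <sys/zfs_context.h>
#include <sys/range_tree.h>	/* RANGE_TREE_HISTOGRAM_SIZE */

/* Stand-in for the real metaslab_class_t; mc_lock is assumed to be
 * initialized with mutex_init() when the class is created. */
typedef struct my_metaslab_class {
	kmutex_t	mc_lock;	/* protects mc_histogram[] */
	uint64_t	mc_histogram[RANGE_TREE_HISTOGRAM_SIZE];
} my_metaslab_class_t;

/*
 * Two metaslab groups loading concurrently can both try to fold their
 * histograms into the class histogram, so each bucket update happens
 * under mc_lock.
 */
static void
class_histogram_add(my_metaslab_class_t *mc,
    const uint64_t histogram[RANGE_TREE_HISTOGRAM_SIZE])
{
	mutex_enter(&mc->mc_lock);
	for (int i = 0; i < RANGE_TREE_HISTOGRAM_SIZE; i++)
		mc->mc_histogram[i] += histogram[i];
	mutex_exit(&mc->mc_lock);
}
```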
This is similar to what we already do in vdev_geom_read_config. Sponsored by: Axcient Signed-off-by: Alan Somers <[email protected]>
The runtime of vdev_validate is dominated by the disk accesses in vdev_label_read_config. Speed it up by validating all vdevs in parallel. Sponsored by: Axcient Signed-off-by: Alan Somers <[email protected]>
This is similar to what we already do in vdev_geom_read_config. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes #11470
The runtime of vdev_validate is dominated by the disk accesses in vdev_label_read_config. Speed it up by validating all vdevs in parallel using a taskq. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes #11470
metaslab_init is the slowest part of importing a mature pool, and it must be repeated hundreds of times for each top-level vdev. But its speed is dominated by a few serialized disk accesses. That can lead to import times of > 1 hour for pools with many top-level vdevs on spinny disks. Speed up the import by using a taskqueue to parallelize vdev_load across all top-level vdevs. This also requires adding mutex protection to metaslab_class_t.mc_histogram. The mc_histogram fields were unprotected when that code was first written in "Illumos 4976-4984 - metaslab improvements" (OpenZFS f3a7f66). The lock wasn't added until 3dfb57a, though it's unclear exactly which fields it's supposed to protect. In any case, it wasn't until vdev_load was parallelized that any code attempted concurrent access to those fields. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes openzfs#11470
This is similar to what we already do in vdev_geom_read_config. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes openzfs#11470
The runtime of vdev_validate is dominated by the disk accesses in vdev_label_read_config. Speed it up by validating all vdevs in parallel using a taskq. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes openzfs#11470
Motivation and Context
zpool import is too slow for large pools on spinning disks. It can sometimes take over an hour. Most of that time is occupied by simply waiting for serialized disk I/O. The worst offender is metaslab_init, which must be repeated hundreds of times for each top-level vdev.
Description
This PR removes the worst bottlenecks during the import process:
- Parallelizes vdev_load by using a taskqueue to load each top-level vdev.
- Parallelizes the disk accesses in vdev_label_read_config, similarly to what's already done in vdev_geom_read_config.
- Parallelizes vdev_validate by using a taskqueue for all vdevs.
How Has This Been Tested?
Performance tested using various large pools on FreeBSD 13. Regression tested using the FreeBSD ZFS test suite. Shortens the zpool import time by about 6x for large pools. When combined with the changes in PRs #11469 and #11467, shortens the import time by about 8x.
Types of changes
Checklist:
Signed-off-by.