-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenZFS - 6363 Add UNMAP/TRIM functionality #5925
Conversation
@dweeezil, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @wca and @grwilson to be potential reviewers. |
It looks like I need to work on the |
@dweeezil Got any details on that failure? |
@skiselkov I'm sure it's because the test is more than just a bit of a hack and not anything wrong with the TRIM code. I think there's some way to get the test suite error output from the bots but I'm going to have to dig into what that way is. The test did work for at least one of the bots. |
The full test output is available from the log link. |
It looks like the test threshold needs to be a bit more liberal:
I'd love to have the test script calculate these numbers in some rational way, but for the time being, it looks like a simple adjustment ought to get it passing on all platforms. |
@dweeezil @behlendorf I'd appreciate if any of you guys could spare a moment to review openzfs/openzfs#172 . The sooner we get this pushed, the sooner I can get to upstreaming other work dependent on it (notably improved resilvering, which depends on some range_tree.c changes from TRIM). |
@skiselkov Yes, absolutely. I just wanted to make sure there were no major issues with this PR against ZoL first and it does appear to be the case now. |
@skiselkov I just finished a first-pass review which incorporates the issues discovered during the ZoL port as well as a couple of other minor things. |
I have reimaged my system and started to test the workflow i already looked at with PR #3656 (ntrim branch) System information:
workflow: results:
|
Looks like the trim tests didn't run. Are they perhaps missing from the Makefile.am?
A rebase on master should get you part the Amazon xfstests failure. I'm not sure what happened with the 32-bit build so I've resubmitted it. |
@behlendorf AFAICT, the necessary things are in the various Makefile.am files. I was getting that "failed verification" error, too, after re-doing the newer test suite stuff and it turned out that I was missing execute permissions on the files, which does appear to be there in the latest commit. In any case, I'll push a rebased and slightly squashed version and see what happens. |
ca90ef0
to
5f90dce
Compare
rerunning the above test with
shows the same result: Run 1 - 1.02 GB/s |
run zpool trim on an empty pool increases trimstats if i run zpool trim on an empty pool (no data has been written or deleted) the trimstats increase with every run
|
@fcrg Manual trim always trims everything. I did have a "only trim never-allocated space" option in earlier versions of the PR but have left it out, at least for the time being, for better alignment with the upstream PR. |
@kpande I've ported (untested) the partial trim patch to the current stack in dweeezil:ntrim2-with-partial-manual-trim. There are some subtle issues with this approach which I'll communicate to you off-list. This patch should behave just as it did before, otherwise. |
@dweezil Manual trim always trims everything means that trim doesn't work on the list of recently deleted items but trims always the not allocated space of the pool? |
@fcrg Yes, that's exactly what it does. The idea behind this is to allow one to fully recover a device to a clean state, regardless of whether it has been used before or not. |
@kpande Yeah, I saw that discussion. TBH, I see no problem in integrating this kind of feature upstream. |
@kpande My implementation used "zpool trim -p" to initiate a manual partial trim. Since @skiselkov seems OK with the idea and the rest of this stack seems to be passing pretty much everything else now, I'll re-submit it with the partial trim support. |
@dweeezil Do you want to hold the upstream PR for this feature, or are you ok with it going in as a separate commit? |
The blkdev_issue_discard() function has been available for a long time by the kernel but it only supports synchronous discards. The __blkdev_issue_discard() function provides an asynchronous interface but was added in the 4.6 kernel. Only supporting synchronously discards can potentially limit performance when processing a large number of small extents. To avoid this an asynchronous discard implementation has been added to vdev_disk.c which builds on existing functionality. The kernel provided synchronous version remains the default pending additional functional and performance testing. Due to different mechamism used for submitting TRIM commands there were not being properly accounted for in the extended statistics. Resolve this by allow for aggregated stats to be returned as part of the TRIM zio. This allows for far better visibility in to the discard request sizes. Minor documentation updates. Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Porting Notes: Man page changes dropped for the moment. This can be reconsiled when the final version is merged to OpenZFS. They are accurate now, only worded a little differently.
1) Removed the first-fit allocator. 2) Moved the autotrim metaslab scheduling logic into vdev_auto_trim. 2a) As a consequence of openzfs#2, metaslab_trimset_t was rendered superfluous. New trimsets are simple range_tree_t's. 3) Made ms_trimming_ts remove extents it is working on from ms_tree and then add them back in. 3a) As a consequence of openzfs#3, undone all the direct changes to the allocators and removed metaslab_check_trim_conflict and range_tree_find_gap. Porting Notes: * Removed WITH_*_ALLOCATOR macros and aligned remaining allocations with OpenZFS. Unused wariables warnings resolved with the gcc __attribute__ ((unused__ keyword. * Added missing calls for ms_condensing_cv. Signed-off-by: Brian Behlendorf <[email protected]>
Porting Notes: * metaslab_sync changes already applied. * resync of test cases needed
1) Simplified the SM_FREE spacemap writing while a trim is active. 2) Simplified the range_tree_verify in metaslab_check_free. 3) Clarified comment above metaslab_trim_all. 4) Substituted 'flust out' with 'drop' in comment in metaslab_trim_all. 5) Moved ms_prev_ts clearing up to ms_cur_ts claring in metaslab_trim_all. 6) Added recomputation of metaslab weight when metaslab is loaded. 7) Moved dmu_tx_commit inside of spa_trim_update_time. 8) Made the smallest allowable manual trim rate 1/1000th of a metaslab size. 9) Switched to using hrtime_t in manual trim timing logic. 10) Changed "limited" to "preserve_spilled" in vdev_auto_trim. 11) Moved vdev_notrim setting into zio_vdev_io_assess.a Porting Notes: * vdev_disk.c and zio.c hunks already applied. * nsec_per_tick -> MSEC2NSEC(1)
… wanting to condense.
Some storage backends such as large thinly-provisioned SANs are very slow for large trims. Manual trim now supports "zpool trim -p" (partial trim) to skip metaslabs for which there is no spacemap. Signed-off-by: Tim Chase <[email protected]>
The existing test cases were split in to multiple test cases and refactored. There are now test cases for the following: zpool_trim_001_pos - Verify manual TRIM zpool_trim_002_pos - Verify manual trim can be interrupted zpool_trim_003_pos - Verify 'zpool trim -s' rate limiting zpool_trim_004_pos - Verify 'zpool trim -p' partial TRIM works zpool_trim_005_neg - Verify bad parameters to 'zpool trim' zpool_trim_006_neg - Verify bad parameters to 'zpool trim -r' autotrim_001_pos - Verify 'autotrim=on' pool data integrity autotrim_002_pos - Verify various pool geometries manualtrim_001_pos - Verify manual trim pool data integrity manualtrim_002_pos - Verify various pool geometries manualtrim_003_pos - Verify 'zpool import|export' manualtrim_004_pos - Verify 'zpool online|offline|replace' manualtrim_005_pos - Verify TRIM and scrub run concurrently Signed-off-by: Brian Behlendorf <[email protected]>
* Rename TRIM taskq threads to be more concise for Linux. * Fix divide by zero panic Signed-off-by: Brian Behlendorf <[email protected]>
Rather than hacking `vdev_raidz_map_alloc()` to get the child offsets calculate the values directly. Signed-off-by: Isaac Huang <[email protected]>
* Fixed missing taskq_destroy when exporting a pool which is being actively trimmed. * Add auto/manual TRIM coverage to ztest. * Temporarily disable manualtrim_004_pos. Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
NOTE: should be squashed into a previous commit
Also, comment-out the ASSERT(!metaslab_should_allocate(msp, asize)); in metaslab_group_alloc_normal(). It seems that the additional metaslab_group_sort() performed by trim makes this assertion invalid.
Refreshed, finally, against a recent master. It does seem to work again, however, I've only given it very light manual testing and it does currently fail at least on of the ZTS tests (not looked into those yet). I figured it would be a good idea to get it through the bots anyway and to show some current activity on this. |
Replaced with #8255. |
Description
Add TRIM support. Replacement for #3656.
Motivation and Context
How Has This Been Tested?
Various stress testing with an assortment of vdev types.
Types of changes
Checklist:
This PR should integrate all the recent changes in the upstream patch set. The stack also includes the separate fixes which were in #3656. It seems stable so far during some fairly abusive testing on SSDs with various types of vdevs. It does not include the "partial" trim support of the previous PR.