Remove unnecessary txg syncs from receive_object() #7197
Conversation
1b66810 introduced several changes which improved the reliability of zfs sends when large dnodes were involved. However, these fixes required adding a few calls to txg_wait_synced() in the DRR_OBJECT handling code. Although most of them are currently necessary, this patch allows the code to continue without waiting in some cases where it doesn't have to. Signed-off-by: Tom Caputi <[email protected]>
Force-pushed from 95dc40c to e0affc8.
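To make the idea concrete, here is a minimal toy model of the decision the description above is talking about. All names and types here are hypothetical stand-ins, not the actual receive_object() code (which operates on a DRR_OBJECT record in the receive path): a fresh allocation or an in-place reuse of a dnode can proceed immediately, while freeing and reallocating an existing dnode with a different slot count still has to wait for the pending free to be synced out via txg_wait_synced().

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model only: hypothetical stand-ins for illustration,
 * not the real ZFS structures or API.
 */
struct object_state {
	bool exists;      /* does the object already exist on disk? */
	int  dnode_slots; /* dnode slots currently allocated to it  */
};

struct object_record {
	int dnode_slots;  /* slots the incoming DRR_OBJECT asks for */
};

/*
 * Return true only for the case that still needs a txg sync:
 * an existing dnode that must be freed and reallocated with a
 * different number of slots. The other paths can continue
 * without waiting.
 */
static bool
needs_txg_sync(const struct object_state *obj,
    const struct object_record *drro)
{
	if (!obj->exists)
		return (false); /* fresh allocation: no wait needed */
	if (obj->dnode_slots == drro->dnode_slots)
		return (false); /* reused in place: no wait needed  */
	return (true);          /* free + realloc: must wait        */
}

int
main(void)
{
	struct object_state obj = { .exists = true, .dnode_slots = 1 };
	struct object_record drro = { .dnode_slots = 2 };

	printf("txg sync needed: %s\n",
	    needs_txg_sync(&obj, &drro) ? "yes" : "no");
	return (0);
}
```

The point of the patch, per the description, is shrinking the set of inputs for which the "must wait" branch is taken, since each txg_wait_synced() stalls the receive for a full transaction-group sync.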
Codecov Report
@@            Coverage Diff            @@
##           master    #7197     +/-  ##
=========================================
- Coverage   76.36%   76.26%    -0.1%
=========================================
  Files         327      327
  Lines      103785   103788       +3
=========================================
- Hits        79255    79157      -98
- Misses      24530    24631     +101

Continue to review full report at Codecov.
A while ago I reported poor speeds on sends and receives for file systems that had large dnodes enabled. It looked like this:
After this PR, it now looks like this (10s interval):
Again, many IOPS and low bandwidth, but it finished in two or three minutes instead of a couple of hours. So this PR helped.
Glad to hear it. There is still some more work to do here to get back up to where we were before, but this should help about 75% of it.
Could this have regressed lately? I'm on …
The send stream is 1.3 GB.
This patch did not completely address the problem, but it should have helped a lot. I'm not at my computer right now, but can you provide the last 100 lines or so of /proc/spl/kstat/zfs/dbgmsg? Make sure that /sys/module/zfs/parameters/zfs_flags has bit 9 set first. If you don't know how to do that I can provide better instructions when I am near my computer.
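For anyone following along, those steps look roughly like this on Linux. This is a sketch under the assumption that bit 9 of zfs_flags corresponds to ZFS_DEBUG_SET_ERROR (value 512), which logs SET_ERROR hits to the debug buffer:

```sh
# Set bit 9 (value 512) in zfs_flags without clobbering flags
# that are already set.
echo $(( $(cat /sys/module/zfs/parameters/zfs_flags) | 512 )) | \
    sudo tee /sys/module/zfs/parameters/zfs_flags

# After reproducing the slow receive, capture the tail of the log:
tail -n 100 /proc/spl/kstat/zfs/dbgmsg
```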
I checked a couple of older snapshots; the incremental send streams for them were around 100-500 MB, so this 1.3 GB one has reasons to take longer. [debug log attached]
Looks like this send had a pretty big number of datasets within it. I'm not sure this has anything to do with large dnodes or txg syncs. Might be a good idea to make a new issue for this if the problem is severe enough.
Unless I copied too much from the log, it should be a single incremental send for a single file system. I'll try again tomorrow with a new destination fs and maybe open another issue, even if just for tracking the remaining changes from this PR. By the way, is there anything out of the ordinary in my setup? Are receives relatively slow for everyone with large dnodes enabled?
Hard to say if there is anything different about your setup without some more context. I was assuming that the stream included many datasets because of the repeated groupings of fzap_checksize() calls, which in my experience are usually associated with opening new datasets for mounting or receiving. Large dnode receives may be slower than legacy dnode receives for some workloads, but this does not appear to be happening to you based on the errors reported in the log. Sorry for short answers with no formatting. Hard to look at some of this from my phone.
I'm not sure any more. I cancelled the first one -- I didn't want to, but something didn't handle … Does a rolled back receive have visible effects?
It shouldn't... Not really sure what might have been going on.