Fix ENOSPC in "Handle zap_add() failures in mixed ..." #7421
Conversation
Force-pushed from eb6d663 to cc83da9
Thanks for your analysis, and for adding a test case! Are we seeing that two ZAPs end up with the same salt, or is it that even with (slightly?) different salts, the low bits of the hash value are preserved? Either way, perhaps we should consider initializing the salt with …
@ahrens @trisk …
@tuxoko Good work. This explains things nicely.
@tuxoko Thanks, makes sense that we'd need the failure to allow expansions on an existing fatzap.
@tuxoko I tested with … I would guess that this is a property of the zap hash function. We should consider replacing it with something better, e.g. cityhash (https://github.com/google/cityhash/). But that would require a ZPL on-disk version bump, and obviously your change should go forward without that.
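For intuition on the salt question above: the ZAP hash is a table-driven CRC-64 seeded with the directory's salt, and a CRC is linear over GF(2), so changing only the seed XORs the same constant into the hash of every name of a given length. Pairwise hash distances between same-length names therefore survive a salt change. A minimal sketch demonstrating the effect — the polynomial and salt values here are illustrative assumptions, not necessarily ZFS's exact ones:

```c
/*
 * Sketch: a table-driven CRC-64 in the same shape as the ZAP hash.
 * Because the CRC is linear over GF(2), h(salt, a) ^ h(salt, b) is
 * independent of the salt for equal-length names a and b.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t crc64_table[256];

static void
crc64_init(void)
{
	/* Reflected ECMA-182 polynomial; assumed for illustration. */
	const uint64_t poly = 0xC96C5795D7870F42ULL;

	for (int i = 0; i < 256; i++) {
		uint64_t c = i;
		for (int j = 0; j < 8; j++)
			c = (c & 1) ? (c >> 1) ^ poly : c >> 1;
		crc64_table[i] = c;
	}
}

static uint64_t
hash_sketch(uint64_t salt, const char *name)
{
	uint64_t h = salt;

	for (const char *p = name; *p != '\0'; p++)
		h = (h >> 8) ^ crc64_table[(h ^ (uint8_t)*p) & 0xFF];
	return (h);
}

int
main(void)
{
	/* Hypothetical per-directory salts. */
	uint64_t s1 = 0x123456789abcdef0ULL;
	uint64_t s2 = 0x0fedcba987654321ULL;
	const char *a = "file000001", *b = "file000002";

	crc64_init();

	uint64_t d1 = hash_sketch(s1, a) ^ hash_sketch(s1, b);
	uint64_t d2 = hash_sketch(s2, a) ^ hash_sketch(s2, b);

	/* Same-length names keep the same hash distance under any salt. */
	assert(d1 == d2);
	printf("hash distance %016llx is salt-independent\n",
	    (unsigned long long)d1);
	return (0);
}
```

Since XOR with a constant also preserves equality of hash prefixes, a cluster of names that packed one leaf in the source directory tends to cluster identically in the destination, salt or no salt.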
@ahrens It is not likely to change your conclusions, but I should point out the implementations of … (https://github.com/zfsonlinux/spl/blob/master/module/spl/spl-generic.c#L64). That implies that the tests that the two of you did were using different sources of randomness, but came to the same result. You might have expected that, but I thought I would state it explicitly.
done
log_must test $NR_FILES -eq $(ls -U $TESTDIR/src | wc -l)
Can you add a quick comment here explaining why you use cp_files instead of just cp?
Added
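For context on the helper itself: the point of cp_files is to copy entries in exactly the order readdir() returns them, which for a ZAP-backed directory is hash order. A minimal sketch of what such a helper does — hypothetical code, the real test utility may differ in options and error handling:

```c
/*
 * Sketch of a cp_files-style helper: copy every entry of <srcdir>
 * into <dstdir> in readdir() order. For a ZAP-backed directory,
 * readdir order is hash order, which is what makes copying in this
 * order stress leaf expansion in the destination directory.
 */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <srcdir> <dstdir>\n", argv[0]);
		return (1);
	}

	DIR *dir = opendir(argv[1]);
	if (dir == NULL) {
		perror("opendir");
		return (1);
	}

	struct dirent *de;
	char buf[65536], src[4096], dst[4096];

	while ((de = readdir(dir)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;

		snprintf(src, sizeof (src), "%s/%s", argv[1], de->d_name);
		snprintf(dst, sizeof (dst), "%s/%s", argv[2], de->d_name);

		int sfd = open(src, O_RDONLY);
		int dfd = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0644);
		if (sfd < 0 || dfd < 0) {
			perror("open");
			return (1);
		}

		ssize_t n;
		while ((n = read(sfd, buf, sizeof (buf))) > 0)
			if (write(dfd, buf, n) != n) {
				perror("write");
				return (1);
			}

		close(sfd);
		close(dfd);
	}

	closedir(dir);
	return (0);
}
```

Plain cp gives no such ordering guarantee, so it would not reliably reproduce the bug.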
I separated out your test case code and tried it on master with/without the bad patch (cc63068). With the bad patch, it correctly failed 18/20 times. When I bumped …
Force-pushed from 2ddec57 to 2ac5646
@tonyhutter …
@tuxoko thanks for getting to the bottom of this.
WD=$(pwd)
cd $TESTDIR/src
# create NR_FILES in BATCH at a time to prevent overlowing argument buffer
overflowing?
log_must test $NR_FILES -eq $(ls -U $TESTDIR/src | wc -l)
# copy files from src to dst, use cp_files to make sure readdir order
nit: s/make sure/ensure
Fix the comment.
Codecov Report
@@ Coverage Diff @@
## master #7421 +/- ##
==========================================
- Coverage 76.53% 76.5% -0.03%
==========================================
Files 335 331 -4
Lines 107100 104299 -2801
==========================================
- Hits 81965 79792 -2173
+ Misses 25135 24507 -628
Continue to review full report at Codecov.
Thank you @tuxoko for handling this while I was away. I agree with the changes. The optimisation of bailing out early was to avoid growing the table unnecessarily. But, as seen in this bug, we cannot deterministically decide when. Hence, we would need to live with the extra indirection.
This isn't the best place to ask, but is there a way to detect if you've been affected by this issue yet? If not, will the FAQ be updated when there is?
Commit cc63068 caused an ENOSPC error when copying a large number of files between two directories. The reason is that the patch limits zap leaf expansion to 2 retries, and returns ENOSPC when it fails.

The intent of limiting retries is to prevent pointlessly growing the table to max size when adding a block full of entries with the same name in different case in mixed mode. However, it turns out we cannot use any limit on the retry. When we copy files from one directory in readdir order, we are copying in hash order, one leaf block at a time. This means that if the leaf block in the source directory has expanded 6 times, and you copy the entries in that block, then by the time you need to expand the leaf in the destination directory, you need to expand it 6 times in one go. So any limit on the retry will result in an error where it shouldn't.

Note that while we do use a different salt for each directory, it seems that the salt/hash function doesn't provide enough randomization of the hash distance to prevent this from happening.

Since cc63068 has already been reverted, this patch adds it back and removes the retry limit.

Also, as it turns out, failing in zap_add() has a serious side effect for mzap_upgrade(). When upgrading from a micro zap to a fat zap, it calls zap_add() to transfer entries one at a time. If it hits any error halfway through, the remaining entries will be lost, causing those files to become orphans. This patch adds a VERIFY to catch it.

Reviewed-by: Sanjeev Bagewadi <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Albert Lee <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#7401
Closes openzfs#7421
@ryao apologies for interrupting you with my tagging. I know this is probably very low priority, but is there any way for a user without ZFS source code reading ability to verify whether they have been impacted by this issue? I read on the remedy pull request that this is triggered by copying a "large amount of files", but it does not mention the time distribution of those files. Is there a rate number to compare that guarantees the issue was (or was not) triggered? Like N files per second or per CPU cycle? Thank you! :)
Originally Solaris didn't expect errors there, but they may happen if we fail to add an entry into the ZAP. Linux fixed it in openzfs#7421, but it was never fully ported to FreeBSD.

Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes openzfs#13215
Originally Solaris didn't expect errors there, but they may happen if we fail to add an entry into the ZAP. Linux fixed it in #7421, but it was never fully ported to FreeBSD.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes #13215
Closes #16138
Commit cc63068 caused an ENOSPC error when copying a large number of files
between two directories. The reason is that the patch limits zap leaf
expansion to 2 retries, and returns ENOSPC when it fails.

The intent of limiting retries is to prevent pointlessly growing the table
to max size when adding a block full of entries with the same name in
different case in mixed mode. However, it turns out we cannot use any
limit on the retry. When we copy files from one directory in readdir
order, we are copying in hash order, one leaf block at a time. This means
that if the leaf block in the source directory has expanded 6 times, and
you copy the entries in that block, then by the time you need to expand
the leaf in the destination directory, you need to expand it 6 times in
one go. So any limit on the retry will result in an error where it
shouldn't.

Note that while we do use a different salt for each directory, it seems
that the salt/hash function doesn't provide enough randomization of the
hash distance to prevent this from happening.

Since cc63068 has already been reverted, this patch adds it back and
removes the retry limit.

Also, as it turns out, failing in zap_add() has a serious side effect for
mzap_upgrade(). When upgrading from a micro zap to a fat zap, it calls
zap_add() to transfer entries one at a time. If it hits any error halfway
through, the remaining entries will be lost, causing those files to become
orphans. This patch adds a VERIFY to catch it.
Signed-off-by: Chunwei Chen <[email protected]>
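To make the "expand it 6 times in one go" scenario concrete, here is a toy simulation — not ZAP code: the leaf capacity, the split rule, and the shared prefix width are illustrative assumptions. A batch of entries whose hashes share a 6-bit prefix, which is what reading back one fully expanded source leaf produces, needs more consecutive leaf splits than the reverted patch's 2-retry limit tolerated:

```c
/*
 * Toy simulation: insert a batch of hashes that all share their top
 * 6 bits (as entries read back from one source leaf that expanded 6
 * times would) into a fresh, empty destination leaf, and count the
 * consecutive splits needed before the batch fits.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define	LEAF_CAP	8	/* toy capacity; real ZAP leaves hold more */
#define	NENTRIES	(LEAF_CAP + 1)

/* bit i of h, counting from the most significant bit */
static int
bit(uint64_t h, int i)
{
	return ((h >> (63 - i)) & 1);
}

int
main(void)
{
	uint64_t leaf[NENTRIES];
	int n = NENTRIES, prefix_len = 0, splits = 0;

	srand(7421);
	for (int i = 0; i < NENTRIES; i++) {
		uint64_t h = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
		leaf[i] = (h >> 6) | (0x2AULL << 58); /* shared 6-bit prefix */
	}

	/*
	 * Split the over-full leaf until the batch fits: each split
	 * deepens the prefix by one bit, and we follow the half that
	 * keeps the most entries (the leaf that keeps overflowing).
	 */
	while (n > LEAF_CAP) {
		int zeros = 0, side, keep = 0;

		for (int i = 0; i < n; i++)
			zeros += (bit(leaf[i], prefix_len) == 0);
		side = (zeros > n - zeros) ? 0 : 1;

		for (int i = 0; i < n; i++)
			if (bit(leaf[i], prefix_len) == side)
				leaf[keep++] = leaf[i];
		n = keep;
		prefix_len++;
		splits++;
	}

	/* With the reverted patch's 2-retry cap, splits > 2 => ENOSPC. */
	printf("needed %d consecutive splits for one batch\n", splits);
	return (0);
}
```

The first 6 splits shed no entries at all, because every hash in the batch falls on the same side; only once the shared prefix is exhausted does splitting start to spread the load, which is why no fixed retry limit can be safe.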
Motivation and Context
#7401