Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assert that a dnode's bonuslen never exceeds its recorded size #8348

Merged
merged 1 commit into from
Aug 15, 2019

Conversation

sdimitro
Copy link
Contributor

This patch introduces an assertion that can catch pitfalls in
development where there is a mismatch between the size of
reads and writes between a *_phys structure and its respective
in-core structure when bonus buffers are used.

This debugging-aid should be complementary to the verification
done by ztest in ztest_verify_dnode_bt().

A side to this patch is that we now clear out any extra bytes
past a bonus buffer's new size when the buffer is shrinking.

Signed-off-by: Serapheim Dimitropoulos [email protected]

Side Note

This patch is part of a series of side-fixes and refactoring that are part of the Log Spacemap commit in DelphixOS (see delphix/delphix-os@fe7bf6c) and is separated from it to make upstreaming of the actual feature easier.

The change itself is not essential to the Log Spacemap feature but it did helped during development and did catch a few issues. Thus, I've decided to create a separate PR for it.

How Has This Been Tested?

  • Compiled in ZoL and ran ZTS internally
  • Has been running in DelphixOS as part of the Log Spacemap commit for a while now
  • Before adding the code that clears out the extra bytes when a buffer is shrinked, I made sure that the assertion works by detecting the data past the recorded size.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Jan 28, 2019
@sdimitro sdimitro added Status: Revision Needed Changes are required for the PR to be accepted and removed Status: Code Review Needed Ready for review and testing labels Jan 28, 2019
@sdimitro
Copy link
Contributor Author

Changed status back to "Revision Needed".

It seems like pretty often we crash on ZTS after the following test:

...
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_to_encrypted (run as root) [00:00] [PASS]

And the reason is because we hit the newly added assertion. Looking into this now..

Copy link
Member

@ahrens ahrens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good. Obviously we need to investigate the test failure. It's possible that this is uncovering an existing bug, which would be exciting!

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm all for additional verification checks, and curious to see what caused the failure.

module/zfs/dbuf.c Outdated Show resolved Hide resolved
@sdimitro sdimitro force-pushed the delphix-dnode-verify branch from 2f2bb92 to dd09539 Compare July 16, 2019 20:38
@sdimitro
Copy link
Contributor Author

sdimitro commented Jul 16, 2019

A lot of send-receive bugs have been opened and closed since this PR was last touched.
Rebasing code and rerunning tests

@behlendorf behlendorf added Status: Code Review Needed Ready for review and testing and removed Status: Revision Needed Changes are required for the PR to be accepted labels Jul 16, 2019
@sdimitro
Copy link
Contributor Author

It seems like the introduced verification progresses by one send/receive test from what it did before but again hits another issue later:

Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_from_encrypted (run as root) [00:01] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_to_encrypted (run as root) [00:01] [PASS]

Will investigate when I get the chance.

@behlendorf behlendorf added Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels Jul 17, 2019
This patch introduces an assertion that can catch pitfalls in
development where there is a mismatch between the size of
reads and writes between a *_phys structure and its respective
in-core structure when bonus buffers are used.

This debugging-aid should be complementary to the verification
done by ztest in ztest_verify_dnode_bt().

A side to this patch is that we now clear out any extra bytes
past a bonus buffer's new size when the buffer is shrinking.

Signed-off-by: Serapheim Dimitropoulos <[email protected]>
@sdimitro sdimitro force-pushed the delphix-dnode-verify branch from dd09539 to a62f306 Compare August 13, 2019 17:19
@sdimitro sdimitro added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Aug 13, 2019
@sdimitro
Copy link
Contributor Author

Updated the review, to skip the case where the bonus buffer is encrypted (it seems like we sometimes write pass the recorded bonuslen of those on purpose).

The rest of the tests that were failing, seem to pass now.

@codecov
Copy link

codecov bot commented Aug 14, 2019

Codecov Report

Merging #8348 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8348      +/-   ##
==========================================
+ Coverage   79.11%   79.13%   +0.01%     
==========================================
  Files         400      400              
  Lines      121790   121806      +16     
==========================================
+ Hits        96357    96392      +35     
+ Misses      25433    25414      -19
Flag Coverage Δ
#kernel 79.72% <100%> (-0.03%) ⬇️
#user 67.05% <100%> (+0.49%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8242a9...a62f306. Read the comment docs.

@behlendorf
Copy link
Contributor

@sdimitro I think that makes sense. But let's run this by @tcaputi to verify that's expected.

I've resubmitted the two TEST runs which failed, though the reported failures did appear to be unrelated known issues.

@sdimitro sdimitro requested a review from tcaputi August 14, 2019 14:54
Copy link
Contributor

@tcaputi tcaputi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me

@sdimitro sdimitro added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Aug 14, 2019
@behlendorf behlendorf merged commit 0e37a0f into openzfs:master Aug 15, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 24, 2019
This patch introduces an assertion that can catch pitfalls in
development where there is a mismatch between the size of
reads and writes between a *_phys structure and its respective
in-core structure when bonus buffers are used.

This debugging-aid should be complementary to the verification
done by ztest in ztest_verify_dnode_bt().

A side to this patch is that we now clear out any extra bytes
past a bonus buffer's new size when the buffer is shrinking.

Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes openzfs#8348
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 27, 2019
This patch introduces an assertion that can catch pitfalls in
development where there is a mismatch between the size of
reads and writes between a *_phys structure and its respective
in-core structure when bonus buffers are used.

This debugging-aid should be complementary to the verification
done by ztest in ztest_verify_dnode_bt().

A side to this patch is that we now clear out any extra bytes
past a bonus buffer's new size when the buffer is shrinking.

Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes openzfs#8348
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
This patch introduces an assertion that can catch pitfalls in
development where there is a mismatch between the size of
reads and writes between a *_phys structure and its respective
in-core structure when bonus buffers are used.

This debugging-aid should be complementary to the verification
done by ztest in ztest_verify_dnode_bt().

A side to this patch is that we now clear out any extra bytes
past a bonus buffer's new size when the buffer is shrinking.

Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8348
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants