Fix zfs_write() / mmap update_time() lock inversion #7942

Closed

behlendorf
Contributor

Motivation and Context

Alternate approach to #7939, which is intended to address #7512.
Pushed for the purposes of review feedback, testing, and letting
the CI test it out.

Description

When a page is faulted in by filemap_page_mkwrite() this function
may be called by update_time() with the file's mmap_sem held.
Therefore it's necessary to use TXG_NOWAIT since we cannot release
the mmap_sem, and even if we could, it would be undesirable to
delay the page fault. TXG_NOTHROTTLE will be set as needed to
bypass the write throttle. In the unlikely case the transaction
cannot be assigned, set z_atime_dirty=1 so at least the times will
be updated when the file is closed.
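
The fallback described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the code in this PR: `sk_znode`, `sk_tx_assign()`, `sk_dirty_inode()` and the `SK_` constants are hypothetical stand-ins for the real ZFS structures and calls, and the mock never touches a real transaction group. It shows a non-blocking assignment attempt that, on failure, marks the times dirty so they can be written out later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the values are illustrative, not the ZFS ones. */
enum { SK_TXG_NOWAIT = 1 << 0 };
enum { SK_ERESTART = 85 };

/* Mock znode carrying only the field named in the description. */
struct sk_znode {
	bool z_atime_dirty;
};

/* Mock assignment: fails immediately instead of blocking when busy. */
static int
sk_tx_assign(bool txg_busy, int flags)
{
	assert(flags & SK_TXG_NOWAIT);	/* this path must never sleep */
	return (txg_busy ? SK_ERESTART : 0);
}

/* Returns true if the times were written now, false if deferred. */
static bool
sk_dirty_inode(struct sk_znode *zp, bool txg_busy)
{
	int error = sk_tx_assign(txg_busy, SK_TXG_NOWAIT);

	if (error != 0) {
		/* Could not assign without waiting: defer to file close. */
		zp->z_atime_dirty = true;
		return (false);
	}
	return (true);
}
```

Calling `sk_dirty_inode()` with `txg_busy` set to false returns true and leaves the flag untouched; with `txg_busy` set to true it returns false and sets `z_atime_dirty`.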

How Has This Been Tested?

Locally built and tested using the reproducer provided in #7939.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

behlendorf requested a review from tuxoko on September 21, 2018 22:04
*/
if (flags == I_DIRTY_TIME) {
if (flags & I_DIRTY_TIME &
Contributor

&&

Contributor Author

Fixed
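
To make the `&` versus `&&` remark concrete, here is a small sketch. The second operand of the condition is cut off in the excerpt above, so `cond` below is a hypothetical placeholder for it, and `SK_I_DIRTY_TIME` uses an illustrative bit value rather than the kernel's definition. With a bitwise `&`, a truth value of 1 is masked against the flag bit and the test can come out false even when both conditions hold.

```c
#include <assert.h>
#include <stdbool.h>

enum { SK_I_DIRTY_TIME = 1 << 11 };	/* illustrative bit value */

/* Bitwise form: (flags & bit & cond) mixes a flag bit with a truth value. */
static bool
sk_check_bitwise(int flags, int cond)
{
	assert(cond == 0 || cond == 1);	/* cond is a plain truth value */
	return ((flags & SK_I_DIRTY_TIME & cond) != 0);
}

/* Logical form: test the flag bit, then combine with the condition. */
static bool
sk_check_logical(int flags, int cond)
{
	assert(cond == 0 || cond == 1);
	return ((flags & SK_I_DIRTY_TIME) && cond);
}
```

With `flags` equal to `SK_I_DIRTY_TIME` and `cond` equal to 1, the bitwise form evaluates `0x800 & 1`, which is 0, while the logical form is true.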

* cannot be assigned set z_atime_dirty=1 so at least the times will
* be updated when the file is closed.
*/
boolean_t waited = B_FALSE;
Contributor

I'm not sure I understand this. Is this initialization going to take effect once or every time?
We should move this to beginning of the function.

Contributor Author

Yes, of course, thank you!
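
The question about whether the initialization runs once or every time can be illustrated as follows. The sketch assumes a retry label sits before the declaration, which the excerpt above does not show; both function names are hypothetical. In C, an initializer is performed each time execution reaches the declaration, so a `goto` back to a label above it resets the variable.

```c
#include <assert.h>
#include <stdbool.h>

/* Declaration after the label: `waited` is reset on every `goto top`. */
static int
sk_attempts_decl_after_label(int limit)
{
	int attempts = 0;

	assert(limit >= 1);
top:
	;	/* empty statement; before C23 a label cannot precede a declaration */
	bool waited = false;
	attempts++;
	if (!waited && attempts < limit) {
		waited = true;	/* lost again as soon as we jump back */
		goto top;
	}
	return (attempts);
}

/* Declaration before the label: `waited` is initialized exactly once. */
static int
sk_attempts_decl_before_label(int limit)
{
	int attempts = 0;
	bool waited = false;

	assert(limit >= 1);
top:
	attempts++;
	if (!waited && attempts < limit) {
		waited = true;
		goto top;
	}
	return (attempts);
}
```

With a limit of 5, the first variant keeps retrying until the limit stops it, while the second retries only once.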

* cannot be assigned set z_atime_dirty=1 so at least the times will
* be updated when the file is closed.
*/
error = dmu_tx_assign(tx, (waited ? TXG_NOTHROTTLE : 0) | TXG_NOWAIT);
Contributor

In dmu_tx_try_assign:

			if (dn->dn_assigned_txg == tx->tx_txg - 1) {
				mutex_exit(&dn->dn_mtx);
				tx->tx_needassign_txh = txh;
				DMU_TX_STAT_BUMP(dmu_tx_group);
				return (SET_ERROR(ERESTART));
			}

It doesn't seem that TXG_NOTHROTTLE will prevent dmu_tx_assign from failing? Perhaps we should bail out on waited && error == ERESTART?

Contributor Author

Good idea. You're right, it won't prevent it from failing under all circumstances, only due to the write throttle. I assume your specific concern is that we could end up spinning here, which would be just as bad.
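
A bounded retry along the lines suggested (bail out on `waited && error == ERESTART`) might look like the sketch below. It is a userspace mock under stated assumptions: `rt_tx_assign()`, `rt_assign_bounded()` and the `RT_` constants are hypothetical and only mimic the shape of the flags expression quoted above. The mock fails a configurable number of times so the attempt count can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the values are illustrative, not the ZFS ones. */
enum { RT_TXG_NOWAIT = 1 << 0, RT_TXG_NOTHROTTLE = 1 << 1 };
enum { RT_ERESTART = 85 };

/* Mock assignment: returns RT_ERESTART while *failures_left > 0. */
static int
rt_tx_assign(int *failures_left, int flags)
{
	assert(flags & RT_TXG_NOWAIT);	/* never block on this path */
	if (*failures_left > 0) {
		(*failures_left)--;
		return (RT_ERESTART);
	}
	return (0);
}

/* Returns the number of attempts made; *errp receives the final error. */
static int
rt_assign_bounded(int failures, int *errp)
{
	bool waited = false;	/* initialized once, before the retry label */
	int attempts = 0;
	int error;

top:
	attempts++;
	error = rt_tx_assign(&failures,
	    (waited ? RT_TXG_NOTHROTTLE : 0) | RT_TXG_NOWAIT);
	if (error == RT_ERESTART && !waited) {
		waited = true;	/* retry exactly once, bypassing the throttle */
		goto top;
	}
	/* Either success, or ERESTART after the retry: stop instead of spinning. */
	*errp = error;
	return (attempts);
}
```

However many times the mock keeps failing, the loop makes at most two attempts and then returns the error to the caller, which can fall back to the deferred update.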

dmu_tx_abort(tx);
goto out;
zp->z_atime_dirty = 1;
Contributor

The problem with this is that we only do a lazy update for atime, not for mtime and ctime. We need to add that, otherwise we will lose those.

Contributor Author

That's true; unfortunately we don't currently have a mechanism for that. Deferring the atime update to zfs_inactive already isn't really correct, but the machinery was already there, and I figured it was better than nothing for this unlikely case. To handle all cases here, I think we'd need to redirty the inode.

codecov bot commented Sep 22, 2018

Codecov Report

Merging #7942 into master will increase coverage by 0.01%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #7942      +/-   ##
==========================================
+ Coverage   43.95%   43.96%   +0.01%     
==========================================
  Files         318      318              
  Lines      103561   103566       +5     
==========================================
+ Hits        45521    45535      +14     
+ Misses      58040    58031       -9
Flag      Coverage Δ
#kernel   7.61% <0%> (-0.01%) ⬇️
#user     49.96% <ø> (+0.02%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 36e369e...4715f1a.

When a page is faulted in by filemap_page_mkwrite() this function
may be called by update_time() with the file's mmap_sem held.
Therefore it's necessary to use TXG_NOWAIT since we cannot release
the mmap_sem, and even if we could, it would be undesirable to
delay the page fault.  TXG_NOTHROTTLE will be set as needed to
bypass the write throttle.  In the unlikely case the transaction
cannot be assigned set z_atime_dirty=1 so at least the times will
be updated when the file is closed.

Signed-off-by: Brian Behlendorf <[email protected]>
@behlendorf
Contributor Author

Closing in favor of the proposal in #7939.

behlendorf closed this on Sep 25, 2018
behlendorf added the Status: Abandoned label and removed the Status: Code Review Needed label on Sep 25, 2018
behlendorf deleted the issue-7512 branch on April 19, 2021 20:24