Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use simple folio migration function to avoid BUG in migrate_folio_extra #16723

Closed
wants to merge 2 commits into from

Conversation

tstabrawa
Copy link
Contributor

@tstabrawa tstabrawa commented Nov 5, 2024

Motivation and Context

Starting in kernel version 6.3.0, Linux page migration is done in a batched manner, where several pages are first unmapped, and then the pages moved. See @RodoMa92's bisection test results in #15140 for details on the kernel change. As a result of this change, and due to ZFS's implicit use of fallback_migrate_folio (by not specifying a .migrate_folio function in its address_space_operations), when dirty pages are migrated, they are written back during the "move" step. Then, when the "move" step is retried, fallback_migrate_folio calls into migrate_folio, which in turn calls into migrate_folio_extra, and migrate_folio_extra BUG's out due to the page still being under writeback.

Note: The kernel page migration code will actually wait for any previously-started writeback during the "unmap" step, which could explain why unbatched retries didn't hit this problem prior to kernel 6.3.0. For example, in 6.2.16, unmap_and_move will wait for writeback to complete each time it retries migration on a page (such as would be the case if fallback_migrate_folio called writeout for the page). In contrast, in 6.3.1, since only the "move" step is repeated on move-related retries, the kernel doesn't wait for writeback operations started by fallback_migrate_folio.

A previous attempt at fixing this problem was merged as part of PR #16568, but this turned out not to be a complete fix. @RodoMa92 re-raised the problem and explained that, while the problem occurred less frequently, it still occurred approximately once per 20 attempts.

Description

Rather than ZFS using the default fallback_migrate_folio function as a result of not specifying a .migrate_folio function, this PR causes it to use the simple folio migration function (migrate_folio) instead. Notably, migrate_folio (since it's intended for filesystems that don't use buffers / private data) doesn't start writeback like fallback_migrate_folio does, and therefore migrate_folio_extra wouldn't BUG() on pages still under writeback, even without us setting PagePrivate on every page.

Additionally, this PR reverts the changes merged as part of PR #16568, as these changes are no longer necessary when using migrate_folio instead of fallback_migrate_folio.

A side effect of this change could be that ZFS page migration gets noticeably faster and/or more effective (as it would no longer be skipping and/or starting writeback on all dirty pages being migrated).

How Has This Been Tested?

Tested by user @RodoMa92 - results in the following comments on #16568:

Also, regression-tested by running ZFS Test Suite (on Ubunutu 23.10, running kernel version 6.5.0-44-generic. No crashes were observed, though several tests were skipped/failed/killed. I don't expect these errors were caused by my changes but would defer to your expertise if you have any concerns:

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

This reverts commit b052035.

Signed-off-by: tstabrawa <[email protected]>
@AllKind AllKind mentioned this pull request Nov 5, 2024
13 tasks
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Signed-off-by: tstabrawa <[email protected]>
@tstabrawa
Copy link
Contributor Author

Looks like I had a typo in the code for older kernel APIs (.migrate_page should be .migratepage). Amending my commit to fix.

diff --git a/module/os/linux/zfs/zpl_file.c b/module/os/linux/zfs/zpl_file.c
index f7ae381..f6e0143 100644
--- a/module/os/linux/zfs/zpl_file.c
+++ b/module/os/linux/zfs/zpl_file.c
@@ -1094,7 +1094,7 @@ const struct address_space_operations zpl_address_space_operations = {
 #ifdef HAVE_VFS_MIGRATE_FOLIO
 	.migrate_folio	= migrate_folio,
 #else
-	.migrate_page	= migrate_page,
+	.migratepage	= migrate_page,
 #endif
 };

@amotin amotin added the Status: Code Review Needed Ready for review and testing label Nov 5, 2024
@behlendorf behlendorf requested a review from bwatkinson November 6, 2024 01:22
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for running this down so quickly. This is a much cleaner fix and clearly the intended way to handle this.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Nov 6, 2024
@behlendorf behlendorf closed this in f38e2d2 Nov 6, 2024
behlendorf pushed a commit that referenced this pull request Nov 6, 2024
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: tstabrawa <[email protected]>
Closes #16568
Closes #16723
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 6, 2024
This reverts commit b052035.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: tstabrawa <[email protected]>
Closes openzfs#16568
Closes openzfs#16723
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 6, 2024
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: tstabrawa <[email protected]>
Closes openzfs#16568
Closes openzfs#16723
@RodoMa92
Copy link

RodoMa92 commented Nov 9, 2024

Thanks for this, @tstabrawa. Still held up fine as of today. I feel much more confident now to say it's finally closed, and allow me again to use the amazing features of ZFS.

Again, thanks a lot :)

Marco

ixhamza pushed a commit to truenas/zfs that referenced this pull request Nov 11, 2024
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: tstabrawa <[email protected]>
Closes openzfs#16568
Closes openzfs#16723
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants