Skip to content

Commit

Permalink
mempolicy: fix migrate_pages(2) syscall return nr_failed
Browse files Browse the repository at this point in the history
[ Upstream commit 1cb5d11 ]

"man 2 migrate_pages" says "On success migrate_pages() returns the number
of pages that could not be moved".  Although 5.3 and 5.4 commits fixed
mbind(MPOL_MF_STRICT|MPOL_MF_MOVE*) to fail with EIO when not all pages
could be moved (because some could not be isolated for migration),
migrate_pages(2) was left still reporting only those pages failing at the
migration stage, forgetting those failing at the earlier isolation stage.

Fix that by accumulating a long nr_failed count in struct queue_pages,
returned by queue_pages_range() when it's not returning an error, for
adding on to the nr_failed count from migrate_pages() in mm/migrate.c.  A
count of pages?  It's more a count of folios, but changing it to pages
would entail more work (also in mm/migrate.c): does not seem justified.

queue_pages_range() itself should only return -EIO in the "strictly
unmovable" case (STRICT without any MOVEs): in that case it's best to
break out as soon as nr_failed gets set; but otherwise it should continue
to isolate pages for MOVing even when nr_failed - as the mbind(2) manpage
promises.

There's a case when nr_failed should be incremented when it was missed:
queue_folios_pte_range() and queue_folios_hugetlb() count the transient
migration entries, like queue_folios_pmd() already did.  And there's a
case when nr_failed should not be incremented when it would have been: in
meeting later PTEs of the same large folio, which can only be isolated
once: fixed by recording the current large folio in struct queue_pages.

Clean up the affected functions, fixing or updating many comments.  Bool
migrate_folio_add(), without -EIO: true if adding, or if skipping shared
(but its arguable folio_estimated_sharers() heuristic left unchanged).
Use MPOL_MF_WRLOCK flag to queue_pages_range(), instead of bool lock_vma.
Use explicit STRICT|MOVE* flags where queue_pages_test_walk() checks for
skipping, instead of hiding them behind MPOL_MF_VALID.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tejun heo <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Stable-dep-of: 091c1dd ("mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM")
Signed-off-by: Sasha Levin <[email protected]>
  • Loading branch information
Hugh Dickins authored and gregkh committed Dec 14, 2024
1 parent 8f149bc commit cc42489
Showing 1 changed file with 159 additions and 179 deletions.
Loading

0 comments on commit cc42489

Please sign in to comment.