
Gradual decommit in wks #73620

Merged 9 commits into dotnet:main on Sep 28, 2022

Conversation

PeterSolMS
Contributor

This addresses a performance regression with regions which was caused by decommitting memory more aggressively than with segments.

The fix is to essentially replicate the logic used with segments - we use the time elapsed since the last GC to compute the size we should decommit in this GC. The goal is to ramp down gradually rather than suddenly, in case the application uses more memory again in the future.

This is the regions version modeled after the behavior of the segments version.
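
To make the ramp-down concrete, here is a rough sketch of the throttling computation (illustrative only: compute_decommit_budget and decommit_size_per_ms are made-up names and the per-millisecond rate is an assumed value; the actual logic uses DECOMMIT_SIZE_PER_MILLISECOND and lives in the GC itself):

#include <algorithm>
#include <cstddef>

// assumed rate, for illustration only; the runtime's constant is DECOMMIT_SIZE_PER_MILLISECOND
const size_t decommit_size_per_ms = 64 * 1024;

// limit how much we decommit in this GC based on the time elapsed since the last GC,
// capping the elapsed time at 10 seconds so one long pause can't trigger a huge decommit
size_t compute_decommit_budget (size_t desired_decommit, size_t elapsed_ms)
{
    size_t budget = std::min (elapsed_ms, (size_t)(10 * 1000)) * decommit_size_per_ms;
    return std::min (desired_decommit, budget);
}

With, say, 200 ms between GCs, at most 200 * decommit_size_per_ms bytes get decommitted, so committed memory ramps down over several GCs instead of dropping all at once.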

Idea is simply to limit the amount of decommitted memory based on the time since the last GC.
…es the logic for the WKS decommit more straightforward.
@ghost

ghost commented Aug 9, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

This addresses a performance regression with regions which was caused by decommitting memory more aggressively than with segments.

The fix is to essentially replicate the logic used with segments - we use the time elapsed since the last GC to compute the size we should decommit in this GC. The goal is to ramp down gradually rather than suddenly, in case the application uses more memory again in the future.

Author: PeterSolMS
Assignees: -
Labels: area-GC-coreclr

Milestone: -

@@ -12845,8 +12847,23 @@ void gc_heap::distribute_free_regions()
}
}
#else //MULTIPLE_HEAPS
while (decommit_step())
// we want to limit the amount of decommit we do per time to indirectly
Member


is similar logic already implemented for the svr case?

Contributor Author


Yes, in the SVR case GC thread #0 wakes up every 100 ms and calls decommit_step().
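
A rough sketch of that pattern (illustrative only, not the actual runtime code; decommit_step below is a local stand-in for the runtime function of the same name):

#include <chrono>
#include <thread>

// stand-in for the runtime's decommit_step(); returns true while there is still memory to decommit
static bool decommit_step ()
{
    // ... decommit a bounded chunk of committed-but-unused memory ...
    return false;
}

// GC thread #0 wakes up roughly every 100 ms and performs one bounded decommit step,
// stopping once decommit_step() reports there is nothing left to do
static void decommit_loop ()
{
    while (decommit_step ())
    {
        std::this_thread::sleep_for (std::chrono::milliseconds (100));
    }
}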

@Maoni0
Member

Maoni0 commented Aug 15, 2022

It would be good to take this opportunity to consolidate the decommit we do for regions for WKS. Right now we call decommit_heap_segment_pages in decommit_ephemeral_segment_pages, which can decommit but only for ephemeral_heap_segment, which is just one of the gen0 regions. And then we can decommit again in distribute_free_regions. I think we should only do the decommit for regions for WKS in one place (i.e., in distribute_free_regions). What do you think? Let's talk more in our meeting today.

@PeterSolMS
Contributor Author

The logic concerning regions in decommit_ephemeral_segment_pages is specifically intended to decommit the tails of regions where they exceed the budget. This mostly concerns gen 1 regions, where the gen 1 budget is small, but may also concern gen 0 if we have a lot of pinning. The logic was added in PR #66008.

I have looked into consolidating the logic in distribute_free_regions, but found that having that function deal with parts of regions as well as whole regions would likely make things messier and harder to understand.

So my vote would be to keep this as is.

@Maoni0
Member

Maoni0 commented Sep 13, 2022

@PeterSolMS I meant this actual decommit for WKS GC in decommit_ephemeral_segment_pages -

#ifndef MULTIPLE_HEAPS
    // we want to limit the amount of decommit we do per time to indirectly
    // limit the amount of time spent in recommit and page faults
    size_t ephemeral_elapsed = (size_t)((dd_time_clock (dd0) - gc_last_ephemeral_decommit_time) / 1000);
    gc_last_ephemeral_decommit_time = dd_time_clock (dd0);

    // this is the amount we were planning to decommit
    ptrdiff_t decommit_size = heap_segment_committed (ephemeral_heap_segment) - decommit_target;

    // we do a max of DECOMMIT_SIZE_PER_MILLISECOND per millisecond of elapsed time since the last GC
    // we limit the elapsed time to 10 seconds to avoid spending too much time decommitting
    ptrdiff_t max_decommit_size = min (ephemeral_elapsed, (10*1000)) * DECOMMIT_SIZE_PER_MILLISECOND;
    decommit_size = min (decommit_size, max_decommit_size);

    slack_space = heap_segment_committed (ephemeral_heap_segment) - heap_segment_allocated (ephemeral_heap_segment) - decommit_size;
    decommit_heap_segment_pages (ephemeral_heap_segment, slack_space);
#endif // !MULTIPLE_HEAPS

not what you added at the top of this method that sets the decommit target for regions in SVR GC (that doesn't actually do any decommit). The decommit_heap_segment_pages call we make for WKS GC makes sense for segments but not for regions, because for regions ephemeral_heap_segment is just that very first region in gen0. We are already doing decommit for regions in WKS GC in distribute_free_regions, so there's no need to do this in two places, especially when the behavior in decommit_ephemeral_segment_pages is questionable.

@mangod9
Member

mangod9 commented Sep 28, 2022

/azp run runtime

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mangod9
Member

mangod9 commented Sep 28, 2022

The OSX timeouts are an unrelated infra issue per Build Analysis. Merging since this needs to be ported to 7.

mangod9 merged commit 84da2ff into dotnet:main on Sep 28, 2022
@mangod9
Member

mangod9 commented Sep 28, 2022

/backport to release/7.0

@github-actions
Contributor

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3144439093

@ghost locked as resolved and limited conversation to collaborators on Oct 28, 2022
4 participants