Fixing a bug that causes us to mistakenly demote a gen2 region to gen0 #82413

Merged (1 commit) on Mar 3, 2023

Conversation

@Maoni0 (Member) commented Feb 21, 2023

We are seeing a gen0 region that's almost fully occupied (< 24 bytes free) with one giant plug (i.e., no free objects at all in the region). This causes allocate_in_condemned_generations to go into an infinite loop, because in ephemeral generations we expect short plugs, i.e., we should be able to allocate a min free object in front of each plug. Normally we can, because when we allocate objects in gen0 we make sure to break up the allocation contexts with min free objects, and when we compact into gen1 we form short plugs.
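
To make this concrete, here is a minimal standalone sketch (not the actual gc.cpp code; the region size, plug size and the 24-byte minimum object size are illustrative assumptions) of why a region holding one giant plug with less than a min object's worth of free space defeats the short-plug expectation:

#include <cstddef>
#include <cstdio>

int main()
{
    // Illustrative numbers only: a region almost fully occupied by one plug.
    const std::size_t region_size  = 4 * 1024 * 1024;   // hypothetical region size
    const std::size_t min_obj_size = 24;                 // minimum free object size (assumption)
    const std::size_t plug_size    = region_size - 16;   // one giant plug, < 24 bytes left free

    std::size_t free_bytes = region_size - plug_size;    // 16 bytes of free space

    // Ephemeral planning expects to put a min free object in front of each plug;
    // with free_bytes < min_obj_size that can never succeed, so a loop that keeps
    // retrying this region never makes progress.
    bool can_form_short_plug = (free_bytes >= min_obj_size);
    std::printf ("free=%zu bytes, need at least %zu, can_form_short_plug=%d\n",
                 free_bytes, min_obj_size, (int)can_form_short_plug);
    return 0;
}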

We are in this situation when all of the following conditions are true -

  • We did a gen2 compacting GC that generated a pinned plug in a gen2 region almost as big as the whole region. My guess for why there's this giant pinned plug: that gen2 region was already really compact, so when we called allocate_in_condemned_generations on the non-pinned plugs next to pinned plugs in it, we discovered we couldn't move the non-pinned plugs anyway, so we artificially pinned them and formed a giant pinned plug. During this GC those objects were no longer pinned, so we have one giant non-pinned plug.
  • This gen2 region needs to be the last region with pinned plugs;
  • This gen2 region hasn't been consumed by allocate_in_condemned_generations yet, so it was processed by process_remaining_regions.

Then in process_remaining_regions we'll set the plan_gen_num for that gen2 region to 0 because we are doing

set_region_plan_gen_num_sip (current_region, current_plan_gen_num);

instead of going through the demotion logic to decide whether we should demote this region or not.
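
As a rough illustration of the difference (hypothetical types, names and threshold; the real decide_on_demotion_pin_surv logic in gc.cpp is more involved), the fix amounts to running the region through a pinned-survival check instead of unconditionally planning it into gen0:

#include <cstddef>

// Hypothetical, simplified model of the decision the fix routes the region through.
struct region_t
{
    std::size_t size;              // total region size
    std::size_t pinned_survived;   // bytes of pinned survivors in the region
    int         plan_gen_num;      // generation the region is planned into
};

// Before the fix: the last remaining regions are planned straight into the
// condemned generation (0 here), regardless of their pinned survival.
void plan_without_demotion_check (region_t& region, int current_plan_gen_num)
{
    region.plan_gen_num = current_plan_gen_num;
}

// After the fix: only demote to gen0 when pinned survival is low enough;
// otherwise keep the region in the older generation, as described above.
void plan_with_demotion_check (region_t& region, int current_plan_gen_num, int old_gen_num)
{
    const double demotion_threshold = 0.05;   // illustrative ratio, not the real value
    double pinned_ratio = (double)region.pinned_survived / (double)region.size;
    region.plan_gen_num = (pinned_ratio < demotion_threshold) ? current_plan_gen_num : old_gen_num;
}

int main()
{
    // A gen2 region that is almost entirely one pinned plug stays in gen2
    // instead of being demoted to gen0.
    region_t r = { 4 * 1024 * 1024, 4 * 1024 * 1024 - 24, 2 };
    plan_with_demotion_check (r, 0, 2);
    return 0;
}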

@ghost assigned Maoni0 on Feb 21, 2023
@Maoni0 (Member, Author) commented Feb 21, 2023

fixes #80073
will need to be backported to 7.0.

@@ -28876,7 +28876,8 @@ void gc_heap::process_remaining_regions (int current_plan_gen_num, generation* c
return;
}

-set_region_plan_gen_num_sip (current_region, current_plan_gen_num);
+decide_on_demotion_pin_surv (current_region);
Contributor commented:

Looking at the code in decide_on_demotion_pin_surv, we may still set the generation to 0 if pinned survival is low enough. I thought that during this GC, the objects were no longer pinned, so pinned survival should be 0. So what am I missing?

@Maoni0 (Member, Author) replied:

During this GC it's no longer pinned, but it was pinned in the last (gen2 compacting) GC, which was why it was (mistakenly) demoted to a gen0 region. So in this GC, since it's no longer pinned, we are calling alloc_in_condemned on it.

@Maoni0 (Member, Author) replied:

The purpose of the fix is so that in the last GC we would not have demoted this region to gen0. So in this GC, we wouldn't be calling alloc_in_condemned for a giant plug.

Contributor replied:

Ah, I see - thanks for the explanation.

@PeterSolMS (Contributor) left a comment:

Looks good to me.

@Maoni0 (Member, Author) commented Mar 3, 2023

This does not fix another problem in process_remaining_regions, which is that we don't necessarily end up with our "at least 1 region per gen" invariant. When we reach process_remaining_regions, if there isn't a region planned for a generation already, we always run the risk of not getting a region for that generation, because process_remaining_regions will pick which gen to plan a region in based on decide_on_demotion_pin_surv. Without the fix in this PR, it will always guarantee a region in gen0, but not for other generations. That's not necessarily a good thing, because what happens later is that thread_final_regions will get a new region if it doesn't see at least one region in a generation, which means we could be getting an empty region in gen2 that will be used much later than an empty region in gen0. So this fix makes that better but doesn't solve the other problem.

However, the other problem is only a problem when we can't get new regions in thread_final_regions, which means we are very close to the limit.

For backporting to 7.0 I think it's ok to just port this one, as a fix for the other problem is much more involved. Essentially we'd want to make sure that during plan we keep our invariant, which means we should keep track of how many regions are already planned in each condemned gen and, when we demote, make sure we still have at least one region in each gen. I'll let that fix bake in 8.0 for a while and see if we still need to backport.
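
As a rough sketch of the bookkeeping that fuller fix would need (purely illustrative names and structure, not runtime code): count the regions planned into each condemned generation and refuse to demote the last one out of a generation.

#include <cstddef>

// Illustrative bookkeeping only (not actual gc.cpp code): count how many regions
// have been planned into each condemned generation during plan.
const int max_condemned_gen = 2;
static std::size_t planned_regions_in_gen[max_condemned_gen + 1] = { 0, 0, 0 };

void on_region_planned (int plan_gen_num)
{
    planned_regions_in_gen[plan_gen_num]++;
}

// When deciding whether to demote a region out of 'gen', keep it there if it is
// the only region planned for that generation, so every condemned gen keeps at
// least one region and thread_final_regions doesn't have to grab a new, empty one.
bool ok_to_demote_from (int gen)
{
    return planned_regions_in_gen[gen] > 1;
}

int main()
{
    on_region_planned (2);                    // one region planned into gen2
    return ok_to_demote_from (2) ? 0 : 1;     // returns 1: don't demote the only gen2 region
}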

@mangod9 (Member) commented Mar 13, 2023

/backport to release/7.0

@github-actions (Contributor) commented:

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/4407773176

@ghost locked as resolved and limited conversation to collaborators on Apr 12, 2023