
gc: add some guard rails and refinements to MemBalancer #52197

Merged
merged 3 commits into master from jn/membalancer-guardrails on Dec 15, 2023

Conversation

vtjnash
Member

@vtjnash vtjnash commented Nov 16, 2023

This replaces #50909, though notably it does not include the change to use heap size instead of heap memory. As with that PR, however, it is expected (and observed when running the GC benchmark scripts) that this may show slightly higher memory utilization at usually similar performance, since the tuning parameter is increased.

This adds the smoothing behavior from that prior PR (to better estimate the long-term rates and ignore transient changes), updates the GC_TIME printing to reflect the change to MemBalancer heuristics, and adds some other guardrails so the decisions do not get pushed too far into the future. Unlike several other languages that use MemBalancer, we do not have a time-based trigger for GC to update these heuristics continuously, so we need to make sure each step is reasonably conservative (against both under- and over-predicting the rate). Finally, this is stricter about observing limits set by the user: it strictly limits the exceedance rate to around 10%, while avoiding some prior possible issues with the hard cut-off being disjoint at the cutoff. This should mean we will go over the threshold slowly if the program continues to demand more space; if the kernel eventually OOM-kills us, we would have died from OOM on our own anyway.
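For illustration, the smoothing described above amounts to an exponential moving average over the per-collection rate measurements. This is a minimal hypothetical sketch; the function name and the 0.5 weight are assumptions for the example, not the values in gc.c:

```c
#include <assert.h>

/* Hypothetical smoothing step: blend the newly measured allocation (or
 * collection) rate with the long-term estimate, so that one transient
 * interval cannot swing the heuristic far in either direction.
 * The 0.5 weight is illustrative, not the constant used in gc.c. */
static double smooth_rate(double old_estimate, double measured)
{
    const double alpha = 0.5; /* weight given to the newest measurement */
    return alpha * measured + (1.0 - alpha) * old_estimate;
}
```

With a 0.5 weight, a rate that momentarily spikes from 100 to 200 only moves the estimate to 150; repeated spikes are needed before the heuristic fully follows, which is the point of smoothing between collections when there is no background thread to do it continuously.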

@vtjnash vtjnash added the GC Garbage collector label Nov 16, 2023
@vtjnash vtjnash requested review from gbaraldi and d-netto November 16, 2023 21:29
@vtjnash vtjnash force-pushed the jn/membalancer-guardrails branch from 10aa815 to 9bf067a on November 17, 2023 14:20
@d-netto d-netto force-pushed the jn/membalancer-guardrails branch from 9bf067a to 22e0821 on December 7, 2023 02:21
@d-netto
Member

d-netto commented Dec 7, 2023

Rebasing to make sure this is on top of #52294, since it may change performance a bit.

Some transients cause the MemBalancer heuristics to compute odd values.
Since there is not a background thread monitoring these rates, smooth
out these transients at each interval and add a sequence of hard limits
that iteratively refines these estimates.

Some principles here:
 - Hard limits should always be applied as a MIN or MAX, not just
 applied directly (which might go too far or in the wrong direction).
 - The max heap should alter the allocation-heuristics regime, changing
 from the time-based balancer to a proportional limit. Accordingly, the
 overallocation function is changed from a simple sqrt, which tends to
 start off too fast and end up too slow, into a low-power polynomial
 (taken from array.c).
@vtjnash vtjnash force-pushed the jn/membalancer-guardrails branch from 22e0821 to 32f6880 on December 12, 2023 18:13
@vtjnash
Member Author

vtjnash commented Dec 12, 2023

The only notable change I see in the benchmarks is pidigits, which is markedly faster. I also cut the thresholds in the latest commit, which seems to have a minor impact but generally means this now runs GC at about the same rate as before in the nominal cases.

$ julia --project=. ./run_benchmarks.jl slow all
category = "bigint"                                                                                      
bench = "pidigits.jl"                                                                                    
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      44531 │    2170 │       373 │       1794 │           89 │             92336 │      447 │          4 │
│  median │      51878 │    2299 │       386 │       1916 │           94 │            111978 │      447 │          4 │
│ maximum │      56642 │    2366 │       399 │       1970 │          104 │            120129 │      491 │          5 │
│   stdev │       3594 │      57 │         9 │         50 │            6 │              7579 │       18 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      33437 │    3608 │       287 │       3306 │           78 │             76550 │      150 │         11 │
│  median │      33966 │    3652 │       295 │       3356 │           80 │             80902 │      152 │         11 │
│ maximum │      34107 │    3724 │       301 │       3435 │           96 │             86443 │      154 │         11 │
│   stdev │        192 │      35 │         5 │         37 │            5 │              3328 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘

@d-netto @kpamnany are you happy with this also?

@kpamnany
Contributor

If possible, please hold on merging for a couple of days @vtjnash; we're testing this and will report back.

@d-netto
Member

d-netto commented Dec 13, 2023

Sorry for the delay on reviewing this @vtjnash.

I share the same thoughts as Kiran: if you could hold merging for one week so that we can do a careful review and testing of this, that would be great.

Thanks in advance.

@kpamnany
Contributor

We backported and tested this. I think the results are better than previous iterations of the MemBalancer logic (cc: @d-netto to confirm) but still, there's at least one OOM-kill and two (not very large) perf regressions. The good news is that apart from those, everything else looks perf neutral so I think this is a step forward?

@vtjnash vtjnash merged commit e8576fc into master Dec 15, 2023
6 of 8 checks passed
@vtjnash vtjnash deleted the jn/membalancer-guardrails branch December 15, 2023 17:35
@vtjnash
Member Author

vtjnash commented Dec 15, 2023

If you re-run that test with the gc files compiled with -DGC_TIME=1, it should now give more specifics on what threshold it targets after each collection, and especially on what is happening heuristically as it approaches OOM.

I think the area where it currently makes the poorest updates is the periodic full collection, since it results in different behavior (as expected) from the incremental collections. We should consider completely separating those statistics, so that the relative desired rates of incremental and full collections can be computed more accurately.
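One hypothetical way to separate those statistics (all names and the smoothing weight here are illustrative assumptions, not code from gc.c):

```c
#include <assert.h>

/* Hypothetical split of the rate statistics so that periodic full
 * collections and incremental collections no longer pollute each
 * other's estimates. */
typedef struct {
    double alloc_rate; /* smoothed bytes allocated per unit time */
    double free_rate;  /* smoothed bytes reclaimed per unit GC time */
} gc_rate_stats_t;

static gc_rate_stats_t incremental_stats; /* zero-initialized */
static gc_rate_stats_t full_stats;

/* Record a finished collection into the matching bucket only, using an
 * illustrative 0.5 smoothing weight. */
static void record_collection(int was_full, double alloc_rate, double free_rate)
{
    gc_rate_stats_t *s = was_full ? &full_stats : &incremental_stats;
    s->alloc_rate = 0.5 * s->alloc_rate + 0.5 * alloc_rate;
    s->free_rate  = 0.5 * s->free_rate  + 0.5 * free_rate;
}
```

Each kind of collection would then drive its own target, instead of a full collection's very different reclaim rate distorting the estimate used to schedule the next incremental one.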

d-netto pushed a commit that referenced this pull request Mar 15, 2024
(commit message repeats the PR description above)
Labels
GC Garbage collector performance Must go faster
4 participants