
gc: add some guard rails and refinements to MemBalancer #52197

Merged
merged 3 commits into master from jn/membalancer-guardrails on Dec 15, 2023

Conversation

vtjnash
Member

@vtjnash vtjnash commented Nov 16, 2023

This replaces #50909, though notably it does not include the change to use heap size instead of heap memory. As with that PR, however, it is expected (and observed when running the GC benchmark scripts) that this may show slightly higher memory utilization at usually similar performance, since the tuning parameter is increased.

This adds the smoothing behavior from that prior PR (to better estimate the long-term rates and ignore transient changes), updates the GC_TIME printing to reflect the change to MemBalancer heuristics, and adds some other guardrails so the decisions do not get pushed too far into the future. Unlike several other languages that use MemBalancer, we do not have a time-based trigger for GC to update these heuristics continuously, so we need to make sure each step is reasonably conservative (against both under- and over-predicting the rate). Finally, this is stricter about observing limits set by the user: it strictly limits the exceedance rate to around 10%, while avoiding some prior possible issues with the hard cut-off being disjoint at the cutoff. This should mean we will go over the threshold slowly if the program continues to demand more space; if the kernel eventually OOM-kills us, we would have died from OOM on our own anyway.
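For illustration, the smoothing described above amounts to an exponential moving average over the per-collection rate measurements. This is a minimal hypothetical sketch; the function name and the 0.5 weight are assumptions for the example, not the values in gc.c:

```c
#include <assert.h>

/* Hypothetical smoothing step: blend the newly measured allocation (or
 * collection) rate with the long-term estimate, so that one transient
 * interval cannot swing the heuristic far in either direction.
 * The 0.5 weight is illustrative, not the constant used in gc.c. */
static double smooth_rate(double old_estimate, double measured)
{
    const double alpha = 0.5; /* weight given to the newest measurement */
    return alpha * measured + (1.0 - alpha) * old_estimate;
}
```

With a 0.5 weight, a rate that momentarily spikes from 100 to 200 only moves the estimate to 150; repeated spikes are needed before the heuristic fully follows, which is the point of smoothing between collections when there is no background thread to do it continuously.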

@vtjnash vtjnash added the GC Garbage collector label Nov 16, 2023
@vtjnash vtjnash requested review from gbaraldi and d-netto November 16, 2023 21:29
@vtjnash vtjnash force-pushed the jn/membalancer-guardrails branch from 10aa815 to 9bf067a on November 17, 2023 14:20
@d-netto d-netto force-pushed the jn/membalancer-guardrails branch from 9bf067a to 22e0821 on December 7, 2023 02:21
@d-netto
Member

d-netto commented Dec 7, 2023

Rebasing to make sure this is on top of #52294, since it may change performance a bit.

Some transients cause the MemBalancer heuristics to compute odd values.
Since there is not a background thread monitoring these rates, smooth
out these transients at each interval and add a sequence of hard limits
that iteratively refines these estimates.

Some principles here:
 - Hard limits should always be applied as a MIN or MAX, not just
 applied directly (which might go too far or in the wrong direction).
 - The max heap should alter the allocation-heuristics regime, changing
 from the time-based balancer to a proportional limit. Accordingly, the
 overallocation function is changed from a simple sqrt, which tends to
 start off too fast and end up too slow, into a low-power polynomial
 (taken from array.c).
@vtjnash vtjnash force-pushed the jn/membalancer-guardrails branch from 22e0821 to 32f6880 on December 12, 2023 18:13
@vtjnash
Member Author

vtjnash commented Dec 12, 2023

The only notable change I see in the benchmarks is pidigits, which is markedly faster. I also cut the thresholds in the latest commit, which seems to have a minor impact but generally means this now runs GC at about the same rate as before in the nominal cases.

$ julia --project=. ./run_benchmarks.jl slow all
category = "bigint"                                                                                      
bench = "pidigits.jl"                                                                                    
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      44531 │    2170 │       373 │       1794 │           89 │             92336 │      447 │          4 │
│  median │      51878 │    2299 │       386 │       1916 │           94 │            111978 │      447 │          4 │
│ maximum │      56642 │    2366 │       399 │       1970 │          104 │            120129 │      491 │          5 │
│   stdev │       3594 │      57 │         9 │         50 │            6 │              7579 │       18 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      33437 │    3608 │       287 │       3306 │           78 │             76550 │      150 │         11 │
│  median │      33966 │    3652 │       295 │       3356 │           80 │             80902 │      152 │         11 │
│ maximum │      34107 │    3724 │       301 │       3435 │           96 │             86443 │      154 │         11 │
│   stdev │        192 │      35 │         5 │         37 │            5 │              3328 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘

@d-netto @kpamnany are you happy with this also?

@kpamnany
Contributor

If possible, please hold on merging for a couple of days @vtjnash; we're testing this and will report back.

@d-netto
Member

d-netto commented Dec 13, 2023

Sorry for the delay on reviewing this @vtjnash.

I share the same thoughts as Kiran: if you could hold merging for one week so that we can do a careful review and testing of this, that would be great.

Thanks in advance.

@kpamnany
Contributor

We backported and tested this. I think the results are better than previous iterations of the MemBalancer logic (cc: @d-netto to confirm) but still, there's at least one OOM-kill and two (not very large) perf regressions. The good news is that apart from those, everything else looks perf neutral so I think this is a step forward?

@vtjnash vtjnash merged commit e8576fc into master Dec 15, 2023
6 of 8 checks passed
@vtjnash vtjnash deleted the jn/membalancer-guardrails branch December 15, 2023 17:35
@vtjnash
Member Author

vtjnash commented Dec 15, 2023

If you re-run that test with the gc files compiled with -DGC_TIME=1, it should now give more specifics on what threshold it targets after each collection, and especially on what is happening heuristically as it approaches OOM.

I think the area where it currently makes the poorest updates is the periodic full collection, since it results in different behavior (as expected) from the incremental collections. We should consider completely separating those statistics, so that the relative desired rates of incremental and full collections can be computed more accurately.
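One hypothetical way to separate those statistics (all names and the smoothing weight here are illustrative assumptions, not code from gc.c):

```c
#include <assert.h>

/* Hypothetical split of the rate statistics so that periodic full
 * collections and incremental collections no longer pollute each
 * other's estimates. */
typedef struct {
    double alloc_rate; /* smoothed bytes allocated per unit time */
    double free_rate;  /* smoothed bytes reclaimed per unit GC time */
} gc_rate_stats_t;

static gc_rate_stats_t incremental_stats; /* zero-initialized */
static gc_rate_stats_t full_stats;

/* Record a finished collection into the matching bucket only, using an
 * illustrative 0.5 smoothing weight. */
static void record_collection(int was_full, double alloc_rate, double free_rate)
{
    gc_rate_stats_t *s = was_full ? &full_stats : &incremental_stats;
    s->alloc_rate = 0.5 * s->alloc_rate + 0.5 * alloc_rate;
    s->free_rate  = 0.5 * s->free_rate  + 0.5 * free_rate;
}
```

Each kind of collection would then drive its own target, instead of a full collection's very different reclaim rate distorting the estimate used to schedule the next incremental one.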

d-netto pushed a commit that referenced this pull request Mar 15, 2024
(commit message repeats the PR description above)
Labels
GC Garbage collector performance Must go faster
4 participants