Fixed amount of work for rb_tree.jl #61

d-netto · 2023-04-26T23:10:54Z

I believe the previous version was imposing a time cutoff for the benchmark.

This PR eliminates the time cutoffs and sets a lower iteration count.

N=10_000_000 was chosen in order to make the benchmark last ~1min on my machine:

┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      84248 │   11706 │     11431 │        276 │         3330 │                 9 │     1010 │         14 │
│  median │      84248 │   11706 │     11431 │        276 │         3330 │                 9 │     1010 │         14 │
│ maximum │      84248 │   11706 │     11431 │        276 │         3330 │                 9 │     1010 │         14 │
│   stdev │        NaN │     NaN │       NaN │        NaN │          NaN │               NaN │      NaN │        NaN │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘

d-netto · 2023-04-26T23:12:58Z

@kpamnany: could you take a look as well? Can't assign you as a reviewer for some reason.

kpamnany

This really needs to run at least 10 minutes to expose the extent of the pause time increase. That's why it's in slow/. Suggest finding an N corresponding to that amount of time.

d-netto · 2023-04-26T23:47:31Z

N = 20_000_000 makes the benchmark last a bit more than 10min on my machine.

vchuravy · 2023-04-27T01:29:40Z

This really needs to run at least 10 minutes to expose the extent of the pause time increase.

I am not sure there is useful information in there after the first ~60s

elapsed=10.000182151794434s, 2358200 current points, 2358200 total, 235815.0 per second
elapsed=20.000406980514526s, 3896300 current points, 3896300 total, 194811.0 per second
elapsed=30.000055074691772s, 5569000 current points, 5569000 total, 185632.0 per second
elapsed=40.00019717216492s, 7022500 current points, 7022500 total, 175561.0 per second
elapsed=50.00018310546875s, 9029300 current points, 9029300 total, 180585.0 per second
elapsed=64.95979309082031s, 10940500 current points, 10940500 total, 168419.0 per second
...
elapsed=610.0001201629639s, 87594200 current points, 87594200 total, 143597.0 per second
elapsed=620.0000710487366s, 89046400 current points, 89046400 total, 143623.0 per second
elapsed=630.0003559589386s, 90528100 current points, 90528100 total, 143695.0 per second
elapsed=640.0005731582642s, 92004200 current points, 92004200 total, 143756.0 per second
elapsed=650.000018119812s, 93471400 current points, 93471400 total, 143802.0 per second
elapsed=660.0004801750183s, 94934100 current points, 94934100 total, 143839.0 per second
elapsed=670.000039100647s, 96403900 current points, 96403900 total, 143886.0 per second
elapsed=680.0004360675812s, 97875400 current points, 97875400 total, 143934.0 per second
elapsed=690.0000729560852s, 99332400 current points, 99332400 total, 143959.0 per second

For a useful benchmark the absolute length of GC is not that interesting for me.
Here I can see that the points per second processed decrease over time, and with instrumentation I could maybe
see that the reason for that is GC taking longer and longer as a percentage of the time-step. E.g. that GC performs scales worse
with the number of items .

For a good benchmark my rules are:

Is it representative of a particular workload
Is the shape of the heap useful
Is it targeted (some of the benchmarks here seem silly but target specific aspects of the GC subsystem)
Can I use it to improve the GC
Is the workload dependent on heap size

A benchmark that needs 10 minutes to run exceeds the threshold of "Can I use it to improve the GC".

So the question is what is the "goal" of the benchmark.
Is it to be representative (then it doesn't need to run 10minutes)?
Does it show behavior only exhibited by a long running workload?
Does it demonstrate a particular scenario when we reach a certain amount of memory used?

vchuravy · 2023-04-27T01:32:06Z

Maybe the answer is that we need two versions of this benchmark (and of the other slow benchmarks)

A version fast enough to be used for tuning the GC and another that is more of the verification of the GC.

kpamnany · 2023-04-27T15:35:53Z

For a useful benchmark the absolute length of GC is not that interesting for me.

The length of GC is a "feature" of GC behavior. Knowing how big this can get is absolutely an interesting datum. G1, Shenandoah, LXR, etc. all make claims about maximum pause time. What's the maximum pause time for our GC? If we don't have an answer to that question, then you can't make a meaningful comparison with another GC.

kpamnany · 2023-04-27T15:43:14Z

Also worth noting is that this is not a "torture" test. It is designed to be representative of a real workload. That it takes a long time to create this heap shape is practically irrelevant to the point, which is to show GC behavior for this size and shape of heap.

I understand that it becomes inconvenient when the runtime is long, but the solution is not to discard or break the benchmark but perhaps to find a way to quickly recreate the heap shape at 10 minutes. Which is probably beyond scope for us.

I'm not sure there is much value to the benchmark if it runs for a minute or two -- at that point it isn't too different from the tree benchmarks?

d-netto · 2023-05-08T02:26:30Z

I ran this benchmark on a small aarch64 laptop and also on a large x86_64 server. On both of them, 50M nodes was enough to make the benchmark last for more than 10 minutes and to cause pathological (more than 20 seconds) GC pauses, which should be more than enough to exercise the desired behavior.

If there are no objections, I'll be this merging by tomorrow night.

fixed amount of work at rb_tree

9159a16

d-netto requested a review from vchuravy April 26, 2023 23:12

kpamnany reviewed Apr 26, 2023

View reviewed changes

choose N to get >= 10min of elapsed time

66d1abc

d-netto force-pushed the dcn/rb_tree branch from d6d3dc0 to 66d1abc Compare April 26, 2023 23:46

This was referenced May 2, 2023

Run GC on multiple threads JuliaLang/julia#48600

Merged

Promote objects more eagerly JuliaLang/julia#49644

Merged

50M nodes in rb_tree.jl

a7c6abb

vchuravy merged commit ff744ce into main May 8, 2023

vchuravy deleted the dcn/rb_tree branch May 8, 2023 13:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed amount of work for rb_tree.jl #61

Fixed amount of work for rb_tree.jl #61

d-netto commented Apr 26, 2023

d-netto commented Apr 26, 2023

kpamnany left a comment

d-netto commented Apr 26, 2023

vchuravy commented Apr 27, 2023

vchuravy commented Apr 27, 2023

kpamnany commented Apr 27, 2023

kpamnany commented Apr 27, 2023

d-netto commented May 8, 2023

Fixed amount of work for rb_tree.jl #61

Fixed amount of work for rb_tree.jl #61

Conversation

d-netto commented Apr 26, 2023

d-netto commented Apr 26, 2023

kpamnany left a comment

Choose a reason for hiding this comment

d-netto commented Apr 26, 2023

vchuravy commented Apr 27, 2023

vchuravy commented Apr 27, 2023

kpamnany commented Apr 27, 2023

kpamnany commented Apr 27, 2023

d-netto commented May 8, 2023