yet more atomics & cache-line fixes on work-stealing queue #53424

Merged Mar 4, 2024 (1 commit)

d-netto (Member) commented Feb 22, 2024

This regression at a large number of GC threads still shows up on a large aarch64 machine (80-core ARM Neoverse N1).

bench = "many_refs.jl"
(gcthreads, threads) = (1, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       3807 │    2826 │      2633 │        193 │          960 │                13 │      892 │         74 │
│  median │       3810 │    2826 │      2633 │        193 │          961 │                15 │      892 │         74 │
│ maximum │       3810 │    2831 │      2638 │        193 │          962 │                27 │      892 │         74 │
│   stdev │          2 │       3 │         3 │          0 │            1 │                 7 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (2, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2455 │    1476 │      1350 │        126 │          488 │                20 │      892 │         60 │
│  median │       2459 │    1478 │      1352 │        126 │          489 │                23 │      892 │         60 │
│ maximum │       2465 │    1479 │      1352 │        126 │          489 │                23 │      893 │         60 │
│   stdev │          5 │       1 │         1 │          0 │            1 │                 2 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (4, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2173 │    1194 │      1115 │         77 │          463 │                18 │      892 │         54 │
│  median │       2187 │    1194 │      1116 │         78 │          463 │                19 │      893 │         55 │
│ maximum │       2217 │    1208 │      1130 │         78 │          463 │                19 │      893 │         55 │
│   stdev │         22 │       8 │         8 │          1 │            0 │                 1 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (8, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2191 │    1212 │      1149 │         63 │          539 │                19 │      901 │         55 │
│  median │       2199 │    1219 │      1156 │         63 │          540 │                20 │      901 │         55 │
│ maximum │       2201 │    1222 │      1157 │         65 │          540 │                20 │      901 │         56 │
│   stdev │          5 │       5 │         4 │          1 │            1 │                 1 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (16, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       3897 │    2916 │      2854 │         60 │         1423 │                20 │      902 │         75 │
│  median │       4012 │    3032 │      2972 │         62 │         1491 │                21 │      904 │         75 │
│ maximum │       4026 │    3106 │      3044 │         62 │         1511 │                22 │      904 │         77 │
│   stdev │         70 │      96 │        96 │          1 │           46 │                 1 │        1 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘

This patch seems to improve performance at 16 GC threads (median GC time falls from 3032 ms to 1524 ms), though this benchmark still scales negatively beyond 8 GC threads (the others seem fine). I didn't test whether this is the optimal struct layout for the queue in this benchmark.

bench = "many_refs.jl"
(gcthreads, threads) = (1, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       3807 │    2822 │      2630 │        193 │          959 │                15 │      892 │         74 │
│  median │       3824 │    2823 │      2630 │        193 │          959 │                15 │      892 │         74 │
│ maximum │       4307 │    3194 │      2974 │        220 │         1086 │                19 │      892 │         74 │
│   stdev │        284 │     215 │       199 │         16 │           74 │                 2 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (2, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2459 │    1476 │      1349 │        126 │          488 │                19 │      892 │         60 │
│  median │       2460 │    1479 │      1352 │        127 │          488 │                21 │      893 │         60 │
│ maximum │       2770 │    1661 │      1526 │        135 │          570 │                22 │      893 │         60 │
│   stdev │        179 │     106 │       101 │          5 │           47 │                 1 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (4, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2215 │    1233 │      1155 │         77 │          485 │                20 │      892 │         53 │
│  median │       2320 │    1248 │      1171 │         78 │          492 │                21 │      893 │         54 │
│ maximum │       2372 │    1259 │      1176 │         83 │          493 │                22 │      893 │         56 │
│   stdev │         80 │      13 │        11 │          3 │            5 │                 1 │        1 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (8, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2204 │    1220 │      1156 │         64 │          544 │                23 │      901 │         53 │
│  median │       2362 │    1249 │      1181 │         64 │          553 │                23 │      901 │         53 │
│ maximum │       2374 │    1261 │      1196 │         68 │          558 │                25 │      901 │         55 │
│   stdev │         95 │      21 │        20 │          2 │            7 │                 1 │        0 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (16, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2502 │    1519 │      1458 │         62 │          721 │                22 │      902 │         58 │
│  median │       2511 │    1524 │      1461 │         63 │          728 │                23 │      903 │         61 │
│ maximum │       2664 │    1554 │      1486 │         68 │          741 │                25 │      905 │         61 │
│   stdev │         91 │      19 │        16 │          3 │           10 │                 1 │        2 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
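
The struct-layout question raised in the description can be illustrated with a minimal C11 sketch. It is not the code from this patch: `sketch_ws_queue_t`, its field names, and the 64-byte `SKETCH_CACHE_LINE` are assumptions chosen for illustration. The idea is that the index advanced by stealing threads and the index advanced by the owning thread each start their own cache line, so a write to one does not invalidate the line holding the other.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; the real value is target-specific. */
#define SKETCH_CACHE_LINE 64

/* Hypothetical work-stealing deque header (not Julia's actual definition). */
typedef struct {
    /* Index that stealing threads advance. */
    alignas(SKETCH_CACHE_LINE) _Atomic int64_t top;
    /* Index that the owning thread advances; starts a new cache line. */
    alignas(SKETCH_CACHE_LINE) _Atomic int64_t bottom;
    /* Backing buffer pointer; shares the line that `bottom` starts. */
    _Atomic(void *) array;
} sketch_ws_queue_t;

/* Distance in bytes between the two indices. */
static size_t sketch_ws_index_gap(void)
{
    return offsetof(sketch_ws_queue_t, bottom) - offsetof(sketch_ws_queue_t, top);
}

/* Fail the build if the two indices could ever share a cache line. */
static_assert(offsetof(sketch_ws_queue_t, bottom) - offsetof(sketch_ws_queue_t, top)
                  >= SKETCH_CACHE_LINE,
              "top and bottom must sit on different cache lines");
```

Whether this layout is the best one for `many_refs.jl` would still have to be measured; the compile-time check only guarantees the separation, not the speedup.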

d-netto added the performance and GC labels Feb 22, 2024
gbaraldi (Member) commented

Maybe we should do this here

julia/src/gc.h

Lines 324 to 329 in e42ffa6

typedef struct {
    _Atomic(size_t) bytes_mapped;
    _Atomic(size_t) bytes_resident;
    _Atomic(size_t) heap_size;
    _Atomic(size_t) heap_target;
} gc_heapstatus_t;

so that changing heap_size doesn't invalidate the cache line holding heap_target (not sure it makes much of a difference, because I think we almost always load both of them), and here

julia/src/gc.c

Lines 1401 to 1408 in 3e6ff3d

JL_DLLEXPORT double jl_gc_page_utilization_stats[JL_GC_N_MAX_POOLS];
STATIC_INLINE void gc_update_page_fragmentation_data(jl_gc_pagemeta_t *pg) JL_NOTSAFEPOINT
{
    gc_fragmentation_stat_t *stats = &gc_page_fragmentation_stats[pg->pool_n];
    jl_atomic_fetch_add(&stats->n_freed_objs, pg->nfree);
    jl_atomic_fetch_add(&stats->n_pages_allocd, 1);
}

so that different threads updating this don't invalidate each other.
Also, those fetch_adds can probably be relaxed.
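
Both suggestions in this comment can be sketched in C11. The snippet is illustrative only: `sketch_frag_stat_t`, `sketch_update_fragmentation`, `SKETCH_N_POOLS`, and the 64-byte line size are hypothetical stand-ins rather than the definitions in gc.c, and it uses the standard `<stdatomic.h>` calls instead of Julia's `jl_atomic_*` wrappers. Each per-pool entry is aligned to its own cache line, and the counters are incremented with relaxed ordering, which is only appropriate on the assumption that they are plain statistics and are never used to publish other data between threads.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache-line size */
#define SKETCH_N_POOLS 4     /* hypothetical pool count */

/* One statistics entry per pool, padded out to a full cache line. */
typedef struct {
    alignas(SKETCH_CACHE_LINE) _Atomic size_t n_freed_objs;
    _Atomic size_t n_pages_allocd;
} sketch_frag_stat_t;

static_assert(sizeof(sketch_frag_stat_t) == SKETCH_CACHE_LINE,
              "each entry should occupy exactly one cache line");

/* Static storage, so every counter starts at zero. */
static sketch_frag_stat_t sketch_frag_stats[SKETCH_N_POOLS];

/* Record one swept page: relaxed ordering keeps the increments atomic
 * but imposes no ordering on surrounding memory accesses. */
static void sketch_update_fragmentation(size_t pool_n, size_t nfree)
{
    sketch_frag_stat_t *stats = &sketch_frag_stats[pool_n];
    atomic_fetch_add_explicit(&stats->n_freed_objs, nfree, memory_order_relaxed);
    atomic_fetch_add_explicit(&stats->n_pages_allocd, (size_t)1, memory_order_relaxed);
}

/* Read-side helpers for the two counters. */
static size_t sketch_freed_objs(size_t pool_n)
{
    return atomic_load_explicit(&sketch_frag_stats[pool_n].n_freed_objs,
                                memory_order_relaxed);
}

static size_t sketch_pages_allocd(size_t pool_n)
{
    return atomic_load_explicit(&sketch_frag_stats[pool_n].n_pages_allocd,
                                memory_order_relaxed);
}
```

Padding removes false sharing between entries for different pools; threads that update the same pool still contend on that entry's line.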

vtjnash (Member) commented Feb 22, 2024

If the code is going to invalidate several fields, it is probably best for them to be in the same cache line, since the update of the second one can then be almost free (it tags along with the first one under the same cache-line lock).

Anyway, this PR sounds good to me.
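
The trade-off described in this comment, keeping fields that are written together on one line while isolating fields written elsewhere, can be sketched as follows. The grouping is purely illustrative: which fields of `gc_heapstatus_t` are actually updated together was not established in this thread, and `sketch_heapstatus_t`, `sketch_same_line`, and the 64-byte line size are assumptions.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache-line size */

/* Hypothetical layout: two groups of fields, one cache line per group. */
typedef struct {
    /* Assumed to be written by the same code path: share one line, so the
     * second write hits a line the core already owns. */
    alignas(SKETCH_CACHE_LINE) _Atomic size_t bytes_mapped;
    _Atomic size_t bytes_resident;
    /* Assumed to be written on a different path: start a new line. */
    alignas(SKETCH_CACHE_LINE) _Atomic size_t heap_size;
    _Atomic size_t heap_target;
} sketch_heapstatus_t;

/* True when two field offsets fall in the same cache line of an object
 * that is itself aligned to SKETCH_CACHE_LINE. */
static int sketch_same_line(size_t off_a, size_t off_b)
{
    return off_a / SKETCH_CACHE_LINE == off_b / SKETCH_CACHE_LINE;
}
```

A helper like `sketch_same_line` can be paired with `offsetof` in assertions to document the intended grouping, so that a later field reordering does not silently change it.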

d-netto force-pushed the dcn-false-sharing-fun branch from cfa3be2 to 034e070 February 22, 2024 20:49
d-netto added the backport 1.10 and backport 1.11 labels Feb 22, 2024
This was referenced Feb 26, 2024
KristofferC added a commit that referenced this pull request Feb 27, 2024
Backported PRs:
- [x] #53205 <!-- Profile: add notes to `print()` docs -->
- [x] #53170 <!-- Remove outdated discussion about externally changing
module bindings -->
- [x] #53228 <!-- SubArray: avoid invalid elimination of singleton
indices -->
- [x] #51361 <!-- code_warntype docs: more neutral reference to
@code_warntype -->
- [x] #50480 <!-- Document --heap-size-hint in Command-line Interface
-->
- [x] #53301 <!-- Fix typo in `Sys.total_memory` docstring. -->
- [x] #53354 <!-- fix code coverage bug in tail position and `else` -->
- [x] #53388 <!-- Fix documentation: thread pool of main thread -->
- [x] #53429 <!-- Subtype: skip slow-path in `local_∀_∃_subtype` if
inputs contain no ∃ typevar. -->
- [x] #53437 <!-- Add debug variant of loader_trampolines.o -->

Need manual backport:
- [ ] #52505 <!-- fix alignment of emit_unbox_store copy -->
- [ ] #53373 <!-- fix sysimage-native-code=no option with pkgimages -->
- [ ] #53439 <!-- staticdata: fix assert from partially disabled native
code -->

Contains multiple commits, manual intervention needed:
- [ ] #52913 <!-- Added docstring for Artifacts.jl -->
- [ ] #53218 <!-- Fix interpreter_exec.jl test -->

Non-merged PRs with backport label:
- [ ] #53424 <!-- yet more atomics & cache-line fixes on work-stealing
queue -->
- [ ] #53125 <!-- coverage: count coverage where explicitly requested by
inference only -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
KristofferC added a commit that referenced this pull request Mar 1, 2024
Backported PRs:
- [x] #53361 <!-- 🤖 [master] Bump the SparseArrays stdlib from c9f7293
to cb602d7 -->
- [x] #53300 <!-- allow external absint to hold custom data in
`codeinst.inferred` -->
- [x] #53342 <!-- Add `Base.wrap` to docs -->
- [x] #53372 <!-- Silence warnings in `test/file.jl` -->
- [x] #53357 <!-- 🤖 [master] Bump the Pkg stdlib from 6dd0e7c9e to
76070d295 -->
- [x] #53373 <!-- fix sysimage-native-code=no option with pkgimages -->
- [x] #53333 <!-- More consistent return value for annotations, take 2
-->
- [x] #53354 <!-- fix code coverage bug in tail position and `else` -->
- [x] #53407 <!-- fix sysimage-native-code=yes option -->
- [x] #53388 <!-- Fix documentation: thread pool of main thread -->
- [x] #53355 <!-- Fix synchronization issues on the GC scheduler. -->
- [x] #53429 <!-- Subtype: skip slow-path in `local_∀_∃_subtype` if
inputs contain no ∃ typevar. -->
- [x] #53437 <!-- Add debug variant of loader_trampolines.o -->
- [x] #53284 <!-- Add annotate! method for AnnotatedIOBuffer -->
- [x] #53466 <!-- [MozillaCACerts_jll] Update to v2023-12-12 -->
- [x] #53467 <!-- [LibGit2_jll] Update to v1.7.2 -->
- [x] #53326 <!-- RFC: when loading code for internal purposes, load
stdlib files directly, bypassing DEPOT_PATH, LOAD_PATH, and stale checks
-->
- [x] #53332
- [x] #53320 <!-- Add `Sys.isreadable, Sys.iswriteable`, update `ispath`
-->
- [x] #53476

Contains multiple commits, manual intervention needed:
- [ ] #53285 <!-- Add update mechanism for Terminfo, and common
user-alias data -->

Non-merged PRs with backport label:
- [ ] #53424 <!-- yet more atomics & cache-line fixes on work-stealing
queue -->
- [ ] #53408 <!-- task splitting: change additive accumulation to
multiplicative -->
- [ ] #53403 <!-- Move parallel precompilation to Base -->
- [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` -->
- [ ] #53391 <!-- Default to the medium code model in x86 linux -->
- [ ] #53125 <!-- coverage: count coverage where explicitly requested by
inference only -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->
KristofferC mentioned this pull request Mar 1, 2024
60 tasks
d-netto merged commit 973b4ff into master Mar 4, 2024
7 of 9 checks passed
d-netto deleted the dcn-false-sharing-fun branch March 4, 2024 16:16
KristofferC pushed a commit that referenced this pull request Mar 6, 2024
(cherry picked from commit 973b4ff)
KristofferC pushed a commit that referenced this pull request Mar 12, 2024
This regression at a large number of GC threads still shows up on a
large aarch64 machine (80-core ARM Neoverse N1).

```
bench = "many_refs.jl"
(gcthreads, threads) = (1, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       3807 │    2826 │      2633 │        193 │          960 │                13 │      892 │         74 │
│  median │       3810 │    2826 │      2633 │        193 │          961 │                15 │      892 │         74 │
│ maximum │       3810 │    2831 │      2638 │        193 │          962 │                27 │      892 │         74 │
│   stdev │          2 │       3 │         3 │          0 │            1 │                 7 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (2, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2455 │    1476 │      1350 │        126 │          488 │                20 │      892 │         60 │
│  median │       2459 │    1478 │      1352 │        126 │          489 │                23 │      892 │         60 │
│ maximum │       2465 │    1479 │      1352 │        126 │          489 │                23 │      893 │         60 │
│   stdev │          5 │       1 │         1 │          0 │            1 │                 2 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (4, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2173 │    1194 │      1115 │         77 │          463 │                18 │      892 │         54 │
│  median │       2187 │    1194 │      1116 │         78 │          463 │                19 │      893 │         55 │
│ maximum │       2217 │    1208 │      1130 │         78 │          463 │                19 │      893 │         55 │
│   stdev │         22 │       8 │         8 │          1 │            0 │                 1 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (8, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2191 │    1212 │      1149 │         63 │          539 │                19 │      901 │         55 │
│  median │       2199 │    1219 │      1156 │         63 │          540 │                20 │      901 │         55 │
│ maximum │       2201 │    1222 │      1157 │         65 │          540 │                20 │      901 │         56 │
│   stdev │          5 │       5 │         4 │          1 │            1 │                 1 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (16, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       3897 │    2916 │      2854 │         60 │         1423 │                20 │      902 │         75 │
│  median │       4012 │    3032 │      2972 │         62 │         1491 │                21 │      904 │         75 │
│ maximum │       4026 │    3106 │      3044 │         62 │         1511 │                22 │      904 │         77 │
│   stdev │         70 │      96 │        96 │          1 │           46 │                 1 │        1 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
```

This patch seems to improve performance at 16 GC threads (median GC time
drops from 3032 ms to 1524 ms), though we still see some negative scaling
for this benchmark beyond 8 GC threads (other benchmarks seem fine). I
didn't test whether this is the optimal struct layout for the queue in
this benchmark.

```
bench = "many_refs.jl"
(gcthreads, threads) = (1, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       3807 │    2822 │      2630 │        193 │          959 │                15 │      892 │         74 │
│  median │       3824 │    2823 │      2630 │        193 │          959 │                15 │      892 │         74 │
│ maximum │       4307 │    3194 │      2974 │        220 │         1086 │                19 │      892 │         74 │
│   stdev │        284 │     215 │       199 │         16 │           74 │                 2 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (2, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2459 │    1476 │      1349 │        126 │          488 │                19 │      892 │         60 │
│  median │       2460 │    1479 │      1352 │        127 │          488 │                21 │      893 │         60 │
│ maximum │       2770 │    1661 │      1526 │        135 │          570 │                22 │      893 │         60 │
│   stdev │        179 │     106 │       101 │          5 │           47 │                 1 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (4, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2215 │    1233 │      1155 │         77 │          485 │                20 │      892 │         53 │
│  median │       2320 │    1248 │      1171 │         78 │          492 │                21 │      893 │         54 │
│ maximum │       2372 │    1259 │      1176 │         83 │          493 │                22 │      893 │         56 │
│   stdev │         80 │      13 │        11 │          3 │            5 │                 1 │        1 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (8, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2204 │    1220 │      1156 │         64 │          544 │                23 │      901 │         53 │
│  median │       2362 │    1249 │      1181 │         64 │          553 │                23 │      901 │         53 │
│ maximum │       2374 │    1261 │      1196 │         68 │          558 │                25 │      901 │         55 │
│   stdev │         95 │      21 │        20 │          2 │            7 │                 1 │        0 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (16, 1)
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
[ Info: Setting GC memory pressure callback
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2502 │    1519 │      1458 │         62 │          721 │                22 │      902 │         58 │
│  median │       2511 │    1524 │      1461 │         63 │          728 │                23 │      903 │         61 │
│ maximum │       2664 │    1554 │      1486 │         68 │          741 │                25 │      905 │         61 │
│   stdev │         91 │      19 │        16 │          3 │           10 │                 1 │        2 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
```
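
For readers unfamiliar with the cache-line issue this message refers to, here is a minimal C11 sketch of the general technique: giving each contended atomic index of a work-stealing queue its own cache line so that the owner and the thieves do not invalidate each other's lines. The names `ws_indices_t`, `ws_approx_size`, and `CACHE_LINE_SIZE`, and the 64-byte line size, are assumptions for illustration only. This is not the struct layout from the patch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size for illustration; the real value is target-dependent. */
#define CACHE_LINE_SIZE 64

/* Hypothetical layout (not the actual Julia struct): each index is aligned
 * to its own cache line, so a write to one does not dirty the other's line. */
typedef struct {
    alignas(CACHE_LINE_SIZE) _Atomic(int64_t) top;    /* advanced by thieves */
    alignas(CACHE_LINE_SIZE) _Atomic(int64_t) bottom; /* advanced by the owner */
} ws_indices_t;

/* Compile-time checks that the two indices really sit on separate lines. */
static_assert(offsetof(ws_indices_t, bottom) - offsetof(ws_indices_t, top)
                  >= CACHE_LINE_SIZE,
              "top and bottom must not share a cache line");
static_assert(sizeof(ws_indices_t) == 2 * CACHE_LINE_SIZE,
              "struct should occupy exactly two cache lines");

/* Racy size estimate: reads both indices with relaxed ordering and clamps
 * at zero. Only a hint, since other threads may update either index. */
static inline int64_t ws_approx_size(ws_indices_t *q)
{
    int64_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed);
    int64_t t = atomic_load_explicit(&q->top, memory_order_relaxed);
    return b > t ? b - t : 0;
}
```

The trade-off in a layout like this is memory: every queue spends two full cache lines on two 8-byte indices, which is usually acceptable when there is one queue per GC thread.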

(cherry picked from commit 973b4ff)
KristofferC mentioned this pull request Mar 12, 2024
KristofferC added a commit that referenced this pull request Mar 17, 2024
Backported PRs:
- [x] #39071 <!-- Add a lazy `logrange` function and `LogRange` type -->
- [x] #51802 <!-- Allow AnnotatedStrings in log messages -->
- [x] #53369 <!-- Orthogonalize re-indexing for FastSubArrays -->
- [x] #48050 <!-- improve `--heap-size-hint` arg handling -->
- [x] #53482 <!-- add IR encoding for EnterNode -->
- [x] #53499 <!-- Avoid compiler warning about redefining jl_globalref_t
-->
- [x] #53507 <!-- update staled `Core.Compiler.Effects` documentation
-->
- [x] #53408 <!-- task splitting: change additive accumulation to
multiplicative -->
- [x] #53523 <!-- add back an alias for `check_top_bit` -->
- [x] #53377 <!-- add _readdirx for returning more object info gathered
during dir scan -->
- [x] #53525 <!-- fix InteractiveUtils call in Base.runtests on failure
-->
- [x] #53540 <!-- use more efficient `_readdirx` for tab completion -->
- [x] #53545 <!-- use `_readdirx` for `walkdir` -->
- [x] #53551 <!-- revert "Add @create_log_macro for making custom styled
logging macros (#52196)" -->
- [x] #53554 <!-- Always return a value in 1-d circshift! of
abstractarray.jl -->
- [x] #53424 <!-- yet more atomics & cache-line fixes on work-stealing
queue -->
- [x] #53571 <!-- Update Documenter to v1.3 for inventory writing -->
- [x] #53403 <!-- Move parallel precompilation to Base -->
- [x] #53589 <!-- add back `unsafe_convert` to pointer for arrays -->
- [x] #53596 <!-- build: remove extra .a file -->
- [x] #53606 <!-- fix error path in `precompilepkgs` -->
- [x] #53004 <!-- Unexport with, at_with, and ScopedValue from Base -->
- [x] #53629 <!-- typo fix in scoped values docs -->
- [x] #53630 <!-- sroa: Fix incorrect scope counting -->
- [x] #53598 <!-- Use Base parallel precompilation to build stdlibs -->
- [x] #53649 <!-- precompilepkgs: package in boths deps and weakdeps are
in fact only weak -->
- [x] #53671 <!-- Fix bootstrap Base precompile in cross compile
configuration -->
- [x] #52125 <!-- Load Pkg if not already to reinstate missing package
add prompt -->
- [x] #53602 <!-- Handle zero on arrays of unions of number types and
missings -->
- [x] #53516 <!-- permit NamedTuple{<:Any, Union{}} to be created -->
- [x] #53643 <!-- Bump CSL to 1.1.1 to fix libgomp bug -->
- [x] #53679 <!-- move precompile workload back from Base -->
- [x] #53663 <!-- add isassigned methods for reinterpretarray -->
- [x] #53662 <!-- [REPL] fix incorrectly cleared line after completions
accepted -->
- [x] #53611 <!-- Linalg: matprod_dest for Diagonal and adjvec -->
- [x] #53659 <!-- fix #52025, re-allow all implicit pointer casts in
cconvert for Array -->
- [x] #53631 <!-- LAPACK: validate input parameters to throw informative
errors -->
- [x] #53628 <!-- Make some improvements to the Scoped Values
documentation. -->
- [x] #53655 <!-- Change tbaa of ptr_phi to tbaa_value  -->
- [x] #53391 <!-- Default to the medium code model in x86 linux -->
- [x] #53699 <!-- Move `isexecutable, isreadable, iswritable` to
`filesystem.jl` -->
- [x] #41232 <!-- Fix linear indexing for ReshapedArray if the parent
has offset axes -->
- [x] #53527 <!-- Enable analyzegc checks for try catch and fix found
issues -->
- [x] #52092 
- [x] #53682 <!-- Increase build precompilation -->
- [x] #53720 
- [x] #53553 <!-- typeintersect: fix `UnionAll` unaliasing bug caused by
innervars. -->

Contains multiple commits, manual intervention needed:
- [ ] #53305 <!-- Propagate inbounds in isassigned with CartesianIndex
indices -->

Non-merged PRs with backport label:
- [ ] #53736 <!-- fix literal-pow to return the right type when the base
is -1 -->
- [ ] #53707 <!-- Make ScopedValue public -->
- [ ] #53696 <!-- add invokelatest to on_done callback in bracketed
paste -->
- [ ] #53660 <!-- put Logging back in default sysimage -->
- [ ] #53509 <!-- revert moving "creating packages" from Pkg.jl -->
- [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} -->
- [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->
- [ ] #51928 <!-- Styled markdown, with a few tweaks -->
- [ ] #51816 <!-- User-themable stacktraces -->
- [ ] #51811 <!-- Make banner size depend on terminal size -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
KristofferC removed the "backport 1.11" label Mar 18, 2024
mkitti pushed a commit to mkitti/julia that referenced this pull request Apr 13, 2024
…#53424)

KristofferC added a commit that referenced this pull request Apr 22, 2024
Backported PRs:
- [x] #50759 <!-- Fix outdated usage of scrubbing for log test failures
-->
- [x] #51830 <!-- Add version string to sysimg triple -->
- [x] #53273 <!-- [REPL] Fix typo in using/import completion -->
- [x] #53499 <!-- Avoid compiler warning about redefining jl_globalref_t
-->
- [x] #53424 <!-- yet more atomics & cache-line fixes on work-stealing
queue -->
- [x] #53596 <!-- build: remove extra .a file -->
- [x] #53516 <!-- permit NamedTuple{<:Any, Union{}} to be created -->
- [x] #53643 <!-- Bump CSL to 1.1.1 to fix libgomp bug -->
- [x] #53655 <!-- Change tbaa of ptr_phi to tbaa_value  -->
- [x] #53391 <!-- Default to the medium code model in x86 linux -->
- [x] #53809 <!-- Add missing GC_POP() in emit_cfunction -->
- [x] #53961 <!-- `LazyString` in `LinearAlgebra.checksquare` error
message -->
- [x] #52913 <!-- Added docstring for Artifacts.jl -->
- [x] #53553 <!-- typeintersect: fix `UnionAll` unaliasing bug caused by
KristofferC mentioned this pull request May 8, 2024
KristofferC removed the "backport 1.10" label May 8, 2024
Drvi pushed a commit to RelationalAI/julia that referenced this pull request Jun 7, 2024
…#53424)

(cherry picked from commit 973b4ff)
LebedevRI added a commit to LebedevRI/julia that referenced this pull request Oct 25, 2024
… queue (JuliaLang#53424)"

Results in reliable `julia` segfault immediately upon startup.
I don't know what is going on or why it is crashing or what is wrong,
but this is very reproducible and bisection result is reliable.

Notably, does not seem to be happening in `master`.

See the following issue for more details:
Fixes JuliaLang#54560

This reverts commit 53180e4.
Labels
GC (Garbage collector), performance (Must go faster)