-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
yet more atomics & cache-line fixes on work-stealing queue #53424
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Maybe we should do this here Lines 324 to 329 in e42ffa6
heap_target (not sure if it makes too much of a difference because I think we almost always load both of them) and here Lines 1401 to 1408 in 3e6ff3d
Also thos fetch_adds can probably be relaxed |
If the code is going to invalidate several fields, it is probably best for them to be in the same cache line, since then the update of the second one can be almost free (tagging along with the first one in the cache line lock) Anyways, this PR sounds good to me |
vchuravy
approved these changes
Feb 22, 2024
d-netto
force-pushed
the
dcn-false-sharing-fun
branch
from
February 22, 2024 20:49
cfa3be2
to
034e070
Compare
d-netto
added
backport 1.10
Change should be backported to the 1.10 release
backport 1.11
Change should be backported to release-1.11
labels
Feb 22, 2024
This was referenced Feb 26, 2024
Merged
KristofferC
added a commit
that referenced
this pull request
Feb 27, 2024
Backported PRs: - [x] #53205 <!-- Profile: add notes to `print()` docs --> - [x] #53170 <!-- Remove outdated discussion about externally changing module bindings --> - [x] #53228 <!-- SubArray: avoid invalid elimination of singleton indices --> - [x] #51361 <!-- code_warntype docs: more neutral reference to @code_warntype --> - [x] #50480 <!-- Document --heap-size-hint in Command-line Interface --> - [x] #53301 <!-- Fix typo in `Sys.total_memory` docstring. --> - [x] #53354 <!-- fix code coverage bug in tail position and `else` --> - [x] #53388 <!-- Fix documentation: thread pool of main thread --> - [x] #53429 <!-- Subtype: skip slow-path in `local_∀_∃_subtype` if inputs contain no ∃ typevar. --> - [x] #53437 <!-- Add debug variant of loader_trampolines.o --> Need manual backport: - [ ] #52505 <!-- fix alignment of emit_unbox_store copy --> - [ ] #53373 <!-- fix sysimage-native-code=no option with pkgimages --> - [ ] #53439 <!-- staticdata: fix assert from partially disabled native code --> Contains multiple commits, manual intervention needed: - [ ] #52913 <!-- Added docstring for Artifacts.jl --> - [ ] #53218 <!-- Fix interpreter_exec.jl test --> Non-merged PRs with backport label: - [ ] #53424 <!-- yet more atomics & cache-line fixes on work-stealing queue --> - [ ] #53125 <!-- coverage: count coverage where explicitly requested by inference only --> - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility --> - [ ] #51479 <!-- prevent code loading from lookin in the versioned environment when building Julia -->
KristofferC
added a commit
that referenced
this pull request
Mar 1, 2024
Backported PRs: - [x] #53361 <!-- 🤖 [master] Bump the SparseArrays stdlib from c9f7293 to cb602d7 --> - [x] #53300 <!-- allow external absint to hold custom data in `codeinst.inferred` --> - [x] #53342 <!-- Add `Base.wrap` to docs --> - [x] #53372 <!-- Silence warnings in `test/file.jl` --> - [x] #53357 <!-- 🤖 [master] Bump the Pkg stdlib from 6dd0e7c9e to 76070d295 --> - [x] #53373 <!-- fix sysimage-native-code=no option with pkgimages --> - [x] #53333 <!-- More consistent return value for annotations, take 2 --> - [x] #53354 <!-- fix code coverage bug in tail position and `else` --> - [x] #53407 <!-- fix sysimage-native-code=yes option --> - [x] #53388 <!-- Fix documentation: thread pool of main thread --> - [x] #53355 <!-- Fix synchronization issues on the GC scheduler. --> - [x] #53429 <!-- Subtype: skip slow-path in `local_∀_∃_subtype` if inputs contain no ∃ typevar. --> - [x] #53437 <!-- Add debug variant of loader_trampolines.o --> - [x] #53284 <!-- Add annotate! method for AnnotatedIOBuffer --> - [x] #53466 <!-- [MozillaCACerts_jll] Update to v2023-12-12 --> - [x] #53467 <!-- [LibGit2_jll] Update to v1.7.2 --> - [x] #53326 <!-- RFC: when loading code for internal purposes, load stdlib files directly, bypassing DEPOT_PATH, LOAD_PATH, and stale checks --> - [x] #53332 - [x] #53320 <!-- Add `Sys.isreadable, Sys.iswriteable`, update `ispath` --> - [x] #53476 Contains multiple commits, manual intervention needed: - [ ] #53285 <!-- Add update mechanism for Terminfo, and common user-alias data --> Non-merged PRs with backport label: - [ ] #53424 <!-- yet more atomics & cache-line fixes on work-stealing queue --> - [ ] #53408 <!-- task splitting: change additive accumulation to multiplicative --> - [ ] #53403 <!-- Move parallel precompilation to Base --> - [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` --> - [ ] #53391 <!-- Default to the medium code model in x86 linux --> - [ ] #53125 <!-- coverage: count coverage where explicitly requested by inference only --> - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility -->
60 tasks
KristofferC
pushed a commit
that referenced
this pull request
Mar 6, 2024
This regression at a large number of GC threads still shows up on a large aarch64 machine (80-core ARM Neoverse N1). ``` bench = "many_refs.jl" (gcthreads, threads) = (1, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3807 │ 2826 │ 2633 │ 193 │ 960 │ 13 │ 892 │ 74 │ │ median │ 3810 │ 2826 │ 2633 │ 193 │ 961 │ 15 │ 892 │ 74 │ │ maximum │ 3810 │ 2831 │ 2638 │ 193 │ 962 │ 27 │ 892 │ 74 │ │ stdev │ 2 │ 3 │ 3 │ 0 │ 1 │ 7 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (2, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2455 │ 1476 │ 1350 │ 126 │ 488 │ 20 │ 892 │ 60 │ │ median │ 2459 │ 1478 │ 1352 │ 126 │ 489 │ 23 │ 892 │ 60 │ │ maximum │ 2465 │ 1479 │ 1352 │ 126 │ 489 │ 23 │ 893 │ 60 │ │ stdev │ 5 │ 1 │ 1 │ 0 │ 1 │ 2 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (4, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2173 │ 1194 │ 1115 │ 77 │ 463 │ 18 │ 892 │ 54 │ │ median │ 2187 │ 1194 │ 1116 │ 78 │ 463 │ 19 │ 893 │ 55 │ │ maximum │ 2217 │ 1208 │ 1130 │ 78 │ 463 │ 19 │ 893 │ 55 │ │ stdev │ 22 │ 8 │ 8 │ 1 │ 0 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (8, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2191 │ 1212 │ 1149 │ 63 │ 539 │ 19 │ 901 │ 55 │ │ median │ 2199 │ 1219 │ 1156 │ 63 │ 540 │ 20 │ 901 │ 55 │ │ maximum │ 2201 │ 1222 │ 1157 │ 65 │ 540 │ 20 │ 901 │ 56 │ │ stdev │ 5 │ 5 │ 4 │ 1 │ 1 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (16, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3897 │ 2916 │ 2854 │ 60 │ 1423 │ 20 │ 902 │ 75 │ │ median │ 4012 │ 3032 │ 2972 │ 62 │ 1491 │ 21 │ 904 │ 75 │ │ maximum │ 4026 │ 3106 │ 3044 │ 62 │ 1511 │ 22 │ 904 │ 77 │ │ stdev │ 70 │ 96 │ 96 │ 1 │ 46 │ 1 │ 1 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ ``` This patch seems to improve performance at 16 GC threads though we still have some negative scaling for this benchmark beyond 8 GC threads (others seem fine). I didn't test whether this is indeed the optimal struct layout for the queue in this benchmark. ``` bench = "many_refs.jl" (gcthreads, threads) = (1, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3807 │ 2822 │ 2630 │ 193 │ 959 │ 15 │ 892 │ 74 │ │ median │ 3824 │ 2823 │ 2630 │ 193 │ 959 │ 15 │ 892 │ 74 │ │ maximum │ 4307 │ 3194 │ 2974 │ 220 │ 1086 │ 19 │ 892 │ 74 │ │ stdev │ 284 │ 215 │ 199 │ 16 │ 74 │ 2 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (2, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2459 │ 1476 │ 1349 │ 126 │ 488 │ 19 │ 892 │ 60 │ │ median │ 2460 │ 1479 │ 1352 │ 127 │ 488 │ 21 │ 893 │ 60 │ │ maximum │ 2770 │ 1661 │ 1526 │ 135 │ 570 │ 22 │ 893 │ 60 │ │ stdev │ 179 │ 106 │ 101 │ 5 │ 47 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (4, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2215 │ 1233 │ 1155 │ 77 │ 485 │ 20 │ 892 │ 53 │ │ median │ 2320 │ 1248 │ 1171 │ 78 │ 492 │ 21 │ 893 │ 54 │ │ maximum │ 2372 │ 1259 │ 1176 │ 83 │ 493 │ 22 │ 893 │ 56 │ │ stdev │ 80 │ 13 │ 11 │ 3 │ 5 │ 1 │ 1 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (8, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2204 │ 1220 │ 1156 │ 64 │ 544 │ 23 │ 901 │ 53 │ │ median │ 2362 │ 1249 │ 1181 │ 64 │ 553 │ 23 │ 901 │ 53 │ │ maximum │ 2374 │ 1261 │ 1196 │ 68 │ 558 │ 25 │ 901 │ 55 │ │ stdev │ 95 │ 21 │ 20 │ 2 │ 7 │ 1 │ 0 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (16, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2502 │ 1519 │ 1458 │ 62 │ 721 │ 22 │ 902 │ 58 │ │ median │ 2511 │ 1524 │ 1461 │ 63 │ 728 │ 23 │ 903 │ 61 │ │ maximum │ 2664 │ 1554 │ 1486 │ 68 │ 741 │ 25 │ 905 │ 61 │ │ stdev │ 91 │ 19 │ 16 │ 3 │ 10 │ 1 │ 2 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ ``` (cherry picked from commit 973b4ff)
KristofferC
pushed a commit
that referenced
this pull request
Mar 12, 2024
This regression at a large number of GC threads still shows up on a large aarch64 machine (80-core ARM Neoverse N1). ``` bench = "many_refs.jl" (gcthreads, threads) = (1, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3807 │ 2826 │ 2633 │ 193 │ 960 │ 13 │ 892 │ 74 │ │ median │ 3810 │ 2826 │ 2633 │ 193 │ 961 │ 15 │ 892 │ 74 │ │ maximum │ 3810 │ 2831 │ 2638 │ 193 │ 962 │ 27 │ 892 │ 74 │ │ stdev │ 2 │ 3 │ 3 │ 0 │ 1 │ 7 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (2, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2455 │ 1476 │ 1350 │ 126 │ 488 │ 20 │ 892 │ 60 │ │ median │ 2459 │ 1478 │ 1352 │ 126 │ 489 │ 23 │ 892 │ 60 │ │ maximum │ 2465 │ 1479 │ 1352 │ 126 │ 489 │ 23 │ 893 │ 60 │ │ stdev │ 5 │ 1 │ 1 │ 0 │ 1 │ 2 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (4, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2173 │ 1194 │ 1115 │ 77 │ 463 │ 18 │ 892 │ 54 │ │ median │ 2187 │ 1194 │ 1116 │ 78 │ 463 │ 19 │ 893 │ 55 │ │ maximum │ 2217 │ 1208 │ 1130 │ 78 │ 463 │ 19 │ 893 │ 55 │ │ stdev │ 22 │ 8 │ 8 │ 1 │ 0 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (8, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2191 │ 1212 │ 1149 │ 63 │ 539 │ 19 │ 901 │ 55 │ │ median │ 2199 │ 1219 │ 1156 │ 63 │ 540 │ 20 │ 901 │ 55 │ │ maximum │ 2201 │ 1222 │ 1157 │ 65 │ 540 │ 20 │ 901 │ 56 │ │ stdev │ 5 │ 5 │ 4 │ 1 │ 1 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (16, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3897 │ 2916 │ 2854 │ 60 │ 1423 │ 20 │ 902 │ 75 │ │ median │ 4012 │ 3032 │ 2972 │ 62 │ 1491 │ 21 │ 904 │ 75 │ │ maximum │ 4026 │ 3106 │ 3044 │ 62 │ 1511 │ 22 │ 904 │ 77 │ │ stdev │ 70 │ 96 │ 96 │ 1 │ 46 │ 1 │ 1 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ ``` This patch seems to improve performance at 16 GC threads though we still have some negative scaling for this benchmark beyond 8 GC threads (others seem fine). I didn't test whether this is indeed the optimal struct layout for the queue in this benchmark. ``` bench = "many_refs.jl" (gcthreads, threads) = (1, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3807 │ 2822 │ 2630 │ 193 │ 959 │ 15 │ 892 │ 74 │ │ median │ 3824 │ 2823 │ 2630 │ 193 │ 959 │ 15 │ 892 │ 74 │ │ maximum │ 4307 │ 3194 │ 2974 │ 220 │ 1086 │ 19 │ 892 │ 74 │ │ stdev │ 284 │ 215 │ 199 │ 16 │ 74 │ 2 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (2, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2459 │ 1476 │ 1349 │ 126 │ 488 │ 19 │ 892 │ 60 │ │ median │ 2460 │ 1479 │ 1352 │ 127 │ 488 │ 21 │ 893 │ 60 │ │ maximum │ 2770 │ 1661 │ 1526 │ 135 │ 570 │ 22 │ 893 │ 60 │ │ stdev │ 179 │ 106 │ 101 │ 5 │ 47 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (4, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2215 │ 1233 │ 1155 │ 77 │ 485 │ 20 │ 892 │ 53 │ │ median │ 2320 │ 1248 │ 1171 │ 78 │ 492 │ 21 │ 893 │ 54 │ │ maximum │ 2372 │ 1259 │ 1176 │ 83 │ 493 │ 22 │ 893 │ 56 │ │ stdev │ 80 │ 13 │ 11 │ 3 │ 5 │ 1 │ 1 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (8, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2204 │ 1220 │ 1156 │ 64 │ 544 │ 23 │ 901 │ 53 │ │ median │ 2362 │ 1249 │ 1181 │ 64 │ 553 │ 23 │ 901 │ 53 │ │ maximum │ 2374 │ 1261 │ 1196 │ 68 │ 558 │ 25 │ 901 │ 55 │ │ stdev │ 95 │ 21 │ 20 │ 2 │ 7 │ 1 │ 0 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (16, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2502 │ 1519 │ 1458 │ 62 │ 721 │ 22 │ 902 │ 58 │ │ median │ 2511 │ 1524 │ 1461 │ 63 │ 728 │ 23 │ 903 │ 61 │ │ maximum │ 2664 │ 1554 │ 1486 │ 68 │ 741 │ 25 │ 905 │ 61 │ │ stdev │ 91 │ 19 │ 16 │ 3 │ 10 │ 1 │ 2 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ ``` (cherry picked from commit 973b4ff)
KristofferC
added a commit
that referenced
this pull request
Mar 17, 2024
Backported PRs: - [x] #39071 <!-- Add a lazy `logrange` function and `LogRange` type --> - [x] #51802 <!-- Allow AnnotatedStrings in log messages --> - [x] #53369 <!-- Orthogonalize re-indexing for FastSubArrays --> - [x] #48050 <!-- improve `--heap-size-hint` arg handling --> - [x] #53482 <!-- add IR encoding for EnterNode --> - [x] #53499 <!-- Avoid compiler warning about redefining jl_globalref_t --> - [x] #53507 <!-- update staled `Core.Compiler.Effects` documentation --> - [x] #53408 <!-- task splitting: change additive accumulation to multiplicative --> - [x] #53523 <!-- add back an alias for `check_top_bit` --> - [x] #53377 <!-- add _readdirx for returning more object info gathered during dir scan --> - [x] #53525 <!-- fix InteractiveUtils call in Base.runtests on failure --> - [x] #53540 <!-- use more efficient `_readdirx` for tab completion --> - [x] #53545 <!-- use `_readdirx` for `walkdir` --> - [x] #53551 <!-- revert "Add @create_log_macro for making custom styled logging macros (#52196)" --> - [x] #53554 <!-- Always return a value in 1-d circshift! of abstractarray.jl --> - [x] #53424 <!-- yet more atomics & cache-line fixes on work-stealing queue --> - [x] #53571 <!-- Update Documenter to v1.3 for inventory writing --> - [x] #53403 <!-- Move parallel precompilation to Base --> - [x] #53589 <!-- add back `unsafe_convert` to pointer for arrays --> - [x] #53596 <!-- build: remove extra .a file --> - [x] #53606 <!-- fix error path in `precompilepkgs` --> - [x] #53004 <!-- Unexport with, at_with, and ScopedValue from Base --> - [x] #53629 <!-- typo fix in scoped values docs --> - [x] #53630 <!-- sroa: Fix incorrect scope counting --> - [x] #53598 <!-- Use Base parallel precompilation to build stdlibs --> - [x] #53649 <!-- precompilepkgs: package in boths deps and weakdeps are in fact only weak --> - [x] #53671 <!-- Fix bootstrap Base precompile in cross compile configuration --> - [x] #52125 <!-- Load Pkg if not already to reinstate missing package add prompt --> - [x] #53602 <!-- Handle zero on arrays of unions of number types and missings --> - [x] #53516 <!-- permit NamedTuple{<:Any, Union{}} to be created --> - [x] #53643 <!-- Bump CSL to 1.1.1 to fix libgomp bug --> - [x] #53679 <!-- move precompile workload back from Base --> - [x] #53663 <!-- add isassigned methods for reinterpretarray --> - [x] #53662 <!-- [REPL] fix incorrectly cleared line after completions accepted --> - [x] #53611 <!-- Linalg: matprod_dest for Diagonal and adjvec --> - [x] #53659 <!-- fix #52025, re-allow all implicit pointer casts in cconvert for Array --> - [x] #53631 <!-- LAPACK: validate input parameters to throw informative errors --> - [x] #53628 <!-- Make some improvements to the Scoped Values documentation. --> - [x] #53655 <!-- Change tbaa of ptr_phi to tbaa_value --> - [x] #53391 <!-- Default to the medium code model in x86 linux --> - [x] #53699 <!-- Move `isexecutable, isreadable, iswritable` to `filesystem.jl` --> - [x] #41232 <!-- Fix linear indexing for ReshapedArray if the parent has offset axes --> - [x] #53527 <!-- Enable analyzegc checks for try catch and fix found issues --> - [x] #52092 - [x] #53682 <!-- Increase build precompilation --> - [x] #53720 - [x] #53553 <!-- typeintersect: fix `UnionAll` unaliasing bug caused by innervars. --> Contains multiple commits, manual intervention needed: - [ ] #53305 <!-- Propagate inbounds in isassigned with CartesianIndex indices --> Non-merged PRs with backport label: - [ ] #53736 <!-- fix literal-pow to return the right type when the base is -1 --> - [ ] #53707 <!-- Make ScopedValue public --> - [ ] #53696 <!-- add invokelatest to on_done callback in bracketed paste --> - [ ] #53660 <!-- put Logging back in default sysimage --> - [ ] #53509 <!-- revert moving "creating packages" from Pkg.jl --> - [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} --> - [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` --> - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility --> - [ ] #51928 <!-- Styled markdown, with a few tweaks --> - [ ] #51816 <!-- User-themable stacktraces --> - [ ] #51811 <!-- Make banner size depend on terminal size --> - [ ] #51479 <!-- prevent code loading from lookin in the versioned environment when building Julia -->
KristofferC
removed
the
backport 1.11
Change should be backported to release-1.11
label
Mar 18, 2024
mkitti
pushed a commit
to mkitti/julia
that referenced
this pull request
Apr 13, 2024
…#53424) This regression at a large number of GC threads still shows up on a large aarch64 machine (80-core ARM Neoverse N1). ``` bench = "many_refs.jl" (gcthreads, threads) = (1, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3807 │ 2826 │ 2633 │ 193 │ 960 │ 13 │ 892 │ 74 │ │ median │ 3810 │ 2826 │ 2633 │ 193 │ 961 │ 15 │ 892 │ 74 │ │ maximum │ 3810 │ 2831 │ 2638 │ 193 │ 962 │ 27 │ 892 │ 74 │ │ stdev │ 2 │ 3 │ 3 │ 0 │ 1 │ 7 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (2, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2455 │ 1476 │ 1350 │ 126 │ 488 │ 20 │ 892 │ 60 │ │ median │ 2459 │ 1478 │ 1352 │ 126 │ 489 │ 23 │ 892 │ 60 │ │ maximum │ 2465 │ 1479 │ 1352 │ 126 │ 489 │ 23 │ 893 │ 60 │ │ stdev │ 5 │ 1 │ 1 │ 0 │ 1 │ 2 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (4, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2173 │ 1194 │ 1115 │ 77 │ 463 │ 18 │ 892 │ 54 │ │ median │ 2187 │ 1194 │ 1116 │ 78 │ 463 │ 19 │ 893 │ 55 │ │ maximum │ 2217 │ 1208 │ 1130 │ 78 │ 463 │ 19 │ 893 │ 55 │ │ stdev │ 22 │ 8 │ 8 │ 1 │ 0 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (8, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2191 │ 1212 │ 1149 │ 63 │ 539 │ 19 │ 901 │ 55 │ │ median │ 2199 │ 1219 │ 1156 │ 63 │ 540 │ 20 │ 901 │ 55 │ │ maximum │ 2201 │ 1222 │ 1157 │ 65 │ 540 │ 20 │ 901 │ 56 │ │ stdev │ 5 │ 5 │ 4 │ 1 │ 1 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (16, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3897 │ 2916 │ 2854 │ 60 │ 1423 │ 20 │ 902 │ 75 │ │ median │ 4012 │ 3032 │ 2972 │ 62 │ 1491 │ 21 │ 904 │ 75 │ │ maximum │ 4026 │ 3106 │ 3044 │ 62 │ 1511 │ 22 │ 904 │ 77 │ │ stdev │ 70 │ 96 │ 96 │ 1 │ 46 │ 1 │ 1 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ ``` This patch seems to improve performance at 16 GC threads though we still have some negative scaling for this benchmark beyond 8 GC threads (others seem fine). I didn't test whether this is indeed the optimal struct layout for the queue in this benchmark. ``` bench = "many_refs.jl" (gcthreads, threads) = (1, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3807 │ 2822 │ 2630 │ 193 │ 959 │ 15 │ 892 │ 74 │ │ median │ 3824 │ 2823 │ 2630 │ 193 │ 959 │ 15 │ 892 │ 74 │ │ maximum │ 4307 │ 3194 │ 2974 │ 220 │ 1086 │ 19 │ 892 │ 74 │ │ stdev │ 284 │ 215 │ 199 │ 16 │ 74 │ 2 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (2, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2459 │ 1476 │ 1349 │ 126 │ 488 │ 19 │ 892 │ 60 │ │ median │ 2460 │ 1479 │ 1352 │ 127 │ 488 │ 21 │ 893 │ 60 │ │ maximum │ 2770 │ 1661 │ 1526 │ 135 │ 570 │ 22 │ 893 │ 60 │ │ stdev │ 179 │ 106 │ 101 │ 5 │ 47 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (4, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2215 │ 1233 │ 1155 │ 77 │ 485 │ 20 │ 892 │ 53 │ │ median │ 2320 │ 1248 │ 1171 │ 78 │ 492 │ 21 │ 893 │ 54 │ │ maximum │ 2372 │ 1259 │ 1176 │ 83 │ 493 │ 22 │ 893 │ 56 │ │ stdev │ 80 │ 13 │ 11 │ 3 │ 5 │ 1 │ 1 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (8, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2204 │ 1220 │ 1156 │ 64 │ 544 │ 23 │ 901 │ 53 │ │ median │ 2362 │ 1249 │ 1181 │ 64 │ 553 │ 23 │ 901 │ 53 │ │ maximum │ 2374 │ 1261 │ 1196 │ 68 │ 558 │ 25 │ 901 │ 55 │ │ stdev │ 95 │ 21 │ 20 │ 2 │ 7 │ 1 │ 0 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (16, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2502 │ 1519 │ 1458 │ 62 │ 721 │ 22 │ 902 │ 58 │ │ median │ 2511 │ 1524 │ 1461 │ 63 │ 728 │ 23 │ 903 │ 61 │ │ maximum │ 2664 │ 1554 │ 1486 │ 68 │ 741 │ 25 │ 905 │ 61 │ │ stdev │ 91 │ 19 │ 16 │ 3 │ 10 │ 1 │ 2 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ ```
KristofferC
added a commit
that referenced
this pull request
Apr 22, 2024
Backported PRs: - [x] #50759 <!-- Fix outdated usage of scrubbing for log test failures --> - [x] #51830 <!-- Add version string to sysimg triple --> - [x] #53273 <!-- [REPL] Fix typo in using/import completion --> - [x] #53499 <!-- Avoid compiler warning about redefining jl_globalref_t --> - [x] #53424 <!-- yet more atomics & cache-line fixes on work-stealing queue --> - [x] #53596 <!-- build: remove extra .a file --> - [x] #53516 <!-- permit NamedTuple{<:Any, Union{}} to be created --> - [x] #53643 <!-- Bump CSL to 1.1.1 to fix libgomp bug --> - [x] #53655 <!-- Change tbaa of ptr_phi to tbaa_value --> - [x] #53391 <!-- Default to the medium code model in x86 linux --> - [x] #53809 <!-- Add missing GC_POP() in emit_cfunction --> - [x] #53961 <!-- `LazyString` in `LinearAlgebra.checksquare` error message --> - [x] #52913 <!-- Added docstring for Artifacts.jl --> - [x] #53553 <!-- typeintersect: fix `UnionAll` unaliasing bug caused by
KristofferC
removed
the
backport 1.10
Change should be backported to the 1.10 release
label
May 8, 2024
Drvi
pushed a commit
to RelationalAI/julia
that referenced
this pull request
Jun 7, 2024
…#53424) This regression at a large number of GC threads still shows up on a large aarch64 machine (80-core ARM Neoverse N1). ``` bench = "many_refs.jl" (gcthreads, threads) = (1, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3807 │ 2826 │ 2633 │ 193 │ 960 │ 13 │ 892 │ 74 │ │ median │ 3810 │ 2826 │ 2633 │ 193 │ 961 │ 15 │ 892 │ 74 │ │ maximum │ 3810 │ 2831 │ 2638 │ 193 │ 962 │ 27 │ 892 │ 74 │ │ stdev │ 2 │ 3 │ 3 │ 0 │ 1 │ 7 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (2, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2455 │ 1476 │ 1350 │ 126 │ 488 │ 20 │ 892 │ 60 │ │ median │ 2459 │ 1478 │ 1352 │ 126 │ 489 │ 23 │ 892 │ 60 │ │ maximum │ 2465 │ 1479 │ 1352 │ 126 │ 489 │ 23 │ 893 │ 60 │ │ stdev │ 5 │ 1 │ 1 │ 0 │ 1 │ 2 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (4, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2173 │ 1194 │ 1115 │ 77 │ 463 │ 18 │ 892 │ 54 │ │ median │ 2187 │ 1194 │ 1116 │ 78 │ 463 │ 19 │ 893 │ 55 │ │ maximum │ 2217 │ 1208 │ 1130 │ 78 │ 463 │ 19 │ 893 │ 55 │ │ stdev │ 22 │ 8 │ 8 │ 1 │ 0 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (8, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2191 │ 1212 │ 1149 │ 63 │ 539 │ 19 │ 901 │ 55 │ │ median │ 2199 │ 1219 │ 1156 │ 63 │ 540 │ 20 │ 901 │ 55 │ │ maximum │ 2201 │ 1222 │ 1157 │ 65 │ 540 │ 20 │ 901 │ 56 │ │ stdev │ 5 │ 5 │ 4 │ 1 │ 1 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (16, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3897 │ 2916 │ 2854 │ 60 │ 1423 │ 20 │ 902 │ 75 │ │ median │ 4012 │ 3032 │ 2972 │ 62 │ 1491 │ 21 │ 904 │ 75 │ │ maximum │ 4026 │ 3106 │ 3044 │ 62 │ 1511 │ 22 │ 904 │ 77 │ │ stdev │ 70 │ 96 │ 96 │ 1 │ 46 │ 1 │ 1 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ ``` This patch seems to improve performance at 16 GC threads though we still have some negative scaling for this benchmark beyond 8 GC threads (others seem fine). I didn't test whether this is indeed the optimal struct layout for the queue in this benchmark. ``` bench = "many_refs.jl" (gcthreads, threads) = (1, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 3807 │ 2822 │ 2630 │ 193 │ 959 │ 15 │ 892 │ 74 │ │ median │ 3824 │ 2823 │ 2630 │ 193 │ 959 │ 15 │ 892 │ 74 │ │ maximum │ 4307 │ 3194 │ 2974 │ 220 │ 1086 │ 19 │ 892 │ 74 │ │ stdev │ 284 │ 215 │ 199 │ 16 │ 74 │ 2 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (2, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2459 │ 1476 │ 1349 │ 126 │ 488 │ 19 │ 892 │ 60 │ │ median │ 2460 │ 1479 │ 1352 │ 127 │ 488 │ 21 │ 893 │ 60 │ │ maximum │ 2770 │ 1661 │ 1526 │ 135 │ 570 │ 22 │ 893 │ 60 │ │ stdev │ 179 │ 106 │ 101 │ 5 │ 47 │ 1 │ 0 │ 0 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (4, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2215 │ 1233 │ 1155 │ 77 │ 485 │ 20 │ 892 │ 53 │ │ median │ 2320 │ 1248 │ 1171 │ 78 │ 492 │ 21 │ 893 │ 54 │ │ maximum │ 2372 │ 1259 │ 1176 │ 83 │ 493 │ 22 │ 893 │ 56 │ │ stdev │ 80 │ 13 │ 11 │ 3 │ 5 │ 1 │ 1 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (8, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2204 │ 1220 │ 1156 │ 64 │ 544 │ 23 │ 901 │ 53 │ │ median │ 2362 │ 1249 │ 1181 │ 64 │ 553 │ 23 │ 901 │ 53 │ │ maximum │ 2374 │ 1261 │ 1196 │ 68 │ 558 │ 25 │ 901 │ 55 │ │ stdev │ 95 │ 21 │ 20 │ 2 │ 7 │ 1 │ 0 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ (gcthreads, threads) = (16, 1) [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback [ Info: Setting GC memory pressure callback ┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐ │ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │ │ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │ ├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤ │ minimum │ 2502 │ 1519 │ 1458 │ 62 │ 721 │ 22 │ 902 │ 58 │ │ median │ 2511 │ 1524 │ 1461 │ 63 │ 728 │ 23 │ 903 │ 61 │ │ maximum │ 2664 │ 1554 │ 1486 │ 68 │ 741 │ 25 │ 905 │ 61 │ │ stdev │ 91 │ 19 │ 16 │ 3 │ 10 │ 1 │ 2 │ 1 │ └─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘ ``` (cherry picked from commit 973b4ff)
LebedevRI
added a commit
to LebedevRI/julia
that referenced
this pull request
Oct 25, 2024
… queue (JuliaLang#53424)" Results in reliable `julia` segfault immediately upon startup. I don't know what is going on or why it is crashing or what is wrong, but this is very reproducible and bisection result is reliable. Notably, does not seem to be happening in `master`. See the following issue for more details: Fixes JuliaLang#54560 This reverts commit 53180e4.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This regression at a large number of GC threads still shows up on a large aarch64 machine (80-core ARM Neoverse N1).
This patch seems to improve performance at 16 GC threads though we still have some negative scaling for this benchmark beyond 8 GC threads (others seem fine). I didn't test whether this is indeed the optimal struct layout for the queue in this benchmark.