GC mark-loop rewrite #47292
Conversation
@nanosoldier
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
The IOError thing is a Pkg bug (which might hide some other real bug). Edit: actually, a rebase should probably update to a Pkg version where it is fixed.
Some minor comments from looking this over with you, but it looks good to me as a whole. Really solid work, Diogo.
```c
size_t elsize = ((jl_array_t *)ary8_parent)->elsize / sizeof(jl_value_t *);
#ifndef GC_VERIFY
// Decide whether need to chunk ary8
size_t nrefs = (ary8_end - ary8_begin) / elsize;
```
Should we multiply by (elem_end - elem_begin) for the number of pointers in each element?
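For context, here is a minimal sketch of the count in question. The names `count_refs` and `ptrs_per_elem` are hypothetical, not from the PR; this only illustrates the arithmetic the reviewer is asking about, with the suggested per-element pointer factor applied.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only, not the Julia runtime code. Given the byte
 * bounds of an object array's data region and the element size measured
 * in pointer-sized words, compute how many references the mark-loop would
 * scan, multiplied by the number of pointer fields per element (the
 * reviewer's suggested (elem_end - elem_begin) factor). */
static size_t count_refs(const char *begin, const char *end,
                         size_t elsize_words, size_t ptrs_per_elem)
{
    size_t nelems = (size_t)(end - begin) / (elsize_words * sizeof(void *));
    return nelems * ptrs_per_elem;
}
```

With the factor at 1 this reduces to the expression in the diff; with more than one pointer field per element, the counts differ.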
```c
// Initialize GC mark-queue
size_t init_size = (1 << 18);
gc_cache->data_stack = (jl_gc_mark_data_t *)malloc_s(init_size * sizeof(jl_gc_mark_data_t));
```
2 MB seems rather large (though 8k does seem a little small)
I think the chunk-queue shouldn't be large (the number of chunks grows with the depth of graphs of objarrays).
This is over 2 MB per thread though, which seems large
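The tradeoff being discussed can be sketched with a growable stack. The `mark_stack_t` type below is hypothetical (not the actual `jl_gc_markqueue_t`): a large initial allocation (such as `1 << 18` entries) costs memory per thread up front, while a small one (such as 8k) forces early reallocations, but geometric growth keeps push amortized O(1) either way.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal sketch of a growable mark stack; illustrative only. */
typedef struct {
    void **data;
    size_t len, cap;
} mark_stack_t;

static void ms_init(mark_stack_t *s, size_t cap)
{
    s->data = malloc(cap * sizeof(void *));
    s->len = 0;
    s->cap = cap;
}

static void ms_push(mark_stack_t *s, void *v)
{
    if (s->len == s->cap) {
        s->cap *= 2; /* geometric growth: amortized O(1) pushes */
        s->data = realloc(s->data, s->cap * sizeof(void *));
    }
    s->data[s->len++] = v;
}
```

Starting small and doubling means a thread that never marks deeply never pays the multi-megabyte cost.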
(force-pushed from f594a8d to e82cd9d)
@vtjnash any idea of what could be causing the error in
Try annotating the
It is a heuristic evaluation tool, so any changes to the code can sometimes arbitrarily change what branches it considers and what errors it thinks it might have found. You can run it locally with
(force-pushed from c0cf1ed to 5bb5bab)
@nanosoldier
@nanosoldier
Hm, not sure why this didn't go through. Let's try again: @nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
(force-pushed from 80f71bb to be4b416)
I'm seeing some regressions after the rebase, so it's not good to merge yet.
NVM, I was running with obj pools disabled.
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Bump
```diff
@@ -117,6 +117,7 @@ typedef intptr_t ssize_t;
 #define LLT_FREE(x) free(x)

 #define STATIC_INLINE static inline
+#define FORCE_INLINE static inline __attribute__((always_inline))
```
This is already defined in `julia/src/support/MurmurHash3.c` (line 15 in d775750):

```c
#define FORCE_INLINE inline __attribute__((always_inline))
```
```
In file included from /home/mose/repo/julia/src/support/hashing.c:51:
/home/mose/repo/julia/src/support/MurmurHash3.c:15: warning: "FORCE_INLINE" redefined
   15 | #define FORCE_INLINE inline __attribute__((always_inline))
In file included from /home/mose/repo/julia/src/support/hashing.c:7:
/home/mose/repo/julia/src/support/dtypes.h:120: note: this is the location of the previous definition
  120 | #define FORCE_INLINE static inline __attribute__((always_inline))
```
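One common way to silence such a redefinition warning is to guard the macro so an earlier definition wins. This is a general sketch of the pattern, not the fix that was actually applied in the PR; the macro names follow the snippets in this thread.

```c
#include <assert.h>

/* First definition (as in dtypes.h in the thread above). */
#define FORCE_INLINE static inline __attribute__((always_inline))

/* A later file (like MurmurHash3.c) can guard its own definition so it
 * only takes effect when no prior definition exists; with the guard,
 * the compiler never sees two conflicting #defines. */
#ifndef FORCE_INLINE
#define FORCE_INLINE inline __attribute__((always_inline))
#endif

FORCE_INLINE int add1(int x) { return x + 1; }
```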
Previous work
Since #21590, the GC mark-loop was implemented by keeping two manually managed stacks, one of which contained iterator states used to keep track of the object currently being marked. As an example, to mark an array, we would pop the corresponding iterator state from the stack, iterate over the array until we found an unmarked reference, and, if we found one, update the iterator state (to record the index where we left off), "repush" it onto the stack, and proceed with marking the reference we just found.
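The resumable-iterator scheme described above can be sketched as follows. The types and names (`obj_t`, `iter_state_t`, `scan_resume`) are hypothetical simplifications, not the actual runtime structures:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified object: a mark bit plus an array of child references. */
typedef struct obj {
    int marked;
    size_t nchildren;
    struct obj **children;
} obj_t;

/* Iterator state: the object being scanned plus the resume index. */
typedef struct {
    obj_t *parent;
    size_t idx;
} iter_state_t;

/* Resume scanning parent->children at idx; on finding an unmarked child,
 * mark it, leave idx pointing just past it (the recorded resume point),
 * and return it so the caller can "repush" the state and mark the child.
 * Returns NULL once the array is exhausted. */
static obj_t *scan_resume(iter_state_t *st)
{
    while (st->idx < st->parent->nchildren) {
        obj_t *c = st->parent->children[st->idx++];
        if (c && !c->marked) {
            c->marked = 1;
            return c;
        }
    }
    return NULL;
}
```

The cost this PR removes is exactly that per-object bookkeeping: the `iter_state_t` records must themselves be stacked and repushed.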
This PR
This PR eliminates the need to keep iterator states by modifying the object graph traversal code. We keep a single stack of `jl_value_t *` currently being processed. To mark an object, we pop it from the stack, push all of its unmarked references onto the stack, and proceed with marking.

I believe this doesn't break any invariant of the generational GC. Indeed, the age bits are set after marking (if an object survived one GC cycle, it's moved to the old generation), so this new traversal scheme doesn't change whether an object has references to old objects or not. Furthermore, we must not update GC metadata for objects in the `remset`, and we ensure this by calling `gc_mark_outrefs` in `gc_queue_remset` with `meta_updated` set to 1.
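The new traversal can be sketched as below. The types (`node_t`) and the fixed-capacity stack are hypothetical simplifications of the real `jl_value_t` / mark-queue machinery; the point is that no iterator state is ever stored, and marking an object as it is pushed guarantees termination even on cyclic graphs:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified heap object: a mark bit plus outgoing references. */
typedef struct node {
    int marked;
    size_t nrefs;
    struct node **refs;
} node_t;

/* Single-stack mark-loop sketch: pop an object, push each still-unmarked
 * reference (marking it on push), repeat until the stack drains. The
 * caller must provide a stack large enough for the live frontier. */
static void mark_loop(node_t **stack, size_t len)
{
    while (len > 0) {
        node_t *obj = stack[--len];      /* pop */
        for (size_t i = 0; i < obj->nrefs; i++) {
            node_t *ref = obj->refs[i];
            if (ref && !ref->marked) {
                ref->marked = 1;         /* mark on push */
                stack[len++] = ref;      /* push unmarked reference */
            }
        }
    }
}
```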
Additional advantages

It's also possible to change the scheduling of the `jl_gc_markqueue_t` (from LIFO to FIFO, for example) without touching the mark-loop itself, which could enable further exploration of the GC in the future.

Since this PR changes the mark-loop graph traversal code, there are some changes to the heap-snapshot code, though I'm not familiar with that PR.
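The scheduling flexibility mentioned above can be illustrated with a toy queue. The `markqueue_t` type and functions here are hypothetical, not the real `jl_gc_markqueue_t` API: if the mark-loop only interacts with the queue through push and pop, swapping the pop policy switches between depth-first (LIFO) and breadth-first (FIFO) traversal without touching the loop:

```c
#include <assert.h>
#include <stddef.h>

/* Toy mark queue: [head, tail) holds pending objects in buf. */
typedef struct {
    void **buf;
    size_t head, tail;
} markqueue_t;

static void mq_push(markqueue_t *q, void *v) { q->buf[q->tail++] = v; }

/* LIFO pop: take the newest entry (depth-first traversal). */
static void *mq_pop_lifo(markqueue_t *q)
{
    return q->tail > q->head ? q->buf[--q->tail] : NULL;
}

/* FIFO pop: take the oldest entry (breadth-first traversal). */
static void *mq_pop_fifo(markqueue_t *q)
{
    return q->tail > q->head ? q->buf[q->head++] : NULL;
}
```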
Some benchmark results are here: https://hackmd.io/@Idnmfpb3SxK98-OsBtRD5A/H1V6QSzvs.