-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GC safepoint and transition support #14190
Conversation
2bf2e16
to
da7b6e0
Compare
da7b6e0
to
39178b6
Compare
JL_DLLEXPORT void jl_gc_collect(int full) | ||
{ | ||
if (!is_gc_enabled || jl_in_gc) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be a statepoint here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Multiple threads entering the GC is handled separately below to avoid running the GC multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didn't realize jl_in_gc was thread-local now. it seems like that flag should never be true here now in that case?
to return to the original purpose of this flag, it should perhaps be renamed jl_in_finalizers
and moved to only protecting the finalizer queue from task switches? the original intent of this flag was to disallow finalizers from descheduling the thread and switching to running other tasks at unexpected yield points (any allocation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be true in finalizer.
it should perhaps be renamed
jl_in_finalizers
Yeah, that's effectively what it is for now.
This is great. I think we might be able to merge this fairly soon, since even without codegen support it appears to be strictly better than what we have now, where there is effectively a single GC statepoint (plus it doesn't work). |
// Add two arguments, one for the default value of `old_state`, one to satisfy | ||
// ISO C++11 standard. | ||
#define jl_gc_state_set_and_save(...) \ | ||
jl_gc_state_set_and_save__(__VA_ARGS__, 0, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Variadic macros are a new feature in C99. GNU CPP has supported them for a long time, but only with a named variable argument (‘args...’, not ‘...’ and __VA_ARGS__). If you are concerned with portability to previous versions of GCC, you should use only named variable arguments. On the other hand, if you are concerned with portability to other conforming implementations of C99, you should use only __VA_ARGS__."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only question is what does MSVC support and IIRC it supports the __VA_ARGS__
version in c++ mode. @tkelman
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in the end this is just some trick to avoid having two names so can easily be removed if we can't support on all compilers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d:/code/msys64/home/Tony/julia/src/gc.c(2480) : error C2660: 'jl_gc_state_set_an
d_save_' : function does not take 3 arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... fine... I can just rename it.........
Yes, it should be mostly safe to merge in it's current state. The missing part for the state point is OSX support and I'm hoping someone more familiar with x64 calling convension and OSX signal handling can implement that ;-). I'm currently working on settling on a GC state transition convention and make sure the the runtime follows that convension (which is time consuming since I need to go through all the functions in As I mentioned in other threads, there's a few other GC fixes that are kind of independent with this PR including locking for write barrier and scanning thread stack correctly. Those should be relatively easy to fix, I'll probably submit a PR for those this weekend but anyone else should feel free to fix those. |
IIUC, the purpose of the GC state transitions is to allow GC inside unmanaged code? I don't really like the idea of needing to insert lots of explicit state transitions all over. I would rather interrupt threads asynchronously and do conservative register scanning. |
// TODO: before we support getting return value from | ||
// the work, we don't need to enter GC managed region | ||
// when starting the work. | ||
int8_t state = jl_gc_managed_enter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this state
be added to the exception state (jl_enter / jl_leave / jl_rethrow)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is on my uncommitted local changes.
i believe the right approach is to return |
I agree that could be simpler if we have a conservative scan already. (The current approach isn't too bad though, and it's nice to see that some functions can actually be run simultaneously with the GC). More importantly, I think it is still possible to pause a thread at a point that can cause corruption. (in the GC allocation functions, for example). I'm also not totally convinced that the GC can have a consistent view of the object if we pause the thread at any time. (e.g. what if we pause a thread in the middle of a buffer |
Another thing I'm worrying is that what would happen if the thread haven't got a change to rerun the segfault'd load before the next GC is triggered? |
it should re-trigger the statepoint, which would be fine. there might be a race condition with testing and setting |
Will this, for example, require switching gc states around every ccall? I would not be ok with that. I think we could stage this by first just merging the new gc waiting mechanism, with minimal state points: only in jl_gc_collect and when entering or exiting threaded regions. Then the statepoints can be slow, we can leave the trickier parts for later, and in the meantime threading will be much more stable. |
No
Me neither. The plan is to run in unmanaged mode by default and only add transitions around critical regions (write barrier and gc frame store) and then merge the critical regions if there's no code in between that has to be running in unmanaged mode (edit: The worst case is of course if you have alternating code that need to be managed and unmanaged and if the source of the store is actually cheap (in which case the conservative scan should perform well). I did a small benchmark to check what the overhead would be with the code below (note that in this case we can hoist the transition out of the loop and only leave the safepoint check in the loop but it should be the worst case we could ever have). The timing on my laptop shows that type Obj{T}
a::T
b::T
end
function f1(n, p1, p2, obj, a, b)
for i in 1:n
obj.a = a
# obj.b = b
end
end
function f2(n, p1, p2, obj, a, b)
for i in 1:n
Base.llvmcall("""
store volatile i8 3, i8* %0
%3 = load volatile i64, i64* %1
ret void
""", Void, Tuple{Ptr{Int8},Ptr{Int64}},
Ptr{Int8}(p1), Ptr{Int64}(p2))
obj.a = a
# obj.b = b
Base.llvmcall("""
store volatile i8 0, i8* %0
ret void
""", Void, Tuple{Ptr{Int8}}, Ptr{Int8}(p1))
end
end
using Benchmarks
a = Ref(1)
b = Ref(1)
obj = Obj(a, b)
b1 = Ref(1)
b2 = Ref(1)
p1 = pointer_from_objref(b1)
p2 = pointer_from_objref(b2)
@show @benchmark f1(10, p1, p2, obj, a, b)
@show @benchmark f2(10, p1, p2, obj, a, b) edit: |
Thanks, I understand it better now. |
486f196
to
635b995
Compare
80b14f3
to
3b00447
Compare
Rebased on top of #14501 . Also fixes other TLS references in the signal handling thread on OSX. |
@@ -194,7 +194,9 @@ class JuliaJITEventListener: public JITEventListener | |||
virtual void NotifyFunctionEmitted(const Function &F, void *Code, | |||
size_t Size, const EmittedFunctionDetails &Details) | |||
{ | |||
int8_t gc_state = jl_gc_state_save_and_set(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 3
should have a name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also for the 1
.
I also change it to 2
instead since I don't think we need it to be a bit flag (which was why I use 3
instead of 2
to begin with).
428d883
to
ac55af5
Compare
Also fix other TLS references on the signal handling thread for OSX. Add comment to the functions it uses so that we don't easily break it.
Rename `jl_in_gc` to `jl_in_finalizer` Allow running GC in finalizers
ac55af5
to
11ff232
Compare
11ff232
to
eed7fea
Compare
Rebased and CI all pass. Given that there's only very minor comments and updates in the past two weeks and people have probably back to work now, I'll merge this sometime later this week. It is very hard to make further progress on threading without merging the current PR's. |
Yes, let's merge this. Amazing work @yuyichao. I have a couple more questions that are not urgent and don't block merging:
|
GC safepoint and transition support
I assumed that the tls buffer is 0 initialized since it's a static variable in C. Should we explicitly initialize that?
Right, that follows from what we use for our runtime lock. I don't like this busy loop either and I think both should be fixed (especially for the cases we are doing expensive stuffs in the runtime). FWIW, the pthread lock performs better than our spin lock on my laptop when I was benchmarking #14451. |
Yes I think it's clearer, just for the sake of reading the init code and wanting to know the initial state. |
Done #14565 |
This is a work in progress to make our GC a stop the world GC rather than a wait for the world (and maybe dead lock?) GC (ok that probably doesn't sound too exciting...).
The first two commits are #14181 which I made as a general improvement to our finalizers handling while trying to figure out how to synchronize finalizers and the GC with threading for this PR.(merged)This is still missing the very important GC transition and safepoint support in codegen but I'm opening this as RFC now to make sure that the GC synchronization schematics I'm following makes sense.
I described the schmatics in the comment in
gc.c
. To summarize, we need GC safe point and GC transitions so that the GC doesn't have to wait for other threads forever (especially if the thread doing the GC is holding a lock). The safepoint is implemented using a volatile load from a special page which wemprotect
toPROT_NONE
during a GC and catch the SegFault in the signal handler. (This is similar to how the Java HotSpot GC does it.) The GC also makes sure (with a combination of lock and safe point) that no other thread can run the GC at the same time or modifying GC related data structures when the GC is running.TODO/request for help:
The segfault handler for OSX is missing. It seems that I can't just wait in the signal handler since it's running on a different thread processing the messages from the kernel in series and it is necessary to do some low level stuff to make sure the thread can return to it's previous state after waiting for the GC to finish. Doing this on a platform that I can't test locally is beyond what I can confidently do so I need help from someone who's more familiar with this to implement it.
Codegen support:
I'm currently planning to emit a GC transition around each critical region (store to object or stack) and allow the GC to run everywhere else. We can do a optimization pass to merge unnecessary GC transitions after the whole function is generated. I'm still deciding what I can assume about GC awareness for a function call and what's the convension of changing the GC state in a function. Note that this shouldn't require the LLVM statepoint support so it should work on LLVM 3.3 as well.
c.c. @vtjnash @kpamnany @carnaval @Keno
Close #13380
Close #14343
Close #10317
TODO for this PR:
managed
/unmanaged
tounsafe
/safe
(Still need to update the comment indone.gc.c
)Easier way to breakpoint on actual segfault(jl_critical_error
should be good enough)TODO that can be done in future PRs:
requires Threading compatible way for getting stack size and address #14473merged)