GC safepoint and transition support #14190

yuyichao · 2015-11-30T04:36:33Z

This is a work in progress to make our GC a stop the world GC rather than a wait for the world (and maybe dead lock?) GC (ok that probably doesn't sound too exciting...). ~~The first two commits are #14181 which I made as a general improvement to our finalizers handling while trying to figure out how to synchronize finalizers and the GC with threading for this PR.~~ (merged)

This is still missing the very important GC transition and safepoint support in codegen but I'm opening this as RFC now to make sure that the GC synchronization schematics I'm following makes sense.

I described the schmatics in the comment in gc.c. To summarize, we need GC safe point and GC transitions so that the GC doesn't have to wait for other threads forever (especially if the thread doing the GC is holding a lock). The safepoint is implemented using a volatile load from a special page which we mprotect to PROT_NONE during a GC and catch the SegFault in the signal handler. (This is similar to how the Java HotSpot GC does it.) The GC also makes sure (with a combination of lock and safe point) that no other thread can run the GC at the same time or modifying GC related data structures when the GC is running.

TODO/request for help:

The segfault handler for OSX is missing. It seems that I can't just wait in the signal handler since it's running on a different thread processing the messages from the kernel in series and it is necessary to do some low level stuff to make sure the thread can return to it's previous state after waiting for the GC to finish. Doing this on a platform that I can't test locally is beyond what I can confidently do so I need help from someone who's more familiar with this to implement it.
Codegen support:

I'm currently planning to emit a GC transition around each critical region (store to object or stack) and allow the GC to run everywhere else. We can do a optimization pass to merge unnecessary GC transitions after the whole function is generated. I'm still deciding what I can assume about GC awareness for a function call and what's the convension of changing the GC state in a function. Note that this shouldn't require the LLVM statepoint support so it should work on LLVM 3.3 as well.

c.c. @vtjnash @kpamnany @carnaval @Keno

Close #13380
Close #14343
Close #10317

TODO for this PR:

SegFault handler for OSX
Rename managed / unmanaged to unsafe / safe
Run finalizers after the GC is finished and allowing GC to run in finalizers ~~(Still need to update the comment in gc.c)~~ done.
~~Easier way to breakpoint on actual segfault~~ (jl_critical_error should be good enough)

TODO that can be done in future PRs:

Make sure the runtime satisfies the GC state transition requirements
Insert safepoints and state transitions in codegen
Optimization pass on state transitions
Fix GC debugging code (stack grub) (~~requires Threading compatible way for getting stack size and address #14473~~ merged)

vtjnash · 2015-12-03T22:01:53Z

src/gc.c

+JL_DLLEXPORT void jl_gc_collect(int full)
+{
+    if (!is_gc_enabled || jl_in_gc)
+        return;


should there be a statepoint here?

No. Multiple threads entering the GC is handled separately below to avoid running the GC multiple times.

i didn't realize jl_in_gc was thread-local now. it seems like that flag should never be true here now in that case?

to return to the original purpose of this flag, it should perhaps be renamed jl_in_finalizers and moved to only protecting the finalizer queue from task switches? the original intent of this flag was to disallow finalizers from descheduling the thread and switching to running other tasks at unexpected yield points (any allocation).

It could be true in finalizer.

it should perhaps be renamed jl_in_finalizers

Yeah, that's effectively what it is for now.

JeffBezanson · 2015-12-03T22:10:30Z

This is great. I think we might be able to merge this fairly soon, since even without codegen support it appears to be strictly better than what we have now, where there is effectively a single GC statepoint (plus it doesn't work).

vtjnash · 2015-12-03T22:15:12Z

src/julia.h

+// Add two arguments, one for the default value of `old_state`, one to satisfy
+// ISO C++11 standard.
+#define jl_gc_state_set_and_save(...)           \
+    jl_gc_state_set_and_save__(__VA_ARGS__, 0, 0)


"Variadic macros are a new feature in C99. GNU CPP has supported them for a long time, but only with a named variable argument (‘args...’, not ‘...’ and __VA_ARGS__). If you are concerned with portability to previous versions of GCC, you should use only named variable arguments. On the other hand, if you are concerned with portability to other conforming implementations of C99, you should use only __VA_ARGS__."

I think the only question is what does MSVC support and IIRC it supports the __VA_ARGS__ version in c++ mode. @tkelman

And in the end this is just some trick to avoid having two names so can easily be removed if we can't support on all compilers.

d:/code/msys64/home/Tony/julia/src/gc.c(2480) : error C2660: 'jl_gc_state_set_an d_save_' : function does not take 3 arguments

... fine... I can just rename it.........

yuyichao · 2015-12-03T22:17:23Z

Yes, it should be mostly safe to merge in it's current state. The missing part for the state point is OSX support and I'm hoping someone more familiar with x64 calling convension and OSX signal handling can implement that ;-).

I'm currently working on settling on a GC state transition convention and make sure the the runtime follows that convension (which is time consuming since I need to go through all the functions in src/).

As I mentioned in other threads, there's a few other GC fixes that are kind of independent with this PR including locking for write barrier and scanning thread stack correctly. Those should be relatively easy to fix, I'll probably submit a PR for those this weekend but anyone else should feel free to fix those.

JeffBezanson · 2015-12-03T22:25:28Z

IIUC, the purpose of the GC state transitions is to allow GC inside unmanaged code? I don't really like the idea of needing to insert lots of explicit state transitions all over. I would rather interrupt threads asynchronously and do conservative register scanning.

vtjnash · 2015-12-03T22:35:28Z

src/threading.c

+                // TODO: before we support getting return value from
+                //       the work, we don't need to enter GC managed region
+                //       when starting the work.
+                int8_t state = jl_gc_managed_enter();


should this state be added to the exception state (jl_enter / jl_leave / jl_rethrow)?

Yes, it is on my uncommitted local changes.

vtjnash · 2015-12-03T23:09:18Z

The segfault handler for OSX is missing. It seems that I can't just wait in the signal handler since it's running on a different thread processing the messages from the kernel in series and it is necessary to do some low level stuff to make sure the thread can return to it's previous state after waiting for the GC to finish. Doing this on a platform that I can't test locally is beyond what I can confidently do so I need help from someone who's more familiar with this to implement it.

i believe the right approach is to return MIG_NO_REPLY from catch_exception_raise (which will leave the thread unscheduled), then call thread_resume to continue running it later after gc is finished.

yuyichao · 2015-12-03T23:11:54Z

I agree that could be simpler if we have a conservative scan already. (The current approach isn't too bad though, and it's nice to see that some functions can actually be run simultaneously with the GC).

More importantly, I think it is still possible to pause a thread at a point that can cause corruption. (in the GC allocation functions, for example). I'm also not totally convinced that the GC can have a consistent view of the object if we pause the thread at any time. (e.g. what if we pause a thread in the middle of a buffer realloc). I'm probably a little pessimistic about this but I think it might be less code to add indeed but takes equal amount of effort to make sure we've covered all the cases (given how many GC roots we've missed in the past...)

yuyichao · 2015-12-03T23:15:24Z

Another thing I'm worrying is that what would happen if the thread haven't got a change to rerun the segfault'd load before the next GC is triggered?

vtjnash · 2015-12-03T23:19:39Z

it should re-trigger the statepoint, which would be fine. there might be a race condition with testing and setting jl_gc_running / ptls->state though

JeffBezanson · 2015-12-03T23:20:35Z

Will this, for example, require switching gc states around every ccall? I would not be ok with that.

I think we could stage this by first just merging the new gc waiting mechanism, with minimal state points: only in jl_gc_collect and when entering or exiting threaded regions. Then the statepoints can be slow, we can leave the trickier parts for later, and in the meantime threading will be much more stable.

yuyichao · 2015-12-04T00:00:38Z

Will this, for example, require switching gc states around every ccall?

No

I would not be ok with that.

Me neither.

The plan is to run in unmanaged mode by default and only add transitions around critical regions (write barrier and gc frame store) and then merge the critical regions if there's no code in between that has to be running in unmanaged mode (edit: ccall of function that doesn't have jl_value_t* argument or return value). This way we only do the expensive transition when there's more expensive operations (note that the source of the store are usually also expensive).

The worst case is of course if you have alternating code that need to be managed and unmanaged and if the source of the store is actually cheap (in which case the conservative scan should perform well). I did a small benchmark to check what the overhead would be with the code below (note that in this case we can hoist the transition out of the loop and only leave the safepoint check in the loop but it should be the worst case we could ever have).

The timing on my laptop shows that f1 is ~10% faster then f2 (no difference if I hoist the transition) when I'm only setting the field a. There's no measurable difference if I set both field.

type Obj{T}
    a::T
    b::T
end

function f1(n, p1, p2, obj, a, b)
    for i in 1:n
        obj.a = a
        # obj.b = b
    end
end

function f2(n, p1, p2, obj, a, b)
    for i in 1:n
        Base.llvmcall("""
                      store volatile i8 3, i8* %0
                      %3 = load volatile i64, i64* %1
                      ret void
                      """, Void, Tuple{Ptr{Int8},Ptr{Int64}},
                      Ptr{Int8}(p1), Ptr{Int64}(p2))
        obj.a = a
        # obj.b = b
        Base.llvmcall("""
                      store volatile i8 0, i8* %0
                      ret void
                      """, Void, Tuple{Ptr{Int8}}, Ptr{Int8}(p1))
    end
end

using Benchmarks

a = Ref(1)
b = Ref(1)
obj = Obj(a, b)
b1 = Ref(1)
b2 = Ref(1)
p1 = pointer_from_objref(b1)
p2 = pointer_from_objref(b2)

@show @benchmark f1(10, p1, p2, obj, a, b)
@show @benchmark f2(10, p1, p2, obj, a, b)

edit:
The value of the volatile store is actually wrong but that doesn't affect the performance.

JeffBezanson · 2015-12-04T00:10:46Z

Thanks, I understand it better now.

yuyichao · 2015-12-30T04:45:55Z

Rebased on top of #14501 . Also fixes other TLS references in the signal handling thread on OSX.

JeffBezanson · 2015-12-30T04:53:27Z

src/debuginfo.cpp

@@ -194,7 +194,9 @@ class JuliaJITEventListener: public JITEventListener
    virtual void NotifyFunctionEmitted(const Function &F, void *Code,
                                       size_t Size, const EmittedFunctionDetails &Details)
    {
+        int8_t gc_state = jl_gc_state_save_and_set(3);


This 3 should have a name.

Done. Also for the 1.

I also change it to 2 instead since I don't think we need it to be a bit flag (which was why I use 3 instead of 2 to begin with).

Also fix other TLS references on the signal handling thread for OSX. Add comment to the functions it uses so that we don't easily break it.

Add tests.

Rename `jl_in_gc` to `jl_in_finalizer` Allow running GC in finalizers

yuyichao · 2016-01-05T17:05:57Z

Rebased and CI all pass. Given that there's only very minor comments and updates in the past two weeks and people have probably back to work now, I'll merge this sometime later this week. It is very hard to make further progress on threading without merging the current PR's.

JeffBezanson · 2016-01-05T17:46:55Z

Yes, let's merge this. Amazing work @yuyichao.

I have a couple more questions that are not urgent and don't block merging:

Where is the initial gc state for the master thread set? I expected to see something in ti_initthread.
It looks like waiting for gc uses busy waiting? Can that be fixed at some point?

GC safepoint and transition support

yuyichao · 2016-01-05T18:41:44Z

Where is the initial gc state for the master thread set? I expected to see something in ti_initthread.

I assumed that the tls buffer is 0 initialized since it's a static variable in C. Should we explicitly initialize that?

It looks like waiting for gc uses busy waiting? Can that be fixed at some point?

Right, that follows from what we use for our runtime lock. I don't like this busy loop either and I think both should be fixed (especially for the cases we are doing expensive stuffs in the runtime). FWIW, the pthread lock performs better than our spin lock on my laptop when I was benchmarking #14451.

JeffBezanson · 2016-01-05T18:53:05Z

Should we explicitly initialize that?

Yes I think it's clearer, just for the sake of reading the init code and wanting to know the initial state.

yuyichao · 2016-01-05T18:59:12Z

Done #14565

yuyichao added multithreading Base.Threads and related functionality GC Garbage collector labels Nov 30, 2015

yuyichao force-pushed the yyc/gc/safepoint branch 4 times, most recently from 2bf2e16 to da7b6e0 Compare December 1, 2015 03:30

yuyichao mentioned this pull request Dec 1, 2015

DLLEXPORT clean up #14219

Merged

yuyichao force-pushed the yyc/gc/safepoint branch from da7b6e0 to 39178b6 Compare December 2, 2015 21:15

This was referenced Dec 3, 2015

Thread safety tracker #10421

Closed

Segfault in threading: complex numbers #13380

Closed

vtjnash reviewed Dec 3, 2015
View reviewed changes

yuyichao force-pushed the yyc/gc/safepoint branch 6 times, most recently from 486f196 to 635b995 Compare December 6, 2015 04:36

yuyichao force-pushed the yyc/gc/safepoint branch from 80b14f3 to 3b00447 Compare December 30, 2015 04:44

JeffBezanson reviewed Dec 30, 2015
View reviewed changes

yuyichao force-pushed the yyc/gc/safepoint branch 2 times, most recently from 428d883 to ac55af5 Compare December 30, 2015 15:33

yuyichao added 9 commits January 4, 2016 14:11

Match the behavior of the atomic intrinsics on different compilers.

ccb791a

GC safepoint and GC transition

d77fb75

Lock based safepoint for mach.

22c143a

Also fix other TLS references on the signal handling thread for OSX. Add comment to the functions it uses so that we don't easily break it.

Preserve GC state across task switch

7066920

Make sure debuginfo won't dead lock during GC

66a19bc

Avoid dead lock due to ti_threadgroup_join

4c33e29

Temporary fix to make sure GC while holding a lock will not dead lock.

7a06413

Add tests.

Thread safe jl_gc_enable

36d3445

Move execution of finalizes out of jl_gc_running = 1.

0a4187d

Rename `jl_in_gc` to `jl_in_finalizer` Allow running GC in finalizers

yuyichao force-pushed the yyc/gc/safepoint branch from ac55af5 to 11ff232 Compare January 4, 2016 19:11

Update debugging tips for gc safepoint and other minor changes.

eed7fea

yuyichao force-pushed the yyc/gc/safepoint branch from 11ff232 to eed7fea Compare January 5, 2016 13:45

JeffBezanson added a commit that referenced this pull request Jan 5, 2016

Merge pull request #14190 from JuliaLang/yyc/gc/safepoint

6cc48dc

GC safepoint and transition support

JeffBezanson merged commit 6cc48dc into master Jan 5, 2016

yuyichao deleted the yyc/gc/safepoint branch January 5, 2016 18:42

yuyichao mentioned this pull request Jan 5, 2016

Explicitly initialize GC state #14565

Merged

yuyichao mentioned this pull request Mar 25, 2016

Freeze in GC for multithreaded code #15620

Closed

vtjnash mentioned this pull request Mar 25, 2016

Possible Segfault With SIGINT And File IO #10990

Closed

yuyichao mentioned this pull request Aug 19, 2018

Fix gc rooting in exception throwing #28770

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GC safepoint and transition support #14190

GC safepoint and transition support #14190

yuyichao commented Nov 30, 2015

vtjnash Dec 3, 2015

yuyichao Dec 3, 2015

vtjnash Dec 3, 2015

yuyichao Dec 3, 2015

JeffBezanson commented Dec 3, 2015

vtjnash Dec 3, 2015

yuyichao Dec 3, 2015

yuyichao Dec 3, 2015

tkelman Dec 3, 2015

yuyichao Dec 3, 2015

yuyichao commented Dec 3, 2015

JeffBezanson commented Dec 3, 2015

vtjnash Dec 3, 2015

yuyichao Dec 3, 2015

vtjnash commented Dec 3, 2015

yuyichao commented Dec 3, 2015

yuyichao commented Dec 3, 2015

vtjnash commented Dec 3, 2015

JeffBezanson commented Dec 3, 2015

yuyichao commented Dec 4, 2015

JeffBezanson commented Dec 4, 2015

yuyichao commented Dec 30, 2015

JeffBezanson Dec 30, 2015

yuyichao Dec 30, 2015

yuyichao commented Jan 5, 2016

JeffBezanson commented Jan 5, 2016

yuyichao commented Jan 5, 2016

JeffBezanson commented Jan 5, 2016

yuyichao commented Jan 5, 2016

GC safepoint and transition support #14190

GC safepoint and transition support #14190

Conversation

yuyichao commented Nov 30, 2015

TODO/request for help:

TODO for this PR:

TODO that can be done in future PRs:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JeffBezanson commented Dec 3, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yuyichao commented Dec 3, 2015

JeffBezanson commented Dec 3, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vtjnash commented Dec 3, 2015

yuyichao commented Dec 3, 2015

yuyichao commented Dec 3, 2015

vtjnash commented Dec 3, 2015

JeffBezanson commented Dec 3, 2015

yuyichao commented Dec 4, 2015

JeffBezanson commented Dec 4, 2015

yuyichao commented Dec 30, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yuyichao commented Jan 5, 2016

JeffBezanson commented Jan 5, 2016

yuyichao commented Jan 5, 2016

JeffBezanson commented Jan 5, 2016

yuyichao commented Jan 5, 2016