Improve jl_print_task_backtraces()
#50968
c812a9e to 857e51c
Why? They are almost never stopped when this code is running. Particularly in GDB, we should assume all threads are running, since that is the default.
857e51c to 525bcdb
Here's my understanding: When you attach Have I got this wrong? |
525bcdb to 470fe83
With |
Also, if you can't acquire the tid lock here, that means the contents of memory on that thread are in an inconsistent state, and attempts to unwind it from here may go badly.
Which required locks? I see We use this Completely open to any suggestions for improvement.
The tid lock (assuming you're talking about this line) cannot be acquired for sticky tasks running on other threads because their tids are never unset. That's the point of this patch.
The other major alternative to this PR, I think, would be to preempt all the other threads, exactly like the profiler does. Then, once they're all paused, we should be free to safely walk the stacks of all Tasks, right? But I think Kiran's question still stands: what locks would this stackwalking code be grabbing? Why should we need to preempt the threads inside Julia's runtime if we have already paused them from GDB? If there are any locks that this code is grabbing, can we use the new, unsafe
We use `jl_rec_backtrace()`, which tries to set the task's tid to the current thread before gathering the backtrace. This fails for tasks that are sticky to another thread, because their tid is never reset. However, for `jl_print_task_backtraces()` we aren't concerned about thread safety, since we assume that all threads are stopped, so we add a flag to `jl_rec_backtrace()` to ignore the task's tid. With this, `jl_print_task_backtraces()` should now only miss tasks that are currently executing on threads other than the calling thread.
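To make the behavior described above concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this PR and does not use Julia's real types: `demo_task_t`, `demo_task_init`, `demo_rec_backtrace`, the `all_tasks` parameter, and the result enum are all hypothetical names invented for illustration, and the actual stack walk is elided. It only models the decision of when a task's backtrace can be recorded.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a runtime task; NOT Julia's jl_task_t. */
typedef struct {
    _Atomic int16_t tid; /* owning thread id, or -1 when unowned */
    bool sticky;         /* sticky tasks keep their tid after yielding */
    bool running;        /* currently executing on its owning thread */
} demo_task_t;

typedef enum {
    BT_RECORDED,        /* backtrace could be gathered */
    BT_SKIPPED_OWNED,   /* tid belongs to another thread, flag not set */
    BT_SKIPPED_RUNNING  /* task is executing on another thread right now */
} bt_result_t;

/* Hypothetical helper to (re)initialize a demo task. */
static void demo_task_init(demo_task_t *t, int16_t tid, bool sticky, bool running)
{
    atomic_init(&t->tid, tid);
    t->sticky = sticky;
    t->running = running;
}

/*
 * Sketch of the tid check. `all_tasks` plays the role of the new flag
 * (an assumed name): when set, a tid owned by another thread is ignored,
 * on the assumption that every other thread is stopped.
 */
static bt_result_t demo_rec_backtrace(demo_task_t *t, int16_t self_tid, bool all_tasks)
{
    int16_t old = -1;
    /* Try to claim the task: succeeds only if nobody owns it. */
    bool claimed = atomic_compare_exchange_strong(&t->tid, &old, self_tid);
    if (!claimed && old != self_tid) {
        if (!all_tasks)
            return BT_SKIPPED_OWNED;   /* behavior without the flag */
        if (t->running)
            return BT_SKIPPED_RUNNING; /* still missed, even with the flag */
        /* Flag set and task suspended: proceed without claiming the tid. */
    }
    /* ... the saved context of the task would be unwound here ... */
    if (claimed)
        atomic_store(&t->tid, -1);     /* release only what we claimed */
    return BT_RECORDED;
}
```

In this model, a suspended sticky task whose tid is 3 is skipped when thread 0 calls `demo_rec_backtrace(&t, 0, false)`, but is recorded with `all_tasks` set to `true`, and its tid is left untouched. Whether ignoring the tid is actually safe depends on the other threads really being stopped, which is the point under discussion in this thread.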
470fe83 to 4dc3a34
Okay, I added some comments that make it clear that |
We discussed this some more -- this capability has already been very useful for us so we'd really like to improve it. How would you want to implement this capability @vtjnash, such that no tasks are missed?