Profile.Allocs: Add task and timestamp #44055

Merged: 3 commits merged into JuliaLang:master from jps/alloc-ns-ct on Feb 12, 2022

Conversation

jpsamaroo (Member) commented Feb 6, 2022

This makes it possible to track when and on what task an allocation occurred, similar to what we have for the time-based profiler.

EDIT: The motivation for this change is to support consumers like Dagger.jl, which will want to pair this data with its own collected events (which record nanosecond-resolution timestamps as well as task pointers) and task runtime data from #43453 to create a cohesive view of allocations within a distributed, multithreaded application.
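For illustration, a minimal sketch of how a consumer such as Dagger.jl could read the new data, assuming the `Alloc` records returned by `Profile.Allocs.fetch()` gain `task` and `timestamp` fields alongside the existing `type`, `stacktrace`, and `size` (the field names here are assumptions based on this PR):

```julia
using Profile

# Record every allocation (sample_rate=1) made while running the block.
Profile.Allocs.@profile sample_rate=1 begin
    Base.inferencebarrier(fill)(0.0, 100)  # inference barrier keeps the allocation from being optimized away
end

# Each record would now also carry the allocating task and a timestamp.
for a in Profile.Allocs.fetch().allocs
    println(a.size, " bytes on task ", a.task, " at t = ", a.timestamp)
end
```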

jpsamaroo (Member, Author) commented Feb 6, 2022

One thing I'm not sure about is whether embedding the task as a `Task` will cause issues due to the GC freeing tasks; I'd be happy to change it to a `Ptr{Cvoid}` if that's desired.

vchuravy (Member) commented Feb 6, 2022

Yeah, I think this is incorrect. I don't recall the precise code, but I think the allocs buffer is not GC-integrated and thus cannot contain mutable structs. @vilterp, correct me if I'm wrong.

jpsamaroo (Member, Author):

OK, the latest push switches to `Ptr{Cvoid}`.

vilterp (Contributor) commented Feb 6, 2022

Yep, @vchuravy is right; nothing retains those task objects. `Ptr{Cvoid}` should be good if all you want to do is distinguish different tasks, match tasks up to USDT probe outputs, etc. Just don't try to `unsafe_load` it into an actual `Task`. :)
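To make the distinction concrete, a sketch (assuming the field is exposed as `Alloc.task::Ptr{Cvoid}`): comparing the recorded pointers is fine, dereferencing them is not.

```julia
# Safe: compare against a task you still hold a reference to;
# pointer_from_objref yields the address jl_current_task had while
# that task was running.
my_task_ptr = Base.pointer_from_objref(current_task())
my_allocs = filter(a -> a.task == my_task_ptr, Profile.Allocs.fetch().allocs)

# Unsafe: the task behind a recorded pointer may already have been
# freed by the GC, so this can read freed memory:
# Base.unsafe_pointer_to_objref(a.task)  # don't do this
```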

Review thread on the changed allocation-recording call (the hunk under discussion):

```diff
-    size
+    size,
+    jl_current_task,
+    cycleclock()
```
tkf (Member):

Don't you need to use `jl_hrtime`, a.k.a. `Base.time_ns`? Since the task pointer does not uniquely identify the task (the GC can recycle it), you'd need to correlate the allocation event with the window defined by the task's creation and finish times to recover the true task identity. To join this data with the event data collected in the user-side Julia program (e.g., Dagger), it seems more useful to use the clock that is available in Julia. Also, https://en.wikipedia.org/wiki/Time_Stamp_Counter suggests that rdtsc, which cycleclock uses on x86_64, may not work as a global clock across different CPUs.

A bit of an aside, but I wonder if it would be better to have a "true" task ID that is guaranteed to be unique within a single process (e.g., prepare `const TASKIDS = [1:nthreads();]` and then issue `TASKIDS[threadid()] += nthreads()` in the task constructor in C; see the sketch below). Although it's a bit annoying to carry 64 more bits if you need them only for profiling...
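A Julia-level sketch of that ID scheme, to make the striding concrete (the names and the function are hypothetical; the real counter would live in the C task constructor, where no thread migration can occur between read and increment):

```julia
using Base.Threads: nthreads, threadid

# One counter per thread, seeded 1..nthreads(); each thread strides by
# nthreads(), so no two threads can ever hand out the same ID.
const TASKIDS = collect(1:nthreads())

function next_task_id()
    tid = threadid()
    id = TASKIDS[tid]
    TASKIDS[tid] += nthreads()
    return id
end
```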

vtjnash (Member):

Pointers do uniquely identify an object. That is the definition of a pointer

rdtsc is synced by the GC, but we could use the rdtscp version too if we must

jpsamaroo (Member, Author):

@vtjnash pointers don't uniquely identify objects that have been freed over a session's entire lifetime, since addresses can be reused; that's something I'll probably need to deal with eventually.

> rdtsc is synced by the GC, but we could use the rdtscp version too if we must

For my understanding, can you point out where this happens in the GC code?

For `jl_hrtime`: from a quick `@btime time_ns()`, it seems to take about 1 microsecond, which is not huge, but also not tiny.
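For reference, the comparison is easy to reproduce (BenchmarkTools assumed; the numbers are highly system-dependent):

```julia
using BenchmarkTools

@btime time_ns()                      # time_ns() is a thin wrapper...
@btime ccall(:jl_hrtime, UInt64, ())  # ...around the runtime's jl_hrtime
```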

jpsamaroo (Member, Author):

Also, I put cycleclock() in there because the runtime profiler uses it as well. Maybe that's OK for that profiler because we get the thread ID that the stack was recorded on? Although if rdtsc is per-CPU-core, it can be unreliable when a thread gets scheduled onto a different core (which I can account for with BPF probes, but it'd be nice not to have to do that).

vtjnash (Member):

That was the case decades ago for rdtsc, but the hardware has long since been corrected

tkf (Member) commented Feb 7, 2022

That's interesting. Does it mean that, with

```julia
ref = Threads.Atomic{UInt64}(0)
@spawn ref[] = cycleclock()
@spawn ok = ref[] <= cycleclock()
```

it is guaranteed that `ok === true` always, if appropriate fences are inserted?[^1] Also, does it provide a global clock even across CPU sockets?

Anyway, if cycleclock defines a global clock, I agree we can use it for this purpose. But I think we also need to have it at the Julia level so that Julia packages like Dagger can interpret the clock with the event information they collected.

> For `jl_hrtime`: from a quick `@btime time_ns()`, it seems to take about 1 microsecond, which is not huge, but also not tiny.

Is microsecond a typo? Or is it very system-dependent? It's about 30 ns for me (on amdci2).

[^1]: https://www.felixcloutier.com/x86/rdtscp and https://www.felixcloutier.com/x86/rdtsc mention when LFENCE and MFENCE are required.

jpsamaroo (Member, Author):

Maybe it's a musl or an AMD thing, but on my system:

```julia
julia> @benchmark time_ns()
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.145 μs …   8.388 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.467 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.481 μs ± 206.673 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▇▆█▆
  ▂▁▂▂▂▂▂▂▃▃▃▇▆████▇▇▃▄▃▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  1.15 μs         Histogram: frequency by time        2.37 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

tkf (Member):

Oh wow, that's rather slow. So maybe we need jl_cycleclock? It's nice to have a fast "logical" clock.
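For the record, a hypothetical sketch of what a Julia-level cycle counter could look like, using LLVM's `llvm.readcyclecounter` intrinsic via `llvmcall` (this is not an existing Base API; on x86_64 the intrinsic lowers to rdtsc):

```julia
# Hypothetical jl_cycleclock analogue: read the CPU cycle counter directly.
cycleclock() = Base.llvmcall(
    ("""
     declare i64 @llvm.readcyclecounter()

     define i64 @entry() #0 {
         %t = call i64 @llvm.readcyclecounter()
         ret i64 %t
     }

     attributes #0 = { alwaysinline }
     """, "entry"),
    UInt64, Tuple{})
```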

jpsamaroo (Member, Author):

Can we make a final decision on this? I'd like for this to make it into 1.8.

jpsamaroo (Member, Author):

Alternatively, maybe we can merge this as-is and consider changing the clock source later?

@vchuravy vchuravy merged commit bfc9431 into JuliaLang:master Feb 12, 2022
@jpsamaroo jpsamaroo deleted the jps/alloc-ns-ct branch February 12, 2022 21:57
DilumAluthge (Member) commented Feb 13, 2022

This broke analyzegc.

```
/cache/build/default-amdci5-2/julialang/julia-master/src/gc-alloc-profiler.cpp:140:9: error: Calling potential safepoint from function annotated JL_NOTSAFEPOINT [julia.GCChecker]
```

jpsamaroo (Member, Author) commented Feb 13, 2022

~~@DilumAluthge that analyzegc failure looks suspiciously unrelated (pointing to the Random stdlib)? We don't even call floor in the Allocs module.~~ Sorry, I was looking at the wrong failure.

antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request Feb 17, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022