Profile: Thread and task-specific profiling #41742
Added a heuristic to detect, flag, count and summarize idling as a % per-group. See [example output not preserved].
The same result with no grouping: [example output not preserved].
Another example that shows initial idle time on other threads for
function foo()
    x = rand(1000,1000)
    Threads.@threads for i in 1:100
        x * x
    end
end
|
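As a rough illustration of the per-group idle summary described above, here is a minimal sketch. It is not the code in Profile.jl: the function name `utilization_by_thread` and the `(threadid, is_idle)` input shape are assumptions made for this example only.

```julia
# Hypothetical helper: summarize utilization as a percentage per thread.
# `samples` is assumed to be a collection of (threadid, is_idle) pairs.
function utilization_by_thread(samples)
    total = Dict{Int,Int}()   # number of samples seen per thread
    busy = Dict{Int,Int}()    # number of non-idle samples per thread
    for (tid, idle) in samples
        total[tid] = get(total, tid, 0) + 1
        busy[tid] = get(busy, tid, 0) + (idle ? 0 : 1)
    end
    # Percentage of samples in which the thread was not flagged as idle
    return Dict(tid => 100 * busy[tid] / total[tid] for tid in keys(total))
end

samples = [(1, false), (1, false), (2, true), (2, false)]
util = utilization_by_thread(samples)
```

Grouping by task instead would only change the key used for the two dictionaries.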
I was surprised when I looked at the code just now and realized this as well. We should make this clear in the docs and maybe track it in a separate issue?
Might be best to split into two PRs? That change is orthogonal and less complicated |
Force-pushed from 6174b74 to af1e549.
Playing around with some keyword arguments I got the following error, when I tried to combine the groupby with the threads keyword.
julia> Profile.print(groupby = :thread, threads=[1,2])
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
ERROR: MethodError: no method matching mod(::Int64, ::Vector{Int64})
Closest candidates are:
mod(::Union{Int128, Int16, Int32, Int64, Int8}, ::Unsigned) at int.jl:272
mod(::Integer, ::Base.OneTo) at range.jl:1401
mod(::Integer, ::AbstractUnitRange{<:Integer}) at range.jl:1402
...
Stacktrace:
[1] _broadcast_getindex_evalf
@ ./broadcast.jl:670 [inlined]
[2] _broadcast_getindex
@ ./broadcast.jl:643 [inlined]
[3] getindex
@ ./broadcast.jl:597 [inlined]
[4] copy
@ ./broadcast.jl:943 [inlined]
[5] materialize
@ ./broadcast.jl:904 [inlined]
[6] _intersect
@ ~/Documents/Programmieren/julia/usr/share/julia/stdlib/v1.8/Profile/src/Profile.jl:225 [inlined]
[7] print(io::Base.TTY, data::Vector{UInt64}, lidict::Dict{UInt64, Vector{Base.StackTraces.StackFrame}}; format::Symbol, C::Bool, combine::Bool, maxdepth::Int64, mincount::Int64, noisefloor::Int64, sortedby::Symbol, groupby::Symbol, recur::Symbol, threads::Vector{Int64}, tasks::UnitRange{UInt64})
@ Profile ~/Documents/Programmieren/julia/usr/share/julia/stdlib/v1.8/Profile/src/Profile.jl:210
[8] print(data::Vector{UInt64}, lidict::Dict{UInt64, Vector{Base.StackTraces.StackFrame}}; kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:groupby, :threads), Tuple{Symbol, Vector{Int64}}}})
@ Profile ~/Documents/Programmieren/julia/usr/share/julia/stdlib/v1.8/Profile/src/Profile.jl:280
[9] top-level scope
@ REPL[34]:1
Also when I try to use
|
Once the suggestion in #41759 is added I'll use that. There wasn't an efficient intersect method for what I needed, so the dispatch is worse currently
Thanks for testing! I haven't done any testing of the flat format yet. I'll move onto that |
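The MethodError above comes from applying `mod` to an `Int64` and a `Vector{Int64}`. One way to select ids without that dispatch problem is a plain membership test. The sketch below assumes this approach; `select_ids` is a hypothetical name, not the `_intersect` helper from Profile.jl.

```julia
# Hypothetical helper: keep only the ids that were recorded and that the
# user asked for. `wanted` may be a single Int, a range, or a Vector.
function select_ids(recorded::AbstractVector{<:Integer}, wanted)
    wanted_set = Set(wanted)   # a single number iterates as one element
    return [id for id in unique(recorded) if id in wanted_set]
end

from_vector = select_ids([1, 2, 3, 2, 1], [1, 2])
from_range = select_ids([1, 2, 3], 2:3)
from_scalar = select_ids([1, 2, 3], 3)
```

Building the `Set` once keeps each lookup cheap, which matters when the recorded vector holds one id per sample.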
Force-pushed from 784bc7a to 3ff5012.
Force-pushed from 3ff5012 to ddf1cbf.
Force-pushed from 7131a46 to b8e9b7a.
I think this is ready for review. I've walked some of the wilder features back to the basics, and retained the default behavior to not split |
Force-pushed from 3e24577 to 52aff04.
Force-pushed from ee431d9 to 1ab855a.
Trying to get to the bottom of the issue on Linux32 as it seems real |
You can also compile for Linux 32bit on Linux 64 by setting the XC_HOST variable I think (whatever we use for mingw) |
It seems that on 32-bit Linux some of the ips are entered as 0x00000000. See the ??????? annotation in the dump below.
julia> Profile.clear()
julia> @profile busywait(1, 20)
julia> Profile.fetch()
ERROR: AssertionError: metadata stripping failed i=20 j=0 data[1:i]=UInt32[0xf7f39547, 0xf7f025cb, 0xf7af6a0c, 0xf79f9ae5, 0xe8694f8e, 0xe86c1831, 0xe8875ac0, 0xf799c4f7, 0xf79c4202, 0xf79f952c, 0xf7efb634, 0xf7e0c8f9, 0x00000004, 0xefb94010, 0x247e7c04, 0x00000002, 0x00000000, 0xf7f3a10c, 0xf7dca17e, 0xf7dca2e2]
Stacktrace:
[1] fetch(; include_meta::Bool)
@ Profile /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.8/Profile/src/Profile.jl:519
[2] fetch()
@ Profile /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.8/Profile/src/Profile.jl:494
[3] top-level scope
@ REPL[65]:1
julia> d = Profile.fetch(include_meta = true);
julia> for i in 1:length(d)-1
           if d[i + 1] == 0 && !in(d[i], [1,2])
               # the entry before a block end is the idle state, which can only be 1 or 2 (we add 1 to the state value to avoid zero)
               # so flag any zeros that don't match that pattern
               @info i
           end
       end
[ Info: 9875
[ Info: 36298
[ Info: 38375
[ Info: 62623
julia> d[9850:9900]
51-element Vector{UInt32}:
0xf79f952c
0xf7efb634
0xf7e0c8f9
0x00000002 # threadid
0xefb90010 # taskid
0x3bf946fc # cycleclock
0x00000002 # idle state (+1)
0x00000000 # block end
0xf7f39547
0xf7f025cb
0xf7af6a0c
0xf79f9ae5
0xe8694f8e
0xe86c1831
0xe8875ac0
0xf799c4f7
0xf79c4202
0xf79f952c
0xf7efb634
0xf7e0c8f9
0x00000004 # threadid
0xefb94010 # taskid
0x3bfa0ede # cycleclock
0x00000002 # idle state (+1)
0x00000000 # block end
0xf7dca235
0x00000000 # ???????
0x75c085c0
0x00000001 # threadid
0xefb08010 # taskid
0x3bfb1482 # cycleclock
0x00000001 # idle state (+1)
0x00000000 # block end
0xf7f39547
0xf7f025cb
0xf7af6a0c
0xf79f9ae5
0xe8694f8e
0xe86c1831
0xe8875ac0
0xf799c4f7
0xf79c4202
0xf79f952c
0xf7efb634
0xf7e0c8f9
0x00000002 # threadid
0xefb90010 # taskid
0x3c30e878 # cycleclock
0x00000002 # idle state (+1)
0x00000000 # block end
0xf7f3a10c
@vtjnash any idea what's happening here? Btw @vchuravy I couldn't figure this out
|
Force-pushed from 21a73b4 to 5c7da45.
I added a bandaid that should in most cases ignore rogue 0's when detecting the block ends. |
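A minimal sketch of what such a check could look like, assuming the sample layout shown in the annotated dump above (ips, threadid, taskid, cycleclock, idle state + 1, then a zero). The function `block_ends` is hypothetical and is not the code that was merged.

```julia
# Assumed layout of one sample, taken from the annotated dump in this thread:
#   ips..., threadid, taskid, cycleclock, idle state (+1), 0
# A zero is treated as a block end only when the entry before it is a valid
# idle state (1 or 2). A rogue zero that follows an ip is skipped.
function block_ends(data::AbstractVector{<:Unsigned})
    ends = Int[]
    # The four metadata entries come first, so no block can end before index 5
    for i in 5:length(data)
        if data[i] == 0 && data[i - 1] in (1, 2)
            push!(ends, i)
        end
    end
    return ends
end

# One normal block, then a block containing a rogue zero among its ips
data = UInt32[0xf7e0c8f9, 0x00000002, 0xefb90010, 0x3bf946fc, 0x00000002, 0x00000000,
              0xf7dca235, 0x00000000, 0x75c085c0,
              0x00000001, 0xefb08010, 0x3bfb1482, 0x00000001, 0x00000000]
ends = block_ends(data)
```

This only works "in most cases", as noted above: a rogue zero that happens to follow an entry equal to 1 or 2 would still be misread as a block end.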
Can you squash the PR down? |
We can just squash-merge, right? |
- Adds thread and task ids to profile samples
- Implements thread and task selection for Profile.print()
- Implements thread and task groupby options for Profile.print()
- Add include_meta to Profile.fetch() which defaults to false to ensure backwards compat with external profiling tooling
- store time of each profile sample (cycleclock)
- add sleep_check_state to metadata and show % utilization
Aims to fix #41713
Profile.print(groupby = [:threads, :tasks]) to group first by threads, then tasks
or
Profile.print(groupby = :threads) or :tasks to group by those respectively
or default behavior via
Profile.print(groupby = :none)
Todo:
groupby = :threads
groupby = :tasks
groupby = [:threads, :tasks]
Thanks @felixcremer!
Profile.fetch by default now strips the metadata out to ensure backwards compat with external consumers. They can enable and handle it when ready.
Edit: See latest examples below