Add some nospecialize and a few precompiles #502
Conversation
By the way, if you want to more systematically find methods that have lots of specializations, this is quite easy:

```julia
using MethodAnalysis

nmis = Pair{Method,Int}[]
visit(AbstractPlotting) do item
    if item isa Method
        push!(nmis, item => length(methodinstances(item)))
        return false     # don't descend further into this Method
    end
    true                 # keep descending into modules
end
sort!(nmis; by=last)
```

Note that specializations count inference, but not all of these get compiled (xref JuliaLang/julia#35131). That said, the inference is costing you a lot.
In conjunction with #503, it's worth asking whether we can find a much more comprehensive set of precompiles. That, perhaps together with some improvements to inference, might get rid of some of the incentive to have so many `@nospecialize`s.
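For context, an explicit precompile list conventionally lives in a `_precompile_()` function called at the end of the package module. A minimal sketch of the pattern discussed in this PR, with made-up signatures (these are illustrative stand-ins, not AbstractPlotting's actual API):

```julia
# Hypothetical sketch of an explicit precompile list; the signatures
# below are illustrative, not taken from AbstractPlotting.
function _precompile_()
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    # `precompile` returns `false` (rather than erroring) when the
    # signature no longer matches a method, so an `@assert` flags
    # stale entries loudly instead of silently doing nothing.
    @assert precompile(Tuple{typeof(sum), Vector{Float64}})
    @assert precompile(Tuple{typeof(sort!), Vector{Int}})
    return nothing
end
```

The `jl_generating_output` guard skips the work except when the package is actually being precompiled.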
Example:

```julia
julia> methodinstances(boundingbox)
15-element Vector{Core.MethodInstance}:
 MethodInstance for boundingbox(::Scene)
 MethodInstance for boundingbox(::String, ::Any, ::Any, ::Any, ::Any, ::Any, ::StaticArrays.SMatrix{4, 4, Float32, 16}, ::Any, ::Any)
 MethodInstance for boundingbox(::String, ::Vector{Point{3, Float32}}, ::Vector{Float32}, ::Vector{FreeTypeAbstraction.FTFont}, ::Vec{2, Float32}, ::Vector{Quaternionf0}, ::StaticArrays.SMatrix{4, 4, Float32, 16}, ::Float64, ::Float64)
 MethodInstance for boundingbox(::String, ::Any, ::Any, ::Any, ::Any, ::Any, ::StaticArrays.SMatrix{4, 4, _A, 16} where _A, ::Any, ::Any)
 MethodInstance for boundingbox(::AbstractPlotting.Text{Tuple{String}}, ::String)
 MethodInstance for boundingbox(::AbstractPlotting.Text{ArgType} where ArgType, ::String)
 MethodInstance for boundingbox(::AbstractPlotting.Text{var"#s194"} where var"#s194"<:Tuple{Arg1}, ::String)
 MethodInstance for boundingbox(::AbstractPlotting.Text{Tuple{String}})
 MethodInstance for boundingbox(::AbstractPlotting.Text{ArgType} where ArgType)
 MethodInstance for boundingbox(::AbstractPlotting.Text{var"#s194"} where var"#s194"<:Tuple{Arg1})
 MethodInstance for boundingbox(::Scatter{Tuple{Vector{Point{2, Float32}}}})
 MethodInstance for boundingbox(::LineSegments{Tuple{Vector{Point{2, Float32}}}})
 MethodInstance for boundingbox(::Atomic{Arg} where Arg)
 MethodInstance for boundingbox(::Axis2D{Tuple{Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}}})
 MethodInstance for boundingbox(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}})
```

Ideally, the non-concrete methods just wouldn't be here. I'm not quite sure what to think here. I understand why you use types and type parameters to control your dispatch (I don't yet have a proposal for keeping your code nicely organized without it), but wow, is it bad for your latency.
But sometimes that doesn't help; abstract inference is more expensive than concrete inference. Maybe the only way to fix it is a long slog of patient analysis on individual cases.
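For reference, `@nospecialize` is applied per-argument inside the method signature. A minimal sketch with a hypothetical function (`describe_plot` is not from AbstractPlotting):

```julia
# Hypothetical example. `@nospecialize` tells the compiler not to
# generate a fresh specialization of this method for each concrete
# type of `plot`; one compiled instance serves all argument types.
function describe_plot(@nospecialize(plot), scale::Float64)
    # `plot` is treated abstractly for codegen; calls on it fall back
    # to runtime dispatch, which is fine for thin "dispatch glue".
    return string(typeof(plot)) * " at scale $scale"
end

describe_plot([1, 2, 3], 2.0)
describe_plot("line", 2.0)
```

The tradeoff is exactly the one discussed above: less inference up front, but runtime dispatch inside the body, so it pays off mainly in methods that do little "real work" themselves.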
Ah, wait, one more important tool I've not explored or mentioned:

```julia
for plot in scene.plots  # Scene.plots is a Vector{AbstractPlot}, so the call on `plot` can't be inferred
    Base.invokelatest(foo, plot)
end
```

might be better than

```julia
for plot in scene.plots
    foo(plot)
end
```

because it would force concrete inference of the `foo` call.
Wow, thank you so much :)
I plan to refactor that and use far fewer type parameters!
Interesting. Not to say it doesn't need refactoring, but I am just acknowledging that reducing the latency here is subtle and may take some experimentation.
Is there a way forward to merge this? :)
Let's hold it a little longer. There is a whole new generation of analysis tools coming:
Once those merge, and if you don't have big refactors you're partway through, then I think it would make sense to do the analysis here from scratch. The new tools should be much better at finding the "best precompilable unit" and give a clearer sense of where your inference latency is coming from.
I am pretty sure there are more effective interventions possible, and they are more easily discoverable now with the new tools.
This, together with some companion PRs in several other packages, shaves about 30s off the time to run the tests in this package (from 214s to 187s). Its effect on TTFP (time to first plot) is fairly minimal: maybe 2.5s, which appears to be bigger than the noise but not massively so.
A few comments are in order. The main trick here is to add `@nospecialize`. There is some risk that this could lead to a runtime penalty, but I don't know these packages well enough to say; this is something that is better analyzed by people more knowledgeable than I. Of course, lack of specialization will only affect methods that do "real work," and from what I could tell most/all of these just call the things that do the real work. But I could easily be mistaken.

Second, I added a few precompiles. I typically add an `@assert` in front of these so that I know whether they have gone stale (if `precompile` can't find the method from the supplied signature, it just returns `false` rather than erroring). Feel free to strip the `@assert`s if you prefer not to be bothered after API refactoring. However, this revealed something quite interesting: some of your signatures simply can't be precompiled. I left a note in one of these packages about the ultimate origin; I did not poke at this long enough to figure out why it doesn't work.

Third, there are quite a lot of anonymous functions that would be nice to precompile. I've marked many of them, but to make that work robustly you'd actually have to split them out and name them. Alternatively, perhaps one could insert `@nospecialize`s in those `do` blocks too? (I've never tried.)

Finally, I only tested on Julia 1.6; on 1.5, you might get so much invalidation that these precompiles won't help. But the `@nospecialize` should be useful regardless.

At the end, `display(plot(rand(5)))` was still about half inference time, which suggests there is probably significantly more progress to be made. Consequently, let me summarize my workflow. I use SnoopCompileCore's `@snoopi`, but these days I almost never run the "automated" precompile-generation stuff. Instead, I look at the results, think about them, and then intervene manually. I only worry about the big ones, say with an inference time bigger than 100ms, since for now you still have some really expensive-to-infer calls.

To detect excessive specialization, here are some good tricks. Let's say you've collected the
`@snoopi` results to a variable `tinf::Vector{Tuple{Float64,MethodInstance}}`. That list (which is sorted in order of inference time) shows you the "bad actors" for specific types, but if you're inferring dozens of specializations, then even a cheaper-to-infer method can add up. Consequently, a nice trick (which I discovered while looking at this package) is one which aggregates all the MethodInstances associated with a single Method. If you see some methods that are "bad actors," you can use `filter` to pull all the corresponding `MethodInstance`s out of `tinf`; if there are a lot of them, consider adding a `@nospecialize`.
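The aggregation snippet itself did not survive in this thread; one way to do it, as a sketch (assuming `tinf` holds the `(inference_time, MethodInstance)` tuples collected by `@snoopi`), is:

```julia
# Sketch: sum per-Method inference time over all of its
# MethodInstances, given tinf::Vector{Tuple{Float64,MethodInstance}}.
tmeth = Dict{Method,Float64}()
for (t, mi) in tinf
    m = mi.def                  # the Method that owns this MethodInstance
    isa(m, Method) || continue  # skip toplevel thunks, whose `def` is a Module
    tmeth[m] = get(tmeth, m, 0.0) + t
end
# Sort so the most expensive Methods (summed over specializations) come last
sort(collect(tmeth); by=last)
```

Methods that are cheap per-instance but appear dozens of times will float toward the end of this list even though no single entry of `tinf` looks alarming.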
Finally, in the long run you should consider whether you need to specify as much via type parameters as you do. Some of the structs have abstract fields, and in such cases you're not going to be able to infer your dispatches anyway, so the specialization is only costing you. An example of a package where I've successfully defeated the overhead of excessive specialization is FileIO; it used to be horrible for latency, but I've finally disabled essentially all the relevant specialization (see a sequence of my PRs over the last several months) and it's much lighter weight now. But a good alternative is to not use so many type parameters in the first place. For example, in FileIO, with 20/20 hindsight, we could have defined a file format to have a `::Module` or `::Vector{Module}` field, dispatched through that field at runtime, and had only one format type (instead of one type per file format).
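A sketch of that design, with hypothetical names (this is not FileIO's actual API):

```julia
# Hypothetical sketch: one concrete format type instead of a
# parametric type per file format, with the handler package
# recorded as an ordinary field value.
struct FileFormat
    extension::String
    handler::Module          # the package that implements load/save
end

function load(fmt::FileFormat, filename::String)
    # Look up and call the handler's `load` at runtime. Because the
    # handler is a field value rather than a type parameter, there is
    # only one load(::FileFormat, ::String) method instance to infer,
    # no matter how many formats are registered.
    return Base.invokelatest(getfield(fmt.handler, :load), filename)
end
```

The cost is a runtime dispatch per call, which is negligible for file I/O; the benefit is that inference and codegen happen once instead of once per format.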