Yeah, mapreduce is known to be slow; see #46. We sped it up at some point, but had to revert that change (JuliaGPU/GPUArrays.jl#454), and I haven't had the time to revisit it.
Adding specializations that use MPS might be a good workaround for the common cases.
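For a common case like `sum` of a `Float32` vector, one stopgap (just a sketch of the idea, not a proposed implementation) is to re-express the reduction as a `Float32` matrix product, since dense `Float32` matmul on `MtlArray`s has an MPS-backed path; a proper specialization would hook into the GPUArrays mapreduce machinery instead. The `mps_sum` name below is made up for illustration:

```julia
using Metal

# Sketch only: sum(v) written as a 1×n by n×1 matrix product. Whether this
# actually hits the MPS matmul path (rather than the generic GPU fallback)
# depends on the Metal.jl version; either way it runs.
function mps_sum(v::MtlVector{Float32})
    ones_col = MtlArray(ones(Float32, length(v), 1))  # n×1 column of ones
    row = reshape(v, 1, :)                            # view v as a 1×n matrix
    return Array(row * ones_col)[]                    # 1×1 result, copied back to the CPU
end

v = MtlArray(rand(Float32, 1_000_000))
isapprox(mps_sum(v), sum(Array(v)); rtol=1e-3)  # loose tolerance: Float32 accumulation order differs
```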
Basically, these allocations aren't caused by the mapreduce implementation; they're a consequence of how the ObjectiveC object wrappers are designed (all objects are abstract types, resulting in dynamic dispatch everywhere).
For example, with the simplest kernel possible:
```julia
julia> f() = @metal identity(nothing)
f (generic function with 2 methods)

julia> @time f()
  0.000177 seconds (55 allocations: 1.578 KiB)
Metal.HostKernel{typeof(identity), Tuple{Nothing}}(identity, Metal.MTL.MTLComputePipelineStateInstance (object of type AGXG15XFamilyComputePipeline))
```
Because these allocations come almost entirely from object instances, they are generally small and thus very fast. As such, I don't think this is a performance issue/priority right now.
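To make the wrapper point concrete, here is a minimal, Metal-free illustration (the types below are invented for the example) of why abstract-typed fields produce lots of small allocations: the compiler cannot infer the field's concrete type, so accesses go through dynamic dispatch and their results get boxed.

```julia
abstract type AbstractObject end

struct ConcreteObject <: AbstractObject
    handle::Int
end

struct Wrapper
    obj::AbstractObject   # abstract field type: value is boxed, accesses are dynamic
end

# Each `handle` read goes through dynamic dispatch and its result is boxed,
# so the loop makes many small heap allocations.
count_handles(ws) = sum(w.obj.handle for w in ws)

ws = [Wrapper(ConcreteObject(i)) for i in 1:10_000]
count_handles(ws)        # warm up (compilation)
@time count_handles(ws)  # reports many small, short-lived allocations
```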
I was expecting fewer allocations and much faster speed for `sum(Vector)`, but I am not sure what to compare it to.
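One way to put a number on it (a sketch assuming BenchmarkTools.jl is installed) is to benchmark the same reduction on the CPU and on the `MtlArray`; since `sum` of a GPU array returns a host scalar, the GPU timing already includes waiting for the kernel to finish:

```julia
using Metal, BenchmarkTools

n = 2^20
x_cpu = rand(Float32, n)
x_gpu = MtlArray(x_cpu)

@btime sum($x_cpu)   # CPU baseline
@btime sum($x_gpu)   # GPU mapreduce path, result copied back to the host
```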