preallocate actually has more (while not by number of bytes) allocations! #22
Correct, preallocation has more allocations, but they are much smaller. I believe the performance changes are not visible in small benchmarks.
See my addition above. Yes, I figured the actual number of allocations is not a big deal; still surprising that the number of bytes doesn't matter much. [Most useful comment here might be about the tagline?]
Yeah, not just the size of the memory allocation; I think this shows up most when you have a large number of largish allocations.
You are probably finding that Julia's GC is not being triggered. But it is showing how good Julia's allocator is at allocating memory.
The D language has
Right, I was trying to exercise it and, yes, to see GC triggered (it might work on your machine if you test). Since I got 128 GB in my desktop, I've probably not seen it since... Since frequent GC triggering does happen on other machines, being able to simulate such an environment would be nice, and good for knowing the slowdown. Maybe I should sometimes time with a VM or some settings (ulimit), or could BenchmarkTools simulate such an environment? I tried prlimit, which wasn't too helpful:
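For reference, a constrained environment like that can be approximated directly; a minimal sketch (the commented line assumes Julia >= 1.9 on PATH, and the 2 GiB figure is arbitrary):

```shell
# Ask Julia's GC to collect aggressively once the heap passes a limit
# (the --heap-size-hint flag is available since Julia 1.9):
#   julia --heap-size-hint=500M script.jl
# Or cap the address space outright, as prlimit/ulimit would, using a
# subshell so the parent shell is unaffected (value in KiB; Linux):
(ulimit -v 2097152; echo "child ran under a 2 GiB address-space cap")
```

The subshell matters: `ulimit` limits cannot be raised again once lowered, so applying them to the parent shell would stick for the whole session.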
I think there is somehow a performance regression on Julia 1.5, although I thought JuliaLang/julia#34126 would fix our last few small allocations. I'm not sure where the allocation actually happens. @oxinabox, any idea how to actually find the allocations? Reading the IR code directly seems quite nasty to me.
I do not have a good way
This is supposed to do it, but it's not generating
Ok so this is not a great way to do it, but it's a start...
Now you can run Julia with allocation tracking turned on: it writes per-line allocation counts into .mem files, and sorting those files gives you the actual lines. In this case, the lines with actual (nonzero) allocation numbers are the ones below. Note that if you run it several times you'll end up with several timestamped .mem files; in my case the newest ones matched 987335, and then:
➤ fd "\.mem" | grep 987335 | xargs sort -n | tail -n 20
32 println(io::IO, xs...) = print(io, xs..., "\n")
48 sort!(p, alg, Perm(ordr,v))
80 b = Vector{T}(undef, length(a))
96 println(xs...) = println(stdout::IO, xs...)
128 y = iterate(itr)
144 show(io, x)
224 collect_to_with_first!(_array_for(typeof(v1), itr.iter, isz), v1, itr, st)
256 y = iterate(itr, st)
432 print(io, x)
448 indexed_iterate(t::Tuple, i::Int, state=1) = (@_inline_meta; (getfield(t, i), i+1))
480 let t = t, val = val; _all(i->val[i] isa fieldtype(t, i), 1:n); end
528 return :($(esc(ex)) ? $(nothing) : throw(AssertionError($msg)))
1456 esc(isa(ex, Expr) ? pushmeta!(ex, :noinline) : ex)
2000 p = similar(Vector{eltype(ax)}, ax)
4000 dest = similar(A, shape)
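That pipeline can be reproduced end to end; a sketch in which the .mem files (normally produced by running julia with `--track-allocation=user`, which writes one per source file with the process id in the name) are faked with hypothetical contents, so only the sorting step is real:

```shell
# Fake a .mem file; the left column holds the per-line byte counts that
# julia --track-allocation=user would record:
mkdir -p memdemo
printf '        32 println(io, x)\n      4000 dest = similar(A, shape)\n' > memdemo/script.jl.123.mem
# Numeric sort puts the heaviest allocating lines last:
find memdemo -name '*.mem' | xargs sort -n | tail -n 1
# -> the 4000-byte line is printed last
```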
With Julia 1.8, this sample case generates 1k allocations instead of 1:
julia> using AutoPreallocation
julia> function f()
x = rand(100)
nothing
end
f (generic function with 1 method)
julia> x, preallocated_f = preallocate(f);
julia> preallocated_f()
julia> @timev preallocated_f()
0.000053 seconds (1.20 k allocations: 24.906 KiB)
elapsed time (ns): 52981
gc time (ns): 0
bytes allocated: 25504
pool allocs: 1196
non-pool GC allocs: 0
minor collections: 0
full collections: 0
julia> @timev f()
0.000004 seconds (1 allocation: 896 bytes)
elapsed time (ns): 4020
gc time (ns): 0
bytes allocated: 896
pool allocs: 1
non-pool GC allocs: 0
minor collections: 0
full collections: 0
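For contrast, the zero-allocation steady state this package aims for can be reached by hand in this example; a sketch under assumed names of my own (`f!` and the buffer `x` are not from AutoPreallocation):

```julia
using Random  # rand! fills an existing array in place

# Hand-rolled preallocation: allocate the buffer once, reuse it on every call.
function f!(x::Vector{Float64})
    rand!(x)   # overwrites x; no new array is created
    return nothing
end

x = Vector{Float64}(undef, 100)  # the single up-front allocation
f!(x)                            # subsequent calls allocate nothing
```

This is the usual manual alternative: push the allocation out of the hot function and pass the buffer in.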
This can't really be fixed here (except in special cases); it needs to be fixed in Cassette, or more likely by rewriting this with Mixtape.jl.
I don't know much about the implementation of this package or the Cassette.jl it depends on (or Mixtape.jl). Should an issue be opened at Cassette.jl? Is that becoming a redundant package? Should this issue be kept open, or closed as not planned to be fixed? I'm not rushing anyone; I was just exploring and opened the issue, and I'm not (really) a user. Possibly the status should be documented.
This whole package should be documented as an experiment and largely a dead end.
The same here:
julia> using Distributions, AutoPreallocation
julia> L = rand(2, 2);
julia> S = 2I + L * L';
julia> d = MvNormal(rand(2), S);
julia> x = rand(2);
julia> logpdf(d, x)
-2.8535848089575633
julia> @btime logpdf($d, $x)
118.435 ns (3 allocations: 176 bytes)
-2.8535848089575633
julia> _, pf = preallocate(logpdf, d, x)
(-2.8535848089575633, preallocate(logpdf, ::FullNormal, ::Vector{Float64}))
julia> @btime $pf($d, $x)
266.003 ns (9 allocations: 272 bytes)
-2.8535848089575633
FYI: I was looking into this package, and ran the code from the README, and (at least on Julia 1.5) it has two extra allocations.
[From the tagline: "Remember what memory we needed last time and use it gain every time after"; better to change "gain" to "again" and finish with a period?]
Since preallocation wasn't clearly (much) faster, I was thinking: isn't that the point (along with having less, or no, variability for real-time)? Maybe the timing wasn't reliable, so I checked:
I guess this is also useful for GPUs; is some such mechanism needed there?
Is a better example needed to show the usefulness? FYI2, I tried with larger arrays (and I think it mostly shows how fast/good Julia's GC is):
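One way to make such timings reliable is BenchmarkTools; a hedged sketch using a README-style function and the `preallocate` API seen in this thread (the function body is mine):

```julia
using BenchmarkTools, AutoPreallocation

# A small allocating function, in the style of the README example.
f() = (x = rand(100); nothing)

# First call records the allocations; pf replays them from reused memory.
_, pf = preallocate(f)

@btime f()     # baseline timing, with fresh allocation each call
@btime $pf()   # replayed version ($-interpolation avoids global-variable overhead)
```

BenchmarkTools runs many samples and reports the minimum, which smooths out GC noise far better than a single `@timev`.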