poor performance of the code using sub #5100
Comments
At least part of the problem is that …
Thanks, Simon. @alsam, FYI this would not have been an issue if the SubArrays were passed as an argument to a function that was just the "algorithmic" part of what you were doing. Julia optimizes at the level of functions, so these type inference issues would not be such a problem. Separating the allocation from the algorithm would also facilitate your comparison of Arrays vs SubArrays.
Sure, thanks Simon for the fix.
I think Tim meant that you could have "hidden" the type inference issues by doing the calculations in a separate function:

```julia
a = sub(...)
do_calculation(a)
```

not

```julia
a = my_subarray_creation()
# do calculation
```

That way the types are concrete inside the function doing the work, so the compiler can specialize it instead of falling back to dynamic dispatch.
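As an illustrative sketch (hypothetical names, not code from the thread), the function-barrier pattern described above could look like this; the point is that inside `do_calculation` the SubArray's concrete type is known, so the loop compiles to specialized code:

```julia
function do_calculation(a)     # compiled separately for the concrete type of `a`
    total = 0.0
    for i = 1:length(a)
        total += a[i]
    end
    total
end

A = rand(1000)
a = sub(A, 11:990)             # create the SubArray outside the hot code
do_calculation(a)              # inside the call, a's concrete type is known
```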
Yeah, sure, got it and see the difference.

```julia
function farray(T, r::Range1{Int64}...)
    dims = map((x) -> length(x), r)
    array = Array(T, dims)
    sub_indices = map((x) -> -minimum(x) + 2 : maximum(x), r)
    sub(array, sub_indices)
end
```

Do you happen to know how to improve perf? I'm quite new to Julia.
Got it!
PS. As I guessed, this function was not the bottleneck in the computation, but here you have it anyway.

That function is very flexible, but unfortunately I can't guess the types of the variables, and apparently Julia can't either. If this is a performance-critical function in your program, I would consider specializing it. Unfortunately I don't understand what your function tries to do, but here is an attempt at something better for the 2-argument version:

```julia
function farray(T, r1::Range1{Int64}, r2::Range1{Int64})
    array = Array(T, length(r1), length(r2))
    sub_indices = [minimum(x)+2 : maximum(x) for x in (r1, r2)]
    sub(array, sub_indices[1], sub_indices[2])
end
```
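For the one-dimensional calls used later in the thread, a fully specialized variant could look like the sketch below. This is only an illustration of the "specialize it" suggestion (the name farray1 and the parametric signature are not from the thread); it keeps the same -minimum(r) + 2 offset as the original farray and assumes the Range1/sub API of the Julia version being discussed.

```julia
function farray1{T}(::Type{T}, r::Range1{Int64})
    array = Array(T, length(r))               # one storage slot per conceptual index in r
    sub(array, -minimum(r) + 2 : maximum(r))  # conceptual index i maps to array[i - minimum(r) + 1]
end
```

With concrete argument types, and the element type as a static parameter, the compiler has a much better chance of inferring the returned SubArray's type.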
Sure, the function farray is not a bottleneck at all, as it is called only three times to construct three arrays. Still, when I used it instead of inlined sub calls, a serious penalty was observed (before I moved the computations into a separate function), apparently due to the dynamic dispatch you explained. The function constructs negative- and zero-based, Fortran-like arrays:

```julia
u    = farray(Float64, -2:ncells+1)
x    = farray(Float64, 0:ncells)
flux = farray(Float64, 0:ncells)
```

instead of

```julia
u    = Array(Float64, ncells+4)
x    = Array(Float64, ncells+1)
flux = Array(Float64, ncells+1)
```

In the latter case the index has to be shifted in the index expressions, i.e.

```julia
u[ic+3] -= (flux[ic+2]-flux[ic+1]) / (x[ic+2]-x[ic+1])
```

instead of the more natural

```julia
u[ic] -= (flux[ic+1]-flux[ic]) / (x[ic+1]-x[ic])
```

Arrays with negative/zero bases are used here as ghost cells to accommodate boundary conditions. I'll try specializing and see whether it makes a difference. Thanks!
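For illustration only (ncells = 10 is a made-up value, and this is not code from the thread): with plain Arrays, the same ghost-cell layout amounts to a constant index shift, where conceptual index ic of an array with base b lives at storage slot ic + (1 - b).

```julia
ncells = 10
u    = zeros(ncells + 4)   # conceptual indices -2 : ncells+1, so the shift is 1 - (-2) = 3
x    = rand(ncells + 1)    # conceptual indices  0 : ncells,   so the shift is 1 - 0    = 1
flux = rand(ncells + 1)

for ic = 1:ncells-1
    # shifted form of the natural update u[ic] -= (flux[ic+1]-flux[ic]) / (x[ic+1]-x[ic])
    u[ic+3] -= (flux[ic+2] - flux[ic+1]) / (x[ic+2] - x[ic+1])
end
```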
I have the following synthetic testcase:

```julia
function mykern(r, s, a, x, y)
    const nsteps = 25000
    istep = 0
    ifirst = minimum(r)
    ilast = maximum(r)
    while istep < nsteps #&& t < tmax
        for i = ifirst : ilast-1
            s[i] = a * (x[i+1] - x[i]) / (y[i+1] - y[i])
        end
        for i = ifirst : ilast
            x[i] = a * s[i-1]
        end
        for i = ifirst : ilast
            y[i] = a * x[i-1]
        end
        for i = ifirst : ilast-1
            s[i] = s[i] - (x[i+1] - x[i]) / (y[i+1] - y[i])
        end
        istep = istep + 1
    end
    s
end

const siz = 10^4
const a = e^pi - pi^e

s = rand(siz)
x = rand(siz)
y = rand(siz)

s1 = sub(s, 1:(siz - 1))
x1 = sub(x, 1:(siz - 1))
y1 = sub(y, 1:(siz - 1))

s2 = sub(s, 2:siz)
x2 = sub(x, 2:siz)
y2 = sub(y, 2:siz)

s3 = sub(s, 4:siz)

# warming up
# mykern( 2:(siz - 3), s, a, x, y )
# mykern( 2:(siz - 3), s, a, x1, y1 )
# mykern( 3:(siz - 2), s, a, x2, y2 )
# @time mykern( 2:(siz - 3), s, a, x, y )

# the real test
println("follow up: mykern( 2:(siz - 3), s, a, x, y )")
@time mykern( 2:(siz - 3), s, a, x, y )
println("follow up: mykern( 2:(siz - 3), s1, a, x1, y1 )")
@time mykern( 2:(siz - 3), s1, a, x1, y1 )
println("follow up: mykern( 3:(siz - 2), s2, a, x2, y2 )")
@time mykern( 3:(siz - 2), s2, a, x2, y2 )
@time mykern( 3:(siz - 2), s1, a, x2, y2 )
@time mykern( 5:(siz - 5), s3, a, x2, y2 )
```
The timings for this testcase show that the slowdown from using subs is about 3-4x; on a bigger testcase a slowdown of up to 5x was observed. Is that expected? Does it make sense to create a new issue for this performance improvement?
Yes, we'd still like to erase the remaining hit, so please do file a new issue. But now you're getting into territory where the fixes may not come as quickly. If you want to help address it, profiling is the next step.
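As a sketch of the profiling suggestion (not from the thread, and assuming the sampling profiler that shipped with Julia at the time), one way to see where mykern spends its time on the SubArray inputs:

```julia
# warm up first so compilation is excluded from the samples
mykern(2:(siz - 3), s1, a, x1, y1)

@profile mykern(2:(siz - 3), s1, a, x1, y1)
Profile.print()    # report which lines the samples landed on
```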
Sure, thanks. Submitted a new issue:
This is a continuation of #5084.

The mail thread:
https://groups.google.com/forum/#!topic/julia-dev/NOF6MA6tb9Y

The code below exposes poor performance:

```
elapsed time: 38.070276549 seconds (23509047972 bytes allocated)
```

Array-based version (commented above by ##):

```
elapsed time: 3.756609233 seconds (246868 bytes allocated)
```

I.e. the sub version has a 10x slowdown (the expected slowdown should be less than a factor of 2), and the enormous number of allocated bytes signals type-inference problems.

Thanks!