Some errors with interpolated functions #174
I fixed most of these on master, and will probably tag another release soon.

julia> using Tullio, LoopVectorization, OffsetArrays
julia> x = rand(100); k = rand(5);
julia> @tullio y[i] := x[pad(i-a, 3)] * k[a] verbose=1;
┌ Info: symbolic gradients
│ inbody =
│ 2-element Vector{Any}:
│ :(𝛥x[pad(i - a, 3)] = 𝛥x[pad(i - a, 3)] + 𝛥ℛ[i] * conj(k[a]))
└ :(𝛥k[a] = 𝛥k[a] + 𝛥ℛ[i] * conj(x[pad(i - a, 3)]))
┌ Warning: LoopVectorization failed (symbolic gradient)
│ err =
│ LoadError: Don't know how to handle expression.
│ if i - a ∈ (axes)(𝛥x, 1)
│ 𝛥x[i - a] = 𝛥x[i - a] + ℰ𝓍2
│ else
│ (zero)((eltype)(𝛥x))
│ end
│ in expression starting at /home/chriselrod/.julia/packages/Tullio/u3myB/src/macro.jl:1090
└ @ Tullio ~/.julia/packages/Tullio/u3myB/src/macro.jl:1116
┌ Info: threading threshold (from cost = 11)
└ block = 23832
┌ Info: left index ranges
└ i = 3:104
┌ Info: reduction index ranges
└ a = Base.OneTo(5)
[ Info: running LoopVectorization actor

If it's easy for you to turn this into a conditional store:

i - a ∈ (axes)(𝛥x, 1) && (𝛥x[i - a] = 𝛥x[i - a] + ℰ𝓍2)

Then that should work, but it should be simple to fix in LoopVectorization, too.
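The conditional-store shape suggested above can be modelled in a few lines. This is a plain-Python sketch of the semantics only; the function name and the flattened indexing are illustrative, not Tullio's generated code:

```python
def conv_grad_x(dR, k, n):
    """Gradient of y[i] = sum_a x[i-a] * k[a] with respect to x, length n.
    This is the 'conditional store' form of the if/else in the error above:
    out-of-range lanes are simply not written, rather than taking a branch
    that produces an unused zero."""
    dx = [0.0] * n
    for i in range(len(dR)):
        for a in range(len(k)):
            j = i - a              # target index in dx after the pad shift
            if 0 <= j < n:         # conditional store: skip out-of-range lanes
                dx[j] += dR[i] * k[a]
    return dx
```

With dR = [1, 1, 1], k = [1], and n = 5 this leaves [1, 1, 1, 0, 0]: the masked-off writes never happen, which is the store-with-mask shape a SIMD backend can emit directly.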
Great, thank you. And yes, it's easy to chop the unneeded else branch.
Looking further at things like the first example above, I'm getting some wrong answers and am not sure why. I really thought I had this working with LoopVectorization, but my tests were not adequate to catch this. Today I can't find a version on which this works; it's possible that I somehow only checked one-dimensional cases. This is for the gradient of maximum. My guess is that the problem involves how done is carried between iterations of the vectorized loop. It's also possible I should change the scheme used here entirely, to store the indices at which it attains the maximum on the forward pass. There I also had difficulties, but that's another story.

using LoopVectorization, Random
# if VERSION < v"1.5"
# using VectorizationBase: SVec, Mask, prevpow2
# const Vec = SVec
# else
using VectorizationBase: Vec, Mask, prevpow2 #, vsum, vany
# end
function grad!(𝛥x, 𝛥ℛ, ℛ, x, 𝒶𝓍i, 𝒶𝓍j)
for i in 𝒶𝓍i
done = 0
for j in 𝒶𝓍j
peak = (ℛ[i] == x[i, j]) && iszero(done)
begin
ℰ𝓍1 = 0.0
ℰ𝓍2 = ifelse(peak, 𝛥ℛ[i], ℰ𝓍1)
𝛥x[i, j] += ℰ𝓍2
end
done += peak
end
end
𝛥x
end
# _onlyone(cond::Bool, seen) = cond && iszero(seen)
_onlyone(cond::Mask{W}) where {W} = Mask{W}(prevpow2(cond.u))
_onlyone(cond::Mask, seen) = _allzero(seen) ? _onlyone(cond) : zero(cond)
# _allzero(seen) = iszero(seen)
_allzero(seen::Vec) = iszero((!iszero(seen)).u) # iszero(vsum(seen))
# _anyone(cond::Bool) = cond
_anyone(cond::Mask) = !iszero(cond.u)
function grad_avx_new!(𝛥x, 𝛥ℛ, ℛ, x, 𝒶𝓍i, 𝒶𝓍j)
@avx for i in 𝒶𝓍i
done = 0
for j in 𝒶𝓍j
peak = _onlyone(ℛ[i] == x[i, j], done)
begin
ℰ𝓍1 = 0.0
ℰ𝓍2 = ifelse(peak, 𝛥ℛ[i], ℰ𝓍1)
𝛥x[i, j] += ℰ𝓍2
end
done += _anyone(peak)
end
end
𝛥x
end
m2 = reshape(shuffle(1:20), 4,5); # no repeats
m2max = vec(maximum(m2, dims=2))
grad!(zeros(4,5), ones(4), m2max, m2, 1:4, 1:5)
sum(ans, dims=2) # [1, 1, 1, 1]
grad_avx_new!(zeros(4,5), ones(4), m2max, m2, 1:4, 1:5)
sum(ans, dims=2) # [0, 0, 1, 0]
# (jl_HftcGW) pkg> st
# Status `/private/var/folders/d0/hrtbsjqj66qf8zm88ycw9_580000gp/T/jl_HftcGW/Project.toml`
# [bdcacae8] LoopVectorization v0.9.9
# [3d5dd08c] VectorizationBase v0.14.4

Have you noticed:

julia> function grad_avx_new_v2!(𝛥x, 𝛥ℛ, ℛ, x, 𝒶𝓍i, 𝒶𝓍j)
@avx for i in 𝒶𝓍i
for j in 𝒶𝓍j
peak = ℛ[i] == x[i, j]
ℰ𝓍1 = 0.0
ℰ𝓍2 = ifelse(peak, 𝛥ℛ[i], ℰ𝓍1)
𝛥x[i, j] += ℰ𝓍2
end
end
𝛥x
end
grad_avx_new_v2! (generic function with 1 method)
julia> grad_avx_new_v2!(zeros(4,5), ones(4), m2max, m2, 1:4, 1:5)
4×5 Matrix{Float64}:
0.0 1.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0
1.0 0.0 0.0 0.0 0.0
julia> grad!(zeros(4,5), ones(4), m2max, m2, 1:4, 1:5)
4×5 Matrix{Float64}:
0.0 1.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0
1.0 0.0 0.0 0.0 0.0

Of course, we'd get duplicate answers, which you were trying to avoid. I'd use this definition of _onlyone:

@inline _onlyone(m::Mask{W}) where {W} = Mask{W}(one(m.u) << trailing_zeros(m.u))

The other one could place the on-mask off the edge of the array. The above definition places the on-bit at the first set lane instead. With the rewrite, I'll have LoopVectorization properly track and handle loop-carried dependencies, but I can't make promises on when that may happen.
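To see what the two _onlyone definitions select, here is a plain-Python sketch of the two bit tricks applied to a mask word (illustrative helpers, not VectorizationBase code):

```python
def onlyone_first(mask: int) -> int:
    """Keep only the lowest set bit: the FIRST SIMD lane where the condition
    holds. Equivalent to one(m.u) << trailing_zeros(m.u) for mask != 0."""
    return mask & -mask          # two's-complement lowest-set-bit trick

def onlyone_last(mask: int) -> int:
    """Keep only the highest set bit, which is what prevpow2 does: the LAST
    lane where the condition holds."""
    return 0 if mask == 0 else 1 << (mask.bit_length() - 1)
```

For a mask word 0b1010 (lanes 1 and 3 match), onlyone_first keeps 0b0010 and onlyone_last keeps 0b1000, so the prevpow2 version credits the last matching lane rather than the first.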
Oh, and, you can make LoopVectorization swap the iteration order by transposing the matrices:

julia> grad_avx_new!(zeros(5,4)', ones(4), m2max, permutedims(m2)', 1:4, 1:5)
4×5 adjoint(::Matrix{Float64}) with eltype Float64:
0.0 1.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0
1.0 0.0 0.0 0.0 0.0

And the answers are of course suddenly correct, because now the mask goes across columns of the array, where we really do want only one nonzero entry per row.
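The semantics being aimed at, routing each dR[i] to the first maximizer in its row, can be modelled without any SIMD at all. A plain-Python sketch (names are illustrative, this is not the kernel above):

```python
def grad_of_rowmax(dR, x):
    """Backward pass for R[i] = max(x[i][:]): send dR[i] to the FIRST column
    attaining the row maximum, so each row of dx has exactly one nonzero
    entry even when the maximum is tied."""
    dx = [[0.0] * len(row) for row in x]
    for i, row in enumerate(x):
        j = row.index(max(row))   # list.index returns the first maximizer
        dx[i][j] = dR[i]
    return dx

dx = grad_of_rowmax([1.0, 1.0], [[1.0, 3.0], [2.0, 2.0]])
assert dx == [[0.0, 1.0], [1.0, 0.0]]   # tie in row 2 broken leftward
```

This is the reference behaviour the masked _onlyone/_allzero machinery is trying to reproduce one vector at a time.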
If you want a workaround between now and when LoopVectorization could eventually handle the loop-carried dependencies itself, you could pass in something that'd be an MM vector index when the loop is vectorized and a plain Integer when it isn't, and dispatch on that.
OK, thanks for your thoughts. Indeed the transposed version is fine, which may be why I missed the bug at first! Will think about the workaround.
Notice:

julia> _onlyone(cond::Mask, seen, dummy) = (@show typeof(dummy); _allzero(seen) ? _onlyone(cond) : zero(cond))
_onlyone (generic function with 5 methods)
julia> function grad_avx_new!(𝛥x, 𝛥ℛ, ℛ, x, 𝒶𝓍i, 𝒶𝓍j)
@avx unroll=1 for i in 𝒶𝓍i
done = 0
for j in 𝒶𝓍j
peak = _onlyone(ℛ[i] == x[i, j], done, i)
ℰ𝓍1 = 0.0
ℰ𝓍2 = ifelse(peak, 𝛥ℛ[i], ℰ𝓍1)
𝛥x[i, j] += ℰ𝓍2
done += _anyone(peak)
end
end
𝛥x
end
grad_avx_new! (generic function with 1 method)
julia> grad_avx_new!(zeros(4,5), ones(4), m2max, m2, 1:4, 1:5)
typeof(dummy) = VectorizationBase.MM{8, 1, Int64}
typeof(dummy) = VectorizationBase.MM{8, 1, Int64}
typeof(dummy) = VectorizationBase.MM{8, 1, Int64}
typeof(dummy) = VectorizationBase.MM{8, 1, Int64}
typeof(dummy) = VectorizationBase.MM{8, 1, Int64}
4×5 Matrix{Float64}:
0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0
julia> grad_avx_new!(zeros(5,4)', ones(4), m2max, permutedims(m2)', 1:4, 1:5)
typeof(dummy) = Int64
typeof(dummy) = Int64
typeof(dummy) = Int64
typeof(dummy) = Int64
4×5 adjoint(::Matrix{Float64}) with eltype Float64:
0.0 1.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0
1.0 0.0 0.0 0.0 0.0

So, this should work:

julia> _anyone(cond::Mask, ::Integer) = !iszero(cond.u)
_anyone (generic function with 3 methods)
julia> _anyone(cond::Mask, ::LoopVectorization.VectorizationBase.AbstractSIMD) = !iszero(cond)
_anyone (generic function with 3 methods)
julia> _onlyone(cond::Mask, seen, ::Integer) = _allzero(seen) ? _onlyone(cond) : zero(cond)
_onlyone (generic function with 7 methods)
julia> _onlyone(cond::Mask, seen, ::LoopVectorization.VectorizationBase.AbstractSIMD) = iszero(seen) & cond
_onlyone (generic function with 7 methods)
julia> function grad_avx_new!(𝛥x, 𝛥ℛ, ℛ, x, 𝒶𝓍i, 𝒶𝓍j)
@avx unroll=1 for i in 𝒶𝓍i
done = 0
for j in 𝒶𝓍j
peak = _onlyone(ℛ[i] == x[i, j], done, i)
ℰ𝓍1 = 0.0
ℰ𝓍2 = ifelse(peak, 𝛥ℛ[i], ℰ𝓍1)
𝛥x[i, j] += ℰ𝓍2
done += _anyone(peak, i)
end
end
𝛥x
end
grad_avx_new! (generic function with 1 method)
julia> grad_avx_new!(zeros(4,5), ones(4), m2max, m2, 1:4, 1:5)
4×5 Matrix{Float64}:
0.0 1.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0
1.0 0.0 0.0 0.0 0.0
julia> grad_avx_new!(zeros(5,4)', ones(4), m2max, permutedims(m2)', 1:4, 1:5)
4×5 adjoint(::Matrix{Float64}) with eltype Float64:
0.0 1.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0
1.0 0.0 0.0 0.0 0.0

Oh nice, thanks. That's perfect!
Also, here (at last) is an isolation of the weird bug of this comment: mcabbott/Tullio.jl#57 (comment)

# Tracker.gradient(x -> sum(@tullio y[i] := x[pad(i,0,2)] avx=true verbose=2), rand(3))[1]
function grad!(𝛥x, 𝛥ℛ, x, 𝒶𝓍i=eachindex(x))
for i = 𝒶𝓍i
(i >= first(axes(𝛥x, 1))) & (i <= last(axes(𝛥x, 1))) && (𝛥x[i] = 𝛥x[i] + 𝛥ℛ[i])
end
𝛥x
end
function grad_avx!(𝛥x, 𝛥ℛ, x, 𝒶𝓍i=eachindex(x))
@avx for i = 𝒶𝓍i
(i >= first(axes(𝛥x, 1))) & (i <= last(axes(𝛥x, 1))) && (𝛥x[i] = 𝛥x[i] + 𝛥ℛ[i])
end
𝛥x
end
function grad_avx_base!(𝛥x, 𝛥ℛ, x, 𝒶𝓍i=eachindex(x))
@avx for i = 𝒶𝓍i
(i >= first(axes(𝛥x, 1))) & (i <= Base.last(axes(𝛥x, 1))) && (𝛥x[i] = 𝛥x[i] + 𝛥ℛ[i])
end
𝛥x
end
@eval function grad_avx_eval!(𝛥x, 𝛥ℛ, x, 𝒶𝓍i=eachindex(x))
@avx for i = 𝒶𝓍i
(i >= $first($axes(𝛥x, 1))) & (i <= $last($axes(𝛥x, 1))) && (𝛥x[i] = 𝛥x[i] + 𝛥ℛ[i])
end
𝛥x
end # LoadError: KeyError: key typeof(first) not found
grad!(zeros(5), ones(5), ones(3)) # [1,1,1,0,0]
grad_avx!(zeros(5), ones(5), ones(3)) # UndefVarError: last not defined
grad_avx_base!(zeros(5), ones(5), ones(3)) # ok
grad_avx_eval!(zeros(5), ones(5), ones(3)) # not defined

Re that bizarre error, here is a snippet of the generated code:

var"##i_loopstop#1398" = last(#= /home/chriselrod/.julia/dev/LoopVectorization/src/reconstruct_loopset.jl:41 =# @inbounds(lb[1]))
LoopVectorization.VectorizationBase.assume(var"##i_loopstop#1398" > 0)
var"##vptr##_𝛥x" = #= /home/chriselrod/.julia/dev/LoopVectorization/src/reconstruct_loopset.jl:120 =# @inbounds(vargs[1])
var"##vptr##_𝛥ℛ" = #= /home/chriselrod/.julia/dev/LoopVectorization/src/reconstruct_loopset.jl:120 =# @inbounds(vargs[2])
var"##vptr##_𝛥x" = #= /home/chriselrod/.julia/dev/LoopVectorization/src/reconstruct_loopset.jl:120 =# @inbounds(vargs[3])
var"##symlicm#1413" = #= /home/chriselrod/.julia/dev/LoopVectorization/src/reconstruct_loopset.jl:194 =# @inbounds(vargs[4])
axes = #= /home/chriselrod/.julia/dev/LoopVectorization/src/reconstruct_loopset.jl:390 =# @inbounds(vargs[5])
last = #= /home/chriselrod/.julia/dev/LoopVectorization/src/reconstruct_loopset.jl:390 =# @inbounds(vargs[6])

LoopVectorization doesn't "know" functions like axes and last, so they are passed in as arguments and bound to plain local names, which shadow the Base functions inside the generated code. Maybe I should mangle the names of functions like this. I should also use a similar strategy of passing in arguments for interpolated functions. Currently, I'm just looking them up in a dictionary, so you can only interpolate ones I happened to put in there.
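The mechanism described, binding passed-in functions to plain local names, is ordinary shadowing. A plain-Python analogue with hypothetical names, nothing like the real codegen:

```python
def generated_kernel(vargs):
    # Codegen-style: helper functions arrive as arguments and are bound to
    # plain local names, shadowing anything else called `last` or `axes`...
    last = vargs[0]
    axes = vargs[1]
    # ...so every later use of `last`/`axes` in the body resolves to these
    # locals. If a helper the body needs was never passed in, its name is
    # simply undefined, the analogue of the UndefVarError above. Mangled
    # names (e.g. _lv_last) would avoid colliding with user bindings.
    return last(axes)

# the caller passes the real functions explicitly, as LV passes Base.last:
assert generated_kernel([lambda xs: xs[-1], [3, 1, 2]]) == 2
```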
Handling of interpolated functions is much more robust now. Tagging a new version.
My next problem here is these:

julia> vn = Vec(ntuple(_ -> rand(1:33), VectorizationBase.pick_vector_width_val(Int64))...)
Vec{4,Int64}<24, 21, 9, 14>
julia> mod(vn, 10)
ERROR: mod not defined for Vec{4, Int64}
julia> mod(vn, 1:10)
ERROR: MethodError: no method matching mod(::Vec{4, Int64}, ::UnitRange{Int64})
julia> clamp(vn, 1, 10)
ERROR: TypeError: non-boolean (Mask{4, UInt8}) used in boolean context
julia> clamp(vn, 1:10)
ERROR: MethodError: no method matching clamp(::Vec{4, Int64}, ::UnitRange{Int64})

How do these look as solutions? Trying to copy Base, except using IfElse.ifelse instead of branches:

using VectorizationBase, IfElse
# clamp(x::X, lo::L, hi::H) where {X, L, H} in Base.Math at math.jl:65
Base.clamp(x::X, lo::L, hi::H) where {X<:Vec, L,H} =
IfElse.ifelse(x > hi, convert(promote_type(X,L,H), hi),
IfElse.ifelse(x < lo,
convert(promote_type(X,L,H), lo),
convert(promote_type(X,L,H), x)))
# clamp(x::Integer, r::AbstractUnitRange{var"#s830"} where var"#s830"<:Integer) in Base.Math at math.jl:110
Base.clamp(x::Vec{<:Any,<:Integer}, r::AbstractUnitRange{<:Integer}) = clamp(x, first(r), last(r))
# div(x::T, y::T, ::RoundingMode{:Down}) where T<:Integer in Base at div.jl:288
# function div(x::T, y::T, ::typeof(RoundDown)) where T<:Integer
# d = div(x, y, RoundToZero)
# return d - (signbit(x ⊻ y) & (d * y != x))
# end
Base.fld(x::Vec{W1,T}, y::Vec{W2,T}) where {W1,W2,T<:Integer} = div(x-2*signbit(x),y)
# mod(x::T, y::T) where T<:Integer in Base at int.jl:266
function Base.mod(x::Vec{W1,T}, y::Vec{W2,T}) where {W1,W2,T<:Integer}
# y == -1 && return T(0) # avoid potential overflow in fld
# return x - fld(x, y) * y
IfElse.ifelse(y == -1, zero(x), x - fld(x, y) * y)
end
# mod(i::Integer, r::AbstractUnitRange{var"#s78"} where var"#s78"<:Integer) in Base at range.jl:1153
Base.mod(i::Vec{<:Any,<:Integer}, r::AbstractUnitRange{<:Integer}) = mod(i - first(r), length(r)) + first(r)
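The range method above is the usual wrap-into-range identity. A plain-Python sketch of the scalar rule the vector method broadcasts across lanes (illustrative only; Python's % floors like Julia's mod):

```python
def mod_into_range(i, lo, hi):
    """Wrap integer i into the inclusive range lo..hi, matching Julia's
    mod(i, r::AbstractUnitRange) = mod(i - first(r), length(r)) + first(r)."""
    n = hi - lo + 1                 # length of the range
    return (i - lo) % n + lo        # % already floors, like Julia's mod

# the sample lanes <24, 21, 9, 14> wrapped into 1:10 give 4, 1, 9, 4
assert [mod_into_range(v, 1, 10) for v in (24, 21, 9, 14)] == [4, 1, 9, 4]
```

Negative inputs wrap the same way, e.g. mod_into_range(-1, 1, 10) == 9, so no separate sign fix-up is needed as long as the underlying mod floors.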
That's why I wait for you to close your issues. lol

Also, note that promote works:
julia> using VectorizationBase
julia> vx = Vec(ntuple(i -> 3i-2.0, VectorizationBase.pick_vector_width_val(Float64))...)
Vec{8,Float64}<1.0, 4.0, 7.0, 10.0, 13.0, 16.0, 19.0, 22.0>
julia> promote(vx, 2.3, 1, 4f0)
(Vec{8,Float64}<1.0, 4.0, 7.0, 10.0, 13.0, 16.0, 19.0, 22.0>, Vec{8,Float64}<2.3, 2.3, 2.3, 2.3, 2.3, 2.3, 2.3, 2.3>, Vec{8,Float64}<1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0>, Vec{8,Float64}<4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0>)

I.e.:

@inline function Base.clamp(_x::AbstractSIMD, _lo, _hi)
x, lo, hi = promote(_x, _lo, _hi)
IfElse.ifelse(x > hi, hi, IfElse.ifelse(x < lo, lo, x))
end I'd also mind the integer literal Please make a PR to |
OK, will tidy up and make a PR. At last! It is weird that VecUnroll turns up here, though.
Am not sure I understand what VecUnroll is for. Is it supposed to work like a Vec?
Yes, it's supposed to work. I just fixed it on VectorizationBase master. It represents a group of vectors, e.g. from unrolling a loop. It can improve out-of-order execution, will make some aspects of code gen simpler, and will also make it easier to special-case optimize some operations, e.g. offset strided loads/stores, such as when loading/storing from an array of structs.
It's been a month, so I'll close this. We can use a new issue for the next flurry of issues.
I said I'd try to narrow down some of the errors from mcabbott/Tullio.jl#57. Here's a start: a max gradient, and a convolution.