-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault with longer SIMD vectors #20961
Comments
On 0.5 we say that the vectors should be 8 byte aligned:
On 0.6, vectors above length 32 are said to be 512 byte aligned (?): 32:
36:
|
I also thought data should be 16 byte aligned for SIMD (32 for AVX). |
Could this be an LLVM codegen problem? FWIW I get a segfault in the constructor, as one might expect from the following (note the large offsets):
The N=32 version looks fine (and seems to work). Output from |
No, the issue is already identified AFAICT. Assembly code isn't very helpful here. |
I stumbled over a similar issue, thus took a look at type sizes and alignment. The following table shows the sizes and alignments for tuples of N VecElements, with N matching the row number, for integers of size 8 to 128 bits. (the table extends to the right). Ran it on today's Julia master. Note, that begining with N>32, some base types lead to a Tuple alignment of 0. Also note, most fall back to the alignment and size of the pure My thought on that would be that the alignment should be the tuple size rounded up to the next power of two, but limited to the cache line size of 64 bytes, and a minimum of the necessary 16 bytes.
However, there also seems to be an issue in LLVM codegen itself. I found this possibly related LLVM bug. Unfortunately, I have no clue where to change the alignment in the depth of the Julia codebase. Hope this helps anyway.
Here's the code to reproduce the above. (Not really a piece of art): using DataFrames
df = DataFrame(n=1:81)
ttype(t,a) = NTuple{a,VecElement{t}}
label(t,s) = Symbol(string(t),'_',s)
for t in [Int8, Int16, Int32, Int64, Int128]
types = ttype.(t,df[:n])
df[label(t,"sizeof")] = sizeof.(types)
end
for t in [Int8, Int16, Int32, Int64, Int128]
types = ttype.(t,df[:n])
df[label(t,"alignm")] = Base.datatype_alignment.(types)
end
delete!(df, :n)
showall(df) |
This is where it is decided if the tuple should be passed as a LLVM Vector or Array: Line 174 in 8341444
seems a bit iffy to me. Also, note that as long as the SIMD vector does not hit global scope, things seem to work ok: using SIMD
@generated function tosimd{N, T}(a::NTuple{N, T})
t = Expr(:tuple, [:(a[$i]) for i in 1:N]...)
ex = :(SIMD.Vec{N, T}($t))
return quote
$(Expr(:meta, :inline))
$ex
end
end
function sum_simd{N,T}(t::NTuple{N, T})
v = tosimd(t)
return sum(v)
end
sum_simd((rand(36)...)) evaluated cleanly. |
It wont work with N other than the ones I listed up top because of: Lines 174 to 185 in 64409a0
Edit: You are always ahead of me @KristofferC ... |
Regardless of the comment you linked refering to LLVM 3.7 & 3.8 crashing, returning the wrong alignment in Line 267 in 64409a0
So probably the C part of Julia should still try to emit the correct code adhering to basic SIMD requirements (16bytes), and the constructors on top of that should warn when an incompatible combination is chosen? |
It seems that the alignment computed in |
@KristofferC Here's one line that applies the special alignment only conditionally if 'not zero' is returned Line 268 in 64409a0
And here's another one: Line 361 in 080dcf4
(and above that line some further restrictions are made when it actually is a simd type) |
Yes I know but for a 32 length Float64 vector, 256 is returned, not 0. |
Have you also seen the definition of MAX_ALIGN at Lines 19 to 33 in 080dcf4
used in julia_alignment Line 1117 in 080dcf4
only if no alignment is defined yet? static unsigned julia_alignment(Value* /*ptr*/, jl_value_t *jltype, unsigned alignment)
{
if (!alignment && ((jl_datatype_t*)jltype)->layout->alignment > MAX_ALIGN) {
// Type's natural alignment exceeds strictest alignment promised in heap, so return the heap alignment.
return MAX_ALIGN;
}
return alignment;
} |
That might explain why the vectors are only 8 byte (64 bit) aligned? |
I would believe so, in particular since the limit is only applied if the other function does not return a valid alignment (meaning non-zero). I think the problem is that Probably these functions should be split into something like Also MAX_ALIGN of 8 byte is definetly wrong. There's nothing that should restrict code to be aligned up to one cache line size of 64 byte = 512 bit. Above that, well, doesn't make too much sense. |
Well, it is definitely possible to align at higher values than 8 bytes, see: #15139 |
Note that the magic constants listed above (8, 16, 64) are valid only for the x86-64 architecture. Other architectures have different cache lines sizes (might be 32 or 128). Also, if a SIMD vector is only 8 bytes long, then it certainly doesn't need 16-byte alignment. And given how LLVM lays out SIMD vectors with odd sizes (it concatenates smaller SIMD vectors), one should round down (sic!) to the next power of two -- a vector of 12 Float32s would consist of a length-8 and a length-4 vector internally. Of course, that rule might change with AVX512, where one can load partial vectors... this will be up to LLVM to decide. I think it's best to ask what LLVM expects instead of second-guessing it, and if LLVM gets it wrong, we need to fix it there. If LLVM continues to get things horribly wrong, then we might want to change our calling convention to match that of C where LLVM doesn't have any problems. |
On const LLVMDataTypeString = String
import Base.VecElement
d = Dict{DataType, LLVMDataTypeString}((
Float64 => "double",
Float32 => "float",
Int32 => "i32",
Int64 => "i64"
))
immutable Vec{N, T}
data::NTuple{N, VecElement{T}}
end
@generated function Base.(:+){N, T}(x::Vec{N, T}, y::Vec{N, T})
exp = """
%3 = fadd <$(N) x $(d[T])> %0, %1
ret <$(N) x $(d[T])> %3
"""
return quote
Vec(Base.llvmcall($exp,
NTuple{N, VecElement{T}},
Tuple{NTuple{N,VecElement{T}},NTuple{N,VecElement{T}}},
x.data, y.data))
end
end
for N in 1:35
a = Vec(([VecElement(rand()) for i in 1:N]...))
try
a+a
println("$N ok")
catch e
println("$N not ok")
end
end
|
While looking into #21918 I noticed https://github.com/llvm-mirror/llvm/blob/836dd8e1f01318ac7f1149d399ce36b064404cb4/lib/IR/DataLayout.cpp#L507-L514 which is the part |
Fixed by #21980 (and tests where added in that PR) |
Upstream recently indicated that odd vector lengths are now fixed https://bugs.llvm.org/show_bug.cgi?id=27708#c5 so we probably can remove that extra codepath if we backport those changes |
The following:
works for N = (1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 16, 17, 18, 20, 24, 32, 33, 34, 36, 40, 48, 64, 65, 66, 68, 72, 80, 96) (tested up to N = 100) on v0.5. On master the above code segfaults for all N above 32 (e.g. N = 33, 34, 36, 40, 48, 64, 65, 66, 68, 72, 80, 96). The generated code look okay(?). Example with N = 36:
Actually, doing anything with the
Vec
, like showing, also segfaults:The text was updated successfully, but these errors were encountered: