Excessive LLVM time in egal codegen of large struct #54109

Keno · 2024-04-17T00:36:46Z

This is similar to #44998, in that LLVM's SLPVectorizer is involved, but I think it's easier to solve by tweaking the codegen for egal:

struct DefaultOr{T}
   x::T
   default::Bool
end

@eval struct Torture
    $((Expr(:(::), Symbol("x$i"), DefaultOr{Float64}) for i = 1:897)...)
end

egal_any(x::Torture, y::Any) = x === y

julia> @time code_llvm(egal_any, Tuple{Torture, Any})
 22.034327 seconds (5.48 M allocations: 206.847 MiB, 0.40% gc time, 88.69% compilation time: <1% of which was recompilation)


vtjnash · 2024-04-17T00:44:19Z

I think Oscar was proposing making this code more branch-y, which should help defeat the vectorizer. All those undef padding bits otherwise get in the way of doing simple loops over the bits.

gbaraldi · 2024-04-17T00:49:42Z

Does the padding stop us from emitting a memcpy?

Keno · 2024-04-17T00:58:32Z

Yes, the padding is forcing us to emit this unrolled. I think a reasonable implementation here would be to RLE the padding bit pattern and then emit the compare as a sequence of loops with an early out between each block. That should allow the loop vectorizer to emit the correct target-specific comparison sequence for each bit pattern as well as giving it license to early out the loop, without forcing that semantically.
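To make the "RLE the padding bit pattern" idea concrete, here is a minimal Julia sketch that works at byte granularity. It is an illustration only, not the codegen change itself: `data_mask`, `rle`, and `data_padding_pairs` are hypothetical helper names, a homogeneous `NTuple` stands in for the generated `Torture` struct, and the sizes assume a typical 64-bit build where `DefaultOr{Float64}` occupies 16 bytes.

```julia
struct DefaultOr{T}
    x::T
    default::Bool
end

# Mark every byte of an isbits type as data (true) or padding (false).
function data_mask(::Type{T}) where {T}
    isbitstype(T) || error("this sketch only handles isbits types")
    mask = fill(false, sizeof(T))
    if fieldcount(T) == 0
        fill!(mask, true)                 # primitive type: all bytes are data
    else
        for i in 1:fieldcount(T)
            off = Int(fieldoffset(T, i))
            sub = data_mask(fieldtype(T, i))
            mask[off+1:off+length(sub)] = sub
        end
    end
    return mask
end

# Generic run-length encoding: a vector of (value, repeat count) tuples.
function rle(xs)
    runs = Tuple{eltype(xs),Int}[]
    for x in xs
        if !isempty(runs) && runs[end][1] == x
            runs[end] = (x, runs[end][2] + 1)
        else
            push!(runs, (x, 1))
        end
    end
    return runs
end

# Collapse the byte mask into (data bytes, padding bytes) pairs.
function data_padding_pairs(mask)
    runs = rle(mask)
    pairs = Tuple{Int,Int}[]
    i = 1
    while i <= length(runs)
        isdata, n = runs[i]
        if isdata && i < length(runs)
            push!(pairs, (n, runs[i+1][2]))   # runs alternate, so the next one is padding
            i += 2
        elseif isdata
            push!(pairs, (n, 0))              # trailing data without padding
            i += 1
        else
            push!(pairs, (0, n))              # leading padding without data
            i += 1
        end
    end
    return pairs
end

# Five inline DefaultOr{Float64} elements collapse into one repeated pattern.
layout = rle(data_padding_pairs(data_mask(NTuple{5,DefaultOr{Float64}})))
```

Under those assumptions, `layout` holds a single entry describing 9 data bytes followed by 7 padding bytes, repeated 5 times, which is the shape a loop-based compare could iterate over instead of being fully unrolled.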

gbaraldi · 2024-04-17T02:18:06Z

We should probably vendor the expand-memcmp code that LLVM has. I'm not sure whether there is any annotation we can put on the loop to say, "hey, we don't care if you exit this early or late."

The strategy here is to look at (data, padding) pairs and RLE them into loops, so that repeated adjacent patterns use a loop rather than getting unrolled. On the test case from #54109, this makes compilation essentially instant, while also being faster at runtime (it turns out LLVM spends a massive amount of time AND the answer is bad). There are some obvious further enhancements possible here:

1. The `memcmp` constant is small. LLVM has a pass to inline these with better code. However, we don't have it turned on. We should consider vendoring it, though we may want to add some shortcutting to it to avoid having it iterate through each function.
2. This only does one level of sequence matching. It could be recursed to turn things into nested loops.

However, this solves the immediate issue, so hopefully it's a useful start. Fixes #54109.
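As a rough model of what "one loop per repeated (data, padding) pattern" means, the following Julia sketch compares two raw byte buffers while skipping padding. It is an assumption-laden illustration, not the emitted IR: `egal_bytes` and the `pattern` format (a list of `((data_bytes, padding_bytes), repeat_count)` entries) are hypothetical, and the early out is placed between blocks as suggested earlier in the thread.

```julia
# Compare two byte buffers that share a layout, ignoring padding bytes.
# `pattern` is a list of ((data_bytes, padding_bytes), repeat_count) entries.
function egal_bytes(a::Vector{UInt8}, b::Vector{UInt8}, pattern)
    pos = 0
    for ((ndata, npad), count) in pattern
        same = true
        for _ in 1:count                  # one loop per repeated pattern
            for k in 1:ndata
                same &= a[pos+k] == b[pos+k]
            end
            pos += ndata + npad           # padding bytes are never read
        end
        same || return false              # early out between blocks
    end
    return true
end

# Two elements of 9 data bytes followed by 7 padding bytes each.
pattern = [((9, 7), 2)]
a = zeros(UInt8, 32)
b = copy(a)
b[10] = 0xff                              # differs only in a padding byte
padding_only = egal_bytes(a, b, pattern)
b[17] = 0x01                              # differs in the first data byte of element 2
data_differs = egal_bytes(a, b, pattern)
```

Here `padding_only` is `true` because the mismatch sits in padding, while `data_differs` is `false`. Accumulating `same` inside a block and branching only afterwards leaves the inner loops free of control flow, which is the property that should make them easy to vectorize.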

This reverts a portion of commit 50833c8. This algorithm is not able to handle simple cases where there is any internal padding, such as the example of:

```
struct LotsBytes
    a::Int8
    b::NTuple{256,Int}
    c::Int
end
```

Unfortunately fixing it is a bit of a large project right now, so reverting now to fix correctness while working on that.

Fixes #55513 (indirectly, by removing broken code)

Maybe reopens #54109, although the latency issue it proposes to fix doesn't occur on master even with this revert (just the mediocre looking IR result output returns)
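For readers who want to see where the internal padding in `LotsBytes` sits, here is a small Julia sketch that lists the gaps between top-level fields. `padding_gaps` is a hypothetical helper, it only inspects the top level and assumes every field is stored inline, and the offsets it reports depend on the platform (the figure mentioned below assumes a 64-bit build where `Int` is 8 bytes).

```julia
struct LotsBytes
    a::Int8
    b::NTuple{256,Int}
    c::Int
end

# List (byte offset, length) for every gap between top-level fields of T,
# including trailing padding. Assumes all fields are stored inline.
function padding_gaps(::Type{T}) where {T}
    gaps = Tuple{Int,Int}[]
    pos = 0                               # first byte not yet covered by a field
    for i in 1:fieldcount(T)
        off = Int(fieldoffset(T, i))
        off > pos && push!(gaps, (pos, off - pos))
        pos = off + sizeof(fieldtype(T, i))
    end
    sizeof(T) > pos && push!(gaps, (pos, sizeof(T) - pos))
    return gaps
end

gaps = padding_gaps(LotsBytes)
```

On such a build, `gaps` contains one entry: 7 bytes of padding starting at byte offset 1, between `a` and the 8-byte-aligned `b`. This is a single gap followed by a long run of data, a layout that does not repeat as (data, padding) pairs.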

The same revert message also appears with the trailer "(cherry picked from commit a65c2cf)".

giordano added the performance, compiler:llvm, and compiler:codegen labels Apr 17, 2024

Keno mentioned this issue Apr 17, 2024

Make emitted egal code more loopy #54121

Merged

Keno closed this as completed in 50833c8 Apr 25, 2024

Keno closed this as completed in #54121 Apr 25, 2024

vtjnash mentioned this issue Feb 18, 2025

Revert "Make emitted egal code more loopy (#54121)" #57453

Merged
