Inefficient codegen collecting slice iterator into array #126000
Labels:
A-codegen (Area: Code generation)
A-LLVM (Area: Code generation parts specific to LLVM; both correctness bugs and optimization-related issues)
C-bug (Category: This is a bug)
I-heavy (Issue: Problems and improvements with respect to binary size of generated code)
I-slow (Issue: Problems and improvements with respect to performance of generated code)
While working on `Itertools::collect_array` for itertools, I wanted to compare its efficiency against a plain slice `try_into` conversion. You can see my comparison here: https://rust.godbolt.org/z/qPW3K8aTx.
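The two approaches being compared can be sketched roughly as follows. This is not the code from the godbolt link: `collect_array` below is a hypothetical, std-only stand-in that fills an array element by element and is assumed to mirror the itertools semantics (return `Some` only if the iterator yields exactly `N` items). `via_try_into` and `via_collect_array` are illustrative names. The `main` function checks that both variants succeed for exactly the same slice lengths, so in principle both reduce to a single length check.

```rust
use std::convert::TryInto;

/// Hypothetical stand-in for `Itertools::collect_array`: fill an array
/// element by element, returning `None` unless the iterator yields
/// exactly `N` items.
fn collect_array<I, const N: usize>(mut iter: I) -> Option<[I::Item; N]>
where
    I: Iterator,
    I::Item: Copy + Default,
{
    let mut out = [<I::Item as Default>::default(); N];
    for slot in out.iter_mut() {
        // On a slice iterator, each `next()` compares the current pointer
        // against the end pointer; this is the per-element check.
        *slot = iter.next()?;
    }
    // Reject iterators that still have items left after `N`.
    if iter.next().is_some() {
        return None;
    }
    Some(out)
}

/// Slice-to-array conversion through the standard `TryFrom<&[T]>` impl,
/// which succeeds only when the slice length equals `N`.
pub fn via_try_into<const N: usize>(a: &[u32]) -> Option<[u32; N]> {
    a.try_into().ok()
}

/// The same conversion expressed as collecting a slice iterator.
pub fn via_collect_array<const N: usize>(a: &[u32]) -> Option<[u32; N]> {
    collect_array(a.iter().copied())
}

fn main() {
    let data: Vec<u32> = (0..20).collect();
    // Both variants agree for every length, and succeed only at len == 16.
    for len in 0..=data.len() {
        let slice = &data[..len];
        let a = via_try_into::<16>(slice);
        let b = via_collect_array::<16>(slice);
        assert_eq!(a, b);
        assert_eq!(a.is_some(), len == 16);
    }
    println!("try_into and collect_array agree for lengths 0..=20");
}
```

Inspecting the optimized assembly of `via_try_into::<16>` and `via_collect_array::<16>` (for example with `-C opt-level=3` on godbolt) is the kind of comparison the issue describes.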
I was rather shocked: both look good for `N = 4`, but for `N = 16` we see the following nice implementation for `try_into`:
But the following abomination for `collect_array`:
While its failure to fold the `mov`s to consecutive addresses into efficient SIMD `mov`s is disappointing, I would argue that there is probably a bug somewhere, since it compares `rdx` THIRTEEN TIMES IN A ROW just to check whether it is `< 16`.

This doesn't appear to be a recent regression: the same happens in every version from 1.60 through nightly, on both x86 and ARM.