I was trying to optimize extremely performance-sensitive big integer / cryptographic code, but I couldn't reach the speed of other assembly implementations.
It turned out I was initializing an array with this line
Using the following instead, where `staticFor` is a compile-time loop unroller, improved performance by about 30%: my code previously took 70 cycles and now takes 50 cycles.
```nim
staticFor i, 0, `N`: # Do NOT use Nim slice/toOpenArray, they are not inlined
  `scratchSym`[i] = `a_MR`[i]
```
(State-of-the-art C++ JIT-ed code takes 55 cycles on my machine.)
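For readers unfamiliar with the technique, here is a minimal sketch of what a compile-time loop unroller such as `staticFor` might look like. The actual definition is not shown in this issue, so the macro body and the helper `replaceIdent` below are assumptions made for illustration; `src`, `dst` and this `N` are hypothetical names as well.

```nim
import std/macros

proc replaceIdent(ast, what, by: NimNode): NimNode =
  ## Returns a copy of `ast` in which every identifier equal to `what`
  ## is replaced by the node `by`. (Hypothetical helper.)
  if ast.kind in {nnkIdent, nnkSym}:
    result = if ast.eqIdent(what): by else: ast
  elif ast.kind == nnkEmpty or ast.kind in nnkLiterals:
    result = ast  # leaf nodes: nothing to rewrite
  else:
    result = newNimNode(ast.kind)
    for child in ast:
      result.add replaceIdent(child, what, by)

macro staticFor(idx: untyped, start, stopEx: static int, body: untyped): untyped =
  ## Emits one copy of `body` per index in start ..< stopEx,
  ## with `idx` replaced by the integer literal of that iteration.
  result = newStmtList()
  for i in start ..< stopEx:
    result.add nnkBlockStmt.newTree(
      newEmptyNode(),                       # anonymous block
      replaceIdent(body, idx, newLit(i)))

# Usage: copy a fixed-size array with fully unrolled assignments.
const N = 4
let src = [10, 20, 30, 40]
var dst: array[N, int]
staticFor i, 0, N:
  dst[i] = src[i]
```

With a macro along these lines, the copy expands at compile time into `N` separate assignments with literal indices, so no slice procedure is called at runtime.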
Slicing via `scratchSym[0 .. N-1]` is the problem here, not `toOpenArray`.
Araq changed the title from "Performance: toOpenArray should generate inline code or do constant-folding" to "Performance: old school slicing should generate inline code or do constant-folding" on Feb 1, 2021
Profiling:
![image](https://user-images.githubusercontent.com/22738317/106390782-613fcb80-63ea-11eb-8240-c1448024026f.png)
C code