Added AVX1 support for salsa and chacha rounds #1

kangaderoo · 2015-02-15T15:36:35Z

The code is written in C for better maintainability. Assembly derived from these files
might increase speed slightly.
The current speed increase compared to the SSE routines is about 10%.
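As an illustration of the approach described above, here is a minimal sketch of a vectorized ChaCha column round written with C intrinsics. It is not the code from this pull request: the function names are hypothetical, and the sketch assumes a 16-word state stored row-major. It uses SSE2 intrinsics, which a compiler targeting AVX (for example GCC with `-mavx`) encodes as VEX instructions on the same 128-bit XMM registers.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate each 32-bit element left by n bits (0 < n < 32). */
#define ROTL32X4(v, n) \
    _mm_or_si128(_mm_slli_epi32((v), (n)), _mm_srli_epi32((v), 32 - (n)))

/* Scalar reference: one ChaCha quarter-round on four words. */
static void chacha_qr_scalar(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = (*d << 16) | (*d >> 16);
    *c += *d; *b ^= *c; *b = (*b << 12) | (*b >> 20);
    *a += *b; *d ^= *a; *d = (*d << 8) | (*d >> 24);
    *c += *d; *b ^= *c; *b = (*b << 7) | (*b >> 25);
}

/* Scalar column round: four quarter-rounds, one per column of the 4x4 state. */
static void chacha_column_round_scalar(uint32_t state[16])
{
    for (int i = 0; i < 4; i++)
        chacha_qr_scalar(&state[i], &state[4 + i], &state[8 + i], &state[12 + i]);
}

/* Vector column round: each XMM register holds one row, so a single
 * sequence of operations performs all four column quarter-rounds at once. */
static void chacha_column_round_sse2(uint32_t state[16])
{
    __m128i a = _mm_loadu_si128((const __m128i *)&state[0]);
    __m128i b = _mm_loadu_si128((const __m128i *)&state[4]);
    __m128i c = _mm_loadu_si128((const __m128i *)&state[8]);
    __m128i d = _mm_loadu_si128((const __m128i *)&state[12]);

    a = _mm_add_epi32(a, b); d = _mm_xor_si128(d, a); d = ROTL32X4(d, 16);
    c = _mm_add_epi32(c, d); b = _mm_xor_si128(b, c); b = ROTL32X4(b, 12);
    a = _mm_add_epi32(a, b); d = _mm_xor_si128(d, a); d = ROTL32X4(d, 8);
    c = _mm_add_epi32(c, d); b = _mm_xor_si128(b, c); b = ROTL32X4(b, 7);

    _mm_storeu_si128((__m128i *)&state[0], a);
    _mm_storeu_si128((__m128i *)&state[4], b);
    _mm_storeu_si128((__m128i *)&state[8], c);
    _mm_storeu_si128((__m128i *)&state[12], d);
}
```

The scalar version is kept only as a reference to check the vector version against; a full ChaCha double round would also need the diagonal step, which is omitted here.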


Make the config work with the new files

ghostlander · 2015-02-17T16:14:06Z

Thanks, I plan to add the AVX/XOP assembly code in the future and may use your inline assembly as a reference. SSE2 4-way is also going to be improved.

kangaderoo · 2015-02-18T10:45:21Z

I was wondering where your speed increase from the 4-way code comes from.
I guess I still have to rewrite the KDF compress function in inline assembly.
This function looks like a good candidate for a 4-way optimization, or maybe
8-way, depending on the XMM register requirements.

The original CpuMiner had a 3-way scrypt and a 4-way SHA256, so the
best result came from running 12-way on AVX1.
The 3-way scrypt kept 3 'matrices' in XMM registers (3 x 4 = 12 of the
16 XMM registers on x86-64), leaving 4 XMM registers free for calculations.
It seems that XMM-to-XMM operations run about 3 times faster than
XMM-to-memory operations.

Because of the mixing behavior of NeoScrypt (4 times a 4x4 matrix), it looks
like 1-way Salsa and ChaCha would need the fewest
memory moves.

Unfortunately my development environment doesn't have AVX2, but the
inline assembly code could easily be rewritten to
use the 256-bit YMM registers.

John Doering wrote on 2/17/2015 at 5:14 PM:

Thanks, I plan to add the AVX/XOP assembly code in the future and may
use your inline assembly as a reference. SSE2 4-way is also going to
be improved.



kangaderoo added 2 commits February 15, 2015 16:33

Added AVX1 support for salsa and chacha rounds

8fb5263

The code is written in C for better maintainability. Assembly derived from these files might increase speed slightly. The current speed increase compared to the SSE routines is about 10%.

Merge the new files and the build env.

2ae7b82

Make the config work with the new files

kangaderoo added 5 commits March 8, 2015 16:58

use a 128 bit xor with sse/avx

f534bfc
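A 128-bit XOR of the kind this commit title refers to could look roughly like the sketch below. The function name `xor_block_128` is hypothetical and this is not the committed code; it assumes the length is a multiple of 16 bytes and uses unaligned loads so that any pointer is accepted.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* XOR len bytes of src into dst, 16 bytes per iteration.
 * Assumption: len is a multiple of 16. */
static void xor_block_128(void *dst, const void *src, size_t len)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    for (size_t i = 0; i < len; i += 16) {
        __m128i x = _mm_loadu_si128((const __m128i *)(d + i));
        __m128i y = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(d + i), _mm_xor_si128(x, y));
    }
}
```

With buffers known to be 16-byte aligned, the aligned variants `_mm_load_si128` and `_mm_store_si128` could be used instead.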

use blake2 avx code

1f59593

memory alloc alignment for avx/sse and clean-up

3b69b45
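Aligned SIMD loads and stores need buffers whose address is a multiple of the register width (16 bytes for XMM, 32 bytes for YMM). One way to obtain such memory, shown here only as a sketch and not as what this commit does, is C11 `aligned_alloc`. The helper name `alloc_simd_aligned` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate at least size bytes at an address that is a multiple of alignment.
 * Assumption: alignment is a power of two (16 for XMM, 32 for YMM).
 * The size is rounded up because C11 aligned_alloc expects a multiple of
 * the alignment. Release the memory with free(). */
static void *alloc_simd_aligned(size_t size, size_t alignment)
{
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    return aligned_alloc(alignment, rounded);
}
```

On platforms without `aligned_alloc`, `posix_memalign` is a common alternative.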

Added a hashing X3

06c5d4b

Increase hashing speed by running 3 calculations in parallel. Eliminate SIMD latency by smart sequencing. ~25% speed increase observed.
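The idea of hiding SIMD latency by sequencing can be sketched as follows: three independent states go through the same Salsa20-style step, with the instructions ordered so that neighbouring operations never depend on each other. The function names are hypothetical, this is not the committed code, and a compiler is free to reorder intrinsics, so the source order is only a hint.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Reference: one Salsa20-style step on a single state, b ^= rotl32(a + d, 7).
 * Every operation depends on the previous one, so the chain is serial. */
static __m128i salsa_step_1way(__m128i a, __m128i b, __m128i d)
{
    __m128i t = _mm_add_epi32(a, d);
    t = _mm_or_si128(_mm_slli_epi32(t, 7), _mm_srli_epi32(t, 25));
    return _mm_xor_si128(b, t);
}

/* The same step on three independent states, interleaved: while the result
 * for state 0 is still in flight, work for states 1 and 2 can be issued. */
static void salsa_step_3way(const __m128i a[3], __m128i b[3], const __m128i d[3])
{
    __m128i t0 = _mm_add_epi32(a[0], d[0]);
    __m128i t1 = _mm_add_epi32(a[1], d[1]);
    __m128i t2 = _mm_add_epi32(a[2], d[2]);

    __m128i l0 = _mm_slli_epi32(t0, 7);
    __m128i l1 = _mm_slli_epi32(t1, 7);
    __m128i l2 = _mm_slli_epi32(t2, 7);

    __m128i r0 = _mm_srli_epi32(t0, 25);
    __m128i r1 = _mm_srli_epi32(t1, 25);
    __m128i r2 = _mm_srli_epi32(t2, 25);

    t0 = _mm_or_si128(l0, r0);
    t1 = _mm_or_si128(l1, r1);
    t2 = _mm_or_si128(l2, r2);

    b[0] = _mm_xor_si128(b[0], t0);
    b[1] = _mm_xor_si128(b[1], t1);
    b[2] = _mm_xor_si128(b[2], t2);
}
```

Three states of four rows each would occupy 12 XMM registers, which leaves a few of the 16 registers on x86-64 for temporaries such as `t0` to `t2`.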

Enable extranonce subscription

7532b59
