OpenCL improvements #1
This patch tweaks the OpenCL kernel in the following respects:
1. Reduce unnecessary memory accesses;
2. Remove unreachable code;
3. Specialize the character-wise set;
4. Add loop unrolling hints;
5. Assume messages do not exceed 17 exabytes and apply the optimizations this allows.
It brings about a 15% speedup on an NVIDIA TITAN Xp.
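To illustrate item 5, here is a minimal sketch in plain C, not the code from this patch. It assumes the kernel is a BLAKE2b-style hash that tracks message length in a 128-bit byte counter held as two 64-bit words, which the description above does not state explicitly. The names `counter128`, `increment_full`, `increment_low_only`, and `counters_agree` are hypothetical. If the message can never reach 2^64 bytes, the high word stays zero, so the carry check and the high word itself can be dropped.

```c
#include <stdint.h>

/* Hypothetical 128-bit byte counter: t[0] is the low word, t[1] the high word. */
typedef struct {
    uint64_t t[2];
} counter128;

/* General form: add `inc` bytes and propagate the carry into the high word. */
static void increment_full(counter128 *c, uint64_t inc)
{
    c->t[0] += inc;
    c->t[1] += (c->t[0] < inc); /* unsigned wrap-around means a carry occurred */
}

/* Specialized form: assumes the total never reaches 2^64 bytes,
 * so the high word is always zero and needs no storage or update. */
static void increment_low_only(uint64_t *t0, uint64_t inc)
{
    *t0 += inc;
}

/* Returns 1 if both forms agree after `blocks` blocks of 128 bytes each. */
static int counters_agree(unsigned blocks)
{
    counter128 full = {{0, 0}};
    uint64_t low = 0;
    for (unsigned i = 0; i < blocks; i++) {
        increment_full(&full, 128);
        increment_low_only(&low, 128);
    }
    return full.t[0] == low && full.t[1] == 0;
}
```

Both forms produce the same result for any realistic message length. The specialized form saves one comparison, one addition, and one register per counter update, which is the kind of saving the assumption is meant to unlock.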
Test environment:
original OpenCL kernel:
new OpenCL kernel:
I ran some tests to compare the new kernel against the old one.
I'm using an RX 5700XT with the current Nano network difficulty setting (8x). In almost every case the new OpenCL kernel is faster than the old one, with few exceptions (one in the list above). I ran custom tests multiple times with each OpenCL implementation; the times are listed above. The new kernel saves up to 0.64 seconds, and it is slower by 0.08 seconds in one case. In most cases the differences are not significant, but the results are consistent, saving a few milliseconds each time. Over a total of 20 PoWs, the new implementation consistently saves around 2 seconds. I also generated 200 PoWs, which took 272.680s with the old kernel and 259.198s with the new one, a saving of ~13 seconds (about 5%). I'll try to run more tests tonight. 👍 Thank you for the PR. (: