-
Notifications
You must be signed in to change notification settings - Fork 29.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
src: enable SIMD support for buffer swap #44793
Conversation
Submit for CI first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still -1 on this because I'm not convinced that swapping bytes is a common enough operation to warrant the level of optimization being made here
Additional issues:
- Byte swapping functions called by all platforms are targeting avx512vbmi
I'm not sure the concern of code change here. Byte swap is a official Node.js API for all programmers. As long as the opt patch does not hurt current performance but benefit some platforms, I think it is good to have. Is there concerns for quality? |
Found the review guideline, could you please check if this opt patch meets this? Do not overwhelm new contributors. It is tempting to micro-optimize and make everything about relative performance, perfect grammar, or exact style matches. Do not succumb to that temptation. Focus first on the most significant aspects of the change: Does this change make sense for Node.js? |
Accepting changes is a tradeoff, that's mentioned elsewhere in doc/contributing/pull-requests.md. I agree with @mscdex the benefits (performance) do not outweigh the downsides (complexity) for an uncommon operation. AVX-512 itself is still so uncommon it's probably not worth optimizing for yet. Something I also touched upon in your other pull request is that I'm skeptical real world programs are going to see much improvement. Most programs don't swap bytes 100% of the time and lightly sprinkling AVX-512 instructions throughout your instruction stream often reduces overall performance. It definitely increases overall power consumption. |
I'm also skeptical here. While I appreciate the code change and the perf improvement for some cases, this definitely increases the complexity quite a bit for an operation that is often not in the hot path. I'm going to say -0 on this (slightly against but not going to block, but I'm not at all convinced there's enough value) |
Could you please provide a real test case? I will try to reproduce your skepital. |
Some good candidates: tsc, webpack, ghost, express. You should check if they or their dependencies actually do any byte swapping. (If none of them do, that's a good indicator it's not a hot path.) |
some of them like typescript and express are frameworks, and programmers can write any JS code based on them. When I search for typescript code snippets in Github that invoke buffer.swap32(), I could see many occurances. Is that a sign of the popularity of buffer wapping operations? Search link: https://github.com/search?l=TypeScript&q=buffer.swap32&type=Code |
That link seems to show that the majority of results are just typescript definitions and not so much widespread code usage. |
It's a bit nosense that so many projects define functions and not use them. I will look into the project repo code listed in the first result page and trace the usage of those definitions. |
Closing as there's likely no action to take here. |
This patch is modified from #44578 after review comments accepted.
Comments -> solution
Patch is too big that add a deps folder -> I made all the code changes into the util-inl.h file, and did not touch any other folders and files.
Patch only adopted AVX512 Simd instruction which is not widespread -> I enabled ssse version and AVX512 version at the same time. So that all popular platforms can benefit from this patch. Most of known platforms support ssse
Do not modify the gyp file -> I removed all code changes in gyp file, and uses attribute in the source code file like deps/zlib does.
For all platforms, the most performance gains is >5X.
For the platforms supporting AVX512, it achieved best performance gain, the charts for comparison as below:
For the platforms supporting SSE only, it also achieved good performance, charts as below: