Add a CPU features detection library #2824
It implements CPUID using inline assembly, as well as a thin, cached layer of abstraction for common features.
Supported features:
* SSE
* SSE2
* SSE3
* SSSE3
* SSE4.1
* SSE4.2
* AVX
* AVX2
There is also vendor detection (Intel/AMD) and vendor string reporting.
No tests are provided, as the results of these method calls depend on the CPU.
Mmh, interesting, is that in preparation for something? If not, what use cases are common enough for it to make sense in a language's standard library? Please keep in mind that every piece of code we accept increases the technical debt and thus the maintenance burden :) If we decide not to accept it, it would make a great shard for sure!
Indeed! We could use this to speed up critical portions of the standard library by branching at runtime to a different codepath, based on what extended instruction set the hardware provides. The intended use case is for the |
Sorry for being stubborn, but do you have something specific in mind already? Do LLVM intrinsics generate code that does similar things? If yes, what examples are there not covered by intrinsics? |
This does not generate any code per se; it just allows you to detect what your CPU can and can't do at runtime. The goal here is to prevent the CPU from executing instructions it doesn't support. If you generate an AVX2-only codepath, for example, and your CPU doesn't support AVX2, it's gonna crash. It's useful if you want to redistribute binaries. With |
@Nax But this is only useful if you write inline assembly, right? |
@asterite Unless we have something equivalent to C compilers' intrinsics, then yes.
Sorry, to clarify: I meant whether LLVM intrinsics already generate code which uses CPU feature detection to optimize.
@jhass No, LLVM won't do that. You can ask it to emit extended instructions, but it will not insert runtime checks on its own.
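Since LLVM will not insert the runtime check itself, the branching described above has to be written by hand. A minimal sketch of that pattern follows. All names here (`CPUDispatch`, `detect`, `sum_avx2`) are hypothetical and are not the API proposed in this PR, and `detect` is a stand-in that does not actually run CPUID.

```crystal
# Hypothetical names throughout; this is not the API proposed in the PR.
module CPUDispatch
  @[Flags]
  enum Features
    SSE2
    AVX2
  end

  @@features : Features? = nil

  # Stand-in for a real CPUID query. It reports no extensions,
  # so the portable path is always taken in this sketch.
  def self.detect : Features
    Features::None
  end

  # Detect once, then reuse the cached result on later calls.
  def self.features : Features
    @@features ||= detect
  end

  # Public entry point: pick a codepath at runtime.
  def self.sum(data : Slice(Int32)) : Int64
    if features.includes?(Features::AVX2)
      sum_avx2(data)
    else
      sum_scalar(data)
    end
  end

  # Portable fallback that runs on any CPU.
  def self.sum_scalar(data : Slice(Int32)) : Int64
    data.sum(0_i64)
  end

  # Placeholder for an AVX2-specific implementation; a real one would
  # need inline assembly or intrinsics, so it just defers to the fallback.
  def self.sum_avx2(data : Slice(Int32)) : Int64
    sum_scalar(data)
  end
end

puts CPUDispatch.sum(Slice[1, 2, 3])
```

Caching the detected flags keeps the cost of the check to one comparison per call, which is presumably what the "cached layer" in the description is for.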
@@features = Features::None

def self.cpuid(fn : Int32, subfn : Int32 = 0)
  buf = [0_u32, 0_u32, 0_u32, 0_u32]
Should this rather be a StaticArray?
Aren't StaticArray values bound to the stack? I supposed they could not be returned from methods.
Ah well, yes. They can be returned but then a copy is returned from my understanding. Might still be faster than a heap allocation?
They can, they are passed by copy
Oh okay, then I will change to StaticArray. Indeed, copying 128 bits should be way faster than allocating.
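To illustrate the copy semantics discussed in this thread, here is a small sketch. `fake_cpuid` is a hypothetical stand-in for the PR's method: it fills the four result slots with dummy values instead of executing CPUID.

```crystal
# Hypothetical stand-in for the PR's cpuid method. The four slots mirror
# the EAX, EBX, ECX and EDX results, but hold dummy values here.
def fake_cpuid(fn : UInt32) : StaticArray(UInt32, 4)
  StaticArray[fn, 0_u32, 0_u32, 0_u32]
end

regs = fake_cpuid(1_u32) # returned by value, no heap allocation
copy = regs              # assignment copies all four elements
copy[0] = 42_u32         # mutates only the copy

puts regs[0]                        # the original is unchanged
puts sizeof(StaticArray(UInt32, 4)) # 16 bytes, i.e. the 128 bits mentioned above
```

Returning an `Array(UInt32)` instead would allocate on the heap on every call, which is the cost the suggestion is trying to avoid.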
How about scoping this into the |
Internally use StaticArray for performance.
Is it cross OS? Since it involves ASM, I guess it is. I understand it detects and runs on x86 CPUs, but what about other ones, like ARM or MIPS archs? And what about 32-bit? It could be nice to have, but it may be premature to add per-feature optimisations to the core/stdlib. Also, we'll have to maintain it, and it may make it more complex to introduce new architectures. Of course, great optimisations could be a reason to integrate it nonetheless. We'll need input from @waj and @asterite here.
It is cross OS indeed, and should work fine on both I don't think this would make new archs more complex to adopt if we do that. I'm gonna add checks to make it compile on non-x86 archs. |
Filepath should be |
Any news on this? |
@Nax To merge this we'll need a real, concrete use case |
Please do release a shard for this though! I think it can be useful, possibly in a future simd library. |
For the few people who are looking for this, LLVM itself provides some of the functionality here, although you probably need to dig into its source code to parse those feature names:

require "llvm"

# target_machine.cr
lib LibLLVM
  fun get_host_cpu_features = LLVMGetHostCPUFeatures : Char*
end

LLVM.init_aarch64 # or `init_x86` etc.
features = LLVM.string_and_dispose(LibLLVM.get_host_cpu_features)
features.split(',') # => ["+fp-armv8", "+lse", "+neon", "+crc", "+crypto"]

My guess is LLVM needs this information for |
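LLVM feature strings prefix each name with `+` when the feature is enabled and `-` when it is disabled, so the raw list usually needs a little filtering before it can be queried. A minimal sketch, using a hard-coded sample string rather than a real call into LLVM; `enabled_features` is a hypothetical helper, not part of Crystal's LLVM bindings:

```crystal
# Hypothetical helper, not part of Crystal's LLVM bindings.
# Keeps only the names marked with "+" and strips the prefix.
def enabled_features(raw : String) : Set(String)
  raw.split(',').compact_map { |f| f.lchop if f.starts_with?('+') }.to_set
end

# Sample input for illustration only, not output from a real host.
features = enabled_features("+fp-armv8,+lse,-sve,+neon")

puts features.includes?("neon") # enabled in the sample string
puts features.includes?("sve")  # disabled in the sample string
```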