please use runtime dispatch via target_feature #3
It would be great if this crate didn't require using any compilation flags at all, and instead auto-configured itself at runtime based on what the running CPU supports. This process is documented in the `std::arch` module docs.

Some other nits:
Comments
Thanks for your comment. It is very precious to get this kind of feedback while the crate is young and has no clients other than tantivy.

Let me check that I understand the philosophy behind the best practice for SIMD in Rust: the compilation of those instructions is independent from the compilation flags, and the implementation to run is selected at runtime depending on what the running CPU supports.

I have a couple of issues with this approach, some of which are specific to my use case, and I would like to know your opinion on the subject.

In my case, the different implementations of the bitpackers create formats that are incompatible with one another. That's because the bitpacking happens at the register-type level. For instance, when bitpacking over 4 bits using SSE3, the first byte will contain integer 0 and integer 5 (instead of integer 0 and integer 1). It is possible to create a bitpacker compatible with the SSE3 format using only scalar instructions, but it is likely to be slower than the vanilla scalar bitpacker.

There is even a use case for using the ScalarBitPacker and the SSE3BitPacker in the same application: while this is not implemented yet, the scalar bitpacker makes it possible to get random access to the packed integers.

Another thing that concerns me is that many rustaceans are not aware of the compilation flags needed to enable these instruction sets.

After reading those concerns, what do you think would be the best solution here?
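To make the incompatibility concrete, here is a rough sketch of how a sequential 4-bit layout and a lane-interleaved one diverge; the four-lane stride is illustrative and not necessarily the exact layout the SSE3 bitpacker uses:

```rust
/// Pack 4-bit values sequentially: integer i lands at bit offset 4 * i.
fn pack_sequential(values: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() + 1) / 2];
    for (i, &v) in values.iter().enumerate() {
        out[i / 2] |= (v & 0x0f) << ((i % 2) * 4);
    }
    out
}

/// Pack 4-bit values the way a 4-lane SIMD packer conceptually does it:
/// lane L receives integers L, L + 4, L + 8, ..., so the resulting bytes
/// are laid out differently from the sequential format above.
fn pack_interleaved(values: &[u8]) -> Vec<u8> {
    let lanes = 4;
    let per_lane = values.len() / lanes;
    let mut reordered = Vec::with_capacity(values.len());
    for lane in 0..lanes {
        for i in 0..per_lane {
            reordered.push(values[i * lanes + lane]);
        }
    }
    pack_sequential(&reordered)
}

fn main() {
    let values: Vec<u8> = (0..16).collect();
    // The first byte already differs between the two formats, so data
    // written by one bitpacker cannot be read back by the other.
    assert_ne!(pack_sequential(&values)[0], pack_interleaved(&values)[0]);
}
```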
You got it!
Oh... Yikes. That's a noodle scratcher. This does seem orthogonal to runtime dispatch to me though, no? I do appreciate that this means you can't just expose one set of methods and have the crate "pick" the right one. You'll need to give callers a choice. But with runtime dispatch, you can give callers every choice available, e.g.:

```rust
// always succeeds
fn create_scalar_compressor() -> ScalarCompressor { ... }

// only succeeds when the current CPU supports SSSE3
fn create_ssse3_compressor() -> Option<SSSE3Compressor> { ... }

// only succeeds when the current CPU supports AVX2
fn create_avx2_compressor() -> Option<AVX2Compressor> { ... }
```

The corresponding types you get back would then have the methods suitable for doing compression (or decompression too, whatever). It is then up to the caller to figure out which one to use. It's probable that the caller should return an error if its CPU doesn't have the appropriate capabilities to read the compressed data. For example, creating an index on an AVX2 CPU and trying to read it on a CPU that doesn't support AVX2 should, I guess, produce an error. It seems likely that you want/need this even if you stick with compile-time options. Otherwise, if I run a tantivy that happened to be compiled with different options and end up sacrificing the integrity of my data, then that's really bad.

Note that the SSSE3 and AVX2 types are also platform specific and should only be available on x86. You have some flexibility here in how you deal with it, but the simplest is probably to just promote the platform specificness into your own public API. You could technically abstract over it (and I'd subsequently rename the types so that they didn't include ISA extension names in them). I probably still might not expose a trait for this since I try to keep APIs small, but I can definitely see the motivation given multiple distinct alternatives, and I can see why callers would want to use the trait. You could also declare that the caller should define their own trait.
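A rough sketch of how one of these constructors could perform the runtime check; the type and function names are illustrative, not an actual API of this crate:

```rust
pub struct Ssse3Compressor(());

impl Ssse3Compressor {
    /// Returns a compressor only when the running CPU supports SSSE3.
    pub fn new() -> Option<Ssse3Compressor> {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            if is_x86_feature_detected!("ssse3") {
                return Some(Ssse3Compressor(()));
            }
        }
        None
    }
}
```

A caller could then try the AVX2 constructor first, fall back to SSSE3, and finally to the scalar bitpacker.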
Yeah, this is exactly why I think people should be using runtime feature detection by default unless there is a good reason not to. With runtime feature detection, you shouldn't ever need to worry about compile-time flags.

This impacts every dependent of tantivy as well. For example, if I build an application that uses tantivy internally and start distributing binaries on GitHub, it would be awesome if those binaries automatically took advantage of SIMD whenever it is available. Doing this with compile-time feature configuration is a huge burden: you need to ship a different binary for each permutation of target features you want to support. Doing this with runtime feature detection is free: as the application writer, I don't even need to be aware of the fact that tantivy uses CPU-specific features. It will Just Work. That is an enormous benefit, and IMO it is worth doing almost everything possible to get it.
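For reference, a minimal sketch of the `std::arch` pattern being described, with placeholder function names rather than this crate's real API; the AVX2 path is compiled into a plain `cargo build` binary and only executed after the runtime check:

```rust
/// Packs `input` with the best implementation the running CPU supports.
pub fn pack(input: &[u32], output: &mut [u8]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: we just verified at runtime that AVX2 is available.
            return unsafe { pack_avx2(input, output) };
        }
    }
    pack_scalar(input, output)
}

/// Compiled with AVX2 enabled for this function only, even though the
/// crate itself is built without `-C target-feature=+avx2`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn pack_avx2(input: &[u32], output: &mut [u8]) {
    // Placeholder: a real implementation would use core::arch intrinsics.
    pack_scalar(input, output)
}

fn pack_scalar(input: &[u32], output: &mut [u8]) {
    // Placeholder portable fallback.
    let _ = (input, output);
}
```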
Also, I think the fact that the different methods here produce distinct, incompatible bitpackings should be documented in huge bold letters. I didn't realize that until you told me. :-)
@fulmicoton I will also say this: it may be the case that I wind up needing to use a crate like this, and if you'd like to defer the work to address this issue, then I'd probably be happy to take it on assuming you're comfortable with major refactorings. I don't know when it will happen (it could be a year or more, or it could be a month, I have no clue).
@BurntSushi Just to make sure of what should be done, let me be very specific here:
Maybe I am misunderstanding your point, but unfortunately, in my experience it does not work that way.

> I will also say this: it may be the case that I wind up needing to use a crate like this, and if you'd like to defer the work to address this issue, then I'd probably be happy to take it on assuming you're comfortable with major refactorings. I don't know when it will happen (it could be a year or more, or it could be a month, I have no clue).

I am open to this refactoring and I would love it if you could take over this.
Oh I see, yeah, this is definitely not the intended behavior! I ported the

Other links:
Sounds good. I have no timeline, but just wanted to put this out there!
(@BurntSushi I quickly tested it to confirm that this solved my perf problem!)
I think I should be able to take care of this issue this week. If that's ok, I'll send you a CR, @BurntSushi.
Sounds good! No rush at all. :-)
PR available here: #5. Can you have a look at it, @BurntSushi? It seems like I cannot mark you as a reviewer because you don't belong to the tantivy-search GitHub organization. (I'd gladly add you to it if you want.)