safer ksize selection in signature::Select
#3026
Comments
First let me say that while I think this will have minor real-world impact, I can virtually guarantee it will be a source of at least one nasty time-consuming bug in the next year :). So I really don't like this kind of fuzziness! I appreciate the creation of this issue :).
The molecule type is available in the manifest; can't we use that? Or is the problem that 'select' doesn't use the manifest/there may not be a manifest?
agreed! #3028 🙂
Yes - in Ok, this was actually simpler than I was thinking! Sigs have
Check signature `.hash_function` to determine whether or not we need to make protein ksize corrections.
* Fixes #3026
Protein k-mer sizes are stored as k*3 internally, but as k in a manifest.
In `signature::Select`, we account for this discrepancy by matching either the exact ksize or k*3, regardless of ksize or molecule type. This is b/c we don't have access to `is_protein` or other relevant properties unless we load the minhash.
This is fine for most common uses, b/c `signature::select` is usually called with signatures of the right `ksize` already. But we could be safer about it anyway.
Just after the `ksize` check, we load the minhash to check scaled and downsample if needed. We could combine these checks and use the minhash to determine molecule type (via matching on `encodings::HashFunctions`, I think).
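To make the difference concrete, here is a rough sketch of the fuzzy ksize match next to a stricter one that looks at the hash function first. Everything in it is hypothetical: `HashFunction`, `Sketch`, `internal_ksize`, and both helper functions are stand-ins invented for illustration, not the actual sourmash types or the code in `signature::Select`. It only assumes what is stated above, namely that protein ksizes are stored as k*3 internally.

```rust
/// Hypothetical stand-in for the hash function recorded on a sketch.
/// Other encodings would need their own arm in the match below.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HashFunction {
    Dna,
    Protein,
}

/// Hypothetical minimal sketch: only the fields the ksize check needs.
#[derive(Debug, Clone, Copy)]
struct Sketch {
    /// ksize as stored internally (k*3 for protein, per the issue text).
    internal_ksize: u32,
    hash_function: HashFunction,
}

/// Fuzzy check: accept the exact ksize or ksize * 3,
/// without looking at the molecule type at all.
fn ksize_matches_fuzzy(sketch: &Sketch, requested: u32) -> bool {
    sketch.internal_ksize == requested || sketch.internal_ksize == requested * 3
}

/// Stricter check: derive the expected internal ksize from the
/// hash function, then require an exact match.
fn ksize_matches_strict(sketch: &Sketch, requested: u32) -> bool {
    let expected = match sketch.hash_function {
        HashFunction::Dna => requested,
        HashFunction::Protein => requested * 3,
    };
    sketch.internal_ksize == expected
}
```

With these definitions, a DNA sketch with internal ksize 21 passes the fuzzy check for a requested k=7 but is rejected by the strict one, while a protein sketch with internal ksize 21 is accepted for k=7 and rejected for k=21.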