import other kinds of fracminhash etc? #2710

Open
ctb opened this issue Aug 10, 2023 · 8 comments

@ctb
Contributor

ctb commented Aug 10, 2023

curious how well our underlying infrastructure can work for handling other kinds of fracminhash sequences! maybe could be explored using plugins.

https://github.com/St4NNi/jam-rs

@mr-eyes
Member

mr-eyes commented Aug 10, 2023

I think it's doable to convert this tool's output to a sourmash signature. It will only fail on 'scale'-related operations, since the hashing is different.

@mr-eyes
Member

mr-eyes commented Aug 10, 2023

I would love to work on, or help create, a plugin that converts different input files (KMC, kProcessor, etc.) to a sourmash sketch.

@ctb
Contributor Author

ctb commented Aug 10, 2023

(as long as it's a FracMinHash bottom sketch, the scaled stuff should work fine! it's all based on numbers not the specific hashing approach 🤷 )
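To illustrate the point that the scaled operations only look at the hash values, here is a minimal sketch in Rust. It is not sourmash's implementation: the function names (`max_hash_for_scaled`, `frac_minhash`, `jaccard`) are hypothetical, and the cutoff `u64::MAX / scaled` is an assumed approximation of the usual FracMinHash threshold for 64-bit hashes. The input numbers are arbitrary and could come from any hash function.

```rust
use std::collections::BTreeSet;

/// Largest hash value retained for a given `scaled`.
/// Assumption: 64-bit hashes and a cutoff of roughly 2^64 / scaled.
fn max_hash_for_scaled(scaled: u64) -> u64 {
    assert!(scaled >= 1, "scaled must be at least 1");
    u64::MAX / scaled
}

/// Keep only the hashes at or below the cutoff ("bottom" fraction of hash space).
/// Nothing here depends on how the hashes were produced.
fn frac_minhash(hashes: impl IntoIterator<Item = u64>, scaled: u64) -> BTreeSet<u64> {
    let max_hash = max_hash_for_scaled(scaled);
    hashes.into_iter().filter(|&h| h <= max_hash).collect()
}

/// Jaccard similarity between two sketches built with the same `scaled`.
fn jaccard(a: &BTreeSet<u64>, b: &BTreeSet<u64>) -> f64 {
    let shared = a.intersection(b).count();
    let total = a.union(b).count();
    if total == 0 {
        0.0
    } else {
        shared as f64 / total as f64
    }
}

fn main() {
    let scaled = 4;
    // Arbitrary example values standing in for hashes from any algorithm.
    let a = frac_minhash(vec![1, 10, 20, u64::MAX / 2], scaled);
    let b = frac_minhash(vec![10, 20, 30, u64::MAX], scaled);
    println!("retained: {:?} and {:?}", a, b);
    println!("jaccard: {}", jaccard(&a, &b));
}
```

The comparison is only meaningful when both sketches were built with the same hash function and the same `scaled`, which matches the caveat raised later in the thread.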

@ctb
Contributor Author

ctb commented Aug 10, 2023

cc @St4NNi :)

@mr-eyes
Member

mr-eyes commented Aug 10, 2023

> (as long as it's a FracMinHash bottom sketch, the scaled stuff should work fine! it's all based on numbers not the specific hashing approach 🤷 )

I see what you are saying, and yes! Thanks.

@St4NNi

St4NNi commented Aug 10, 2023

Hi everyone, and thanks for the shout-out, @ctb. jam was more or less born out of the same curiosity, but after taking a closer look at it, a lot of other ideas for tailoring minhash to some of our specific problems popped up.

Currently, jam is in a fairly early stage, and the output format has not yet settled on anything stable, but I am also curious about how different hashing algorithms will perform in sourmash, so I was thinking of adding an output parameter that creates sourmash-compatible sketches directly.

The only thing that would be a little odd is that sourmash::encodings::HashFunctions has no real custom option or similar and is not even #[non_exhaustive], so for now the algorithm would need to pretend to be murmur64_DNA.

@luizirber
Member

> The only thing that would be a little odd is that sourmash::encodings::HashFunctions has no real custom option or similar and is not even #[non_exhaustive], so for now the algorithm would need to pretend to be murmur64_DNA.

That's a great point! I'm not saying you should lie, but... you can lie at the Signature level, but not at the Sketch level, so supporting a Custom(String) variant in HashFunctions seems the way to go!

@St4NNi

St4NNi commented Aug 11, 2023

> That's a great point! I'm not saying you should lie, but... you can lie at the Signature level, but not at the Sketch level, so supporting a Custom(String) variant in HashFunctions seems the way to go!

Sounds reasonable to me. In a first iteration I could lie on the Sketch level (and tell the truth on the Signature level); as long as both sketches use the same algorithm, it should be fine.

Regarding the custom string on the HashFunctions enum, I don't know if it would be that easy, since it is #[repr(u32)] and I'm afraid that changing this would cause all sorts of side effects.

Leaving this as #[repr(u32)] but adding a field would effectively make this enum non-primitive and result in something similar to #[repr(C)], preventing any type casts with as.
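The casting concern can be shown with a small sketch. The two enums below are hypothetical stand-ins, not the real sourmash::encodings::HashFunctions definition, and the numeric ids and the `id` method are made up for illustration. A unit-only `#[repr(u32)]` enum can be cast with `as`, while the same cast is rejected by the compiler once any variant carries data, so callers would need an explicit mapping instead.

```rust
/// Hypothetical unit-only enum: every variant is a plain tag.
#[allow(dead_code)]
#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum UnitOnly {
    Murmur64Dna = 1,
    Murmur64Protein = 2,
}

/// Hypothetical variant of the same enum with a data-carrying `Custom` case.
/// `#[repr(u32)]` is still accepted, but the type is no longer unit-only.
#[allow(dead_code)]
#[repr(u32)]
#[derive(Debug, Clone, PartialEq)]
enum WithCustom {
    Murmur64Dna,
    Murmur64Protein,
    Custom(String),
}

impl WithCustom {
    /// Explicit replacement for the `as u32` cast.
    /// The ids are illustrative values chosen for this example.
    fn id(&self) -> u32 {
        match self {
            WithCustom::Murmur64Dna => 1,
            WithCustom::Murmur64Protein => 2,
            WithCustom::Custom(_) => u32::MAX,
        }
    }
}

fn main() {
    // Casting a unit-only enum to its integer representation is allowed.
    let dna = UnitOnly::Murmur64Dna as u32;

    // The equivalent cast on the enum with a field does not compile:
    // let bad = WithCustom::Murmur64Dna as u32;

    let custom = WithCustom::Custom("my_hash".to_string());
    println!("{} {} {}", dna, WithCustom::Murmur64Dna.id(), custom.id());
}
```

Any existing code that relies on `as` casts or on passing the enum across an FFI boundary as a bare `u32` would have to be updated to go through a method like this, which is the kind of side effect mentioned above.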


4 participants