Multithreaded file and string consumption #22

Open
Adamtaranto opened this issue Sep 10, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@Adamtaranto
Collaborator

How fast can we make kmer counting in Rust?

@ctb, are there examples of this from Sourmash or other libraries?

@Adamtaranto Adamtaranto added the enhancement New feature or request label Sep 10, 2024
@ctb
Contributor

ctb commented Sep 10, 2024

Yes, it's actually really easy ;). The trick is to make sure we're using iterators and closures wherever possible; then we just change `iter` to `par_iter`, import rayon's prelude, and voilà.
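To illustrate the shape being described, here is a minimal sketch of k-mer counting written purely with iterators and closures. It uses only the standard library so it runs without extra crates. The names `count_kmers`, `merge`, and `count_all` are hypothetical and not taken from this project, and the rayon variant mentioned in the comments is an assumption about how the swap would look, not tested code from the repository.

```rust
use std::collections::HashMap;

/// Count every k-mer in one sequence (forward strand only, ASCII input assumed).
fn count_kmers(seq: &[u8], k: usize) -> HashMap<&[u8], u64> {
    let mut counts = HashMap::new();
    if k == 0 {
        // `windows(0)` panics, so return early.
        return counts;
    }
    for kmer in seq.windows(k) {
        *counts.entry(kmer).or_insert(0) += 1;
    }
    counts
}

/// Merge two per-sequence count tables into one.
fn merge<'a>(mut a: HashMap<&'a [u8], u64>, b: HashMap<&'a [u8], u64>) -> HashMap<&'a [u8], u64> {
    for (kmer, n) in b {
        *a.entry(kmer).or_insert(0) += n;
    }
    a
}

/// Sequential version: map each sequence to its counts, then fold them together.
///
/// Assumed rayon variant (needs `use rayon::prelude::*;`):
///     seqs.par_iter()
///         .map(|seq| count_kmers(seq.as_bytes(), k))
///         .reduce(HashMap::new, merge)
/// Note that rayon's `reduce` takes an identity closure rather than an
/// initial value, so the final step changes slightly along with `iter`.
fn count_all<'a>(seqs: &[&'a str], k: usize) -> HashMap<&'a [u8], u64> {
    seqs.iter()
        .map(|seq| count_kmers(seq.as_bytes(), k))
        .fold(HashMap::new(), merge)
}
```

Calling `count_all(&["ACGTAC", "GTACGT"], 3)` yields four distinct 3-mers, each seen twice. Because each sequence is counted independently and the merge is associative, the per-sequence work is the part that parallelises cleanly.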

This was next on my list after #10 gets finished off.

@ctb ctb mentioned this issue Sep 12, 2024
@Adamtaranto
Collaborator Author

Might also save some time if we reverse-complement the full DNA sequence once and slide a window backwards through it (as in sourmash seqtohashes) instead of computing the reverse complement for every k-mer.
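A rough sketch of that idea, assuming plain ASCII `A/C/G/T` input: reverse-complement the whole sequence once, then pair each forward window with the matching window read from the end of the reverse-complemented copy. The function names here are hypothetical, and this is not a copy of the sourmash implementation.

```rust
/// Complement a single base; anything other than A/C/G/T is passed through unchanged.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        other => other,
    }
}

/// Reverse-complement the full sequence in one pass.
fn revcomp(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

/// Return the canonical (lexicographically smaller) form of every k-mer.
///
/// The forward k-mer at offset `i` has its reverse complement at
/// `rc[n - k - i .. n - i]`, which is exactly the i-th window of `rc`
/// when the windows are visited in reverse order.
fn canonical_kmers(seq: &[u8], k: usize) -> Vec<Vec<u8>> {
    if k == 0 {
        // `windows(0)` panics, so return early.
        return Vec::new();
    }
    let rc = revcomp(seq);
    seq.windows(k)
        .zip(rc.windows(k).rev())
        .map(|(fwd, rev)| std::cmp::min(fwd, rev).to_vec())
        .collect()
}
```

For `ACGTT` with `k = 3`, the forward k-mers `ACG`, `CGT`, `GTT` pair with `CGT`, `ACG`, `AAC`, giving the canonical forms `ACG`, `ACG`, `AAC`. The complement is computed once per base instead of `k` times per base.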

@Adamtaranto
Collaborator Author

@ctb what do you think about letting the user specify the number of threads for rayon? I think it will try to use all available cores by default.
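One possible way to handle the option, sketched with the standard library only: resolve an optional user request into a concrete thread count, falling back to whatever the machine reports. `resolve_threads` is a hypothetical helper, and capping the request at the available parallelism is a design assumption rather than something decided in this thread.

```rust
use std::thread;

/// Hypothetical helper: turn an optional user-supplied thread count into a
/// concrete number. `None` or `Some(0)` means "use everything available".
fn resolve_threads(requested: Option<usize>) -> usize {
    // Falls back to 1 if the platform cannot report its parallelism.
    let available = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    match requested {
        // Assumption: never start more workers than the machine reports.
        Some(n) if n > 0 => n.min(available),
        _ => available,
    }
}

// With rayon, the resolved value would typically be applied once at startup:
//     rayon::ThreadPoolBuilder::new()
//         .num_threads(resolve_threads(user_value))
//         .build_global()
//         .expect("global thread pool already initialised");
// rayon treats `num_threads(0)` as "choose automatically", which matches the
// default behaviour of using all logical CPUs.
```

Exposing this as an optional argument keeps the current default while letting users on shared machines limit how many cores the counter takes.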

@Adamtaranto
Collaborator Author

Some other Rust kmer counting projects for ideas:


2 participants