Add .dump() method #24

Closed
Adamtaranto opened this issue Sep 11, 2024 · 3 comments · Fixed by #30
Labels: enhancement (New feature or request)

Comments

@Adamtaranto
Collaborator

Similar to dump in KMC_tools, write kmer:count pairs to a text file.

Could add two options for this:

  • .dump_hash() for hash:count pairs
  • .dump() for kmer:count pairs

The second option should throw an error if no hash:kmer map is stored in the KmerCountTable object.
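
A minimal Python sketch of how the two proposed methods could behave. This is not the actual implementation: `ToyCountTable`, its attributes, the `store_kmers` flag, and the CRC32 placeholder hash are all hypothetical stand-ins. Only the method names `.dump()` and `.dump_hash()` and the error condition come from the proposal above.

```python
import zlib


class ToyCountTable:
    """Hypothetical stand-in for KmerCountTable, for illustration only."""

    def __init__(self, store_kmers=False):
        self.counts = {}  # hash -> count
        # hash -> kmer map, kept only when requested (assumption)
        self.hash_to_kmer = {} if store_kmers else None

    @staticmethod
    def hash_kmer(kmer):
        # Placeholder hash; the real table uses its own hash function.
        return zlib.crc32(kmer.encode())

    def count(self, kmer):
        h = self.hash_kmer(kmer)
        self.counts[h] = self.counts.get(h, 0) + 1
        if self.hash_to_kmer is not None:
            self.hash_to_kmer[h] = kmer

    def dump_hash(self):
        """Return (hash, count) pairs; always available."""
        return list(self.counts.items())

    def dump(self):
        """Return (kmer, count) pairs; fail if no hash:kmer map is stored."""
        if self.hash_to_kmer is None:
            raise ValueError("no hash:kmer map stored; use dump_hash() instead")
        return [(self.hash_to_kmer[h], c) for h, c in self.counts.items()]
```

With this shape, a table built without the kmer map can still be dumped by hash, while `.dump()` fails early with a clear message instead of returning partial output.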

Should we support sorting output on either kmer or count?

Adamtaranto added the enhancement label on Sep 11, 2024
@Adamtaranto
Collaborator Author

Check out Pariter for parallel queries that preserve input / output order.

@ctb
Contributor

ctb commented Sep 12, 2024

(I've used rayon extensively - ref also #22)

@Adamtaranto
Collaborator Author

Just adding .dump_hash() for now until we have a solution for #21.

Default sort is on counts, then keys, with an option to sort on keys only.
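
The sort order described here can be sketched in a few lines of Python. The function name `sort_pairs` and the `sortkeys` flag are hypothetical, and ascending order is an assumption, since the comment does not state a direction.

```python
def sort_pairs(pairs, sortkeys=False):
    """Sort (hash, count) pairs.

    Default: by count, then by hash to break ties.
    sortkeys=True: by hash only.
    Ascending order is assumed in both cases.
    """
    if sortkeys:
        return sorted(pairs, key=lambda p: p[0])
    return sorted(pairs, key=lambda p: (p[1], p[0]))


pairs = [(30, 2), (10, 5), (20, 2)]
print(sort_pairs(pairs))                 # [(20, 2), (30, 2), (10, 5)]
print(sort_pairs(pairs, sortkeys=True))  # [(10, 5), (20, 2), (30, 2)]
```

Breaking ties on the hash makes the output deterministic, which keeps tests stable across runs.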

@ctb For writing output, should we:

  1. Take an outfile arg and handle the writing in Rust, or
  2. Yield output lines and handle the writing in Python?

Option 2 is easier to write tests for; option 1 is probably easier for users.
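
The two options are not mutually exclusive, as this rough Python sketch suggests. Everything here is an assumption for illustration: the names `iter_lines` and `dump_to_file`, the tab-separated line format, and doing the file writing in Python rather than Rust.

```python
import io


def iter_lines(pairs):
    """Option 2: yield one formatted line per (hash, count) pair."""
    for h, c in pairs:
        yield f"{h}\t{c}"


def dump_to_file(pairs, out):
    """Option 1, approximated: write the lines to an open text stream.

    Returns the number of lines written.
    """
    n = 0
    for line in iter_lines(pairs):
        out.write(line + "\n")
        n += 1
    return n


# Option 2 is easy to test without touching the filesystem.
print(list(iter_lines([(1, 2), (3, 4)])))

# Option 1 can wrap the same generator; StringIO stands in for a real file.
buf = io.StringIO()
dump_to_file([(1, 2), (3, 4)], buf)
print(buf.getvalue(), end="")
```

Layering the file-writing call on top of the line generator would keep the testable core separate from the user-facing convenience.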
