Add Count Thresholding #18

Adamtaranto · 2024-09-10T08:51:51Z

It is often useful to exclude low-abundance (erroneous) or high-abundance (repeat-associated) kmers from a count table.
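A minimal sketch of the idea, using only the Python standard library rather than oxli: count kmers with collections.Counter, then keep only those whose counts fall inside a band. The helper names count_kmers and filter_by_abundance are hypothetical, and the sketch ignores canonical (reverse-complement) kmers, which a real count table would typically handle.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every length-k substring of each sequence (no canonicalization)."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def filter_by_abundance(counts, min_count=None, max_count=None):
    """Return a new Counter keeping kmers with min_count <= count <= max_count.

    Either bound may be None to leave that side of the band open.
    """
    kept = Counter()
    for kmer, n in counts.items():
        if min_count is not None and n < min_count:
            continue  # low abundance: likely sequencing error
        if max_count is not None and n > max_count:
            continue  # high abundance: likely repeat-associated
        kept[kmer] = n
    return kept


counts = count_kmers(["AAAAAA", "ACGTAC", "ACGTAC"], 3)
# AAA is seen 4 times, so a band of [2, 3] keeps only ACG, CGT, GTA and TAC.
print(filter_by_abundance(counts, min_count=2, max_count=3))
```

Returning a new Counter keeps the original counts intact; an in-place variant would instead delete the out-of-band entries.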

As a user, I'd expect a method called .min() to return all kmers with the minimum observed count, and .max() to return all kmers with the maximum observed count.
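One way to get these semantics, sketched on a plain dict mapping kmer to count rather than on an oxli table: find the extreme count first, then return every kmer that has it, so ties are kept. The function names kmers_at_min and kmers_at_max are hypothetical.

```python
def _kmers_at_extreme(counts, pick):
    """Return (extreme_count, sorted kmers with that count); pick is min or max."""
    if not counts:
        return None, []
    target = pick(counts.values())
    return target, sorted(k for k, n in counts.items() if n == target)


def kmers_at_min(counts):
    """All kmers observed exactly as often as the least-frequent kmer."""
    return _kmers_at_extreme(counts, min)


def kmers_at_max(counts):
    """All kmers observed exactly as often as the most-frequent kmer."""
    return _kmers_at_extreme(counts, max)


counts = {"AAA": 1, "ACG": 1, "CAT": 3, "GAC": 2}
print(kmers_at_min(counts))  # (1, ['AAA', 'ACG'])
print(kmers_at_max(counts))  # (3, ['CAT'])
```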

For thresholding at a chosen cutoff value, maybe something like .mincut() and .maxcut()?

Suggested use:

```python
import oxli

table = oxli.KmerCountTable(3)  # count 3-mers
kmers = ["AAA", "GGG", "GGG"]

for kmer in kmers:
    table.count(kmer)

# Proposed: drop every kmer seen fewer than 2 times
table.mincut(2)
# >> "Dropped 1 hash with fewer than 2 counts."

table.get("AAA")
# >> 0

table.get("GGG")
# >> 2
```
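To pin down the proposed behaviour, here is a pure-Python stand-in that reproduces the suggested use. ToyKmerCountTable is hypothetical and is not oxli's API: it stores kmer strings in a dict (the message in the suggested use implies oxli works with hashes), and it assumes mincut(n) drops entries with fewer than n counts, maxcut(n) drops entries with more than n counts, and both print a summary and return the number dropped.

```python
class ToyKmerCountTable:
    """Hypothetical dict-backed stand-in used only to illustrate mincut/maxcut."""

    def __init__(self, ksize):
        self.ksize = ksize
        self._counts = {}

    def count(self, kmer):
        """Increment the count for kmer and return the new count."""
        if len(kmer) != self.ksize:
            raise ValueError(f"expected a {self.ksize}-mer, got {kmer!r}")
        self._counts[kmer] = self._counts.get(kmer, 0) + 1
        return self._counts[kmer]

    def get(self, kmer):
        # Missing kmers report a count of 0, as in the suggested use.
        return self._counts.get(kmer, 0)

    def _drop(self, should_drop):
        """Delete every entry whose count satisfies should_drop; return how many."""
        doomed = [k for k, n in self._counts.items() if should_drop(n)]
        for k in doomed:
            del self._counts[k]
        return len(doomed)

    @staticmethod
    def _noun(n):
        return "kmer" if n == 1 else "kmers"

    def mincut(self, min_count):
        """Drop every kmer with fewer than min_count counts."""
        dropped = self._drop(lambda n: n < min_count)
        print(f"Dropped {dropped} {self._noun(dropped)} with fewer than {min_count} counts.")
        return dropped

    def maxcut(self, max_count):
        """Drop every kmer with more than max_count counts."""
        dropped = self._drop(lambda n: n > max_count)
        print(f"Dropped {dropped} {self._noun(dropped)} with more than {max_count} counts.")
        return dropped


table = ToyKmerCountTable(3)
for kmer in ["AAA", "GGG", "GGG"]:
    table.count(kmer)

table.mincut(2)          # prints: Dropped 1 kmer with fewer than 2 counts.
print(table.get("AAA"))  # 0
print(table.get("GGG"))  # 2
```

Returning the number of dropped entries, in addition to printing it, lets callers check the effect of a cutoff programmatically instead of parsing the message.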

@ctb?


Adamtaranto added the enhancement (New feature or request) label Sep 10, 2024

Adamtaranto self-assigned this Sep 10, 2024

Adamtaranto mentioned this issue Sep 14, 2024 in Removing kmer records #28 (Merged)

Adamtaranto closed this as completed in #28 Sep 18, 2024