--full-matrix doesn't work with --sparse #32

Closed
apcamargo opened this issue Jul 1, 2024 · 4 comments

Comments

@apcamargo

I'm experimenting with skani to cluster plasmid/virus genomes at scale (see here). To get diagonal values I started using --full-matrix, but I noticed that those values are not showing up when using --sparse. Is this intentional?

It would also be useful to have a parameter that enables the diagonal in the output. I understand why this is not enabled by default (#31), but there are cases where it helps to have it. For example:

curl -L https://ccb-microbe.cs.uni-saarland.de/plsdb/plasmids/download/plsdb.fna.bz2 \
    | seqkit seq --only-id --upper-case \
    > plsdb.fna

skani triangle -t 16 --sparse -i -m 150 -c 30 -s 70 plsdb.fna > skani_output.tsv

awk 'NR>1 && $3>=95 && ($4>=85 || $5>=85) {
    printf("%s\t%s\t%.4f\n", $6, $7, $3 * ($4 > $5 ? $4 : $5) / 10000)
}' skani_output.tsv > edges.tsv

pyleiden edges.tsv clusters.txt

In the example above, a sequence whose ANI and AF do not pass the thresholds against any other sequence in the FASTA file produces no edge, so it is absent from the network and missing from the clustering input (edges.tsv). Of course there are other ways of including such sequences, but having the diagonal values (in this case, with --sparse) would make things easier.
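One of those "other ways" could look roughly like the sketch below: append a self-edge for every FASTA record that has no edge in edges.tsv. The function name add_singletons and the output file name are hypothetical, the weight of 1.0000 is an assumption chosen to mirror a 100% ANI / 100% AF diagonal entry, and whether pyleiden accepts self-loops is also an assumption.

```shell
# Hypothetical helper (not part of skani or pyleiden): print all existing
# edges, then a self-edge with weight 1.0000 for each FASTA record whose ID
# does not appear in any edge.
add_singletons() {
    fasta=$1
    edges=$2
    awk -F'\t' '
        # Edges file: pass each line through and remember both endpoints.
        FILENAME == ARGV[1] { seen[$1] = 1; seen[$2] = 1; print; next }
        # FASTA file: the ID is the header text up to the first blank.
        /^>/ {
            split(substr($0, 2), parts, /[ \t]/)
            id = parts[1]
            if (!(id in seen)) printf("%s\t%s\t%.4f\n", id, id, 1)
        }
    ' "$edges" "$fasta"
}
```

It would be called as `add_singletons plsdb.fna edges.tsv > edges_with_singletons.tsv` before running pyleiden on the new file.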

@bluenote-1577
Owner

Hi Antonio,

Thanks for reaching out.

It is intentional that --sparse does not output diagonal values (i.e. 100% ANI and 100% AF). The main reason is backwards compatibility with mash triangle -E, the analogous command. I recognize that this can make building a graph harder, though.

I think I can implement a --diagonal option or something like that, to give a diagonal both when --sparse is enabled and when it isn't. I'll get back to you.

Thanks,
Jim

@apcamargo
Author

Ohh, ok. I assumed that --full-matrix --sparse would provide the diagonals because using --full-matrix on its own does that. I didn't know about Mash's behaviour.

I guess that an additional parameter to enable the diagonal would solve this.

@bluenote-1577
Owner

I released v0.2.2, and you can now specify --diagonal to include diagonal entries (even with --sparse). CC @fplazaonate
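For reference, the pipeline from the original post might then be adapted as sketched below. The awk filter is unchanged and only wrapped in a function (skani_to_edges is a hypothetical name). The sketch assumes that diagonal rows are reported with 100% ANI and 100% AF, as described earlier in the thread, so that they pass the thresholds and become self-edges with weight 1.0000.

```shell
# Hypothetical wrapper around the awk filter from the original post.
# Reads skani's sparse output and prints "name1<TAB>name2<TAB>weight".
skani_to_edges() {
    awk 'NR>1 && $3>=95 && ($4>=85 || $5>=85) {
        printf("%s\t%s\t%.4f\n", $6, $7, $3 * ($4 > $5 ? $4 : $5) / 10000)
    }' "$1"
}

# Intended usage (requires skani >= 0.2.2 and pyleiden):
#   skani triangle -t 16 --sparse --diagonal -i -m 150 -c 30 -s 70 plsdb.fna > skani_output.tsv
#   skani_to_edges skani_output.tsv > edges.tsv
#   pyleiden edges.tsv clusters.txt
```

With the diagonal included, a sequence with no qualifying neighbour would still contribute one row to edges.tsv, so no separate singleton step should be needed.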

@apcamargo
Author

Thanks, Jim!
