Limit to reported alignments? #13

rwhetten · 2022-09-26T14:32:50Z

The latest version indexed a 22.5 Gb genome (1.75 million scaffolds) in 32 min using 16 cores and 99 GB RAM, and aligned a file of 51,751 proteins to the index in 31 min using 16 cores and 42 GB RAM. Thanks to @lh3 for the quick fixes! The output GFF file reports multiple alignment positions for many proteins, which is expected given the abundance of pseudogenes in this assembly. However, the distribution of the number of alignment positions per protein appears to be truncated at 51: there are 2513 proteins with 51 reported alignment positions, and none with more. Is this the expected behavior? In this assembly, it would not be unreasonable to see hundreds of alignment positions for some proteins.

lh3 · 2022-09-26T16:00:30Z

Glad to know miniprot works on your 22 Gb fragmented assembly in reasonable time. Thanks for testing!

If you want to see more alignments, increase both -N and --outn to something like:

miniprot -N 1000 --outn=1000

-N controls how many hits miniprot evaluates internally. Increasing its value will make miniprot run slower. --outn controls how many hits to output. It doesn't affect performance much.
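To check whether the output is still capped after raising these limits, one could tally the number of reported alignments per protein in the GFF file. The following is only a minimal sketch, not part of miniprot: it assumes each alignment appears as one tab-separated `mRNA` record whose ninth column carries a `Target=<protein> <start> <end>` attribute, and the function names `hits_per_protein` and `hit_count_histogram` are made up for illustration.

```python
import sys
from collections import Counter


def hits_per_protein(gff_lines):
    """Count mRNA records per protein, taking the name from the Target attribute.

    Assumption: one mRNA record per reported alignment, with an attribute of
    the form 'Target=<protein> <start> <end>' in column 9.
    """
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "mRNA":
            continue  # ignore CDS and other feature types
        for attr in fields[8].split(";"):
            if attr.startswith("Target="):
                # Keep only the protein name, dropping the coordinates.
                counts[attr[len("Target="):].split()[0]] += 1
                break
    return counts


def hit_count_histogram(counts):
    """Map 'number of hits' to 'number of proteins with that many hits'."""
    return Counter(counts.values())


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        hist = hit_count_histogram(hits_per_protein(fh))
    for n_hits in sorted(hist):
        print(n_hits, hist[n_hits], sep="\t")
```

If many proteins pile up at the largest hit count in the printed table, the output limit is probably still being reached.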

lh3 closed this as completed Sep 26, 2022

lh3 added the question label Sep 26, 2022

lh3 mentioned this issue Sep 26, 2022

Indexing is much slower on fragmented assemblies #10

Closed

lh3 added a commit that referenced this issue Sep 27, 2022

r146: changed default --outn to 1000

f076cbf

It doesn't hurt to have a large default value. Related to #13.
