Is there a way to reduce memory requirement? #258
Comments
How about splitting the database into two parts and running kaiju once for each part? Then merge both output files using kaiju-mergeOutputs with option -s, so that for each read the database match with the best score is kept; see the README, also about sorting the files before merging them.
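For example, a minimal sketch of that workflow, assuming the custom database is built from a protein FASTA as described in the README. File names, thread counts, and the splitting step are placeholders; check the README for the exact flags:

```bash
# Assumes proteins.faa has already been split into two halves,
# e.g. proteins_part1.faa and proteins_part2.faa (placeholder names;
# any FASTA-aware splitter such as seqkit works).

# Build a separate FM-index for each half, following the
# custom-database steps in the README.
for i in 1 2; do
  kaiju-mkbwt -n 8 -a ACDEFGHIKLMNPQRSTVWY -o part$i proteins_part$i.faa
  kaiju-mkfmi part$i
done

# Run kaiju once per half; -v adds the score column that
# kaiju-mergeOutputs needs to pick the best match.
for i in 1 2; do
  kaiju -v -z 8 -t nodes.dmp -f part$i.fmi -i reads.fastq -o kaiju_part$i.out
done

# Sort both outputs by read name (column 2), then merge, keeping
# the match with the best score for each read (option -s).
kaiju-mergeOutputs -s \
  -i <(sort -k2,2 kaiju_part1.out) \
  -j <(sort -k2,2 kaiju_part2.out) \
  -o kaiju_merged.out
```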
I think it's worth trying. Thank you so much!
I tried it and it worked! Thank you so much! There are minor differences between the two methods, but I can live with that. I suggest including this strategy as a new feature of kaiju, where kaiju automatically detects the available memory and splits the database into a number of chunks suitable for that memory, so that everything runs automatically behind the scenes. This feature would not only allow using kaiju on systems with less memory, but also help the many users who may not be able to do these steps manually.
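To illustrate, here is a rough, purely hypothetical sketch of what such a wrapper could do before building the per-chunk indexes. Nothing below exists in kaiju, and the memory rule of thumb is an assumption that would need calibration:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper sketch: estimate how many database chunks
# fit into available memory (Linux-only; reads /proc/meminfo).
set -euo pipefail

DB_FASTA=proteins.faa          # placeholder database FASTA
MEM_AVAIL_KB=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
DB_BYTES=$(stat -c %s "$DB_FASTA")

# Rule of thumb (an assumption, calibrate against real data): the
# in-memory index needs roughly the size of the input FASTA, and we
# leave 20% headroom for the rest of the process.
USABLE_BYTES=$(( MEM_AVAIL_KB * 1024 * 8 / 10 ))
CHUNKS=$(( (DB_BYTES + USABLE_BYTES - 1) / USABLE_BYTES ))  # ceiling division

echo "Available memory: $(( MEM_AVAIL_KB / 1024 )) MiB"
echo "Database size:    $(( DB_BYTES / 1024 / 1024 )) MiB"
echo "Would split into $CHUNKS chunk(s), then build, run, and merge per chunk."
```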
I am not sure what performance impact this would have, but using LMDB instead of keeping the whole database in RAM may not cost that much, especially with a fast scratch disk: http://www.lmdb.tech/doc/
I would like to thank you so much for this wonderful tool; I really like it. However, I would appreciate a way to reduce its memory requirement. I used UniRef100 as a custom database. I tested it briefly on a system with 256G of RAM; it ran successfully, but required roughly 230G of RAM. So when I submitted it to compute nodes, which have 128G each, it of course failed.
mmseqs2 reduces its memory requirement by splitting the database, but it is much slower, and I would prefer to keep using kaiju in my case with only 128G. Do you have any tips or suggestions?
Many thanks!