Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more tools from Galaxy Australia TPV config #69

Merged
merged 2 commits into from
Sep 23, 2024

Conversation

cat-bro
Copy link
Collaborator

@cat-bro cat-bro commented Sep 13, 2024

Some tools that have been added to AU's TPV config in the last year or so

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @cat-bro!

I'm not sure how to review those properly. I hope our cgroups2 setup is soon back and we can contribute numbers.

I tried to flag a few tools that IMHO would waste a lot of resources in the happy-case and where a size-based rule might be good to have.

Thanks again!

tools.yml Outdated
@@ -293,6 +305,8 @@ tools:
mem: 12
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_multijoin_tool/.*:
mem: 8
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sort_header_tool/.*:
mem: 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like way too much for the general case. The tool is not doing much afaik.
Maybe we should include here a rule?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for reviewing! This rarely uses above the default 3.8GB and only when the inputs are large, so I'll add a rule here.

tools.yml Outdated
@@ -399,6 +421,8 @@ tools:
toolshed.g2.bx.psu.edu/repos/devteam/cuffquant/cuffquant/.*:
cores: 6
mem: 20
toolshed.g2.bx.psu.edu/repos/devteam/emboss_5/EMBOSS.*:
mem: 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks also very generous. Most tools will not need more than a few 100 MB for normal files sizes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For needle this needs more than 3.8GB fairly regularly for smallish inputs so I'll target this tool ID on its own. As a general rule it will only need extra memory for large input sizes.

@@ -618,10 +664,15 @@ tools:
toolshed.g2.bx.psu.edu/repos/galaxyp/eggnog_mapper/eggnog_mapper/.*:
cores: 8
mem: 12
toolshed.g2.bx.psu.edu/repos/galaxyp/eggnog_mapper/eggnog_mapper_search/.*:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this in place.

  toolshed.g2.bx.psu.edu/repos/galaxyp/eggnog_mapper/eggnog_mapper/.*:
    cores: 8
    mem: 24
    env:
      EGGNOG_DBMEM: "--dbmem"
    params:
      singularity_run_extra_arguments: "--env EGGNOG_DBMEM=--dbmem"
    scheduling:
      require:
        - singularity

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bgruening ! That's a separate tool ID and I'll try this out on Australia. eggnog_mapper was very popular a year ago and is still used regularly. Is there a difference between {params: {singularity_run_extra_arguments: "--env EGGNOG_DBMEM=--dbmem"}} and {env: {SINGULARITYENV_EGGNOG_DBMEM: dbmem}}? We have been using environment variables prefixed with SINGULARITYENV_ and right now I can't remember where I got that from.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the eggnog mapper readme Admins can choose to set the environment variable ``EGGNOG_DBMEM=--dbmem`` which will copy the complete EggNOG annotation DB into memory which is usually much faster than using multiple cores (but needs approx. 37GB of RAM).

tools.yml Outdated
@@ -1090,6 +1154,8 @@ tools:
cores: 7
toolshed.g2.bx.psu.edu/repos/iuc/bcftools_norm/bcftools_norm/.*:
cores: 2
toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools.*:
mem: 12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bedtools are used a lot and most of the time 4GB is enough as it seems?

Maybe a rule here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This varies a lot between tools. genomecoveragebed regularly exceeds 3.8 with no input, intersectbed is known to need lots. I might leave this out for now and come back with tool-by-tool estimates.

tools.yml Outdated
@@ -1265,8 +1353,13 @@ tools:
toolshed.g2.bx.psu.edu/repos/iuc/gemini_load/gemini_load/.*:
cores: 10
mem: 40
toolshed.g2.bx.psu.edu/repos/iuc/getorganelle/get_organelle_from_reads/.*:
mem: 70
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks strange, but its just a gut feeling :(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is also hard to generalise. For inputs less that 0.8GB we might be safe with 24GB but it's a steep increase after that. We have jobs running out of memory on Galaxy Australia with 4GB of input - there must be other variables at play. Jobs that try to use too much memory are killed, so we're better off with the local rule in this case.

@cat-bro cat-bro force-pushed the update_tools_from_au branch from 18689c3 to 7e34546 Compare September 18, 2024 05:22
Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cat-bro

@cat-bro cat-bro merged commit dba3405 into galaxyproject:main Sep 23, 2024
1 check passed
@cat-bro cat-bro deleted the update_tools_from_au branch September 23, 2024 02:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants