fix: lower required threads for annotation rules #1528

Merged
3 commits merged into develop from lower_required_threads on Feb 14, 2025

Conversation

@mathiasbio (Collaborator) commented Feb 7, 2025

Description

The cluster has been quite clogged in early 2025, and @Karl-Svard in prodbioinfo notified me about a BALSAMIC job, cadd_annotate_somaticINDEL_research, that was hardly using any threads or memory on its node according to Ganglia, yet it requested 36 cores and booked the whole node. This rule does not require anywhere near that amount of resources, and the cluster could be freed up a bit if we lowered it.

I then looked at other similar rules and found a few bcftools commands that were also run on a whole node. This should not be necessary at all, since the VCFs are rarely even on the scale of 1 GB.
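
To make the change concrete, here is a minimal sketch (not the actual BALSAMIC diff; paths and the shell command are simplified assumptions, only the rule name comes from the benchmark tables below) of capping the threads for one of the bcftools rules instead of booking a whole node:

```python
# Hypothetical Snakemake sketch of the thread change; input/output paths and
# the command are placeholders, only the rule name matches the benchmarks below.
rule bcftools_get_somaticINDEL_research:
    input:
        vcf="vcf/SNV.somatic.{case_name}.vcf.gz",
    output:
        vcf="vcf/SNV.somatic.{case_name}.indel.vcf.gz",
    threads: 4  # previously 36, i.e. a whole node, for VCFs well under 1 GB
    shell:
        "bcftools view --threads {threads} --types indels "
        "-O z -o {output.vcf} {input.vcf}"
```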

On top of this, the benchmark files specified in these rules shared the same name and would overwrite each other, meaning that we could not track the benchmarks of the individual rules.
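
A similar sketch of the benchmark-naming fix, assuming each rule now writes a benchmark file that contains its own rule name so the files no longer collide (rule names are taken from the tables below; paths and commands are placeholders):

```python
# Hypothetical sketch: one benchmark file per rule and case instead of a shared
# name that gets overwritten. Output paths and shell commands are placeholders.
rule vep_annotate_somaticSNV_research:
    output:
        "vcf/SNV.somatic.{case_name}.vep.vcf.gz",
    benchmark:
        # before (assumed): a benchmark path shared with other annotation rules
        "benchmarks/vep_annotate_somaticSNV_research.{case_name}.tsv"
    shell:
        "touch {output}"  # placeholder command

rule vcfanno_annotate_somaticSNV_clinical:
    output:
        "vcf/SNV.somatic.{case_name}.vcfanno.vcf.gz",
    benchmark:
        "benchmarks/vcfanno_annotate_somaticSNV_clinical.{case_name}.tsv"
    shell:
        "touch {output}"  # placeholder command
```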

Link issue: #1529

Changed

  • lowered threads for bcftools and CADD rules

Fixed

  • changed name of benchmark files for annotation rules to avoid name conflicts

Documentation

  • N/A
  • Updated Balsamic documentation to reflect the changes as needed for this PR.
    • [Document Name]

Tests

Feature Tests

Run a WGS TN case and check the benchmark files for the adjusted rules to see if the threads match the memory requirements

  • The threads seem reasonable given the memory usage

See the table below, compiled from the benchmark files for the adjusted rules:

| benchmark file | s | h:m:s | max_rss (MB) | max_vms (MB) | max_uss (MB) | max_pss (MB) | io_in | io_out | mean_load | cpu_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bcftools_get_somaticINDEL_research.wholewhale.tnscope.tsv | 2.7428 | 0:00:02 | 10.15 | 837.4 | 11.05 | 11.05 | 4.97 | 0 | 22.59 | 0.68 |
| cadd_annotate_somaticINDEL_research.wholewhale.tnscope.tsv | 1065.6277 | 0:17:45 | 9133.37 | 9404.01 | 9126.89 | 9128.77 | 47.1 | 3.14 | 76.51 | 834.99 |
| vep_annotate_somaticSNV_research.wholewhale.tnscope.tsv | 2554.3209 | 0:42:34 | 907.97 | 2748.82 | 429.1 | 486.2 | 14.64 | 0 | 526.06 | 13470.09 |
| vcfanno_annotate_somaticSNV_clinical.wholewhale.tnscope.tsv | 141.7593 | 0:02:21 | 175.46 | 2681.86 | 175.58 | 175.58 | 7.34 | 0 | 343.71 | 487.29 |

All rules except cadd_annotate used less than 90 MB of memory. The cadd_annotate rule used about 9.1 GB, which should be fine with 4 threads since the nodes have about 190 GB of memory available. Filling one of these nodes with such jobs (on a 36-core node, 36 / 4 = 9 jobs) would only use about 82 GB of RAM, which is far from causing memory issues.
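
For reference, a small sketch of that arithmetic done directly against a Snakemake benchmark file (the benchmarks/ directory is an assumption; the file name is taken from the table above, and the memory columns are reported in MB):

```python
# Rough sanity check of the node-filling arithmetic: read max_rss from the
# CADD benchmark TSV and estimate memory use if a 36-core node is filled with
# 4-thread jobs. The benchmarks/ path is an assumption for illustration.
import pandas as pd

NODE_CORES = 36
NODE_MEM_GB = 190
THREADS_PER_JOB = 4

bench = pd.read_csv(
    "benchmarks/cadd_annotate_somaticINDEL_research.wholewhale.tnscope.tsv",
    sep="\t",
)
max_rss_gb = bench["max_rss"].max() / 1024  # ~9 GB for the CADD rule

jobs_per_node = NODE_CORES // THREADS_PER_JOB  # 36 / 4 = 9 jobs
total_mem_gb = jobs_per_node * max_rss_gb      # ~9 x 9 GB, around 80 GB

print(
    f"{jobs_per_node} jobs x {max_rss_gb:.1f} GB = {total_mem_gb:.0f} GB "
    f"of the {NODE_MEM_GB} GB available per node"
)
```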

However, this was a WGS TN case, which, due to filtering against the normal, has far fewer variants than a WGS TO case. So let's compare it to a WGS TO case as well; I would guess that we might need to increase the thread count from 4 to some higher number to be safe. But let's compare...

Run a WGS TO case and check the benchmark files for the adjusted rules to see if the threads match the memory requirements

  • The threads seem reasonable given the memory usage

| benchmark file | s | h:m:s | max_rss (MB) | max_vms (MB) | max_uss (MB) | max_pss (MB) | io_in | io_out | mean_load | cpu_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bcftools_get_somaticINDEL_research.civilsole.tnscope.tsv | 19.0488 | 0:00:19 | 23.74 | 905.05 | 23.79 | 23.79 | 4.98 | 0 | 75.55 | 14.43 |
| cadd_annotate_somaticINDEL_research.civilsole.tnscope.tsv | 23004.0995 | 6:23:24 | 1846.12 | 2109.49 | 1839.57 | 1841.44 | 66.27 | 575.4 | 102.91 | 23715.73 |
| vep_annotate_somaticSNV_research.civilsole.tnscope.tsv | 5162.9602 | 1:26:02 | 5246.64 | 5540.95 | 3939.81 | 4069.17 | 14.93 | 0 | 713.93 | 37300.1 |
| vcfanno_annotate_somaticSNV_clinical.civilsole.tnscope.tsv | 173.7455 | 0:02:53 | 585.99 | 3606.61 | 586.04 | 586.04 | 7.79 | 0 | 642.36 | 1116.13 |

Comparing to the TO case civilsole, we actually see even lower memory usage.

We don't have the previous run's benchmark file to compare against, as it was overwritten because the benchmark files shared the same name. But we can look at the log file from that rule to see how long it took to finish, and whether supplying 36 cores actually sped up the step.

It started at 19:24:05 and finished at 02:30:32, so it took roughly 7 hours. In comparison, this run, which used 4 cores, took about 6.5 hours, so I think we can conclude that this step does not benefit from additional cores, and there does not seem to be a risk of memory issues from this either.
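
For completeness, the elapsed time from those log timestamps (only the clock times are from the log; the dates are assumed, chosen so the end falls on the following day):

```python
# Sketch: wall time between the start and end timestamps quoted above.
# The job crossed midnight, so the end timestamp is placed on the next day.
from datetime import datetime

start = datetime(2025, 2, 6, 19, 24, 5)  # 19:24:05, date assumed
end = datetime(2025, 2, 7, 2, 30, 32)    # 02:30:32 the next day, date assumed
print(end - start)  # 7:06:27 -> roughly 7 hours on 36 cores vs ~6.5 hours on 4
```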

Pipeline Integrity Tests

  • Report deliver (generation of the .hk file)
    • N/A
    • Verified
  • TGA T/O Workflow
    • N/A
    • Verified
  • TGA T/N Workflow
    • N/A
    • Verified
  • UMI T/O Workflow
    • N/A
    • Verified
  • UMI T/N Workflow
    • N/A
    • Verified
  • WGS T/O Workflow
    • N/A
    • Verified
  • WGS T/N Workflow
    • N/A
    • Verified
  • QC Workflow
    • N/A
    • Verified
  • PON Workflow
    • N/A
    • Verified

Clinical Genomics Stockholm

Documentation

  • Atlas documentation
    • N/A
    • Updated: [Link]
  • Web portal for Clinical Genomics
    • N/A
    • Updated: [Link]

Panel of Normal specific criteria

User Changes

  • N/A
  • This PR affects the output files or results.
    • User feedback is considered unnecessary because [Justification].
    • Affected users have been included in the development process and given a chance to provide feedback.

Infrastructure Changes

  • Stored files in Housekeeper
    • N/A
    • Updated: [Link]
  • CG (CLI and delivered/uploaded files)
    • N/A
    • Updated: [Link]
  • Servers (configuration files on Hasta)
    • N/A
    • Updated: [Link]
  • Scout interface
    • N/A
    • Updated: [Link]

Validation criteria

Validation criteria to be added to validation report PR: [LINK-TO-VALIDATION-REPORT-PR from the validations repository]

Version specific criteria

  • Text here or N/A

Important

One of the validation checkboxes below needs to be checked

  • Added version specific validation criteria to validation report
  • Changes validated in standard sections: [validation-section]
  • Validation criteria not necessary

Checklist

Important

Ensure that all checkboxes below are ticked before merging.

For Developers

  • PR Description
    • Provided a comprehensive description of the PR.
    • Linked relevant user stories or issues to the PR.
  • Documentation
    • Verified and updated documentation if necessary.
  • Validation criteria
    • Completed the validation criteria section of the template.
  • Tests
    • Described and tested the functionality addressed in the PR.
    • Ensured integration of the new code with existing workflows.
    • Confirmed that meaningful unit tests were added for the changes introduced.
    • Checked that the PR has successfully passed all relevant code smells and coverage checks.
  • Review
    • Addressed and resolved all the feedback provided during the code review process.
    • Obtained final approval from designated reviewers.

For Reviewers

  • Code
    • Code implements the intended features or fixes the reported issue.
    • Code follows the project's coding standards and style guide.
  • Documentation
    • Pipeline changes are well-documented in the CHANGELOG and relevant documentation.
  • Validation criteria
    • The author has completed the validation criteria section of the template
  • Tests
    • The author provided a description of their manual testing, including consideration of edge cases and boundary
      conditions where applicable, with satisfactory results.
  • Review
    • Confirmed that the developer has addressed all the comments during the code review.

@mathiasbio changed the base branch from master to develop on February 7, 2025 at 15:01

codecov bot commented Feb 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.45%. Comparing base (7d529e6) to head (72acfbe).
Report is 46 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1528      +/-   ##
===========================================
- Coverage    99.48%   99.45%   -0.03%     
===========================================
  Files           40       40              
  Lines         1932     2020      +88     
===========================================
+ Hits          1922     2009      +87     
- Misses          10       11       +1     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 99.45% <ø> (-0.03%) ⬇️ |

Flags with carried forward coverage won't be shown.


@mathiasbio linked an issue (3 tasks) on Feb 7, 2025 that may be closed by this pull request
@mathiasbio self-assigned this Feb 7, 2025
@mathiasbio added this to the Release 17 milestone Feb 7, 2025
@mathiasbio marked this pull request as ready for review February 7, 2025 15:45
@mathiasbio requested a review from a team as a code owner February 7, 2025 15:45
@fevac (Contributor) left a comment:

This is great. I think optimisation should be done for all rules, but it's never a priority, so fixing at least these is great.

@mathiasbio merged commit 22323f9 into develop Feb 14, 2025
9 checks passed
@mathiasbio deleted the lower_required_threads branch February 14, 2025 09:26
Labels: None yet
Projects: Status: Completed
Development: Successfully merging this pull request may close these issues:

  • [Maintenance] update threads and benchmark files for annotation rules

2 participants