-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory improvement when writing missing positions to pet #7098
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
there are actually 2 files that only have whitespace changes. they don't need to be in this PR
@@ -168,7 +168,7 @@ task CreateImportTsvs { | |||
--mode GENOMES \ | |||
-SNM ~{sample_map} \ | |||
--ref-version 38 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove this class from the PR - only whitespace changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great -- just small nits
public static List<List<String>> createMissingTSV(long start, long end, String sampleName) { | ||
List<List<String>> rows = new ArrayList<>(); | ||
|
||
public void createMissingTSV(long start, long end, String sampleName) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to writeMissingEntry
or something else with write
?
public static List<List<String>> createMissingTSV(long start, long end, String sampleName) { | ||
List<List<String>> rows = new ArrayList<>(); | ||
|
||
public void createMissingTSV(long start, long end, String sampleName) throws IOException { | ||
for (long position = start; position <= end; position ++){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit -- remove space on postincrement operator (ie position++
)
* WIP memory improvements - only do one loop * move log to end of loop * reduce memory/cpus for ImportGenomes, re-enable non-TSV outputs * just kidding, let's not change the wdl yet * remove redundant outputType inputs * remove debugging log * rename createMissingTSV to writeMissingPositions, minor edits * revert whitespace-only changes
* WIP memory improvements - only do one loop * move log to end of loop * reduce memory/cpus for ImportGenomes, re-enable non-TSV outputs * just kidding, let's not change the wdl yet * remove redundant outputType inputs * remove debugging log * rename createMissingTSV to writeMissingPositions, minor edits * revert whitespace-only changes
related to issue #211 and #233 - in CreateVariantIngestFiles, when writing missing positions to the pet tsv, we were looping through blocks of missing intervals twice, once to construct the pet rows, holding them all in memory, and again to write the pet rows. in this PR, we merge those steps into one: for each missing block, we write the pet lines to file as we loop through, avoiding the need to hold large blocks in memory.
note that this does not resolve the memory issues that originally prompted issues #211 and #233, but it's nonetheless a minor improvement that helps immensely when there are large numbers of missing locations.