more scaffolding updates #511
…multi-segment genome are recovered
@@ -79,7 +70,7 @@ workflow scaffold_and_refine_multitaxa {
   "assembly_length_unambiguous" : refine.assembly_length_unambiguous,
   "reads_aligned" : refine.align_to_self_merged_reads_aligned,
   "mean_coverage" : refine.align_to_self_merged_mean_coverage,
- "percent_reference_covered" : 1.0 * refine.assembly_length_unambiguous / refine.reference_genome_length,
+ "percent_reference_covered" : select_first([percent_reference_covered, 0.0]),
It may be nice to break out `tax_name` and `percent_reference_covered` for the "top" viral assembly into separate workflow outputs, for easier search and filtering on Terra (where "top" could be defined as the most complete assembly, or the most abundant taxon in terms of # of reads or # of matching distinct k-mers).
Added to the TODO comments at the bottom of the WDL. I think this will require a small bespoke TSV-parsing task. It will also need to be resilient to the empty-output scenario (i.e., there is no top assembly because none were attempted or none were successful).
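As a rough illustration of the logic such a task might wrap in its command block, here is a minimal Python sketch. It assumes a tab-separated table with a header row containing `tax_name` and `percent_reference_covered` columns (the actual file layout is not shown in this PR), and it takes "top" to mean the most complete assembly, which is only one of the definitions suggested above. The function name `pick_top_assembly` is hypothetical.

```python
import csv
import io


def pick_top_assembly(tsv_text):
    """Return (tax_name, percent_reference_covered) for the most complete assembly.

    Hypothetical helper: assumes a header row with `tax_name` and
    `percent_reference_covered` columns. Returns ("", 0.0) when there are
    no data rows or no assembly covers any of its reference.
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    best_name, best_pct = "", 0.0
    for row in rows:
        try:
            # Missing or empty cells are treated as zero coverage.
            pct = float(row.get("percent_reference_covered") or 0.0)
        except ValueError:
            continue  # skip rows whose coverage value is not numeric
        if pct > best_pct:
            best_name, best_pct = row.get("tax_name") or "", pct
    return best_name, best_pct
```

Returning an empty name and `0.0` for the empty case keeps the output types fixed, so the workflow outputs could stay non-optional; whether that is preferable to optional outputs is a design choice left open here.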
Int num_read_groups = refine.num_read_groups[0]
Int num_libraries = refine.num_libraries[0]
Array[Map[String,String]] assembly_stats_by_taxon = stats_by_taxon
Any reason we can't make this type `Map[ String, Map[String,String] ]`, where the outer map `String` keys are the `taxid` or `tax_name` values? (for picking out values for a given taxon in downstream analyses)
Mostly just because of how we construct it (see the scatter in the WDL above), and that WDL 1.0 lacks a lot of the basic methods for navigating `Map`s and converting back and forth with `Array`s.
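Until the workflow can emit a keyed map directly, a downstream consumer could re-key the array itself. The following Python sketch shows one way to do that. It assumes each per-taxon map carries a `taxid` (or `tax_name`) entry, which this PR does not confirm, and the function name `key_by_taxon` is hypothetical.

```python
def key_by_taxon(stats_by_taxon, key="taxid"):
    """Convert a list of per-taxon stat maps into a map keyed by taxon.

    Assumption: every map contains the `key` field. Raises instead of
    silently overwriting when a map lacks it or two maps share a value.
    """
    keyed = {}
    for stats in stats_by_taxon:
        if key not in stats:
            raise KeyError(f"entry has no '{key}' field: {stats}")
        taxon = stats[key]
        if taxon in keyed:
            raise ValueError(f"duplicate {key} value: {taxon}")
        keyed[taxon] = stats
    return keyed
```

An empty input list yields an empty map, which matches the case where no assemblies were attempted.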