Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scaffolding regression fixes plus docker updates #547

Merged
merged 18 commits into from
Aug 4, 2024
Merged

Conversation

dpark01
Copy link
Member

@dpark01 dpark01 commented Aug 2, 2024

This PR:

  1. fixes a bug [bug] assemble_denovo fails at larger ANI distance #543 introduced recently that causes different/unintended failure modes in assemble_denovo on example exercises. This PR now has the scaffold task fallback to old brute-force reference selection if ANI-based reference selection fails to find any matches at all. This will work fine (same as before) in most historically-normal use cases, but will not behave well if given a very large array of reference genomes to choose from. (this change does not impact the behavior of scaffold_and_refine_multitaxa which does not pass multiple references to the scaffold task anyway.
  2. Updates viral-core and viral-assemble docker images to latest (2.3.2 and 2.3.2.0)
  3. Updates cromwell, womtool, and dxCompiler versions for build
  4. minor cleanups to docbuild
  5. drops any direct use of util.file.zstd_open in favor of zstandard.open
  6. increase default RAM for reports.alignment_metrics
  7. bugfix some bed file sorting for samtools ampliconstats in the case of multi-segment targets -- also allow samtools ampliconstats to fail silently in the case that it's getting too picky about the bed file
  8. change the mafft workflow to use multi_align_mafft_ref instead of multi_align_mafft
  9. make the terra_tsv_to_table workflow resilient to non-existent TSVs in its input array, to simplify running it on a Terra table full of tsvs that may or may not exist
  10. Some initial WiP code for multi-species genbank prep, not far along yet.

@dpark01 dpark01 self-assigned this Aug 2, 2024
@dpark01 dpark01 requested a review from tomkinsc August 2, 2024 13:31
@dpark01 dpark01 enabled auto-merge August 2, 2024 13:31
@dpark01 dpark01 changed the title sacaffolding regression fixes plus docker updates scaffolding regression fixes plus docker updates Aug 2, 2024
@dpark01 dpark01 added this pull request to the merge queue Aug 4, 2024
~{true='--allow_incomplete_output' false="" allow_incomplete_output} \
--loglevel=DEBUG

cut -f 3 "~{sample_name}.refs_skani_dist.full.tsv" | tail +2 | head -1 > SKANI_ANI
Copy link
Member

@tomkinsc tomkinsc Aug 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 323-325 are bit repetitive with line 305 above, and it may be nice to also have these values in the log for debugging—what about something like:

chosen_ref_data_fields=($(sed -n '2{p;q;}' "~{sample_name}.refs_skani_dist.full.tsv"))

CHOSEN_REF_FASTA="${chosen_ref_data_fields[0]}"
tee <<< "${CHOSEN_REF_FASTA}" CHOSEN_REF
tee <<< "${CHOSEN_REF_FASTA%.*}" CHOSEN_REF_BASENAME

tee <<< "${chosen_ref_data_fields[2]}" SKANI_ANI
tee <<< "${chosen_ref_data_fields[3]}" SKANI_REF_AF
tee <<< "${chosen_ref_data_fields[4]}" SKANI_CONTIGS_AF

Merged via the queue into master with commit e28bec4 Aug 4, 2024
14 checks passed
@tomkinsc
Copy link
Member

tomkinsc commented Aug 4, 2024

The merge of this to master triggered a deployment to dnanexus, which is currently failing to compile via dxWDL:
https://github.com/broadinstitute/viral-pipelines/actions/runs/10237505135/job/28320935779
The issue seems to be with tasks_megablast.wdl. The checks in this PR didn't catch it because deployment to dx is currently only triggered after a merge into master. I'm looking into it and may change actions behavior so we actually test deployment to dx in PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants