
TSPS-142 updates to help creating simulated reference panel and running imputation against it #1296

Merged
merged 13 commits into TSPS-183_mma_beagle_imputation_hg38 from js_TSPS-142
Jun 10, 2024

Conversation

jsotobroad
Contributor

jsotobroad commented Jun 7, 2024

Description

Don't think we actually want to merge this yet, but this is the WDL I ran to test the simulated data through the Beagle pipelines. We should chat about how we want to pass the reference panel basename along with the actual reference panel.

Here is a run across all chromosomes using the 10k simulated reference panel. Note that an error count override of 0 was used.


Checklist

If you can answer "yes" to the following items, please add a checkmark next to the appropriate checklist item(s) and notify our WARP documentation team by tagging either @ekiernan or @kayleemathews in a comment on this PR.

  • Did you add inputs, outputs, or tasks to a workflow?
  • Did you modify, delete or move: file paths, file names, input names, output names, or task names?
  • If you made a changelog update, did you update the pipeline version number?

jsotobroad requested a review from mmorgantaylor June 7, 2024 20:27
@dsde-jenkins
Collaborator

Can one of the admins verify this patch?

Member

mmorgantaylor left a comment


some thoughts on specifying the ref panel path / file names

@@ -40,12 +42,12 @@ workflow ImputationBeagle {

scatter (contig_index in range(length(contigs))) {
# these are specific to hg38 - contig is format 'chr1'
- String reference_filename = reference_panel_path + "hgdp.tgp.gwaspy.merged." + contigs[contig_index] + ".merged.AN_added.bcf.ac2"
+ String reference_basename = reference_panel_path + "sim.10k." + contigs[contig_index]
Member


I think we should either pass in the basename as an input or fold it into the reference_panel_path input, so that this line can just be:
String reference_basename = reference_panel_path + contigs[contig_index]
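A minimal sketch of the "fold the basename into the path" option, assuming the reference_panel_path input now ends with the prefix itself (for example ".../hg38/simulated/sim.10k.", trailing dot included) instead of ending with "/". The workflow name ReferenceBasenameConcat and the default contig list are hypothetical; only the concatenation line comes from the suggestion above.

```wdl
version 1.0

# Hypothetical sketch: the basename prefix lives inside reference_panel_path,
# e.g. ".../hg38/simulated/sim.10k." (note the trailing dot), so each
# per-contig basename is a plain string concatenation.
workflow ReferenceBasenameConcat {
  input {
    String reference_panel_path
    # Assumed default; the real workflow supplies its own contig list
    Array[String] contigs = ["chr1", "chr2", "chr3"]
  }

  scatter (contig_index in range(length(contigs))) {
    # ".../sim.10k." + "chr1" -> ".../sim.10k.chr1"
    String reference_basename = reference_panel_path + contigs[contig_index]
  }

  output {
    # A declaration inside a scatter is gathered into an array outside it
    Array[String] reference_basenames = reference_basename
  }
}
```

The trade-off with this shape is that the input string must carry the separator before the contig: dropping the trailing dot would silently produce names like sim.10kchr1.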

Member


Alternatively, we could make reference_panel_path the full path, containing a placeholder string that [sub](https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-substring-string-string) (a regex substitution) replaces with the contig. The reference_panel_path input would then look like
https://lz8b0d07a4d28c13150a1a12.blob.core.windows.net/sc-94fd136b-4231-4e80-ab0c-76d8a2811066/hg38/simulated/sim.10k.<CONTIG>
and this line would become:
String reference_basename = sub(reference_panel_path, "<CONTIG>", contigs[contig_index])

I'm not sure which would be less confusing. The sub solution would let us keep the existing ref panel paths without renaming them.
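A minimal sketch of the sub-based alternative, assuming reference_panel_path is a full path template with a literal <CONTIG> placeholder, as in the example path above. The workflow name ReferenceBasenameSub and the default contig list are hypothetical; the sub call is the line proposed in the comment.

```wdl
version 1.0

# Hypothetical sketch: reference_panel_path is a full path template with a
# literal "<CONTIG>" placeholder, e.g. ".../hg38/simulated/sim.10k.<CONTIG>".
workflow ReferenceBasenameSub {
  input {
    String reference_panel_path
    # Assumed default; the real workflow supplies its own contig list
    Array[String] contigs = ["chr1", "chr2", "chr3"]
  }

  scatter (contig_index in range(length(contigs))) {
    # sub(input, pattern, replacement) replaces every match of the regex
    # pattern; "<CONTIG>" has no POSIX ERE metacharacters, so it matches literally
    String reference_basename = sub(reference_panel_path, "<CONTIG>", contigs[contig_index])
  }

  output {
    # A declaration inside a scatter is gathered into an array outside it
    Array[String] reference_basenames = reference_basename
  }
}
```

Because the placeholder can sit anywhere in the string, a template such as .../hgdp.tgp.gwaspy.merged.<CONTIG>.merged.AN_added.bcf.ac2 would address the existing panel files without renaming them, which is the advantage noted above.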

jsotobroad merged commit b404f9b into TSPS-183_mma_beagle_imputation_hg38 Jun 10, 2024
0 of 2 checks passed
jsotobroad deleted the js_TSPS-142 branch June 10, 2024 17:52
mmorgantaylor pushed a commit that referenced this pull request Feb 13, 2025
…ng imputation against it (#1296)

* add optional error count override for testing

* rename reference base prefix variable and make it more user friendly

---------

Co-authored-by: Jose Soto <[email protected]>
mmorgantaylor pushed a commit that referenced this pull request Feb 18, 2025
mmorgantaylor pushed a commit that referenced this pull request Feb 19, 2025
mmorgantaylor pushed a commit that referenced this pull request Feb 25, 2025
nikellepetrillo added a commit that referenced this pull request Feb 28, 2025
* wip add beagle imputation stuff

* add 2 wdls to dockstore.yml

* fix docker gar url

* use the right path for jars

* wip on imputation wdl

* oops use correct jar

* missing equals

* fix java call again

* fix java call

* oops match file names

* update beagle jar to 01Mar24.d36

* debug GatherVcfs

* debug GatherVcfs 2

* try to resolve missing file issue

* don't impute over padding

* make the index again

* supply vcf_index input to SelectVariantsByIds

* update Imputation wdl too

* newlines

* update for hg38

* Revert "update for hg38"

This reverts commit 3757137.

* update for hg38

* liftover wdl

* remove GCP-specific vm commands

* use gatk

* fix suffix and basename

* fix more filenames

* remove missing contig stuff for now

* fix ref panel path

* another chr fix

* warn on missing contig

* do fail if missing contig

* more mem

* troubleshooting wdl

* fixed plink path

* add select_first test

* cleanup

* add if block to test

* create and use ref panel interval list

* move interval list creation to ref panel wdl

* give default values for optional inputs, weird

* change CountVariants calls

* test

* add output to test

* next test

* more test

* another test

* update real task

* TSPS-226 presplit and prechunk beagle inputs (#1272)

*pre splitting and prechunking beagle imputation inputs to lower log numbers and storage account egress

---------

Co-authored-by: Jose Soto <[email protected]>

* TSPS-221 remove index input and add seed to make beagle tool deterministic (#1285)

* remove multi sample vcf index workflow input and add it to the PreSplitVcf task.
add seed number so that beagle is always deterministic. add comment to cpu input for PhaseAndImputeBeagle task

* change output_callset_name to output_base_name and remove optional outputs

* change n_failed_chunks ticket to an int

---------

Co-authored-by: Jose Soto <[email protected]>

* rename workflow

* TSPS-241 Clean up beagle wdl (#1288)

* clean up wdl with stuff from TSPS-241

* try to make fail fast work with double nested scatters

---------

Co-authored-by: Jose Soto <[email protected]>

* add specific gatk_docker

* TSPS-142 updates to help creating simulated reference panel and running imputation against it (#1296)

* add optional error count override for testing

* rename reference base prefix variable and make it more user friendly

---------

Co-authored-by: Jose Soto <[email protected]>

* add maxRetries 2 to all imputation beagle tasks

* add prechunk wdl to dockstore

* use acr for default ubuntu image

* add preemptible 3

* use acr gatk docker as default

* don't use preemptibles on GatherVcfs

* basename fix for imputation beagle ref panel generation (#1332)

* try auto specifying chr at end of basename

* both tasks

* add liftovervcfs to dockstore

* allow specifying max mem

* TSPS-269 Speed up CountVariantsInChunksBeagle by using bedtools (#1335)

* try creating bed files

* try again

* try again again

* a different thing

* use bedtools and bed ref panel files

* oops update the correct task

* fix

* use the right freaking file name

* remove comment

* update pipeline version to 0.0.2

* TSPS-293: Fix up streaming imputation beagle (#1347)

update ImputationBeagle

* add array imputation quota consumed wdl (#1425)

* add array imputation quota consumed wdl

* add changelogs for imputation array related workflows

---------

Co-authored-by: Jose Soto <[email protected]>

* TSPS-239 get wdl running on 400k sample ref panel (#1373)

* changes to help beagle imputation wdl run on a 400k sample reference panel

---------

Co-authored-by: Jose Soto <[email protected]>

* remove create imputation ref panel beagle wdl and changelog

* PR feedback

---------

Co-authored-by: Jose Soto <[email protected]>
Co-authored-by: M. Morgan Aster <[email protected]>

* add set -e -o pipefail to all relevant imputation tasks (#1434)

Co-authored-by: Jose Soto <[email protected]>

* TSPS-341 remove tasks for recovering variants not in the reference panel (#1468)

* remove tasks for recovering variants not in the reference panel and separate out beagle tasks from imputation tasks

* remove prechunk wdl and references to it
remove "Beagle" from task names in BeagleTasks.wdl

---------

Co-authored-by: Jose Soto <[email protected]>

* Updated pipeline_versions.txt with all pipeline version information

* [PR to feature branch] Add testing to imputation beagle (#1503)

* TSPS-239 get wdl running on 400k sample ref panel (#1373)

* changes to help beagle imputation wdl run on a 400k sample reference panel

---------

Co-authored-by: Jose Soto <[email protected]>

* remove create imputation ref panel beagle wdl and changelog

* PR feedback

---------

Co-authored-by: Jose Soto <[email protected]>
Co-authored-by: M. Morgan Aster <[email protected]>

* add new files for testing

* add test wdl to .dockstore.yml

* add test data json files, other updates

* version to 1.0.0, update changelog

* update beagle docker

* update beagle docker again

* fix call phase task

* re-deleting ImputationBeaglePreChunk.wdl

* temporarily try to run test on feature branch pr

* remove vault inputs

* update output basename for plumbing test

* remove feature branch from gha pr branches

* pr comments

* add quotes in VerifyTasks.CompareVcfs

* update dockers, move CreateVcfIndex to BeagleTasks

---------

Co-authored-by: jsotobroad <[email protected]>
Co-authored-by: Jose Soto <[email protected]>

* Updated pipeline_versions.txt with all pipeline version information

* remove newline at end of Utilities.wdl

* remove LiftoverVcfs, add README for imputation_beagle

* oops this commit adds the README for imputation_beagle

* rename test inputs files to reflect contents

* PR comments round 1

* Updated pipeline_versions.txt with all pipeline version information

* update changelog for BroadInternalImputation

* Updated pipeline_versions.txt with all pipeline version information

* add back newline to Utilities.wdl with -w flag on changed file check

* remove change to Minimac4 task

* revert change to tool command in OptionalQCSites

* fix fail task dependency, revert attempt to ignore newline in diff, other pr comments

* update README for ImputationBeagle

* rename test files

* Updated pipeline_versions.txt with all pipeline version information

* another commit for hashes

* Updated pipeline_versions.txt with all pipeline version information

* dummy commit

* pr comments

* Updated pipeline_versions.txt with all pipeline version information

* dummy commit

* dummy commit

---------

Co-authored-by: jsotobroad <[email protected]>
Co-authored-by: Jose Soto <[email protected]>
Co-authored-by: GitHub Action <[email protected]>
Co-authored-by: Nikelle Petrillo <[email protected]>
Co-authored-by: npetrill <[email protected]>
Labels
None yet
Projects
None yet

3 participants