GVS / Hail VDS integration test [VS-639] #8086
Merged
Commits (49, all by mcovarr):
- 7bbf4d3 wip
- 7fc099b integration
- 757fa22 hopefully fail in more interesting ways
- 059e7b1 spike-worthy hackery
- c422cf8 increase hackery
- 7a1ea90 huh
- f41226e maybe
- c62a283 wip
- d835a2e wip
- bc3d001 doh
- 67a8453 fix
- 27f2690 cleanup
- b4a0e96 oops
- 1a3c10c fix
- 23333c7 more spikey wip
- 1834e5f wip
- 4daa85e oops
- f3a4084 gah
- b7ca91d revert many of the differences with non-hail integration test
- d1d71b0 fixes
- c85d452 so much DRYing
- d207592 restore my beauteous whitespace
- c751af3 oops need drop_state NONE
- 99e867c hackery for short cycle times
- 1210bb2 Revert "hackery for short cycle times"
- 3b0ec84 fixees
- 54c06c8 wip
- 2926093 wip
- 397268f fix from Tim
- 0b0b84a Revert "wip"
- e3fb81f hacked up to resume
- b1d5dad whoops
- 721cc2a Revert "hacked up to resume"
- 2978931 uber integration wdl
- 08ebbcc separate test run prefixes
- 86c6e9b cleanup
- b0b1160 Revert "separate test run prefixes"
- a228651 rework
- 80eb983 fixes / improvements
- 901a3ed gah
- c49641a oops
- 98dcede checkpoint
- 0b029e3 cleanup
- ba03a9f cleanup, update references
- f8a4bb5 oops
- ed64917 omg
- 308c703 comment my trickery
- b42d7b1 delete erroneous comment
- bd65357 remove junk
scripts/variantstore/wdl/GvsQuickstartHailIntegration.wdl
145 changes: 145 additions & 0 deletions
```wdl
version 1.0

import "GvsUtils.wdl" as Utils
import "GvsExtractAvroFilesForHail.wdl" as ExtractAvroFilesForHail
import "GvsQuickstartVcfIntegration.wdl" as QuickstartVcfIntegration

workflow GvsQuickstartHailIntegration {
  input {
    String branch_name
    String hail_wheel = "gs://gvs-internal-scratch/hail-wheels/2022-10-18/0.2.102-964bee061eb0/hail-0.2.102-py3-none-any.whl"
  }

  String project_id = "gvs-internal"

  call QuickstartVcfIntegration.GvsQuickstartVcfIntegration {
    input:
      branch_name = branch_name,
      drop_state = "NONE",
      extract_do_not_filter_override = false,
      dataset_suffix = "hail",
  }

  call ExtractAvroFilesForHail.GvsExtractAvroFilesForHail {
    input:
      go = GvsQuickstartVcfIntegration.done,
      project_id = project_id,
      dataset = GvsQuickstartVcfIntegration.dataset_name,
      filter_set_name = GvsQuickstartVcfIntegration.filter_set_name,
      scatter_width = 10,
  }

  call CreateAndTieOutVds {
    input:
      branch_name = branch_name,
      hail_wheel = hail_wheel,
      avro_prefix = GvsExtractAvroFilesForHail.avro_prefix,
      vds_destination_path = GvsExtractAvroFilesForHail.vds_output_path,
      tieout_vcfs = GvsQuickstartVcfIntegration.output_vcfs,
      tieout_vcf_indexes = GvsQuickstartVcfIntegration.output_vcf_indexes,
  }

  output {
    Array[File] output_vcfs = GvsQuickstartVcfIntegration.output_vcfs
    Array[File] output_vcf_indexes = GvsQuickstartVcfIntegration.output_vcf_indexes
    Float total_vcfs_size_mb = GvsQuickstartVcfIntegration.total_vcfs_size_mb
    File manifest = GvsQuickstartVcfIntegration.manifest
    String vds_output_path = GvsExtractAvroFilesForHail.vds_output_path
    Boolean done = true
  }
}
```
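The workflow's three calls run in sequence because each reads outputs of the one before it. `dataset` and `filter_set_name` already tie the Avro extract to the VCF integration call, and `go = GvsQuickstartVcfIntegration.done` makes that ordering explicit. The sketch below is a hypothetical model, not Cromwell code: `CALL_INPUTS` and `execution_order` are illustrative names that record, for each call above, which upstream calls it reads from, and Python's `graphlib.TopologicalSorter` (Python 3.9+) derives a dependency-respecting order.

```python
from graphlib import TopologicalSorter

# Hypothetical model of the workflow above: each call maps to the set of
# upstream calls whose outputs appear in its input expressions.
CALL_INPUTS = {
    "GvsQuickstartVcfIntegration": set(),
    # go, dataset and filter_set_name all come from the VCF integration call.
    "GvsExtractAvroFilesForHail": {"GvsQuickstartVcfIntegration"},
    # avro_prefix / vds_destination_path come from the Avro extract;
    # tieout_vcfs / tieout_vcf_indexes come from the VCF integration call.
    "CreateAndTieOutVds": {
        "GvsExtractAvroFilesForHail",
        "GvsQuickstartVcfIntegration",
    },
}


def execution_order(call_inputs):
    """Return one order in which the calls can run, dependencies first."""
    return list(TopologicalSorter(call_inputs).static_order())


if __name__ == "__main__":
    print(execution_order(CALL_INPUTS))
```

For this graph only one order is valid, so the sketch always places the VCF integration first and the VDS tie-out last.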
```wdl
task CreateAndTieOutVds {
  input {
    File hail_wheel
    String branch_name
    String avro_prefix
    String vds_destination_path
    Array[File] tieout_vcfs
    Array[File] tieout_vcf_indexes
  }
  parameter_meta {
    tieout_vcfs: {
      localization_optional: true
    }
    tieout_vcf_indexes: {
      localization_optional: true
    }
  }
  command <<<
    # Prepend date, time and pwd to xtrace log entries.
    PS4='\D{+%F %T} \w $ '
    set -o errexit -o nounset -o pipefail -o xtrace

    script_url_prefix="https://raw.githubusercontent.com/broadinstitute/gatk/~{branch_name}/scripts/variantstore/wdl/extract"

    for script in hail_gvs_import.py hail_join_vds_vcfs.py gvs_vds_tie_out.py
    do
      curl --silent --location --remote-name "${script_url_prefix}/${script}"
    done

    # Create a manifest of VCFs and indexes to bulk download with `gcloud storage cp`.
    touch vcf_manifest.txt
    # This is extremely noisy and not interesting, turn off xtrace.
    set +o xtrace
    for file in ~{sep=' ' tieout_vcfs} ~{sep=' ' tieout_vcf_indexes}
    do
      echo $file >> vcf_manifest.txt
    done
    # xtrace back on
    set -o xtrace

    # Copy VCFs and indexes to the current directory.
    cat vcf_manifest.txt | gcloud storage cp -I .

    # `avro_prefix` includes a trailing `avro` so don't add another `avro` here.
    gcloud storage cp --recursive ~{avro_prefix} $PWD

    export REFERENCES_PATH=$PWD/references
    mkdir -p ${REFERENCES_PATH}

    gcloud storage cp 'gs://hail-common/references/Homo_sapiens_assembly38.fasta*' ${REFERENCES_PATH}

    # Temurin Java 8
    apt-get -qq install wget apt-transport-https gnupg
    wget -O - https://packages.adoptium.net/artifactory/api/gpg/key/public | apt-key add -
    echo "deb https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print$2}' /etc/os-release) main" | tee /etc/apt/sources.list.d/adoptium.list
    apt-get -qq update
    apt -qq install -y temurin-8-jdk

    pip install ~{hail_wheel}
    export PYSPARK_SUBMIT_ARGS='--driver-memory 16g --executor-memory 16g pyspark-shell'

    export WORK=$PWD/work
    mkdir ${WORK}

    export TEMP_PATH=$WORK/temp
    mkdir ${TEMP_PATH}

    export VDS_PATH=$WORK/gvs_import.vds
    export AVRO_PATH=$PWD/avro

    python3 ./hail_gvs_import.py --avro-path ${AVRO_PATH} --vds-path ${VDS_PATH} --temp-path ${TEMP_PATH} --references-path ${REFERENCES_PATH}

    export JOINED_MATRIX_TABLE_PATH=${WORK}/joined.mt

    python3 ./hail_join_vds_vcfs.py --vds-path ${VDS_PATH} --joined-matrix-table-path ${JOINED_MATRIX_TABLE_PATH} *.vcf.gz

    # Copy up the VDS
    gcloud storage cp --recursive ${VDS_PATH} ~{vds_destination_path}

    pip install pytest
    ln -s ${WORK}/joined.mt .
    pytest ./gvs_vds_tie_out.py
  >>>
  runtime {
    # `slim` here to be able to use Java
    docker: "gcr.io/google.com/cloudsdktool/cloud-sdk:409.0.0-slim"
    disks: "local-disk 2000 HDD"
    memory: "30 GiB"
  }
  output {
    Boolean done = true
  }
}
```
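Because `tieout_vcfs` and `tieout_vcf_indexes` are marked `localization_optional: true`, Cromwell passes their `gs://` paths into the command instead of downloading them, and the task fetches them itself with a single `gcloud storage cp -I` call fed by `vcf_manifest.txt`. The staging steps around that call are plain string and path manipulation, so they can be mirrored outside the task when debugging. The Python sketch below reproduces three of them (script URLs, manifest text, local Avro directory) plus the `awk` codename lookup used for the Adoptium apt source. The function names are hypothetical, the example inputs are made up, and `local_avro_dir` assumes `cp -r`-style naming (the prefix's last path component becomes a subdirectory of the destination) and a prefix with no trailing slash.

```python
import posixpath

# Constants transcribed from the CreateAndTieOutVds command block.
SCRIPT_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/broadinstitute/gatk/{branch}"
    "/scripts/variantstore/wdl/extract"
)
SCRIPTS = ["hail_gvs_import.py", "hail_join_vds_vcfs.py", "gvs_vds_tie_out.py"]


def script_urls(branch_name):
    """URLs fetched by the `curl --remote-name` loop for a given branch."""
    prefix = SCRIPT_URL_TEMPLATE.format(branch=branch_name)
    return [f"{prefix}/{script}" for script in SCRIPTS]


def build_manifest(vcfs, vcf_indexes):
    """Text written by the `echo $file >> vcf_manifest.txt` loop:
    one path per line, all VCFs first, then all indexes."""
    return "".join(f"{path}\n" for path in [*vcfs, *vcf_indexes])


def local_avro_dir(avro_prefix, cwd):
    """Directory a recursive copy of `avro_prefix` into `cwd` is expected to
    create. Assumption: cp -r style naming, no trailing slash on the prefix."""
    return posixpath.join(cwd, posixpath.basename(avro_prefix))


def version_codename(os_release_text):
    """Rough equivalent of awk -F= '/^VERSION_CODENAME/{print$2}', returning
    only the first match (os-release normally has exactly one such line)."""
    for line in os_release_text.splitlines():
        if line.startswith("VERSION_CODENAME"):
            fields = line.split("=")
            # awk's $2 is empty when the line has no '=' separator.
            return fields[1] if len(fields) > 1 else ""
    return None


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(script_urls("my-branch")[0])
    print(build_manifest(["gs://bucket/a.vcf.gz"], ["gs://bucket/a.vcf.gz.tbi"]), end="")
    print(local_avro_dir("gs://bucket/run/avro", "/cromwell_root"))
    print(version_codename("ID=debian\nVERSION_CODENAME=bullseye\n"))
```

Under the naming assumption above, `local_avro_dir` lands on `$PWD/avro`, which matches the task's `AVRO_PATH=$PWD/avro` and is why the command does not append another `avro` to `avro_prefix`.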
this is clever, but it might be worth adding a comment somewhere about why we are doing this?