-
Notifications
You must be signed in to change notification settings - Fork 47
WIP for launching batch runs on AWS directly from COVIDScenarioPipeline #223
Conversation
batch/job_launcher.py
Outdated
import yaml | ||
|
||
@click.command() | ||
@click.option("-p", "--job-prefix", type=str, required=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: intelligent defaults to reduce the number of required params would be nice. e.g., job-prefix
is name
in config file plus a uniqifer, num-jobs
is calculated based on # of scenarios/# of sims with some cap, etc.
batch/job_launcher.py
Outdated
# Update and save the config file with the number of sims to run | ||
print("Updating config file %s to run %d simulations..." % (config_file, sims_per_job)) | ||
config = open(config_file).read() | ||
config = re.sub("nsimulations: \d+", "nsimulations: %d" % sims_per_job, config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would use pyyaml or the utils.config
module to read-modify-write versus regex, this seems error-prone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I did at first, but the output was a total re-ordering of the entries in the config file, which seemed not great; I wonder if there's a way to keep the ordering of the entries consistent on the read-edit-write cycle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this thread, you need to set sort_keys=False
on the dump. Trying it locally, this preserves key order in the file, but strips comments and whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we would be okay with that, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i think it's fine
batch/job_launcher.py
Outdated
{"name": "DVC_OUTPUTS", "value": " ".join(dvc_outputs)}, | ||
{"name": "S3_RESULTS_PATH", "value": results_path} | ||
] | ||
s3_cp_run_script = "aws s3 cp s3://%s/%s $PWD/run-covid-pipeline" % (s3_input_bucket, runner_script_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(aside, don't know if you know about fstrings in python 3.6+, they make subbing variables much easier/neater)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not really; will take a look!
batch/job_launcher.py
Outdated
@click.option("-i", "--s3-input-bucket", "s3_input_bucket", type=str, default="idd-input-data-sets") | ||
@click.option("-o", "--s3-output-bucket", "s3_output_bucket", type=str, default="idd-pipeline-results") | ||
@click.option("-d", "--job-definition", "batch_job_definition", type=str, default="Batch-CovidPipeline-Job") | ||
@click.option("-q", "--job-queue", "batch_job_queue", type=str, default="Batch-CovidPipeline") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for these params, i think you want show_default=True
set so when you do a --help
you'll see what the default value is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also missing helpstrings :)
nice! this looks really good -- thanks for doing it! i only have minor/trivial comments |
for output in "${DVC_OUTPUTS_ARRAY[@]}" | ||
do | ||
"Saving output $output" | ||
tar cv --use-compress-program=pbzip2 -f $output.tar.bz2 $output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is compression required? We took that route initially but found that just a sync was sufficient and faster overall.
s3_cp_run_script = f"aws s3 cp s3://{s3_input_bucket}/{runner_script_name} $PWD/run-covid-pipeline" | ||
command = ["sh", "-c", f"{s3_cp_run_script}; /bin/bash $PWD/run-covid-pipeline"] | ||
container_overrides = { | ||
'vcpus': 72, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should one allow an override on that? If the advanced user wants a different configuration?
can we change this to use |
Going to be iterating on this for a bit, but posting it for feedback from folks now.