-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bundle jobs #227
Bundle jobs #227
Conversation
1d1c40f
to
f34516b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@golaz This is ready for review. I've included comments explaining the code. My suggested order of review is:
docs/
andtests/
(to get context on howzppy
is run with bundles)zppy/__main__.py
andzppy/utils.py
(for the majority of the code that enables the bundle functionality)- Other files in
zppy
: the task files &zppy/templates/default.ini
# bundle1 and bundle2 should run. After they finish, run: | ||
rm /lcrc/group/e3sm/ac.forsyth2/zppy_test_bundles_output/v2.LR.historical_0201/post/scripts/bundle3.bash | ||
# (If this file isn't deleted, zppy will fail because it assumes bundle3 is already running). | ||
zppy -c tests/integration/test_bundles.cfg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, bundle3
and ilamb_run
have to be run after bundle1
finishes. This is because of a dependency within bundle1
. Unfortunately, I haven't come up with a good way to automatically launch later tasks/bundles when the bundle they depend on finishes.
One possibility is:
- Launch
bundle1
, keeping track of its job id. - When we reach
bundle3
we'd see that we're missing a dependency. [In the current implementation, we'd simply skipbundle3
(meaning the user has to manually re-runzppy
at a later point -- oncebundle1
finishes.)] - Comb through every bundle's bash file to find the dependency. We'd find that
bundle1
contains our dependency. - Submit
bundle3
with a dependency on the job id ofbundle1
.
That would presumably work for a bundle depending on another -- but having ilamb_run
(not in a bundle in this case) depend on bundle1
also causes problems. In this implementation, we process bundles after all the tasks -- so we'd presumably have to once again cycle through all the tasks to launch ilamb_run
this time with the job id of bundle1
as a dependency.
In any case, it probably makes sense to consider this in its own issue/pull request.
rm -rf /lcrc/group/e3sm/ac.forsyth2/zppy_test_bundles_output/v2.LR.historical_0201/post | ||
zppy -c tests/integration/test_bundles.cfg | ||
# bundle1 and bundle2 should run. After they finish, run: | ||
rm /lcrc/group/e3sm/ac.forsyth2/zppy_test_bundles_output/v2.LR.historical_0201/post/scripts/bundle3.bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting quirk. The current implementation doesn't create status files for the bundles, so if you re-run zppy
immediately, then it will try to re-run tasks that were just done. To prevent users from accidentally overwriting anything, I just made zppy
raise an error in such case.
Unfortunately, by the time dependencies are checked, the bundle's bash file already exists. So, if the bundle depends on something that isn't finished, it gets skipped. That means if you re-run zppy
once that dependency has finished, that error will still get raised (because the bash file exists). So, the bash file needs to be deleted (or renamed) before re-running.
Now that I'm typing this however, let me look into this a bit more. It seems like zppy
wouldn't try to run individual tasks with status files that say "RUNNING", but I believe I was running into that. Perhaps status files for bundles could fix the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@golaz a few options included in the new second commit:
Simply remove the FileExistsError
: tests still pass except for the bash file content test. Basically, if you rerun zppy
while bundle1
is running, bundle1.bash
gets overwritten (so it doesn't have the lines to run scripts that already have status files with "OK"). Presumably work begins twice on scripts that have not yet been run.
Check for existence of a status file (e.g., bundle1.status
) rather than a bash file: this seems to be the best solution.
- Downside is if
bundle1
is still running but has finished the task thatbundle3
needs,zppy
will error out before having a chance to runbundle3
(it will try to re-runbundle1
, which is currently running, before getting tobundle3
). - Another downside is that it takes a few seconds for the status files to get created, so it's still possible to launch two identical jobs if you re-run
zppy
fast enough. - Upside is that there is no need to delete
bundle3.bash
or any other files anymore --bundle3.status
doesn't get created until after it actually starts running. - Tests still pass with this change, aside from the bash file content test (will need to include the line printing to the bundle's status file).
Lastly, I included, in a commented out section, code to print an error status if a particular script fails: while this would allow us to have a status simply beyond "the bundle has started", it does add of a lot lines to the scripts, for not much added value, I think.
def test_bundles_bash_file_content(self): | ||
actual_directory = "/lcrc/group/e3sm/ac.forsyth2/zppy_test_bundles_output/v2.LR.historical_0201/post/scripts" | ||
expected_directory = "/lcrc/group/e3sm/public_html/zppy_test_resources/expected_bundles/bundle_files" | ||
# Check that bundle files are correct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: move comment to immediately after def
line.
os.path.join(diff_dir, "{}_diff.png".format(simple_image_name)), | ||
"PNG", | ||
) | ||
from tests.integration.utils import check_mismatched_images |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since similar logic is used for the bundles
test, I put the common logic in utils
.
# climo tasks | ||
climo(config, scriptDir) | ||
existing_bundles = climo(config, scriptDir, existing_bundles) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We start with an empty list of bundles. As we go through each task, we build up a list of existing bundles. That way, we don't end up with two versions of the same bundle.
# If any script fails, no new ones should try to run. | ||
# It's possible the failed script is a dependency for later scripts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set the script to fail if any task fails, since there may be dependencies.
return existing_bundles | ||
for b in existing_bundles: | ||
if b.bundle_name == bundle_name: | ||
# This bundle already exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prevents us from creating a bundle that already exists.
# If one task requires export="ALL", then the bundle script will need it as well | ||
bundle.export = export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're only launching one job rather than many, we have to make a decision on the value for export
. I decided that if any task required export="ALL"
, then the whole bundle would.
|
||
|
||
# ----------------------------------------------------------------------------- | ||
def submitScript(scriptFile, export, dependFiles=[]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
export
is now a required parameter
b.display_dependencies() | ||
if not b.dry_run: | ||
submitScript( | ||
b.bundle_file, b.export, dependFiles=b.dependencies_not_in_bundle_file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding my comment in docs/source/dev_guide/testing.rst
: when we submit bundle3
, it will have a dependency listed in dependencies_not_in_bundle_file
which is run as part of bundle1
. As the code currently is, however, it has no idea that this dependency is in bundle1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work on this PR. I've made a few more modifications to your PR. The changes are available on a separate branch: https://github.com/E3SM-Project/zppy/tree/bundle-jobs. Maybe you can import them into this PR.
My changes add the following:
- Bundle bash scripts are now constructed from a Jinja2 template. This will facilitate future changes.
- Ability to explicitly declare bundle jobs in the cfg file via new section and sub-sections.
[bundle]
[[ bundle2 ]]
nodes=2
- Some code restructuring.
Notes:
- black doesn't work. You branched before the changes to fix black were merged in. I used
--no-verify
- The integration tests run fine, but I have not updated the expected output, so one reports a failure.
- For the complete integration tests, the status files for mpas_analysis and tc have the weird 'NING ' status.
Closing in favor of #243. |
Bundle jobs. Resolves #189.