Fix and enhance job resume functionality #5247
Merged
Conversation
mvdbeek changed the title from "[WIP] Fix job resume functionality for non-prefixed input data" to "Fix job resume functionality for non-prefixed input data" on Dec 29, 2017.
mvdbeek force-pushed the job_rerun_fixes branch from d30e0e2 to 42bbda1 on December 30, 2017 19:03.
mvdbeek changed the title from "Fix job resume functionality for non-prefixed input data" to "Fix and enhance job resume functionality" on Dec 30, 2017.
This should fix galaxyproject#5222. The problem was that HDA ids in nested parameters were not always being updated properly. In the case of bowtie2 the input dataset is provided via a conditional, but the conditional prefix is not stored in the `JobToInputDatasetAssociation`, so the `update_param` function was not able to update the dataset id in these instances. The approach now is to simply replace all occurrences of the old dataset id with the new dataset id. TODO: tests, refactor the replacement functionality into a separate function, and make this work for `JobToInputDatasetCollectionAssociation`.
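A minimal sketch of that replacement idea, assuming a plain nested dict/list parameter structure (the function name and data shapes here are illustrative, not Galaxy's actual code):

```python
def replace_dataset_ids(param_value, old_id, new_id):
    """Recursively replace old_id with new_id anywhere in a nested
    parameter structure, instead of relying on a stored prefix.
    Hypothetical helper for illustration only; note it would also
    replace any non-dataset value that happens to equal old_id."""
    if isinstance(param_value, dict):
        return {k: replace_dataset_ids(v, old_id, new_id)
                for k, v in param_value.items()}
    if isinstance(param_value, list):
        return [replace_dataset_ids(v, old_id, new_id) for v in param_value]
    return new_id if param_value == old_id else param_value


# Example: a bowtie2-style parameter where the dataset sits behind a conditional.
params = {"library": {"type": "single", "input_1": {"src": "hda", "id": 42}}}
assert replace_dataset_ids(params, 42, 99)["library"]["input_1"]["id"] == 99
```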
This test runs a workflow whose first step fails, followed by a tool that consumes the first step's output through an input that sits behind a nested conditional. This recapitulates the bug described in galaxyproject#5222.
I am slightly surprised that this worked without change, but it appears that the remapping occurs via the HDAs that the HDCA is composed of.
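A toy illustration of why this can work without extra code, assuming (with simplified, hypothetical classes) that an HDCA holds references to its member HDAs rather than copies:

```python
class HDA:
    """Toy stand-in for a history dataset association."""
    def __init__(self, id):
        self.id = id

class HDCA:
    """Toy stand-in for a dataset collection: it only references HDAs."""
    def __init__(self, elements):
        self.elements = elements

hda_a, hda_b = HDA(1), HDA(2)
hdca = HDCA([hda_a, hda_b])

hda_a.id = 99                      # remap a member HDA in place
assert hdca.elements[0].id == 99   # the collection sees the new id for free
```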
This specifically addresses the problem where some jobs of a mapped-over collection have failed. Instead of filtering the failed collection and restarting the workflow at this position (involving a lot of copy-paste ...), the user can now limit the rerun to the problematic jobs, and the workflow should resume from there. Should fix galaxyproject#2235. This is one possible implementation; it would also be feasible not to manipulate the original collection but to copy the HDCA, replace its collection elements, and update all references for jobs that depend on the HDCA, as we do for HDAs. This implementation seems simpler, but let me know if you see problems with this approach.
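A rough sketch of the in-place approach, again with hypothetical names and shapes (a collection modeled as a list of `(element_identifier, dataset)` pairs):

```python
def replace_failed_elements(collection_elements, rerun_outputs):
    """Swap failed elements of the original collection for the rerun's
    outputs, keyed by element identifier. Mutates the collection in
    place, so jobs referencing the HDCA need no rewiring.
    Hypothetical illustration, not Galaxy's actual data model."""
    for i, (identifier, dataset) in enumerate(collection_elements):
        if identifier in rerun_outputs:
            collection_elements[i] = (identifier, rerun_outputs[identifier])


elements = [("sample1", "ok.bam"), ("sample2", "failed.bam")]
replace_failed_elements(elements, {"sample2": "rerun.bam"})
assert elements[1] == ("sample2", "rerun.bam")
```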
mvdbeek force-pushed the job_rerun_fixes branch from 42bbda1 to c0dbace on December 31, 2017 12:09.
Wow - this is totally awesome. Great fix and great feature to finally have some test coverage of - this is very exciting! Thanks so much @mvdbeek.
mvdbeek added a commit to mvdbeek/galaxy that referenced this pull request on Jan 17, 2018:
Replacing failed collection elements was already possible when dependent jobs were found (galaxyproject#5247). This commit restructures the remapping so that remapping is possible when no dependent jobs are available. This also simplifies the replacement of HDAs between old and new jobs.
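A sketch of that restructuring, building on the two hypothetical helpers above (element replacement runs first and unconditionally; dependent-job parameter rewriting happens afterwards and may simply be a no-op):

```python
def remap_rerun(collection_elements, rerun_outputs, id_map, dependent_job_params):
    """Hypothetical sketch of the restructured flow: element replacement
    no longer happens inside the dependent-jobs loop, so it also runs
    when dependent_job_params is empty. id_map maps old dataset ids to
    the new job's dataset ids."""
    # 1. Always replace failed elements with the rerun's outputs.
    replace_failed_elements(collection_elements, rerun_outputs)
    # 2. Then rewrite parameters of dependent jobs, if there are any.
    rewritten = []
    for params in dependent_job_params:
        for old_id, new_id in id_map.items():
            params = replace_dataset_ids(params, old_id, new_id)
        rewritten.append(params)
    return rewritten
```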
The PR includes two test cases for this functionality, one for HDA input and one for HDCA input. With this PR we also replace failed elements of a collection with a job rerun (including a test case), which fixes #2235.