Rerunning a failed dataset collection element should substitute the failed element #2235

Presently, if an element of a dataset collection fails (e.g. due to a transient problem on a cluster node), rerunning it creates a new history dataset outside of the collection. The collection therefore remains in a failed state and cannot be used as input for other tools. This is a serious problem for large collections with thousands of elements, where the probability that at least one job fails at random is quite high.
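A minimal sketch of why this bites, using hypothetical classes rather than Galaxy's actual data model (HDCA internals are more involved): the collection's state is derived from its elements, so a rerun whose output lands outside the collection leaves the failed element, and hence the whole collection, in an error state.

```python
# Toy model, not Galaxy's real data model: all names here are
# hypothetical and only illustrate the reported behavior.
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    state: str  # "ok" or "error"


@dataclass
class CollectionElement:
    identifier: str
    dataset: Dataset


@dataclass
class DatasetCollection:
    elements: list[CollectionElement] = field(default_factory=list)

    @property
    def state(self) -> str:
        # The collection is usable as a tool input only if every element is ok.
        return "ok" if all(e.dataset.state == "ok" for e in self.elements) else "error"


history = []  # stand-in for the user's history
collection = DatasetCollection([
    CollectionElement("sample1", Dataset("sample1.bam", "ok")),
    CollectionElement("sample2", Dataset("sample2.bam", "error")),  # failed job
])

# Current behavior: the rerun's output is a new dataset in the history,
# outside the collection, so the failed element is untouched.
rerun_output = Dataset("sample2.bam", "ok")
history.append(rerun_output)

assert collection.state == "error"  # still unusable downstream
```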
Comments
@jmchilton Should this go on the Roadmap #1928?
nsoranzo changed the title from "Rerunning a failed dataset collection element does substitute the failed element" to "Rerunning a failed dataset collection element should substitute the failed element" on Oct 1, 2016
mvdbeek added a commit to mvdbeek/galaxy that referenced this issue on Dec 30, 2017:
This specifically addresses the problem where some jobs of a mapped-over collection have failed. Instead of filtering the failed collection and restarting the workflow at this position (which involves a lot of copy-pasting ...), the user can now limit the rerun to the problematic jobs, and the workflow should resume from there. Should fix galaxyproject#2235.

This is one possible implementation. It would also be feasible not to manipulate the original collection, but to copy the HDCA, replace its collection elements, and then replace all references for jobs that depend on the HDCA, as we do for HDAs. This implementation seems simpler, but let me know if you see problems with this approach.
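Continuing the hypothetical toy model from above (again, an assumption, not Galaxy's actual API), the in-place strategy this commit describes amounts to swapping the rerun's output into the failed element, so every job that already references the collection sees the fix without any copying:

```python
# Hypothetical sketch of the in-place substitution strategy; the
# function name and model are assumptions, not Galaxy's implementation.
def replace_failed_element(collection: DatasetCollection,
                           identifier: str,
                           rerun_output: Dataset) -> None:
    for element in collection.elements:
        if element.identifier == identifier and element.dataset.state == "error":
            # Mutate the original collection: downstream jobs holding a
            # reference to it pick up the new dataset automatically.
            element.dataset = rerun_output
            return
    raise ValueError(f"no failed element with identifier {identifier!r}")


replace_failed_element(collection, "sample2", rerun_output)
assert collection.state == "ok"  # the collection is usable as input again
```

The alternative the commit message mentions, copying the HDCA and remapping every dependent job's references (as is done for HDAs), would leave the original collection untouched at the cost of tracking down and updating all of those references.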
mvdbeek added a commit to mvdbeek/galaxy that referenced this issue on Dec 31, 2017, with the same commit message as above.