Jobs stuck in matched when using SingularityCE #6885
Comments
Not entirely sure about that but from what I have observed so far, the As the The results of the submissions are handled by a callback function named Thus, the following blocks of code do not seem to be executed: https://github.com/DIRACGrid/DIRAC/blob/rel-v8r0/src/DIRAC/WorkloadManagementSystem/Agent/JobAgent.py#L307-L315 A suggestion:
This solution means that the |
Good analysis, I don't find it incorrect. It may be possible to verify it with a test, which would also be useful when coding the solution. Regarding your suggestion: IMHO, PoolCE might even be the default "Inner CE" out there (if it's bug-free...). The reason we kept it as the non-default one was exactly because of these possible bugs.
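As a starting point for such a test, here is a minimal sketch of the suspected failure mode. It is not DIRAC code: `ToyPoolCE`, `submit_job`, `failing_container_launch` and `run_once` are hypothetical names, and the real PoolCE and JobAgent may behave differently. It only illustrates how a pool that reports results through a callback can hide a launch failure from a caller that checks the immediate return value.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyPoolCE:
    """Hypothetical stand-in for a pool-based CE (not the DIRAC PoolCE API)."""

    def __init__(self):
        self.executor = ThreadPoolExecutor(max_workers=1)
        self.task_results = {}

    def submit_job(self, task_id, payload):
        # The payload runs asynchronously; its outcome is only recorded
        # later by the callback, never returned to the caller.
        future = self.executor.submit(payload)
        future.add_done_callback(lambda f: self._finalize(task_id, f))
        return {"OK": True}  # "submitted", regardless of what happens next

    def _finalize(self, task_id, future):
        exc = future.exception()
        self.task_results[task_id] = {
            "OK": exc is None,
            "Message": str(exc) if exc else "",
        }

    def shutdown(self):
        self.executor.shutdown(wait=True)


def failing_container_launch():
    # Assumed failure, standing in for a container that cannot start.
    raise RuntimeError("container failed to start")


def run_once():
    ce = ToyPoolCE()
    status = "Matched"
    submission = ce.submit_job(1, failing_container_launch)
    # Caller-side error handling: never triggered, because the
    # immediate result is always OK.
    if not submission["OK"]:
        status = "Rescheduled"
    ce.shutdown()
    return status, ce.task_results


if __name__ == "__main__":
    print(run_once())
```

In this toy model the failure is visible only in `task_results`, so any fix would need something to inspect those results and act on them (for example by rescheduling).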
Another thing I am not entirely sure I understand.
It was true at the time the comment was written and was removed in df805d5.
In LHCb we've been seeing jobs getting stuck in Matched when using SingularityCE, instead of being rescheduled, when issues happen launching the container. I suspect it's caused by the interplay between the PoolCE and an inner SingularityCE.