-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the status method in kubernetes provider #1740
Conversation
except Exception: | ||
logger.warning("Failed to poll pod {} status, most likely because pod was terminated".format(jid)) | ||
if self.resources[jid]['status'] is JobStatus(JobState.RUNNING): | ||
phase = 'Unknown' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first time the state changes from Running -> missing, we should be setting some terminal state. Next time status is polled, we should skip these blocks. I don't know what the strategy could do with unknown jobstate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fine to review now @yadudoc . Kube provider only polls those pods with non-terminal states.
@ZhuozhaoLi I'm thinking this might be better going into 1.1. |
@BenGalewsky You might want to take a quick look at this. This impacts funcX. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZZ, this PR looks good to go. It's unclear whether the changes to test_site.py
necessary for this PR. Can you clarify why this was added?
The test config for general providers is a block with 2 nodes, while the config for k8s has 2 blocks and each has only 1 node (one cannot request 2 nodes in 1 block for k8s). This is to ensure there will be two workers before real testing (adding a small sleep to ensure even if the |
Okay, that makes sense. |
This PR modifies the
_status
method in kubernetes provider.It also changes the site test
test_site.py
a bit because in Kubernetes provider, there cant be more than nodes on one block, so instead there are two blocks, each with one node. The second block takes some time to launch, causing the later assertionassert len(pinfo) == 2, "Expected two nodes, instead got {}".format(pinfo)
in the test to fail.