
Fix the status method in kubernetes provider #1740

Merged
merged 8 commits on Feb 8, 2021

Conversation

ZhuozhaoLi
Contributor

@ZhuozhaoLi ZhuozhaoLi commented Jun 8, 2020

This PR modifies the _status method in kubernetes provider.

It also changes the site test test_site.py slightly, because with the Kubernetes provider there can't be more than one node per block. The test config therefore uses two blocks with one node each. The second block takes some time to launch, which caused the later assertion assert len(pinfo) == 2, "Expected two nodes, instead got {}".format(pinfo) in the test to fail.

@ZhuozhaoLi ZhuozhaoLi requested a review from yadudoc June 10, 2020 15:16
except Exception:
logger.warning("Failed to poll pod {} status, most likely because pod was terminated".format(jid))
if self.resources[jid]['status'].state == JobState.RUNNING:
phase = 'Unknown'
Member


The first time the state changes from Running -> missing, we should be setting some terminal state. Next time status is polled, we should skip these blocks. I don't know what the strategy could do with unknown jobstate.

Contributor Author


Should be fine to review now @yadudoc . Kube provider only polls those pods with non-terminal states.
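The skip-terminal-states polling behavior described here can be sketched as follows. This is a simplified, self-contained illustration, not Parsl's actual implementation: the JobState names echo Parsl's, but `poll_statuses` and `poll_pod` are hypothetical stand-ins for the provider's `_status` logic and the Kubernetes API call.

```python
from enum import Enum

class JobState(Enum):
    RUNNING = 1
    COMPLETED = 2
    FAILED = 3
    MISSING = 4

# Pods in these states are never re-polled.
TERMINAL_STATES = {JobState.COMPLETED, JobState.FAILED, JobState.MISSING}

def poll_statuses(resources, poll_pod):
    """Poll only pods whose last known state is non-terminal.

    resources: dict mapping job id -> {'status': JobState}
    poll_pod:  callable(job_id) -> JobState; raises if the pod is gone
    """
    for jid, info in resources.items():
        if info['status'] in TERMINAL_STATES:
            continue  # terminal pods are skipped on every later poll
        try:
            info['status'] = poll_pod(jid)
        except Exception:
            # The pod was most likely terminated out from under us:
            # record a terminal state so the next poll skips this job.
            if info['status'] is JobState.RUNNING:
                info['status'] = JobState.MISSING
    return resources
```

Once a pod transitions from RUNNING to a terminal state (here MISSING), subsequent polls never touch it again, which addresses the Running -> missing concern raised above.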

@yadudoc
Member

yadudoc commented Jun 10, 2020

@ZhuozhaoLi I'm thinking this might be better going into 1.1.

@yadudoc yadudoc modified the milestones: 1.0, 1.1 Jun 10, 2020
@ZhuozhaoLi ZhuozhaoLi changed the title Fix kube status polling Fix the status method in kubernetes provider Jun 18, 2020
@yadudoc
Member

yadudoc commented Dec 11, 2020

@BenGalewsky You might want to take a quick look at this. This impacts funcX.

Member

@yadudoc yadudoc left a comment


ZZ, this PR looks good to go. It's unclear whether the changes to test_site.py are necessary for this PR. Can you clarify why they were added?

@ZhuozhaoLi
Contributor Author

ZZ, this PR looks good to go. It's unclear whether the changes to test_site.py are necessary for this PR. Can you clarify why they were added?

The test config for general providers is a single block with 2 nodes, while the config for k8s has 2 blocks with 1 node each (one cannot request 2 nodes in 1 block on k8s). This ensures there will be two workers before the real testing begins (a small sleep is added in case the scale_out is too slow).
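Rather than a fixed sleep, the wait-for-two-workers step could be written as a bounded poll. This is only a sketch of that idea: `get_worker_count` is a hypothetical callable standing in for however the test would inspect the number of connected workers.

```python
import time

def wait_for_workers(get_worker_count, expected=2, timeout=60.0, interval=1.0):
    """Poll until `expected` workers are connected.

    Returns True once get_worker_count() >= expected, or False if
    `timeout` seconds elapse first (e.g. scale_out is too slow).
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_worker_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A poll like this returns as soon as the second block's worker connects, instead of always paying the full sleep duration, and fails fast with a clear condition when scale-out never completes.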

@yadudoc
Member

yadudoc commented Feb 8, 2021

Okay, that makes sense.
