
[Serve] [Doc] Add a tip for retry mechanism in scale-to-zero #3232

Merged: MaoZiming merged 3 commits into master from serve-scale-to-zero-doc on Feb 27, 2024

Conversation

MaoZiming (Collaborator):

#3194 (comment)

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_compatibility_tests.sh


@cblmemo (Collaborator) left a comment:


Could we explain why we need a retry mechanism?

@MaoZiming (Collaborator, Author):

We mentioned "wait until the replicas are provisioned and ready"?


@Michaelvll (Collaborator) left a comment:


Thanks for adding the tip in the doc @MaoZiming! Left a question about the autoscaler behavior. :)


.. tip::

If the scale-to-zero is set, the clients that access the endpoint should make sure to have a retry mechanism to be able to wait until the replicas are provisioned and ready.

Suggested change:
- If the scale-to-zero is set, the clients that access the endpoint should make sure to have a retry mechanism to be able to wait until the replicas are provisioned and ready.
+ If the scale-to-zero is set, the clients that access the endpoint should make sure to have a retry mechanism to be able to wait until the replicas are provisioned and ready, i.e., starting a new replica when there is zero replica available.

Question: how does our autoscaler handle retrying? If the client keeps retrying failed requests, will those failed requests be considered when calculating the target_num_replicas? We need to think about the case when zero replicas are available and the client keeps retrying, causing the number of requests in the window to become significantly large. Will that cause the autoscaler to scale the service up to max_replicas?

@MaoZiming (Collaborator, Author):

Yes, the failed requests will contribute to calculating the target_num_replicas. It might cause the autoscaler to scale up to max_replicas, but as long as the retry frequency is not too high we should be fine.
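For context on why retried requests matter here, below is a simplified, illustrative sketch of a request-rate autoscaling rule. The formula and the names (num_requests_in_window, target_qps_per_replica, the window length) are assumptions for illustration, not SkyServe's actual implementation; it only shows how every request counted in the window, including client retries, raises the computed target:

```python
import math

def compute_target_num_replicas(num_requests_in_window: int,
                                window_seconds: float,
                                target_qps_per_replica: float,
                                min_replicas: int,
                                max_replicas: int) -> int:
    """Illustrative request-rate autoscaling rule (not SkyServe's exact code).

    Every request observed in the window counts, including client retries,
    which is why aggressive retrying while zero replicas are ready can push
    the target toward max_replicas.
    """
    qps = num_requests_in_window / window_seconds
    target = math.ceil(qps / target_qps_per_replica)
    return max(min_replicas, min(max_replicas, target))

# Example: 600 requests (many of them retries) seen over a 60 s window with
# target_qps_per_replica=2 yields a target of 5 replicas, clamped to the
# [0, 10] range of a scale-to-zero service.
print(compute_target_num_replicas(600, 60.0, 2.0, 0, 10))  # -> 5
```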

Collaborator:

Should we prevent such retried requests from pushing target_num_replicas above 1 when there are no ready replicas?

@MaoZiming (Collaborator, Author):

I still think maybe we should not: SkyServe, as it is now, cannot differentiate between retry requests and actual user requests. When there is no ready replica but requests have accumulated, it is possible that there has been a big increase in real user requests, and we might want to scale to more replicas (even though some of them might be retries) to be safe.
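To round out the tip discussed above, here is a minimal client-side retry sketch with backoff. The endpoint URL, timeout, and backoff values are placeholders/assumptions; in practice the URL would be the address reported for the deployed service. A modest backoff also addresses the concern above about retries inflating the request window:

```python
import time
import requests

def wait_for_service(endpoint: str, max_attempts: int = 20,
                     backoff_seconds: float = 30.0) -> requests.Response:
    """Poll the endpoint until a replica is provisioned and ready."""
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.get(endpoint, timeout=10)
            if resp.ok:
                return resp
        except requests.RequestException:
            pass  # No ready replica yet, or a transient network error.
        # A modest backoff keeps retries from inflating the autoscaler's
        # request count while the first replica is still provisioning.
        time.sleep(backoff_seconds)
    raise RuntimeError(f'Service not ready after {max_attempts} attempts')

# Example (placeholder URL; substitute your service's reported endpoint):
# wait_for_service('http://1.2.3.4:30001/health')
```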

MaoZiming merged commit 7484d98 into master on Feb 27, 2024
19 checks passed
MaoZiming deleted the serve-scale-to-zero-doc branch on Feb 27, 2024 at 19:06