-
Notifications
You must be signed in to change notification settings - Fork 539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Exponential Backoff for Retry Until Up #2821
Remove Exponential Backoff for Retry Until Up #2821
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this @rohanvaidya45! It might be better to remove the exponential backoff only and keep the random jitter.
sky/backends/cloud_vm_ray_backend.py
Outdated
@@ -2859,7 +2859,7 @@ def _provision( | |||
# TODO(suquark): once we have sky on PyPI, we should directly | |||
# install sky from PyPI. | |||
local_wheel_path, wheel_hash = wheel_utils.build_sky_wheel() | |||
backoff = common_utils.Backoff(_RETRY_UNTIL_UP_INIT_GAP_SECONDS) | |||
jitter = common_utils.Jitter(_RETRY_UNTIL_UP_INIT_GAP_SECONDS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just use the Backoff
with max_backoff_factor=1
?
@@ -2859,7 +2859,9 @@ def _provision( | |||
# TODO(suquark): once we have sky on PyPI, we should directly | |||
# install sky from PyPI. | |||
local_wheel_path, wheel_hash = wheel_utils.build_sky_wheel() | |||
backoff = common_utils.Backoff(_RETRY_UNTIL_UP_INIT_GAP_SECONDS) | |||
backoff = common_utils.Backoff( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth a comment on why we disable exponential backoff. Something like:
The most frequent reason for the failure of a provision request is resource unavailability instead of rate limiting; to make users wait shorter, we do not make the backoffs exponential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this @rohanvaidya45! LGTM.
# limiting; to make users wait shorter, we do not make | ||
# backoffs exponential. | ||
backoff = common_utils.Backoff( | ||
initial_backoff=_RETRY_UNTIL_UP_INIT_GAP_SECONDS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just tried a few clouds and it seems we can also reduce this _RETRY_UNTIL_UP_INIT_GAP_SECONDS
to 30
. : )
* remove exponential backoff * add back random jitter * format * use backoff * format * add comment * rerun checks * format * pylint * form * change to 30
Looks to resolve #2805
Replaces the use of exponential backoff with the static initial value each retry
Tested:
bash format.sh
pytest tests/test_smoke.py