Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use exponential backoff for DNS updates #10996

Merged
merged 1 commit into from
Mar 9, 2021

Conversation

hakman
Copy link
Member

@hakman hakman commented Mar 9, 2021

The AWS Route53 API is severely rate-limited and makes periodic e2e tests fail. This may happen to users also, as they see DNS records not being updated quite often lately.
Not sure if AWS changed something or we're just noticing this, but there's not much that can be done except do this less often.

/cc @rifelpet @olemarkus

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 9, 2021
@olemarkus
Copy link
Member

Shouldn't we use some exponential backoff or something for this operation?

If the ratelimit is caused by external factors, say that there are many more tests running in the same region in the same account (by other installers/projects?), increasing to 30 just makes matter worse. We would try less often, but still risk being pulled into a rate limit.

@hakman
Copy link
Member Author

hakman commented Mar 9, 2021

I agree that exponential backoff is a better idea, though I don't get how increasing to 30s makes the matter worse.
In any case, 5s between runs is totally unreasonable in real life.

@olemarkus
Copy link
Member

If a user has some other system installed that triggers the ratelimit this change makes things worse because it will fail on just as many calls, but we make the calls less frequently. So it will need a lot more time to get a successful attempt through.

@hakman
Copy link
Member Author

hakman commented Mar 9, 2021

If a user has another system installed that triggers the rate limit somehow, nothing will help, not even exponential backoff will help. This way at least we reduce the chance of this happening because of too many requests.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 9, 2021
@hakman
Copy link
Member Author

hakman commented Mar 9, 2021

/retest

@hakman hakman force-pushed the dns-update-interval branch 2 times, most recently from c8cf7d2 to d468bc2 Compare March 9, 2021 10:04
@hakman hakman changed the title Increase DNS update interval to 30 seconds Use exponential backoff for DNS updates Mar 9, 2021
@hakman
Copy link
Member Author

hakman commented Mar 9, 2021

@olemarkus Does this look more appealing now?

@hakman hakman force-pushed the dns-update-interval branch from d468bc2 to 04aca04 Compare March 9, 2021 10:07
Copy link
Member

@olemarkus olemarkus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly. Thanks for this, @hakman

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 9, 2021
@hakman
Copy link
Member Author

hakman commented Mar 9, 2021

Thanks @olemarkus. I want to fix one small issue before merging.
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 9, 2021
@hakman hakman force-pushed the dns-update-interval branch from 04aca04 to 057397f Compare March 9, 2021 11:40
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2021
@hakman hakman removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 9, 2021
@hakman hakman force-pushed the dns-update-interval branch from 057397f to 631b43b Compare March 9, 2021 12:07
@rifelpet
Copy link
Member

rifelpet commented Mar 9, 2021

There should be random jitter added to the backoff to avoid the thundering herd problem

@hakman
Copy link
Member Author

hakman commented Mar 9, 2021

There should be random jitter added to the backoff to avoid the thundering herd problem

I don't think we have many updates in parallel from the same cluster. Other clusters should not matter that much as they are already quite random.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: olemarkus, rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2021
@k8s-ci-robot k8s-ci-robot merged commit 3b7ca6e into kubernetes:master Mar 9, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 9, 2021
@hakman hakman deleted the dns-update-interval branch March 9, 2021 13:32
k8s-ci-robot added a commit that referenced this pull request Mar 11, 2021
…-upstream-release-1.19

Automated cherry pick of #10996: Use exponential backoff for DNS updates
k8s-ci-robot added a commit that referenced this pull request Mar 11, 2021
…-upstream-release-1.20

Automated cherry pick of #10996: Use exponential backoff for DNS updates
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants