Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dns-controller: allow configuring DNS update interval #5759

Conversation

diversario
Copy link
Member

Given enough DNS records, and with update batching, Route53 will throttle API calls. We set this in our forked DNS controller to 30 seconds to avoid throttling.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 6, 2018
@diversario diversario force-pushed the dns-controller-configurable-update-interval branch from 3552c2e to 39501bc Compare September 6, 2018 18:44
@diversario diversario force-pushed the dns-controller-configurable-update-interval branch from 39501bc to 7125287 Compare September 6, 2018 19:30
@idealhack
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 7, 2018
@chrisz100
Copy link
Contributor

I am wondering if this is the way we should go about this problem... @justinsb @chrislovecnm any thoughts?

@diversario
Copy link
Member Author

This is definitely a workaround and if there's a better solution I'm all for that.

AWS R53 docs say Five requests per second per AWS account. but my understanding is that this isn't exactly how it happens – that they use a token bucket approach, and in certain configurations DNS updates become effectively not possible with the current DNS controller config. In our situation, we have several clusters (envs) in the same AWS account. All of them have a large number of records. We had to lower our batch size to keep updates under 32000 chars, and that, in turn, caused the number of requests go above the throttling limit. Lower update frequency helps us stay under the limit.

Perhaps a better approach is to space requests a bit but I think that would require modification in the R53 AWS provider.

@geojaz
Copy link
Member

geojaz commented Sep 14, 2018

I think this makes sense- it's fine by me.

But why don't we let folks configure the interval AND set it to 30 seconds by default? Is 30 seconds an unreasonably long interval for "normal" circumstances? Rate limiting is a constant problem on AWS, I'd rather be a little bit restrained than try to hammer the API and unexpectedly break things.

@diversario
Copy link
Member Author

Well, I personally have nothing against 30s by default, but I'm saying this without understanding why it is 5s currently. If 5s was chosen arbitrarily then I see no problem with increasing the delay.

@justinsb
Copy link
Member

30 seconds would mean at least a 25 second slow-down on cluster-bring-up (vs 5 seconds), and I bet there's more than one (etcd & apiserver records).

I'd rather keep this defaulting to ~ 5 seconds; I certainly agree having a better strategy would be preferable.

I think the challenge is that the rate limit isn't easily known, so we instead back-off when we see it. If you have logs you could share that might be helpful (e.g. on slack). I know that we could be smarter about batch-size to try to pack each batch full (i.e. estimate / compute the record size).

AFAICT, you haven't exposed the flag in kops itself? I guess that means you are manually setting it; and FYI I think that means that kops will clear the flag it every time we update dns-controller. But that can be a separate PR!

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 21, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: diversario, justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 21, 2018
@k8s-ci-robot k8s-ci-robot merged commit 19575ec into kubernetes:master Sep 21, 2018
@diversario diversario deleted the dns-controller-configurable-update-interval branch September 24, 2018 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants