change gossip dns conn limit by ENV #5077
Conversation
/assign @andrewsykim
@justinsb why do we even have this?? How about we bump to 1000?
protokube/pkg/gossip/mesh/gossip.go (Outdated)
if gossipDnsConnLimit != "" { | ||
limit, err := strconv.Atoi(gossipDnsConnLimit) | ||
if err != nil { | ||
return nil, fmt.Errorf("cannot parse env GOSSIP_DNS_CONN_LIMIT value: %v", gossipDnsConnLimit) |
Also add err to the message.
ok, thanks.
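
To make the pattern under discussion concrete, here is a minimal, self-contained sketch of reading the limit from the environment, with err included in the error message as suggested. The 64 default and the GOSSIP_DNS_CONN_LIMIT name come from this thread; the function name and the main wrapper are illustrative, not the actual kops code in protokube/pkg/gossip/mesh/gossip.go:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultConnLimit mirrors the 64-connection default discussed in this PR.
const defaultConnLimit = 64

// connLimitFromEnv returns the gossip connection limit, letting the
// GOSSIP_DNS_CONN_LIMIT environment variable override the default.
// (Illustrative sketch only.)
func connLimitFromEnv() (int, error) {
	v := os.Getenv("GOSSIP_DNS_CONN_LIMIT")
	if v == "" {
		return defaultConnLimit, nil
	}
	limit, err := strconv.Atoi(v)
	if err != nil {
		// Include err in the message, per the review feedback above.
		return 0, fmt.Errorf("cannot parse env GOSSIP_DNS_CONN_LIMIT value %q: %v", v, err)
	}
	return limit, nil
}

func main() {
	limit, err := connLimitFromEnv()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("gossip connection limit:", limit)
}
```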
protokube/pkg/gossip/mesh/gossip.go (Outdated)
```go
	if err != nil {
		return nil, fmt.Errorf("cannot parse env GOSSIP_DNS_CONN_LIMIT value: %v", gossipDnsConnLimit)
	}
	connLimit = limit
```
This line can just be merged with the one above:
`connLimit, err := strconv.Atoi(gossipDnsConnLimit)`
If we put it all in one statement, := creates a new variable in the inner scope, and the compiler outputs the following error:

protokube/pkg/gossip/mesh/gossip.go:47:3: connLimit declared and not used

so I split it :)
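
For anyone skimming this later, a minimal standalone sketch of that scoping pitfall (variable names mirror the diff above, but the snippet is illustrative, not the actual kops code):

```go
package main

import (
	"fmt"
	"strconv"
)

func main() {
	connLimit := 64 // outer default, as in the diff
	gossipDnsConnLimit := "200"

	if gossipDnsConnLimit != "" {
		// Works: Atoi's result goes through a temporary, then plain
		// assignment updates the outer connLimit.
		limit, err := strconv.Atoi(gossipDnsConnLimit)
		if err == nil {
			connLimit = limit
		}

		// Broken: := would declare a *new* connLimit scoped to this
		// block, shadowing the outer one, so the compiler reports the
		// inner variable as "declared and not used":
		//   connLimit, err := strconv.Atoi(gossipDnsConnLimit)
	}

	fmt.Println("connLimit:", connLimit) // prints 200
}
```

Splitting the declaration, as done in this PR, is the idiomatic way to assign to an outer variable from inside a block.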
Yes, I really did consider making a PR that just changes the default conn limit from 64 to some larger number, for example 300, and I would prefer that :) I think the problem with a lot of nodes using gossip DNS may be related to this issue.
1000 would still be a limit :-) I thought it was the number of peers; I didn't realize it was anything more than that. Do we know that this limit is in fact a problem? I'd be inclined to just bump it to 200 - I think that's what weave did.
@Yanci what have you tested?
@chrislovecnm sorry for the late response :) Yes, we set up a k8s cluster with gossip DNS enabled, but as we started more nodes than the limit (64), the new nodes didn't show up. So I think the total number of nodes in a k8s cluster can't exceed that limit. Is that correct?

@justinsb Yes, 1000 would still be a limit; we can bump it to 200. 64 is too small for a medium k8s cluster :) Thanks.
I ran into this issue today! 65 nodes was the maximum! Is there any way we can bypass this at the moment? How would one pass this arg through to new nodes too?
When building this PR myself, k8s recognized more than 64 nodes, and I executed the following commands.
/ok-to-test
LGTM, this has impacted us as well. A little insane that this is an issue :(
So I don't think this should be an env var that has to be set. I'm going to merge this PR, but I'm then going to send a PR that tries to set it automatically; I'd like to at least remove the need to set the env var every time you call kops. My understanding is that 64 nodes shouldn't be a hard limit; rather, each node will limit itself to 64 peers. As long as we don't form two disjoint segments, we should be able to go to much bigger sizes. But I'll test it out!

/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: justinsb, yancl. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/retest
This simply turns off gossip connection limits, so we shouldn't ever have to manually configure them. Follow on to kubernetes#5077
I haven't been able to reproduce the problem, but I agree that it should happen! But #5486 should just remove the limit entirely.

Edit: I was able to reproduce it. I started at 60 nodes + 1 master and then went to 80 nodes. I think it happens if you have 64 members that form a connected clique and refuse to admit anyone else.
I understand this issue may be long forgotten, but it might help others who land here, hence this question. If the connection limit has been removed entirely, then the original problem this PR (or the GOSSIP_DNS_CONN_LIMIT env variable) was trying to fix is also fixed. But explicitly setting GOSSIP_DNS_CONN_LIMIT to some value less than the total number of nodes could lead to problems (running into a clique), right?
So that we can run more than 64 nodes :)