-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DigitalOcean] Add load balancer support for master HA #8237
Conversation
Hi @srikiz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
3dac7a1
to
54292fb
Compare
/ok-to-test |
@timoreimann - Please review when you get a chance, thanks ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good -- left a bunch of improvement suggestions.
|
||
lbSpec := b.Cluster.Spec.API.LoadBalancer | ||
if lbSpec == nil { | ||
// Skipping API ELB creation; not requested in Spec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ELB is probably from AWS? If so, we should just call it LB in my opinion.
Similar for several other occurrences in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will do.
return fmt.Errorf("unhandled LoadBalancer type %q", lbSpec.Type) | ||
} | ||
|
||
loadbalancerName := "api-" + strings.Replace(b.ClusterName(), ".", "-", -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we could extract the result of the Replace
call into a variable for reusage on the next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
|
||
// Create LoadBalancer for API ELB | ||
var loadbalancer *dotasks.LoadBalancer | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to have the extra scope?
To me it looks like we can get rid of it and use :=
to define loadbalancer
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, updated.
pkg/resources/digitalocean/cloud.go
Outdated
// Note that this must match Digital Ocean's lb name | ||
klog.V(2).Infof("Querying DO to find Loadbalancers for API (%q)", cluster.Name) | ||
|
||
loadBalaners, _, err := c.LoadBalancers().List(context.TODO(), nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in the first variable name (missing c).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is List()
from godo and thus needs paging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that was the godo api to retrieve all the load balancers, so needed paging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed typo as well.
_, err := c.Client.LoadBalancers.Delete(context.TODO(), lb.ID) | ||
|
||
if err != nil { | ||
return fmt.Errorf("failed to delete load balancer with name %s", lb.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include err
in the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
} | ||
|
||
for _, loadbalancer := range loadbalancers { | ||
if loadbalancer.Name == fi.StringValue(lb.Name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if it's also possible to look up an LB by ID? That would allow us to do a single lookup using a different godo method, with no paging necessary whatsoever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really helps, thanks for letting me know. Changed the logic to use LB ID instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should handle the case where the names don't match, e.g., logging.
I'm not sure I understand why we need the extra check though -- can we think of a case where the ID alone is insufficient?
} | ||
|
||
klog.Errorf("Error fetching IP Address for lb Name=%s", fi.StringValue(lb.Name)) | ||
time.Sleep(10 * 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
30
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed that to 10 seconds with time.Sleep(10 * time.Second)
return nil, errors.New("IP Address is still empty.") | ||
} | ||
|
||
func is_ipv4(host string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICS, another way to implement this is to invoke net.ParseIP
followed by To4
while checking that we always get a non-nil result on each step. That is,
ip := net.ParseIP(host)
if ip == nil {
return false
}
return ip.To4() != nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, updated.
return nil, errors.New("IP Address is still empty.") | ||
} | ||
|
||
func is_ipv4(host string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function name style should follow Go, i.e., be called isIPv4
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
address := loadBalancer.IP | ||
|
||
if len(address) > 0 && is_ipv4(address) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can drop the len(address) > 0
check part, it's implicitly checked by is_ipv4
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
54292fb
to
d856dac
Compare
@timoreimann - I have incorporated all the comments. Please have a look when you get a chance. Thanks ! |
/test pull-kops-e2e-kubernetes-aws |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good -- just two smallish things left.
pkg/resources/digitalocean/cloud.go
Outdated
// Note that this must match Digital Ocean's lb name | ||
klog.V(2).Infof("Querying DO to find Loadbalancers for API (%q)", cluster.Name) | ||
|
||
loadBalancers, _, err := c.LoadBalancers().List(context.TODO(), nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need paging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I missed that. Updated it.
} | ||
|
||
for _, loadbalancer := range loadbalancers { | ||
if loadbalancer.Name == fi.StringValue(lb.Name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should handle the case where the names don't match, e.g., logging.
I'm not sure I understand why we need the extra check though -- can we think of a case where the ID alone is insufficient?
Add load balancer fi tasks Add load balancer builder for DO Fix go imports Implement FindIPAddress functionality Add load balancer api ingress status calls Add error checks for FindIPAddress Add delete LB option Update load balancer delete logic Revert make file changes revert utils code changes Revert NewDOCloud changes Remove minor code comments Update with gomod for dependencies
1c5c54f
to
e440397
Compare
@timoreimann - I have incorporated the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good, thanks for cleaning it up further.
I caught just a few more smallish thing; after those, I think we should be ready.
return nil, fmt.Errorf("load balancer service get request returned error %v", err) | ||
} | ||
|
||
// do another check to double-sure we are talking to the right load balancer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment looks like it's obsolete by now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
// Loadbalancer = nil if not found | ||
return nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if we do
if fi.StringValue(lb.ID) == "" {
// Loadbalancer = nil if not found
return nil, nil
}
cloud := c.Cloud.(*digitalocean.Cloud)
lbService := cloud.LoadBalancers()
loadbalancer, _, err := lbService.Get(context.TODO(), fi.StringValue(lb.ID))
[...]
then we do not need to intend the if-branch for the case where the LB ID exists. The code also seems easier to read because the check for the empty LB ID and the action we take are not far apart anymore.
(I also moved the cloud
and lbService
variables next to their usage -- they are not needed unless we have an ID.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, that's a good suggestion. Updated accordingly.
loadBalancerService := cloud.LoadBalancers() | ||
|
||
klog.V(10).Infof("Find IP address for load balancer ID=%s", fi.StringValue(lb.ID)) | ||
loadBalancer, _, err := loadBalancerService.Get(context.TODO(), fi.StringValue(lb.ID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we certain at this point that we have an LB ID? Genuine question as I don't know the kops work flow in this context that well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes @timoreimann - we are certain we have an LB ID, but not certain about the LB IP Address. FindAddress is only called after RenderDO is invoked - we create a new LB Request in RenderDO that returns a LB ID. We use this LB ID to find if the IP Address is available.
return &address, nil | ||
} | ||
|
||
klog.Errorf("Error fetching IP Address for lb Name=%s", fi.StringValue(lb.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we match the description with the error below and mention that we are sleeping so that users aren't surprised by the delay, e.g.:
const lbWaitTime = 10 * time.Second
klog.Errorf("IP address for LB %s not yet available -- sleeping %s", fi.StringValue(lb.Name), lbWaitTime)
I also extracted the wait time into a const so that we can reuse it in the actual time.Sleep
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, updated.
return &address, nil | ||
} | ||
|
||
klog.Errorf("Error fetching IP Address for lb Name=%s", fi.StringValue(lb.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do a klog.Warnf
here? The delay should be expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
klog.Errorf("Error fetching IP Address for lb Name=%s", fi.StringValue(lb.Name)) | ||
time.Sleep(10 * time.Second) | ||
|
||
klog.V(10).Infof("Sleeping for 10 seconds") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd drop this one given my improvement suggestion for the message above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, done.
return ip.To4() != nil | ||
} | ||
|
||
// terraformVolume represents the digitalocean_volume resource in terraform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one looks like a trailing comment that shouldn't be here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me, thanks a ton Sri.
LGTM (though it still needs approval from a kops maintainer / Kubernetes org member).
The E2E failures are unrelated, so we can move forward with this regardless. Thanks! /test pull-kops-e2e-kubernetes-aws /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rifelpet, srikiz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-kops-e2e-kubernetes-aws |
…pstream-release-1.17 Automated cherry pick of #8237: Initial changes for load balancer task
This PR adds load balancer support for digital ocean cloud provider.
It would add a new load balancer based on a cluster master tag.
Incorporated the new load balancer address as part of the kuber config file.
Tested with 3 masters and a load balancer and was able to access the kubernetes cluster via the load balancer address from outside.
Deleting the cluster also deletes the load balancer.
Created a new cluster with the below command
./kops create cluster --cloud=digitalocean --name=dev5.k8s.local --networking=cilium --api-loadbalancer-type=public --master-count=3 --networking=cilium --zones=tor1 --ssh-public-key=~/.ssh/id_rsa.pub -v 10 --yes
Will be creating a new PR to add documentation for DO's support to multi-master with gossip based clustering.