Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubelet reports secondary InternalIP in AWS with multiple ENIs #61921

Closed
micahhausler opened this issue Mar 30, 2018 · 41 comments
Closed

Kubelet reports secondary InternalIP in AWS with multiple ENIs #61921

micahhausler opened this issue Mar 30, 2018 · 41 comments
Labels
area/provider/aws Issues or PRs related to aws provider kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one.

Comments

@micahhausler
Copy link
Member

micahhausler commented Mar 30, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug
/sig aws

What happened:
When a node gets a second ENI (elastic network interface) attached to it, the kubelet PATCH's up to the API server all IP's for all network interfaces on the node (it gets this from the EC2 metadata) The payload might look like this:

{
  "status": {
    "$setElementOrder/addresses": [
      {
        "type": "InternalIP"
      },
      {
        "type": "InternalIP"
      },
      {
        "type": "InternalIP"
      },
 ... 
      {
        "type": "InternalIP"
      },
      {
        "type": "InternalDNS"
      },
      {
        "type": "ExternalDNS"
      },
      {
        "type": "Hostname"
      }
    ],
    "addresses": [
      {
        "address": "10.0.116.30", # A secondary IP
        "type": "InternalIP"
      },
      {
        "address": "10.0.106.136", # A secondary IP
        "type": "InternalIP"
      },
      {
        "address": "10.0.110.215", # A secondary IP
        "type": "InternalIP"
      },
...
      {
        "address": "10.0.104.85", # Correct primary IP
        "type": "InternalIP"
      },
      {
        "address": "10.0.103.102",  # A secondary IP, the one incorrectly reported for the node
        "type": "InternalIP"
      }
    ]
  }
}

When trying to detect if the node has changed, the node controller manager de-duplicates addresses of the same Type from each node, picking the last one the in the list the Kubelet reports.

What you expected to happen:
The eth0 IP should be reported as the PrivateIP address on the node's .status.addresses[] list.

How to reproduce it (as minimally and precisely as possible):
Create a new cluster (with kops), install the AWS CNI provider, and check the node's InternalIP

kubectl get no -o json | jq .items[].status.addresses

You'll get something like

[
  {
    # NOTE: Different than the IP in the InternalDNS name
    "type": "InternalIP",
    "address": "10.0.103.102"
  },
  {
    "type": "ExternalIP",
    "address": "53.11.22.33"
  },
  {
    "type": "InternalDNS",
    "address": "ip-10-0-104-85.us-west-2.compute.internal"
  },
  {
    "type": "ExternalDNS",
    "address": "ec2-52-11-22-33.us-west-2.compute.amazonaws.com"
  },
  {
    "type": "Hostname",
    "address": "ip-10-0-104-85.us-west-2.compute.internal"
  }
]

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.9.x
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Amazon Linux
  • Kernel (e.g. uname -a): 4.9.85-47.59.amzn2.x86_64

Other reports:

Maybe related to #42125

Since the Node controller only keeps the last InternalIP address in the PATCH, can we have the EC2 metadata lookup query http://169.254.169.254/latest/meta-data/local-ipv4 for the InternalIP? There may be something I'm missing, and I'd love to get pointed in the right direction. If this is unintentional, is this something that we could get a fix for backported onto the 1.9 branch? I'm happy to get a patch out for this.

ping @justinsb @chrislovecnm

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/aws labels Mar 30, 2018
@micahhausler micahhausler changed the title Kubelet reports wrong InternalIP in AWS with multiple ENIs Kubelet reports secondary InternalIP in AWS with multiple ENIs Apr 2, 2018
@mtaufen
Copy link
Contributor

mtaufen commented Apr 3, 2018

@cheftako would you know anything about this?

/cc @bowei @cheftako

@micahhausler
Copy link
Member Author

@cheftako Do you have any insight into why the apiserver is deduplicating node addresses of the same type?

@rtripat
Copy link
Contributor

rtripat commented Apr 11, 2018

Reading through a couple of places (below) in API server, it looks like it assumes there is exactly 1 Address per NodeAddressType for a kubelet.

  1. How preferred node address is determined from Node status object
  2. How node address change is detected in node controller

The issue and corresponding PR where the change was made in kubelet to add IPs from all ENIs attached to an EC2 instance. If my understanding above is right, did this PR result in conflict with that assumption leading to this behavior?

@micahhausler
Copy link
Member Author

@wlan0 Do you know anything about this one?

@wlan0
Copy link
Member

wlan0 commented Apr 11, 2018

ping @nckturner

@nckturner
Copy link
Contributor

nckturner commented Apr 11, 2018

@wlan0 @cheftako Our question is: should kubelet be reporting exactly 1 address for each NodeAddressType? When there are multiple interfaces on a node, kubelet (the AWS cloud provider code) sends a list of addresses, which (I think) leads to the node controller picking 1 without knowing which is the correct one to choose. Basically we think #50112 leads to incorrect behavior.

@cheftako
Copy link
Member

Cloud node controller seems to assume this behavior when it computes whether an IP has changed https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cloud/node_controller.go#L444. I believe this is corresponds to https://github.com/kubernetes/kubernetes/blob/master/pkg/util/node/node.go#L81 and https://github.com/kubernetes/kubernetes/blob/master/pkg/util/node/node.go#L99. So right now we do seem to have some baked in assumptions in various parts of the system about a node having 1 IP per Type. Beyond that I would suggest talking to sig-networking. Maybe @bowei can help?

@nckturner
Copy link
Contributor

Yeah, that's what we found as well. I think in the original PR the user was trying to specify a secondary IP with the --node-ip flag which was failing. An ideal fix would allow --node-ip to use any IP on the instance, but in either case kubelet would not send a list of addresses, only the default private IP or the IP specified by --node-ip.

@micahhausler
Copy link
Member Author

Unless anyone has some strong objections, what we'd like to do is go back to using the local-ipv4 from the ec2-metadata, but still let users specify a secondary ip for the --node-ip flag (as #44686 asked for). I'll create a PR next week

@andrewsykim
Copy link
Member

Our question is: should kubelet be reporting exactly 1 address for each NodeAddressType? When there are multiple interfaces on a node, kubelet (the AWS cloud provider code) sends a list of addresses, which (I think) leads to the node controller picking 1 without knowing which is the correct one to choose. Basically we think #50112 leads to incorrect behavior.

I still don't have much context to this issue yet but it seems like setting multiple addresses of the same type on node status is the wrong direction as some people have already mentioned. For handling multiple private addresses that can be set on the node interface we should either create new address types (although I prefer if we don't) or have whatever IP provided by --node-ip take precedence. @micahhausler no objections from me to push forward with your PR to just use local-ipv4 but it would be good to get feedback from @jlzhao27 in case we are missing some context here.

@jlzhao27
Copy link
Contributor

Ya this seems fine. I agree with @nckturner's comment above. The exact issue I ran into was that the old code did not allow the user to use a secondary ENI with the --node-ip flag since it was not being parsed by the Kubelet at all. It seems like my fix broke some other assumptions, maybe the change here is to fix the code path for --node-ip and make the kubelet report either the user defined ip or the default internal ec2 instance ip.

@andrewsykim
Copy link
Member

andrewsykim commented Apr 24, 2018

An ideal fix would allow --node-ip to use any IP on the instance, but in either case kubelet would not send a list of addresses, only the default private IP or the IP specified by --node-ip.

^ it seems like we override all the node address types if the kubelet sets --node-ip which seems wrong to me. I've opened this PR #63101 to address that (need to fix tests still). We may want a follow up PR that will do a better job of finding what the actual --node-ip is (based on type?)

@andrewsykim
Copy link
Member

andrewsykim commented Apr 24, 2018

@jlzhao27 thanks for the quick reply, seems like steps forward are:

  • use ec2 metadata local-ipv4 by default as node address InternalIP (@micahhausler you said you can do this one?)
  • override the InternalIP address type on node controller if --node-ip is set on the kubelet
  • maybe modify --node-ip to specify the exact address type we want to override

@micahhausler
Copy link
Member Author

micahhausler commented Apr 24, 2018

Here's a summary of where I'm at with this.

The kubelet informs the API server of it's addresses in the setNodeAddress() method. If there is a CloudProvider (set by the --cloud-provider flag), the kubelet calls the cloud interface's NodeAddresses() method which is implemented by the AWS cloud provider here.

PR #50112 changed the AWS implementation of NodeAddresses() by querying the EC2 metadata for all IP's on all network interfaces and returning the list, rather than just getting the EC2 metadata's local-ipv4. This was done to allow an operator to use a secondary ENI with a single IP as the kubelet address. This is a legitimate case, where the operator may attach a secondary ENI with one assigned IP and wish to use that as the Kubelet's IP.

When using the AWS CNI plugin, the secondary ENI with multiple IPs are not used by the node itself, but assigned to individual pods. In this case, the kubelet doesn't get enough information from the EC2 metadata about which secondary ENI's or IPs are actually used by the node itself, and ends up reporting all IP's (in the case of an m4.large, that is 10 IP's per ENI) to the API server. The API server does a reduction somewhere along the line where it only accepts one NodeAddress per type (types being InternalIP, ExternalIP, InternalDNS, ExternalDNS) and seems to pick the last one in the list, which is often the wrong IP. (The list comes from the enumeration of IP's for each ENI from the EC2 metadata.) When querying the API server for the node's .status.address list, the InternalIP is one of the pod's IPs, which the kubelet is not listening on. This incorrect address causes any calls back to the node (kubectl exec/logs/proxy) to fail.

Our current workaround is to specify the --node-ip flag on the kubelet to the correct IP, but this has 2 issues. First, it requires anyone using the AWS CNI plugin to use the --node-ip flag in order to ensure the correct node IP. This is inconvenient, and requires an out-of-band scripting step. Second, when specifying a node IP, ExternalDNS, ExternalIP, and InternalDNS node addresses are no longer reported. This is information we don't want to loose, as many workloads and use-cases need to know this information.

The kubelet logic around validating the --node-ip requires that the operator-provided IP be included in the list that the cloud provider NodeAddresses() returns. This protects the node from reporting an IP it shouldn't be requesting, but drops the ExternalDNS, ExternalIP, and InternalDNS.

As far as solutions, there are 3 things I've considered.

  1. In the AWS implementation of NodeAddresses(), make sure that the last address of type InternalIP is the primary ENI's IP (the eth0). This feels like a hack and relies on observed behavior in the API.

  2. Make the kubelet return ExternalIP and ExternalDNS addresses even when --node-ip is specified. This breaks the current behavior of --node-ip, but we can make sure the IP specified is the right one.

  3. Not validate that the supplied --node-ip is in the NodeAddresses() list, and use local-ipv4 by default.

Short of those 3 solutions, I'm not sure how to allow for a user to specify a secondary ENI as the node IP and how to to correctly report the primary IP when using the AWS CNI provider. Am I missing anything big on option 3?

@andrewsykim
Copy link
Member

@micahhausler thanks for the comprehensive update! Would you be able to join the cloud provider WG meeting today again so we can hash out some final details? I think we're close to a workig solution :).

@micahhausler
Copy link
Member Author

I'll be there!

@nckturner
Copy link
Contributor

nckturner commented Apr 25, 2018

Summary of the conversation in WG Cloud Provider:

  • Fix AWS cloud provider to only report 1 Address of each type by default (@micahhausler). Is this possible to do and still allow a --node-ip to use a secondary IP?
  • Modify kubelet to allow the reporting ExternalDNS, ExternalIP, InternalDNS when --node-ip is set.

@micahhausler
Copy link
Member Author

Created #63158 around the --node-ip issue

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 24, 2018
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 23, 2018
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@jnicholls
Copy link

It seems like getting the local-ipv4 metadata was the correct approach for the use case of API server communication, with --node-ip as the override...while network/interfaces/macs/<mac>/local-ipv4s is fine for the overall NodeAddresses() use case. Rather than just choosing one of the IPs from that list blindly, since the order of interfaces can switch in some circumstances (the assigned MAC for the ENI can change when the ENIs are attached to a replacement node). Are we concluding that folks using cluster orchestrators like kops must automate the means for passing --node-ip explicitly to have a deterministic ENI selection?

@jnicholls
Copy link

jnicholls commented Jan 28, 2020

  1. Not validate that the supplied --node-ip is in the NodeAddresses() list, and use local-ipv4 by default.

I think this option (particularly the part of using local-ipv4 by default) would have been a good approach for avoiding the out-of-order issue when MACs flip around, and avoids the need to script up a solution for specifying --node-ip that would ultimately (in this case at least) reference local-ipv4.

@jnicholls
Copy link

/reopen
/remove-lifecycle rotten

@k8s-ci-robot
Copy link
Contributor

@jnicholls: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 28, 2020
@bowei
Copy link
Member

bowei commented Jan 28, 2020

/reopen
/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot reopened this Jan 28, 2020
@k8s-ci-robot
Copy link
Contributor

@bowei: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

@micahhausler: There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 28, 2020
@k8s-ci-robot
Copy link
Contributor

@bowei: The label(s) sig/aws cannot be applied, because the repository doesn't have them

In response to this:

/sig aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

@bowei: The label(s) sig/provider-aws cannot be applied, because the repository doesn't have them

In response to this:

/sig provider-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

@bowei: The label(s) sig/cloud-provider-aws cannot be applied, because the repository doesn't have them

In response to this:

/sig cloud-provider-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

1 similar comment
@k8s-ci-robot
Copy link
Contributor

@bowei: The label(s) sig/cloud-provider-aws cannot be applied, because the repository doesn't have them

In response to this:

/sig cloud-provider-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

@bowei: The label(s) sig/provider-aws cannot be applied, because the repository doesn't have them

In response to this:

/sig provider-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

@bowei: The label(s) sig/aws cannot be applied, because the repository doesn't have them

In response to this:

/sig aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bowei
Copy link
Member

bowei commented Jan 28, 2020

/area provider/aws

@k8s-ci-robot k8s-ci-robot added the area/provider/aws Issues or PRs related to aws provider label Jan 28, 2020
@k8s-ci-robot
Copy link
Contributor

@bowei: The label(s) area/ cannot be applied, because the repository doesn't have them

In response to this:

/area provider/aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bowei
Copy link
Member

bowei commented Jan 28, 2020

:sigh:

@wongma7
Copy link
Contributor

wongma7 commented Feb 14, 2020

@jnicholls

Rather than just choosing one of the IPs from that list blindly, since the order of interfaces can switch in some circumstances (the assigned MAC for the ENI can change when the ENIs are attached to a replacement node).

It doesn't choose blindly, it's supposed to sort by device-number https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/legacy-cloud-providers/aws/aws.go#L1409 so eth0 should be the default.

Feel free to create a new issue to discuss further, IMO it's a bit confusing to continue discussion here as technically the original issue has been fixed.

@wongma7
Copy link
Contributor

wongma7 commented Feb 14, 2020

/close

@k8s-ci-robot
Copy link
Contributor

@wongma7: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/provider/aws Issues or PRs related to aws provider kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one.
Projects
None yet
Development

Successfully merging a pull request may close this issue.