Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tag CLUSTER_ID on Node EC2 instances to enable cni-metrics-helper #763

Closed
StevenACoffman opened this issue Apr 29, 2019 · 6 comments
Closed

Comments

@StevenACoffman
Copy link
Contributor

StevenACoffman commented Apr 29, 2019

Why do you want this feature?
The Troubleshooting Guide documents using an AWS provided tool, the cni-metrics-helper which can show aggregated ENIs and IPs information at the cluster level. These metrics can optionally be pushed to cloudwatch.

However, the tool relies on the EC2 Instance tag CLUSTER_ID or defaults to eks-cluster. This isn't very useful if I have multiple EKS clusters and wish to monitor different clusters for available ENI and IP address capacity.

While I can specify ec2 tags in node groups, I would like eksctl to tag ec2 instances with CLUSTER_ID as it does for the eksctl.cluster.k8s.io/v1alpha1/cluster-name tag

Ref aws/amazon-vpc-cni-k8s#313

@StevenACoffman
Copy link
Contributor Author

It looks like adding a line here:
https://github.com/weaveworks/eksctl/blob/master/pkg/cfn/manager/api.go#L53
and another here:
https://github.com/weaveworks/eksctl/blob/master/pkg/apis/eksctl.io/v1alpha4/types.go#L104

Would be all that is required, but not sure if that requires v1alpha4 -> v1alpha5

@errordeveloper
Copy link
Contributor

I think this is a bug, I was under impression metrics worked already (as of #296). Or is it the case of metrics being written, but not useful enough?

@StevenACoffman
Copy link
Contributor Author

StevenACoffman commented May 1, 2019

In #296 eksctl added the cloudwatch:PutMetricData which gives the IAM permissions to put metric data, each worker also needs ec2:DescribeTags (which I think they do by default, but I've adjusted mine). Even with correct IAM permissions, if you examine the logs you see these errors:

cni-metrics-helper-xxxx's logs output `Failed to obtain cluster-id and default to eks-cluster`?

e.g:

$ kubectl -n kube-system logs cni-metrics-helper-xxxx
...
           publisher.go:76] Failed to obtain cluster-id and default to eks-cluster

The problem is that if you have multiple clusters, they will all mix their metrics into the same eks-cluster dimension. If you add the tag of CLUSTER_ID with the name matching the cluster name, and then restart cni-metrics-helper-xxxx pod you can then confirm that Failed to obtain cluster-id and default to eks-cluster error was stopped and you should be able to see the metrics dimension which is configured by CLUSTER_ID's value instead of eks-cluster.

@errordeveloper
Copy link
Contributor

I must say that CLUSTER_ID is pretty arbitrary choice of tag, I've not seen it being recommended/mandated by EKS before.

Anyhow, from looking at the code, seems like it should just fallback to Name tag instead, and we certainly set that, it's unique per-nodegroup (it starts with cluster name, so I'd hope you can use it).
Maybe it's changed recently, what version of the metrics helper image are you running?

@errordeveloper
Copy link
Contributor

We can add ec2:DescribeTags for the CNI metrics, we don't really have a need for it to have it by default, but it's not harmful to add. I think it's currently available as part of autoscaler IAM add-on policy, which you may wish to try using in the meantime...

@StevenACoffman
Copy link
Contributor Author

That code very recently changed (18 hrs ago!) to fallback to Name, and actually per nodegroup would be better than per cluster (since different single az nodegroups in different subnets might behave differently). Thanks for taking a look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants