Fix metrics server addon #6201
Conversation
v0.3.0 changed how the options are specified, so it makes sense that we needed to make this change. See:
* kubernetes/kubernetes#44540 (comment)
* kubernetes/kubernetes#44540 (comment)

The preferred address types come from checking the address preferences configured on our API server and then mirroring those options to metrics-server. See:
* kubernetes-sigs/metrics-server#67 (comment)
In 0.3.x, the secure kubelet port with auth is enabled by default. Use of the insecure port is deprecated and may even be removed as soon as the next release. See:
* https://github.com/kubernetes-incubator/metrics-server/releases/tag/v0.3.0
* https://github.com/kubernetes-incubator/metrics-server/releases/tag/v0.3.1

That said, metrics-server now uses webhook authentication, so the kubelet would need webhook authentication enabled (which we don't have). This flag allows serviceaccount tokens to be used to authenticate against the kubelet. See:
* https://kubernetes.io/docs/reference/access-authn-authz/webhook/
* #5508
* #5176 (comment)
* kubernetes-sigs/metrics-server#175

Currently, even if we were to use certificates, this presents a challenge since metrics-server will generate its own self-signed certificates because we don't set `--tls-cert-file` and `--tls-private-key-file`. Related:
* kubernetes-sigs/metrics-server#25
* kubernetes-sigs/metrics-server#146
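For readers skimming the thread, this is roughly the shape of the resulting addon change, shown as a sketch of the metrics-server container args. The image tag and surrounding manifest structure are illustrative assumptions, not the literal diff:

```yaml
# Sketch only: metrics-server container with the two kubelet-related flags this
# PR is about. Prefer the same node address types the API server uses, and skip
# TLS verification because kubelets serve self-signed certificates.
containers:
  - name: metrics-server
    image: k8s.gcr.io/metrics-server-amd64:v0.3.1   # tag assumed for illustration
    command:
      - /metrics-server
      - --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
      - --kubelet-insecure-tls
```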
@prageethw: GitHub didn't allow me to assign the following users: justins. Note that only kubernetes members and repo collaborators can be assigned. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @justinsb |
I've added more information to the PR description for a better understanding of the issue and the solution therein. Also, kubernetes-sigs/metrics-server#131 (comment) is a question I posed to one of the main developers of metrics-server and I got a response with some more information. |
LGTM.
This will fix problems users are experiencing with HPA and metrics-server right now.
LGTM
@itskingori I don't think this fix works anymore after kops 1.11.0; it seems to be working on kops 1.10.0 with metrics-server 0.3.1 |
@prageethw, can you give us more detail on what problem you are experiencing? This fix works for me on k8s 1.11 and k8s 1.12. Might be something related to your specific environment. |
it works fine in kops 1.10.0 (in fact it works with just --kubelet-insecure-tls), but does not work in kops 1.11.0; the same unauthorized error appears. I think the issue is not with k8s but kops. Can you confirm you are running kops 1.11.0? |
@prageethw Hmmmm. Used this in v1.10.11, just migrated to v1.11.6 and metrics server is still working (I think, will have to check). What error are you getting? |
@prageethw Link to issue: #5706. I can confirm I am running 7 clusters on k8s 1.11.x and they all work with this fix. |
@itskingori @Cryptophobia my helm chart, kops, and k8s versions below, which work like a charm.

k8s version GitVersion:"v1.11.7"
kops version Version 1.10.0

helm install stable/metrics-server \
--name metrics-server \
--version 2.0.4 \
--set replicas=2 \
--namespace metrics \
--set args={"--kubelet-insecure-tls=true"} \
--set resources.limits.cpu="100m",resources.limits.memory="50Mi"

Here is the case where it fails, and the metrics-server logs:

unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-172-20-67-59.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-67-59.us-east-2.compute.internal (ip-172-20-67-59.us-east-2.compute.internal): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-20-89-40.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-89-40.us-east-2.compute.internal (ip-172-20-89-40.us-east-2.compute.internal): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-20-109-206.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-109-206.us-east-2.compute.internal (ip-172-20-109-206.us-east-2.compute.internal): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-20-43-140.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-43-140.us-east-2.compute.internal (ip-172-20-43-140.us-east-2.compute.internal): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-20-112-175.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-112-175.us-east-2.compute.internal (ip-172-20-112-175.us-east-2.compute.internal): request failed - "401 Unauthorized", response: "Unauthorized"]

k8s version GitVersion:"v1.11.7"
kops version Version 1.11.0

helm install stable/metrics-server \
--name metrics-server \
--version 2.0.4 \
--set replicas=2 \
--namespace metrics \
--set args={"--kubelet-insecure-tls=true"} \
--set resources.limits.cpu="100m",resources.limits.memory="50Mi"

With the exact config that this fix suggests, it produces a different error:

unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-172-20-112-175.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-112-175.us-east-2.compute.internal (172.20.112.175): Get https://172.20.112.175:10250/stats/summary/: x509: cannot validate certificate for 172.20.112.175 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:ip-172-20-67-59.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-67-59.us-east-2.compute.internal (172.20.67.59): Get https://172.20.67.59:10250/stats/summary/: x509: cannot validate certificate for 172.20.67.59 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:ip-172-20-43-140.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-43-140.us-east-2.compute.internal (172.20.43.140): Get https://172.20.43.140:10250/stats/summary/: x509: cannot validate certificate for 172.20.43.140 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:ip-172-20-109-206.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-109-206.us-east-2.compute.internal (172.20.109.206): Get https://172.20.109.206:10250/stats/summary/: x509: cannot validate certificate for 172.20.109.206 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:ip-172-20-89-40.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-89-40.us-east-2.compute.internal (172.20.89.40): Get https://172.20.89.40:10250/stats/summary/: x509: cannot validate certificate for 172.20.89.40 because it doesn't contain any IP SANs]

helm:

helm install stable/metrics-server \
--name metrics-server \
--version 2.0.4 \
--set replicas=2 \
--namespace metrics \
--set args={"--kubelet-insecure-tls=true,--kubelet-preferred-address-types=InternalIP\,Hostname\,ExternalIP"} \
--set resources.limits.cpu="100m",resources.limits.memory="50Mi"

Hence my personal take is that the issue is in kops, not in k8s. When kops bumped the version from 1.10.0 to 1.11.0, something changed in the way kubelets communicate with the API server, which I could not find in the release notes either. |
@Cryptophobia |
@prageethw I see. I'm conflating kops and kubernetes 🤦♂️ ... we also upgraded using kops 1.11.0 to kubernetes 1.11.x and it does seem broken. I've been able to find similar errors in my logs too.
Agreed! |
I am running kops version

Kops version should be paired with the k8s version. For example, when you are upgrading from 1.10.x to 1.11.x you should get the newest kops version, which would be 1.11.x. Not sure if that is always the requirement, but that was a requirement from before. First version of kops to support 1.11.x k8s is |
@Cryptophobia @itskingori To give you some context: I run cluster creation and destruction in CI (with the above fix, since it was already flagged that metrics-server no longer allows self-signed certs) to make sure I catch breakage as soon as possible. That's how I found out it broke the moment I bumped kops from 1.10.0 to 1.11.0. |
I'll just leave my experience here (a rough sketch of what these steps might look like follows this comment): k8s version: 1.11.6
Had to open up for webhooks:
Update cluster:
I then had to add this role binding:
I then had to change the metrics-server commands:
HPA is now working as expected, but still getting some metrics-server warnings:
|
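The code blocks from the comment above were not captured. As a rough, hedged sketch of what this webhook approach usually looks like on a kops cluster, assuming metrics-server runs with a `metrics-server` service account in `kube-system` (all names and field values here are illustrative, not taken from the comment):

```yaml
# 1. "Open up for webhooks": kubelet settings in the kops cluster spec
#    (kops edit cluster, then kops update cluster --yes plus a rolling update).
spec:
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
---
# 2. Role binding so the metrics-server service account is allowed to call the
#    kubelet API once the kubelet starts authorizing requests via the API server.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:kubelet-api-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubelet-api-admin
subjects:
  - kind: ServiceAccount
    name: metrics-server      # assumed service account name
    namespace: kube-system    # assumed namespace
```

With those pieces in place, the metrics-server command changes are typically limited to `--kubelet-preferred-address-types`; whether `--kubelet-insecure-tls` can be dropped depends on whether the kubelet serving certificates are verifiable.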
@prageethw While I was surprised that the log lines you mentioned also show up in my setup, it seems like this still works (or else I'd have had bigger problems). Not sure how to proceed.
it works fine for me in kops 1.10.0. @Bragegs Note: the versions above are kops versions, not k8s. I think the issue is in kops, not k8s.

kops version Version 1.11.0

kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-04T04:48:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.7", GitCommit:"65ecaf0671341311ce6aea0edab46ee69f65d59e", GitTreeState:"clean", BuildDate:"2019-01-24T19:22:45Z", GoVersion:"go1.10.7", Compiler:"gc", Platform:"linux/amd64"}

metrics-server logs:
unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-172-20-48-65.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-48-65.us-east-2.compute.internal (172.20.48.65): Get https://172.20.48.65:10250/stats/summary/: x509: cannot validate certificate for 172.20.48.65 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:ip-172-20-118-10.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-118-10.us-east-2.compute.internal (172.20.118.10): Get https://172.20.118.10:10250/stats/summary/: x509: cannot validate certificate for 172.20.118.10 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:ip-172-20-120-50.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-120-50.us-east-2.compute.internal (172.20.120.50): Get https://172.20.120.50:10250/stats/summary/: x509: cannot validate certificate for 172.20.120.50 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:ip-172-20-65-192.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-65-192.us-east-2.compute.internal (172.20.65.192): Get https://172.20.65.192:10250/stats/summary/: x509: cannot validate certificate for 172.20.65.192 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:ip-172-20-77-20.us-east-2.compute.internal: unable to fetch metrics from Kubelet ip-172-20-77-20.us-east-2.compute.internal (172.20.77.20): Get https://172.20.77.20:10250/stats/summary/: x509: cannot validate certificate for 172.20.77.20 because it doesn't contain any IP SANs]

I will need to stick with kops 1.10.0 with the helm args below until I know exactly what is required to get 1.11.0 to work and what the difference is.

....
--set args={"--kubelet-insecure-tls=true"} \
....
|
I am on
This PR fixed it for me, thanks @itskingori ! However, I would now like to promote this to production (the goal is to get Is this the preferred solution for |
Update: after a few attempts, I have got the webhook method specified above by @Bragegs working too. |
Implementing the method in the above comment #6201 (comment) seems to have worked for me as well. |
@itskingori Any chance you could answer my questions? In general, the big idea is: "is this production ready"? Can / should you use the changes in this PR (which are necessary for it to work in our cluster) in production? If not, can you point me to a guide for what to do instead? |
@bensussman As far as I know, turning off TLS verification on the metrics-server means that it will NOT verify the TLS certificates when communicating with the nodes' kubelets. This is not very secure, but if your subnets are configured to be private, then it shouldn't be too much of a concern. If you are hosting other people's code or containers and you cannot trust your own network, then maybe it is a concern. In the future, the PR here will allow us to switch out the CA that metrics-server uses to verify the TLS certs on the kubelets: kubernetes-sigs/metrics-server#183 |
@bensussman Sure. Must have missed your comment.
It's a stopgap for me, i.e. an acceptable level of technical debt. The value of having metrics-server far outweighs not having it because of a lack of TLS. I can afford the risk because my clusters are not multi-tenant. Sometimes you just have to take what you have now and make a note of what needs to be fixed later once the fix is available (I'm tracking all relevant issues). #6201 (comment) explains the situation in a little more detail. And I didn't want to enable webhook authentication.
Not that I'm aware of.
AFAIK, Heapster is deprecated. I chose not to invest in it. See: |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: chrisz100, itskingori. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This was merged without the required cluster_spec changes. Should those become the default? At the very least, we need documentation to support metrics-server. metrics-server is checked in as an addon but has no documentation on how to install it. |
Kubelet surprisingly has 'AlwaysAllow' authorization on, so the minimum things I had to change to enable metrics-server were (a rough sketch follows this comment):
cluster yaml
|
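The cluster yaml itself was not captured here. As a guess, the minimal change is the same kubelet auth settings sketched earlier in the thread, roughly:

```yaml
# Hypothetical minimal kubelet section of the kops cluster spec (field names as
# documented by kops; verify against your kops version). Disables anonymous
# access and has the kubelet authenticate/authorize callers via the API server.
spec:
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
```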
Now metrics-server How should I configure it to not use |
All the details are in the commit messages. I think this:
Fixes issues caused by changes in v0.3.x; see the release notes:
Main issue is that in 0.3.x, the secure kubelet port with auth is enabled by default. Use of the insecure port is deprecated and may even be removed as soon as the next release.
That said, metrics-server now uses webhook authentication, so the kubelet would need webhook authentication enabled (which we don't have). This flag allows serviceaccount tokens to be used to authenticate against the kubelet. See:
Currently, even if we were to use certificates, this presents a challenge since metrics-server will generate its own self-signed certificates because we don't set `--tls-cert-file` and `--tls-private-key-file`. Related:

That said, based on the comment in 8ba93ee, do you think it's worth enabling webhook authentication on the kubelet instead of doing it this way?