The driver reports incorrect number of allocatables count #1808
Further analysis: the driver takes the count from the instance metadata service, and the metadata service reports the number of mapped devices wrong!! On the machine, the IMDS output does not match what is actually attached.
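For reference, the query in question looks like this (illustrative IMDSv2 commands, not my actual output):

```bash
# Fetch an IMDSv2 session token, then list the reported device mappings.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/block-device-mapping/"
```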
Yet, the AWS CLI tells a different story:
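A cross-check along these lines (the instance ID is a placeholder):

```bash
# List the block devices EC2 itself believes are attached.
aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].BlockDeviceMappings[].DeviceName' \
  --output table
```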
It lists 5 attached block devices (of which 4 are EBS volumes plus 1 root block device), and the AWS console UI shows the same 4+1 attached block devices.
After a full VM shutdown (via the console), it finally reports the matching numbers. Still: is that value only determined at node registration and then never refreshed? If so, how would one update it?
I'd recommend reporting this to AWS support (especially if it's reproducible); they should be able to route it to the appropriate team that works on IMDS. Unfortunately, we don't have a good way to know if/when the metadata service is wrong, so we trust its output.
Generally, restarting the EBS CSI driver pod should cause Kubernetes to re-query the limit and update it. See if that fixes it. Rebooting the node would usually be equivalent; my best guess is that there was a short period during the node's startup when the wrong value was still being served.
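A sketch of that, assuming a standard install where the node pods carry the app=ebs-csi-node label and run in kube-system (`<node>` is a placeholder):

```bash
# Delete the node's driver pod so the DaemonSet recreates it...
kubectl -n kube-system delete pod -l app=ebs-csi-node \
  --field-selector spec.nodeName=<node>
# ...then re-check the limit the driver advertises in CSINode.
kubectl get csinode <node> \
  -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'
```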
Shutting down the entire node made the HTTP metadata API report the correct numbers, but the reported allocatable counts still make no sense to me:
Text in parentheses was added by me manually; node names are redacted; the reported numbers are untouched. All 4 nodes are identical t3a.xlarge instances, run in the same AZ, and joined within 30 minutes of each other (I originally said 2 hours, but after checking more carefully they are much closer than that). Each node has 2 network interfaces (the default plus one extra attached). The values don't change between EBS driver pod restarts: even if I drain the node, ensure no extra volumes are attached, and then restart the driver, it still reports the same numbers as above. To me the numbers look random, and the only explanation I have from reading the code is that the value is cached somewhere?
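For anyone comparing their own nodes, a per-node listing like the one above can be produced with something like this (assumes the EBS driver is the first registered CSI driver on each node):

```bash
# Print node name and advertised attach limit, one node per line.
kubectl get csinodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.drivers[0].allocatable.count}{"\n"}{end}'
```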
I'm hitting the same issue. When a node is shut down without draining it first (and thus without detaching all its volumes) and started later, the node gets a wrong attach limit in its CSINode object.
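A minimal sketch of that reproduction (instance ID and node name are placeholders):

```bash
# Stop the instance without draining it first, then start it again.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 start-instances --instance-ids i-0123456789abcdef0
# After kubelet re-registers, the advertised limit comes out wrong.
kubectl get csinode <node> -o jsonpath='{.spec.drivers[0].allocatable.count}'
```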
Therefore I think it's not really useful to read the mapped-device count from the metadata service. It would be better to just pick the number of attachments based on the instance type.
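The instance type, unlike the mapped-device count, is a stable metadata value a static limit table could key off of (illustrative IMDSv2 query):

```bash
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/instance-type"
```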
It's quite hard and error-prone to recover from this situation. The cluster admin must drain and shut down a node that has a misleading capacity count, then start it again. Restarting the CSI driver is not enough.
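In full, the recovery looks something like this (node and instance names are placeholders):

```bash
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
aws ec2 stop-instances --instance-ids <instance-id>
aws ec2 wait instance-stopped --instance-ids <instance-id>
aws ec2 start-instances --instance-ids <instance-id>
kubectl uncordon <node>
```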
/reopen

Since #1843 was reverted.
@jsafrane: Reopened this issue.

@torredil: Closing this issue.
/kind bug
What happened?
I have a 1.26.4 cluster that runs 4 worker nodes, each of which is a t3a.xlarge. All 4 joined the cluster on the same day (within a couple of hours). All have 2 network interfaces attached. Yet all 4 report a different Allocatables.Count: 9, 10, 16, 26.

What you expected to happen?

I expect all of them to report 26.
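My assumption behind 26: Nitro instance types such as t3a.xlarge share a documented pool of 28 attachment slots between ENIs and EBS volumes, and each node here has 2 ENIs:

```bash
TOTAL_SLOTS=28  # shared Nitro attachment slots (documented AWS limit)
ENIS=2          # network interfaces attached to each node
echo $((TOTAL_SLOTS - ENIS))   # -> 26 expected allocatable attachments
```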
How to reproduce it (as minimally and precisely as possible)?
I don't know 🤷
Anything else we need to know?:
Environment:
- Cloud provider: AWS
- Kubernetes version (use `kubectl version`): 1.26.4