amazon-k8s-cni crashes on startup (unless undocumented env var set) #1725
Comments
@johngmyers These changes are not yet released. There will be corresponding CNI manifest and README updates when we release IPv6 support. You are trying changes on top of master with an old CNI manifest. The new manifest on master includes these new env variables: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/config/master/aws-k8s-cni.yaml#L193.
@achevuru I am aware the changes are not yet released; that is apparent from the lack of tags on the branch. Nonetheless, panicking when required settings are missing is probably something you'd want to backport a fix for prior to release.
I note that returning the actual error instead of a generic "failed to validate configuration" message would be more helpful for diagnosis.
@johngmyers I understand what you're trying to say, but the problem only arises when someone attempts to use the latest image with an old, incompatible CNI manifest (i.e., updating the image tag on a v1.9 or v1.8 CNI manifest). If a user wants to upgrade to v1.10, they would need to apply the v1.10 manifest, which will have the default values required by the new image. The same is true for a few of the older versions as well (trying to use a 1.5 manifest for 1.6, etc.). We'll consider taking it in. The actual error log will be printed in the config validation function. For example, here and here. The PR linked above added a generic error message in the caller.
What happened:
The amazon-k8s-cni container dereferenced a nil pointer on startup.
Attach logs
What you expected to happen:
Container to not panic. Bonus points for running successfully, or at least including the reason for the failure in the container logs.
How to reproduce it (as minimally and precisely as possible):
image
fields.
Anything else we need to know?:
As of the writing of this bug, the above commit is the most recent on the release-1.10 branch.
The panic indicates that StartNodeIPPoolManager was called on a nil IPAMContext. This would have to result from the code:
amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go
Lines 399 to 401 in a2485e1
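The embedded snippet is not reproduced here. As a rough illustration only, the following Go sketch shows the kind of pattern this report describes: a constructor that returns an error variable it never assigned, so the caller receives a nil context together with a nil error. The type and function names (IPAMContext as a stand-in struct, newContextBuggy, newContextFixed) are hypothetical and are not taken from ipamd.go.

```go
package main

import (
	"errors"
	"fmt"
)

// IPAMContext is a hypothetical stand-in for the daemon's context type.
type IPAMContext struct {
	enableIPv4 bool
}

// StartNodeIPPoolManager reads a field of the receiver, so calling it on a
// nil *IPAMContext panics with a nil pointer dereference.
func (c *IPAMContext) StartNodeIPPoolManager() string {
	return fmt.Sprintf("pool manager started (IPv4 enabled: %t)", c.enableIPv4)
}

// newContextBuggy illustrates the reported pattern: err is never assigned in
// the failing branch, so the caller receives (nil, nil).
func newContextBuggy(valid bool) (*IPAMContext, error) {
	var err error
	if !valid {
		return nil, err // err is still nil here
	}
	return &IPAMContext{enableIPv4: true}, nil
}

// newContextFixed returns a non-nil error when validation fails, so the
// caller can stop before touching the nil context.
func newContextFixed(valid bool) (*IPAMContext, error) {
	if !valid {
		return nil, errors.New("failed to validate configuration")
	}
	return &IPAMContext{enableIPv4: true}, nil
}

func main() {
	ctx, err := newContextBuggy(false)
	// Both checks are true, so a caller that only tests err would go on to
	// call ctx.StartNodeIPPoolManager() and panic.
	fmt.Println(ctx == nil, err == nil)

	if _, err := newContextFixed(false); err != nil {
		fmt.Println("startup aborted:", err)
	}
}
```

With the fixed variant, a caller that checks the returned error never reaches the method call on the nil pointer.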
As the body of the if statement does not assign to err, it is always nil at that point.
Further analysis shows the reason that isConfigValid() is returning false is this:
amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go
Lines 2170 to 2173 in a2485e1
Strangely, isIPv4Enabled() defaults to false. I would not have made such an incompatible change, but it's your project.
I note that despite the requirement that either ENABLE_IPv4 or ENABLE_IPv6 be set in the environment, neither variable is documented in the "CNI Configuration Variables" section of README.md.
I also suggest that having isConfigValid() return an error, instead of logging to an obscure on-host-filesystem file and merely returning bool, would make troubleshooting invalid configuration problems easier.
Environment:
kubectl version: https://storage.googleapis.com/k8s-release-dev/ci/v1.23.0-alpha.4.78+8facd7298627b7
cat /etc/os-release: Debian 11 136693071363/debian-11-amd64-20211011-792
uname -a:
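To make the suggestion about isConfigValid() concrete, here is a minimal Go sketch of a validation function that returns an error describing what is wrong, rather than a bare bool. The names validateConfig and envBool are hypothetical, the check covers only the ENABLE_IPv4 / ENABLE_IPv6 requirement discussed in this issue, and the default of false for both variables is an assumption that mirrors the behavior reported above. It is not the project's actual validation logic.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// envBool reads a boolean setting through lookup and falls back to def when
// the variable is unset. (Hypothetical helper, not part of the CNI code.)
func envBool(lookup func(string) (string, bool), key string, def bool) (bool, error) {
	raw, ok := lookup(key)
	if !ok {
		return def, nil
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("%s: invalid boolean value %q", key, raw)
	}
	return v, nil
}

// validateConfig returns a descriptive error instead of a bool, so the caller
// can surface the real reason in the container logs.
func validateConfig(lookup func(string) (string, bool)) error {
	// Assumption: both settings default to false, as reported in this issue.
	v4, err := envBool(lookup, "ENABLE_IPv4", false)
	if err != nil {
		return err
	}
	v6, err := envBool(lookup, "ENABLE_IPv6", false)
	if err != nil {
		return err
	}
	if !v4 && !v6 {
		return errors.New("neither ENABLE_IPv4 nor ENABLE_IPv6 is enabled")
	}
	return nil
}

func main() {
	// os.LookupEnv has the signature the helpers expect.
	if err := validateConfig(os.LookupEnv); err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("configuration OK")
}
```

Passing the lookup function in as a parameter keeps the check testable without touching the process environment; in the daemon itself the caller could wrap the returned error and log it before exiting.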