Only initialize the in-cluster kube client when metadata service is actually unavailable #897
Conversation
Welcome @chrisayoub!
Hi @chrisayoub. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @vdhanan
Pull Request Test Coverage Report for Build 1981
💛 - Coveralls
/ok-to-test
/lgtm
will let vdhanan do a review and approve
Let me know if you want me to try to fix the coverage check. It doesn't seem trivial to fix given how the test cases are currently configured, but I can look into it if you'd like. The only thing causing the decrease in coverage is the single ...
don't worry about the coverage. I don't think unit tests around this runtime initialization of metadata/client are worth doing
worth doing in this PR *. It requires too much mocking: we'd have to mock the instance metadata client, mock/wrap the k8s client initialization, etc.
Actually, the mocking already exists: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/a8c4961cbb336c3e64d38b7ad3b9aabc00398073/pkg/cloud/metadata_test.go. Sorry for the noise.
/retest
I think it's worth adding unit test coverage for this. I would just add a boolean ...
pkg/cloud/metadata_test.go (outdated diff)

```diff
 }

 for _, tc := range testCases {
 	t.Run(tc.name, func(t *testing.T) {
-		clientset := fake.NewSimpleClientset(&tc.node)
+		var clientset *fake.Clientset
```
What am I doing wrong here? For some reason I can't get this test case to pass
can you paste the error?
=== RUN TestNewMetadataService/fail:_kube_client_is_nil
W0520 18:25:16.684313 23724 metadata.go:107] EC2 instance metadata is not available
metadata_test.go:412: NewMetadataService() returned an unexpected error. Expected instance metadata is unavailable and kubernetes clientset is nil, got instance metadata is unavailable and CSI_NODE_NAME env var not set
So even though you've set clientset to nil, it's not hitting the if condition you added to NewMetadataService, is that right?
what dictates whether mockEC2Metadata.Available returns true or not?
I think it's because the pointer got cast into an interface: the interface is not nil even if the underlying implementation pointer is nil.
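(A minimal standalone illustration of that typed-nil behavior, using placeholder types rather than the driver's real ones: a nil pointer stored in an interface variable still compares unequal to nil, which is why a `clientset == nil` style check inside NewMetadataService would never fire for a nil *fake.Clientset passed through the interface.)

```go
package main

import "fmt"

// doer stands in for kubernetes.Interface; impl stands in for *fake.Clientset.
// These are placeholder names, not the driver's real types.
type doer interface{ do() }

type impl struct{}

func (i *impl) do() {}

func main() {
	var p *impl    // typed nil pointer, like `var clientset *fake.Clientset`
	var d doer = p // the interface now holds (type=*impl, value=nil)

	fmt.Println(p == nil) // true
	fmt.Println(d == nil) // false: the interface carries a non-nil type descriptor
}
```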
```go
// creates the clientset
clientset, err = kubernetes.NewForConfig(config)
if err != nil {
	return nil, err
}
```
let's just omit the test. At runtime, we will either pass NewMetadataService a functional* svc or a functional* clientset. There is no situation where we pass neither. So I don't want to muddy the code just for the sake of passing a test case. I will refactor in a future PR so that it's easier to understand, and write another test, before we release this change.
what I'm thinking is, NewMetadataService should take an interface with one function. and we will have two implementations of that interface, one based on ec2 metadata and one based on kubernetes API. If ec2 metadata is unavailable you pass the kubernetes API interface. simple!
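(A rough sketch of that shape, with hypothetical names rather than the driver's actual API: a single-method interface for fetching instance info, with one EC2-backed and one Kubernetes-backed implementation.)

```go
package cloud

// Hypothetical sketch of the proposed refactor; all names are illustrative,
// not the driver's real API.

// InstanceInfo stands in for whatever metadata the driver needs at startup.
type InstanceInfo struct {
	InstanceID       string
	Region           string
	AvailabilityZone string
}

// MetadataProvider is the proposed single-method interface.
type MetadataProvider interface {
	GetInstanceInfo() (*InstanceInfo, error)
}

// ec2Provider would wrap the EC2 instance metadata client.
type ec2Provider struct{}

func (p ec2Provider) GetInstanceInfo() (*InstanceInfo, error) {
	// ... query the EC2 instance metadata service ...
	return &InstanceInfo{}, nil
}

// kubernetesProvider would wrap an in-cluster kubernetes clientset.
type kubernetesProvider struct{}

func (p kubernetesProvider) GetInstanceInfo() (*InstanceInfo, error) {
	// ... read the Node object via the Kubernetes API ...
	return &InstanceInfo{}, nil
}

// NewMetadataService would then accept the interface and not care which
// backend it was handed; the caller passes the Kubernetes-backed provider
// only when EC2 metadata is unavailable.
func NewMetadataService(p MetadataProvider) (*InstanceInfo, error) {
	return p.GetInstanceInfo()
}
```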
I'm actually not sure how to implement what you mean, I'm not super familiar with Go. So, you might want to make your own PR for that approach. I was originally making this PR because I wanted to highlight that the issue existed, but did not previously see the actual issue reported as a bug.
yes @chrisayoub I don't want you to work on more than you signed up for, I can do it in a follow up. sorry for the confusion, I'm thinking out loud.
let's merge your PR without the unit test changes (so just the first commit).
I can take it from there!
- e2e test it
- refactor
- unit test it
do a rebase or whatever to drop every commit but the first, force push it, and I'll merge. The coverage check doesn't matter, I can merge it regardless.
if it works, i'm totally cool with merging without unit tests
I'll test it out in our cluster, I put a hold on this until I do that
i can't really think of anything besides refactoring the function altogether
sounds good. thanks for your help!
Done, this should be good to go!
/hold
/hold cancel
/lgtm
thank you! sorry about all the back and forth regarding whether we need a test or not, sometimes i abuse github comments like my personal notepad : )
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: chrisayoub, wongma7
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Is this a bug fix or adding new feature?
Bug fix
What is this PR about? / Why do we need it?
Follow-up to the previous PR, which fixed instance metadata retrieval when the EC2 metadata service is unavailable:
#855
In our cluster configuration, the in-cluster Kubernetes configuration is not available, because we do not give these pods a service account. However, the pods do have access to the EC2 metadata service, since we run them with hostNetwork: true. When the in-cluster configuration is unavailable, this causes a crash at startup that cannot be avoided, even when setting the AWS_REGION environment variable.

Resolves #876
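For context, the shape of the fix is roughly the following. This is a simplified, self-contained sketch rather than the driver's exact code (the helper names metadataFromEC2 and metadataFromKubernetes are hypothetical): the in-cluster config and clientset are only constructed after the EC2 metadata service has been probed and found unavailable.

```go
package cloud

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// EC2Metadata is a stand-in for the EC2 instance metadata client interface;
// the names in this sketch are illustrative, not the driver's exact code.
type EC2Metadata interface {
	Available() bool
}

// Metadata is a placeholder for the driver's metadata struct.
type Metadata struct{}

// metadataFromEC2 is a hypothetical helper that would populate Metadata
// from the instance metadata service.
func metadataFromEC2(svc EC2Metadata) (*Metadata, error) {
	return &Metadata{}, nil
}

// metadataFromKubernetes is a hypothetical helper that would populate
// Metadata from the Node object via the Kubernetes API.
func metadataFromKubernetes(clientset kubernetes.Interface) (*Metadata, error) {
	return &Metadata{}, nil
}

// newMetadata only builds the in-cluster client when EC2 metadata is
// actually unavailable, so pods without a service account but with
// hostNetwork access to the metadata service no longer crash at startup.
func newMetadata(svc EC2Metadata) (*Metadata, error) {
	if svc.Available() {
		return metadataFromEC2(svc)
	}

	// Reached only when the metadata service is unreachable; this is the
	// path that needs a mounted service account token.
	config, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		return nil, err
	}
	return metadataFromKubernetes(clientset)
}
```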
What testing is done?
Unit tests all still pass, and integration tests already exist for this code path. Tested and working in a real cluster when deployed without a serviceAccount.