Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only initialize the in-cluster kube client when metadata service is actually unavailable #897

Merged
merged 1 commit into from
May 21, 2021
Merged

Only initialize the in-cluster kube client when metadata service is actually unavailable #897

merged 1 commit into from
May 21, 2021

Conversation

chrisayoub
Copy link
Contributor

@chrisayoub chrisayoub commented May 20, 2021

Is this a bug fix or adding new feature?

Bug fix

What is this PR about? / Why do we need it?

Follow-up PR to this previous PR that fixed instance metadata retrieval when the EC2 metadata service is unavailable:
#855

In our cluster configuration, we run this such that an in-cluster configuration is not available, as we do not give these pods a service account. However, the pods do have access to the EC2 metadata service, as we run the pods with hostNetwork: true. When the in-cluster configuration is unavailable, this cause a crash at startup that cannot be avoided, even when setting the AWS_REGION environment variable.

Resolves #876

What testing is done?

Unit tests all still pass, and integration tests already exist for this code path. Tested and working in a real cluster when deployed without a serviceAccount.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 20, 2021
@k8s-ci-robot
Copy link
Contributor

Welcome @chrisayoub!

It looks like this is your first PR to kubernetes-sigs/aws-ebs-csi-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-ebs-csi-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @chrisayoub. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 20, 2021
@chrisayoub chrisayoub changed the title Only initialized in-cluster kube client when metadata service is actually unavailable Only initialize the in-cluster kube client when metadata service is actually unavailable May 20, 2021
@k8s-ci-robot k8s-ci-robot requested review from ayberk and vdhanan May 20, 2021 20:41
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 20, 2021
@chrisayoub
Copy link
Contributor Author

/assign @vdhanan

@coveralls
Copy link

coveralls commented May 20, 2021

Pull Request Test Coverage Report for Build 1981

  • 0 of 11 (0.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.06%) to 79.125%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/cloud/metadata.go 0 11 0.0%
Totals Coverage Status
Change from base Build 1976: -0.06%
Covered Lines: 1990
Relevant Lines: 2515

💛 - Coveralls

@wongma7
Copy link
Contributor

wongma7 commented May 20, 2021

/ok-to-test

/lgtm

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 20, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 20, 2021
@wongma7
Copy link
Contributor

wongma7 commented May 20, 2021

will let vdhanan do a review and approve

@chrisayoub
Copy link
Contributor Author

Let me know if you need me to try and fix the coverage check, it doesn't seem super trivial to fix based on how the test cases are currently configured, but let me know if you want that fixed. The only thing causing the decrease in coverage is the single clientset == nil check.

@wongma7
Copy link
Contributor

wongma7 commented May 20, 2021

don't worry about the coverage. I don't think unit tests around this runtime initialization of metadata/client is worth doing

@wongma7
Copy link
Contributor

wongma7 commented May 20, 2021

worth doing in this PR *. It requires too much mocking. We'd have to mock instance metadata client, mock/wrap k8s client initialization, etc.

@wongma7
Copy link
Contributor

wongma7 commented May 20, 2021

actually the mocking already exists. https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/a8c4961cbb336c3e64d38b7ad3b9aabc00398073/pkg/cloud/metadata_test.go

Sorry for the noise,
I'l ldefer to @vdhanan : do we need a test that if instance metadata is unavailable , we DON'T initialize a k8s client?

@vdhanan
Copy link
Contributor

vdhanan commented May 20, 2021

/retest

@vdhanan
Copy link
Contributor

vdhanan commented May 20, 2021

i think it's worth adding a unit test coverage for this. i would just add a boolean isClientSetNil field and an error clientSetNilErr field to the testCases struct and use those in a new test case to determine whether to initialize the clientset variable and what error to expect

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 20, 2021
}

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
clientset := fake.NewSimpleClientset(&tc.node)
var clientset *fake.Clientset
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What am I doing wrong here? For some reason I can't get this test case to pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you paste the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=== RUN   TestNewMetadataService/fail:_kube_client_is_nil
W0520 18:25:16.684313   23724 metadata.go:107] EC2 instance metadata is not available
    metadata_test.go:412: NewMetadataService() returned an unexpected error. Expected instance metadata is unavailable and kubernetes clientset is nil, got instance metadata is unavailable and CSI_NODE_NAME env var not set

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so even though you've set clientset to nil, it's not hitting the if condition you added to NewMetadataService, is that right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what dictates whether mockEC2Metadata.Available returns true or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because the pointer got cast into an interface. the interface is not nil even if the implementation is nil.

pkg/cloud/metadata.go Outdated Show resolved Hide resolved
pkg/cloud/metadata_test.go Outdated Show resolved Hide resolved
// creates the clientset
clientset, err = kubernetes.NewForConfig(config)
if err != nil {
return nil, err
Copy link
Contributor

@wongma7 wongma7 May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just omit the test. in runtime, we will either pass NewMetadataService a functional* svc or a functional* clientset. There is no situation where we pass neither. So I don't want to muddy the code just for the sake of passing a test case. I will refactor in a future PR so that it's easier to understand , and write another test, before we release this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I'm thinking is, NewMetadataService should take an interface with one function. and we will have two implementations of that interface, one based on ec2 metadata and one based on kubernetes API. If ec2 metadata is unavailable you pass the kubernetes API interface. simple!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure how to implement what you mean, I'm not super familiar with Go. So, you might want to make your own PR for that approach. I was originally making this PR because I wanted to highlight that the issue existed, but did not previously see the actual issue reported as a bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes @chrisayoub I don't want you to work on more than you signed up for, I can do it in a follow up. sorry for the confusion, I'm thinking out loud.

let's merge your PR without the unit test changes (so just the first commit).

I can take it from there!

  • e2e test it
  • refactor
  • unit test it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do a rebase or whatever to drop every commit but the first, force push it, and I'll merge. The coverage check doesn't matter, I can merge it regardless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it works, i'm totally cool with merging without unit tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll test it out in our cluster, I put a hold on this until I do that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can't really think of anything besides refactoring the function altogether

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. thanks for your help!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, this should be good to go!

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 21, 2021
@chrisayoub
Copy link
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 21, 2021
@chrisayoub
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 21, 2021
@wongma7
Copy link
Contributor

wongma7 commented May 21, 2021

/lgtm
/approve

thank you! sorry about all the back and forth regarding whether we need a test or not, sometimes i abuse github comments like my personal notepad : )

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrisayoub, wongma7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2021
@wongma7 wongma7 merged commit 610675a into kubernetes-sigs:master May 21, 2021
@chrisayoub chrisayoub deleted the metadata_conditional_client branch May 21, 2021 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't start v1 docker image "panic: runtime error: invalid memory address or nil pointer dereference"
5 participants