Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubetest2: Infer the provider and zones from the kops cluster #10847

Merged
merged 7 commits into from
Apr 27, 2021

Conversation

justinsb
Copy link
Member

This means we don't need to pass these flags explicitly.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 16, 2021
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 16, 2021
@justinsb
Copy link
Member Author

It looks like we weren't running cloudprovider specific tests, AFAICT.

/assign @rifelpet

@rifelpet
Copy link
Member

rifelpet commented Feb 17, 2021

You're right, we are missing cloud provider tests as well as a few others that I'm working to enable (see this comment, though my fix didn't seem to help)

I'm not sure how I feel about importing these dependencies into kubetest2, it feels like kubetest2 should be more standalone and operate agnostic of any kops version. That said, we do need a way of knowing which cloud provider the cluster is using. I brought this up in the kubetest2 repo: kubernetes-sigs/kubetest2#87 (comment) and one suggestion was to use a well-known file that the deployer creates and the tester reads. Perhaps we could do that? just a json or yaml file that contains the cloud provider and zones and nothing else.

Alternatively we could do a more naive unmarshal of kops get cluster -o yaml, or better yet add support for -o jsonpath (I'm sure this would be significantly more work though)

Thoughts?

@justinsb
Copy link
Member Author

Good point on the imports @rifelpet ! I'm deliberately only using the models (e.g. I'm not using the vfs functions to read direct from the state store). We could reimplement the models (just the fields we need); we could also make the apis into the own go module, which would help with the volume of dependencies.

We need so few fields I can easily reimplement, but over time we'll need more (e.g. I have a test for etcd status) and it'll become more tedious. That said, these are part of our contract, so it isn't like changing them is common/easy.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2021
tests/e2e/go.mod Outdated

// These should match the go.mod from k8s.io/kops

replace k8s.io/api => k8s.io/api v0.20.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should replace these the relative path to the local vendor directory, that way we dont need to keep two go.mod files in sync anytime we bump these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried this locally and it doesn't work. replace k8s.io/api => ../../vendor/k8s.io/api will fail because go expects the go.mod and go.sum files to exist at the location, but vendoring modules explicitly does not include them.

Looks like we'll just need to keep these in sync going forward

@rifelpet rifelpet force-pushed the pass_provider_with_kubetest2 branch from 2d641ea to 677b4bf Compare April 14, 2021 23:17
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 14, 2021
@rifelpet rifelpet force-pushed the pass_provider_with_kubetest2 branch from 677b4bf to 4664534 Compare April 16, 2021 23:12
@rifelpet rifelpet force-pushed the pass_provider_with_kubetest2 branch 3 times, most recently from 103650c to f420fe3 Compare April 23, 2021 03:24
@rifelpet
Copy link
Member

retrying now that we're testing with k8s 1.21

/retest

@rifelpet
Copy link
Member

/test pull-kops-e2e-k8s-gce

@johngmyers
Copy link
Member

/retest

@rifelpet
Copy link
Member

The AWS test failures all seem to be permissions related:

event for exec-volume-test-preprovisionedpv-cwxl: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "aws-zj78w" : error attaching EBS volume "vol-00aaae1fd7f2591ed" to instance "i-081097ce79d433274": "UnauthorizedOperation: You are not authorized to perform this operation.

That error is from KCM which uses the master IAM role which has AttachVolume permissions on volumes tagged with KubernetesCluster=<clustername>:

&Statement{
Effect: StatementEffectAllow,
Action: stringorslice.Of(
"ec2:AttachVolume", // aws.go
"ec2:AuthorizeSecurityGroupIngress", // aws.go
"ec2:CreateRoute", // aws.go
"ec2:DeleteRoute", // aws.go
"ec2:DeleteSecurityGroup", // aws.go
"ec2:DeleteVolume", // aws.go
"ec2:DetachVolume", // aws.go
"ec2:RevokeSecurityGroupIngress", // aws.go
),
Resource: resource,
Condition: Condition{
"StringEquals": map[string]string{
"ec2:ResourceTag/KubernetesCluster": clusterName,
},
},
},

Viewing the testgrid history of the job, scrolling far enough to the right reveals that these same tests were passing before the kubetest2 migration around April 2nd. Here is the final kubetest1 job with passing tests.

Comparing the ginkgo test args, these were being provided before and are no longer being provided now:
--gce-region=ca-central-1 --gce-multizone=false --cluster-tag=e2e-de198719ee-ff1eb.test-cncf-aws.k8s.io --repo-root=. --disable-log-dump=true

I'm wondering if the e2e tests use the --cluster-tag to tag the EBS volumes it creates, and without the flag the master IAM role doesn't have permission to attach it. I'll try adding those flags next.

@rifelpet
Copy link
Member

rifelpet commented Apr 24, 2021

Yep I'm guessing thats the case :)

https://github.com/kubernetes/kubernetes/blob/9fd42639a07f44097ec733f2ef2a780895505c08/test/e2e/framework/providers/aws/aws.go#L104-L124

the GCE failure is because we're not passing the gcp project to the tester and kops binary that it shells out to. The tester also needs to provide the project flag to the e2e test suite. I'll address both issues shortly

@rifelpet rifelpet force-pushed the pass_provider_with_kubetest2 branch from 7e0e2b8 to 9afffab Compare April 24, 2021 03:19
@rifelpet
Copy link
Member

/test pull-kops-e2e-k8s-gce

@rifelpet rifelpet force-pushed the pass_provider_with_kubetest2 branch from 50356d2 to 79b904f Compare April 24, 2021 12:56
@rifelpet
Copy link
Member

Ok the additional tags have reduced us down to one failure in the AWS jobs, and I believe I have a fix for the current issue in the GCE job

/test pull-kops-e2e-k8s-gce

@rifelpet
Copy link
Member

rifelpet commented Apr 24, 2021

The failing AWS test is blocked by kubernetes/kubernetes#101443

Regarding the GCE test failures, since I recently added the presubmit job with kubetest2 I'll have to use the periodic job as a reference point to determine how to proceed. The GCE presubmit is optional though, so I suppose this PR doesn't necessarily need to be blocked by getting it to pass.

@rifelpet
Copy link
Member

/test pull-kops-e2e-k8s-gce

@rifelpet
Copy link
Member

/retest

I skipped the failing AWS KMS test until it is fixed upstream, so the aws jobs should pass now.

Since the gce job is new and optional we can work on addressing its failures independently of this PR

@k8s-ci-robot
Copy link
Contributor

@justinsb: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kops-e2e-k8s-gce 8d5e46c link /test pull-kops-e2e-k8s-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 27, 2021
@hakman
Copy link
Member

hakman commented Apr 27, 2021

/skip all

@rifelpet rifelpet force-pushed the pass_provider_with_kubetest2 branch from 8d5e46c to 038baef Compare April 27, 2021 15:12
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 27, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 27, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 3430c52 into kubernetes:master Apr 27, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Apr 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants