Add snapshot webhook build and deployment. Modify controller to label invalid objects. #353

Merged
k8s-ci-robot merged 3 commits into kubernetes-csi:master on Aug 29, 2020

@AndiLi99
Contributor

AndiLi99 commented Aug 12, 2020

What type of PR is this?

/kind feature

What this PR does / why we need it:
Add a webhook which can be used to perform stricter validation, and to do ratcheting validation as part of the release plan.
Please see this KEP for full context on why a webhook is needed.

Add webhook build, Docker, and deployment scripts. Boilerplate for adding the snapshot validation endpoint is in place.

Which issue(s) this PR fixes:

Fixes #187
Fixes #363

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

The validation for volume snapshot objects (VolumeSnapshot and VolumeSnapshotContent) is becoming stricter. To preserve backwards compatibility, this change will roll out over multiple releases. The key changes are:

1. As part of the first phase of the multi-phased release process, a validating webhook server has been added. This server performs additional (strict) validation that was not done during the beta release of volume snapshots. It prevents invalid objects from entering the cluster via create or update.
2. The controller will label objects which fail the additional validation.

The combination of 1 and 2 will allow cluster admins to stop the increase of invalid objects, and provide a way to easily list all objects which currently fail the strict validation. It is the cluster admin's responsibility to install the webhook and to ensure all the invalid objects in the cluster have been deleted or fixed. See the KEP at https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md
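
A minimal sketch of the labeling idea in item 2, assuming the controller sets a marker label when strict validation fails and clears it once the object passes. The label key `example.com/invalid-snapshot` and the helper names are hypothetical and are not taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// invalidLabel is a hypothetical label key used only for this sketch.
const invalidLabel = "example.com/invalid-snapshot"

// errExample stands in for an error returned by strict validation.
var errExample = errors.New("source must be set")

// syncInvalidLabel returns a copy of labels with invalidLabel set when
// validationErr is non-nil and removed otherwise. The input map is not modified.
func syncInvalidLabel(labels map[string]string, validationErr error) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	if validationErr != nil {
		out[invalidLabel] = ""
	} else {
		delete(out, invalidLabel)
	}
	return out
}

// hasInvalidLabel reports whether the marker label is present, regardless of its value.
func hasInvalidLabel(labels map[string]string) bool {
	_, ok := labels[invalidLabel]
	return ok
}

func main() {
	labels := syncInvalidLabel(map[string]string{"app": "db"}, errExample)
	fmt.Println(hasInvalidLabel(labels))
	labels = syncInvalidLabel(labels, nil)
	fmt.Println(hasInvalidLabel(labels))
}
```

With a label like this in place, an admin could list affected objects with a label selector, for example `kubectl get volumesnapshots --all-namespaces -l example.com/invalid-snapshot` (again using the hypothetical key).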

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 12, 2020
@k8s-ci-robot
Contributor

Welcome @AndiLi99!

It looks like this is your first PR to kubernetes-csi/external-snapshotter 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/external-snapshotter has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @AndiLi99. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 12, 2020
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 12, 2020
@AndiLi99 AndiLi99 force-pushed the AndiLi99/webhook branch 6 times, most recently from 4f1e8c8 to 55cd2a5 Compare August 12, 2020 15:32
@yuxiangqian
Contributor

cc @xing-yang @msau42

@msau42
Collaborator

msau42 commented Aug 12, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 12, 2020
Contributor

@yuxiangqian yuxiangqian left a comment

preliminary comments

@@ -0,0 +1,67 @@
# How to deploy the webhook

The webhook server is provided as an image which can be built from this repository. It can be deployed anywhere, as long as the api server is able to reach it over HTTPS. It is recommended to deploy the webhook server in the cluster as snapshotting is latency sensitive. A `ValidatingWebhookConfiguration` object is needed to configure the api server to contact the webhook server. Please see the [documentation](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) for more details. The webhook server code is adapted from the [webhook server](https://github.com/kubernetes/kubernetes/tree/v1.18.6/test/images/agnhost/webhook) used in the kubernetes/kubernetes end to end testing code.
Contributor

should include a prerequisites/dependencies section in this README to include:

  1. k8s version
  2. admission controller is enabled etc

Collaborator

We should also add a brief description of the webhook and add a link to this doc in the top level README: https://github.com/kubernetes-csi/external-snapshotter/blob/master/README.md

Contributor Author

I added the prereq section. I also added a brief description to top level readme with a link. Please take a look.
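
For readers new to admission webhooks, here is a rough sketch of the request/response exchange the README describes. It assumes the JSON shape of an `admission.k8s.io/v1` `AdmissionReview` (`request.uid` in, `response.uid` and `response.allowed` out) and mirrors it with small local structs instead of importing the Kubernetes API packages. The handler name, the `/volumesnapshot` path, and the in-process round trip are illustrative only; a real webhook must be served over HTTPS and would run actual validation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Minimal local mirrors of the AdmissionReview JSON shape (sketch only).
type admissionRequest struct {
	UID string `json:"uid"`
}

type admissionResponse struct {
	UID     string `json:"uid"`
	Allowed bool   `json:"allowed"`
}

type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

// serveValidate decodes an AdmissionReview and echoes the request UID back
// in an "allowed" response. Real validation would happen before responding.
func serveValidate(w http.ResponseWriter, r *http.Request) {
	var review admissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
		http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
		return
	}
	out := admissionReview{
		APIVersion: review.APIVersion,
		Kind:       review.Kind,
		Response:   &admissionResponse{UID: review.Request.UID, Allowed: true},
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(out); err != nil {
		http.Error(w, "failed to encode response", http.StatusInternalServerError)
	}
}

// roundTrip posts body to the handler in-process and returns the HTTP status
// code plus the decoded response (nil if the body is not an AdmissionReview).
func roundTrip(body string) (int, *admissionResponse) {
	req := httptest.NewRequest(http.MethodPost, "/volumesnapshot", strings.NewReader(body))
	rec := httptest.NewRecorder()
	serveValidate(rec, req)
	var review admissionReview
	if err := json.Unmarshal(rec.Body.Bytes(), &review); err != nil {
		return rec.Code, nil
	}
	return rec.Code, review.Response
}

func main() {
	code, resp := roundTrip(`{"apiVersion":"admission.k8s.io/v1","kind":"AdmissionReview","request":{"uid":"abc"}}`)
	fmt.Println(code, resp.UID, resp.Allowed)
}
```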

Patch the `ValidatingWebhookConfiguration` file from the template, filling in the CA bundle field.

```bash
cat ./deploy/kubernetes/webhook-example/admission-configuration-template | ./deploy/kubernetes/webhook-example/patch-ca-bundle.sh > ./deploy/kubernetes/webhook-example/admission-configuration.yaml
```
Contributor

maybe do not include the repo path?

@@ -0,0 +1,7 @@
FROM gcr.io/distroless/base:latest
LABEL maintainers="Kubernetes Authors"
LABEL description="Snapshot Webhook"
Contributor

lets call it VolumeSnapshot, ditto everywhere

Contributor Author

I'm following the naming convention from the Dockerfile for csi-snapshotter and snapshot-controller. Do you want to change both of those as well, or have an inconsistent naming pattern?

func convertAdmissionResponseToV1beta1(r *v1.AdmissionResponse) *v1beta1.AdmissionResponse {
	var pt *v1beta1.PatchType
	if r.PatchType != nil {
		t := v1beta1.PatchType(*r.PatchType)

Is it always cast'able?

Contributor Author

I'm not sure

Contributor Author

I believe it should work, as they're both just strings underneath
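
A small sketch of why the cast is safe, assuming both `PatchType` types are defined as plain `string` types, as the comment above suggests. The two types below are local stand-ins, not the real API types. Go allows a direct conversion between defined types that share an underlying type, and the conversion cannot fail at run time.

```go
package main

import "fmt"

// PatchTypeV1 and PatchTypeV1beta1 are local stand-ins for the PatchType
// types in the admission v1 and v1beta1 API packages (assumed to be strings).
type PatchTypeV1 string
type PatchTypeV1beta1 string

// convertPatchType converts a *PatchTypeV1 to a *PatchTypeV1beta1.
// A nil input yields a nil output; the string value is copied unchanged.
func convertPatchType(in *PatchTypeV1) *PatchTypeV1beta1 {
	if in == nil {
		return nil
	}
	out := PatchTypeV1beta1(*in)
	return &out
}

func main() {
	pt := PatchTypeV1("JSONPatch")
	fmt.Println(*convertPatchType(&pt))
	fmt.Println(convertPatchType(nil) == nil)
}
```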

Contributor

@yuxiangqian yuxiangqian left a comment

WIP. First cut.

@@ -1285,7 +1302,7 @@ func (ctrl *csiSnapshotCommonController) addSnapshotFinalizer(snapshot *crdv1.Vo
 	}
 	_, err := ctrl.clientset.SnapshotV1beta1().VolumeSnapshots(snapshotClone.Namespace).Update(context.TODO(), snapshotClone, metav1.UpdateOptions{})
 	if err != nil {
-		return newControllerUpdateError(snapshot.Name, err.Error())
+		return newControllerUpdateError(utils.SnapshotKey(snapshot), err.Error())

This comment was marked as resolved.

Contributor Author

it looked like a typo, so i changed it
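
To illustrate why a namespaced key is preferable to the bare name in error messages, here is a sketch that assumes `utils.SnapshotKey` returns a `namespace/name` string. That format is an assumption and is not confirmed in this thread; the `snapshot` struct and `snapshotKey` helper are hypothetical stand-ins.

```go
package main

import "fmt"

// snapshot is a hypothetical stand-in holding only the metadata needed here.
type snapshot struct {
	Namespace string
	Name      string
}

// snapshotKey returns "namespace/name" (assumed format for this sketch).
func snapshotKey(s *snapshot) string {
	return fmt.Sprintf("%s/%s", s.Namespace, s.Name)
}

func main() {
	a := &snapshot{Namespace: "team-a", Name: "snap-1"}
	b := &snapshot{Namespace: "team-b", Name: "snap-1"}
	// The names collide, but the keys remain distinct.
	fmt.Println(snapshotKey(a))
	fmt.Println(snapshotKey(b))
}
```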

@xing-yang
Collaborator

Can you add a release note?

@yuxiangqian
Contributor

@AndiLi99 some of the unit tests are failing because of the newly added label for invalid CRs. Those unit test cases need to be fixed.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 20, 2020
@yuxiangqian
Contributor

@AndiLi99 please fix existing invalid tests which are failing due to newly added invalid labels.

@AndiLi99
Contributor Author

/retest

Contributor

@yuxiangqian yuxiangqian left a comment

couple of more nits.
@xing-yang @msau42 can you two take a look as well?

Build the docker image

```bash
docker build -t gcr.io/your-project-name/webhook:latest -f ./cmd/webhook/Dockerfile .
```
Contributor

@xing-yang @msau42 which gcr.io project should this repo use for webhook images?

Contributor

should it follow the similar pattern here?

Collaborator

Official released images will be like this: k8s.gcr.io/sig-storage/csi-snapshotter:v1.2.2
However, you are showing an example on how to build the image yourself so it won't be the official image. I'd suggest not to specify the image repo name.

docker build -t webhook:latest -f ./cmd/webhook/Dockerfile .

spec:
  containers:
    - name: snapshot-validation
      image: gcr.io/your-project-name/validation-webhook:latest # change the image if you wish to use your own custom validation server image
Contributor

please fix the image once my previous comments w.r.t image location is resolved.
example here

Collaborator

Since we are releasing this together with snapshot-controller and csi-snapshotter, the version will be v2.2.0.

k8s.gcr.io/sig-storage/validation-webhook:v2.2.0

}
return &tls.Config{
Certificates: []tls.Certificate{sCert},
// TODO: uses mutual tls after we agree on what cert the apiserver should use.
Contributor

is there action item for this TODO?

Contributor Author

No, this came from the example webhook. I believe it's for the case where the webhook server verifies the identity of the API server.
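
For context, a sketch of what mutual TLS could look like on the server side using only the Go standard library: the server presents its own certificate and also requires a client certificate signed by a trusted CA. This is not part of the PR; the function names are hypothetical, and which CA the API server's client certificate would chain to is left open here, as in the TODO.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// mutualTLSConfig returns a server-side config that presents serverCert and
// rejects clients whose certificate does not chain to a CA in clientCAs.
func mutualTLSConfig(serverCert tls.Certificate, clientCAs *x509.CertPool) *tls.Config {
	return &tls.Config{
		Certificates: []tls.Certificate{serverCert},
		ClientAuth:   tls.RequireAndVerifyClientCert,
		ClientCAs:    clientCAs,
		MinVersion:   tls.VersionTLS12,
	}
}

// requiresClientCert reports whether cfg demands a verified client certificate.
func requiresClientCert(cfg *tls.Config) bool {
	return cfg.ClientAuth == tls.RequireAndVerifyClientCert && cfg.ClientCAs != nil
}

// demoConfig builds a config from placeholder values (empty cert and CA pool),
// which is enough to inspect the settings but not to serve real traffic.
func demoConfig() *tls.Config {
	return mutualTLSConfig(tls.Certificate{}, x509.NewCertPool())
}

func main() {
	fmt.Println(requiresClientCert(demoConfig()))
}
```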

}
return decideSnapshotContent(snapcontent, oldSnapcontent, isCreate)
default:
err := fmt.Errorf("expect resource to be %s or %s", SnapshotV1Beta1GVR, SnapshotContentV1Beta1GVR)
Contributor

It's an interesting situation here. If a resource other than Snapshot/SnapshotContent has been misconfigured to use this webhook for validation, it will fail every time, which might break existing functionality. Should the error here be ignored rather than propagated to the API server?

Contributor Author

Should the error here be ignored rather than propagated to the API server?

No, we should fail fast. If they misconfigure it, they should know.
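
A simplified sketch of the fail-fast behavior discussed above: any resource the webhook was not written for is rejected with an explicit message instead of being silently allowed. The resource names are plain strings and the `decision` type is hypothetical; the real code compares `GroupVersionResource` values and returns an `AdmissionResponse`.

```go
package main

import "fmt"

// Hypothetical resource names standing in for the GVR constants.
const (
	snapshotResource        = "volumesnapshots"
	snapshotContentResource = "volumesnapshotcontents"
)

// decision is a simplified admission verdict.
type decision struct {
	Allowed bool
	Message string
}

// admit rejects any resource this webhook does not know how to validate,
// so a misconfigured ValidatingWebhookConfiguration surfaces immediately.
func admit(resource string) decision {
	switch resource {
	case snapshotResource, snapshotContentResource:
		// Real validation would run here; this sketch simply allows.
		return decision{Allowed: true}
	default:
		return decision{
			Allowed: false,
			Message: fmt.Sprintf("expect resource to be %s or %s, got %s",
				snapshotResource, snapshotContentResource, resource),
		}
	}
}

func main() {
	fmt.Println(admit("volumesnapshots").Allowed)
	fmt.Println(admit("pods").Message)
}
```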

// which allows the removal of finalizers and therefore deletion of this object.
// Don't rely on the pointers being nil, because the deserialization method will convert them to
// the empty struct value. Instead, check the operation type.
err := utils.ValidateSnapshot(oldSnapshot)
Contributor

nit, can be simplified as:
if err := utils.ValidateSnapshot(...); err != nil { return ... }
this way, you can move line 95 out of the "if isCreate" block and save the call at line 113.
WDYT?

Contributor Author

i cleaned it up differently, ptal

return *s
}
func checkSnapshotImmutableFields(snapshot, oldSnapshot *volumesnapshotv1beta1.VolumeSnapshot) error {
if snapshot == nil {
Contributor

nil checks are no longer needed

Contributor Author

I wouldn't want to make any assumptions about the caller. The ptr is never nil now, but that may change (future tests, deserialization change). Do you recommend changing it anyways?

}

func checkSnapshotContentImmutableFields(snapcontent, oldSnapcontent *volumesnapshotv1beta1.VolumeSnapshotContent) error {
if snapcontent == nil {
Contributor

nil checks are no longer needed.

@yuxiangqian
Contributor

@AndiLi99 please remove [WIP] from the PR title

@AndiLi99 AndiLi99 changed the title [WIP] Add boiler plate for snapshot webhook build and deployment Add boiler plate for snapshot webhook build and deployment Aug 25, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 25, 2020
kind: Deployment
metadata:
  name: snapshot-validation-deployment
  namespace: default # NOTE: change the namespace
Collaborator

Should this namespace be the same as the namespace of the snapshot-controller?

Contributor Author

It doesn't need to be, but it may be easier to organize that way.

@AndiLi99 AndiLi99 changed the title Add boiler plate for snapshot webhook build and deployment Add snapshot webhook build and deployment. Modify controller to label invalid objects. Aug 26, 2020
@AndiLi99
Contributor Author

I think this is at a good state. Please take a look @yuxiangqian @msau42 @xing-yang

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2020
https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md

1. Ratcheting validation webhook server image
2. Controller labels invalid objects
3. Unit tests for webhook
4. Deployment README and example deployment method with certs
5. Update top-level README

Ratcheting validation:
1. webhook is strict on create
2. webhook is strict on updates where the existing object passes strict validation
3. webhook is relaxed on updates where the existing object fails strict validation (allows finalizer removal, status update, deletion, etc)

Additionally, the validating webhook server will perform immutability
checks in scenario 2 above.
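
The three ratcheting rules and the immutability check above can be sketched as follows. This is an illustration under simplifying assumptions, not the PR's implementation: `object`, `strictValidate`, and `decide` are hypothetical, and a single `Source` field stands in for both the strict validation rule and the immutable field.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical resource with one field that must be set
// (strict validation) and must not change once set (immutability).
type object struct {
	Source string
}

// strictValidate is a stand-in for the stricter validation rules.
func strictValidate(o object) error {
	if o.Source == "" {
		return errors.New("source must be set")
	}
	return nil
}

// decide applies ratcheting validation to a create or update.
func decide(newObj, oldObj object, isCreate bool) error {
	// Rule 1: strict on create.
	if isCreate {
		return strictValidate(newObj)
	}
	// Rule 3: relaxed when the existing object already fails strict
	// validation, so it can still be fixed, updated, or deleted.
	if strictValidate(oldObj) != nil {
		return nil
	}
	// Rule 2: strict when the existing object passes strict validation.
	if err := strictValidate(newObj); err != nil {
		return err
	}
	// Immutability is checked only in the rule 2 scenario.
	if newObj.Source != oldObj.Source {
		return errors.New("source is immutable")
	}
	return nil
}

func main() {
	fmt.Println(decide(object{}, object{}, true))
	fmt.Println(decide(object{}, object{}, false))
	fmt.Println(decide(object{Source: "pvc-b"}, object{Source: "pvc-a"}, false))
}
```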
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 28, 2020
@xing-yang
Collaborator

/retest

Andi Li added 2 commits August 28, 2020 12:06
Minor cleanup and change default fail policy and timeout on webhook
config.
@@ -79,6 +81,22 @@ Install CSI Driver:
* kubectl create -f deploy/kubernetes/csi-snapshotter
* https://github.com/kubernetes-csi/external-snapshotter/tree/master/deploy/kubernetes/csi-snapshotter

### Validating Webhook
Collaborator

@msau42 msau42 Aug 28, 2020

@xing-yang @yuxiangqian let's look at this before we release, but I think it would be good to revamp the overview to make clearer what all the pieces in this repo are and who is responsible for each:

  • CRDs + webhook
  • controller
  • sidecar
  • client libraries

Collaborator

Sure.

@msau42
Collaborator

msau42 commented Aug 28, 2020

This lgtm! Will let @xing-yang and @yuxiangqian give final approvals.

Awesome work Andi!

@AndiLi99
Contributor Author

@msau42 Thank you so much! Everyone's help was instrumental in getting this done :)

Collaborator

@xing-yang xing-yang left a comment

/lgtm

Thanks Andi!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 28, 2020
@yuxiangqian
Contributor

/lgtm
/approve
GJ @AndiLi99

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndiLi99, yuxiangqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 29, 2020
@k8s-ci-robot k8s-ci-robot merged commit 4f3b02a into kubernetes-csi:master Aug 29, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Add labels for invalid snapshot objects
Create a webhook to do extra validation on beta APIs
6 participants