conformance: Test that webhook validations are performed #1514

youngnick · 2022-11-08T03:40:35Z

What would you like to be added:
This issue covers adding conformance tests that test that the implementation is implementing the same blocks as are implemented in the webhook. The simplest way to do this is to run the webhook we provide, but the conformance tests should be a bit agnostic about this.

However, I believe that the conformance test should check that invalid objects are never persisted into the cluster by trying to apply them, and checking that the apply fails. That flow will require some sort of webhook, or possibly the extensions to CRD validation api-machinery is considering (which I can't remember the name of right now).

A secondary part of this effort should also clarify some guidelines about what controllers should do with invalid objects - hopefully we can end up with this being as easy as "run a Validate function on the object, and drop the object if it fails" or something.

Why this is needed:
This will make it very clear that the webhook is required for having a conformant implementation.

This issue came up during discussion on #1497, but I think I've mentioned it before, and haven't written it down. Sorry all.

rainest · 2022-11-08T23:34:47Z

If there is a case where we expect that objects are accepted and reach controllers, how to test that behavior isn't immediately obvious. For example, in the case of #1497, if guidance was:

are invalid and will be rejected by the webhook is installed. Implementations MUST exclude these resources from configuration and add an `Accepted=false` condition with a `HeaderFilterNotValid` reason if the webhook is not running.

We can only test one or the other in a given run. If the webhook is running, the resource simply won't exist, and can't have conditions as such. If it is not running, tests can check conditions, but any test to confirm the webhook behavior (that the resource is rejected at creation time) would then fail. This would basically require two suites of conformance tests for behavior with or without the webhook, correct?

youngnick · 2022-11-10T04:15:03Z

Sigh, yes, you're right.

I think that if we're going to do this, we should lean in and say "we're not going to mandate anything about what happens if the webhook is not running, sorry".

shaneutt · 2022-11-10T14:09:34Z

Sigh, yes, you're right.

I think that if we're going to do this, we should lean in and say "we're not going to mandate anything about what happens if the webhook is not running, sorry".

This is inherently a problem with being out of tree, yeah?

If I'm not mistaken at monday's meeting Travis was basically asking us what we recommend as far as getting the webhook deployed on operator clusters, e.g. he was asking the question whether we recommend the webhook be loaded in with the existing (Go) operator. Today the onus is partially on cluster operators to ensure the webhook is installed, but it might be significantly better if implementations were packaging it. I know there's some hesitation to force everyone to run the webhook and I also understand there's some downsides (particularly with multiple implementations), but perhaps we should explore this further and see if we can't put the burden of ensuring the webhook's presence more heavily on the implementation so that cluster operators would have to opt out to start getting into grey territory rather than being in grey territory by default (as this may inevitably get us into trouble)?

rainest · 2022-11-10T17:36:21Z

Putting responsibility on implementations to run it, either as a separate container or integrated into a Golang controller, seems reasonable--that's basically what this test would enforce.

This is inherently a problem with being out of tree, yeah?

Would the situation change at all post-GA? I know that Ingress has more complex validation than schema enforcement alone, and it seems like we'd be able to add most if not all validation rules in the webhook there.

I don't know if maybe there's an effort to avoid adding additional validations there going forward, but if we would eventually be able to add them there, we'd be able to plan for eventually not worrying about the webhook being installed separately.

robscott · 2022-11-11T01:24:42Z

I've been thinking about this for some time now, and I tend to lean in a different direction. I think that we need to differentiate between a "conformant environment" and a "conformant implementation". In some instances, these are going to be one and the same, but in others, the controller author only has so much control. I'm not sure how useful it is for our controller-focused conformance tests to require the webhook to be present.

I'd rather test that controllers don't blow up when presented with invalid resources. Most controllers can't control how they're installed, they can simply provide a recommended set of steps. Our docs currently state the following:

CRD and webhook validation is not the final validation i.e. webhook is "nice UX" but not schema enforcement. This validation is intended to provide immediate feedback to users when they provide an invalid configuration. Write code defensively...

I still think the webhook is valuable and improves the UX of Gateway API dramatically, but I'm not convinced that it can/should be relied on or required to be present when running conformance tests.

Relates to kubernetes-sigs#1514 This is a proof-of-concept conformance test that checks for the validation behavior provided by the admission webhook. In this case we're just checking that an update to a GatewayClass controller name is rejected. We use a temporary GatewayClass rather than the one specified by the --gateway-class, to avoid messing up installations that do allow this update through. This does raise an interesting question though - isn't it strange that we're saying an implementation is required to validate objects that it should otherwise ignore? In this case, we would would otherwise say an implementation should ingore this object because it doesn't recognize the controller name. Failure looks like this: ``` === RUN TestConformance/GatewayClassAdmissionValidation/GatewayClass_controllerName_is_immutable gatewayclass-admission-validation.go:62: Error Trace: gatewayclass-admission-validation.go:62 Error: Should be in error chain: expected: %!q(**errors.StatusError=0xc000014008) in chain: Test: TestConformance/GatewayClassAdmissionValidation/GatewayClass_controllerName_is_immutable Messages: updating gatewayclass-immutable GatewayClass.Spec.ControllerName should not be permitted ```

markmc · 2022-11-14T15:37:09Z

Just for discussion purposes, I put up #1534

Some thoughts:

It is definitely highly desirable to have validation like this - even the simple example of immutable GatewayClass.spec.controllerName shows how valuable it is in terms of UX
If we believe this should usually be achieved by installing our webhook, then this does feel like it's more about a "conformant environment" than a "conformant implementation"
If we want to check for a "conformant environment", it does seem a bit pointless to test all of the behaviors of the webhook - we can test the webhook before we publish its image!
In gateway-api: Add opt-in validating webhook istio/istio#41868 the proposal (AIUI) is to use library code from gateway-api to provide some admission validation checks - if we were providing a comprehensive validation library instead of an installable webhook, it might feel a bit more like we're trying to enforce that "conformant implementations" are providing these validations?
The GatewayClass.spec.controllerName example shows another weirdness with the "conformant implementation" approach - why would we require an implementation to validate objects that it would normally ignore (i.e. because it doesn't recognize the controller name) in order to be conformant? You'd expect we'd only require an implementation to validate objects it is accepting?

HTH

howardjohn · 2022-11-14T16:53:20Z

I feel like we are testing conformant environment. The environment really is the implementation. If we look at k8s for example, we aren't just testing some api-server binary, we are testing "Kubernetes" which consists of an environment many components. Gateway API is a much smaller environment, but fairly similar.

An environment may get the validation from the built in installable webhook, the Istio library import, or even something else - all that matters is it validates.

controllerName

I think its just a bit of a quirk of mixing global resources and shared resources with multiple implementations, the lines get a bit blurred. For example, you may have 1 validator for the entire cluster (in which case you want cross-class validation), or you may have N implementations doing validations (in which case you might want same-class only validation). Trying to make sure all environments have coverage of all resources is tricky - especially because Kubernetes isn't smart enough to know about "class" in the webhook, so it will actually still need to call all the validators, they could just chose to NOP. But if they are already called, it feels like they might as well validate as well. That does lead to making validation potentially stricter (if I have multiple versions, we are applying both versions of validation at once), but that feels OK to me.

So overall, I think it seems weird especially if we just look at GatewayClass, but it still feels like the best option to me, and its less weird for other fields.

youngnick · 2022-11-15T05:56:51Z

I think we've definitely had some good points raised here, so here's a new proposal from me.

What I'm really aiming for is this: the webhook currently implements some important checks for various types of object safety. I'd like to verify that a conformant implementation is doing some form of the same checks. It's a much better user experience if the invalid objects never make it into the apiserver (this is what the validation webhook is doing), but I think it's probably okay for implementations to not accept the invalid objects as well (this will handle @robscott's concern about invalid objects somehow making it through).

What I think we should work towards is a set of conformance tests (maybe a separate suite), that test that invalid objects (as defined in the webhook code) are either not accepted (ie, the webhook or some other webhook that performs the same checks is running), or that the objects are Accepted, status: false, with a Reason of FailedValidation (this is the case that the webhook is not running, but the implementation performs the same validations and updates the status accordingly).

For this to work, we'll need to ensure it's easy to import the validation code and either:

use it in a validation webhook
validate objects as part of normal implementation functions.

We already include the validation logic in release bundles, but we could make this easier by providing a simplified public API that will be easier to call (maybe a Validate function that takes a Kubenetes object.Object and then typecasts it and runs the type-specific checks?), and moving usage of it in our webhook to that.

Alternatively, we have a pretty clear pattern for validation functions in our current validation package. We could stick with what we have, then just mandate that:

implementations need to validate resources as part of reconciliation, and not Accept them with a ValidationFailed reason.
or, implementations can ensure the webhook is running
or both.

This approach would let the conformance tests validate the important thing - that the restrictions described in the webhook, which aren't easily expressible in field-level validations, are in effect. Whether those restrictions are done in a webhook or in the implementation doesn't seem as important to me as that they happen somewhere.

robscott · 2022-11-21T06:35:41Z

Not fixed yet, looks like this was closed by an unrelated PR.

shaneutt · 2023-04-10T21:49:51Z

This is still something we'd like to have:

/help

And ideally we would have it prior to our GA release:

/priority important-soon

As such we'll drop it in the v1.0.0 milestone and see if we can get progress on it soon.

shaneutt · 2024-03-12T16:06:51Z

We have deprecated the webhook, so it would seem we don't need to prioritize this any longer.

youngnick added kind/feature Categorizes issue or PR as related to a new feature. area/conformance labels Nov 8, 2022

youngnick mentioned this issue Nov 8, 2022

Add and test single filter action per header restriction #1497

Merged

shaneutt added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 8, 2022

markmc mentioned this issue Nov 14, 2022

[WIP] conformance: require that GatewayClass controller name is immutable #1534

Closed

Xunzhuo mentioned this issue Nov 21, 2022

fix: multi-push ci failed #1552

Merged

k8s-ci-robot closed this as completed in #1552 Nov 21, 2022

robscott reopened this Nov 21, 2022

robscott mentioned this issue Dec 5, 2022

Conformance Tests: TLSRoute Passthrough #1579

Open

16 tasks

shaneutt added this to Gateway API: The Road to GA Mar 8, 2023

shaneutt moved this to Triage in Gateway API: The Road to GA Mar 8, 2023

k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 10, 2023

shaneutt moved this from Triage to Backlog in Gateway API: The Road to GA Apr 10, 2023

shaneutt added this to the v1.0.0 milestone Apr 10, 2023

shaneutt moved this from Backlog to Next in Gateway API: The Road to GA Jul 3, 2023

shaneutt mentioned this issue Jul 24, 2023

Introduce validations through CEL for Gateway and GatewayClass. #2226

Merged

shaneutt removed this from Gateway API: The Road to GA Sep 27, 2023

shaneutt removed this from the v1.0.0 milestone Sep 27, 2023

shaneutt closed this as not planned Won't fix, can't repro, duplicate, stale Mar 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

conformance: Test that webhook validations are performed #1514

conformance: Test that webhook validations are performed #1514

youngnick commented Nov 8, 2022

rainest commented Nov 8, 2022

youngnick commented Nov 10, 2022

shaneutt commented Nov 10, 2022 •

edited

Loading

rainest commented Nov 10, 2022

robscott commented Nov 11, 2022

markmc commented Nov 14, 2022

howardjohn commented Nov 14, 2022

youngnick commented Nov 15, 2022

robscott commented Nov 21, 2022 •

edited

Loading

shaneutt commented Apr 10, 2023

shaneutt commented Mar 12, 2024

conformance: Test that webhook validations are performed #1514

conformance: Test that webhook validations are performed #1514

Comments

youngnick commented Nov 8, 2022

rainest commented Nov 8, 2022

youngnick commented Nov 10, 2022

shaneutt commented Nov 10, 2022 • edited Loading

rainest commented Nov 10, 2022

robscott commented Nov 11, 2022

markmc commented Nov 14, 2022

howardjohn commented Nov 14, 2022

youngnick commented Nov 15, 2022

robscott commented Nov 21, 2022 • edited Loading

shaneutt commented Apr 10, 2023

shaneutt commented Mar 12, 2024

shaneutt commented Nov 10, 2022 •

edited

Loading

robscott commented Nov 21, 2022 •

edited

Loading