conformance: Test that webhook validations are performed #1514
If there is a case where we expect that objects are accepted and reach controllers, how to test that behavior isn't immediately obvious. For example, in the case of #1497, suppose the guidance was: these objects are invalid and will be rejected if the webhook is installed, and implementations MUST exclude these resources from configuration and add an `Accepted=false` condition with a `HeaderFilterNotValid` reason if the webhook is not running. We can only test one or the other in a given run. If the webhook is running, the resource simply won't exist, and so can't have conditions. If it is not running, tests can check conditions, but any test confirming the webhook behavior (that the resource is rejected at creation time) would then fail. This would basically require two suites of conformance tests, for behavior with and without the webhook, correct?
Sigh, yes, you're right. I think that if we're going to do this, we should lean in and say "we're not going to mandate anything about what happens if the webhook is not running, sorry".
This is inherently a problem with being out of tree, yeah? If I'm not mistaken, at Monday's meeting Travis was basically asking what we recommend for getting the webhook deployed on operators' clusters, e.g. whether we recommend the webhook be loaded in with the existing (Go) operator. Today the onus is partially on cluster operators to ensure the webhook is installed, but it might be significantly better if implementations were packaging it. I know there's some hesitation to force everyone to run the webhook, and I also understand there are some downsides (particularly with multiple implementations), but perhaps we should explore this further and see if we can't put the burden of ensuring the webhook's presence more heavily on the implementation. That way cluster operators would have to opt out to get into grey territory, rather than being in grey territory by default (which may inevitably get us into trouble).
Putting responsibility on implementations to run it, either as a separate container or integrated into a Golang controller, seems reasonable--that's basically what this test would enforce.
Would the situation change at all post-GA? I know that Ingress has more complex validation than schema enforcement alone, and it seems like we'd be able to add most if not all validation rules in the webhook there. I don't know if maybe there's an effort to avoid adding additional validations there going forward, but if we would eventually be able to add them there, we'd be able to plan for eventually not worrying about the webhook being installed separately.
I've been thinking about this for some time now, and I tend to lean in a different direction. I think that we need to differentiate between a "conformant environment" and a "conformant implementation". In some instances, these are going to be one and the same, but in others, the controller author only has so much control. I'm not sure how useful it is for our controller-focused conformance tests to require the webhook to be present. I'd rather test that controllers don't blow up when presented with invalid resources. Most controllers can't control how they're installed, they can simply provide a recommended set of steps. Our docs currently state the following:
I still think the webhook is valuable and improves the UX of Gateway API dramatically, but I'm not convinced that it can/should be relied on or required to be present when running conformance tests.
Relates to kubernetes-sigs#1514. This is a proof-of-concept conformance test that checks for the validation behavior provided by the admission webhook. In this case we're just checking that an update to a GatewayClass controller name is rejected. We use a temporary GatewayClass rather than the one specified by `--gateway-class`, to avoid messing up installations that do allow this update through.

This does raise an interesting question though: isn't it strange that we're saying an implementation is required to validate objects that it should otherwise ignore? In this case, we would otherwise say an implementation should ignore this object because it doesn't recognize the controller name.

Failure looks like this:

```
=== RUN TestConformance/GatewayClassAdmissionValidation/GatewayClass_controllerName_is_immutable
gatewayclass-admission-validation.go:62:
    Error Trace: gatewayclass-admission-validation.go:62
    Error:       Should be in error chain:
                 expected: %!q(**errors.StatusError=0xc000014008)
                 in chain:
    Test:        TestConformance/GatewayClassAdmissionValidation/GatewayClass_controllerName_is_immutable
    Messages:    updating gatewayclass-immutable GatewayClass.Spec.ControllerName should not be permitted
```
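For discussion's sake, the rule that PoC test exercises can be sketched without any Kubernetes machinery. The type and function names below are hypothetical stand-ins, not the actual webhook code:

```go
package main

import (
	"errors"
	"fmt"
)

// GatewayClassSpec is a trimmed-down stand-in for the real API type.
type GatewayClassSpec struct {
	ControllerName string
}

// validateGatewayClassUpdate mirrors the rule under test:
// spec.controllerName must not change after creation.
func validateGatewayClassUpdate(oldSpec, newSpec GatewayClassSpec) error {
	if oldSpec.ControllerName != newSpec.ControllerName {
		return errors.New("spec.controllerName is immutable")
	}
	return nil
}

func main() {
	oldSpec := GatewayClassSpec{ControllerName: "example.com/controller"}
	newSpec := GatewayClassSpec{ControllerName: "other.example.com/controller"}
	// An unchanged name passes; a changed name is rejected.
	fmt.Println(validateGatewayClassUpdate(oldSpec, oldSpec))
	fmt.Println(validateGatewayClassUpdate(oldSpec, newSpec))
}
```

Whether this check runs in an admission webhook or somewhere else is exactly the question the thread is circling.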
Just for discussion purposes, I put up #1534 Some thoughts:
HTH
I feel like we are testing a conformant environment. The environment really is the implementation. If we look at k8s for example, we aren't just testing some api-server binary, we are testing "Kubernetes", which is an environment consisting of many components. Gateway API is a much smaller environment, but fairly similar. An environment may get the validation from the built-in installable webhook, the Istio library import, or even something else - all that matters is that it validates.
I think it's just a bit of a quirk of mixing global resources and shared resources with multiple implementations; the lines get a bit blurred. For example, you may have 1 validator for the entire cluster (in which case you want cross-class validation), or you may have N implementations doing validations (in which case you might want same-class-only validation). Trying to make sure all environments have coverage of all resources is tricky - especially because Kubernetes isn't smart enough to know about "class" in the webhook, so it will actually still need to call all the validators; they could just choose to NOP. But if they are already called, it feels like they might as well validate as well. That does make validation potentially stricter (if I have multiple versions, we are applying both versions of validation at once), but that feels OK to me. So overall, I think it seems weird especially if we just look at GatewayClass, but it still feels like the best option to me, and it's less weird for other fields.
I think we've definitely had some good points raised here, so here's a new proposal from me. What I'm really aiming for is this: the webhook currently implements some important checks for various types of object safety. I'd like to verify that a conformant implementation is doing some form of the same checks. It's a much better user experience if the invalid objects never make it into the apiserver (this is what the validation webhook is doing), but I think it's probably okay for implementations to not accept the invalid objects as well (this will handle @robscott's concern about invalid objects somehow making it through). What I think we should work towards is a set of conformance tests (maybe a separate suite) that test that invalid objects (as defined in the webhook code) are either not accepted (i.e., the webhook or some other webhook that performs the same checks is running), or that the objects are persisted but marked with an `Accepted=false` condition by the implementation. For this to work, we'll need to ensure it's easy to import the validation code, and either:
- We already include the validation logic in release bundles, but we could make this easier by providing a simplified public API that will be easier to call (maybe a `Validate` function that takes a Kubernetes object).
- Alternatively, we have a pretty clear pattern for validation functions in our current codebase that implementations could follow.
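A simplified public entry point like the one floated above might look roughly like this. The `Validate` signature, the trimmed-down types, and the specific rules are all hypothetical, sketched only to show the shape of a single importable call:

```go
package main

import "fmt"

// GatewayClass and HTTPRoute are minimal stand-ins for the real typed
// objects; a real entry point would likely accept runtime.Object.
type GatewayClass struct {
	ControllerName string
}

type HTTPRoute struct {
	Hostnames []string
}

// Validate is a sketch of a single public function that dispatches on
// the concrete type and returns all validation errors found.
func Validate(obj interface{}) []error {
	var errs []error
	switch o := obj.(type) {
	case *GatewayClass:
		if o.ControllerName == "" {
			errs = append(errs, fmt.Errorf("spec.controllerName must not be empty"))
		}
	case *HTTPRoute:
		if len(o.Hostnames) > 16 {
			errs = append(errs, fmt.Errorf("spec.hostnames: too many entries"))
		}
	}
	return errs
}

func main() {
	fmt.Println(len(Validate(&GatewayClass{ControllerName: "example.com/c"})))
	fmt.Println(len(Validate(&GatewayClass{})))
}
```

The point of the sketch is the calling convention: one function, one object in, a list of errors out, usable identically from a webhook or from a controller.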
This approach would let the conformance tests validate the important thing: that the restrictions described in the webhook, which aren't easily expressible in field-level validations, are in effect. Whether those restrictions are enforced in a webhook or in the implementation doesn't seem as important to me as that they happen somewhere.
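The either/or acceptance criterion described above could be expressed as a single helper in a conformance suite. Everything below is an illustrative sketch with made-up names, not existing suite code:

```go
package main

import "fmt"

// Condition is a minimal stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// invalidObjectHandled reports whether an implementation conformantly
// handled an invalid object: either the create was rejected outright
// (some webhook is enforcing the rules), or the object was persisted
// but carries an Accepted=False condition set by the controller.
func invalidObjectHandled(createErr error, conditions []Condition) bool {
	if createErr != nil {
		return true // rejected at admission time
	}
	for _, c := range conditions {
		if c.Type == "Accepted" && c.Status == "False" {
			return true // persisted but explicitly not accepted
		}
	}
	return false
}

func main() {
	persisted := []Condition{{Type: "Accepted", Status: "False", Reason: "HeaderFilterNotValid"}}
	fmt.Println(invalidObjectHandled(nil, persisted)) // accepted into apiserver, but flagged
	fmt.Println(invalidObjectHandled(nil, nil))       // silently accepted: non-conformant
}
```

A test built on this helper passes in both environments, which is what makes the suite agnostic about whether the webhook is installed.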
Not fixed yet, looks like this was closed by an unrelated PR. |
This is still something we'd like to have:

/help

And ideally we would have it prior to our GA release:

/priority important-soon

As such we'll drop it in the
We have deprecated the webhook, so it would seem we don't need to prioritize this any longer. |
What would you like to be added:
This issue covers adding conformance tests that verify the implementation enforces the same validation rules as the webhook. The simplest way to do this is to run the webhook we provide, but the conformance tests should be somewhat agnostic about this.
However, I believe that the conformance test should check that invalid objects are never persisted into the cluster by trying to apply them, and checking that the apply fails. That flow will require some sort of webhook, or possibly the extensions to CRD validation api-machinery is considering (which I can't remember the name of right now).
A secondary part of this effort should also clarify some guidelines about what controllers should do with invalid objects - hopefully we can end up with this being as easy as "run a Validate function on the object, and drop the object if it fails" or something.
Why this is needed:
This will make it very clear that the webhook is required for having a conformant implementation.
This issue came up during discussion on #1497, but I think I've mentioned it before, and haven't written it down. Sorry all.