Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Envoy returns 403 for failed gRPC ext_authz requests #6119

Closed
hanyu-liu opened this issue Feb 27, 2019 · 20 comments · Fixed by #6669
Closed

Envoy returns 403 for failed gRPC ext_authz requests #6119

hanyu-liu opened this issue Feb 27, 2019 · 20 comments · Fixed by #6669
Labels
design proposal Needs design doc/proposal before implementation

Comments

@hanyu-liu
Copy link

Title: Envoy returns 403 for failed gRPC ext_authz requests

Description:
The gRPC implementation for ext_authz returns 403(forbidden), when the request to the gRPC server is failed on connection error. It can be an temporary error. However, 403 doesn't recommend an automatic retry.

In order to have client retry, can we change it to 503?

https://httpstatuses.com/403

Config:

          http_filters:
          - name: envoy.ext_authz
            config:
              failure_mode_allow: false
              grpc_service:
                envoy_grpc:
                  cluster_name: token-grpc
                timeout: 10.0s

Call Stack:
The FORBIDDEN is hard coded at: https://github.com/envoyproxy/envoy/blob/master/source/extensions/filters/common/ext_authz/ext_authz_grpc_impl.cc#L77

@alyssawilk alyssawilk added the design proposal Needs design doc/proposal before implementation label Feb 28, 2019
@alyssawilk
Copy link
Contributor

Could you add a bit of clarification why you want to retry authz failure? I'm unsure of your use case but most authz failures I've seen have been pretty deterministic, such that retries are going to result in another failure.

@hanyu-liu
Copy link
Author

hanyu-liu commented Feb 28, 2019

Sure.

When an existing gRPC service is un-deployed, Envoy may run into a race condition with connection closure - the tcp package is out before FIN is received by Envoy. As such, the request is failed due to network issue.

I'm referring to the failures due to network issue, rather than explicit failed authorization. As network is not always reliable anyway, I suppose it not a 4xx, since it's more on the server side.

Please feel free to let me know if anything is unclear to you.

@alyssawilk
Copy link
Contributor

Ah, I think I've got you. Seems reasonable to me but I think @gsagula is likely to have a more informed opinion. Gabriel, any thoughts?

@gsagula
Copy link
Member

gsagula commented Mar 1, 2019

@hanyu-liu I see your point. The hard-coded 403 may seem a bit opinionated, but do you really want to give away what's going on with the authorization system? It's not uncommon for authorization servers to always return unauthorized or forbidden, and then treat underlying issues internally. However, I think we could enhance the filter by making it configurable. Please, let me know what you think. Thanks!

@hanyu-liu
Copy link
Author

I feel that clients may get confused by the 403 for an invalid authorization. I am okay with a hard coded 503 returned to client. I can probably make the PR myself, if it sounds good to you.

It also sounds good to me to have a configurable response code for the purpose of backward compatibility. @alyssawilk @gsagula

@gsagula
Copy link
Member

gsagula commented Mar 1, 2019

Thanks! It sounds good to me.

hanyu-liu added a commit to hanyu-liu/envoy that referenced this issue Mar 1, 2019
Return 503 from ext_authz on network failures

Risk Level: Low
Testing: CI
Docs Changes: n/a
Release Notes: n/a

envoyproxy#6119
Signed-off-by: Hanyu Liu <[email protected]>
hanyu-liu added a commit to hanyu-liu/envoy that referenced this issue Mar 8, 2019
Return 503 from ext_authz on network failures

Risk Level: Low
Testing: CI
Docs Changes: n/a
Release Notes: n/a

envoyproxy#6119
Signed-off-by: Hanyu Liu <[email protected]>
@hanyu-liu
Copy link
Author

hanyu-liu commented Mar 14, 2019

Continuing the talk from #6148.

Do you actually think an enum can work well for status_code_on_error? Don't want users to abuse the system by returning like 200, when the ext_authz is down.

Alternatively, can we fix it by introducing a hard-coded error code as a breaking change in the next major release?

@hanyu-liu
Copy link
Author

Link to #5974, who suffers from similar symptoms on http ext_authz

@gdheller42
Copy link

I have the same issue with ext_authz. When I remove or update the ext_authz service. (I run 2 instances in a cluster), the client may have to do multiple retries before the removed service endpoint is properly removed from load balancing. Can the filter be configured to do retries much as a route would. I don't believe the status code received by the downstream would need to be changed in this case. I like to know that the failure (4xx) is from the ext_authz service and not the target upstream.

Link to #5974, who suffers from similar symptoms on http ext_authz

@gsagula
Copy link
Member

gsagula commented Mar 28, 2019

I have this done, just need to write a test to it.

@gdheller42
Copy link

@gsagula any change in status?

@gsagula
Copy link
Member

gsagula commented Apr 22, 2019

@gdheller42 @hanyu-liu ^^

@hanyu-liu
Copy link
Author

Looks like it will not work for gRPC? @gsagula

@gdheller42 @hanyu-liu ^^

@gsagula
Copy link
Member

gsagula commented May 17, 2019

@hanyu-liu Why not?

@hanyu-liu
Copy link
Author

hanyu-liu commented May 17, 2019

@gsagula
Copy link
Member

gsagula commented May 17, 2019

This is the default response code and it will get returned to the client unless the different code is configured on the filter level.

@gdheller42
Copy link

@gsagula Has this been merged into master yet? Thanks.

@gsagula
Copy link
Member

gsagula commented May 17, 2019

Not yet, I'm addressing the last comment. It should get merged soon though.

@gdheller42
Copy link

@gsagula Thanks for the update..

@gdheller42
Copy link

@gsagula pulled a docker image of the update. still need to grab v2 api and update my auth code. Will test next week. Thanks Have a good weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design proposal Needs design doc/proposal before implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants