Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kong upstream passive health check uses the HTTP code set by a plugin instead of the one returned by the target server #10281

Closed
1 task done
shanexguo opened this issue Feb 10, 2023 · 11 comments · Fixed by #10325

Comments

@shanexguo
Copy link

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

3.0.0.0

Current Behavior

We noticed that when our custom Kong plugin set HTTP code to '502' via its exit hook ( function exit(status, body, headers) ), the involved upstream's passive unhealthy healthcheck using HTTP status code will count this as an incident of "unhealthy" state. When this occurs a few times in a short duration, as in a load test, the upstream will report "{"message":"failure to get a peer from the ring-balancer"}" in the errors log, essentially mark the upstream as "down".

Expected Behavior

We expect the upstream passive unhealthy healthcheck on HTTP status code should only mark the targets as "down" (not reachable or proxiable) using the returned HTTP status code from the target, and it should not be affected by the final HTTP status code set by Kong plugin, which is returned to the client.

Steps To Reproduce

On Kong 3.0.0.0:

  1. Create a custom Kong plugin that contains the following:
    1a) In its init_worker(), register itself as an exit hook:
    kong.response.register_hook("exit", self.exit, self)
    1b) In its body_filter function, generate an exit event as:
    return kong.response.exit(500)
    1c) In its exit hook function exit(status, body, headers), simply add a return statement as:
    return status, body, headers

  2. Create a Kong upstream with a target server, e.g. you can use any HTTP echo service, like http://httpbin.org/headers as the target server. Make sure the passive unhealthy health check is enabled. Leave everything else as default

  3. Assign the upstream to a Kong service, and then attach the custom plugin to the service

  4. Send a burst of HTTP requests that triggers the service and the cusotm plugin, and verify in the errors log that you see "{"message":"failure to get a peer from the ring-balancer"}"

Anything else?

O/S: Linux

@shanexguo
Copy link
Author

shanexguo commented Feb 13, 2023

Attached is a sample plugin that demonstrates the issue:

The plugin

It inserts some headers in its access() & response() functions. While doing this in the response(), it also generates an exit event with the HTTP status code 502, and attached the original HTTP body & headers it received from the upstream. Since it also registers itself with an exit hook, its own exit() function will then get called. All relevant debugging info gets logged in Kong's error log. This simulates our own custom plugins.

The Pongo tests

To test it, start Pongo docker container using the "pongo up && pongo shell" command. Inside the Pongo shell, run the 03 spec file, as:
busted /kong-plugin/spec/api-version/03-integration_spec.lua

The test sends 3 requests to the same URL ["https://httpbin.org/anything"] via configured upstreams. To observe the bug, the above tests invoke the same request 3 times in a row. Since the plugin will exit with the HTTP code "502", which is specified as one of the upstream's passive unhealthy HTTP status codes, the Kong load balancer for the upstream will pass the first HTTP requests and fail the later HTTP requests. You will also see that Kong will return HTTP code 503 with the message '{"message":"failure to get a peer from the ring-balancer"}'. Note: 503 is set by Kong, not the plugin. This happens because Kong uses the plugin's HTTP status code, instead of the upstream's original HTTP status code, during the upstream's passive unhealthy health check. We believe this is a bug since Kong should always use the upstream's original HTTP status code to evaluate if the upstream targets are healthy or not. From the error log, you can see that the original status code is always 200.

To further verify the above, you can remove the '502' status code from the upstream's passive unhealthy HTTP status code check in the 03-integration_spec.lua, and put it as a passive healthy HTTP status code. Then you should see all the 3 requests run successfully.

@shanexguo
Copy link
Author

kong-api-version-plugin.tar.gz
Here is the sample plugin and the Pongo test files described above.

@samugi samugi added the bug label Feb 14, 2023
@locao
Copy link
Contributor

locao commented Feb 14, 2023

Hi @shanexguo! Thanks for your report. That's a known behavior, we are discussing if we should change it in the future or we should document it better.

dndx pushed a commit that referenced this issue Mar 1, 2023
We expect the upstream passive unhealthy healthcheck on HTTP status code should only mark the targets as "down" (not reachable or proxiable) using the returned HTTP status code from the target, and it should not be affected by the final HTTP status code set by Kong plugin, which is returned to the client.

So we change the passive health check implementation to use nginx var `upstream_code`, this will not be changed by any plugin.

FTI-4841
Fix #10281
@shanexguo
Copy link
Author

shanexguo commented Mar 1, 2023 via email

@locao
Copy link
Contributor

locao commented Mar 1, 2023

This change will be included in Kong 3.3.

@shanexguo
Copy link
Author

shanexguo commented Mar 3, 2023 via email

@hbagdi
Copy link
Member

hbagdi commented Mar 3, 2023

Roughly middle of May.

@shanexguo
Copy link
Author

shanexguo commented Mar 8, 2023 via email

@locao
Copy link
Contributor

locao commented Mar 8, 2023

Hi @shanexguo,

Yes, Kong EE 3.0 is still under full support. Could you contact your technical support representative to open a ticket, so we can track this down and get it prioritized, please?

Thank you!

@shanexguo
Copy link
Author

shanexguo commented Mar 9, 2023 via email

@x108207
Copy link

x108207 commented Mar 10, 2023

locao pushed a commit that referenced this issue Mar 17, 2023
We expect the upstream passive unhealthy healthcheck on HTTP status code should only mark the targets as "down" (not reachable or proxiable) using the returned HTTP status code from the target, and it should not be affected by the final HTTP status code set by Kong plugin, which is returned to the client.

So we change the passive health check implementation to use nginx var `upstream_code`, this will not be changed by any plugin.

FTI-4841
Fix #10281

(cherry picked from commit dbe8d94)
locao pushed a commit that referenced this issue Mar 17, 2023
We expect the upstream passive unhealthy healthcheck on HTTP status code should only mark the targets as "down" (not reachable or proxiable) using the returned HTTP status code from the target, and it should not be affected by the final HTTP status code set by Kong plugin, which is returned to the client.

So we change the passive health check implementation to use nginx var `upstream_code`, this will not be changed by any plugin.

FTI-4841
Fix #10281

(cherry picked from commit dbe8d94)
locao pushed a commit that referenced this issue Mar 17, 2023
We expect the upstream passive unhealthy healthcheck on HTTP status code should only mark the targets as "down" (not reachable or proxiable) using the returned HTTP status code from the target, and it should not be affected by the final HTTP status code set by Kong plugin, which is returned to the client.

So we change the passive health check implementation to use nginx var `upstream_code`, this will not be changed by any plugin.

FTI-4841
Fix #10281

(cherry picked from commit dbe8d94)
locao pushed a commit that referenced this issue Mar 17, 2023
We expect the upstream passive unhealthy healthcheck on HTTP status code should only mark the targets as "down" (not reachable or proxiable) using the returned HTTP status code from the target, and it should not be affected by the final HTTP status code set by Kong plugin, which is returned to the client.

So we change the passive health check implementation to use nginx var `upstream_code`, this will not be changed by any plugin.

FTI-4841
Fix #10281

(cherry picked from commit dbe8d94)
bungle pushed a commit that referenced this issue Mar 21, 2023
We expect the upstream passive unhealthy healthcheck on HTTP status code should only mark the targets as "down" (not reachable or proxiable) using the returned HTTP status code from the target, and it should not be affected by the final HTTP status code set by Kong plugin, which is returned to the client.

So we change the passive health check implementation to use nginx var `upstream_code`, this will not be changed by any plugin.

FTI-4841
Fix #10281

(cherry picked from commit dbe8d94)
bungle pushed a commit that referenced this issue Mar 21, 2023
We expect the upstream passive unhealthy healthcheck on HTTP status code should only mark the targets as "down" (not reachable or proxiable) using the returned HTTP status code from the target, and it should not be affected by the final HTTP status code set by Kong plugin, which is returned to the client.

So we change the passive health check implementation to use nginx var `upstream_code`, this will not be changed by any plugin.

FTI-4841
Fix #10281

(cherry picked from commit dbe8d94)
bungle pushed a commit that referenced this issue Mar 21, 2023
We expect the upstream passive unhealthy healthcheck on HTTP status code should only mark the targets as "down" (not reachable or proxiable) using the returned HTTP status code from the target, and it should not be affected by the final HTTP status code set by Kong plugin, which is returned to the client.

So we change the passive health check implementation to use nginx var `upstream_code`, this will not be changed by any plugin.

FTI-4841
Fix #10281

(cherry picked from commit dbe8d94)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants