Health check extension returns 200 status code during errors #8276
Comments
I'm assigning this to myself as I'm the code owner, but I believe we haven't yet implemented reporting of the state of individual components.
@jpkrohling sorry, this might be a red herring, when I use
More context: when running a traces exporter like otlp or kafka, the TCP connection sometimes dies and there is no built-in connection restart, so the exporter queue starts filling up. I want to restart otel-collector to re-establish connections, ideally before data loss occurs when the exporter queue capacity is hit. Exposing queue usage as a metric (open-telemetry/opentelemetry-collector#4902) and then configuring a percent-of-capacity threshold for health check failure, e.g. "signal unhealthy when capacity reaches 95%", would be a way to prevent data loss.
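If the queue metrics proposed in open-telemetry/opentelemetry-collector#4902 were exposed through the collector's internal Prometheus telemetry, the threshold described above could be approximated externally with an alerting rule. This is only a sketch: the metric names otelcol_exporter_queue_size and otelcol_exporter_queue_capacity and the 95% threshold are assumptions, not something the health check extension supports today.

```yaml
# Hypothetical Prometheus alerting rule; assumes the queue size/capacity
# metrics from opentelemetry-collector#4902 are exposed as
# otelcol_exporter_queue_size / otelcol_exporter_queue_capacity.
groups:
  - name: otel-collector-exporter-queue
    rules:
      - alert: ExporterQueueNearCapacity
        expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.95
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Exporter queue for {{ $labels.exporter }} is above 95% of capacity"
```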
I reported the same issue in #11780, with more technical details (e.g. explaining why the health check initially serves status 500 but reverts to 200 after a minute).
Pinging code owners: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments if you do not have permissions to add labels yourself. Pinging code owners: @jpkrohling.
I still have this on my queue.
This issue has been closed as inactive because it has been stale for 120 days with no activity.
Describe the bug
When I simulate exporter errors and use the health check extension with check_collector_pipeline enabled, the health check still responds with a 200 status code.
Steps to reproduce
run v0.46.0 with this config.yaml
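The config from the original report is not shown here. A minimal sketch along the lines described (the failing OTLP endpoint, interval, and exporter_failure_threshold values are assumptions) would be:

```yaml
# Minimal sketch, not the original config: an OTLP receiver, an OTLP exporter
# pointed at a port with nothing listening (to force export errors), and the
# health_check extension with check_collector_pipeline enabled.
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp:
    endpoint: localhost:9999   # nothing listening here, so exports fail
    tls:
      insecure: true

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
    check_collector_pipeline:
      enabled: true
      interval: "5m"
      exporter_failure_threshold: 5

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```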
And with this python script test.py:
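The original test.py is not shown here either; a minimal sketch, assuming it simply sends spans to the collector's OTLP gRPC receiver on localhost:4317, could look like:

```python
# Send a single span to the collector over OTLP gRPC (localhost:4317).
# This is a stand-in for the script from the original report.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("healthcheck-repro")
with tracer.start_as_current_span("test-span"):
    pass

# Flush so the span is exported before the process exits.
provider.shutdown()
```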
and requirements.txt
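A requirements.txt matching the sketch above (assumed package names, not the original file):

```text
# Assumed dependencies for the test script sketch above.
opentelemetry-sdk
opentelemetry-exporter-otlp-proto-grpc
```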
execute in a loop until you see errors:
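The exact command is not shown; a hedged example, assuming the default health_check port of 13133, that sends spans and prints the health check status code on each iteration:

```sh
# Send spans repeatedly and watch the health check status code.
# 13133 is the health_check extension's default port; adjust if needed.
while true; do
  python test.py
  curl -s -o /dev/null -w "health check: %{http_code}\n" http://localhost:13133/
  sleep 1
done
```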
What did you expect to see?
Health check eventually responds with a 5xx status code
What did you see instead?
Health check always responds with a 200 status code
What version did you use?
0.46.0
What config did you use?
See above
Environment
OS: locally tested on OSX