-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
override_host_set conflates different EDS health statuses #23213
Comments
Sounds great. But how to handle the status set by active health checker or outlier detection? 🤔 Should we just keep two statuses flag or allow the health checker to overwrite the EDS status? |
I would think that an active health checker failure or an outlier detection ejection would map to the EDS |
or may be maintain 2 status fields as @wbpcode suggested? |
I think having two status fields would make the API more complicated, because then we would have to expose a way of controlling whether a given host will be used as an override based on the value of both fields. The only status that is currently exposed in the xDS API is the EDS health status enum, and that enum is capable of representing the health states that Envoy drives from places other than EDS as well, so it seems to me that that's the right thing to use here. |
/assign Any way, we should solve this problem. |
FTR I think it makes sense to keep track of the original EDS status in |
…h status (#23432) Part of work to fix #23213. Because the canonical health() method is used widely and is enough for most of cases. We can treat it as abbreviated status. So, we won't update the health() directly. This PR added a new healthStatus() method to the host to provide more specific health status (with type of EDS envoy::config::core::v3::HealthStatus). This status is hybrid of EDS status and runtime active status (from active health checker or outlier detection). Runtime active status will be taken as a priority. This is simplest implementation for this feature. Risk Level: none. not change any exist code but add a new method. Testing: unit. Docs Changes: n/a. Release Notes: n/a. Platform Specific Features: n/a. Signed-off-by: wbpcode <[email protected]>
I don't think this is fully fixed yet. |
Yeah, an additional minor patch is necessary to match the override status with the new health status. I am doing it now. |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions. |
active |
This is something we discovered while in the process of implementing Envoy's stateful session affinity feature (see #16698) in gRPC.
From reading the code, it appears that there's a bug in Envoy's handling of the
override_host_status
field. The field allows setting individual health statuses for which the host-override behavior should be honored. However, internally, Envoy does not actually track all of the health statuses individually; in particular, it treats all ofDRAINING
,UNHEALTHY
, andTIMEOUT
as being exactly the same thing, and it treatsUNKNOWN
andHEALTHY
as being the same thing (see code). So for example, if someone sets theoverride_host_status
set to includeDRAINING
but notUNHEALTHY
orTIMEOUT
, Envoy will actually honor the host-override for all three of those states. I think this is a bug; it seems very clear from the API what is actually intended here.I think the underlying problem here is that Envoy does not internally record the actual EDS health status in the
Host
object; instead, it does a lossy conversion toHost::Health
. I suspect that the right fix is to have theHost
object record the original EDS health status with full fidelity and then look at that when evaluating theoverride_host_status
set.CC @wbpcode @htuch @yanavlasov @sanjaypujare
The text was updated successfully, but these errors were encountered: