Allow customizing initial_fetch_timeout in the envoy sidecar for Consul Service Mesh #17283
Comments
To provide a little more color on why this is important: when Envoy starts in this state, it continuously returns 503s for the upstreams that failed to populate, and the only solution is to restart the sidecar container (or kill the instance entirely).
Hi @komapa @luckymike from reviewing those links you provided it does seem like the best thing to do for default config is set this to |
Thank you for picking this ticket up @david-yu. I cannot think of a case in our setup where that would be needed, but we obviously do not represent all of the users :) If it is not terribly difficult to make it an option, I would advise you do so.
Hi @komapa We just merged a PR that sets |
Hi @komapa, unfortunately we'll need to roll this fix back on 1.14.x and 1.15.x in the interim, as further testing showed that our implementation causes issues on Ingress, Terminating and Mesh Gateways. We're hoping to re-release this feature in the future.
That is very unfortunate. Do you have any public details on what the issue is with the listed software? Also, instead of reverting, can we make it configurable so that we can set it to zero just for the sidecars? Thank you!
Out of curiosity @komapa, do you use any terminating or mesh gateways in your environment? We need to do more investigation to understand how to enable this. It's a lot trickier than we thought.
We do not actively use terminating gateway functionality and we have never used any mesh gateways in our setup. We did upgrade our work-in-progress Kubernetes clusters, and we do see there that the ingress gateways on 1.15.3 seem to be having problems, which I can take a closer look at if needed. How can we help so you can help us? :)
Bump
Hi @komapa 👋. I'm working on a permanent fix now that I am pretty confident will be in the next set of patch releases. Thanks for working with us while we get this sorted out. The original changes should have been reverted for 1.15.3, so it might be unrelated if you're having problems with ingress gateways. I would be curious to know the issues if you don't mind reporting them here or opening a new issue.
Thank you for fixing this. Greatly appreciated! I will report the ingress gateway problem if it happens again.
Will go ahead and close as we currently do not plan on making this customizable at the moment. For folks that find this issue please open up a new issue if you are looking to customize the |
Please see istio/istio#31825. You can also see that AWS is doing the "right" thing and defaulting it to 0, with the option to modify it in the rare case that a different behavior is desired: https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy-config.html

Feature Description
We are running into a pretty unpleasant problem where the Envoy sidecar reaches the default 15s initial_fetch_timeout and then continues starting up and responds with LIVE to the /ready endpoint while it has NOT loaded all upstreams for all clusters from Consul.

We believe Consul should default initial_fetch_timeout to 0, because starting the Envoy proxy sidecar with incorrect configuration is much worse than not starting at all (which we can handle much more easily).

Use Case(s)
Not having a broken service mesh :)
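
For anyone hitting this before a fix lands, one possible stopgap is a stricter readiness check that does not trust /ready alone. The sketch below is not part of Consul or Envoy. It assumes the Envoy admin interface is reachable at localhost:19000, that the text output of the admin /clusters endpoint contains one `<cluster>::<address>::health_flags::<flags>` line per loaded host, and that you can list the upstream cluster names you expect. The function names and the expected list are hypothetical.

```python
import urllib.error
import urllib.request

# Assumed admin address; change it to match your sidecar's admin bind.
ADMIN = "http://127.0.0.1:19000"


def clusters_without_hosts(clusters_text, expected):
    """Return the expected cluster names that have no host entries.

    Assumes each loaded host contributes a line of the form
    <cluster>::<address>::health_flags::<flags> in the /clusters output.
    """
    populated = set()
    for line in clusters_text.splitlines():
        line = line.strip()
        if "::health_flags::" in line:
            # The cluster name is everything before the first "::".
            populated.add(line.split("::", 1)[0])
    return sorted(set(expected) - populated)


def fetch(path, admin=ADMIN, timeout=2.0):
    """GET an admin endpoint and return the body as text."""
    with urllib.request.urlopen(admin + path, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def sidecar_really_ready(expected, admin=ADMIN):
    """True only if /ready says LIVE and every expected cluster has hosts."""
    try:
        if fetch("/ready", admin).strip() != "LIVE":
            return False
        return not clusters_without_hosts(fetch("/clusters", admin), expected)
    except (urllib.error.URLError, OSError):
        # Admin port unreachable or a non-2xx response: treat as not ready.
        return False
```

Something like `sidecar_really_ready(["web", "db"])` could back a container readiness probe, so that a sidecar that gave up waiting on Consul is not marked ready. It only detects the bad state described above; it does not change initial_fetch_timeout itself.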