Wait to discover all K8s resources before sending xDS responses #1280
Comments
This is different from #1178, which is about ordering. Even if #1178 is fixed, Contour will still send only a subset of resources, subject to K8s informer timing. This causes 404s in Envoy when Contour restarts, because Contour hasn't yet discovered all the resources in K8s that it previously programmed Envoy with.

Example for CDS:

1. Contour1 is the first Contour instance.
2. Contour1 discovers Cluster1 and responds to CDS with it. It later discovers Cluster2 and responds to CDS with Cluster1 and Cluster2. We'll consider Envoy "fully programmed" for the purpose of this example.
3. Contour1 is terminated for some reason (upgrade, node death, etc.).
4. Contour2 discovers Cluster1 and responds to CDS with it. Note that Cluster2 is missing, which means Envoy will remove it.

From the docs:
Envoy will also remove the corresponding EDS and RDS.
Whatever cluster (and corresponding resources) was described by Cluster2 now returns 404s until Contour2 discovers Cluster2 and programs Envoy with Cluster1 and Cluster2. Assuming #1178 is fixed, EDS and RDS will then arrive in order per the xDS protocol spec.

This problem isn't exclusive to CDS. It's a general problem that needs to be addressed for all K8s resources that can end up in xDS responses.
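The restart scenario above can be reproduced with a small simulation. This is only a sketch of state-of-the-world semantics, not Envoy or Contour code: `applyStateOfTheWorld` is a hypothetical helper that treats each CDS response as the complete set of clusters and drops anything it does not mention.

```go
package main

import (
	"fmt"
	"sort"
)

// applyStateOfTheWorld is a hypothetical model of how a state-of-the-world
// CDS response is applied: the response is treated as the full set of
// clusters, so any known cluster that is absent from it gets removed.
// It returns the names of the removed clusters, sorted.
func applyStateOfTheWorld(known map[string]bool, response []string) []string {
	inResponse := make(map[string]bool, len(response))
	for _, name := range response {
		inResponse[name] = true
		known[name] = true
	}

	var removed []string
	for name := range known {
		if !inResponse[name] {
			delete(known, name) // deleting during range is safe in Go
			removed = append(removed, name)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	envoy := map[string]bool{}

	// Contour1 discovers Cluster1, then Cluster2.
	applyStateOfTheWorld(envoy, []string{"Cluster1"})
	applyStateOfTheWorld(envoy, []string{"Cluster1", "Cluster2"})

	// Contour2 starts and has only discovered Cluster1 so far.
	removed := applyStateOfTheWorld(envoy, []string{"Cluster1"})
	fmt.Println("removed after restart:", removed) // removed after restart: [Cluster2]
}
```

Until the second instance sends a response that includes Cluster2 again, anything routed to it has no backing cluster.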
Seemingly related: #1286
Yup, we do wait for the caches to sync up (https://github.com/heptio/contour/blob/master//cmd/contour/serve.go#L383), but we're not blocking on the gRPC connection to xDS.
Sure, but you're syncing on an empty cache (the SharedInformer Store is empty), so you're not going to wait very long. We implemented a synchronous discovery of the resources, but that has some ramifications that prevent me from filing a PR as-is. I think what needs to happen here is something like this:
I'll see if I can come up with something.
Moving tentatively to the next milestone. The plan is to not bring up the gRPC server until the shared informer has synced and we have populated the gRPC cache at least once.
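A minimal sketch of that plan, using only the standard library. The names here (`hasSynced`, `waitForSync`, `startAfterSync`) are hypothetical and are not Contour's or client-go's API; `hasSynced` simply stands in for whatever "initial list complete" check each informer exposes. The idea is to poll every check and only call the server's start function once all of them report true, or give up at a deadline.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// hasSynced is a hypothetical stand-in for an informer's
// "initial list has completed" check.
type hasSynced func() bool

// waitForSync polls every check at the given interval until all of them
// return true, or until ctx is done.
func waitForSync(ctx context.Context, interval time.Duration, checks ...hasSynced) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		allSynced := true
		for _, check := range checks {
			if !check() {
				allSynced = false
				break
			}
		}
		if allSynced {
			return nil
		}
		select {
		case <-ctx.Done():
			return errors.New("timed out waiting for caches to sync")
		case <-ticker.C:
		}
	}
}

// startAfterSync calls serve only if every check passes before the timeout.
func startAfterSync(timeout time.Duration, serve func(), checks ...hasSynced) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	if err := waitForSync(ctx, 5*time.Millisecond, checks...); err != nil {
		return err
	}
	serve()
	return nil
}

func main() {
	start := time.Now()
	// Simulated informers: one needs a little time, one is ready at once.
	services := func() bool { return time.Since(start) > 20*time.Millisecond }
	secrets := func() bool { return true }

	err := startAfterSync(time.Second, func() {
		fmt.Println("starting gRPC server")
	}, services, secrets)
	fmt.Println("error:", err)
}
```

Returning an error on timeout, rather than serving anyway, leaves it to the caller to decide whether a partially populated cache is better than no xDS server at all.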
In #1765 we added some logic to have the caches sync before starting the gRPC server. However, it's possible that the final DAG isn't built yet at that point. Leaving this open as the remaining task.
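One way to close that remaining gap, sketched under assumptions: a one-shot gate that the DAG builder opens after its first complete build, and that request handlers wait on before answering. `buildGate` and its methods are hypothetical names for illustration and do not come from the Contour codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// buildGate is a hypothetical one-shot gate: it starts closed and is opened
// the first time the DAG has been built.
type buildGate struct {
	once  sync.Once
	ready chan struct{}
}

func newBuildGate() *buildGate {
	return &buildGate{ready: make(chan struct{})}
}

// MarkBuilt opens the gate. It is safe to call after every rebuild;
// only the first call has any effect.
func (g *buildGate) MarkBuilt() {
	g.once.Do(func() { close(g.ready) })
}

// Wait blocks until the first build has completed.
func (g *buildGate) Wait() {
	<-g.ready
}

// IsOpen reports whether the gate has been opened, without blocking.
func (g *buildGate) IsOpen() bool {
	select {
	case <-g.ready:
		return true
	default:
		return false
	}
}

func main() {
	gate := newBuildGate()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		gate.Wait() // a handler would block here until the first build
		fmt.Println("responding to DiscoveryRequest")
	}()

	fmt.Println("gate open before build:", gate.IsOpen())
	gate.MarkBuilt() // first full DAG build finished
	gate.MarkBuilt() // later rebuilds are no-ops for the gate
	wg.Wait()
}
```

A closed channel unblocks every waiter at once and stays readable forever, so handlers that arrive after the first build pass straight through.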
This issue is seriously affecting us: we have many HTTPProxy resources, and every Contour restart causes several minutes of downtime!
As far as I can tell Contour will send whatever resources it has whenever xDS DiscoveryRequests come in. This can cause xDS resources to be torn down in Envoy on a Contour restart. Contour should do a synchronous (read: eager and blocking) lookup of its CRDs, Ingresses, Services, Endpoints, Secrets, etc. on startup before responding to DiscoveryRequests.
This is causing a variety of issues depending on the resource. Examples include:
/cc @bryanlatten @lrouquette