Converge [New]GrpcMuxImpl #11477
Comments
What is left to do here? I thought we already merged most of the code?
That's not the case; there are distinct paths for delta and SotW, and I'm making the same change in both places right now for #11362. @wgallagher has more complete state and a provisional plan here.
The remaining work is to merge GrpcMuxImpl and NewGrpcMuxImpl (the SotW / delta mux).
At this point @wgallagher is probably more plugged in here than I am. However, I think there's one important piece of information to share. When #8974, which would have resolved this, caused the obscure hang for that one user, the reporter and I pored over it together for about a week. Nothing in the behavior of the new SotW code was going wrong. Rather, Envoy was simply not even asking for the config (IIRC; it was definitely something in the vein of "when the problem happens, it's because the unchanged bulk of Envoy never even calls the changed code"). I suspect a pre-existing bug in the init manager, especially because the user found that the hang could be broken out of by intentionally having the xDS server send a rejection.
Another thing to look at while doing this work: today we have some pretty different behaviors for how discovery requests are handled during warming in SotW and delta. In SotW, we send a request on each new subscription, regardless of existing state. In delta, we take into account whether an existing request is pending and don't send when a new cluster is added (for example). I think this has (to some extent) to do with the differences in
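To make the difference described above concrete, here is a minimal toy sketch of the two send policies. It is not Envoy code: `ToyMux`, `Request`, and the policy names are hypothetical, and flushing the deferred interest once a response arrives is an assumption of the sketch, not something confirmed in this thread.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a discovery request; not Envoy's real type.
struct Request {
  std::set<std::string> resource_names;
};

// Hypothetical model of one xDS stream for a single type URL.
class ToyMux {
public:
  enum class Policy {
    SendOnEverySubscription, // SotW-like: always send, regardless of state.
    CoalesceWhilePending     // Delta-like: hold back while a request is pending.
  };

  explicit ToyMux(Policy policy) : policy_(policy) {}

  // Called when a new resource (e.g. a cluster being warmed) is subscribed.
  void subscribe(const std::string& name) {
    interest_.insert(name);
    if (policy_ == Policy::CoalesceWhilePending && request_pending_) {
      dirty_ = true; // Remember that the interest set changed; do not send yet.
      return;
    }
    send();
  }

  // Called when the server answers the outstanding request.
  // Assumption of this sketch: deferred interest is flushed at this point.
  void onResponse() {
    request_pending_ = false;
    if (dirty_) {
      send();
    }
  }

  // Every request that was put on the wire, in order.
  const std::vector<Request>& sent() const { return sent_; }

private:
  void send() {
    sent_.push_back(Request{interest_});
    request_pending_ = true;
    dirty_ = false;
  }

  Policy policy_;
  std::set<std::string> interest_;
  std::vector<Request> sent_;
  bool request_pending_ = false;
  bool dirty_ = false;
};
```

With three clusters subscribed back to back, the SotW-like policy puts three requests on the wire, while the delta-like policy sends one and defers the rest until the response arrives. A converged mux would have to pick one of these behaviors, or make it configurable.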
I'm picking this up, as @wgallagher is unable to continue working on it. @fredlas and @rgs1, is there a way to reproduce the issue you were seeing in #8974?
@dmitri-d it's been too long, and even back then we couldn't get a repro. I am, however, happy to give a rebased branch a spin and see how it goes. The breakage was deterministic and quick.
We have two distinct gRPC mux implementations for xDS. This slows development, since any change to the xDS transport has to be made twice, and it will be a source of inconsistency in the long term. This is a holdover from earlier work on delta xDS.
CC @wgallagher @fredlas