Investigate how two dpservice instances can be run in parallel #643
Comments
Known issues to resolve
Two instances of dpservice in parallel + orchestration
The first idea was to somehow use secondary process(es), but this is not usable: there can only ever be one primary process, so there would be no way to get the new dpservice back to "primary status". What does work, though, is running two primary processes with different hugepage memory (via
Issues
Passing control
There needs to be a way of stopping dpservice from processing packets. This way the new instance can come up in an "idle" state, get reconciled, and then simply start processing while the old one stops.
[SOLVED] I simply added a gRPC call to start/stop processing, and dpservice can start with processing stopped by default. This is done in
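The start/stop idea can be illustrated with a minimal sketch in plain C. This is not the actual dpservice code: `processing_enabled`, `set_processing`, `is_processing`, `worker_iteration` and `demo_poll_burst` are hypothetical names, and the gRPC handler is only assumed to end up calling something like `set_processing`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: false means the instance starts in the "idle" state. */
static atomic_bool processing_enabled = false;

/* Assumed entry point that a start/stop gRPC handler would call. */
void set_processing(bool enabled)
{
    atomic_store_explicit(&processing_enabled, enabled, memory_order_release);
}

bool is_processing(void)
{
    return atomic_load_explicit(&processing_enabled, memory_order_acquire);
}

/*
 * One iteration of a polling loop. `poll_burst` stands in for whatever
 * receives and handles a burst of packets; it returns the number handled.
 * While processing is stopped, the burst function is never called.
 */
size_t worker_iteration(size_t (*poll_burst)(void))
{
    if (!is_processing())
        return 0;
    return poll_burst();
}

/* Stand-in burst function for trying the sketch out; handles 4 "packets". */
size_t demo_poll_burst(void)
{
    return 4;
}
```

With this shape, a freshly started instance does nothing until `set_processing(true)` is called, and the old instance can be silenced with `set_processing(false)` without tearing anything down.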
Passing state
Orchestration is not enough; we also need to pass connection tracking, etc.
[POSSIBLE] In principle this is easy: "just" share some memory or open a connection between the processes and transfer the data. It will be hard to do properly, but not really complex.
Packet processing
Packets must be processed by only one instance of dpservice. This is prevented by mlx5, which duplicates packets when two instances are running. Unfortunately the duplication happens after
[POSSIBLE] In DPDK, there is a single place where an
Currently though, I am skipping this step, as it seems traffic can actually survive packet duplication (my guess is that the switches/kernels drop the duplicates).
Next steps
As I am intentionally not finishing any step (to minimize work during research), the last big thing is to test in a proper traffic situation in OSC and verify how big a problem packet duplication is.
There is also a small problem with stopping the old instance, which causes metalnet to unsubscribe everything in the new instance (my hypothesis, no details now); this should be relatively trivial to solve.
There is also the fact that the two dpservice instances share CPU, and in polling mode we halve the processing power of both. But this is again easily solvable by slowing down the "idle" instance in the graph loop.
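Slowing down the "idle" instance could look roughly like the sketch below: the loop sleeps briefly on each iteration while processing is stopped and busy-polls otherwise. This is an assumption about one possible approach, not the dpservice graph loop itself; `IDLE_SLEEP_NS`, `loop_pause_ns` and `loop_pause` are hypothetical names, and the 1 ms value is an arbitrary placeholder.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#endif
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical tuning value: sleep per loop iteration while idle (1 ms). */
#define IDLE_SLEEP_NS 1000000L

/* How long the current iteration should pause: not at all while processing. */
long loop_pause_ns(bool processing)
{
    return processing ? 0L : IDLE_SLEEP_NS;
}

/* Call once per loop iteration; returns the pause it requested, in ns. */
long loop_pause(bool processing)
{
    long ns = loop_pause_ns(processing);

    if (ns > 0) {
        struct timespec ts = { .tv_sec = 0, .tv_nsec = ns };
        nanosleep(&ts, NULL); /* yields the CPU to the active instance */
    }
    return ns;
}
```

The active instance keeps full polling speed because its pause is zero, while the idle one wakes up about once per millisecond, which should still be often enough to notice a "start processing" request quickly.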
Investigate whether it is possible to run two identical dpservice instances in parallel without disturbing each other.
This is one of the necessary steps to be able to update a single dpservice instance without downtime.