fix(app, app-shell, app-shell-odd): block initial HTTP request until successful MQTT subscription #15094
…cription Currently, the desktop app/ODD attempt to address the "missed update problem" in the following way: while we subscribe to a topic, we simultaneously GET whatever equivalent HTTP resource we just subscribed to. However, there's definitely a world (albeit a very small one) in which we receive the HTTP response, a server update occurs, the server publishes, and then we successfully subscribe to a topic. In this world, we've missed the update event. Solve for this by simply blocking the initial HTTP GET until we subscribe. While the subscribe handshake could theoretically take a maximum of 2 seconds (at which point we forcibly time out the subscribe action and fall back to polling), in practice it's more like 250 ms. We already handle failed connections and don't go through this handshake if we can't connect to the client to begin with, so the "wait 2 seconds until sub failure" scenario shouldn't realistically happen. This bug was a product of initially using retained messages during prototyping; we removed retained messaging by MQTT launch.
I'm a little hesitant to introduce more latency here. What if instead we send the synthetic refresh again after we establish the connection? We have an extra refresh, but we don't extend the so-called time to first light, and we're still way better off than when we were just polling.
Yeah, that makes a lot of sense.
Very good, looks good to me!
…successful MQTT subscription (#15094) Closes EXEC-429 Currently, the desktop app/ODD attempt to address the "missed update problem" in the following way: while we subscribe to a topic, we simultaneously GET whatever equivalent HTTP resource we just subscribed to. However, there's definitely a world (albeit a very small one) in which we receive the HTTP response, a server update occurs, the server publishes, and then we successfully subscribe to a topic. In this world, we've missed the update event. Solve for this by simply refetching right after the client subscribe ACKs. We still keep the initial fetch on mount to keep latency low.
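For illustration, the "fetch on mount, then refetch once the subscribe ACKs" flow could look roughly like the sketch below. The `NotifyClient` interface, `fetchThenRefetchOnAck`, and the topic name are hypothetical stand-ins, not the app's actual hooks or MQTT client API.

```typescript
// Hypothetical client shape: onAck fires once the broker acknowledges the subscription.
interface NotifyClient {
  subscribe: (topic: string, onAck: () => void) => void
}

function fetchThenRefetchOnAck(
  client: NotifyClient,
  topic: string,
  refetch: () => void
): void {
  // Initial fetch on mount: keeps latency to first data low.
  refetch()
  // Second fetch after the ACK: covers any update published before we were subscribed.
  client.subscribe(topic, () => {
    refetch()
  })
}

// Usage with a fake client that holds the ACK until we release it.
const calls: string[] = []
const pendingAcks: Array<() => void> = []
const fakeClient: NotifyClient = {
  subscribe: (_topic, onAck) => {
    pendingAcks.push(onAck)
  },
}

fetchThenRefetchOnAck(fakeClient, 'example/topic', () => {
  calls.push('GET')
})
const getsBeforeAck = calls.length // only the on-mount fetch has run so far

pendingAcks.forEach(ack => ack()) // broker ACKs the subscribe
const getsAfterAck = calls.length // the on-mount fetch plus the post-ACK refetch
```

The cost is one extra GET per subscription; in exchange, the first render does not wait on the subscribe handshake.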
Closes EXEC-429
Overview
Currently, the desktop app/ODD attempt to address the "missed update problem" in the following way: while we subscribe to a topic, we simultaneously GET whatever equivalent HTTP resource we just subscribed to. However, there's definitely a world (albeit a very small one) in which we receive the HTTP response, a server update occurs, the server publishes, and then we successfully subscribe to a topic. In this world, we've missed the update event.
Solve for this by simply blocking the initial HTTP GET until we subscribe. While the subscribe handshake could theoretically take a maximum of 2 seconds (at which point we forcibly time out the subscribe action and fall back to polling), in practice it takes under ~250 ms (wall-clock, request to response). We already handle failed connections and don't go through this handshake if we can't connect to the client to begin with, so the "wait 2 seconds until sub failure" scenario shouldn't realistically happen.
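To make the ordering argument concrete, here is a small deterministic sketch of the race. `FakeServer` and `replay` are hypothetical in-memory stand-ins for the robot server and the broker (with no retained messages); they are not code from this PR.

```typescript
type Listener = (value: number) => void
type Step = 'get' | 'update' | 'subscribe'

// Hypothetical stand-in for the HTTP resource plus its MQTT topic.
class FakeServer {
  private value = 0
  private listeners: Listener[] = []

  // Equivalent of the HTTP GET: returns the current resource state.
  get(): number {
    return this.value
  }

  // Equivalent of a completed subscribe. Nothing is replayed to new subscribers.
  subscribe(listener: Listener): void {
    this.listeners.push(listener)
  }

  // A server-side update is published only to clients already subscribed.
  update(): void {
    this.value += 1
    this.listeners.forEach(listener => listener(this.value))
  }
}

// Replays events in order and reports what the client believes vs. the server's state.
function replay(steps: Step[]): { client: number; server: number } {
  const server = new FakeServer()
  let client = -1
  for (const step of steps) {
    if (step === 'get') {
      client = server.get()
    } else if (step === 'subscribe') {
      server.subscribe(value => {
        client = value
      })
    } else {
      server.update()
    }
  }
  return { client, server: server.get() }
}

// Concurrent GET + subscribe: the GET can resolve before the subscribe completes.
const racy = replay(['get', 'update', 'subscribe'])

// Blocked GET: the request is only issued after the subscribe succeeds,
// so an update on either side of the subscribe is still observed.
const updateBeforeSub = replay(['update', 'subscribe', 'get'])
const updateAfterSub = replay(['subscribe', 'update', 'get'])
```

In the `racy` ordering the client is left holding the pre-update value with no notification on the way. In both blocked orderings the client ends up in sync with the server.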
Edge vs. 7.3.0?
After discussing with @SyntaxColoring, we believe it's best not to include the fix in 7.3.0, since there's always the possibility we cause a worse problem than we solve (although I feel good that this isn't the case). The worst-case scenario without the fix is that, very rarely, the client doesn't receive an update notification as expected.
In practice, this problem has yet to be reported by QA or identified by a dev, so I think it's better to hold off. I'm willing to be convinced otherwise, though.
Test Plan
Changelog
Risk assessment
Medium(ish). See subsection in overview.