fix(app, app-shell, app-shell-odd): block initial HTTP request until successful MQTT subscription #15094

Merged
merged 2 commits into from
May 7, 2024

Conversation

mjhuff
Contributor

@mjhuff mjhuff commented May 6, 2024

Closes EXEC-429

Overview

Currently, the desktop app and ODD attempt to address the "missed update problem" in the following way: while we subscribe to a topic, we simultaneously GET the equivalent HTTP resource. However, there is a small window in which we receive the HTTP response, a server update occurs, the server publishes, and only then does our subscription succeed. In that case, we've missed the update event.
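To make the ordering concrete, here is a minimal, synchronous sketch of the race. `FakeServer`, `missedUpdateOrdering`, and `subscribeFirstOrdering` are hypothetical names for illustration only, not code from this PR, and the published message carries the new version directly to keep the model small.

```typescript
// Hypothetical model of the race; none of these names come from the app code.
class FakeServer {
  private version = 0
  private subscribers: Array<(version: number) => void> = []

  // Stand-in for the HTTP GET of the resource.
  get(): number {
    return this.version
  }

  // Stand-in for a subscription taking effect. No retained messages are replayed.
  subscribe(onUpdate: (version: number) => void): void {
    this.subscribers.push(onUpdate)
  }

  // A server-side change, published only to whoever is subscribed right now.
  update(): void {
    this.version += 1
    this.subscribers.forEach(notify => notify(this.version))
  }
}

// GET response, then update + publish, then the subscription succeeds.
function missedUpdateOrdering(): { client: number; server: number } {
  const server = new FakeServer()
  let client = server.get()
  server.update() // published while nobody is subscribed
  server.subscribe(v => {
    client = v
  })
  return { client, server: server.get() }
}

// Subscription succeeds first, then GET, then update + publish.
function subscribeFirstOrdering(): { client: number; server: number } {
  const server = new FakeServer()
  let client = -1
  server.subscribe(v => {
    client = v
  })
  client = server.get()
  server.update() // delivered to the existing subscriber
  return { client, server: server.get() }
}
```

In the first ordering the client stays on the old version until some later, unrelated update arrives; in the second it ends up in sync with the server.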

Solve for this by simply blocking the initial HTTP GET until we subscribe. While the subscribe handshake could theoretically take up to 2 seconds (at which point we forcibly time out the subscribe action and fall back to polling), in practice it takes roughly 250 ms or less (wall-clock, request to response). We already handle failed connections and don't go through this handshake if we can't connect to the client to begin with, so the "wait 2 seconds until sub failure" scenario shouldn't realistically happen.
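A rough sketch of that blocking flow, under stated assumptions: `subscribe` and `fetchResource` are hypothetical callbacks supplied by the caller, `withTimeout` and `subscribeThenFetch` are illustrative helpers rather than functions from this PR, and the 2000 ms default mirrors the timeout described above.

```typescript
type Mode = "notifications" | "polling"

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms)
    promise.then(
      value => {
        clearTimeout(timer)
        resolve(value)
      },
      error => {
        clearTimeout(timer)
        reject(error)
      }
    )
  })
}

// Wait for the subscribe ACK (or its timeout) before issuing the initial GET.
async function subscribeThenFetch<T>(
  subscribe: () => Promise<void>, // hypothetical: resolves on subscribe ACK
  fetchResource: () => Promise<T>, // hypothetical: the initial HTTP GET
  timeoutMs = 2000
): Promise<{ mode: Mode; data: T }> {
  let mode: Mode = "notifications"
  try {
    await withTimeout(subscribe(), timeoutMs)
  } catch {
    // Subscription failed or timed out: the caller should poll instead.
    mode = "polling"
  }
  const data = await fetchResource()
  return { mode, data }
}
```

The cost of this design is that the first fetch is delayed by however long the handshake takes, which is the latency trade-off weighed in this PR.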

Edge vs. 7.3.0?

After discussing with @SyntaxColoring, we believe it's best not to include the fix in 7.3.0, since there's always the possibility that we cause a worse problem than we solve (although I feel good that this isn't the case). The worst-case scenario is that the client, very rarely, doesn't receive an update notification as expected.

In practice, this problem has yet to be reported by QA or identified by a dev, so I think it's better to hold off. I'm willing to be convinced otherwise, though.

Test Plan

  • Verified behavior works as expected. We block until the subscription ACK, then we send the initial request.

Changelog

  • Fixed an edge case in which a client could miss an MQTT update.

Risk assessment

Medium(ish). See the "Edge vs. 7.3.0?" subsection in the overview.

…cription

Currently, the desktop app/ODD attempt to address the "missed update problem" in the following way:
while we subscribe to a topic, we simultaneously GET whatever equivalent HTTP resource we just
subscribed to. However, there's definitely a world (albeit a very small one) in which we receive the
HTTP response, a server update occurs, the server publishes, and then we successfully subscribe to a
topic. In this world, we've missed the update event. Solve for this by simply blocking the initial
HTTP GET until we subscribe. While the subscribe handshake could theoretically take a maximum of 2
seconds (at which point we forcibly time out the subscribe action and fall back to polling), in
practice it's more like 250ms. We already handle failed connections and don't go through this
handshake if we can't connect to the client to begin with, so the "wait 2 seconds until sub failure"
scenario shouldn't realistically happen. This bug was a product of initially using retained messages
during prototyping; however, we removed retained messaging by MQTT launch.
@mjhuff mjhuff requested review from sfoster1 and a team May 6, 2024 12:48
@mjhuff mjhuff requested review from a team as code owners May 6, 2024 12:48
@mjhuff mjhuff requested review from smb2268 and removed request for a team and smb2268 May 6, 2024 12:48
@mjhuff mjhuff changed the title fix(app, app-shell): block initial HTTP request until successful MQTT subscription fix(app, app-shell, app-shell-odd): block initial HTTP request until successful MQTT subscription May 6, 2024
Member

@sfoster1 sfoster1 left a comment

I'm a little hesitant to introduce more latency here. What if instead we send the synthetic refresh again after we establish the connection? We have an extra refresh, but we don't extend the so-called time to first light, and we're still way better off than when we were just polling.

@mjhuff mjhuff requested a review from sfoster1 May 6, 2024 15:44
@mjhuff
Contributor Author

mjhuff commented May 6, 2024

I'm a little hesitant to introduce more latency here. What if instead we send the synthetic refresh again after we establish the connection?

Yeah that makes a lot of sense.

Member

@sfoster1 sfoster1 left a comment

Very good, looks good to me!

@mjhuff mjhuff merged commit 80cfe7e into edge May 7, 2024
20 checks passed
@mjhuff mjhuff deleted the initial-notify-after-subscription-ack branch May 7, 2024 20:09
Carlos-fernandez pushed a commit that referenced this pull request May 20, 2024
…successful MQTT subscription (#15094)

Closes EXEC-429

Currently, the desktop app and ODD attempt to address the "missed update problem" in the following way: while we subscribe to a topic, we simultaneously GET the equivalent HTTP resource. However, there is a small window in which we receive the HTTP response, a server update occurs, the server publishes, and only then does our subscription succeed. In that case, we've missed the update event.

Solve for this by simply refetching right after the client subscribe ACKs. We still keep the initial fetch on mount to keep latency low.
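
The refetch-on-ACK approach can be sketched roughly as follows. This is an illustrative outline, not the merged implementation: `Hooks`, `mountWithRefetchOnAck`, and the callback shapes are hypothetical names chosen for the example.

```typescript
// Hypothetical wiring; the real app code is structured differently.
interface Hooks {
  // Issues the HTTP GET and stores the result wherever the caller wants.
  fetchResource: () => void
  // Starts a subscription; calls onAck once it is acknowledged and
  // onMessage for every notification received afterwards.
  subscribe: (onAck: () => void, onMessage: () => void) => void
}

function mountWithRefetchOnAck({ fetchResource, subscribe }: Hooks): void {
  // Initial fetch on mount, so first render is not delayed by the handshake.
  fetchResource()
  subscribe(
    // Refetch once the subscribe ACK arrives, covering any update that was
    // published between the initial response and the subscription.
    () => fetchResource(),
    // Later notifications trigger a refetch as usual.
    () => fetchResource()
  )
}
```

This trades one extra request per subscription for no added latency before the first response.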
Carlos-fernandez pushed a commit that referenced this pull request Jun 3, 2024
…successful MQTT subscription (#15094)

2 participants