fix(app): Fix React Router navigation in React Query callbacks #16084

mjhuff · 2024-08-21T18:23:54Z

Closes RQA-3063 and partially closes RQA-3038

Overview

This PR addresses a few issues that have sprung up after the recent React Router migration.

Why after the React Router Migration?

The working hypothesis is that there is some lower-level difference between how history.push and navigate behave. More specifically, history.push seems to cause additional render cycles. Those cycles were bad for performance, but they bailed us out of a few of our own React Query bugs. The React Router upgrade replaces history.push with navigate. That seems to have reduced these extra render cycles, and in doing so it exposed some pre-existing bugs in our own code.

The User-Presenting Problem

When performing a navigate within a post-Query callback (onSettled, onSuccess, etc.) on the ODD, useCurrentRunRoute is the major consideration. Using data from useCurrentRunId and useNotifyRunQuery, it decides whether the ODD should be on a particular route. To manually navigate somewhere while useCurrentRunRoute thinks we should be somewhere else, we first have to clear useCurrentRunId, and only then can we navigate. In this case, the closeCurrentRun hook sends out a PATCH request, clears the query caches on response, and then returns the fresh data.

In practice, what happens is the following:

We click the "return to dashboard" button. This sends out our PATCH request.
We do not navigate the user until this request completes. Using one of the post-Query callbacks, we navigate the user.
The user hits the dashboard. Fresh data says there is no current run.
We are often (but not always) routed back to the Run Summary page unexpectedly.
Clicking the "return to dashboard" button again seemingly always redirects us to the dashboard.

The Solutions

A few fixes are required to get the intended behavior, because each fix exposes a lower-level problem.

Fix 1: `closeCurrentRun` swallows post-Query callbacks

The hook only executes if there is a current run. If there is no current run, nothing happens. If something else closes the run (say the desktop), no onSuccess callback will ever execute. The fix here is the new closeCurrentRunIfValid wrapper in RunSummary/index.tsx. Although we really should change the behavior of the hook itself, it was decided that since we have to migrate away from all these query callbacks soon anyway, we might as well keep the blast radius smaller (especially since this is a chore_release fix).
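Below is a minimal sketch of the wrapper idea. It assumes the caller already knows whether a run is current. The `isRunCurrent` flag, the `CloseRunCallbacks` type, and the `CloseCurrentRun` function type are hypothetical stand-ins, not the app's actual signatures:

```typescript
// Hypothetical callback shape; the real hook's option types are not shown here.
interface CloseRunCallbacks {
  onSuccess?: () => void
  onError?: (error: Error) => void
}

// Stand-in for the function returned by the existing closeCurrentRun hook:
// it sends the PATCH and then fires the callbacks.
type CloseCurrentRun = (callbacks: CloseRunCallbacks) => void

// Sketch of the wrapper. When a run is current, defer to closeCurrentRun.
// When nothing is current (e.g. the desktop app already closed the run),
// fire onSuccess directly so the caller's navigation still happens.
function closeCurrentRunIfValid(
  isRunCurrent: boolean,
  closeCurrentRun: CloseCurrentRun,
  callbacks: CloseRunCallbacks
): void {
  if (isRunCurrent) {
    closeCurrentRun(callbacks)
  } else {
    callbacks.onSuccess?.()
  }
}
```

Keeping the check in a wrapper leaves the hook itself untouched, which matches the smaller-blast-radius goal described above.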

Oh yeah, also, we were overriding any onError callback with a custom console.log, which doesn't seem right, so I changed the order of operations here.
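One common way this kind of override happens is object-spread order when merging default and caller-supplied callbacks. The sketch below is illustrative only. `MutationCallbacks`, `logError`, and both merge functions are hypothetical names, not the code changed in this PR:

```typescript
// Hypothetical callback shape; only onError matters for this illustration.
interface MutationCallbacks {
  onSuccess?: () => void
  onError?: (error: Error) => void
}

const logError = (error: Error): void => {
  console.log(`Failed to close the current run: ${error.message}`)
}

// Problematic order: the default handler is applied last, so it silently
// replaces any onError the caller passed in.
function mergeCallbacksBefore(options: MutationCallbacks): MutationCallbacks {
  return { ...options, onError: logError }
}

// Fixed order: the default comes first, so a caller-supplied onError wins
// and the console.log only acts as a fallback.
function mergeCallbacksAfter(options: MutationCallbacks): MutationCallbacks {
  return { onError: logError, ...options }
}
```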

Fix 2: `useNotifyDataReady` and `useNotify` hooks.

It's not very intuitive, but the React Query docs make it pretty clear that if a query is disabled, then "The query will ignore query client invalidateQueries and refetchQueries calls that would normally result in the query refetching." The notify hooks explicitly keep the HTTP hook disabled UNTIL the server tells us to refetch. This was done to prevent passed-in refetchIntervals from causing polling, but this is clearly the wrong approach.
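The toy cache model below makes the quoted rule concrete. It is a simplification, not React Query's implementation, and `ToyQueryCache` and `ToyObserver` are hypothetical. It captures only the one rule that matters here: invalidation refetches enabled queries and leaves disabled ones serving their cached data.

```typescript
// Toy model: each observer (think: one hook instance) keeps the data it
// last received.
interface ToyObserver<T> {
  enabled: boolean
  data: T | undefined
}

class ToyQueryCache<T> {
  private readonly fetcher: () => T
  private readonly observers: Array<ToyObserver<T>> = []

  constructor(fetcher: () => T) {
    this.fetcher = fetcher
  }

  // Register a hook instance together with its previously cached data.
  observe(enabled: boolean, cached: T | undefined): ToyObserver<T> {
    const observer: ToyObserver<T> = { enabled, data: cached }
    this.observers.push(observer)
    return observer
  }

  // Mirrors the documented rule: a disabled query ignores invalidation,
  // so it keeps serving whatever it had cached.
  invalidate(): void {
    for (const observer of this.observers) {
      if (observer.enabled) {
        observer.data = this.fetcher()
      }
    }
  }
}
```

If the server's current run is cleared and the cache is invalidated, an enabled observer picks up `null`, while a disabled observer still reports the old run ID.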

The above was causing the following behavior:

We click the "return to dashboard" button. This sends out our PATCH request.
We do not navigate the user until this request completes. Using one of the post-Query callbacks, we navigate the user.
The user hits the dashboard. Fresh data has NOT yet returned. MQTT controls whether the hook is enabled, and the hook is always disabled immediately after closeCurrentRun returns, because MQTT takes a few milliseconds to tell the app that the hook needs to be enabled. The cache data is correctly cleared for the query key, but that specific hook keeps its cached data because it is disabled.

So that's the bug: the hook is disabled, which means we use the stale cached data, and that redirects us back to RunSummary. The network request usually completes a few ms later, which is why the second click always works.

This isn't too bad to fix, actually. Instead of conditionally enabling the hook, we always keep it enabled. We conditionally use a refetchInterval if one is present, and otherwise only refetch when the shell tells us to.
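The sketch below shows roughly how the way the notify hooks derive their query options changes. Only the `enabled` and `refetchInterval` option names come from React Query. `QueryOptionsSubset` and both functions are hypothetical. `shouldRefetch` stands for "the shell said refetch". `pollingFallback` stands for whatever condition decides to honor a passed-in interval, and is an assumption. The notification-driven refetch itself (calling the query's refetch when the shell says so) is not modeled.

```typescript
// Hypothetical subset of the React Query options the notify hooks pass through.
interface QueryOptionsSubset {
  enabled?: boolean
  refetchInterval?: number | false
}

// Before: the HTTP query stays disabled until the shell says "refetch", so
// invalidations in between are ignored and stale cache data is served.
function notifyOptionsBefore(
  options: QueryOptionsSubset,
  shouldRefetch: boolean
): QueryOptionsSubset {
  return { ...options, enabled: (options.enabled ?? true) && shouldRefetch }
}

// After: `enabled` is left as the caller set it (React Query defaults it to
// true), and a passed-in refetchInterval is only used when polling applies.
function notifyOptionsAfter(
  options: QueryOptionsSubset,
  pollingFallback: boolean
): QueryOptionsSubset {
  return {
    ...options,
    refetchInterval: pollingFallback ? (options.refetchInterval ?? false) : false,
  }
}
```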

Fix 3: `useCurrentRunRoute`

Yeah...so it turns out we also had logic in the app that keeps useNotifyRunQuery disabled if currentRunId is null. So guess what? This still results in stale data even after fixing the MQTT issue. This isn't a recent thing: this enabled condition has existed for the entire lifetime of useCurrentRunRoute. It's only fully blocking now because we fixed the other bugs, and MQTT is very fast, so it's very apparent that we're using stale, non-invalidated cached data. Prior to MQTT, polling was slow, so you'd get a roughly 5-second window for fetching the run record before currentRun went to null and disabled the hook. In other words, this has always been a bug, but in practice it likely occurred only occasionally.

The solution here is just to remove the enabled condition. This DOES mean we occasionally ask the server for a run record with a null runId, but I don't think there's any way around it. React Query doesn't seem to provide a way to update the cache on invalidation without actually performing a refetch when runId is null. It seems `enabled` is all we get to work with.
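Here is a before/after sketch of that enabled condition, plus the trade-off it buys. `RunQueryOptions` and the helper functions are hypothetical, and the `/runs/<id>` path format is an assumption used only to show what a null ID turns into.

```typescript
// Hypothetical subset of the options passed to useNotifyRunQuery.
interface RunQueryOptions {
  enabled: boolean
}

// Before: a null run ID disables the query, so its cached run record is
// never refetched on invalidation and stays stale.
function runQueryOptionsBefore(currentRunId: string | null): RunQueryOptions {
  return { enabled: currentRunId != null }
}

// After: no run-ID condition, so invalidation always triggers a refetch.
function runQueryOptionsAfter(): RunQueryOptions {
  return { enabled: true }
}

// The trade-off: with a null ID, the request path becomes "/runs/null".
function runRecordPath(currentRunId: string | null): string {
  return `/runs/${String(currentRunId)}`
}
```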

Fix 3.5: Some Cleanup

We were using navigate('/') instead of navigate('/dashboard'), and this does seem to make navigation a bit slower (granted, this is entirely anecdotal, since I didn't actually test this claim). More importantly, we don't actually have a root route, so I think it's better not to make assumptions about how React Router handles it.
I added isFetching as a condition to useCurrentRunRoute. I really do think we want this, since it acts as insurance in case query key invalidation, which has several points of failure, is set up improperly. This doesn't add any user-noticeable latency, and I think it might actually solve some weird blippiness that occurs occasionally.

TODO (but not in `chore_release`)

Yeah, we need a new layer of abstraction for the notification logic, as it's clearly leaking into the notification hooks themselves. Because this is chore_release, I've decided to keep fixes as localized as possible instead of risking larger issues with app-side networking.

I do plan on following up on this for 8.1, sooner rather than later.

Test Plan and Hands-on Testing

I ran roughly 20 protocol runs and couldn't reproduce this with the PR's changes. Before these changes, it reproduced something like one-half to one-third of the time.
Smoke tested the app to ensure nothing is totally broken.

Changelog

Fixed clicking "return to dashboard" sometimes not navigating users to the dashboard.

Risk assessment

Lowish-medium. This touches some pretty fundamental app networking code, but it's not as scary as it sounds.

sfoster1

Okay, this looks correct to me. Setting run to null if we're fetching might end up as a footgun if we ever have non-stateful routing (i.e., always route to runs if there's a run id, and route away from runs if it's null) but right now we only change the state to runs if there's a run id, and don't do anything if it's null, so it's safe.

Closes RQA-3038

This PR fixes a longstanding issue in which clicking a RecentRunProtocolCard often leads to unexpected UI updates, full-screen re-renders, and routing problems (the severity and presentation of the underlying issue have varied over time). See #16084 for more on why routing issues are more prevalent this release, and why QA is probably now filing these existing issues more aggressively.

The RecentRunProtocolCard actively navigates to a page managed by TopLevelRedirects. The code that performs this active navigation predates MQTT, which made these redirects effectively instantaneous and removed the need for active redirects to pages managed by TopLevelRedirects. In other words, we don't really have to do any navigation here at all, but that raises two questions:

What's actually causing this bug?
What happens if we just let TopLevelRedirects do the navigation?

First, it's difficult to pinpoint exactly what's causing the unexpected routing behavior. However, it very much appears that:

It's almost certainly not TopLevelRedirects itself. I verified this with extensive console.logging. Note that this logic doesn't do anything when the app thinks we should still be on the dashboard.
It's almost certainly some extra render cycles occurring somewhere in the app as a result of the cloneRun function, which does a good bit of query invalidation. It's quite difficult to pinpoint exactly what's causing the full-on dashboard re-rendering (see the current behavior video), because we don't actually have React DevTools on the physical ODD (yeah, I should probably look into this).

So what does a good fix look like? If we just let TopLevelRedirects passively navigate us, we unfortunately get some wonky, but expected, UI. The protocol cards are sorted by timestamp. When you click a card, the cards shuffle around as the timestamps change a few times, and THEN we see the navigation occur cleanly.

A much simpler and more effective solution is to actively navigate to a fake RunSetup page. This solves the above problem (users don't see crazy shuffling cards), and it plays well with TopLevelRedirects. That component doesn't do anything until the real setup route is valid, and then it redirects us to that route cleanly.
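Below is a minimal sketch of how the placeholder navigation and a TopLevelRedirects-style rule interact. `PLACEHOLDER_SETUP_PATH`, `redirectTarget`, and `onRecentRunCardClick` are hypothetical. The PR does not name the fake route or show the redirect logic.

```typescript
// Hypothetical placeholder route; the real fake RunSetup path is not named in the PR.
const PLACEHOLDER_SETUP_PATH = "/runs/placeholder/setup"

// TopLevelRedirects-style rule: do nothing until a valid route exists,
// then redirect once if we are not already there.
function redirectTarget(currentPath: string, validRoute: string | null): string | null {
  if (validRoute == null || validRoute === currentPath) return null
  return validRoute
}

// Card click handler: navigate immediately to the placeholder so the
// dashboard (and its timestamp-sorted cards) is no longer on screen.
function onRecentRunCardClick(navigate: (path: string) => void): void {
  navigate(PLACEHOLDER_SETUP_PATH)
}
```

While the cloned run is still being created, `redirectTarget` returns `null` and the placeholder stays on screen. Once the real setup route is valid, it returns that route exactly once.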

Closes EXEC-695

#16084 highlighted some of the existing issues we had after the recent React Router migration. That PR fixed some problems, but the intention was to keep the bug radius small for the upcoming 8.0 release, so some changes were not implemented optimally. Now that we have plenty of time before the 8.1 release, let's dogfood networking changes for as long as possible, starting with this one. Currently, TopLevelRedirects chains React queries: it takes the currentRunId from useCurrentRunId and feeds it directly into useNotifyQuery. Whenever currentRunId is null, we GET /runs/null, which is a network request we should avoid. After discussion, the most Reactive solution is to isolate the hooks into their own components, and to conditionally render a component with the chained hook only if the hook(s) farther up the chain are non-null. In other words, keep the concept of query chaining, but do it in components.
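The gating idea can be sketched without React as plain function composition. `resolveCurrentRun`, `RunRecord`, and the synchronous getter types are hypothetical stand-ins for the chained hooks. In the app, the same effect comes from rendering the child component that owns the run query only when `currentRunId` is non-null.

```typescript
// Hypothetical shape of a run record; only the fields used here.
interface RunRecord {
  id: string
  status: string
}

// Hypothetical synchronous stand-ins for the two chained queries.
type GetCurrentRunId = () => string | null
type GetRun = (runId: string) => RunRecord

// The downstream request is only made when the upstream value is non-null,
// so a null current run never turns into a GET /runs/null.
function resolveCurrentRun(getCurrentRunId: GetCurrentRunId, getRun: GetRun): RunRecord | null {
  const currentRunId = getCurrentRunId()
  if (currentRunId == null) return null
  return getRun(currentRunId)
}
```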

fix(app): fix React Router navigation in React Query callbacks

1d5e239

mjhuff requested review from sfoster1, shlokamin and a team August 21, 2024 18:23

mjhuff requested a review from a team as a code owner August 21, 2024 18:23

sfoster1 approved these changes Aug 21, 2024


mjhuff merged commit d6a0cff into chore_release-8.0.0 Aug 21, 2024
20 checks passed

mjhuff deleted the app_fix-nav-query-callbacks branch August 21, 2024 21:06

mjhuff mentioned this pull request Aug 22, 2024

fix(app): fix Recent Run Protocol Card onClick routing behavior #16094

Merged

brenthagen mentioned this pull request Aug 26, 2024

fix(app): remove module card navigate on settled run query #16131

Merged

This was referenced Aug 29, 2024

fix(app): fix fallback polling not working on select notifications #16162

Merged

refactor(app): improve top level query chaining #16179

Merged

mjhuff mentioned this pull request Sep 3, 2024

refactor(app): Clean up notification hooks #16183

Merged
