-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(app): Fix React Router navigation in React Query callbacks #16084
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this looks correct to me. Setting run to null if we're fetching might end up as a footgun if we ever have non-stateful routing (i.e., always route to runs if there's a run id, and route away from runs if it's null) but right now we only change the state to runs if there's a run id, and don't do anything if it's null, so it's safe.
Closes RQA-3038 This PR fixes a longstanding issue (although the severity and presentation of the underlying issue has varied over time) in which clicking on a RecentRunProtocolCard often leads to unexpected UI updates, full screen re-renders, and routing problems. See #16084 for more explanation on why routing issues are more prevalent this release (and why QA is probably now filing these existing issues more aggressively). The RecentRunProtocolCard actively navigates to a page managed by TopLevelRedirects, and the code that caused the active navigation existed before MQTT effectively made these redirects instantaneous (and deprecating the need for active redirects to pages managed by TopLevelRedirects). In other words, we really don't have to do any navigation here at all, but that leads to two questions: What's actually causing this bug? What happens if we just let TopLevelRedirects do the navigation? First, it's difficult to pinpoint exactly what's causing the unexpected routing behavior, however, it very much appears that: It's almost certainly not TopLevelRedirects itself. I verified this with extensive console.logging. Note that this logic doesn't do anything when the app thinks we should still be on the dashboard. It's almost certainly some extra render cycles occurring somewhere in the app as a result of the cloneRun function, which does a good bit of query invalidation. It's quite difficult to pinpoint exactly what's causing the full on dashboard re-rendering to occur (see the current behavior video), because we don't actually have React Devtools on the physical ODD (yeah, I should probably look into this). So what does a good fix look like? If we just let TopLevelRedirects passively navigate us, we unfortunately get some wonky, but expected UI. The protocol cards are sorted based on timestamps, so the cards shuffle around when you click a card, the timestamps change a few times, and THEN we see the navigation occur cleanly. A much simpler and effective solution is to actively navigate to a fake RunSetup page. This solves the above problem (users don't see crazy shuffling cards), and it plays well with TopLevelRedirects (this component doesn't do anything until the real setup route is valid, and then it redirects us to that route cleanly).
Closes EXEC-695 #16084 highlighted some of the existing issues we had after the recent React Router migration. While that PR fixed some problems, the intention was to prioritize keeping the bug radius small for the upcoming 8.0 release, which meant some changes were not implemented optimally. Now that we have plenty of time before the 8.1 release, let's dog food networking changes for as long as possible, starting with this one: currently, TopLevelRedirects chains react queries, taking the currentRunId from useCurrentRunId and feeds it directly into useNotifyQuery. Whenever currentRunId is null, we GET /runs/null, which is a network request that we should avoid. After discussion, the most Reactive solution is to isolate the hooks into their own components, and conditionally render a component with the chained hook only if the hook(s) farther up the chain are not null. In other words, keep the concept of query chaining, but just do it in components.
Closes RQA-3063 and partially closes RQA-3038
Overview
This PR addresses a few issues that have sprung up after the recent React Router migration.
Why after the React Router Migration?
The working hypothesis is there is some lower-level difference between how
history.push
andnavigate
function. More specifically,history.push
seems to cause additional render cycles to occur, which while bad in terms of performance, bailed us out of a few of our own React Query bugs. Upgrades to React Router, which replacehistory.push
withnavigate
seem to have reduced these non-performant extra render cycles but exposed some of the pre-existing bugs within our own code.The User-Presenting Problem
When performing a
navigate
within a post-Query callback (onSettled, onSuccess, etc) on the ODD,useCurrentRunRoute
is the major consideration: with data provided byuseCurrentRunId
anduseNotifyRunQuery
, it makes decisions on whether the ODD should be on a particular route. In order to manually navigate somewhere whileuseCurrentRunRoute
thinks we should be somewhere else, we first have to clearuseCurrentRunId
, then we can navigate. Our hooks, specifically thecloseCurrentRun
hook in this case, sends out aPATCH
request, then clears the query caches on response, then returns the fresh data.In practice, what happens is the following:
PATCH
request.The Solutions
A few fixes are required to get the intended behavior, as each fix exposes a lower-level problem each time.
Fix 1:
closeCurrentRun
swallows post-Query callbacksThe hook only executes if there is a current run. If there is no current run, nothing happens. If something else closes the run (say the desktop), no
onSuccess
callback will ever execute. The fix here is the newcloseCurrentRunIfValid
wrapper inRunSummary/index.tsx
. Although we really should change the behavior of the hook itself, it was decided that since we have to migrate away from all these query callbacks soon anyway, we might as well keep the blast radius smaller (especially since this is achore_release
fix).Oh yeah, also, we were overriding any
onError
callback with a customconsole.log
, which doesn't seem right, so I changed the order of operations here.Fix 2:
useNotifyDataReady
anduseNotify
hooks.It's not very intuitive, but the React Query docs make it pretty clear that if a query is disabled, then "The query will ignore query client
invalidateQueries
andrefetchQueries
calls that would normally result in the query refetching." Thenotify
hooks explicitly keep the HTTP hook disabled UNTIL the server tells us to refetch. This was done to prevent passed-inrefetchInterval
s from causing polling, but this is clearly the wrong approach.The above was causing the following behavior:
PATCH
request.closeCurrentRun
hook returns, because MQTT takes some milliseconds to tell the app that the hook needs to be enabled. Yes, the cache data is correctly cleared on the query key, but for that specific hook, which is disabled, the cache is preserved.So that's the bug - the hook is
disabled
, which means we use the stale cache data. This redirects us back toRunSummary
. The network request completes usually a few MS later, and that's what always allows the second click to work.This isn't too bad to fix, actually. Instead of conditionally enabling the hook, we always keep it enabled, and instead we just conditionally use a
refetchInterval
is one is present and otherwise only refetch when the shell tells us to refetch.Fix 3:
useCurrentRunRoute
Yeah...so it turns out we also had logic in the app for keeping
useNotifyRunQuery
disabled ifcurrentRunId
isnull
. So guess what? This still results in stale data even after fixing the MQTT issue. This isn't a recent thing: thisenabled
condition has existed for the entirety ofuseCurrentRunRoute
. The reason it's only entirely blocking now is that we fixed the other bugs, and MQTT is very fast, so it's very apparent that we're using stale cached, non-invalidated data. Prior to MQTT, polling was slow, so you'd get a 5 second window for fetching therunRecord
before thecurrentRun
went to null and disabled the hook. In other words, this has always been a bug, but it likely occurred only on the occasion in practice.The solution here is just to remove the enabled condition. This DOES mean we ask the server for a run record with a
null
runId once in a while, but I don't think there's any way around it. React Router doesn't seem to provide us a way to update the cache on invalidations but not actually perform a refetch if runId is null. It seems we getenabled
and nothing else to work with.Fix 3.5: Some Cleanup
navigate('/')
instead ofnavigate('/dashboard/)
, and this does seem to make navigation a bit slower (granted, this is entirely anecdotal, since I didn't actually test this claim). More importantly, we don't actually have a root route, so I think it's probably better not to assume how React Router works/doens't work.isFetching
as a condition touseCurrentRunRoute
. I really do think we want this, since this acts as insurance in case we have improperly set up query key invalidation, which has several points of failure. This doesn't add any user-noticeable latency, and I think it might actually solve some weird blippiness that occurs occasionally.TODO but (not in
chore_release
)Yeah, we need a new layer of abstraction for the notification logic, as it's clearly leaking into the notification hooks themselves. Because this is
chore_release
, I've decided to keep fixes as localized as possible instead of risking larger issues with app-side networking.I do plan on following up on this for 8.1, sooner rather than later.
Test Plan and Hands on Testing
Changelog
Risk assessment
lowish-medium. This touches some pretty fundamental app networking code, but it's nothing as scary as it sounds.