fix(app): Run timer stays at 00:00:00 if you reconnect in the middle of a delay #7841

celsasser · 2021-05-26T16:57:13Z

Overview

If the Opentrons App disconnects from an OT-2 in the middle of a protocol, and reconnects while the OT-2 is in the middle of a delay step, the Opentrons App's run timer will be stuck at 00:00:00, as if the protocol had never started.

This may contribute to users thinking a protocol has "canceled itself."

Closes #7740

Changelog

revise our policy regarding returning a start-time if there is no compensation duration. Which upon refresh there is not. Why? I don't know. If my solution is not acceptable then will have to figure that out.
In the process of looking for and testing the bug found that pause does not pause the timer. The solution is hacky in that refresh wipes the slate clear. But I think for the most common use case it is better than the way it currently works or doesn't work depending on how you look at it.
Updating unit tests

Review requests

I have added some TODOs. Please offer feedback if you don't like the solution and whether it should be torn out.

Is ignoring remoteTimeCompensation (when we don't have it) okay or is there a latency so crazy that it makes the presented time out of some acceptable bounds.

Risk assessment

Please see the Review requests. I don't have concerns beyond the way....the way I am fixing the issue.

…e of a delay: - introducing pause and paused duration support. Was going to refactor to use a `StopWatch` but it was so invasive and was intimidated away from that refactor. - need to follow up. Switching gears.

- lots of logging which will be removed - notes about the pause logic

codecov · 2021-05-26T16:57:19Z

Codecov Report

❗ No coverage uploaded for pull request base (edge@46431d5). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 2ecf85b differs from pull request most recent head 073967c. Consider uploading reports for the commit 073967c to get more accurate results

@@           Coverage Diff           @@
##             edge    #7841   +/-   ##
=======================================
  Coverage        ?   83.48%           
=======================================
  Files           ?      336           
  Lines           ?    21350           
  Branches        ?        0           
=======================================
  Hits            ?    17824           
  Misses          ?     3526           
  Partials        ?        0

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 46431d5...073967c. Read the comment docs.

api/src/opentrons/hardware_control/api.py

app/src/redux/robot/reducer/session.js

app/src/redux/robot/selectors.js

app/src/rpc/websocket-client.js

- clarify in comments - revise/add testing

app/src/redux/robot/reducer/session.js

…arization. And being more consistent with startTime and making pauseTime be `null` when not activated.

SyntaxColoring · 2021-05-28T15:25:05Z

app/src/redux/robot/reducer/session.js

-  return { ...state, resumeRequest: { inProgress: true, error: null } }
+  const pausedDuration = state.pausedTime ? Date.now() - state.pausedTime : 0
+  return {
+    ...state,
+    pausedDuration: state.pausedDuration + pausedDuration,
+    pausedTime: null,
+    resumeRequest: { inProgress: true, error: null },
+    runTime: Date.now(),
+  }


In the process of looking for and testing the bug found that pause does not pause the timer. The solution is hacky in that refresh wipes the slate clear. But I think for the most common use case it is better than the way it currently works or doesn't work depending on how you look at it.

So if you:

Start running a protocol.

Get through a bunch of pauses and delays in the protocol, amounting to 15 minutes.

Disconnect from the robot or quit the app.

Reconnect to the robot or reopen the app.

Then the run timer will jump up by 15 minutes between steps 3 and 4, since the app will no longer have any knowledge of pausedDuration?

If this is right, I don't think we should attempt to pause the timer during pauses at all. The timer being predictable and consistent is more important than it excluding pause time.

~~Also:~~

When you're authoring a protocol, there's a distinction between a "delay" ("this scientific process requires our science juice to marinate for exactly 15 minutes before we move on to the next step") and a "pause" ("I want to wait here until the user clicks 'Resume'").

~~By the time these events reach the Opentrons App, I think (but I'm definitely not sure) that they get conflated into a single "pause" event.~~

~~It wouldn't be right for the timer to stop running during one of those deliberate 15-minute delays. But if we fix the timer running during pauses, we might accidentally do that?~~

Yes, that's exactly right. Okay, then if we do want support for it then we would have to go deeper and make the robot-server or lower track pauses. But sounds like we should the hacky fix out.

Oh dear, I've lost the ability to type proper sentences. Okay, I'll pull it out.

Also:

When you're authoring a protocol, there's a distinction between a "delay" ("this scientific process requires our science juice to marinate for exactly 15 minutes before we move on to the next step") and a "pause" ("I want to wait here until the user clicks 'Resume'").

By the time these events reach the Opentrons App, I think (but I'm definitely not sure) that they get conflated into a single "pause" event.

It wouldn't be right for the timer to stop running during one of those deliberate 15-minute delays. But if we fix the timer running during pauses, we might accidentally do that?

I may have been mistaken about what you were saying. It's a little confusing. The pause I am addressing is not the delay in the protocol. It's 100% the Pause button in the app. When they press pause I am essentially stopping the timer from ticking and measuring the time between when the initiated pause and resumed it. And upon resume the timer ticks again.

I am not doing anything with delays scripted into protocols.

Does that change anything? The motivation for doing it was that we are conditioned to see things pause when we pause them. Maybe I spent too much time in the media industry.

Sorry, I think you can disregard that—I think I'm just confusing myself. 🤐

mcous

Ok so the TL;DR on this review is that most of the existing timer state is completely useless. At one point in the robot server's history, session.startTime could not be trusted, so we did a lot of work to try to bandaid around that. Since #6501, we can trust the server time, more or less, so all this brittle stuff isn't really doing anything.

There's also the knotty UX question of "what does the run timer do when the protocol is paused?" We've been attempting to pause the timer when the run is paused or delayed, but that's really difficult to do reliably given server constraints.

I think the pause time tracking idea this PR adds is solid and helpful, and I think we actually have enough information from the server to track it. The gist of everything that follows is what if we surfaced current pause time to the user instead of trying to get clever about when the timer runs or not.

The key is that the server sends a session.statusInfo object that gets placed in state.robot.session.statusInfo. Whenever the protocol status changes (e.g. to paused), this statusInfo object will hold on to a relative timestamp of when that change occurred.

That means we have the following reliable information from the server:

Protocol run start time
Relative timestamp of the last status change

Given that, rather than trying to add stuff to client state until things work, I think this PR should, instead, try to remove stuff until it works 😄

Remove the runTime state entry because it represents "now, if running", and we can calculate that from startTime in a selector / in the UI component
Remove remoteTimeCompensation state entry and associated machinery, because with time syncing, this has been made unnecessary
Remove the TICK_RUN_TIME action and associated machinery, because why would we need to do this in state?

If we go back to our original story for this bug, it's something along the lines of:

As a user, I want to know that a robot I've connected to is still running its protocol, even if it's paused.

So, I propose that we ditch trying to stop the run timer when the protocol is paused. IMO it's too technically difficult to implement reliably given the current constraints:

Server does not differentiate between a delay and a pause
Server does not track a historical record of pauses

Instead we can have a run timer that is running from the moment the user hits start until the moment the protocol ends or is stopped. But, we still want to make sure the user knows the protocol is paused (or delaying). So, using the information we already have...

Relative start of the current pause in relation to overall start time (session.statusInfo.changedAt if session.status === PAUSED)
Current time from Date.now()

...we should update the run timer UI when the protocol is paused, doing something like:

Make the currently big "Run Time: 01:02:03" smaller, calling it "Total Run Time" and putting it just above "Start Time"
Where the big runner timer used to be, put "Paused For: 01:02:03"

I think this hits all our requirements:

Total run time is now driven by:
- Hasn't run yet: nothing, it's 0
- When running / paused: Date.now() less session.startTime
- When stopped / errored / finished: session.statusInfo.changedAt (because that's when the status changed to stopped, etc.)
Current paused time is driven by:
- If not paused: nothing, it's 0
- If paused: Date.now() less session.startTime + session.statusInfo.changedAt
If robot is paused, user knows clearly that it's paused and for how long it's been paused, because UI clearly communicates a ticking time labeled "Paused For"
If robot is not paused, everything keeps going smoothly

mcous · 2021-06-01T17:21:46Z

app/src/redux/robot/reducer/session.js

-  return { ...state, runTime: 0, runRequest: { inProgress: true, error: null } }
+  return {
+    ...state,
+    runTime: Date.now(),


I think calls to anything non-idempotent should be strictly avoided in Redux reducers. Which makes it very embarrassing to realize that handleTickRunTime below has been sitting there for 4 years with a call to Date.now() that I definitely wrote...

Oh, don't be too hard on yourself. It's probably one of my edits :)

celsasser · 2021-06-01T21:25:39Z

Ok so the TL;DR on this review is that most of the existing timer state is completely useless. At one point in the robot server's history, session.startTime could not be trusted, so we did a lot of work to try to bandaid around that. Since #6501, we can trust the server time, more or less, so all this brittle stuff isn't really doing anything.

There's also the knotty UX question of "what does the run timer do when the protocol is paused?" We've been attempting to pause the timer when the run is paused or delayed, but that's really difficult to do reliably given server constraints.

I think the pause time tracking idea this PR adds is solid and helpful, and I think we actually have enough information from the server to track it. The gist of everything that follows is what if we surfaced current pause time to the user instead of trying to get clever about when the timer runs or not.

The key is that the server sends a session.statusInfo object that gets placed in state.robot.session.statusInfo. Whenever the protocol status changes (e.g. to paused), this statusInfo object will hold on to a relative timestamp of when that change occurred.

That means we have the following reliable information from the server:

Protocol run start time

Relative timestamp of the last status change

Given that, rather than trying to add stuff to client state until things work, I think this PR should, instead, try to remove stuff until it works 😄

Remove the runTime state entry because it represents "now, if running", and we can calculate that from startTime in a selector / in the UI component

Remove remoteTimeCompensation state entry and associated machinery, because with time syncing, this has been made unnecessary

Remove the TICK_RUN_TIME action and associated machinery, because why would we need to do this in state?

If we go back to our original story for this bug, it's something along the lines of:

As a user, I want to know that a robot I've connected to is still running its protocol, even if it's paused.

So, I propose that we ditch trying to stop the run timer when the protocol is paused. IMO it's too technically difficult to implement reliably given the current constraints:

Server does not differentiate between a delay and a pause

Server does not track a historical record of pauses

Instead we can have a run timer that is running from the moment the user hits start until the moment the protocol ends or is stopped. But, we still want to make sure the user knows the protocol is paused (or delaying). So, using the information we already have...

Relative start of the current pause in relation to overall start time (session.statusInfo.changedAt if session.status === PAUSED)

Current time from Date.now()

...we should update the run timer UI when the protocol is paused, doing something like:

Make the currently big "Run Time: 01:02:03" smaller, calling it "Total Run Time" and putting it just above "Start Time"

Where the big runner timer used to be, put "Paused For: 01:02:03"

I think this hits all our requirements:

Total run time is now driven by:

Hasn't run yet: nothing, it's 0

When running / paused: Date.now() less session.startTime

When stopped / errored / finished: session.statusInfo.changedAt (because that's when the status changed to stopped, etc.)

Current paused time is driven by:

If not paused: nothing, it's 0

If paused: Date.now() less session.startTime + session.statusInfo.changedAt

If robot is paused, user knows clearly that it's paused and for how long it's been paused, because UI clearly communicates a ticking time labeled "Paused For"

If robot is not paused, everything keeps going smoothly

Okay, not sure I am clear on where to go from here, but totally accept that it was a mistake to deviate from the bug. And regarding the deviation I wasn't attempting to sync to anything going on in the protocol. It only was meant to pause the timer when a user initiated pause via the interface - the PAUSE button. It was hacky, but the oddness of it was glaring when testing.

Regarding next steps (taking for granted that I will remove everything having to with pause that I introduced) I think by removing remoteTimeCompensation will fix it. Do we want to be more aggressive? I guess, to start with wondering about the following:

* Remove the runTime state entry because it represents "now, if running", and we can calculate that from startTime in a selector / in the UI component
* Remove remoteTimeCompensation state entry and associated machinery, because with time syncing, this has been made unnecessary
* Remove the TICK_RUN_TIME action and associated machinery, because why would we need to do this in state?

mcous · 2021-06-01T22:48:00Z

Okay, not sure I am clear on where to go from here, but totally accept that it was a mistake to deviate from the bug.

My apologies, I didn't mean to imply that you made any sort of mistake here! This is some of the very oldest code in the app, it has a lot of baggage.

Regarding next steps (taking for granted that I will remove everything having to with pause that I introduced) I think by removing remoteTimeCompensation will fix it. Do we want to be more aggressive?

The source of the bug, as best I can tell from reading the code, doesn't (necessarily) have anything to do with remoteTimeCompensation. I think:

The run timer relies on state.robot.session.runTime to exist as a value that represents "now"
- The only thing that can set runTime is a TICK_RUN_TIME action
The code in app/src/redux/robot/api-client/client.js has an interval that runs, triggering a TICK_RUN_TIME action every second to set state.robot.session.runTime to Date.now()
- The client stops this interval whenever session.status is not running
If the app connects to a robot with session.status === 'paused', the TICK_RUN_TIME interval will never spin up, and runTime will never get set to Date.now()

So, as food for thought, here are two potential solutions:

option 1: do the least

Modify handleSessionUpdate in app/src/redux/robot/reducer/session.ts to set runTime
- This part of the reducer gets triggered whenever the app connects to a robot and whenever the session state gets updated
- Probably set it to something weird like:
  - If session status is running, set runTime to now
  - If session status is paused, set runTime to startTime + statusInfo.changedAt - remoteTimeCompensation

option 2: do the most

See review comment above, but the gist of it is "ditch all client-side run time tracking and calculate everything from server state, updating the UI to account for the differences"

I like option (2) because it's more "correct" and doesn't feel like a huge lift, but option (1) would probably still fix the bug.

celsasser · 2021-06-01T23:15:27Z

Okay, not sure I am clear on where to go from here, but totally accept that it was a mistake to deviate from the bug.

My apologies, I didn't mean to imply that you made any sort of mistake here! This is some of the very oldest code in the app, it has a lot of baggage.

Regarding next steps (taking for granted that I will remove everything having to with pause that I introduced) I think by removing remoteTimeCompensation will fix it. Do we want to be more aggressive?

The source of the bug, as best I can tell from reading the code, doesn't (necessarily) have anything to do with remoteTimeCompensation. I think:

The run timer relies on state.robot.session.runTime to exist as a value that represents "now"

The only thing that can set runTime is a TICK_RUN_TIME action

The code in app/src/redux/robot/api-client/client.js has an interval that runs, triggering a TICK_RUN_TIME action every second to set state.robot.session.runTime to Date.now()

The client stops this interval whenever session.status is not running

If the app connects to a robot with session.status === 'paused', the TICK_RUN_TIME interval will never spin up, and runTime will never get set to Date.now()

So, as food for thought, here are two potential solutions:

option 1: do the least

Modify handleSessionUpdate in app/src/redux/robot/reducer/session.ts to set `runTime

This part of the reducer gets triggered whenever the app connects to a robot and whenever the session state gets updated

Probably set it to something weird like:

If session status is running, set runTime to now

If session status is paused, set runTime to startTime + statusInfo.changedAt - remoteTimeCompensation

option 2: do the most

See review comment above, but the gist of it is "ditch all client-side run time tracking and calculate everything from server state, updating the UI to account for the differences"

I like option (2) because it's more "correct" and doesn't feel like a huge lift, but option (1) would probably still fix the bug.

I'm sure you are right. For whatever the reason, I was either reading or observing that the bug had to do with the start-time. Which I'll take a look at, but it was that problem that a null remoteTimeCompensation fixed...which was not the bug. option #2 sounds good. Thanks.

ps. Was just thinking about the following while making dinner: My apologies, I didn't mean to imply that you made any sort of mistake here! This is some of the very oldest code in the app, it has a lot of baggage. No, you didn't and not at all what I meant. It occured to me when Max was reviewing. Actually, before that. Generally, I think it's a good idea to stick to bug fixes and do work outside of them in separate PRs. And I knew the pause change was hacky.

celsasser · 2021-06-02T14:23:31Z

Going to take the excellent feedback in this PR but pursue it with a new branch since the commit history in this one is of no value and the massive TS effort has been merged into edge. Will port the feedback to the new PR.

curtelsasser added 3 commits May 25, 2021 11:30

Merge remote-tracking branch 'origin/edge' into fix-run-timer-reset

da725bd

- have what may be a hacky fix.

a7118b4

- lots of logging which will be removed - notes about the pause logic