feat: configure performance-metrics to work on robot #15316

DerekMaggio · 2024-06-03T16:12:06Z

Overview

Adds packaging scripts and configuration to enable pushing and running performance metrics on a robot.

Currently running analysis and getting a cached analysis are the only things that are tracked on the robot

Closes https://opentrons.atlassian.net/browse/EXEC-497

Test Plan

Follow the steps in README to setup the env
Run robot analysis for 2-3 different protocols by going to the setup screen for the protocols
Repeat running analysis for each protocol
SSH into the robot and go to /data/performance_metrics_data/
There should be 2 files
- robot_context_data
- robot_context_data_headers

Check their content

robot_context_data should have lines that look like this,

ANALYZING_PROTOCOL,1718739269730290312,14028043893
GETTING_CACHED_ANALYSIS,1718739323578647179,47381526

...

robot_context_data_headers should have a single line
```
state_name,function_start_time,duration
```

After you are finished run make unset-performance-metrics-ff
Run a protocol and make sure performance metrics are no longer being recorded. Also, make sure you can still run the analysis and get a cached analysis
Verify analysis and cached analysis on the OT-2 dev server
Verify analysis and cached analysis on the Flex dev server
Verify analysis and cached analysis on OT-2
Verify analysis and cached analysis on Flex Dev Kit
Verify analysis and cached analysis on Flex @skowalski08

Changelog

`performance_helpers.py`

Add TrackingFunctions class which contains functions intended to be used as decorators
Generalize wrapper creation function
Make all other entities in the file private

`performance-metrics/Makefile`

Add push targets for OT-2 and Flex
Add helper target update-robot-version to hack the robot into thinking it is on the same version as the app
Add set-performance-metrics-ff and unset-performance-metrics-ff
Add a get-logs target

`performance-metrics/setup.py`

Remove shared-data as a dep for performance-metrics as it does not use it

`robot-server/Pipfile`

Remove performance-metrics as a dependency as robot_server is not using it directly

`robot-server/robot_server/protocols/protocol_analyzer.py`

Wrap analyze and label it as ANALYZING_PROTOCOL

`robot-server/robot_server/protocols/router.py`

Wrap logic that gets cached_analysis and label it as GETTING_CACHED_PROTOCOL_ANALYSIS

Review requests

Take a look at my test plan, is there anything else that should be added to reduce the risk?
Am I installing the performance-metrics package correctly on the flex by just unpacking it in /opt/opentrons-robot-server?

Risk assessment

Medium? performance-metrics is now wrapping production code. But the package is not being built and installed on robots, it has to be pushed by a dev. Furthermore, nothing will be run without setting the feature flag

make everything that is not TrackingFunctions private Wrap analyze in protocol_analyzer Wrap getting cached protocol logic

relock

SyntaxColoring

Changes requested for ~~performance-metrics/Makefile and~~ robot-server/Pipfile (unless I'm missing something). Other than that, LGTM. Thanks!

performance-metrics/Makefile

performance-metrics/setup.cfg

robot-server/Pipfile

SyntaxColoring · 2024-06-24T16:00:14Z

robot-server/robot_server/protocols/router.py

-                protocol_store=protocol_store,
-                analysis_store=analysis_store,
-                analyses_manager=analyses_manager,
+        @TrackingFunctions.track_getting_cached_protocol_analysis


This looks like it'll work for now, but it seems like it's turning out that we want the trackers to work as context managers instead of decorators, right? In other words, we shouldn't have to turn a chunk of code into an inner function just for the sake of packaging it up and handing to track_getting_cached_protocol_analysis().

I also think you and @sfoster1 have been trying hard to avoid performance_metrics introducing even small overhead. I'm not sure if this is what we're doing now, but, for example, at one point you were checking the feature flag once on first import to avoid the overhead of an if/else every time the function under measurement was called. Is that still a goal?

@SyntaxColoring

This looks like it'll work for now, but it seems like it's turning out that we want the trackers to work as context managers instead of decorators, right? In other words, we shouldn't have to turn a chunk of code into an inner function just for the sake of packaging it up and handing to track_getting_cached_protocol_analysis().

See #14834 (comment) for a previous discussion about this. Let me know what you think.

I also think you and @sfoster1 have been trying hard to avoid performance_metrics introducing even small overhead. I'm not sure if this is what we're doing now, but, for example, at one point you were checking the feature flag once on first import to avoid the overhead of an if/else every time the function under measurement was called. Is that still a goal?

@SyntaxColoring, you're correct that it should not be checking every time. But because _should_track is evaluated at import time you have to restart the robot-server if you ever want to update. Should I set the ff restart_required to True maybe?

Thank you for the background!

So to back up, you've gone to pains in the design of performance_metrics to avoid introducing overhead. But then it looks like the overhead is getting added here anyway?

Specifically—and I have not profiled this, but—if every call to create_protocol() defines an inner function _get_cached_protocol_analysis(), and then decorates it, and then calls it, I'm skeptical that all of that will be cheaper than it would have been to just enter and exit a context manager. Note that this particular decoration does not run at file parse time the way #14834 (comment) describes. It runs every time this block of _get_cached_protocol_analysis() runs.

You could avoid this by refactoring the _get_cached_protocol_analysis() inner function to be a module-level private function, and then the decoration really would only run once at module parse time.

But my larger point, which I've broached before and which I still believe, is: Are these pains really worth it? Are simpler approaches, like entering context managers and accessing os.environ "too frequently," really not good enough? I mean, we're measuring protocol analysis here. The latency of starting the background thread alone is probably going to drown out whatever microseconds we shaved by doing the decorator thing. I believe there are use cases where it could matter, but I can easily live without them, personally.

I think that I'm in both camps. I think the default approach, and the structural approach to implementing the reusable analysis side, should be oriented towards zero-cost abstractions even if they're inconvenient. But I think that at the point of integration, i.e. right here, sometimes it's okay to say "this is not zero-cost but the cost is marginal". For instance, here it's not zero-cost - like @SyntaxColoring says, you're creating an inner function each call - but this is not a hotpath; it's the top level call that will last a long time, not something called many times in a loop. We would jump through many more hoops than this to keep a zero-cost integration in some core protocol engine function, for a counterexample. And we need to have those hoops available to jump through, which is why we designed the metrics system as a function transformer that can run at import time.

I think it is reasonable to set restart_required to true for the feature flag for the general case.

To make sure I understand. Leave the creation of the inner function as is because it is not something we call a whole lot, and even if we did it is a call that takes a while so the impact would be negligible.
If we were wrapping some logic that is called many times then the wrapping of the logic should be done in a way that is as performant as possible.

performance-metrics/README.md

SyntaxColoring

Yay, thank you! Looking forward to using it.

DerekMaggio force-pushed the performance-metrics-packaging-for-robot branch from 7e15359 to 20bbb43 Compare June 18, 2024 16:12

DerekMaggio changed the title ~~chore: configure project to go on robot~~ chore: configure performance-metrics to work on robot Jun 20, 2024

DerekMaggio changed the title ~~chore: configure performance-metrics to work on robot~~ feat: configure performance-metrics to work on robot Jun 20, 2024

DerekMaggio force-pushed the performance-metrics-packaging-for-robot branch 2 times, most recently from 7436749 to b7274f1 Compare June 21, 2024 18:32

DerekMaggio added 17 commits June 21, 2024 12:47

fix: fix header name

5820d8e

chore: add state for getting cached analysis

f586f3a

feat: add TrackingFunctions and use it

600b75d

make everything that is not TrackingFunctions private Wrap analyze in protocol_analyzer Wrap getting cached protocol logic

chore: add stuff to package performance-metrics for robot installation

bfceb7c

chore: add some utility functions

2d03de5

chore: add README

8d7de2a

chore: remove gitignore

6b11657

fix: use project name not folder name

712cad2

fix: need version file

8c04e3a

fix: remove quotes and add src as prefix folder

9af243d

chore: performance-metrics does not use shared-data

224f9c6

robot_server does not directly use performance metrics api does

6321276

relock

chore: this file isn't used

2592d6b

chore: ignore license created by buildling sdist

3c35925

remove confusing stuff

18545c8

clarify

35b8f9b

thank you automated tests

0cb46ab

DerekMaggio force-pushed the performance-metrics-packaging-for-robot branch from 91360f8 to 0cb46ab Compare June 21, 2024 20:03

DerekMaggio self-assigned this Jun 21, 2024

really...

cbb713f

DerekMaggio marked this pull request as ready for review June 21, 2024 20:48

DerekMaggio requested review from a team as code owners June 21, 2024 20:48

DerekMaggio removed request for a team June 21, 2024 20:48

DerekMaggio requested review from sfoster1, SyntaxColoring, a team and y3rsh June 21, 2024 20:48

SyntaxColoring requested changes Jun 24, 2024

View reviewed changes

DerekMaggio added 3 commits June 25, 2024 11:57

fix: Max is right, I am wrong, no surprises there

4490738

doc changes

5b8d813

a newline

b2503d7

SyntaxColoring approved these changes Jun 25, 2024

View reviewed changes

DerekMaggio merged commit 4247d54 into edge Jun 25, 2024
48 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: configure performance-metrics to work on robot #15316

feat: configure performance-metrics to work on robot #15316

DerekMaggio commented Jun 3, 2024 •

edited

Loading

SyntaxColoring left a comment •

edited

Loading

SyntaxColoring Jun 24, 2024

SyntaxColoring Jun 24, 2024

DerekMaggio Jun 24, 2024

DerekMaggio Jun 24, 2024

SyntaxColoring Jun 24, 2024

sfoster1 Jun 25, 2024

DerekMaggio Jun 25, 2024

SyntaxColoring left a comment •

edited

Loading

feat: configure performance-metrics to work on robot #15316

feat: configure performance-metrics to work on robot #15316

Conversation

DerekMaggio commented Jun 3, 2024 • edited Loading

Overview

Test Plan

Changelog

performance_helpers.py

performance-metrics/Makefile

performance-metrics/setup.py

robot-server/Pipfile

robot-server/robot_server/protocols/protocol_analyzer.py

robot-server/robot_server/protocols/router.py

Review requests

Risk assessment

SyntaxColoring left a comment • edited Loading

Choose a reason for hiding this comment

SyntaxColoring Jun 24, 2024

Choose a reason for hiding this comment

SyntaxColoring Jun 24, 2024

Choose a reason for hiding this comment

DerekMaggio Jun 24, 2024

Choose a reason for hiding this comment

DerekMaggio Jun 24, 2024

Choose a reason for hiding this comment

SyntaxColoring Jun 24, 2024

Choose a reason for hiding this comment

sfoster1 Jun 25, 2024

Choose a reason for hiding this comment

DerekMaggio Jun 25, 2024

Choose a reason for hiding this comment

SyntaxColoring left a comment • edited Loading

Choose a reason for hiding this comment

DerekMaggio commented Jun 3, 2024 •

edited

Loading

`performance_helpers.py`

`performance-metrics/Makefile`

`performance-metrics/setup.py`

`robot-server/Pipfile`

`robot-server/robot_server/protocols/protocol_analyzer.py`

`robot-server/robot_server/protocols/router.py`

SyntaxColoring left a comment •

edited

Loading

SyntaxColoring left a comment •

edited

Loading