Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: configure performance-metrics to work on robot #15316

Merged
merged 21 commits into from
Jun 25, 2024

Conversation

DerekMaggio
Copy link
Contributor

@DerekMaggio DerekMaggio commented Jun 3, 2024

Overview

Adds packaging scripts and configuration to enable pushing and running performance metrics on a robot.

Currently running analysis and getting a cached analysis are the only things that are tracked on the robot

Closes https://opentrons.atlassian.net/browse/EXEC-497

Test Plan

  • Follow the steps in README to setup the env

  • Run robot analysis for 2-3 different protocols by going to the setup screen for the protocols

  • Repeat running analysis for each protocol

  • SSH into the robot and go to /data/performance_metrics_data/

  • There should be 2 files

    • robot_context_data
    • robot_context_data_headers
  • Check their content

    • robot_context_data should have lines that look like this,
      ANALYZING_PROTOCOL,1718739269730290312,14028043893
      GETTING_CACHED_ANALYSIS,1718739323578647179,47381526
      
      ...
      
    • robot_context_data_headers should have a single line
      state_name,function_start_time,duration
      
  • After you are finished run make unset-performance-metrics-ff

  • Run a protocol and make sure performance metrics are no longer being recorded. Also, make sure you can still run the analysis and get a cached analysis

  • Verify analysis and cached analysis on the OT-2 dev server

  • Verify analysis and cached analysis on the Flex dev server

  • Verify analysis and cached analysis on OT-2

  • Verify analysis and cached analysis on Flex Dev Kit

  • Verify analysis and cached analysis on Flex @skowalski08

Changelog

performance_helpers.py

  • Add TrackingFunctions class which contains functions intended to be used as decorators
  • Generalize wrapper creation function
  • Make all other entities in the file private

performance-metrics/Makefile

  • Add push targets for OT-2 and Flex
  • Add helper target update-robot-version to hack the robot into thinking it is on the same version as the app
  • Add set-performance-metrics-ff and unset-performance-metrics-ff
  • Add a get-logs target

performance-metrics/setup.py

Remove shared-data as a dep for performance-metrics as it does not use it

robot-server/Pipfile

Remove performance-metrics as a dependency as robot_server is not using it directly

robot-server/robot_server/protocols/protocol_analyzer.py

Wrap analyze and label it as ANALYZING_PROTOCOL

robot-server/robot_server/protocols/router.py

Wrap logic that gets cached_analysis and label it as GETTING_CACHED_PROTOCOL_ANALYSIS

Review requests

  • Take a look at my test plan, is there anything else that should be added to reduce the risk?
  • Am I installing the performance-metrics package correctly on the flex by just unpacking it in /opt/opentrons-robot-server?

Risk assessment

Medium? performance-metrics is now wrapping production code. But the package is not being built and installed on robots, it has to be pushed by a dev. Furthermore, nothing will be run without setting the feature flag

@DerekMaggio DerekMaggio force-pushed the performance-metrics-packaging-for-robot branch from 7e15359 to 20bbb43 Compare June 18, 2024 16:12
@DerekMaggio DerekMaggio changed the title chore: configure project to go on robot chore: configure performance-metrics to work on robot Jun 20, 2024
@DerekMaggio DerekMaggio changed the title chore: configure performance-metrics to work on robot feat: configure performance-metrics to work on robot Jun 20, 2024
@DerekMaggio DerekMaggio force-pushed the performance-metrics-packaging-for-robot branch 2 times, most recently from 7436749 to b7274f1 Compare June 21, 2024 18:32
@DerekMaggio DerekMaggio force-pushed the performance-metrics-packaging-for-robot branch from 91360f8 to 0cb46ab Compare June 21, 2024 20:03
@DerekMaggio DerekMaggio self-assigned this Jun 21, 2024
@DerekMaggio DerekMaggio marked this pull request as ready for review June 21, 2024 20:48
@DerekMaggio DerekMaggio requested review from a team as code owners June 21, 2024 20:48
@DerekMaggio DerekMaggio removed request for a team June 21, 2024 20:48
@DerekMaggio DerekMaggio requested review from sfoster1, SyntaxColoring, a team and y3rsh June 21, 2024 20:48
Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes requested for performance-metrics/Makefile and robot-server/Pipfile (unless I'm missing something). Other than that, LGTM. Thanks!

performance-metrics/Makefile Show resolved Hide resolved
performance-metrics/Makefile Outdated Show resolved Hide resolved
performance-metrics/setup.cfg Outdated Show resolved Hide resolved
robot-server/Pipfile Show resolved Hide resolved
protocol_store=protocol_store,
analysis_store=analysis_store,
analyses_manager=analyses_manager,
@TrackingFunctions.track_getting_cached_protocol_analysis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it'll work for now, but it seems like it's turning out that we want the trackers to work as context managers instead of decorators, right? In other words, we shouldn't have to turn a chunk of code into an inner function just for the sake of packaging it up and handing to track_getting_cached_protocol_analysis().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think you and @sfoster1 have been trying hard to avoid performance_metrics introducing even small overhead. I'm not sure if this is what we're doing now, but, for example, at one point you were checking the feature flag once on first import to avoid the overhead of an if/else every time the function under measurement was called. Is that still a goal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SyntaxColoring

This looks like it'll work for now, but it seems like it's turning out that we want the trackers to work as context managers instead of decorators, right? In other words, we shouldn't have to turn a chunk of code into an inner function just for the sake of packaging it up and handing to track_getting_cached_protocol_analysis().

See #14834 (comment) for a previous discussion about this. Let me know what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think you and @sfoster1 have been trying hard to avoid performance_metrics introducing even small overhead. I'm not sure if this is what we're doing now, but, for example, at one point you were checking the feature flag once on first import to avoid the overhead of an if/else every time the function under measurement was called. Is that still a goal?

@SyntaxColoring, you're correct that it should not be checking every time. But because _should_track is evaluated at import time you have to restart the robot-server if you ever want to update. Should I set the ff restart_required to True maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the background!

So to back up, you've gone to pains in the design of performance_metrics to avoid introducing overhead. But then it looks like the overhead is getting added here anyway?

Specifically—and I have not profiled this, but—if every call to create_protocol() defines an inner function _get_cached_protocol_analysis(), and then decorates it, and then calls it, I'm skeptical that all of that will be cheaper than it would have been to just enter and exit a context manager. Note that this particular decoration does not run at file parse time the way #14834 (comment) describes. It runs every time this block of _get_cached_protocol_analysis() runs.

You could avoid this by refactoring the _get_cached_protocol_analysis() inner function to be a module-level private function, and then the decoration really would only run once at module parse time.

But my larger point, which I've broached before and which I still believe, is: Are these pains really worth it? Are simpler approaches, like entering context managers and accessing os.environ "too frequently," really not good enough? I mean, we're measuring protocol analysis here. The latency of starting the background thread alone is probably going to drown out whatever microseconds we shaved by doing the decorator thing. I believe there are use cases where it could matter, but I can easily live without them, personally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I'm in both camps. I think the default approach, and the structural approach to implementing the reusable analysis side, should be oriented towards zero-cost abstractions even if they're inconvenient. But I think that at the point of integration, i.e. right here, sometimes it's okay to say "this is not zero-cost but the cost is marginal". For instance, here it's not zero-cost - like @SyntaxColoring says, you're creating an inner function each call - but this is not a hotpath; it's the top level call that will last a long time, not something called many times in a loop. We would jump through many more hoops than this to keep a zero-cost integration in some core protocol engine function, for a counterexample. And we need to have those hoops available to jump through, which is why we designed the metrics system as a function transformer that can run at import time.

I think it is reasonable to set restart_required to true for the feature flag for the general case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand. Leave the creation of the inner function as is because it is not something we call a whole lot, and even if we did it is a call that takes a while so the impact would be negligible.
If we were wrapping some logic that is called many times then the wrapping of the logic should be done in a way that is as performant as possible.

performance-metrics/README.md Outdated Show resolved Hide resolved
performance-metrics/README.md Show resolved Hide resolved
Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, thank you! Looking forward to using it.

@DerekMaggio DerekMaggio merged commit 4247d54 into edge Jun 25, 2024
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants