
refactor(robot-server): move run time params to their own DB table #15650

Merged · sanni-t merged 20 commits into edge on Jul 29, 2024

Conversation

Member

@sanni-t sanni-t commented Jul 15, 2024

Closes AUTH-527

Overview

This PR moves primitive RTPs to their own table and refactors the analysis-generation process. This refactor makes it easier to add CSV-parameter handling later.

Architecture and database changes:

  • Created a new table to store all primitive (non-CSV) run time parameter values used in the analysis.
  • Previously, we compared only the RTP values provided by the client against the full set of RTP values used in the last analysis. Since the client-provided values cannot be assumed to be the full set of a protocol's run time parameters (parameters without a client-specified value are assigned their defaults), we also had to make an additional comparison of previous RTP values against default values.
  • This PR simplifies that comparison by extracting the full set of RTP values for the current analysis beforehand and then comparing it with the full set of previous RTP values.
  • This extraction of full RTP values earlier in the process meant that we needed to re-architect the flow of analysis.
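
The simplified comparison can be sketched like this (hypothetical names; a minimal illustration rather than the actual robot-server code):

```python
from typing import Dict, Union

# Primitive (non-CSV) parameter values, as discussed in this PR.
PrimitiveAllowedTypes = Union[bool, int, float, str]


def analysis_needs_rerun(
    new_rtp_values: Dict[str, PrimitiveAllowedTypes],
    last_analysis_rtp_values: Dict[str, PrimitiveAllowedTypes],
) -> bool:
    """Return True if a new analysis is required.

    Both dicts already contain the *full* set of parameter values
    (client-supplied values merged with protocol defaults), so a direct
    equality check suffices; no separate comparison against defaults.
    """
    return new_rtp_values != last_analysis_rtp_values


# The client only sent `volume`; `dry_run` was filled in from its default
# before the comparison, so both sides are complete.
previous = {"volume": 50.0, "dry_run": False}
current = {"volume": 50.0, "dry_run": False}
```

Because defaults are resolved up front, the comparison no longer needs to special-case parameters the client left unset.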

Here's what the new analysis flow looks like at a high level:

  flowchart TD
  %% Nodes
      A1(Protocol)
      A2(Run time param values from client)
      B1(Upload protocol endpoint)
      C1("Initialize analyzer: \n Validate and set run time parameters \n using definitions in protocol \nand values passed by client")
      D1("Create a not-ok analysis result. \nSave result to DB. \nReturn analysis completed result.")
      D2("Compare the newly validated RTP values \nwith those of the last analysis (if any) \nof this protocol.")
      E1("No analysis required. Use last analysis in analyses list.")
      E2("In a background task, initiate new analysis using the new RTP values.")
      F1("Pending analysis is created")
      F2("Return pending analysis summary response")
  
  %% Edge connections between nodes
  
      A1 ---> B1
      A2 ---> B1
      B1 ---> C1
      C1 -- Initialization/RTP validation failed --> D1
      C1 -- RTP values validated successfully --> D2
      D2 -- Matching RTPs --> E1
      D2 -- Non-matching RTPs --> E2 --> F1 --> F2
flowchart TD
%% Nodes
    EF("Analyze task")
    G1("Pending analysis is updated to a \n'Completed analysis' with an error.")
    G2("Pending analysis is updated to a \n successful 'Completed analysis'.")
    End(" \nAnalysis and RTP values are saved to DB.")

%% Edge connections between nodes

    EF -- Analysis Error --> G1 --> End
    EF -- Successful analysis --> G2 --> End

Test Plan

  • Since this is only a refactor, existing integration and snapshot tests passing should be a good enough check that it isn't accidentally changing anything it shouldn't.
  • This changes the DB by removing a column from a schema 4 table, so we should make sure that a robot on DB v4 that has protocols (both RTP and non-RTP) migrates to v5 successfully, with no issues handling their analyses.

Changelog

  • added the new primitive RTP table and removed the 'RTP values and defaults' column from the analysis table
  • updated DB v4 -> v5 migration
  • refactored AnalysesManager, ProtocolAnalyzer, AnalysisStore and protocols router to use the new flow as shown above.
  • updated tests
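
For reference, a hypothetical sketch of what such a primitive-RTP table might look like in SQLAlchemy (table and column names are guesses based on this thread's discussion, not the real schema_5.py):

```python
import sqlalchemy

metadata = sqlalchemy.MetaData()

# Parent analysis table, reduced to the column the new table references.
analysis_table = sqlalchemy.Table(
    "analysis",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.String, primary_key=True),
)

# Hypothetical shape of the new table: one row per fully-resolved
# primitive (non-CSV) parameter value used by an analysis.
analysis_primitive_rtp_table = sqlalchemy.Table(
    "analysis_primitive_rtp_table",
    metadata,
    sqlalchemy.Column("row_id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column(
        "analysis_id",
        sqlalchemy.String,
        sqlalchemy.ForeignKey("analysis.id"),
        nullable=False,
    ),
    sqlalchemy.Column("parameter_variable_name", sqlalchemy.String, nullable=False),
    sqlalchemy.Column("parameter_type", sqlalchemy.String, nullable=False),
    sqlalchemy.Column("parameter_value", sqlalchemy.String, nullable=True),
)

# The schema is well-formed and can be created as-is.
engine = sqlalchemy.create_engine("sqlite://")
metadata.create_all(engine)
```

Storing one row per parameter (rather than a serialized blob column on the analysis table) is what makes the full-set comparison and the later CSV extension straightforward.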

Review requests

  • @SyntaxColoring and I had discussed most of these changes pretty extensively. I was initially hesitant to move the RTP validation and extraction earlier in the process because of the amount of refactoring it would require, but as I was working on adding CSV parameter handling, the early-validation approach proved quite useful, especially since CSV parameters will rely on external files to set their 'values'. (So thanks for the suggestion, Max!)
  • Does the new flow make sense? Anything that isn't clear?

Risk assessment

Medium-high. It changes quite a critical part of the protocol upload and analysis process. But the fact that all integration tests and analyses snapshots are passing makes me feel quite confident that the existing behavior of these endpoints has not changed by this work.

Contributor

A PR has been opened to address analyses snapshot changes. Please review the changes here: https://github.com/Opentrons/opentrons/pull/


@sanni-t sanni-t marked this pull request as ready for review July 15, 2024 19:25
@sanni-t sanni-t requested review from a team as code owners July 15, 2024 19:25
Contributor

@SyntaxColoring SyntaxColoring left a comment


Working my way through this, looks great, thanks! Here are some requests.

sqlalchemy.Column(
"run_time_parameter_values_and_defaults",
"parameter_variable_name",
Contributor

@SyntaxColoring SyntaxColoring Jul 16, 2024


It might make sense to throw in unique=True on the parameter_variable_name column.

Whoops, no, sorry, definitely not just on the parameter_variable_name column. It would need to be on the "analysis_id", "parameter_variable_name" combination.

This is just a nice-to-have.
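
The combined constraint could be expressed like this (a minimal SQLAlchemy sketch with illustrative table and column names, not the actual schema_5.py):

```python
import sqlalchemy
from sqlalchemy.exc import IntegrityError

metadata = sqlalchemy.MetaData()

# Each (analysis_id, parameter_variable_name) pair must be unique,
# while either column alone may repeat across rows.
rtp_table = sqlalchemy.Table(
    "analysis_primitive_rtp_table",
    metadata,
    sqlalchemy.Column("row_id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("analysis_id", sqlalchemy.String, nullable=False),
    sqlalchemy.Column("parameter_variable_name", sqlalchemy.String, nullable=False),
    sqlalchemy.UniqueConstraint("analysis_id", "parameter_variable_name"),
)

engine = sqlalchemy.create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(rtp_table.insert().values(analysis_id="a1", parameter_variable_name="volume"))
    # A different parameter for the same analysis is fine...
    conn.execute(rtp_table.insert().values(analysis_id="a1", parameter_variable_name="dry_run"))

# ...but repeating the same (analysis_id, parameter_variable_name) pair is rejected.
try:
    with engine.begin() as conn:
        conn.execute(rtp_table.insert().values(analysis_id="a1", parameter_variable_name="volume"))
    duplicate_rejected = False
except IntegrityError:
    duplicate_rejected = True
```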

robot-server/robot_server/persistence/tables/schema_5.py (outdated review thread, resolved)
schema_5.run_command_table,
source_transaction,
dest_transaction,
order_by_rowid=True,
Contributor


Nitpick: I think this is harmless, but order_by_rowid can be False for run_command_table because it has explicitly-declared row_id and index_in_run columns.

I wonder if order_by_rowid=True is actually any slower or more memory-hungry than =False. Maybe it should just always be True to simplify things.
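
For context on why ordering by rowid is safe as a default: SQLite assigns implicit rowids in insertion order, so copying rows ordered by rowid preserves their original order even for tables without an explicit ordering column. A minimal sqlite3 illustration (not the actual migration code):

```python
import sqlite3

# Source table with no explicit index column.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE run_command (command TEXT)")
src.executemany(
    "INSERT INTO run_command VALUES (?)",
    [("home",), ("aspirate",), ("dispense",)],
)

# Ordering by the implicit rowid recovers insertion order.
rows = src.execute("SELECT command FROM run_command ORDER BY rowid").fetchall()
```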

robot-server/robot_server/protocols/rtp_resources.py (outdated review thread, resolved)
robot-server/robot_server/protocols/rtp_resources.py (outdated review thread, resolved)
Contributor

@SyntaxColoring SyntaxColoring left a comment


Final set of comments.


Member Author

@sanni-t sanni-t left a comment


Notes from meeting:

  • Update the persistence snapshot tests to include RTP checks.


codecov bot commented Jul 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.43%. Comparing base (e0e018f) to head (262db4b).
Report is 36 commits behind head on edge.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             edge   #15650       +/-   ##
===========================================
+ Coverage   63.25%   92.43%   +29.18%     
===========================================
  Files         287       77      -210     
  Lines       15013     1283    -13730     
===========================================
- Hits         9497     1186     -8311     
+ Misses       5516       97     -5419     
Flag             Coverage Δ
g-code-testing   92.43% <ø> (ø)
hardware         ?
shared-data      ?
system-server    ?
update-server    ?
usb-bridge       ?

Flags with carried forward coverage won't be shown. Click here to find out more.

see 210 files with indirect coverage changes

Contributor

@SyntaxColoring SyntaxColoring left a comment


Great, thanks!

Member

@sfoster1 sfoster1 left a comment


Nice, I'm always for breaking stuff into more steps to better mirror semantics. I have a couple of things I'd love to see changed, but if other work depends on this getting merged, it's OK:

  • Remove some null checks by slightly reorganizing
  • Make remaining checks not just asserts but actual errors
  • Just saying, Scalar feels like a better name than Primitive

import robot_server.errors.error_mappers as em


class FailedToInitializeAnalyzer(Exception):
Member


Should this be an enumerated error so it can carry more context?

parameter,
prev_value_and_default,
) in rtp_values_and_defaults_in_last_analysis.items():
assert set(param.variableName for param in new_parameters) == set(
Member


Are we sure about this one? I really don't like using assert in this way in production code. If we're going to be paranoid, why not have this function (1) log something big and then (2) just return False, since if these parameter names don't match then they're definitionally different?

Member Author


@sfoster1 can you explain why you don't like using assert in this way?
I prefer assert over silently logging because it makes sure the error is actually brought to attention. It's a practice we've followed in most of the robot-server and engine code. If we only log the error and continue running (by returning False in this case), the server might hit some other error down the line, because failing this assertion means there's a bug somewhere else in our code. It's better to fail right away and know why, rather than fail a little later with a possibly unclear reason.
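
As a minimal illustration of the pattern being defended here (hypothetical names, not the actual robot-server function):

```python
from typing import Dict, Union

PrimitiveAllowedTypes = Union[bool, int, float, str]


def rtp_values_match(
    new_values: Dict[str, PrimitiveAllowedTypes],
    last_analysis_values: Dict[str, PrimitiveAllowedTypes],
) -> bool:
    """Compare fully-resolved RTP values from two analyses of one protocol.

    Both value sets come from the same protocol, so their parameter names
    must agree; a mismatch means a bug elsewhere in the server. The assert
    fails loudly at the source instead of returning False and letting the
    bug surface later with an unclear cause.
    """
    assert set(new_values) == set(last_analysis_values), (
        "Parameter names differ between analyses of the same protocol; "
        "this indicates a bug upstream."
    )
    return new_values == last_analysis_values
```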

@@ -407,22 +409,22 @@ def add(
self,
protocol_id: str,
analysis_id: str,
) -> PendingAnalysis:
run_time_parameters: List[RunTimeParameter],
) -> None:
Member


Why change this? It feels like a nice convenience to return the newly created object, and it's a smaller diff.

with self._sql_engine.begin() as transaction:
results = transaction.execute(statement).all()

rtps: Dict[str, PrimitiveAllowedTypes] = {}
Member


nit:

resources = [PrimitiveParameterResource.from_sql_row(row) for row in results]
rtps = {param.parameter_variable_name: param.parameter_value for param in resources}

? Means you don't have to explicitly type rtps
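
Spelled out as a runnable sketch (the resource class here is a simplified stand-in for the one in rtp_resources.py):

```python
from dataclasses import dataclass
from typing import Union

PrimitiveAllowedTypes = Union[bool, int, float, str]


@dataclass
class PrimitiveParameterResource:
    """Stand-in for the resource class built from SQL rows."""
    analysis_id: str
    parameter_variable_name: str
    parameter_value: PrimitiveAllowedTypes


# Suppose these were built via PrimitiveParameterResource.from_sql_row(row):
resources = [
    PrimitiveParameterResource("a1", "volume", 50.0),
    PrimitiveParameterResource("a1", "dry_run", False),
]

# The comprehension lets the dict type be inferred, so no explicit
# annotation on rtps is needed.
rtps = {r.parameter_variable_name: r.parameter_value for r in resources}
```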

@@ -34,45 +37,57 @@ def __init__(
"""Initialize the analyzer and its dependencies."""
self._analysis_store = analysis_store
self._protocol_resource = protocol_resource
self._orchestrator: Optional[RunOrchestrator] = None
Member


Looking around the code, it seems like the first thing you always do with a protocol analyzer is call load_runner (now load_orchestrator) with the RTPs, including in this PR. Given that, instead of having load_orchestrator, can we:

  • Require an orchestrator instance as an initializer param, meaning it's not-null by construction
  • make create_protocol_analyzer take the optional run_time_param values and build this there

That way we don't need all these asserts and we make it impossible to call get_verified_run_time_parameters in an illegal way.
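
A sketch of that by-construction approach (stand-in classes, not the real robot-server API):

```python
from typing import Dict, Optional


class RunOrchestrator:
    """Stand-in for the real run orchestrator."""

    def __init__(self, run_time_param_values: Optional[Dict[str, object]] = None) -> None:
        self.run_time_param_values = dict(run_time_param_values or {})


class ProtocolAnalyzer:
    """Requires an orchestrator at construction, so it is never None and
    no assert/None-check is needed before use."""

    def __init__(self, orchestrator: RunOrchestrator) -> None:
        self._orchestrator = orchestrator

    def get_verified_run_time_parameters(self) -> Dict[str, object]:
        return self._orchestrator.run_time_param_values


def create_protocol_analyzer(
    run_time_param_values: Optional[Dict[str, object]] = None,
) -> ProtocolAnalyzer:
    # The factory builds the orchestrator itself, so every analyzer it
    # hands out is valid by construction.
    return ProtocolAnalyzer(RunOrchestrator(run_time_param_values))


analyzer = create_protocol_analyzer({"volume": 50.0})
```

Making the dependency a required initializer parameter moves the invariant from runtime checks into the type signature.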

@sanni-t
Member Author

sanni-t commented Jul 29, 2024

Thanks for the feedback @sfoster1. I'll address them in a follow-up PR since this one's quite overdue for merging.

@sanni-t sanni-t merged commit 3df5fb3 into edge Jul 29, 2024
23 checks passed