fix(robot-server): fetching of data files used in runs of a protocol #15908

Merged

Conversation

@sanni-t (Member) commented Aug 6, 2024

Closes AUTH-638

Overview

Fixes a bug where CSV files used in protocol runs were not being saved to the runs' CSV RTP table, and as a result were not included in the response of the /protocols/{protocolId}/dataFiles endpoint.

Test Plan and Hands on Testing

Testing with app & ODD:

  • Upload a protocol that uses a CSV parameter
  • Send the protocol to a robot and upload the CSV file to use
  • Run the protocol
  • Run the protocol again with a different CSV file, and repeat until there have been 6 runs with 6 different CSV files. Re-running the protocol 6 times forces the robot to delete its oldest analysis (the maximum number of analyses per protocol is 5), which removes the first CSV file from the analysis CSV table but not from the runs table
  • Check that when you run the protocol again on the ODD, it shows all 6 of the previously uploaded CSV files

Testing with Postman / direct HTTP requests (sketched in Python after the list):

  • Upload a few data files
  • Upload a protocol that uses a CSV parameter and specify a data file (data_file_1) for the CSV param
  • Start a new analysis for the same protocol by specifying a second data file (data_file_2) for the CSV param
  • Create a run for the protocol by specifying data_file_1 for its CSV param
  • Create another run for the protocol by specifying a third data file (data_file_3) for its CSV param
  • Check that the response to GET /protocols/{protocolId}/dataFiles contains the 3 data files used with the runs & analyses. Check that they are listed in the order that the files were uploaded to the server (via POST /dataFiles)
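
As a rough Python version of the same sequence (endpoint paths are from the steps above; the Opentrons-Version header is required by robot-server, but the payload shapes, parameter name, and IDs below are illustrative assumptions, not the exact schemas):

    import requests

    BASE = "http://ROBOT_IP:31950"  # placeholder robot address
    HEADERS = {"Opentrons-Version": "*"}

    # Upload the data files and keep the returned IDs.
    file_ids = []
    for name in ("data_file_1.csv", "data_file_2.csv", "data_file_3.csv"):
        with open(name, "rb") as f:
            resp = requests.post(f"{BASE}/dataFiles", headers=HEADERS, files={"file": f})
        file_ids.append(resp.json()["data"]["id"])

    # Create a run that uses data_file_3 for the CSV parameter.
    # ("runTimeParameterFiles", "csv_param", and the protocol ID are assumed names.)
    requests.post(
        f"{BASE}/runs",
        headers=HEADERS,
        json={
            "data": {
                "protocolId": "protocol-id",
                "runTimeParameterFiles": {"csv_param": file_ids[2]},
            }
        },
    )

    # Fetch the files used across the protocol's runs & analyses; expect all
    # three, ordered by upload time.
    data_files = requests.get(
        f"{BASE}/protocols/protocol-id/dataFiles", headers=HEADERS
    ).json()["data"]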

Changelog

  • Wired up CSV RTP table insertion during run creation.
  • Updated the run deletion code to remove the CSV RTP entry from the run_csv_rtp_table before deleting the run.
  • Updated the ../{protocolId}/dataFiles response so that it lists the files in the order they were uploaded.
  • Added tests.

Review requests

  • Usual code review stuff
  • Is there a better way to write the SQL query to get the ordered list of data files used in runs and analyses?

Risk assessment

Low. Fixes a bug.

@sanni-t requested review from a team as code owners, August 6, 2024 20:47
@SyntaxColoring (Contributor) left a comment:

Looks good to me once that failing integration test is passing. Thanks!

Comment on lines +209 to +211
    self._run_store.insert_csv_rtp(
        run_id=run_id, run_time_parameters=run_time_parameters
    )
Contributor:

This works for this PR, but ideally, I think we'd want this insert_csv_rtp(...) call to be combined with the self._run_store.insert(...) call a few lines above so it can be one atomic transaction.
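
For illustration, a minimal self-contained sketch of that shape (the table definitions are stand-ins, not the real robot-server schema):

    import sqlalchemy

    metadata = sqlalchemy.MetaData()
    run_table = sqlalchemy.Table(
        "run",
        metadata,
        sqlalchemy.Column("id", sqlalchemy.String, primary_key=True),
    )
    run_csv_rtp_table = sqlalchemy.Table(
        "run_csv_rtp_table",
        metadata,
        sqlalchemy.Column("run_id", sqlalchemy.String),
        sqlalchemy.Column("file_id", sqlalchemy.String),
    )

    def insert_run_with_csv_rtps(
        engine: sqlalchemy.engine.Engine, run_id: str, file_ids: list[str]
    ) -> None:
        # engine.begin() opens a single transaction covering both inserts:
        # if the CSV RTP insert fails, the run insert rolls back with it.
        with engine.begin() as transaction:
            transaction.execute(run_table.insert().values(id=run_id))
            for file_id in file_ids:
                transaction.execute(
                    run_csv_rtp_table.insert().values(run_id=run_id, file_id=file_id)
                )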

Comment on lines -30 to +32
    -status_code: 201
    +status_code:
    +  - 201
    +  - 200
Contributor:

Discussed in a call: Yeah, we should have a DELETE method so clients can manually delete data files and these tests can restore the robot to its original state. Certainly for a separate PR.
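
If it helps the future PR, the endpoint could take roughly this shape (a hypothetical FastAPI sketch; the route, status code, and behavior are not settled here):

    from fastapi import APIRouter, status

    router = APIRouter()

    @router.delete("/dataFiles/{dataFileId}", status_code=status.HTTP_200_OK)
    async def delete_data_file(dataFileId: str) -> None:
        # A real implementation would look the file up in the data files store,
        # refuse (or cascade) deletion while runs or analyses still reference
        # it, and then remove both the row and the file on disk.
        raise NotImplementedError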

Comment on lines 335 to 342
    ids_in_analyses = transaction.execute(analysis_csv_file_ids).scalars().all()
    ids_in_runs = transaction.execute(run_csv_file_ids).scalars().all()
    data_files_rows = transaction.execute(
        data_files_table.select()
        .where(
            data_files_table.c.id.in_(list(set(ids_in_analyses + ids_in_runs)))
        )
        .order_by(sqlite_rowid)
Contributor:

Answering your review question, I think we can do this:

        with self._sql_engine.begin() as transaction:
            data_files_rows = transaction.execute(
                data_files_table.select()
                .where(
                    data_files_table.c.id.in_(analysis_csv_file_ids)
                    | data_files_table.c.id.in_(run_csv_file_ids)
                )
                .order_by(sqlite_rowid)
            ).all()

Which keeps the set operations inside SQLite.

The documentation for | is difficult to Google for, but it's here.

Some kind of JOIN would work too. I haven't tested this but we might be reaching the point of query complexity where it actually makes a difference.

@SyntaxColoring (Contributor) commented Aug 7, 2024:

Also, completely unrelated, but for future reference, tables that need to support queries like this should probably have indexes. For example, right now, to find out which analyses use a given CSV file, SQLite needs to linearly scan on analysis_csv_rtp_table.file_id. And to find out which CSV files are used by a given analysis, SQLite needs to linearly scan on analysis_csv_rtp_table.analysis_id.

The way that I check for this kind of thing is:

  1. When you add a new query, run the unit test with --log-level=INFO -rA so you can see the compiled SQL. Equivalently, you could watch the logs in the dev server as you interact with it via Postman.
  2. Run sqlite3 on any robot-server .db file to get a REPL. The database can be empty; you just need it to have the correct schema.
  3. Run EXPLAIN QUERY PLAN in the REPL and paste the query from above. You might need to adjust the parameter placeholders to be syntactically valid. If the query explanation says SCAN, that means it's doing an O(n) full table scan and the schema deserves a second look. If it says SEARCH ... USING INDEX, that's a better O(log(n)).

Here's example output from this query:

QUERY PLAN
|--MULTI-INDEX OR
|  |--INDEX 1
|  |  |--LIST SUBQUERY 2
|  |  |  |--SCAN analysis_csv_rtp_table
|  |  |  `--LIST SUBQUERY 1
|  |  |     `--SEARCH analysis USING INDEX ix_analysis_protocol_id (protocol_id=?)
|  |  `--SEARCH data_files USING INDEX sqlite_autoindex_data_files_1 (id=?)
|  `--INDEX 2
|     |--LIST SUBQUERY 4
|     |  |--SCAN run_csv_rtp_table
|     |  `--LIST SUBQUERY 3
|     |     `--SCAN run
|     `--SEARCH data_files USING INDEX sqlite_autoindex_data_files_1 (id=?)
`--USE TEMP B-TREE FOR ORDER BY
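
To turn those SCANs into searches, the RTP tables could declare indexes on the columns these queries filter by, e.g. (a sketch against stand-in table definitions, not the real schema module):

    import sqlalchemy

    metadata = sqlalchemy.MetaData()
    analysis_csv_rtp_table = sqlalchemy.Table(
        "analysis_csv_rtp_table",
        metadata,
        sqlalchemy.Column("analysis_id", sqlalchemy.String),
        sqlalchemy.Column("file_id", sqlalchemy.String),
        # Each index turns the corresponding SCAN in the plan above into a
        # SEARCH ... USING INDEX.
        sqlalchemy.Index("ix_analysis_csv_rtp_analysis_id", "analysis_id"),
        sqlalchemy.Index("ix_analysis_csv_rtp_file_id", "file_id"),
    )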

@sanni-t (Member, Author):

Thanks for the tips!

Comment on lines 317 to 332
    select_referencing_analysis_ids = sqlalchemy.select(analysis_table.c.id).where(
        analysis_table.c.protocol_id == protocol_id
    )
    select_referencing_run_ids = sqlalchemy.select(run_table.c.id).where(
        run_table.c.protocol_id == protocol_id
    )
    # Get all entries in analysis_csv_rtp_table that match the analysis IDs above
    analysis_csv_file_ids = sqlalchemy.select(
        analysis_csv_rtp_table.c.file_id
    ).where(
        analysis_csv_rtp_table.c.analysis_id.in_(select_referencing_analysis_ids)
    )
    # Get all entries in run_csv_rtp_table that match the run IDs above
    run_csv_file_ids = sqlalchemy.select(run_csv_rtp_table.c.file_id).where(
        run_csv_rtp_table.c.run_id.in_(select_referencing_run_ids)
    )
Contributor:

Nitpick, and not introduced in this PR: Can these variable names either consistently have a select_ prefix to indicate that they're merely statements instead of results, or consistently not have that prefix?

@sanni-t merged commit 90127ff into chore_release-8.0.0 on Aug 7, 2024
7 checks passed