-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(robot-server): fetching of data files used in runs of a protocol #15908
fix(robot-server): fetching of data files used in runs of a protocol #15908
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me once that failing integration test is passing. Thanks!
self._run_store.insert_csv_rtp( | ||
run_id=run_id, run_time_parameters=run_time_parameters | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works for this PR, but ideally, I think we'd want this insert_csv_rtp(...)
call to be combined with the self._run_store.insert(...)
call a few lines above so it can be one atomic transaction.
status_code: 201 | ||
status_code: | ||
- 201 | ||
- 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in a call: Yeah, we should have a DELETE
method so clients can manually delete data files and these tests can restore the robot to its original state. Certainly for a separate PR.
ids_in_analyses = transaction.execute(analysis_csv_file_ids).scalars().all() | ||
ids_in_runs = transaction.execute(run_csv_file_ids).scalars().all() | ||
data_files_rows = transaction.execute( | ||
data_files_table.select() | ||
.where( | ||
data_files_table.c.id.in_(list(set(ids_in_analyses + ids_in_runs))) | ||
) | ||
.order_by(sqlite_rowid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answering your review question, I think we can do this:
with self._sql_engine.begin() as transaction:
data_files_rows = transaction.execute(
data_files_table.select()
.where(
data_files_table.c.id.in_(analysis_csv_file_ids)
| data_files_table.c.id.in_(run_csv_file_ids)
)
.order_by(sqlite_rowid)
).all()
Which keeps the set operations inside SQLite.
The documentation for |
is difficult to Google for, but it's here.
Some kind of JOIN
would work too. I haven't tested this but we might be reaching the point of query complexity where it actually makes a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, completely unrelated, but for future reference, tables that need to support queries like this should probably have indexes. For example, right now, to find out which analyses use a given CSV file, SQLite needs to linearly scan on analysis_csv_rtp_table.file_id
. And to find out which CSV files are used by a given analysis, SQLite needs to linearly scan on analysis_csv_rtp_table.analysis_id
.
The way that I check for this kind of thing is:
- When you add a new query, run the unit test with
--log-level=INFO -rA
so you can see the compiled SQL. Equivalently, you could watch the logs in the dev server as you interact with it via Postman. - Run
sqlite3
on any robot-server.db
file to get a REPL. The database can be empty; you just need it to have the correct schema. - Run
EXPLAIN QUERY PLAN
in the REPL and paste the query from above. You might need to adjust the parameter placeholders to be syntactically valid. If the query explanation saysSCAN
, that means it's doing an O(n) full table scan and the schema deserves a second look. If it saysSEARCH ... USING INDEX
, that's a better O(log(n)).
Here's example output from this query:
QUERY PLAN
|--MULTI-INDEX OR
| |--INDEX 1
| | |--LIST SUBQUERY 2
| | | |--SCAN analysis_csv_rtp_table
| | | `--LIST SUBQUERY 1
| | | `--SEARCH analysis USING INDEX ix_analysis_protocol_id (protocol_id=?)
| | `--SEARCH data_files USING INDEX sqlite_autoindex_data_files_1 (id=?)
| `--INDEX 2
| |--LIST SUBQUERY 4
| | |--SCAN run_csv_rtp_table
| | `--LIST SUBQUERY 3
| | `--SCAN run
| `--SEARCH data_files USING INDEX sqlite_autoindex_data_files_1 (id=?)
`--USE TEMP B-TREE FOR ORDER BY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tips!
select_referencing_analysis_ids = sqlalchemy.select(analysis_table.c.id).where( | ||
analysis_table.c.protocol_id == protocol_id | ||
) | ||
select_referencing_run_ids = sqlalchemy.select(run_table.c.id).where( | ||
run_table.c.protocol_id == protocol_id | ||
) | ||
# Get all entries in csv table that match the analyses | ||
# Get all entries in analysis_csv_table that match the analysis IDs above | ||
analysis_csv_file_ids = sqlalchemy.select( | ||
analysis_csv_rtp_table.c.file_id | ||
).where( | ||
analysis_csv_rtp_table.c.analysis_id.in_(select_referencing_analysis_ids) | ||
) | ||
# Get all entries in run_csv_table that match the run IDs above | ||
run_csv_file_ids = sqlalchemy.select(run_csv_rtp_table.c.file_id).where( | ||
run_csv_rtp_table.c.run_id.in_(select_referencing_run_ids) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, and not introduced in this PR: Can these variable names either consistently have a select_
prefix to indicate that they're merely statements instead of results, or consistently not have that prefix?
Closes AUTH-638
Overview
Fixes a bug where CSV files used in protocol runs were not being saved to the CSV RTP table for runs, and as a result, were not being included in response for the
/protocols/{protocolId}/dataFiles
endpoint.Test Plan and Hands on Testing
Testing with app & ODD:
Testing with Postman/ direct HTTP requests:
GET /protocols/{protocolId}/dataFiles
contains the 3 data files used with the runs & analyses. Check that they are listed in the order that the files were uploaded to the server (viaPOST /dataFiles
)Changelog
run_csv_rtp_table
before deleting the run.../{protocolId}/dataFiles
response so that it lists the files in the order they were uploaded.Review requests
Risk assessment
Low. Fixes bug