
feat(robot-server): implement data files auto-deletion #15879

Merged
4 commits merged into edge on Aug 2, 2024

Conversation

@sanni-t (Member) commented Aug 2, 2024

Closes AUTH-467

Overview

Adds auto-deletion of old data files. If the data files stored on disk exceed 50 files, we start auto-deleting files as needed: the oldest file that is not referenced by any analysis or run in the database is deleted from disk and removed from the database.

Test Plan and Hands-on Testing

  • Add 50 data files to the robot, use some of them in a run
  • Check that 50 files still exist
  • Add another file
  • Check that the oldest unused file is deleted
    (pro-tip: you can change the file limit in robot settings for testing)

Changelog

  • added DataFileAutoDeleter and DataFileDeletionPlanner
  • updated data file router to use auto-deletion before adding a new file
  • added new remove() method to DataFilesStore
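The deletion-planning logic described above could be sketched roughly as follows. The class and method names mirror the changelog, but the signatures and internals here are illustrative assumptions, not the PR's actual API:

```python
from typing import List, Set


class DataFileDeletionPlanner:
    """Plan which files to delete so the total stays within the maximum."""

    def __init__(self, maximum_files: int) -> None:
        self._maximum_files = maximum_files

    def plan_for_new_file(
        self,
        existing_file_ids: List[str],  # ordered oldest-first
        used_file_ids: Set[str],  # files referenced by an analysis or run
    ) -> Set[str]:
        """Return IDs of unused files to delete before adding one new file."""
        deletions: Set[str] = set()
        # The +1 accounts for the file about to be added.
        excess = len(existing_file_ids) + 1 - self._maximum_files
        for file_id in existing_file_ids:
            if excess <= 0:
                break
            if file_id not in used_file_ids:
                deletions.add(file_id)
                excess -= 1
        return deletions
```

Note that if every existing file is in use, this returns an empty set and the limit is allowed to be exceeded.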

Review requests

  • Usual code review

Risk assessment

Medium. Doesn't change existing infrastructure, but if implemented incorrectly it could delete data files unexpectedly.

@sanni-t sanni-t requested a review from a team as a code owner August 2, 2024 17:57
@SyntaxColoring (Contributor) left a comment

Looks great, thanks. Some minor comments.

default=50,
gt=0,
description=(
"The maximum number of data files to allow before auto-deleting old ones."
Contributor commented:

Should this say:

Suggested change
"The maximum number of data files to allow before auto-deleting old ones."
"The maximum number of unused data files to allow before auto-deleting old ones."

And be named maximum_unused_data_files?

@SyntaxColoring (Contributor) commented Aug 2, 2024

Oh, and would you also mind running pipenv run python scripts/settings_schema.py settings_schema.json and then a top-level make format-js?

I'm not sure if settings_schema.json is really helpful, but given that it exists, we should keep it up to date. I'd also be down to just delete it.

@sanni-t (Member, Author) commented

It's actually not the max number of unused data files, though. That is, it starts deleting unused files when [used + unused] exceeds 50.

@SyntaxColoring (Contributor) commented Aug 2, 2024

Does that imply that if you have 10 runs that each use 5 CSV files that are all distinct from each other, it's impossible to upload any additional CSV files? Because you're already at the limit of 50 total, and the server will not autoremove any of those 50?

@sanni-t (Member, Author) commented

No, in that case we'll exceed the 50-file limit and still allow adding new ones. Each time we try to add a new file, we re-check for any newly unreferenced files and delete them until the count is below 50 again.
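That retry-on-upload behavior could be sketched like this (the helper names and signature are hypothetical, not the PR's code):

```python
from typing import Callable, List, Set


def make_room_for_new_file(
    files_oldest_first: List[str],
    used_ids: Set[str],
    delete_file: Callable[[str], None],
    maximum_files: int = 50,
) -> None:
    """Delete unused files, oldest first, until the count drops below the limit.

    If every remaining file is still referenced, the limit is exceeded and
    deletion is simply retried on the next upload, when some files may have
    become unreferenced.
    """
    for file_id in list(files_oldest_first):
        if len(files_oldest_first) < maximum_files:
            break
        if file_id not in used_ids:
            delete_file(file_id)  # remove from disk and database
            files_oldest_first.remove(file_id)
```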

transaction.execute(delete_statement)

file_dir = self._data_files_directory.joinpath(file_id)
if file_dir:
Contributor commented:

What's this if file_dir intended to do? It looks like it will only check if the stringification of the path is != "", not that the path actually exists on the filesystem?

Comment on lines +24 to +25
# It feels wasteful to collect usage info of up to 50 files
# even when there's no need for deletion
Contributor commented:

Yeah. Another thing that we should keep in mind is that this AutoDeleter pattern is not safe in the face of concurrent requests. I think in the future we should look for ways to internalize this logic within the stores, which are in a better position to do things transactionally and which already have some of this information available.

raise FileInUseError(
data_file_id=file_id,
message=f"Cannot remove file {file_id} as it is being used in"
f" existing{analysis_usage_text or ''}{conjunction}{runs_usage_text or ''}.",
Contributor commented:

Stylistically, it might be cleaner to do this kind of stringification inside FileInUseError()'s __init__().
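The suggestion, sketched out (the keyword arguments are assumptions inferred from the snippet above, and this assumes at least one usage text is non-empty):

```python
class FileInUseError(Exception):
    """Raised when a data file cannot be removed because it is referenced."""

    def __init__(
        self,
        data_file_id: str,
        analysis_usage_text: str = "",
        runs_usage_text: str = "",
    ) -> None:
        self.data_file_id = data_file_id
        # Join the two usage descriptions only when both are present.
        conjunction = " and " if analysis_usage_text and runs_usage_text else ""
        super().__init__(
            f"Cannot remove file {data_file_id} as it is being used in"
            f" existing {analysis_usage_text}{conjunction}{runs_usage_text}."
        )
```

The call site then shrinks to `raise FileInUseError(data_file_id=file_id, ...)`, keeping message formatting in one place.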

@jbleon95 (Contributor) left a comment

Straightforward and looking good.

async def make_room_for_new_file(self) -> None:
"""Delete old data files to make room for a new one."""
# It feels wasteful to collect usage info of up to 50 files
# even when there's no need for deletion
Contributor commented:

Maybe we can just check how many actual data files there are rather than the more complicated usage request? Not necessary for now and I don't know how much more efficient it'll be (especially once you're at 50 files and we're gonna have to do this every time), but a thought.

@sanni-t (Member, Author) commented

The usage check is necessary because deleting a used file would cause a cascade of errors starting from the foreign keys in the analysis/runs tables. A solution would probably involve internalizing the file deletion in the store itself, as @SyntaxColoring suggested, so that we can keep track of files as they are created/deleted and handle auto-deletion much more efficiently.

@sanni-t sanni-t merged commit dceee85 into edge Aug 2, 2024
7 checks passed
@sanni-t sanni-t deleted the AUTH-467-auto-delete-old-csv-files branch August 7, 2024 15:44