feat: unify data directory cli #1469

morph-dev · 2024-09-20T13:55:44Z

What was wrong?

Currently, trin-execution binary doesn't support setting custom directory and trin supports it only via env variable. Also the code initializes directory is split.

How was it fixed?

Added data-dir command line flag to both trin and trin-execution, and extracted common logic into trin-utils crate.

if neither data-dir nor ephemeral flags are set, we use OS's default local data directory (same as now)
if only data-dir is set, we use that folder instead
if only ephemeral is set, we use OS's temporary directory, and create trin/trin-execution subfolder inside (same as now)
if both flags are set, it behaves the same as with ephemeral, but it uses provided directory instead of OS's temp directory

In the case of trin, I left an option for env variable (TRIN_DATA_PATH) to be used. It behaves the same as --data-dir was set. If both are set, we will throw error.

Other smaller refactorings:

convert to use &Path instead of PathBuf in several existing functions
created create_temp_test_dir function that unifies how temporary directories are created for test purposes
- they are more use cases that I didn't catch in this PR, as this wasn't the primary purpose of this PR

To-Do

Clean up commit history and use conventional commits.

KolbyML

njgheorghita · 2024-09-20T18:57:24Z

trin-execution/tests/content_generation.rs

@@ -124,5 +124,6 @@ async fn test_we_can_generate_content_key_values_up_to_x() -> Result<()> {
        trin_execution.database.storage_cache.clear();
        println!("Block {block_number} finished. Total content key/value pairs: {content_pairs}");
    }
+    temp_directory.close()?;


I guess it's good practice to explicitly close this, but just so I understand, all the explicity closing you're doing isn't necessary correct? Since once temp_directory goes out of scope, the directory will be deleted

Yes, but...

It's important to ensure that handles (like File and ReadDir) to files inside the directory are dropped before the TempDir goes out of scope. The TempDir destructor will silently ignore any errors in deleting the directory; to instead handle errors call TempDir::close.

So calling close() helps us identify if we're doing anything to stop the directory deletion from working.

Also, I wonder if the way we exit trin will let the destructor work. I haven't tinkered in the exit code for a while, so it would be nice to double-check that the new ephemeral directory disappears, at least after a clean trin exit.

Also, I wonder if the way we exit trin will let the destructor work. I haven't tinkered in the exit code for a while, so it would be nice to double-check that the new ephemeral directory disappears, at least after a clean trin exit.

Currently (and after this pr), when running trin or trin-execution with "ephemeral" flag, the folder won't be deleted at exit. The way it works (at least in Linux) is that new directory will be created under /tmp/trin/ (/tmp/trin-execution), that OS will delete during next boot.

I actually like this behavior as it allows me to inspect database manually, or even restart the same node (e.g. using --data-dir), but still count on that data being deleted by the OS on next restart.

The way this works is that you can call TempDir::into_path that will destroy TempDir without deleting the directory. This is used both in trin and trin-execution. But also in couple of tests, where I think it shouldn't (it's easier to have utility function that will initialize the object you are testing, e.g. OverlayService, without having to keep track of TempDir).

I guess it's good practice to explicitly close this, but just so I understand, all the explicity closing you're doing isn't necessary correct? Since once temp_directory goes out of scope, the directory will be deleted

What @carver said, plus doing it explicitly means that you can't accidental call into_path on it (e.g. doing refactoring) that will result in not deleting it.

Ah right, I remember this debate about ephemeral, now. Yes, I see the value in being able to inspect. I also am slightly annoyed that running tests spams my /tmp, especially when running something like cargo watch -x 'test ...' which can run quite a few tests. (and I reboot my laptop ... irregularly 😆 ) But if I'm forced to choose, the inspection currently wins out, on balance.

I guess we could find some way to alter this behavior for just tests, but I'm not sure what that solution would look like, and if you don't see a quick way to do that, then I don't want to hold up the PR for it.

In my opinion, tests should delete their TempFile once they are done. But they need to be refactored to behave like that. I plan to look into this next week, during my 🦩 week.

And for completeness, as I believe I didn't explain this in details in the previous post.
If you have TempDir and you need PathBuf, you can do it in two ways:

temp_dir.into_path() - which will destroy TempDir and return PathBuf that you can use

this will not delete the directory at all

temp_dir.path().to_path_buf() - this returns &Path that is converted to PathBuf

this will delete temporary directory when temp_dir goes out of scope (or temp_dir.close() is called

in order for this to work correctly in test, you have to keep temp_dir as variable in the body of the test itself, otherwise directory might be deleted before test finishes (e.g. if temp_dir is used only in utility function that initializes everything)

njgheorghita · 2024-09-20T19:01:56Z

trin-execution/src/main.rs

-        false => None,
-        true => Some(setup_temp_dir()?),
-    };
+    let data_dir = setup_data_dir(


nitpick not sure if it's confusing that we accept an env var in trin but not here... Maybe it's worth adding a check just to see if the env var is set, and displaying a warning that it's not being used? Or maybe not...

Trin is using TRIN_DATA_PATH. Should we do a check for that one (which doesn't make much sense), or for TRIN_EXECUTION_DATA_PATH (which doesn't exist and is never mentioned)? Or should I come up with new, shared, name?

I'm more in favor of completely removing support for env vars, but @carver made an argument that he finds it useful, so I'm leaving it. And if we are leaving that one, I'm not against adding another one (e.g. TRIN_EXECUTION_DATA_PATH), and have the same behavior. Thoughts?

If we want to add an environment variable to trin-execution I think it should have a different variable name then for trin

shared, name

I would not be a fan of ^. Because you might be running both on the same system.

In any case Trin will likely be used as a library in both

portal bridge

Trin Execution

One day (not fully related, just sharing what I think)

Yeah, I'm also inclined more towards different name, if we want to add one.

carver

LGTM

morph-dev added cleanup Tidy up existing code or remove dead code refactoring Code refactoring labels Sep 20, 2024

morph-dev self-assigned this Sep 20, 2024

morph-dev requested review from carver, njgheorghita, KolbyML and ogenev September 20, 2024 14:55

morph-dev marked this pull request as ready for review September 20, 2024 14:57

KolbyML approved these changes Sep 20, 2024

View reviewed changes

njgheorghita approved these changes Sep 20, 2024

View reviewed changes

carver approved these changes Sep 21, 2024

View reviewed changes

feat: unify data directory cli

933b9f0

morph-dev force-pushed the data_dir_cli branch from 54e4036 to 933b9f0 Compare September 22, 2024 07:31

morph-dev merged commit 286adfb into ethereum:master Sep 22, 2024
9 checks passed

morph-dev deleted the data_dir_cli branch September 22, 2024 07:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: unify data directory cli #1469

feat: unify data directory cli #1469

morph-dev commented Sep 20, 2024 •

edited

Loading

KolbyML left a comment

njgheorghita Sep 20, 2024

carver Sep 21, 2024

morph-dev Sep 21, 2024

carver Sep 21, 2024

morph-dev Sep 21, 2024

njgheorghita Sep 20, 2024

morph-dev Sep 21, 2024

KolbyML Sep 21, 2024

morph-dev Sep 21, 2024

carver left a comment

feat: unify data directory cli #1469

feat: unify data directory cli #1469

Conversation

morph-dev commented Sep 20, 2024 • edited Loading

What was wrong?

How was it fixed?

To-Do

KolbyML left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carver left a comment

Choose a reason for hiding this comment

morph-dev commented Sep 20, 2024 •

edited

Loading