Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement]: Add list, remove, clear, and size operations to shuttle-persist #1066

Merged
merged 28 commits into from
Aug 17, 2023
Merged

[Improvement]: Add list, remove, clear, and size operations to shuttle-persist #1066

merged 28 commits into from
Aug 17, 2023

Conversation

sentinel1909
Copy link
Contributor

@sentinel1909 sentinel1909 commented Jul 2, 2023

Description of change

This PR adds list and remove operations to shuttle-persist, closes issue #1052.
Looking for feed back to see if I'm on the right track. I feel I'm not handling errors in quite the right way yet. The only way to learn is to practice and ask for feedback.

How has this been tested? (if applicable)

Created two unit tests, test_list and test_remove, to test the new functionality. All tests passed.

@jonaro00
Copy link
Member

jonaro00 commented Jul 3, 2023

  1. Use cargo fmt to get past that step in CI.
  2. You can refactor the error handling by following the pattern in the other methods. The key is map_err and the ? shorthand return.
  3. You can probably squeeze in the remove operation in this same PR. This feature is not in a rush, and it is nice if the updates are released together.

@sentinel1909
Copy link
Contributor Author

  1. Use cargo fmt to get past that step in CI.
  2. You can refactor the error handling by following the pattern in the other methods. The key is map_err and the ? shorthand return.
  3. You can probably squeeze in the remove operation in this same PR. This feature is not in a rush, and it is nice if the updates are released together.

I will be going back to the drawing board, as the saying goes. I'm having great difficulty with the ? operator and error conversion.

Will definitely work in the remove feature.

Glad there's no rush here because this is a meaty bone for me to chew on and it's going to take more time.

…le-persist-list

2023-07-03 update with shuttle-hq rep
@sentinel1909
Copy link
Contributor Author

One thing I forgot to ask, comments or no comments? I noticed there aren't any elsewhere around where I'm working.

@jonaro00
Copy link
Member

jonaro00 commented Jul 3, 2023

If they provide value/context about why the code does X when it can't be directly understood from reading the code, you can leave them in (and even add ones to the other methods). It is also good to use doc comments /// on public functions, but the function names are pretty self-explanatory in this case.
In summary, yes, I think some of your comments are too verbose.

@sentinel1909 sentinel1909 changed the title [Improvement]: Add list operation to shuttle-persist [Improvement]: Add list and remove operations to shuttle-persist Jul 3, 2023
@sentinel1909
Copy link
Contributor Author

Tests are a failin' where it matters...having a "but it worked on my machine" moment. :) Back to the head banging.

@jonaro00
Copy link
Member

jonaro00 commented Jul 3, 2023

It is starting to look really good now!

One thing that I would like to see it for the file name to be stripped of .bin when listing, so that (pseudo-code) remove(list()[0]) will work.

Did you also want to add clear()? It should not be much more than one line of code.

@sentinel1909
Copy link
Contributor Author

sentinel1909 commented Jul 3, 2023

Ooooo...challenge accepted :) The clear() method would nuke all the keys from orbit I assume?

@sentinel1909
Copy link
Contributor Author

Ooooo...challenge accepted :) The clear() method would nuke all the keys from orbit I assume?

Looks like fs::remove_dir_all will do the trick and is indeed a one-liner. fs::remove_dir won't cut it because it won't work if directory isn't empty.

@sentinel1909
Copy link
Contributor Author

One thing that I would like to see it for the file name to be stripped of .bin when listing, so that (pseudo-code) remove(list()[0]) will work.

Ok, clear() was easy and is done with a test. For the refinement to list, it should strip off the .bin from the file extension somewhere before going into the list vector that gets returned, correct? This would allow you to pass in a list with the index as you say to another method like remove. This will cause some headscratching but I'm gonna get it! :)

@sentinel1909
Copy link
Contributor Author

The remove method will accept a list_item parameter, which is a string. This could be the output from list(), called as you outlined in the commend above, i.e. remove(list()[0])

@jonaro00
Copy link
Member

jonaro00 commented Jul 4, 2023

The current state of the remove function will just iterate over each item and then delete the last index. I still think remove should just delete one key. What I meant with remove(list()[0]) is that remove() should be able to successfully delete any arbitrary key from the vector that list() returns (in my example, the 0'th). This was why I hinted about removing the .bin suffix.

@sentinel1909
Copy link
Contributor Author

The current state of the remove function will just iterate over each item and then delete the last index. I still think remove should just delete one key. What I meant with remove(list()[0]) is that remove() should be able to successfully delete any arbitrary key from the vector that list() returns (in my example, the 0'th). This was why I hinted about removing the .bin suffix.

Understood.

Copy link
Member

@jonaro00 jonaro00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are making good progress! 🥳 Here are some suggestions.

.into_os_string()
.into_string()
.unwrap_or_else(|_| "path contains non-UTF-8 characters".to_string())
.split(".bin")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting here works, but can break: if a key aaa.binbbb is saved, that would make the file aaa.binbbb.bin, which after this split+collect becomes aaabbb. I would say trimming the file extension from the end is better. You can check out std::path::Path::with_extension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struggling here. Not seeing how std:::path::Path::with_extension helps. I'm getting better at interpreting the docs content, but I have a ways to go. Will resume the head banging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake. The dir.path() method gives you a PathBuf, and there the method is called set_extension(). The idea is to modify the path of the dir entry before converting it to a string, making it more "properly" handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake. The dir.path() method gives you a PathBuf, and there the method is called set_extension(). The idea is to modify the path of the dir entry before converting it to a string, making it more "properly" handled.

Yes!! I was just looking around on StackOverflow and came to the same conclusion...dir.path() gives back a PathBuf! Not being clear on that was the missing piece. Unfortunately I have to return to work today, but I will continue to work this evening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have adjusted the list method to a good form now, I feel.

}

/// remove method deletes a key from the PersistInstance
pub fn remove(&self, key: String) -> Result<(), PersistError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since save and load takes the key as a &str, it makes sense that this also does. This means that you will need .as_str() in the test case.

Copy link
Contributor Author

@sentinel1909 sentinel1909 Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Waffled endlessly here. The compiler suggested it would be ok to borrow, in the test. I think you're right though, as_str() conveys intent better. Thanks for confirming that staying consistent with the rest is best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now using .as_str() in the assocated test case.


persist.save("test_list", "test_list").unwrap();

let result = vec!["shuttle_persist/test_list/test_list".to_string()];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the resulting list item really contain the full path like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, list returns the key name only.

assert_eq!(
clear_result.to_string(),
"failed to list contents of folder: No such file or directory (os error 2)"
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is unintuitive that calling list() on an empty persist store returns an error. If you also call create_dir_all in list() like save() does, you can make sure that the folder always exists when asked for, and an empty Vec can be returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now just removing the files instead of the entire folder, test adjusted accordingly.

@sentinel1909
Copy link
Contributor Author

I would like to embelish the test for the clear method to create multiple keys within the sample instance of persist, list the result (should contain multiple keys), then remove a random result. This doesn't seem to work. I only ever get one instance of persist with one key. Creating multiples (with a for loop) just overwrites over top of the same one. I will continue to think about this.

@sentinel1909 sentinel1909 changed the title [Improvement]: Add list and remove operations to shuttle-persist [Improvement]: Add list, remove, clear, and size operations to shuttle-persist Jul 15, 2023
@jonaro00
Copy link
Member

The implementation of size() surprised me, as the initial suggestion was

Nice! I think it would be interesting to have a .size() method, right? It could just count the amount of files in the folder

I was thinking this would simply be a shorthand for checking the len() of list(), but that is quite trivial for the end user. Knowing the persist's storage size on disk might be useful if the future introduces storage limits, but I don't see it being useful for a user yet. We can keep it as is if the doc comment clearly states what it does, so it is not confused with the above.


The randomness added in the tests are strange to me. You generally want to avoid randomness in tests. While the loops you constructed will seem to always work, the randomness adds no "proof" that it works.
Now, the number of keys N handled in those tests are between 1 and 20, but ensuring it works for 1 and 2 should be enough to be sure that it works for larger N.
One thing that the tests are missing is N=0 (aka does the persist store work as expected when empty?).
You could add N=0 checks before and after some of the tests, for example

persist.remove("abc") // assert this is an Err (or just unwrap_err)
persist.save("abc", "xyz")
persist.remove("abc") // assert this is an Ok (or just unwrap)
persist.remove("abc") // assert this is an Err (or just unwrap_err)

The same pattern can repeat where relevant, for example, checking if list() gives an empty list or an error when empty.

@sentinel1909
Copy link
Contributor Author

sentinel1909 commented Jul 17, 2023

I added a new() method, which looks like this:

pub fn new(name: &str) -> Self {
     PersistInstance { service_name: ServiceName::from_str(name).unwrap() }

The name gets passed in as a parameter when you call new() and then you get back your PersistInstance. I used this in the tests and it works nicely. Two things:

  1. Not sure if this is wanted, if I'm going off script, let me know and I'll change it back.
  2. The unwrap() worries me and I'm not sure how to propgate the error nicely with the ? operator.

I've thrown all the tests out and redid them. I haven't committed yet as I still need to address the N=0 checks as noted above. I'm also thinking the clear() method is wrong and far too verbose.

@jonaro00
Copy link
Member

I'd say we can wait with involving ServiceName since it looks like there are huge changes regarding those coming soon. (To propagate with ?, you would have made the new() function return Result<Self, ...>)

clear() could take the approach of just removing and re-creating the directory.

@sentinel1909
Copy link
Contributor Author

sentinel1909 commented Jul 24, 2023

Looks like I failed a couple of the integration tests for cargo-shuttle. Looked at the details, not sure how to resolve the issues.

Copy link
Member

@jonaro00 jonaro00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done! We are close to being finished now :)

@@ -12,3 +12,6 @@ bincode = "1.2.1"
serde = { version = "1.0.0", features = ["derive"] }
shuttle-service = { path = "../../service", version = "0.21.0" }
thiserror = "1.0.32"

[dev-dependencies]
rand = "0.8.5"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this anymore :)

.file_stem()
.unwrap_or_default()
.to_str()
.unwrap_or("file name contains non-UTF-8 characters")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error should propagate instead of being turned into a key name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is causing a bit of head bashing, but will continue to bash. I'm not seeing how to propagate right in this moment, but will get there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can take a shortcut and map the error to a ListFolder and propagate with ? like the statement before it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what I tried but it's not working. The .to_str() method gives off an Option, which I thought .ok_or() would help me convert into a Result<T, E> which I could then .map_err on, but nope. Will get back at it this evening.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, well then don't you already have a Result<&str, PersistError> after doing .ok_or(PersistError::ListFolder)? If so, you can do the ? immediately, then to_string().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmm...might just have to take my laptop with me to work today and continue to work on this at lunch :)

Copy link
Contributor Author

@sentinel1909 sentinel1909 Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I've got this worked out. I figured out how to use ok_or to convert the error related to the potential for invalid Unicode characters. Here's the revised list method:

/// list method returns a vector of strings containing all the keys associated with a PersistInstance
    pub fn list(&self) -> Result<Vec<String>, PersistError> {
        let storage_folder = self.get_storage_folder();

        let mut list = Vec::new();

        let entries = fs::read_dir(storage_folder).map_err(PersistError::ListFolder)?;
        for entry in entries {
            let key = entry.map_err(PersistError::ListFolder)?;
            let key_name = key
                .path()
                .file_stem()
                .unwrap_or_default()
                .to_str()
                .ok_or("the file name contains invalid characters").map_err(PersistError::ListName)?
                .to_string();
            list.push(key_name);
        }
        Ok(list)
    }

I did have to introduce a lifetime specifier on our PersistError enum, to satisfy the compiler:

#[derive(Error, Debug)]
pub enum PersistError <'a> {
    #[error("failed to open file: {0}")]
    Open(std::io::Error),
    #[error("failed to create folder: {0}")]
    CreateFolder(std::io::Error),
    #[error("failed to list contents of folder: {0}")]
    ListFolder(std::io::Error),
    #[error("failed to list the file name: {0}")]
    ListName(&'a str),
    #[error("failed to clear folder: {0}")]
    RemoveFolder(std::io::Error),
    #[error("failed to remove file: {0}")]
    RemoveFile(std::io::Error),
    #[error("failed to serialize data: {0}")]
    Serialize(BincodeError),
    #[error("failed to deserialize data: {0}")]
    Deserialize(BincodeError),
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that error does not need a test case, since it should realistically never happen. Since the keys (file names) are Strings when they are saved, they should be valid utf-8 when reading the directory as well.

#[test]
fn test_list_and_size() {
let persist = PersistInstance {
service_name: ServiceName::from_str("test1").unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are looking really good now! However, there is one case that is not yet covered, but it might simplify the code.

If list/load/remove etc are called before any key has been saved, they will fail due to the folder not existing yet. This can be avoided if we ensure the folder exists before any method is called.

My idea is to add a method new(service_name: ServiceName) -> Result<Self, PersistError> that creates the folder upfront. Creating the folder at this time also means it is not needed in save.
This would mean all constructions of PersistInstance { ... } change to PersistInstance::new(...) + .unwrap()/?. It also means that the error needs to map_err into a shuttle_service::Error (or other way of converting) in ResourceBuilder::output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a new method after all then...got it.

Copy link
Contributor Author

@sentinel1909 sentinel1909 Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doing a push as a checkpoint to show where I'm at. As a start, I've made a skeleton new method that instantiates a PersistInstance struct, given a ServiceName. I've substituted this in the tests and all works as before. I'm having difficulty now understanding how to get the folder creation incorporated. If I put a &self in as a parameter to new, how do I then instantiate in the tests? Just need a couple of breadcrumbs to set the path forward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Good point.
I think this is a nice workaround. Using the fact that we "have" the instance (self) in the new() function since we are creating it right there.

fn new(...) -> Result<...> {
    let instance = Self { ... };
    let directory = instance.get_storage_folder();
    fs::create_dir_all(...).map_err(...)?;

    Ok(instance)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also means that the error needs to map_err into a shuttle_service::Error (or other way of converting) in ResourceBuilder::output.

Having some difficulty now with this piece. I understand what you're saying but am not sure how to make it happen. I will reflect through the day today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, time to admit it. After sititng staring at the code for a fair bit of time, I have no clue how to convert the errors into a shuttle_service::Error in ResourceBuilder::output. I think I need a hint.

Copy link
Member

@jonaro00 jonaro00 Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works

// Get the PersistError in, let's say, a match statement, then
return Err(shuttle_service::Error::Custom(
    /*The PersistError*/.into(),
));

EDIT: based on the above I would imagine that something like this will work:

PersistInstance::new(...).map_err(|e| shuttle_service::Error::Custom(e.into()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I feel I've got the new method working. It will convert a PersistError into a shuttle_service::Error. I'm having difficulty with a test though. All I managed to do yesterday evening was test ServiceName, which panics if you pass it something invalid. This doesn't really test the new method erroring out because of an issue with create_dir_all. Also, the cargo-shuttle circleci check is still failing, and I'm not clear why (I have tried to read the log output) or how to resolve it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the create_dir_all is hard to test failures of. Perhaps creating a folder in a root-owned directory, but that might not be reproducible on all machines. I would say that we can trust it without a test.

The CI fail on cargo-shuttle init is a sporadic error (a hard one to fix :/ ), and not related to your changes.

Comment on lines 49 to 51
return Err(shuttle_service::Error::Custom(
PersistError::CreateFolder(e).into(),
))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought with converting to a shuttle_service::Error::Custom was to have that happen in ResourceBuilder::output() (since the trait requires a shuttle_service::Error type in that result). The new function can simply return this PersistError with map_err and ?, similar to all other errors in this struct. The construction of PersistInstance in output should then use ::new(...) and convert the error if it happens there (with this match clause or map_err).
(The code looks correct, it is just missing the usage of new() in the builder 😄)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Will sort it out this weekend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got it! Except for the failing cargo-shuttle circleci :(

@jonaro00
Copy link
Member

This is looking pretty complete now 🥳! Thanks for the consistent work and tolerating my annoying feedback 😄. I'll let the team know this is ready for review.

@sentinel1909
Copy link
Contributor Author

This is looking pretty complete now 🥳! Thanks for the consistent work and tolerating my annoying feedback 😄. I'll let the team know this is ready for review.

Thank you for your patience and guidance! Working this solidified a lot of things for me, it was extremely valuable. The code base is way less scary now :)

@sentinel1909
Copy link
Contributor Author

Wondering if there is any further work to be done here? Are upcoming changes going to make this obsolete and not necessary? Happy to refine this work further if needed.

@jonaro00 jonaro00 requested review from iulianbarbu and oddgrd August 16, 2023 18:54
@oddgrd
Copy link
Contributor

oddgrd commented Aug 16, 2023

Sorry for the delay, Jeff! This looks great, we just haven't been able to get to the final review yet. We'll get to it this week or early next week. 🙂

Copy link
Contributor

@oddgrd oddgrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, thanks for all your work on this @sentinel1909! 🥳 And thanks for the mentoring @jonaro00! I left a very minor comment about the doc comments, we may want to change them a bit.

Oh, and it would be great to have the persist docs in the docs repo updated with these new features as well. If you have time, if not we can also do it. 🙂

let file_path = self.get_storage_file(key);
let file = File::create(file_path).map_err(PersistError::Open)?;
let mut writer = BufWriter::new(file);
Ok(serialize_into(&mut writer, &struc).map_err(PersistError::Serialize))?
}

/// list method returns a vector of strings containing all the keys associated with a PersistInstance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should we capitalize these doc comments like we do elsewhere in the codebase? I also don't think we need "list method" as the first part of the comment.

Suggested change
/// list method returns a vector of strings containing all the keys associated with a PersistInstance
/// Returns a vector of strings containing all the keys associated with a [PersistInstance]

@jonaro00 jonaro00 merged commit 31dec11 into shuttle-hq:main Aug 17, 2023
@sentinel1909 sentinel1909 deleted the improvement-shuttle-persist-list branch August 26, 2023 02:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants