Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builder: add support for archiving special files on Unix #246

Merged
merged 4 commits into from
Jul 22, 2021

Conversation

liushuyu
Copy link
Contributor

@liushuyu liushuyu commented Apr 3, 2021

Added support for archiving character, block and FIFO files on Unix, this is useful when you want to take a whole system backup or re-archiving an extracted system image.

The file handling in this pull request isn't particularly graceful since I am not very experienced with the codebase, any suggestions are welcome!


This change is Reviewable

@liushuyu liushuyu force-pushed the master branch 2 times, most recently from af50eea to 1d5d533 Compare April 3, 2021 11:50
@liushuyu liushuyu marked this pull request as draft April 3, 2021 12:04
Copy link
Owner

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable enough to me, thanks!

Could this perhaps be refactored a bit though to avoid a few extra calls to fs::metadata being called? I think this could have an internal append function which takes a FileType from libstd and dispatches internally about how to append it? (e.g. more cases would be handled on Unix than on Windows, but regular files would all roughly be handled in the same place)

@liushuyu
Copy link
Contributor Author

Could this perhaps be refactored a bit though to avoid a few extra calls to fs::metadata being called? I think this could have an internal append function which takes a FileType from libstd and dispatches internally about how to append it? (e.g. more cases would be handled on Unix than on Windows, but regular files would all roughly be handled in the same place)

Sorry for the late reply. I am a bit confused by this suggestion: I saw the

let stat = fs::symlink_metadata(&src)?;
part so I assumed one more stat() should be fine here? https://github.com/alexcrichton/tar-rs/pull/246/files#diff-e4f794ca308aa712cbad31feb714df7cada1bbe453a30e232e45891571430263R607

Did you mean to refactor the append_dir_all function so that the stat could be avoided? Or should I do something about append_special function?

@alexcrichton
Copy link
Owner

I mostly just got the feeling reading this that it'd be better to refactor where at some point there's the equivalent of a match on the FileType (or similar) of what to archive, and right now it seems spread across a few locations with perhaps an extra metadata call here or there.

@liushuyu
Copy link
Contributor Author

I have made an attempt to reduce stat calls:

image

src/builder.rs Outdated
// In case of a symlink pointing to a directory, is_dir is false, but src.is_dir() will return true
if is_dir || (is_symlink && follow && src.is_dir()) {
for entry in fs::read_dir(&src)? {
let entry = entry?;
let file_type = entry.file_type()?;
stack.push((entry.path(), file_type.is_dir(), file_type.is_symlink()));
let metadata = entry.metadata()?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unfortunately has a side effect where on platforms like Unix this ends up doing a syscall whereas previously accessing file_type generally doesn't require a syscall on most major platforms. The intention was that this loop should primarily only do fs::read_dir as a sycall and otherwise other syscalls can be avoided.

My thinking about refactoring this was largely that this loop has one iteration of "look at the file type and delegate appropriately" but above in append_path_with_name it's basically the same loop. With the append_path_with_name addition it's getting more complicated so I was wondering if it would be possible to refactor this loop and append_path_with_name above to call a common function which is the one location where special files are handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unfortunately has a side effect where on platforms like Unix this ends up doing a syscall whereas previously accessing file_type generally doesn't require a syscall on most major platforms. The intention was that this loop should primarily only do fs::read_dir as a sycall and otherwise other syscalls can be avoided.

I have reverted the changes made to this loop.

My thinking about refactoring this was largely that this loop has one iteration of "look at the file type and delegate appropriately" but above in append_path_with_name it's basically the same loop. With the append_path_with_name addition it's getting more complicated so I was wondering if it would be possible to refactor this loop and append_path_with_name above to call a common function which is the one location where special files are handled.

I am not quite sure which one is better:

  1. Make a new delegate function append_special and rename the current append_special to append_special_unix. Then in the append_special function, use fs::metadata to get the metadata and pass it to append_special_unix. In which case, the loop needs to be modified to differentiate a regular file from a special one.
  2. Similar to (1) but instead of using fs::metadata in that function, fs::metadata is used in the "special file" branch of the loop. The advantage of this is one less stat when called from append_path_with_name (since there is one stat in that function as well).
  3. Similar to (1) but making append_special to accept Option<fs::metadata> and stat on-demand.

src/builder.rs Outdated Show resolved Hide resolved
src/builder.rs Outdated Show resolved Hide resolved
tests/all.rs Outdated Show resolved Hide resolved
Added support for archiving character, block and fifo files on Unix,
this is useful when you want to take a whole system backup or
re-archiving an extracted system image
* attempt to reduce `stat` calls
@liushuyu
Copy link
Contributor Author

liushuyu commented Jul 22, 2021

Ping? Is there any update for this Pull Request?

@alexcrichton
Copy link
Owner

Sorry I think this got lost in my inbox, but this looks good to me now!

@alexcrichton alexcrichton merged commit b9cffc8 into alexcrichton:master Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants