Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for fs::walk_dir #27707

Closed
alexcrichton opened this issue Aug 12, 2015 · 15 comments · Fixed by #30187
Closed

Tracking issue for fs::walk_dir #27707

alexcrichton opened this issue Aug 12, 2015 · 15 comments · Fixed by #30187
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Copy link
Member

This is a tracking issue for the fs_walk unstable feature in the standard library. Some of the open questions currently are:

  • There are many many ways to walk a directory, and there probably wants to be something like a WalkBuilder structure.
  • The current walking strategy holds open a lot of file descriptors - walk_dir uses too many file descriptors #23715
  • The default walking strategy has not been well audited for all manner of fs corner cases

The C++ <filesystem> implementation of walking directories is quite a useful reference in terms of what options we may want as well as what we haven't implemented. Note that this is a large enough feature that this may want to be prototyped outside the standard library first and then perhaps consider moving in at a later date.

@alexcrichton alexcrichton added A-io T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. labels Aug 12, 2015
@sfackler
Copy link
Member

I've found Java's Files::walk_file_tree really useful since it supports things like stopping iteration early and running code at different points in the iteration.

https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#walkFileTree-java.nio.file.Path-java.nio.file.FileVisitor-

@BurntSushi
Copy link
Member

I'd like to take the conn on this, and echo this thought:

Note that this is a large enough feature that this may want to be prototyped outside the standard library first and then perhaps consider moving in at a later date.

I've taken a brief survey other other implementations which range from every-feature-in-the-book (ftw in glibc) to very minimal (Go's filepath.Walk), so there's a lot of variation on what we can do. I'd like to start with the following goals:

  • Fix the "too many open fds" issue. I am particularly fond of making the common case fast and falling back to some kind of memory allocation if we exceed some preset (but configurable) limit on the number of open file descriptors. (I think this probably means switching to depth first search, since it "feels" like shallow but bigger directories are more common than very deeply nested directories. In particular, with depth first search, we can guarantee that at most one file descriptor is open for each depth.)
  • Provide some kind of facility for traversing symlinks (but this should be configurable and will require extra resources if we want to preserve the invariant that every item is yielded exactly once).
  • Provide control over iteration (e.g., "don't descend into this directory" or "skip the rest of this directory.")

Most of the interfaces I've seen use a user-provided function to control iteration or even response to errors. I feel like we should probably pursue the same functionality but with iterators if possible, although I'm not sure what it will look like yet. Possibly methods on the iterator itself? (Although this is a design that is not too common in the Rust world.)

How does this sound for a reasonable starting point?

@SimonSapin
Copy link
Contributor

This may also be worth looking at:

What’s New In Python 3.5: PEP 471 - os.scandir() function – a better and faster directory iterator

@huonw
Copy link
Member

huonw commented Sep 17, 2015

@SimonSapin fwiw, that PEP has apparently already been taken into account for the current design: #16236.

@SimonSapin
Copy link
Contributor

Never mind, then :)

@alexcrichton
Copy link
Member Author

@BurntSushi that all sounds great to me! I know that whenever I've wanted to use a fs walk-like function I've always wanted the option to say "don't descend into this directory" so it's certainly something I'd like to see!

This definitely seems like a good candidate for development outside the standard library as it can be a relatively meaty piece of functionality and in theory our fs bindings should give you all the support you need to build this.

@BurntSushi
Copy link
Member

I've finished my initial implementation: http://burntsushi.net/rustdoc/walkdir/

Overall, I'm pretty happy with it. It isn't clear to me whether it's destined for std. In particular, the crate's custom DirEntry type has grown a fair bit of complexity, mostly due to the handling of following symbolic links. Should I just let it bake on crates.io for now? Write an RFC? (I have acquired a fair amount of institutional knowledge that might be good to encode into an RFC, even if the intention is to postpone.) Do we deprecate std::fs::walk_dir?

@alexcrichton
Copy link
Member Author

I think the best path forward would be to maybe let it bake for a bit, move it to the nursery, see how it turns out, and then eventually write an RFC to become an official rust-lang crate, and perhaps finally move it into the standard library. I don't necessarily feel an urgency to move it into the standard library right now, and the fast iteration outside seems quite useful for now!

@BurntSushi
Copy link
Member

Sounds good to me!

@steveklabnik
Copy link
Member

👍

On Mon, Sep 28, 2015 at 6:56 AM, Andrew Gallant [email protected]
wrote:

Sounds good to me!


Reply to this email directly or view it on GitHub
#27707 (comment).

@aturon
Copy link
Member

aturon commented Nov 3, 2015

I wonder if we should go ahead and deprecate the in-tree version? Is there a path toward stabilization as-is?

Nominating for 1.6 discussion.

@sfackler
Copy link
Member

sfackler commented Nov 3, 2015

I'd vote to deprecate as well.

@BurntSushi
Copy link
Member

Deprecation sounds good. Stabilizing as-is definitely seems bad.

@alexcrichton
Copy link
Member Author

🔔 This issue is now entering its cycle-long final comment period for deprecation 🔔

@alexcrichton alexcrichton added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed I-nominated labels Nov 5, 2015
@alexcrichton
Copy link
Member Author

The libs team discussed this in triage today and the decision was to deprecate

alexcrichton added a commit to alexcrichton/rust that referenced this issue Dec 5, 2015
This commit is the standard API stabilization commit for the 1.6 release cycle.
The list of issues and APIs below have all been through their cycle-long FCP and
the libs team decisions are listed below

Stabilized APIs

* `Read::read_exact`
* `ErrorKind::UnexpectedEof` (renamed from `UnexpectedEOF`)
* libcore -- this was a bit of a nuanced stabilization, the crate itself is now
  marked as `#[stable]` and the methods appearing via traits for primitives like
  `char` and `str` are now also marked as stable. Note that the extension traits
  themeselves are marked as unstable as they're imported via the prelude. The
  `try!` macro was also moved from the standard library into libcore to have the
  same interface. Otherwise the functions all have copied stability from the
  standard library now.
* The `#![no_std]` attribute
* `fs::DirBuilder`
* `fs::DirBuilder::new`
* `fs::DirBuilder::recursive`
* `fs::DirBuilder::create`
* `os::unix::fs::DirBuilderExt`
* `os::unix::fs::DirBuilderExt::mode`
* `vec::Drain`
* `vec::Vec::drain`
* `string::Drain`
* `string::String::drain`
* `vec_deque::Drain`
* `vec_deque::VecDeque::drain`
* `collections::hash_map::Drain`
* `collections::hash_map::HashMap::drain`
* `collections::hash_set::Drain`
* `collections::hash_set::HashSet::drain`
* `collections::binary_heap::Drain`
* `collections::binary_heap::BinaryHeap::drain`
* `Vec::extend_from_slice` (renamed from `push_all`)
* `Mutex::get_mut`
* `Mutex::into_inner`
* `RwLock::get_mut`
* `RwLock::into_inner`
* `Iterator::min_by_key` (renamed from `min_by`)
* `Iterator::max_by_key` (renamed from `max_by`)

Deprecated APIs

* `ErrorKind::UnexpectedEOF` (renamed to `UnexpectedEof`)
* `OsString::from_bytes`
* `OsStr::to_cstring`
* `OsStr::to_bytes`
* `fs::walk_dir` and `fs::WalkDir`
* `path::Components::peek`
* `slice::bytes::MutableByteVector`
* `slice::bytes::copy_memory`
* `Vec::push_all` (renamed to `extend_from_slice`)
* `Duration::span`
* `IpAddr`
* `SocketAddr::ip`
* `Read::tee`
* `io::Tee`
* `Write::broadcast`
* `io::Broadcast`
* `Iterator::min_by` (renamed to `min_by_key`)
* `Iterator::max_by` (renamed to `max_by_key`)
* `net::lookup_addr`

New APIs (still unstable)

* `<[T]>::sort_by_key` (added to mirror `min_by_key`)

Closes rust-lang#27585
Closes rust-lang#27704
Closes rust-lang#27707
Closes rust-lang#27710
Closes rust-lang#27711
Closes rust-lang#27727
Closes rust-lang#27740
Closes rust-lang#27744
Closes rust-lang#27799
Closes rust-lang#27801
cc rust-lang#27801 (doesn't close as `Chars` is still unstable)
Closes rust-lang#28968
bors added a commit that referenced this issue Dec 6, 2015
This commit is the standard API stabilization commit for the 1.6 release cycle.
The list of issues and APIs below have all been through their cycle-long FCP and
the libs team decisions are listed below

Stabilized APIs

* `Read::read_exact`
* `ErrorKind::UnexpectedEof` (renamed from `UnexpectedEOF`)
* libcore -- this was a bit of a nuanced stabilization, the crate itself is now
  marked as `#[stable]` and the methods appearing via traits for primitives like
  `char` and `str` are now also marked as stable. Note that the extension traits
  themeselves are marked as unstable as they're imported via the prelude. The
  `try!` macro was also moved from the standard library into libcore to have the
  same interface. Otherwise the functions all have copied stability from the
  standard library now.
* `fs::DirBuilder`
* `fs::DirBuilder::new`
* `fs::DirBuilder::recursive`
* `fs::DirBuilder::create`
* `os::unix::fs::DirBuilderExt`
* `os::unix::fs::DirBuilderExt::mode`
* `vec::Drain`
* `vec::Vec::drain`
* `string::Drain`
* `string::String::drain`
* `vec_deque::Drain`
* `vec_deque::VecDeque::drain`
* `collections::hash_map::Drain`
* `collections::hash_map::HashMap::drain`
* `collections::hash_set::Drain`
* `collections::hash_set::HashSet::drain`
* `collections::binary_heap::Drain`
* `collections::binary_heap::BinaryHeap::drain`
* `Vec::extend_from_slice` (renamed from `push_all`)
* `Mutex::get_mut`
* `Mutex::into_inner`
* `RwLock::get_mut`
* `RwLock::into_inner`
* `Iterator::min_by_key` (renamed from `min_by`)
* `Iterator::max_by_key` (renamed from `max_by`)

Deprecated APIs

* `ErrorKind::UnexpectedEOF` (renamed to `UnexpectedEof`)
* `OsString::from_bytes`
* `OsStr::to_cstring`
* `OsStr::to_bytes`
* `fs::walk_dir` and `fs::WalkDir`
* `path::Components::peek`
* `slice::bytes::MutableByteVector`
* `slice::bytes::copy_memory`
* `Vec::push_all` (renamed to `extend_from_slice`)
* `Duration::span`
* `IpAddr`
* `SocketAddr::ip`
* `Read::tee`
* `io::Tee`
* `Write::broadcast`
* `io::Broadcast`
* `Iterator::min_by` (renamed to `min_by_key`)
* `Iterator::max_by` (renamed to `max_by_key`)
* `net::lookup_addr`

New APIs (still unstable)

* `<[T]>::sort_by_key` (added to mirror `min_by_key`)

Closes #27585
Closes #27704
Closes #27707
Closes #27710
Closes #27711
Closes #27727
Closes #27740
Closes #27744
Closes #27799
Closes #27801
cc #27801 (doesn't close as `Chars` is still unstable)
Closes #28968
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants