path/filepath: add WalkDir (Walk using DirEntry) #42027
Change https://golang.org/cl/243916 mentions this issue: |
I've retracted #41974. Let's focus on this (WalkDir) as the replacement for Walk. What I like most about this API is that updating existing code requires almost no effort at all: change Walk to WalkDir, change FileInfo to DirEntry, maybe rename info to d, and in 90% of cases you're done and have a more efficient file system traversal that does the same thing as the original code. |
There is one detail that still bothers me that might be worth changing. Suppose you want to do a walk but ignore all testdata directories. In the WalkFunc you write:
That works fine. But behind the scenes, filepath.Walk already did a full directory read from testdata before calling the WalkFunc. Because the dir ended up being skipped, that ReadDir was entirely wasted effort. Often one reason to skip a directory is that it's big (like a cache). Doing a ReadDir on a big directory that you are going to skip is unfortunate. (It's nice that it's a ReadDir and not a Readdir, so you didn't spend tons of time calling Stat on every entry on Unix systems, but still, it's wasted effort.) One of the complications of #41974 was defining that both entry and exit from a directory appeared in the iteration; the equivalent here would be calling the WalkFunc twice for a directory: both before and after. We clearly do not want to do that. But I wonder if instead we should define that a directory read error (only) can result in a second call to the WalkFunc with the same path, to report the error. That is, to walk a directory, WalkDir does basically:
In addition to avoiding an expensive ReadDir that is not needed, this has the benefit of presenting the early children of a directory even if the directory read fails later in the directory. The current filepath.Walk throws away any children that were found when a read error also occurs. That's clearly a mistake, which would be good to fix in a new API but may be too subtle to fix in the existing API. And then the error is reported after the children that are available. The one downside of course is that the WalkFunc is called twice for a directory with a read error: once for the existence of the directory itself, which is error-free, and then again when the directory read fails. This seems like a clearer separation of concerns, but at the cost of two calls with the same path. Over on #41974 (comment), @ianlancetaylor wrote:
The "extra callback only for directory read error" I'm suggesting here is the equivalent to what Ian suggested, but for the callback API. It seems like a reasonable solution to me. What do other people think about doing this? |
I expect the vast majority of if err != nil {
log.Print(err)
return nil
} or equivalent, or if err != nil {
return err
} right at the beginning. For the latter functions, the second call for a For the former functions — the ones that just The only |
Using a second Adding Just to clarify, is the intent to add a potentially different |
@mpx, I think io/fs needs a walk API at the start, and if filepath.WalkDir exists, then fs.WalkDir should too. That doesn't preclude adding another one later, but it does make it unlikely without a really compelling case. |
That seems like an equally compelling argument for not including any official API (or starting it out in golang.org/x/ until its worth is proven). |
(I have no beef in this game, but...) (One of just a couple cases I have ever used |
I'm not convinced The opportunity to explore and provide a better API should be balanced against the benefit of having a standard library implementation immediately. If the latter wins, the current proposal seems relatively safe since it's close to the existing approach. |
To me, the ability to easily convert callers using
|
@mpx, I hear you, but as I noted above, I disagree. io/fs should be as capable as the existing library routines. |
Based on the discussion above, this (including the extra callback for reporting directory read errors) seems like a likely accept. |
I agree w/@bcmills argument for supporting easy migration of existing code - this does seem like the best approach for now. Better APIs can still be provided elsewhere, and potentially considered for the standard library if there is enough benefit. |
Change https://golang.org/cl/266240 mentions this issue: |
I have a question here that may not be related to this proposal: why do we need the walk order to be deterministic? It is unfortunate for a large directory. |
A lot of existing code benefits from assuming paths are walked in lexicographical order. The proposed (sorted) API will continue to simplify new usage, and support easier migration of old code. Ideally, it would be good to support some different tradeoffs, e.g. conserving memory vs. file descriptors, or performance vs. ease of use. However, this isn't practical with the proposed API since it would need to be parameterised. It will be easy enough to create a different implementation/API supporting a different set of tradeoffs when it matters. For example, some code will see significant performance improvements from keeping file descriptors open and processing in DirEntry order. Maybe one of these APIs might be clean enough and desirable enough to add to the standard library in future. |
No change in consensus, so accepted. |
Change https://golang.org/cl/267719 mentions this issue: |
Reopening because CL 266240 was reverted in CL 267798; it needs to be re-sent. |
Change https://golang.org/cl/267887 mentions this issue: |
This commit is a copy of filepath.WalkDir adapted to use fs.FS instead of the native OS file system. It is the last implementation piece of the io/fs proposal.
The original io/fs proposal was to adopt filepath.Walk, but we have since introduced the more efficient filepath.WalkDir (#42027), so this CL adopts that more efficient option instead.
(The changes in path/filepath bring the two copies more in line with each other. The main change is unembedding the field in statDirEntry, so that the fs.DirEntry passed to the WalkDirFunc for the root of the tree does not have any extra methods.)
For #41190.
Change-Id: I9359dfcc110338c0ec64535f22cafb38d0b613a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/243916
Trust: Russ Cox <[email protected]>
Run-TryBot: Russ Cox <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Rob Pike <[email protected]>
Now that filepath.WalkDir is available, it is more efficient and should be used in place of filepath.Walk. Update the tree to reflect best practices.
As usual, the code compiled with Go 1.4 during bootstrap is excluded. (In this CL, that's only cmd/dist.)
For #42027.
Change-Id: Ib0f7b1e43e50b789052f9835a63ced701d8c411c
Reviewed-on: https://go-review.googlesource.com/c/go/+/267719
Trust: Russ Cox <[email protected]>
Run-TryBot: Russ Cox <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Rob Pike <[email protected]>
Change https://golang.org/cl/285595 mentions this issue: |
For #40700
For #42027
Change-Id: Ifb73050dfdab21784fa52d758ad9c408e6489684
Reviewed-on: https://go-review.googlesource.com/c/go/+/285595
Trust: Ian Lance Taylor <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
Go 1.16 adds a more efficient routine to walk the filesystem. See golang/go#42027 This speeds up traversal of a directory with 70K files from ~1.3 seconds to ~0.9 seconds.
There are a few annoyances with filepath.Walk, but the biggest problem is that it is needlessly inefficient. The new ReadDir API (#41467) provides a way to avoid the inefficiency in the implementation, but that must be paired with a new API that does not offer a FileInfo to the callback, since obtaining the FileInfo is the expensive part.
#41974 proposes a new API with an iterator object. That may or may not be a good idea.
If that one doesn't work out, here's a smaller change: add WalkDir that replaces FileInfo with DirEntry but otherwise behaves exactly the same as Walk:
The only changes here are s/Walk/WalkDir/g and s/FileInfo/DirEntry/g.