Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs.Dir.deleteTree optimizations and recursive implementation #13070

Closed
wants to merge 3 commits into from

Conversation

squeek502
Copy link
Collaborator

@squeek502 squeek502 commented Oct 5, 2022

Opening this as a draft for feedback, as there are open questions here:

  • Is a recursive implementation okay? An alternative with the same/similar performance could use allocation instead of recursion.
  • Should the non-recursive/non-allocating version be maintained? If so, which should be the default? How should they be named?

EDIT: See #13070 (comment).


See #13048 for more context. All benchmarking is done with the same folder structure mentioned here:

tested with the folder that resulted from npx create-react-app my-app; 41209 files, 344MiB

There are two separate/unrelated parts to this:


First, this eliminates some redundant deleteFile calls from the current deleteTree implementation while not changing any functionality (see #13048 (comment) and the commit message in the list below for more info). This leads to a decent reduction in syscalls but a pretty insignificant decrease in wallclock time.

Before:

$ strace -c ./rm-master my-app-copy
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.82    0.580480           8     65368     24159 unlinkat
 33.35    0.404906          13     29036           getdents64
  7.55    0.091704           3     24159           close
  6.73    0.081669           3     24159           openat
  4.55    0.055245           2     24159           lseek
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    1.214004                166884     24159 total

After:

$ strace -c ./rm-updated my-app-copy
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.90    0.496441          12     41210         1 unlinkat
 37.58    0.406455          13     29036           getdents64
  8.60    0.093017           3     24159           close
  7.93    0.085720           3     24159           openat
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    1.081633                118567         1 total

(note also that the lseek syscalls are also gone; this is via the introduction of IterableDir.iterateAssumeFirstIteration which the recursive implementation also takes advantage of)

Benchmark results:

$ hyperfine "./rm-master my-app-copy" "./rm-updated my-app-copy" --prepare="cp -r my-app my-app-copy" --warmup 1 --runs 20
Benchmark #1: ./rm-master my-app-copy
  Time (mean ± σ):     695.0 ms ±  11.9 ms    [User: 28.7 ms, System: 654.1 ms]
  Range (min … max):   678.5 ms … 718.0 ms    20 runs
 
Benchmark #2: ./rm-updated my-app-copy
  Time (mean ± σ):     683.6 ms ±  28.2 ms    [User: 26.1 ms, System: 639.1 ms]
  Range (min … max):   662.6 ms … 793.6 ms    20 runs
 
Summary
  './rm-updated my-app-copy' ran
    1.02 ± 0.05 times faster than './rm-master my-app-copy'

Second, this adds a more performant version of deleteTree that uses recursion. This version ends up beating rm -rf on my system.

Recursive version:

$ strace -c ./rm-recursive my-app-copy
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 80.75    0.447671          10     41210         1 unlinkat
  8.07    0.044726           4      9744           getdents64
  7.73    0.042867           8      4863           close
  3.45    0.019143           3      4863           openat
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    0.554407                 60683         1 total

(note that without IterableDir.iterateAssumeFirstIteration, there would also be 4863 lseek calls here but they also wouldn't affect performance too much [strace says they are 2% of the time taken in the version with the lseek calls])

rm -rf:

$ strace -c rm -rf my-app-copy
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.52    0.499861          12     41209           unlinkat
  9.36    0.069262           4     14585           getdents64
  7.39    0.054712           2     24303           fcntl
  5.64    0.041753           2     14687           close
  4.64    0.034350           3      9825           openat
  3.25    0.024067           2      9825           fstat
  2.20    0.016287           3      4863           newfstatat
... (syscalls with tiny amounts omitted)
------ ----------- ----------- --------- --------- ----------------
100.00    0.740348                119333         3 total

Benchmark results:

$ hyperfine "./rm-recursive my-app-copy" "./rm-updated my-app-copy" "./rm-master my-app-copy" "rm -rf my-app-copy" --prepare="cp -r my-app my-app-copy" --warmup 1 --runs 20
Benchmark #1: ./rm-recursive my-app-copy
  Time (mean ± σ):     380.2 ms ±   7.0 ms    [User: 18.3 ms, System: 354.7 ms]
  Range (min … max):   365.7 ms … 395.9 ms    20 runs
 
Benchmark #2: ./rm-updated my-app-copy
  Time (mean ± σ):     675.3 ms ±  14.0 ms    [User: 30.6 ms, System: 631.4 ms]
  Range (min … max):   652.8 ms … 710.1 ms    20 runs
 
Benchmark #3: ./rm-master my-app-copy
  Time (mean ± σ):     687.5 ms ±   9.2 ms    [User: 29.2 ms, System: 648.7 ms]
  Range (min … max):   671.7 ms … 706.1 ms    20 runs
 
Benchmark #4: rm -rf my-app-copy
  Time (mean ± σ):     405.8 ms ±  22.3 ms    [User: 18.6 ms, System: 374.3 ms]
  Range (min … max):   386.8 ms … 494.2 ms    20 runs

Summary
  './rm-recursive my-app-copy' ran
    1.07 ± 0.06 times faster than 'rm -rf my-app-copy'
    1.78 ± 0.05 times faster than './rm-updated my-app-copy'
    1.81 ± 0.04 times faster than './rm-master my-app-copy'

This allows for avoiding an unnecessary lseek (or equivalent) call in places where it can be known that the fd has not had its cursor modified yet.
There are two parts to this:

1. The deleteFile call on the sub_path has been moved outside the loop, since if the first call fails with `IsDir` then it's very likely that all the subsequent calls will do the same. Instead, if the `openIterableDir` call ever hits `NotDir` after the `deleteFile` hit `IsDir`, then we assume that the tree was deleted at some point and can consider the deleteTree a success.

2. Inside the `dir_it.next()` loop, we look at entry.kind and only try doing the relevant (deleteFile/openIterableDir) operation, but always fall back to the other if we get the relevant error (NotDir/IsDir).
@andrewrk
Copy link
Member

andrewrk commented Oct 5, 2022

  • Is a recursive implementation okay? An alternative with the same/similar performance could use allocation instead of recursion.

I'd like to fully exhaust the non-recursive, non-heap-allocating implementation options before resorting to this. Could you try a version that uses a fixed-size stack buffer (and resorts to redundant syscall operations in the case that the buffer is exhausted)?

  • Should the non-recursive/non-allocating version be maintained? If so, which should be the default? How should they be named?

I think we should have only one implementation in the standard library. Hopefully it can accomplish all the characteristics we want:

  • Equivalent or better perf of rm -rf.
  • No possibility of stack overflow due to "user input" of a deeply nested file system tree
  • No allocator parameter required.

If it's not possible to satisfy all these requirements then we can revisit the question, but I'm not convinced that will be necessary.

@squeek502
Copy link
Collaborator Author

squeek502 commented Oct 5, 2022

That makes total sense, submitted that as #13073.

@squeek502 squeek502 closed this Oct 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants