Reproducible benchmarks #36

Closed
sharkdp opened this issue Jun 9, 2017 · 15 comments

Comments

@sharkdp
Owner

sharkdp commented Jun 9, 2017

For example:

  • Generate a lot of dummy files / directories in a controlled way by a shell script and run specific benchmarks within this test folder.
  • Clone a certain (big) repository and run benchmarks within this repository (this is not 100% reproducible).
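A minimal sketch of the first idea, written in Python rather than as a shell script. The function name `make_tree`, its parameters, and the file naming scheme are all hypothetical; the only point is that a fixed seed yields the same tree on every run, so benchmark results stay comparable.

```python
import os
import random


def make_tree(root, depth=3, dirs_per_level=3, files_per_dir=5, seed=0):
    """Create a deterministic directory tree under `root`; return the file count."""
    rng = random.Random(seed)  # fixed seed: identical names on every run
    extensions = [".txt", ".rs", ".md", ".log"]  # arbitrary choice for illustration
    created = 0
    os.makedirs(root, exist_ok=True)
    level = [root]
    for _ in range(depth):
        next_level = []
        for parent in level:
            # Fill the current directory with small dummy files.
            for f in range(files_per_dir):
                name = f"file_{f}_{rng.randrange(10**6):06d}{rng.choice(extensions)}"
                with open(os.path.join(parent, name), "w") as handle:
                    handle.write("x")
                created += 1
            # Create the subdirectories that form the next level.
            for d in range(dirs_per_level):
                child = os.path.join(parent, f"dir_{d}")
                os.makedirs(child, exist_ok=True)
                next_level.append(child)
        level = next_level
    return created
```

Calling `make_tree("bench-tree", depth=4, dirs_per_level=4, files_per_dir=20)` would give a larger tree; the directories of the deepest level are left empty in this sketch.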

sharkdp added the idea label Jun 9, 2017
@ghost

ghost commented Jun 10, 2017

As per #30 you might also want to bench against network filesystems like NFS and SMB. And also against sshfs and other FUSE filesystems.

@lnicola

lnicola commented Sep 18, 2017

One thing I noticed is that fd calls stat three times (and lstat once) for each file.

@sharkdp
Owner Author

sharkdp commented Sep 18, 2017

> One thing I noticed is that fd calls stat three times (and lstat once) for each file.

Good catch!! This should be fixed on master (via adaf4f6), i.e. fd should perform only one lstat and one stat call.

Note: those syscalls are only necessary when colored output is active (in order to check if the entries are symlinks, directories, executables or normal files).
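To illustrate why colored output needs these calls, here is a rough Python sketch of classifying an entry with one lstat and one stat. This is not fd's actual code (fd is written in Rust); the function name `classify` and the category names are assumptions made for the example.

```python
import os
import stat


def classify(path):
    """Return 'symlink', 'directory', 'executable' or 'file' for `path`."""
    link_info = os.lstat(path)  # lstat does not follow symlinks
    if stat.S_ISLNK(link_info.st_mode):
        return "symlink"
    info = os.stat(path)  # stat follows symlinks
    if stat.S_ISDIR(info.st_mode):
        return "directory"
    # Any execute bit (user, group or other) counts as executable here.
    if info.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return "executable"
    return "file"
```

Without colored output, none of this metadata is needed to print a path, which is why the calls can be skipped in that case.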

@lnicola

lnicola commented Sep 18, 2017

I think it does an lstat via walkdir and one stat per path component in print_entry.

But I'm not sure that's a good idea. Don't we care only about the leaf component? Presumably, non-leaf ones are directories and we only want to check the type of the file.

@sharkdp
Owner Author

sharkdp commented Sep 18, 2017

> Don't we care only about the leaf component?

Well, currently fd can do this (i.e. colorize "symlink1" as a symlink):

[Screenshot: fd output in which "symlink1" is colorized as a symlink]
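The per-component check discussed here can be sketched as follows. This is only an illustration in Python, not fd's print_entry; the helper name `symlink_components` is hypothetical. It inspects every prefix of a path, which costs one metadata lookup per component but makes it possible to highlight a symlinked directory in the middle of a path.

```python
from pathlib import Path


def symlink_components(path):
    """Return every prefix of `path` (including `path` itself) that is a symlink."""
    p = Path(path)
    # p.parents runs from the nearest parent up to the root, so reverse it
    # to walk the path from left to right.
    prefixes = list(reversed(p.parents)) + [p]
    return [str(q) for q in prefixes if q.is_symlink()]
```

For a path such as `symlink1/file.txt`, where `symlink1` points to a directory, this would return `["symlink1"]`, whereas a leaf-only check would report nothing.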

@lnicola

lnicola commented Sep 18, 2017

Yeah, that might be useful, I suppose. I wasn't sure what fd is supposed to do with symlinks:

grayshade@ubik ~/foo $ ll
total 1.0K
drwxr-xr-x 2 grayshade grayshade 2 Sep 18 22:32 bar
lrwxrwxrwx 1 grayshade grayshade 3 Sep 18 22:32 baz -> bar
grayshade@ubik ~/foo $ fd . baz
bar

Anyway, this isn't really related to the issue at hand, so sorry for spamming it.

@sharkdp
Owner Author

sharkdp commented Oct 7, 2017

If somebody has any ideas on how to proceed on this, I'd love to hear them!

Here are the two benchmark scripts that I have so far:

https://gist.github.com/sharkdp/4bc3e5f5ea9df2f29c02ede50634b16a

@avently

avently commented Oct 13, 2017

On my system, fd is slower than find. Timings after multiple runs:
fd -HI -c never zsh //2.3-2.4 sec
find . -name "zsh" //1.9 sec

@sharkdp
Owner Author

sharkdp commented Oct 13, 2017

@avently Thank you for the feedback! Could you please give a little bit of background information?

  • Which version of fd are you using (fd -V)?
  • How many cores does your system have (nproc)?
  • Which OS?
  • Anything specific about the directory that you perform the search in (e.g.: what kind of physical device is it placed on)?

@avently

avently commented Oct 13, 2017

@sharkdp
4.0.0
2 cores and 4 threads (Intel i3)
Manjaro (Arch-based)
home directory (~/), ext4 mounted from the same HDD as the system, 630000 files

@sharkdp
Owner Author

sharkdp commented Oct 13, 2017

@avently Thanks!

I just now looked at your command more closely. Note that these two commands do something quite different:

  • find . -name "zsh" will look for files whose name is exactly "zsh" (case-sensitive)

  • fd -HI -c never zsh will look for files whose filename matches the regular expression "zsh" (anywhere, case-insensitive).

To have a fair comparison, you should consider running either find . -iregex '.*zsh.*' or fd -HI -s -c never '^zsh$'.
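The difference between the matching modes can be shown on a handful of file names. This is a simplified Python sketch: the sample names are made up, Python's `re` and `fnmatch` stand in for the matchers used by fd and find, and only bare file names are tested (find's regex tests apply to the whole path, which the sketch ignores).

```python
import fnmatch
import re

# Hypothetical file names, chosen to show the differences.
names = ["zsh", "ZSH", ".zshrc", "zsh-completions", "bash"]

# Like `find . -name "zsh"`: case-sensitive glob that must match the whole name.
exact = [n for n in names if fnmatch.fnmatchcase(n, "zsh")]

# Like `fd zsh` as described above: regex search anywhere, case-insensitive.
substring = [n for n in names if re.search("zsh", n, re.IGNORECASE)]

# Like `fd -s '^zsh$'`: anchored regex, case-sensitive.
anchored = [n for n in names if re.search(r"^zsh$", n)]

print(exact)      # ['zsh']
print(substring)  # ['zsh', 'ZSH', '.zshrc', 'zsh-completions']
print(anchored)   # ['zsh']
```

The two commands only do comparable work when both use the same kind of pattern, which is why the benchmark should pair either the two substring searches or the two exact searches.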

If you are interested, you could also try experimenting with different numbers of threads via the -j n/--threads n option. Default should be 4 for you.
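For experiments like this, a small timing harness helps keep the numbers comparable. The following Python sketch is an assumption about how one might do it, not one of the benchmark scripts from the gist; the names `time_command` and `summarize` are hypothetical. It runs a command a few times after a warm-up run (so the disk cache is in a similar state each time) and reports wall-clock times.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5, warmup=1):
    """Run `cmd` (a list of arguments) repeatedly; return wall-clock seconds per run."""
    for _ in range(warmup):
        # Warm-up runs are not timed; they mainly populate the file system cache.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return the fastest and the average run."""
    return {"min": min(timings), "mean": statistics.mean(timings)}
```

To compare thread counts, one could call `time_command(["fd", "-HI", "-c", "never", "-j", str(n), "zsh"])` for several values of `n` and compare the summaries with that of the matching find command.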

@avently

avently commented Oct 13, 2017

@sharkdp The command I used for the test was
find . -name "*zsh*"
I just wrote the wrong one here.
As I understand it, that is the same as
fd -HI -c never zsh
but case-sensitive.

@sharkdp
Owner Author

sharkdp commented Oct 13, 2017

Oh I see, Markdown was interpreting *zsh* as italic zsh 😄

In this case, they are the same (except for case-sensitivity). But note that fd does regex-by-default.

@avently

avently commented Oct 13, 2017

@sharkdp fd uses 344% CPU and find uses 96% CPU for those queries.
But if I use a regex search:
find . -iregex '.*zsh.*'
system 99% cpu 5,436 total

fd is the winner in the regex search.

@sharkdp
Owner Author

sharkdp commented Jan 24, 2018

I'm going to close this, as we have this now: https://github.com/sharkdp/fd-benchmarks
