-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify mkdir_p
(and make it faster in some cases)
#60
Conversation
Doing it this way is simpler and it doesn't end up adding "/" to the list of folders, so it doesn't need to be removed later.
I think it's debatable which is the most common usage of `FileUtils.mkdir_p`, but even assuming the most common use case is creating a folder when it doesn't previously exist but the parent does, this optimization doesn't seem to have a noticiable effect there while harming other use cases. For benchmarks, I created this script ```ruby require "benchmark/ips" Benchmark.ips do |x| x.report("old mkdir_p - exists") do FileUtils.mkdir_p "/tmp" end x.report("new_mkdir_p - exists") do FileUtils.mkdir_p_new "/tmp" end x.compare! end FileUtils.rm_rf "/tmp/foo" Benchmark.ips do |x| x.report("old mkdir_p - doesnt exist, parent exists") do FileUtils.mkdir_p "/tmp/foo" FileUtils.rm_rf "/tmp/foo" end x.report("new_mkdir_p - doesnt exist, parent exists") do FileUtils.mkdir_p_new "/tmp/foo" FileUtils.rm_rf "/tmp/foo" end x.compare! end Benchmark.ips do |x| x.report("old mkdir_p - doesnt exist, parent either") do FileUtils.mkdir_p "/tmp/foo/bar" FileUtils.rm_rf "/tmp/foo" end x.report("new_mkdir_p - doesnt exist, parent either") do FileUtils.mkdir_p_new "/tmp/foo/bar" FileUtils.rm_rf "/tmp/foo" end x.compare! end Benchmark.ips do |x| x.report("old mkdir_p - more levels") do FileUtils.mkdir_p "/tmp/foo/bar/baz" FileUtils.rm_rf "/tmp/foo" end x.report("new_mkdir_p - more levels") do FileUtils.mkdir_p_new "/tmp/foo/bar/baz" FileUtils.rm_rf "/tmp/foo" end x.compare! end ``` and copied the method with the "optimization" removed as `FileUtils.mkdir_p_new`. The results are as below: ``` Warming up -------------------------------------- old mkdir_p - exists 15.914k i/100ms new_mkdir_p - exists 46.512k i/100ms Calculating ------------------------------------- old mkdir_p - exists 161.461k (± 3.2%) i/s - 811.614k in 5.032315s new_mkdir_p - exists 468.192k (± 2.9%) i/s - 2.372M in 5.071225s Comparison: new_mkdir_p - exists: 468192.1 i/s old mkdir_p - exists: 161461.0 i/s - 2.90x (± 0.00) slower Warming up -------------------------------------- old mkdir_p - doesnt exist, parent exists 2.142k i/100ms new_mkdir_p - doesnt exist, parent exists 1.961k i/100ms Calculating ------------------------------------- old mkdir_p - doesnt exist, parent exists 21.242k (± 6.7%) i/s - 107.100k in 5.069206s new_mkdir_p - doesnt exist, parent exists 19.682k (± 4.2%) i/s - 100.011k in 5.091961s Comparison: old mkdir_p - doesnt exist, parent exists: 21241.7 i/s new_mkdir_p - doesnt exist, parent exists: 19681.7 i/s - same-ish: difference falls within error Warming up -------------------------------------- old mkdir_p - doesnt exist, parent either 945.000 i/100ms new_mkdir_p - doesnt exist, parent either 1.002k i/100ms Calculating ------------------------------------- old mkdir_p - doesnt exist, parent either 9.689k (± 4.4%) i/s - 49.140k in 5.084342s new_mkdir_p - doesnt exist, parent either 10.806k (± 4.6%) i/s - 54.108k in 5.020714s Comparison: new_mkdir_p - doesnt exist, parent either: 10806.3 i/s old mkdir_p - doesnt exist, parent either: 9689.3 i/s - 1.12x (± 0.00) slower Warming up -------------------------------------- old mkdir_p - more levels 702.000 i/100ms new_mkdir_p - more levels 775.000 i/100ms Calculating ------------------------------------- old mkdir_p - more levels 7.046k (± 3.5%) i/s - 35.802k in 5.087548s new_mkdir_p - more levels 7.685k (± 5.5%) i/s - 38.750k in 5.061351s Comparison: new_mkdir_p - more levels: 7685.1 i/s old mkdir_p - more levels: 7046.4 i/s - same-ish: difference falls within error ``` I think it's better to keep the code simpler is the optimization is not so clear like in this case.
I also wanted to remove all code using |
Hei @jeremyevans, I noticed that the code I'm modifying in the first commit here is relatively new. I think my solution doesn't change your idea in #53, just makes the code simpler. Let me know what you think, in case you have some time. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach. It makes the code simpler and shouldn't change the behavior, creating only the minimum number of directories necessary.
Thanks! |
@deivid-rodriguez Actually it changed the behavior unfortunately. https://bugs.ruby-lang.org/issues/18941 Could you check it out? |
It seems some Windows specific issue, right? I don't have much time right now but I will try to have a look! |
Oh, wait, I see the problem now. Let me fix it! |
So, I don't have a Windows machine right now to try this but this is my guess. Intuitively I think we could restore the previous behavior by checking until path == stack.last # dirname("/")=="/", dirname("C:/")=="C:/" However, that doesn't work for me:
Maybe that's because I'm not on Windows, but this check in So I'm not sure, I guess we need to try it on Windows. But in any case, I think reverting e842a0e would also fix the issue because it would always eagerly try to create the directory and generate the expected error in this case. |
There's two commits here:
The first one simplifies building the list of paths that need to be created. Previously the exit condition for the
until
loop was thatFile.dirname(path) == path
(root path), but that's unnecessary. It's enough to exit the loop once we find a directory that already exists while traversing the path upwards.The second one removes an optimization that doesn't seem to affect much the case it's meant to optimize (creating a folder that doesn't exist already under a folder that already exists), but does affect other use cases (for example, when the target folder already exists). To measure the effect of this commit, I copied the method with the optimization removed as
FileUtils.mkidr_p_new
, and run the following micro benchmark:Results were: