Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rimraf progress #13

Closed
boneskull opened this issue Jan 11, 2019 · 9 comments
Closed

rimraf progress #13

boneskull opened this issue Jan 11, 2019 · 9 comments

Comments

@boneskull
Copy link
Collaborator

We left off where @bengl mentioned he was interested in looking into the rimraf problem.. I believe we decided to go the route of a JS implementation because a performant C++ implementation would be painful.

@refack
Copy link

refack commented Jan 11, 2019

An interesting avenue is the new std::filesystem::remove_all. This still has two challenges for us:

  1. We need C++17 compatible compilers for our platforms - compilers: install GCC 8 /CLANG 7 (as V8 does) build#1543
    We could implement this gradually under an #ifdef slimier to certain fs.watch capabilities.
    For example the current Windows compiler already supports this.
    And we plan to have CentOS7 with devtoolset7 ready for node12 - CentOS 7 for Node 12+ build#1583
  2. remove_all is synchronous and non ACID, so at minimum it should be wrapped in uv_queue_work

@sam-github
Copy link

@boneskull and I had a brief conversation about this recently, he suggested we move our discussion here. He said:

A pure JS implementation mimicking userland could suffice for now, but we'd see a significant
performance gain if we could do it at the C++ level.

For the record, I am personally an advocate of npm i rimraf. I can see some advantage to nodejs adopting it's maintenance if @isaacs was into that, but I'm not sure why we would rewrite it.

I could be correct on this, but I think rimraf is multi-threaded, from a quick skim of its source, rmkids() fires multiple recursive calls in parallel to remove directory entries, and actual leaf files/dirs are removed with fs.unlink() and fs.rmdir() which execute in the threadpool.

While using a muti-threaded recursive rmdir in pure c/c++ could be more performant than rimraf, I don't know that https://en.cppreference.com/w/cpp/filesystem/remove will help, its docs don't indicate that it is multi-threaded, is it your understanding that it is, @refack? If its not, it might be slower than an implementation done with node APIs, as rimraf is.

Its risky to speculate on performance, but I'll do it anyhow :-). I'd expect recursive rmdir (for any folder size deep&large enough for performance to matter) to be I/O bound, so concurrency might help, but doing things in C/C++ rather than js, or avoiding C++ to js transitions, is probably not the bigger win.

@refack
Copy link

refack commented Feb 5, 2019

I don't know that en.cppreference.com/w/cpp/filesystem/remove will help, its docs don't indicate that it is multi-threaded, is it your understanding that it is, @refack? If its not, it might be slower than an implementation done with node APIs, as rimraf is.

I don't think it is multi-threaded, I'd assume it would be stated explicitly if it was. It's probably intentionally unspecified and left for implementers of HPC stdlibs.

IMO the benefits of embedding it into core would be uniformity and reduced code maintenance, so I'd be in favor of either adopting Isaac's rimraf or using std::remove_all. Not a big fan of reinventing the wheel either in JS or in C++...

@bengl
Copy link
Member

bengl commented Mar 19, 2019

It was discussed in the last meeting that including rimraf as-is (or, mostly as-is) is an approach worth taking, and IIRC (and based on the minutes), @iansu expressed interest in porting it into node.

rimraf is ISC licensed, so I don't see it being a problem as long as the license is inserted into https://github.com/nodejs/node/blob/master/LICENSE. (I am not a lawyer.)

I'd think that glob support should be removed, as it would be inconsistent with the rest of Node.js' APIs. That means effectively running the existing rimraf with disableGlob: true.

@iansu thoughts?

@iansu
Copy link
Contributor

iansu commented Mar 19, 2019

@bengl I am still interested in doing this. We also discussed having the external rimraf package (which provides a CLI version) use the new internal version if available. Removing glob support could cause issues if someone tries to use the CLI with a glob though. Maybe we can discuss this further at our next meeting tomorrow or on Slack.

@coreyfarrell
Copy link
Member

Personally I think globs and CLI are userspace territory and should be excluded from any node.js implementation. I think a function to recursively delete a distinct file or folder is what node.js should get. This would mean that the rimraf module doesn't get replaced, it just eventually shrinks to become a simple glue between glob and the node.js built-in function.

@boneskull
Copy link
Collaborator Author

@coreyfarrell agree

@refack
Copy link

refack commented Mar 19, 2019

Progress on adding std:filesystem - seems like node@12 will only use GCC-6 so that's no help.

As for a JS implementation I'm in favor of porting a full shutil module. But I don't have the bandwidth for the near future 🤷‍♂️

@iansu
Copy link
Contributor

iansu commented Mar 19, 2019

Based on the discussion in our most recent meeting I think the consensus is to implement something like shutil. I've created a new issue to discuss that work: #23

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants