Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoadPathCache scan takes a lot of time #277

Closed
katafrakt opened this issue Oct 2, 2019 · 7 comments · Fixed by #424
Closed

LoadPathCache scan takes a lot of time #277

katafrakt opened this issue Oct 2, 2019 · 7 comments · Fixed by #424

Comments

@katafrakt
Copy link

katafrakt commented Oct 2, 2019

I noticed that 2.5 out of 10 seconds of my application's startup time after requiring bootsnap is consumed by Bootsnap::LoadPathCache::PathScanner.call. By tracking the issue down, I found that it checks more than 150000 files that Dir.glob(path + ALL_FILES) returns. Most of those files are node_modules, cache files and compiled assets.

This is obviously unnecessary. By simply hardcoding skipping of those path I gained 20-25% less of application boot time. I was wondering how this can be parameterized without actually overwriting part of Bootsnap code.

"All-in" approach would be writing a C extension that would only return those paths that are not explicitly excluded from load path optimization instead of Dir.glob. I have some preliminary (working) result here: https://github.com/Shopify/bootsnap/compare/master...katafrakt:dir-walker?expand=1

Lighter approach would be to just next if excluded_paths.any? {|ep| absolute_path.start_with?(ep)} to the block inside Dir.glob.each, but of course this would be a bit less performant (I don't know how much less).

What are your thoughts about this issue?

@burke
Copy link
Member

burke commented Oct 2, 2019

I think something like this makes sense in general, and I'm even okay with a first version of it hardcoding node_modules as the only blacklisted item, but I don't have a strong intuition of what the right implementation strategy is.

@katafrakt
Copy link
Author

katafrakt commented Oct 3, 2019

I did some more profiling on different solutions:

  • just excluding node_modules (next if relative_path.starts_with?('node_modules') gives about 45% of time improvement (nice!)
  • having a configurable list of absolute paths (5 elements) gives maybe 5-10%
  • same with relative paths (so, shorter ones) gives me about 35%
  • C extension instead of Dir.glob seems to give 98.5% of time improvement

EDIT: wrong number given at first

@mscrivo
Copy link

mscrivo commented Nov 6, 2022

Being able to exclude folders, particularly node_modules, would be super nice. I'm playing around with swapping yarn with pnpm in our monorepo and because of pnpm's nested symlinking structure it uses, it makes bootsnap absolutely choke. Startup time went from 4s to approx 1m with pnpm's structure.

@katafrakt
Copy link
Author

Probably extracting just a config option to exclude some directories from the linked PR would be pretty easy.

@casperisfine
Copy link
Contributor

Ack. I thought I implemented this already but apparently not. I'm traveling right now, so I don't have a lot of time, but I'll try to implement this soonish (might take about a week or so)

casperisfine pushed a commit that referenced this issue Nov 8, 2022
If you have large non-ruby directories in the middle of your
load path, it can severly slow down scanning.
Typically this is a problem with `node_modules`

Fix: #277
@casperisfine
Copy link
Contributor

I'd appreciate if you could give this a try: #424

@mscrivo
Copy link

mscrivo commented Nov 8, 2022

@casperisfine works wonderfully! thanks for taking a look at that so quickly!

casperisfine pushed a commit that referenced this issue Nov 9, 2022
If you have large non-ruby directories in the middle of your
load path, it can severly slow down scanning.
Typically this is a problem with `node_modules`

Fix: #277
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants