gatsby-source-filesystem fails with "EMFILE: too many open files" on Windows with lots of files #12011

Closed
FraserThompson opened this issue Feb 23, 2019 · 19 comments
Labels
stale? Issue that may be closed soon due to the original author not responding any more. status: awaiting author response Additional information has been requested from the author status: confirmed Issue with steps to reproduce the bug that’s been verified by at least one reviewer.

Comments

@FraserThompson
Contributor

FraserThompson commented Feb 23, 2019

Description

When I try to gatsby build a site with around 9,500 files in the src folder on Windows 10 I get this error and the build fails:

Error: EMFILE: too many open files

It specifies the error occurred on an image file (usually a different one each time).

Steps to reproduce

My project is an incredibly image heavy website, and it uses gatsby-source-filesystem, gatsby-image and gatsby-transformer-sharp to process the images.

I have lots of folders containing lots of images like this:

content/gigs/[gig name]/[artist_name]/[image1.jpg,image2.jpg,etc]

In each gig folder there's an index.md which contains metadata for the page. I use a GraphQL query in gatsby-node.js to turn these markdown files into pages via a template. The template uses a GraphQL query and a regex injected via gatsby-node.js to get the images in the folder for that page.

Quite impressively it succeeds with 8,503 files but starts failing if I add in more.

All the code is here: https://github.com/FraserThompson/dunedinsound.com/tree/gatsby

Expected result

I expected it to be able to build my site because the documentation doesn't mention file limits.

Actual result

The build doesn't succeed.

Environment

gatsby info --clipboard fails for me, probably because of #11496, but this is the output of gatsby info:

System:
OS: Windows 10
CPU: (16) x64 AMD Ryzen 7 2700X Eight-Core Processor
Binaries:
Yarn: 1.13.0 - C:\Program Files (x86)\Yarn\bin\yarn.CMD
Browsers:
Edge: 44.17763.1.0
npmPackages:
gatsby: ^2.1.4 => 2.1.4
gatsby-image: ^2.0.22 => 2.0.29
gatsby-plugin-feed: ^2.0.8 => 2.0.13
gatsby-plugin-google-analytics: ^2.0.5 => 2.0.14
gatsby-plugin-manifest: ^2.0.5 => 2.0.17
gatsby-plugin-offline: ^2.0.5 => 2.0.23
gatsby-plugin-react-helmet: ^3.0.0 => 3.0.6
gatsby-plugin-sharp: ^2.0.20 => 2.0.20
gatsby-plugin-styled-components: ^3.0.4 => 3.0.5
gatsby-plugin-typography: ^2.2.0 => 2.2.7
gatsby-remark-copy-linked-files: ^2.0.5 => 2.0.9
gatsby-remark-images: ^2.0.4 => 2.0.6
gatsby-remark-prismjs: ^3.0.0 => 3.2.4
gatsby-remark-smartypants: ^2.0.5 => 2.0.8
gatsby-source-filesystem: ^2.0.2 => 2.0.20
gatsby-transformer-remark: ^2.1.15 => 2.2.5
gatsby-transformer-sharp: ^2.1.13 => 2.1.13

@yogeshkotadiya
Contributor

yogeshkotadiya commented Feb 23, 2019

I'll try to reproduce this and check; it might be caused by the underlying Node fs module.
EDIT: I can't reproduce this because the provided repo has a lot of incorrect GraphQL queries. Please fix your GraphQL queries first.

@FraserThompson
Contributor Author

Sorry, I think the GraphQL queries were failing because I excluded all the images from git since there's gigabytes of them.

I've made a new branch which contains all my original images resized so that they're small enough for source control. It should make it easier to reproduce the issue hopefully.

https://github.com/FraserThompson/dunedinsound.com/tree/gatsby-file-test

@plackowski

I have the same problem (too many open files):

success open and validate gatsby-configs — 0.019 s
success load plugins — 0.710 s
success onPreInit — 1.689 s
success delete html and css files from previous builds — 0.598 s
success initialize cache — 0.019 s
success copy gatsby files — 0.102 s
success onPreBootstrap — 0.020 s
error Plugin gatsby-source-filesystem returned an error

Error: EMFILE: too many open files, open (...)

I have about 8k markdown files to open, and now I'm stuck.. :(

@freiksenet
Contributor

Currently we don't have any solution to this except increasing the open file limit (not sure if that's possible on Windows). One workaround could be setting the CHOKIDAR_USEPOLLING=1 env var to tell chokidar (our fs watcher package) to poll instead of using file watchers. This has performance implications, but could help with this problem.

@freiksenet freiksenet added type: feature or enhancement status: confirmed Issue with steps to reproduce the bug that’s been verified by at least one reviewer. labels Feb 25, 2019
@FraserThompson
Contributor Author

FraserThompson commented Feb 27, 2019

Thanks for the response. For me CHOKIDAR_USEPOLLING=1 doesn't seem to help, and as far as I know there's no soft file limit on Windows. Maybe I'll look into building it inside a Linux Docker container or something if this is a Windows problem.

Google seems to suggest https://github.com/isaacs/node-graceful-fs as a drop-in replacement for fs. I might also experiment with that to see if it makes a difference.

EDIT: I can confirm that monkeypatching fs with graceful-fs at the top of gatsby-node.js as in the snippet below fixes the issue for me.

const realFs = require('fs')
const gracefulFs = require('graceful-fs')
gracefulFs.gracefulify(realFs)

EDIT2: Actually after upgrading from Node 10 to Node 11 everything seems to be fine without having to patch fs... So all is well!

@yogeshkotadiya
Contributor

@FraserThompson Use the fs-extra module, since it's already in package.json. fs-extra is built on top of graceful-fs and is a drop-in replacement for the fs module.

@gatsbot

gatsbot bot commented Apr 12, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Apr 12, 2019
@yogeshkotadiya yogeshkotadiya added status: awaiting author response Additional information has been requested from the author and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Apr 12, 2019
@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label May 3, 2019
@gatsbot

gatsbot bot commented May 3, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot

gatsbot bot commented May 14, 2019

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community!

@gatsbot gatsbot bot closed this as completed May 14, 2019
@volkandkaya

volkandkaya commented Jul 27, 2019

I'm getting this issue with https://github.com/volkandkaya/saaspages

Not a clue what's going on. I changed ulimit and tried the other things Google recommended, but I can't fix it.

I'm using Ubuntu.

@dr0i

dr0i commented May 25, 2020

@volkandkaya you may have already somehow solved your issue. For me, using the library fs-extra in version 9.0.0 fixed the issue on ubuntu 18.

@adamthrottleup

I am getting this issue with my gatsby-transformer-asciidoctor project (over 10k asciidoc files, source is private) running on Ubuntu 20 under WSL 2. @volkandkaya for me the problem is happening in the gatsby-source-filesystem plugin, so how do I make it use fs-extra?
Interestingly, this only started happening when I upgraded my environment from Ubuntu 18 on WSL 1 to Ubuntu 20 on WSL 2.

@FraserThompson
Contributor Author

I should also add that this issue never went away for me. Every time I change Gatsby versions it seems to come back again.

This should be re-opened imo since it seems a bunch of people with large sites are experiencing this issue.

@wamir79

wamir79 commented Aug 30, 2020

Hi all,

The best way I found to solve this issue is using graceful-fs.
Add this at the beginning of gatsby-config.js:

const fs = require('fs');
const gracefulFs = require('graceful-fs');
gracefulFs.gracefulify(fs);

@epbarger

epbarger commented Sep 9, 2020

Also experiencing this issue with gatsby-source-filesystem, only I'm experiencing it on Mac. Agreed that this should be reopened!

The fix wamir79 and others have posted appears to work for me :)

sroertgen added a commit to openeduhub/skohub-vocabs that referenced this issue Nov 2, 2020
Large vocabs couldn't be built due to the following error:
```
 Error: EMFILE: too many open files, scandir '....'
```

found the solution here:
<gatsbyjs/gatsby#12011>
@philr35

philr35 commented Mar 1, 2021

@dr0i THANK YOU! I tried everything mentioned here, and using fs-extra was the only thing that worked! Using graceful-fs did nothing for me.

For those that need it, here's the documentation: https://github.com/jprichardson/node-fs-extra

Note: I'm a Windows user and this worked. I also had to refactor my code a bit to work with the new API of fs-extra.

@FraserThompson
Contributor Author

@philliprognerud gatsby-source-filesystem already uses fs-extra, so in the context of this issue using it isn't a fix.

I still experience this issue in the latest version of Gatsby, and the gracefulFs.gracefulify(fs) fix still doesn't work for me. I'm getting around it with a batch script that makes npm run build restart itself on failure, so the build eventually completes. Perhaps this could be a built-in feature of gatsby build? Like --ignore-failures or something.
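
For anyone wanting to try the restart-on-failure workaround, here is a minimal sketch of the idea in Node rather than a batch script. The actual script isn't shown in this thread, so `runUntilSuccess` and its parameters are hypothetical names made up for illustration; only `child_process.spawnSync` is a real Node API.

```javascript
const { spawnSync } = require('child_process')

// Hypothetical helper (not part of Gatsby): re-run a command until it exits
// with code 0 or the attempts run out. `spawnOptions` is passed to spawnSync.
function runUntilSuccess(command, args, maxAttempts, spawnOptions = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = spawnSync(command, args, { stdio: 'inherit', ...spawnOptions })
    if (result.status === 0) return attempt // number of attempts it took
    console.error(`attempt ${attempt} of ${maxAttempts} failed (exit code ${result.status})`)
  }
  throw new Error(`still failing after ${maxAttempts} attempts`)
}
```

For the build in this issue the call would presumably look something like `runUntilSuccess('npm run build', [], 10, { shell: true })`, with `shell: true` assumed to be needed because npm is a .cmd shim on Windows. This only papers over the EMFILE error; it doesn't reduce the number of files opened at once.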

@FraserThompson
Contributor Author

FraserThompson commented Nov 27, 2022

I'm still experiencing this issue, and it seems to have gotten worse. Bear in mind my site is even bigger now, it has hundreds of thousands of files.

I had to delete the .cache directory, and now my builds consistently fail with this error. My previous working method (a script which restarts the build automatically when it fails) no longer works: I left it running overnight and it was still failing, so I currently cannot build my site at all.

But I have made some progress to finding out the cause.

I tracked down most of the errors to the call to md5File in gatsby-source-filesystem's create-file-node.js. The library that provides md5File still uses regular fs, not fs-extra, which I think meant it was being less graceful.

I re-implemented this function in the file so it had access to fs-extra like this:

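const crypto = require('crypto');
// Assumed to be in scope already in create-file-node.js; repeated so the snippet stands alone.
const fs = require('fs-extra');

// Stream the file through an MD5 hash and resolve with the hex digest.
function md5File (path) {
  return new Promise((resolve, reject) => {
    const output = crypto.createHash('md5')
    const input = fs.createReadStream(path)

    // Reject on read errors (such as EMFILE) so the caller can retry.
    input.on('error', (err) => {
      reject(err)
    })

    // The digest becomes readable once the input stream has ended.
    output.once('readable', () => {
      resolve(output.read().toString('hex'))
    })

    input.pipe(output)
  })
}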

After this it failed less often, but still occasionally, so I implemented a retry mechanism around this call to md5File: if it fails, it waits and tries again:

let contentDigest = null;
let tries = 0;
while (true) {
  try {
    contentDigest = await md5File(slashedFile.absolutePath);
    break;
  } catch (e) {
    tries++;
    if (tries > 4) throw e;
    await new Promise(r => setTimeout(r, 5000));
  }
}

This seemed to allow the build to get past the source nodes stage eventually, but it wasn't reliable: It would still fail sometimes.
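
The retry loop above retries on any error. A variant that only retries on EMFILE and fails fast on everything else might look like the sketch below. `retryOnEmfile` and its option names are hypothetical; the defaults mirror the loop above (four retries, five seconds apart), and the sketch assumes the error carries the usual Node `code` property.

```javascript
// Hypothetical helper: run `task` and retry only when it fails with EMFILE.
async function retryOnEmfile(task, { retries = 4, delayMs = 5000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task()
    } catch (err) {
      // Any other error, or running out of retries, is treated as fatal.
      if (err.code !== 'EMFILE' || attempt >= retries) throw err
      // Wait before trying again so other open files have a chance to close.
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
}
```

It would wrap the hashing call roughly as `await retryOnEmfile(() => md5File(slashedFile.absolutePath))`. Like the loop above, this only reduces the failure rate; it does nothing to limit how many files are open at once.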

I also started getting other EMFILE: too many open files errors, generated by the likes of gatsby-plugin-mdx.

I think the problem is it's literally just trying to open too many files. It's opening files to generate hashes, then while it's still generating a hash from the previous file, it tries to open more to generate more hashes, until eventually there are too many and it simply explodes.

On Windows we can't simply raise ulimit like we can on Linux. This issue here is relevant: nodejs/node#44832

Some sort of queue which hard-limits how many files it can process at a time might be a better fix, but would require major changes to gatsby-source-filesystem. And it's likely that this issue only affects huge edge-case sites like mine (and only on Windows) since barely anyone else seems to be experiencing it, so there's not much incentive for such a change...

edit: I think another factor (and why this isn't experienced more often) is that my site has quite a few large files: MP3 files of around 40 MB each. Obviously these take longer to read and hash, so they stay open longer.
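
To make the queue idea concrete, here is a minimal sketch of a hard concurrency limit. It is not how gatsby-source-filesystem works today, and `mapWithLimit` is a hypothetical name; it only shows the technique of keeping at most `limit` files in flight at any time (`limit` is assumed to be at least 1).

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls
// in flight at once, returning the results in the original order.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length)
  let next = 0

  // Each lane pulls the next unprocessed index until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index], index)
    }
  }

  // Start `limit` lanes (or fewer if there are not that many items).
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane)
  await Promise.all(lanes)
  return results
}
```

With something like `await mapWithLimit(paths, 100, (p) => md5File(p))`, a new file is only opened when an earlier hash has finished, so slow, large files hold one lane each instead of piling up on top of an unbounded number of other open files.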

@FraserThompson
Contributor Author

Okay, for anyone else who is having this issue consistently and looking for a permanent fix, I've made a version of gatsby-source-filesystem which uses a queue to limit the number of files accessed simultaneously: gatsby-source-filesystem-with-queue

To use it you can just replace gatsby-source-filesystem with gatsby-source-filesystem-with-queue in your package.json and your gatsby-config.js. Run npm install and your builds should start succeeding.

Since it limits the number of files being processed, it might slow things down. You can tweak this value (by default 800) with the GATSBY_CONCURRENT_FILES environment variable to strike a balance between build speed and not producing EMFILE errors.

Another hint for large sites or sites with lots of large files: You'll probably need to raise the Node heap limit, for example to 8 GB with NODE_OPTIONS=--max-old-space-size=8192.


10 participants