
Issues with large page numbers (>60k) #12343

Closed
me4502 opened this issue Mar 6, 2019 · 5 comments

me4502 commented Mar 6, 2019

Description

I've been setting up a content surfacing system using GatsbyJS, and we're running into a fair few issues because of the number of pages we have. I've made a few changes suggested on Discord, and done a fair bit of investigating into the cause of the slowdowns.

The following symptoms have been noticed:

  1. npm run develop is significantly slower than npm run build (50 minutes vs 2 minutes)
  2. The slowdown occurs during the "running graphql queries" step, but before that step's text is shown
  3. Some machines encounter an "invalid instruction" crash after the "info bootstrap finished" text (http://paste.enginehub.org/tb7s6C)
  4. Gatsby appears to run a query per page, even though the pages have no queries. There already appears to be an issue for this (GraphQL queries are executed even if not using GraphQL #12216)

A few notes about our setup:

  • All data is passed to pages/templates via the context, rather than a per-page query (as recommended on Discord)
  • I can provide the source code to the Gatsby team privately if required

What we've discovered:

  • The major slowdowns appear to be related to the queue library, and to a QuickSort that runs every time an element is inserted. From what we can tell, the "Running graphql queries" step actually runs a list of tasks that may not be related to GraphQL, so it may be worth renaming it.
  • As for why develop is so much slower: we discovered that in build the priority function is deleted from the queue. We saw the same speedup by setting the priority of non-active paths to undefined, which skips sorting for them entirely. (https://github.com/diamondio/better-queue-memory/blob/cff881f2074ff0508bcb6e932bda0b92977d3d2b/index.js#L48)
  • Halving the number of pages cuts the time from 50 minutes to 10 minutes, so the slowdown is not linear.
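The non-linear scaling fits a queue that re-sorts itself on every insertion. As a rough back-of-the-envelope model (an assumption, not a measurement of the actual library), suppose inserting the k-th task triggers a full comparison sort costing about k·log₂k comparisons, so n pages cost the sum of those terms. The sketch evaluates that model for 30k and 60k pages; `sortOnInsertCost` is a hypothetical name, and the model ignores how the real QuickSort behaves on already-sorted input.

```typescript
// Simplified cost model (assumption): every insertion triggers a full
// comparison sort of the current queue, costing ~k * log2(k) for k items.
function sortOnInsertCost(n: number): number {
  let total = 0;
  // k = 1 contributes 1 * log2(1) = 0, so start at 2.
  for (let k = 2; k <= n; k++) {
    total += k * Math.log2(k);
  }
  return total;
}

const full = sortOnInsertCost(60000);
const half = sortOnInsertCost(30000);
const ratio = full / half;
console.log(`model: ${ratio.toFixed(1)}x more work for 60k pages than for 30k`);
```

Under this model, doubling the page count gives roughly 4.3x the work, against the 5x observed (10 vs 50 minutes). That is the same order of magnitude, which is consistent with, though not proof of, repeated sorting being the main cost; appending without sorting would make insertion cost grow linearly instead.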

Steps to reproduce

Using the source code that I can provide privately:

  1. Run npm run build; note how long it takes to run, and that it then crashes Node.
  2. Run npm run develop; note that it takes significantly longer and ends in the same crash.

Expected result

Gatsby should be able to handle this number of pages, since multiple sources state that they are running ~10 million pages with little to no issue.

Actual result

Gatsby struggles at these page numbers.

Environment

System:
  OS: macOS 10.14.2
  CPU: (12) x64 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
  Shell: 3.2.57 - /bin/bash
Binaries:
  Node: 10.15.3 - ~/.nvm/versions/node/v10.15.3/bin/node
  npm: 6.8.0 - ~/.nvm/versions/node/v10.15.3/bin/npm
Languages:
  Python: 2.7.15 - /usr/local/bin/python
Browsers:
  Chrome: 72.0.3626.119
  Safari: 12.0.2
npmPackages:
  gatsby: ^2.1.23 => 2.1.23
  gatsby-image: ^2.0.31 => 2.0.31
  gatsby-plugin-catch-links: ^2.0.12 => 2.0.12
  gatsby-plugin-manifest: ^2.0.22 => 2.0.22
  gatsby-plugin-offline: ^2.0.24 => 2.0.24
  gatsby-plugin-react-helmet: ^3.0.8 => 3.0.8
  gatsby-plugin-sharp: ^2.0.25 => 2.0.25
  gatsby-plugin-styled-components: ^3.0.6 => 3.0.6
  gatsby-plugin-typescript: ^2.0.10 => 2.0.10
  gatsby-plugin-web-font-loader: ^1.0.4 => 1.0.4
  gatsby-source-filesystem: ^2.0.23 => 2.0.23
  gatsby-transformer-json: ^2.1.8 => 2.1.8
  gatsby-transformer-sharp: ^2.1.15 => 2.1.15


me4502 commented Mar 6, 2019

One potential fix for the slower develop: if the queue is FIFO, it doesn't need to sort when a non-activePath entry is added. That way only a few putTask calls would sort, rather than all of them (60k in this case).

This doesn't resolve the other issues, such as the crash, though.

KyleAMathews commented

Queries are slower in development because there's not as much caching.

You can try enabling the experimental loki.js support by running Gatsby with the environment variable GATSBY_DB_NODES=loki set.

stefanprobst commented

@me4502 As the crash seems to happen in saveState after bootstrap finishes, I'd be interested to know whether using v8.serialize instead of JSON.stringify helps in your case.
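The suggested swap can be sketched as follows. This is a minimal illustration, not Gatsby's actual saveState: `saveStateSketch`, `loadStateSketch`, the `SketchState` shape and the temp file name are all hypothetical. It uses Node's built-in `v8.serialize`/`v8.deserialize`, which write and read a Buffer rather than building one large JSON string, and which, unlike JSON.stringify, preserve `Map` instances. Why this avoids the crash is not established in the thread.

```typescript
import { serialize, deserialize } from "v8";
import { readFileSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Hypothetical, heavily simplified state shape; Gatsby's real redux state differs.
type PageSketch = { path: string; context: Record<string, unknown> };
interface SketchState {
  pages: Map<string, PageSketch>;
}

// Persist state as a binary Buffer produced by V8's structured serializer,
// instead of one (potentially huge) JSON string.
function saveStateSketch(state: SketchState, file: string): void {
  writeFileSync(file, serialize(state));
}

// Read the Buffer back and rebuild the object graph, Maps included.
function loadStateSketch(file: string): SketchState {
  return deserialize(readFileSync(file)) as SketchState;
}

// Usage: round-trip a tiny state through a temp file.
const stateFile = join(tmpdir(), "gatsby-state-sketch.bin");
const original: SketchState = {
  pages: new Map<string, PageSketch>([["/a/", { path: "/a/", context: { id: 1 } }]]),
};
saveStateSketch(original, stateFile);
const restored = loadStateSketch(stateFile);

// JSON.stringify, by contrast, serializes the Map as an empty object.
const asJson = JSON.stringify(original); // '{"pages":{}}'
```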


me4502 commented Mar 7, 2019

@KyleAMathews That option doesn't appear to speed it up much; however, I've made a PR (#12365) that brings develop performance roughly in line with build performance.

@stefanprobst That does appear to fix the crash; even just replacing json-stringify-safe with JSON.stringify fixes it.

pieh pushed a commit that referenced this issue Mar 19, 2019
…tead of sorting (#12365)

## Description

This change prevents the entire task queue from being sorted every time a low-priority task is added to it. This takes a run of `npm run develop` with 60k pages from 50 minutes to 1.5 minutes. There should be no behaviour change, as these tasks were already the lowest priority and are added to the end of a FIFO queue. High-priority tasks still trigger a sort, so they are still correctly moved to the front.

## Related Issues

Fixes one aspect of #12343
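The idea behind this change can be sketched as a toy queue store. This is an illustration under stated assumptions, not the real better-queue-memory or Gatsby code: `SketchQueueStore`, `SketchTask`, `takeFirst` and `sortCount` are hypothetical, `putTask` only borrows its name from the discussion, and it assumes explicit priorities are positive numbers, so an un-prioritized task always belongs at the back.

```typescript
// Toy in-memory task store: append low-priority tasks, sort only for
// prioritized ones. Hypothetical API; not better-queue-memory's real one.
interface SketchTask {
  id: string;
  // Assumed convention: a positive number means "run sooner";
  // undefined means an ordinary, lowest-priority FIFO task.
  priority?: number;
}

class SketchQueueStore {
  private tasks: SketchTask[] = [];
  sortCount = 0; // number of full re-sorts performed, for illustration

  putTask(task: SketchTask): void {
    this.tasks.push(task);
    if (task.priority === undefined) {
      // Lowest priority: the end of the array is already the right place,
      // so skip the O(k log k) sort entirely.
      return;
    }
    // Prioritized task: re-sort so it moves ahead of un-prioritized tasks.
    // Array.prototype.sort is stable (ES2019+), so equal priorities keep FIFO order.
    this.sortCount++;
    this.tasks.sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
  }

  takeFirst(): SketchTask | undefined {
    // shift() is O(n); fine for a sketch, a real queue would use a ring buffer.
    return this.tasks.shift();
  }
}

// Usage: several ordinary page tasks, then one active-path task.
const store = new SketchQueueStore();
for (let i = 0; i < 5; i++) {
  store.putTask({ id: `page-${i}` });
}
store.putTask({ id: "active-page", priority: 1 });
```

With 60k ordinary pages and only a handful of active paths, this is the difference between 60k sorts and a few, while the active-path task still comes out first.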

DSchau commented Apr 15, 2019

@me4502 I believe we've fixed this with #10732 and gatsby@^2.3.20.

Closing this out--but please re-open or reply if this is not the case and you can still reproduce these OOM issues.

We're always working on making Gatsby more scalable, and the more issues we can surface and fix--the better. Thanks for surfacing this one!

DSchau closed this as completed Apr 15, 2019