New Gatsby type inference is slow on 60k pages #12692

Closed
bennetthardwick opened this issue Mar 20, 2019 · 21 comments

@bennetthardwick
Contributor

bennetthardwick commented Mar 20, 2019

Description

After the onPreExtractQueries step of the build process, Gatsby gets stuck when looking through 60k or so pages. Specifically, the isDate method (which is run on every string to check whether it's a plain string or a date string) is taking up the most time. These methods are all being run from example-value.js.

Steps to reproduce

  1. Clone gatsby-intense-benchmark
  2. Run:
     node --inspect-brk node_modules/.bin/gatsby develop
  3. Open Chrome / Chromium, navigate to chrome://inspect and start the debugger.
  4. When the build process finishes onPreExtractQueries, wait for a few minutes and then pause the debugger (chances are it will be in the right spot)
  5. Alternatively run the profiler for a few minutes and inspect the chart

The build will be stuck in this state for 30+ minutes (I haven't had a successful build).

Expected result

Build completes in a reasonable time.

Actual result

Build never completes.

Environment

System:
OS: Linux 4.15 Ubuntu 18.04.2 LTS (Bionic Beaver)
CPU: (4) x64 Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
Shell: 5.4.2 - /usr/bin/zsh
Binaries:
Node: 10.15.0 - ~/.nvm/versions/node/v10.15.0/bin/node
Yarn: 1.13.0 - ~/.nvm/versions/node/v10.15.0/bin/yarn
npm: 6.4.1 - ~/.nvm/versions/node/v10.15.0/bin/npm
Languages:
Python: 2.7.15 - /usr/bin/python
npmPackages:
gatsby: ^2.2.1 => 2.2.1
gatsby-image: ^2.0.34 => 2.0.34
gatsby-plugin-manifest: ^2.0.24 => 2.0.24
gatsby-plugin-offline: ^2.0.25 => 2.0.25
gatsby-plugin-react-helmet: ^3.0.10 => 3.0.10
gatsby-plugin-sharp: ^2.0.29 => 2.0.29
gatsby-source-filesystem: ^2.0.27 => 2.0.27
gatsby-transformer-sharp: ^2.1.17 => 2.1.17
npmGlobalPackages:
gatsby-dev-cli: 2.4.12
gatsby: 2.2.2

@stefanprobst
Contributor

Thanks for the report and for providing a testing repo!

We are indeed now checking every string to see whether it is a date string. We did not do this before: we only checked a randomly picked field value, which unfortunately made type inference non-deterministic. We are aware, though, that this does not scale very well.
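
To illustrate the determinism trade-off described above, here is a minimal sketch. It is not Gatsby's actual inference code: `looksLikeDate`, `inferFromSample` and `inferFromAll` are hypothetical names, and the date check is a deliberately crude stand-in.

```javascript
// Hypothetical stand-in for a real date check: accepts only YYYY-MM-DD.
const looksLikeDate = value => /^\d{4}-\d{2}-\d{2}$/.test(value)

// Sampling approach (sketch): inspect one randomly picked value.
// The inferred type depends on which value happens to be picked.
const inferFromSample = (values, random = Math.random) => {
  const picked = values[Math.floor(random() * values.length)]
  return looksLikeDate(picked) ? `Date` : `String`
}

// Exhaustive approach (sketch): inspect every value.
// Deterministic, but it costs one date check per value.
const inferFromAll = values =>
  values.every(looksLikeDate) ? `Date` : `String`

const values = [`2019-03-20`, `not a date`, `2019-03-21`]
console.log(inferFromSample(values, () => 0)) // picks index 0
console.log(inferFromSample(values, () => 0.5)) // picks index 1
console.log(inferFromAll(values))
```

With the sampling version, the same data can be inferred as either type from one run to the next. The exhaustive version always gives the same answer, at the cost of touching every value.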

I'll test with the provided repo soon. In the meantime: if it hangs after onPreExtractQueries, that means it is hanging in the schema update step. If that's also what you experience in your real project, you can try simply disabling schema updating here, since it's really only there to make the context fields available in the schema, which you probably won't need (see this RFC).
I'd also want to try switching momentjs for something like date-fns@2 (something like const isDate = value => isValid(parseISO(value))) and see if that would make a difference.

@millette
Contributor

Me too! I am using a single 60 MiB JSON file for my data. Was taking about 10-15 minutes to build with gatsby < 2.2.0 but with the most recent, it's taking 2h30.

@wardpeet
Contributor

PR #12700 is a quickfix to make it a little bit faster again. Do you mind trying it out?

@millette
Contributor

millette commented Mar 20, 2019

Thanks @wardpeet I'm giving it a shot, hopefully with an answer in less than 2 hours :-)

UPDATE:
Previous run took (from memory) 8000s to build schemas, with the patch it was only 600s.

Looking good!

@KyleAMathews
Contributor

What did schema creation take before?

@me4502
Contributor

me4502 commented Mar 21, 2019

> Me too! I am using a single 60 MiB JSON file for my data. Was taking about 10-15 minutes to build with gatsby < 2.2.0 but with the most recent, it's taking 2h30.

They mentioned in an earlier comment that it was taking 10-15 minutes. I'm going to try this on our codebase as we were taking 15 seconds before, but >30 minutes after.

@stefanprobst
Contributor

I also tried with swapping momentjs for date-fns@2 and the difference is enormous. The problem with how things are done currently is that we check 55 different ISO format strings with moment.

@freiksenet
Contributor

We will merge Ward's PR as a temporary optimization. Sadly, the date-fns parser is a bit too loose (for example, it allows dates with arbitrary trailing characters), so switching to it would be a potential breaking change.

As a long term solution to this, we will provide an opt-out from inference that will considerably speed up sites with lots of nodes. This will be done by specifying the GraphQL types for the nodes that you don't want to be inferred. We've added the "specifying the type" part already, but for historical reasons inference always happens. We will fix this in the next couple of weeks.

wardpeet added a commit that referenced this issue Mar 21, 2019
## Description
Checks if strings start with 4 numbers as all ISO 8601 date-time are using YYYY syntax

## Related Issues
#12692
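
The idea in that commit description can be sketched as follows. This is an illustration, not the code from the PR: `startsWithFourDigits` and `expensiveIsDate` are hypothetical names, and the expensive check is replaced by a strict regex stand-in that counts its own calls.

```javascript
// Cheap pre-check: ISO 8601 date-times start with a four-digit year (YYYY),
// so any other string can be rejected without running the expensive check.
const startsWithFourDigits = value => /^\d{4}/.test(value)

// Hypothetical stand-in for the expensive check (in Gatsby at the time,
// a moment-based test against many ISO format strings).
let expensiveCalls = 0
const expensiveIsDate = value => {
  expensiveCalls++
  return /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$/.test(
    value
  )
}

// Only strings that pass the cheap pre-check reach the expensive one.
const isDate = value =>
  typeof value === `string` &&
  startsWithFourDigits(value) &&
  expensiveIsDate(value)

const samples = [
  `2019-03-20T10:00:00Z`,
  `hello world`,
  `Gatsby`,
  `2019 was a good year`,
  42,
]
const results = samples.map(isDate)
console.log(results, `expensive calls:`, expensiveCalls)
```

Of the five sample values, only the two strings that begin with four digits reach the expensive check. On a dataset where most strings are ordinary text, that is where the saving comes from.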
@millette
Contributor

Closing (it's merged); further steps will require more specific issues.

@danechitoaie

danechitoaie commented Mar 21, 2019

Just to throw an idea out there. What about if the source-plugins would be able to specify the data type when they create the nodes?

Something like name: gatsby.StringType(something.name), date: gatsby.DateType(something.date, "yyyy/MM/dd" /* format for how to be parsed */), etc. ?

(as a proposal for a future improvement)

@wardpeet
Contributor

@danechitoaie that's something we're going to move to.

@millette @bennetthardwick I made some more perf improvements in #12722. Could you give it a spin and report if anything is wonky?

@millette
Contributor

millette commented Mar 22, 2019

@wardpeet I didn't have a chance to try #12722 but on first sight it looks like a lot of code compared to first patch. I'm not used to "multirepos" so I have to patch the dist module manually. Also, my project should be redesigned somewhat since it's already pretty slow and doesn't make the best usage of gatsby, so it shouldn't be too significant to proper gatsby usage.

(reopening the issue since we're not done, it seems)

@millette
Contributor

@wardpeet I finally tried the new patch with gatsby 2.2.8.

gatsby 2.2.8:

success source and transform nodes — 132.075 s
success building schema — 566.949 s

gatsby 2.2.8 with new patch:

success source and transform nodes — 158.689 s
success building schema — 265.077 s

@DSchau
Contributor

DSchau commented Mar 25, 2019

@millette could you try with a gatsby@~2.1.0 version? The initial issue here was with the new Schema Customization API changes, so I'd be curious to see if it's still slower!

Thanks!

@millette
Contributor

millette commented Mar 25, 2019

@DSchau To be clear, you want me to test #12722 (which was merged 13 hours ago) against gatsby 2.1.0 ?

@DSchau
Contributor

DSchau commented Mar 25, 2019

@millette the central issue here was with a regression with 2.2.0.

Specifically, it seemed like the bottleneck was with Date inference.

That change was released in gatsby@2.2.0, so if we want to compare performance, we would ideally test this change:

  • In the current/latest update ([email protected])
  • In a version of Gatsby prior to this API change (gatsby@~2.1.0)

@millette
Contributor

@DSchau Sorry in advance for the very long response, but here we go.

I've included results while also running mplayer which took some cpu. If you compare with runs without mplayer, I concluded that the 2.2 branch takes a bit more cpu (vs IO) since it was more impacted by mplayer running.

So there's a regression (540s vs 320s with mplayer; 485s vs 326s without mplayer). When 2.2 came out, the same build took almost 3 hours to complete so I'm not complaining :-)

I didn't patch anything in these tests.

v2.2.11 build#1 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.025 s
success onPreInit — 1.260 s
success delete html and css files from previous builds — 0.020 s
success initialize cache — 0.025 s
success copy gatsby files — 0.076 s
success onPreBootstrap — 0.050 s
success source and transform nodes — 169.772 s
success building schema — 283.182 s
success createPages — 0.147 s
success createPagesStatefully — 0.223 s
success onPreExtractQueries — 0.009 s
success update schema — 0.180 s
success extract queries from components — 0.725 s
success run graphql queries — 72.963 s — 28/28 0.38 queries/second
success write out page data — 0.030 s
success write out redirect data — 0.022 s
success onPostBootstrap — 0.008 s
info bootstrap finished - 540.315 s





v2.2.11 build#2 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.008 s
success onPreInit — 1.215 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.026 s
success copy gatsby files — 0.060 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 131.905 s
success building schema — 267.430 s
success createPages — 0.143 s
success createPagesStatefully — 0.215 s
success onPreExtractQueries — 0.007 s
success update schema — 0.171 s
success extract queries from components — 0.709 s
success run graphql queries — 69.625 s — 28/28 0.40 queries/second
success write out page data — 0.020 s
success write out redirect data — 0.004 s
success onPostBootstrap — 0.007 s
info bootstrap finished - 484.869 s





v2.2.11 build#3 (no mplayer, with cache)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 0.978 s
success onPreInit — 1.269 s
success delete html and css files from previous builds — 0.159 s
success initialize cache — 0.024 s
success copy gatsby files — 0.055 s
success onPreBootstrap — 0.033 s
success source and transform nodes — 0.890 s
success building schema — 270.609 s
success createPages — 0.184 s
success createPagesStatefully — 0.271 s
success onPreExtractQueries — 0.007 s
success update schema — 0.177 s
success extract queries from components — 0.707 s
success run graphql queries — 1.823 s — 16/16 8.79 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.003 s
success onPostBootstrap — 0.003 s
info bootstrap finished - 300.021 s





v2.2.11 build#4 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.040 s
success load plugins — 1.009 s
success onPreInit — 1.195 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.026 s
success copy gatsby files — 0.056 s
success onPreBootstrap — 0.038 s
success source and transform nodes — 134.602 s
success building schema — 267.401 s
success createPages — 0.144 s
success createPagesStatefully — 0.215 s
success onPreExtractQueries — 0.008 s
success update schema — 0.182 s
success extract queries from components — 0.723 s
success run graphql queries — 71.003 s — 28/28 0.39 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.005 s
success onPostBootstrap — 0.007 s
info bootstrap finished - 489.475 s



v2.2.11 build#5 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.042 s
success load plugins — 1.032 s
success onPreInit — 1.234 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.027 s
success copy gatsby files — 0.058 s
success onPreBootstrap — 0.052 s
success source and transform nodes — 159.721 s
success building schema — 274.916 s
success createPages — 0.154 s
success createPagesStatefully — 0.216 s
success onPreExtractQueries — 0.007 s
success update schema — 0.186 s
success extract queries from components — 0.709 s
success run graphql queries — 73.223 s — 28/28 0.38 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.001 s
success onPostBootstrap — 0.027 s
info bootstrap finished - 525.877 s




v2.1.39 build#6 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.050 s
success onPreInit — 1.242 s
success delete html and css files from previous builds — 0.019 s
success initialize cache — 0.026 s
success copy gatsby files — 0.055 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 160.464 s
success building schema — 20.783 s
success createPages — 0.322 s
success createPagesStatefully — 0.321 s
success onPreExtractQueries — 0.008 s
success update schema — 20.012 s
success extract queries from components — 0.746 s
success run graphql queries — 143.294 s — 28/28 0.20 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.001 s
success onPostBootstrap — 0.006 s
info bootstrap finished - 361.719 s




v2.1.39 build#7 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.084 s
success onPreInit — 1.204 s
success delete html and css files from previous builds — 0.019 s
success initialize cache — 0.025 s
success copy gatsby files — 0.063 s
success onPreBootstrap — 0.038 s
success source and transform nodes — 133.101 s
success building schema — 19.157 s
success createPages — 0.318 s
success createPagesStatefully — 0.326 s
success onPreExtractQueries — 0.007 s
success update schema — 21.001 s
success extract queries from components — 0.767 s
success run graphql queries — 140.068 s — 28/28 0.20 queries/second
success write out page data — 0.025 s
success write out redirect data — 0.005 s
success onPostBootstrap — 0.009 s
info bootstrap finished - 330.762 s



v2.1.39 build#8 (with mplayer, with cache)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.001 s
success onPreInit — 1.297 s
success delete html and css files from previous builds — 0.165 s
success initialize cache — 0.025 s
success copy gatsby files — 0.068 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 0.900 s
success building schema — 19.452 s
success createPages — 0.346 s
success createPagesStatefully — 0.382 s
success onPreExtractQueries — 0.008 s
success update schema — 20.832 s
success extract queries from components — 0.756 s
success run graphql queries — 1.960 s — 16/16 8.17 queries/second
success write out page data — 0.016 s
success write out redirect data — 0.002 s
success onPostBootstrap — 0.002 s
info bootstrap finished - 70.195 s





v2.1.39 build#9 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.040 s
success load plugins — 1.009 s
success onPreInit — 1.239 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.025 s
success copy gatsby files — 0.051 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 134.509 s
success building schema — 19.830 s
success createPages — 0.322 s
success createPagesStatefully — 0.314 s
success onPreExtractQueries — 0.008 s
success update schema — 19.755 s
success extract queries from components — 0.761 s
success run graphql queries — 136.816 s — 28/28 0.20 queries/second
success write out page data — 0.017 s
success write out redirect data — 1.651 s
success onPostBootstrap — 0.006 s
info bootstrap finished - 326.824 s




v2.1.39 build#10 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.000 s
success onPreInit — 1.245 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.025 s
success copy gatsby files — 0.058 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 133.040 s
success building schema — 20.355 s
success createPages — 0.313 s
success createPagesStatefully — 0.322 s
success onPreExtractQueries — 0.008 s
success update schema — 20.324 s
success extract queries from components — 0.819 s
success run graphql queries — 148.839 s — 28/28 0.19 queries/second
success write out page data — 0.704 s
success write out redirect data — 0.013 s
success onPostBootstrap — 0.007 s
info bootstrap finished - 336.769 s




v2.1.39 build#11 (no mplayer, with cache)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 0.997 s
success onPreInit — 1.281 s
success delete html and css files from previous builds — 0.161 s
success initialize cache — 0.024 s
success copy gatsby files — 0.067 s
success onPreBootstrap — 0.035 s
success source and transform nodes — 0.895 s
success building schema — 19.220 s
success createPages — 0.353 s
success createPagesStatefully — 0.371 s
success onPreExtractQueries — 0.008 s
success update schema — 20.691 s
success extract queries from components — 0.738 s
success run graphql queries — 1.912 s — 16/16 8.38 queries/second
success write out page data — 0.014 s
success write out redirect data — 0.002 s
success onPostBootstrap — 0.004 s
info bootstrap finished - 69.401 s
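
For anyone comparing runs like these, a small sketch of how the per-step timings could be pulled out of the log and diffed. `parseTimings` is a hypothetical helper; the sample lines are copied from build#2 (v2.2.11) and build#9 (v2.1.39) above.

```javascript
// Parse lines such as "success building schema <dash> 283.182 s" into
// { step: seconds }. The separator in the log is an em dash (U+2014).
const parseTimings = log => {
  const timings = {}
  for (const line of log.split(`\n`)) {
    const match = line.match(/^success (.+?) \u2014 ([\d.]+) s/)
    if (match) timings[match[1]] = Number(match[2])
  }
  return timings
}

// Excerpt from build#2 (v2.2.11, no mplayer, cache cleared).
const v2211 = parseTimings(
  [
    `success building schema \u2014 267.430 s`,
    `success update schema \u2014 0.171 s`,
    `success run graphql queries \u2014 69.625 s \u2014 28/28 0.40 queries/second`,
  ].join(`\n`)
)

// Excerpt from build#9 (v2.1.39, no mplayer, cache cleared).
const v2139 = parseTimings(
  [
    `success building schema \u2014 19.830 s`,
    `success update schema \u2014 19.755 s`,
    `success run graphql queries \u2014 136.816 s \u2014 28/28 0.20 queries/second`,
  ].join(`\n`)
)

// Positive delta: the step is slower in v2.2.11 than in v2.1.39.
for (const step of Object.keys(v2211)) {
  const delta = v2211[step] - v2139[step]
  console.log(`${step}: ${delta.toFixed(3)} s`)
}
```

On these two runs, schema building is the step that got much slower in v2.2.11, while the schema update and query steps got faster.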

@DSchau
Contributor

DSchau commented Mar 26, 2019

@millette not slow at all! Thank you for doing this--we appreciate it!

@millette
Contributor

FYI, I'm not generating 60k pages, but I'm using a 60 MiB JSON file as a source. It's built with https://github.com/millette/gatsby-starter-location-github and the source data is generated with https://github.com/millette/ghraphql. I need to put some time into it soon since it's getting rather slow and bulky: http://dev.rollodeqc.com/en/ but that's all on me.

Thanks to all Gatsby contributors :-)

@millette
Contributor

Time to close this issue?

@bennetthardwick
Contributor Author

The changes have fixed my original issue at least. Thanks!
