Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LHCI] [WIP] Migrate to MDX v2 and gatsby-plugin-mdx v4 to resolve mdx import app bundle issue #4426

Closed
wants to merge 26 commits into from

Conversation

randychilau
Copy link
Contributor

@randychilau randychilau commented Jun 24, 2023

Description

This PR is currently a proof of concept/work-in-progress for discussion purposes, as there are issues with the migration.

Currently the site is using a version of MDX and gatsby-plugin-mdx that has the following issue:

MDX currently shoehorns imports into the app bundle, and adds all imports to the global scope (causing imports to need to be namespaced to avoid clashing). (src)

This means any .mdx file that uses import to include something in the page will be added to the app bundle for every page that loads, which unnecessarily adds to the page weight and can potentially increase loading times, as well as reduce performance, especially on mobile.


For example, this mdx file:
src/collections/programs/sca-contributhon/index.mdx

imports the image sca.svg
import {ReactComponent as ScaLogo} from "./sca.svg";

when you view the treemap from a recent lighthouse report for the /about page:

image

It is included even though it is not displayed on the /about page. As you can see the size of the collection part of the bundle is over 300K
image
which is a significant percentage of the page weight.


Migrating the site to MDX v2 and gatsby-plugin-mdx v4 has many friction points, one of which includes updating .mdx files to be compatible with MDX v2 (would like to discuss with anyone experienced using MDXv2 and Gatsby).

However, this WIP is able to show the benefit of the migration on a executed build's treemap where the collection is less than 25K and the total page weight is reduced:

image
image


From the little experience I've had attempting this migration locally, it seems migrating to version upgrades can have lots of friction points and potential sacrifices that require discussion about cost/benefit (I hope to have a list of these in the near future). As many in community have done a lot of work on performance (e.g. @ayushthe1 and @Udit-takkar ) and upgrades/migrations (e.g. @Nikhil-Ladha ), I am curious on the thoughts of this proposed upgrade, and other potential upgrades like Gatsby v5 [status] and React 18.

I look forward to everyone's feedback and the conversation.

Cheers,
Randy


Notes for Reviewers

One major reason this is a WIP, is that the build fails on local and GitHub default settings during the Build HTML renderer step with the error message:
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory [Gatsby info]

I can successfully create a local build of this branch using the following command (based on this info):
NODE_OPTIONS=--max_old_space_size=4096 gatsby build

My guess is that processing .mdx files in MDXv2 is causing a memory spike for some reason (when I reduce the number of mdx files, I can successfully run the build on default settings). One guess is that many of the mdx files require significant reformatting to create static HTML pages. Would love to hear any other possible causes.

potentially related reports on Gatsby/Github:
Heap memory usage increase during development builds after upgrading to v4.24.5 #36899
Gatsby 5 Build Fails - FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory #37010


Signed commits

  • Yes, I signed my commits.

@github-actions github-actions bot added area/blog New posts or new blog functionality area/careers area/community area/events area/learn Related to /learn section area/news A noteworthy article, event, happening area/projects An issue relating to Layer5 initiatives (projects) area/resources area/site-config labels Jun 24, 2023
@Boombag0607
Copy link
Member

Boombag0607 commented Jun 26, 2023

I think for this to work you would have to add NODE_OPTIONS=--max_old_space_size=4096 in the .env.development file for gatsby build to recognise them.

@randychilau
Copy link
Contributor Author

Hi @Boombag0607,

Thanks for reading through this PR and the information. I will try it out to get a Preview built for demo purposes.

I believe the need for more more memory during the build is an indicator of migration issue(s) that should be resolved before moving forward.

I am inexperienced in troubleshooting these types of problems and slowly working through guesses on the cause. Anyone with thoughts on isolating memory leaks or monitoring build activity (file creation etc), please let me know.

Cheers,
Randy

Copy link
Contributor

@Nikhil-Ladha Nikhil-Ladha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Randy,

Upgrades can be a really tiring task, I know. Really appreicate for taking up the initiative for this.
Regarding the OOM issue - Is it reproducible on your local machine too? OR, is it just the CI?
I have seen this issue earlier as well when a modal package was added to the codebase, but I can't recall how we got rid of it (or did we got rid of it at all. :p). Anyway, I went though the gatsby GH issue and found that couple of people were able to resolve it by follwing some workarounds/fix, one that seems slight more reasonable is the no-sourcemap plugin, did you try this out as well? If not, maybe give it a try.
Another thing is optimising the gatsby-node.js file by reducing the amount of data queried by graphql, I doubt if there is any scope of reduction but maybe you can check that out once as well.

const BannerDefault = () => {
const imgHero = "../../../assets/images/homePage-images/Lee-Calcote-Cloud-Native-Rejekts.webp";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it was declared outside is because it is better to have all the imports together and outside the component, it just keeps the code cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Nikhil-Ladha,

Thanks, the reason I included the file inside the component was because of the way Gatsby described using their StaticImage component:

Because the image is loaded at build time, you cannot pass the filename in as a prop, or otherwise generate it outside of the component. It should either be a static string, or a local variable in the component’s scope.

However, I am not sure if I am interpreting the last sentence correctly.

Cheers,
Randy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's fine the way it was defined earlier. What they try to say is that you cannot generate the value at build time, it should be a static value decalred within the scope of the component. In this case, is it a static variable, just with global scope.
Side note, if it would have been incorrect then the site would have never worked till now :)

Comment on lines +9 to +22
import { Link } from "gatsby";
import { BlogWrapper } from "./src/collections/blog/Blog.style.js";
import { CareerWrapper } from "./src/collections/careers/Career.style.js";
import { BookWrapper } from "./src/collections/service-mesh-books/Book.style.js";
import { NewsWrapper } from "./src/collections/news/News.style.js";
import { ResourcesWrapper } from "./src/collections/resources/Resources.style.js";
import { ProgramsWrapper } from "./src/collections/programs/Programs.style.js";
import { ChapterStyle } from "./src/components/Learn-Components/Chapters-Style/chapters.style.js";
import Button from "./src/reusecore/Button";
import Blockquote from "./src/reusecore/Blockquote";
import BlockquoteAlt from "./src/reusecore/Blockquote/Blockquote-alt-style";
// import Code from "./src/components/CodeBlock";
import loadable from "@loadable/component";
const Code = loadable(() => import("./src/components/CodeBlock"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good cleanup.

@@ -11,7 +11,7 @@ const { paginate } = require("gatsby-awesome-pagination");
const { createFilePath } = require("gatsby-source-filesystem");
const config = require("./gatsby-config");

if (process.env.CI) {
if (process.env.CI === "true") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the CI property a bool?

Copy link
Contributor Author

@randychilau randychilau Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Nikhil-Ladha,

Yes, I thought so also, but when I tried to set the CI value to false, the CI conditional still executed as true, and then some googling and using typeof showed:

  • environment variables are always set as strings. (src)
  • when you set an env variable to false, e.g. DEBUG=false, it evaluates as the string "false" (src)

So I had to update all the usages of the CI variable (since it was relevant to a Lighthouse (LHCI) issue).

Thanks,
Randy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting find!

Comment on lines +137 to +140
// const MultiWorkshopTemplate = path.resolve(
// "src/sections/Careers/Careers-Programs-grid/index.js"
// );

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this commented out or just used for testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Nikhil-Ladha,

Yes, that commented part was used for testing to resolve another quirk about MDX v2 and gatsby-plugin-mdx v4, which was the removal of the MDXrenderer function which does not seem to have a clear workaround.

I will add descriptive comments to these kind of items for reviewers in the future.

Thanks,
Randy

Comment on lines +98 to +119
"overrides": [
{
"files": ["*.mdx", "*.md"],
"extends": ["plugin:mdx/recommended"],
"settings": {
"mdx/code-blocks": true,
"mdx/language-mapper": {},
},
"rules": {
"semi": 0,
"indent": 0,
"no-trailing-spaces": 0,
"object-curly-spacing": 0,
"space-infix-ops": 0,
"no-irregular-whitespace": 0,
"no-unused-expressions": 0,
"linebreak-style": 0,
"react/jsx-no-undef": 0,
"no-unused-vars": 0,
}
}
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason for these overrides?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @Nikhil-Ladha,

I am not familiar with the interactions of eslint, vscode, and using multiple plugins.

I believe the MDX eslint plugin to be helpful in creating/updating mdx files to be compatible with MDX v2, but I was encountering an issue where it seemed like rules from other extended plugins, were not allowing commits/saves or were reformatting the mdx files on save (since I added the file type to the checklint command).

Please let me know if there is an issue with the configuration or any other thoughts on how to have the .mdx files only get linted by the plugin.

Do .mdx files currently get linted or checked for MDX v1 compatibility in the current workflow?

Cheers,
Randy

Copy link
Contributor

@Nikhil-Ladha Nikhil-Ladha Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do .mdx files currently get linted or checked for MDX v1 compatibility in the current workflow?

Good question, and I doubt that happens as of now.
While commenting, I didn't notice that the override was for .md, .mdx files only. So, this should be fine.

However, we should enable some things in the lint like:

"semi": 0,
        "indent": ["error", 2],
        "no-trailing-spaces": 1,
        "object-curly-spacing": 1,
        "space-infix-ops": 0,
        "no-irregular-whitespace": 1,
        "no-unused-expressions": 1,
        "react/jsx-no-undef": 1,
        "no-unused-vars": 1,

@@ -4,7 +4,7 @@
font-weight: normal;
font-display: swap;
}

/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Nikhil-Ladha,

This was related to some LHCI testing and attempting to reduce unnecessary processing.

It seemed that only one of the fonts were being referenced, and the other font variants were not being used on the site, which then seemed to be unnecessary declarations (but perhaps it does not matter?)

Thanks for asking these questions, I should be adding more detailed comments in the future.

Cheers,
Randy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, some (or most) of the fonts were used, but if you are sure that they are not used anywhere in the site then I am fine with commenting them out for perf improvement.

@randychilau
Copy link
Contributor Author

Hi @Nikhil-Ladha,

Thanks for taking the time and energy to review this whole thing.

The OOM error and workaround success with the command NODE_OPTIONS=--max_old_space_size=4096 gatsby build is reproducible on my local machine. I will try to use add the command to the CI for the sole purpose of getting a build preview.

I appreciate your suggestions and will try them to see if they help.

I am currently trying to batch modify mdx files to remove any problematic HTML tags and move closer to the MDX v2 standard. The thinking is perhaps the memory spike (that causes the OOM) is due to processing .mdx files with incompatible html tags/structure (e.g. <p> inside <div>) or something related since OOM error occurs generally during the Building HTML renderer step.


notes:

  • if I reduce the number of mdx files, the build succeeds with default settings
  • I noticed that after using the workaround command for an initial cold build (after using command gatsby clean), subsequent "warm" builds are much faster (but still need the workaround)
  • the initial build causes my disk space to go down approximately 8-9GBs
cold build with workaround, total time: 1562.060379293 sec
success load gatsby config - 0.352s
success load plugins - 3.669s
success onPreInit - 0.014s
success initialize cache - 0.061s
success copy gatsby files - 1.071s
success Compiling Gatsby Functions - 1.474s
success onPreBootstrap - 1.763s
success createSchemaCustomization - 0.014s
success Checking for changed pages - 0.004s
success source and transform nodes - 16.772s
info Writing GraphQL type definitions to
/Users/landyrau/Documents/GitHub/layer5/.cache/schema.gql
success building schema - 0.938s
success create redirects
success createPages - 11.369s
success createPagesStatefully - 1.178s
info Total nodes: 6818, SitePage nodes: 799 (use --verbose for breakdown)
success Checking for changed pages - 0.022s
success Cleaning up stale page-data - 0.005s
success onPreExtractQueries - 0.003s
success extract queries from components - 14.659s
success write out redirect data - 0.024s
success Build manifest and related icons - 0.319s
success onPostBootstrap - 1.540s
info bootstrap finished - 75.089s
success write out requires - 0.067s
warn ./node_modules/react-accessible-accordion/dist/es/index.js
Attempted import error: 'useId' is not exported from 'react' (imported as
'useId').
success Building production JavaScript and CSS bundles - 477.968s
success Building HTML renderer - 753.266s
success Execute page configs - 0.418s
success Caching Webpack compilations - 0.012s
warn This query took more than 15s to run — which is unusually long and
might indicate you're querying too much or have some unoptimized code:
File path: /Users/landyrau/Documents/GitHub/layer5/src/sections/Blog/Blog-
single/index.js
success run queries in workers - 61.135s - 823/823 13.46/s
success Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs - 1409.306s -
857/857 0.61/s
success Merge worker state - 0.070s
success Rewriting compilation hashes - 0.052s
success Writing page-data.json files to public directory - 1.517s -
798/799 526.77/s
success Building static HTML for pages - 42.461s - 798/798 18.79/s
success Delete previous page data - 0.003s
success onPostBuild - 6.371s
info Done building in 1562.060379293 sec
warm build with workaround, total time: 418.916365853 sec
success load gatsby config - 0.388s
success load plugins - 2.593s
success onPreInit - 0.012s
success delete worker cache from previous builds - 0.009s
success initialize cache - 0.063s
success copy gatsby files - 0.771s
success Compiling Gatsby Functions - 1.053s
success onPreBootstrap - 1.324s
success createSchemaCustomization - 0.065s
success Checking for changed pages - 0.013s
success source and transform nodes - 12.440s
info Writing GraphQL type definitions to
/Users/landyrau/Documents/GitHub/layer5/.cache/schema.gql
success building schema - 2.546s
success create redirects
success createPages - 11.707s
success createPagesStatefully - 0.777s
info Total nodes: 6818, SitePage nodes: 798 (use --verbose for breakdown)
success Checking for changed pages - 0.021s
success Cleaning up stale page-data - 0.079s
success onPreExtractQueries - 0.008s
success extract queries from components - 14.941s
success write out redirect data - 0.099s
success Build manifest and related icons - 0.242s
success onPostBootstrap - 1.400s
info bootstrap finished - 67.423s
success write out requires - 0.069s
warn ./node_modules/react-accessible-accordion/dist/es/index.js
Attempted import error: 'useId' is not exported from 'react' (imported as
'useId').
success Building production JavaScript and CSS bundles - 29.736s
success Building HTML renderer - 219.035s
success Execute page configs - 0.610s
success Caching Webpack compilations - 0.016s
success run queries in workers - 16.443s - 662/662 40.26/s
success Merge worker state - 0.051s
success Rewriting compilation hashes - 0.054s
success Writing page-data.json files to public directory - 0.410s - 7/7
17.06/s
success Building static HTML for pages - 53.716s - 798/798 14.86/s
success onPostBuild - 13.449s
info Done building in 418.916365853 sec

Thanks again for the assistance.

Cheers,
Randy

@l5io
Copy link
Contributor

l5io commented Jun 27, 2023

🚀 Preview for commit 4aa7062 at: https://649a52087ba22531a0e3ac73--layer5.netlify.app

@randychilau
Copy link
Contributor Author

Hi All,

As suggested by @Boombag0607, I was able to get a preview build by adding the setting for additional memory in the command used by GitHub Actions:

image

image

My concern is that the need for more memory is an indication of a problem or a leak of some sort that requires attention, and ideally a migration/upgrade should not cause the current build process to change substantially.

Thanks,
Randy

@randychilau
Copy link
Contributor Author

randychilau commented Jun 28, 2023

Hi @Nikhil-Ladha,

Looks like updating and cleaning up the .mdx files hasn't helped with the issue, so that theory is out.

I tried the gatsby-plugin-no-sourcemaps plugin, but still got the OOM with the default settings, will try some additional tests.

Regarding:

Another thing is optimising the gatsby-node.js file by reducing the amount of data queried by graphql, I doubt if there is any scope of reduction but maybe you can check that out once as well.

I decided to go over the gatsby-node.js file, and was wondering if you knew where on the site it was using the "Section" parts of the learnNodes, for example:

const onCreateSectionNode = ({ actions, node, slug }) => {
  const { createNodeField } = actions;
  const parts = getSlugParts(slug);
  const [learnpath, course, section] = parts;

  createNodeField({ node, name: "learnpath", value: learnpath });
  createNodeField({ node, name: "slug", value: `learn/learning-paths${slug}` });
  createNodeField({ node, name: "permalink", value: `${config.siteMetadata.permalink}${slug}` });
  createNodeField({ node, name: "course", value: course });
  createNodeField({ node, name: "section", value: section });
  createNodeField({ node, name: "pageType", value: "section" });
};

and

const createSectionPage = ({ envCreatePage, node }) => {
  const {
    learnpath,
    slug,
    course,
    section,
    pageType,
    permalink,
  } = node.fields;

  envCreatePage({

    path: `${slug}`,
    component: path.resolve("src/sections/Learn-Layer5/Section/index.js"),
    context: {
      learnpath,
      slug,
      course,
      section,
      pageType,
      permalink,
    },
  });
};

I don't have a complete understanding about what is going on for the learnNodes, but it seems that the Section part is not being used on the site? (unfortunately it hasn't helped with the OOM issues).


I found some logging tools (one for webpack and another for memory usage), but I don't know how to interpret the results to find the issue or a resolution.

This has been quite challenging trying to determine the cause of the OOM issue, and unfortunately just doing trial and error with guesses at this point. Soon, I will probably have to close this PR if I cannot find a path forward.

Do you have strategies on how to approach this type of situation? What you would look for, how to test, etc? [this question is also to the community].

note: Gatsby has their pages on the topic:
Resolving Out-of-Memory Issues (link)
Improving Site Performance (link)

but nothing specific for .mdx files, or reasons why the mdx v2 is causing OOM, while v1 is working.

Cheers,
Randy

@randychilau
Copy link
Contributor Author

Hi All,

Closing this PR in favor of PR #4532 which is more focused on the MDX migration issue and doesn't include unrelated changes that I put in here.

Special thanks to @Nikhil-Ladha for his helpful comments and taking the time to review this, as well as others who provided input.

Cheers,
Randy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/blog New posts or new blog functionality area/careers area/community area/events area/handbook area/learn Related to /learn section area/news A noteworthy article, event, happening area/projects An issue relating to Layer5 initiatives (projects) area/resources area/site-config project/meshery type/core-styls
Development

Successfully merging this pull request may close these issues.

4 participants