Lazy loaded shiki languages during syntax highlighting #10618

43081j · 2024-03-30T16:10:02Z

Changes

There's a few changes in this PR, each contributing to memory usage improvement. Bundled into one PR since all but one are close to one-liners.

Lazily load shiki languages if they're not already loaded
- Rather than falling back to plain text, we attempt to load the language first
- We default to ['plaintext'] rather than all languages, leading to all others being lazily loaded by default
- This results in a chunky reduction of memory since we're no longer holding shiki's many language modules in memory for no reason
Discarding contexts after build end
- Some contexts were being held around permanently (i.e. retained in memory). By discarding them on build end, we allow them to be garbage collected
Move static image helper into a factory
- To avoid capturing local scopes, it has been moved into a factory
- Since we store it on globalThis, we were holding deep references to large objects before
- Now we no longer hold those refs, they can be garbage collected

Memory benchmark

Before:

	Elapsed time (s)	Memory used (MB)	Final memory (MB)
SSR build	3.11	91.74	128.05
Client build	0.11	-0.04	128.26
Static generate	0.91	4.88	132.53

After:

	Elapsed time (s)	Memory used (MB)	Final memory (MB)
SSR build	3.50	61.52	97.82
Client build	0.10	-41.66	56.20
Static generate	0.90	5.02	51.29

I did a few runs and got roughly these results each time.

Testing

Extra test added for shiki highlighting via a lazily loaded language
Memory benchmarking run to observe benefit of changes
Full test run for regression

Docs

We may need to update docs on the fact that the default shiki languages are now ['plaintext'], although it shouldn't make much difference to a user since the rest will be lazily loaded anyway if they use them.

For reviewers

Each commit describes its own change, so it may be worth reading the commit messages and/or the diff of the individual commit if you want to understand better.

changeset-bot · 2024-03-30T16:10:06Z

🦋 Changeset detected

Latest commit: aaddd40

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

43081j · 2024-03-30T16:29:58Z

making this a draft for now as its possible we can't have these nice things unless rehype lets us do async 👀

edit: false alarm, looks like it is happy

bluwy · 2024-04-01T07:02:50Z

Thanks! All these changes make sense to me. About shiki lazy load, we actually used to do the same but I removed it because I thought shikiji/shiki 1.0 already did that by default, but in hindsight it probably didn't because its highlighter functions are all sync anyways (which you pointed out).

About your message on Discord on version bumps, this will require a major for @astrojs/markdown-remark as it's published. Adding a major changeset for it should be enough and changesets will auto bump the rest (I hope). There's already many things within @astrojs/markdown-remark that had been marked deprecated, and this seems like the right time to clean it all up.

Would you be able to add a major changeset for@astrojs/markdown-remark? I'll handle the other major changes in another PR.

packages/markdown/remark/src/shiki.ts

43081j · 2024-04-01T13:32:47Z

have fixed the broken test

FYI it was working because of this it seems:

astro/packages/astro/src/runtime/server/escape.ts

Lines 99 to 103 in 0be26d8

    
           // If a promise, await the result and mark that. 
        
           else if (typeof str.then === 'function') { 
        
           	return Promise.resolve(str).then((value) => { 
        
           		return unescapeHTML(value); 
        
           	});

i suppose <Fragment set:html=${promise}> results in it awaiting the promise internally via the above, so it was working fine

edit: yeah of course, we await the child in renderChild so that was resulting in the highlight task being resolved properly

This moves the static image helper outside of the build hooks such that the scope is no longer captured. Since the static image helper exists on `window`, none of the function's scope will be garbage collected at any point. Part of said scope is `resolvedConfig`, which holds package caches, plugins, etc. These will also be retained. By moving the function into a factory, we no longer hold a reference to `resolvedConfig`, leading to lower memory usage.

This discards the SSR plugin context once a build has finished, so we're no longer holding it in memory and it can be garbage collected.

This discards the markdown processor once a build ends, allowing it to be garbage collected.

When highlighting code via shiki, we default to enabling all available languages. This results in shiki internally loading each individual language module, evaluating it, and keeping hold of it in memory. Instead, we should be able to load the bare minimum and (try) lazily load any missing languages later on. By doing this, we're not unnecessarily loading up dozens of large modules and will only load what the user consumes. Since traversal of the AST via unist is synchronous, we first do the traversal to find the nodes and later asynchronously _process_ the nodes (i.e. execute the highlighter and mutate the AST).

This adds a test for ensuring highlighting happens when using a lazily loaded language (typescript in this case).

Aligning the prism plugin with the shiki plugin - exporting an async visitor since the code block highlighter now returns a promise.

Markdoc actually supports async transforms already. Now that the shiki highlighting is async, we also need to make our transform async. Do note that typescript will be unhappy until markdoc/markdoc#495 is resolved.

bluwy

Thanks!

.changeset/real-rabbits-bake.md

github-actions bot added feat: markdown Related to Markdown (scope) pkg: create-astro Related to the `create-astro` package (scope) pkg: astro Related to the core `astro` package (scope) labels Mar 30, 2024

43081j force-pushed the shiki-memory branch from a3e4c0a to b080fa7 Compare March 30, 2024 16:16

43081j marked this pull request as draft March 30, 2024 16:29

github-actions bot added the pkg: integration Related to any renderer integration (scope) label Mar 30, 2024

43081j marked this pull request as ready for review March 30, 2024 17:22

bluwy reviewed Apr 1, 2024

View reviewed changes

packages/markdown/remark/src/shiki.ts Show resolved Hide resolved

bluwy mentioned this pull request Apr 1, 2024

Remove deprecated APIs from @astrojs/markdown-remark #10629

Merged

43081j force-pushed the shiki-memory branch from 2eb4792 to dc0a17a Compare April 1, 2024 13:30

43081j added 7 commits April 1, 2024 14:33

feat: discard plugin context after build

1986dda

This discards the SSR plugin context once a build has finished, so we're no longer holding it in memory and it can be garbage collected.

feat: discard markdown processor on build end

ec02b09

This discards the markdown processor once a build ends, allowing it to be garbage collected.

test: add a test for lazily loaded shiki languages

f45205b

This adds a test for ensuring highlighting happens when using a lazily loaded language (typescript in this case).

fix: make prism highlighting async

74cdd2f

Aligning the prism plugin with the shiki plugin - exporting an async visitor since the code block highlighter now returns a promise.

fix: make markdoc transforms async

5e9b258

Markdoc actually supports async transforms already. Now that the shiki highlighting is async, we also need to make our transform async. Do note that typescript will be unhappy until markdoc/markdoc#495 is resolved.

43081j force-pushed the shiki-memory branch from dc0a17a to 945a7b1 Compare April 1, 2024 13:33

fix: await highlight result in code component

4a48b2c

43081j force-pushed the shiki-memory branch from 945a7b1 to 4a48b2c Compare April 1, 2024 13:34

chore: add changeset

ee0002e

bluwy approved these changes Apr 1, 2024

View reviewed changes

bluwy reviewed Apr 1, 2024

View reviewed changes

.changeset/real-rabbits-bake.md Outdated Show resolved Hide resolved

Update .changeset/real-rabbits-bake.md

aaddd40

bluwy merged commit 374efcd into withastro:main Apr 1, 2024
4 checks passed

astrobot-houston mentioned this pull request Apr 1, 2024

[ci] release #10603

Merged

43081j deleted the shiki-memory branch April 1, 2024 15:41

bluwy mentioned this pull request Apr 12, 2024

Remove MDX processor on buildEnd #10770

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lazy loaded shiki languages during syntax highlighting #10618

Lazy loaded shiki languages during syntax highlighting #10618

43081j commented Mar 30, 2024 •

edited

Loading

changeset-bot bot commented Mar 30, 2024 •

edited

Loading

43081j commented Mar 30, 2024 •

edited

Loading

bluwy commented Apr 1, 2024

43081j commented Apr 1, 2024 •

edited

Loading

bluwy left a comment

Lazy loaded shiki languages during syntax highlighting #10618

Lazy loaded shiki languages during syntax highlighting #10618

Conversation

43081j commented Mar 30, 2024 • edited Loading

Changes

Memory benchmark

Testing

Docs

For reviewers

changeset-bot bot commented Mar 30, 2024 • edited Loading

🦋 Changeset detected

43081j commented Mar 30, 2024 • edited Loading

bluwy commented Apr 1, 2024

43081j commented Apr 1, 2024 • edited Loading

bluwy left a comment

Choose a reason for hiding this comment

43081j commented Mar 30, 2024 •

edited

Loading

changeset-bot bot commented Mar 30, 2024 •

edited

Loading

43081j commented Mar 30, 2024 •

edited

Loading

43081j commented Apr 1, 2024 •

edited

Loading