Improve MDX optimize with sibling nodes (#10887)
bluwy authored May 1, 2024
1 parent 7edbb91 commit c997c1b
Showing 5 changed files with 224 additions and 19 deletions.
5 changes: 5 additions & 0 deletions .changeset/smart-rats-mate.md
@@ -0,0 +1,5 @@
---
"@astrojs/mdx": patch
---

Updates the `optimize` option to group static sibling nodes as a `<Fragment />`. This reduces the number of AST nodes and simplifies runtime rendering of MDX pages.
5 changes: 5 additions & 0 deletions .changeset/violet-snails-call.md
@@ -0,0 +1,5 @@
---
"@astrojs/mdx": patch
---

Fixes `export const components` keys detection for the `optimize` option
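
For context on the key-detection fix, here is a minimal sketch of the ESTree shape that `export const components = { ... }` parses into. The export statement is an `ExportNamedDeclaration` that wraps the `VariableDeclaration` in its `declaration` field, so the property list sits one level deeper than a bare `const` statement would put it. The `EsmSketch` type, the hand-written `esmProgram` object, and `componentKeys()` are hypothetical names for illustration. The fallback to `key.value` for quoted keys is an assumption of this sketch, not something the changeset states.

```ts
// Only the fields touched by the lookup are modelled here (hypothetical types).
interface KeySketch {
  type: 'Identifier' | 'Literal';
  name?: string;
  value?: string;
}
interface DeclaratorSketch {
  init?: { properties?: Array<{ key?: KeySketch }> };
}
interface EsmSketch {
  body: Array<{
    type: string;
    // Present on `ExportNamedDeclaration`
    declaration?: { declarations?: DeclaratorSketch[] };
    // Present only when the statement itself is a `VariableDeclaration`
    declarations?: DeclaratorSketch[];
  }>;
}

// Hand-written tree for: export const components = { h1: Title, 'my-note': Note }
const esmProgram: EsmSketch = {
  body: [
    {
      type: 'ExportNamedDeclaration',
      declaration: {
        declarations: [
          {
            init: {
              properties: [
                { key: { type: 'Identifier', name: 'h1' } },
                { key: { type: 'Literal', value: 'my-note' } },
              ],
            },
          },
        ],
      },
    },
  ],
};

function componentKeys(program: EsmSketch): string[] {
  const properties =
    program.body[0]?.declaration?.declarations?.[0]?.init?.properties ?? [];
  const keys: string[] = [];
  for (const property of properties) {
    // Assumption: identifier keys expose `name`, quoted keys expose `value`
    const key = property.key?.name ?? property.key?.value;
    if (key) keys.push(key);
  }
  return keys;
}

// Looking directly on the statement finds nothing for an export statement
const directLookup = esmProgram.body[0]?.declarations;
const detectedKeys = componentKeys(esmProgram);
```

In this sketch `directLookup` is `undefined`, while `detectedKeys` holds both component names.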
38 changes: 27 additions & 11 deletions packages/integrations/mdx/src/README.md
@@ -30,12 +30,7 @@ After:

```jsx
function _createMdxContent() {
return (
<>
<h1>My MDX Content</h1>
<pre set:html="<code class=...</code>"></pre>
</>
);
return <Fragment set:html="<h1>My MDX Content</h1>\n<code class=...</code>" />;
}
```

@@ -49,15 +44,20 @@ The next section explains the algorithm, which you can follow along by pairing w

### How it works

Two variables:
The flow can be divided into a "scan phase" and a "mutation phase". The scan phase searches for nodes that can be optimized, and the mutation phase applies the optimization on the `hast` nodes.

#### Scan phase

Variables:

- `allPossibleElements`: A set of subtree roots where we can add a new `set:html` property with its children as value.
- `elementStack`: The stack of elements (that could be subtree roots) while traversing the `hast` (node ancestors).
- `elementMetadatas`: A weak map to store the metadata used only by the mutation phase later.

Flow:

1. Walk the `hast` tree.
2. For each `node` we enter, if the `node` is static (`type` is `element` or `mdxJsxFlowElement`), record in `allPossibleElements` and push to `elementStack`.
2. For each `node` we enter, if the `node` is static (`type` is `element` or starts with `mdx`), record in `allPossibleElements` and push to `elementStack`. We also record additional metadata in `elementMetadatas` for the mutation phase later.
- Q: Why do we record `mdxJsxFlowElement`, it's MDX? <br>
A: Because we're looking for nodes whose children are static. The node itself doesn't need to be static.
- Q: Are we sure this is the subtree root node in `allPossibleElements`? <br>
@@ -71,8 +71,24 @@ Flow:
- Q: Why before step 2's `node` enter handling? <br>
A: If we find a non-static `node`, the `node` should still be considered in `allPossibleElements` as its children could be static.
5. Walk done. This leaves us with `allPossibleElements` containing only subtree roots that can be optimized.
6. Add the `set:html` property to the `hast` node, and remove its children.
7. 🎉 The rest of the MDX pipeline will do its thing and generate the desired JSX like above.
6. Proceed to the mutation phase.
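
The scan phase above can be sketched on a simplified tree as follows. This is an illustration under stated assumptions, not the plugin's code: `SketchNode`, `scanStaticRoots()` and the sample tree are hypothetical, the recursive walk stands in for the real visitor, and the leave handling (dropping a node whose parent is still a candidate) is assumed, because that part of the file is collapsed in this diff. The `ignoreComponentNames` check is left out for brevity.

```ts
// Simplified stand-in for hast/MDX nodes; not the real `hast` types.
interface SketchNode {
  type: string;
  tagName?: string;
  children?: SketchNode[];
}

const isNonStatic = (node: SketchNode): boolean => node.type.startsWith('mdx');
const isCandidate = (node: SketchNode): boolean =>
  node.type === 'element' ||
  node.type === 'mdxJsxFlowElement' ||
  node.type === 'mdxJsxTextElement';

function scanStaticRoots(root: SketchNode): Set<SketchNode> {
  const allPossibleElements = new Set<SketchNode>();
  const elementStack: SketchNode[] = [];

  const walk = (node: SketchNode, parent?: SketchNode): void => {
    // Enter: a non-static node disqualifies every ancestor on the stack.
    if (isNonStatic(node)) {
      for (const el of elementStack) allPossibleElements.delete(el);
    }
    // Enter: the node itself may still be a root for its own static children.
    if (isCandidate(node)) {
      elementStack.push(node);
      allPossibleElements.add(node);
    }
    for (const child of node.children ?? []) walk(child, node);
    // Leave (assumed): pop, and drop the node if its parent already covers it.
    if (isCandidate(node)) {
      elementStack.pop();
      if (parent && allPossibleElements.has(parent)) allPossibleElements.delete(node);
    }
  };

  walk(root);
  return allPossibleElements;
}

// Hypothetical tree: <div><h1>hi</h1></div> <div><Comp /><p>text</p></div>
const h1: SketchNode = { type: 'element', tagName: 'h1', children: [{ type: 'text' }] };
const staticDiv: SketchNode = { type: 'element', tagName: 'div', children: [h1] };
const comp: SketchNode = { type: 'mdxJsxFlowElement', children: [] };
const p: SketchNode = { type: 'element', tagName: 'p', children: [{ type: 'text' }] };
const mixedDiv: SketchNode = { type: 'element', tagName: 'div', children: [comp, p] };
const roots = scanStaticRoots({ type: 'root', children: [staticDiv, mixedDiv] });
```

In this sketch `roots` ends up holding `staticDiv`, `comp` and `p`. `h1` is dropped because `staticDiv` already covers it, and `mixedDiv` is dropped because it contains a component.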

#### Mutation phase

Inputs:

- `allPossibleElements` from the scan phase.
- `elementMetadatas` from the scan phase.

Flow:

1. Before we mutate the `hast` tree, each element in `allPossibleElements` may have siblings that can be optimized together. Sibling elements are grouped with the `findElementGroups()` function, which returns an array of element groups (new variable `elementGroups`) and mutates `allPossibleElements` to remove elements that are already part of a group.
- Q: How does `findElementGroups()` work? <br>
A: For each static element in `allPossibleElements`, we take its metadata from `elementMetadatas` and look up the next sibling node. If the next sibling is static and, when it is an element, is also in `allPossibleElements`, we group them together for optimization. This continues until it hits a non-static node or an element not in `allPossibleElements`, at which point the group is finalized and added to the returned result.

2. For each element in `allPossibleElements`, we serialize its children as HTML, set the result as the `set:html` property of the `hast` node, and remove its children.
3. For each element group in `elementGroups`, we serialize the group children as HTML and add it to a new `<Fragment set:html="..." />` node, and replace the group children with the new `<Fragment />` node.
4. 🎉 The rest of the MDX pipeline will do its thing and generate the desired JSX like above.
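
The grouping and replacement steps above can be sketched on a single list of siblings. This is a simplified illustration, not the plugin's implementation: `SketchNode`, `findRuns()`, `collapseRuns()` and the sample input are hypothetical, runs are found by scanning one `children` array rather than by reading `elementMetadatas`, and `toHtmlSketch()` is a naive stand-in for a real HTML serializer (it does no escaping).

```ts
// Simplified node shapes for illustration; not the real `hast`/MDX types.
type SketchNode =
  | { type: 'text'; value: string }
  | { type: 'element'; tagName: string; children: SketchNode[] }
  | {
      type: 'mdxJsxFlowElement';
      name: string;
      attributes: { type: 'mdxJsxAttribute'; name: string; value: string }[];
      children: SketchNode[];
    };

interface SketchGroup {
  startIndex: number;
  children: SketchNode[];
}

// Naive serializer standing in for a real HTML serializer (no escaping).
function toHtmlSketch(nodes: SketchNode[]): string {
  return nodes
    .map((node) => {
      if (node.type === 'text') return node.value;
      if (node.type === 'element') {
        return `<${node.tagName}>${toHtmlSketch(node.children)}</${node.tagName}>`;
      }
      return ''; // MDX nodes are never part of a group
    })
    .join('');
}

// Find runs of two or more groupable siblings, each starting at an optimizable element.
function findRuns(siblings: SketchNode[], optimizable: Set<SketchNode>): SketchGroup[] {
  const groups: SketchGroup[] = [];
  let i = 0;
  while (i < siblings.length) {
    const first = siblings[i];
    if (first.type !== 'element' || !optimizable.has(first)) {
      i++;
      continue;
    }
    let end = i + 1;
    while (end < siblings.length) {
      const next = siblings[end];
      if (next.type === 'mdxJsxFlowElement') break; // a non-static sibling ends the run
      if (next.type === 'element' && !optimizable.has(next)) break; // not optimizable
      end++;
    }
    if (end - i > 1) groups.push({ startIndex: i, children: siblings.slice(i, end) });
    i = end;
  }
  return groups;
}

// Replace each run with one <Fragment set:html="..." /> node. Runs are replaced
// last to first so that the start indexes of earlier runs stay valid.
function collapseRuns(siblings: SketchNode[], optimizable: Set<SketchNode>): void {
  const groups = findRuns(siblings, optimizable);
  for (let g = groups.length - 1; g >= 0; g--) {
    const group = groups[g];
    const fragment: SketchNode = {
      type: 'mdxJsxFlowElement',
      name: 'Fragment',
      attributes: [
        { type: 'mdxJsxAttribute', name: 'set:html', value: toHtmlSketch(group.children) },
      ],
      children: [],
    };
    siblings.splice(group.startIndex, group.children.length, fragment);
  }
}

// Hypothetical input: <h1>hello</h1>, "\n", <p>foo bar</p>, <Comp />, <h2>x</h2>, <p>y</p>
const el = (tagName: string, text: string): SketchNode => ({
  type: 'element',
  tagName,
  children: [{ type: 'text', value: text }],
});
const comp: SketchNode = { type: 'mdxJsxFlowElement', name: 'Comp', attributes: [], children: [] };
const siblings: SketchNode[] = [
  el('h1', 'hello'),
  { type: 'text', value: '\n' },
  el('p', 'foo bar'),
  comp,
  el('h2', 'x'),
  el('p', 'y'),
];
const optimizable = new Set(siblings.filter((node) => node.type === 'element'));
collapseRuns(siblings, optimizable);
// `siblings` is now [Fragment, Comp, Fragment]
```

Replacing in forward order would shrink the array first and leave the second run's `startIndex` pointing past its nodes, which is why the loop runs backwards.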

### Extra

@@ -82,7 +98,7 @@ Astro's MDX implementation supports specifying `export const components` in the

#### Further optimizations

In [How it works](#how-it-works) step 4,
In [Scan phase](#scan-phase) step 4,

> we remove all the elements in `elementStack` from `allPossibleElements`
123 changes: 115 additions & 8 deletions packages/integrations/mdx/src/rehype-optimize-static.ts
@@ -8,6 +8,11 @@ export interface OptimizeOptions {
ignoreComponentNames?: string[];
}

interface ElementMetadata {
parent: Node;
index: number;
}

const exportConstComponentsRe = /export\s+const\s+components\s*=/;

/**
@@ -29,10 +34,11 @@ export function rehypeOptimizeStatic(options?: OptimizeOptions) {
for (const child of tree.children) {
if (child.type === 'mdxjsEsm' && exportConstComponentsRe.test(child.value)) {
// Try to loosely get the object property nodes
const objectPropertyNodes = child.data.estree.body[0]?.declarations?.[0]?.init?.properties;
const objectPropertyNodes =
child.data.estree?.body[0]?.declaration?.declarations?.[0]?.init?.properties;
if (objectPropertyNodes) {
for (const objectPropertyNode of objectPropertyNodes) {
const componentName = objectPropertyNode.key?.name ?? objectPropertyNode.key?.value;
const componentName = objectPropertyNode.key?.name;
if (componentName) {
ignoreComponentNames.add(componentName);
}
@@ -45,18 +51,22 @@ export function rehypeOptimizeStatic(options?: OptimizeOptions) {
const allPossibleElements = new Set<Node>();
// The current collapsible element stack while traversing the tree
const elementStack: Node[] = [];
// Metadata used by `findElementGroups` later
const elementMetadatas = new WeakMap<Node, ElementMetadata>();

const isNodeNonStatic = (node: Node) => {
return node.type.startsWith('mdx') || ignoreComponentNames.has(node.tagName);
};

visit(tree, {
enter(node, key) {
enter(node, key, index, parents) {
// `estree-util-visit` may traverse in MDX `attributes`, we don't want that. Only continue
// if it's traversing the root, or the `children` key.
if (key != null && key !== 'children') return SKIP;

// @ts-expect-error read tagName naively
const isNodeIgnored = node.tagName && ignoreComponentNames.has(node.tagName);
// For nodes that can't be optimized, eliminate all elements in the
// `elementStack` from the `allPossibleElements` set.
if (node.type.startsWith('mdx') || isNodeIgnored) {
// For nodes that are not static, eliminate all elements in the `elementStack` from the
// `allPossibleElements` set.
if (isNodeNonStatic(node)) {
for (const el of elementStack) {
allPossibleElements.delete(el);
}
@@ -72,6 +82,12 @@ export function rehypeOptimizeStatic(options?: OptimizeOptions) {
if (node.type === 'element' || isMdxComponentNode(node)) {
elementStack.push(node);
allPossibleElements.add(node);

// @ts-expect-error MDX types for `.type` is not enhanced because MDX isn't used directly
if (index != null && node.type === 'element') {
// Record metadata for element node to be used for grouping analysis later
elementMetadatas.set(node, { parent: parents[parents.length - 1], index });
}
}
},
leave(node, key, _, parents) {
@@ -97,6 +113,11 @@ export function rehypeOptimizeStatic(options?: OptimizeOptions) {
},
});

// Within `allPossibleElements`, element nodes are often siblings and instead of setting `set:html`
// on each of the element node, we can create a `<Fragment set:html="...">` element that includes
// all element nodes instead, simplifying the output.
const elementGroups = findElementGroups(allPossibleElements, elementMetadatas, isNodeNonStatic);

// For all possible subtree roots, collapse them into `set:html` and
// strip of their children
for (const el of allPossibleElements) {
@@ -114,9 +135,95 @@ export function rehypeOptimizeStatic(options?: OptimizeOptions) {
}
el.children = [];
}

// For each element group, we create a new `<Fragment />` MDX node with `set:html` of the children
// serialized as HTML. We insert this new fragment, replacing all the group children nodes.
// We iterate in reverse to avoid changing the index of groups of the same parent.
for (let i = elementGroups.length - 1; i >= 0; i--) {
const group = elementGroups[i];
const fragmentNode = {
type: 'mdxJsxFlowElement',
name: 'Fragment',
attributes: [
{
type: 'mdxJsxAttribute',
name: 'set:html',
value: toHtml(group.children),
},
],
children: [],
};
group.parent.children.splice(group.startIndex, group.children.length, fragmentNode);
}
};
}

interface ElementGroup {
parent: Node;
startIndex: number;
children: Node[];
}

/**
* Iterate through `allPossibleElements` and find elements that are siblings, and return them. `allPossibleElements`
* will be mutated to exclude these grouped elements.
*/
function findElementGroups(
allPossibleElements: Set<Node>,
elementMetadatas: WeakMap<Node, ElementMetadata>,
isNodeNonStatic: (node: Node) => boolean
): ElementGroup[] {
const elementGroups: ElementGroup[] = [];

for (const el of allPossibleElements) {
// Non-static nodes can't be grouped. It can only optimize its static children.
if (isNodeNonStatic(el)) continue;

// Get the metadata for the element node, this should always exist
const metadata = elementMetadatas.get(el);
if (!metadata) {
throw new Error(
'Internal MDX error: rehype-optimize-static should have metadata for element node'
);
}

// For this element, iterate through the next siblings and add them to this array
// if they are text nodes or elements that are in `allPossibleElements` (optimizable).
// If one of the next siblings don't match the criteria, break the loop as others are no longer siblings.
const groupableElements = [el];
for (let i = metadata.index + 1; i < metadata.parent.children.length; i++) {
const node = metadata.parent.children[i];

// If the node is non-static, we can't group it with the current element
if (isNodeNonStatic(node)) break;

if (node.type === 'element') {
// This node is now (presumably) part of a group, remove it from `allPossibleElements`
const existed = allPossibleElements.delete(node);
// If this node didn't exist in `allPossibleElements`, it's likely that one of its children
// are non-static, hence this node can also not be grouped. So we break out here.
if (!existed) break;
}

groupableElements.push(node);
}

// If group elements are more than one, add them to the `elementGroups`.
// Grouping is most effective if there's multiple elements in it.
if (groupableElements.length > 1) {
elementGroups.push({
parent: metadata.parent,
startIndex: metadata.index,
children: groupableElements,
});
// The `el` is also now part of a group, remove it from `allPossibleElements`
allPossibleElements.delete(el);
}
}

return elementGroups;
}

function isMdxComponentNode(node: any) {
return node.type === 'mdxJsxFlowElement' || node.type === 'mdxJsxTextElement';
}
@@ -0,0 +1,72 @@
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';
import { compile as _compile } from '@mdx-js/mdx';
import { rehypeOptimizeStatic } from '../../dist/rehype-optimize-static.js';

/**
* @param {string} mdxCode
* @param {Readonly<import('@mdx-js/mdx').CompileOptions>} options
*/
async function compile(mdxCode, options) {
const result = await _compile(mdxCode, {
jsx: true,
rehypePlugins: [rehypeOptimizeStatic],
...options,
});
const code = result.toString();
// Capture the returned JSX code for testing
const jsx = code.match(/return (.+);\n\}\nexport default function MDXContent/s)?.[1];
if (jsx == null) throw new Error('Could not find JSX code in compiled MDX');
return dedent(jsx);
}

function dedent(str) {
const lines = str.split('\n');
if (lines.length <= 1) return str;
// Get last line indent, and dedent this amount for the other lines
const lastLineIndent = lines[lines.length - 1].match(/^\s*/)[0].length;
return lines.map((line, i) => (i === 0 ? line : line.slice(lastLineIndent))).join('\n');
}

describe('rehype-optimize-static', () => {
it('works', async () => {
const jsx = await compile(`# hello`);
assert.equal(
jsx,
`\
<_components.h1 {...{
"set:html": "hello"
}} />`
);
});

it('groups sibling nodes as a single Fragment', async () => {
const jsx = await compile(`\
# hello
foo bar
`);
assert.equal(
jsx,
`\
<Fragment set:html="<h1>hello</h1>
<p>foo bar</p>" />`
);
});

it('skips optimization of components', async () => {
const jsx = await compile(`\
import Comp from './Comp.jsx';
# hello
This is a <Comp />
`);
assert.equal(
jsx,
`\
<><Fragment set:html="<h1>hello</h1>
" /><_components.p>{"This is a "}<Comp /></_components.p></>`
);
});
});
