Slow sidebar generation when there are a lot of items #1318

Closed

stavros-k opened this issue Dec 28, 2023 · 13 comments

stavros-k commented Dec 28, 2023

What version of starlight are you using?

0.15.1

What version of astro are you using?

4.0.8

What package manager are you using?

npm

What operating system are you using?

Linux

What browser are you using?

Chrome

Describe the Bug

Autogenerating the sidebar with a lot of pages has a large performance impact.

Little backstory:
I'm doing a PoC to convert a Docusaurus site (https://truecharts.org / https://github.com/truecharts/website/) to Starlight.
I started by dumping in all the .md files and running a build (fixed a few issues with missing frontmatter, etc.).
I instantly got an insanely fast build of under 40 seconds, compared to the 20+ minutes we had before with the cache prefilled.

Then I enabled sidebar autogeneration and things got weird :D

No sidebar: (screenshot of build output)
Sidebar: (screenshot of build output)

Keep in mind that most of the helm-security.md files are empty, just a title in the frontmatter and nothing else
(I had to disable a generation step due to other, unrelated perf issues).

Some more info might be gathered from the discord chat https://discord.com/channels/830184174198718474/1070481941863878697/1190075061797924974

I'd love to see some performance improvement regarding this!

Link to Minimal Reproducible Example

Participation

  • I am willing to submit a pull request for this issue.

HiDeoo commented Dec 28, 2023

Dumping some rough notes regarding the issue:

  • The issue was first brought up with a sidebar containing 1700 auto-generated entries.
  • After some quick testing, it turns out that the same slowness is observed with a sidebar containing 1700 entries added manually, with no auto-generation involved at all.
  • Commenting out the rendering of nested sublists (and thus not rendering a huge number of entries) seems to fix the issue, which may indicate that the problem lies in the rendering part rather than the data generation/auto-generation part.
  • The markup for the sidebar can differ from page to page and can even be overridden through component overrides.


delucis commented Jan 2, 2024

Thanks for the issue @stavros-k and the investigations @HiDeoo! Sounds like the problem is not the generation step then, but more what HiDeoo concluded: “the issue is more related to the rendering part”.

Subgroups use <Astro.self /> to render recursively, so I’m curious if that could be a source of slow rendering. Would be interesting to compare nested items vs flat items. Will try running some tests now.
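
For context, the recursive pattern in question looks roughly like this (a simplified sketch, not the actual Starlight component; the Entry shape here is made up for illustration):

---
// Each entry is either a plain link or a group with nested entries.
type Entry = { label: string; href?: string; entries?: Entry[] };
interface Props {
  entries: Entry[];
}
const { entries } = Astro.props;
---

<ul>
  {
    entries.map((entry) =>
      entry.entries ? (
        <li>
          {entry.label}
          <Astro.self entries={entry.entries} />
        </li>
      ) : (
        <li>
          <a href={entry.href}>{entry.label}</a>
        </li>
      )
    )
  }
</ul>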


delucis commented Jan 2, 2024

Sharing some preliminary test results. Looking at the second benchmark, it does look like we hit something exponential even with a flat sidebar.

Flat sidebar

This test uses the Starlight starter template and adds a number of extra top-level links to the sidebar (controlled via the ITEMS=n variable). Unsurprisingly, bigger sidebars take longer.
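
For reference, something like the following in astro.config.mjs will produce that kind of sidebar (a sketch, not necessarily the exact config used for these runs):

// astro.config.mjs
import { defineConfig } from 'astro/config';
import starlight from '@astrojs/starlight';

// Number of extra top-level sidebar links, controlled via ITEMS=n
const items = parseInt(process.env.ITEMS || '1', 10);

export default defineConfig({
  integrations: [
    starlight({
      title: 'Sidebar benchmark',
      sidebar: Array.from({ length: items }, (_, i) => ({
        label: `Item ${i}`,
        link: '/guides/example/',
      })),
    }),
  ],
});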

4 pages

Default starter template pages (3 pages + 404).

Benchmark 1: ITEMS=1 pnpm build
  Time (mean ± σ):      4.719 s ±  0.055 s    [User: 8.284 s, System: 0.834 s]
  Range (min … max):    4.640 s …  4.800 s    10 runs
 
Benchmark 2: ITEMS=1000 pnpm build
  Time (mean ± σ):      5.003 s ±  0.097 s    [User: 8.983 s, System: 0.868 s]
  Range (min … max):    4.917 s …  5.206 s    10 runs
 
Benchmark 3: ITEMS=2000 pnpm build
  Time (mean ± σ):      6.048 s ±  0.349 s    [User: 10.254 s, System: 0.892 s]
  Range (min … max):    5.388 s …  6.582 s    10 runs
 
Benchmark 4: ITEMS=4000 pnpm build
  Time (mean ± σ):     10.305 s ±  0.534 s    [User: 14.825 s, System: 0.987 s]
  Range (min … max):    9.420 s … 10.855 s    10 runs
 
Summary
  ITEMS=1 pnpm build ran
    1.06 ± 0.02 times faster than ITEMS=1000 pnpm build
    1.28 ± 0.08 times faster than ITEMS=2000 pnpm build
    2.18 ± 0.12 times faster than ITEMS=4000 pnpm build

103 pages

Same as above, but the default src/content/guides/example.md was duplicated 99 times.

Benchmark 1: ITEMS=1 pnpm build
  Time (mean ± σ):      6.672 s ±  0.176 s    [User: 11.485 s, System: 1.315 s]
  Range (min … max):    6.447 s …  6.953 s    10 runs
 
Benchmark 2: ITEMS=1000 pnpm build
  Time (mean ± σ):     10.652 s ±  0.392 s    [User: 17.624 s, System: 1.670 s]
  Range (min … max):   10.107 s … 11.273 s    10 runs
 
Benchmark 3: ITEMS=2000 pnpm build
  Time (mean ± σ):     53.578 s ±  2.254 s    [User: 63.410 s, System: 2.293 s]
  Range (min … max):   51.055 s … 57.011 s    10 runs

Benchmark 4: ITEMS=4000 pnpm build
  Time (mean ± σ):     186.694 s ±  6.164 s    [User: 210.440 s, System: 3.198 s]
  Range (min … max):   179.578 s … 199.538 s    10 runs
 
Summary
  ITEMS=1 pnpm build ran
    1.60 ± 0.07 times faster than ITEMS=1000 pnpm build
    8.03 ± 0.40 times faster than ITEMS=2000 pnpm build
   27.98 ± 1.18 times faster than ITEMS=4000 pnpm build


delucis commented Jan 2, 2024

I do think this is probably something Starlight is doing.

I made a quick benchmark using Astro’s empty/minimal template that can generate a different number of pages or list items and didn’t see the same exponential behaviour:

Benchmark 1: ITEMS=1 PAGES=1 pnpm build
  Time (mean ± σ):      2.206 s ±  0.076 s    [User: 3.084 s, System: 0.444 s]
  Range (min … max):    2.121 s …  2.329 s    10 runs
 
Benchmark 2: ITEMS=1000 PAGES=1 pnpm build
  Time (mean ± σ):      2.230 s ±  0.067 s    [User: 3.251 s, System: 0.443 s]
  Range (min … max):    2.141 s …  2.360 s    10 runs
 
Benchmark 3: ITEMS=2000 PAGES=1 pnpm build
  Time (mean ± σ):      2.426 s ±  0.149 s    [User: 3.556 s, System: 0.479 s]
  Range (min … max):    2.230 s …  2.703 s    10 runs
 
Benchmark 4: ITEMS=4000 PAGES=1 pnpm build
  Time (mean ± σ):      2.364 s ±  0.100 s    [User: 3.493 s, System: 0.461 s]
  Range (min … max):    2.238 s …  2.516 s    10 runs
 
Benchmark 5: ITEMS=1 PAGES=100 pnpm build
  Time (mean ± σ):      2.338 s ±  0.051 s    [User: 3.325 s, System: 0.489 s]
  Range (min … max):    2.255 s …  2.419 s    10 runs
 
Benchmark 6: ITEMS=1000 PAGES=100 pnpm build
  Time (mean ± σ):      3.734 s ±  0.147 s    [User: 5.536 s, System: 0.650 s]
  Range (min … max):    3.521 s …  3.968 s    10 runs
 
Benchmark 7: ITEMS=2000 PAGES=100 pnpm build
  Time (mean ± σ):      5.387 s ±  0.212 s    [User: 7.943 s, System: 0.866 s]
  Range (min … max):    5.043 s …  5.803 s    10 runs
 
Benchmark 8: ITEMS=4000 PAGES=100 pnpm build
  Time (mean ± σ):      8.176 s ±  0.552 s    [User: 12.653 s, System: 1.391 s]
  Range (min … max):    7.547 s …  9.142 s    10 runs
 
Summary
  ITEMS=1 PAGES=1 pnpm build ran
    1.01 ± 0.05 times faster than ITEMS=1000 PAGES=1 pnpm build
    1.06 ± 0.04 times faster than ITEMS=1 PAGES=100 pnpm build
    1.07 ± 0.06 times faster than ITEMS=4000 PAGES=1 pnpm build
    1.10 ± 0.08 times faster than ITEMS=2000 PAGES=1 pnpm build
    1.69 ± 0.09 times faster than ITEMS=1000 PAGES=100 pnpm build
    2.44 ± 0.13 times faster than ITEMS=2000 PAGES=100 pnpm build
    3.71 ± 0.28 times faster than ITEMS=4000 PAGES=100 pnpm build


HiDeoo commented Jan 2, 2024

Would you be able to share the repro using Astro’s minimal template or describe it, e.g. what the pages look like, whether they are physical pages on disk, etc.?

I'm asking because, with the same template, I'm getting a faster time than you for an ITEMS=1 PAGES=100 pnpm build setup, so I was expecting something faster than or close to your time for ITEMS=4000 PAGES=100 pnpm build, but I'm getting a much slower time 🤔

Benchmark 1: ITEMS=1 pnpm build
  Time (mean ± σ):      1.692 s ±  0.126 s    [User: 2.920 s, System: 0.334 s]
  Range (min … max):    1.553 s …  1.998 s    10 runs

Benchmark 2: ITEMS=4000 pnpm build
  Time (mean ± σ):     26.435 s ±  0.962 s    [User: 30.663 s, System: 1.147 s]
  Range (min … max):   24.636 s … 27.282 s    10 runs

Summary
  ITEMS=1 pnpm build ran
   15.62 ± 1.29 times faster than ITEMS=4000 pnpm build


delucis commented Jan 2, 2024

Would you be able to share the repro using Astro’s minimal template or describe it, e.g. what the pages look like, whether they are physical pages on disk, etc.?

I renamed index.astro to [slug].astro and made it generate all the pages:

---
import type { GetStaticPaths } from 'astro';
// Generate PAGES pages at /page-0, /page-1, … (defaults to 1)
export const getStaticPaths = (() => {
  return Array.from({ length: parseInt(import.meta.env.PAGES || '1') })
    .fill('')
    .map((_, i) => ({ params: { slug: `page-${i}` } }));
}) satisfies GetStaticPaths;
---

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <link rel="icon" type="image/svg+xml" href="/favicon.svg" />
    <meta name="viewport" content="width=device-width" />
    <meta name="generator" content={Astro.generator} />
    <title>Astro</title>
  </head>
  <body>
    <h1>Astro</h1>
    <ul>
      {
        Array.from({ length: parseInt(import.meta.env.ITEMS || '1') })
          .fill('')
          .map((_, i) => (
            <li>
              <a href={`page-${i}`}>{i}</a>
            </li>
          ))
      }
    </ul>
  </body>
</html>

It’s obviously a bit simpler than the Starlight template.

I’ve also tried running 0x to look at flamegraphs of what’s happening, but haven’t found it super useful yet. For example, the worst-performing build (Starlight with 4000 sidebar entries and 103 pages):

Flamegraph showing 75% of the stack spent on a read function internal to Node

Pretty similar with 2000 sidebar entries, except now weirdly undici also shows up in that section of the graph, even though I don’t think any network stuff is happening.

Flamegraph showing the same read function at 69% of the stack, but now on top of an undici internal


HiDeoo commented Jan 3, 2024

Pretty similar with 2000 sidebar entries, except now weirdly undici also shows up in that section of the graph, even though I don’t think any network stuff is happening.

I would think this is not directly related to network I/O but due to some streams API being used in the code.


I continued exploring a little bit more in some free time with node --prof, node --prof-process, node --inspect-brk and the Chrome DevTools profiler.

Starting from an example similar to yours, using the Starlight starter template with 103 pages and 4000 sidebar entries, I get the following results, which are close to yours as expected:

Benchmark 1: pnpm build
  Time (mean ± σ):     117.772 s ±  4.576 s    [User: 125.633 s, System: 2.044 s]
  Range (min … max):   109.370 s … 123.736 s    10 runs

However, if I apply the following change in Astro to src/core/render/core.ts:

	const response = await runtimeRenderPage(
		result,
		Component,
		renderContext.props,
		{},
-		env.streaming,
+		false,
		renderContext.route
	);

I get the following results:

Benchmark 1: pnpm build
  Time (mean ± σ):     11.561 s ±  0.480 s    [User: 19.444 s, System: 1.520 s]
  Range (min … max):   10.852 s … 12.282 s    10 runs

Unfortunately, I had to stop here and haven't yet had the time to investigate further.


delucis commented Jan 3, 2024

Very interesting, thanks @HiDeoo! That flag controls whether Astro renders to a ReadableStream or a string: https://github.com/withastro/astro/blob/da307e4a080483f8763f1919a05fa2194bb14e22/packages/astro/src/runtime/server/render/page.ts#L49-L53

Could be that rendering to the format required for stream consumption becomes much more expensive beyond a certain threshold for some reason 🤔
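
To illustrate the difference conceptually (this is a sketch, not Astro’s actual code), the two output shapes look something like this:

// Both consume the same sequence of rendered HTML chunks.
async function renderToString(chunks: AsyncIterable<string>): Promise<string> {
  const parts: string[] = [];
  for await (const chunk of chunks) parts.push(chunk);
  return parts.join('');
}

function renderToStream(chunks: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      // Encode each chunk and push it into the stream as it arrives.
      for await (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
      controller.close();
    },
  });
}

In SSG the stream still has to be drained back into a single buffer before the page can be written to disk (e.g. via something like new Response(stream).text()), which would line up with the undici-internal read function dominating the flamegraphs above.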


HiDeoo commented Jan 3, 2024

Yeah, and it looks like the bigger the stream gets, the worse it gets, with most of the time being spent in undici's readAllBytes(). I don't think we can do much about it from Starlight as it's not something exposed by Astro, although I'm starting to wonder if there are any real benefits to this path with SSG 🤔


delucis commented Jan 3, 2024

Yeah, from my testing there won’t be much we can do in Starlight. We can try pulling a few bits of logic up into sidebar generation and out of the component to see if that helps performance, but I suspect that would yield small incremental benefits rather than the big shift needed to unlock this. I did some tests where I simplified the component a bit, and you still hit this inflection point eventually.

I did notice a style hoisting bug with components in MDX when I built with a patch passing false instead of env.streaming like in your comment, @HiDeoo, so switching off streaming in SSG may require fixing some other bugs, but it does seem very attractive.

@matthewp has also been chatting with some of the Node.js folks and they suggested Astro doesn’t need ReadableStream at all and could do what it needs with an async iterator: https://twitter.com/matteocollina/status/1742634854972342599
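
A rough sketch of that idea (not an actual Astro patch; writePage and chunks are hypothetical names): the rendered chunks are consumed straight from an async iterator and written to disk, with no ReadableStream in between.

import { createWriteStream } from 'node:fs';
import { once } from 'node:events';

// `chunks` stands in for whatever async iterator the renderer would expose.
async function writePage(path: string, chunks: AsyncIterable<string>): Promise<void> {
  const out = createWriteStream(path);
  for await (const chunk of chunks) {
    // Respect backpressure from the file stream.
    if (!out.write(chunk)) await once(out, 'drain');
  }
  out.end();
  await once(out, 'finish');
}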

FWIW, some more data from my tests with a slightly simpler sidebar component (hence the trouble starting at higher item counts than we saw above):

Percent of the stack consumed by the readAllBytes() method

graph curve showing near 0% stack use at first but quickly increasing once list items is above 4000 items

(These are just results from one-off runs rather than proper benchmarks as I wasn’t sure how to get the flamegraph data otherwise)

Total build time (for a modest 23-page site)

graph showing an exponential curve where build time increases as list items increase

(10 runs each with hyperfine)

Full benchmark data

Command                    Mean [s]          Min [s]    Max [s]    Relative
ITEMS=1000 pnpm build       7.709 ± 0.325     7.386      8.564     1.00
ITEMS=2000 pnpm build       8.358 ± 0.146     8.104      8.511     1.08 ± 0.05
ITEMS=3000 pnpm build       8.693 ± 0.121     8.544      8.914     1.13 ± 0.05
ITEMS=4000 pnpm build       9.473 ± 0.175     9.206      9.834     1.23 ± 0.06
ITEMS=5000 pnpm build      16.209 ± 0.501    15.701     16.875     2.10 ± 0.11
ITEMS=6000 pnpm build      23.813 ± 1.553    20.433     25.278     3.09 ± 0.24
ITEMS=7000 pnpm build      21.604 ± 1.682    18.454     24.055     2.80 ± 0.25
ITEMS=8000 pnpm build      35.274 ± 3.218    32.081     39.589     4.58 ± 0.46
ITEMS=9000 pnpm build      46.008 ± 2.098    42.993     49.602     5.97 ± 0.37
ITEMS=10000 pnpm build     52.259 ± 2.289    48.252     55.392     6.78 ± 0.41


HiDeoo commented Jan 4, 2024

I did some tests where I simplified the component a bit, and you still hit this inflection point eventually.

Yeah, this is not something I even tried personally, as I imagined it would only slightly delay the issue.

The entire conversation you linked and the trick suggested by Matteo are super interesting.

Thanks for sharing all this; it's great to already have a lead on what the issue could be and on potential improvements.


delucis commented Jan 4, 2024

Yes, it would be cool if we uncovered some improvements for all Astro users!

For the simple change you suggested of disabling streaming in SSR, I'm currently testing the preview release in withastro/astro#9603

In the existing Astro benchmarks it seems to make no difference, but I don't think they include a test like this long-list case.


delucis commented Jan 5, 2024

Thanks again for the issue, @stavros-k! It’s led to quite a bit of investigation across the whole Astro team. There are currently two Astro PRs looking to improve performance here: withastro/astro#9603 and withastro/astro#9614.

To round off the data above: using the preview release gives a significant improvement in my benchmark, with the current Astro release in blue and the experimental preview release with streaming disabled (withastro/astro#9603) in red:

chart showing build time increasing exponentially as list size increases for current Astro and a more or less flat curve as list size increases with a preview release with streaming disabled

I can hopefully run the same benchmark for withastro/astro#9614 soon.

Given these investigations, I hope it’s OK that I close this issue here, as it seems to be something that needs improving upstream. I’ll be monitoring those two PRs and pushing for these improvements there!

delucis closed this as completed Jan 5, 2024