Collecting Requirements for Per-Language Splitting #88

Open
LorisSigrist opened this issue Apr 22, 2024 — with Linear · 24 comments
LorisSigrist (Collaborator) commented Apr 22, 2024

Context

Paraglide currently splits messages by component / page. If you load a page with three client components (or your framework's equivalent), only the messages used by those three components are sent to the client. However, they are sent in all languages. Ideally we would only send messages in the language that is displayed.

This issue collects ideas on how that could be achieved.

Expected Impact - Case Study Inlang.com

The average translation (one message in one language) on Inlang.com is about 50 to 60 bytes. Multiply that by the number of languages (7) and you get the average cost per message: about 400 bytes.

There are about 200 messages on the website, but because of per-page splitting only about 20 of them are loaded on an average page. This leaves us with a bundle-size impact of 400 B × 20 = 8 kB per page on average.

If we got per-language splitting to work on top of that, it could save 6 out of every 7 bytes, leaving us at just over 1 kB. This would be a huge win, but only if the language-splitting runtime adds less than about 7 kB to the client bundle.

Inlang.com has 7 languages, which is more than most sites; usually you would have between 2 and 4. So the realistic size limit for a per-language splitting runtime is about 2 kB. For context: i18next is 40 kB.
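The arithmetic above can be sketched as a small back-of-the-envelope model. It is not a measurement. `estimatePageMessages` is a hypothetical helper, and 57 bytes per translation is an assumed value from the 50 to 60 byte range, chosen so that 57 × 7 ≈ 400 bytes per message.

```javascript
// Rough per-page message payload, using the inlang.com figures above.
// bytesPerTranslation = 57 is an assumption picked from the 50-60 byte range.
function estimatePageMessages({ bytesPerTranslation, languages, messagesPerPage }) {
  // What is shipped today: every message on the page, in every language.
  const allLanguages = bytesPerTranslation * languages * messagesPerPage;
  // What per-language splitting would ship: the same messages, one language.
  const oneLanguage = bytesPerTranslation * messagesPerPage;
  // A splitting runtime only pays off if it is smaller than the bytes it saves.
  const runtimeBudget = allLanguages - oneLanguage;
  return { allLanguages, oneLanguage, runtimeBudget };
}

console.log(estimatePageMessages({ bytesPerTranslation: 57, languages: 7, messagesPerPage: 20 }));
// { allLanguages: 7980, oneLanguage: 1140, runtimeBudget: 6840 }
console.log(estimatePageMessages({ bytesPerTranslation: 57, languages: 3, messagesPerPage: 20 }));
// { allLanguages: 3420, oneLanguage: 1140, runtimeBudget: 2280 }
```

With 7 languages the budget is roughly the 7 kB mentioned above. With a more typical 3 languages it shrinks to roughly 2 kB.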

Work done so far

We have already tried a few approaches & run into various challenges.

  • Copying the routes/ directory for each language & using middleware to multiplex between the different builds based on language.
    • Imports from in/out of the routes/ folder are incredibly fragile
    • Doesn't work for all routers
    • Only works if the framework has a rewrite mechanism
  • Post-processing the build output by copying each output file for each language and replacing messages with the language-specific version.
    • Doesn't work with compressed build outputs
    • Introduces various linking issues

Fundamentally this is a dynamic linking problem in a world of ESM and static linking, which is really hard.

Another promising idea that we haven't tried yet is to serialize the messages and pass them along with the page data. However, it is an open question how we would know which messages need to be sent.
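Here is a minimal sketch of the serialization side. It assumes the set of message IDs for the page is already known, which is exactly the open question. The message table, `embedMessages`, `parseEmbedded`, and the `page-messages` element id are all hypothetical. The technique itself is the common one of embedding page data as an inline `application/json` script and escaping `<` so that message text cannot close the tag.

```javascript
// Hypothetical compiled messages for all locales (not Paraglide's real format).
const allMessages = {
  en: { greeting: "Hello", bold_note: "<b>Note</b>" },
  de: { greeting: "Hallo", bold_note: "<b>Hinweis</b>" },
};

// Serialize only the requested messages, in one locale, into an inline JSON
// script. Escaping "<" as \u003c keeps a message containing "</script>" from
// terminating the tag, and the result is still valid JSON.
function embedMessages(locale, ids) {
  const subset = Object.fromEntries(ids.map((id) => [id, allMessages[locale][id]]));
  const json = JSON.stringify(subset).replace(/</g, "\\u003c");
  return `<script id="page-messages" type="application/json">${json}</script>`;
}

// In the browser this would be
// JSON.parse(document.getElementById("page-messages").textContent);
// here the tag string is stripped so the round trip can run in Node.
function parseEmbedded(scriptTag) {
  const json = scriptTag.replace(/^<script[^>]*>/, "").replace(/<\/script>$/, "");
  return JSON.parse(json);
}
```

For example, `embedMessages("de", ["greeting"])` ships only `{"greeting":"Hallo"}`. The English text and any message not on the page are never sent.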

Note: Lazy Loading is not the Solution

Any solution using fetch or await import is bound to introduce a render-fetch waterfall which drastically increases Time-To-Interactive. Eagerly loading messages in all languages is preferable in the vast majority of cases.

Most projects have between 2 and 4 languages; lazy-loading only becomes justifiable at more than 10.

@LorisSigrist LorisSigrist added the Feature label Apr 22, 2024 — with Linear
@LorisSigrist LorisSigrist self-assigned this Apr 22, 2024

osdiab commented Apr 30, 2024

Keenly watching this. Seems like a core make-or-break feature that determines whether this library can truly scale.

@LorisSigrist (Collaborator, Author)

Per-language splitting is one of our big goals!

That being said, Paraglide already scales really well. Because of its small footprint (tiny runtime, minified message IDs, per-client-component splitting), it stays small even when shipping extra languages.

We did some benchmarks on this:

  • As long as you stay under 5 languages, Paraglide is already the smallest choice.
  • If you're using a framework with server components, islands, or some other form of partial hydration, it stays the best choice for up to 10 languages.

Per-language splitting will make Paraglide the best choice regardless of how many languages you have, but for a lot of projects it already is.


osdiab commented May 24, 2024

Another promising idea that we haven't tried yet is to serialize the messages & pass them along with the page-data. However, there are open questions on how we would know which messages need to be sent

Maybe leveraging AsyncLocalStorage (NextJS already seems to use this for headers()) to get a per-request context could help, with the translation functions adding each message they render to a list at runtime?
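Here is a minimal sketch of this idea using Node's `AsyncLocalStorage` from `node:async_hooks`. The message table, `trackedMessage`, `renderPage`, and `handleRequest` are hypothetical stand-ins and not Paraglide's compiled output. Each request runs inside its own `Set`, and every message function that executes adds its ID to that set.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

// Per-request store: the set of message IDs rendered while handling one request.
const renderedMessages = new AsyncLocalStorage();

// Hypothetical compiled messages; not Paraglide's real output format.
const translations = {
  en: { greeting: "Hello", farewell: "Goodbye" },
  de: { greeting: "Hallo", farewell: "Tschüss" },
};

// Wrap a message so each call records its ID in the current request's store
// (outside of a request there is no store, so nothing is recorded).
function trackedMessage(id) {
  return (locale) => {
    renderedMessages.getStore()?.add(id);
    return translations[locale][id];
  };
}

const m = { greeting: trackedMessage("greeting"), farewell: trackedMessage("farewell") };

// Hypothetical server render in which `farewell` only appears conditionally.
function renderPage(locale, showFarewell) {
  let html = `<h1>${m.greeting(locale)}</h1>`;
  if (showFarewell) html += `<p>${m.farewell(locale)}</p>`;
  return html;
}

// Run one request inside a fresh store and report which messages it used.
function handleRequest(locale, showFarewell) {
  return renderedMessages.run(new Set(), () => {
    const html = renderPage(locale, showFarewell);
    return { html, usedIds: [...renderedMessages.getStore()] };
  });
}
```

`handleRequest("de", false)` reports only `greeting`. `farewell` is never recorded, because that branch did not run on the server, even though the client could render it later.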

@LorisSigrist (Collaborator, Author)

That's an interesting idea. However, it likely only catches the messages that are actually executed during server rendering, not messages that are used conditionally. We would need those too.


osdiab commented May 25, 2024

Hmm, yeah, in that case it probably can't be a runtime thing. Maybe we could crawl the AST at compile time to find every invocation of a translation function, traversing from the entry point of each route. That entry point should be clear for each meta-framework (e.g. for NextJS, any default export from a page/layout/route file); I'm not sure how one would achieve this framework-agnostically, though.
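Here is a rough sketch of the compile-time traversal. Starting from a route's entry module, it walks the import graph and collects every `m.<id>(` call, including calls in branches that never run during server rendering. The project layout and `messagesForRoute` are hypothetical. The regular expressions are only a crude stand-in for a real AST parser: they would miss re-exports, dynamic imports, an aliased `m` object, and so on.

```javascript
// Hypothetical project: module path -> source text.
const modules = {
  "routes/page.js": `
    import { Header } from "components/header.js";
    export default function Page(props) {
      return Header() + (props.loggedIn ? m.welcome_back() : m.sign_up());
    }`,
  "components/header.js": `
    export function Header() { return m.site_title(); }`,
  "routes/admin/page.js": `
    export default function Admin() { return m.admin_panel(); }`,
};

// Crude stand-ins for AST analysis; a real tool would parse the source.
const IMPORT_RE = /from\s+["']([^"']+)["']/g;
const MESSAGE_RE = /\bm\.([A-Za-z_$][\w$]*)\s*\(/g;

// Depth-first traversal from a route's entry module, collecting every message
// referenced in any reachable module, whether or not it runs at render time.
function messagesForRoute(entry) {
  const seen = new Set();
  const messages = new Set();
  const stack = [entry];
  while (stack.length > 0) {
    const path = stack.pop();
    if (seen.has(path) || !(path in modules)) continue;
    seen.add(path);
    const source = modules[path];
    for (const [, id] of source.matchAll(MESSAGE_RE)) messages.add(id);
    for (const [, dep] of source.matchAll(IMPORT_RE)) stack.push(dep);
  }
  return [...messages].sort();
}
```

`messagesForRoute("routes/page.js")` returns `["sign_up", "site_title", "welcome_back"]`. Both sides of the conditional are included, which is exactly what the runtime approach misses. Unreachable routes such as the admin page contribute nothing.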


minht11 commented Jul 13, 2024

I think this could be solved with import maps, which let you control which URL a module specifier resolves to at load time. The main caveat is that the import map must be inserted before any module loading occurs.

With this, only one language would be loaded. The downside is that i18n module keys could not be inlined into the bundle like they are right now, or whole application chunks would need to be duplicated.

I tested it locally and it works. I'm pretty sure something like this could be implemented by Paraglide relatively easily.

Main experiment code (the en.js and de.js files are not included):

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />

    <script type="application/javascript">
      const language = localStorage.getItem('language') ?? 'en';

      const im = document.createElement('script');
      im.type = 'importmap';
      im.textContent = JSON.stringify({
        imports: {
          language: `/${language}.js`,
        }
      });
      document.currentScript.after(im);
    </script>
    
  </head>
  <body>
    <div id="languageContent"></div>

    <button id="toggleLanguage">
      Toggle language
    </button>

    <script type="module">
      import { languageName } from 'language'

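      // Look up the elements explicitly rather than relying on id-named globals
      document.getElementById('languageContent').innerHTML = `<h1>${languageName}</h1>`

      document.getElementById('toggleLanguage').addEventListener('click', () => {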
        const language = localStorage.getItem('language') ?? 'en';
        const newLanguage = language === 'en' ? 'de' : 'en';

        localStorage.setItem('language', newLanguage);

        window.location.reload();
      });
    </script>
  </body>
</html>


osdiab commented Jul 14, 2024

I think the problem with that is that the goal is to pass the i18n strings in the initial page load, not in a separate HTTP request after the page loads in the browser. The script tag would need to be parsed and executed in the client’s browser rather than happening entirely on the server.


minht11 commented Jul 14, 2024

If you inline the script inside the HTML, no separate request is made: it loads synchronously with the HTML, and since the import-map script is very small, its cost is minimal, far less than loading even 2 languages.

Nothing would be lazy-loaded; the i18n strings just need to be in separate chunks for the specifier imports to work (or there needs to be a separate app bundle for each language). Module preloading (native, or Vite's polyfill) should make a few more chunks a non-issue.

Also, my solution would work for SPAs too; in my use case I am not using a server or meta-framework. The server solution you were discussing sounds meta-framework-specific.
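If the HTML is rendered on a server or prerendered per locale, the inline import map can be emitted directly instead of being built by a client-side script. A `modulepreload` hint can go alongside it so the locale chunk downloads in parallel with the app code. This is a minimal sketch. The `messages` specifier, the `/messages/<locale>.js` URLs, and `renderHead` are hypothetical.

```javascript
// Hypothetical locale list and chunk URLs; not Paraglide output.
const LOCALES = ["en", "de"];

function renderHead(locale) {
  // Only ever emit known locales into the inline script.
  const safeLocale = LOCALES.includes(locale) ? locale : "en";
  const url = `/messages/${safeLocale}.js`;
  const importMap = JSON.stringify({ imports: { messages: url } });
  return [
    // The import map must appear before any module script is loaded.
    `<script type="importmap">${importMap}</script>`,
    // Start downloading the locale chunk in parallel with the app code.
    `<link rel="modulepreload" href="${url}">`,
  ].join("\n");
}
```

App code would then import from the bare specifier `"messages"` (the experiment uses `'language'`), and the browser resolves it to the chunk for the rendered locale.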

@LorisSigrist (Collaborator, Author)

Inlining the scripts would be very nice! We'll definitely prototype that.
I'm not yet sure how we would do the build transforms necessary for this.

  • How do we know which messages can be rendered on the current page? This information exists during tree-shaking, but we need it at runtime.
  • How do we get the client-side build to use the inlined messages?

It's a promising approach, though. I imagine it would generalize quite well across frameworks.


ambigos1 commented Sep 5, 2024

Hi @LorisSigrist,
I just noticed that all languages are getting downloaded, and I only found out after translating my app into 57 languages :(

Performance is my top priority, and deploying my website with 57 languages will hurt my Core Web Vitals.

I am currently using SvelteKit's static adapter, and my whole website is prerendered for all languages.
Is there a way to prevent downloading the JavaScript files with all languages since they are generated on build time as static HTML files?

Thank you very much for your time :)

@samuelstroschein (Member)

Hi @ambigos1

Is there a way to prevent downloading the JavaScript files with all languages since they are generated on build time as static HTML files?

Not at the moment. I might set a bounty on this issue, if anyone is up for implementing per-language splitting after #217.

@samuelstroschein (Member)

The new Vite Environment API https://main.vitejs.dev/guide/api-environment could be the solution we've been waiting for, by creating one environment per locale.

@samuelstroschein (Member)

Making the locale/language tag getter static on a per-build basis could be interesting. If the language getter is static in a given build, bundlers can drop the branches for the other languages and tree-shake the now-unused imports.

const jojo_mountain_day = (inputs, options = {}) => {
	const locale = "en";
	if (locale === "en") return en.jojo_mountain_day(inputs);
-	if (locale === "de") return de.jojo_mountain_day(inputs);
-	if (locale === "en-US") return en_US.jojo_mountain_day(inputs);
	return "jojo_mountain_day";
};

@samuelstroschein (Member)

I will look into per-language/locale splitting next week: #201 (comment)

@samuelstroschein (Member)

Trivial to implement in the compiler. The last open question is how to build per locale with bundlers. Vite's Environment API could be the breakthrough.

  • add a staticLocale to the runtime
  • define staticLocale at compile time
  • the bundler tree-shakes messages that don't correspond to the static locale 🚀
+const staticLocale = "de"

const greeting = (inputs, options = {}) => {
+	const locale = staticLocale ?? options.locale ?? getLocale();
	if (locale === "en") return en.greeting(inputs);
	if (locale === "de") return de.greeting(inputs);
	return "greeting";
};

export { greeting };
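Here is a sketch of the compiler side. It assumes a hypothetical `emitMessage` helper that generates the message function's source. When a static locale is given, it emits only that locale's branch. That is the same result a bundler would reach by constant-folding `locale` and tree-shaking the other imports, just done in the compiler itself. Without a static locale, it falls back to a runtime lookup.

```javascript
// Hypothetical compiler helper: emit the source of one message function.
// With staticLocale set, only that locale's branch is emitted, matching what
// dead-code elimination would leave behind; otherwise all branches remain.
function emitMessage(id, locales, staticLocale) {
  const targets = staticLocale === undefined ? locales : [staticLocale];
  const localeExpr =
    staticLocale === undefined ? "options.locale ?? getLocale()" : JSON.stringify(staticLocale);
  // Locale tags like "en-US" become valid identifiers like en_US.
  const branches = targets
    .map((l) => `\tif (locale === ${JSON.stringify(l)}) return ${l.replace(/-/g, "_")}.${id}(inputs);`)
    .join("\n");
  return [
    `const ${id} = (inputs, options = {}) => {`,
    `\tconst locale = ${localeExpr};`,
    branches,
    `\treturn ${JSON.stringify(id)};`,
    `};`,
  ].join("\n");
}
```

`emitMessage("greeting", ["en", "de"], "de")` produces a function that only references `de.greeting`, so the `en` module is never imported into the German build.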

@moufmouf

Hey!

For context, I'm in a situation where I have 10+ languages to translate and an SPA running in Svelte-only mode (no SvelteKit). See #351.

I just had an idea I wanted to share here. I don't think it was mentioned before.

If I understand correctly, you are scratching your heads to avoid lazy-loading because you assume it will cause an additional round-trip:

Any solution using fetch or await import is bound to introduce a render-fetch waterfall which drastically increases Time-To-Interactive. Eagerly loading messages in all languages is preferable in the vast majority of cases.

BUT! There are now ways to tell the browser to prefetch resources, for instance Early Hints: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/103

Basically, there might be no need to make one build per locale if you can ensure the correct message file is already loaded in the browser when your app is bootstrapping.

Did you guys already explore this solution?
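Here is a minimal sketch of the Early Hints variant. It relies on Node's built-in `http` response method `writeEarlyHints()`, available since Node 18.11. `pickLocale`, `handler`, the locale list, and the `/messages/<locale>.js` and `/app.js` URLs are hypothetical, and the Accept-Language parsing is deliberately naive.

```javascript
// Hypothetical locale list and chunk layout.
const LOCALES = ["en", "de"];

// Naive Accept-Language match: first language tag, primary subtag only.
// A real app would also honor the URL or a cookie.
function pickLocale(acceptLanguage = "") {
  const primary = acceptLanguage.split(",")[0].split(";")[0].trim().split("-")[0];
  return LOCALES.includes(primary) ? primary : "en";
}

function handler(req, res) {
  const locale = pickLocale(req.headers["accept-language"]);
  // 103 Early Hints: the browser may start fetching the locale chunk
  // before the final response (and its HTML) arrives.
  res.writeEarlyHints({ link: `</messages/${locale}.js>; rel=modulepreload` });
  res.writeHead(200, { "content-type": "text/html; charset=utf-8" });
  res.end(
    `<!doctype html><html lang="${locale}"><head>` +
      `<script type="module" src="/app.js"></script></head></html>`
  );
}
```

To serve it, pass the handler to `require("node:http").createServer(handler).listen(3000)`. Whether the 103 response reaches the browser, and which `rel` values the browser honors in it, may depend on the protocol, proxies, and CDN in between. Treat `modulepreload` here as an assumption to verify.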

samuelstroschein (Member) commented Jan 29, 2025

@moufmouf thanks for the idea. There might be something to it.

The waterfall is not the main issue. The main issue is keeping message functions like m.happy_elephant() from becoming async. The moment that happens, complexity explodes: every render turns async, which requires suspense, etc.

export async function happy_elephant(){
    if (locale === "de") return await import("./de.js")
    // ...
}
function Component() {
   // 💥 happy_elephant() returns a promise
   // which will render <p>Promise</p>
   return <p>{m.happy_elephant()}</p>
}

What could work, however, is ESM's new top-level await. If the locale is set before the top-level await of a message bundle function is executed (the bundle function "bundles" the messages for all locales), then your approach could work! The bundler tree-shakes unused message bundle functions, and the message bundle function lazy-loads the message for the current locale!

// top-level import of the messages in the current locale
// (`locale` must already be known when this module is evaluated)
const messages = await import(`./${locale}.js`)

export function happy_elephant() {
   return messages.happy_elephant()
}

I will investigate this! This might have legs!
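The same "load once, then stay synchronous" idea can also be sketched without top-level await. The locale's messages are awaited once during bootstrap, and the message functions read from a module-level slot. `chunks`, `loadMessages`, and `bootstrap` are hypothetical. In a real build, `loadMessages` would be a dynamic `import()` of the locale chunk.

```javascript
// Hypothetical per-locale message modules; in a real build each would be a
// separate chunk loaded with a dynamic import of the locale file.
const chunks = {
  en: { happy_elephant: () => "Happy elephant" },
  de: { happy_elephant: () => "Glücklicher Elefant" },
};
const loadMessages = async (locale) => chunks[locale];

// Message functions stay synchronous: they read from a module-level slot
// that is filled once, before the first render.
let current = null;

const m = {
  happy_elephant() {
    if (current === null) throw new Error("messages not loaded; call bootstrap() first");
    return current.happy_elephant();
  },
};

// Await the locale's messages exactly once, then render synchronously.
async function bootstrap(locale, render) {
  current = await loadMessages(locale);
  return render();
}
```

Calling `bootstrap("de", render)` with a render function that returns `m.happy_elephant()` resolves to `"Glücklicher Elefant"`. Components never see a promise, because the only `await` happens before the first render.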

@ambigos1

I am using prerendering, and all the text is generated at compile time.
I don't even need all the messages to be loaded in production.
If I only knew how I could remove them from the build output, it would help me boosting my performance

@samuelstroschein (Member)

If I only knew how I could remove them from the build output, it would help me boosting my performance

@ambigos1 nice one. Opened #354.
