Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate ways to load additional data on demand #3530

Closed
jerch opened this issue Oct 25, 2021 · 11 comments
Closed

investigate ways to load additional data on demand #3530

jerch opened this issue Oct 25, 2021 · 11 comments
Labels
area/build Regarding the build process type/automation Relating to CI/CD pipeline, automation, etc. type/enhancement Features or improvements to existing features

Comments

@jerch
Copy link
Member

jerch commented Oct 25, 2021

Coming from #3524.

xterm.js currently has no way to load additional data on demand in the core. We are at "either it is there prebundled - or forget it".

We should investigate, how we can externalize rarely used (blob) data into additional resources, that can be requested on demand. Ideally things cover these aspects:

  • easy to use in code
  • either works transparently with TS' import statement, or does not interfere negatively
  • keep working for CJS/ESM
  • keeps working in nodejs and browser context
  • support proper electron bundling
  • keep working with common web bundlers (webpack, Rollup, parcel ...)

Sidenote: JSON imports are working for most above, but not all - like ESM currently cannot handle it. A workaround to that might be to use the ESM import() function at code level with real JS modules (JSON data written to proper code units). Needs investigation...

@jerch jerch added type/enhancement Features or improvements to existing features area/build Regarding the build process type/automation Relating to CI/CD pipeline, automation, etc. labels Oct 25, 2021
@jerch
Copy link
Member Author

jerch commented Nov 11, 2021

Some investigations:

The most straight forward solution is to be able to directly use

const moduleData = await import(...);

Sadly this does not work out of the box, has several drawbacks and raises security concerns:

  • Due to our project settings pointing to CJS output TS transpiles this to a promise-require construct, which again gets bundled into the the final package. TS created a special module: node12 output type to circumvent this particular transformation (see Add flag to not transpile dynamic import() when module is CommonJS microsoft/TypeScript#43329), but we cannot easily switch the module type (gives tons of issues at other ends).
  • The import statement still cannot natively deal with JSON, which is the primary goal here. Thus this still needs some sort of explicit JSON-to-JS transformation.
  • While import() would work with such a JSON-to-JS transformation, it places a much higher risk on those imports for being treated as executable JS content. From a security viewpoint this is a dimension shift - we leave the static single asset realms and ask for trouble from lazy loaded & executed JS content.

Searching further reveals, that we kinda stumbled into an uncommon edge case, where the tooling around TS + CJS output + bundlers fails to deliver a proper handling, if you have multiple targets (we have browsers, electron, nodejs). I found some SO posts from angular peeps, that were able to bridge similar requirements with a mixture of fetch/import in a separate JS module to circumvent TS' transformation and to generate code, that bundlers still can handle nicely.

So my idea goes also into that direction:

  • build a JS module exporting some sort of dynimport('path/to/resource'): Promise<data>
  • dynimport can conditionally resolve resource requests either with fetch() (JSON on browser) or import() (JS on browser/nodejs/electron) or a require() shim (JS/JSON on nodejs/electron)
  • limit handling to JSON for now, if we need JS dynamic imports later on, it can be extended by that
  • JS module can live in base repo next to TS sources, with some interface declaration
  • imported JSON resources prolly need some explicit type/interface declarations somewhere
  • security bonus - think about some sort of SRI/hash securing of lazy loaded resources
  • bundling bonus - offer some way to switch prebundling vs separate lazy resources during packaging (prolly would simplify electron builds)

Plz lets discuss the details, pitfalls or even completely different ideas, as I still might not see the whole picture. (Esp. ESM vs CJS seems to be tricky in TS, not sure yet, how that plays into the idea above.)

FYI: @Tyriar, @mofux, @meganrogge, @Eugeny

@Eugeny
Copy link
Member

Eugeny commented Nov 11, 2021

IMO dynimport looks good, but users should be able to override the implementation.

Points to consider:

  • Path format passed to dynimport()
  • Whether two separate builds (with statically bundled resources and without) are necessary
  • Fallback in case a user chooses not to provide the JSON files at all (i.e. disable named color support in OSC 4/10/11/12 color handling #3524)
  • If JS imports are coming in the future, whether we have to mandate the packaging format for these

@jerch
Copy link
Member Author

jerch commented Nov 11, 2021

@Eugeny

...but users should be able to override the implementation.

I agree, but wonder how we can achieve that in a straight forward way. I actually thought about multiple "loader backends" for electron, node/CJS, browser output or a pure static one, to deal with their specifics in a way bundlers still can handle. We could make the basic interface abstract, so peeps can provide their own implementation (if the standard ones do not fit).
Nasty thing about it - it certainly needs us to bundle multiple packages with the different loader impls, either integrated in multiple full xterm.js packages, or as separated small packages, that can be plugged into a loader API endpoint of xterm.js.
And the resource assets need to be prebundled for the loader formats as well (or we provide some easy build script for that).

Path format passed to dynimport()

Do you mean like introducing some sort of schemes, as nodejs did with their 'node:...' thingy?

Whether two separate builds (with statically bundled resources and without) are necessary

As indicated above, I hope avoid that with separate loader packages to be hooked into an xterm.js loader endpoint. Example for electron builds:

  • grab xterm.js package
  • grab electron loader package providing an electron based dynimport, which knows how to translate path specs into proper lookups on electron side, maybe even with proper asar lookup
  • instantiate terminal with some loader notion, e.g.:
    const electronImporter = new ElectronImporter('path/to/asar');  // from xterm-dynimport-electron package
    new Terminal({ ..., dynimport: electronImporter });
    ...
    // later on in terminal code
    await dynimport('/resource/symbol');

Fallback in case a user chooses not to provide the JSON files at all

Yes, for named colors this could be handled directly at the dynimport call (just do nothing if the import returns nothing). If we want that more fined-grained for embedders, we prolly could make it configurable via API, e.g. list all resources so peeps could enable/disable them during terminal initialization.

@mofux
Copy link
Contributor

mofux commented Nov 12, 2021

Whenever I look into the ES Module / CJS problem, I hit a point where I feel that public support / tooling is not yet ready.

At the current point in time, I'd rather pre-bundle the additional data with the core package or as an addon, instead of building a hacky tooling to deal with the different environments.

IMHO the effort for supporting and implementing such a dynimport tooling is not worth it for our current use-case. We should keep in mind that it also increases the implementation effort for the consumers of xterm.js.

We'll get there eventually -- once we can ship xterm.js as an ES Module exclusively and await import() works in browser and node. But sadly I feel CJS will stay with us for a long time to come.

@Eugeny
Copy link
Member

Eugeny commented Nov 12, 2021

We could avoid the chaos of multiple packages by completely relegating the dynimport() implementation to user and instead provide only a single own implementation in form of all resources bundled into a single JS that exports dynimport.

So:

  • Import xterm.js and xterm-resources.js if they want a quick start
  • Import xterm.js and provide own dynimport (or just pass ESM import as dynimport) for granular loading

@jerch
Copy link
Member Author

jerch commented Nov 12, 2021

Whenever I look into the ES Module / CJS problem, I hit a point where I feel that public support / tooling is not yet ready.

Yes, and kinda the whole ecosystem suffers from that. In theory with full ESM-isation of xterm.js we would not need any tooling in xterm.js at all, since it is a library, where even bundling our npm package "xterm.js" is superfluous. ESM is designed to grab through on declared resources, so things can either be delivered that way directly (yes it just works in browsers transparently), or grabbed during bundling on integrator side for final static assets creation. With all goodies like tree shaking working.
It is understandable that no one really wants to write bridging tooling for CJS to ESM, as it is doomed to get obsolete soon, just because ESM handles every aspect in a ESM-only world just fine, even the lazy loading issue.
The promises of ESM are high, but it also has its downsides - the scattered loading caps for example are a nightmare for security audits (can be fixed at integrator level by prebundling critical parts again, so nothing we as library maintainer have to worry about anymore), and JSON loading is still not working.

At the current point in time, I'd rather pre-bundle the additional data with the core package or as an addon, instead of building a hacky tooling to deal with the different environments.

Imho our current addon system cannot reflect the needs here, which are lazy loading of rarely used but data intensive resources in the terminal core. For one we have no lazy addon loading triggered from within the terminal, and second - how would you integrate that into core code? Writing dead endpoints to be grabbed from addon code? This would work, but undermines completely our API idea, as such addons always would have to dig deep into private code. (Or we bloat the API with every single aspect popping up.)
And third - addons are hooked very late after terminal initialization, thus early code stepping into dynimport needs would then just fail? (Btw thats even an issue of import() with its promise return in a terminal instance at top level JS nomodule context. But full ESM modularization with an API rewrite to Terminal(): Promise<TerminalType> can solve that.)

IMHO the effort for supporting and implementing such a dynimport tooling is not worth it for our current use-case. We should keep in mind that it also increases the implementation effort for the consumers of xterm.js.

The current use case of "named X11 colors" is not really one, we can just say "nope, not supported". But it revealed the problem underneath. Example - with proper unicode grapheme handling we are likely to need 50-200kB of additional data for different codepage segments, depending on how many unicode flags the code has to evaluate. Here it would be nice to split that data into smaller packages for these code segments to allow lazy loading on demand. This helps with overall package size, startup time and does not bloat the JS engine at runtime with data never needed.
The unicode story is the same everywhere - see nodejs, which moved from 18 MB (eng) to >50 MB (intl).

We'll get there eventually -- once we can ship xterm.js as an ES Module exclusively and await import() works in browser and node. But sadly I feel CJS will stay with us for a long time to come.

I already tried the mixed ESM/CJS packaging in node-sixel - well a nightmare, thus kept everything in CJS for now. At one point I will rewrite the whole package to ESM for good. I think the same step will be needed for xterm.js somewhere in the future.

@Eugeny

We could avoid the chaos of multiple packages by completely relegating the dynimport() implementation to user and instead provide only a single own implementation in form of all resources bundled into a single JS that exports dynimport.

Thats a very good idea to start with.

... or just pass ESM import as dynimport ...

This works once import can handle JSON transparently. But even at the current state, it is no biggy to provide a dynimport with a JS vs JSON switch, where JSON falls back to a fetch shim.

@Tyriar
Copy link
Member

Tyriar commented Nov 12, 2021

I feel like this problem gets solved when we adopt ESM fully, inventing our own thing is probably a waste of time given what it ends up giving us only temporarily. The color names are a special case which we could solve for example by adding an option to inject the colors, or a callback to fetch them, and the embedder handles the async part.

@jerch
Copy link
Member Author

jerch commented Nov 12, 2021

... or a callback to fetch them, and the embedder handles the async part

How is that different from dynimport (beside it being a single shot solution for the particular named colors issue)?

@Eugeny
Copy link
Member

Eugeny commented Nov 12, 2021

@Tyriar I think you and @jerch mean exactly the same thing - a user-settable async callback that takes a filename and returns the contents

@Tyriar
Copy link
Member

Tyriar commented Nov 12, 2021

I guess I just don't want it to be there permanently or used widely is my main comment because ESM will solve it properly eventually. This is easily to understand than a generic thing imo and doesn't need additional documentation on how to setup general dynamic importing:

interface ITerminalOptions {
  resolveColor: (name: string) => IColor;
}

@jerch
Copy link
Member Author

jerch commented Nov 16, 2021

Out of scope, thus closing.

@jerch jerch closed this as completed Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/build Regarding the build process type/automation Relating to CI/CD pipeline, automation, etc. type/enhancement Features or improvements to existing features
Projects
None yet
Development

No branches or pull requests

4 participants