String marshaling/interop for wasm and asm.js scenarios #716

kg · 2019-04-19T03:20:45Z

I was talking with Ben Smith about reducing string marshaling costs for wasm applications talking to the DOM and other APIs. He mentioned that WebIDL-bindings already describes some mechanisms for this, but it might make sense to codify the idea as something exposed directly to wasm or JS apps, so it can be used with things other than WebIDL APIs and perhaps make consistent performance easier to get.

My general thought here is that a well-defined 'string view' primitive is needed, sort of like how typed array views represent a subset of an array buffer as a block of int32s or floats or what have you. A string view would represent a subset of an array buffer as a UTF-8 (or UTF-16?) string with a known length provided at construction time. You promise the runtime that the contents of the view won't change and that you will call something like .dispose() on the view when done with it, and in exchange the runtime has opportunities to optimize the way it handles that string. Maybe copies still happen on invocation, etc., but many operations could potentially be much faster.
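To make the shape of the idea concrete, here is a rough polyfill-style sketch. The class name `StringView`, its constructor arguments, and `dispose()` are all hypothetical and not part of any standard; a native implementation could avoid or defer the copy that `TextDecoder` performs here.

```javascript
// Hypothetical sketch only: `StringView` is not a standardized API.
class StringView {
  #bytes;  // Uint8Array over the caller's buffer (no copy made here)
  #cached; // decoded JS string, filled in lazily

  constructor(buffer, byteOffset, byteLength) {
    this.#bytes = new Uint8Array(buffer, byteOffset, byteLength);
    this.#cached = undefined;
  }

  get byteLength() {
    return this.#bytes === null ? 0 : this.#bytes.byteLength;
  }

  toString() {
    if (this.#bytes === null) {
      throw new TypeError("StringView has been disposed");
    }
    // Decode at most once; this relies on the caller's promise
    // that the underlying bytes do not change.
    if (this.#cached === undefined) {
      this.#cached = new TextDecoder("utf-8").decode(this.#bytes);
    }
    return this.#cached;
  }

  // The caller signals that the underlying memory may be reused.
  dispose() {
    this.#bytes = null;
    this.#cached = undefined;
  }
}

// Usage: write a UTF-8 string into a "heap", then view it.
const heap = new ArrayBuffer(64);
// "h\u00e9llo wasm" is "héllo wasm"; é takes two bytes in UTF-8.
const result = new TextEncoder().encodeInto("h\u00e9llo wasm", new Uint8Array(heap, 8));
const view = new StringView(heap, 8, result.written);
const text = view.toString();
view.dispose();
```

The caching in `toString()` is the only "optimization" a polyfill can offer; the point of a built-in would be to let the engine and the binding layer do better than this.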

I know that V8 and SpiderMonkey both have some optimizations for small strings and for keeping strings in alternate representations when possible, so it seems like there's room for some wins there. It would also be great not to have to go through TextDecoder or a manual String.fromCharCode rope nightmare just to pass a string to the Canvas API or to jQuery.
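For reference, the status quo being described looks roughly like the sketch below. The `readString` helper, the pointer value, and the sample text are made up for illustration; a `WebAssembly.Memory` stands in for a real module's linear memory.

```javascript
// Stand-in for a wasm module's linear memory (1 page = 64 KiB).
const memory = new WebAssembly.Memory({ initial: 1 });

// Pretend the wasm side wrote a UTF-8 string at `strPtr` and handed us (ptr, len).
const strPtr = 16;
const strLen = new TextEncoder()
  .encodeInto("fill me in", new Uint8Array(memory.buffer, strPtr))
  .written;

const decoder = new TextDecoder("utf-8");

// Hypothetical glue helper: every call that needs a JS string
// pays for a fresh decode and a fresh string allocation.
function readString(ptr, len) {
  return decoder.decode(new Uint8Array(memory.buffer, ptr, len));
}

const label = readString(strPtr, strLen);
// In a browser this would then go to something like ctx.fillText(label, 0, 0).
```

If the same string is passed to an API many times, glue code like this either decodes it each time or has to maintain its own cache keyed by pointer and length.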

Would it make sense to try and codify something like this and align it with webidl-bindings? Even if it ends up being a 'ctor that selects a byte range and turns it into a JS string internally', that probably provides chances for optimization over webidl having to do that marshaling on every call.

annevk · 2019-04-22T09:05:44Z

So instead of a decode operation on a view you'd be able to pass the view directly and the binding layer gets all the added complexity (and potentially more, if you want to optimize). It seems we'd need to solve whatwg/encoding#172 first to ensure this ends up working identically.

cc @hsivonen @lukewagner @tschneidereit

lukewagner · 2019-04-22T14:58:59Z

Indeed, the "string view" idea keeps coming up and I think @tschneidereit might have an explainer/polyfill somewhere from the last round of discussions?

FWIW, I think a string view would be generally quite useful beyond just wasm or calls to Web APIs. If a string view was standardized, the wasm Web IDL Bindings' view binding operator could naturally be extended to allow string views.

annevk · 2019-04-23T12:52:31Z

Naming nit: if this happens, should we call it a ProbableUTF8Array or some such for consistency with existing views?

annevk · 2019-04-23T12:53:15Z

cc @aphillips

kg · 2019-04-23T16:25:46Z

> Naming nit: if this happens, should we call it a ProbableUTF8Array or some such for consistency with existing views?

UTF8Array would make sense. If it's 'probable' I think that indicates severe neglect on the part of the caller, unless probable just indicates that it's possible the string goes away after it was constructed (because of manual allocation). The downside to calling it array is that it implies the [] operator, but maybe that operator is easy to implement.

Making it useful as a StringView just implies that the array has .toString, which I guess has a history of doing roughly what you want on other array types. Then .toString can have a runtime-level fast path, or an IDL-level one (so it becomes fast to pass a UTF8Array to IDL APIs). Maybe a UTF16Array also comes with the bargain.
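As a rough illustration of the "array with .toString" flavor, the sketch below subclasses `Uint8Array`. The `UTF8Array` class here is hypothetical and does no validation; it only shows how `toString` and the `[]` operator would behave under that assumption.

```javascript
const utf8Decoder = new TextDecoder("utf-8");

// Hypothetical: a Uint8Array subclass whose toString decodes the bytes as UTF-8.
class UTF8Array extends Uint8Array {
  toString() {
    return utf8Decoder.decode(this);
  }
}

// "na\u00efve" is "naïve"; ï takes two bytes in UTF-8.
const plain = new TextEncoder().encode("na\u00efve");
const utf8 = new UTF8Array(plain); // copies the bytes into a new UTF8Array

const joined = plain.toString();   // existing typed arrays join their elements with commas
const decoded = utf8.toString();   // the subclass returns the decoded text
const firstByte = utf8[0];         // [] indexes bytes, not characters
const byteCount = utf8.length;     // 6 bytes for 5 characters
```

Inheriting `[]` from `Uint8Array` means indexing is by byte, so `utf8[2]` lands in the middle of a two-byte character. That may or may not be what people expect from something with "Array" in its name.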

For managing lifetime (i.e. telling the runtime that you freed the underlying memory) maybe you just detach the view? Detached arrays already have well-defined behavior, iirc.
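The detached behavior can be observed today with wasm memory, since growing a `WebAssembly.Memory` detaches its previous `ArrayBuffer`. A small sketch (the variable names are illustrative only):

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const view = new Uint8Array(memory.buffer, 0, 16);
const lengthBefore = view.length;   // 16

// Growing the memory detaches the ArrayBuffer the view was created over.
memory.grow(1);

const lengthAfter = view.length;    // 0: a detached view reports zero length
const elementAfter = view[0];       // undefined: reads do not throw

// A new view has to be created over the new buffer.
const freshView = new Uint8Array(memory.buffer, 0, 16);
```

So a detached view fails quietly rather than throwing on access, which is something a string view design would need to decide whether to copy or to tighten.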

annevk · 2019-04-23T17:35:56Z

"Probable" signifies that the caller might have made a mistake (perhaps intentionally) and you'll have to validate that it is UTF-8. This cannot be avoided as far as I can tell. You make a good point that we might not want full parity with the other typed array API objects, so perhaps using a different name is okay. Either way, seems like something to run by TC39 if this proceeds.
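The validation step can be approximated today with a `TextDecoder` in fatal mode, which throws on malformed input instead of substituting U+FFFD. A minimal sketch, where `isValidUtf8` is a made-up helper name:

```javascript
const strictDecoder = new TextDecoder("utf-8", { fatal: true });
const lenientDecoder = new TextDecoder("utf-8");

// Hypothetical helper: true if `bytes` is well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    strictDecoder.decode(bytes);
    return true;
  } catch {
    return false;
  }
}

const goodBytes = new Uint8Array([0x68, 0x69]); // "hi"
const badBytes = new Uint8Array([0xc3, 0x28]);  // lead byte without a continuation byte

const goodIsValid = isValidUtf8(goodBytes);     // true
const badIsValid = isValidUtf8(badBytes);       // false

// Without `fatal`, the bad sequence is replaced rather than rejected.
const replaced = lenientDecoder.decode(badBytes); // "\uFFFD("
```

Whether a view type should reject bad input at construction or replace it on use is exactly the kind of behavior that would need to match the binding layer.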

kg · 2019-04-26T00:34:04Z

Would it be worthwhile to rough out a proposed API via a polyfill so we can think through some basic issues that way?

littledan · 2019-04-26T22:33:39Z

@kg Writing out a proposed API sounds like a good idea, maybe in its own separate repository (or gist). I'm happy to help with TC39 stuff if it ends up going that way, and in general I'm interested in figuring out how to reduce marshalling costs (but currently a bit behind on these threads). This seems pretty different from strings, e.g., as it's mutable; something to keep in mind when phrasing things for TC39.

kg · 2019-04-26T22:34:33Z

It might make sense to initially dictate that it's immutable, i.e. that if you change the heap after constructing the view there's no guarantee about whether the string representation in JS changes with it... though maybe that could expose bugs or vulnerabilities, so it's not an acceptable trick.

littledan · 2019-04-26T23:12:13Z

I don't really see how this could work as an immutable thing. How would you keep the heap from mutating out from under you, without copying or something like mprotect?

kg · 2019-04-26T23:31:03Z

The immutability isn't enforced by the runtime, it's a promise from the user. So if they violate it there's no guarantee the string will contain what they want. It's just like how many runtimes have 'readonly' fields that you can mutate via taking addresses or using reflection if you insist on being naughty.

I can imagine that not being acceptable for some reasons but it seems like it could be an acceptable tradeoff. Some runtimes would just copy at construction time and some wouldn't.
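The difference between the two strategies becomes observable as soon as the caller breaks the promise. The sketch below models both with plain objects; none of these names come from a spec.

```javascript
const decoder = new TextDecoder("utf-8");
const heap = new TextEncoder().encode("cat");

// Strategy 1: copy (decode) at construction time.
const eager = decoder.decode(heap);

// Strategy 2: keep only a reference and decode on demand.
const lazy = { toString: () => decoder.decode(heap) };

// The caller breaks the promise and mutates the heap afterwards.
heap[0] = 0x62; // 'b'

const eagerResult = eager;          // "cat": unaffected by the mutation
const lazyResult = lazy.toString(); // "bat": the mutation shows through
```

Code that works on a copying runtime could therefore behave differently on a non-copying one, which is the interoperability risk in leaving this up to the implementation.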

littledan · 2019-04-27T08:25:20Z

I agree. Making an immutable view over an object which may mutate sounds like a good design for this case. I think we need to make the semantics well-defined, though, meaning the runtime cannot rely on the developer keeping that promise.

annevk added the topic: arraybuffer label Aug 2, 2021

annevk mentioned this issue Sep 1, 2021

Adding a UnicodeString type? tc39/proposal-is-usv-string#15

Closed
