String marshaling/interop for wasm and asm.js scenarios #716
Comments
So instead of a decode operation on a view you'd be able to pass the view directly and the binding layer gets all the added complexity (and potentially more, if you want to optimize). It seems we'd need to solve whatwg/encoding#172 first to ensure this ends up working identically. |
Indeed, the "string view" idea keeps coming up and I think @tschneidereit might have an explainer/polyfill somewhere from the last round of discussions? FWIW, I think a string view would be generally quite useful beyond just wasm or calls to Web APIs. If a string view was standardized, the wasm Web IDL Bindings' |
Naming nit: if this happens, should we call it a |
cc @aphillips |
UTF8Array would make sense. If it's 'probable' I think that indicates severe neglect on the part of the caller, unless probable just indicates that it's possible the string goes away after it was constructed (because of manual allocation). The downside to calling it array is that it implies the [] operator, but maybe that operator is easy to implement. Making it useful as a StringView just implies that the array has .toString, which I guess has a history of doing roughly what you want on other array types. Then .toString can have a runtime-level fast path, or an IDL-level one (so it becomes fast to pass a UTF8Array to IDL APIs). Maybe a UTF16Array also comes with the bargain. For managing lifetime (i.e. telling the runtime that you freed the underlying memory) maybe you just detach the view? Detached arrays already have well-defined behavior, iirc |
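To make the `UTF8Array` idea above concrete, here is a minimal sketch that assumes it is just a `Uint8Array` subclass whose `.toString` decodes the bytes as UTF-8. The class is hypothetical and not a standardized type. Lifetime handling via detaching is not modeled here.

```typescript
// Hypothetical sketch: UTF8Array is not a standardized type.
class UTF8Array extends Uint8Array {
  // Decode the viewed bytes as UTF-8 instead of joining the element values.
  toString(): string {
    return new TextDecoder("utf-8").decode(this);
  }
}

// A plain ArrayBuffer stands in for a wasm linear memory here.
const heap = new ArrayBuffer(64);
// "h\u00e9llo" is "héllo": 6 bytes in UTF-8, since é takes two bytes.
new Uint8Array(heap).set(new TextEncoder().encode("h\u00e9llo"), 8);

// View 6 bytes starting at offset 8; nothing is copied.
const view = new UTF8Array(heap, 8, 6);

console.log(view[0]);   // the [] operator yields a byte (104 for "h"), not a character
console.log(`${view}`); // string coercion goes through toString()
```

Because `[]` yields bytes rather than characters, the "array" name is at least honest about what indexing returns.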
Would it be worthwhile to rough out a proposed API via a polyfill so we can think through some basic issues that way? |
@kg Writing out a proposed API sounds like a good idea, maybe in its own separate repository (or gist). I'm happy to help with TC39 stuff if it ends up going that way, and in general I'm interested in figuring out how to reduce marshalling costs (but currently a bit behind on these threads). This seems pretty different from strings, e.g., as it's mutable; something to keep in mind when phrasing things for TC39. |
It might make sense to initially dictate that it's immutable, i.e. that if you change the heap after allocating it there's no guarantee that the string representation in JS won't change... though maybe that could expose bugs or vulnerabilities, so it's not an acceptable trick. |
I don't really see how this could work as an immutable thing. How would you keep the heap from mutating out from under you, without copying or something like mprotect? |
The immutability isn't enforced by the runtime, it's a promise from the user. So if they violate it there's no guarantee the string will contain what they want. It's just like how many runtimes have 'readonly' fields that you can mutate via taking addresses or using reflection if you insist on being naughty. I can imagine that not being acceptable for some reasons but it seems like it could be an acceptable tradeoff. Some runtimes would just copy at construction time and some wouldn't. |
I agree: making an immutable view over an object which may mutate sounds like a good design for this case. I think we need to make the semantics well-defined, though, meaning the runtime could not trust the developer. |
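The difference between a runtime that copies at construction time and one that doesn't can be sketched as follows. Both classes are hypothetical illustrations, not proposed APIs. The point is that once the caller breaks the "won't change" promise, the two strategies give observably different results, which is why the semantics would need to be pinned down.

```typescript
const decoder = new TextDecoder("utf-8");

// Hypothetical: decodes on every read, so it sees later writes to the heap.
class LazyStringView {
  private readonly bytes: Uint8Array;
  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }
  toString(): string {
    return decoder.decode(this.bytes);
  }
}

// Hypothetical: decodes once at construction, so later writes are invisible.
class CopyingStringView {
  private readonly text: string;
  constructor(bytes: Uint8Array) {
    this.text = decoder.decode(bytes);
  }
  toString(): string {
    return this.text;
  }
}

// A Uint8Array stands in for the wasm heap.
const heap = new Uint8Array(16);
heap.set(new TextEncoder().encode("cat"), 0);
const region = heap.subarray(0, 3); // shares memory with heap

const lazy = new LazyStringView(region);
const copied = new CopyingStringView(region);

heap[0] = 0x62; // break the promise: overwrite "c" with "b"

console.log(lazy.toString());   // "bat"
console.log(copied.toString()); // "cat"
```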
I was talking with Ben Smith about reducing string marshaling costs for wasm applications talking to the DOM and other APIs. He mentioned that WebIDL-bindings describes some mechanisms for this, but it might make sense to codify them as something that can be exposed to wasm or JS apps directly, for interacting with things other than WebIDL APIs, and perhaps to make it easier to get consistent performance.
My general thought here is that a well-defined 'string view' primitive is needed, sort of like how typed array views represent a subset of an array buffer as a block of int32s or floats or what have you. A string view would represent a subset of an array buffer as a UTF-8 (or UTF-16?) string with a known length provided at construction time. You promise the runtime that the contents of the view won't change and that you will call something like .dispose() on the view when done with it, and in exchange the runtime has opportunities to optimize the way it handles that string. Maybe copies still happen on invocation, etc., but many operations could potentially be much faster.
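As a rough polyfill-style sketch of that shape, assuming a constructor that takes a buffer, a byte offset, and a byte length, plus `.toString()` and `.dispose()`. The `StringView` name and every member here are hypothetical. Caching the decoded string is only valid because of the caller's promise that the bytes won't change.

```typescript
// Hypothetical API sketch; none of these names are standardized.
class StringView {
  private bytes: Uint8Array | null;
  private cached: string | null = null;

  constructor(buffer: ArrayBuffer, byteOffset: number, byteLength: number) {
    // A typed-array view over the range; no bytes are copied here.
    this.bytes = new Uint8Array(buffer, byteOffset, byteLength);
  }

  toString(): string {
    if (this.bytes === null) {
      throw new TypeError("StringView has been disposed");
    }
    if (this.cached === null) {
      // Decode lazily, once. This relies on the caller's immutability promise.
      this.cached = new TextDecoder("utf-8").decode(this.bytes);
    }
    return this.cached;
  }

  // Tell the "runtime" the underlying memory may be freed or reused.
  dispose(): void {
    this.bytes = null;
    this.cached = null;
  }
}

// A plain ArrayBuffer stands in for a wasm linear memory.
const memory = new ArrayBuffer(32);
new Uint8Array(memory).set(new TextEncoder().encode("fillText"), 4);

const sv = new StringView(memory, 4, 8);
const text = sv.toString(); // decoded on first use, cached afterwards
sv.dispose();               // further toString() calls now throw
```

A native implementation could skip the JS-side decode entirely when handing the view to an API that accepts UTF-8. This sketch only models the observable contract.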
I know that V8 and SpiderMonkey both have some optimizations for small strings and for keeping strings in alternate representations when possible, so it seems like there's room for some wins there. It's also great if you don't necessarily have to go through TextDecoder or a manual String.fromCharCode rope nightmare just to pass a string to the Canvas API or to jQuery.
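For reference, the two marshaling paths mentioned above look roughly like this today. The helper names and the `(heap, ptr, len)` calling convention are assumptions for illustration. The manual path is only correct for ASCII bytes, since it maps each byte to one code unit.

```typescript
// Path 1: TextDecoder over a subarray of the heap (handles full UTF-8).
function decodeUtf8(heap: Uint8Array, ptr: number, len: number): string {
  return new TextDecoder("utf-8").decode(heap.subarray(ptr, ptr + len));
}

// Path 2: manual String.fromCharCode concatenation.
// Only correct for ASCII input; multi-byte UTF-8 sequences would be mangled.
function decodeAsciiManually(heap: Uint8Array, ptr: number, len: number): string {
  let out = "";
  heap.subarray(ptr, ptr + len).forEach((byte) => {
    out += String.fromCharCode(byte); // repeated concatenation, one unit at a time
  });
  return out;
}

// A Uint8Array stands in for the wasm heap.
const wasmHeap = new Uint8Array(32);
wasmHeap.set(new TextEncoder().encode("canvas"), 4);

const viaDecoder = decodeUtf8(wasmHeap, 4, 6);
const viaCharCodes = decodeAsciiManually(wasmHeap, 4, 6);
console.log(viaDecoder === viaCharCodes); // both paths agree for ASCII input
```

Either way, a fresh JS string is produced on every call, which is the per-call cost a string view would aim to avoid.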
Would it make sense to try and codify something like this and align it with WebIDL-bindings? Even if it ends up being a 'ctor that selects a byte range and turns it into a JS string internally', that probably provides chances for optimization over WebIDL having to do that marshaling on every call.