Specify display conventions for wasm locations #1053

dschuff · 2017-05-03T17:30:56Z

Based on the discussion in #990

jfbastien · 2017-05-03T17:39:59Z

Web.md

+To achive the same goal of a common representations for WebAssembly constructs, the
+following conventions are adopted.
+
+A wasm location is a reference to a particular instruction in the binary, and may be


s/wasm/WebAssembly/g everywhere.

jfbastien · 2017-05-03T17:40:54Z

Web.md

+Where
+* `${url}` is the URL associated with the module (e.g. via a response object), if any.
+* `${funcIndex}` is an index the [function index space](https://github.com/WebAssembly/design/blob/master/Modules.md#function-index-space).
+* `${pcOffset}` is the offset in the module binary of the first byte of the instruction, printed in hexadecimal with lower-case digits.


0x prefix or not?

Yes, but in this formulation the 0x is part of the template and not part of the substituted value. Do you think we should switch that? It would mean that this line would say something like "${pcOffset} is the offset ... printed in hexadecimal with lower-case digits and a leading 0x prefix" which seems a little more awkward, but I don't have a strong opinion on that.

Dunno, all I'm saying is this isn't clear. Whichever way works for me.

Switched to more-or-less that wording. I agree it's clearer.
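
A minimal sketch of the reworded `${pcOffset}` convention, assuming the `0x` prefix is now part of the substituted value as agreed in this thread. The helper name `format_pc_offset` is hypothetical and does not come from the PR.

```python
def format_pc_offset(pc_offset: int) -> str:
    """Render a module byte offset as lower-case hex with a leading 0x.

    Hypothetical helper for illustration only; the name is not from the PR.
    """
    if pc_offset < 0:
        raise ValueError("pc offset must be non-negative")
    # The "#x" format spec emits a "0x" prefix followed by lower-case digits.
    return format(pc_offset, "#x")


print(format_pc_offset(0x1A2B))
```

Keeping the prefix inside the value means every consumer of the field sees one self-describing token, which is the clarity argument made above.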

jfbastien · 2017-05-03T17:41:40Z

Web.md

+Names of functions may also be displayed if the module contains a `"name"`
+section; these can be used in the same contexts as JavaScript functions.
+If there are no names provided, then engines should somehow indicate this;
+(it may be sufficient to simply use e.g. an empty string if the name is


Function number instead of empty string?

In a stack trace this would be kind of redundant. E.g. in SpiderMonkey, the full stack trace entry would be
`${name}@${location}`, where `${location}` would include the wasm function number for wasm, and `${name}` is already empty for top-level JS code not in a function.
For V8 the format is currently
`at ${name} (${location})` for wasm and when there is a JS function, and just `at ${location}` for top-level JS.
So the point is that there is already precedent for empty JS function names that browsers might want to reuse. OTOH I'm not opposed to a little redundancy in the name of making things clearer either.

OK I guess this wording is unclear to me.

Reworded to clarify.
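
The two frame layouts described in this thread can be sketched as follows. This is an illustration of the comment above, not either engine's actual code. The function names are hypothetical, and the empty-name branch of the V8-style function is modeled on the top-level JS case mentioned above, which is an assumption for wasm frames.

```python
def spidermonkey_style_frame(name: str, location: str) -> str:
    # "${name}@${location}"; the name may simply be empty.
    return f"{name}@{location}"


def v8_style_frame(name: str, location: str) -> str:
    # "at ${name} (${location})" when a name exists,
    # otherwise just "at ${location}" (assumed to mirror top-level JS).
    if name:
        return f"at {name} ({location})"
    return f"at {location}"


loc = "a.wasm:wasm-function[3]:0x10"  # hypothetical location string
print(spidermonkey_style_frame("", loc))
print(v8_style_frame("", loc))
```

In both layouts an empty name still leaves the function index visible through the location, which is why repeating the index in the name would be redundant.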

rossberg · 2017-05-03T17:59:52Z

Web.md

+It has the following format:
+`${url}:wasm-function[${funcIndex}]:0x${pcOffset}`
+Where
+* `${url}` is the URL associated with the module (e.g. via a response object), if any.


What is the format if there is no URL? Is the field just empty? Is the colon still included?

I think there should be something, rather than empty. The note below addresses that, but I agree it's not clear at this line.

Otherwise if it's empty there would be no way to tell different modules apart if they didn't have URLs. Obviously it's still possible to have collisions if the instantiation location is used, but at least it would allow a developer to avoid them if they cared.

Reworded to clarify.
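
A hedged sketch of the whole template quoted above, including the rule from this thread that the first field should not be empty. The function name and the example URL are hypothetical.

```python
def format_wasm_location(module_id: str, func_index: int, pc_offset: int) -> str:
    """Build "${url}:wasm-function[${funcIndex}]:0x${pcOffset}".

    module_id is the URL or other module identifier; per the discussion
    it must not be empty, so two URL-less modules can be told apart.
    """
    if not module_id:
        raise ValueError("module identifier must not be empty")
    if func_index < 0 or pc_offset < 0:
        raise ValueError("indices and offsets must be non-negative")
    # ":x" prints lower-case hexadecimal digits without a prefix;
    # the "0x" here is written as part of the template.
    return f"{module_id}:wasm-function[{func_index}]:0x{pc_offset:x}"


print(format_wasm_location("https://example.com/app.wasm", 5, 0x1F2))
```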

backes · 2017-05-04T08:52:43Z

Web.md

+`${url}:wasm-function[${funcIndex}]:0x${pcOffset}`
+Where
+* `${url}` is the URL associated with the module (e.g. via a response object), if any.
+* `${funcIndex}` is an index the [function index space](https://github.com/WebAssembly/design/blob/master/Modules.md#function-index-space).


typo: index into the ...
Also, the URL can be relative (just "Modules.md#...").

Cellule · 2017-05-05T02:18:55Z

I am still a little fuzzy about the url field.
Are we talking about the URL of the script that instantiated the module, the script that compiled the module, or the bytes of the module?
Even if the bytes are located in a .wasm file somewhere, when we create the module today we don't pass any info about the source, we simply give a buffer to compile.

Edit: I just noticed the part about the Response object, which is the only API I see that can give a meaningful URL.

rossberg · 2017-05-05T07:15:48Z

Web.md

-* `${pcOffset}` is the offset in the module binary of the first byte of the instruction, printed in hexadecimal with lower-case digits.
+* `${url}` is the URL associated with the module (e.g. via a response
+  object), or other module identifier (see notes).
+* `${funcIndex}` is an index the [function index space](Modules.md#function-index-space).


Typo: missing "in"

rossberg · 2017-05-05T07:17:52Z

Web.md


 Notes:
 * The URL field may be interpreted differently depending on the context. For
 example offline tools may use a file name; or when the ArrayBuffer-based
 `WebAssembly.instantiate` API is used in a browser, it may display the
-location of the API call instead.
+location of the API call instead. It should not be empty however; a


API calls may not have useful source locations either, e.g. when performed as part of an eval call.

rossberg · 2017-05-05T07:20:39Z

Web.md


 Notes:
 * The URL field may be interpreted differently depending on the context. For
 example offline tools may use a file name; or when the ArrayBuffer-based
 `WebAssembly.instantiate` API is used in a browser, it may display the
-location of the API call instead.
+location of the API call instead. It should not be empty however; a
+developer should be able to write their code such that modules from


Are you suggesting that programmers should be able to rely on unambiguous location URLs? If so, I don't think that can work in general, e.g. for the aforementioned reason, but also because of URL ambiguities in general. I would rather drop that half-sentence.

dschuff · 2017-05-05T16:21:58Z

@Cellule @rossberg-chromium re:URLs

So the obvious case is for the response APIs where there's a real non-data URL, which we hope will be the most common.
For the ArrayBuffer APIs, @lukewagner suggested in #990 (comment) that we could display the location of the JS caller that called the instantiate API. Obviously this could still be ambiguous (e.g. there's just one call in the source that instantiates different modules). However if a developer wanted to, they could introduce more call locations, ensuring that the modules could be distinguished for their own codebase. So the intent of this wording was to ensure that property, without necessarily mandating that "show the location of the API call" be exactly the mechanism. Thinking more about this though, if browsers disagree on that mechanism, it's probably not very useful. So we should probably either say that

1. All browsers should display the JS location of the API call (this is presumably easy to do, and allows developers to split out the call locations if they want to), or
2. there's no restriction, and browsers will presumably just disagree (most likely show the API call location or nothing).

dschuff · 2017-05-05T16:35:40Z

I don't really like option 2 since it seems likely that with e.g. dynamic loading we'll have several linked modules all instantiated from the same API call (especially if we have an instantiateGroup-like API). One simple mechanism to help is to allow modules to be named (proposed in #1055). If we had a module name, we could also add it to this string (although it's maybe getting kind of long now).

domenic · 2017-05-05T17:23:48Z

One thing to note is that currently for eval browsers have all sorts of heuristics that generate "URLs" for stack traces. (E.g. I've seen "<eval code> in https://example.com/page.html" as a "URL".) I'm not sure whether browsers would want to either reuse that logic for wasm, or if maybe the community wants to crack down and only allow interoperable well-specified URLs for this new technology. (Possibly including such generated "URLs" in the future, but only if we get them specified so they can be implemented interoperably.)

dschuff · 2017-05-10T22:30:27Z

If modules can have names inside them, it makes sense to prefer that over the location of an API call. And maybe even over the URL (although then it would be asymmetric with JS locations which do use URLs instead of other names). So I could go either way on whether a URL or module name should be preferred. But in any case, having the name available will at least ensure that a developer can always specify something of their choosing.

dschuff · 2017-05-10T22:31:18Z

(And I've not attempted to specify here what anyone should do in the presence of eval or whatever.)

lukewagner · 2017-05-11T17:35:52Z

Thanks for writing this up! Sorry for taking so long to get back to it (it always takes some time to page everything in).

So the module name is a new (but good) twist to consider. I think, even if a module name is supplied, you'd still always want to include the URL (since it's not redundant info). So what if instead:

* The `id` field is renamed to `url` and is defined to be either the URL of the fetch (for the Response API) or otherwise the URL of the JS caller of compile/instantiate, in the same fashion as eval (except substituting the wasm method name, "compile" or "instantiate", for "eval"; so, e.g., in SM, `https://foo.com/foo.js line 10 > instantiate`).
* The name is defined to be `module_name.func_name` (or `module_name`, or `func_name`, or the empty string, if one of those fields is absent). I'd specifically say to use the empty string if no module/func names are present, rather than the index, to avoid repeating `wasm-function[funcIndex]`.

What I like is that this keeps all the names from the name section to the left of the @/at.
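
The name rule proposed in this comment can be sketched as below. The function name `display_name` is hypothetical, and absent fields are modeled as `None` or an empty string as an assumption.

```python
from typing import Optional


def display_name(module_name: Optional[str], func_name: Optional[str]) -> str:
    """Return module_name.func_name, either part alone, or "" if both are absent."""
    # Drop absent parts, then join whatever remains with a dot.
    parts = [part for part in (module_name, func_name) if part]
    return ".".join(parts)


print(display_name("mymod", "main"))  # both names present
print(repr(display_name(None, None)))  # no names at all
```

Returning the empty string rather than the index avoids repeating `wasm-function[funcIndex]`, which already appears in the location.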

dschuff · 2017-05-11T22:24:54Z

@lukewagner I like that idea; however:

1. I don't want to prescribe the exact way an eval'd code location is represented, as it is an already-established difference between engines. Actually I'm not sure I really even want to say that 'it should be like the engine handles eval, but with the wasm function name' because some engines punt that entirely; e.g. JSC just says `eval@[native code]`, so I'd like to allow them room to do something better for wasm without necessarily changing how they handle eval. (Also, how would you handle an instantiate call from inside eval'd code? I guess just nest the representations?)
2. If there are contexts where the function name is displayed but isn't right next to this location representation (e.g. in devtools UI?), then you probably don't want an empty string. So that could be beyond the scope of this document (e.g. if you have more expressive UI than just text). But are there other situations we are forgetting that would display a function name but not its location?

lukewagner · 2017-05-12T14:57:29Z

1. Agreed we don't want to overspecify b/c this already varies. But could we just call it the `url` field and say "a URL symmetric to a JS eval's URL" and give some examples? (instantiate-from-eval would work like eval-from-eval: `@blah.js line 10 > eval line 5 > eval` :)
2. Good question. In a context like devtools where one didn't have the full `mod_name.func_name@url:wasm-function[i]:pcOffset` quintuplet, but rather just "name" and "url" fields, I think we'd want "name" to be `mod_name.func_name` if both are defined, else `mod_name.wasm-function[i]` if the func name is not defined, else `wasm-function[i]` if no names are defined. Perhaps we can capture this contextual distinction?
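
A sketch of the standalone "name" fallback chain described in this comment, for contexts where the location is not shown alongside. The function name is hypothetical. The comment does not say what to do when only the function name is defined; returning it alone is an assumption made here.

```python
from typing import Optional


def standalone_name(module_name: Optional[str],
                    func_name: Optional[str],
                    func_index: int) -> str:
    """Name to show when the wasm location is not displayed next to it."""
    # Fall back to the index-based label when the function has no name.
    func_part = func_name if func_name else f"wasm-function[{func_index}]"
    if module_name:
        return f"{module_name}.{func_part}"
    # Assumption: with no module name, show the function part by itself.
    return func_part


print(standalone_name("mymod", None, 7))
```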

domenic · 2017-05-12T15:04:08Z

To be clear, my point in bringing up eval was that the from-ArrayBuffer APIs are analogous in terms of the kind of "source URLs" they might generate, in response to

> So the obvious case is for the response APIs where there's a real non-data URL, which we hope will be the most common.
>
> For the ArrayBuffer APIs, @lukewagner suggested in #990 (comment) that we could display the location of the JS caller that called the instantiate API.

eval() of code that calls the from-ArrayBuffer APIs is yet another level of complication (similar to eval() of code that calls eval()), but I wasn't intending to discuss that.

lukewagner · 2017-05-12T15:49:32Z

@domenic I think I agree with what you're saying, but I'm not sure if you're disagreeing with my more-recent comments :) To be clear, I'm suggesting that if, e.g., you're SM and already have URLs like @test.js line 2 > eval and @test.js line 2 > eval line 3 > eval for (nested) eval() then for WebAssembly.compile you'd have URLs like @test.js line 2 > WebAssembly.compile and @test.js line 2 > eval line 3 > WebAssembly.compile. And other engines would do symmetrically, basically doing s/eval/WebAssembly.compile/ (or WebAssembly.instantiate).

I like the idea of trying to be even more compatible, but this seems hard if there's already a diverging precedent for eval and wasm from-ArrayBuffer APIs can be called from within eval.
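
The `s/eval/WebAssembly.compile/` analogy can be illustrated with a small sketch that composes a SpiderMonkey-style nested source URL as described in this comment. Other engines format eval locations differently, so this shows one engine's existing convention only. The function and parameter names are hypothetical.

```python
from typing import Iterable, Tuple


def nested_source_url(script_url: str, steps: Iterable[Tuple[int, str]]) -> str:
    """Append one " line N > api" segment per nesting level.

    steps is a sequence of (line_number, api_name) pairs, outermost first,
    where api_name is e.g. "eval" or "WebAssembly.compile".
    """
    url = script_url
    for line, api_name in steps:
        url = f"{url} line {line} > {api_name}"
    return url


print(nested_source_url("test.js", [(2, "eval"), (3, "WebAssembly.compile")]))
```

Because each level only appends a segment, a compile call made from inside eval'd code nests the same way eval-from-eval already does.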

domenic · 2017-05-12T16:19:21Z

No disagreement; that sounds right! And yeah, it's not clear what the right answer is here, besides just speccing something like "engines should treat these APIs like they do eval for purposes of source locations". We can then hope that in the future someone takes on the heroic task of nailing down the stack trace format, including what happens with eval, for ES, and then wasm can just copy that work.

dschuff · 2017-05-12T17:30:46Z

Good suggestions; I've tried to capture it, PTAL

domenic · 2017-05-12T17:50:19Z

Looks great, although there might be some mismatched parens in the example :)

lukewagner · 2017-05-17T21:22:49Z

Web.md

+
+Names of functions may also be displayed if the module contains a
+["name" section](BinaryEncoding.md#name-section);
+these can be used in the same contexts as JavaScript functions.


nit: "... same contexts as JavaScript function names".

lukewagner

Oops, I missed the reply in email; sorry for taking so long to get back and thanks for applying all the changes! lgtm with two small nits

lukewagner · 2017-05-17T21:25:23Z

Web.md

+not specify the full format of strings such as stack frame representations;
+this allows engines to continue using their existing formats for JavaScript
+(which existing code may already be depending on) while still printing
+WebAssembly frames in a format consistent with JavaScript.


Could you also add a Note somewhere saying that these conventions do not describe the value of the .name property of exported WebAssembly functions, which is precisely [defined](JS.md#exported-function-exotic-objects) to be ToString(function-index)?

Ha good point. Would we want a way to map one to the other as a standalone function?

Do you mean like some new JS API for producing the module_name.func_name? That seems possible, but it also makes the names section (more) semantically visible (than before), so I guess it depends on what our use case is.

Yes. My thinking is: we already let developers access the name section so they don't have to parse their own module... but then they need to parse the name section to get that information! Cut the middle-person. 😄

Yeah, that makes sense if client code would otherwise be doing their own binary parsing that we've already done. With the other Module reflection methods, our motivating use case was module loaders (and specific experience with incorporating wasm into SystemJS). It'd be nice to have some specific user who is wanting to programmatically access these function names.

Anyhow, this probably belongs in a different issue.

lukewagner · 2017-05-18T23:18:31Z

I think @dschuff is out for a bit, so I took the liberty of applying my review requests to the PR. I also turned the naming paragraph into nested bullets so it was a bit easier to see the if/else structure.

lukewagner · 2017-05-18T23:20:07Z

Any last comments before merging?

hemobo · 2017-05-19T11:36:51Z

This has probably been discussed elsewhere, but...
Is using the same format for runtime errors and compilation errors a strong constraint here? If it isn't, have you considered using a 'function n : instruction m (m being the m'th instruction in function n)' format instead? Individual instructions seem to be what both source maps and stack traces actually point to – what's actually behind the 'first byte of an instruction' part that's spec'ed here, just with a weird indirection through the binary encoding.

That would have the advantage of being independent of the current representation of a module (binary, text or some other data structure). As it is written, it seems one has to keep the particular byte stream around just to properly format a stack trace.

In principle an absolute binary offset can be used without having to really understand the binary encoding, but it isn't immediately clear that this is an advantage here, because tooling that has access to the byte stream and wants to do anything useful with that information (like displaying the trapping instructions) needs to be able to parse function bodies and possibly convert them into textual representation in any case.

lukewagner · 2017-05-22T14:47:54Z

The 'function n' part is already explicitly present in this PR via wasm-function[${funcIndex}], so I think the main new thing you're proposing is changing from the current bytecode index to an instruction index (indexed by number of whole instructions).

From my experience, and from the previous discussion in #990, I think the bytecode index is simpler for everyone. For the engine compiling a trapping instruction, it's quite easy to just save (compile into the fail path, save in trap metadata, etc) the "current" bytecode offset for later trap reporting; no need to save bytecode. If we had to report instruction index, we'd have to maintain an additional instruction-counter that was incremented after decoding each instruction and this could both be a source of rare corner-case bugs and a mild source of decoding slowdown. For wasm producers, tools like wabt have a dump command that naturally displays bytecode offset next to each instruction; this too would need extra work to maintain an instruction counter instead.

Also, wasm source maps are currently being proposed to map bytecode offsets to source locations, and this would provide what developers really want, which is errors in terms of source code location.
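
To make the two indexing schemes in this exchange concrete: a byte offset can be converted to an instruction index after the fact by any tool that has decoded the function body. The sketch below assumes such a tool already holds a sorted list of instruction start offsets (the hypothetical `instr_offsets`); it is not how any engine reports traps.

```python
from bisect import bisect_right
from typing import Sequence


def instruction_index(instr_offsets: Sequence[int], pc_offset: int) -> int:
    """Map a module byte offset to the index of the instruction containing it.

    instr_offsets holds the byte offset of the first byte of each
    instruction in one function body, in ascending order. Offsets past
    the end of the body are not detected by this sketch.
    """
    # Find the last instruction that starts at or before pc_offset.
    i = bisect_right(instr_offsets, pc_offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first instruction")
    return i


offsets = [0x40, 0x42, 0x45]  # hypothetical instruction start offsets
print(instruction_index(offsets, 0x42))
```

This keeps the engine-side cost at a single saved number per trap site, while the conversion work lands on tools that need to parse function bodies anyway.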

dschuff · 2017-05-22T21:27:25Z

@lukewagner Thanks for helping push this along! I do think that using some abstraction other than byte offset is an interesting idea worth considering; we currently have some discussion in #1064 and here, so maybe we should merge this and file a separate issue or PR for that question specifically. If we were to switch, it would just replace the ${pcOffset} field as defined here with something different, maybe just a different number.

Specify display conventions for wasm locations

b9ca72f

jfbastien reviewed May 3, 2017

rossberg reviewed May 3, 2017

s/wasm/WebAssembly

a2c18aa

backes reviewed May 4, 2017

review feedback

759732f

rossberg reviewed May 5, 2017

Define module ID instead of URL, prefer module name if present

44f9459

dschuff mentioned this pull request May 11, 2017

Add support for module names in wasm binaries WebAssembly/binaryen#1010

Open

use URL and module_name.function_name

40727c8

Clarify eval analogy with examples, refine name display and add example

fc10e38

fix copypasta, link, formatting

2c19e3a

AndrewScheidecker mentioned this pull request May 15, 2017

Expand binary name section to include type, table, memory, global, and label names. #1064

Closed

lukewagner reviewed May 17, 2017

lukewagner approved these changes May 17, 2017

Address review comment

aaa35d0

dschuff merged commit d1bd4a4 into master May 22, 2017

This was referenced May 22, 2017

Should module byte offsets be used for specifying wasm code locations? #1071

Open

Display in browsers and tools for addresses of functions and instructions #990

Closed

jfbastien deleted the display branch November 6, 2017 17:08

AndrewScheidecker mentioned this pull request Jul 8, 2019

DWARF for WebAssembly Target WebAssembly/debugging#1

Open

ppenzin mentioned this pull request Apr 2, 2020

Standard wasm Error location chakra-core/ChakraCore#6402

Open
