-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Break / Branch? #445
Comments
(bonus: we don't need to rename the |
Another approach is to rearrange the syntax. Instead of putting the label at the top of a block, we could put it at the bottom:
which makes it quite clear where control goes on a branch with that label. |
I agree that "break" is the more adequate term. @sunfishcode, labels are symbolic names for AST nodes, not "positions" in sequential code. The syntax you propose would not be good fit for that. |
@rossberg-chromium It's a new language, so we can make labels be whatever we want. This issue appears subjective at this point. |
@rossberg-chromium You could consider the syntax for the Furthermore, if we do the analogous thing for loop:
then it seems 10x more readable and intuitive than |
@sunfishcode, well, if it involves changing the semantics of labels then it would be a change to the language as it has been spec'ed and implemented so far (e.g. in v8-proto). So in that sense, it's not just subjective. @lukewagner, the name notwithstanding, a language with nesting structure and expressions is pretty remote from a "real" assembly language (for good reason!), so I'm not sure that is too strong an argument. The fact that we have breaks rather than random branches is a direct consequence of that structure. Sure, we could tweak syntax in various ways, but are you sure this isn't trying to shoehorn the language into a form that does not actually reflect its actual structure anymore? |
I agree with @rossberg-chromium - yes, this is an "assembly", but it's an AST-based assembly. Doing something that seems odd for an AST is an unnecessary weirdness. And in AST-based forms, it is natural to see
e.g. in JavaScript, which is going to be a major point of comparison since this is an assembly for the Web (but the same is also in Java, etc.). It seems better to not have unnecessary odd differences, like putting the label at the end. However, if people feel strongly that "branch" is meaningful here, then how about "branch out of"? (as opposed to the implied "branch to") That's novel, which is a downside, but at least it could work with the labels in their natural place on the top. |
@rossberg-chromium I expect we'll end up with the same binary encoding either way, so even if you impose a formalism on this discussion which makes it technically a semantics question, it's still a subjective one. |
In which context do we need to spell out whether "br" stands for "break" or On Wed, Nov 4, 2015 at 9:08 AM, Dan Gohman [email protected] wrote:
|
In the s-expression format, yes, And it would be nice to not have some people call it a "branch" and others a "break". That already happened now: reading s-expression testcases, I read "break" in my head, then was discussing something with @sunfishcode, and he said "there is no break, there's just branch"... |
I'm not sure why there appears to be disagreement. It seems like, in programmers' minds, the meaning of "break" and "branch" is well-established. This is break:
And this is branch:
So if it's "branch", the label must go on the end in the text format. Anything else is confusing. |
I don't think it's reasonable to dismiss someone's concerns with this, given that we've actually made zero effort to verify that our changes don't regress binary encoding efficiency or significantly alter the binary representation. |
Does this pertain to what I responded to here, or to break/branch? On break/branch, regardless of the tree encoding, nodes will be opcodes and attributes. Break and branch would both have a single attribute, which for break could be a depth in the control flow stack identifying an AST node to break from or continue in, and for branch could be the depth in the label stack identifying a label to branch to. My main observation here is that these are the same, except for the words we use to describe them, and the appearance of the s-expression language. |
I agree that these are basically the same: this is just a discussion about how to call this particular AST node. And I think there are strong reasons to call it "break":
|
can be changed
to some people
Java bytecode, CLR bytecode, actual assembly languages, and others
for reasons of compression and decode speed
in your opinion
Break sounds like it can't go to the top of a loop shrug |
I'm not sure you're taking this seriously :) If there's a joke here going over my head, apologies in advance...
Of course. I'm arguing that the changing it that way is not good. There are known conventions for s-expression formats, for ASTs, and so forth.
Do any of those have an AST node where the label is at the end? (Honest question, not being snarky.)
Are you saying those are the only reasons? In particular, are you saying that that text format should not be AST-based? Perhaps this is the core issue here?
Indeed break can't get to the top of a loop. But we put a block inside the loop, and break out of that - isn't that the desugaring? |
But we are first and foremost designing an assembly language / virtual ISA. The only reason for an AST structure is because it buys us something concrete (compact encoding, cheap phi placement, free use-def/liveness info for dumb compilers, ease of impl for compiler backends w/o support for irreducible control flow); were it not for those reasons, then we'd likely just have |
I believe there is another concrete benefit to being an AST: it's better for view source. While we could have an arbitrary binary format and figure out an unrelated text representation on top, it's very nice to keep those harmonious and close. In other words, if we call something a "branch" in the binary but call it a "break" in the text format (or, worse, put the labels at the end in the text format), we just introduce unnecessary confusion. Again, as @sunfishcode said, this is just naming things. But naming things matters for view source. Calling it "break" seems by far the best option for every reasonable text format I can think of. Since this is just a name, why not call it that? There is no actual downside to the binary format. Or do people not even want to think of the text format yet? |
An important distinction I didn't make in my comment above is "benefit" vs. "constraint". I don't think we should constrain the design of wasm around trying to make wasm look like a high-level language; as argued quite a few times before, that is the role of the source maps future feature; people want to read |
|
Anyhow, this is pretty much the epitome of bikeshedding, but "branch" follows from the recognition that we're designing a virtual ISA here (which happens to have sufficient efficiently-checktable constraints to guarantee reducibility), not a programming language. |
Let me try to explain why I strongly disagree. To make this concrete, imagine I am a web dev, and I see a cool WebGL thing on a site. I want to see how it's done. I open view-source, and see this:
Now, stuff like Instead, if we had view source show
then it looks natural to the many millions of people that know JavaScript, Java, etc. At least control flow looks natural and familiar, so figuring the rest out is much easier. Here we know that L1 is a label on a block, just like we are already familiar with, and breaks work in a simple, well-understood manner. @lukewagner, you dismissed this as "won't move the needle", but it's actually a huge deal. This is not a pure bikeshed of picking a color. This has implications for the text format, which is crucial for the view-source capability, which is necessary for WebAssembly to fit in well on the Web and to succeed. If our view-source is the former (with
Dropping the coercions is great, certainly more natural to read that way. But replacing the familiar Can we defer thinking about view-source for now? If we do we might end up with But maybe someone can think of a better text format that does use |
I think the use case of being able to generate a relative pretty source view from the wasm binary should be considered - this use case is not just bike shedding. The fact that a higher level language can be compiled to wasm binary form does not invalidate this use case. I think most programmers understand that expressions are easier to read than assembler code, because they hide irrelevant register allocation matters, and encapsulate the data flow and control flow. Perhaps the generation of the pretty wasm source from the binary could be a plug-in module itself in web browsers, giving the user more flexibility, and deferring the issue here and now. But still the support for this could be a technical consideration now. If it is trivial for a wasm binary pretty printer to emit the label at the start or end and to emit a branch or break then I suggest deferring the matter but keep it in mind when evaluating the binary format. If a little extra support is needed now then just accept a patch now. Personally, if I had a choice I would use a wasm-binary pretty printer that emitted s-expressions, and I'd like to see the |
Other decisions have already had a major negative impact on view-source quality (for example, the removal of globals), so this ship may have sailed already, @kripken. But I do agree that for your example scenario a raw textual representation of the AST would be much less confusing if labels and On the other hand, a view-source algorithm (and the standard text format, if we come up with a fancy new one) could do a lot of this instead of it being baked into the design. For example, the If you care about the quality of view-source and the text format it's important to think about the big picture - lots of things interact here. For example right now Casual familarity for view-source is probably lost already for wasm. Our current control flow approach discards the patterns that casual developers will be familiar with ( |
I think |
@kg: I don't think this is a lost cause. Yes, we don't have globals, and that's not great for readability. However, in effect, neither did asm.js - while we had the stack pointer and a few others, 99% of globals in 99% of asm.js code out there did not use named globals (they were lowered into constant addresses), and even if we did have them, minifiers (on asm.js or non-asm.js code) removed meaningful global names anyhow. So we are not regressing on the aspect of globals. But using "branch" and oddly-placed labels in the text format, on the other hand, would regress control flow significantly. Putting labels at the end, and making them "face forward", is a possible compromise, as you suggest. (In which case we should possibly call them "goto" and not "break" or "branch", to be consistent with C, etc.?) But they are not true gotos due to the structural limitation - it actually does matter that they are on the proper AST node. That is,
The reason the second branch is invalid is because of the AST structure, and I think hiding that makes things more confusing. L1 belongs to its AST node, not to the code after it. Overall, the main question I have now is what I proposed earlier as a possible compromise: Would we be willing to call something a "branch" in the spec for the binary format, but write it out as a normal labeled "break" in the text format? In other words, would we be ok with that inconsistency between the spec and what users actually read? If we are ok with that, then the entire problem in this issue goes away (and you can ignore the rest of this post): View-source can be as good as we can get it for people on the web, while assembly purists get to use the term "branch". But the downside is a lack of consistency. A lot of people will experience WebAssembly via view-source, and they'll learn "breaks", and then feel puzzled if they some time later read something more in-depth (maybe even the spec), and see things are different ("this is exactly the same concept, why is now called a break? Am I missing something subtle here?"). Going in the other direction, people reading the spec (like compiler hackers) would see "branch", but then see something else when they view the text format representation of their first binaries. It feels wrong to me to have such an inconsistency; however, if we have nothing better, I can live with it. We could try to explain the inconsistency well in the spec, at least. Another possible compromise is to split the text representation: There could be a "low-level" text format that mirrors the spec and has "branch", and a "web text format" which is free to find ways to make code more accessible to a wide audience. Again, the downsides are clear, but this does give both sides in the debate most of what they want (I think). If we can't find a way to keep the text format as readable as possible (either one of the compromises, or moving to "break"), then we need to update the FAQ entry on view-source. Right now, that entry says
In particular, the part saying "the WebAssembly text format should be much more natural to read and write than asm.js" would no longer be true if we make control flow less natural. And the rest would need updates as well, as the current text is very optimistic, but some voices here imply that really, view-source is already a losing battle, and WebAssembly is just continuing that trend, and that that isn't a problem. The only positive part of the current text that should clearly remain is the bit about source maps, but that doesn't address view-source on general web content. Moving forward: If opponents of "break" are still not swayed, and my proposed compromises are not acceptable either, then we seem to be at an impasse. If so, then I think a next step should be getting feedback from the wider web community, since view-source is an important part of web culture - and we are just a small group of compiler hackers over here. I think input from a wider group could give us a broader perspective (and maybe it'll prove me wrong, and if so, of course I'm fine with that). |
The current loop/br approach had some claimed benefits, and might these also have some utility in reading the code too? If you were going to ask people then to be fair you would want to note the pros and cons of both approaches, not just ask them which is more familiar. How would it affected these benefits if there were separate Could the |
If this is just an issue of what goes in the text format; then it seems like it can wait until we start defining the text format in earnest. Seems premature to attempt to micro-optimize usability before there is an established context or set of design criteria. |
@lukewagner: I would be totally cool with that, if it is clear that the decision here does not constrain the text format (since break/branch in the text format is a major issue IMHO). That is precisely what I was asking in my last comment - what are your thoughts there? |
Well, if we're postponing what we decide for the text format, then there isn't a decision "here". |
I meant: Are we all good with saying that we are ok with "branch" in the spec, and would be ok with "break" in the text format (should that be seen as the best option there regardless of the spec)? In other words, I want to know we won't see this argument later on: "It's 'branch' in the spec, and it would be inconsistent to call it anything else in the text format, so because of our previous decisions it has to be 'branch' there too." (I see the appeal of such an argument myself, hence I am raising it; but as mentioned earlier, in the interests of compromise, I would be willing to overlook it.) Or to put it another way: I want to know that we are not constraining the text format now, on the important matter of control flow. Totally cool to defer that discussion, as long as we are not making calls now that limit our options later. |
If we defer now, then we're not limiting our options later. I'm happy to defer now, since we've generally deferred defining the text format. Also, fwiw, I am increasingly in favor of the proposal made above of having a "low-level" text format which is just branches and labels (we could even drop the syntactic block structure since it is kinda implied by the placement of labels and thus unnecessary) and a "high-level" text format which has the high-level control flow operators and other sugar we might want for the View Source and source-to-source translation use cases. |
@lukewagner Yes, I agree that the text format needs to be deferred, it would be a huge distraction, but is there any technical reason not to always use |
@JSStats The reason I'm aware of is so that |
I don't think |
@lukewagner: great, glad to hear you don't think this limits our options later, and that you're starting to like the "split the text representation" proposal from before. I'll open a pull request with some of this, that we can discuss further. |
Opened #457 to continue discussion of the "split the text representation" proposal. |
Discussion in that pull (splitting the text format) has not managed to proceed. Meanwhile, I've been convinced by @ncbray and others that a "higher"-level text format is a harder problem than I had considered, since the decisions already made limit our text format options more than I appreciated. That makes me almost want to give up on trying to do this "right" from my perspective, but on the other hand, also that it really matters that we at least try to get the easier things that we can get right, which includes break/branch. Have those opposed to "break" given some more thought to the compromise proposals given here? If there is no change, then I have two concrete suggestions for how to move forward towards a resolution on break/branch (possibly leaving bigger text format issues for later). First, I've noticed that in practice, several of us say "branch" and "break" interchangeably, when speaking on this repo and the spec repo and irl. And, the actual name of the thing we are talking about is "br". This suggests that perhaps a simple solution is then to have the design (and future official docs) say something like "br: Move control flow to another location, with a structural limitation that it must be relative to an enclosing I think this could be a good resolution. It would use the combined intuitions behind both interpretations of If that (or the previous compromises) is not convincing for people here, then I think we can't avoid starting to think about the text format. And a good way to start there is to get some wider feedback, as suggested before, since view source is going to affect a lot of people. I am thinking about doing a small blogpost, something along the lines of "We are starting to think about the WebAssembly text format, which is important since it is what will be showed when you do "view source" in a browser. There isn't a clear direction for what the format should look like at this early stage, and it would be helpful to get people's feedback and suggestions, to help point us in the right way. In particular, here are a few very broad options, what are your thoughts?" And I was thinking to give three sketches of possible syntaxes:
Currently I don't know which I think is the best out of those options. I lean towards 2, but I've heard arguments that sway me towards 1. I see the benefit of 3, but it feels more suitable as a Let me know if anyone wants to work together on the wording of such a blogpost. |
I don't think we'll have a very productive conversation if we start with an open-ended discussion without identifying a more specific set of use cases and constraints we're trying to optimize. There's a lot more use cases to consider than just the reverse-engineering-without-source-maps use case and we haven't really started that yet. |
Looks like to good resolution for now, so add: 'Definitions: I would like to see the text format a plugin, so I can have familiar (but unpopular) s-exp! I think option 2 is the realistic one, namely to use high level familiar structures. I think option 3 would be unfortunate. Personally I think calling this 'assembly' and making it look like assembler code would be a mistake - and that is coming from someone who learnt to program in machine code. The operations being primitive and close-to-the-metal seems orthogonal to the structure to me. When I read machine/assembler code the typical process is to work out the structure, to isolate code, remove the copying from registers to stacks etc - it's a lot of unnecessary mental work and a distraction and I hope the text format can avoid this which should make wasm much more accessible and productive. |
@kripken I'd strongly recommend against that approach. I'll be happy to give you some tips on ways to approach this process. I've been through a lot of language design processes. :) |
@lukewagner 'reverse-engineering' is a very loaded word and such claims might disadvantage web users rights. It is all source code to me. It might not all be the preferred source format of the code author, but it's still source code. The binary format is just a source encoding, not machine code and not even a virtual machine code, and printing it in some arbitrary text format is not disassembly or reverse engineering, just a mechanical translation. |
Sure, we can call it the "View Source without source maps" use case instead if we want to be careful about legal triggers. It's definitely a very important use case, but not the only one. |
Withdrawing objections. It seems practical to just add a flag in an optional source code meta data section to change the presentation of |
This repo currently has
br: branch to a given label in an enclosing construct
. I assume the intention is thatIs my understanding correct?
If so, isn't this more of a "break" than a "branch"? We not are not actually branching to
$block
, we are going to$more
in fact. But the name "branch" implies to me "branch to". It seems nicer to do either(branch $more)
, which says "branch to$more
" (where we actually branch to), or(break $block)
, which says "break on$block
" (just like a C/JS/etc. break on a labeled block),over the current state, which says
(branch $block)
but actually does not branch to$block
.Obviously 1 is a forward goto, which is bad. How about 2?
The text was updated successfully, but these errors were encountered: