Clarify well-structured control flow #475

Merged
kripken merged 1 commit into master from elaborate-structured-control-flow
Dec 8, 2015

Conversation

kripken
Member

@kripken kripken commented Nov 23, 2015

I think it's helpful to add some clarification to what we mean by "well-structured", since that term - and similarly "irreducible control flow" and so forth - are not universally familiar, and actually have some different definitions (e.g. they can be relative to specific constructs).

In addition, the added text provides an intuition to help people more familiar with high-level languages understand our control flow. With this addition, I hope the text will be accessible to a wider range of compiler hackers. In particular, it could help the large community of hackers currently compiling down to JavaScript, whom we would love to see compiling to WebAssembly eventually.

I believe this would also address most of my concerns on the break/branch topic (#445).

from outside it. This restriction ensures all control flow graphs are well-structured
in the exact sense as in high-level languages like Java and JavaScript. To
further see the parallel, note that a `br` to a `block`'s label is functionally
equivalent to a labeled `break` in high-level languages, that is, a `br` on a

"in high-level languages, thus a br simply breaks out of a block"

wdyt?

Member Author

I'm not quite sure which part of the sentence you mean to change to that; I can see more than one way to do it. I pushed an update which rewords part of it (which I agree makes it shorter and clearer). How is it now?

@titzer

titzer commented Nov 23, 2015

lgtm other than minor wordsmithing on the last sentence.

@kripken kripken force-pushed the elaborate-structured-control-flow branch from 3cf3436 to abb1961 Compare November 23, 2015 18:56
from outside it. This restriction ensures all control flow graphs are well-structured
in the exact sense as in high-level languages like Java and JavaScript. To
further see the parallel, note that a `br` to a `block`'s label is functionally
equivalent to a labeled `break` in high-level languages, that is, a `br` simply

This might read a little smoother if we change ", that is, " to "in that". That avoids the comma pauses. Otherwise lgtm.

Member Author

Cool, done.

@kripken kripken force-pushed the elaborate-structured-control-flow branch from abb1961 to bdcc63a Compare November 23, 2015 20:56
@jfbastien
Member

lgtm

While we're clarifying, would you want to add examples of what's not possible? Or do you think that's too much? Maybe such a deep dive should be in Rationale.md?

@kripken
Member Author

kripken commented Nov 23, 2015

Rationale might be a good place to go into more detail, yeah.

But it's actually not easy! :) What's not possible is jumping into a loop. But really, even that is possible if you use a control flow threading variable, so what we really mean is "jump into a loop without additional overhead" - but even that isn't strictly true, since the threading variable would likely be optimized out, as was mentioned back when we discussed this in more detail. So we kind of mean "not possible to jump into a loop using just `br`, `loop` and `block`", which is almost true, because it is actually possible to jump into the middle of the loop using those constructs, only you would be jumping over code at the beginning of the loop that is never accessible anyhow (unconditional branch at the top into the middle). Which again brings us back to "without additional overhead", but really, the entire argument here is that we can add `block`s and `br`s to achieve certain control flow (creating stacks in some cases, and potential overhead), so we are actually introducing overhead to achieve this anyhow. So perhaps we can say "you can't jump into the middle of a loop using just `br`, `loop` and `block`, where 'middle of a loop' ignores dead code", but how about if we split the loop, in which case (by adding nodes) we can actually logically enter the middle...

Sorry for the lengthy paragraph, I've actually been trying to find a good way to say this - just like you, re-reading this section made me want to add something. But I can't find a way that is not too detailed while remaining correct.

I think it might actually be best to stay at the more intuitive level, which is what we'll have after this pull: (1) we mention that entering the middle of a loop is not possible, which indeed in some sense it isn't, and (2) we mention the connection to high-level languages, since our limitation is completely identical to theirs, so all readers should understand what we mean easily.
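To make the "control flow threading variable" idea above concrete, here is a minimal sketch in JavaScript. It assumes a hypothetical loop with two parts, A (top) and B (middle), where the original code may enter either at A or directly at B. The names `runLoop`, `skipA`, and `trace` are invented for illustration and do not come from the design documents.

```javascript
// Hypothetical illustration: a loop whose body is A followed by B, where the
// original (unstructured) code may enter at A or jump directly to B.
// Structured control flow cannot branch into the middle, so a helper
// ("threading") variable records that A must be skipped on the first pass.
function runLoop(enterInMiddle, limit) {
  const trace = [];
  let i = 0;
  let skipA = enterInMiddle; // helper variable standing in for the jump to B
  while (true) {
    if (!skipA) {
      trace.push("A" + i); // part A: the top of the loop
    }
    skipA = false; // only the first iteration may skip A
    trace.push("B" + i); // part B: the "middle" entry point
    i++;
    if (i >= limit) break;
  }
  return trace;
}
```

Under these assumptions, `runLoop(false, 2)` visits A0, B0, A1, B1, while `runLoop(true, 2)` visits B0, A1, B1. The extra test on `skipA` is the overhead that the paragraph above keeps circling back to.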

@jfbastien
Member

I'd be fine if tricky things like this were succinct in the main design, with a link to Rationale.md for more details. It sounds like what you're suggesting?

p.s.: I'm amused at jumping into the middle of Rationale.md from another part of the design. It's meta.

@kripken
Member Author

kripken commented Nov 23, 2015

Basically I'm saying I don't know how to write this in the Rationale without going into that entire huge paragraph :)

But if you think details like that make sense for Rationale, happy to add them, plus a link from here.

@jfbastien
Member

Yes, I think it would be helpful: the current document isn't clear on how we got to the well-structured control flow design we have. The few of us involved in getting there have the context, but external folks don't.

@kripken
Member Author

kripken commented Nov 23, 2015

Cool, will start a followup pull request for that now.

@sunfishcode
Member

I propose AstSemantics.md just say "can't jump into the middle of a loop" and mean it in the literal sense, rather than trying to mean it in the sense which includes all things which are semantically equivalent to it :-). If we want to write a separate compiler-writers guide, that'd be a good place to explain the various options for lowering a loop with multiple entries.

Also, ironically, the original text here was intended to try to ease the fears of compiler writers who aren't aware of the full power of labeled break, for whom emphasizing that "it's just like JS" is actually more confusing than enlightening.

@kripken
Member Author

kripken commented Nov 24, 2015

Created followup pull in #479.

@sunfishcode: Is your proposed change in the last comment for this pull, or for the followup?

@sunfishcode
Member

I was addressing this patch. I was mainly agreeing that the larger paragraph above should go somewhere else besides AstSemantics.md :-).

And concerning my other comment, from my perspective, drawing parallels to high-level language constructs in AstSemantics.md is a distraction, but I realize that perspectives will vary.

@kripken
Member Author

kripken commented Nov 24, 2015

Cool, yes, the content in that big paragraph is intended for Rationale.md, as suggested by @jfbastien. It's in #479 if you want to take a look. I think I got it better than that rambling massive paragraph here ;) but it's still not easy to summarize this stuff.

I get the point that more intuitions might be a distraction for some readers. But I think those readers likely already would understand the topic, from the rest of the spec? Whereas the parallel to high-level languages would help a large class of other compiler hackers.

@kripken
Member Author

kripken commented Nov 25, 2015

Waiting for feedback from @sunfishcode.

@sunfishcode
Member

My feedback is that I personally think it's more confusing than enlightening. My impression talking to even some people who know JS well is that it's not widely known just how theoretically powerful labeled break is, because its full power is almost never used in hand-written code (for good reason, to be sure), so drawing a parallel to JS doesn't seem to convey the right idea.

@kripken
Member Author

kripken commented Nov 30, 2015

Yes, I agree not all JavaScript devs know about labeled break. Certainly many casual devs might not. But a significant number of JavaScript (and Java) developers do; in particular, the ones writing compilers to and from JavaScript (and Java) are very likely to. And as discussed earlier, I think that's a very important audience for us.

@ghost

ghost commented Nov 30, 2015

Perhaps some of the rationale could be explained by the goal to optimize parsing and analysis and even runtime performance for simpler consumers?

For example: 'Control structures that lead to more efficient parsing and control flow analysis are clearly identified and separated from those needed to support general control flow. While the general control flow operators could be used to specify all control flow, implementations would be expected to parse them more slowly and may well not optimize them well, so runtime performance may also be slower.'

@sunfishcode
Member

@kripken Many people know how to exit from an inner loop of a nest using labeled break. However, many people I've talked to recently were not aware that labeled breaks can build arbitrary control-flow DAGs (if we treat `if (x) break L` as a single operation, etc.).
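As a small illustration of that point, the sketch below encodes a four-node DAG with edges A→B, A→C, B→C, B→D and C→D using only nested labeled blocks and conditional `break`s, with no helper variable. The function name `walkDag`, the labels `toC` and `toD`, and the two boolean parameters are hypothetical and chosen only for this example.

```javascript
// Hypothetical DAG: A -> B, A -> C, B -> C, B -> D, C -> D.
// Each labeled block ends exactly where its target node begins, so
// `break label` acts as a forward edge to that node.
function walkDag(aToC, bToD) {
  const visited = ["A"];
  toD: {
    toC: {
      if (aToC) break toC; // edge A -> C (skips B)
      visited.push("B");
      if (bToD) break toD; // edge B -> D (skips C)
      // falling out of the block is the edge B -> C
    }
    visited.push("C"); // falling out of `toD` is the edge C -> D
  }
  visited.push("D");
  return visited;
}
```

With these definitions, `walkDag(false, false)` visits A, B, C, D; `walkDag(true, false)` visits A, C, D; and `walkDag(false, true)` visits A, B, D. Larger DAGs follow the same pattern when one block is nested per forward-edge target.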

@kripken
Member Author

kripken commented Nov 30, 2015

Yes, it sounds like those people are in an intermediate stage between not knowing about labeled break and fully grokking all of its theoretical implications. But the second part of the sentence confuses me?

  1. We aren't talking about the ability to create any DAG, we are talking about what is possible to do without a helper variable, i.e., what is directly possible with labeled break - precisely what someone writing a compiler into JS would likely be aware of.
  2. The pull mentions an exact functional equivalence between labeled break and something else. This is therefore helpful intuition to anyone that knows how labeled break works, period, even if they don't have a deep understanding of the theoretical implications of that. (Of course, such a deep understanding could help them even more.)

I am quite curious to hear more about those people and their background, and what would help them better understand things. But this is starting to sound somewhat philosophical, and this pull request is just a small no-semantic-changes text clarification with several lgtms and no other objections - is it ok if I merge it (with your reservations as already noted, of course), and we'll continue the discussion separately?

@sunfishcode
Member

The original text here is talking about the ability to create any DAG, because that's one of the things that is possible to do without a helper variable. That's one of the things the original text is trying to point out, specifically to (briefly) assuage fears that wasm's structured control flow is too restricted by high-level language sensibilities. Immediately following this with a sentence likening wasm to high-level languages weakens what the original text is trying to convey.

Another concern is that WebAssembly is a low-level language in general, but its AST structure has already led some people to think of it in high-level language terms in other areas, and it isn't a very good high-level language. The more we encourage thinking about WebAssembly literally in terms of JS or "high level languages" in the spec, the more we risk diluting wasm with conflicting purposes, potentially representing no purpose well.

@kripken
Member Author

kripken commented Dec 1, 2015

I certainly don't want to weaken what the original text is trying to convey. But I don't see how it does - looks the opposite to me - so I don't have any idea how to fix it. Do you have a concrete suggestion for how I can improve this pull? Happy to iterate on that with you.

Regarding the second point, the addition here uses a specific analogy (and a 100% precise one) to explain a specific feature. It's not saying "wasm is high-level". But, if you feel we should clarify that wasm is low-level (I don't think we need to, but also I don't see the harm) then perhaps draft a separate pull request with that addition (for the FAQ maybe?) instead of opposing this one on a side issue? It also sounds like that side issue is a very big deal for you, so let's address it seriously and with the proper focus, on its own?

@rossberg
Member

rossberg commented Dec 2, 2015

I, for one, do think this PR provides a useful clarification.

@sunfishcode
Member

I have seen this list mentioned a few times around this project as something we should focus on. I think this says a lot about perspectives. I propose that a better list to think about is this list.

AstSemantics does not define or explain itself in terms of JS, and I think this is an important invariant, to encourage us to think of WebAssembly as a new language.

@sunfishcode
Member

I have been convinced to stop opposing this PR.

One comment I would add then is that labeled break is also a feature of somewhat less high-level languages such as Rust.

@kripken kripken force-pushed the elaborate-structured-control-flow branch from bdcc63a to a181137 Compare December 3, 2015 22:35
@kripken
Member Author

kripken commented Dec 3, 2015

Sounds good, I added Rust and Go which you found have labeled break as well.

Any other languages worth mentioning?

@jfbastien
Member

The more we add, the more this should be in Rationale.md. AstSemantics.md is for "this is what things are", whereas Rationale.md is for "and here's why it's this way and what that implies". Maybe move all of this text (and the preceding paragraph) to Rationale.md?

@ghost

ghost commented Dec 4, 2015

@jfbastien Personally I like to see rationale and 'implementation notes' in an annotated specification near where the issue is specified, but don't hold anything up on this account.

@kripken Common Lisp has `block` and `return-from`. E.g. `(block outer (block inner (return-from outer 1)) 2) => 1`. The specification describes it as 'a structured, lexical, non-local exit facility'.
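For readers more used to JavaScript, a rough analogue of that Lisp form can be sketched with labeled blocks. Because a JavaScript block is a statement rather than an expression, this sketch assumes a hypothetical `result` variable to carry the value; `blockDemo` is likewise an invented name.

```javascript
// Rough analogue of (block outer (block inner (return-from outer 1)) 2).
function blockDemo() {
  let result;
  outer: {
    inner: {
      result = 1;
      break outer; // plays the role of (return-from outer 1)
    }
    result = 2; // never reached, like the trailing 2 in the Lisp form
  }
  return result;
}
```

Calling `blockDemo()` returns 1, matching the Lisp result.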

@kripken
Member Author

kripken commented Dec 4, 2015

@jfbastien: not opposed, but I literally added two words and a comma :) I was hoping not to open a new discussion in this already-too-long-issue...

@kripken
Member Author

kripken commented Dec 4, 2015

@jfbastien: How about if I merge this and start a followup to move parts into Rationale+links to them?

Or do you prefer I do that in this pull?

@jfbastien
Member

sgtm

@kripken
Member Author

kripken commented Dec 8, 2015

Ok, merging this now, and will start on followup.

kripken added a commit that referenced this pull request Dec 8, 2015
…-flow

Clarify well-structured control flow
@kripken kripken merged commit 33e6a23 into master Dec 8, 2015
@jfbastien jfbastien deleted the elaborate-structured-control-flow branch December 9, 2015 11:25