Editorial: Define + use StringToNumber #1554
spec.html
Outdated
1. Assert: Type(_str_) is String.
1. Let _text_ be the sequence of Unicode code points that results from interpreting _str_ as UTF-16 encoded Unicode text as described in <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>.
1. Let _literal_ be the result of parsing _text_ using the goal symbol |StringNumericLiteral|. If _text_ does not conform to the grammar, or if any elements of _text_ were not matched by the parse, return *NaN*.
1. NOTE: The terminal symbols of the |StringNumericLiteral| grammar are all code points in the Unicode Basic Multilingual Plane (BMP). Therefore, this operation will return *NaN* if _str_ contains any <emu-xref href="#leading-surrogate"></emu-xref> or <emu-xref href="#trailing-surrogate"></emu-xref> code units, whether paired or unpaired.
I think this would be phrased better as an assertion, e.g.
1. NOTE: The terminal symbols of the |StringNumericLiteral| grammar are all code points in the Unicode Basic Multilingual Plane (BMP). Therefore, this operation will return *NaN* if _str_ contains any <emu-xref href="#leading-surrogate"></emu-xref> or <emu-xref href="#trailing-surrogate"></emu-xref> code units, whether paired or unpaired.
1. Assert: _str_ does not contain any <emu-xref href="#leading-surrogate"></emu-xref> or <emu-xref href="#trailing-surrogate"></emu-xref> code units.
I'm doubtful that that's better, but I'll change it if there's consensus that it is.
It does read much simpler to me.
Well, sure, the suggested assertion is simpler, because it doesn't contain the reasoning that's in the Note. Even simpler would be to leave out the Note/Assertion entirely. It's only there because the status quo has the Note, and I didn't want to leave anything out. I'm not sure it actually helps the reader understand the algorithm.
And actually, the day may come when it's not true, if Unicode ever defines an astral code point with General_Category=Zs, because then that will be a valid WhiteSpace. (I don't see a guarantee that they won't, though I haven't looked very hard.) And if that ever happens, the algorithm will still run fine, except that the Note/Assertion will be wrong.
So I'm coming around to thinking that the spec is better off without that Note/Assertion.
I've added a commit to remove the Note.
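For reference, the behaviour the Note described can be observed from script. This is only a minimal sketch: it assumes an engine whose `Number()` follows the current |StringNumericLiteral| grammar, and the sample strings are illustrative rather than taken from the PR.

```javascript
// Strings containing surrogate code units currently convert to NaN,
// whether the surrogates are paired or unpaired.
const paired = "\uD83D\uDE00";     // U+1F600 encoded as a surrogate pair
const unpaired = "1\uD800";        // a digit followed by a lone leading surrogate
console.log(Number(paired));       // NaN
console.log(Number(unpaired));     // NaN

// BMP white space (including General_Category=Zs code points) is skipped.
const padded = "\u00A0 42 \u2003"; // NO-BREAK SPACE, SPACE, 42, SPACE, EM SPACE
console.log(Number(padded));       // 42
```

If Unicode ever assigned an astral code point to Zs, the first two results could change for some inputs, which is the scenario raised above.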
spec.html
Outdated
1. Return the Number value for _mv_ (as specified in <emu-xref href="#sec-ecmascript-language-types-number-type"></emu-xref>).
</emu-alg>

<p>A digit is significant if it is not part of an |ExponentPart| and</p>
This could benefit from either being converted into a linkable definition or moving above the algorithm steps. It feels very similar to "the string-concatenation of…".
By "this", do you mean just the definition of significant digit?
What I mean is that I think the overall state of spec with this PR would benefit from either making a single linkable definition for "significant digit", or—failing that—continuing to define "significant digit" inside the operations that need it but moving that definition above the algorithm steps to establish context for when the concept is encountered inside them.
I agree that it would be useful to have a single linkable definition for "significant digit". However, I think it actually could be somewhat tricky to write a definition that works for all uses, so I'd prefer to have that in a PR on its own, rather than complicating this PR.
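To make the concept under discussion concrete, here is a rough sketch of counting significant digits. It assumes the definition as I understand it from the published spec (a digit outside the |ExponentPart| is significant if it is nonzero, or if it has a nonzero digit to its left and a nonzero non-exponent digit to its right). The helper name `countSignificantDigits` is hypothetical and is not spec text.

```javascript
// Hypothetical helper, not spec text: counts significant digits in a
// decimal literal string such as "0.00120300e5".
function countSignificantDigits(literal) {
  // Drop the exponent part, then keep only the digits of the mantissa.
  const mantissa = literal.split(/e/i)[0];
  const digits = mantissa.replace(/[^0-9]/g, "");
  // Leading and trailing zeros are not significant; everything from the
  // first nonzero digit to the last nonzero digit is.
  const trimmed = digits.replace(/^0+/, "").replace(/0+$/, "");
  return trimmed.length;
}

console.log(countSignificantDigits("0.00120300e5")); // 4
console.log(countSignificantDigits("100"));          // 1
console.log(countSignificantDigits("0"));            // 0
```

A single linkable definition would have to cover the same cases for every literal grammar that uses the term, which is part of why it may be tricky to write.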
cc @caiolima
Those changes LGTM.
(force-pushed to resolve conflicts after the merge of PR #1135)
... and express the latter as a proper syntax-directed operation. (This also accomplishes some of PR tc39#1554.)
(force-pushed to resolve conflicts after the merge of PR #1515)
(force-pushed to resolve merge conflict /3)
force-pushed to resolve merge conflicts /4, and:
I didn't remove the NOTE about StringNumericLiteral's terminal symbols being code points in the BMP, but I'm still thinking of doing so. StringToNumber has a potential use of ParseText(), so that could happen here or in PR #2013, whichever lands later.
force-pushed to:
spec.html
Outdated
1. If _literal_ is a List of errors, return *NaN*.
1. If _literal_ contains a |StrUnsignedDecimalLiteral| and _literal_ has more than 20 significant digits, then
  1. Let _lit_ be an implementation-defined choice of:
    * _literal_
By my reading, this option is not actually available to implementations in the previous spec text:
> the Number value may be either the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit or the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th digit position.

lists only two options, neither of which uses the MV of the actual literal.
That said, I'm almost certain that the MV of _literal_ is necessarily equal to the MV of one of the other two options in this list, given the constraints of this section. So it doesn't really matter, but by the same token it is probably confusing.
I'd say the prose is ambiguous. E.g., consider:
- "Breakfast is cereal, unless it's Saturday, in which case breakfast may be pancakes."
Here, I think it's fairly clear that Saturday breakfast could still be cereal. (Otherwise, you'd say "... is pancakes".)
- "Breakfast is cereal, unless it's Saturday, in which case breakfast may be either pancakes or waffles."
Here, it's unclear if cereal is still an option for Saturday breakfast. It's possible to read the "may be" as in the previous sentence (so cereal is an option), and yet it's also possible to read it as 'referring' only to the choice between pancakes and waffles (so cereal isn't an option).
Roughly, it's the difference between "might be" and "must be". In the spec text, it's unclear which is meant. (Note that "must be" is used earlier in the sentence, which suggests that it would have been used later if that were the intended sense, and hence that the intended sense is "might be". But that's not very convincing.) The wording goes all the way back to the first edition.
> That said, I'm almost certain that the MV of literal is necessarily equal to the MV of one of the other two options in this list, given the constraints of this section.
Yeah, me too, almost.
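The three candidates can be compared directly for a given input. The sketch below is restricted to plain integer digit strings with more than 20 digits, no leading zeros and a nonzero final digit, so that every digit is significant. The helper `candidatesAfter20` is hypothetical, and checking one sample does not prove the general claim.

```javascript
// Hypothetical helper: for a plain integer digit string with more than 20
// digits, build the two literals that the older prose describes.
function candidatesAfter20(digits) {
  const head = digits.slice(0, 20);
  const zeros = "0".repeat(digits.length - 20);
  return {
    // Each digit after the 20th replaced with 0.
    truncated: head + zeros,
    // Same, then incremented at the 20th digit position.
    incremented: (BigInt(head) + 1n).toString() + zeros,
  };
}

const literal = "123456789012345678901234567"; // 27 significant digits
const { truncated, incremented } = candidatesAfter20(literal);
const fromLiteral = Number(literal);

// Does the value for the literal itself coincide with one of the two options?
const coincides =
  fromLiteral === Number(truncated) || fromLiteral === Number(incremented);
console.log(truncated, incremented, coincides);
```

The two replacement literals differ by one unit in the 20th digit, which is far smaller than the spacing between adjacent Number values at this magnitude, so the value for the literal should land on one of them.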
Hm, I take @jmdyck's reading, that the MV of literal is currently allowed for literals with more than 20 significant digits.
The note says that "the result of ToNumber will be *NaN* if the string contains any [surrogate] code units, whether paired or unpaired." However, if Unicode were to define a non-BMP code point with General_Category=Zs, that would qualify as <USP> and thus WhiteSpace and StrWhiteSpaceChar, in which case the note would be rendered false. That is, the ToNumber procedure would continue to work, and would result in non-NaN values for some strings containing surrogates. (And if you think that the ES spec doesn't need to concern itself with such future possibilities, note that ES 6 (2015) changed/clarified the semantics of String.p.trim et al for precisely this case, to say how they would deal with non-BMP white space.) Given that the note could be invalidated by a future edition of Unicode, I think it's a bit risky to keep it in.
…9#1554) ... under "ToNumber Applied to the String Type". The phrase "non-|StrWhiteSpaceChar|" skips both WhiteSpace *and* LineTerminator code points, which is important for an example like `Number("\n-0")`. Also, insert "(if any)", because a StringNumericLiteral might not have any such code point. See tc39#1554 (comment)
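The cases this commit message mentions can be checked from script. A small sketch, using `Object.is` because `===` does not distinguish *-0* from *+0*; the inputs other than `"\n-0"` are illustrative:

```javascript
// A line terminator before the sign is skipped, so "-" is still the first
// non-StrWhiteSpaceChar code point and the result is negative zero.
const negZero = Number("\n-0");
console.log(Object.is(negZero, -0));   // true

// These strings have no non-white-space code point at all (the "if any"
// case); they convert to positive zero.
const empty = Number("");
const blank = Number(" \n\t");
console.log(Object.is(empty, 0));      // true
console.log(Object.is(blank, 0));      // true
```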
Formerly, the procedure for applying ToNumber to the String type was spread over three widely-separated prose paragraphs in two different clauses. This commit brings all that together, expresses it as an actual algorithm, and gives it the name StringToNumber. Also, the definitions of the syntax-directed operation NumericValue mostly just delegated to a prose description. This commit replaces that with actual algorithms, similar to that for StringToNumber.
force-pushed to:
Also, I added a fixup commit for a copy-paste mistake I noticed on review.
(force-pushed from 3d0c24c to 7a79833)
Introduce the abstract operation StringToNumber to replace the prose that expressed the procedure for applying ToNumber to String values.
(force-pushed to resolve conflicts arising from the merge of PR #2435)
(force-pushed to give StringToNumber a structured header)
Formerly, the procedure for applying ToNumber to the String type was spread over three widely-separated prose paragraphs in two different clauses.
This commit brings all that together, expresses it as an actual algorithm, and gives it the name StringToNumber.
Similarly, the prose underlying the syntax-directed operation NumericValue (to get the value of a NumericLiteral) is expressed more algorithmically.
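As a rough illustration of the behaviour that StringToNumber specifies, the results below are what `Number()` gives for a few String arguments. The sample inputs are assumptions chosen for illustration and do not come from the commit.

```javascript
// Each entry pairs an input string with the value Number() produces for it.
const samples = [
  ["  12  ", 12],          // surrounding white space is ignored
  ["010", 10],             // leading zeros are allowed and read as decimal
  ["0x10", 16],            // non-decimal integer literals are accepted
  ["-0x10", NaN],          // but a sign is only allowed on decimal literals
  ["12px", NaN],           // the whole string must match the grammar
  ["Infinity", Infinity],
];

for (const [input, expected] of samples) {
  const actual = Number(input);
  console.log(JSON.stringify(input), actual, Object.is(actual, expected));
}
```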