Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editorial: Define + use StringToNumber #1554

Merged
merged 2 commits into from
Jul 19, 2021
Merged

Conversation

jmdyck
Copy link
Collaborator

@jmdyck jmdyck commented May 28, 2019

Formerly, the procedure for applying ToNumber to the String type was spread over three widely-separated prose paragraphs in two different clauses.

This commit brings all that together, expresses it as an actual algorithm, and gives it the name StringToNumber.

Similarly, the prose underlying the syntax-directed operation NumericValue (to get the value of a NumericLiteral) is expressed more algorithmically.

spec.html Outdated
1. Assert: Type(_str_) is String.
1. Let _text_ be the sequence of Unicode code points that results from interpreting _str_ as UTF-16 encoded Unicode text as described in <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>.
1. Let _literal_ be the result of parsing _text_ using the goal symbol |StringNumericLiteral|. If _text_ does not conform to the grammar, or if any elements of _text_ were not matched by the parse, return *NaN*.
1. NOTE: The terminal symbols of the |StringNumericLiteral| grammar are all code points in the Unicode Basic Multilingual Plane (BMP). Therefore, this operation will return *NaN* if _str_ contains any <emu-xref href="#leading-surrogate"></emu-xref> or <emu-xref href="#trailing-surrogate"></emu-xref> code units, whether paired or unpaired.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be phrased better as an assertion, e.g.

Suggested change
1. NOTE: The terminal symbols of the |StringNumericLiteral| grammar are all code points in the Unicode Basic Multilingual Plane (BMP). Therefore, this operation will return *NaN* if _str_ contains any <emu-xref href="#leading-surrogate"></emu-xref> or <emu-xref href="#trailing-surrogate"></emu-xref> code units, whether paired or unpaired.
1. Assert: _str_ does not contain any <emu-xref href="#leading-surrogate"></emu-xref> or <emu-xref href="#trailing-surrogate"></emu-xref> code units.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doubtful that that's better, but I'll change it if there's consensus that it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does read much simpler to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, sure, the suggested assertion is simpler, because it doesn't contain the reasoning that's in the Note. Even simpler would be to leave out the Note/Assertion entirely. It's only there because the status quo has the Note, and I didn't want to leave anything out. I'm not sure it actually helps the reader understand the algorithm.

And actually, the day may come when it's not true, if Unicode ever defines an astral code point with General_Category=Zs, because then that will be a valid WhiteSpace. (I don't see a guarantee that they won't, though I haven't looked very hard.) And if that ever happens, the algorithm will still run fine, except that the Note/Assertion will be wrong.

So I'm coming around to thinking that the spec is better off without that Note/Assertion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a commit to remove the Note.

spec.html Outdated
1. Return the Number value for _mv_ (as specified in <emu-xref href="#sec-ecmascript-language-types-number-type"></emu-xref>).
</emu-alg>

<p>A digit is significant if it is not part of an |ExponentPart| and</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could benefit from either being converted into a linkable definition or moving above the algorithm steps. It feels very similar to "the string-concatenation of…".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "this", do you mean just the definition of significant digit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that I think the overall state of spec with this PR would benefit from either making a single linkable definition for "significant digit", or—failing that—continuing to define "significant digit" inside the operations that need it but moving that definition above the algorithm steps to establish context for when the concept is encountered inside them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be useful to have a single linkable definition for "significant digit". However, I think it actually could be somewhat tricky to write a definition that works for all uses, so I'd prefer to have that in a PR on its own, rather than complicating this PR.

@littledan
Copy link
Member

cc @caiolima

@caiolima
Copy link
Contributor

Those changes LGTM.

@ljharb ljharb requested a review from caiolima June 25, 2019 23:33
@jmdyck
Copy link
Collaborator Author

jmdyck commented Jul 5, 2019

(force-pushed to resolve conflicts after the merge of PR #1135)

jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Sep 12, 2019
... and express the latter as a proper syntax-directed operation.

(This also accomplishes some of PR tc39#1554.)
caiolima pushed a commit to caiolima/ecma262 that referenced this pull request Sep 17, 2019
... and express the latter as a proper syntax-directed operation.

(This also accomplishes some of PR tc39#1554.)
@jmdyck
Copy link
Collaborator Author

jmdyck commented Oct 14, 2019

(force-pushed to resolve conflicts after the merge of PR #1515)

@jmdyck jmdyck changed the title Editorial: Define + use StringToNumber and NumericValue Editorial: Define + use StringToNumber, and better define NumericValue Oct 14, 2019
@jmdyck
Copy link
Collaborator Author

jmdyck commented Oct 24, 2019

(force-pushed to resolve merge conflict /3)

@ljharb ljharb requested a review from syg October 24, 2019 21:54
@jmdyck
Copy link
Collaborator Author

jmdyck commented Oct 16, 2020

forced-pushed to resolve merge conflicts /4, and:

  • give StringToNumber() a standard preamble;
  • use 𝔽() and StringToCodePoints(); and
  • tweak some algorithm steps.

I didn't remove the NOTE about StringNumericLiteral's terminal symbols being code points in the BMP, but I'm still thinking of doing so.

StringToNumber has a potential use of ParseText(), so that could happen here or in PR #2013, whichever lands later.

@jmdyck
Copy link
Collaborator Author

jmdyck commented Dec 2, 2020

forced-pushed to:

spec.html Outdated Show resolved Hide resolved
spec.html Outdated Show resolved Hide resolved
spec.html Outdated Show resolved Hide resolved
spec.html Outdated
1. If _literal_ is a List of errors, return *NaN*.
1. If _literal_ contains a |StrUnsignedDecimalLiteral| and _literal_ has more than 20 significant digits, then
1. Let _lit_ be an implementation-defined choice of:
* _literal_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By my reading, this option is not actually available to implementations in the previous spec text:

the Number value may be either the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit or the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th digit position.

lists only two options, neither of which is using the MV of the actual literal.

That said, I'm almost certain that the MV of _literal_ is necessarily equal to the MV of one of the other two options in this list, given the constraints of this section. So it doesn't really matter, but by the same token is probably confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say the prose is ambiguous. E.g., consider:

  • "Breakfast is cereal, unless it's Saturday, in which case breakfast may be pancakes."

Here, I think it's fairly clear that Saturday breakfast could still be cereal. (Otherwise, you'd say "... is pancakes".)

  • "Breakfast is cereal, unless it's Saturday, in which case breakfast may be either pancakes or waffles."

Here, it's unclear if cereal is still an option for Saturday breakfast. It's possible to read the "may be" as in the previous sentence (so cereal is an option), and yet it's also possible to read it as 'referring' only to the choice between pancakes and waffles (so cereal isn't an option).

Roughly, it's the difference between "might be" and "must be". In the spec text, it's unclear which is meant. (Note that "must be" is used earlier in the sentence, which suggests that it would have been used later if that were the appropriate sense, and hence that it's not the appropriate sense, "might be" is. But that's not very convincing.) The wording goes all the way back to the first edition.

That said, I'm almost certain that the MV of literal is necessarily equal to the MV of one of the other two options in this list, given the constraints of this section.

Yeah, me too, almost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I take @jmdyck's reading, that the MV of literal is currently allowed for literals with more than 20 significant digits.

@bakkot bakkot added the editor call to be discussed in the next editor call label Dec 13, 2020
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jun 12, 2021
The note says that "the result of ToNumber will be *NaN*
if the string contains any [surrogate] code units,
whether paired or unpaired."

However, if Unicode were to define a non-BMP code point
with General_Category=Zs, that would qualify as <USP>
and thus WhiteSpace and StrWhiteSpaceChar, in which case
the note would be rendered false. That is, the ToNumber
procedure would continue to work, and would result in
non-NaN values for some strings containing surrogates.

(And if you think that the ES spec doesn't need to concern itself
with such future possibilities, note that ES 6 (2015) changed/clarified
the semantics of String.p.trim et al for precisely this case,
to say how they would deal with non-BMP white space.)

Given that the note could be invalidated by a future
edition of Unicode, I think it's a bit risky to keep it in.
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jun 12, 2021
…9#1554)

... under "ToNumber Applied to the String Type".

The phrase "non-|StrWhiteSpaceChar|" skips both WhiteSpace
*and* LineTerminator code points, which is important
for an example like `Number("\n-0")`.

Also, insert "(if any)", because a StringNumericLiteral
might not have any such code point.

See tc39#1554 (comment)
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jun 12, 2021
Formerly, the procedure for applying ToNumber to the String type
was spread over three widely-separated prose paragraphs
in two different clauses.

This commit brings all that together, expresses it as an actual algorithm,
and gives it the name StringToNumber.

Also, the definitions of the syntax-direction operation
NumericValue mostly just delegated to a prose description.
This commit replaces that with actual algorithms,
similar to that for StringToNumber.
@jmdyck
Copy link
Collaborator Author

jmdyck commented Jun 12, 2021

force-pushed to:

Also, I added a fixup commit for a copy-paste mistake I noticed on review.

@ljharb ljharb force-pushed the master branch 3 times, most recently from 3d0c24c to 7a79833 Compare June 29, 2021 02:21
@bakkot bakkot removed the editor call to be discussed in the next editor call label Jun 30, 2021
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jul 11, 2021
The note says that "the result of ToNumber will be *NaN*
if the string contains any [surrogate] code units,
whether paired or unpaired."

However, if Unicode were to define a non-BMP code point
with General_Category=Zs, that would qualify as <USP>
and thus WhiteSpace and StrWhiteSpaceChar, in which case
the note would be rendered false. That is, the ToNumber
procedure would continue to work, and would result in
non-NaN values for some strings containing surrogates.

(And if you think that the ES spec doesn't need to concern itself
with such future possibilities, note that ES 6 (2015) changed/clarified
the semantics of String.p.trim et al for precisely this case,
to say how they would deal with non-BMP white space.)

Given that the note could be invalidated by a future
edition of Unicode, I think it's a bit risky to keep it in.
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jul 11, 2021
Introduce the abstract operation StringToNumber
to replace the prose that expressed the procedure
for applying ToNumber to String values.
@jmdyck
Copy link
Collaborator Author

jmdyck commented Jul 11, 2021

(force-pushed to resolve conflicts arising from the merge of PR #2435)

@michaelficarra michaelficarra added ready to merge Editors believe this PR needs no further reviews, and is ready to land. and removed ready to merge Editors believe this PR needs no further reviews, and is ready to land. labels Jul 15, 2021
@jmdyck jmdyck changed the title Editorial: Define + use StringToNumber, and better define NumericValue Editorial: Define + use StringToNumber Jul 15, 2021
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jul 18, 2021
The note says that "the result of ToNumber will be *NaN*
if the string contains any [surrogate] code units,
whether paired or unpaired."

However, if Unicode were to define a non-BMP code point
with General_Category=Zs, that would qualify as <USP>
and thus WhiteSpace and StrWhiteSpaceChar, in which case
the note would be rendered false. That is, the ToNumber
procedure would continue to work, and would result in
non-NaN values for some strings containing surrogates.

(And if you think that the ES spec doesn't need to concern itself
with such future possibilities, note that ES 6 (2015) changed/clarified
the semantics of String.p.trim et al for precisely this case,
to say how they would deal with non-BMP white space.)

Given that the note could be invalidated by a future
edition of Unicode, I think it's a bit risky to keep it in.
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jul 18, 2021
Introduce the abstract operation StringToNumber
to replace the prose that expressed the procedure
for applying ToNumber to String values.
@jmdyck
Copy link
Collaborator Author

jmdyck commented Jul 18, 2021

(force-pushed to give StringToNumber a structured header)

@michaelficarra michaelficarra added the ready to merge Editors believe this PR needs no further reviews, and is ready to land. label Jul 19, 2021
jmdyck added 2 commits July 19, 2021 11:22
The note says that "the result of ToNumber will be *NaN*
if the string contains any [surrogate] code units,
whether paired or unpaired."

However, if Unicode were to define a non-BMP code point
with General_Category=Zs, that would qualify as <USP>
and thus WhiteSpace and StrWhiteSpaceChar, in which case
the note would be rendered false. That is, the ToNumber
procedure would continue to work, and would result in
non-NaN values for some strings containing surrogates.

(And if you think that the ES spec doesn't need to concern itself
with such future possibilities, note that ES 6 (2015) changed/clarified
the semantics of String.p.trim et al for precisely this case,
to say how they would deal with non-BMP white space.)

Given that the note could be invalidated by a future
edition of Unicode, I think it's a bit risky to keep it in.
Introduce the abstract operation StringToNumber
to replace the prose that expressed the procedure
for applying ToNumber to String values.
@ljharb ljharb merged commit 37d6204 into tc39:master Jul 19, 2021
@jmdyck jmdyck deleted the NumericValue branch July 20, 2021 03:34
mathiasbynens pushed a commit to mathiasbynens/ecma262 that referenced this pull request Oct 18, 2021
The note says that "the result of ToNumber will be *NaN*
if the string contains any [surrogate] code units,
whether paired or unpaired."

However, if Unicode were to define a non-BMP code point
with General_Category=Zs, that would qualify as <USP>
and thus WhiteSpace and StrWhiteSpaceChar, in which case
the note would be rendered false. That is, the ToNumber
procedure would continue to work, and would result in
non-NaN values for some strings containing surrogates.

(And if you think that the ES spec doesn't need to concern itself
with such future possibilities, note that ES 6 (2015) changed/clarified
the semantics of String.p.trim et al for precisely this case,
to say how they would deal with non-BMP white space.)

Given that the note could be invalidated by a future
edition of Unicode, I think it's a bit risky to keep it in.
mathiasbynens pushed a commit to mathiasbynens/ecma262 that referenced this pull request Oct 18, 2021
Introduce the abstract operation StringToNumber
to replace the prose that expressed the procedure
for applying ToNumber to String values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
editorial change ready to merge Editors believe this PR needs no further reviews, and is ready to land.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants