Editorial: Define + use StringToNumber #1554

jmdyck · 2019-05-28T03:19:22Z

Formerly, the procedure for applying ToNumber to the String type was spread over three widely-separated prose paragraphs in two different clauses.

This commit brings all that together, expresses it as an actual algorithm, and gives it the name StringToNumber.

Similarly, the prose underlying the syntax-directed operation NumericValue (to get the value of a NumericLiteral) is expressed more algorithmically.

gibson042 · 2019-05-29T19:20:42Z

spec.html

+            1. Assert: Type(_str_) is String.
+            1. Let _text_ be the sequence of Unicode code points that results from interpreting _str_ as UTF-16 encoded Unicode text as described in <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>.
+            1. Let _literal_ be the result of parsing _text_ using the goal symbol |StringNumericLiteral|. If _text_ does not conform to the grammar, or if any elements of _text_ were not matched by the parse, return *NaN*.
+            1. NOTE: The terminal symbols of the |StringNumericLiteral| grammar are all code points in the Unicode Basic Multilingual Plane (BMP). Therefore, this operation will return *NaN* if _str_ contains any <emu-xref href="#leading-surrogate"></emu-xref> or <emu-xref href="#trailing-surrogate"></emu-xref> code units, whether paired or unpaired.


I think this would be phrased better as an assertion, e.g.

Suggested change

-            1. NOTE: The terminal symbols of the |StringNumericLiteral| grammar are all code points in the Unicode Basic Multilingual Plane (BMP). Therefore, this operation will return *NaN* if _str_ contains any <emu-xref href="#leading-surrogate"></emu-xref> or <emu-xref href="#trailing-surrogate"></emu-xref> code units, whether paired or unpaired.
+            1. Assert: _str_ does not contain any <emu-xref href="#leading-surrogate"></emu-xref> or <emu-xref href="#trailing-surrogate"></emu-xref> code units.

jmdyck · 2019-05-29

I'm doubtful that that's better, but I'll change it if there's consensus that it is.

ljharb · 2019-06-26

It does read much simpler to me.

jmdyck · 2019-06-26

Well, sure, the suggested assertion is simpler, because it doesn't contain the reasoning that's in the Note. Even simpler would be to leave out the Note/Assertion entirely. It's only there because the status quo has the Note, and I didn't want to leave anything out. I'm not sure it actually helps the reader understand the algorithm.

And actually, the day may come when it's not true, if Unicode ever defines an astral code point with General_Category=Zs, because then that will be a valid WhiteSpace. (I don't see a guarantee that they won't, though I haven't looked very hard.) And if that ever happens, the algorithm will still run fine, except that the Note/Assertion will be wrong.

So I'm coming around to thinking that the spec is better off without that Note/Assertion.

jmdyck · 2020-12-02

I've added a commit to remove the Note.
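To make the behavior described by the Note concrete, here is a small JavaScript sketch that passes strings containing paired and unpaired surrogate code units to `Number`. The sample strings are only illustrative, and the result relies on the assumption discussed above: that the Unicode version in use defines no white space code point outside the BMP.

```javascript
// Illustrative inputs only; each contains at least one surrogate code unit.
const samples = [
  "\uD83D\uDE00",  // a surrogate pair (U+1F600)
  "1\uD800",       // a digit followed by an unpaired leading surrogate
  "\uDC00" + "42", // an unpaired trailing surrogate before digits
];

// None of these match StringNumericLiteral, so each conversion yields NaN
// (as long as no astral white space code point exists).
const results = samples.map((s) => Number(s));
console.log(results);

// A string made only of BMP white space and digits converts normally.
const plain = Number(" 42 ");
console.log(plain);
```

If Unicode ever added an astral code point with General_Category=Zs, the first group of results could change for strings that use it as padding, which is the risk raised in this thread.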

gibson042 · 2019-05-29T19:42:29Z

spec.html

+            1. Return the Number value for _mv_ (as specified in <emu-xref href="#sec-ecmascript-language-types-number-type"></emu-xref>).
+          </emu-alg>
+
+          <p>A digit is significant if it is not part of an |ExponentPart| and</p>


This could benefit from either being converted into a linkable definition or moving above the algorithm steps. It feels very similar to "the string-concatenation of…".

jmdyck · 2019-05-29

By "this", do you mean just the definition of significant digit?

gibson042 · 2019-05-29

What I mean is that I think the overall state of spec with this PR would benefit from either making a single linkable definition for "significant digit", or (failing that) continuing to define "significant digit" inside the operations that need it but moving that definition above the algorithm steps to establish context for when the concept is encountered inside them.

jmdyck · 2019-06-26

I agree that it would be useful to have a single linkable definition for "significant digit". However, I think it actually could be somewhat tricky to write a definition that works for all uses, so I'd prefer to have that in a PR on its own, rather than complicating this PR.
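The diff context quoted in this thread cuts the definition off after "and". As a rough sketch only, assuming the definition continues as in the published spec text (a digit outside the ExponentPart is significant if it is nonzero, or if it is a zero with a nonzero digit to its left and a nonzero digit, not in the ExponentPart, to its right), counting significant digits could look like the following. `countSignificantDigits` is a hypothetical helper, not something from the spec or this PR, and it only handles unsigned decimal literals.

```javascript
// Hypothetical helper for illustration; not part of the spec or this PR.
// Assumes `literal` is an unsigned decimal literal such as "0.00120e5".
function countSignificantDigits(literal) {
  // Drop the ExponentPart: its digits are never significant.
  const mantissa = literal.split(/e/i)[0];
  // Keep only the digits, discarding the decimal point.
  const digits = mantissa.replace(/[^0-9]/g, "");
  const first = digits.search(/[1-9]/);
  if (first === -1) return 0; // all zeros: no significant digits
  // Find the index of the last nonzero digit.
  let last = digits.length - 1;
  while (digits[last] === "0") last--;
  // Every digit from the first to the last nonzero digit is significant.
  return last - first + 1;
}

console.log(countSignificantDigits("0.00120e5"));
console.log(countSignificantDigits("1002"));
```

Under that assumed definition, leading and trailing zeros are not significant, while zeros enclosed by nonzero digits are.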

littledan · 2019-06-25T06:39:50Z

cc @caiolima

caiolima · 2019-06-25T21:43:24Z

Those changes LGTM.

jmdyck · 2019-07-05T00:35:35Z

(force-pushed to resolve conflicts after the merge of PR #1135)

... and express the latter as a proper syntax-directed operation. (This also accomplishes some of PR tc39#1554.)

jmdyck · 2019-10-14T01:50:34Z

(force-pushed to resolve conflicts after the merge of PR #1515)

jmdyck · 2019-10-24T00:36:40Z

(force-pushed to resolve merge conflict /3)

jmdyck · 2020-10-16T04:26:14Z

force-pushed to resolve merge conflicts /4, and:

- give StringToNumber() a standard preamble;
- use 𝔽() and StringToCodePoints(); and
- tweak some algorithm steps.

I didn't remove the NOTE about StringNumericLiteral's terminal symbols being code points in the BMP, but I'm still thinking of doing so.

StringToNumber has a potential use of ParseText(), so that could happen here or in PR #2013, whichever lands later.

jmdyck · 2020-12-02T22:29:01Z

force-pushed to:

- rebase to master,
- remove the Note re surrogates, and
- use ParseText in StringToNumber (now that PR #2013, "Editorial: Extract operation ParseText", has landed).

bakkot · 2020-12-03T05:15:45Z

spec.html

+            1. If _literal_ is a List of errors, return *NaN*.
+            1. If _literal_ contains a |StrUnsignedDecimalLiteral| and _literal_ has more than 20 significant digits, then
+              1. Let _lit_ be an implementation-defined choice of:
+                * _literal_


By my reading, this option is not actually available to implementations in the previous spec text:

the Number value may be either the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit or the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th digit position.

lists only two options, neither of which is using the MV of the actual literal.

That said, I'm almost certain that the MV of _literal_ is necessarily equal to the MV of one of the other two options in this list, given the constraints of this section. So it doesn't really matter, but by the same token is probably confusing.

jmdyck · 2020-12-03

I'd say the prose is ambiguous. E.g., consider:

"Breakfast is cereal, unless it's Saturday, in which case breakfast may be pancakes."

Here, I think it's fairly clear that Saturday breakfast could still be cereal. (Otherwise, you'd say "... is pancakes".)

"Breakfast is cereal, unless it's Saturday, in which case breakfast may be either pancakes or waffles."

Here, it's unclear if cereal is still an option for Saturday breakfast. It's possible to read the "may be" as in the previous sentence (so cereal is an option), and yet it's also possible to read it as 'referring' only to the choice between pancakes and waffles (so cereal isn't an option).

Roughly, it's the difference between "might be" and "must be". In the spec text, it's unclear which is meant. (Note that "must be" is used earlier in the sentence, which suggests that it would have been used later if that were the appropriate sense, and hence that it's not the appropriate sense, "might be" is. But that's not very convincing.) The wording goes all the way back to the first edition.

> That said, I'm almost certain that the MV of literal is necessarily equal to the MV of one of the other two options in this list, given the constraints of this section.

Yeah, me too, almost.

syg · 2020-12-16

Hm, I take @jmdyck's reading, that the MV of literal is currently allowed for literals with more than 20 significant digits.
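For a concrete feel of the two options in the old prose, the sketch below builds both candidate literals for one sample input and compares their Number values with that of the original literal. `candidatesAfter20Digits` is a hypothetical helper, not anything from the spec or this PR, and it assumes an unsigned decimal literal with no exponent whose first and last digits are nonzero, so that every digit is significant.

```javascript
// Hypothetical helper for illustration; not taken from the spec or this PR.
// Assumes `literal` has no sign or exponent and that its first and last
// digits are nonzero, so every digit in it is significant.
function candidatesAfter20Digits(literal) {
  const chars = [...literal];
  // Positions of the digits within the literal (skipping the ".").
  const digitAt = [];
  chars.forEach((ch, i) => {
    if (ch !== ".") digitAt.push(i);
  });
  if (digitAt.length <= 20) return null; // the leeway does not apply

  // Option 1: replace each significant digit after the 20th with 0.
  const truncated = chars.slice();
  for (const i of digitAt.slice(20)) truncated[i] = "0";

  // Option 2: same, then increment at the 20th digit, carrying leftwards.
  const incremented = truncated.slice();
  let k = 19;
  while (k >= 0 && incremented[digitAt[k]] === "9") {
    incremented[digitAt[k]] = "0";
    k--;
  }
  if (k >= 0) {
    incremented[digitAt[k]] = String(Number(incremented[digitAt[k]]) + 1);
  } else {
    incremented.unshift("1"); // carry out of the leading digit
  }
  return { truncated: truncated.join(""), incremented: incremented.join("") };
}

const literal = "1.2345678901234567890123456789";
const { truncated, incremented } = candidatesAfter20Digits(literal);
const values = [literal, truncated, incremented].map(Number);
console.log(truncated, incremented, values);
```

For this input, the Number value of the original literal matches that of at least one candidate, which is consistent with the "almost certain" observation above. A single example is of course not a proof.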

The note says that "the result of ToNumber will be *NaN* if the string contains any [surrogate] code units, whether paired or unpaired." However, if Unicode were to define a non-BMP code point with General_Category=Zs, that would qualify as <USP> and thus WhiteSpace and StrWhiteSpaceChar, in which case the note would be rendered false. That is, the ToNumber procedure would continue to work, and would result in non-NaN values for some strings containing surrogates. (And if you think that the ES spec doesn't need to concern itself with such future possibilities, note that ES 6 (2015) changed/clarified the semantics of String.p.trim et al for precisely this case, to say how they would deal with non-BMP white space.) Given that the note could be invalidated by a future edition of Unicode, I think it's a bit risky to keep it in.

…9#1554) ... under "ToNumber Applied to the String Type". The phrase "non-|StrWhiteSpaceChar|" skips both WhiteSpace *and* LineTerminator code points, which is important for an example like `Number("\n-0")`. Also, insert "(if any)", because a StringNumericLiteral might not have any such code point. See tc39#1554 (comment)
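The `Number("\n-0")` case mentioned in that commit message can be checked directly. This is only a small sketch: `Object.is` is used because `===` does not distinguish -0 from +0, and the second and third inputs are extra illustrative strings, not taken from the commit.

```javascript
// Leading LineTerminator and WhiteSpace code points are both
// StrWhiteSpaceChar, so the first code point that counts here is "-".
const negZero = Number("\n-0");
console.log(Object.is(negZero, -0)); // true

// Extra illustrative inputs (assumed examples, not from the commit).
const alsoNegZero = Number(" \t\n-0.0 ");
const posZero = Number("\n0");
console.log(Object.is(alsoNegZero, -0), Object.is(posZero, 0));
```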

Formerly, the procedure for applying ToNumber to the String type was spread over three widely-separated prose paragraphs in two different clauses. This commit brings all that together, expresses it as an actual algorithm, and gives it the name StringToNumber. Also, the definitions of the syntax-directed operation NumericValue mostly just delegated to a prose description. This commit replaces that with actual algorithms, similar to that for StringToNumber.

jmdyck · 2021-06-12T15:29:16Z

force-pushed to:

- rebase to master
- add "(Editorial: Define + use StringToNumber #1554)" to commit msgs

Also, I added a fixup commit for a copy-paste mistake I noticed on review.

Introduce the abstract operation StringToNumber to replace the prose that expressed the procedure for applying ToNumber to String values.

jmdyck · 2021-07-11T04:26:41Z

(force-pushed to resolve conflicts arising from the merge of PR #2435)

jmdyck · 2021-07-18T15:30:08Z

(force-pushed to give StringToNumber a structured header)

jmdyck added the editorial change label May 28, 2019

ljharb requested review from littledan, allenwb and waldemarhorwat May 28, 2019 05:30

gibson042 reviewed May 29, 2019

ljharb requested a review from caiolima June 25, 2019 23:33

caiolima approved these changes Jun 26, 2019

jmdyck force-pushed the NumericValue branch from 68d8da5 to 686c613 Compare July 5, 2019 00:35

jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Sep 12, 2019

Merge "Number Value" and "BigInt Value" into "NumericValue"

2349b79

... and express the latter as a proper syntax-directed operation. (This also accomplishes some of PR tc39#1554.)

caiolima pushed a commit to caiolima/ecma262 that referenced this pull request Sep 17, 2019

Merge "Number Value" and "BigInt Value" into "NumericValue"

22bb09c

... and express the latter as a proper syntax-directed operation. (This also accomplishes some of PR tc39#1554.)

jmdyck force-pushed the NumericValue branch from 686c613 to 4117629 Compare October 14, 2019 01:49

jmdyck changed the title ~~Editorial: Define + use StringToNumber and NumericValue~~ Editorial: Define + use StringToNumber, and better define NumericValue Oct 14, 2019

jmdyck force-pushed the NumericValue branch from 4117629 to 86ded1d Compare October 24, 2019 00:35

ljharb requested a review from syg October 24, 2019 21:54

jmdyck force-pushed the NumericValue branch from 86ded1d to 6e26396 Compare October 16, 2020 04:25

jmdyck force-pushed the NumericValue branch from 6e26396 to 2e95686 Compare December 2, 2020 22:28

bakkot reviewed Dec 3, 2020

bakkot reviewed Dec 3, 2020

jmdyck force-pushed the NumericValue branch from 2e95686 to cef0d3a Compare December 3, 2020 02:16

ljharb reviewed Dec 3, 2020

jmdyck force-pushed the NumericValue branch from cef0d3a to 8124e2e Compare December 3, 2020 03:17

bakkot reviewed Dec 3, 2020

bakkot added the editor call to be discussed in the next editor call label Dec 13, 2020

jmdyck force-pushed the NumericValue branch from 7635f3d to d827daf Compare June 12, 2021 15:27

bakkot mentioned this pull request Jun 14, 2021

Editorial: extract StringNumericValue from MV and add/use RoundStringMVResult helper #2435

Merged

ljharb force-pushed the master branch 3 times, most recently from 3d0c24c to 7a79833 Compare June 29, 2021 02:21

bakkot removed the editor call to be discussed in the next editor call label Jun 30, 2021

jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jul 11, 2021

Editorial: Define + use StringToNumber (tc39#1554)

5319884

Introduce the abstract operation StringToNumber to replace the prose that expressed the procedure for applying ToNumber to String values.

jmdyck force-pushed the NumericValue branch from d827daf to 5319884 Compare July 11, 2021 04:25

michaelficarra approved these changes Jul 15, 2021

michaelficarra added ready to merge Editors believe this PR needs no further reviews, and is ready to land. and removed ready to merge Editors believe this PR needs no further reviews, and is ready to land. labels Jul 15, 2021

jmdyck changed the title ~~Editorial: Define + use StringToNumber, and better define NumericValue~~ Editorial: Define + use StringToNumber Jul 15, 2021

jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jul 18, 2021

Editorial: Define + use StringToNumber (tc39#1554)

4563151

Introduce the abstract operation StringToNumber to replace the prose that expressed the procedure for applying ToNumber to String values.

jmdyck force-pushed the NumericValue branch from 5319884 to 4563151 Compare July 18, 2021 15:28

michaelficarra added the ready to merge Editors believe this PR needs no further reviews, and is ready to land. label Jul 19, 2021

jmdyck added 2 commits July 19, 2021 11:22

Editorial: Define + use StringToNumber (tc39#1554)

37d6204

Introduce the abstract operation StringToNumber to replace the prose that expressed the procedure for applying ToNumber to String values.

ljharb force-pushed the NumericValue branch from 4563151 to 37d6204 Compare July 19, 2021 18:23

ljharb merged commit 37d6204 into tc39:master Jul 19, 2021

jmdyck deleted the NumericValue branch July 20, 2021 03:34
