Adding ' ' flag to string.format & double rounding method #3283

vsky279 · 2020-09-14T20:24:59Z

This PR is for the dev branch rather than for master.
This PR is compliant with the other contributing guidelines as well (if not, please describe why).
I have thoroughly tested my contribution.
The code changes are reflected in the documentation at docs/*.

This PR has two commits for the time being. These are to be squashed before merger.

The first commit implements ' ' flag to string format (%f, %g, %e)

It passes well the test performed by @mk-pmb in #3265 - this script.

Also it removes a part of code non compatible with Ubuntu's Lua 5.1 as well as Lua 5.3.
The code is removing the - sign for negative zero. "%.3f"%-0.0004 should indeed give -0.000.

This test gives same results in NodeMCU Lua as well as in Ubuntu:

for i=-1,1 do print(("%.3f\t%+.3f\t% .3f"):format(0.0004*i, 0.0004*i, 0.0004*i)) end
-0.000	-0.000	-0.000
0.000	+0.000	 0.000
0.000	+0.000	 0.000

The second commit deals with intuitively wrong rounding

As discussed in #3265 "%.3f"%3.1415 gives 3.141 (because of its float representation). Mathematically this is correct behavior and is not a bug per se while user would expect different behavior.
I am proposing this workaround which I think is in line with the existing logic. Note that =3.1415 prints 3.1415 and not 3.1414999. So when it is detected that such number would print this way the least significant bit of float mantise is added leading to "correct" rounding.

This is the test it passes well

for i=1,20 do d=(3.1405+i*0.001); print(("%.3f\t%7f\t%.20f"):format(d, d, d)) end
3.142	3.141500	3.14149999618530273438
3.143	3.142500	3.14250016212463378906
3.144	3.143500	3.14350008964538574219
3.145	3.144500	3.14450001716613769531
3.146	3.145500	3.14550018310546875000
3.147	3.146500	3.14650011062622070313
3.148	3.147500	3.14750003814697265625
3.149	3.148500	3.14849996566772460938
3.150	3.149500	3.14950013160705566406
3.151	3.150500	3.15050005912780761719
3.152	3.151500	3.15149998664855957031
3.153	3.152500	3.15250015258789062500
3.154	3.153500	3.15350008010864257813
3.155	3.154500	3.15450000762939453125
3.156	3.155500	3.15550017356872558594
3.157	3.156500	3.15650010108947753906
3.158	3.157500	3.15750002861022949219
3.159	3.158500	3.15849995613098144531
3.160	3.159500	3.15950012207031250000
3.161	3.160500	3.16050004959106445313

The result is the same as on Ubuntu Lua except for %.20f of course as 64bit doubles are used.
It works well also for %e format

for i=1,10 do d=-(3.1405+i*0.001)*1e-5; print(("%.3e\t%7e\t%.20e"):format(d, d, d)) end
-3.142e-05	-3.141500e-05	-3.14149983751121908426e-05
-3.143e-05	-3.142500e-05	-3.14249991788528859615e-05
-3.144e-05	-3.143500e-05	-3.14349999825935810804e-05
-3.145e-05	-3.144500e-05	-3.14450007863342761993e-05
-3.146e-05	-3.145500e-05	-3.14550015900749713182e-05
-3.147e-05	-3.146500e-05	-3.14649987558368593454e-05
-3.148e-05	-3.147500e-05	-3.14749995595775544643e-05
-3.149e-05	-3.148500e-05	-3.14850003633182495832e-05
-3.150e-05	-3.149500e-05	-3.14950011670589447021e-05
-3.151e-05	-3.150500e-05	-3.15049983328208327293e-05

pjsg · 2020-09-14T20:35:24Z

I don't think that I like 3.1414999 rounding to 3.142

Note that I would expect 0.146 to round to 0.15 (for 2 dp) and to 0.1 (for 1 dp).

vsky279 · 2020-09-14T20:38:28Z

Do you like rounding 3.1415 to 3.141 (for 3 dp)?
="%.3f"%3.1414999 of course rounds correctly to 3.141.

The issue is that 3.1415 is represented as 3.141499996185303 thus "wrong" rounding.

 ="%.2f"%0.146
0.15
="%.1f"%0.146
0.1

pjsg · 2020-09-14T20:50:39Z

If there isn't a representation for the value 3.1415, then you have to use the closest representation -- and since it is less than 3.141500000000, then it should round to 3.141.

vsky279 · 2020-09-14T20:57:45Z

...which is not very intuitive especially when print(3.1415) prints 3.1415 (and hides the real representation).

vsky279 · 2020-09-15T06:42:12Z

@pjsg Philip, please note, that this workaround should be compatible with your PR #3225. It is efficient only when the number is float for doubles it will not be triggered. It is checked with the !(dbl_u.m_long & 0x1fffffff) condition.

mk-pmb · 2020-09-15T18:47:14Z

Your patches seem to have lines with trailing whitespace.
After applying them, my firmware repo is at branch dev, commit a06809d "String format %f, %g, %e intuitively incorrect rounding workaround" with recent history:

0136fcb Adding ' ' flag to string.format
139af0c Handle tcp fragmentation and also fix XSS problem. (Handle tcp fragmentation and also fix XSS problem. #3275)
c000a08 Fix problem reading files a multiple of 128 bytes long (Fix problem reading files a multiple of 128 bytes long #3280)
64bbf00 Adding '+' flag to string.format
4d78479 Add deprecation note to node.flashreload (Add deprecation note to node.flashreload #3273)

and this firmware was buiilt from it.

With your patches, numfmt.lua now prints the same on ESP8266 as on Ubuntu, both for 3.1410 and 3.1415 (log).

pjsg · 2020-09-16T00:40:37Z

This code does weird stuff if the bottom 29 bits of the float are zero -- this will mean that printing increasing double values will not result in increasing numbers being displayed (in the double version of the build).

These are the values printed by C for floats close to 3.1415

  Hex       %.1f   %.2f   %.3f    %.4f     %.5f      %.6f       %.7f        %.8f         %.9f          
0x40490e4c:  3.1   3.14   3.141   3.1415   3.14150   3.141498   3.1414976   3.14149761   3.141497612   
0x40490e4d:  3.1   3.14   3.141   3.1415   3.14150   3.141498   3.1414979   3.14149785   3.141497850   
0x40490e4e:  3.1   3.14   3.141   3.1415   3.14150   3.141498   3.1414981   3.14149809   3.141498089   
0x40490e4f:  3.1   3.14   3.141   3.1415   3.14150   3.141498   3.1414983   3.14149833   3.141498327   
0x40490e50:  3.1   3.14   3.141   3.1415   3.14150   3.141499   3.1414986   3.14149857   3.141498566   
0x40490e51:  3.1   3.14   3.141   3.1415   3.14150   3.141499   3.1414988   3.14149880   3.141498804   
0x40490e52:  3.1   3.14   3.141   3.1415   3.14150   3.141499   3.1414990   3.14149904   3.141499043   
0x40490e53:  3.1   3.14   3.141   3.1415   3.14150   3.141499   3.1414993   3.14149928   3.141499281   
0x40490e54:  3.1   3.14   3.141   3.1415   3.14150   3.141500   3.1414995   3.14149952   3.141499519   
0x40490e55:  3.1   3.14   3.141   3.1415   3.14150   3.141500   3.1414998   3.14149976   3.141499758   
0x40490e56:  3.1   3.14   3.141   3.1415   3.14150   3.141500   3.1415000   3.14150000   3.141499996     <--- this is 3.1415
0x40490e57:  3.1   3.14   3.142   3.1415   3.14150   3.141500   3.1415002   3.14150023   3.141500235   
0x40490e58:  3.1   3.14   3.142   3.1415   3.14150   3.141500   3.1415005   3.14150047   3.141500473   
0x40490e59:  3.1   3.14   3.142   3.1415   3.14150   3.141501   3.1415007   3.14150071   3.141500711   
0x40490e5a:  3.1   3.14   3.142   3.1415   3.14150   3.141501   3.1415009   3.14150095   3.141500950   
0x40490e5b:  3.1   3.14   3.142   3.1415   3.14150   3.141501   3.1415012   3.14150119   3.141501188   
0x40490e5c:  3.1   3.14   3.142   3.1415   3.14150   3.141501   3.1415014   3.14150143   3.141501427   
0x40490e5d:  3.1   3.14   3.142   3.1415   3.14150   3.141502   3.1415017   3.14150167   3.141501665   
0x40490e5e:  3.1   3.14   3.142   3.1415   3.14150   3.141502   3.1415019   3.14150190   3.141501904   
0x40490e5f:  3.1   3.14   3.142   3.1415   3.14150   3.141502   3.1415021   3.14150214   3.141502142   
0x40490e60:  3.1   3.14   3.142   3.1415   3.14150   3.141502   3.1415024   3.14150238   3.141502380

These values seem plausible. I would hope that the Lua code produces the same values......

vsky279 · 2020-09-16T06:31:25Z

Now 32bit Lua does what's in your printout. The only difference with this patch compared to your printout is in the 3.1415 line (%.3f):

  Hex       %.1f   %.2f   %.3f    %.4f     %.5f      %.6f       %.7f        %.8f         %.9f   
0x40490e55:  3.1   3.14   3.141   3.1415   3.14150   3.141500   3.1414998   3.14149976   3.141499758   
0x40490e56:  3.1   3.14   3.142   3.1415   3.14150   3.141500   3.1415000   3.14150000   3.141499996     <--- this is 3.1415
0x40490e57:  3.1   3.14   3.142   3.1415   3.14150   3.141500   3.1415002   3.14150023   3.141500235

Mathematically it's wrong. Intuitively (if you don't care about how floats are stored) it's correct.
This workaround should be left out when compiled in 64bit Lua.

pjsg · 2020-09-18T00:56:57Z

I think that having different rules for lua32 than every other 32 bit float implementation is asking for trouble.

vsky279 · 2020-09-18T05:39:32Z

Well this is very relevant comment. Though I still like this workaround. Would it be acceptable for everyone to have this an optionality configurable in user_config.h disabled by default?
E.g. //#define INTUITIVE_FLOAT_ROUNDING

mk-pmb · 2020-09-19T00:04:13Z

I like the option but we should use another name, because what seems "intuitive" to whom, is more of a feeling and may change. There are many explicitly-named kinds of rounding in WP-en. I'll try to figure out which one is the one I learned in school math. Until then my draft option name would be SPRINTF_USE_SCHOOL_MATH_ROUNDING. However, will this always be a boolean? My gut suspects that some day in the future we'd like to introduce a 3rd rounding and thus should use an enum.

Edit: In my tests so far, I forgot about negative numbers. Here's how my Ubuntu's bash's printf rounds:

$ printf '%8.3f\t%8.4g\t%8.6g\n' {,-}{,12}3.1415{,,}
   3.141           3.141          3.1415
 123.142           123.1         123.142
  -3.141          -3.141         -3.1415
-123.142          -123.1        -123.142

I'll try and compare the LUAs soon.

sonaux · 2020-09-19T01:36:13Z

Most probably school math rounding is variant named "Commercial rounding" in German WP, or "Nearest integer rounding" in Russian WP. English WP tells about special case but not method itself.

mk-pmb · 2020-09-19T02:01:37Z

WP-en's section "Round half away from zero" has "This method, also known as commercial rounding, …" and that's what I currently consider intuitive. Would it be easily feasible to have an enum setting SPRINTF_ROUNDING_MODE where one possible value is sth. like SRM_COMMERCIAL?
Another value should be sth. that expresses the intent to emulate the default Ubuntu LUA or the default Windows LUA. (Do they vary accross CPU widths?) It may effectively be the same as one of the other rounding modes, but we would take responsibility for maintaining the mapping from intent to implementation.
Edit: On 2nd thought I'm not sure if there's an important enough use case for being exactly compatible with other OSs.

vsky279 · 2020-09-19T07:37:46Z

Here's how my Ubuntu's bash's printf rounds:

It seems that Ubuntu's printf suffers the same issue as 3.1415 rounds to 3.141 in your printout. Maybe it uses single-precision float?

Thanks for your thoughts.
I went quickly through the WP page. It seems to me that the closest matching method to what's implemented is Double rounding. In their example 9.46 can round to 10. This is our case too. In the implementation (I will stop calling it workaround) the number is first rounded to 7 decimal point digits and then to the requested number of dps.

In order not to complicate the code I suggest

#define SPRINTF_ROUNDING_MODE_DOUBLEROUNDING

Should there be other rounding method implemented the directive will look like SPRINTF_ROUNDING_MODE_SOMENEWMETHOD.

mk-pmb · 2020-09-19T15:38:00Z

Are tostring/tonumber or sjson.{en,de}code affected by this? Should we test each rounding method for rounding error accumulation with repeated conversion, or are there some theoretical guarantees that make it redundant?

pjsg · 2020-09-19T18:21:18Z

It would be nice to have some tests that went to/from Json and verified that it round tripped correctly

…

On Sat, Sep 19, 2020, 11:38 M.K. ***@***.***> wrote: Should we test each rounding method for rounding error accumulation with repeated conversion tostring/tonumber or sjson.{en,de}code, or are there some theoretical guarantees that make it redundant? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3283 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AALQLTNSG5E7FRSFS27L5KLSGTF6JANCNFSM4RMBI4KQ> .

vsky279 · 2020-09-20T08:16:07Z

I hope this test is what you've been asking for:

a={}
for i=1,100 do a[i]=math.floor(node.random()*1e7)/(10^math.floor(math.random()*7)) end
b=sjson.encode(a)
c=sjson.decode(b)
for i,j in pairs(a) do
    if a[i]~=c[i] then print(("mismatch [%d]: %.20f ~= %.20f, diff: %.20f"):format(i, a[i], c[i], a[i]-c[i])) end 
end

It generates numbers with 7 significant digits with decimal point somewhere between these digits (%7g format is used when exporting to json). The test passes well in both cases (SPRINTF_ROUNDING_MODE_DOUBLEROUNDING disable/enabled).

Of course when 8 significant digits are generated, e.g. math.floor(node.random()*1e7)/(10^(math.floor(math.random()*20)-10)) or math.floor(node.random()*1e8)/(10^math.floor(math.random()*7)) the test does not pass for all generated numbers no matter what rounding mode is selected.

To wrap up: the rounding mode setting does not make any difference. It is activated only when less or equal than 6 dp are printed (condition trunc<=6).

pjsg · 2020-09-20T12:28:10Z

Yes - that is exactly the right test.... Thank you for doing it.

mk-pmb · 2020-09-20T13:25:32Z

For debugging it might be nicer to use a fixed set of numbers. Maybe someone familiar with the bit-level side of the problem could (possibly later) replace the randomness with specifically chosen edge focus on the potential problems.

pjsg · 2020-09-20T13:30:27Z

Actually, I just filed #3287 which reveals that there are some values that do not round trip correctly. In fact, it turns out that around 60% of floats do not round trip exactly. This is ugly.

vsky279 · 2020-09-27T09:37:45Z

Closed thanks to #3291.
I will proposed double rounding PR for the new snprintf (' ' flag is solved by the new PR).

vsky279 mentioned this pull request Sep 14, 2020

String formatting: plus sign ignored for %+g #3265

Closed

marcelstoer added this to the Next release milestone Sep 15, 2020

marcelstoer added bugfix enhancement labels Sep 15, 2020

vsky279 force-pushed the c_stringformat branch from 85f3df8 to 3a98c8a Compare September 18, 2020 19:41

Adding ' ' flag to string.format

025a406

vsky279 force-pushed the c_stringformat branch from 3a98c8a to 127a30e Compare September 18, 2020 19:48

String format %f, %g, %e double rounding method

9dd72d7

vsky279 force-pushed the c_stringformat branch from 127a30e to 9dd72d7 Compare September 19, 2020 07:37

vsky279 changed the title ~~Adding ' ' flag to string.format & intuitively wrong rounding fix workaround~~ Adding ' ' flag to string.format & double rounding method Sep 19, 2020

vsky279 mentioned this pull request Sep 24, 2020

Add snprintf and remove old sprintf #3291

Merged

3 tasks

vsky279 closed this Sep 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding ' ' flag to string.format & double rounding method #3283

Adding ' ' flag to string.format & double rounding method #3283

vsky279 commented Sep 14, 2020 •

edited

Loading

pjsg commented Sep 14, 2020

vsky279 commented Sep 14, 2020 •

edited

Loading

pjsg commented Sep 14, 2020

vsky279 commented Sep 14, 2020

vsky279 commented Sep 15, 2020

mk-pmb commented Sep 15, 2020

pjsg commented Sep 16, 2020

vsky279 commented Sep 16, 2020 •

edited

Loading

pjsg commented Sep 18, 2020

vsky279 commented Sep 18, 2020

mk-pmb commented Sep 19, 2020 •

edited

Loading

sonaux commented Sep 19, 2020

mk-pmb commented Sep 19, 2020 •

edited

Loading

vsky279 commented Sep 19, 2020

mk-pmb commented Sep 19, 2020 •

edited

Loading

pjsg commented Sep 19, 2020 via email

vsky279 commented Sep 20, 2020

pjsg commented Sep 20, 2020

mk-pmb commented Sep 20, 2020

pjsg commented Sep 20, 2020

vsky279 commented Sep 27, 2020

Adding ' ' flag to string.format & double rounding method #3283

Adding ' ' flag to string.format & double rounding method #3283

Conversation

vsky279 commented Sep 14, 2020 • edited Loading

The first commit implements ' ' flag to string format (%f, %g, %e)

The second commit deals with intuitively wrong rounding

pjsg commented Sep 14, 2020

vsky279 commented Sep 14, 2020 • edited Loading

pjsg commented Sep 14, 2020

vsky279 commented Sep 14, 2020

vsky279 commented Sep 15, 2020

mk-pmb commented Sep 15, 2020

pjsg commented Sep 16, 2020

vsky279 commented Sep 16, 2020 • edited Loading

pjsg commented Sep 18, 2020

vsky279 commented Sep 18, 2020

mk-pmb commented Sep 19, 2020 • edited Loading

sonaux commented Sep 19, 2020

mk-pmb commented Sep 19, 2020 • edited Loading

vsky279 commented Sep 19, 2020

mk-pmb commented Sep 19, 2020 • edited Loading

pjsg commented Sep 19, 2020 via email

vsky279 commented Sep 20, 2020

pjsg commented Sep 20, 2020

mk-pmb commented Sep 20, 2020

pjsg commented Sep 20, 2020

vsky279 commented Sep 27, 2020

vsky279 commented Sep 14, 2020 •

edited

Loading

vsky279 commented Sep 14, 2020 •

edited

Loading

vsky279 commented Sep 16, 2020 •

edited

Loading

mk-pmb commented Sep 19, 2020 •

edited

Loading

mk-pmb commented Sep 19, 2020 •

edited

Loading

mk-pmb commented Sep 19, 2020 •

edited

Loading