Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ' ' flag to string.format & double rounding method #3283

Closed
wants to merge 2 commits into from

Conversation

vsky279
Copy link
Contributor

@vsky279 vsky279 commented Sep 14, 2020

Fixes #3265.

  • This PR is for the dev branch rather than for master.
  • This PR is compliant with the other contributing guidelines as well (if not, please describe why).
  • I have thoroughly tested my contribution.
  • The code changes are reflected in the documentation at docs/*.

This PR has two commits for the time being. These are to be squashed before merger.

The first commit implements ' ' flag to string format (%f, %g, %e)

It passes well the test performed by @mk-pmb in #3265 - this script.

Also it removes a part of code non compatible with Ubuntu's Lua 5.1 as well as Lua 5.3.
The code is removing the - sign for negative zero. "%.3f"%-0.0004 should indeed give -0.000.

This test gives same results in NodeMCU Lua as well as in Ubuntu:

for i=-1,1 do print(("%.3f\t%+.3f\t% .3f"):format(0.0004*i, 0.0004*i, 0.0004*i)) end
-0.000	-0.000	-0.000
0.000	+0.000	 0.000
0.000	+0.000	 0.000

The second commit deals with intuitively wrong rounding

As discussed in #3265 "%.3f"%3.1415 gives 3.141 (because of its float representation). Mathematically this is correct behavior and is not a bug per se while user would expect different behavior.
I am proposing this workaround which I think is in line with the existing logic. Note that =3.1415 prints 3.1415 and not 3.1414999. So when it is detected that such number would print this way the least significant bit of float mantise is added leading to "correct" rounding.

This is the test it passes well

for i=1,20 do d=(3.1405+i*0.001); print(("%.3f\t%7f\t%.20f"):format(d, d, d)) end
3.142	3.141500	3.14149999618530273438
3.143	3.142500	3.14250016212463378906
3.144	3.143500	3.14350008964538574219
3.145	3.144500	3.14450001716613769531
3.146	3.145500	3.14550018310546875000
3.147	3.146500	3.14650011062622070313
3.148	3.147500	3.14750003814697265625
3.149	3.148500	3.14849996566772460938
3.150	3.149500	3.14950013160705566406
3.151	3.150500	3.15050005912780761719
3.152	3.151500	3.15149998664855957031
3.153	3.152500	3.15250015258789062500
3.154	3.153500	3.15350008010864257813
3.155	3.154500	3.15450000762939453125
3.156	3.155500	3.15550017356872558594
3.157	3.156500	3.15650010108947753906
3.158	3.157500	3.15750002861022949219
3.159	3.158500	3.15849995613098144531
3.160	3.159500	3.15950012207031250000
3.161	3.160500	3.16050004959106445313

The result is the same as on Ubuntu Lua except for %.20f of course as 64bit doubles are used.
It works well also for %e format

for i=1,10 do d=-(3.1405+i*0.001)*1e-5; print(("%.3e\t%7e\t%.20e"):format(d, d, d)) end
-3.142e-05	-3.141500e-05	-3.14149983751121908426e-05
-3.143e-05	-3.142500e-05	-3.14249991788528859615e-05
-3.144e-05	-3.143500e-05	-3.14349999825935810804e-05
-3.145e-05	-3.144500e-05	-3.14450007863342761993e-05
-3.146e-05	-3.145500e-05	-3.14550015900749713182e-05
-3.147e-05	-3.146500e-05	-3.14649987558368593454e-05
-3.148e-05	-3.147500e-05	-3.14749995595775544643e-05
-3.149e-05	-3.148500e-05	-3.14850003633182495832e-05
-3.150e-05	-3.149500e-05	-3.14950011670589447021e-05
-3.151e-05	-3.150500e-05	-3.15049983328208327293e-05

@pjsg
Copy link
Member

pjsg commented Sep 14, 2020

I don't think that I like 3.1414999 rounding to 3.142

Note that I would expect 0.146 to round to 0.15 (for 2 dp) and to 0.1 (for 1 dp).

@vsky279
Copy link
Contributor Author

vsky279 commented Sep 14, 2020

Do you like rounding 3.1415 to 3.141 (for 3 dp)?
="%.3f"%3.1414999 of course rounds correctly to 3.141.

The issue is that 3.1415 is represented as 3.141499996185303 thus "wrong" rounding.

 ="%.2f"%0.146
0.15
="%.1f"%0.146
0.1

@pjsg
Copy link
Member

pjsg commented Sep 14, 2020

If there isn't a representation for the value 3.1415, then you have to use the closest representation -- and since it is less than 3.141500000000, then it should round to 3.141.

@vsky279
Copy link
Contributor Author

vsky279 commented Sep 14, 2020

...which is not very intuitive especially when print(3.1415) prints 3.1415 (and hides the real representation).

@vsky279
Copy link
Contributor Author

vsky279 commented Sep 15, 2020

@pjsg Philip, please note, that this workaround should be compatible with your PR #3225. It is efficient only when the number is float for doubles it will not be triggered. It is checked with the !(dbl_u.m_long & 0x1fffffff) condition.

@mk-pmb
Copy link
Contributor

mk-pmb commented Sep 15, 2020

Your patches seem to have lines with trailing whitespace.
After applying them, my firmware repo is at branch dev, commit a06809d "String format %f, %g, %e intuitively incorrect rounding workaround" with recent history:

and this firmware was buiilt from it.

With your patches, numfmt.lua now prints the same on ESP8266 as on Ubuntu, both for 3.1410 and 3.1415 (log).

@pjsg
Copy link
Member

pjsg commented Sep 16, 2020

This code does weird stuff if the bottom 29 bits of the float are zero -- this will mean that printing increasing double values will not result in increasing numbers being displayed (in the double version of the build).

These are the values printed by C for floats close to 3.1415

  Hex       %.1f   %.2f   %.3f    %.4f     %.5f      %.6f       %.7f        %.8f         %.9f          
0x40490e4c:  3.1   3.14   3.141   3.1415   3.14150   3.141498   3.1414976   3.14149761   3.141497612   
0x40490e4d:  3.1   3.14   3.141   3.1415   3.14150   3.141498   3.1414979   3.14149785   3.141497850   
0x40490e4e:  3.1   3.14   3.141   3.1415   3.14150   3.141498   3.1414981   3.14149809   3.141498089   
0x40490e4f:  3.1   3.14   3.141   3.1415   3.14150   3.141498   3.1414983   3.14149833   3.141498327   
0x40490e50:  3.1   3.14   3.141   3.1415   3.14150   3.141499   3.1414986   3.14149857   3.141498566   
0x40490e51:  3.1   3.14   3.141   3.1415   3.14150   3.141499   3.1414988   3.14149880   3.141498804   
0x40490e52:  3.1   3.14   3.141   3.1415   3.14150   3.141499   3.1414990   3.14149904   3.141499043   
0x40490e53:  3.1   3.14   3.141   3.1415   3.14150   3.141499   3.1414993   3.14149928   3.141499281   
0x40490e54:  3.1   3.14   3.141   3.1415   3.14150   3.141500   3.1414995   3.14149952   3.141499519   
0x40490e55:  3.1   3.14   3.141   3.1415   3.14150   3.141500   3.1414998   3.14149976   3.141499758   
0x40490e56:  3.1   3.14   3.141   3.1415   3.14150   3.141500   3.1415000   3.14150000   3.141499996     <--- this is 3.1415
0x40490e57:  3.1   3.14   3.142   3.1415   3.14150   3.141500   3.1415002   3.14150023   3.141500235   
0x40490e58:  3.1   3.14   3.142   3.1415   3.14150   3.141500   3.1415005   3.14150047   3.141500473   
0x40490e59:  3.1   3.14   3.142   3.1415   3.14150   3.141501   3.1415007   3.14150071   3.141500711   
0x40490e5a:  3.1   3.14   3.142   3.1415   3.14150   3.141501   3.1415009   3.14150095   3.141500950   
0x40490e5b:  3.1   3.14   3.142   3.1415   3.14150   3.141501   3.1415012   3.14150119   3.141501188   
0x40490e5c:  3.1   3.14   3.142   3.1415   3.14150   3.141501   3.1415014   3.14150143   3.141501427   
0x40490e5d:  3.1   3.14   3.142   3.1415   3.14150   3.141502   3.1415017   3.14150167   3.141501665   
0x40490e5e:  3.1   3.14   3.142   3.1415   3.14150   3.141502   3.1415019   3.14150190   3.141501904   
0x40490e5f:  3.1   3.14   3.142   3.1415   3.14150   3.141502   3.1415021   3.14150214   3.141502142   
0x40490e60:  3.1   3.14   3.142   3.1415   3.14150   3.141502   3.1415024   3.14150238   3.141502380   

These values seem plausible. I would hope that the Lua code produces the same values......

@vsky279
Copy link
Contributor Author

vsky279 commented Sep 16, 2020

Now 32bit Lua does what's in your printout. The only difference with this patch compared to your printout is in the 3.1415 line (%.3f):

  Hex       %.1f   %.2f   %.3f    %.4f     %.5f      %.6f       %.7f        %.8f         %.9f   
0x40490e55:  3.1   3.14   3.141   3.1415   3.14150   3.141500   3.1414998   3.14149976   3.141499758   
0x40490e56:  3.1   3.14   3.142   3.1415   3.14150   3.141500   3.1415000   3.14150000   3.141499996     <--- this is 3.1415
0x40490e57:  3.1   3.14   3.142   3.1415   3.14150   3.141500   3.1415002   3.14150023   3.141500235   

Mathematically it's wrong. Intuitively (if you don't care about how floats are stored) it's correct.
This workaround should be left out when compiled in 64bit Lua.

@pjsg
Copy link
Member

pjsg commented Sep 18, 2020

I think that having different rules for lua32 than every other 32 bit float implementation is asking for trouble.

@vsky279
Copy link
Contributor Author

vsky279 commented Sep 18, 2020

Well this is very relevant comment. Though I still like this workaround. Would it be acceptable for everyone to have this an optionality configurable in user_config.h disabled by default?
E.g. //#define INTUITIVE_FLOAT_ROUNDING

@mk-pmb
Copy link
Contributor

mk-pmb commented Sep 19, 2020

I like the option but we should use another name, because what seems "intuitive" to whom, is more of a feeling and may change. There are many explicitly-named kinds of rounding in WP-en. I'll try to figure out which one is the one I learned in school math. Until then my draft option name would be SPRINTF_USE_SCHOOL_MATH_ROUNDING. However, will this always be a boolean? My gut suspects that some day in the future we'd like to introduce a 3rd rounding and thus should use an enum.

Edit: In my tests so far, I forgot about negative numbers. Here's how my Ubuntu's bash's printf rounds:

$ printf '%8.3f\t%8.4g\t%8.6g\n' {,-}{,12}3.1415{,,}
   3.141           3.141          3.1415
 123.142           123.1         123.142
  -3.141          -3.141         -3.1415
-123.142          -123.1        -123.142

I'll try and compare the LUAs soon.

@sonaux
Copy link
Contributor

sonaux commented Sep 19, 2020

Most probably school math rounding is variant named "Commercial rounding" in German WP, or "Nearest integer rounding" in Russian WP. English WP tells about special case but not method itself.

@mk-pmb
Copy link
Contributor

mk-pmb commented Sep 19, 2020

WP-en's section "Round half away from zero" has "This method, also known as commercial rounding, …" and that's what I currently consider intuitive. Would it be easily feasible to have an enum setting SPRINTF_ROUNDING_MODE where one possible value is sth. like SRM_COMMERCIAL?
Another value should be sth. that expresses the intent to emulate the default Ubuntu LUA or the default Windows LUA. (Do they vary accross CPU widths?) It may effectively be the same as one of the other rounding modes, but we would take responsibility for maintaining the mapping from intent to implementation.
Edit: On 2nd thought I'm not sure if there's an important enough use case for being exactly compatible with other OSs.

@vsky279
Copy link
Contributor Author

vsky279 commented Sep 19, 2020

Here's how my Ubuntu's bash's printf rounds:

It seems that Ubuntu's printf suffers the same issue as 3.1415 rounds to 3.141 in your printout. Maybe it uses single-precision float?

Thanks for your thoughts.
I went quickly through the WP page. It seems to me that the closest matching method to what's implemented is Double rounding. In their example 9.46 can round to 10. This is our case too. In the implementation (I will stop calling it workaround) the number is first rounded to 7 decimal point digits and then to the requested number of dps.

In order not to complicate the code I suggest

#define SPRINTF_ROUNDING_MODE_DOUBLEROUNDING

Should there be other rounding method implemented the directive will look like SPRINTF_ROUNDING_MODE_SOMENEWMETHOD.

@vsky279 vsky279 changed the title Adding ' ' flag to string.format & intuitively wrong rounding fix workaround Adding ' ' flag to string.format & double rounding method Sep 19, 2020
@mk-pmb
Copy link
Contributor

mk-pmb commented Sep 19, 2020

Are tostring/tonumber or sjson.{en,de}code affected by this? Should we test each rounding method for rounding error accumulation with repeated conversion, or are there some theoretical guarantees that make it redundant?

@pjsg
Copy link
Member

pjsg commented Sep 19, 2020 via email

@vsky279
Copy link
Contributor Author

vsky279 commented Sep 20, 2020

I hope this test is what you've been asking for:

a={}
for i=1,100 do a[i]=math.floor(node.random()*1e7)/(10^math.floor(math.random()*7)) end
b=sjson.encode(a)
c=sjson.decode(b)
for i,j in pairs(a) do
    if a[i]~=c[i] then print(("mismatch [%d]: %.20f ~= %.20f, diff: %.20f"):format(i, a[i], c[i], a[i]-c[i])) end 
end

It generates numbers with 7 significant digits with decimal point somewhere between these digits (%7g format is used when exporting to json). The test passes well in both cases (SPRINTF_ROUNDING_MODE_DOUBLEROUNDING disable/enabled).

Of course when 8 significant digits are generated, e.g. math.floor(node.random()*1e7)/(10^(math.floor(math.random()*20)-10)) or math.floor(node.random()*1e8)/(10^math.floor(math.random()*7)) the test does not pass for all generated numbers no matter what rounding mode is selected.

To wrap up: the rounding mode setting does not make any difference. It is activated only when less or equal than 6 dp are printed (condition trunc<=6).

@pjsg
Copy link
Member

pjsg commented Sep 20, 2020

Yes - that is exactly the right test.... Thank you for doing it.

@mk-pmb
Copy link
Contributor

mk-pmb commented Sep 20, 2020

For debugging it might be nicer to use a fixed set of numbers. Maybe someone familiar with the bit-level side of the problem could (possibly later) replace the randomness with specifically chosen edge focus on the potential problems.

@pjsg
Copy link
Member

pjsg commented Sep 20, 2020

Actually, I just filed #3287 which reveals that there are some values that do not round trip correctly. In fact, it turns out that around 60% of floats do not round trip exactly. This is ugly.

@vsky279 vsky279 mentioned this pull request Sep 24, 2020
3 tasks
@vsky279
Copy link
Contributor Author

vsky279 commented Sep 27, 2020

Closed thanks to #3291.
I will proposed double rounding PR for the new snprintf (' ' flag is solved by the new PR).

@vsky279 vsky279 closed this Sep 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants