Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review all uses of lua numbers to determine if they ought to be integers #3219

Open
pjsg opened this issue Jul 17, 2020 · 13 comments
Open

Review all uses of lua numbers to determine if they ought to be integers #3219

pjsg opened this issue Jul 17, 2020 · 13 comments

Comments

@pjsg
Copy link
Member

pjsg commented Jul 17, 2020

As Philip has correctly pointed out, we have a soft break introduced in Lua 5.3 in the we've moved our default build to sint32 / float32 sub types so that a TValue can fit into 8 rather than 12 bytes. The standard Lua C API has a set of calls relating to number and integer handling that are largely preserved.

A lot of our modules adopt rather sloppy use of the API for example int val = lua_tonumber(L,1). In Lua 5.1 the API call returns a double which is then converted inline to a type int or unsigned int. This is lossless if the underlying Lua value contained this type.

  • This is no longer the case with Lua 5.3 which returns a type float that can only store 23 bit integers without loss of precision.
  • Given that the VM manipulates integer quantities using 32 integers, it makes sense to avoid the unnecessary int -> float-> int double conversion.

This issue largely subsumes the sub issues: #3213, #3214, #3215, #3216, #3217 and #3218, I suggest that we handle general number type issues here. See my further analysis below.


[Philip's original lede] It turns out that it isn't just lua_pushnumber, but the reading functions can have problems as well.

rtcmem.write32 has this problem (from code inspection). I think that the json decoder will have problems....

@KT819GM
Copy link

KT819GM commented Jul 18, 2020

Is this behavior because of same reasons?

Example code:

data = {
mixed_data = node.chipid()..'/data',
mixed_status = node.chipid()..'/status',
integer = 500,
integer2 = 1000,
boolean_f = false,
boolean_t = true,
}

print('Integer: '..tostring(data.integer)..' boolean: '..tostring(data.boolean_f))

sjson.encode(data) 
ok, json = pcall(sjson.encode, data)
json_obj = sjson.decode(json)
print(type(json_obj))
for k, v in pairs(json_obj) do     print(k..' = '..tostring(v)..',') end

Result:


Integer: 500 boolean: false

table

boolean_t = true,
integer2 = 1000.0,
boolean_f = false,
mixed_data = 1111111/data,
integer = 500.0,
mixed_status = 1111111/status,

I'm using tostring because of mixed table, as on example I've provided. It could be seen, that before conversion tostring behaves as expected, but after sjson I'm getting .0. Also, result came in different order, possibly using sjson wrongly.

Edit: forgot to write that it's on dev 5.3

@pjsg
Copy link
Member Author

pjsg commented Jul 18, 2020

sjson is definitely a module that needs some work to get the integer/float handling correct. I'm going to be looking at it over the weekend.

@pjsg
Copy link
Member Author

pjsg commented Jul 18, 2020

@KT819GM Take a look at #3222 -- this should fix your problem.

@KT819GM
Copy link

KT819GM commented Jul 19, 2020

@KT819GM Take a look at #3222 -- this should fix your problem.

Yes, now numbers are represented correctly. Not sure why string order is changing, not a problem for me (for config file) but possibly could be issue for someone who will be parsing json string with fixed position (like parsing object in Node-Red and adding values to array renaming them). Thank you

@pjsg
Copy link
Member Author

pjsg commented Jul 19, 2020 via email

@KT819GM
Copy link

KT819GM commented Jul 19, 2020

Couldn't be more clear. Thank you

@TerryE
Copy link
Collaborator

TerryE commented Aug 24, 2020

The C API defines the following number and integer functions. The former use type LUA_NUMBER and the latter LUA_INT for numeric arguments and returns. These are double and int on 5.1 and float and int on 5.3, respectively. The declarative types are lua_Number and lua_Integer.

Number form Integer Form Comments
lua_isnumber(L,n) lua_isinteger(L,n)
lua_pushnumber(L,i) lua_pushinteger(L,i)
lua_tonumber(L,n) lua_tointeger(L,n)
luaL_checknumber(L,n) luaL_checkinteger(L,n)
luaL_optnumber(L,n,def) luaL_optinteger(L,n,def)
lua_stringtonumber(L,n,s) converts to number/int

Note that:

  • All except the last are common to both APIs; lua_stringtonumber is currently only supported on Lua 5.3, but I will add this to the Lua 5.1 API, so that we can use this in our modules without needed #if code.
  • The latest GCC compiler will generate precision warnings if the source coerces types which could incur loss of precision. The APIs also include a limited number of coercion macros, but these are different for the two API header sets, so it is simpler and easier to do explicit coercion when needed.

I recommend that:

  • Except for a couple of cases, the rest of our code uses various int types so we should stick to the integer forms except where the values are truly floating point.
  • We should use the above 5 integer forms and add explicit type case coercion to all uses where the return is not type int (e.g. size_t. We also coerce integer inputs where there is a potential loss of precision (e.g. a size_t input to lua_pushinteger().
  • We review all uses of double or float. IMO, we should stick to using lua_Number as the float type so that this facilitates @pjsg's optional 64 bit PR. A couple of modules (e.g. sjson and struct may need special handling) but I will leave this to Philip as their active maintainer.

I will post a fully code analysis next, but I have to be sociable to visitors for the next few hours.

@TerryE
Copy link
Collaborator

TerryE commented Aug 25, 2020

Having gone through the code, I don't want to break the legacy 5.1 integer mode builds and the issue we have for these is that lua_Number is type int and using lua_Number for a default float type would break these for 5.1 integer builds, so on reflection it is often just simpler leaving the internal float temporary variable as double for both builds. I have also added a lua_Float which is a double on 5.1 builds and a synonym for lua_Number on 5.3; when we just want FP, but only need 6 d.p. we can use lua_Float and this will also work fine on both Lua variants.

  • ads1115, bme280, bme680, dht, si7021, struct all use double variables.
  • I've changed ads1115, dth and si7021.
  • bme280, bme680 do some weird stuff which I have left using double -- one for the maintainer.
  • I will leave struct to @pjsg

Other reviews changes.

  • app/pm/pmSleep.c used lua_tonumber() etc. and I've tidied this coding up. @dnc40085 will need to check.

Slight change of subject

When I 've been going through this code, I find myself slipping into #1028, in that the way that we've coded up much of the parameter validation is just so sloppy; this can be done faster, generated less code, and also tighter. A typical coding pattern is
something along the lines of

    lua_getfield(L, 1, "start_ch");
    if (!lua_isnil(L, -1)){
      if(lua_isnumber(L, -1)){
        start_ch = (uint8)luaL_checkinteger(L, -1);
        luaL_argcheck(L, (start_ch >= 1 && start_ch <= 14), 1, "start_ch: Range:1-14");
        cfg.schan = start_ch;
      }
      else
        luaL_argerror(L, 1, "start_ch: must be a number");
    }
    else
      cfg.schan = 1;

"Some parameter is an integer in the range x to y; either with no default or with a specific default" is a really common coding pattern in our modules, and this is coded "long hand" in about 4 different ways. So in this case omitting start_ch defaults to 1 but what happens with start_ch = 1.0 or start_ch = 2.1 even? In some coding cases the latter fails to an invalid parameter error); some cases you get a "number has no integer representation_" error; others 0; others the "start_ch: must be a number" error. It's basically a mess and I feel that we should have a consistent coding pattern that we should adopt. A common mistake is to do an API call such as luaL_checkinteger() which throws an error if the stack entry isn't an integer but then do extra validation where the error paths will never be taken. There are also nice macro extensions such as luaL_optinteger(L, ndx, default) which greatly simplify default handing.

#define luaL_opt(L,f,n,d)	(lua_isnoneornil(L,(n)) ? (d) : f(L,(n)))

Lua itself give some good templates, for example in ltablib.c for the argument to table.remove:

  lua_Integer pos = luaL_optinteger(L, 2, size);
  if (pos != size) 
    luaL_argcheck(L, 1 <= pos && pos <= size + 1, 1, "position out of bounds");

Our coded equivalents might be a dozen lines instead of 3 and not be as robust. Pull my hair out time. In my bones I would like to sort out this, but being realistic with every edit there is a small chance of making a breaking change, and so this sort of code change really needs testing. So on reflection and however tempting, this isn't something that we should realistically as part of this lua numbers review.

  • If the current code pattern works then leave any optimisations for now.
  • Focus on converting number -> integer calls is the existing call results in a potential loss of precision, and / or where a simple 1 for 1 switch number -> integer will address the issue.

Defer the API cleanup to a second pass, starting with node, file, net and wifi as separate issues / PRs , Perhaps we start with one of these and get a couple of the other committers (@nwf, @HHHartmann, ... or whoever wants to volunteer) to go through the changes in details so we can establish an agreed best-practice pattern, and then we can divide the rest out between those committers than would like to help execute the cleanup.

Thoughts? N + G + @jmattsson @pjsg @galjonsfigur ... ?

@jmattsson
Copy link
Member

Definitely defer this, I'd say. While I absolutely agree it's something that should get done, let's get all the big LVM stuff landed and settled. I'd like to get to the point where we can resume merging the 8266 and esp32 branches, but it'd be madness to have two major tickets running concurrently :D

@galjonsfigur
Copy link
Member

I agree with @jmattsson - It's true that many of C modules are interacting with Lua C API in sloppy way, but they are proven to work. For example, there were no big changes of Lua interface for ta least 2 years in wifi C module (aside from some code replacements with luaL_ and lua_ helper functions). Deferring this to be done after merging esp8266 and esp32 branches would give us another benefit of being able to cleanup C modules one by one, but for both variants, additionally verifying API compatibility between them.

@TerryE
Copy link
Collaborator

TerryE commented Aug 26, 2020

  • node.deepsleep() takes a sleep time in uSec. The SDK API calls take a uint64 argument. Reading the reference in the node documentation, this is around 225 mins which is longer than the 71 mins corresponding to MAX_UINT. In version 5.1 this just casts a double to a uint64 but for 5.3 I've used the following which is lossless up this 71 mins, but this being said the timer is accurate to far less than 6 d.p. so there is no practical loss of precision because of this float to 64-bit int conversion. I also converted the unsigned < 0 check which is pointless to a < 10 hrs which will at least pick up integer overflows.
#if LUA_VERSION_NUM == 501
    us = luaL_checknumber(L, 1);
#else /* 503 */
    us = lua_isinteger(L, 1) ? lua_tounsigned(L, 1) : (uint64) lua_tonumber(L, 1);
#endif
  • struct.c Format specifier d will only convert a float value on 5.3. getinteger() evaluates an int32 on the ESP but assigns this to a lua_Number for return. It would be better to stay as a 32-bit integer at least on 5.3. However I haven't updated this and have left this one to Philip.

See latest push to #3221.

@stale
Copy link

stale bot commented Aug 28, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 28, 2021
@TerryE
Copy link
Collaborator

TerryE commented Sep 26, 2021

Not safe to close this one.

@stale stale bot removed the stale label Sep 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants