Changing number tokenizer Output messages about invalid numbers to real errors #8546

miv391 · 2023-11-30T10:47:29Z

Describe the project you are working on

I'm making a game using GDScript.

Describe the problem or limitation you are having in your project

Literal number tokenizing doesn't work consistently when numbers are invalid. Examples:

# This invalid binary number works fine, b will be 0, although the actual number part is empty.
var b = 0b

# This invalid binary number doesn't work, it spams "value is too large" messages to Output.
var b = 0b1111111111111111111111111111111111111111111111111111111111111111111111111111

# This invalid hexadecimal number doesn't work, "Invalid hexadecimal notation character "x" (U+0078) 
	# in string "0x"." is spammed to Output.
var h = 0x

# This invalid base 10 number produces a proper Error ("Invalid numeric notation.")
var i = 123qwerty

# This invalid base 10 number doesn't work, it spams "value is too small" messages to Output.
var n = -9999999999999999999999999999999999999999

The problem is that only sometimes tokenization errors are proper errors. Mostly the erroneous number just causes messages spammed to Output.

In my opinion error messages should not be printed to Output at all when tokenizing the code. The problems with this are:

If you don't have Output open, you miss the error messages completely.
Whenever code is tried to be tokenized, a new error message is printed to Output, so there will be a lot of spam.
If you are using external editor, you don't see these errors at all, because they are not "proper" errors you can see through language server.

The user experience looks like this with invalid base 10 and hex numbers:

Describe the feature / enhancement and how it helps to overcome the problem or limitation

All number conversion errors during tokenizing should produce proper errors.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

As far as I understand the tokenizer code, the problem lies here:
https://github.com/godotengine/godot/blob/d76c1d0e516fedc535a2e394ab780cac79203477/modules/gdscript/gdscript_tokenizer.cpp#L841-L854

If we take the hex numbers as an example, int64_t String::hex_to_int() is used to convert a string to an int. If it fails (the string is for example "0x"), 0 is returned and error message ("Invalid hexadecimal notation...") is printed to Output. This the exactly same function that is used runtime when executing this code: print("0x".hex_to_int())

The problem is that when tokenizer uses String's *_to_int conversion functions, it cannot get any information whether the conversion failed and if it failed, how did it failed. So currently the tokenizer doesn't have any information to create proper errors and it cannot get it from current conversion functions.

I think the tokenizer part is easy to fix, but the String's conversion functions are a bit more difficult. I have here three different ideas to do that.

All three ideas need an enum for the conversion result:

	typedef enum {
		OK,
		INVALID_NOTATION,
		TOO_LARGE,
		TOO_SMALL
	} ConversionResult;

Add a new conversion function:

int64_t hex_to_int(ConversionResult &result) const;

This new conversion function would not print anything to Output.

But this would require duplicating most of the current hex_to_int() code, and I don't like that very much.

Second way would be to modify the current hex_to_int() to use the new status returning conversion function.

int64_t String::hex_to_int() const {
	ConversionResult result;
	int64_t value = hex_to_int(result);
	// print message to Output if result != OK
	return value;
}

Update: I tested this option, but the String's hex_to_int() took a big performance hit, it used about 25% more time. So this option and the third option are likely out of the question.

Third way would to add the status result as an optional pointer parameter to the current conversion function.

int64_t hex_to_int(ConversionResult *result = 0) const;

Then the implementation would be changed so that if result == nullptr, the conversion function would work just like it does now. If conversion fails, it prints an error message to Output. That is how it must work runtime. But the tokenizer would give a pointer to a result enum and then the conversion function would not print anything to Output, it would store the conversion result to the result parameter instead.

If this enhancement will not be used often, can it be worked around with a few lines of script?

No, this cannot be worked around.

Is there a reason why this should be core and not an add-on in the asset library?

This is a core functionality.

The text was updated successfully, but these errors were encountered:

AThousandShips added the topic:gdscript label Nov 30, 2023

miv391 mentioned this issue Dec 2, 2023

Add more C++ unit tests for String to number conversions godotengine/godot#85666

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changing number tokenizer Output messages about invalid numbers to real errors #8546

Changing number tokenizer Output messages about invalid numbers to real errors #8546

miv391 commented Nov 30, 2023 •

edited

Loading

Changing number tokenizer Output messages about invalid numbers to real errors #8546

Changing number tokenizer Output messages about invalid numbers to real errors #8546

Comments

miv391 commented Nov 30, 2023 • edited Loading

Describe the project you are working on

Describe the problem or limitation you are having in your project

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

If this enhancement will not be used often, can it be worked around with a few lines of script?

Is there a reason why this should be core and not an add-on in the asset library?

miv391 commented Nov 30, 2023 •

edited

Loading