Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing number tokenizer Output messages about invalid numbers to real errors #8546

Open
miv391 opened this issue Nov 30, 2023 · 0 comments

Comments

@miv391
Copy link

miv391 commented Nov 30, 2023

Describe the project you are working on

I'm making a game using GDScript.

Describe the problem or limitation you are having in your project

Literal number tokenizing doesn't work consistently when numbers are invalid. Examples:

# This invalid binary number works fine, b will be 0, although the actual number part is empty.
var b = 0b

# This invalid binary number doesn't work, it spams "value is too large" messages to Output.
var b = 0b1111111111111111111111111111111111111111111111111111111111111111111111111111

# This invalid hexadecimal number doesn't work, "Invalid hexadecimal notation character "x" (U+0078) 
	# in string "0x"." is spammed to Output.
var h = 0x

# This invalid base 10 number produces a proper Error ("Invalid numeric notation.")
var i = 123qwerty

# This invalid base 10 number doesn't work, it spams "value is too small" messages to Output.
var n = -9999999999999999999999999999999999999999

The problem is that only sometimes tokenization errors are proper errors. Mostly the erroneous number just causes messages spammed to Output.

In my opinion error messages should not be printed to Output at all when tokenizing the code. The problems with this are:

  • If you don't have Output open, you miss the error messages completely.
  • Whenever code is tried to be tokenized, a new error message is printed to Output, so there will be a lot of spam.
  • If you are using external editor, you don't see these errors at all, because they are not "proper" errors you can see through language server.

The user experience looks like this with invalid base 10 and hex numbers:

kuva

Describe the feature / enhancement and how it helps to overcome the problem or limitation

All number conversion errors during tokenizing should produce proper errors.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

As far as I understand the tokenizer code, the problem lies here:
https://github.com/godotengine/godot/blob/d76c1d0e516fedc535a2e394ab780cac79203477/modules/gdscript/gdscript_tokenizer.cpp#L841-L854

If we take the hex numbers as an example, int64_t String::hex_to_int() is used to convert a string to an int. If it fails (the string is for example "0x"), 0 is returned and error message ("Invalid hexadecimal notation...") is printed to Output. This the exactly same function that is used runtime when executing this code: print("0x".hex_to_int())

The problem is that when tokenizer uses String's *_to_int conversion functions, it cannot get any information whether the conversion failed and if it failed, how did it failed. So currently the tokenizer doesn't have any information to create proper errors and it cannot get it from current conversion functions.

I think the tokenizer part is easy to fix, but the String's conversion functions are a bit more difficult. I have here three different ideas to do that.

All three ideas need an enum for the conversion result:

	typedef enum {
		OK,
		INVALID_NOTATION,
		TOO_LARGE,
		TOO_SMALL
	} ConversionResult;
  1. Add a new conversion function:
int64_t hex_to_int(ConversionResult &result) const;

This new conversion function would not print anything to Output.

But this would require duplicating most of the current hex_to_int() code, and I don't like that very much.

  1. Second way would be to modify the current hex_to_int() to use the new status returning conversion function.
int64_t String::hex_to_int() const {
	ConversionResult result;
	int64_t value = hex_to_int(result);
	// print message to Output if result != OK
	return value;
}

Update: I tested this option, but the String's hex_to_int() took a big performance hit, it used about 25% more time. So this option and the third option are likely out of the question.

  1. Third way would to add the status result as an optional pointer parameter to the current conversion function.
int64_t hex_to_int(ConversionResult *result = 0) const;

Then the implementation would be changed so that if result == nullptr, the conversion function would work just like it does now. If conversion fails, it prints an error message to Output. That is how it must work runtime. But the tokenizer would give a pointer to a result enum and then the conversion function would not print anything to Output, it would store the conversion result to the result parameter instead.

If this enhancement will not be used often, can it be worked around with a few lines of script?

No, this cannot be worked around.

Is there a reason why this should be core and not an add-on in the asset library?

This is a core functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants