normalize targets to 80 char strings for ATN serialization segments. #3438
…hat's wrong when failed.
…va which needs big strings for efficiency.
Well, @ericvergnaud is the one who cares, and he suggests 80, so let's go old school :)
Crap. I got the
Well, I'm not sure what's going on, but it seems that the value returned by getSerializedATNSegmentLimit is not used...
For which targets? Wow. SerializedATN.getSegments() isn't called anywhere (unless via reflection). It looks like the constructor does all the work to build the string:
Oh! The template can call
And Java:
Actually, it's called internally by StringTemplate, and the value from
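StringTemplate reaching getSegments() without any direct Java call is ordinary template-engine behavior: a template reference such as <model.segments> is resolved reflectively to a getter. The sketch below is a simplified illustration of that lookup, not StringTemplate's actual code; it assumes only a get-prefixed method is tried, and FakeSerializedATN and PropertyLookup are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a code-generation model object; it is NOT the
// real org.antlr.v4.codegen.model.SerializedATN.
class FakeSerializedATN {
    private final List<String> segments = Arrays.asList("seg0", "seg1");

    // A template expression such as <model.segments> would resolve to this
    // getter, even though no Java code calls it directly.
    public List<String> getSegments() {
        return segments;
    }
}

class PropertyLookup {
    // Simplified getter-based property resolution. The real StringTemplate
    // model adaptor handles more cases than this.
    static Object getProperty(Object model, String name) {
        String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            Method m = model.getClass().getMethod(getter);
            return m.invoke(model);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no property '" + name + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getProperty(new FakeSerializedATN(), "segments")); // [seg0, seg1]
    }
}
```

This is why a "find usages" search in the Java sources can miss a getter that the generated code still depends on.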
Unfortunately, in the end, it's a very bad change. It's not a limit on line length (StringTemplate sets that up internally when wrapping) but on the maximum length of a string or char array. And now the generated code has a lot of useless small 80-element arrays for the targets that honor the ATN segment limit (Dart, C++, PHP). Also, this limit only matters for the Java target, since other targets don't have the 65535 limit. I suggest reverting
The getSerializedATNSegmentLimit method is used to prevent strings longer than 65535 characters for the serialized ATN. If a serialized ATN requires something larger than that, it has to be split into multiple strings that are joined at runtime. An obvious case where this happens is when the number of ATN states goes beyond 16 bits. I believe this also has the effect of generating strings on multiple lines instead of one long contiguous line. @ericvergnaud wants to keep it for the various targets he manages because it makes the generated code easier to read in an editor. In the case of Java, setting it to 80 or similar would slow things down a little and cause extra memory allocation. I prefer the full 64k string to avoid multiple allocations.
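The split-and-join scheme described above can be sketched as follows, assuming the serialized ATN is already a Java String and that segments are cut purely by char count. AtnSegments, split and join are hypothetical names, not ANTLR's API; in generated code, each segment would be emitted as its own string literal and the pieces joined at runtime.

```java
import java.util.ArrayList;
import java.util.List;

class AtnSegments {
    // Limit used in the discussion for Java. (Strictly, the class-file cap is
    // 65535 bytes of modified UTF-8, which equals chars only for ASCII.)
    static final int JAVA_SEGMENT_LIMIT = 65535;

    // Code-generation side: cut the serialized ATN into chunks of at most
    // `limit` chars; each chunk would become a separate string literal.
    static List<String> split(String serialized, int limit) {
        if (limit <= 0) throw new IllegalArgumentException("limit must be positive");
        List<String> segments = new ArrayList<>();
        int len = serialized.length();
        for (int start = 0; start < len; ) {
            // Computed this way so that start + limit can never overflow.
            int end = start + Math.min(limit, len - start);
            segments.add(serialized.substring(start, end));
            start = end;
        }
        return segments;
    }

    // Runtime side: rebuild the full serialized ATN from its segments.
    static String join(List<String> segments) {
        return String.join("", segments);
    }

    public static void main(String[] args) {
        String atn = "x".repeat(200_000); // hypothetical 200k-char serialized ATN
        System.out.println(split(atn, JAVA_SEGMENT_LIMIT).size()); // 4
        System.out.println(split(atn, 80).size());                 // 2500
        System.out.println(join(split(atn, 80)).equals(atn));      // true
    }
}
```

The two counts show the trade-off being argued: the same 200,000-char input yields 4 segments at the Java limit but 2500 at 80, and a limit of Integer.MAX_VALUE yields a single segment.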
Yes, that's why it should not be changed to 80. It creates a lot of unwanted small fragments for targets that respect segments (PHP, C++, Dart). I've fixed it in the latest PR. It looks like this parameter is only relevant for Java.
It doesn't affect string splitting because StringTemplate is responsible for string wrapping.
Only at the compilation step. The Java compiler folds constant strings into a single one in the .class file.
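The constant-folding point can be checked directly. This sketch uses hypothetical class and field names and relies only on standard Java semantics: concatenating compile-time constant strings is itself a constant expression, which javac folds into one constant, and constant strings are interned. One consequence is that the folded result is stored as a single class-file constant, so it must itself fit under the string-constant limit.

```java
class ConstantFolding {
    static final String PART1 = "abc";
    static final String PART2 = "def";

    // Both operands are compile-time constants, so this is a constant
    // expression: javac stores the single string "abcdef" in the class file.
    static final String FOLDED = PART1 + PART2;

    // A method call makes the expression non-constant, so this
    // concatenation happens at run time and allocates a new String.
    static final String AT_RUNTIME = PART1.concat(PART2);

    public static void main(String[] args) {
        // Constant strings are interned, so the folded value is the very
        // same object as the literal "abcdef".
        System.out.println(FOLDED == "abcdef");          // true
        System.out.println(AT_RUNTIME == "abcdef");      // false
        System.out.println(AT_RUNTIME.equals("abcdef")); // true
    }
}
```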
I don’t think it creates multiple allocations; rather, it references string constants in the class file.
(In reply to Ivan Kochurkin's comment of 30 Jan 2022, 01:00.)
Multi-line string constants are supported in C#, Python, JavaScript, Dart, Go, PHP, Swift and C++.
So we should use them for readability; that's a template topic.
They are also supported in Java (text blocks, standard since Java 15), so maybe it's not worth investing so much time in this?
(In reply to Terence Parr's comment of 30 Jan 2022, 00:30.)
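For Java, the multi-line option is the text block. A minimal sketch (requires Java 15 or later; class and field names are hypothetical) showing that wrapping a literal across source lines does not have to change its value. A text block is still one string constant, so it is subject to the same class-file length limit as any other literal.

```java
class TextBlockDemo {
    // Java 15+ text block. Each trailing backslash is the \<line-terminator>
    // escape, which suppresses the newline: the source is wrapped for the
    // editor, but the value is one contiguous string.
    static final String WRAPPED = """
            0123456789\
            abcdefghij\
            KLMNOPQRST""";

    // Without the backslashes, the line breaks become part of the value.
    static final String MULTI_LINE = """
            0123456789
            abcdefghij""";

    public static void main(String[] args) {
        System.out.println(WRAPPED.length());    // 30
        System.out.println(MULTI_LINE.length()); // 21 (includes one '\n')
    }
}
```

The common leading indentation is stripped as incidental whitespace, so the indentation in the source does not end up in either value.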
The limit is not about multi-line strings; it's about the maximum string length, which only matters for Java because of the 65535 limit.
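To make the Java-specific limit concrete: in the class-file format, a string constant is a CONSTANT_Utf8 entry whose length field is an unsigned 16-bit byte count, and the bytes use the JVM's modified UTF-8. So 65535 is a byte limit, which matches a char count only when every char is in U+0001..U+007F. A minimal sketch of the check, with hypothetical names:

```java
class ConstantPoolLimit {
    // A CONSTANT_Utf8 entry stores its byte length in an unsigned 16-bit
    // field, so one string constant can occupy at most 65535 bytes.
    static final int MAX_UTF8_BYTES = 65535;

    // Length in the JVM's modified UTF-8: U+0001..U+007F take 1 byte,
    // U+0000 and U+0080..U+07FF take 2, every other char takes 3
    // (supplementary characters are two surrogate chars, 3 bytes each).
    static long modifiedUtf8Length(String s) {
        long n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) n += 1;
            else if (c <= 0x07FF) n += 2;
            else n += 3;
        }
        return n;
    }

    static boolean fitsInOneConstant(String s) {
        return modifiedUtf8Length(s) <= MAX_UTF8_BYTES;
    }

    public static void main(String[] args) {
        String ascii = "a".repeat(65535);
        String wide = "\u0800".repeat(30000); // 3 bytes per char
        System.out.println(fitsInOneConstant(ascii)); // true
        System.out.println(fitsInOneConstant(wide));  // false (90000 bytes)
    }
}
```

Depending on which chars a serializer emits, a segment well under 65535 chars may therefore still be too large for a single Java constant.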
My own preference would be to see one big array of integers wrapped appropriately at some line length, rather than lots of different segments. Unless a target has a limit, we should probably set the segment length to max integer. I think we should separate the idea of line length, which can be handled in templates, from the length of the ATN serialization list of integers. I don't think any language but Java has a string length issue in terms of compilation or storage in a generated binary file. @ericvergnaud points out that most targets support multi-line strings, so they can be built with appropriate template actions. In the end, my vote is to set the segment length to max int and then change the templates to solve the wrapping issue that Eric originally pointed out. E.g., the Java target already does that trivially:
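Separating line length from segment length could look like this in plain Java: all values go into one array literal, and line breaks are purely cosmetic. This is a hypothetical sketch (WrappedArrayRenderer and render are illustrative names), not the ANTLR Java template; in ANTLR the wrapping would live in the template, where StringTemplate offers a wrap option for expressions.

```java
class WrappedArrayRenderer {
    private static final String INDENT = "    ";

    // Emit ALL values as a single array literal. Lines are broken only for
    // readability once they would exceed maxWidth characters; nothing is
    // split into separate segments.
    static String render(String name, int[] values, int maxWidth) {
        StringBuilder out = new StringBuilder("static final int[] " + name + " = {\n");
        StringBuilder line = new StringBuilder(INDENT);
        for (int i = 0; i < values.length; i++) {
            String item = values[i] + (i < values.length - 1 ? "," : "");
            boolean lineHasItems = line.length() > INDENT.length();
            // Wrap before this item if " item" would push the line past maxWidth.
            if (lineHasItems && line.length() + 1 + item.length() > maxWidth) {
                out.append(line).append('\n');
                line = new StringBuilder(INDENT);
                lineHasItems = false;
            }
            if (lineHasItems) line.append(' ');
            line.append(item);
        }
        if (line.length() > INDENT.length()) out.append(line).append('\n');
        return out.append("};").toString();
    }

    public static void main(String[] args) {
        System.out.println(render("serializedATN", new int[] {4, 1, 7, 23, 2, 5, 0, 12}, 20));
    }
}
```

A single item longer than maxWidth still gets its own line rather than being split, since the width is only a formatting target and not a limit on the data.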
@ericvergnaud, how would it affect JavaScript or other targets if we moved to max int? Can we tweak the templates to satisfy your needs?
(ANTLR history/info is slowly filtering back into my old brain) Woot!
I've checked: all targets have wrapping (of strings or arrays).
Excellent. OK, so this covers the primary concern @ericvergnaud originally had. I will look at this after we figure out the UUID and ATN serialization version thing.
@ericvergnaud @KvanTTT How does this look? Tests appear to pass locally.