-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ATN serialized data: remove shifting by 2, remove UUID #3516
Conversation
e3b5e76
to
88e10ac
Compare
It looks like some runtime tests rely on the stable version of ANTLR, that's why some tests are failing. It shouldn't be. |
@ericvergnaud where can I find the original grammar for https://github.com/antlr/antlr4/blob/bca2536f3fb689e6e06b8351c71571f8e837f71e/runtime/Python3/tests/mocks/TestLexer.py ? It looks strange that it's not possible to update the generated data. @parrt the same for https://github.com/antlr/antlr4/blob/88e10ac51d8c1e744d7dae054264f63f85a12439/runtime-testsuite/resources/org/antlr/v4/test/runtime/InterpDataReaderTest.interp It does not have original grammar. |
Thats a question for @youkaichao I believe... |
Ok, let me pull in this branch and see what it does. Thank you for your work on this guys... standby. |
It looks like we should either remove some tests or fix them for working with the actual ANTLR version from the source, not from a package repository. |
All Java tests pass. Also Go. Weird Python3 is passing but it should not based upon what you showed me in the gists above. |
@KvanTTT which test is failing for TestLexer.py? I can't get it to fail after I pulled in your branch but it should fail. |
looks like there are other Python tests that should fail:
|
Yes, because it's extra tests from https://github.com/antlr/antlr4/tree/master/runtime/Python3/tests, not from the runtime-testsuite directory. I can't even regenerate lexer and parsers because grammars are not presented. Other runtimes also have similar tests. |
Ok, I will make a PR to fix the Java thing first. |
What's wrong with Java? |
The .interp files that do not have corresponding grammars |
Yep, I was able to change them manually but it does not look the correct way. |
Ok, so I have the Java fixed, but the Python TestLexer.py looks to be in the runtime and not the runtime test suite! Those files shouldn't be run by the unit tests so we should be okay although that area should be cleaned up eventually. I'm back to looking at this PR. |
looking at the change list...wow, it dramatically simplifies the code to avoid the UU ID. love it. looks like Sam was trying to create backward compatibility but I don't think it's worth it. |
This looks great to me @KvanTTT !!! I just had that one issue about removing a dead comment. oh and also rebase from master once we include my fixes to the missing Java grammar files. Then once @ericvergnaud signs off on this we can merge. :) |
@parrt one we test Swift i.e. load a newly generated parser using the current runtime I'll be fine. |
@KvanTTT can you rebase on master? Then I'll repull. |
Should I disable such tests to make CI status green? |
Yes, please disable. I'll make new issue: #3521 |
Ok, I will be able to do it tomorrow since it's late for me. |
Great thanks!
|
48645fd
to
1314459
Compare
Yes, it's consistent with my observations.
Yes, he is right partially (0 takes 2 bytes instead of 1 and 0xFFFF takes 3 bytes instead of 1), but it does not imply triple size of source/class file as he said (as we see it's about 11% at worse case).
I agree the current implementation is not very clear. I can suggest two solutions how to optimize ATN data (but after this PR status resolving): the first includes target-dependent shift (only for Java) the second includes improved and more compact encoding which is actual for all runtimes. Both of them are better than the old solution with +2 shift that is only actual for Java. And both of them also imply more code removing. |
Ok, sounds good. Let us proceed as follows (mostly as-is for this PR).
Then we can attend to the possibility of shrinking the ATN serialization strings with your other PR. |
Just to clarify: do you suggest fixing the current PR to add shifting optimization only for Java? |
I suppose maybe that's the easiest path. Otherwise we would have to make a separate PR that did the shifting and this one is almost ready to go. But if you prefer we can go with to PRs. |
Ok, I can do it in this PR. Just thought there are already a lot of changes for the review and maybe it's inconvenient :) |
Yep But I think I’ve been keeping up with the changes but will take an
overall look again
On Sat, Feb 5, 2022 at 11:14 AM Ivan Kochurkin ***@***.***> wrote:
I suppose maybe that's the easiest path. Otherwise we would have to make a
separate PR that did the shifting and this one is almost ready to go. But
if you prefer we can go with to PRs.
Ok, I can do it in this PR. Just thought there are already a lot of
changes for the review :)
—
Reply to this email directly, view it on GitHub
<#3516 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABLUWJLE2XXTAS5HEPPJ2DUZVZH7ANCNFSM5NEIXMYA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Dictation in use. Please excuse homophones, malapropisms, and nonsense.
|
I've implemented Java-specific ATN data optimization in the latest commit. |
@@ -0,0 +1,6 @@ | |||
package org.antlr.v4.runtime.atn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer not creating new files unless there is a strong reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly in the runtime because it means Target developers will try to implement.
|
||
import java.util.List; | ||
|
||
public class ATNSerializeUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure creating a one method utility class is worth creating a file.
public ATNDataReader(char[] data) { | ||
this.data = data; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... The principal I like what you are doing here except we have to think about how it affects all of the other targets. Now we are diverging from the way other targets are implementing things. In other words everybody else has data[p++]
still. Of course we might need to generalize this if we are going to handle 32 bit ATN states but I not sure we need such a widespread change. If we were starting from scratch I can imagine this, but every change in a PR to a mature product has its costs or risks anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we are diverging from the way other targets are implementing things. In other words everybody else has data[p++] still.
I've just encapsulated reading logic into the separated class. Actually almost all targets already have similar implementation that uses readInt
methods: see C#, Python, Dart and other.
Of course we might need to generalize this if we are going to handle 32 bit ATN states but I not sure we need such a widespread change.
Handling of 32 bit ATN states could be easily added to the ATNDataWriter
and ATNDataReader
. I would demonstrate it in the next PR. I guess some users need such a feature since there are several user issues on the tracker.
If we were starting from scratch I can imagine this, but every change in a PR to a mature product has its costs or risks anyway.
We have a lot of tests and ATN data is already not back-compatible because of changes in this PR. I don't see a strong reason why to not add support of full range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code for handling 32 bit integers is quite simple:
public void write(int value) {
if (value < 0b1000_0000_0000_0000) {
writeInt16(value);
} else {
writeInt16((value & 0b0111_1111_1111_1111) | (1 << 15));
writeInt16(value >>> 15);
}
}
public int read() {
int value = readInt16();
return value < 0b1000_0000_0000_0000 && value >= 0
? value
: (readInt16() << 15) | (value & 0b0111_1111_1111_1111);
}
@@ -0,0 +1,36 @@ | |||
package org.antlr.v4.runtime.atn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another file that's in one target which will diverge from other targets and then new targets will have yet another file to implement. A lotta these files are really related to the way Java likes to do things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATNDataWriter
is only actual for Java, maybe C++ because other targets don't write they only read ATN.
I was kind of expecting you to simply back out the code so that the job at target added then subtracted 2 again. haha.. Seems like kind of a major change, particularly given the comments I have in line. :) |
It could be possible but the suggested implementation looks more clear and does not require additional allocation. |
34edfb8
to
0f1da3e
Compare
I've amended the latest commit by your notes. |
I really value all of the effort you put in, but this is an old established project and there are many considerations that I've had to keep in mind over the years. What you are saying is true that it is clear, but that is not my only consideration. As I mentioned, one of my concerns is keeping the code for the various target similar and this changes things quite significantly. It may be the case that we need to go to something more general in the future if we need more than 16 bits. Please consider making the minimal change necessary to the java target. My intention for this update to the software is simply carving out the UUID from all targets, then removing +-2 from all target accept Java. |
0f1da3e
to
60f92b8
Compare
Ok, understand you. I've rewritten the latest commit with minimal changes: 60f92b8 Let's postpone other changes for future. |
As I said code of various targets are differs from Java because they use |
Interesting. Ok, didn't know that. Well let's keep it as-s for now. |
Wow! Great work! So much code removed!!! Thanks, @KvanTTT :) |
Please, don't forget to merge antlr/antlr-php-runtime#17 as well to make our CI status completely green. |
If you like code removing, probably you'd like the next PR with optimization of Swift serialization (JSON -> binary) :) #3513 |
Changes from antlr/antlr-php-runtime#17 are required in master for PHP
fix #3515