Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of ATN serialization restrictions (ATN states size can be > 65535, up to 2^31-1), remove excess serialization and allocations #3493

Closed
wants to merge 5 commits into from

Conversation

KvanTTT
Copy link
Member

@KvanTTT KvanTTT commented Jan 15, 2022

It's the limit of serialization, not of ANTLR tool itself. It can be and should be fixed. I've fixed it by using int serialization with varying bytes count that depends on int value:

public void writeCompactUInt32(int value) {
	if (value < 0b1000_0000_0000_0000) {
		writeUInt16(value);
	} else {
		writeUInt16((value & 0b0111_1111_1111_1111) | (1 << 15));
		writeUInt16(value >>> 15);
	}
}

All changes are back-compatible excluding quite large lexers or parsers (with more than 32768 states). But actually, it does not matter as I've said before because of version check on runtime.

If it's okay for Java, I'll fix other runtimes as well.

Fixes #1863, #2732, #3338

The PR depends on #3488

@KvanTTT KvanTTT force-pushed the increase-atn-states-size-limit branch from 27f96d8 to d60bfed Compare January 16, 2022 15:59
@KvanTTT
Copy link
Member Author

KvanTTT commented Jan 16, 2022

Ok, I managed to save back-compatibility (up to 2^14-1=16383 that is also quite big) using the following encoding scheme:

encoding count type
00xx xxxx xxxx xxxx 1 int (14 bit)
01xx xxxx xxxx xxxx xxxx xxxx xxxx xxxx 2 int (30 bit)
1000 0000 0000 0000 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx 3 int (32 bit)
1111 1111 1111 1111 1 -1 (0xFFFF)

We only need positive integers and -1 (0xFFFF) for null.

All tests for other runtime works except a couple of tests with big lexers. But now ANTLR supports up to 2^31-1 integers and doesn't have any restrictions related to serialization.

Also, I've optimized both the tool and Java runtime, I've removed excess serialization and allocations. If the code is good, I'll fix other runtimes to support big lexers and parsers.

But anyway I prefer refactoring described in Optimization of ATN serialization format discussion.

@KvanTTT KvanTTT changed the title Increase ATN states size limit (> 65535) Get rid of ATN serialization restrictions (ATN states size can be > 65535, up to 2^31-1), remove excess serialization and allocations Jan 16, 2022
Refactor ATN serializer and deserializer, use ATNDataWriter, ATNDataReader

Remove excess data cloning in deserializer

fixes antlr#1863, fixes antlr#2732, fixes antlr#3338
… to Integer.MAX_VALUE)

Simplify serializer and deserializer API
@KvanTTT KvanTTT force-pushed the increase-atn-states-size-limit branch from d60bfed to 00f0765 Compare January 17, 2022 22:14
@KvanTTT
Copy link
Member Author

KvanTTT commented Jan 17, 2022

@parrt since #3488 is merged, this request is ready for intermediate review (Java only)

@KvanTTT
Copy link
Member Author

KvanTTT commented Jan 21, 2022

I'm closing in favor of #3505

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a more understandable message than "Serialized ATN data element .... element ... out of range 0..65535"
1 participant