Fix #1890 Standalone CR should be recognized as line separator #2519
Link to an issue: #1098 |
Ping? |
2c23d4c to 65ca40b
@parrt maybe it makes sense to accept the |
If there is an isolated maybe we should count as a newline and reset char position and bump line. |
As I've already written in the issue comment:
But this pull request is not okay because it does not have a unit test and fixes only Java runtime (other runtimes should accept |
See OP:
|
Also, BaseRuntimeTest should be improved to accept |
- A unit test is required (LexerExec/CarriageReturnAsLineSeparator.txt). Also, readDescriptor should be improved to accept \r and not convert all line separators from the input stream to \n.
- Implementation for the other runtimes is required, as we've discussed.
Before implementation, we should wait for @parrt's final decision about \r as a line separator.
    } else if ( curChar == '\r' ) {
        int nextChar = input.LA(2);
        if ( nextChar != '\n') {
            line++;
            charPositionInLine=0;
        }
    } else {
Maybe it's better to change to the following?
-   } else if ( curChar == '\r' ) {
-       int nextChar = input.LA(2);
-       if ( nextChar != '\n') {
-           line++;
-           charPositionInLine=0;
-       }
-   } else {
+   } else if ( curChar == '\r' ) {
+       if (input.LA(2) == '\n') {
+           input.consume();
+       }
+       line++;
+       charPositionInLine=0;
+   } else {
It looks more optimal since it does not require reading \n twice.
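For anyone following the thread, here is a minimal standalone sketch of the line/column bookkeeping being discussed, written against a plain CharSequence so it runs without the ANTLR runtime. The class name LinePositionTracker and its methods are hypothetical and are not part of any ANTLR API. It follows the behavior of the first variant above (a CR that is followed by LF changes nothing, and the LF then bumps the line); it is not the code from this pull request.

```java
/** Hypothetical helper for illustration only; not part of the ANTLR runtime. */
final class LinePositionTracker {
    private int line = 1;               // 1-based line number, as in ANTLR lexers
    private int charPositionInLine = 0; // 0-based column

    /** Advances over the whole text, treating \n, \r\n and a lone \r as line separators. */
    void consumeAll(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean crBeforeLf =
                c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n';
            if (c == '\n' || (c == '\r' && !crBeforeLf)) {
                // LF, or a CR that is not part of a CRLF pair: start a new line.
                line++;
                charPositionInLine = 0;
            } else if (!crBeforeLf) {
                // Ordinary character: advance the column.
                charPositionInLine++;
            }
            // A CR directly followed by LF changes nothing; the LF bumps the line.
        }
    }

    int getLine() { return line; }

    int getCharPositionInLine() { return charPositionInLine; }
}
```

With this sketch, an input that mixes all three separators, such as "a\rb\r\nc\nd", is counted as four lines.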
Hi, |
I think it's a very rare case when ANTLR is used for not text parsing (In my opinion it's not very suitable for binary parsing) and when Single I'm basing on the most widespread situations case and ANTLR practical usage. In most cases,
I have no objection but it complicates runtimes. Anyway by default line separator should be
Also, actually binary files don't have "lines" at all. |
You'd be surprised how many people use antlr for use cases it wasn't designed for. |
Hi guys, Thanks very much for helping me think through this problem. Excellent analyses. My thoughts:
In the end, given the simple workaround and the potential issues arising from a big change across targets, fixing this rare situation is probably not worth it. Consequently, I'm going to close this one with the recommendation that people use a custom character stream or file loader.
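One possible shape for the "file loader" workaround, as a sketch only: rewrite each lone CR to LF before the text reaches the lexer. The class LoneCrNormalizer is hypothetical and not something shipped with ANTLR. Because every replacement is one character for one character, character offsets stay the same; CRLF pairs are left untouched.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-processing helper; not part of the ANTLR runtime. */
final class LoneCrNormalizer {
    // A CR that is not immediately followed by LF (negative lookahead).
    private static final Pattern LONE_CR = Pattern.compile("\r(?!\n)");

    private LoneCrNormalizer() {}

    /** Returns the text with every lone CR replaced by LF; length is unchanged. */
    static String normalize(String text) {
        return LONE_CR.matcher(text).replaceAll("\n");
    }
}
```

The normalized string could then be handed to the lexer in the usual way, for example through CharStreams.fromString. Note that token text will contain \n where the original file had a lone \r, which may or may not matter for a given grammar.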
a unit test that checked |
Thanks for your opinion, @parrt My 2 cents: instead of changing character stream (which could lead to position errors) user may override LexerATNSimulator: and pass it to lexer as |
Ah. I wondered about subclassing to change consume() directly. thanks! |
I started working on it in another pull request: Preserve line separators for input runtime tests dat. But there are some problems because git changes EOL depending on OS and other options. On the other hand, we don't have another input format that allows chars escaping: @nixel2007 yes, |
we fixed EOLs on test files with .gitattributes file |
Yes, I've tried to ignore EOL changing for the entire directory:
But maybe it makes sense to set up
|
BTW, it looks like ANTLR 3 also does not treat single |
I am trying this approach in CSharp, but Consume is not a virtual member. |
And, of course, shadowing it won't work as well. |
Allow user to subclass and consume differently. Useful for CR handling as suggested in #2519 Signed-off-by: Alberto Simões <[email protected]>
Allow user to subclass and consume differently. Useful for CR handling as suggested in antlr#2519 Signed-off-by: Alberto Simões <[email protected]> Signed-off-by: Victor Smirnov <[email protected]>
Hello!
I've made changes to treat a stand-alone CR as a line separator.
These changes were tested on millions of LOC without any noticeable performance regression. I also use my fork to parse files every day as part of a static analysis process.
If the antlr4 authors want, I can make the same changes for all the other supported languages.