Emulation for Java Character class in generated lexers #349

BurtHarris · 2017-12-28T23:44:53Z

It seems like otherwise-portable lexers in the ANTLR community frequently use something like this (from LexBasic.g4):

// covers all characters above 0xFF which are not a surrogate
// and UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
fragment JavaUnicodeChars
	: ~[\u0000-\u00FF\uD800-\uDBFF]		{Character.isJavaIdentifierPart(_input.LA(-1))}?
	|  [\uD800-\uDBFF] [\uDC00-\uDFFF]	{Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
	;

Unfortunately, this generates code that doesn't work in antlr4ts. There are three problems:

1. References to _input need to be written as this._input.
2. The class Character doesn't exist in ECMAScript/TypeScript.
3. The cast (char) doesn't work in ECMAScript/TypeScript.

Issue #1 can be fixed simply by declaring a local variable _input in the code emitted for semantic predicates.
Issue #2 can be fixed by defining a Character class with isJavaIdentifierPart in the antlr4ts runtime.
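
To make issue #2 concrete, here is a minimal sketch of what such a Character shim might look like. The class below is hypothetical and is not part of the antlr4ts runtime; isJavaIdentifierPart is approximated with Unicode general categories, so it may disagree with a given JDK for some code points.

```typescript
// Hypothetical shim, NOT part of antlr4ts: emulates two static methods of
// java.lang.Character that lexer predicates commonly call.
class Character {
  // Assumption: approximates Java's rule (letters, digits, letter numbers,
  // currency symbols, connector punctuation, combining marks, format
  // characters, and the "identifier ignorable" control ranges).
  private static readonly PART = new RegExp(
    "^[\\p{L}\\p{Nl}\\p{Sc}\\p{Pc}\\p{Nd}\\p{Mn}\\p{Mc}\\p{Cf}" +
      "\\u0000-\\u0008\\u000E-\\u001B\\u007F-\\u009F]$",
    "u"
  );

  // Combine a UTF-16 surrogate pair into a single code point.
  static toCodePoint(high: number, low: number): number {
    return ((high - 0xd800) << 10) + (low - 0xdc00) + 0x10000;
  }

  static isJavaIdentifierPart(codePoint: number): boolean {
    // Negative values (such as an end-of-input marker) and values past
    // U+10FFFF are never identifier parts.
    if (codePoint < 0 || codePoint > 0x10ffff) {
      return false;
    }
    return Character.PART.test(Character.toText(codePoint));
  }

  // Build a one-code-point string without relying on String.fromCodePoint.
  private static toText(codePoint: number): string {
    if (codePoint <= 0xffff) {
      return String.fromCharCode(codePoint);
    }
    const offset = codePoint - 0x10000;
    return String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
  }
}
```

With a shim like this in scope, the first alternative's predicate could read Character.isJavaIdentifierPart(this._input.LA(-1)) without any cast, since LA already returns a number.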

Thus a lowly cast is the remaining issue... Unfortunately, it's not limited to the lack of a char type in ECMAScript; the syntax of casts in TypeScript is also different!
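
For readers unfamiliar with the difference, a small sketch follows. The variable la is a made-up stand-in for a value returned by this._input.LA(...). TypeScript type assertions only affect type checking, so the closest runtime analogue of Java's (char) narrowing is an explicit 16-bit mask.

```typescript
// Stand-in for a lookahead value; in a real lexer this would come from
// this._input.LA(-1). The constant is an arbitrary example.
const la: number = 0x1f600;

// TypeScript's two type-assertion forms. Both are compile-time only and
// leave the value unchanged at run time.
const angleForm = <number>la;
const asForm = la as number;

// Java's (char) cast truncates an int to 16 bits; masking does the same.
const narrowed = la & 0xffff;
```

Here angleForm and asForm are still 0x1F600, while narrowed is 0xF600, which is what Java's (char) would have produced for the same int.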


BurtHarris · 2017-12-28T23:49:52Z

One possible solution for #3 might be to have the code generation tool apply some simple transforms to any code it encounters. A simple regex s/\(char\)/<number>/ might do a world of good.
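
As a rough illustration of that transform (the function name rewriteCharCasts is made up for this sketch, and nothing like it exists in the tool today):

```typescript
// Hypothetical helper: replaces every literal "(char)" in an action or
// predicate body with a TypeScript "<number>" type assertion.
function rewriteCharCasts(action: string): string {
  return action.replace(/\(char\)/g, "<number>");
}

// The surrogate-pair predicate from LexBasic.g4, as written for Java.
const javaPredicate =
  "Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))";

const rewritten = rewriteCharCasts(javaPredicate);

// Purely textual matching cannot tell a cast from other text: a string
// literal that happens to contain "(char)" is rewritten as well.
const falsePositive = rewriteCharCasts('log("(char) cast seen")');
```

The rewritten predicate contains <number>_input.LA(-2) and <number>_input.LA(-1) in place of the casts. The second call shows the limitation of a textual substitution: the contents of the string literal are changed too.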

ChuckJonas · 2017-12-29T21:40:15Z

+1, as this issue threw me off when I first tried to use antlr4ts

sharwell · 2018-05-24T06:57:02Z

> One possible solution for #3 might be to have the code generation tool apply some simple transforms to any code it encounters. A simple regex s/\(char\)/<number>/ might do a world of good.

This should either be implemented as a target-language-agnostic DSL for grammar predicates/actions, or not be implemented at all. The proposed implementation introduces risk of breaking actions written in the correct target language, and may or may not work for any given action (a maintenance and usability nightmare).

The current implementation doesn't support rewriting actions automatically into the target language, but at least the behavior is consistent across all the targets. 😄

sharwell · 2018-05-24T06:58:00Z

📝 See #350 (comment) for my take on the Character class.

BurtHarris self-assigned this Dec 28, 2017

sharwell added the documentation label Dec 2, 2018
