This issue was moved to a discussion.
[Go target] Generated lexer uses "l" for some functions, "p" for other functions. #4306
Comments
I can make it consistent (probably), but this is not a pattern for Go. I
imagine that the original template copied code from the parser generator to the
lexer and forgot to change the receiver. It is idiomatic to use the same
receiver name for all funcs. It would normally be p for parser and l for
lexer.
I’ll take a look.
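Jim's point about idiomatic receiver names can be sketched as follows. This is a minimal illustration, assuming a hypothetical, stripped-down JavaScriptLexer type (not the ANTLR runtime type) and dropping the localctx antlr.RuleContext parameter so it runs without the ANTLR Go runtime; ProcessOpenBrace and IsStartOfFile are placeholder helpers. With one receiver name, l, in both the action function and the predicate function, a single "this." to "l." rewrite would compile in both.

```go
package main

import "fmt"

// JavaScriptLexer is a hypothetical stand-in for the generated lexer type.
type JavaScriptLexer struct {
	braces int
	pos    int
}

// Placeholder helpers that grammar action/predicate code would call.
func (l *JavaScriptLexer) ProcessOpenBrace() { l.braces++ }

func (l *JavaScriptLexer) IsStartOfFile() bool { return l.pos == 0 }

// Action function: receiver named l.
func (l *JavaScriptLexer) OpenBrace_Action(actionIndex int) {
	switch actionIndex {
	case 0:
		l.ProcessOpenBrace()
	default:
		panic("No registered action for: " + fmt.Sprint(actionIndex))
	}
}

// Predicate function: same receiver name, so "l." works here too.
func (l *JavaScriptLexer) HashBangLine_Sempred(predIndex int) bool {
	switch predIndex {
	case 0:
		return l.IsStartOfFile()
	default:
		panic("No predicate with index: " + fmt.Sprint(predIndex))
	}
}

func main() {
	l := &JavaScriptLexer{}
	fmt.Println(l.HashBangLine_Sempred(0)) // true: pos is 0
	l.OpenBrace_Action(0)
	fmt.Println(l.braces) // 1
}
```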
On Wed, Jun 7, 2023 at 20:08 Ken Domino wrote:
This is in regards to writing a "target agnostic grammar", where we try to
reuse a grammar even though actions ("semantic predicates") in a grammar
are always target specific.
The javascript
<https://github.com/antlr/grammars-v4/tree/master/javascript/javascript>
grammar contains action code for tokenization. For example, here
<https://github.com/antlr/grammars-v4/blob/6590fdd07b9fa1b4a66025c8315ee2c93bbcbe7d/javascript/javascript/JavaScriptLexer.g4#L48>.
Notice that the "this." prefix is platform-specific syntax for a
qualified method call in Java, as well as CSharp, Dart, JavaScript
(modern), and TypeScript. For the other targets, the .g4 files need to be
rewritten with sed -i to get them to work (and they do: "target agnostic
format" really works). For PHP, it needs to be changed to "self::". For
Python3, "self.". For Cpp, "this->". For Go, it's much, much harder.
The reason it is harder is because the "methods" are implemented using
functions that pass a pointer to the type.
func (l *JavaScriptLexer) OpenBrace_Action(localctx antlr.RuleContext, actionIndex int) {
	switch actionIndex {
	case 0:
		l.ProcessOpenBrace()
	default:
		panic("No registered action for: " + fmt.Sprint(actionIndex))
	}
}
That would be fine if I knew that "this." should always be rewritten to
"l." because the receiver in the function is named "l". But that is not
always the case. For this action in the lexer grammar
<https://github.com/antlr/grammars-v4/blob/6590fdd07b9fa1b4a66025c8315ee2c93bbcbe7d/javascript/javascript/JavaScriptLexer.g4#L39>,
the receiver is named "p".
func (p *JavaScriptLexer) HashBangLine_Sempred(localctx antlr.RuleContext, predIndex int) bool {
	switch predIndex {
	case 0:
		// Does not compile: the receiver here is p, so l is undefined.
		return l.IsStartOfFile()
	default:
		panic("No predicate with index: " + fmt.Sprint(predIndex))
	}
}
This is rather annoying because, rather than the templates using a receiver
named "this" for all these generated functions in the parser and lexer, I need
to modify the grammar to use the name that is used in the templates. I have to
look at the Go templates, see where "p" vs "l" are used, and then change the
action code to use that name.
Could we change the Go templates to always use "this" as the receiver
name?
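The per-target rewriting described above (the sed -i step) can be sketched as a small Go helper. The names receiverPrefix and rewriteAction are hypothetical; the table only encodes the substitutions listed in this issue, and the Go entry assumes the lexer templates described here ("l" in action functions, "p" in predicate functions). Like sed, it is purely textual and does not parse the action code.

```go
package main

import (
	"fmt"
	"strings"
)

// receiverPrefix returns what a grammar's "this." should become for a target.
// The Go case reflects the lexer templates described in this issue.
func receiverPrefix(target string, isPredicate bool) (string, error) {
	switch target {
	case "Java", "CSharp", "Dart", "JavaScript", "TypeScript":
		return "this.", nil
	case "PHP":
		return "self::", nil
	case "Python3":
		return "self.", nil
	case "Cpp":
		return "this->", nil
	case "Go":
		if isPredicate {
			return "p.", nil
		}
		return "l.", nil
	default:
		return "", fmt.Errorf("unknown target %q", target)
	}
}

// rewriteAction substitutes textually, as sed -i would: "this." inside a
// string literal or comment in the action is rewritten too.
func rewriteAction(code, target string, isPredicate bool) (string, error) {
	prefix, err := receiverPrefix(target, isPredicate)
	if err != nil {
		return "", err
	}
	return strings.ReplaceAll(code, "this.", prefix), nil
}

func main() {
	actions := []struct {
		code string
		pred bool
	}{
		{"this.ProcessOpenBrace();", false},
		{"this.IsStartOfFile()", true},
	}
	for _, a := range actions {
		out, _ := rewriteAction(a.code, "Go", a.pred)
		fmt.Println(out) // l.ProcessOpenBrace(); then p.IsStartOfFile()
	}
}
```

The predicate flag is exactly the extra bookkeeping the issue complains about: with a single receiver name in the Go templates, the Go case would no longer need it.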
|
Ah, it is because of predicates.
|
This would be a "nice to have", but not critical. I can use Trash to extract plain actions vs semantic predicate actions, modify those actions with "l." or "p." accordingly, then reconstruct the grammar. The build tools can handle this. I just don't quite get why it's not a consistent name like "self" in the generated code, instead of all this "l" or "p". |
self and this are really different concepts. But we should be using consistent receiver variable names. |
Actually, there sort of is one, which I stumbled upon accidentally through a typo I wrote. #3508 |
The cpp version could be a reference rather than a pointer; that would avoid the -> vs . issue. |
Yes, instead of $parser->foobar(), use $parser.foobar(), and in the Antlr tool, generate code as (*this).foobar(). That would work for at least $parser. It is still missing for the lexer. We either need to add $lexer or, better, $this, and always use the . operator afterwards. |
The original proposal was $this; Ter changed it to $parser, which I think is better because IIRC it’s sometimes called from inner classes.
I don’t see anything in the way of adding $lexer, but maybe Ter does...
|
Yes, we could use I'll still need the transformGrammar.py hack because there are a couple of other issues where it corrects other problems with codegen. In fortran/fortran90, I have to sed this line to turn it into |
Well I guess whilst it's obviously good practice to name the abstract parser header from its name, it's definitely not something we can enforce. We're not generating or providing that file are we ? #if PHP IIRC you hate the idea (or maybe that's Ivan?), but tbh I haven't seen a better one :-( |
This would be okay. I think my concern was lexing the target code between the #if and #else. The target code itself may have Cpp preprocessor directives:
    grammar Foobar;
    #if Cpp
    // Cpp code vvv.
    #if defined(XXX)
    #include "xxx.h"
    #else
    #include "other.h"
    #endif
    // Cpp code ^^^
    #endif
It just makes the lexing more difficult, e.g., is the #else part of the Cpp target code, or @member code for all other targets? @member uses an action block, so it only has to count and match opening and closing braces outside of string literals within the action blocks. |
We could certainly pick a different pattern, but tbh if developers are clever enough to introduce such complexity then I expect they'd also be able to fix them. |
I believe we need this multi-target capability in more than one place, but the curly-brace enclosure is present in all of them. How about @member(CPP) { … ***@***.***(CPP) { … } and so forth?
|
That's cool. Works for me! |
Yes, I still think the idea with preprocessor directives is not good. I have had experience with Objective-C language processing and encountered a lot of problems with preprocessor directives. Moreover, directives are not part of the target language. Probably you meant this topic: Support predicates. I suggest using something like this (it was suggested by @udif in Unified Actions Language):
    stat
        : { java5() }? 'goto' ID ';'
        ...
    @parser::members::java {
        boolean java5() {
            ...
        }
    }
    @parser::members::cpp {
        bool java5() {
            ...
        }
    }
We can use universal function calls and target-specific function declarations. BTW, currently I'm working on the Multiplatform feature in Kotlin and it looks like the idea above (in Kotlin there are |
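For the Go target, the "universal call, target-specific declaration" idea depends on where the declaration lives, because Go has no implicit receiver: an unqualified java5() inside a generated predicate can only resolve to a package-level function. A minimal sketch, assuming a hypothetical JavaParser type and Stat_Sempred function shaped like generated code (not the ANTLR runtime), showing both options:

```go
package main

import "fmt"

// JavaParser is a hypothetical stand-in for a generated Go parser type.
type JavaParser struct {
	sourceLevel int
}

// Package-level declaration: the grammar's unqualified java5() compiles
// as-is, but the function cannot see any parser state.
func java5() bool { return true }

// Method declaration: it can read parser state, but the call must be
// qualified with the receiver name, p.java5().
func (p *JavaParser) java5() bool { return p.sourceLevel >= 5 }

// Stat_Sempred mimics the shape of a generated predicate function.
func (p *JavaParser) Stat_Sempred(predIndex int) bool {
	switch predIndex {
	case 0:
		return java5() // resolves to the package-level function
	case 1:
		return p.java5() // resolves to the method
	default:
		panic("No predicate with index: " + fmt.Sprint(predIndex))
	}
}

func main() {
	p := &JavaParser{sourceLevel: 4}
	fmt.Println(p.Stat_Sempred(0), p.Stat_Sempred(1)) // true false
}
```

A package-level function and a method with the same name can coexist in Go, since methods live in the type's method set rather than in package scope.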
Exactly! Thank you @KvanTTT. Antlr already has The only other alternative I can think of is to add syntax to actions and semantic predicates in Antlr grammars to declare the target. It completely pitches any and all notions of attributes, parser and lexer instances, the
|
Mmm… the bad news is that whilst this may work for Java, which has an implicit this, it won’t for other targets, notably JS, TS and Python…
On 10 June 2023 at 17:06, Ken Domino wrote:
We can use universal function calls and target-specific function declarations.
Exactly! Thank you @KvanTTT. Antlr already has $parser, which somehow someone snuck into the Antlr tool. And, we also have syntax for accessing attributes in actions. But, now we have an impasse where we can't agree on something like $parser for lexers, and the translation of the operator that follows the reference into the target-specific syntax in order to make it a little more bearable over in grammars-v4. We don't need to devise a general-purpose language for actions, just a common way to access code in the target. That's a plane of features that I can use.
|
(Which is why I had to introduce $parser in the first place.)
|
Not sure what you are looking at, but "this." is perfectly valid for JavaScript. Here's the constructor code that's generated for the lexer.
I can prove it for TypeScript and Python, too, if you need to see a functioning, complete example. But, it doesn't matter if syntax is added to |
Yes, you are right for Python3. There is no "this." But, the code passes a "self" in each of the wrapper functions. So, my "transformGrammar.py" hack rewrites the "this." to "self.". The generated code also works fine. |
To be precise, in Python you can use any word as the receiver, not only self:
    class C:
        def f(foo):
            print(foo.x)
    c = C()
    c.x = 42
    c.f()  # prints 42 |
It will work for other targets if we consider the first parameter as the receiver (actually, at the IR level all class members are represented as members with first |
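Go behaves like the Python example: the receiver name is chosen per method, and a method expression such as (*C).f exposes the receiver as an ordinary first parameter, which matches the "first parameter as receiver" view. A minimal sketch; C and f simply mirror the Python example.

```go
package main

import "fmt"

type C struct{ x int }

// The receiver name is arbitrary: foo plays the role that self or this
// plays in other languages.
func (foo *C) f() int { return foo.x }

func main() {
	c := &C{x: 42}
	fmt.Println(c.f()) // ordinary method call: 42

	// A method expression turns the method into a plain function whose
	// first parameter is the receiver.
	var g func(*C) int = (*C).f
	fmt.Println(g(c)) // 42
}
```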
Let's move this to a discussion? |
Yes, but I suggest continuing existing ones: Unified Actions Language or Support predicates. |
I'm moving this to the discussions as suggested. But I recommend looking at the hacks that I have to employ in order to get a grammar to work across 8 different targets here. |
(I can't convert the issue to a discussion.) |
This is in regards to writing a "target agnostic grammar", where we try to reuse a grammar even though actions ("semantic predicates") in a grammar are always target specific.
The javascript grammar contains action code for tokenization. For example, here. Notice that the action is platform-specific code. In Java, the "this." syntax is a qualified method call. It also so happens that the syntax works in CSharp, Dart, JavaScript (modern), and TypeScript. For the other targets, the grammar .g4's have to be rewritten with sed -i to a syntax that works for that platform, so "target agnostic" alone isn't enough. For PHP, it needs to be changed to "self::". For Python3, "self.". For Cpp, "this->". For Go, it's harder.
The reason it is harder is that actions in a grammar turn into functions whose receiver is named either "l" or "p", depending on whether the action is a predicate or not. For a non-predicate action, e.g., this, the generated code is the following.
For a predicate, this action in the lexer grammar is transformed into a function whose receiver is named "p".
This is rather annoying because I have to keep track of which action uses "p" vs "l".
Could we change the Go templates to always use "this" as the receiver name for all these generated functions that contain action code? It's not critical, because I can write a script to rewrite "this." to either "p." or "l." depending on whether the action is a predicate or not.
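What the proposed change would mean for generated Go code can be sketched as follows, again with a hypothetical JavaScriptLexer type and without the ANTLR runtime. It compiles because "this" is not a Go keyword. The trade-off is the one raised in the comments: Go's Code Review Comments advise against generic receiver names such as "this" or "self", and linters such as golint report them.

```go
package main

import "fmt"

// JavaScriptLexer is a hypothetical stand-in for the generated lexer type.
type JavaScriptLexer struct{ pos int }

// "this" is not a reserved word in Go, so it is a legal receiver name.
func (this *JavaScriptLexer) IsStartOfFile() bool { return this.pos == 0 }

// With "this" as the receiver in every generated function, grammar code
// written as this.IsStartOfFile() would need no rewriting for Go.
func (this *JavaScriptLexer) HashBangLine_Sempred(predIndex int) bool {
	switch predIndex {
	case 0:
		return this.IsStartOfFile()
	default:
		panic("No predicate with index: " + fmt.Sprint(predIndex))
	}
}

func main() {
	lx := &JavaScriptLexer{}
	fmt.Println(lx.HashBangLine_Sempred(0)) // true
}
```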