Work in lexer-parser feature. #85
Hi Uziel, I've been wanting to have it for a long time! I would rather keep RichTextFX low-level and versatile, not bound to any specific parser implementation. I would encourage you to start a separate project that depends on both RichTextFX and ANTLR (and the grammars) and provides syntax highlighting out of the box. I am myself thinking of starting such a project using the Papa Carlo incremental parser when I find some time. Nevertheless, I highly support your ANTLR effort, since there are ANTLR grammars available for many languages, while Papa Carlo is still experimental and I have only seen JSON and Java grammars for it. I will provide a link to your project on RichTextFX page if you decide to go this way. Best, |
Papa Carlo sounds amazing! Now I want to learn about it. |
Hi Uziel and Tomas, For my thesis, I need to add syntax highlighting to an editor based on the language I'm developing. Yesterday, I whipped up a prototype based on Tomas' JavaKeywordAsync example that might help this conversation. You can see it here. My basic approach is to use an ANTLR4 lexer to generate tokens for the editor's text. I map each token's type to a CSS class to be applied to my code area instance. ANTLR's

    private static StyleSpans<Collection<String>> computeHighlighting(String text) {
        StyleSpansBuilder<Collection<String>> spansBuilder = new StyleSpansBuilder<>();
        int lastEnd = 0;
        ShiroLexer lex = new ShiroLexer(new ANTLRInputStream(text));
        // Tokenize the text (lexing only, no parsing) and emit one styled span per token.
        for (Token t : lex.getAllTokens()) {
            // Unstyled gap between the previous token and this one.
            spansBuilder.add(Collections.emptyList(), t.getStartIndex() - lastEnd);
            // getStopIndex() is inclusive, hence the + 1.
            spansBuilder.add(Collections.singleton(getStyleClass(t)),
                    t.getStopIndex() + 1 - t.getStartIndex());
            lastEnd = t.getStopIndex() + 1;
        }
        // Unstyled remainder after the last token.
        spansBuilder.add(Collections.emptyList(), text.length() - lastEnd);
        return spansBuilder.create();
    }

With regard to supporting a variety of languages in RichTextFX, it shouldn't be too difficult. I haven't had the time to write up the code yet, but this would be my approach. I would experiment with the parser and lexer interpreters provided by the ANTLR4 runtime. These allow you to load a grammar (combined, parser, or lexer) from a file at runtime. The challenge comes in how you take the parse tree or the token stream and turn it into

Now that I think of it, this probably means you'd be better off building something like Pygments based on ANTLR4. You would collect all the grammars you want to support (ANTLR has a repo of different grammars), generate parsers and lexers for them, and then write a class to convert those token streams or parse trees (whichever you choose) to

Regarding an incremental parser, I've only started to think about how I would make syntax highlighting fast and efficient. I found this post by one of the ANTLR developers useful. |
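The gap-filling logic in computeHighlighting can be looked at in isolation, without ANTLR or RichTextFX on the classpath. The following is only a minimal sketch under stated assumptions: `Tok`, `Span`, and `SpanSketch` are hypothetical stand-ins (not ANTLR's `Token` or RichTextFX's `StyleSpans`), the stop index is inclusive as in ANTLR, and tokens are assumed to arrive sorted and non-overlapping. Unlike the prototype above, it skips zero-length gaps instead of adding empty spans.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lexer token: inclusive start/stop offsets plus a style name.
record Tok(int start, int stop, String style) {}

// Hypothetical stand-in for one style span: a style ("" means unstyled) and a length.
record Span(String style, int length) {}

final class SpanSketch {
    // Converts sorted, non-overlapping tokens into spans that cover the whole text.
    // Text between tokens (e.g. skipped whitespace) becomes an unstyled span.
    static List<Span> computeSpans(int textLength, List<Tok> tokens) {
        List<Span> spans = new ArrayList<>();
        int lastEnd = 0; // offset just past the previous token
        for (Tok t : tokens) {
            if (t.start() > lastEnd) {
                spans.add(new Span("", t.start() - lastEnd)); // gap before this token
            }
            spans.add(new Span(t.style(), t.stop() + 1 - t.start())); // stop is inclusive
            lastEnd = t.stop() + 1;
        }
        if (textLength > lastEnd) {
            spans.add(new Span("", textLength - lastEnd)); // tail after the last token
        }
        return spans;
    }
}
```

For the 8-character text `node Foo` with a keyword token at 0..3 and a name token at 5..7, this yields a keyword span of length 4, an unstyled span of length 1, and a name span of length 3. The span lengths always add up to the text length, which is the property the editor needs.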
Quick update: Here is an ANTLR4 parser-based implementation of a syntax highlighter for my language. @UzielSilva, you should be able to use the technique to adapt it to a language of your choice. I'm starting on a more generic version that will allow you to choose between a number of different languages using the techniques described in my previous post. Watch Xanthic for progress. |
Hi Jeff, nice work!
Does this mean that you need to write a class manually for each of the supported languages? Wouldn't it be possible to require just a mapping (e.g. |
Hi Tomas, Thanks! If you only want to use a lexer, you're right. You would just need a mapping between token types and styles to generate the

If you want the grammar to determine the highlighting, you'll need to walk the parse tree and decide in each rule the style to assign to the token. I think in many cases using just a lexer will be enough; however, if you want to assign the same token different styles depending on its grammatical function, you'll need to write either an ANTLR visitor or a tree listener. For example:

    nodestmt
        : NODE MFNAME ('[' activeSelector ']')? BEGIN NEWLINE
          nodeInternal
          END
        ;
    portDecl
        : portType portName MFNAME
        ;
    MFNAME: UCLETTER (LCLETTER | UCLETTER | DIGIT | '_')* ;

If I want to style the

My thinking with Xanthic is to build a pipeline like:
|
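To illustrate the point about styling the same token differently depending on its grammatical function, here is a small sketch that does not use ANTLR at all. Everything in it is an assumption made for illustration: `Node` is a hypothetical, simplified parse-tree node, `ContextStyler` plays the role a visitor or tree listener would play, and the style names `mfname-in-node` and `mfname-in-port` are invented. Only the rule and token names (`nodestmt`, `portDecl`, `MFNAME`, `NODE`) come from the grammar excerpt above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical simplified parse-tree node: inner nodes carry a rule name,
// leaves carry a token type in `name` and the matched text in `text`.
record Node(String name, String text, List<Node> children) {
    static Node rule(String name, Node... children) {
        return new Node(name, null, List.of(children));
    }
    static Node token(String type, String text) {
        return new Node(type, text, List.of());
    }
    boolean isToken() {
        return text != null;
    }
}

final class ContextStyler {
    // Depth-first walk that appends one "text:style" entry per token to `out`.
    // `enclosingRule` is the name of the rule node directly above the current node.
    static List<String> style(Node node, String enclosingRule, List<String> out) {
        if (node.isToken()) {
            out.add(node.text() + ":" + styleFor(node.name(), enclosingRule));
        } else {
            for (Node child : node.children()) {
                style(child, node.name(), out);
            }
        }
        return out;
    }

    // The same token type (MFNAME) gets a different style depending on its rule.
    static String styleFor(String tokenType, String enclosingRule) {
        if (tokenType.equals("MFNAME")) {
            return enclosingRule.equals("nodestmt") ? "mfname-in-node" : "mfname-in-port";
        }
        return tokenType.toLowerCase(Locale.ROOT); // default: style named after the token
    }
}
```

A lexer-only highlighter cannot make this distinction, because both occurrences reach it as plain `MFNAME` tokens with no information about the rule they belong to.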
Thanks for the clarification. Still, isn't there a way to walk the parse tree in a generic way and choose styles based on the names of the grammar rules? The mapping from tokens to styles could then be specified as
or something like that. On the other hand, even grammar-sensitive highlighting doesn't give you all you might want, anyway. For example, one may want different styles for an IDENTIFIER in Java, depending on whether it is a type name, static field, non-static field, final/non-final, local variable, method name, ... You may recognize some of them (e.g. if it's in a type position, then it's a type name), but not all of them, at the grammar level. You would need type-checking for that, but this is how far you get with a syntax-highlighting editor. |
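One way to picture a generic, name-based mapping like the one suggested above is a plain lookup table with a fallback. The example from the comment did not survive in this thread, so the key format here is purely an assumption: `rule/TOKEN` for a token inside a specific rule, and a bare `TOKEN` as the default. `StyleLookup` is a hypothetical class, not part of RichTextFX or ANTLR.

```java
import java.util.Map;

final class StyleLookup {
    private final Map<String, String> styleMap;

    StyleLookup(Map<String, String> styleMap) {
        this.styleMap = Map.copyOf(styleMap); // defensive, immutable copy
    }

    // Tries the most specific key first ("rule/TOKEN"), then the token type alone,
    // and finally falls back to "" (unstyled).
    String styleFor(String enclosingRule, String tokenType) {
        String specific = styleMap.get(enclosingRule + "/" + tokenType);
        if (specific != null) {
            return specific;
        }
        return styleMap.getOrDefault(tokenType, "");
    }
}
```

With entries such as `nodestmt/MFNAME` mapped to `node-name` and `MFNAME` mapped to `identifier`, an `MFNAME` inside `nodestmt` gets `node-name`, while the same token in any other rule gets `identifier`. As the comment notes, distinctions that need type information (static vs. non-static fields, for example) are out of reach for a table like this.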
ANTLR does support a subset of XPath for identifying parse tree nodes, but I haven't played with it yet. It might be possible to do what you describe using XPath. Can you walk me through some of the API use cases you have in mind? How would you like syntax highlighting to work?
Yes, you're right. The more information about the language you want to use to inform style decisions, the more you'll inch towards writing a sort of interpreter to do the highlighting. With ANTLR, writing interpreters of this style is pretty easy. |
At a high level, I imagine something like this:

    class SyntaxArea extends CodeArea {
        public void setSyntax(Syntax syntax);
    }
    class Syntax {
        public Syntax(Grammar grammar, Map<String, String> styleMap);
    }
I'm not sure what |
Yes, ANTLR can interpret grammars at runtime. This would allow someone to load a grammar from a file at runtime. The only drawback of this approach is not being able to walk the parse tree, but I don't think that will be an issue because you'd be using XPaths to get the parse tree nodes rather than walking the tree. |
Looks good to me! |
Ok, give me a couple days to get a prototype built. One more question. Is this something you want in the RichTextFX repo, or would you rather I built it as a separate library people can add if they want it? In other words, do you want to add ANTLR as a dependency to RichTextFX? |
As I mentioned in the comment to Uziel above, I would rather keep it a separate project, for
|
Sounds good. Just wanted to double check. I'll post here when I have something to share. |
The code is a bit rough at the moment, but I built a proof of concept. You can find it at https://github.com/jrguenther/Xanthic. Clone the repo and run Enjoy! |
Good job, keep it up! |
Hi Tomas,
I will work on a syntax highlighting feature via a lexer-parser (using ANTLRv4 and https://github.com/antlr/grammars-v4).
Would it be a problem if I create a pull request for this once it's done?