Using nanopass with non-Lispy source language. #16
Hey @abhi18av,

Yes, you can definitely use the nanopass framework for languages that are not S-expression based, but you do need to write your own lexer and parser (or scanner-less parser) to parse the source files into an AST. One thing you may notice looking at S-expressions is that they look a great deal like ASTs.

There are a few toolkits out there for writing parsers in Scheme, though I'm not really sure what to recommend, since I've not really used most of them. We do have an LALR parser toolkit that we've used in the past, with a hand-written lexer, but I need to double-check that we can share it. I've also written my own hand-rolled recursive descent parser, partially to deal with some grammar productions that required more look-ahead than our LALR framework allows for.

My approach is something like the following:

1. First, get the grammar you want to parse into some form that you can fairly easily read, such as EBNF.
2. Second, make as exact a translation from the original EBNF into a nanopass grammar as you can. This keeps things simple, since the parser can then produce nanopass forms directly as it discovers the structure of the program from the source file.
3. Third, write the lexer (a.k.a. scanner or tokenizer) to tokenize the input. As I said, I've always done this by hand, writing a state machine that matches the input and can be called to provide the next token.
4. Finally, write the parser, either with a parser toolkit or by hand as a recursive descent parser. There are some tricks to parsing expressions correctly: you need to either arrange the grammar in the parser to obey the relative precedence of operators, or follow an expression-parsing technique (e.g. precedence climbing) to get these right.
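To make steps 3 and 4 concrete, here is a minimal, self-contained sketch in plain Scheme for a tiny arithmetic language. The token and AST representations are hypothetical stand-ins, not nanopass forms; in a real frontend the parser would build nanopass language forms instead of raw lists. The grammar is arranged so that `*` binds tighter than `+`:

```scheme
;; Lexer: a hand-written state machine over a string, producing a token
;; list.  Tokens are numbers or the characters + * ( ).
(define (tokenize str)
  (let loop ((cs (string->list str)) (toks '()))
    (cond
      ((null? cs) (reverse toks))
      ((char-whitespace? (car cs)) (loop (cdr cs) toks))
      ((char-numeric? (car cs))
       ;; accumulate a run of digits into a number token
       (let scan ((cs cs) (ds '()))
         (if (and (pair? cs) (char-numeric? (car cs)))
             (scan (cdr cs) (cons (car cs) ds))
             (loop cs (cons (string->number (list->string (reverse ds)))
                            toks)))))
      ((memv (car cs) '(#\+ #\* #\( #\)))
       (loop (cdr cs) (cons (car cs) toks)))
      (else (error 'tokenize "unexpected character" (car cs))))))

;; Parser: recursive descent, one procedure per precedence level.
;; Each procedure returns two values: the AST built so far and the
;; remaining tokens.
(define (parse-expr toks)                 ; expr := term {'+' term}
  (let-values (((lhs rest) (parse-term toks)))
    (let loop ((lhs lhs) (rest rest))
      (if (and (pair? rest) (eqv? (car rest) #\+))
          (let-values (((rhs rest^) (parse-term (cdr rest))))
            (loop (list '+ lhs rhs) rest^))
          (values lhs rest)))))

(define (parse-term toks)                 ; term := atom {'*' atom}
  (let-values (((lhs rest) (parse-atom toks)))
    (let loop ((lhs lhs) (rest rest))
      (if (and (pair? rest) (eqv? (car rest) #\*))
          (let-values (((rhs rest^) (parse-atom (cdr rest))))
            (loop (list '* lhs rhs) rest^))
          (values lhs rest)))))

(define (parse-atom toks)                 ; atom := number | '(' expr ')'
  (cond
    ((null? toks) (error 'parse "unexpected end of input"))
    ((number? (car toks)) (values (car toks) (cdr toks)))
    ((eqv? (car toks) #\()
     (let-values (((e rest) (parse-expr (cdr toks))))
       (if (and (pair? rest) (eqv? (car rest) #\)))
           (values e (cdr rest))
           (error 'parse "expected )"))))
    (else (error 'parse "unexpected token" (car toks)))))

(define (parse str)
  (let-values (((ast rest) (parse-expr (tokenize str))))
    (if (null? rest) ast (error 'parse "trailing tokens" rest))))
```

With this arrangement, `(parse "1 + 2 * 3")` yields `(+ 1 (* 2 3))`: precedence falls out of the expr/term/atom split rather than any explicit priority table.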
One of the nice things about hand-writing the parser is that you can capture whatever source information you'd like to gather in order to provide better error messages, both during the parsing phase and later in the compiler, if the source information is preserved. I'll try to write up an example of this and include it in the documentation and testing examples over the next couple of weeks.
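As a small illustration of capturing source information, here is a sketch (the record layout and names are illustrative, not part of the nanopass framework) of a token record that remembers the line and column where each token starts. This simplified lexer only recognizes alphabetic identifiers and skips everything else, but the location-tracking pattern carries over to a full tokenizer:

```scheme
;; A token that carries its source location, so later compiler passes
;; can report errors in terms of the original file.
(define-record-type token
  (fields kind value line column))

(define (tokenize-with-locations str)
  ;; Walk the string, tracking line/column as we go.
  (let loop ((cs (string->list str)) (line 1) (col 1) (toks '()))
    (cond
      ((null? cs) (reverse toks))
      ((char=? (car cs) #\newline)
       (loop (cdr cs) (+ line 1) 1 toks))
      ((char-alphabetic? (car cs))
       ;; accumulate an identifier, remembering where it started
       (let scan ((cs cs) (n 0) (acc '()))
         (if (and (pair? cs) (char-alphabetic? (car cs)))
             (scan (cdr cs) (+ n 1) (cons (car cs) acc))
             (loop cs line (+ col n)
                   (cons (make-token 'id (list->string (reverse acc))
                                     line col)
                         toks)))))
      (else (loop (cdr cs) line (+ col 1) toks)))))
```

Tokenizing `"foo\n bar"` produces `foo` at line 1, column 1 and `bar` at line 2, column 2; attaching that record (or just the line/column pair) to each nanopass form as it is built is what makes good diagnostics possible later.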
Thank you @akeep for the wonderfully detailed response! The kind of source language I have in mind is somewhat like Ruby, and I'm trying out implementing the compiler frontend based on the nanopass framework. Since you've mentioned the possibility of a well-tested lexer and parser framework in Scheme, I'm actually quite excited! The thing is, I have an inkling that the parser might involve a slightly more complex mechanism, since I have to implement the type-inference algorithm within the compiler itself. Please give me a couple more days to think over what you've outlined and how I could integrate the information; I'm rather busy for a while now. Absolutely looking forward to the write-up! 👍
Hi @akeep
I've been intensely engaged in understanding and exploring the nanopass framework, and I am curious whether it can be used for non-Lispy languages as well. I do sense that it is, of course, possible.
I've come up with a very basic flow, which I wish to benchmark against a single source file for now.
So, as you can see, it's the very initial part that's hazy.
Could you give a general sketch of
I know it's not an issue with the framework but something like a basic tutorial for non-Lispy source lang. would be really helpful, apart from the current docs and references.