[Info] Working on parsers in PHP #142
Comments
Directly using the parse tree (instead of manually constructing an AST) was something I considered before writing this project. It's a lot easier to implement (basically just grab the Zend grammar file and adjust the kmyacc code to create nodes when reducing), and it's great for code refactoring because you can retain the entire formatting. However, it has a problem (apart from the usual differences in use case between a concrete parse tree and an AST): PHP's grammar sucks. In particular, the way complex variables are implemented (properties, methods, varvars, ...) is really weird and results in parse trees that represent syntax but aren't semantically meaningful. For example, you can't say whether a variable is an object access or an array access without inspecting leaf nodes (and of course there are half a dozen ways an object access could be represented, depending on the exact details). That is why I originally abandoned the parse tree approach. I think that with the Uniform Variable Syntax (UVS) and the AST implemented in PHP 7, the grammar got a lot more reasonable, and this might be a feasible approach in the future. It's probably also okay now if you're more interested in things at the namespace/class/method level than at the variable level.
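To illustrate the point about needing to inspect leaf nodes, here is a minimal sketch in Python. Everything in it is hypothetical: the rule names (`variable`, `base_variable`, `suffix`) are not taken from the Zend grammar, and the AST class names are only illustrative. It shows a generic parse tree in which object access and array access look the same at the rule level, and a conversion step that has to read the leaf tokens to produce semantically distinct AST nodes.

```python
from dataclasses import dataclass

# Hypothetical concrete parse tree for `$obj->items[0]`.
# Nodes are (rule_name, children); leaves are raw token strings.
# Both accesses are plain "suffix" nodes: the rule name alone does not
# tell an object access from an array access.
parse_tree = ("variable", [
    ("base_variable", ["$obj"]),
    ("suffix", ["->", "items"]),
    ("suffix", ["[", "0", "]"]),
])


@dataclass
class Variable:
    name: str


@dataclass
class PropertyFetch:
    target: object
    name: str


@dataclass
class ArrayDimFetch:
    target: object
    index: str


def to_ast(tree):
    """Build a semantic AST by inspecting the leaf tokens of each suffix."""
    rule, children = tree
    if rule != "variable":
        raise ValueError(f"unexpected rule: {rule}")
    node = Variable(children[0][1][0])
    for _, leaves in children[1:]:
        if leaves[0] == "->":
            node = PropertyFetch(node, leaves[1])
        elif leaves[0] == "[":
            node = ArrayDimFetch(node, leaves[1])
        else:
            raise ValueError(f"unknown suffix: {leaves}")
    return node
```

Calling `to_ast(parse_tree)` yields an `ArrayDimFetch` wrapping a `PropertyFetch`, so a consumer can dispatch on the node type instead of re-reading tokens.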
Closing as this isn't a real issue. Btw, there's also https://github.com/grom358/pharborist, which tries to better support preserving whitespace and formatting in general. I haven't looked closely at it yet.
Hello,
This "issue" is just an informational notice to let you know about a side project I'm working on, related to parsing in PHP:
https://github.com/nicolas-grekas/Parser is the most recent and is in an early WIP state. I took @nikic's idea of using kmyacc for generating parsers in PHP. But unlike PHP-Parser, I'm trying not to be bound to parsing only PHP. Instead, I take the symbols in the grammar files as node names in my AST, whatever the parsed syntax. Here is a self-generating PHP file with its AST nodes next to each source code statement. Here is the same AST but dumped in XML.
I also managed to refactor the abstract kmyacc parser algorithm to make it bijective. This means that non-semantic tokens (e.g. whitespace and comments) are preserved, so the original source can be recovered from the parser's output. This could be ported to PHP-Parser now (see "Optionally add nodes for whitespace", #41).
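A rough way to picture the property described above, at the token level only: if whitespace and comments are kept as tokens rather than discarded, concatenating the token texts reproduces the source exactly. The sketch below is in Python with a toy lexer. It is not the refactored kmyacc algorithm, and the token classes and function names are assumptions made for illustration.

```python
import re

# Toy token classes (hypothetical); a real PHP lexer has many more.
TOKEN_RE = re.compile(
    r"""
    (?P<WHITESPACE>\s+)
  | (?P<COMMENT>//[^\n]*|/\*.*?\*/)
  | (?P<VARIABLE>\$[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OTHER>.)
    """,
    re.VERBOSE | re.DOTALL,
)

NON_SEMANTIC = {"WHITESPACE", "COMMENT"}


def tokenize(source):
    """Split source into (kind, text) pairs without dropping any character."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]


def semantic_tokens(tokens):
    """The view a grammar would consume: whitespace and comments skipped."""
    return [tok for tok in tokens if tok[0] not in NON_SEMANTIC]


def unparse(tokens):
    """Rebuild the source text from the full token list."""
    return "".join(text for _, text in tokens)
```

The grammar only needs `semantic_tokens(...)`, while a code-transformation tool keeps the full list so that `unparse(tokenize(src)) == src` holds and untouched formatting survives a rewrite.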
https://github.com/nicolas-grekas/Patchwork-PHP-Parser is another, stable PHP parser that had its first commit in 2005. The approach it uses is based on a collection of dedicated, contextual token analysers. So there is no grammar behind it, only "expert" knowledge. I consider the kmyacc way the better one. My goal now is to port all the good things in this old but stable parser to the new one above. What good things? Well, this parser is focused on applying code transformations to existing PHP code. I have written a lot of local parsers for e.g. gathering contextual information (resolving namespaces), backward/forward porting of new syntax, instrumentation, inlining, etc. See the full list here and see them in action here.
I'm not sure what should follow from this notice, but at least it might give some ideas?