[Info] Working on parsers in PHP #142

nicolas-grekas · 2014-11-13T09:51:30Z

Hello,

This "issue" is just an informational notice to let you know about a side project I'm working on, related to parsing in PHP:

https://github.com/nicolas-grekas/Parser is the most recent one and is in an early WIP state. I took @nikic's idea of using kmyacc for generating parsers in PHP. But unlike PHP-Parser, I'm trying not to be bound to parsing only PHP. Instead, I use the symbols in the grammar files as node names in my AST, whatever the parsed syntax. Here is a self-generating PHP file with its AST nodes next to each source code statement. Here is the same AST but dumped in XML.

I also managed to refactor the abstract kmyacc parser algorithm to make it bijective. This means that non-semantic tokens (e.g. whitespace and comments) are preserved, so the original source can be rebuilt exactly from the parse result. This could be ported to PHP-Parser now (see Optionally add nodes for whitespace #41).
https://github.com/nicolas-grekas/Patchwork-PHP-Parser is another PHP parser, a stable one that has its first commit in 2005. The approach it uses is based on a collection of dedicated and contextual token analysers. So there is no grammar behind it, only "expert" knowledge. I consider the kmyacc way a better one. My goal now is to port all the good things inside this old but stable parser to the new one above. What good things? Well, this parser is focused on applying code transformations to existing PHP code. I have written a lot of local parsers for e.g. gathering contextual information (resolving namespaces), backward/forward porting new syntaxes, instrumentation, inlining, etc. See the full list here and see them in action here.
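
To make the "bijective" property of the first project concrete, here is a minimal sketch in Python. It is not taken from either repository: the token names, the regular expression and the helpers `tokenize`, `semantic` and `unparse` are all assumptions made for illustration, and the toy syntax is only PHP-like. The idea is that the lexer never drops anything, so concatenating the tokens gives back the source byte for byte, while a consumer that only cares about meaning can filter out the non-semantic tokens.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for a tiny PHP-like syntax. The final PUNCT
# alternative matches any single character, so no input is ever skipped.
TOKEN_RE = re.compile(
    r"""
    (?P<WHITESPACE>\s+)
  | (?P<COMMENT>//[^\n]*|/\*.*?\*/)
  | (?P<VARIABLE>\$[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<PUNCT>.)
    """,
    re.VERBOSE | re.DOTALL,
)

NON_SEMANTIC = {"WHITESPACE", "COMMENT"}


def tokenize(source: str) -> List[Token]:
    """Split the source into tokens, keeping whitespace and comments."""
    return [Token(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]


def semantic(tokens: List[Token]) -> List[Token]:
    """Keep only the tokens a grammar would normally see."""
    return [t for t in tokens if t.kind not in NON_SEMANTIC]


def unparse(tokens: List[Token]) -> str:
    """Rebuild the source text from the full token list."""
    return "".join(t.text for t in tokens)


source = "$a = 1; // set a\n$b  =  /* two */ 2;\n"
tokens = tokenize(source)

# Round trip: nothing was lost, so the source comes back unchanged.
assert unparse(tokens) == source
```

A code transformation can then rewrite individual tokens and call `unparse` without disturbing the formatting of anything it did not touch.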

I'm not sure what should come next after this notice. But at least it might give you some ideas?

nikic · 2014-12-05T23:45:42Z

Directly using the parse tree (instead of manually constructing an AST) was something I considered before writing this project - it's a lot easier to implement (basically just grab the Zend grammar file and adjust the kmyacc code to create nodes when reducing) and it's great for code refactoring because you can retain the entire formatting.
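
As a rough illustration of "create nodes when reducing", here is a minimal sketch in Python. It is not kmyacc output and not code from PHP-Parser: the three-rule grammar, the `Node` class and the greedy reduction loop are assumptions chosen to keep the example short (a generated parser would drive reductions from LALR tables instead). What it shows is that every reduction produces a node named after the grammar symbol on the rule's left-hand side, so the tree shape falls out of the grammar with no hand-written AST construction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    symbol: str                       # grammar symbol, used as the node name
    children: List["Node"] = field(default_factory=list)
    text: str = ""                    # source text, set on leaves only


# Hypothetical toy grammar; longer right-hand sides are listed first so the
# greedy loop prefers `expr + term` over a bare `term`.
RULES = [
    ("expr", ["expr", "+", "term"]),
    ("expr", ["term"]),
    ("term", ["NUM"]),
]


def lex(source: str) -> List[Node]:
    """Turn whitespace-separated words into leaf nodes."""
    return [Node("NUM" if w.isdigit() else w, text=w) for w in source.split()]


def reduce_all(stack: List[Node]) -> None:
    """Apply rules to the top of the stack until none matches."""
    reduced = True
    while reduced:
        reduced = False
        for lhs, rhs in RULES:
            if [n.symbol for n in stack[-len(rhs):]] == rhs:
                children = stack[-len(rhs):]
                del stack[-len(rhs):]
                stack.append(Node(lhs, children))  # node created on reduce
                reduced = True
                break


def parse(source: str) -> Node:
    stack: List[Node] = []
    for leaf in lex(source):
        stack.append(leaf)  # shift
        reduce_all(stack)
    if len(stack) != 1 or stack[0].symbol != "expr":
        raise SyntaxError(f"cannot parse {source!r}")
    return stack[0]


def to_sexpr(node: Node) -> str:
    """Dump the tree with grammar symbols as node names."""
    if not node.children:
        return node.text
    return "(" + node.symbol + " " + " ".join(to_sexpr(c) for c in node.children) + ")"


print(to_sexpr(parse("1 + 2")))  # (expr (expr (term 1)) + (term 2))
```

Note how the dump mirrors the grammar rules rather than the meaning of the expression, which is the trade-off discussed in the rest of this comment.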

However it has a problem (apart from the usual differences in use case for a concrete parse tree and an AST): PHP's grammar sucks. Especially the way complex variables are implemented (like properties, methods, varvars, ...) is really weird and results in parse trees that represent syntax, but aren't semantically meaningful. Like, you can't say whether a variable is an object access or an array access without inspecting leaf nodes (and of course there's half a dozen ways that an object access could be represented, depending on the exact details).

Which is why I abandoned the parse tree approach originally. I think with the UVS and AST implemented in PHP 7 the grammar got a lot more reasonable and this might be a feasible approach in the future. And it's probably also okay now if you're more interested in things at the namespace/class/method level than at the variable level.

nikic · 2015-03-10T18:52:05Z

Closing as this isn't a real issue.

Btw there's also https://github.com/grom358/pharborist which tries to better support preserving whitespace and formatting in general. Didn't look closely at it yet.

ockham mentioned this issue Nov 13, 2014

Mark non-private, not-protected abstract member function as public #143

Closed

nikic closed this as completed Mar 10, 2015
