
add css lexer to examples #7

Open
tunnckoCore opened this issue Apr 29, 2016 · 9 comments


tunnckoCore commented Apr 29, 2016

Port the awesome PostCSS tokenizer, using plugins. Btw, this tokenizer may actually help even the CSON folks: they could build the CSON syntax on top of the tokenizer we will create using limon and plugins (I tested a few complex structures).

/cc @ai @MoOx @jkrems @balupton @RobLoach @ben-eb @kof


ai commented Apr 29, 2016

Hm. Is it some kind of universal tokenizer? I like the idea of a universal solution.

But it will be hard to migrate PostCSS to it. From my experience, it is better to create a special tokenizer for each specific language.

For example, the comment token will be different in different languages. In CSS we even have different contexts: a comment can appear inside a function like color(), but not inside the special "function" url().

And comments are the most important part of a tokenizer. If we put comment tokenizing into the parser, it will remove all the benefits of the tokenizer/parser separation.
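The url() vs. color() distinction above can be sketched with a toy value tokenizer (this is not PostCSS's actual tokenizer, just an illustration of why comment handling is context-sensitive in CSS):

```javascript
// Toy sketch: url() takes an unquoted raw body, so "/*" inside it is just
// URL text, while anywhere else "/* ... */" is a real comment token.
function tokenizeValue(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    if (input.startsWith('url(', i)) {
      // consume the url(...) body verbatim, without looking for comments
      const end = input.indexOf(')', i);
      tokens.push(['url', input.slice(i, end + 1)]);
      i = end + 1;
    } else if (input.startsWith('/*', i)) {
      // everywhere else, /* ... */ becomes a comment token
      const end = input.indexOf('*/', i);
      tokens.push(['comment', input.slice(i, end + 2)]);
      i = end + 2;
    } else {
      tokens.push(['char', input[i]]);
      i += 1;
    }
  }
  return tokens;
}

tokenizeValue('url(a/*b)');    // "/*" stays inside the single url token
tokenizeValue('calc(/*x*/1)'); // "/*x*/" becomes a comment token
```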


ai commented Apr 29, 2016

Ouh, seems like I missed that this tokenizer supports plugins. That makes it twice as interesting! :)

What about performance? The tokenizer is the slowest part of any parser.


tunnckoCore commented Apr 29, 2016

Oh, I forgot to ping the tokenizer/lexer/parser guru @wooorm, creator of a few awesome things such as parse-latin, parse-english, retext and remark, and a few AST specs. I think he may be interested.

@ai it won't be hard. The postcss parser is on a per-character basis anyway. Performance should be at least the same (really, it depends on what the plugins do; they don't even have to use regexes, since internally it is all just one loop over the string), but with the ability to decompose things further with plugins.

"Tokenizer is a slowest part in any parser"; "it is better to create special tokenizer for specific languages"

Indeed. But this lexer is totally agnostic: it just loops over the string and passes you each character, its position, and the whole input string; all of them are available in each plugin.
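A minimal sketch of that agnostic loop (limon's real API may differ; the plugin signature here is a guess at the shape described above):

```javascript
// Each plugin sees the current character, its position, and the whole
// input, and returns how many characters it consumed (0 to pass).
function lex(input, plugins) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let consumed = false;
    for (const plugin of plugins) {
      const n = plugin(input[pos], pos, input, tokens);
      if (n > 0) {
        pos += n;
        consumed = true;
        break;
      }
    }
    if (!consumed) {
      // no plugin claimed this character: emit it as a plain char token
      tokens.push({ type: 'char', value: input[pos], pos });
      pos += 1;
    }
  }
  return tokens;
}

// example plugin: collapse runs of digits into one "number" token
const numbers = (ch, pos, input, tokens) => {
  if (!/[0-9]/.test(ch)) return 0;
  let end = pos;
  while (end < input.length && /[0-9]/.test(input[end])) end += 1;
  tokens.push({ type: 'number', value: input.slice(pos, end), pos });
  return end - pos;
};

lex('a12b', [numbers]);
// → char 'a', number '12', char 'b'
```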

You can build any type of parser on top of it. That's the idea of this separation; there are three processes: the lexer (with a .tokenize method) for generating tokens, the parser (with a .parse method) for building the AST, and the stringifier (with a .stringify method) for consuming the AST and composing the new string.
It's simple and awesome.
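The three-stage separation can be sketched like this (placeholder implementations; the actual limon objects will look different, but the .tokenize / .parse / .stringify shape is what the text describes):

```javascript
// lexer: string -> tokens
const lexer = {
  tokenize: (input) =>
    input.split(/\s+/).filter(Boolean)
      .map((value, index) => ({ type: 'word', value, index })),
};

// parser: tokens -> AST
const parser = {
  parse: (tokens) => ({ type: 'root', children: tokens }),
};

// stringifier: AST -> string
const stringifier = {
  stringify: (ast) => ast.children.map((node) => node.value).join(' '),
};

const ast = parser.parse(lexer.tokenize('a b c'));
stringifier.stringify(ast); // → 'a b c'
```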

One way or another, your tokenizer must be extracted, and then it can be used for CSS, JSON and CSON, just as retext can be used for what you want on top of some parser. And if we think further, any parser (I'm about to push it to GitHub in a bit) can then be extended (again with plugins), the way @wooorm does it. He has one parser for English/Latin, which generates an AST (a CST, actually); parsers on top of it then extend that AST and add more node types: UNIST -> NLCST, etc.

With a separate lexer we can produce whatever kind of parser/AST we want.


tunnckoCore commented Apr 29, 2016

Anyway, I need a lexer and a parser. I'll try to do two things: a JSON lexer, parser and stringifier (I've also started work on https://github.com/postjson/postjson), and a semver lexer, parser and stringifier (I really don't like the semver package; its API is awful and I need more flexibility, so if I can reach at least the same speed and pass the tests, I'll make a PR there). It may also be worth porting postcss to use limon, specifically only for CSS; then we can merge it into postcss :)

edit: It's going to be a great journey! :)

@tunnckoCore

See the other examples - simple, CSV and semver. :)

@tunnckoCore

@ai I'm thinking of first extracting only the tokenizer into a separate repo, to play with it there and write some benchmarks, then adding the ported tokenizer to the benchmark. That way we can see what the diffs are. :)
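A harness for those benchmarks could look something like this (a hypothetical sketch; `splitTokenizer` is just a stand-in for the extracted and ported tokenizers being compared):

```javascript
// Time a tokenizer over many runs and report the average per run.
function bench(name, tokenize, input, runs = 1000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; i += 1) tokenize(input);
  const msPerRun = Number(process.hrtime.bigint() - start) / 1e6 / runs;
  console.log(`${name}: ${msPerRun.toFixed(4)} ms/run`);
  return msPerRun;
}

// placeholder tokenizer standing in for a real candidate
const splitTokenizer = (s) => s.split('');
bench('split', splitTokenizer, 'a{color:red}');
```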


ai commented May 2, 2016

I really like the idea of a universal tokenizer, but as its inventor you should convince me that it is possible to make a fast one ;).

If you write some proof-of-concept tokenizer with the same performance, I will help you finish it and we will add it to PostCSS.

@tunnckoCore

"If you write some proof-of-concept tokenizer with the same performance"

Yeah, that's what I'm talking about; that's what I'm going to do.


jkrems commented May 2, 2016

For CSON, reusing the CoffeeScript lexer & parser was a pretty important design decision, since we want to match CoffeeScript's syntax, which doesn't exist outside of its lexer/parser (it doesn't have an official, spec'd grammar). If we migrated away, I think we'd use a full parser generator like PegJS.
