
add css lexer to examples #7

Open
tunnckoCore opened this issue Apr 29, 2016 · 9 comments


tunnckoCore commented Apr 29, 2016

Port the awesome PostCSS tokenizer, using plugins. Btw, this tokenizer may actually help even the CSON folks: they could build the CSON syntax on top of the tokenizer we will create using limon and plugins (I tested a few complex structures).

/cc @ai @MoOx @jkrems @balupton @RobLoach @ben-eb @kof


ai commented Apr 29, 2016

Hm. Is it some kind of universal tokenizer? I like the idea of a universal solution.

But it will be hard to migrate PostCSS to it. From my experience, it is better to create a special tokenizer for each specific language.

For example, the comment token will be different in different languages. In CSS we even have different contexts: a comment can appear inside a function like color(), but not inside the special "function" url().

And comments are the most important part of a tokenizer. If we put comment tokenizing into the parser, it will remove all the benefits of the tokenizer/parser separation.
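The url() vs. color() distinction above can be sketched with a toy value tokenizer (this is not PostCSS's actual tokenizer, just an illustration of why comment handling is context-sensitive in CSS):

```javascript
// Toy sketch: url() takes an unquoted raw body, so "/*" inside it is just
// URL text, while anywhere else "/* ... */" is a real comment token.
function tokenizeValue(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    if (input.startsWith('url(', i)) {
      // consume the url(...) body verbatim, without looking for comments
      const end = input.indexOf(')', i);
      tokens.push(['url', input.slice(i, end + 1)]);
      i = end + 1;
    } else if (input.startsWith('/*', i)) {
      // everywhere else, /* ... */ becomes a comment token
      const end = input.indexOf('*/', i);
      tokens.push(['comment', input.slice(i, end + 2)]);
      i = end + 2;
    } else {
      tokens.push(['char', input[i]]);
      i += 1;
    }
  }
  return tokens;
}

tokenizeValue('url(a/*b)');    // "/*" stays inside the single url token
tokenizeValue('calc(/*x*/1)'); // "/*x*/" becomes a comment token
```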


ai commented Apr 29, 2016

Ouh, seems like I missed that this tokenizer supports plugins. That makes it twice as interesting! :)

What about performance? The tokenizer is the slowest part of any parser.


tunnckoCore commented Apr 29, 2016

Oh, I forgot to ping the tokenizer/lexer/parser guru @wooorm, creator of a few awesome things such as parse-latin, parse-english, retext and remark, and a few AST specs. I think he may be interested.

@ai it won't be hard. The postcss parser is on a per-character basis anyway. Performance should be at least the same (really, it depends on what the plugins do; they don't even have to use regexes, since internally it is all just one loop over the string), but with the ability to decompose things further with plugins.

"Tokenizer is a slowest part in any parser"; "it is better to create special tokenizer for specific languages"

Indeed. But this lexer is totally agnostic: it just loops over the string and passes you each character, its position, and the whole input string; all of them are available in each plugin.
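A minimal sketch of that agnostic loop (limon's real API may differ; the plugin signature here is a guess at the shape described above):

```javascript
// Each plugin sees the current character, its position, and the whole
// input, and returns how many characters it consumed (0 to pass).
function lex(input, plugins) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let consumed = false;
    for (const plugin of plugins) {
      const n = plugin(input[pos], pos, input, tokens);
      if (n > 0) {
        pos += n;
        consumed = true;
        break;
      }
    }
    if (!consumed) {
      // no plugin claimed this character: emit it as a plain char token
      tokens.push({ type: 'char', value: input[pos], pos });
      pos += 1;
    }
  }
  return tokens;
}

// example plugin: collapse runs of digits into one "number" token
const numbers = (ch, pos, input, tokens) => {
  if (!/[0-9]/.test(ch)) return 0;
  let end = pos;
  while (end < input.length && /[0-9]/.test(input[end])) end += 1;
  tokens.push({ type: 'number', value: input.slice(pos, end), pos });
  return end - pos;
};

lex('a12b', [numbers]);
// → char 'a', number '12', char 'b'
```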

You can build any type of parser on top of it. That's the idea of this separation; there are three processes: the lexer (with a .tokenize method) for generating tokens, the parser (with a .parse method) for building the AST, and the stringifier (with a .stringify method) for consuming the AST and composing the new string.
It's simple and awesome.
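The three-stage separation can be sketched like this (placeholder implementations; the actual limon objects will look different, but the .tokenize / .parse / .stringify shape is what the text describes):

```javascript
// lexer: string -> tokens
const lexer = {
  tokenize: (input) =>
    input.split(/\s+/).filter(Boolean)
      .map((value, index) => ({ type: 'word', value, index })),
};

// parser: tokens -> AST
const parser = {
  parse: (tokens) => ({ type: 'root', children: tokens }),
};

// stringifier: AST -> string
const stringifier = {
  stringify: (ast) => ast.children.map((node) => node.value).join(' '),
};

const ast = parser.parse(lexer.tokenize('a b c'));
stringifier.stringify(ast); // → 'a b c'
```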

One way or another, your tokenizer must be extracted, and then it can be used for CSS, JSON and CSON, just as retext can be used for what you want on top of some parser. And if we think further, any parser (I'm about to push it to GitHub in a bit) can then be extended (again with plugins), the way @wooorm does it. He has one parser for English/Latin, which generates an AST (a CST, actually); parsers on top of it then extend that AST and add more node types: UNIST -> NLCST, etc.

With a separate lexer we can produce whatever kind of parser/AST we want.


tunnckoCore commented Apr 29, 2016

Anyway, I need a lexer and a parser. I'll try to do two things: a JSON lexer, parser and stringifier (I've also started work on https://github.com/postjson/postjson), and a semver lexer, parser and stringifier (I really don't like the semver package; its API is awful and I need more flexibility, so if I can reach at least the same speed and pass the tests, I'll make a PR there). It may also be worth porting postcss to use limon, specifically only for CSS; then we can merge it into postcss :)

edit: It's going to be a great journey! :)

@tunnckoCore

See the other examples - simple, CSV and semver. :)

@tunnckoCore

@ai I'm thinking of first extracting only the tokenizer into a separate repo, to play with it there and write some benchmarks, then adding the ported tokenizer to the benchmark. That way we can see what the diffs are. :)
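A harness for those benchmarks could look something like this (a hypothetical sketch; `splitTokenizer` is just a stand-in for the extracted and ported tokenizers being compared):

```javascript
// Time a tokenizer over many runs and report the average per run.
function bench(name, tokenize, input, runs = 1000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; i += 1) tokenize(input);
  const msPerRun = Number(process.hrtime.bigint() - start) / 1e6 / runs;
  console.log(`${name}: ${msPerRun.toFixed(4)} ms/run`);
  return msPerRun;
}

// placeholder tokenizer standing in for a real candidate
const splitTokenizer = (s) => s.split('');
bench('split', splitTokenizer, 'a{color:red}');
```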


ai commented May 2, 2016

I really like the idea of a universal tokenizer, but as its inventor you should convince me that it is possible to make a fast one ;).

If you write some proof-of-concept tokenizer with the same performance, I will help you finish it and we will add it to PostCSS.

@tunnckoCore

"If you write some proof-of-concept tokenizer with the same performance"

Yeah, that's what I'm talking about; that's what I'm going to do.


jkrems commented May 2, 2016

For CSON, reusing the CoffeeScript lexer & parser was a pretty important design decision, since we want to match CoffeeScript's syntax, which doesn't exist outside of its lexer/parser (it doesn't have an official, spec'd grammar). If we migrated away, I think we'd use a full parser generator like PegJS.
