Parser for Language Services and Editor Tooling #1426

bd82 · 2018-07-18T19:42:34Z

Hello.

I was wondering if supporting/enabling editor tooling and language services is in the scope of this project.
More specifically I was wondering if having a Parser that has low level capabilities for supporting editor tooling is within the scope of this project.

Such capabilities may include:

Fault Tolerance and reporting multiple syntax errors to the user.
Syntactic Content Assist (providing the next possible kind of Token).
Lossless output format (Know where every comma and parenthesis are in the text).
Re-useable parser that does not only output the compiler AST, meaning it can be customized to output different kinds of outputs for different needs.

I saw there is work done on the graphql-language-service And there there seems to be a separate parser implemented there.

Cheers.
Shahar.

bd82 · 2018-07-18T19:47:49Z

pinging @wincent @asiandrummer due to their work on the graphql-language-service package.

IvanGoncharov · 2018-07-20T10:45:08Z

pinging @wincent @asiandrummer

@bd82 AFAIK they both switched to new projects not related to GraphQL.

My 2¢: It would be great to extend the current parser to support this features if you can do that without compromising graphql-js main purpose: to provide a reference implementation for GraphQL server libraries. That means it should be fast, use a minimum of memory and source code should be simple enough to easily port it to other languages.

bd82 · 2018-07-20T12:32:18Z

Context

The reason I am asking is because I've recently created a GraphQL sample grammar
using the Chevrotain parsing library I've authored.
So I may be able to provide some tentative answers regarding the constraints raised.

That sample grammar is not productive quality (yet), as there are hardly any tests, its just a sample...

Fast

It will be fast, particularly on V8, See performance benchmark. I don't know if it will be as fast as the existing parser yet but more than fast enough.
Some benchmarking would be required to get a better idea.

Memory

I am pretty sure the memory usage would be higher, firstly because all the Tokens are lexed in advance in a single flow and secondly because the recommended workflow relies on the automatic creation of an intermediate data structure (Parse Tree) which contains many arrays.

It is likely possible to mitigate and optimize memory consumption if this would be the only issue remaining.

Easily Portable

I believe this is the biggest potential issue, Lets look at a single parsing rule to get a better understanding:

Definition Rule in the spec.

Definition:
  ExecutableDefinition
  TypeSystemDefinition
  TypeSystemExtension

Definition Rule in Chevrotain:

$.RULE("Definition", () => {
  $.OR([
      { ALT: () => $.SUBRULE($.ExecutableDefinition) },
      { ALT: () => $.SUBRULE($.TypeSystemDefinition) },
      { ALT: () => $.SUBRULE($.TypeSystemExtension) }
    ])
})

Definition Rule in the current Parser:

function parseDefinition(lexer: Lexer<*>): DefinitionNode {
  if (peek(lexer, TokenKind.NAME)) {
    switch (lexer.token.value) {
      case 'query':
      case 'mutation':
      case 'subscription':
      case 'fragment':
        return parseExecutableDefinition(lexer);
      case 'schema':
      case 'scalar':
      case 'type':
      case 'interface':
      case 'union':
      case 'enum':
      case 'input':
      case 'directive':
        return parseTypeSystemDefinition(lexer);
      case 'extend':
        return parseTypeSystemExtension(lexer);
    }
  } else if (peek(lexer, TokenKind.BRACE_L)) {
    return parseExecutableDefinition(lexer);
  } else if (peekDescription(lexer)) {
    return parseTypeSystemDefinition(lexer);
  }

  throw unexpected(lexer);
}

It is clear to see that Chevrotain is much closer to the spec as it is more declarative.
While the hand crafted current parser must explicitly deal with implementation questions (e.g lookahead). But this also means that it is much easier re-implement the current parser in a different programing language as the Chevrotain parser is more like another format for the specs, and specs are not executable.

Chevrotain is even self documenting, see auto generated syntax diagrams 😄

I don't see how to resolve this issue. 😢

IvanGoncharov · 2018-07-20T14:22:28Z

But this also means that it is much easier re-implement the current parser in a different programing language as the Chevrotain parser is more like another format for the specs, and specs are not executable.

Yeah, in this particular case, we can't force people to reimplement Chevrotain before they even start working on implementing the actual parser in their language of choice.
+ Another very important constraint: Apollo, urql and other front-end libraries bundle GraphQL parser so it should be as small as possible.

That said I see a lot of potential in having advanced parser for GraphQL tooling that is completely free from above constraints. I suspect that not only language services but also advanced tools like prettier will benefit from such a parser.

Few questions I have:

Can we make output AST be backward compatible with the graphql-js parser?
Can we reuse graphql-js lexer?
Can we make parser extensible? Meaning that you can rewrite one or many rules? See: Add experimental support for parsing variable definitions in fragments #1141 (comment)

If #3 is possible than such parser can be a critical component in experimenting with GraphQL Query Syntax and SDL. I fill like inflexibility of the current parser is the only reason why nobody wrote "Babel for GraphQL".

Chevrotain is even self documenting, see auto generated syntax diagrams 😄

Would be super cool if you can add short descritions and URLs to a matching paragraphs in http://facebook.github.io/graphql/June2018/#sec-Language

It will be fast, particularly on V8, See performance benchmark. I don't know if it will be as fast as the existing parser yet but more than fast enough. Some benchmarking would be required to get a better idea.

If you have time for that would be great to see. We have some basic perfomance test here: https://github.com/graphql/graphql-js/blob/master/src/language/__tests__/parser-benchmark.js
You can run it with npm run benchmark.

bd82 · 2018-07-20T15:20:53Z

Another very important constraint: Apollo, urql and other front-end libraries bundle GraphQL parser so it should be as small as possible.

Minified Chevrotain it is around 140k
I suppose it may be reduceable by not loading parts we don't need (e.g the lexer engine).

Can we make output AST be backward compatible with the graphql-js parser?

Yes. unless there is something really unique here, we can build whatever data structure we want
Either in embedded actions or in a Parse Tree Visitor. It is really important to not modify the existing AST
to limit the scope of the project.

Can we reuse graphql-js lexer?

Yes, but that won't automatically solve the tokenizing the entire input in advance problem.
Here is an example using Acron's Lexer with a Chevrotain ECMAScript5 Parser.

I will comment more later...

bd82 · 2018-07-20T16:22:44Z

Can we make parser extensible? Meaning that you can rewrite one or many rules?

Yes, Chevrotain grammars can be extended using inheritance semantics.
With a bit of hacking mixins should also be possible.

See Grammar Inheritance example.

There are also some dynamic extension abilities for Lexers,
such as Matching against an abstract "Token Category" instead of a concrete Token.
And because a Lexer is defined using a set of Tokens it is trivial to reuse
a slightly different set of Tokens to define a new Lexer.

Would be super cool if you can add short descritions and URLs to a matching paragraphs in http://facebook.github.io/graphql/June2018/#sec-Language

That is a good idea, I am no good with HTML, but I should be able to modify the rule name headers to include a docs link next to their name. Although that is a lower priority.

If you have time for that would be great to see. We have some basic perfomance test here: https://github.com/graphql/graphql-js/blob/master/src/language/__tests__/parser-benchmark.js
You can run it with npm run benchmark.

I will experiment with this a bit.

brainkim · 2018-07-20T17:46:05Z

Chiming in just to say I’ve had a pretty good experience with chevrotain. I consider it to be a hidden gem of the JavaScript ecosystem, and would love to see it adopted in a larger project like this. It‘s easily one of the best parser libraries in terms of performance, the API is intuitive, its error recovery features can’t be found in (any?) other JavaScript libraries, and the the source code is easy to understand and written in TypeScript. If graphql-js decides to use a parser library I’d love to see it using chevrotain!

IvanGoncharov · 2018-07-20T19:10:00Z

If graphql-js decides to use a parser library I’d love to see it using chevrotain!

@brainkim @bd82 Just to clarify: I personally think current graphql-js parser is good enough solution for a server library so I don't think it worth to replace it with chevrotain or any other generic parser.

That said I think it definitely worth to have to have a separate NPM package with compatible (API and AST format) parser that supports advanced features outlined in the initial comment and also allows the community to experiment with graphql syntax.

bd82 · 2018-07-20T19:20:20Z

Just to clarify: I personally think current graphql-js parser is good enough solution for a server library so I don't think it worth to replace it with chevrotain or any other generic parser.

Yeah, The requirement for easy portability would be mutually exclusive with it.

That said I think it definitely worth to have to have a separate NPM package with compatible (API and AST format) parser that supports advanced features outlined in the initial comment and also allows the community to experiment with graphql syntax.

Should this separate library really produce the same AST?
Should the output structure be more general to allow building different "final" output structures
for different scenarios?

IvanGoncharov · 2018-07-20T20:05:41Z

Should this separate library really produce the same AST?

Not the same, but would be great if it would be compatible so we reuse print, visit, valueFromAST and other useful functions from graphql-js. If for some reason it couldn't be compatible than it should be at least convertible.

Should the output structure be more general to allow building different "final" output structures for different scenarios?

Don't have a lot of experience with parser myself, but I think community usually settles on some AST format as standard e.g. ESTree.

bd82 · 2018-07-20T22:35:30Z

Not the same, but would be great if it would be compatible so we reuse print, visit, valueFromAST and other useful functions from graphql-js. If for some reason it couldn't be compatible than it should be at least convertible.

I will keep this in mind, but we need to remember that the AST used in graphql-js may not be suitable for all editor purposes (I don't know, I have not checked yet...).

In terms of conversions I am not worried because the most basic level structure (Parse Tree)
From which the AST is constructed is low level enough and contains virtually all the information from the text, So practically any higher level structure (AST variation) could be built from it.

Have a look here: https://sap.github.io/chevrotain/playground/ at the default example,
You can explore the automatically generated Parse Tree in the bottom right corner.

Don't have a lot of experience with parser myself, but I think community usually settles on some AST format as standard e.g. ESTree.

Yes, but if I end up building such a thing I'll probably start with a structure suitable for my purposes (building editor services) and only consider standardization and alignment at a later time to reduce the scope of the project.

bd82 · 2018-07-20T23:07:54Z

@IvanGoncharov Where can I get many samples of GraphQL source code?

I'd like to increase the quality of the parser I wrote and need samples for it.
I'll happily use even tens of thousands of lines of samples or more.

IvanGoncharov · 2018-07-22T10:49:31Z

Where can I get many samples of GraphQL source code?

@bd82 Probably the most extensive and up to date source would be lexer/parser tests:
https://github.com/graphql/graphql-js/tree/master/src/language/__tests__

bd82 · 2018-07-29T12:37:06Z

Update

So I've created a separate project graphql-advanced-parser. Currently it only produces a low level Concrete Abstract Tree (no AST yet)
And I don't think I even enabled error recovery yet.

Benchmark

This project also includes a benchmark you can checkout and execute.

Here are the results on my laptop:

GraphQL Parser Benchmark
grahql-js - AST output x 1,791 ops/sec ±0.84% (91 runs sampled)
Chevrotain - CST output x 630 ops/sec ±2.37% (87 runs sampled)
Chevrotain - NO output x 885 ops/sec ±1.13% (92 runs sampled)
Fastest is grahql-js - AST output

graphql-js is obviously faster, but we are dealing with a ~1,000 lines input,
So I personally do not see a problem with only parsing 600,000 lines per second versus 1,800,000.
Especially so because by having partial parsing capabilities one could reduce the input needed to be parsed by several orders of magnitude.

Also Note that the Chevrotain based parser has not been optimized (yet)
and may never be fully optimized because I prefer to keep 1 to 1 mapping to the spec as much as possible.

Next Steps

I guess creating an AST, preferably one that is a strict superset of the graphql-js ast would be the next step.

Question

I see graphql-language-service-parser package on npm is fairly popular and used in all sorts of editor tooling around graphQL.

You mentioned, earlier that this package's author has moved onto other things, does this mean that that package is no longer maintained and will slowly deviate from the newest specs?
I am trying to understand if my efforts here may replace graphql-language-service-parser
package in the longer term?

IvanGoncharov · 2018-07-29T13:58:33Z

So I personally do not see a problem with only parsing 600,000 lines per second versus 1,800,000.

In some applications speed is important. AFAIK Relay stores GraphQL schema as SDL file and for some API such schemas can be extremely big, e.g. Facebook claims to have more than 15000 types in their schema.

But as you initially pointed out current parser lack features that are critical for some applications. It would be ideal to have a parser which is both feature-reach and a fast however if it's not possible it would make sense to have two separate ones.

I guess creating an AST, preferably one that is a strict superset of the graphql-js ast would be the next step.

👍 It would be a great step forward.
Another important step would be to show extensibility of a parser and as example implement this experimental feature as a "plugin" to your parser.

You mentioned, earlier that this package's author has moved onto other things, does this mean that that package is no longer maintained and will slowly deviate from the newest specs?

It's already deviated, e.g. its lexer doesn't support the & symbol.
However, it's less critical since all of the deviations is contained to SDL and it's primarily used to parse queries.

I am trying to understand if my efforts here may replace graphql-language-service-parser package in the longer term?

Disclaimer: I'm not a Facebook emploee, just a very active external contributor with commit rights.

It's a complex question: Ideally I would like to add as much as possible features into existing graphql-js parser without comprimising it's speed or portatibility.
I suppect it's not possible for all features but we need to atleast try since every improvement to refence implementation of a parser will benefit GraphQL implementation in all other languages.
Your current work is big help in this effort since it easier to experiment with adding features if you already have another GraphQL parser implementing it.

As for replacing graphql-language-service-parser in particular I think someone from Facebook team should participate in this disscussion.

bd82 · 2018-07-29T15:55:06Z

In some applications speed is important. AFAIK Relay stores GraphQL schema as SDL file and for some API such schemas can be extremely big, e.g. Facebook claims to have more than 15000 types in their schema.

In such an extreme scenario I would expect to avoid parsing graphql text (everytime) altogether and instead optimize by serializing the huge schema AST in an easier (faster) to read format (e.g JSON).

Also keep in mind that parsing is just one step in the whole process, so parsing performance regressions do not translate directly to same percentages of overall performance regressions.

But as you initially pointed out current parser lack features that are critical for some applications. It would be ideal to have a parser which is both feature-reach and a fast however if it's not possible it would make sense to have two separate ones.

It may be possible to expand the capabilities of the existing parser while avoiding major performance regressions. For example, the TypeScript Parser by Microsoft supports both regular compilation flows and language services flows. There is bound to be some performance cost, but when using Chevrotain there may be additional performance costs due to using a more generic and abstracted tool.

Another important step would be to show extensibility of a parser and as example implement this experimental feature as a "plugin" to your parser.

I will have a look at this.

mjmahone · 2018-07-29T19:42:38Z

@bd82: I work at Facebook and have run benchmarks on parsing: the actual parsing of GraphQL text is not much worse than just reading in a JSON AST. It's also a more compact and readable representation, so we often serialize to GraphQL's SDL instead of JSON when we have the choice between the two.

But we frequently do operations on thousands of query and fragment definitions, in addition to our ten+ thousand types schema. We will not compromise the speed of graphql-js's parser. I'd be happy to see an outside-of-graphql-js parser built that can plug in with other components of graphql-js, though!

bd82 · 2018-07-29T22:51:28Z

run benchmarks on parsing: the actual parsing of GraphQL text is not much worse than just reading in a JSON AST

Interesting that the de-serialization of the GraphQL text is so fast.

But we frequently do operations on thousands of query and fragment definitions, in addition to our ten+ thousand types schema. We will not compromise the speed of graphql-js's parser.

I am still a little unsure about the actual speed requirements (parser vs rest of the compiler ) but I don't have these inputs to bench and profile and it does not really matter from my perspective as portability concerns makes using Chevrotain as the default parser irrelevant.

I'd be happy to see an outside-of-graphql-js parser built that can plug in with other components of graphql-js, though!

I'll be looking into this, if I can create an AST that is a strict superset it could even be plug-able to the rest of graphql-js, But I worry that this AST would have to support partial results, so the rest of the compiler could break due to assumptions that certain properties always exist.

My main focus is to create an AST that can be used for language services such as auto complete / go to definition / find usages ...

bd82 · 2018-07-30T21:19:54Z

@IvanGoncharov:

Another important step would be to show extensibility of a parser and as example implement this experimental feature as a "plugin" to your parser.

See this example for a very basic plugin system allowing to overwrite parser (and lexer) rules externally from the parser.

IvanGoncharov added the needs exploration label Jul 20, 2018

IvanGoncharov added this to the v15.0.0 milestone Jan 30, 2019

bd82 mentioned this issue Jun 27, 2019

Some Questions about the parser implemented here. graphql/graphiql#888

Open

IvanGoncharov removed this from the v15.x.x milestone Aug 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser for Language Services and Editor Tooling #1426

Parser for Language Services and Editor Tooling #1426

bd82 commented Jul 18, 2018

bd82 commented Jul 18, 2018

IvanGoncharov commented Jul 20, 2018

bd82 commented Jul 20, 2018 •

edited

Loading

IvanGoncharov commented Jul 20, 2018 •

edited

Loading

bd82 commented Jul 20, 2018

bd82 commented Jul 20, 2018

brainkim commented Jul 20, 2018

IvanGoncharov commented Jul 20, 2018

bd82 commented Jul 20, 2018

IvanGoncharov commented Jul 20, 2018 •

edited

Loading

bd82 commented Jul 20, 2018

bd82 commented Jul 20, 2018

IvanGoncharov commented Jul 22, 2018

bd82 commented Jul 29, 2018

IvanGoncharov commented Jul 29, 2018

bd82 commented Jul 29, 2018

mjmahone commented Jul 29, 2018

bd82 commented Jul 29, 2018

bd82 commented Jul 30, 2018

Parser for Language Services and Editor Tooling #1426

Parser for Language Services and Editor Tooling #1426

Comments

bd82 commented Jul 18, 2018

bd82 commented Jul 18, 2018

IvanGoncharov commented Jul 20, 2018

bd82 commented Jul 20, 2018 • edited Loading

Context

Fast

Memory

Easily Portable

IvanGoncharov commented Jul 20, 2018 • edited Loading

bd82 commented Jul 20, 2018

bd82 commented Jul 20, 2018

brainkim commented Jul 20, 2018

IvanGoncharov commented Jul 20, 2018

bd82 commented Jul 20, 2018

IvanGoncharov commented Jul 20, 2018 • edited Loading

bd82 commented Jul 20, 2018

bd82 commented Jul 20, 2018

IvanGoncharov commented Jul 22, 2018

bd82 commented Jul 29, 2018

Update

Benchmark

Next Steps

Question

IvanGoncharov commented Jul 29, 2018

bd82 commented Jul 29, 2018

mjmahone commented Jul 29, 2018

bd82 commented Jul 29, 2018

bd82 commented Jul 30, 2018

bd82 commented Jul 20, 2018 •

edited

Loading

IvanGoncharov commented Jul 20, 2018 •

edited

Loading

IvanGoncharov commented Jul 20, 2018 •

edited

Loading