Refactors tokenizer to emit a Token double-linked-list #463

leebyron · 2016-08-14T04:21:12Z

This refactors the tokenizer, simplifying both tokenizer and parser code slightly (fewer utility functions) and returns tokens as part of the AST's loc information.

Location info is now created as an Object with non-writable defined properties, only start and end are enumerable. That removes the need for the noSource option, which was really only added to not choke JSON.stringify/util.inspect on the full source. Location info now also includes the startToken and endToken associated with every node in the AST, providing easy access to the token stream from any point.

Tokens also have gained a few new properties: line for the 1-indexed line they appear on and prev/next to navigate the dll of Tokens.

Finally, this introduces three new Token types: <SOF> for the empty token at beginning of the file stream, , for ignored comma tokens, and Comment for ignored comment tokens.

These tokens are included in the Token list allowing for easier navigation of the parsed document in GraphQL tools and as a basis for parsing comment descriptions in the schema language.

leebyron · 2016-08-14T04:31:20Z

src/language/parser.js

+  const lexer = createLexer(sourceObj, options || {});
+  expect(lexer, TokenKind.SOF);
+  const value = parseValueLiteral(lexer, false);
+  expect(lexer, TokenKind.EOF);


This was an existing bug! Previously parseValue('123garbage') would return the value 123, now it rightfully creates a parse error.

This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.

leebyron · 2016-08-15T19:33:57Z

Open to feedback on this. cc @robzhu @dschafer @alangenfeld

leebyron · 2016-08-15T19:34:09Z

See also #464

leebyron · 2016-08-15T19:35:27Z

One change I plan on making is to include a column field in Token, which is very cheap for the lexer to calculate now that it's tracking line, and will allow replacing the existing expensive utility that converts character offset to line/column.

leebyron · 2016-08-15T22:37:17Z

column now included in tokens (will replace utilities in follow-up PRs)

stubailo · 2016-08-16T03:48:17Z

Does this mean that now by default stringifying the AST won't include source? That sounds pretty great.

leebyron · 2016-08-16T04:00:12Z

That's right @stubailo - the option to omit source from AST location nodes has been removed, but instead all but start and end have been made non-enumerable so they don't appear in JSON.stringify/util.inspect - not having source appear in stringified AST was really the only reason that parse flag was available.

leebyron · 2016-08-16T05:43:31Z

Actually, update on that. I was profiling this change (parser needs to be fast) and discovered that Object.defineProperties is horribly slow, like 100x the cpu time of a simple object creation. Since the parser's runtime is partially bottlenecked by object creation, this translated to a ~20x slowdown of the parser.

I've got a new approach posted up now that instead defines a prototype with toJSON and inspect defined to essentially achieve the same goal. The only downside is mocha/chai's deepEqual comparator doesn't allow for a hook like this, so the tests had to be edited slightly to compensate.

This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.

This refactors the tokenizer, simplifying both tokenizer and parser code slightly (fewer utility functions) and returns tokens as part of the AST's `loc` information. Tokens also have gained a few new properties: `line`/`column` for the 1-indexed position the token begins and `prev`/`next` to navigate the dll of Tokens. Finally, this introduces two new Token types: `<SOF>` for the empty token at beginning of the file stream, and `Comment` for ignored comment tokens. These tokens are *included* in the Token list allowing for easier navigation of the parsed document in GraphQL tools and as a basis for parsing comment descriptions in the schema language. Providing tokens as part of each AST node's `loc` provides a linkage between the abstract AST and the concrete token list which provides more useful information to GraphQL language tools. Thanks to the addition of the `<SOF>` token, The AST `Document` node returned from `parse()` now definitionally points to `<SOF>` as its `startToken` and `<EOF>` as its `endToken`, effectively the head and tail of the token double linked list. This also removes the `noSource` parser option in favor of defining toJSON/inspect methods so that they do not appear in JSON.stringify/util.inspect.

This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.

stubailo · 2016-08-16T07:15:24Z

parser needs to be fast

Thanks for considering this! The overwhelming of our users are currently using the parser on the client so this perf is definitely very important.

By the way, how did you profile this change? Do you have any ideas about how to automate that kind of stuff?

leebyron · 2016-08-16T07:35:24Z

I used benchmark and microtime npm modules and ran the parser on the kitchen sink file. On a modern MBP I saw ~32,000 parses per second typical, and ~1,700 per second using the non-enumerable properties.

leebyron · 2016-08-16T07:39:22Z

That level of benchmarking is usually only worthwhile in looking for larger swings in performance since the runner isn't super accurate in my experience and isn't very fast to run, so it can be a challenge to set up in an automated way.

stubailo · 2016-08-16T07:55:42Z

Yeah, could be a good sanity check still, by running the baseline and new version on the same machine and making sure there isn't like a 2x regression. Maybe I'll try that soon.

This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.

Reflect changes in graphql#463

ghost added the CLA Signed label Aug 14, 2016

leebyron reviewed Aug 14, 2016
View reviewed changes

leebyron mentioned this pull request Aug 14, 2016

Schema lang descriptions #464

Merged

OlegIlyenko mentioned this pull request Aug 14, 2016

Update description formatting/parsing based in the changes in the reference implementation sangria-graphql/sangria#155

Closed

leebyron force-pushed the tokensss branch 3 times, most recently from 53014ea to 2403fc7 Compare August 15, 2016 22:36

leebyron force-pushed the tokensss branch 2 times, most recently from 9633548 to 026821c Compare August 16, 2016 03:17

leebyron force-pushed the tokensss branch 2 times, most recently from 381df6b to 1024b6e Compare August 16, 2016 05:38

leebyron force-pushed the tokensss branch from 1024b6e to 8a98799 Compare August 16, 2016 05:55

leebyron force-pushed the tokensss branch from 8a98799 to 77f2785 Compare August 16, 2016 06:06

leebyron merged commit 165b9d3 into master Aug 16, 2016

leebyron deleted the tokensss branch August 16, 2016 06:21

leebyron mentioned this pull request Aug 24, 2016

parse comments beginning with ## as descriptions on schema definitions #427

Closed

jnwng mentioned this pull request Mar 22, 2017

TypeError: Cannot read property 'body' of undefined - Fragment in *.graphql file does't load apollographql/graphql-tag#64

Closed

IvanGoncharov added a commit to IvanGoncharov/graphql-js that referenced this pull request Oct 23, 2018

Fix comments to reflex current behaviour

33bf1a3

Reflect changes in graphql#463

IvanGoncharov added a commit to IvanGoncharov/graphql-js that referenced this pull request Oct 23, 2018

Fix comments to reflex current behaviour

18b2b24

Reflect changes in graphql#463

IvanGoncharov mentioned this pull request Oct 23, 2018

Lexer: Fix comments to reflex current behaviour #1558

Merged

snyk-bot mentioned this pull request Apr 11, 2021

[Snyk] Upgrade eslint-plugin-flowtype from 5.2.0 to 5.4.0 turkdevops/graphql-js#134

Closed

This was referenced May 4, 2021

[Snyk] Upgrade eslint-plugin-flowtype from 5.2.0 to 5.7.0 turkdevops/graphql-js#158

Closed

[Snyk] Upgrade eslint-plugin-flowtype from 5.2.0 to 5.7.1 turkdevops/graphql-js#166

Closed

snyk-bot mentioned this pull request May 14, 2021

[Snyk] Upgrade eslint-plugin-flowtype from 5.2.0 to 5.7.2 turkdevops/graphql-js#174

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactors tokenizer to emit a Token double-linked-list #463

Refactors tokenizer to emit a Token double-linked-list #463

leebyron commented Aug 14, 2016 •

edited

Loading

leebyron Aug 14, 2016

leebyron commented Aug 15, 2016

leebyron commented Aug 15, 2016

leebyron commented Aug 15, 2016

leebyron commented Aug 15, 2016

stubailo commented Aug 16, 2016

leebyron commented Aug 16, 2016

leebyron commented Aug 16, 2016

stubailo commented Aug 16, 2016

leebyron commented Aug 16, 2016

leebyron commented Aug 16, 2016

stubailo commented Aug 16, 2016

Refactors tokenizer to emit a Token double-linked-list #463

Refactors tokenizer to emit a Token double-linked-list #463

Conversation

leebyron commented Aug 14, 2016 • edited Loading

leebyron Aug 14, 2016

Choose a reason for hiding this comment

leebyron commented Aug 15, 2016

leebyron commented Aug 15, 2016

leebyron commented Aug 15, 2016

leebyron commented Aug 15, 2016

stubailo commented Aug 16, 2016

leebyron commented Aug 16, 2016

leebyron commented Aug 16, 2016

stubailo commented Aug 16, 2016

leebyron commented Aug 16, 2016

leebyron commented Aug 16, 2016

stubailo commented Aug 16, 2016

leebyron commented Aug 14, 2016 •

edited

Loading