Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactors tokenizer to emit a Token double-linked-list #463

Merged
merged 1 commit into from
Aug 16, 2016
Merged

Conversation

leebyron
Copy link
Contributor

@leebyron leebyron commented Aug 14, 2016

This refactors the tokenizer, simplifying both tokenizer and parser code slightly (fewer utility functions) and returns tokens as part of the AST's loc information.

Location info is now created as an Object with non-writable defined properties, only start and end are enumerable. That removes the need for the noSource option, which was really only added to not choke JSON.stringify/util.inspect on the full source. Location info now also includes the startToken and endToken associated with every node in the AST, providing easy access to the token stream from any point.

Tokens also have gained a few new properties: line for the 1-indexed line they appear on and prev/next to navigate the dll of Tokens.

Finally, this introduces three new Token types: <SOF> for the empty token at beginning of the file stream, , for ignored comma tokens, and Comment for ignored comment tokens.

These tokens are included in the Token list allowing for easier navigation of the parsed document in GraphQL tools and as a basis for parsing comment descriptions in the schema language.

@ghost ghost added the CLA Signed label Aug 14, 2016
const lexer = createLexer(sourceObj, options || {});
expect(lexer, TokenKind.SOF);
const value = parseValueLiteral(lexer, false);
expect(lexer, TokenKind.EOF);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an existing bug! Previously parseValue('123garbage') would return the value 123, now it rightfully creates a parse error.

leebyron added a commit that referenced this pull request Aug 14, 2016
This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.
@leebyron
Copy link
Contributor Author

Open to feedback on this. cc @robzhu @dschafer @alangenfeld

@leebyron
Copy link
Contributor Author

See also #464

@leebyron
Copy link
Contributor Author

One change I plan on making is to include a column field in Token, which is very cheap for the lexer to calculate now that it's tracking line, and will allow replacing the existing expensive utility that converts character offset to line/column.

@leebyron leebyron force-pushed the tokensss branch 3 times, most recently from 53014ea to 2403fc7 Compare August 15, 2016 22:36
@leebyron
Copy link
Contributor Author

column now included in tokens (will replace utilities in follow-up PRs)

@leebyron leebyron force-pushed the tokensss branch 2 times, most recently from 9633548 to 026821c Compare August 16, 2016 03:17
@stubailo
Copy link

Does this mean that now by default stringifying the AST won't include source? That sounds pretty great.

@leebyron
Copy link
Contributor Author

That's right @stubailo - the option to omit source from AST location nodes has been removed, but instead all but start and end have been made non-enumerable so they don't appear in JSON.stringify/util.inspect - not having source appear in stringified AST was really the only reason that parse flag was available.

@leebyron leebyron force-pushed the tokensss branch 2 times, most recently from 381df6b to 1024b6e Compare August 16, 2016 05:38
@leebyron
Copy link
Contributor Author

Actually, update on that. I was profiling this change (parser needs to be fast) and discovered that Object.defineProperties is horribly slow, like 100x the cpu time of a simple object creation. Since the parser's runtime is partially bottlenecked by object creation, this translated to a ~20x slowdown of the parser.

I've got a new approach posted up now that instead defines a prototype with toJSON and inspect defined to essentially achieve the same goal. The only downside is mocha/chai's deepEqual comparator doesn't allow for a hook like this, so the tests had to be edited slightly to compensate.

leebyron added a commit that referenced this pull request Aug 16, 2016
This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.
This refactors the tokenizer, simplifying both tokenizer and parser code slightly (fewer utility functions) and returns tokens as part of the AST's `loc` information. Tokens also have gained a few new properties: `line`/`column` for the 1-indexed position the token begins and `prev`/`next` to navigate the dll of Tokens. Finally, this introduces two new Token types: `<SOF>` for the empty token at beginning of the file stream, and `Comment` for ignored comment tokens. These tokens are *included* in the Token list allowing for easier navigation of the parsed document in GraphQL tools and as a basis for parsing comment descriptions in the schema language.

Providing tokens as part of each AST node's `loc` provides a linkage between the abstract AST and the concrete token list which provides more useful information to GraphQL language tools.

Thanks to the addition of the `<SOF>` token, The AST `Document` node returned from `parse()` now definitionally points to `<SOF>` as its `startToken` and `<EOF>` as its `endToken`, effectively the head and tail of the token double linked list.

This also removes the `noSource` parser option in favor of defining toJSON/inspect methods so that they do not appear in JSON.stringify/util.inspect.
leebyron added a commit that referenced this pull request Aug 16, 2016
This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.
leebyron added a commit that referenced this pull request Aug 16, 2016
This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.
@leebyron leebyron merged commit 165b9d3 into master Aug 16, 2016
@leebyron leebyron deleted the tokensss branch August 16, 2016 06:21
leebyron added a commit that referenced this pull request Aug 16, 2016
This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.
@stubailo
Copy link

parser needs to be fast

Thanks for considering this! The overwhelming of our users are currently using the parser on the client so this perf is definitely very important.

By the way, how did you profile this change? Do you have any ideas about how to automate that kind of stuff?

@leebyron
Copy link
Contributor Author

I used benchmark and microtime npm modules and ran the parser on the kitchen sink file. On a modern MBP I saw ~32,000 parses per second typical, and ~1,700 per second using the non-enumerable properties.

@leebyron
Copy link
Contributor Author

That level of benchmarking is usually only worthwhile in looking for larger swings in performance since the runner isn't super accurate in my experience and isn't very fast to run, so it can be a challenge to set up in an automated way.

@stubailo
Copy link

Yeah, could be a good sanity check still, by running the baseline and new version on the same machine and making sure there isn't like a 2x regression. Maybe I'll try that soon.

leebyron added a commit that referenced this pull request Aug 17, 2016
This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.
leebyron added a commit that referenced this pull request Aug 24, 2016
This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.
IvanGoncharov added a commit to IvanGoncharov/graphql-js that referenced this pull request Oct 23, 2018
IvanGoncharov added a commit to IvanGoncharov/graphql-js that referenced this pull request Oct 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants