Refactors tokenizer to emit a Token double-linked-list #463
Conversation
```js
const lexer = createLexer(sourceObj, options || {});
expect(lexer, TokenKind.SOF);
const value = parseValueLiteral(lexer, false);
expect(lexer, TokenKind.EOF);
```
This was an existing bug! Previously `parseValue('123garbage')` would return the value `123`; now it rightfully creates a parse error.
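The effect of that trailing `<EOF>` assertion can be illustrated with a standalone sketch. The tokenizer and `parseIntValue` below are toy stand-ins, not graphql-js code:

```javascript
// Toy tokenizer: integers become 'Int' tokens, anything else 'Name',
// bracketed by '<SOF>' and '<EOF>' sentinel tokens.
function tokenize(source) {
  const re = /\d+|\S+/g;
  const tokens = [{ kind: '<SOF>' }];
  let m;
  while ((m = re.exec(source)) !== null) {
    tokens.push({ kind: /^\d+$/.test(m[0]) ? 'Int' : 'Name', value: m[0] });
  }
  tokens.push({ kind: '<EOF>' });
  return tokens;
}

function parseIntValue(source) {
  const tokens = tokenize(source);
  let i = 0;
  if (tokens[i].kind !== '<SOF>') throw new Error('Expected <SOF>');
  i++;
  if (tokens[i].kind !== 'Int') throw new Error('Expected Int');
  const value = parseInt(tokens[i].value, 10);
  i++;
  // The fix: without this <EOF> assertion, '123garbage' parses as 123.
  if (tokens[i].kind !== '<EOF>') {
    throw new Error('Expected <EOF>, found ' + tokens[i].value);
  }
  return value;
}
```

With the final check in place, `parseIntValue('123')` returns `123` while `parseIntValue('123garbage')` throws instead of silently dropping the trailing input.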
This adds descriptions to the `buildASTSchema` (string->schema) and `schemaPrinter` (schema->string) via walking the previous full-line contiguous comment block. This is dependent on #463.
Open to feedback on this. cc @robzhu @dschafer @alangenfeld
See also #464
One change I plan on making is to include a `column` on tokens as well.
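The "walking the previous full-line contiguous comment block" idea could be sketched as follows. The Token shape (`kind`, `value`, `line`, `prev`) follows this PR's description, but `leadingDescription` is a hypothetical helper, not the library's actual code:

```javascript
// Collect the contiguous run of full-line Comment tokens immediately
// above `token`, joined top-to-bottom, as a candidate description.
function leadingDescription(token) {
  const lines = [];
  let tok = token.prev;
  let expectedLine = token.line - 1;
  // Walk backwards while each comment sits on the line directly above.
  while (tok && tok.kind === 'Comment' && tok.line === expectedLine) {
    lines.unshift(tok.value);
    expectedLine = tok.line - 1;
    tok = tok.prev;
  }
  return lines.length ? lines.join('\n') : undefined;
}

const commentA = { kind: 'Comment', value: 'A widget.', line: 1, prev: null };
const commentB = { kind: 'Comment', value: 'Draws things.', line: 2, prev: commentA };
const typeName = { kind: 'Name', value: 'Widget', line: 3, prev: commentB };
leadingDescription(typeName); // 'A widget.\nDraws things.'
```

Because ignored `Comment` tokens are kept in the token list, this only needs the `prev` links and `line` numbers; a blank line breaks the contiguity check and ends the block.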
`column` now included in tokens (will replace utilities in follow-up PRs)
Does this mean that now by default stringifying the AST won't include the source?
That's right @stubailo - the option to omit source from AST location nodes has been removed; instead, all but `start` and `end` are non-enumerable, so they won't show up in the stringified output.
Actually, update on that. I was profiling this change (parser needs to be fast) and discovered that defining non-enumerable properties has a significant performance cost. I've got a new approach posted up now that instead defines a prototype with `toJSON`/`inspect` methods.
This refactors the tokenizer, simplifying both tokenizer and parser code slightly (fewer utility functions), and returns tokens as part of the AST's `loc` information.

Tokens have also gained a few new properties: `line`/`column` for the 1-indexed position where the token begins, and `prev`/`next` to navigate the doubly-linked list of Tokens.

Finally, this introduces two new Token types: `<SOF>` for the empty token at the beginning of the file stream, and `Comment` for ignored comment tokens. These tokens are *included* in the Token list, allowing for easier navigation of the parsed document in GraphQL tools and serving as a basis for parsing comment descriptions in the schema language.

Providing tokens as part of each AST node's `loc` creates a linkage between the abstract AST and the concrete token list, which provides more useful information to GraphQL language tools. Thanks to the addition of the `<SOF>` token, the AST `Document` node returned from `parse()` now definitionally points to `<SOF>` as its `startToken` and `<EOF>` as its `endToken`, effectively the head and tail of the token doubly-linked list.

This also removes the `noSource` parser option in favor of defining `toJSON`/`inspect` methods so that the source does not appear in JSON.stringify/util.inspect output.
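The prototype-based approach mentioned at the end could be sketched like this. The names here are illustrative, not the library's exact implementation, and `inspect` is omitted for brevity:

```javascript
// Location info keeps everything accessible on the instance, but methods
// live on the prototype, so they are never serialized.
function Loc(start, end, startToken, endToken, source) {
  this.start = start;
  this.end = end;
  this.startToken = startToken;
  this.endToken = endToken;
  this.source = source;
}

// JSON.stringify calls toJSON when present, limiting the output to the
// two positional fields, so the full source text and token links don't
// bloat (or create cycles in) the stringified AST.
Loc.prototype.toJSON = function () {
  return { start: this.start, end: this.end };
};

const loc = new Loc(0, 5, { kind: '<SOF>' }, { kind: '<EOF>' }, '{ a }');
JSON.stringify(loc); // '{"start":0,"end":5}'
```

Unlike per-object `Object.defineProperty` calls, this adds no work at allocation time, which matters when a `Loc` is created for every AST node.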
Thanks for considering this! The overwhelming majority of our users are currently using the parser on the client, so this perf is definitely very important. By the way, how did you profile this change? Do you have any ideas about how to automate that kind of stuff?
I used the benchmark and microtime npm modules and ran the parser on the kitchen sink file. On a modern MBP I saw ~32,000 parses per second typically, and ~1,700 per second using the non-enumerable properties.
That level of benchmarking is usually only worthwhile when looking for larger swings in performance, since the runner isn't super accurate in my experience and isn't very fast to run, so it can be a challenge to set up in an automated way.
Yeah, it could still be a good sanity check, by running the baseline and the new version on the same machine and making sure there isn't like a 2x regression. Maybe I'll try that soon.
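As a rough illustration of such a same-machine sanity check using only Node built-ins (this is a hypothetical sketch, not the benchmark/microtime setup described above, and the numbers are machine-dependent):

```javascript
// Crude timing harness: run `fn` many times and report wall-clock time.
// Far less rigorous than the benchmark module, but enough to spot a
// large (e.g. 2x) regression between two implementations.
function timeIt(label, fn, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(label + ': ' + ms.toFixed(1) + 'ms for ' + iterations + ' runs');
  return ms;
}

const N = 100000;
// Baseline: plain object literal per "location".
timeIt('plain object', () => ({ start: 0, end: 5, source: 's' }), N);
// Candidate: per-object non-enumerable property definitions.
timeIt('defineProperty', () => {
  const loc = {};
  Object.defineProperty(loc, 'start', { value: 0, enumerable: true });
  Object.defineProperty(loc, 'end', { value: 5, enumerable: true });
  Object.defineProperty(loc, 'source', { value: 's', enumerable: false });
  return loc;
}, N);
```

Comparing the two labels on the same machine gives the kind of ballpark ratio discussed above without any external dependencies.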
Reflect changes in graphql#463
This refactors the tokenizer, simplifying both tokenizer and parser code slightly (fewer utility functions), and returns tokens as part of the AST's `loc` information.

Location info is now created as an Object with non-writable defined properties; only `start` and `end` are enumerable. That removes the need for the `noSource` option, which was really only added so the full source would not choke JSON.stringify/util.inspect. Location info now also includes the `startToken` and `endToken` associated with every node in the AST, providing easy access to the token stream from any point.

Tokens have also gained a few new properties: `line` for the 1-indexed line they appear on, and `prev`/`next` to navigate the doubly-linked list of Tokens.

Finally, this introduces three new Token types: `<SOF>` for the empty token at the beginning of the file stream, `,` for ignored comma tokens, and `Comment` for ignored comment tokens. These tokens are included in the Token list, allowing for easier navigation of the parsed document in GraphQL tools and serving as a basis for parsing comment descriptions in the schema language.