Refactor the scanner #459

Open
ianlewis opened this issue Aug 29, 2023 · 1 comment
Labels
performance An issue with performance refactor A code refactor or cleanup

Comments

ianlewis commented Aug 29, 2023

Rob Pike gave a good talk on the subject, "Lexical Scanning in Go":
https://www.youtube.com/watch?v=HxaD_trXwRE
https://go.dev/talks/2011/lex.slide

Some thoughts:

  • Use a more standard lexer/parser architecture. The current code has a similar CommentScanner/TODOScanner split, but it could be cleaner.
  • Don't use regex to parse TODO comments. Instead, have a lexer emit lexemes that a parser then assembles into full TODOs.
  • Rob Pike's idea of making states functions is neat, but I'm not sure I like it when a state has to hold data. Instead, maybe make the state a simple interface with a Run method, so that a state can be a struct and hold data more easily.
    type state interface {
      Run() state
    }
  • Consider building a generic lexer/parser package using generics.
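
A minimal sketch of how interface-based states might look, assuming a TODO comment of the form `TODO(assignee): message`. Only the `state` interface comes from the proposal above; the `lexer`, `lexeme`, and `*State` types and the `lex` function are hypothetical names made up for illustration. Each state is a struct, so it can carry its own data (here, the list of labels to recognize).

```go
package main

import (
	"fmt"
	"strings"
)

// kind identifies the type of a lexeme (hypothetical names).
type kind int

const (
	kindLabel    kind = iota // e.g. "TODO"
	kindAssignee             // e.g. "ianlewis" in "TODO(ianlewis)"
	kindMessage              // the text after the label
)

type lexeme struct {
	kind kind
	text string
}

// state is the interface proposed above: Run does some work and
// returns the next state, or nil when lexing is finished.
type state interface {
	Run() state
}

// lexer holds the input, the current position, and the lexemes emitted so far.
type lexer struct {
	input string
	pos   int
	out   []lexeme
}

// emit records input[pos:end] as a lexeme and advances pos to end.
func (l *lexer) emit(k kind, end int) {
	l.out = append(l.out, lexeme{k, l.input[l.pos:end]})
	l.pos = end
}

// labelState carries data: the labels it should recognize.
type labelState struct {
	l      *lexer
	labels []string
}

func (s *labelState) Run() state {
	rest := s.l.input[s.l.pos:]
	for _, label := range s.labels {
		if strings.HasPrefix(rest, label) {
			s.l.emit(kindLabel, s.l.pos+len(label))
			return &assigneeState{l: s.l}
		}
	}
	return nil // not a TODO comment
}

// assigneeState lexes an optional "(name)" after the label.
type assigneeState struct{ l *lexer }

func (s *assigneeState) Run() state {
	rest := s.l.input[s.l.pos:]
	if strings.HasPrefix(rest, "(") {
		if end := strings.IndexByte(rest, ')'); end != -1 {
			s.l.pos++ // skip "("
			s.l.emit(kindAssignee, s.l.pos+end-1)
			s.l.pos++ // skip ")"
		}
	}
	return &messageState{l: s.l}
}

// messageState emits everything after the optional ": " separator.
type messageState struct{ l *lexer }

func (s *messageState) Run() state {
	rest := strings.TrimLeft(s.l.input[s.l.pos:], ": ")
	s.l.pos = len(s.l.input) - len(rest)
	if rest != "" {
		s.l.emit(kindMessage, len(s.l.input))
	}
	return nil
}

// lex runs the state machine until a state returns nil.
func lex(input string) []lexeme {
	l := &lexer{input: input}
	var s state = &labelState{l: l, labels: []string{"TODO", "FIXME"}}
	for s != nil {
		s = s.Run()
	}
	return l.out
}

func main() {
	for _, lx := range lex("TODO(ianlewis): refactor the scanner") {
		fmt.Printf("%d %q\n", lx.kind, lx.text)
	}
}
```

A parser would then consume the lexeme slice and assemble it into a full TODO value, with no regex involved. This sketch matches labels by prefix only, so a real implementation would also need to check the character that follows the label.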

Some alternative implementations

@ianlewis ianlewis added the refactor A code refactor or cleanup label Aug 29, 2023
This was referenced Aug 30, 2023
@ianlewis ianlewis changed the title refactor: Refactor the scanner Refactor the scanner Sep 1, 2023
@ianlewis ianlewis added the performance An issue with performance label Sep 4, 2023

ianlewis commented Nov 9, 2023

Alternatively, I could read from a byte reader and check whether individual bytes match the starting characters of comments, strings, etc. This works because pretty much all languages use ASCII characters for these delimiters, and each ASCII character is a single byte. I can then scan to the end of the line, or to the end of a multi-line comment, to get the comment bytes and convert only those to UTF-8. That way I don't have to convert every byte in every file.
