Refactor `RuneReader` #441

ianlewis · 2023-08-28T13:27:04Z

RuneReader was meant to be used as an easy way to read charset converted text from a file into Go's utf-8 strings.

However, this doesn't seem necessary and todos should be able to deal purely in bytes (so long as language identifiers for comments or strings are not defined in non-ASCII characters). This is especially true if output text of todos is not converted into the local terminal's character set.

The text was updated successfully, but these errors were encountered:

ianlewis · 2023-08-31T01:20:24Z

I wonder if it's wise to work with bytes rather than runes. I think using bytes mostly depends on the assumption that any character encoding we encounter is going to be compatible with ASCII. Some character encodings like UTF-16 are incompatible.

A few other folks have implemented buffered rune readers

Other folks seem to roll their own using bufio.

go-toml used to use go-buffruneio but it seems like they just work on bytes now. It would be interesting to see if go-toml even works on UTF-16 files. Err, no it seems that TOML must be UTF-8 so that's why they can do that.

It might be better to just scrap my implementation and use go-buffruneio as it seems to do everything mine does but it's pretty old and doesn't seem to have gotten any updates. Maybe another project to contribute a bit of maintenance to.

ianlewis · 2023-08-31T01:52:04Z

Another idea is to refactor the RuneReader and release it as a separate package that's a bit more modern than the existing packages.

I'm not sure I like the "unbounded growth unless you call Forget" part of go-buffruneio and the fact that they have a special EOF rune. Probably I would allow the user to specify the size of the buffer rather than allowing it to grow unbounded since you probably don't ever need to call UnreadRune on more than a few runes at a time. I guess the thinking its that once you've found a full token in the input you would call Forget but it seems strange to do it this way.

I don't think I can use bufrr at all since I would need to peek at more than one rune.

Or maybe expand it into a full lexing package that allows for some customization?

ianlewis · 2023-08-31T02:01:54Z

One person created a lexer package based on Rob Pike's GDG Sydney talk (Related #459). This makes me think to try it with generics.
https://github.com/bbuck/go-lexer

But it seems someone has done this already. Seems fairly reasonable. I wonder how fast it would be relative to an implementation I could write. Generics and all the other code to generalize it might slow it down a bit.
https://github.com/zalgonoise/lex

ianlewis · 2023-08-31T02:12:50Z

Hmm, it seems that the lex package's Lex only operates on slices and LexBuffer uses the gio library but that doesn't support buffered io. When would you ever really be reading anything other than bytes?

It looks like an interesting idea, but I would have implemented it differently for sure; using just the simple io.Reader interface (which can work with strings, slices, as well as other Readers etc.).

ianlewis added enhancement New feature or request performance An issue with performance refactor A code refactor or cleanup and removed enhancement New feature or request labels Aug 28, 2023

ianlewis changed the title ~~Remove RuneReader~~ Refactor RuneReader Aug 31, 2023

ianlewis added performance An issue with performance and removed performance An issue with performance labels Sep 1, 2023

ianlewis mentioned this issue Oct 3, 2023

refactor: RuneReader #842

Merged

6 tasks

ianlewis closed this as completed in #842 Oct 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor `RuneReader` #441

Refactor `RuneReader` #441

ianlewis commented Aug 28, 2023

ianlewis commented Aug 31, 2023 •

edited

Loading

ianlewis commented Aug 31, 2023 •

edited

Loading

ianlewis commented Aug 31, 2023

ianlewis commented Aug 31, 2023

Refactor RuneReader #441

Refactor RuneReader #441

Comments

ianlewis commented Aug 28, 2023

ianlewis commented Aug 31, 2023 • edited Loading

ianlewis commented Aug 31, 2023 • edited Loading

ianlewis commented Aug 31, 2023

ianlewis commented Aug 31, 2023

Refactor `RuneReader` #441

Refactor `RuneReader` #441

ianlewis commented Aug 31, 2023 •

edited

Loading

ianlewis commented Aug 31, 2023 •

edited

Loading