CSV: don't eagerly check next char after newline #11174

Merged: asterite merged 1 commit into master from bug/csv-consumes-tail on Sep 6, 2021

Conversation

@asterite (Member) commented Sep 6, 2021

Fixes #11172

When the lexer detected a \n char, it would immediately check the next char. If that char was \0, the lexer had reached the end of the file/stream rather than a newline followed by more content. However, when streaming over an IO, this meant that the newline token wasn't returned until more content became available after the newline, so the last complete row could stay pending for as long as the source stayed open without sending more data. This isn't ideal.

This PR changes that so that we only check the content after the newline when the next token is requested. This allows the parser to produce a row as soon as its newline arrives.
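The deferred lookahead can be sketched with a toy model. This is a minimal Python sketch, not the Crystal implementation: `CountingReader`, `LazyLexer`, and `Kind` are hypothetical names, quoting is ignored, and cells are split on commas only. The reader counts the characters it hands out, which shows that returning the newline token does not require reading anything after the newline.

```python
import io
from enum import Enum


class Kind(Enum):
    CELL = "Cell"
    NEWLINE = "Newline"
    EOF = "EOF"


class CountingReader:
    """Hypothetical wrapper that counts characters pulled from the stream."""

    def __init__(self, text):
        self._io = io.StringIO(text)
        self.chars_read = 0

    def read_char(self):
        ch = self._io.read(1)  # "" signals end of stream
        if ch:
            self.chars_read += 1
        return ch


class LazyLexer:
    """Toy lexer (no quoting) that never looks past a newline
    until the next token is requested."""

    def __init__(self, reader):
        self._reader = reader
        self._buffered = None  # single character of pushback

    def _next_char(self):
        if self._buffered is not None:
            ch, self._buffered = self._buffered, None
            return ch
        return self._reader.read_char()

    def next_token(self):
        ch = self._next_char()
        if ch == "":
            return (Kind.EOF, None)
        if ch == "\n":
            # Return immediately: no peek at what follows the newline.
            return (Kind.NEWLINE, None)
        chars = []
        while ch not in ("", ",", "\n"):
            chars.append(ch)
            ch = self._next_char()
        if ch == "\n":
            self._buffered = ch  # the newline becomes the next token
        return (Kind.CELL, "".join(chars))


reader = CountingReader("a\nb")
lexer = LazyLexer(reader)
first = lexer.next_token()   # the cell "a"
second = lexer.next_token()  # the newline, emitted without touching "b"
chars_read_after_newline = reader.chars_read
```

In this model only two characters ("a" and the newline) have been read when the newline token is returned. The check for end of stream happens on the following `next_token` call, which is the behaviour described above.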

This is technically a breaking change because something like "a\n" would previously produce Cell("a"), EOF but now it will produce Cell("a"), Newline, EOF. I think this breaking change is acceptable because:

  • it's very unlikely to break client code (I doubt anyone is using the lexer instead of the high-level code)
  • it can be considered a bugfix: now you can distinguish between "a\n" and "a", if that matters to you
  • the parser will still produce the same output
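To illustrate why the parser output can stay the same even though the token stream changes, here is a small Python sketch under stated assumptions. `rows_from_tokens` is a hypothetical helper, not the Crystal parser; it simply groups cells into rows and treats a trailing newline before EOF as closing the last row rather than opening a new one.

```python
def rows_from_tokens(tokens):
    """Group (kind, value) tokens into rows.

    Hypothetical model: a Newline closes the current row, and EOF
    closes it only if it still holds cells.
    """
    rows, current = [], []
    for kind, value in tokens:
        if kind == "Cell":
            current.append(value)
        elif kind == "Newline":
            rows.append(current)
            current = []
        elif kind == "EOF":
            if current:
                rows.append(current)
            break
    return rows


# Token stream for "a\n" before the change, as described in the PR text.
old_tokens = [("Cell", "a"), ("EOF", None)]
# Token stream for "a\n" after the change.
new_tokens = [("Cell", "a"), ("Newline", None), ("EOF", None)]

old_rows = rows_from_tokens(old_tokens)
new_rows = rows_from_tokens(new_tokens)
```

Both streams yield the single row `["a"]` in this model, so code that only consumes rows would not notice the difference, while code reading tokens directly can now tell "a\n" apart from "a".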

@asterite added the kind:bug, topic:stdlib:serialization, and topic:stdlib:text labels Sep 6, 2021

@sdogruyol (Member) left a comment

Thank you @asterite 🙏

@straight-shoota added this to the 1.2.0 milestone Sep 6, 2021
@asterite merged commit a7f29eb into master Sep 6, 2021
@asterite deleted the bug/csv-consumes-tail branch September 6, 2021 23:26
Successfully merging this pull request may close these issues.

CSV Lexer doesn't read last line of pipe-based IO source
3 participants