Fixes behaviour for incomplete lines when `recover_with_nulls` is enabled #14252
/ok to test
Could you leave a comment on the changes in logic in the FST (how do they translate into the intended behavior change)? I can glean some logic from the code, but it's possibly made up :D
I've added comments to the code (`get_translation_table` and the `nl_tokens` lambda).
Essentially, we just changed the behaviour when we're in `recover_from_error`, such that we fail if the JSON line is an incomplete JSON value. E.g.,
- when we see a newline while we are still parsing a `LIST` or a `STRUCT`, we emit an `ErrorBegin` token that will mark this incomplete JSON line as invalid.
- when we see a newline while we are still parsing a `string` or `field name` (e.g., `{"a":"123\n`), we emit an `ErrorBegin` token that will mark this incomplete JSON line as invalid.
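To make the newline rule above concrete, here is a minimal Python sketch of a simplified line scanner. It is not cuDF's FST (which is a GPU finite-state transducer driven by translation tables); the function name `scan_json_lines`, its state bookkeeping, and every token name other than `ErrorBegin` are hypothetical. It also assumes that, outside recovery mode, a newline inside an open value is simply treated as whitespace or string content and emits nothing.

```python
# Hypothetical, simplified model of the newline rule described above.
# NOT cuDF's FST: function name, state tracking and most token names are illustrative.


def scan_json_lines(text: str, recover_from_error: bool) -> list[str]:
    """Emit coarse tokens for JSON-lines input, flagging incomplete lines."""
    tokens: list[str] = []
    stack: list[str] = []  # currently open containers: "STRUCT" or "LIST"
    in_string = False      # inside a string value or a field name
    escaped = False        # previous character was a backslash inside a string

    for ch in text:
        if ch == "\n" and recover_from_error and (stack or in_string):
            # Newline reached while a STRUCT/LIST/string/field name is still
            # open: the line is an incomplete JSON value, so mark it invalid
            # and start the next line from a clean state.
            tokens.append("ErrorBegin")
            stack.clear()
            in_string = escaped = False
            continue
        if in_string:
            # Track escapes so that \" does not close the string.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("STRUCT")
            tokens.append("StructBegin")
        elif ch == "[":
            stack.append("LIST")
            tokens.append("ListBegin")
        elif ch == "}" and stack and stack[-1] == "STRUCT":
            stack.pop()
            tokens.append("StructEnd")
        elif ch == "]" and stack and stack[-1] == "LIST":
            stack.pop()
            tokens.append("ListEnd")
        # Scalars, ':', ',' and whitespace emit no tokens in this sketch.
    return tokens


print(scan_json_lines('{"a":[1]}\n{"a":"123\n', recover_from_error=True))
```

With recovery enabled, the second line (an unterminated string inside an open struct) ends in `ErrorBegin`; with `recover_from_error=False` the same input produces no error token. In this sketch, the only extra information recovery mode consults is whether a container or string is still open when the newline arrives.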
Thank you for the comments.
It's still challenging to understand, case by case, why `regular_tokens` and `recovering_tokens` might be different, but I think I got it.
/ok to test
Branch updated from d087e48 to bfb5397.
/merge
Description
Closes #14227. Adapts the behaviour of the JSON finite-state transducer (FST) when `recover_with_nulls` is `true` to be stricter, rejecting lines that contain incomplete JSON objects (aka records) or JSON arrays (aka lists).
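From the reader's side, the stricter rule means a line holding an incomplete object or array is treated as invalid, which, going by the option's name, yields a null record. The sketch below models that line-level contract using only Python's standard `json` module. It is not cuDF's reader: the function name `read_json_lines_recovering`, the skipping of blank lines, and the acceptance of top-level scalars (which `json.loads` allows) are assumptions of this sketch.

```python
import json
from typing import Any, Optional


def read_json_lines_recovering(text: str) -> list[Optional[Any]]:
    """Hypothetical model: parse JSON lines, mapping invalid lines to None."""
    rows: list[Optional[Any]] = []
    for line in text.split("\n"):
        if not line.strip():
            continue  # assumption: blank lines produce no record
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            # Incomplete object/array/string on this line -> null record.
            rows.append(None)
    return rows


print(read_json_lines_recovering('{"a":1}\n{"a":[1,2\n{"a":"x"}\n{"b":\n'))
```

Each line is judged on its own, so an unterminated list on one line cannot swallow the records that follow it.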