JSON decoder behavior changed from version 21 to 22 and returns nonsensical num_rows for RecordBatch #2722
Comments
Is it possible that it returns a length of 1 because the specified schema is empty, so the number of rows is technically unbounded? How many times does 0 divide 0 😆 ? Not saying we shouldn't fix this to behave the same way as before, but just wondering if this is specific to the case of an empty schema, where it isn't technically incorrect 😅
I think the
The previous implementation always returned the error (because it did not handle empty projections), while the correct result in this case should be a record batch with a
Sounds related to #1552
Closing for now as it seems to work as intended. Thanks for reporting, @MathiasKindberg; feel free to reopen if we have a bug, or open an issue for missing documentation.
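To make the empty-schema point from the comments above concrete, here is a minimal sketch in Rust. `ToyBatch` and its methods are hypothetical stand-ins written for illustration; they are not the arrow-rs `RecordBatch` API and do not reproduce the decoder from this issue. The sketch only shows why a batch with no columns cannot derive its row count from its data and has to carry it explicitly, which is one way a batch holding no column data could still report a non-zero `num_rows`.

```rust
/// Hypothetical stand-in for a record batch (NOT the arrow-rs `RecordBatch`).
#[derive(Debug, PartialEq)]
pub struct ToyBatch {
    columns: Vec<Vec<i64>>,
    row_count: usize,
}

impl ToyBatch {
    /// Builds a batch. The row count is inferred from the columns when there
    /// are any. With zero columns there is nothing to infer from, so the
    /// caller must pass the row count explicitly.
    pub fn new(columns: Vec<Vec<i64>>, explicit_rows: Option<usize>) -> Result<Self, String> {
        // Length of the first column, if the batch has at least one column.
        let inferred = columns.first().map(|c| c.len());

        // All columns must agree on their length.
        if let Some(len) = inferred {
            if columns.iter().any(|c| c.len() != len) {
                return Err("columns have different lengths".to_string());
            }
        }

        let row_count = match (inferred, explicit_rows) {
            (Some(len), Some(n)) if len != n => {
                return Err(format!(
                    "explicit row count {} does not match column length {}",
                    n, len
                ));
            }
            (Some(len), _) => len,
            // Empty schema: the explicit count is the only source of truth.
            (None, Some(n)) => n,
            (None, None) => {
                return Err("no columns and no explicit row count".to_string());
            }
        };

        Ok(Self { columns, row_count })
    }

    pub fn num_rows(&self) -> usize {
        self.row_count
    }

    pub fn num_columns(&self) -> usize {
        self.columns.len()
    }
}
```

In this toy model, `ToyBatch::new(vec![], Some(1))` yields a batch with zero columns that still reports one row, while `ToyBatch::new(vec![], None)` is rejected because the row count would be undefined.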
Describe the bug
We have a test that relied on triggering an Arrow error by sending an empty value to the decoder. It is truly questionable whether that is sensible, but that is how it was done to exercise failure cases in our business logic. When upgrading from version 21 to 22, this test stopped working.
Debugging it, I could track the change to the `next_batch` function. Digging through recent PRs, it seems that PR #2604 changed the behavior when dealing with empty input values.
The weird thing now is that calling `num_rows` on the produced record batch gives 1, even though no data is inside it, which I would consider very undesirable behavior unless Arrow specifies that an empty `RecordBatch` has length 1?
To Reproduce
For version 21 this gives:
For version 22 this gives:
Expected behavior
I would expect `num_rows` to return 0 on an empty record batch. I don't see any issue with empty record batches existing, although the exact behavior of the `next_batch` function should likely be documented more thoroughly with regard to how it errors and why.