Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-43833: [Integration] validate that arrow IPC files include a valid embedded IPC stream #43834

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

bkietz
Copy link
Member

@bkietz bkietz commented Aug 26, 2024

Rationale for this change

IPC files should contain a valid IPC stream with EOS. Some arrow producers don't guarantee that the EOS is written. However there might be readers depending on simply reading the embedded IPC stream (and depending on the EOS to avoid reading the Footer as a corrupted message) so we should validate stream reading from IPC files as well as file reading.

What changes are included in this PR?

arrow-json-integration-test will now also try to read a file as a stream, which will fail if the file doesn't include an EOS or otherwise isn't a valid stream.

Are these changes tested?

yes

Are there any user-facing changes?

no

ARROW_ASSIGN_OR_RAISE(auto footer_cookie, arrow_file->ReadAt(file_size - 10, 10));
auto footer_size =
bit_util::FromLittleEndian(util::SafeLoadAs<int32_t>(footer_cookie->data()));
int64_t footer_offset = 8 + file_size - footer_size - 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 8 + is a bit cryptic. I realize it accounts for the 8-byte padded magic at the start of file, but a comment would make this easier to understand.

Or perhaps you can change the checks to something like:

  // The footer is followed by the 10-byte footer length + magic
  const int64_t footer_offset = file_size - footer_size - 10;
  // Make sure the stream ends before the footer
  ARROW_ASSIGN_OR_RAISE(const int64_t stream_size, stream->Tell());
  const int64_t stream_end_offset = stream_size + 8;
  if (stream_end_offset > footer_offset) {

}
ARROW_ASSIGN_OR_RAISE(int64_t stream_size, stream->Tell());

if (footer_offset <= stream_size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why not footer_offset < stream_size? It should be ok if the stream ends just before the footer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants