feat(extensions/nanoarrow_ipc): Add single-threaded stream reader #164

Merged: 23 commits, Mar 22, 2023

paleolimbot (Member) commented Mar 20, 2023

Higher-level runtimes may be able to use the ArrowIpcDecoder (or more than one) and handle IO and parallelization with tools that are difficult to provide from C; for testing, however, we need to be able to read streams in their entirety. This PR provides a tool that does so from an arbitrary source of bytes.

This reduces the overhead of coordinating the various steps required to decode a stream to:

struct ArrowIpcInputStream input;
ArrowIpcInputStreamInit<Buffer|File|Custom Implementation>(&input, ...);

struct ArrowArrayStream stream;
ArrowIpcArrayStreamReaderInit(&stream, &input, NULL);

struct ArrowSchema schema;
stream.get_schema(&stream, &schema);

struct ArrowArray array;
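while (1) {
  // Stop on error: array is not guaranteed to be initialized in that case
  if (stream.get_next(&stream, &array) != NANOARROW_OK) {
    break;
  }

  // A NULL release callback marks the end of the stream
  if (array.release != NULL) {
    array.release(&array);
  } else {
    break;
  }
}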

schema.release(&schema);
stream.release(&stream);

There is also a utility to read an entire stream from file or stdin:

$ ./dump_stream big.arrows 
Read Schema <0.000122 seconds>
struct
  state: string
  geometry: geoarrow.polygon{list}
    rings: list
      vertices: geoarrow.point{fixed_size_list(2)}
        xy: double
Read 23548499 rows in 1 batch(es) <1.623532 seconds>
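
The row and batch totals in that output come from draining the stream. The following is a minimal sketch of such a loop, not the actual dump_stream source. To keep it compilable without nanoarrow, it copies the struct definitions from the Arrow C data and stream interface specifications and uses a hypothetical ToyStreamInit() producer in place of the IPC reader; DrainStream() and the Toy* names are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Struct layouts as given in the Arrow C data/stream interface specification
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};

// Hypothetical producer: yields a fixed number of 100-row batches with no buffers
struct ToyState { int remaining; };

static void ToyArrayRelease(struct ArrowArray* array) { array->release = NULL; }

static int ToyGetNext(struct ArrowArrayStream* stream, struct ArrowArray* out) {
  struct ToyState* state = (struct ToyState*)stream->private_data;
  memset(out, 0, sizeof(*out));  // release == NULL signals end of stream
  if (state->remaining > 0) {
    state->remaining--;
    out->length = 100;
    out->release = &ToyArrayRelease;
  }
  return 0;
}

static void ToyStreamRelease(struct ArrowArrayStream* stream) {
  free(stream->private_data);
  stream->release = NULL;
}

static int ToyStreamInit(struct ArrowArrayStream* stream, int n_batches) {
  struct ToyState* state = (struct ToyState*)malloc(sizeof(*state));
  if (state == NULL) return ENOMEM;
  state->remaining = n_batches;
  memset(stream, 0, sizeof(*stream));  // get_schema/get_last_error left unset here
  stream->get_next = &ToyGetNext;
  stream->release = &ToyStreamRelease;
  stream->private_data = state;
  return 0;
}

// Pull batches until the stream is exhausted, accumulating rows and batches.
// Returns 0 on success or the first non-zero code reported by get_next().
static int DrainStream(struct ArrowArrayStream* stream, int64_t* n_rows,
                       int64_t* n_batches) {
  assert(stream != NULL && stream->get_next != NULL);
  struct ArrowArray array;
  *n_rows = 0;
  *n_batches = 0;
  while (1) {
    int code = stream->get_next(stream, &array);
    if (code != 0) return code;            // array may be uninitialized on error
    if (array.release == NULL) return 0;   // end of stream
    *n_rows += array.length;
    (*n_batches)++;
    array.release(&array);
  }
}
```

Because DrainStream() only touches the ArrowArrayStream callbacks, the same loop should work unchanged on a stream initialized by ArrowIpcArrayStreamReaderInit().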

paleolimbot marked this pull request as ready for review March 22, 2023 18:57
paleolimbot requested a review from lidavidm March 22, 2023 19:03
paleolimbot merged commit 62cffa6 into apache:main Mar 22, 2023
paleolimbot deleted the ipc-reader branch March 22, 2023 20:06