Writing and Reading Random Access Files #434

okartal · 2023-05-04T08:18:44Z

Maybe related to #353

It is already possible to use Tables.partitioner to write record batches to a single Arrow file. However, when I read that file with Arrow.Table, I do not know how to access a specific record batch, as shown here: https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files

According to the docs, this should be possible, but I am not sure whether it is not implemented yet or simply not documented.

quinnj · 2023-05-23T00:06:27Z

You're right that we don't expose this very well (i.e., at all) via Arrow.Table right now, but Arrow.Stream gives you back an iterator that yields an Arrow.Table for each record batch. We could probably also expose a way via Arrow.Table to let you get the individual tables. Something to think about, or at least we should improve the docs by mentioning Arrow.Stream.
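For illustration, here is a minimal sketch of the Arrow.Stream approach described above. The column names, the sample data, and the temporary file path are made up for the example. Note that this walks the batches in order and then indexes the collected result, so it is not true random access into the file.

```julia
using Arrow, Tables

# Hypothetical sample data: three partitions, each written as its own record batch.
parts = Tables.partitioner([
    (a = [1, 2], b = ["x", "y"]),
    (a = [3, 4], b = ["z", "w"]),
    (a = [5, 6], b = ["u", "v"]),
])

# Write all partitions to a single Arrow file in a temporary directory.
path = joinpath(mktempdir(), "batches.arrow")
Arrow.write(path, parts)

# Arrow.Stream iterates the file and yields one Arrow.Table per record batch.
batches = collect(Arrow.Stream(path))

# Pick out an individual batch by position after collecting.
second = batches[2]
println(length(batches), " batches; second batch column a = ", collect(second.a))
```

Collecting the stream materializes a table object for every batch, which is fine for a handful of batches. For large files, iterating with a `for` loop and stopping at the batch you need avoids holding references to all of them.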

okartal · 2023-05-29T21:51:11Z

According to https://arrow.apache.org/docs/python/ipc.html#writing-and-reading-random-access-files, we need to use a seek method to implement random access to a batch.

Moelf · 2023-05-29T21:55:21Z

We don't have to do what the Python implementation says; that's specific to Python. A batch is a well-defined thing in the file format, independent of which implementation we're talking about. It's purely a question of how we get there given the schema/metadata, and what the interface for the user should be.
