Writing and Reading Random Access Files #434

Open
okartal opened this issue May 4, 2023 · 3 comments

@okartal
Contributor

okartal commented May 4, 2023

Maybe related to #353

It is already possible to use Tables.partitioner to write record batches to a single Arrow file. However, when I read that file with Arrow.Table, I do not know how to access a specific record batch, as is done here: https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files

According to the docs, this should be possible, but I am not sure whether it is not yet implemented or simply not documented.

@quinnj
Member

quinnj commented May 23, 2023

You're right that we don't expose this very well (i.e., at all) via Arrow.Table right now, but Arrow.Stream gives you back an iterator that yields one Arrow.Table per record batch. We could probably also expose a way via Arrow.Table to get the individual tables. Something to think about; at the very least we should improve the docs to mention Arrow.Stream.
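A minimal sketch of the Arrow.Stream workaround, assuming recent Arrow.jl and Tables.jl releases (the file name, column names, and data are made up for illustration): write three batches with Tables.partitioner, read them back as concatenated columns with Arrow.Table, then walk them one at a time with Arrow.Stream. Picking the k-th batch this way still iterates past the earlier ones, so it is sequential access, not a footer-based seek.

```julia
using Arrow, Tables

# Hypothetical data: three small column tables, one per record batch.
parts = [(id = [1, 2], name = ["a", "b"]),
         (id = [3, 4], name = ["c", "d"]),
         (id = [5, 6], name = ["e", "f"])]

path = joinpath(mktempdir(), "batches.arrow")
# Tables.partitioner marks `parts` as a sequence of partitions, so
# Arrow.write emits one record batch per element.
Arrow.write(path, Tables.partitioner(parts))

# Arrow.Table presents all batches as single, concatenated columns.
tbl = Arrow.Table(path)
println("total rows: ", length(tbl.id))

# Arrow.Stream yields one Arrow.Table per record batch, in file order.
for (i, batch) in enumerate(Arrow.Stream(path))
    println("batch ", i, ": ", collect(batch.id))
end

# Select batch k by skipping the first k - 1 on a fresh Stream.
# The earlier batches are still walked over; nothing is seeked.
k = 2
batch_k = first(Iterators.drop(Arrow.Stream(path), k - 1))
println("batch ", k, " names: ", collect(batch_k.name))
```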

@okartal
Contributor Author

okartal commented May 29, 2023

According to https://arrow.apache.org/docs/python/ipc.html#writing-and-reading-random-access-files, we need a seek method to implement random access to a batch.

@Moelf
Contributor

Moelf commented May 29, 2023

We don't have to do whatever the Python implementation does; that's specific to Python. A batch is a well-defined thing in the file format, independent of which implementation we're talking about. It's purely a logical question of how we get there given the schema/metadata, and what the interface for the user should be.
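For context on what "getting there given the metadata" involves: per the Arrow IPC file format specification, a file starts with the magic bytes ARROW1 and ends with the Footer flatbuffer, a little-endian int32 holding the footer's length, and ARROW1 again. The Footer lists a Block (offset, metadata length, body length) for every record batch, and those offsets are what allow seeking straight to one batch. The sketch below only locates the footer using Base I/O (Arrow.jl is used just to write a test file); the `footer_span` helper is hypothetical, and decoding the flatbuffer into Blocks is not shown.

```julia
using Arrow, Tables

path = joinpath(mktempdir(), "batches.arrow")
Arrow.write(path, Tables.partitioner([(x = [1, 2],), (x = [3, 4],)]))

# Hypothetical helper. File layout per the Arrow spec:
#   "ARROW1" <padding> <stream messages> <Footer> <int32 footer size> "ARROW1"
# Returns the byte offset and length of the Footer flatbuffer.
function footer_span(path::AbstractString)
    open(path, "r") do io
        @assert String(read(io, 6)) == "ARROW1" "missing leading magic"
        seekend(io)
        total = position(io)
        # Trailer: 4-byte little-endian footer length, then 6-byte magic.
        seek(io, total - 10)
        footer_len = Int(ltoh(read(io, Int32)))
        @assert String(read(io, 6)) == "ARROW1" "missing trailing magic"
        return total - 10 - footer_len, footer_len
    end
end

start, len = footer_span(path)
println("footer flatbuffer: ", len, " bytes at offset ", start)
```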


3 participants