Read Avro Data File #1063

Closed
potter420 opened this issue Jul 29, 2021 · 5 comments

Comments

@potter420
Contributor

Describe your feature request

Polars or py-polars could provide a function to read Avro data files.

I wrote a small package that reads Avro into the Arrow struct array format, based on the arrow2 and avro-rs crates. It also has some Python bindings. It only deals with flat data for now, but should be able to handle any kind of nested data soon.
I'm planning to release it in the next few days.

Do you see any possibility of integrating it with Polars?

Thank you.

@jorgecarleitao
Collaborator

Hey! Developer of arrow2 here. Super exciting! Do you have a link or a public repo for this?

@potter420
Contributor Author

I'm organizing my code to publish it soon. My private repo has rusoto as a dependency to read from S3, so I will remove that.
I have to say I'm pleased with the performance: it is 5x faster than the equivalent code in Python (which may be poorly written), with concurrency out of the box via rayon. I'm cherry-picking the commit in which you added all the FFI types, to achieve maximum compatibility with pyarrow.

@potter420
Contributor Author

@jorgecarleitao Here is my repo; feel free to use or reuse any part of the code.
https://github.com/potter420/arrow_avro_rs

Thanks for creating arrow2. I'm having a blast with it.

@ritchie46
Member

Nice @potter420. This functionality would definitely be valuable behind a feature flag.

@ghuls
Collaborator

ghuls commented Feb 18, 2022

Reading and writing Avro files is supported now.

@ghuls ghuls closed this as completed Feb 18, 2022