[Feature Request]: AvroSource should use the standard CodecFactory to decompress blocks #23213

Closed
steveniemitz opened this issue Sep 13, 2022 · 2 comments · Fixed by #23214

Comments

@steveniemitz
Contributor

What would you like to happen?

Currently, AvroSource has its own hand-rolled Avro parser. Given that the class is over 6 years old, I assume it was written before Avro's DataFileReader and related classes had good support for partially reading an Avro container file.

One downside is that it doesn't share the CodecFactory infrastructure that Avro proper uses, so users can't plug in custom CodecFactory instances using CodecFactory.addCodec.
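
For context, here is a minimal sketch of what plugging in a custom codec looks like in plain Avro (Java). The `passthrough` codec name and the `PassThroughCodec*` classes are hypothetical and exist only for illustration. The codec does no real compression, and nothing here reflects how AvroSource currently behaves.

```java
import java.nio.ByteBuffer;
import org.apache.avro.file.Codec;
import org.apache.avro.file.CodecFactory;

/** Hypothetical example: registers a do-nothing codec with Avro's codec registry. */
public class PassThroughCodecExample {

  // Hypothetical codec name; a writer using this codec stores the name in the file header.
  static final String CODEC_NAME = "passthrough";

  /** A codec that returns the block bytes unchanged (illustration only). */
  static final class PassThroughCodec extends Codec {
    @Override
    public String getName() {
      return CODEC_NAME;
    }

    @Override
    public ByteBuffer compress(ByteBuffer uncompressedData) {
      return uncompressedData;
    }

    @Override
    public ByteBuffer decompress(ByteBuffer compressedData) {
      return compressedData;
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof PassThroughCodec;
    }

    @Override
    public int hashCode() {
      return CODEC_NAME.hashCode();
    }
  }

  /** Factory that Avro calls whenever it needs a codec instance. */
  static final class PassThroughCodecFactory extends CodecFactory {
    @Override
    protected Codec createInstance() {
      return new PassThroughCodec();
    }
  }

  /** Makes the codec resolvable by name through the shared registry. */
  public static void register() {
    CodecFactory.addCodec(CODEC_NAME, new PassThroughCodecFactory());
  }

  public static void main(String[] args) {
    register();
    // Code that resolves codecs by name goes through the same lookup.
    CodecFactory factory = CodecFactory.fromString(CODEC_NAME);
    System.out.println("Resolved codec factory: " + factory);
  }
}
```

A reader that resolves codecs through `CodecFactory.fromString` would pick up this registration; a hand-rolled parser with its own hard-coded list of codecs would not.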

It should be possible to refactor AvroSource to use DataFileReader rather than its own handmade parser. Doing so would automatically give it the benefit of the normal CodecFactory infrastructure.

Issue Priority

Priority: 3

Issue Component

Component: io-java-avro

@lukecwik
Member

From what I vaguely remember, the APIs that existed at the time weren't able to support the splitting protocol that was necessary. As a quick sanity check, you could compare the DataFileReader interface at the time to the current one to see whether there are new APIs for seeking to specific offsets and for knowing what offset we are at.
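
To make the offset-related calls concrete, here is a rough sketch using the current Avro Java `DataFileReader`: `sync(long)` seeks to the first block boundary after an offset, and `pastSync(long)` reports whether the reader has moved beyond an offset. The class name `SplitReadSketch`, the `Row` schema, and the helper methods are made up for illustration. This is not Beam's splitting protocol, just the plain byte-range read pattern.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

/** Hypothetical sketch: reading one byte range of an Avro container file. */
public class SplitReadSketch {

  // Illustrative schema with a single long field.
  static final Schema SCHEMA =
      SchemaBuilder.record("Row").fields().requiredLong("id").endRecord();

  /** Writes n records, forcing a block boundary every 100 records. */
  static void write(File file, int n) throws IOException {
    try (DataFileWriter<GenericRecord> writer =
        new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
      writer.create(SCHEMA, file);
      for (long i = 0; i < n; i++) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("id", i);
        writer.append(record);
        if (i % 100 == 99) {
          writer.sync(); // end the current block so the file has several blocks
        }
      }
    }
  }

  /** Reads the records of every block that begins in the byte range [start, end). */
  static List<GenericRecord> readRange(File file, long start, long end) throws IOException {
    List<GenericRecord> records = new ArrayList<>();
    try (DataFileReader<GenericRecord> reader =
        new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
      reader.sync(start); // skip ahead to the first block boundary after 'start'
      while (reader.hasNext() && !reader.pastSync(end)) {
        records.add(reader.next());
      }
    }
    return records;
  }

  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("split-sketch", ".avro");
    file.deleteOnExit();
    write(file, 1000);

    // Two adjacent ranges; each block should be read by exactly one of them.
    long mid = file.length() / 2;
    int first = readRange(file, 0, mid).size();
    int second = readRange(file, mid, file.length()).size();
    System.out.println(first + " + " + second + " records");
  }
}
```

`DataFileReader` also exposes `tell()` and `previousSync()` for asking where the reader currently is, which is the other half of what a splittable source needs.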

@steveniemitz
Contributor Author

Yeah, that was what I figured. All the tests pass using DataFileReader, so that's cool. I'm going to run a bigger test with our own internal test suite to make sure it works in the real world too, but the test coverage seems pretty complete.
