Add compression support to bijection-avro #174
Merged
This API enhancement is backwards compatible.

We add `withCompression()` factory methods to `SpecificAvroCodecs` and `GenericAvroCodecs`, and also add the following three convenience methods:

- `withBzip2Compression`
- `withDeflateCompression`
- `withSnappyCompression`

(I did not add a `withXzCompression` method as this codec was apparently introduced in Avro 1.7.6, and Bijection is currently still using the older 1.7.5 version.)

Usage examples
Compression in Avro is transparent to readers of the data, which means that there is no change needed on the decoding side of things.
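
To illustrate why the decoding side needs no change, here is a minimal sketch of the general idea: the writer records the codec name in a header, and the reader picks the matching decompressor from that header. This is a toy format built on `java.util.zip`, not Avro's actual container layout, and `ToyContainer` is a hypothetical name used only for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.util.zip.{DeflaterOutputStream, InflaterInputStream}

// Toy container, NOT Avro's real format:
// [codec name (UTF)][payload length (int)][payload bytes]
object ToyContainer {

  // The writer chooses the codec and records its name in the header.
  def write(data: Array[Byte], codec: String): Array[Byte] = {
    val payload = codec match {
      case "null" => data
      case "deflate" =>
        val buf = new ByteArrayOutputStream()
        val out = new DeflaterOutputStream(buf)
        out.write(data)
        out.close() // finishes the deflate stream
        buf.toByteArray
      case other => throw new IllegalArgumentException(s"unknown codec: $other")
    }
    val bytes = new ByteArrayOutputStream()
    val header = new DataOutputStream(bytes)
    header.writeUTF(codec)
    header.writeInt(payload.length)
    header.write(payload)
    header.flush()
    bytes.toByteArray
  }

  // The reader takes no codec argument: it learns the codec from the header.
  def read(container: Array[Byte]): Array[Byte] = {
    val in = new DataInputStream(new ByteArrayInputStream(container))
    val codec = in.readUTF()
    val payload = new Array[Byte](in.readInt())
    in.readFully(payload)
    codec match {
      case "null" => payload
      case "deflate" =>
        val inflater = new InflaterInputStream(new ByteArrayInputStream(payload))
        val out = new ByteArrayOutputStream()
        val chunk = new Array[Byte](4096)
        var n = inflater.read(chunk)
        while (n != -1) {
          out.write(chunk, 0, n)
          n = inflater.read(chunk)
        }
        out.toByteArray
      case other => throw new IllegalArgumentException(s"unknown codec: $other")
    }
  }
}
```

The same `read` call handles both compressed and uncompressed input, which is the property the paragraph above describes for Avro readers.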
No compression support added to `toBinary` methods

Please note that I did not add corresponding compression support to `GenericAvroCodecs.toBinary` and `SpecificAvroCodecs.toBinary` (i.e. the Injection variants that do not embed the Avro schema into the encoded binary data). This is because Avro's API provides compression only at the file container level (i.e. block compression). In other words, without using Avro's `DataFileWriter` class -- which is what Bijection does for `apply` but not for `toBinary` -- we cannot set a compression codec. We could try to work around that limitation, but this would turn `toBinary` into a renamed `apply` method, and it would make the code inconsistent because suddenly `toBinary` would embed the Avro schema into each encoded record (which it does not do in the current code, and which is IMHO the core semantic difference between `toBinary` and `apply`).
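
The difference between the two encoding paths can be sketched directly against Avro's own Java API. This is only an illustration of the limitation, not Bijection's actual implementation: it assumes Avro is on the classpath, and the `ContainerVsBinary` object, the `Msg` schema, and the helper names are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import org.apache.avro.Schema
import org.apache.avro.file.{CodecFactory, DataFileStream, DataFileWriter}
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

object ContainerVsBinary {
  // Hypothetical one-field schema, for illustration only.
  val schema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"Msg","fields":[{"name":"body","type":"string"}]}""")

  def record(body: String): GenericRecord = {
    val r = new GenericData.Record(schema)
    r.put("body", body)
    r
  }

  // Container path: the schema and the codec are written into the file header.
  def viaContainer(r: GenericRecord): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
    writer.setCodec(CodecFactory.deflateCodec(5)) // must be set before create()
    writer.create(schema, out)
    writer.append(r)
    writer.close()
    out.toByteArray
  }

  // Bare binary path: no header, so there is no place to set or record a codec.
  def viaBinaryEncoder(r: GenericRecord): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](schema).write(r, encoder)
    encoder.flush()
    out.toByteArray
  }

  // Reading the container needs neither the schema nor the codec up front.
  def readContainer(bytes: Array[Byte]): GenericRecord = {
    val in = new DataFileStream[GenericRecord](
      new ByteArrayInputStream(bytes), new GenericDatumReader[GenericRecord]())
    try in.next() finally in.close()
  }
}
```

In this sketch only `viaContainer` has a `setCodec` hook, and its output carries the schema with every encoded value, which is the trade-off described above.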