simplify interactions with arrow flight APIs (#377)
* simplify interactions with arrow flight APIs

Initial work to implement some basic traits

* more polishing and introduction of a couple of wrapper types

Some more polishing of the basic code I provided last week.

* More polishing

Add support for representing tickets as base64 encoded strings.

Also: more polishing of Display, etc...
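Here is a rough, self-contained illustration of rendering a ticket's opaque bytes as base64 through Display. `TicketSketch` and `base64_encode` are hypothetical names and are not the arrow-flight API. The commit adds the `base64` crate (0.13) as a dependency, but this sketch hand-rolls standard RFC 4648 encoding so that it needs only std.

```rust
use std::fmt;

/// Hypothetical stand-in for a Flight ticket: an opaque byte payload.
/// (Illustration only; not the generated arrow-flight `Ticket` type.)
pub struct TicketSketch {
    pub ticket: Vec<u8>,
}

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard (RFC 4648) base64 encoding with '=' padding.
pub fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three input bytes into one 24-bit group.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let group = (b0 << 16) | (b1 << 8) | b2;
        // Emit four 6-bit symbols, padding where input bytes were missing.
        out.push(ALPHABET[((group >> 18) & 0x3f) as usize] as char);
        out.push(ALPHABET[((group >> 12) & 0x3f) as usize] as char);
        if chunk.len() > 1 {
            out.push(ALPHABET[((group >> 6) & 0x3f) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(group & 0x3f) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

impl fmt::Display for TicketSketch {
    // Show the opaque ticket bytes as a printable base64 string.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Ticket {{ ticket: {} }}", base64_encode(&self.ticket))
    }
}
```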

* improve BOOLEAN writing logic and report errors on encoding failure

When writing BOOLEAN data, more than 2048 rows will overflow the
hard-coded 256-byte buffer set for the bit-writer in the PlainEncoder
(256 bytes × 8 bits = 2048 values). Once this occurs, further attempts
to write to the encoder fail because capacity is exceeded, but the
errors are silently ignored.

This fix improves the error detection and reporting at the point of
encoding and modifies the logic for bit_writing (BOOLEANS). The
bit_writer is initially allocated 256 bytes (as at present), then each
time the capacity is exceeded the capacity is incremented by another
256 bytes.
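Below is a simplified model of the failure and the fix. It assumes a bit writer whose write call reports success as a bool, which is the kind of result that was previously being ignored. `BitWriterSketch` and `put_bools` are hypothetical stand-ins, not the parquet crate's types. The writer starts at 256 bytes, grows by another 256 bytes each time a write fails, and returns an error rather than dropping data.

```rust
/// Hypothetical, simplified bit writer with a fixed byte capacity.
pub struct BitWriterSketch {
    buffer: Vec<u8>,   // backing storage; its length is the capacity in bytes
    bit_offset: usize, // number of bits written so far
}

impl BitWriterSketch {
    pub fn new(capacity: usize) -> Self {
        Self { buffer: vec![0; capacity], bit_offset: 0 }
    }

    pub fn capacity(&self) -> usize {
        self.buffer.len()
    }

    /// Bytes touched so far (a partially filled byte counts as written).
    pub fn bytes_written(&self) -> usize {
        (self.bit_offset + 7) / 8
    }

    /// Append `additional` zeroed bytes of capacity.
    pub fn extend(&mut self, additional: usize) {
        self.buffer.resize(self.buffer.len() + additional, 0);
    }

    /// Write one bit (LSB-first within each byte); returns false when full.
    pub fn put_bit(&mut self, bit: bool) -> bool {
        if self.bit_offset >= self.buffer.len() * 8 {
            return false;
        }
        if bit {
            self.buffer[self.bit_offset / 8] |= 1 << (self.bit_offset % 8);
        }
        self.bit_offset += 1;
        true
    }
}

const GROWTH: usize = 256; // bytes added each time capacity runs out

/// Encode booleans, growing capacity in 256-byte steps and surfacing
/// write failures as errors instead of silently ignoring them.
pub fn put_bools(writer: &mut BitWriterSketch, values: &[bool]) -> Result<(), String> {
    for &v in values {
        if !writer.put_bit(v) {
            // Capacity exceeded: grow by another 256 bytes and retry once.
            writer.extend(GROWTH);
            if !writer.put_bit(v) {
                return Err("unable to write boolean value to bit writer".to_string());
            }
        }
    }
    Ok(())
}
```

In this sketch, exactly 2048 values fit in the initial 256 bytes. Writing 5000 values ends with a capacity of 768 bytes after two growth steps, and bytes_written() reports 625.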

This certainly resolves the current problem, but it's not exactly a
great fix because the capacity of the bit_writer could now grow
substantially.

Other data types seem to have a more sophisticated mechanism for writing
data which doesn't involve growing or having a fixed size buffer. It
would be desirable to make the BOOLEAN type use this same mechanism if
possible, but that level of change is more intrusive and probably
requires greater knowledge of the implementation than I possess.

resolves: #349

* only manipulate the bit_writer for BOOLEAN data

Tacky, but I can't think of a better way to do this without
specialization.

* better isolation of changes

Remove the byte tracking from the PlainEncoder and use the existing
bytes_written() method in BitWriter.

This is neater.

* add test for boolean writer

The test ensures that we can write > 2048 rows to a parquet file and
that when we read the data back, it finishes without hanging (defined as
taking < 5 seconds).

If we don't want that extra complexity, we could remove the
thread/channel stuff and just try to read the file and let the test
runner terminate hanging tests.
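The thread/channel approach to detecting a hang needs only the standard library: run the read on a spawned thread and wait on a channel with `recv_timeout`. `run_with_timeout` is a hypothetical helper, not the actual test code from this commit. A thread that really hangs is not killed when the timeout fires. It is simply abandoned.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: run `work` on a background thread and wait at most
/// `timeout` for its result. Returns `None` if the work did not finish in
/// time (or panicked before sending), which a test can treat as a hang.
pub fn run_with_timeout<T, F>(work: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver has already timed out, the send error is ignored.
        let _ = tx.send(work());
    });
    // A hung worker thread is not stopped; it is left running in the background.
    rx.recv_timeout(timeout).ok()
}
```

A read-back test could then assert that wrapping its read in `run_with_timeout(..., Duration::from_secs(5))` returns `Some`.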

* fix capacity calculation error in bool encoding

The values.len() reports the number of values to be encoded, so it must
be divided by 8 (bits in a byte) to determine the effect on the byte
capacity of the bit_writer.
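Here is the arithmetic in concrete form. n booleans occupy ceil(n / 8) bytes, so 2048 values fit exactly in the initial 256 bytes and the 2049th value needs a 257th byte. The helpers below are hypothetical. They round up instead of truncating, which makes them a conservative variant of the values.len() / 8 calculation described above.

```rust
const GROWTH_BYTES: usize = 256; // hypothetical growth step, as in the fix

/// Bytes needed to bit-pack `num_values` booleans (rounded up).
pub fn bytes_for_bools(num_values: usize) -> usize {
    (num_values + 7) / 8
}

/// Capacity required after writing `num_values` more booleans on top of
/// `bytes_written`, growing `capacity` in whole 256-byte steps.
pub fn required_capacity(bytes_written: usize, capacity: usize, num_values: usize) -> usize {
    let needed = bytes_written + bytes_for_bools(num_values);
    let mut cap = capacity;
    while cap < needed {
        cap += GROWTH_BYTES;
    }
    cap
}
```

If the value count were added to the byte count without the division, the writer would over-allocate by roughly a factor of 8.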

* make BasicAuth accessible

Following merge with master, make sure this is exposed so that
integration tests work.

also: there has been a release since I last looked at this so update the
deprecation warnings.

* fix documentation for ipc_message_from_arrow_schema

TryFrom, not From

* replace deprecated functions in integration tests with traits

clippy complains about using deprecated functions, so replace them with
the new trait support.

also: fix the trait documentation
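The following sketch shows the general pattern of replacing a deprecated free function with a trait. `SchemaSketch`, `IpcMessageSketch` and `ipc_message_from_schema_sketch` are hypothetical, and the "encoding" is a toy. What matters is the structure. A fallible conversion implements TryFrom (not From). The old function stays available, is marked #[deprecated], and delegates to the trait, so existing callers still compile but receive a warning.

```rust
use std::convert::TryFrom;

/// Hypothetical schema and encoded-message types, for illustration only.
pub struct SchemaSketch {
    pub field_names: Vec<String>,
}
pub struct IpcMessageSketch(pub Vec<u8>);

impl TryFrom<&SchemaSketch> for IpcMessageSketch {
    type Error = String;

    // Encoding can fail, so TryFrom (not From) is the appropriate trait.
    fn try_from(schema: &SchemaSketch) -> Result<Self, Self::Error> {
        if schema.field_names.is_empty() {
            return Err("schema has no fields".to_string());
        }
        // Toy "encoding": newline-joined field names as bytes.
        Ok(IpcMessageSketch(schema.field_names.join("\n").into_bytes()))
    }
}

/// Old free-function entry point, kept for compatibility; it just delegates.
#[deprecated(note = "use IpcMessageSketch::try_from instead")]
pub fn ipc_message_from_schema_sketch(schema: &SchemaSketch) -> Result<IpcMessageSketch, String> {
    IpcMessageSketch::try_from(schema)
}
```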

* address review comments

 - update deprecated warnings
 - improve TryFrom for DescriptorType
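For the DescriptorType review item, a TryFrom<i32> conversion rejects out-of-range wire values explicitly instead of mapping them silently. The enum below is a hypothetical mirror of the Flight.proto values (UNKNOWN = 0, PATH = 1, CMD = 2). It is not the generated arrow-flight type, and the String error type is an assumption.

```rust
use std::convert::TryFrom;

/// Hypothetical mirror of Flight's descriptor type values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DescriptorTypeSketch {
    Unknown = 0,
    Path = 1,
    Cmd = 2,
}

impl TryFrom<i32> for DescriptorTypeSketch {
    type Error = String;

    // Reject values outside the known range rather than guessing.
    fn try_from(value: i32) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Self::Unknown),
            1 => Ok(Self::Path),
            2 => Ok(Self::Cmd),
            other => Err(format!("invalid DescriptorType value: {}", other)),
        }
    }
}
```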
garyanaplan authored Jul 5, 2021


This commit was created on GitHub.com and signed with GitHub’s verified signature.
1 parent 83ad35c commit 21d69ca
Showing 5 changed files with 484 additions and 114 deletions.
arrow-flight/Cargo.toml: 1 addition, 0 deletions
@@ -27,6 +27,7 @@ license = "Apache-2.0"

[dependencies]
arrow = { path = "../arrow", version = "5.0.0-SNAPSHOT" }
+base64 = "0.13"
tonic = "0.4"
bytes = "1"
prost = "0.7"