Commit
feat: add into_arrow to IntoCanonicalVTable (#1604)
Historically, we've gated the ability to go from Vortex -> Arrow arrays behind the `Canonical` type, which picks one "blessed" Arrow encoding for each of our DTypes. Since the introduction of VarBinView in #1082, there are now two Vortex string encodings that can each be converted directly to Arrow.

What's more, FSSTArray internally uses a `VarBin` array to hold the FSST-compressed strings. Its CompareFn implementation delegates to a comparison against those values. Because the values are `VarBin`, that comparison takes the default `compare` codepath, which calls `into_canonical()?.into_arrow()?` and then runs the comparison in Arrow.

This is now slow, because `VarBin.into_canonical()` iterates over all the strings to build a canonical `VarBinView`. This requires a full decompress, which makes the pushdown pointless.

This PR augments the existing `IntoCanonicalVTable` so that encodings can implement their own `into_arrow()` method. The default continues to call `into_canonical().into_arrow()`, but we implement a fast version for VarBin.
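The pattern described in this message, a default `into_arrow()` that goes through the canonical form plus a per-encoding override, can be sketched roughly as follows. This is a toy model and not the actual Vortex code: `ArrowArray`, the `IntoCanonical` trait, the field layouts, and the `String` error type are all stand-ins invented for illustration. Only the names `VarBin`, `VarBinView`, `Canonical`, `into_canonical`, and `into_arrow` come from the commit message.

```rust
// Toy stand-in for an Arrow array. Hypothetical; not the arrow-rs types.
#[derive(Debug, PartialEq)]
enum ArrowArray {
    // Offsets-plus-bytes layout, in the spirit of an Arrow string array.
    Utf8 { offsets: Vec<i32>, data: Vec<u8> },
    // Stand-in for a view-style string array.
    Utf8View { values: Vec<String> },
}

// Toy encodings. The field layouts are assumptions made for this sketch.
struct VarBin {
    offsets: Vec<i32>,
    data: Vec<u8>,
}

struct VarBinView {
    values: Vec<String>,
}

// The single "blessed" form for strings in this sketch.
struct Canonical(VarBinView);

impl Canonical {
    fn into_arrow(self) -> Result<ArrowArray, String> {
        Ok(ArrowArray::Utf8View { values: self.0.values })
    }
}

// Hypothetical trait standing in for the vtable described above.
trait IntoCanonical: Sized {
    fn into_canonical(self) -> Result<Canonical, String>;

    // Default path: canonicalize first, then convert to Arrow.
    fn into_arrow(self) -> Result<ArrowArray, String> {
        self.into_canonical()?.into_arrow()
    }
}

impl IntoCanonical for VarBinView {
    // Already canonical, so the default into_arrow() is cheap enough.
    fn into_canonical(self) -> Result<Canonical, String> {
        Ok(Canonical(self))
    }
}

impl IntoCanonical for VarBin {
    // Slow path: visits every string to rebuild it in the canonical layout.
    fn into_canonical(self) -> Result<Canonical, String> {
        let mut values = Vec::with_capacity(self.offsets.len().saturating_sub(1));
        for w in self.offsets.windows(2) {
            let bytes = &self.data[w[0] as usize..w[1] as usize];
            let s = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
            values.push(s.to_owned());
        }
        Ok(Canonical(VarBinView { values }))
    }

    // Fast path: hand the existing buffers over without touching each string.
    fn into_arrow(self) -> Result<ArrowArray, String> {
        Ok(ArrowArray::Utf8 { offsets: self.offsets, data: self.data })
    }
}
```

In this sketch, calling `into_arrow()` on a `VarBin` moves its two buffers straight into the Arrow stand-in, while a `VarBinView` falls back to the default and goes through `Canonical`. A caller such as a comparison routine only needs to call `into_arrow()` and does not have to know which path was taken.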
Showing 11 changed files with 47 additions and 32 deletions.