Add dictionary_expressions feature (#4386) #4999
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,7 +24,7 @@ repository = "https://github.com/apache/arrow-datafusion" | |
readme = "README.md" | ||
authors = ["Apache Arrow <[email protected]>"] | ||
license = "Apache-2.0" | ||
-keywords = [ "arrow", "query", "sql" ] | ||
+keywords = ["arrow", "query", "sql"] | ||
edition = "2021" | ||
rust-version = "1.62" | ||
|
||
|
@@ -35,12 +35,15 @@ path = "src/lib.rs" | |
[features] | ||
crypto_expressions = ["md-5", "sha2", "blake2", "blake3"] | ||
default = ["crypto_expressions", "regex_expressions", "unicode_expressions"] | ||
I think we should keep dictionary support as a default, if possible.
I fairly strongly disagree; it is pretty esoteric. As a data point, none of IOx's integration tests require this, and we use dictionaries a LOT 😄 It is important to highlight that this isn't "dictionary support" but non-scalar, binary dictionary kernels, which are pretty unusual in practice.
I am not sure how to quantify how esoteric the feature is or how commonly it is used. Clearly IOx uses it. I was just thinking that this PR changes the default behavior, but maybe that is OK. Perhaps some other committers have thoughts.
That is the point I'm trying to make: IOx doesn't use it, at least not within any of its tests. A user could theoretically construct a query that directly compares dictionary columns, but in practice very few use cases come to mind. This feature was only enabled in #4168; prior to that it was disabled.
🤔 When I didn't enable the dyn dictionary kernels in arrow, IOx tests failed in some past version. We have it enabled here: https://github.com/influxdata/influxdb_iox/blob/6f39ae342e64848bd6555bddbc1d3fa30050f75e/arrow_util/Cargo.toml#L12 |
||
+# Enables support for non-scalar, binary operations on dictionaries | ||
+# Note: this results in significant additional codegen | ||
+dictionary_expressions = ["arrow/dyn_cmp_dict", "arrow/dyn_arith_dict"] | ||
regex_expressions = ["regex"] | ||
unicode_expressions = ["unicode-segmentation"] | ||
|
||
[dependencies] | ||
ahash = { version = "0.8", default-features = false, features = ["runtime-rng"] } | ||
-arrow = { version = "31.0.0", features = ["prettyprint", "dyn_cmp_dict"] } | ||
+arrow = { version = "31.0.0", features = ["prettyprint"] } | ||
arrow-buffer = "31.0.0" | ||
arrow-schema = "31.0.0" | ||
blake2 = { version = "^0.10.2", optional = true } | ||
|
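The review thread above distinguishes scalar operations on dictionaries from the "non-scalar, binary" kernels that the new feature gates. The following is a minimal sketch of that distinction using only the standard library. `DictColumn`, `eq_scalar`, and `eq_column` are hypothetical names invented for illustration; they are not arrow's `DictionaryArray` or its kernels, and arrow's actual implementation may differ.

```rust
/// Illustrative stand-in for a dictionary-encoded string column.
/// (Hypothetical type; NOT arrow's `DictionaryArray`.)
struct DictColumn {
    /// The distinct values stored once.
    values: Vec<String>,
    /// One index into `values` per row.
    keys: Vec<usize>,
}

impl DictColumn {
    /// Scalar comparison: evaluate the predicate once per distinct value,
    /// then map each row's key to the precomputed result.
    fn eq_scalar(&self, scalar: &str) -> Vec<bool> {
        let matches: Vec<bool> = self.values.iter().map(|v| v == scalar).collect();
        self.keys.iter().map(|&k| matches[k]).collect()
    }

    /// Non-scalar, binary comparison: both sides are columns with their own,
    /// independent dictionaries, so keys are resolved to values row by row.
    fn eq_column(&self, other: &DictColumn) -> Vec<bool> {
        assert_eq!(
            self.keys.len(),
            other.keys.len(),
            "columns must have the same number of rows"
        );
        self.keys
            .iter()
            .zip(&other.keys)
            .map(|(&l, &r)| self.values[l] == other.values[r])
            .collect()
    }
}
```

In this sketch, only the column-against-column case (`eq_column`) corresponds to what the thread calls non-scalar, binary dictionary kernels; comparing a dictionary column against a single literal is the scalar case.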
I couldn't see a compelling reason why this test needed to test comparison of dictionary columns
I think the partitioning columns in a ListingTable are dictionary encoded and this test is verifying the encoding. I think we should put the test back (and gate it with a #cfg directive)
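As a rough illustration of the gating suggested above, the sketch below uses Rust's `#[cfg(feature = "...")]` attribute so that code depending on the dictionary kernels is compiled only when the `dictionary_expressions` feature is enabled. The function `partition_column_check`, the module `gated_tests`, and the test name are hypothetical; the actual test removed in this PR is not shown in this excerpt.

```rust
// Compiled only when the crate is built with
// `--features dictionary_expressions`.
#[cfg(feature = "dictionary_expressions")]
fn partition_column_check() -> &'static str {
    "compares dictionary-encoded partition columns"
}

// Fallback used when the feature is disabled, so callers still compile.
#[cfg(not(feature = "dictionary_expressions"))]
fn partition_column_check() -> &'static str {
    "skipped: dictionary_expressions is disabled"
}

// A test module gated on both `test` and the feature: it is compiled out
// entirely unless `cargo test --features dictionary_expressions` is used.
#[cfg(all(test, feature = "dictionary_expressions"))]
mod gated_tests {
    #[test]
    fn partition_columns_are_dictionary_encoded() {
        // Hypothetical body: the real assertions would exercise the
        // dictionary comparison kernels.
        assert_eq!(
            super::partition_column_check(),
            "compares dictionary-encoded partition columns"
        );
    }
}
```

With this pattern, a default build still compiles and runs the rest of the test suite, while the dictionary-specific test only runs when the feature is requested explicitly.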