-
Notifications
You must be signed in to change notification settings - Fork 849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support skip_values in ColumnValueDecoderImpl #2089
Conversation
fn values_left(&self) -> usize { | ||
self.values_left | ||
} | ||
|
||
fn encoding(&self) -> Encoding { | ||
Encoding::DELTA_BINARY_PACKED | ||
} | ||
|
||
fn skip(&mut self, num_values: usize) -> Result<usize> { | ||
let mut buffer = vec![T::T::default(); num_values]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to kind of defeat the purpose but not sure how to skip delta encoded values without it....
@@ -922,6 +974,11 @@ impl<T: DataType> Decoder<T> for DeltaByteArrayDecoder<T> { | |||
fn encoding(&self) -> Encoding { | |||
Encoding::DELTA_BYTE_ARRAY | |||
} | |||
|
|||
fn skip(&mut self, num_values: usize) -> Result<usize> { | |||
let mut buffer = vec![T::T::default(); num_values]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
Codecov Report
@@ Coverage Diff @@
## master #2089 +/- ##
==========================================
+ Coverage 83.70% 83.73% +0.03%
==========================================
Files 224 224
Lines 58958 59391 +433
==========================================
+ Hits 49350 49734 +384
- Misses 9608 9657 +49
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work, only some minor nits
tonic::Response<tonic::codec::Streaming<super::HandshakeResponse>>, | ||
tonic::Status, | ||
> { | ||
tonic::Response<tonic::codec::Streaming<super::HandshakeResponse>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you have an out of date lockfile, which is resulting in these changes??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was it. Thanks!
arrow/Cargo.toml
Outdated
@@ -41,7 +41,7 @@ bench = false | |||
ahash = { version = "0.7", default-features = false } | |||
serde = { version = "1.0", default-features = false } | |||
serde_derive = { version = "1.0", default-features = false } | |||
serde_json = { version = "1.0", default-features = false, features = ["preserve_order"] } | |||
serde_json = { version = "1.0", default-features = false, features = ["preserve_order","std"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be necessary if you update your lockfile
parquet/src/data_type.rs
Outdated
let bit_reader = decoder.bit_reader.as_mut().unwrap(); | ||
let num_values = std::cmp::min(num_values, decoder.num_values); | ||
let mut buffer = vec![false; num_values]; | ||
let values_read = bit_reader.get_batch(&mut buffer[..num_values], 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BitReader::skip?
parquet/src/data_type.rs
Outdated
#[inline] | ||
fn skip(decoder: &mut PlainDecoderDetails, num_values: usize) -> Result<usize> { | ||
let data = decoder.data.as_ref().expect("set_data should have been called"); | ||
let num_values = std::cmp::min(num_values, decoder.num_values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let num_values = std::cmp::min(num_values, decoder.num_values); | |
let num_values = num_values.min(decoder.num_values); |
parquet/src/data_type.rs
Outdated
.data | ||
.as_mut() | ||
.expect("set_data should have been called"); | ||
let num_values = std::cmp::min(num_values, decoder.num_values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let num_values = std::cmp::min(num_values, decoder.num_values); | |
let num_values = num_values.min(decoder.num_values); |
parquet/src/encodings/rle.rs
Outdated
"eee", "fff", "ddd", "eee", "fff", "eee", "fff", | ||
"fff", | ||
]; | ||
let skipped = decoder.skip(4).expect("skipping two values"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let skipped = decoder.skip(4).expect("skipping two values"); | |
let skipped = decoder.skip(4).expect("skipping four values"); |
parquet/src/util/bit_util.rs
Outdated
|
||
let mut values_skipped = 0; | ||
|
||
if num_bits > 32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this special case for skip, this exists in get_batch because we only have unpack32 and not unpack64
154982e
to
39a872f
Compare
parquet/src/encodings/decoding.rs
Outdated
@@ -355,6 +362,8 @@ impl<T: DataType> Decoder<T> for DictDecoder<T> { | |||
rle.get_batch_with_dict(&self.dictionary[..], buffer, num_values) | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: useless space.
@@ -291,6 +294,10 @@ impl<T: DataType> Decoder<T> for PlainDecoder<T> { | |||
fn get(&mut self, buffer: &mut [T::T]) -> Result<usize> { | |||
T::T::decode(buffer, &mut self.inner) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set #[inline]
get better perfomance ?🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because i fond encoding
set_data
all with inline
Nice job! 🥳 |
Co-authored-by: Yang Jiang <[email protected]>
Benchmark runs are scheduled for baseline = 13adaa7 and contender = c585544. c585544 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2078
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?