Rework ipc_compression feature flags #1

Draft
wants to merge 129 commits into
base: flight_data_compression
3505afa
Increase test coverage of ArrowWriter (#2220)
tustvold Jul 29, 2022
588f408
parquet: export json api with `serde_json` feature name (#2209)
flisky Jul 29, 2022
393f006
Update instructions on how to join the Slack channel (#2219)
HaoYang670 Jul 29, 2022
41d96b2
Add Builder style config objects for object_store (#2204)
alamb Jul 29, 2022
561b14c
Add append_option support to decimal builders (#2225)
Jul 29, 2022
bedeb4f
Only trigger `arrow` CI on changes to arrow (#2227)
alamb Jul 29, 2022
d727618
Rename DataType::Decimal to DataType::Decimal128 (#2229)
viirya Jul 30, 2022
ca43719
Disable value validation for decimal256 case (#2232)
viirya Jul 30, 2022
f41fb1c
Move `FixedSizeBinaryArray` to `array_fixed_size_binary.rs` (#2218)
HaoYang670 Jul 30, 2022
281cd79
Fix max and min value for decimal256 (#2245)
viirya Jul 31, 2022
6c3f9a2
Add LimitStore (#2175) (#2242)
tustvold Jul 31, 2022
99ad915
Automatically grow parquet BitWriter (#2226) (~10% faster) (#2231)
tustvold Jul 31, 2022
3032a52
Make `Schema::fields` and `Schema::metadata` `pub` (#2239)
alamb Jul 31, 2022
b879977
move fixed size list to a seperate file (#2250)
HaoYang670 Aug 1, 2022
42b15a8
Add tests for nested decimal arrays (#2254)
tustvold Aug 1, 2022
2c09ba4
Optimized writing of byte array to parquet (#1764) (2x faster) (#2221)
tustvold Aug 1, 2022
d4f038a
Update `IntervalMonthDayNanoType::make_value()` to conform to specifi…
Aug 1, 2022
b4fa47d
Use initial capacity for interner (#2272)
Dandandan Aug 1, 2022
cd45ecb
Update prost and tonic related crates (#2268)
carols10cents Aug 2, 2022
bde749e
Impl FromIterator for Decimal256Array (#2247)
viirya Aug 2, 2022
58dc611
Fix bugs in the `from_list` function. (#2277)
HaoYang670 Aug 2, 2022
ed9fc56
fix: use signed comparator to compare decimal128 and decimal256 (#2275)
liukun4515 Aug 2, 2022
9a4b1c9
feat: Implement string cast operations for Time32 and Time64 (#2251)
stuartcarnie Aug 2, 2022
4222f5a
Remove fallibility from RLEEncoder (#2226) (#2259)
tustvold Aug 2, 2022
ad65e88
Handle symlinks in LocalFileSystem (#2206) (#2269)
tustvold Aug 2, 2022
6bb4b5e
[Minor] Add tests for temporal cast error paths (#2283)
alamb Aug 2, 2022
3e17891
Improve `object_store crate` documentation (#2260)
alamb Aug 2, 2022
6b2c757
Fix fmt + Mac CI jobs (#2287)
alamb Aug 2, 2022
1f9973c
Separate ArrayReader::next_batch with read_records and consume_batch …
Ted-Jiang Aug 3, 2022
ec83638
Reduce duplication and bounds checks in cast kernels (#2284)
alamb Aug 3, 2022
577a93b
Fix Coverage and Windows builds by installing protoc (#2280)
alamb Aug 3, 2022
b826162
Improve Schema metadata mismatch error (#2238)
alamb Aug 3, 2022
299908e
Retry GCP requests on server error (#2243)
tustvold Aug 3, 2022
1cc8563
Replace the `fn get_data_type` by `const DATA_TYPE` in BinaryArray an…
HaoYang670 Aug 3, 2022
22185fd
Improve types shown in cast error messages (#2295)
alamb Aug 3, 2022
f78d2e6
Fix escaped like wildcards in `like_utf8` / `nlike_utf8` kernels (#2258)
daniel-martinez-maqueda-sap Aug 3, 2022
2cf4cd8
More docs (#2305)
tustvold Aug 3, 2022
8a092e3
Add unpack8, unpack16, unpack64 (#2276) ~10-50% faster (#2278)
tustvold Aug 3, 2022
f40403f
Remove test_utils from default features (#2298) (#2299)
tustvold Aug 3, 2022
e835853
Move with_precision_and_scale to trait (#2292)
viirya Aug 3, 2022
d56d88e
fix: IPC writer should truncate string array with all empty string (#…
JasonLi-cn Aug 4, 2022
4b15b7e
Speedup take_bits (#2307)
Dandandan Aug 4, 2022
d87f6a4
Make FFI support optional, change APIs to be `safe` (#2302) (#2303)
tustvold Aug 4, 2022
5166a08
CI: Only run coverage jobs on master (#2214)
alamb Aug 4, 2022
2683b06
Remove JsonEqual (#2317)
viirya Aug 5, 2022
0af81e8
Pass pull `Request<FlightDescriptor>` to `FlightSqlService` `impl`s …
Aug 5, 2022
297a8fa
Increase default DeltaBitPackEncoder block size (#2282) (#2319)
tustvold Aug 5, 2022
8e30d06
[Minor] Improve arrow and parquet READMEs, document parquet feature f…
alamb Aug 5, 2022
b6eaf22
fix: Fix skip error in calculate_row_count. (#2329)
Ted-Jiang Aug 5, 2022
4a3919b
Add typed dictionary (#2136) (#2297)
tustvold Aug 5, 2022
6859efa
Misc cleanup (#2330)
tustvold Aug 5, 2022
b8fd432
Don't hydrate string dictionaries when writing to parquet (#1764) (#2…
tustvold Aug 5, 2022
87b19f8
MINOR: remove unused comment_bot.yml from CI scripts (#2334)
alamb Aug 5, 2022
50d1e5f
MINOR: make capitalization of CI jobs consistent (#2333)
alamb Aug 5, 2022
3ed0e28
temporal conversion functions should work on negative input properly …
viirya Aug 5, 2022
b1e2bd9
more const evaluations for list array (#2327)
HaoYang670 Aug 5, 2022
9fde389
Update version to `20.0.0` and update `CHANGELOG` (#2323)
alamb Aug 5, 2022
83fab37
Remove vestigal ` object_store/.circleci/` (#2337)
alamb Aug 5, 2022
30c94db
Fix cargo publish (#2340)
alamb Aug 5, 2022
38764c2
Make skip_records in complex_object_array can skip cross row groups (…
Ted-Jiang Aug 6, 2022
4287c0f
Rework ipc_compression feature flags and plumb through errors
alamb Aug 7, 2022
45a5389
fixup flags in reader
alamb Aug 7, 2022
3bd610d
Make stub interface
alamb Aug 7, 2022
5676c6e
Integrate Record Skipping into Column Reader Fuzz Test (#2315)
Ted-Jiang Aug 7, 2022
a2de363
Fix Copy from percent-encoded path (#2353) (#2354)
tustvold Aug 7, 2022
5fae299
Add API to change timezone for timestamp array (#2347)
viirya Aug 7, 2022
f3baeaa
Make ring optional dependency and cleanup tests (#2344)
tustvold Aug 8, 2022
ce2bd1e
Combine multiple selections into the same batch size in skip_records …
Ted-Jiang Aug 8, 2022
0c828a9
Relax path validation (#2355) (#2356)
tustvold Aug 8, 2022
9a630a1
Add ObjectStore::get_ranges (#2293) (#2336)
tustvold Aug 8, 2022
f7b1803
Compiles without ipc_compression support
alamb Aug 8, 2022
201de6e
Fix tests
alamb Aug 8, 2022
0c296ab
refactor: Group metrics into page and column metrics structs (#2363)
Aug 8, 2022
c32f6e1
Clean up writing
alamb Aug 8, 2022
0b407e8
use uniform flag syntax
alamb Aug 8, 2022
37504da
fix flags
alamb Aug 8, 2022
e2456f5
Rename for clarity
alamb Aug 8, 2022
5ab5afd
fix compilation
alamb Aug 8, 2022
e5d9747
Add ipc_compression tests to IC
alamb Aug 8, 2022
3f5ab6b
Remove get_byte_ranges where bound (#2366)
tustvold Aug 8, 2022
8ab3d39
fix: clippy
alamb Aug 8, 2022
3b7f94a
Merge remote-tracking branch 'apache/master' into alamb/help_feature_…
alamb Aug 8, 2022
21eb68d
merge-confligts
alamb Aug 8, 2022
443d7fb
Add note in doc
alamb Aug 8, 2022
beaef5c
Fix object_store lint (#2367)
tustvold Aug 8, 2022
3264d7d
Remove deprecated ParquetWriter (#2380)
tustvold Aug 8, 2022
613b99d
Remove sliceable cursor (#2378)
tustvold Aug 8, 2022
80a6ef7
Fix parquet clippy lints (#1254) (#2377)
tustvold Aug 8, 2022
56f7904
refactor: Make read_num_bytes a function instead of a macro (#2364)
Aug 9, 2022
77c814c
Rewrite `Decimal` and `DecimalArray` using `const_generic` (#2383)
HaoYang670 Aug 9, 2022
b55e3b1
Canonicalize filesystem paths in user-facing APIs (#2370) (#2371)
tustvold Aug 9, 2022
31560a3
Clean up (#2389)
viirya Aug 9, 2022
630506e
Cast between `Decimal128` and `Decimal256` arrays (#2376)
viirya Aug 10, 2022
195d9c5
object_store: Update version to `0.4.0`, initial release scripts, CHA…
alamb Aug 10, 2022
d4ad4b7
Clean the code in `field.rs` and add more tests (#2345)
HaoYang670 Aug 10, 2022
53dd5aa
Implement AsyncFileReader for `Box<dyn AsyncFileReader>` (#2368)
tustvold Aug 10, 2022
27f4762
Remove dead code in verify_release_candidate (#2398)
alamb Aug 10, 2022
c275c5e
Fix DoPutUpdateResult (#2404)
Aug 11, 2022
6b1369e
Tweak object_store changelog (#2400)
tustvold Aug 11, 2022
6948373
fix: Don't instantiate the scalar composition code quadratically for …
Aug 11, 2022
4c0380c
Use correct tags when generating changelogs, fix release tarball typo…
tustvold Aug 11, 2022
b6b1ffd
Add comments to changelog generator script (#2412)
alamb Aug 11, 2022
5127490
Decouple parquet fuzz tests from converter (#1661) (#2386)
tustvold Aug 11, 2022
a90daee
add test for reading decimal value from primitive array reader (#2411)
liukun4515 Aug 11, 2022
4481993
Fix clippy lints (#2414) (#2415)
tustvold Aug 11, 2022
21ba02e
Add Parquet RowFilter API (#2335)
tustvold Aug 11, 2022
43c6eb7
Improve markdown format CI check to show diff (#2399)
alamb Aug 11, 2022
961cd2a
Upgrade ahash to 0.8 (#2410)
Dandandan Aug 11, 2022
b30d363
Fix #2416 Automatic version updates for github actions with dependabo…
iemejia Aug 11, 2022
e7dcfbc
Bump actions/checkout from 2 to 3 (#2421)
dependabot[bot] Aug 11, 2022
b235173
Speed up `Decimal256` validation based on bytes comparison and add be…
liukun4515 Aug 12, 2022
f06c0f9
Bump actions/labeler from 2.2.0 to 4.0.0 (#2420)
dependabot[bot] Aug 12, 2022
6919606
Bump actions/setup-python from 1 to 4 (#2419)
dependabot[bot] Aug 12, 2022
42e1068
Bump actions/setup-node from 2 to 3 (#2418)
dependabot[bot] Aug 12, 2022
d11b388
Implement Skip for DeltaBitPackDecoder (#2393)
Ted-Jiang Aug 12, 2022
ee2818e
Support peek_next_page and skip_next_page in InMemoryPageReader (#2407)
Ted-Jiang Aug 12, 2022
0e97491
refine validation for decimal128 array (#2428)
liukun4515 Aug 12, 2022
0c3c686
Make the API of `fn Decimal:new` be consistent with `fn Decimal:try_n…
HaoYang670 Aug 12, 2022
f60841c
fix (#2432)
HaoYang670 Aug 12, 2022
3db3d54
Remove redundant dev dependencies
alamb Aug 13, 2022
bef901d
improve variable name
alamb Aug 13, 2022
8d1c50d
Apply suggestions from code review
alamb Aug 13, 2022
ee41c32
improve comment in stub.rs
alamb Aug 13, 2022
76a31c1
Merge remote-tracking branch 'apache/master' into alamb/help_feature_…
alamb Aug 13, 2022
c78dd22
Fix for new clippy
alamb Aug 13, 2022
c51c8cc
Clean up clippy
alamb Aug 13, 2022
2ed7ce3
Clean up header writing
alamb Aug 13, 2022
257c3b4
Merge branch 'alamb/help_feature_flags' of github.com:alamb/arrow-rs …
alamb Aug 13, 2022
4f59de4
fmt
alamb Aug 13, 2022
Impl FromIterator for Decimal256Array (apache#2247)
* Add FromIterator

* For review
viirya authored Aug 2, 2022
commit bde749ee02af4ca0ba023c84dea263bf2c8a5078
124 changes: 103 additions & 21 deletions arrow/src/array/array_decimal.rs
@@ -16,6 +16,7 @@
 // under the License.

 use crate::array::{ArrayAccessor, Decimal128Iter, Decimal256Iter};
+use num::BigInt;
 use std::borrow::Borrow;
 use std::convert::From;
 use std::fmt;
@@ -27,8 +28,10 @@ use super::{
 use super::{BooleanBufferBuilder, FixedSizeBinaryArray};
 #[allow(deprecated)]
 pub use crate::array::DecimalIter;
-use crate::buffer::Buffer;
-use crate::datatypes::{validate_decimal_precision, DECIMAL_DEFAULT_SCALE};
+use crate::buffer::{Buffer, MutableBuffer};
+use crate::datatypes::{
+    validate_decimal_precision, DECIMAL256_MAX_PRECISION, DECIMAL_DEFAULT_SCALE,
+};
 use crate::datatypes::{DataType, DECIMAL128_MAX_PRECISION, DECIMAL128_MAX_SCALE};
 use crate::error::{ArrowError, Result};
 use crate::util::decimal::{BasicDecimal, Decimal128, Decimal256};
@@ -91,6 +94,7 @@ pub trait BasicDecimalArray<T: BasicDecimal, U: From<ArrayData>>:
     private_decimal::DecimalArrayPrivate
 {
     const VALUE_LENGTH: i32;
+    const DEFAULT_TYPE: DataType;

     fn data(&self) -> &ArrayData;

@@ -219,10 +223,17 @@ pub trait BasicDecimalArray<T: BasicDecimal, U: From<ArrayData>>:
         let array_data = unsafe { builder.build_unchecked() };
         U::from(array_data)
     }
+
+    /// The default precision and scale used when not specified.
+    fn default_type() -> DataType {
+        Self::DEFAULT_TYPE
+    }
 }

 impl BasicDecimalArray<Decimal128, Decimal128Array> for Decimal128Array {
     const VALUE_LENGTH: i32 = 16;
+    const DEFAULT_TYPE: DataType =
+        DataType::Decimal128(DECIMAL128_MAX_PRECISION, DECIMAL_DEFAULT_SCALE);

     fn data(&self) -> &ArrayData {
         &self.data
@@ -239,6 +250,8 @@ impl BasicDecimalArray<Decimal128, Decimal128Array> for Decimal128Array {

 impl BasicDecimalArray<Decimal256, Decimal256Array> for Decimal256Array {
     const VALUE_LENGTH: i32 = 32;
+    const DEFAULT_TYPE: DataType =
+        DataType::Decimal256(DECIMAL256_MAX_PRECISION, DECIMAL_DEFAULT_SCALE);

     fn data(&self) -> &ArrayData {
         &self.data
@@ -324,12 +337,6 @@ impl Decimal128Array {
         self.data = self.data.with_data_type(new_data_type);
         Ok(self)
     }
-
-    /// The default precision and scale used when not specified.
-    pub fn default_type() -> DataType {
-        // Keep maximum precision
-        DataType::Decimal128(DECIMAL128_MAX_PRECISION, DECIMAL_DEFAULT_SCALE)
-    }
 }

 impl From<ArrayData> for Decimal128Array {
@@ -384,6 +391,59 @@ impl<'a> Decimal128Array {
     }
 }

+impl From<BigInt> for Decimal256 {
+    fn from(bigint: BigInt) -> Self {
+        Decimal256::from_big_int(&bigint, DECIMAL256_MAX_PRECISION, DECIMAL_DEFAULT_SCALE)
+            .unwrap()
+    }
+}
+
+fn build_decimal_array_from<U: BasicDecimalArray<T, U>, T>(
+    null_buf: BooleanBufferBuilder,
+    buffer: Buffer,
+) -> U
+where
+    T: BasicDecimal,
+    U: From<ArrayData>,
+{
+    let data = unsafe {
+        ArrayData::new_unchecked(
+            U::default_type(),
+            null_buf.len(),
+            None,
+            Some(null_buf.into()),
+            0,
+            vec![buffer],
+            vec![],
+        )
+    };
+    U::from(data)
+}
+
+impl<Ptr: Into<Decimal256>> FromIterator<Option<Ptr>> for Decimal256Array {
+    fn from_iter<I: IntoIterator<Item = Option<Ptr>>>(iter: I) -> Self {
+        let iter = iter.into_iter();
+        let (lower, upper) = iter.size_hint();
+        let size_hint = upper.unwrap_or(lower);
+
+        let mut null_buf = BooleanBufferBuilder::new(size_hint);
+
+        let mut buffer = MutableBuffer::with_capacity(size_hint);
+
+        iter.for_each(|item| {
+            if let Some(a) = item {
+                null_buf.append(true);
+                buffer.extend_from_slice(Into::into(a).raw_value());
+            } else {
+                null_buf.append(false);
+                buffer.extend_zeros(32);
+            }
+        });
+
+        build_decimal_array_from::<Decimal256Array, _>(null_buf, buffer.into())
+    }
+}
+
 impl<Ptr: Borrow<Option<i128>>> FromIterator<Ptr> for Decimal128Array {
     fn from_iter<I: IntoIterator<Item = Ptr>>(iter: I) -> Self {
         let iter = iter.into_iter();
@@ -405,18 +465,7 @@ impl<Ptr: Borrow<Option<i128>>> FromIterator<Ptr> for Decimal128Array {
             })
             .collect();

-        let data = unsafe {
-            ArrayData::new_unchecked(
-                Self::default_type(),
-                null_buf.len(),
-                None,
-                Some(null_buf.into()),
-                0,
-                vec![buffer],
-                vec![],
-            )
-        };
-        Decimal128Array::from(data)
+        build_decimal_array_from::<Decimal128Array, _>(null_buf, buffer)
     }
 }

@@ -794,7 +843,6 @@ mod tests {

     #[test]
     fn test_decimal256_iter() {
-        // TODO: Impl FromIterator for Decimal256Array
         let mut builder = Decimal256Builder::new(30, 76, 6);
         let value = BigInt::from_str_radix("12345", 10).unwrap();
         let decimal1 = Decimal256::from_big_int(&value, 76, 6).unwrap();
@@ -811,4 +859,38 @@ mod tests {
         let collected: Vec<_> = array.iter().collect();
         assert_eq!(vec![Some(decimal1), None, Some(decimal2)], collected);
     }
+
+    #[test]
+    fn test_from_iter_decimal256array() {
+        let value1 = BigInt::from_str_radix("12345", 10).unwrap();
+        let value2 = BigInt::from_str_radix("56789", 10).unwrap();
+
+        let array: Decimal256Array =
+            vec![Some(value1.clone()), None, Some(value2.clone())]
+                .into_iter()
+                .collect();
+        assert_eq!(array.len(), 3);
+        assert_eq!(array.data_type(), &DataType::Decimal256(76, 10));
+        assert_eq!(
+            Decimal256::from_big_int(
+                &value1,
+                DECIMAL256_MAX_PRECISION,
+                DECIMAL_DEFAULT_SCALE
+            )
+            .unwrap(),
+            array.value(0)
+        );
+        assert!(!array.is_null(0));
+        assert!(array.is_null(1));
+        assert_eq!(
+            Decimal256::from_big_int(
+                &value2,
+                DECIMAL256_MAX_PRECISION,
+                DECIMAL_DEFAULT_SCALE
+            )
+            .unwrap(),
+            array.value(2)
+        );
+        assert!(!array.is_null(2));
+    }
 }
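
Reviewer note: the collection pattern in this commit can be tried outside the arrow crate. The sketch below is standard-library only and is not the arrow implementation. `FixedWidthArray`, `WIDTH`, and `example` are hypothetical names chosen for illustration, and `Vec<u8>` / `Vec<bool>` stand in for `MutableBuffer` and `BooleanBufferBuilder`. It follows the same steps as the `FromIterator<Option<Ptr>>` impl above: read `size_hint`, then for each item append a validity flag, writing either the value bytes or a zero-filled slot.

```rust
/// Width of one slot in bytes (32, matching the Decimal256 value length in the diff).
const WIDTH: usize = 32;

/// Hypothetical stand-in for a fixed-width array: raw value bytes plus a validity mask.
#[derive(Debug)]
struct FixedWidthArray {
    values: Vec<u8>,
    validity: Vec<bool>,
}

impl FixedWidthArray {
    fn len(&self) -> usize {
        self.validity.len()
    }

    fn is_null(&self, i: usize) -> bool {
        !self.validity[i]
    }

    /// Bytes of slot `i`; null slots are zero-filled.
    fn value(&self, i: usize) -> &[u8] {
        &self.values[i * WIDTH..(i + 1) * WIDTH]
    }
}

impl<T: Into<[u8; WIDTH]>> FromIterator<Option<T>> for FixedWidthArray {
    fn from_iter<I: IntoIterator<Item = Option<T>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Prefer the upper bound when the iterator reports one.
        let (lower, upper) = iter.size_hint();
        let slots = upper.unwrap_or(lower);

        let mut validity = Vec::with_capacity(slots);
        // Assumption of this sketch: reserve WIDTH bytes per expected slot.
        let mut values = Vec::with_capacity(slots * WIDTH);

        for item in iter {
            match item {
                Some(v) => {
                    validity.push(true);
                    let bytes: [u8; WIDTH] = v.into();
                    values.extend_from_slice(&bytes);
                }
                None => {
                    // Keep slots aligned: a null still occupies WIDTH bytes.
                    validity.push(false);
                    values.extend_from_slice(&[0u8; WIDTH]);
                }
            }
        }

        FixedWidthArray { values, validity }
    }
}

/// Usage: collect optional values directly, as the new test does for Decimal256Array.
fn example() -> FixedWidthArray {
    vec![Some([1u8; WIDTH]), None, Some([2u8; WIDTH])]
        .into_iter()
        .collect()
}
```

Writing zeros for a null keeps every slot at the same offset, so slot `i` can always be located at `i * WIDTH` without consulting the validity mask.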