[C++][Parquet] Add support for arrow::ArrayStatistics: zero-copy types #43983

Closed
Tracked by #43549
kou opened this issue Sep 6, 2024 · 1 comment

kou commented Sep 6, 2024

Describe the enhancement requested

Subissue of #43549

Attach arrow::ArrayStatistics to arrays of the types that are read through TransferZeroCopy():

std::shared_ptr<Array> TransferZeroCopy(RecordReader* reader,
                                        const std::shared_ptr<Field>& field) {
  std::shared_ptr<::arrow::ArrayData> data;
  if (field->nullable()) {
    std::vector<std::shared_ptr<Buffer>> buffers = {reader->ReleaseIsValid(),
                                                    reader->ReleaseValues()};
    data = std::make_shared<::arrow::ArrayData>(field->type(), reader->values_written(),
                                                std::move(buffers), reader->null_count());
  } else {
    std::vector<std::shared_ptr<Buffer>> buffers = {nullptr, reader->ReleaseValues()};
    data = std::make_shared<::arrow::ArrayData>(field->type(), reader->values_written(),
                                                std::move(buffers), /*null_count=*/0);
  }
  return ::arrow::MakeArray(data);
}

case ::arrow::Type::INT32:
case ::arrow::Type::INT64:
case ::arrow::Type::FLOAT:
case ::arrow::Type::DOUBLE:
  result = TransferZeroCopy(reader, value_field);

  • Int32
  • Int64
  • Float
  • Double
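
As a rough illustration of why these types are called zero-copy, the sketch below mimics the non-nullable branch of the function above with toy types. `ToyReader`, `ToyArrayData`, `Buffer`, and `TransferZeroCopyToy` are hypothetical stand-ins invented for this example, not Arrow or Parquet classes. The point it tries to show is that the values buffer changes owner without the values being copied.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer of decoded column values.
using Buffer = std::vector<std::int32_t>;

// Hypothetical stand-in for the record reader: it owns the decoded values
// until the caller releases them.
class ToyReader {
 public:
  explicit ToyReader(Buffer values)
      : values_(std::make_shared<Buffer>(std::move(values))) {}

  // Address of the first decoded value, or nullptr once the buffer is released.
  const std::int32_t* raw() const { return values_ ? values_->data() : nullptr; }

  // Hands ownership of the values buffer to the caller without copying it.
  std::shared_ptr<Buffer> ReleaseValues() { return std::move(values_); }

 private:
  std::shared_ptr<Buffer> values_;
};

// Hypothetical stand-in for array data: a validity buffer plus a values buffer.
struct ToyArrayData {
  std::shared_ptr<Buffer> validity;  // nullptr when the field is not nullable
  std::shared_ptr<Buffer> values;
  std::int64_t null_count = 0;
};

// Mirrors only the non-nullable branch: the array takes over the reader's buffer.
ToyArrayData TransferZeroCopyToy(ToyReader& reader) {
  ToyArrayData data;
  data.validity = nullptr;
  data.values = reader.ReleaseValues();
  assert(data.values != nullptr);  // the reader must still own its buffer
  data.null_count = 0;
  return data;
}
```

After the call, the array's values live at the same address the reader reported before the transfer, and the reader no longer holds a buffer.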

case ::arrow::Type::TIMESTAMP: {
  const ::arrow::TimestampType& timestamp_type =
      checked_cast<::arrow::TimestampType&>(*value_field->type());
  if (descr->physical_type() == ::parquet::Type::INT96) {
    RETURN_NOT_OK(
        TransferInt96(reader, pool, value_field, &result, timestamp_type.unit()));
  } else {
    switch (timestamp_type.unit()) {
      case ::arrow::TimeUnit::MILLI:
      case ::arrow::TimeUnit::MICRO:
      case ::arrow::TimeUnit::NANO:
        result = TransferZeroCopy(reader, value_field);

  • Timestamp[milli]
  • Timestamp[micro]
  • Timestamp[nano]
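
The branching in the timestamp excerpt can be summarized as a small decision function. This is only a sketch: the enums and `TimestampTransferPath` are hypothetical names made up for this example, and the excerpt does not show what happens for second-resolution timestamps, so that case is reported as `kNotShown` rather than guessed.

```cpp
// Hypothetical enums that mirror only the cases visible in the excerpt.
enum class PhysicalType { kInt32, kInt64, kInt96 };
enum class TimeUnit { kSecond, kMilli, kMicro, kNano };

// Which transfer routine the excerpt selects for a timestamp column.
enum class TransferPath { kInt96Conversion, kZeroCopy, kNotShown };

constexpr TransferPath TimestampTransferPath(PhysicalType physical, TimeUnit unit) {
  // INT96 timestamps go through a dedicated conversion, never the zero-copy path.
  if (physical == PhysicalType::kInt96) {
    return TransferPath::kInt96Conversion;
  }
  switch (unit) {
    case TimeUnit::kMilli:
    case TimeUnit::kMicro:
    case TimeUnit::kNano:
      return TransferPath::kZeroCopy;
    case TimeUnit::kSecond:
      break;  // not covered by the excerpt; left undecided on purpose
  }
  return TransferPath::kNotShown;
}
```

Only the combinations that return `kZeroCopy` correspond to the timestamp types listed in this issue.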

Component(s)

C++, Parquet

kou added a commit to kou/arrow that referenced this issue Sep 6, 2024
kou added a commit that referenced this issue Sep 8, 2024
…-copy types (#43984)

### Rationale for this change

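Statistics are useful for fast processing: a consumer can read values such as a column's null count, minimum, or maximum without scanning the data.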

Target types:

* `Int32`
* `Int64`
* `Float`
* `Double`
* `Timestamp[milli]`
* `Timestamp[micro]`
* `Timestamp[nano]`

### What changes are included in this PR?

Map `ColumnChunkMetaData` information to `arrow::ArrayStatistics`.
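
A minimal C++17 sketch of what such a mapping could look like follows. `ToyChunkStatistics`, `ToyArrayStatistics`, and `MapStatistics` are hypothetical stand-ins written for this illustration; they are not the Parquet or Arrow classes, and the field names are assumptions. The idea shown is that each statistic is copied only when the column chunk metadata actually carries it, so a missing value stays unset instead of becoming a misleading zero.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <variant>

// Hypothetical stand-in for the statistics stored with a Parquet column chunk.
struct ToyChunkStatistics {
  bool has_null_count = false;
  std::int64_t null_count = 0;
  bool has_min_max = false;
  std::int64_t min = 0;
  std::int64_t max = 0;
};

// Hypothetical stand-in for array-level statistics. Every field is optional
// because a writer may have omitted any of them.
struct ToyArrayStatistics {
  using Value = std::variant<std::int64_t, double>;
  std::optional<std::int64_t> null_count;
  std::optional<Value> min;
  std::optional<Value> max;
};

// Copies only the statistics that are present in the chunk metadata.
ToyArrayStatistics MapStatistics(const ToyChunkStatistics& chunk) {
  ToyArrayStatistics stats;
  if (chunk.has_null_count) {
    stats.null_count = chunk.null_count;
  }
  if (chunk.has_min_max) {
    assert(chunk.min <= chunk.max);  // sanity check on the toy input
    stats.min = ToyArrayStatistics::Value{chunk.min};
    stats.max = ToyArrayStatistics::Value{chunk.max};
  }
  return stats;
}
```

A caller would fill a `ToyChunkStatistics` from the metadata it has, call `MapStatistics`, and check `has_value()` on each field before using it.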

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: #43983

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>

kou commented Sep 8, 2024

Issue resolved by pull request 43984
#43984

kou added this to the 18.0.0 milestone Sep 8, 2024
kou closed this as completed Sep 8, 2024
khwilson pushed a commit to khwilson/arrow that referenced this issue Sep 14, 2024
…: zero-copy types (apache#43984)