[FEA] Support selectively reading parts of nested column in Parquet reader #8850

Closed
devavret opened this issue Jul 26, 2021 · 0 comments · Fixed by #8933
Labels: cuIO, feature request, libcudf

Comments

@devavret
Contributor

Similar to #8848, we should also allow nested column pruning in the Parquet reader. This has also been requested here: #7248 (comment)

@devavret devavret added feature request New feature or request Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue labels Jul 26, 2021
@devavret devavret self-assigned this Jul 26, 2021
@beckernick beckernick removed the Needs Triage Need team to review and classify label Jul 26, 2021
@beckernick beckernick added this to the IO Data Type Expansion milestone Jul 26, 2021
rapids-bot bot pushed a commit that referenced this issue Aug 19, 2021
Closes #8850 

Adds the ability to select specific children of a nested column. The Python API mimics pyarrow: each selected child is given as a dot-separated path.
```python
cudf.read_parquet("test.parquet", columns=["struct1.child1.grandchild2", "struct1.child2"])
```
The C++ API takes each path as a vector of names, one element per nesting level:
```c++
cudf::io::parquet_reader_options read_args =
  cudf::io::parquet_reader_options::builder(cudf::io::source_info(filepath))
    .columns({{"struct1", "child1", "grandchild2"},
              {"struct1", "child2"}});
```
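
The two forms describe the same selection. As a rough illustration only (plain Python, not the cudf or libcudf implementation), the sketch below shows how a dot-separated path maps to one name per nesting level and what pruning a nested schema by those paths could look like. `split_path` and `prune` are hypothetical helpers, and the nested dict with made-up type names merely stands in for a Parquet schema.

```python
import copy


def split_path(path):
    """Split a dot-separated column path into one name per nesting level."""
    return path.split(".")


def prune(schema, paths):
    """Return a copy of `schema` that keeps only the selected children.

    `schema` is a hypothetical nested dict: struct columns map to a dict of
    children, leaf columns map to a type name. Selecting a struct keeps its
    whole subtree.
    """
    pruned = {}
    for path in paths:
        node, out = schema, pruned
        names = split_path(path)
        for depth, name in enumerate(names):
            child = node[name]  # raises KeyError if the path is not in the schema
            if depth == len(names) - 1:
                # Last level of the path: keep this column and everything under it.
                out[name] = copy.deepcopy(child)
            else:
                # Intermediate struct: descend, creating the level if needed.
                out = out.setdefault(name, {})
                node = child
    return pruned


# Hypothetical schema matching the column names used in the examples above.
schema = {
    "struct1": {
        "child1": {"grandchild1": "int32", "grandchild2": "string"},
        "child2": "float64",
        "child3": "bool",
    },
    "col2": "int64",
}

selected = prune(schema, ["struct1.child1.grandchild2", "struct1.child2"])
print(selected)
```

Under these assumptions, only `struct1.child1.grandchild2` and `struct1.child2` survive; `grandchild1`, `child3`, and `col2` are dropped.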

Authors:
  - Devavret Makkar (https://github.com/devavret)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Christopher Harris (https://github.com/cwharris)

URL: #8933