Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add list_scalar API #7584

Merged
merged 17 commits into from
Apr 22, 2021
Merged

Add list_scalar API #7584

merged 17 commits into from
Apr 22, 2021

Conversation

isVoid
Copy link
Contributor

@isVoid isVoid commented Mar 12, 2021

Closes #5887

This PR adds list_scalar, a scalar for a LIST type column. A list_scalar contains a validity field and a _data field. The validity field describe whether the list_scalar is valid as a whole, and is accessible via derived is_valid member function. The _data field is a cudf::column, owned by the scalar class. The data column can be either nested or non-nested type, depending on what LIST column it originate from.

For example, a scalar from a List<int> column contains a int column:

C = List<int> column: {{1, 2, 3}, {4}, {5, 6}}
C[0] = int column: {1, 2, 3}

A scalar from a List<List<int>> column would contain a List<int> column instead:

C = List<List<int>> column: {{{1, 2}, {}}, {{3}}, {{4, 5}, {6}}}
C[2] = List<int> column: {{4, 5}, {6}}

list_scalar has a view method, which returns a immutable view to the underlying data.

@isVoid isVoid added feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels Mar 12, 2021
@isVoid isVoid marked this pull request as ready for review March 12, 2021 21:55
@isVoid isVoid requested a review from a team as a code owner March 12, 2021 21:55
@isVoid isVoid requested review from trxcllnt and jrhemstad March 12, 2021 21:55
@codecov
Copy link

codecov bot commented Mar 13, 2021

Codecov Report

Merging #7584 (eaa8024) into branch-0.20 (51336df) will increase coverage by 0.02%.
The diff coverage is 87.91%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.20    #7584      +/-   ##
===============================================
+ Coverage        82.88%   82.91%   +0.02%     
===============================================
  Files              103      103              
  Lines            17668    17658      -10     
===============================================
- Hits             14645    14642       -3     
+ Misses            3023     3016       -7     
Impacted Files Coverage Δ
python/cudf/cudf/core/column/__init__.py 100.00% <ø> (ø)
python/cudf/cudf/utils/cudautils.py 57.75% <25.00%> (ø)
python/cudf/cudf/core/column/column.py 88.64% <71.42%> (ø)
python/cudf/cudf/core/column/numerical.py 94.43% <72.72%> (ø)
python/cudf/cudf/core/dataframe.py 90.87% <83.33%> (+0.01%) ⬆️
python/cudf/cudf/utils/utils.py 89.53% <91.66%> (+0.02%) ⬆️
python/cudf/cudf/core/column/datetime.py 89.91% <100.00%> (ø)
python/cudf/cudf/core/column/timedelta.py 88.66% <100.00%> (ø)
python/cudf/cudf/core/groupby/groupby.py 92.28% <100.00%> (+0.83%) ⬆️
python/cudf/cudf/core/index.py 93.07% <100.00%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8baed5...eaa8024. Read the comment docs.

cpp/include/cudf/scalar/scalar.hpp Show resolved Hide resolved
cpp/src/scalar/scalar.cpp Outdated Show resolved Hide resolved
@isVoid isVoid self-assigned this Mar 22, 2021
cpp/include/cudf/scalar/scalar.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/scalar/scalar.hpp Show resolved Hide resolved
cpp/include/cudf/scalar/scalar.hpp Outdated Show resolved Hide resolved
@isVoid isVoid requested review from devavret and harrism and removed request for trxcllnt and jrhemstad March 22, 2021 21:56
cpp/include/cudf/types.hpp Outdated Show resolved Hide resolved
@isVoid isVoid requested a review from ttnghia March 22, 2021 22:30
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new class needs to be tested...

cpp/include/cudf/scalar/scalar.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/scalar/scalar.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/scalar/scalar.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/scalar/scalar.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/types.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/scalar/scalar.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/types.hpp Outdated Show resolved Hide resolved
Copy link
Contributor

@devavret devavret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed my comments because they were a subset of @harrism 's review comments.

  1. No need for two separate view() and value() methods. This will probably not return a host usable value like the rest of the scalar classes. Any such member can be added later when needed.
  2. I'd vote against having validity as an enum.

@devavret
Copy link
Contributor

Is this meant to support a list of list case? If so, you might want to use list_device_view. @mythrocks would be able to help explain.

@isVoid
Copy link
Contributor Author

isVoid commented Mar 23, 2021

@devavret yes this should support nested lists case. Would love to have more clarification of list_view and list_device_view.

@isVoid
Copy link
Contributor Author

isVoid commented Mar 23, 2021

Addressing @devavret and @jrhemstad 's comment, this PR will not introduce an is_valid enum class. Addressing @harrism 's comment, value and view is combined into a single view function, that returns a list_view. Would like to have clarification for:

  1. Should a function that returns list_device_view be added?
  2. For testing the scalar data equality - I would propose a _data field accessor for list_view that returns a column_view, and resort to expect_columns_equal to check for data equality.

@devavret
Copy link
Contributor

  1. Should a function that returns list_device_view be added?

After an offline discussion, we figured we should return a column_view for now and hopefully revisit when list_view is more fleshed out.

@harrism
Copy link
Member

harrism commented Mar 23, 2021

After an offline discussion, we figured we should return a column_view for now and hopefully revisit when list_view is more fleshed out.

We should be careful and not rush this. I don't understand all the details, but rushing this could result in painful churn, so let's figure out what the right thing to do is. If figuring that out is too big a task, then we can consider a halfway solution.

@isVoid isVoid changed the base branch from branch-0.19 to branch-0.20 April 13, 2021 20:17
@harrism
Copy link
Member

harrism commented Apr 22, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b733082 into rapidsai:branch-0.20 Apr 22, 2021
rapids-bot bot pushed a commit that referenced this pull request Apr 27, 2021
…ssed on to construct validity data (#8072)

Follow up of #7584 

This PR fixes a bug when constructing `list_scalar`, `stream` and `mr` was not passed on to construct validity data in constructor. Currently only default stream and memory resource is used.

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Mark Harris (https://github.com/harrism)

URL: #8072
rapids-bot bot pushed a commit that referenced this pull request Apr 29, 2021
This PR is to add the JNI support for scalar of list, along with building a ColumnVector from a list scalar.

Since the PR #7584 inroduced the list scalar in cpp.

Signed-off-by: Firestarman <[email protected]>

Authors:
  - Liangcai Li (https://github.com/firestarman)

Approvers:
  - Jason Lowe (https://github.com/jlowe)

URL: #8077
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add scalar support for lists
6 participants