Implement per-list sequence #9839

ttnghia · 2021-12-03T16:37:16Z

This PR adds the lists::sequences API, which generates a sequence for each list. Specifically, it produces a lists column in which each list is a sequence of numbers or durations, generated independently from that row's own (start, step, size) input values.

Closes #9424.

Note: lists::sequences supports only numeric types (integer and floating-point types) and duration types.
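The per-row semantics can be modeled on the host. The sketch below is a hypothetical reference model, not libcudf code: `sequences_reference` and `lists_result` are illustrative names. It assumes that each element is computed as `start + k * step` and that a lists column is represented by 32-bit offsets plus a flattened values vector.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical lists-column representation: row i owns
// values[offsets[i] .. offsets[i + 1]), so offsets.size() == num_rows + 1.
template <typename T>
struct lists_result {
  std::vector<int> offsets;
  std::vector<T> values;
};

// Hypothetical host-side reference model (not libcudf code): row i becomes
// [starts[i], starts[i] + steps[i], ..., starts[i] + (sizes[i] - 1) * steps[i]].
template <typename T>
lists_result<T> sequences_reference(std::vector<T> const& starts,
                                    std::vector<T> const& steps,
                                    std::vector<int> const& sizes) {
  assert(starts.size() == steps.size() && starts.size() == sizes.size());
  lists_result<T> out;
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    assert(sizes[i] >= 0);  // negative sizes are documented as undefined
    for (int k = 0; k < sizes[i]; ++k) {
      // Valid for arithmetic types and std::chrono::duration alike.
      out.values.push_back(starts[i] + k * steps[i]);
    }
    // A size of 0 yields an empty list: two equal consecutive offsets.
    out.offsets.push_back(out.offsets.back() + sizes[i]);
  }
  return out;
}
```

Because `std::chrono::duration` supports addition and multiplication by an integer, the same template covers duration inputs. This is in line with the note that lists::sequences accepts numeric and duration types.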

vuule · 2021-12-04T02:06:55Z

rerun tests

cpp/include/cudf/lists/filling.hpp

cpp/src/lists/sequences.cu

wbo4958 · 2021-12-14T10:15:00Z

Hi @ttnghia

Looks like when size has some negative values, the sequences result will have some issues.

ColumnVector start = ColumnVector.fromBoxedInts(1, 2, 3, 4, 5, 6);
ColumnVector size = ColumnVector.fromBoxedInts(2, 7, -4, 2, 5, 2);

Calling sequences(start, size) produces the result below; the second row is not correct:

1 2 
2 3 4 4 5 5 6 
null
4 5 
5 6 7 8 9 
6 7

wbo4958 · 2021-12-14T10:57:32Z

BTW, when I tested sequences with a float start and a float step, the result returned from cuDF seemed to be of type INT32, and the values looked incorrect.

float[] x = {1.2f, 2.2f, 3.3f};
Integer[] sizeV = new Integer[]{1, 2, 3};
ColumnVector start = ColumnVector.fromFloats(x);
ColumnVector size = ColumnVector.fromBoxedInts(sizeV);
ColumnVector step = ColumnVector.fromFloats(x);

The output looks like this:

1 2 
2 3 4 5 6 7 8 
3 4 5 6 
4 5 
5 6 7 8 9 
6 7

ttnghia · 2021-12-14T13:00:25Z

> Hi @ttnghia
>
> Looks like when size has some negative values, the sequences result will have some issues.

Yes, the API documentation explicitly states that "if the input size is negative then output is undefined". The caller must therefore pass a size column whose values are all non-negative.
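The "undefined" output can be illustrated with the reported input. The sketch below is a hypothetical host-side model. It rests on two assumptions that this thread does not confirm: the output offsets are the exclusive prefix sum of the sizes, and each row is then filled in place at its offset with a step of 1. Under those assumptions, the size of -4 makes the offsets non-monotonic, so later rows overlap row 2. A row-order fill then yields `2 3 4 4 5 5 6` for the second row, matching the report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Offsets of a lists column: exclusive prefix sum of the row sizes.
std::vector<int> offsets_from_sizes(std::vector<int> const& sizes) {
  std::vector<int> offsets{0};
  for (int s : sizes) offsets.push_back(offsets.back() + s);
  return offsets;
}

// Well-formed offsets are non-decreasing; one negative size breaks this.
bool offsets_are_monotonic(std::vector<int> const& offsets) {
  return std::is_sorted(offsets.begin(), offsets.end());
}

// Hypothetical simulation: fill rows one after another, in row order, at their
// offsets (step 1). A GPU kernel processes rows concurrently, so the real
// overwrite order is not guaranteed.
std::vector<int> simulate_in_place_fill(std::vector<int> const& starts,
                                        std::vector<int> const& sizes) {
  auto const offsets = offsets_from_sizes(sizes);
  // offsets always contains 0, so capacity is never negative.
  int const capacity = *std::max_element(offsets.begin(), offsets.end());
  std::vector<int> data(static_cast<std::size_t>(capacity), 0);
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    for (int k = 0; k < sizes[i]; ++k) {  // a negative size writes nothing
      int const pos = offsets[i] + k;
      if (pos >= 0 && pos < capacity) {  // skip positions outside the buffer
        data[static_cast<std::size_t>(pos)] = starts[i] + k;
      }
    }
  }
  return data;
}
```

A caller-side guard, such as checking that every size is at least 0 before calling lists::sequences, avoids this situation entirely.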

ttnghia · 2021-12-14T13:18:52Z

> BTW, when I tested sequences with a float start and a float step, the result returned from cuDF seemed to be of type INT32, and the values looked incorrect.

Sorry, I can't reproduce your issue. I ran your example and got this:

List<float>:
Length : 3
Offsets : 0, 1, 3, 6
   1.20000005, 2.20000005, 4.4000001, 3.29999995, 6.5999999, 9.89999962

So maybe something else is wrong with your test?
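The long decimal expansions in this output are ordinary float32 values printed with enough precision to round-trip (nine significant digits). The check below is a minimal host-side sketch that assumes each element is computed in float as `start + k * step`; `float_sequences` and `fmt9` are hypothetical helpers. For the reported inputs, it reproduces the six printed values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Format a float with 9 significant digits, enough to round-trip any float32.
std::string fmt9(float v) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%.9g", static_cast<double>(v));
  return buf;
}

// Hypothetical host-side sketch: flattened per-row float sequences.
std::vector<float> float_sequences(std::vector<float> const& starts,
                                   std::vector<float> const& steps,
                                   std::vector<int> const& sizes) {
  std::vector<float> values;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    for (int k = 0; k < sizes[i]; ++k) {
      values.push_back(starts[i] + static_cast<float>(k) * steps[i]);
    }
  }
  return values;
}
```

For example, 1.2f is stored as 1.2000000476837158..., which prints as 1.20000005 at nine significant digits. The output above is therefore float data, not INT32.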

wbo4958 · 2021-12-15T00:24:35Z

Okay, got it. Thanks for the explanation.

wbo4958 · 2021-12-15T00:25:06Z

> > BTW, when I tested sequences with a float start and a float step, the result returned from cuDF seemed to be of type INT32, and the values looked incorrect.
>
> Sorry, I can't reproduce your issue. I ran your example and got this:
> List<float>:
> Length : 3
> Offsets : 0, 1, 3, 6
>    1.20000005, 2.20000005, 4.4000001, 3.29999995, 6.5999999, 9.89999962
> So maybe something else is wrong with your test?

Sorry, it turned out my test was wrong.

cpp/src/lists/sequences.cu

ttnghia · 2022-01-04T22:09:40Z

@gpucibot merge

ttnghia added 10 commits December 1, 2021 08:53

38349a6 Fix doxygen
568767a Add header
3c52945 Add implementation for numeric-not-bool types
a1148d1 Fix doxygen
a0dfb6a Output zero size for null lists
75d097a Implement tests
5feaa87 Fix tabulator for duration types
8aef416 Fix duration type step
f2ee956 Add basic unit tests
537d51d Complete unit tests

ttnghia added the feature request, 3 - Ready for Review, libcudf, Spark, and non-breaking labels Dec 3, 2021

ttnghia requested review from vyasr and nvdbaranec December 3, 2021 16:37

ttnghia self-assigned this Dec 3, 2021

github-actions bot added the CMake label Dec 3, 2021

ttnghia added 2 commits December 3, 2021 09:42

dbd6a6c Add more unit test
5b10089 Rewrite doxygen

ttnghia marked this pull request as ready for review December 3, 2021 20:39

ttnghia requested review from a team as code owners December 3, 2021 20:39

ttnghia requested a review from robertmaynard December 3, 2021 20:40

wbo4958 reviewed Dec 14, 2021

cpp/include/cudf/lists/filling.hpp (outdated, resolved)

ttnghia added 2 commits December 13, 2021 20:04

38744fe Fix example in doxygen
838ed3a Merge branch 'branch-22.02' into list_sequences

wbo4958 reviewed Dec 14, 2021

cpp/src/lists/sequences.cu (resolved)

jrhemstad approved these changes Dec 15, 2021

wbo4958 mentioned this pull request Dec 16, 2021 in NVIDIA/spark-rapids#4376, "Add sequence support [databricks]" (merged)

ttnghia requested a review from a team December 16, 2021 18:41

karthikeyann requested changes Jan 4, 2022

cpp/src/lists/sequences.cu (resolved)

ttnghia requested review from nvdbaranec and karthikeyann January 4, 2022 13:36

nvdbaranec approved these changes Jan 4, 2022

karthikeyann approved these changes Jan 4, 2022

rapids-bot bot merged commit 36fa5f3 into rapidsai:branch-22.02 Jan 4, 2022

ttnghia deleted the list_sequences branch January 6, 2022 20:26

firestarman mentioned this pull request Jan 11, 2022 in #10012, "[FEA] Return null for the row where null exists in starts, or sizes, or steps" (closed)
