[REVIEW] concatenate row items using a separator defined per row #5204

sriramch · 2020-05-15T04:29:37Z

- Closes [FEA] Concatenate column to scalar and scalar to column #3726
- emulates the `concatenate_ws` spark functionality
- provides an option for global null replacements for the separator and for the columns
- skips null values in a row when performing the concatenation
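
As a rough illustration of the semantics listed above, the following pure-Python sketch models each column as a list of optional strings, with `None` standing in for a null. It is not the libcudf implementation: the function name `concat_rows_with_separators` and the parameters `separator_narep` and `col_narep` are hypothetical names chosen for this example. (Spark SQL names the corresponding function `concat_ws`.)

```python
from typing import List, Optional, Sequence


def concat_rows_with_separators(
    columns: Sequence[Sequence[Optional[str]]],
    separators: Sequence[Optional[str]],
    separator_narep: Optional[str] = None,  # hypothetical: replacement for a null separator
    col_narep: Optional[str] = None,        # hypothetical: replacement for a null value
) -> List[Optional[str]]:
    """Model of per-row-separator concatenation (illustrative helper, not a cudf API)."""
    result: List[Optional[str]] = []
    for row_idx, sep in enumerate(separators):
        # A null separator falls back to the global replacement, if one was given.
        if sep is None:
            sep = separator_narep
        # Without a usable separator, the whole output row is null.
        if sep is None:
            result.append(None)
            continue
        parts = []
        for col in columns:
            value = col[row_idx]
            # A null value is replaced when a replacement exists, otherwise skipped.
            if value is None:
                value = col_narep
            if value is not None:
                parts.append(value)
        # Assumption of this sketch: a row with no remaining values yields "".
        result.append(sep.join(parts))
    return result


col1 = ["a", None, "e"]
col2 = ["b", "d", "f"]
seps = ["-", "+", None]

print(concat_rows_with_separators([col1, col2], seps))
# ['a-b', 'd', None]
print(concat_rows_with_separators([col1, col2], seps, separator_narep="|", col_narep="?"))
# ['a-b', '?+d', 'e|f']
```

In the first call the null in `col1` is skipped, so no dangling separator appears in row 1, and the null separator in row 2 makes that output row null. In the second call both replacements are supplied, so every row produces a string.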


codecov · 2020-05-15T06:29:57Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.15@a784e37).
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-0.15    #5204   +/-   ##
==============================================
  Coverage               ?   88.38%           
==============================================
  Files                  ?       55           
  Lines                  ?    10489           
  Branches               ?        0           
==============================================
  Hits                   ?     9271           
  Misses                 ?     1218           
  Partials               ?        0

Impacted Files	Coverage Δ
python/cudf/cudf/utils/queryutils.py	`94.00% <0.00%> (ø)`
python/cudf/cudf/io/avro.py	`81.81% <0.00%> (ø)`
python/cudf/cudf/utils/cudautils.py	`46.52% <0.00%> (ø)`
python/dask_cudf/dask_cudf/io/tests/test_s3.py	`95.65% <0.00%> (ø)`
python/cudf/cudf/_lib/nvtx/colors.py	`46.15% <0.00%> (ø)`
python/cudf/cudf/core/column/column.py	`86.52% <0.00%> (ø)`
python/cudf/cudf/core/column/__init__.py	`100.00% <0.00%> (ø)`
python/dask_cudf/dask_cudf/io/tests/test_csv.py	`100.00% <0.00%> (ø)`
python/cudf/cudf/utils/dtypes.py	`85.38% <0.00%> (ø)`
python/cudf/cudf/utils/docutils.py	`100.00% <0.00%> (ø)`
... and 45 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a784e37...d938fea.


davidwendt

This looks good.
One minor thing: I've been trying to consistently use the plural "strings" whenever referring to a strings column in the documentation.


devavret

This doesn't seem to directly address the requirement of the associated issue. I can approve but prefer not to merge this until @rwlee and @revans2 confirm that the alternate approach you mentioned is acceptable.

sriramch · 2020-05-21T22:52:28Z

> This doesn't seem to directly address the requirement of the associated issue. I can approve but prefer not to merge this until @rwlee and @revans2 confirm that the alternate approach you mentioned is acceptable.

[sc] i have checked with @revans2 offline, and he concurred that the alternate apis requested in the ticket would result in as many (or more) intermediates as the existing append api. further, this api is required to support the concatenate_ws semantics with a separator per row.


revans2 · 2020-05-22T13:38:21Z

This looks fine to me. @rwlee, does the API look like what you expected?

devavret · 2020-05-22T15:29:40Z

Just out of curiosity, what can this new API achieve that wasn't possible with the old concatenate? E.g., could you achieve the same result by creating a table_view from {col1, separator_col, col2, separator_col, col3} and using an empty string scalar as the separator? Is the difference only in the ability to specify different null replacements for value columns and separator columns?

If so, could this be better achieved by making the existing API more generic and take a null replacement per column of the values table? I’m not prescribing that this is how it should be done, just trying to understand the API requirements.

sriramch · 2020-05-22T16:36:41Z

> creating a table_view from {col1, separator_col, col2, separator_col, col3} and an empty string scalar as separator.

- the existing api nulls a row even if only one column has a null value for that row. this one skips nulls.
- the existing api requires the separator rep. to be valid. this one doesn't enforce it; if the separator for a given row is null, the output is null for that row.
- this api explicitly defines a separator column vs. having to sneak it in between strings columns. with the existing api the caller has to make it work, which is non-intuitive.
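
The contrast drawn in the points above can be sketched with a small pure-Python model. This is only an illustration of the behavior as described in this thread, not libcudf code: both function names are hypothetical, columns are lists of optional strings, and `None` stands in for a null.

```python
from typing import List, Optional, Sequence

Column = Sequence[Optional[str]]


def concat_scalar_separator(
    columns: Sequence[Column], separator: str, narep: Optional[str] = None
) -> List[Optional[str]]:
    """Model of the existing API as described here: one separator for all rows;
    any null value nulls the row unless a replacement (narep) is supplied."""
    out: List[Optional[str]] = []
    for row in zip(*columns):
        if narep is None and any(v is None for v in row):
            out.append(None)
        else:
            out.append(separator.join(narep if v is None else v for v in row))
    return out


def concat_row_separators(
    columns: Sequence[Column], separators: Column
) -> List[Optional[str]]:
    """Model of the new API: a separator per row; null values are skipped and
    a null separator nulls the row."""
    out: List[Optional[str]] = []
    for sep, row in zip(separators, zip(*columns)):
        if sep is None:
            out.append(None)
        else:
            out.append(sep.join(v for v in row if v is not None))
    return out


col1 = ["a", None, "e"]
col2 = ["b", "d", "f"]
seps = ["-", "+", None]

# Interleaving the separator column between the value columns, with "" as the scalar separator.
interleaved = concat_scalar_separator([col1, seps, col2], "")
interleaved_narep = concat_scalar_separator([col1, seps, col2], "", narep="")
per_row = concat_row_separators([col1, col2], seps)

print(interleaved)        # ['a-b', None, None]
print(interleaved_narep)  # ['a-b', '+d', 'ef']
print(per_row)            # ['a-b', 'd', None]
```

Under this model the interleaving workaround either nulls every row that contains a null, or, with an empty replacement, leaves a dangling separator (`'+d'`) and silently joins across a null separator (`'ef'`). The per-row version gives `'d'` and a null row instead.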

> Is the difference only in the ability to specify different null replacements for value columns and separator columns?

this is one of them

> making the existing API more generic and take a null replacement per column of the values table?

the existing api can be made to work, but i feel it may result in cluttering the existing api. for instance, the existing use-case requires a valid separator scalar. this cannot be enforced if the api is retrofitted to the new requirement, as we cannot interpret the semantics of the different columns contained within the table.

devavret · 2020-05-22T18:06:36Z

> the existing api can be made to work, but i feel it may result in cluttering the existing api

Fair enough


rwlee · 2020-05-26T20:21:17Z

> This looks fine to me @rwlee does the API look like what you expected?

Yup, this addresses the core functionality we wanted. Thanks @sriramch


harrism · 2020-05-28T05:22:03Z

Retargeting to 0.15 since we enter code freeze tonight.

harrism · 2020-05-28T05:22:37Z

Make sure you communicate with a maintainer when you open PRs. This one was not on my radar and therefore had not been added to the 0.14 project board.


sriramch · 2020-05-28T15:05:15Z

@devavret is this good to go?

i have now re-targeted this against 0.15.


sriramch · 2020-06-01T18:34:42Z

@davidwendt @devavret - can one of you please merge this?

sriramch added 3 commits May 14, 2020 21:32

- concatenate row items using a separator defined per row

6467dbb

- this Closes rapidsai#3726 - this emulates `concatenate_ws` spark functionality - provides option for a global separator and global column null replacements - skips null values in a row to perform concatenation

- test fixes

228cad4

- update changelog

f8e1344

sriramch added the "feature request", "3 - Ready for Review", "4 - Needs Review", and "Spark" labels May 15, 2020

sriramch requested review from devavret and davidwendt May 15, 2020 04:29

sriramch requested a review from a team as a code owner May 15, 2020 04:29

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

aaa8496

…oncatenate

davidwendt requested changes May 15, 2020

cpp/src/strings/combine.cu (outdated)

sriramch added 3 commits May 15, 2020 19:18

- incorporate review comments

cea4eca

- add more tests (all empty string test)

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

fe17e0e

…oncatenate

- change verbiage from string column[s] -> strings column[s]

d188533

davidwendt requested changes May 15, 2020

cpp/include/cudf/strings/combine.hpp (outdated)

cpp/include/cudf/strings/combine.hpp (outdated)

CHANGELOG.md (outdated)

sriramch and others added 6 commits May 15, 2020 12:54

Update cpp/include/cudf/strings/combine.hpp

113f6b0

Co-authored-by: David <[email protected]>

Update cpp/include/cudf/strings/combine.hpp

2bc4b1e

Co-authored-by: David <[email protected]>

Update CHANGELOG.md

3d7a3e6

Co-authored-by: David <[email protected]>

- refine changelog description

3b3dc80

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

4d6c27b

…oncatenate

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

57f951a

…oncatenate

davidwendt approved these changes May 18, 2020


sriramch added 5 commits May 18, 2020 20:01

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

8b84223

…oncatenate

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

c0f1835

…oncatenate

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

10aa099

…oncatenate

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

38bfb86

…oncatenate

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

df12ea5

…oncatenate

devavret reviewed May 21, 2020


Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

22152e8

…oncatenate

- sync with updates to upstream branch 0.14

975999e

sriramch added 3 commits May 22, 2020 23:54

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

2f0c7cb

…oncatenate

- default stream parameter in detail namespace

72e6c72

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

ecd4019

…oncatenate

sriramch added 2 commits May 26, 2020 22:18

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

620635d

…oncatenate

Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into c…

b25ea14

…oncatenate

harrism changed the base branch from branch-0.14 to branch-0.15 May 28, 2020 05:22

sriramch added 2 commits May 28, 2020 11:15

Merge branch 'branch-0.15' of https://github.com/rapidsai/cudf into c…

f8271cc

…oncatenate

- updates to changelog for 0.15 release

f52518b

devavret approved these changes May 28, 2020


Merge branch 'branch-0.15' of https://github.com/rapidsai/cudf into c…

d938fea

…oncatenate

davidwendt merged commit 855e735 into rapidsai:branch-0.15 Jun 1, 2020
