Resolve auto merge conflicts for Branch 21.08 from branch 21.06 #8329

Merged

Conversation

galipremsagar
Contributor

This PR resolves conflicts so that the auto-merger can proceed: #8317

isVoid and others added 25 commits May 20, 2021 01:28
This PR adds support to `make_column_from_scalar` for `list_scalar`. For 0-length columns, a well-formed `LIST` type column is returned, whose child column has the same column hierarchy as the row data stored in `list_scalar`.

Example:
```
slr.data = [1, 2, 3] // An integer list of 1, 2, 3; `data` is an INT column
make_column_from_scalar(slr, 2) // List<int> column: {[1, 2, 3], [1, 2, 3]}, whose child column is an `INT` column.

slr.data = [[1, 2], [3]] // A list of integer lists; `data` is a List<int> column
make_column_from_scalar(slr, 0) // Well-formed, 0-length List<List<int>> column, whose child column is a List<int> column.
```

Closes rapidsai#8088

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Devavret Makkar (https://github.com/devavret)
  - Mark Harris (https://github.com/harrism)

URL: rapidsai#8185
This PR adds support for creating a `ColumnVector` from the byte arrays of UTF8 strings.

It also lets `Struct` children creation support UTF8 strings.

Closes rapidsai#8137

Signed-off-by: Firestarman <[email protected]>

Authors:
  - Liangcai Li (https://github.com/firestarman)

Approvers:
  - Allen Xu (https://github.com/wjxiz1992)
  - Jason Lowe (https://github.com/jlowe)
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Alfred Xu (https://github.com/sperlingxx)

URL: rapidsai#8257
This is a small PR to support creating a scalar from an array of utf8 bytes.

Since PR rapidsai#8257 added this support for ColumnVector creation, I think we should add it for scalar creation as well, to avoid conversions between UTF8 strings and Java strings when used in Spark.

Signed-off-by: Firestarman <[email protected]>

Authors:
  - Liangcai Li (https://github.com/firestarman)

Approvers:
  - Bobby Wang (https://github.com/wbo4958)

URL: rapidsai#8294
Currently the Serializable class provides `serialize` and `deserialize` as `abstractmethod`s via the mechanisms afforded by inheritance from `abc.ABC`. Since this class is purely internal to `cudf` and is not describing an abstract interface in a manner useful to consumers of our code, the benefits of the abstract base class concept are outweighed by the performance and maintenance costs. In particular, `isinstance` checks on subclasses of `abc.ABC` are much more expensive than for normal classes (due to an expensive implementation of `__instancecheck__`), and (for better or worse) our code base currently makes use of these checks extensively. In addition, in certain places we can benefit from the use of custom metaclasses in `cudf`, but their usage becomes more cumbersome with `ABC` because metaclasses then also have to inherit from `ABCMeta` (which brings along any associated complexities). This PR removes that inheritance, replacing it with a much simpler approach that simply implements `serialize` and `deserialize` as raising `NotImplementedError`.
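
The approach described above can be illustrated with a minimal sketch. This is not the actual `cudf` code: the method signatures and the `Payload` subclass are hypothetical and only show the pattern of a plain base class whose methods raise `NotImplementedError`.

```python
class Serializable:
    """Plain base class: no abc.ABC, so the metaclass is just `type`."""

    def serialize(self):
        # Subclasses are expected to override this (signature is an assumption).
        raise NotImplementedError(
            f"{type(self).__name__} must implement serialize"
        )

    @classmethod
    def deserialize(cls, header, frames):
        # Subclasses are expected to override this (signature is an assumption).
        raise NotImplementedError(
            f"{cls.__name__} must implement deserialize"
        )


class Payload(Serializable):
    """Hypothetical subclass used only for illustration."""

    def __init__(self, data):
        self.data = data

    def serialize(self):
        header = {"type": type(self).__name__}
        frames = [self.data]
        return header, frames

    @classmethod
    def deserialize(cls, header, frames):
        return cls(frames[0])
```

With this shape, `isinstance(obj, Serializable)` goes through the default `type.__instancecheck__`, and a custom metaclass does not need to inherit from `ABCMeta`. The trade-off is that a missing override is reported only when the method is called, not at instantiation.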

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - https://github.com/brandon-b-miller

URL: rapidsai#8254
This PR implements `lists::concatenate_list_elements` for list type. Given a lists column in which each row is a list of lists, the output column is generated by concatenating all lists in the same row into a single list.

Example:
```
l = [ [{1, 2}, {3, 4}, {5}], [{6}, {}, {7, 8, 9}] ]
r = lists::concatenate_list_elements(l);
r is [ {1, 2, 3, 4, 5}, {6, 7, 8, 9} ]
```

This closes rapidsai#8164. In addition, `lists::concatenate_rows` is rewritten using `lists::interleave_columns` followed by `lists::concatenate_list_elements`, which is significantly shorter.
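
For readers who prefer to see the row-wise semantics spelled out, here is a rough model using plain Python lists. It only mirrors the example above; it is not the libcudf implementation, and it ignores null handling entirely (an assumption made to keep the sketch short).

```python
def concatenate_list_elements(rows):
    """Model of the row-wise behavior: each row is a list of lists,
    and the output row is those inner lists joined end to end.
    Nulls are not modeled here."""
    return [
        [element for inner in row for element in inner]
        for row in rows
    ]


# The example from the description, written with Python lists.
l = [[[1, 2], [3, 4], [5]], [[6], [], [7, 8, 9]]]
r = concatenate_list_elements(l)
```

Here `r` holds one flattened list per input row, and an empty inner list contributes nothing to its row.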

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Jason Lowe (https://github.com/jlowe)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Devavret Makkar (https://github.com/devavret)

URL: rapidsai#8231
Merge `branch-0.19` into `branch-21.06` [skip ci]
This PR updates the `0.20` references in `CHANGELOG.md` to be `21.06`.

Authors:
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - https://github.com/jakirkham

URL: rapidsai#8303
Since we want GDS reads/writes to be 4 KiB aligned, sometimes we can't use the `DeviceMemoryBuffer` as is and need to adjust the size written. This change makes the JNI APIs more flexible to accommodate those.
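
The alignment arithmetic involved can be sketched as follows. The helper names are hypothetical and are not part of the JNI API; the description does not say whether sizes are rounded up or down, so both directions are shown.

```python
GDS_ALIGNMENT = 4 * 1024  # 4 KiB, as stated in the description


def align_down(n, alignment=GDS_ALIGNMENT):
    """Largest multiple of `alignment` that is <= n."""
    return n - (n % alignment)


def align_up(n, alignment=GDS_ALIGNMENT):
    """Smallest multiple of `alignment` that is >= n."""
    return align_down(n + alignment - 1, alignment)
```

A buffer whose length is not a multiple of 4 KiB would then be written with an adjusted size rather than its exact length, which is why the caller needs to pass the size explicitly.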

Authors:
  - Rong Ou (https://github.com/rongou)

Approvers:
  - Jason Lowe (https://github.com/jlowe)

URL: rapidsai#8301
…ai#8265)

Make it so that this works:

```
x = cudf.Series([[1,2,None]])
x[0]
# [1, 2, <NA>]
```

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Michael Wang (https://github.com/isVoid)

URL: rapidsai#8265
Closes rapidsai#7561 

This PR makes sure that, when a cudf object is constructed from a pyarrow array, the array's nested types are copied to the cudf object. This should handle arbitrary nesting of `Lists` and `Structs`. For decimal types, the precision is copied from the array.

Authors:
  - Michael Wang (https://github.com/isVoid)
  - Keith Kraus (https://github.com/kkraus14)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)

URL: rapidsai#8244
Signed-off-by: Peixin Li <[email protected]>

Supplement to rapidsai#8267.
As discussed, the cudf JNI and plugin will follow the pattern YY.MM.P.

Authors:
  - pxLi (https://github.com/pxLi)

Approvers:
  - Jason Lowe (https://github.com/jlowe)
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: rapidsai#8292
After the rework of `cudf::lists::concatenate_rows`, a change in null handling caused the [corresponding cuDF Java tests](https://github.com/rapidsai/cudf/blob/branch-21.06/java/src/test/java/ai/rapids/cudf/ColumnVectorTest.java#L2234) to fail.
Specifically, when `concatenate_null_policy::IGNORE` is applied, the output lists are always null-free, even if the input data contains rows consisting entirely of nulls.

In my opinion, we should create a null mask for input rows that are `all_nulls`, to stay aligned with single-column concatenate.

Signed-off-by: sperlingxx <[email protected]>

Authors:
  - Alfred Xu (https://github.com/sperlingxx)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#8312
Replaces CUDA 10.1/10.2 with 11.0/11.2.

Authors:
  - Ray Douglass (https://github.com/raydouglass)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#8315
Fixes the group-by portion of rapidsai#7611.

When `COLLECT_LIST()` or `COLLECT_SET()` aggregations are called on a grouped input, if the input column is empty, then one sees the following failure:
```
C++ exception with description "cuDF failure at: .../cpp/src/column/column_factories.cpp:67: 
make_empty_column is invalid to call on nested types" thrown in the test body.
```
The operation should have resulted in an empty `LIST` column. `make_empty_column()` does not support `LIST` types (in part because the `data_type` parameter does not capture the types of the child columns).

This commit fixes this by constructing the output column from the specified `values` input, but only for `COLLECT_LIST()` and `COLLECT_SET()`; other aggregation types are unchanged.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Conor Hoekstra (https://github.com/codereport)
  - Nghia Truong (https://github.com/ttnghia)
  - https://github.com/nvdbaranec

URL: rapidsai#8279
Addresses rapidsai#7110 

`column_to_strings_fn` was specialized for fixed-point types to enable support in the CSV writer. A test was added to validate the output file created by the CSV writer for a decimal type column.

Authors:
  - Kumar Aatish (https://github.com/kaatish)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - David Wendt (https://github.com/davidwendt)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Devavret Makkar (https://github.com/devavret)

URL: rapidsai#8296
This enables implicit casting when decimal columns are concatenated with numeric columns by casting the numeric columns to decimal columns.

Closes rapidsai#8264

Authors:
  - https://github.com/ChrisJar

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - https://github.com/brandon-b-miller

URL: rapidsai#8276
…#8282)

Closes rapidsai#4728 

This PR adds a new parameter to the `cudf::strings::concatenate` APIs to specify if separators should be added between null entries when the null-replacement (narep) parameter is valid. If the narep scalar is invalid (i.e. null itself) then the entire output row becomes null. If not, separators are added between each element. Examples:

```
s1 = ['a', 'b', null, 'dd', null]
s2 = ['A', null, 'CC', 'D', null]
concatenate( {s1, s2}, sep='+', narep=invalid ) -> ['a+A', null, null, 'dd+D', null]
concatenate( {s1, s2}, sep='+', narep='@' ) -> ['a+A', 'b+@', '@+CC', 'dd+D', '@+@']
concatenate( {s1, s2}, sep='+', narep='' ) -> ['a+A', 'b+', '+CC', 'dd+D', '+']
```

The new parameter is an enum `separator_on_nulls` which has `YES` or `NO` settings. The default parameter value will be `YES` to keep the current behavior as expected by Python cudf and for consistency with Pandas behavior.
Specifying `NO` here will suppress the separator with null elements (when narep is valid).

```
concatenate( {s1, s2}, sep='+', narep='', NO ) -> ['a+A', 'b', 'CC', 'dd+D', '']
```
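
The examples above can be reproduced with a small pure-Python model. This is a sketch, not the libcudf implementation: `None` stands in for a null entry or an invalid `narep`, and the boolean `separator_on_nulls` argument stands in for the `YES`/`NO` enum. The source only shows `NO` with an empty `narep`; how this model behaves for `NO` with a non-empty `narep` is an assumption.

```python
def concatenate(columns, sep="", narep=None, separator_on_nulls=True):
    """Row-wise string concatenation over equally sized columns."""
    out = []
    for row in zip(*columns):
        # An invalid (null) narep makes any row containing a null become null.
        if narep is None and any(value is None for value in row):
            out.append(None)
            continue
        parts = []
        need_sep = False
        for value in row:
            is_null = value is None
            # With separator_on_nulls=False, skip separators next to nulls.
            if need_sep and (separator_on_nulls or not is_null):
                parts.append(sep)
            parts.append(narep if is_null else value)
            need_sep = need_sep or separator_on_nulls or not is_null
        out.append("".join(parts))
    return out


s1 = ["a", "b", None, "dd", None]
s2 = ["A", None, "CC", "D", None]
default_result = concatenate([s1, s2], sep="+", narep="")
no_sep_result = concatenate([s1, s2], sep="+", narep="", separator_on_nulls=False)
```

`default_result` and `no_sep_result` correspond to the `narep=''` examples with `YES` and `NO` respectively.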

This PR also renames the `cudf::strings::concatenate_list_elements` API to `cudf::strings::join_list_elements`. The API pattern and behavior mimic `cudf::strings::join_strings` more than the concatenate functions. These are also called by the Python `join` functions, so the rename makes it more consistent with cudf.

This is a breaking change in order to make these APIs more consistent. Previously, the separators column version returned nulls only for an all-null row. This has been changed to honor the `separator_on_nulls` parameter instead. There is currently no Python cudf API calling this version. Only the rename required minor changes to the Cython layer.

The gtests were updated to reflect the new behavior. None of the pytests required any changes since the default parameter value matches the original behavior for those APIs that cudf actually calls.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Nghia Truong (https://github.com/ttnghia)
  - Keith Kraus (https://github.com/kkraus14)
  - Thomas Graves (https://github.com/tgravescs)
  - Christopher Harris (https://github.com/cwharris)

URL: rapidsai#8282
This small PR replaces the JNI implementation with the corresponding cudf API `make_column_from_scalar`.

PR rapidsai#8185 added support for nested types, so it is OK to do this now.

Signed-off-by: Firestarman <[email protected]>

Authors:
  - Liangcai Li (https://github.com/firestarman)

Approvers:
  - Bobby Wang (https://github.com/wbo4958)
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Jason Lowe (https://github.com/jlowe)

URL: rapidsai#8310
Adds a document to describe cuIO behavior with respect to the GDS library use.
Also includes a disclaimer about the current state of the integration.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Keith Kraus (https://github.com/kkraus14)

URL: rapidsai#8293
…_key" (rapidsai#8263)

Reverts rapidsai#8199

According to @allisonvacanti (NVIDIA/thrust#1424 (comment)) this patch will likely have an adverse effect on performance. We should revert it until a better solution can be found.

Authors:
  - Christopher Harris (https://github.com/cwharris)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Keith Kraus (https://github.com/kkraus14)
  - Elias Stehle (https://github.com/elstehle)

URL: rapidsai#8263
This PR closes rapidsai#7067.
This was implemented by adding the `_is_homogeneous` property to `DataFrame`. Included are appropriate test cases.
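
The description does not spell out what the property checks. One plausible reading, shown here purely as an assumption, is that a frame is homogeneous when all of its columns share a single dtype. The `TinyFrame` class below is hypothetical and is not the cudf `DataFrame`.

```python
class TinyFrame:
    """Hypothetical stand-in for a DataFrame: maps column names to
    (dtype name, values) pairs."""

    def __init__(self, columns):
        self._data = dict(columns)

    @property
    def _is_homogeneous(self):
        # Assumed meaning: every column has the same dtype.
        # An empty frame is treated as homogeneous in this sketch.
        dtypes = {dtype for dtype, _ in self._data.values()}
        return len(dtypes) <= 1


same = TinyFrame({"a": ("int64", [1, 2]), "b": ("int64", [3, 4])})
mixed = TinyFrame({"a": ("int64", [1, 2]), "b": ("float64", [0.5, 1.5])})
```

Under this reading, `same._is_homogeneous` is true and `mixed._is_homogeneous` is false.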

Authors:
  - https://github.com/shaneding

Approvers:
  - https://github.com/brandon-b-miller
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#8299
@galipremsagar galipremsagar requested review from a team as code owners May 24, 2021 14:41
@github-actions github-actions bot added the libcudf (Affects libcudf (C++/CUDA) code) label May 24, 2021
@galipremsagar galipremsagar added the non-breaking (Non-breaking change) and improvement (Improvement / enhancement to an existing function) labels May 24, 2021
@kkraus14
Collaborator

@galipremsagar when we fix the auto-merge we typically want to keep the history from the source branch so the history of 21.08 is compatible with 21.06.

This prevents things like partition from working with deeply nested arrays.

I marked this as non-breaking, but I am happy to change it to breaking because I removed a detailed API that is not used anywhere else and is flawed.

Authors:
  - Robert (Bobby) Evans (https://github.com/revans2)

Approvers:
  - https://github.com/nvdbaranec
  - Conor Hoekstra (https://github.com/codereport)
  - Jason Lowe (https://github.com/jlowe)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: rapidsai#8314
@galipremsagar galipremsagar force-pushed the branch-21.08-merge-branch-21.06 branch from f6c945b to fc2cccc Compare May 24, 2021 15:04
Contributor

@revans2 revans2 left a comment

The java changes look fine.

@galipremsagar galipremsagar force-pushed the branch-21.08-merge-branch-21.06 branch 2 times, most recently from 4cad0e5 to b50cf9b Compare May 24, 2021 15:20
@galipremsagar
Contributor Author

rerun tests

CHANGELOG.md: outdated review comment (resolved)
…i#8321)

This PR updates the environment variable that's used to determine the `cuda_version` variable in our conda recipes.

The `CUDA` environment variable is explicitly set by the Ops team in our Jenkins jobs, whereas `CUDA_VERSION` comes from the `nvidia/cuda` Docker images that we base our images from.

Authors:
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#8321
@galipremsagar galipremsagar force-pushed the branch-21.08-merge-branch-21.06 branch from b50cf9b to 3e296c4 Compare May 24, 2021 15:51
README.md: two outdated review comments (resolved)
tgravescs and others added 2 commits May 24, 2021 17:05
to stringConcatenate when using a scalar separator.

rapidsai#8282 changed `stringConcatenate` to throw an exception if only a single column is passed in when using a scalar separator. This updates our Java test for that functionality.

Signed-off-by: Thomas Graves <[email protected]>

Authors:
  - Thomas Graves (https://github.com/tgravescs)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Jason Lowe (https://github.com/jlowe)

URL: rapidsai#8330
As part of commit rapidsai@8406522, we accidentally changed the release version in the README to `21.06`, whereas the stable version currently in the `rapidsai` channel is `0.19`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)

URL: rapidsai#8331
@kkraus14
Collaborator

@galipremsagar you need to pull in the latest merged PRs from 21.06 now as they don't show in this PR.

@galipremsagar galipremsagar force-pushed the branch-21.08-merge-branch-21.06 branch from 3e296c4 to cd4cfba Compare May 24, 2021 18:25
@galipremsagar
Contributor Author

> @galipremsagar you need to pull in the latest merged PRs from 21.06 now as they don't show in this PR.

Pushed the latest changes

@kkraus14 kkraus14 merged commit 90a244c into rapidsai:branch-21.08 May 24, 2021
Labels
  - CMake: CMake build issue
  - improvement: Improvement / enhancement to an existing function
  - Java: Affects Java cuDF API.
  - libcudf: Affects libcudf (C++/CUDA) code.
  - non-breaking: Non-breaking change
  - Python: Affects Python cuDF API.