Resolve auto merge conflicts for Branch 21.08 from branch 21.06 #8329
This PR adds support to `make_column_from_scalar` for `list_scalar`. For 0-length columns, it returns a well-formed `LIST` type column whose child column has the same column hierarchy as the row data stored in `list_scalar`.
Example:
```
slr.data = [1, 2, 3] // An integer list of 1, 2, 3; `data` is an INT column
make_column_from_scalar(slr, 2) // List<int> column: {[1, 2, 3], [1, 2, 3]}, whose child column is an `INT` column.
slr.data = [[1, 2], [3]] // A list of integer lists; `data` is a List<int> column
make_column_from_scalar(slr, 0) // Well-formed, 0-length List<List<int>> column, whose child column is a List<int> column.
```
Closes rapidsai#8088 Authors: - Michael Wang (https://github.com/isVoid) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Devavret Makkar (https://github.com/devavret) - Mark Harris (https://github.com/harrism) URL: rapidsai#8185
This PR supports creating a `ColumnVector` from byte arrays of UTF-8 strings, and also lets `Struct` children creation support UTF-8 strings. Closes rapidsai#8137 Signed-off-by: Firestarman <[email protected]> Authors: - Liangcai Li (https://github.com/firestarman) Approvers: - Allen Xu (https://github.com/wjxiz1992) - Jason Lowe (https://github.com/jlowe) - Robert (Bobby) Evans (https://github.com/revans2) - Alfred Xu (https://github.com/sperlingxx) URL: rapidsai#8257
This is a small PR to support creating a scalar from an array of UTF-8 bytes. Since PR rapidsai#8257 added this support for ColumnVector creation, I think we should also add it for scalar creation, to avoid conversions between UTF-8 strings and Java strings when used in Spark. Signed-off-by: Firestarman <[email protected]> Authors: - Liangcai Li (https://github.com/firestarman) Approvers: - Bobby Wang (https://github.com/wbo4958) URL: rapidsai#8294
Depends on rapidsai/rmm#768. Authors: - Rong Ou (https://github.com/rongou) Approvers: - Jason Lowe (https://github.com/jlowe) URL: rapidsai#8266
Currently the Serializable class provides `serialize` and `deserialize` as `abstractmethod`s via the mechanisms afforded by inheritance from `abc.ABC`. Since this class is purely internal to `cudf` and is not describing an abstract interface in a manner useful to consumers of our code, the benefits of the abstract base class concept are outweighed by the performance and maintenance costs. In particular, `isinstance` checks on subclasses of `abc.ABC` are much more expensive than for normal classes (due to an expensive implementation of `__instancecheck__`), and (for better or worse) our code base currently makes use of these checks extensively. In addition, in certain places we can benefit from the use of custom metaclasses in `cudf`, but their usage becomes more cumbersome with `ABC` because metaclasses then also have to inherit from `ABCMeta` (which brings along any associated complexities). This PR removes that inheritance, replacing it with a much simpler approach that simply implements `serialize` and `deserialize` as raising `NotImplementedError`. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Ashwin Srinath (https://github.com/shwina) - https://github.com/brandon-b-miller URL: rapidsai#8254
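The approach described above can be sketched in plain Python. This is a minimal illustration, not the actual `cudf` class: the `(header, frames)` return shape, the `deserialize(header, frames)` signature, and the `Blob` subclass are assumptions made for the example.

```python
class Serializable:
    """Plain base class: no abc.ABC, so the metaclass is the ordinary `type`."""

    def serialize(self):
        # Subclasses are expected to override this (assumed return: header, frames).
        raise NotImplementedError(f"{type(self).__name__} must implement serialize")

    @classmethod
    def deserialize(cls, header, frames):
        # Subclasses are expected to override this as well.
        raise NotImplementedError(f"{cls.__name__} must implement deserialize")


class Blob(Serializable):
    """Hypothetical subclass used only for illustration."""

    def __init__(self, payload: bytes):
        self.payload = payload

    def serialize(self):
        header = {"type": type(self).__name__}
        frames = [self.payload]
        return header, frames

    @classmethod
    def deserialize(cls, header, frames):
        return cls(frames[0])
```

Because `Serializable` no longer uses `ABCMeta`, `isinstance` checks go through the default `type.__instancecheck__`, and a custom metaclass can be attached without also inheriting from `ABCMeta`. The trade-off is that a missing override is only reported when the method is called, not when the class is instantiated.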
This PR implements `lists::concatenate_list_elements` for list type. Given a lists column in which each row is a list of lists, the output column is generated by concatenating all lists in the same row into a single list.
Example:
```
l = [ [{1, 2}, {3, 4}, {5}], [{6}, {}, {7, 8, 9}] ]
r = lists::concatenate_list_elements(l);
r is [ {1, 2, 3, 4, 5}, {6, 7, 8, 9} ]
```
This closes rapidsai#8164. In addition, `lists::concatenate_rows` is rewritten using `lists::interleave_columns` followed by `lists::concatenate_list_elements`, which is significantly shorter. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Jason Lowe (https://github.com/jlowe) - AJ Schmidt (https://github.com/ajschmidt8) - GALI PREM SAGAR (https://github.com/galipremsagar) - Devavret Makkar (https://github.com/devavret) URL: rapidsai#8231
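The row-wise semantics in the example can be modeled with ordinary Python lists. This is only a sketch of the described behavior, not the libcudf implementation; it ignores null handling entirely, and the function name is reused here purely for readability.

```python
def concatenate_list_elements(rows):
    """Flatten one level of nesting within each row.

    `rows` models a lists column in which every row is a list of lists.
    Null rows and null elements are not modeled in this sketch.
    """
    return [[value for inner in row for value in inner] for row in rows]


column = [
    [[1, 2], [3, 4], [5]],
    [[6], [], [7, 8, 9]],
]
result = concatenate_list_elements(column)
print(result)
```

The row count is preserved; only the nesting depth of each row drops by one.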
Authors: - Ashwin Srinath (https://github.com/shwina) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Keith Kraus (https://github.com/kkraus14) - Michael Wang (https://github.com/isVoid) - Christopher Harris (https://github.com/cwharris) - Gera Shegalov (https://github.com/gerashegalov) URL: rapidsai#8272
Merge `branch-0.19` into `branch-21.06` [skip ci]
This PR updates the `0.20` references in `CHANGELOG.md` to be `21.06`. Authors: - AJ Schmidt (https://github.com/ajschmidt8) Approvers: - https://github.com/jakirkham URL: rapidsai#8303
Since we want GDS reads/writes to be 4 KiB aligned, sometimes we can't use the `DeviceMemoryBuffer` as is and need to adjust the size written. This change makes the JNI APIs more flexible to accommodate those. Authors: - Rong Ou (https://github.com/rongou) Approvers: - Jason Lowe (https://github.com/jlowe) URL: rapidsai#8301
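As an illustration of the size adjustment mentioned above, the sketch below rounds a byte count up to the next 4 KiB boundary. It is an assumption about what "adjust the size written" could mean and does not reproduce the JNI API; `aligned_size` is a hypothetical helper.

```python
ALIGNMENT = 4096  # 4 KiB, the alignment named in the description


def aligned_size(num_bytes: int, alignment: int = ALIGNMENT) -> int:
    """Round num_bytes up to the nearest multiple of alignment (hypothetical helper)."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    return (num_bytes + alignment - 1) // alignment * alignment
```

With a helper like this, a caller could pass the padded length to the write call while the buffer itself keeps its logical length, which is the kind of flexibility the description asks for.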
…ai#8265)
Make it so that this works:
```
x = cudf.Series([[1,2,None]])
x[0] # [1, 2, <NA>]
```
Authors: - https://github.com/brandon-b-miller Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Michael Wang (https://github.com/isVoid) URL: rapidsai#8265
Closes rapidsai#7561 This PR makes sure that, when constructing a cudf object, nested types from the pyarrow array are copied to the cudf object. This should handle arbitrary nesting of `Lists` and `Structs`. For decimal types, precision is copied from the array. Authors: - Michael Wang (https://github.com/isVoid) - Keith Kraus (https://github.com/kkraus14) Approvers: - Keith Kraus (https://github.com/kkraus14) URL: rapidsai#8244
Signed-off-by: Peixin Li <[email protected]> Supplement to rapidsai#8267. As discussed, the cudf JNI and plugin will follow the pattern YY.MM.P. Authors: - pxLi (https://github.com/pxLi) Approvers: - Jason Lowe (https://github.com/jlowe) - Robert (Bobby) Evans (https://github.com/revans2) URL: rapidsai#8292
After the rework of `cudf::lists::concatenate_rows`, a change in null handling caused the [corresponding cuDF Java tests](https://github.com/rapidsai/cudf/blob/branch-21.06/java/src/test/java/ai/rapids/cudf/ColumnVectorTest.java#L2234) to fail. Specifically, when we apply `concatenate_null_policy::IGNORE`, the output lists are always null-free, even if the input data contains rows consisting of all nulls. In my opinion, we should create a null mask for input rows that are `all_nulls`, to stay aligned with single-column concatenate. Signed-off-by: sperlingxx <[email protected]> Authors: - Alfred Xu (https://github.com/sperlingxx) Approvers: - Nghia Truong (https://github.com/ttnghia) URL: rapidsai#8312
Replaces CUDA 10.1/10.2 with 11.0/11.2. Authors: - Ray Douglass (https://github.com/raydouglass) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#8315
Fixes the group-by portion of rapidsai#7611. When `COLLECT_LIST()` or `COLLECT_SET()` aggregations are called on a grouped input, if the input column is empty, then one sees the following failure:
```
C++ exception with description "cuDF failure at: .../cpp/src/column/column_factories.cpp:67: make_empty_column is invalid to call on nested types" thrown in the test body.
```
The operation should have resulted in an empty `LIST` column. `make_empty_column()` does not support `LIST` types (in part because the `data_type` parameter does not capture the types of the child columns). This commit fixes this by constructing the output column from the specified `values` input, but only for `COLLECT_LIST()` and `COLLECT_SET()`; other aggregation types are unchanged. Authors: - MithunR (https://github.com/mythrocks) Approvers: - Conor Hoekstra (https://github.com/codereport) - Nghia Truong (https://github.com/ttnghia) - https://github.com/nvdbaranec URL: rapidsai#8279
Addresses rapidsai#7110. `column_to_strings_fn` was specialized for fixed-point types to enable support in the CSV writer. A test was added to validate the output file created by the CSV writer for a decimal type column. Authors: - Kumar Aatish (https://github.com/kaatish) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) - Vukasin Milovanovic (https://github.com/vuule) - Devavret Makkar (https://github.com/devavret) URL: rapidsai#8296
This enables implicit casting when decimal columns are concatenated with numeric columns by casting the numeric columns to decimal columns. Closes rapidsai#8264 Authors: - https://github.com/ChrisJar Approvers: - Ashwin Srinath (https://github.com/shwina) - https://github.com/brandon-b-miller URL: rapidsai#8276
…#8282)
Closes rapidsai#4728
This PR adds a new parameter to the `cudf::strings::concatenate` APIs to specify whether separators should be added between null entries when the null-replacement (narep) parameter is valid. If the narep scalar is invalid (i.e. null itself) then the entire output row becomes null. If not, separators are added between each element.
Examples:
```
s1 = ['a', 'b', null, 'dd', null]
s2 = ['A', null, 'CC', 'D', null]
concatenate( {s1, s2}, sep='+', narep=invalid ) -> ['a+A', null, null, 'dd+D', null]
concatenate( {s1, s2}, sep='+', narep='@' ) -> ['a+A', 'b+@', '@+CC', 'dd+D', '@+@']
concatenate( {s1, s2}, sep='+', narep='' ) -> ['a+A', 'b+', '+CC', 'dd+D', '+']
```
The new parameter is an enum `separator_on_nulls` which has `YES` or `NO` settings. The default parameter value will be `YES` to keep the current behavior as expected by Python cudf and for consistency with Pandas behavior. Specifying `NO` here will suppress the separator with null elements (when narep is valid).
```
concatenate( {s1, s2}, sep='+', narep='', NO ) -> ['a+A', 'b', 'CC', 'dd+D', '']
```
This PR also renames the `cudf::strings::concatenate_list_elements` API to `cudf::strings::join_list_elements`. The API pattern and behavior mimic `cudf::strings::join_strings` more closely than the concatenate functions. Also, these are called by the Python `join` functions, so the rename makes it more consistent with cudf.
This is a breaking change in order to make these APIs more consistent. Previously, the separators column version was returning nulls only for an all-null row. This has been changed to honor the `separator_on_nulls` parameter instead. Currently there is no Python cudf API calling this version. Only the rename required minor changes to the Cython layer. The gtests were updated to reflect the new behavior. None of the pytests required any changes since the default parameter value matches the original behavior for those APIs that cudf actually calls.
Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Nghia Truong (https://github.com/ttnghia) - Keith Kraus (https://github.com/kkraus14) - Thomas Graves (https://github.com/tgravescs) - Christopher Harris (https://github.com/cwharris) URL: rapidsai#8282
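The examples in the description above can be reproduced with a small pure-Python model. This is a sketch of the documented behavior only, not the libcudf implementation: `None` stands in for both a null element and an invalid `narep`, a boolean stands in for the `separator_on_nulls` enum, and the `NO` branch assumes null elements contribute nothing, which the description shows only for `narep=''`.

```python
def concatenate(columns, sep, narep=None, separator_on_nulls=True):
    """Model of row-wise string concatenation with a scalar separator.

    narep=None models an invalid (null) replacement scalar.
    separator_on_nulls=True models YES, False models NO.
    """
    out = []
    for row in zip(*columns):
        if narep is None:
            # Invalid narep: any null element makes the whole output row null.
            out.append(None if any(v is None for v in row) else sep.join(row))
        elif separator_on_nulls:
            # YES: nulls are replaced by narep and separators are always written.
            out.append(sep.join(narep if v is None else v for v in row))
        else:
            # NO (assumption): null elements and their separators are dropped.
            out.append(sep.join(v for v in row if v is not None))
    return out


s1 = ["a", "b", None, "dd", None]
s2 = ["A", None, "CC", "D", None]
print(concatenate([s1, s2], sep="+", narep="", separator_on_nulls=False))
```

The behavior of `NO` combined with a non-empty `narep` is not spelled out in the description, so this model should not be read as authoritative for that case.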
This small PR is to replace the JNI implementation with the corresponding cudf API `make_column_from_scalar`. The PR rapidsai#8185 has added the support for nested type, so it is ok to do this now. Signed-off-by: Firestarman <[email protected]> Authors: - Liangcai Li (https://github.com/firestarman) Approvers: - Bobby Wang (https://github.com/wbo4958) - Robert (Bobby) Evans (https://github.com/revans2) - Jason Lowe (https://github.com/jlowe) URL: rapidsai#8310
Adds a document to describe cuIO behavior with respect to the GDS library use. Also includes a disclaimer about the current state of the integration. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Keith Kraus (https://github.com/kkraus14) URL: rapidsai#8293
…_key" (rapidsai#8263) Reverts rapidsai#8199 According to @allisonvacanti (NVIDIA/thrust#1424 (comment)) this patch will likely have adverse effect on performance. We should revert it until a better solution can be found. Authors: - Christopher Harris (https://github.com/cwharris) Approvers: - David Wendt (https://github.com/davidwendt) - Keith Kraus (https://github.com/kkraus14) - Elias Stehle (https://github.com/elstehle) URL: rapidsai#8263
This PR closes rapidsai#7067. This was implemented by adding the `_is_homogeneous` property to `DataFrame`. Included are appropriate test cases. Authors: - https://github.com/shaneding Approvers: - https://github.com/brandon-b-miller - GALI PREM SAGAR (https://github.com/galipremsagar) URL: rapidsai#8299
@galipremsagar when we fix the auto-merge we typically want to keep the history from the source branch so the history of
This prevents things like partition from working with deeply nested arrays. I marked this as non-breaking, but I am happy to change it to breaking because I removed a detailed API that is not used anywhere else and is flawed. Authors: - Robert (Bobby) Evans (https://github.com/revans2) Approvers: - https://github.com/nvdbaranec - Conor Hoekstra (https://github.com/codereport) - Jason Lowe (https://github.com/jlowe) - Jake Hemstad (https://github.com/jrhemstad) URL: rapidsai#8314
f6c945b to fc2cccc
The java changes look fine.
4cad0e5 to b50cf9b
rerun tests
…i#8321) This PR updates the environment variable that's used to determine the `cuda_version` variable in our conda recipes. The `CUDA` environment variable is explicitly set by the Ops team in our Jenkins jobs, whereas `CUDA_VERSION` comes from the `nvidia/cuda` Docker images that we base our images on. Authors: - AJ Schmidt (https://github.com/ajschmidt8) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#8321
b50cf9b to 3e296c4
to stringConcatenate when using a scalar separator. Reference rapidsai#8282 changed to throw an exception if only a single column is passed in to the stringConcatenate using scalar separator. Update our Java test for that functionality. Signed-off-by: Thomas Graves <[email protected]> Authors: - Thomas Graves (https://github.com/tgravescs) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - Jason Lowe (https://github.com/jlowe) URL: rapidsai#8330
As part of this commit rapidsai@8406522 we accidentally changed the release version of readme to `21.06`, whereas the stable version currently in `rapidsai` channel is `0.19`. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Keith Kraus (https://github.com/kkraus14) URL: rapidsai#8331
@galipremsagar you need to pull in the latest merged PRs from 21.06 now as they don't show in this PR.
3e296c4 to cd4cfba
Pushed the latest changes
This PR resolves conflicts for auto-merger to proceed: #8317