[gpuCI] Auto-merge branch-0.18 to branch-0.19 [skip ci] #3444

GPUtester · 2021-02-01T18:27:54Z

Auto-merge triggered by push to branch-0.18 that creates a PR to keep branch-0.19 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge.

@ajschmidt8

This PR prepares the changelog to be automatically updated during releases. Authors: - AJ Schmidt (@ajschmidt8) Approvers: - Dante Gama Dessavre (@dantegd) URL: #3442

GPUtester · 2021-02-01T18:28:05Z

FAILURE - Unable to auto-merge due to conflicts. Manual merge necessary. Refer to https://docs.rapids.ai/maintainers/gpuci/#auto-mergers for instructions.

@jameslamb

#3331) (#3388) This attempts to fix #3331. See that issue for a lot more details. Today, `.get_combined_model()` for the Dask RandomForest model objects returns `None` if it's called immediately after training. That pattern is recommended in ["Distributed Model Pickling"](https://docs.rapids.ai/api/cuml/stable/pickling_cuml_models.html#Distributed-Model-Pickling). Without this support, there is not a way to save a Dask RandomForest model using only public methods / attributes on those classes. Per #3331 (comment), this PR proposes populating the internal model object whenever `get_combined_model()` is called. ## Notes for Reviewers * I have not tested this locally. I spent about 3 hours trying to build `cuml` from source following https://github.com/rapidsai/cuml/blob/main/BUILD.md, and was not successful. If there is a containerized setup for developing `cuml`, I'd greatly appreciate it and would be happy to try it out. I've added a unit test for this change, so I hope that will be enough to confirm that this works and that CI will catch any mistakes I've made. Thanks for your time and consideration. Authors: - James Lamb (@jameslamb) - John Zedlewski (@JohnZed) Approvers: - John Zedlewski (@JohnZed) URL: #3388

@vinaydes

…#3364) Feature sampling was not supported for experimental backend of Random Forest (RF). This PR implements feature sampling without needing any auxiliary memory storage. Thus the `colids` array is also removed. Authors: - Vinay Deshpande (@vinaydes) Approvers: - Thejaswi. N. S (@teju85) - Philip Hyunsu Cho (@hcho3) - John Zedlewski (@JohnZed) URL: #3364

@mdemoret-nv

This PR sets the default type of "interpolated text" in sphinx to `:py:obj:`. This is very useful for us since we frequently use a single backtick in our python documentation to refer to another python object. Currently, the docstring: ``` `cuml.datasets.make_blobs` ``` Would generate (italicized, variable spaced): ![image](https://user-images.githubusercontent.com/42954918/106529509-dd4c1900-64a7-11eb-9977-49a2594c7c3e.png) This PR changes it to (bold, mono spaced): ![image](https://user-images.githubusercontent.com/42954918/106529282-77f82800-64a7-11eb-86e2-a38e37ccea0d.png) The added benefit here is if the interpolated text is found in the index, it will link to that section. So in the above example, clicking on `cuml.datasets.make_blobs` will take you to the function documentation. Finally, this PR adds a new type of interpolated text role: `:py:` . This should be used for inline python code. For example, the following code: ``` * `cuml.cuml.datasets.make_blobs` for references do objects (functions, classes, modules, etc.) * :py:`import cupy as cp` for inline python code * ``import cupy as cp`` for literal code ``` will generate: ![image](https://user-images.githubusercontent.com/42954918/106530276-3f594e00-64a9-11eb-8edf-569fc9dd829e.png) I also looked for a few examples to replace to help seed usage of these new options. Updating every location would be very time consuming and is best done over time. Authors: - Michael Demoret (@mdemoret-nv) Approvers: - Dante Gama Dessavre (@dantegd) URL: #3445

@cjnolet

This Pull Request adds initial support for multi-node multi-GPU DBSCAN, and fixes the bugs identified in #3094. It works by copying the dataset on all the workers and giving ownership of a subset of points to each one. The workers compute a partial clustering with the knowledge of the relationships between their points and the rest of the dataset, and the partial clusterings are merged to form the final labeling. This merging algorithm is also used to accumulate the results in case a batch-wise approach is used on a worker to limit the memory consumption. The multi-GPU implementation gives great speedups for large datasets, while for small datasets the performance is dominated by the Dask launch overhead, as shown in the figure below: ![mnmg_dbscan_perf](https://user-images.githubusercontent.com/17441062/104958437-55a6da80-59d0-11eb-8a18-fcca0d69c41b.png) Notes: - I have renamed variables in the DBSCAN implementation to match our style conventions (snake case). Sorry for the noise that it adds to this PR. - I refactored some CSR tests to accept multiple test cases instead of hardcoded ones, in order to add corner cases to weak CC. PR #3157 by @cjnolet changed the location of these tests, so I moved those that I had already refactored accordingly. At the moment only the tests that were in `cpp/test/prims/csr.cu` previously have been refactored. I was thinking that the others can be refactored later, I'd like @cjnolet's opinion on this refactoring. - Regarding testing, the MNMG tests are mostly a copy of the single-GPU ones, though I removed a few tests with very small datasets to avoid problems with MNMG (it doesn't really support the edge case where a worker owns 0 sample, as I think it's a fair assumption that MNMG DBSCAN isn't used with such a tiny dataset). - Also regarding tests, I changed the comparison function to account for the fact that border points are ambiguous. It assumes that the labeling of core points is minimal in both our implementation and the reference, so if this assumption changes we will need to update the tests accordingly. If you want to access a pseudo-code description and proof of the new algorithm, feel free to contact me. Tagging people to whom this PR is relevant: @teju85 @tfeher @MatthiasKohl @canonizer Authors: - Louis Sugy (@Nyrio) Approvers: - Tamas Bela Feher (@tfeher) - Corey J. Nolet (@cjnolet) URL: #3382

@viclafargue

Answers #3197 Currently, MNMG KNN and MNMG KNN Cl&Re use separate code. This PR intends to refactor the code on both the Python and C++ sides to use common code as much as possible. In addition, the functions in the C++ code currently have a very large number of arguments, making them unreadable to newcomers. This PR will work on reducing their number with the use of parameter structures like it's done elsewhere in cuML's codebase. Authors: - Victor Lafargue (@viclafargue) Approvers: - Corey J. Nolet (@cjnolet) - Dante Gama Dessavre (@dantegd) URL: #3307

@wphicks

Switch to api_return_generic decorator in order to get correct output type from make_classification Provide tests of global_output_type compliance for all dataset generators Authors: - William Hicks (@wphicks) Approvers: - John Zedlewski (@JohnZed) - Corey J. Nolet (@cjnolet) URL: #3415

@vinaydes

This PR introduces/proposes some of the basic and core (gpu-friendly!) data structures for implementing gplearn in cuML in order to address the issue #2121 . Tagging all who will be involved in this development: @vinaydes @venkywonka @vimarsh6739. PS: It also contains an experimental register-based stack implementation that will be useful while implementing CUDA-based AST evaluation, which is needed for organizing tournaments. Authors: - Thejaswi. N. S (@teju85) Approvers: - Corey J. Nolet (@cjnolet) URL: #3387

@divyegala

Authors: - Divye Gala (@divyegala) Approvers: - John Zedlewski (@JohnZed) URL: #3452

@cjnolet

…screpancies (#3454) Closes #3449 Authors: - Corey J. Nolet (@cjnolet) Approvers: - William Hicks (@wphicks) - Dante Gama Dessavre (@dantegd) URL: #3454

@levsnv

…3370) currently, the new tests take 4s each to run, despite the bare minimum size. We might be able to accelerate FIL for this usecase somwhat (later), so might make sense to merge the slow tests in once they all pass. Authors: - @levsnv Approvers: - Andy Adinets (@canonizer) - John Zedlewski (@JohnZed) URL: #3370

@wphicks

Mark kbinsdiscretizer quantile tests as xfail to avoid cupy/cupy#4607 Authors: - William Hicks (@wphicks) Approvers: - John Zedlewski (@JohnZed) - Michael Demoret (@mdemoret-nv) URL: #3450

@tfeher

closes #3233 This PR introduces 64bit int arithmetic while accessing SVM kernel tile to avoid overflow. Authors: - Tamas Bela Feher (@tfeher) - John Zedlewski (@JohnZed) Approvers: - John Zedlewski (@JohnZed) URL: #3420

@lowener

This PR will exclude all files from `_thirdparty` from the copyright check, since these files are under different licenses. Authors: - Micka (@lowener) Approvers: - AJ Schmidt (@ajschmidt8) - John Zedlewski (@JohnZed) URL: #3453

@cjnolet

Opening a new PR for these changes because I don't seem to be able to have push access to John Z's branch. Authors: - Corey J. Nolet (@cjnolet) - John Zedlewski (@JohnZed) Approvers: - Michael Demoret (@mdemoret-nv) URL: #3448

@lowener

In CI, the test_nearest_neighbors tests can take up to 21 min - about 5x longer than any other test. They cover a lot of functionality, so some time is expected, but there is also a lot of redundancy in parameters. This change reduces the combinatorial explosion significantly in tests and removes a number of superfluous "quality" test options we rarely use that add thousands of always-skipped tests. Locally, this reduces test time from 278s to 50s. Hopefully doesn't cause *too* many conflicts with @lowener's other knn test improvements. Authors: - John Zedlewski (@JohnZed) Approvers: - Victor Lafargue (@viclafargue) - Micka (@lowener) - Dante Gama Dessavre (@dantegd) URL: #3411

@lowener

When using default arguments on PCA I faced an error on the following line `PCA().fit(X).transform(X)`. `transform` was failing because n_components was `None` so I introduced `n_components_` to store the number of components used by the fit function without overwriting the user' `n_components`. I added some tests on this feature too. Authors: - Micka (@lowener) Approvers: - Dante Gama Dessavre (@dantegd) URL: #3320

@beckernick

closes #3366 Also removes an older build.sh script that to my knowledge we don't use anymore cc @beckernick who pointed out we need to bump it due to using cupy.full order parameter Authors: - Dante Gama Dessavre (@dantegd) Approvers: - John Zedlewski (@JohnZed) - Dillon Cullinan (@dillon-cullinan) - AJ Schmidt (@ajschmidt8) URL: #3378

@lowener

Closes #2864. This PR will replace `datasets.make_blobs` and `metrics.accuracy_score` from sklearn to cuml. Authors: - Micka (@lowener) Approvers: - John Zedlewski (@JohnZed) URL: #3408

@mdemoret-nv

Closes #3376 This PR moves a few header files around to fix bad dependencies from `include` -> `src`. Moving forward, the only allowed `#include` direction will be: - `src` -> `include` - `src` -> `src_prims` - `src_prims` -> `include` To facilitate this, a few changes needed to be made: 1. Functions for the C-API have been separated into their own `*_api.cpp` file (some were combined with C++ files) 2. `host_buffer`, `device_buffer`, `hostAllocator` and `deviceAllocator` were moved from `src_prims` to `include` 3. `#include <common/cumlHandle.hpp>` has been removed from all C++ files. This PR includes some additional improvements: - Updated the `include_checker.py` script: - Added functionality to check for badly placed `#pragma once` - Added functionality to fix some of the existing warnings - General refactor to support more checks - Fixed all incorrect uses of `#pragma once` - Fixed incorrect uses of `using namespace ...` in header files outside of a namespace - Removed some unnecessary `#include` While this touches a lot of files, the actual number of changes is relatively small. Below is a before/after comparison of the include graphs: **`src/include`:** Before: ![image](https://user-images.githubusercontent.com/42954918/105561661-ab32fe00-5cd4-11eb-8850-bfeaffaba60f.png) After: ![image](https://user-images.githubusercontent.com/42954918/105561673-b4bc6600-5cd4-11eb-8bb7-04be91f902b6.png) **`src/src_prims`:** Before: ![image](https://user-images.githubusercontent.com/42954918/105561616-79ba3280-5cd4-11eb-8a49-1d0c030df7c1.png) After: ![image](https://user-images.githubusercontent.com/42954918/105561633-89397b80-5cd4-11eb-9702-27b2a809834b.png) Authors: - Michael Demoret (@mdemoret-nv) Approvers: - Dante Gama Dessavre (@dantegd) - John Zedlewski (@JohnZed) - Thejaswi. N. S (@teju85) URL: #3402

@lowener

This solve partially #3435 by fixing the documentation of `SimpleImputer`. The next step will be to allow usage of lists for this algorithm. Authors: - Micka (@lowener) Approvers: - Michael Demoret (@mdemoret-nv) URL: #3447

codecov-io · 2021-02-05T23:55:24Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@fb1c810). Click here to learn what that means.
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-0.19    #3444   +/-   ##
==============================================
  Coverage               ?   71.77%           
==============================================
  Files                  ?      212           
  Lines                  ?    17075           
  Branches               ?        0           
==============================================
  Hits                   ?    12256           
  Misses                 ?     4819           
  Partials               ?        0

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb1c810...c9c8619. Read the comment docs.

@viclafargue

Partially answers #3354. The PR updates some of the failing MNMG tests so that nightly testing finally pass. This PR addresses MNMG RF and MNMG KNN tests failures. Authors: - Victor Lafargue (@viclafargue) Approvers: - John Zedlewski (@JohnZed) URL: #3348

@dantegd

Fixes the current CI issue dask/distributed#4492 Authors: - Dante Gama Dessavre (@dantegd) - Michael Demoret (@mdemoret-nv) Approvers: - John Zedlewski (@JohnZed) - William Hicks (@wphicks) - @jakirkham - Corey J. Nolet (@cjnolet) URL: #3474

@dantegd

Closes #1739 Addresses most items of #3224 Authors: - Dante Gama Dessavre (@dantegd) Approvers: - John Zedlewski (@JohnZed) URL: #3433

@viclafargue

Following observations in #3318 Authors: - Victor Lafargue (@viclafargue) Approvers: - John Zedlewski (@JohnZed) URL: #3472

@JohnZed

Just small readme and api.rst updates for new algos. Authors: - John Zedlewski (@JohnZed) Approvers: - Michael Demoret (@mdemoret-nv) URL: #3475

@levsnv

…#3480) This is based on #3469. All of the credit for detecting the bug, and part of the credit for fixing it goes to @levsnv. Authors: - Andy Adinets (@canonizer) Approvers: - @levsnv - John Zedlewski (@JohnZed) URL: #3480

Fix forward-merger conflicts in #3444

Prepare Changelog for Automation (#3442)

da87257

This PR prepares the changelog to be automatically updated during releases. Authors: - AJ Schmidt (@ajschmidt8) Approvers: - Dante Gama Dessavre (@dantegd) URL: #3442

rapids-bot bot requested a review from a team as a code owner February 1, 2021 19:30

github-actions bot added the Cython / Python Cython or Python issue label Feb 1, 2021

rapids-bot bot requested a review from a team as a code owner February 2, 2021 15:23

github-actions bot added the libcuml label Feb 2, 2021

mdemoret-nv and others added 2 commits February 2, 2021 19:17

rapids-bot bot requested a review from a team as a code owner February 2, 2021 19:56

github-actions bot added the CMake label Feb 2, 2021

viclafargue and others added 4 commits February 2, 2021 20:16

Relaxing Batched SilhouetteScore Test Constraint (#3452)

d5bb758

Authors: - Divye Gala (@divyegala) Approvers: - John Zedlewski (@JohnZed) URL: #3452

ajschmidt8 mentioned this pull request Feb 3, 2021

Fix forward-merger conflicts in #3444 #3455

Merged

cjnolet and others added 5 commits February 3, 2021 21:18

Fix failing sparse NN test in CI by allowing small number of index di…

d69f08c

…screpancies (#3454) Closes #3449 Authors: - Corey J. Nolet (@cjnolet) Approvers: - William Hicks (@wphicks) - Dante Gama Dessavre (@dantegd) URL: #3454

Mark kbinsdiscretizer quantile tests as xfail (#3450)

a01a55d

Mark kbinsdiscretizer quantile tests as xfail to avoid cupy/cupy#4607 Authors: - William Hicks (@wphicks) Approvers: - John Zedlewski (@JohnZed) - Michael Demoret (@mdemoret-nv) URL: #3450

Fix memory bug for SVM with large n_rows (#3420)

b6e6450

closes #3233 This PR introduces 64bit int arithmetic while accessing SVM kernel tile to avoid overflow. Authors: - Tamas Bela Feher (@tfeher) - John Zedlewski (@JohnZed) Approvers: - John Zedlewski (@JohnZed) URL: #3420

Exempting thirdparty code from copyright checks (#3453)

6844932

This PR will exclude all files from `_thirdparty` from the copyright check, since these files are under different licenses. Authors: - Micka (@lowener) Approvers: - AJ Schmidt (@ajschmidt8) - John Zedlewski (@JohnZed) URL: #3453

rapids-bot bot requested a review from a team as a code owner February 4, 2021 17:37

github-actions bot added the Ops label Feb 4, 2021

cjnolet and others added 6 commits February 4, 2021 18:47

Fix naive bayes inputs (#3448)

4f47fd0

Opening a new PR for these changes because I don't seem to be able to have push access to John Z's branch. Authors: - Corey J. Nolet (@cjnolet) - John Zedlewski (@JohnZed) Approvers: - Michael Demoret (@mdemoret-nv) URL: #3448

Replacing sklearn functions with cuml in RF MNMG notebook (#3408)

dec134a

Closes #2864. This PR will replace `datasets.make_blobs` and `metrics.accuracy_score` from sklearn to cuml. Authors: - Micka (@lowener) Approvers: - John Zedlewski (@JohnZed) URL: #3408

Fixing documentation on SimpleImputer (#3447)

c9c8619

This solve partially #3435 by fixing the documentation of `SimpleImputer`. The next step will be to allow usage of lists for this algorithm. Authors: - Micka (@lowener) Approvers: - Michael Demoret (@mdemoret-nv) URL: #3447

viclafargue and others added 5 commits February 8, 2021 17:42

Update call to dask client persist (#3474)

ec462e7

Fixes the current CI issue dask/distributed#4492 Authors: - Dante Gama Dessavre (@dantegd) - Michael Demoret (@mdemoret-nv) Approvers: - John Zedlewski (@JohnZed) - William Hicks (@wphicks) - @jakirkham - Corey J. Nolet (@cjnolet) URL: #3474

cuml.experimental SHAP improvements (#3433)

8082f3b

Closes #1739 Addresses most items of #3224 Authors: - Dante Gama Dessavre (@dantegd) Approvers: - John Zedlewski (@JohnZed) URL: #3433

Adding warning for IVFPQ (#3472)

ae2aeda

Following observations in #3318 Authors: - Victor Lafargue (@viclafargue) Approvers: - John Zedlewski (@JohnZed) URL: #3472

Minor doc updates for 0.18 (#3475)

f1abbeb

Just small readme and api.rst updates for new algos. Authors: - John Zedlewski (@JohnZed) Approvers: - Michael Demoret (@mdemoret-nv) URL: #3475

dantegd approved these changes Feb 10, 2021

View reviewed changes

ajschmidt8 merged commit 50897aa into branch-0.19 Feb 11, 2021

ajschmidt8 added a commit that referenced this pull request Feb 11, 2021

Merge pull request #3455 from ajschmidt8/branch-0.19-merge-0.18

099f103

Fix forward-merger conflicts in #3444

This was referenced Feb 11, 2021

[REVIEW] Removing sparse prims since they've been moved to raft #3476

Closed

[REVIEW] NVTX Markers for RF and RF-backend #3014

Merged

[WIP] Sparse semiring follow-on & optimizations #3468

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[gpuCI] Auto-merge branch-0.18 to branch-0.19 [skip ci] #3444

[gpuCI] Auto-merge branch-0.18 to branch-0.19 [skip ci] #3444

GPUtester commented Feb 1, 2021

GPUtester commented Feb 1, 2021

codecov-io commented Feb 5, 2021

[gpuCI] Auto-merge branch-0.18 to branch-0.19 [skip ci] #3444

[gpuCI] Auto-merge branch-0.18 to branch-0.19 [skip ci] #3444

Conversation

GPUtester commented Feb 1, 2021

GPUtester commented Feb 1, 2021

codecov-io commented Feb 5, 2021

Codecov Report