Expand hypothesis testing to all linear models #4974

csadorf · 2022-11-04T17:15:47Z

Should be merged after #4952 and #4973.

This PR adds `get_params()` member function to `TargetEncoder`. Hopefully it can resolve the issue rapidsai#4574 Authors: - Jiwei Liu (https://github.com/daxiongshu) Approvers: - Victor Lafargue (https://github.com/viclafargue) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4588

The 2.3.0 version of Treelite incorporates the following improvements:

* GTIL optimization using multiple CPU threads (dmlc/treelite#353, dmlc/treelite#355, dmlc/treelite#357, dmlc/treelite#358, dmlc/treelite#362, dmlc/treelite#367)
* dmlc/treelite#365
* dmlc/treelite#366
* dmlc/treelite#368

Requires rapidsai/integration#436

Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - William Hicks (https://github.com/wphicks) - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#4590

Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4598

…i#4593) This PR depends on rapidsai/raft#520 Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4593

Depends on rapidsai#4295.

This PR allows `libcuml++` to be built with individual algorithms, or individual families of algorithms, via the `CUML_ALGORITHMS` argument. It defaults to `ALL` and accepts multiple options, for example:

```
cmake .. -DCUML_ALGORITHMS="FIL;TREESHAP"
```

which builds a `libcuml++` containing only the FIL and GPUTreeSHAP components. A PR updating the build documentation will follow.

Authors: - Dante Gama Dessavre (https://github.com/dantegd) - Divye Gala (https://github.com/divyegala) Approvers: - William Hicks (https://github.com/wphicks) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4296

This PR allows `TargetEncoder` to encode the `variance` of the target as requested by rapidsai#4440 Authors: - Jiwei Liu (https://github.com/daxiongshu) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4483

Closes rapidsai#4341. The `classmethod` decorator seems to not be useful here and is blocking the serialization of SimpleImputer. Authors: - Micka (https://github.com/lowener) Approvers: - Victor Lafargue (https://github.com/viclafargue) - William Hicks (https://github.com/wphicks) URL: rapidsai#4439

Fix rapidsai#4525 as well as a hard crash in c++ benchmarks due to some recent changes in raft. Authors: - Rory Mitchell (https://github.com/RAMitchell) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4594

…#4601) This is the continuation of PR rapidsai#4588 to resolve issue rapidsai#4574 Authors: - Jiwei Liu (https://github.com/daxiongshu) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4601

Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4526

Closes rapidsai#4566. Authors: - Micka (https://github.com/lowener) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4597

like the figure below in https://docs.rapids.ai/api/cuml/stable/api.html#cuml.neighbors.NearestNeighbors ![image](https://user-images.githubusercontent.com/8027142/156071101-0a5bf96a-e073-4dea-8314-dfec733699c2.png) Some captions are going into their heading. Authors: - https://github.com/Yosshi999 Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4609

Closes rapidsai#1666. The implementation of this variant is straightforward and matches sklearn. Authors: - Micka (https://github.com/lowener) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4595

See rapidsai#3569. XFailing right now to unblock CI. Authors: - Micka (https://github.com/lowener) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4621

Using a brute force approach compared to sklearn's kd/ball tree.

Todo:
- [x] Implement sample method
- [x] Sample weights
- [x] Evaluate which metrics are missing
- [x] Tests for sample
- [x] Docstrings

Authors: - Rory Mitchell (https://github.com/RAMitchell) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4545

Answers rapidsai#2821 Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4297

Closes rapidsai#4031. Scikit-learn is rescaling the data ([here](https://github.com/scikit-learn/scikit-learn/blob/0d378913be6d7e485b792ea36e9268be31ed52d0/sklearn/linear_model/_base.py#L313)) to take into account the sample_weight parameter. Authors: - Micka (https://github.com/lowener) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4428
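The rescaling idea mentioned above can be illustrated with a minimal pure-Python sketch. It assumes the usual weighted least-squares identity (multiplying each row of the data and target by the square root of its weight turns a weighted fit into an ordinary one) and uses a hypothetical one-feature model without intercept. It is not the cuML or scikit-learn implementation, and the helper names are invented for illustration.

```python
import math


def wls_slope(x, y, w):
    """Slope of the weighted least-squares fit y ~ b * x (no intercept)."""
    num = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    den = sum(wi * xi * xi for xi, wi in zip(x, w))
    return num / den


def ols_slope(x, y):
    """Slope of the ordinary least-squares fit y ~ b * x (no intercept)."""
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = sum(xi * xi for xi in x)
    return num / den


def rescale(values, w):
    """Multiply each value by the square root of its sample weight."""
    return [math.sqrt(wi) * vi for vi, wi in zip(values, w)]


# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
w = [1.0, 0.5, 2.0, 0.25]

direct = wls_slope(x, y, w)
via_rescaling = ols_slope(rescale(x, w), rescale(y, w))

# Both routes give the same coefficient up to floating-point error.
assert math.isclose(direct, via_rescaling)
```

Rescaling lets an unweighted solver handle sample weights without changes to the solver itself, which is presumably why the approach is convenient here.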

RAFT PR 513 changed meaning of probability for Bernoulli and Scaled Bernoulli distribution. This PR does corresponding change in cuML. Authors: - Vinay Deshpande (https://github.com/vinaydes) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4628

In the near future, the [rapidsai/ops-bot](https://github.com/rapidsai/ops-bot) GitHub application that we use for GitHub automation will be enabled on all repositories in the `rapidsai` GitHub organization. Since not all features of the application are applicable to all repositories, this PR adds a new file, `.github/ops-bot.yaml`, which can configure which features are enabled per repository. Authors: - AJ Schmidt (https://github.com/ajschmidt8) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: rapidsai#4630

@levsnv

Templatize FIL types to add float64 support. This is based on the work by @levsnv, specifically rapidsai#4569. This supersedes rapidsai#4569. Authors: - Andy Adinets (https://github.com/canonizer) - Levs Dolgovs (https://github.com/levsnv) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#4625

…i#4633) Add explicit option similar to FAISS and Treelite to be able to build a single `libcuml++` with all RAFT binary dependencies. Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4633

Rapids is upgrading to `2022.02.1` minimum version of dask. This PR updates those pinnings. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4632

…ation) (rapidsai#4556) Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - Micka (https://github.com/lowener) URL: rapidsai#4556

[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]

Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4637

[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]

This nano-PR fixes a performance regression caused by improper stream assignment to the decision trees.

Before fix:

| sno | algo | input | cu_time | cpu_time | cuml_acc | cpu_acc | speedup | n_samples | n_features | max_depth | n_estimators | n_bins | n_streams | n_jobs | n_classes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | RandomForestClassifier | numpy | 32.635321855545044 | 0.0 | 0.99468 | 0.0 | 0.0 | 800000 | 64 | 8 | 500 | 128 | 4 | -1 | 2 |
| 1 | RandomForestClassifier | numpy | 40.36453413963318 | 0.0 | 0.994855 | 0.0 | 0.0 | 800000 | 64 | 10 | 500 | 128 | 4 | -1 | 2 |
| 2 | RandomForestClassifier | numpy | 61.35148477554321 | 0.0 | 0.99504 | 0.0 | 0.0 | 800000 | 64 | 16 | 500 | 128 | 4 | -1 | 2 |

After fix:

| sno | algo | input | cu_time | cpu_time | cuml_acc | cpu_acc | speedup | n_samples | n_features | max_depth | n_estimators | n_bins | n_streams | n_jobs | n_classes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | RandomForestClassifier | numpy | 28.637776374816895 | 0.0 | 0.99468 | 0.0 | 0.0 | 800000 | 64 | 8 | 500 | 128 | 4 | -1 | 2 |
| 1 | RandomForestClassifier | numpy | 34.11380743980408 | 0.0 | 0.994855 | 0.0 | 0.0 | 800000 | 64 | 10 | 500 | 128 | 4 | -1 | 2 |
| 2 | RandomForestClassifier | numpy | 47.153409481048584 | 0.0 | 0.99504 | 0.0 | 0.0 | 800000 | 64 | 16 | 500 | 128 | 4 | -1 | 2 |

Command run in `cuml/`:

```
python python/cuml/run_benchmarks.py --num-rows 800000 --num-features 64 --skip-cpu --test-split 0.2 --cuml-param-sweep "n_bins=[128]" "n_streams=[4]" --cpu-param-sweep "n_jobs=[-1]" --param-sweep "max_depth=[8,10,16]" "n_estimators=[500]" --n-reps 1 --csv pool-2112-cls-800000.csv --dataset-param-sweep "n_classes=[2]" --dtype "fp32" --dataset classification -- RandomForestClassifier
```

Authors: - Venkat (https://github.com/venkywonka) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4644

[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]

- The strategy chooses the default value of `n_informative` more carefully, so that the assumed inequality between the number of classes, the number of clusters per class, and the number of informative features is satisfied.
- The strategy tries to provide a more informative error message when that assumption cannot be met with the given arguments.
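The two points above can be sketched as follows. This is a minimal illustration rather than the strategy's actual code: it assumes the inequality in question is the one enforced by scikit-learn's `make_classification`, namely `n_classes * n_clusters_per_class <= 2 ** n_informative`, and the helper names `min_n_informative` and `choose_n_informative` are hypothetical.

```python
def min_n_informative(n_classes, n_clusters_per_class):
    """Smallest n_informative with n_classes * n_clusters_per_class <= 2 ** n_informative."""
    n_clusters = n_classes * n_clusters_per_class
    # (k - 1).bit_length() equals ceil(log2(k)) for k >= 1, without float rounding.
    return max(1, (n_clusters - 1).bit_length())


def choose_n_informative(n_features, n_classes, n_clusters_per_class, n_informative=None):
    """Pick or validate n_informative; raise a descriptive error if impossible."""
    required = min_n_informative(n_classes, n_clusters_per_class)
    if n_informative is None:
        # Default: the smallest value that satisfies the inequality.
        n_informative = required
    if n_informative < required:
        raise ValueError(
            f"n_classes ({n_classes}) * n_clusters_per_class ({n_clusters_per_class}) "
            f"exceeds 2 ** n_informative ({2 ** n_informative}); "
            f"at least {required} informative features are needed."
        )
    if n_informative > n_features:
        raise ValueError(
            f"n_informative ({n_informative}) exceeds n_features ({n_features}); "
            "increase n_features or reduce n_classes / n_clusters_per_class."
        )
    return n_informative


# Example: 3 classes with 2 clusters each need at least 3 informative features.
default_choice = choose_n_informative(n_features=10, n_classes=3, n_clusters_per_class=2)
```

Checking the constraint before generating data means that an invalid parameter combination fails with a message naming the offending values, instead of surfacing later as an opaque error from the dataset generator.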

csadorf · 2022-12-15T12:49:30Z

We should merge #5065 and address #4963 before moving forward with this.

daxiongshu and others added 30 commits February 22, 2022 18:29

cuml now supports building non static treelite (rapidsai#4598)

5a18429

Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4598

Remove RAFT memory management (2/2) (rapidsai#4526)

d2099b8

Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4526

Fitsne as default tsne method (rapidsai#4597)

701b62f

Closes rapidsai#4566. Authors: - Micka (https://github.com/lowener) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4597

Merge 'branch-22.02' into 'branch-22.04'

a2c91a6

Add Complement Naive Bayes (rapidsai#4595)

2b0c200

Closes rapidsai#1666. The implementation of this variant is straightforward and matches sklearn. Authors: - Micka (https://github.com/lowener) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4595

XFail test_hinge_loss temporarily (rapidsai#4621)

7c3da85

See rapidsai#3569. XFailing right now to unblock CI. Authors: - Micka (https://github.com/lowener) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4621

Use FAISS with RMM (rapidsai#4297)

4c1d671

Answers rapidsai#2821 Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4297

DOC

c04cee4

Merge pull request rapidsai#4636 from rapidsai/branch-22.04

fbc1a50

[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]

Remove RAFT MM includes (rapidsai#4637)

3f54aec

Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4637

Merge pull request rapidsai#4640 from rapidsai/branch-22.04

f3afbac

[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]

Merge pull request rapidsai#4650 from rapidsai/branch-22.04

68e70c2

[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]

csadorf added 14 commits December 5, 2022 06:15

Hypothesize test_weighted_linear_regression test.

63fcd12

Hypothesize test_ridge_regression_model_default test.

55c91ad

Hypothesize test_ridge_regression_model test.

73960b0

Mark test_ridge_regression_model_default as xfail.

cd3db50

Hypothesize test_weighted_ridge test.

28fe66d

Implement standard_classification_datasets hypothesis strategy.

a0af4db

Hypothesize test_logistic_regression.

ac58201

Hypothesize test_logistic_regression_unscaled test.

8b97769

Hypothesize test_logistic_regression_model_default test.

db0017e

Hypothesize test_logistic_regression_model_digits test.

9b59a3d

Hypothesize test_logistic_regression_sparse_only test.

ce341af

Hypothesize test_logistic_regression_decision_function test.

f7dda37

Hypothesize test_logistic_regression_predict_proba test.

c01b867

Hypothesize test_ridge_predict_convert_dtype test.

2fbe4a5

csadorf force-pushed the fea-expand-hypothesis-testing-to-all-linear-models branch from 6849d71 to a09b558 Compare December 6, 2022 12:12

csadorf mentioned this pull request Dec 6, 2022

Expand hypothesis testing for linear models #5065

Merged

csadorf added 6 commits December 6, 2022 12:17

Hypothesize test_logistic_predict_convert_dtype test.

dd6283d

Hypothesize test_logistic_regression_weighting test.

c3e1105

Hypothesize test_linear_models_set_params test.

2b9a447

Hypothesize test_elasticnet_solvers_eq test.

e0dda4a

Remove obsolete utility functions and imports.

b8baffe

csadorf force-pushed the fea-expand-hypothesis-testing-to-all-linear-models branch from c3149ad to b8baffe Compare December 6, 2022 20:20

csadorf changed the title ~~Fea expand hypothesis testing to all linear models~~ Expand hypothesis testing to all linear models Dec 6, 2022

csadorf added 0 - Blocked Cannot progress due to external reasons and removed 2 - In Progress Currently a work in progress labels Dec 15, 2022

ajschmidt8 force-pushed the branch-23.02 branch from 3bc1de0 to e7fd6cc Compare February 13, 2023 18:56
