Expand hypothesis testing to all linear models #4974
Draft
csadorf
wants to merge
10,000
commits into
rapidsai:branch-23.02
from
csadorf:fea-expand-hypothesis-testing-to-all-linear-models
This PR adds `get_params()` member function to `TargetEncoder`. Hopefully it can resolve the issue rapidsai#4574 Authors: - Jiwei Liu (https://github.com/daxiongshu) Approvers: - Victor Lafargue (https://github.com/viclafargue) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4588
The 2.3.0 version of Treelite incorporates the following improvements:
* GTIL optimization using multiple CPU threads (dmlc/treelite#353, dmlc/treelite#355, dmlc/treelite#357, dmlc/treelite#358, dmlc/treelite#362, dmlc/treelite#367)
* dmlc/treelite#365
* dmlc/treelite#366
* dmlc/treelite#368
Requires rapidsai/integration#436
Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - William Hicks (https://github.com/wphicks) - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#4590
Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4598
…i#4593) This PR depends on rapidsai/raft#520 Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4593
Depends on rapidsai#4295. This PR allows `libcuml++` to be built with individual algorithms, or individual families of algorithms, through the `CUML_ALGORITHMS` argument. It defaults to `ALL` and accepts multiple options, for example:
```
cmake .. -DCUML_ALGORITHMS="FIL;TREESHAP"
```
which builds a `libcuml++` containing only the FIL and GPUTreeSHAP components. A PR to update the build documentation will follow.
Authors: - Dante Gama Dessavre (https://github.com/dantegd) - Divye Gala (https://github.com/divyegala) Approvers: - William Hicks (https://github.com/wphicks) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4296
This PR allows `TargetEncoder` to encode the `variance` of the target as requested by rapidsai#4440 Authors: - Jiwei Liu (https://github.com/daxiongshu) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4483
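To illustrate what encoding the variance of the target means, here is a minimal pure-Python sketch. It is not cuML's `TargetEncoder` implementation (which also handles folds and smoothing); the function name `encode_target_stat`, its `stat` argument, and the use of the population variance are assumptions made only for this example.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def encode_target_stat(categories, targets, stat="mean"):
    """Replace each category with a statistic of the targets observed for it.

    Hypothetical helper for illustration only; not part of cuML.
    """
    groups = defaultdict(list)
    for cat, y in zip(categories, targets):
        groups[cat].append(y)
    # Assumption: "var" means the population variance of the group's targets.
    reducer = {"mean": fmean, "var": pvariance}[stat]
    table = {cat: reducer(ys) for cat, ys in groups.items()}
    return [table[cat] for cat in categories]


cats = ["a", "b", "a", "b", "a"]
y = [1.0, 0.0, 3.0, 4.0, 5.0]
mean_encoded = encode_target_stat(cats, y, stat="mean")
var_encoded = encode_target_stat(cats, y, stat="var")
```

Here category `a` has targets 1, 3 and 5, so its mean encoding is 3 and its variance encoding is 8/3; category `b` has targets 0 and 4, giving 2 and 4.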
Closes rapidsai#4341. The `classmethod` decorator seems to not be useful here and is blocking the serialization of SimpleImputer. Authors: - Micka (https://github.com/lowener) Approvers: - Victor Lafargue (https://github.com/viclafargue) - William Hicks (https://github.com/wphicks) URL: rapidsai#4439
Fix rapidsai#4525 as well as a hard crash in c++ benchmarks due to some recent changes in raft. Authors: - Rory Mitchell (https://github.com/RAMitchell) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4594
…#4601) This is the continuation of PR rapidsai#4588 to resolve issue rapidsai#4574 Authors: - Jiwei Liu (https://github.com/daxiongshu) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4601
Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4526
Closes rapidsai#4566. Authors: - Micka (https://github.com/lowener) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4597
like the figure below in https://docs.rapids.ai/api/cuml/stable/api.html#cuml.neighbors.NearestNeighbors ![image](https://user-images.githubusercontent.com/8027142/156071101-0a5bf96a-e073-4dea-8314-dfec733699c2.png) Some captions are going into their heading. Authors: - https://github.com/Yosshi999 Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4609
Closes rapidsai#1666. The implementation of this variant is straightforward and matches sklearn. Authors: - Micka (https://github.com/lowener) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4595
See rapidsai#3569. XFailing right now to unblock CI. Authors: - Micka (https://github.com/lowener) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4621
Using a brute force approach compared to sklearn's kd/ball tree.
Todo:
- [x] Implement sample method
- [x] Sample weights
- [x] Evaluate which metrics are missing
- [x] Tests for sample
- [x] Docstrings
Authors: - Rory Mitchell (https://github.com/RAMitchell) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4545
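As a rough illustration of what a brute-force kernel density estimate looks like, the sketch below evaluates a one-dimensional Gaussian KDE by summing over every training point for each query, with optional sample weights. It is a CPU-only toy under stated assumptions, not the cuML kernel; the function name and its parameters are hypothetical.

```python
import math


def gaussian_kde_brute_force(train, queries, bandwidth=1.0, sample_weight=None):
    """Evaluate a 1-D Gaussian KDE at each query point.

    Brute force: every query visits every training point, O(n_train * n_query).
    Hypothetical helper for illustration only.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(train)
    total_weight = sum(sample_weight)
    # Normalization constant of a Gaussian kernel with the given bandwidth.
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    densities = []
    for q in queries:
        acc = 0.0
        for x, w in zip(train, sample_weight):
            u = (q - x) / bandwidth
            acc += w * math.exp(-0.5 * u * u)
        densities.append(norm * acc / total_weight)
    return densities


density_at_zero = gaussian_kde_brute_force([0.0], [0.0])[0]
weighted = gaussian_kde_brute_force([0.0, 10.0], [0.0], sample_weight=[1.0, 0.0])[0]
```

A tree-based approach prunes distant points instead of visiting all of them; the brute-force form trades that pruning for a simple, regular double loop.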
Answers rapidsai#2821 Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4297
Closes rapidsai#4031. Scikit-learn is rescaling the data ([here](https://github.com/scikit-learn/scikit-learn/blob/0d378913be6d7e485b792ea36e9268be31ed52d0/sklearn/linear_model/_base.py#L313)) to take into account the sample_weight parameter. Authors: - Micka (https://github.com/lowener) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4428
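The rescaling idea can be sketched as follows: a weighted least-squares fit gives the same coefficients as an ordinary fit on rows scaled by the square root of their weight. This is a pure-Python toy for a single feature plus intercept, with hypothetical helper names; it does not reproduce the linked scikit-learn code or the cuML change.

```python
import math


def solve_2x2(a11, a12, a22, b1, b2):
    """Solve [[a11, a12], [a12, a22]] @ [slope, intercept] = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def fit_weighted(x, y, w):
    """Weighted least squares via the weighted normal equations."""
    a11 = sum(wi * xi * xi for xi, wi in zip(x, w))
    a12 = sum(wi * xi for xi, wi in zip(x, w))
    a22 = sum(w)
    b1 = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    b2 = sum(wi * yi for yi, wi in zip(y, w))
    return solve_2x2(a11, a12, a22, b1, b2)


def fit_rescaled(x, y, w):
    """Ordinary least squares after scaling every row by sqrt(weight)."""
    s = [math.sqrt(wi) for wi in w]
    xs = [si * xi for si, xi in zip(s, x)]   # scaled feature column
    ones = s                                 # scaled intercept column
    ys = [si * yi for si, yi in zip(s, y)]   # scaled target
    a11 = sum(v * v for v in xs)
    a12 = sum(a * b for a, b in zip(xs, ones))
    a22 = sum(v * v for v in ones)
    b1 = sum(a * b for a, b in zip(xs, ys))
    b2 = sum(a * b for a, b in zip(ones, ys))
    return solve_2x2(a11, a12, a22, b1, b2)


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]          # exactly y = 2x + 1
w = [1.0, 2.0, 0.5, 4.0]
slope_w, intercept_w = fit_weighted(x, y, w)
slope_r, intercept_r = fit_rescaled(x, y, w)
```

Note that the intercept column is scaled along with the feature and the target; scaling only `x` and `y` would not give the same solution.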
RAFT PR 513 changed the meaning of the probability parameter for the Bernoulli and Scaled Bernoulli distributions. This PR makes the corresponding change in cuML. Authors: - Vinay Deshpande (https://github.com/vinaydes) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4628
In the near future, the [rapidsai/ops-bot](https://github.com/rapidsai/ops-bot) GitHub application that we use for GitHub automation will be enabled on all repositories in the `rapidsai` GitHub organization. Since not all features of the application are applicable to all repositories, this PR adds a new file, `.github/ops-bot.yaml`, which can configure which features are enabled per repository. Authors: - AJ Schmidt (https://github.com/ajschmidt8) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: rapidsai#4630
Templatize FIL types to add float64 support. This is based on the work by @levsnv, specifically rapidsai#4569. This supersedes rapidsai#4569. Authors: - Andy Adinets (https://github.com/canonizer) - Levs Dolgovs (https://github.com/levsnv) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#4625
…i#4633) Add explicit option similar to FAISS and Treelite to be able to build a single `libcuml++` with all RAFT binary dependencies. Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4633
RAPIDS is raising its minimum dask version to `2022.02.1`. This PR updates the corresponding pinnings. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4632
…ation) (rapidsai#4556) Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - Micka (https://github.com/lowener) URL: rapidsai#4556
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4637
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
This nanoPR fixes a performance regression caused by improper stream assignment to the decision trees.

Before fix:

| sno | algo | input | cu_time | cpu_time | cuml_acc | cpu_acc | speedup | n_samples | n_features | max_depth | n_estimators | n_bins | n_streams | n_jobs | n_classes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | RandomForestClassifier | numpy | 32.635321855545044 | 0.0 | 0.99468 | 0.0 | 0.0 | 800000 | 64 | 8 | 500 | 128 | 4 | -1 | 2 |
| 1 | RandomForestClassifier | numpy | 40.36453413963318 | 0.0 | 0.994855 | 0.0 | 0.0 | 800000 | 64 | 10 | 500 | 128 | 4 | -1 | 2 |
| 2 | RandomForestClassifier | numpy | 61.35148477554321 | 0.0 | 0.99504 | 0.0 | 0.0 | 800000 | 64 | 16 | 500 | 128 | 4 | -1 | 2 |

After fix:

| sno | algo | input | cu_time | cpu_time | cuml_acc | cpu_acc | speedup | n_samples | n_features | max_depth | n_estimators | n_bins | n_streams | n_jobs | n_classes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | RandomForestClassifier | numpy | 28.637776374816895 | 0.0 | 0.99468 | 0.0 | 0.0 | 800000 | 64 | 8 | 500 | 128 | 4 | -1 | 2 |
| 1 | RandomForestClassifier | numpy | 34.11380743980408 | 0.0 | 0.994855 | 0.0 | 0.0 | 800000 | 64 | 10 | 500 | 128 | 4 | -1 | 2 |
| 2 | RandomForestClassifier | numpy | 47.153409481048584 | 0.0 | 0.99504 | 0.0 | 0.0 | 800000 | 64 | 16 | 500 | 128 | 4 | -1 | 2 |

Command run in `cuml/`:
```
python python/cuml/run_benchmarks.py --num-rows 800000 --num-features 64 --skip-cpu --test-split 0.2 --cuml-param-sweep "n_bins=[128]" "n_streams=[4]" --cpu-param-sweep "n_jobs=[-1]" --param-sweep "max_depth=[8,10,16]" "n_estimators=[500]" --n-reps 1 --csv pool-2112-cls-800000.csv --dataset-param-sweep "n_classes=[2]" --dtype "fp32" --dataset classification -- RandomForestClassifier
```
Authors: - Venkat (https://github.com/venkywonka) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4644
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
csadorf
force-pushed
the
fea-expand-hypothesis-testing-to-all-linear-models
branch
from
December 6, 2022 12:12
6849d71
to
a09b558
Compare
- The strategy chooses the default for n_informative more smartly, so that it satisfies the assumed inequality between the number of classes, the number of clusters per class, and the number of informative features.
- The strategy tries to provide a more informative error message when the assumption cannot be met with the given parameter arguments.
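A minimal sketch of such a default, assuming the inequality in question is the one scikit-learn's `make_classification` enforces, `n_classes * n_clusters_per_class <= 2 ** n_informative`. The function name, its signature, and the error wording are hypothetical and are not taken from this PR.

```python
def choose_n_informative(n_features, n_classes, n_clusters_per_class):
    """Pick the smallest n_informative with
    n_classes * n_clusters_per_class <= 2 ** n_informative.

    Hypothetical helper for illustration only.
    """
    required = n_classes * n_clusters_per_class
    # Smallest k with 2 ** k >= required, using integer arithmetic only.
    n_informative = max(1, (required - 1).bit_length())
    if n_informative > n_features:
        raise ValueError(
            f"n_classes={n_classes} with n_clusters_per_class="
            f"{n_clusters_per_class} needs at least {n_informative} "
            f"informative features, but n_features={n_features}."
        )
    return n_informative


default_for_three_classes = choose_n_informative(
    n_features=10, n_classes=3, n_clusters_per_class=2
)
```

With 3 classes and 2 clusters per class the product is 6, so 3 informative features are enough because 2 ** 3 = 8. Asking for 4 classes with 2 clusters each on only 2 features raises the `ValueError`, naming the parameters that conflict.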
csadorf
force-pushed
the
fea-expand-hypothesis-testing-to-all-linear-models
branch
from
December 6, 2022 20:20
c3149ad
to
b8baffe
Compare
csadorf
changed the title
Fea expand hypothesis testing to all linear models
Expand hypothesis testing to all linear models
Dec 6, 2022
csadorf
added
0 - Blocked
Cannot progress due to external reasons
and removed
2 - In Progress
Currently a work in progress
labels
Dec 15, 2022
ajschmidt8
force-pushed
the
branch-23.02
branch
from
February 13, 2023 18:56
3bc1de0
to
e7fd6cc
Compare
Labels
0 - Blocked
Cannot progress due to external reasons
Cython / Python
Cython or Python issue
improvement
Improvement / enhancement to an existing function
non-breaking
Non-breaking change
Should be merged after #4952 and #4973.