[ENH] Efficient out-of-bag computation per honest tree #200
Signed-off-by: Adam Li <[email protected]>
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #200      +/-   ##
==========================================
+ Coverage   88.84%   89.13%   +0.29%
==========================================
  Files          49       50       +1
  Lines        4328     4492     +164
==========================================
+ Hits         3845     4004     +159
- Misses        483      488       +5
- Changes in Cython code were made to fix the building of wheels on Windows.
- Changes in GH Actions workflows, `examples/calibration`, and `doc/` are used to fix the CI workflows.
- All others are fair game to review.
- The change of Cython return types from `intp_t` to `int` was done in accordance with "MAINT Convert `int` to `intp_t` ctype def in `tree/` related code" (scikit-learn/scikit-learn#27546).
@sampan501 @PSSF23 @SUKI-O @YuxinB can you give this PR a review? Lmk if there are any questions.
Right now, in order to have "out-of-bag" samples in the sklearn pipeline, one needs to set `bootstrap=True` for the forest. The `sample_weights` are adjusted accordingly to "bootstrap" the samples.

This PR will allow one to access the OOB sample indices from the forest so MIGHT can take advantage of those for estimating the population statistics (e.g. S@S98, or MI).
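
To make the relationship between bootstrapping, sample weights, and OOB indices concrete, here is a minimal NumPy-only sketch. It is not the implementation in this PR: the function name `bootstrap_counts_and_oob` and its parameters are hypothetical, and it assumes each tree draws `n_samples` indices with replacement, so that the per-sample draw counts can act as a multiplier on the sample weights and the samples with a count of zero are out-of-bag.

```python
import numpy as np


def bootstrap_counts_and_oob(n_samples, n_trees, seed=0):
    """Hypothetical helper: per-tree bootstrap counts and OOB indices."""
    rng = np.random.default_rng(seed)
    counts_per_tree = []
    oob_per_tree = []
    for _ in range(n_trees):
        # Draw n_samples indices with replacement (the bootstrap sample).
        drawn = rng.integers(0, n_samples, size=n_samples)
        # How many times each sample was drawn; multiplying the sample
        # weights by these counts "bootstraps" the training set.
        counts = np.bincount(drawn, minlength=n_samples)
        counts_per_tree.append(counts)
        # Samples that were never drawn are out-of-bag for this tree.
        oob_per_tree.append(np.flatnonzero(counts == 0))
    return counts_per_tree, oob_per_tree


counts_per_tree, oob_per_tree = bootstrap_counts_and_oob(n_samples=100, n_trees=5)
```

Under this assumption, each tree leaves roughly a fraction 1/e (about 37%) of the samples out-of-bag on average, and those are the indices a per-tree fitted attribute such as `oob_samples_` would need to expose.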

Another issue is how we can make out-of-bag samples without bootstrap. This might need to be a feature in our sklearn fork, as it is a universal property for any supervised forest. I think we can simply leverage the

Changes proposed in this pull request:
- `oob_samples_` as a fitted property that efficiently generates the OOB indices per tree

Before submitting
- section of the `CONTRIBUTING` docs.
- Writing docstrings section of the `CONTRIBUTING` docs.

After submitting