[ENH] Efficient out-of-bag computation per honest tree #200

Merged: 39 commits merged into main on Jan 26, 2024

adam2392 (Collaborator)

Currently, to get out-of-bag (OOB) samples in the sklearn pipeline, one has to set bootstrap=True on the forest; the sample weights are then adjusted accordingly to "bootstrap" the samples.
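
A minimal sketch of the mechanism described above, assuming the scikit-learn forest convention of drawing `n_samples` indices with replacement and turning them into per-sample counts that multiply the sample weights. Any sample whose count is zero was never seen by that tree and is therefore out-of-bag. The helper `bootstrap_weights` is hypothetical and not part of sktree or sklearn.

```python
import numpy as np


def bootstrap_weights(n_samples, seed):
    """Hypothetical helper: express one bootstrap draw as integer sample weights."""
    rng = np.random.RandomState(seed)
    # Draw n_samples indices uniformly with replacement (the "bootstrap").
    sample_indices = rng.randint(0, n_samples, n_samples)
    # Count how often each sample was drawn; these counts act as sample weights.
    weights = np.bincount(sample_indices, minlength=n_samples)
    # Samples that were never drawn (weight 0) are out-of-bag for this tree.
    oob_indices = np.flatnonzero(weights == 0)
    return weights, oob_indices


weights, oob = bootstrap_weights(n_samples=10, seed=0)
print(weights, oob)
```

Because the tree only ever sees a weight vector, the OOB set has to be recovered from the zero-weight entries rather than from an explicit index list.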

This PR makes the OOB sample indices accessible from the forest, so that MIGHT can use them to estimate population statistics (e.g., S@S98 or MI).
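
To illustrate how per-tree OOB indices could feed a population statistic, here is a sketch that averages each sample's predicted posterior only over the trees that did not see that sample during fitting. The function name and the `(n_trees, n_samples, n_classes)` array layout are assumptions for illustration; this is not the MIGHT or sktree implementation.

```python
import numpy as np


def oob_average_posteriors(tree_posteriors, oob_samples):
    """Hypothetical helper: average posteriors over only the trees where a sample is OOB.

    tree_posteriors: array of shape (n_trees, n_samples, n_classes)
    oob_samples: sequence of length n_trees; entry t holds the OOB indices of tree t
    Returns an (n_samples, n_classes) array; rows of samples never OOB are NaN.
    """
    n_trees, n_samples, n_classes = tree_posteriors.shape
    totals = np.zeros((n_samples, n_classes))
    counts = np.zeros(n_samples)
    for t, idx in enumerate(oob_samples):
        # np.add.at accumulates correctly even if an index were repeated.
        np.add.at(totals, idx, tree_posteriors[t, idx])
        np.add.at(counts, idx, 1)
    out = np.full((n_samples, n_classes), np.nan)
    seen = counts > 0
    out[seen] = totals[seen] / counts[seen, None]
    return out


posteriors = np.random.default_rng(0).dirichlet([1.0, 1.0], size=(3, 5))
oob = [np.array([0, 1]), np.array([1, 2, 3]), np.array([0, 3])]
print(oob_average_posteriors(posteriors, oob))
```

A statistic such as S@S98 or MI would then be computed from the non-NaN rows only, so no sample is scored by a tree that was fit on it.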

Another open question is how to obtain out-of-bag samples without bootstrapping. This may need to be a feature in our sklearn fork, since it is a universal property of any supervised forest. I think we can simply leverage the
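
One possible approach to the question above, offered as an assumption rather than what this PR settles on: subsample each tree's training set without replacement, so every in-bag sample keeps weight 1, and treat the complement as out-of-bag. The helper `subsample_without_replacement` is hypothetical.

```python
import numpy as np


def subsample_without_replacement(n_samples, max_samples, seed):
    """Hypothetical helper: in-bag / OOB split for one tree without bootstrapping."""
    rng = np.random.default_rng(seed)
    # Draw max_samples distinct indices; each sample appears at most once.
    in_bag = rng.choice(n_samples, size=max_samples, replace=False)
    # Everything not drawn is out-of-bag for this tree.
    mask = np.ones(n_samples, dtype=bool)
    mask[in_bag] = False
    oob = np.flatnonzero(mask)
    return np.sort(in_bag), oob


in_bag, oob = subsample_without_replacement(n_samples=10, max_samples=7, seed=0)
print(in_bag, oob)
```

The trade-off is that each tree sees fewer distinct samples than with a full bootstrap, but the OOB set has a fixed, predictable size of `n_samples - max_samples`.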

Changes proposed in this pull request:

  • Add oob_samples_ as a fitted property that efficiently generates the OOB indices per tree
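
A minimal sketch of what "efficiently generates" could look like, under the assumption that efficiency means not storing every tree's index array: record one integer seed per tree at fit time and regenerate the OOB indices on demand, in the same spirit as scikit-learn's `estimators_samples_`. `ToyForest` and its private attributes are hypothetical; this is not the code merged in this PR.

```python
import numpy as np


class ToyForest:
    """Hypothetical stand-in for a fitted forest; it only tracks per-tree sampling seeds."""

    def __init__(self, n_estimators=3, random_state=0):
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X):
        self._n_samples = X.shape[0]
        rng = np.random.RandomState(self.random_state)
        # Store one seed per tree instead of the sampled indices themselves.
        self._seeds = rng.randint(np.iinfo(np.int32).max, size=self.n_estimators)
        return self

    def _bootstrap_indices(self, seed):
        # Re-draw exactly the bootstrap sample the tree was (notionally) fit on.
        return np.random.RandomState(seed).randint(0, self._n_samples, self._n_samples)

    @property
    def oob_samples_(self):
        """Yield, tree by tree, the indices of samples not drawn for that tree."""
        for seed in self._seeds:
            counts = np.bincount(self._bootstrap_indices(seed), minlength=self._n_samples)
            yield np.flatnonzero(counts == 0)


forest = ToyForest(n_estimators=3, random_state=0).fit(np.zeros((8, 2)))
for t, oob in enumerate(forest.oob_samples_):
    print(t, oob)
```

Returning a generator means only one tree's indices are held in memory at a time, and repeated access yields identical indices because the seeds are fixed after `fit`.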

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

PSSF23 and others added 25 commits January 16, 2024 14:01
Signed-off-by: Adam Li <[email protected]>

codecov bot commented Jan 18, 2024

Codecov Report

Attention: 50 lines in your changes are missing coverage. Please review.

Comparison is base (4703a82) 88.84% compared to head (c1c6d64) 89.13%.

| File | Patch % | Lines |
|---|---|---|
| sktree/stats/forestht.py | 11.53% | 46 Missing ⚠️ |
| sktree/tree/_honest_tree.py | 55.55% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #200      +/-   ##
==========================================
+ Coverage   88.84%   89.13%   +0.29%     
==========================================
  Files          49       50       +1     
  Lines        4328     4492     +164     
==========================================
+ Hits         3845     4004     +159     
- Misses        483      488       +5     
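
The percentages in this report are consistent with coverage = hits / lines, rounded down to two decimals. The check below uses integer arithmetic to avoid floating-point error; the per-file patch totals of 52 and 9 lines are inferred from the reported percentages and missing-line counts, not stated in the report.

```python
def coverage_pct(hits, lines):
    """Coverage as a percentage string, rounded down to two decimals."""
    basis_points = hits * 10000 // lines
    return f"{basis_points // 100}.{basis_points % 100:02d}%"


print(coverage_pct(3845, 4328))  # base → 88.84%
print(coverage_pct(4004, 4492))  # head → 89.13%
print(coverage_pct(6, 6 + 46))   # sktree/stats/forestht.py patch (52 lines inferred) → 11.53%
print(coverage_pct(5, 5 + 4))    # sktree/tree/_honest_tree.py patch (9 lines inferred) → 55.55%
```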


adam2392 (Collaborator, Author) left a comment


adam2392 requested reviews from PSSF23, SUKI-O, sampan501, and YuxinB, and removed the review requests for PSSF23 and SUKI-O on January 19, 2024, 16:41

adam2392 (Collaborator, Author)

@sampan501 @PSSF23 @SUKI-O @YuxinB can you give this PR a review? Lmk if there are any questions.

adam2392 merged commit 7fc517e into main on Jan 26, 2024
46 of 49 checks passed
adam2392 deleted the honestoob branch on January 26, 2024, 17:03
4 participants