add table comparing ensemble methods #51

Open · wants to merge 1 commit into base: gh-pages
9 changes: 9 additions & 0 deletions _episodes/04-ensemble-methods.md
@@ -66,6 +66,15 @@ Machine learning jargon can often be hard to remember, so here is a quick summary:
* Bagging - different subsets, same models, trained in parallel
* Boosting - subsets focused on previous errors, same models, trained in series

### Which ensemble method is best?

| **Ensemble method** | **What it does** | **Best for** | **Avoid if** |
|---------------------|------------------|--------------|--------------|
| **Stacking** | Combines the predictions of different models, trained on the same dataset, using a meta-model. | Leveraging diverse models to improve overall performance. | You need simple, fast models or lack diverse base learners. |
| **Bagging** | Trains the same model on different bootstrapped subsets of the data and averages the results. | Reducing variance (i.e., overfitting) and stabilising predictions on noisy or small datasets. | You need to reduce bias, or the base model is already stable (e.g., linear regression). |
| **Boosting** | Trains models in series, each focusing on correcting the errors of the previous one. | Capturing complex patterns in large datasets, often at very high accuracy. | The dataset is small or noisy, or you lack the computational resources. |

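To make the comparison concrete, here is a minimal sketch (illustrative, not part of the lesson itself) fitting all three ensemble types with scikit-learn on a synthetic dataset; the base learners and parameter values are assumptions chosen for brevity.

```python
# Illustrative comparison of stacking, bagging, and boosting (synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

models = {
    # Stacking: diverse base learners combined by a logistic-regression meta-model
    "Stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier()),
            ("logreg", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
    # Bagging: the same tree model trained on bootstrapped subsets, in parallel
    "Bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    # Boosting: trees trained in series, each correcting the previous one's errors
    "Boosting": GradientBoostingClassifier(n_estimators=50),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```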

## Using Bagging (Random Forests) for a classification problem

In this session we'll take another look at the penguins data and apply one of the most common bagging approaches, random forests, to try to solve our species classification problem. First we'll load the dataset and define a train/test split.
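As a rough sketch of that setup (assuming the penguins data is loaded via seaborn and these feature columns; the lesson's own loading code may differ):

```python
# Sketch: load the penguins data, split it, and fit a random forest.
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the penguins dataset and drop rows with missing values
penguins = sns.load_dataset("penguins").dropna()

features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
X = penguins[features]
y = penguins["species"]

# Hold out 20% of the data for testing, stratified by species
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# A random forest is a bagging ensemble of decision trees
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(f"Test accuracy: {forest.score(X_test, y_test):.3f}")
```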