diff --git a/jupyter-book/_toc.yml b/jupyter-book/_toc.yml index dfc89c04f..80bb88aa3 100644 --- a/jupyter-book/_toc.yml +++ b/jupyter-book/_toc.yml @@ -91,34 +91,26 @@ parts: sections: - file: linear_models/linear_models_slides - file: linear_models/linear_models_quiz_m4_01 - - file: linear_models/linear_models_regression_index - sections: - file: python_scripts/linear_regression_without_sklearn - file: python_scripts/linear_models_ex_01 - file: python_scripts/linear_models_sol_01 - file: python_scripts/linear_regression_in_sklearn + - file: python_scripts/logistic_regression - file: linear_models/linear_models_quiz_m4_02 - file: linear_models/linear_models_non_linear_index sections: + - file: python_scripts/linear_regression_non_linear_link - file: python_scripts/linear_models_ex_02 - file: python_scripts/linear_models_sol_02 - - file: python_scripts/linear_regression_non_linear_link - - file: python_scripts/linear_models_ex_03 - - file: python_scripts/linear_models_sol_03 + - file: python_scripts/logistic_regression_non_linear - file: linear_models/linear_models_quiz_m4_03 - file: linear_models/linear_models_regularization_index sections: - file: linear_models/regularized_linear_models_slides - file: python_scripts/linear_models_regularization - - file: python_scripts/linear_models_ex_04 - - file: python_scripts/linear_models_sol_04 - file: linear_models/linear_models_quiz_m4_04 - - file: linear_models/linear_models_classification_index - sections: - - file: python_scripts/logistic_regression - - file: python_scripts/linear_models_ex_05 - - file: python_scripts/linear_models_sol_05 - - file: python_scripts/logistic_regression_non_linear + - file: python_scripts/linear_models_ex_03 + - file: python_scripts/linear_models_sol_03 - file: linear_models/linear_models_quiz_m4_05 - file: linear_models/linear_models_wrap_up_quiz - file: linear_models/linear_models_module_take_away diff --git a/jupyter-book/linear_models/linear_models_classification_index.md b/jupyter-book/linear_models/linear_models_classification_index.md deleted file mode 100644 index 81399c436..000000000 --- a/jupyter-book/linear_models/linear_models_classification_index.md +++ /dev/null @@ -1,5 +0,0 @@ -# Linear model for classification - -```{tableofcontents} - -``` diff --git a/jupyter-book/linear_models/linear_models_non_linear_index.md b/jupyter-book/linear_models/linear_models_non_linear_index.md index d56614515..22fe06b20 100644 --- a/jupyter-book/linear_models/linear_models_non_linear_index.md +++ b/jupyter-book/linear_models/linear_models_non_linear_index.md @@ -1,4 +1,4 @@ -# Modelling non-linear features-target relationships +# Non-linear feature engineering for linear models ```{tableofcontents} diff --git a/jupyter-book/linear_models/linear_models_regression_index.md b/jupyter-book/linear_models/linear_models_regression_index.md deleted file mode 100644 index 8b8144a84..000000000 --- a/jupyter-book/linear_models/linear_models_regression_index.md +++ /dev/null @@ -1,5 +0,0 @@ -# Linear regression - -```{tableofcontents} - -``` diff --git a/notebooks/linear_models_ex_02.ipynb b/notebooks/linear_models_ex_02.ipynb index c9c0aad96..4cf750e81 100644 --- a/notebooks/linear_models_ex_02.ipynb +++ b/notebooks/linear_models_ex_02.ipynb @@ -4,39 +4,19 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# \ud83d\udcdd Exercise M4.02\n", + "# \ud83d\udcdd Exercise M4.03\n", "\n", - "The goal of this exercise is to build an intuition on what will be the\n", - "parameters' values of a linear model when the link between 
the data and the\n", - "target is non-linear.\n", + "In all previous notebooks, we only used a single feature in `data`. But we\n", + "have already shown that we could add new features to make the model more\n", + "expressive by deriving new features, based on the original feature.\n", "\n", - "First, we will generate such non-linear data.\n", + "The aim of this notebook is to train a linear regression algorithm on a\n", + "dataset with more than a single feature.\n", "\n", - "
\n", - "

Tip

\n", - "

np.random.RandomState allows to create a random number generator which can\n", - "be later used to get deterministic results.

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "\n", - "# Set the seed for reproduction\n", - "rng = np.random.RandomState(0)\n", - "\n", - "# Generate data\n", - "n_sample = 100\n", - "data_max, data_min = 1.4, -1.4\n", - "len_data = data_max - data_min\n", - "data = rng.rand(n_sample) * len_data - len_data / 2\n", - "noise = rng.randn(n_sample) * 0.3\n", - "target = data**3 - 0.5 * data**2 + noise" + "We will load a dataset about house prices in California. The dataset consists\n", + "of 8 features regarding the demography and geography of districts in\n", + "California and the aim is to predict the median house price of each district.\n", + "We will use all 8 features to predict the target, the median house price." ] }, { @@ -45,8 +25,8 @@ "source": [ "
\n", "

Note

\n", - "

To ease the plotting, we will create a Pandas dataframe containing the data\n", - "and target

\n", + "

If you want a deeper overview regarding this dataset, you can refer to the\n", + "Appendix - Datasets description section at the end of this MOOC.

\n", "
" ] }, @@ -56,65 +36,19 @@ "metadata": {}, "outputs": [], "source": [ - "import pandas as pd\n", + "from sklearn.datasets import fetch_california_housing\n", "\n", - "full_data = pd.DataFrame({\"data\": data, \"target\": target})" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import seaborn as sns\n", - "\n", - "_ = sns.scatterplot(\n", - " data=full_data, x=\"data\", y=\"target\", color=\"black\", alpha=0.5\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "lines_to_next_cell": 2 - }, - "source": [ - "We observe that the link between the data `data` and vector `target` is\n", - "non-linear. For instance, `data` could represent the years of experience\n", - "(normalized) and `target` the salary (normalized). Therefore, the problem here\n", - "would be to infer the salary given the years of experience.\n", - "\n", - "Using the function `f` defined below, find both the `weight` and the\n", - "`intercept` that you think will lead to a good linear model. Plot both the\n", - "data and the predictions of this model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def f(data, weight=0, intercept=0):\n", - " target_predict = weight * data + intercept\n", - " return target_predict" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." + "data, target = fetch_california_housing(as_frame=True, return_X_y=True)\n", + "target *= 100 # rescale the target in k$\n", + "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Compute the mean squared error for this model" + "Now it is your turn to train a linear regression model on this dataset. First,\n", + "create a linear regression model." ] }, { @@ -130,16 +64,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Train a linear regression model on this dataset.\n", - "\n", - "
\n", - "

Warning

\n", - "

In scikit-learn, by convention data (also called X in the scikit-learn\n", - "documentation) should be a 2D matrix of shape (n_samples, n_features).\n", - "If data is a 1D vector, you need to reshape it into a matrix with a\n", - "single column if the vector represents a feature or a single row if the\n", - "vector represents a sample.

\n", - "
" + "Execute a cross-validation with 10 folds and use the mean absolute error (MAE)\n", + "as metric. Be sure to *return* the fitted *estimators*." ] }, { @@ -148,8 +74,6 @@ "metadata": {}, "outputs": [], "source": [ - "from sklearn.linear_model import LinearRegression\n", - "\n", "# Write your code here." ] }, @@ -157,8 +81,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Compute predictions from the linear regression model and plot both the data\n", - "and the predictions." + "Compute the mean and std of the MAE in thousands of dollars (k$)." ] }, { @@ -172,9 +95,15 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "lines_to_next_cell": 2 + }, "source": [ - "Compute the mean squared error" + "Inspect the fitted model using a box plot to show the distribution of values\n", + "for the coefficients returned from the cross-validation. Hint: use the\n", + "function\n", + "[`df.plot.box()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.box.html)\n", + "to create a box plot." ] }, { diff --git a/notebooks/linear_models_ex_03.ipynb b/notebooks/linear_models_ex_03.ipynb deleted file mode 100644 index 4cf750e81..000000000 --- a/notebooks/linear_models_ex_03.ipynb +++ /dev/null @@ -1,130 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# \ud83d\udcdd Exercise M4.03\n", - "\n", - "In all previous notebooks, we only used a single feature in `data`. But we\n", - "have already shown that we could add new features to make the model more\n", - "expressive by deriving new features, based on the original feature.\n", - "\n", - "The aim of this notebook is to train a linear regression algorithm on a\n", - "dataset with more than a single feature.\n", - "\n", - "We will load a dataset about house prices in California. The dataset consists\n", - "of 8 features regarding the demography and geography of districts in\n", - "California and the aim is to predict the median house price of each district.\n", - "We will use all 8 features to predict the target, the median house price." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "

Note

\n", - "

If you want a deeper overview regarding this dataset, you can refer to the\n", - "Appendix - Datasets description section at the end of this MOOC.

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import fetch_california_housing\n", - "\n", - "data, target = fetch_california_housing(as_frame=True, return_X_y=True)\n", - "target *= 100 # rescale the target in k$\n", - "data.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now it is your turn to train a linear regression model on this dataset. First,\n", - "create a linear regression model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Execute a cross-validation with 10 folds and use the mean absolute error (MAE)\n", - "as metric. Be sure to *return* the fitted *estimators*." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Compute the mean and std of the MAE in thousands of dollars (k$)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "lines_to_next_cell": 2 - }, - "source": [ - "Inspect the fitted model using a box plot to show the distribution of values\n", - "for the coefficients returned from the cross-validation. Hint: use the\n", - "function\n", - "[`df.plot.box()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.box.html)\n", - "to create a box plot." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." - ] - } - ], - "metadata": { - "jupytext": { - "main_language": "python" - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file diff --git a/notebooks/linear_models_ex_04.ipynb b/notebooks/linear_models_ex_04.ipynb deleted file mode 100644 index 77086778b..000000000 --- a/notebooks/linear_models_ex_04.ipynb +++ /dev/null @@ -1,165 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# \ud83d\udcdd Exercise M4.04\n", - "\n", - "In the previous notebook, we saw the effect of applying some regularization on\n", - "the coefficient of a linear model.\n", - "\n", - "In this exercise, we will study the advantage of using some regularization\n", - "when dealing with correlated features.\n", - "\n", - "We will first create a regression dataset. This dataset will contain 2,000\n", - "samples and 5 features from which only 2 features will be informative." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import make_regression\n", - "\n", - "data, target, coef = make_regression(\n", - " n_samples=2_000,\n", - " n_features=5,\n", - " n_informative=2,\n", - " shuffle=False,\n", - " coef=True,\n", - " random_state=0,\n", - " noise=30,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When creating the dataset, `make_regression` returns the true coefficient used\n", - "to generate the dataset. Let's plot this information." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "\n", - "feature_names = [\n", - " \"Relevant feature #0\",\n", - " \"Relevant feature #1\",\n", - " \"Noisy feature #0\",\n", - " \"Noisy feature #1\",\n", - " \"Noisy feature #2\",\n", - "]\n", - "coef = pd.Series(coef, index=feature_names)\n", - "coef.plot.barh()\n", - "coef" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a `LinearRegression` regressor and fit on the entire dataset and check\n", - "the value of the coefficients. Are the coefficients of the linear regressor\n", - "close to the coefficients used to generate the dataset?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, create a new dataset that will be the same as `data` with 4 additional\n", - "columns that will repeat twice features 0 and 1. This procedure will create\n", - "perfectly correlated features." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Fit again the linear regressor on this new dataset and check the coefficients.\n", - "What do you observe?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a ridge regressor and fit on the same dataset. Check the coefficients.\n", - "What do you observe?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Can you find the relationship between the ridge coefficients and the original\n", - "coefficients?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." - ] - } - ], - "metadata": { - "jupytext": { - "main_language": "python" - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file diff --git a/notebooks/linear_models_ex_05.ipynb b/notebooks/linear_models_ex_05.ipynb deleted file mode 100644 index 866d52086..000000000 --- a/notebooks/linear_models_ex_05.ipynb +++ /dev/null @@ -1,137 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# \ud83d\udcdd Exercise M4.05\n", - "\n", - "In the previous notebook we set `penalty=\"none\"` to disable regularization\n", - "entirely. This parameter can also control the **type** of regularization to\n", - "use, whereas the regularization **strength** is set using the parameter `C`.\n", - "Setting`penalty=\"none\"` is equivalent to an infinitely large value of `C`. In\n", - "this exercise, we ask you to train a logistic regression classifier using the\n", - "`penalty=\"l2\"` regularization (which happens to be the default in\n", - "scikit-learn) to find by yourself the effect of the parameter `C`.\n", - "\n", - "We will start by loading the dataset." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "

Note

\n", - "

If you want a deeper overview regarding this dataset, you can refer to the\n", - "Appendix - Datasets description section at the end of this MOOC.

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "\n", - "penguins = pd.read_csv(\"../datasets/penguins_classification.csv\")\n", - "# only keep the Adelie and Chinstrap classes\n", - "penguins = (\n", - " penguins.set_index(\"Species\").loc[[\"Adelie\", \"Chinstrap\"]].reset_index()\n", - ")\n", - "\n", - "culmen_columns = [\"Culmen Length (mm)\", \"Culmen Depth (mm)\"]\n", - "target_column = \"Species\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.model_selection import train_test_split\n", - "\n", - "penguins_train, penguins_test = train_test_split(penguins, random_state=0)\n", - "\n", - "data_train = penguins_train[culmen_columns]\n", - "data_test = penguins_test[culmen_columns]\n", - "\n", - "target_train = penguins_train[target_column]\n", - "target_test = penguins_test[target_column]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, let's create our predictive model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.pipeline import make_pipeline\n", - "from sklearn.preprocessing import StandardScaler\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "logistic_regression = make_pipeline(\n", - " StandardScaler(), LogisticRegression(penalty=\"l2\")\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Given the following candidates for the `C` parameter, find out the impact of\n", - "`C` on the classifier decision boundary. You can use\n", - "`sklearn.inspection.DecisionBoundaryDisplay.from_estimator` to plot the\n", - "decision function boundary." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "Cs = [0.01, 0.1, 1, 10]\n", - "\n", - "# Write your code here." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Look at the impact of the `C` hyperparameter on the magnitude of the weights." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write your code here." - ] - } - ], - "metadata": { - "jupytext": { - "main_language": "python" - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file diff --git a/notebooks/linear_models_sol_02.ipynb b/notebooks/linear_models_sol_02.ipynb index d56864c4e..634c43171 100644 --- a/notebooks/linear_models_sol_02.ipynb +++ b/notebooks/linear_models_sol_02.ipynb @@ -4,39 +4,19 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# \ud83d\udcc3 Solution for Exercise M4.02\n", + "# \ud83d\udcc3 Solution for Exercise M4.03\n", "\n", - "The goal of this exercise is to build an intuition on what will be the\n", - "parameters' values of a linear model when the link between the data and the\n", - "target is non-linear.\n", + "In all previous notebooks, we only used a single feature in `data`. But we\n", + "have already shown that we could add new features to make the model more\n", + "expressive by deriving new features, based on the original feature.\n", "\n", - "First, we will generate such non-linear data.\n", + "The aim of this notebook is to train a linear regression algorithm on a\n", + "dataset with more than a single feature.\n", "\n", - "
\n", - "

Tip

\n", - "

np.random.RandomState allows to create a random number generator which can\n", - "be later used to get deterministic results.

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "\n", - "# Set the seed for reproduction\n", - "rng = np.random.RandomState(0)\n", - "\n", - "# Generate data\n", - "n_sample = 100\n", - "data_max, data_min = 1.4, -1.4\n", - "len_data = data_max - data_min\n", - "data = rng.rand(n_sample) * len_data - len_data / 2\n", - "noise = rng.randn(n_sample) * 0.3\n", - "target = data**3 - 0.5 * data**2 + noise" + "We will load a dataset about house prices in California. The dataset consists\n", + "of 8 features regarding the demography and geography of districts in\n", + "California and the aim is to predict the median house price of each district.\n", + "We will use all 8 features to predict the target, the median house price." ] }, { @@ -45,8 +25,8 @@ "source": [ "
\n", "

Note

\n", - "

To ease the plotting, we will create a Pandas dataframe containing the data\n", - "and target

\n", + "

If you want a deeper overview regarding this dataset, you can refer to the\n", + "Appendix - Datasets description section at the end of this MOOC.

\n", "
" ] }, @@ -56,49 +36,19 @@ "metadata": {}, "outputs": [], "source": [ - "import pandas as pd\n", - "\n", - "full_data = pd.DataFrame({\"data\": data, \"target\": target})" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import seaborn as sns\n", + "from sklearn.datasets import fetch_california_housing\n", "\n", - "_ = sns.scatterplot(\n", - " data=full_data, x=\"data\", y=\"target\", color=\"black\", alpha=0.5\n", - ")" + "data, target = fetch_california_housing(as_frame=True, return_X_y=True)\n", + "target *= 100 # rescale the target in k$\n", + "data.head()" ] }, { "cell_type": "markdown", - "metadata": { - "lines_to_next_cell": 2 - }, - "source": [ - "We observe that the link between the data `data` and vector `target` is\n", - "non-linear. For instance, `data` could represent the years of experience\n", - "(normalized) and `target` the salary (normalized). Therefore, the problem here\n", - "would be to infer the salary given the years of experience.\n", - "\n", - "Using the function `f` defined below, find both the `weight` and the\n", - "`intercept` that you think will lead to a good linear model. Plot both the\n", - "data and the predictions of this model." - ] - }, - { - "cell_type": "code", - "execution_count": null, "metadata": {}, - "outputs": [], "source": [ - "def f(data, weight=0, intercept=0):\n", - " target_predict = weight * data + intercept\n", - " return target_predict" + "Now it is your turn to train a linear regression model on this dataset. First,\n", + "create a linear regression model." ] }, { @@ -108,30 +58,17 @@ "outputs": [], "source": [ "# solution\n", - "predictions = f(data, weight=1.2, intercept=-0.2)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "ax = sns.scatterplot(\n", - " data=full_data, x=\"data\", y=\"target\", color=\"black\", alpha=0.5\n", - ")\n", - "_ = ax.plot(data, predictions)" + "from sklearn.linear_model import LinearRegression\n", + "\n", + "linear_regression = LinearRegression()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Compute the mean squared error for this model" + "Execute a cross-validation with 10 folds and use the mean absolute error (MAE)\n", + "as metric. Be sure to *return* the fitted *estimators*." ] }, { @@ -141,26 +78,24 @@ "outputs": [], "source": [ "# solution\n", - "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import cross_validate\n", "\n", - "error = mean_squared_error(target, f(data, weight=1.2, intercept=-0.2))\n", - "print(f\"The MSE is {error}\")" + "cv_results = cross_validate(\n", + " linear_regression,\n", + " data,\n", + " target,\n", + " scoring=\"neg_mean_absolute_error\",\n", + " return_estimator=True,\n", + " cv=10,\n", + " n_jobs=2,\n", + ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Train a linear regression model on this dataset.\n", - "\n", - "
\n", - "

Warning

\n", - "

In scikit-learn, by convention data (also called X in the scikit-learn\n", - "documentation) should be a 2D matrix of shape (n_samples, n_features).\n", - "If data is a 1D vector, you need to reshape it into a matrix with a\n", - "single column if the vector represents a feature or a single row if the\n", - "vector represents a sample.

\n", - "
" + "Compute the mean and std of the MAE in thousands of dollars (k$)." ] }, { @@ -169,20 +104,25 @@ "metadata": {}, "outputs": [], "source": [ - "from sklearn.linear_model import LinearRegression\n", - "\n", "# solution\n", - "linear_regression = LinearRegression()\n", - "data_2d = data.reshape(-1, 1)\n", - "linear_regression.fit(data_2d, target)" + "print(\n", + " \"Mean absolute error on testing set: \"\n", + " f\"{-cv_results['test_score'].mean():.3f} k$ \u00b1 \"\n", + " f\"{cv_results['test_score'].std():.3f}\"\n", + ")" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "lines_to_next_cell": 2 + }, "source": [ - "Compute predictions from the linear regression model and plot both the data\n", - "and the predictions." + "Inspect the fitted model using a box plot to show the distribution of values\n", + "for the coefficients returned from the cross-validation. Hint: use the\n", + "function\n", + "[`df.plot.box()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.box.html)\n", + "to create a box plot." ] }, { @@ -192,7 +132,11 @@ "outputs": [], "source": [ "# solution\n", - "predictions = linear_regression.predict(data_2d)" + "import pandas as pd\n", + "\n", + "weights = pd.DataFrame(\n", + " [est.coef_ for est in cv_results[\"estimator\"]], columns=data.columns\n", + ")" ] }, { @@ -205,28 +149,11 @@ }, "outputs": [], "source": [ - "ax = sns.scatterplot(\n", - " data=full_data, x=\"data\", y=\"target\", color=\"black\", alpha=0.5\n", - ")\n", - "_ = ax.plot(data, predictions)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Compute the mean squared error" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "error = mean_squared_error(target, predictions)\n", - "print(f\"The MSE is {error}\")" + "import matplotlib.pyplot as plt\n", + "\n", + "color = {\"whiskers\": \"black\", \"medians\": \"black\", \"caps\": \"black\"}\n", + "weights.plot.box(color=color, vert=False)\n", + "_ = plt.title(\"Value of linear regression coefficients\")" ] } ], diff --git a/notebooks/linear_models_sol_03.ipynb b/notebooks/linear_models_sol_03.ipynb deleted file mode 100644 index 634c43171..000000000 --- a/notebooks/linear_models_sol_03.ipynb +++ /dev/null @@ -1,171 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# \ud83d\udcc3 Solution for Exercise M4.03\n", - "\n", - "In all previous notebooks, we only used a single feature in `data`. But we\n", - "have already shown that we could add new features to make the model more\n", - "expressive by deriving new features, based on the original feature.\n", - "\n", - "The aim of this notebook is to train a linear regression algorithm on a\n", - "dataset with more than a single feature.\n", - "\n", - "We will load a dataset about house prices in California. The dataset consists\n", - "of 8 features regarding the demography and geography of districts in\n", - "California and the aim is to predict the median house price of each district.\n", - "We will use all 8 features to predict the target, the median house price." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "

Note

\n", - "

If you want a deeper overview regarding this dataset, you can refer to the\n", - "Appendix - Datasets description section at the end of this MOOC.

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import fetch_california_housing\n", - "\n", - "data, target = fetch_california_housing(as_frame=True, return_X_y=True)\n", - "target *= 100 # rescale the target in k$\n", - "data.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now it is your turn to train a linear regression model on this dataset. First,\n", - "create a linear regression model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "from sklearn.linear_model import LinearRegression\n", - "\n", - "linear_regression = LinearRegression()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Execute a cross-validation with 10 folds and use the mean absolute error (MAE)\n", - "as metric. Be sure to *return* the fitted *estimators*." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "from sklearn.model_selection import cross_validate\n", - "\n", - "cv_results = cross_validate(\n", - " linear_regression,\n", - " data,\n", - " target,\n", - " scoring=\"neg_mean_absolute_error\",\n", - " return_estimator=True,\n", - " cv=10,\n", - " n_jobs=2,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Compute the mean and std of the MAE in thousands of dollars (k$)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "print(\n", - " \"Mean absolute error on testing set: \"\n", - " f\"{-cv_results['test_score'].mean():.3f} k$ \u00b1 \"\n", - " f\"{cv_results['test_score'].std():.3f}\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "lines_to_next_cell": 2 - }, - "source": [ - "Inspect the fitted model using a box plot to show the distribution of values\n", - "for the coefficients returned from the cross-validation. Hint: use the\n", - "function\n", - "[`df.plot.box()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.box.html)\n", - "to create a box plot." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "import pandas as pd\n", - "\n", - "weights = pd.DataFrame(\n", - " [est.coef_ for est in cv_results[\"estimator\"]], columns=data.columns\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "\n", - "color = {\"whiskers\": \"black\", \"medians\": \"black\", \"caps\": \"black\"}\n", - "weights.plot.box(color=color, vert=False)\n", - "_ = plt.title(\"Value of linear regression coefficients\")" - ] - } - ], - "metadata": { - "jupytext": { - "main_language": "python" - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file diff --git a/notebooks/linear_models_sol_04.ipynb b/notebooks/linear_models_sol_04.ipynb deleted file mode 100644 index f49b0c465..000000000 --- a/notebooks/linear_models_sol_04.ipynb +++ /dev/null @@ -1,492 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# \ud83d\udcc3 Solution for Exercise M4.04\n", - "\n", - "In the previous notebook, we saw the effect of applying some regularization on\n", - "the coefficient of a linear model.\n", - "\n", - "In this exercise, we will study the advantage of using some regularization\n", - "when dealing with correlated features.\n", - "\n", - "We will first create a regression dataset. This dataset will contain 2,000\n", - "samples and 5 features from which only 2 features will be informative." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import make_regression\n", - "\n", - "data, target, coef = make_regression(\n", - " n_samples=2_000,\n", - " n_features=5,\n", - " n_informative=2,\n", - " shuffle=False,\n", - " coef=True,\n", - " random_state=0,\n", - " noise=30,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When creating the dataset, `make_regression` returns the true coefficient used\n", - "to generate the dataset. Let's plot this information." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "\n", - "feature_names = [\n", - " \"Relevant feature #0\",\n", - " \"Relevant feature #1\",\n", - " \"Noisy feature #0\",\n", - " \"Noisy feature #1\",\n", - " \"Noisy feature #2\",\n", - "]\n", - "coef = pd.Series(coef, index=feature_names)\n", - "coef.plot.barh()\n", - "coef" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a `LinearRegression` regressor and fit on the entire dataset and check\n", - "the value of the coefficients. Are the coefficients of the linear regressor\n", - "close to the coefficients used to generate the dataset?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "from sklearn.linear_model import LinearRegression\n", - "\n", - "linear_regression = LinearRegression()\n", - "linear_regression.fit(data, target)\n", - "linear_regression.coef_" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "feature_names = [\n", - " \"Relevant feature #0\",\n", - " \"Relevant feature #1\",\n", - " \"Noisy feature #0\",\n", - " \"Noisy feature #1\",\n", - " \"Noisy feature #2\",\n", - "]\n", - "coef = pd.Series(linear_regression.coef_, index=feature_names)\n", - "_ = coef.plot.barh()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "tags": [ - "solution" - ] - }, - "source": [ - "We see that the coefficients are close to the coefficients used to generate\n", - "the dataset. The dispersion is indeed cause by the noise injected during the\n", - "dataset generation." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, create a new dataset that will be the same as `data` with 4 additional\n", - "columns that will repeat twice features 0 and 1. This procedure will create\n", - "perfectly correlated features." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "import numpy as np\n", - "\n", - "data = np.concatenate([data, data[:, [0, 1]], data[:, [0, 1]]], axis=1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Fit again the linear regressor on this new dataset and check the coefficients.\n", - "What do you observe?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "linear_regression = LinearRegression()\n", - "linear_regression.fit(data, target)\n", - "linear_regression.coef_" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "feature_names = [\n", - " \"Relevant feature #0\",\n", - " \"Relevant feature #1\",\n", - " \"Noisy feature #0\",\n", - " \"Noisy feature #1\",\n", - " \"Noisy feature #2\",\n", - " \"First repetition of feature #0\",\n", - " \"First repetition of feature #1\",\n", - " \"Second repetition of feature #0\",\n", - " \"Second repetition of feature #1\",\n", - "]\n", - "coef = pd.Series(linear_regression.coef_, index=feature_names)\n", - "_ = coef.plot.barh()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "tags": [ - "solution" - ] - }, - "source": [ - "We see that the coefficient values are far from what one could expect. By\n", - "repeating the informative features, one would have expected these coefficients\n", - "to be similarly informative.\n", - "\n", - "Instead, we see that some coefficients have a huge norm ~1e14. It indeed means\n", - "that we try to solve an mathematical ill-posed problem. Indeed, finding\n", - "coefficients in a linear regression involves inverting the matrix\n", - "`np.dot(data.T, data)` which is not possible (or lead to high numerical\n", - "errors)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a ridge regressor and fit on the same dataset. Check the coefficients.\n", - "What do you observe?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "from sklearn.linear_model import Ridge\n", - "\n", - "ridge = Ridge()\n", - "ridge.fit(data, target)\n", - "ridge.coef_" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "coef = pd.Series(ridge.coef_, index=feature_names)\n", - "_ = coef.plot.barh()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "tags": [ - "solution" - ] - }, - "source": [ - "We see that the penalty applied on the weights give a better results: the\n", - "values of the coefficients do not suffer from numerical issues. Indeed, the\n", - "matrix to be inverted internally is `np.dot(data.T, data) + alpha * I`. Adding\n", - "this penalty `alpha` allow the inversion without numerical issue." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Can you find the relationship between the ridge coefficients and the original\n", - "coefficients?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "ridge.coef_[:5] * 3" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "tags": [ - "solution" - ] - }, - "source": [ - "Repeating three times each informative features induced to divide the ridge\n", - "coefficients by three." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "tags": [ - "solution" - ] - }, - "source": [ - "
\n", - "

Tip

\n", - "

We advise to always use a penalty to shrink the magnitude of the weights\n", - "toward zero (also called \"l2 penalty\"). In scikit-learn, LogisticRegression\n", - "applies such penalty by default. However, one needs to use Ridge (and even\n", - "RidgeCV to tune the parameter alpha) instead of LinearRegression.

\n", - "

Other kinds of regularizations exist but will not be covered in this course.

\n", - "
\n", - "\n", - "## Dealing with correlation between one-hot encoded features\n", - "\n", - "In this section, we will focus on how to deal with correlated features that\n", - "arise naturally when one-hot encoding categorical features.\n", - "\n", - "Let's first load the Ames housing dataset and take a subset of features that\n", - "are only categorical features." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "import pandas as pd\n", - "from sklearn.model_selection import train_test_split\n", - "\n", - "ames_housing = pd.read_csv(\"../datasets/house_prices.csv\", na_values=\"?\")\n", - "ames_housing = ames_housing.drop(columns=\"Id\")\n", - "\n", - "categorical_columns = [\"Street\", \"Foundation\", \"CentralAir\", \"PavedDrive\"]\n", - "target_name = \"SalePrice\"\n", - "X, y = ames_housing[categorical_columns], ames_housing[target_name]\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(\n", - " X, y, test_size=0.2, random_state=0\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "tags": [ - "solution" - ] - }, - "source": [ - "\n", - "We previously presented that a `OneHotEncoder` creates as many columns as\n", - "categories. Therefore, there is always one column (i.e. one encoded category)\n", - "that can be inferred from the others. Thus, `OneHotEncoder` creates collinear\n", - "features.\n", - "\n", - "We illustrate this behaviour by considering the \"CentralAir\" feature that\n", - "contains only two categories:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "X_train[\"CentralAir\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "from sklearn.preprocessing import OneHotEncoder\n", - "\n", - "single_feature = [\"CentralAir\"]\n", - "encoder = OneHotEncoder(sparse_output=False, dtype=np.int32)\n", - "X_trans = encoder.fit_transform(X_train[single_feature])\n", - "X_trans = pd.DataFrame(\n", - " X_trans,\n", - " columns=encoder.get_feature_names_out(input_features=single_feature),\n", - ")\n", - "X_trans" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "tags": [ - "solution" - ] - }, - "source": [ - "\n", - "Here, we see that the encoded category \"CentralAir_N\" is the opposite of the\n", - "encoded category \"CentralAir_Y\". Therefore, we observe that using a\n", - "`OneHotEncoder` creates two features having the problematic pattern observed\n", - "earlier in this exercise. Training a linear regression model on such a of\n", - "one-hot encoded binary feature can therefore lead to numerical problems,\n", - "especially without regularization. Furthermore, the two one-hot features are\n", - "redundant as they encode exactly the same information in opposite ways.\n", - "\n", - "Using regularization helps to overcome the numerical issues that we\n", - "highlighted earlier in this exercise.\n", - "\n", - "Another strategy is to arbitrarily drop one of the encoded categories.\n", - "Scikit-learn provides such an option by setting the parameter `drop` in the\n", - "`OneHotEncoder`. This parameter can be set to `first` to always drop the first\n", - "encoded category or `binary_only` to only drop a column in the case of binary\n", - "categories." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "encoder = OneHotEncoder(drop=\"first\", sparse_output=False, dtype=np.int32)\n", - "X_trans = encoder.fit_transform(X_train[single_feature])\n", - "X_trans = pd.DataFrame(\n", - " X_trans,\n", - " columns=encoder.get_feature_names_out(input_features=single_feature),\n", - ")\n", - "X_trans" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "tags": [ - "solution" - ] - }, - "source": [ - "\n", - "We see that only the second column of the previous encoded data is kept.\n", - "Dropping one of the one-hot encoded column is a common practice, especially\n", - "for binary categorical features. Note however that this breaks symmetry\n", - "between categories and impacts the number of coefficients of the model, their\n", - "values, and thus their meaning, especially when applying strong\n", - "regularization.\n", - "\n", - "Let's finally illustrate how to use this option is a machine-learning\n", - "pipeline:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "from sklearn.pipeline import make_pipeline\n", - "\n", - "model = make_pipeline(OneHotEncoder(drop=\"first\", dtype=np.int32), Ridge())\n", - "model.fit(X_train, y_train)\n", - "n_categories = [X_train[col].nunique() for col in X_train.columns]\n", - "print(f\"R2 score on the testing set: {model.score(X_test, y_test):.2f}\")\n", - "print(\n", - " f\"Our model contains {model[-1].coef_.size} features while \"\n", - " f\"{sum(n_categories)} categories are originally available.\"\n", - ")" - ] - } - ], - "metadata": { - "jupytext": { - "main_language": "python" - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file diff --git a/notebooks/linear_models_sol_05.ipynb b/notebooks/linear_models_sol_05.ipynb deleted file mode 100644 index 08bae2e77..000000000 --- a/notebooks/linear_models_sol_05.ipynb +++ /dev/null @@ -1,201 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# \ud83d\udcc3 Solution for Exercise M4.05\n", - "\n", - "In the previous notebook we set `penalty=\"none\"` to disable regularization\n", - "entirely. This parameter can also control the **type** of regularization to\n", - "use, whereas the regularization **strength** is set using the parameter `C`.\n", - "Setting`penalty=\"none\"` is equivalent to an infinitely large value of `C`. In\n", - "this exercise, we ask you to train a logistic regression classifier using the\n", - "`penalty=\"l2\"` regularization (which happens to be the default in\n", - "scikit-learn) to find by yourself the effect of the parameter `C`.\n", - "\n", - "We will start by loading the dataset." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "

Note

\n", - "

If you want a deeper overview regarding this dataset, you can refer to the\n", - "Appendix - Datasets description section at the end of this MOOC.

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "\n", - "penguins = pd.read_csv(\"../datasets/penguins_classification.csv\")\n", - "# only keep the Adelie and Chinstrap classes\n", - "penguins = (\n", - " penguins.set_index(\"Species\").loc[[\"Adelie\", \"Chinstrap\"]].reset_index()\n", - ")\n", - "\n", - "culmen_columns = [\"Culmen Length (mm)\", \"Culmen Depth (mm)\"]\n", - "target_column = \"Species\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.model_selection import train_test_split\n", - "\n", - "penguins_train, penguins_test = train_test_split(penguins, random_state=0)\n", - "\n", - "data_train = penguins_train[culmen_columns]\n", - "data_test = penguins_test[culmen_columns]\n", - "\n", - "target_train = penguins_train[target_column]\n", - "target_test = penguins_test[target_column]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, let's create our predictive model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.pipeline import make_pipeline\n", - "from sklearn.preprocessing import StandardScaler\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "logistic_regression = make_pipeline(\n", - " StandardScaler(), LogisticRegression(penalty=\"l2\")\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Given the following candidates for the `C` parameter, find out the impact of\n", - "`C` on the classifier decision boundary. You can use\n", - "`sklearn.inspection.DecisionBoundaryDisplay.from_estimator` to plot the\n", - "decision function boundary." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "Cs = [0.01, 0.1, 1, 10]\n", - "\n", - "# solution\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "from sklearn.inspection import DecisionBoundaryDisplay\n", - "\n", - "for C in Cs:\n", - " logistic_regression.set_params(logisticregression__C=C)\n", - " logistic_regression.fit(data_train, target_train)\n", - " accuracy = logistic_regression.score(data_test, target_test)\n", - "\n", - " DecisionBoundaryDisplay.from_estimator(\n", - " logistic_regression,\n", - " data_test,\n", - " response_method=\"predict\",\n", - " cmap=\"RdBu_r\",\n", - " alpha=0.5,\n", - " )\n", - " sns.scatterplot(\n", - " data=penguins_test,\n", - " x=culmen_columns[0],\n", - " y=culmen_columns[1],\n", - " hue=target_column,\n", - " palette=[\"tab:red\", \"tab:blue\"],\n", - " )\n", - " plt.legend(bbox_to_anchor=(1.05, 0.8), loc=\"upper left\")\n", - " plt.title(f\"C: {C} \\n Accuracy on the test set: {accuracy:.2f}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Look at the impact of the `C` hyperparameter on the magnitude of the weights." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# solution\n", - "weights_ridge = []\n", - "for C in Cs:\n", - " logistic_regression.set_params(logisticregression__C=C)\n", - " logistic_regression.fit(data_train, target_train)\n", - " coefs = logistic_regression[-1].coef_[0]\n", - " weights_ridge.append(pd.Series(coefs, index=culmen_columns))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "solution" - ] - }, - "outputs": [], - "source": [ - "weights_ridge = pd.concat(weights_ridge, axis=1, keys=[f\"C: {C}\" for C in Cs])\n", - "weights_ridge.plot.barh()\n", - "_ = plt.title(\"LogisticRegression weights depending of C\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "tags": [ - "solution" - ] - }, - "source": [ - "We see that a small `C` will shrink the weights values toward zero. It means\n", - "that a small `C` provides a more regularized model. Thus, `C` is the inverse\n", - "of the `alpha` coefficient in the `Ridge` model.\n", - "\n", - "Besides, with a strong penalty (i.e. small `C` value), the weight of the\n", - "feature \"Culmen Depth (mm)\" is almost zero. It explains why the decision\n", - "separation in the plot is almost perpendicular to the \"Culmen Length (mm)\"\n", - "feature." - ] - } - ], - "metadata": { - "jupytext": { - "main_language": "python" - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file diff --git a/python_scripts/linear_models_ex_02.py b/python_scripts/linear_models_ex_02.py index 640c44046..f58a1f0fe 100644 --- a/python_scripts/linear_models_ex_02.py +++ b/python_scripts/linear_models_ex_02.py @@ -14,100 +14,80 @@ # %% [markdown] # # 📝 Exercise M4.02 # -# The goal of this exercise is to build an intuition on what will be the -# parameters' values of a linear model when the link between the data and the -# target is non-linear. +# In the previous notebook, we showed that we can add new features based on the +# original feature to make the model more expressive, for instance `x ** 2` or `x ** 3`. +# In that case we only used a single feature in `data`. # -# First, we will generate such non-linear data. +# The aim of this notebook is to train a linear regression algorithm on a +# dataset with more than a single feature. In such a "multi-dimensional" feature +# space we can derive new features of the form `x1 * x2`, `x2 * x3`, +# etc. Products of features are usually called "non-linear or +# multiplicative interactions" between features. # -# ```{tip} -# `np.random.RandomState` allows to create a random number generator which can -# be later used to get deterministic results. -# ``` - -# %% -import numpy as np - -# Set the seed for reproduction -rng = np.random.RandomState(0) - -# Generate data -n_sample = 100 -data_max, data_min = 1.4, -1.4 -len_data = data_max - data_min -data = rng.rand(n_sample) * len_data - len_data / 2 -noise = rng.randn(n_sample) * 0.3 -target = data**3 - 0.5 * data**2 + noise +# Feature engineering can be an important step of a model pipeline as long as +# the new features are expected to be predictive. For instance, think of a +# classification model to decide if a patient has risk of developing a heart +# disease. This would depend on the patient's Body Mass Index which is defined +# as `weight / height ** 2`. +# +# We load the dataset penguins dataset. 
We first use a set of 3 numerical +# features to predict the target, i.e. the body mass of the penguin. # %% [markdown] # ```{note} -# To ease the plotting, we will create a Pandas dataframe containing the data -# and target +# If you want a deeper overview regarding this dataset, you can refer to the +# Appendix - Datasets description section at the end of this MOOC. # ``` # %% import pandas as pd -full_data = pd.DataFrame({"data": data, "target": target}) +penguins = pd.read_csv("../datasets/penguins.csv") -# %% -import seaborn as sns +columns = ["Flipper Length (mm)", "Culmen Length (mm)", "Culmen Depth (mm)"] +target_name = "Body Mass (g)" -_ = sns.scatterplot( - data=full_data, x="data", y="target", color="black", alpha=0.5 -) +# Remove lines with missing values for the columns of interest +penguins_non_missing = penguins[columns + [target_name]].dropna() -# %% [markdown] -# We observe that the link between the data `data` and vector `target` is -# non-linear. For instance, `data` could represent the years of experience -# (normalized) and `target` the salary (normalized). Therefore, the problem here -# would be to infer the salary given the years of experience. -# -# Using the function `f` defined below, find both the `weight` and the -# `intercept` that you think will lead to a good linear model. Plot both the -# data and the predictions of this model. - - -# %% -def f(data, weight=0, intercept=0): - target_predict = weight * data + intercept - return target_predict +data = penguins_non_missing[columns] +target = penguins_non_missing[target_name] +data.head() +# %% [markdown] +# Now it is your turn to train a linear regression model on this dataset. First, +# create a linear regression model. # %% # Write your code here. # %% [markdown] -# Compute the mean squared error for this model +# Execute a cross-validation with 10 folds and use the mean absolute error (MAE) +# as metric. # %% # Write your code here. # %% [markdown] -# Train a linear regression model on this dataset. -# -# ```{warning} -# In scikit-learn, by convention `data` (also called `X` in the scikit-learn -# documentation) should be a 2D matrix of shape `(n_samples, n_features)`. -# If `data` is a 1D vector, you need to reshape it into a matrix with a -# single column if the vector represents a feature or a single row if the -# vector represents a sample. -# ``` +# Compute the mean and std of the MAE in grams (g). # %% -from sklearn.linear_model import LinearRegression - # Write your code here. # %% [markdown] -# Compute predictions from the linear regression model and plot both the data -# and the predictions. +# Now create a pipeline using `make_pipeline` consisting of a +# `PolynomialFeatures` and a linear regression. Set `degree=2` and +# `interaction_only=True` to the feature engineering step. Remember not to +# include the bias to avoid redundancies with the linear's regression intercept. +# +# Use the same strategy as before to cross-validate such a pipeline. # %% # Write your code here. # %% [markdown] -# Compute the mean squared error +# Compute the mean and std of the MAE in grams (g) and compare with the results +# without feature engineering. # %% # Write your code here. 
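For reference, a minimal sketch (outside the patch) of the pipeline that the updated linear_models_ex_02.py asks for. The dataset path, column names and the PolynomialFeatures settings are the ones named in the exercise above; treat it as one possible solution under those assumptions, not the canonical one.

import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Load the same columns as the exercise and drop rows with missing values.
penguins = pd.read_csv("../datasets/penguins.csv")
columns = ["Flipper Length (mm)", "Culmen Length (mm)", "Culmen Depth (mm)"]
target_name = "Body Mass (g)"
penguins_non_missing = penguins[columns + [target_name]].dropna()
data, target = penguins_non_missing[columns], penguins_non_missing[target_name]

# degree=2 with interaction_only=True only adds the x1 * x2 products;
# include_bias=False avoids a constant column redundant with the intercept.
model = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LinearRegression(),
)
cv_results = cross_validate(
    model, data, target, cv=10, scoring="neg_mean_absolute_error"
)
print(
    "MAE with interactions: "
    f"{-cv_results['test_score'].mean():.1f} +/- "
    f"{cv_results['test_score'].std():.1f} g"
)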
diff --git a/python_scripts/linear_models_ex_03.py b/python_scripts/linear_models_ex_03.py index 3ab6949a3..9c311e817 100644 --- a/python_scripts/linear_models_ex_03.py +++ b/python_scripts/linear_models_ex_03.py @@ -14,24 +14,14 @@ # %% [markdown] # # 📝 Exercise M4.03 # -# In the previous notebook, we showed that we can add new features based on the -# original feature to make the model more expressive, for instance `x ** 2` or `x ** 3`. -# In that case we only used a single feature in `data`. +# The parameter `penalty` can control the **type** of regularization to use, +# whereas the regularization **strength** is set using the parameter `C`. +# Setting`penalty="none"` is equivalent to an infinitely large value of `C`. In +# this exercise, we ask you to train a logistic regression classifier using the +# `penalty="l2"` regularization (which happens to be the default in +# scikit-learn) to find by yourself the effect of the parameter `C`. # -# The aim of this notebook is to train a linear regression algorithm on a -# dataset with more than a single feature. In such a "multi-dimensional" feature -# space we can derive new features of the form `x1 * x2`, `x2 * x3`, -# etc. Products of features are usually called "non-linear or -# multiplicative interactions" between features. -# -# Feature engineering can be an important step of a model pipeline as long as -# the new features are expected to be predictive. For instance, think of a -# classification model to decide if a patient has risk of developing a heart -# disease. This would depend on the patient's Body Mass Index which is defined -# as `weight / height ** 2`. -# -# We load the dataset penguins dataset. We first use a set of 3 numerical -# features to predict the target, i.e. the body mass of the penguin. +# We start by loading the dataset. # %% [markdown] # ```{note} @@ -42,52 +32,51 @@ # %% import pandas as pd -penguins = pd.read_csv("../datasets/penguins.csv") +penguins = pd.read_csv("../datasets/penguins_classification.csv") +# only keep the Adelie and Chinstrap classes +penguins = ( + penguins.set_index("Species").loc[["Adelie", "Chinstrap"]].reset_index() +) -columns = ["Flipper Length (mm)", "Culmen Length (mm)", "Culmen Depth (mm)"] -target_name = "Body Mass (g)" +culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"] +target_column = "Species" -# Remove lines with missing values for the columns of interest -penguins_non_missing = penguins[columns + [target_name]].dropna() +# %% +from sklearn.model_selection import train_test_split -data = penguins_non_missing[columns] -target = penguins_non_missing[target_name] -data.head() +penguins_train, penguins_test = train_test_split(penguins, random_state=0) -# %% [markdown] -# Now it is your turn to train a linear regression model on this dataset. First, -# create a linear regression model. +data_train = penguins_train[culmen_columns] +data_test = penguins_test[culmen_columns] -# %% -# Write your code here. +target_train = penguins_train[target_column] +target_test = penguins_test[target_column] # %% [markdown] -# Execute a cross-validation with 10 folds and use the mean absolute error (MAE) -# as metric. +# First, let's create our predictive model. # %% -# Write your code here. +from sklearn.pipeline import make_pipeline +from sklearn.preprocessing import StandardScaler +from sklearn.linear_model import LogisticRegression -# %% [markdown] -# Compute the mean and std of the MAE in grams (g). - -# %% -# Write your code here. 
+logistic_regression = make_pipeline( + StandardScaler(), LogisticRegression(penalty="l2") +) # %% [markdown] -# Now create a pipeline using `make_pipeline` consisting of a -# `PolynomialFeatures` and a linear regression. Set `degree=2` and -# `interaction_only=True` to the feature engineering step. Remember not to -# include the bias to avoid redundancies with the linear's regression intercept. -# -# Use the same strategy as before to cross-validate such a pipeline. +# Given the following candidates for the `C` parameter, find out the impact of +# `C` on the classifier decision boundary. You can use +# `sklearn.inspection.DecisionBoundaryDisplay.from_estimator` to plot the +# decision function boundary. # %% +Cs = [0.01, 0.1, 1, 10] + # Write your code here. # %% [markdown] -# Compute the mean and std of the MAE in grams (g) and compare with the results -# without feature engineering. +# Look at the impact of the `C` hyperparameter on the magnitude of the weights. # %% # Write your code here. diff --git a/python_scripts/linear_models_ex_04.py b/python_scripts/linear_models_ex_04.py deleted file mode 100644 index 18191bccf..000000000 --- a/python_scripts/linear_models_ex_04.py +++ /dev/null @@ -1,92 +0,0 @@ -# --- -# jupyter: -# jupytext: -# text_representation: -# extension: .py -# format_name: percent -# format_version: '1.3' -# jupytext_version: 1.14.5 -# kernelspec: -# display_name: Python 3 -# name: python3 -# --- - -# %% [markdown] -# # 📝 Exercise M4.04 -# -# In the previous notebook, we saw the effect of applying some regularization on -# the coefficient of a linear model. -# -# In this exercise, we will study the advantage of using some regularization -# when dealing with correlated features. -# -# We will first create a regression dataset. This dataset will contain 2,000 -# samples and 5 features from which only 2 features will be informative. - -# %% -from sklearn.datasets import make_regression - -data, target, coef = make_regression( - n_samples=2_000, - n_features=5, - n_informative=2, - shuffle=False, - coef=True, - random_state=0, - noise=30, -) - -# %% [markdown] -# When creating the dataset, `make_regression` returns the true coefficient used -# to generate the dataset. Let's plot this information. - -# %% -import pandas as pd - -feature_names = [ - "Relevant feature #0", - "Relevant feature #1", - "Noisy feature #0", - "Noisy feature #1", - "Noisy feature #2", -] -coef = pd.Series(coef, index=feature_names) -coef.plot.barh() -coef - -# %% [markdown] -# Create a `LinearRegression` regressor and fit on the entire dataset and check -# the value of the coefficients. Are the coefficients of the linear regressor -# close to the coefficients used to generate the dataset? - -# %% -# Write your code here. - -# %% [markdown] -# Now, create a new dataset that will be the same as `data` with 4 additional -# columns that will repeat twice features 0 and 1. This procedure will create -# perfectly correlated features. - -# %% -# Write your code here. - -# %% [markdown] -# Fit again the linear regressor on this new dataset and check the coefficients. -# What do you observe? - -# %% -# Write your code here. - -# %% [markdown] -# Create a ridge regressor and fit on the same dataset. Check the coefficients. -# What do you observe? - -# %% -# Write your code here. - -# %% [markdown] -# Can you find the relationship between the ridge coefficients and the original -# coefficients? - -# %% -# Write your code here. 
diff --git a/python_scripts/linear_models_ex_05.py b/python_scripts/linear_models_ex_05.py deleted file mode 100644 index 1c36b83c2..000000000 --- a/python_scripts/linear_models_ex_05.py +++ /dev/null @@ -1,83 +0,0 @@ -# --- -# jupyter: -# jupytext: -# text_representation: -# extension: .py -# format_name: percent -# format_version: '1.3' -# jupytext_version: 1.14.5 -# kernelspec: -# display_name: Python 3 -# name: python3 -# --- - -# %% [markdown] -# # 📝 Exercise M4.05 -# -# In the previous notebook we set `penalty="none"` to disable regularization -# entirely. This parameter can also control the **type** of regularization to -# use, whereas the regularization **strength** is set using the parameter `C`. -# Setting`penalty="none"` is equivalent to an infinitely large value of `C`. In -# this exercise, we ask you to train a logistic regression classifier using the -# `penalty="l2"` regularization (which happens to be the default in -# scikit-learn) to find by yourself the effect of the parameter `C`. -# -# We will start by loading the dataset. - -# %% [markdown] -# ```{note} -# If you want a deeper overview regarding this dataset, you can refer to the -# Appendix - Datasets description section at the end of this MOOC. -# ``` - -# %% -import pandas as pd - -penguins = pd.read_csv("../datasets/penguins_classification.csv") -# only keep the Adelie and Chinstrap classes -penguins = ( - penguins.set_index("Species").loc[["Adelie", "Chinstrap"]].reset_index() -) - -culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"] -target_column = "Species" - -# %% -from sklearn.model_selection import train_test_split - -penguins_train, penguins_test = train_test_split(penguins, random_state=0) - -data_train = penguins_train[culmen_columns] -data_test = penguins_test[culmen_columns] - -target_train = penguins_train[target_column] -target_test = penguins_test[target_column] - -# %% [markdown] -# First, let's create our predictive model. - -# %% -from sklearn.pipeline import make_pipeline -from sklearn.preprocessing import StandardScaler -from sklearn.linear_model import LogisticRegression - -logistic_regression = make_pipeline( - StandardScaler(), LogisticRegression(penalty="l2") -) - -# %% [markdown] -# Given the following candidates for the `C` parameter, find out the impact of -# `C` on the classifier decision boundary. You can use -# `sklearn.inspection.DecisionBoundaryDisplay.from_estimator` to plot the -# decision function boundary. - -# %% -Cs = [0.01, 0.1, 1, 10] - -# Write your code here. - -# %% [markdown] -# Look at the impact of the `C` hyperparameter on the magnitude of the weights. - -# %% -# Write your code here. diff --git a/python_scripts/linear_models_sol_02.py b/python_scripts/linear_models_sol_02.py index d62a4b983..3abc476da 100644 --- a/python_scripts/linear_models_sol_02.py +++ b/python_scripts/linear_models_sol_02.py @@ -8,123 +8,127 @@ # %% [markdown] # # 📃 Solution for Exercise M4.02 # -# The goal of this exercise is to build an intuition on what will be the -# parameters' values of a linear model when the link between the data and the -# target is non-linear. +# In the previous notebook, we showed that we can add new features based on the +# original feature to make the model more expressive, for instance `x ** 2` or `x ** 3`. +# In that case we only used a single feature in `data`. # -# First, we will generate such non-linear data. +# The aim of this notebook is to train a linear regression algorithm on a +# dataset with more than a single feature. 
In such a "multi-dimensional" feature +# space we can derive new features of the form `x1 * x2`, `x2 * x3`, +# etc. Products of features are usually called "non-linear or +# multiplicative interactions" between features. # -# ```{tip} -# `np.random.RandomState` allows to create a random number generator which can -# be later used to get deterministic results. -# ``` - -# %% -import numpy as np - -# Set the seed for reproduction -rng = np.random.RandomState(0) - -# Generate data -n_sample = 100 -data_max, data_min = 1.4, -1.4 -len_data = data_max - data_min -data = rng.rand(n_sample) * len_data - len_data / 2 -noise = rng.randn(n_sample) * 0.3 -target = data**3 - 0.5 * data**2 + noise +# Feature engineering can be an important step of a model pipeline as long as +# the new features are expected to be predictive. For instance, think of a +# classification model to decide if a patient is at risk of developing a heart +# disease. This would depend on the patient's Body Mass Index which is defined +# as `weight / height ** 2`. +# +# We load the penguins dataset. We first use a set of 3 numerical +# features to predict the target, i.e. the body mass of the penguin. # %% [markdown] # ```{note} -# To ease the plotting, we will create a Pandas dataframe containing the data -# and target +# If you want a deeper overview regarding this dataset, you can refer to the +# Appendix - Datasets description section at the end of this MOOC. # ``` # %% import pandas as pd -full_data = pd.DataFrame({"data": data, "target": target}) - -# %% -import seaborn as sns - -_ = sns.scatterplot( - data=full_data, x="data", y="target", color="black", alpha=0.5 -) +penguins = pd.read_csv("../datasets/penguins.csv") -# %% [markdown] -# We observe that the link between the data `data` and vector `target` is -# non-linear. For instance, `data` could represent the years of experience -# (normalized) and `target` the salary (normalized). Therefore, the problem here -# would be to infer the salary given the years of experience. -# -# Using the function `f` defined below, find both the `weight` and the -# `intercept` that you think will lead to a good linear model. Plot both the -# data and the predictions of this model. +columns = ["Flipper Length (mm)", "Culmen Length (mm)", "Culmen Depth (mm)"] +target_name = "Body Mass (g)" +# Remove lines with missing values for the columns of interest +penguins_non_missing = penguins[columns + [target_name]].dropna() -# %% -def f(data, weight=0, intercept=0): - target_predict = weight * data + intercept - return target_predict +data = penguins_non_missing[columns] +target = penguins_non_missing[target_name] +data.head() +# %% [markdown] +# Now it is your turn to train a linear regression model on this dataset. First, +# create a linear regression model. # %% # solution -predictions = f(data, weight=1.2, intercept=-0.2) +from sklearn.linear_model import LinearRegression -# %% tags=["solution"] -ax = sns.scatterplot( - data=full_data, x="data", y="target", color="black", alpha=0.5 -) -_ = ax.plot(data, predictions) +linear_regression = LinearRegression() # %% [markdown] -# Compute the mean squared error for this model +# Execute a cross-validation with 10 folds and use the mean absolute error (MAE) +# as the metric.
# %% # solution -from sklearn.metrics import mean_squared_error - -error = mean_squared_error(target, f(data, weight=1.2, intercept=-0.2)) -print(f"The MSE is {error}") +from sklearn.model_selection import cross_validate + +cv_results = cross_validate( + linear_regression, + data, + target, + cv=10, + scoring="neg_mean_absolute_error", + n_jobs=2, +) # %% [markdown] -# Train a linear regression model on this dataset. -# -# ```{warning} -# In scikit-learn, by convention `data` (also called `X` in the scikit-learn -# documentation) should be a 2D matrix of shape `(n_samples, n_features)`. -# If `data` is a 1D vector, you need to reshape it into a matrix with a -# single column if the vector represents a feature or a single row if the -# vector represents a sample. -# ``` +# Compute the mean and std of the MAE in grams (g). # %% -from sklearn.linear_model import LinearRegression - # solution -linear_regression = LinearRegression() -data_2d = data.reshape(-1, 1) -linear_regression.fit(data_2d, target) +print( + "Mean absolute error on testing set with original features: " + f"{-cv_results['test_score'].mean():.3f} ± " + f"{cv_results['test_score'].std():.3f} g" +) # %% [markdown] -# Compute predictions from the linear regression model and plot both the data -# and the predictions. +# Now create a pipeline using `make_pipeline` consisting of a +# `PolynomialFeatures` and a linear regression. Set `degree=2` and +# `interaction_only=True` for the feature engineering step. Remember not to +# include the bias to avoid redundancies with the linear regression's intercept. +# +# Use the same strategy as before to cross-validate such a pipeline. # %% # solution -predictions = linear_regression.predict(data_2d) +from sklearn.preprocessing import PolynomialFeatures +from sklearn.pipeline import make_pipeline -# %% tags=["solution"] -ax = sns.scatterplot( - data=full_data, x="data", y="target", color="black", alpha=0.5 +poly_features = PolynomialFeatures( + degree=2, include_bias=False, interaction_only=True +) +linear_regression_interactions = make_pipeline( + poly_features, linear_regression +) + +cv_results = cross_validate( + linear_regression_interactions, + data, + target, + cv=10, + scoring="neg_mean_absolute_error", + n_jobs=2, ) -_ = ax.plot(data, predictions) # %% [markdown] -# Compute the mean squared error +# Compute the mean and std of the MAE in grams (g) and compare with the results +# without feature engineering. # %% # solution -error = mean_squared_error(target, predictions) -print(f"The MSE is {error}") +print( + "Mean absolute error on testing set with interactions: " + f"{-cv_results['test_score'].mean():.3f} ± " + f"{cv_results['test_score'].std():.3f} g" +) + +# %% [markdown] tags=["solution"] +# We observe that the mean absolute error is lower and less spread with the +# enriched features. In this case the "interactions" are indeed predictive. In +# the following notebook we will see what happens when the enriched features are +# non-predictive and how to deal with this case. diff --git a/python_scripts/linear_models_sol_03.py b/python_scripts/linear_models_sol_03.py index 0cacfcf0d..d789c8522 100644 --- a/python_scripts/linear_models_sol_03.py +++ b/python_scripts/linear_models_sol_03.py @@ -8,24 +8,14 @@ # %% [markdown] # # 📃 Solution for Exercise M4.03 # -# In the previous notebook, we showed that we can add new features based on the -# original feature to make the model more expressive, for instance `x ** 2` or `x ** 3`. -# In that case we only used a single feature in `data`.
+# The parameter `penalty` can control the **type** of regularization to use, +# whereas the regularization **strength** is set using the parameter `C`. +# Setting `penalty="none"` is equivalent to an infinitely large value of `C`. In +# this exercise, we ask you to train a logistic regression classifier using the +# `penalty="l2"` regularization (which happens to be the default in +# scikit-learn) to find by yourself the effect of the parameter `C`. # -# The aim of this notebook is to train a linear regression algorithm on a -# dataset with more than a single feature. In such a "multi-dimensional" feature -# space we can derive new features of the form `x1 * x2`, `x2 * x3`, -# etc. Products of features are usually called "non-linear or -# multiplicative interactions" between features. -# -# Feature engineering can be an important step of a model pipeline as long as -# the new features are expected to be predictive. For instance, think of a -# classification model to decide if a patient has risk of developing a heart -# disease. This would depend on the patient's Body Mass Index which is defined -# as `weight / height ** 2`. -# -# We load the dataset penguins dataset. We first use a set of 3 numerical -# features to predict the target, i.e. the body mass of the penguin. +# We start by loading the dataset. # %% [markdown] # ```{note} @@ -36,99 +26,97 @@ # %% import pandas as pd -penguins = pd.read_csv("../datasets/penguins.csv") - -columns = ["Flipper Length (mm)", "Culmen Length (mm)", "Culmen Depth (mm)"] -target_name = "Body Mass (g)" - -# Remove lines with missing values for the columns of interest -penguins_non_missing = penguins[columns + [target_name]].dropna() - -data = penguins_non_missing[columns] -target = penguins_non_missing[target_name] -data.head() +penguins = pd.read_csv("../datasets/penguins_classification.csv") +# only keep the Adelie and Chinstrap classes +penguins = ( + penguins.set_index("Species").loc[["Adelie", "Chinstrap"]].reset_index() +) -# %% [markdown] -# Now it is your turn to train a linear regression model on this dataset. First, -# create a linear regression model. +culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"] +target_column = "Species" # %% -# solution -from sklearn.linear_model import LinearRegression +from sklearn.model_selection import train_test_split -linear_regression = LinearRegression() +penguins_train, penguins_test = train_test_split(penguins, random_state=0) -# %% [markdown] -# Execute a cross-validation with 10 folds and use the mean absolute error (MAE) -# as metric. +data_train = penguins_train[culmen_columns] +data_test = penguins_test[culmen_columns] -# %% -# solution -from sklearn.model_selection import cross_validate - -cv_results = cross_validate( - linear_regression, - data, - target, - cv=10, - scoring="neg_mean_absolute_error", - n_jobs=2, -) +target_train = penguins_train[target_column] +target_test = penguins_test[target_column] # %% [markdown] -# Compute the mean and std of the MAE in grams (g). +# First, let's create our predictive model.
# %% -# solution -print( - "Mean absolute error on testing set with original features: " - f"{-cv_results['test_score'].mean():.3f} ± " - f"{cv_results['test_score'].std():.3f} g" +from sklearn.pipeline import make_pipeline +from sklearn.preprocessing import StandardScaler +from sklearn.linear_model import LogisticRegression + +logistic_regression = make_pipeline( + StandardScaler(), LogisticRegression(penalty="l2") ) # %% [markdown] -# Now create a pipeline using `make_pipeline` consisting of a -# `PolynomialFeatures` and a linear regression. Set `degree=2` and -# `interaction_only=True` to the feature engineering step. Remember not to -# include the bias to avoid redundancies with the linear's regression intercept. -# -# Use the same strategy as before to cross-validate such a pipeline. +# Given the following candidates for the `C` parameter, find out the impact of +# `C` on the classifier decision boundary. You can use +# `sklearn.inspection.DecisionBoundaryDisplay.from_estimator` to plot the +# decision function boundary. # %% -# solution -from sklearn.preprocessing import PolynomialFeatures -from sklearn.pipeline import make_pipeline - -poly_features = PolynomialFeatures( - degree=2, include_bias=False, interaction_only=True -) -linear_regression_interactions = make_pipeline( - poly_features, linear_regression -) +Cs = [0.01, 0.1, 1, 10] -cv_results = cross_validate( - linear_regression_interactions, - data, - target, - cv=10, - scoring="neg_mean_absolute_error", - n_jobs=2, -) +# solution +import matplotlib.pyplot as plt +import seaborn as sns +from sklearn.inspection import DecisionBoundaryDisplay + +for C in Cs: + logistic_regression.set_params(logisticregression__C=C) + logistic_regression.fit(data_train, target_train) + accuracy = logistic_regression.score(data_test, target_test) + + DecisionBoundaryDisplay.from_estimator( + logistic_regression, + data_test, + response_method="predict", + cmap="RdBu_r", + alpha=0.5, + ) + sns.scatterplot( + data=penguins_test, + x=culmen_columns[0], + y=culmen_columns[1], + hue=target_column, + palette=["tab:red", "tab:blue"], + ) + plt.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left") + plt.title(f"C: {C} \n Accuracy on the test set: {accuracy:.2f}") # %% [markdown] -# Compute the mean and std of the MAE in grams (g) and compare with the results -# without feature engineering. +# Look at the impact of the `C` hyperparameter on the magnitude of the weights. # %% # solution -print( - "Mean absolute error on testing set with interactions: " - f"{-cv_results['test_score'].mean():.3f} ± " - f"{cv_results['test_score'].std():.3f} g" -) +weights_ridge = [] +for C in Cs: + logistic_regression.set_params(logisticregression__C=C) + logistic_regression.fit(data_train, target_train) + coefs = logistic_regression[-1].coef_[0] + weights_ridge.append(pd.Series(coefs, index=culmen_columns)) + +# %% tags=["solution"] +weights_ridge = pd.concat(weights_ridge, axis=1, keys=[f"C: {C}" for C in Cs]) +weights_ridge.plot.barh() +_ = plt.title("LogisticRegression weights depending on C") # %% [markdown] tags=["solution"] -# We observe that the mean absolute error is lower and less spread with the -# enriched features. In this case the "interactions" are indeed predictive. In -# the following notebook we will see what happens when the enriched features are -# non-predictive and how to deal with this case. +# We see that a small `C` will shrink the weight values toward zero. It means +# that a small `C` provides a more regularized model.
Thus, `C` is the inverse +# of the `alpha` coefficient in the `Ridge` model. +# +# Besides, with a strong penalty (i.e. small `C` value), the weight of the +# feature "Culmen Depth (mm)" is almost zero. It explains why the decision +# separation in the plot is almost perpendicular to the "Culmen Length (mm)" +# feature. diff --git a/python_scripts/linear_models_sol_04.py b/python_scripts/linear_models_sol_04.py deleted file mode 100644 index a759c3d24..000000000 --- a/python_scripts/linear_models_sol_04.py +++ /dev/null @@ -1,269 +0,0 @@ -# --- -# jupyter: -# kernelspec: -# display_name: Python 3 -# name: python3 -# --- - -# %% [markdown] -# # 📃 Solution for Exercise M4.04 -# -# In the previous notebook, we saw the effect of applying some regularization on -# the coefficient of a linear model. -# -# In this exercise, we will study the advantage of using some regularization -# when dealing with correlated features. -# -# We will first create a regression dataset. This dataset will contain 2,000 -# samples and 5 features from which only 2 features will be informative. - -# %% -from sklearn.datasets import make_regression - -data, target, coef = make_regression( - n_samples=2_000, - n_features=5, - n_informative=2, - shuffle=False, - coef=True, - random_state=0, - noise=30, -) - -# %% [markdown] -# When creating the dataset, `make_regression` returns the true coefficient used -# to generate the dataset. Let's plot this information. - -# %% -import pandas as pd - -feature_names = [ - "Relevant feature #0", - "Relevant feature #1", - "Noisy feature #0", - "Noisy feature #1", - "Noisy feature #2", -] -coef = pd.Series(coef, index=feature_names) -coef.plot.barh() -coef - -# %% [markdown] -# Create a `LinearRegression` regressor and fit on the entire dataset and check -# the value of the coefficients. Are the coefficients of the linear regressor -# close to the coefficients used to generate the dataset? - -# %% -# solution -from sklearn.linear_model import LinearRegression - -linear_regression = LinearRegression() -linear_regression.fit(data, target) -linear_regression.coef_ - -# %% tags=["solution"] -feature_names = [ - "Relevant feature #0", - "Relevant feature #1", - "Noisy feature #0", - "Noisy feature #1", - "Noisy feature #2", -] -coef = pd.Series(linear_regression.coef_, index=feature_names) -_ = coef.plot.barh() - -# %% [markdown] tags=["solution"] -# We see that the coefficients are close to the coefficients used to generate -# the dataset. The dispersion is indeed cause by the noise injected during the -# dataset generation. - -# %% [markdown] -# Now, create a new dataset that will be the same as `data` with 4 additional -# columns that will repeat twice features 0 and 1. This procedure will create -# perfectly correlated features. - -# %% -# solution -import numpy as np - -data = np.concatenate([data, data[:, [0, 1]], data[:, [0, 1]]], axis=1) - -# %% [markdown] -# Fit again the linear regressor on this new dataset and check the coefficients. -# What do you observe? 
- -# %% -# solution -linear_regression = LinearRegression() -linear_regression.fit(data, target) -linear_regression.coef_ - -# %% tags=["solution"] -feature_names = [ - "Relevant feature #0", - "Relevant feature #1", - "Noisy feature #0", - "Noisy feature #1", - "Noisy feature #2", - "First repetition of feature #0", - "First repetition of feature #1", - "Second repetition of feature #0", - "Second repetition of feature #1", -] -coef = pd.Series(linear_regression.coef_, index=feature_names) -_ = coef.plot.barh() - -# %% [markdown] tags=["solution"] -# We see that the coefficient values are far from what one could expect. By -# repeating the informative features, one would have expected these coefficients -# to be similarly informative. -# -# Instead, we see that some coefficients have a huge norm ~1e14. It indeed means -# that we try to solve an mathematical ill-posed problem. Indeed, finding -# coefficients in a linear regression involves inverting the matrix -# `np.dot(data.T, data)` which is not possible (or lead to high numerical -# errors). - -# %% [markdown] -# Create a ridge regressor and fit on the same dataset. Check the coefficients. -# What do you observe? - -# %% -# solution -from sklearn.linear_model import Ridge - -ridge = Ridge() -ridge.fit(data, target) -ridge.coef_ - -# %% tags=["solution"] -coef = pd.Series(ridge.coef_, index=feature_names) -_ = coef.plot.barh() - -# %% [markdown] tags=["solution"] -# We see that the penalty applied on the weights give a better results: the -# values of the coefficients do not suffer from numerical issues. Indeed, the -# matrix to be inverted internally is `np.dot(data.T, data) + alpha * I`. Adding -# this penalty `alpha` allow the inversion without numerical issue. - -# %% [markdown] -# Can you find the relationship between the ridge coefficients and the original -# coefficients? - -# %% -# solution -ridge.coef_[:5] * 3 - -# %% [markdown] tags=["solution"] -# Repeating three times each informative features induced to divide the ridge -# coefficients by three. - -# %% [markdown] tags=["solution"] -# ```{tip} -# We advise to always use a penalty to shrink the magnitude of the weights -# toward zero (also called "l2 penalty"). In scikit-learn, `LogisticRegression` -# applies such penalty by default. However, one needs to use `Ridge` (and even -# `RidgeCV` to tune the parameter `alpha`) instead of `LinearRegression`. -# -# Other kinds of regularizations exist but will not be covered in this course. -# ``` -# -# ## Dealing with correlation between one-hot encoded features -# -# In this section, we will focus on how to deal with correlated features that -# arise naturally when one-hot encoding categorical features. -# -# Let's first load the Ames housing dataset and take a subset of features that -# are only categorical features. - -# %% tags=["solution"] -import pandas as pd -from sklearn.model_selection import train_test_split - -ames_housing = pd.read_csv("../datasets/house_prices.csv", na_values="?") -ames_housing = ames_housing.drop(columns="Id") - -categorical_columns = ["Street", "Foundation", "CentralAir", "PavedDrive"] -target_name = "SalePrice" -X, y = ames_housing[categorical_columns], ames_housing[target_name] - -X_train, X_test, y_train, y_test = train_test_split( - X, y, test_size=0.2, random_state=0 -) - -# %% [markdown] tags=["solution"] -# -# We previously presented that a `OneHotEncoder` creates as many columns as -# categories. Therefore, there is always one column (i.e. 
one encoded category) -# that can be inferred from the others. Thus, `OneHotEncoder` creates collinear -# features. -# -# We illustrate this behaviour by considering the "CentralAir" feature that -# contains only two categories: - -# %% tags=["solution"] -X_train["CentralAir"] - -# %% tags=["solution"] -from sklearn.preprocessing import OneHotEncoder - -single_feature = ["CentralAir"] -encoder = OneHotEncoder(sparse_output=False, dtype=np.int32) -X_trans = encoder.fit_transform(X_train[single_feature]) -X_trans = pd.DataFrame( - X_trans, - columns=encoder.get_feature_names_out(input_features=single_feature), -) -X_trans - -# %% [markdown] tags=["solution"] -# -# Here, we see that the encoded category "CentralAir_N" is the opposite of the -# encoded category "CentralAir_Y". Therefore, we observe that using a -# `OneHotEncoder` creates two features having the problematic pattern observed -# earlier in this exercise. Training a linear regression model on such a of -# one-hot encoded binary feature can therefore lead to numerical problems, -# especially without regularization. Furthermore, the two one-hot features are -# redundant as they encode exactly the same information in opposite ways. -# -# Using regularization helps to overcome the numerical issues that we -# highlighted earlier in this exercise. -# -# Another strategy is to arbitrarily drop one of the encoded categories. -# Scikit-learn provides such an option by setting the parameter `drop` in the -# `OneHotEncoder`. This parameter can be set to `first` to always drop the first -# encoded category or `binary_only` to only drop a column in the case of binary -# categories. - -# %% tags=["solution"] -encoder = OneHotEncoder(drop="first", sparse_output=False, dtype=np.int32) -X_trans = encoder.fit_transform(X_train[single_feature]) -X_trans = pd.DataFrame( - X_trans, - columns=encoder.get_feature_names_out(input_features=single_feature), -) -X_trans - -# %% [markdown] tags=["solution"] -# -# We see that only the second column of the previous encoded data is kept. -# Dropping one of the one-hot encoded column is a common practice, especially -# for binary categorical features. Note however that this breaks symmetry -# between categories and impacts the number of coefficients of the model, their -# values, and thus their meaning, especially when applying strong -# regularization. -# -# Let's finally illustrate how to use this option is a machine-learning -# pipeline: - -# %% tags=["solution"] -from sklearn.pipeline import make_pipeline - -model = make_pipeline(OneHotEncoder(drop="first", dtype=np.int32), Ridge()) -model.fit(X_train, y_train) -n_categories = [X_train[col].nunique() for col in X_train.columns] -print(f"R2 score on the testing set: {model.score(X_test, y_test):.2f}") -print( - f"Our model contains {model[-1].coef_.size} features while " - f"{sum(n_categories)} categories are originally available." -) diff --git a/python_scripts/linear_models_sol_05.py b/python_scripts/linear_models_sol_05.py deleted file mode 100644 index bc4a15df1..000000000 --- a/python_scripts/linear_models_sol_05.py +++ /dev/null @@ -1,123 +0,0 @@ -# --- -# jupyter: -# kernelspec: -# display_name: Python 3 -# name: python3 -# --- - -# %% [markdown] -# # 📃 Solution for Exercise M4.05 -# -# In the previous notebook we set `penalty="none"` to disable regularization -# entirely. This parameter can also control the **type** of regularization to -# use, whereas the regularization **strength** is set using the parameter `C`. 
-# Setting`penalty="none"` is equivalent to an infinitely large value of `C`. In -# this exercise, we ask you to train a logistic regression classifier using the -# `penalty="l2"` regularization (which happens to be the default in -# scikit-learn) to find by yourself the effect of the parameter `C`. -# -# We will start by loading the dataset. - -# %% [markdown] -# ```{note} -# If you want a deeper overview regarding this dataset, you can refer to the -# Appendix - Datasets description section at the end of this MOOC. -# ``` - -# %% -import pandas as pd - -penguins = pd.read_csv("../datasets/penguins_classification.csv") -# only keep the Adelie and Chinstrap classes -penguins = ( - penguins.set_index("Species").loc[["Adelie", "Chinstrap"]].reset_index() -) - -culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"] -target_column = "Species" - -# %% -from sklearn.model_selection import train_test_split - -penguins_train, penguins_test = train_test_split(penguins, random_state=0) - -data_train = penguins_train[culmen_columns] -data_test = penguins_test[culmen_columns] - -target_train = penguins_train[target_column] -target_test = penguins_test[target_column] - -# %% [markdown] -# First, let's create our predictive model. - -# %% -from sklearn.pipeline import make_pipeline -from sklearn.preprocessing import StandardScaler -from sklearn.linear_model import LogisticRegression - -logistic_regression = make_pipeline( - StandardScaler(), LogisticRegression(penalty="l2") -) - -# %% [markdown] -# Given the following candidates for the `C` parameter, find out the impact of -# `C` on the classifier decision boundary. You can use -# `sklearn.inspection.DecisionBoundaryDisplay.from_estimator` to plot the -# decision function boundary. - -# %% -Cs = [0.01, 0.1, 1, 10] - -# solution -import matplotlib.pyplot as plt -import seaborn as sns -from sklearn.inspection import DecisionBoundaryDisplay - -for C in Cs: - logistic_regression.set_params(logisticregression__C=C) - logistic_regression.fit(data_train, target_train) - accuracy = logistic_regression.score(data_test, target_test) - - DecisionBoundaryDisplay.from_estimator( - logistic_regression, - data_test, - response_method="predict", - cmap="RdBu_r", - alpha=0.5, - ) - sns.scatterplot( - data=penguins_test, - x=culmen_columns[0], - y=culmen_columns[1], - hue=target_column, - palette=["tab:red", "tab:blue"], - ) - plt.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left") - plt.title(f"C: {C} \n Accuracy on the test set: {accuracy:.2f}") - -# %% [markdown] -# Look at the impact of the `C` hyperparameter on the magnitude of the weights. - -# %% -# solution -weights_ridge = [] -for C in Cs: - logistic_regression.set_params(logisticregression__C=C) - logistic_regression.fit(data_train, target_train) - coefs = logistic_regression[-1].coef_[0] - weights_ridge.append(pd.Series(coefs, index=culmen_columns)) - -# %% tags=["solution"] -weights_ridge = pd.concat(weights_ridge, axis=1, keys=[f"C: {C}" for C in Cs]) -weights_ridge.plot.barh() -_ = plt.title("LogisticRegression weights depending of C") - -# %% [markdown] tags=["solution"] -# We see that a small `C` will shrink the weights values toward zero. It means -# that a small `C` provides a more regularized model. Thus, `C` is the inverse -# of the `alpha` coefficient in the `Ridge` model. -# -# Besides, with a strong penalty (i.e. small `C` value), the weight of the -# feature "Culmen Depth (mm)" is almost zero. 
It explains why the decision -# separation in the plot is almost perpendicular to the "Culmen Length (mm)" -# feature. diff --git a/python_scripts/logistic_regression.py b/python_scripts/logistic_regression.py index 3156ebda0..45487341b 100644 --- a/python_scripts/logistic_regression.py +++ b/python_scripts/logistic_regression.py @@ -78,9 +78,7 @@ from sklearn.preprocessing import StandardScaler from sklearn.linear_model import LogisticRegression -logistic_regression = make_pipeline( - StandardScaler(), LogisticRegression(penalty=None) -) +logistic_regression = make_pipeline(StandardScaler(), LogisticRegression()) logistic_regression.fit(data_train, target_train) accuracy = logistic_regression.score(data_test, target_test) print(f"Accuracy on test set: {accuracy:.3f}") @@ -124,8 +122,7 @@ # %% [markdown] # Thus, we see that our decision function is represented by a line separating -# the 2 classes. We should also note that we did not impose any regularization -# by setting the parameter `penalty` to `'none'`. +# the 2 classes. # # Since the line is oblique, it means that we used a combination of both # features:
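To see what "a combination of both features" means in terms of the fitted model, a minimal sketch, assuming the fitted `logistic_regression` pipeline and the `data_train` dataframe used above, could be:

# %%
# Sketch: retrieve the weights of the LogisticRegression step of the fitted
# pipeline; both weights being non-zero is consistent with the oblique boundary.
import pandas as pd

coefs = pd.Series(logistic_regression[-1].coef_[0], index=data_train.columns)
coefs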