Remove M4.02 and rename other exercises accordingly

INRIA · Aug 31, 2023 · 513d696
1 parent fc4553f
commit 513d696

Showing 11 changed files with 326 additions and 1,016 deletions.
diff --git a/jupyter-book/_toc.yml b/jupyter-book/_toc.yml
@@ -99,20 +99,18 @@ parts:
     - file: linear_models/linear_models_quiz_m4_02
   - file: linear_models/linear_models_non_linear_index
     sections:
+    - file: python_scripts/linear_regression_non_linear_link
     - file: python_scripts/linear_models_ex_02
     - file: python_scripts/linear_models_sol_02
-    - file: python_scripts/linear_regression_non_linear_link
-    - file: python_scripts/linear_models_ex_03
-    - file: python_scripts/linear_models_sol_03
     - file: python_scripts/logistic_regression_non_linear
     - file: linear_models/linear_models_quiz_m4_03
   - file: linear_models/linear_models_regularization_index
     sections:
     - file: linear_models/regularized_linear_models_slides
     - file: python_scripts/linear_models_regularization
     - file: linear_models/linear_models_quiz_m4_04
-    - file: python_scripts/linear_models_ex_04
-    - file: python_scripts/linear_models_sol_04
+    - file: python_scripts/linear_models_ex_03
+    - file: python_scripts/linear_models_sol_03
     - file: linear_models/linear_models_quiz_m4_05
   - file: linear_models/linear_models_wrap_up_quiz
   - file: linear_models/linear_models_module_take_away

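For reference, the `-` lines in `linear_models_ex_02.ipynb` below are the removed Exercise M4.02. That exercise asked learners to guess a `weight` and `intercept` for `f` on cubic data, and then to fit a `LinearRegression`. The sketch below covers the fitting step. It reuses the removed data-generation cell and `f` verbatim, and it reshapes `data` into a 2D matrix, as the removed warning requires. The names `data_2d`, `target_predicted` and `mse` are illustrative choices. This is not the removed solution notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Same synthetic non-linear data as the removed exercise
rng = np.random.RandomState(0)
n_sample = 100
data_max, data_min = 1.4, -1.4
len_data = data_max - data_min
data = rng.rand(n_sample) * len_data - len_data / 2
noise = rng.randn(n_sample) * 0.3
target = data**3 - 0.5 * data**2 + noise


def f(data, weight=0, intercept=0):
    # Hand-written linear model from the removed exercise
    target_predict = weight * data + intercept
    return target_predict


# scikit-learn expects data of shape (n_samples, n_features):
# a single feature becomes a matrix with one column.
data_2d = data.reshape((-1, 1))

linear_regression = LinearRegression()
linear_regression.fit(data_2d, target)
target_predicted = linear_regression.predict(data_2d)

# Fitted parameters, comparable to a hand-picked weight/intercept for f
weight = linear_regression.coef_[0]
intercept = linear_regression.intercept_
mse = mean_squared_error(target, target_predicted)
print(f"weight={weight:.2f}, intercept={intercept:.2f}, MSE={mse:.3f}")
```

The fitted `weight` and `intercept` can be passed to `f` to reproduce the same predictions. This makes it easy to compare a hand-picked guess against the least-squares solution on the same MSE scale.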
diff --git a/notebooks/linear_models_ex_02.ipynb b/notebooks/linear_models_ex_02.ipynb
@@ -4,39 +4,19 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# \ud83d\udcdd Exercise M4.02\n",
+    "# \ud83d\udcdd Exercise M4.03\n",
     "\n",
-    "The goal of this exercise is to build an intuition on what will be the\n",
-    "parameters' values of a linear model when the link between the data and the\n",
-    "target is non-linear.\n",
+    "In all previous notebooks, we only used a single feature in `data`. But we\n",
+    "have already shown that we could make the model more expressive by deriving\n",
+    "new features from the original one.\n",
     "\n",
-    "First, we will generate such non-linear data.\n",
+    "The aim of this notebook is to train a linear regression algorithm on a\n",
+    "dataset with more than a single feature.\n",
     "\n",
-    "<div class=\"admonition tip alert alert-warning\">\n",
-    "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Tip</p>\n",
-    "<p class=\"last\"><tt class=\"docutils literal\">np.random.RandomState</tt> allows to create a random number generator which can\n",
-    "be later used to get deterministic results.</p>\n",
-    "</div>"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import numpy as np\n",
-    "\n",
-    "# Set the seed for reproduction\n",
-    "rng = np.random.RandomState(0)\n",
-    "\n",
-    "# Generate data\n",
-    "n_sample = 100\n",
-    "data_max, data_min = 1.4, -1.4\n",
-    "len_data = data_max - data_min\n",
-    "data = rng.rand(n_sample) * len_data - len_data / 2\n",
-    "noise = rng.randn(n_sample) * 0.3\n",
-    "target = data**3 - 0.5 * data**2 + noise"
+    "We will load a dataset about house prices in California. The dataset consists\n",
+    "of 8 features regarding the demography and geography of districts in\n",
+    "California and the aim is to predict the median house price of each district.\n",
+    "We will use all 8 features to predict the target, the median house price."
    ]
   },
   {
@@ -45,8 +25,8 @@
    "source": [
     "<div class=\"admonition note alert alert-info\">\n",
     "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Note</p>\n",
-    "<p class=\"last\">To ease the plotting, we will create a Pandas dataframe containing the data\n",
-    "and target</p>\n",
+    "<p class=\"last\">If you want a deeper overview regarding this dataset, you can refer to the\n",
+    "Appendix - Datasets description section at the end of this MOOC.</p>\n",
     "</div>"
    ]
   },
@@ -56,65 +36,19 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import pandas as pd\n",
+    "from sklearn.datasets import fetch_california_housing\n",
     "\n",
-    "full_data = pd.DataFrame({\"data\": data, \"target\": target})"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import seaborn as sns\n",
-    "\n",
-    "_ = sns.scatterplot(\n",
-    "    data=full_data, x=\"data\", y=\"target\", color=\"black\", alpha=0.5\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "lines_to_next_cell": 2
-   },
-   "source": [
-    "We observe that the link between the data `data` and vector `target` is\n",
-    "non-linear. For instance, `data` could represent the years of experience\n",
-    "(normalized) and `target` the salary (normalized). Therefore, the problem here\n",
-    "would be to infer the salary given the years of experience.\n",
-    "\n",
-    "Using the function `f` defined below, find both the `weight` and the\n",
-    "`intercept` that you think will lead to a good linear model. Plot both the\n",
-    "data and the predictions of this model."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def f(data, weight=0, intercept=0):\n",
-    "    target_predict = weight * data + intercept\n",
-    "    return target_predict"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Write your code here."
+    "data, target = fetch_california_housing(as_frame=True, return_X_y=True)\n",
+    "target *= 100  # rescale the target in k$\n",
+    "data.head()"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Compute the mean squared error for this model"
+    "Now it is your turn to train a linear regression model on this dataset. First,\n",
+    "create a linear regression model."
    ]
   },
   {
@@ -130,16 +64,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Train a linear regression model on this dataset.\n",
-    "\n",
-    "<div class=\"admonition warning alert alert-danger\">\n",
-    "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Warning</p>\n",
-    "<p class=\"last\">In scikit-learn, by convention <tt class=\"docutils literal\">data</tt> (also called <tt class=\"docutils literal\">X</tt> in the scikit-learn\n",
-    "documentation) should be a 2D matrix of shape <tt class=\"docutils literal\">(n_samples, n_features)</tt>.\n",
-    "If <tt class=\"docutils literal\">data</tt> is a 1D vector, you need to reshape it into a matrix with a\n",
-    "single column if the vector represents a feature or a single row if the\n",
-    "vector represents a sample.</p>\n",
-    "</div>"
+    "Execute a cross-validation with 10 folds and use the mean absolute error (MAE)\n",
+    "as the metric. Be sure to *return* the fitted *estimators*."
    ]
   },
   {
@@ -148,17 +74,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from sklearn.linear_model import LinearRegression\n",
-    "\n",
     "# Write your code here."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Compute predictions from the linear regression model and plot both the data\n",
-    "and the predictions."
+    "Compute the mean and std of the MAE in thousands of dollars (k$)."
    ]
   },
   {
@@ -172,9 +95,15 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "lines_to_next_cell": 2
+   },
    "source": [
-    "Compute the mean squared error"
+    "Inspect the fitted model using a box plot to show the distribution of values\n",
+    "for the coefficients returned from the cross-validation. Hint: use the\n",
+    "function\n",
+    "[`df.plot.box()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.box.html)\n",
+    "to create a box plot."
    ]
   },
   {

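The reworked `linear_models_ex_02.ipynb` asks for several steps: a 10-fold cross-validation scored with MAE that returns the fitted estimators, the mean and std of the MAE, and a box plot of the coefficients. A minimal sketch of those steps with `cross_validate` follows. One assumption keeps it runnable offline: it uses `make_regression` with 8 synthetic features in place of `fetch_california_housing`, which needs a download. The column names are placeholders, and the MAE is therefore in arbitrary units, not k$. On the real data, after `target *= 100`, the same calls would give the MAE in k$.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the California housing data (8 features);
# feature names are placeholders, not the real column names.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
data = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(8)])
target = pd.Series(y, name="target")

model = LinearRegression()
cv_results = cross_validate(
    model,
    data,
    target,
    cv=10,
    scoring="neg_mean_absolute_error",
    return_estimator=True,
)

# The scorer returns the negated MAE (higher is better), so flip the sign.
errors = -cv_results["test_score"]
print(f"MAE: {errors.mean():.2f} +/- {errors.std():.2f}")

# One row per fitted estimator (fold), one column per feature.
weights = pd.DataFrame(
    [est.coef_ for est in cv_results["estimator"]], columns=data.columns
)
# weights.plot.box()  # box plot of the coefficients (needs matplotlib)
```

Collecting `coef_` from each returned estimator into a DataFrame is what makes the hinted `df.plot.box()` call a one-liner. A narrow box for a feature suggests that its coefficient is stable across folds.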
diff --git a/notebooks/linear_models_ex_03.ipynb b/notebooks/linear_models_ex_03.ipynb