Remove M4.02 and rename other exercises accordingly
ArturoAmorQ committed Aug 31, 2023
1 parent fc4553f commit 513d696
Showing 11 changed files with 326 additions and 1,016 deletions.
8 changes: 3 additions & 5 deletions jupyter-book/_toc.yml
@@ -99,20 +99,18 @@ parts:
- file: linear_models/linear_models_quiz_m4_02
- file: linear_models/linear_models_non_linear_index
sections:
- file: python_scripts/linear_regression_non_linear_link
- file: python_scripts/linear_models_ex_02
- file: python_scripts/linear_models_sol_02
- file: python_scripts/linear_regression_non_linear_link
- file: python_scripts/linear_models_ex_03
- file: python_scripts/linear_models_sol_03
- file: python_scripts/logistic_regression_non_linear
- file: linear_models/linear_models_quiz_m4_03
- file: linear_models/linear_models_regularization_index
sections:
- file: linear_models/regularized_linear_models_slides
- file: python_scripts/linear_models_regularization
- file: linear_models/linear_models_quiz_m4_04
- file: python_scripts/linear_models_ex_04
- file: python_scripts/linear_models_sol_04
- file: python_scripts/linear_models_ex_03
- file: python_scripts/linear_models_sol_03
- file: linear_models/linear_models_quiz_m4_05
- file: linear_models/linear_models_wrap_up_quiz
- file: linear_models/linear_models_module_take_away
129 changes: 29 additions & 100 deletions notebooks/linear_models_ex_02.ipynb
@@ -4,39 +4,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# \ud83d\udcdd Exercise M4.02\n",
"# \ud83d\udcdd Exercise M4.03\n",
"\n",
"The goal of this exercise is to build an intuition on what will be the\n",
"parameters' values of a linear model when the link between the data and the\n",
"target is non-linear.\n",
"In all previous notebooks, we only used a single feature in `data`. But we\n",
"have already shown that we could add new features to make the model more\n",
"expressive by deriving new features, based on the original feature.\n",
"\n",
"First, we will generate such non-linear data.\n",
"The aim of this notebook is to train a linear regression algorithm on a\n",
"dataset with more than a single feature.\n",
"\n",
"<div class=\"admonition tip alert alert-warning\">\n",
"<p class=\"first admonition-title\" style=\"font-weight: bold;\">Tip</p>\n",
"<p class=\"last\"><tt class=\"docutils literal\">np.random.RandomState</tt> allows to create a random number generator which can\n",
"be later used to get deterministic results.</p>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Set the seed for reproduction\n",
"rng = np.random.RandomState(0)\n",
"\n",
"# Generate data\n",
"n_sample = 100\n",
"data_max, data_min = 1.4, -1.4\n",
"len_data = data_max - data_min\n",
"data = rng.rand(n_sample) * len_data - len_data / 2\n",
"noise = rng.randn(n_sample) * 0.3\n",
"target = data**3 - 0.5 * data**2 + noise"
"We will load a dataset about house prices in California. The dataset consists\n",
"of 8 features regarding the demography and geography of districts in\n",
"California and the aim is to predict the median house price of each district.\n",
"We will use all 8 features to predict the target, the median house price."
]
},
{
@@ -45,8 +25,8 @@
"source": [
"<div class=\"admonition note alert alert-info\">\n",
"<p class=\"first admonition-title\" style=\"font-weight: bold;\">Note</p>\n",
"<p class=\"last\">To ease the plotting, we will create a Pandas dataframe containing the data\n",
"and target</p>\n",
"<p class=\"last\">If you want a deeper overview regarding this dataset, you can refer to the\n",
"Appendix - Datasets description section at the end of this MOOC.</p>\n",
"</div>"
]
},
@@ -56,65 +36,19 @@
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.datasets import fetch_california_housing\n",
"\n",
"full_data = pd.DataFrame({\"data\": data, \"target\": target})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"\n",
"_ = sns.scatterplot(\n",
" data=full_data, x=\"data\", y=\"target\", color=\"black\", alpha=0.5\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"lines_to_next_cell": 2
},
"source": [
"We observe that the link between the data `data` and vector `target` is\n",
"non-linear. For instance, `data` could represent the years of experience\n",
"(normalized) and `target` the salary (normalized). Therefore, the problem here\n",
"would be to infer the salary given the years of experience.\n",
"\n",
"Using the function `f` defined below, find both the `weight` and the\n",
"`intercept` that you think will lead to a good linear model. Plot both the\n",
"data and the predictions of this model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def f(data, weight=0, intercept=0):\n",
" target_predict = weight * data + intercept\n",
" return target_predict"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Write your code here."
"data, target = fetch_california_housing(as_frame=True, return_X_y=True)\n",
"target *= 100 # rescale the target in k$\n",
"data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compute the mean squared error for this model"
"Now it is your turn to train a linear regression model on this dataset. First,\n",
"create a linear regression model."
]
},
{
@@ -130,16 +64,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Train a linear regression model on this dataset.\n",
"\n",
"<div class=\"admonition warning alert alert-danger\">\n",
"<p class=\"first admonition-title\" style=\"font-weight: bold;\">Warning</p>\n",
"<p class=\"last\">In scikit-learn, by convention <tt class=\"docutils literal\">data</tt> (also called <tt class=\"docutils literal\">X</tt> in the scikit-learn\n",
"documentation) should be a 2D matrix of shape <tt class=\"docutils literal\">(n_samples, n_features)</tt>.\n",
"If <tt class=\"docutils literal\">data</tt> is a 1D vector, you need to reshape it into a matrix with a\n",
"single column if the vector represents a feature or a single row if the\n",
"vector represents a sample.</p>\n",
"</div>"
"Execute a cross-validation with 10 folds and use the mean absolute error (MAE)\n",
"as metric. Be sure to *return* the fitted *estimators*."
]
},
{
@@ -148,17 +74,14 @@
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# Write your code here."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compute predictions from the linear regression model and plot both the data\n",
"and the predictions."
"Compute the mean and std of the MAE in thousands of dollars (k$)."
]
},
{
@@ -172,9 +95,15 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"lines_to_next_cell": 2
},
"source": [
"Compute the mean squared error"
"Inspect the fitted model using a box plot to show the distribution of values\n",
"for the coefficients returned from the cross-validation. Hint: use the\n",
"function\n",
"[`df.plot.box()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.box.html)\n",
"to create a box plot."
]
},
{
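For reviewers who want to sanity-check the renamed exercise, the steps it asks for (create a linear regression model, run a 10-fold cross-validation scored with the MAE while returning the fitted estimators, then collect the coefficients) might look roughly like the sketch below. This is not the course solution. It assumes a synthetic 8-feature dataset from `make_regression` in place of `fetch_california_housing` so that it runs offline, and the `feature_i` column names are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Assumption: synthetic stand-in with 8 features, used instead of the
# California housing data so that nothing has to be downloaded.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names
data = pd.DataFrame(X, columns=feature_names)

model = LinearRegression()

# 10-fold cross-validation; return_estimator=True keeps each fitted model.
cv_results = cross_validate(
    model,
    data,
    y,
    cv=10,
    scoring="neg_mean_absolute_error",
    return_estimator=True,
)

# scikit-learn scorers follow "greater is better", so the MAE comes back negated.
mae = -cv_results["test_score"]
print(f"MAE: {mae.mean():.3f} +/- {mae.std():.3f}")

# One row of coefficients per fold, one column per feature.
coefs = pd.DataFrame(
    [estimator.coef_ for estimator in cv_results["estimator"]],
    columns=feature_names,
)
```

With matplotlib installed, calling `coefs.plot.box()` on the resulting dataframe should draw the box plot that the exercise hints at.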
130 changes: 0 additions & 130 deletions notebooks/linear_models_ex_03.ipynb

This file was deleted.
