From 6ba12f20916e8e0886c0e0170bccfa94cdb1e58c Mon Sep 17 00:00:00 2001
From: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Date: Mon, 23 Oct 2023 12:23:13 +0200
Subject: [PATCH] Synchronize quizzes (#735)

Co-authored-by: ArturoAmorQ
---
 .../linear_models/linear_models_quiz_m4_03.md |   2 +-
 .../linear_models/linear_models_quiz_m4_05.md |   5 +-
 .../linear_models_wrap_up_quiz.md             | 143 +++++-------------
 .../wrap_up_quiz.md                           |  46 ++++--
 4 files changed, 76 insertions(+), 120 deletions(-)

diff --git a/jupyter-book/linear_models/linear_models_quiz_m4_03.md b/jupyter-book/linear_models/linear_models_quiz_m4_03.md
index eb458ad46..2989e3e66 100644
--- a/jupyter-book/linear_models/linear_models_quiz_m4_03.md
+++ b/jupyter-book/linear_models/linear_models_quiz_m4_03.md
@@ -33,7 +33,7 @@ _Select a single answer_
 Combining (one or more) feature engineering transformers in a single pipeline:
 
 - a) increases the expressivity of the model
-- b) ensures that models extrapolate accurately regardless of its distribution
+- b) ensures that models extrapolate accurately regardless of the distribution of the data
 - c) may require tuning additional hyperparameters
 - d) inherently prevents any underfitting
 
diff --git a/jupyter-book/linear_models/linear_models_quiz_m4_05.md b/jupyter-book/linear_models/linear_models_quiz_m4_05.md
index 1fe12883c..45ce1f9b8 100644
--- a/jupyter-book/linear_models/linear_models_quiz_m4_05.md
+++ b/jupyter-book/linear_models/linear_models_quiz_m4_05.md
@@ -25,10 +25,11 @@ _Select a single answer_
 +++
 
 ```{admonition} Question
-In logistic regression, increasing the regularization strength makes the model:
+In logistic regression, increasing the regularization strength (by
+decreasing the value of `C`) makes the model:
 
 - a) more likely to overfit to the training data
-- b) more flexible, fitting closely to the training data
+- b) more confident: the values returned by `predict_proba` are closer to 0 or 1
 - c) less complex, potentially underfitting the training data
 
 _Select a single answer_
diff --git a/jupyter-book/linear_models/linear_models_wrap_up_quiz.md b/jupyter-book/linear_models/linear_models_wrap_up_quiz.md
index 3f8f18d72..e903e29b2 100644
--- a/jupyter-book/linear_models/linear_models_wrap_up_quiz.md
+++ b/jupyter-book/linear_models/linear_models_wrap_up_quiz.md
@@ -153,132 +153,59 @@ _Select a single answer_
 +++
 
-Now, we will tackle a classification problem instead of a regression problem.
-Load the Adult Census dataset with the following snippet of code and we will
-work only with **numerical features**.
+So far we only used the list of `numerical_features` to build the predictive
+model. Now create a preprocessor to deal separately with the numerical and
+categorical columns:
 
-```python
-adult_census = pd.read_csv("../datasets/adult-census.csv")
-target = adult_census["class"]
-data = adult_census.select_dtypes(["integer", "floating"])
-data = data.drop(columns=["education-num"])
-```
-
-```{admonition} Question
-How many numerical features are present in the dataset contained in the
-variable `data`?
+- categorical features can be selected if they have an `object` data type;
+- use a `OneHotEncoder` to encode the categorical features;
+- numerical features should correspond to the `numerical_features` as defined
+  above. This is a subset of the features that are not an `object` data type;
+- use a `StandardScaler` to scale the numerical features.
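+
+A minimal sketch of such a preprocessor is shown below. It assumes `data` and
+`numerical_features` are defined as above; setting `handle_unknown="ignore"` is
+an illustrative choice, not something imposed by the instructions:
+
+```python
+from sklearn.compose import make_column_transformer
+from sklearn.preprocessing import OneHotEncoder, StandardScaler
+
+# Categorical columns are the ones with an `object` data type.
+categorical_features = data.columns[data.dtypes == "object"]
+
+preprocessor = make_column_transformer(
+    # Encode the categories, ignoring those only seen at `transform` time.
+    (OneHotEncoder(handle_unknown="ignore"), categorical_features),
+    # Scale the numerical features.
+    (StandardScaler(), numerical_features),
+)
+```
+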
-- a) 3
-- b) 4
-- c) 5
-
-_Select a single answer_
-```
-
-+++
-
+The last step of the pipeline should be a `RidgeCV` with the same set of `alphas`
+to evaluate as previously.
 
 ```{admonition} Question
-Compare the generalization performance using the accuracy of the two following
-predictive models using a 10-fold cross-validation:
-
-- a linear model composed of a `StandardScaler` and a `LogisticRegression`
-- a `DummyClassifier` predicting the most frequent class
-
-By comparing the cross-validation test scores of both models fold-to-fold, count the number
-of times the linear model has a better test score than the dummy classifier
-Select the range which this number belongs to:
+By comparing the cross-validation test scores fold-to-fold for the model with
+`numerical_features` only and the model with both `numerical_features` and
+`categorical_features`, count the number of times the simple model has a better
+test score than the model with all features. Select the range which this number
+belongs to:
 
-- a) [0, 3]: the linear model is substantially worse than the dummy classifier
+- a) [0, 3]: the simple model is consistently worse than the model with all features
 - b) [4, 6]: both models are almost equivalent
-- c) [7, 10]: the linear model is substantially better than the dummy classifier
+- c) [7, 10]: the simple model is consistently better than the model with all features
 
 _Select a single answer_
 ```
 
 +++
 
-```{admonition} Question
-What is the most important feature seen by the logistic regression?
-
-- a) `"age"`
-- b) `"capital-gain"`
-- c) `"capital-loss"`
-- d) `"hours-per-week"`
-
-_Select a single answer_
-```
-
-+++
-
-Now, we will work with **both numerical and categorical features**. You can
-load Adult Census with the following snippet:
-
-```python
-adult_census = pd.read_csv("../datasets/adult-census.csv")
-target = adult_census["class"]
-data = adult_census.drop(columns=["class", "education-num"])
-```
+In this module we saw that non-linear feature engineering may yield a more
+predictive pipeline, as long as we take care of adjusting the regularization to
+avoid overfitting.
 
-Create a predictive model where the categorical data must be one-hot encoded,
-the numerical data must be scaled, and the predictor is a
-logistic regression classifier.
+Try this approach by building a new pipeline similar to the previous one but
+replacing the `StandardScaler` with a `SplineTransformer` (with default
+hyperparameter values) to better model the non-linear influence of the
+numerical features.
 
-Use the same 10-fold cross-validation strategy as above to evaluate this
-complex pipeline.
+Furthermore, let the new pipeline model feature interactions by adding a new
+`Nystroem` step between the preprocessor and the `RidgeCV` estimator. Set
+`kernel="poly"`, `degree=2` and `n_components=300` for this new feature
+engineering step.
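+
+A possible sketch of this new pipeline, reusing the `categorical_features`,
+`numerical_features` and `alphas` variables assumed to be defined earlier in
+the quiz:
+
+```python
+from sklearn.compose import make_column_transformer
+from sklearn.kernel_approximation import Nystroem
+from sklearn.linear_model import RidgeCV
+from sklearn.pipeline import make_pipeline
+from sklearn.preprocessing import OneHotEncoder, SplineTransformer
+
+preprocessor = make_column_transformer(
+    (OneHotEncoder(handle_unknown="ignore"), categorical_features),
+    # Spline features replace the plain scaling of the numerical columns.
+    (SplineTransformer(), numerical_features),
+)
+model = make_pipeline(
+    preprocessor,
+    # Kernel approximation step modeling feature interactions.
+    Nystroem(kernel="poly", degree=2, n_components=300),
+    RidgeCV(alphas=alphas),
+)
+```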
 
 ```{admonition} Question
-Look at the cross-validation test scores for both models and count the number of
-times the model using both numerical and categorical features has a better
-test score than the model using only numerical features.
-Select the range which this number belongs to:
+By comparing the cross-validation test scores fold-to-fold for the model with
+both `numerical_features` and `categorical_features`, and the model that
+performs non-linear feature engineering, count the number of times the
+non-linear pipeline has a better test score than the model with simpler
+preprocessing. Select the range which this number belongs to:
 
-- a) [0, 3]: the model using both numerical and categorical features is
-  substantially worse than the model using only numerical features
+- a) [0, 3]: the new non-linear pipeline is consistently worse than the previous pipeline
 - b) [4, 6]: both models are almost equivalent
-- c) [7, 10]: the model using both numerical and categorical features is
-  substantially better than the model using only numerical features
+- c) [7, 10]: the new non-linear pipeline is consistently better than the previous pipeline
 
 _Select a single answer_
 ```
-
-+++
-
-For the following questions, you can use the following snippet to get the
-feature names after the preprocessing performed.
-
-```python
-preprocessor.fit(data)
-feature_names = (preprocessor.named_transformers_["onehotencoder"]
-                 .get_feature_names_out(categorical_columns)).tolist()
-feature_names += numerical_columns
-feature_names
-```
-
-There is as many feature names as coefficients in the last step of your
-predictive pipeline.
-
-```{admonition} Question
-Which of the following pair of features is most impacting the
-predictions of the logistic regression classifier based on
-the relative magnitude of its coefficients?
-
-- a) `"hours-per-week"` and `"native-country_Columbia"`
-- b) `"workclass_?"` and `"native_country_?"`
-- c) `"capital-gain"` and `"education_Doctorate"`
-
-_Select a single answer_
-```
-
-+++
-
-```{admonition} Question
-What is the effect of decreasing the `C` parameter on the coefficients?
-
-- a) shrinking the magnitude of the weights towards zeros
-- b) increasing the magnitude of the weights
-- c) reducing the weights' variance
-- d) increasing the weights' variance
-- e) it has no influence on the weights' variance
-
-_Select all answers that apply_
-```
diff --git a/jupyter-book/predictive_modeling_pipeline/wrap_up_quiz.md b/jupyter-book/predictive_modeling_pipeline/wrap_up_quiz.md
index 179d9ed09..65e1fa6df 100644
--- a/jupyter-book/predictive_modeling_pipeline/wrap_up_quiz.md
+++ b/jupyter-book/predictive_modeling_pipeline/wrap_up_quiz.md
@@ -127,15 +127,43 @@ can process both the numerical and categorical features together as follows:
   `OneHotEncoder`.
 
 ```{admonition} Question
-One way to compare two models is by comparing the cross-validation test scores
-of both models fold-to-fold, i.e. counting the number of folds where one model
-has a better test score than the other. Let's compare the model using all
-features with the model consisting of only numerical features. Select the range
-of folds where the former has a better test score than the latter:
-
-- a) [0, 3]: the pipeline using all features is substantially worse than the pipeline using only numerical feature
-- b) [4, 6]: both pipelines are almost equivalent
-- c) [7, 10]: the pipeline using all features is substantially better than the pipeline using only numerical feature
+What is the accuracy score obtained by 10-fold cross-validation of the pipeline
+using both the numerical and categorical features?
+
+- a) ~0.7
+- b) ~0.9
+- c) ~1.0
+
+_Select a single answer_
+```
+
++++
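+
+As a hint, the score in question can be computed along these lines, where
+`model` is assumed to be the full pipeline built above, and `data` and `target`
+the variables loaded at the beginning of this quiz:
+
+```python
+from sklearn.model_selection import cross_validate
+
+# With the default scoring, classifiers report their mean accuracy.
+cv_results = cross_validate(model, data, target, cv=10)
+test_scores = cv_results["test_score"]
+print(f"Mean accuracy: {test_scores.mean():.3f} ± {test_scores.std():.3f}")
+```
+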
+One way to compare two models is by comparing their means, but small differences
+in performance measures might easily arise merely by chance (e.g. when using
+random resampling during cross-validation), and not because one model predicts
+systematically better than the other.
+
+Another way is to compare cross-validation test scores of both models
+fold-to-fold, i.e. counting the number of folds where one model has a better
+test score than the other. This provides some extra information: are some
+partitions of the data making the classification task particularly easy or hard
+for both models?
+
+Let's visualize the second approach.
+
+![Fold-to-fold comparison](../../figures/numerical_pipeline_wrap_up_quiz_comparison.png)
+
+```{admonition} Question
+Select the true statement.
+
+The number of folds where the model using all features performs better than the
+model using only numerical features lies in the range:
+
+- a) [0, 3]: the model using all features is consistently worse
+- b) [4, 6]: both models are almost equivalent
+- c) [7, 10]: the model using all features is consistently better
 
 _Select a single answer_
 ```
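+
+For reference, the fold-to-fold count shown in the figure above could be
+obtained with a sketch along these lines; `model_all_features` and
+`model_numerical` are hypothetical names for the two pipelines built earlier in
+this quiz, both assumed to select their own columns from the full dataframe:
+
+```python
+from sklearn.model_selection import cross_validate
+
+# Passing the same `cv=10` (a deterministic, unshuffled split) to both calls
+# guarantees that the two models are evaluated on the same 10 partitions.
+scores_all = cross_validate(model_all_features, data, target, cv=10)["test_score"]
+scores_num = cross_validate(model_numerical, data, target, cv=10)["test_score"]
+
+# Count the folds where the model using all features wins.
+print(f"Better on {(scores_all > scores_num).sum()} folds out of 10")
+```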