Synchronize quizzes (#735)

Co-authored-by: ArturoAmorQ <[email protected]>

ArturoAmorQ authored Oct 23, 2023
1 parent c9a7ad4 commit 6ba12f2
Showing 4 changed files with 76 additions and 120 deletions.
2 changes: 1 addition & 1 deletion jupyter-book/linear_models/linear_models_quiz_m4_03.md
@@ -33,7 +33,7 @@ _Select a single answer_
Combining (one or more) feature engineering transformers in a single pipeline:
- a) increases the expressivity of the model
-- b) ensures that models extrapolate accurately regardless of its distribution
+- b) ensures that models extrapolate accurately regardless of the distribution of the data
- c) may require tuning additional hyperparameters
- d) inherently prevents any underfitting
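
As an aside, here is a minimal sketch of what "combining feature engineering
transformers in a single pipeline" can look like. It is not part of the commit;
the transformers and values are illustrative assumptions. Note how each step
brings its own hyperparameters to tune, as option c) states:

```python
# Hypothetical example (not from the quiz): two feature engineering
# transformers chained before a linear model. Each step adds
# hyperparameters (n_bins, degree, alpha) that may require tuning.
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, PolynomialFeatures

model = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="onehot-dense"),  # hyperparameter: n_bins
    PolynomialFeatures(degree=2),  # hyperparameter: degree
    Ridge(alpha=1.0),  # hyperparameter: alpha
)
```
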
5 changes: 3 additions & 2 deletions jupyter-book/linear_models/linear_models_quiz_m4_05.md
@@ -25,10 +25,11 @@ _Select a single answer_
+++

```{admonition} Question
-In logistic regression, increasing the regularization strength makes the model:
+In logistic regression, increasing the regularization strength (by
+decreasing the value of `C`) makes the model:
- a) more likely to overfit to the training data
-- b) more flexible, fitting closely to the training data
+- b) more confident: the values returned by `predict_proba` are closer to 0 or 1
- c) less complex, potentially underfitting the training data
_Select a single answer_
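
To illustrate the updated question, a small self-contained sketch (not part of
the commit, synthetic data): decreasing `C` increases the regularization
strength, shrinking the coefficients and pulling the values returned by
`predict_proba` towards 0.5, i.e. a less complex, less confident model:

```python
# Hypothetical illustration: effect of C on LogisticRegression.
# Smaller C = stronger regularization = smaller coefficients and
# predicted probabilities closer to 0.5 (less confident model).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
for C in (100.0, 1.0, 0.01):
    model = LogisticRegression(C=C).fit(X, y)
    proba = model.predict_proba(X)[:, 1]
    print(f"C={C}: max |coef| = {abs(model.coef_).max():.2f}, "
          f"mean |proba - 0.5| = {abs(proba - 0.5).mean():.2f}")
```
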
143 changes: 35 additions & 108 deletions jupyter-book/linear_models/linear_models_wrap_up_quiz.md
@@ -153,132 +153,59 @@ _Select a single answer_

+++

-Now, we will tackle a classification problem instead of a regression problem.
-Load the Adult Census dataset with the following snippet of code and we will
-work only with **numerical features**.
+So far we only used the list of `numerical_features` to build the predictive
+model. Now create a preprocessor to deal separately with the numerical and
+categorical columns:

-```python
-adult_census = pd.read_csv("../datasets/adult-census.csv")
-target = adult_census["class"]
-data = adult_census.select_dtypes(["integer", "floating"])
-data = data.drop(columns=["education-num"])
-```

-```{admonition} Question
-How many numerical features are present in the dataset contained in the
-variable `data`?
+- categorical features can be selected if they have an `object` data type;
+- use a `OneHotEncoder` to encode the categorical features;
+- numerical features should correspond to the `numerical_features` as defined
+above. This is a subset of the features that are not an `object` data type;
+- use a `StandardScaler` to scale the numerical features.

-- a) 3
-- b) 4
-- c) 5
-_Select a single answer_
-```

+++
+The last step of the pipeline should be a `RidgeCV` with the same set of `alphas`
+to evaluate as previously.
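
For reference, a minimal sketch of the pipeline these new instructions
describe. This is not part of the commit: `data`, `numerical_features` and the
`alphas` grid are assumed to be defined in earlier, unshown parts of the quiz,
and the grid below is only a placeholder:

```python
# Hypothetical sketch of the requested preprocessor + RidgeCV pipeline.
# `data` and `numerical_features` are assumed from earlier quiz steps;
# `alphas` stands in for "the same set of alphas as previously".
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical_features = data.columns[data.dtypes == "object"]
preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_features),
    (StandardScaler(), numerical_features),
)
alphas = np.logspace(-3, 3, num=101)  # placeholder grid
model = make_pipeline(preprocessor, RidgeCV(alphas=alphas))
```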

```{admonition} Question
-Compare the generalization performance using the accuracy of the two following
-predictive models using a 10-fold cross-validation:
-- a linear model composed of a `StandardScaler` and a `LogisticRegression`
-- a `DummyClassifier` predicting the most frequent class
-By comparing the cross-validation test scores of both models fold-to-fold, count the number
-of times the linear model has a better test score than the dummy classifier.
-Select the range which this number belongs to:
+By comparing the cross-validation test scores fold-to-fold for the model with
+`numerical_features` only and the model with both `numerical_features` and
+`categorical_features`, count the number of times the simple model has a better
+test score than the model with all features. Select the range which this number
+belongs to:
-- a) [0, 3]: the linear model is substantially worse than the dummy classifier
+- a) [0, 3]: the simple model is consistently worse than the model with all features
- b) [4, 6]: both models are almost equivalent
-- c) [7, 10]: the linear model is substantially better than the dummy classifier
+- c) [7, 10]: the simple model is consistently better than the model with all features
_Select a single answer_
```

+++

-```{admonition} Question
-What is the most important feature seen by the logistic regression?
-- a) `"age"`
-- b) `"capital-gain"`
-- c) `"capital-loss"`
-- d) `"hours-per-week"`
-_Select a single answer_
-```

+++

-Now, we will work with **both numerical and categorical features**. You can
-load Adult Census with the following snippet:

-```python
-adult_census = pd.read_csv("../datasets/adult-census.csv")
-target = adult_census["class"]
-data = adult_census.drop(columns=["class", "education-num"])
-```
+In this Module we saw that non-linear feature engineering may yield a more
+predictive pipeline, as long as we take care of adjusting the regularization to
+avoid overfitting.

-Create a predictive model where the categorical data must be one-hot encoded,
-the numerical data must be scaled, and the predictor is a
-logistic regression classifier.
+Try this approach by building a new pipeline similar to the previous one but
+replacing the `StandardScaler` by a `SplineTransformer` (with default
+hyperparameter values) to better model the non-linear influence of the
+numerical features.

-Use the same 10-fold cross-validation strategy as above to evaluate this
-complex pipeline.
+Furthermore, let the new pipeline model feature interactions by adding a new
+`Nystroem` step between the preprocessor and the `RidgeCV` estimator. Set
+`kernel="poly"`, `degree=2` and `n_components=300` for this new feature
+engineering step.
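
Again for reference, a hedged sketch of this non-linear pipeline;
`categorical_features`, `numerical_features` and `alphas` are assumed as in the
previous sketch:

```python
# Hypothetical sketch: SplineTransformer for the numerical columns and a
# Nystroem kernel approximation to model feature interactions, as the new
# instructions describe. Variable names are assumed from earlier steps.
from sklearn.compose import make_column_transformer
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, SplineTransformer

preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_features),
    (SplineTransformer(), numerical_features),  # default hyperparameters
)
model = make_pipeline(
    preprocessor,
    Nystroem(kernel="poly", degree=2, n_components=300),
    RidgeCV(alphas=alphas),
)
```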

```{admonition} Question
-Look at the cross-validation test scores for both models and count the number of
-times the model using both numerical and categorical features has a better
-test score than the model using only numerical features.
-Select the range which this number belongs to:
+By comparing the cross-validation test scores fold-to-fold for the model with
+both `numerical_features` and `categorical_features`, and the model that
+performs non-linear feature engineering, count the number of times the
+non-linear pipeline has a better test score than the model with simpler
+preprocessing. Select the range which this number belongs to:
-- a) [0, 3]: the model using both numerical and categorical features is
-substantially worse than the model using only numerical features
+- a) [0, 3]: the new non-linear pipeline is consistently worse than the previous pipeline
- b) [4, 6]: both models are almost equivalent
-- c) [7, 10]: the model using both numerical and categorical features is
-substantially better than the model using only numerical features
+- c) [7, 10]: the new non-linear pipeline is consistently better than the previous pipeline
_Select a single answer_
```

+++

-For the following questions, you can use the following snippet to get the
-feature names after the preprocessing performed.

-```python
-preprocessor.fit(data)
-feature_names = (preprocessor.named_transformers_["onehotencoder"]
-                 .get_feature_names_out(categorical_columns)).tolist()
-feature_names += numerical_columns
-feature_names
-```

-There are as many feature names as coefficients in the last step of your
-predictive pipeline.

-```{admonition} Question
-Which of the following pairs of features is most impacting the
-predictions of the logistic regression classifier based on
-the relative magnitude of its coefficients?
-- a) `"hours-per-week"` and `"native-country_Columbia"`
-- b) `"workclass_?"` and `"native_country_?"`
-- c) `"capital-gain"` and `"education_Doctorate"`
-_Select a single answer_
-```

+++

-```{admonition} Question
-What is the effect of decreasing the `C` parameter on the coefficients?
-- a) shrinking the magnitude of the weights towards zeros
-- b) increasing the magnitude of the weights
-- c) reducing the weights' variance
-- d) increasing the weights' variance
-- e) it has no influence on the weights' variance
-_Select all answers that apply_
-```
46 changes: 37 additions & 9 deletions jupyter-book/predictive_modeling_pipeline/wrap_up_quiz.md
@@ -127,15 +127,43 @@ can process both the numerical and categorical features together as follows:
`OneHotEncoder`.

```{admonition} Question
-One way to compare two models is by comparing the cross-validation test scores
-of both models fold-to-fold, i.e. counting the number of folds where one model
-has a better test score than the other. Let's compare the model using all
-features with the model consisting of only numerical features. Select the range
-of folds where the former has a better test score than the latter:
-- a) [0, 3]: the pipeline using all features is substantially worse than the pipeline using only numerical features
-- b) [4, 6]: both pipelines are almost equivalent
-- c) [7, 10]: the pipeline using all features is substantially better than the pipeline using only numerical features
+What is the accuracy score obtained by 10-fold cross-validation of the pipeline
+using both the numerical and categorical features?
+- a) ~0.7
+- b) ~0.9
+- c) ~1.0
_Select a single answer_
```
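
For reference, a minimal sketch of the evaluation this question refers to. It
is not part of the commit; `model`, `data` and `target` are assumed from
earlier steps of the quiz:

```python
# Hypothetical sketch: 10-fold cross-validation of the full pipeline.
# For a classifier, cross_validate scores with accuracy by default.
from sklearn.model_selection import cross_validate

cv_results = cross_validate(model, data, target, cv=10)
scores = cv_results["test_score"]
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```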

+++

+One way to compare two models is by comparing their means, but small differences
+in performance measures might easily turn out to be merely due to chance (e.g.
+when using random resampling during cross-validation), and not because one
+model predicts systematically better than the other.

+Another way is to compare the cross-validation test scores of both models
+fold-to-fold, i.e. counting the number of folds where one model has a better
+test score than the other. This provides some extra information: are some
+partitions of the data making the classification task particularly easy or hard
+for both models?
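
A small sketch of this fold-to-fold comparison (not part of the commit;
`model_all` and `model_num` stand for the pipelines using all features and
only numerical features, assumed to be built earlier in the quiz):

```python
# Hypothetical sketch: count the folds where the all-features model beats
# the numerical-only model. With cv=10 and no shuffling the splits are
# deterministic, so both models see identical partitions.
from sklearn.model_selection import cross_validate

scores_all = cross_validate(model_all, data, target, cv=10)["test_score"]
scores_num = cross_validate(model_num, data, target, cv=10)["test_score"]
n_wins = (scores_all > scores_num).sum()
print(f"the all-features model wins on {n_wins} of 10 folds")
```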

+Let's visualize the second approach.

+![Fold-to-fold comparison](../../figures/numerical_pipeline_wrap_up_quiz_comparison.png)

+```{admonition} Question
+Select the true statement.
+The number of folds where the model using all features performs better than the
+model using only numerical features lies in the range:
+- a) [0, 3]: the model using all features is consistently worse
+- b) [4, 6]: both models are almost equivalent
+- c) [7, 10]: the model using all features is consistently better
+_Select a single answer_
+```
