ENH Convert some of the Wrap-up M4 content into exercise #731

ArturoAmorQ · 2023-10-17T12:55:19Z

Fixes #707.

Follows #711, which made the logistic_regression_non_linear notebook redundant. This PR creates an exercise that illustrates feature engineering on a more realistic dataset. In particular, it demonstrates feature interactions when using one-hot encoding.

Note: I had to build the exercises from the solutions to correctly render the index.

ArturoAmorQ · 2023-10-17T14:55:41Z

python_scripts/linear_models_ex_03.py

+# preprocessor.fit(data)
+# feature_names = (
+#     preprocessor.named_transformers_["onehotencoder"].get_feature_names_out(
+#         categorical_columns
+#     )
+# ).tolist()
+# feature_names += numerical_columns
+# feature_names


For info: I had to comment out these lines by hand because they raised a flake8 F821 (undefined name) error when building the exercise from the solution.

We may need to think of a better way to avoid this situation in the future.

Maybe we can add F821 to the ignore list in the flake8 configuration. Since we run all the code in the notebooks, including the solutions, when building the jupyterbook, we should be safe. The only code we do not check automatically is the solutions to the wrap-up quiz, but they are in the private repo.

The issue here is that preprocessor is defined in the solution but not in the exercise. So I think it will raise an error when building the jupyterbook.

Indeed. We can keep it that way then.

Maybe add a comment telling readers to reuse the preprocessor variable defined in the solution to the previous question.

ogrisel · 2023-10-18T17:03:42Z

As in the private review of the wrap-up quiz, I am worried that fitting such a pipeline might require too much RAM and CPU on jupyterhub and, as a result, might crash or be painfully slow to execute. We should either trim the number of categorical features using min_frequency or max_categories in OneHotEncoder, or alternatively use a Nystroem approximation with n_components between 100 and 1000.

Furthermore, I would also replace standard scaling by SplineTransformer to get a more expressive model.


ArturoAmorQ · 2023-10-23T09:31:50Z

> Furthermore, I would also replace standard scaling by SplineTransformer to get a more expressive model.

I decided not to do so because feature engineering actually degraded the performance of the spline model.


ogrisel · 2023-10-26T10:15:14Z

Hmm, it seems that I broke the Jupyter preview...

ArturoAmorQ · 2023-10-27T09:17:51Z

I will have to merge this PR in its current state to possibly debug the synchronization of the notebooks and its impact on FUN. We can always iterate on it in future PRs.


ArturoAmorQ added 7 commits October 17, 2023 14:49

ENH Convert some of the Wrap-up M4 content into exercise

846a9f6

Fix CI

280c4c9

Remove content related to regularization

c97f7ec

Exercise M4.04 fixes

4a6e132

Exercise M4.03 fixes

480e401

Iter

b704279

Build exercises from solutions

cc62433

ArturoAmorQ commented Oct 17, 2023


ArturoAmorQ added 3 commits October 20, 2023 12:06

Add F821 failure to the ignore list in flake8 config

afba33f

Revert add F821 failure to the ignore list in flake8 config

9518a17

Use min_frequency to trim number of categorical features

c472672

ArturoAmorQ mentioned this pull request Oct 20, 2023

Synchronize quizzes #735

Merged

Merge branch 'main' of github.com:INRIA/scikit-learn-mooc into move_Wrap_up_M4_content

a479fcd

Solve conflicts

c4fed31

ArturoAmorQ commented Oct 25, 2023


python_scripts/linear_models_ex_03.py (outdated, resolved)

Revert unnecessary change

f2e4f01

Better way to avoid flake8 undefined name error

28986c2

ArturoAmorQ merged commit 008cff4 into INRIA:main Oct 27, 2023
2 checks passed

ArturoAmorQ deleted the move_Wrap_up_M4_content branch October 27, 2023 09:22

github-actions bot pushed a commit that referenced this pull request Oct 27, 2023

[ci skip] ENH Convert some of the Wrap-up M4 content into exercise (#731) 008cff4

adf38a8
