This repository has been archived by the owner on Jun 28, 2022. It is now read-only.
As you can see if you run get_dummies on any feature, because it is a one-hot encoding, the last column can be fully predicted from the rest of the columns. In fact, it is 1 exactly when all the other columns are 0 (a NOR of the others, i.e. 1 minus their sum). To avoid this perfect collinearity, get_dummies should be called with drop_first=True.
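A minimal sketch of the redundancy described above, assuming pandas and NumPy are installed. The `workclass` column and its values are hypothetical toy data, not taken from the notebook. It checks that the last dummy column equals 1 minus the sum of the others. It also shows that, once an intercept column of ones is added, the full encoding is rank-deficient while the drop_first=True encoding is not.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical feature with three levels.
df = pd.DataFrame({"workclass": ["Private", "State-gov", "Private", "Self-emp", "State-gov"]})

# Full one-hot encoding: one indicator column per category (sorted alphabetically).
full = pd.get_dummies(df["workclass"], dtype=int)

# Every row has exactly one 1, so the last column is determined by the rest:
# it is 1 exactly when all other columns are 0, i.e. 1 minus their sum.
last = full.iloc[:, -1]
rest = full.iloc[:, :-1]
assert (last == 1 - rest.sum(axis=1)).all()

# With an intercept (a column of ones), the dummy columns sum to the intercept,
# so the design matrix has fewer independent columns than columns.
X_full = np.column_stack([np.ones(len(full)), full.to_numpy()])
print(np.linalg.matrix_rank(X_full), X_full.shape[1])  # 3 4

# drop_first=True drops the first category's column, removing the redundancy.
reduced = pd.get_dummies(df["workclass"], drop_first=True, dtype=int)
X_reduced = np.column_stack([np.ones(len(reduced)), reduced.to_numpy()])
print(np.linalg.matrix_rank(X_reduced), X_reduced.shape[1])  # 3 3
```

With drop_first=True, the dropped category ("Private" here) becomes the baseline absorbed by the intercept, and each remaining column measures a difference from that baseline.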
It's, of course, left to the user to write the get_dummies command, but the notebook does not discuss this issue. If you agree this is a valid issue and the notebook needs to be changed, please update the instructions so that students add the drop_first argument.
Hello!
I know of the Finding Donors project, which uses get_dummies(). I don't remember whether there are any others.
Please refer to this: pandas-dev/pandas#12042
If you do end up making this change, please acknowledge Nupur (https://discussions.udacity.com/t/how-to-avoid-collinearity-problem-with-pd-getdummies/284692), who pointed this out.