ENH: get_dummies on DataFrames #8133

TomAugspurger · 2014-08-28T13:29:02Z

get_dummies currently just expects a Series.

In [17]: data
Out[17]: 
   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   

                                                Name     Sex  Age  SibSp  \
0                            Braund, Mr. Owen Harris    male   22      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female   38      1   

   Parch     Ticket     Fare Cabin Embarked  
0      0  A/5 21171   7.2500   NaN        S  
1      0   PC 17599  71.2833   C85        C

If it took DataFrames we could change the required call from

features = pd.concat([data.get(['Fare', 'Age']),
                      pd.get_dummies(data.Sex, prefix='Sex'),
                      pd.get_dummies(data.Pclass, prefix='Pclass'),
                      pd.get_dummies(data.Embarked, prefix='Embarked')],
                     axis=1)

to

features = pd.get_dummies(data)

We'll infer that things with object dtype need to be encoded as 0's and 1's, but also take arguments to explicitly encode a column, or not.
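A minimal sketch of the inference and the explicit override described above. It assumes the `columns` keyword, which is the name used in released pandas; this issue does not name the argument. The toy frame is hypothetical and only loosely mirrors the two rows shown earlier.

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative, not the real dataset.
data = pd.DataFrame({
    "Fare": [7.25, 71.2833],
    "Pclass": [3, 1],
    "Sex": ["male", "female"],
    "Embarked": ["S", "C"],
})

# Default: only string-like columns (Sex, Embarked) are expanded;
# numeric columns such as Fare and Pclass pass through unchanged.
inferred = pd.get_dummies(data)
print(list(inferred.columns))

# Explicit: name the columns to encode, which lets the integer-valued
# Pclass be treated as categorical as well.
explicit = pd.get_dummies(data, columns=["Sex", "Pclass", "Embarked"])
print(list(explicit.columns))
```

Listing a column explicitly is the way to encode something like Pclass, which is stored as an integer but is categorical in meaning.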

The column names in the output will automatically include the original column name as a prefix, which can be overridden by the prefix kwarg by passing a list or dictionary.

Same thing with prefix separators.
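To illustrate the prefix and separator overrides, here is a small sketch using the `prefix` and `prefix_sep` keywords as they exist in released pandas. The frame and the replacement prefixes (`sex`, `port`) are made up for the example.

```python
import pandas as pd

# Hypothetical toy frame with two string columns and one numeric column.
data = pd.DataFrame({
    "Fare": [7.25, 71.2833],
    "Sex": ["male", "female"],
    "Embarked": ["S", "C"],
})

# A dict maps each encoded column to its own prefix; a list in column
# order would work too. prefix_sep replaces the default "_" separator.
renamed = pd.get_dummies(
    data,
    prefix={"Sex": "sex", "Embarked": "port"},
    prefix_sep="=",
)
print(list(renamed.columns))
```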

On NaN handling, I think we'll have one {prefix}_NaN output column per original column when dummy_na is True.
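A rough sketch of the NaN handling proposed above, assuming the `dummy_na` keyword behaves for DataFrames as it does in released pandas. The frame is hypothetical. The exact label of the indicator column may differ from the `{prefix}_NaN` spelling suggested here, so the example prints the column names rather than asserting them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: Cabin has a missing value, Sex does not.
data = pd.DataFrame({
    "Sex": ["male", "female"],
    "Cabin": [np.nan, "C85"],
})

without_na = pd.get_dummies(data)
with_na = pd.get_dummies(data, dummy_na=True)

# dummy_na=True adds one extra indicator column per encoded column,
# even for a column such as Sex that contains no missing values.
print(list(without_na.columns))
print(list(with_na.columns))
```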

I've got some tests written already.


TomAugspurger · 2014-08-28T18:00:59Z

Ha, something like this already exists! convert_dummies in pandas/core/reshape, but it isn't exported under the pd. namespace, and I didn't find it in the documentation.

I'll think about whether to adjust that function at all, or just document it as is. I think the defaults can be improved a bit (which would be API changing), but I wonder if this function is ever used.

TomAugspurger · 2014-08-28T18:06:42Z

Actually, what I have in mind should be backwards compatible. It's changing a positional argument to a keyword argument, so we should be fine.

Turns out it was Wes who wrote this originally.

jreback · 2014-08-28T18:13:47Z

doesn't look like convert_dummies is used anywhere (internal/external).

so you can go ahead and integrate with get_dummies for functionality as described above (which is prob more useful)

jorisvandenbossche · 2014-08-28T18:37:22Z

also not a single mention of convert_dummies on SO. I also would just integrate it in get_dummies with the API we want, instead of adding (or better publicizing) another function.

jreback added API Design labels Aug 28, 2014

TomAugspurger mentioned this issue Aug 29, 2014

ENH: let get_dummies take a DataFrame #8140

Merged

jreback added this to the 0.15.0 milestone Aug 29, 2014

TomAugspurger closed this as completed in #8140 Sep 1, 2014
