ENH: Add the observed parameter to get_dummies #60585

Open
1 of 3 tasks
alonme opened this issue Dec 17, 2024 · 0 comments
Labels: Categorical (Categorical Data Type), Enhancement, Needs Triage (Issue that has not been reviewed by a pandas team member)

alonme (Contributor) commented Dec 17, 2024

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

For a categorical Series, the get_dummies function creates a column for every category defined on the dtype, not only for the categories that are observed, that is, actually present in the passed DataFrame.

Example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'letter':['a','b','c']})

In [3]: pd.get_dummies(df[df['letter'] == 'a'])
Out[3]:
   letter_a
0         1

In [4]: df['letter'] = df['letter'].astype("category")

In [5]: pd.get_dummies(df[df['letter'] == 'a'])
Out[5]:
   letter_a  letter_b  letter_c
0         1         0         0

Feature Description

Add an observed parameter to the get_dummies function, with the same behavior as the parameter of the same name in the groupby functions: when it is set, only categories that actually appear in the data produce columns.
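To make the proposal concrete, here is a minimal sketch of how the existing groupby parameter behaves and how an equivalent option for get_dummies could be emulated today. The helper get_dummies_observed is hypothetical and not part of pandas; it is only an illustration of the requested semantics, and it assumes a DataFrame input.

```python
import pandas as pd

df = pd.DataFrame({"letter": ["a", "b", "c"], "value": [1, 2, 3]})
df["letter"] = df["letter"].astype("category")
subset = df[df["letter"] == "a"]

# Existing groupby behavior: observed=False yields one group per category,
# observed=True yields only the categories present in the data.
all_groups = subset.groupby("letter", observed=False)["value"].sum()
seen_groups = subset.groupby("letter", observed=True)["value"].sum()


def get_dummies_observed(data, observed=False, **kwargs):
    """Hypothetical wrapper (not a pandas API) emulating the proposed parameter.

    Assumes `data` is a DataFrame. When observed=True, unused categories are
    dropped from every categorical column before encoding.
    """
    if observed:
        data = data.copy()  # avoid mutating the caller's frame
        for col in data.select_dtypes(include="category").columns:
            data[col] = data[col].cat.remove_unused_categories()
    return pd.get_dummies(data, **kwargs)


dummies_all = get_dummies_observed(subset[["letter"]])
dummies_seen = get_dummies_observed(subset[["letter"]], observed=True)

print(list(all_groups.index), list(seen_groups.index))
print(list(dummies_all.columns), list(dummies_seen.columns))
```

The sketch relies on Series.cat.remove_unused_categories, so the category set is narrowed before encoding rather than filtered afterwards. A real implementation inside pandas might look different.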

Alternative Solutions

  1. Change the behavior to always use the observed values only
  2. Document this behavior so it's clear to users (users can remove the unneeded columns later if they want to)
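For the second alternative, removing the unneeded columns after the fact could look roughly like the sketch below. It assumes that a dummy column with no set entry corresponds to an unobserved category; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"letter": ["a", "b", "c"]})
df["letter"] = df["letter"].astype("category")

# All three categories produce a column, even though only 'a' is present.
dummies = pd.get_dummies(df[df["letter"] == "a"])

# Keep only the columns that have at least one nonzero (or True) entry.
observed_only = dummies.loc[:, dummies.any(axis=0)]

print(list(observed_only.columns))
```

Note that this filter is applied to the whole frame, so it would also drop any non-dummy column whose values are all zero or False. Restricting it to the generated dummy columns is safer when the frame has other columns.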

Additional Context

No response

alonme added the Enhancement and Needs Triage labels Dec 17, 2024
asishm added the Categorical label Dec 17, 2024