Handling of duplicate columns in pandas.io.sql.read_frame #2738
Comments
There were a lot of dupe col bugs fixed in 9.1 (or 9.0, can't remember), so make sure you're using as for df.columns = new_columns not working, I get this with git master:
if you're not getting the same behaviour on a recent version, please open an issue with
Is this now fixed? (at least the behaviour after it's been read in via sql)?
dups are pretty good in master now....
also, not sure it makes sense (in general) to dedupe pre-pandas...
maybe add an option to the
Is there already a method to do that, which could just be applied after reading?
I would close this and just add to the master list...(but lower down)
already added
@hayd, I'm pushing to coalesce the bits and pieces of SQL around @mangecoeur's recent work. Closing.
Calling pandas.io.sql.read_frame can result in a data frame with duplicate column names, for example when the SQL query contains joins on tables with duplicate columns.
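To make the source of the duplicates concrete, here is a minimal sketch using only the standard-library sqlite3 module rather than pandas. The table and column names (customers, orders, id, customer_id) are made up for illustration; the point is just that a `SELECT *` over a join hands back the same column name twice, which is what a reader like read_frame would then receive.

```python
import sqlite3

# Hypothetical schema: both tables have a column called "id".
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'alice');
    INSERT INTO orders VALUES (10, 1);
    """
)

cursor = conn.execute(
    "SELECT * FROM customers JOIN orders ON customers.id = orders.customer_id"
)

# cursor.description holds one 7-tuple per result column; item 0 is the name.
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()

# "id" appears once per joined table in the result's column names.
print(column_names)
```

Aliasing the columns in the query itself (for example `orders.id AS order_id`) avoids the clash before the data ever reaches pandas.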
Data frames with duplicate column names cause errors in many pandas functions. I can't even rename columns as df.columns = new_columns generates errors.
I think the correct behavior would be for pandas.io.sql.read_frame to have an option to "deduplicate" column names (for example by adding a number) or to raise an error on duplicate column names.
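A rough sketch of what such an option could do is below. It is not an existing pandas API: the function name dedupe_column_names and its sep and raise_on_duplicates parameters are hypothetical, and the numbering scheme (first occurrence untouched, later ones suffixed with _1, _2, ...) is just one possible choice.

```python
def dedupe_column_names(names, sep="_", raise_on_duplicates=False):
    """Return a list of unique names, suffixing repeats with a number.

    Hypothetical helper, not part of pandas. With raise_on_duplicates=True
    it raises ValueError instead of renaming.
    """
    names = list(names)
    if raise_on_duplicates and len(set(names)) != len(names):
        raise ValueError("duplicate column names: %r" % (names,))

    used = set()    # names already emitted
    counts = {}     # last suffix number tried for each original name
    result = []
    for name in names:
        candidate = name
        n = counts.get(name, 0)
        # Bump the suffix until the candidate does not clash with any
        # name emitted so far (including literal names like "id_1").
        while candidate in used:
            n += 1
            candidate = "%s%s%d" % (name, sep, n)
        counts[name] = n
        used.add(candidate)
        result.append(candidate)
    return result


deduped = dedupe_column_names(["id", "name", "id", "id"])
print(deduped)
```

On a pandas version where assigning to df.columns works with duplicate names, the result could presumably be applied after reading with `df.columns = dedupe_column_names(df.columns)`.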