Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: add index_label keyword to to_sql #6642

Merged

Conversation

jorisvandenbossche
Copy link
Member

Further work on #6292. While looking at possible multi-index support, I thought of first adding this:

to do:

  • should also change this in generic.py
  • check for name conflicts (warn and suggest to use index_label?)

@mangecoeur

@jreback
Copy link
Contributor

jreback commented Mar 14, 2014

why don't you just have index accept a string / list / True

True means use the index name (raise if it is None)
string use this as the cindex label
list use if multiple cols (and raise if not a mi)

false / none are don't store index

instead of adding another kw

@jorisvandenbossche
Copy link
Member Author

I did that for consistency with to_csv (and to_excel)

@jreback
Copy link
Contributor

jreback commented Mar 14, 2014

ok then!
didn't realize that

@jreback
Copy link
Contributor

jreback commented Mar 14, 2014

should think about what to do if u have an unnamed column (I know you are defaulting it) but maybe should warn/raise? (I mean an unnamed index an no label is specified)

@jorisvandenbossche
Copy link
Member Author

What do you mean with an unnamed column?

Also, now 'pandas_index' is used when the index has no name (and not just 'index'). But maybe should also think about that. Is there any precedence somewhere in pandas? df.reset_index() just uses 'index'.

@jreback
Copy link
Contributor

jreback commented Mar 14, 2014

yep maybe need to do exactly like reset index
have to worry about name conflicts though (what if their is a column named index )

@jorisvandenbossche
Copy link
Member Author

The name conflicts is now also a problem (if you would have a column named pandas_index). It doesn't give an error, but the column is overwritten by the index if the names are the same. So maybe just raise a warning for this?

@jreback
Copy link
Contributor

jreback commented Mar 15, 2014

yep

and I would name index
and level_0, etc for mi ( in fact u can just do a reset_index directly)

@mangecoeur
Copy link
Contributor

My concern with using the name "index" is that it is not allowed as a column name in certain SQL flavors (this is a problem for other reserved keywords too). See for example http://dev.mysql.com/doc/refman/5.6/en/reserved-words.html

Note - i don't know if they make the distinction between lower case and upper case versions

@jorisvandenbossche
Copy link
Member Author

@mangecoeur Good concern (and sql mostly makes no distinction between lower and upper case, with some exceptions).
But after some thinking about it, there are a lot of reserved keywords, so this could be a problem more in general. But the way this is dealt with, is by quoting those column names. For example a dataframe that I wrote to postgresql using to_sql, gets this description in postgresql:

CREATE TABLE test_column_keyword
(
  index integer,
  "select" integer,
  "Col2" double precision
)

So the select column is quoted because this is a reserved keyword, the Col2 column is also quoted because it is mixed with uppercase (and because postgresql otherwise converts everything to lowercase, the column name will not be recognised/reserved), and the index column is not quoted as this is not a reserved keyword in postgresql (but it is in MySQL).

So for postgresql this is working, and I suppose other database systems will have a similar mechanism to deal with this kind of column names?

@jreback jreback added this to the 0.14.0 milestone Mar 22, 2014
@jreback jreback added the SQL label Mar 22, 2014
@jorisvandenbossche
Copy link
Member Author

OK, to move forward, I am going to merge this. In any case will also have to touch it again (the naming issue) when adding multi-index support.

jorisvandenbossche added a commit that referenced this pull request Mar 28, 2014
SQL: add index_label keyword to to_sql
@jorisvandenbossche jorisvandenbossche merged commit de167f7 into pandas-dev:master Mar 28, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants