Specify type "category" in read_sql #17862

sfromm-plm · 2017-10-13T13:10:03Z

Would it be possible to add a feature allowing one to specify that a particular column to be downloaded via read_sql has a particular data type? I'm particularly interested in forcing certain dataframe columns to be categorical, to conserve memory usage.

Yes, I could transform the data after receiving it from the database, but the datasets I'm working with are really large, causing huge memory usage high water marks. (An alternative is to break transfer the data in chunks and then transform the data in each chunk, but that complicates the code and, in my test runs, takes a little longer.)

The text was updated successfully, but these errors were encountered:

jreback · 2017-10-14T16:22:52Z

well if sql were to actually provide a dictionary / category encoded column then this would work. but this is generally an implementation detail of the particular backend and I don't believe is exposed to users.

you might want to look at #17790 which is highly performant.

jorisvandenbossche · 2017-10-16T09:24:30Z

We have had several similar enhancement requests, and I think we can add this (just like read_csv has one as well). See #13049, #6798, #17560
In general this is exactly the same as doing astype after reading, but for the case of using chunksize I think it can be easier to do this inside read_sql to every chunk is consistent.

jorisvandenbossche · 2017-10-16T09:26:35Z

@sfromm-plm But to be clear, @jreback is correct in that this won't really "work" for categorical type of data, in the sense that it will not give much memory usage gain. As the values are first returned from the database as plain values, and would only afterwards be converted to Categorical.

sfromm-plm · 2017-10-16T14:21:15Z

@jorisvandenbossche : Right. @jreback's answer wasn't immediately clear to me, but I settled on what you just said.

Thanks for the replies, guys, even if it wasn't what I was hoping to hear.

jreback closed this as completed Oct 14, 2017

jreback added Categorical Categorical Data Type IO SQL to_sql, read_sql, read_sql_query labels Oct 14, 2017

jreback added this to the No action milestone Oct 14, 2017

jorisvandenbossche added the Duplicate Report Duplicate issue or pull request label Oct 16, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specify type "category" in read_sql #17862

Specify type "category" in read_sql #17862

sfromm-plm commented Oct 13, 2017

jreback commented Oct 14, 2017

jorisvandenbossche commented Oct 16, 2017

jorisvandenbossche commented Oct 16, 2017

sfromm-plm commented Oct 16, 2017

Specify type "category" in read_sql #17862

Specify type "category" in read_sql #17862

Comments

sfromm-plm commented Oct 13, 2017

jreback commented Oct 14, 2017

jorisvandenbossche commented Oct 16, 2017

jorisvandenbossche commented Oct 16, 2017

sfromm-plm commented Oct 16, 2017